
ronlaws86
May 31, 2018
Solved

btrfs corruption AGAIN

So I noticed today, after trying to write a file to the NAS, that it was read-only (access denied). I could not even edit files from the admin page.

 

Sure enough, I went in over SSH to peek at the dmesg output, and this message filled my scrollback buffer in seconds:

[3740002.200944] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740002.201191] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740122.654060] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740122.654306] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740243.107223] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740243.107472] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740363.560247] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840

I've truncated it, but this message is repeated more times than I can scroll up to view the start.
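With a log this repetitive, collapsing identical messages can make it easier to see whether one block or many are affected. A minimal sketch, assuming the kernel log is piped in on stdin; `summarize_kmsg` is a hypothetical helper name, not a tool shipped with the NAS:

```shell
#!/bin/sh
# summarize_kmsg: hypothetical helper (not part of ReadyNAS OS).
# Reads kernel log lines on stdin, strips the leading "[timestamp]" field,
# and prints each distinct message with its count, most frequent first.
summarize_kmsg() {
    sed 's/^\[[^]]*\] *//' | sort | uniq -c | sort -rn
}

# Example usage on the NAS over SSH:
#   dmesg | grep BTRFS | summarize_kmsg
```

In the excerpt above every line names the same block (18105397411840), so the summary would collapse to a single entry.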

I also notice that the weekly scrubs have been failing, suggesting this issue started several weeks back. 

 

For now I am leaving it powered on in this odd state of limbo; the last time this happened I rebooted it and lost my RAID5 array.

 

 

..Seriously, how can anybody consider this filesystem production-ready? This is the 8th time in 2 years now.



25 Replies

  • StephenB
    Guru - Experienced User

    ronlaws86 wrote:

     

    I also notice that the weekly scrubs have been failing, suggesting this issue started several weeks back. 

     


    Did you get any email alerts on the scrubs failing?

     

    Also, I am wondering if you are seeing issues in the SMART stats for any of the disks.

  • Hi Stephen.

     

    Yes, on the 13th I have an e-mail to say the scrub had failed.

     

    Here is a pastebin of the smartctl output

     

    I see no reallocated sectors, though the read/write error count seems a bit disconcerting (apparently normal for Seagate drives? wth). These were brand new Seagate Barracuda drives though, purchased at the same time as the NAS drives.
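On the large raw values: a commonly reported convention for Seagate drives is that the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate packs two counters, with the upper 16 bits holding the error count and the lower 32 bits the number of operations. That convention comes from community documentation, not from this thread or the pastebin, so treat it as an assumption. Under that assumption, a raw value could be split like this (`decode_seagate_raw` is a hypothetical helper name):

```shell
#!/bin/sh
# decode_seagate_raw: hypothetical helper. ASSUMES the community-documented
# Seagate layout: upper 16 bits = errors, lower 32 bits = operations.
# Takes the decimal raw value printed by smartctl as its only argument.
decode_seagate_raw() {
    raw=$1
    errors=$(( raw >> 32 ))             # upper 16 bits of the 48-bit value
    ops=$(( raw & 4294967295 ))         # lower 32 bits (mask 0xFFFFFFFF)
    echo "errors=$errors operations=$ops"
}

# Example usage with a raw value copied from `smartctl -A` output:
#   decode_seagate_raw 123456789
```

If the convention holds, any raw value below 4294967296 decodes to zero errors, which would match the "apparently normal" reading.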

     

     

    On a side note, I ran a manual scrub overnight via SSH after shutting down all services on the NAS and unmounting the array, issuing:

    echo repair > /sys/block/md127/md/sync_action

    and according to dmesg, it completed without error.  

    [3742947.701351] md: requested-resync of RAID array md127
    [3742947.701361] md: minimum _guaranteed_  speed: 30000 KB/sec/disk.
    [3742947.701368] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
    [3742947.701377] md: using 128k window, over a total of 1948664832k.
    [3764596.073548] md: md127: requested-resync done.
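Note that an md `repair` pass only makes the RAID stripes and parity consistent with each other; it does not verify btrfs checksums, so a clean resync does not mean the filesystem metadata is intact. The md layer does expose how many mismatched sectors the last check or repair found in `mismatch_cnt`. A minimal sketch for reading it, assuming the standard Linux md sysfs layout; `md_status` is a hypothetical helper name:

```shell
#!/bin/sh
# md_status: hypothetical helper. Prints the current sync action and the
# mismatch count recorded by the last check/repair pass of an md array.
# $1 = md device name (default md127, as in this thread)
# $2 = sysfs root (default /sys/block; overridable for trying it off-box)
md_status() {
    dev=${1:-md127}
    sysroot=${2:-/sys/block}
    dir="$sysroot/$dev/md"
    if [ ! -r "$dir/mismatch_cnt" ]; then
        echo "no md sysfs entry for $dev" >&2
        return 1
    fi
    echo "sync_action=$(cat "$dir/sync_action") mismatch_cnt=$(cat "$dir/mismatch_cnt")"
}

# Example usage on the NAS:
#   md_status md127
```

A non-zero `mismatch_cnt` after a repair would mean the pass found inconsistent stripes; on RAID5 md cannot tell which copy was correct and simply rewrites the parity.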
    

     

    • ronlaws86
      Guide

      Quick update: 

      I rebooted the NAS after verifying the array was sitting idle and no scans were going on. Sure enough, the array has come back up; however, the filesystem is totally dead.

      root@INT-NAS-1:~# mount /dev/md127 /data
      mount: wrong fs type, bad option, bad superblock on /dev/md127,
             missing codepage or helper program, or other error
      
             In some cases useful info is found in syslog - try
             dmesg | tail or so.
      root@INT-NAS-1:~# btrfsck --repair /dev/md127
      enabling repair mode
      bytenr mismatch, want=18105397379072, have=16016835313664
      ERROR: cannot read chunk root
      ERROR: cannot open file system
      root@INT-NAS-1:~# 
      

      dmesg: 

      [  154.994060] BTRFS info (device md127): has skinny extents
      [  154.995265] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995278] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995321] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995330] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995358] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995367] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
      [  154.995384] BTRFS error (device md127): failed to read chunk root
      [  155.057373] BTRFS error (device md127): open_ctree failed
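The btrfs documentation treats `--repair` as a last resort because it writes to the device, so read-only steps are usually tried first. The sketch below only prints a candidate sequence and runs nothing. It assumes a btrfs-progs build that has `btrfs check --readonly` and `btrfs restore -D`, and a kernel that supports the `usebackuproot` mount option (older kernels use `recovery` instead). `print_ro_recovery_plan` and the `/mnt/rescue` destination are hypothetical. With the chunk root unreadable, as it is here, any of these steps may still fail:

```shell
#!/bin/sh
# print_ro_recovery_plan: hypothetical helper. It only PRINTS commands for
# review; it never runs them and never touches the device.
# $1 = block device (default /dev/md127, as in this thread)
# $2 = destination for recovered files (assumed to be on a separate disk)
print_ro_recovery_plan() {
    dev=${1:-/dev/md127}
    dest=${2:-/mnt/rescue}
    # 1. Check the filesystem without writing anything.
    # 2. Try a read-only mount from a backup tree root
    #    (on older kernels the option is "ro,recovery").
    # 3. Dry-run file extraction from the unmounted device.
    printf '%s\n' \
        "btrfs check --readonly $dev" \
        "mount -o ro,usebackuproot $dev /data" \
        "btrfs restore -D $dev $dest"
}

# Example usage:
#   print_ro_recovery_plan /dev/md127 /mnt/rescue
```

Printing rather than executing keeps the sketch harmless on a volume that may still be recoverable by support.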
      

       

      • mdgm-ntgr
        NETGEAR Employee Retired

        Please send us your logs (see the Sending Logs link in my sig)

  • The same thing happened to me today. All the shares now appear empty. I never used SSH and didn't do anything unusual. Using the latest firmware (6.9.3).

     

    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS info (device md127): no csum found for inode 8749 start 406678265856
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
    [Mon Jun  4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096

    I'm going to try to run a scrub, but I think I'll have to reinstall the NAS.

    • mdgm-ntgr
      NETGEAR Employee Retired

      The volume maintenance options in the GUI are not the way to deal with problems like this.

       

      If you don’t have an up to date backup and need a data recovery attempt you could contact support.

      • ronlaws86
        Guide
        So a quick update to this. I know it's been a year since my last post, but since this shows up in Google now I may as well put a closing comment.

        Since my last post, volume failures have continued, and I've pretty much come to accept this as a quirk of bad implementation on Netgear's part: these devices are simply unreliable due to the well-known failings of BTRFS as a file system in general everywhere else in the Linux community. I really wish the devices used XFS, but alas, we're stuck with the poor design choices Netgear gave us, short of hacking and flashing them with something else.

        In subsequent failures, I've not even bothered SSH'ing in to the devices and have used only factory-provided tools (excluding SSH, which is still factory-provided too, btw), relying instead on the regular backup options as well as ReadyDR. The file system still crashes, even at 90% full, which is a huge waste of otherwise perfectly usable free space.

        Regular volume housekeeping has always been in place: weekly scrubs, monthly defrags, etc.

        I'm currently in the process of switching the disks over from Seagate Barracudas to WD Reds with 2x the capacity, to hopefully mitigate the volume-almost-full self-destruct issue that shouldn't exist in the first place (on any sane file system).

        But if even this fails to help and I end up once again with a busted volume later down the line, my advice to the general populace at the moment would be: "Don't use these devices for anything mission critical where you expect the free space to become limited. If you do, don't trust the RAID configurations and stick to single-disk shares, as bugs in BTRFS regarding RAID will likely leave you with a busted volume 6 months down the line."
        Until Netgear either fixes the bugs in BTRFS (unlikely) or switches to a more mature and reliable filesystem (like XFS), these NAS drives are volatile at best.

