btrfs corruption AGAIN

So I noticed today, after trying to write a file to the NAS, that it was read-only (Access denied). I could not even edit files from the admin page.

Sure enough, I went in over SSH to peek at the dmesg output, and this message filled my scrollback buffer in seconds:

[3740002.200944] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740002.201191] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740122.654060] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740122.654306] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740243.107223] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740243.107472] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840
[3740363.560247] BTRFS error (device md127): bad tree block start 9800114264141725795 18105397411840

I've truncated it, but this message is repeated more times than I can scroll up to view the start.
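
For anyone hitting the same flood, here is a minimal sketch for collapsing the repeats into counts so the distinct messages become visible. It assumes a POSIX shell with sed, sort and uniq available; `summarize_kmsg` is a hypothetical helper name, not something shipped on the NAS.

```shell
# Collapse repeated kernel messages into counts.
# Reads kernel log text on stdin, strips the leading "[...]" timestamp,
# then prints each distinct message with its count, most frequent first.
summarize_kmsg() {
    sed -E 's/^\[[^]]*\] ?//' | sort | uniq -c | sort -rn
}
```

On the box itself this could be run as `dmesg | summarize_kmsg | head`, which needs permission to read the kernel log.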

I also notice that the weekly scrubs have been failing, suggesting this issue started several weeks back.

For now I am leaving it powered on in this odd state of limbo. The last time this happened I rebooted it and lost my RAID5 array.

..Seriously, how can anybody consider this filesystem production ready? This is the 8th time in 2 years now.

25 Replies

StephenB
Guru - Experienced User
May 31, 2018
ronlaws86 wrote:

I also notice that the weekly scrubs have been failing, suggesting this issue started several weeks back.

Did you get any email alerts on the scrubs failing?

Also I am wondering if you are seeing issues in the SMART stats for any of the disks.

ronlaws86
Guide
Jun 01, 2018
Hi Stephen.

Yes, on the 13th I got an e-mail saying the scrub had failed.

Here is a pastebin of the smartctl output

I see no reallocated sectors, though the ~~read/write error count seems a bit disconcerting~~ (apparently normal for Seagate drives? wth). These were brand-new Seagate Barracuda drives though, purchased at the same time as the NAS drives.

On a side note: I ran a manual RAID-level scrub overnight via SSH, after shutting down all services on the NAS and unmounting the array, by issuing

echo repair > /sys/block/md127/md/sync_action

and according to dmesg, it completed without error.

[3742947.701351] md: requested-resync of RAID array md127
[3742947.701361] md: minimum _guaranteed_  speed: 30000 KB/sec/disk.
[3742947.701368] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
[3742947.701377] md: using 128k window, over a total of 1948664832k.
[3764596.073548] md: md127: requested-resync done.
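
A clean dmesg only shows that the resync finished. The md driver also exposes a `mismatch_cnt` file next to `sync_action`, which counts the sectors it found inconsistent during a check or repair. Below is a rough sketch for reading both. `md_sync_report` is a hypothetical helper name, and the optional second argument is an assumption added only so the function can be pointed at a fake sysfs tree.

```shell
# Print the current sync action and mismatch count of an md array.
# $1: array name, e.g. md127
# $2: sysfs root (defaults to /sys; overridable for testing)
md_sync_report() {
    md_dir="${2:-/sys}/block/$1/md"
    if [ ! -d "$md_dir" ]; then
        echo "no md sysfs directory for $1" >&2
        return 1
    fi
    printf 'sync_action=%s mismatch_cnt=%s\n' \
        "$(cat "$md_dir/sync_action")" "$(cat "$md_dir/mismatch_cnt")"
}
```

Run as `md_sync_report md127` once the array has gone back to idle. A non-zero count would suggest the repair did find and rewrite inconsistent sectors, even though dmesg logged no error.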

ronlaws86
Guide
Jun 01, 2018
Quick update:

I rebooted the NAS after verifying the array was sitting idle and no scans were running. Sure enough, the array has come back up, however the filesystem is totally dead.

root@INT-NAS-1:~# mount /dev/md127 /data
mount: wrong fs type, bad option, bad superblock on /dev/md127,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
root@INT-NAS-1:~# btrfsck --repair /dev/md127
enabling repair mode
bytenr mismatch, want=18105397379072, have=16016835313664
ERROR: cannot read chunk root
ERROR: cannot open file system
root@INT-NAS-1:~#

dmesg:

[  154.994060] BTRFS info (device md127): has skinny extents
[  154.995265] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995278] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995321] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995330] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995358] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995367] BTRFS critical (device md127): unable to find logical 513211334656 len 4096
[  154.995384] BTRFS error (device md127): failed to read chunk root
[  155.057373] BTRFS error (device md127): open_ctree failed
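
A quick way to check whether a flood like this refers to one block or many is to pull out the distinct logical addresses. Here is a minimal sketch, assuming the kernel log text is piped in on stdin; `missing_logicals` is a hypothetical helper name.

```shell
# List each distinct logical address btrfs reported it could not map,
# prefixed by the number of times it appears in the input.
missing_logicals() {
    grep -oE 'unable to find logical [0-9]+' | awk '{ print $5 }' | sort | uniq -c
}
```

Used as `dmesg | missing_logicals`. In the dmesg output above, every critical line names the same address (513211334656), which points at a single unreadable chunk-tree block rather than widespread damage.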

mdgm-ntgr
NETGEAR Employee Retired
Jun 01, 2018
Please send us your logs (see the Sending Logs link in my sig)

Ikalou
Aspirant
Jun 04, 2018
The same thing happened to me today. All the shares now appear empty. I never used SSH and didn't do anything unusual. I'm using the latest firmware (6.9.3).

[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS info (device md127): no csum found for inode 8749 start 406678265856
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096
[Mon Jun 4 12:07:15 2018] BTRFS critical (device md127): unable to find logical 13599006130176 len 4096

I'm going to try to run a scrub, but I think I'll have to reinstall the NAS.
- mdgm-ntgr
  NETGEAR Employee Retired
  Jun 05, 2018
  The volume maintenance options in the GUI are not the way to deal with problems like this.
  
  If you don’t have an up-to-date backup and need a data recovery attempt, you could contact support.
  - ronlaws86
    Guide
    Sep 17, 2019
    So a quick update to this. I know it's been a year since my last post, but since this shows up in Google now I may as well put a closing comment.
    
    Since my last post, volume failures have continued, and I've pretty much just come to accept this as a quirk of a bad implementation on Netgear's part, and that these devices are simply unreliable due to the well-known failings of BTRFS as a file system everywhere else in the Linux community. I really wish the devices used XFS, but alas, we're stuck with the poor design choices Netgear gave us, short of hacking and flashing them with something else.
    
    In subsequent failures, I've not even bothered SSH'ing in to the devices. I used only factory-provided tools (excluding SSH, which is still factory provided too, btw) and relied instead on the regular backup options as well as ReadyDR. The file system still crashes, even at 90% full, which is a huge waste of otherwise perfectly usable free space.
    
    Regular volume housekeeping has always been in place: weekly scrubs, monthly defrags, etc.
    
    I'm currently in the process of switching the disks over from Seagate Barracudas to WD Reds with twice the capacity, to hopefully mitigate the volume-almost-full self-destruct issue that shouldn't exist in the first place (on any sane file system).
    
    But if even this fails to help and I end up once again with a busted volume later down the line, my advice to the general populace at the moment would be: "Don't use these devices for anything mission critical where you expect the free space to become limited. If you do, don't trust the RAID configurations and stick to single-disk shares, as bugs in BTRFS regarding RAID will likely leave you with a busted volume 6 months down the line."
    Until Netgear either fixes the bugs in BTRFS (unlikely) or switches to a more mature and reliable filesystem (like XFS), these NAS drives are volatile at best.
