valk1
Guide
Apr 22, 2018

RN104 - BTRFS Read-Only - No SMART Errors

Hi There,

 

I'm going through a series of unfortunate events that seems a bit odd to me, and I'd like to hear your opinion.

 

I've started to see strange errors in dmesg on my RN104, related to BTRFS transactions having bad IDs.

root@ARMADA:~# dmesg | grep BTRFS
[ 20.625101] BTRFS: device label 0e36878a:data devid 1 transid 3469259 /dev/md127
[ 112.915132] BTRFS error (device md127): qgroup generation mismatch, marked as inconsistent
[ 113.051953] BTRFS info (device md127): checking UUID tree
[ 198.050654] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[ 198.060303] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[ 198.060348] BTRFS warning (device md127): Skipping commit of aborted transaction.
[ 198.060372] BTRFS: error (device md127) in cleanup_transaction:1856: errno=-5 IO failure
[ 198.060381] BTRFS info (device md127): forced readonly
[ 198.060399] BTRFS info (device md127): delayed_refs has NO entry
[ 323.995310] BTRFS error (device md127): cleaner transaction attach returned -30
[35853.010390] BTRFS error (device md127): open_ctree failed
[35965.598501] BTRFS error (device md127): qgroup generation mismatch, marked as inconsistent
[35965.760837] BTRFS info (device md127): checking UUID tree
[36036.989875] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[36036.999693] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[36036.999742] BTRFS warning (device md127): Skipping commit of aborted transaction.
[36036.999765] BTRFS: error (device md127) in cleanup_transaction:1856: errno=-5 IO failure
[36036.999774] BTRFS info (device md127): forced readonly
[36036.999908] BTRFS info (device md127): delayed_refs has NO entry
[36075.082924] BTRFS error (device md127): Remounting read-write after error is not allowed
[36174.833658] BTRFS error (device md127): cleaner transaction attach returned -30
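
To make the repeated failures easier to read, the log can be condensed to one line per distinct block. This is only a rough sketch: transid_summary is a made-up helper name, not part of btrfs-progs, and it assumes the kernel message keeps the "on <block> wanted <N> found <M>" wording shown above.

```shell
# transid_summary (hypothetical helper): read kernel log lines on stdin and
# print one line per distinct "parent transid verify failed" message.
transid_summary() {
    awk '
    /parent transid verify failed/ {
        # Pick up the value that follows each keyword in the message.
        for (i = 1; i < NF; i++) {
            if ($i == "on")     block  = $(i + 1)
            if ($i == "wanted") wanted = $(i + 1)
            if ($i == "found")  found  = $(i + 1)
        }
        key = block " " wanted " " found
        if (!(key in count)) order[++n] = key   # remember first-seen order
        count[key]++
    }
    END {
        for (j = 1; j <= n; j++) {
            split(order[j], f, " ")
            # gap = found generation minus the generation the parent expected
            printf "block=%s wanted=%s found=%s gap=%d count=%d\n",
                   f[1], f[2], f[3], f[3] - f[2], count[order[j]]
        }
    }'
}
```

It would be used as `dmesg | transid_summary`. A positive gap means the block on disk carries a newer generation than the one its parent pointer expects.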

 

The I/O error made me believe the HDDs weren't working as expected, but smartctl didn't find any issue.

I then ran an extended SMART test on all the HDDs and it found nothing: the disks work just fine according to SMART.

 

Turned off the NAS and took out one drive at a time:

  • dumped the whole disk with dd: no I/O errors
  • filled the disk with zeros for its full size: no I/O errors
  • restored the previous dd image: no I/O errors
  • put the disk back in
  • powered the NAS back on
  • checked that the array assembled without issue
  • repeated for all 4 disks

I'd say the disks are undeniably OK.
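
For reference, the dump / zero-fill / restore round trip on one disk could look roughly like the sketch below. The exact dd invocations used are not given in this thread, so this is an assumption: roundtrip_disk and both of its arguments are hypothetical names, and step 2 destroys everything on the target until step 3 completes.

```shell
# roundtrip_disk DEV IMG (hypothetical helper): dump DEV to IMG, overwrite DEV
# with zeros, write IMG back, then compare. WARNING: DEV is wiped in step 2.
roundtrip_disk() {
    dev=$1
    img=$2
    # 1. Dump the whole device; dd stops and exits non-zero on a read error.
    dd if="$dev" of="$img" bs=1M || return 1
    size=$(($(wc -c < "$img")))
    # 2. Overwrite exactly $size bytes of the device with zeros.
    head -c "$size" /dev/zero | dd of="$dev" bs=1M conv=notrunc || return 2
    # 3. Write the saved image back onto the device.
    dd if="$img" of="$dev" bs=1M conv=notrunc || return 3
    # 4. Confirm the device now matches the image byte for byte.
    cmp -s "$dev" "$img" || return 4
}
```

The return code says which step failed (1 = dump, 2 = zero fill, 3 = restore, 4 = final compare), and any I/O error reported by dd shows up on stderr. IMG needs to live on a separate disk with enough free space for the whole device.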

 

Since then I've run btrfs check with and without repair. No difference: it always aborts due to "qgroup generation mismatch, marked as inconsistent".

I ran btrfs rescue zero-log on the device and everything looked fine: I could mount the filesystem rw, but a few seconds later I got the same transid mismatch with different IDs, and the same qgroup error.

Please note when the issue started:

Apr 19 15:03:55 ARMADA kernel: BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258

The errors have alternated between the found transid lagging behind the wanted one and running ahead of it.

I'd call it a bug in the FS, but I need this filesystem in rw, and I'd like to scrub it (something I do on the 1st of every month, along with a defrag).

I'm so frustrated, as all the disks are fine. What is the point of having a fault-tolerant NAS if the FS then blows up?

 

root@ARMADA:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md127 : active raid5 sdd3[4] sdb3[3] sdc3[2] sda3[5]
11706505920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid6 sdb2[0] sda2[3] sdd2[2] sdc2[1]
1047552 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdd1[4] sda1[5] sdc1[2] sdb1[3]
4190208 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
root@ARMADA:~#

Note: md0 and md1 work just fine.
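
The same check can be scripted rather than read by eye. A minimal sketch, assuming the /proc/mdstat layout shown above, where each array's status line ends in a member map such as [UUUU]; md_degraded is a made-up helper name.

```shell
# md_degraded (hypothetical helper): read /proc/mdstat-style text on stdin and
# print the name of every array whose member map contains "_" (missing member).
md_degraded() {
    awk '
    /^md[0-9]+ :/ { name = $1 }            # remember the current array name
    match($0, /\[[U_]+\]/) {               # member map, e.g. [UUUU] or [UU_U]
        if (index(substr($0, RSTART, RLENGTH), "_"))
            print name
    }'
}
```

Running `md_degraded < /proc/mdstat` prints nothing when every array has all of its members, which is the case for the output above. Note that it only reflects what md reports about its members and says nothing about the state of the filesystem on top.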

root@ARMADA:~# btrfs fi show /dev/md127

Label: '0e36878a:data' uuid: 869ce344-d015-470c-8edb-4da20df085da
Total devices 1 FS bytes used 7.78TiB
devid 1 size 10.90TiB used 8.34TiB path /dev/md127

root@ARMADA:~#

root@ARMADA:~# smartctl -H /dev/sda | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdb | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdc | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdd | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~#

 

Attached: journalctl output with all BTRFS errors.

 

 

 

 

 

6 Replies

  • Retired_Member

    You don't mention anywhere that you run a balance on a regular basis. If you haven't done so yet, please do.

  • Marc_V
    NETGEAR Employee Retired

    Hi valk1

     

    Can you please send us the logs from your NAS?

     

    Regards

     

    • valk1
      Guide
      It is long gone.
      I was so disappointed by the grave mistake Netgear made by using BTRFS in a device meant to support RAID5 and RAID6.
      I've tossed that lil **bleep** away.
      If I was Netgear I would fire whomever chose to use BTRFS in a NAS. Such a rookie mistake.

      I do hold grudges. The disks are perfect. They're running in a homemade NAS. SMART is perfect. I was a victim of the notorious multitude of bugs that affect BTRFS. I lost all my data because I trusted Netgear more than my own skills. Never again.
      I wouldn't ever recommend the ReadyNAS series.

      I got laughed at by the BTRFS guys on IRC for using it in RAID5.
      I got pointed to the wiki and the whitepaper and I knew I was screwed.
      I am wondering if Netgear ever read these.

      TL;DR: DO NOT USE NETGEAR NAS.
      • StephenB
        Guru - Experienced User

        valk1 wrote:
        If I was Netgear I would fire whomever chose to use BTRFS in a NAS.

        Synology followed their lead, which suggests it's not as brain-dead as you seem to think it is.  Netgear doesn't use the BTRFS-raid modes (which the BTRFS folks say are unstable); instead, they combined BTRFS with MDADM software RAID (which has been stable for quite a while).

         

        I've had no issues with it myself.  The main drawback is that there aren't a lot of good repair tools yet.  IMO, the snapshots and other BTRFS features make up for that.

         

        RAID with any file system isn't enough to keep your data safe, so you need a backup on a different device no matter what NAS you use.
