
RN104 - BTRFS Read-Only - No SMART Errors

valk1
Guide

Hi There,

 

I'm going through a series of unfortunate events that seem a bit odd to me, and I'd like to hear your opinion.

 

I've started to see strange errors in dmesg on my RN104, related to BTRFS transactions having bad IDs.

root@ARMADA:~# dmesg | grep BTRFS
[ 20.625101] BTRFS: device label 0e36878a:data devid 1 transid 3469259 /dev/md127
[ 112.915132] BTRFS error (device md127): qgroup generation mismatch, marked as inconsistent
[ 113.051953] BTRFS info (device md127): checking UUID tree
[ 198.050654] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[ 198.060303] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[ 198.060348] BTRFS warning (device md127): Skipping commit of aborted transaction.
[ 198.060372] BTRFS: error (device md127) in cleanup_transaction:1856: errno=-5 IO failure
[ 198.060381] BTRFS info (device md127): forced readonly
[ 198.060399] BTRFS info (device md127): delayed_refs has NO entry
[ 323.995310] BTRFS error (device md127): cleaner transaction attach returned -30
[35853.010390] BTRFS error (device md127): open_ctree failed
[35965.598501] BTRFS error (device md127): qgroup generation mismatch, marked as inconsistent
[35965.760837] BTRFS info (device md127): checking UUID tree
[36036.989875] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[36036.999693] BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258
[36036.999742] BTRFS warning (device md127): Skipping commit of aborted transaction.
[36036.999765] BTRFS: error (device md127) in cleanup_transaction:1856: errno=-5 IO failure
[36036.999774] BTRFS info (device md127): forced readonly
[36036.999908] BTRFS info (device md127): delayed_refs has NO entry
[36075.082924] BTRFS error (device md127): Remounting read-write after error is not allowed
[36174.833658] BTRFS error (device md127): cleaner transaction attach returned -30
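
In the `parent transid verify failed` lines, `wanted` is the generation the parent node expects for that block and `found` is the generation actually read from disk. A minimal sketch for pulling those pairs out of a log, assuming the message format shown above; `transid_gaps` is a hypothetical helper name, not part of btrfs-progs:

```shell
#!/bin/sh
# transid_gaps: hypothetical helper, not part of btrfs-progs.
# Reads kernel log lines on stdin and prints one line per unique
# "parent transid verify failed" message:
#   <bytenr> <wanted> <found> <found - wanted>
transid_gaps() {
    awk '
        /parent transid verify failed on/ {
            for (i = 1; i < NF; i++) {
                if ($i == "on")     bytenr = $(i + 1)
                if ($i == "wanted") wanted = $(i + 1)
                if ($i == "found")  found  = $(i + 1)
            }
            key = bytenr " " wanted " " found
            if (!(key in seen)) {      # the kernel repeats the same message
                seen[key] = 1
                print bytenr, wanted, found, found - wanted
            }
        }
    '
}

# Usage on a live system (assumes dmesg is readable by the current user):
#   dmesg | transid_gaps
```

Fed the log above, this prints a single line, `8599424794624 3462108 3469258 7150`: the same block is reported every time, and it sits 7150 generations ahead of what its parent expects.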

 

The I/O error made me believe the HDDs weren't working as expected, but smartctl didn't find any issue.

I then ran an extended SMART test on all the HDDs, with no luck: the disks work just fine according to SMART.

 

I turned off the NAS and took out one drive at a time:

  • dumped the whole disk with dd: no I/O errors
  • filled the disk with zeros for its full size: no I/O errors
  • restored the previous dd dump: no I/O errors
  • put the disk back in
  • turned the NAS back on
  • checked that the array assembled without issues
  • repeated for each of the 4 disks

I'd say the disks are undeniably OK.
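
The dump / zero-fill / restore cycle described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands used in the post: `dd_cycle` is a hypothetical helper, and it is written against a scratch file so it can be tried safely. Pointing it at a real disk would destroy the data if the dump is lost.

```shell
#!/bin/sh
# dd_cycle: hypothetical helper sketching the dump / zero-fill / restore cycle.
# It OVERWRITES the target while it runs, so try it on a scratch file first.
# Usage: dd_cycle TARGET BACKUP
dd_cycle() {
    target=$1
    backup=$2
    [ -r "$target" ] || return 1

    # Size in bytes. `wc -c` works for regular files; for a real block device
    # the size would have to come from elsewhere (e.g. blockdev --getsize64).
    size=$(( $(wc -c < "$target") ))

    # 1. Dump the whole target. dd exits non-zero if a read fails.
    dd if="$target" of="$backup" bs=1M 2>/dev/null || return 1

    # 2. Overwrite the full size with zeros, so every block is written once.
    head -c "$size" /dev/zero |
        dd of="$target" bs=1M conv=notrunc 2>/dev/null || return 1

    # 3. Restore the dump over the zeroed target.
    dd if="$backup" of="$target" bs=1M conv=notrunc 2>/dev/null || return 1

    # 4. Confirm the restored content is byte-identical to the dump.
    cmp -s "$target" "$backup"
}

# Example on a scratch file:
#   printf 'test data\n' > /tmp/scratch.img
#   dd_cycle /tmp/scratch.img /tmp/scratch.bak && echo "cycle completed cleanly"
```

A clean pass only shows that every block can be read and written without an I/O error. It says nothing about whether the data stored in those blocks is consistent.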

 

Since then I've run btrfs check with and without repair. No difference: it always aborts due to "qgroup generation mismatch, marked as inconsistent".

I've run btrfs rescue zero-log on the device. Everything looks fine and I can mount the volume rw, but a few seconds later I get the same transid mismatch (with different IDs) and the same qgroup error.

Please note that the issue started:

Apr 19 15:03:55 ARMADA kernel: BTRFS error (device md127): parent transid verify failed on 8599424794624 wanted 3462108 found 3469258

The errors have alternated: sometimes the transid found on disk is behind the wanted one, sometimes ahead of it.

I'd call it a bug in the FS, but I need this FS in rw; I'd like to scrub (something I do on the 1st of every month, along with a defrag).

I'm so frustrated, as all the disks are fine. What is the point of having a fault-tolerant NAS if the FS then blows up?

 

root@ARMADA:~# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md127 : active raid5 sdd3[4] sdb3[3] sdc3[2] sda3[5]
11706505920 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]

md1 : active raid6 sdb2[0] sda2[3] sdd2[2] sdc2[1]
1047552 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]

md0 : active raid1 sdd1[4] sda1[5] sdc1[2] sdb1[3]
4190208 blocks super 1.2 [4/4] [UUUU]

unused devices: <none>
root@ARMADA:~#

Note: md1 & md0 work just fine
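
In the mdstat output above, `[4/4]` means 4 devices expected and 4 active, and `[UUUU]` shows each member as up. A minimal sketch for checking this automatically, assuming the `/proc/mdstat` layout shown above; `md_status` is a hypothetical helper name:

```shell
#!/bin/sh
# md_status: hypothetical helper. Reads /proc/mdstat-style text on stdin and
# prints "<array> <active>/<expected> <ok|degraded>" for each md array.
md_status() {
    awk '
        /^md[0-9]+ :/ { name = $1 }               # remember the array name
        match($0, /\[[0-9]+\/[0-9]+\]/) {         # e.g. "[4/4]"
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            state = (n[1] == n[2]) ? "ok" : "degraded"
            print name, n[2] "/" n[1], state
        }
    '
}

# Usage on a live system:
#   md_status < /proc/mdstat
```

For the output above this reports all three arrays (md127, md1, md0) as `4/4 ok`, which matches the point of the post: the RAID layer looks healthy, and the errors are coming from the file system on top of it.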

root@ARMADA:~# btrfs fi show /dev/md127

Label: '0e36878a:data' uuid: 869ce344-d015-470c-8edb-4da20df085da
Total devices 1 FS bytes used 7.78TiB
devid 1 size 10.90TiB used 8.34TiB path /dev/md127

root@ARMADA:~#

root@ARMADA:~# smartctl -H /dev/sda | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdb | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdc | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~# smartctl -H /dev/sdd | grep -i passed
SMART overall-health self-assessment test result: PASSED
root@ARMADA:~#

 

I've attached the journalctl output with all the BTRFS errors.

Model: RN104|ReadyNAS 100 Series
Message 1 of 7
Retired_Member
Not applicable

Re: RN104 - BTRFS Read-Only - No SMART Errors

You don't mention anywhere that you run a balance on a regular basis. If you haven't done so yet, please do.

Message 2 of 7
Marc_V
NETGEAR Employee Retired

Re: RN104 - BTRFS Read-Only - No SMART Errors

Hi @valk1

 

Can you please send us the logs of your NAS?

 

Regards

 

Message 3 of 7
valk1
Guide

Re: RN104 - BTRFS Read-Only - No SMART Errors

It is long gone.
I was so disappointed by the grave mistake Netgear made by using BTRFS in a device meant to support RAID5 and RAID6.
I've tossed that lil **bleep** away.
If I was Netgear I would fire whomever chose to use BTRFS in a NAS. Such a rookie mistake.

I do hold grudges. The disks are perfect. They are running in a homemade NAS now. SMART is perfect. I was a victim of the notorious multitude of bugs that affect BTRFS. I lost all my data because I trusted Netgear more than my own skills. Never again.
I wouldn't recommend the ReadyNAS series, ever.

I got laughed at by the BTRFS guys on IRC for using it on RAID5.
I got pointed to the wiki and the whitepaper, and I knew I was screwed.
I wonder if Netgear ever read these.

TL;DR: DO NOT USE A NETGEAR NAS.
Message 4 of 7
StephenB
Guru

Re: RN104 - BTRFS Read-Only - No SMART Errors


@valk1 wrote:
If I was Netgear I would fire whomever chose to use BTRFS in a NAS.

Synology followed their lead, which suggests it's not as brain-dead as you seem to think it is.  Netgear doesn't use the BTRFS-raid modes (which the BTRFS folks say are unstable); instead, they combined BTRFS with MDADM software RAID (which has been stable for quite a while).

 

I've had no issues with it myself.  The main drawback is that there aren't a lot of good repair tools yet.  IMO, the snapshots and other BTRFS features make up for that.

 

RAID with any file system isn't enough to keep your data safe, so you need a backup on a different device no matter what NAS you use.

Message 5 of 7
valk1
Guide

Re: RN104 - BTRFS Read-Only - No SMART Errors

Mate,
You do not ship hardware meant to be used by unskilled people with software that isn't stable. It's just common sense.

Thousands of hits for my issue all around the globe: BTRFS not working, just because.

BTRFS has been in development forever without any serious breakthrough, so much so that RH deprecated it. Mark my words, SUSE will too, soon.

I had no drive failures.
I had no power outages.
The NAS was heavily underused.
I am not mad because I lost my data.
I have backups of everything that matters.
I am disappointed because using BTRFS on mdadm slashes the performance of the FS and, in any case, DOES NOT fix the BTRFS issues; it only works around them.

I was the 1 in 1000 that had issues. 0.1% does not make a statistic, but it does piss me off to be knowingly treated as the minority that does not matter.

Synology does not differ from Netgear: consumer products that aim to sell. I knew when I bought it that it wasn't going to be as stable as a professional solution. I was expecting hardware failures, disk failures, write problems due to the lack of ECC, not a bloody entry in the journal that prevents the whole FS from working. I just wanted to connect and use it, without having to manage smartmon, thermal sensors, console & mdadm, etc.

I'd be better off with EXT4+MDADM... for God's sake, even ReiserFS has better recovery tools: you grep the FS. And Reiser was famous for being "glitchy".
The "good" of BTRFS is not in discussion here; I do respect their job and the efforts they are making.
Why not LVM+MDADM? What is the REAL gain in having snapshots on a home device?
This is about a design flaw that exposes any RN* in service to possible issues.

Choosing an FS that is practically impossible to recover data from, and that additionally has had an ONGOING write hole for years, is a gamble.
A gamble that Netgear and Synology are taking at the expense of people's data; obviously neither of them is responsible if you lose your data.
The FS was accessible RO in Recovery until it wasn't anymore. Good night, moon.

On "a NAS is not a BACKUP": I know, but ONLY BECAUSE I am an IT professional. My father would think that it is a backup. You know why? Because it has BACKUP in the feature list and you buy it in a supermarket.

BTRFS is amazing (on paper), however 99.99% of people won't care about these features, and the ones who know about them will not use them on embedded hardware. Home users need stability over features. You do not need a gun to catch a fly, you need a fly catcher.
Choosing a more stable combination of software would definitely have made me less bitter.

It is pointless for me to be bitter here and now.
I got away from it, you should too, professional opinion.
Message 6 of 7
StephenB
Guru

Re: RN104 - BTRFS Read-Only - No SMART Errors


@valk1 wrote:

I got away from it, you should too, professional opinion.

Well, in my own professional opinion it is stable (and the BTRFS team does claim that also), so we will have to agree to disagree on that.

 

Personally, I have four ReadyNAS deployed running OS-6 (one since 2013), and have never lost a BTRFS volume.  I did lose one once using ext (the disks were healthy - the failure was the result of an unexpected power cut).  Though obviously your experience wasn't as good as mine.

 

I do agree that the need for independent backups should be more prominently stated.

 

 

 

Message 7 of 7