
ReadyNAS 104: Sudden BTRFS IO Failure - do I need a new NAS, or is this potentially fixable?

grhmp
Aspirant

Hi.

I have a ReadyNAS 104 that stopped working all of a sudden.

 

Backup state:
My latest backup (to an external HD) is maybe a year old. I'll lose nothing critical if I can't get it working again, but it's going to be annoying, so I have to try.

 

System:
ReadyNAS 104 running firmware 6.10.1. (Bought new in 2013)
4 x Seagate IronWolf 8TB 3.5'' NAS HDD (ST8000VN0022) in RAID 5. (Second set of disks, bought in 2017/2018)

 

Discovering the problem:
1. Got an error message when I tried to mount one of the shares from an Ubuntu box. I do this every other day, so it can't have been failing for long. This was September 19th/20th.
2. Logged into the admin interface. Device status is green/healthy. Checked the logs:

The last entries:
Sep 20, 2019 05:30:49 PM System: ReadyNASOS background service started.
Sep 20, 2019 05:01:59 PM System: The system is shutting down.
Sep 19, 2019 10:18:18 PM System: ReadyNASOS background service started.
Sep 19, 2019 10:14:56 PM System: The system is rebooting. <-- This is me rebooting (See 4)
Sep 15, 2019 01:16:52 PM Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.
Sep 15, 2019 01:16:17 PM System: ReadyNASOS background service started.
Sep 12, 2019 07:53:11 AM System: The system is shutting down.
Aug 31, 2019 11:32:32 AM Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data.
Aug 11, 2019 09:01:18 PM System: ReadyNASOS background service started.
Aug 09, 2019 08:56:15 AM System: The system is shutting down.

 

3. Pretty sure I was able to access my shares via the admin interface. Or at the very least, I could see my shares.
4. Decided to reboot to see if that fixed anything, but that turned out to be a mistake.
5. After rebooting, the NAS seems unable to recognize the disks or the RAID setup, and gives me this warning:

"No volume exists. NETGEAR recommends that you create a volume before configuring other settings. Navigate to the System -> Volumes page to create a volume."

6. Obviously, I'm no longer able to access my shares via the admin interface either.
7. Installed RAIDar on a Windows machine and extracted the following diagnostic information:

2019-09-15 13:15:51: BTRFS: error (device md127) in btrfs_commit_transaction:2241: errno=-5 IO failure (Error while writing out transaction)
2019-08-31 11:30:23: BTRFS: error (device md127) in cleanup_transaction:1864: errno=-5 IO failure

 

8. I checked the Volumes tab, and all the drives are colored red. According to the mouseover tooltip, they all show:
Temperature: 46C
ATA Error: 0
Volume State: NEW
Disk State: ONLINE

 

9. Used RAIDar to download all the logs, and as far as I can tell from looking at several of them, the SMART self-checks on the drives all report that things are fine. This leads me to suspect that the problem is with the internal storage device in the NAS (where the OS lives), not with one of the drives.

10. Looking at kernel.log, I found a whole lot of entries like this, starting Sept 18th:
Sep 18 07:35:25 nas kernel: BTRFS error (device md127): bad tree block start 6271299006300265298 20197114380288

It also lists my raid setup:
Sep 20 17:28:23 nas kernel: md/raid:md127: raid level 5 active with 4 out of 4 devices, algorithm 2
Sep 20 17:28:23 nas kernel: RAID conf printout:
Sep 20 17:28:23 nas kernel: --- level:5 rd:4 wd:4
Sep 20 17:28:23 nas kernel: disk 0, o:1, dev:sdd3
Sep 20 17:28:23 nas kernel: disk 1, o:1, dev:sdc3
Sep 20 17:28:23 nas kernel: disk 2, o:1, dev:sdb3
Sep 20 17:28:23 nas kernel: disk 3, o:1, dev:sda3

And eventually what I assume is the reason my shares aren't mounted:

Sep 20 17:28:24 nas kernel: BTRFS info (device md127): setting nodatasum
Sep 20 17:28:24 nas kernel: BTRFS info (device md127): has skinny extents
Sep 20 17:28:24 nas kernel: BTRFS error (device md127): bad tree block start 4771527319059878682 20197113266176
Sep 20 17:28:24 nas kernel: BTRFS error (device md127): bad tree block start 4771527319059878682 20197113266176
Sep 20 17:28:24 nas kernel: BTRFS warning (device md127): failed to read tree root
Sep 20 17:28:24 nas kernel: BTRFS error (device md127): bad tree block start 4771527319059878682 20197113266176
Sep 20 17:28:24 nas kernel: BTRFS error (device md127): bad tree block start 4771527319059878682 20197113266176
Sep 20 17:28:24 nas kernel: BTRFS warning (device md127): failed to read tree root
Sep 20 17:29:39 nas kernel: BTRFS error (device md127): logical 20220198518784 len 1073741824 found bg but no related chunk
Sep 20 17:29:39 nas kernel: BTRFS error (device md127): failed to read block groups: -2
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): failed to read block groups: -17
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): parent transid verify failed on 15174143508480 wanted 76685 found 76687
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): parent transid verify failed on 15174143508480 wanted 76685 found 76687
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): open_ctree failed

Looking at disk_info.log, none of the disks report any errors.
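
For anyone triaging a similar kernel.log, here is a minimal Python sketch that groups the BTRFS messages by type, measures the transid mismatch, and names the trailing negative codes. It assumes the log lines look like the excerpt above and that the codes follow the usual Linux convention of negated errno values; the function name `summarize` and the embedded sample are illustrative only, not part of any NETGEAR tooling.

```python
import errno
import re
from collections import Counter

# A few lines copied from the kernel.log excerpt above.
SAMPLE = """\
Sep 20 17:28:24 nas kernel: BTRFS error (device md127): bad tree block start 4771527319059878682 20197113266176
Sep 20 17:28:24 nas kernel: BTRFS warning (device md127): failed to read tree root
Sep 20 17:29:39 nas kernel: BTRFS error (device md127): logical 20220198518784 len 1073741824 found bg but no related chunk
Sep 20 17:29:39 nas kernel: BTRFS error (device md127): failed to read block groups: -2
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): failed to read block groups: -17
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): parent transid verify failed on 15174143508480 wanted 76685 found 76687
Sep 20 17:29:40 nas kernel: BTRFS error (device md127): open_ctree failed
"""

BTRFS_LINE = re.compile(r"BTRFS (error|warning) \(device (\w+)\): (.*)")
TRANSID = re.compile(r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)")
NEG_CODE = re.compile(r": -(\d+)$")


def summarize(log_text):
    """Return (message counts, transid gaps, symbolic error codes)."""
    kinds = Counter()
    transid_gaps = []
    codes = set()
    for line in log_text.splitlines():
        match = BTRFS_LINE.search(line)
        if not match:
            continue
        msg = match.group(3)
        # Replace numbers with N so the same error on different blocks groups together.
        kinds[re.sub(r"\d+", "N", msg)] += 1
        transid = TRANSID.search(msg)
        if transid:
            # found minus wanted: how far the on-disk generation is from the expected one.
            transid_gaps.append(int(transid.group(3)) - int(transid.group(2)))
        code = NEG_CODE.search(msg)
        if code:
            # Assumption: the trailing number is a negated errno value.
            codes.add(errno.errorcode.get(int(code.group(1)), "UNKNOWN"))
    return kinds, transid_gaps, codes


kinds, gaps, codes = summarize(SAMPLE)
for message, count in kinds.most_common():
    print(count, message)
print("transid gaps:", gaps)
print("error codes:", sorted(codes))
```

On the sample this reports a transid gap of 2 (wanted 76685, found 76687) and the codes ENOENT and EEXIST. It only summarizes the log; it does not say which component caused the corruption.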

 

Questions and support:
The NAS itself is over 6 years old, so obviously I have no warranty or support coverage. I'm happy to pay for support if this is something that can (potentially) be fixed, but I'm not too keen on paying 200 euro just for someone to tell me I need a new NAS.


Any suggestions or recommendations?

 


Things I haven't tried yet:
Reading the disks as described here: https://community.netgear.com/t5/Using-your-ReadyNAS-in-Business/RN104-quot-No-Volume-Exists-quot-er...
SSH-ing into the NAS and poking around. I'm not able to enable SSH, and haven't spent time trying to figure out why. (Unable to start or modify service. Code 15002030001)


--Regards
grhmp

Model: RN10400|ReadyNAS 100 Series 4- Bay (Diskless)
Message 1 of 5
StephenB
Guru

Re: ReadyNAS 104: Sudden BTRFS IO Failure - do I need a new NAS, or is this potentially fixable?

You have file system corruption.  It's not clear why that happened.  One possibility is a problem with one or more disks.

 

So one option is to use paid support (my.netgear.com), and see if they can mount the volume.  If data recovery is needed, the terms of service for that are here: https://kb.netgear.com/69/ReadyNAS-Data-Recovery-Diagnostics-Scope-of-Service

 

If you have a backup of the data (or are ok with losing it), then you could try testing the disks in a Windows PC with Seatools.  If they pass, then do a factory default, reconfigure the NAS, reinstall the apps and restore the data.

 

 

Message 2 of 5
Sandshark
Sensei

Re: ReadyNAS 104: Sudden BTRFS IO Failure - do I need a new NAS, or is this potentially fixable?

The OS that is running resides on the drives, not on the flash.  This is clearly not a problem with the flash, but with your actual volume.  The flash contains an absolutely clean copy of the OS, just to install on a new set of drives, and a boot loader to load into the various boot menu modes.  When a normal boot is performed, it passes control to the OS on the drives, and that's when your volume gets mounted.

 

It is a shame you didn't heed the warning the NAS was giving.  It is nearly impossible to recover from an error that causes a volume to go read-only.  Even when I had one caused by a loose cable to an EDA500, so the root cause could be corrected, the damage to the volume was irreparable.  The corrective action is to make the recommended backup, do a factory default (or a volume destroy and re-create if you have multiple volumes) and restore that data.

 

By continuing to reboot, you gave multiple drives the opportunity to go out of sync, killing the volume.  Now, the question is "What drove that?"

 

With SMART saying the drives are OK, it could be a hardware problem within the NAS.

 

If there is any reason you want to try to recover data, take note of how many volumes are displayed on the Volumes page, and their names.  You'll probably see more than one in the form data-0, data-1, etc.  Then, go to the Performance page, hover the mouse over the "dot" for each drive, and note which volume it says it's a part of.  This can give us a start, though the full logs may be necessary.

 

The SMART data from the log is likely accurate, but testing the drives with vendor tools is a good troubleshooting step.  I like looking at SMART with CrystalDiskInfo, too, because the vendor tools just give a pass/fail.

 

You can also test the chassis with a spare drive (whose contents will be deleted) by removing your normal drives and inserting just the spare.  Let the NAS initialize it, and create a single-drive volume.  Then, power down, move the drive to the next bay, and power up.  Repeat until all bays are tested.  If it fails to boot in one or more slots, but boots in some, it's definitely a hardware problem.
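
The decision rule for that bay-by-bay test can be written down as a small Python sketch. The function name `diagnose_slots` and the result strings are made up for illustration, and the "no bay booted" case is an added assumption: the test described above only covers all bays booting or a mix of booting and failing.

```python
def diagnose_slots(boots_by_slot):
    """Interpret the spare-drive test.

    boots_by_slot maps a bay number to True if the NAS booted with the
    spare drive in that bay, False otherwise.
    """
    failed = sorted(slot for slot, ok in boots_by_slot.items() if not ok)
    if not failed:
        return "all bays booted: no chassis fault found by this test"
    if len(failed) == len(boots_by_slot):
        # Assumption: with no working bay to compare against, the spare
        # drive itself cannot be ruled out.
        return "no bay booted: inconclusive, suspect the spare drive or the whole chassis"
    return "bays %s failed while others booted: likely a chassis hardware problem" % failed


print(diagnose_slots({1: True, 2: True, 3: False, 4: True}))
```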

 

If you are OK with the backup you have, you could just do a factory default and see what happens.  If one drive fails to sync, test it with vendor tools.  If it really is OK, swap the drives around and try again.  If the same slot(s) have an issue with another drive, it's a hardware problem you aren't going to solve.

Message 3 of 5
grhmp
Aspirant

Re: ReadyNAS 104: Sudden BTRFS IO Failure - do I need a new NAS, or is this potentially fixable?

First of all, thank you to both of you for taking the time to respond. I very much appreciate it!

 

It's been a very busy week, so I didn't get around to looking more at this until last night.

 

The current status is:
I connected 3 drives to my desktop, and found that ReclaimMe (which I saw recommended in another thread describing a similar problem) could read the file system. Decided that paying for recovery software was a more attractive alternative than paying for recovery service for several reasons (lifetime license, I have other disks I might want to recover, learning experience, etc) so I bought it and I'm currently copying my content (seemingly without issues) to an external USB disk.

 

So I *think* the backup side of things will be fine.

 

As for the NAS and disks; I checked the SMART data in the performance tab as Sandshark suggested, and it too reported no errors on any of the disks. So my current plan is that when I'm done with the data recovery part (Which I fear will take at least 1-2 weeks) I'm probably going to go with the suggestion to pick a drive and try to boot the NAS with the drive once in each slot, and see if I can narrow the problem down a bit.

 

I'll be back with updates and/or questions, depending on how this works out. Again, I very much appreciate the responses and input so far. 🙂

 

Message 4 of 5
grhmp
Aspirant

Re: ReadyNAS 104: Sudden BTRFS IO Failure - do I need a new NAS, or is this potentially fixable?

A quick update:

 

I've recovered, as far as I can tell, pretty much all the files, but I've lost some of the directory structure, meaning that effectively they're a jumbled mess. So at some point in the near future, I have a sorting/cleanup job I'm not looking forward to.

 

On the hardware side, I took one drive and initialized it successfully in all four slots in the NAS, as suggested. No issues. All drives still report no SMART errors. So I've put them all back in, and the NAS is now syncing (very, very slowly). I've also invested in more cold storage where I'm going to duplicate everything once I'm done sorting.

Message 5 of 5