Forum Discussion
alaeth
Jul 13, 2017, Aspirant
Missing volume after hard reboot
After losing access (no web, ssh, mounts, anything) to my ReadyNAS Pro 6 running 6.7.4 firmware, I decided to hard-reboot it by holding down the power button until it powered off. After booting...
- Jul 23, 2017
I'm going to consider the data lost, and rebuild/factory reset back down to 4.2 for stability.
I think this is the answer without getting into costly data recovery services, or massive time investment.
StephenB
Jul 13, 2017, Guru - Experienced User
It looks like there's some file system corruption, per kernel.log:
Jul 12 20:27:09 readynas01 kernel: BTRFS: device label 33ea999f:data devid 1 transid 1431232 /dev/md127
Jul 12 20:27:09 readynas01 kernel: BTRFS info (device md127): has skinny extents
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): open_ctree failed
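If you want to gauge the damage before changing anything, a read-only check won't write to the disk. A minimal sketch, assuming the array is still assembled as /dev/md127 as in the log above:
# Read-only btrfs check; makes no changes, should report the corrupt leaf.
btrfs check --readonly /dev/md127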
jak0lantash might have some suggestions on next steps.
- alaeth, Jul 13, 2017, Aspirant
That seems... not good. :(
I bought a QNAP and am trying to get data restored manually for now. Luckily I have online backups of the critical stuff (10+ years of photography shoots and proofs).
Regardless of the outcome, I think I'll roll back the firmware to the "officially" supported version 4.x. Ever since upgrading to the unsupported 6.x, I've noticed it locks up fairly often - I even set up a cron job to reboot it nightly to try to reduce the impact.
I know any OS dislikes a hard reset... and it's been restarted this way a few times in the past couple of months.
My wife thinks it's because she gave it the middle finger before powering it off...
- jak0lantash, Jul 13, 2017, Mentor
I'm taking a look at the logs now. But you should remove them from Google Drive - your serial number is in there.
- jak0lantash, Jul 13, 2017, Mentor
You have 5 good drives (WD30EFRX); the 6th is a Seagate Desktop drive (ST3000DM001), which is a shame.
That drive model has shown a higher-than-normal failure rate:
https://www.backblaze.com/blog/3tb-hard-drive-failure/
https://www.extremetech.com/extreme/222267-seagate-faces-lawsuit-over-3tb-hard-drive-failure-rates
If it were my NAS, I would replace it immediately. It isn't showing any errors, but it has 25,000 hours on it and a terrible reputation.
Based on the logs:
You have daily snapshots on 8 shares (applications, backup, Documents, istat, Music, Pictures, Transmission, Videos).
No balance or defrag has been run since the creation of the volume on 2017/01/04.
I cannot see the metadata allocation because the data volume isn't mounted.
The logs don't give any details about how the shares are configured with regard to Bit Rot Protection (which implies Copy-on-Write) and Compression - can you tell us?
My guess is that the metadata allocation is high and the data fragmented, which may lead to crashes, but the issue could also be completely unrelated; it's not something I can explain from these logs. Unfortunately, the dmesg buffer is flushed on reboot. It's possible to set up something like netconsole to push the kernel logs to another machine (much like remote syslog), so you'd have the messages from just before the crash; see the sketch below.
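A minimal netconsole sketch, assuming the NAS is 192.168.1.5 on eth0 and the receiving machine is 192.168.1.10 with MAC 00:11:22:33:44:55 (all addresses here are placeholders, not values from your logs):
# On the NAS: stream kernel messages over UDP to another host.
modprobe netconsole netconsole=6665@192.168.1.5/eth0,6666@192.168.1.10/00:11:22:33:44:55
# On the receiver (flag syntax varies by netcat flavor, e.g. nc -u -l -p 6666 on GNU netcat):
nc -u -l 6666 | tee nas-kernel.log
Since netconsole only relays what the kernel manages to emit, a hard lockup may still cut off mid-line, but it usually captures more than a flushed dmesg.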
Not sure why, but you have a cron job to capture the last status of processes and load every 5 minutes. Maybe in an attempt to debug the lock-ups?
The LAN interface seems to be connected to a Fast Ethernet (100 Mbps) network, which is odd.
It would be interesting to read the loadavg.log file, but I don't know how. (@mdgm maybe?)
The NAS was power-cycled twice recently:
- It seems that the NAS was power-cycled on Jun 18 15:01 (the logs start there, so I can't tell when it hung).
- It seems that the NAS crashed shortly after Jun 18 18:25 and was power-cycled on Jun 19 20:28. There didn't seem to be much going on before the crash, in terms of processes anyway.
Power-cycles clearly don't help, but I understand you may not have had a choice.
The NAS then remained shut down for nearly a month. When it booted last, it failed to mount the data volume:
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS critical (device md127): corrupt leaf, slot offset bad: block=2291416563712, root=1, slot=77
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): failed to read block groups: -5
Jul 12 20:27:10 readynas01 kernel: BTRFS error (device md127): open_ctree failed
Jul 12 20:27:10 readynas01 mount[1465]: mount: wrong fs type, bad option, bad superblock on /dev/md127,
Jul 12 20:27:10 readynas01 mount[1465]: missing codepage or helper program, or other error
Jul 12 20:27:10 readynas01 mount[1465]: In some cases useful info is found in syslog - try
Jul 12 20:27:10 readynas01 mount[1465]: dmesg | tail or so.
Jul 12 20:27:10 readynas01 systemd[1]: data.mount: Mount process exited, code=exited status=32
Jul 12 20:27:10 readynas01 systemd[1]: Failed to mount /data.
That's usually the point where you see many reboot attempts in the log as panic grows, but it doesn't look like you tried to reboot the NAS at all.
I would change the fstab from:
LABEL=33ea999f:data /data btrfs defaults 0 0
to:
LABEL=33ea999f:data /data btrfs defaults,ro,recovery 0 0
and try to reboot the NAS gracefully, either from the GUI or with rn_shutdown -r.
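If you'd rather not touch fstab, the same options can be tried as a one-off manual mount (a sketch; note that on kernels 4.6 and later the recovery option was renamed usebackuproot):
# Attempt a read-only mount using btrfs backup tree roots.
mount -o ro,recovery /dev/md127 /data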
After reboot, if the data volume mounts OK, update your backups immediately.
Please then give the output of this command:
btrfs fi us /data
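For reference, btrfs fi us is shorthand for btrfs filesystem usage; it shows how allocated space is split between data and metadata, which is the allocation info missing from the logs. The interesting lines look roughly like this (sizes are placeholders, not predictions):
Data,single: Size:10.50TiB, Used:9.80TiB
Metadata,DUP: Size:60.00GiB, Used:52.31GiB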
After your backups are complete (AND inspected!!!), you'll have to recreate the volume and reimport the data.
- alaeth, Jul 14, 2017, Aspirant
Thanks for the write-up. I'll post details from the NAS once I'm home and have tried your suggestions.
Good point on the logs, I've disabled sharing.
Agreed on the Seagate... it was my first 3TB purchase.
I think copy-on-write and compression are enabled...? Not 100% sure if those are the defaults on 6.x.
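If the volume mounts again, one way to check per-share settings from the shell (a sketch; the share name is just an example from the snapshot list above):
# A 'C' flag in the lsattr output means copy-on-write is disabled for that directory.
lsattr -d /data/Pictures
# Prints the compression property, if one is set.
btrfs property get /data/Pictures compression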
You are correct, the 5-minute cron job was an attempt to narrow down the cause of the crashes. Good news is I have Splunk Universal Forwarder installed and configured, so everything from /var/log/ _should_ be captured on my Windows desktop Splunk server. (Shameless plug: Splunk is 100% free if your data volume is under 500 MB/day - super awesome for troubleshooting faults like this, as you can correlate timestamped events across multiple files.)
100 Mbit LAN is correct - I moved the NAS from the basement to my office upstairs, and the switch there is only 100 Mbps.
Once I realized the data volume was gone (about a month ago), I did some simple troubleshooting, then decided to power it down (using the web interface) until I could spend more time with it. After discussing it with my spouse, we decided that its age and ongoing instability warranted a full NAS replacement. It remained off until the new one (a QNAP 671) arrived and I could mount drives in it.
I'll post again tonight once I've tried the fstab settings, along with the output from the btrfs command. Any other logs you'd like to see? I'll check whether they're indexed in Splunk.