RT6507
Jan 27, 2023 · Tutor
ReadyNAS 316 (6x4TB Enterprise) - No volumes after graceful shutdown and restart
I shut down my RN316 while leaving for a few days. Upon restart, I was able to connect and access it normally. However, after about 15 minutes, when I tried to save a file to the NAS, I was not able to acce...
RT6507
Jan 29, 2023 · Tutor
Your help is much appreciated! I can move the drives (one at a time) to a Windows chassis I have standing by, but it only has three available SATA connections. Can I pull and examine the drives (WD Red Pro 4.0 TB) from the RN316 individually without overwriting any needed RAID tables?
Also, I don't see any explicit BTRFS drive-failure errors for Disk 1 (sdb3). This is from Systemd-Journal.log. It implies Disk 1 (sdb3) is offline.
Jan 26 08:46:46 NAS_IV kernel: BTRFS: device label 7c6e3b76:root devid 1 transid 4240694 /dev/md0
Jan 26 08:46:46 NAS_IV systemd[1]: systemd 44 running in system mode. (+PAM +LIBWRAP +AUDIT +SELINUX +IMA +SYSVINIT +LIBCRYPTSETUP; debian)
Jan 26 08:46:46 NAS_IV systemd[1]: Set hostname to <NAS_IV>.
Jan 26 08:46:46 NAS_IV udevd[1342]: starting version 175
Jan 26 08:46:46 NAS_IV systemd-journal[1333]: Journal started
Jan 26 08:46:46 NAS_IV kernel: md: md127 stopped.
Jan 26 08:46:46 NAS_IV kernel: md: bind<sdb3>
Jan 26 08:46:46 NAS_IV kernel: md: bind<sdc3>
Jan 26 08:46:46 NAS_IV kernel: md: bind<sdd3>
Jan 26 08:46:46 NAS_IV kernel: md: bind<sde3>
Jan 26 08:46:46 NAS_IV kernel: md: bind<sdf3>
Jan 26 08:46:46 NAS_IV kernel: md: bind<sda3>
Jan 26 08:46:46 NAS_IV kernel: md: kicking non-fresh sdb3 from array!
Jan 26 08:46:46 NAS_IV kernel: md: unbind<sdb3>
Jan 26 08:46:46 NAS_IV kernel: md: export_rdev(sdb3)
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sda3 operational as raid disk 0
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdf3 operational as raid disk 5
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sde3 operational as raid disk 4
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdd3 operational as raid disk 3
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdc3 operational as raid disk 2
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: allocated 0kB
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: raid level 5 active with 5 out of 6 devices, algorithm 2
Jan 26 08:46:46 NAS_IV kernel: RAID conf printout:
Jan 26 08:46:46 NAS_IV kernel: --- level:5 rd:6 wd:5
Jan 26 08:46:46 NAS_IV kernel: disk 0, o:1, dev:sda3
Jan 26 08:46:46 NAS_IV kernel: disk 2, o:1, dev:sdc3
Jan 26 08:46:46 NAS_IV kernel: disk 3, o:1, dev:sdd3
Jan 26 08:46:46 NAS_IV kernel: disk 4, o:1, dev:sde3
Jan 26 08:46:46 NAS_IV kernel: disk 5, o:1, dev:sdf3
Jan 26 08:46:46 NAS_IV kernel: created bitmap (30 pages) for device md127
Jan 26 08:46:46 NAS_IV kernel: md127: bitmap initialized from disk: read 2 pages, set 568 of 59543 bits
Jan 26 08:46:46 NAS_IV kernel: md127: detected capacity change from 0 to 19979093934080
Jan 26 08:46:46 NAS_IV start_raids[1325]: mdadm: /dev/md/data-0 has been started with 5 drives (out of 6).
Jan 26 08:46:46 NAS_IV kernel: Adding 2094844k swap on /dev/md1. Priority:-1 extents:1 across:2094844k
Jan 26 08:46:47 NAS_IV kernel: BTRFS: device label 7c6e3b76:data devid 1 transid 285232 /dev/md127
StephenB
Jan 29, 2023 · Guru - Experienced User
RT6507 wrote:
This is from Systemd-Journal.log. It implies Disk 1 (sdb3) is offline.
Jan 26 08:46:46 NAS_IV kernel: md: kicking non-fresh sdb3 from array!
Jan 26 08:46:46 NAS_IV kernel: md: unbind<sdb3>
Jan 26 08:46:46 NAS_IV kernel: md: export_rdev(sdb3)
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sda3 operational as raid disk 0
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdf3 operational as raid disk 5
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sde3 operational as raid disk 4
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdd3 operational as raid disk 3
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: device sdc3 operational as raid disk 2
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: allocated 0kB
Jan 26 08:46:46 NAS_IV kernel: md/raid:md127: raid level 5 active with 5 out of 6 devices, algorithm 2
Actually it doesn't. It says that sdb3 is out of sync with the rest of the array, so it is being removed.
Also, at this point in time the array would have been degraded, but would still have been mounted. So something must have happened after that with disk 6.
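For readers following along: a degraded-but-still-mounted state can usually be confirmed over ssh with something like the following (an illustrative sketch; the device and array names are taken from the log above):
cat /proc/mdstat            # md127 active raid5 with one member missing, e.g. [6/5]
mdadm --detail /dev/md127   # the State line should read "clean, degraded"
btrfs filesystem show       # the data filesystem should still list /dev/md127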
If you send me the log zip, I could take a look. Do that via a private message (PM) using the envelope icon in the upper right hand of the forum page. You'll need to put the zip into cloud storage (dropbox, etc), and include a sharable link in the PM.
RT6507 wrote:
Your help is much appreciated! I can move the drives (one at a time) to a Windows chassis I have standing by, but it only has three available SATA connections. Can I pull and examine the drives (WD Red Pro 4.0 TB) from the RN316 individually without overwriting any needed RAID tables?
You'd need to power down the NAS, and then test the two disks. Leave the NAS powered down until you return them to their proper slots in the NAS.
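One note on testing the pulled drives safely: read-only diagnostics (SMART queries and self-tests) don't write to the disk, so the RAID superblocks stay intact as long as you don't initialize, partition, or format the drives in Windows. If the test machine can boot Linux instead, a sketch using smartmontools (assuming it is installed; replace sdX with the actual device letter):
smartctl -x /dev/sdX            # read-only dump of SMART health, attributes, and error log
smartctl -t long /dev/sdX       # start an extended self-test (a full surface read scan)
smartctl -l selftest /dev/sdX   # check the self-test result once it completes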
- RT6507 · Jan 30, 2023 · Tutor
I sent the link for the log files ZIP. You should also see the WD Dashboard screen shots for Disk 1 (sdb) and Disk 5 (sdf). Let me know if they aren't visible.
- StephenB · Jan 30, 2023 · Guru - Experienced User
RT6507 wrote:
I sent the link for the log files ZIP. You should also see the WD Dashboard screen shots for Disk 1 (sdb) and Disk 5 (sdf). Let me know if they aren't visible.
I am able to access the files.
You are running rather old firmware (6.5.1); it would be good to update that after the issue is resolved.
Piecing together the history from a couple of logs shows this:
Dec 22 09:55:20 NAS_IV readynasd[2914]: Volume data health changed from Redundant to Degraded.
Jan 23 12:18:10 NAS_IV readynasd[2914]: The system is shutting down.
Jan 26 08:46:46 NAS_IV kernel: md: kicking non-fresh sdb3 from array!
Jan 26 08:46:46 NAS_IV start_raids[1325]: mdadm: /dev/md/data-0 has been started with 5 drives (out of 6).
Jan 26 08:48:05 NAS_IV mdadm[2296]: DegradedArray event detected on md device /dev/md127
Jan 26 08:49:04 NAS_IV mdadm[2296]: RebuildStarted event detected on md device /dev/md127, component device recovery
Jan 26 08:55:22 NAS_IV mdadm[2296]: Rebuild95 event detected on md device /dev/md127, component device recovery
Jan 26 08:55:23 NAS_IV kernel: md: md127: recovery interrupted.
Jan 26 08:55:23 NAS_IV mdadm[2296]: Fail event detected on md device /dev/md127, component device /dev/sdf3
Jan 26 08:55:24 NAS_IV mdadm[2296]: RebuildFinished event detected on md device /dev/md127, component device recovery
Jan 26 08:55:34 NAS_IV readynasd[3000]: Disk in channel 6 (Internal) changed state from ONLINE to FAILED.
Jan 26 09:51:38 NAS_IV readynasd[3000]: The system is shutting down.
Jan 26 09:54:23 NAS_IV kernel: md: kicking non-fresh sdf3 from array!
Jan 26 09:54:23 NAS_IV kernel: md: md127 stopped.
Jan 26 09:54:23 NAS_IV start_raids[1323]: mdadm: NOT forcing event count in /dev/sdf3(5) from 16702 up to 16711
Jan 26 09:54:23 NAS_IV start_raids[1323]: mdadm: You can use --really-force to do that (DANGEROUS)
Jan 26 09:54:23 NAS_IV start_raids[1323]: mdadm: failed to RUN_ARRAY /dev/md/data-0: Input/output error
Jan 26 09:54:23 NAS_IV start_raids[1323]: mdadm: Not enough devices to start the array.
Jan 26 10:13:18 NAS_IV readynasd[2744]: The system is rebooting.
As you can see, the array became degraded in December. That message is repeated every day at 1 am until 23 January, when the system was shut down. I can't tell which disk caused the degradation event (the detailed logs don't go back that far), but my guess would be disk 2.
When you booted on 26 January, it looks like disk 2 came on-line. The system tried to resync disk 2, but then ran into an error in disk 6 - which caused the resync to fail.
The system was rebooted on 26 January, about 9:54. At that point, both disk 2 and disk 6 were out of sync, so the array failed.
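As an aside, for anyone reconstructing a similar timeline from their own log zip: the relevant events can be pulled out with an ordinary grep. A sketch, assuming the journal export is named Systemd-Journal.log as in this thread:
grep -E 'kicking non-fresh|Degraded|Rebuild|Fail event|shutting down|rebooting' Systemd-Journal.log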
The disk_info.log shows 1 current pending sector for sdb, and 2 for sdf.
Device: sdb
Controller: 0  Channel: 1
Model: WDC WD4001FFSX-68JNUN0  Firmware: 81.00A81
Class: SATA  RPM: 7200  Sectors: 7814037168
Pool: data-0  PoolType: RAID 5  PoolState: 5  PoolHostId: 7c6e3b76
Health Data:
  ATA Error Count: 0
  Reallocated Sectors: 0
  Reallocation Events: 0
  Spin Retry Count: 0
  Current Pending Sector Count: 1
  Uncorrectable Sector Count: 0
  Temperature: 45
  Start/Stop Count: 92
  Power-On Hours: 46365
  Power Cycle Count: 92
  Load Cycle Count: 39
Device: sdf
Controller: 0  Channel: 5
Model: WDC WD4001FFSX-68JNUN0  Firmware: 81.00A81
Class: SATA  RPM: 7200  Sectors: 7814037168
Pool: data-0  PoolType: RAID 5  PoolState: 5  PoolHostId: 7c6e3b76
Health Data:
  ATA Error Count: 0
  Reallocated Sectors: 0
  Reallocation Events: 0
  Spin Retry Count: 0
  Current Pending Sector Count: 2
  Uncorrectable Sector Count: 0
  Temperature: 42
  Start/Stop Count: 79
  Power-On Hours: 45281
  Power Cycle Count: 79
  Load Cycle Count: 37
As far as the error codes in Dashboard go, WDC says:
The previous self-test completed having the read element of the test failed. Retest after checking the connections. Replace the drive if the error repeats.
I think it does make sense to double-check the connections, and try the test again.
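Some background on that attribute: a current pending sector is one the drive failed to read and has flagged for re-evaluation; it is cleared (or remapped) the next time it is written. If you want to watch these counters while retesting, a sketch assuming smartmontools:
smartctl -A /dev/sdb | grep -E 'Current_Pending|Reallocated'   # pending and remapped sector counts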
As far as recovery goes, despite the "DANGEROUS" comment on the --really-force line above, that option is probably your best path to in-place recovery. Disk 2 appears to have been off-line for a very long time, so you'd want to use that option with disk 2 removed. Assuming the volume assembles and mounts, you'd then want to make a full backup before doing anything else.
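Before forcing anything, it is worth confirming which members agree with each other: the per-disk event counters show who missed writes. A one-line sketch (run over ssh or from tech support mode; device letters as on this NAS):
mdadm --examine /dev/sd[a-f]3 | grep -E '^/dev|Events'
Members whose Events values match are mutually consistent; a lower count means that disk dropped out earlier and missed later writes.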
After getting the array healthy, I'd recommend upgrading the firmware, and also scheduling the system maintenance tasks. Fixing the problem with email alerts would also be a good idea - if they were working, you'd have received the alert on 22 December when the volume first became degraded.
Let me know if you want to go down this path - we can help with the needed commands. Other options would be to use RAID recovery software in the Windows PC (ReclaiMe or other software that supports BTRFS), or to contract with a data recovery service. Those are more expensive, but would probably have somewhat less risk to your data.
- RT6507 · Jan 30, 2023 · Tutor
OK, I'd like to try to fix this based on your help with the commands. Will I still be able to resort to ReclaiMe or a data recovery service if I'm not able to remount the volume?
- StephenB · Jan 30, 2023 · Guru - Experienced User
RT6507 wrote:
Will I still be able to resort to ReclaiMe or a data recovery service if I'm not able to remount the volume?
The more you do on your own, the more difficult recovery can become. But you should still be able to do it.
What you'd need to do is power up the NAS with disk 2 removed. Enable ssh on the NAS, and log in with ssh from a PC. The username is "root", the password is the NAS admin password. (From Windows 10 or 11, you'd enter ssh root@nas-ip-address in the Windows search bar - using the real NAS IP address, of course.)
From there, you would enter
mdadm --assemble --really-force /dev/sdf3
If that works, you can then manually mount the array with
btrfs device scan
mount /dev/md127 /data
If you run into problems with these commands, just post back.
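If both steps succeed, a quick sanity check before trusting the volume (illustrative):
cat /proc/mdstat   # md127 should show as active raid5, degraded with 5 of 6 members
df -h /data        # confirms the data volume is mounted and shows its usage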
- RT6507 · Jan 30, 2023 · Tutor
While trying to enable SSH I get "Service Operation Failed - cannot start service without volume"
- StephenB · Jan 30, 2023 · Guru - Experienced User
RT6507 wrote:
While trying to enable SSH I get "Service Operation Failed - cannot start service without volume"
Annoying.
You can boot up the system in tech support mode, and access it with telnet. Instructions are on pages 81-82 here:
The username is root, the password is infr8ntdebug.
Once in, you start the RAID and chroot with this command:
rnutil chroot
Then you can use the instructions above.
Once the array mounts, you should be able to reboot and still see the volume.
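Putting the whole tech-support-mode session together (a sketch that just combines the commands from the posts above, nothing new):
rnutil chroot                               # start the RAID arrays and chroot into the installed OS
mdadm --assemble --really-force /dev/sdf3   # the forced assemble suggested earlier
btrfs device scan
mount /dev/md127 /data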
- RT6507 · Jan 31, 2023 · Tutor
I will attempt this later today when I have a clear slate.
- RT6507 · Jan 31, 2023 · Tutor
OK, I'm telnetted in but mdadm error says: "/dev/sdf3 not identified in config file."
I've uploaded a screen shot of the Telnet session to cloud.
- StephenB · Feb 01, 2023 · Guru - Experienced User
RT6507 wrote:
OK, I'm telnetted in but mdadm error says: "/dev/sdf3 not identified in config file."
I've uploaded a screen shot of the Telnet session to cloud.
Try adding --scan
mdadm --assemble --scan --really-force /dev/sdf3
If that doesn't help, try
mdadm --examine /dev/sdf*
- RT6507 · Feb 01, 2023 · Tutor
The first command (--assemble) did not succeed.
The second command (--examine) returned a list of partitions but only one that referenced /dev/sdf1. I tried "mdadm --assemble --really-force /dev/sdf1" but got the mdadm error: "device /dev/sdf1 exists but is not an md array."
- RT6507 · Feb 01, 2023 · Tutor
Screenshot on cloud.
- StephenB · Feb 01, 2023 · Guru - Experienced User
RT6507 wrote:
The first command (--assemble) did not succeed.
The second command (--examine) returned a list of partitions but only one that referenced /dev/sdf1. I tried "mdadm --assemble --really-force /dev/sdf1" but got the mdadm error: "device /dev/sdf1 exists but is not an md array."
Not a good sign. sdf3 was identified as part of the array in the logs, but that was 26 January. sdf1 would normally be part of the OS partition - so also part of an md array (just not the data volume).
You could try powering down, and putting sdb back into the NAS. Then go back into tech support mode, and try
mdadm --examine /dev/sdb3
Note the event counter (if it gives you one). Then also run
mdadm --examine /dev/sda3
and see how the event counter compares.
- RT6507 · Feb 01, 2023 · Tutor
Replaced Disk 1 (sdb3) and restarted in Tech Support mode. I see six blue LEDs alongside the drive caddies. Ran mdadm --examine on sdb3 and sda3 but did not see any event counters. I also ran mdadm --examine on sdf3 and saw the same results.
If I didn't know better I'd think the drives were fine.
Screen shots on cloud.
- StephenB · Feb 01, 2023 · Guru - Experienced User
RT6507 wrote:
Ran mdadm --examine on sdb3 and sda3 but did not see any event counters.
/dev/sda3: Events: 16711
/dev/sdb3: Events: 16711
/dev/sdf3: Events: 16702
This is a bit odd, since last time mdadm told you sdf3 wasn't in an array.
A bit more odd is that disk 2 appears to be in sync (its event counter matches sda3's). Note that sdf3 is 9 events behind, which matches the "NOT forcing event count in /dev/sdf3(5) from 16702 up to 16711" message from the 26 January boot log above.
I think the next step is to power down, remove sdf, and power up normally, and see what happens. If the volume does mount, then back up the data before doing anything else.
- RT6507 · Feb 01, 2023 · Tutor
No luck. Can't access old Windows Shares. WebUI shows all 5 drives healthy but "no volumes exist". I uploaded a screen shot of WebUI status.
- AnishaA · Feb 02, 2023 · NETGEAR Employee Retired
Hello RT6507
Please check the volume logs and see whether md127 is mounted.
If md127 is not mounted, please run the commands below to check and try to assemble the volume:
df -h
cat/ proc/partitions
mdadm -A --scan
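For readers following along, roughly what these commands check (annotated sketch; note that the second command as posted contains a stray slash - the intended form is cat /proc/partitions, which explains the "-sh: cat/: not found" error reported below):
df -h                  # is any volume currently mounted?
cat /proc/partitions   # do all member partitions (sda3..sdf3) show up?
mdadm -A --scan        # try to assemble every array found in the superblocks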
We offer a paid data recovery service.
Note: The data recovery service investigates whether the data can be recovered. We do not promise to recover the data; the fee covers the check and investigation.
Have a lovely day,
Anisha A
Netgear Team
- RT6507 · Feb 02, 2023 · Tutor
Uploaded System-log zip to cloud storage.
I don't see a mention of md127 in the Volume.log. Should I issue the mdadm and mount commands from Tech Support mode, since the Telnet session seems to be refused when NAS_IV is in a standard bootup?
- RT6507 · Feb 02, 2023 · Tutor
OK, I ran df -h and the other commands from the Telnet session in Tech Support mode.
The 2nd command returned the error: "-sh: cat/: not found"
mdadm: "/dev/md/data-0 assembled from 4 drives and 1 spare - not enough to start the array."
Screenshot uploaded.
What are the terms for data recovery from you?
- RT6507 · Feb 02, 2023 · Tutor
Also, I tried these commands with all six drives installed and got slightly different messages. See the uploaded screen shot.
- RT6507 · Feb 03, 2023 · Tutor
Hi StephenB.
I wonder if you saw my latest logs and screen shots uploaded to Box? Do you see any hope of continuing to restore my RN316 to normal operations, or should I shift to a data recovery focus? Either way, thank you for your insight and patience in trying to help me resolve this problem. Your instructions have been clear and concise. I'm grateful for your help. If it's time to admit defeat, I'll know we tried our best.
- RT6507 · Feb 04, 2023 · Tutor
Anisha A
Can you provide more information on Netgear data recovery service, such as costs and what you would need to access my NAS? Thanks.
- AnishaA · Feb 05, 2023 · NETGEAR Employee Retired
Hello RT6507,
The basic fee for the data recovery service would be around 200 USD, and we would need SDM to do the data recovery process.
Note: The data recovery service investigates whether the data can be recovered. We do not promise to recover the data; the fee covers the check and investigation.
Have a lovely day,
Anisha A
Netgear Team
- RT6507 · Feb 06, 2023 · Tutor
OK. What is "SDM"?