Forum Discussion
tomatohead1
Mar 10, 2019 · Aspirant
Volume problems NAS 314
Hi - I replaced a failed disk yesterday. The system resynced, and then I updated the firmware to 6.9.5
Access to the system is now down, and system webpage tells me "Remove inactive volumes to use ...
Hopchen · Mar 10, 2019 · Prodigy
Hi tomatohead1
Well, it looks like more than one disk might have issues. Most likely the data raid is not starting because more than one disk in your raid 5 config is in trouble.
If you want, download the logs, upload them to Google Drive (or a similar service) and PM me the link. I can take a look for you.
Cheers
- tomatohead1 · Mar 10, 2019 · Aspirant
Thanks! Link sent...
t
- Hopchen · Mar 10, 2019 · Prodigy
Hi tomatohead1
Unfortunately, I am not the bearer of good news. The reason you get the "Remove inactive volumes" error is that the data volume cannot mount. In your case, it cannot mount because your data raid is not running. As can be seen in the raid config, only the OS raid and the swap raid are running.
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sda2[0] sdb2[3] sdd2[2] sdc2[1]
      1046528 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]    <<<=== Swap raid
md0 : active raid1 sda1[0] sdd1[4] sdc1[2] sdb1[5]
      4190208 blocks super 1.2 [4/4] [UUUU]    <<<=== OS raid

<<<=== Missing data raid (md127)
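If you have SSH access to the NAS and want to verify this yourself, the standard Linux md tools show the same picture. This is just a sketch; the device names are simply those from the log excerpt above:

# Only md0 (OS) and md1 (swap) are listed; the data raid md127 is missing
cat /proc/mdstat

# The raid superblock on a data partition still describes the md127 array it belongs to
mdadm --examine /dev/sda3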
You recently replaced disk 2. That should normally be fine in a raid 5 (you can tolerate one disk failure in such a raid). Disks no. 1 and no. 3 do have a few errors on them - 4 ATA errors each.

Device: sda    Channel: 0    <<<=== Bay 1
ATA Error Count: 4
Device: sdc    Channel: 2    <<<=== Bay 3
ATA Error Count: 4
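For reference, the same ATA error counts can be read straight from the disks with smartctl, assuming SSH access to the NAS (or the disks attached to a Linux PC):

# Print the SMART error log, which includes the "ATA Error Count" seen above
smartctl -l error /dev/sda    # bay 1
smartctl -l error /dev/sdc    # bay 3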
This is not a large number of errors, but that is the thing with disk errors... sometimes one error is enough. You replaced disk 2 and the raid sync started as normal.
[19/03/09 01:00:29 EST] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume data is Degraded.
[19/03/09 13:26:40 EST] notice:disk:LOGMSG_ADD_DISK Disk Model:TOSHIBA HDWD130 Serial:xxxxxxxx was added to Channel 2 of the head unit.
[19/03/09 13:26:48 EST] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
Five hours later, disk 3 dropped out and the data raid "died".
[19/03/09 18:33:05 EST] notice:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Degraded to Dead.
[19/03/09 18:34:54 EST] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 3 (Internal) changed state from ONLINE to FAILED.
Just before disk 3 failed, we see these kernel messages about it. This is definitely a dodgy disk.
Mar 09 18:31:52 kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
Mar 09 18:31:52 kernel: ata3.00: irq_stat 0x40000008
Mar 09 18:31:52 kernel: ata3.00: failed command: READ FPDMA QUEUED
Mar 09 18:31:52 kernel: ata3.00: cmd 60/40:b8:c0:e5:a5/05:00:d0:00:00/40 tag 23 ncq 688128 in res 41/40:40:98:e9:a5/00:05:d0:00:00/00 Emask 0x409 (media error) <F>
Mar 09 18:31:52 kernel: ata3.00: status: { DRDY ERR }
Mar 09 18:31:52 kernel: ata3.00: error: { UNC }
Mar 09 18:31:52 kernel: ata3.00: configured for UDMA/133
Mar 09 18:31:52 kernel: sd 2:0:0:0: [sdc] tag#23 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 09 18:31:52 kernel: sd 2:0:0:0: [sdc] tag#23 Sense Key : Medium Error [current] [descriptor]
Mar 09 18:31:52 kernel: sd 2:0:0:0: [sdc] tag#23 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 09 18:31:52 kernel: sd 2:0:0:0: [sdc] tag#23 CDB: Read(16) 88 00 00 00 00 00 d0 a5 e5 c0 00 00 05 40 00 00
Mar 09 18:31:52 kernel: blk_update_request: I/O error, dev sdc, sector 3500534168
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096920 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096928 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096936 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096944 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096952 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096960 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096968 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096976 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096984 on sdc3).
Mar 09 18:31:52 kernel: md/raid:md127: read error not correctable (sector 3491096992 on sdc3).
Mar 09 18:31:52 kernel: ata3: EH complete
Mar 09 18:31:56 kernel: do_marvell_9170_recover: ignoring PCI device (8086:3a22) at PCI#0
Mar 09 18:31:56 kernel: ata3.00: exception Emask 0x0 SAct 0x7f60003f SErr 0x0 action 0x0
Mar 09 18:31:56 kernel: ata3.00: irq_stat 0x40000008
Mar 09 18:31:56 kernel: ata3.00: failed command: READ FPDMA QUEUED
Mar 09 18:31:56 kernel: ata3.00: cmd 60/68:a8:98:e9:a5/01:00:d0:00:00/40 tag 21 ncq 184320 in res 41/40:68:98:e9:a5/00:01:d0:00:00/00 Emask 0x409 (media error) <F>
Mar 09 18:31:56 kernel: ata3.00: status: { DRDY ERR }
Mar 09 18:31:56 kernel: ata3.00: error: { UNC }
Mar 09 18:31:56 kernel: ata3.00: configured for UDMA/133
Mar 09 18:31:56 kernel: sd 2:0:0:0: [sdc] tag#21 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Mar 09 18:31:56 kernel: sd 2:0:0:0: [sdc] tag#21 Sense Key : Medium Error [current] [descriptor]
Mar 09 18:31:56 kernel: sd 2:0:0:0: [sdc] tag#21 Add. Sense: Unrecovered read error - auto reallocate failed
Mar 09 18:31:56 kernel: sd 2:0:0:0: [sdc] tag#21 CDB: Read(16) 88 00 00 00 00 00 d0 a5 e9 98 00 00 01 68 00 00
Mar 09 18:31:56 kernel: blk_update_request: I/O error, dev sdc, sector 3500534168
As a result, the raid sync stops, since a raid 5 cannot operate on only 2 devices, and the raid is declared "dead" at this point. You have suffered the classic case where another disk fails during the re-sync after a disk replacement (a double disk failure). I would not blame the ReadyNAS for this. A raid sync is a strenuous task for the disks, and a disk that showed only a handful of errors before can "blow up" during it. It is highly advisable to always keep an up-to-date backup, especially before anything that triggers a raid sync (i.e. replacing a disk).
I would estimate that recovery possibilities here are decent. Even though disk 3 is not stable it can still be read by the NAS. The new disk 2 is likely of no use to us as the raid sync wouldn't have fully finished before disk 3 dropped out.
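If you are curious how far the rebuild onto the new disk 2 got before the array died, its raid superblock records that. This is only a hypothetical check, assuming disk 2 is still /dev/sdb and the data partition is partition 3:

# Events, Update Time and (for a rebuilding member) Recovery Offset show how far the resync got
mdadm --examine /dev/sdb3 | egrep -i 'Recovery Offset|Events|Update Time|Device Role'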
I reckon that, in order to look at recovery, you would need:
- Disk 1
- A clone of disk 3 onto a new, healthy disk. The reason for cloning disk 3 is that it has proven unstable at this point.
- Disk 4

With those 3 disks one could force-assemble the data raid and hope for the best. You might even need to deal with some minor filesystem issues afterwards. Definitely not for the faint of heart; a rough sketch of the general approach is below.
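To give an idea of what that would involve, here is an outline using standard Linux tools (ddrescue and mdadm). The device names, the partition number 3 and the md127 array name are assumptions based on the logs above, so treat this as a sketch rather than a recipe, and only attempt it with clones of every disk:

# 1) Clone the unstable disk 3 onto a new, healthy disk of equal or larger size
#    (on a Linux PC; ddrescue keeps a map file and retries around bad sectors)
ddrescue -f -r3 /dev/sdc /dev/sde /root/sdc_rescue.map

# 2) With disk 1, the cloned disk 3 and disk 4 installed, try a forced assembly
#    of the data raid from the three remaining data partitions
mdadm --assemble --force /dev/md127 /dev/sda3 /dev/sdc3 /dev/sdd3

# 3) If the array starts, check its state and attempt a read-only mount of the
#    BTRFS data volume before writing anything
cat /proc/mdstat
mount -o ro /dev/md127 /mnt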
If you have an up-to-date backup, or if the data is not important, then you can factory reset - but ensure that you use 100% healthy disks. I would be very hesitant to reuse disk 3. It might be good to test all disks with the manufacturer's disk-test tool.
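If the manufacturer's tool is not handy, an extended SMART self-test gives a comparable surface scan (again assuming the disk is visible as /dev/sdX on a Linux machine):

# Start an extended (long) self-test, then check the result once it completes
smartctl -t long /dev/sdX
smartctl -l selftest /dev/sdX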
If you do need the data, on the other hand, then I would advise making use of NETGEAR's data recovery service. This will of course carry a fee - I believe it is a couple of hundred bucks. They should be able to help with disk cloning and raid assembly.

Cheers
- tomatohead1 · Mar 10, 2019 · Aspirant
Thanks, Hopchen! You do a great service to the community by making your time and knowledge available to us.
I'll contact Netgear data recovery. Is this something they can do remotely?
I do not have backups for much of the data - I had set up a Dropbox account for remote backup, but at some point, either with a firmware update or because I set it up incorrectly, the backup service stopped working.
Can you recommend a setup that is more reliable? Perhaps nothing works better than keeping this NAS and making sure a backup is always in place. But if multiple disks can crash without warning, I would think there might be a more reliable solution.
In any event, thanks again for your help. You're great!
Tom