chyzm
Jan 23, 2019 · Aspirant
Volume degraded after firmware update to 6.9.5 on RN212
The NAS was upgraded to version 6.9.5 this evening, and after the reboot the drive in slot 1 is now showing as degraded. I rebooted and power-cycled the unit with no fix. Ran disk tests too, with no luck...
ccbnz
Apr 13, 2019 · Aspirant
Did you get this sorted?
I seem to have the same problem on my ReadyNAS Pro (upgraded to OS6).
I updated the firmware to 6.9.5 and later disk 1 showed as Failed. I rebooted the NAS and the disk now shows OK but the resync didn’t complete and the NAS display is flashing “Data Degraded”. Not sure what caused this.
Hopchen
Apr 15, 2019 · Prodigy
Hi ccbnz
I took a look at your logs. Analysis below.
=== Overview ===
Your disk configuration is: 3TB x 3TB x 2TB x 3TB x 3TB x 3TB.
This means the NAS creates two RAID layers because of the mixed disk sizes, as follows:
md127 = 6 x 2TB (raid5)
md126 = 5 x 1TB (raid5)
In other words, every disk contributes a 2TB partition to the first layer, and the five 3TB disks contribute their remaining ~1TB to the second layer. We can see from the raid config that this is indeed the case. However, notice how sda is only a spare in the md126 raid. I suspect this happens because the NAS cannot sync the disk back in properly. More on that further down.
md126 : active raid5 sdb4[1] sda4[5](S) sdf4[4] sde4[3] sdd4[2]
3906483712 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [_UUUU] <<<=== Raid degraded.
md127 : active raid5 sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
9743324160 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
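If you want to dig into the degraded layer yourself, the standard mdadm tooling should show the same picture. A minimal sketch, assuming you have SSH access to the NAS enabled and the device names above:
  cat /proc/mdstat                 # quick summary of both RAID layers
  mdadm --detail /dev/md126        # degraded layer; sda4 should be listed as a spare
  mdadm --detail /dev/md127        # healthy 6-disk layer, for comparison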
Your two raids are then "stuck" together by the filesystem in order to create one volume.
Label: '<masked>:data'  uuid: <masked>
        Total devices 2  FS bytes used 970.18GiB
        devid    1 size 9.07TiB used 972.02GiB path /dev/md127
        devid    2 size 3.64TiB used 0.00B path /dev/md126
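This listing is the BTRFS view of the volume. If you want to reproduce it yourself over SSH, something along these lines should work (the /data mount point is an assumption based on the usual ReadyNAS OS6 layout):
  btrfs filesystem show            # lists the md devices backing the volume
  btrfs filesystem df /data        # space usage per allocation type on the data volume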
=== Issue ===
Disk 1 fell out of the md126 raid and was never able to re-join the raid array. The below sequence happens every time you reboot.
[19/04/12 18:01:48] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 1 (Internal) changed state from ONLINE to FAILED.
[19/04/12 19:45:48] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
[19/04/12 19:45:52] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 1 (Internal) changed state from ONLINE to RESYNC.
[19/04/12 23:29:08] notice:volume:LOGMSG_RESILVERCOMPLETE_DEGRADED_VOLUME The resync operation finished on volume data. However, the volume is still degraded.
[19/04/12 23:45:10] notice:system:LOGMSG_SYSTEM_REBOOT The system is rebooting.
[19/04/12 23:46:44] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume data is Degraded.
[19/04/12 23:46:45] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
[19/04/13 00:24:08] notice:volume:LOGMSG_RESILVERCOMPLETE_DEGRADED_VOLUME The resync operation finished on volume data. However, the volume is still degraded.
A raid sync stopping like this is a safety mechanism: one or more drives are not responding properly during the sync, and the NAS stops to avoid a potential double disk failure. Your disks appear OK-ish, with two disks showing a small number of ATA errors.
Device: sdc  Channel: 2  <<<=== Disk 3
  ATA Error Count: 9
Device: sdd  Channel: 3  <<<=== Disk 4
  ATA Error Count: 1
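If you want to check those counters directly, smartctl reads the same SMART data that these log extracts are built from. A sketch, assuming SSH access to the NAS; adjust /dev/sdX to the disk in question:
  smartctl -a /dev/sdc | grep -i "ATA Error Count"   # quick look at the counter for disk 3
  smartctl -l error /dev/sdc                         # full ATA error log for that drive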
However, the kernel logs are completely flooded with disk errors from 4 of the disks: 2, 4, 5 and 6. Below is a mere extract.
So, this is why the raid sync fails. It is likely also why disk 1 eventually ends up marked as a mere spare in the md126 raid.
--- Disk 2 ---
Apr 13 20:40:24 kernel: ata2.00: exception Emask 0x50 SAct 0xb0000 SErr 0x280900 action 0x6 frozen
Apr 13 20:40:24 kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:40:24 kernel: ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/40:80:40:19:95/00:00:00:00:00/40 tag 16 ncq 32768 in
res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/40:88:40:7f:9b/00:00:00:00:00/40 tag 17 ncq 32768 in
res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/80:98:c0:7f:9b/00:00:00:00:00/40 tag 19 ncq 65536 in
res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2: hard resetting link
--- Disk 4 ----
Apr 13 20:36:41 kernel: ata4.00: exception Emask 0x50 SAct 0xfe0000 SErr 0x280900 action 0x6 frozen
Apr 13 20:36:41 kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:36:41 kernel: ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:88:b0:3b:e7/05:00:ee:00:00/40 tag 17 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:90:f0:40:e7/05:00:ee:00:00/40 tag 18 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:98:30:46:e7/05:00:ee:00:00/40 tag 19 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:a0:70:4b:e7/05:00:ee:00:00/40 tag 20 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:a8:b0:50:e7/05:00:ee:00:00/40 tag 21 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:b0:f0:55:e7/05:00:ee:00:00/40 tag 22 ncq 688128 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/00:b8:30:5b:e7/05:00:ee:00:00/40 tag 23 ncq 655360 in
res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4: hard resetting link
--- Disk 5 ----
Apr 13 20:38:33 kernel: ata5.00: exception Emask 0x50 SAct 0x7 SErr 0x200900 action 0x6 frozen
Apr 13 20:38:34 kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:38:34 kernel: ata5: SError: { UnrecovData HostInt BadCRC }
Apr 13 20:38:34 kernel: ata5.00: failed command: READ FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 60/80:00:40:10:95/00:00:00:00:00/40 tag 0 ncq 65536 in
res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 61/40:08:00:10:95/00:00:00:00:00/40 tag 1 ncq 32768 out
res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 61/40:10:c0:0f:95/00:00:00:00:00/40 tag 2 ncq 32768 out
res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5: hard resetting link
Apr 13 20:38:34 kernel: do_marvell_9170_recover: ignoring PCI device (8086:2821) at PCI#0
Apr 13 20:38:34 kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 20:38:34 kernel: ata5.00: configured for UDMA/33
Apr 13 20:38:34 kernel: ata5: EH complete
--- Disk 6 ----
Apr 13 20:36:41 kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 20:36:41 kernel: ata6.00: configured for UDMA/33
Apr 13 20:36:41 kernel: ata6: EH complete
Apr 13 20:36:41 kernel: ata6.00: exception Emask 0x50 SAct 0x807fffc SErr 0x280900 action 0x6 frozen
Apr 13 20:36:41 kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:36:41 kernel: ata6: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/18:10:98:a9:e6/02:00:ee:00:00/40 tag 2 ncq 274432 in
res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:18:58:a4:e6/05:00:ee:00:00/40 tag 3 ncq 688128 in
res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:20:18:9f:e6/05:00:ee:00:00/40 tag 4 ncq 688128 in
res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:28:d8:99:e6/05:00:ee:00:00/40 tag 5 ncq 688128 in
res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
It is of course unlikely that so many disks are bad, even though the disks aren't young anymore. Notice that these are ATA bus/CRC errors rather than media errors, so I would suspect the chassis here.
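If you want to gauge how widespread the errors are, you can count them per SATA port from the kernel ring buffer. A rough sketch (the same lines should also be in the kernel log inside the downloaded log bundle):
  for port in ata1 ata2 ata3 ata4 ata5 ata6; do
      printf '%s: ' "$port"
      dmesg | grep -c "$port.*failed command"
  done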
I suggest that you take a backup right now. With md126 already degraded, one more disk being kicked from that raid could leave you in serious trouble.
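As one option, if SSH and rsync are available on the NAS, the data can be pulled from another Linux machine roughly like this (the hostname and destination path are placeholders):
  rsync -avh --progress root@readynas:/data/ /mnt/backup/readynas/
Otherwise, copying the shares off to another machine over SMB works just as well.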
=== My recommendations ===
1. Take a backup of your data ASAP.
2. Turn off the NAS and test each disk in a PC with WD's disk test tool, which can be downloaded from their website (see the smartctl sketch after this list for a Linux alternative).
3. Replace any disks that come out bad.
4. Factory reset the NAS and start over with all healthy disks. Restore the data from backups.
5. If the issue re-occurs, replace the NAS. Keep backups at all times!
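For step 2, if the test machine runs Linux rather than Windows, an extended SMART self-test is a reasonable substitute for the vendor tool. A sketch; replace /dev/sdX with the disk under test:
  smartctl -t long /dev/sdX        # start the extended self-test (can take several hours)
  smartctl -l selftest /dev/sdX    # check the result once it has finished
  smartctl -a /dev/sdX             # also look at reallocated / pending sector counts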
Cheers