Forum Discussion
chyzm
Jan 24, 2019 (Aspirant)
Volume degraded after firmware update to 6.9.5 on RN212
The NAS was upgraded to 6.9.5 this evening, and after the reboot the drive in slot 1 is now showing as degraded. I rebooted and power-cycled with no fix, and ran disk tests with no luck. I swapped drives between slots to see if the issue follows the drive, and it does not: slot 1 is still showing a degraded state. So that proves it's not a drive issue but a slot issue, and I'm assuming it's related to the firmware. Anyone else having this issue? Thanks
Chris
31 Replies
I am not seeing it in my RN202. You could of course try downgrading back to 6.9.4. Though I think it'd be better to PM JohnCM_S and ask him if he is willing to review your logs.
If you are the original purchaser, then your RN212 should still be covered by the warranty (3 years). So another option is to request an RMA via my.netgear.com.
- ccbnz (Aspirant)
Did you get this sorted?
I seem to have the same problem on my ReadyNAS Pro (upgraded to OS6).
I updated the firmware to 6.9.5 and later disk 1 showed as Failed. I rebooted the NAS and the disk now shows OK but the resync didn’t complete and the NAS display is flashing “Data Degraded”. Not sure what caused this.
- chyzm (Aspirant)
No luck. Trying to contact Netgear and get them to respond was a failure. Customer service and the support experience are the pits with them. After two days of trying to open a case I was finally able to do so, but again no response. After a week the drive came back. Mine is under warranty, but again I have had no luck getting Netgear to respond. Funny you posted this, because last night I had to power-cycle some of my equipment, the NAS was one of the devices that got reset, and the same issue has come back. I give up and am looking at a Synology DS218+ to replace it. I don't want to go through the whole debacle again and will never buy a Netgear product again.
- ccbnz (Aspirant)
Thanks for your reply chyzm. I've factory reset the NAS and it seems to be OK now. It's currently going through the resync process. My problem may not be related to the 6.9.5 update. I also changed the settings on the NAS to enable disk spin down after 10 minutes and it may have been this. I've changed it back to having the disks spinning all the time.
Yeah the Synology DS218+ looks nice. I've had a look at that too. But my ReadyNAS has been pretty well rock solid for more than 10 years. I've had to change out all of the 6 disks and I've also upgraded the CPU and memory on it to get a bit more performance. It's really reliable but performance is still a bit low for apps like Plex.
- Hopchen (Prodigy)
ccbnz wrote:
Did you get this sorted?
I seem to have the same problem on my ReadyNAS Pro (upgraded to OS6).
I updated the firmware to 6.9.5 and later disk 1 showed as Failed. I rebooted the NAS and the disk now shows OK but the resync didn’t complete and the NAS display is flashing “Data Degraded”. Not sure what caused this.
Don't assume you hit the same problem because symptoms are similar :) Chances are that your disk is just bad.
If you want to, let me take a look at the logs for you. Download them from the NAS, upload them somewhere like Google Drive, and PM me the link.
Also, take a backup of the data if you don't have one already. Now is a good time.
- ccbnz (Aspirant)
Thanks Hopchen .
Yes it may not be the firmware upgrade. I did three things prior to it failing:
(i) Upgraded firmware to 6.9.5
(ii) Changed power settings to disk spin down after 10mins of inactivity
(iii) Installed Plex
It may be related to (ii). From what I've read, it's better to leave the disks spinning all of the time rather than letting them spin down and back up. I have an unproven theory that maybe the disks didn't spin up fast enough and the system marked one as bad, but that's only speculation. This also happened with disk 3 briefly, but a reboot fixed it.
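(If I wanted to test that theory, I could probably watch the drives' power state directly. hdparm should report it, assuming it's installed on the NAS:

hdparm -C /dev/sda    # prints the drive's power mode, e.g. active/idle or standby
)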
I've factory reset the NAS and it seems to be all fine now.
Thanks for the offer of looking at the log files. I'll take you up on it. What I have noticed in mdstat is that in /dev/md/data-0 all of the disks are OK, but /dev/md/data-1 is showing 4 active devices, 5 working devices and 1 spare. Unless somehow disk 3 was also marked as bad, and that's led to a situation where the disks can't resync.
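For anyone following along, this is roughly how I'm looking at it over SSH; these are standard Linux md tools, nothing ReadyNAS-specific assumed:

cat /proc/mdstat                  # quick per-array view; spares show up with an (S) suffix
mdadm --detail /dev/md/data-1     # the Active/Working/Failed/Spare Devices counters I quoted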
- Hopchen (Prodigy)
Hi ccbnz
I took a look at your logs. Analysis below.
=== Overview ===
Your disk configuration is: 3TB x 3TB x 2TB x 3TB x 3TB x 3TB.
This means that the NAS will create two raids due to the different-sized disks, as follows:
md127 = 6 x 2TB (raid5)
md126 = 5 x 1TB (raid5)

We can see from the raid config that this is indeed the case. However, notice how sda is a spare in the md126 raid. I suspect this happens because it cannot sync the disk back in properly. More on that further down.
md126 : active raid5 sdb4[1] sda4[5](S) sdf4[4] sde4[3] sdd4[2]
      3906483712 blocks super 1.2 level 5, 64k chunk, algorithm 2 [5/4] [_UUUU]   <<<=== Raid degraded.

md127 : active raid5 sda3[0] sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
      9743324160 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]
Your two raids are then "stuck" together by the filesystem in order to create one volume.
Label: '<masked>:data'  uuid: <masked>
      Total devices 2  FS bytes used 970.18GiB
      devid 1 size 9.07TiB used 972.02GiB path /dev/md127
      devid 2 size 3.64TiB used 0.00B path /dev/md126
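As a sanity check (my own back-of-the-envelope arithmetic, not something taken from your logs): md127 is built from a ~2TB slice of all 6 disks, so RAID5 gives roughly (6 - 1) x 2TB = 10TB ≈ 9.1TiB usable, and md126 is built from the remaining ~1TB slice of the five 3TB disks, so (5 - 1) x 1TB = 4TB ≈ 3.6TiB. That lines up with the devid sizes above. If you want to pull the same outputs yourself over SSH, the standard tools should work on OS6:

cat /proc/mdstat          # the raid membership/status lines
btrfs filesystem show     # the Label/devid summary for the volume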
=== Issue ===
Disk 1 fell out of the md126 raid and was never able to re-join the raid array. The below sequence happens every time you reboot.
[19/04/12 18:01:48] err:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 1 (Internal) changed state from ONLINE to FAILED.
[19/04/12 19:45:48] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
[19/04/12 19:45:52] notice:disk:LOGMSG_ZFS_DISK_STATUS_CHANGED Disk in channel 1 (Internal) changed state from ONLINE to RESYNC.
[19/04/12 23:29:08] notice:volume:LOGMSG_RESILVERCOMPLETE_DEGRADED_VOLUME The resync operation finished on volume data. However, the volume is still degraded.
[19/04/12 23:45:10] notice:system:LOGMSG_SYSTEM_REBOOT The system is rebooting.
[19/04/12 23:46:44] warning:volume:LOGMSG_HEALTH_VOLUME_WARN Volume data is Degraded.
[19/04/12 23:46:45] notice:volume:LOGMSG_RESILVERSTARTED_VOLUME Resyncing started for Volume data.
[19/04/13 00:24:08] notice:volume:LOGMSG_RESILVERCOMPLETE_DEGRADED_VOLUME The resync operation finished on volume data. However, the volume is still degraded.
A raid sync stopping like this is a safety mechanism because one or more drives are not responding properly during the sync. It is to avoid a potential double disk failure. Your disks appear OK-ish, with two disks having a small number of errors.
Device: sdc   Channel: 2   <<<=== Disk 3
ATA Error Count: 9
Device: sdd   Channel: 3   <<<=== Disk 4
ATA Error Count: 1
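If you want to look at those error counters directly, smartctl gives the full picture. I'd expect smartmontools to be present on OS6, but treat that as an assumption:

smartctl -a /dev/sdc    # full SMART attributes plus the ATA error log for disk 3
smartctl -a /dev/sdd    # same for disk 4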
However, the kernel logs are completely flooded with disk errors from 4 of the disks: 2, 4, 5 and 6. Below is a mere extract.
So, this is why the raid sync fails. It is likely also why disk 1 eventually is simply marked as spare in the md126 raid.
--- Disk 2 ---
Apr 13 20:40:24 kernel: ata2.00: exception Emask 0x50 SAct 0xb0000 SErr 0x280900 action 0x6 frozen
Apr 13 20:40:24 kernel: ata2.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:40:24 kernel: ata2: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/40:80:40:19:95/00:00:00:00:00/40 tag 16 ncq 32768 in res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/40:88:40:7f:9b/00:00:00:00:00/40 tag 17 ncq 32768 in res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2.00: failed command: READ FPDMA QUEUED
Apr 13 20:40:24 kernel: ata2.00: cmd 60/80:98:c0:7f:9b/00:00:00:00:00/40 tag 19 ncq 65536 in res 40/00:a4:00:19:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:40:24 kernel: ata2.00: status: { DRDY }
Apr 13 20:40:24 kernel: ata2: hard resetting link

--- Disk 4 ---
Apr 13 20:36:41 kernel: ata4.00: exception Emask 0x50 SAct 0xfe0000 SErr 0x280900 action 0x6 frozen
Apr 13 20:36:41 kernel: ata4.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:36:41 kernel: ata4: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:88:b0:3b:e7/05:00:ee:00:00/40 tag 17 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:90:f0:40:e7/05:00:ee:00:00/40 tag 18 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:98:30:46:e7/05:00:ee:00:00/40 tag 19 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:a0:70:4b:e7/05:00:ee:00:00/40 tag 20 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:a8:b0:50:e7/05:00:ee:00:00/40 tag 21 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/40:b0:f0:55:e7/05:00:ee:00:00/40 tag 22 ncq 688128 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata4.00: cmd 60/00:b8:30:5b:e7/05:00:ee:00:00/40 tag 23 ncq 655360 in res 40/00:bc:30:5b:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata4.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata4: hard resetting link

--- Disk 5 ---
Apr 13 20:38:33 kernel: ata5.00: exception Emask 0x50 SAct 0x7 SErr 0x200900 action 0x6 frozen
Apr 13 20:38:34 kernel: ata5.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:38:34 kernel: ata5: SError: { UnrecovData HostInt BadCRC }
Apr 13 20:38:34 kernel: ata5.00: failed command: READ FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 60/80:00:40:10:95/00:00:00:00:00/40 tag 0 ncq 65536 in res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 61/40:08:00:10:95/00:00:00:00:00/40 tag 1 ncq 32768 out res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5.00: failed command: WRITE FPDMA QUEUED
Apr 13 20:38:34 kernel: ata5.00: cmd 61/40:10:c0:0f:95/00:00:00:00:00/40 tag 2 ncq 32768 out res 40/00:0c:00:10:95/00:00:00:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:38:34 kernel: ata5.00: status: { DRDY }
Apr 13 20:38:34 kernel: ata5: hard resetting link
Apr 13 20:38:34 kernel: do_marvell_9170_recover: ignoring PCI device (8086:2821) at PCI#0
Apr 13 20:38:34 kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 20:38:34 kernel: ata5.00: configured for UDMA/33
Apr 13 20:38:34 kernel: ata5: EH complete

--- Disk 6 ---
Apr 13 20:36:41 kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 13 20:36:41 kernel: ata6.00: configured for UDMA/33
Apr 13 20:36:41 kernel: ata6: EH complete
Apr 13 20:36:41 kernel: ata6.00: exception Emask 0x50 SAct 0x807fffc SErr 0x280900 action 0x6 frozen
Apr 13 20:36:41 kernel: ata6.00: irq_stat 0x08000000, interface fatal error
Apr 13 20:36:41 kernel: ata6: SError: { UnrecovData HostInt 10B8B BadCRC }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/18:10:98:a9:e6/02:00:ee:00:00/40 tag 2 ncq 274432 in res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:18:58:a4:e6/05:00:ee:00:00/40 tag 3 ncq 688128 in res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:20:18:9f:e6/05:00:ee:00:00/40 tag 4 ncq 688128 in res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
Apr 13 20:36:41 kernel: ata6.00: status: { DRDY }
Apr 13 20:36:41 kernel: ata6.00: failed command: READ FPDMA QUEUED
Apr 13 20:36:41 kernel: ata6.00: cmd 60/40:28:d8:99:e6/05:00:ee:00:00/40 tag 5 ncq 688128 in res 40/00:94:30:5e:e7/00:00:ee:00:00/40 Emask 0x50 (ATA bus error)
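To get a feel for how widespread this is, a rough per-port count can be pulled from the kernel log. The kernel.log filename is what the downloaded log bundle usually contains, so treat the path as an assumption:

grep -c 'ata2.*ATA bus error' kernel.log    # repeat with ata4, ata5 and ata6 to compare ports
dmesg | grep -c 'ata2.*ATA bus error'       # the same count, live on the NAS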
It is of course unlikely that so many disks are bad, even though the disks aren't young anymore. I would suspect the chassis here.
I suggest that you take a backup right now. Because md126 is degraded, one more disk being kicked from the raid could leave you in serious trouble.
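If you don't have a backup job set up yet, even a one-off rsync to a USB disk plugged into the NAS (or to another machine over SSH) is better than nothing. The destination path below is only a placeholder for illustration:

rsync -avh --progress /data/ /media/USB_HDD_1/nas-backup/    # /data is the OS6 data volume; adjust the source shares and destination to suit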
=== My recommendations ===
1. Take a backup of your data asap.
2. Turn off NAS and test each disk in a PC with WD disk test tool. It can be downloaded from their website.
3. Replace any disks that come out bad.
4. Factory reset the NAS and start over with all healthy disks. Restore data from backups.
5. If issue re-occurs --> replace the NAS. Keep backups at all times!
Cheers
- ccbnz (Aspirant)
Hi Hopchen. Thanks so much for looking at my log files. Yikes!
I've backed up the NAS and done a factory reset. The NAS seems to be working fine now but looking at the latest kernel log file, there are still ata errors such as:
Apr 16 01:27:23 kernel: do_marvell_9170_recover: ignoring PCI device (8086:2821) at PCI#0
Apr 16 01:27:23 kernel: ata6.00: exception Emask 0x40 SAct 0x0 SErr 0x800800 action 0x6
Apr 16 01:27:23 kernel: ata6.00: irq_stat 0x40000001
Apr 16 01:27:23 kernel: ata6: SError: { HostInt LinkSeq }
Apr 16 01:27:23 kernel: ata6.00: failed command: WRITE DMA
Apr 16 01:27:23 kernel: ata6.00: cmd ca/00:08:40:d8:1a/00:00:00:00:00/e0 tag 10 dma 4096 out
res 51/10:08:40:d8:1a/00:00:00:00:00/e0 Emask 0xc1 (internal error)
Apr 16 01:27:23 kernel: ata6.00: status: { DRDY ERR }
Apr 16 01:27:23 kernel: ata6.00: error: { IDNF }
Apr 16 01:27:23 kernel: ata6: hard resetting link
Apr 16 01:27:23 kernel: do_marvell_9170_recover: ignoring PCI device (8086:2821) at PCI#0
Apr 16 01:27:23 kernel: ata6: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Apr 16 01:27:23 kernel: ata6.00: configured for UDMA/33
Apr 16 01:27:23 kernel: ata6: EH complete

This seems to be pretty much happening with all of the drives. These are different errors than those reported in the log you looked at.
Would you expect the above errors normally?
DMA errors would likely be related to the motherboard, right? I did upgrade the CPU and memory on the motherboard a year or so back, but it's been fine since.
chyzm, do you have similar errors in your kernel log file?
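If it's easier to compare that way, something like this over SSH should show whether the same resets keep occurring; the command is standard Linux, nothing ReadyNAS-specific:

dmesg | grep -c 'hard resetting link'    # a count that keeps climbing points to an ongoing link/bus problem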