New 428 with degraded c and data volumes
The unit has been up and running for about two and a half weeks.
Last night it showed degraded C and data volumes, but no errors. What do I do?
Re: New 428 with degraded c and data volumes
Hi @Kish
I can take a look at the logs for you, to see what is going on. Please download the logs from the NAS.
Follow the instructions for OS6 - https://kb.netgear.com/25625/How-do-I-send-ReadyNAS-system-logs-to-NETGEAR-Support
Upload the logs (zip file) to Google Drive or similar and create a share link for it. PM me the link (don't post it publicly here).
Cheers
Re: New 428 with degraded c and data volumes
Hi @Kish
Thanks for sending the logs over.
You get the degraded warning because your data RAID has kicked out one disk. Here we can see that the RAID (md127) has marked disk "sdg" as faulty (F).
```
md127 : active raid6 sda3[0] sdh3[7] sdg3[6](F) sdf3[5] sde3[4] sdd3[3] sdc3[2] sdb3[1]
      17552500992 blocks super 1.2 level 6, 64k chunk, algorithm 2 [8/7] [UUUUUU_U]   <<<=== disk missing from the RAID
```
That disk is located in bay number 7 (channel 6). On the surface, the disk's SMART stats do not show any errors.
```
Device: sdg   Controller: 0   Channel: 6      <<<=== Disk 7
Model: WDC WD30EFRX-68EUZN0
Serial: (masked)
Firmware: 82.00A82
Class: SATA
RPM: 5400
Sectors: 5860533168
Pool: data   PoolType: RAID 6   PoolState: 3   PoolHostId: (masked)
Health data
  ATA Error Count: 0
  Reallocated Sectors: 0
  Reallocation Events: 0
  Spin Retry Count: 0
  Current Pending Sector Count: 0
  Uncorrectable Sector Count: 0
  Temperature: 32
  Start/Stop Count: 11
  Power-On Hours: 29303
  Power Cycle Count: 9
  Load Cycle Count: 30103
```
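As an aside, the faulty flag in the mdstat output above can also be picked out programmatically. A minimal sketch in Python, using the line from the logs above as sample input (the regex is my own, not part of any NETGEAR tooling):

```python
import re

# Device list line from /proc/mdstat, copied from the logs above.
mdstat_line = ("md127 : active raid6 sda3[0] sdh3[7] sdg3[6](F) sdf3[5] "
               "sde3[4] sdd3[3] sdc3[2] sdb3[1]")

# md appends "(F)" to members it has marked faulty; capture those names.
faulty = re.findall(r"(\w+)\[\d+\]\(F\)", mdstat_line)
print(faulty)  # ['sdg3']
```

The same information is available on the NAS itself via `cat /proc/mdstat` over SSH, if SSH access is enabled.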
However, the kernel log shows that the disk is throwing I/O errors due to bad sectors. I suspect that if you ran a disk test on that disk, it would show quite a few sector errors.
```
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#24 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#24 Sense Key : Medium Error [current] [descriptor]
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#24 Add. Sense: Unrecovered read error - auto reallocate failed
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#24 CDB: Read(16) 88 00 00 00 00 00 00 00 00 48 00 00 00 08 00 00
[Tue Apr 16 09:06:10 2019] blk_update_request: 3 callbacks suppressed
[Tue Apr 16 09:06:10 2019] blk_update_request: I/O error, dev sdg, sector 73
[Tue Apr 16 09:06:10 2019] Buffer I/O error on dev sdg1, logical block 1, async page read
[Tue Apr 16 09:06:10 2019] ata12: EH complete
[Tue Apr 16 09:06:10 2019] do_marvell_9170_recover: ignoring PCI device (8086:19c2) at PCI#0
[Tue Apr 16 09:06:10 2019] ata12.00: exception Emask 0x0 SAct 0x20 SErr 0x0 action 0x0
[Tue Apr 16 09:06:10 2019] ata12.00: irq_stat 0x40000008
[Tue Apr 16 09:06:10 2019] ata12.00: failed command: READ FPDMA QUEUED
[Tue Apr 16 09:06:10 2019] ata12.00: cmd 60/08:28:48:00:00/00:00:00:00:00/40 tag 5 ncq 4096 in
                           res 41/40:00:49:00:00/00:00:00:00:00/40 Emask 0x409 (media error) <F>
[Tue Apr 16 09:06:10 2019] ata12.00: status: { DRDY ERR }
[Tue Apr 16 09:06:10 2019] ata12.00: error: { UNC }
[Tue Apr 16 09:06:10 2019] ata12.00: configured for UDMA/133
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#5 Sense Key : Medium Error [current] [descriptor]
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
[Tue Apr 16 09:06:10 2019] sd 11:0:0:0: [sdg] tag#5 CDB: Read(16) 88 00 00 00 00 00 00 00 00 48 00 00 00 08 00 00
[Tue Apr 16 09:06:10 2019] blk_update_request: I/O error, dev sdg, sector 73
[Tue Apr 16 09:06:10 2019] Buffer I/O error on dev sdg1, logical block 1, async page read
[Tue Apr 16 09:06:10 2019] ata12: EH complete
```
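If you want a quick sanity check that one disk dominates the errors, the kernel log can be tallied per device. A rough Python sketch, using a few of the log lines above as sample input (the pattern is my own assumption about the `blk_update_request` message format):

```python
import re
from collections import Counter

# A few sample kernel log lines, copied from above.
dmesg = """\
[Tue Apr 16 09:06:10 2019] blk_update_request: I/O error, dev sdg, sector 73
[Tue Apr 16 09:06:10 2019] Buffer I/O error on dev sdg1, logical block 1, async page read
[Tue Apr 16 09:06:10 2019] blk_update_request: I/O error, dev sdg, sector 73
"""

# Count "I/O error, dev XXX" occurrences per block device.
errors = Counter(re.findall(r"I/O error, dev (sd\w+)", dmesg))
print(errors.most_common())  # [('sdg', 2)]
```

If every I/O error points at the same device, that backs up what the RAID layer already decided when it kicked the disk.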
The data raid kicked the disk on the 15th.
```
[19/04/15 19:43:03 EDT] warning:volume:LOGMSG_HEALTH_VOLUME Volume data health changed from Redundant to Degraded.
```
So, you have a bad disk. If the disks came with your NAS, then get a disk RMA from NETGEAR. If the disks were sourced elsewhere, then test the disk with the Western Digital disk test tool (it can be downloaded from their website) and seek an RMA for the disk through them, if it is still under warranty. You would need to connect the disk to your PC for that.
Cheers
Re: New 428 with degraded c and data volumes
@Hopchen wrote:
then test the disk with Western Digital disk test tool (it can be downloaded from their website) and seek an RMA of the disk through them, if still under warranty. You would need to connect the disk to your PC for that.
The disk is more than 3 years old (powered on for 29303 hours), so it is no longer under warranty. Still, I agree that you should test it with Lifeguard (the tool @Hopchen mentions). It will confirm that the issue is the disk.
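For reference, the age estimate works out from the SMART Power-On Hours value in the disk info above (my arithmetic, not from the thread; power-on hours are a lower bound on calendar age, since the disk may have spent time powered off):

```python
# SMART Power-On Hours reported for the disk above.
power_on_hours = 29303

# Convert to years, using an average of 365.25 days per year.
years = power_on_hours / 24 / 365.25
print(round(years, 1))  # 3.3
```

So the disk has run for roughly 3.3 years, past the typical 3-year warranty on WD Red drives of that era.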