
Forum Discussion

SLAM-ER
Aspirant
Mar 10, 2019
Solved

ReadyNAS Ultra-6 data degraded after firmware update

I have an old ReadyNAS Ultra-6 that I upgraded to firmware 6.9.5. After the upgrade it reports the data volume as degraded. However, the data does not appear to be degraded: all drives show as healthy, I can still access the shares, and the files I have copied off seem to work fine.

 

When I go into the web interface I can see it has 2x RAID groups listed that share some (but not all) of the same HDDs!?? Is this normal (or even possible)? I think maybe during the upgrade it found an old array config and reactivated it, causing my issue...? I dunno.

 

While I have copied off some of the more important data, I do not have sufficient space to copy off the remainder (non-critical stuff, but I'd rather not have to download it all again).  So before I blow it all away and start from scratch, is there a way to use the console through SSH to fix it without wiping the config?  I know nothing about Linux so I'm not keen to start blindly trying stuff, but I can follow instructions... 

 

I don't even know which of the RAID groups is correct, if either.  It used to have 6x2TB drives, and I'd swapped in 3x 6TB drives a while back, so now I'm not sure what the correct configuration should be as it's been a while since I looked at it.

 

Anyway, if anyone has any instructions on how to diagnose or fix this weird issue I'd be grateful for the help.  If not I guess I will just wipe the config and start from scratch.  :(

 

Thanks

Matthew

 


15 Replies

  • Hey SLAM-ER 

     

    Sounds like a disk probably dropped from the raid.

     

    Can you download the logs and post the contents of mdstat.log?

    You can just post the first section, called "Personalities".
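
    If you're comfortable enabling SSH, the same information can also be read live on the NAS itself; /proc/mdstat is standard Linux mdraid, nothing ReadyNAS-specific, so this is just a sketch:

    # On the NAS over SSH (as root): print the status of all md RAID
    # arrays. This is the same data that ends up in mdstat.log.
    cat /proc/mdstat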

     

     

    • Sandshark
      Sensei - Experienced User

      Since you say you "swapped out" the 6TB drives, I'm assuming the other bays still have the 2TBs. So, yes, you'll have two RAID groups: one 6x 2TB and one 3x 4TB. Only a factory default will change that (but there is normally no need to do so). "Degraded" means "no redundancy", not "no access", so it's likely true. Since you have no full backup, you need to fix that before you lose another drive and the volume does become "dead". Unfortunately, fixing it could put your other drives at higher risk if a resync is needed (which I think it will be). So fixing the no-backup issue should also be on your short list.

       

      It is odd that all drives show green if the volume is degraded, unless it is currently re-syncing.  Hopchen should be able to tell you more from the log, but if you hover over the green dot, do all drives say they are part of volume "data"?

      • SLAM-ER
        Aspirant
        It was all 2TB drives, then I swapped in 6TB drives one at a time, so now it has 3x 6TB and 3x 2TB.

        Yeah, I did more reading and saw that multiple RAID groups are a result of X-RAID expansion.

        All drives are green, and all are listed as part of 'data'. On the unit's LCD display, where it says degraded, all the drive bays are shown flashing; whether that means they have failed or are just populated, I don't know.

        I will post logs etc when I get home.
  • Hi SLAM-ER 

     

    Thanks for posting the mdstat log.

     

    Firstly, let me clarify the behaviour of the NAS with the disk configuration that you have. The NAS actually raids partitions together, not entire disks. So, when using different-sized disks as you do, the NAS makes a 2TB partition on each of the 6 disks and raids those partitions together in a RAID 5. That forms one data raid - md126 in this instance.

    md126 : active raid5 sdc3[8] sdf3[5] sde3[4] sdd3[3] sdb3[6]
    9743324160 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]

    Next the NAS will take the 3 remaining larger disks and make 4TB partitions on each disk and raid those partitions together in a separate data raid. In this case, md127.

    md127 : active raid5 sda4[0] sdc4[2] sdb4[1]
    7813753856 blocks super 1.2 level 5, 64k chunk, algorithm 2 [3/3] [UUU]

    Thereafter, the NAS sticks the two data raids together at the filesystem level in order to make it "one volume". So, what you are seeing with two data raids is perfectly normal when using different-sized disks. Sandshark - FYI, this will be the same configuration whether he factory defaults or not.
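
    If you want to see that concatenation for yourself over SSH, the filesystem tools will list both raid devices as members of the single data volume. A sketch, assuming the stock btrfs tooling on ReadyNAS OS 6:

    # List btrfs filesystems and the devices backing them. The "data"
    # volume should show both RAID devices (here /dev/md126 and /dev/md127).
    btrfs filesystem show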

     

    With that out of the way, we can see that md126 is degraded. The partition from sda (one of the disks) is missing in this raid.

    md126 : active raid5 sdc3[8] sdf3[5] sde3[4] sdd3[3] sdb3[6] <<<=== "sda3" missing as a participant here.
    9743324160 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU] <<<=== raid notifying you that one disk is out of this raid. 

    This makes the md126 raid degraded - i.e. no longer redundant. Another disk failure will render the entire volume dead at this point, so it needs to be addressed. There could be several reasons for the disk going missing from the md126 raid, but a firmware update is not a likely suspect. What is more likely is that the "sda" disk has some dodgy sectors on the partition used for the md126 raid, and thus the NAS might have kicked it from that raid upon boot.
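
    You can confirm this from the SSH console if you're curious; this is plain Linux mdadm, nothing ReadyNAS-specific:

    # Show the detailed state of the degraded array. Look for
    # "State : clean, degraded" and a "removed" slot where sda3 should be.
    mdadm --detail /dev/md126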

     

    What is the health of the disks overall? Can you post the contents of disk_info.log (masking the serial numbers of your disks)?
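
    The same SMART health data can also be read per disk over SSH with smartctl from smartmontools; I believe it's included on ReadyNAS OS 6, but treat that as an assumption. A sketch for the suspect disk:

    # Print the SMART attribute table for the first disk. Attribute 197
    # (Current_Pending_Sector) is the one to watch here.
    smartctl -A /dev/sda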

     

    Thanks

     

     

    • SLAM-ER
      Aspirant

      Device: sda
      Controller: 0
      Channel: 0
      Model: WDC WD60EFRX-68MYMN0
      Serial: 
      Firmware: 82.00A82
      Class: SATA
      RPM: 5700
      Sectors: 11721045168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      Current Pending Sector Count: 11
      Uncorrectable Sector Count: 0
      Temperature: 32
      Start/Stop Count: 136
      Power-On Hours: 35247
      Power Cycle Count: 77
      Load Cycle Count: 15929

      Device: sdb
      Controller: 0
      Channel: 1
      Model: WDC WD60EFRX-68MYMN0
      Serial: 
      Firmware: 82.00A82
      Class: SATA
      RPM: 5700
      Sectors: 11721045168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      Current Pending Sector Count: 0
      Uncorrectable Sector Count: 0
      Temperature: 35
      Start/Stop Count: 105
      Power-On Hours: 37416
      Power Cycle Count: 75
      Load Cycle Count: 20776

      Device: sdc
      Controller: 0
      Channel: 2
      Model: ST6000NM0115-1YZ110
      Serial: 
      Firmware: SN04
      Class: SATA
      RPM: 7200
      Sectors: 11721045168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      End-to-End Errors: 0
      Command Timeouts: 0
      Current Pending Sector Count: 0
      Uncorrectable Sector Count: 0
      Temperature: 37
      Start/Stop Count: 16
      Power-On Hours: 4434
      Power Cycle Count: 6
      Load Cycle Count: 5541

      Device: sdd
      Controller: 0
      Channel: 3
      Model: WDC WD20EFRX-68EUZN0
      Serial: 
      Firmware: 82.00A82
      Class: SATA
      RPM: 5400
      Sectors: 3907029168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      Current Pending Sector Count: 0
      Uncorrectable Sector Count: 0
      Temperature: 28
      Start/Stop Count: 220
      Power-On Hours: 19423
      Power Cycle Count: 13
      Load Cycle Count: 1799

      Device: sde
      Controller: 0
      Channel: 4
      Model: WDC WD20EFRX-68EUZN0
      Serial: 
      Firmware: 82.00A82
      Class: SATA
      RPM: 5400
      Sectors: 3907029168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      Current Pending Sector Count: 0
      Uncorrectable Sector Count: 0
      Temperature: 29
      Start/Stop Count: 181
      Power-On Hours: 19417
      Power Cycle Count: 13
      Load Cycle Count: 1794

      Device: sdf
      Controller: 0
      Channel: 5
      Model: WDC WD20EFRX-68EUZN0
      Serial: 
      Firmware: 82.00A82
      Class: SATA
      RPM: 5400
      Sectors: 3907029168
      Pool: data
      PoolType: RAID 5
      PoolState: 3
      PoolHostId: 33eac74a
      Health data
      ATA Error Count: 0
      Reallocated Sectors: 0
      Reallocation Events: 0
      Spin Retry Count: 0
      Current Pending Sector Count: 0
      Uncorrectable Sector Count: 0
      Temperature: 29
      Start/Stop Count: 223
      Power-On Hours: 19423
      Power Cycle Count: 14
      Load Cycle Count: 1869

       

      • Hopchen
        Prodigy

        Hi again

         

        Thanks for posting the disk info. As suspected, disk sda has seen better days. From what I can see in the logs, the disk should be located in bay 1 (the first disk in the NAS). We can see that the disk has some Current Pending Sector errors.

        Device: sda
        Controller: 0
        Channel: 0 <<<=== Bay 1
        Model: WDC WD60EFRX-68MYMN0
        Serial:
        Firmware: 82.00A82
        Class: SATA
        RPM: 5700
        Sectors: 11721045168
        Pool: data
        PoolType: RAID 5
        PoolState: 3
        PoolHostId: 33eac74a
        Health data
        ATA Error Count: 0
        Reallocated Sectors: 0
        Reallocation Events: 0
        Spin Retry Count: 0
        Current Pending Sector Count: 11 <<<=== Bad sectors on the disk
        Uncorrectable Sector Count: 0
        Temperature: 32
        Start/Stop Count: 136
        Power-On Hours: 35247
        Power Cycle Count: 77
        Load Cycle Count: 15929


        Pending sectors typically indicate imminent failure of the disk. An Acronis KB article describes the issue particularly well:

         

        The Current Pending Sector Count S.M.A.R.T. parameter is a critical parameter and indicates the current count of unstable sectors (waiting for remapping). The raw value of this attribute indicates the total number of sectors waiting for remapping. Later, when some of these sectors are read successfully, the value is decreased. If errors still occur when reading some sector, the hard drive will try to restore the data, transfer it to the reserved disk area (spare area) and mark this sector as remapped.

        Please also consult your machine's or hard disk's documentation.
        Recommendations

        This is a critical parameter. Degradation of this parameter may indicate imminent drive failure. Urgent data backup and hardware replacement is recommended.

        https://kb.acronis.com/content/9133
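
        If you want to watch exactly the two counters that KB talks about (pending vs. remapped), you can do it from SSH; a sketch with standard smartmontools:

        # Watch whether pending sectors (attribute 197) are getting
        # remapped into reallocated sectors (attribute 5) over time.
        smartctl -A /dev/sda | grep -E 'Pending|Reallocated'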

         

        It is quite likely that these bad sectors caused the NAS to kick the disk from one of the data raids. The fact that those sectors appear stuck in "pending" is an indication that the sectors will probably never recover. Without further examination of the logs, I'd say you need to replace that disk asap. Note: you must replace it with a disk of the same size or larger.
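
        Once you hot-swap the bay 1 disk, the NAS should start rebuilding on its own; if you want to follow the resync from SSH, /proc/mdstat shows the progress (plain Linux mdraid again; the watch utility is assumed to be present):

        # Re-print the array status every 5 seconds. A rebuilding array
        # shows a "recovery = x.x%" progress line.
        watch -n 5 cat /proc/mdstat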

         

        Your other disks appear to be healthy, which is good!


        Cheers

         

         
