ReadyNAS 316 Degraded and Backup Done?
My ReadyNAS 316 status is showing as "degraded" but there is no explanation of why. The system is currently reporting "The resync operation finished on volume data. However, the volume is still degraded." But why?
This all started with this message: "Volume: The volume data encountered an error and was made read-only. It is recommended to backup your data." But again, no explanation!
I've seen others advise that the only solution is to reformat all the drives, perhaps removing the bad one. OK, fair enough.
So, two questions:
1) How do I know what drive is bad? Or how to fix it? I've appended the mdstat.log as others have suggested this will explain all. But I don't see any mention here of bad or spare drives.
2) And perhaps more importantly: how do I know the backup I have created is any good? And that I got everything? On the shares page, I only see one internal object (containing documents, home folders, music, pictures, video) and the bar graph says that I've got 7.57 of 10.91TB available, suggesting I'm using about 3T of disk space. But the external drive is showing 7.7T available, 1.3T used (of a 10T drive). The backup page says the backup was completed. Is there any way to check?
Where do the time machine backups go? I just uploaded 400G of data from my laptop, and I don't see that reflected anywhere. I'm very nervous about whether I have a full backup.
Thanks,
Confused and worried,
- Malcolm
P.S. I'm not sure if this shows up in the log, but I did insert the new 10T drive in the fourth bay for a few minutes, before I realized that I really should use it as a backup and removed it. But I have more than enough disk space with only two drives.... so I'm not sure why I am degraded!!!
Personalities : [raid0] [raid1] [raid10] [raid6] [raid5] [raid4]
md1 : active raid10 sdd2[3] sdc2[2] sdb2[1] sda2[0]
      1044480 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
md127 : active raid5 sdd3[3] sda3[0] sdc3[2] sdb3[1]
      11711341568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  3.4% (200509312/5855670784) finish=2399.6min speed=39278K/sec
md0 : active raid1 sdd1[3] sda1[0] sdc1[2] sdb1[1]
      4190208 blocks super 1.2 [4/4] [UUUU]
unused devices: <none>

/dev/md/0:
           Version : 1.2
     Creation Time : Tue Jan 26 18:37:05 2016
        Raid Level : raid1
        Array Size : 4190208 (4.00 GiB 4.29 GB)
     Used Dev Size : 4190208 (4.00 GiB 4.29 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent
       Update Time : Mon Jul 13 10:48:22 2020
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0
Consistency Policy : unknown
              Name : 2fe68116:0  (local to host 2fe68116)
              UUID : 644fc5d1:e118a8cd:cbcf4b4d:58f1886a
            Events : 302

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1

/dev/md/data-0:
           Version : 1.2
     Creation Time : Tue Jan 26 18:37:05 2016
        Raid Level : raid5
        Array Size : 11711341568 (11168.81 GiB 11992.41 GB)
     Used Dev Size : 5855670784 (5584.40 GiB 5996.21 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent
       Update Time : Mon Jul 13 10:48:19 2020
             State : clean, reshaping
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0
            Layout : left-symmetric
        Chunk Size : 64K
Consistency Policy : unknown
    Reshape Status : 3% complete
     Delta Devices : 1, (3->4)
              Name : 2fe68116:data-0  (local to host 2fe68116)
              UUID : f90f48f4:b0d62274:a0f20ad6:f31c5cb0
            Events : 1970

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
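For anyone wanting to watch the reshape from the command line, the progress line in /proc/mdstat can be read programmatically. A minimal sketch in Python (the regex is written against the format shown above; other mdstat variants may differ):

```python
import re

def reshape_progress(mdstat_text):
    """Return (percent_done, minutes_remaining) for an in-progress
    reshape, or None if mdstat shows none running."""
    m = re.search(r"reshape\s*=\s*([\d.]+)%.*?finish=([\d.]+)min", mdstat_text)
    if m:
        return float(m.group(1)), float(m.group(2))
    return None

# The md127 progress line from the log above:
line = ("[>....................]  reshape = 3.4% (200509312/5855670784) "
        "finish=2399.6min speed=39278K/sec")
print(reshape_progress(line))  # (3.4, 2399.6)
```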
Re: ReadyNAS 316 Degraded and Backup Done?
@MalcolmSlaney wrote:
My ReadyNAS 316 status is showing as "degraded" but there is no explanation of why. The system is currently reporting "The resync operation finished on volume data. However, the volume is still degraded." But why?
md127 : active raid5 sdd3[3] sda3[0] sdc3[2] sdb3[1]
      11711341568 blocks super 1.2 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  3.4% (200509312/5855670784) finish=2399.6min speed=39278K/sec
The resync isn't finished (at least not according to mdstat). The volume will remain degraded until it finishes.
I am confused by your comment that you "have enough space on two drives". mdstat is showing 4 drives in your array. How many do you have inserted now?
@MalcolmSlaney wrote:
P.S. I'm not sure if this shows up in the log, but I did insert the new 10T drive in the fourth bay for a few minutes, before I realized that I really should use it as a backup and removed it. But I have more than enough disk space with only two drives.... so I'm not sure why I am degraded!!!
It appears you're not clear on how RAID works. It creates a virtual disk that uses all your drives, and the file system then uses that virtual disk. Your files are spread across all the drives. Inserting the drive for a couple of minutes and then removing it was a very bad idea; you could easily have lost your data. Fortunately it appears that the NAS didn't try to add the drive to your array, which suggests the drive was formatted.
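To illustrate the idea (a toy model, not the NAS's actual on-disk format): RAID 5 stripes data across the member drives and stores an XOR parity block, so the contents of any one missing drive can be rebuilt from the survivors. That is also why yanking a drive mid-sync is dangerous:

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together (RAID 5 parity)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three data blocks striped across three drives, parity on a fourth.
d1, d2, d3 = b"AAAA", b"BBBB", b"CCCC"
parity = xor_blocks([d1, d2, d3])

# If the drive holding d2 fails, its data is rebuilt from the rest.
rebuilt = xor_blocks([d1, d3, parity])
assert rebuilt == d2
```

Lose a second drive (or pull one while the array is already degraded) and there is no longer enough information to reconstruct anything.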
@MalcolmSlaney wrote:
2) And perhaps more importantly. How do I know the backup I have created is any good???? And I got everything? On the shares page, I only see one internal object (containing documents, home folders, music, pictures, video) and the the bar graphs says that I've got 7.57 of 10.91TB available, suggesting I'm using 3T of disk space. But the external drive is showing 7.7T available, 1.3T used (of a 10T drive). The backup page says the backup was completed. Is there any way to check?
You can look in the backup logs. You can also directly compare the file contents on the NAS with the file contents on the USB drive, and see if anything is missing.
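One way to do that comparison, sketched below (the paths are placeholders for wherever the NAS share and the USB drive are mounted): index both trees by relative path and flag files that are missing from the backup or differ in size.

```python
import os

def index_tree(root):
    """Map relative path -> file size for every file under root."""
    idx = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            idx[os.path.relpath(full, root)] = os.path.getsize(full)
    return idx

def compare(src_root, dst_root):
    """Return (files missing from dst, files whose sizes differ)."""
    src, dst = index_tree(src_root), index_tree(dst_root)
    missing = sorted(set(src) - set(dst))
    changed = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, changed
```

Size matching is only a quick sanity check; a checksum comparison (e.g. `rsync -rcn source/ backup/` as a dry run) is slower but more thorough.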
Do you use snapshots on the NAS? If so, how much space is being used by snapshots?
Also, how did you back up the NAS? Likely you didn't back up the Time Machine backups.
@MalcolmSlaney wrote:
So, two questions:
1) How do I know what drive is bad? Or how to fix it? I've appended the mdstat.log as others have suggested this will explain all. But I don't see any mention here of bad or spare drives.
mdstat.log shows what disks were part of the array (and what disks were not) when you downloaded the log zip. In your case, it shows 4 drives were in the array, though it is resyncing. The timing here matters - if you downloaded this log when the 10 TB drive was inserted, then it was clearly adding the 10 TB drive to your RAID array when you removed it.
Disk_info.log is a better place to look to see the drive health. If you want someone to review the data there, just put it in a reply (as you did with mdstat.log).
Re: ReadyNAS 316 Degraded and Backup Done?
I'm not sure why you got the message that the sync was complete, unless it finished (and failed) and you restarted the NAS, which restarted the sync. You appear to be playing with fire here. You most likely have a bad drive, and are trying to force the NAS to accept it again. That can result in loss of everything.
Time Machine backups are in your personal folder. Unfortunately, space used by personal folders doesn't show up on the Shares page except in the total for the volume. I'm not a Mac user, but I understand there is a special process needed to back up a Time Machine backup such that it can be transferred/restored and still be a valid backup.
Re: ReadyNAS 316 Degraded and Backup Done?
OK, it does appear that my first drive had errors. Are 16 uncorrectable sector errors enough to trash the disk?
First question: why is this information so hard to find? The only message I'm getting from the ReadyNAS is that my system is degraded. Thanks to this thread I knew to look in disk_info.log (attached), but it shouldn't have been so hard to find.
Second question: what can I do about it? I don't mind throwing away the drive, but I'd rather not throw away something that is working. Can the system mark these sectors as bad and go on about its work? Do I have to trust my backup and reformat all the volumes? Or just remove this drive and let it shift all the data around? Can I mark the disk as going out of service so it can rejigger things before I remove it?
Thanks.
-- Malcolm
Device: sda  (Controller: 0, Channel: 0)
  Model: HGST HDN726060ALE610  Serial: NAHYMWSX  Firmware: APGNT517H
  Class: SATA  RPM: 7200  Sectors: 11721045168
  Pool: data  PoolType: RAID 5  PoolState: 1  PoolHostId: 2fe68116
  Health data:
    ATA Error Count: 2
    Reallocated Sectors: 0
    Reallocation Events: 0
    Spin Retry Count: 0
    Current Pending Sector Count: 0
    Uncorrectable Sector Count: 16
    Temperature: 47
    Start/Stop Count: 37007
    Power-On Hours: 36540
    Power Cycle Count: 65
    Load Cycle Count: 37060

Device: sdb  (Controller: 0, Channel: 1)
  Model: HGST HDN726060ALE610  Serial: NAHK5L4Y  Firmware: APGNT517H
  Class: SATA  RPM: 7200  Sectors: 11721045168
  Pool: data  PoolType: RAID 5  PoolState: 1  PoolHostId: 2fe68116
  Health data:
    ATA Error Count: 0
    Reallocated Sectors: 0
    Reallocation Events: 0
    Spin Retry Count: 0
    Current Pending Sector Count: 0
    Uncorrectable Sector Count: 0
    Temperature: 50
    Start/Stop Count: 35708
    Power-On Hours: 36778
    Power Cycle Count: 74
    Load Cycle Count: 35757

Device: sdc  (Controller: 0, Channel: 2)
  Model: HGST HDN726060ALE610  Serial: NAHK6G1Y  Firmware: APGNT517H
  Class: SATA  RPM: 7200  Sectors: 11721045168
  Pool: data  PoolType: RAID 5  PoolState: 1  PoolHostId: 2fe68116
  Health data:
    ATA Error Count: 0
    Reallocated Sectors: 0
    Reallocation Events: 0
    Spin Retry Count: 0
    Current Pending Sector Count: 0
    Uncorrectable Sector Count: 0
    Temperature: 51
    Start/Stop Count: 35156
    Power-On Hours: 36786
    Power Cycle Count: 74
    Load Cycle Count: 35192

Device: sdd  (Controller: 0, Channel: 3)
  Model: HGST HUH721010ALE600  Serial: 2YKGEA1D  Firmware: LHGNT384H
  Class: SATA  RPM: 7200  Sectors: 19532873728
  Pool: data  PoolType: RAID 5  PoolState: 1  PoolHostId: 2fe68116
  Health data:
    ATA Error Count: 0
    Reallocated Sectors: 0
    Reallocation Events: 0
    Spin Retry Count: 0
    Current Pending Sector Count: 0
    Uncorrectable Sector Count: 0
    Temperature: 41
    Start/Stop Count: 1
    Power-On Hours: 1
    Power Cycle Count: 1
    Load Cycle Count: 64
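For future reference, counters like these can be pulled out of disk_info.log automatically rather than by eyeballing. A small sketch (field names taken from the excerpt above; other firmware/OS versions may format the log differently):

```python
import re

def disk_health(log_text):
    """Map device name -> Uncorrectable Sector Count from disk_info.log text."""
    health = {}
    pattern = r"Device:\s*(\w+).*?Uncorrectable Sector Count:\s*(\d+)"
    for dev, count in re.findall(pattern, log_text, re.DOTALL):
        health[dev] = int(count)
    return health

sample = ("Device: sda ... Uncorrectable Sector Count: 16 ... "
          "Device: sdb ... Uncorrectable Sector Count: 0")
print(disk_health(sample))  # {'sda': 16, 'sdb': 0}
```

Running this over logs downloaded at intervals makes it easy to see whether the counts are rising, which is the real warning sign.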
Re: ReadyNAS 316 Degraded and Backup Done?
Is the resync still running (or the system still degraded)? Normally I'd expect that the NAS would abort the resync at some point, and declare the drive failed.
@MalcolmSlaney wrote:
Are 16 uncorrectable sector errors enough to trash the disk?
You don't actually know how many bad sectors there are; you only know how many the ReadyNAS system has found.
One option is to run a disk test (one of the maintenance tasks on the volume settings wheel), and wait for that to complete. Then download the logs again, and see what the count is.
Alternatively, try powering down the NAS, moving the drive to a Windows PC, and testing it with WD's Lifeguard software. You'd run the long test (and if that passes, I'd personally also run the destructive erase test).
Or you can just replace the disk. I have replaced disks with similar error counts - especially if they were starting to rise.
@MalcolmSlaney wrote:
Can the system mark these sectors as bad and go on about its work?
Personally I haven't found that strategy to be very useful. Usually when my disks start to fail, the number of bad sectors continues to rise, so flagging them as bad doesn't work for very long. In the particular case of RAID, if another disk needed to be replaced, then any errors on this one would cause the volume to collapse.
So my own approach is to replace disks I can't trust.