Forum Discussion
berillio
Oct 24, 2021, Aspirant
vertical expansion - of the wrong NAS?
Hello Forum, I have an issue, but describing it became War and Peace (600 pages of it), so this is the short of it. I have three NASs (say A, B, C) (4x 4TB, 4x 4TB, 4x 8TB). A & B run OS-6.10.3, C ...
Sandshark
Oct 24, 2021, Sensei
Plan 1 has the risk that the older drive will fail during the NAS2 re-sync. Just how old it is and whether there are any SMART errors are factors in the likelihood of failure. How long can you afford to have NAS1 non-redundant? (You'd base that on how much "churn" there is on it, how well it is backed up, and whether or not you have any SMART errors on the remaining drives.) If you can wait for the time it takes to expand NAS2, then replace the older drive in NAS2 with an 8TB. Then, when that sync finishes, replace a newer one and move that newer 4TB into NAS1. Before moving it, you may want to zero it on a PC, especially if the volume name on NAS2 is the same as on NAS1.
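The thread doesn't spell out the zeroing step, so here is a minimal sketch of one way to do it on a Linux PC (an assumption; dd or a vendor wipe utility works just as well). It only zeros the first and last few MiB of the drive, which clears the partition table, its GPT backup, and any old RAID metadata headers living there, so the NAS treats the disk as blank. The /dev/sdX device name is hypothetical; the operation is destructive and needs root.

#!/usr/bin/env python3
"""Sketch only, not an official ReadyNAS procedure: blank the start and end
of a drive so old partition/RAID metadata no longer confuses the NAS."""

import os

DEVICE = "/dev/sdX"              # hypothetical device node -- check with lsblk first
WIPE_BYTES = 16 * 1024 * 1024    # zero 16 MiB at each end of the disk
CHUNK = 1024 * 1024

fd = os.open(DEVICE, os.O_WRONLY)
try:
    disk_size = os.lseek(fd, 0, os.SEEK_END)
    zeros = b"\x00" * CHUNK

    # Zero the start of the disk (partition table, primary metadata).
    os.lseek(fd, 0, os.SEEK_SET)
    for _ in range(WIPE_BYTES // CHUNK):
        os.write(fd, zeros)

    # Zero the end of the disk (GPT backup, end-of-disk superblocks).
    os.lseek(fd, max(disk_size - WIPE_BYTES, 0), os.SEEK_SET)
    for _ in range(WIPE_BYTES // CHUNK):
        os.write(fd, zeros)

    os.fsync(fd)
finally:
    os.close(fd)

print(f"Zeroed first and last {WIPE_BYTES // (1024 * 1024)} MiB of {DEVICE}")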
berillio
Oct 24, 2021, Aspirant
Thanks StephenB and Sandshark.
My fault, it was in the "war & peace" but not in the summary.
NAS A is OFFLINE
That is the NAS with the failed drive - actually it has not failed yet; I started receiving emails reporting 32-40-48 errors, so I saved the logs and switched it off before it did.
NAS A is rarely accessed or written to; it can wait offline a week or two.
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
- StephenB, Oct 24, 2021, Guru - Experienced User
berillio wrote:
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
The puzzle here is whether you have multiple disks on the edge or not.
If only a single disk is at risk, then removing it is fine. But if multiple disks are struggling, then I think it's best to leave them all in place. The theory there is that the bad sectors aren't likely to overlap, so all the data is still recoverable.
Personally I've never seen a case where a failing disk provoked a failure on a second drive.
- berillio, Oct 25, 2021, Aspirant
Stephen, Sandshark,
Thanks again for coming back
Some NAS/disk history.
The array in NAS A was built in stages in an RN104. It started with a 4TB Seagate; I added 2 WD Reds, one of which failed and was replaced, and then a 4th Red.
In March (?) 2020 the RN104 failed; in April I bought two RN214s (NAS A & B) and migrated the array from the RN104 to NAS A. The failed Red subsequently passed WD diagnostics and went into Slot 1 of NAS B with 3 more WD Reds, all powered up in May 2020. Faultless so far, and it is the MOST used; PCs are writing to it at all times (right now, for instance).
NAS A: slot 1 is the only drive with a non-zero pending/uncorrectable sector count [64]
Slot  Disk                DoM        Fitted
1     ST4000DM000-1F2168  20-May-13  Sept-2013 (?)
2     WD40EFRX-68N32N0    15-May-18  Oct-2018 (the replacement)
3     WD40EFRX-68WT0N0    02-Feb-15  April-2015
4     WD40EFRX-68N32N0    07-Mar-18  Oct-2018
From disk_info.log (any "health data" not reported here was ZERO); I don't know if there is other data of interest in the other logs.
NAS A,
Disk 1 (the “failing” disk)
Date of Manufacture (DoM) 20-May-2013
Fitted September (?) 2013
Current Pending Sector Count: 64
Uncorrectable Sector Count: 64
Temperature: 47
Start/Stop Count: 542
Power-On Hours: 57893
Power Cycle Count: 389
Load Cycle Count: 543
Disk 2
DoM 15-May-2018
Fitted October 2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 48
Start/Stop Count: 280
Power-On Hours: 22363
Power Cycle Count: 180
Load Cycle Count: 24
Disk 3
DoM 02-Feb-15
Fitted April -2015
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 49
Start/Stop Count: 461
Power-On Hours: 48656
Power Cycle Count: 340
Load Cycle Count: 7847
Disk 4
DoM 07-Mar-18
Fitted Oct-2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 298
Power-On Hours: 24677
Power Cycle Count: 193
Load Cycle Count: 307
NAS B - fully populated & started in May 2020
Slot  Disk              DoM          Notes
1     WD40EFRX-68WT0N0  02-Feb-2015  Failed, then passed DLGDIAG
2     WD40EFRX-68N32N0  26-Jan-2020
3     WD40EFRX-68N32N0  26-Jan-2020
4     WD40EFRX-68N32N0  26-Jan-2020
NAS B – all populated and started in May 2020
Disk 1
DoM 02-Feb-2015
Fitted April 2015, failed May 2020 (passed DLGDIAG)
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 46
Start/Stop Count: 1631
Power-On Hours: 39333
Power Cycle Count: 275
Load Cycle Count: 8708
Disk 2
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1296
Power-On Hours: 12847
Power Cycle Count: 66
Load Cycle Count: 1291
Disk 3
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1207
Power-On Hours: 12830
Power Cycle Count: 61
Load Cycle Count: 1201
Disk 4
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 1140
Power-On Hours: 12796
Power Cycle Count: 60
Load Cycle Count: 1134
Sandshark – I appreciate and share your concern about Disk 1 in NAS B, the next oldest disk.
Unfortunately, I am not at all familiar with reading SMART data.
This is smart_history.log for NAS A (a rough way to scan these logs follows the NAS B log below):
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2013-11-07 20:06:55 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 0 0 0
2015-04-10 10:06:04 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2015-04-10 22:49:49 WDC WD40EFRX-68WT0N0 WD-WCC4E5AU82XY 0 0 0 -1 -1 0 0 0
2018-10-03 13:53:27 WDC WD40EFRX-68N32N0 WD-WCC7K6HY4DPN 0 0 0 -1 -1 0 0 0
2019-01-30 23:14:03 WDC WD40EFRX-68N32N0 WD-WCC7K7HD8AJD 0 0 0 -1 -1 0 0 0
2019-04-19 12:12:05 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 8 8 0
2021-10-20 10:58:53 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 16 16 0
2021-10-20 11:02:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 24 24 0
2021-10-20 11:04:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 32 32 0
2021-10-20 11:06:57 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 40 40 0
2021-10-20 11:12:59 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 48 48 0
2021-10-20 11:15:00 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 64 64 0
And for NAS B:
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2020-05-02 21:01:07 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2020-05-05 02:36:57 WDC WD40EFRX-68N32N0 WD-WCC7K0SF9U38 0 0 0 -1 -1 0 0 0
2020-05-05 20:58:12 WDC WD40EFRX-68N32N0 WD-WCC7K4JKN2L4 0 0 0 -1 -1 0 0 0
2020-05-07 04:06:28 WDC WD40EFRX-68N32N0 WD-WCC7K6YX6PYY 0 0 0 -1 -1 0 0 0
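For anyone not fluent in these logs, here is a rough sketch (not a NETGEAR tool) of how the smart_history.log dumps above could be scanned programmatically. It assumes the whitespace-separated column layout shown above, where -1 simply means the drive does not report that counter, and flags any entry whose reallocated, pending, uncorrectable or ATA error counts are above zero.

#!/usr/bin/env python3
"""Sketch: flag worrying rows in a ReadyNAS smart_history.log dump.
Assumes the layout shown above: date, time, model (possibly two words,
e.g. 'WDC WD40EFRX-...'), serial, then eight numeric counters."""

import sys

COUNTERS = ["realloc_sect", "realloc_evnt", "spin_retry_cnt", "ioedc",
            "cmd_timeouts", "pending_sect", "uncorrectable_err", "ata_errors"]
# Counters that should stay at zero on a healthy drive; -1 ("not reported")
# is ignored because the comparison below only triggers on values > 0.
WATCH = {"realloc_sect", "pending_sect", "uncorrectable_err", "ata_errors"}

def parse(path):
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            # Skip blank lines, the header row, and the dashed separator.
            if not fields or not fields[0][0].isdigit():
                continue
            when = f"{fields[0]} {fields[1]}"
            values = [int(v) for v in fields[-8:]]   # last 8 columns are counters
            serial = fields[-9]
            model = " ".join(fields[2:-9])
            yield when, model, serial, dict(zip(COUNTERS, values))

def main(path):
    for when, model, serial, counters in parse(path):
        bad = {k: v for k, v in counters.items() if k in WATCH and v > 0}
        if bad:
            print(f"{when}  {model} ({serial}): {bad}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "smart_history.log")

Run against the NAS A log above, only the ST4000DM000 entries from 2019-04-19 onward would be printed; the NAS B log would produce no output.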
StephenB – “disks on the edge”
Well, AFAIK everything is fine (but so it was last Thursday when, BY COMPLETE FLUKE, I had the admin page open, saw the alerts, and shut down NAS A).
Simply judging by the hours of operation (I don’t know how to judge the other parameters), the next oldest disks are NAS B slot 1 and NAS A Slot 3.
I am currently transferring ~1.25TB from NAS B to NAS C; I am planning to fit an 8TB drive to a PC to copy NAS B's data onto it.
“Personally I've never seen a case where a failing disk provoked a failure on a second drive.”
I believe that this is what happened on my NV+ v2, because only one disk was logging errors, and because now that the disks are in a PC with a Linux dual boot, two recovery suites can see the entire data set; one of them can also read/show the data in its free version. I never REALLY tried to recover the data because of lack of expertise (I know nothing about Linux), time, and priorities. But maybe I am wrong: there was a REAL failure on another disk and, although the data "appears" to be recoverable, the recovery would fail. Now any recovery attempt will have to wait even longer, as I have to take out those disks to make space for the 8TB in that PC.
- StephenB, Oct 25, 2021, Guru - Experienced User
berillio wrote:
time                 model               serial    realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
2013-11-07 20:06:55  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            0            0                 0
2019-04-19 12:12:05  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            8            8                 0
2021-10-20 10:58:53  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            16           16                0
2021-10-20 11:02:54  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            24           24                0
2021-10-20 11:04:54  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            32           32                0
2021-10-20 11:06:57  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            40           40                0
2021-10-20 11:12:59  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            48           48                0
2021-10-20 11:15:00  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            64           64                0
The log you posted is a table. I've changed the font to courier to make the columns line up.
Uncorrectable errors and pending sectors are identical, which suggests the errors occur when reading the disk (if they were write failures, you'd see reallocated sectors go up).
The significant increase in counts (from 8 to 64) on 10/20 is concerning. If this were my disk, I would definitely take it out of service.
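As a rough sketch (an illustration of the reasoning above, not an official SMART interpretation), that rule of thumb could be written as:

def classify_smart_trend(realloc_sect: int, pending_sect: int,
                         uncorrectable_err: int) -> str:
    """Heuristic only: matching pending/uncorrectable counts with no
    reallocations point at read errors; growing reallocations point at
    write failures the drive has already remapped."""
    if realloc_sect == 0 and pending_sect == 0 and uncorrectable_err == 0:
        return "no surface errors reported"
    if realloc_sect > 0:
        return "sectors remapped (write-side failures already handled by the drive)"
    if pending_sect > 0 and pending_sect == uncorrectable_err:
        return "unreadable sectors pending (read errors; data in them may be lost)"
    return "mixed indicators; watch the trend across log entries"

# The failing NAS A disk 1 from the table above:
print(classify_smart_trend(realloc_sect=0, pending_sect=64, uncorrectable_err=64))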
berillio wrote:
“Personally I've never seen a case where a failing disk provoked a failure on a second drive.”
I believe that this is what happened on my NV+ v2,
File System corruption is different from having a hardware disk failure somehow propagate.
I guess if a disk seriously overheated, then it might cause an adjacent disk to fail. But I think the NAS would shut down due to excess temperature well before that could happen. I can't think of any other mechanism.
Also, disks can fail well before that failure is detected - especially if you have files that aren't regularly accessed.