Forum Discussion
berillio
Oct 24, 2021 (Aspirant)
vertical expansion - of the wrong nas?
Hello Forum, I have an issue, but describing it became War and Peace (600 pages of it), so this is the short of it: I have three NASs (say A, B, C) (4x 4TB, 4x 4TB, 4x 8TB). A & B run OS-6.10.3, C ...
StephenB
Oct 24, 2021 (Guru - Experienced User)
berillio wrote:
A 4TB disk failed on NAS A. I can replace it with a new 8TB I have – and start expanding, which is good because I need more storage space.
but
That is NOT the NAS which needs expanding. It is NAS B which needs expanding; it has one aging disk but also three newish ones (April 2020).
What can/should I do?
I tend to favor minimal steps - so I'd normally just replace the failed disk in A with the 8 TB drive. Since it isn't going to expand (the volume only grows once a second larger disk is added), you can replace the 8 TB drive with a 4 TB later on if you want to.
However, your plan A would also work. Generally I test my drives in a PC before inserting them into the NAS - first running the full non-destructive generic test, and following that up with the full erase/write-zeros test. I'd recommend doing that on the 4 TB drive you want to re-use. Erasing it will also avoid any confusion in NAS A when you add it.
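For what it's worth, here is a minimal sketch of a comparable two-pass test as it could be scripted on a Linux PC - an illustration only, assuming smartmontools and badblocks are installed, run as root, and using /dev/sdX as a placeholder for the drive under test (the second pass wipes the drive):

#!/usr/bin/env python3
# Minimal sketch of the two-pass drive test described above, assuming a
# Linux PC with smartmontools and badblocks installed, run as root.
# DEVICE is a hypothetical placeholder -- point it at the drive under test.
# WARNING: the second pass overwrites every sector with zeros.
import subprocess

DEVICE = "/dev/sdX"  # placeholder -- substitute the real device node

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Pass 1: non-destructive read-write surface test (existing data is preserved).
run(["badblocks", "-nsv", DEVICE])

# Pass 2: destructive write test with a zero pattern (erase / write zeros).
run(["badblocks", "-wsv", "-t", "0", DEVICE])

# Afterwards, re-read the SMART attributes; growth in reallocated, pending
# or uncorrectable sector counts means the drive should not be reused.
run(["smartctl", "-A", DEVICE])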
It would be good to make sure you have a backup of the unique files of each NAS before you manipulate its drives. So if you want to expand anyway, getting a second 8 TB drive for that purpose makes sense.
I'm not sure what your long term plan here is. I suggest designating one NAS as the "primary" NAS, and putting all the content you have on that NAS. Then use the other 2 as backups. You can over time expand them all to have the same size (giving you two full backups), but in the beginning you can back up some shares on each (giving you one full backup between the two NAS).
If you go with that suggestion, then you first should figure out what capacity you want in each of the NAS.
berillio wrote: I can expand NAS A, then move the ENTIRE contents from NAS A to NAS B and vice versa. Lengthy (~8TB each way), incredibly messy, and (if I understood right, as they are 90% a mystery to me) all the snapshots will get jumbled up and possibly useless.
You might also want to re-think how much retention you really need in the snapshots. For me, snapshots are a way to recover from user error. If retention is too short, then you might not realize that you need to recover something until it's too late. If the retention is too long, then you end up using a lot of disk space, and getting a lot of fragmentation in the main shares. I tend to use 3-month retention on the snapshots (though some shares have shorter retention).
Just to clarify this. If you
- use NAS backup jobs to copy everything on A to B
- then use NAS backup jobs to copy everything on B to A
A and B will have identical content in the main shares, but A and B will retain their original snapshots. The original B snapshots won't be on A (and vice versa).
FWIW, I do use NAS->NAS backup. My RN526x is the primary NAS. I do share-by-share backup to the other NAS (running daily), with daily snapshots enabled on each NAS.
The snapshots on the backup NAS are similar to the primary, but not identical. If a folder is renamed on the main NAS, then the rsync backup ends up doing a copy/delete. So the snapshots will reflect that (using more storage on the backup). But it is close enough for my purpose. If I rename a really large folder, I can always go into the backup NAS and rename the folder there as well.
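To see concretely why a rename on the primary turns into a copy plus delete on the backup, here is a minimal local sketch (assuming rsync is installed; it uses throwaway temp directories rather than real NAS shares, and --dry-run so nothing is actually transferred; the folder and file names are made up for illustration):

#!/usr/bin/env python3
# Minimal local demo of rsync treating a renamed folder as copy + delete.
# Assumes rsync is installed; uses throwaway temp directories, not real
# NAS shares, and --dry-run so nothing is actually transferred.
import pathlib
import shutil
import subprocess
import tempfile

work = pathlib.Path(tempfile.mkdtemp())
src, dst = work / "src", work / "dst"

# Build a small source tree and an identical "backup" copy of it.
(src / "photos_2020").mkdir(parents=True)
(src / "photos_2020" / "img.dat").write_bytes(b"x" * 1024)
shutil.copytree(src, dst)

# Rename the folder on the "primary" side only.
(src / "photos_2020").rename(src / "photos-2020-archive")

# rsync has no notion that the two folders hold the same data: the itemized
# output shows the renamed folder being sent in full and the old one being
# deleted on the destination.
subprocess.run(
    ["rsync", "-a", "--delete", "--itemize-changes", "--dry-run",
     str(src) + "/", str(dst) + "/"],
    check=True,
)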
Sandshark
Oct 24, 2021 (Sensei)
Plan 1 has the risk that the older drive will fail during the NAS B re-sync. Just how old it is and whether there are any SMART errors are factors in the likelihood of failure. How long can you afford to have NAS A non-redundant? (Base that on how much "churn" there is on it, how well it is backed up, and whether or not you have any SMART errors on the remaining drives.) If you can wait for the time it takes to expand NAS B, then replace the older drive in NAS B with an 8TB. Then, when that sync finishes, replace a newer one and move that newer 4TB into NAS A. Before moving it, you may want to zero it on a PC, especially if the volume name on NAS B is the same as on NAS A.
- berillio, Oct 24, 2021 (Aspirant)
thanks Stephen B and Sandshark.
My fault, it was in the "war & peace" but not in the summary.
NAS A is OFFLINE.
That is the NAS with the failing drive - actually it has not failed yet; I started receiving emails as the error count climbed (32, 40, 48 errors), so I saved the logs and switched it off before it did.
NAS A is rarely accessed or written to; it can wait offline for a week or two.
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or to power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
- StephenB, Oct 24, 2021 (Guru - Experienced User)
berillio wrote:
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or to power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
The puzzle here is whether you have multiple disks on the edge or not.
If only a single disk is at risk, then removing it is fine. But if multiple disks are struggling, then I think it's best to leave them all in place. The theory there is that the bad sectors aren't likely to overlap, so all the data is still recoverable.
Personally I've never seen a case where a failing disk provoked a failure on a second drive.
- berillio, Oct 25, 2021 (Aspirant)
Stephen, Sandshark,
Thanks again for coming back
Some NAS/disk history.
The array in NAS A was built in stages in a RN104. Started with a 4TB Seagate, added 2 WD Reds, one of which failed and was replaced, and a 4th Red.
In March (?) 2020 the RN104 failed; in April I bought two RN214s (NAS A & B) and migrated the array from the RN104 to NAS A. The failed Red subsequently passed WD diagnostics and went into Slot 1 of NAS B with 3 more WD Reds, all powered up in May 2020. Faultless so far, and it is the MOST used; PCs are writing to it at all times (i.e., right now).
NAS A: slot 1 is the only drive which has a non-zero sector count [64]
slot  disk                DoM        fitted
1     ST4000DM000-1F2168  20-May-13  Sept 2013 (?)
2     WD40EFRX-68N32N0    15-May-18  Oct 2018 (the replacement)
3     WD40EFRX-68WT0N0    02-Feb-15  April 2015
4     WD40EFRX-68N32N0    07-Mar-18  Oct 2018
From disk_info.log (any health data not listed here was ZERO); I don't know if there is other data of interest in the other logs.
NAS A,
Disk 1 (the “failing” disk)
Date of Manufacture (DoM) 20-May-2013
Fitted September (?) 2013
Current Pending Sector Count: 64
Uncorrectable Sector Count: 64
Temperature: 47
Start/Stop Count: 542
Power-On Hours: 57893
Power Cycle Count: 389
Load Cycle Count: 543
Disk 2
DoM 15-May-2018
Fitted October 2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 48
Start/Stop Count: 280
Power-On Hours: 22363
Power Cycle Count: 180
Load Cycle Count: 24
Disk 3
DoM 02-Feb-15
Fitted April 2015
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 49
Start/Stop Count: 461
Power-On Hours: 48656
Power Cycle Count: 340
Load Cycle Count: 7847
Disk 4
DoM 07-Mar-18
Fitted Oct-2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 298
Power-On Hours: 24677
Power Cycle Count: 193
Load Cycle Count: 307
NAS B - fully populated & started in May 2020
slot  disk              DoM          notes
1     WD40EFRX-68WT0N0  02-Feb-2015  failed, then passed WD Data Lifeguard diagnostics
2     WD40EFRX-68N32N0  26-Jan-2020
3     WD40EFRX-68N32N0  26-Jan-2020
4     WD40EFRX-68N32N0  26-Jan-2020
NAS B – all populated and started in May 2020
Disk 1
DoM 02-Feb-2015
Fitted April 2015, failed May 2020 (passed WD Data Lifeguard diagnostics)
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 46
Start/Stop Count: 1631
Power-On Hours: 39333
Power Cycle Count: 275
Load Cycle Count: 8708
Disk 2
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1296
Power-On Hours: 12847
Power Cycle Count: 66
Load Cycle Count: 1291
Disk 3
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1207
Power-On Hours: 12830
Power Cycle Count: 61
Load Cycle Count: 1201
Disk 4
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 1140
Power-On Hours: 12796
Power Cycle Count: 60
Load Cycle Count: 1134
Sandshark – I appreciate and share your concern about Disk 1 in NAS B, the next oldest disk.
Unfortunately, I am not at all familiar with reading SMART data.
This is smart_history.log for NAS A
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2013-11-07 20:06:55 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 0 0 0
2015-04-10 10:06:04 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2015-04-10 22:49:49 WDC WD40EFRX-68WT0N0 WD-WCC4E5AU82XY 0 0 0 -1 -1 0 0 0
2018-10-03 13:53:27 WDC WD40EFRX-68N32N0 WD-WCC7K6HY4DPN 0 0 0 -1 -1 0 0 0
2019-01-30 23:14:03 WDC WD40EFRX-68N32N0 WD-WCC7K7HD8AJD 0 0 0 -1 -1 0 0 0
2019-04-19 12:12:05 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 8 8 0
2021-10-20 10:58:53 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 16 16 0
2021-10-20 11:02:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 24 24 0
2021-10-20 11:04:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 32 32 0
2021-10-20 11:06:57 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 40 40 0
2021-10-20 11:12:59 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 48 48 0
2021-10-20 11:15:00 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 64 64 0
And for NAS B:
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2020-05-02 21:01:07 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2020-05-05 02:36:57 WDC WD40EFRX-68N32N0 WD-WCC7K0SF9U38 0 0 0 -1 -1 0 0 0
2020-05-05 20:58:12 WDC WD40EFRX-68N32N0 WD-WCC7K4JKN2L4 0 0 0 -1 -1 0 0 0
2020-05-07 04:06:28 WDC WD40EFRX-68N32N0 WD-WCC7K6YX6PYY 0 0 0 -1 -1 0 0 0
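Incidentally, those smart_history.log excerpts are plain whitespace-separated columns, so a few lines of Python can pick out any drive whose pending or uncorrectable counts are non-zero. This is a hypothetical helper, not a NETGEAR tool; point LOGFILE at the file extracted from the NAS log zip, and note the column layout is taken from the excerpts above:

#!/usr/bin/env python3
# Hypothetical helper (not a NETGEAR tool): scan smart_history.log and
# flag drives with non-zero pending/uncorrectable sector counts.
# Column layout is taken from the log excerpts above.

LOGFILE = "smart_history.log"  # assumed path to the extracted log

latest = {}  # serial -> (timestamp, model, pending, uncorrectable)

with open(LOGFILE) as fh:
    for line in fh:
        fields = line.split()
        # Skip the header row and the dashed separator line.
        if len(fields) < 11 or not fields[0][:1].isdigit():
            continue
        timestamp = " ".join(fields[:2])   # date + time
        counters = fields[-8:]             # the eight numeric columns
        serial = fields[-9]
        model = " ".join(fields[2:-9])     # model may contain a space ("WDC WD40...")
        pending = int(counters[5])         # pending_sect
        uncorrectable = int(counters[6])   # uncorrectable_err
        latest[serial] = (timestamp, model, pending, uncorrectable)

for serial, (timestamp, model, pending, uncorrectable) in latest.items():
    flag = "  <-- watch/replace this drive" if pending or uncorrectable else ""
    print(f"{timestamp}  {model:<22} {serial:<18} "
          f"pending={pending} uncorrectable={uncorrectable}{flag}")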
Stephen, re "disks on the edge":
Well, AFAIK, everything is fine (but so it was last Thursday when, by COMPLETE FLUKE, I happened to have the admin page open, saw the alerts, and shut down NAS A).
Simply judging by the hours of operation (I don’t know how to judge the other parameters), the next oldest disks are NAS B slot 1 and NAS A Slot 3.
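For a rough sense of scale, the power-on hours from the logs above convert to continuous years like this (a quick sketch; 24 x 365 = 8,760 hours per year):

# Rough conversion of the power-on hours reported above into continuous years.
HOURS_PER_YEAR = 24 * 365

drives = {
    "NAS A slot 1 (ST4000DM000, failing)": 57893,
    "NAS A slot 3 (WD40EFRX)": 48656,
    "NAS B slot 1 (WD40EFRX, ex-RN104)": 39333,
}

for name, hours in drives.items():
    print(f"{name}: {hours} h  ~= {hours / HOURS_PER_YEAR:.1f} years powered on")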
I am currently transferring ~1.25TB from NAS B to NAS C; I am planning to fit an 8TB drive to a PC to copy the NAS B data onto it.
“Personally I've never seen a case where a failing disk provoked a failure on a second drive.”
I believe that this is what happened on my NV+ v2, because only one disk was logging errors, and because now that the disks are in a PC with a Linux dual boot, two recovery suites can see the entire data set, and one of them can also read/show the data in its free version. I never REALLY tried to recover the data for lack of expertise (I know nothing about Linux), time, and priority. But maybe I am wrong: there was a REAL failure on another disk, and although the data "appears" to be recoverable, the recovery would fail. Now any recovery attempt will have to wait even longer, as I have to take those disks out to make space for the 8TB in that PC.