Forum Discussion
berillio
Oct 24, 2021, Aspirant
vertical expansion - of the wrong NAS?
Hello Forum, I have an issue, but describing it became War and Peace (600 pages of it), so this is the short of it. I have three NASs (say A, B, C) (4x 4TB, 4x 4TB, 4x 8TB). A & B run OS-6.10.3, C ...
Sandshark
Oct 24, 2021, Sensei
Plan 1 has the risk that the older drive will fail during the NAS2 re-sync. Just how old it is and whether there are any SMART errors are factors in the likelihood of failure. How long can you afford to have NAS1 non-redundant? (You'd base that on how much "churn" there is on it, how well it is backed up, and whether or not you have any SMART errors on the remaining drives.) If you can wait for the time it takes to expand NAS2, then replace the older drive in NAS2 with an 8TB. Then, when that sync finishes, replace a newer one and move that newer 4TB into NAS1. Before moving it, you may want to zero it on a PC, especially if the volume name on NAS2 is the same as on NAS1.
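The thread doesn't spell out the zeroing step, so here is a minimal sketch of one way to do it on a Linux PC (an assumption; dd or a vendor wipe utility works just as well). It only zeros the first and last few MiB of the drive, which clears the partition table, its GPT backup, and any old RAID metadata headers living there, so the NAS treats the disk as blank. The /dev/sdX device name is hypothetical; the operation is destructive and needs root.

#!/usr/bin/env python3
"""Sketch only, not an official ReadyNAS procedure: blank the start and end
of a drive so old partition/RAID metadata no longer confuses the NAS."""

import os

DEVICE = "/dev/sdX"              # hypothetical device node -- check with lsblk first
WIPE_BYTES = 16 * 1024 * 1024    # zero 16 MiB at each end of the disk
CHUNK = 1024 * 1024

fd = os.open(DEVICE, os.O_WRONLY)
try:
    disk_size = os.lseek(fd, 0, os.SEEK_END)
    zeros = b"\x00" * CHUNK

    # Zero the start of the disk (partition table, primary metadata).
    os.lseek(fd, 0, os.SEEK_SET)
    for _ in range(WIPE_BYTES // CHUNK):
        os.write(fd, zeros)

    # Zero the end of the disk (GPT backup, end-of-disk superblocks).
    os.lseek(fd, max(disk_size - WIPE_BYTES, 0), os.SEEK_SET)
    for _ in range(WIPE_BYTES // CHUNK):
        os.write(fd, zeros)

    os.fsync(fd)
finally:
    os.close(fd)

print(f"Zeroed first and last {WIPE_BYTES // (1024 * 1024)} MiB of {DEVICE}")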
berillio
Oct 24, 2021, Aspirant
Thanks StephenB and Sandshark.
My fault, it was in the "war & peace" but not in the summary.
NAS A is OFFLINE
That is the NAS with the failed drive - actually it has not failed yet; I started receiving emails reporting 32-40-48 errors, so I saved the logs and switched it off before it did.
NAS A is rarely accessed or written to; it can wait offline a week or two.
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
- StephenB, Oct 24, 2021, Guru - Experienced User
berillio wrote:
Actually, if I power it on to copy the data BEFORE doing any volume rebuilding, would it be better to do it with the failing disk removed (it is already out), or power it back on and hot-remove it?
I once had a disk failure initiate a daisy-chain failure of a healthy disk, so I would prefer to REMOVE the disk before the NAS fails it.
The puzzle here is whether you have multiple disks on the edge or not.
If only a single disk is at risk, then removing it is fine. But if multiple disks are struggling, then I think it's best to leave them all in place. The theory there is that the bad sectors aren't likely to overlap, so all the data is still recoverable.
Personally I've never seen a case where a failing disk provoked a failure on a second drive.
- berillio, Oct 25, 2021, Aspirant
Stephen, Sandshark,
Thanks again for coming back
Some NAS/disk history.
The array in NAS A was built in stages in an RN104. It started with a 4TB Seagate; I added 2 WD Reds, one of which failed and was replaced, and then a 4th Red.
In March (?) 2020 the RN104 failed; in April I bought two RN214s (NAS A & B) and migrated the array from the RN104 to NAS A. The failed Red subsequently passed WD diagnostics and went into Slot 1 of NAS B with 3 more WD Reds, all powered up in May 2020. Faultless so far, and it is the MOST used; PCs are writing to it at all times (right now, for instance).
NAS A: slot 1 is the only drive with a non-zero pending/uncorrectable sector count [64]
Slot  Disk                DoM        Fitted
1     ST4000DM000-1F2168  20-May-13  Sept-2013 (?)
2     WD40EFRX-68N32N0    15-May-18  Oct-2018 (the replacement)
3     WD40EFRX-68WT0N0    02-Feb-15  April-2015
4     WD40EFRX-68N32N0    07-Mar-18  Oct-2018
From disk_info.log (any "health data" not reported here was ZERO); I don't know if there is other data of interest in the other logs.
NAS A,
Disk 1 (the “failing” disk)
Date of Manufacture (DoM) 20-May-2013
Fitted September (?) 2013
Current Pending Sector Count: 64
Uncorrectable Sector Count: 64
Temperature: 47
Start/Stop Count: 542
Power-On Hours: 57893
Power Cycle Count: 389
Load Cycle Count: 543
Disk 2
DoM 15-May-2018
Fitted October 2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 48
Start/Stop Count: 280
Power-On Hours: 22363
Power Cycle Count: 180
Load Cycle Count: 24
Disk 3
DoM 02-Feb-15
Fitted April -2015
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 49
Start/Stop Count: 461
Power-On Hours: 48656
Power Cycle Count: 340
Load Cycle Count: 7847
Disk 4
DoM 07-Mar-18
Fitted Oct-2018
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 298
Power-On Hours: 24677
Power Cycle Count: 193
Load Cycle Count: 307
NAS B - fully populated & started in May 2020
Slot  Disk              DoM          Notes
1     WD40EFRX-68WT0N0  02-Feb-2015  Failed, then passed DLGDIAG
2     WD40EFRX-68N32N0  26-Jan-2020
3     WD40EFRX-68N32N0  26-Jan-2020
4     WD40EFRX-68N32N0  26-Jan-2020
NAS B – all populated and started in May 2020
Disk 1
DoM 02-Feb-2015
Fitted April 2015, failed May 2020 (passed DLGDIAG)
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 46
Start/Stop Count: 1631
Power-On Hours: 39333
Power Cycle Count: 275
Load Cycle Count: 8708
Disk 2
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1296
Power-On Hours: 12847
Power Cycle Count: 66
Load Cycle Count: 1291
Disk 3
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 47
Start/Stop Count: 1207
Power-On Hours: 12830
Power Cycle Count: 61
Load Cycle Count: 1201
Disk 4
DoM 26 Jan 2020
Fitted 5 May 2020
Current Pending Sector Count: 0
Uncorrectable Sector Count: 0
Temperature: 43
Start/Stop Count: 1140
Power-On Hours: 12796
Power Cycle Count: 60
Load Cycle Count: 1134
Sandshark – I appreciate and share your concern about Disk 1 in NAS B, the next oldest disk.
Unfortunately, I am not at all familiar with reading SMART data.
This is smart_history.log for NAS A (a rough way to scan these logs follows the NAS B log below):
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2013-11-07 20:06:55 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 0 0 0
2015-04-10 10:06:04 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2015-04-10 22:49:49 WDC WD40EFRX-68WT0N0 WD-WCC4E5AU82XY 0 0 0 -1 -1 0 0 0
2018-10-03 13:53:27 WDC WD40EFRX-68N32N0 WD-WCC7K6HY4DPN 0 0 0 -1 -1 0 0 0
2019-01-30 23:14:03 WDC WD40EFRX-68N32N0 WD-WCC7K7HD8AJD 0 0 0 -1 -1 0 0 0
2019-04-19 12:12:05 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 8 8 0
2021-10-20 10:58:53 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 16 16 0
2021-10-20 11:02:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 24 24 0
2021-10-20 11:04:54 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 32 32 0
2021-10-20 11:06:57 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 40 40 0
2021-10-20 11:12:59 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 48 48 0
2021-10-20 11:15:00 ST4000DM000-1F2168 W300G5AN 0 0 0 0 0 64 64 0
And for NAS B:
time model serial realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
------------------- -------------------- -------------------- ------------ ------------ -------------- ---------- ------------ ------------ ----------------- ----------
2020-05-02 21:01:07 WDC WD40EFRX-68WT0N0 WD-WCC4E2KNHZ2N 0 0 0 -1 -1 0 0 0
2020-05-05 02:36:57 WDC WD40EFRX-68N32N0 WD-WCC7K0SF9U38 0 0 0 -1 -1 0 0 0
2020-05-05 20:58:12 WDC WD40EFRX-68N32N0 WD-WCC7K4JKN2L4 0 0 0 -1 -1 0 0 0
2020-05-07 04:06:28 WDC WD40EFRX-68N32N0 WD-WCC7K6YX6PYY 0 0 0 -1 -1 0 0 0
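For anyone not fluent in these logs, here is a rough sketch (not a NETGEAR tool) of how the smart_history.log dumps above could be scanned programmatically. It assumes the whitespace-separated column layout shown above, where -1 simply means the drive does not report that counter, and flags any entry whose reallocated, pending, uncorrectable or ATA error counts are above zero.

#!/usr/bin/env python3
"""Sketch: flag worrying rows in a ReadyNAS smart_history.log dump.
Assumes the layout shown above: date, time, model (possibly two words,
e.g. 'WDC WD40EFRX-...'), serial, then eight numeric counters."""

import sys

COUNTERS = ["realloc_sect", "realloc_evnt", "spin_retry_cnt", "ioedc",
            "cmd_timeouts", "pending_sect", "uncorrectable_err", "ata_errors"]
# Counters that should stay at zero on a healthy drive; -1 ("not reported")
# is ignored because the comparison below only triggers on values > 0.
WATCH = {"realloc_sect", "pending_sect", "uncorrectable_err", "ata_errors"}

def parse(path):
    with open(path) as fh:
        for line in fh:
            fields = line.split()
            # Skip blank lines, the header row, and the dashed separator.
            if not fields or not fields[0][0].isdigit():
                continue
            when = f"{fields[0]} {fields[1]}"
            values = [int(v) for v in fields[-8:]]   # last 8 columns are counters
            serial = fields[-9]
            model = " ".join(fields[2:-9])
            yield when, model, serial, dict(zip(COUNTERS, values))

def main(path):
    for when, model, serial, counters in parse(path):
        bad = {k: v for k, v in counters.items() if k in WATCH and v > 0}
        if bad:
            print(f"{when}  {model} ({serial}): {bad}")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "smart_history.log")

Run against the NAS A log above, only the ST4000DM000 entries from 2019-04-19 onward would be printed; the NAS B log would produce no output.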
StephenB – “disks on the edge”
Well, AFAIK everything is fine (but so it was last Thursday when, BY COMPLETE FLUKE, I had the admin page open, saw the alerts, and shut down NAS A).
Simply judging by the hours of operation (I don’t know how to judge the other parameters), the next oldest disks are NAS B slot 1 and NAS A Slot 3.
I am currently transferring ~1.25TB from NAS B to NAS C; I am planning to fit an 8TB drive to a PC to copy NAS B's data onto it.
“Personally I've never seen a case where a failing disk provoked a failure on a second drive.”
I believe that this is what happened on my NV+ v2, because only one disk was logging errors, and because now that the disks are in a PC with a Linux dual boot, two recovery suites can see the entire data set; one of them can also read/show the data in its free version. I never REALLY tried to recover the data because of lack of expertise (I know nothing about Linux), time, and priorities. But maybe I am wrong: there was a REAL failure on another disk and, although the data "appears" to be recoverable, the recovery would fail. Now any recovery attempt will have to wait even longer, as I have to take out those disks to make space for the 8TB in that PC.
- StephenB, Oct 25, 2021, Guru - Experienced User
berillio wrote:
time                 model               serial    realloc_sect realloc_evnt spin_retry_cnt ioedc cmd_timeouts pending_sect uncorrectable_err ata_errors
2013-11-07 20:06:55  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            0            0                 0
2019-04-19 12:12:05  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            8            8                 0
2021-10-20 10:58:53  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            16           16                0
2021-10-20 11:02:54  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            24           24                0
2021-10-20 11:04:54  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            32           32                0
2021-10-20 11:06:57  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            40           40                0
2021-10-20 11:12:59  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            48           48                0
2021-10-20 11:15:00  ST4000DM000-1F2168  W300G5AN  0            0            0              0     0            64           64                0
The log you posted is a table. I've changed the font to courier to make the columns line up.
Uncorrectable errors and pending sectors are identical, which suggests the errors occur when reading the disk (if they were write failures, you'd see reallocated sectors go up).
The significant increase in counts (from 8 to 64) on 10/20 is concerning. If this were my disk, I would definitely take it out of service.
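As a rough sketch (an illustration of the reasoning above, not an official SMART interpretation), that rule of thumb could be written as:

def classify_smart_trend(realloc_sect: int, pending_sect: int,
                         uncorrectable_err: int) -> str:
    """Heuristic only: matching pending/uncorrectable counts with no
    reallocations point at read errors; growing reallocations point at
    write failures the drive has already remapped."""
    if realloc_sect == 0 and pending_sect == 0 and uncorrectable_err == 0:
        return "no surface errors reported"
    if realloc_sect > 0:
        return "sectors remapped (write-side failures already handled by the drive)"
    if pending_sect > 0 and pending_sect == uncorrectable_err:
        return "unreadable sectors pending (read errors; data in them may be lost)"
    return "mixed indicators; watch the trend across log entries"

# The failing NAS A disk 1 from the table above:
print(classify_smart_trend(realloc_sect=0, pending_sect=64, uncorrectable_err=64))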
berillio wrote:
“Personally I've never seen a case where a failing disk provoked a failure on a second drive.”
I believe that this is what happened on my NV+ v2,
File System corruption is different from having a hardware disk failure somehow propagate.
I guess if a disk seriously overheated, then it might cause an adjacent disk to fail. But I think the NAS would shut down due to excess temperature well before that could happen. I can't think of any other mechanism.
Also, disks can fail well before that failure is detected - especially if you have files that aren't regularly accessed.