
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

gregb_pro
Aspirant

Replacement procedure for suspect drive in ReadyNAS Pro BE

BACKGROUND:
ReadyNAS Pro Business Edition with six WD 750 GB enterprise drives (WDC WD7502ABYS-01A6B0), all drives with 03.00C05 firmware. The drives show about 57,000 runtime hours (about 6.5 years). The NAS has never had a drive failure.
The ReadyNAS is running RAIDiator 4.2.27, and drives are configured with X-RAID2.
Operating temperatures have been consistent over the life of the system: SYS: 50 C (122 F), Temp CPU: 20.5 C (69 F). All six drives consistently run at 39-42 C (102-107 F).
I have several spare equivalent drives with zero runtime hours.
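
As a quick cross-check of the figures above, here is a minimal sketch of the arithmetic. The 8,760-hour year ignores leap days, and the helper names are illustrative only, not part of any ReadyNAS tool.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760; leap days ignored (assumption)


def hours_to_years(hours):
    """Convert drive runtime hours to years of continuous operation."""
    return hours / HOURS_PER_YEAR


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


runtime_years = hours_to_years(57_000)
drive_temp_range_f = (c_to_f(39), c_to_f(42))

print(f"{runtime_years:.1f} years of runtime")
print(f"{drive_temp_range_f[0]:.1f}-{drive_temp_range_f[1]:.1f} F drive temperature")
```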


ISSUE:

Drive #2 (top middle) has a pronounced bearing noise with off-nominal (also pronounced) hand-sensed vibration.
SMART data shows no issues for any disk.

QUESTION:
What is the recommended replacement procedure for suspect drive in ReadyNAS Pro BE?

For example: wait for the drive to fail and force a system shutdown; or
replace it hot (while running); or
shut down and replace it cold; or
pull the suspect drive to verify the pending failure, then shut down; or
...

Obviously backup first.

Thanks, Greg

 

Message 1 of 7
StephenB
Guru

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

I always recommend a hot-swap - because it ensures that the NAS will detect the removal and insertion.  However, a cold insertion of an unformatted disk should also work.

 

I'd also replace the drive now; I see no reason to wait.

 

You might want to run Western Digital's Lifeguard diag on the replacement first (that is a Windows application, available on the WDC web site).  That's just to make sure nothing happened while the drive was on the shelf.

Message 2 of 7
gregb_pro
Aspirant

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

Suspect drive #2 was removed as suggested (system up). Unfortunately, the "noise" did not subside with removal of the drive. Tested the drive with WD Lifeguard Diagnostic for Windows (v1.24) and wrote zeros to drive (full test, full zero). Re-installed drive back into ReadyNAS, and the system rebuilt back to X-RAID2 as expected. ReadyNAS log:

 

Sun Aug 23 08:40:34 MDT 2015    Disk removal detected. [Disk 2]
Sun Aug 23 08:40:35 MDT 2015    A disk was removed from the ReadyNAS. For [...]
Sun Aug 23 08:40:42 MDT 2015    Disk failure detected.
Sun Aug 23 08:40:42 MDT 2015    If the failed disk is used in a RAID level [...]
Sun Aug 23 13:42:02 MDT 2015    New disk detected. If multiple disks have [...]
Sun Aug 23 13:44:13 MDT 2015    Data volume will be rebuilt with disk 2.
Sun Aug 23 13:44:35 MDT 2015    RAID sync started on volume C.
Sun Aug 23 17:47:19 MDT 2015    RAID sync finished on volume C.
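
For anyone wanting the rebuild time from the log above, here is a minimal sketch that subtracts the two "RAID sync" timestamps. It assumes both lines share one time zone, so the zone token is simply dropped, and that English month abbreviations are parsed under the default locale. `parse_log_time` is a hypothetical helper, not a ReadyNAS utility.

```python
from datetime import datetime


def parse_log_time(line):
    """Parse the leading timestamp of a Frontview log line.

    The first six whitespace-separated tokens are weekday, month, day,
    clock time, time zone, and year. The zone token is ignored because
    strptime cannot reliably parse abbreviations such as MDT.
    """
    _weekday, month, day, clock, _zone, year = line.split()[:6]
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


started = parse_log_time("Sun Aug 23 13:44:35 MDT 2015    RAID sync started on volume C.")
finished = parse_log_time("Sun Aug 23 17:47:19 MDT 2015    RAID sync finished on volume C.")

sync_duration = finished - started
print(sync_duration)  # 4:02:44
```

That works out to roughly four hours to resync one 750 GB member of the six-disk volume.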

System appeared to run well for 5 days.

 

Still trying to diagnose noise source. Remove suspect drive from bay 1 while system up:

Fri Aug 28 07:36:55 MDT 2015    Disk removal detected. [Disk 1]
Fri Aug 28 07:36:55 MDT 2015    A disk was removed from the ReadyNAS.
Fri Aug 28 07:37:21 MDT 2015    Disk failure detected.
Fri Aug 28 07:37:21 MDT 2015    If the failed disk is used in a RAID level [...]

 

Test and zero new zero-hour drive with WD Lifeguard Diagnostic for Windows (v1.24). Passed.
Test and zero old drive #1 with WD Lifeguard Diagnostic for Windows (v1.24). Passed.
Insert *new* drive into bay 1; surprisingly it FAILED SMART test: Arrgh!

 

Fri Aug 28 14:42:28 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 14:42:41 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.


Remove new drive from bay 1, then
Insert old drive into bay 1; it also FAILED SMART test:

 

Fri Aug 28 14:55:12 MDT 2015    Disk removal detected. [Disk 1]
Fri Aug 28 14:55:12 MDT 2015    A disk was removed from the ReadyNAS. For full [...]
Fri Aug 28 14:59:51 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 15:00:04 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.


Remove old drive from bay 1; then
Test and quick zero old drive with WD Lifeguard Diagnostic (Passed);
Shut down and reboot; then
Insert old drive into bay 1; it also FAILED SMART test:

 

Fri Aug 28 15:06:55 MDT 2015    Disk removal detected. [Disk 1]
Fri Aug 28 15:06:55 MDT 2015    A disk was removed from the ReadyNAS. For full [...]
Fri Aug 28 15:08:30 MDT 2015    Please close this browser session and use RAIDar [...]
Fri Aug 28 15:09:47 MDT 2015    System is up.
Fri Aug 28 15:19:16 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 15:19:27 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.

So, bay 1 failed two successive SMART tests (two different drives), and failed again after a reboot.
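
The tally above can be reproduced from the log text with a short sketch like the following. The lines are pasted from the excerpts in this post (truncations left as `[...]`), and the regular expression only matches the exact wording shown there; other firmware versions may word the message differently.

```python
import re
from collections import Counter

# Lines copied from the Frontview log excerpts above.
LOG = """\
Fri Aug 28 14:42:28 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 14:42:41 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.
Fri Aug 28 14:59:51 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 15:00:04 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.
Fri Aug 28 15:19:16 MDT 2015    New disk detected. If multiple disks have been [...]
Fri Aug 28 15:19:27 MDT 2015    Newly added disk has failed SMART test. Please check disk 1.
"""

SMART_FAILURE = re.compile(r"failed SMART test\. Please check disk (\d+)\.")

# Count how many times each bay reported a SMART failure on insertion.
failures_per_bay = Counter(int(m.group(1)) for m in SMART_FAILURE.finditer(LOG))

for bay, count in sorted(failures_per_bay.items()):
    print(f"bay {bay}: {count} SMART failure message(s)")
```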

Both drives pass WD Lifeguard Diagnostic (one is zero-hour drive).

Looking for recommendations.

Thanks, gregb

 

Message 3 of 7
gregb_pro
Aspirant

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

Tested drives again with WD Lifeguard Diagnostic for Windows; both show PASS.

Both drives show no SMART issues (all zeros in the expected warning indicators).

Does RAIDiator 4.2.27 keep an internal database of drives and serial numbers, with the capability to reject drives based on database history? (Neither of these drives has actually failed, though.)

Thanks, gregb

Message 4 of 7
gregb_pro
Aspirant

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

I am still looking for a solution to return the device back to 6 active drives (X-RAID2).

WD Lifeguard Diagnostic shows two good replacement drives: one drive with 57K hours and one new drive.
Both pass WD Lifeguard Diagnostic tests, both have zeros written to all blocks, both show good SMART data.

However, for both drives, ReadyNAS Frontview status indicates "Newly added disk has failed SMART test" when inserting into bay 1 (as noted in previous posts). The "failed SMART test" message doesn't show up in system.log.

 

Old drive:

 

Raw Read Error Rate         0
Spin Up Time             1133
Start Stop Count           29
Reallocated Sector Count    0
Seek Error Rate             0
Power On Hours          57150
Spin Retry Count            0
Calibration Retry Count     0
Power Cycle Count          27
Power-Off Retract Count    10
Load Cycle Count           29
Temperature Celsius        43
Reallocated Event Count     0
Current Pending Sector      0
Offline Uncorrectable       0
UDMA CRC Error Count        0
Multi Zone Error Rate       0
ATA Error Count             0

New drive:

Raw Read Error Rate         0
Spin Up Time             1200
Start Stop Count            8
Reallocated Sector Count    0
Seek Error Rate             0
Power On Hours            140
Spin Retry Count            0
Calibration Retry Count     0
Power Cycle Count           6
Power-Off Retract Count     5
Load Cycle Count            8
Temperature Celsius        33
Reallocated Event Count     0
Current Pending Sector      0
Offline Uncorrectable       0
UDMA CRC Error Count        0
Multi Zone Error Rate       0
ATA Error Count             0
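
A minimal sketch of how the two tables above could be checked mechanically. The `WATCHED` set is my own choice of attributes commonly treated as warning indicators, not the list RAIDiator uses internally, and the function names are hypothetical.

```python
OLD_DRIVE = """
Raw Read Error Rate         0
Spin Up Time             1133
Start Stop Count           29
Reallocated Sector Count    0
Seek Error Rate             0
Power On Hours          57150
Spin Retry Count            0
Calibration Retry Count     0
Power Cycle Count          27
Power-Off Retract Count    10
Load Cycle Count           29
Temperature Celsius        43
Reallocated Event Count     0
Current Pending Sector      0
Offline Uncorrectable       0
UDMA CRC Error Count        0
Multi Zone Error Rate       0
ATA Error Count             0
"""

NEW_DRIVE = """
Raw Read Error Rate         0
Spin Up Time             1200
Start Stop Count            8
Reallocated Sector Count    0
Seek Error Rate             0
Power On Hours            140
Spin Retry Count            0
Calibration Retry Count     0
Power Cycle Count           6
Power-Off Retract Count     5
Load Cycle Count            8
Temperature Celsius        33
Reallocated Event Count     0
Current Pending Sector      0
Offline Uncorrectable       0
UDMA CRC Error Count        0
Multi Zone Error Rate       0
ATA Error Count             0
"""

# Assumption: attributes whose nonzero value usually signals trouble.
WATCHED = {
    "Reallocated Sector Count", "Reallocated Event Count",
    "Current Pending Sector", "Offline Uncorrectable",
    "Spin Retry Count", "UDMA CRC Error Count", "ATA Error Count",
}


def parse_smart(text):
    """Map each attribute name to its integer value (last token on the line)."""
    values = {}
    for line in text.strip().splitlines():
        name, raw = line.rsplit(None, 1)
        values[name] = int(raw)
    return values


def nonzero_warnings(values):
    """Return only the watched attributes that are not zero."""
    return {name: v for name, v in values.items() if name in WATCHED and v != 0}


old, new = parse_smart(OLD_DRIVE), parse_smart(NEW_DRIVE)
print("old drive warnings:", nonzero_warnings(old))
print("new drive warnings:", nonzero_warnings(new))
```

Both dictionaries come back empty, which matches the reading that neither drive shows a SMART problem on its own.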

Thanks, gregb

 

Message 5 of 7
StephenB
Guru

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

I am wondering if the SATA port in the NAS itself has failed.

 

Can you try powering down and removing the existing drives?  Then insert one of the new drives into slot 1 of the NAS, and see if you get the same error.

 

If you do, power down again and try it with the drive in slot 2.
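
One way to read the outcomes of that two-step test, sketched as a small function. The interpretations are my own assumptions about what each result would suggest, not documented ReadyNAS behavior, and `interpret_slot_test` is a hypothetical name.

```python
def interpret_slot_test(fails_in_slot_1, fails_in_slot_2=None):
    """Suggest a next step from the single-drive slot test.

    fails_in_slot_1: True if the lone drive reports the SMART error in slot 1.
    fails_in_slot_2: same for slot 2, or None if that step has not been run.
    """
    if not fails_in_slot_1:
        return "slot 1 accepted the drive; a failed port looks unlikely"
    if fails_in_slot_2 is None:
        return "slot 1 rejected the drive; power down and retest in slot 2"
    if fails_in_slot_2:
        return "both slots rejected the drive; the fault does not look specific to slot 1"
    return "only slot 1 rejected the drive; the slot 1 SATA port is suspect"


print(interpret_slot_test(True))
print(interpret_slot_test(True, False))
```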

Message 6 of 7
gregb_pro
Aspirant

Re: Replacement procedure for suspect drive in ReadyNAS Pro BE

Drive 1 was already removed; system was in degraded mode for several days without drive 1 (drives 2-6 active).
Write full zeros to disk 1 (again) using WD Lifeguard Diagnostic for Windows. Good SMART / passed.

 

Action completed on 01 Sep 2015:

Shut down, insert disk 1, and power on.
Oddly, the behavior was different from what was logged last week (Fri Aug 28):


Front panel showed this set of four messages sequentially several times:
    Testing Disk 1
    Disk 1 Passed
    Disk 1 Failed
    Volume C degraded
    [repeat]
then finally:
    Resync in progress

 

Although I really didn't fix anything, the system appears normal again.

Thanks, gregb

Message 7 of 7