Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
Replacement procedure for suspect drive in ReadyNAS Pro BE
BACKGROUND:
ReadyNAS Pro Business Edition with six WD 750 GB enterprise drives (WDC WD7502ABYS-01A6B0), all drives with 03.00C05 firmware. The drives show about 57,000 runtime hours (about 6.5 years). The NAS has never had a drive failure.
The ReadyNAS is running RAIDiator 4.2.27, and drives are configured with X-RAID2.
Operating temperatures have been consistent over the life of the system: SYS: 50 C (122 F), Temp CPU: 20.5 C (68 F). All six drives consistently run at 39-42 C (102-107 F).
I have several spare equivalent drives with zero runtime hours.
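As a quick sanity check on the figures above, here is a minimal sketch of the arithmetic. The helper names `hours_to_years` and `c_to_f` are made up for illustration, and a 365-day year is assumed.

```python
# Sanity-check the runtime and temperature figures quoted in the post.
# Helper names are hypothetical; a year is taken as 365 days.

def hours_to_years(hours):
    """Convert power-on hours to years of continuous runtime."""
    return hours / (24 * 365)

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

print(round(hours_to_years(57000), 1))      # runtime in years
for temp_c in (50, 20.5, 39, 42):           # SYS, CPU, drive low, drive high
    print(temp_c, "C =", round(c_to_f(temp_c), 1), "F")
```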
ISSUE:
Drive #2 (top middle) has a pronounced bearing noise with off-nominal (also pronounced) hand-sensed vibration.
SMART data shows no issues for any disk.
QUESTION:
What is the recommended replacement procedure for a suspect drive in a ReadyNAS Pro BE? For example:
wait for the drive to fail and force a system shutdown; or
replace hot (while running); or
shut down and replace cold; or
pull the suspect drive to verify the pending failure, then shut down; or
...
Obviously backup first.
Thanks, Greg
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
I always recommend a hot-swap - because it ensures that the NAS will detect the removal and insertion. However, a cold insertion of an unformatted disk should also work.
I'd also replace the drive now, I see no reason to wait.
You might want to run Western Digital's Lifeguard diagnostic on the replacement first (it is a Windows application, available on the WDC website). That's just to make sure nothing happened to the drive while it was on the shelf.
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
Suspect drive #2 was removed as suggested (system up). Unfortunately, the "noise" did not subside with removal of the drive. Tested the drive with WD Lifeguard Diagnostic for Windows (v1.24) and wrote zeros to drive (full test, full zero). Re-installed drive back into ReadyNAS, and the system rebuilt back to X-RAID2 as expected. ReadyNAS log:
Sun Aug 23 08:40:34 MDT 2015 Disk removal detected. [Disk 2]
Sun Aug 23 08:40:35 MDT 2015 A disk was removed from the ReadyNAS. For [...]
Sun Aug 23 08:40:42 MDT 2015 Disk failure detected.
Sun Aug 23 08:40:42 MDT 2015 If the failed disk is used in a RAID level [...]
Sun Aug 23 13:42:02 MDT 2015 New disk detected. If multiple disks have [...]
Sun Aug 23 13:44:13 MDT 2015 Data volume will be rebuilt with disk 2.
Sun Aug 23 13:44:35 MDT 2015 RAID sync started on volume C.
Sun Aug 23 17:47:19 MDT 2015 RAID sync finished on volume C.
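For anyone who wants to pull the rebuild time out of a log like this, here is a minimal sketch. It assumes the `weekday month day time zone year message` layout shown above and an English locale for the month abbreviation. The names `parse_entry` and `sync_duration` are made up for illustration and are not part of RAIDiator.

```python
from datetime import datetime

# The last three entries from the log above, copied as posted.
LOG = """\
Sun Aug 23 13:44:13 MDT 2015 Data volume will be rebuilt with disk 2.
Sun Aug 23 13:44:35 MDT 2015 RAID sync started on volume C.
Sun Aug 23 17:47:19 MDT 2015 RAID sync finished on volume C.
"""

def parse_entry(line):
    """Split one log line into (timestamp, message)."""
    # Fields: weekday, month, day, time, time zone, year, then the message.
    _weekday, month, day, clock, _tz, year, message = line.split(" ", 6)
    # The zone is ignored: every entry here is in the same zone (MDT).
    stamp = datetime.strptime(f"{month} {day} {year} {clock}",
                              "%b %d %Y %H:%M:%S")
    return stamp, message

def sync_duration(log_text):
    """Return the time between 'RAID sync started' and 'RAID sync finished'."""
    started = finished = None
    for line in log_text.splitlines():
        if not line.strip():
            continue
        stamp, message = parse_entry(line)
        if message.startswith("RAID sync started"):
            started = stamp
        elif message.startswith("RAID sync finished"):
            finished = stamp
    if started is None or finished is None:
        return None  # incomplete log: no duration can be computed
    return finished - started

print(sync_duration(LOG))
```

For the entries above this comes to a little over four hours for the resync of the six-drive volume.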
System appeared to run well for 5 days.
Still trying to diagnose noise source. Remove suspect drive from bay 1 while system up:
Fri Aug 28 07:36:55 MDT 2015 Disk removal detected. [Disk 1]
Fri Aug 28 07:36:55 MDT 2015 A disk was removed from the ReadyNAS.
Fri Aug 28 07:37:21 MDT 2015 Disk failure detected.
Fri Aug 28 07:37:21 MDT 2015 If the failed disk is used in a RAID level [...]
Test and zero new zero-hour drive with WD Lifeguard Diagnostic for Windows (v1.24). Passed.
Test and zero old drive #1 with WD Lifeguard Diagnostic for Windows (v1.24). Passed.
Insert *new* drive into bay 1; surprisingly it FAILED SMART test: Arrgh!
Fri Aug 28 14:42:28 MDT 2015 New disk detected. If multiple disks have been [...]
Fri Aug 28 14:42:41 MDT 2015 Newly added disk has failed SMART test. Please check disk 1.
Remove new drive from bay 1, then
Insert old drive into bay 1; it also FAILED SMART test:
Fri Aug 28 14:55:12 MDT 2015 Disk removal detected. [Disk 1]
Fri Aug 28 14:55:12 MDT 2015 A disk was removed from the ReadyNAS. For full [...]
Fri Aug 28 14:59:51 MDT 2015 New disk detected. If multiple disks have been [...]
Fri Aug 28 15:00:04 MDT 2015 Newly added disk has failed SMART test. Please check disk 1.
Remove old drive from bay 1; then
Test and quick zero old drive with WD Lifeguard Diagnostic (Passed);
Shut down and reboot; then
Insert old drive into bay 1; it also FAILED SMART test:
Fri Aug 28 15:06:55 MDT 2015 Disk removal detected. [Disk 1]
Fri Aug 28 15:06:55 MDT 2015 A disk was removed from the ReadyNAS. For full [...]
Fri Aug 28 15:08:30 MDT 2015 Please close this browser session and use RAIDar [...]
Fri Aug 28 15:09:47 MDT 2015 System is up.
Fri Aug 28 15:19:16 MDT 2015 New disk detected. If multiple disks have been [...]
Fri Aug 28 15:19:27 MDT 2015 Newly added disk has failed SMART test. Please check disk 1.
So, bay 1 failed two successive SMART tests (two different drives), and failed again after a reboot.
Both drives pass WD Lifeguard Diagnostic (one is zero-hour drive).
Looking for recommendations.
Thanks, gregb
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
Tested drives again with WD Lifeguard Diagnostic for Windows; both show PASS.
Both drives show no SMART issues (all zeros in the expected warning indicators).
Does RAIDiator 4.2.27 keep an internal database of drives and serial numbers, with the capability to reject drives based on database history? (Neither of these drives has actually failed, though.)
Thanks, gregb
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
I am still looking for a solution to return the device back to 6 active drives (X-RAID2).
WD Lifeguard Diagnostic shows two good replacement drives: one drive with 57K hours and one new drive.
Both pass WD Lifeguard Diagnostic tests, both have zeros written to all blocks, both show good SMART data.
However, for both drives, the ReadyNAS Frontview status indicates "Newly added disk has failed SMART test" when the drive is inserted into bay 1 (as noted in previous posts). The "failed SMART test" message doesn't show up in system.log.
Old drive:
Attribute                 Value
Raw Read Error Rate       0
Spin Up Time              1133
Start Stop Count          29
Reallocated Sector Count  0
Seek Error Rate           0
Power On Hours            57150
Spin Retry Count          0
Calibration Retry Count   0
Power Cycle Count         27
Power-Off Retract Count   10
Load Cycle Count          29
Temperature Celsius       43
Reallocated Event Count   0
Current Pending Sector    0
Offline Uncorrectable     0
UDMA CRC Error Count      0
Multi Zone Error Rate     0
ATA Error Count           0
New drive:
Attribute                 Value
Raw Read Error Rate       0
Spin Up Time              1200
Start Stop Count          8
Reallocated Sector Count  0
Seek Error Rate           0
Power On Hours            140
Spin Retry Count          0
Calibration Retry Count   0
Power Cycle Count         6
Power-Off Retract Count   5
Load Cycle Count          8
Temperature Celsius       33
Reallocated Event Count   0
Current Pending Sector    0
Offline Uncorrectable     0
UDMA CRC Error Count      0
Multi Zone Error Rate     0
ATA Error Count           0
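To make the "all zeros in the warning indicators" check repeatable, here is a rough sketch that flags any nonzero warning attribute in the two listings above. The choice of which attributes count as warning indicators is an assumption for illustration; it is not RAIDiator's actual SMART acceptance rule, which is not documented in this thread. The name `nonzero_warnings` is likewise hypothetical.

```python
# Attributes commonly treated as early-warning indicators. This selection is
# an assumption for illustration, not the rule set RAIDiator applies.
WARNING_ATTRIBUTES = [
    "Raw Read Error Rate", "Reallocated Sector Count", "Seek Error Rate",
    "Spin Retry Count", "Calibration Retry Count", "Reallocated Event Count",
    "Current Pending Sector", "Offline Uncorrectable", "UDMA CRC Error Count",
    "Multi Zone Error Rate", "ATA Error Count",
]

# Values copied from the two SMART listings in the post.
OLD_DRIVE = {
    "Raw Read Error Rate": 0, "Spin Up Time": 1133, "Start Stop Count": 29,
    "Reallocated Sector Count": 0, "Seek Error Rate": 0,
    "Power On Hours": 57150, "Spin Retry Count": 0,
    "Calibration Retry Count": 0, "Power Cycle Count": 27,
    "Power-Off Retract Count": 10, "Load Cycle Count": 29,
    "Temperature Celsius": 43, "Reallocated Event Count": 0,
    "Current Pending Sector": 0, "Offline Uncorrectable": 0,
    "UDMA CRC Error Count": 0, "Multi Zone Error Rate": 0,
    "ATA Error Count": 0,
}
NEW_DRIVE = dict(OLD_DRIVE, **{
    "Spin Up Time": 1200, "Start Stop Count": 8, "Power On Hours": 140,
    "Power Cycle Count": 6, "Power-Off Retract Count": 5,
    "Load Cycle Count": 8, "Temperature Celsius": 33,
})

def nonzero_warnings(smart):
    """Return the warning attributes whose raw value is not zero."""
    return {name: smart[name] for name in WARNING_ATTRIBUTES
            if smart.get(name, 0) != 0}

for label, smart in (("old", OLD_DRIVE), ("new", NEW_DRIVE)):
    flagged = nonzero_warnings(smart)
    print(label, "drive:", flagged if flagged else "no warning indicators set")
```

Neither drive is flagged by this check, which matches the WD Lifeguard results and leaves the Frontview "failed SMART test" message unexplained by the drive data alone.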
Thanks, gregb
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
I am wondering if the SATA port in the NAS itself has failed.
Can you try powering down and removing the existing drives? Then insert one of the new drives into slot 1 of the NAS and see if you get the same error.
If you do, power down again and try it with the drive in slot 2.
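The two-step test above is a simple fault-isolation check, and it can be written down as a small decision function. This is only a sketch: the function name `isolate_fault` and the wording of the conclusions are illustrative readings of the test, not an official NETGEAR diagnostic.

```python
def isolate_fault(slot1_ok, slot2_ok=None):
    """Interpret the test: one known-good drive alone in slot 1, then slot 2.

    slot1_ok -- True if the NAS accepted the drive in slot 1
    slot2_ok -- result for slot 2, or None if that step was not needed
    """
    if slot1_ok:
        # The same drive and port work with nothing else installed.
        return "slot 1 accepts the drive: the port itself is probably working"
    if slot2_ok is None:
        return "slot 1 rejects the drive: power down and retry in slot 2"
    if slot2_ok:
        return "fails only in slot 1: suspect the slot 1 SATA port"
    return "fails in both slots: suspect the drive or something NAS-wide"

print(isolate_fault(slot1_ok=False, slot2_ok=True))
```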
Re: Replacement procedure for suspect drive in ReadyNAS Pro BE
Drive 1 was already removed; system was in degraded mode for several days without drive 1 (drives 2-6 active).
Wrote full zeros to disk 1 (again) using WD Lifeguard Diagnostic for Windows. Good SMART data; test passed.
Action completed on 01 Sep 2015:
Shut down, insert disk 1, and power on.
The behavior was different from what was shown above last week (Fri Aug 28). Odd:
Front panel showed this set of four messages sequentially several times:
Testing Disk 1
Disk 1 Passed
Disk 1 Failed
Volume C degraded
[repeat]
then finally:
Resync in progress
Although I really didn't fix anything, the system appears normal again.
Thanks, gregb