mgbenson (Aspirant)
Jan 13, 2013
New disk added, restripe finished, then disk failure
I've got an NV+v2 with 2 x WD20EARX, and have just added a third drive (also WD20EARX). However, despite seeming to recognise the drive and start checking/restriping, the drive (drive 3) was showing as 'Spare' and 'Inactive' after the restripe (although the capacity was now reporting 2.7TB total rather than the previous 1.8TB). That didn't seem right, so I figured I'd better update my 3-month-old backup in case of a problem.
Whilst I was making the backup, drive 1 appears to have failed and all of my shares seem to be empty. Here's the log:
Sun Jan 13 16:58:10 WET 2013 RAID sync finished on volume C. The array is still in degraded mode, however. This can be caused by a disk sync failure or failed disks in a multi-parity disk array.
Sun Jan 13 16:57:46 WET 2013 RAID sync started on volume C.
Sun Jan 13 12:52:42 WET 2013 Data volume has been successfully expanded to 3692 GB.
Sun Jan 13 10:01:09 WET 2013 If the failed disk is used in a RAID level 1, 5, or X-RAID volume, please note that volume is now unprotected, and an additional disk failure may render that volume dead. If this disk is a part of a RAID 10 volume,your volume is still protected if more than half of the disks alive. But another failure of disks been marked may render that volume dead. It is recommended that you replace the failed disk as soon as possible to maintain optimal protection of your volume.
Sun Jan 13 10:01:08 WET 2013 Disk failure detected.
Sat Jan 12 13:56:41 WET 2013 Volume expansion started. Do not interrupt the system during this time. When finished, email notification will be sent to the alert contact list.
Sat Jan 12 13:56:26 WET 2013 New disk detected. If multiple disks have been added, they will be processed one at a time. Please do not remove any added disk(s) during this time. [Disk 3]
Looking back a little further, it seems there had been errors on Drive 1 too:
Sun Jan 6 05:03:58 WET 2013 Detected increasing uncorrectable errors[39] on disk 1 [WDC WD20EARX-008FB0, WD-WCAZAE202953]. This often indicates an impending failure. Please be prepared to replace this disk to maintain data redundancy.
With hindsight, I probably shouldn't have added the new drive.
I'd appreciate some suggestions on what to do next, to try and recover the data. If it helps, I've got a further WD20EARX drive still sealed that I could use as a replacement for drive 1 (if that has really died).
Should I attempt to shutdown the unit, or will that make things worse?
4 Replies
- mgbenson (Aspirant): Probably worth adding that no new data has been added to the system since the restripe, if that makes any difference...
Strangely, looking at the status of the system via the web interface, it thinks that all 3 disks are ok (all showing green), and that the volume is ok too.
Maybe I should have run the wdidle util on the drives before I used them - if so, a note about this *REALLY* needs to be put on the hardware compatibility list.
- StephenB (Guru - Experienced User): I don't think this is related to wdidle. What do the SMART stats say?
- mgbenson (Aspirant): It doesn't want to talk to drive 3, but here they are for drives 1 and 2:
root@ReadyNAS:~# smartctl --all /dev/sda
smartctl 5.42 2011-10-20 r3458 [armv5tel-linux-2.6.31.8.nv+v2] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARX-008FB0
Serial Number: WD-WCAZAE202953
LU WWN Device Id: 5 0014ee 207119fa6
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Jan 14 08:20:44 2013 WET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (34560) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x30b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 199 199 051 Pre-fail Always - 1708
3 Spin_Up_Time 0x0027 187 184 021 Pre-fail Always - 5625
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 223
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 8
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2357
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 128
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 124
193 Load_Cycle_Count 0x0032 121 121 000 Old_age Always - 239288
194 Temperature_Celsius 0x0022 123 116 000 Old_age Always - 27
196 Reallocated_Event_Count 0x0032 192 192 000 Old_age Always - 8
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 33
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 39
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 87
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@ReadyNAS:~# smartctl --all /dev/sdb
smartctl 5.42 2011-10-20 r3458 [armv5tel-linux-2.6.31.8.nv+v2] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARX-008FB0
Serial Number: WD-WCAZAE032781
LU WWN Device Id: 5 0014ee 25c66d736
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Jan 14 08:20:46 2013 WET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (35460) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x30b5) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 183 181 021 Pre-fail Always - 5808
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 220
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 097 097 000 Old_age Always - 2357
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 128
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 125
193 Load_Cycle_Count 0x0032 120 120 000 Old_age Always - 240092
194 Temperature_Celsius 0x0022 122 113 000 Old_age Always - 28
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 0 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
root@ReadyNAS:~# smartctl --all /dev/sdc
smartctl 5.42 2011-10-20 r3458 [armv5tel-linux-2.6.31.8.nv+v2] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/sdc failed: No such device or address
root@ReadyNAS:~#
I've raised a case with Netgear support too - Case #20343149
- mgbenson (Aspirant): The retailer I bought the two WD Green WD20EARX drives from has agreed to take them both back and let me upgrade to WD Red WD20EFRX ones - just need to see if I can get any of my data back first, then I'll rebuild the whole thing to avoid the pesky 'green' drives.
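As a rough aid to reading the two smartctl attribute tables posted in this thread, here is a minimal Python sketch that pulls the RAW_VALUE column out of attribute rows and lists the sector-health counters that are non-zero. The function names (`parse_attributes`, `suspicious`) and the `WATCHED` list are assumptions made for illustration, not anything NETGEAR or WD publishes; the sample rows are copied from the drive 1 (/dev/sda) and drive 2 (/dev/sdb) output above.

```python
import re

# Counters where a non-zero raw value usually points at media trouble.
# This watch list is an assumption for illustration, not a vendor rule.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# One attribute row of `smartctl --all`:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} for every attribute row in text."""
    attrs = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            attrs[match.group(2)] = int(match.group(9))
    return attrs


def suspicious(attrs):
    """Keep only the watched counters whose raw value is above zero."""
    return {name: raw for name, raw in attrs.items()
            if name in WATCHED and raw > 0}


# Rows copied from the /dev/sda (drive 1) output in this thread.
DISK1 = """\
1 Raw_Read_Error_Rate 0x002f 199 199 051 Pre-fail Always - 1708
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 8
196 Reallocated_Event_Count 0x0032 192 192 000 Old_age Always - 8
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 33
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 39
"""

# Rows copied from the /dev/sdb (drive 2) output in this thread.
DISK2 = """\
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 0
"""

if __name__ == "__main__":
    print("drive 1:", suspicious(parse_attributes(DISK1)))
    print("drive 2:", suspicious(parse_attributes(DISK2)))
```

On the posted data this flags all four watched counters on drive 1 and none on drive 2, even though both drives report an overall-health result of PASSED. The 39 in Offline_Uncorrectable matches the "uncorrectable errors[39]" warning logged for disk 1 on Jan 6.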