Failing Drive Whilst Restriping

Question

Any suggestions, folks? It looks like I'm about to suffer a major data loss.

I've an Ultra 6 that was fitted with three 3TB drives configured with redundancy (total space 5547GB), and it was about 75% full.

This afternoon I added a fourth drive so as to provide dual redundancy and the unit started re-striping OK. However after 3 hrs it is only showing 1.0% complete and is estimating 203 hrs to finish.
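For comparison, a naive straight-line extrapolation from those two figures (3 hrs elapsed, 1.0% done) gives roughly 297 hrs remaining, so the unit's 203 hr figure is presumably calculated some other way. The sketch below is only that arithmetic; the function name is made up for illustration, it is not how the NAS computes its estimate, and the displayed 1.0% is itself rounded.

```python
def linear_eta_hours(elapsed_hours, percent_done):
    """Remaining hours if progress continues at the average rate so far."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    total_hours = elapsed_hours * 100.0 / percent_done
    return total_hours - elapsed_hours


# Figures from the post: 3 hrs elapsed, 1.0% complete.
remaining = linear_eta_hours(elapsed_hours=3, percent_done=1.0)
print(f"{remaining:.0f} hrs remaining at the average rate so far")
```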

Assuming something was wrong I checked the system health and logs and spotted that Disk 2 appears to be going faulty fast, as the SMART status is now showing:

Model: Hitachi HDS723030ALA640
Firmware: MKAOA580

SMART Attribute

Raw Read Error Rate 14024923
Throughput Performance 90
Spin Up Time 552
Start Stop Count 120
Reallocated Sector Count 70
Seek Error Rate 0
Seek Time Performance 27
Power On Hours 282
Spin Retry Count 0
Power Cycle Count 72
Power-Off Retract Count 121
Load Cycle Count 121
Temperature Celsius 46
Reallocated Event Count 70
Current Pending Sector 1
Offline Uncorrectable 0
UDMA CRC Error Count 0

ATA Error Count 499

What happens if this drive fails before restriping has completed? Should I consider powering down, removing the faulty disk and replacing it with the one in bay 4? And if I did, would the system recover to the state it was in before I tried to add dual redundancy?

Any help much appreciated

Pete

ReadySECURE · Answer

There are a few things you can do. I would recommend powering the device down for both.

First, you may wish to test the drive with Hitachi's SMART tools. This would be a test to see if the drive is really failing, or if it may be another issue entirely.

Secondly, you could use Knoppix to clone the drive to a known good drive, and then place the good drive in the device. Keep in mind that the known good drive should be on the hardware compatibility list (HCL). After cloning successfully with Knoppix, place the good drive in the NAS and power on. If all goes well, it will be as though the drive never had any issues.

Here is a simple guide to quickly recover a failed drive using dd_rescue.

I often have to deal with pesky failed drives, so here is a quick, simple guide to doing this with a free Linux Live CD and a PC with two SATA connections.
I will be using a Knoppix 6.2 Live CD for this guide. It can be found at www.knoppix.net
The dd_rescue command copies data from one drive to another, block for block. This is especially useful for recovering a failed drive. Often when a drive fails it is still accessible; it has just surpassed the S.M.A.R.T. error threshold. dd_rescue lets you skip over the bad sectors and continue cloning the bad drive to a new healthy drive.
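To illustrate the idea of copying block for block while carrying on past unreadable sectors, here is a toy model in Python. It is not dd_rescue's actual algorithm: the "disk" is just a list, an unreadable block is represented by None, and all the names are invented for the example.

```python
BLOCK_SIZE = 4  # tiny block size, for illustration only


def read_block(disk, index):
    """Return one block, or raise IOError if it is unreadable (None)."""
    block = disk[index]
    if block is None:
        raise IOError(f"unreadable block {index}")
    return block


def clone_skipping_errors(source):
    """Copy every readable block; leave unreadable ones zero-filled."""
    destination = [bytes(BLOCK_SIZE)] * len(source)
    bad_blocks = []
    for index in range(len(source)):
        try:
            destination[index] = read_block(source, index)
        except IOError:
            # Note the failure and keep going instead of aborting the copy.
            bad_blocks.append(index)
    return destination, bad_blocks


failing_disk = [b"AAAA", b"BBBB", None, b"DDDD"]
clone, bad = clone_skipping_errors(failing_disk)
print(clone)  # readable blocks copied, the bad one left as zeros
print(bad)    # indexes of the blocks that could not be read
```

The point is simply that one bad block costs you that block, not the whole copy.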

1) Connect your old drive and new drive to your PC
2) Boot up using your Linux live CD
3) Launch a terminal window.
4) Run fdisk -l to make sure the system sees both of the hard drives.
5) Run hdparm -i /dev/sdX on each of the drives to find out which is your source drive and which is your destination drive.
6) Once you know which drive is which, you can start the clone process:

    dd_rescue /dev/sdX /dev/sdY

   where /dev/sdX is the source disk and /dev/sdY is the destination drive.
7) You will see the process start. Just keep an eye on it; the clone job might take a few hours to finish, depending on the size of the drive.

Once the process is complete there will be no notification; the transfer will just stop and you will see the terminal prompt again.

If you see a lot of errors, or no more data is being shown as succxfer:, it means the drive got marked faulty by the kernel. At this point reboot the system and check again which drive is which, as it is possible the lettering will have switched. Run the dd_rescue command again, but this time with the -r option. This restarts the cloning from the back of the drive and picks up the data that has not been cloned yet.

There is no guarantee this will recover your data, but there is a very high chance it will work, and it's free.
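As a small safeguard against mixing up source and destination, the two invocations from the guide (a forward pass, then a second pass with -r) can be assembled by a helper that refuses identical devices. This is only a sketch: the helper name is made up, /dev/sdX and /dev/sdY are placeholders for whatever fdisk -l and hdparm -i report on your PC, and it only builds the command line without running anything.

```python
def dd_rescue_command(source, destination, reverse=False):
    """Build the dd_rescue argument list; -r is added for the reverse pass."""
    if source == destination:
        raise ValueError("source and destination must be different devices")
    command = ["dd_rescue"]
    if reverse:
        command.append("-r")  # copy from the back of the drive, as in the guide
    command += [source, destination]
    return command


# Placeholder device names; substitute the ones identified on your system.
print(" ".join(dd_rescue_command("/dev/sdX", "/dev/sdY")))
print(" ".join(dd_rescue_command("/dev/sdX", "/dev/sdY", reverse=True)))
```

Printing the command and reading it back before typing it is worth the extra few seconds, since swapping the two arguments would overwrite the drive you are trying to rescue.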

mearglen · Answer

Thanks for the suggestion. I've only just read your message, but I left it running overnight and it has sped up a bit; it is currently quoting 92 hrs and 40 minutes to complete.

What is strange is that when I look at the SMART attributes for Disk 2, the raw read errors have disappeared whilst the reallocated sectors remain stable.

Model: Hitachi HDS723030ALA640

Firmware: MKAOA580

SMART Attribute

Raw Read Error Rate 0
Throughput Performance 90
Spin Up Time 552
Start Stop Count 120
Reallocated Sector Count 70
Seek Error Rate 0
Seek Time Performance 27
Power On Hours 296
Spin Retry Count 0
Power Cycle Count 72
Power-Off Retract Count 121
Load Cycle Count 121
Temperature Celsius 45
Reallocated Event Count 70
Current Pending Sector 0
Offline Uncorrectable 0
UDMA CRC Error Count 0

ATA Error Count 499
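To see exactly what moved between the two SMART listings in this thread, the values can be diffed mechanically. The sketch below just parses "name value" lines as pasted above and reports the attributes whose raw value changed. The function names are invented for the example, and how a raw value should be interpreted is vendor-specific, so this shows what changed rather than what it means.

```python
FIRST = """
Raw Read Error Rate 14024923
Throughput Performance 90
Spin Up Time 552
Start Stop Count 120
Reallocated Sector Count 70
Seek Error Rate 0
Seek Time Performance 27
Power On Hours 282
Spin Retry Count 0
Power Cycle Count 72
Power-Off Retract Count 121
Load Cycle Count 121
Temperature Celsius 46
Reallocated Event Count 70
Current Pending Sector 1
Offline Uncorrectable 0
UDMA CRC Error Count 0
ATA Error Count 499
"""

SECOND = """
Raw Read Error Rate 0
Throughput Performance 90
Spin Up Time 552
Start Stop Count 120
Reallocated Sector Count 70
Seek Error Rate 0
Seek Time Performance 27
Power On Hours 296
Spin Retry Count 0
Power Cycle Count 72
Power-Off Retract Count 121
Load Cycle Count 121
Temperature Celsius 45
Reallocated Event Count 70
Current Pending Sector 0
Offline Uncorrectable 0
UDMA CRC Error Count 0
ATA Error Count 499
"""


def parse_snapshot(text):
    """Turn 'Attribute Name 123' lines into {name: int_value}."""
    values = {}
    for line in text.strip().splitlines():
        name, raw = line.rsplit(None, 1)  # the value is the last field
        values[name] = int(raw)
    return values


def changed_attributes(before, after):
    """Return {name: (old, new)} for attributes whose value differs."""
    return {
        name: (before[name], after[name])
        for name in before
        if name in after and before[name] != after[name]
    }


changes = changed_attributes(parse_snapshot(FIRST), parse_snapshot(SECOND))
for name, (old, new) in changes.items():
    print(f"{name}: {old} -> {new}")
```

Only four attributes differ: Raw Read Error Rate, Power On Hours, Temperature Celsius and Current Pending Sector. The reallocated counts and the ATA error count are unchanged.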

I'm not sure what to do: power down and take up one of your suggestions, or let it run its course. Can I be fairly sure that powering it down in the middle of its restriping exercise will not cause me bigger problems?

I also note that, according to the Hitachi website, my 3TB Hitachi HDS723030ALA640 drives are not supported by their Drive Fitness Test tool.

Regards

Peter
