Forum Discussion
AndyBee1
May 08, 2014Aspirant
ReadyNAS 314 - a year of hell and now lost 10years of data!
I bought a ReadyNAS 314 when they were new in the market back in April 2013. I bought it partly because I wanted to safeguard my increasing data storage with redundancy in case of disk failure, and al...
mangrove
May 11, 2014Apprentice
xeltros wrote: I would be interested to know how you restore data from a RAID 5 array with 2 dead disks. A simplified way of describing how RAID 5 works: it stores data like 1+2+3=6 and recovers it by solving 1+X+3=6, so X = 6-(3+1) = 2.
Are you counting on the fact that the whole disk is not dead, so that you can get back part of the data? That is, hoping the two "dead" disks only partially failed, so you may recover one number of that sum and rebuild part of the RAID, using the two dead disks but recovering some data because they didn't fail on blocks concerning the same data?
Yes. A dropped RAID doesn't necessarily mean that the data is done for, or even corrupt. It only means that the RAID controller can no longer guarantee data integrity.
Now if two disks are physically dead -- let's say ground to pieces -- then in a single-redundancy system (RAID5) the data is indeed impossible to get back. But the typical "disk fail" error is a read or write error, and that can often be reconstructed around. How it's handled depends on how the array controller (including software RAID) handles a broken array, but even then you can use commercial RAID reconstruction tools to get data back.
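(Side note: the arithmetic-sum picture quoted above is a simplification; real RAID5 parity is bitwise XOR, which has the same property that any one missing chunk in a stripe can be rebuilt from the survivors. A minimal toy sketch in Python, with made-up chunk contents:)

```python
# Toy illustration of RAID5 single-redundancy parity.
# Real arrays XOR the data chunks of each stripe; any ONE missing
# chunk can be rebuilt by XORing everything that's left.
from functools import reduce

def parity(chunks):
    """XOR a list of equal-length byte chunks together."""
    return bytes(reduce(lambda a, b: a ^ b, t) for t in zip(*chunks))

data = [b"AAAA", b"BBBB", b"CCCC"]   # three data chunks of one stripe
p = parity(data)                     # the parity chunk, stored on the fourth disk

# Disk 2 "fails": rebuild its chunk from the two survivors plus parity.
rebuilt = parity([data[0], data[2], p])
assert rebuilt == b"BBBB"
```

Lose two chunks of the same stripe, though, and the XOR equation has two unknowns -- which is exactly why a double failure is the nightmare case.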
I have personally reconstructed two broken RAID5 arrays with two "dropped" disks each, one on a QNAP nas and one on a Dell PERC6 (LSI) hardware RAID, and these undertakings have made me view RAID as something much less magical and much more mundane than I thought it was before. :)
StephenB wrote: My understanding (from a PM) is that one disk is totally dead, and that the second one may have some read errors. I suggested cloning the second disk.
This is excellent advice, if it's possible to do.
And in this case it's theoretically possible to reconstruct such an array, because RAID5 can handle the one completely broken disk. For the disk with read errors: so what, some stripes of data are lost. The tricky part is finding a "controller" that doesn't say "OMG a broken stripe set I MUST DROP THIS DISK FROM THE ARRAY IMMEDIATELY, OH NOES ME DED NAO". Essentially this is what RAID reconstruction software does: it reads the "data chunks" from the disks as-is, combining them, accepting all read errors, rereading broken sectors multiple times, and carrying on anyway. For working online RAID sets, time and write integrity are critical factors: you can't hang the array for multi-hour recovery operations, and you must drop the array to make sure nothing is written to a faulty one. Recovery software, on the other hand, has all the time in the world and won't make any writes.
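The behaviour described above -- walk the array stripe by stripe, tolerate read errors instead of dropping disks -- can be sketched like this. This is an illustrative toy, not how any particular commercial tool is implemented; a chunk of `None` stands for a sector the clone couldn't read, and the byte values are invented:

```python
# Toy stripe-by-stripe RAID5 reconstruction that tolerates read errors.
from functools import reduce

def rebuild_stripe(chunks):
    """One RAID5 stripe: a list of per-disk chunks, where an unreadable
    chunk is None. With at most one missing chunk, rebuild it by XORing
    the survivors; with two or more, that stripe's data is lost."""
    missing = [i for i, c in enumerate(chunks) if c is None]
    if len(missing) > 1:
        return None                          # double failure in one stripe
    if missing:
        survivors = [c for c in chunks if c is not None]
        chunks = list(chunks)
        chunks[missing[0]] = bytes(
            reduce(lambda a, b: a ^ b, t) for t in zip(*survivors)
        )
    return chunks

# Disk 2 had a read error on this stripe; parity (last chunk) fills the gap.
stripe = [b"\x01\x02", None, b"\x03\x04", b"\x12\x26"]
print(rebuild_stripe(stripe)[1])   # missing chunk comes back as b"\x10\x20"

# A stripe with two unreadable chunks stays lost -- but the tool just
# moves on to the next stripe instead of giving up on the whole array.
assert rebuild_stripe([None, None, b"\x03\x04", b"\x12\x26"]) is None
```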
In this non-theoretical case it's much harder, of course, because of the legendarily stupid decision to use BTRFS in the new products. One of my main arguments against BTRFS has always been that no recovery software supports unfinished beta file systems, a fact which unfortunately will now bite the OP in the behind. The right approach here would probably be to reconstruct the array and read it out as a disk image, store it on another storage area of equal size, and then mount that giant image as BTRFS. It's icky, as this method doesn't support recovery of single files until the very last step, and I don't know how BTRFS will handle multiple blocks of randomly corrupt data (the stripe sets with read errors). Even then, if the resulting image can't be mounted as a file system at all, recognizable unfragmented data could perhaps be restored anyway, using data recovery tools that look for known file headers (testdisk/PhotoRec could work directly with that image without even looking at the file system).
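Header-based carving of the kind testdisk/PhotoRec do is conceptually simple: scan the raw image for known magic bytes and record where files likely begin. A minimal sketch -- the signature table here is tiny and the sample blob is invented; real carvers also use footers and size heuristics:

```python
# Toy file carver: find likely file starts in a raw image by magic bytes.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}

def carve_offsets(image: bytes):
    """Return sorted (offset, type) pairs where a known header appears."""
    hits = []
    for magic, ftype in SIGNATURES.items():
        start = 0
        while (pos := image.find(magic, start)) != -1:
            hits.append((pos, ftype))
            start = pos + 1
    return sorted(hits)

# A made-up "disk image": garbage, then a PNG header, then a PDF header.
blob = b"\x00" * 10 + b"\x89PNG\r\n\x1a\n" + b"junk" + b"%PDF-1.4..."
print(carve_offsets(blob))   # [(10, 'png'), (22, 'pdf')]
```

This is why carving only recovers unfragmented files: it knows where data starts, not how the file system scattered it.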
But my shortcut approach to the problem as a whole until I tried the procedure above would be:
0) Check that the problem is not due to usage of known faulty drives. Some models are simply bad. If that is the case, I would clone ALL the disks in the next step, weighing the cost of all this against the value of my data.
1) Clone the working-but-faulty disk; this will give you an image with read errors.
2) Give that faulty disk a workout with non-destructive reconstruction software such as Spinrite (I know about the hyperbole and crap the author spouts about this software, but it's good for some veeerrry specific cases, such as this one). If you are very, very, VERY lucky, data will be recovered to spare sectors and you can then force-mount the original array in Linux.
If this doesn't work, try recovering the array with the image-approach above.
Don't forget to mark the disks when you take them out of the unit.
All this will take several days of work, maybe weeks.