rabidh
Apr 23, 2018 · Aspirant
Disk fail in X-RAID2, after sync half my files are gone!
Hi, I'm on a ReadyNAS NV+ v2, with RAIDiator 5.3.11. I had it configured for X-RAID2 with 3x 2TB drives and one older 512GB drive. A few days ago the 512GB drive failed, leaving the array unprote...
- Apr 24, 2018
rabidh wrote:
It seems particularly unlucky that the replacement drive I put in was faulty. Having just read into it a bit, I wasn't aware that in most RAID systems if one copy of the data becomes corrupt then even though it is duplicated ...
In your case your NAS is using RAID-5. RAID-5 doesn't duplicate your data. Rather it uses parity blocks that allow it to reconstruct data when something is missing.
Putting this in mathematical terms: imagine a 4-disk RAID-5 array. If disks 1, 2, and 3 hold data blocks A, B, and C at sector N, then the fourth disk holds P = A+B+C in that sector. (It doesn't use normal addition; it uses bitwise XOR, which has the same effect.) Then if disk 3 is replaced, the NAS reconstructs C using P-A-B.
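To make the parity idea concrete, here is a toy Python sketch, assuming bytewise XOR as the parity operation and tiny 4-byte blocks. Real Linux md RAID-5 works on much larger chunks and rotates the parity block across all the disks, so this only illustrates the math; `xor_blocks` is a made-up helper name.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of one or more equal-length blocks."""
    if not blocks or len({len(b) for b in blocks}) != 1:
        raise ValueError("need at least one block, all the same length")
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Data blocks on disks 1-3 at the same sector (toy 4-byte "sectors").
A = b"\x10\x20\x30\x40"
B = b"\x01\x02\x03\x04"
C = b"\xaa\xbb\xcc\xdd"

# Disk 4 stores the parity: P = A xor B xor C.
P = xor_blocks(A, B, C)

# Disk 3 is replaced: rebuild C from the parity and the surviving data blocks.
rebuilt_C = xor_blocks(P, A, B)
print(rebuilt_C == C)  # prints: True
```

Because XOR is its own inverse, "adding" and "subtracting" are the same operation here, which is why one function serves both for computing the parity and for rebuilding a lost block.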
This only works if the remaining disks can all be read (and all return the correct data). If a disk can't be read during reconstruction, then the reconstruction fails (and the NAS knows that). If a disk is read but returns the wrong data, then the reconstruction gives the wrong result (and the NAS has no way to detect that). Similarly, if the wrong data was somehow written to one of the disks in the first place (or if a disk write was lost), then the reconstruction will give the wrong result (and again there is no way to detect that).
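The difference between a read that fails and a read that returns wrong data can be sketched the same way. This hypothetical example (again assuming XOR parity; `rebuild_missing` and the per-disk read callables are made up for illustration) rebuilds the lost block three times: once with good data, once with a single flipped bit on disk 1, and once with disk 1 unreadable.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks (the assumed parity operation)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def rebuild_missing(parity, read_survivors):
    """Rebuild a lost block from the parity block plus every surviving data block.

    read_survivors holds one zero-argument callable per surviving disk; a
    callable raises OSError if that disk can't be read.
    """
    survivors = [read() for read in read_survivors]  # an OSError aborts the rebuild
    return xor_blocks(parity, *survivors)


def unreadable_disk():
    raise OSError("read error on disk 1")


A, B, C = b"\x10\x20\x30\x40", b"\x01\x02\x03\x04", b"\xaa\xbb\xcc\xdd"
P = xor_blocks(A, B, C)

# 1. All survivors read back correctly: C is recovered.
good = rebuild_missing(P, [lambda: A, lambda: B])

# 2. Disk 1 returns A with one flipped bit: a wrong C comes back and nothing complains.
bad_A = bytes([A[0] ^ 0x01]) + A[1:]
wrong = rebuild_missing(P, [lambda: bad_A, lambda: B])

# 3. Disk 1 can't be read at all: the rebuild stops, and the caller knows it.
try:
    rebuild_missing(P, [unreadable_disk, lambda: B])
    rebuild_failed = False
except OSError:
    rebuild_failed = True

print(good == C, wrong == C, rebuild_failed)  # prints: True False True
```

Case 2 is the dangerous one: the parity math happily produces a block, and the single bad bit on disk 1 lands in the rebuilt data on disk 3.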
rabidh wrote:
it'll probably still cause corruption ... and probably the more high end systems have options in place to work around that.
Once corruption happens, then there is risk of data loss - that's just as true in high-end enterprise/cloud systems as it is in home NAS.
High-end systems have some features which can reduce the chance of corruption happening in the first place. For instance:
- Error-correcting RAM
- Dual power supplies, to help ensure that a PSU failure doesn't result in lost writes.
- UPS protection
BTW, UPS protection is something I always recommend (for every NAS). Data corruption often happens after an unexpected power loss.
Also, if you have more disks in the NAS, there are some advanced RAID modes (for example RAID-6) that can handle more than one failed disk. There is a price for that: reduced capacity and lower performance. And they don't help if the wrong data is on one or more disks; they only help when a disk can't be read.
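The capacity cost is simple arithmetic if you assume all disks are the same size (X-RAID2 with mixed sizes is more complicated, so treat this as a rough sketch): each disk's worth of parity is a disk's worth of space you can't use for data. `usable_capacity_tb` is a hypothetical helper.

```python
def usable_capacity_tb(num_disks: int, disk_tb: float, parity_disks: int) -> float:
    """Usable space of a parity array built from equal-size disks.

    parity_disks = 1 models RAID-5 (survives one failed disk);
    parity_disks = 2 models RAID-6 (survives two failed disks).
    """
    if num_disks <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (num_disks - parity_disks) * disk_tb


for level, parity in (("RAID-5", 1), ("RAID-6", 2)):
    print(f"{level}: {usable_capacity_tb(4, 2.0, parity)} TB usable from 4 x 2 TB disks")
# prints:
# RAID-5: 6.0 TB usable from 4 x 2 TB disks
# RAID-6: 4.0 TB usable from 4 x 2 TB disks
```

With only four disks, going from single to dual parity costs a third of the usable space, which is why it usually only makes sense on larger arrays.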
Newer OS-6 ReadyNAS (at all price points) do have some features that are relevant here. They have more scheduled maintenance functions that can detect issues sooner. They also use a newer file system called BTRFS, which has built-in checksums that can detect corruption. That also gives those NAS some more sophisticated options for reconstruction.
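The checksum idea can be sketched like this. Btrfs keeps a checksum for each data block (crc32c by default) and verifies it when the block is read; since Python's standard library has no crc32c, this sketch uses `zlib.crc32` as a stand-in, and `store`/`verified_read` are hypothetical helpers, not Btrfs's actual on-disk format. The point is that a bad copy gets detected instead of being silently returned, and if a second good copy exists it can be used instead.

```python
import zlib
from typing import List, Tuple


def store(block: bytes) -> Tuple[bytes, int]:
    """Keep a checksum alongside the data when it is written."""
    return block, zlib.crc32(block)


def verified_read(copies: List[bytes], checksum: int) -> bytes:
    """Return the first copy whose checksum matches; raise if none do."""
    for copy in copies:
        if zlib.crc32(copy) == checksum:
            return copy
    raise OSError("all copies failed checksum verification")


data, csum = store(b"holiday photos")
corrupted = b"holiday phot0s"  # one byte changed on one copy

# The corrupted copy is rejected and the good copy is used instead.
print(verified_read([corrupted, data], csum))  # prints: b'holiday photos'

# With only the corrupted copy, the read fails loudly instead of returning bad data.
try:
    verified_read([corrupted], csum)
except OSError as err:
    print(err)  # prints: all copies failed checksum verification
```

Contrast this with the parity-only sketch: plain RAID-5 has no way to tell which disk returned the bad data, while a per-block checksum at least turns silent corruption into a detectable error.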
But for all storage (enterprise and home) the primary defense against data loss/corruption is to have independent backups: full copies of the data on other devices.
StephenB
Apr 26, 2018 · Guru - Experienced User
rabidh wrote:
UDMA_CRC_Error_Count looks pretty disastrous? Does UDMA imply a problem with the SATA link itself rather than the disk though?
They are errors detected on the SATA link by the drive. So potential causes are the SATA backplane/connections, the NAS SATA interface electronics, and the drive's SATA interface electronics. Are the counts rising?
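One way to answer "are the counts rising?" is to save the output of `smartctl -A` for the disk (from smartmontools) on two different days and compare the raw value of attribute 199. This sketch assumes the usual ATA attribute table layout, where RAW_VALUE is the tenth column; `raw_value` is a hypothetical helper, and the sample snapshots and their numbers are invented for illustration.

```python
from typing import Optional


def raw_value(smart_attrs: str, name: str = "UDMA_CRC_Error_Count") -> Optional[int]:
    """Return the RAW_VALUE of one SMART attribute from `smartctl -A` text."""
    for line in smart_attrs.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == name:
            return int(fields[9])
    return None


HEADER = ("ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      "
          "UPDATED  WHEN_FAILED RAW_VALUE\n")
ROW = ("199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   "
       "Always       -       {}\n")

# Hypothetical snapshots taken a few days apart.
before = HEADER + ROW.format(1234)
after = HEADER + ROW.format(1234)

old, new = raw_value(before), raw_value(after)
if old is None or new is None:
    raise SystemExit("attribute 199 not found")
print("rising" if new > old else "steady")  # prints: steady
```

A steady count points to errors that happened at some point in the past; a count that keeps climbing under load points to a link problem that is still active.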
Just wanted to add that this could explain the dropout of the drive: the NAS disk drivers might be declaring the interface dead.
You could also try powering down and moving the drive to a different bay. If it is the SATA link (and not the drive), the array might stay up. It's still best to boot the system in read-only mode.
rabidh
Apr 29, 2018 · Aspirant
Just an update on this...
The UDMA error count hasn't gone up, so it seems that was a bit of a red herring.
However, I took that drive out and plugged it into my PC, then used `dd` with `conv=sync,noerror` to clone it onto the 2TB drive that I'd used originally when the whole thing stopped working (I backed up *all* the drives onto a 6TB drive just in case). I got 7 IO errors from the drive I was reading, but that was it - the copy sailed through.
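The reason the clone survived the bad sectors is the `conv=sync,noerror` combination: `noerror` tells dd to keep going after a read error instead of stopping, and `sync` pads every short or failed input block with zeros, so everything after a bad spot stays at the same offset as on the original disk. Here is a hypothetical Python sketch of that behaviour (not dd's actual code); `clone_noerror_sync` is a made-up name and `read_block` stands in for reading one block from the failing drive. Each failed read becomes one zero-filled block, which is why a handful of IO errors should only damage a handful of small regions.

```python
from typing import Callable, List, Tuple


def clone_noerror_sync(read_block: Callable[[int], bytes], num_blocks: int,
                       block_size: int) -> Tuple[bytes, List[int]]:
    """Copy a disk block by block, zero-filling any block that can't be read."""
    image = bytearray()
    failed = []
    for i in range(num_blocks):
        try:
            data = read_block(i)
        except OSError:
            failed.append(i)  # "noerror": note the failure and carry on
            data = b""
        # "sync": pad short or failed reads to a full block of zeros so later
        # blocks keep the same offsets they had on the source disk
        image += data.ljust(block_size, b"\x00")
    return bytes(image), failed


# Hypothetical 8-block source disk with unreadable blocks 3 and 5.
BLOCK = 4
source = [bytes([i + 1]) * BLOCK for i in range(8)]
BAD = {3, 5}


def read_block(i: int) -> bytes:
    if i in BAD:
        raise OSError(f"I/O error reading block {i}")
    return source[i]


image, failed = clone_noerror_sync(read_block, len(source), BLOCK)
print(len(image), failed)  # prints: 32 [3, 5]
```

Without `sync`, a failed block would simply be missing from the output and every later block would shift, which would wreck the RAID layout on the cloned disk.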
I put the cloned drive in, turned it on, and it now works great. I suspect those 7 IO errors mean that maybe 7 files are slightly corrupt, but that's a hell of a lot better than 2TB of lost data...
So, it looks like:
- I had a legit failure of the 512GB disk, and at the same time one 2TB drive was silently a little flaky
- When I swapped the 512GB disk out with the 2TB one, the ReadyNAS had an IO error and just freaked out, refusing to set it up as part of the volume and also dropping the 2TB disk from the array!
- I then rebooted and all the drives came back, but as soon as I started to copy I'd hit one of those bad sectors on the disk, get an IO error, and the ReadyNAS would drop the entire volume until I rebooted again.
So yeah, not impressed with ReadyNAS on this. I can understand dropping a volume due to IO errors when you're in a redundant array, but doing so when in an unprotected array *and sending no alert messages about it at all* seems like a really bad choice. The lack of official support from Netgear when the solution was so simple was a bit of an eye-opener too.
After two different ReadyNAS and 7 years of ownership I received a Synology NAS yesterday. While the build quality isn't as good as the ReadyNAS I'm blown away by the software (and the speed!) - I'm a total convert.
I'll still be keeping separate backups though :)
- StephenB · Apr 30, 2018 · Guru - Experienced User
FWIW, both vendors use the same Linux tools to build their RAID arrays (mdadm), so the response to a disk error would likely be identical with your Synology.