rabidh
Apr 23, 2018 · Aspirant
Disk fail in X-RAID2, after sync half my files are gone!
Hi, I'm on a ReadyNAS NV+ v2, with RAIDiator 5.3.11. I had it configured for X-RAID2 with 3x 2GB drives and one older 512GB drive. A few days ago the 512GB drive failed, leaving the array unprote...
rabidh
Apr 24, 2018 · Aspirant
Thanks for the in-depth reply.
It seems particularly unlucky that the replacement drive I put in was faulty. Having just read into it a bit, I wasn't aware that in most RAID systems if one copy of the data becomes corrupt then even though it is duplicated, it'll probably still cause corruption. I guess that's what happened in this case, and probably the more high end systems have options in place to work around that.
Do you have a link to where the NetGear recovery service is? In my.netgear.com and 'purchase service contract' for my device I just see 'There are currently no service contracts available for this product' - I assumed because it was too old (6 years).
StephenB
Apr 24, 2018 · Guru - Experienced User
rabidh wrote:
It seems particularly unlucky that the replacement drive I put in was faulty. Having just read into it a bit, I wasn't aware that in most RAID systems if one copy of the data becomes corrupt then even though it is duplicated ...
In your case your NAS is using RAID-5. RAID-5 doesn't duplicate your data. Rather it uses parity blocks that allow it to reconstruct data when something is missing.
Putting this in mathematical terms: Imagine a 4-disk RAID-5 array. If disks 1, 2, and 3 have data blocks A, B, and C at sector N, then the fourth disk would have P=A+B+C in that sector. (It doesn't use normal addition, but does something else that has the same effect.) Then if disk 3 is replaced, the NAS reconstructs C using P-A-B.
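As a rough illustration of the "something else" mentioned above: RAID-5 parity is commonly computed with bytewise XOR, which acts as both the addition and the subtraction. The sketch below uses made-up 4-byte blocks and a hypothetical helper `xor_blocks`; it is a toy model, not how the NAS firmware is written, and a real array also rotates which disk holds the parity from stripe to stripe.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Data blocks at "sector N" on disks 1, 2 and 3 (illustrative values only).
a = b"AAAA"
b = b"BBBB"
c = b"CCCC"

# Parity block stored on the fourth disk: P = A xor B xor C.
p = xor_blocks(a, b, c)

# Disk 3 is replaced, so C is rebuilt from the parity and the surviving data.
rebuilt_c = xor_blocks(p, a, b)
print(rebuilt_c == c)
```

Because XOR is its own inverse, the same function serves for computing the parity and for rebuilding a missing block.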
This only works if the remaining disks can all be read (and all of them hold the correct data). If a disk can't be read during reconstruction, then the reconstruction fails (and the NAS knows that). If a disk is read, but gives the wrong data, then the reconstruction gives the wrong result (and the NAS has no way to detect that). Similarly, if the wrong data was somehow written to one of the disks in the first place (or if a disk write was lost), then the reconstruction will also give the wrong result (and there is no way to detect that).
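The silent-failure case can be shown with the same toy XOR model. This is a sketch under the assumption of plain XOR parity with no checksums; the block contents and the name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


a, b, c = b"AAAA", b"BBBB", b"CCCC"
p = xor_blocks(a, b, c)  # parity written while every disk was healthy

# Disk 1 later returns one wrong byte, without reporting any read error.
bad_a = b"AAAZ"

# Rebuilding C still "succeeds": no exception, and the length is right.
rebuilt_c = xor_blocks(p, bad_a, b)

print(rebuilt_c)       # the last byte no longer matches C
print(rebuilt_c == c)  # False, but the array itself has nothing to compare against
```

The error in the source block lands in the same byte position of the rebuilt block, and nothing in the parity arithmetic flags it.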
rabidh wrote:
it'll probably still cause corruption ... and probably the more high end systems have options in place to work around that.
Once corruption happens, then there is risk of data loss - that's just as true in high-end enterprise/cloud systems as it is in home NAS.
High-end systems have some features which can reduce the chance of corruption happening in the first place. For instance:
- Error-correcting RAM
- Dual Power Supplies to help ensure that a PSU failure doesn't result in lost writes.
- UPS protection
BTW, UPS protection is something I always recommend (for all NAS). Often data corruption occurs with unexpected power loss.
Also if you have more disks in the NAS, there are some advanced RAID modes that can handle more than one failed disk. There is a price for that (both reduction in capacity and lower performance). And they don't help if the wrong data is on one or more disks. They only help if the disk can't be read.
Newer OS-6 ReadyNAS (at all price points) do have some features that are relevant here. They have more scheduled maintenance functions that can detect issues sooner. They also use a newer file system called BTRFS, which supports built-in checksums that can detect corruption. That also gives those NAS some more sophisticated options for reconstruction.
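To show why checksums matter here, the sketch below stores a checksum next to each block, uses it to spot the block that has gone bad, and then rebuilds that block from parity and verifies the result. Everything in it is an assumption for illustration: `zlib.crc32` is only a stand-in checksum, `find_corrupt` and `xor_blocks` are hypothetical helpers, and the actual BTRFS checksum algorithm, on-disk layout, and repair logic on a ReadyNAS are different.

```python
import zlib
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def find_corrupt(blocks, checksums):
    """Return the indexes of blocks whose contents no longer match their checksum."""
    return [i for i, (blk, crc) in enumerate(zip(blocks, checksums))
            if zlib.crc32(blk) != crc]


# State when the data was written: blocks, their checksums, and the parity.
good = [b"AAAA", b"BBBB", b"CCCC"]
checksums = [zlib.crc32(blk) for blk in good]
parity = xor_blocks(*good)

# Later, disk 1 silently returns a wrong byte.
on_disk = [b"AAAZ", b"BBBB", b"CCCC"]

corrupt = find_corrupt(on_disk, checksums)
for i in corrupt:
    others = [blk for j, blk in enumerate(on_disk) if j != i]
    candidate = xor_blocks(parity, *others)
    # Accept the rebuilt block only if it matches the stored checksum.
    if zlib.crc32(candidate) == checksums[i]:
        on_disk[i] = candidate

print(corrupt, on_disk == good)
```

This toy repair only works when a single block in the stripe is bad; with two bad blocks the rebuilt candidate would fail its checksum and be left alone, which is still better than silently accepting wrong data.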
But for all storage (enterprise and home) the primary defense against data loss/corruption is to have independent backups: full copies of the data on other devices.
- rabidh · Apr 24, 2018 · Aspirant
Thanks - and you're totally right about the load issues, as it seems that one of the other disks just reported SMART errors as well - so that won't have helped the reconstruction either.
I do have a UPS, as well as a separate computer running scheduled rsync backups, and different makes and models of hard disk in the NAS to try and avoid 2 disks going at the same time - so it's still frustrating to have lost data. I guess I should have invested in more storage and rsynced *everything*, not just the super important data.
It sounds like OS6 with BTRFS and scheduled checks is a real improvement. It's just a shame older Netgear devices aren't kept updated - if there had been scheduled checks (or alert emails via gmail hadn't silently stopped working) then this most likely could have been avoided.
- rabidh · Apr 26, 2018 · Aspirant
I'm posting again here as it looks like the new post I started on this got locked/hidden/deleted somehow? https://community.netgear.com/t5/Using-your-ReadyNAS/quot-Status-Spare-Inactive-quot-on-previously-ok-drive/td-p/1558637
After rebooting the ReadyNAS comes up with all 3 drives showing 'Ok', and with no filesystem errors when a check is run. I'm able to copy data off just great.
However, some files (I'm not sure which ones) cause the ReadyNAS to drop one of the 3 drives - turning it from "Ok" to "Spare Inactive". No alerts are created in the console. After that, rsync fails on the majority of files with an input/output error - however a restart of the NAS shows no filesystem errors and everything starts working again.
Is there anything I can do to avoid this and get the ReadyNAS to keep all 3 drives in the array all the time while I copy the files off? Which log files should I look at to find why it's dropped the volume?
I have shell access to it and I'm a long-time Linux user (10+ years), so if this is a timeout or some setting that could be modified I'm happy to dig in.
What are my options here apart from ReclaiMe? As previously stated NetGear's support options (paid or not) are not available to me for some reason.
What if I copied all the partitions off all the drives to a new hard disk on my Linux PC (with dd conv=sync,noerror)? Is there enough metadata that mdadm could reconstruct the volumes automatically, or could I get the information needed to reconstruct from the ReadyNAS somehow?
Looking at mdstat I have several different RAID volumes using multiple partitions (I guess due to XRAID-2) so it's not going to be a matter of just setting up a single RAID5 array from them - looks like I'd have to link the 2 RAID5 arrays somehow (are they all just concatenated together?).
- StephenB · Apr 26, 2018 · Guru - Experienced User
The safest thing to do is to clone all three drives to new ones. Then power down the NAS, insert the three clones, and power up. Then your original disks remain completely intact (no chance of more issues).
You could alternatively just clone the drive that is dropping out. Then power down, swap the problem drive with the clone, and power up read-only using the boot menu.
Have you looked at the SMART stats on the drive that is dropping out? There should be something in the logs related to the drive health, mdadm issues, or btrfs issues.
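For anyone wondering what an error-tolerant clone does when it hits an unreadable sector, here is a minimal in-memory sketch of the behaviour behind the `dd conv=sync,noerror` options mentioned earlier in the thread: carry on after a read error, and pad the failed block with zero bytes so that everything after it stays at the right offset. The names `clone_blocks`, `read_block`, and `flaky_disk` are hypothetical, the 4-byte block size is only for readability, and this is a model of the idea rather than a tool for cloning real disks.

```python
def clone_blocks(read_block, num_blocks, block_size=4):
    """Copy num_blocks blocks; zero-fill any block that cannot be read."""
    image = bytearray()
    bad_blocks = []
    for index in range(num_blocks):
        try:
            data = read_block(index)
        except OSError:
            # "noerror": note the failure and keep going instead of aborting.
            bad_blocks.append(index)
            data = b""
        # "sync": pad short or failed reads with NUL bytes to keep offsets aligned.
        image += data.ljust(block_size, b"\x00")
    return bytes(image), bad_blocks


source = [b"AAAA", b"BBBB", b"CCCC"]


def flaky_disk(index):
    """Simulated disk whose second block is unreadable."""
    if index == 1:
        raise OSError("simulated unreadable sector")
    return source[index]


image, bad_blocks = clone_blocks(flaky_disk, len(source))
print(bad_blocks, image)
```

The zero-filled block is exactly the kind of "readable but wrong" data discussed above, so keeping the list of bad block numbers is what lets you tell afterwards which parts of the clone cannot be trusted.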