Forum Discussion
InteXX
Dec 29, 2014 | Luminary
XRAID2 vs RAID6, etc.
I've got an RN104 on the way and I'd like to better understand my configuration options. To start with, I have to admit that I'm planning on using 3x2TB Caviar Greens to start with (WD20EARX); I kn...
StephenB
Dec 31, 2014 | Guru - Experienced User
Here's my thinking (rather long-winded, I'm afraid...)
InteXX wrote:
StephenB wrote: there are scenarios where the bitrot protection approach wouldn't work
You've got me curious. Care to discuss?
Let's start with the causes of bitrot. The disk itself saves its own CRC codes for each block, and verifies them on every read. It's pretty unlikely that a data block will simply change and still have a valid CRC (though given the amount of disk storage in use, I wouldn't claim it has never happened anywhere). So let's keep "spontaneous" bitrot on the list, but treat it as a long shot.
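To make that concrete, here's a toy Python sketch, using zlib's CRC-32 purely as a stand-in for the drive's own per-sector ECC (which is much stronger in practice). A single flipped bit in a 4 KiB block is caught when the stored code is re-checked on read:

import os
import zlib

block = bytearray(os.urandom(4096))     # pretend this is the data in one 4 KiB sector
stored_crc = zlib.crc32(block)          # the drive keeps a code like this alongside the data

block[1000] ^= 0x01                     # one bit "rots" on the platter

assert zlib.crc32(block) != stored_crc  # the drive's own check catches it on the next read
print("bit flip detected by the stored CRC")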
Other potential causes include:
- (a) a network error that delivered wrong data to the NAS (with a valid ethernet CRC). No reason to think the network can't rot data too.
- (b) a memory failure in the NAS or in the disk's own internal cache that corrupted the queued block before it was written.
- (c) a failure of some kind that prevented the blocks from being written at all. For instance a power failure, or a disk controller bug.
- (d) one or more disks with read failures were cloned, so there is bad data on the disks, but all of it can be read.
Of course there could be more causes I am missing.
Some of these might not fit your definition of bitrot, but they all have the property that readable but wrong data ends up on the disk drive, so without knowing the cause they are indistinguishable from each other. And the disk cloning scenario is quite common with RAID array repair, so including it in the mix is worthwhile even if it isn't actually bitrot.
I'll assume RAID-5, but the reasoning easily extends to RAID-6.
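As a quick refresher, here's a toy Python sketch of the XOR parity idea behind RAID-5 (a made-up three-data-disk group; RAID-6 adds a second, differently computed parity so it can rebuild two missing blocks instead of one):

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# one parity group: three data blocks plus their XOR parity
d0, d1, d2 = (os.urandom(16) for _ in range(3))
parity = xor_blocks(d0, d1, d2)

# lose (or distrust) any single block and it can be rebuilt from the rest
rebuilt = xor_blocks(d0, d2, parity)
assert rebuilt == d1
print("one missing block recovered from the other blocks plus parity")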
Taking these cases in turn:
(a) If the data the NAS was told to write was rotted by the network before it even reached the NAS, then clearly there isn't anything the NAS can do on its own to correct it. So the method surely fails in that case.
(b) If a memory failure in the NAS occurred before the parity block was recomputed, then there will be wrong data on the disk - but the parity can't help, because the parity was also computed using the same wrong data. If the memory failure corrupts the checksum, then the method also fails. But if the failure occurred after the parity block was computed and the checksum is valid, then the method will likely work.
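Here's that case (b) timing problem as a toy sketch (same XOR idea as above, nothing btrfs-specific): because the parity was computed from the already-corrupted block, data and parity agree with each other, so a scrub sees nothing wrong and a parity rebuild just hands the bad block back.

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

good = [os.urandom(16) for _ in range(3)]

# a memory error corrupts block 1 *before* the parity is computed
written = list(good)
written[1] = bytes([written[1][0] ^ 0xFF]) + written[1][1:]
parity = xor_blocks(*written)               # parity computed from the already-bad data

# a scrub finds nothing: data XOR parity comes out all zeros, i.e. "consistent"
assert xor_blocks(*written, parity) == bytes(16)

# and "repairing" block 1 from parity just reproduces the corrupted block
assert xor_blocks(written[0], written[2], parity) == written[1]
assert written[1] != good[1]
print("the parity agrees with the corrupted data, so it can't help here")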
(c) If the failure that prevented blocks from being written affects two or more blocks in the same group, then RAID repair can't help you either. If the correct checksum isn't written, then it also fails. If it affects just a single block in the group, then it likely works.
(d) In the cloned case, each parity group contains just one block from the cloned disk. If the other blocks in each group (on the other disks) are intact, then the RAID repair should work. If there is a read error or bitrot on another block in the group (e.g., a different disk is also ailing), then it will fail. Multiple disk failures do happen; there are plenty of posts here from users who had it happen to them.
If you needed to clone more than one disk, there are bad blocks on multiple disks. With luck, the bad blocks will be in different groups. In that case the method works. But if they aren't in different groups, then the technique fails.
In all cloned cases, if a bad block holds the checksum, the method fails to repair any other bad blocks covered by that checksum.
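Putting case (d) into checklist form, here's a rough sketch with made-up bad-block maps (the groups and disk numbers are just for illustration): a group is repairable only if at most one of its blocks is bad, and the checksum approach additionally needs the checksum for that group's data to be intact.

# which parity groups hold a readable-but-wrong block, per disk
# (made-up numbers: disk 2 is the cloned one, and disk 0 also turned out to be ailing)
bad_groups = {
    0: {7},         # second ailing disk: one bad block, in group 7
    1: set(),       # healthy disk
    2: {3, 7, 9},   # cloned disk: bad blocks landed in groups 3, 7 and 9
}
bad_checksum_groups = {9}   # suppose group 9's checksum itself sat on a bad block

for group in sorted(set().union(*bad_groups.values())):
    bad_copies = sum(group in blocks for blocks in bad_groups.values())
    if bad_copies > 1:
        print(f"group {group}: two or more bad blocks, single parity can't rebuild it")
    elif group in bad_checksum_groups:
        print(f"group {group}: the checksum itself is bad, so nothing triggers a repair")
    else:
        print(f"group {group}: one bad block and a good checksum, the repair should work")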
(e) Going back to the original "spontaneous" rot case (where the data appears to change on its own on the disk drive), the technique requires that
- there is no more than one bad (but readable) data block in the group, and all the other blocks (including parity) are readable and correct.
- the file checksum is correct.
The technique will fail when these conditions aren't met. For instance:
- if two blocks in the group both rotted.
- if one block and the checksum are both rotted.
- if only one block is rotted, but there is a read error on another block in the group.
- if you do a RAID scrub before you detect the checksum error, the scrub will "fix" the parity block using the rotted data, so the method will fail when the checksum error is detected later.
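That last scrub scenario is worth a sketch of its own (toy XOR parity again, not the real md/btrfs code): once the scrub has "fixed" the parity to match the rotted block, a later checksum-triggered rebuild can only hand the rotted data back.

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = (os.urandom(16) for _ in range(3))
parity = xor_blocks(d0, d1, d2)               # everything written correctly at first

d1_rotted = bytes([d1[0] ^ 0xFF]) + d1[1:]    # then block 1 rots on disk

# while the original parity survives, a rebuild could still recover the good block
assert xor_blocks(d0, d2, parity) == d1

# but a scrub runs first: it sees data and parity disagree, and "fixes" the parity
parity = xor_blocks(d0, d1_rotted, d2)

# now a later checksum mismatch triggers a rebuild, but the good copy is already gone
assert xor_blocks(d0, d2, parity) == d1_rotted
print("the scrub rewrote the parity from rotted data; only the rotted block can be rebuilt")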
So my conclusions:
- Most of the scenarios above sound pretty improbable (disk cloning aside, which is intentional). The one we are most likely to see in practice is a failure of one or more blocks to be written in the first place. If you had truly massive amounts of data (e.g., the amount Google or Facebook have), then seeing some of the other causes becomes more plausible.
- Corollary: if the most probable cause is queued blocks not being written, then clean shutdowns lower the odds of bitrot substantially, and using a UPS helps ensure clean shutdowns.
- If you do need to clone disks as part of a RAID repair, you should restore as many files from other copies (e.g., backups) as you can after the volume is restored.
- Using a btrfs checksum to trigger RAID repair will work sometimes, but will fail other times. It seems unlikely to make things worse in the cases where it fails. It's worth enabling, because reducing the amount of corruption in the volume is a good thing, even if you can't do it 100%.
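For what it's worth, here's roughly what I mean by using the checksum to trigger the repair, as a toy single-parity sketch (my own illustration, not the actual btrfs or md code): when the file checksum fails, try substituting each block in turn with its parity reconstruction and keep the combination that makes the checksum pass. It works exactly when one readable-but-wrong block is the only problem in the group.

import hashlib
import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def checksum(blocks):
    return hashlib.sha256(b"".join(blocks)).digest()

# at write time: three data blocks, their XOR parity, and a file checksum
data = [os.urandom(16) for _ in range(3)]
parity = xor_blocks(*data)
good_sum = checksum(data)

# later: block 1 rots on disk (readable but wrong); parity and checksum are intact
on_disk = list(data)
on_disk[1] = bytes([on_disk[1][0] ^ 0xFF]) + on_disk[1][1:]

if checksum(on_disk) != good_sum:             # the checksum flags the rot...
    for i in range(len(on_disk)):             # ...so try rebuilding each block from parity
        others = [b for j, b in enumerate(on_disk) if j != i]
        candidate = list(on_disk)
        candidate[i] = xor_blocks(*others, parity)
        if checksum(candidate) == good_sum:   # keep the rebuild that satisfies the checksum
            on_disk = candidate
            print(f"block {i} rebuilt from parity, checksum is good again")
            break
    else:
        print("no single-block rebuild satisfies the checksum; the repair fails")

assert on_disk == data

With two rotted blocks in the same group, no single substitution will satisfy the checksum, which is exactly the failure mode listed above.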