Forum Discussion
InteXX
Dec 29, 2014 | Luminary
XRAID2 vs RAID6, etc.
I've got an RN104 on the way and I'd like to better understand my configuration options. To start with, I have to admit that I'm planning on using 3x2TB Caviar Greens to start with (WD20EARX); I kn...
StephenB
Dec 31, 2014 | Guru - Experienced User
Here's my thinking (rather long-winded, I'm afraid...)
InteXX wrote:
StephenB wrote: there are scenarios where the bitrot protection approach wouldn't work
You've got me curious. Care to discuss?
Let's start with the causes of bitrot. The disk itself saves its own CRC codes for each block, and verifies them on every read. It's pretty unlikely that a data block will simply change and still have a valid CRC (though given the amount of disk storage in use, I wouldn't claim it has never happened anywhere). So let's keep "spontaneous" bitrot on the list, but treat it as a long shot.
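To make that concrete, here's a toy Python sketch, using zlib's CRC-32 purely as a stand-in for the drive's own per-sector ECC (which is much stronger in practice). A single flipped bit in a 4 KiB block is caught when the stored code is re-checked on read:

import os
import zlib

block = bytearray(os.urandom(4096))     # pretend this is the data in one 4 KiB sector
stored_crc = zlib.crc32(block)          # the drive keeps a code like this alongside the data

block[1000] ^= 0x01                     # one bit "rots" on the platter

assert zlib.crc32(block) != stored_crc  # the drive's own check catches it on the next read
print("bit flip detected by the stored CRC")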
Other potential causes include:
- (a) a network error that delivered wrong data to the NAS (with a valid ethernet CRC). No reason to think the network can't rot data too.
- (b) a memory failure in the NAS or in the disk's own internal cache that corrupted the queued block before it was written.
- (c) a failure of some kind that prevented the blocks from being written at all. For instance a power failure, or a disk controller bug.
- (d) one or more disks with read failures were cloned, so there is bad data on the disks, but all of it can be read.
Of course there could be more causes I am missing.
Some of these might not fit your definition of bitrot, but they all have the property that readable but wrong data ends up on the disk drive, so without knowing the cause they are indistinguishable from each other. And the disk cloning scenario is quite common with RAID array repair, so including it in the mix is worthwhile even if it isn't actually bitrot.
I'll assume RAID-5, but the reasoning easily extends to RAID-6.
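As a quick refresher, here's a toy Python sketch of the XOR parity idea behind RAID-5 (a made-up three-data-disk group; RAID-6 adds a second, differently computed parity so it can rebuild two missing blocks instead of one):

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# one parity group: three data blocks plus their XOR parity
d0, d1, d2 = (os.urandom(16) for _ in range(3))
parity = xor_blocks(d0, d1, d2)

# lose (or distrust) any single block and it can be rebuilt from the rest
rebuilt = xor_blocks(d0, d2, parity)
assert rebuilt == d1
print("one missing block recovered from the other blocks plus parity")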
Taking these cases in turn:
(a) If the data the NAS was told to write was rotted by the network before it even reached the NAS, then clearly there isn't anything the NAS can do on its own to correct it. So the method surely fails in that case.
(b) If a memory failure in the NAS occurred before the parity block was recomputed, then there will be wrong data on the disk - but the parity can't help, because the parity was also computed using the same wrong data. If the memory failure corrupts the checksum, then the method also fails. But if the failure occurred after the parity block was computed and the checksum is valid, then the method will likely work.
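Here's that case (b) timing problem as a toy sketch (same XOR idea as above, nothing btrfs-specific): because the parity was computed from the already-corrupted block, data and parity agree with each other, so a scrub sees nothing wrong and a parity rebuild just hands the bad block back.

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

good = [os.urandom(16) for _ in range(3)]

# a memory error corrupts block 1 *before* the parity is computed
written = list(good)
written[1] = bytes([written[1][0] ^ 0xFF]) + written[1][1:]
parity = xor_blocks(*written)               # parity computed from the already-bad data

# a scrub finds nothing: data XOR parity comes out all zeros, i.e. "consistent"
assert xor_blocks(*written, parity) == bytes(16)

# and "repairing" block 1 from parity just reproduces the corrupted block
assert xor_blocks(written[0], written[2], parity) == written[1]
assert written[1] != good[1]
print("the parity agrees with the corrupted data, so it can't help here")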
(c) If the failure that prevented blocks from being written affects two or more blocks in the same group, then RAID repair can't help you either. If the correct checksum isn't written, then it also fails. If it affects just a single block in the group, then it likely works.
(d) In the cloned case, each parity group contains just one block from the cloned disk. If the other blocks in each group (on the other disks) are intact, then the RAID repair should work. If there is a read error or bitrot on another block in the group (e.g., a different disk is also ailing), then it will fail. Multiple disk failures do happen; there are plenty of posts here from users who had it happen to them.
If you needed to clone more than one disk, there are bad blocks on multiple disks. With luck, the bad blocks will be in different groups. In that case the method works. But if they aren't in different groups, then the technique fails.
In all cloned cases, if a bad block holds the checksum, the method fails to repair any other bad blocks covered by that checksum.
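Putting case (d) into checklist form, here's a rough sketch with made-up bad-block maps (the groups and disk numbers are just for illustration): a group is repairable only if at most one of its blocks is bad, and the checksum approach additionally needs the checksum for that group's data to be intact.

# which parity groups hold a readable-but-wrong block, per disk
# (made-up numbers: disk 2 is the cloned one, and disk 0 also turned out to be ailing)
bad_groups = {
    0: {7},         # second ailing disk: one bad block, in group 7
    1: set(),       # healthy disk
    2: {3, 7, 9},   # cloned disk: bad blocks landed in groups 3, 7 and 9
}
bad_checksum_groups = {9}   # suppose group 9's checksum itself sat on a bad block

for group in sorted(set().union(*bad_groups.values())):
    bad_copies = sum(group in blocks for blocks in bad_groups.values())
    if bad_copies > 1:
        print(f"group {group}: two or more bad blocks, single parity can't rebuild it")
    elif group in bad_checksum_groups:
        print(f"group {group}: the checksum itself is bad, so nothing triggers a repair")
    else:
        print(f"group {group}: one bad block and a good checksum, the repair should work")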
(e) Going back to the original "spontaneous" rot case (where the data appears to change on its own on the disk drive), the technique requires that
- there is no more than one bad (but readable) data block in the group, and all the other blocks (including parity) are readable and correct.
- the file checksum is correct.
The technique will fail when these conditions aren't met. For instance:
- if two blocks in the group both rotted.
- if one block and the checksum are both rotted.
- if only one block is rotted, but there is a read error on another block in the group.
- if you do a RAID scrub before you detect the checksum error, the scrub will "fix" the parity block using the rotted data, so the method will fail when the checksum error is detected later.
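That last scrub scenario is worth a sketch of its own (toy XOR parity again, not the real md/btrfs code): once the scrub has "fixed" the parity to match the rotted block, a later checksum-triggered rebuild can only hand the rotted data back.

import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

d0, d1, d2 = (os.urandom(16) for _ in range(3))
parity = xor_blocks(d0, d1, d2)               # everything written correctly at first

d1_rotted = bytes([d1[0] ^ 0xFF]) + d1[1:]    # then block 1 rots on disk

# while the original parity survives, a rebuild could still recover the good block
assert xor_blocks(d0, d2, parity) == d1

# but a scrub runs first: it sees data and parity disagree, and "fixes" the parity
parity = xor_blocks(d0, d1_rotted, d2)

# now a later checksum mismatch triggers a rebuild, but the good copy is already gone
assert xor_blocks(d0, d2, parity) == d1_rotted
print("the scrub rewrote the parity from rotted data; only the rotted block can be rebuilt")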
So my conclusions:
- Most of the scenarios above sound pretty improbable (disk cloning aside, which is intentional). The one we are most likely to see in practice is a failure of one or more blocks to be written in the first place. If you had truly massive amounts of data (e.g., the amount Google or Facebook have), then seeing some of the other causes becomes more plausible.
- Corollary: if the most probable cause is queued blocks not being written, then clean shutdowns lower the odds of bitrot substantially, and using a UPS helps ensure clean shutdowns.
- If you do need to clone disks as part of a RAID repair, you should restore as many files from other copies (e.g., backups) as you can after the volume is restored.
- Using a btrfs checksum to trigger RAID repair will work sometimes, but will fail other times. It seems unlikely to make things worse in the cases where it fails. It's worth enabling, because reducing the amount of corruption in the volume is a good thing, even if you can't do it 100%.
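For what it's worth, here's roughly what I mean by using the checksum to trigger the repair, as a toy single-parity sketch (my own illustration, not the actual btrfs or md code): when the file checksum fails, try substituting each block in turn with its parity reconstruction and keep the combination that makes the checksum pass. It works exactly when one readable-but-wrong block is the only problem in the group.

import hashlib
import os

def xor_blocks(*blocks):
    # XOR equal-length blocks together, byte by byte
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

def checksum(blocks):
    return hashlib.sha256(b"".join(blocks)).digest()

# at write time: three data blocks, their XOR parity, and a file checksum
data = [os.urandom(16) for _ in range(3)]
parity = xor_blocks(*data)
good_sum = checksum(data)

# later: block 1 rots on disk (readable but wrong); parity and checksum are intact
on_disk = list(data)
on_disk[1] = bytes([on_disk[1][0] ^ 0xFF]) + on_disk[1][1:]

if checksum(on_disk) != good_sum:             # the checksum flags the rot...
    for i in range(len(on_disk)):             # ...so try rebuilding each block from parity
        others = [b for j, b in enumerate(on_disk) if j != i]
        candidate = list(on_disk)
        candidate[i] = xor_blocks(*others, parity)
        if checksum(candidate) == good_sum:   # keep the rebuild that satisfies the checksum
            on_disk = candidate
            print(f"block {i} rebuilt from parity, checksum is good again")
            break
    else:
        print("no single-block rebuild satisfies the checksum; the repair fails")

assert on_disk == data

With two rotted blocks in the same group, no single substitution will satisfy the checksum, which is exactly the failure mode listed above.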