ScottChapman
Apprentice, Dec 10, 2014
How does bitrot protection actually work?
I understand the concept, but I am curious how it is actually implemented in 6.2.0.
45 Replies
- StephenB (Guru - Experienced User)
Well, I am very skeptical that bad non-ECC memory would magically fail systematically on the checksum and nothing else.
Having said that, I believe Netgear's implementation repairs bitrot using the RAID protection. So if the checksum fails to validate, it attempts to rebuild the sector that failed from the other RAID blocks in that stripe. If that doesn't result in a checksum that passes, then the bitrot repair fails. That sounds similar to the way you describe ZFS.
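As an illustration only (Netgear has not published its algorithm), the flow described above can be sketched in Python. The sketch assumes a RAID-5 style stripe with a single XOR parity block and uses `zlib.crc32` as a stand-in checksum; the function names `xor_blocks` and `read_with_repair` are hypothetical and not part of any ReadyNAS or BTRFS API.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_with_repair(stripe, parity, index, expected_crc):
    """Return (data, status) for stripe[index], rebuilding from parity if needed."""
    block = stripe[index]
    if zlib.crc32(block) == expected_crc:
        return block, "ok"
    # Checksum mismatch: rebuild the block from the other data blocks plus parity.
    others = [b for i, b in enumerate(stripe) if i != index]
    candidate = xor_blocks(others + [parity])
    if zlib.crc32(candidate) == expected_crc:
        stripe[index] = candidate  # rewrite only after the rebuilt block verifies
        return candidate, "repaired"
    # The rebuilt block does not verify either, so the repair fails.
    return None, "unrecoverable"

# Hypothetical three-block stripe with one byte flipped "on disk".
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)
crcs = [zlib.crc32(b) for b in stripe]
stripe[1] = b"BBxB"
print(read_with_repair(stripe, parity, 1, crcs[1]))  # (b'BBBB', 'repaired')
```

Note that the rebuilt block is only written back after its checksum matches. If the parity or another block in the stripe is also damaged, the function reports the block as unrecoverable instead of overwriting it.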
Of course, once either approach finds something good to use, that pesky bad memory might just corrupt it before it gets rewritten. So I am sticking with my position that bad memory can corrupt any file system. It seems to me that the simplest way to corrupt the volume with bad memory is on the initial write (the data being corrupted in memory before it is ever written to the disk).
If you are concerned about the impact of non-ECC memory, then perhaps buy a ReadyNAS that has ECC (e.g., the RN516).
- sgogo (Aspirant)
Yes, but there has been discussion that non-ECC memory could corrupt an entire volume systematically.
During a scrub, the data is regularly checked for bitrot, and if the memory is bad, it would calculate bitrot (incorrectly) and then replace good data with bad data.
ZFS won't do that based on the way it replaces the "bad" memory spot (it would have to find something it calculates as "good" to replace the bad, and bad memory would never find something "good" to use).
But how does Netgear's implementation of BTRFS do that?
- StephenB (Guru - Experienced User)
Bad memory in the NAS can always corrupt the file system. It doesn't matter what the file system is (ZFS, BTRFS, EXT, ...).
- sgogo (Aspirant)
Do you guys know if using non-ECC memory with BTRFS and bitrot protection "on" has the ability to damage good data?
There has always been some discussion with ZFS that a scrub with non-ECC memory could potentially rewrite the entire drive with bad data. I do not think that is true based on the method ZFS uses to replace data... it will not overwrite unless the new data is confirmed good.
However, I do not understand the methodology for BTRFS scrubbing... is it the same as ZFS? Could a bad memory module without ECC cause BTRFS to scrub the disk(s) with bad data?
- StephenB (Guru - Experienced User)
I believe checksums are enabled/disabled at the volume level only (at least I am not seeing any subvolume controls). If so, then BTRFS should alert you to checksum failures.
I run JBOD also, but have bitrot protection and snapshots enabled on shares. (If you want CoW on all the time, then you do need bitrot protection enabled, since the features are coupled.)
- anonym (Aspirant)
Hi, I've got an RN102 with only one disk installed.
The NAS initialized with a single volume under XRAID2 JBOD.
Checksums are enabled on the volume because the default set of shares have bitrot protection enabled.
On one of the shares, I disabled bitrot protection (since bitrot can't be fixed without a redundant copy), set snapshots to never, and then restored data onto it over USB.
I presume BTRFS has created checksums for these files? Does that mean that, even with bitrot protection disabled, BTRFS will detect bitrot and alert me to the problem?
Is there any benefit in enabling bitrot protection in this scenario?
Or should I be switching checksums off as well? (although I might add a second disk later...then I might have the option of enabling automatic bitrot protection on the existing data)
Thanks
Paul.
- BaJohn (Virtuoso)
Thanks Stephen.
So to answer my own question:-
"If I put data on my ReadyNAS (with snapshots) and never updated it for 5 years, would bitrot be detected?"
Yes BUT only when I go to read some data, and more significantly only those blocks that are being read would be checked.
Then the BTRFS passes the error to the RNOS which (in my case - RAID10) would go off and repair with data from the mirror.
Thanks again.
- StephenB (Guru - Experienced User)
BTRFS generates the checksums on writes, and verifies them on reads. This feature is built into BTRFS itself.
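To make the write/read behavior concrete, here is a minimal Python sketch of a block store that records a checksum on every write and verifies it on every read. It is a toy model, not how BTRFS is implemented: the class name `ChecksummedStore` and its methods are hypothetical, and `zlib.crc32` stands in for the CRC32C checksum that BTRFS uses by default.

```python
import zlib

class ChecksummedStore:
    """Toy block store: checksum recorded on write, verified on read."""

    def __init__(self):
        self.blocks = {}      # block number -> data as stored "on disk"
        self.checksums = {}   # block number -> checksum recorded at write time

    def write(self, block_no, data):
        self.blocks[block_no] = bytes(data)
        self.checksums[block_no] = zlib.crc32(data)

    def read(self, block_no):
        data = self.blocks[block_no]
        if zlib.crc32(data) != self.checksums[block_no]:
            raise IOError(f"checksum mismatch on block {block_no}")
        return data

    def scrub(self):
        """Verify every block and return the numbers of those that fail."""
        return [n for n, d in self.blocks.items()
                if zlib.crc32(d) != self.checksums[n]]

store = ChecksummedStore()
store.write(0, b"holiday photos")
store.write(1, b"tax records")
store.blocks[1] = b"tax recordz"   # simulate silent corruption on disk
print(store.read(0))               # b'holiday photos'
print(store.scrub())               # [1]
```

In this model, corruption in block 1 goes unnoticed until that block is read or a scrub walks every block, which matches the point made earlier in the thread that only the blocks being read are checked.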
Netgear's protection algorithm is their own, and they aren't saying much about how it works, other than what I said above. It's a unique feature for Netgear, and my guess is that they want to keep it that way.
There is a similar bitrot protection being built into BTRFS (using a raid-like mode that is integrated into the file system). But that raid-like mode is still experimental, and OS6 is using traditional software raid instead.
- BaJohn (Virtuoso)
StephenB wrote: "There are several posts on this topic, mostly speculative on the details."
Hence my comments in this forum about having a definitive technical source.
StephenB wrote: "What we know for sure is that
(a) BTRFS includes a checksum feature, and that is enabled when bitrot protection is on.
(b) if a checksum error occurs, then bitrot is detected, and the Netgear algorithm attempts to repair it from the RAID parity blocks. This is different from normal RAID repair, which is triggered by a read failure (not a checksum error).
One or two users have reported cases where this algorithm failed to recover data with a correct checksum. So far I have not seen any users reporting a case where it did recover data."
BUT what prompts a checksum error to be discovered?
i.e., does BTRFS regularly do checksum testing unprompted? Is it only on a data write? Is it on a data read? etc.
- StephenB (Guru - Experienced User)
There are several posts on this topic, mostly speculative on the details.
What we know for sure is that
(a) BTRFS includes a checksum feature, and that is enabled when bitrot protection is on.
(b) if a checksum error occurs, then bitrot is detected, and the Netgear algorithm attempts to repair it from the RAID parity blocks. This is different from normal RAID repair, which is triggered by a read failure (not a checksum error).
One or two users have reported cases where this algorithm failed to recover data with a correct checksum. So far I have not seen any users reporting a case where it did recover data.