ScottChapman (Apprentice)
Dec 10, 2014
How does bitrot protection actually work?
I understand the concept, but am curious how it is actually implemented on 6.2.0
45 Replies
Just got an RN316 and found this great thread. Thanks everyone for your helpful discussion of technical details. I plan on running a memtest any time before scrubbing.
I was also wondering in which scenarios bit rot protection would work, especially the scenarios snakyjake proposed in message 6.
Has anyone managed to find out any info about this from Netgear? It would be nice to know specifically if it will ever verify against checksums of data in snapshots, and not just RAID parity.
Is there a notification whenever there's a checksum mismatch, not just when an error couldn't be corrected, or is only a log entry generated in that scenario?
I did find a document at an EU Netgear site where it explains that BTRFS corrupt checksum events are sent to the md layer to find the correct data as mdgm said (thanks also for that information about use cases): ReadyNAS_Bit_Rot_Protection_Overview.pdf
This sheds a little more light on the mechanism, but not necessarily the technicals. Hope it gives slightly more clarity to the OP's question.
- sgogo (Aspirant)
Mdgm-
That is great news! Thanks for following up!
SteveG
- mdgm-ntgr (NETGEAR Employee Retired)
Actually I enquired with one of our product engineers and had a clarification:
If there is a filesystem checksum mismatch, we try to re-assemble that RAID stripe in different ways until we get a checksum match. If we never get a checksum match, we give up and inform the user that we detected an error but couldn't correct it, as you've seen reported elsewhere. We never generate data to make the data match the checksum. It's about as safe as it can get.
- pec967 (Luminary)
Can bit rot protection be enabled on Home folders in OS 6.4.x? Unlike shared folders, there is not a check box to enable bit rot protection on Home folders.
I replaced a ReadyNAS Duo v1 with a RN312 in a RAID 1 configuration last year for home use. I only use the RN312 for user backups, and these backup files are written to folders in the users' Home directories. I then use ReadyNAS Vault to provide an off-site backup. I would certainly like to enable bit rot protection since many of the photos and music files in these incremental backups are static. Perhaps I need to create shared folders and then adjust the permissions to only allow access by the individual users?
I don't understand why ReadyNAS does not share more information on exactly how checksums, bit rot, and RAID reconstruction with checksums work in OS 6 for RAID 1, 5, and 6. While the information in this thread is helpful, I still have a number of questions. In the enterprise storage array space, vendors like NetApp have always published in the open literature the specifics of how their checksums (data and metadata), file identity blocks, RAID scrubbing, and bit rot handling work. For example, a USENIX paper in 2008 reported data for three years on 1.5 million disk drives in NetApp storage arrays at customer sites. Over this time, they found 400,000 checksum errors, of which 8% were discovered during RAID reconstruction, often leading to data loss. The file identity blocks identified an order of magnitude fewer errors, due to things like lost or misdirected writes. The superior error handling performance of OS 6 is a selling point for ReadyNAS, particularly given the slower performance for the price compared to the competition, and you should step up and let your customers understand exactly how it works.
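For readers who want something concrete, here is a minimal Python sketch of the behaviour mdgm describes above: on a filesystem checksum mismatch, re-assemble the RAID stripe in different ways until the checksum matches, and give up (without inventing data) if it never does. This is not NETGEAR's code. The names `xor_blocks`, `fs_checksum` and `read_stripe` are made up for illustration, the stripe is modelled as single-parity (RAID 5 style) XOR, and SHA-256 is only a stand-in for whatever checksum the filesystem actually uses.

```python
import hashlib
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as RAID 5 style parity does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def fs_checksum(data_blocks):
    """Stand-in for the filesystem checksum over the stripe's data (assumption)."""
    return hashlib.sha256(b"".join(data_blocks)).digest()


def read_stripe(data_blocks, parity, expected):
    """Return (blocks, rebuilt_index), or (None, None) if nothing matches."""
    data_blocks = list(data_blocks)
    # Normal case: the data as read already matches the stored checksum.
    if fs_checksum(data_blocks) == expected:
        return data_blocks, None
    # Otherwise rebuild each data block in turn from the others plus parity
    # and accept only a reassembly whose checksum matches.
    for i in range(len(data_blocks)):
        others = data_blocks[:i] + data_blocks[i + 1:]
        candidate = list(data_blocks)
        candidate[i] = xor_blocks(others + [parity])
        if fs_checksum(candidate) == expected:
            return candidate, i
    # No reassembly matched: report the error, never fabricate data.
    return None, None


good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(good)
expected = fs_checksum(good)

damaged = [b"AAAA", b"BxBB", b"CCCC"]  # one silently corrupted block
print(read_stripe(damaged, parity, expected))
```

In this model a single rotted block is recovered from parity, while two damaged blocks in the same stripe leave no reassembly that matches, so the read is reported as uncorrectable.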
- sgogo (Aspirant)
Thanks mdgm!
- mdgm-ntgr (NETGEAR Employee Retired)
sgogo wrote:
Just so I am clear, the process would be that the bit rot protection routine first checks the primary data, then, if it finds an error, it goes out to the redundant data location and checks THAT data.
It will only write from the redundant location to the original location if it finds a correct checksum at the redundant location.
If it finds an incorrect checksum at both locations (it will find both locations incorrect, since the memory is defective) then no data is written and an error is generated.
This is the way the ZFS system works and inspires confidence. Do I have it correct?
Yes
- sgogo (Aspirant)
mdgm wrote: I don't think that's possible. If the checksums at the filesystem level are all bad then one would expect the checksums at the md level to all be bad as well.
I think I understand...
Just so I am clear, the process would be that the bit rot protection routine first checks the primary data, then, if it finds an error, it goes out to the redundant data location and checks THAT data.
It will only write from the redundant location to the original location if it finds a correct checksum at the redundant location.
If it finds an incorrect checksum at both locations (it will find both locations incorrect, since the memory is defective) then no data is written and an error is generated.
This is the way the ZFS system works and inspires confidence. Do I have it correct?
mdgm wrote: In any case bitrot protection is a great feature, but backups are still important. No important data should be stored on just the one device.
I am with you. Minimum of three (3) copies with at least one off site.
However, you can easily corrupt multiple copies if your primary source gets damaged by the file system and you do not know. As an example:
-On day 1, I have (3) 1TB drives A, B, & C with the same info.
-On day 2, copy A is damaged systematically by the file system without me knowing (but the drive is fine with no SMART errors).
-On day 3, drive B fails in the normal way, so I copy my data from drive A to drive B.
- mdgm-ntgr (NETGEAR Employee Retired)
I don't think that's possible. If the checksums at the filesystem level are all bad then one would expect the checksums at the md level to all be bad as well.
In any case bitrot protection is a great feature, but backups are still important. No important data should be stored on just the one device.
- sgogo (Aspirant)
StephenB wrote: ...
If you are concerned about the impact of non-ecc memory, then perhaps buy a readynas that has ecc (e.g., the RN516)
I already have 3 ReadyNAS without ECC, so that is not an option until my next purchase.
I understand that bad memory can corrupt anything it writes, but most of my data is write once, save a long time, read often. Business records, photos, etc.
My concern is turning the bit rot protection on and then, due to bad memory, having every checksum fail during a scrub. This could conceivably cause a re-write of an entire disk with the bad memory. Then in one shot I have corrupted everything.
What do you think?
- StephenB (Guru - Experienced User)
I don't know for sure, but I believe corrected errors are logged.
- BaJohn (Virtuoso)
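To make the scrub worry above concrete, here is a small Python sketch of the process sgogo outlines and mdgm confirms earlier in the thread: a copy is only overwritten from a redundant copy that itself passes the checksum, and if no copy passes, nothing is written and an error is reported. It models a two-copy mirror only. `digest`, `scrub_block` and the `compute` parameter are hypothetical names, SHA-256 is a stand-in checksum, and "bad memory" is simulated very crudely as a checksum routine that always returns garbage; real faulty RAM can misbehave in other ways this does not cover.

```python
import hashlib


def digest(block):
    """Stand-in checksum function (assumption, not the real on-disk format)."""
    return hashlib.sha256(block).digest()


def scrub_block(copies, stored_digest, compute=digest):
    """Scrub one mirrored block. Returns (copies_after_scrub, status)."""
    copies = list(copies)
    # Find which copies verify against the stored checksum.
    good = [i for i, c in enumerate(copies) if compute(c) == stored_digest]
    if len(good) == len(copies):
        return copies, "ok"
    if not good:
        # No copy verifies: write nothing, just report the error.
        return copies, "uncorrectable"
    # At least one copy verifies: repair the bad copies from it.
    source = copies[good[0]]
    for i in range(len(copies)):
        if i not in good:
            copies[i] = source
    return copies, "repaired"


data = b"business records"
stored = digest(data)
rotted = b"busineSs records"  # one flipped character on the primary copy

print(scrub_block([rotted, data], stored)[1])

# Crude model of defective memory: every checksum computation comes out wrong.
faulty = lambda block: b"\x00" * 32
print(scrub_block([data, data], stored, compute=faulty)[1])
```

Under these assumptions, a scrub where every checksum fails ends with every block reported as uncorrectable and the disks left as they were, rather than with the whole disk rewritten.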
StephenB wrote: If you are concerned about the impact of non-ecc memory, then perhaps buy a readynas that has ecc (e.g., the RN516)
Just to satisfy my curiosity, as I have an RN516:
If there was an error in the ECC memory, would I know about it, or would it be hidden from the user?