Forum Discussion
Dewdman42
Aug 27, 2025 · Virtuoso
Checksumming outside of RAID?
I am just curious if anyone here is doing any supplemental checksum checking of their data beyond what btrfs does, for example to check the data before backing it up to an external source...or compar...
- Sep 01, 2025
After much reflection I have decided that using some kind of filesystem-level checksumming is not a bad idea, but only on data that I don't expect to change. I will refer to that here as "archive" data. For that, a simple md5sum checksum saved somewhere will suffice; it can be created with a bash script, should be created once per file, and should never change (unless the file is actually updated). That way, at any time I can check whether the file is still as it was originally saved.
There are a variety of ways a file could be corrupted besides just bit rot. Bit rot will, in theory, be found and maybe corrected by btrfs. But other kinds of corruption, from a variety of problems, might end up looking like perfectly legitimate file changes at the block level and would not be spotted. That is where filesystem-level checksums come in. If I keep these on my important "archive" data and include them in all backups, then at some point in the future I can always check the data and my backups to see which copies are still good.
Periodically checking them is not a bad idea, to make sure a corrupted file isn't being backed up over the top of a good backup. If I use snapshot versioning in the backup, then I can still get back to a good version that way, and having the checksum will help me know which version that is.
So I see it as being useful, but only for "archive" data. Trying to track checksums of data that will be changed intentionally makes it very hard to distinguish between legitimate changes and corruption, because it depends on the file date, and some forms of corruption would actually update the date. The date itself in the metadata could also be compromised. So I think continually updating checksums at the filesystem level is complicated and not reliable anyway; might as well forget it, rely on BTRFS to do checksum checking at the block level, and use checksums on "archive" data only. No special tool is needed either: ReadyNAS has the md5sum command, which can be scripted in bash pretty easily to do that.
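Something along these lines is what I have in mind (just a sketch; the archive path and the name of the checksum file are placeholders, and it assumes the GNU md5sum that ships on the ReadyNAS):

#!/bin/bash
# Create md5 checksums once for "archive" data, then verify them on demand.
# ARCHIVE and CHECKFILE are placeholder names, adjust to taste.
ARCHIVE="/data/archive"
CHECKFILE="$ARCHIVE/.md5sums"

case "$1" in
  create)
    # Record a checksum for every file (run once, or again after an intentional update).
    find "$ARCHIVE" -type f ! -name '.md5sums' -exec md5sum {} + > "$CHECKFILE"
    ;;
  verify)
    # Re-hash everything and report only the files whose contents no longer match.
    md5sum --quiet -c "$CHECKFILE" || echo "WARNING: checksum mismatch detected"
    ;;
  *)
    echo "usage: $0 {create|verify}"
    ;;
esac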
over and out on this topic.
Dewdman42
Aug 27, 2025 · Virtuoso
Yeah, I remember winsfv; I haven't used that in a long time.
Well, the main thing is catching the bit rot before the corrupted data gets backed up all over the place. I just like the idea of being alerted immediately if there is bit rot that wasn't corrected by a btrfs scrub, so that I can go find the last known-good copy immediately and update the primary source, or first try a scrub and then, if the file is still bad, go find the last known-good version.
I like the idea of chkbit; I am still learning about it. It has several modes, but one thing it does automatically when you scan is check the current data against the last known checksum, unless the file date has been updated, in which case it assumes you intentionally changed the file and updates the stored checksum instead. So you can be alerted, on a per-file basis, whenever data has changed compared to the saved checksum, and do something about it.
I could see running it before any backup: if it fails, email me and don't do the backup. Or, if I have proper snapshotting on my backup destination, it can still do the backup but record the date when the checksum error was found, and I can go recover the file from an earlier snapshot.
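Roughly like this (a sketch only: the source, destination and mail address are placeholders, the exact chkbit invocation depends on the version installed, and I'm assuming it exits non-zero when it finds a mismatch):

#!/bin/bash
# Pre-backup gate: verify the chkbit checksums first and only back up if the check passes.
# SRC, DEST and the mail address are placeholders for illustration.
SRC="/data/important"
DEST="backupserver:/backups/important"

# Run chkbit in check mode; exact flags depend on the chkbit version installed.
if ! chkbit "$SRC"; then
    # Alert and skip the backup so a corrupted file never overwrites a good copy.
    echo "chkbit found mismatches under $SRC, backup skipped" | mail -s "chkbit alert" admin@example.com
    exit 1
fi

# Checks passed, so proceed with the backup (rsync shown as one possibility).
rsync -a "$SRC/" "$DEST/"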
In theory btrfs is doing this automatically during a scrub. I'm actually more interested in checking the non-RAIDed files on my desktops, which don't self-heal. Also, scrub doesn't happen every day, so there could be a window of time when a file got corrupted and was backed up in its corrupted state; I'd want to be alerted so I can find the file in the backup history and restore it to a healthy condition. My understanding is that in the real world this will not happen often.
Recently, when I lost my RAID 5 array, none of the drives had actually failed, but the NAS had been giving me bit-rot detection errors and I wasn't sure what to make of them. The truth is that some data was compromised and wasn't self-healed, possibly because the drives were in the process of failing, and I never had any way to go back and find out WHEN the files got corrupted, or to restore a good copy of a known-corrupted file from some backup snapshot somewhere. Live and learn: I wasn't on top of the disk errors, but the bit corruption will be in the back of my mind now, since I don't really know whether I can trust the data. Hopefully most of it is fine, and it will only be by sheer luck that I stumble on the files that were bit-rotted and that the ReadyNAS maybe didn't correct properly because of the failing disks. So a chkbit report log before every daily backup would be useful, I think, if I really care about the data.
StephenB
Aug 28, 2025 · Guru - Experienced User
Dewdman42 wrote: I like the idea of chkbit; I am still learning about it. It has several modes, but one thing it does automatically when you scan is check the current data against the last known checksum, unless the file date has been updated, in which case it assumes you intentionally changed the file and updates the stored checksum instead. So you can be alerted, on a per-file basis, whenever data has changed compared to the saved checksum, and do something about it.
This would be useful, though I am thinking you could also monitor the BTRFS checksums. Those checksums are on blocks, not files, though, so it would be tricky to trace a BTRFS checksum error back to a specific file.
Bit-rot can affect metadata, not just data, so depending on the date (or even the file name) isn't bulletproof.
- Dewdman42 · Aug 28, 2025 · Virtuoso
Right. Well, there is no perfect answer; we can only check whatever we can check and try to do it more often. I would be curious whether there is some way to use btrfs to check for bit rot that is actually helpful enough to figure out how to fix it. If all we get is an error that says "sorry, you have rotted data somewhere," the only option is to restore the entire volume from a backup, and how do I know the backup hasn't had rotted data backed up into it as well? chkbit at least checks the file contents for change, but as you point out, the file name or file date could have rotted too, which perhaps makes the whole concept moot. In my view even the built-in scrub could be negatively affected by that possibility, no?
I guess btrfs checksums and bit-rot detection must happen at the block level, underneath everything, which would be the only way, though bit rot at that level could corrupt the stored checksums themselves, I suppose; same problem. Does btrfs do the same checksumming on non-RAIDed volumes? Is a scrub the only way to find out whether the whole volume is still clear of rot? How would you go about determining that a backup is OK to restore from? Must it be a RAIDed btrfs backup in order to do so?
Well, automatic bit-rot correction during scrub is obviously way easier if it's present. One thing I was thinking about is mainly using chkbit on my desktop computers, since they are not RAIDed. I feel like once data is stored on a btrfs RAID, the regular scrubs should be correcting bit rot automatically, in theory. But I have desktop working areas that are not RAIDed, and I would never know if data rotted there and then got moved to the NAS RAID already rotted.
- StephenB · Aug 28, 2025 · Guru - Experienced User
Dewdman42 wrote:
I would be curious whether there is some way to use btrfs to check for bit rot that is actually helpful enough to figure out how to fix it
dmesg is supposed to contain some details, and you can also use btrfs check --check-data-csum.
I don't know whether either will give you the path to the damaged file (there's no way to test that unless there is actual damage).
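I haven't tried this end to end on a ReadyNAS, but something like the following is where I would start (the mount point and the /dev/md127 device are placeholders; check which device your data volume is on first, and note that btrfs check wants the filesystem unmounted):

#!/bin/bash
# Look for btrfs checksum complaints in the kernel log.
dmesg | grep -i btrfs | grep -iE 'csum|checksum'

# Full data-checksum verification; btrfs check expects the filesystem to be
# unmounted, and --readonly keeps it from changing anything.
# umount /data
btrfs check --readonly --check-data-csum /dev/md127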
Part of the puzzle here is that ReadyNAS is using BTRFS version 4.16, while the current version is 6.61.
Dewdman42 wrote:
I feel like once data is stored on a btrfs raid the regular scrubs should be correcting bit rot automatically, in theory.
Netgear has added some proprietary features to try to correct bit rot. They don't always work: the one case I've seen on my own NAS over the years resulted in a "can't correct" error message, and I've seen similar results posted by other users.
One thing I am not sure of is whether the scrub will ever simply recompute the errored checksums (rather than actually repairing anything).
Dewdman42 wrote:
Does btrfs do the same checksumming on non-RAIDed volumes?
Yes, as long as you have checksums enabled on the volume tab. My recollection is that it might not create checksums for files already on the volume (only for files added going forward), but I am not certain about that.
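One way to spot-check a share (the paths below are placeholders): btrfs does not checksum files that carry the NOCOW attribute, and a volume mounted with nodatasum gets no data checksums at all, so you can look for both:

#!/bin/bash
# Show the mount options for the data volume; look for nodatasum or nodatacow here.
# (If findmnt isn't available, "grep /data /proc/mounts" shows the same thing.)
findmnt -no OPTIONS /data

# List file attributes; a 'C' flag means NOCOW, i.e. no data checksums for that file.
lsattr /data/Documents | head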
- Dewdman42 · Aug 28, 2025 · Virtuoso
I'll follow up on some of those commands. The discussion also somewhat motivates me to get a new NAS for my critical data; I could maybe use the 524X as a backup server so it doesn't just die in a closet. But the fact that its btrfs is so old is not great. I'm also leaning a bit toward ZFS the more I read about it.
I was under the impression that both ZFS and BTRFS, when they scrub, will attempt at that time to correct mismatched checksums. ZFS also does this whenever you read (or write?) files. If scrub didn't take so long I'd honestly run it every night.
My thoughts at the moment are that we cannot 100% guarantee against bit rot, even with RAID; we can only reduce the odds of it happening. Tools like chkbit can be useful at the filesystem level for finding bit rot in the data portion of files, but if there is rot in the metadata, then the whole checking procedure, which relies on the date, file name and contents being verified against each other, is compromised. I guess the end result would be the same though: errors. Whether it's the metadata or the contents of the file causing the error, does it matter? Either way the file is going to need to be replaced; I guess replacement would mean completely removing the file and copying it in fresh again, or something like that. And if it happens to be running on top of BTRFS, then we'd still have the scrub check happening at the block level as well.
I think BTRFS and ZFS checksum verification could also be compromised if, at the block level, the checksum data itself, its date, or other block-level underpinnings are corrupted in some way. That is perhaps less likely than filesystem metadata being corrupted, which is of course less likely than the actual file contents (knock on wood).
Also, disks with errors that are pre-failure could be causing bit rot in strange ways, and who knows whether scrub would fix it properly, because the built-in RAID functionality meant to correct it could actually make things worse if anything related to it were compromised.
Well, basically I have desktops without BTRFS or ZFS checksums, so there is zero checking there. chkbit would at least be a little more than what they have now; for sure, I couldn't count on it always being correct, for the reasons you said, but it's still better than nothing. Having it on top of BTRFS/ZFS would basically give double checking, at both the block and the filesystem level. Either way, neither approach is, in my mind, 100% validation against bit rot.
For both of those, as well as my NAS, I have 3-2-1 backups in place, local and cloud. But the fundamental question is how to make sure that rotted data doesn't get blindly backed up to all the destinations. The destination won't even know it's corrupt; it will just have a copy that is bogus data, and if the destination has checksums, they will all check out there.
So how can we make sure we never back up a rotted source to a destination, or make sure we will be able to find a non-rotted backup to restore from?
One thing that is cool about chkbit is that the saved checksums are in hidden files (JSON), which get backed up too; so if you then run chkbit again on the destination, it will verify that the backup itself was carried out accurately and that the destination data matches what the source checksums said it should be.
Again, that presumes no problems with the metadata. But I don't know how else you could verify the actual backup (or a restore) at that point, other than at the filesystem level on the destination.
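Roughly this (sketch only: the paths are placeholders, the rsync options are just one way to make sure the hidden index files come along with the data, and as before I'm assuming chkbit exits non-zero on a mismatch):

#!/bin/bash
# After the backup, run chkbit against the destination: the hidden index files were
# copied along with the data, so this verifies the copy against the checksums that
# were recorded on the source. SRC and DEST are placeholders.
SRC="/data/important"
DEST="/backups/important"

# -a preserves timestamps and copies hidden files, including chkbit's index files.
rsync -a "$SRC/" "$DEST/"

# Verify the destination against the copied checksums; exact flags depend on the
# chkbit version installed.
if ! chkbit "$DEST"; then
    echo "WARNING: destination does not match the recorded checksums"
fi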
Counting on a scrub before every backup is not realistic; it would take way too long. But the btrfs check command might do it, so I will have a look at that. Basically, I want a way to prevent the backup from proceeding if errors are detected, so that I must find and correct those errors before backing up over the top of a presumably good backup. But that won't fall automatically into the normal FrontView backup procedure, I guess.
I also have questions about what happens when bit rot turns up: I guess it would affect many snapshots, not just the current one, since the filesystem doesn't even think the file has changed.
I don't know, I'm just thinking out loud here. I did have a bunch of bit-rot errors that suddenly turned up while I was out of the country for a month. It turned out to be disks with errors, still functioning, not failed, but with enough errors that I ended up pulling several of them. Somehow bit rot developed on a dozen files because the RAID was not perfectly healthy; the rot was not handled or corrected. I do recall using dmesg to locate the files, which I just threw away; thankfully they were nothing that important this time. I've done a factory reset since then, and of course there are no bit-rot errors now, but that doesn't mean none of the data was ruined: if I had backed up rotted data to IDrive and then restored from IDrive onto my factory-reset NAS, it's not technically rotted now, and the checksums all match, but if any corrupt data got into the backup, the bad data is still there and I have no way to know.
So anyway I'm trying to be more methodical and **bleep** about everything now, particularly a subset of critical data that I really can't lose.