metapaso
Apprentice

BTRFS and rsync --inplace: what is the default behavior and what is the optimal behavior?

Hi All,

 

I've just set up a few RSYNC backups from my new RN316.  I was poking around to see what options the built-in rsync backup uses.

 

1) From what I read elsewhere, I am led to believe that the rsync option --inplace has significant benefits under btrfs when using snapshots.  The idea is that if a large file has a small change, --inplace lets btrfs CoW track just that change, whereas without --inplace, rsync writes an entirely new file that btrfs can't track across snapshots.  According to what I read, this behavior is independent of the delta transfer that rsync does by default.  That is, even when rsync does a delta transfer across a network, rsync itself creates and writes the changes to a brand-new file before deleting the old one (this makes sense on non-CoW filesystems to preserve integrity in case the transfer fails mid-write).  How does this work on ReadyNAS OS 6 in detail?  I don't even know how to go about testing it, or whether it's a bad idea to enable even on a CoW filesystem.

 

2) Are there any plans to include user options to specify rsync command line options in the backup settings?  It would be great if we could just add the options that we prefer.

 

3) If I convince myself that --inplace is the proper option for my setup, where are the rsync scripts that OS 6 uses located, so I can add this myself?  Yes, I know this may break software support.

 

Thanks!

Damon

Message 1 of 4

Accepted Solutions
StephenB
Guru

Re: BTRFS and rsync --inplace: what is the default behavior and what is the optimal behavior?

The NAS is not using this option now.

 

Per the rsync documentation, this is what the option actually does:


--inplace: This option changes how rsync transfers a file when its data needs to be updated: instead of the default method of creating a new copy of the file and moving it into place when it is complete, rsync instead writes the updated data directly to the destination file.

This has several effects:

 

  • Hard links are not broken. This means the new data will be visible through other hard links to the destination file. Moreover, attempts to copy differing source files onto a multiply-linked destination file will result in a "tug of war" with the destination data changing back and forth.
  • In-use binaries cannot be updated (either the OS will prevent this from happening, or binaries that attempt to swap-in their data will misbehave or crash).
  • The file's data will be in an inconsistent state during the transfer and will be left that way if the transfer is interrupted or if an update fails.
  • A file that rsync cannot write to cannot be updated. While a super user can update any file, a normal user needs to be granted write permission for the open of the file for writing to be successful.
  • The efficiency of rsync's delta-transfer algorithm may be reduced if some data in the destination file is overwritten before it can be copied to a position later in the file. This does not apply if you use --backup, since rsync is smart enough to use the backup file as the basis file for the transfer.

WARNING: you should not use this option to update files that are being accessed by others, so be careful when choosing to use this for a copy.

This option is useful for transferring large files with block-based changes or appended data, and also on systems that are disk bound, not network bound. It can also help keep a copy-on-write filesystem snapshot from diverging the entire contents of a file that only has minor changes.

The option implies --partial (since an interrupted transfer does not delete the file), but conflicts with --partial-dir and --delay-updates. Prior to rsync 2.6.4 --inplace was also incompatible with --compare-dest and --link-dest.
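To make the two write strategies concrete, here is a minimal Python sketch. It is not rsync itself, and the function and file names are made up for illustration. It updates a file once by writing into the existing file (roughly what --inplace does) and once via a temporary copy plus rename (roughly the default). The inode number should stay the same in the first case and change in the second, which is why a CoW snapshot can only keep sharing unchanged blocks with the in-place style.

```python
import os
import tempfile


def update_via_temp_copy(path, data):
    """Write data to a temp file, then rename it over path (the default style)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(data)
    # Atomic swap: readers see either the old file or the complete new one.
    os.replace(tmp_path, path)


def update_in_place(path, offset, patch):
    """Overwrite bytes inside the existing file (the --inplace style)."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(patch)


def inode_of(path):
    return os.stat(path).st_ino


def demo():
    """Return (inode kept after in-place update, inode kept after temp-copy update)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "catalog.db")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"A" * 8192)
        original = inode_of(path)

        update_in_place(path, 4096, b"B" * 16)
        kept_after_inplace = inode_of(path) == original

        update_via_temp_copy(path, b"A" * 4096 + b"B" * 16 + b"A" * 4080)
        kept_after_copy = inode_of(path) == original
        return kept_after_inplace, kept_after_copy


if __name__ == "__main__":
    print(demo())
```

The temp-copy path is also what makes the default safe against interruption: the old file stays intact until the final rename.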


 

The point about the file being left in an inconsistent state if the transfer is interrupted is a significant drawback, especially for people using rsync over ssh to update a remote location.  So if Netgear changes this, hopefully they will make it an option.

 

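How this impacts CoW depends on another option: you'd also need to specify --no-whole-file if your goal is to minimize snapshot size.  (rsync defaults to --whole-file when both source and destination are local paths, which skips the delta-transfer algorithm and rewrites the whole file.)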

 

The combination should reduce snapshot size.  As is always the case with CoW, it also increases fragmentation of the main file. 
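As a rough sketch of what that combination looks like on the command line, the Python helper below only builds the argument list and does not run anything. The helper name and the paths are hypothetical; --archive, --inplace and --no-whole-file are standard rsync options.

```python
import shlex


def build_rsync_command(source, destination, inplace=True):
    """Return an rsync argument list; nothing is executed here."""
    cmd = ["rsync", "--archive"]
    if inplace:
        # --inplace writes into the existing destination file;
        # --no-whole-file asks for delta transfer even between local paths.
        cmd += ["--inplace", "--no-whole-file"]
    cmd += [source, destination]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_rsync_command("/data/lightroom/", "/backup/lightroom/")
    print(shlex.join(cmd))
```

The list can be passed to subprocess.run if you want to script your own backup job; try it on a scratch share first.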

 

@metapaso wrote:

 

3) If I convince myself that --inplace is the proper option for my setup...

 

 

Well, the main risk I see is that a failed backup will leave you with a partially updated (inconsistent) file, instead of just leaving the older version of the file on the backup device.

 

The potential benefit is only realized if the files are updated in a way that leaves most of the file blocks unchanged.  If you are inserting something new at the beginning of a file, then it likely won't help (since the old data shifts to new offsets, so nearly every block in the destination file gets rewritten).
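A small Python sketch can illustrate the difference. It counts how many fixed-size blocks differ at the same offset between an old and a new version of a file, which is only a rough proxy for how far a CoW snapshot would diverge after an in-place update. The 4096-byte block size and the sample data are assumptions, and this is not how rsync itself matches blocks (its rolling checksum can recognize shifted data and avoid re-sending it, but the data still has to be written at its new offset).

```python
def changed_blocks(old, new, block_size=4096):
    """Count blocks whose bytes differ at the same offset in old and new."""
    length = max(len(old), len(new))
    return sum(
        old[i:i + block_size] != new[i:i + block_size]
        for i in range(0, length, block_size)
    )


if __name__ == "__main__":
    original = bytes(range(256)) * 64  # 16 KiB of sample data, 4 blocks
    edited = original[:5000] + b"\xff" + original[5001:]  # one byte changed
    appended = original + b"new data"
    prepended = b"TAG" + original  # e.g. a tag inserted at the start

    print("edit in the middle: ", changed_blocks(original, edited))
    print("append at the end:  ", changed_blocks(original, appended))
    print("insert at the start:", changed_blocks(original, prepended))
```

With this sample data, the in-place edit and the append each touch a single block, while the three-byte insert at the start changes every block.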

 

I don't think it would make much difference for my own shares - partly because there isn't a lot of churn in the first place (files are added, but not often modified).  And the modifications I make to the largest files (media files) are generally adding tagging - which is normally inserted at the beginning of the file.

 


@metapaso wrote:

 

... where are the locations of the rsync scripts that OS 6 uses so I can add this myself.  Yes I know this may break software support.

 

 


At the moment, I think you'd have to use your own rsync commands.  AFAIK there is no way to modify any base script.

 


Message 2 of 4

metapaso
Apprentice

Re: BTRFS and rsync --inplace: what is the default behavior and what is the optimal behavior?

Ok.  Thanks for the explanation. And very much agreed that --inplace would be a terrible idea if the destination volume were not CoW.

 

It's true for me too that most of the files I'm rsync'ing will remain unchanged, but in particular there are a couple of large database-like files associated with Adobe Lightroom that are on the order of 2GB-10GB and have small daily changes. I planned to make hourly backups of these files, and I imagine them filling up my disk more rapidly than I had anticipated, even with smart pruning.  Maybe over a few years this could add up to a few hundred GB of wasted space... not too much of a sacrifice really, if the disadvantages are as clear as you point out.  I need to really add up how much I'm talking about here so I can make the proper decision.  If the waste adds up to TB, then I need to figure out how to handle these files.  Also, it seems it very much depends on how well rsync actually matches up the blocks that need to be updated.
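For a back-of-the-envelope version of that sum, a sketch like the following might help. All the numbers are hypothetical placeholders, and the model is deliberately crude: it assumes every counted snapshot captured a different version of the file, and that an in-place update pins only the replaced blocks.

```python
def snapshot_space_mib(file_mib, changed_mib, changed_snapshots):
    """Rough space pinned by old snapshots under the two update strategies."""
    # Whole-file rewrite: each snapshot keeps a full old copy of the file.
    whole_file = file_mib * changed_snapshots
    # In-place update: each snapshot keeps only the blocks that were replaced.
    in_place = changed_mib * changed_snapshots
    return whole_file, in_place


if __name__ == "__main__":
    # Hypothetical numbers: a 5 GiB catalog, 50 MiB rewritten per change,
    # 100 retained snapshots that each captured a different version.
    whole, inplace = snapshot_space_mib(5 * 1024, 50, 100)
    print(f"whole-file rewrite: {whole / 1024:.0f} GiB")
    print(f"in-place update:    {inplace / 1024:.1f} GiB")
```

Plugging in the real catalog size, the amount that actually changes per day, and the snapshot retention count should show quickly whether this lands in the GB or TB range.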

 

It seems that ideally if a file transfer were interrupted, rsync would notify the CoW filesystem that the file didn't finish and be able to roll back to the original.  That's my understanding of one of the advantages of CoW anyway.  

 

I note in your post the lines I was looking at in the rsync documentation: "It can also help keep a copy-on-write filesystem snapshot from diverging the entire contents of a file that only has minor changes."  That's really what I'm looking for, but without the risks you outline.

 

I was googling a bit on rsync, btrfs and --inplace and came across a few development notes from earlier this year about a proposed rsync option called "--reflink" which does indeed interact with CoW filesystems.  https://bugzilla.samba.org/show_bug.cgi?id=10170.

 

I'm not sure if there's anything we can do to spur development, but I await this --reflink option with great anticipation.

 

Thanks!

Damon

 

Message 3 of 4
StephenB
Guru

Re: BTRFS and rsync --inplace: what is the default behavior and what is the optimal behavior?

FWIW, I don't think rsync pays any attention to the file system type on the local system.  It works the same way with ext as btrfs.  As CoW systems become more common, I would expect to see more tools becoming tailored for them.

 

In the meantime, I'd suggest turning off snapshots on the share with the large databases.  Alternatively, host the databases on the NAS and reverse the flow of the backup.

Message 4 of 4