Rsync over SSH: only changed bits synced?
2015-04-23
11:04 AM
Hi,
With the ReadyNAS Ultra Plus I owned several years ago, I set up rsync without ssh and loved that it only synchronized changed bits. When I renamed some large file on the Source, rather than deleting and copying it over to the Target, it simply renamed the file on the Target too. Great! That saves on bandwidth! I do not recall, but I think that when I moved a file to a different location on the same share, it simply moved it on the Target too without needing to transfer over the LAN.
Now that I own two RN314s running OS 6.2.2 (to be upgraded to 6.3 later), I have rsync WITH ssh configured, and I have noticed something that I haven't been able to resolve: when synchronizing from Source to Target (with the option enabled to delete any files on the Target that no longer exist on the Source / "Differential backup"), it DELETES the files on the Target, then COPIES them over. As a result, it transfers the full files to the Target, clogging up bandwidth. Is that normal behavior? Does it no longer rename the file on the Target when it notices the rename on the Source? I have not yet tested this without SSH, and because the data is sensitive, I will not be able to test this over the Internet without SSH.
Those who have rsync over SSH enabled, what has your experience been?
Message 1 of 7
2015-04-23
03:03 PM
Re: Rsync over SSH: only changed bits synced?
I believe that rsync always copies the full file if it thinks the file has changed. It doesn't try to detect that blocks x, y, z have changed, but other blocks are the same.
If --checksum is set, then rsync can verify whether two files of the same size are identical. Generally the ReadyNAS does not set the --checksum option, since it is very CPU-intensive even on a fast processor. One consequence is that it will sometimes send files it doesn't need to send.
Your Ultra Plus didn't set --checksum either (assuming you were using the normal Frontview backup), so I think this behavior has not changed.
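To make the two kinds of comparison concrete, here is a minimal Python sketch, not rsync's actual code. It assumes the default check compares file size and modification time, and that --checksum compares sizes and then a hash of the full contents, as the rsync man page describes. The function names and the choice of SHA-256 are made up for illustration.

```python
import hashlib
import os


def quick_check_differs(src, dst):
    """Size/mtime comparison: cheap, reads no file contents."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)


def file_digest(path, chunk_size=1 << 20):
    """Hash the whole file; this is the CPU- and disk-heavy part."""
    h = hashlib.sha256()  # hash choice is an assumption, not what rsync uses
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_differs(src, dst):
    """Checksum-style comparison: sizes first, then full content hashes."""
    if not os.path.exists(dst):
        return True
    if os.path.getsize(src) != os.path.getsize(dst):
        return True
    return file_digest(src) != file_digest(dst)
```

A file whose timestamp changed but whose contents did not is flagged by the first function and not by the second, which would be the "sends files it doesn't need to send" case. The price is that the second function reads every byte on both sides.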
Message 2 of 7
2015-04-23
07:36 PM
Re: Rsync over SSH: only changed bits synced?
Hi, Stephen,
When I researched rsync many years ago, I think that the protocol was designed to help send just the changed chunks, but I could be wrong (I'm far from being knowledgeable on rsync.) Based on http://en.wikipedia.org/wiki/Rsync under the section, "Determining which parts of a file have changed", it seems to suggest that only the changed parts of a file are sent: "If the sender's and recipient's versions of the file have many sections in common, the utility needs to transfer relatively little data to synchronize the files."
Even in my own, unscientific testing back in the Ultra 4 Plus days, I observed the following when checking rsync without ssh:
Source: 10GB of files, each file being at least 2GB in size.
LAN: 100 Mbps
Test procedure:
1. Using rsync, backed up Source to Target
2. Renamed most of the files on the Source
3. Performed another rsync backup, and I noticed that the same files at the Target instantly changed their names too. The entire "backup" finished within seconds. Had it actually had to transfer the 10GB of files to the Target, it would have taken at least a few minutes to do so.
4. Added another 2GB file and renamed a bunch of files again
5. Started rsync backup, and the files that were renamed also instantly renamed on the Target while the new file was slowly being copied over the LAN.
This test satisfied my requirement to keep both NAS synchronized while transferring only a minimal amount of changed data.
As stated, I have not been able to reproduce this with SSH, and I have no spare ReadyNAS to test without SSH. It would seem odd for rsync to behave differently over SSH, so I am more inclined to believe that the default rsync options used on OS6 are different from those on the Ultra 4 Plus.
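As a sanity check on the "at least a few minutes" estimate in the test above, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1 GB = 10^9 bytes), a link running at its full nominal rate, and no protocol overhead, so the real figure would be somewhat higher. The function name is made up for illustration.

```python
def transfer_seconds(size_bytes, link_bits_per_second, efficiency=1.0):
    """Ideal transfer time; efficiency < 1 models protocol overhead."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_second * efficiency)


# 10 GB of files over a 100 Mbps LAN, as in the test described above.
full_copy = transfer_seconds(10 * 10**9, 100 * 10**6)
print(f"{full_copy:.0f} s, about {full_copy / 60:.1f} minutes")
# prints "800 s, about 13.3 minutes"
```

So a backup that finishes within seconds cannot have moved the full 10 GB across that link.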
Message 3 of 7
2015-04-24
03:38 AM
Re: Rsync over SSH: only changed bits synced?
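Yes, the rolling checksum is intended to send only updated blocks. Thanks for posting that link. As far as I can tell from the rsync man page, though, that delta transfer does not depend on --checksum: the option only changes how rsync decides whether a file needs to be transferred at all (a full-file checksum instead of the size and modification-time check).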
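For anyone curious what "rolling" means here, this is a simplified Python sketch of an Adler-32-style weak checksum of the kind described in the Wikipedia section linked above. It is an illustration under stated assumptions, not rsync's actual implementation, and the function names are made up. The key property is that the checksum of a window shifted by one byte can be derived from the previous one without re-reading the whole window.

```python
M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the (a, b) pair for a block from scratch."""
    n = len(block)
    a = sum(block) % M
    # The first byte is weighted n, the last byte is weighted 1.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window one byte to the right in O(1)."""
    a_new = (a - out_byte + in_byte) % M
    b_new = (b - n * out_byte + a_new) % M
    return a_new, b_new


data = bytes(range(256)) * 2
n = 16
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
```

The loop at the end checks that rolling gives the same result as recomputing for every window position. That cheap update is what makes it practical to look for matching blocks at every byte offset of a large file.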
As far as renames go, the only option I'm aware of to track them is the --fuzzy option (which would need --checksum to be safe).
I haven't seen any of the speedups you saw in your early test when using rsync with my Pro and NV+ machines, and I do know that --checksum and --fuzzy are not set with Frontview backup. viewtopic.php?f=31&t=69497 shows the options that are used.
So I can't explain your early results.
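On the rename question, here is a minimal sketch of why a rename looks like "delete, then copy" to a tool that matches files purely by path, which is my understanding of what rsync does without options such as --fuzzy. The function and file names are hypothetical.

```python
def plan_by_path(source_paths, target_paths):
    """Compare two listings by path only, ignoring file contents."""
    source, target = set(source_paths), set(target_paths)
    return {
        "copy": sorted(source - target),    # exists only on the Source
        "delete": sorted(target - source),  # exists only on the Target
    }


# The same big file, renamed on the Source since the last backup.
plan = plan_by_path(
    ["notes.txt", "video_new.mkv"],
    ["notes.txt", "video_old.mkv"],
)
print(plan)
# prints {'copy': ['video_new.mkv'], 'delete': ['video_old.mkv']}
```

Nothing in the two listings links video_old.mkv to video_new.mkv, so the new name is treated as a brand-new file unless the tool is told to look for a similar file on the Target to use as a starting point.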
Message 4 of 7
2015-04-24
10:26 AM
Re: Rsync over SSH: only changed bits synced?
Thanks for providing that thread link; I'll check it out later today. I will also check which rsync options are available, and whether --checksum would accomplish what I'd like to do (while fully aware, as you stated, that it would tax the CPU heavily). If there are desirable options available, I will then look into how to configure the ReadyNAS to perform backups with those options set. I may have to start those backups manually from an SSH terminal, or find a way to link the physical backup button to that job.
My ISP unfortunately has a data cap, so my goal is to replicate my early Ultra 4 Plus results, not only to speed up synchronization but also to minimize bandwidth usage.
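For a manual run from a terminal, a small helper like the following could assemble the rsync command line so the options can be tried out safely first. This is only a sketch: the host name and share paths are hypothetical placeholders, and it says nothing about how the ReadyNAS backup jobs invoke rsync. The flags themselves (-a, --stats, -e ssh, --delete, --delete-after, --checksum, --fuzzy, --dry-run) are standard rsync options. The rsync man page notes that --delete can remove potential fuzzy-match files before they are used, so the sketch switches to --delete-after when --fuzzy is requested.

```python
import shlex


def build_rsync_command(source_dir, target, checksum=False, fuzzy=False,
                        dry_run=True):
    """Return an rsync argv list; nothing is executed here."""
    cmd = ["rsync", "-a", "--stats", "-e", "ssh"]
    # Deleting after the transfer keeps old files around as fuzzy basis files.
    cmd.append("--delete-after" if fuzzy else "--delete")
    if checksum:
        cmd.append("--checksum")
    if fuzzy:
        cmd.append("--fuzzy")
    if dry_run:
        cmd.append("--dry-run")  # report what would happen, change nothing
    # A trailing slash copies the directory's contents, not the directory.
    cmd += [source_dir.rstrip("/") + "/", target]
    return cmd


# Hypothetical source share and target host, for illustration only.
cmd = build_rsync_command("/data/Documents",
                          "backup@target-nas.example:/data/Documents/",
                          fuzzy=True)
print(shlex.join(cmd))
```

Running with the dry-run flag and reading the --stats output should show how much data rsync expects to send before any real transfer eats into the data cap.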
Message 5 of 7
2015-04-24
01:23 PM
Re: Rsync over SSH: only changed bits synced?
A backup service that provides deduplication is another option, something like CrashPlan. It also has a free "friend" mode that would let you back up to another NAS, so you don't have to store data on their servers. The paid modes (to their servers) are continuous; the free mode is not.
This would be strictly backup though: you could not use it to sync the two NAS. The output files are encrypted and cannot be accessed on the target machine.
I've never actually measured CrashPlan's overall efficiency (let alone compared it with rsync). But it seems pretty good (and supports compression as well as de-duplication). It might be worth testing (both NAS on the same LAN initially).
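For readers unfamiliar with the term, here is a generic Python sketch of block-level de-duplication. It is not a description of how CrashPlan works internally; the fixed block size, the SHA-256 keys, and the function names are all assumptions made for illustration.

```python
import hashlib


def dedup_blocks(data, block_size=4096):
    """Split data into fixed-size blocks and store each distinct block once."""
    store = {}   # digest -> block contents
    recipe = []  # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)
        recipe.append(key)
    return store, recipe


def restore(store, recipe):
    """Rebuild the original bytes from the block store and the recipe."""
    return b"".join(store[key] for key in recipe)


data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = dedup_blocks(data)
print(len(recipe), "blocks referenced,", len(store), "blocks stored")
# prints "4 blocks referenced, 2 blocks stored"
assert restore(store, recipe) == data
```

Because blocks are keyed by content rather than by file name, a renamed or moved file adds no new blocks to the store, which is the behaviour this thread is after.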
Message 6 of 7
2015-04-24
01:29 PM
Re: Rsync over SSH: only changed bits synced?
Dedupe functionality by ReadyNAS would be great 😉 CrashPlan does sound nice, but yes, not being able to access on the Target would be a problem 🙂
Message 7 of 7