find and remove duplicate archive tarballs
2011-07-18
10:05 AM
At my work we have weekly cron jobs that archive and compress things like our Subversion server or the TFTP directory. I then have to thin out the archives every week and copy them onto a rotating set of 1TB disks for offsite backup. I used to just delete the oldest, but I found that for many weeks some of the archives did not change. To fit as many distinct older versions of the archives as possible on the offsite disk, I decided to run a one-liner in the background over the weekend.
You need to be able to ssh in as root for this.
# nice -15 find /backup/ -type f -name "*bz" -exec md5sum {} \; >>/root/arc_md5sum.txt &
This creates a text file with the md5sum of every archive in /backup (your needs may differ, of course). nice lowers the one-liner's priority so it does not slow down everything else, and & makes it run in the background. You can use nohup too if you want it to keep running after you log out.
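For anyone who wants to try the checksum step before pointing it at real backups, here is a minimal sketch. The function name `checksum_archives` and the throwaway demo directory are my own assumptions, not part of the original one-liner, and it uses `-exec md5sum {} +` rather than `\;` so that md5sum is started once for many files.

```shell
#!/bin/sh
# Sketch only: checksum_archives is a hypothetical helper name.
# Print "md5sum  path" for every file under directory $1 whose name ends in "bz".
checksum_archives() {
    # "{} +" hands many files to each md5sum call instead of one per process.
    find "$1" -type f -name '*bz' -exec md5sum {} +
}

# Demo on a throwaway directory, so nothing under /backup is touched.
demo=$(mktemp -d)
printf 'same contents\n' > "$demo/a-20110625.tar.bz"
printf 'same contents\n' > "$demo/a-20110630.tar.bz"
printf 'not an archive\n' > "$demo/notes.txt"
checksum_archives "$demo"
rm -r "$demo"

# For the real weekend run, something like this keeps going after a logout.
# Note that ">" starts a fresh list each run, unlike ">>" which appends:
#   nohup nice -n 15 find /backup/ -type f -name '*bz' -exec md5sum {} + > /root/arc_md5sum.txt &
```

The demo should print two lines with the same checksum and skip notes.txt, which is exactly the situation the rest of this post deals with.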
Come Monday, I just do a quick sort:
# sort <arc_md5sum.txt >x.txt ; mv x.txt arc_md5sum.txt
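and then I open it in vi for some hand editing. After the sort, archives with identical contents sit on adjacent lines because they share the same md5sum. I delete the line for every file I want to keep, and then replace the md5sum on each remaining line with `rm`. The finished file looks like this: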
# cat arc_md5sum.txt
rm /backup/hurricanes/backup/archive/hurricanes-tftpboot-backup-20110630.tar.bz
rm /backup/uma/backup/archive/uma-tftpboot-backup-20110625.tar.bz
rm /backup/hurricanes/backup/archive/hurricanes-tbova-backup-20110630.tar.bz
rm /backup/hurricanes/backup/archive/hurricanes-aghotkar-backup-20110630.tar.bz
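The hand editing could probably be scripted. Below is a rough sketch, not what I actually run: `list_dupes` and `to_rm_script` are made-up helper names, the checksums and paths in the demo are invented, and it assumes plain md5sum output (32 hex digits, two characters of separator, then the path) with no newlines in file names. It keeps the first file of each checksum in sort order and emits an `rm` line for the rest.

```shell
#!/bin/sh
# Sketch only: list_dupes and to_rm_script are hypothetical helper names.

# Read md5sum output on stdin and print every line whose checksum has
# already been seen, i.e. all copies except the first in sort order.
list_dupes() {
    sort | awk 'seen[$1]++'
}

# Turn "checksum  path" lines into rm commands with the path single-quoted,
# so spaces or apostrophes in a file name cannot break the generated script.
to_rm_script() {
    sed -e 's/^[0-9a-f]\{32\} [ *]//' \
        -e "s/'/'\\\\''/g" \
        -e "s/^/rm -- '/" \
        -e "s/\$/'/"
}

# Demo with made-up checksums and paths; nothing is deleted here.
printf '%s\n' \
    'd41d8cd98f00b204e9800998ecf8427e  /backup/demo/a-20110625.tar.bz' \
    '0cc175b9c0f1b6a831c399e269772661  /backup/demo/b-20110625.tar.bz' \
    'd41d8cd98f00b204e9800998ecf8427e  /backup/demo/a-20110630.tar.bz' |
    list_dupes | to_rm_script
```

Because the script always keeps whichever copy sorts first, it is worth reading the generated lines before running them; picking by hand in vi lets you keep a different copy when you want to.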
Then I just type
# bash arc_md5sum.txt
and all the files in that text file are deleted.
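Since the file is fed straight to bash as root, a quick sanity check beforehand seems worthwhile. This is only a sketch with a made-up helper name, `only_rm_lines`, and an invented demo input: it succeeds only if every line starts with `rm `, which catches a line where the md5sum was not replaced.

```shell
#!/bin/sh
# Sketch only: only_rm_lines is a hypothetical helper name.
# Succeeds when every line on stdin starts with "rm ", fails otherwise.
only_rm_lines() {
    ! grep -qv '^rm '
}

# Demo: the second line still carries a (made-up) checksum, so the check fails.
if printf '%s\n' \
    'rm /backup/demo/a-20110630.tar.bz' \
    'd41d8cd98f00b204e9800998ecf8427e  /backup/demo/b-20110630.tar.bz' |
    only_rm_lines
then
    echo "looks like a pure rm script"
else
    echo "found a line that is not an rm command"
fi
```

With that in place, the last step could be written as `only_rm_lines < arc_md5sum.txt && bash arc_md5sum.txt`, so nothing runs if a stray line slipped through.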
2015-09-05
08:12 AM