tonedeaf
Jul 18, 2011, Aspirant
find and remove duplicate archive tarballs
At my work we have weekly cron jobs that compress things like our Subversion server or the TFTP directory into archives. I then have to thin out the archives every week and copy them onto a rotating set of 1TB disks for offsite backup. I used to just delete the oldest, but I found that for many weeks, some of the archives did not change. To fit as many older versions of the archives as possible on the offsite disk, I decided to run a one-liner in the background over the weekend.
You need to be able to ssh in as root for this.
# nice -15 find /backup/ -type f -name "*bz" -exec md5sum {} \; >>/root/arc_md5sum.txt &
This creates a text file with the md5sum of every archive in /backup (your needs may differ, of course). nice makes the one-liner run at a lower priority so it does not hog the box, and & makes it run in the background. You can use nohup too if you want.
Come Monday, I just do a quick sort:
# sort <arc_md5sum.txt >x.txt ; mv x.txt arc_md5sum.txt
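and then I open it in vi for some hand editing. Archives with identical contents now sit on adjacent lines, because they share the same md5sum. I delete the line for every file I want to keep, and then replace the md5sum on the remaining lines with `rm`. The finished file looks like this: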
# cat arc_md5sum.txt
rm /backup/hurricanes/backup/archive/hurricanes-tftpboot-backup-20110630.tar.bz
rm /backup/uma/backup/archive/uma-tftpboot-backup-20110625.tar.bz
rm /backup/hurricanes/backup/archive/hurricanes-tbova-backup-20110630.tar.bz
rm /backup/hurricanes/backup/archive/hurricanes-aghotkar-backup-20110630.tar.bz
Then I just type
# bash arc_md5sum.txt
and all the files in that text file are deleted.
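If the hand editing in vi gets tedious, the duplicate-spotting step could probably be scripted. What follows is only a rough sketch, not what I actually ran: the function name list_dupes and the output file rm_candidates.sh are made up for illustration. It assumes the usual md5sum output format (32 hex digits, two separator characters, then the path) and it keeps whichever path sorts first within each group of identical checksums, which may not be the copy you want to keep. It only prints rm lines, so you can still review them before running anything.

```shell
#!/bin/sh
# list_dupes (hypothetical helper): reads md5sum output on stdin and prints
# an rm line for every file whose checksum was already seen on an earlier line.
list_dupes() {
    LC_ALL=C sort | awk '
        {
            hash = $1
            path = substr($0, 35)    # skip 32 hex digits plus 2 separator characters
            if (index(path, "\047")) {
                # A single quote in the name would break the quoting below.
                print "# skipped (quote in name): " path
                next
            }
            if (hash in seen)
                printf "rm -- \047%s\047\n", path
            else
                seen[hash] = 1       # first file with this checksum is kept
        }'
}

# Example use (paths are assumptions):
#   list_dupes </root/arc_md5sum.txt >/root/rm_candidates.sh
#   less /root/rm_candidates.sh      # review before running it with bash
```

Since the file names carry the date, the first path in sort order within a group is normally the oldest copy of that host's archive, so the later identical copies are the ones listed for removal. Check the list by eye before you feed it to bash.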