Finding and deleting duplicate files.
2009-01-08 06:29 AM
I've only just got a ReadyNAS NV+, after years of using various sizes of USB-connected disks, and now I need to rationalise the thousands of files I've accumulated. Since there are lots of duplicates, doing that by hand would be next to impossible. I considered using one of the free de-cloning programs for Windows, but that would take days to scan the whole system over the network, I wouldn't be able to reboot or log out of my local PC, and if there was a network problem I'd have to start again. So I decided to see what could run directly on the ReadyNAS.
I've updated the NV+'s firmware, enabled ssh access as root, and installed apt-get as described in the developer's tips. Then on my desktop box I did a Google search for the keywords "Debian Etch file duplicate" to see what clone-finding utilities were around. Of the ones I looked at, 'fdupes' seemed the simplest and needed the fewest changes on the NV+. It's also free 🙂
I ssh'ed to my NV+ and logged in as root, then ran:
apt-get update
and to see what changes installing fdupes would make I ran
apt-get -s install fdupes
(don't forget the -s, which tells apt-get to just simulate the install)
In my case it showed only one package being downloaded and added, and none being removed or upgraded. That seemed safe, so I went ahead with the install. Your results could be different, so be careful, and don't blame me if they are.
apt-get install fdupes
Everything seemed fine, so I found a small subtree in the directory structure to test it on and ran a recursive search over it.
On my system it took 1hr 50m to scan just over 67,000 files. I suspect it would be quicker with more memory, and I'll re-time it after the upgrade I ordered comes through. The NAS wasn't doing anything else at the time, and fdupes was using about 96% of the available CPU.
Some thoughts on how to use fdupes.
As it stands, there's nothing sensible you can do with the results, as fdupes outputs to the screen. You can't even disconnect without ending the session. So:
Use 'nohup' to tell the session to continue even if you disconnect, and '>' to redirect the output to a text file you can edit later to create a script to delete unwanted copies etc.
eg
#cd /c/backup
#nohup fdupes -1frq * > /c/backupDups.txt &
(NB: that's -<one>frq. It says: recursively search all directories within the 'backup' directory and send all the results to the named text file; list each group of duplicates on a single line, omitting the first copy found; and don't bother with a progress indicator. The ampersand says 'start this command, then give me a prompt so I can do something else'.)
Edit the resulting textfile as you wish. You might want to access it from your desktop machine, and load it into a spreadsheet or similar to allow sorting or other processing.
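Once you've pruned the list down to files you really want gone, each line of the -1f output is a space-separated group of copies to delete, so a small loop can turn it into a delete script. A rough sketch (the function name print_rm_commands is my own, and it assumes no spaces inside filenames, since fdupes -1 also uses spaces to separate names on a line; test on a copy first):

```shell
# print_rm_commands FILE
# Emit an 'rm' command for every name in FILE, where each line holds
# the space-separated members of one duplicate group, as produced by
# 'fdupes -1f'. CAUTION: assumes no spaces inside filenames, because
# 'fdupes -1' also uses spaces to separate names on each line.
print_rm_commands() {
    while read -r line; do
        for f in $line; do
            printf 'rm -- %s\n' "$f"
        done
    done < "$1"
}

# usage, with the example file from above:
#   print_rm_commands /c/backupDups.txt > deleteDups.sh
# then review deleteDups.sh carefully before running it.
```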
If you find your file access is too slow while this is running, use 'top' to see what is loading your processor. You'll see a changing display like:

top - 14:07:04 up 1 day, 17:08, 2 users, load average: 0.20, 0.10, 0.40
Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.5% us, 3.6% sy, 0.9% ni, 91.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 226384k total, 221808k used, 4576k free, 11680k buffers
Swap: 255968k total, 30128k used, 225840k free, 93504k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32630 root 17 0 2992 1680 1328 R 96.0 0.7 1:02.74 fdupes
...

The line below the header starting 'PID' shows what is using the most resources, and you can make it more considerate to other processes by 'renicing' it. Within 'top', press 'r' and it will ask which process to renice; in this case it's '32630', so type that number. When it asks what priority to renice to, choose a positive number less than 10. The new value will show under the 'NI' column.
Conversely, if you want something to yield the processor less, so it finishes quicker, you can renice it to a negative number. Don't renice anything you've not started yourself, and even then be careful! Press 'q' to exit 'top'.
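If you'd rather not drive this interactively, the same adjustment can be made from the shell with the standalone renice command. A quick sketch (32630 is just the example PID from the 'top' display; the sleep here stands in for a long-running job like fdupes):

```shell
# Lower the priority of an already-running job without opening 'top'.
sleep 30 &                 # stand-in for a long-running job such as fdupes
pid=$!
renice 10 -p "$pid"        # positive nice value = kinder to other processes
ps -o ni= -p "$pid"        # the new value shows under NI
kill "$pid"
```

Note that as an ordinary user you can only raise a process's nice value (make it yield more); lowering it to a negative value needs root.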
Hope that's useful to some of you. Maybe a front-end add-on could be made?
Message 1 of 10
2010-12-09 11:55 PM
Re: Finding and deleting duplicate files.
I ran into a problem of duplicate files and someone told me about www.duplicates-finder.com
Message 2 of 10
2011-08-23 03:17 PM
Re: Finding and deleting duplicate files.
Hi
While this may be an old post, a function/option on the NAS that could look for duplicates would be awesome.
Is there any chance one could be added as an add-on (even an unsupported one)?
I know I have loads of duplicates, and I'm sure I'm not alone in saying it would take me ages to find them all.
Cheers
Message 3 of 10
2011-10-01 12:24 AM
Re: Finding and deleting duplicate files.
Just wanted to add another post to this one to keep it alive. With the 2TB limitation on the Duos, optimisation of space is now key, so if Netgear or anyone can come up with an add-on that will show me all my duplicate files and give me the option to move or even delete them, that would be huge!
I've tried duplicate file finder apps on the PC pointing at a ReadyNAS share, but these are so slow, and due to the number of files not one has worked yet 😞
Message 4 of 10
2011-10-24 11:17 AM
Re: Finding and deleting duplicate files.
Hi Antony09, not sure if you are on commission for that post, but if not you must only have used it with a few files, because the people I have spoken to find large NAS searches impossible, as tools like the one you've suggested freeze due to the number of files they have to go through.
A Netgear add-on that does not require SSH etc., which can do a small amount of indexing on the NAS itself, pulling key attributes, would be hugely useful.
Message 5 of 10
2012-01-05 06:39 AM
Re: Finding and deleting duplicate files.
Does anyone with some good Linux experience know if this "http://www.pixelbeat.org/fslint/" can be installed on the ReadyNAS DUO?
Message 6 of 10
2012-01-05 07:01 PM
Re: Finding and deleting duplicate files.
It's not a solution, more of a warning.
I purchased NoClone (the enterprise version, in fact). It does not support network drives; everything must be mapped.
I think at this point you might try a tool like Beyond Compare http://www.scootersoftware.com/moreinfo ... fo_compare
I work in the ediscovery business, and outside of indexing in a specialized app you might be out of luck, other than doing a direct filesize/md5 hash diff check. It is a LOT of overhead (coming from someone who knows).
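For what it's worth, a basic version of that md5 hash check can be run on the NAS itself with standard tools: checksum every file, sort by digest, and print every line whose digest repeats. A minimal sketch (run from the top of the share to check; assumes GNU md5sum and uniq, which a Debian-based ReadyNAS should have, and it hashes every file even when sizes differ, so it's slower than fdupes):

```shell
# List all files under the current directory that share an md5 digest.
# uniq -w32 compares only the first 32 characters (the md5 hex digest);
# -D prints every member of each duplicate group.
find . -type f -exec md5sum {} + | sort | uniq -w32 -D
```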
Message 7 of 10
2015-08-04 02:28 AM
Re: Finding and deleting duplicate files.
"Duplicatefilesdeleter" is the best software for deleting duplicate files.
Message 8 of 10
2016-01-14 10:32 AM
Re: Finding and deleting duplicate files.
I use Duplicate Files Deleter as it is very effective. It is 100% accurate and performs the scan quickly.
Message 9 of 10
2024-02-23 12:17 AM
Re: Finding and deleting duplicate files.
Auslogics Duplicate File Finder is a basic program that allows users to find and delete duplicate files taking up valuable space on their hard drives.
It works well; you can try it.
Message 10 of 10