Finding and deleting duplicate files.

JimND
Aspirant

I've only just got a ReadyNAS NV+, after years of using various sizes of USB-connected disks, and now I need to rationalise the thousands of files I've accumulated. Since there are lots of duplicates, doing that by hand would be next to impossible. I considered using one of the free duplicate-finder programs for Windows, but that would take days to scan the whole system, I wouldn't be able to reboot or log out of my local PC, and if there was a network problem I'd have to start again. So I decided to see what could run directly on the ReadyNAS.

I've updated the NV+'s firmware, enabled SSH access as root, and installed apt-get as described in the developer's tips. Then on my desktop box I did a Google search for the keywords "Debian Etch file duplicate" to see what clone-finding utilities were around. Of the ones I looked at, 'fdupes' seemed simplest and needed the fewest changes on the NV+. It's also free 🙂
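
(As an aside, you could probably search from the NAS itself too; I can't promise what this turns up on the NV+'s Debian Etch, but apt has a built-in package search:

apt-cache search duplicate

It lists installable packages whose names or descriptions mention the word.)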

I ssh'ed to my NV+ and logged in as root, then ran:
apt-get update
and to see what changes installing fdupes would make I ran
apt-get -s install fdupes

(don't forget the -s, which tells apt-get to just simulate the install)

In my case it showed only one file being downloaded and added, and none being removed or upgraded. That seemed safe, so I went ahead with the install. Your results could be different, so be careful, and don't blame me if things go wrong.

apt-get install fdupes

Everything seemed fine, so I found a small subtree in the directory structure to test it on and ran a recursive search.
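
Something along these lines, with 'time' in front so you can see how long a scan takes (the path is just an example; substitute your own test directory):

time fdupes -r /c/media/test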

On my system it took 1 hr 50 min to scan just over 67,000 files. I suspect it would be quicker with more memory, and I'll re-time it after the upgrade I ordered comes through. The NAS wasn't doing anything else at the time, and fdupes was using about 96% of the available CPU.

Some thoughts on how to use fdupes.
As it is, there's nothing sensible you can do with the results, as it outputs straight to the screen, and you can't even disconnect without ending the session. So:
Use 'nohup' to tell the command to keep running even if you disconnect, and '>' to redirect the output to a text file you can edit later to create a script to delete unwanted copies, etc.

e.g.
#cd /c/backup
#nohup fdupes -1frq * > /c/backupDups.txt &
(NB: that's a digit one in '-1frq'. It says recursively search all directories within the 'backup' directory and send the results to the named text file, listing each group of duplicates on a single line, omitting the first copy of each group, and skipping the progress indicator. The ampersand says 'start this command, then give me a prompt so I can do something else'.)

Edit the resulting text file as you wish. You might want to access it from your desktop machine and load it into a spreadsheet or similar to allow sorting or other processing.
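
As a rough sketch of that last step, and only a sketch: it assumes none of the file names contain spaces or quotes, since fdupes -1 separates the names on each line with plain spaces. Check every line of the generated script before running it!

sed 's/^/rm /' /c/backupDups.txt > /c/deleteDups.sh
sh /c/deleteDups.sh

Because we used -f above, the first copy in each group was left out of the output, so only the surplus copies get deleted.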

If you find your file access is too slow while this is running, use 'top' to see what is loading your processor. You'll see a changing display like:
             
top - 14:07:04 up 1 day, 17:08, 2 users, load average: 0.20, 0.10, 0.40
Tasks: 89 total, 1 running, 88 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.5% us, 3.6% sy, 0.9% ni, 91.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 226384k total, 221808k used, 4576k free, 11680k buffers
Swap: 255968k total, 30128k used, 225840k free, 93504k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32630 root      17   0  2992 1680 1328 R 96.0  0.7  1:02.74 fdupes
...


The line below the header starting 'PID' shows what is using the most resources, but you can make that process more considerate to others by 'renicing' it. Within 'top', press 'r' and it will ask which process to renice. In this case it's '32630', so type that number; then, when it asks what to renice it to, choose a positive number less than 10. The new value will show under the 'NI' column.
If you want to make something yield the processor less, so it finishes quicker, you can instead renice it to a negative number (only root can do that). Don't renice anything you've not started yourself, and even then be careful! Use 'q' to exit 'top'.
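
(You can also renice from the command line rather than inside 'top'; the PID here is just the one from my example above:

renice 10 -p 32630

Positive values lower the priority; negative ones raise it.)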

Hope that's useful to some of you. Maybe a front-end add-on could be made?
Message 1 of 10
jyotidayal
Aspirant

Re: Finding and deleting duplicate files.

I ran into a problem with duplicate files, and someone told me about www.duplicates-finder.com
Message 2 of 10
parkerc
Aspirant

Re: Finding and deleting duplicate files.

Hi

While this may be an old post, a function/option on the NAS that could look for duplicates would be awesome.

Is there any chance one could be added as an add-on (even as an unsupported one)?

I know I have loads of duplicates, and I am sure I am not alone in saying it will take me ages to find them all.

Cheers
Message 3 of 10
parkerc
Aspirant

Re: Finding and deleting duplicate files.

Just wanted to add another post to this one to keep it alive. With the 2TB limitation on the Duos, optimisation of space is now key, so if Netgear or anyone can come up with an add-on that will show me all my duplicate files and give me the option to move or even delete them, that would be huge!

I've tried duplicate file finder apps on the PC pointed at a ReadyNAS share, but they are so slow that, with the number of files involved, not one has worked yet 😞
Message 4 of 10
parkerc
Aspirant

Re: Finding and deleting duplicate files.

Hi Antony09, not sure if you are on commission for that post, but if not you must only have used it with a few files, because the people I have spoken to find large NAS searches impossible, as tools like the one you've suggested freeze due to the number of files they have to go through.

A Netgear add-on that does not require SSH etc., and which could do a small amount of indexing on the NAS itself, pulling key attributes, would be hugely useful.
Message 5 of 10
parkerc
Aspirant

Re: Finding and deleting duplicate files.

Does anyone with good Linux experience know if fslint (http://www.pixelbeat.org/fslint/) can be installed on the ReadyNAS Duo?
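
I haven't tried it myself, but following JimND's tip at the top of the thread, a simulated install from the Duo's SSH shell should at least show whether it's packaged for the Duo's Debian release:

apt-get update
apt-get -s install fslint

If the simulation looks sensible, drop the -s to do the real install.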
Message 6 of 10
Scaevola
Aspirant

Re: Finding and deleting duplicate files.

It's not a solution, more of a warning.

I purchased NoClone (the enterprise version, in fact). It does not support network drives; everything must be mapped.

I think at this point you might try a tool like Beyond Compare: http://www.scootersoftware.com/moreinfo ... fo_compare

I work in the e-discovery business, and outside of indexing in a specialized app you might be out of luck, other than doing a direct filesize/MD5-hash comparison. That is a LOT of overhead (coming from someone who knows).
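
If you do want to attempt the direct check with the standard tools already on the box, this is the usual shape of it (an untested sketch; the path is an example, and it will be slow for exactly the reasons above):

find /c/backup -type f -exec md5sum {} \; | sort | uniq -w32 -D > /c/dup-hashes.txt

'uniq -w32' compares only the 32-character hash at the start of each line, and '-D' prints every line whose hash repeats, so the output file lists each group of identical-content files together.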
Message 7 of 10
Ketankohli
Aspirant

Re: Finding and deleting duplicate files.

"Duplicatefilesdeleter" is best software to delete multiple duplicates files..
Message 8 of 10
rewanya
Aspirant

Re: Finding and deleting duplicate files.

I use Duplicate Files Deleter as it is very effective. It is 100% accurate and performs the scan quickly. 

Message 9 of 10
robertgahan
Aspirant

Re: Finding and deleting duplicate files.

Auslogics Duplicate File Finder is a basic program that allows users to find and delete duplicate files taking up valuable space on their hard drives.
It works well; you can try it.

Message 10 of 10