
Finding Duplicate Files

Hi guys,


I've currently got around 2TB of CR2 Canon RAW files, 1TB of MOV/MP4 and 600GB of JPGs. I'm pretty sure a lot of that is duplicate files, but not duplicates in any sort of organised backup sense!


I have the storage to do that (4TB, 2TB, 5x1TB), but I want to get everything in one place first and then set up a proper backup.


Adobe Lightroom has an option to re-import files from all over the computer, and you can tell it to "Ignore Duplicates", but I'm not sure how robust that is... (It also seems to have loads of issues importing video files, so a lot of them don't get imported!)


Does anyone have any suggestions for programs that could help solve this issue? I'd hate for a program to think something is a duplicate and then end up deleting a file there is only one copy of. (Yes, I know, if it's so important I should have had it better organised and backed up :( haha)



Any help is much appreciated!!!
 
I've been using Duplicate Cleaner for years and it has most of the options I've needed. I think it's free. At least, I think that's the app I've been using; I have a fresh install now after losing my old rig, so I haven't reinstalled many apps.
 
There are a number of duplicate file finder apps out there. Many will do a checksum test and/or a filename test for dupes, but if you have dupes across format types (e.g. the same image in RAW and JPG) they will not trigger unless the files have identical filenames (some will let you match filenames that differ only in extension).
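If you want to see roughly what those two tests amount to, here is a minimal sketch using only the Python standard library. It is not how any particular app works; the function names (`file_digest`, `find_exact_duplicates`, `group_by_stem`) are made up for illustration, and SHA-256 is just one reasonable checksum choice. It only reports groups and never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def walk_files(root):
    """Yield the full path of every file under root."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            yield os.path.join(dirpath, name)


def find_exact_duplicates(root):
    """Checksum test: group files under root whose contents are identical."""
    # Cheap first pass: only files of equal size can be byte-identical.
    by_size = defaultdict(list)
    for path in walk_files(root):
        try:
            by_size[os.path.getsize(path)].append(path)
        except OSError:
            continue  # unreadable file: skip it rather than guess

    # Expensive second pass: hash only the files that share a size.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_digest[(size, file_digest(path))].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def group_by_stem(root):
    """Filename test: group files that share a name but differ in extension."""
    by_stem = defaultdict(list)
    for path in walk_files(root):
        stem, _ext = os.path.splitext(os.path.basename(path))
        by_stem[stem.lower()].append(path)
    groups = []
    for paths in by_stem.values():
        extensions = {os.path.splitext(p)[1].lower() for p in paths}
        if len(extensions) > 1:
            groups.append(sorted(paths))
    return groups
```

Calling `find_exact_duplicates("/path/to/photos")` returns lists of paths with identical contents, and `group_by_stem` would pair something like `IMG_1.CR2` with `IMG_1.jpg`. A filename match is only a hint, so review those groups by hand before removing anything.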
 
You might also be interested in image fingerprinting, aka image hashing:

https://realpython.com/blog/python/fingerprinting-images-for-near-duplicate-detection/
https://pypi.python.org/pypi/ImageHash

It's useful for detecting duplication across filetypes (e.g. grouping JPGs with their CR2s), or for finding images that have near-identical contents, but differ in size or quality.

I don't know of any programs that do this out-of-the-box, though. Haven't looked into it. But if you can code, it only takes a few lines of Python to scan images and compare their hashes.
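To give an idea of what the hashing step involves, here is a rough standard-library sketch of one simple fingerprint, the "average hash". It assumes the image has already been decoded into a 2D grid of grayscale values (0-255); that decoding step is left out and would need an imaging library in practice. The names `average_hash` and `hamming_distance` are just illustrative, and the ImageHash package linked above provides ready-made hashes so you would not normally write this yourself.

```python
def average_hash(pixels, hash_size=8):
    """Fingerprint a grayscale image given as a list of rows of 0-255 values.

    The grid is shrunk to hash_size x hash_size by block averaging, then each
    cell contributes one bit: 1 if it is brighter than the overall mean.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for r in range(hash_size):
        r0 = r * height // hash_size
        r1 = max((r + 1) * height // hash_size, r0 + 1)
        for c in range(hash_size):
            c0 = c * width // hash_size
            c1 = max((c + 1) * width // hash_size, c0 + 1)
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))

    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming_distance(hash_a, hash_b):
    """Count the bits that differ between two hashes (0 means identical)."""
    return bin(hash_a ^ hash_b).count("1")


# Synthetic example: a horizontal gradient at two sizes, plus its negative.
big = [[x * 4 for x in range(64)] for _ in range(64)]
small = [[x * 8 for x in range(32)] for _ in range(32)]
negative = [[255 - value for value in row] for row in big]

same_picture = hamming_distance(average_hash(big), average_hash(small))
different_picture = hamming_distance(average_hash(big), average_hash(negative))
```

A small distance suggests the same picture at a different size or quality, and a large one suggests different content. The cutoff for "near-duplicate" is something you would have to tune on your own files, and as with any automated match, treat the result as a candidate list to review, not a delete list.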
 