openSUSE 13.1 64bit KDE
I have a naive user who has unknowingly copied image files from one directory to another instead of moving them. Thousands of them. I need to find a utility that can recursively (from his home directory) identify duplicate filenames and perhaps verify that they are the same size (as some might be shrunk versions of the same image with the same name). I don’t need a GUI version. But something that works without needing a number of non-SUSE dependencies.
Thanks in advance.
On 2014-05-28 22:46, ionmich wrote:
>
> openSUSE 13.1 64bit KDE
>
> I have a naive user who has unknowingly copied image files from one
> directory to another instead of moving them. Thousands of them. I need
> to find a utility that can recursively (from his home directory)
> identify duplicate filenames and perhaps verify that they are the same
> size (as some might be shrunk versions of the same image with the same
> name). I don’t need a GUI version. But something that works without
> needing a number of non-SUSE dependencies.
I was thinking about creating a similar tool myself…
Do you need to locate files that may have different names but the exact same content, or files with the same name but possibly different content?
To locate the former, I think the trick is to generate a checksum for every file in the tree and then compare the checksums.
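A minimal sketch of that idea, assuming GNU find and coreutils are available (/home/user is just a placeholder for the user’s home directory):
Code:
# checksum every file under the tree; sorting puts identical content on adjacent lines
find /home/user -type f -exec md5sum {} + | sort > /tmp/sums.txt
# print the duplicate groups: lines whose first 32 characters (the md5 hash) repeat
uniq -w32 --all-repeated=separate /tmp/sums.txt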
–
Cheers / Saludos,
Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)
On Wed, 28 May 2014 20:46:01 +0000, ionmich wrote:
> openSUSE 13.1 64bit KDE
>
> I have a naive user who has unknowingly copied image files from one
> directory to another instead of moving them. Thousands of them. I need
> to find a utility that can recursively (from his home directory)
> identify duplicate filenames and perhaps verify that they are the same
> size (as some might be shrunk versions of the same image with the same
> name). I don’t need a GUI version. But something that works without
> needing a number of non-SUSE dependencies.
>
> Thanks in advance.
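One simple way (a sketch, assuming md5sum is installed; /home/user is a placeholder for the user’s home directory) is to build a sorted checksum list of the whole tree:
Code:
find /home/user -type f -exec md5sum {} + | sort > ~/checksums.txt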
which you can then manipulate however you like. Since it’s sorted by checksum, you can identify the duplicates by checking whether the line before or after each line carries the same checksum.
You can also limit the scope of the find command in whatever way you want, for example by grepping the output or by matching on the filename (say the image files are all png files - you could limit the scope that way).
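For example (same sketch, just narrowed; the extension tests would need adjusting to the actual files):
Code:
find /home/user -type f \( -iname '*.png' -o -iname '*.jpg' \) -exec md5sum {} + | sort > ~/checksums.txt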
Thanks for all the responses. I believe I have not described my problem accurately. English is not my native language. I’ll try again using HYPOTHETICAL examples.
My Mamiya camera produces 80 Megapixel images which I copy to my hard drive. They have long numerical names and come grouped in variously named directories on the camera. I wanted to e-mail some of them to people who might steal them and sell them. I used a utility that shrank them, but since it overwrites the original file, I copied the 80 MP images to a separate directory and reduced them there. I was left with two directories holding identical filenames but different sizes. Years later I find that I have 30,000 images, some large, some small. I want to delete the small ones. If I had been intelligent I would have given the directories holding the shrunk copies names that included “shrunk” instead of various unrelated names like “DCIM” or “photos”. Then I could easily search and delete.
If I can find duplicated filenames (recursively), I can easily delete the directories that hold the shrunk versions by looking at file size, since ALL the files in any particular directory will be shrunk.
I tried the suggested command in a test directory of only 24 files. The file DSC03267.JPG exists both in the current directory and in the subdirectory TEST. Easy enough to match up by eye, but I have 30,000 files, of which I suspect some 20% are duplicated.
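In other words, what I am after is something like this rough sketch (assuming GNU find, run from the top of the home directory), which would list each duplicated filename with the size and path of every copy:
Code:
# basenames that occur more than once anywhere below the current directory
find . -type f -printf '%f\n' | sort | uniq -d > /tmp/dupnames.txt
# for each duplicated name, print the size (bytes) and full path of every copy
while read -r name; do
    find . -type f -name "$name" -printf '%s\t%p\n'
done < /tmp/dupnames.txt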
I’m not clear about the actual difficulty. Is it that the user’s copies are mixed with other, unique files? Are you concerned that some files might be named the same but actually be altogether different files, and not simply a compressed or otherwise altered version of the same file?
If you can specify <exactly> what your concerns are, then a solution can be described…
My guess is that diff could compare the directories and find identical filenames by itself, but depending on how valuable you consider your pics, and on the possibility that your compressed or otherwise modified files might be faulty (it happens, particularly if the data is old), I’d instead recommend:
1. Verify that all your original source files are usable.
2. Create new copies of your original source files in a new location, using whatever method you want to verify integrity. If only the filename has been changed, you can automate this using checksums (e.g. the code from Jim); see the sketch below.
3. Then simply delete the “old” file copies with their naming difficulties.
This assumes, more or less, that all existing copies of the files are in their own directories, separate from the source.
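A rough sketch of that copy-and-verify step, assuming rsync and md5sum are available (the paths are only placeholders):
Code:
# copy the originals to a fresh location, deciding by file content rather than timestamps
rsync -av --checksum /home/user/originals/ /home/user/photos-clean/
# checksum the sources, then verify the copies against that list
( cd /home/user/originals && find . -type f -exec md5sum {} + ) > /tmp/src.md5
( cd /home/user/photos-clean && md5sum -c /tmp/src.md5 )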
You can try the comm(1) utility (which was written by the one and only RMS lol!) together with find, md5sum and sort.
Compare Directory1 and Directory2
Code:
comm -13 <(find Directory1 -type f -iname '*.jpg' -execdir md5sum {} + | sort) <(find Directory2 -type f -iname '*.jpg' -execdir md5sum {} + | sort)
For more info about comm(1) see the man page - it is also a very short manual.
man comm
Anyway, you can play with those column numbers in comm and see what will make you happy lol! Of course, that example is just for jpg files and does not care about case, i.e. JPG, JpG, JPg, jPG, jpg.
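To spell out those numbers (same example as above): comm prints three columns - lines only in the first input, lines only in the second, and lines common to both - and each digit you pass suppresses that column:
Code:
# checksums+names found only in Directory1
comm -23 <(find Directory1 -type f -iname '*.jpg' -execdir md5sum {} + | sort) <(find Directory2 -type f -iname '*.jpg' -execdir md5sum {} + | sort)
# checksums+names present in both directories
comm -12 <(find Directory1 -type f -iname '*.jpg' -execdir md5sum {} + | sort) <(find Directory2 -type f -iname '*.jpg' -execdir md5sum {} + | sort)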
On Sat 31 May 2014 04:46:01 AM CDT, jetchisel wrote:
> Code:
> konqueror man:find &
> to be more precise.
> Code:
> konqueror man:/usr/share/man/man1/find.1.gz &
> I guess that works for nautilus as well but I don’t know to be honest.
Hi
For GNOME there is yelp, so press Alt+F2 and enter yelp man:find
–
Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE 13.1 (Bottle) (x86_64) GNOME 3.10.1 Kernel 3.11.10-11-desktop
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below… Thanks!