Moving duplicate files

Bill_L · March 19, 2020, 3:58am

I have a bunch of duplicates in my Music directory. They are intentional duplicates.
I would like to move all duplicates to another directory to see if I want to rename them or delete some of them.

I know how to use ‘fdupes’ to find them, and so far trying to recursively move them has not worked.

fdupes —delete —recursive “$MusicDupes”

only returns an error ‘could not chdir to’ with no directory listed.

I have to admit I found that command line here https://unix.stackexchange.com/questions/482309/finding-duplicate-files-and-moving-one-copy-to-another-drive-deleting-all-other
And thought I could bypass the strategy shown there.

So once again, I come seeking help and guidance from this forum.

IMO, there should be an easy(ier) way to do that without a lot of fiddling around, I May Be Wrong! Again!

hcvv · March 19, 2020, 12:26pm

I do not know fdupes, but this command seems to contain some strange characters (like U+201C LEFT DOUBLE QUOTATION MARK and U+201D RIGHT DOUBLE QUOTATION MARK). Is this really a straight copy/paste of what you have in the terminal emulator? That is what we expect when you show something within CODE tags.

hcvv · March 19, 2020, 12:33pm

I took the trouble to look in the link you posted. There I found

fdupes --delete --recurse "$stagingdir"

This command, however, uses ASCII " characters to quote the parameter expansion of $stagingdir (so that the expanded value is not interpreted any further by the shell and is forwarded to fdupes unchanged).

Now the question is, did you mangle the command before you used it, or afterwards when you posted it here?

(And of course we have no idea what $MusicDupes expands into, could be nonsense, but that is secondary to my first question above.)
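
Whether $MusicDupes expands to anything useful can be checked before fdupes is involved at all. The following is a minimal sketch, assuming bash; the helper name require_dir is made up for illustration and is not part of fdupes. An empty or unset variable would be one possible explanation for an error message that names no directory, but that is a guess.

```bash
#!/usr/bin/env bash
# require_dir VARNAME: hypothetical helper that checks whether the shell
# variable named VARNAME holds the path of an existing directory.
require_dir() {
  local name=$1
  local value=${!name:-}          # indirect expansion: value of the named variable
  if [[ -z $value ]]; then
    printf '%s is unset or empty\n' "$name" >&2
    return 1
  fi
  if [[ ! -d $value ]]; then
    printf '%s=%q is not a directory\n' "$name" "$value" >&2
    return 1
  fi
}
```

It could be used as: require_dir MusicDupes && fdupes --recurse "$MusicDupes" (typed with plain ASCII hyphens and quotes).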

jetchisel · March 19, 2020, 3:14pm


Hi,

if the code

fdupes —delete —recursive “$MusicDupes”

is what you’re trying to run, then you will surely get an error because it has some funny quotes, as described by hcvv.

If not, then most probably "$MusicDupes" is not a directory or does not exist at all.

One way of doing what you’re trying to do is to parse the output of fdupes, like this for example.


while IFS= read -r files; do
  # fdupes prints an empty line between sets of duplicates; skip those
  [[ $files ]] || continue
  mv -v -- "$files" directory_to_move_duplicate_files/
done < <(fdupes -r -f directory_with_duplicate_files/)

The fdupes output has some empty lines, which is why I added [[ $files ]] || continue to the loop to skip them. The -f option omits the first file of each set from the output, so one copy stays where it is.

That should be enough to handle file names with spaces and tabs; however, it fails on file names that contain a newline.

I know that it is silly and highly unlikely for file/path names to have newlines in them, but they are allowed, at least in Unix and its derivatives.

A workaround for that issue is to use null bytes as the delimiter between file names, and for the recursive part find is the right tool for the job.
Since fdupes is not an option in that case, hashing each file and comparing the hashes to find the duplicates should work.
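
The difference between line-based and null-delimited reading can be seen without fdupes. This is a small sketch, assuming bash and a find that supports -print0; the function names and the file names in the throwaway directory are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Count the entries read line-by-line from find.
count_by_line() {
  local n=0 path
  while IFS= read -r path; do
    n=$((n + 1))
  done < <(find "$1" -type f)
  printf '%d\n' "$n"
}

# Count the entries read NUL-delimited from find -print0.
count_by_nul() {
  local n=0 path
  while IFS= read -r -d '' path; do
    n=$((n + 1))
  done < <(find "$1" -type f -print0)
  printf '%d\n' "$n"
}

demo=$(mktemp -d)
# Two files; the second has a newline embedded in its name.
touch "$demo/plain.mp3" "$demo/with"$'\n'"newline.mp3"
count_by_line "$demo"   # prints 3: the newline splits one name into two "paths"
count_by_nul "$demo"    # prints 2: each name arrives intact
rm -rf -- "$demo"
```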

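#!/usr/bin/env bash

# Associative arrays need bash 4 or newer
declare -A array

while IFS= read -r -d '' file; do
  # Hash via stdin so the output is always "<checksum>  -", whatever the file name
  read -r checksum _ < <(sha512sum < "$file")
  # Post-increment yields 0 (false) the first time a checksum is seen
  if ((array[$checksum]++)); then
    mv -v -- "$file" directory_to_move_duplicate_files/
  fi
done < <(find directory_with_duplicate_files/ -type f -print0)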

The -print0 from find and the -d '' from the builtin read both handle null bytes properly, so the code above should be safe for file names that contain spaces, tabs and newlines.
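
One thing to watch with either loop: duplicates from different subdirectories can share a base name, and a plain mv into a single directory would then overwrite the earlier arrival. Below is a hedged variant of the hashing loop, assuming bash 4+ and GNU mv (for --backup=numbered); the function name move_dupes and its two arguments are hypothetical, and the destination is assumed to lie outside the source tree.

```bash
#!/usr/bin/env bash
# move_dupes SRC DEST: hypothetical wrapper around the hashing loop.
# Keeps the first file seen for each checksum and moves later ones to DEST.
move_dupes() {
  local src=$1 dest=$2 file checksum
  local -A seen=()
  mkdir -p -- "$dest" || return
  while IFS= read -r -d '' file; do
    read -r checksum _ < <(sha512sum < "$file")
    if (( seen[$checksum]++ )); then
      # GNU mv: an existing target is first renamed to name.~1~, name.~2~, ...
      mv -v --backup=numbered -- "$file" "$dest"/
    fi
  done < <(find "$src" -type f -print0)
}
```

Called as, for example, move_dupes Music/ MusicDupes/ with both names standing in for real paths. Which copy counts as "first" depends on the order in which find walks the tree, so it is worth looking over the result before deleting anything.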

The declare -A creates an associative array.
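
How the associative array turns "seen before?" into a test can be tried in isolation. A minimal sketch, assuming bash 4+; the keys here are plain strings standing in for checksums, and the names seen and is_repeat are made up.

```bash
#!/usr/bin/env bash
declare -A seen

# is_repeat KEY: succeeds only if KEY has been counted before.
is_repeat() {
  # Post-increment: the expression evaluates to the old count, so it is
  # "false" (0) on the first sighting of a key and "true" afterwards.
  (( seen[$1]++ ))
}

for key in aaa bbb aaa ccc bbb aaa; do
  if is_repeat "$key"; then
    printf 'repeat: %s\n' "$key"
  fi
done
# prints:
#   repeat: aaa
#   repeat: bbb
#   repeat: aaa
```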

Bill_L · March 20, 2020, 3:20am

@hcvv, the mess in the quotes in my coded command line was bad on my part. I didn’t know what I was doing.
Regardless it was not the way to do this.
Until I read @jetchisel’s post, I was about to go through the whole thing on the web page I posted in my OP.

@jetchisel, THANKS both worked. So more tools in my Linux experiences.
So, I have my music duplicates found and moved from two ‘music’ directories to two different ‘dupes’ directories to check out both of your suggestions.
Again, thanks!

hcvv · March 20, 2020, 12:11pm

@Bill_L
I thought this over during my sleep last night ;). How could this happen? It may be that you already know how the " were mistreated, but I assume you used some word processor for the statement. Those tend to make text more readable for human beings by adapting it, amongst many other things like spell-checking. Thus they try to make "intelligent guesses" about what those " are for and then adapt them to “real” opening and closing quotation marks, depending on the language used.

For things that are to be understood primarily by computers and not by humans, better use an editor and not a word processor. Personally I use vi (the older incarnation of present-day vim), but that has a very steep learning curve. I think editors like Kate (in KDE) also do a good editing job without changing too much of what you mean (some editors are computer-language sensitive and do things like using colour highlighting to help you find unclosed constructs, but that is only in the display, nothing is changed).

Bill_L · March 20, 2020, 7:38pm


That line of code was copied from a terminal window. It was what I ‘thought’ would (re)move duplicate files to a directory named $MusicDupes.
Like I said, I had my head in a dark place, and didn’t know what I was doing, only what I was trying to do.
I do use vi, vim, nano, when necessary within terminals. But I am still learning their capabilities.

unix111 · March 20, 2020, 8:23pm

I usually see this kind of »best-intentions« typographical replacement of characters in content-management and blogging software (Typo3, Django, WordPress, Blogger, that kind of stuff).

While I really like (and regularly use) fancy Unicode characters, having them replaced automagically by the mentioned software can cause havoc when copy-pasting to/from source code, IDE, terminal window or text shell. My pet peeves:

double hyphens get replaced by typographically correct longer dashes (»—« for example, the Unicode em dash) and no longer work as command-line switches (»--recurse« vs »—recurse«)
backticks may get converted to accent characters, even combined into accented glyphs (`echo` to èchò)
bits of pathnames get interpreted as italic or cursive (/usr/bin to usr bin) … which is just /wrong/
separate ASCII characters get fused into ligatures (fish to ﬁsh: the former has 4 codepoints, the latter only 3 because »ﬁ« is one glyph), hard to spot sometimes
and of course, all kinds of »swiss«/«french»/‘single’/“double”/“fancy” shenanigans

Beware of well-intentioned goodies thrown at you by well-intentioned web-developers, I guess.
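
A pasted command line can be pushed back towards ASCII before it is run. This is only a sketch, assuming bash; the function name asciify is made up, and mapping both dash characters to a double hyphen is a guess at what was originally typed, so the result still needs a look by eye.

```bash
#!/usr/bin/env bash
# asciify STRING: hypothetical helper that replaces a few common
# typographic characters with their likely ASCII originals.
asciify() {
  local s=$1 dq='"' sq="'"
  s=${s//“/$dq}     # U+201C left double quotation mark
  s=${s//”/$dq}     # U+201D right double quotation mark
  s=${s//‘/$sq}     # U+2018 left single quotation mark
  s=${s//’/$sq}     # U+2019 right single quotation mark
  s=${s//—/--}      # U+2014 em dash (assumed to have been "--")
  s=${s//–/--}      # U+2013 en dash (assumed to have been "--")
  printf '%s\n' "$s"
}

asciify 'fdupes —delete —recursive “$MusicDupes”'
# prints: fdupes --delete --recursive "$MusicDupes"
```

This repairs the characters only. Note that the command quoted from the linked answer earlier in the thread spells the option --recurse.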

hcvv · March 20, 2020, 8:46pm

I remember one earlier case here on the forums. But there the quotes were already wrong on the web site. In this case they are correct on the web site. But in any case, one should be aware of all these (as you rightly call them)
“best intentions”
„beste Absichten“
« les meilleures intentions »

karlmistelberger · March 24, 2020, 7:35am

A sample command I used to add the API key to my google maps:

find /home/Albums/Bilder/ -name Karte.htt -exec sed -i -e 's/google_api_key\ =\ '\'''\''/google_api_key\ =\ '\''A...........'\''/g' {} \;

I didn’t try to figure that out, but copied it from the internet. You may consider canonifying your file names, as described for example in the Stack Overflow question “How to replace spaces in file names using a bash script”.