
Thread: App for deleting duplicate files? (mostly .jpg files)

  1. #1
    Join Date
    Jan 2014
    Location
    San Jose City, PH.
    Posts
    118


    I just ran photorec on a bad drive to recover files, and I ended up with thousands of files, many of them being duplicates but with different file names. I have moved all .jpg files into one directory. Is there an application that can scan all the files in a directory, identify duplicates regardless of file name, and delete them leaving only one copy?
    AMD Ryzen 3 1200 Quad-core Processor, 32MB memory, KDE
    DRIVES: 2 4TB BTRFS raid1, 1 4TB BTRFS for backups

  2. #2
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,394


    I do not think there is such an application out of the box. This is typically the sort of task Unix/Linux is very good at, because it offers a lot of low-level tools and a shell to combine them into an ad hoc solution.

    I would probably sort them by size, then look only at files with the same size and compare those (in twos or threes?) to see whether they are identical (with diff or cmp).

    The sorting (starting from an ls listing with appropriate options) would take some trial and error, I assume.
    Henk van Velden
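
The size-then-compare idea from post #2 can be sketched as a small shell function. This is only an illustration, not something posted in the thread: the name list_dupes_by_size is made up, it looks at one directory without recursing, it assumes file names contain no newlines, and it only reports duplicates rather than deleting anything.

```shell
# list_dupes_by_size DIR
# Prints "kept<TAB>duplicate" for every file that is byte-identical to an
# earlier file of the same size. Hypothetical helper; nothing is deleted.
list_dupes_by_size() (
    dir=${1:-.}
    tab=$(printf '\t')
    prev=
    # Step 1: emit "size<TAB>path" for each regular file in DIR.
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
    done |
    # Step 2: sort by size so files of equal size become neighbours.
    LC_ALL=C sort -t "$tab" -k1,1n |
    # Step 3: within each size group, compare against the distinct files kept so far.
    while IFS=$tab read -r size f; do
        # New size: forget the previous group (kept files live in "$@").
        [ "$size" = "$prev" ] || { prev=$size; set --; }
        orig=
        for r in "$@"; do
            if cmp -s -- "$r" "$f"; then orig=$r; break; fi
        done
        if [ -n "$orig" ]; then
            printf '%s\t%s\n' "$orig" "$f"
        else
            set -- "$@" "$f"
        fi
    done
)
```

Running `list_dupes_by_size /path/to/jpgs` prints one line per duplicate; the second column is the file that could be removed once the list has been checked.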

  3. #3
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,394


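    Tried a bit. This one will list the files and their sizes, largest first:
    Code:
    # --time-style=+%t turns the date column into a tab, so the name is field 6; the "total" line has no name and is skipped
    ls -Sl --time-style=+%t | while read -r P L U G S N; do [ -n "$N" ] && printf '%s %s\n' "$N" "$S"; done
    Try that one in your directory first, and tell me whether it helps and whether you want to continue this way. Then we can carry on.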
    Henk van Velden
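
As a possible next step after the listing in post #3, the sketch below keeps only the files whose size is shared with at least one other file, since only those can be duplicates. It is an assumption-laden illustration: same_size_candidates is a made-up name, it relies on GNU find's -printf, and it prints the size first and the name second (the reverse of the one-liner above).

```shell
# same_size_candidates DIR
# Lists "size<TAB>name" for files whose size occurs more than once,
# largest first. Hypothetical helper; assumes GNU find (-printf).
same_size_candidates() {
    find "${1:-.}" -maxdepth 1 -type f -printf '%s\t%f\n' |
    LC_ALL=C sort -k1,1nr |
    awk -F '\t' '
        # Remember every line and count how often each size appears.
        { count[$1]++; line[NR] = $0; size[NR] = $1 }
        # Print, in sorted order, only the lines whose size is not unique.
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print line[i] }'
}
```

Files with a unique size drop out immediately, which should shrink the set that needs a byte-by-byte comparison.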

  4. #4


    Hi,

    There is the program called

    Code:
    fdupes
    I suggest you first back up the directory where you have those files.
    "Unfortunately time is always against us" -- [Morpheus]

    .:https://github.com/Jetchisel:.

  5. #5
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,394


    Quote Originally Posted by jetchisel View Post
    Hi,

    There is the program called

    Code:
    fdupes
    I suggest you first back up the directory where you have those files.
    I searched for "man fdupes" in Google. It looks like a solution, but as you suggest: handle with care, and work on a copy!
    Henk van Velden

  6. #6


    Quote Originally Posted by hcvv View Post
    I searched for "man fdupes" in Google. It looks like a solution, but as you suggest: handle with care, and work on a copy!
    Hi,

    Right, as a rule of thumb I always back up files/directories before editing them, no matter what or how I edit.

    Now if you run

    Code:
    fdupes directory
    It should show you the duplicate files in groups (a blank line separates each group from the next).

    For example

    Code:
    fdupes .
    This should list the duplicate files in the current working directory.

    For a recursive search (note the trailing dot):

    Code:
    fdupes -r .
    You can also compare two text files directly with a utility such as diff.

    Code:
    diff file1 file2
    This prints nothing if the files are identical.
    For the OP, who said that the files are mainly pictures, the cmp utility is an alternative.

    Code:
    cmp file1.jpg file2.jpg
    Likewise, this outputs nothing if the files are identical.

    You can add a test if you like.

    Code:
    if cmp file1.jpg file2.jpg; then
      printf '%s\n' 'file1.jpg and file2.jpg are identical'
    fi

    Another solution would be to hash the files and compare the hashes, but in my opinion there are tools written specifically to do what the OP wants, so it is better to use the existing tools than to reinvent the wheel.
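
The hashing approach mentioned at the end of post #6 might look roughly like the sketch below. It is not from the thread: list_dupes_by_hash is a made-up name, it assumes GNU coreutils' sha256sum, and it assumes file names without newlines or backslashes (sha256sum escapes those, which would shift the columns).

```shell
# list_dupes_by_hash DIR
# Prints "kept<TAB>duplicate" for files under DIR (recursively) that share a
# SHA-256 digest with an earlier file. Hypothetical helper; nothing is deleted.
list_dupes_by_hash() {
    find "${1:-.}" -type f -exec sha256sum {} + |
    # Sorting puts identical digests on adjacent lines.
    LC_ALL=C sort |
    awk '{
        hash = substr($0, 1, 64)   # 64 hex digits of the digest
        file = substr($0, 67)      # path starts after the two separator characters
        if (hash == prev) print kept "\t" file
        else { prev = hash; kept = file }
    }'
}
```

Unlike the pairwise cmp approach, every file is read in full exactly once, which costs more when most files already differ in size but scales evenly when there are many same-size candidates.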

  7. #7
    Join Date
    Jun 2008
    Location
    Netherlands
    Posts
    25,394


    Quote Originally Posted by jetchisel View Post
    so better use the existing tools rather than reinvent the wheel.
    The "better" here is your idea. It will be easier. But it is a nice little exercise to write a small script for it.
    Henk van Velden

  8. #8
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    27,239
    Blog Entries
    15


    On Sat 24 Feb 2018 09:26:01 AM CST, hcvv wrote:

    I do not think there is such an application out of the box. This is
    typically the sort of task Unix/Linux is very good at because it offers
    a lot of low level tools and a shell to combine them into an ad hoc
    solution.

    I would probably sort them by size, then look only at those with the
    same size and compare those (in twos or threes?) to see whether they
    are identical (with diff or cmp).

    The sorting (starting from an ls listing with appropriate options) would
    take some trial and error, I assume.


    Hi
    Use fdupes with the required options:
    Code:
    fdupes - finds duplicate files in a given set of directories
    --
    Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
    openSUSE Leap 42.3|GNOME 3.20.2|4.4.114-42-default
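
Post #8 leaves the "required options" open. One plausible combination for the OP's goal (keep one copy, delete the rest) is sketched below; the wrapper name dedupe_with_fdupes and its --delete switch are invented for illustration, while -r (recurse), -S (show sizes), -d (delete) and -N (keep the first file of each set without prompting) are fdupes options. Run it on a backup copy first.

```shell
# dedupe_with_fdupes DIR [--delete]
# Without --delete it only lists duplicate groups; with it, fdupes keeps the
# first file of each group and removes the others. Hypothetical wrapper.
dedupe_with_fdupes() {
    dedupe_dir=${1:?usage: dedupe_with_fdupes DIR [--delete]}
    if ! command -v fdupes >/dev/null 2>&1; then
        echo "fdupes is not installed" >&2
        return 127
    fi
    if [ "${2:-}" = "--delete" ]; then
        # Destructive: deletes every duplicate except the first in each set.
        fdupes -r -d -N "$dedupe_dir"
    else
        # Preview only: nothing is removed.
        fdupes -r -S "$dedupe_dir"
    fi
}
```

A cautious workflow would be `dedupe_with_fdupes ~/recovered` to review the groups, then `dedupe_with_fdupes ~/recovered --delete` once the listing looks right (the path here is just a placeholder).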


  9. #9
    Join Date
    Jun 2008
    Location
    San Diego, Ca, USA
    Posts
    11,484
    Blog Entries
    2


    Here's another option

    https://github.com/dedupeio/dedupe

    You can either download the Python library from PyPI or try one of the listed services that use the library to dedupe your data.
    It looks to me as if it'll find your duplicates; then it's up to you what to do with the result (e.g. delete a copy).

    TSU
    Beginner Wiki Quickstart - https://en.opensuse.org/User:Tsu2/Quickstart_Wiki
    Solved a problem recently? Create a wiki page for future personal reference!
    Learn something new?
    Attended a computing event?
    Post and Share!

  10. #10
    Join Date
    Jan 2014
    Location
    San Jose City, PH.
    Posts
    118


    I looked at all suggestions, all of which were good, and
    decided that fdupes would probably be best. So I tried
    it on a backed-up directory, and it did exactly what I
    needed without any difficulty.

    After trying it out I felt confident enough to use it on
    the entire recovered file tree - all 3,097,177 files. I
    figure that it's saving me lots of time and labor, and if
    anything does go wrong I can always re-run photorec on
    the bad drive again as it'll only take about 10 hours.

    Thanks to everyone who responded! This was a good
    learning experience for me.
