How to compare the playback length and file count of corresponding files in two huge music directories

Hi, I am Rupesh from India. I have a huge 150 GB directory, of which 93 GB are MP3 files of speeches recorded by someone at 64 kbps; the rest are music files with bitrates between 96 kbps and 216 kbps.

I want to compress all these files and store them on an external memory card. After a number of experiments I concluded that the Opus codec is best for speech, and that MP4 files generated with AAC (Nero AAC encoder) are best for music, giving crystal-clear quality even at lower bitrates.

I converted the 93 GB of speech files to Opus files, which occupy 30 GB, and converted the remaining music files to MP4 files, which occupy 20 GB. I chose LameXP for the conversion, which is an excellent open-source program that combines the binaries of other encoders, including lame, opusenc, NeroAacEnc, etc. For Opus encoding I used 11 kbps, 16 kbps and 24 kbps; for AAC encoding I used 48 kbps and 56 kbps. For all encodings I used a 48000 Hz sample rate.

Even though I used low bitrates for the conversion, the resulting output files are of acceptable quality. The output file names and the directory structure are identical to the source.

During conversion, a very few of the original files were skipped, so no output files were generated for them. A very few other files were not transcoded properly: for example, a 50-minute source file was transcoded to a file only 5 or 10 minutes long, although the source and destination file names are identical. I noticed these differences by examining some source and destination directories.

Sometimes I suspect that a large number of files were not transcoded properly, as described above. I want to compare the source and destination directories and obtain a list of the files (with paths) that were not transcoded properly.

Suppose I have a directory of 23 GB with 2400 files, which I transcoded to a destination directory of 5 GB with 2200 files.
Can anyone suggest how to get the list of the 200 missing files?
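Since the output names are said to match the sources apart from the extension, one way to find the missing files is to list both trees as relative paths with the extension stripped and let `comm` print the entries that occur only in the source. A minimal sketch, assuming GNU find (for `-printf`) and bash; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: list source files that have no transcoded counterpart at all.
# Assumes outputs keep the source's relative path and base name and differ
# only in extension (talk.mp3 -> talk.opus). Names with newlines are not handled.

# Print each file under DIR as a relative path without its extension, sorted.
stripped_list() {  # usage: stripped_list DIR
  # %P = path relative to DIR; sed removes a trailing ".ext" from the base name
  find "$1" -type f -printf '%P\n' |
    sed 's/\.[^./]*$//' |
    LC_ALL=C sort -u
}

# Print the entries that exist in SRC_DIR but not in DST_DIR.
list_missing() {  # usage: list_missing SRC_DIR DST_DIR
  LC_ALL=C comm -23 <(stripped_list "$1") <(stripped_list "$2")
}
```

For example, `list_missing /path/to/source /path/to/destination > missing.txt`. Files that were never meant to be transcoded (jpg, gif, txt, …) will show up too; they can be filtered out afterwards with `grep -v`.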

Suppose I have a directory with a total playback length of 600 hours spread over, say, 1000 files, and after transcoding the resulting file count is still 1000 but the total playback length is, say, 500 hours. Can anyone suggest how to obtain the list of files (with paths) whose playback length differs?
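For the playback-length question, here is a sketch that pairs each source file with the output of the same name (any extension), asks `ffprobe` for both durations and reports outputs that are missing or clearly shorter. It assumes ffprobe from FFmpeg is installed; the function names and the 5-second default tolerance are illustrative choices, not part of any tool:

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: ffprobe (from FFmpeg) is installed, and every
# output keeps its source's relative path and base name, with only the
# extension changed. The 5-second default tolerance is arbitrary.

# Print a media file's duration in seconds (prints nothing on failure).
duration_of() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1" </dev/null 2>/dev/null
}

# Exit status 0 if the output is shorter than the source by more than TOL.
too_short() {  # usage: too_short SRC_SECONDS OUT_SECONDS TOL_SECONDS
  awk -v s="$1" -v o="$2" -v t="$3" 'BEGIN { exit !(s - o > t) }'
}

# Print tab-separated MISSING or SHORT lines for problem files.
report_short() {  # usage: report_short SRC_DIR DST_DIR [TOL_SECONDS]
  local src_dir=$1 dst_dir=$2 tol=${3:-5}
  local num='^[0-9]+([.][0-9]+)?$'
  local rel base stem cand out s o
  while IFS= read -r -d '' rel; do
    base=${rel##*/}
    stem=$rel
    [[ $base == *.* ]] && stem=${rel%.*}   # drop the extension, if any
    out=
    for cand in "$dst_dir/$stem".*; do     # same stem, any extension
      [[ -f $cand ]] && { out=$cand; break; }
    done
    if [[ -z $out ]]; then
      printf 'MISSING\t%s\n' "$rel"
      continue
    fi
    s=$(duration_of "$src_dir/$rel")
    o=$(duration_of "$out")
    # Skip pairs where either duration is unknown (not audio/video, unreadable).
    [[ $s =~ $num && $o =~ $num ]] || continue
    if too_short "$s" "$o" "$tol"; then
      printf 'SHORT\t%s\t%s\t%s\n' "$rel" "$s" "$o"
    fi
  done < <(find "$src_dir" -type f -printf '%P\0')
}
```

`report_short /path/to/source /path/to/destination 5 > problems.txt` gives one tab-separated line per problem file. ffprobe runs twice per file, so 15000 files will take a while.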

At least, can you suggest how to obtain the total playback length of a directory containing 150 files?
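A possible way to get that total, assuming ffprobe is installed: sum the per-file durations with awk and format the result. The function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch, assuming ffprobe (from FFmpeg) is installed. Files ffprobe
# cannot read (images, text files, ...) are simply skipped.

# Print the summed duration, in seconds, of all media files below DIR.
total_seconds() {  # usage: total_seconds DIR
  find "$1" -type f -exec ffprobe -v error -show_entries format=duration \
      -of default=noprint_wrappers=1:nokey=1 {} \; 2>/dev/null |
    awk '/^[0-9]+(\.[0-9]+)?$/ { sum += $1 } END { printf "%.3f\n", sum }'
}

# Format a number of seconds as H:MM:SS (hours keep counting past 24).
fmt_hms() {  # usage: fmt_hms SECONDS
  awk -v t="$1" 'BEGIN { t = int(t + 0.5)
    printf "%d:%02d:%02d\n", t / 3600, (t % 3600) / 60, t % 60 }'
}
```

`fmt_hms "$(total_seconds /path/to/folder)"` prints the total as hours:minutes:seconds; because `find` recurses, subdirectories are included.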

Open a console, navigate to the topmost folder of each collection and enter

```
ls -lR > File1.txt
```

in the first and

```
ls -lR > File2.txt
```

in the second.
Open File1.txt in LibreOffice Writer, go to Edit > Track Changes > Compare Document… and open File2.txt.
You will end up with a file listing the differences between the two files.

That’s a good idea, except I wouldn’t recommend an office suite to view differences but rather a diff utility like kompare (formerly kdiff):

```
sudo zypper in kompare
kompare File1.txt File2.txt
# or compare the two folders directly
kompare /path/to/folder1/ /path/to/folder2/
```

kdiff (kompare) can compare folders too, but I assume a diff of two folders with gigabytes of binary files would take a long time.
There are other comparison utilities that do more or less the same thing, such as kdiff3 (a different program).

Yep, and you both assume that the output file names are the same as the input file names. The OP states he transcoded all these files to compress them, so they will at least have different extensions. Of course that could be handled in a script, but we don’t know, since the OP didn’t say.

Hindsight is 20/20.
If this requirement had been recognized from the beginning, it would have been simple to modify the conversion script to copy any failures to a separate directory.


You are all thinking about file comparison in terms of file count, but that is not what I need.

Actually, the source directory consists of 15000 files, of which 11000 are mp3, 2000 are m4a, 100 are flv, 500 are jpg, 100 are gif, 100 are txt and 200 are avi files.

I transcoded 9000 files to Opus and the rest to m4a using a batch encoder; it kept the same directory structure as the source and named the output files according to my selection.

I suspect that some files were not transcoded at all and that the playback length of some output files differs from that of their sources. First, I want a list (with paths) of the files that were not transcoded at all, in a text file, or if possible those files copied to another directory. Second (and most important), I want a list of files whose transcoded version is shorter than the source, and if possible I want these files in a separate directory as well.

Remember that an output file may exist but not have the same playback length as its source.
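For the "copy those files to another directory" part, a sketch that takes a list of relative paths (one per line, e.g. a text file you wrote by hand or produced with a script) and copies those source files into a separate directory, recreating their sub-folders. The function name and the list format are assumptions:

```shell
#!/usr/bin/env bash
# Sketch: copy problem files into a separate directory, keeping their
# sub-directory structure. Reads one path per line on stdin, relative to
# SRC_DIR (file names containing newlines are not supported).
collect_failures() {  # usage: collect_failures SRC_DIR FAIL_DIR < list.txt
  local src_dir=$1 fail_dir=$2 rel
  while IFS= read -r rel; do
    [[ -z $rel ]] && continue              # ignore blank lines
    mkdir -p -- "$fail_dir/$(dirname -- "$rel")" &&
      cp -p -- "$src_dir/$rel" "$fail_dir/$rel"
  done
}
```

`collect_failures /path/to/source /path/to/failed < failed.txt`. Using `cp` rather than `mv` leaves the source tree untouched.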

I think this question is less about file management than about multimedia. Please look at some tools like mediainfo, mp3tag etc. and give suggestions.

Someone suggested trying meldmerge for file comparison.

Can you suggest how to obtain the total playback length of a folder containing 100 audio files?

Use k3b: choose “New Audio CD Project” and drop ALL of your 100 audio files into the project.

k3b will display the total track time (together with a warning you’ve exceeded the capacity of the CD!)


… then cancel the operation as obviously you don’t want to write a CD :stuck_out_tongue:

If the directory has subdirectories containing audio files, can I add those to k3b too?

Someone suggested that mediainfo can calculate the length and can even export all the properties of all files to an Excel CSV file.

Yes, just add those to the same (k3b) project.

As for mediainfo: I’m not familiar with that package, but it looks quite interesting; I may take a look myself later. :wink:

In theory, one could create a shell script that reads the length field output by mediainfo for each file and adds up the values (you could modify the previous conversion script to use mediainfo instead of ffmpeg). The catch is that the length is in hh:mm:ss format, and adding those is a bit difficult; maybe somebody in the programming sub-forum can be of more help.
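Adding hh:mm:ss values becomes easy once each one is converted to seconds, which awk can do by splitting on `:`. Below is a sketch of that, plus a variant that skips the hh:mm:ss step entirely, assuming mediainfo's `--Inform="General;%Duration%"` template prints the duration in milliseconds (worth checking against your mediainfo version). Function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: sum durations given as hh:mm:ss (fractional seconds allowed),
# one per line on stdin, by converting each to seconds first.
sum_hms() {
  awk -F: 'NF == 3 { total += $1 * 3600 + $2 * 60 + $3 }
           END     { printf "%.3f\n", total }'
}

# Assumption: mediainfo --Inform='General;%Duration%' prints milliseconds.
# Non-media files print an empty line, which the awk pattern ignores.
total_seconds_mediainfo() {  # usage: total_seconds_mediainfo DIR
  find "$1" -type f -exec mediainfo --Inform='General;%Duration%' {} \; |
    awk '/^[0-9]+(\.[0-9]+)?$/ { ms += $1 } END { printf "%.3f\n", ms / 1000 }'
}
```

For example, `printf '00:50:00\n01:10:30\n' | sum_hms` prints `7230.000`.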

Indeed, scripting would be needed. We still don’t know if the file names are identical (or at least identical without their extensions). Here’s an example using ffmpeg (IMHO the best tool for this); maybe it will help:

```
times=()
for f in *.ts; do
    _t=$(ffmpeg -i "$f" 2>&1 | grep "Duration" | grep -o " [0-9:.]*, " | head -n1 | tr ',' ' ' | awk -F: '{ print ($1 * 3600) + ($2 * 60) + $3 }')
    times+=("$_t")
done
echo "${times[@]}" | sed 's/ /+/g' | bc
```

This might be even faster, and it works for all files:

```
for i in *; do
    dur=$(ffmpeg -i "$i" 2>&1 | grep -oP "(?<=Duration: ).*(?=, start.*)")
    date -ud "1970/01/01 $dur" +%s
done | paste -s -d+ | bc
```

Since it’s ffmpeg (which supports a huge collection of formats), this should work for all files in a folder as well, and replacing the `*` with

`ls -R *`

would include the subdirectories as well.

There is an app in Windows called playtime, which I think may be based on MediaInfo, that exports all the properties of the media files in a directory, both audio and video, to an Excel CSV file.

We could compare the properties of two huge directories containing 10000 media files by comparing two such CSV files exported by that app.
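The same "export a table per directory, then compare the tables" approach can be done on Linux without playtime. A sketch, assuming ffprobe is installed; it writes tab-separated files rather than CSV so commas in names cause no trouble, and the function names and 5-second tolerance are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of "export two tables, then compare them". Assumptions: ffprobe
# (from FFmpeg) is installed and outputs keep the source's relative path
# and base name. Tab-separated instead of CSV so commas in names are safe.

# Write "<relative path without extension><TAB><seconds>" for each media file.
export_durations() {  # usage: export_durations DIR > table.tsv
  local dir=$1 num='^[0-9]+([.][0-9]+)?$' rel base d
  while IFS= read -r -d '' rel; do
    d=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 "$dir/$rel" </dev/null 2>/dev/null)
    [[ $d =~ $num ]] || continue                  # skip non-media files
    base=${rel##*/}
    [[ $base == *.* ]] && rel=${rel%.*}           # drop the extension
    printf '%s\t%s\n' "$rel" "$d"
  done < <(find "$dir" -type f -printf '%P\0')
}

# Print MISSING for entries only in the source table and SHORT for entries
# whose output is more than TOL seconds (default 5) shorter than the source.
compare_tables() {  # usage: compare_tables src.tsv dst.tsv [TOL]
  awk -F'\t' -v tol="${3:-5}" '
    FILENAME == ARGV[1] { dst[$1] = $2; next }    # first file read: outputs
    !($1 in dst)        { print "MISSING\t" $1; next }
    $2 - dst[$1] > tol  { print "SHORT\t" $1 "\t" $2 "\t" dst[$1] }
  ' "$2" "$1"
}
```

Usage: `export_durations /path/to/source > src.tsv`, `export_durations /path/to/destination > dst.tsv`, then `compare_tables src.tsv dst.tsv 5`. The .tsv files also open in LibreOffice Calc if you prefer a spreadsheet view.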

It’s a wonder that we can perform any task in Windows with very little effort, whereas in Linux that is not possible, because we must have thorough skills in shell programming and commands.

If you don’t believe this, have a look at the app playtime.

In your many requests for help you often quote Windows applications that seem to suit your various needs.

Which begs the question … Why don’t you use Windows and those numerous applications to achieve your goals? :wink:

Or use Wine. I use a couple of Windows apps under Wine and I’ve had no issues (well, the window decorations look funky, but that’s cosmetic).


Just a note on the

`ls -R *`

That works only until some file names contain spaces or other characters that the shell might interpret.

Here is an example.
First we create a directory, so we don’t mess up the current one, and change into it if that succeeds.

```
mkdir temp && cd "$_"
```

Create some empty files using touch, since it was mentioned in the other post.

```
touch one.mp4 two.avi three.mkv four.mp3 'five six.flv' 'seven' 'nine ten.jpg'
```

Create another directory and change into it if that succeeds.

```
mkdir OtherDirectory && cd "$_"
```

Create some additional empty files.

```
touch eleven.mp4 twelve.mp3 'thirteen fourteen.mkv' 'fifteen sixteen.avi'
```

Go back to the upper/parent directory.

```
cd -
```

First, we try the `ls -R *` approach:

```
for i in `ls -R *`; do printf '<%s>\n' "$i"; done
```

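Now we get more "files" than expected: the unquoted output of ls is split on whitespace, so every name containing a space turns into several items (and ls -R also prints directory headings). Quoting "$i" does not help, because the splitting already happens in the `ls` substitution.
This is the same issue as in the xargs question from the other post, which parses zypper’s output.
The angle brackets in printf are only there to make any spaces visible; you can replace them with anything you like.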

My suggestion: since this is openSUSE, whose default shell is bash, you can enable the bash shell option globstar.

```
shopt -s globstar
for i in **; do
  printf '<%s>\n' "$i"
done
```

```
<five six.flv>
<nine ten.jpg>
<OtherDirectory/fifteen sixteen.avi>
<OtherDirectory/thirteen fourteen.mkv>
```

Replace the blah…blah… part with your code. The `continue` part just skips directories.

```
for i in **; do
  [[ -d $i ]] && continue
  blah...blah..blah "$i"
done
```

Note the use of the double `**` instead of a single `*`.
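Filling in the globstar loop for the file-count side of the question: a sketch that counts files per sub-directory, so that `diff` on the two outputs shows which folders lost files. It assumes bash 4 or later (associative arrays); `count_per_dir` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch: count the files in each sub-directory with a globstar loop, so
# two trees can be compared folder by folder. Needs bash 4+ (associative
# arrays); runs in a subshell so shopt and cd do not leak out.
count_per_dir() {  # usage: count_per_dir ROOT
  (
    shopt -s globstar nullglob dotglob
    cd -- "$1" || exit 1
    declare -A count=()
    for i in **; do
      [[ -d $i ]] && continue                  # skip directories
      [[ $i == */* ]] && dir=${i%/*} || dir=.  # parent folder of the file
      count[$dir]=$(( ${count[$dir]:-0} + 1 ))
    done
    for dir in "${!count[@]}"; do
      printf '%s\t%s\n' "${count[$dir]}" "$dir"
    done | LC_ALL=C sort -t $'\t' -k2
  )
}
```

Usage: `diff <(count_per_dir /path/to/source) <(count_per_dir /path/to/destination)`. Folders holding jpg or txt files that were not transcoded will differ as well.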

Or use find for portability, since find is recursive by default (note, though, that `-printf` is a GNU find extension). This example only prints the files, including their pathnames; `-type f` makes find skip directories.

```
find . -type f -printf '<%p>\n'
```

```
<./OtherDirectory/fifteen sixteen.avi>
<./OtherDirectory/thirteen fourteen.mkv>
<./nine ten.jpg>
<./five six.flv>
```

Replace the blah…blah part of course.

```
find . -type f -exec blah...blah...blah... {} +
```
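One way to fill in the blah...blah part: `{} +` hands file names to the command in batches, so a small inline bash script that loops over "$@" can process them. This sketch prints each file's duration, assuming ffprobe is installed; `list_durations` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Sketch of a filled-in "blah...blah" for find -exec ... {} +: find passes
# file names to bash in batches and the inline script loops over "$@".
# Assumes ffprobe (from FFmpeg) is installed; unreadable files show N/A.
list_durations() {  # usage: list_durations DIR
  find "$1" -type f -exec bash -c '
    for f in "$@"; do
      d=$(ffprobe -v error -show_entries format=duration \
            -of default=noprint_wrappers=1:nokey=1 "$f" </dev/null 2>/dev/null)
      printf "%s\t%s\n" "${d:-N/A}" "$f"
    done
  ' bash {} +
}
```

For example, `list_durations . > durations.txt` writes one "seconds, tab, path" line per file, spaces in names included.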