Update and recompile an installed system

I have written an application to find, compare and flag or delete duplicate files. This application is used to identify and clean out files with identical contents from about 2 million photos and video clips stored in about 200 directories. Total data is about 4 TB on a 5 disk RAID 5 array with Reiser file system. The application is written in “C” and compiled with -mtune=native -march=native -O3 and works perfectly.

It takes about 8 hours for the application to complete when run on Suse 11.3 and about 2.5 hours to complete when run on Gentoo, compiled with the same compiler flags as the application. The systems are installed as dual boot on a 64 bit quad Phenom II 3000 processor.

In addition, there is a monster of a spreadsheet that takes several minutes to re-calculate on data changes.

My question:
1.) Is there a plugin for Yast or a script that will download the source RPMs, compile with user set compiler flags, link and replace the installed executables on a fully installed and configured system?

2.) Is there a plugin for Yast or a script that will download the source RPM for updates, compile with user set compiler flags, link and replace the installed executables?

The target is to have a Suse / Gnome system optimised for this hardware.

I have been installing and using Suse since 1995. I am too stupid to get Gnome to work with Gentoo.

All information and recommendations will be highly appreciated.

Hans

On 08/07/2011 08:56 AM, Hans Linux wrote:
> Suse 11.3

there is no “Suse 11.3” so i wonder if you ask about openSUSE 11.3 or
SUSE Linux Enterprise Server (SLES) 11 SP3?

any of the following in a terminal will turn up the correct nomenclature
so we all know what you meant:


cat /etc/SuSE-release
cat /etc/issue
lsb_release -sd |cut -f2 -d ""\"


DD
openSUSE®, the “German Engineered Automobiles” of operating systems!

What do you mean by “identical content”? If it means “identical files”, then checksumming the files which have an identical size could be done in a simple bash script. Here’s what I’m using (to just find the duplicates):

#! /bin/bash
# recursively find duplicate files (same size and same md5sum), optionally with the given extension

dir=$1
ext=$2

if [ "x$*" == "x" ] ; then
	exec echo "syntax : $0 directory [extension]"
elif [ ! -d $dir ] ; then
	exec echo "directory $dir not found"
elif [ "x$2" == "x" ] ; then
	allfiles=(`find $1 -type f -ls | awk '{ print $7"@"$11 }' | sort -n`)
else
	allfiles=(`find $1 -type f -name "*.$2" -ls | awk '{ print $7"@"$11 }' | sort -n`)
fi

i=0
j=0

while [ $i -lt  ${#allfiles[*]}  ] ; do
	j=$(($i+1))
	if [ $j -lt ${#allfiles[*]} ] ; then
		e1=${allfiles[$i]} ; e2=${allfiles[$j]}
		f1=${e1##*@} ; f2=${e2##*@}
		s1=${e1%%@*} ; s2=${e2%%@*}
		if [ $s1 -eq $s2 ] ; then
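			m1=`md5sum $f1 | awk '{ print $1}'`
			m2=`md5sum $f2 | awk '{ print $1}'`
			# report the pair only when the checksums match as well as the sizes
			if [ "$m1" == "$m2" ] ; then
				echo "$f1 = $f2"
			fi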
		fi
	fi
	let i++
done

I don’t know if it’s relevant though … and don’t know how/if it would handle 2 million files (probably not).
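A variant of the script above that copes with spaces in file names, and only checksums files whose size occurs more than once, might look like the sketch below. It assumes the GNU versions of find, xargs and uniq (for -printf, -d and --all-repeated), the function name find_dupes is made up for illustration, and file names containing newlines or backslashes are not handled.

```shell
#!/bin/bash
# Sketch only: list groups of files that share both size and md5sum.
# Assumes GNU find, xargs, uniq and md5sum; find_dupes is a hypothetical name.

find_dupes() {
	local dir=$1
	# 1. find prints "size<TAB>path" for every regular file.
	# 2. awk keeps only the lines whose size is shared by at least two files.
	# 3. cut drops the size column, xargs hands the paths to md5sum.
	# 4. sort puts equal checksums next to each other and uniq prints only
	#    the repeated ones, one blank-line-separated group per checksum.
	find "$dir" -type f -printf '%s\t%p\n' |
	awk -F'\t' '
		{ count[$1]++; line[NR] = $0; size[NR] = $1 }
		END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print line[i] }
	' |
	cut -f2- |
	xargs -r -d '\n' md5sum -- |
	sort |
	uniq -w32 --all-repeated=separate
}
```

It would be called as `find_dupes /path/to/photos`. The awk step holds one line per file in memory, so for 2 million files this is untested guesswork rather than a measured result.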

If however you compare the photos, then I’m sure you have written a fine program.

Forget about Gentoo and use ArchLinux.

The name of the Operating System is “openSUSE 11.3 (x86_64)” and will change in the future to “openSUSE 11.4 (x86_64)” or “openSUSE 12.? (x86_64)”.

Hans

I mean, same or different file name, same file size and identical or near identical contents.

Creating and comparing checksums is too slow.

Hans
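For byte-identical files, one way to avoid checksums altogether is to compare two same-size candidates directly with cmp, which can stop reading at the first byte that differs. The sketch below shows only that idea and does not address near-identical contents. The helper name same_content is invented here, and stat -c is the GNU coreutils form.

```shell
#!/bin/bash
# Sketch only: decide whether two files are byte-for-byte identical
# without computing checksums. same_content is a hypothetical helper name.

same_content() {
	local a=$1 b=$2
	# Files of different size can never be identical, so no data is read.
	[ "$(stat -c %s -- "$a")" -eq "$(stat -c %s -- "$b")" ] || return 1
	# cmp -s prints nothing and returns 0 only when the contents are identical.
	cmp -s -- "$a" "$b"
}
```

A call such as `if same_content one.jpg two.jpg ; then echo "duplicate" ; fi` then reads both files in full only when they really are identical.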

Hans Linux wrote:
> I have written an application to find, compare and flag or delete
> duplicate files. This application is used to identify and clean out
> files with identical contents from about 2 million photos and video
> clips stored in about 200 directories. Total data is about 4 TB on a 5
> disk RAID 5 array with Reiser file system. The application is written
> in “C” and compiled with -mtune=native -march=native -O3 and works
> perfectly.

Not answering your question, I’m afraid …

faster_dupemerge
<http://www.hungrycats.org/~zblaxell/projects/dupemerge/dupemerge.html>

> It takes about 8 hours for the application to complete when run on
> Suse 11.3 and about 2.5 hours to complete when run on Gentoo, compiled
> with the same compiler flags as the application. The systems are
> installed as dual boot on a 64 bit quad Phenom II 3000 processor.

I’m not sure how the questions you ask below affect this? Presumably you
are already compiling your application on the opensuse box?

Have you profiled your application to see where the time is going?

> In addition, there is a monster of a spreadsheet that takes several
> minutes to re-calculate on data changes.
>
> My question:
> 1.) Is there a plugin for Yast or a script that will download the
> source RPMs, compile with user set compiler flags, link and replace the
> installed executables on a fully installed and configured system?

I don’t know. I guess the opensuse preferred answer is to upload the
source to the build service and compile it there.

The application has been compiled and profiled on the Suse box and on the same box running Gentoo compiled with CFLAGS=-O2. There are no significant differences in performance.

With a full recompile of the Gentoo box and the application using CFLAGS="-O3 -mtune=native -march=native", the application runs about 3.5 times faster than on the Suse box or on the Gentoo box compiled with CFLAGS=-O2.

The cause seems to be the optimisation of the kernel, I/O system, raid system, memory access and unknowns.

My present fix is to run the fully optimised Gentoo and the application as a virtual machine with direct access to the hard drive array. It’s only slightly slower than with Gentoo and the application running directly on the hardware.