Executable running slower on opensuse 12.3

A lot of interesting observations on my end. First, my CPU makes a strange squeaking noise while this code runs. The code seems to run better either inside a virtual environment or, more likely, when using 32-bit math (my host is 64-bit). There is little difference between binaries compiled with clang and with gcc. Also, with -ffast-math it actually is “fast” on openSUSE. It uses a lot more resident memory on openSUSE. I compiled with the command given in the file (g++ -Wall -Wextra -O2 check_speed.cpp) and took the best of three runs. Edit: I wonder whether this code happens to exploit a bug (in libc, perhaps)?

Debian 7 - 3.8-trunk-amd64 - g++ (Debian 4.7.2-5) 4.7.2

3.95user 0.00system 0:03.96elapsed 99%CPU (0avgtext+0avgdata 1288maxresident)k
0inputs+0outputs (0major+377minor)pagefaults 0swaps

Debian 7 - 3.8-trunk-amd64 - clang version 3.0-6.1

3.92user 0.00system 0:03.92elapsed 99%CPU (0avgtext+0avgdata 1284maxresident)k
0inputs+0outputs (0major+376minor)pagefaults 0swaps

Debian 7 VM - 3.2.0-4-486 - g++ (Debian 4.7.2-5) 4.7.2

3.34user 0.00system 0:03.35elapsed 99%CPU (0avgtext+0avgdata 1008maxresident)k
0inputs+0outputs (0major+291minor)pagefaults 0swaps

Debian 7 VM - 3.2.0-4-486 - clang version 3.0-6.1

3.32user 0.00system 0:03.33elapsed 99%CPU (0avgtext+0avgdata 1004maxresident)k
0inputs+0outputs (0major+292minor)pagefaults 0swaps

openSUSE 12.3 VM - 3.7.10-1.1-default - g++ 4.7.2

16.19user 0.00system 0:16.25elapsed 99%CPU (0avgtext+0avgdata 4144maxresident)k
0inputs+0outputs (0major+306minor)pagefaults 0swaps

openSUSE 12.3 VM - 3.7.10-1.1-default - clang version 3.2

16.54user 0.01system 0:16.60elapsed 99%CPU (0avgtext+0avgdata 4144maxresident)k
0inputs+0outputs (0major+305minor)pagefaults 0swaps

openSUSE 12.3 VM - 3.7.10-1.1-default - clang version 3.2
command: clang++ -Wall -Wextra -O3 -ffast-math check_speed.cpp

2.80user 0.00system 0:02.82elapsed 99%CPU (0avgtext+0avgdata 3856maxresident)k
0inputs+0outputs (0major+288minor)pagefaults 0swaps

Lol - of course you know where the squeak noise comes from… No, it is not the sound of many tired electrons needed to be moved around to yield those cos()-s and exp()-s.

Still, the times you get don’t answer the main question: does the code run slower on opensuse 12.3 than on opensuse 12.2, and if so, by how much and why?

And VM, Debian, clang and -ffast-math introduce four new degrees of freedom, which only complicate the 12.3 vs. 12.2 comparison.

I don’t use -ffast-math. But it is interesting how miserably slow the VM with opensuse 12.3 and kernel 3.7.10-1.1-default is. My only guess is that the compiler cannot optimize the code well there.

I am complicating nothing. I show the results with openSUSE 12.3 and something not opensuse 12.3. I do not feel like downloading openSUSE 12.2, I only have debian and opensuse 12.3 images available. You can do your own comparison if you are interested in adding the data to the bug report. I did this to learn and have fun.

Either way, I am still quite sure the overall performance of openSUSE 12.3 is not poor. I think the slowdown is specific to this code somehow. If I had spent more than 5 minutes of my life in C++ code, I would know why. :slight_smile:

What is the processor and what is the maximum operation frequency?

One way to get the answers is to look at file /proc/cpuinfo while the code is being executed.

My computers have Intel Core i7 (two similar varieties of the processor when compared in single execution thread mode), and CPU frequency around 3.5 GHz, which is boosted under load to about 4 GHz.

On 2013-03-26 04:16, ZStefan wrote:
>> I would be happy to test a better example. :slight_smile:
> Here is a better example. On opensuse 12.2, it takes 2 s. On opensuse
> 12.3, it takes 4.4 s.


cer@Telcontar:~/bin/test> time a.out
z=34 a0=12 a1=34

real    0m0.001s
user    0m0.001s
sys     0m0.000s
cer@Telcontar:~/bin/test> time a.out
z=34 a0=12 a1=34

real    0m0.001s
user    0m0.000s
sys     0m0.000s
cer@Telcontar:~/bin/test> time a.out
z=34 a0=12 a1=34

real    0m0.001s
user    0m0.000s
sys     0m0.000s
cer@Telcontar:~/bin/test>


cpuinfo:

model name      : Intel(R) Core(TM)2 Quad CPU    Q9550  @ 2.83GHz


It runs in zero seconds here, on 12.1. Not enough time to make
measurements. How can it take 2…4 seconds on your system? :-?


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

Not sure what you did Carlos, but on my i3 it gives


martinh@ganymed:~/tmp> time ./a.out
x = 0.998523

real    0m7.825s
user    0m7.817s
sys     0m0.000s
martinh@ganymed:~/tmp>

so I can imagine it takes 2 sec or so on a faster cpu.
Just checked on the i7


martinh@sirius:~/scratch> time ./a.out
x = 0.998523

real    0m4.849s
user    0m4.832s
sys     0m0.001s
martinh@sirius:~/scratch>

Now I have a problem comparing, as I updated the systems recently
and have no 12.2 on these machines.


PC: oS 12.3 x86_64 | i7-2600@3.40GHz | 16GB | KDE 4.10.0 | GTX 650 Ti
ThinkPad E320: oS 12.3 x86_64 | i3@2.30GHz | 8GB | KDE 4.10.0 | HD 3000
HannsBook: oS 12.3 x86_64 | SU4100@1.3GHz | 2GB | KDE 4.10.0 | GMA4500

A bit of speculation.
Since most of the time is spent in the cos and exp calculations if it
was really faster on 12.2 it tends to point to glibc as the problem as
this contains the libm.



Am 26.03.2013 17:23, schrieb Martin Helm:
> A bit of speculation.
> Since most of the time is spent in the cos and exp calculations if it
> was really faster on 12.2 it tends to point to glibc as the problem as
> this contains the libm.
>
Hm, maybe I am too tired now, but I do not see a double cos in cmath
only a float and a long double cos ???



Am 26.03.2013 17:32, schrieb Martin Helm:
> Am 26.03.2013 17:23, schrieb Martin Helm:
>> A bit of speculation.
>> Since most of the time is spent in the cos and exp calculations if it
>> was really faster on 12.2 it tends to point to glibc as the problem as
>> this contains the libm.
>>
> Hm, maybe I am too tired now, but I do not see a double cos in cmath
> only a float and a long double cos ???
>
That was a red herring, but I see the same program performing faster
on a crappy old CPU with 12.2 64-bit than on my i7 with 12.3 64-bit:


michaela@michaela-pc:~/scratch> time ./a.out
x = 0.998523

real    0m3.499s
user    0m3.495s
sys     0m0.001s
michaela@michaela-pc:~/scratch>

Intel(R) Core™2 Quad CPU Q8300 @ 2.50GHz

What’s going on here?



Did one more test with the ‘Code 1’ I described in [Post #9](https://forums.opensuse.org/english/other-forums/development/programming-scripting/484988-executable-running-slower-opensuse-12-3-a.html#post2540874) of this thread. I compiled it in a VM running OpenSuSE 11.3 64-bit, kernel 2.6.34.7-0.5-desktop, with g++ 4.5.0 20100604. In this VM the binary runs at 940 iterations per second. If I run the very same binary (no recompilation!) on OpenSuSE 12.3 64-bit (not a VM but a ‘real PC’) with the stock kernel, I get only 230 iterations per second (same as previously in Post #9). The only compiler flag I used is -O2, nothing else. The CPU is an Intel Xeon E3-1270v2 (same core as the i7 3770…)

So since this code was compiled with an older version of gcc (V. 4.5.0 instead of the one that ships with OS12.3, which is 4.7.x) I assume the problem lies within a library that was supplied with the 12.3 distro, rather than the compiler itself.

Cheers,
Tom

Am 26.03.2013 18:46, schrieb trs123:
> So since this code was compiled with an older version of gcc (V. 4.5.0
> instead of the one that ships with OS12.3, which is 4.7.x) I assume the
> problem lies within a library that was supplied with the 12.3 distro,
> rather than the compiler itself.

It’s the libm, I copied over the libm from 12.2


LD_PRELOAD=./libm-2.15.so time ./a.out
x = 0.998523

3.32user
0.00system
0:03.32elapsed

compared to


real    0m7.838s
user    0m7.828s
sys     0m0.002s

without the LD_PRELOAD using the 2.17 system library.



There are no float or long double precision variables in the speed test program - all calculations are carried out with doubles and integers.

But it shouldn’t matter in speed comparison. There is a major slowdown in opensuse 12.3.

Are we speaking about this library in /usr/lib64:

libm.so -> /lib64/libm.so.6

I will try to temporarily replace it in 12.3 with one from 12.2, but I am afraid the OS will fail. If it fails, I will boot from Live USB and restore.

Yay! I can confirm this! I copied a very old libm.so from OpenSuSE 11.3 (64-bit version) over the /lib64/libm-2.17.so that was provided with OpenSuSE 12.3. The speedup is tremendous:

Code 1: went from 230 i/s (see post #9) to 940 i/s (factor of 4.09 improvement!)
Code 2: went from 920 i/s (see post #9) to 2450 i/s (factor of 2.66 improvement!)

Glad it’s not my ‘buggy code’ as some posters mentioned :slight_smile:

I hope there will be a corrected version supplied through the online update soon so everyone will benefit.

Best,
Tom

This is a great result!!!

Before the new library emerges, is it safe to install (maybe force-install?) the packages from opensuse 12.2? Is it possible at all?

Which packages shall be installed from old opensuse?

Unless you submit a bug report (bugzilla.novell.com), you may not get a corrected library.
FAQ: openSUSE:Submitting bug reports (openSUSE Wiki)

If not done already, can someone please write a bug report on this, and note the workaround in the bug report?

I am curious: do you get exactly the same numerical outcome, assuming a comparison is meaningful?

A bug report was submitted about a day ago. It refers to this forum thread.

https://bugzilla.novell.com/show_bug.cgi?id=811546

Am 26.03.2013 20:16, schrieb trs123:
> Glad it’s not my ‘buggy code’ as some posters mentioned :slight_smile:
>
> I hope there will be a corrected version supplied through the online
> update soon so everyone will benefit.
>
You should file a bug report about this problem otherwise nobody will
fix it.
I can also do it later tonight.



Yes, exactly the same numerical outcome.