Executable running slower on opensuse 12.3

I have a small piece of C++ code that runs substantially slower on openSUSE 12.3 than on openSUSE 12.2, on the same computer with an Intel i7-3820 processor. The code does math calculations.

If I compile the code on another computer running openSUSE 12.2 and bring the executable to the i7-3820 computer, it also runs almost twice as slowly.

On the other hand, when I compile the executable on the i7-3820 computer and take it to the 12.2 computer, it runs at normal speed.

I compile with -O2 and have tried a few processor-specific options; all yield similar results.

It looks like the libraries supplied with opensuse 12.3 do not take full advantage of the i7-3820.

Has anybody compared speeds of self-compiled executables between 12.2 and 12.3?

Are both (12.2/12.3) using the same architecture (32/64)?

Yes.

I have two computers:

A: based on i7-3820. Earlier ran 12.2 64 bit, now running 12.3 64 bit. It is between these OSes that I noticed the slowdown.

B: based on some other i7. Runs openSUSE 12.2 64 bit. No slowdowns, no matter where the executable is compiled: on A or B.

On a small 64 bit laptop which I also tested, there is no difference between speeds in old opensuse 12.2 and new opensuse 12.3 that I installed recently.

I don’t have enough information to give you a good answer. I can say that anything I have compiled has no problems on 12.3. Though you might try also using an alternate compiler such as clang for comparison.

On 2013-03-25 05:46, ZStefan wrote:
> Has anybody compared speeds of self-compiled executables between 12.2
> and 12.3?

No, but if you can produce sample code to test with, I could try.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)

Hi All,

I have noticed the exact same thing on my machine. 12.3 runs g++ and Qt code (also g++ 4.7) 2-3 times slower than previous versions of OpenSuSE. I tested 11.3, 12.1 and 12.3; it only runs slowly on 12.3. I tested single- and multi-threaded code; the slowdown on 12.3 is the same for both. Apart from the slow execution of g++-compiled code, everything seems to be working perfectly, but I guess I wouldn’t really notice a factor of 3 slower execution speed in office tasks anyway. (However, numerical simulations that run for hours to days are very sensitive to a factor of 2-3…)
I see this effect if I install 12.3 directly on the PC, and I can even reproduce it if I install it in VMPlayer under Linux or Windows 7 (sorry to mention that ugly word on this forum :stuck_out_tongue: but I just had to try it out). I tested this on a somewhat older Intel i7 860 and on an Intel Xeon E3-1270v2 CPU. If I install OpenSUSE 12.1 and 12.3 in two otherwise identical virtual machines on the same PC, I see substantially slower code execution on the 12.3 VM. I can execute the same binary (i.e. without recompiling the code) on both machines and I still see the large difference in execution speed.

As ZStefan mentioned, I don’t see any improvements with optimized compiler settings: if I instruct g++ to optimize only for i686 (plus -O2) I still see a 2-3 times slower execution on OpenSuSE 12.3 compared to older OpenSuSE versions. Everything else (such as prime95) seems to run as fast as expected for each scenario I tested.

I also upgraded to the latest stable kernel version (hence the experiments in the virtual machine), with no improvement, so I’m a bit at a loss here, but I’m happy to try out any good suggestions. Initially I suspected some issue with how the kernel handles the CPU’s internal cache, but I can confirm that the cache latency seems to be exactly what it is supposed to be based on Intel’s spec sheet.

I hope we can get this sorted out soon!

Best,
Tom

I know nothing about C++ benchmarks, comparisons of compiler speeds, or the execution speed of programs doing mathematical calculations. I do note that on 3 openSUSE installs (on different hardware) and 2 liveCD boots (also on different hardware) the subjective ‘feel’ is a faster desktop than 12.1 and faster than 12.2. NO applications doing mathematical calculations were involved in that assessment (as I have no such apps that I nominally use).

I see no mention of whether kernel-desktop or kernel-default was chosen by those experiencing the reported significant slowdown. My limited understanding is that those different kernel releases are optimised for different functionality. Was that factor investigated by those experiencing the slowdown?

Again, to emphasize, this is not a subject to which I can offer solutions. I can only provide subjective non-mathematical observations and ask questions on kernel selection.

On Mon, 25 Mar 2013 16:56:01 +0000, trs123 wrote:

> If I install OpenSUSE 12.1 and 12.3 in two otherwise
> identical virtual machines on the same PC I see substantially slower
> code execution on the 12.3 VM.

Well, a couple of things:

  1. VMs aren’t a great place to do performance testing, because there are
    a lot of factors that figure into a VM’s performance.

  2. “substantially slower code execution” is actually pretty meaningless.
    Execution times and code samples that demonstrate the problem are what is
    useful when trying to figure this out - ie, hard data, not vague
    anecdotes. :slight_smile:
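
As an illustration of what such hard data could look like, here is a minimal sketch of a timing harness, assuming a C++11 compiler (for example `g++ -std=c++11 -O2`). The names `time_runs` and `median` are made up for this example and do not come from anyone's code in this thread. It runs a workload several times and keeps every measurement, so that the spread between runs is visible next to the typical value.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: run `work` `repeats` times and return each run's
// wall-clock duration in seconds, sorted in ascending order.
template <typename Work>
std::vector<double> time_runs(Work work, int repeats)
{
    using clock = std::chrono::steady_clock;  // monotonic clock
    std::vector<double> seconds;
    seconds.reserve(repeats);
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        work();
        const auto stop = clock::now();
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    std::sort(seconds.begin(), seconds.end());
    return seconds;
}

// Median of an already sorted, non-empty vector.
double median(const std::vector<double>& sorted)
{
    const std::size_t n = sorted.size();
    return (n % 2 == 1) ? sorted[n / 2]
                        : 0.5 * (sorted[n / 2 - 1] + sorted[n / 2]);
}
```

Reporting the fastest run (`front()`), the median and the slowest run (`back()`) for the same binary on each release would give numbers that others can compare directly.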

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

Hi Jim,

Thanks for your comments. As mentioned in my previous post, I see pretty much the same slowdown in the VMs as on ‘real’ PCs (oops, another vague statement! So here is my unofficial definition of ‘pretty much’: it refers to <10% difference; and by ‘substantial’ slowdown I mean a slowdown by a factor of ~2).

I fully agree that VMs are not great for performance testing, but to my surprise they are pretty close to the real thing. In fact, what concerns me the most is that the code on SuSE 12.1 in a VM on an older CPU runs >100% faster per MHz than on OpenSuSE 12.3 on a real PC!! Here is some of the data (the numbers are iterations per second for a large numerical simulation; larger values are better):

**Code 1, single threaded:**
System 1: OpenSuSE 12.3 on an Intel Xeon E3-1270v2 (‘real PC’): 230 iterations / second. (instead of >730 i/s expected on this platform) Code compiled with GNU g++ 4.7.2 20130108 on this machine with -O2 and only 686 optimizations to keep things simple.
System 2: OpenSuSE 12.1 on VMPlayer on Intel i7 860 (‘virtual box’): 650 iterations / second. Code was not recompiled (same binary as above). Recompiling with g++ that came with OpenSuSE 12.1 doesn’t change the numbers more than a few (i.e. single digit) %, which is simply noise.

So **the identical binary runs on OpenSUSE 12.1 in a VM on a slower CPU** (the i7 860 has a turbo clock of 3.45 GHz and is two generations older than the Xeon E3-1270v2 with a turbo clock rate of 3.9 GHz) 2.83 times faster than on a real PC with OpenSUSE 12.3 (again, identical binary!), which is not a VM! In my opinion the VM should only make things worse. I recall that this ran much faster than 230 i/s on a much older Core 2 Quad 2.4 GHz as well, but I don’t remember the exact number (~500+). I can rerun this test if that would change the outcome of this discussion.

Code 2: multi-threaded (the code does essentially the same as the above, but with lots of hand-optimized code, so the numbers are directly comparable)
System 1: OpenSuSE 12.3 on an Intel Xeon E3-1270v2 (‘real PC’): 920 iterations / second. (instead of the >1890 i/s expected on this platform) Code was compiled using qtcreator GUI (V2.6.2 based on Qt 4.8.4) using g++.
System 2: OpenSuSE 12.1 on VMPlayer on Intel i7 860 (‘virtual box’): 1670 iterations / second. Same binary as above.
System 3: OpenSuSE 12.1 on Intel Core 2 quad Q9650 (‘real PC’): ~1500 iterations / second. Compiled with qtcreator V 4.8.1 using g++. Note that this 5+ year old PC with SuSE 12.1 delivers much higher throughput than the 6-month-old E3-1270v2 architecture on SuSE 12.3.
System 4: Windows 7 in VMPlayer on Intel Xeon E3-1270v2: 1130 iterations / second. Compiled with qtcreator. QT V4.8.1, compiler is minGW32 (I generally notice ~40% lower performance with this compiler compared to g++. But what is puzzling to me is that this VM actually runs on System 1 (i.e. in a VM under OpenSuSE 12.3!!) and it still performs 22% better despite being in a VM and despite being compiled with a possibly less tweaked compiler compared to current g++ versions).
System 5: Windows 7 on Intel i7 860 (‘real’ PC) compiled with minGW32: **950 iterations / second.** Again, I attribute this lower performance to the MinGW32 compiler. (Note that the VM vs. real PC performance per MHz is again the same to within a single-digit percentage. That’s actually quite a surprise to me and it speaks for the implementation of VMPlayer. Note that this is a numerics-intensive task and does not have any disk IO. So this is probably not your typical office scenario.)

In summary, the same binary delivers about a factor of 2-3 less throughput under OpenSuSE 12.3 than I would expect when scaling from an older platform. I’m not sure what other numbers I could post here (beyond the factor of 2-3 I mentioned in my post above) to be more scientific and less ‘anecdotal’.

(Note: All systems above are 64 bit OS, but the minGW compiler only generates 32bit code, which is then executed on a 64 windows system.)

So no matter how I try it, OpenSuSE 12.3 always delivers much slower number-crunching performance than the previous versions. Since Systems 1 and 2 use the exact same binary, I don’t see how the compiler could be the culprit. And again: another kernel didn’t change anything (I tried various desktop kernel versions and I never saw an impact on the numerical performance. On OpenSuSE 12.3 I tried the stock desktop kernel and the latest 3.9 desktop flavor; the difference is in the single % range, i.e. just noise).

Posting code-samples is difficult at this point as both simulations are part of active research projects and are several thousand lines of code each.

So there you have it! I’m happy to try any constructive suggestions, at least in the VMs.
Cheers,
Tom

Out of curiosity - do kernel-desktop and kernel-default yield near identical results?

On Mon, 25 Mar 2013 20:36:02 +0000, trs123 wrote:

> Posting code-samples is difficult at this point as both simulations are
> part of active research projects and are several thousand lines of code
> each.

The best thing to do is to try to narrow the problem code down - breaking
it down into smaller chunks to find out what sections experience the most
performance breakdown. There are some tools that might help with this
kind of profiling work - something like GNU Profiler might help isolate
code sections that are not performing well (comparing on the two
different systems), and can help with reporting bugs to the gcc
developers (assuming you’re using gcc to compile).

Breaking the problem down to a smaller reproducible unit is a common
technique for finding and isolating code performance issues (as well as
other performance characteristics, output weirdness, and the like).
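
One hedged way to start narrowing things down without a profiler is to time named sections of the program by hand and compare the totals on both systems. The sketch below assumes a C++11 compiler; `SectionTimes` and `ScopedTimer` are hypothetical helper names for this example, not part of any library or of the code under discussion. (With gprof, the usual route is instead to build with `-pg`, run the program, and inspect the resulting `gmon.out`.)

```cpp
#include <chrono>
#include <map>
#include <string>

// Hypothetical helper: accumulates wall-clock seconds per named section.
class SectionTimes {
public:
    void add(const std::string& name, double seconds) { totals_[name] += seconds; }
    double total(const std::string& name) const {
        const auto it = totals_.find(name);
        return it == totals_.end() ? 0.0 : it->second;
    }
    const std::map<std::string, double>& all() const { return totals_; }
private:
    std::map<std::string, double> totals_;
};

// RAII guard: measures from construction to destruction and records the
// elapsed time under `name` in the given SectionTimes object.
class ScopedTimer {
public:
    ScopedTimer(SectionTimes& sink, const std::string& name)
        : sink_(sink), name_(name), start_(std::chrono::steady_clock::now()) {}
    ~ScopedTimer() {
        const auto stop = std::chrono::steady_clock::now();
        sink_.add(name_, std::chrono::duration<double>(stop - start_).count());
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;
private:
    SectionTimes& sink_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

Wrapping a suspect block as `{ ScopedTimer t(times, "update step"); /* ... */ }` and printing `times.all()` at the end of the run on each release shows which sections account for the difference. The timer calls add overhead of their own, so this suits coarse sections rather than tight inner loops.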

They don’t want to have to debug your code, but rather see an example of
code that causes the strange performance.

That also helps keep your research out of competing hands (if that’s a
concern), because it’s just limiting the code exposure to a similar type
of code path that causes the problem in your code.

Jim



Or just write less buggy software.

On Mon, 25 Mar 2013 22:06:01 +0000, nightwishfan wrote:

> Or just write less buggy software.

Well, with complex code that’s often easier said than done. But I note
as well that gcc’s rewrite in C++ was completed recently, and as the
compiler evolves, different optimizations may be affected by changes in
the compiler code.

But often times simplifying problematic code in order to isolate where an
optimization is failing can result in determining that there is a better
way in the code to do it - profiling the code to get information about
where bottlenecks are can expose problems either in compiler
optimizations or in the code that’s fed to the compiler - and both are
useful. :slight_smile:

Jim



I agree. It is very difficult, demanding and exacting work.

“But I note as well that gcc’s rewrite in C++ was completed recently, and as the compiler evolves, different optimizations may be affected by changes in the compiler code.”

This looks more like deoptimization than optimization. Why was the badly functioning product (perhaps the gcc C or C++ compiler, but the reason may lie somewhere else) released, and why was it picked up and distributed in openSUSE 12.3? Didn’t the authors know that the compiled executable runs twice (!) as slowly?

I can imagine how painful it would be to install the older or newest versions of gcc in opensuse 12.3. It is likely not possible. But if possible, here’s my wish to opensuse Build Service: please make available a rollback or rollforward to better versions of gcc, g++, kernel, glibc or whatever the culprit is.

I have submitted a report on Novell’s bugzilla, the number is 811546. The details on kernel, compiler and execution speeds are there.

With an idle host I did an unscientific test (as in no multiple runs or etc) with openSUSE 12.3 and Debian 7 in virtual machines. Both are stock installs (command line only) with only installing binutils and g++ and their dependencies. Not exactly the most intensive but I compiled this c++ code (/usr/bin/g++ -O2 poisson_serial.cpp) and timed the execution of it (/usr/bin/time ./a.out). The source code is here: http://people.sc.fsu.edu/~jburkardt/cpp_src/poisson_serial/poisson_serial.cpp

You can reproduce my results if you wish.

opensuse 12.3 vm - 3.7.10-1.1-default

execution:
0.00user 0.13system 0:00.14elapsed 94%CPU
(0avgtext+0avgdata 4480maxresident)k
0inputs+0outputs (0major+327minor)pagefaults 0swaps

debian 7 vm - 3.2.0-4-486

execution:
0.00user 0.20system 0:00.21elapsed 97%CPU
(0avgtext+0avgdata 1104maxresident)k
0inputs+0outputs (0major+316minor)pagefaults 0swaps

I should note that I run a demanding C++ PowerPC emulator all the time. If it were a ‘factor of 2’ slower, it would not have any usable performance at all on my weak hardware.

On 2013-03-26 02:16, nightwishfan wrote:
>
> With an idle host I did an unscientific test (as in no multiple runs or
> etc) with openSUSE 12.3 and Debian 7 in virtual machines. Both are stock
> installs (command line only) with only installing binutils and g++ and
> their dependencies. Not exactly the most intensive but I compiled this
> c++ code (/usr/bin/g++ -O2 poisson_serial.cpp) and timed the execution
> of it (/usr/bin/time ./a.out). The source code is here:
> http://tinyurl.com/ckzs9bl
>
> You can reproduce my results if you wish.

I want to, but it runs in 1 ms (varying from 1 to 3 ms) on my 12.1. I’m
not sure that I could see a significant difference if I ran it under
12.3. The code should take half a minute to run so that we can
measure differences…

no? :-?
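
One possible way around a too-short run, sketched here under the assumption of a C++11 compiler, is to repeat the workload until the timed batch lasts long enough to measure. The helper name `seconds_per_call` and the repetition cap are choices made for this example only.

```cpp
#include <chrono>

// Hypothetical helper: doubles the repetition count until one timed batch of
// `work` lasts at least `min_seconds`, then returns the seconds per call.
template <typename Work>
double seconds_per_call(Work work, double min_seconds)
{
    using clock = std::chrono::steady_clock;
    const long max_reps = 1L << 30;  // safety cap in case `work` costs nothing
    long reps = 1;
    for (;;) {
        const auto start = clock::now();
        for (long r = 0; r < reps; ++r) work();
        const double elapsed =
            std::chrono::duration<double>(clock::now() - start).count();
        if (elapsed >= min_seconds || reps >= max_reps) return elapsed / reps;
        reps *= 2;  // batch too short to measure reliably: double and retry
    }
}
```

With `min_seconds` set to something like 30, even a workload that finishes in a millisecond yields a per-call time that can be compared between 12.1, 12.2 and 12.3. The workload has to produce a side effect the compiler cannot remove (for instance a write to a `volatile` variable), otherwise the loop may be optimized away.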


Cheers / Saludos,

Carlos E. R.

I can’t find any decent c++ snippets. Though if this were truly a huge bug I think even this execution would take much longer. Though I didn’t really examine the code much, just looked for something cpu hungry. I would be happy to test a better example. :slight_smile:

Here is a better example. On opensuse 12.2, it takes 2 s. On opensuse 12.3, it takes 4.4 s.


// Version 1.0
//
// Compilation command:
// g++ -Wall -Wextra -O2 check_speed.C

#include <cstdlib>
#include <iostream>
#include <cmath>

using namespace std;

//----------------------------------------------
int main()  // Purpose: measure execution speed.
{
    const int NX = 1000000, NY = 8;
    double    x = -3.3, y = 0.0, z = 9.4;
    int       j = 99;

    for (int i = 0; i < NX; i++)
    {
        x = i + cos(i + z - j);
        for (j = 0; j < NY; j++)
        {
            y  = exp(-fabs(sin(x - i + y)));
            z  = x + y + z - cos(y + i - j);
            z += x - rand();  // rand() is never seeded, so every run sees the same sequence
            if (x + i - j < j + 20) z += y + fabs(cos(x - i));
            else z = sin(z + x) - sin(i - j + z);
            x = cos(x - y + z - i + j / 20);  // j / 20 is integer division (0 for all j < NY)
        }
    }
    cout << "x = " << x << endl;
    return 0;
}
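
Since the loop above mixes `sin`, `cos`, `exp` and `rand`, one way to narrow it down further is to time each math function on its own. In that loop `rand()` is subtracted from `z` just before `sin(z+x)` is evaluated, so some of the trigonometric calls receive arguments of large magnitude; timing small and large arguments separately may therefore be informative. The following is only a sketch under those observations, assuming a C++11 compiler; `seconds_per_eval` and the `call_*` wrappers are names invented for this example.

```cpp
#include <chrono>
#include <cmath>

// Hypothetical helper: average seconds per call of `f` over `calls`
// invocations, with arguments spread over [base, base + 1).
template <typename Fn>
double seconds_per_eval(Fn f, double base, long calls)
{
    using clock = std::chrono::steady_clock;
    volatile double sink = 0.0;  // keeps the compiler from discarding the calls
    const auto start = clock::now();
    for (long k = 0; k < calls; ++k) {
        sink = f(base + static_cast<double>(k) / calls);
    }
    const auto stop = clock::now();
    return std::chrono::duration<double>(stop - start).count() / calls;
}

// Thin wrappers so each math function can be passed as a plain callable.
inline double call_sin(double v) { return std::sin(v); }
inline double call_cos(double v) { return std::cos(v); }
inline double call_exp(double v) { return std::exp(v); }
```

Comparing, for example, `seconds_per_eval(call_sin, 1.0, 10000000)` with `seconds_per_eval(call_sin, 1.0e9, 10000000)`, and `seconds_per_eval(call_exp, -1.0, 10000000)`, on both 12.2 and 12.3 would show whether one particular function or argument range accounts for the 2 s versus 4.4 s difference.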

On 2013-03-26 02:56, nightwishfan wrote:
>
> I can’t find any decent c++ snippets. Though if this were truly a huge
> bug I think even this execution would take much longer. Though I didn’t
> really examine the code much, just looked for something cpu hungry. I
> would be happy to test a better example. :slight_smile:

A benchmarking test suite, perhaps?

NBench? Just had a quick look on Wikipedia. It mentions Linux, but I see
no source code or binary link. Hum…


http://www.tux.org/~mayer/linux/bmark.html

I tried to build it per instructions, failed:

>
> cer@Telcontar:~/bin/test/nbench-byte-2.2.3> make
> gcc -s -static -Wall -O3    pointer.c   -o pointer
> /usr/lib64/gcc/x86_64-suse-linux/4.6/../../../../x86_64-suse-linux/bin/ld: cannot find -lc
> collect2: ld returned 1 exit status
> make: *** [pointer] Error 1
> cer@Telcontar:~/bin/test/nbench-byte-2.2.3>


I’m off to bed.


Cheers / Saludos,

Carlos E. R.