Strange 64 bit binaries slowdown vs 32 bit binaries on the same 11.4 64bit system

I have an ugly performance issue on a 64 bit system: the 32 bit binary runs 2x faster than the 64 bit “native” version on the same system.

I was doing some benchmarks of my own and this puzzles me.
The system is a fully updated 11.4.

I was initially testing two Java programs on different machines, an 11.3 32bit on a Phenom II X4 2.6 GHz and my old notebook with 11.4, and found the same performance on both.

Then I used a couple of mini-benchmark test programs in C, just a very inefficient but CPU intensive prime number counter.

The exact same source in pure ASCII C, compiled without any add-on libraries, produces with -m32 a program nearly 2x faster than the “native” 64 bit compiled one.

I first copied the 32 bit version from my 11.3 machine, then built it on the 11.4 machine; the results are the same. My notebook has a Turion 64 X2 2 GHz CPU, way older, being a K8.

Both machines use the standard -desktop kernel of their corresponding 11.x versions.

A BIG gcc bug?

A kernel bug?

Libraries bug?.. Even statically compiled versions perform the same scale factor.

The implication is that all 64 bit systems out there might be outperformed by the equivalent 32bit distro.

Any ideas?


#include <stdio.h>
#include <stdlib.h>

inline int is_prime2(long num){
        long den=1;
        do{
                den++;
        }while( num%den != 0);

        if(den == num){
                return 1;
        }else{
                return 0;
        }
}

int main(int argc,char* argv[]){
        long i=0;
        int j=0;
        for(i=2;i<=200000;i++){
                if (is_prime2(i) ){
                        //printf("%li / ",i);
                        j++;
                }
        }
        printf("Num Primos: %i \n",j);
        return 0;
}

Then compiled with :


gcc -Wall -m32 -O2 prim.c -o prim32
gcc -Wall -m64 -O2 prim.c -o prim64

And tested with:

notebook:~/src>time ./prim32 ; time ./prim64

Num Primos: 17984 

real    0m36.638s
user    0m36.600s
sys     0m0.004s
Num Primos: 17984 

real    1m4.446s
user    1m4.412s
sys     0m0.002s

If anyone can reproduce these results please let me know.

PS: I know that 64 bit systems have bigger <long><double>,etc. But that means more memory use; it should not affect speed per se.

Your test program is doing a lot of arithmetical divisions. That is typically the slowest arithmetical operation, and the one least able to be speeded up. It is likely that division is significantly slower on longer words.

On 2011-03-31 07:36, rcornet wrote:

> Libraries bug?.. Even statically compiled versions perform the same
> scale factor.

A 64 bit program may use double the memory, if you are not careful.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

On 03/31/2011 07:36 AM, rcornet wrote:
>
> Any ideas?
>

for me, two scoops of ice cream is (almost) always better than one…

lots of people believe that 64 bit is always faster than 32…
but that they believe it does not make it fact…

they forget that while 64 is twice 32, what we are talking about is
the width of the ‘word’ transmitted…think of a race with two trucks
moving at the same speed, with the ability to carry the same number of
words, but one carries words twice the size of the other–the big word
carrier gets there at the same time, with the same number of words,
but twice as many letters…so, which is faster? the answer will depend
on how big are the words you need, how you define ‘fast’ and how
you measure the relative merits of work performed in terms of time…

that is, for example if you only need one truck’s worth of small
‘words’ delivered, then spending the extra money to buy the fuel
needed to send those short words in a half empty truck can’t really be
called “faster”…just more underused potential…

it’s like having a quad core but only enough work to keep one warm…

and, by the way: if i get two scoops of a low quality, too long in the
freezer, or cheaply/artificially flavored ice cream i will always wish
i had only one.

ymmv…when 32 bit chips were made and MS was selling 16 bit software
they began hyping the coming out of Redmond of the software miracle of
32 bit computing (though 32 bit was old hat in the server
market)…and, and they moved the desktop earth off of Win3.1 and onto
Win95 (and moved a LOT of money into their pockets)…

when 64 came around they did the same…(though lots of folks were
running 64 desktops years before Vista limped out)

and, if they are still operating when 128 is available you can expect
the same again–you may quote me.


CAVEAT: http://is.gd/bpoMD
Tried LibreOffice? Do that and help at http://is.gd/dZ9j2W
[NNTP via openSUSE 11.3 + KDE4.5.5 + Thunderbird3.1.8]

On 2011-03-31 07:36, rcornet wrote:

> Then compiled with :
>
>
> Code:
> --------------------
>
> gcc -Wall -m32 -O2 prim.c -o prim32

Interestingly, I can’t compile this:

Code:

cer@Telcontar:~/bin/C> gcc -Wall -m32 primos.c -o primos32
/usr/lib64/gcc/x86_64-suse-linux/4.4/…/…/…/…/x86_64-suse-linux/bin/ld:
skipping incompatible /usr/lib64/gcc/x86_64-suse-linux/4.4/libgcc.a when
searching for -lgcc
/usr/lib64/gcc/x86_64-suse-linux/4.4/…/…/…/…/x86_64-suse-linux/bin/ld:
cannot find -lgcc
collect2: ld returned 1 exit status

What is it missing, libgcc? I have “libgcc44”. Perhaps it needs a symlink?

Why is it looking in /usr/lib64/gcc/ and not in /usr/lib/gcc/?

> gcc -Wall -m64 -O2 prim.c -o prim64

This version compiles fine.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Carlos E. R. wrote:

>
> What is it missing, libgcc? I have “libgcc44”. Perhaps it needs a symlink?
>
> Why is it looking in /usr/lib64/gcc/ and not in /usr/lib/gcc/?
>
You need gcc44-32bit in addition to the default gcc to cross compile 32 bit
applications with -m32.


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

On 2011-04-22 14:08, martin_helm wrote:

> You need gcc44-32bit in addition to the default gcc to cross compile 32
> applications with -m32.

Ah, right.

Indeed, the 32 bit version is faster:

cer@Telcontar:~/bin/C> time ./primos64 ; time ./primos32
Num Primos: 17984

real 0m21.812s
user 0m21.712s
sys 0m0.005s
Num Primos: 17984

real 0m6.396s
user 0m6.370s
sys 0m0.002s
cer@Telcontar:~/bin/C>


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Anyway this is comparing apples and oranges here since the default sizes for
the elementary types differ between 64bit and 32bit compilation.


martinh@sirius:~/scratch> time ./bench32
sizeof long 4
sizeof int  4
Num Primos: 17984

real    0m7.004s
user    0m6.974s
sys     0m0.004s
martinh@sirius:~/scratch> time ./bench64
sizeof long 8
sizeof int  4
Num Primos: 17984

real    0m21.925s
user    0m21.809s
sys     0m0.019s


To get a correct comparison one has to declare the types so that they are the
same length for both compilations.


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

#include <stdio.h>
#include <stdlib.h>

inline int is_prime2(long num){
        long den=1;
        do{
                den++;
        }while( num%den != 0);

        if(den == num){
                return 1;
        }else{
                return 0;
        }
}

int main(int argc,char* argv[]){
        long i=0;
        int j=0;
        for(i=2;i<=200000;i++){
                if (is_prime2(i) ){
                        //printf("%li / ",i);
                        j++;
                }
        }
        printf("Num Primos: %i \n",j);
        return 0;
}

Lets take the program line by line
It is a C program so let’s check the libraries
The first error who I see is that you missed system(“PAUSE”) in the end of your program.
Second lets analyze your program
Beginning from main() function
The main has a repetition loop to 200000
This program said in function main()
after repetition loop:
With if(is_prime2(i)) it tries(program) to call the function inside function main(), but it is completely wrong because in function is_prime2(long num), has not i inside () but num so in your main() function you should write is_prime2(num) not …(i).

In printf("Num Primos: %i \n",j);
Num Primos is not declared inside program neither as global variable, either as local variable.

To have a good comparison, replace the int and long declarations everywhere
with int32_t and int64_t (whatever really suits your needs) to make it
architecture independent, and include the header <stdint.h>.


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

That is, to put it bluntly, the weirdest nonsense I have ever read.
The name of an actual parameter has absolutely nothing to do with the name of the formal parameter.
The claim that a standard C program needs a call to system(“PAUSE”) is complete nonsense. A main function declared as int has ended by returning an integer for decades (unless you declare it as void).
The “Num Primos” is part of a string constant, if you have not noticed that.

If you use # include<stdlib.h> you must to the end of program to write a System(“Pause”)
The variables must be the same in input and output of them, such as functions of program.
Different names of variables in program is not acceptable from program. For example: you declare num and in printf() you write NUM PRIMOS. It is not acceptable and it is wrong.

stamostolias wrote:

> If you use # include<stdlib.h> you must to the end of program to write
> a System(“Pause”)
This is absolute nonsense. Do you have a clue at all about C programming?

> The variables must be the same in input and output of them, such as
> functions of program.
The same type, not the same name. Do you understand what a function is and
what a function call is?

> Different names of variables in program is not acceptable from program.
This is how programming works - abstraction!

> For example: you declare num and in printf() you write NUM PRIMOS. It is
> not acceptable and it is wrong.
>
If you do not know the difference between a string constant and a variable, I
recommend you read a good book about C programming. There is a variety of
tutorials available, some of them for free.

I am speechless. You pollute this thread with completely unrelated and plain
wrong statements, which are so wrong that I have to come to the conclusion
that you never wrote a C program (or if you did, you do not understand it at all).

PLEASE STOP THAT NOW.
It is more than annoying.


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

Look I develop in C++ and study computer science, and know about many in developing
Well one by one what i said

Different names of variables in program is not acceptable from program.
This is how programming works - abstraction!

With different names of variables I mean that when you declare a variable in the beginning of program, you can not change the name of variable inside program.
For example
int num
…
…
printf("Num Primos: %i \n",j)

Sorry??? What???
I polute this thread???
I know C and very well.
It would better to read this C (programming language) - Wikipedia, the free encyclopedia

Again YOU CAN NOT CHANGE THE NAME OF VARIABLE WHEN THIS VARIABLE HAS ALREADY BEEN DECLARED.
This words are written in all programming books.
You can Change the type of variable
float PI=3.14;
int i;
i=(int) PI;
But not the name.

That’s I try to tell.

I recommend “The C Programming Language” by K&R.


Regards,
Barry D. Nichols

Barry_Nichols@no-mx.forums.opensuse.org wrote:

> I recommend “The C Programming Language” by K&R.
>
> –
> Regards,
> Barry D. Nichols

+1


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

Carlos E. R. wrote:

> On 2011-03-31 07:36, rcornet wrote:
>
>> Libraries bug?.. Even statically compiled versions perform the same
>> scale factor.
>
> A 64 bit program may use double the memory, if you are not careful.
>
To come back on topic.

@rcornet:
You ran into the trap that when compiling with -m64 you use different sizes
for the elementary types, which is the reason your program slows down; that
is essentially what Carlos wrote.

Here is a self contained example which shows you that you get the same speed
with both (-m32 and -m64) if you take care to use the right declarations for
the integer types:


gcc -Wall -m32 -O2 bench.c -o bench32
gcc -Wall -m64 -O2 bench.c -o bench64

time ./bench32
Num Primos: 17984

real    0m6.994s
user    0m6.981s
sys     0m0.001s

time ./bench64
Num Primos: 17984

real    0m6.944s
user    0m6.906s
sys     0m0.004s


This is with the following bench.c


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

inline int is_prime2(int32_t num)
{
    int32_t den=1;
    do
    {
        den++;
    }
    while( num%den != 0);

    if(den == num)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

int main(int argc,char* argv[])
{
    int32_t i=0;
    int32_t j=0;

    for(i=2; i<=200000; i++)
    {
        if (is_prime2(i) )
        {
            //printf("%li / ",i);
            j++;
        }
    }
    printf("Num Primos: %i \n",j);
    return 0;
}


If you want what you previously declared as long to be a 64 bit integer in
both cases, just change it to this:


#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

inline int is_prime2(int64_t num)
{
    int64_t den=1;
    do
    {
        den++;
    }
    while( num%den != 0);

    if(den == num)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

int main(int argc,char* argv[])
{
    int64_t i=0;
    int32_t j=0;

    for(i=2; i<=200000; i++)
    {
        if (is_prime2(i) )
        {
            //printf("%li / ",i);
            j++;
        }
    }
    printf("Num Primos: %i \n",j);
    return 0;
}


With the following times in this case


time ./bench32
Num Primos: 17984

real    0m22.890s
user    0m22.868s
sys     0m0.001s

time ./bench64
Num Primos: 17984

real    0m21.931s
user    0m21.834s
sys     0m0.010s


The benefit of using <stdint.h> and the types declared in it is that you get
a C99 compliant program, which does not depend on the compiler defaults for
different architectures.


PC: oS 11.3 64 bit | Intel Core2 Quad Q8300@2.50GHz | KDE 4.6.1 | GeForce
9600 GT | 4GB Ram
Eee PC 1201n: oS 11.4 64 bit | Intel Atom 330@1.60GHz | KDE 4.6.0 | nVidia
ION | 3GB Ram

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

inline int is_prime2(int64_t num)
{
    int64_t den=1;
    do
    {
        den++;
    }
    while( num%den != 0);

    if(den == num)
    {
        return 1;
    }
    else
    {
        return 0;
    }
}

int main(int argc,char* argv[])
{
    int64_t i=0;
    int32_t j=0;

    for(i=2; i<=200000; i++)
    {
        if (is_prime2(i) )
        {
            //printf("%li / ",i);
            j++;
        }
    }
    printf("Num Primos: %i \n",j);
    return 0;
}

Now it is right written. But I said it again when we have #include <stdlib.h> you need a System(“pause”) to the end of algorithm.

There is not relation with this algorithm

#include <stdio.h>
#include <stdlib.h>

inline int is_prime2(long num){
        long den=1;
        do{
                den++;
        }while( num%den != 0);

        if(den == num){
                return 1;
        }else{
                return 0;
        }
}

int main(int argc,char* argv[]){
        long i=0;
        int j=0;
        for(i=2;i<=200000;i++){
                if (is_prime2(i) ){
                        //printf("%li / ",i);
                        j++;
                }
        }
        printf("Num Primos: %i \n",j);
        return 0;
}

On 2011-04-22 17:36, stamostolias wrote:

> Look I develop in C++ and study computer science, and know about many
> in developing

I don’t believe it.

> With different names of variables I mean that when you declare a
> variable in the beginning of program, you can not change the name of
> variable inside program.
> For example
> int num
> …
>
> …
> …
> …
> …
> printf("Num Primos: %i \n",j)

printf is printing the text “Num Primos”, not a variable called “Num” nor
“num”. This is basic C; if you don’t know that, it is impossible that you are
studying computer science with C included.

I refuse to consider the rest of what you said. My C skills are rusty and
outdated, I know very little of linux programming, but yours are… are
unbelievable.

:-(

Or perhaps you study pure C++ with no plain C. Dunno. I try to understand it…

Or perhaps you know no English and use an automated translator from Greek
that makes such big warfs.

Hint: the program was compiled with “-Wall” enabled, and didn’t print the
smallest protest. If what you said were true it would print big complaints.

Hint2: Ask your teacher, before making things worse.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

On 2011-04-22 15:35, martin_helm wrote:
> Anyway this is comparing apples and oranges here since the default sizes for
> the elementary types differ between 64bit and 32bit compilation.

Yes, I know that :-)

I mean, I know what you mean, because that is what I said a month ago: that
if you are not careful your program uses variables of double the size when
compiling for 64 bits.

But I do not know how to choose the correct sizes, because when I learnt C
the 32 bit processor was the new powerful gadget. There were no 64 bit
processors in sight.

>
>


> martinh@sirius:~/scratch> time ./bench32
> sizeof long 4
> sizeof int  4
> Num Primos: 17984
>
> real    0m7.004s
> user    0m6.974s
> sys     0m0.004s
> martinh@sirius:~/scratch> time ./bench64
> sizeof long 8
> sizeof int  4
> Num Primos: 17984
>
> real    0m21.925s
> user    0m21.809s
> sys     0m0.019s
>
> 

>
> To get a correct comparism one has to declare the types so that they are the
> same length for both compilations.

Right.

I’ll read what you say in the other post; I think you posted a sample.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)