SEGFAULT on 32bit ELDK

As requested by Carlos E. R., I am reposting this from http://forums.opensuse.org/english/get-technical-help-here/64-bit/476190-segfault-32bit-eldk.html

I have been running the rather old ELDK-3.1 on my box without trouble.
I have openSUSE Factory installed, and everything worked fine until 2012-05-30.

On that date an update replaced “glibc-2.15-19.1” with “glibc-2.15-21.1”, and since then I get a SEGFAULT from any PowerPC GCC executable (“ppc-linux-*”).

From “/var/log/messages”:


Jun 19 09:48:36 linux-8666 kernel: [40167.765187] ppc-linux-gcc[14437]: segfault at 74706f3b ip 00000000f765c5ee sp 00000000ffbfe470 error 4 in libc-2.15.so[f75e9000+1a0000]

From “/var/log/zypp/history”:


2012-05-15  09:58:26|install|glibc|2.15-19.1|x86_64||openSUSE-Factory-Oss|7a8e425e69530d6dee8d384d043bf9c9d5b18ecae4a60f07e437cddfd867d2e4 
...
2012-05-30  12:22:55|install|glibc|2.15-21.1|x86_64||openSUSE-Factory-Oss|55a44cb57252ef338583fa62756e72e8500bb105186049b95dc3806c8737d01a 


GDB backtrace:


 gdb `which ppc-linux-gcc`
GNU gdb (GDB) SUSE (7.4.50.20120603-1.1)
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /opt/ELDK-3.1/usr/bin/ppc-linux-gcc...(no debugging symbols found)...done.
(gdb) r
Starting program: /opt/ELDK-3.1/usr/bin/ppc-linux-gcc 

Program received signal SIGSEGV, Segmentation fault.
0xf7e7b5ee in _int_free (av=0xf7faa440 <main_arena>, p=0x8063640, have_lock=1) at malloc.c:4085
4085            unlink(nextchunk, bck, fwd);
(gdb) backtrace
#0  0xf7e7b5ee in _int_free (av=0xf7faa440 <main_arena>, p=0x8063640, have_lock=1) at malloc.c:4085
#1  0xf7e7d8d4 in free_check (mem=0x8064650, caller=0x80547a0) at hooks.c:257
#2  0xf7e7f00b in __GI___libc_free (mem=0x8064650) at malloc.c:2959
#3  0x080547a0 in ?? ()
#4  0x080522d9 in ?? ()
#5  0xf7e213d5 in __libc_start_main (main=0x8051c40, argc=1, ubp_av=0xffffce74, init=0x8048d78, fini=0x8056250, rtld_fini=0xf7fea840 <_dl_fini>, 
    stack_end=0xffffce6c) at libc-start.c:226
#6  0x080491c1 in ?? ()
(gdb) backtrace full
#0  0xf7e7b5ee in _int_free (av=0xf7faa440 <main_arena>, p=0x8063640, have_lock=1) at malloc.c:4085
        size = 8208
        fb = <optimized out>
        nextchunk = 0x8065650
        nextsize = <optimized out>
        nextinuse = 0
        prevsize = <optimized out>
        bck = 0x444c452f
        fwd = 0x74706f2f
        errstr = 0x0
        locked = 0
        __func__ = "_int_free"
#1  0xf7e7d8d4 in free_check (mem=0x8064650, caller=0x80547a0) at hooks.c:257
        p = <optimized out>
#2  0xf7e7f00b in __GI___libc_free (mem=0x8064650) at malloc.c:2959
        ar_ptr = <optimized out>
        p = <optimized out>
        hook = <optimized out>
#3  0x080547a0 in ?? ()
No symbol table info available.
#4  0x080522d9 in ?? ()
No symbol table info available.
#5  0xf7e213d5 in __libc_start_main (main=0x8051c40, argc=1, ubp_av=0xffffce74, init=0x8048d78, fini=0x8056250, rtld_fini=0xf7fea840 <_dl_fini>, 
    stack_end=0xffffce6c) at libc-start.c:226
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {-134569996, 0, 0, 0, -434506036, -576384292}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 
              0x1, 0x80491a0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 1}}}
        not_first_call = <optimized out>
#6  0x080491c1 in ?? ()
No symbol table info available.

Other 32-bit binaries, such as Skype, still run fine.

I considered downgrading “glibc”, but “glibc-2.15-19.1” is no longer available, and zypper scared me off with about a thousand dependent downgrades and architecture changes (to i586), so I concluded it was not a good solution.

Could anyone please help me find a fix for this?
Any help will be very welcome.

Thanks a lot.

Sorry, I forgot to mention that my box is x86_64 (intel i7).

On 06/19/2012 12:46 PM, J arantes wrote:
>
> Sorry, I forgot to mention that my box is x86_64 (intel i7).

A bug in the PPC cross-compiler is likely something that should be reported to
the gcc mailing list. At a minimum, report it to the Novell Bugzilla.

ELDK-3.1 has been working perfectly for more than 7 years on hundreds (if not thousands) of computers around the world.
I cannot classify this as a PPC-GCC bug, since it worked on my own machine right up to the aforementioned update.

It sounds to me far more like a glibc-2.15-21.1 issue.

Furthermore, as ELDK-3.1 is 7 years old and there is a much newer version out there, I don’t think I am going to get their staff to debug an ancient version for me just because it does not work on a beta version of openSUSE.

Unfortunately I can’t switch to a newer toolchain. I am tied to this old one, so I ask everybody, please, don’t consider telling me to fix my toolchain.

I would be very thankful if anyone could tell me how to get a better diagnosis or find a workaround for my problem within the scope of the openSUSE 12.2 beta.

Thanks again.

On 06/20/2012 09:46 AM, J arantes wrote:
>
> lwfinger;2470149 Wrote:
>> On 06/19/2012 12:46 PM, J arantes wrote:
>> A bug in the PPC cross-compiler is likely something that should be
>> reported to
>> the gcc mailing list. At a minimum, report it to the Novell Bugzilla.
>
> ELDK-3.1 has been working perfectly for more than 7 years on hundreds
> (if not thousands) of computers around the world.
> I cannot classify this as a PPC-GCC bug, since it worked on my own
> machine right up to the aforementioned update.
>
> It sounds to me far more like a glibc-2.15-21.1 issue.
>
> Furthermore, as ELDK-3.1 is 7 years old and there is a much newer
> version out there, I don’t think I am going to get their staff to debug
> an ancient version for me just because it does not work on a beta
> version of openSUSE.
>
> Unfortunately I can’t switch to a newer toolchain. I am tied to this
> old one, so I ask everybody, please, don’t consider telling me to fix my
> toolchain.
>
> I would be very thankful if anyone could tell me how to get a better
> diagnosis or find a workaround for my problem within the scope of the
> openSUSE 12.2 beta.

Run the failing command under gdb. You will likely have to install a number of
debug packages that you can get from the debug repo.

Obviously I don’t need to tell you how to run gdb as you know a lot more than I do.

> Run the failing command under gdb.

I already did it, and the relevant output is in the original question.
But now I don’t know how to go further.

> You will likely have to install a number of
> debug packages that you can get from the debug repo.

All the debug info I could install is already installed (all the glibc-*-debug packages).
Sadly, ELDK has no debug info.

> Obviously I don’t need to tell you how to run gdb as you know a lot more than I do.

Of course not. I am no GDB master, but Google seems to be :wink:


What I can gather from the GDB output is:


static void                  // Line 3904
_int_free(mstate av, mchunkptr p, int have_lock)
//...
// Line 4083:
    /* consolidate forward */
    if (!nextinuse) {
        unlink(nextchunk, bck, fwd);        // Line 4085. Here is the crash.
        size += nextsize;
    } else
        clear_inuse_bit_at_offset(nextchunk, 0);

This is what frame #0 of the backtrace points to:



#0  0xf7e7b5ee in _int_free (av=0xf7faa440 <main_arena>, p=0x8063640, have_lock=1) at malloc.c:4085
        size = 8208
        fb = <optimized out>
        nextchunk = 0x8065650
        nextsize = <optimized out>
        nextinuse = 0
        prevsize = <optimized out>
        bck = 0x444c452f
        fwd = 0x74706f2f
        errstr = 0x0
        locked = 0
        __func__ = "_int_free"

My understanding is that the application called “free()” on some memory and the call crashed.
I would expect “free()” to validate the address it is given, so that freeing a “non-existent” memory address is safe (that is what I understand a segfault to mean).

I could try to understand how “_int_free()” works, but it would undoubtedly take me a long time to figure out what is happening, which is the main reason I am asking for help.

Please tell me if I am supposed to search for help elsewhere before deciding to open a bug report.

Many thanks.

On 06/20/2012 01:36 PM, J arantes wrote:
>
> lwfinger;2470299 Wrote:
>>
>> Run the failing command under gdb.
>
> I already did it and the relevant output is in the original question.
> But now I don’t know how to go further.
>
>> You will likely have to install a number of
>> debug packages that you can get from the debug repo.
>
> All the debug info I could install is already installed (all the
> glibc-*-debug packages).
> Sadly, ELDK has no debug info.
>
> lwfinger;2470299 Wrote:
>> Obviously I don’t need to tell you how to run gdb as you know a lot more
>> than I do.
> Of course not. I am no GDB master, but Google seems to be :wink:
>
> ----
> What I can digest from GDB output is:
>
>
> Code:
> --------------------
>
> static void // Line 3904
> _int_free(mstate av, mchunkptr p, int have_lock)
> //...
> // Line 4083:
>     /* consolidate forward */
>     if (!nextinuse) {
>         unlink(nextchunk, bck, fwd); // Line 4085. Here is the crash.
>         size += nextsize;
>     } else
>         clear_inuse_bit_at_offset(nextchunk, 0);
>
> --------------------
>
>
> Pointed out by:
>
> Code:
> --------------------
>
>
> #0 0xf7e7b5ee in _int_free (av=0xf7faa440 <main_arena>, p=0x8063640, have_lock=1) at malloc.c:4085
> size = 8208
> fb = <optimized out>
> nextchunk = 0x8065650
> nextsize = <optimized out>
> nextinuse = 0
> prevsize = <optimized out>
> bck = 0x444c452f
> fwd = 0x74706f2f
> errstr = 0x0
> locked = 0
> __func__ = "_int_free"
>
> --------------------
>
>
> My understanding is that the application called “free()” on some memory
> and the call crashed.
> I would expect “free()” to validate the address it is given, so that
> freeing a “non-existent” memory address is safe (that is what I
> understand a segfault to mean).
>
> I could try to understand how “_int_free()” works, but it would
> undoubtedly take me a long time to figure out what is happening, which
> is the main reason I am asking for help.
>
> Please tell me if I am supposed to search for help elsewhere before
> deciding to open a bug report.

The only other values that are needed are those of “bck” and “fwd” at the time
of the crash. From the names, it seems to me that an entry in a doubly linked
list is being removed, and that either the list is incorrectly set up or those
pointers are not the ones used in the list.

With those values, I think you are ready to file the bug report.