
Thread: Segfaults on cold boot but not "warm" boot?

  1. #1

    Segfaults on cold boot but not "warm" boot?

    Generally, my computer behaves okay. Sometimes, however, it crashes and is unstable until after I shut it down, wait ten seconds and boot it up again (what I'd possibly incorrectly call a warm boot). It only ever seems to need a "warm boot" after being off for over a day (e.g. when the in-laws visit for an evening and I don't use my computer from Wednesday night until Friday night).

    I've been poking and prodding the problem as best I know, but I've not worked out what the problem is. The kernel is still awake enough to respond to SysRq, so I can reboot cleanly if my machine locks up, but it was only in the past couple of days that I noticed segfault messages.

    Sometimes Chromium will crash out and refuse to restart after that because of segfaults. Sometimes X locks up and I can't switch to another terminal. Sometimes X just restarts itself (normally alternating between Ctrl+Alt+F7 and F8), but once that happens then X won't stay stable. I've also once seen some "preload" command segfault on TTY1 (not sure which one it was).

    I've run memtest a couple of times and it passes a couple of iterations each time before I stop it. I've moved my memory around in case it isn't seated properly. I've tried hammering the processor with some of the performance tests in Parted Magic. I've swapped out the graphics card (partly to see if it was sending bad messages, because I've had other problems, reported in other threads) with no effect. The BIOS is still keeping the time, so it wouldn't seem to be anything battery-related (plus the mobo is only about a year old).

    I can't work out what causes it, but if I power off, wait and power on again then everything is fine.

    The only stack traces I've seen have been "segfault at 8 ip ..... error 4 in libstdc++.so.6.0.14" and "/usr/bin/Xorg - corrupted doubly linked list". Google wasn't much help, because so many segfault questions come from people debugging apps they're writing themselves (where it is their mistake) or from a single desktop app (where it is still the same type of mistake, but the app can be run through GDB, unlike the entire OS).
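    As a rough aid to reading kernel lines like the libstdc++ one above, here is a minimal Python sketch, assuming the usual x86 kernel log layout "segfault at ADDR ip ... sp ... error CODE in OBJECT[...]", where the kernel prints the address and error code in hex. The names `parse` and `decode_error` are hypothetical. It decodes only the low three bits of the x86 page-fault error code. Read this way, "at 8 ... error 4" is a user-mode read of a non-present page at address 0x8, which often means a NULL pointer plus a small offset; on its own that cannot tell a software bug apart from memory corrupted by flaky hardware.

```python
import re

# Assumed layout of the x86 kernel segfault message, e.g.
# "prog[123]: segfault at 8 ip ... sp ... error 4 in libstdc++.so.6.0.14[...]".
# Both the fault address and the error code are printed in hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+).*?"
    r"error (?P<err>[0-9a-f]+)(?: in (?P<obj>[^\s\[]+))?"
)


def decode_error(code):
    """Describe the low three bits of an x86 page-fault error code."""
    return [
        "protection violation" if code & 0x1 else "page not present",  # bit 0
        "write" if code & 0x2 else "read",                              # bit 1
        "user mode" if code & 0x4 else "kernel mode",                   # bit 2
    ]


def parse(line):
    """Return fault details from one log line, or None if it is not a segfault line."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "addr": int(m.group("addr"), 16),
        "error": decode_error(int(m.group("err"), 16)),
        "object": m.group("obj"),  # None if the line has no "in <object>" part
    }
```

    Fed the line quoted above, `parse` returns address 8, `['page not present', 'read', 'user mode']`, and `libstdc++.so.6.0.14`. Running each line of dmesg or /var/log/messages through it is one way to see whether the faults cluster in one library or are scattered, which would point more towards hardware.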

    I'm running openSUSE 11.4 with all the latest updates, plus Gnome 3.

    Anyone got any ideas what might cause it or how to diagnose better? Thanks.

  2. #2
    Will Honea NNTP User

    Re: Segfaults on cold boot but not "warm" boot?

    IBBoard wrote:

    > Generally, my computer behaves okay. Sometimes, however, it crashes and
    > is unstable until after I shut it down, wait ten seconds and boot it up
    > again (what I'd possibly incorrectly call a warm boot). It only ever
    > seems to need a "warm boot" after being off for over a day (e.g. when
    > the in-laws visit for an evening and I don't use my computer from
    > Wednesday night until Friday night).


    Since you have had the memory modules in your warm hands, I have to ask:
    did you bother to clean the contacts? The apparent "memory" issues may also
    be due to transfer/data errors from disk - all too often those are
    tin-plated connectors subject to mild corrosion, especially in humid
    climates. Another possible problem area is the power supply - it may be
    marginal at cold startup temps.

    You may be able to reproduce the error by directing a fan blowing the
    coldest air you have available into the case to simulate cold-start
    conditions, but realize that even with refrigerated air it can take an
    hour or so to pull down internal temps.

    Did I mention that I HATE intermittent problems???

    --
    WHonea

  3. #3
    Join Date: Jun 2008
    Location: Earth - Denmark
    Posts: 10,730

    Re: Segfaults on cold boot but not "warm" boot?

    On 07/19/2011 05:41 PM, Will Honea wrote:
    > Another possible problem area is the power supply - it may be marginal at cold
    > startup temps.


    let me get on that piggy's back a moment:

    how old is your system? if it is more than a couple of years old it has
    outlived the warranty on parts such as the power supply....some (many,
    most, all?) computer manufacturers are in the business of turning a
    profit...and, to do so they use the least costly parts that are likely
    to make it past the warranty period (logical huh?)..

    my experience is that most home/consumer level quality boxes on the
    retail shelves ship with power supplies just barely having the capacity
    to turn out enough clean stable power the day they are *new* and ~two
    years later when the warranty is finished....and, the thing about power
    supplies is that they _do_ weaken and wear out over time.

    so, if yours is over (say) three years old you might wanna consider
    getting a new power supply with more capacity than the one which came
    with your computer...

    hmmmm...trying to remember...i think the last time i bought one i bought
    a 500 watt to replace a 350 which was about 4 years old....and, i had
    been having strange shutdown/startup/freeze problems for about six
    months when finally someone told me about what i've written above for you...

    however: from here it is impossible for me to say that a weak or failing
    power supply _is_ your problem.....but, it might be..

    --
    DD
    Caveat-Hardware-Software
    openSUSE®, the "German Engineered Automobiles" of operating systems!

  4. #4

    Re: Segfaults on cold boot but not "warm" boot?

    Thanks for the suggestions

    I didn't clean the memory contacts - I thought about it, but I wasn't sure what the best material to do it with would be, and I didn't want to make things worse! Also, I'd half assumed that memtest might pick up bad contacts.

    I did consider the HDD as well. I've run it through a long SMART test and badblocks and it all checks out, but I guess those check the mechanism rather than the transfer.

    I'll see what I can do to check the power supply. I guess it is another possible culprit, but I'd have expected lockups and spontaneous shutdowns rather than segfaults. Can small power fluctuations mess with the memory like that?

    As for the age of my system - it varies! I built a machine from scratch in about 2005 (AMD Athlon 64). I replaced the ATI graphics card in about 2007-2008 because I was sick of the lack of support - I got a second-hand GeForce 7950GT from a gamer at work. The HDD started to error and died in about 2008, just within warranty, so I bought a replacement and then got a refurb back (with something like a whole month of warranty offered on that new one!). Somewhere around 2007-2008 the motherboard seemed to be playing up, so I got a second-hand replacement from work, then the PSU was dead/died (I think I killed it while testing), so I got a new 500W in about 2008. In 2010 I ripped out the guts (mobo, processor and memory) and replaced them with a Core i5, but I kept the HDD and graphics card.

    So, it depends which bit! The power supply is one of the older bits and was only a Jeantech from PC World. The memory is fairly new (almost exactly 1 year, in fact), but the hard disk is a few years older and is currently running as part of a degraded RAID array (the other disk was the refurb, but taking it out fixed a few errors I was having a month back and reduced the noise level).

  5. #5

    Re: Segfaults on cold boot but not "warm" boot?

    On 07/19/2011 09:06 PM, IBBoard wrote:
    >
    > I didn't clean the memory contacts - I thought about it, but I wasn't
    > sure what the best material to do it with would be, and I didn't want to
    > make things worse!


    just use the eraser on a normal wooden pencil (not the gritty kind on an
    ink pen)...don't get too aggressive with it, and wipe any eraser bits off
    with a cloth..


    > I'll see what I can do to check the power supply.


    i've heard you can buy a PSU tester, but i've never actually seen one..
    i do know that it is not likely you can test it with a simple voltage
    tester because the PSU has some built-in sensor and responds to the
    draw...so, you have to provide a load or you won't get accurate
    readings...google it, there are ways to check with the power on and the
    system providing the load....but, that is not something i'm willing to do.

    > I guess it is another
    > possible culprit, but I'd have expected lockups and spontaneous
    > shutdowns rather than segfaults. Can small power fluctuations mess with
    > the memory like that?


    everything in the system depends on smooth power at a prescribed
    predictable voltage..


    > so I got a new 500W in about 2008.


    maybe good, maybe not....if i had to guess: probably good (that's based
    on figuring you need a good bit less than 500W, so it should have a
    longer life)..

    --
    DD

  6. #6

    Re: Segfaults on cold boot but not "warm" boot?

    I don't know what I'm actually drawing from my power supply, but it should be a reasonable amount less than 500W (Core i5, 2 sticks of memory, a couple of HDDs, an optical drive that is hardly ever used, and a couple of case fans, plus a 7950GT with a Zalman on it). It has been stable enough the rest of the time with that power supply, even when hammering it much harder than I do these days.

    I'll try to clean the memory contacts at some point and see if that improves anything.

    We're going to be upgrading my wife's computer by the end of the week, so I plan to swap the power supplies and test with that as well. It's actually an older (but slightly more powerful) PSU, but if things improve with it, that'll be a better indicator that my PSU had a short working life.

    Thanks.

  7. #7
    Will Honea NNTP User

    Re: Segfaults on cold boot but not "warm" boot?

    IBBoard wrote:

    > I'll see what I can do to check the power supply. I guess it is another
    > possible culprit, but I'd have expected lockups and spontaneous
    > shutdowns rather than segfaults. Can small power fluctuations mess with
    > the memory like that?


    Major failures will cause the gross effects but noise levels can result in
    infuriating errors that appear almost random. Modern memory design is much
    more tolerant - or should I say less stressful? - than earlier dynamic
    designs, but I've traced memory failures to seemingly insignificant 100 pF
    caps whose sole reason for being was to filter power leads right at the
    array connection. One trick I used to isolate such problems was a small
    heat gun to preheat portions of the layout at a time. Today, the problem is
    more commonly from the power supply itself as DD describes.

    Think of the problem in terms of noise on the lines rather than outright
    failure.

    --
    WHonea

  8. #8

    Re: Segfaults on cold boot but not "warm" boot?

    Sorry for the delay, but everything seems to be okay now.

    The weekend after my last post, I upgraded my wife's PC. We gutted the hardware and basically left the DVD drive and the case. While doing that I noticed that she had the Jeantech power supply that I'd bought, meaning I was using the five-year-old PSU that came in her machine as part of a pre-built unit. I then remembered she'd been having problems with her graphics card in Windows. I must have swapped the power supplies to see if that was the problem; it stabilised her machine and didn't affect mine until I started using Gnome 3, by which time I'd forgotten about the power supply swap! I put the Jeantech back in my case and all seems to be stable now.

    Thanks for the pointers. In future I'll be more wary of the PSU being on the fritz.
