Generally, my computer behaves okay. Sometimes, however, it crashes and is unstable until after I shut it down, wait ten seconds and boot it up again (what I’d possibly incorrectly call a warm boot). It only ever seems to need a “warm boot” after being off for over a day (e.g. when the in-laws visit for an evening and I don’t use my computer from Wednesday night until Friday night).
I’ve been poking and prodding the problem as best I know, but I’ve not worked out what the problem is. The kernel is still awake enough to respond to SysRq and so I can reboot cleanly if my machine locks up, but it was only in the past couple of days that I noticed segfault messages.
Sometimes Chromium will crash out and refuse to restart after that because of segfaults. Sometimes X locks up and I can’t switch to another terminal. Sometimes X just restarts itself (normally alternating between Ctrl+Alt+F7 and F8), but once that happens then X won’t stay stable. I’ve also once seen some “preload” command segfault on TTY1 (not sure which one it was).
I’ve run memtest a couple of times and it passes a couple of iterations each time before I finish it. I’ve moved my memory around in case it isn’t seating properly. I’ve tried hammering the processor with some of the performance testing in Parted Magic. I’ve swapped out the graphics card (partly to see if it was sending bad messages, because I’ve had other problems - reported in other threads) with no effect. The BIOS is still keeping the time, so it wouldn’t seem to be anything battery related (plus the mobo is only about a year old).
I can’t work out what causes it, but if I power off, wait and power on again then everything is fine.
The only stack traces I’ve seen have been “segfault at 8 ip … error 4 in libstdc++.so.6.0.14” and “/usr/bin/Xorg - corrupted doubly linked list”. Google wasn’t much help, because so many segfault questions are from people working on their own apps (where it is their mistake) or on some desktop app (where it is still the same type of mistake, but it can be run through GDB, unlike the entire OS).
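As a side note on reading that first message: the number after "error" in a kernel segfault line is the x86 page-fault error code, printed in hex. Below is a minimal Python sketch for pulling such lines apart and decoding that code. It assumes the usual x86 Linux log format; the function names are made up for illustration, and so are the process name and addresses in the sample line (the message quoted above elided them).

```python
import re
from collections import Counter

# Assumed shape of an x86 Linux kernel segfault line:
#   name[pid]: segfault at ADDR ip IP sp SP error CODE in OBJECT[base+size]
SEGFAULT_RE = re.compile(
    r"(?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+))?"
)

# Invented sample: process name, pid and addresses are placeholders.
SAMPLE = (
    "example[1234]: segfault at 8 ip 00007f0000001000 "
    "sp 00007fff00002000 error 4 in libstdc++.so.6.0.14[7f0000000000+e8000]"
)


def parse_segfault(line):
    """Return the interesting fields of a segfault line, or None if no match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return {
        "process": m.group("proc"),
        "address": int(m.group("addr"), 16),  # faulting address (hex in the log)
        "error": int(m.group("err"), 16),     # page-fault error code (hex in the log)
        "object": m.group("obj"),             # mapped file containing the ip, if any
    }


def decode_error(code):
    """Decode the low bits of the x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
        "instruction_fetch": bool(code & 16),
    }


def count_by_object(lines):
    """Tally segfaults per mapped object across many log lines."""
    counts = Counter()
    for line in lines:
        hit = parse_segfault(line)
        if hit:
            counts[hit["object"] or "unknown"] += 1
    return counts


if __name__ == "__main__":
    hit = parse_segfault(SAMPLE)
    print(hit["object"], hex(hit["address"]), decode_error(hit["error"]))
```

Read this way, "error 4" is a user-mode read of a page that isn't mapped, and a fault address as low as 8 usually means a null pointer was followed. Feeding saved kernel log lines to `count_by_object` shows whether the faults cluster in one library or are spread across unrelated programs, which may hint at a cause below the software.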
I’m running openSUSE 11.4 with all the latest updates, plus Gnome 3.
Anyone got any ideas what might cause it or how to diagnose better? Thanks.
> Generally, my computer behaves okay. Sometimes, however, it crashes and
> is unstable until after I shut it down, wait ten seconds and boot it up
> again (what I’d possibly incorrectly call a warm boot). It only ever
> seems to need a “warm boot” after being off for over a day (e.g. when
> the in-laws visit for an evening and I don’t use my computer from
> Wednesday night until Friday night).
Since you have had the memory modules in your warm hands, I have to ask if
you bothered to clean the contacts? The apparent “memory” issues may also be
due to transfer/data errors from disk - all too often those are tin plated
connectors subject to mild corrosion, especially in humid climates. Another
possible problem area is the power supply - it may be marginal at cold
startup temps.
You may be able to reproduce the error by directing a fan blowing the
coldest air you have available into the case to simulate the cold start
conditions but realize that even with refrigerated air it can take an hour
or so to pull down internal temps.
Did I mention that I HATE intermittent problems???
On 07/19/2011 05:41 PM, Will Honea wrote:
> Another possible problem area is the power supply - it may be marginal at cold
> startup temps.
let me get on that piggy’s back a moment:
how old is your system? if it is more than a couple of years old it has
outlived the warranty on parts such as the power supply…some (many,
most, all?) computer manufacturers are in the business of turning a
profit…and, to do so they use the least costly parts that are likely
to make it past the warranty period (logical huh?)…
my experience is that most home/consumer level quality boxes on the
retail shelves ship with power supplies just barely having the capacity
to turn out enough clean stable power the day they are new and ~two
years later when the warranty is finished…and, the thing about power
supplies is that they do weaken and wear out over time.
so, if yours is over (say) three years old you might wanna consider
getting a new power supply with more capacity than the one which came
with your computer…
hmmmm…trying to remember…i think the last time i bought one i bought
a 500 watt to replace a 350 which was about 4 years old…and, i had
been having strange shutdown/startup/freeze problems for about six
months when finally someone told me about what i’ve written above for you…
however: from here it is impossible for me to say that a weak or failing
power supply is your problem…but, it might be…
–
DD Caveat-Hardware-Software
openSUSE®, the “German Engineered Automobiles” of operating systems!
I didn’t clean the memory contacts - I thought about it, but I wasn’t sure what the best material to do it with would be, and I didn’t want to make things worse! Also, I’d half assumed that memtest might pick up bad contacts.
I did consider the HDD as well. I’ve run it through a long SMART test and badblocks and it all checks out, but I guess those check the mechanism rather than the transfer.
I’ll see what I can do to check the power supply. I guess it is another possible culprit, but I’d have expected lockups and spontaneous shutdowns rather than segfaults. Can small power fluctuations mess with the memory like that?
As for the age of my system - it varies! I built a machine from scratch in about 2005 (AMD Athlon 64). I replaced the ATI graphics card in about 2007-2008 because I was sick of the lack of support - I got a second-hand Geforce 7950GT from a gamer at work. The HDD started to error and died in about 2008, just within warranty, so I bought a replacement and then got a refurb back (with something like a whole month of warranty offered on that new one!). Somewhere around 2007-2008 the motherboard seemed to be playing up, so I got a 2nd hand replacement from work, then the PSU was dead/died (I think I killed it while testing), so I got a new 500W in about 2008. In 2010 I ripped out the guts (mobo, processor and memory) and replaced it with a Core i5, but I kept the HDD and graphics card.
So, it depends which bit! The power supply is one of the older bits and was only a Jeantech from PC World. The memory is fairly new (almost exactly 1 year, in fact), but the hard disk is a few years older and is currently running as part of a degraded RAID array (the other disk was the refurb, but taking it out fixed a few errors I was having a month back and reduced the noise level).
On 07/19/2011 09:06 PM, IBBoard wrote:
>
> I didn’t clean the memory contacts - I thought about it, but I wasn’t
> sure what the best material to do it with would be, and I didn’t want to
> make things worse!
just use the eraser on a normal wooden pencil (not the gritty kind on an
ink pen)…don’t get too aggressive with it, and wipe any eraser bits off
with a cloth…
> I’ll see what I can do to check the power supply.
i’ve heard you can buy a PSU tester, but i’ve never actually seen one…
i do know that it is not likely you can test it with a simple voltage
tester because the PSU has some built in sensor and responds to the
draw…so, you have to provide a load or you won’t get accurate
readings…google it, there are ways to check with the power on and the
system providing the load…but, that is not something i’m willing to do.
> I guess it is another
> possible culprit, but I’d have expected lockups and spontaneous
> shutdowns rather than segfaults. Can small power fluctuations mess with
> the memory like that?
everything in the system depends on smooth power at a prescribed
predictable voltage…
> so I got a new 500W in about 2008.
maybe good, maybe not…if i had to guess: probably good (that’s based
on figuring you have a good bit less need than 500W, so it should have a
longer life)…
I don’t know what I’m actually drawing from my power supply, but it should be a reasonable amount less than 500W (Core i5, 2 sticks of memory, a couple of HDDs, an optical drive that is hardly ever used, and a couple of case fans, plus a 7950GT with a Zalman on it). It has been stable enough the rest of the time with that power supply, even when hammering it much harder than I do these days.
I’ll try to clean the memory contacts at some point and see if that improves anything.
We’re going to be upgrading my wife’s computer by the end of the week, so I plan to swap the power supplies and test with that as well. It’s actually an older (but slightly more powerful) PSU, but if it gets better then it’ll be a better indicator that my PSU had a short work life.
> I’ll see what I can do to check the power supply. I guess it is another
> possible culprit, but I’d have expected lockups and spontaneous
> shutdowns rather than segfaults. Can small power fluctuations mess with
> the memory like that?
Major failures will cause the gross effects but noise levels can result in
infuriating errors that appear almost random. Modern memory design is much
more tolerant - or should I say less stressful? - than earlier dynamic
designs but I’ve traced memory failures to seemingly insignificant 100 pF
caps whose sole reason for being was to filter power leads right at the
array connection. One trick I used to isolate such problems was a small
heat gun to preheat portions of the layout at a time. Today, the problem is
more commonly from the power supply itself as DD describes.
Think of the problem in terms of noise on the lines rather than outright
failure.
Sorry for the delay, but everything seems to be okay now.
The weekend after my last post, I upgraded my wife’s PC. We gutted the hardware and basically left the DVD drive and the case. While doing that I noticed that she had the Jeantech power supply that I’d bought, meaning I was using the five-year-old PSU that came in her machine as part of a pre-built unit. I then remembered she’d been having problems with her graphics card in Windows. I must have swapped the power supplies to see if that was the problem. It stabilised her machine and didn’t affect mine until I started using Gnome 3, by which time I’d forgotten about the power supply swap! I put the Jeantech back in my case and all seems to be stable now.
Thanks for the pointers. In future I’ll be more wary of the PSU being on the fritz.