Regular crashes & boot problems

Hi,

I’ve been running OS11.3 for a few weeks. Since the beginning I’ve been experiencing a few problems, which are starting to get annoying.

  • Regularly, while the system is running, the OS crashes. Everything freezes: the pointer doesn’t move and nothing responds. At first the system stays alive (Amarok keeps playing, for example), then it locks up completely and crashes or reboots. Nothing stops this except a hard shutdown.
    (The LEDs blink in sync, so I suppose it means “kernel panic”.)

  • Often, when I boot, nothing appears: the system is loading, but the screen is black. I see the normal “flash” when the Nvidia driver takes over, but everything stays black. There is simply no video output.
    To get past this, I just have to reboot, again and again. Until now I hadn’t searched for a solution, but last time I had to reboot 11 times before the progress bar appeared and the system started!

Any idea about these problems?

bugounet wrote:
> I’m running on OS11.3
> Any idea about these problems?

you do know that openSUSE 11.3 is software for TESTING only, right?

that is, it is not our currently released and supported software…if
you want help in this forum you need to be running openSUSE 11.0,
11.1 or 11.2 (only)…

on the other hand if you are an old hand at Linux and want to help
squash the KNOWN BUGS in that not yet ready for prime time openSUSE
11.3 then you are welcome to do that but please confine your
discussion of problems over in the correct TESTING forum:
http://forums.opensuse.org/get-help-here/pre-release-beta/

and, to be an effective tester you do need to be keen on posting the
bugs you find, either to bugzilla
<http://en.opensuse.org/Submitting_Bug_Reports> or directly to the
developers’ mailing list (do NOT assume posts to the pre-release-beta
forum will ever be seen by a developer…you must use bugzilla)


DenverD (Linux Counter 282315)
CAVEAT: http://is.gd/bpoMD
posted via NNTP w/TBird 2.0.0.23 | KDE 3.5.7 | openSUSE 10.3
2.6.22.19-0.4-default SMP i686
AMD Athlon 1 GB RAM | GeForce FX 5500 | ASRock K8Upgrade-760GX |
CMedia 9761 AC’97 Audio

Oops, I think I made a mistake, sorry. The version I have is the latest stable release, so it is 11.2. I didn’t verify the number… -_-"

I should note, however, that I run KDE 4.4, which was of course installed from the official repositories. But since the problem occurs at boot, I don’t think it is related to KDE.

bugounet wrote:
> I didn’t verify the number… -_-"

when seeking help it is kinda important to give potential helpers the
facts…

> I should note, however, that I run KDE 4.4, which was of course
> installed from the official repositories. But since the problem occurs
> at boot, I don’t think it is related to KDE.

ok…first i know the ‘standard’ repair routine for some other
operating systems is to just reboot and hope everything is great next
time…and, if that doesn’t work then you just boot a few more times
to see if it will eventually fix itself…and, then reinstall over
and over until it magically works…

well, Linux is more predictable than that and there are not so many
different things that cause problems which will magically go away by
booting…

so, let’s start trying to FIX the problem by finding out what the
problem might be:

first, did you have these problems immediately after the install
process had booted one time, or did they come some days later?


DenverD

(Yep, I know it must be fixable. I’ve been running Linux for a few years, but until now the problem wasn’t too annoying.)

I think it started immediately.
Until now, I only had to try booting 2 or 3 times at most to get a real boot; I never had such a long run of failures. This can happen after a crash, or when the PC was off.
Last time, after the 11 retries, it booted to the command line. I ran fsck, it repaired a broken inode, and then the system was able to boot.

Hmm, I have to confess that I hadn’t tried running fsck manually until now. But there had never been a need for so many reboots.

bugounet wrote:
> (Yep, I know it must be fixable. I’ve been running Linux for a few
> years, but until now the problem wasn’t too annoying.)
>
> I think it started immediately.
> Until now, I only had to try booting 2 or 3 times at most to get a real
> boot; I never had such a long run of failures. This can happen after a
> crash, or when the PC was off.
> Last time, after the 11 retries, it booted to the command line. I ran
> fsck, it repaired a broken inode, and then the system was able to boot.
>
> Hmm, I have to confess that I hadn’t tried running fsck manually
> until now. But there had never been a need for so many reboots.

well, maybe we don’t have software problems, but rather hardware
troubles!

desktop or laptop? brand?

how old is your hard drive? (hate to tell you, but it might be dying)
do you have all your “good stuff” backed up on another machine…

and, what about your power supply? if it is several years old and you
have added stuff to the machine, you may be overtaxing the power
supply…and, voltage fluctuations can make really strange stuff happen…

how about heat problems?

how much RAM…


DenverD

My thoughts: a piece of broken/breaking hardware. Linux should boot about the same every time, not crash once, boot a couple of times and then crash again. That indicates a hardware error. Could be the disk, could be the videocard, could be the RAM.
Can you give some more specs, like in DD’s and my signature?

The file /var/log/messages should contain information about the crashes. You can only read it with root permission; in a terminal window, run:


su -c 'grep kernel /var/log/messages | more'
(enter root password)

Pressing the spacebar scrolls one page down. Look carefully for strings like error, crash, panic.
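If the raw output is too long to page through, a small filter along these lines might help. It is only a sketch: the function name scan_kernel_log and the keyword list are my own choices, and it assumes the log lives at /var/log/messages as above.

```shell
# scan_kernel_log [FILE]
# Print only the kernel lines that mention a likely problem keyword.
# FILE defaults to /var/log/messages (reading it requires root).
scan_kernel_log() {
    log_file="${1:-/var/log/messages}"
    # First keep kernel lines, then keep those matching any keyword,
    # ignoring upper/lower case.
    grep 'kernel' "$log_file" | grep -i -E 'error|crash|panic|nmi|exception'
}
```

Saved in a file and sourced from a root shell, `scan_kernel_log | more` pages through the matches the same way as the command above.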

Hmm…

I also thought about hardware, mainly the hard drive (I had freezes on Mandriva too, but I thought they were due to the many distribution bugs I ran into). But I ran a test, and everything was OK.

I was on Kubuntu a few months ago. I migrated to Mandriva looking for a better KDE experience, and it caused me ACPI troubles I had never seen before. Since other users had this problem, I thought it was just Mandriva’s fault (there were a lot of other bugs, so I then moved to openSUSE, which seems great).

But I saw references to ACPI in the logs…

Jun 7 00:15:51 linux-820q kernel: [12346.989381] Uhhuh. NMI received for unknown reason a1 on CPU 0.
Jun 7 00:15:51 linux-820q kernel: [12346.989381] You have some hardware problem, likely on the PCI bus.
Jun 7 00:15:51 linux-820q kernel: [12346.989381] Dazed and confused, but trying to continue
Jun 7 00:15:51 linux-820q kernel: [12347.003043] NVRM: Xid (0001:00): 6, PE0003
Jun 7 00:15:51 linux-820q kernel: [12347.083800] NVRM: Xid (0001:00): 6, PE0001
Jun 7 00:15:51 linux-820q kernel: [12347.162655] NVRM: Xid (0001:00): 6, PE0001

Jun 7 08:37:21 linux-820q kernel: [14.504137] Clocksource tsc unstable (delta = -68594819 ns)

Jun 7 08:37:26 linux-820q kernel: [23.313201] ACPI Exception: AE_TIME, Returned by Handler for [EmbeddedControl] 20090521 evregion-424
Jun 7 08:37:26 linux-820q kernel: [23.313256] ACPI Error (psparse-0537): Method parse/execution failed [\_GPE._L02] (Node f6c36e64), AE_TIME
Jun 7 08:37:26 linux-820q kernel: [23.313363] ACPI Exception: AE_TIME, while evaluating GPE method [_L02] 20090521 evgpe-568

Jun 7 20:37:28 linux-820q kernel: [22.230996] ACPI: EC: missing confirmations, switch off interrupt mode.

“You have some hardware problem, likely on the PCI bus.”
Hmm… ^^"

After a bit of searching, it seems that this is not necessarily a real hardware problem. Other users get the same messages on laptops.
Moreover, 3 months ago I was on Kubuntu and never had this kind of problem. As for power, my battery is almost dead; I don’t know whether that can have an influence.

And, again, I can add that I have Windows 7 on this computer too, and it has never had any problem (no freezes or boot failures like with Mandriva/openSUSE). I know it doesn’t include the same checking tools, and maybe it doesn’t report problems that Linux would show… But if there were a real, serious hardware failure, I think Windows would show a few symptoms too.

Hm… In conclusion… I don’t have any precise idea of the problem. xD
Do you see any other possibility, or did I miss something?
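To get a feel for how often each kind of message from the excerpts above shows up, a tally like the following could be run against the log. This is a rough sketch under my own assumptions: the function name tally_kernel_messages is made up for illustration, and the four patterns are simply copied from the lines quoted in this post.

```shell
# tally_kernel_messages [FILE]
# Count kernel log lines for each message type seen in the excerpts.
# FILE defaults to /var/log/messages (reading it requires root).
tally_kernel_messages() {
    log_file="${1:-/var/log/messages}"
    for pattern in 'NMI received' 'NVRM: Xid' 'ACPI' 'Clocksource tsc unstable'; do
        # grep -c prints 0 and exits non-zero when nothing matches,
        # so "|| true" keeps the loop going under "set -e".
        count=$(grep 'kernel' "$log_file" | grep -c -F -e "$pattern" || true)
        printf '%s %s\n' "$count" "$pattern"
    done
}
```

Each output line is a count followed by the pattern it belongs to, so a message type that appears around every crash should stand out from one that only appears once at boot.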

bugounet wrote:
> Hm… In conclusion… I don’t have any precise idea of the problem. xD
> Do you see any other possibility, or did I miss something?

you never told us much (directly) about your hardware but now i’ve
surmised you have a laptop new enough to support Win7 (so probably
came with it), with an almost dead battery…

and, i wonder if you ever bothered to check if the hardware on board
is compatible <http://en.opensuse.org/HCL/> with Linux…as, just
because it runs Win7 is zero proof that it can reliably run any
Linux distro without problems…

and, by the way: yes, flashing keyboard lights do indicate a kernel
panic…and, in my experience those panics are most often the result
of faulty RAM or other memory problems (which may be induced by lots
of stuff, including using the wrong drivers, because the hardware is
misidentified)…boot from your install media and run the memory
check for at least 12 hours…

i can say this with a lot of confidence: the symptoms you relate point
to a very sick system…either hardware or software…or both…

i’d guess a complete, fresh, format and install (including /home)
might remove all symptoms…for a while, maybe…if not then there
is absolutely either a hardware failure or hardware incompatibility
problem…

then if after a while the symptoms return it is because of a very few
things:

  • a system update is installed which is not compatible with the system
  • an intermittent hardware fault reappears
  • the user is interjecting settings which are killing the system
  • other?


DenverD

> you never told us much (directly) about your hardware but now i’ve
> surmised you have a laptop new enough to support Win7 (so probably
> came with it), with an almost dead battery…

Oh sorry, I forgot again.

The laptop is 3 years old. No brand; it was assembled in a small shop where I usually order all my computer equipment.
2 GB DDR2 RAM, GeForce 8600, Intel dual core 2.2.

The PC has always run mainly Linux, with Windows alongside for specific software (XP, then Seven).
I ran Ubuntu for a long time, then Kubuntu, Mandriva (3 months), and openSUSE (for a few weeks now), plus a few short tests of other distributions.

So I don’t think there is any incompatibility with the Linux kernel or anything else, unless some hardware problem has appeared recently (within the last 3 months: Mandriva had problems, Kubuntu did not). But if it’s not software, then yes, it’s probably hardware. :confused:

Nevertheless, while the system is running, apart from these few kernel panics (1 or 2 per day at most), everything is OK, with no particular problems (maybe Wi-Fi detection is not as good as on Windows 7, but that’s another problem).

For now, I have removed my battery to see whether the problems continue without it. I’ll do a few boot tests as soon as I have time.

One suggestion: if you are confident in doing so, take the laptop apart and reseat all hardware modules. I bought my current laptop because I thought the previous one had a dying screen. When the new one was ready for work, I disassembled the old one completely and reassembled it; my son has been working on it for a year now, all OK.

OK, no link with the battery: I had the same problems without it.

Disassembling the PC… not really possible. I can go as far as needed to clean or replace the thermal grease, but beyond that the parts of the PC don’t seem to be removable.

Some news after a long time:

In fact, I think the problem isn’t hardware. Since I upgraded to 11.3, I haven’t had a single crash… until I installed the Nvidia driver.

Curiously, the same day I installed it, I had a freeze/crash, and then a black screen at reboot.
If the Nouveau driver could handle graphics acceleration, I would use it, but it can’t. Do you know if there is some special procedure to make the Nvidia driver more stable?
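One thing that may be worth checking first is which graphics kernel module is actually loaded, since the proprietary nvidia module and nouveau are not meant to be active at the same time. A minimal sketch, assuming the usual /proc/modules layout (module name in the first column); the function name loaded_gpu_drivers is just illustrative:

```shell
# loaded_gpu_drivers [FILE]
# Print "nvidia" and/or "nouveau" if the corresponding kernel module
# is listed in FILE, which defaults to /proc/modules.
loaded_gpu_drivers() {
    modules_file="${1:-/proc/modules}"
    # The first field of each /proc/modules line is the module name.
    awk '$1 == "nvidia" || $1 == "nouveau" { print $1 }' "$modules_file"
}
```

If this prints only nvidia, the proprietary driver is the one in use; if it prints nouveau, or nothing at all, the Nvidia driver did not load and the black screen may have a different cause than driver instability.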