CPU hard lockup Opensuse 12.1

Hello,

I got a hard lockup on the CPU. After the first lockup (while configuring Thunderbird)
I was able to restart. Then, after another lockup (I don’t remember any more what
I was doing), I get a black screen after login. If I press the power button, though,
I get the logout screen in the blackness, and then I just shut down.

Is there any point in debugging or trying to repair the system?

On 11/28/2011 05:46 PM, Adenozinas wrote:
> Any point in debugging or trying to repair system ?

impossible to answer with such little information…

“black screen” see http://tinyurl.com/23mgej6


DD http://tinyurl.com/DD-Caveat
openSUSE®, the “German Engineered Automobiles” of operating systems!

With the information given, I do not think it is possible to give you a reasonable answer. Just to make the point:

  • Is this a 32-bit or 64-bit system?
  • What graphics card (type, processor, driver) do you use?
  • What processor, mainboard and RAM do you use?
  • Did you check your RAM?
  • Did you check the CPU temperature?
  • Did you overclock your system, or do you run your RAM at a higher voltage than usual?
  • Do you have access to the command line, or is the screen black from the very start of boot?
  • Did you perform any update before the second “black screen” occurred?
  • Do you use KDE or GNOME?

If you do not give information about your system (the more the better), it will not be possible to help, that is sure.
Good luck.

Hello,

I had a feeling I put it in a rather obscure way :) sorry.

Asus FK3a laptop, dual boot: openSUSE 12.1 64-bit with the KDE desktop environment (installed from DVD) and M$ Vista 32-bit.
I have been running every version of openSUSE on this laptop since 10.2 and never had this kind
of kernel panic.

No overclocking and no kernel modification done on the system.
Processor: Turion 64 X2 TL-64.
Graphics card: ATI HD2600

  • Is it a graphics problem? I installed FGLRX from the repository listed on the wiki page (Index of /mirror/ati/openSUSE_12.1). It worked OK, but
    gave about half a second of flickering on desktop effects, so I uninstalled it. After that I observed no problems.

  • Is it RAM? On the kernel panic I got some numbers spat out that looked like memory addresses (not sure about it). M$ Vista works fine though, so I think there is no physical damage to the memory.

You cannot tell whether the RAM is OK that way. If you have the original DVD, put it in the drive, boot from it and try the memory test. Let it run for a whole night to be sure that the RAM is OK. These tests give you a clear result either at once or after a few hours.
Which graphics driver are you currently using? Did you use 11.4 before, and did it give any problems? If not, what was the last system installed?
Did you try to start in failsafe mode and work a little to see if the problem manifests?
Do you have the problem while on battery power or while on mains power? Or does it make no difference?
Why do you say it is a kernel panic? Do you have terminal output of the error, or logs? Then please post them here in “code” tags. (You continue to give too little information, IMO.)

Hi, thanks for the advice and for being patient with me so far.
I’ll start from the other end.
I mention a kernel panic because I suddenly got an error message in “terminal/init3-mode”: kernel panic - not syncing hard lockup detected on CPU0; and after that 6-7 lines of stuff I didn’t write down, which could be memory addresses. Since it is a “hard lockup” it is not logged in the log files, so I can’t provide you the exact output; I found this at http://en.opensuse.org/openSUSE:Bugreport_kernel. In addition, it says there:

If this happens, you’re in trouble. You can stop reading now, and we wish you good luck in debugging this.
So that is why I ask here: is it worth doing anything, or should I simply reinstall the system? I can’t give you more details about the laptop at the moment, because I don’t have it here with me; I am writing what I remember. If you

It is possible that the system has been damaged. If you are not bothered by the 40 min install of openSUSE 12.1, a reinstall might be a good idea. But one should check the hardware beforehand.
Another question I would have is: was this a one-time or two-time error, or is there a specific action that triggers it? If 11.4 was working well and 12.1 after a fresh install does not, the problem is more likely the kernel, not necessarily the graphics card. Especially if you already tried to run the machine in failsafe mode and if your earlier openSUSE installations were also 64-bit. The fact that the 32-bit Windows runs well and a 64-bit system on AMD doesn’t could point to a RAM problem if you use more than 4 GB on your system. You did not provide this information. Even 4 GB might be an issue in these circumstances. So I would still run the RAM test.
Of course you should be aware that there is no guarantee that the same problem will not arise after a reinstall. If you do reinstall and the system runs well afterwards, please come back and report it, so we know how the situation was resolved.

Hi,

Another question I would have is: was this a one-time or two-time error, or is there a specific action that triggers it?

It was a two-time error. The first one occurred while configuring Mozilla Thunderbird, the second one under different circumstances (I don’t remember which). And after the second one I could not get any picture after login (I hear the welcome sound from the speakers but see nothing).

If 11.4 was working well and 12.1 after a fresh install does not, the problem is more likely the kernel, not necessarily the graphics card. Especially if you already tried to run the machine in failsafe mode and if your earlier openSUSE installations were also 64-bit.

Correct, I have been using every single openSUSE version since 10.2, all 64-bit, and didn’t have this kernel message before. Failsafe mode did not help at all.

Thank you for the suggestion of a RAM test. I have 3 GB of RAM, and I will run the memory check from the DVD. Then I’m planning to reinstall and see if the bug reappears.

On 11/29/2011 05:26 AM, Adenozinas wrote:
>
> stakanov;2411965 Wrote:
>> You cannot tell if the RAM is O.K. If you have the original DVD, put it
>> in the drive, boot and try the memory test. Let it run for a whole night
>> to be sure that the RAM is OK. These tests give you a clear result either
>> at once or after a few hours.
>> Which graphics driver are you currently using? Did you use 11.4 before
>> and did it give any problems? If not, what was the last system installed?
>>
>> Did you try to start in failsafe mode and work a little to see if
>> the problem manifests?
>> Do you have the problem while on battery power or while on mains power?
>> Or does it make no difference?
>> Why do you say it is a kernel panic? Do you have terminal output of the
>> error or logs? Then please post them here in “code” tags. (You continue
>> to give too little information, IMO).
>
>
> Hi, thanks for the advice and for being patient with me so far.
> I’ll start from the other end.
> I mention a kernel panic because I suddenly got an error message in
> “terminal/init3-mode”: kernel panic - not syncing hard lockup detected
> on CPU0; and after that 6-7 lines of stuff I didn’t write down, which
> could be memory addresses. Since it is a “hard lockup” it is not logged
> in the log files, so I can’t provide you the exact output; I found this at
> http://en.opensuse.org/openSUSE:Bugreport_kernel. In addition, it says
> there:
>> If this happens, you’re in trouble. You can stop reading now, and we
>> wish you good luck in debugging this.
> So that is why I ask here: is it worth doing anything, or should I simply
> reinstall the system? I can’t give you more details about the laptop at
> the moment, because I don’t have it here with me; I am writing what I
> remember. If you

A reinstall should not be necessary, but we must have the stuff that is spit out
with the kernel panic. If you do not need to do something to force the panic,
boot up and switch to the debug console by using CTRL-ALT-F10. When the panic
happens, take a picture of the screen and post it, or write down the info. On
the chance that it does not fail this way, use CTRL-ALT-F7 to get back to the GUI.

MemTest86+ really isn’t that good any more for multi-core CPUs and testing RAM. Try Prime95 blend stress test under Linux (though it’s not the easiest for a beginner).

That would be “mprime” (ftp://mersenne.org/gimps/mprime266-linux64.tar.gz) for Linux, if I am not mistaken?

Yeah, mprime is the binary name. At the main menu, choose the torture test, “13” for custom IIRC, then 4 threads (assuming a quad core), then punch in a number that’s ~600 MB less than your total system memory, and 2 minutes per FFT size. Let it rip for a couple of hours or, even better, overnight. Press Ctrl-C and verify that it says “0 errors, 0 warnings” for all workers. What I really like to do is bump the system clock up 3% and test that; then I know there’s a bit of margin if it passes. If it fails at 3% “extra”, the hardware setup is awfully close to the hairy edge. I am allergic to flaky hardware (I had to send some DDR3-1600 RAM back because it had almost zero margin).
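If you don't want to do the arithmetic by hand, here is a minimal Python sketch of the numbers to type into the torture test prompts. It only assumes a Linux box where /proc/meminfo has the usual "MemTotal: ... kB" line; the function names are made up for illustration, the 600 MB headroom and the 2 minutes per FFT size are just the rule of thumb from this post, and nothing here talks to mprime itself.

```python
import os

RESERVE_MB = 600       # headroom left for the OS (rule of thumb from this post)
MINUTES_PER_FFT = 2    # time per FFT size (rule of thumb from this post)


def mem_total_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in whole MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kilobytes = int(line.split()[1])  # second field is the size in kB
            return kilobytes // 1024
    raise ValueError("no MemTotal line found")


def torture_settings(meminfo_text, cpu_count, reserve_mb=RESERVE_MB):
    """Suggest values for the custom torture test prompts (hypothetical helper)."""
    total_mb = mem_total_mb(meminfo_text)
    return {
        "threads": cpu_count,                          # one worker per core
        "memory_mb": max(total_mb - reserve_mb, 0),    # never go negative
        "minutes_per_fft": MINUTES_PER_FFT,
    }


def read_meminfo(path="/proc/meminfo"):
    """Read the kernel's memory summary; only works on Linux."""
    with open(path) as handle:
        return handle.read()
```

On the machine under test you would run something like `print(torture_settings(read_meminfo(), os.cpu_count() or 1))` and type the resulting thread count and memory figure into mprime's custom torture test.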

As an aside, Prime95 is no good for checking SB (Sandy Bridge) CPU stability … use IBT (IntelBurnTest) for that, but it’s Windows only IIRC.

Hi,

Thanks for tip for set-up of mprime!

The Prime95 test overnight did not report any errors.
I’ll give mprime a try if I get the network running. Wireless of course does not run on init 3.
I’ll try a wired connection.

On 12/03/2011 01:46 PM, Adenozinas wrote:
>
> Hi,
>
> Thanks for tip for set-up of mprime!
>
> The Prime95 test overnight did not report any errors.
> I’ll give mprime a try if I get the network running. Wireless of
> course does not run on init 3.
> I’ll try a wired connection.

If you use the ifup method, the wireless will work at run level 3. I have one
laptop with wireless that does not even have a GUI installed.

Hi,

Sorry boys and girls, this was MemTest86+ from the DVD, not Prime95.
I don’t feel very confident about trying to bake my CPU on this 4-year-old laptop, but let’s say it’s
for the best :)

I think it is OK. I had a mainboard with faulty RAM on it, and MemTest86+ came out with it right away. I think the other poster advised Prime95 mainly for the case where you are using “overclocked to the edge” machines. If the test ran fine, we start from the assumption that the RAM is OK. Now let us focus on fixing your real problem. As you may have read in IWfinger’s post, I also think it could be very useful if you take a photo of the screen (with a camera or a good cell phone camera) when it happens. You can then post it with the help of pastebin (or, if the function works here in the forum, as an image). That may give excellent indications.

OK, bingo. mprime did the trick: I got the error reproduced.
There were three test options; I took the hardest one, with parameters
as close as possible to those suggested in post #12 here. The test failed in approx. 15 seconds.
I took a picture of the kernel panic output, but I don’t know where to upload it (I have some sort of memory
that the forum has a picture upload place). Could anyone help me out?
I then ran the weakest stress test in mprime and it ran fine for 50 min.
mprime suggests that in this situation it is a memory controller or RAM problem.

You can use the service pastebin (http://pastebin.com/) for the photo. It is quite self-explanatory and requires neither payment nor a subscription.
What puzzles me is that you get the error only in 12.1, but still. Let’s see: if you have a friend with the same memory modules, you can try to put them in (if you are confident to do so and he is too). If the test then runs fine and the machine does not lock up, you buy a high-quality memory module (of the maximum size possible) for the laptop. I would advise Kingston, but I would say that most of the other manufacturers are equally good when it comes to recent modules.
This would be the “cheapest and fastest” way to know. Quick and dirty. If the lockup still reproduces, somebody here could debug the kernel message (which is far above my knowledge, but Iwfinger would be a very good candidate, I guess). But so as not to waste time, a test with other RAM should help. Modules are cheap, especially DDR and DDR2 SO-DIMM modules. Probably it is even only one slot with max 2 GB. You are looking at about 25 euros for a 2 GB quality module. Not expensive at all.

Hi,

Thanks for suggestions!
Here’s the one from when I ran mprime:
[photo on ImageBam]

and here’s the spontaneous one:

[photo on ImageBam]

Hello again,

I reinstalled openSUSE 11.3 and ran the same mprime test that locked up my laptop within a few seconds under openSUSE 12.1,
and guess what: no problems whatsoever. The “strongest” stress test ran for 50 min, all good. So I don’t know what the actual reason is, but I think
I’ll stick with openSUSE 11.4 or 11.3. openSUSE 12.1 appears “too cutting edge” for me…