Boot hangs?

I don’t know if this has to do with SuSE or my hardware, but sometimes when I turn on the computer the boot process gets to a certain point and then freezes. The only fix I’ve found is to turn it off and on again. Is this a common problem with SuSE/Linux?

I think it’s a problem with either SuSE or my specific installation, because the system POSTs without a problem. This never happened with the other Linux distributions I’ve tried.

Try pressing Esc when the green ‘loading’ screen appears, and see whether you can tell where it’s hanging…

Or edit your /boot/grub/menu.lst (make a backup first!) and temporarily change the kernel line so it contains ‘splash=verbose’ instead of ‘splash=silent’; then it will always show you the boot messages…
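A minimal sketch of that edit as a shell function, assuming GNU sed (the sed that openSUSE ships) and kernel lines that currently contain ‘splash=silent’; the name make_boot_verbose is hypothetical:

```shell
#!/bin/sh
# make_boot_verbose FILE (hypothetical helper):
# keep a backup copy as FILE.bak, then switch every splash=silent to splash=verbose
make_boot_verbose() {
    menu="$1"
    cp -- "$menu" "$menu.bak" || return 1
    # GNU sed's -i edits the file in place
    sed -i 's/splash=silent/splash=verbose/g' "$menu"
}

# Usage, as root:
#   make_boot_verbose /boot/grub/menu.lst
# To undo: cp /boot/grub/menu.lst.bak /boot/grub/menu.lst
```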

Also, see here for (comparatively…) graceful crash recovery with the Magic SysRq keys. It doesn’t always work, depending on what crashed the system, but it’s better than turning it off and on again…

http://www.susegeek.com/general/linux-kernel-magic-sysrq-keys-in-opensuse-for-crash-recovery/
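Magic SysRq only works if the kernel allows it. A minimal sketch for reading the current setting, following the kernel’s documented meaning of /proc/sys/kernel/sysrq (0 = disabled, 1 = all functions allowed, larger values = a bitmask of allowed functions); the helper name sysrq_status is hypothetical:

```shell
#!/bin/sh
# sysrq_status VALUE (hypothetical helper): explain a kernel.sysrq value
sysrq_status() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "bitmask $1: only some functions enabled" ;;
    esac
}

# Usage:
#   sysrq_status "$(cat /proc/sys/kernel/sysrq)"
# Enable all functions until the next reboot (as root):
#   echo 1 > /proc/sys/kernel/sysrq
# After a hang, hold Alt+SysRq and press R, E, I, S, U, B slowly, one key at a time.
```

SysRq is handled by the running kernel, so it cannot help with a freeze that happens in the BIOS or in GRUB before the kernel has loaded.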

I just realized that, if anything, it’s a GRUB problem. If it gets to the SuSE loading screen, everything is fine; it only ever freezes before the SuSE splash screen is shown. Is there a way to safely update GRUB?

That’s a tricky one… I doubt it’s a GRUB bug, because the GRUB code is pretty stable; and if it is one, the chances of it having been fixed are low, because it will be very rare.

More likely it’s corruption or a quirky configuration.

Did you check the ISO and the burned media? See the pre-installation sticky at the top of the forum for advice on that.
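Checking the ISO boils down to comparing its checksum with the one published next to the download. A minimal sketch, assuming the published sum is MD5 (use sha1sum or sha256sum instead if that is what the download page lists); verify_iso is a hypothetical helper name:

```shell
#!/bin/sh
# verify_iso FILE EXPECTED (hypothetical helper):
# compare FILE's MD5 sum with the published value; exit status 0 means they match
verify_iso() {
    actual=$(md5sum "$1" | cut -d ' ' -f 1)
    if [ "$actual" = "$2" ]; then
        echo "OK: checksum matches"
    else
        echo "MISMATCH: got $actual, expected $2"
        return 1
    fi
}

# Usage (file name and sum are placeholders):
#   verify_iso openSUSE-DVD.iso 0123456789abcdef0123456789abcdef
```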

Do you have an unusual setup, such as RAID or multiple drives? It’s possible that the BIOS sometimes puts them in a different order, and that can play havoc.
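With more than one drive, one place to look is GRUB legacy’s /boot/grub/device.map, which records which Linux disk GRUB takes each BIOS drive, such as (hd0), to be. If the BIOS enumerates the drives in a different order on some boots, the (hdN) names in menu.lst can end up pointing at a different disk than intended. A small sketch that prints the mapping; list_grub_drives is a hypothetical helper, and it assumes the usual one-pair-per-line format:

```shell
#!/bin/sh
# list_grub_drives FILE (hypothetical helper):
# print "BIOS drive -> Linux device" for each entry in a device.map-style file
list_grub_drives() {
    # drop comment lines; entries look like "(hd0)   /dev/sda"
    grep -v '^[[:space:]]*#' "$1" | while read -r bios dev; do
        if [ -n "$bios" ]; then
            echo "$bios -> $dev"
        fi
    done
}

# Usage:
#   list_grub_drives /boot/grub/device.map
#   ls -l /dev/disk/by-id/    # match those devices to physical drive models
```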

What precisely does it do? Blank screen? Blinking cursor? The single word ‘boot:’?

Any extra information may help…

I have both a 160GB HDD and a 640GB HDD. I have openSUSE installed on the 160GB HDD, and that HDD is set as the boot device.

I can’t explain what it does other than that it just…freezes. It always happens at a random point during the boot process. I can’t put my finger on what it is.

So it freezes at different points, but always before it’s got to the splash screen?

I’d still rule out corruption first. Again: did you test the ISO and the media?

Is it possible to disable the other drive in the BIOS, then reboot the system enough times to reliably check whether that fixes it?

some1onlinux wrote:
> happens at a random point during the boot process. I can’t put my
> finger on what it is.

Confuseling is doing well (don’t stop), but I want to add:

If it is hanging before the first green screen pops up, it has to be either in the BIOS or in GRUB (of course)… My machine is several years old but still so fast that it FLASHES through from the very first BIOS screen to the green screen; it is almost impossible to see where one ends and the next begins… I wonder if maybe you have a hardware problem and it is not getting to GRUB at all…

A flaky ground somewhere… just a slightly loose plug on a hard drive, etc., can cause things like you describe…

Can you turn off the machine and safely wiggle (better, actually, to unplug and replug) the drive cables… ESPECIALLY if you (maybe) recently added that second drive… maybe a cable got accidentally loosened…

To get a bit more specific (given your lack of specific pointers)…

Laptop or desktop, old or new, any hardware added lately, etc., etc.…

Have you recently changed anything in your BIOS setup?


palladium

I didn’t check the ISO or the media. Maybe I should have.

As for the BIOS, yes. I had to go in there to enable suspend and hibernate. I also had to disable Cool’n’Quiet and lower the CPU voltage (because it was getting a little hot, IMO).

I’ve also run Prime95 through Wine and haven’t had any errors.

This also never happened with other Linux distributions I’ve tried.

I should point out that this only happens occasionally. Most of the time, booting is very fast.

^ It won’t let me edit my post anymore.

I changed the northbridge and chipset voltages to 1.1 V because it was getting too hot inside my case. I had also enabled CPU quiet fan, which I have just disabled. I haven’t had any problems with the lower voltages in months, though. Is SuSE more demanding than other Linux distros?

some1onlinux wrote:
> disable Cool N quiet and lower the voltage of
> the CPU … too hot inside my case

I see you like to fiddle with the knobs and switches (not that there is anything wrong with that)…

Well, nothing wrong with that until you can’t boot and don’t know which knob or switch you turned/flipped last, so that you can unturn/unflip it and see if the non-boot situation magically goes away…

I guess you are kinda on your own to fiddle with the controls until you undo what you broke…

Hey: it is YOUR machine, and you can run it any way you want to… but my personal opinion and practice is to use the manufacturer’s recommended voltages/settings/defaults… Sure, it might not burn rubber getting away from the red light, but I bet it will last longer, run cooler, last longer, be more stable, last longer, be more reliable/dependable, last longer, cost less to operate, last longer, and therefore have a lower unit cost per operation over its lifetime…

Oh, and it will also last longer…

You might consider restoring the defaults (from the list you wrote down before beginning the exploration) and then changing stuff with a plan and a record… change ONE thing, and run it for a week (at least 24 hours)…
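One low-tech way to keep that plan and record is a dated log file. A minimal sketch; the helper name log_change and the default file ~/bios-changes.log are hypothetical:

```shell
#!/bin/sh
# log_change DESCRIPTION [FILE] (hypothetical helper):
# append a timestamped line to a change log so every BIOS tweak can be undone later
log_change() {
    logfile="${2:-$HOME/bios-changes.log}"
    printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M')" "$1" >> "$logfile"
}

# Usage:
#   log_change "CPU quiet fan: Enabled -> Disabled"
#   cat ~/bios-changes.log
```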

> Is SuSE more demanding than other Linux distros?

I’ve done no tests, but I guess they are all using pretty much the same kernel… I’d guess the difference in strain (heat) generated between, say, openSUSE KDE with desktop effects on and Ubuntu GNOME with desktop effects off might be no greater than the difference between openSUSE KDE/on and openSUSE GNOME/off…

All things are relative.


palladium

If my system is unstable, why am I not getting any errors in Prime95? That’s why I think it’s a software problem.

I had to change the voltages because they were set incorrectly when I chose ‘auto’. The CPU was maxed out at 1.375 V when it’s supposed to be only 1.25 V. But it would not let me change the CPU voltage without also making me set the northbridge voltage manually. I set that to 1.1 V, the lowest stable voltage according to my tests.

I think SuSE/GRUB had a conflict with my CPU quiet fan setting. I disabled that last night.

If you verify to your satisfaction that you’ve found the fix, that sort of information is very much appreciated in the hardware compatibility list…

HCL/Main Boards - openSUSE

some1onlinux wrote:
> If my system is unstable, why am I not getting any errors in Prime95?
> That’s why I think it’s a software problem.

Sorry, I have no idea what Prime95 is… and your problem may very well be software or software-related…

If it is not in the BIOS (which we don’t know, do we?) and is occurring AFTER the BIOS has handed off to GRUB, we should be getting an error in the logs… what do they say?

> I think SuSE/Grub had a conflict with my CPU quiet fan setting. I
> disabled that last night.

“had” as in it is now ok??


palladium

Prime95 is a program used primarily by overclockers to test system stability. Basically, all it does is push CPU and RAM usage to 100% and record any errors. It has to be left running for hours to give a reliable result.

> if it is not in the BIOS (which we don’t know, do we?) and is occurring AFTER the BIOS has handed off to grub we should be getting an error in the logs…what do they say

Where do I find the logs?

> “had” as in it is now ok??

It’s too early to tell.

System logs end up in /var/log, though I couldn’t tell you precisely where you should be looking, or for what.

‘boot.msg’ and ‘messages’ maybe?

Grepping them for ‘PANIC’, ‘ERROR’, ‘Error’, etc. sometimes helps…
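Those searches can be bundled into one case-insensitive pass. A minimal sketch; scan_log is a hypothetical helper, and the pattern list (panic, error, warn) is only a starting point:

```shell
#!/bin/sh
# scan_log FILE (hypothetical helper):
# print matching lines with their line numbers; -i ignores case,
# so PANIC, ERROR, Error and error all match
scan_log() {
    grep -n -i -E 'panic|error|warn' "$1"
}

# Usage, as root:
#   scan_log /var/log/boot.msg
#   scan_log /var/log/messages | tail -n 50
```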

As an addendum: running one program and finding no hardware errors does not guarantee their absence. It is quite common to find memory errors in Linux that aren’t visible in Windows, or vice versa, apparently. They simply use different code and different ‘hardware paths’.

I guess a ‘stress tester’ should exercise most things, but I also guess that rather depends on how well written it is…

I was looking around in the BIOS and found that IGPU boost was enabled. I looked this up and found that it’s an option for overclocking the integrated GPU. I disabled this, so that should help somewhat.

Did you find any errors or warnings in the logs?

These (one at a time, in a terminal) might be revealing:


sudo cat /var/log/boot.msg | tail

sudo cat /var/log/boot.msg | grep -i error

sudo cat /var/log/boot.msg | grep -i warn

sudo cat /var/log/messages | tail -n50 | grep -i error

sudo cat /var/log/messages | tail -n50 | grep -i warn

Sorry, I do not know how to find out what is going on during the BIOS sequence…


palladium

Here are the results. I’m not sure what any of it means.

brian@linux-0a1m:~> sudo cat /var/log/boot.msg | tail
root's password:
<6>[    6.441904] loop: module loaded
<6>[    6.454529] EXT4-fs (sdb1): barriers enabled
<6>[    6.454777] kjournald2 starting: pid 796, dev sdb1:8, commit interval 5 seconds
<6>[    6.454919] EXT4-fs (sdb1): internal journal on sdb1:8
<6>[    6.454922] EXT4-fs (sdb1): delayed allocation enabled
<6>[    6.454925] EXT4-fs: file extents enabled
<6>[    6.514380] EXT4-fs: mballoc enabled
<6>[    6.514396] EXT4-fs (sdb1): mounted filesystem with ordered data mode
Kernel logging (ksyslog) stopped.
Kernel log daemon terminating.
brian@linux-0a1m:~> sudo cat /var/log/boot.msg | grep -i error
<3>[    4.728058] nForce2_smbus 0000:00:01.1: Error probing SMB2.
<4>[    4.932672] amd64_edac: probe of 0000:00:18.2 failed with error -22
brian@linux-0a1m:~> sudo cat /var/log/boot.msg | grep -i warn
<4>[    0.168889] ACPI Warning: Incorrect checksum in table [OEMB] - 2E, should be 27 20090521 tbutils-246
<4>[    4.932658] EDAC amd64: WARNING: ECC is NOT currently enabled by the BIOS. Module will NOT be loaded.
brian@linux-0a1m:~> sudo cat /var/log/messages | tail -n50 | grep -i error
brian@linux-0a1m:~> sudo cat /var/log/messages | tail -n50 | grep -i warn
brian@linux-0a1m:~>
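The number in angle brackets at the start of each boot.msg line is the kernel’s log priority, which helps gauge how serious a message is: <3> is an error, <4> a warning and <6> merely informational. A small sketch that translates it, based on the standard kernel levels 0 (emerg) to 7 (debug); kernel_level is a hypothetical helper name:

```shell
#!/bin/sh
# kernel_level LINE (hypothetical helper):
# name the kernel log priority in a leading "<N>" prefix, as used in boot.msg
kernel_level() {
    case "$1" in
        "<0>"*) echo "emerg" ;;
        "<1>"*) echo "alert" ;;
        "<2>"*) echo "crit" ;;
        "<3>"*) echo "err" ;;
        "<4>"*) echo "warning" ;;
        "<5>"*) echo "notice" ;;
        "<6>"*) echo "info" ;;
        "<7>"*) echo "debug" ;;
        *) echo "none" ;;
    esac
}

# Usage, as root: tag each line of the boot log with its priority name
#   while IFS= read -r line; do
#       printf '%s\t%s\n' "$(kernel_level "$line")" "$line"
#   done < /var/log/boot.msg
```

For example, the amd64_edac error appears to be explained by the warning logged just before it: ECC is not enabled by the BIOS, so the module is not loaded.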