Intermittent boot freezes on openSUSE 12.2

All,

Since installing 12.2 I have been experiencing intermittent boot freezes when booting through the default GRUB menu entry.
There does not seem to be a pattern to when the freeze occurs.
I have not seen the problem when booting via the GRUB recovery mode entry.
Suspecting the NVIDIA driver, I added “nomodeset” to the GRUB options, but that did not solve the problem.

I have now also added “debug” to the GRUB option list, as described at https://en.opensuse.org/SDB:Debugging_boot_hang

Occasionally I see the following error message when the system freezes:

initcall amd64_edac_init +0x0/0x1000 [amd64_edac_mod] returned -19 after…usecs

Any suggestions for follow-up actions are welcome.

System: Host: Hostname Kernel: 3.4.11-2.16-desktop x86_64 (64 bit)
Desktop KDE 4.9.5 Distro: openSUSE 12.2 (x86_64) VERSION = 12.2 CODENAME = Mantis
Machine: Mobo: MSI model: 890FXA-GD65 (MS-7640) version: 3.0 Bios: American Megatrends version: V18.0 date: 12/31/2010
CPU: Hexa core AMD Phenom II X6 1090T (-MCP-) cache: 3072 KB flags: (lm nx sse sse2 sse3 sse4a svm)
Clock Speeds: 1: 800.00 MHz 2: 2400.00 MHz 3: 1600.00 MHz 4: 3200.00 MHz 5: 800.00 MHz 6: 800.00 MHz
Graphics: Card: NVIDIA GF104 [GeForce GTX 460]
X.Org: 1.12.3 drivers: nvidia (unloaded: fbdev,nv,vesa,nouveau) Resolution: 1920x1080@60.0hz
GLX Renderer: GeForce GTX 460/PCIe/SSE2 GLX Version: 4.2.0 NVIDIA 304.43
Audio: Card-1: NVIDIA GF104 High Definition Audio Controller driver: snd_hda_intel Sound: ALSA ver: 1.0.25
Card-2: Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA) driver: snd_hda_intel
Network: Card: Realtek RTL8111/8168B PCI Express Gigabit Ethernet controller driver: r8169
IF: eth0 state: up speed: 1000 Mbps duplex: full mac: 6c:62:6d:c9:e0:4e
Drives: HDD Total Size: 2330.5GB (123.7% used)
1: /dev/sda INTEL_SSDSA2M080 80.0GB
2: /dev/sdb WDC_WD5000AAKS 500.1GB
3: /dev/sdc SAMSUNG_HD103SJ 1000.2GB
4: /dev/sdd WDC_WD7500AAKS 750.2GB

Please post output of

systemd-analyze blame

It should show where the system “hangs”, which I suspect is what it’s doing.

On 01/20/2013 04:06 PM, u20380 wrote:
> intermittent boot freezes . . . There does not seem to be a pattern
> when the freeze takes place . . . Occasionally I see the following
> error message when the system freezes:

it may not be important in your case, but sometimes apparently totally
random problems spring from consistent but varying hardware problems…

yes, i know the system has probably been like a rock, but now with this
new 12.2 there is a problem–and logically it must be a software
problem–but still it could be that the new kernel/system is
particularly sensitive to tiny voltage variances/spikes/drops which
might be from (for example) a weak or failing power supply unit
(PSU)… (or a loose ground somewhere, or … or … or)

i see the machine has four hard drives, a power munching CPU, and
multiple video/audio cards…could it be that your PSU is just not up to
the task…or, maybe was one day but not today…

PSUs do fail eventually, and often they degrade slowly, causing all
kinds of seemingly unrelated problems for months and months . . .

i can’t guess the total power needs for your system, but i’d guess you
probably need at least a 500W unit in good working order (and has had
the dust and chicken bones blown out lately)

on the other hand, the next poster might know exactly what is upsetting
the kernel…at least this google seems to indicate it is a kernel
problem http://tinyurl.com/a8ukvtj


dd
openSUSE®, the “German Engineered Automobile” of operating systems!

Here is the output:

3518ms avahi-daemon.service
3472ms syslog.service
3447ms systemd-logind.service
1680ms SuSEfirewall2_init.service
1601ms media-win_h.mount
1524ms systemd-tmpfiles-setup.service
797ms media-lin_backup.mount
744ms postfix.service
720ms home-user-MyDocuments.mount
603ms udev-trigger.service
445ms systemd-vconsole-setup.service
356ms console-kit-log-system-start.service
356ms fbset.service
271ms systemd-sysctl.service
270ms SuSEfirewall2_setup.service
267ms media-win_d.mount
232ms udev.service
149ms systemd-remount-api-vfs.service
148ms systemd-modules-load.service
133ms cpufreq.service
132ms ntp.service
90ms remount-rootfs.service
90ms systemd-user-sessions.service
89ms cycle.service
87ms network.service
85ms sys-fs-fuse-connections.mount
85ms media.mount
77ms var-lock.mount
74ms systemd-readahead-collect.service
72ms systemd-readahead-replay.service
71ms var-run.mount
69ms localnet.service
67ms NetworkManager.service
61ms network-remotefs.service
52ms nmb.service
50ms smb.service
46ms xdm.service
46ms udev-root-symlink.service
46ms sys-kernel-security.mount
39ms home.mount
38ms dev-mqueue.mount
36ms bluez-coldplug.service
35ms dev-hugepages.mount
33ms sys-kernel-debug.mount
26ms acpid.service
23ms windows-c.mount
22ms console-kit-daemon.service
19ms rc-local.service
10ms home-user-Share.mount
4ms upower.service
3ms rtkit-daemon.service
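
A long blame list like the one above is easier to scan when filtered. The following is only a minimal sketch in Python, assuming the "NNNNms unit" line format shown in this thread; the function names and the 1000 ms threshold are illustrative choices, not part of systemd. Keep in mind that units start in parallel, so these times do not add up to the total boot time, and a unit that never finishes may not appear in the list at all.

```python
import re

# Matches lines such as "3518ms avahi-daemon.service" (the format shown above).
BLAME_LINE = re.compile(r"^\s*(\d+)ms\s+(\S+)\s*$")


def parse_blame(text):
    """Return a list of (milliseconds, unit) tuples, slowest first."""
    entries = []
    for line in text.splitlines():
        match = BLAME_LINE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return sorted(entries, reverse=True)


def slower_than(entries, threshold_ms):
    """Keep only the units that took longer than threshold_ms to start."""
    return [(ms, unit) for ms, unit in entries if ms > threshold_ms]


# A few lines copied from the output posted above.
sample = """3518ms avahi-daemon.service
3472ms syslog.service
1601ms media-win_h.mount
4ms upower.service"""

for ms, unit in slower_than(parse_blame(sample), 1000):
    print(f"{ms:>6} ms  {unit}")
```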

@dd : the hardware is a bit confusing, but just an onboard audio card, and an NVIDIA card (probably with HDMI output, hence also a sound processor).

@OP: I don’t see anything strange in the output. Best would be to take a look at /var/log/messages and the dmesg output, and see what happens around the time the system hung/froze.
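
Going through those logs by eye can be tedious, so here is a rough Python sketch of one way to narrow them down. The keyword list is purely an assumption and should be adjusted to whatever looks relevant; the sample lines are invented placeholders, not real kernel messages.

```python
# Keywords are an assumption; adjust them to whatever looks relevant.
KEYWORDS = ("error", "fail", "unstable", "timeout")


def suspicious_lines(lines, keywords=KEYWORDS):
    """Return (line_number, text) pairs for lines mentioning any keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


# Invented placeholder lines, only to show the call.
sample = [
    "placeholder: device ready\n",
    "placeholder: request TIMEOUT on port 1\n",
    "placeholder: mount failed\n",
]

for number, line in suspicious_lines(sample):
    print(f"{number}: {line}")
```

Since an open file is iterable line by line, suspicious_lines(open("/var/log/messages")) should work the same way, although reading that file usually requires root.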

Thanks for sharing your thoughts. Indeed the system has a lot of power consumers in it.
The power supply is a Seasonic M12II Bronze 520W.

I myself was thinking that perhaps a critical memory timing, or a disk not appearing in time to be mounted, might cause the problem.
What still puzzles me is the fact that the system starts without problems in recovery mode. To me that suggests it is more a software than a hardware issue.

On 01/20/2013 05:06 PM, Knurpht wrote:
> just an onboard audio card, and an NVIDIA card

you don’t see four hard drives, with 123.7% used??

hmmmm…maybe he edited those out after the msg was passed to the
gateway (hate that that is even possible!)


dd

On 01/20/2013 05:06 PM, u20380 wrote:
> What still puzzles me is the fact that the system starts without
> problems in recovery mode.

then to find “the problem” one needs only to compare the kernel boot
options passed during a “recovery mode” boot and a normal boot, and then
experiment with various options until you find what it is that is
causing the freezes (i’d guess

that was pretty easy to do before systemd, but i have no idea if there
is even such a thing as /boot/grub/menu.lst still :-( <i may have to stay
on 11.4 forever.>
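
The comparison itself can be done mechanically. After each kind of boot, /proc/cmdline shows the kernel options that were actually in effect; the Python sketch below simply diffs two such lines. Both example strings are hypothetical placeholders, not the real options from this machine, and the naive whitespace split assumes no option contains a quoted space.

```python
def split_options(cmdline):
    """Split a kernel command line into a set of individual options."""
    return set(cmdline.split())


def compare_cmdlines(normal, recovery):
    """Return (only_in_normal, only_in_recovery) as sorted lists."""
    normal_opts = split_options(normal)
    recovery_opts = split_options(recovery)
    return (sorted(normal_opts - recovery_opts),
            sorted(recovery_opts - normal_opts))


# Both strings are hypothetical placeholders; paste in the real lines
# saved from /proc/cmdline after a normal and a recovery boot.
normal = "root=/dev/sda2 quiet nomodeset"
recovery = "root=/dev/sda2 nomodeset apm=off processor.max_cstate=1 x11failsafe"

only_normal, only_recovery = compare_cmdlines(normal, recovery)
print("only in normal boot:  ", only_normal)
print("only in recovery boot:", only_recovery)
```

The options that appear only in the recovery line are the candidates to add, one at a time, to the normal entry.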

but, i GUESS adding one of these to the normal boot options will result
in ‘never’ having a boot freeze (but, there WILL be other side issues
like running hotter, or less battery life, or or or or or):

CAUTION: read the caveat in my sig before trying any of these…and get
someone here to help you find the actual list of failsafe boot options
on YOUR machine…don’t try any of mine (below) that are not on your
system!

apm=off
noresume
nosmp
maxcpus=0
edd=off
powersaved=off
nohz=off
highres=off
processor.max_cstate=1
nomodeset
x11failsafe


dd http://tinyurl.com/DD-Caveat

I have noticed the error message “Clocksource tsc unstable”, followed by an unrecoverable ohci_hcd error.
I have added “processor.max_cstate=1” to the GRUB options. It seems this limits how deep the CPU power-saving states (C-states) are allowed to go.

Adding “processor.max_cstate=1” seems to have resolved the boot freeze.
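
To confirm that the option really reached the kernel on a given boot, the running command line can be checked. This is only a sketch in Python: the helper names are made up for illustration, the example string is a hypothetical placeholder, and the only system detail relied on is that Linux exposes the boot options in /proc/cmdline.

```python
def parse_cmdline(cmdline):
    """Map each kernel option to its value (None for bare flags)."""
    options = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options


def max_cstate_from(cmdline):
    """Return the processor.max_cstate value as an int, or None if absent."""
    value = parse_cmdline(cmdline).get("processor.max_cstate")
    return int(value) if value else None


def read_running_cmdline(path="/proc/cmdline"):
    """Read the command line of the currently running kernel."""
    with open(path) as handle:
        return handle.read().strip()


# Hypothetical placeholder; on the real machine use read_running_cmdline().
example = "root=/dev/sda2 quiet processor.max_cstate=1"
print(max_cstate_from(example))
```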

Still, it is an unsatisfying workaround…