Eternal Reboot - OpenSUSE 11.3

Greetings all. I have come across a problem with my OpenSUSE 11.3 install.

I installed and configured OpenSUSE five months ago, and it ran wonderfully until this morning. To my knowledge I have not changed any configuration or added any software in the last week.

When my system boots, I get the GRUB screen showing my options. When I select the default (OpenSUSE 11.3), the screen goes completely black and the machine reboots. It does this over and over; I have let it go for as many as five reboots just to make sure.

I can load the failsafe kernel and get things running, but I am concerned that I have to use failsafe. I do not know the difference between the failsafe and the standard kernel; if someone could educate me, I would appreciate it.

I have checked my ‘/var/log/messages’ and I think I see what the problem is, although I have no idea how to fix it. Before everything starts shutting down, I get this message:

kdm[1572]: Cannot execute 'grub-set-default': not in $PATH

This line appears in every single reboot's set of messages, and I think it is what is forcing the infinite reboot. What would cause this error, and does anyone have ideas for fixing it? I will look at the GRUB documentation, but as the entry starts with “kdm” I wonder whether it could be a kdm problem. I am very concerned: this just started happening, and I really need this machine to be operational.

On a side note, I have not yet tried booting Windows XP, which resides on a separate hard drive. (As I write this I am in the failsafe kernel, so I don’t think this is a hardware problem. I guess it could be, but since this works I’m hoping not.)

Thanks to all who contribute,
Matt.

The command “grub-set-default” creates a text file called default in your /boot/grub folder. It is 10 bytes long, and if you open it in a text editor you might see just a ‘0’ followed by nothing but blank lines. Here is a write-up on the command:

Invoking grub-set-default - GNU GRUB Manual 0.97

I would wonder whether this default file exists or whether something is wrong with it, though I am not sure why that would stop your standard kernel entry from loading. I suggest you post a copy of your menu.lst file for us to read, and tell us whether the /boot/grub/default file exists. Open a terminal session and enter the following command:

sudo cat /boot/grub/menu.lst
root's password:

Copy and paste the contents of your screen into a message here. It is best to switch to the advanced message mode, highlight all of the pasted text and press the code button (#) to keep it from being reformatted.
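To check the default file described above before posting, a small shell function could report whether /boot/grub/default exists, how big it is, and which entry number it holds. This is a minimal sketch: `show_grub_default` is a made-up name, and it assumes, as described above, that the entry number sits on the first line of the file.

```shell
# Sketch: report whether GRUB legacy's "default" file exists and which
# entry number it holds. Assumes the number is on the first line, as
# described above; the path is a parameter so it can be pointed elsewhere.
show_grub_default() {
  f=${1:-/boot/grub/default}
  if [ ! -e "$f" ]; then
    echo "missing: $f"
    return 1
  fi
  # wc -c counts bytes; tr strips the padding some wc versions add
  printf 'size: %s bytes\n' "$(wc -c < "$f" | tr -d ' ')"
  # head -n 1 prints only the first line, i.e. the saved entry number
  printf 'entry: %s\n' "$(head -n 1 "$f")"
}
```

Called with no argument it checks /boot/grub/default; a "missing:" line would answer that part of the question directly.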

It is OK to use failsafe mode, but it does not use the advanced hardware modes for video or even the CPUs, so you will get your best performance from the standard kernel entry, not from failsafe. If nothing has really changed, that is odd, but some updates might be the trouble.

Thank You,

On 2010-11-08 18:36, taggedzi wrote:

> I can load the failsafe kernel, and get things running, but I am
> concerned that I have to use the failsafe. I do not know the difference
> between the failsafe and the standard kernel, if someone could educate
> me I would appreciate it.

It still runs when there are some problems, but slower: it uses only one core, for example.

> I have checked my ‘/var/log/messages’ and I see what the problem is (i
> think) although I have no idea how to fix. before everything starts
> shutting down, I get this message:
>
>
> Code:
> --------------------
> kdm[1572]: Cannot execute ‘grub-set-default’: not in $PATH
> --------------------

I don’t know why kdm would do that. Maybe it thinks you want to reboot to
one of the entries in the grub menu.

You could boot the normal entry, but into runlevel “3”, i.e. text mode with
no kdm. Just enter a “3” at the grub prompt. Then log in and use “startx” to
get a graphical session without going through kdm.

Where to go from there, I’m unsure now.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

If you are unsure about editing the menu.lst file, go to YaST2 > System > Boot Loader and choose ‘propose new config’ from the drop-down list; it should fix it for you.

I still encounter this problem at random. It is somehow related to USB storage media.

If the problem is happening (perpetual reboot):
  If my external hard drive is on: turn it off -> reboot -> problem fixed.
  If my external hard drive is off: turn it on -> reboot -> problem fixed.

I know this seems “odd”, but whenever I get multiple reboots in a row, following that logic fixes it every time. Every time this happens there is a leftover folder in “/media” for the drive that was causing problems, even though that drive is no longer mounted. I wonder whether the system shuts down without properly unmounting the drive, then tries to remount it and fails. I have no idea. But I can fix the problem when it occurs, and it only happens occasionally.

Anyone have some insight?
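The leftover /media folders described above could be spotted before rebooting with something like the following minimal sketch. It assumes util-linux's `mountpoint` command is available and that the drives are auto-mounted under /media as described; `list_stale_media_dirs` is a made-up helper name. It only lists directories and deletes nothing, and whether a stale folder actually causes the reboot loop is unconfirmed.

```shell
# Sketch: list directories under /media (or another parent) that are
# not currently mount points, i.e. leftovers from a drive that went away.
list_stale_media_dirs() {
  base=${1:-/media}
  for d in "$base"/*/; do
    [ -d "$d" ] || continue   # the glob matched nothing
    d=${d%/}                  # drop the trailing slash added by the glob
    # mountpoint -q exits non-zero when $d is not a mount point
    if ! mountpoint -q "$d"; then
      echo "$d"
    fi
  done
}
```

A reported directory that is empty could then be removed with rmdir, which refuses to delete a directory that still has contents.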

How about posting the output of fdisk -l (that is a lower-case L)?

Also post your /boot/grub/menu.lst.

Check the BIOS and make sure your main disk is the first in boot order.

fdisk -l

Disk /dev/sdb: 251.0 GB, 251000193024 bytes
255 heads, 63 sectors/track, 30515 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000bccba

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         262     2103296   82  Linux swap / Solaris
Partition 1 does not end on cylinder boundary.
/dev/sdb2             262        2873    20972544   83  Linux
/dev/sdb3            2873       30516   222040064   83  Linux

Disk /dev/sda: 80.0 GB, 80000000000 bytes
255 heads, 63 sectors/track, 9726 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00093d71

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        9726    78124063+   7  HPFS/NTFS

Disk /dev/sdd: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x27e9bfe8

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      182401  1465136001    7  HPFS/NTFS

cat /boot/grub/menu.lst
# Modified by YaST2. Last modification on Fri Oct 29 09:00:02 EDT 2010
# THIS FILE WILL BE PARTIALLY OVERWRITTEN by perl-Bootloader
# Configure custom boot parameters for updated kernels in /etc/sysconfig/bootloader

default 0
timeout 8
gfxmenu (hd1,1)/boot/message

###Don't change this comment - YaST2 identifier: Original name: linux###
title openSUSE 11.3 - 2.6.34.7-0.5 (default)
    root (hd1,1)
    kernel /boot/vmlinuz-2.6.34.7-0.5-default root=/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part2 resume=/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part1 splash=silent quiet showopts vga=0x31a
    initrd /boot/initrd-2.6.34.7-0.5-default

###Don't change this comment - YaST2 identifier: Original name: failsafe###
title Failsafe -- openSUSE 11.3 - 2.6.34.7-0.5
    root (hd1,1)
    kernel /boot/vmlinuz-2.6.34.7-0.5-default root=/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part2 showopts apm=off noresume nosmp maxcpus=0 edd=off powersaved=off nohz=off highres=off processor.max_cstate=1 nomodeset x11failsafe vga=0x31a
    initrd /boot/initrd-2.6.34.7-0.5-default

###Don't change this comment - YaST2 identifier: Original name: xen###
title Xen -- openSUSE 11.3 - 2.6.34.7-0.5
    root (hd1,1)
    kernel /boot/xen.gz vgamode=0x31a 
    module /boot/vmlinuz-2.6.34.7-0.5-xen root=/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part2 resume=/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part1 splash=silent quiet showopts vga=0x31a
    module /boot/initrd-2.6.34.7-0.5-xen

###Don't change this comment - YaST2 identifier: Original name: windows###
title Windows
    rootnoverify (hd0,0)
    chainloader +1

###Don't change this comment - YaST2 identifier: Original name: floppy###
title Floppy
    rootnoverify (fd0)
    chainloader +1
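The menu.lst above also answers the earlier question about what failsafe changes: the two entries boot the same kernel image and differ only in their options. A minimal sketch that prints the options present only in the failsafe entry follows; `failsafe_only_opts` is a made-up name, and it assumes the entry titles look like the ones posted here (the standard entry contains “(default)”, the failsafe one “Failsafe”).

```shell
# Sketch: print kernel options that appear only in the Failsafe entry of a
# GRUB legacy menu.lst. Assumes the titles match the ones posted above.
failsafe_only_opts() {
  awk '
    $1 == "title" {
      # Remember which entry the following lines belong to.
      entry = ""
      if (index($0, "(default)")) entry = "std"
      else if (index($0, "Failsafe")) entry = "safe"
    }
    # Fields 3..NF of a kernel line are its options (field 2 is the image).
    $1 == "kernel" && entry == "std"  { for (i = 3; i <= NF; i++) std[$i] = 1 }
    $1 == "kernel" && entry == "safe" { for (i = 3; i <= NF; i++) safe[$i] = 1 }
    END { for (o in safe) if (!(o in std)) print o }
  ' "${1:-/boot/grub/menu.lst}" | LC_ALL=C sort
}
```

Fed the menu.lst posted above, it should print apm=off, edd=off, highres=off, maxcpus=0, nohz=off, nomodeset, noresume, nosmp, powersaved=off, processor.max_cstate=1 and x11failsafe, one per line. nosmp and maxcpus=0 are the “one core only” part mentioned earlier, and nomodeset and x11failsafe presumably account for the simpler video modes.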

There are the files. I have been using the method I posted above (switching the USB drive’s state) and have been able to get around the problem. The trouble is that it only happens intermittently: when it happens it is too late to go back in and investigate, and once I fix it the problem is gone.

My BIOS seems fine to me; it is pointing to the proper boot device.

I think others more qualified than I should answer here, but my non-expert opinion is that there is a problem with what is marked as the ‘active device’.

First, I think sda1 is the MS-Windows OS installed on your PC. Is that correct? Second, I think sdbx is your openSUSE Linux and sdd1 is your external NTFS drive. Is that correct?

I am also surprised there is no sdcx entry in the ‘fdisk’ output, and that suggests to me you may have some entry in your /etc/fstab that is causing sdcx to be missed.

It might help the experts on our forum assist you, if you were to post the contents of your /etc/fstab file as well.

I am puzzled by your configuration: sdb2 is likely your / partition, where /boot/grub/menu.lst resides, but sda1 (the MS-Windows NTFS partition) is flagged as the active partition. IMHO that should not be the case, and I can’t see how your PC can even boot with that configuration. Do you also use another boot manager besides grub?

On 2010-12-01 12:36, oldcpu wrote:

> It might help the experts on our forum assist you, if you were to post
> the contents of your /etc/fstab file as well.

Or the findgrub script.

>
> I am puzzled by your configuration, as I see sdb2 as likely being your
> / drive where /boot/grub/menu.lst likely resides. But I see sda1 (the
> MS-Windows NTFS partition), flagged as the active partition. IMHO that
> should not be the case, and I can’t see how your PC can even boot with
> that configuration.

If the boot disk is sdb, active partitions on sda are not seen and do not
matter. If the boot disk is sda, then he must have grub in the MBR of that
disk, and the active partition is again ignored (or, to complicate things,
stage 1 is in the MBR of sda and stage 2 somewhere on sdb).

If I understand correctly, the reboot cycle happens when kdm complains:

Code:

kdm[1572]: Cannot execute ‘grub-set-default’: not in $PATH

grub-set-default is a script that sets the grub menu entry to be used as
the boot entry for the next reboot (I think). It might be choosing the wrong
one, or failing for some unknown reason and leaving grub in a bad state.

So I would not halt/reboot from KDE. I would close the session, switch to
text mode, and then either type halt on a console or give the three-finger
salute (Ctrl+Alt+Del), which can be configured in inittab to either reboot
or halt.
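Since the message says the script is “not in $PATH”, one thing worth checking from a terminal is whether grub-set-default is installed at all, and where. A minimal sketch follows; `locate_cmd` is a made-up name, and the idea that the script may live in an sbin directory outside the session’s PATH is an assumption, not something this thread has confirmed.

```shell
# Sketch: check whether a command is on the current PATH and, if not,
# whether it exists in an sbin directory instead. The list of sbin
# directories is an assumption, not taken from kdm's configuration.
locate_cmd() {
  if p=$(command -v "$1"); then
    echo "in PATH: $p"
    return 0
  fi
  for dir in /sbin /usr/sbin /usr/local/sbin; do
    if [ -x "$dir/$1" ]; then
      echo "not in PATH, but found: $dir/$1"
      return 0
    fi
  done
  echo "not found: $1"
  return 1
}
```

Running `locate_cmd grub-set-default` as a regular user and again as root (whose PATH normally includes the sbin directories) would show whether this is a PATH problem or a missing file; if a path turns up, `rpm -qf` on it names the package that owns it.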


Cheers / Saludos,

Carlos E. R.

sda = my primary IDE hard drive, running Windows
sdb = my secondary IDE hard drive, running Linux, with 3 partitions
sdd = my external USB hard drive

Here is my fstab:

cat /etc/fstab 
/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part1 swap                 swap       defaults              0 0
/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part2 /                    ext4       acl,user_xattr        1 1
/dev/disk/by-id/ata-Maxtor_4A250J0_A806BC1E-part3 /home                ext4       acl,user_xattr        1 2
/dev/disk/by-id/ata-WDC_WD800BB-75FJA1_WD-WCAJ91986680-part1 /windows/C           ntfs-3g    users,gid=users,fmask=133,dmask=022,locale=en_US.UTF-8 0 0
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
usbfs                /proc/bus/usb        usbfs      noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0

If I remember correctly, I installed Windows on the computer first, on the primary hard drive (I had trouble getting it to install to the secondary for some reason). Then I installed openSUSE on the secondary hard drive. But the Windows boot loader could not boot Linux, so I had to install grub on the boot sector of the Windows partition. (I think it “points” to the /boot folder of my secondary hard drive; I believe this is common on dual-boot systems.)

As I recall, I had one heck of a time getting dual boot to work with two physically separate hard drives. It seemed to work great when I was running everything on a single hard drive. Windows only wanted to install on the primary drive.

Hopefully this can help the gurus. I’ve been using Linux a long time, but I’m stumped on this one.

Can you confirm your assessment of there being no recently installed software by typing in a terminal (as a regular user):

rpm -qa --last

That will give a chronological listing of installed software (newest first), and you will be able to tell whether any software was installed immediately before this problem started.

If that scrolls by too fast, you can redirect it to a text file by typing:

rpm -qa --last > list-of-rpms.txt 

and then open the text file ‘list-of-rpms.txt’ with a text editor to confirm that no rpms were installed immediately before the problem occurred.
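If the list is long, it can also be cut down to packages installed on or after the date the trouble began. The function below is a minimal sketch: `installed_since` is a made-up name, it relies on GNU date’s `-d` option, and it expects one “EPOCH NAME” line per package on stdin, which rpm can produce with a queryformat such as the one in the comment.

```shell
# Sketch: list packages installed on or after a given date.
# Expects "EPOCH NAME" lines on stdin, e.g. the output of:
#   rpm -qa --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n'
# Uses GNU date's -d option to turn the cut-off date into epoch seconds.
installed_since() {
  since=$(date -d "$1" +%s) || return 1
  # Oldest first, so the first lines are the ones closest to the cut-off.
  sort -n | awk -v since="$since" '$1 >= since { print $2 }'
}
```

For example, `rpm -qa --queryformat '%{INSTALLTIME} %{NAME}-%{VERSION}-%{RELEASE}\n' | installed_since 2010-11-01` would list what went in from that day on; the date here is only an example.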

Someone else will need to help re: the booting and rebooting specifics, as I am not skilled in that aspect of Linux.

I am still puzzled by there being no sdc when you ran fdisk. My only explanation is that you had another drive (maybe a USB memory stick) plugged in, then plugged in the external drive, then removed the memory stick, and then ran the ‘fdisk’ command, or something along those lines. It is UNUSUAL to see a jump from sdb to sdd with no sdc, unless what I described took place or an fstab entry caused the skip.
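Since the fstab posted above mounts the internal partitions through /dev/disk/by-id names, those mounts do not depend on which sdX letter a disk receives. To see which letter each ID currently maps to, and whether anything is sitting on sdc, the links can be resolved. The function below is a minimal sketch; `show_disk_ids` is a hypothetical helper, not an openSUSE tool.

```shell
# Sketch: show which kernel device name (sda, sdb, ...) each persistent
# /dev/disk/by-id link currently points to. The directory is a parameter
# so the function can be tried on any folder of symlinks.
show_disk_ids() {
  dir=${1:-/dev/disk/by-id}
  for link in "$dir"/*; do
    [ -L "$link" ] || continue   # skip anything that is not a symlink
    # readlink -f follows the (usually relative) link to the device node
    printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
  done
}
```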

I assume that you do NOT have a USB device as the 1st boot device in your BIOS (which I believe you confirmed above when you reported on your boot device order being checked and being ok).

Very cool, I learned something new.

I will check the list of software. Unfortunately I have installed a good bit since my initial post, but I will try to go back to the date it started. Thanks for the input.

To answer your question, I did my initial disk wiping (fdisk) from a live CD (the ultimate boot disk), nuking both hard drives, and then installed Windows first. For the Linux install I again booted from a live CD (the openSUSE CD) and performed a “normal” install on the secondary hard drive. I had (past tense) another external hard drive at the time, and it was turned on during the install. I remember seeing the option to install to it and shuddering at the possibility of losing my data.

If I find anything in the application list, I will let you know. Thanks for the idea; I didn’t know you could do that.

On 2010-12-01 17:06, oldcpu wrote:

> Code:
> --------------------
> rpm -qa --last
> --------------------

My preferred way to do that is this:

Code:

rpm -q -a --queryformat "%{INSTALLTIME};%{INSTALLTIME:day};\
%{BUILDTIME:day};%{NAME};%{VERSION}-%-7{RELEASE};%{arch};\
%{VENDOR};%{PACKAGER};%{DISTRIBUTION};%{DISTTAG}\n" \
| sort | cut --fields="2-" --delimiter=";" \
| tee rpmlist.csv | less -S

or

rpm -q -a --queryformat "%{INSTALLTIME} %{INSTALLTIME:day} \
%{BUILDTIME:day} %-30{NAME} %15{VERSION}-%-7{RELEASE} %{arch} \
%25{VENDOR}%25{PACKAGER} == %{DISTRIBUTION} %{DISTTAG}\n" \
| sort | cut --fields="2-" --delimiter=" " > rpmlist

It is a list sorted by date, with some interesting fields. The first one is
in CSV format, ready for importing into a spreadsheet or database. The other
is separated by spaces.


Cheers / Saludos,

Carlos E. R.