11.2 Hangs During Boot

I’ve just installed openSUSE 11.2 on a Dell PowerEdge 1750 server. Unfortunately, the sucker freezes when the kernel is starting its services. The exact place the freeze occurs is inconsistent, but it’s always before I can log in.

The freeze happens whether I select “openSUSE 11.2 - 2.6.31…5-0.1 (default),” “Failsafe – openSUSE 11.2 - 2.6.31.5-0.1,” or “Xen – openSUSE 11.2 - 2.6.31.5-0.1.”

I’ve also booted manually from the grub command line. Same thing: The kernel loads, starts its services, and–whammo!–freezes.

I just need to get some hints or ideas on how to track down the problem. I’m aware of stepping through the startup process by changing the PROMPT_FOR_CONFIRM setting in the /etc/sysconfig/boot file, and I can view the boot file in the grub menu using the “cat” command, but I can’t figure out how to edit the sucker.

So, hints and ideas please. Thanks.

Okay. I’ve made some progress.

I booted the server off the original install CD and selected “Rescue System” from the grub boot menu.

This allowed me to get to the Linux prompt, log in as root, mount /dev/sda2 (the disk partition on which the / filesystem resides), and edit the /etc/sysconfig/boot file.
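
For anyone following along, the rescue-system steps look roughly like the sketch below. The /mnt mount point and the helper name set_prompt_for_confirm are my own assumptions for illustration, and the sed -i form assumes GNU sed. The mount and umount lines are commented out because they need root on the actual machine.

```shell
# Sketch only: run from the rescue system as root.
# /mnt is an assumed mount point; /dev/sda2 holds the / filesystem here.
# mount /dev/sda2 /mnt

# Hypothetical helper: switch on step-by-step confirmation in a
# sysconfig boot file by rewriting its PROMPT_FOR_CONFIRM line.
set_prompt_for_confirm() {
    boot_file="$1"
    # Replace the existing setting, whatever its current value is.
    sed -i 's/^PROMPT_FOR_CONFIRM=.*/PROMPT_FOR_CONFIRM="yes"/' "$boot_file"
}

# set_prompt_for_confirm /mnt/etc/sysconfig/boot
# umount /mnt
```

Editing the file by hand with vi works just as well; the helper only saves retyping if the setting has to be flipped back and forth.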

So now I’ll reboot and trace the kernel startup as best as I can.

It appears that the problem is with the smartd service. The following message appears when it attempts to start: “Starting smartd /etc/init.d/rc5.d/S13smartd: line 190: 3680 Killed $SMARTD_BIN$smart_opts.” The server then hangs.

I looked into the smartd startup script, /etc/rc.d/smartd. Line 190 is the start of a case statement. Line 198, however, contains “$SMARTD_BIN$smart_opts.” I changed it to “$SMARTD_BIN $smart_opts”–that is, I added a space between the two variable strings–but it still errors out with the same message.
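
To show what that space changes in general, here is a small sketch of how the shell splits the two forms into words. The binary path and the option string are placeholders I made up, not the values from the real init script, and count_words is a hypothetical helper.

```shell
# Hypothetical helper: print how many arguments it received.
count_words() {
    echo $#
}

SMARTD_BIN="/usr/sbin/smartd"     # assumed path, for illustration only
smart_opts="--example-option"     # placeholder, not a real smartd option

# Without a space the two expansions fuse into a single word.
joined=$(count_words $SMARTD_BIN$smart_opts)

# With a space the shell sees a command name plus one argument.
spaced=$(count_words $SMARTD_BIN $smart_opts)

echo "joined: $joined word(s), spaced: $spaced word(s)"
```

If smart_opts in the real script happens to begin with a space (I have not checked), the unquoted joined form would still split into two words, which might be why adding the space made no difference to the error.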

My only option seems to be to keep smartd from starting.

Okay. Fixed the problem by disallowing smartd from automatically starting during boot: chkconfig smartd off.
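
A quick way to confirm the change stuck is to look at the per-runlevel listing from chkconfig --list. The sketch below assumes the usual "0:off 1:off 2:on ..." output layout; enabled_in_any_runlevel is a hypothetical helper name, and the chkconfig lines are commented out because they need root on the server.

```shell
# Hypothetical helper: reads a `chkconfig --list <service>` line on stdin
# and succeeds if the service is switched on in at least one runlevel.
enabled_in_any_runlevel() {
    grep -q '[0-6]:on'
}

# On the server itself, as root:
# chkconfig smartd off
# chkconfig --list smartd | enabled_in_any_runlevel \
#     || echo "smartd will not start at boot"
```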

The server now boots perfectly.

In case you did not know, smartd is a daemon that monitors the SMART system on the drives. Perhaps the disk controller in the Dell is having problems with this? Is it a RAID controller? Dell tends to have some weird hardware and BIOS setups on their servers.

BTW, very good troubleshooting <:)

The 1750 is using a Dell PERC 4/Di SCSI RAID controller with the latest firmware, version 422D from July 17, 2007. This is an older server that I’m just using for testing, so the drives’ SMART capabilities aren’t all that important.