openSUSE 12.1: boot problem: Mount Race when Using Software RAID (md0)

I’ve been upgrading systems from openSUSE 11.4 to openSUSE 12.1. This message describes a problem I encountered which I think is intrinsic to openSUSE 12.1 itself, rather than being an artifact of the upgrade process; but, I could be wrong. I do not yet have a fix for this problem.

I have a system with two hard drives. For convenience, I’ll refer to these drives as /dev/sda and /dev/sdb, although in fact they appeared as /dev/sde and /dev/sdf this morning, which I described in a separate thread.

/dev/sda1 is “/”. /dev/sda2 and /dev/sdb2 form a software RAID1 array, /dev/md0, which contains “/home”.

This morning this system failed to boot. After dealing with a device enumeration problem (which I described in another thread), the boot process failed with what looked like a nouveau error. dmesg revealed that this was just the last in a series of failure messages that started when “/home” failed to mount.

“/home” failed to mount because the system attempted to mount it before md had finished binding partitions into “/dev/md0”. Here’s what I saw:

  • md starts to build /dev/md0
  • an attempt is made to mount /home on /dev/md0, which fails because of a superblock problem
  • md then reports that it has built /dev/md0.

I believe that the underlying problem is an error in the dependency descriptions used by the new systemd system initialization procedure introduced in openSUSE 12.1. Local filesystem mounts should wait until after md initialization has completed. I haven’t used systemd before, and I’m still reading up on it, and I don’t have a workaround at this time.
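To make the required ordering concrete, here is a minimal sketch of "wait for md, then mount" as plain shell. It assumes the usual /proc/mdstat layout, where an assembled array shows up as a line like "md0 : active raid1 ...". The function names md_is_active and wait_for_md are made up for illustration; this is not how systemd or boot.md actually does it.

```shell
#!/bin/sh
# Sketch only: report whether an md array is listed as active.
# Usage: md_is_active NAME [FILE]   (FILE defaults to /proc/mdstat)
md_is_active() {
    name=$1
    file=${2:-/proc/mdstat}
    # An assembled array appears as e.g. "md0 : active raid1 sdb2[1] sda2[0]";
    # an array that is not yet started appears as "md0 : inactive ...".
    grep -q "^${name} : active" "$file" 2>/dev/null
}

# Poll once per second until the array is active or the timeout expires.
# Usage: wait_for_md NAME TIMEOUT_SECONDS [FILE]
wait_for_md() {
    name=$1
    remaining=$2
    file=${3:-/proc/mdstat}
    while ! md_is_active "$name" "$file"; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

On a live system something like `wait_for_md md0 30 && mount /home` would express the intended sequence. The proper fix still belongs in the boot dependency descriptions, not in a polling loop.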

Could be that the change in drive order is messing things up. I’d concentrate on why that is happening. What are the new sda, sdb etc drives?

would help to see fdisk -l output

Could you post a copy of your /etc/fstab file? Have you considered placing the /home mount as the very last line in your fstab file?

Thank You,

hi,

Same problem here: during boot sequence, mounting / or /home fails with superblock problem, sending me to recovery console.

Exiting console with ^D leads to a successful boot.

Here’s my fstab:

% cat fstab
/dev/disk/by-id/ata-SAMSUNG_HD154UI_XXX-part4 swap                 swap       defaults              0 0
/dev/disk/by-id/ata-SAMSUNG_HD154UI_XXX-part4 swap                 swap       defaults              0 0
/dev/md1             /                    ext4       acl,user_xattr        1 1
/dev/md0             /boot                ext4       acl,user_xattr        1 2
/dev/md2             /home                ext4       acl,user_xattr        1 2
proc                 /proc                proc       defaults              0 0
sysfs                /sys                 sysfs      noauto                0 0
debugfs              /sys/kernel/debug    debugfs    noauto                0 0
devpts               /dev/pts             devpts     mode=0620,gid=5       0 0

I’ll try placing /home at the end of the file and let you know.
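For anyone comparing setups, this small sketch lists which fstab entries sit on md devices, i.e. which mounts can lose the race described above. The function name md_mounts is hypothetical, and it assumes the arrays are referenced as /dev/mdN (as in the fstab above) rather than by UUID or label.

```shell
#!/bin/sh
# Sketch only: print "device mountpoint" for every fstab entry on /dev/mdN.
# Usage: md_mounts [FILE]   (FILE defaults to /etc/fstab)
md_mounts() {
    # Skip comment lines; keep entries whose first field is /dev/md<number>.
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/md[0-9]+$/ { print $1, $2 }' "${1:-/etc/fstab}"
}
```

Run against the fstab above, it would list the “/”, “/boot” and “/home” entries.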

On 2011-11-21 22:16, CraigMiloRogers wrote:
> I believe that the underlying problem is an error in the dependency
> descriptions used by the new systemd system initialization procedure
> introduced in openSUSE 12.1. local filesystem mounts should wait until
> after md initialization has completed. I haven’t used systemd before,
> and I’m still reading up on it, and I don’t have a workaround at this
> time.
>

You can easily boot temporarily with the old systemV: at the boot prompt
use the F5 menu entry.

In the mail lists they have mentioned these bugs:

https://bugzilla.novell.com/show_bug.cgi?id=731230
https://bugzilla.novell.com/show_bug.cgi?id=731135


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

I was having the exact same problem with a recent 11.4 > 12.1 upgrade at work. My fstab file looks nearly identical to yours.

I ended up doing the same as suggested in a separate response here and choosing the older system V boot from the Grub menu.

I am away from work for the remainder of the holiday, but will follow this thread and am willing to try most anything to get the new systemd boot to go. I’ve got it running on some other systems (non-mdraid) and they saw nearly a 33% boot improvement.

Regards

In the normal SysV init procedure, service initialization dependencies are handled by assigning them to sequential steps in the initialization process. For example, consider “/etc/init.d/boot.d” (the names that appear below are symbolic links to files in “/etc/init.d”):

jib2:~ # ls /etc/init.d/boot.d
K01boot.cycle K04boot.localfs S08boot.md
K01boot.ipconfig K06boot.md S11boot.localfs
K01boot.klog K06boot.rootfsck S13boot.cleanup
K01boot.ldconfig K08boot.device-mapper S13boot.cycle
K01boot.localnet K09boot.udev S13boot.klog
K01boot.scpm K10boot.startpreload S13boot.scpm
K01boot.sysctl S01boot.proc S13boot.swap
K01boot.udev_retry S01boot.startpreload S13boot.udev_retry
K02boot.cleanup S02boot.udev S14boot.apparmor
K02boot.clock S03boot.rootfsck S14boot.ldconfig
K02boot.loadmodules S04boot.clock S14boot.sysctl
K02boot.proc S05boot.device-mapper S15boot.ipconfig
K02boot.swap S05boot.loadmodules
K03boot.apparmor S05boot.localnet

The RAID service (md) is started by “S08boot.md” (a link to “/etc/init.d/boot.md”), which is run before the local filesystem service, “S11boot.localfs” (a link to “/etc/init.d/boot.localfs”). This ensures that boot.md has run mdadm and, in the normal course of events, has assembled the local RAID devices (e.g., “/dev/md0”) from suitable partitions, before the localfs service attempts to mount their contents (in my case, mounting “/dev/md0” on “/home”).
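The ordering can be checked mechanically from the link names. The sketch below reads the two-digit sequence number from the S links and compares two services. The function names start_seq and starts_before are invented for this example, and it assumes the SNNname link naming shown in the listing above.

```shell
#!/bin/sh
# Sketch only: print the two-digit start sequence of a service, taken from
# its S<NN><service> link in a boot.d-style directory.
# Usage: start_seq DIR SERVICE
start_seq() {
    dir=$1
    svc=$2
    for link in "$dir"/S[0-9][0-9]"$svc"; do
        # If the glob matched nothing it stays literal; skip it.
        [ -e "$link" ] || [ -L "$link" ] || continue
        base=${link##*/}
        # Characters 2-3 of "S08boot.md" are the sequence number "08".
        printf '%s\n' "$base" | cut -c2-3
        return 0
    done
    return 1
}

# Succeed if FIRST is started before SECOND.
# Usage: starts_before DIR FIRST SECOND
starts_before() {
    a=$(start_seq "$1" "$2") || return 2
    b=$(start_seq "$1" "$3") || return 2
    # Strip one leading zero so "08" is compared as the decimal number 8.
    [ "${a#0}" -lt "${b#0}" ]
}
```

With the listing above, `starts_before /etc/init.d/boot.d boot.md boot.localfs` should succeed, since 08 comes before 11.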

There are comments at the beginning of these scripts, called LSB comments, a remnant of activity by the Linux Standard Base. These comments are intended to document the inter-service dependencies. There is code in systemd (see “service.c”) that parses the LSB comments, although I do not know if or when this parsing code is actually run.

I suspect that the LSB comments in “/etc/init.d/boot.localfs” are incorrect; or, at least, service.c’s interpretation is suspect. boot.localfs includes “boot.md” in its Should-Start: list. I suspect that boot.md should be in the Required-Start: list, instead.
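To compare the two lists without opening the script in an editor, a sketch like the following pulls one field out of the LSB comment block. It assumes the standard "### BEGIN INIT INFO" / "### END INIT INFO" markers; the function name lsb_field is made up here.

```shell
#!/bin/sh
# Sketch only: print the value of one LSB header field from an init script.
# Usage: lsb_field FIELD SCRIPT    e.g. lsb_field Should-Start /etc/init.d/boot.localfs
lsb_field() {
    field=$1
    script=$2
    # Look only between the LSB markers, and print what follows "# FIELD:".
    sed -n "/^### BEGIN INIT INFO/,/^### END INIT INFO/s/^# *${field}:[[:space:]]*//p" "$script"
}
```

Running it once with Should-Start and once with Required-Start against “/etc/init.d/boot.localfs” shows which list boot.md is in.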

Systemd maintains (or, at least, uses) a bunch of dependency information in “/lib/systemd”, and in particular in “/lib/systemd/system”. There is a “localfs.service” file, and “local-fs.target” and “local-fs-pre.target” files, which are supposed to encode the dependencies needed by the local filesystem boot-time mounting service. There is no mention of a “md.service”, and no encoding, that I can see, to express the requirement that boot.md (or its equivalent in the systemd world) must complete before boot.localfs (or its equivalent) is started.

If you use the systemctl command, you can see that an “md.service” is really created. I do not know how this takes place. I am fairly confident in my conclusion that however “md.service” gets created, it is being done at the wrong time with respect to “localfs.service”.
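One way to test this suspicion is to ask systemd what it thinks a unit is ordered after. The sketch below only parses text of the form "After=unit1 unit2 ...", which is what `systemctl show -p After UNIT` prints. The function name ordered_after is hypothetical, and I am assuming the unit names mentioned above (localfs.service, md.service) are the right ones to query on 12.1.

```shell
#!/bin/sh
# Sketch only: read "After=..." output on stdin and succeed if the given
# unit appears in that list.
# Usage: systemctl show -p After localfs.service | ordered_after md.service
ordered_after() {
    wanted=$1
    # Drop the "After=" prefix, put one unit per line, match the whole line literally.
    sed -n 's/^After=//p' | tr ' ' '\n' | grep -qxF "$wanted"
}
```

If the pipeline in the usage comment fails, systemd has no ordering that makes localfs.service wait for md.service, which would match the race seen at boot.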

A workaround was mentioned by a prior poster. When the grub menu is present, hit F5 and select SysV initialization instead of Systemd.

Does anyone know whether it is possible to make SysV init the default choice? I’ll research this as time permits, but perhaps someone knows offhand.

I found this bit of information:

If for some reason, systemd does not work for you, you can still use the old sysV-init by pressing F5 in the bootloader. If you want to permanently use the old init, just do zypper rm systemd-sysvinit and accept the installation of ‘sysvinit-init’

I must tell you that at some point with openSUSE releases, this may become like swimming upstream to stick with SysV.

Thank You,

Yes, I saw a similar description in the openSUSE 12.1 release notes. I was hoping to find a grub/menu.lst option to override the default… and here it is.

Using YAST->Boot Loader, I added the following option to the bootstrap kernel command line:

init=/sbin/sysvinit
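To confirm after a reboot that the option actually reached the kernel, a sketch like this extracts the init= value from a kernel command line string. The function name cmdline_init is invented for this example.

```shell
#!/bin/sh
# Sketch only: print the value of the last init= parameter in a kernel
# command line string (prints nothing if there is none).
# Usage: cmdline_init "$(cat /proc/cmdline)"
cmdline_init() {
    # One parameter per line, keep only init=..., take the last occurrence.
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^init=//p' | tail -n 1
}
```

On a system booted with the option above, the usage line should print /sbin/sysvinit.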

How did I arrive at this? Well…

It turns out that “/boot/message” contains the GUI program used by grub, but owned by the gfxboot package. The runtime package is “gfxboot-branding-openSUSE”. The source package is “gfxboot-4.4.7-2.1.4.src.rpm”. The specific source code is in the source package “/usr/src/packages/SOURCES/themes/openSUSE/src/dia_otheropts.inc”, which is tarred in “/usr/src/packages/SOURCES/openSUSE.tar.bz2”.

I am sure there are good reasons for all this complexity.

Along the way, I encountered two additional bugs in openSUSE 12.1:

  1. The source code repository for openSUSE 12.1 is not configured to allow access by YaST->Software Management. On the other hand, that may be better than the situation for openSUSE 11.4, which allows you to configure the source repository, and downloads, but then offers to switch system packages to using the source code repository, which doesn’t seem quite right.

  2. YaST->Software Management does not perform “file list” searches correctly.

I suppose I should go log these in bugzilla.

Oh, and the “strings” and “fgrep” commands were vital to my arriving at this conclusion.

Further worries:

If systemd isn’t properly sequencing md startup and local filesystem mounts during system startup, is it possible that it’s also not properly sequencing them on system shutdown? I have this horrible thought that md might be shut down before the filesystem is unmounted. Perhaps the kernel enforces the right sequencing… probably the kernel enforces the right sequencing.

On 11/28/2011 05:56 AM, CraigMiloRogers wrote:

> suppose I should go log these in bugzilla.

absolutely…


DD
openSUSE®, the “German Engineered Automobiles” of operating systems!

@CraigMiloRogers

i’ve an opensuse 12.1 Xen 4.1.2 server host that boots with an md RAID-10 array attached @ a PCI sata card. once the drive’s attached, and the array’s mounted, nfs-server exports volumes from the array.

as you’ve pointed out, the order-of-things matters greatly.

after upgrading from 11.4, where this configuration has run without problem for ages, to 12.1, the attach/mount of the array was consistently failing. i was completely unable to get the systemd-controlled boot process to bring those drives up. of course, that caused NFS to fail as well.

after finding this thread and adding “init=/sbin/sysvinit” to the Xen kernel’s command line, everything’s back to normal.

i’ve looked, but can’t seem to find any documentation of systemd re: md/raid, &/or nfs service on opensuse. there’s bits and pieces in the Gentoo and Arch worlds, but the info doesn’t map, it seems, to suse’s configs. or, at least, i haven’t figured it out.

in any case, the switch BACK to sysvinit has my server back up & running. there’s something real here, i suspect your speculation is not far from the mark …

have you filed a bug on this yet, or know of one that has been?

there’s a resolution

Re: [opensuse-kernel] openSUSE 12.1: boot problem: Mount Race when Using

soon (?) to be released …

I came across this very issue when installing 12.1; I use a small RAID 1 boot partition on sda1 & sdb1.
As a temporary solution to the problem, I did the system install with systemV-init selected rather than the new systemd.

You are not far from the truth here, unfortunately, which I think is not quite OK for a ‘released’ distro.

If the first part (the starting-up) was already fixed (hopefully), the shutdown part seems to be biting hard. Have a look here: [Bug 752107] New: 12.1 reboot hangs with Intel software RAID1 enabled

http://lists.opensuse.org/opensuse-bugs/2012-03/msg01868.html

Not sure if using systemV-init instead of systemd would fix that or not. I mean if the install/upgrade was done with systemd and then either use sysV-init at boot time or uninstall systemd after the install/upgrade.

Looking at RH, it seems they fixed that already, but not opensuse.

Cheers.