server (upgraded 12.3 -> 13.1) with lvm-on-raid FAILs to boot ; timeout waiting for /boot device

Starting with a working 12.3 server with boot-on-RAID & lvm-on-RAID for other partitions, I in-place-upgraded to 13.1.

No errors on upgrade; mkinitrd worked OK.

It no longer boots.

Appears to hang at an LVM2/Dbus timeout waiting for the /boot device that’s on RAID:

EDIT: since

“The text that you have entered is too long (52952 characters). Please shorten it to 15000 characters long.”

relocated to: http://pastebin.com/KcNPriuV

The upgrade from 12.3 -> 13.1 moves LVM2 from sysvinit to systemd init; the unit files are new.

There’s at least 1 issue with LVM2 systemd unit (https://forums.opensuse.org/newthread.php?do=newthread&f=668)

mdadm 3.3 has issues too (http://www.spinics.net/lists/raid/msg44943.html)

There’s obviously a dependency problem.

Finding/fixing it has been elusive so far.

Any suggestions how to specifically identify which dependency is broken?

Fyi, all machines I’ve upgraded 12.3 -> 13.1 with boot&lvm-on-RAID fail in this manner.

All machines I’ve upgraded 12.3 -> 13.1 with boot&LVM, but NO RAID, boot without fail.

Looks like you need to report this to bugzilla.

Seems like something is broken in the mdadm stack

openSUSE:Submitting bug reports - openSUSE

already has been: https://bugzilla.novell.com/show_bug.cgi?id=851741

typo in the OP:

On 2013-11-22 20:36, aropensuse wrote:
>
> Starting with a working 12.3 server with boot-on-RAID & lvm-on-RAID for
> other partitions, I in-place-upgraded to 13.1.

Is that an online upgrade (zypper dup), or offline upgrade (DVD)?

Online upgrade method
Offline upgrade method

Chapter 16. Upgrading the System and System Changes
openSUSE 12.3 Release Notes
openSUSE 13.1 Release Notes


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

> Is that an online upgrade (zypper dup), or offline upgrade (DVD)?

The info above is from an offline upgrade.

But, I’ve done both. Same result in both cases: OK, if no RAID. FAIL, if RAID’s involved.

On 2013-11-22 21:46, aropensuse wrote:
>
>> Is that an online upgrade (zypper dup), or offline upgrade (DVD)?
>
> The info above is from an offline upgrade.
>
> But, I’ve done both. Same result in both cases: OK, if no RAID. FAIL,
> if RAID’s involved.

There was a problem with offline upgrade and raid: the DVD used a
different device name for the raid than the running system, so that the
raid is not found at some point. I don’t know if it has been solved.


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

> There was a problem with offline upgrade and raid: the DVD used a
> different device name for the raid than the running system, so that the
> raid is not found at some point. I don’t know if it has been solved

I’ve missed any news on that.

Even though I see issues in both cases, it’d be useful to have that reference.

If you do manage to recall where you heard it, pls post the info here.

On 2013-11-22 23:16, aropensuse wrote:
>
>> There was a problem with offline upgrade and raid: the DVD used a
>> different device name for the raid than the running system, so that the
>> raid is not found at some point. I don’t know if it has been solved
>
> I’ve missed any news on that.
>
> Even though I see issues in both cases, it’d be useful to have that
> reference.
>
> If you do manage to recall where you heard it, pls post the info here.

I don’t know if I’ll be able to. I hear many things, but neurones are
not good at reliable indexing…

[…]

Found one.

+++··········································
Date: Thu, 21 Mar 2013 15:43:37 +0100
From: Claudio ML <>
To: opensuse at opensuse.org
Subject: [opensuse] Usual problem with upgrade from DVD to 12.3 with
system using mdadm

Hello all,

I have seen there is the same problem of the previous versions of
OpenSuSE also into 12.3. The problem is: If you try to upgrade from a
DVD an OpenSuSE with software raid (mdadm), the installer don’t see the
previous version of openSuSE on the hard disk.

This is caused because the kernel don’t recognize correctly the md
arrays. In example, if i have md0 (/ root file system), md1 (/var), md2
(/srv), the installer see this as md125, md126 and md127.

So, no DVD upgrade possible for this machines. I have tryied to edit
manually the fstab on the machine, adapting to what the installer see,
but after the upgrade the system becomes un-bootable.

Anyone have a good workaround for this problem?

Cordially,

Claudio.
··········································+++


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

Thanks. Reading up, I’m beginning to suspect several issues here.

I’m trying to get it properly consolidated & reported.

For those interested:

"Bug 851741 - lvm2 systemd incorrectly uses dependencies on Fedora services (fedora-storage-init.service fedora-storage-init-late.service)"
 https://bugzilla.novell.com/show_bug.cgi?id=851741

and

"Bug 851993 - boot failure after 12.3 -> 13.1 upgrade if /boot is on RAID"
 https://bugzilla.novell.com/show_bug.cgi?id=851993
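
For anyone who wants to look for the kind of stray reference the first report describes, here is a rough sketch. It only parses text: it collects the unit names listed on After=, Before=, Requires= and Wants= lines and reports the ones missing from a set of known units. The sample unit contents and the known-unit set are made up for illustration (only the two fedora-storage-init names come from the bug title); on a real system the known set would have to be collected separately, for example from the output of `systemctl list-unit-files`.

```python
# Sketch only: find unit names a unit file refers to that are not known locally.
DEP_KEYS = ("After", "Before", "Requires", "Wants")


def referenced_units(unit_text):
    """Return the set of unit names listed on dependency/ordering lines."""
    refs = set()
    for line in unit_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() in DEP_KEYS:
            refs.update(value.split())
    return refs


def dangling_references(unit_text, known_units):
    """Return referenced unit names that are absent from known_units, sorted."""
    return sorted(referenced_units(unit_text) - set(known_units))


# Hypothetical unit file contents, NOT copied from the 13.1 lvm2 package.
SAMPLE_UNIT = """\
[Unit]
Description=Example LVM2 unit (hypothetical contents)
After=fedora-storage-init.service fedora-storage-init-late.service
Before=local-fs.target
"""

# Hypothetical set of units that exist on the machine.
KNOWN_UNITS = {"local-fs.target"}

if __name__ == "__main__":
    for name in dangling_references(SAMPLE_UNIT, KNOWN_UNITS):
        print("not found locally:", name)
```

A hit only shows that a name is referenced but unknown; whether that actually blocks boot depends on the kind of dependency, so treat it as a pointer for further digging rather than a diagnosis.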

just fyi,

The ‘boot blocker’ was

DEVICE ...

raid assembly instructions in /etc/mdadm.conf, specifically targeting udev-created “by-id” links.

OK in 12.3, not in 13.1; commenting those lines out, 13.1 system boots again.
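
A minimal sketch of that workaround, in case it helps someone with several machines to fix. It assumes the offending entries are single DEVICE lines that mention /dev/disk/by-id/ and it does not handle continuation lines. The sample contents (device names, array UUID) are invented; back up the real /etc/mdadm.conf before trying anything like this from a rescue system.

```python
import re

# An active DEVICE line (keyword at start of line) that mentions a by-id link.
BY_ID_DEVICE = re.compile(r"^DEVICE\b.*/dev/disk/by-id/")


def comment_out_by_id_devices(conf_text):
    """Return conf_text with by-id DEVICE lines prefixed by '# '."""
    out = []
    for line in conf_text.splitlines():
        if BY_ID_DEVICE.match(line):
            out.append("# " + line)
        else:
            out.append(line)
    trailing = "\n" if conf_text.endswith("\n") else ""
    return "\n".join(out) + trailing


# Hypothetical mdadm.conf contents, for illustration only.
SAMPLE_CONF = """\
DEVICE /dev/disk/by-id/ata-EXAMPLE_A-part1 /dev/disk/by-id/ata-EXAMPLE_B-part1
ARRAY /dev/md0 UUID=00000000:00000000:00000000:00000000
"""

if __name__ == "__main__":
    print(comment_out_by_id_devices(SAMPLE_CONF), end="")
```

Lines that are already commented are left alone, so running it twice gives the same result as running it once.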

continued at:

“Bug 851993 - boot failure after 12.3 -> 13.1 upgrade if /boot is on RAID”
https://bugzilla.novell.com/show_bug.cgi?id=851993

Will wait a bit before moving my 12.3 server with RAID to 13.1. However, need to build a new server with RAID. Any idea of what happens if I install 13.1 on a /boot partition on a single HD, then create /home on the RAID drives?
Thanks.

> Any idea of what happens if I install 13.1 on a /boot partition on a single HD, then create /home on the RAID drives?

That’s an ill-defined question. I.e., it depends. In my specific case (boot on RAID-1, / on LVM-on-RAID, /home on LVM-on-RAID) – lots:

https://bugzilla.novell.com/show_bug.cgi?id=851422
https://bugzilla.novell.com/show_bug.cgi?id=851993
https://bugzilla.novell.com/show_bug.cgi?id=852021
https://bugzilla.novell.com/show_bug.cgi?id=852652
https://bugzilla.novell.com/show_bug.cgi?id=853293
https://bugzilla.novell.com/show_bug.cgi?id=853762

I’ll not move 13.1 into production – either as upgrade or new install – until all of these are resolved, are shown to be stable in test for ~ a month, and nothing else boot-related has cropped up in the meantime.

You’ve neither described your usage, nor mentioned where your / will be.

My answer will be mostly irrelevant to your specific case.

My advice – test your scenario thoroughly before mv to production.

You realize that the 13.1 install will not change; the ISO is frozen. So if you can’t live with the 13.1 installer, you will have to wait until 13.2. Also, please report your trouble to bugzilla, or it may not get fixed in 13.2 either.

openSUSE:Submitting bug reports - openSUSE

Look up ^^^. There are six bugs that’ve been filed. So far.

On 2013-12-05 21:26, aropensuse wrote:
>
> Look up ^^^. There are six bugs that’ve been filed. So far.

Did you add me too’s to all the pertinent ones?


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

I filed the bugs.

On 2013-12-05 21:46, aropensuse wrote:
>
> I filed the bugs.

Ah :-)


Cheers / Saludos,

Carlos E. R.
(from 12.3 x86_64 “Dartmouth” at Telcontar)

I just ran into the same problem. But for me, it was on a fresh clean install from USB-DVD.
The first boot after yast first stage was fine, but any subsequent boot failed with “timeout waiting for device dev-md-boot.device” message.
Partition layout GPT (on 3TB disks) seems to be the cause, with /boot and / on md-RAID1 partitions.
It works fine with 1TB boot disks with DOS disklabel.

Fixing the problem was easy:
boot into the rescue system and change fstab to mount /boot by UUID instead of /dev/disk/by-md-id or whatever else yast puts in there.
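
The fstab change described above can be sketched roughly as follows. This is an illustration under assumptions, not a tested recovery tool: the sample fstab lines and the UUID are made up, and the real UUID would have to be read from the md device first (for example with `blkid`). The function rewrites only the first field of the entry for the given mount point and leaves every other line untouched.

```python
def mount_by_uuid(fstab_text, mount_point, uuid):
    """Return fstab_text with the device field for mount_point set to UUID=<uuid>."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        is_entry = len(fields) >= 2 and not fields[0].startswith("#")
        if is_entry and fields[1] == mount_point:
            fields[0] = "UUID=" + uuid
            # Note: the original column spacing of this line is not preserved.
            line = "\t".join(fields)
        out.append(line)
    trailing = "\n" if fstab_text.endswith("\n") else ""
    return "\n".join(out) + trailing


# Hypothetical fstab contents; the device paths are placeholders.
SAMPLE_FSTAB = """\
/dev/disk/by-id/md-uuid-EXAMPLE /boot ext4 defaults 1 2
/dev/system/root / ext4 defaults 1 1
"""

if __name__ == "__main__":
    # Hypothetical UUID, standing in for whatever blkid reports.
    print(mount_by_uuid(SAMPLE_FSTAB, "/boot", "0123abcd-0000-0000-0000-000000000000"), end="")
```

Keep a copy of the original fstab before writing the result back, so the old entry can be restored if the new one does not mount.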