Boot fails with LVM-on-RAID; systemd "Timed out" and "Dependency failed" errors?

I'm working on an openSUSE 13.2 machine running systemd v210.

Its disks are all on RAID.

/boot is on RAID1 on /dev/md126

The remaining partitions are on LVM-on-RAID10

The LVs are

    LV_ROOT       VG0 -wi-ao---  20.00g                                             
    LV_SWAP       VG0 -wi-ao---   8.00g                                             
    LV_HOME       VG0 -wi-ao--- 100.00g                                             
    LV_VAR        VG0 -wi-ao---   1.00g                                             

The system fails to boot, dropping to a maintenance mode prompt.

Simply hitting Ctrl-D to continue finishes booting the system.

After boot, checking

    journalctl -b | egrep -i "Timed out|result=dependency" | egrep -i "dev|mount"
        Feb 20 08:16:15 ender systemd[1]: Job dev-VG0-LV_HOME.device/start timed out.
        Feb 20 08:16:15 ender systemd[1]: Timed out waiting for device dev-VG0-LV_HOME.device.
        Feb 20 08:16:15 ender systemd[1]: Job systemd-fsck@dev-VG0-LV_HOME.service/start finished, result=dependency
        Feb 20 08:16:15 ender systemd[1]: Dependency failed for File System Check on /dev/VG0/LV_HOME.
        Feb 20 08:16:15 ender systemd[1]: Job dev-VG0-LV_VAR.device/start timed out.
        Feb 20 08:16:15 ender systemd[1]: Timed out waiting for device dev-VG0-LV_VAR.device.
        Feb 20 08:16:15 ender systemd[1]: Job systemd-fsck@dev-VG0-LV_VAR.service/start finished, result=dependency
        Feb 20 08:16:15 ender systemd[1]: Dependency failed for File System Check on /dev/VG0/LV_VAR.
        Feb 20 08:16:15 ender systemd[1]: Job dev-disk-by\x2did-dm\x2dname\x2dVG0\x2dLV_HOME.device/start timed out.
        Feb 20 08:16:15 ender systemd[1]: Timed out waiting for device dev-disk-by\x2did-dm\x2dname\x2dVG0\x2dLV_HOME.device.

These are reported ON the same system; i.e., all the LVs are correctly mounted and fully functional.

Why are these time-outs and dependency failures occurring?  What do I need to change or fix to make sure this does not happen, and to avoid getting dropped into emergency mode on boot?

hanlon

FWIW:

    I originally posted this to the systemd-info mailing list; it got this reply:

    On Fri, 20.02.15 08:38, h15234@mailas.com wrote:

    > I'm working on a machine running systemd v210 (openSUSE 13.2)

    This is a really old systemd version, please ask downstream for help on
    such an old version!

    > Its disks are all on RAID.
    >
    > /boot is on RAID1 on /dev/md126
    >
    > The remaining partitions are on LVM-on-RAID10
    >
    > The LVs are
    >
    >     LV_ROOT       VG0 -wi-ao---  20.00g
    >     LV_SWAP       VG0 -wi-ao---   8.00g
    >     LV_HOME       VG0 -wi-ao--- 100.00g
    >     LV_VAR        VG0 -wi-ao---   1.00g

    Well, LVM and RAID are nothing we support upstream, please talk to the
    LVM/MD communities or downstream for help.

    Sorry,

    Lennart

Did it ever work? If yes, when did it start failing - after some update, or were there any other changes? Does the initrd correctly start the LVs (check with rd.break on the kernel command line, which gives you a shell in the initrd)?
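(For reference, a sketch of how such a check might look; the boot-menu steps below are generic GRUB/dracut usage, not something taken from this thread:)

    # At the GRUB menu, edit the boot entry ('e'), append rd.break to the line
    # starting with "linux", and boot (Ctrl-X). dracut then drops to a shell
    # just before switching to the real root. From that shell:
    cat /proc/mdstat      # are the md arrays assembled?
    lvm pvs               # inside the initrd the LVM tools are typically run
    lvm vgs               # through the "lvm" wrapper binary
    lvm lvs
    exit                  # continue booting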


Yep, it used to work.  As recently as a few days ago.  Then I rebooted.

There had been a few upgrades over the few days prior to that but no restarts.

Update:

I *JUST* managed to get it to boot without "Timed out" or "result=dependency" errors.

I found this BUG

https://bugzilla.novell.com/show_bug.cgi?id=862076

and dug around a bit. A bunch of stuff has been done on that bug, but I have no clue what the final answer is; the discussion's all over the place.

I also found 

"systemd Boot Problem - Times out waiting for LVM partitions"
https://bbs.archlinux.org/viewtopic.php?pid=1180367#p1180367

So just to test I created

vi /etc/systemd/system/lvm_local.service
---------------------------
[Unit]
Description=LVM activation
DefaultDependencies=no

Requires=dev-VG0-LV_ROOT.device
Requires=dev-VG0-LV_SWAP.device
Requires=dev-VG0-LV_HOME.device
Requires=dev-VG0-LV_VAR.device

After=dev-VG0-LV_ROOT.device
After=dev-VG0-LV_SWAP.device
After=dev-VG0-LV_HOME.device
After=dev-VG0-LV_VAR.device

Before=local-fs.target
Before=basic.target shutdown.target
Conflicts=shutdown.target

[Service]
ExecStart=/sbin/vgchange --available y
Type=oneshot
TimeoutSec=0
RemainAfterExit=yes

[Install]
WantedBy=basic.target
---------------------------
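(A note on the Requires=/After= entries: systemd device unit names are derived from the device path by dropping the leading "/", turning the remaining "/" into "-", and hex-escaping other special characters, so they have to match the actual LV names. A sketch of the mapping for the LVs listed earlier:)

    # device path          ->  systemd device unit
    # /dev/VG0/LV_ROOT     ->  dev-VG0-LV_ROOT.device
    # /dev/VG0/LV_HOME     ->  dev-VG0-LV_HOME.device   (as seen in the journal above)
    # /dev/disk/by-id/dm-name-VG0-LV_HOME
    #                      ->  dev-disk-by\x2did-dm\x2dname\x2dVG0\x2dLV_HOME.device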

AND I did the OPPOSITE of what the other bug report suggested. I changed

use_lvmetad = 0

to

use_lvmetad = 1

in /etc/lvm/lvm.conf
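(For reference, that setting lives in the "global" section of lvm.conf; a minimal sketch of the relevant stanza:)

    # /etc/lvm/lvm.conf (excerpt)
    global {
        # 1 = cache PV/VG/LV metadata in the lvmetad daemon, so LVs can be
        # activated event-driven as their PVs show up, instead of relying on
        # a one-shot scan early in boot
        use_lvmetad = 1
    }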

then

systemctl enable lvm_local.service
systemctl enable lvm2-lvmetad.socket
systemctl start lvm2-lvmetad.socket

and rebooted

No Emergency Mode, no stalls, and no errors reported in the journal.

So it works.

As for WHY it works, why I needed to do this, or whether this will cause other problems: no idea. Obviously something changed in the recent upgrades.


I have more or less the same configuration (separate /boot and the rest are LVs on a single VG), with the exception that I do not have Linux MD. So whatever problem you have seems to be related to Linux MD. There was a recent update to mdadm that could have something to do with it.

There is really no way to tell unless you are willing to reproduce the problem and provide debugging information (and I do not know in advance what information may be needed).

As long as nobody who can actually reproduce it helps with debugging, the problem will remain.

I tried creating a VM with 4 disks, /boot on RAID1 and the rest on LVM on RAID10; I installed 13.2, updated it, and it boots without problems. No delays, everything is mounted.
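(For anyone who wants to set up a similar test layout, a rough sketch of the commands involved; the disk names here are assumptions and the sizes are taken from the LV listing above, not what was actually used in that VM:)

    # 4 disks; first partition of each -> RAID1 for /boot, second -> RAID10 for LVM
    mdadm --create /dev/md0 --level=1  --raid-devices=4 /dev/sd[abcd]1
    mdadm --create /dev/md1 --level=10 --raid-devices=4 /dev/sd[abcd]2
    mkfs.ext4 /dev/md0                 # /boot
    pvcreate /dev/md1
    vgcreate VG0 /dev/md1
    lvcreate -L 20G  -n LV_ROOT VG0
    lvcreate -L  8G  -n LV_SWAP VG0
    lvcreate -L 100G -n LV_HOME VG0
    lvcreate -L  1G  -n LV_VAR  VG0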

Hrm. Yeah, this will be tough to nail down. :(

I’m guessing it’s not just one thing, but a combination.

I can now reproduce this 100% on this machine by undoing or replacing the fixes.

Right now, I need the lvm.conf setting and both unit files enabled to make this behave.
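(Concretely, "undoing the fixes" means roughly the following; a sketch based on the changes described above:)

    # revert the lvm.conf change
    sed -i 's/use_lvmetad = 1/use_lvmetad = 0/' /etc/lvm/lvm.conf
    # disable the two units again
    systemctl disable lvm_local.service lvm2-lvmetad.socket
    # reboot to reproduce the timeouts / emergency mode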

In your test, what are your current settings for


egrep "filter =|use_lvmetad" /etc/lvm/lvm.conf | grep -v "#"
    filter =  "r|/dev/.*/by-path/.*|", "r|/dev/.*/by-id/.*|","r|/dev/fd.*|", "r|/dev/cdrom|",  "a/.*/" ]
    use_lvmetad = 1

systemd-analyze blame | head -n 2
    20.074s systemd-udev-trigger.service
    20.022s plymouth-start.service

systemctl list-unit-files | egrep "udev|lvm|mount"
    dev-hugepages.mount                     static  
    dev-mqueue.mount                        static  
    dracut-mount.service                    static  
    dracut-pre-mount.service                static  
    dracut-pre-udev.service                 static  
    initrd-udevadm-cleanup-db.service       static  
    lvm2-lvmetad.service                    disabled
    lvm2-lvmetad.socket                     enabled 
    lvm2-monitor.service                    disabled
    lvm_local.service                       enabled 
    proc-sys-fs-binfmt_misc.automount       static  
    proc-sys-fs-binfmt_misc.mount           static  
    sys-fs-fuse-connections.mount           static  
    sys-kernel-config.mount                 static  
    sys-kernel-debug.mount                  static  
    systemd-remount-fs.service              static  
    systemd-udevd-control.socket            static  
    systemd-udevd-kernel.socket             static  
    systemd-udevd.service                   static  
    systemd-udev-root-symlink.service       static  
    systemd-udev-settle.service             static  
    systemd-udev-trigger.service            static  
    tmp.mount                               static  
    udev.service                            static  
    umount.target                           static  
    var-lock.mount                          static  
    var-run.mount                           static  

rpm -qa | egrep "^kernel-desktop|^mdadm|^udev|^systemd|^lvm"
    kernel-desktop-3.19.0-2.1.g1133f88.x86_64
    kernel-desktop-devel-3.19.0-2.1.g1133f88.x86_64
    lvm2-2.02.98-43.17.1.x86_64
    mdadm-3.3.1-5.14.1.x86_64
    systemd-210-10.1.x86_64
    systemd-bash-completion-210-10.1.noarch
    systemd-devel-210-10.1.x86_64
    systemd-presets-branding-openSUSE-0.3.0-12.4.1.noarch
    systemd-rpm-macros-2-8.1.2.noarch
    systemd-sysvinit-210-10.1.x86_64
    systemd-ui-2-4.1.4.x86_64
    udev-210-10.1.x86_64

?

If it happened every time, it would have been fixed long ago.

> I can now reproduce this 100% on this machine
In this case I repeat my question: if you stop in the initrd, what is the state of the MD and LVM configuration (cat /proc/mdstat, pvs, lvs)? Same question when booting stops in emergency mode. I.e., is the problem that the resources are not configured, or that systemd does not know about them?
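(To make that distinction concrete, a sketch of the kind of checks meant here, run from the initrd or emergency shell:)

    cat /proc/mdstat                  # are the md arrays assembled at all?
    pvs; vgs; lvs                     # are the PVs/VG visible, are the LVs active ("a" in the attr field)?
    ls -l /dev/VG0/                   # do the LV device symlinks exist?
    systemctl list-units --type=device | grep VG0
                                      # does systemd itself know about the .device units?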

> use_lvmetad = 1

It is disabled by default after installation.

>     kernel-desktop-3.19.0-2.1.g1133f88.x86_64
>     kernel-desktop-devel-3.19.0-2.1.g1133f88.x86_64
>
> ?

I have the standard 13.2 kernel, 3.16.7-7.1.

> If it happened every time, it would have been fixed long ago.

Well that’s sure not a guarantee.

> If you stop in the initrd, what is the state of the MD and LVM configuration (cat /proc/mdstat, pvs, lvs)? Same question when booting stops in emergency mode. I.e., is the problem that the resources are not configured, or that systemd does not know about them?

Arrays, PVs, VGs & LVs are all up at every stage I check, both in the initrd and in emergency mode. It sure seems like systemd is announcing the timeouts and dependency failures and kicking it into emergency mode.

> It is disabled by default after installation.
> I have the standard 13.2 kernel, 3.16.7-7.1.

That’s a pretty significant difference. I’ll bet you also have older udev/systemd/mdadm versions.

Apples 'n oranges …