I need help, and I admit I know far too little about OpenSuSE to offer much in the way of information about what my problem might be, at least in an initial post. I apologize for that, but I’ll try to explain what happened.
I am using OpenSuSE Tumbleweed. I previously used OpenSolaris, which I knew extremely well (I’m a former Sun engineer and kernel developer), but I switched to OpenSuSE (and thus Linux) for better support of newer hardware.
I update once every couple of weeks. It mostly goes OK, except that I have to recompile the NVIDIA driver manually, which is quite a pain. But it works. (I’d probably update more frequently if this were less of a hassle.)
That is, it worked until today. I did my usual dance of running “zypper up” and waiting while 2GB+ of updates downloaded and installed. There were a few file conflicts between some texinfo packages – but I don’t care much about that. Then I rebooted into single-user mode and rebuilt the NVIDIA driver, as usual. That worked.
Then I rebooted again, and that’s when the horrors started. First, the boot hung, so I pressed “Esc” to get off the three-dot screen and see what it was doing. I saw a message about waiting on “dev-md-tank.device” with a “1min 30s” timeout. It timed out and dropped me into “emergency mode.”
I logged in. I have two RAID arrays – a root mirror on /dev/md0 and a 4-drive RAID5 array called /dev/md/tank. /proc/mdstat showed only md0. So I ran “mdadm -A /dev/md/tank”, which immediately brought the RAID5 array back without complaint. I mounted /tank manually, and everything was there. No problems. I searched around, found documentation on using mdadm to dump the array state, and looked through the output. There were no problems at all – all drives present, clean, no issues. It doesn’t appear to be an mdadm problem.
journalctl shows that systemd-udevd is in terrible shape. It is getting tons of fork failures and, as a result, is erroring out on every command it tries to run. I tried using “udevadm control -m” to set a different maximum number of children (both high and low values), but nothing made any difference. Something seems to be wrong with udevd, but I don’t know what, and try as I might, I cannot find where any of its configuration comes from. (/usr/lib/udev looks promising, but I can’t figure out how any of it relates to what I see.) If only I could remove “dev-md-tank.device” from the system, I could proceed.
I tried rebooting. It hung again. I tried commenting out the “ARRAY” entry in mdadm.conf to get rid of the array temporarily (it’s just data; I don’t need it to boot) and removing “partitions” from the “DEVICE” line, but the system still insisted on timing out on dev-md-tank.device. I googled but found no way to delete that udev entry, disable it, skip it, or get around the dependency. It seems to be created automatically somehow. I also tried assembling and mounting the array manually and then running “systemctl default” to continue the boot. That hung because of something called “Plymouth.”
Plymouth is dropping core, and I don’t know why. I read around the Internet some more and, like the complete idiot I am, found a web page saying I should run “dracut.” That was a horrible, horrible mistake. Now the system doesn’t boot at all: just a cursor in the upper-left corner, and that’s it.
I got the system to boot by manually entering the 4.4.3-1 kernel and initrd instead of the new 4.5.0-3 kernel. The system is now up (sort of: Xorg is sick, hangs at 100% CPU, and I can’t log in), but I have no idea how to edit the GRUB menu so that it stays on the semi-working 4.4.3 kernel and avoids the completely broken 4.5.0 one, so reboots are perilous. I tried reinstalling “kernel-default” with zypper, hoping this would rebuild whatever “dracut” broke, but it didn’t. Boots into 4.5.0 still just hang – pressing “Esc” on the three-dot screen gives me only a cursor and nothing else.
I’m stuck. Before I migrate back to one of the OpenSolaris distributions, where I feel like I know what I’m doing (and have the option of downgrading!), what should I try?
journalctl output from the failed 4.5.0 kernel boot:
http://www.workingcode.com/journalctl-1.txt
and output from the boot after reverting to 4.4.3: