Can only boot to rescue mode

Let me start off by saying that I normally do not post on forums in search of help. I’d rather waste hours of my own time than someone else’s. However, I’ve exhausted all of my resources with this recent issue and my Google-fu has let me down. I will say up front that I have learned a major lesson from this about keeping a backup of the root system. I also learned while reading around that snapper works with ext4 (oh how I wish I had made a snapshot of my system). I really hope that I don’t have to reformat, since I would lose many configurations in the main system and it would probably take me at least a week to reinstall software and reconfigure anything not saved in /home (but that will be the price I pay for my incompetence and oversight). So with that said, let me begin by describing the problem and providing as much information as I can.

The problem started this past Saturday, 9/29, after a zypper dup. Not encountering any blatantly obvious issues, I rebooted as I usually do after updating. The update upgraded my kernel to 4.18.8 and everything looked fine until the boot just sorta stopped… as in it neither reached the KDE login screen nor dropped to emergency mode. I can’t access any of the tty environments to log in either. Instead I just have the boot output showing [ OK ] Started Update system wide CA certificates as the last successful output. Reviewing the prior output, I can see that the root filesystem was mounted successfully and then my /home partition on a separate drive was mounted after it was decrypted. I originally noticed warnings for the wicked service failing to start and my NFS shares failing to mount (for obvious reasons), however I believe I’ve fixed this since. Using Alt + F1, I can see the following:

  • a few mce [Hardware Error] entries (I’ve had these from before and I believe the CPU is correcting them before boot)
  • an entry for [FAILED] Failed to start Setup Virtual Console (I think this is related to /etc/vconsole.conf)
  • an entry for Starting Show Plymouth Boot Screen…

So not having any clue where to start, I started with the basics. First, I tried to boot into the recovery/fallback kernel by going into the Advanced options in grub. I also tried the other kernel options that were available to me and their respective recovery/fallback option (4.18.5 and 4.17.9). I even tried editing the grub entry for each boot, removing extra kernel options and enabling debug. I had absolutely no luck with any of these things, so my next thought was to make myself a new live USB and try an upgrade. After installing some new files, removing some obsolete ones, and rebooting, I returned to the same scenario. At this point I let the system sit for a while, wondering if it was just taking a really long time for some unknown reason. Needless to say, that didn’t help.

Next thing on the list was to repair the bootloader and check the filesystem. I booted off the live USB again, ran fsck on the system drive, then mounted the system, chrooted in and rebuilt grub with grub2-mkconfig -o /boot/grub2/grub.cfg and grub2-install /dev/sda. The filesystem came back fine, and the grub rebuild didn’t improve my situation. So at this point I decided to boot into the main system and pass the rescue kernel option to force my way into rescue mode natively. I started poking around in the systemd journals for any error I could find and started researching each and every possible one.
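For readers following along, the live-USB repair described above could look roughly like the sketch below. It only prints the commands so they can be reviewed before anything is run. The root partition /dev/sda1, the mount point /mnt, and the helper name repair_plan are assumptions for illustration, and the bind mounts are the usual chroot preparation rather than something spelled out in the post.

```sh
#!/bin/sh
# Sketch only: prints the repair commands instead of running them.
# Assumptions: $1 is the root partition, $2 the disk that holds the
# bootloader, $3 the mount point (defaults to /mnt).
repair_plan() {
    root_dev=$1
    disk=$2
    mnt=${3:-/mnt}
    cat <<EOF
fsck $root_dev
mount $root_dev $mnt
mount --bind /dev $mnt/dev
mount --bind /proc $mnt/proc
mount --bind /sys $mnt/sys
chroot $mnt grub2-mkconfig -o /boot/grub2/grub.cfg
chroot $mnt grub2-install $disk
umount $mnt/sys $mnt/proc $mnt/dev
umount $mnt
EOF
}

repair_plan /dev/sda1 /dev/sda
```

The printed lines would be run as root from the live system, one at a time, with the device names adjusted to match the machine.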

Here’s the output of journalctl -xrb -1 for anyone who wants to take the time to read through it.
https://pastebin.com/jd62ayEG

Some things I noticed looking through systemd logs:

  • NVRM entries related to the Nvidia probe routine (I’ve tried removing and reinstalling the Nvidia proprietary drivers as well as blacklisting nouveau and using the nomodeset kernel option)
  • /usr/bin/loadkeys failed with exit status 1 (I believe this is related to the console keyboard layout)
  • Invalid rule /etc/udev/rules.d/50-brother-brscan4-libsane-type1.rules:9: unknown key ‘SYSFS{idVendor}’ (Related to my Brother printer, I think this is safe to ignore)
  • EDAC sbridge: Couldn’t find mci handler … ECC is disabled (I’ve had this since I started using linux, appears to be related to my chipset)
  • Process 236 (haveged) of user 0 dumped core (this seems to be related to system entropy and I assume the core dumps since the system is hanging at the end and not proceeding to a login)
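A quick way to pull entries like these out of a long journal is a small filter. This is a minimal sketch; the pattern list is an assumption based only on the messages quoted above, and boot_failures is a made-up helper name.

```sh
#!/bin/sh
# Sketch: keep only journal lines that look like failures.
# The patterns are assumptions taken from the messages listed above.
boot_failures() {
    grep -E 'Failed to start|failed with exit status|dumped core'
}

# Typical use from rescue mode, reading the previous boot:
#   journalctl -b -1 --no-pager | boot_failures
# journalctl can also filter by priority on its own:
#   journalctl -b -1 -p err --no-pager
```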

And here is some other various information that may or may not be useful.

The output of mkinitrd:
https://pastebin.com/cSqzFzFH

Zypper history from 9/29:
https://pastebin.com/wKSrVk1K

System:

  • Tumbleweed x86_64
  • i7 5820K
  • Geforce GTX 1050 2GB
  • 16GB DDR4

Please let me know if there is anything else I can provide and thanks ahead of time for any assistance. My fate is in the community’s hands.

If you have Plymouth installed, try removing it:
https://bugzilla.opensuse.org/show_bug.cgi?id=1090451

similar experience on laptop with Tumbleweed 32bit KDE install, ext4 disk

in rescue mode the command

fsck

was run; this found errors and asked if they should be corrected

after accepting, on reboot all was back to normal

hth

for disk identification command

df

was used to get boot disk details (e.g. /dev/sda2)

According to bug 1047225, I added omit_dracutmodules+="plymouth" to a file in /etc/dracut.conf.d/ and rebuilt the initramfs. Tried booting with splash=silent for all three kernels with no luck. This did resolve all of the warnings/errors I saw related to nouveau in the mkinitrd output and the NVRM entries related to Nvidia in my systemd logs, so that’s a plus.
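For anyone wanting to reproduce the dracut change, a minimal sketch follows. The file name 99-omit-plymouth.conf and the helper name write_omit_conf are assumptions; the post does not say which file was used.

```sh
#!/bin/sh
# Sketch: write a dracut drop-in that leaves plymouth out of the initramfs.
# The file name is an assumption; dracut reads any *.conf in this directory.
write_omit_conf() {
    conf_dir=${1:-/etc/dracut.conf.d}
    # dracut.conf(5) asks for straight quotes with a leading and trailing
    # space inside them for space-separated lists.
    printf 'omit_dracutmodules+=" plymouth "\n' > "$conf_dir/99-omit-plymouth.conf"
}

# As root on the installed system (or inside a chroot), then rebuild:
#   write_omit_conf
#   mkinitrd
```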

Tried running fsck from a live USB again. fsck /dev/sda1 reported: clean, 373828/3637248 files, 4904928/14525440 blocks

Thanks for the suggestions. I’m willing to try anything to get my system back up and running.

Next suggestion

Do an upgrade from the original installation DVD or most recent DVD

When asked for deletion of existing repositories change as required to enable or disable
(try to replicate what was there before)

On the next menu accept all proposed new repositories
(this most probably will mean some duplication which can be corrected later)

At the accept menu have a look at all proposed changes and alter if necessary
(this should lead to a minimal update, dependent upon last status)

Then accept and sit back and wait for completion and hope a reboot is successful

If the installed OS is not recognised by the updater then there is not much hope of recovery

cheers

After removing the Nvidia proprietary driver, did you try rebooting, with nomodeset absent from kernel cmdline, and with rebuilt initrds that lack proprietary NVidia specifics, before reinstalling the proprietary driver?

Felix, please stop propagating this as a solution. The forums already have a couple of threads where a patch was provided.

@OP: Welcome to these forums,
There already is a patch to solve the vconsole error message:

  • Create a file systemd-vconsole-setup-race-fix.patch in your homedir with the following content:

diff --git a/modules.d/98dracut-systemd/dracut-cmdline-ask.service b/modules.d/98dracut-systemd/dracut-cmdline-ask.service
index 1685479a..3d233143 100644
--- a/modules.d/98dracut-systemd/dracut-cmdline-ask.service
+++ b/modules.d/98dracut-systemd/dracut-cmdline-ask.service
@@ -12,6 +12,8 @@ Description=dracut ask for additional cmdline parameters
 DefaultDependencies=no
 Before=dracut-cmdline.service
 After=systemd-journald.socket
+After=systemd-vconsole-setup.service
+Requires=systemd-vconsole-setup.service
 Wants=systemd-journald.socket
 ConditionPathExists=/usr/lib/initrd-release
 ConditionKernelCommandLine=|rd.cmdline=ask

and save it in your homedir
Next do


su
cd /usr/lib/dracut
patch -p1 < /home/YOUR_USERNAME/systemd-vconsole-setup-race-fix.patch

Reboot and the message will be gone. An upstream patch is supposed to be on the way, haven’t seen it yet

Have you tried cleaning up the kernel cmdline:

BOOT_IMAGE=/boot/vmlinuz-4.18.8-1-default root=UUID=e47d1d02-9700-49c4-a118-0985296aa51d nomodeset nosplash quiet showopts intel_iommu=on iommu=pt

1-showopts is a noop unless using Gfxboot and Grub Legacy
2-why are intel_iommu=on and iommu=pt included? My Haswell doesn’t need them. kernel-parameters.txt doesn’t list them.
3-"" is the default splash. IOW, if you wish no splash, splash needs no cmdline mention. Splash is also absent from kernel-parameters.txt.
4-Don’t forget, nomodeset blocks use of all competent FOSS X drivers (ie: modesetting & nouveau; if booting without proprietary driver installed).
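To experiment with a trimmed command line, something like the sketch below can help. strip_params is a hypothetical helper, not an existing tool: it takes the full command line as its first argument and the exact parameters to drop as the remaining ones.

```sh
#!/bin/sh
# Sketch: drop named parameters from a kernel command line string.
# Only exact matches are removed; everything else keeps its order.
strip_params() {
    cmdline=$1
    shift
    out=
    for word in $cmdline; do
        keep=1
        for drop in "$@"; do
            if [ "$word" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
}

strip_params "nomodeset nosplash quiet showopts intel_iommu=on iommu=pt" showopts nosplash
```

To make such a change permanent, the options live in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, and grub2-mkconfig -o /boot/grub2/grub.cfg has to be rerun afterwards.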

Does that equate to on the mirrors in oss, non-oss or updates?

If such was the case, posting the patch here would be useless. I already wrote that upstream needs to incorporate the patch.
But my point was that uninstalling plymouth is not a solution, not even a workaround.

I downloaded a recent DVD image and created a new USB. Performed an upgrade, keeping the existing repositories (with their enabled status set to the best of my memory), and accepted the new repositories. As part of the upgrade, I noticed that the kernel updated to 4.18.9 and new Nvidia drivers (390.87) were also installed from the official repo. I didn’t receive any errors, however upon reboot I encountered the same issue. All parts of the system seem to load fine and all my drives and network shares are mounting without error. Yet I still can’t gain access to a tty login or the graphical login.

Thanks for the patch. I followed your steps exactly while in rescue mode and it says the patch ran, however it doesn’t appear to have fixed anything. I’m still seeing the Failed to start Setup Virtual Console line in my systemd logs.

Also I’m still seeing NVRM: The NVIDIA probe routine was not called for 1 device(s) even though plymouth is omitted from dracut (nvidia actually did this automatically with 50-nvidia-default.conf in /etc/dracut.conf.d/). I’ve also blacklisted nouveau and I’m using nomodeset. Should I just remove all of the Nvidia packages?

  1. I don’t remember why I had showopts, so I’ve removed it
  2. intel_iommu=on and iommu=pt are used for PCI passthrough via OVMF for my QEMU VM according to this guide
  3. I removed nosplash as you suggested
  4. I’m using nomodeset since I’ve read this is sometimes necessary while using the proprietary driver even when blacklisting nouveau. If I remove the proprietary driver I will remove this

Did you run the patch after the upgrade? Sure the log entries are from the boot after applying the patch? Are you sure you applied the patch to the installed system?

Re. The NVIDIA issues: please stick to one subject per thread, things get confusing when replies also contain various subjects.

But, uninstall the NVIDIA packages, leave the NVIDIA repo enabled. Then run


zypper inr

It should install the correct nvidia package versions.
Some advice: do not change multiple things at once, it will be harder to detect the culprits in the issues you’re having.

Yes, I ran the upgrade and rebooted. Discovered the problem wasn’t fixed. Shut down and booted into rescue mode. Ran the patch file as suggested, then rebooted again. Let the system run till it went no further. Shut down and booted into rescue mode for a second time. My systemd logs are still showing the error. Should I try running the patch again?

Fixed the vconsole errors. I found Bug 1055835 in which the patch was first introduced. This time I ran the patch and rebuilt the initramfs with mkinitrd. Now the Failed to start Setup Virtual Console, Input/Output, and loadkeys errors are all gone.
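In case it helps someone else, the check-then-rebuild step can be sketched as below. The unit path is derived from running patch -p1 inside /usr/lib/dracut as described earlier in the thread; unit_is_patched is a made-up helper name.

```sh
#!/bin/sh
# Sketch: succeed only if the dracut unit file carries both added lines.
unit_is_patched() {
    grep -qx 'After=systemd-vconsole-setup\.service' "$1" &&
    grep -qx 'Requires=systemd-vconsole-setup\.service' "$1"
}

# On the installed system, as root:
#   unit=/usr/lib/dracut/modules.d/98dracut-systemd/dracut-cmdline-ask.service
#   unit_is_patched "$unit" && mkinitrd
```

The rebuild matters because the initramfs contains its own copy of the unit, so patching the file on disk changes nothing at boot until a new image is generated.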

Still can’t boot to either graphical or multi-user targets though. What’s even more odd (at least in my mind) is my journal has output towards the end that says the system finished startup. Since I’m lost on what to try next, I’ll wait for your input. Thanks for your help so far.

Never? IME, independent of which gfxchip is involved, uninstalling or disabling plymouth can sometimes solve missing greeter, black screen or other startup problems.

Have you considered booting without any proprietary driver installed, back to basics, as stated in your OP? Doing so could rule it out as involved in the failure.

last suggestion,

rename /etc/X11/xorg.conf (eg /etc/X11/xorg.conf-org)
if the file exists

similarly rename the following if they have any active lines,
(lines which do not start with #)
/etc/X11/xorg.conf.d/50-monitor.conf
/etc/X11/xorg.conf.d/50-device.conf
/etc/X11/xorg.conf.d/50-screen.conf

and with nomodeset in the kernel command line, reboot

the above is intended to take the os to a vga gui without your particular graphics card support

if this does not work, rename the above files to their original
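a rough sketch of the "active lines" check, for anyone who prefers not to read each file by hand (has_active_lines is a made-up helper name; the -org suffix follows the example above):

```sh
#!/bin/sh
# Sketch: succeed if a config file has at least one active line,
# i.e. a line that is neither blank nor a comment starting with '#'.
has_active_lines() {
    grep -Eqv '^[[:space:]]*(#|$)' "$1"
}

# As root:
#   for f in /etc/X11/xorg.conf.d/50-monitor.conf \
#            /etc/X11/xorg.conf.d/50-device.conf \
#            /etc/X11/xorg.conf.d/50-screen.conf; do
#       [ -f "$f" ] && has_active_lines "$f" && mv "$f" "$f-org"
#   done
```

to undo, rename each file back to its original name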

I actually tried this once Knurpht suggested I uninstall and reinstall the proprietary driver with zypper inr. So I removed nomodeset from my kernel parameters and used zypper to remove nvidia. Afterwards I tried rebooting with just nouveau and had the same problem. Since that didn’t work, I rebooted to rescue and finished up by using zypper inr and added nomodeset again.

No luck. I renamed xorg.conf, made sure nomodeset was set, and the other files you mentioned didn’t have any active lines. I tried this with nouveau since the proprietary driver creates and uses its own config.