(but I have no idea whether plymouth works with them installed now or not)
By adding the force_drivers+= to my dracut nvidia.conf file, the nVidia drivers appear to get loaded earlier, but I noticed by reading the journal that they’re tainting the kernel because they’re not in the kernel subdirectory. When I was at the shell, though, I saw they were loaded… dmesg shows:
I think I should probably fix that before continuing any further. I have to figure out how.
The nvidia driver taints the kernel because it is a third party module (with an incompatible license even), not part of the kernel itself.
You cannot fix that really.
Shouldn’t make a difference though, except that bug reports against the kernel may get rejected…
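If you want to see exactly which taint flags are set, a rough sketch like the one below decodes the two bits that matter for the nvidia module. The function name decode_taint is made up for illustration, and it only handles bit 0 (proprietary module, "P") and bit 12 (out-of-tree module, "O"); the kernel defines more bits than that.

```shell
# decode_taint MASK
# Decode a kernel taint bitmask into the two flags relevant to the NVIDIA
# driver. (decode_taint is a hypothetical helper; other bits are ignored.)
decode_taint() {
    mask=$1
    out=""
    # Bit 0 (value 1): a module with a non-GPL-compatible license was loaded.
    if [ $(( mask & 1 )) -ne 0 ]; then
        out="${out}P(proprietary) "
    fi
    # Bit 12 (value 4096): an out-of-tree module was loaded.
    if [ $(( mask & 4096 )) -ne 0 ]; then
        out="${out}O(out-of-tree) "
    fi
    if [ -z "$out" ]; then
        out="none-of-P/O"
    fi
    # Strip the trailing space, if any, before printing.
    printf '%s\n' "${out% }"
}

# Usage on a live system:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

With the two journal messages above ("out-of-tree module" and "module license 'NVIDIA'"), you would expect both flags to show up.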
They’re being installed in the /lib/modules/$(uname -r)/updates directory because of DKMS. If I disable the DKMS option, they get installed in the proper location.
I read that it could disable certain API functions and thought that was possibly a problem. Thanks for the clarification.
So, I came to the same conclusion you did: try disabling Plymouth in the initrd and wait until the real Plymouth starts. I went about it by modifying /etc/default/grub and changing GRUB_CMDLINE_LINUX_DEFAULT to include rd.plymouth=0.
Then I ran grub2-mkconfig and dracut again.
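For anyone following along, here is a minimal sketch of that edit as a script. The helper name add_kernel_param is made up for illustration, and the grub.cfg path in the usage comment is an assumption about a typical openSUSE layout. As far as I understand it, rd.plymouth=0 is read from the kernel command line at boot by the scripts inside the initramfs rather than by dracut at build time, which may be why it doesn't show up anywhere in the dracut configuration.

```shell
# add_kernel_param FILE PARAM
# Append PARAM to GRUB_CMDLINE_LINUX_DEFAULT in FILE unless it is already
# present. (add_kernel_param is a hypothetical helper. PARAM must not contain
# the characters | or &, which are special in the sed expression below.)
add_kernel_param() {
    file=$1
    param=$2
    line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file")
    # Strip everything up to the opening quote and the closing quote, then
    # pad with spaces so whole-word matching works.
    current=$(printf '%s' "$line" | sed -e 's/^[^"]*"//' -e 's/"$//')
    case " $current " in
        *" $param "*) return 0 ;;   # already there, nothing to do
    esac
    tmp=$(mktemp) || return 1
    sed -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root), then regenerate the boot config and the initramfs:
#   add_kernel_param /etc/default/grub rd.plymouth=0
#   grub2-mkconfig -o /boot/grub2/grub.cfg
#   dracut -f
```

Running it twice is harmless, since the second call finds the parameter and leaves the file alone.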
I don’t see where dracut reads those values though. I might try it the way the RPM does it, although the RPM didn’t show a graphical boot screen either.
I see this in the journal:
Feb 03 16:09:53 eugene kernel: DMAR: Forcing write-buffer flush capability
Feb 03 16:09:53 eugene kernel: DMAR: Disabling IOMMU for graphics on this chipset
My board doesn’t have built-in video, nor does my old Intel(R) Core™2 Extreme CPU X9770 support built-in video. My understanding is IOMMU is disabled for my nVidia card. But I don’t think that’s causing any issues. From my understanding, that will only be an issue if I’m running a VM and using passthrough, which I am, but not on this OpenSuSE system. I do that on my actual server, the HP DL380 Gen9, which runs CentOS 7 right now.
I see a few other things in the journal which make me wonder if they’re causing problems or not.
Because you guys are more knowledgeable about Linux than I am, and my google searches haven’t really turned up anything that seems to suggest it’s an issue related to my problem, I’ll just post them here for you to see:
Feb 03 16:09:53 eugene kernel: x86: Booting SMP configuration:
Feb 03 16:09:53 eugene kernel: .... node #0, CPUs: #1
Feb 03 16:09:53 eugene kernel: TSC synchronization [CPU#0 -> CPU#1]:
Feb 03 16:09:53 eugene kernel: Measured 2150946 cycles TSC warp between CPUs, turning off TSC clock.
Feb 03 16:09:53 eugene kernel: tsc: Marking TSC unstable due to check_tsc_sync_source failed
Feb 03 16:09:53 eugene kernel: #2 #3
Feb 03 16:09:53 eugene kernel: smp: Brought up 1 node, 4 CPUs
Feb 03 16:09:53 eugene kernel: smpboot: Total of 4 processors activated (25600.20 BogoMIPS)
...
Feb 03 16:09:53 eugene kernel: ACPI: bus type PCI registered
Feb 03 16:09:53 eugene kernel: acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
Feb 03 16:09:53 eugene kernel: PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xe8000000-0xebffffff] (base 0xe8000000)
Feb 03 16:09:53 eugene kernel: PCI: MMCONFIG at [mem 0xe8000000-0xebffffff] reserved in E820
Feb 03 16:09:53 eugene kernel: pmd_set_huge: Cannot satisfy [mem 0xe8000000-0xe8200000] with a huge-page mapping due to MTRR override.
Feb 03 16:09:53 eugene kernel: PCI: Using configuration type 1 for base access
Feb 03 16:09:53 eugene kernel: HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
...
Feb 03 16:09:53 eugene kernel: acpi PNP0A03:00: _OSC: OS supports [ExtendedConfig ASPM ClockPM Segments MSI]
Feb 03 16:09:53 eugene kernel: acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
...
Feb 03 16:09:53 eugene kernel: DMAR: Forcing write-buffer flush capability
Feb 03 16:09:53 eugene kernel: DMAR: Disabling IOMMU for graphics on this chipset
...
Feb 03 16:09:53 eugene kernel: pci 0000:00:1f.0: quirk: [io 0x0400-0x047f] claimed by ICH6 ACPI/GPIO/TCO
Feb 03 16:09:53 eugene kernel: pci 0000:00:1f.0: quirk: [io 0x0480-0x04bf] claimed by ICH6 GPIO
...
Feb 03 16:09:53 eugene kernel: pci 0000:03:00.0: disabling ASPM on pre-1.1 PCIe device. You can enable it with 'pcie_aspm=force'
...
Feb 03 16:09:53 eugene kernel: pci 0000:01:00.0: vgaarb: setting as boot VGA device
Feb 03 16:09:53 eugene kernel: pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
Feb 03 16:09:53 eugene kernel: pci 0000:01:00.0: vgaarb: bridge control possible
Feb 03 16:09:53 eugene kernel: vgaarb: loaded
...
Feb 03 16:09:53 eugene kernel: pci 0000:01:00.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
Feb 03 16:09:53 eugene kernel: pci 0000:03:00.0: async suspend disabled to avoid multi-function power-on ordering issue
Feb 03 16:09:53 eugene kernel: pci 0000:03:00.1: async suspend disabled to avoid multi-function power-on ordering issue
...
<after initrd loads>
Feb 03 16:09:53 eugene kernel: vesafb: mode is 1920x1080x32, linelength=7680, pages=0
Feb 03 16:09:53 eugene kernel: vesafb: scrolling: redraw
Feb 03 16:09:53 eugene kernel: vesafb: Truecolor: size=8:8:8:8, shift=24:16:8:0
Feb 03 16:09:53 eugene kernel: vesafb: framebuffer at 0xe7000000, mapped to 0xffff994441800000, using 8128k, total 8128k
Feb 03 16:09:53 eugene kernel: Console: switching to colour frame buffer device 240x67
Feb 03 16:09:53 eugene kernel: fb0: VESA VGA frame buffer device
...
Feb 03 16:09:53 eugene systemd-sysctl[180]: Couldn't write '0' to 'dev/cdrom/autoclose', ignoring: No such file or directory
Feb 03 16:09:53 eugene systemd-vconsole-setup[173]: /usr/bin/setfont failed with error code 71.
Feb 03 16:09:53 eugene systemd-vconsole-setup[173]: PIO_UNIMAPCLR: Input/output error
Feb 03 16:09:53 eugene systemd-vconsole-setup[173]: Setting fonts failed with a "system error", ignoring.
Feb 03 16:09:53 eugene dracut-cmdline[189]: dracut- dracut-044-10.2 dracut-044-10.2
...
Feb 03 16:09:53 eugene dracut-cmdline[189]: Using kernel command line parameters: rd.driver.pre=nvidia rd.driver.pre=nvidia_modeset rd.driver.pre=nvidia_uvm rd.driver.pre=nvidia_drm resume=UUID=ac985b5e-fb17-442f-b6ac-24c1c53e5bff root=UUID=3c6e7faf-093a-49df-83db-ca247620f093 rootfstype=ext4 rootflags=rw,relatime,data=ordered BOOT_IMAGE=/vmlinuz-4.14.15-1-default root=UUID=3c6e7faf-093a-49df-83db-ca247620f093 resume=/dev/sda2 splash quiet showopts nvidia-drm.modeset=1
Feb 03 16:09:53 eugene kernel: ipmi message handler version 39.2
Feb 03 16:09:53 eugene kernel: ipmi device interface
Feb 03 16:09:53 eugene kernel: nvidia: loading out-of-tree module taints kernel.
Feb 03 16:09:53 eugene kernel: nvidia: module license 'NVIDIA' taints kernel.
Feb 03 16:09:53 eugene kernel: Disabling lock debugging due to kernel taint
Feb 03 16:09:53 eugene kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 247
Feb 03 16:09:53 eugene kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
Feb 03 16:09:53 eugene kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 390.25 Wed Jan 24 20:02:43 PST 2018 (using threaded interrupts)
Feb 03 16:09:53 eugene kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 390.25 Wed Jan 24 19:29:37 PST 2018
Feb 03 16:09:53 eugene kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 246
Feb 03 16:09:53 eugene kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Feb 03 16:09:54 eugene kernel: resource sanity check: requesting [mem 0x000c0000-0x000fffff], which spans more than PCI Bus 0000:00 [mem 0x000c0000-0x000dffff window]
Feb 03 16:09:54 eugene kernel: caller _nv001170rm+0xe3/0x1d0 [nvidia] mapping multiple BARs
Feb 03 16:09:54 eugene kernel: nvidia-modeset: Allocated GPU:0 (GPU-840f2135-d35a-2aa8-4af4-02d87a8f3fc8) @ PCI:0000:01:00.0
Feb 03 16:09:54 eugene kernel: [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
Feb 03 16:09:54 eugene kernel: [drm] No driver support for vblank timestamp query.
Feb 03 16:09:54 eugene kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
Feb 03 16:09:54 eugene systemd[1]: Started dracut pre-udev hook.
-- Subject: Unit dracut-pre-udev.service has finished start-up
-- Defined-By: systemd
-- Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit dracut-pre-udev.service has finished starting up.
--
-- The start-up result is done.
...
Feb 03 16:09:54 eugene systemd[1]: Started Show Plymouth Boot Screen.
-- Subject: Unit plymouth-start.service has finished start-up
...
Feb 03 16:10:00 eugene kernel: audit: type=1400 audit(1517692200.420:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="ping" pid=446 comm="apparmor_parser"
Feb 03 16:10:00 eugene kernel: audit: type=1400 audit(1517692200.708:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="klogd" pid=456 comm="apparmor_parser"
Feb 03 16:10:00 eugene systemd-udevd[447]: invalid key/value pair in file /etc/udev/rules.d/99-usb-serial.rules on line 1, starting at character 42 (',')
Feb 03 16:10:00 eugene systemd-udevd[447]: invalid key/value pair in file /etc/udev/rules.d/99-usb-serial.rules on line 2, starting at character 25 (',')
Feb 03 16:10:00 eugene systemd[1]: Started udev Kernel Device Manager.
...
Feb 03 16:10:01 eugene kernel: ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x000000000000042C-0x000000000000042D (\GP2C) (20170728/utaddress-247)
Feb 03 16:10:01 eugene kernel: ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
Feb 03 16:10:01 eugene kernel: lpc_ich: Resource conflict(s) found affecting gpio_ich
...
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sdb [SAT], WDC WD2002FAEX-007BA0, S/N:WD-WMAY02244308, WWN:5-0014ee-6012656ea, FW:05.01D05, 2.00 TB
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sdb [SAT], found in smartd database: Western Digital Caviar Black
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sdb [SAT], state read from /var/lib/smartmontools/smartd.WDC_WD2002FAEX_007BA0-WD_WMAY02244308.ata.state
Feb 03 16:10:06 eugene smartd[1829]: Monitoring 2 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sda [SAT], 16 Currently unreadable (pending) sectors
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sda [SAT], 16 Offline uncorrectable sectors
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sda [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 55 to 56
Feb 03 16:10:06 eugene smartd[1829]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 45 to 44
It looks like perhaps my main hard drive might be dying, which is no good.
But this also shows what you said: the nvidia drivers are being loaded in the same second Plymouth is being started. I find it hard to believe I couldn’t insert a short pause in there, maybe a second or two, either by writing a udev rule that executes a script or by modifying the systemd plymouth-start.service file. Alternatively, plymouth-start.service could try calling an nVidia API (or whatever we call them in Linux) and wait until that call succeeds before continuing with starting Plymouth.
Granted, this would be a specific fix just for my system, but it’d make me happy…
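A rough sketch of the "wait until the driver is ready" idea, assuming the readiness signal is simply a device node such as /dev/dri/card0 appearing. The function name wait_for_path and the script path in the comment are hypothetical, and a node existing is not the same as udev having finished processing it, so treat this as an experiment rather than a fix.

```shell
# wait_for_path PATH TIMEOUT_SECONDS
# Poll once per second until PATH exists.
# Returns 0 as soon as PATH is there, 1 if the timeout runs out first.
# (wait_for_path is a hypothetical helper, not part of Plymouth or systemd.)
wait_for_path() {
    path=$1
    remaining=$2
    while [ ! -e "$path" ]; do
        if [ "$remaining" -le 0 ]; then
            return 1
        fi
        sleep 1
        remaining=$(( remaining - 1 ))
    done
    return 0
}

# Possible use from a small wrapper script (path is an assumption), called
# via an ExecStartPre= line in a drop-in for plymouth-start.service:
#   wait_for_path /dev/dri/card0 5
```

Keep in mind that the early Plymouth start happens inside the initramfs, so the script and the drop-in would also have to be included there for this to have any effect.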
It was a 3TB Seagate. Never had good luck with Seagates. Got a WD Gold ordered, 4TB, Enterprise class. Once it arrives, I’ll install an OS and then try to recover some data. I had /home on its own partition, and seeing how grub wouldn’t even execute, I’m hoping the beginning of the hard drive was damaged and not so much the later sections.
Unfortunately, I’m very sick, lost a lot of weight real quick like, don’t have much energy, and haven’t made a backup in a while. I was working on some stuff and like an idiot, just kept putting off the backup until I got something a bit more stable. Hopefully I can recover it. Until then, we’ll have to put this on pause.
Just wanted to update. I put CentOS 7 on, and I find it odd that, if this is a race condition, other distributions don’t seem to have any trouble loading Plymouth. I believe even non-Tumbleweed OpenSuSE shows the Plymouth bootup splash screen correctly.
Not sure it’s really a race condition that’s causing this problem.
I never got it working; I’m not sure how to add those things (e.g. nvidia_uvm).
I disabled plymouth with [plymouth.enable=0](https://en.opensuse.org/SDB:NVIDIA_the_hard_way#Kernel_mode-setting) and actually like seeing that it’s “wicked m” doing something rather than a generic “something is happening” screen.
That said, I’m still interested in using a virtual console with a higher resolution.
Let us know if you ever figure it out. :-]
I can reliably reproduce this problem if I start the VM with cold caches (after “echo 3 > /proc/sys/vm/drop_caches”). If I compare the plymouth traces from the working and non-working cases, I have:
text splash (non-working):
[ply-device-manager.c:815] create_devices_from_udev:Timeout elapsed, looking for devices from udev
[ply-device-manager.c:302] create_devices_for_subsystem:creating objects for drm devices
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/drm/card0
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/drm/card0/card0-Virtual-1
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/virtual/drm/ttm
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:302] create_devices_for_subsystem:creating objects for frame buffer devices
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/graphics/fb0
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/virtual/graphics/fbcon
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:823] create_devices_from_udev:Creating non-graphical devices, since there's no suitable graphics hardware
...
[ply-device-manager.c:365] on_udev_event:got add event for device card0
[ply-device-manager.c:377] on_udev_event:ignoring since we're already using text splash for local console
[ply-device-manager.c:365] on_udev_event:got add event for device fb0
[ply-device-manager.c:381] on_udev_event:ignoring since we only handle subsystem graphics devices after timeout
[ply-device-manager.c:365] on_udev_event:got add event for device card0-Virtual-1
[ply-device-manager.c:377] on_udev_event:ignoring since we're already using text splash for local console
Frame-buffer splash (working):
[ply-device-manager.c:815] create_devices_from_udev:Timeout elapsed, looking for devices from udev
[ply-device-manager.c:302] create_devices_for_subsystem:creating objects for drm devices
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/drm/card0
[ply-device-manager.c:326] create_devices_for_subsystem:device is initialized
[ply-device-manager.c:335] create_devices_for_subsystem:found node /dev/dri/card0
[ply-device-manager.c:229] create_devices_for_udev_device:device subsystem is drm
[ply-device-manager.c:244] create_devices_for_udev_device:found DRM device /dev/dri/card0
[ply-device-manager.c:247] create_devices_for_udev_device:forcing use of framebuffer for cirrusdrmfb
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/drm/card0/card0-Virtual-1
[ply-device-manager.c:326] create_devices_for_subsystem:device is initialized
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/virtual/drm/ttm
[ply-device-manager.c:342] create_devices_for_subsystem:it's not initialized
[ply-device-manager.c:302] create_devices_for_subsystem:creating objects for frame buffer devices
[ply-device-manager.c:319] create_devices_for_subsystem:found device /sys/devices/pci0000:00/0000:00:01.0/graphics/fb0
[ply-device-manager.c:326] create_devices_for_subsystem:device is initialized
[ply-device-manager.c:335] create_devices_for_subsystem:found node /dev/fb0
[ply-device-manager.c:229] create_devices_for_udev_device:device subsystem is graphics
[ply-device-manager.c:251] create_devices_for_udev_device:found frame buffer device /dev/fb0
[ply-device-manager.c:699] create_devices_for_terminal_and_renderer_type:creating devices for /dev/fb0 (renderer type: 2) (terminal: /dev/tty7)
What happens here is that plymouth enumerates the graphical devices before udev has finished processing them, so plymouth ignores them and falls back to text mode. Later, when udev completes initialization, it sends an event, but plymouth ignores it because it has already selected the splash backend.
This is a clear bug in plymouth. It should wait for udev to finish initialization.
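To check which of the two cases a given boot hit, something like the sketch below can be run against a saved trace. Both function names are made up for illustration, and the strings they search for are taken straight from the traces above; where the trace file ends up on your system is not something this sketch assumes.

```shell
# splash_backend LOGFILE
# Print "text" if the trace shows the fallback to non-graphical devices,
# "graphical" if a DRM or frame buffer device node was picked up,
# and "unknown" otherwise. (Hypothetical helper.)
splash_backend() {
    log=$1
    if grep -q "Creating non-graphical devices" "$log"; then
        echo text
    elif grep -Eq "found (DRM|frame buffer) device /dev/" "$log"; then
        echo graphical
    else
        echo unknown
    fi
}

# count_uninitialized LOGFILE
# Count how many devices udev had not finished processing when plymouth
# enumerated them. (Hypothetical helper.)
count_uninitialized() {
    grep -c "it's not initialized" "$1"
}
```

In the non-working trace above, every device is reported as not initialized, whereas in the working one only the virtual ttm device is.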
Yes, I did that. If you read through the posts, you’ll see where I create an nvidia.conf dracut configuration file in /etc/dracut.conf.d. That file makes sure the nVidia modules get loaded in the initramfs.
There are a few ways to add the module parameters, like nvidia-drm.modeset=1. I created a special modprobe conf file. My dracut nvidia.conf file includes that special nVidia modprobe conf file, which makes sure the modules get loaded with the proper parameters.
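Since it was asked how to add things like nvidia_uvm, here is a sketch of what that pair of files could look like, written out by a small script. The function name and the file name 50-nvidia-kms.conf are made up for illustration (the poster's actual file names and contents are not shown in this thread); the module names are the ones from the kernel command line in the journal above.

```shell
# write_nvidia_initrd_conf ROOT
# Write a modprobe options file and a dracut config file under ROOT.
# Pass a scratch directory to inspect the result first, or "/" (as root)
# for the live system. (Hypothetical helper and file names.)
write_nvidia_initrd_conf() {
    root=${1%/}
    mkdir -p "$root/etc/modprobe.d" "$root/etc/dracut.conf.d" || return 1
    # Module option: enable kernel mode-setting in nvidia-drm.
    printf '%s\n' 'options nvidia-drm modeset=1' \
        > "$root/etc/modprobe.d/50-nvidia-kms.conf"
    # dracut: force the modules into the initramfs and ship the options
    # file alongside them. The spaces inside the quotes are required.
    cat > "$root/etc/dracut.conf.d/nvidia.conf" <<'EOF'
force_drivers+=" nvidia nvidia_modeset nvidia_uvm nvidia_drm "
install_items+=" /etc/modprobe.d/50-nvidia-kms.conf "
EOF
}

# Usage (as root), then rebuild the initramfs and check the result:
#   write_nvidia_initrd_conf /
#   dracut -f
#   lsinitrd | grep nvidia
```

Writing into a scratch directory first makes it easy to look at the files before touching /etc.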
Thank you for submitting a bug. I put a 3 second delay in the systemd service file for plymouth-start or whatever it’s called, and that should have been more than enough time for the udev rules to finish. I was working on adding more time (15 seconds), but I can’t remember if I ever actually did that before the hard drive died or not. Anyway, thanks for submitting the bug report and showing us the trace.