Black screen when resuming from suspend or hibernation

The resume process gets stuck on a black screen after suspending or hibernating. I can get back to SDDM by pressing CTRL+F2, but so many things are broken at that point that the session is only good for saving my work:

  • NVIDIA GPU doesn’t work
  • WiFi and Bluetooth don’t work (this seems to be a separate, unrelated issue though; I might open a new thread later)
  • Cooling fans stay at high speed
  • System hangs when powering off or rebooting, so I have to force the laptop off

The issue seems to be with the NVIDIA driver. It went through 5 phases. There are threads on this forum about the same issue on NVIDIA hardware, but they’re very old.

  1. First install
    NVIDIA proprietary driver not installed, probably using Nouveau. Resuming from suspend or hibernation worked, but WiFi and Bluetooth could fail to work after resuming from hibernation.

  2. NVIDIA proprietary driver installed
    Using the defaults after installing the driver from the non-free X11 repositories. Rare random kernel panics. Resuming from suspend or hibernation still worked, but it increased the risk of a random kernel panic. Powering off or rebooting the system usually caused a kernel panic. Same WiFi and Bluetooth issue after resuming.

  3. After trying out different Optimus managers and installing open NVIDIA kernel module
    I tried several of the Optimus managers listed in the Arch Wiki: Bumblebee, EnvyControl, optimus-manager, PRIME and switcheroo-control. I ended up using PRIME, as it’s officially supported by NVIDIA and worked well. Resuming from suspend or hibernation was hit or miss: it either resumed successfully or hung at a black screen. No more kernel panics. Same WiFi and Bluetooth issue after resuming.

  4. After getting CUDA to work
    Switched the Optimus manager to switcheroo-control and the NVIDIA kernel module to the proprietary one, and also reset the modprobe settings for NVIDIA, as EnvyControl had created some in phase 3 that conflicted with CUDA. No random kernel panics, but the kernel could still panic when powering off or rebooting. Resuming from suspend worked well when the laptop was plugged in, but would hang on battery. Same WiFi and Bluetooth issue after resuming.

  5. Installing open module (current)
    I decided to use the open module since it’s open-source and I’m not required to enroll NVIDIA keys and re-encrypt my TPM-encrypted partitions on every driver update. CUDA still worked and NVIDIA worked well, but the hang-on-resume issue got worse, as it now also hangs when plugged in. Same WiFi and Bluetooth issue after resuming.

Right now, my NVIDIA modprobe options are:

options nvidia-drm modeset=1
options nvidia "NVreg_DynamicPowerManagement=0x03"

I’ve tried switching power management to different modes and using PRIME, but none of that changed the results.
With NVreg_DynamicPowerManagement set to 0x02, the kernel spammed this in the journal:

kernel: NVRM: nvCheckFailedNoLog: Check failed: pMemDesc->_pInternalMapping != NULL @ mem_desc.c:2260
kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mem_utils.c:574
kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES] (0x0000001A) returned from memmgrMemCopy(pMemoryManager, &sysSurface, &vidSurface, copySize, TRANSFER_FLAGS_PREFER_CE) @ fbsr_gm107.c:1156

On mode 0x03:

kernel: NVRM: kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70e000 returned garbage 0x0

Both of them seem to be about writing data from RAM back to VRAM.
The kernel also logged this after resuming from hibernation, which is related to the WiFi/Bluetooth driver:

[ T7353] mt7921e 0000:04:00.0: Message 00000010 (seq N) timeout
[ T7353] mt7921e 0000:04:00.0: Failed to get patch semaphore
(loop N from 12 to 6)
[ T7353] mt7921e 0000:04:00.0: chip reset failed

These are my system specs:

 neofetch
              _aaaymQQmwaaa,                 joseskvolpe@ProtoFOX 
          ,wWQQQD????????$QQQQa,.            -------------------- 
       _wQQB?"              ??QQQa,          OS: openSUSE Tumbleweed-Slowroll x86_64 
     sQQD^                      ?QQ6\        Host: Nitro AN515-47 V1.14 
    yWW'                          4QQg       Kernel: 6.9.5-1-default 
  ,QQD          .aaaaaaaa          ^4Q6      Uptime: 1 hour, 40 mins 
 ,mQP        _wWQW?????YWWQa,        4Qm     Packages: 3636 (rpm), 48 (flatpak) 
 jQ@        wWW?'        ^4QQc       ^$QL    Shell: bash 5.2.26 
,QQ'       jWW'            )QW\       ]QQ    Resolution: 1920x1080 
|QQ       ,QW'              ]QQ       ^QQ|   DE: Plasma 6.0.5 
|QQ       |QQ               ]QQ        QQ|   WM: kwin 
|QQ        4Qg              ]QQ       .QQ|   Theme: [Plasma], X-Vulpus-DarkRed [GTK2/3] 
'QQ6       '$WQac.         _QQ(       jQQ    Icons: [Plasma], Vulpinity [GTK2/3] 
 ]QQw        "?QWQQf      _mQP       ,QQ(    Terminal: yakuake 
  4QQga                  wQQP       ,mQ?     CPU: AMD Ryzen 5 7535HS with Radeon Graphics (12) @ 4.603GHz 
   4QQQga,            saQWP'       jQQf      GPU: AMD ATI Radeon 680M 
    ?QQQQQQwaaaaaaaayWWW?'       _mQ@'       GPU: NVIDIA GeForce RTX 3050 Mobile 
      ?WQQQP?9VWUV???^        _amQP^         Memory: 6505MiB / 15171MiB 
        "4QQQaa,          ,awQQQ?^
           "?VQQQQQQQQQQQQQQP?'                                      
                                                                     

nvidia-smi:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07              Driver Version: 550.90.07      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...    Off |   00000000:01:00.0 Off |                  N/A |
| N/A   41C    P8              5W /   60W |       2MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

lshw:

$ sudo lshw -short
H/W path            Device          Class          Description
==============================================================
                                    system         Nitro AN515-47 (0000000000000000)
/0                                  bus            Jimny_RBH
/0/0                                memory         128KiB BIOS
/0/4                                processor      AMD Ryzen 5 7535HS with Radeon Graphics
/0/4/5                              memory         384KiB L1 cache
/0/4/6                              memory         3MiB L2 cache
/0/4/7                              memory         16MiB L3 cache
/0/b                                memory         16GiB System Memory
/0/b/0                              memory         8GiB SODIMM Synchronous Unbuffered (Unregistered) 4800 MHz (0,2 ns)
/0/b/1                              memory         8GiB SODIMM Synchronous Unbuffered (Unregistered) 4800 MHz (0,2 ns)
/0/100                              bridge         Family 17h-19h PCIe Root Complex
/0/100/0.2                          generic        Family 17h-19h IOMMU
/0/100/1.1                          bridge         Family 17h-19h PCIe GPP Bridge
/0/100/1.1/0        /dev/fb0        display        GA107M [GeForce RTX 3050 Mobile]
/0/100/1.2                          bridge         Family 17h-19h PCIe GPP Bridge
/0/100/1.2/0        /dev/nvme0      storage        HFS512GEJ9X125N
/0/100/1.2/0/0      hwmon1          disk           NVMe disk
/0/100/1.2/0/2      /dev/ng0n1      disk           NVMe disk
/0/100/1.2/0/1      /dev/nvme0n1    disk           512GB NVMe disk
/0/100/1.2/0/1/1    /dev/nvme0n1p1  volume         497MiB Windows FAT volume
/0/100/1.2/0/1/2    /dev/nvme0n1p2  volume         15MiB reserved partition
/0/100/1.2/0/1/3    /dev/nvme0n1p3  volume         15EiB Windows FAT volume
/0/100/1.2/0/1/4    /dev/nvme0n1p4  volume         1023MiB Windows NTFS volume
/0/100/1.2/0/1/5    /dev/nvme0n1p5  volume         99MiB EFI partition
/0/100/1.2/0/1/6    /dev/nvme0n1p6  volume         263GiB EFI partition
/0/100/1.2/0/1/7    /dev/nvme0n1p7  volume         488MiB EFI partition
/0/100/1.2/0/1/8    /dev/nvme0n1p8  volume         107GiB LVM Physical Volume
/0/100/2.1                          bridge         Family 17h-19h PCIe GPP Bridge
/0/100/2.1/0        enp3s0          network        Killer E2600 GbE Controller
/0/100/2.2                          bridge         Family 17h-19h PCIe GPP Bridge
/0/100/2.2/0        wlp4s0          network        MT7922 802.11ax PCI Express Wireless Network Adapter
/0/100/2.4                          bridge         Family 17h-19h PCIe GPP Bridge
/0/100/2.4/0        /dev/nvme1      storage        KINGSTON SNV2S1000G
/0/100/2.4/0/0      hwmon2          disk           NVMe disk
/0/100/2.4/0/2      /dev/ng1n1      disk           NVMe disk
/0/100/2.4/0/1      /dev/nvme1n1    disk           1TB NVMe disk
/0/100/2.4/0/1/1    /dev/nvme1n1p1  volume         466GiB Linux filesystem partition
/0/100/2.4/0/1/2    /dev/nvme1n1p2  volume         465GiB Linux filesystem partition
/0/100/4.1                          bridge         Family 19h USB4/Thunderbolt PCIe tunnel
/0/100/8.1                          bridge         Family 17h-19h Internal PCIe GPP Bridge
/0/100/8.1/0        /dev/fb0        display        Rembrandt [Radeon 680M]
/0/100/8.1/0.1      card0           multimedia     Rembrandt Radeon High Definition Audio Controller
/0/100/8.1/0.1/0    input18         input          HD-Audio Generic HDMI/DP,pcm=3
/0/100/8.1/0.2                      generic        Family 19h PSP/CCP
/0/100/8.1/0.3                      bus            Rembrandt USB4 XHCI controller #3
/0/100/8.1/0.3/0    usb1            bus            xHCI Host Controller
/0/100/8.1/0.3/1    usb2            bus            xHCI Host Controller
/0/100/8.1/0.4                      bus            Rembrandt USB4 XHCI controller #4
/0/100/8.1/0.4/0    usb3            bus            xHCI Host Controller
/0/100/8.1/0.4/0/2  input12         input          USB OPTICAL MOUSE  Keyboard
/0/100/8.1/0.4/0/3                  communication  Wireless_Device
/0/100/8.1/0.4/1    usb4            bus            xHCI Host Controller
/0/100/8.1/0.5                      multimedia     ACP/ACP3X/ACP6x Audio Coprocessor
/0/100/8.1/0.6      card1           multimedia     Family 17h/19h HD Audio Controller
/0/100/8.1/0.6/0    input19         input          HD-Audio Generic Headphone
/0/100/8.3                          bridge         Family 17h-19h Internal PCIe GPP Bridge
/0/100/8.3/0                        bus            Rembrandt USB4 XHCI controller #8
/0/100/8.3/0/0      usb5            bus            xHCI Host Controller
/0/100/8.3/0/0/1                    multimedia     ACER HD User Facing
/0/100/8.3/0/1      usb6            bus            xHCI Host Controller
/0/100/8.3/0.3                      bus            Rembrandt USB4 XHCI controller #5
/0/100/8.3/0.3/0    usb7            bus            xHCI Host Controller
/0/100/8.3/0.3/1    usb8            bus            xHCI Host Controller
/0/100/8.3/0.4                      bus            Rembrandt USB4 XHCI controller #6
/0/100/8.3/0.4/0    usb9            bus            xHCI Host Controller
/0/100/8.3/0.4/1    usb10           bus            xHCI Host Controller
/0/100/8.3/0.6                      bus            Rembrandt USB4/Thunderbolt NHI controller #2
/0/100/14                           bus            FCH SMBus Controller
/0/100/14.3                         bridge         FCH LPC Bridge
/0/100/14.3/0                       system         PnP device PNP0c02
/0/100/14.3/1                       system         PnP device PNP0b00
/0/100/14.3/2                       generic        PnP device FUJ7401
/0/100/14.3/3                       system         PnP device PNP0c02
/0/100/14.3/4                       system         PnP device PNP0c01
/0/101                              bridge         Family 17h-19h PCIe Dummy Host Bridge
/0/102                              bridge         Family 17h-19h PCIe Dummy Host Bridge
/0/103                              bridge         Family 17h-19h PCIe Dummy Host Bridge
/0/104                              bridge         Family 17h-19h PCIe Dummy Host Bridge
/0/105                              bridge         Family 17h-19h PCIe Dummy Host Bridge
/0/106                              bridge         Rembrandt Data Fabric: Device 18h; Function 0
/0/107                              bridge         Rembrandt Data Fabric: Device 18h; Function 1
/0/108                              bridge         Rembrandt Data Fabric: Device 18h; Function 2
/0/109                              bridge         Rembrandt Data Fabric: Device 18h; Function 3
/0/10a                              bridge         Rembrandt Data Fabric: Device 18h; Function 4
/0/10b                              bridge         Rembrandt Data Fabric: Device 18h; Function 5
/0/10c                              bridge         Rembrandt Data Fabric: Device 18h; Function 6
/0/10d                              bridge         Rembrandt Data Fabric: Device 18h; Function 7
/1                  input0          input          AT Translated Set 2 keyboard
/2                  input1          input          Power Button
/3                  input11         input          ELAN050A:01 04F3:3158 Touchpad
/4                  input15         input          Acer Wireless Radio Control
/5                  input16         input          PC Speaker
/6                  input17         input          Acer WMI hotkeys
/7                  input2          input          Sleep Button
/8                  input3          input          Lid Switch
/9                  input4          input          Video Bus
/a                  input5          input          Video Bus
/b                  input9          input          ELAN050A:01 04F3:3158 Mouse

I forgot to mention that top shows nvidia-powerd at high CPU usage after resuming and pressing CTRL+F2 to force my session to show up.

According to Chapter 21 of NVIDIA’s driver documentation, I had to configure video memory allocation preservation. Since my system supports S0ix, I also enabled S0ix-based power management. I used these modprobe options:

options nvidia-drm modeset=1
options nvidia "NVreg_DynamicPowerManagement=0x03"
options nvidia "NVreg_PreserveVideoMemoryAllocations=1"

# s0ix
options nvidia "NVreg_EnableS0ixPowerManagement=1"
options nvidia "NVreg_S0ixPowerManagementVideoMemoryThreshold=1024"

Then I regenerated the initrd with:

sudo dracut -f --regenerate-all
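
After rebooting, it may be worth confirming that the driver actually picked up these options, since a stale initrd or a conflicting file under /etc/modprobe.d can silently override them. A minimal sketch, assuming the loaded driver exposes its parameters in /proc/driver/nvidia/params (the function name and its optional path argument are my own, not part of any NVIDIA tool):

```shell
# Print the power-management parameters reported by the loaded NVIDIA module.
# The optional argument lets the function read a saved copy of the file;
# by default it reads /proc/driver/nvidia/params (assumed to exist when the
# nvidia module is loaded).
nvidia_pm_params() {
    params_file="${1:-/proc/driver/nvidia/params}"
    # The trailing ":" keeps similarly named parameters out of the result.
    grep -E '^(DynamicPowerManagement|PreserveVideoMemoryAllocations|EnableS0ixPowerManagement|S0ixPowerManagementVideoMemoryThreshold|EnableGpuFirmware):' "$params_file"
}
```

Running `nvidia_pm_params` with no argument should list one line per parameter; if a value differs from what is in the modprobe file, the option probably isn't being applied.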

This fixed suspend: the system no longer hangs while resuming from suspend. That’s a huge step forward. But it still hangs on a black screen while resuming from hibernation, with nvidia-powerd using a lot of CPU.
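
Since the S0ix options should only matter when the kernel suspends through s2idle, checking the active suspend mode can help tell whether they are what made the difference. A small sketch, assuming the usual /sys/power/mem_sleep layout where the active mode is shown in square brackets (the function name and path argument are just for illustration):

```shell
# Print the suspend mode the kernel currently uses, i.e. the entry shown in
# square brackets in /sys/power/mem_sleep (for example "s2idle" or "deep").
# The optional argument lets the function read a saved copy of the file.
active_mem_sleep() {
    mem_sleep_file="${1:-/sys/power/mem_sleep}"
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$mem_sleep_file"
}
```

If this prints `deep` rather than `s2idle`, the system is suspending through S3 and the S0ix settings are probably not the part doing the work.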

According to Chapter 44, the open module currently doesn’t support preserving video memory across power management events. I fell back to the proprietary module, but I got the same results, so I returned to the open module.

I’ve also enabled GSP:

options nvidia-drm modeset=1
options nvidia "NVreg_DynamicPowerManagement=0x03"
options nvidia "NVreg_PreserveVideoMemoryAllocations=1"

# s0ix
options nvidia "NVreg_EnableS0ixPowerManagement=1"
options nvidia "NVreg_S0ixPowerManagementVideoMemoryThreshold=1024"

# gsp
options nvidia "NVreg_EnableGpuFirmware=1"

So, I checked the logs. The NVIDIA driver spams the following during the hibernation process (the same thing it used to spam during suspend), until it “completes”:

kernel: NVRM: nvCheckFailedNoLog: Check failed: pMemDesc->_pInternalMapping != NULL @ mem_desc.c:2260
kernel: NVRM: nvAssertFailedNoLog: Assertion failed: 0 @ mem_utils.c:574
kernel: NVRM: nvAssertOkFailedNoLog: Assertion failed: Ran out of a critical resource, other than memory [NV_ERR_INSUFFICIENT_RESOURCES] (0x0000001A) returned from memmgrMemCopy(pMemoryManager, &sysSurface, &vidSurface, copySize, TRANSFER_FLAGS_PREFER_CE) @ fbsr_gm107.c:1156
(spams the above a million times)
jun 30 01:29:19 ProtoFOX systemd[1]: nvidia-hibernate.service: Deactivated successfully.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ The unit nvidia-hibernate.service has successfully entered the 'dead' state.
jun 30 01:29:19 ProtoFOX systemd[1]: Finished NVIDIA system hibernate actions.
░░ Subject: Unit nvidia-hibernate.service has finished start-up
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░
░░ Unit nvidia-hibernate.service has finished starting up.
░░
░░ The start-up result is done.
jun 30 01:29:19 ProtoFOX systemd[1]: Starting System Hibernate..

Then, once resuming, it spams:

kernel: NVRM: kbusVerifyBar2_GM107: MMUTest BAR0 window offset 0x70e000 returned garbage 0x0
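
To get a sense of how many distinct NVRM messages there are and how often each one repeats, the kernel log can be condensed instead of scrolled through. A rough sketch (the function name is made up; it only relies on standard grep, sort and uniq):

```shell
# Collapse repeated NVRM kernel messages into "count message" lines,
# most frequent first. Reads log text on stdin.
summarize_nvrm() {
    grep -o 'NVRM: .*' | sort | uniq -c | sort -rn
}

# Typical use on the kernel log of the current boot:
#   journalctl -k -b -o cat | summarize_nvrm
```

For a boot that had to be forced off, `journalctl -k -b -1 -o cat` should give the previous boot’s kernel log instead, provided the journal is persistent.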

Does it happen with both X11 and Wayland?

Yes, it does

I noticed the HDMI output (connected to the NVIDIA GPU) is extremely laggy after using one of these options. The TV connected to it is set to 60 Hz, but glxgears gives 22 to 30 FPS.

What if you reset the system to the default configurations?

I removed the modprobe options: the HDMI output is still really laggy, and it now hangs again while resuming from suspend. So it’s not a modprobe option issue, most likely something else. I’ll try reinstalling PRIME, re-enabling it in offload mode, and see what happens.

Using PRIME in offload mode, the HDMI output works well, with no lag. Resuming from suspend also works; resuming from hibernation does not. But only Wayland works; X11 freezes on a black screen. Also, sometimes right after booting, SDDM gets really, really laggy while this is spammed to the journal:

kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 1
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to apply atomic modeset.  Error code: -22
kernel: [drm:nv_drm_atomic_commit [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Flip event timeout on head 0

In AMD mode, the HDMI output gets laggy again, with otherwise the same results as offload mode (including the occasional SDDM lag). Could it be that the default setup without PRIME renders the HDMI buffer on the iGPU, sends it to the NVIDIA GPU, and lags along the way? (The inverse, NVIDIA rendering to an AMD output, works well.) Anyway, as HDMI is connected to the NVIDIA GPU, it would be preferable for that output to render on NVIDIA, while the other outputs (built-in panel and USB-C DP) render on the AMD iGPU.

Checking /var/log/Xorg.4.log, this is what it logs while trying to start an X11 session in offload mode:

[  5548.092] (WW) NVIDIA(G0): Failed to set the display configuration
[  5548.092] (WW) NVIDIA(G0):  - Setting a mode on head 0 failed: Insufficient permissions
[  5548.092] (WW) NVIDIA(G0):  - Setting a mode on head 1 failed: Insufficient permissions
[  5548.092] (WW) NVIDIA(G0):  - Setting a mode on head 2 failed: Insufficient permissions
[  5548.092] (WW) NVIDIA(G0):  - Setting a mode on head 3 failed: Insufficient permissions
[  5548.099] (II) NVIDIA(GPU-0): Deleting GPU-0

@JoseskVolpe can you show the output from:

/sbin/lspci -nnk | grep -EA3 "VGA|Display|3D"

Sure

$ /sbin/lspci -nnk | grep -EA3 "VGA|Display|3D"
pcilib: Error reading /sys/bus/pci/devices/0000:00:08.3/label: Operation not permitted
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] [10de:25a2] (rev a1)
        Subsystem: Acer Incorporated [ALI] Device [1025:159e]
        Kernel driver in use: nvidia
        Kernel modules: nouveau, nvidia_drm, nvidia
--
75:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Rembrandt [Radeon 680M] [1002:1681] (rev 0b)
        Subsystem: Acer Incorporated [ALI] Device [1025:159e]
        Kernel driver in use: amdgpu
        Kernel modules: amdgpu

@JoseskVolpe that indicates, based on the bus ID, that Nvidia is primary and AMD secondary, but switcherooctl doesn’t show that? Also, have you connected an external monitor to the system at all?

Switcherooctl lists the AMD iGPU as GPU 0 and as the default. I also think it would make more sense for the integrated graphics to be primary, but I didn’t make the hardware, blame Acer xd

Yes, I connected a TV and an external monitor to the HDMI (NVIDIA) and USB-C Thunderbolt (AMD) ports.

@JoseskVolpe Thanks. Yes, I was interested in the fact that they are both VGA; that would indicate the setup can drive external screens from each respective card. If it were Display or 3D, then no (well, vGPU)… I need to check my AMD laptop, which says Display for the secondary card; likewise the Tesla P4 is 3D…
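
For anyone wanting to run the same VGA / Display / 3D check on their own machine, here is a small sketch that pulls just the bus address and class name out of `lspci -nn` output. It assumes the standard PCI class codes 0300 (VGA compatible controller), 0302 (3D controller) and 0380 (Display controller); the function name is invented for the example:

```shell
# Print "bus-address: class name" for every GPU-like PCI device found in
# `lspci -nn` style text on stdin (classes 0300, 0302 and 0380 only).
gpu_pci_classes() {
    awk '/\[03(00|02|80)\]:/ {
        bus = $1
        sub(/^[^ ]+ /, "")      # drop the bus address from the line
        split($0, parts, / \[03[0-9a-f][0-9a-f]\]: /)
        printf "%s: %s\n", bus, parts[1]
    }'
}

# Typical use:
#   /sbin/lspci -nn | gpu_pci_classes
```

On the output posted earlier in this thread, this would report both 01:00.0 and 75:00.0 as "VGA compatible controller".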


@JoseskVolpe See @jbouter’s output here https://forums.opensuse.org/t/extra-battery-drain-in-sleep-mode/176323/55 their system is VGA/3D Controller


So, if my GPU is not a true offload device, what should I do? lol
I just noticed the performance on my TV is bad again, and it’s in offload mode. Weird symptom, as it was fine before in this mode.

Maybe start 2 Wayland sessions, one on the iGPU and the other on the NVIDIA GPU? (A really, really weird setup, and I’m not sure it would work well; the sessions wouldn’t even be able to share application windows.)

@JoseskVolpe well, if nothing is connected, it is…