[Tumbleweed, NVIDIA RTX 3060 Mobile] GPU and secondary NVMe stuck in D3cold state (cannot transition to D0)

I’m facing a power-state issue on my laptop involving both my NVIDIA RTX 3060 Mobile GPU (GA106M) and one of my NVMe SSDs. The laptop has two NVMe SSDs: a 500 GB drive running Windows 11 and a 1 TB drive running openSUSE Tumbleweed. Under Windows 11 both SSDs are fully recognised and the NVIDIA GPU works fine. On openSUSE Tumbleweed, however, only the 1 TB drive (the one openSUSE is installed on) is visible.
The second NVMe drive and the NVIDIA GPU are inaccessible: both appear to be stuck in PCIe power state D3cold and fail to transition back to D0 (the active state). I wasn’t sure whether to open separate threads for the SSD and the GPU, but since both seem to have the same problem I opened only one. I should also mention that the laptop has a second, integrated GPU (Intel UHD Graphics) and that Fast Startup is disabled in Windows.

Hardware and System Info:
Laptop: Casper Excalibur G900 (2021)
CPU: Intel Core i7-11800H
GPU: NVIDIA RTX 3060 Mobile (GA106M)
Kernel: Linux 6.17.5-1-default (Tumbleweed rolling), at the time of writing
DE: KDE Plasma 6.5 (Wayland)
Boot: UEFI (Secure Boot disabled)

The first sign of trouble appears early in the boot log where dmesg repeatedly reports messages like:
nvme 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
nvidia 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
Later the NVIDIA driver gives up with:
NVRM: The NVIDIA GPU 0000:01:00.0 has fallen off the bus and is not responding to commands.
nvidia 0000:01:00.0: probe with driver nvidia failed with error -1
NVRM: None of the NVIDIA devices were initialized.
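
For anyone wanting to cross-check these messages, here is a minimal sketch that reads the kernel's own view of each device's power state from sysfs. It assumes a kernel recent enough to expose the `power_state` attribute under `/sys/bus/pci/devices/<address>/`. The function name `pci_power_state` and the `SYSFS_ROOT` override are made up for this example; the override only exists so the function can be pointed at a test directory.

```shell
# Print the PCI power state and runtime-PM policy for each address given.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
pci_power_state() {
    root="${SYSFS_ROOT:-/sys}"
    for addr in "$@"; do
        dev="$root/bus/pci/devices/$addr"
        # power_state holds values such as D0, D3hot or D3cold
        if [ -r "$dev/power_state" ]; then
            state=$(cat "$dev/power_state")
        else
            state="unknown"
        fi
        # power/control is "auto" (runtime PM allowed) or "on" (kept awake)
        if [ -r "$dev/power/control" ]; then
            control=$(cat "$dev/power/control")
        else
            control="unknown"
        fi
        printf '%s power_state=%s runtime_pm=%s\n' "$addr" "$state" "$control"
    done
}
```

For the two devices from the log this would be called as `pci_power_state 0000:01:00.0 0000:02:00.0`. If a device really is stuck, the first field should read `D3cold`.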

My current GRUB kernel params:
GRUB_CMDLINE_LINUX_DEFAULT="splash=silent resume=/dev/disk/by-uuid/52f1bd57-902f-4eb6-ab82-957eb3707007 quiet security=selinux selinux=1 mitigations=auto rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvme_core.default_ps_max_latency_us=0"

sudo lspci -nnk | grep -A3 -E "NVIDIA|NVMe" output:

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GA106M [GeForce RTX 3060 Mobile / Max-Q] [10de:2520] (rev a1)
        Subsystem: QUANTA Computer Inc Device [152d:1356]
        Kernel modules: nouveau, nvidia_drm, nvidia
02:00.0 Non-Volatile memory controller [0108]: Micron/Crucial Technology P2 [Nick P2] / P3 / P3 Plus NVMe PCIe SSD (DRAM-less) [c0a9:540a] (rev 01)
        Subsystem: Micron/Crucial Technology P2 [Nick P2] / P3 / P3 Plus NVMe PCIe SSD (DRAM-less) [c0a9:540a]
        Kernel modules: nvme
03:00.0 Non-Volatile memory controller [0108]: Micron/Crucial Technology P3 Plus NVMe PCIe SSD (DRAM-less) [c0a9:5421] (rev 01)
        Subsystem: Micron/Crucial Technology Device [c0a9:5021]
        Kernel driver in use: nvme
        Kernel modules: nvme
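
Note that 01:00.0 and 02:00.0 have no "Kernel driver in use" line in the output above, while 03:00.0 does. A rough sketch for confirming the same thing straight from sysfs follows; it relies on the `driver` symlink that the kernel creates in a device's sysfs directory once a driver is bound. The function name `pci_driver_bound` and the `SYSFS_ROOT` override are hypothetical and only there for illustration and testing.

```shell
# Report which driver, if any, is bound to each PCI address given.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
pci_driver_bound() {
    root="${SYSFS_ROOT:-/sys}"
    for addr in "$@"; do
        link="$root/bus/pci/devices/$addr/driver"
        if [ -L "$link" ]; then
            # The symlink target ends in the driver name, e.g. .../drivers/nvme
            printf '%s driver=%s\n' "$addr" "$(basename "$(readlink "$link")")"
        else
            printf '%s driver=none\n' "$addr"
        fi
    done
}
```

Called as `pci_driver_bound 0000:01:00.0 0000:02:00.0 0000:03:00.0`, this should agree with the lspci output if the probe failures above are the reason no driver is attached.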

I can provide full logs or any other data if needed.

@wirelover Hi and welcome to the Forum :smile:
In the system BIOS, are the NVMe devices set to Intel VMD, RST, or AHCI?

Normally nvme_core.default_ps_max_latency_us=0 is added for funky controllers. Why was it added here?

Hi and thank you for welcoming me :grin:.
In BIOS, the storage controller is set to AHCI.
I added the parameter after seeing someone mention it in another forum (I can’t remember where exactly); they said it fixed a similar issue with their NVMe not waking up from D3cold. It didn’t make any difference in my case, though.
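
In case it helps rule out the parameter simply not being applied, here is a small sketch that checks both the booted kernel command line and the value the loaded nvme_core module reports. A changed GRUB_CMDLINE_LINUX_DEFAULT only takes effect after the GRUB configuration is regenerated and the machine is rebooted. The function name `nvme_latency_param` and the `CMDLINE_FILE` / `PARAM_FILE` overrides are made up for this example and exist only so it can be tested against sample files.

```shell
# Check whether nvme_core.default_ps_max_latency_us reached the kernel.
# CMDLINE_FILE and PARAM_FILE are hypothetical overrides for testing.
nvme_latency_param() {
    cmdline_file="${CMDLINE_FILE:-/proc/cmdline}"
    param_file="${PARAM_FILE:-/sys/module/nvme_core/parameters/default_ps_max_latency_us}"
    # Was the option present on the command line the kernel booted with?
    if grep -q 'nvme_core\.default_ps_max_latency_us=' "$cmdline_file" 2>/dev/null; then
        echo "cmdline: set"
    else
        echo "cmdline: not set"
    fi
    # What value does the loaded module actually report?
    if [ -r "$param_file" ]; then
        echo "module: $(cat "$param_file")"
    else
        echo "module: unavailable"
    fi
}
```

If this prints `cmdline: set` and `module: 0`, the parameter is active and the D3cold problem most likely lies elsewhere.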