Hi
I am running an XMG Neo 15 (2019) = XNE15M19 = Tongfang GK5CP0Z with an “Intel Corporation UHD Graphics 630 (Mobile)” and a dGPU “NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)”.
Here’s lspci:
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation UHD Graphics 630 (Mobile)
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Wireless-AC 9560 [Jefferson Peak] (rev 10)
00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller (rev 10)
00:1b.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 (rev f0)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 (rev f0)
00:1d.5 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #14 (rev f0)
00:1e.0 Communication controller: Intel Corporation Device a328 (rev 10)
00:1f.0 ISA bridge: Intel Corporation Device a30d (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation TU106 High Definition Audio Controller (rev a1)
01:00.2 USB controller: NVIDIA Corporation TU106 USB 3.1 Host Controller (rev a1)
01:00.3 Serial bus controller [0c80]: NVIDIA Corporation TU106 USB Type-C Port Policy Controller (rev a1)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15)
I had been using prime-select to switch off the NVIDIA card, but it turned out that this only makes the Intel GPU do the rendering while the NVIDIA card stays powered.
So I installed bbswitch.
The card state output is unfortunately always the same:
# cat /proc/acpi/bbswitch
0000:01:00.0 ON
Writing a value has no effect:
# tee /proc/acpi/bbswitch <<<OFF && cat /proc/acpi/bbswitch
OFF
0000:01:00.0 ON
When writing to /proc/acpi/bbswitch, journalctl reports:
kernel: bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF
Indeed nvidia is loaded:
# lsmod | grep nvidia
i2c_nvidia_gpu 16384 0
nvidia 18825216 9
ipmi_msghandler 65536 2 ipmi_devintf,nvidia
(This is after running prime-select intel, which successfully unloaded nvidia_drm and nvidia_uvm.)
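The "in use by driver 'nvidia'" message refers to the kernel driver bound to the PCI device, which can be read from sysfs independently of lsmod. A minimal sketch, assuming the usual sysfs layout; the function name bound_driver and the SYSFS_ROOT override are made up for illustration and are not part of bbswitch or any other tool:

```shell
# Print the kernel driver bound to a PCI device, or "none" if nothing is bound.
# SYSFS_ROOT is a hypothetical override (handy for dry runs); default is /sys.
bound_driver() {
    dev="$1"
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$dev/driver"
    if [ -L "$link" ]; then
        # The "driver" entry is a symlink whose last component is the driver name.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# PCI address of the dGPU as reported by bbswitch above.
bound_driver 0000:01:00.0
```

As long as this prints nvidia for 0000:01:00.0, bbswitch can be expected to keep refusing OFF, matching the journal message.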
Apparently the X server is holding the nvidia device files open:
# lsof | grep /dev/nvidia
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1000/gvfs
Output information may be incomplete.
X 2183 root 15u CHR 195,255 0t0 151 /dev/nvidiactl
X 2183 root 18u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 root 19u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2249 X:disk$0 root 15u CHR 195,255 0t0 151 /dev/nvidiactl
X 2183 2249 X:disk$0 root 18u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2249 X:disk$0 root 19u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2334 X:disk$0 root 15u CHR 195,255 0t0 151 /dev/nvidiactl
X 2183 2334 X:disk$0 root 18u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2334 X:disk$0 root 19u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2339 InputThre root 15u CHR 195,255 0t0 151 /dev/nvidiactl
X 2183 2339 InputThre root 18u CHR 195,0 0t0 18487 /dev/nvidia0
X 2183 2339 InputThre root 19u CHR 195,0 0t0 18487 /dev/nvidia0
But this does not seem to be the root cause, since bbswitch already fails to disable the dGPU during boot, long before Xorg is started.
Here is the relevant part of journalctl -b:
Jun 24 02:55:05 felicity kernel: nvidia: loading out-of-tree module taints kernel.
Jun 24 02:55:05 felicity kernel: nvidia: module license 'NVIDIA' taints kernel.
Jun 24 02:55:05 felicity kernel: Disabling lock debugging due to kernel taint
Jun 24 02:55:05 felicity kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
Jun 24 02:55:05 felicity kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 243
Jun 24 02:55:05 felicity kernel: nvidia 0000:01:00.0: enabling device (0000 -> 0003)
Jun 24 02:55:05 felicity kernel: nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
Jun 24 02:55:05 felicity systemd[1]: Reloading.
Jun 24 02:55:05 felicity kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module 430.26 Tue Jun 4 17:40:52 CDT 2019
Jun 24 02:55:05 felicity kernel: nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 241
Jun 24 02:55:05 felicity systemd[1]: Found device Samsung SSD 970 EVO 2TB 5.
Jun 24 02:55:05 felicity systemd[1]: Found device Samsung SSD 970 EVO 2TB BOOT.
Jun 24 02:55:05 felicity systemd[1]: Starting Cryptography Setup for cr_nvme-Samsung_SSD_970_EVO_2TB_S46ENB0M201744D-part5...
Jun 24 02:55:05 felicity kernel: nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms 430.26 Tue Jun 4 17:45:09 CDT 2019
Jun 24 02:55:05 felicity systemd-cryptsetup[604]: Set cipher aes, mode xts-plain64, key size 512 bits for device /dev/disk/by-uuid/3482b2b4-9af9-4969-8c49-c07d1364e06f.
Jun 24 02:55:05 felicity kernel: [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
Jun 24 02:55:05 felicity kernel: [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0
...
Jun 24 02:55:10 felicity kernel: bbswitch: device 0000:01:00.0 is in use by driver 'nvidia', refusing OFF
Jun 24 02:55:10 felicity kernel: bbswitch: Succesfully loaded. Discrete card 0000:01:00.0 is on
...
Jun 24 02:55:10 felicity kernel: nvidia-gpu 0000:01:00.3: enabling device (0000 -> 0002)
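The point of the excerpt above is the ordering: the nvidia module claims 0000:01:00.0 at 02:55:05, and bbswitch only loads at 02:55:10. A small filter makes that ordering easier to see in a full boot log. This is only a sketch; gpu_log_lines is a hypothetical helper name, and the pattern is simply the strings that appear in the log above:

```shell
# Keep only log lines that mention the NVIDIA driver or bbswitch.
# gpu_log_lines is a made-up helper name, not part of any tool.
gpu_log_lines() {
    grep -E 'nvidia|NVRM|bbswitch'
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -b -k --no-pager | gpu_log_lines
```

Lines are kept in their original order, so whichever of nvidia or bbswitch shows up first in the output was loaded first.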
What confuses me in /etc/modprobe.d is that there is both a “50-” prefixed and an unprefixed version of nvidia-default.conf:
root@felicity:/etc/modprobe.d # exa -l
Permissions Size User Date Modified Name
.rw-r--r-- 3,8k root 12 Jun 18:28 00-system.conf
.rw-r--r-- 1,2k root 14 Mär 16:58 10-unsupported-modules.conf
.rw-r--r-- 45 root 12 Jun 20:03 50-bbswitch.conf
.rw-r--r-- 5,0k root 14 Mär 16:58 50-blacklist.conf
.rw-r--r-- 128 root 12 Jun 18:40 50-bluetooth.conf
.rw-r--r-- 33 root 12 Jun 18:46 50-ipw2200.conf
.rw-r--r-- 34 root 12 Jun 18:46 50-iwl3945.conf
.rw-r--r-- 1,2k root 19 Jun 20:20 50-nvidia-default.conf
.rw-r--r-- 18 root 12 Jun 18:46 50-prism54.conf
.rw-r--r-- 668 root 12 Jun 18:28 60-blacklist_fs-adfs.conf
...
.rw-r--r-- 664 root 12 Jun 18:28 60-blacklist_fs-ufs.conf
.rw-r--r-- 47 root 14 Mär 16:58 99-local.conf
.rw-r--r-- 158 root 12 Jun 19:01 firewalld-sysctls.conf
.rw-r--r-- 18 root 12 Jun 17:06 nvidia-default.conf
.rw-r--r-- 674 root 5 Apr 10:49 tuned.conf
Here are the contents:
root@felicity:/etc/modprobe.d # cat 50-nvidia-default.conf
options nvidia NVreg_DeviceFileUID=0 NVreg_DeviceFileGID=480 NVreg_DeviceFileMode=0660
install nvidia PATH=$PATH:/bin:/usr/bin; if /sbin/modprobe --ignore-install nvidia; then if /sbin/modprobe nvidia_uvm; then if [ ! -c /dev/nvidia-uvm ]; then mknod -m 660 /dev/nvidia-uvm c $(cat /proc/devices | while read major device; do if [ "$device" == "nvidia-uvm" ]; then echo $major; break; fi ; done) 0; chown :video /dev/nvidia-uvm; fi; fi; if [ ! -c /dev/nvidiactl ]; then mknod -m 660 /dev/nvidiactl c 195 255; chown :video /dev/nvidiactl; fi; devid=-1; for dev in $(ls -d /sys/bus/pci/devices/*); do vendorid=$(cat $dev/vendor); if [ "$vendorid" == "0x10de" ]; then class=$(cat $dev/class); classid=${class%%00}; if [ "$classid" == "0x0300" -o "$classid" == "0x0302" ]; then devid=$((devid+1)); if [ ! -c /dev/nvidia${devid} ]; then mknod -m 660 /dev/nvidia${devid} c 195 ${devid}; chown :video /dev/nvidia${devid}; fi; fi; fi; done; /sbin/modprobe nvidia_drm; if [ ! -c /dev/nvidia-modeset ]; then mknod -m 660 /dev/nvidia-modeset c 195 254; chown :video /dev/nvidia-modeset; fi; fi
root@felicity:/etc/modprobe.d # cat nvidia-default.conf
blacklist nouveau
root@felicity:/etc/modprobe.d #
bbswitch is configured to disable the dGPU at module load (which it attempts according to journalctl -b):
root@felicity:/etc/modprobe.d # cat 50-bbswitch.conf
options bbswitch load_state=0 unload_state=1
I found various hints about ACPI kernel parameters and tried the following combinations, none of which made a difference:
acpi_osi=! acpi_osi=Linux
acpi_osi=! acpi_osi="Windows 2009"
acpi_osi=! acpi_osi="Windows 2013"
acpi_osi=! acpi_osi="Windows 2015"
acpi_osi=! acpi_osi="Windows 2017"
acpi_osi=! acpi_osi="Windows 2018"
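When testing such parameters it helps to confirm that they actually reached the running kernel, since a typo in the bootloader config would make a test meaningless. A rough sketch, assuming the values show up in /proc/cmdline the way they are written above (some bootloaders quote the whole parameter instead, which this pattern would not handle); acpi_osi_args is a hypothetical name:

```shell
# Print every acpi_osi=... argument found on stdin, one per line.
# Handles both bare values (acpi_osi=!) and quoted ones (acpi_osi="Windows 2015").
# acpi_osi_args is a made-up helper name.
acpi_osi_args() {
    grep -oE 'acpi_osi=("[^"]*"|[^ ]*)'
}

# Check the command line of the running kernel, if it is readable.
if [ -r /proc/cmdline ]; then
    acpi_osi_args < /proc/cmdline || echo "no acpi_osi arguments on the kernel command line"
fi
```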
I also tried adding “blacklist nvidia” to /etc/modprobe.d/50-nvidia-default.conf, but after that the system no longer booted: it hung indefinitely very early in the boot process. Luckily I have a btrfs setup and could roll back with snapper.
The BIOS/EC does not appear to have an option to disable the dGPU.
So this is where I’m at my wits’ end. Does anyone have an idea what else I could try?
Thank you so much in advance! This would finally give me proper battery life… I hope