Good day all,
For a while now I’ve been having some weird freezes/hangs/crashes in darktable (a RAW photo processor) on Tumbleweed when it uses OpenCL.
At first I thought it was an issue with darktable itself, then figured it was probably a broader OpenCL issue as it persisted through various updates of darktable.
But it also persisted through a number of graphics driver updates, Mesa updates and updates to other (possibly) related packages, to the point where I just couldn’t use my discrete GPU for photo editing anymore.
OpenCL in darktable does work on my iGPU (the RDNA 2 iGPU of my AMD Ryzen 7 7700). That’s not an option for daily use, though, as it’s much too slow; I just wanted to confirm the issue is specific to the Intel GPU.
OpenCL in darktable also works on Kubuntu 25.10, on both my Intel Arc B580 and the AMD iGPU.
By the way, it doesn’t matter which version of darktable I use. I even compiled one from source and it had exactly the same issue.
So I think it’s safe to assume the issue is related to the kernel drivers (Xe/i915) for the Intel GPU in Tumbleweed.
Removing all darktable-related directories, files and config as well as the OpenCL-related packages, then rebooting and reinstalling everything, hasn’t fixed anything.
Searching the internet hasn’t yielded any similar cases, at least nothing recent and this specific, so I dug a little deeper.
sudo dmesg:
```
xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=95
[ 4870.511420] [ T80085] xe 0000:03:00.0: [drm] Xe device coredump has been created
[ 4870.511423] [ T80085] xe 0000:03:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
```
journalctl | grep 'drm':
```
Dec 28 20:13:01 Tumbleweed kernel: xe 0000:03:00.0: [drm] Xe device coredump has been deleted.
Dec 28 20:31:02 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=113
Dec 28 20:31:02 Tumbleweed kernel: xe 0000:03:00.0: [drm] Xe device coredump has been created
Dec 28 20:31:02 Tumbleweed kernel: xe 0000:03:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
Dec 28 21:31:40 Tumbleweed kernel: xe 0000:03:00.0: [drm] Xe device coredump has been deleted.
Dec 28 22:22:22 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=93
Dec 28 22:22:22 Tumbleweed kernel: xe 0000:03:00.0: [drm] Xe device coredump has been created
Dec 28 22:22:22 Tumbleweed kernel: xe 0000:03:00.0: [drm] Check your /sys/class/drm/card0/device/devcoredump/data
Dec 28 22:38:26 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=30
Dec 28 22:48:33 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=30
```
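If it helps anyone tally how often this happens, here’s a minimal Python sketch that pulls the compute-engine ("ccs") resets out of saved journal lines. It assumes the line format exactly as it appears in the journal excerpt above; the function name `summarize_engine_resets` is just made up for illustration and isn’t part of any tool.

```python
import re
from collections import Counter

# Assumed format, copied from the journal excerpt, e.g.:
# "Dec 28 22:22:22 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0:
#  Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=93"
# " +" after the month allows for single-digit days padded with a space.
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3} +\d+ [\d:]{8}) .*?"
    r"Engine reset: engine_class=(?P<engine>\w+), "
    r"logical_mask: 0x[0-9a-fA-F]+, guc_id=(?P<guc_id>\d+)"
)


def summarize_engine_resets(lines):
    """Return ([(timestamp, engine_class, guc_id), ...], Counter of engine classes)."""
    events = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:  # non-reset lines (coredump created/deleted, etc.) are skipped
            events.append((m["stamp"], m["engine"], int(m["guc_id"])))
    per_engine = Counter(engine for _, engine, _ in events)
    return events, per_engine


# Demo on three lines taken from the excerpt; in practice you could pass an
# open file handle of saved journal output instead of a list.
SAMPLE = [
    "Dec 28 20:31:02 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=113",
    "Dec 28 20:31:02 Tumbleweed kernel: xe 0000:03:00.0: [drm] Xe device coredump has been created",
    "Dec 28 22:38:26 Tumbleweed kernel: xe 0000:03:00.0: [drm] Tile0: GT0: Engine reset: engine_class=ccs, logical_mask: 0x1, guc_id=30",
]
events, per_engine = summarize_engine_resets(SAMPLE)
print(events)      # [('Dec 28 20:31:02', 'ccs', 113), ('Dec 28 22:38:26', 'ccs', 30)]
print(per_engine)  # Counter({'ccs': 2})
```

The dmesg line without a date prefix won’t match this pattern; the regex is deliberately tied to the journal format only.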
So I had a look at the coredump, which contained, among other things, the following (the GuC version and kernel timestamp lines are the ones that caught my eye):
```
***** GuC Log *****
GuC firmware: xe/bmg_guc_70.bin
GuC version: 70.55.3 (wanted 70.49.4)
Kernel timestamp: 0x46E27177B19 [4871148763929]
GuC timestamp: 0x15E17DA2E6 [93977420518]
```
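To make the "found vs. wanted" comparison explicit, here’s a small Python sketch that parses those two coredump lines. It assumes the line formats shown in the excerpt; `parse_guc_info` is a hypothetical helper name. It also assumes the kernel timestamp is in nanoseconds since boot, which I haven’t confirmed, although the result (about 4871 s) lines up with the dmesg coredump entry at 4870.51 s. It only shows that the loaded GuC is a newer minor release of the same major version; whether that matters is exactly the open question.

```python
import re

# Assumed line formats, copied from the devcoredump excerpt:
#   "GuC version: 70.55.3 (wanted 70.49.4)"
#   "Kernel timestamp: 0x46E27177B19 [4871148763929]"
VERSION_RE = re.compile(
    r"GuC version: (\d+)\.(\d+)\.(\d+) \(wanted (\d+)\.(\d+)\.(\d+)\)"
)
KTIME_RE = re.compile(r"Kernel timestamp: 0x([0-9A-Fa-f]+) \[(\d+)\]")


def parse_guc_info(text):
    """Return (found, wanted, kernel_timestamp); versions are int tuples, None if absent."""
    v = VERSION_RE.search(text)
    k = KTIME_RE.search(text)
    found = tuple(int(x) for x in v.groups()[:3]) if v else None
    wanted = tuple(int(x) for x in v.groups()[3:]) if v else None
    ktime = int(k.group(1), 16) if k else None
    # Sanity check: the hex value and the bracketed decimal should agree.
    if k and ktime != int(k.group(2)):
        raise ValueError("hex and decimal kernel timestamps disagree")
    return found, wanted, ktime


dump = """GuC firmware: xe/bmg_guc_70.bin
GuC version: 70.55.3 (wanted 70.49.4)
Kernel timestamp: 0x46E27177B19 [4871148763929]"""

found, wanted, ktime = parse_guc_info(dump)
print("found", found, "wanted", wanted)      # found (70, 55, 3) wanted (70, 49, 4)
print("same major:", found[0] == wanted[0])  # same major: True
print("newer than wanted:", found > wanted)  # newer than wanted: True
# Assumption: nanoseconds since boot.
print(f"kernel timestamp ~ {ktime / 1e9:.1f} s")  # kernel timestamp ~ 4871.1 s
```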
I can only guess what it all means. A firmware mismatch? A kernel driver bug? Something else entirely? I won’t speculate any further, as this is way beyond my knowledge.
A quick search did suggest that similar issues may have occurred before and, if I’m interpreting it correctly, that the cause was (or is) a kernel bug.
That’s about as far as I’ve gotten after hours and hours of poking and prodding.
I hope someone here can make sense of this and maybe point me in the right direction. Honestly, I’m not even sure this is the right place to ask, since this is such a weird problem.
From where I’m sitting, and given my limited knowledge, this could be a kernel bug that affects multiple Linux distros and various Intel GPUs, but is so specific that it has largely gone unnoticed.
It could just as well be a firmware bug, I guess. The log entries do seem to point at something firmware-related, right?
I wasn’t going to speculate any more…
Anyway, if anybody has a clue I’d love to hear about it.
If I’m in the wrong place asking this question, I apologize.
If more info is required, please let me know and I’ll do my best to get back to you asap.
```
System:
Kernel: 6.18.2-1-default arch: x86_64 bits: 64 compiler: gcc v: 15.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/boot/vmlinuz-6.18.2-1-default
root=/dev/mapper/system-root splash=silent quiet security=selinux
selinux=1 enforcing=1 amd_pstate.shared_mem=1 amd_pstate=active
acpi_enforce_resources=lax mitigations=auto
Desktop: KDE Plasma v: 6.5.4 tk: Qt v: N/A info: frameworks v: 6.21.0
wm: kwin_wayland tools: avail: xscreensaver vt: 3 dm: SDDM Distro: openSUSE
Tumbleweed 20251227
Graphics:
Device-1: Intel Battlemage G21 [Arc B580] driver: xe v: kernel arch: Xe2
process: TSMC n4 (4nm) built: 2024+ pcie: gen: 1 speed: 2.5 GT/s lanes: 1
ports: active: HDMI-A-2,HDMI-A-3 empty: DP-1, DP-2, DP-3, HDMI-A-1,
HDMI-A-4 bus-ID: 03:00.0 chip-ID: 8086:e20b class-ID: 0300
Display: wayland server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.8
compositor: kwin_wayland driver: X: loaded: modesetting unloaded: vesa
alternate: fbdev,intel dri: iris gpu: xe d-rect: 3840x1080 display-ID: 0
Monitor-1: HDMI-A-2 pos: right model: LG (GoldStar) IPS FULLHD built: 2014
res: mode: 1920x1080 hz: 60 scale: 100% (1) dpi: 102 gamma: 1.2
size: 480x270mm (18.9x10.63") diag: 551mm (21.7") ratio: 16:9 modes:
max: 1920x1080 min: 720x400
Monitor-2: HDMI-A-3 pos: primary,left model: Denon DENON-AVR
serial: <filter> built: 2022 res: mode: 1920x1080 hz: 60 scale: 100% (1)
dpi: 61 gamma: 1.2 size: 1600x900mm (62.99x35.43") diag: 1836mm (72.3")
ratio: 16:9 modes: max: 3840x2160 min: 720x400
API: EGL v: 1.5 hw: drv: intel iris platforms: device: 0 drv: iris
device: 1 drv: swrast gbm: drv: iris surfaceless: drv: iris wayland:
drv: iris x11: drv: iris
API: OpenGL v: 4.6 compat-v: 4.5 vendor: intel mesa v: 25.3.1 glx-v: 1.4
direct-render: yes renderer: Mesa Intel Arc B580 Graphics (BMG G21)
device-ID: 8086:e20b memory: 11.65 GiB unified: no display-ID: :1.0
API: Vulkan v: 1.4.335 layers: 4 device: 0 type: discrete-gpu name: Intel
Arc B580 Graphics (BMG G21) driver: mesa intel v: 25.3.1
device-ID: 8086:e20b surfaces: N/A device: 1 type: cpu name: llvmpipe
(LLVM 21.1.6 256 bits) driver: mesa llvmpipe v: 25.3.1 (LLVM 21.1.6)
device-ID: 10005:0000 surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: amd-smi, lact, radeontop
wl: wayland-info x11: xdpyinfo, xprop, xrandr
```
```
03:00.0 VGA compatible controller: Intel Corporation Battlemage G21 [Arc B580]
Subsystem: Device 172f:4215
Kernel driver in use: xe
Kernel modules: xe
```
```
03:00.0 VGA compatible controller [0300]: Intel Corporation Battlemage G21 [Arc B580] [8086:e20b] (prog-if 00 [VGA controller])
Subsystem: Device [172f:4215]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin ? routed to IRQ 103
IOMMU group: 15
Region 0: Memory at f5000000 (64-bit, non-prefetchable) [size=16M]
Region 2: Memory at f800000000 (64-bit, prefetchable) [size=16G]
Expansion ROM at f6000000 [disabled] [size=2M]
Capabilities: [40] Vendor Specific Information: Intel Capabilities v1
CapA: Peg60Dis- Peg12Dis- Peg11Dis- Peg10Dis- PeLWUDis- DmiWidth=x4
EccDis- ForceEccEn- VTdDis- DmiG2Dis- PegG2Dis- DDRMaxSize=Unlimited
1NDis- CDDis- DDPCDis- X2APICEn- PDCDis- IGDis- CDID=0 CRID=0
DDROCCAP- OCEn- DDRWrtVrefEn+ DDR3LEn+
CapB: ImguDis- OCbySSKUCap- OCbySSKUEn- SMTCap- CacheSzCap 0x0
SoftBinCap- DDR3MaxFreqWithRef100=Disabled PegG3Dis-
PkgTyp- AddGfxEn- AddGfxCap- PegX16Dis- DmiG3Dis- GmmDis-
DDR3MaxFreq=2932MHz LPDDR3En-
Capabilities: [70] Express (v2) Endpoint, IntMsgNum 0
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0W TEE-IO-
DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 256 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, LnkDisable- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- FltModeDis-
LnkSta: Speed 2.5GT/s, Width x1
TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range B, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt+ EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
AtomicOpsCtl: ReqEn-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5GT/s, Crosslink- Retimer- 2Retimers- DRS-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable+ 64bit+
Address: 00000000fee00000 Data: 0000
Masking: 00000000 Pending: 00000000
Capabilities: [d0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [110 v1] Null
Capabilities: [200 v1] Address Translation Service (ATS)
ATSCap: Invalidate Queue Depth: 00
ATSCtl: Enable+, Smallest Translation Unit: 00
Capabilities: [420 v1] Physical Resizable BAR
BAR 2: current size: 16GB, supported: 256MB 512MB 1GB 2GB 4GB 8GB 16GB
Capabilities: [400 v1] Latency Tolerance Reporting
Max snoop latency: 1048576ns
Max no snoop latency: 1048576ns
Kernel driver in use: xe
Kernel modules: xe
```