Any experience to share on eGPU enclosure + GPU use with laptop and openSUSE?

I am thinking of buying an external GPU (eGPU) enclosure, to hold something like an NVIDIA RTX 3070 graphics card, for use with my Lenovo ultrabook. I am considering procuring the eGPU for processing videos and also for playing with some AI functionality, where the applications use CUDA and can take advantage of an NVIDIA GPU to improve processing speed.

For example, there are some open-source apps that can upscale my old 240p and 480p videos to higher resolution, and for other video processing I use ffmpeg (which, if compiled with the right options, can also use the GPU via CUDA).

I appreciate a desktop PC is “THE WAY” to go for the fastest processing with a GPU, but I no longer plan to use a desktop PC; I only want to use a thin ultrabook. An ultrabook is best for 90% of my use, but Intel-based ultrabooks seriously lag in processing capability compared to a laptop or desktop PC with an NVIDIA GPU.

There are laptops with good GPU functionality (CUDA compatible), but they are large and bulky. That is definitely not for me. 90% of the time I want the light weight and slim form factor of an ultrabook, so for the 10% of the time (at home) when I want faster GPU processing, I like the idea of an eGPU enclosure. I don’t want a second laptop/PC for this functionality.

I am currently looking to possibly purchase an EXP GDC TH3P4G3 eGPU enclosure together with an NVIDIA RTX 3070 (8 GB VRAM) or RTX 3060 (12 GB VRAM) graphics card. Given the major bandwidth limitations of Thunderbolt 3/4 when connecting the ultrabook to the eGPU, I suspect the benefit of going with a much faster NVIDIA GPU would be lost to the bandwidth constraints of the external enclosure connection.

I do NOT plan to play video games with such a setup … I only want it for some AI video processing. For example, I read that even though kdenlive does NOT use CUDA directly in its processing, kdenlive does use ffmpeg, so if one’s ffmpeg is compiled with CUDA support, the kdenlive video editor will benefit from that.

So my question …

Any experience from openSUSE Leap users on using an eGPU with an NVIDIA card and a Thunderbolt-equipped laptop that you care to share? Aside from ensuring the eGPU enclosure has a sufficiently sized power supply (to power the NVIDIA GPU), that Thunderbolt is supported, and that any application used is both coded and compiled for CUDA support, is there anything else to be aware of?

I suspect I will need to compile my own version of ffmpeg (possibly rebuild/recompile a Packman or other packaged ffmpeg version if I can find the RPM spec file). Worst case, I would compile ffmpeg myself from source.
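
For what it is worth, I believe one can first check whether an existing ffmpeg build already has CUDA/NVENC support with something like the following (a quick sketch, assuming ffmpeg is already installed and on the PATH):

# list the hardware acceleration methods this ffmpeg build knows about
ffmpeg -hide_banner -hwaccels
# look for NVENC encoders (h264_nvenc / hevc_nvenc) and CUDA filters
ffmpeg -hide_banner -encoders | grep -i nvenc
ffmpeg -hide_banner -filters | grep -i cuda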

This is my current setup

remus@Dell-G16:~> inxi -G
Graphics:
  Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] driver: i915 v: kernel
  Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q] driver: nvidia v: 570.144
  Device-3: NVIDIA TU104GL [Quadro RTX 4000] driver: nvidia v: 570.144
  Device-4: NVIDIA TU104GL [Quadro RTX 4000] driver: nvidia v: 570.144
  Device-5: Microdia Integrated_Webcam_HD driver: uvcvideo type: USB
  Display: server: X.org v: 1.21.1.11 with: Xwayland v: 24.1.1 driver: X: loaded: nvidia
    failed: modesetting dri: iris,nouveau gpu: i915,nvidia,nvidia-nvswitch tty: 224x58 resolution:
    1: 3840x2160 2: 3840x2160 3: 3840x2160 4: 3840x2160 5: 3840x2160 6: 3840x2160 7: 2560x1600
  API: OpenGL Message: GL data unavailable in console. Try -G --display

So very doable.

Both Quadros are connected through eGPU enclosures (Sonnet 750) using a Thunderbolt 4 hub (OWC 5-Port Thunderbolt Hub). And as you can see, 6 x 4K monitors as well. The enclosures are Thunderbolt 3; I just could not find anything supporting the Quadros with Thunderbolt 4. I had to make that sacrifice if I wanted to get started with my 3D videos. Yes, I will upgrade in the future once all the hardware matches the same speeds. But looking at 6 x 4K monitors is awesome just by itself.

My setup is a couple of years old and Thunderbolt 5 is out now. I would look into same setup but with tb5 devices.

I only use this setup for Blender 3d. Very fast!!

My current issue right now is the NVIDIA update released just tonight (v570.144) that hosed my system: it is missing a user that is needed to run a service.

If you decide to move forward, the file below will save you a couple of weeks of internet searches.

/etc/X11/xorg.conf.d/80-igpu-primary-egpu-offload.conf

Section "Device"
    Identifier "Device0"
    Driver     "modesetting"
EndSection

Section "Device"
    Identifier "Device2"
    Driver     "nvidia"
    BusID      "PCI:07:00:0"                 # Edit according to lspci, translate from hex to decimal.
    Option     "AllowExternalGpus" "True"    # Required for proprietary NVIDIA driver.
EndSection

The BusID needs to be in DECIMAL, not HEX. In case you plan to have more than one eGPU, copy the same file under a different name and update the BusID for the second or third eGPU.
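
For example (a hypothetical bus address; yours will differ), translating the hex bus ID from lspci into the decimal form Xorg wants:

# lspci reports the eGPU bus address in hex, e.g. 0c:00.0
lspci | grep -i nvidia
# bash printf can do the hex-to-decimal translation: 0x0c -> 12
printf 'PCI:%d:%d:%d\n' 0x0c 0x00 0x0
# prints "PCI:12:0:0", which is what goes on the BusID line above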

Good luck!


Thanks. Interesting.

I have an older Lenovo X1 Carbon Generation-9 laptop which I plan to initially use (likely it will be another year or two before I replace this laptop).

My X1 Carbon Gen 9 laptop has an integrated Intel Iris Xe GPU. I believe that means I will be facing NVIDIA Optimus/Hybrid Graphics aspects. I fear that managing the switch between the integrated GPU and the external NVIDIA GPU (Optimus-like behavior) in Linux could be complex. Hence your comment on proper NVIDIA driver installation and your Xorg configuration file to ensure the NVIDIA card is used for the correct applications or as the primary display is most useful.

I read that Lenovo ThinkPads, including the X1 Carbon series, have BIOS settings for Thunderbolt security. I read that I might need to adjust this setting (e.g., set it to “No Security” or “User Authorization” if “DisplayPort & USB only” is too restrictive) to allow the eGPU to be fully recognized and utilized, and that this can be a common troubleshooting step for eGPU users.

PCIe Tunneling: I also read that some Lenovo discussions mention an option called “PCIe Tunneling” under the Thunderbolt settings. While it is typically recommended to be on for eGPU functionality, if one faces issues, some users have tried toggling it off and on in conjunction with other troubleshooting steps.

Hence, given the above, I read that I should go into my X1 Carbon Gen 9’s BIOS and explore the Thunderbolt settings. Purportedly I need to make sure the security level is set appropriately (often “User Authorization” or “No Security” is needed for eGPUs; I read I should consult my laptop’s manual or Lenovo support docs for the exact wording). Also, I read I should confirm “PCIe Tunneling” is enabled.

I also read, with regard to boltd (the Thunderbolt daemon), that the boltd service, which manages Thunderbolt devices, can sometimes crash or have issues in openSUSE, preventing the eGPU from being recognized. Hopefully that does not occur.
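
I also read that the bolt package ships a boltctl command line tool that can be used to check and authorize Thunderbolt devices if the BIOS security level requires it; a rough sketch (the UUID is a placeholder):

# list Thunderbolt devices and their authorization status
boltctl list
# permanently authorize (enroll) the eGPU enclosure by its UUID
sudo boltctl enroll --policy auto <device-uuid>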

This is also still far away. It will likely be some months before I proceed with the eGPU / GPU / power supply (probably 500 watt for the eGPU) purchase.

@oldcpu X11 is going bye-bye; if it is going to be months, you will be fine on Wayland. Everything works OTB these days with Intel and NVIDIA.

Or look at Intel Arc GPUs. I’m running an A310 (31 W) and an A380 (55 W) here on older hardware; compute, graphics and encode/decode all run fine.

You should be fine. Just connect your hardware properly per the instructions and boot it up. One step at a time. Once you see the eGPU in the “lspci” output, you should be fine. The rest is going to be software configuration.

That’s what I did with my dell laptop using default BIOS settings.

Good luck

Thank you for the encouragement.

I read that there could be challenges with hybrid graphics and Wayland (although I guess I will cross that bridge when I come to it). Things I read (and I do not know how accurate they are):

  • NVIDIA’s Proprietary Driver: NVIDIA’s proprietary drivers, often needed for optimal performance, haven’t always had perfect integration with Wayland and hybrid setups.

  • GPU Switching: Ensuring that the correct GPU is used for appropriate tasks, and that the rendering results are displayed correctly, can be tricky.

  • Performance Issues: Some users report performance degradation or glitches when using Wayland with NVIDIA hybrid graphics, especially with external monitors.

  • External Monitors: Connecting external monitors can complicate things, as they are often directly connected to the dedicated GPU, leading to rendering issues in Wayland.

For solutions and workarounds, I read:

  • prime-run tool: This tool, recommended by NVIDIA, allows one to run specific applications on the dedicated GPU while the rest of the system uses the integrated one.

  • Environment Variables: Purportedly setting environment variables like __NV_PRIME_RENDER_OFFLOAD=1 and __GLX_VENDOR_LIBRARY_NAME=“nvidia” can help, but their effectiveness can vary (see the sketch after this list).

  • DRM Kernel Mode Setting: Purportedly enabling DRM (Direct Rendering Manager) kernel mode setting in the Linux kernel is often necessary for Wayland to function correctly with NVIDIA.

  • BIOS Settings: As noted previously, some laptops offer BIOS options to switch between integrated and dedicated GPUs, which can help resolve issues.

  • EnvyControl: I had never heard of this before. Purportedly this tool provides an alternative way to manage hybrid graphics, including switching between GPUs.

  • KDE Plasma 6: KDE Plasma 6 purportedly includes some improvements for handling hybrid graphics, including support for AMD FreeSync.
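
For reference, a minimal sketch of the render-offload environment variables mentioned above (assuming the proprietary NVIDIA driver and glxinfo are installed; the application name is just an example):

# confirm the NVIDIA GPU is used for a single application (PRIME render offload)
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | grep "OpenGL renderer"
# the same idea for a real application
__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia blender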

I suspect I am over thinking this and I just need to go ahead, make the purchases and learn.

Even though retired, I never seem to find the time. I am currently juggling some other hobbies (in particular planning a lot of global travel, wanting to do this before I get too old to travel, and I tend to give that priority). So I suspect I am some months away, but I can’t help but think that if I were to proceed sooner, I would gain some time back from faster graphics processing.

@oldcpu that is old, out-of-date information/news…

Anything newer than Turing can use the OSS (MIT/GPL) NVIDIA driver, and it is the default in openSUSE from the OSS repository…

It depends on the hardware manufacturer’s setup. Many newer NVIDIA GPUs may only be offload devices, so on laptops the primary graphics device is what drives any external monitors.

Prime Render Offload uses the NVIDIA (or any other) GPU without user interaction on Wayland for compute and graphics. Only when you want to force Prime Render Offload for a particular application do you use switcherooctl. Prime is so last year, only for X11 and only for older Optimus setups.
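
For example (a rough sketch, assuming the switcheroo-control package is installed):

# list the GPUs switcheroo-control knows about and which one is the default
switcherooctl list
# launch one application on the non-default (discrete) GPU
switcherooctl launch blender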

As you can see in the nvtop output above, my system is using the Intel Arc (Device 1) for graphics and compute; likewise the RTX 4000 (Device 0) is being used for both as well.

I have Vulkan set to use the Nvidia GPU first as well via environment settings, likewise for libva applications.

 pinxi -Gxxz
Graphics:
  Device-1: Intel DG2 [Arc A380] vendor: ASRock driver: xe v: kernel
    arch: Xe-HPG pcie: speed: 2.5 GT/s lanes: 1 ports: active: DP-1,DP-2,DP-3
    empty: HDMI-A-1, HDMI-A-2, HDMI-A-3, HDMI-A-4 bus-ID: 04:00.0
    chip-ID: 8086:56a5
  Device-2: NVIDIA TU104GL [Quadro RTX 4000] vendor: Hewlett-Packard
    driver: nvidia v: 575.64 arch: Turing pcie: speed: 2.5 GT/s lanes: 16 ports:
    active: none empty: DP-4, DP-5, DP-6, Unknown-1 bus-ID: 07:00.0
    chip-ID: 10de:1eb1
  Display: wayland server: X.org v: 1.21.1.15 with: Xwayland v: 24.1.7
    compositor: gnome-shell v: 48.2 driver: X: loaded: modesetting,nvidia
    unloaded: vesa alternate: fbdev,intel,nouveau,nv gpu: xe d-rect: 5760x2160
    display-ID: 0
  Monitor-1: DP-1 pos: bottom-c model: Sceptre F24 res: 1920x1080 hz: 100
    dpi: 93 diag: 604mm (23.8")
  Monitor-2: DP-2 pos: primary,top-left model: Sceptre F24 res: 1920x1080
    hz: 60 dpi: 93 diag: 604mm (23.8")
  Monitor-3: DP-3 pos: top-right model: Sceptre F24 res: 1920x1080 hz: 100
    dpi: 93 diag: 604mm (23.8")
  API: EGL v: 1.5 platforms: device: 0 drv: nvidia device: 1 drv: iris
    device: 3 drv: swrast gbm: drv: iris surfaceless: drv: nvidia wayland:
    drv: iris x11: drv: iris inactive: device-2
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: intel mesa v: 25.1.3 glx-v: 1.4
    direct-render: yes renderer: Mesa Intel Arc A380 Graphics (DG2)
    device-ID: 8086:56a5 display-ID: :0.0
  API: Vulkan v: 1.4.313 surfaces: N/A device: 0 type: discrete-gpu
    driver: nvidia device-ID: 10de:1eb1 device: 1 type: discrete-gpu
    driver: mesa intel device-ID: 8086:56a5 device: 2 type: cpu
    driver: mesa llvmpipe device-ID: 10005:0000
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo gpu: gputop,
    intel_gpu_top, lsgpu, nvidia-settings, nvidia-smi wl: wayland-info
    x11: xdpyinfo, xprop, xrandr

I have no issues with Wayland here. I recently switched to the Xe driver for Intel; it’s performing better than the i915 driver…

I have no monitors connected to the Prime Render Offload devices here; that is not needed in order to use the power of any additional GPUs with Prime Render Offload.

I have two other setups: one with an Intel Arc and a Tesla P4 (that’s a vGPU, so no outputs whatsoever), and another with an Intel UHD 630 and a Quadro T400. Again, no issues…

@oldcpu what are the specs on your ultrabook? Is this a recent purchase?

My laptop is an old Lenovo X1 Carbon Generation 9 with a Core i7-1165G7 and 16 GB of RAM. I suspect I may have to repartition the SSD and re-install GNU/Linux, as I only allocated 25 GB to / (root) and have too much allocated to /home (819 GB allocated). It’s unclear to me whether I will need to use Tumbleweed or can stick with Leap 15.6.

Likely late next year I will purchase a new laptop, but I don’t want to wait 18 months.

My laptop:

oldcpu@lenovo:~> inxi -G -C -M -m -xx --graphics --cpu --memory
Machine:
  Type: Laptop System: LENOVO product: 20XW00A7TH v: ThinkPad X1 Carbon Gen 9
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: LENOVO model: 20XW00A7TH v: SDK0J40697 WIN
    serial: <superuser required> UEFI: LENOVO v: N32ET95W (1.71 )
    date: 10/24/2024
Memory:
  System RAM: available: 15.35 GiB used: 3.05 GiB (19.9%)
  RAM Report: permissions: Unable to run dmidecode. Root privileges
    required.
CPU:
  Info: quad core model: 11th Gen Intel Core i7-1165G7 bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 320 KiB L2: 5 MiB L3: 12 MiB
  Speed (MHz): avg: 1305 high: 4099 min/max: 400/4700 cores: 1: 400 2: 400
    3: 1946 4: 4099 5: 400 6: 1361 7: 1437 8: 400 bogomips: 44851
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] vendor: Lenovo
    driver: i915 v: kernel arch: Gen-12.1 ports: active: eDP-1 empty: DP-1,
    DP-2, DP-3, DP-4, HDMI-A-1, HDMI-A-2, HDMI-A-3 bus-ID: 00:02.0
    chip-ID: 8086:9a49
  Device-2: Chicony Integrated Camera driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 bus-ID: 3-4:3 chip-ID: 04f2:b6ea
  Display: x11 server: X.Org v: 1.21.1.11 with: Xwayland v: 24.1.1
    compositor: kwin_x11 driver: X: loaded: modesetting unloaded: fbdev,vesa
    alternate: intel dri: iris gpu: i915 display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1200 s-dpi: 96
  Monitor-1: eDP-1 model-id: CSO 0x1404 res: 1920x1200 dpi: 161
    diag: 356mm (14")
  API: OpenGL v: 4.6 Mesa 23.3.4 renderer: Mesa Intel Xe Graphics (TGL GT2)
    direct-render: Yes

and also

oldcpu@lenovo:~> sudo lspci | grep -i "thunderbolt"
00:07.0 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #1 (rev 01)
00:07.2 PCI bridge: Intel Corporation Tiger Lake-LP Thunderbolt 4 PCI Express Root Port #2 (rev 01)
00:0d.0 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 USB Controller (rev 01)
00:0d.2 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #0 (rev 01)
00:0d.3 USB controller: Intel Corporation Tiger Lake-LP Thunderbolt 4 NHI #1 (rev 01)

As noted, I am considering either an RTX 3070 (8 GB VRAM) or an NVIDIA RTX 3060 (12 GB VRAM) to put in an eGPU enclosure. I believe the NVIDIA RTX 4060 Ti (16 GB VRAM) will exceed my budget, and given the bandwidth limitation of Thunderbolt, I may not get much benefit from the RTX 4060 Ti in an eGPU anyway. I am pondering something like a 500 watt power supply.

I believe the EXP GDC TH3P4G3 eGPU dock uses the Intel JHL7440 Thunderbolt 3 controller (also referred to as the “Tamales 2” module). Having this firmware is, I believe, important for GNU/Linux compatibility. Some of the TH3P4G3 eGPU enclosure specs are:

  • Thunderbolt 3/4 or USB4 (40 Gbps transfer rate)
  • PCIe Lanes: Supports up to 4 lanes of PCIe 3.0 (32 Gbps bandwidth)
  • 2 x Thunderbolt 3 USB-C ports (supports daisy-chaining for additional Thunderbolt devices)
  • 1 x USB 3.0 port (5 Gbps, 5V/2A)
  • 1 x USB-C port (5V/3A)

I need to sort out when I will have the time to procure this - I spend a lot of my time today planning future travel, with two lengthy visits to Canada and one lengthy visit to Australia coming up in the next year.

@oldcpu you need PCIe 4.0 x16 to get any benefit from the 30xx and 40xx series; if only PCIe 3.0, consider the 20xx series. I have a feeling you may be disappointed?

Consider looking at a Mini PC with something like an Alder Lake GPU. I have a Beelink device here, only 4 cores, but it uses hardware decode/encode OTB with the likes of the flatpak VLC and Handbrake. You can get those with more powerful CPUs/GPUs these days. They probably have some with Arc GPUs as well, and likely cheaper than what you’re looking at?

Edit: have a search for “beelink-gti-ex-bundle”, that looks interesting… and “beelink-ex-docking-station”, it has a built-in 600 W power supply.

Thanks. That is DEFINITELY something for me to consider. Price wise, after just checking now, I think I can obtain:

  • RTX-2070 Super with 8GB VRAM for 6,800 Thai Baht [ ~$208 US$ ]
  • RTX-3070 with 8GB VRAM for 7,890 Thai Baht [ ~$241 US$ ]
  • RTX-2080 with 8GB VRAM for 7,700 Thai baht. [ ~$235 US$ ]

I suspect that means at the current prices that I found, the RTX-2070 Super may be worth considering.

Assuming my research is correct that the EXP GDC TH3P4G3 dock uses the Intel JHL7440 Thunderbolt 3 controller, then indeed I am restricted to PCIe 3.0 for the setup under consideration.

From what I read, the actual usable PCIe bandwidth for the GPU through a Thunderbolt 3 controller like the JHL7440 is generally limited to around 22-24 Gbps (2.75-3 GB/s) due to overhead, which is significantly less than a full PCIe 3.0 x16 slot (~16 GB/s) or a PCIe 4.0 x16 slot (~32 GB/s) in a desktop PC.

Perhaps this will improve a bit when Thunderbolt 5 is introduced (I read Thunderbolt 5 is designed to provide a PCIe 4.0 x4 link to the external GPU and may provide roughly double the usable PCIe bandwidth of Thunderbolt 3/4), but it could be a couple of years before a company like Lenovo produces new X1 Carbon laptops with Thunderbolt 5 and inexpensive eGPU docks like the EXP GDC TH3P4G3 are offered with Thunderbolt 5.

Like all of us, I am not getting any younger, and waiting a couple of years (?) for the industry to fully implement Thunderbolt 5 is likely longer than I wish to wait. I note that the RTX 2070 Super came to market around July 2019, compared to the RTX 3070, which came to market around the end of October 2020; I think the EXP GDC TH3P4G3 came to market in very late 2022. The actual prices at which I can buy this equipment may, to a significant extent, dictate which device I procure, as the RTX 30xx cards are not much more expensive than the RTX 20xx cards in the online shops I have been looking at.

Further, I asked myself: if the Intel JHL7440 Thunderbolt 3 controller only supports PCIe 3.0, which Intel controllers are anticipated to support PCIe 4.0 and Thunderbolt 5? After asking that, I read that the Intel JHL9580 (dual-port controller) and the JHL9480 (single-port controller) support PCIe 4.0 and Thunderbolt 5. Further (good news), I read that GNU/Linux supports the Intel Barlow Ridge controllers (which include the JHL9580 and JHL9480), although I don’t know if all bugs are sorted yet with regard to this interface support.

Following up further, I read that some pricey eGPU docks using the Intel JHL9580 (dual-port controller) or the JHL9480 (single-port controller) are only just now starting to appear (such as the ASUS ROG XG Station 3), although such an eGPU, even if I could get it (it is not yet available where I live), is likely significantly beyond my budget.

So my overall conclusion was that this is a ‘moving target’; it may be a couple of years before there is a big improvement over what is available today. So if I wish to ONLY use a thin and lightweight ultrabook (which, given my travel, is a hard-and-fast requirement), I might as well proceed soon with an eGPU that I keep at home, but simply not spend too much on the eGPU + GPU + power supply, as there will be limits to the graphics performance improvement.

Obviously, I will need to check prices again when I get closer to ‘pulling the trigger’ for my purchase.

I started adding up the costs of procuring an eGPU setup to connect to my laptop, and it has given me pause. Costs to consider being:

  • eGPU
  • Power Supply Unit (PSU)
  • GPU
  • chassis to mount eGPU and PSU
  • auxiliary fan (given I live in Thailand I may need more cooling)
  • miscellaneous items

Given the GPU is an expensive part of this, some GPUs I considered, in addition to the RTX 2070 Super (8 GB VRAM), were the RTX 3060 (12 GB VRAM), RTX 3070 (8 GB VRAM), RTX 3080 (10 GB VRAM) and RTX 4060 Ti (16 GB VRAM).

The cost of this definitely adds up and begins to approach that of a new desktop, which I wish to avoid.

So I took a step back and decided to re-consider my current requirements and try to anticipate future requirements - will I actually use such an eGPU/GPU?

I asked some of the AI chatbots which user applications (not developer applications nor video game applications) that run under GNU/Linux could make use of such an eGPU setup (to a lesser or greater degree).

The programs that were pointed out to me (some of which I knew about, but not all), in no particular order, are below.

I also had the AI chatbots recommend which graphics card, given my eGPU setup, would be best from a cost perspective (where, in short, the VRAM/price ratio is “king”) for these applications. To my surprise, the RTX 3060 (12 GB VRAM) came out top from a cost-effectiveness perspective. The RTX 3070 suffers from its lack of VRAM (and is more impacted by the Thunderbolt interface). And the RTX 4060 Ti, while faster (and with more VRAM), is significantly more expensive and loses benefit due to the bandwidth limitations.

Of course AI bots make mistakes, but this is the list.

  1. Lc0 (Leela Chess Zero) - (12 GB VRAM) was recommended for the noted hardware setup in terms of cost-effective performance.
  2. FaceSwap - needs >= 12 GB VRAM for 512x512 resolution
  3. FFmpeg (with nvidia NVENC)
  4. Blender -
  5. Kdenlive - NVENC accelerates encoding
  6. Gimp (with G’MIC Plugin/OpenCL gives minor boost for some filters)
  7. Darktable -
  8. Shotcut - video editor
  9. Natron - Node-based compositing for VFX - I have no idea how this is useful.
  10. Synfig Studio - 2D animation software
  11. Flowblade - non-linear video editor
  12. Olive Video Editor
  13. Darknet (YOLOv8) - object detection
  14. Whisper.cpp - GPU accelerated speech-to-text
  15. DaVinci Resolve - free version uses GPU for color grading.
  16. RawTherapee - openCL accelerated photo editing.
  17. Handbrake (w/NVENC) - video transcoding
  18. Open3D - modern library for 3D processing -
  19. Stable Diffusion (via WebUI or NMKD) - needs >= 12GB VRAM (for 512x512 images)
  20. Upscayl - AI-based image upscaling tool

Programs above that I currently use are Lc0, ffmpeg, gimp, kdenlive, rawtherapee, handbrake, upscayl. Clearly there are programs there that I have not played with which perhaps I could consider (if I were to proceed with this eGPU setup).

And again - I put together the list after asking ChatGPT, DeepSeek, Grok and Gemini.

It’s all food for thought (for me).

@oldcpu So you’re likely looking at a new laptop with an Intel Arc and/or NVIDIA offload GPU.

There are a couple of Forum threads where users have such setups…

Newer NVIDIA GPUs should (will?) have the ability to share system RAM; Intel already do to some extent. You also want a system with ReBAR (Resizable BAR) capability.

Don’t shy away from flatpaks either; that’s what I use here for Handbrake and VLC etc. It works fine for my needs and works with NVIDIA and Intel GPUs.

To be honest, I would suggest looking for just a laptop with an Intel B-series GPU, as this will work better with OpenCL; otherwise it will use the NVIDIA GPU on a dual setup.

Indeed, the best approach for me may be a new laptop with a faster CPU (as opposed to going the eGPU route with Thunderbolt 3/4).

Why?

I just learned something.

The past few days I have been playing with one of my main video applications, which is a complex series of bash shell commands (all in one line) where I use ffmpeg to stabilize videos (videos of objects, such as boats, far in the distance, with lots of shake in the video). My camera is a Nikon CoolPix P950 with 83x optical zoom, and on occasion I use 2x to 4x digital zoom, meaning zoom levels greater than 150x to greater than 300x magnification. Even with a camera stand, if I am panning the camera at a distant object, I get a lot of shake.

So I use ffmpeg to stabilize the video in a complex three-pass command. First, using the original shaky video, ffmpeg creates a file of vectors that mark the change from frame to frame. Then, in the second pass, ffmpeg uses the vectors from that file to encode a stabilized video. Then, in a third pass, ffmpeg takes the two videos (original and stabilized) and places them side by side in one video, so I can see how much of the shakiness was cleaned up.

For large videos, this command can run for MANY MANY minutes.
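
For illustration, a heavily stripped-down sketch of the three passes (file names and filter values are placeholders, my real one-liner is much longer, and this assumes an ffmpeg build with libvidstab):

# pass 1: analyse the shaky clip and write the per-frame motion vectors (CPU bound)
ffmpeg -i shaky.mp4 -vf vidstabdetect=result=transforms.trf -f null -
# pass 2: apply the vectors and encode a stabilized clip
ffmpeg -i shaky.mp4 -vf vidstabtransform=input=transforms.trf:smoothing=30 -c:v libx264 stabilized.mp4
# pass 3: place original and stabilized side by side for comparison
ffmpeg -i shaky.mp4 -i stabilized.mp4 -filter_complex hstack -c:v libx264 -preset ultrafast -crf 28 compare.mp4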

The past couple of days, using multiple AI bots (ChatGPT, DeepSeek, Gemini, Grok, Claude.ai), I was able to come up with an improved bash shell command for my desktop (using Intel VA-API) and for my laptop (using Intel QSV), accessing the Intel GPU on my PCs.

OK, an Intel GPU is not as fast as an NVIDIA GPU, but what I discovered was that in my 3-pass process with the Intel GPU, pass #2 and pass #3 received a significant performance boost, running much faster. But pass #1, creating the vectors, is CPU driven, and it is the major bottleneck now, with ~80% of the processing time spent there.
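
As a rough idea of the hardware-encode part (device paths and quality values are assumptions and depend on the ffmpeg build; the vidstab filtering itself still runs on the CPU):

# laptop (Iris Xe): encode pass 2 with QSV instead of libx264
ffmpeg -i shaky.mp4 -vf vidstabtransform=input=transforms.trf:smoothing=30 -c:v h264_qsv -global_quality 23 stabilized.mp4
# desktop (older Intel iGPU): the VA-API equivalent, uploading frames to the GPU for the encoder
ffmpeg -vaapi_device /dev/dri/renderD128 -i shaky.mp4 -vf vidstabtransform=input=transforms.trf:smoothing=30,format=nv12,hwupload -c:v h264_vaapi stabilized.mp4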

Getting a faster GPU won’t help much more than what the Intel GPUs already provide, as the CPU is the performance bottleneck, and this is especially true (not much performance increase from an eGPU) if that GPU is crippled by a Thunderbolt 3/4 interface.

So while I concede I am not getting younger, maybe I am jumping the gun by getting an eGPU now, and I should instead spend more time using the Intel GPUs in my PCs.

@oldcpu so your user is in the video and render groups? You might want to peruse here https://trac.ffmpeg.org/wiki/Hardware/QuickSync

Edit: Also look at taskset to play around with CPU Affinity.

Thanks. That is an interesting link. I was a bit taken aback when I read it was referencing the old (?) i965 driver. The two PCs I use are an old Intel Core i7-4770 (I think with Haswell graphics), where I am tuning my bash shell commands to use VA-API, and a slightly newer Lenovo X1 Carbon Gen 9 with an 11th Gen Intel Core i7-1165G7 that has Intel TigerLake-LP GT2 [Iris Xe Graphics], where I am tuning my bash shell commands to use QSV.

I haven’t heard of CPU Affinity before. I will need to research that - I am curious if I may be able to use it to help in the CPU intensive tasks.

@oldcpu yes taskset should help with your current setup.
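
A minimal sketch of what that looks like (core numbers are just an example for an 8-thread CPU; the PID is a placeholder):

# start ffmpeg pinned to logical CPUs 0-7
taskset -c 0-7 ffmpeg -i shaky.mp4 -vf vidstabdetect=result=transforms.trf -f null -
# inspect or change the affinity of an already-running process by PID
taskset -pc <pid>
taskset -pc 0-3 <pid>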

Specs look good for the Lenovo CPU/GPU https://www.intel.com/content/www/us/en/products/sku/208921/intel-core-i71165g7-processor-12m-cache-up-to-4-70-ghz-with-ipu/specifications.html

See what intel_gpu_top (as the root user) shows while a job is running… or set capabilities (setcap cap_perfmon=ep $(which intel_gpu_top)) so it can run as your user…
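
In other words, something like this (a sketch; the setcap step only needs to be done once):

# either run it as root...
sudo intel_gpu_top
# ...or grant the capability once so a normal user can run it
sudo setcap cap_perfmon=ep "$(which intel_gpu_top)"
intel_gpu_top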

Edit: @Mir_ppc may have some tips as he’s doing a lot of video work…

Haswell and Xe use the same encoder. It should be iHD on openSUSE via the intel-media-driver. This enables the QSV that works with Kdenlive and Handbrake. Following the guide from @malcolmlewis I was able to get oneAPI working on my Arc card, but on Haswell and 6th-8th gen I have not been able to get it working, as Level Zero support doesn’t work on those. I don’t have anything newer so I cannot test.

Compute so far has worked with Blender, Whisper, and [REDACTEDPROJECT]. I have not opened Resolve.

That is most of what I have worked on so far, and my experience as well. I am not looking forward to needing to upgrade the NVIDIA GTX 1050 Ti when support for it is dropped. It cannot handle Blender as the VRAM is too low. This will make doing videos on installing/updating the NVIDIA driver the Hard Way a pain in the arse.

The eGPU route - that could be interesting, as I HAVE done that via older methods. One was using the ExpressCard slot in a laptop; with ThinkPads I ran into the dreaded 1802 error, so be aware of that. The other one I tested was mPCIe. Both ended up working, but there were compromises in performance. IIRC Thunderbolt is up to PCIe x4, so there could be a performance hit.

now an oldfart like oldcpu knows where to find some of us over on that chat relay of internets :stuck_out_tongue:


Thank you all for the suggestions - I have followed through with using the Intel GPU more and also using affinity (taskset) to improve the processing.

My videos are now from a Nikon CoolPix P950 camera vs the older Nikon CoolPix P900:

To put my efforts here in perspective (stepping back a bit in my requirements), a big driver for my wanting an improvement in GPU processing is that I replaced my 11-year-old Nikon CoolPix P900 camera (1920 × 1080, 1080p Full HD video resolution) with a 6-year-old Nikon CoolPix P950 (4K UHD, 3840 × 2160 video resolution). That is a much higher resolution. Because of that higher resolution, after the 83x optical zoom (the same in both cameras) I am tempted to go further into digital zoom with the P950. That means more shake, and more shake at a higher resolution means my previous ffmpeg command line (to stabilize the video) was taking a long time.

Affinity/Taskset:

Thank you for the taskset suggestion. With the assistance of a few AI bots, I played a bit with ‘taskset’, trying out affinity and testing many different settings. With ‘taskset’ I was able to obtain about a 5% to 10% reduction in the processing time of the CPU-bound creation of the ffmpeg vector file (which records the changes between each of the video frames). This was with my Lenovo laptop with an 11th Gen Intel Core i7-1165G7 CPU and Intel TigerLake-LP GT2 (Iris Xe Graphics).

Tuned ffmpeg vidstab parameters

I should note that in this initial testing I had over-optimistically and mistakenly changed the settings used to create the vector file with ffmpeg’s vidstabdetect to very aggressive (too aggressive and too time-consuming) values. I subsequently tuned that and significantly reduced the processing time (and hence the CPU load) by going with less aggressive settings to reduce the shake in the video (while still retaining video quality and reducing the shake). I kept using the taskset command.

Some more detail:

Taskset:

After trial and error with various configurations, I found for the vector file creation that “… chrt -r 99 taskset -c 0-7 ffmpeg …” gave the fastest vector file creation (by only a tiny amount, though), but I also read there was an off chance that setting might be thermally risky for the laptop. So given its improvement was minuscule (over another setting), in the end I left out the “chrt -r 99” and only went with the “taskset -c 0-7 ffmpeg” setting.

Overall, I am pretty happy with the results.

Downscaling for the vector-detection pass did not work well:

I also attempted to downscale the video (only for the pass that detects the vectors for each frame, keeping the original video resolution when applying the vectors) with the intent of massively reducing the processing time (which it does), but it has the side effect that when I downscale, the resultant stabilized video becomes MUCH more shaky and jerky - so I rejected that approach for now.

vidstab parameter optimization was the way to go:

Instead, as noted, I found a combination of step size, accuracy, smoothness and some other configuration aspects which not only reduced the CPU processing time for ffmpeg’s vidstabdetect, but also gave better looking video output. It was a lot of trial and error, though, to get closer to the ‘sweet spot’ for configuring the ffmpeg vidstab filters for my videos. I don’t think I am at the ‘sweet spot’ yet, but I have implemented a big improvement.
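
For context, the knobs being tuned here are options of ffmpeg’s vidstabdetect and vidstabtransform filters; the values below are only illustrative, not my actual settings:

# detection: a larger stepsize and lower accuracy run faster but track motion less precisely
ffmpeg -i shaky.mp4 -vf vidstabdetect=stepsize=12:shakiness=6:accuracy=9:result=transforms.trf -f null -
# transform: smoothing sets how many frames the camera path is averaged over
ffmpeg -i shaky.mp4 -vf vidstabtransform=input=transforms.trf:smoothing=20:zoom=1 -c:v libx264 stabilized.mp4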

QSV working with Intel GPU:

Anyway - back on topic. Using QSV to access the Intel GPU appears to be working and is helping to speed up the processing of my videos (for the pass using ffmpeg’s vidstabtransform) … and of great importance to me to learn was that, for this application, my processing time bottleneck is now DEFINITELY the CPU and not the GPU, which was educational and a big surprise to me. I had mistakenly thought it to be the GPU.
Comparing my original command (no taskset, no QSV, and less optimized ffmpeg parameters) to my new command, processing the new 3840 × 2160 resolution videos from my P950:

Pass-1 (create a “vector file” that assesses the video ‘shake’, recording a frame-by-frame movement calculation of the video’s composition - uses CPU/taskset): ~4.5 fps to ~5.5 fps processing speed - CPU intensive (the previous old version ran at ~5.5 to 6.5 fps with no ‘taskset’ but with similar vidstab parameters - i.e. a good improvement).

Pass-2 (create a stabilized video based on the “vector file” - uses CPU/taskset with GPU/QSV): ~7.0 fps to ~8 fps processing speed - more GPU than CPU intensive (the previous old version ran at ~1 to ~2.3 fps with no CPU/taskset and no GPU/QSV - obviously a big improvement).

Pass-3 (create a side-by-side comparison video of the original and the stabilized video, with a slight resolution reduction where quality is less important - uses CPU/taskset and “libx264 -preset ultrafast -crf 28”): ~20 fps to ~25 fps processing speed - more CPU than GPU intensive (the previous old version ran at ~1.5 to ~2.3 fps; the old version had no CPU/taskset, no ‘libx264 -preset ultrafast’ and no resolution reduction - obviously adding those to the new version was a big improvement). I tried using QSV, but libx264 with the right parameters was faster (as quality was not so important for the comparison video).

Extra project: Double run of the command (for difficult shaky videos)

Also, for very difficult shaky videos, I created a version of the very long bash shell command that, after first completing the creation of the stabilized video, takes that stabilized output video and stabilizes it once again, creating an even smoother video. That is intended for more difficult stabilization efforts. I do, though, prefer not to use this double-stabilization version, as being run twice obviously means it takes almost twice as long as being run just once.

My application

I guess it is kind of obvious that I take a lot of high-zoom videos. Going for a camera with a higher resolution has required me to adapt. And I am fortunate to live right on the coast of a bay, where I can watch many sailing regattas right from my condo balcony, with the boats 1 km to 4 km away.

And again, I should note this is with my Lenovo laptop with an 11th Gen Intel Core i7-1165G7 CPU and Intel TigerLake-LP GT2 (Iris Xe Graphics). I have yet to tune the command to work with my Intel Core i7-4770 desktop, where I read its integrated GPU may not run as fast as the one in the i7-1165G7.

@oldcpu you’re making good progress :smile: I would suggest the next step for your setup is tuned, as you can create profiles for, say, normal use and then for when you’re processing your images etc. Likewise sysctl tweaks…

For example https://forums.opensuse.org/t/playing-with-tuned-and-tuned-adm/184883
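
A rough sketch of getting started (package and profile names are assumptions; see the linked thread for details):

sudo zypper install tuned
sudo systemctl enable --now tuned
tuned-adm list                            # show available profiles
tuned-adm profile throughput-performance  # switch profile before a long encode
tuned-adm active                          # confirm which profile is in use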