Thank you all for the suggestions. I have followed through by using the Intel GPU more and by using CPU affinity (taskset) to improve the processing.
My videos are now from a Nikon CoolPix P950 camera vs the older Nikon CoolPix P900:
To put my efforts here in perspective (stepping back a bit in my requirement), a big driver for wanting an improvement in GPU processing is that I replaced my 11-year-old Nikon CoolPix P900 camera (1920 × 1080, 1080p Full HD video) with a 6-year-old Nikon CoolPix P950 camera (4K UHD, 3840 × 2160 video). That is a much higher resolution. Both cameras have 83x optical zoom, but the higher resolution of the P950 tempts me to go further into digital zoom. That means more shake, and more shake at a higher resolution meant my previous ffmpeg command line (to stabilize the video) was taking a long time.
Affinity/Taskset:
Thank you for the taskset suggestion. With the assistance of a few AI bots, I played a bit with "taskset", trying out affinity and testing many different settings. With "taskset" I was able to obtain about a 5% to 10% reduction in the CPU processing time needed to create the ffmpeg vector file (which records the changes in each of the video frames). This was on my Lenovo laptop with an 11th Gen Intel Core i7-1165G7 CPU and Intel TigerLake-LP GT2 (Iris Xe Graphics).
Tuned ffmpeg vidstabtransform parameters
I should note that for this initial testing I had over-optimistically and mistakenly changed to very aggressive (too aggressive and too time consuming) settings for creating the vector file with ffmpeg's vidstabdetect. I subsequently tuned that and significantly reduced the processing time (and hence the CPU load) by going for less aggressive settings, while still reducing the shake and retaining video quality. I kept using the taskset command.
Some more detail:
Taskset:
After trial and error with different configurations, I found that for the vector file creation, " … chrt -r 99 taskset -c 0-7 ffmpeg … " gave the fastest result (though only by a tiny amount). I also read there was an off chance that this setting might be thermally dangerous to the laptop. Since its improvement over another setting was minuscule, in the end I left out "chrt -r 99" and went with only "taskset -c 0-7 ffmpeg".
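For anyone wanting to try the same thing, here is a minimal sketch of the affinity wrapper. The names run_pinned, CPU_LIST and DRY_RUN are only illustrative (they are not from my actual command), and the file names in the example are placeholders.

```shell
# Sketch: pin a command to logical CPUs 0-7, as in "taskset -c 0-7 ffmpeg ...".
# CPU_LIST, DRY_RUN and run_pinned are illustrative names, not part of the original command.
CPU_LIST="${CPU_LIST:-0-7}"

run_pinned() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the command that would be executed.
        printf '%s\n' "taskset -c $CPU_LIST $*"
    else
        # taskset -c takes a CPU list (e.g. 0-7) followed by the command to run.
        taskset -c "$CPU_LIST" "$@"
    fi
}

# Example with placeholder file names:
# run_pinned ffmpeg -i input.mp4 -vf vidstabdetect=result=transforms.trf -f null -
```

Setting DRY_RUN=1 first is a cheap way to check what will be run before committing to a long encode.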
Overall, I am pretty happy with the results.
Descale for vidstabtransform did not work well:
I also attempted to descale the video (only as part of the process to detect the vectors for each frame in the video, while keeping the original video resolution when applying the vectors) with the intent to massively reduce processing time (which it does). But it has a side effect: when I descale, the resultant stabilized video becomes MUCH more shaky and jerky, so I rejected that approach for now.
vidstabtransform parameter optimization was the way to go:
Instead, as noted, I found a combination of step size, accuracy, smoothness and some other configuration aspects which not only reduced the CPU processing time for ffmpeg's vidstabdetect, but also gave better looking video output. It took a lot of trial and error, though, to get a bit closer to the "sweet" spot for configuring the ffmpeg vidstabdetect function for my videos. I don't think I am at the "sweet" spot yet, but I have implemented a big improvement.
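To show where those knobs live, here is a rough sketch of the two filter strings. The numbers are placeholders, not my tuned values, and the helper names detect_filter and transform_filter are made up for illustration. In ffmpeg, stepsize, accuracy and shakiness are options of vidstabdetect, while smoothing is an option of vidstabtransform. Both filters need an ffmpeg build that includes libvidstab.

```shell
# Placeholder values only; the tuned numbers are not given in this post.
STEPSIZE=6     # vidstabdetect: search step size
ACCURACY=15    # vidstabdetect: detection accuracy (1-15)
SHAKINESS=5    # vidstabdetect: expected shakiness (1-10)
SMOOTHING=10   # vidstabtransform: frames used on each side for smoothing

# Build the pass-1 (detection) filter; $1 is the vector (.trf) file to write.
detect_filter() {
    printf 'vidstabdetect=stepsize=%s:accuracy=%s:shakiness=%s:result=%s' \
        "$STEPSIZE" "$ACCURACY" "$SHAKINESS" "$1"
}

# Build the pass-2 (transform) filter; $1 is the vector (.trf) file to read.
transform_filter() {
    printf 'vidstabtransform=input=%s:smoothing=%s' "$1" "$SMOOTHING"
}

# Example with placeholder file names:
# ffmpeg -i input.mp4 -vf "$(detect_filter transforms.trf)" -f null -
# ffmpeg -i input.mp4 -vf "$(transform_filter transforms.trf)" output.mp4
```

Keeping the values in variables at the top makes it easier to rerun the same clip with one parameter changed at a time.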
QSV working with Intel GPU:
Anyway - back on topic. Using QSV to access the Intel GPU appears to be working and helps speed up the processing of my videos a bit (using ffmpeg's vidstabtransform) … and, very importantly for me to learn, for this video application my processing time bottleneck is now DEFINITELY my CPU and not my GPU, which was educational and a big surprise to me. I had mistakenly thought it to be the GPU.
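A rough sketch of the general shape of such a command follows. It assumes QSV is used only for the encode step and that the encoder is h264_qsv (hevc_qsv would be used the same way); neither detail is spelled out above, and pass2_cmd is a made-up helper that only prints the command. Since vidstabtransform itself is a software (CPU) filter, only the encoding is offloaded to the GPU in this arrangement, which fits with the CPU being the bottleneck.

```shell
# Assumption: QSV is used for encoding only; the encoder name is a guess.
QSV_ENCODER="${QSV_ENCODER:-h264_qsv}"

# Print (not run) a pass-2 command. $1 = input, $2 = vector file, $3 = output.
pass2_cmd() {
    printf 'taskset -c 0-7 ffmpeg -i %s -vf vidstabtransform=input=%s -c:v %s %s\n' \
        "$1" "$2" "$QSV_ENCODER" "$3"
}

# Example with placeholder file names:
# pass2_cmd input.mp4 transforms.trf stabilized.mp4
```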
Comparing my original command (before I added taskset, QSV, and slightly less aggressive ffmpeg parameters) to my new command, processing the new 3840 × 2160 resolution videos from my P950:
Pass-1 (create a "vector file" that assesses the video "shake", recording a frame by frame movement calculation of the video's composition - uses CPU/taskset): ~4.5 fps to ~5.5 fps processing speed improvement - CPU intensive (previous old version was ~5.5 to 6.5 fps processing speed with no "taskset" but with similar vidstabtransform parameters - i.e. a good improvement).
Pass-2 (create a stabilized video that is based on the "vector file" - uses CPU/taskset with GPU/QSV): ~7.0 fps to ~8 fps processing speed improvement - more GPU than CPU intensive (previous old version was ~1 to ~2.3 fps processing speed with no CPU/taskset with GPU/QSV - obviously a big improvement).
Pass-3 (create side-by-side comparison video of original video with stabilized video (with a slight resolution reduction where quality is less important) - uses CPU/taskset and "libx264 -preset ultrafast -crf 28"): ~20 fps to ~25 fps processing speed improvement - where this is more CPU than GPU intensive. [previous old version was ~1.5 to ~2.3 fps processing speed where old version had no CPU/taskset and older version had no "libx264 -preset ultrafast" and old version had no resolution reduction - obviously adding those to the new version was a big improvement]. I tried using QSV but libx264 with the correct parameters was faster (as quality was not so important for the comparison video).
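As an illustration of Pass-3, here is a minimal sketch of a side-by-side comparison command. The per-side width of 1920 is a placeholder (the actual reduction is not stated here), the helper names are made up, and audio is dropped with -an purely to keep the example short. The encoder settings are the "libx264 -preset ultrafast -crf 28" mentioned above.

```shell
COMPARE_WIDTH=1920   # hypothetical per-side width; -2 keeps the aspect ratio with an even height

# Build the filter graph: scale both inputs the same way, then stack them horizontally.
compare_filter() {
    printf '[0:v]scale=%s:-2[orig];[1:v]scale=%s:-2[stab];[orig][stab]hstack=inputs=2[v]' \
        "$COMPARE_WIDTH" "$COMPARE_WIDTH"
}

# $1 = original video, $2 = stabilized video, $3 = comparison output.
make_comparison() {
    taskset -c 0-7 ffmpeg -i "$1" -i "$2" \
        -filter_complex "$(compare_filter)" -map '[v]' -an \
        -c:v libx264 -preset ultrafast -crf 28 "$3"
}

# Example with placeholder file names:
# make_comparison original.mp4 stabilized.mp4 comparison.mp4
```

hstack needs both inputs to have the same height, which holds here because both videos have the same source resolution and are scaled identically.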
Extra project: Double run of the command (for difficult shaky videos)
Also, for very difficult shaky videos, I created a version of the very long bash shell command that, after first creating the stabilized video, takes that stabilized output video and stabilizes it once again, creating an even smoother video. That is intended for more difficult stabilization efforts. I do, though, prefer not to use this double stabilization version, as running it twice obviously takes almost twice as long as running it once.
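The idea can be sketched as below. This is a stripped-down illustration with default filter settings, not my actual long command; the function names, the FFMPEG variable and the "_pass1" intermediate file name are all hypothetical.

```shell
FFMPEG="${FFMPEG:-ffmpeg}"   # set FFMPEG=echo to print the commands instead of running them

# One full stabilization: detect, then transform. $1 = input video, $2 = output video.
stabilize_once() {
    local trf="${2%.*}.trf"   # vector file named after the output
    "$FFMPEG" -y -i "$1" -vf "vidstabdetect=result=$trf" -f null - &&
    "$FFMPEG" -y -i "$1" -vf "vidstabtransform=input=$trf" "$2"
}

# Two stabilizations in a row: the output of the first run is the input of the second.
# $1 = input video, $2 = final output video.
stabilize_twice() {
    local mid="${2%.*}_pass1.${2##*.}"   # hypothetical intermediate file name
    stabilize_once "$1" "$mid" && stabilize_once "$mid" "$2"
}

# Example with placeholder file names:
# stabilize_twice shaky.mp4 smooth.mp4
```

Each run does its own detect and transform pass, which is why the double version takes almost twice as long.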
My application
I guess it is kind of obvious to see, I take a lot of high zoom videos. Going for a camera with a higher resolution has required me to adapt. And I am fortunate to live right on the coast of a bay, and I can watch many sailing regattas right from my condo balcony with the boats 1km to 4km away.
And again, I should note this is with my Lenovo laptop with an 11th Gen Intel Core i7-1165G7 CPU and Intel TigerLake-LP GT2 (Iris Xe Graphics). I have yet to tune the command to work with my Intel Core i7-4770 desktop, where I read its GPU may not run as fast as the GPU in the Intel Core i7-1165G7.