Lowest possible realtime MIDI-to-sound latency?

OldButNotGuru · September 28, 2019, 5:40am

Please forgive this long post which touches many complex topics while trying to answer the title’s seemingly simple question.

I currently have the following as a testbed:

An executable compiled from C++ source code that …
Reads a /dev/hidrawN special file …
Connected to a secondary USB PC (not a piano/synthesizer/MIDI) keyboard …
That has been detached from X via xinput disable <id>
The C++ translates the key-down and key-up hidraw packets into MIDI note-on and note-off messages …
Which it sends using the RtMidi library via ALSA …
To a running fluidsynth --midi-driver=alsa_seq --audio-driver=alsa process …
Which outputs sound through the PC’s built-in “snd_hda_intel” “sound card”'s analog 3.5mm stereo audio jack …
To a set of self-powered computer monitor speakers.
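
The translation step in the chain above can be sketched roughly as follows. This is an illustration in Python, not the actual C++ program. It assumes the keyboard delivers the standard 8-byte boot-protocol report (modifier byte, reserved byte, six key usage codes); the keycode-to-note mapping, channel, and velocity are invented for the example.

```python
NOTE_ON = 0x90   # MIDI status byte: note-on, channel 0
NOTE_OFF = 0x80  # MIDI status byte: note-off, channel 0
BASE_NOTE = 60   # hypothetical mapping: HID usage 0x04 ("a") -> middle C
VELOCITY = 100   # hypothetical fixed velocity


def keys_in_report(report: bytes) -> set:
    """Return the pressed key usage codes found in an 8-byte boot report."""
    # Bytes 2..7 hold up to six usage codes; 0 is "no key", 1..3 are error codes.
    return {code for code in report[2:8] if code >= 4}


def note_for(code: int) -> int:
    """Map a usage code to a MIDI note number, clamped to the valid 0..127 range."""
    return min(127, BASE_NOTE + code - 4)


def midi_for_transition(prev: bytes, cur: bytes) -> list:
    """Compare two consecutive reports and build note-on/note-off messages."""
    before, after = keys_in_report(prev), keys_in_report(cur)
    messages = []
    for code in sorted(after - before):   # newly pressed keys
        messages.append(bytes([NOTE_ON, note_for(code), VELOCITY]))
    for code in sorted(before - after):   # newly released keys
        messages.append(bytes([NOTE_OFF, note_for(code), 0]))
    return messages
```

Feeding it an all-zero report followed by one with usage code 0x04 in byte 2 yields a single three-byte note-on; the reverse order yields the matching note-off.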

When running, the above seems subjectively to have too much latency. Apologies for overemphasizing it, but by “latency” I mean the absolute real-time, real-world elapsed time from when the PC keyboard switch is pressed to when the speaker cone begins to move. I currently have no way of objectively measuring this, but believe I need to reduce it to below 1 or 2 milliseconds. Certainly less than 5. As per this post’s title, I’d like (on any particular PC hardware) to have the lowest delay possible – if it was 1 nanosecond that would be perfectly acceptable.

I’ve read extensively trying to understand the problem and its possible solutions, but have been confused by finding much conflicting and outdated information. This includes statements like:

openSUSE is not suitable for realtime/production audio; instead use a Linux distribution designed for the task such as KXStudio or Bandshed
“a kernel compiled with realtime patches and configuration is required” vs “all modern linux kernels have good realtime capabilities”
The kernel’s “Completely Fair Scheduler” and dynamic frequency changing make the CONFIG_HZ and/or CONFIG_NO_HZ kernel parameters meaningless.

I have tried changing the fluidsynth and keyboard software’s process priorities with nice/renice, ulimit, and chrt without noticing much difference. Note that at startup fluidsynth outputs:

fluidsynth: warning: Failed to set thread to high priority
fluidsynth: warning: Failed to set thread to high priority
fluidsynth: warning: Failed to pin the sample data to RAM; swapping is possible.
loaded SoundFont has ID 1

Using ulimit -l unlimited fixes the “swapping” warning, and chrt -r 50 <pid of fluidsynth> probably fixes the “thread to high priority” ones, but again neither seems to have much effect.
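
What chrt -r 50 does from the outside can also be requested from inside a process. Below is a minimal Python sketch (Linux only; the helper name request_round_robin is made up for this example) that asks for SCHED_RR and reports the refusal instead of crashing. A refusal is the likely outcome for an unprivileged user without a suitable rtprio limit.

```python
import os


def request_round_robin(priority: int = 50):
    """Try to switch the caller to SCHED_RR at the given priority.

    Returns (True, message) on success and (False, reason) otherwise.
    """
    if not hasattr(os, "sched_setscheduler"):
        return False, "scheduler API not available on this platform"
    lo = os.sched_get_priority_min(os.SCHED_RR)
    hi = os.sched_get_priority_max(os.SCHED_RR)
    if not lo <= priority <= hi:
        return False, f"priority must be between {lo} and {hi}"
    try:
        # pid 0 means "the caller", as with sched_setscheduler(2)
        os.sched_setscheduler(0, os.SCHED_RR, os.sched_param(priority))
    except OSError as exc:
        return False, f"refused: {exc}"
    return True, f"now SCHED_RR at priority {priority}"
```

Calling request_round_robin(50) as an ordinary user would be expected to return a "refused" result, much like the chrt error in such a setup.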

Side note: How is rtkitctl used? I have found no documentation on it besides the “man” page, which merely states “--reset-known: Reset real-time status of known threads” and “--reset-all: Reset real-time status of all threads”. What threads? How is rtkitctl told what processes’ realtime priorities to change?

I assume that in my testbed both the keyboard and fluidsynth processes are normally idle – keyboard is waiting in a “read()” system call for data from /dev/hidrawN, and fluidsynth for input on an ALSA pipe or socket (I’m very unclear about the ALSA details).

The entire system is at runlevel 5 with an X desktop, so there are certainly other processes running – top never goes to 100% idle. I’ve looked at my Leap 15.1 install and found:

$ fgrep -i hz /boot/config-4.12.14-lp151.28.13-default
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
# CONFIG_NO_HZ_FULL is not set
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m

(Note that the output from zgrep -i hz /proc/config.gz is identical.)

Looking only at “CONFIG_HZ=250” and ignoring “CONFIG_NO_HZ=y” (which I don’t understand), it seems that the lower bound on my worst-case latency is 4 milliseconds – the kernel only interrupts the processes currently running on the system’s cores (assuming there are more runnable ones than cores) every 4 ms to run itself. At those intervals it can check the USB hardware and driver(s), see that there is input, and run the keyboard process.
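
The 4 ms figure is simply the reciprocal of CONFIG_HZ. Here is a small Python sketch of that arithmetic for the HZ choices that appear in the config listing above. It computes the periodic tick interval only and says nothing about whether a hardware interrupt can wake a process between ticks.

```python
def tick_period_ms(config_hz: int) -> float:
    """Interval between periodic timer ticks, in milliseconds."""
    if config_hz <= 0:
        raise ValueError("CONFIG_HZ must be positive")
    return 1000.0 / config_hz


for hz in (100, 250, 300, 1000):
    print(f"CONFIG_HZ={hz}: one tick every {tick_period_ms(hz):.2f} ms")
```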

Is this correct? Also, is it true that these parameters can’t be changed in a running kernel (as I have read)? Can they be changed by editing the /boot/config* file and rebooting, or by editing the kernel command line at boot time? Or does a new kernel need to be compiled?

Alternately, does the kernel wake up asynchronously on (in this example) a hardware USB interrupt, service it (run the driver module) and upon seeing it has data for me halt the lowest priority currently running process and immediately execute my keyboard process without waiting for the CONFIG_HZ timer to elapse?

I believe the latency is caused by kernel scheduling or something associated with it because I’ve tested using zynaddsubfx and amsynth instead of fluidsynth, and with a real MIDI hardware keyboard instead of my PC keyboard program, and the latency always feels about the same.

I’ve also tried using “jack” instead of plain ALSA as the MIDI API and again found no improvement, despite the claims that jack is targeted at low-latency and adds “absolutely zero” overhead (direct quote) even though it runs as a layer on top of ALSA. (At least it doesn’t claim to be faster.) My belief is that jack is designed for synchronizing multiple MIDI (and audio) streams, as is ALSA’s “alsa-seq” vs its “alsa-raw” API. (I know for a fact that alsa-seq embeds timestamps along with other metadata into the raw MIDI messages, which for my use case is just another source of overhead and potentially increased latency.)

I think that jack tries to achieve low latency by constantly streaming data across its connections, I assume to ensure that the processes sending and receiving that data are always tagged as runnable to the kernel scheduler. I know that I always saw the jackd daemon eating at least 5% of a core when running (but doing nothing). Again, please forgive my uneducated yet extremely opinionated belief that this is a Really Bad Idea/Architecture. (Current web browsers like Firefox and Chromium/Chrome work similarly, never going fully to sleep even when not in use. Pardon my disapproval.)

The only ideas I have are to set CONFIG_HZ_1000 (or higher if that’s possible) in order to wake up my keyboard process faster, and to chrt/rtkitctl both that and fluidsynth to SCHED_RR or even SCHED_FIFO. But do those mean an infinite loop bug in the code will lock up the whole system by preventing anything else from running (like a shell to do kill)?
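
On the lock-up question, mainline kernels have a safety valve known as real-time throttling, controlled by /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us. With the usual defaults (950000 and 1000000), real-time tasks may use at most 95% of each one-second period, which should leave a little time for a shell to run kill. A runtime of -1 removes the limit. A Python sketch for checking what a given machine is set to follows; the helper names are hypothetical, and the values on Leap 15.1 have not been verified here.

```python
from pathlib import Path


def rt_share(runtime_us: int, period_us: int):
    """Fraction of each period available to real-time tasks, or None if unlimited."""
    if runtime_us < 0:          # -1 disables throttling entirely
        return None
    return runtime_us / period_us


def read_rt_limits(base: str = "/proc/sys/kernel"):
    """Read the two throttling sysctls and return (runtime_us, period_us)."""
    runtime = int(Path(base, "sched_rt_runtime_us").read_text())
    period = int(Path(base, "sched_rt_period_us").read_text())
    return runtime, period
```

On a live system, rt_share(*read_rt_limits()) gives the share directly; a result of None would mean a runaway SCHED_FIFO loop really could starve everything else on that core.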

Lots of open-ended questions here, so any discussion or answers welcome.

dcurtisfra · September 28, 2019, 6:52pm

@OldButNotGuru:

There was this openSUSE Kernel Mailing List thread: <https://lists.opensuse.org/opensuse-kernel/2013-07/msg00001.html>.

The “kernel-preempt_rt” package is currently available for neither openSUSE Tumbleweed nor Leap: <https://software.opensuse.org/package/kernel-preempt_rt>.

You may well need this package if you want to increase the frequency of the Kernel’s clock – higher clock frequencies without a pre-emptive Kernel could be tricky.

BTW, Leap 15.1 currently provides the following data around this issue:


 > uname -a
Linux eck001 4.12.14-lp151.28.16-default #1 SMP Wed Sep 18 05:32:19 UTC 2019 (3e458e0) x86_64 x86_64 x86_64 GNU/Linux
 > 
 > grep ^CONFIG_HZ /boot/config-`uname -r`
CONFIG_HZ_250=y
CONFIG_HZ=250
 >

IOW, standard Leap 15.1 is not suitable for professional Audio (with MIDI) – it doesn’t have a pre-emptive Kernel and the clock frequency is “only” 250 Hz – “Ted” «see below» recommends 1000 Hz.

The openSUSE Tuning Handbook provides some insight into improving this situation: <https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.io.html> and <https://doc.opensuse.org/documentation/leap/tuning/html/book.sle.tuning/cha.tuning.taskscheduler.html> but I suspect that the audio latency can only be improved by using a Kernel set up for pre-emption.

I presume that you’ve found the Linux Audio group – <https://wiki.linuxaudio.org/> – and this: <https://wiki.linuxaudio.org/wiki/jack_latency_tests> «a bit old – 2014» …
And also the Linux Documentation Project – <http://tldp.org/> – search for “MIDI” – about 690 results.
And there’s “Ted”: <http://www.tedfelix.com/linux/linux-midi.html>.

So, what to do?

Possibly the easiest way out is to consider using “Ubuntu Studio” …

OldButNotGuru · September 28, 2019, 10:00pm

Thanks for the information and the links, dcurtisfra. I had seen several of them, but some of those not for a long time so it was good to review them along with the new ones. There’s still the problem (not your fault) of trying to figure out whether 5 to 10 year old info, some of it talking about kernels back to the 2.x series, is still relevant, plus the conflicting “you need a real-time kernel” vs. “no, you don’t” statements.

I had done some more experimentation between writing my original post and reading yours, and think I’ve found tentative answers to one of the issues – the “CONFIG_xxx_HZ” kernel parameters. As I said before, the stock Leap 15.1 kernel I’m using has both “CONFIG_HZ=250” and “CONFIG_HZ_250=y” but also “CONFIG_NO_HZ_COMMON=y”, “CONFIG_NO_HZ_IDLE=y”, and “CONFIG_NO_HZ=y” (plus “# CONFIG_NO_HZ_FULL is not set”).

I’m still unclear on what these all mean, but trying to answer the question, “Does the kernel use a fixed/slow interrupt frequency?” (which again online statements claim both “Yes” and “No”) I tried the following:

$ (fgrep LOC /proc/interrupts ; sleep 1 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", $2-core0, $3-core1}'
LOC:  272022346  261434192   Local timer interrupts
LOC:  272022522  261434312   Local timer interrupts
end 176 120
$ (fgrep LOC /proc/interrupts ; sleep 10 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", ($2-core0)/10, ($3-core1)/10}'                  
LOC:  272032818  261440203   Local timer interrupts
LOC:  272033382  261442676   Local timer interrupts
end 56.4 247.3
$ (fgrep LOC /proc/interrupts ; sleep 10 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", ($2-core0)/10, ($3-core1)/10}'
LOC:  272043531  261455642   Local timer interrupts
LOC:  272044994  261457190   Local timer interrupts
end 146.3 154.8

So that’s jumping all over the place, and certainly not fixed at 250 Hz.
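
The same before/after subtraction can be written as a small Python sketch, which avoids the awk bookkeeping and works for any number of cores. The function names are made up for this example; the parsing assumes the LOC line has the layout shown above (label, one count per CPU, then a text description).

```python
import time
from pathlib import Path


def loc_counts(interrupts_text: str) -> list:
    """Extract per-CPU local timer interrupt counts from /proc/interrupts text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "LOC:":
            # Numeric columns follow the label; the description comes after them.
            return [int(f) for f in fields[1:] if f.isdigit()]
    raise ValueError("no LOC line found")


def loc_rates(before: str, after: str, seconds: float) -> list:
    """Per-CPU timer interrupts per second between two snapshots."""
    pairs = zip(loc_counts(before), loc_counts(after))
    return [(b - a) / seconds for a, b in pairs]


def measure(seconds: float = 1.0) -> list:
    """Take two live snapshots of /proc/interrupts the given time apart."""
    path = Path("/proc/interrupts")
    first = path.read_text()
    time.sleep(seconds)
    return loc_rates(first, path.read_text(), seconds)
```

Applied to the first pair of LOC lines above with a one-second interval, loc_rates reproduces the 176 and 120 that awk printed.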

Then, with code from ADVENAGE GmbH that I modified slightly to run for a longer period of time, I got both that code’s analysis and new results from the /proc/interrupts test:

$ ./kernel_timer_test 100000 &
$ (fgrep LOC /proc/interrupts ; sleep 1 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", $2-core0, $3-core1}'
LOC:  272153186  261511208   Local timer interrupts
LOC:  272153483  261515384   Local timer interrupts
end 297 4176
$ (fgrep LOC /proc/interrupts ; sleep 1 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", $2-core0, $3-core1}'
LOC:  272153788  261519520   Local timer interrupts
LOC:  272154090  261523719   Local timer interrupts
end 302 4199
$ (fgrep LOC /proc/interrupts ; sleep 1 ; fgrep LOC /proc/interrupts) | awk '{print} NR==1 {core0=$2 ; core1=$3} NR==2 {print "end", $2-core0, $3-core1}'
LOC:  272154700  261530372   Local timer interrupts
LOC:  272155085  261534499   Local timer interrupts
end 385 4127
kernel timer interrupt frequency (100000 loops) is approx. 3816 Hz

I think this (semi-) conclusively shows that “CONFIG_NO_HZ_xxx” is in effect, that the kernel dynamically changes its interrupt frequency, and the end result is that this isn’t the source of my latency problems. (BTW, I finally realized that these are compiled-in kernel settings and that changing the file and rebooting or trying kernel command-line parameters won’t do anything.)

The suggestions on the linuxaudio.org “System configuration” page (thanks again for pointing me at the linuxaudio.org site) led to raboof/realtimeconfigquickscan on GitHub, a Linux configuration checker for systems to be used for real-time audio, which I downloaded and ran, getting:

$ perl ./realTimeConfigQuickScan.pl
== GUI-enabled checks ==
Checking if you are root... no - good
Checking filesystem 'noatime' parameter... 4.12.14 kernel - good
(relatime is default since 2.6.30)
Checking CPU Governors... CPU 0: 'ondemand' CPU 1: 'ondemand'  - not good
Set CPU Governors to 'performance' with 'cpupower frequency-set -g performance' or 'cpufreq-set -c <cpunr> -g performance' (Debian/Ubuntu)
See also: http://linuxmusicians.com/viewtopic.php?f=27&t=844
Checking swappiness... 60 - not good
** vm.swappiness is larger than 10
set it with '/sbin/sysctl -w vm.swappiness=10'
See also: http://linuxmusicians.com/viewtopic.php?f=27&t=452&start=30#p8916
Checking for resource-intensive background processes... none found - good
Checking checking sysctl inotify max_user_watches... < 524288 - not good
increase max_user_watches by adding 'fs.inotify.max_user_watches = 524288' to /etc/sysctl.conf and rebooting
For more information, see http://wiki.linuxaudio.org/wiki/system_configuration#sysctlconf
Checking access to the high precision event timer... not readable - not good
/dev/hpet found, but not readable.
make /dev/hpet readable by the 'audio' group
For more information, see http://wiki.linuxaudio.org/wiki/system_configuration#hardware_timers
Checking access to the real-time clock... not readable - not good
/dev/rtc found, but not readable.
make /dev/rtc readable by the 'audio' group
For more information, see http://wiki.linuxaudio.org/wiki/system_configuration#hardware_timers
Checking whether you're in the 'audio' group... yes - good
Checking for multiple 'audio' groups... no - good
chrt: failed to set pid 0's policy: Operation not permitted
Checking the ability to prioritize processes with chrt... no - not good
Could not assign a 80 rtprio SCHED_FIFO value. Set up limits.conf.
For more information, see http://wiki.linuxaudio.org/wiki/system_configuration#limitsconfaudioconf
Checking kernel support for high resolution timers... found - good
Kernel with Real-Time Preemption... not found - not good
Kernel without real-time capabilities found
For more information, see http://wiki.linuxaudio.org/wiki/system_configuration#installing_a_real-time_kernel
Checking if kernel system timer is high-resolution... found - good
Checking kernel support for tickless timer... found - good
== Other checks ==
Checking filesystem types... ok.
Checking for devices at IRQ 27... did not find multiple. ok.

Most of the “not good” analyses can be fixed (by root) via sysctl and chmod in a running kernel, which I did for confirmation. Of the two that can’t, I think that “Checking CPU Governors… CPU 0: ‘ondemand’ CPU 1: ‘ondemand’ - not good” can be addressed by some combination of loading modules and cpupower:

$ fgrep -i demand /boot/config-4.12.14-lp151.28.13-default    
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_INFINIBAND_ON_DEMAND_PAGING=y
CONFIG_DEVFREQ_GOV_SIMPLE_ONDEMAND=m

$ lsmod | egrep -i 'gov|freq'
acpi_cpufreq           24576  0

$ find /lib/modules/4.12.14-lp151.28.13-default -iname \*gov\*
/lib/modules/4.12.14-lp151.28.13-default/kernel/drivers/devfreq/governor_powersave.ko
/lib/modules/4.12.14-lp151.28.13-default/kernel/drivers/devfreq/governor_performance.ko
/lib/modules/4.12.14-lp151.28.13-default/kernel/drivers/devfreq/governor_userspace.ko
/lib/modules/4.12.14-lp151.28.13-default/kernel/drivers/devfreq/governor_passive.ko
/lib/modules/4.12.14-lp151.28.13-default/kernel/drivers/devfreq/governor_simpleondemand.ko
 
$ man -k cpupower
cpupower (1)         - Shows and sets processor power related values
cpupower-frequency-info (1) - Utility to retrieve cpufreq kernel information
cpupower-frequency-set (1) - A small tool which allows to modify cpufreq sett...
cpupower-idle-info (1) - Utility to retrieve cpu idle kernel information
cpupower-idle-set (1) - Utility to set cpu idle state specific kernel options
cpupower-info (1)    - Shows processor power related kernel or hardware confi...
cpupower-monitor (1) - Report processor frequency and idle statistics
cpupower-powercap-info (1) - Shows powercapping related kernel and hardware c...
cpupower-set (1)     - Set processor power related kernel or hardware configu...
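
For checking the result of any governor change, the current setting of each core can be read straight from sysfs. The Python sketch below assumes the usual cpufreq layout under /sys/devices/system/cpu; the helper names are hypothetical. Note also that the governor_*.ko files found above sit under drivers/devfreq, which as far as can be told handles device rather than CPU frequency scaling, so they may not be the modules that matter here.

```python
from pathlib import Path


def read_governors(cpu_root: str = "/sys/devices/system/cpu") -> dict:
    """Map each CPU directory name (cpu0, cpu1, ...) to its current governor."""
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def not_performance(governors: dict) -> list:
    """CPUs whose governor is something other than 'performance'."""
    return [cpu for cpu, gov in governors.items() if gov != "performance"]
```

An empty list from not_performance(read_governors()) would correspond to the quick-scan script's “good” verdict for this check.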

Any insights or suggestions on the above appreciated.

But my current theory is that the last, and possibly most important, piece of the puzzle is “Kernel with Real-Time Preemption… not found - not good”. As you pointed out:

dcurtisfra:

The “kernel-preempt_rt” package is currently available for neither openSUSE Tumbleweed nor Leap: https://software.opensuse.org/package/kernel-preempt_rt.

You may well need this package if you want to increase the frequency of the Kernel’s clock – higher clock frequencies without a pre-emptive Kernel could be tricky.

BTW, Leap 15.1 currently provides the following data around this issue:
 > uname -a
Linux eck001 4.12.14-lp151.28.16-default #1 SMP Wed Sep 18 05:32:19 UTC 2019 (3e458e0) x86_64 x86_64 x86_64 GNU/Linux
 >  

and, as per http://www.tedfelix.com/linux/linux-midi.html:

To check whether you are running a low latency kernel, use uname:
$ uname -a
Linux ted-laptop 3.19.0-18-lowlatency #18-Ubuntu SMP PREEMPT Tue May 19 19:02:50 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

The important thing to notice is not “lowlatency”, but “PREEMPT”. That means that I’ve got a preemptible kernel loaded. This means low latency.
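
The uname check Ted describes can be automated. The Python sketch below classifies the version string that uname reports; the function name and category labels are invented for this example. It also assumes that RT-patched kernels advertise themselves as “PREEMPT RT” or “PREEMPT_RT”, which matches common builds but is not guaranteed for every distribution.

```python
import platform


def preemption_tag(version: str) -> str:
    """Classify a kernel version string by the preemption tag it advertises."""
    tokens = version.split()
    if "PREEMPT_RT" in tokens or ("PREEMPT" in tokens and "RT" in tokens):
        return "realtime"
    if any(tok.startswith("PREEMPT") for tok in tokens):
        return "preemptible"
    return "none advertised"


# platform.version() returns the same text as the version field of uname -a
print(preemption_tag(platform.version()))
```

Given the two uname lines quoted in this thread, Ted’s Ubuntu kernel comes out as “preemptible” and the stock Leap 15.1 kernel as “none advertised”.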

So what I think I’m left with is that I’m going to have to try to build a kernel with the kernel-preempt_rt patches. (Not sure where I’ll find them given there isn’t an openSUSE package for the current releases.) Not looking forward to it – I haven’t built a kernel since before loadable kernel modules came around (back then you had to). Might as well throw in “CONFIG_HZ=1000” while I’m at it although as per my comments above I don’t think it’s needed.

Your Ubuntu Studio suggestion is a very reasonable and rational one. But I’ve been using SUSE/openSUSE since SuSE 5.3 (yes, really) and every time I look at the Debian/Ubuntu world and think about learning dpkg/apt-get/etc after all I’ve gone through with rpm/zypper/yast I come back here.

Thanks again, and once more I’d appreciate any further hints from you or anyone else willing to offer them.

tsu2 · September 28, 2019, 11:17pm

Based on the following MIDI forum discussion
https://www.midi.org/forum/3502-midi-latency-in-2018

I’d suggest at least exploring in a different direction…
Assuming that today’s hardware is so much more powerful than what was available a decade or more ago,
Instead of trying to force processes (like re-compiling the kernel to be preemptive),
You might consider removing anything that’s extraneous to your purpose.

Like,
Start off with a JeOS version of your distro, which is a special version based on the idea that instead of turning off OS functions, you start off with only what is barely necessary, then add what you need. You can decide how to implement sound processing and implement this way, too… And avoid inefficient sound device interfaces.

Not in the referenced Forum thread or your own post, I’d also suspect the mere production of the MIDI sound likely involves substantial latency, which suggests dedicated hardware, perhaps a card with special MIDI processing.

Speculating,
TSU

malcolmlewis · September 29, 2019, 12:22am

Hi
So my understanding of your issue is pressing the keyboard key and signalling your process, so perhaps look at the usbhid driver and tweak keyboard polling for lower latency?

As in set it to 1ms (1000Hz)? To do this add the following to your kernel command line options (YaST -> Bootloader);


usbhid.kbpoll=1

Perhaps look at any other kernel modules involved and see what is available and what they are set to?

For example for usbhid;


/sbin/modinfo usbhid | grep "parm:"
systool -vm usbhid
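
To put numbers on the polling interval: a device polled every N ms can add up to N ms before a key press is even seen, and N/2 ms on average if presses arrive at random moments (that uniform-arrival assumption is mine). The Python sketch below does that arithmetic and reads whatever “*poll” parameters the loaded usbhid module exposes under /sys/module/usbhid/parameters. Which parameters exist depends on the kernel version, and a value of 0 is assumed here to mean “use the interval the device itself requests”.

```python
from pathlib import Path


def poll_wait_ms(interval_ms: int):
    """(rate in Hz, worst-case wait in ms, average wait in ms) for a polling interval."""
    if interval_ms <= 0:
        raise ValueError("interval must be positive")
    return 1000.0 / interval_ms, float(interval_ms), interval_ms / 2.0


def usbhid_poll_params(param_dir: str = "/sys/module/usbhid/parameters") -> dict:
    """Read every '*poll' parameter found in the given directory."""
    values = {}
    for entry in sorted(Path(param_dir).glob("*poll")):
        values[entry.name] = int(entry.read_text())
    return values
```

For example, poll_wait_ms(1) gives a 1000 Hz rate with at most 1 ms of added wait, while an 8 ms interval gives 125 Hz and up to 8 ms.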

OldButNotGuru · September 29, 2019, 1:24am

@tsu2 – Interesting discussion at the midi.org forum. Thanks for the link, and for introducing me to the JeOS concept which I didn’t know about. One way or another I hold out hope that an optimally configured kernel can do what I want. Also agree with you that the MIDI-to-sound processing introduces latency, but am convinced that at least at the hardware level a standard multi-core, multi-GHz CPU should be able to handle at least simple music synthesis (and possibly GPU for the more difficult).

@malcolmlewis – Thanks for the USB HID tuning info, also new to me. There’s probably latency at every step of my pipeline. I should have been more clear, but the testbed I described in my post is just that, a temporary experiment. In the real system user-generated data will come in over USB, but almost certainly not USB-HID and only 50% chance it will be USB-MIDI. I currently have USB-CDC partly implemented, and in general the choice of protocol and data format is completely up to me (modulo what I can successfully get to work).