ksoftirqd 99% IO and increasing iodelay

I’ve tried both upgrading from 42.3 and a fresh install of Leap 15 on a USB drive, and in both cases I see some odd behaviour with IO.
I’m using an A2SDi-HLN4F motherboard from Supermicro, and I’ve tried 2 different USB drives (USB2 and USB3).
I have 6 WD mechanical drives and 2 SSDs as storage drives (no OS on them, only LVM).

iotop shows 99% IO (occasionally 0%, and 99% with --accumulated).
pidstat shows an ever-increasing iodelay, growing by 200 clock ticks/second.
Note that this only affects 2 cores (there are 4, with no hyperthreading); which 2 seems to be random at boot time.
iostat shows all disks essentially idle.
ftrace seems to indicate that this is caused by usb-storage (which is why I’m mentioning USB in the first place).
I can’t see anything special in /proc/interrupts.
I’ve checked the BIOS settings.
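For anyone who wants to watch this counter directly, here is a minimal Python sketch (an addition, not something run in this thread). It reads field 42 of /proc/[pid]/stat, delayacct_blkio_ticks, which proc(5) documents as aggregated block I/O delays in clock ticks and which, as far as I know, is where pidstat's iodelay column comes from. It samples every ksoftirqd thread twice and prints the rate. The helper names and the 2-second interval are my own choices, and the kernel must have delay accounting enabled (newer kernels may need the delayacct boot parameter).

```python
import os
import time

# Field 42 of /proc/[pid]/stat is delayacct_blkio_ticks (see proc(5)).
# Assumption: this is the counter pidstat shows as iodelay.
BLKIO_FIELD = 42


def parse_blkio_ticks(stat_line):
    """Return (comm, delayacct_blkio_ticks) from one /proc/[pid]/stat line."""
    # comm is wrapped in parentheses and may contain spaces or ')', so split
    # on the LAST ')' rather than on whitespace.
    head, _, rest = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # fields[0] is field 3 (state), so field N lives at index N - 3.
    return comm, int(fields[BLKIO_FIELD - 3])


def ksoftirqd_ticks():
    """Map 'ksoftirqd/N' -> blkio delay ticks for every visible ksoftirqd."""
    result = {}
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, ticks = parse_blkio_ticks(f.read())
        except (OSError, IndexError, ValueError):
            continue  # process exited, or an unexpected stat layout
        if comm.startswith("ksoftirqd/"):
            result[comm] = ticks
    return result


if __name__ == "__main__":
    hz = os.sysconf("SC_CLK_TCK")  # USER_HZ, normally 100 on x86
    interval = 2.0
    before = ksoftirqd_ticks()
    time.sleep(interval)
    after = ksoftirqd_ticks()
    for name in sorted(after):
        delta = after[name] - before.get(name, after[name])
        # ticks/s divided by HZ = seconds of delay per wall-clock second
        print(f"{name}: {delta / interval:.1f} ticks/s "
              f"({100 * delta / interval / hz:.0f}% of wall time)")
```

A thread stuck like the ones above should show close to 100% of wall time, while healthy ksoftirqd threads should sit near 0.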

Has anyone seen something like this?

I’ve included some outputs below:

nas# iotop -o
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                             
   16 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [ksoftirqd/1]
   22 be/4 root        0.00 B/s    0.00 B/s  0.00 % 99.99 % [ksoftirqd/2]

nas# pidstat -d 1
Linux 4.12.14-lp150.11-default (nas)     07/04/18     _x86_64_    (4 CPU)


19:50:41      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command


19:50:42      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
19:50:43        0         7      0.00      0.00      0.00   57344  ksoftirqd/0
19:50:43        0        16      0.00      0.00      0.00   57373  ksoftirqd/1


19:50:43      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command


19:50:44      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
19:50:45        0         7      0.00      0.00      0.00   57545  ksoftirqd/0
19:50:45        0        16      0.00      0.00      0.00   57574  ksoftirqd/1

...
20:17:35      UID       PID   kB_rd/s   kB_wr/s kB_ccwr/s iodelay  Command
20:17:36        0        16      0.00      0.00      0.00  218659  ksoftirqd/1
20:17:36        0        22      0.00      0.00      0.00  218624  ksoftirqd/2
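As a sanity check on those numbers, here is a short worked calculation in Python using only the ksoftirqd/1 (PID 16) samples above. It assumes pidstat is printing the cumulative counter here and that USER_HZ is 100, the usual value on x86. Under those assumptions the samples work out to roughly 100 ticks per second per thread (about 200 per two-second gap between non-empty samples, or across the two affected threads), i.e. about one second of delay per second, which lines up with iotop's 99.99%.

```python
from datetime import datetime

# (timestamp, iodelay) for ksoftirqd/1 (PID 16), copied from the output above.
samples = [("19:50:43", 57373), ("19:50:45", 57574), ("20:17:36", 218659)]

USER_HZ = 100  # assumption: the usual clock-tick rate on x86 Linux


def rate(a, b):
    """iodelay ticks accumulated per wall-clock second between two samples."""
    t0 = datetime.strptime(a[0], "%H:%M:%S")
    t1 = datetime.strptime(b[0], "%H:%M:%S")
    return (b[1] - a[1]) / (t1 - t0).total_seconds()


short = rate(samples[0], samples[1])  # 201 ticks over 2 s
long_ = rate(samples[0], samples[2])  # 161286 ticks over 1613 s
print(f"short window: {short:.1f} ticks/s, long window: {long_:.1f} ticks/s")
print(f"delay fraction: {long_ / USER_HZ:.1%}")
```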


nas# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:         18          0          0          0  IR-IO-APIC   2-edge      timer
  8:          1          0          0          0  IR-IO-APIC   8-edge      rtc0
  9:          0          0          0          0  IR-IO-APIC   9-fasteoi   acpi
 23:          0          0          0          0  IR-IO-APIC  23-fasteoi   i801_smbus
 24:          0          0          0          0  DMAR-MSI   0-edge      dmar0
 25:          0          0          0          0  IR-PCI-MSI 98304-edge      aerdrv, PCIe PME
 26:          0          0          0          0  IR-PCI-MSI 262144-edge      aerdrv, PCIe PME, pcie-dpc
 27:          0          0          0          0  IR-PCI-MSI 278528-edge      aerdrv, PCIe PME, pcie-dpc
 28:          0          0          0          0  IR-PCI-MSI 360448-edge      aerdrv, PCIe PME
 29:          0          0          0          0  IR-PCI-MSI 376832-edge      aerdrv, PCIe PME
 30:        255          0          4          0  IR-PCI-MSI 311296-edge      ahci0
 31:        188          0          4          0  IR-PCI-MSI 311297-edge      ahci1
 32:        186          0          0          0  IR-PCI-MSI 311298-edge      ahci2
 33:        174          0          0          0  IR-PCI-MSI 311299-edge      ahci3
 41:        202          0          4          0  IR-PCI-MSI 327683-edge      ahci3
 42:        203          0          0          0  IR-PCI-MSI 327684-edge      ahci4
 43:        155          0          0          0  IR-PCI-MSI 327685-edge      ahci5
 44:        151          0          0          4  IR-PCI-MSI 327686-edge      ahci6
 46:      32951          0          0          0  IR-PCI-MSI 344064-edge      xhci_hcd
 47:          0          0          0          0  IR-PCI-MSI 294912-edge      ismt-msi
 48:          0          0          0          0  IR-PCI-MSI 524288-edge      qat0-bundle0
 49:          0          0          0          0  IR-PCI-MSI 524289-edge      qat0-bundle1
 50:          0          0          0          0  IR-PCI-MSI 524290-edge      qat0-bundle2
 51:          0          0          0          0  IR-PCI-MSI 524291-edge      qat0-bundle3
 52:          0          0          0          0  IR-PCI-MSI 524292-edge      qat0-bundle4
 53:          0          0          0          0  IR-PCI-MSI 524293-edge      qat0-bundle5
 54:          0          0          0          0  IR-PCI-MSI 524294-edge      qat0-bundle6
 55:          0          0          0          0  IR-PCI-MSI 524295-edge      qat0-bundle7
 56:          0          0          0          0  IR-PCI-MSI 524296-edge      qat0-bundle8
 57:          0          0          0          0  IR-PCI-MSI 524297-edge      qat0-bundle9
 58:          0          0          0          0  IR-PCI-MSI 524298-edge      qat0-bundle10
 59:          0          0          0          0  IR-PCI-MSI 524299-edge      qat0-bundle11
 60:          0          0          0          0  IR-PCI-MSI 524300-edge      qat0-bundle12
 61:          0          0          0          0  IR-PCI-MSI 524301-edge      qat0-bundle13
 62:          0          0          0          0  IR-PCI-MSI 524302-edge      qat0-bundle14
 63:          0          0          0          0  IR-PCI-MSI 524303-edge      qat0-bundle15
 64:          0          0          0          0  IR-PCI-MSI 524304-edge      qat0-ae-cluster
 65:       5648          0          0          0  IR-PCI-MSI 2621440-edge      eth0-TxRx-0
 66:          0         38       2130          0  IR-PCI-MSI 2621441-edge      eth0-TxRx-1
 67:          0          0         44       2209  IR-PCI-MSI 2621442-edge      eth0-TxRx-2
 68:       2066          0          0         39  IR-PCI-MSI 2621443-edge      eth0-TxRx-3
 69:          1          0          0          0  IR-PCI-MSI 2621444-edge      eth0
NMI:          1          1          1          1   Non-maskable interrupts
LOC:      30609      42973      14020      15601   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:          1          1          1          1   Performance monitoring interrupts
IWI:      23809      34997       8709       9167   IRQ work interrupts
RTR:          0          0          0          0   APIC ICR read retries
RES:       4074       3957       3999       3780   Rescheduling interrupts
CAL:       6049       3267       7060       6725   Function call interrupts
TLB:         19         20         27         24   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          4          3          2          2   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:         14         13         12         12   Machine check polls
ERR:          0
MIS:          0
PIN:          0          0          0          0   Posted-interrupt notification event
NPI:          0          0          0          0   Nested posted-interrupt event
PIW:          0          0          0          0   Posted-interrupt wakeup event
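Because a single /proc/interrupts dump only shows totals since boot, it can be more telling to take two snapshots a couple of seconds apart and see which counters actually move (for example xhci_hcd on CPU0, or the IWI line). Below is a minimal Python sketch of that approach; parse_interrupts and diff_interrupts are hypothetical helper names, and the parser assumes the usual layout of a header row of CPU names followed by per-CPU counts on each line.

```python
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: [count per CPU]}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        counts = []
        for tok in rest.split()[:ncpu]:
            if not tok.isdigit():
                break  # reached the chip/description columns (e.g. ERR:, MIS:)
            counts.append(int(tok))
        table[label.strip()] = counts
    return table


def diff_interrupts(before, after):
    """Return {label: per-CPU increase} for lines whose counts changed."""
    changed = {}
    for label, counts in after.items():
        old = before.get(label, [0] * len(counts))
        delta = [c - o for c, o in zip(counts, old)]
        if any(delta):
            changed[label] = delta
    return changed


if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(2)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    for label, delta in sorted(diff_interrupts(first, second).items()):
        print(f"{label:>6}: {delta}")
```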

/proc/interrupts is what you want to look at. You can run:

cat /proc/interrupts

If you paste the output at https://susepaste.org/ and then post the link here, someone can probably help diagnose the issue.

Thanks, /proc/interrupts is already in the code section of my post.
The only thing I find a bit off is IWI.

Sorry, reading comprehension error.

This does not seem to be USB-related. I installed everything on LVM except the EFI boot partition, which I kept on the USB drive (all disks are in use).
Then I removed the USB drive.
Same result.

I also booted the Tumbleweed live CD and ran iotop (from git clone git://repo.or.cz/iotop.git; something is broken with networking on the TW live image).
The problem was gone.
IWI in /proc/interrupts looks the same as on Leap 15.

Leap 15 live doesn’t boot at all (neither the GNOME nor the recovery CD).

I also tried iotop from git clone git://repo.or.cz/iotop.git on Leap 15: same result, so the iotop version is ruled out.

This looks more and more like a bug. Unfortunately I don’t have time to debug this further.

I assume VT-d and VT-x are enabled in the BIOS?

I wonder if booting with intel_iommu=off resolves the IO issue.
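For a one-off test, pressing e at the GRUB menu and appending intel_iommu=off to the line that loads the kernel should be enough. To confirm after boot that the parameter actually reached the kernel, here is a small Python sketch: it looks for intel_iommu in /proc/cmdline and lists /sys/class/iommu, where entries such as dmar0 typically appear when Intel DMA remapping is active (an assumption worth verifying on this board). kernel_param is a hypothetical helper; it ignores quoting and just keeps the last occurrence of the parameter.

```python
import os


def kernel_param(cmdline, name):
    """Return the value of name=value on a kernel command line.

    Returns '' for a bare flag and None if the parameter is absent. Quoted
    values are not handled, and if the parameter appears more than once this
    sketch simply keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("intel_iommu =", kernel_param(f.read(), "intel_iommu"))
    # Assumption: registered IOMMU units (e.g. dmar0) are listed here when active.
    iommu_dir = "/sys/class/iommu"
    units = sorted(os.listdir(iommu_dir)) if os.path.isdir(iommu_dir) else []
    print("IOMMU units:", units or "none")
```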

I poked around this article to check the IOMMU settings: https://wiki.gentoo.org/wiki/IOMMU_SWIOTLB

Fortunately this issue seems to have been resolved by the latest update.
I didn’t try disabling any virtualisation features (I plan on using them, so that wouldn’t have been an option other than for testing).
I’ve been on vacation, hence the late reply.

Thanks for showing interest.