3.11.10-17-default Kernel Unstable on OpenSUSE 13.1

I’m running OpenSUSE 13.1 with the 3.11.10-17-default kernel and I’ve had some major problems:

  1. Under high load, with a multi-threaded app such as Apache Tomcat running, the load average goes through the roof, to around 150, and the system becomes unresponsive. For example, running a backup drives the load very high. If I kill the multi-threaded Tomcat app, things calm down and work OK. The same Tomcat app has been running stably on 11.3 systems for several years.

I upgraded the kernel to 3.15.8 and ran a full backup while Tomcat was running, and things seem much more stable. While the backup was running, another multi-threaded Java app also ran for several minutes without problems.

During the lockup period, top shows the kernel’s “migration” threads running at 100% CPU. For example, I’ll see migration threads 1-8 consistently at 100% for as long as the problem lasts.
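
For anyone trying to catch the load spike as it happens, here is a minimal Python sketch that reads the load averages from /proc/loadavg and flags a 1-minute value far above the CPU count. The function names and the `factor` threshold are illustrative assumptions, not anything taken from the kernel or from OpenSUSE tooling.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def is_overloaded(load1, cpus, factor=4.0):
    """Return True when the 1-minute load exceeds `factor` times the CPU count.

    `factor` is an arbitrary illustrative threshold, not a kernel constant.
    """
    return load1 > cpus * factor


def check_system(path="/proc/loadavg"):
    """Read the current load averages and report whether the box looks overloaded."""
    with open(path) as handle:
        load1, load5, load15 = parse_loadavg(handle.read())
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None
    return {
        "load1": load1,
        "load5": load5,
        "load15": load15,
        "cpus": cpus,
        "overloaded": is_overloaded(load1, cpus),
    }
```

Calling `check_system()` from a cron job or a loop with a short sleep, and logging the result, would give a timeline to line up against the backup and Tomcat activity.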

  2. I’m seeing some kernel error messages:

BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:162]
Modules linked in: ipt_REJECT nls_utf8 loop fuse xt_recent xt_LOG xt_tcpudp xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat_ftp nf_conntrack_ftp ipt_MASQUERADE nf_nat x_tables nf_conntrack af_packet sr_mod cdrom iTCO_wdt iTCO_vendor_support gpio_ich ata_generic coretemp kvm_intel kvm joydev crc32c_intel igb ptp pps_core ata_piix pcspkr serio_raw lpc_ich i2c_i801 mfd_core i7core_edac ioatdma dca edac_core shpchp button acpi_cpufreq mperf sg dm_mod autofs4 hid_generic usbhid uhci_hcd mgag200 ttm drm_kms_helper ehci_pci ehci_hcd drm i2c_algo_bit sysimgblt sysfillrect syscopyarea usbcore usb_common 3w_9xxx processor thermal_sys scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh_alua scsi_dh
CPU: 0 PID: 162 Comm: kswapd0 Tainted: G          I  3.11.10-17-default #1
Hardware name: Supermicro X8DTU/X8DTU, BIOS 2.1c       08/03/2012
task: ffff880c00db2200 ti: ffff880c00db4000 task.ti: ffff880c00db4000
RIP: 0010:[<ffffffff8155daea>]  [<ffffffff8155daea>] _raw_spin_lock+0x1a/0x30
RSP: 0018:ffff880c00db5c08  EFLAGS: 00000293
RAX: 0000000000009975 RBX: ffffffff8122c216 RCX: 0000000000009978
RDX: 0000000000009978 RSI: ffff88081597db78 RDI: ffff880c0009e480
RBP: ffff880c0009e000 R08: 0000000000000000 R09: 0000000000000036
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88081597db78
R13: 0000000000000000 R14: 0000000000000000 R15: ffff8816d136f0b0
FS:  0000000000000000(0000) GS:ffff880c3fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007feca401f868 CR3: 0000000001c0e000 CR4: 00000000000007f0
Stack:
 ffffffff8122c83c ffff8816d136f0b0 ffff8816d136f1b8 ffffffff8120e498
 ffff8816d136f0b0 ffffffff81193da3 ffff880c00db5c78 ffff881055edf530
 ffff880c0002d108 ffffffff81193ec1 ffff881055edf5a0 ffffffff81194ccc
Call Trace:
 [<ffffffff8122c83c>] ext4_es_lru_del+0x1c/0x60
 [<ffffffff8120e498>] ext4_clear_inode+0x38/0x80
 [<ffffffff81193da3>] evict+0xa3/0x190
 [<ffffffff81193ec1>] dispose_list+0x31/0x40
 [<ffffffff81194ccc>] prune_icache_sb+0x16c/0x310
 [<ffffffff8117ee9b>] prune_super+0x15b/0x190
 [<ffffffff81128953>] shrink_slab+0x153/0x2d0
 [<ffffffff8112c0f9>] balance_pgdat+0x459/0x580
 [<ffffffff8112c36a>] kswapd+0x14a/0x3e0
 [<ffffffff8106f8ef>] kthread+0xaf/0xc0
 [<ffffffff8156563c>] ret_from_fork+0x7c/0xb0
Code: ec b8 01 00 00 00 c3 66 2e 0f 1f 84 00 00 00 00 00 b8 00 00 01 00 f0 0f c1 07 89 c1 c1 e9 10 66 39 c1 89 ca 74 0d 0f 1f 00 f3 90 <0f> b7 07 66 39 d0 75 f6 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00
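
To see how often these lockups occur and which tasks are hit, the "BUG: soft lockup" lines can be pulled out of the kernel log. The following is a minimal Python sketch, assuming the message format shown in the trace above; the regular expression and the function names are illustrative assumptions, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
#   BUG: soft lockup - CPU#0 stuck for 22s! [kswapd0:162]
# The task name is matched lazily because names like "kworker/0:1" contain a colon.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>[^\]]+?):(?P<pid>\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft-lockup report found in the given log text."""
    events = []
    for match in SOFT_LOCKUP.finditer(log_text):
        events.append(
            {
                "cpu": int(match.group("cpu")),
                "seconds": int(match.group("secs")),
                "task": match.group("task"),
                "pid": int(match.group("pid")),
            }
        )
    return events


def count_by_task(events):
    """Count soft-lockup reports per task name."""
    return Counter(event["task"] for event in events)
```

Feeding it the saved output of dmesg, or the contents of /var/log/messages, and passing the result to `count_by_task` shows whether kswapd0 is the only task affected.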

System configuration is:

2 ea Intel XEON 5520 CPU
96GB RAM
Supermicro X8DTUF MB
3Ware 9650-4LPML SATA RAID Controller
ROOT: 2 ea Magnetic Drives in RAID 1
/home: 2 ea Samsung SSD in RAID 1

OpenSUSE 13.1 w/ All Updates Applied

I seem to have a decent workaround in the 3.15.8 kernel, but I’ve never really had these kinds of problems with OpenSUSE before and thought I would mention it.