For some time, one of our desktop machines has been suffering frequent (often two to three times a day), apparently random and unexplained crashes.
Often the machine crashes without leaving any messages in /var/log/messages or any other hint as to the cause. At other times the crash is less sudden, and dumps such as the following appear:
Jul 23 08:52:19 bijvoet kernel: [234721.562898] general protection fault: 0000 [#1] PREEMPT SMP
Jul 23 08:52:19 bijvoet kernel: [234721.562908] CPU 3
Jul 23 08:52:19 bijvoet kernel: [234721.562911] Modules linked in: autofs4 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device jc42 coretemp edd vboxpci vboxnetadp vboxnetflt vboxdrv
nfs lockd fscache auth_rpcgss nfs_acl sunrpc af_packet cpufreq_conservative cpufreq_userspace microcode cpufreq_powersave acpi_cpufreq mperf snd_hda_codec_hdmi sr_mod cdrom joydev
sg snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nvidia(P) snd_pcm i2c_i801 tpm_tis firewire_ohci firewire_core crc_itu_t i7core_edac tpm iTCO_wdt iTCO_vendor_support
pcspkr edac_core ioatdma tpm_bios xhci_hcd snd_timer snd soundcore igb dca snd_page_alloc button dm_mod linear processor thermal_sys ata_generic
Jul 23 08:52:19 bijvoet kernel: [234721.562993]
Jul 23 08:52:19 bijvoet kernel: [234721.562998] Pid: 4659, comm: kworker/3:1 Tainted: P 3.1.10-1.16-desktop #1 Intel Corporation S5520SC/S5520SC
Jul 23 08:52:19 bijvoet kernel: [234721.563009] RIP: 0010:[<ffffffff8113f7bd>] [<ffffffff8113f7bd>] free_block+0xcd/0x180
Jul 23 08:52:19 bijvoet kernel: [234721.563023] RSP: 0018:ffff88065936fd60 EFLAGS: 00010002
Jul 23 08:52:19 bijvoet kernel: [234721.563028] RAX: ffff88065b5e7d40 RBX: ffff88035b5736c0 RCX: ffff880655bfa000
Jul 23 08:52:19 bijvoet kernel: [234721.563034] RDX: ffff8803bee590c0 RSI: ffff8803bee59000 RDI: 0000e10000e50000
Jul 23 08:52:19 bijvoet kernel: [234721.563041] RBP: ffff88065b70c018 R08: 0000000000000001 R09: 0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.563047] R10: 000000000000001b R11: dead000000100100 R12: 0000000000000018
Jul 23 08:52:19 bijvoet kernel: [234721.563053] R13: 0000000000000000 R14: ffffea0000000000 R15: 0000000000000008
Jul 23 08:52:19 bijvoet kernel: [234721.563060] FS: 0000000000000000(0000) GS:ffff88066fc20000(0000) knlGS:0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.563067] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 23 08:52:19 bijvoet kernel: [234721.563072] CR2: 00007fc53d648000 CR3: 0000000001c05000 CR4: 00000000000006e0
Jul 23 08:52:19 bijvoet kernel: [234721.563079] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.563085] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 23 08:52:19 bijvoet kernel: [234721.563091] Process kworker/3:1 (pid: 4659, threadinfo ffff88065936e000, task ffff880659d30840)
Jul 23 08:52:19 bijvoet kernel: [234721.563098] Stack:
Jul 23 08:52:19 bijvoet kernel: [234721.563100] 000000000000fc40 ffff8803bee590c0 0000000000000003 ffff88065b70c000
Jul 23 08:52:19 bijvoet kernel: [234721.563111] ffff88035b5736c0 0000000000000018 ffff88065b5e7d80 ffff88065b70c018
Jul 23 08:52:19 bijvoet kernel: [234721.563121] 0000000000000001 ffffffff8113fa73 ffff88066f826780 ffff88065b5e7d40
Jul 23 08:52:19 bijvoet kernel: [234721.563131] Call Trace:
Jul 23 08:52:19 bijvoet kernel: [234721.563144] [<ffffffff8113fa73>] drain_array.part.42+0x83/0xe0
Jul 23 08:52:19 bijvoet kernel: [234721.563153] [<ffffffff8113fdff>] cache_reap+0x6f/0x220
Jul 23 08:52:19 bijvoet kernel: [234721.563165] [<ffffffff81071631>] process_one_work+0x111/0x4d0
Jul 23 08:52:19 bijvoet kernel: [234721.563174] [<ffffffff81071db2>] worker_thread+0x152/0x340
Jul 23 08:52:19 bijvoet kernel: [234721.563184] [<ffffffff81075e8e>] kthread+0x7e/0x90
Jul 23 08:52:19 bijvoet kernel: [234721.563195] [<ffffffff815a9534>] kernel_thread_helper+0x4/0x10
Jul 23 08:52:19 bijvoet kernel: [234721.563203] Code: 8b 08 81 e1 80 00 00 00 0f 84 c0 00 00 00 48 8b 70 28 48 8b 43 68 49 bb 00 01 10 00 00 00 ad de 48 8b 3e 48 8b 4e 08 4a 8b 04 38
Jul 23 08:52:19 bijvoet kernel: [234721.563251] RIP [<ffffffff8113f7bd>] free_block+0xcd/0x180
Jul 23 08:52:19 bijvoet kernel: [234721.563257] RSP <ffff88065936fd60>
Jul 23 08:52:19 bijvoet kernel: [234721.563263] ---[ end trace 2c9a843aa76ab842 ]---
Jul 23 08:52:19 bijvoet kernel: [234721.565006] note: kworker/3:1[4659] exited with preempt_count 1
Jul 23 08:52:19 bijvoet kernel: [234721.565055] BUG: unable to handle kernel paging request at fffffffffffffff8
Jul 23 08:52:19 bijvoet kernel: [234721.565067] IP: [<ffffffff810761b7>] kthread_data+0x7/0x10
Jul 23 08:52:19 bijvoet kernel: [234721.565079] PGD 1c07067 PUD 1c08067 PMD 0
Jul 23 08:52:19 bijvoet kernel: [234721.565090] Oops: 0000 [#2] PREEMPT SMP
Jul 23 08:52:19 bijvoet kernel: [234721.565099] CPU 3
Jul 23 08:52:19 bijvoet kernel: [234721.565103] Modules linked in: autofs4 snd_pcm_oss snd_mixer_oss snd_seq snd_seq_device jc42 coretemp edd vboxpci vboxnetadp vboxnetflt vboxdrv
nfs lockd fscache auth_rpcgss nfs_acl sunrpc af_packet cpufreq_conservative cpufreq_userspace microcode cpufreq_powersave acpi_cpufreq mperf snd_hda_codec_hdmi sr_mod cdrom joydev
sg snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep nvidia(P) snd_pcm i2c_i801 tpm_tis firewire_ohci firewire_core crc_itu_t i7core_edac tpm iTCO_wdt iTCO_vendor_support
pcspkr edac_core ioatdma tpm_bios xhci_hcd snd_timer snd soundcore igb dca snd_page_alloc button dm_mod linear processor thermal_sys ata_generic
Jul 23 08:52:19 bijvoet kernel: [234721.565236]
Jul 23 08:52:19 bijvoet kernel: [234721.565241] Pid: 4659, comm: kworker/3:1 Tainted: P D 3.1.10-1.16-desktop #1 Intel Corporation S5520SC/S5520SC
Jul 23 08:52:19 bijvoet kernel: [234721.565259] RIP: 0010:[<ffffffff810761b7>] [<ffffffff810761b7>] kthread_data+0x7/0x10
Jul 23 08:52:19 bijvoet kernel: [234721.565276] RSP: 0018:ffff88065936fba0 EFLAGS: 00010002
Jul 23 08:52:19 bijvoet kernel: [234721.565286] RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000003
Jul 23 08:52:19 bijvoet kernel: [234721.565297] RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff880659d30840
Jul 23 08:52:19 bijvoet kernel: [234721.565309] RBP: ffff88065936fc28 R08: 0000000000989680 R09: ffff880659e52368
Jul 23 08:52:19 bijvoet kernel: [234721.565322] R10: 0000000000000400 R11: ffff880659e52358 R12: 0000000000000003
Jul 23 08:52:19 bijvoet kernel: [234721.565334] R13: ffff880659d30c50 R14: ffffea0000000000 R15: 0000000000000008
Jul 23 08:52:19 bijvoet kernel: [234721.565347] FS: 0000000000000000(0000) GS:ffff88066fc20000(0000) knlGS:0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.565361] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul 23 08:52:19 bijvoet kernel: [234721.565371] CR2: fffffffffffffff8 CR3: 0000000001c05000 CR4: 00000000000006e0
Jul 23 08:52:19 bijvoet kernel: [234721.565383] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.565395] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 23 08:52:19 bijvoet kernel: [234721.565408] Process kworker/3:1 (pid: 4659, threadinfo ffff88065936e000, task ffff880659d30840)
Jul 23 08:52:19 bijvoet kernel: [234721.565421] Stack:
Jul 23 08:52:19 bijvoet kernel: [234721.565427] ffffffff81072168 ffff88066fc324c0 ffffffff8159dfb8 ffff880659e52200
Jul 23 08:52:19 bijvoet kernel: [234721.565446] ffff88065936ffd8 0000000000000001 ffff88065936ffd8 ffff88065936ffd8
Jul 23 08:52:19 bijvoet kernel: [234721.565463] ffff88065936ffd8 ffff880659d30e5c ffff880659d30840 0000000000000000
Jul 23 08:52:19 bijvoet kernel: [234721.565475] Call Trace:
Jul 23 08:52:19 bijvoet kernel: [234721.565485] [<ffffffff81072168>] wq_worker_sleeping+0x8/0x90
Jul 23 08:52:19 bijvoet kernel: [234721.565498] [<ffffffff8159dfb8>] thread_return+0x26e/0x356
Jul 23 08:52:19 bijvoet kernel: [234721.565514] [<ffffffff810580f8>] do_exit+0x268/0x450
Jul 23 08:52:19 bijvoet kernel: [234721.565527] [<ffffffff815a16a6>] oops_end+0xa6/0xf0
Jul 23 08:52:19 bijvoet kernel: [234721.565538] [<ffffffff815a0945>] general_protection+0x25/0x30
Jul 23 08:52:19 bijvoet kernel: [234721.565550] [<ffffffff8113f7bd>] free_block+0xcd/0x180
Jul 23 08:52:19 bijvoet kernel: [234721.565560] [<ffffffff8113fa73>] drain_array.part.42+0x83/0xe0
Jul 23 08:52:19 bijvoet kernel: [234721.565571] [<ffffffff8113fdff>] cache_reap+0x6f/0x220
Jul 23 08:52:19 bijvoet kernel: [234721.565582] [<ffffffff81071631>] process_one_work+0x111/0x4d0
Jul 23 08:52:19 bijvoet kernel: [234721.565593] [<ffffffff81071db2>] worker_thread+0x152/0x340
Jul 23 08:52:19 bijvoet kernel: [234721.565604] [<ffffffff81075e8e>] kthread+0x7e/0x90
Jul 23 08:52:19 bijvoet kernel: [234721.565615] [<ffffffff815a9534>] kernel_thread_helper+0x4/0x10
Jul 23 08:52:19 bijvoet kernel: [234721.565623] Code: e8 9f 7f 52 00 e9 f3 fe ff ff 66 2e 0f 1f 84 00 00 00 00 00 48 89 df e8 68 ad fd ff e9 d7 fe ff ff 0f 1f 00 48 8b 87 b8 03 00 00
Jul 23 08:52:19 bijvoet kernel: [234721.565674] RIP [<ffffffff810761b7>] kthread_data+0x7/0x10
Jul 23 08:52:19 bijvoet kernel: [234721.565683] RSP <ffff88065936fba0>
Jul 23 08:52:19 bijvoet kernel: [234721.565690] CR2: fffffffffffffff8
Jul 23 08:52:19 bijvoet kernel: [234721.565699] ---[ end trace 2c9a843aa76ab843 ]---
Jul 23 08:52:19 bijvoet kernel: [234721.567329] Fixing recursive fault but reboot is needed!
Jul 23 08:54:23 bijvoet kernel: imklog 5.8.5, log source = /proc/kmsg started.
The machine was purchased in February 2011 and was initially installed with openSUSE 11.4. It was stable until around October 2011, when the instability suddenly started (I should say that the system was updated regularly). Since then, both openSUSE 12.1 and 12.2 have been installed (as fresh installs, not upgrades), but the instability has persisted.
Our initial suspicion was a hardware problem. However, we have replaced the hard drives, the graphics card and the SATA card, and the power supply, motherboard, CPUs and memory have all been replaced under warranty (the only original hardware left is the case), all without success. The system still crashes.
Eventually we got fed up with the instability, wiped the system and installed Ubuntu 12.04 LTS. Since then the machine has been completely stable, with no crashes of any sort.
This suggests to us a driver issue, or at least a hardware/software interaction problem, probably introduced by an update released for 11.4 around October 2011, which clearly still persists in 12.1 and 12.2. (I should also say that the instability is considerably worse when the machine is booted via systemd rather than SysV init.)
The hardware is:
Intel S5520SCR motherboard
Intel SC5650 server chassis, 1000 W PSU
2 x Xeon X5660 2.80 GHz, 6-core, 12 MB cache
Seagate Barracuda 1.5 TB 3.5" 7200 RPM 32 MB SATA
Crucial C300 SSD
24 GB DDR3 1333 ECC Kingston/Samsung server memory (6 x 4 GB)
Zotac GeForce GTX 560 Ti 1 GB GDDR5 PCI-e HDMI DVI VGA
Startech PEXESAT32 eSATA card
I can send the output of hwinfo and lspci if more information is required.
Is anyone aware of known issues with any of the above hardware? We have also tried removing the eSATA card and the SSD and reinstalling the OS on a conventional hard drive, with no improvement!
We are completely stumped.
Any thoughts?
Andrew >:(