I am finding 42.3 very unstable on a brand new system.
uname -a gives
Linux hostname 4.4.85-22-default #1 SMP Fri Sep 1 14:21:21 UTC 2017 (0c39a1f) x86_64 x86_64 x86_64 GNU/Linux
The motherboard is an Asus Prime B250-Pro with Intel Core i3-7100 with 16GB DDR4 RAM. Graphics is the integrated Intel device. HD is a Samsung 500GB SSD.
The install is a clean install from a USB stick.
The system works for a while but after an hour or more the typical failure is the crash of a running application and then the complete seizure of the keyboard and mouse. No F1.
I can sometimes SSH into the system but often even that is slow and crashes.
An example is a crash of Firefox, which was run from a konsole command line. It looks like this:
AiOGest: init
AiOGest: end init
ExceptionHandler::WaitForContinueSignal waiting for continue signal...
ExceptionHandler::GenerateDump cloned child 6921
ExceptionHandler::SendContinueSignalToChild sent continue signal to child
When I move to iceWM in an attempt to evade plasma problems that also seizes after a while.
Try to uninstall the kernel graphics stack update, drm-kmp-default, and see if it helps.
Or uninstall the intel Xorg driver, xf86-video-intel, and use the generic modesetting driver (which is even said to have better performance on some systems…).
I uninstalled drm-kmp-default and the display was dropped into a fixed 1024 x 768 resolution with no option to change the resolution back to my normal 1920 x 1080.
System Settings did not have the resolution change feature normally found in the Displays tab.
Is there any way round this please?
Then apparently your GPU is quite new and not supported by kernel 4.4 yet.
So the system probably uses the generic fbdev driver now, which doesn’t allow resolution changes.
/var/log/Xorg.0.log should tell more though.
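For anyone who wants to skim that log quickly, here is a minimal Python sketch, not an official tool. It assumes the usual Xorg log layout, where driver loads appear as `LoadModule: "name"` lines and warnings/errors are tagged `(WW)` / `(EE)`, optionally preceded by a `[ seconds]` timestamp. The function name `summarise_xorg_log` is made up for illustration.

```python
import re

# Real entries start with an optional "[   12.345]" timestamp and then the marker.
# The legend near the top of the log is indented, so it does not match.
PROBLEM = re.compile(r"^(?:\[\s*[\d.]+\]\s+)?\((EE|WW)\)")
MODULE = re.compile(r'LoadModule: "([^"]+)"')


def summarise_xorg_log(lines):
    """Return (modules_loaded, warning_and_error_lines) from Xorg log lines."""
    modules, problems = [], []
    for line in lines:
        loaded = MODULE.search(line)
        if loaded:
            modules.append(loaded.group(1))
        elif PROBLEM.match(line):
            problems.append(line.rstrip())
    return modules, problems


def summarise_file(path="/var/log/Xorg.0.log"):
    """Convenience wrapper that reads the log from disk."""
    with open(path, errors="replace") as handle:
        return summarise_xorg_log(handle)
```

If "fbdev" shows up among the loaded modules and neither "intel" nor "modesetting" does, that would be consistent with the fixed-resolution fallback described above.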
I have now installed xf86-video-intel without any noticeable problems, but I have since had a system crash. I was able to ssh into the system, but an attempt to halt the system from the remote host failed. Having su’d to root:
bardsey:/home/peter # halt
Broadcast message from systemd-journald@bardsey (Fri 2017-09-15 14:34:46 BST):
systemd[1]: Caught <SEGV>, dumped core as pid 19702.
Broadcast message from systemd-journald@bardsey (Fri 2017-09-15 14:34:46 BST):
systemd[1]: Freezing execution.
Warning! D-Bus connection terminated.
Failed to wait for response: Connection reset by peer
Failed to open /dev/initctl: No such device or address
Failed to talk to init daemon.
bardsey:/home/peter #
bardsey:/home/peter # halt
Failed to start halt.target: Connection timed out
Failed to open /dev/initctl: No such device or address
Failed to talk to init daemon.
bardsey:/home/peter # halt
Failed to start halt.target: Activation of org.freedesktop.systemd1 timed out
Failed to open /dev/initctl: No such device or address
Failed to talk to init daemon.
bardsey:/home/peter # exit
exit
So this looks to me like a complete failure of the system daemon.
Well, if it is not installed, it cannot be the cause of the problems either. (I mainly suggested to uninstall it because it is quite buggy on certain chipsets it seems)
Installing it probably won’t change anything at all, because if it is not installed by default it probably doesn’t support your graphics chip anyway (which means it won’t get used even if installed).
Again, /var/log/Xorg.0.log should tell what happens.
Looks like systemd crashed. (and that would also explain why the subsequent “halt” doesn’t work at all, as there’s no init system any more to talk to and initiate the halt)
Btw, it would probably be better to run “systemctl poweroff”, “shutdown”, or “halt -p”, as “halt” alone will not poweroff the system.
If systemd crashes, that can cause all sorts of problems of course, but the question is whether this is related to your other problems, or rather only happens when you run “halt” via ssh.
coredumpctl should list all crashes, so having a look at that via ssh after the system froze may give more clues.
And the output of dmesg might be interesting too.
As this seems to be brand new hardware (as e.g. the GPU apparently is not even supported by the standard kernel), the best thing probably would be trying to install the latest kernel version though, available from the Kernel:stable repo. http://download.opensuse.org/repositories/Kernel:/stable/standard/
E.g. add this URL as repo with YaST->Repositories and then install the highest version of kernel-default in YaST->Software Management (click on “Versions” below the package list to see all available versions).
The standard kernel will be kept (and available to boot in “Advanced Options” in the boot menu) if you do this, so you can easily switch back in case of problems.
FWIW, I have ‘enjoyed’ many of the issues described by the OP on four very new systems (Kabylake, released by Intel about the same time as the B250 series) and found that simply updating the kernel resolved most of those issues. However, it’s important to know what version to update to.
Based on my experience my highly subjective opinion is as follows:
4.4 - not able to handle Kabylake, particularly graphics, very unstable
4.9 - better but not much better
4.11 - should be ok based on posts elsewhere, but that’s not my experience.
4.12 - very stable
4.13 - only using it on two systems for two days, but seems OK
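To check a machine against the list above, a small Python sketch like the following could be used. The 4.12 threshold is only the subjective cut-off from that list, not an official support statement, and the helper names `kernel_version` and `probably_new_enough` are invented for this example.

```python
import platform
import re

# Assumption: 4.12 is the first series described as "very stable" in the list above.
MIN_STABLE = (4, 12)


def kernel_version(release):
    """Return (major, minor) parsed from a `uname -r` style string."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return int(match.group(1)), int(match.group(2))


def probably_new_enough(release, minimum=MIN_STABLE):
    """True if the release is at or above the assumed minimum series."""
    return kernel_version(release) >= minimum


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    verdict = "ok" if probably_new_enough(running) else "older than 4.12"
    print(running, "->", verdict)
```

With the two release strings mentioned in this thread, `4.4.85-22-default` falls below the threshold and `4.13.2-1.1.g68f4aee` is above it.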
Basically you need to compile/install the kernel modules for the new kernel.
If you use Oracle’s RPM, you can do that by running “sudo vboxconfig setup”.
If you use openSUSE’s RPM, this won’t work out of the box, as it only comes with a pre-compiled module that fits the distribution kernel, but no source code.
Additionally installing virtualbox-host-source should make that work (but I never tried that myself).
OTOH, there is a Virtualization repo that comes with a kmp (kernel module package) built against the latest kernel from Kernel:stable: http://download.opensuse.org/repositories/Virtualization/Kernel_stable_standard/
(you should also install virtualbox from there then, to avoid version mismatches)
I have experienced a crash with the latest kernel.
The output from journalctl from what I guess to be the start of the crash to the reboot is as follows:
After the oops there is more until a reboot. I will put the rest in a second post and then try to find out how to upload and reference a file.
The effect was kontact crashing while I was reading emails, and when I tried to restart kontact the plasma desktop vanished and then the entire system locked up. I could not even ssh in to do a reboot.
My next move will be to remove drm-kmp-default with the new kernel 4.13.2-1.1.g68f4aee to see if I am able to select the correct display resolution.
I have uninstalled drm-kmp-default with the 4.13.2-1.1.g68f4aee kernel and can scale the display resolution.
I am now soak testing the system without the drm-kmp-default module so fingers crossed.
OK Malcolm I have removed virtualbox which took virtualbox-host-source and virtualbox-qt with it. I also removed virtualbox-host-kmp-default.
I had to do a reboot with the big red button as the system crashed.
On reboot I reinstalled kernel-default, kernel-devel and kernel-default-devel (all 4.13.2-1.1.g68f4aee) which pulled in drm-kmp-default so I removed that again.
Bad news. The system has crashed again. This time there was no record in journalctl other than a hint.
The journal has a cron job each quarter hour until 12:45. The 13:00, 13:15 and 13:30 records are missing:
Sep 20 12:30:01 bardsey CRON[4586]: pam_unix(crond:session): session closed for user root
Sep 20 12:30:01 bardsey systemd[1]: Stopping User Manager for UID 0...
Sep 20 12:30:01 bardsey systemd[4587]: Stopped target Default.
Sep 20 12:30:01 bardsey systemd[4587]: Stopped target Basic System.
Sep 20 12:30:01 bardsey systemd[4587]: Stopped target Paths.
Sep 20 12:30:01 bardsey systemd[4587]: Stopped target Timers.
Sep 20 12:30:01 bardsey systemd[4587]: Stopped target Sockets.
Sep 20 12:30:01 bardsey systemd[4587]: Reached target Shutdown.
Sep 20 12:30:01 bardsey systemd[4587]: Starting Exit the Session...
Sep 20 12:30:01 bardsey systemd[4587]: Received SIGRTMIN+24 from PID 4622 (kill).
Sep 20 12:30:01 bardsey systemd[4588]: pam_unix(systemd-user:session): session closed for user root
Sep 20 12:30:01 bardsey systemd[1]: Stopped User Manager for UID 0.
Sep 20 12:30:01 bardsey systemd[1]: Removed slice User Slice of root.
Sep 20 12:45:01 bardsey cron[4694]: pam_unix(crond:session): session opened for user root by (uid=0)
Sep 20 12:45:01 bardsey systemd[1]: Created slice User Slice of root.
Sep 20 12:45:01 bardsey systemd[1]: Starting User Manager for UID 0...
Sep 20 12:45:01 bardsey systemd[1]: Started Session 18 of user root.
Sep 20 12:45:01 bardsey systemd[4695]: pam_unix(systemd-user:session): session opened for user root by (uid=0)
Sep 20 12:45:01 bardsey systemd[4695]: Reached target Paths.
Sep 20 12:45:01 bardsey systemd[4695]: Reached target Timers.
Sep 20 12:45:01 bardsey systemd[4695]: Reached target Sockets.
Sep 20 12:45:01 bardsey systemd[4695]: Reached target Basic System.
Sep 20 12:45:01 bardsey systemd[4695]: Reached target Default.
Sep 20 12:45:01 bardsey systemd[4695]: Startup finished in 11ms.
Sep 20 12:45:01 bardsey systemd[1]: Started User Manager for UID 0.
Sep 20 12:45:11 bardsey dbus[889]: [system] Activating service name='org.opensuse.Snapper' (using servicehelper)
Sep 20 12:45:11 bardsey dbus[889]: [system] Successfully activated service 'org.opensuse.Snapper'
Sep 20 12:45:34 bardsey kernel: BTRFS info (device sda3): qgroup scan completed (inconsistency flag cleared)
-- Reboot --
Sep 20 13:44:17 linux-8v7o systemd-journald[167]: Runtime journal (/run/log/journal/) is currently using 8.0M.
Maximum allowed usage is set to 795.3M.
Leaving at least 1.1G free (of currently available 7.7G of space).
Enforced usage limit is thus 795.3M, of which 787.3M are still available.
Sep 20 13:44:17 linux-8v7o kernel: microcode: microcode updated early to revision 0x5e, date = 2017-04-06
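A silent gap like the one above (last entry at 12:45:34, next entry at 13:44:17) can also be found mechanically. This is a minimal Python sketch, assuming the default short journalctl output where each entry starts with a `Mon DD HH:MM:SS` stamp. The year is not in that format, so it is passed in as an assumption, and the 20-minute threshold is just a guess that sits above the quarter-hourly cron interval.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "Sep 20 12:45:34 " stamp of a short-format journal line.
STAMP = re.compile(r"^([A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2}) ")


def find_gaps(lines, threshold=timedelta(minutes=20), year=2017):
    """Return (last_seen, next_seen) pairs where the journal was silent too long."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            # Continuation lines and "-- Reboot --" carry no timestamp.
            continue
        current = datetime.strptime(
            "%d %s" % (year, match.group(1)), "%Y %b %d %H:%M:%S"
        )
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Feeding it the lines of saved journalctl output (for example from a text file) would report the window in which the system stopped logging, which narrows down when the freeze happened.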
After waking the system with a keystroke (no log in) kontact was in focus. I pressed the keypad ‘+’ key to see the next unread message but kontact crashed with the usual KDE application crash dialogue. I could not complete this as the system stopped responding soon after.
Is there any way I can ensure that the kernel is untainted?
Do I have to just wait for a total crash and self-reboot to get that useful information that I was able to post last time?
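On the taint question: the kernel exposes its taint state as a bitmask in /proc/sys/kernel/tainted, and a value of 0 means untainted. The following Python sketch decodes that mask. The flag table reflects the commonly documented lower 16 bits and the wording of each description is my own paraphrase, so treat it as a reading aid rather than a reference. Out-of-tree modules (the VirtualBox host modules would normally fall into this class) are expected to show up as the "O" flag.

```python
# Bit position -> flag letter and a paraphrased meaning.
TAINT_FLAGS = {
    0: "P: proprietary module loaded",
    1: "F: module force-loaded",
    2: "S: kernel running on out-of-spec hardware",
    3: "R: module force-unloaded",
    4: "M: machine check exception reported",
    5: "B: bad page referenced",
    6: "U: taint requested by userspace",
    7: "D: kernel oops or BUG occurred",
    8: "A: ACPI table overridden",
    9: "W: kernel warning issued",
    10: "C: staging driver loaded",
    11: "I: firmware bug workaround applied",
    12: "O: out-of-tree module loaded",
    13: "E: unsigned module loaded",
    14: "L: soft lockup occurred",
    15: "K: kernel live-patched",
}


def decode_taint(value):
    """Return the descriptions of every taint bit set in value."""
    return [text for bit, text in sorted(TAINT_FLAGS.items()) if value & (1 << bit)]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the current taint mask from procfs (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())
```

Calling `decode_taint(read_taint())` right after boot, before anything has crashed, should return an empty list on an untainted kernel.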
On Wed 20 Sep 2017 01:06:01 PM CDT, pblewis wrote:
Is there any way I can ensure that the kernel is untainted?
Do I have to just wait for a total crash and self-reboot to get that
useful information that I was able to post last time?
Peter
Hi
OK, sounds like at this point two issues, one kernel (vbox) and one for
the desktop, so if not getting the kernel output now, that’s possibly
identified one issue…
So you might just have to peruse the last log with journalctl -x. Are
there any core dumps (coredumpctl list)?
–
Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE Leap 42.2|GNOME 3.20.2|4.4.79-18.26-default
I have had another crash and this time I was able to reboot the machine.
There was an initial crash in Firefox, followed by konsole and akonadi while the system was rebooting.
journalctl -x output of those events:
Sep 20 15:20:57 bardsey systemd-coredump[3875]: Process 3846 (Web Content) of user 500 dumped core.
-- Subject: Process 3846 (Web Content) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 3846 (Web Content) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
<--snip-->
Sep 20 15:23:58 bardsey systemd-coredump[4117]: Process 3251 (konsole) of user 500 dumped core.
-- Subject: Process 3251 (konsole) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 3251 (konsole) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
<--snip-->
Sep 20 15:24:03 bardsey systemd-coredump[4119]: Process 2769 (akonadiserver) of user 500 dumped core.
-- Subject: Process 2769 (akonadiserver) dumped core
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Documentation: man:core(5)
--
-- Process 2769 (akonadiserver) crashed and dumped core.
--
-- This usually indicates a programming error in the crashing program and
-- should be reported to its vendor as a bug.
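When several unrelated programs dump core within minutes, as above, it helps to see them side by side. Here is a minimal Python sketch that pulls the systemd-coredump entries out of saved journal text. It assumes the message wording shown in the excerpt above, and the function name `coredump_events` is made up for this example.

```python
import re

# Matches e.g. "systemd-coredump[3875]: Process 3846 (Web Content) of user 500 dumped core."
COREDUMP = re.compile(
    r"systemd-coredump\[\d+\]: Process (\d+) \((.+?)\) of user (\d+) dumped core"
)


def coredump_events(lines):
    """Return one dict per core dump found in short-format journal lines."""
    events = []
    for line in lines:
        match = COREDUMP.search(line)
        if match:
            pid, name, uid = match.groups()
            events.append(
                {
                    "time": line[:15],  # leading "Mon DD HH:MM:SS" stamp
                    "pid": int(pid),
                    "process": name,
                    "uid": int(uid),
                }
            )
    return events
```

Three different applications crashing in about three minutes would point away from a bug in any one of them and more towards something underneath (kernel, driver or hardware), which is in line with the direction this thread has taken.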