For a few weeks now, one of my desktop computers has refused to shut down; I’m forced to use the power button to turn it off.
The same thing happens on reboot: it gets stuck, so rebooting isn’t possible either.
It makes no difference whether I shut down or reboot from the command line or by clicking shutdown or reboot in the KDE menu.
I’ve tried to find the cause without success so far. Please suggest what I should try next to find the problem causing this.
Problems:
journalctl and other logging are of very little use, because logging stops before the computer hangs, so the logs show nothing from the point where shutdown gets stuck.
Ctrl+Alt+F? works, but bash has already been shut down as well, so I can’t log in or type any commands.
There is no crash from the Linux kernel’s point of view, so no crash dump.
Things that might possibly help me:
Is there a way to turn off the (totally useless) black-and-green graphics with three dots, whose only purpose is to hide information from the user during shutdown and boot? I honestly never understood why openSUSE has this “feature” of hiding information, but this is the first time it might be causing a real problem for me; openSUSE/SUSE has been super stable during the 15+ years I’ve used it.
Is there any way to display what’s happening via the magic SysRq keys (keyboard shortcuts), so that I can find out why the computer never shuts down?
Is there some search term I should use when looking through the logs for whatever might be causing shutdown and reboot to hang indefinitely? This will only help if the thing that keeps the computer from shutting down also causes some other problem while it’s running.
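On the SysRq question: whether the magic keys do anything depends on the value in /proc/sys/kernel/sysrq. A minimal sketch, assuming the standard kernel bitmask where 0 is off, 1 enables everything, and bit 8 permits process dumps; the function name `sysrq_status` is made up for illustration:

```shell
# Report whether SysRq task dumps are permitted.
# Optional argument: file to read (defaults to the live kernel setting).
sysrq_status() {
    file=${1:-/proc/sys/kernel/sysrq}
    val=$(cat "$file") || return 1
    case $val in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *)  # Any other value is a bitmask; bit 8 covers debugging dumps.
            if [ $((val & 8)) -ne 0 ]; then
                echo "bitmask $val: process dumps allowed"
            else
                echo "bitmask $val: process dumps not allowed"
            fi ;;
    esac
}
```

If dumps are allowed, Alt+SysRq+w during the hang should print blocked (uninterruptible) tasks to the console, and Alt+SysRq+t all tasks. As root, `echo 1 > /proc/sys/kernel/sysrq` enables everything until the next boot.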
By the time you answer this question I might have found a solution by googling, but please do answer anyway as others with the same problem might read the answers. This is the first day I found enough time to try to troubleshoot this problem, so I haven’t googled much yet.
A note that might have importance:
This particular machine has a problem with one USB port that generates log entries: before March 9 only once per day, since March 9 around 5 times per minute. The motherboard is an MSI Z170A GAMING M5 and, if I remember correctly, it has 3 different types of USB ports with different hardware. Either the openSUSE kernel modifications or the Linux kernel itself fail to detect the ports correctly, or there is a hardware problem on this machine. The ports work perfectly in Windows, so I suspect some driver or change in the Linux kernel needs improvement. The number of log warnings about this is HUGE (“usb3-port1: Cannot enable. Maybe the USB cable is bad?”), even though no device is connected through usb3. There is a slight possibility that an official kernel I installed on March 9 caused this huge increase in log entries, and that the looped retries at enabling this USB port are what prevents shutdown or reboot. However, I can’t know for sure, as I haven’t yet found a way to see what keeps the computer from shutting down completely. I haven’t troubleshot this USB issue either, as I have been super busy the last few weeks, so I’m troubleshooting it today too.
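To check whether the jump in USB warnings really lines up with the March 9 kernel install, the messages can be counted per day. A rough sketch, assuming the default short journal format (month, day, time, host, ...); `usb_errors_per_day` is a hypothetical helper name:

```shell
# Read journal lines on stdin and print "Month Day count" for the
# "Cannot enable" USB warning, in the order the days first appear.
usb_errors_per_day() {
    grep -F 'Cannot enable. Maybe the USB cable is bad?' |
        awk '{
            day = $1 " " $2                    # e.g. "Mar 9"
            if (!(day in n)) order[++k] = day  # remember first-seen order
            n[day]++
        }
        END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }'
}
```

Typical use would be `journalctl --no-pager | usb_errors_per_day`. A sharp step between two consecutive days would support the kernel-update theory; a gradual climb would point elsewhere.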
If you boot one of the previous kernels, is the problem still there? I suspect you are right about the USB, since shutdown waits until all processes have ended. If that one is hanging somehow, it may stall the shutdown.
What might help is going through the logs of the previous boot and pinpointing the last unit that stopped successfully. Run the following command and post its output:
erlangen:~ # journalctl -b -1 -u init.scope | grep Stopped
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Remote File Systems.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Remote File Systems (Pre).
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Timers.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Initrd Default Target.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Initrd Root Device.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Basic System.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Paths.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target System Initialization.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Local File Systems.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Swap.
Mar 29 19:22:43 erlangen systemd[1]: Stopped udev Coldplug all Devices.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Slices.
Mar 29 19:22:43 erlangen systemd[1]: Stopped Apply Kernel Variables.
Mar 29 19:22:43 erlangen systemd[1]: Stopped Load Kernel Modules.
Mar 29 19:22:43 erlangen systemd[1]: Stopped target Sockets.
Mar 29 19:22:43 erlangen systemd[1]: Stopped dracut pre-trigger hook.
Mar 29 19:22:43 erlangen systemd[1]: Stopped dracut cmdline hook.
Mar 29 19:22:43 erlangen systemd[1]: Stopped dracut ask for additional cmdline parameters.
Mar 29 19:22:43 erlangen systemd[1]: Stopped udev Kernel Device Manager.
Mar 29 19:22:43 erlangen systemd[1]: Stopped Create Static Device Nodes in /dev.
Mar 29 19:22:43 erlangen systemd[1]: Stopped Create list of required static device nodes for the current kernel.
Mar 29 19:23:09 erlangen systemd[1]: Stopped User Manager for UID 475.
Mar 29 19:23:09 erlangen systemd[1]: Stopped User Runtime Directory /run/user/475.
Mar 29 19:24:45 erlangen systemd[1]: Stopped Machine Check Exception Logging Daemon.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Timers.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Discard unused blocks once a week.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Backup of /etc/sysconfig.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Backup of jAlbum projects.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Daily locate database update.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Graphical Interface.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Backup of RPM database.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Daily rotation of log files.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Multi-User System.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Apply settings from /etc/sysconfig/keyboard.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Login Prompts.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Systemd timer to update the system daily with PackageKit.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Daily Cleanup of Temporary Directories.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Sound Card.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Self Monitoring and Reporting Technology (SMART) Daemon.
Mar 29 19:28:11 erlangen systemd[1]: Stopped hd-idle disk spindown service.
Mar 29 19:28:11 erlangen systemd[1]: Stopped irqbalance daemon.
Mar 29 19:28:11 erlangen systemd[1]: Stopped A remote-mail retrieval utility.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Daemon for power management.
Mar 29 19:28:11 erlangen systemd[1]: Stopped RealtimeKit Scheduling Policy Service.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Machine Check Exception Logging Daemon.
Mar 29 19:28:11 erlangen systemd[1]: Stopped CUPS Scheduler.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Manage, Install and Generate Color Profiles.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Disk Manager.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Command Scheduler.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Authorization Manager.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Getty on tty1.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Restore /run/initramfs on shutdown.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Save/Restore Sound Card State.
Mar 29 19:28:11 erlangen systemd[1]: Stopped target Network is Online.
Mar 29 19:28:11 erlangen systemd[1]: Stopped Wait for Network to be Configured.
Mar 29 19:28:11 erlangen systemd[1]: Stopped MiniDLNA UPnP-A/V and DLNA media server.
Mar 29 19:28:12 erlangen systemd[1]: Stopped Postfix Mail Transport Agent.
Mar 29 19:28:13 erlangen systemd[1]: Stopped X Display Manager.
Mar 29 19:28:13 erlangen systemd[1]: Stopped The Apache Webserver.
Mar 29 19:28:13 erlangen systemd[1]: Stopped target System Time Synchronized.
Mar 29 19:28:13 erlangen systemd[1]: Stopped NTP Server Daemon.
Mar 29 19:28:32 erlangen systemd[1]: Stopped Session 2 of user karl.
Mar 29 19:28:32 erlangen systemd[1]: Stopped User Manager for UID 1000.
Mar 29 19:28:32 erlangen systemd[1]: Stopped User Runtime Directory /run/user/1000.
Mar 29 19:28:32 erlangen systemd[1]: Stopped Permit User Sessions.
Mar 29 19:28:32 erlangen systemd[1]: Stopped target Remote File Systems.
Mar 29 19:28:32 erlangen systemd[1]: Stopped target Remote File Systems (Pre).
Mar 29 19:28:32 erlangen systemd[1]: Stopped target Network.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Login Service.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Network Service.
Mar 29 19:28:33 erlangen systemd[1]: Stopped WPA Supplicant daemon (interface wlp3s0).
Mar 29 19:28:33 erlangen systemd[1]: Stopped D-Bus System Message Bus.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Basic System.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Slices.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Forward Password Requests to Plymouth Directory Watch.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Paths.
Mar 29 19:28:33 erlangen systemd[1]: Stopped CUPS Scheduler.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Watch for changes in CA certificates.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Sockets.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target System Initialization.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Swap.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Local Encrypted Volumes.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Apply Kernel Variables.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Load Kernel Modules.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Load/Save Random Seed.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Update UTMP about System Boot/Shutdown.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Security Auditing Service.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Create Volatile Files and Directories.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Local File Systems.
Mar 29 19:28:33 erlangen systemd[1]: Stopped target Local File Systems (Pre).
Mar 29 19:28:33 erlangen systemd[1]: Stopped Create Static Device Nodes in /dev.
Mar 29 19:28:33 erlangen systemd[1]: Stopped Remount Root and Kernel File Systems.
erlangen:~ #
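With a listing this long, the interesting part is where the timestamps jump. A small sketch that flags pauses between consecutive “Stopped” lines, assuming the short journal format shown above with the time in the third field; `stop_gaps` is an illustrative name, not an existing tool:

```shell
# Read journal lines on stdin and report every "Stopped" line that
# follows a pause of at least N seconds (default 10).
# Jumps across midnight yield a negative difference and are ignored.
stop_gaps() {
    awk -v min="${1:-10}" '
    /Stopped/ {
        split($3, t, ":")                       # HH:MM:SS
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (seen && now - prev >= min)
            printf "%d s gap before: %s\n", now - prev, $0
        prev = now
        seen = 1
    }'
}
```

It would be used as `journalctl -b -1 -u init.scope | stop_gaps 10`. Note that the listing also contains the initrd “Stopped” messages from boot, so the biggest gap is usually just the uptime; only the gaps inside the shutdown phase say anything about a stall.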
I also found out that it’s always possible to turn on and off the green dots hiding what’s happening by pressing the Del key during boot up or shutdown.
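To make the text output the default instead of pressing a key each time, it may help to see which kernel parameters are hiding it. A small sketch, assuming the usual culprits are `quiet` and `splash=...` (the function name `quiet_params` is only for illustration):

```shell
# Read a kernel command line on stdin and print the parameters that
# suppress boot/shutdown messages, one per line.
quiet_params() {
    tr -s ' ' '\n' | grep -E '^(quiet|splash(=.*)?)$'
}
```

Run it as `quiet_params < /proc/cmdline`. Removing the reported words from GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerating the configuration with `grub2-mkconfig -o /boot/grub2/grub.cfg` (or doing the same through YaST’s boot loader module) should make the change permanent; pressing `e` in the GRUB menu and editing the linux line allows a one-off test first.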
After tons of reboots, trying different kernels and testing previous snapshots with different kernels, I found that all of these fail to shut down the computer:
4.4.162, 4.4.165, 4.4.172, 4.4.175, 4.4.176.
However, there were 3 exceptions: shutdown worked once with 4.4.175 and twice with 4.4.176.
It seemed pretty random when it worked; I haven’t been able to draw any conclusion yet.
I probably have relevant entries in the journal now, since the two successful shutdowns with 4.4.176 were not run from a previous snapshot in read-only mode. I’ll go through it later tonight.
I also tried all the USB ports. The mouse didn’t work in the USB 3.1 port, but the keyboard worked fine there. All USB 2.0 and 3.0 ports worked fine, and no port produced any extra error messages while in use, not even the 3.1 one.
During shutdown there was often a Failed message about /var/log failing to unmount. Possibly that message was always there, but I didn’t find out about the Del key until a bit into the troubleshooting, so I couldn’t see the messages during the first shutdowns. Also, many reboots/shutdowns were from older read-only snapshots, so there were no logs to look at after powering off with the button.
Yes, on this Leap 15.0 “no Btrfs” system, separate ext4 partition for ‘/var/’, I’m also seeing that, at every power-off/shutdown/reboot, the “/var” partition fails to unmount (systemd journal entries date back to December 22 last year):
# journalctl | grep -Ei 'kernel: Linux version |mount' | grep -Ei 'kernel: Linux version |/var'
This doesn’t seem to have any bad effects on a Leap 15.0 system with ext4 system partitions and XFS user partitions but, on a single-disk Leap 15.0 Laptop with the “openSUSE default partitions” (Btrfs system partition, XFS user partition), occasionally shutdown/power-off hangs due to the Btrfs “/var” sub-volume not being unmounted – reboots never seem to suffer from this issue …
Provided that the Btrfs maintenance has been executed regularly, the laptop normally shuts down OK, even with the Plymouth splash enabled …
Wading through the systemd journal on this “no Btrfs” system, I’m noticing that not only the “/var” ext4 partition regularly fails to unmount cleanly, but also the “/home” XFS partition …
Please note that the filesystem check service for each partition normally seems to be stopped after that partition has been unmounted …
Which, AFAICS, leaves only user and/or system processes that are accessing the affected partitions during system shutdown/power-off …
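To see which processes keep a partition busy, `fuser -vm /var` is the usual tool where psmisc is installed. The same idea can be sketched in plain shell by walking the per-process symlinks under /proc; `holders` and its optional second argument are hypothetical, added only so the function can be tried against a fake directory tree:

```shell
# List "PID path" for every process whose working directory or open
# file lies under the given mount point.
# $1: mount point, $2: proc root (defaults to /proc).
holders() {
    target=$1
    root=${2:-/proc}
    for link in "$root"/[0-9]*/cwd "$root"/[0-9]*/fd/*; do
        # Unreadable or vanished entries are skipped silently.
        path=$(readlink "$link" 2>/dev/null) || continue
        case $path in
            "$target"|"$target"/*)
                pid=${link#"$root"/}
                pid=${pid%%/*}
                printf '%s %s\n' "$pid" "$path" ;;
        esac
    done | sort -u
}
```

Run as root (for example `holders /var`) shortly before shutting down; without root only your own processes are visible. It only sees what is open at that moment, so a process that grabs the partition late in shutdown will not show up.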
I am wondering about “Stopped File System Check on /dev/disk/by-uuid/2acf447f-6e95-41fc-8a00-e94f9b5572c0.” Have never seen a similar line on my system.
Typical output is:
erlangen:~ # journalctl -b -1 -u systemd-fsck-root.service
-- Logs begin at Mon 2019-03-18 17:54:36 CET, end at Mon 2019-04-01 12:03:08 CEST. --
Mar 29 19:22:43 erlangen systemd[1]: Starting File System Check on /dev/disk/by-uuid/8b190950-c141-4351-9198-7a9592b4fb34...
Mar 29 19:22:43 erlangen systemd-fsck[427]: Tumbleweed: clean, 633097/2097152 files, 4860742/8388096 blocks
Mar 29 19:22:43 erlangen systemd[1]: Started File System Check on /dev/disk/by-uuid/8b190950-c141-4351-9198-7a9592b4fb34.
erlangen:~ #
erlangen:~ # journalctl -b -1| grep -i 'File System Check'
Mar 29 19:22:43 erlangen systemd[1]: Starting File System Check on /dev/disk/by-uuid/8b190950-c141-4351-9198-7a9592b4fb34...
Mar 29 19:22:43 erlangen systemd[1]: Started File System Check on /dev/disk/by-uuid/8b190950-c141-4351-9198-7a9592b4fb34.
erlangen:~ #
But there is no “Stopping File System Check …” line. It seems that during shutdown of pokemon fsck is still underway and is causing trouble. On erlangen this has never occurred in any boot since the start of the journal.
Checked the logs now: for 1.5 years there has been a failed unmount of /var/log and a stopped fsck of /home at almost every shutdown.
So this seems to be constant, but the problem with the computer not shutting down has only been here for a few weeks.
Has there been a change in how the kernel communicates with the motherboard during shutdown?
For example changes in acpi or hardware/firmware related changes?
Running an fsck that never terminates normally but gets stopped by shutdown is pointless. Checking a partition on an SSD is really fast:
erlangen:~ # time fsck -f /dev/sdb3
fsck from util-linux 2.33.1
e2fsck 1.44.5 (15-Dec-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Tumbleweed-SSD: 601129/1966080 files (0.3% non-contiguous), 4760578/7864320 blocks
real 0m2.839s
user 0m1.825s
sys 0m0.223s
erlangen:~ #
Even fscking /home on an HDD with some 3,000,000 files is done in a minute or two. The issue should be fixed anyway, e.g. by booting a rescue system and forcing an fsck of everything.
Has there been a change in how the kernel communicates with the motherboard during shutdown? For example changes in acpi or hardware/firmware related changes?
Nearly all appliances at some point start to show unexpected behavior. In most cases a reset to factory defaults fixes that. Try that with the UEFI settings.
On this Desktop, the systemd “fsck” services are as follows:
# systemctl list-units | grep -i 'fsck'
systemd-fsck-root.service loaded active exited File System Check on Root Device
systemd-fsck@dev-disk-by\x2did-ata\x2dST3500418AS_Z2AENPNG\x2dpart1.service loaded active exited File System Check on /dev/disk/by-id/ata-ST3500418AS_Z2AENPNG-part1
systemd-fsck@dev-disk-by\x2did-ata\x2dST3500418AS_Z2AENPNG\x2dpart2.service loaded active exited File System Check on /dev/disk/by-id/ata-ST3500418AS_Z2AENPNG-part2
systemd-fsck@dev-disk-by\x2did-ata\x2dST3500418AS_Z2AENPNG\x2dpart3.service loaded active exited File System Check on /dev/disk/by-id/ata-ST3500418AS_Z2AENPNG-part3
systemd-fsck@dev-disk-by\x2did-ata\x2dWDC_WD10EZEX\x2d60M2NA0_WD\x2dWCC3F5AYCJL7\x2dpart1.service loaded active exited File System Check on /dev/disk/by-id/ata-WDC_WD10EZEX-60M2NA0_WD-WCC3F5AYCJL7-part1
systemd-fsck@dev-disk-by\x2did-ata\x2dWDC_WD10EZEX\x2d60M2NA0_WD\x2dWCC3F5AYCJL7\x2dpart2.service loaded active exited File System Check on /dev/disk/by-id/ata-WDC_WD10EZEX-60M2NA0_WD-WCC3F5AYCJL7-part2
system-systemd\x2dfsck.slice loaded active active system-systemd\x2dfsck.slice
#
Only the “root” fsck service has a status indicating that it has been executed …
The “systemd-fsck-root.service” (also “systemd-fsck@.service”) man page indicates that:
systemd-fsck does not know any details about specific filesystems, and simply executes file system checkers
specific to each filesystem type (/sbin/fsck.*). This helper will decide if the filesystem should actually be
checked based on the time since last check, number of mounts, unclean unmount, etc.
Presumably, if the file system is Btrfs then, the correct File System Check will be invoked – for Btrfs File Systems, the correct “fsck” command is:
# btrfs check [options] <device>
Further details are available from the “btrfsck” man page …
Bottom line:
Yes, manually check the system’s filesystems – from the systemd “rescue mode”.
For the case of any XFS filesystem, please read the “xfs_repair” man page – please note that, this is not usually needed – XFS filesystems check themselves when they’re mounted …
/home is XFS.
According to man fsck.xfs it should exit directly. https://linux.die.net/man/8/fsck.xfs
Probably everything is all right and the log is just confusing us a bit by saying it’s stopping fsck on an XFS partition?
However, I might run xfs_check (or xfs_repair -n, which replaces it in newer xfsprogs) on it while unmounted later, just to be sure it’s OK.
But I don’t think this is causing the inability to shut off the computer. Whatever is blocking the shutdown runs after logging is turned off, so it must be one of the last steps of a shutdown, at a point where no more logging is expected.
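Since whatever blocks runs after the journal is gone, one option is to capture the kernel log from a late shutdown hook. The sketch below follows the approach described, as far as I can tell, in systemd’s debugging documentation (an executable script in /usr/lib/systemd/system-shutdown/ that dumps dmesg to /shutdown-log.txt); the wrapper function `install_shutdown_hook` is hypothetical, and the directory should be verified on the system in question:

```shell
# Write a late-shutdown hook that saves the kernel log to /shutdown-log.txt.
# $1: target directory (defaults to systemd's system-shutdown directory).
install_shutdown_hook() {
    dir=${1:-/usr/lib/systemd/system-shutdown}
    mkdir -p "$dir" || return 1
    cat > "$dir/debug.sh" <<'EOF'
#!/bin/sh
# Runs late in shutdown, after the journal has stopped.
mount -o remount,rw /
dmesg > /shutdown-log.txt
mount -o remount,ro /
EOF
    chmod +x "$dir/debug.sh"
}
```

Run `install_shutdown_hook` once as root, then boot with `systemd.log_level=debug systemd.log_target=kmsg log_buf_len=1M` added to the kernel command line so that systemd’s own messages land in the kernel log. If the hang happens after this hook (for example in the final ACPI power-off), the file will at least show that userspace finished cleanly.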