I had a failure of ending Session 2 of my user and a 1:29 wait. The resulting shutdown.txt file is 8400 lines of text.
Using the journalctl command I get:
**RYZEN9:~ #** journalctl -b -1 -u *.target
-- Logs begin at Sat 2021-07-17 14:12:25 CDT, end at Fri 2021-07-23 13:02:42 CDT. --
-- No entries --
which makes no sense since the journal file is 56MB in size. This is the complete path the system made for the journal after I created the directory that was specified:
The test one I created on this system (ref post #19) was 7909 lines.
Open the file in a text editor and go to the last line, although the file is large it’s quite easy to quickly scroll back watching the times at the beginning of each line to find the point at which the shutdown pauses. Remember, you’re looking for a 90 second gap, so it should be fairly obvious, even when scrolling quickly.
What was Session 2 of your user doing? Perhaps you should have terminated that first before closing the system down?
All files within the journal directory are in machine language and not viewable in a text editor.
I never open a second session under my user. It is something that the system is doing on it’s own. I thought that it had something to so with using “su”, but I did not use “su” last night and still got the Session 2 shutdown delay.
The shutdown.txt file content has no time stamps. I’m currently looking at the journalctl content for clues.
I did not use a “su” terminal last night that I remember, but I always use Yast Online Update immediately after every boot.
This may only address half of the problem. If I get a shutdown delay for my user (not Session 2), it is usually after I have been accessing my larger hard drives, even if I shut down 30 minutes after gkrellm shows that any file moves have completed. My primary concern with that is that twice now I have lost the mappings to a large drive upon shutdown and have to manually re-create the Luks mapping after the next boot. My primary concern is possible data loss if the drive is being forcibly shut down. The drives in question are several Terabytes in size.
The next time I get a shutdown error that is not Session 2, I will go back and search for the cause.
I was referring to “shutdown-log.txt”, not the journal.
I never open a second session under my user. It is something that the system is doing on it’s own. I thought that it had something to so with using “su”, but I did not use “su” last night and still got the Session 2 shutdown delay.
OK - Then the shutdown-log could contain the answers…
The shutdown.txt file content has no time stamps. I’m currently looking at the journalctl content for clues.
Really ?? - Then something is very amiss in it’s creation. Again from the test one I created on this system (ref post #19):
paul@Orion-15:~$ cd /
paul@Orion-15:/$ ls
bin dev home lib64 mnt proc run shutdown-log.txt sys usr
boot etc lib lost+found opt root sbin srv tmp var
paul@Orion-15:/$ tail -n 20 shutdown-log.txt
201.639555] systemd-shutdown[1]: Detaching loop devices.
201.639572] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
201.639599] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
201.639629] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
201.639683] systemd-shutdown[1]: All loop devices detached.
201.639686] systemd-shutdown[1]: Stopping MD devices.
201.639692] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
201.639705] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
201.639727] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
201.639766] systemd-shutdown[1]: All MD devices stopped.
201.639770] systemd-shutdown[1]: Detaching DM devices.
201.639776] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
201.639788] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
201.639808] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
201.639846] systemd-shutdown[1]: All DM devices detached.
201.639851] systemd-shutdown[1]: All filesystems, swaps, loop devices, MD devices and DM devices detached.
201.640131] systemd-shutdown[1]: Successfully forked off '(sd-executor)' as PID 1593.
201.662719] [1593]: Successfully forked off '(direxec)' as PID 1594.
201.663171] [1593]: Successfully forked off '(direxec)' as PID 1595.
201.853498] EXT4-fs (sdc1): re-mounted. Opts: acl,user_xattr. Quota mode: none.
paul@Orion-15:/$
I guess it was my mistake. I do not see time stamps. I would call those index numbers and have no idea how to interpret “minutes” and “seconds”. I do note a difference between your output and mine. Mine was never able to finalize the final DM device and apparently forced it.
3238.254689] systemd-shutdown[1]: All filesystems unmounted.
3238.254691] systemd-shutdown[1]: Deactivating swaps.
3238.254715] systemd-shutdown[1]: All swaps deactivated.
3238.254717] systemd-shutdown[1]: Detaching loop devices.
3238.254726] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
3238.254742] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
3238.254766] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
3238.254816] systemd-shutdown[1]: All loop devices detached.
3238.254817] systemd-shutdown[1]: Detaching DM devices.
3238.254820] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
3238.254825] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
3238.254837] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
3238.254987] systemd-shutdown[1]: Not all DM devices detached, 1 left.
3238.255104] systemd-shutdown[1]: Detaching DM devices.
3238.255107] systemd-shutdown[1]: sd-device-enumerator: Scan all dirs
3238.255113] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/bus
3238.255126] systemd-shutdown[1]: sd-device-enumerator: Scanning /sys/class
3238.255249] systemd-shutdown[1]: Not all DM devices detached, 1 left.
3238.255251] systemd-shutdown[1]: Cannot finalize remaining DM devices, continuing.
3238.255395] systemd-shutdown[1]: Successfully forked off '(sd-executor)' as PID 11207.
3238.256992] [11207]: Successfully forked off '(direxec)' as PID 11208.
3238.257147] [11207]: Successfully forked off '(direxec)' as PID 11209.
3238.257296] [11207]: Successfully forked off '(direxec)' as PID 11210.
3238.263967] EXT4-fs (dm-0): re-mounted. Opts: (null)
I would note that my / is encrypted, which may be the DM that couldn’t be finalized because it was being written to.
OK. Time stamp or index number, you need to look further back in the shutdown-log.txt to find the point at which the 90 sec timeout begins, look at what is happening in the log for several lines immediately prior to that for any possible clues.
The only pre-timeout failure is the package “pidgin” and it is always the same:
3145.197645] systemd[1]: Received SIGCHLD from PID 3606 (pidgin).
3145.197658] systemd[1]: Child 3606 (pidgin) died (code=exited, status=1/FAILURE)
3145.197688] systemd[1]: session-2.scope: Failed to read oom_kill field of memory.events cgroup attribute: No such file or directory
3145.197690] systemd[1]: session-2.scope: Child 3606 belongs to session-2.scope.
The code shows that the process died, so I am not sure of it could be causing the timeout. I also do not know why pidgin is under Session-2, unless pidgin is spawning Session-2 if I open an active chat window.
I will verify the theory shortly. I do not run pidgin as SU, but I do have it set to autostart with KDE upon login.
I have saved 3 of the shutdown.txt files and they all show the same at the end of the time-out. It is basically the same information as the journalctl command gave me:
From failure 1:
3235.494506] systemd[1]: session-2.scope: Stopping timed out. Killing.
3235.494779] systemd[1]: session-2.scope: Killing process 3937 (kdesud) with signal SIGKILL.
3235.494831] systemd[1]: session-2.scope: Failed with result 'timeout'.
3235.494838] systemd[1]: session-2.scope changed stop-sigterm -> failed
From failure 2:
1871.515893] systemd[1]: session-2.scope: Stopping timed out. Killing.
1871.516037] systemd[1]: session-2.scope: Killing process 3833 (kdesud) with signal SIGKILL.
1871.516094] systemd[1]: session-2.scope: Failed with result 'timeout'.
1871.516103] systemd[1]: session-2.scope changed stop-sigterm -> failed
From failure 3:
6451.547019] systemd[1]: systemd-timesyncd.service: Got notification message from PID 1364 (WATCHDOG=1)
6472.546754] systemd[1]: session-2.scope: Stopping timed out. Killing.
6472.546901] systemd[1]: session-2.scope: Killing process 2894 (kdesud) with signal SIGKILL.
6472.546961] systemd[1]: session-2.scope: Failed with result 'timeout'.
6472.546969] systemd[1]: session-2.scope changed stop-sigterm -> failed
It seems that kdesud may be a problem. My question is, what is causing Session-2 of my user to start in the first place? On the third failure, the only package I ran as root was Yast2.Online Update. It was opened by the system icon and not by a terminal. That is the only time I used the admin password during that session.
On my machine there is a unit session-2.scope too:
**erlangen:~ #** systemctl list-units --type scope
UNIT LOAD ACTIVE SUB DESCRIPTION
init.scope loaded active running System and Service Manager
session-2.scope loaded active running Session 2 of user karl
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
**2 loaded units listed.** Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
**erlangen:~ #**
In the first place it’s me who is creating this unit by logging in.
I was not aware. Mine shows the same immediately upon login. I ran the systemctl command as a regular user, but the output was identical.
The question remains. What is causing the 90 second pause? “pidgin” is reported to die at the beginning of the pause. “kdesud” is killed immediately at the end of the pause / timeout.
I can confirm that in my last session, I did not open a chat window with pidgin and although pidgin is always running in the background, shutdown was normal. I am chatting via pidgin now. Once done, I will close the chat window, and then exit pidgin before shutdown. Perhaps shutdown will be normal if pidgin is no longer running in the background during shutdown. Possibly there is a bug in pidgin, but I would not know what to report, other than the slow shutdown.
I am installing psi+ and will try it as an alternative to pidgin. I will also stop pidgin from starting at login.
Pidgin always started upon login by choice. The process only caused problems during the sessions that I actively opened a pidgin chat window and initiated or responded to a chat. Those are the times when there was a delay during shutdown. If I never opened a chat window, pidgin ran only in the background and shutdown was normal.
I have replaced pidgin with another XMPP/Jabber client and my shutdowns have been normal so far, even after active chats.
I had another timeout on my last shutdown. It seems to be related to the second shutdown problem that I had mentioned. If I am interpreting it correctly, the system is having problems releasing my two large encrypted luks drives. The code is fairly long and I apologize, but I wanted to include all from the time the system called for release of the encrypted drives.
857.473620] systemd-cryptsetup[5635]: Deactivating volume 04T.
857.473625] systemd-cryptsetup[5635]: dm versions opencount flush ] [16384] (*1)
857.473630] systemd-cryptsetup[5635]: dm status 04T opencount noflush ] [16384] (*1)
857.473635] systemd-cryptsetup[5635]: dm versions opencount flush ] [16384] (*1)
857.473638] systemd-cryptsetup[5635]: dm table 04T opencount flush securedata ] [16384] (*1)
857.473647] systemd-cryptsetup[5635]: Trying to open and read device /dev/sdc1 with direct-io.
857.473839] systemd-cryptsetup[5635]: dm versions opencount flush ] [16384] (*1)
857.473852] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) created
857.473855] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) incremented to 1
857.473858] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) incremented to 2
857.473862] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) assigned to REMOVE task(2) with flags DISABLE_LIBRARY_FALLBACK (0x20)
857.473865] systemd-cryptsetup[5635]: dm remove 04T opencount flush retryremove ] [16384] (*1)
857.473868] systemd-cryptsetup[5635]: device-mapper: remove ioctl on 04T failed: Device or resource busy
857.475015] systemd-cryptsetup[5645]: Allocating crypt device context by device 10T.
857.475022] systemd-cryptsetup[5645]: Initialising device-mapper backend library.
857.475097] systemd-cryptsetup[5645]: dm version opencount flush ] [16384] (*1)
857.475106] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.475113] systemd-cryptsetup[5645]: Detected dm-ioctl version 4.42.0.
857.475115] systemd-cryptsetup[5645]: Detected dm-crypt version 1.22.0.
857.475129] systemd-cryptsetup[5645]: Device-mapper backend running with UDEV support enabled.
857.475134] systemd-cryptsetup[5645]: dm status 10T opencount noflush ] [16384] (*1)
857.475142] systemd-cryptsetup[5645]: Releasing device-mapper backend.
857.475167] systemd-cryptsetup[5645]: Trying to open and read device /dev/sdb1 with direct-io.
857.488119] systemd-journald[1205]: Successfully sent stream file descriptor to service manager.
857.489356] systemd-cryptsetup[5645]: Allocating context for crypt device /dev/sdb1.
857.489362] systemd-cryptsetup[5645]: Trying to open and read device /dev/sdb1 with direct-io.
857.489599] systemd-cryptsetup[5645]: Initialising device-mapper backend library.
857.489618] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.489639] systemd-cryptsetup[5645]: dm table 10T opencount flush securedata ] [16384] (*1)
857.489668] systemd-cryptsetup[5645]: Trying to open and read device /dev/sdb1 with direct-io.
857.489843] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.489853] systemd-cryptsetup[5645]: dm deps 10T opencount flush ] [16384] (*1)
857.489882] systemd-cryptsetup[5645]: Crypto backend (OpenSSL 1.1.1d 10 Sep 2019) initialized in cryptsetup library version 2.3.4.
857.489886] systemd-cryptsetup[5645]: Detected kernel Linux 5.3.18-59.16-default x86_64.
857.490109] systemd-cryptsetup[5645]: PBKDF pbkdf2-sha256, time_ms 2000 (iterations 0).
857.490113] systemd-cryptsetup[5645]: Reading LUKS header of size 1024 from device /dev/sdb1
857.490278] systemd-cryptsetup[5645]: Key length 32, device size 19532871647 sectors, header size 2050 sectors.
857.490285] systemd-cryptsetup[5645]: Deactivating volume 10T.
857.490294] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.490305] systemd-cryptsetup[5645]: dm status 10T opencount noflush ] [16384] (*1)
857.490315] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.490323] systemd-cryptsetup[5645]: dm table 10T opencount flush securedata ] [16384] (*1)
857.490342] systemd-cryptsetup[5645]: Trying to open and read device /dev/sdb1 with direct-io.
857.490771] systemd-cryptsetup[5645]: dm versions opencount flush ] [16384] (*1)
857.490798] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) created
857.490804] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) incremented to 1
857.490812] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) incremented to 2
857.490819] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) assigned to REMOVE task(2) with flags DISABLE_LIBRARY_FALLBACK (0x20)
857.490826] systemd-cryptsetup[5645]: dm remove 10T opencount flush retryremove ] [16384] (*1)
857.512600] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) decremented to 0
857.512604] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) waiting for zero
857.512608] systemd-cryptsetup[5645]: Udev cookie 0xd4deeea (semid 16) destroyed
857.512613] systemd-cryptsetup[5645]: Releasing crypt device /dev/sdb1 context.
857.512615] systemd-cryptsetup[5645]: Releasing device-mapper backend.
857.512618] systemd-cryptsetup[5645]: Closing read only fd for /dev/sdb1.
857.673954] systemd-cryptsetup[5635]: dm remove 04T opencount flush retryremove ] [16384] (*2)
857.696580] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) decremented to 0
857.696583] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) waiting for zero
857.696588] systemd-cryptsetup[5635]: Udev cookie 0xd4d04e1 (semid 15) destroyed
857.696599] systemd-cryptsetup[5635]: Releasing crypt device /dev/sdc1 context.
857.696601] systemd-cryptsetup[5635]: Releasing device-mapper backend.
857.696606] systemd-cryptsetup[5635]: Closing read only fd for /dev/sdc1.
858.917164] systemd-journald[1205]: Successfully sent stream file descriptor to service manager.
858.919642] systemd-journald[1205]: Successfully sent stream file descriptor to service manager.
887.494395] systemd[1]: systemd-logind.service: Got notification message from PID 1726 (WATCHDOG=1)
947.493625] systemd[1]: session-2.scope: Stopping timed out. Killing.
947.493892] systemd[1]: session-2.scope: Killing process 3449 (kdesud) with signal SIGKILL.
947.493944] systemd[1]: session-2.scope: Failed with result 'timeout'.
947.493951] systemd[1]: session-2.scope changed stop-sigterm -> failed
All file managers and programs that were accessing the drives were closed before I initiated shutdown. This takes me back to my hypothesis that lagging writes from disk caching may be part of the problem. I have 64GB of RAM and a large part of it is used by disk caching for those two drives. What might I do to further troubleshoot this?
All file managers and programs that were accessing the drives were closed before I initiated shutdown. This takes me back to my hypothesis that lagging writes from disk caching may be part of the problem. I have 64GB of RAM and a large part of it is used by disk caching for those two drives. What might I do to further troubleshoot this?
Analyzing wild hypotheses has no merit. Rely on the system’s tools for analysis, such as ‘systemd-cgls -u user-1000.slice’ for user processes and ‘systemd-cgls -u system.slice’.
If I knew all the tools for diagnostics and troubleshooting I would not need this forum. I was simply throwing out an idea based on the fact that it seems to occur after I move large files, my RAM maxes out because of disk cache, and remains maxed out even 30 minutes after the file transfers are done. I suppose I can keep random observations to myself and post only output of commands that you specify.
Analyzing wild hypotheses has no merit. Rely on the system’s tools for analysis, such as ‘systemd-cgls -u user-1000.slice’ for user processes and ‘systemd-cgls -u system.slice’.
I assume you mean I need to run both tools every time before shutdown? It may be a week or more before this error is duplicated.