Bad shutdown of Mysql/mariadb by systemd innodb corruption (log sequence numbers complaint)

taistoqvist · August 27, 2015, 4:55pm

Hi folks,

My mariadb isnt being shutdown correctly by my systemd scripts. Running a tail -f on my mysqld.error shows a clean shutdown when I manually run “kill -SIGTERM <pid>”.

But when I let systemd handle it, I get no printouts in mysqld.error indicating SIGTERM, and worse, on every startup directly after, I get:

150827 16:32:58 [Note] InnoDB: The log sequence numbers 506614450 and 506614450 in ibdata files do not match the log sequence number 506614460 in the ib_logfiles!
150827 16:32:58 [Note] InnoDB: Database was not shutdown normally!
150827 16:32:58 [Note] InnoDB: Starting crash recovery.
150827 16:32:58 [Note] InnoDB: Reading tablespace information from the .ibd files…
150827 16:32:58 [Note] InnoDB: Restoring possible half-written data pages

If I manually terminate with kill, I do NOT get these corruptions.

I’ve tried modifying my mysql.service

[Service]
Restart=on-failure
Type=simple
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper install default
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper upgrade default
ExecStart=/usr/lib/mysql/mysql-systemd-helper start default
ExecStartPost=/usr/lib/mysql/mysql-systemd-helper wait default
ExecStop=/usr/bin/kill -SIGTERM $MAINPID
PIDFile=/var/lib/mysql/mysqld.pid
KillMode=mixed
KillSignal=SIGTERM

Changing Killmode from default, to process, to mixed, but it only gets worse, never better.
Above shows my last attempt, when I manually call kill with sigterm before systemd starts its job but STILL mysqld doesnt shutdown properly!

I also updated /etc/systemd/system.conf to increase logging from systemd, but it doesnt really give me much.

Any help/ideas? I am running out of new things to try…

mariadb-10.0.20-2.12.1.x86_64
Linux lxwww 3.16.7-24-default #1 SMP Mon Aug 3 14:37:06 UTC 2015 (ec183cc) x86_64 x86_64 x86_64 GNU/Linux
openSUSE 13.2 (x86_64)

Best Regards
Taisto Qvist

gogalthorp · August 27, 2015, 5:13pm

Shutting down a DB can take some time and systemd tends to do things in parallel so you must set systemd to wait until the DB is shutdown before proceeding. SO the DB must be on the critical path.

john_hudson · August 28, 2015, 12:06am

I’m not clear; are you saying this happens when you issue

mysql.service stop

What is the script that is causing problems?

gogalthorp · August 28, 2015, 1:51am

The shut down sends stop to the DB but a db takes time to shut down so the shutdown process needs to wait until the db process is done. How to accomplish that well get the old systemd manuals out and start reading. I don’t know systemd but I do know how to read the symptoms

tsu2 · August 28, 2015, 8:49am

@taistoqvist
I think gogalthorpe’s observation supposes that you are running a system shutdown immediately after your mysql service shutdown which based on my read is not necessarily the case.

So, some clarification may be required here…
If as you say the systemd mysql service isn’t shutting down the mysql app properly and completely, then how do you know this(did you query the Pid?) and what is happening immediately afterwards? Is something happening after the service is supposed to shut down that is the actual cause terminating the mysql app improperly?

If you really are doing a system shutdown and expecting the system shutdown would properly shutdown the database app… Well, from what I’ve seen I wouldn’t expect that because I’ve seen all my openSUSE kill processes ungracefully during a system shutdown.

If you’re doing a system shutdown, I’d recommend that you simply ensure your database app has shutdown, maybe he following will do it (needs testing)

systemctl stop mysql.service && shutdown -h now

TSU

taistoqvist · August 28, 2015, 10:51am

Hi again, and lets see if I can add more info.

I’m never talking about shutting down the operating system. (even though this is how I noticed I got db corruption).

I am simply turning on and of the mysql server, with standard linux scripts. Either “systemctl stop mysql.service”, or rcmysql stop, but that eventually ends up using systemd scripts anyway.

So there is nothing privatly hacked here for turning off mysqld. I initially found this on a VM I was just playing around with and thought that my OS updates had messed something up with mysql-versions or whatever.
But yesterday I found this same behavior on a clone of a produktion webserver, where I certainly have not been adding any clever tweaks to mysql or system configuration for startup and shutdown of services. Its standard OpenSuse 13.2 scripts.

So. Two versions.

1a) I start mysqld with systemctl start mysql.service. MySQLd starts up. wait a while.
2a) I stop db wth systemctl stop mysql.service. No info is written to /var/lib/mysql/mysql.error about mysqld shutting down. Wait awhile.
3a) I start mysqld with systemctl start mysql.service. MySQLd starts up, and complains on database corruption.

A few said “you have to wait for db to shutdown”, but I am using standard systemd config, which if I have understood it correctly from the man page of systemd.kill and /etc/systemd/system.conf, will wait TimeoutStopSec between sending SIGTERM och SIGKILL to the process. Which defaults to 90 seconds.

In all scenarios, wether using kill och systemctl mysql stops and starts in a second. The dbs hosted are microscopic.

Version 2.

And version 2.
1b) I start mysqld with systemctl start mysql.service. MySQLd starts up. wait a while.
2b) I kill mysqld with pkill -SIGTERM mysqld, or kill -SIGTERM <mysqld-pid>. mysql.error says:
150828 10:12:16 [Note] Event Scheduler: Purging the queue. 0 events
150828 10:12:16 [Note] InnoDB: FTS optimize thread exiting.
150828 10:12:16 [Note] InnoDB: Starting shutdown…
150828 10:12:18 [Note] InnoDB: Shutdown completed; log sequence number 506614580
150828 10:12:18 [Note] /usr/sbin/mysqld: Shutdown complete
3b) I start mysqld with systemctl start mysql.service. MySQLd starts up WITHOUT complaining on database corruption.

After reading manpage for systemd.kill, I started fiddling with ExecStop and KillMode and KillSignal but to no avail. Unless I completely misunderstand the manpage, the default mysql.service [service]-section:

[Service]
Restart=on-failure
Type=simple
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper install default
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper upgrade default
ExecStart=/usr/lib/mysql/mysql-systemd-helper start default
ExecStartPost=/usr/lib/mysql/mysql-systemd-helper wait default

Implies to me that all the processes started (since KillMode would be default=control-group) would receive SIGTERM on shutdown, and after 90 seconds SIGKILL if they havent terminated.
Since mysqld is started via a the helper-wrapper script, I thought that it would help if I added the PIDFile param, to indicate the actual mysqld pid
that should receive SIGTERM, and also that maybe I should set KillMode to “process” instead, since when I am manually using kill I only send SIGTERM to the lead parent mysqld process and that works fine.

But it didnt help, I think I got minor changes in behavior, but did start working properly. I’ll keep working on this and fill in more details if needed today.

Many thanks for the feedback so far!

taistoqvist · August 28, 2015, 2:31pm

Okey, so I’ve tested a few more configurations of the mysql.service file that systemd uses to decide how to start and stop the database.

[Service]
Restart=on-failure
Type=simple
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper install default
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper upgrade default
ExecStart=/usr/lib/mysql/mysql-systemd-helper start default
ExecStartPost=/usr/lib/mysql/mysql-systemd-helper wait default
PIDFile=/var/lib/mysql/mysqld.pid
[FONT=courier new]ExecStop=<cmd to kill mysqld>
KillSignal=SIGTERM.
KillMode=process | none | mixed | process-group
[/FONT]
So, the indented keywords are not in the default file. Their purpose, afaik, is

PIDFile= Normally systemd guesses which pid is the “mainpid”, and will kill mainpid(and childs) depending on the setting of KillMode.

Standard setup, systemd will guess that mainpid is the pid of the script “mysqld-system-helper start”, which actually spawns the mysqld server without “&”, so it sticks around until mysqld is terminated.

Default, **KillMode **is set to “control-group”, which should cause systemd to send SIGTERM to ALL started processes, wait a long time, and then send SIGKILL if things dont terminate. But they do terminate, in a second or two. But mysql.error log does NOT indicate that innodb was shutdown properly.

I DO actually get a " /usr/sbin/mysqld: Normal shutdown" in the mysql.error, but not the same output as when I manually use “kill -SIGTERM <pid>”.

What I have to do, to make it work (that is, NOT to have mysql complain on corrupt innodb tables during startup), is to set ExecStop to a command that manually sends SIGTERM to the mysqld process.

If I do this I normally also have to set KillMode= to none.
This because if you for some reason actually want to terminate mysqld with kill/pkill, then systemd will ALSO(when noticing that mysql terminated) run the configured ExecStop command itself(why?). If this ExecStop-command just runs “kill -SIGTERM <PID>” and that pid has already terminated the kill command fails, making systemd believe that something is wrong with the service and tries to restart it again. Sure, special case by still.

If I do this, mysqld stops properly. I can - if I want for some unknown reason - still have KillMode set to “process” if I want, but then I have to remove the PidFile setting so that systemd will guess that the main pid is the script that started mysqld, and send its stop-signal there. If PidFile is still set to the true mysqld-parent-process pid, systemd will stop it in an way that WILL corrupt my innodb…which really makes me wonder about the signal systemd is sending. Turning on debugging on systemd doesnt indicate which.

So the strange thing here is that when I tweak the above ExecStop=<manual kill cmd> + KillMode=none, I get one SIGTERM sent to the parent mysqld process, and nothing more, which gives my a clean shutdown…

And this result - *unless I am STILL missing something from reading man-pages on systemd.{kill|service|unit} *- is exacty what I ***should ***be getting if i just modified the default mysql.service at the top to:

[Service]
Restart=on-failure
Type=simple
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper install default
ExecStartPre=/usr/lib/mysql/mysql-systemd-helper upgrade default
ExecStart=/usr/lib/mysql/mysql-systemd-helper start default
ExecStartPost=/usr/lib/mysql/mysql-systemd-helper wait default
PIDFile=/var/lib/mysql/mysqld.pid // Send signal to mysqld, not to helper-script.
KillMode=process // Only send signal to parent process not to ALL.

I even added KillSignal to force the signal to be SIGTERM, which already is the default…but the above WILL give me a corrupt innodb.

My only conclusion is that systemd is NOT sending SIGTERM, but SIGKILL?? But that just doesnt make sense >:(

TQ

kstreitova · September 2, 2015, 5:53pm

You are right! The following patch should fix this problem:

Index: mysql-systemd-helper
===================================================================
--- mysql-systemd-helper        (revision 41)
+++ mysql-systemd-helper        (working copy)
@@ -144,7 +144,7 @@
 }
 
 mysql_start() {
-       /usr/sbin/mysqld --user="$mysql_daemon_user" $opts 
+       exec /usr/sbin/mysqld --user="$mysql_daemon_user" $opts 
 }
 
 # We rely on output in english at some points

The fix will be available in openSUSE 13.1 & 13.2 in the next mysql/mariadb update.