systemd - no effect change service start timeout configuration

ceinma · July 13, 2015, 4:45pm

Hi ,

openSUSE 13.2 (Harlequin) (x86_64)

systemctl --version

systemd 210
+PAM -LIBWRAP +AUDIT +SELINUX -IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ +SECCOMP +APPARMOR

I’ve a situation where I start a MySQL database and it’s always need execute the recovery process , so take few seconds/minutes to finish.
(this is expected , since I’m starting a backup from snapshot).

This always works before, but few days ago , become failing constantly ( I execute this 1 time per day)
After investigating , notice the recovery of mysql is taking +1 minute , but systemctl start is keeping they timeout fixed in 30 seconds, then if the service doesn’t start in this time it’s kill the process and put in failed .

I’m pretty sure about that because the mysql log shows it’s stop during recovery process and there’s no PID of mysqld anymore.
if start manually the MySQL manually , works fine…

I’ve try few parameters at mysql.unit where systemd was recognized , but there no effect when I start the service…
What I missing here ?


# systemctl cat mysql
# /usr/lib/systemd/system/mysql.service
[Unit]
Description=MySQL server
Wants=basic.target
After=basic.target network.target
# by Cesar
JobTimeoutSec=120s


[Service]
Type=forking
ExecStart=/usr/lib/mysql/rcmysql start
ExecStop=/usr/lib/mysql/rcmysql stop
# by Cesar
TimeoutStartSec=95s


[Install]
WantedBy=multi-user.target




# systemctl show mysql | grep -i time
TimeoutStartUSec=1min 35s
TimeoutStopUSec=1min 30s
WatchdogTimestampMonotonic=0
ExecMainStartTimestamp=Mon 2015-07-13 10:20:21 BRT
ExecMainStartTimestampMonotonic=844288186352
ExecMainExitTimestamp=Mon 2015-07-13 10:20:51 BRT
ExecMainExitTimestampMonotonic=844318601781
ExecStart={ path=/usr/lib/mysql/rcmysql ; argv]=/usr/lib/mysql/rcmysql start ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStop={ path=/usr/lib/mysql/rcmysql ; argv]=/usr/lib/mysql/rcmysql stop ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
LimitRTTIME=18446744073709551615
TimerSlackNSec=50000
InactiveExitTimestamp=Mon 2015-07-13 10:20:19 BRT
InactiveExitTimestampMonotonic=844286075868
ActiveEnterTimestamp=Mon 2015-07-13 10:20:21 BRT
ActiveEnterTimestampMonotonic=844288186394
ActiveExitTimestamp=Mon 2015-07-13 10:20:51 BRT
ActiveExitTimestampMonotonic=844317785447
InactiveEnterTimestamp=Mon 2015-07-13 10:20:51 BRT
InactiveEnterTimestampMonotonic=844318642487
JobTimeoutUSec=2min
ConditionTimestamp=Mon 2015-07-13 10:20:19 BRT
ConditionTimestampMonotonic=844286006583


# time systemctl start mysql
Job for mysql.service failed. See "systemctl status mysql.service" and "journalctl -xn" for details.

real    0m32.736s
user    0m0.002s
sys     0m0.003s

Miuku · July 13, 2015, 5:13pm

Could you post the output of:

mysql --version

Also

zypper sl -d

Wondering if you updated your MySQL/MariaDB to the one in the repo which is currently pretty fubar.

tsu2 · July 13, 2015, 5:34pm

Maybe you should set a longer value, eg 3min ? According to the MAN pages, the service fails if your timeout isn’t configured long enough and 95sec might be enough only for a very small database. If you database grows, you may want to consider a much larger number.

TSU

http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=systemd.service
From the MAN pages

 TimeoutStartSec=           
           Configures the time to wait for start-up. If a daemon service does
           not signal start-up completion within the configured time, the
           service will be considered failed and will be shut down again.
           Takes a unit-less value in seconds, or a time span value such as
           "5min 20s". Pass "0" to disable the timeout logic. Defaults to
           DefaultTimeoutStartSec= from the manager configuration file, except
           when Type=oneshot is used, in which case the timeout is disabled by
            default (see [systemd-system.conf(5)](http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=systemd-system.conf&ampsect=5)).

ceinma · July 13, 2015, 7:15pm

Hi TSU,

I already try before , this values I show was just other test…
At first time I’ve set I use “3min 30s”… and got same effect.
Always the systemd “abort” the start of the service in 30 seconds…

tsu2:

Maybe you should set a longer value, eg 3min ? According to the MAN pages, the service fails if your timeout isn’t configured long enough and 95sec might be enough only for a very small database. If you database grows, you may want to consider a much larger number.

TSU

http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=systemd.service
From the MAN pages
 TimeoutStartSec=           
           Configures the time to wait for start-up. If a daemon service does
           not signal start-up completion within the configured time, the
           service will be considered failed and will be shut down again.
           Takes a unit-less value in seconds, or a time span value such as
           "5min 20s". Pass "0" to disable the timeout logic. Defaults to
           DefaultTimeoutStartSec= from the manager configuration file, except
           when Type=oneshot is used, in which case the timeout is disabled by
            default (see [systemd-system.conf(5)](http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=systemd-system.conf&ampsect=5)).

ceinma · July 13, 2015, 7:38pm

TSU , just for sure I try again.
This time I set to 15 and 20 minutes… keeps aborting the start after 30 seconds.

tsu2 · July 13, 2015, 8:37pm

Am wondering if your two custom commands might be conflicting with each other. Have you tried running only one or the other?

JobTimeoutSec
TimeoutStartSec

Also, don’t forget to run the following after any modifications to any Service Unit files else your changes won’t take (forgetting this leads to a lot of lost time)

systemctl daemon-reload

TSU

ceinma · July 13, 2015, 9:24pm

Yes, I try one first , the other…the both… then others (which isn’t in this examples)…

and yes to daemon-reload , I always execute it and check at journalctl if give some syntax error or invalid parameter…

tsu2 · July 15, 2015, 2:32am

Without knowing why your systemd setting isn’t working,

Maybe you should consider splitting out the backup creation from your existing job (ie create 2 smaller jobs from the one large job). Since the backup creation is likely the only thing causing the job to run exceedingly long, when you actually try to startup the mysql service the default startup values should not be exceeded.

It’s probably better practice anyway to configure multiple small jobs rather than one large job anyway, it should also run more efficiently.

TSU

ceinma · July 16, 2015, 1:12pm

Hi Tsu ,

There is no way to split the job… since this is a snapshot and not a regular backup with incremental data.

Is there some way to debug the systemctl command ?

someting like verbose mode… : systemctl -v start mysql

Miuku · July 16, 2015, 2:39pm

As I understand; you physically copy /var/lib/mysql/* database files back, then restart MariaDB in order for it to go through recovery cycle.

Stupid question;
Why don’t you simply mysqldump with routines and then restore it as opposed to physically copying data? I don’t really see the benefit in restoring the files physically.

ceinma · July 16, 2015, 3:34pm

Hi Miuku ,

Don’t worry , isn’t stupid question since I don’t explain all context before. (my focus here is only systemctl , not mysql db)
This database run a Zabbix Server (IT services monitor) , so it have high volume of transactions by second (collecting data from hosts) and not should have any fault or performance issue.
To run mysqldump I need lock the database to keep the integrity of the data and it affect considerable the performance of MySQL and the server .
I need keep simple and reduce any overhead to backup all data and avoid affect the server performance (or consequently it affect the Zabbix monitor and trigger false alarms to all IT staff)

The best solution I got so far is lock the MySQL database , create a LVM snapshot of FS , release the database , this took 5 seconds top.
Then later I use a rsync to copy from the snapshot only changed data of db files to other server.
At this server I start the database (where I got the problem with systemctl), run the data check (this way I avoid any overhead at production and if get some problem then I act at our mysql production database) , then run a mysqldump… and at finish I keep a backup do db files + mysqldump…

(other solution which I don’t try yet is active a mysql replication… but now isn’t the moment)

Now, with the context explained… people , please , let keep the focus at systemctl… it have a problem , not my MySQL database…

tsu2 · July 16, 2015, 5:59pm

ceinma:

Hi Miuku ,

Don’t worry , isn’t stupid question since I don’t explain all context before. (my focus here is only systemctl , not mysql db)
This database run a Zabbix Server (IT services monitor) , so it have high volume of transactions by second (collecting data from hosts) and not should have any fault or performance issue.
To run mysqldump I need lock the database to keep the integrity of the data and it affect considerable the performance of MySQL and the server .
I need keep simple and reduce any overhead to backup all data and avoid affect the server performance (or consequently it affect the Zabbix monitor and trigger false alarms to all IT staff)

The best solution I got so far is lock the MySQL database , create a LVM snapshot of FS , release the database , this took 5 seconds top.
Then later I use a rsync to copy from the snapshot only changed data of db files to other server.
At this server I start the database (where I got the problem with systemctl), run the data check (this way I avoid any overhead at production and if get some problem then I act at our mysql production database) , then run a mysqldump… and at finish I keep a backup do db files + mysqldump…

(other solution which I don’t try yet is active a mysql replication… but now isn’t the moment)

Now, with the context explained… people , please , let keep the focus at systemctl… it have a problem , not my MySQL database…

Been awhile since I’ve architected dBs,
IMO the way you’re extracting changes is the source of your problem.

If you implement log shipping, you can create copies of your changes in real time, it’s not exactly the same as database replication which snapshots the data being transferred. This would be better if you’re not just archiving but want to use the data for analysis.

And, getting back to your current solution…
If I understand what you described correctly, your problem is actually recovering and mounting the <backup> database with the changes, not your Production.
Since this is not a Production MySQL, perhaps you shouldn’t be running it as a service?
Maybe you should just invoke (manually start, not run as a service) MySQL instead using the following command (well, it should. I’m not on a machine with MySQL at the moment but you should get the idea)

/usr/lib/mysql/rcmysql start

TSU

ceinma · July 16, 2015, 7:55pm

tsu2:

Been awhile since I’ve architected dBs,
IMO the way you’re extracting changes is the source of your problem.

If you implement log shipping, you can create copies of your changes in real time, it’s not exactly the same as database replication which snapshots the data being transferred. This would be better if you’re not just archiving but want to use the data for analysis.

And, getting back to your current solution…
If I understand what you described correctly, your problem is actually recovering and mounting the <backup> database with the changes, not your Production.
Since this is not a Production MySQL, perhaps you shouldn’t be running it as a service?
Maybe you should just invoke (manually start, not run as a service) MySQL instead using the following command (well, it should. I’m not on a machine with MySQL at the moment but you should get the idea)
/usr/lib/mysql/rcmysql start
TSU

Hi Tsu ,
Yes and no … I’m extracting changes over a controlled situation where I know the db will have a work to do (recovery) when it is starded , but this don’t means the systemctl is working correctly…
About log shipping , yes , I’m aware about that and this is an option where I desire test in future…

Your suggestion make me investigate a little more.
Before start this thread I’ve tested the /usr/lib/mysql/rcmysql and get this output :

root@jdivm04:/var/log
# /usr/lib/mysql/rcmysql start
Starting service MySQL warning: /var/run/mysql/mysql.sock didn't appear within 30 seconds
chmod: cannot access '/var/run/mysql/mysqld.pid': No such file or directory
Something went wrong

Don’t know why , at my first test with this command I have not notice the database was startup despite of the output errors…
And works…
and enlightend me the problem is into this script… not at systemctl.
The problem with systemctl isn’t the timeout, is it believe the rcmysql really not work aborting in 30 seconds and kill the mysql PID … (I don’t know how…but someone kill)

here is the problem…
it’s fixed into mysql startup script… aaaaarrrrrrrrggh… crapy script…


584         wait_for_socket "$socket" || {
585             # let's not treat this as a fatal error
586             echo "warning: $socket didn't appear within 30 seconds"



126 wait_for_socket()
127 {
128     local i
129     for((i=0; i<150; i++)); do
130         sleep 0.2
131         test -S $1 && i='' && break
132     done
133     test -z "$i" || return 1
134     return 0
135 }

and I believe this is what create a confusion about rcmysql script and I don’t investigate it anymore (at first time)


jdivm04:/var/log/mysql # which rcmysql
/usr/sbin/rcmysql
jdivm04:/var/log/mysql # ls -l /usr/sbin/rcmysql
lrwxrwxrwx 1 root root 7 Jun 25 11:28 /usr/sbin/rcmysql -> service

So, problem detected and “solved” … and thank you for insist at mysql solution , after all the problem is with it … not systemd.

tsu2 · July 16, 2015, 7:58pm

Congrats,
That was an interesting troubleshoot.
Thx for posting.

TSU