I’ve a situation where I start a MySQL database and it’s always need execute the recovery process , so take few seconds/minutes to finish.
(this is expected , since I’m starting a backup from snapshot).
This always works before, but few days ago , become failing constantly ( I execute this 1 time per day)
After investigating , notice the recovery of mysql is taking +1 minute , but systemctl start is keeping they timeout fixed in 30 seconds, then if the service doesn’t start in this time it’s kill the process and put in failed .
I’m pretty sure about that because the mysql log shows it’s stop during recovery process and there’s no PID of mysqld anymore.
if start manually the MySQL manually , works fine…
I’ve try few parameters at mysql.unit where systemd was recognized , but there no effect when I start the service…
What I missing here ?
# systemctl cat mysql
# /usr/lib/systemd/system/mysql.service
[Unit]
Description=MySQL server
Wants=basic.target
After=basic.target network.target
# by Cesar
JobTimeoutSec=120s
[Service]
Type=forking
ExecStart=/usr/lib/mysql/rcmysql start
ExecStop=/usr/lib/mysql/rcmysql stop
# by Cesar
TimeoutStartSec=95s
[Install]
WantedBy=multi-user.target
# systemctl show mysql | grep -i time
TimeoutStartUSec=1min 35s
TimeoutStopUSec=1min 30s
WatchdogTimestampMonotonic=0
ExecMainStartTimestamp=Mon 2015-07-13 10:20:21 BRT
ExecMainStartTimestampMonotonic=844288186352
ExecMainExitTimestamp=Mon 2015-07-13 10:20:51 BRT
ExecMainExitTimestampMonotonic=844318601781
ExecStart={ path=/usr/lib/mysql/rcmysql ; argv]=/usr/lib/mysql/rcmysql start ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
ExecStop={ path=/usr/lib/mysql/rcmysql ; argv]=/usr/lib/mysql/rcmysql stop ; ignore_errors=no ; start_time=[n/a] ; stop_time=[n/a] ; pid=0 ; code=(null) ; status=0/0 }
LimitRTTIME=18446744073709551615
TimerSlackNSec=50000
InactiveExitTimestamp=Mon 2015-07-13 10:20:19 BRT
InactiveExitTimestampMonotonic=844286075868
ActiveEnterTimestamp=Mon 2015-07-13 10:20:21 BRT
ActiveEnterTimestampMonotonic=844288186394
ActiveExitTimestamp=Mon 2015-07-13 10:20:51 BRT
ActiveExitTimestampMonotonic=844317785447
InactiveEnterTimestamp=Mon 2015-07-13 10:20:51 BRT
InactiveEnterTimestampMonotonic=844318642487
JobTimeoutUSec=2min
ConditionTimestamp=Mon 2015-07-13 10:20:19 BRT
ConditionTimestampMonotonic=844286006583
# time systemctl start mysql
Job for mysql.service failed. See "systemctl status mysql.service" and "journalctl -xn" for details.
real 0m32.736s
user 0m0.002s
sys 0m0.003s
Maybe you should set a longer value, eg 3min ? According to the MAN pages, the service fails if your timeout isn’t configured long enough and 95sec might be enough only for a very small database. If you database grows, you may want to consider a much larger number.
TimeoutStartSec=
Configures the time to wait for start-up. If a daemon service does
not signal start-up completion within the configured time, the
service will be considered failed and will be shut down again.
Takes a unit-less value in seconds, or a time span value such as
"5min 20s". Pass "0" to disable the timeout logic. Defaults to
DefaultTimeoutStartSec= from the manager configuration file, except
when Type=oneshot is used, in which case the timeout is disabled by
default (see [systemd-system.conf(5)](http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=systemd-system.conf&sect=5)).
I already try before , this values I show was just other test…
At first time I’ve set I use “3min 30s”… and got same effect.
Always the systemd “abort” the start of the service in 30 seconds…
Am wondering if your two custom commands might be conflicting with each other. Have you tried running only one or the other?
JobTimeoutSec
TimeoutStartSec
Also, don’t forget to run the following after any modifications to any Service Unit files else your changes won’t take (forgetting this leads to a lot of lost time)
Without knowing why your systemd setting isn’t working,
Maybe you should consider splitting out the backup creation from your existing job (ie create 2 smaller jobs from the one large job). Since the backup creation is likely the only thing causing the job to run exceedingly long, when you actually try to startup the mysql service the default startup values should not be exceeded.
It’s probably better practice anyway to configure multiple small jobs rather than one large job anyway, it should also run more efficiently.
As I understand; you physically copy /var/lib/mysql/* database files back, then restart MariaDB in order for it to go through recovery cycle.
Stupid question;
Why don’t you simply mysqldump with routines and then restore it as opposed to physically copying data? I don’t really see the benefit in restoring the files physically.
Don’t worry , isn’t stupid question since I don’t explain all context before. (my focus here is only systemctl , not mysql db)
This database run a Zabbix Server (IT services monitor) , so it have high volume of transactions by second (collecting data from hosts) and not should have any fault or performance issue.
To run mysqldump I need lock the database to keep the integrity of the data and it affect considerable the performance of MySQL and the server .
I need keep simple and reduce any overhead to backup all data and avoid affect the server performance (or consequently it affect the Zabbix monitor and trigger false alarms to all IT staff)
The best solution I got so far is lock the MySQL database , create a LVM snapshot of FS , release the database , this took 5 seconds top.
Then later I use a rsync to copy from the snapshot only changed data of db files to other server.
At this server I start the database (where I got the problem with systemctl), run the data check (this way I avoid any overhead at production and if get some problem then I act at our mysql production database) , then run a mysqldump… and at finish I keep a backup do db files + mysqldump…
(other solution which I don’t try yet is active a mysql replication… but now isn’t the moment)
Now, with the context explained… people , please , let keep the focus at systemctl… it have a problem , not my MySQL database…
Been awhile since I’ve architected dBs,
IMO the way you’re extracting changes is the source of your problem.
If you implement log shipping, you can create copies of your changes in real time, it’s not exactly the same as database replication which snapshots the data being transferred. This would be better if you’re not just archiving but want to use the data for analysis.
And, getting back to your current solution…
If I understand what you described correctly, your problem is actually recovering and mounting the <backup> database with the changes, not your Production.
Since this is not a Production MySQL, perhaps you shouldn’t be running it as a service?
Maybe you should just invoke (manually start, not run as a service) MySQL instead using the following command (well, it should. I’m not on a machine with MySQL at the moment but you should get the idea)
Hi Tsu ,
Yes and no … I’m extracting changes over a controlled situation where I know the db will have a work to do (recovery) when it is starded , but this don’t means the systemctl is working correctly…
About log shipping , yes , I’m aware about that and this is an option where I desire test in future…
Your suggestion make me investigate a little more.
Before start this thread I’ve tested the /usr/lib/mysql/rcmysql and get this output :
root@jdivm04:/var/log
# /usr/lib/mysql/rcmysql start
Starting service MySQL warning: /var/run/mysql/mysql.sock didn't appear within 30 seconds
chmod: cannot access '/var/run/mysql/mysqld.pid': No such file or directory
Something went wrong
Don’t know why , at my first test with this command I have not notice the database was startup despite of the output errors…
And works…
and enlightend me the problem is into this script… not at systemctl.
The problem with systemctl isn’t the timeout, is it believe the rcmysql really not work aborting in 30 seconds and kill the mysql PID … (I don’t know how…but someone kill)
here is the problem…
it’s fixed into mysql startup script… aaaaarrrrrrrrggh… crapy script…
584 wait_for_socket "$socket" || {
585 # let's not treat this as a fatal error
586 echo "warning: $socket didn't appear within 30 seconds"
126 wait_for_socket()
127 {
128 local i
129 for((i=0; i<150; i++)); do
130 sleep 0.2
131 test -S $1 && i='' && break
132 done
133 test -z "$i" || return 1
134 return 0
135 }
and I believe this is what create a confusion about rcmysql script and I don’t investigate it anymore (at first time)
jdivm04:/var/log/mysql # which rcmysql
/usr/sbin/rcmysql
jdivm04:/var/log/mysql # ls -l /usr/sbin/rcmysql
lrwxrwxrwx 1 root root 7 Jun 25 11:28 /usr/sbin/rcmysql -> service
So, problem detected and “solved” … and thank you for insist at mysql solution , after all the problem is with it … not systemd.