systemd - service - set start timeout

ceinma · May 31, 2014, 3:19am

Hi ,

Opensuse 13.1
systemd 208
+PAM +LIBWRAP +AUDIT +SELINUX -IMA +SYSVINIT +LIBCRYPTSETUP +GCRYPT +ACL +XZ

I run this Linux system on VMware ESXi.
I have another problem where the system freezes and has to be rebooted manually.

When it restarts, it naturally runs fsck on the mount points.
The problem is that sometimes (most of the time) the boot stops and drops into emergency mode because the start of the fsck service times out
(at least that is what I think the problem is).

I want to change the timeout of all fsck services to timeout = 0 (no timeout) or some high value (like 10min) to avoid ending up in the crappy emergency mode.

After a lot of research I haven’t found any clear answer… just the possibility of manually changing the service file in /usr/lib/systemd/system.
But this looks odd to me… I think there should be a better way…

So, how can I change the timeout of all fsck services (not root) in a way that persists across package updates and so on?

Thanks
Best regards
Cesar

robin_listas · May 31, 2014, 5:03am

On 2014-05-31 03:26, ceinma wrote:

> When it restarts, it naturally runs fsck on the mount points.
> The problem is that sometimes (most of the time) the boot stops and drops
> into emergency mode because the start of the fsck service times out
> (at least that is what I think the problem is).

What timeout? I don’t understand.

–
Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

malcolmlewis · May 31, 2014, 5:45am

Hi
You can always force the filesystem check via a boot (kernel) option so it won’t time out; see:

man 8 systemd-fsck@.service

fsck.mode=force
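
As a rough sketch of how to confirm after a reboot that the option actually reached the kernel command line, the helper below pulls the fsck.mode value out of a command-line string. The function name fsck_mode_from_cmdline is made up for illustration, keeping the last occurrence is a choice made here, and the fallback of auto reflects the default documented in the man page above. How the option gets added persistently depends on the boot loader setup, so that step is not shown.

```shell
#!/bin/sh
# Hypothetical helper: print the fsck.mode value found in a kernel
# command-line string, or "auto" when the option is absent.
fsck_mode_from_cmdline() {
    mode="auto"
    # Intentionally unquoted: split the command line on whitespace.
    for word in $1; do
        case "$word" in
            fsck.mode=*) mode="${word#fsck.mode=}" ;;   # last one wins
        esac
    done
    echo "$mode"
}
```

On a running system it could be called as fsck_mode_from_cmdline "$(cat /proc/cmdline)".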

tsu2 · May 31, 2014, 8:11am

Aside from what others posted, you should never modify any file in the tree under

 /usr/lib/systemd/

If you want to customize a Unit file, you’re supposed to copy the file to the location shown below, then modify your copy. As long as your copy exists, it will override the file in the original location. If/when you want to return to the default settings, you simply delete your copy.

cp /usr/lib/systemd/system/UNITFILE /etc/systemd/system/
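
A minimal sketch of the copy-then-edit approach described above, assuming the unit has a [Service] section and that TimeoutSec= is the setting to change. The function name override_unit_timeout and its arguments are hypothetical; the default directories are the ones mentioned in this thread, and which unit actually needs the change is not established here.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of a packaged unit into the override
# directory with TimeoutSec= set to the requested value.
override_unit_timeout() {
    unit="$1"                                  # unit file name
    timeout="$2"                               # e.g. 0 or 10min
    src_dir="${3:-/usr/lib/systemd/system}"    # packaged units
    dst_dir="${4:-/etc/systemd/system}"        # admin overrides

    [ -f "$src_dir/$unit" ] || return 1

    # Copy the unit line by line, dropping any existing TimeoutSec= line
    # and adding the new one right after the [Service] header.
    awk -v t="$timeout" '
        /^TimeoutSec=/   { next }
                         { print }
        /^\[Service\]$/  { print "TimeoutSec=" t }
    ' "$src_dir/$unit" > "$dst_dir/$unit"
}
```

For example, override_unit_timeout systemd-fsck@.service 0 run as root would write a copy with TimeoutSec=0; whether that unit is really the one timing out is an assumption. Deleting the copy restores the packaged defaults.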

I’m curious what modification you made which you think resolves your timeout issue.

TSU

robin_listas · May 31, 2014, 2:33pm

On 2014-05-31 08:16, tsu2 wrote:

> I’m curious what modification you made which you think resolves your
> timeout issue.

What is that timeout issue? :-??

–
Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

ceinma · June 1, 2014, 12:49am

Hi Carlos,

What is that timeout issue? :-??

When systemd starts services, it has a timeout for each service start; once it expires, the start is considered failed…
I assume fsck is taking longer than this timeout (AFAIK 90 sec by default), because when I just restart the host again (with ctrl-alt-del, because the crappy systemd says it is entering emergency mode but doesn’t give me a prompt) everything starts fine…

Hi tsu2,

If you want to customize a Unit file, you’re supposed to copy the file to the location described as follows, then modify your copy…

Thank you, I think your answer is the correct one… not simple or intuitive, but if it works, I will be happy…

Hi malcolmlewis,

You can always force the filesystem check so it won’t time out via a boot (kernel option) see

Hmmm… ok, I will try this too… I read the man page and there is no mention of a timeout, but I understand that if it forces a full check, no timeout should be set…

So, I will try both solutions together: first I will check which unit I should copy to /etc/systemd/system (should the /user folder work too?) and then set fsck.mode=force…

Thanks guys…
This week I will try to force some of these situations, and if it works I will keep in touch.

Best regards
Cesar

robin_listas · June 1, 2014, 1:23am

On 2014-06-01 00:56, ceinma wrote:
>
> Hi Carlos,
>
>> What is that timeout issue? :-??
> When systemd starts services, it has a timeout for each service start;
> once it expires, the start is considered failed…
> I assume fsck is taking longer than this timeout (AFAIK 90 sec by
> default), because when I just restart the host again (with
> ctrl-alt-del, because the crappy systemd says it is entering emergency
> mode but doesn’t give me a prompt) everything starts fine…

AHHH!

Ok, let me see if I understand.

Let’s say the system is starting, and there is a problem in one
partition, and it runs fsck on it. If the fsck process takes more than
90 seconds, it aborts, and you get dumped into emergency mode. Is that
what you mean?

Well, I have not noticed this behaviour myself (and I had a few full
kernel crashes recently, requiring full fsck of all partitions), but if
that is what happens, it is buggy. Systemd should wait for fsck to end,
no matter if it takes seconds or hours.

–
Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

gogalthorp · June 1, 2014, 2:49am

When run at boot, fsck only makes a level 1 attempt at a fix. If the fix might end up losing data, it fails, and you have to run it manually to force a more thorough attempt, which may result in lost files. So if the file system is seriously messed up, you need to run fsck manually. I think what you are seeing is the auto-run fsck failing, not a timeout.
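
One way to tell a failed check apart from a timeout is to look at the exit status fsck returned. The sketch below decodes that status, which fsck(8) documents as a sum of bit flags; the function name describe_fsck_status is invented for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one line per condition encoded in an fsck
# exit status (bit values as listed in fsck(8)).
describe_fsck_status() {
    status="$1"
    if [ "$status" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    [ $((status & 1)) -ne 0 ]   && echo "filesystem errors corrected"
    [ $((status & 2)) -ne 0 ]   && echo "system should be rebooted"
    [ $((status & 4)) -ne 0 ]   && echo "filesystem errors left uncorrected"
    [ $((status & 8)) -ne 0 ]   && echo "operational error"
    [ $((status & 16)) -ne 0 ]  && echo "usage or syntax error"
    [ $((status & 32)) -ne 0 ]  && echo "checking canceled by user request"
    [ $((status & 128)) -ne 0 ] && echo "shared-library error"
    return 0
}
```

A status with the 4 bit set means errors were left uncorrected, which is the situation described above where a manual fsck run is needed.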

robin_listas · June 1, 2014, 3:13am

On 2014-06-01 02:56, gogalthorp wrote:
>
> When run at boot, fsck only makes a level 1 attempt at a fix. If the
> fix might end up losing data, it fails, and you have to run it manually
> to force a more thorough attempt, which may result in lost files. So if
> the file system is seriously messed up, you need to run fsck manually.
> I think what you are seeing is the auto-run fsck failing, not a timeout.

What you describe is what I think is the normal flow of events I have
seen. The system runs fsck in automatic mode. If the issues found are too
complex for automatic mode, it bails out and dumps you into emergency
mode, so that you do it manually and make the decisions.

So, yes, no timeout.

But if somebody can design a scenario to make it happen, I’ll try to
investigate. I have a bugzilla pending attention with a related problem,
so I could have a look at this as well.

–
Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

tsu2 · June 2, 2014, 7:42pm

More generally, when you’re talking about fs problems in a virtual disk:

You need to not only run routine, ordinary maintenance (including fsck) like on any normal Linux system, you <also> need to run disk integrity maintenance using tools from that virtualization technology (in your case VMware).

i.e. typically:
Run your normal fsck.
Zero your empty space (typically using dd); this improves compression and speed.
Take the disk offline (unmount it).
Run the virtual disk integrity check tools.
If you <really> want to fix a faulty disk, back up the contents of the disk and restore them to a brand new virtual disk.
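
The zero-the-empty-space step above might look like the following sketch. The function name zero_free_space, the file name zero.fill, and the optional size cap are assumptions added here; without the cap, the fill runs until the filesystem is full, which can disturb anything else writing to it at the time.

```shell
#!/bin/sh
# Hypothetical helper: overwrite free space under a mount point with
# zeros by writing a temporary file, then delete that file.
zero_free_space() {
    dir="$1"                # mount point of the filesystem to zero
    max_mib="$2"            # optional cap in MiB, handy for a trial run
    fill="$dir/zero.fill"

    if [ -n "$max_mib" ]; then
        dd if=/dev/zero of="$fill" bs=1M count="$max_mib" 2>/dev/null
    else
        # dd stops with an error once the filesystem is full; that is
        # the expected way for this step to end.
        dd if=/dev/zero of="$fill" bs=1M 2>/dev/null || true
    fi
    sync                    # make sure the zeros reach the disk
    rm -f "$fill"
}
```

A capped trial such as zero_free_space /mnt/data 100 writes and removes only 100 MiB; /mnt/data is a placeholder path.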

Thankfully (knock on wood), I haven’t had too many disk integrity issues and have only had to do full maintenance repairs once.

TSU