Zarafa-spooler does not work when started by systemd

When systemd is used to start zarafa-spooler the process starts but does not pickup and deliver any message from zarafa-server. If the spooler is started on the command line it works correctly. If systemd is replaced by sysvinit as the boot mechanism, the spooler also works correctly. This is true on openSUSE 12.1, 12.2 and 12.3 with zarafa 7.0.x and zarafa 7.1.x from the opensuse repository.
Prior to my recent upgrade to 12.3 this has not been an issue for me as I’ve been using sysvinit and I assumed it was a teething problem with systemd on the earlier releases but now I’m on 12.3 with systemd the problem is still here so it needs to be investigated. I’m assuming this is a problem to do with the very limited execution environment systemd passes on to process it starts and that there is something special that the zarafa-spooler needs that isn’t being provided by systemd. Note that all zarafa processes are still using /etc/init.d scripts rather than unit files.

Using strace on a working zarafa-spooler (started from a shell) I get this output when a single message is in the queue to be processed:

Process 19380 attached
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[19379] Number of messages in t"..., 10240) = 44
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [19379"..., 69) = 69
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\6[20278] /usr/sbin/zarafa-spoole"..., 10240) = 322
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[19379] Spooler process started"..., 10240) = 46
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [19379"..., 71) = 71
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\6[20278] PYTHONPATH = /usr/share"..., 10240) = 55
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[20278] * Loading plugins start"..., 10240) = 35
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [20278"..., 60) = 60
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[20278] ** Checking plugins in "..., 10240) = 96
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [20278"..., 89) = 89
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [20278"..., 57) = 57
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\1[20278] Sending e-mail for user"..., 10240) = 64
time(NULL)                              = 1373286326
write(4, "Mon Jul  8 13:25:26 2013: [20278"..., 89) = 89
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[20278] * PreSending processing"..., 10240) = 79
time(NULL)                              = 1373286327
write(4, "Mon Jul  8 13:25:27 2013: [20278"..., 66) = 66
time(NULL)                              = 1373286327
write(4, "Mon Jul  8 13:25:27 2013: [20278"..., 63) = 63
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\2[20278] Sending message with Me"..., 10240) = 104
time(NULL)                              = 1373286327
write(4, "Mon Jul  8 13:25:27 2013: [20278"..., 129) = 129
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\6[20278] RCPT TO: ian@hodgepigs."..., 10240) = 39
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\3[20278] SMTP: 2.0.0 Ok: queued "..., 10240) = 46
time(NULL)                              = 1373286327
write(4, "Mon Jul  8 13:25:27 2013: [20278"..., 71) = 71
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\1[20278] E-mail for user ian was"..., 10240) = 57

On a spooler started from systemd, this is what I get:

Process 20339 attached
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[20338] Number of messages in t"..., 10240) = 44
time(NULL)                              = 1373286461
write(4, "Mon Jul  8 13:27:41 2013: [20338"..., 69) = 69
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\6[20347] /usr/sbin/zarafa-spoole"..., 10240) = 322
select(1, [0], NULL, NULL, NULL)        = 1 (in [0])
read(0, "\5[20338] Spooler process started"..., 10240) = 46

I’m not sure how to go about debugging this any further - suggestions welcome!

If you suspect an issue relating to a systemd Unit, the following command returns

  • The current state of the Unit (eg loaded/unloaded, enabled/unenabled, running, stopped, paused)
  • If a problem exists, the last few lines from the syslog (journal) is displayed
  • The exact name and location of the Unit config file
systemctl status *unitname* 

If you don’t know the full name of the Unit or are a lazy typer (as I am) you can often use TAB to autocomplete the string, eg

systemctl status zara [TAB] 

If you post the result of this command, then your problem will likely be resolved.

TSU

So if we are saying that an init script that worked before systemd and is not working now with systemd, have a look at a solution I made with dkms by moving the init script and creating a real service file for it. Here is that example:

DKMS & systemd - How to get Dynamic Kernel Module Support to work in openSUSE 12.1 - Blogs - openSUSE Forums

And here is a script for working with systemd you might find useful:

SysdCmd - systemd Command Help/Config Editor - Blogs - openSUSE Forums

You the potential to fix this yourself if you want, but you will need to read up on systemd. It can be done so don’t give up just yet.

Thank You,

Thanks. I’d already looked at that but the systemd status doesn’t tell us anything. As far as systemd is concerned the spooler is working fine (except that it isn’t):


condor.hodgepigs.org.uk:/home/ian# systemctl status zarafa-spooler.service 
zarafa-spooler.service - LSB: Zarafa Collaboration Server's Spooler Daemon
          Loaded: loaded (/etc/init.d/zarafa-spooler)
          Active: active (running) since Tue, 2013-07-16 14:20:11 BST; 8s ago
         Process: 32402 ExecStart=/etc/init.d/zarafa-spooler start (code=exited, status=0/SUCCESS)
          CGroup: name=systemd:/system/zarafa-spooler.service
                  ├ 32412 /usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg
                  └ 32413 /usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg

Jul 16 14:20:11 condor.hodgepigs.org.uk zarafa-spooler[32402]: Starting Zarafa spooler ..done
Jul 16 14:20:11 condor.hodgepigs.org.uk systemd[1]: Started LSB: Zarafa Collaboration Server's Spooler Daemon.

I shall attempt to create a proper unit file as suggested by the next poster and let you know how it goes.

This existing Unit file already builds on the legacy init script(which is fine). It starts (and has started) the zarafa-spooler daemon. The daemon started stuccessfully, then exited because there is nothing to do (nothing in queue), which is normal.

Maybe take a look at your systlog after sending a job to the spooler, run the following

tail /var/log/messages

In other words, I suspect your spooler service is running just fine and hasn’t changed significantly from before.
Instead, I would start investigating how the jobs are passed to the service.

TSU

I’m not sure I follow what you mean, can you explain?

With zarafa-spooler started by hand (or via sysvinit on a system that does not used systemd) the spooler log file shows this at startup:


Tue Jul 16 19:23:26 2013:  5645] Starting zarafa-spooler version 7,1,5,42059 (42059), pid 5645
Tue Jul 16 19:23:26 2013:  5645] Using SMTP server: 127.0.0.1, port 25
Tue Jul 16 19:23:26 2013:  5645] Connection to Zarafa server succeeded
Tue Jul 16 19:23:26 2013:  5645] Number of messages in the queue: 0

When a message is put on the queue by the zarafa-server the spooler log then shows:


Tue Jul 16 19:24:02 2013:  5645] Number of messages in the queue: 1
Tue Jul 16 19:24:02 2013:  5645] Spooler process started on pid 5677
Tue Jul 16 19:24:02 2013:  5677] * Loading plugins started
Tue Jul 16 19:24:02 2013:  5677] ** Checking plugins in /var/lib/zarafa/spooler/plugins
Tue Jul 16 19:24:02 2013:  5677] * Loading plugins done
Tue Jul 16 19:24:02 2013:  5677] Sending e-mail for user ian, subject: 'Test', size: 1691
Tue Jul 16 19:24:02 2013:  5677] * PreSending processing started
Tue Jul 16 19:24:02 2013:  5677] * PreSending processing done
Tue Jul 16 19:24:02 2013:  5677] Sending message with Message-ID: zarafa.51e58fc2.162d.4a24113032a7e4cb@condor.hodgepigs.org.uk
Tue Jul 16 19:24:02 2013:  5677] SMTP: 2.0.0 Ok: queued as 86B8DBFE40
Tue Jul 16 19:24:02 2013:  5677] E-mail for user ian was accepted by SMTP server
Tue Jul 16 19:24:02 2013:  5645] Processed message for user ian

If zarafa-spooler is started by systemd (with no other change) then the logfile shows:


Tue Jul 16 19:07:10 2013:  5309] Starting zarafa-spooler version 7,1,5,42059 (42059), pid 5309
Tue Jul 16 19:07:10 2013:  5309] Using SMTP server: 127.0.0.1, port 25
Tue Jul 16 19:07:10 2013:  5309] Connection to Zarafa server succeeded
Tue Jul 16 19:07:10 2013:  5309] Number of messages in the queue: 0

And when a message is added to the queue:


Tue Jul 16 19:07:50 2013:  5309] Number of messages in the queue: 1
Tue Jul 16 19:07:50 2013:  5309] Spooler process started on pid 5346
Tue Jul 16 19:07:50 2013:  5309] Processed message for user ian
Tue Jul 16 19:08:50 2013:  5309] Number of messages in the queue: 1

No messages are processed or delivered. As I said, the only change is the way the spooler is started up.

OK, so I created a unit file for the spooler and used that to start it using systemd instead of relying on systemd using the LSB init script. It’s made no difference whatsoever. The spooler still doesn’t process messages.

To be honest, I’m not surprised. I don’t think the issue is with systemd itself, per se, but with the way it starts processes. I reckon that zarafa-spooler relies on something in it’s startup environment that it gets through pure sysvinit or on the command line but not via systemd. I’m sure there’s a way to get the correct environment (or whatever) into the spooler via systemd but I don’t know what’s missing. Hopefully someone who knows the internals of zarafa-spooler might come to my rescue. Meanwhile I’m going to do some diffs on /proc for the spooler started either way.

I’m still puzzled how this can be shipped via the opensuse build service and nobody else has encountered this. I can’t be the only one using Zarafa on opensuse!

Thanks for the help so far. Further suggestions welcome!

Did you try to manually restart it after system is booted? systemctl restart zarafa-spooler.service

Yes, but it has the same result - it starts up but doesn’t process any messages.

I’ve also tried adding all the environment variables that are set in its environment when started by hand (as list under /proc) to my unit file but that made no difference either.

The only real difference I can think of right now is stdin/out handling. Systemd makes them pipe to journal by default. You can try to explicitly set StandardOutput to null or tty (and set TTYPath to /dev/console).

Just a wild guess, but I do not really think it is missing environment. When you used initscript to start service, it should have started daemon using exactly the same environment as in case of sysvinit; if no, initscript does something weird indeed.

Looking at /proc/{PID}/fd I don’t see anything obviously different between a spooler started via systemd and one started manually. I have tried setting StandardOutput to null, and to tty in my unit file and starting the spooler via systemd but it made no difference.

The only diff I see between the two test messages is the level of verbosity which <might> be related to whether the message is being processed properly or not.

Am not a User of zarafa-spooler so I cannot advise you on how to set the level of verbosity.

Also, consider the possibility that spoolers often get stuck/orrupted and need to be cleared. When a message doesn’t send, try clearing the cache/queue and submitting a new message… But not the same message that got stuck. Sometimes something in a specific message can cause the queue to stick (eg an illegal character).

In fact, ideally you should want to turn verbosity all the way up so that you can identify the individual messages as they pass through the spooler, and state.

HTH,
TSU

The level of verbosity is identical (i.e. maximum). What you’re seeing in the logs is that in the one started by hand, the spooler picks up the message, does some internal processing of it as well as via some python ‘hooks’, then delivers it to postfix for onward delivery. In the one started by systemd, the spooler sees the new message ready for it to pick up and logs it, but doesn’t do anything else. Both logs were from a newly started spooler, using an identical blank test message consisting of a ‘To’ and a ‘Subject’. The only difference, is that one was started on the command line and one by systemd. This happens 100%repeatedly and I have tested it on openSUSE 12.1, 12.2 and 12.3, with various Zarafa versions from 7.1.0 to the current 7.1.5

Something that the spooler needs or does is not compatible with systemd!

OK,
Then, it sounds like the problem might be with the zarafa-spooler itself.
The problem isn’t likely related to systemd because the current systemd Unit file simply calls the zarafa-spooler init file (which is how the service worked before). Yes, systemd has cosmetically changed how it starts (but not really). But, other things have changed also which likely has caused a problem.

Recommend:
Uninstall and re-install zarafa-spooler completely. Perhaps during the upgrade a library changed and it didn’t pick up the change.

  • Where is the zarafa-spooler coming from, a packaged RPM from openSUSE or elsewhere? Compiled from source?
  • Has zarafa-spooler been packaged or tested specifically for/on openSUSE 12.3?

TSU

I’ve always asserted that the problem is with zarafa-spooler, not systemd, however it is something about the way it is started by systemd that is triggering the problem since the problem does not manifest when it is started by hand or by sysvinit. Systemd does start progams in a very different way from sysvinit - it’s one its big ‘selling points’ so I don’t think it’s truew to say that the difference is just cosmetic.

On openSUSE 12.1 and 12.2 I could use sysvinit to start the spooler without any problem then swap to systemd by removing the systemd-sysvinit compatiblity rpm and the problem would be there. That’s with no other change - an identical system except for the method of starting the zarafa-spooler daemon. As also noted above, all the installations are with openSUSE rpms from the build service, for the equivalent openSUSE version e.g. /repositories/network:/mail:/zarafa/openSUSE_12.3 The problem exists on a clean install of zarafa having uninstalled it all and re-installed (on openSUSE 12.1) and on subsequent upgrades of both openSUSE and zarafa from 12.1 to 12.2 and then 12.3. No matter what combination, the only common factor is that when the spooler is started by systemd it does not work but when started on the command line or by sysvinit it does.

This is already logged as a bug in bugzilla - https://bugzilla.novell.com/show_bug.cgi?id=828693 - I’m hoping the zarafa packagers can help debug it as I’ve run out of ideas.

Thanks.

I must have missed it in your messages, but I wasn’t aware you can start zarafa-spooler manually in 12.3 and it works. In this way you can at least determine whether the Unit file fundamentally works or not (but might be subject to the following).

If that is the case, then I suspect that the Unit file is missing some “Required” (more than liikely) or “Wanted” dependencies that aren’t running when the zarafa-spooler service first runs, but may be running later when if you are able to start the spooler manually.
This is possible when comparing to SysV because in that system a sequence of steps would be described leading to finally starting the named Service.

systemd Units on the other hand by default do not implement a hierarchy startup sequence.

Also, the names of one or more required dependencies may have changed in a way that may not have logged an error to the syslog.

TSU

You apparently also missed that it does not start under systemd when using original initscript too, without any explicit unit. OP created unit in attempt to fix the problem.

This is nothing to do with the boot sequence. The spooler starts up and connects to zarafa-server - but does not work properly - when started by systemd, no matter which way it is started using systemd: at boot or post boot after logging in, using either the LSB:initscript via systemd, or with my own unit file.

Consider this simple test sequence:

  1. Boot system, all services running including zarafa-server and postfix
  2. login and su to root
  3. kill any zarafa-spooler daemons that are running
  4. startup by hand: # /usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg
  5. spooler starts up and successfully process messages
  6. kill that spooler daemon process
  7. startup using systemd: # systemctl start zarafa-spooler.service
  8. spooler starts up but will not process messages

The only difference is the method to start the spooler daemon. on the commandline it works (and by sysvinit on older OS) and by systemd it does not.

From what you’ve posted, either something is very wrong or very close
Based on what you posted

condor.hodgepigs.org.uk:/home/ian# systemctl status zarafa-spooler.service
zarafa-spooler.service - LSB: Zarafa Collaboration Server’s Spooler Daemon
Loaded: loaded (/etc/init.d/zarafa-spooler)
Active: active (running) since Tue, 2013-07-16 14:20:11 BST; 8s ago
Process: 32402 ExecStart=/etc/init.d/zarafa-spooler start (code=exited, status=0/SUCCESS)
CGroup: name=systemd:/system/zarafa-spooler.service
├ 32412 /usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg
└ 32413 /usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg

Jul 16 14:20:11 condor.hodgepigs.org.uk zarafa-spooler[32402]: Starting Zarafa spooler …done
Jul 16 14:20:11 condor.hodgepigs.org.uk systemd[1]: Started LSB: Zarafa Collaboration Server’s Spooler Daemon.

The systemd Unit executes

/etc/init.d/zarafa-spooler start

So, the first Q that comes to mind is what in /etc/init.d/zarafa-spooler.service? Try opening in a text editor, it should be a plain text bash script.
Then, I’m also curious that the cgroup references a binary in /usr/sbin/zarafa-spooler.

To me, that looks like a mis-match…
systemd seems to be executing the legacy init script but the Cgroup is referencing a binary in /usr/sbin.

If the bash script doesn’t re-council this, then this can be tweaked. Either post the contents of /etc/init.d/zarafa-spooler or if you feel up to it, copy the contents of the zarafa-spooler Unit file as follows (leaving the original unmodified. If you ever want to return to the default, just delete the copy you’re customizing)

cp usr/lib/systemd/system/zarafa-spooler.service /etc/systemd/system

TSU

The LSB initscript script is /etc/init.d/zarafa-spooler and ultimately (ignoring all the standard stuff) all that does is source an environment file execute: ‘/usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg’ which is the same command you issue on the commandline to startup the zarafa spooler. The status report from systemd is exactly as expected since the spooler starts up so as far as systemd is concerned it thinks it is working (even though it isn’t). I’d expect the CGroup to reference the actual daemon, not the initscript. Thats true for all other daemons e.g. mysql, cups, ntp etc.

As I also said, I created a unit file (/etc/systemd/system/zarafa-spooler.service) which ‘Requires’ and ‘Wants’ zarafa-server.service and sources the same environment file as the initscript, then Execs ‘/usr/sbin/zarafa-spooler -c /etc/zarafa/spooler.cfg’. In all cases the spooler starts up ok and connects to the zarafa server but when started via systemd whether by the LSB init script or by the unit file, it does not process messages. The identical initscript when used on openSUSE 12.1 or 12.2 and sysvinit rather than systemd works in exactly the same way and starts up the spooler service in an identical manner yet the spooler works - it processes messages.

I believe this needs the help of someone who knows the internals of zarafa-spooler and can debug how its working under systemd so we can work out what is missing in its execution or IO environment when started that way since that seems to me to be the only difference.