An error occurred - unhelpful cron message

openSUSE 42.2
Linux 4.4.79-18.26-default x86_64

I received a message at 4pm (why 4pm?) from our server saying that a cron job had a problem.

cronjob@sma-server3 - daily - FAILURE
running daily cronjob scripts
SCRIPT: logrotate exited with RETURNCODE = 1.

Huh! A vague message.

The system journal entry for the event:

 │Sep 18 16:00:01│run-crons                        │logrotate: OK 
 │Sep 18 16:00:01│logger[23072]                    │mdadm: OK 
 │Sep 18 16:00:01│logger[23078]                    │packagekit-background.cron: OK 
 │Sep 18 16:00:01│logger[23086]                    │suse-clean_catman: OK 
 │Sep 18 16:00:22│logger[23268]                    │suse-do_mandb: OK
 │Sep 18 16:00:22│logger[23274]                    │suse.cron-sa-update: OK
 │Sep 18 16:00:22│logger[23297]                    │ OK
 │Sep 18 16:00:51│run-crons                        │ OK
 │Sep 18 16:00:51│logger[23326]                    │ OK 
 │Sep 18 16:00:51│logger[23332]                    │ OK 
 │Sep 18 16:00:51│logger[23338]                    │ OK
 │Sep 18 16:00:51│cron[23026]                      │pam_unix(crond:session): session closed for user root 

A lot of OKs.

I manually ran “logrotate --verbose” on the /etc/cron.daily/ folder. There was no indication of a problem.

Is there another “daily” somewhere?
How do I find which instance of logrotate failed?

I guess cron cannot do much more than report that the logrotate script failed.
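To pin down which daily job failed, one option is to run each script in /etc/cron.daily yourself and look at its exit status. The sketch below does that. `report_returncodes` is a hypothetical helper that only imitates the format of the cron mail; it is not openSUSE's run-crons. Be aware that it really executes the jobs, so only use it where that is acceptable.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not openSUSE's run-crons): run every executable file
# in DIR, e.g. /etc/cron.daily, and report its exit status in the same style
# as the cron mail ("SCRIPT: ... exited with RETURNCODE = N.").
report_returncodes() {
    local dir=$1 script rc
    for script in "$dir"/*; do
        # Skip subdirectories, non-executables, and the unexpanded glob.
        [ -f "$script" ] && [ -x "$script" ] || continue
        "$script" >/dev/null 2>&1
        rc=$?
        if [ "$rc" -ne 0 ]; then
            echo "SCRIPT: ${script##*/} exited with RETURNCODE = $rc."
        else
            echo "${script##*/}: OK"
        fi
    done
}
```

Usage: `report_returncodes /etc/cron.daily`. A safer first step is to run only `/etc/cron.daily/logrotate` and then `echo $?`.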

A more severe problem is that the man page of logrotate has no information at all about the return codes of logrotate.
The message is not about the tool logrotate (/usr/sbin/logrotate), but about the script in /etc/cron.daily.
And this reads:


# exit immediately if there is another instance running
if checkproc /usr/sbin/logrotate; then
        /bin/logger -p cron.warning -t logrotate "ALERT another instance of logrotate is running - exiting"
        exit 1
fi

TMPF=`mktemp /tmp/logrotate.XXXXXXXXXX`

/usr/sbin/logrotate /etc/logrotate.conf 2>&1 | tee $TMPF
EXITVALUE=${PIPESTATUS[0]}    # exit status of logrotate, not of tee

if [ $EXITVALUE != 0 ]; then
    # wait a sec, we might just have restarted syslog
    sleep 1
    # tell what went wrong
    /bin/logger -p cron.warning -t logrotate "ALERT exited abnormally with [$EXITVALUE]"
    /bin/logger -p cron.warning -t logrotate -f $TMPF
fi

rm -f $TMPF
exit 0

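And there you see when the exit 1 happens: only when another instance of logrotate is already running. Everywhere else the script ends with exit 0, so RETURNCODE = 1 can only come from that early exit.
But it seems that a message is logged in that case.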
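One detail in the script: the exit status of /usr/sbin/logrotate has to be taken from a pipeline (`logrotate ... | tee $TMPF`). In bash, `$?` after a pipeline is the status of the last command, here tee, so the first command's status is only available through the bash-specific `PIPESTATUS` array (or by enabling `set -o pipefail`). A small illustration, with `false` standing in for a failing logrotate run; the function name is made up:

```bash
#!/usr/bin/env bash
# Illustration only: after "cmd | tee file", $? reports tee's status,
# while PIPESTATUS keeps the status of every command in the pipeline.
pipeline_status_demo() {
    local tmpf
    local -a status
    tmpf=$(mktemp)
    # 'false' stands in for a failing /usr/sbin/logrotate run.
    false 2>&1 | tee "$tmpf" >/dev/null
    status=("${PIPESTATUS[@]}")   # copy at once; the next command resets it
    rm -f "$tmpf"
    echo "first=${status[0]} last=${status[1]}"
}

pipeline_status_demo   # prints: first=1 last=0
```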

What sayeth:

journalctl --unit=logrotate.timer


journalctl --unit=logrotate.service

Maybe thou findest thy deepest desires of what caused the error there!

$ journalctl --unit=logrotate.timer
-- No entries --
$ journalctl --unit=logrotate.service
-- No entries --
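On this system logrotate apparently runs from cron.daily (note `run-crons` in the journal excerpt above), not from a systemd timer, which would explain why the unit filters find nothing. The script logs with `logger -t logrotate -p cron.warning`, so its alerts can be looked up by syslog identifier instead. A sketch, assuming a journalctl recent enough to support `--identifier`; `show_logrotate_alerts` is a hypothetical name:

```bash
#!/usr/bin/env bash
# Sketch: show what the cron.daily logrotate script sent to syslog.
# The script logs with "logger -t logrotate -p cron.warning", so its entries
# carry the syslog identifier "logrotate" and priority "warning".
# --priority=warning shows warning and everything more severe.
show_logrotate_alerts() {
    local since=${1:-today}
    journalctl --no-pager --identifier=logrotate --priority=warning --since="$since"
}
```

Usage, as root: `show_logrotate_alerts yesterday`.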

Hmm, yes. It would seem I have a logrotate instance that is hanging because of a zombie NFS mount.

Why does an NFS-mounted volume go zombie if the network is temporarily disrupted?
Any command that accesses the volume, say ls, df or logrotate, hangs and cannot be terminated.
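When a volume hangs like that, the blocked processes usually sit in state D (uninterruptible sleep). A sketch for spotting them by reading /proc directly; `proc_states` is a hypothetical helper, and where procps is installed `ps -C logrotate -o pid,stat,args` shows the same information:

```bash
#!/usr/bin/env bash
# Hypothetical helper (Linux only): print "PID STATE" for every process whose
# command name equals NAME, read from /proc/PID/stat.
# State "D" (uninterruptible sleep) is typical for a process blocked on an
# unresponsive NFS server.
proc_states() {
    local name=$1 statf stat comm rest pid
    for statf in /proc/[0-9]*/stat; do
        # The process may have exited between the glob and the read.
        { read -r stat < "$statf"; } 2>/dev/null || continue
        pid=${stat%% *}        # first field: the PID
        comm=${stat#*(}        # text after the first "("
        comm=${comm%)*}        # text before the last ")"
        rest=${stat##*) }      # fields after the command name
        if [ "$comm" = "$name" ]; then
            echo "$pid ${rest%% *}"
        fi
    done
}
```

Usage: `proc_states logrotate`; a line such as `12345 D` points at a logrotate stuck on I/O.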

I get different results when running as root. For instance:

:~> journalctl --unit=logrotate.timer
No journal files were found.

# journalctl --unit=logrotate.timer
-- Logs begin at Ter 2017-09-19 12:01:06 BRT, end at Qua 2017-09-20 23:52:36 BRT. --
Set 19 12:01:10 bruno-03 systemd[1]: Starting Daily rotation of log files.
Set 19 12:01:10 bruno-03 systemd[1]: Started Daily rotation of log files.

Those commands were run as root.
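The difference between root and a normal user is expected: by default only root and members of the `systemd-journal` group (on some distributions also `adm` or `wheel`) may read the system journal. A sketch to check that for a given user; both function names are hypothetical, and only the `systemd-journal` group is tested:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if USER is a member of GROUP.
user_in_group() {
    local user=$1 group=$2 g
    local -a groups
    read -r -a groups <<< "$(id -nG "$user" 2>/dev/null)"
    for g in "${groups[@]}"; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

# Hypothetical helper: succeed if USER (default: current user) is root or in
# the systemd-journal group. Distribution-specific groups are not checked.
can_read_system_journal() {
    local user=${1:-$(id -un)}
    if [ "$(id -u "$user" 2>/dev/null)" = 0 ]; then
        return 0
    fi
    user_in_group "$user" systemd-journal
}
```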

Nevertheless, the actual problem was that one of the entries in the user logrotate config file accessed an NFS-mounted volume, and NFS had gone zombie because of the network disruption. Once NFS is zombified, there is no getting it back, and NFS eats the brain of anything that touches it.

If there is a way to resurrect NFS without a reboot, it has eluded my research and experiments.

The solution was to reboot the system to re-establish a working NFS.
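To see in advance whether a path that a logrotate config touches lives on NFS, without touching the possibly hung server, one can look it up in /proc/mounts instead of running stat or ls on it. A sketch under stated assumptions: `mount_fstype` is hypothetical, expects absolute paths without symlinks, and does not decode mount points containing spaces (written as \040 in /proc/mounts):

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the filesystem type of the mount containing PATH,
# using only the mount table (default /proc/mounts), so the path itself is
# never accessed. Assumes an absolute path without symlinks.
mount_fstype() {
    local path=$1 best="" best_type="" dev mnt fstype rest
    while read -r dev mnt fstype rest; do
        case "$path/" in
            "${mnt%/}/"*)
                # Longest matching mount point wins; on a tie the later
                # entry wins, since it hides the earlier one.
                if [ "${#mnt}" -ge "${#best}" ]; then
                    best=$mnt
                    best_type=$fstype
                fi
                ;;
        esac
    done < "${2:-/proc/mounts}"
    printf '%s\n' "$best_type"
}
```

For example, if `mount_fstype /mnt/data/app.log` prints `nfs` or `nfs4`, that logrotate entry could hang whenever the server is unreachable.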

Are you sure? You posted

$ journalctl --unit=logrotate.timer
 -- No entries --
$ journalctl --unit=logrotate.service
-- No entries --

where the prompt shows you are not root. But is that the complete prompt?

No, it was not. I trimmed it for clarity and brevity. The command was indeed run as root.

Please don’t. People here expect that what is between CODE tags is the complete, unabridged copy of what you originally had. And when you change something (such as a readable password), please say so. Otherwise people will come to wrong conclusions, and the discussion, and thus the attempt to help you, will become very confused.

Whenever possible, we like to see the prompt, the command, the output and the next prompt copied and pasted in one sweep of the mouse. That is easy for you to do and makes it easier for everyone to understand.