We’ve got a cluster of openSUSE 13.1 machines. Recently the automount daemon started crashing with a segfault inside libc:
2014-10-01T11:03:12.065329-05:00 pc59-gsc kernel: [84722.382631] automount[69721]: segfault at 0 ip 00007fc8a1731194 sp 00007fc89eb150a0 error 4 in libc-2.18.so[7fc8a16c7000+1a5000]
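The kernel line above already says where in libc the crash happened, which is worth having when filing a bug. Below is a minimal Python sketch that pulls the fields out of that line and computes the offset of the faulting instruction within the library. The names `parse_segfault` and `PATTERN` are made up for this illustration. It assumes the address in brackets is the load base of the library, which holds when the mapping starts at file offset 0.

```python
import re

# The kernel message from the log above, with the wrapped lines joined.
LOG = ("automount[69721]: segfault at 0 ip 00007fc8a1731194 "
       "sp 00007fc89eb150a0 error 4 in libc-2.18.so[7fc8a16c7000+1a5000]")

# Hypothetical helper pattern for the x86-64 "segfault at ..." message.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the fault address, library, code offset and decoded error bits."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)  # assumed to be the library's load base
    err = int(m["err"])
    return {
        "fault_addr": int(m["addr"], 16),
        "library": m["lib"],
        "offset": ip - base,
        # x86 page-fault error code bits:
        "page_present": bool(err & 1),  # bit 0: 0 means page not mapped
        "write": bool(err & 2),         # bit 1: 0 means read access
        "user_mode": bool(err & 4),     # bit 2: 1 means fault in user space
    }


if __name__ == "__main__":
    info = parse_segfault(LOG)
    print(info["library"], hex(info["offset"]))
    print("fault address:", hex(info["fault_addr"]))
```

For this log the offset is 0x6a194 into libc-2.18.so, and error code 4 decodes to a user-mode read of an unmapped page at address 0, i.e. a NULL pointer dereference. With the matching libc debug symbols installed, that offset can be passed to a tool such as addr2line to find the function involved.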
I set up strace on several machines, and eventually got the following:
futex(0x7f9877475c60, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x7f985c0012d0, FUTEX_WAIT_PRIVATE, 2, NULL) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f987aba1000
read(-1, 0x7f987aba1000, 8192) = -1 EBADF (Bad file descriptor)
futex(0x7f985c0012d0, FUTEX_WAKE_PRIVATE, 1) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0} ---
+++ killed by SIGSEGV (core dumped) +++
The machines auto-update, and the timing of the crashes suggests they are related to an update of automount, whose binary has a recent modification time:
pc07-gsc /home/flowers> ls -l /usr/sbin/automount
-rwxr-xr-x 1 root root 262680 Sep 22 07:08 /usr/sbin/automount
pc07-gsc /home/flowers> /usr/sbin/automount -V
Linux automount version 5.0.9
Directories:
config dir: /etc/sysconfig
maps dir: /etc
modules dir: /usr/lib64/autofs
Compile options:
DISABLE_MOUNT_LOCKING ENABLE_FORCED_SHUTDOWN ENABLE_IGNORE_BUSY_MOUNTS
WITH_LDAP WITH_SASL LIBXML2_WORKAROUND WITH_LIBTIRPC
For now, we’re downgrading automount to 5.0.8, which should work around the problem for us until a patch is available.
In our setup, all user directories and most data directories are NFS-mounted on the cluster. The crashes seemed to occur when someone logged in to a machine and automount tried to mount their home directory, although it’s certainly possible they happened at other times as well. The NFS servers are Solaris 10 machines, as is the NIS server (we don’t use LDAP, but NIS+ in NIS compatibility mode). Restarting the automount daemon got things working again, but it would crash again later. Rebooting the machines did not change the behavior.