Server broken: Forgot all users?

Hello,

I am seeing very strange behavior on my openSUSE Leap 42.1 system. The system had been running for months without problems.
Last Sunday I suddenly noticed that several services, like Apache and Dovecot, were no longer reachable and that the system was behaving strangely. So I tried a reboot to get a clean basis.

Unfortunately the server didn’t boot. It got stuck at the screen with the three dots. So I started the rescue system from a USB stick and used chroot to reinstall GRUB2. This solved that problem and I was able to boot again. (It is possible that the boot problem is an independent second issue; I am not sure.)

But even after that reboot, most or all services, including sshd, didn’t start (although I can ping the server). To make things worse, I couldn’t even log in at the console with a keyboard and display connected directly to the computer. The keyboard layout is fine, and I am 100% sure that the passwords for both root and my personal login were correct.

So back to the rescue system. The system disk can be mounted and looks fine, and “btrfs check” shows no errors.
The logs show some problems. This is from the last boot:


2016-09-19T20:24:50.221096+02:00 Server rsyslogd: db error (2002): Can't connect to local MySQL server through socket '/var/run/mysql/mysql.sock' (2 "No such file or directory")
2016-09-19T20:25:10.652019+02:00 Server systemd[1328]: Failed at step USER spawning /usr/lib/fetchmail-systemd-exec: No such process
2016-09-19T20:25:10.652925+02:00 Server systemd[1]: Starting Docker Application Container Engine...
2016-09-19T20:25:10.654079+02:00 Server systemd[1329]: Failed at step USER spawning /srv/www/fhem/fhem.pl: No such process
2016-09-19T20:25:10.656386+02:00 Server sshd-gen-keys-start[1317]: No user exists for uid 0
2016-09-19T20:25:10.692200+02:00 Server dovecot[1324]: doveconf: Fatal: Error in configuration file /etc/dovecot/dovecot.conf: default_login_user doesn't exist: dovenull
2016-09-19T20:25:10.849113+02:00 Server mysql-systemd-helper[1322]: chown: invalid user: ‘mysql:mysql’
2016-09-19T20:25:10.931917+02:00 Server SuSEfirewall2: Error: ip6tables-batch failed, re-running using ip6tables
2016-09-19T20:25:10.948242+02:00 Server SuSEfirewall2[1319]: ip6tables v1.4.21: Port "dhcpv6-client" does not resolve to anything.
2016-09-19T20:25:10.948424+02:00 Server SuSEfirewall2[1319]: Try `ip6tables -h' or 'ip6tables --help' for more information.
2016-09-19T20:25:11.049969+02:00 Server systemd[1]: sshd.service: control process exited, code=exited status=1
2016-09-19T20:25:11.050110+02:00 Server systemd[1]: Failed to start OpenSSH Daemon.
2016-09-19T20:25:11.050244+02:00 Server systemd[1]: Unit sshd.service entered failed state.
2016-09-19T20:25:11.054479+02:00 Server systemd[1]: dovecot.service: main process exited, code=exited, status=89/n/a
2016-09-19T20:25:11.054611+02:00 Server systemd[1]: Unit dovecot.service entered failed state.
2016-09-19T20:25:11.054722+02:00 Server systemd[1]: fetchmail.service: main process exited, code=exited, status=217/USER
2016-09-19T20:25:11.054834+02:00 Server systemd[1]: Unit fetchmail.service entered failed state.
2016-09-19T20:25:11.054940+02:00 Server systemd[1]: fhem.service: main process exited, code=exited, status=217/USER
2016-09-19T20:25:11.055049+02:00 Server systemd[1]: Failed to start FHEM Heimautomatisierungsserver.
2016-09-19T20:25:11.055165+02:00 Server systemd[1]: Unit fhem.service entered failed state.
2016-09-19T20:25:11.068696+02:00 Server mysql-systemd-helper[1785]: chown: invalid user: ‘mysql:mysql’
2016-09-19T20:25:11.080864+02:00 Server mysql-systemd-helper[1802]: chown: invalid user: ‘mysql:mysql’
2016-09-19T20:25:11.081286+02:00 Server mysql-systemd-helper[1802]: Waiting for MySQL to start
2016-09-19T20:25:11.081514+02:00 Server mysql-systemd-helper[1801]: chown: invalid user: ‘mysql:
2016-09-19T20:25:11.233448+02:00 Server sshd-gen-keys-start[1818]: Checking for missing server keys in /etc/ssh
2016-09-19T20:25:11.234159+02:00 Server systemd[1]: PID file /var/run/openvpn/server.pid not readable (yet?) after start.
2016-09-19T20:25:11.349395+02:00 Server start[1326]: chown: invalid user: 'ldap:ldap'
2016-09-19T20:25:11.352202+02:00 Server slapd[1326]: No passwd entry for user ldap
2016-09-19T20:25:11.352518+02:00 Server systemd[1]: slapd.service: control process exited, code=exited status=1
2016-09-19T20:25:11.352662+02:00 Server systemd[1]: Failed to start OpenLDAP Server Daemon.
2016-09-19T20:25:11.352780+02:00 Server systemd[1]: Unit slapd.service entered failed state.
2016-09-19T20:25:12.280279+02:00 Server ntpd[1793]: ntpd 4.2.8p8@1.3265-o Mon Jun 13 15:58:22 UTC 2016 (1): Starting
2016-09-19T20:25:12.280417+02:00 Server ntpd[1793]: Command line: /usr/sbin/ntpd -p /var/run/ntp/ntpd.pid -g -u ntp:ntp -c /etc/ntp.conf
2016-09-19T20:25:12.280521+02:00 Server ntpd[1889]: proto: precision = 0.047 usec (-24)
2016-09-19T20:25:12.281538+02:00 Server clamd[1323]: ERROR: Can't get information about user vscan.
2016-09-19T20:25:12.408859+02:00 Server openvpn[1820]: failed to find GID for group nogroup
2016-09-19T20:25:12.409015+02:00 Server openvpn[1820]: Exiting due to fatal error
2016-09-19T20:25:12.434617+02:00 Server ntpd[1889]: switching logging to file /var/log/ntp
2016-09-19T20:25:13.035960+02:00 Server systemd[1]: 
2016-09-19T20:25:13.052351+02:00 Server smbd[1906]:   SamInfo3_for_guest: Unable to locate guest account [nobody]!
2016-09-19T20:25:13.052459+02:00 Server smbd[1906]: [2016/09/19 20:25:13.052157,  0] ../source3/auth/auth_util.c:826(make_new_session_info_guest)
2016-09-19T20:25:13.052561+02:00 Server smbd[1906]:   get_guest_info3 failed with NT_STATUS_NO_SUCH_USER
2016-09-19T20:25:13.052664+02:00 Server smbd[1906]: [2016/09/19 20:25:13.052177,  0] ../source3/smbd/server.c:1496(main)
2016-09-19T20:25:13.052765+02:00 Server smbd[1906]:   ERROR: failed to setup guest info.
2016-09-19T20:25:13.053517+02:00 Server systemd[1]: smb.service: main process exited, code=exited, status=255/n/a
2016-09-19T20:25:13.053687+02:00 Server systemd[1]: Failed to start Samba SMB Daemon.
2016-09-19T20:25:13.053807+02:00 Server systemd[1]: Unit smb.service entered failed state.
2016-09-19T20:25:13.722044+02:00 Server mysql-systemd-helper[1801]: 160919 20:25:13 [ERROR] Fatal error: Can't change to run as user 'mysql' ;  Please check that the user exists!
2016-09-19T20:25:14.118021+02:00 Server mysql-systemd-helper[1801]: 160919 20:25:13 [ERROR] Aborting
2016-09-19T20:25:14.118109+02:00 Server mysql-systemd-helper[1801]: 160919 20:25:13 [Note] /usr/sbin/mysqld: Shutdown complete
2016-09-19T20:25:14.118163+02:00 Server systemd[1]: mysql.service: main process exited, code=exited, status=1/FAILURE
2016-09-19T20:25:14.253390+02:00 Server systemd[1]: Failed to start NTP Server Daemon.
2016-09-19T20:25:14.253563+02:00 Server systemd[1]: Unit ntpd.service entered failed state.
2016-09-19T20:25:15.550898+02:00 Server display-manager[1927]: Failed to set keymap: The name org.freedesktop.locale1 was not provided by any .service files
2016-09-19T20:25:15.611533+02:00 Server systemd[1]: Received SIGRTMIN+21 from PID 267 (plymouthd).
2016-09-19T20:25:16.835590+02:00 Server systemd[1]: amavis.service: main process exited, code=exited, status=255/n/a
2016-09-19T20:25:16.984711+02:00 Server systemd[1]: PID file /var/run/displaymanager.pid not readable (yet?) after start.
2016-09-19T20:25:17.061690+02:00 Server systemd[1]: amavis.service: control process exited, code=exited status=255
2016-09-19T20:25:17.065395+02:00 Server systemd[1]: Failed to start Amavisd-new Virus Scanner interface.
2016-09-19T20:25:17.065580+02:00 Server systemd[1]: Unit amavis.service entered failed state.
2016-09-19T20:25:17.691797+02:00 Server start_apache2[2011]: AH00543: httpd-prefork: bad user name wwwrun
2016-09-19T20:25:17.704985+02:00 Server systemd[1]: apache2.service: control process exited, code=exited status=1
2016-09-19T20:25:17.705188+02:00 Server systemd[1]: Failed to start The Apache Webserver.
2016-09-19T20:25:17.705351+02:00 Server systemd[1]: Unit apache2.service entered failed state.
2016-09-19T20:25:20.680433+02:00 Server sddm[1999]: Failed to find the sddm user. Owner of the auth file will not be changed.
2016-09-19T20:25:20.685859+02:00 Server sddm-helper[2047]: [PAM] openSession: Cannot make/remove an entry for the specified session
2016-09-19T20:25:20.686020+02:00 Server sddm[1999]: Error from greeter session: "Cannot make/remove an entry for the specified session"
2016-09-19T20:25:20.686343+02:00 Server sddm-helper[2047]: [PAM] Ended.
2016-09-19T20:25:20.686498+02:00 Server sddm-helper: pam_loginuid(sddm-greeter:session): error: login user-name 'sddm' does not exist
2016-09-19T20:25:20.686649+02:00 Server sddm-helper: pam_unix(sddm-greeter:session): session opened for user sddm by (uid=0)
2016-09-19T20:25:20.686787+02:00 Server sddm-helper: pam_umask(sddm-greeter:session): account for sddm not found
2016-09-19T20:25:20.686851+02:00 Server sddm-helper: pam_systemd(sddm-greeter:session): Failed to get user data.
2016-09-19T20:25:20.687048+02:00 Server sddm[1999]: Auth: sddm-helper exited with 2
2016-09-19T20:25:20.687260+02:00 Server sddm[1999]: Greeter stopped.
2016-09-19T20:25:33.589648+02:00 Server docker[1338]: time="2016-09-19T20:25:33.589501748+02:00" level=warning msg="Your kernel does not support swap memory limit."
2016-09-19T20:25:33.590230+02:00 Server docker[1338]: time="2016-09-19T20:25:33.589609683+02:00" level=warning msg="Your kernel does not support kernel memory limit."
2016-09-19T20:25:45.857082+02:00 Server login: gkr-pam: error looking up user information
2016-09-19T20:25:46.317158+02:00 Server login: pam_unix(login:auth): check pass; user unknown
2016-09-19T20:25:46.317515+02:00 Server login: pam_unix(login:auth): authentication failure; logname=LOGIN uid=0 euid=0 tty=tty1 ruser= rhost=
2016-09-19T20:25:47.951032+02:00 Server login: FAILED LOGIN 1 FROM tty1 FOR (unknown), User not known to the underlying authentication module
2016-09-19T20:25:50.661289+02:00 Server login: gkr-pam: error looking up user information
2016-09-19T20:25:51.260884+02:00 Server kernel: [   79.102905] docker0: port 1(vethecdfa3a) entered forwarding state
2016-09-19T20:25:56.221286+02:00 Server login: pam_unix(login:auth): check pass; user unknown
2016-09-19T20:25:58.462505+02:00 Server login: FAILED LOGIN 2 FROM tty1 FOR (unknown), User not known to the underlying authentication module
2016-09-19T20:26:03.325427+02:00 Server login: gkr-pam: error looking up user information
2016-09-19T20:26:10.653253+02:00 Server login: pam_unix(login:auth): check pass; user unknown
2016-09-19T20:26:11.999733+02:00 Server mysql-systemd-helper[1802]: MySQL is still dead
2016-09-19T20:26:12.000224+02:00 Server systemd[1]: mysql.service: control process exited, code=exited status=1
2016-09-19T20:26:12.000848+02:00 Server systemd[1]: Failed to start MySQL server.
2016-09-19T20:26:12.001339+02:00 Server systemd[1]: Unit mysql.service entered failed state.
2016-09-19T20:26:12.101930+02:00 Server systemd[1]: mysql.service holdoff time over, scheduling restart.
2016-09-19T20:26:12.738983+02:00 Server login: TOO MANY LOGIN TRIES (3) FROM tty1 FOR root, Have exhausted maximum number of retries for service
2016-09-19T20:26:12.739349+02:00 Server login: PAM 2 more authentication failures; logname=LOGIN uid=0 euid=0 tty=tty1 ruser= rhost=
2016-09-19T20:26:13.138149+02:00 Server systemd[1]: postfix.service: control process exited, code=exited status=1
2016-09-19T20:26:13.138655+02:00 Server systemd[1]: Failed to start Postfix Mail Transport Agent.
2016-09-19T20:26:13.139052+02:00 Server systemd[1]: Unit postfix.service entered failed state.
2016-09-19T20:26:13.139429+02:00 Server systemd[1]: Starting Command Scheduler...
2016-09-19T20:26:13.140895+02:00 Server systemd[1]: Started Command Scheduler.
2016-09-19T20:26:13.146688+02:00 Server cron[2296]: (CRON) INFO (RANDOM_DELAY will be scaled with factor 11% if used.)
2016-09-19T20:26:13.147462+02:00 Server cron[2296]: (CRON) bad username (/etc/crontab)
2016-09-19T20:26:13.147891+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/spamcheck)
2016-09-19T20:26:13.148211+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/owncloud)
2016-09-19T20:26:13.148506+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/spam)
2016-09-19T20:26:13.149087+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/sys_state)
2016-09-19T20:26:13.149432+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/munin)
2016-09-19T20:26:13.150833+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/mdadm)
2016-09-19T20:26:13.151138+02:00 Server cron[2296]: (CRON) bad username (/etc/cron.d/mdadm)
2016-09-19T20:26:21.579329+02:00 Server apcupsd[1786]: Cannot associate a name with uid 0
2016-09-19T20:26:21.583212+02:00 Server apcupsd[1786]: send-mail: fatal: file /etc/postfix/main.cf: parameter default_privs: unknown user name value: nobody
2016-09-19T20:26:21.583630+02:00 Server apcupsd[1786]: wall: cannot get tty name: Inappropriate ioctl for device

What I see is a lot of “user unknown” errors.
In the log entries from around the time the problem first occurred, I have thousands of entries within a few seconds like these:

2016-09-18T21:46:17.233901+02:00 Server slapd[21271]: warning: cannot open /etc/hosts.deny: Permission denied
2016-09-18T21:46:17.233984+02:00 Server slapd[21271]: warning: cannot open /etc/hosts.allow: Permission denied

And also a few like these:

2016-09-18T21:47:36.567040+02:00 Server httpd-prefork[29941]: [core:error] [pid 29941] (13)Permission denied: [client 172.17.0.2:50304] AH00035: access to / denied (filesystem path '/srv') because search permissions are missing on a component of the path
2016-09-18T21:47:51.217342+02:00 Server dovecot: imap(torsten): Error: stat(/mails/new) failed: Permission denied
2016-09-18T21:47:51.219988+02:00 Server dovecot: imap(torsten): Error: stat(/mails/dovecot.index.log) failed: Permission denied (euid=1010(<unknown>) egid=100(<unknown>) stat() failed: No such file or directory)
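Both kinds of messages above (slapd unable to open /etc/hosts.deny, Apache reporting that search permission is missing on a component of /srv) point at file-mode bits somewhere along a path. A minimal Python sketch, with hypothetical helper names, that walks every component of a path and flags directories that "others" cannot search and a final file that "others" cannot read. It assumes the daemons reach these paths as "other" (neither owner nor group member), which may not hold for every file. From the rescue system, pass the path under the mount point of the system disk (for example /mnt/etc/hosts.deny, assuming it is mounted at /mnt); the mount point directory itself is then checked as well.

```python
# Sketch only: hypothetical helper names, not an existing tool.
import os
import stat
import sys
from typing import Callable, Iterator


def components(path: str) -> Iterator[str]:
    """Yield '/', '/srv', '/srv/www', ... for an absolute path."""
    yield "/"
    current = ""
    for part in path.split("/"):
        if part:  # skip empty pieces from leading, trailing or doubled slashes
            current += "/" + part
            yield current


def permission_problems(
    path: str, mode_of: Callable[[str], int] = lambda p: os.stat(p).st_mode
) -> list[str]:
    """Report directories others cannot search and a final file others cannot read."""
    problems = []
    comps = list(components(path))
    for comp in comps:
        mode = mode_of(comp)
        if stat.S_ISDIR(mode) and not mode & stat.S_IXOTH:
            problems.append(f"{comp}: directory lacks o+x (search)")
    final_mode = mode_of(comps[-1])
    if stat.S_ISREG(final_mode) and not final_mode & stat.S_IROTH:
        problems.append(f"{comps[-1]}: file lacks o+r (read)")
    return problems


if __name__ == "__main__":
    # Usage (assumed): python3 check_perms.py /etc/hosts.deny /srv/www
    for target in sys.argv[1:]:
        for problem in permission_problems(target):
            print(problem)
```

The `mode_of` parameter only exists so the logic can be exercised with made-up modes; normally it calls `os.stat`, which follows symlinks.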

To me it looks like the system has forgotten all users.
The passwd and shadow files look fine.
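Files that look fine in an editor do not by themselves show that user lookups succeed: programs resolve names through libc's getpwnam(), which goes through NSS. A minimal Python sketch, with hypothetical helper names, that parses a passwd-format file and checks every entry through Python's `pwd` module, which as far as I know uses the same libc lookup. Note that it checks the NSS setup of whatever system it runs on, so running it in the rescue system says nothing about the installed system; it is only meaningful on the broken system itself or inside the chroot. If every account came back unresolved there, that would point at the lookup layer rather than at the file contents.

```python
# Sketch only: hypothetical helper names, not an existing tool.
import pwd
import sys
from typing import Callable, Iterable


def parse_passwd(lines: Iterable[str]) -> dict[str, int]:
    """Map user name -> UID from passwd(5)-format lines."""
    users: dict[str, int] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # skip malformed lines and NIS "+"/"-" compat entries
        users[fields[0]] = int(fields[2])
    return users


def unresolved(
    users: dict[str, int],
    uid_of: Callable[[str], int] = lambda name: pwd.getpwnam(name).pw_uid,
) -> list[str]:
    """Return names whose lookup fails or returns a different UID."""
    bad = []
    for name, uid in sorted(users.items()):
        try:
            if uid_of(name) != uid:
                bad.append(name)
        except KeyError:  # pwd.getpwnam raises KeyError for unknown users
            bad.append(name)
    return bad


if __name__ == "__main__":
    # Usage (assumed): python3 check_passwd.py /etc/passwd
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            print(path, "unresolved:", unresolved(parse_passwd(f)) or "none")
```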

What could be the reason for this behavior, and in which direction should I investigate?
Is there any other information you need?

Best
Torsten

You should start by posting a full description of your layout, i.e.:

  • Are your root partition and, if separate, your home partition installed on sdb or on something else?
  • Did you install using LVM or not?
  • Are any volumes encrypted?
  • What is the underlying physical disk system, sata only or sas or scsi?
  • Often the BIOS can run a disk recognition check; make sure your disks are recognized properly.
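Besides the BIOS check, the kernel's own view of the disks can be read on any running Linux system, including the rescue system, from /proc/partitions, whose columns are major number, minor number, size in 1 KiB blocks, and name. A small Python sketch, with a hypothetical helper name, that lists each entry with its size in GB so it can be compared with the expected layout (device names such as sda are simply whatever the kernel assigned).

```python
# Sketch only: hypothetical helper name, not an existing tool.
import sys


def parse_partitions(text: str) -> list[tuple[str, int]]:
    """Return (name, size in bytes) for each entry of /proc/partitions."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines have 4 fields: major, minor, #blocks (1 KiB units), name.
        # The header line also has 4 fields but does not start with a number.
        if len(fields) == 4 and fields[0].isdigit():
            entries.append((fields[3], int(fields[2]) * 1024))
    return entries


if __name__ == "__main__":
    # Usage (assumed): python3 list_disks.py /proc/partitions
    for path in sys.argv[1:]:
        with open(path, encoding="ascii") as f:
            for name, size in parse_partitions(f.read()):
                print(f"{name:10} {size / 1e9:8.1f} GB")
```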

Only <after> you have verified that you don’t have a physical problem <might> you consider using snapper to roll back a few snapshots.

TSU

Hello TSU,

thanks for the reply.

About my configuration:

  • All disks are SATA, nothing is encrypted, and all parts are standard PC components, no server equipment.
  • The root fs is installed on a single 360GB disk with a btrfs filesystem. This system partition contains nearly the whole system. A separate partition on that disk is used for swap.
  • There is only one additional folder, “/data”, which lives on a RAID5 of three 2TB disks with ext3, where I store larger amounts of data like videos, photos, etc. Most services store their data on the system disk, and I copy it regularly to the RAID as a “backup”. So only Samba uses the /data folder directly, and the rest of the services would work fine even without it.
  • The BIOS shows all four disks with the correct size, vendor and type.
  • With the rescue system I checked the system partition with “btrfs check”. This is the result:
Checking filesystem on /dev/sda2
UUID: ab3720eb-398b-4d9c-9043-c56fd4a53195
found 37673144388 bytes used err is 0
total csum bytes: 27746756
total tree bytes: 1248985088
total fs tree bytes: 1125236736
total extent tree bytes: 82149376
btree space waste bytes: 217408713
file data blocks allocated: 75939344384
 referenced 40292900864
btrfs-progs v4.1.2+20151002

With the rescue system I can mount the system partition and inspect it. I wandered through the different folders a bit; everything looks normal, and I could open all the files I tested.

Best
Torsten

Does anybody have an idea why the system doesn’t know the users anymore?
Does anybody know a good site that explains how Linux verifies whether a user exists? Maybe I’ll find an idea there.
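On glibc-based systems such as openSUSE, a user lookup like getpwnam() does not read /etc/passwd directly: glibc first consults the `passwd:` line in /etc/nsswitch.conf to decide which sources (for example `files`, `compat` or `ldap`) to ask and in which order (see nsswitch.conf(5)); group lookups use the `group:` line. A damaged nsswitch.conf or a missing NSS module could therefore make every user "unknown" even with intact passwd and shadow files. A minimal Python sketch, with a hypothetical helper name, that prints the configured sources, so the installed system's file can be compared with the rescue system's.

```python
# Sketch only: hypothetical helper name, not an existing tool.
import re
import sys

ACTION_ITEM = re.compile(r"\[[^\]]*\]")  # e.g. [NOTFOUND=return]


def nss_sources(conf_text: str, database: str) -> list[str]:
    """Return the service names configured for `database` in nsswitch.conf text."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        name, sep, rest = line.partition(":")
        if sep and name.strip() == database:
            # Remove [STATUS=ACTION] items, keep only the service names.
            return ACTION_ITEM.sub(" ", rest).split()
    return []  # no line for this database


if __name__ == "__main__":
    # Usage (assumed): python3 nss_sources.py /etc/nsswitch.conf /mnt/etc/nsswitch.conf
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        for db in ("passwd", "group", "shadow"):
            print(f"{path}: {db}: {' '.join(nss_sources(text, db)) or '(none)'}")
```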

How do I use snapper to roll back to a snapshot if I can’t get into the system?

Best

There is not enough information to guess. Since you can post logs, your system can write. Boot your system and try any command that returns information about users, like “getent passwd mysql” or “id mysql”, for any user that is logged as non-existent. If it fails to return correct information, run it under strace to see where it fails. This may give you a starting point.
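Instead of trying names one at a time, the user names that the boot log reports as missing can be collected and each checked with essentially the same NSS lookup that getent uses. A minimal Python sketch, with hypothetical helper names; the regular expressions are modelled only on the messages quoted in this thread, so they are assumptions and will miss other wordings. Python's `pwd.getpwnam` raises `KeyError` when a user cannot be found, which roughly corresponds to “getent passwd NAME” printing nothing. Like getent, it only tells you something when run on the affected system.

```python
# Sketch only: hypothetical helper names; patterns are assumptions based on this log.
import pwd
import re
import sys
from typing import Callable, Iterable

PATTERNS = [
    re.compile(r"invalid user: [‘']([\w.-]+):"),             # chown (mysql, ldap)
    re.compile(r"No passwd entry for user (\S+)"),           # slapd
    re.compile(r"bad user name (\S+)"),                      # apache2
    re.compile(r"information about user ([\w-]+)"),          # clamd
    re.compile(r"default_login_user doesn't exist: (\S+)"),  # dovecot
    re.compile(r"guest account \[([^\]]+)\]"),               # smbd
    re.compile(r"user-name '([^']+)' does not exist"),       # pam_loginuid
]


def users_from_log(lines: Iterable[str]) -> set[str]:
    """Collect user names mentioned in 'unknown user' style log messages."""
    names: set[str] = set()
    for line in lines:
        for pattern in PATTERNS:
            names.update(pattern.findall(line))
    return names


def check(
    names: Iterable[str], lookup: Callable[[str], object] = pwd.getpwnam
) -> dict[str, bool]:
    """Map each name to True if the lookup finds it, False on KeyError."""
    result = {}
    for name in sorted(names):
        try:
            lookup(name)
            result[name] = True
        except KeyError:
            result[name] = False
    return result


if __name__ == "__main__":
    # Usage (assumed): python3 missing_users.py /var/log/messages
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            for name, found in check(users_from_log(f)).items():
                print(f"{name}: {'found' if found else 'NOT FOUND'}")
```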

How do I use snapper to roll back to a snapshot if I can’t get into the system?

You mean after reading the documentation? Do you have a specific question?

Hello arvidjaar,

thanks for the reply.

The problem is that I cannot get into the system when I boot normally, because the system does not accept any user information.

So I can only start a rescue system from a USB stick and look from there, or use chroot. But there everything seems normal. “getent passwd mysql” also works perfectly fine there.

All the snapper documentation I found assumes that you are logged into the system. I didn’t find anything for the case where you only have a rescue system. Just using chroot does not seem to be a solution, because I am told that the config for root is missing.

So I am grateful for any hint about what I can do.

Best

Not even as root? Can you log in at runlevel 1? What happens if you boot with “init=/bin/sh” to skip system initialization entirely?

All the snapper documentation I found assumes that you are logged into the system.

Yes: you boot into one of the previous read-only snapshots. Did you try it? Does it also fail?