Dovecot auth fails, unable to send e-mail

I opened this thread a while ago. I figured all was well and moved on with live. Now I’m stuck once more and debugging is driving me insane.

My e-mail server is failing to send email. It will error out after a 10s delay for SASL_AUTH. Running swaks from the server throws:

postfix/submission/smtpd[32677]: warning: localhost[127.0.0.1]: SASL LOGIN authentication failed: Connection lost to authentication server, sasl_username=(unavailable)
dovecot[14258]: auth: Error: auth client 0 disconnected with 1 pending requests: EOF

I have been debugging this issue for days now and there is something not correct when dovecot wants to authenticate:

sudo time doveadm auth test user@domain.ca
Password: 
passdb: user@domain.ca auth succeeded
extra fields:
  user=user@domain.ca
  
0.02user 0.02system 0:32.23elapsed 0%CPU (0avgtext+0avgdata 7800maxresident)k

It takes 32 seconds to do something, but there is no indication as to what. Executing the SQL to return the username from the database from the CLI is near instant, that is also what I have in the log with a few msec.

I have verified and compared the permission for /var/run/ and all those are correct:

ls -ld /var/run/dovecot
drwxr-xr-x 5 root root 800 Dec  8 15:51 /var/run/dovecot

ls -l /var/run/dovecot/auth-userdb
srw-rw---- 1 postman mail 0 Dec  8 15:51 /var/run/dovecot/auth-userdb

Spamassassin does not appear to be an issue as well. Though it complaints in the log, while the permissions are right and the spamd user belongs to the correct group:

getent passwd spamd
spamd:x:1001:1000::/var/log/spamassassin:/bin/false

id spamd
uid=1001(spamd) gid=1000(spamd) groups=498(mail),150(postman),1000(spamd)

However journalctl shows:

lda(user@domain.ca)<12223><>: Error: auth-master: userdb lookup(user@domain.ca): connect(/var/run/dovecot/auth-userdb) failed: Permission denied (euid=1001(spamd) egid=1000(spamd) missing +r perm: /var/run/dovecot/auth-userdb, we're not in group 498(mail), dir owned by 0:0 mode=0755)
Dec 08 15:22:41 postoffice dovecot[12223]: lda(12223): Fatal: Internal error

Executing a simple test to make sure spamd is not the issue:

runuser -u spamd -- test -r /var/run/dovecot/auth-userdb && echo "ok" || echo "fail"
ok

And the spamd response is near instant.

My issues have arrived since a MySQL CVE update applied to my Leap install. It seems that there is an issue with dovecot and lda, but I’m lost on how to solve that hopefully the community can give some guidance. What could this time-out be?

I have been able to resolve the lda error from spamd, by changing the command in the main.cf file to use my dovecot user and not spamd as user. I still am stuck on the time out messages, which are unclear where they come from.

I have been able to get my mail server to mail again, but with a work-around and not the preferred configuration. I have taken out the sasl_authenticator and added my ip to the mynetworks variable in main.cf. I can now send, but this is not a proper solution.

After some time has passed without interaction with the server. I’m able to send maybe 1 or 2. Search suggests that the pool is not replenished fast enough. However I have Haveged already running and enabled. Fail2ban does not ban the IP used by my client, I verified the fail2ban log for that.

Is there a way to debug the auth-worker process, as postfix just throws that time-out after 10 seconds. Where could I look for the configuration or the process that triggers a time-out on the auth-worker?

Stab in the dark - once upon the time there was similar problem with Acroread on Linux which used /dev/random instead of the /dev/urandom and could hang for a very long time.

So this was really hard to debug. Dovecot does not really tell you what is going on even with all debugging enabled and the SASL timeout throws you all over the place to check configuration and what not. It turns out that during authentication the auth -w process uses 100% CPU. As the hash is in ARGON that is not unexpected.

It turned out that my hash was overkill with {ARGON2I}$argon2i$v=19$m=524288,t=12,p=. The t= value needs to come down. This can be done by setting a parameter, however opensuse sets that by default to t=4 and yet I saw t=12. Turns out in my config.local.php for Postfix Admin I had the line CONF['dovecotpw'] = "/usr/bin/doveadm pw -r 12";, because I apparently thought you can’t be secure enough?

I removed that line, regenerated my hashes through Postfix Admin, re-instated my server configuration and it is swift again in sending.

> sudo time doveadm auth test user@domain.caPassword: 
> passdb: user@domain.ca auth succeeded
> extra fields:
>   user=user@domain.ca
>   
> 0.01user 0.01system 0:03.12elapsed 0%CPU (0avgtext+0avgdata 7680maxresident)k

The reason this now popped out I think has to do with an update from openSUSE in the september - november time-frame.