AD authentication doesn't work on domain joined system using sssd method

I discovered this issue on MicroOS, then tested on Tumbleweed and found that it applies there too. In my lab setup I have Active Directory installed on a Windows Server 2019 VM, as well as openSUSE Leap 15.3 and Tumbleweed machines. On Leap I successfully joined the domain using this method and could log in as a domain user through both GDM and SSH without any issues. On Tumbleweed I also joined the domain successfully, but this time I cannot log in. I tried both YaST (with and without SAMBA auth) and a manual join according to this doc. I made sure that all necessary packages were installed and compared the configuration files between Tumbleweed and Leap, but I can't find the reason why it doesn't work. The sssd service is up and running, but there are warnings about the backend being offline. I also checked all networking-related settings (NTP, DNS, firewall); I can ping and resolve both my domain name and the DC server.

Thanks for help, BK

Detailed log from /var/log/sssd/sssd_lab.local.log:

********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-06 20:21:36): [be[lab.local]] [child_callback] (0x0020): LDAP child was terminated due to timeout
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_kinit_done] (0x0080): Communication with KDC timed out, trying the next one
   *  (2022-02-06 20:21:36): [be[lab.local]] [_be_fo_set_port_status] (0x8000): Setting status: PORT_NOT_WORKING. Called from: src/providers/ldap/sdap_async_connection.c: sdap_kinit_done: 1239
   *  (2022-02-06 20:21:36): [be[lab.local]] [fo_set_port_status] (0x0100): Marking port 389 of server 'dc.lab.local' as 'not working'
   *  (2022-02-06 20:21:36): [be[lab.local]] [fo_set_port_status] (0x0400): Marking port 389 of duplicate server 'dc.lab.local' as 'not working'
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_kinit_next_kdc] (0x1000): Resolving next KDC for service AD
   *  (2022-02-06 20:21:36): [be[lab.local]] [fo_resolve_service_send] (0x0100): Trying to resolve service 'AD'
   *  (2022-02-06 20:21:36): [be[lab.local]] [get_server_status] (0x1000): Status of server 'dc.lab.local' is 'name resolved'
   *  (2022-02-06 20:21:36): [be[lab.local]] [get_port_status] (0x1000): Port status of port 389 for server 'dc.lab.local' is 'not working'
   *  (2022-02-06 20:21:36): [be[lab.local]] [get_port_status] (0x0080): SSSD is unable to complete the full connection request, this internal status does not necessarily indicate network port issues.
   *  (2022-02-06 20:21:36): [be[lab.local]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
********************** BACKTRACE DUMP ENDS HERE *********************************

(2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_connect_recv] (0x0040): Unable to establish connection [13]: Permission denied
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_resolve_server_done] (0x1000): Server [NULL] resolution failed: [5]: Input/output error
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_kinit_done] (0x0400): Cannot get a TGT: ret [1432158230](Network I/O Error)
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_connect_recv] (0x0040): Unable to establish connection [13]: Permission denied
********************** BACKTRACE DUMP ENDS HERE *********************************

(2022-02-06 20:21:36): [be[lab.local]] [fo_resolve_service_send] (0x0020): No available servers for service 'AD'
   *  ... skipping repetitive backtrace ...
(2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_done] (0x0040): Failed to connect, going offline (5 [Input/output error])
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_done] (0x4000): attempting failover retry on op #2
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_step] (0x4000): waiting for connection to complete
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_release_conn_data] (0x4000): Releasing unused connection with fd -1]
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_resolve_server_done] (0x1000): Server [NULL] resolution failed: [5]: Input/output error
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_done] (0x0040): Failed to connect, going offline (5 [Input/output error])
********************** BACKTRACE DUMP ENDS HERE *********************************

(2022-02-06 20:21:36): [be[lab.local]] [sdap_sudo_refresh_connect_done] (0x0020): SUDO LDAP connection failed [11]: Resource temporarily unavailable
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_mark_offline] (0x2000): Going offline!
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_mark_offline] (0x2000): Initialize check_if_online_ptask.
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_ptask_create] (0x0400): Periodic task [Check if online (periodic)] was created
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_ptask_schedule] (0x0400): Task [Check if online (periodic)]: scheduling task 73 seconds from now [1644175369]
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_run_offline_cb] (0x0080): Going offline. Running callbacks.
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_done] (0x4000): notify offline to op #1
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_sudo_refresh_connect_done] (0x0020): SUDO LDAP connection failed [11]: Resource temporarily unavailable
********************** BACKTRACE DUMP ENDS HERE *********************************

(2022-02-06 20:21:36): [be[lab.local]] [be_ptask_done] (0x0040): Task [SUDO Full Refresh]: failed with [11]: Resource temporarily unavailable
(2022-02-06 20:21:36): [be[lab.local]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
********************** PREVIOUS MESSAGE WAS TRIGGERED BY THE FOLLOWING BACKTRACE:
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_ptask_done] (0x0040): Task [SUDO Full Refresh]: failed with [11]: Resource temporarily unavailable
   *  (2022-02-06 20:21:36): [be[lab.local]] [be_ptask_schedule] (0x0400): Task [SUDO Full Refresh]: scheduling task 21600 seconds from now [1644196896]
   *  (2022-02-06 20:21:36): [be[lab.local]] [sdap_id_op_connect_done] (0x4000): notify offline to op #2
   *  (2022-02-06 20:21:36): [be[lab.local]] [ad_subdomains_refresh_connect_done] (0x0020): Unable to connect to LDAP [11]: Resource temporarily unavailable
********************** BACKTRACE DUMP ENDS HERE *********************************
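
The backtraces in this log repeat the same failures several times, which makes the underlying errors easy to miss. As a rough aid, the Python sketch below extracts the timestamp, component, function name, and debug level from each line and counts only the failure-level messages. The regular expression, the helper names (`parse_line`, `summarize_failures`), and the 0x0080 cut-off are assumptions made for illustration, based on the line format visible in this log and on the debug level masks described in sssd.conf(5). It is not an SSSD tool.

```python
import re
from collections import Counter

# Assumed line layout, taken from the log excerpt in this post:
# (timestamp): [component] [function] (0xLEVEL): message
LOG_LINE = re.compile(
    r"\((?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\): "
    r"\[(?P<component>\S+)\] \[(?P<function>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): (?P<message>.*)"
)

# Assumption: masks up to 0x0080 are the failure levels
# (fatal, critical, serious, minor); higher masks are tracing.
FAILURE_MAX_LEVEL = 0x0080


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = LOG_LINE.search(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["level"] = int(entry["level"], 16)
    return entry


def summarize_failures(lines):
    """Count failure-level messages, keyed by (function, message)."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] <= FAILURE_MAX_LEVEL:
            counts[(entry["function"], entry["message"].strip())] += 1
    return counts


# A few lines copied from the log in this post.
SAMPLE = """\
(2022-02-06 20:21:36): [be[lab.local]] [child_callback] (0x0020): LDAP child was terminated due to timeout
(2022-02-06 20:21:36): [be[lab.local]] [sdap_kinit_done] (0x0080): Communication with KDC timed out, trying the next one
(2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_kinit_done] (0x0400): Cannot get a TGT: ret [1432158230](Network I/O Error)
(2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_connect_recv] (0x0040): Unable to establish connection [13]: Permission denied
(2022-02-06 20:21:36): [be[lab.local]] [sdap_cli_connect_recv] (0x0040): Unable to establish connection [13]: Permission denied
"""

for (function, message), count in summarize_failures(SAMPLE.splitlines()).most_common():
    print(f"{count}x {function}: {message}")
```

To run it against the real file, pass the lines of /var/log/sssd/sssd_lab.local.log to `summarize_failures` instead of `SAMPLE`. Reading that file usually requires root.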

bkaczynski@weed:~> nslookup lab.local
Server: 192.168.122.61
Address: 192.168.122.61#53

Name: lab.local
Address: 192.168.122.61

bkaczynski@weed:~> hostname --fqdn
weed.lab.local
bkaczynski@weed:~> ping -c 3 lab.local
PING lab.local (192.168.122.61) 56(84) bytes of data.
64 bytes from lab.local (192.168.122.61): icmp_seq=1 ttl=128 time=0.359 ms
64 bytes from lab.local (192.168.122.61): icmp_seq=2 ttl=128 time=0.895 ms
64 bytes from lab.local (192.168.122.61): icmp_seq=3 ttl=128 time=1.03 ms

--- lab.local ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2014ms
rtt min/avg/max/mdev = 0.359/0.762/1.033/0.290 ms
bkaczynski@weed:~> telnet lab.local 53
Trying 192.168.122.61...
Connected to lab.local.
Escape character is '^]'.
^]
telnet> Connection closed.
bkaczynski@weed:~> telnet lab.local 88
Trying 192.168.122.61...
Connected to lab.local.
Escape character is '^]'.
^]
telnet> Connection closed.
bkaczynski@weed:~> telnet lab.local 389
Trying 192.168.122.61...
Connected to lab.local.
Escape character is '^]'.
^]
telnet> Connection closed.
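
The three telnet checks above can be repeated in one go with a short script. The following is a minimal Python sketch, assuming plain TCP reachability is all that needs to be confirmed; the function names `tcp_port_open` and `check_ports` and the 3-second timeout are made up for this example. As the SSSD log itself notes, a port marked 'not working' does not necessarily indicate a network port issue, so open ports only rule out basic connectivity and firewall problems.

```python
import socket
import sys

# Ports tested by hand in this post, with the services that use them.
AD_PORTS = {53: "DNS", 88: "Kerberos", 389: "LDAP"}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_ports(host, ports=AD_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


# Only runs when a host name is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    target = sys.argv[1]
    for port, is_open in check_ports(target).items():
        state = "open" if is_open else "unreachable"
        print(f"{target}:{port} ({AD_PORTS[port]}): {state}")
```

Saved under a hypothetical name such as check_ports.py, it would be run as `python3 check_ports.py lab.local`.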

UPDATE:
Another experiment: I created another VM running Leap, enrolled it in AD, and logged in with a domain account in GDM as well. Then I upgraded it to Tumbleweed, and afterwards I saw the same problem. I didn't find similar sssd/samba/AD issues on Bugzilla, so it's probably not a global bug. If anyone has had a similar issue and found a resolution, please share it.

I am experiencing the same issue. SSSD just cycles endlessly through my domain controllers, marking each one as not working.
The only solution I found was to roll back the update. I had been lax about keeping the system updated, so I had to roll back to 20210823, but the sssd-2.5.2-1.1 included with that release works.

I attempted to add package locks on the sssd packages and try to update everything else, but that failed, too – sssd couldn’t find the LDB library and thus couldn’t open its own database files.

Yes, indeed. 2.5.2 is the version currently included in Leap 15.4 Alpha, and authentication works on that system. 2.6.2 is the current version on Tumbleweed and almost the newest according to the project site.

OK, so there is light at the end of the tunnel. I added the experimental network:ldap repo, where the sssd package is at the newest version, 2.6.3. After zypper dup --allow-vendor-change I can successfully log in with a domain account :nerd:

It looks like 2.6.3 has been released to the main Tumbleweed repos. I see this in the RPM changelog:

* Tue Jan 25 2022 Jan Engelhardt <jengelh@inai.de> 
- Update to release 2.6.3 
  * A regression introduced in sssd-2.6.2 in the IPA provider 
    that prevented users from login was fixed. Access control 
    always denied access because the selinux_child returned an 
    unexpected reply. 
  * A critical regression that prevented authentication of users 
    via AD and IPA providers was fixed. LDAP port was reused for 
    Kerberos communication and this provider would send 
    incomprehensible information to this port. 
  * When authenticating AD users, backtrace was triggered even 
    though everything was working correctly. This was caused by a 
    search in the global catalog. Servers from the global catalog 
    are filtered out of the list before writing the KDC info 
    file. With this fix, SSSD does not attempt to write to the 
    KDC info file when performing a GC lookup.