infinite loop on start if misconfigured

Bug #1889196 reported by Ian Collier

This bug affects 13 people
Affects        Status      Importance  Assigned to  Milestone
sssd (Ubuntu)  Incomplete  Medium      Unassigned

Bug Description

Suppose sssd has a valid sssd.conf file but no /etc/krb5.keytab (maybe the
keytab disappeared or got corrupted; or maybe ipa-client-install encountered
an error and didn't create one, but nevertheless your custom installer put
the sssd.conf in place - note that ipa-client-install leaves the sssd service
enabled even if the enrollment failed, although it does try to delete
sssd.conf).
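The state described above (sssd.conf present, keytab absent) can be detected before a reboot. The following is only a minimal sketch: the function name check_sssd_prereqs is hypothetical, the default paths are the ones mentioned in this report, and an empty-file test cannot detect a corrupted keytab.

```shell
#!/bin/sh
# Hypothetical helper: warn when sssd.conf exists but the keytab is
# missing or empty. Run as root so that /etc/sssd is readable.
# Note: this does NOT detect a corrupted (non-empty) keytab.
check_sssd_prereqs() {
    conf="${1:-/etc/sssd/sssd.conf}"
    keytab="${2:-/etc/krb5.keytab}"
    if [ -f "$conf" ] && [ ! -s "$keytab" ]; then
        echo "WARNING: $conf exists but $keytab is missing or empty" >&2
        return 1
    fi
    return 0
}

# Check the default locations and report the result.
if check_sssd_prereqs; then
    echo "no sssd.conf-without-keytab condition found"
else
    echo "sssd is likely to fail on start in this state"
fi
```

Both paths can be passed as arguments, which makes it possible to try the check against copies of the files instead of the live system.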

If I do 'sudo systemctl start sssd' then it will quite reasonably tell me
there was a startup error:

Job for sssd.service failed because the control process exited with error code.
See "systemctl status sssd.service" and "journalctl -xe" for details.

But for some reason it enters a cycle of starting and stopping in the
background.

$ systemctl status sssd
● sssd.service - System Security Services Daemon
     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2020-07-28 09:58:43 UTC; 199ms ago
   Main PID: 7956 (sssd)

$ systemctl status sssd
● sssd.service - System Security Services Daemon
     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2020-07-28 09:59:09 UTC; 5s ago
   Main PID: 8080 (sssd)

$ systemctl status sssd
● sssd.service - System Security Services Daemon
     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
     Active: activating (start) since Tue 2020-07-28 09:59:54 UTC; 1s ago
   Main PID: 8276 (sssd)

This is a problem because if sssd is enabled in systemd and I reboot the machine,
the system never finishes booting: it keeps waiting on the sssd service, which
is retried indefinitely.

[FAILED] Failed to start System Security Services Daemon.
See 'systemctl status sssd.service' for details.
[DEPEND] Dependency failed for SSSD PAM Service responder socket.
[DEPEND] Dependency failed for SSSD PAM Service responder private socket.
[DEPEND] Dependency failed for SSSD NSS Service responder socket.
[DEPEND] Dependency failed for SSSD SSH Service responder socket.
[DEPEND] Dependency failed for SSSD AutoFS Service responder socket.
[DEPEND] Dependency failed for SSSD Sudo Service responder socket.
[DEPEND] Dependency failed for SSSD PAC Service responder socket.
[ OK ] Stopped System Security Services Daemon.
         Starting System Security Services Daemon...
[*** ] A start job is running for System Security Services Daemon (4s / 1min 30s)
[FAILED] Failed to start System Security Services Daemon.
See 'systemctl status sssd.service' for details.
[DEPEND] Dependency failed for SSSD PAM Service responder socket.
[DEPEND] Dependency failed for SSSD PAM Service responder private socket.
[DEPEND] Dependency failed for SSSD NSS Service responder socket.
[DEPEND] Dependency failed for SSSD SSH Service responder socket.
[DEPEND] Dependency failed for SSSD AutoFS Service responder socket.
[DEPEND] Dependency failed for SSSD Sudo Service responder socket.
[DEPEND] Dependency failed for SSSD PAC Service responder socket.
[ OK ] Stopped System Security Services Daemon.
         Starting System Security Services Daemon...
[*** ] A start job is running for System Security Services Daemon (4s / 1min 30s)
...ad infinitum...

So in summary: I acknowledge that this was misconfigured because of user error,
but my contention is that this shouldn't make the system unbootable.

Version tested: sssd 2.2.3-3 under systemd 245.4-4ubuntu3.2 in Ubuntu 20.04.

Changed in sssd (Ubuntu Focal):
status: New → Triaged
Changed in sssd (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Changed in sssd (Ubuntu Focal):
importance: Undecided → Medium
Rafael David Tinoco (rafaeldtinoco) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu
better.

This bug is present in Debian as well and Ubuntu currently does not make any
changes to the Debian package. Therefore, this bug would be best fixed
directly in Debian, and then Ubuntu will pick up the fix automatically.

Would you mind filing a bug with Debian please?

----

Debian's bug tracker works a little differently from Launchpad. As an example,
assuming the source package we are interested in is 'vim', you can find
the Debian bug tracker for vim here:

https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=vim

You can file a bug via email by sending mail to: <email address hidden>
The body of the email needs to contain everything between the --- below:
---
Package: vim
Version: your_version_here

Your details, explanation, and steps to reproduce go below those two lines,
Package and Version, which are required.
---

You can read more about filing bugs in Debian here:
https://www.debian.org/Bugs/Reporting

Hope that helps!

BertN45 (lammert-nijhof) wrote :

I have the same problem in Groovy Gorilla.

Andreas Hasenack (ahasenack) wrote :

Is it really an "infinite" loop? AFAIK this is caused by "Restart=on-failure" in the [Service] section of sssd.service, and systemd will eventually back off and give up on the restarts.

Alternatively, I wonder if it's an effect of the socket activation: NSS calls on the system keep waking up the service, which tries to start and fails. @timo, any opinion?
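One way to check the Restart= hypothesis is to ask systemd for the unit's effective restart policy and start-rate limits. This is a minimal sketch: the function name show_restart_policy is hypothetical, the property names are the ones systemd exposes through "systemctl show", and the values will differ between releases.

```shell
#!/bin/sh
# Hypothetical helper: print the restart policy, restart delay,
# start-rate limits and restart counter that systemd applies to a unit.
show_restart_policy() {
    unit="${1:-sssd.service}"
    systemctl show "$unit" \
        -p Restart -p RestartUSec \
        -p StartLimitBurst -p StartLimitIntervalUSec \
        -p NRestarts
}

# Only query when the system was actually booted with systemd.
if [ -d /run/systemd/system ]; then
    show_restart_policy sssd.service
fi
```

If StartLimitBurst is reached within StartLimitIntervalUSec, systemd stops restarting the unit on its own, which would match a "Start request repeated too quickly" message in the journal. A start triggered from outside the unit, for example by socket activation, is a separate path.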

Andreas Hasenack (ahasenack) wrote :

Ian, can you try disabling the sockets? Something like "systemctl disable sssd-nss.socket sssd-pam.socket" and so on for all *.socket units, leaving just "sssd.service" (note .service) enabled, and then add "services = nss,pam" to the global section of /etc/sssd/sssd.conf. That will tell sssd to start these backend services by itself.
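The suggestion above could be scripted roughly as follows. This is a sketch under assumptions: the helper names run and disable_sssd_sockets are hypothetical, the list of socket units is taken from the boot and journal output in this report (sssd-ssh.socket is inferred from the "SSSD SSH Service responder socket" line), and the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Dry run by default: print each command. Set APPLY=1 to execute them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        printf '+ %s\n' "$*"
    fi
}

# Socket units seen in this report (assumption: the list is complete
# for this system; "systemctl list-unit-files 'sssd-*.socket'" shows
# the units actually installed).
SSSD_SOCKETS="sssd-nss.socket sssd-pam.socket sssd-pam-priv.socket
sssd-ssh.socket sssd-autofs.socket sssd-sudo.socket sssd-pac.socket"

disable_sssd_sockets() {
    for unit in $SSSD_SOCKETS; do
        run systemctl disable "$unit"
    done
}

disable_sssd_sockets

# Afterwards, "services = nss,pam" still has to be added by hand to the
# [sssd] section of /etc/sssd/sssd.conf, as described above.
```

Reviewing the printed commands first and then re-running with APPLY=1 as root keeps the change easy to audit and to revert with "systemctl enable".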

Alexander Fieroch (fieroch) wrote :

I can confirm this bug on 20.04, including the infinite loop. Even if the sssd service is not working, it should be skipped so that at least local users are able to log in!

Emmett Keyser (ekeyser) wrote :

Wow, at least I wasn't doing anything on a production machine. It's been a while since something as catastrophic as not being able to log in has affected one of my machines. In fact, I think all I did was "realm leave", which produced no errors. I rebooted to a borked machine. Awesome.

Pelle Rivberg (rivpelle) wrote :

Same here with Groovy Gorilla (20.10), with all the latest bells'n'whistles. On boot-up, I get multiple failures on SSSD services, and 'sudo systemctl restart sssd' doesn't help either. 'journalctl -xe' says:

-------------------------
░░ A start job for unit sssd-nss.socket has finished with a failure.
░░
░░ The job identifier is 8989 and the job result is dependency.
Nov 12 19:40:11 mymachine systemd[1]: sssd-nss.socket: Job sssd-nss.socket/start failed with result 'dependency'.
Nov 12 19:40:11 mymachine systemd[1]: Dependency failed for SSSD AutoFS Service responder socket.
░░ Subject: A start job for unit sssd-autofs.socket has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit sssd-autofs.socket has finished with a failure.
░░
░░ The job identifier is 8913 and the job result is dependency.
Nov 12 19:40:11 mymachine systemd[1]: sssd-autofs.socket: Job sssd-autofs.socket/start failed with result 'dependency'.
Nov 12 19:40:11 mymachine systemd[1]: Dependency failed for SSSD PAC Service responder socket.
░░ Subject: A start job for unit sssd-pac.socket has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit sssd-pac.socket has finished with a failure.
░░
░░ The job identifier is 8995 and the job result is dependency.
Nov 12 19:40:11 mymachine systemd[1]: sssd-pac.socket: Job sssd-pac.socket/start failed with result 'dependency'.
Nov 12 19:40:11 mymachine systemd[1]: Dependency failed for SSSD Sudo Service responder socket.
░░ Subject: A start job for unit sssd-sudo.socket has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit sssd-sudo.socket has finished with a failure.
░░
░░ The job identifier is 8992 and the job result is dependency.
Nov 12 19:40:11 mymachine systemd[1]: sssd-sudo.socket: Job sssd-sudo.socket/start failed with result 'dependency'.
Nov 12 19:40:11 mymachine systemd[1]: Dependency failed for SSSD PAM Service responder socket.
░░ Subject: A start job for unit sssd-pam.socket has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit sssd-pam.socket has finished with a failure.
░░
░░ The job identifier is 8990 and the job result is dependency.
Nov 12 19:40:11 mymachine systemd[1]: Dependency failed for SSSD PAM Service responder private socket.
░░ Subject: A start job for unit sssd-pam-priv.socket has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit sssd-pam-priv.socket has finished with a failure.
░░
░░ The job identifier is 8991 and the job result is dependency.
-------------------------

... And so forth. What a quagmire, once more! :)

Krister Swenson (thekswenson) wrote :

I experience the infinite loop on boot about half the time. The OP speaks of "misconfiguration because of user error", yet I'm not the type to twiddle with the system when it's not broken, so I'm not sure how this misconfiguration happened.

Does anyone know of a workaround?

When the machine does boot I can get this error message from sssd.service:

=> systemctl status sssd.service
● sssd.service - System Security Services Daemon
     Loaded: loaded (/lib/systemd/system/sssd.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-01-20 14:19:18 CET; 16min ago
    Process: 5550 ExecStart=/usr/sbin/sssd -i ${DEBUG_LOGGER} (code=exited, status=4)
   Main PID: 5550 (code=exited, status=4)

Jan 20 14:19:18 praxis systemd[1]: Failed to start System Security Services Daemon.
Jan 20 14:19:18 praxis systemd[1]: sssd.service: Scheduled restart job, restart counter is at 5.
Jan 20 14:19:18 praxis systemd[1]: Stopped System Security Services Daemon.
Jan 20 14:19:18 praxis systemd[1]: sssd.service: Start request repeated too quickly.
Jan 20 14:19:18 praxis systemd[1]: sssd.service: Failed with result 'exit-code'.
Jan 20 14:19:18 praxis systemd[1]: Failed to start System Security Services Daemon.

Paride Legovini (paride) wrote :

Hi,

I think that some of the comments here are actually related to LP: #1900642, which is now fixed in groovy-updates and hirsute. However, the initial bug description from Ian Collier specifies that this bug report is about the case where a valid sssd.conf exists but sssd fails to start due to a different issue. That case is not covered by LP: #1900642.

@Ian: do you agree with this analysis? Did you try disabling the sockets as Andreas suggested? The problem could indeed be with the socket activation of sssd, which doesn't seem to have a backoff.

Some more detailed steps to reproduce the issue from scratch (e.g. in a clean LXD container) would greatly help debugging.

While waiting for further information, I'm setting the status of this bug report to Incomplete.

no longer affects: sssd (Ubuntu Focal)
Changed in sssd (Ubuntu):
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for sssd (Ubuntu) because there has been no activity for 60 days.]

Changed in sssd (Ubuntu):
status: Incomplete → Expired
Nathan Bushman (nbushman) wrote :

I have been able to reproduce this issue twice by installing fuse (sudo apt install fuse) and then rebooting. The system is then permanently stuck in an unbootable state with the aforementioned errors.

Athos Ribeiro (athos-ribeiro) wrote :

Hi Nathan,

Thanks for commenting here then.

Could you provide more details on the issue? Which Ubuntu series are you installing it on? Are those clean installations? Does it happen only with a specific configuration in place?

Changed in sssd (Ubuntu):
status: Expired → Incomplete