socket-activated sshd breaks on concurrent connections

Bug #1905285 reported by Marcin Słowik
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| openssh (Ubuntu) | Fix Released | Medium | Unassigned | |
| Focal | Fix Released | Undecided | Athos Ribeiro | |
| Hirsute | Fix Released | Undecided | Athos Ribeiro | |

Bug Description

[Impact]

Users of the systemd socket activated ssh service may experience a race condition that may lead an ssh instance to fail.

The race condition happens when, for a running socket activated ssh service,

an instance A is started, creating the RuntimeDirectory for the service; then

an instance B is started, relying on the RuntimeDirectory created for instance A; then

instance A halts, causing the RuntimeDirectory to be deleted.

If, at this point, instance B has not chrooted into RuntimeDirectory yet, then instance B will fail.

The proposed patch fixes the issue by preserving the RuntimeDirectory after an instance A of the socket activated ssh service halts.
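
The interleaving above can be modelled without sshd or systemd. The following is a toy sketch only: `RUNDIR` is a temporary stand-in for /run/sshd, and `instance_start`, `instance_stop`, `instance_chroot` and `run_race` are hypothetical names introduced for illustration. It assumes that stopping an instance removes the shared directory unless preservation is requested, which is the behaviour described above.

```shell
#!/bin/sh
# Toy model of the race; nothing here talks to sshd or systemd.
# RUNDIR is a hypothetical stand-in for /run/sshd.
RUNDIR="$(mktemp -d)/sshd"

# An instance starts: the runtime directory is created if it is missing.
instance_start() { mkdir -p "$RUNDIR"; }

# An instance stops: the directory is removed unless $1 is "yes",
# which models RuntimeDirectoryPreserve=yes.
instance_stop() {
    if [ "$1" != "yes" ]; then
        rm -rf "$RUNDIR"
    fi
}

# Stand-in for chroot("/run/sshd"): succeeds only if the directory exists.
instance_chroot() {
    if [ -d "$RUNDIR" ]; then
        echo "ok"
    else
        printf 'fatal: chroot("%s"): No such file or directory\n' "$RUNDIR"
    fi
}

# Interleaving from the description: A starts, B starts, A halts, B chroots.
# $1 = "yes" to preserve the directory, anything else to remove it.
run_race() {
    instance_start          # instance A
    instance_start          # instance B
    instance_stop "$1"      # instance A halts
    instance_chroot         # instance B tries to chroot
}
```

Calling `run_race no` prints the fatal message, while `run_race yes` prints `ok`, which is the difference the proposed patch relies on.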

[Test Plan]

1) Stop any running instances of ssh.
`systemctl stop ssh`

2) Start the socket activated ssh service.
`systemctl start ssh.socket`

3) Verify that no errors related to ssh were logged in /var/log/auth.log
`cat /var/log/auth.log | grep 'sshd.*fatal.*chroot.*No such file or directory'`

4) Perform several ssh connections to the running server in a short time span. ssh-keyscan may help here.
`ssh-keyscan localhost`

5) Verify that errors related to ssh were logged in /var/log/auth.log
`cat /var/log/auth.log | grep 'sshd.*fatal.*chroot.*No such file or directory'`

6) Apply the proposed fix (make sure the socket activated service is restarted)

7) Repeat step (4), then verify that no new entries were appended to the step (5) output.
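
The before/after comparison in steps (3), (5) and (7) can be scripted. A minimal sketch, assuming the log lives at /var/log/auth.log as in the steps above; `count_chroot_failures` and `new_failures` are hypothetical helper names, not part of openssh or any Ubuntu tooling.

```shell
#!/bin/sh
# Count sshd chroot failures in a log file.
# $1 = log file (defaults to /var/log/auth.log).
count_chroot_failures() {
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    grep -c 'sshd.*fatal.*chroot.*No such file or directory' \
        "${1:-/var/log/auth.log}" || true
}

# Report how many new failures a command produced.
# $1 = log file; remaining arguments = command that opens the connections.
new_failures() {
    nf_log="$1"
    shift
    nf_before="$(count_chroot_failures "$nf_log")"
    "$@" >/dev/null 2>&1
    nf_after="$(count_chroot_failures "$nf_log")"
    echo $((nf_after - nf_before))
}
```

For example, `new_failures /var/log/auth.log ssh-keyscan localhost` should print a non-zero number before the fix and `0` after it. Log entries may be written with a short delay, so a brief pause before the second count may be needed on some systems.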

[Where problems could occur]

If the changes to the socket activated unit file are wrong, the socket activated service may fail to start after the package upgrade. In this case, we would need to instruct users to perform local changes to the unit file with possible additional fixes while a new version of the patch lands.

[racb] There might be cases where users are inadvertently depending on the cleanup that will now be disabled - for example by a bug or misconfiguration that would result in /run filling up otherwise. By disabling systemd cleanup and relying solely on openssh for cleanup, such a bug or misconfiguration may be exposed and cause problems on such systems.

[Other Info]

This fix has been forwarded to Debian and accepted in https://salsa.debian.org/ssh-team/openssh/-/merge_requests/12

[Original message]

This is mostly the same issue as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=934663.

With the default configuration of openssh-server and systemd, sshd will complain and crash when multiple connections are made and terminated in a quick succession, e.g. with `ssh-keyscan`. It results in the following errors in /var/log/auth.log:

```
Nov 22 20:53:34 {host} sshd[14567]: Unable to negotiate with {client} port 41460: no matching host key type found. Their offer: <email address hidden> [preauth]
Nov 22 20:53:34 {host} sshd[14570]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Nov 22 20:53:34 {host} sshd[14569]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Nov 22 20:53:34 {host} sshd[14568]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Nov 22 20:53:34 {host} sshd[14566]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Nov 22 20:53:47 {host} sshd[14584]: Connection closed by {client} port 59312 [preauth]
Nov 22 20:53:47 {host} sshd[14586]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
Nov 22 20:53:48 {host} sshd[14585]: fatal: chroot("/run/sshd"): No such file or directory [preauth]
```

as well as e.g. missing responses in ssh-keyscan:

```
$ ssh-keyscan -vvv {host}
debug2: fd 3 setting O_NONBLOCK
debug3: conalloc: oname {host} kt 2
debug2: fd 4 setting O_NONBLOCK
debug3: conalloc: oname {host} kt 4
debug2: fd 5 setting O_NONBLOCK
debug3: conalloc: oname {host} kt 8
debug2: fd 6 setting O_NONBLOCK
debug3: conalloc: oname {host} kt 32
debug2: fd 7 setting O_NONBLOCK
debug3: conalloc: oname {host} kt 64
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.1 pat OpenSSH* compat 0x04000000
# {host}:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,<email address hidden>,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
debug2: host key algorithms: <email address hidden>
debug2: ciphers ctos: <email address hidden>,aes128-ctr,aes192-ctr,aes256-ctr,<email address hidden>,<email address hidden>
debug2: ciphers stoc: <email address hidden>,aes128-ctr,aes192-ctr,aes256-ctr,<email address hidden>,<email address hidden>
debug2: MACs ctos: <email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: <email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,<email address hidden>
debug2: compression stoc: none,<email address hidden>
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,<email address hidden>,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: <email address hidden>,aes128-ctr,aes192-ctr,aes256-ctr,<email address hidden>,<email address hidden>
debug2: ciphers stoc: <email address hidden>,aes128-ctr,aes192-ctr,aes256-ctr,<email address hidden>,<email address hidden>
debug2: MACs ctos: <email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: <email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,<email address hidden>
debug2: compression stoc: none,<email address hidden>
debug2: languages ctos:
debug2: languages stoc:
debug2: first_kex_follows 0
debug2: reserved 0
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: (no match)
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.1 pat OpenSSH* compat 0x04000000
# {host}:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.1 pat OpenSSH* compat 0x04000000
# {host}:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.1 pat OpenSSH* compat 0x04000000
# {host}:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.1 pat OpenSSH* compat 0x04000000
# {host}:22 SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.1
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
```

The error is most likely caused by a race condition on removing /run/sshd, which is easily reproducible by ssh-keyscan.
I noticed that, depending on the client, I'd sometimes miss all keys, sometimes get one, sometimes more.
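
If a single ssh-keyscan run does not trigger the race reliably, several runs can be started at once. A minimal sketch; `burst` is a hypothetical helper name, and the count of 10 in the usage note is an arbitrary choice.

```shell
#!/bin/sh
# Run a command N times in parallel and wait for all copies to finish.
# $1 = number of parallel runs; remaining arguments = the command to run.
burst() {
    burst_n="$1"
    shift
    burst_i=0
    while [ "$burst_i" -lt "$burst_n" ]; do
        "$@" >/dev/null 2>&1 &      # start one copy in the background
        burst_i=$((burst_i + 1))
    done
    wait                            # block until every copy has exited
}
```

For example, `burst 10 ssh-keyscan localhost` opens many short-lived connections in a short time span.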

Modifying the following files (they should be marked as modified in the bug report) seems to solve the issue, at least temporarily:
/usr/lib/systemd/system/ssh.service
/usr/lib/systemd/system/ssh@.service
In both cases, I added `RuntimeDirectoryPreserve=yes` to the `[Service]` section, after `RuntimeDirectory=sshd`.
This is the same solution mentioned in the Debian bug, although their bug report doesn't mention which service files are affected.

This doesn't seem to be a proper long-term solution though, as it seems apt doesn't respect configuration files in /usr (or they are unlisted somewhere),
because after upgrading the system just before filing this bug report, the files were overwritten and reverted to their original form.
I only got asked about the /etc/ssh/sshd_config, which I have chosen to keep.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: openssh-server 1:8.2p1-4ubuntu0.1 [modified: lib/systemd/system/ssh.service lib/systemd/system/ssh@.service]
ProcVersionSignature: Ubuntu 5.4.0-54.60-generic 5.4.65
Uname: Linux 5.4.0-54-generic x86_64
ApportVersion: 2.20.11-0ubuntu27.12
Architecture: amd64
CasperMD5CheckResult: skip
Date: Mon Nov 23 15:09:32 2020
SourcePackage: openssh
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Seth Arnold (seth-arnold) wrote :

Hello Marcin, the Description section of https://www.freedesktop.org/software/systemd/man/systemd.unit.html gives information on how to modify configurations without having them undone by future updates; the systemctl edit command automates the process of using these local modifications.

Thanks

Changed in openssh (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the report.

I was able to reproduce this bug. Basically:

$ systemctl start ssh.socket
$ ssh-keyscan localhost

Interestingly enough, I wasn't able to solve the problem by setting RuntimeDirectoryPreserve=yes. I edited sshd.service and added the directive there, but I still see the fatal errors in /var/log/auth.log. Maybe I'm missing something, but I don't have the time right now to dive deep into this.

Marcin, as Seth said above, the right way to edit a systemd unit file is to invoke "systemctl edit", which will make sure that the new .service file is installed in a way that won't get overwritten when you upgrade your package/system. You might want to use the "--full" option when invoking the command, which will pre-fill the new file with the contents of the original .service.

I'm marking this bug as Triaged and setting the priority to Medium. Hopefully someone will be able to work on it soon.

Revision history for this message
Marcin Słowik (marandil) wrote :

Hello Sergio,

Have you tried editing ssh@.service as well? In my case, sshd.service is a symlink to ssh.service, and that unit is not used when ssh is in socket-activated mode; ssh@.service is, at least that's how I understand systemd.

I now understand how to override the unit files with [unit-file].d/override.conf files and that at least fixes the overwrite on update issue (although I must say it's a bit unintuitive, since some of the /usr/lib/systemd/system configs are symlinked from /etc/systemd/system and IIRC you should be allowed to modify configs under /etc, but I digress). AFAICT that's precisely what `systemctl edit` does.

Seth, thank you for linking the manpage, it was... educational to say the least :).
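
The override discussed in this thread can also be written non-interactively. A minimal sketch, assuming the drop-in location /etc/systemd/system/ssh@.service.d/override.conf; `write_preserve_dropin` is a hypothetical helper name, and its optional argument exists only so the function can be tried against a scratch directory.

```shell
#!/bin/sh
# Write a drop-in that adds RuntimeDirectoryPreserve=yes to ssh@.service.
# $1 = systemd configuration root (defaults to /etc/systemd/system).
write_preserve_dropin() {
    dropin_dir="${1:-/etc/systemd/system}/ssh@.service.d"
    mkdir -p "$dropin_dir"
    # Only the overridden directive is needed; the rest of the unit is
    # still read from the packaged file.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
RuntimeDirectoryPreserve=yes
EOF
}
```

On a real system this would be run as root and followed by `systemctl daemon-reload`; `systemctl cat ssh@.service` should then list the drop-in after the packaged unit file.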

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for the comment, Marcin. Yes, you're right, the correct file to edit was ssh@.service indeed. That was a thinko on my part.

Revision history for this message
Steve Dodd (anarchetic) wrote :

I'm seeing something similar to this (messages more like those in the underlying Debian bug report), in this case triggered by a script which sshs in (invoking unison) twice in quick succession. The underlying hardware is an ARM board which may be a little slow; I don't know if that helps to trigger the race?

I'm also a little confused as to whether socket activation is the default under Ubuntu or not. My etckeeper history suggests it's been enabled here since 2014, when records began .. so no idea if it was something I enabled or not!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:8.4p1-5ubuntu2

---------------
openssh (1:8.4p1-5ubuntu2) impish; urgency=medium

  * d/systemd/ssh@.service: preserve the systemd managed runtime directory to
    ensure parallel processes will not disrupt one another when halting
    (LP: #1905285) (closes: #934663)

 -- Athos Ribeiro <email address hidden> Mon, 05 Jul 2021 09:21:03 -0300

Changed in openssh (Ubuntu):
status: Triaged → Fix Released
description: updated
Changed in openssh (Ubuntu Focal):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Changed in openssh (Ubuntu Focal):
status: New → In Progress
Revision history for this message
Robie Basak (racb) wrote :

The upload looks fine, but do you have any plans to fix Hirsute? Otherwise users upgrading from Focal up to Hirsute will be regressed after this fix is landed. It should be trivial to also fix this in Hirsute I think - and might even be beneficial to do first as a canary.

description: updated
Changed in openssh (Ubuntu Hirsute):
assignee: nobody → Athos Ribeiro (athos-ribeiro)
Revision history for this message
Athos Ribeiro (athos-ribeiro) wrote :

Thanks, Robie!

I proposed a patch for hirsute which was already uploaded.

Changed in openssh (Ubuntu Hirsute):
status: New → In Progress
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Marcin, or anyone else affected,

Accepted openssh into hirsute-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openssh/1:8.4p1-5ubuntu1.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-hirsute to verification-done-hirsute. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-hirsute. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in openssh (Ubuntu Hirsute):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-hirsute
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Marcin, or anyone else affected,

Accepted openssh into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/openssh/1:8.2p1-4ubuntu0.3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in openssh (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-needed-focal
Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (openssh/1:8.4p1-5ubuntu1.1)

All autopkgtests for the newly accepted openssh (1:8.4p1-5ubuntu1.1) for hirsute have finished running.
The following regressions have been reported in tests triggered by the package:

sshuttle/1.0.4-1ubuntu4 (s390x, ppc64el)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/hirsute/update_excuses.html#openssh

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Athos Ribeiro (athos-ribeiro) wrote :

I verified the fixes by following the steps in the test plan described above by using the attached script on both focal and hirsute LXC containers.

The tests show that the patch successfully fixes the described issue.

tags: added: verification-done verification-done-focal verification-done-hirsute
removed: verification-needed verification-needed-focal verification-needed-hirsute
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:8.4p1-5ubuntu1.1

---------------
openssh (1:8.4p1-5ubuntu1.1) hirsute; urgency=medium

  * d/systemd/ssh@.service: preserve the systemd managed runtime directory to
    ensure parallel processes will not disrupt one another when halting
    (LP: #1905285)

 -- Athos Ribeiro <email address hidden> Wed, 28 Jul 2021 10:33:49 -0300

Changed in openssh (Ubuntu Hirsute):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for openssh has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package openssh - 1:8.2p1-4ubuntu0.3

---------------
openssh (1:8.2p1-4ubuntu0.3) focal; urgency=medium

  * d/systemd/ssh@.service: preserve the systemd managed runtime directory to
    ensure parallel processes will not disrupt one another when halting
    (LP: #1905285)

 -- Athos Ribeiro <email address hidden> Fri, 23 Jul 2021 09:55:12 -0300

Changed in openssh (Ubuntu Focal):
status: Fix Committed → Fix Released