smtp "fatal: unknown service: smtp/tcp", probably after libc upgrade

Bug #1916541 reported by Eugene Crosser
24
This bug affects 3 people
Affects Status Importance Assigned to Milestone
GLibC
Fix Released
Medium
glibc (Ubuntu)
Fix Released
High
Balint Reczey
Hirsute
Fix Released
High
Balint Reczey
postfix (Ubuntu)
Invalid
High
Unassigned
Hirsute
Invalid
High
Unassigned

Bug Description

On hirsute, on Jan 20, postfix stopped sending outgoing mails with the error

Feb 22 23:35:18 pccross postfix/smtp[9582]: fatal: unknown service: smtp/tcp

There were no changes in the system other than daily upgrades.

Strace shows that `smtp` process (while chrooted), after successful open of the file in the queue, tried to use NSS, ran `fstat()` against `/etc/nsswitch.conf`, that succeeded (presumably using the copy in the /var/spool/postfix/etc/ directory), but there was no subsequent `open()`, and the next syscall is `sendto()` of the error message to syslog.

There was a big upgrade between the last successful and the first unsuccessful email delivery, and among other things libc was updated from 2.32-0ubuntu6 to 2.33-0ubuntu2. I suspect that there was some change in the NSS in libc that resulted in the failure of `getservbyname()`.

Is anybody else observing this problem?

ProblemType: Bug
DistroRelease: Ubuntu 21.04
Package: postfix 3.5.6-1
ProcVersionSignature: Ubuntu 5.10.0-14.15-generic 5.10.11
Uname: Linux 5.10.0-14-generic x86_64
ApportVersion: 2.20.11-0ubuntu59
Architecture: amd64
CasperMD5CheckResult: unknown
CurrentDesktop: X-Cinnamon
Date: Tue Feb 23 00:02:13 2021
EtcMailname: pccross.average.org
Hostname: pccross
PostconfMydomain: localdomain
PostconfMyhostname: pccross.localdomain
PostconfMyorigin: /etc/mailname
ResolvConf:
 # Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
 # DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

 nameserver 127.0.0.1
 search average.org
SourcePackage: postfix
UpgradeStatus: No upgrade log present (probably fresh install)

CVE References

Revision history for this message
In , Sjon-8 (sjon-8) wrote :

I have an issue that I suspect is caused by a recent glibc change, but I haven't 100% ruled out another cause, so this report might be bogus.

I use PHP-FPM [1] with chroots enabled. Since upgrading glibc, name resolving (via DNS) fails with "getaddrinfo failed: System error" in my chroot and I'm pretty sure it is caused by the recently added "Block attempts to dlopen any module we haven't already opened" [2]

What seems to happen is that the PHP-FPM master process only loads libnss_files.so.2 and libnss_systemd.so.2 because it uses that to resolve the username (it matches nsswitch which contains: "passwd: files systemd")

If any of the FPM workers then attempts to perform dns resolving, that fails because libnss_dns.so.2 has not been loaded yet (even though I made it available in the chroot), and due to the recent change, it won't be loaded either.

I have confirmed I can "fix" it by forcing the fpm master to load the dns module by modifying nsswitch.conf outside of the chroot to contains "passwd: dns files systemd", this fixes it

1. https://www.php.net/manual/en/install.fpm.php
2. https://github.com/bminor/glibc/commit/429029a73ec2dba7f808f69ec8b9e3d84e13e804#diff-9305f1992144bc8c923a840d44827642f1c3f57e3df85a69357fff2fe7370fb8R352

Revision history for this message
In , Sjon-8 (sjon-8) wrote :

Some more information for clarification:

# grep dns /etc/nsswitch
hosts: files dns

# strace -eopenat -f /usr/bin/php-fpm --nodaemonize --fpm-config /etc/php/php-fpm.conf 2>&1|grep nss
openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 22
openat(AT_FDCWD, "/usr/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 22
openat(AT_FDCWD, "/usr/lib/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = 22

# curl https://localhost/test.php
getaddrinfo failed: System error

--

# grep dns /etc/nsswitch
passwd: dns files systemd
hosts: files dns

# strace -eopenat -f /usr/bin/php-fpm --nodaemonize --fpm-config /etc/php/php-fpm.conf 2>&1|grep nss
openat(AT_FDCWD, "/etc/ssl/openssl.cnf", O_RDONLY) = 3
openat(AT_FDCWD, "/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 22
openat(AT_FDCWD, "/usr/lib/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 22
openat(AT_FDCWD, "/usr/lib/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 22
openat(AT_FDCWD, "/usr/lib/libnss_systemd.so.2", O_RDONLY|O_CLOEXEC) = 22

# curl https://localhost/test.php
works

Revision history for this message
In , Florian Weimer (fweimer) wrote :

I've started a discussion on libc-alpha:

NSS chroot protection
https://sourceware.org/pipermail/libc-alpha/2021-February/122714.html

Revision history for this message
In , Howard Chu (hyc) wrote :

(In reply to Florian Weimer from comment #2)
> I've started a discussion on libc-alpha:
>
> NSS chroot protection
> https://sourceware.org/pipermail/libc-alpha/2021-February/122714.html

Can you link to more context on what security regressions this change was meant to address? chrooted servers have been running for decades using the NSS databases of the chroot jail, and this change breaks all of them.

Revision history for this message
In , Florian Weimer (fweimer) wrote :

It's CVE-2019-14271:

https://nvd.nist.gov/vuln/detail/CVE-2019-14271

(The Docker fix is likely incomplete depending on nsswitch.conf contents.)

That issue is of no concern once the service is running, it applies to the construction of the chroot contents from the outside.

Revision history for this message
Eugene Crosser (crosser) wrote :
description: updated
Revision history for this message
Lucas Kanashiro (lucaskanashiro) wrote :

Thank you for taking the time to file a bug report.

I tried to reproduce your bug in a Ubuntu Hirsute container without success. Moreover, I looked for an upstream bug report about any incompatibility with glibc 2.33 but found nothing. Could you please share your postfix config file to help us understand what is going on here?

Since there is not enough information in your report to begin triage or to
differentiate between a local configuration problem and a bug in Ubuntu, I
am marking this bug as "Incomplete". We would be grateful if you would:
provide a more complete description of the problem, explain why you
believe this is a bug in Ubuntu rather than a problem specific to your
system, and then change the bug status back to "New".

For local configuration issues, you can find assistance here:
http://www.ubuntu.com/support/community

Changed in postfix (Ubuntu):
status: New → Incomplete
Revision history for this message
Eugene Crosser (crosser) wrote :
Download full text (3.3 KiB)

Hello @lucaskanashiro,

it may work differently in a container because the problem may be triggered by a combination of libc components and possibly the kernel (only a speculation). I can reproduce the problem on another hirsute machine, postfix was never used there. I `dpkg-reconfigure`d it as a "satellite system" and sent an email. Sure enough, it got stuck in the queue the same way:

crosser@journey:~$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
6712B36270B 422 Tue Feb 23 21:36:05 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

-- 0 Kbytes in 1 Request.

I am attaching all relevant files.

> not enough information in your report to begin triage or to differentiate between a local configuration problem and a bug in Ubuntu

Well, nothing in the local configuration changed between Feb 20 when delivery worked and Feb 21 when it stopped to work...

> provide a more complete description of the problem
The most complete I could provide short of recompiling postfix and adding debug printouts was that strace looks like libc's `getservbyname()` returned failure when it was called from a chrooted process, without trying to access `/etc/services` or even opening `/etc/nsswitch.conf` (but checked `stat` of the latter, which was successful).

> explain why you believe this is a bug in Ubuntu rather than a problem specific to your
system

As I explained in my first report (and another time above), there were no local changes (other than package upgrades) between the moment deliver worked and the moment after which it stopped working.

?field.comment=Hello @lucaskanashiro,

it may work differently in a container because the problem may be triggered by a combination of libc components and possibly the kernel (only a speculation). I can reproduce the problem on another hirsute machine, postfix was never used there. I `dpkg-reconfigure`d it as a "satellite system" and sent an email. Sure enough, it got stuck in the queue the same way:

crosser@journey:~$ mailq
-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
6712B36270B 422 Tue Feb 23 21:36:05 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

-- 0 Kbytes in 1 Request.

I am attaching all relevant files.

> not enough information in your report to begin triage or to differentiate between a local configuration problem and a bug in Ubuntu

Well, nothing in the local configuration changed between Feb 20 when delivery worked and Feb 21 when it stopped to work...

> provide a more complete description of the problem
The most complete I could provide short of recompiling postfix and adding debug printouts was that strace looks like libc's `getservbyname()` returned failure when it was called from a chrooted process, without trying to access `/etc/services` or even opening `/etc/nsswitch.conf` (but checked `stat` of the latter, which was successful).

> explain why you believe this is a bug in Ubuntu rather than a problem sp...

Read more...

Eugene Crosser (crosser)
Changed in postfix (Ubuntu):
status: Incomplete → New
Revision history for this message
Eugene Crosser (crosser) wrote :

I did some more investigation, and did not find an explanation, but I am writing it here to maybe save somebody else some effort.

Postfix uses normal(?) `getservbyname(service, protocol)` in the file src/smtp/smtp_connect.c line 363:

        if ((sp = getservbyname(service, protocol)) == 0)
            msg_fatal("unknown service: %s/%s", service, protocol);

and I see this message in the log.

On the other hand, a simple program that performs `chroot("/var/spool/postfix")` and calls `getservbyname()` _works correctly_ on hirsute (C code in attachment):

crosser@pccross:~/src$ sudo ./getsrvname
[sudo] password for crosser:
s_name="smtp", s_port=25, s_proto="tcp"

Note though that in such form the the program does not try to use NSS, it just opens `/etc/services` directly. While postfix's `smtp` goes through nsswitch, as can be seen in the strace log from my previous attachment.

Revision history for this message
Sergio Durigan Junior (sergiodj) wrote :

Thanks for providing more information.

Unfortunately, I still cannot reproduce the problem. This time I tried using a Hirsute VM instead of a container, but everything worked as expected. I was able to send an email from inside the VM, the email went through without problems, and I received it successfully.

You mentioned that you experience the issue even if you configure postfix to act as a satellite system. Could you perhaps provide the configuration files you used to do that? I did the same here, and I think it is unlikely that the problem is in some configuration file, but hey, here goes nothing...

Also, and I apologize if you have already done this, but please make sure that your Hirsute system is up-to-date. I'm using a completely up-to-date Hirsute VM with absolutely nothing extra installed except postfix and sasl2-bin.

Thanks.

Changed in postfix (Ubuntu):
status: New → Incomplete
Revision history for this message
Eugene Crosser (crosser) wrote :
Download full text (4.3 KiB)

Hello @sergiodj,

> Could you perhaps provide the configuration files you used to do that?

My complete configuration is included in the tar attached to this ticket.

> please make sure that your Hirsute system is up-to-date

Yes, updated daily, _plus_ by hand right before the investigation.
Note the that problem (quite certainly) _was triggered by an update_ of the libc package.

Perhaps worth to mention, my systems are being upgraded between distro versions, not fresh-installed.

In the meanwhile, I did more investigation. First, I checked that the problem is, as I suspected, connected to running `getservbyname()` under chroot. When I change smtp line in `master.cf` to not run under chroot, i.e.

smtp unix - - y - - smtp

is changed to

smtp unix - - n - - smtp

MAIL DELIVERY STARTS TO WORK!

So at least I have a workaround now. But the problem still stands.

Now here is something interesting: strace of the `smtp` process when it is run under chroot and without chroot:

===== under chroot =====
[pid 224329] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224329] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224329] close(14) = 0
[pid 224329] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224329] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224329] close(14) = 0
[pid 224329] newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=699, ...}, 0) = 0
[pid 224329] newfstatat(AT_FDCWD, "/", {st_mode=S_IFDIR|0755, st_size=4096, ...}, 0) = 0
[pid 224329] getpid() = 224329
[pid 224329] sendto(7, "<18>Feb 24 20:48:11 postfix/smtp"..., 92, MSG_NOSIGNAL, NULL, 0) = 92
========

(the last line is the error message about "smtp"/"tcp" service. No attempt to load any nsswitch modules!)

==== no chroot ====
[pid 224736] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224736] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224736] close(14) = 0
[pid 224736] socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 14
[pid 224736] connect(14, {sa_family=AF_UNIX, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
[pid 224736] close(14) = 0
[pid 224736] newfstatat(AT_FDCWD, "/etc/nsswitch.conf", {st_mode=S_IFREG|0644, st_size=699, ...}, 0) = 0
[pid 224736] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 14
[pid 224736] newfstatat(14, "", {st_mode=S_IFREG|0644, st_size=177082, ...}, AT_EMPTY_PATH) = 0
[pid 224736] mmap(NULL, 177082, PROT_READ, MAP_PRIVATE, 14, 0) = 0x7f35ad920000
[pid 224736] close(14) = 0
[pid 224736] access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
[pid 224736] openat(AT_FDCWD, "/lib/x86_64-linux-gnu/glibc-hwcaps/x86-64-v2/libnss_db.so.2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
[pid 224736] newfsta...

Read more...

Revision history for this message
Paride Legovini (paride) wrote :

Hi Eugene,

I could reproduce the issue from a clean Hirsute VM. Steps:

1. lxc launch ubuntu-daily:hirsute paride-h-vm --vm
2. lxc exec paride-h-vm bash
3. apt update
4. apt full-upgrade
5. apt install postfix
   # Debconf config as: Satellite system.
   # All the rest left with default values
6. apt install bsd-mailx
7. echo x | mailx <email address hidden>
8. mailq. Check the "unknown mail transport error":

-Queue ID- --Size-- ----Arrival Time---- -Sender/Recipient-------
3D4CB3F4DB 370 Thu Feb 25 13:52:53 <email address hidden>
                                                (unknown mail transport error)
                                         <email address hidden>

9. cat /var/log/mail.log. There it is:

Feb 25 13:52:53 paride-h-vm postfix/smtp[10307]: fatal: unknown service: smtp/tcp

I'm not sure if this should be considered a postfix bug or a glibc bug at this point. If this is a regression caused by a recent glibc upload then other packages may be affected, and action should be taken on that side.

I'm adding a "glibc" task to get attention from the glibc maintainers, and raising the importance of this bug. Thanks a lot for sharing all the details of your debugging work!

Changed in postfix (Ubuntu):
status: Incomplete → Confirmed
importance: Undecided → Critical
importance: Critical → High
Revision history for this message
Paride Legovini (paride) wrote :

Note: the very same steps in a ubuntu:focal VM do not lead to the same issue.

Revision history for this message
Eugene Crosser (crosser) wrote :

Thanks Paride,
I am glad that you could reproduce it, and it's not like either I or my system is crazy :)

I noticed something that may or may not be relevant. I have another machine with hirsute, that was purchased a year+ ago and has undergone less package upgrades: mail delivery _worked_ there. I did _not_ reconfigure it as "satellite" and I don't remember how it was configured at the start, probably as all default.

The difference that I noticed is that the contents of /var/spool/postfix/usr/lib (as filled by posttix's startup procedure) is significantly different. One wild speculation would be that some shared lib there is opportunistically loaded that refers to an earlier glibc version, and that subsequently somehow results in impossibility to load nss libraries.

tags: added: server-next
Revision history for this message
In , Florian Weimer (fweimer) wrote :

Fixed for 2.34. Also backported to 2.33.

commit 58673149f37389495c098421085ffdb468b3f7ad
Author: DJ Delorie <email address hidden>
Date: Thu Feb 18 15:26:30 2021 -0500

    nss: Re-enable NSS module loading after chroot [BZ #27389]

    The glibc 2.33 release enabled /etc/nsswitch.conf reloading,
    and to prevent potential security issues like CVE-2019-14271
    the re-loading of nsswitch.conf and all mdoules was disabled
    when the root filesystem changes (see bug 27077).

    Unfortunately php-lpfm and openldap both require the ability
    to continue to load NSS modules after chroot. The packages
    do not exec after the chroot, and so do not cause the
    protections to be reset. The only solution is to re-enable
    only NSS module loading (not nsswitch.conf reloading) and so
    get back the previous glibc behaviour.

    In the future we may introduce a way to harden applications
    so they do not reload NSS modules once the root filesystem
    changes, or that only files/dns are available pre-loaded
    (or builtin).

    Reviewed-by: Carlos O'Donell <email address hidden>

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in glibc (Ubuntu):
status: New → Confirmed
Revision history for this message
Max Bowsher (maxb) wrote :

I've just been bitten by this.

It seems this was broken by https://sourceware.org/git/?p=glibc.git;a=commit;h=429029a73ec2dba7f808f69ec8b9e3d84e13e804

and it has recently been realised that was a bad idea: https://sourceware.org/git/?p=glibc.git;a=commit;h=58673149f37389495c098421085ffdb468b3f7ad

https://sourceware.org/bugzilla/show_bug.cgi?id=27389

It looks like hirsute is going to need a backport of this

Revision history for this message
In , Eugene Crosser (crosser) wrote :

Just as a note, the change has also broken postfix. Its smtp client enters chroot after start, and then down the road calls `getservbyname()` which fails.

Revision history for this message
Paride Legovini (paride) wrote :

Comment #5 of the upstream bug report [1] says this is

  Fixed for 2.34. Also backported to 2.33.

The backport commit for 2.33 is [2].

[1] https://sourceware.org/bugzilla/show_bug.cgi?id=27389#c5
[2] https://sourceware.org/git/?p=glibc.git;a=commit;h=3e880d733753183696d1a81c34caef3a9add2b0c

tags: added: rls-hh-incoming
Changed in glibc (Ubuntu):
importance: Undecided → High
Balint Reczey (rbalint)
Changed in glibc (Ubuntu):
assignee: nobody → Balint Reczey (rbalint)
Revision history for this message
Balint Reczey (rbalint) wrote :

Thanks, I'll upload the backported fix to Hirsute soon.

Changed in postfix (Ubuntu):
status: Confirmed → In Progress
status: In Progress → Confirmed
status: Confirmed → Invalid
Changed in glibc (Ubuntu):
status: Confirmed → In Progress
tags: added: fr-1202
tags: removed: rls-hh-incoming
Changed in glibc:
importance: Unknown → Medium
status: Unknown → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package glibc - 2.33-0ubuntu4

---------------
glibc (2.33-0ubuntu4) hirsute; urgency=medium

  [ Matthias Klose ]
  * Run checks for every pass before failing the build.
  * Regenerate the control file.

  [ Balint Reczey ]
  * XFAIL new tests not setting up their container properly
  * Add Breaks: against fakeroot (<< 1.25.3-1.1ubuntu2~) (LP: #1915250)
  * debian/patches/git-updates.diff: update from upstream stable branch
    (LP: #1916541)
  * debian/tests/rebuild: Print cpuinfo to help triaging CPU-specific failures

 -- Balint Reczey <email address hidden> Thu, 11 Mar 2021 17:29:21 +0100

Changed in glibc (Ubuntu Hirsute):
status: In Progress → Fix Released
Revision history for this message
Eugene Crosser (crosser) wrote :

I confirm that with libc6 version 2.33-0ubuntu4 installed, postfix works again.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.