Test of dogtag-pki is failing on s390x due to LTO

Bug #1931104 reported by Christian Ehrhardt 
This bug affects 1 person
Affects        Status    Importance  Assigned to  Milestone
NSS            New       Unknown
nss (Fedora)   Unknown   Unknown
nss (Ubuntu)   Triaged   Undecided   Unassigned

Bug Description

The test of dogtag-pki is failing on the nss 3.63 that is in impish proposed.
Example:
https://autopkgtest.ubuntu.com/results/autopkgtest-impish/impish/s390x/d/dogtag-pki/20210516_212719_e6522@/log.gz

Bad:
Installing CA into /var/lib/pki/pki-tomcat.
Installation failed: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
ERROR: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "/usr/lib/python3/dist-packages/pki/server/pkispawn.py", line 575, in main
    scriptlet.spawn(deployer)
  File "/usr/lib/python3/dist-packages/pki/server/deployment/scriptlets/configuration.py", line 995, in spawn
    cert = deployer.setup_cert(client, tag)
  File "/usr/lib/python3/dist-packages/pki/server/deployment/__init__.py", line 355, in setup_cert
    return client.setupCert(request)
  File "/usr/lib/python3/dist-packages/pki/system.py", line 389, in setupCert
    response = self.connection.post(
  File "/usr/lib/python3/dist-packages/pki/client.py", line 55, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pki/client.py", line 293, in post
    r = self.session.post(
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
>>>> CA spawn failed:

Good:
Installing CA into /var/lib/pki/pki-tomcat.
Notice: Trust flag u is set automatically if the private key is present.
/usr/lib/python3/dist-packages/urllib3/connection.py:455: SubjectAltNameWarning: Certificate for i-dogtag has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/urllib3/urllib3/issues/497 for details.)
  warnings.warn(

    ==========================================================================
                                INSTALLATION SUMMARY
    ==========================================================================
...

The good test above was with:
ii libnss3:s390x 2:3.61-1ubuntu2 s390x Network Security Service libraries
ii 389-ds-base 1.4.4.11-2 s390x 389 Directory Server suite - server

Worth knowing: the good-case test still fails later on with:
IOException: SocketException cannot write on socket: Failed to write to socket: (-5938) Encountered end of file.
ERROR: CalledProcessError: Command '['pki', '-d', '/etc/pki/pki-tomcat/alias', '-f', '/etc/pki/pki-tomcat/password.conf', '-U', 'https://i-dogtag:8443', 'securitydomain-join', '--session', '4717921475119312283', '--type', 'TKS', '--hostname', 'i-dogtag', '--unsecure-port', '8080', '--secure-port', '8443', 'TKS i-dogtag 8443']' returned non-zero exit status 255.
  File "/usr/lib/python3/dist-packages/pki/server/pkispawn.py", line 575, in main
    scriptlet.spawn(deployer)
  File "/usr/lib/python3/dist-packages/pki/server/deployment/scriptlets/configuration.py", line 1038, in spawn
    subsystem.join_security_domain(
  File "/usr/lib/python3/dist-packages/pki/server/subsystem.py", line 1201, in join_security_domain
    subprocess.check_call(cmd)
  File "/usr/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
Installation failed: Command failed: pki -d /etc/pki/pki-tomcat/alias -f /etc/pki/pki-tomcat/password.conf -U https://i-dogtag:8443 securitydomain-join --session 4717921475119312283 --type TKS --hostname i-dogtag --unsecure-port 8080 --secure-port 8443 TKS i-dogtag 8443
Please check pkispawn logs in /var/log/pki/pki-tks-spawn.20210607093926.log

Well one issue at a time ... the current install issue first.

Since it worked with the nss in -release, I upgraded nss to the new version:
ii 389-ds-base 1.4.4.11-2 s390x 389 Directory Server suite - server
ii libnss3:s390x 2:3.63-1ubuntu1 s390x Network Security Service libraries

With this the install failure is reproducible.
So we can switch the bad case on and off by upgrading or downgrading libnss3.

Comparing those two cases until they reach the first successful install message
I've seen a crash:

  pki-tomcat[37160]: #
  pki-tomcat[37160]: # A fatal error has been detected by the Java Runtime Environment:
  pki-tomcat[37160]: #
  pki-tomcat[37160]: # SIGSEGV (0xb) at pc=0x000003ff9ce9ec02, pid=37160, tid=37246
  pki-tomcat[37160]: #
  pki-tomcat[37160]: # JRE version: OpenJDK Runtime Environment (11.0.12+4) (build 11.0.12-ea+4-Ubuntu-0ubuntu2)
  pki-tomcat[37160]: # Java VM: OpenJDK 64-Bit Server VM (11.0.12-ea+4-Ubuntu-0ubuntu2, mixed mode, tiered, compressed oops, serial gc, linux-s390x)
  pki-tomcat[37160]: # Problematic frame:
  pki-tomcat[37160]: # C [libnss3.so+0x11ec02]
  pki-tomcat[37160]: #
  pki-tomcat[37160]: # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /var/lib/pki/pki-tomcat/core.37160)
  pki-tomcat[37160]: #
  pki-tomcat[37160]: # An error report file with more information is saved as:
  pki-tomcat[37160]: # /var/lib/pki/pki-tomcat/hs_err_pid37160.log
  pki-tomcat[37160]: #
  pki-tomcat[37160]: # If you would like to submit a bug report, please visit:
  pki-tomcat[37160]: # https://bugs.launchpad.net/ubuntu/+source/openjdk-lts
  pki-tomcat[37160]: # The crash happened outside the Java Virtual Machine in native code.
  pki-tomcat[37160]: # See problematic frame for where to report the bug.

A few extra runs also showed:
   # Problematic frame:
   # C [libnssutil3.so+0x1b60c] PORT_FreeArena_Util+0xc
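
When comparing several crash reports like the ones above, it can help to pull the library and offset out of the "Problematic frame" line automatically. The following is only a minimal sketch: the function name and regular expression are illustrative assumptions, written against the two frame lines quoted in this report and not against any documented JVM log format.

```python
import re

# Matches native frame lines as quoted in this report, for example
#   "# C [libnss3.so+0x11ec02]"
#   "# C [libnssutil3.so+0x1b60c] PORT_FreeArena_Util+0xc"
# The pattern is an assumption based on those two samples only.
FRAME_RE = re.compile(
    r"#\s+C\s+\[(?P<lib>[^\]]+?)\+(?P<offset>0x[0-9a-fA-F]+)\]"
    r"(?:\s+(?P<symbol>\w+)\+(?P<sym_offset>0x[0-9a-fA-F]+))?"
)


def parse_problematic_frame(line):
    """Return library, offset and (optional) symbol of a native frame line.

    Returns None when the line is not a native ("C") frame line.
    """
    match = FRAME_RE.search(line)
    if match is None:
        return None
    return {
        "library": match.group("lib"),
        "offset": int(match.group("offset"), 16),
        "symbol": match.group("symbol"),  # None if no symbol was printed
    }


if __name__ == "__main__":
    samples = [
        "  pki-tomcat[37160]: # C [libnss3.so+0x11ec02]",
        "   # C [libnssutil3.so+0x1b60c] PORT_FreeArena_Util+0xc",
        "  pki-tomcat[37160]: # Problematic frame:",
    ]
    for sample in samples:
        print(parse_problematic_frame(sample))
```

With matching debug symbols installed, the extracted offset could then be handed to a tool such as addr2line to find the source location.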

I could not get a core dump, because the configuration that would need changing
is written on the fly right before the service is started, but I was able to find
the code in question. There is naturally a lot of abstraction involved, but plenty
of recent upstream changes fixed double frees and dangling pointer values.
For example https://github.com/nss-dev/nss/commit/350807b3a70f60928ea3f2bc95fd1795aae9b753

All of this (this fix and more similar ones) is in 3.66, which is released and in Debian unstable.
It might be worth re-merging that, throwing it into a PPA and re-running the tests.

Tags: lto
description: updated
tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI by tjaaltonen - there's another crasher in 3.66 on ppc64el.. https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=989410

So 3.66 won't be the "take this and it works" solution.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

While we wait for 3.67 and maybe (Thanks Timo) for [1] I have ensured that we have a 3.66 test build.
=> https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4577/+packages

Checking its delta also showed that we can drop a bit of it nowadays.
=> https://code.launchpad.net/~paelzer/ubuntu/+source/nss/+git/nss/+ref/merge-impish-3.66-1

[1]: https://phabricator.services.mozilla.com/D116274

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I was able to verify that a merge of 3.66 would on Ubuntu trigger the very same bug that Debian has blocking the dogtag-pki test on powerpc64.

=> https://autopkgtest.ubuntu.com/results/autopkgtest-impish-ci-train-ppa-service-4577/impish/ppc64el/d/dogtag-pki/20210608_031158_a9d4a@/log.gz

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There is another fix in master that belongs to https://bugzilla.mozilla.org/show_bug.cgi?id=1566124 - I've bumped my PPA build to include both as it is worth a try if this fixes the current ppc64 issues in v3.66.

Build of 3.66-1ubuntu1~impishppa2 started, later on I'll let the autopkgtests run.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This feels like going in circles: with nss/2:3.66-1ubuntu1~impishppa2 in https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4577/+packages ppc64 now works, but s390x fails with (on the surface) the same symptom it started with in 3.63 :-/

I retriggered the tests to see if that is flaky or reproducible.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Without the recent PPC fixes s390x was broken as well, so I'm not saying "the ppc fixes broke s390x":
https://autopkgtest.ubuntu.com/results/autopkgtest-impish-ci-train-ppa-service-4577/impish/s390x/d/dogtag-pki/20210608_073451_f187d@/log.gz

Instead there seems to be a new crash affecting s390x that has to be looked at to be able to rev up the nss version.

Tagging server-next and subscribing the team, as this is no longer a +1 task.
If we are lucky, the always diligent locutus (who often merges nss) might come by and give it a shot as well once we have later nss versions.

tags: added: server-next
Paride Legovini (paride)
Changed in nss (Ubuntu):
assignee: nobody → Paride Legovini (paride)
Revision history for this message
Paride Legovini (paride) wrote :

After some work and testing, here are my findings.

 - The dogtag-pki autopkgtest failure is manually reproducible
   using autopkgtest-virt-lxd.
 - The dogtag-pki autopkgtests pass in Impish using
   libnss3 *from the archive* (uploaded and built on Hirsute).
 - The dogtag-pki autopkgtests FAIL when using the very same
   libnss3 version but rebuilt from source on a Hirsute schroot.
   The debdiff between the two binary packages is:

File lists identical (after any substitutions)

Control files: lines which differ (wdiff format)
------------------------------------------------
Installed-Size: [-4255-] {+4318+}

 - This seems to be s390x-specific (I can't reproduce on my
   amd64 laptop)
 - I tried merging the latest version of src:nss from Debian,
   which required refreshing a s390x-specific patch, so I was
   really hoping this would fix it, but no: it fails in the
   same way.

FWIW I have a branch ready for merging 2:3.67-2 to Impish, but
I'm not sure on how to handle this dogtag-pki autopkgtest failure.

Revision history for this message
Paride Legovini (paride) wrote :

Rebuilding the same source package on Groovy produces a "good" libnss3 package (the Impish dogtag-pki autopkgtests pass when using it). This means that a build-dependency that was upgraded in Hirsute caused the regression.

(As Christian pointed out, rebuilding on Hirsute *release* with no updates may still produce a broken package, even though the "good" package in the archive was built on Hirsute. This is because Hirsute was still in development when that package was built. The reliable way to get back to what Hirsute initially was is to go back to Groovy.)

Revision history for this message
Paride Legovini (paride) wrote :

Diff of the sbuild Installed-Build-Depends from the "good" Hirsute build that produced the nss packages now in the archive and a "bad" Hirsute build done in an up-to-date Hirsute schroot:

--- good 2021-07-21 12:02:03.870339411 +0200
+++ bad 2021-07-21 12:03:20.367850047 +0200
@@ -3,38 +3,39 @@
  automake (= 1:1.16.3-2ubuntu1),
  autopoint (= 0.21-3ubuntu2),
  autotools-dev (= 20180224.1+nmu1),
- base-files (= 11ubuntu16),
+ base-files (= 11ubuntu19),
  base-passwd (= 3.5.49),
- bash (= 5.1-1ubuntu1),
- binutils (= 2.36.1-0ubuntu1),
- binutils-common (= 2.36.1-0ubuntu1),
- binutils-s390x-linux-gnu (= 2.36.1-0ubuntu1),
- bsdextrautils (= 2.36.1-1ubuntu2),
- bsdutils (= 1:2.36.1-1ubuntu2),
+ bash (= 5.1-2ubuntu1),
+ binutils (= 2.36.1-6ubuntu1),
+ binutils-common (= 2.36.1-6ubuntu1),
+ binutils-s390x-linux-gnu (= 2.36.1-6ubuntu1),
+ bsdextrautils (= 2.36.1-7ubuntu2),
+ bsdutils (= 1:2.36.1-7ubuntu2),
  build-essential (= 12.8ubuntu3),
- bzip2 (= 1.0.8-4ubuntu2),
+ bzip2 (= 1.0.8-4ubuntu3),
  coreutils (= 8.32-4ubuntu2),
- cpp (= 4:10.2.0-1ubuntu1),
- cpp-10 (= 10.2.1-19ubuntu1),
+ cpp (= 4:10.3.0-1ubuntu1),
+ cpp-10 (= 10.3.0-1ubuntu1),
  dash (= 0.5.11+git20200708+dd9ef66+really0.5.11+git20200708+dd9ef66-5ubuntu1),
  debconf (= 1.5.74),
- debhelper (= 13.3.3ubuntu2),
+ debhelper (= 13.3.4ubuntu1),
  debianutils (= 4.11.2),
+ debugedit (= 1:0.1-0ubuntu2),
  dh-autoreconf (= 20),
- dh-exec (= 0.23.2),
+ dh-exec (= 0.23.4),
  dh-strip-nondeterminism (= 1.11.0-1),
  diffutils (= 1:3.7-3ubuntu1),
- dpkg (= 1.20.7.1ubuntu2),
- dpkg-dev (= 1.20.7.1ubuntu2),
- dwz (= 0.13+20210201-1),
+ dpkg (= 1.20.9ubuntu1),
+ dpkg-dev (= 1.20.9ubuntu1),
+ dwz (= 0.14-1),
  file (= 1:5.39-3),
- findutils (= 4.7.0-1ubuntu2),
- g++ (= 4:10.2.0-1ubuntu1),
- g++-10 (= 10.2.1-19ubuntu1),
- gcc (= 4:10.2.0-1ubuntu1),
- gcc-10 (= 10.2.1-19ubuntu1),
- gcc-10-base (= 10.2.1-19ubuntu1),
- gcc-11-base (= 11-20210207-1ubuntu1),
+ findutils (= 4.8.0-1ubuntu1),
+ g++ (= 4:10.3.0-1ubuntu1),
+ g++-10 (= 10.3.0-1ubuntu1),
+ gcc (= 4:10.3.0-1ubuntu1),
+ gcc-10 (= 10.3.0-1ubuntu1),
+ gcc-10-base (= 10.3.0-1ubuntu1),
+ gcc-11-base (= 11.1.0-1ubuntu1~21.04),
  gettext (= 0.21-3ubuntu2),
  gettext-base (= 0.21-3ubuntu2),
  grep (= 3.6-1),
@@ -43,113 +44,114 @@
  hostname (= 3.23),
  init-system-helpers (= 1.60),
  intltool-debian (= 0.35.0+20060710.5),
- libacl1 (= 2.2.53-10),
+ libacl1 (= 2.2.53-10ubuntu1),
  libarchive-zip-perl (= 1.68-1),
- libasan6 (= 10.2.1-19ubuntu1),
- libatomic1 (= 11-20210207-1ubuntu1),
- libattr1 (= 1:2.4.48-6),
+ libasan6 (= 11.1.0-1ubuntu1~21.04),
+ libatomic1 (= 11.1.0-1ubuntu1~21.04),
+ libattr1 (= 1:2.4.48-6build1),
  libaudit-common (= 1:3.0-2ubuntu1),
  libaudit1 (= 1:3.0-2ubuntu1),
- libbinutils (= 2.36.1-0ubuntu1),
- libblkid1 (= 2.36.1-1ubuntu2),
- libbz2-1.0 (= 1.0.8-4ubuntu2),
- libc-bin (= 2.33-0ubuntu2),
- libc-dev-bin (= 2.33-0ubuntu2),
- libc6 (= 2.33-0ubuntu2),
- libc6-dev (= 2.33-0ubuntu2),
+ libbinutils (= 2.36.1-6ubuntu1),
+ libblkid1 (= 2.36.1-7ubuntu2),
+ libbz2-1.0 (= 1.0.8-4ubuntu3),
+ libc-bin (= 2.33-0ubuntu5),
+ libc-dev-bin (= 2.33-0ubuntu5),
+ libc6 (= 2.33-0ubuntu5),
+ libc6-dev (= 2.33-0ubuntu5),
  libcap-ng0 (= 0.7.9-2.2build1),
...
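
A comparison like the diff above can also be reduced to a short list of changed packages. The sketch below is an illustration only: the helper names are hypothetical, and the sample entries are a handful of lines copied from the diff in this comment, not the full build logs.

```python
import re

# One Installed-Build-Depends entry looks like " dpkg (= 1.20.9ubuntu1),".
DEP_RE = re.compile(r"^\s*(?P<name>[^\s(]+)\s*\(=\s*(?P<version>[^)\s]+)\)")


def parse_build_deps(text):
    """Map package name to version for every 'name (= version)' line."""
    deps = {}
    for line in text.splitlines():
        match = DEP_RE.match(line)
        if match:
            deps[match.group("name")] = match.group("version")
    return deps


def changed_build_deps(good, bad):
    """Return {name: (good_version, bad_version)} for entries that differ.

    A version of None means the package is missing on that side.
    """
    changes = {}
    for name in sorted(set(good) | set(bad)):
        old, new = good.get(name), bad.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


if __name__ == "__main__":
    # A few entries taken from the diff above, for illustration.
    good = parse_build_deps("""
     automake (= 1:1.16.3-2ubuntu1),
     dpkg (= 1.20.7.1ubuntu2),
     gcc-10 (= 10.2.1-19ubuntu1),
    """)
    bad = parse_build_deps("""
     automake (= 1:1.16.3-2ubuntu1),
     debugedit (= 1:0.1-0ubuntu2),
     dpkg (= 1.20.9ubuntu1),
     gcc-10 (= 10.3.0-1ubuntu1),
    """)
    for name, (old, new) in changed_build_deps(good, bad).items():
        print(f"{name}: {old} -> {new}")
```

Unchanged packages such as automake drop out, which leaves only the candidates worth looking at.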


Revision history for this message
Paride Legovini (paride) wrote :

My plan is now to:

 - Setup a Groovy container
 - Build nss 2:3.61-1ubuntu2 and verify the libnss3 is good
 - Add Hirsute to sources.list and manually update the
   Build-Deps, starting from the usual suspects (compilers),
   hopefully finding which package breaks the dogtag-pki tests.

The testbed system will remain fixed (ubuntu:impish lxd container).
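
The plan above is done by hand, but the search itself can be sketched as a bisection over an ordered list of suspect packages. This is only an illustration under stated assumptions: `builds_good_with` is a hypothetical callback standing in for "upgrade exactly these packages in the Groovy container, rebuild nss, run the dogtag-pki tests", and the sketch assumes a single culprit, so that every build including it is bad.

```python
def find_breaking_package(candidates, builds_good_with):
    """Return the package whose upgrade turns a good build into a bad one.

    candidates: package names, ordered from most to least suspicious.
    builds_good_with: hypothetical callback taking the list of upgraded
        packages and returning True if the resulting libnss3 passes.

    Assumes the build is good with nothing upgraded and bad with
    everything upgraded.
    """
    if not candidates:
        raise ValueError("need at least one candidate package")
    # Invariant: upgrading candidates[:lo] is good, candidates[:hi] is bad.
    lo, hi = 0, len(candidates)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if builds_good_with(candidates[:mid]):
            lo = mid
        else:
            hi = mid
    return candidates[hi - 1]


if __name__ == "__main__":
    suspects = ["gcc-10", "binutils", "dpkg", "debhelper"]

    # Stand-in for a real rebuild and test run: pretend any build with an
    # upgraded dpkg is bad.
    def fake_build(upgraded):
        return "dpkg" not in upgraded

    print(find_breaking_package(suspects, fake_build))
```

Each call to the callback corresponds to one full rebuild and test run, so halving the candidate list keeps the number of rebuilds small compared with upgrading one package at a time.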

Revision history for this message
Paride Legovini (paride) wrote :

The good news is that the test passes when building nss in a Groovy lxd container, but fails when copying that container (lxc copy), upgrading the copy to Hirsute and rebuilding there, so I have a good pair of containers to do the "bisect" on.

Revision history for this message
Paride Legovini (paride) wrote :

I tracked the problem down to the LTO optimizations that were enabled by default in dpkg 1.20.9ubuntu1.

Changed in nss (Ubuntu):
status: New → Triaged
tags: added: lto
Revision history for this message
In , Paride Legovini (paride) wrote :

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0

Steps to reproduce:

When compiling nss with LTO enabled (gcc -flto) on s390x the resulting libnss3 is not fully functional. I noticed this as the build causes a regression in the dogtag-pki tests which are part of the dogtag-pki Ubuntu package.

Newer releases of Ubuntu enable LTO by default when building packages. This specific issue will be worked around by disabling the optimizations specifically for this package and on s390x, however the problem is worth investigating upstream.

The error printout doesn't immediately point to optimization issues, however this is always reproducible, and reliably goes away by turning LTO off.

Steps to reproduce:
 - Build nss on s390x with LTO enabled.
 - Install dogtag-pki and ensure it uses the just built libnss3.
 - Exercise the following tests: https://salsa.debian.org/freeipa-team/dogtag-pki/-/blob/master/debian/tests/pkispawn.

Actual results:

The tests fail:

autopkgtest [09:34:17]: test pkispawn: [-----------------------
>>>> IP address is 10.226.183.135
>>>> Hostname was:
>>>> /etc/hosts now has:
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
10.226.183.135 autopkgtest.debci autopkgtest
Starting installation...
Completed installation for pki-tomcat
Notice: Trust flag u is set automatically if the private key is present.
/usr/lib/python3/dist-packages/urllib3/connection.py:455: SubjectAltNameWarning: Certificate for autopkgtest.debci has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/urllib3/urllib3/issues/497 for details.)
  warnings.warn(
ERROR: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "/usr/lib/python3/dist-packages/pki/server/pkispawn.py", line 575, in main
    scriptlet.spawn(deployer)
  File "/usr/lib/python3/dist-packages/pki/server/deployment/scriptlets/configuration.py", line 995, in spawn
    cert = deployer.setup_cert(client, tag)
  File "/usr/lib/python3/dist-packages/pki/server/deployment/__init__.py", line 355, in setup_cert
    return client.setupCert(request)
  File "/usr/lib/python3/dist-packages/pki/system.py", line 389, in setupCert
    response = self.connection.post(
  File "/usr/lib/python3/dist-packages/pki/client.py", line 55, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/pki/client.py", line 293, in post
    r = self.session.post(
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters...

Revision history for this message
Paride Legovini (paride) wrote :

All of the above still applies to nss 3.68-1, for which I'm preparing a merge right now.

Revision history for this message
Paride Legovini (paride) wrote :

MP for a merge from Debian which also disables LTO via DEB_BUILD_MAINT_OPTIONS=optimize=-lto:

https://code.launchpad.net/~paride/ubuntu/+source/nss/+git/nss/+merge/406163

summary: - Test of dogtag-pki is failing on s390x vs the nss v3.63 in impish-
- proposed
+ Test of dogtag-pki is failing on s390x due to LTO
Revision history for this message
In , Rrelyea (rrelyea) wrote :

Evidently there is a similar issue in Fedora: https://bugzilla.redhat.com/show_bug.cgi?id=1986627

Paride Legovini (paride)
tags: removed: server-next update-excuse
affects: fedora → nss (Fedora)
Changed in nss (Ubuntu):
assignee: Paride Legovini (paride) → nobody
Revision history for this message
In , Bbeurdouche (bbeurdouche) wrote :

Bob, I am marking this P3 for now as this is not a supported platform for us, but feel free to update the priority.

Revision history for this message
In , Rrelyea (rrelyea) wrote :

I did a scratch build of nss with LTO on in fedora, so the tests were working correctly. I haven't tested it against dogtag yet. Once NSS 3.69 builds are complete, I'll drop the LTO changes into fedora and see if our dogtag team has any issues.

Changed in nss:
status: Unknown → New
Revision history for this message
Paride Legovini (paride) wrote :

I reviewed this issue again and I don't think we're ready to drop the "Disable LTO on s390x" delta for now. There are newer NSS upstream versions worth testing but in Ubuntu we're still at 3.68 for now.

Revision history for this message
Athos Ribeiro (athos-ribeiro) wrote :

We decided to stay in 3.68.x (which is an ESR release) for jammy because it is an LTS release.

We could revisit this once we merge a newer NSS version next cycle, as suggested by Paride.
