autopkgtest: upstream tests that run in qemu hang on ppc64el

Bug #1992178 reported by Nick Rosbrook
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
The Ubuntu-power-systems project | New | Medium | bugproxy |
systemd | Unknown | Unknown | |
systemd (Ubuntu) | Fix Released | Medium | Unassigned |
  Jammy | New | Undecided | Unassigned |

Bug Description

I believe this started early in the kinetic cycle, cf. https://autopkgtest.ubuntu.com/packages/systemd/kinetic/ppc64el vs https://autopkgtest.ubuntu.com/packages/systemd/jammy/ppc64el. Timeouts in the upstream tests have been an issue for a while, but kinetic on ppc64el consistently times out with upstream tests that run in QEMU.

Skipping individual tests does not help, because *which* tests time out appears to change with each build. For example, in 251.4-1ubuntu4 the TEST-36-NUMAPOLICY test was consistently the culprit, but now in 251.4-1ubuntu6 the TEST-14-MACHINE-ID often times out.

I have not been able to identify a root cause for this, but running tests in QEMU seems to be very fragile on ppc64el, whereas the tests that run in nspawn are more consistent.


Nick Rosbrook (enr0n)
summary: - autopkgtest: upstream tests that run in qemu hang on ppc64l
+ autopkgtest: upstream tests that run in qemu hang on ppc64el
Changed in systemd (Ubuntu):
importance: Undecided → Medium
status: New → Triaged
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 251.4-1ubuntu7

---------------
systemd (251.4-1ubuntu7) kinetic; urgency=medium

  [ Nick Rosbrook ]
  * autopkgtest: do not allow qemu to be used on ppc64el (LP: #1992178)
    Almost every run on ppc64el takes 12 to 24 hours, so do this as a last
    resort to relieve pressure on autopkgtest infrastructure.
    File: debian/tests/upstream
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=5ab2b8ad13a3d7441935d6f43bb81fbd540e2c33

  [ Dan Streetman ]
  * make machinectl read-only work (LP: #1991829)
    Author: Dan Streetman
    File: debian/patches/lp1991829-add-CAP_LINUX_IMMUTABLE-to-systemd-machined-so-it-ca.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=a8a41b11d712c8cfbf595d34f39d95096f23e202

 -- Nick Rosbrook <email address hidden> Tue, 11 Oct 2022 11:51:25 -0400

Changed in systemd (Ubuntu):
status: Triaged → Fix Released
Nick Rosbrook (enr0n) wrote :

Upstream has encountered this issue on their CI as well: https://github.com/systemd/systemd/issues/25091.

Nick Rosbrook (enr0n) wrote :

There have been several updates on the upstream bug, so I have opened a task for jammy.

Frank Heimes (fheimes)
tags: added: ppc64el reverse-proxy-bugzilla
Changed in ubuntu-power-systems:
importance: Undecided → Medium
assignee: nobody → bugproxy (bugproxy)
Luca Boccassi (bluca) wrote :

We suspect issues in the ppc64el toolchain. The problem is only reproducible on Ubuntu ppc, not on other distributions. The backtraces show some broken pointers that can't be explained. The upstream issue has all the details. None of us speak ppc assembly, so we can't really check the binaries.

bugproxy (bugproxy)
tags: added: architecture-ppc64le bugnameltc-201628 severity-medium targetmilestone-inin---
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2023-02-14 17:28 EDT-------
Howdy. Adding developers from KVM on Power and Toolchain to our bug, but it would be good to understand exactly what IBM needs to look at. For example, you mention a backtrace, but I don't see it in the bug. Any details you can provide are appreciated.

Luca Boccassi (bluca) wrote :

Backtraces are on the upstream bug, eg:

https://github.com/systemd/systemd/issues/25091#issuecomment-1401824195
https://github.com/systemd/systemd/issues/25091#issuecomment-1402425787

It would be better if the IBM developers could engage directly on the upstream ticket for the initial investigation, given the information is recorded there. Thanks!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-06-06 01:35 EDT-------
(In reply to comment #6)
> Backtraces are on the upstream bug, eg:
>
> https://github.com/systemd/systemd/issues/25091#issuecomment-1401824195
> https://github.com/systemd/systemd/issues/25091#issuecomment-1402425787
>
> It would be better if the IBM developers could engage directly on the
> upstream ticket for the initial investigation, given the information is
> recorded there. Thanks!

Peter, could you check this upstream bug and confirm whether it needs to be looked at by the toolchain team, as the Ubuntu distro team claims?

Luca Boccassi (bluca) wrote :

Hi, did any PPC expert have a chance to look at this issue? We keep seeing it when running systemd ppc64 tests in qemu on Ubuntu jammy (both host and guest, same version)

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-08-23 15:16 EDT-------
(In reply to comment #8)
> Hi, did any PPC expert have a chance to look at this issue? We keep seeing
> it when running systemd ppc64 tests in qemu on Ubuntu jammy (both host and
> guest, same version)

I commented in the upstream GitHub issue. What I said is that it doesn't seem like a compiler issue to me, since the problem only seems to be exposed when running inside autopkgtest and not when running outside of autopkgtest, even with the same compiler options. Given the random failure locations, it looks more like memory corruption to me, and memory corruption can be caused by just about anything.

Luca Boccassi (bluca) wrote :

Thanks. Could it be a problem in the jammy ppc64 qemu, then? We never saw this memory corruption outside of qemu.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2023-11-20 03:40 EDT-------
Harsh, this bug is reported against the Ubuntu 22.04 distro, so we cannot track behavior reported on other distros such as Fedora, though development can check that behavior internally.

I will ask Jamie whether he can pass along the machine information and the steps to recreate the issue on the Ubuntu distro.
