udev issues with mantic beta

Bug #2037569 reported by Frank Heimes
Affects                     Status        Importance  Assigned to   Milestone
Ubuntu on IBM z Systems     Fix Released  High        Unassigned
libblockdev (Ubuntu)        Fix Released  Undecided   Unassigned
systemd (Ubuntu)            Invalid       Undecided   Unassigned
udisks2 (Ubuntu)            Fix Released  Undecided   Dan Bungert

Bug Description

While installing mantic beta (on s390x, LPAR and z/VM - but this might not be architecture-specific), I faced issues with udev.
For my installation I updated the installer to "edge/lp-2009141" (subiquity 22.02.2+git1762.1b1ee6f4 5164).

During my installations I first noticed poor response times when dealing with devices (for example when enabling new devices with chzdev). Since chzdev is used during the installation, the installation procedure itself is also affected by this. (I mainly notice this issue with DASD ECKD disk enablements.)
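
For reference, a DASD is typically brought online with chzdev roughly like this (the bus ID below is just a placeholder):
  chzdev -e dasd-eckd 0.0.1234   # enable the DASD (ECKD) device
  lszdev dasd-eckd 0.0.1234      # verify its configuration afterwards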

But even after a successful (though, due to this issue, less snappy) installation, i.e. after the post-install reboot, I can find several udev-related processes in the installed system, like:
  69448 root 20 0 31280 11944 2560 S 39.2 0.0 2:51.67 (udev-worker)
    509 root 20 0 31276 13812 4600 S 20.6 0.0 2:07.76 systemd-udevd
    893 root 20 0 469016 13544 10496 R 17.3 0.0 1:43.53 udisksd
      1 root 20 0 168664 12748 8396 S 16.3 0.0 1:40.47 systemd
which is not only unusual, but (as one can see) these processes also consume quite a lot of resources.
Even remote ssh access to that system is impacted by this high load.

So far I only see this in mantic.
I tried 20.04.3 as well as lunar, but neither seems to be affected by this udev problem.
There I neither face the poor response on device enablement, nor see any udev-related processes still running in the installed system after the post-install reboot.

(Sometimes I could also see the 'syslog' log file growing.)

I cannot say yet what is causing this, but since I see 'systemd-udevd' as a prominent process in top, I'll mark this as affecting systemd-udevd (or systemd) for now.

I've attached the outcome of some more investigations I did ...

Revision history for this message
Frank Heimes (fheimes) wrote :
Dan Bungert (dbungert)
Changed in systemd (Ubuntu):
status: New → Confirmed
Dan Bungert (dbungert)
Changed in udisks2 (Ubuntu):
status: New → Confirmed
Revision history for this message
Dan Bungert (dbungert) wrote :

Spent some time analyzing this.

Frank loaned me a test system. The test system in the "idle" state is enough to see the issue, with a persistent load of 3+ while nothing is happening except the event loop spinning. The test system uses a DASD as the disk it is running from.

With some udev tracing, one can see an apparent loop: udisks2 opens /dev/dasda, which triggers an inotify event, which triggers udev rules, which trigger udisks2 again - or something similar to this flow.
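
The churn should also be visible without any special tracing, e.g.:
  udevadm monitor --udev --property   # expect a steady stream of 'change' events for the dasd device while idle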

Workarounds include:
* adding a nowatch rule 'ACTION=="add|change", KERNEL=="dasd*", OPTIONS:="nowatch"' (a persistent variant is shown below)
* systemctl stop udisks2
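
To make the nowatch workaround persistent, the rule can go into a local rules file, e.g. (path and filename are just an example):
  # /etc/udev/rules.d/90-dasd-nowatch.rules
  ACTION=="add|change", KERNEL=="dasd*", OPTIONS:="nowatch"
and then be activated with:
  udevadm control --reload
  udevadm trigger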

I've begun exploring udev changes for udisks2 to ignore dasd.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Confirmed
Revision history for this message
Frank Heimes (fheimes) wrote (last edit ):

Today I did some more tests and investigations.

First of all I moved to the new mantic ISO image (20230928), which improved things quite a lot!
The installation (on z/VM so far) is again very snappy and quick,
including the enablement of a DASD device in the zDev activation screen.

At the end of the installation, before hitting 'Reboot now', I went to the console and had a look at top to see whether any udev-related processes were active, but I couldn't identify any:

top - 10:47:51 up 9 min, 2 users, load average: 0.44, 0.41, 0.20
Tasks: 137 total, 1 running, 136 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 3988.9 total, 214.3 free, 1957.6 used, 3233.2 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 2031.3 avail Mem
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
  15026 root 20 0 8788 4864 2816 R 0.7 0.1 0:00.28 top
   1439 root 20 0 129824 74216 21376 S 0.3 1.8 0:03.76 /snap/subiquity/5183/usr/bin/python3.10 /snap/subiquity/518+
   1441 root 20 0 129788 74312 21376 S 0.3 1.8 0:03.57 /snap/subiquity/5183/usr/bin/python3.10 /snap/subiquity/518+
   2381 root 20 0 12216 5760 4864 S 0.3 0.1 0:00.10 sudo snap run subiquity
   2383 root 20 0 206204 77680 21632 S 0.3 1.9 0:04.96 /snap/subiquity/5183/usr/bin/python3.10 -m subiquity
      1 root 20 0 103300 13676 8940 S 0.0 0.3 0:03.49 /sbin/init ---
      2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kthreadd]
      3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [rcu_gp]
      4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [rcu_par_gp]
      5 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [slub_flushwq]
      6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [netns]
      8 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [kworker/0:0H-events_highpri]
      9 root 20 0 0 0 0 I 0.0 0.0 0:00.00 [kworker/0:1-cgroup_destroy]
     10 root 20 0 0 0 0 I 0.0 0.0 0:01.11 [kworker/u128:0-events_unbound]
     11 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 [mm_percpu_wq]
     12 root 20 0 0 0 0 I 0.0 0.0 0:00.00 [rcu_tasks_rude_kthread]
     13 root 20 0 0 0 0 I 0.0 0.0 0:00.00 [rcu_tasks_trace_kthread]

Then, ...

Revision history for this message
Frank Heimes (fheimes) wrote :

I think it's caused by udisksctl here:
https://github.com/storaged-project/udisks/blob/c7027d888c00381851d918f33a13102e7b86e188/tools/udisksctl.c#L2810C1-L2811C75
due to its repeating monitor messages:
udisksctl monitor
** (udisksctl monitor:424911): WARNING **: 11:14:33.100: (udisksctl.c:2811):monitor_on_interface_proxy_properties_changed: runtime check failed: (g_strv_length ((gchar **) invalidated_properties) == 0)

Revision history for this message
Dan Bungert (dbungert) wrote :

In the function udisks_linux_partition_table_update:
if num_parts > 0,
no ID_PART_TABLE_TYPE is present,
and bd_part_get_disk_spec fails to get the disk information (including part_type),
then this function will fail.

Also, because libblockdev uses libfdisk in a manner that opens the disk read-write,
closing the device triggers another udev event (udev watches block devices and synthesizes a 'change' event when a write-opened fd is closed),
and that will cause udisks_linux_partition_table_update to be called again.
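
A minimal, self-contained sketch of that retrigger mechanism, independent of udisks2 (the device path is illustrative - only run this against a test system):
  /* rw-close-demo.c: open a block device read-write, then close it.
   * With udev's default watch active on the device, closing a writable
   * fd is enough for udev to synthesize a "change" event, observable
   * with 'udevadm monitor --udev' in another terminal. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fd = open("/dev/dasda", O_RDWR);   /* illustrative device path */
      if (fd < 0) {
          perror("open");
          return 1;
      }
      close(fd);   /* no write needed; closing a write-opened fd triggers the watch */
      return 0;
  }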

Revision history for this message
Dan Bungert (dbungert) wrote :

Marking this invalid for systemd; other than the long-standing udev rules being involved, I'm not sure this will be fixed in systemd.

Exploring whether libblockdev with the s390 plugin enabled helps.

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Dan Bungert (dbungert) wrote :

after exploring udisks2/libblockdev/libfdisk, I have three possible proposals for resolution:

1. hack udisks2 udisks_linux_partition_table_update
  * calling bd_part_get_disk_spec is always going to be a loop, unless
    * watches have been inhibited
    * or calling bd_part_get_disk_spec causes
      ID_PART_TABLE_TYPE to be set
  * disable everything under `if (!part_type && num_parts > 0)`, log
    and bail
2. add read-only API in libblockdev so we open the dev RO in libfdisk
3. add vtoc support to libfdisk + enable the s390 bits in libblockdev

Only option 1 sounds plausible to me, though, for a Mantic timeframe (sketched below).
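
A rough, self-contained sketch of that guard (function and variable names are modelled on the ones mentioned above, not the actual udisks code):
  /* sketch: bail out instead of probing when ID_PART_TABLE_TYPE is missing */
  #include <stddef.h>
  #include <stdio.h>

  static void partition_table_update(const char *part_type, int num_parts)
  {
      if (part_type == NULL && num_parts > 0) {
          /* The kernel reports partitions but ID_PART_TABLE_TYPE is unset.
           * Probing via bd_part_get_disk_spec() here would reopen the disk
           * read-write and retrigger this very update, so log and give up. */
          fprintf(stderr, "no ID_PART_TABLE_TYPE but %d partition(s); skipping probe\n",
                  num_parts);
          return;
      }
      /* ... normal partition table update would follow here ... */
  }

  int main(void)
  {
      partition_table_update(NULL, 2);   /* exercises the bail-out path */
      return 0;
  }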

Revision history for this message
Dan Bungert (dbungert) wrote :

proposal 4. set ID_PART_TABLE_TYPE in the appropriate spot

Revision history for this message
Dan Bungert (dbungert) wrote :

Proposed changes to avoid the event storm

Revision history for this message
Dan Bungert (dbungert) wrote :

Seb reviewed and +1ed on IRC. Uploaded.

Changed in udisks2 (Ubuntu):
assignee: nobody → Dan Bungert (dbungert)
status: Confirmed → Fix Committed
tags: added: patch
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package udisks2 - 2.10.1-1ubuntu1

---------------
udisks2 (2.10.1-1ubuntu1) mantic; urgency=medium

  * Fix an event loop that can occur when ID_PART_TABLE_TYPE is not set for a
    device and that device has partitions. (LP: #2037569)

 -- Dan Bungert <email address hidden> Mon, 02 Oct 2023 11:26:55 -0600

Changed in udisks2 (Ubuntu):
status: Fix Committed → Fix Released
Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Dan, I took the mantic daily from yesterday (Oct 3rd), checked that the updated udisks2 package is in - which is the case - and gave it a try.
I did multiple different installations - all with DASDs, on LPAR and on z/VM; installations with a single DASD as well as with multiple DASDs (even combining them into a bigger disk using LVM) - and I didn't face any udev issues anymore. I checked with top during the installation as well as after the post-install reboot was completed -- all good.

So many thanks for your fix and upload on this (and to Seb for the review)!

Changed in ubuntu-z-systems:
status: Confirmed → Fix Released
bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-203787 severity-high targetmilestone-inin---
bugproxy (bugproxy)
tags: added: targetmilestone-inin2310
removed: targetmilestone-inin---
Revision history for this message
Jeremy Bícha (jbicha) wrote :

The libblockdev fix was included in 3.0.4, which is now in Ubuntu 24.04 LTS.

Changed in libblockdev (Ubuntu):
status: New → Fix Released