Installer ui crash on kinetic on s390x (but probably not limited to)

Bug #1993257 reported by Frank Heimes
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
High
Skipper Bug Screeners
subiquity
Fix Released
Undecided
Olivier Gayot

Bug Description

I'm trying today the kinetic ISO from Oct the 17th with a manual install on an s390x LPAR, which worked fine if using FCP/SCSI storage or a single DASD.
But if multiple (in my case 3) DASD disks are selected the installer crashes with this msg:

▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  An error occurred during installation [ Help ]
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
  ┌────────────────────────────────────────────────────────────────────────┐
  │subiquity/Early/apply_autoinstall_config │

   ┌──────────────────────────────────────────────────────────────────────┐
   │ │
   │ Sorry, an unknown error occurred. │
   │ │
   │ [ View full report ] │
   │ │
   │ If you want to help improve the installer, you can send an error │
   │ report. │
   │ │
   │ [ Send to Canonical ] │
   │ │
   │ [ Close report ] │
   │ │
   └──────────────────────────────────────────────────────────────────────┘

 2022-10-18 07:12:17,779 ERROR subiquity.server.server:422 top level error
 Traceback (most recent call last):
   File "/snap/subiquity/3977/usr/lib/python3.8/asyncio/events.py", line 81, in _run
     self._context.run(self._callback, *self._args)
   File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/server/controllers/filesystem.py", line 680, in _udev_event
     action, dev = self._monitor.receive_device()
   File "/snap/subiquity/3977/lib/python3.8/site-packages/pyudev/monitor.py", line 400, in receive_device
     device = self.poll()
   File "/snap/subiquity/3977/lib/python3.8/site-packages/pyudev/monitor.py", line 358, in poll
     if eintr_retry_call(poll.Poll.for_events((self, "r")).poll, timeout):
   File "/snap/subiquity/3977/lib/python3.8/site-packages/pyudev/_util.py", line 164, in eintr_retry_call
     return func(*args, **kwargs)
   File "/snap/subiquity/3977/lib/python3.8/site-packages/pyudev/_os/poll.py", line 94, in poll
     return list(self._parse_events(eintr_retry_call(self._notifier.poll, timeout)))
   File "/snap/subiquity/3977/lib/python3.8/site-packages/pyudev/_os/poll.py", line 109, in _parse_events
     raise IOError("Error while polling fd: {0!r}".format(fd))
 OSError: Error while polling fd: 22
 2022-10-18 07:12:17,783 DEBUG subiquity.common.errorreport:384 generating crash report
 2022-10-18 07:12:17,783 INFO subiquity.common.errorreport:406 saving crash report 'unknown error crashed with OSError' to /var/crash/1666077137.783155680.unknown.crash
 2022-10-18 07:12:17,784 INFO root:37 start: subiquity/ErrorReporter/1666077137.783155680.unknown/add_info:
 2022-10-18 07:12:17,784 INFO root:37 finish: subiquity/Meta/status_GET: SUCCESS: 200 {"state": "ERROR", "confirming_tty": "", "error": {"state": "INCOMPLETE", "ba...
 2022-10-18 07:12:17,785 INFO root:37 finish: subiquity/Meta/status_GET: SUCCESS: 200 {"state": "ERROR", "confirming_tty": "", "error": {"state": "INCOMPLETE", "ba...
 2022-10-18 07:12:17,785 INFO root:37 finish: subiquity/Meta/status_GET: SUCCESS: 200 {"state": "ERROR", "confirming_tty": "", "error": {"state": "INCOMPLETE", "ba...
 2022-10-18 07:12:17,809 DEBUG subiquitycore.utils:77 run_command ['udevadm', 'settle', '-t', '0'] exited with code 1
 2022-10-18 07:12:17,809 DEBUG subiquity.server.controllers.filesystem:669 waiting 0.1 to let udev event queue settle
 2022-10-18 07:12:17,811 INFO aiohttp.access:233 [18/Oct/2022:07:10:32 +0000] "GET /meta/status?cur=%22WAITING%22 HTTP/1.1" 200 519 "-" "Python/3.8 aiohttp/3.6.2"
 2022-10-18 07:12:17,811 INFO aiohttp.access:233 [18/Oct/2022:07:10:32 +0000] "GET /meta/status?cur=%22WAITING%22 HTTP/1.1" 200 519 "-" "Python/3.8 aiohttp/3.6.2"
 2022-10-18 07:12:17,811 INFO aiohttp.access:233 [18/Oct/2022:07:11:20 +0000] "GET /meta/status?cur=%22WAITING%22 HTTP/1.1" 200 519 "-" "Python/3.8 aiohttp/3.6.2"
 2022-10-18 07:12:17,814 INFO root:37 start: subiquity/Meta/status_GET:
 2022-10-18 07:12:17,815 INFO root:37 start: subiquity/Meta/status_GET:

A lot of crash reports are generated:

root@ubuntu-server:/var/crash# ls -1 | wc -l
704
root@ubuntu-server:/var/crash#

This is reproducible.

A tgz with the complete content of /var/crash and /var/log is attached.

If I select only a single DASD, this error does not happen and the installation completes successfully.

Revision history for this message
Frank Heimes (fheimes) wrote :
description: updated
Changed in ubuntu-z-systems:
importance: Undecided → Critical
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote :

Further investigations showed that this problem could have been caused by old LVM data on the disks.
I wiped out the 3 DASD disks (using 'dasdfmt --quick'), restarted the installation and it completed successfully.
With that I reduce the severity to high.

(attaching the logs for the successful install, after having wiped out the disks, just for the reason of completeness)

Changed in ubuntu-z-systems:
importance: Critical → High
Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
https://iso.qa.ubuntu.com/qatracker/reports/bugs/1993257

tags: added: iso-testing
Revision history for this message
Frank Heimes (fheimes) wrote :
Download full text (4.2 KiB)

bummer, after that successful install (when I formatted the disks manually by hand before)
I tried another install on the same system and now ran into a ui crash, at screen:
'Guided storage configuration --> Custom storage layout'

generating crash report
report saved to /var/crash/1666082392.645956755.ui.crash
Traceback (most recent call last):
  File "/snap/subiquity/3977/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/snap/subiquity/3977/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/__main__.py", line 5, in <module>
    sys.exit(main())
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/cmd/tui.py", line 150, in main
    subiquity_interface.run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/client/client.py", line 407, in run
    super().run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquitycore/tui.py", line 381, in run
    super().run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
    raise exc
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/client/controllers/filesystem.py", line 245, in _guided_choice
    self.model.load_server_data(status)
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 1092, in load_server_data
    self._actions = self._actions_from_config(
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 1223, in _actions_from_config
    for action in config:
TypeError: 'NoneType' object is not iterable
Connection to s1lp15 closed.

▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄▄
  Willkommen! Bienvenue! Welcome! Добро пожаловать! Welkom! [ Help ]
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
  Use UP, DOWN and ENTER keys to select your language.
 ┌───────────────────────────────────────────────────────────────────────┐
   │ │
   │ Sorry, the installer has restarted because of an error. │
   │ │
   │ [ View full report ] │
   │ │
   │ If you want to help improve the installer, you can send an error │
   │ report. │
   │ │
   │ [ Send to Canonical ] │
   │ │
   │ Select continue to try the installation again. │
   │ │
   │ [ Continue ] │
   │ │
   └───...

Read more...

Revision history for this message
Frank Heimes (fheimes) wrote :
Revision history for this message
Frank Heimes (fheimes) wrote :
Download full text (3.5 KiB)

I'm still not sure what's going on - now the UI crash reappears (still working on the same system with the same installation data)...

generating crash report
report saved to /var/crash/1666084828.321582079.ui.crash
Traceback (most recent call last):
  File "/snap/subiquity/3977/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/snap/subiquity/3977/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/__main__.py", line 5, in <module>
    sys.exit(main())
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/cmd/tui.py", line 150, in main
    subiquity_interface.run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/client/client.py", line 407, in run
    super().run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquitycore/tui.py", line 381, in run
    super().run()
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquitycore/core.py", line 135, in run
    raise exc
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/client/controllers/filesystem.py", line 245, in _guided_choice
    self.model.load_server_data(status)
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 1092, in load_server_data
    self._actions = self._actions_from_config(
  File "/snap/subiquity/3977/lib/python3.8/site-packages/subiquity/models/filesystem.py", line 1223, in _actions_from_config
    for action in config:
TypeError: 'NoneType' object is not iterable
Connection to s1lp15 closed.

   ┌─────────────────────────────────────────────────────────────────────┐
   │ │
   │ Sorry, the installer has restarted because of an error. │
   │ │
   │ [ View full report ] │
   │ │
   │ If you want to help improve the installer, you can send an error │
   │ report. │
   │ │
   │ [ Send to Canonical ] │
   │ │
   │ Select continue to try the installation again. │
   │ │
   │ [ Continue ] │
   │ │
   └─────────────────────────────────────────────────────────────────────┘

 2022-10-18 09:20:14,450 INFO subiquity/Filesystem:107 start: starting UI
 2022-10-18 09:20:28,320 DEBUG subiquity.models.filesystem:1084 load_server_data
 StorageResponse(status=<ProbeStatus.PROBING: 1>, error_report=None, bootloader=
None, orig_config=None, config=None, blockdev=None, dasd=None, storage_version=1)
 2022-10-18 09:20:28,3...

Read more...

Revision history for this message
Frank Heimes (fheimes) wrote :
Revision history for this message
Frank Heimes (fheimes) wrote :
Revision history for this message
Frank Heimes (fheimes) wrote :

I've now used a different set of disks on the initial system and ran into the ui crash again

Revision history for this message
Frank Heimes (fheimes) wrote :

With a different set of disks I ran initially into the same ui problem (see attachment), and the ssh connection dropped.
But I could reconnect and successfully continue the installation.

Revision history for this message
Olivier Gayot (ogayot) wrote :

For the UI crash, it seems that that we interpreted a response that has the status PROBING as a response we can operate on.

This would be the sequence:

* on the client, _guided_choice is called with choice=None
* the client calls GET /storage (without wait=true) and receives a ProbeStatus.PROBING response.
* the client wrongly assumes that the response can safely be passed to FileSystemModel.load_server_data

Revision history for this message
Olivier Gayot (ogayot) wrote :

This is basically the same as https://bugs.launchpad.net/subiquity/+bug/1962205.
The fix that we did there only works if the client calls POST /storage/guided and not when the client calls GET /storage

        if choice is not None:
            coro = self.endpoint.guided.POST(choice)
        else:
            coro = self.endpoint.GET()

Frank Heimes (fheimes)
summary: - Installer crash on kinetic on s390x if multiple DASD disks are selected
+ Installer ui crash on kinetic on s390x (but probably not limited to)
Revision history for this message
Frank Heimes (fheimes) wrote (last edit ):

Thx, for having a look.

I forced my system now for exclusive DASD usage and didn't ran again into the DASD issue so far - so looks like it was used by different systems (that should not happen, happens only if there is not enough carefulness and will definitely no longer happen with the new z15 DPM system, since we have explicit storage pools there).

So please ignore the DASD issue and focus on the ui issue.

(I just did a FCP/SCSI installations and the ui issue happens there too - like expected, so just fyi.)

In the last days/weeks I mainly did autoinstall installation, probably the reason why I didn't bumped into this earlier.

Olivier Gayot (ogayot)
Changed in subiquity:
assignee: nobody → Olivier Gayot (ogayot)
status: New → In Progress
Revision history for this message
Olivier Gayot (ogayot) wrote :
Dan Bungert (dbungert)
Changed in subiquity:
status: In Progress → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

Thx a lot Olivier and Dan, for the very quick turnaround on this.
I tested 'latest/edge/test-lp-1993257' meanwhile in 6 slightly different installation cases and I'm pretty confident that this issue (ui crash) is solved.
(Please ignore the DASD related issue, like mentioned before, this was on a system level.)

Changed in ubuntu-z-systems:
status: New → Fix Committed
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Revision history for this message
Dan Bungert (dbungert) wrote :

We believe a fix for this can be found in Subiquity 22.10.1. On
install you will be offered to update to the new version of the
installer if network is available, or you can perform a manual update
by running the follwing in a terminal:
sudo snap refresh subiquity

Changed in subiquity:
status: Fix Committed → Fix Released
Revision history for this message
Frank Heimes (fheimes) wrote :
Revision history for this message
Dan Bungert (dbungert) wrote :

After discussion with Frank, I believe the way the block devices are being wiped for the logs in comments 17 and 18 leave the devices in a strange state, and the tools we use for block probing are responding poorly, which Subiquity isn't handling well.

We plan to split that info to a new bug, as users may wipe block devices in the same manner but it's unrelated to the problem this bug was filed for.

Revision history for this message
Frank Heimes (fheimes) wrote :

Thanks Dan, I've created a spin-off based on comment #17 and #18.
And I'm marking also the project entry of this as Fix Released.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.