ZFS + Encryption installations of Ubuntu Desktop do not come up correctly on first boot, systemd unmounts many of the zfs volumes

Bug #1993318 reported by Aaron Rainbolt
This bug affects 4 people
Affects                    Status        Importance  Assigned to     Milestone
Release Notes for Ubuntu   Fix Released  Undecided   Unassigned
Ubuntu Manual Tests        Triaged       High        Tim Andersson
ubiquity (Ubuntu)          Fix Released  Undecided   Unassigned
  Jammy                    Fix Released  High        Unassigned
zfs-linux (Ubuntu)         Confirmed     Undecided   Unassigned

Bug Description

This is *probably* the wrong package, but it's the best I can figure for this, so here goes.

Hardware: Kubuntu Focus XE, 32 GB RAM, 1 TB SSD, 11th Gen Intel Core i5, UEFI, no secure boot. Testing done in GNOME Boxes (BIOS, 4 GB RAM, 50 GB disk space). OS is Ubuntu Desktop, Kinetic Final ISO.

[Testcase]

tl;dr: on the first boot of an encrypted-ZFS install, `systemctl daemon-reload` must not unmount half of the mountpoints, e.g. /var/lib.

Steps to reproduce:

1. Boot the Ubuntu desktop ISO.
2. Select "Install Ubuntu" and proceed with the installation process.
3. When you get to the "Installation type" screen, select "Advanced Options", and enable ZFS + Encryption.
4. Proceed with the rest of the installation as normal.
5. Reboot into the newly installed system.
6. Log in.
7. Run "sudo apt update" in a terminal.

Expected result: The package database should be updated normally.

Actual result: You are presented with the following errors at the end of the apt output:

Reading package lists... Error!
E: flAbsPath on /var/lib/dpkg/status failed - realpath (2: No such file or directory)
E: Could not open file - open (2: No such file or directory)
E: Problem opening
E: The package lists or status file could not be parsed or opened.

Notes: Switching to a TTY will print a crash error message related to the same missing /var/lib/dpkg/status file. Running "sudo touch /var/lib/dpkg/status" will allow "sudo apt update" to function and fix the crashed process in the TTY.
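
To tie the apt failure directly to the unmounted dataset, a quick check (a sketch; the `ubuntu_h8tofc` dataset name is taken from a `zfs list` shown later in this report and will differ on each install):

$ zfs list -H -o name,mounted | grep var/lib/dpkg
rpool/ROOT/ubuntu_h8tofc/var/lib/dpkg	no
$ sudo zfs mount -a
$ ls /var/lib/dpkg/status && sudo apt update    # both work again once the dataset is mounted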

[End Testcase]

Once you log in, you'll notice that Firefox is missing (bug #1993279), and you will likely be presented with a ton of error messages and other scary junk. At least one of those error messages was related to update-manager in my experience, and another one was from "check-new-release-gtk".

ProblemType: Bug
DistroRelease: Ubuntu 22.10
Package: zsys (not installed)
ProcVersionSignature: Ubuntu 5.19.0-21.21-generic 5.19.7
Uname: Linux 5.19.0-21-generic x86_64
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
ApportVersion: 2.23.1-0ubuntu3
Architecture: amd64
CasperMD5CheckResult: pass
CurrentDesktop: ubuntu:GNOME
Date: Tue Oct 18 09:55:27 2022
InstallationDate: Installed on 2022-10-18 (0 days ago)
InstallationMedia: Ubuntu 22.10 "Kinetic Kudu" - Release amd64 (20221018)
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no username)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: zsys
UpgradeStatus: No upgrade log present (probably fresh install)


Revision history for this message
Aaron Rainbolt (arraybolt3) wrote :
description: updated
Revision history for this message
Steve Langasek (vorlon) wrote :

"this is probably the wrong package" - do you have zsys installed? We stopped installing zsys in jammy and it shouldn't have been installed as part of a kinetic install

Changed in zsys (Ubuntu):
status: New → Incomplete
Revision history for this message
Brian Murray (brian-murray) wrote :
Revision history for this message
Charles Hedrick (hedrick) wrote :

Note that there are serious bugs with ZFS encryption. Most involve send | receive, but not all. I strongly recommend against using it.

Revision history for this message
Brian Murray (brian-murray) wrote :

Rebooting after install I noticed that /var/lib/dpkg/ and /var/lib/apt/ are both empty in the installed system.

bdmurray@bdmurray-Standard-PC-Q35-ICH9-2009:~$ zfs list
NAME USED AVAIL REFER MOUNTPOINT
bpool 135M 1.37G 96K /boot
bpool/BOOT 134M 1.37G 96K none
bpool/BOOT/ubuntu_h8tofc 134M 1.37G 134M /boot
rpool 6.58G 4.08G 192K /
rpool/ROOT 6.06G 4.08G 192K none
rpool/ROOT/ubuntu_h8tofc 6.06G 4.08G 4.39G /
rpool/ROOT/ubuntu_h8tofc/srv 192K 4.08G 192K /srv
rpool/ROOT/ubuntu_h8tofc/usr 580K 4.08G 192K /usr
rpool/ROOT/ubuntu_h8tofc/usr/local 388K 4.08G 388K /usr/local
rpool/ROOT/ubuntu_h8tofc/var 1.67G 4.08G 192K /var
rpool/ROOT/ubuntu_h8tofc/var/games 192K 4.08G 192K /var/games
rpool/ROOT/ubuntu_h8tofc/var/lib 1.67G 4.08G 1.52G /var/lib
rpool/ROOT/ubuntu_h8tofc/var/lib/AccountsService 212K 4.08G 212K /var/lib/AccountsService
rpool/ROOT/ubuntu_h8tofc/var/lib/NetworkManager 224K 4.08G 224K /var/lib/NetworkManager
rpool/ROOT/ubuntu_h8tofc/var/lib/apt 88.2M 4.08G 88.2M /var/lib/apt
rpool/ROOT/ubuntu_h8tofc/var/lib/dpkg 57.7M 4.08G 57.7M /var/lib/dpkg
rpool/ROOT/ubuntu_h8tofc/var/log 1.54M 4.08G 1.54M /var/log
rpool/ROOT/ubuntu_h8tofc/var/mail 192K 4.08G 192K /var/mail
rpool/ROOT/ubuntu_h8tofc/var/snap 456K 4.08G 456K /var/snap
rpool/ROOT/ubuntu_h8tofc/var/spool 264K 4.08G 264K /var/spool
rpool/ROOT/ubuntu_h8tofc/var/www 192K 4.08G 192K /var/www
rpool/USERDATA 5.16M 4.08G 192K /
rpool/USERDATA/bdmurray_jx3u59 4.70M 4.08G 4.70M /home/bdmurray
rpool/USERDATA/root_jx3u59 272K 4.08G 272K /root
rpool/keystore 518M 4.52G 63.4M -

Revision history for this message
Steve Langasek (vorlon) wrote :

> rpool/ROOT/ubuntu_h8tofc/var/lib/apt 88.2M 4.08G 88.2M /var/lib/apt
> rpool/ROOT/ubuntu_h8tofc/var/lib/dpkg 57.7M 4.08G 57.7M /var/lib/dpkg

So the volumes exist; are they mounted? Please show the output of 'mount'.

Revision history for this message
Brian Murray (brian-murray) wrote :

bdmurray@bdmurray-Standard-PC-Q35-ICH9-2009:~$ mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
udev on /dev type devtmpfs (rw,nosuid,relatime,size=1932912k,nr_inodes=483228,mode=755,inode64)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=399688k,mode=755,inode64)
/dev/mapper/keystore-rpool on /run/keystore/rpool type ext4 (rw,relatime,stripe=2)
rpool/ROOT/ubuntu_h8tofc on / type zfs (rw,relatime,xattr,posixacl)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev,inode64)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k,inode64)
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
bpf on /sys/fs/bpf type bpf (rw,nosuid,nodev,noexec,relatime,mode=700)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=15094)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
mqueue on /dev/mqueue type mqueue (rw,nosuid,nodev,noexec,relatime)
debugfs on /sys/kernel/debug type debugfs (rw,nosuid,nodev,noexec,relatime)
tracefs on /sys/kernel/tracing type tracefs (rw,nosuid,nodev,noexec,relatime)
configfs on /sys/kernel/config type configfs (rw,nosuid,nodev,noexec,relatime)
fusectl on /sys/fs/fuse/connections type fusectl (rw,nosuid,nodev,noexec,relatime)
ramfs on /run/credentials/systemd-sysusers.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
/dev/vda1 on /boot/efi type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
/dev/vda1 on /boot/grub type vfat (rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
bpool/BOOT/ubuntu_h8tofc on /boot type zfs (rw,nodev,relatime,xattr,posixacl)
rpool/ROOT/ubuntu_h8tofc/var/lib on /var/lib type zfs (rw,relatime,xattr,posixacl)
rpool/ROOT/ubuntu_h8tofc/var/log on /var/log type zfs (rw,relatime,xattr,posixacl)
rpool/ROOT/ubuntu_h8tofc/var/spool on /var/spool type zfs (rw,relatime,xattr,posixacl)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)
tmpfs on /run/snapd/ns type tmpfs (rw,nosuid,nodev,noexec,relatime,size=399688k,mode=755,inode64)
nsfs on /run/snapd/ns/firefox.mnt type nsfs (rw)
rpool/ROOT/ubuntu_h8tofc/var/snap on /var/snap type zfs (rw,relatime,xattr,posixacl)
rpool/ROOT/ubuntu_h8tofc/var/lib/AccountsService on /var/lib/AccountsService type zfs (rw,relatime,xattr,posixacl)
rpool/ROOT/ubuntu_h8tofc on /var/snap/firefox/common/host-hunspell type zfs (ro,noexec,noatime,xattr,posixacl)
tmpfs on /run/user/1000 type tmpfs (rw,nosuid,nodev,relatime,size=399684k,nr_inodes=99921,mode=700,uid=1000,gid=1000,inode64)
rpool/USERDATA/bdmurray_jx3u59 on /home/bdmurray type zfs (rw,relatime,xattr,posixa...


Revision history for this message
Steve Langasek (vorlon) wrote :

zsys doesn't get installed in the target system and removing it from the live environment before running ubiquity does not fix the problem, so marking this task invalid.

Changed in zsys (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Brian Murray (brian-murray) wrote :

Running `sudo zfs mount -a` caused /var/lib/apt and /var/lib/NetworkManager, probably among others, to be mounted. Additionally, on a second reboot of the system /var/lib/apt, /var/lib/dpkg, and /var/lib/NetworkManager were all mounted and available.

Revision history for this message
Nick Rosbrook (enr0n) wrote :
Revision history for this message
Nick Rosbrook (enr0n) wrote :

Here are some logs from the first boot of a buggy ZFS + encryption installation.

Revision history for this message
Steve Langasek (vorlon) wrote :

Fresh install of kinetic 20221018 Ubuntu image, first boot, I see:

$ zfs list | grep -c pool/ROOT/ubuntu_
16
$ mount | grep -c pool/ROOT/ubuntu_
7
$

So half of the mount points are not mounted as they are supposed to be.

As I was trying to log in on first boot, I also found that gdm flickered off; then I saw messages on the console from systemd about it trying to, and *failing* to, unmount certain mountpoints under /var; then gdm restarted, and when I logged in, half the mountpoints were not where they are supposed to be.

Revision history for this message
Steve Langasek (vorlon) wrote :

/run/log/journal from first boot on an affected kinetic install.

Revision history for this message
Brian Murray (brian-murray) wrote :

I was not able to recreate this issue on first boot after an install of Ubuntu 22.04.1 (with serial 20220809.1).

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubiquity - 22.10.10

---------------
ubiquity (22.10.10) kinetic; urgency=medium

  * Temporarily disable zfs + encryption option (LP: #1993318)

 -- William 'jawn-smith' Wilson <email address hidden> Tue, 18 Oct 2022 13:35:37 -0500

Changed in ubiquity (Ubuntu):
status: New → Fix Released
Revision history for this message
Steve Langasek (vorlon) wrote :

/var/log/journal from second boot on the same kinetic install.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I installed the encrypted-ZFS kinetic 20221018 ISO build.

Shut down the VM and made a copy of it, such that I can "first boot" all day long.

On first boot, I added the break=bottom command-line option, and changed journald.conf to use Storage=volatile, so that all logs remain in /run/log without being flushed to /var/log, which may or may not be corrupted during first boot.

On first boot, the machine doesn't reach the default graphical target. Slightly after firefox.snap is seeded, something (snapd?) calls `systemctl daemon-reload`, the system loses its marbles and tries to unmount all the things, and it finally reaches the graphical target only if one switches to tty1 to push it along.

I will attach a partial log showing the above, and the full journal.
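
For reference, the journald change mentioned above can be made with a drop-in (a sketch; the drop-in file name is arbitrary, see journald.conf(5)):

$ sudo mkdir -p /etc/systemd/journald.conf.d
$ printf '[Journal]\nStorage=volatile\n' | sudo tee /etc/systemd/journald.conf.d/volatile.conf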

Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Revision history for this message
Dimitri John Ledkov (xnox) wrote :
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

1) Doing a first boot of encrypted-ZFS kinetic, edit the boot cmdline to include:

systemd.mask=snapd.service systemd.mask=snapd.seeded.service systemd.mask=snapd.socket

Crucially, this prevents snapd seeding from completing, which is what calls `systemctl daemon-reload`.

The system boots normally.

2) Log into a tty and check the mounts (`mount | grep zfs | wc -l` gives 17).

3) `systemctl daemon-reload` => causes issues with -.mount, stops a bunch of stuff, restarts gdm, and unmounts a bunch of stuff.

4) Log into a tty again and check the mounts; there are now just 7 zfs mounts.

5) Run `sudo zfs mount -a` to get back to 17 mounts.

Somehow there is something here that systemd does not like upon daemon-reload.
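
Condensed into a transcript, steps 2-5 look like this (a sketch; the mount counts are the ones observed above):

$ mount | grep zfs | wc -l
17
$ sudo systemctl daemon-reload
$ mount | grep zfs | wc -l
7
$ sudo zfs mount -a
$ mount | grep zfs | wc -l
17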

Changed in snapd (Ubuntu):
status: New → Incomplete
Changed in zfs-linux (Ubuntu):
status: New → Incomplete
Changed in systemd (Ubuntu):
importance: Undecided → Critical
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Subsequent boots are fine.

I wonder if we have a race somewhere. For example: on first boot the zfs cache is out of date, and zfs-mount-generator doesn't generate mount units for all of the mounted file systems.

Then the first-ever daemon-reload happens after the zfs cache has been populated and the zfs mounts are emitted for the first time.

Maybe mounting all subvolumes should happen from the initrd, as part of the pivot root.

Or at least building the zfs cache should, so that the first boot's generator run is correct.

Changed in zfs-linux (Ubuntu):
status: Incomplete → New
Revision history for this message
Jose Ogando Justo (joseogando) wrote :

I have gathered a diff of the `zfs get all` properties before and after mounting.

I observe no difference other than the `mounted` property changing from no to yes.
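
For reference, one way to gather such a diff (a sketch; the dataset name is an example from the `zfs list` output earlier in this report):

$ zfs get -H all rpool/ROOT/ubuntu_h8tofc/var/lib > /tmp/props.before
$ sudo zfs mount -a
$ zfs get -H all rpool/ROOT/ubuntu_h8tofc/var/lib > /tmp/props.after
$ diff /tmp/props.before /tmp/props.after    # per the observation above, only "mounted" changes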

Revision history for this message
Jose Ogando Justo (joseogando) wrote :

After zfs mount -a

Steve Langasek (vorlon)
summary: - ZFS + Encryption installations of Ubuntu Desktop suffer various severe
- problems related to the package manager
+ ZFS + Encryption installations of Ubuntu Desktop do not come up
+ correctly on first boot, systemd unmounts many of the zfs volumes
Revision history for this message
Dimitri John Ledkov (xnox) wrote (last edit ):

On first boot, /etc/zfs/zfs-list.cache/rpool and /etc/zfs/zfs-list.cache/bpool are empty.

During boot they eventually get populated by zfs-zed.service with the material information that zfs-mount-generator uses on first boot and on any daemon-reload.

This causes that daemon-reload to go bananas, as zfs-mount-generator produces units for the first time and generates many essential mount units. After that, with the cache populated, the generator runs on boot and on daemon-reload produce the same units, and everything and everyone is happy.
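
The cache files and the generator output can be inspected directly (a sketch; paths per zfs-mount-generator(8) and systemd.generator(7)):

$ ls /etc/zfs/zfs-list.cache/
bpool  rpool
$ wc -l /etc/zfs/zfs-list.cache/rpool    # 0 lines on an affected first boot
$ ls /run/systemd/generator/ | grep -c '\.mount'    # grows once the cache is populated and daemon-reload runs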

... until one boots a snapshot. I bet booting a snapshot results in an incoherent cache.

I wonder if the installer could generate the correct and expected zfs-list.cache in the target system.

Or I wonder if our zfs-mount-generator(8) in Ubuntu is out of date (because of the zsys support).
Or if there has been some systemd regression.

I wonder if we can make zfs-mount-generator do nothing after the boot has started.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

https://code.launchpad.net/~xnox/ubiquity/+git/ubiquity/+merge/431831 should solve first boot post-install.

However, I don't think that will solve booting a snapshot, or booting whenever the on-disk cache is missing/corrupted/out of date.

Seems quite scary.
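
For illustration, the general shape of such a fix (a hypothetical sketch, not the contents of the merge proposal above: snapshot each pool's `zfs list` output into the target's cache files, using the same property list that the zfs-list-cacher ZEDLET writes, which varies by OpenZFS version):

# run from the installer, with the target system mounted at /target
PROPS=name,mountpoint,canmount,atime,relatime,devices,exec,readonly,setuid,nbmand,encroot,keylocation,org.openzfs.systemd:requires,org.openzfs.systemd:requires-mounts-for,org.openzfs.systemd:before,org.openzfs.systemd:after,org.openzfs.systemd:wanted-by,org.openzfs.systemd:required-by,org.openzfs.systemd:nofail,org.openzfs.systemd:ignore
mkdir -p /target/etc/zfs/zfs-list.cache
for pool in bpool rpool; do
    zfs list -H -t filesystem -o "$PROPS" -r "$pool" > "/target/etc/zfs/zfs-list.cache/$pool"
done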

tags: added: foundations-triage-discuss
Revision history for this message
Brian Murray (brian-murray) wrote :

There is no test case for this in ubuntu-manual-tests, the code that contains the test cases listed in the ISO tracker. If this is a supported installation method then we should have a test case for it.

Changed in ubuntu-release-notes:
status: New → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Changed in zfs-linux (Ubuntu):
status: New → Confirmed
Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu):
status: Confirmed → Invalid
tags: removed: foundations-triage-discuss
Changed in snapd (Ubuntu):
status: Incomplete → Invalid
Revision history for this message
Łukasz Zemczak (sil2100) wrote :

Dimitri, thank you for the upload! I'd like to accept it soonish, but since this SRU bug is missing the usual Test Case, Impact and Regression Potential sections, could you fill those in before I accept it? I'd like to know more about the fix that you performed, what it changes, and what could go wrong with it.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :
no longer affects: snapd (Ubuntu Jammy)
no longer affects: systemd (Ubuntu Jammy)
Changed in ubiquity (Ubuntu Jammy):
importance: Undecided → High
milestone: none → ubuntu-22.04.2
status: New → Triaged
no longer affects: zfs-linux (Ubuntu Jammy)
no longer affects: zsys (Ubuntu Jammy)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

@sil2100 I have marked out the test case in the bug report more clearly.

description: updated
Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Aaron, or anyone else affected,

Accepted ubiquity into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/ubiquity/22.04.19 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in ubiquity (Ubuntu Jammy):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-jammy
Revision history for this message
Aaron Rainbolt (arraybolt3) wrote :

I can verify that the latest version of Ubiquity from -proposed fixes this problem - I booted Ubuntu Desktop 22.04.1, enabled -proposed on the ISO, installed Ubiquity from -proposed, then did a ZFS+encryption installation. Rebooted, logged in, "sudo apt update" behaves normally, Firefox launches and works properly out of the box, and no errors are present when I log into a TTY.
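
Spelled out, the verification setup looks roughly like this (a sketch of the EnableProposed procedure linked above, run in the live session):

$ echo "deb http://archive.ubuntu.com/ubuntu jammy-proposed main universe" | sudo tee /etc/apt/sources.list.d/ubuntu-proposed.list
$ sudo apt update
$ sudo apt install -t jammy-proposed ubiquity    # pulls 22.04.19 from -proposed
# then run the installer with ZFS + Encryption and re-run the testcase above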

tags: added: verification-done verification-done-jammy
removed: verification-needed verification-needed-jammy
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ubiquity - 22.04.19

---------------
ubiquity (22.04.19) jammy; urgency=medium

  * zsys-setup: generate correct zfs-list.cache for target (LP: #1993318)

 -- Dimitri John Ledkov <email address hidden> Tue, 10 Jan 2023 16:26:48 +0000

Changed in ubiquity (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for ubiquity has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Tom Reynolds (tomreyn) wrote :

This may be a dupe of bug 1970066.

Mathew Hodson (mhodson)
no longer affects: zsys (Ubuntu)
no longer affects: systemd (Ubuntu)
no longer affects: snapd (Ubuntu)
Changed in ubuntu-manual-tests:
status: New → Triaged
importance: Undecided → High
assignee: nobody → Tim Andersson (andersson123)
Revision history for this message
Dimitri John Ledkov (xnox) wrote :

BTW

this bug was closed when the fix migrated in the Lunar release.

This bug report is missing the Lunar fix-released record:

ubiquity (23.04.2) lunar; urgency=medium

  * Fix lunar's pep8, not cought with local rebuild due to noautomatic.

ubiquity (23.04.1) lunar; urgency=medium

  [ Sebastien Bacher ]
  * d-i/source/user-setup/debian/user-setup-udeb.templates:
    remove 'lxd' from user-default-groups and add 'users' instead,
    it will let lxd default to unprivileged containers which is better
    see https://discourse.ubuntu.com/t/easy-multi-user-lxd-setup for details
    (lp: #1949115)

  [ Dimitri John Ledkov ]
  * Re-enable zfs encryption
  * zsys-setup: generate correct zfs-list.cache for target (LP: #1993318)

 -- Dimitri John Ledkov <email address hidden> Mon, 28 Nov 2022 16:48:59 +0000

Revision history for this message
Tim Andersson (andersson123) wrote :

Here's an MP for ubuntu-manual-tests with this new testcase for the legacy image:

https://code.launchpad.net/~ubuntu-testcase/ubuntu-manual-tests/+git/ubuntu-manual-tests/+merge/451989
