netplan sometimes does not switch the renderer properly

Bug #1979277 reported by Alfonso Sanchez-Beato
This bug affects 1 person
Affects: Netplan
Status: Invalid
Importance: Medium
Assigned to: Unassigned

Bug Description

In some cases netplan does not switch the renderer as expected on Ubuntu Core 22. On a system with the network-manager snap from track 22 installed, I run this script to switch between renderers:

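```
#!/bin/bash -ex
# Toggle the netplan override that makes NetworkManager the default renderer,
# then ask netplan to apply the configuration over D-Bus.

netplan_cfg_p=/etc/netplan/00-default-nm-renderer.yaml
if [ -f "$netplan_cfg_p" ]; then
    # Override present: remove it so netplan's default renderer (networkd) is used
    rm "$netplan_cfg_p"
else
    # Override absent: make NetworkManager the default renderer
    cat > "$netplan_cfg_p" <<EOF
network:
  renderer: NetworkManager
EOF
fi

dbus-send --system --type=method_call --print-reply \
          --dest=io.netplan.Netplan /io/netplan/Netplan \
          io.netplan.Netplan.Apply
# Show which daemon manages each device afterwards
/snap/bin/nmcli d
networkctl
```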

After a couple of tries, I see:

```
+ rm /etc/netplan/00-default-nm-renderer.yaml
+ dbus-send --system --type=method_call --print-reply --dest=io.netplan.Netplan /io/netplan/Netplan io.netplan.Netplan.Apply
method return time=1655796091.488185 sender=:1.29 -> destination=:1.95 serial=20 reply_serial=2
   boolean true
+ /snap/bin/nmcli d
DEVICE  TYPE      STATE      CONNECTION
ens3    ethernet  connected  netplan-ens3
lo      loopback  unmanaged  --
+ networkctl
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 ens3 ether    routable    configured

2 links listed.
```

NM is still handling the device even though it is no longer the default renderer. There is a file /run/NetworkManager/system-connections/netplan-ens3.nmconnection and no files in /run/systemd/network/ (although networkd also seems to be handling the interface; I am not sure how that happens). The other netplan configuration file has this content:
```
# This is the network config written by 'console-conf'
network:
  ethernets:
    ens3:
      dhcp4: true
  version: 2
```
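To check from a script which backend netplan generated runtime configuration for, one can look for the files mentioned above. This is a minimal sketch: the function name "netplan_backend_for" and its root-directory argument (so it can be tried on a scratch tree) are hypothetical, and it assumes the "netplan-<iface>.nmconnection" and "10-netplan-<iface>.network" file names seen in this report.

```shell
# Hypothetical helper: report which backend has netplan-generated runtime
# config for one interface. File names follow the ones seen in this report;
# the first argument is a root directory ("" for the live system).
netplan_backend_for() {
    local root=$1 iface=$2
    local nm="$root/run/NetworkManager/system-connections/netplan-$iface.nmconnection"
    local nd="$root/run/systemd/network/10-netplan-$iface.network"
    if [ -e "$nm" ] && [ -e "$nd" ]; then
        echo both
    elif [ -e "$nm" ]; then
        echo NetworkManager
    elif [ -e "$nd" ]; then
        echo networkd
    else
        echo none
    fi
}
```

On the live system this would be called as netplan_backend_for "" ens3. File presence only tells you what netplan generated: as the reinstall case in this report shows, networkd can keep a link "configured" with no netplan .network file until "networkctl reload", so the result should be read together with the "networkctl" output.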

Lukas Märdian (slyon) wrote:

I have not yet been able to fully reproduce this on a classic system, but I wonder if this might help: https://github.com/canonical/netplan/pull/276

Lukas Märdian (slyon) wrote:

@alfonsosanchezbeato would you mind testing this against the version of netplan from this PPA?
https://launchpad.net/~slyon/+archive/ubuntu/lp1979277

It includes a (hacky) version of the patch from PR#276 and should improve the situation.

Changed in netplan:
status: New → Triaged
importance: Undecided → Medium
importance: Medium → Low
importance: Low → Medium
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote:

I can reproduce this if I install, then remove, then reinstall the network-manager snap. At that point there is no /run/systemd/network/10-netplan-*.network, but networkd still thinks it owns the interface. After running "sudo networkctl reload", things look good:

```
$ networkctl
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 ens3 ether    routable    configured
$ ls run/systemd/network/
10-netplan-ens3.link
$ sudo networkctl reload
$ networkctl
IDX LINK TYPE     OPERATIONAL SETUP
  1 lo   loopback carrier     unmanaged
  2 ens3 ether    routable    unmanaged
```
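To track the SETUP column before and after "networkctl reload" from a script, the table can be filtered per link. A minimal sketch with hypothetical function names, assuming the IDX LINK TYPE OPERATIONAL SETUP column layout shown above:

```shell
# Print the SETUP column for one link from `networkctl` output read on stdin.
# Assumes the five-column layout IDX LINK TYPE OPERATIONAL SETUP.
link_setup_state() {
    awk -v link="$1" '$2 == link { print $5 }'
}

# Succeed when networkd lists the link with any SETUP state other than
# "unmanaged" (reads the same `networkctl` output from stdin).
networkd_manages() {
    local state
    state=$(link_setup_state "$1")
    [ -n "$state" ] && [ "$state" != unmanaged ]
}
```

With this, "networkctl | link_setup_state ens3" would print "configured" before the reload and "unmanaged" after it, matching the transcript above.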

Lukas Märdian (slyon) wrote:

I can now reproduce the issue inside a classic Jammy LXD container, using the network-manager snap from 22/stable.

When I append the following to the reproducer script:
```
netplan get
ls -la /run/NetworkManager/system-connections
ls -la /run/systemd/network
```

I can see that netplan and sd-network are doing the right thing. All configuration files are as they should be for the default renderer displayed by "netplan get", and sd-network picks up the connection when asked to and keeps it "unmanaged" when NetworkManager is supposed to take control.
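The renderer reported by "netplan get" can be extracted for the same kind of comparison against the files in /run. A minimal sketch with a hypothetical function name, assuming the two-space indentation shown in the log below; it falls back to networkd, netplan's default renderer, when no global "renderer" key is present:

```shell
# Print the global (top-level) renderer from `netplan get` output on stdin.
# Only a key indented by exactly two spaces matches, so per-interface
# "renderer" keys are ignored. Falls back to networkd, netplan's default.
# A YAML-aware tool would be more robust than this line match.
netplan_default_renderer() {
    awk '/^  renderer:/ { r = $2 } END { print (r == "" ? "networkd" : r) }'
}
```

Used as "netplan get | netplan_default_renderer", this prints the renderer that the generated files in /run/NetworkManager/system-connections and /run/systemd/network should correspond to.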

The NetworkManager deb also seems to work correctly, but the NM snap seems to cause this problem, as can be seen in the "journalctl -u snap.network-manager.networkmanager -e" logs. It is most probably related to this:
https://git.launchpad.net/~snappy-hwe-team/snappy-hwe-snaps/+git/network-manager/tree/snap-common/bin/snap-config.sh?h=snap-22#n150

When netplan's "Apply()" D-Bus method is called, the NetworkManager systemd service is restarted. On restart, the network-manager snap installs itself as the default renderer and calls the "Apply()" method again, recursively. This doesn't look right.

IMO the problem is that calling "systemctl restart snap.network-manager.networkmanager" will reset the default renderer back to NetworkManager, as can be seen in the following log:
```
root@jj:~# ./repro-lp1979277.sh
+ netplan_cfg_p=/etc/netplan/00-default-nm-renderer.yaml
+ '[' -f /etc/netplan/00-default-nm-renderer.yaml ']'
+ cat
+ dbus-send --system --type=method_call --print-reply --dest=io.netplan.Netplan /io/netplan/Netplan io.netplan.Netplan.Apply
method return time=1656424157.108626 sender=:1.20 -> destination=:1.884 serial=252 reply_serial=2
   boolean true
+ nmcli d
DEVICE   TYPE      STATE      CONNECTION
eth0     ethernet  connected  netplan-eth0
erspan0  erspan    unmanaged  --
gre0     iptunnel  unmanaged  --
gretap0  iptunnel  unmanaged  --
ip6gre0  iptunnel  unmanaged  --
ip6tnl0  iptunnel  unmanaged  --
sit0     iptunnel  unmanaged  --
tunl0    iptunnel  unmanaged  --
lo       loopback  unmanaged  --
ip_vti0  vti       unmanaged  --
+ networkctl
IDX LINK     TYPE     OPERATIONAL SETUP
  1 lo       loopback carrier     unmanaged
  2 gre0     ipgre    off         unmanaged
  3 gretap0  ether    off         unmanaged
  4 erspan0  ether    off         unmanaged
  5 ip6tnl0  tunnel6  off         unmanaged
  6 ip6gre0  ip6gre   off         unmanaged
  7 tunl0    tunnel   off         unmanaged
  8 sit0     sit      off         unmanaged
  9 ip_vti0  tunnel   off         unmanaged
 10 ip6_vti0 tunnel6  off         unmanaged
188 eth0     ether    carrier     unmanaged

11 links listed.
+ netplan get
network:
  version: 2
  renderer: NetworkManager
  ethernets:
    eth0:
      dhcp4: true
+ ls -la /run/NetworkManager/system-connections
total 4
drwx------ 2 root root  60 Jun 28 13:49 .
drwxr-xr-x 6 root root 160 Jun 28 13:30 ..
-rw------- 1 root root 131 Jun 28 13:49 netplan-eth0.nmconnection
+ ls -la /run/systemd/network
total 0
drwxr-xr-x  2 root root  40 Jun 28 13:49 .
drwxr-xr-x 25 root root 600 Jun 28 13:4...
```


Changed in netplan:
status: Triaged → Invalid
Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote:

@Lukas, thanks for looking into this. I agree that the call to io.netplan.Netplan.Apply is causing problems, but at the same time I think that the script I wrote is wrong precisely because of the calls from the NM snap. The reinstall problem is the one that worries me: the wrapper in the NM snap calls Netplan.Apply only if the netplan config that sets NM as the default renderer does not exist, so netplan should restart NM only once (after the restart, Apply would not be called again), and in any case networkctl should no longer show the Ethernet interface as managed. I will do some more debugging later to gather more details.
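The guard described above can be sketched as follows. This is a hypothetical illustration, not the code from the network-manager snap's snap-config.sh: the function name is made up, and the apply command is passed as arguments so the logic can be exercised without netplan or D-Bus. Because the override is written before the apply is triggered, a second invocation (for example after the restart that Apply() causes) finds the file and returns without applying again.

```shell
# Hypothetical guard: write the override that makes NetworkManager the
# default renderer and run the given apply command, but only when the
# override does not exist yet. Not the actual network-manager snap code.
#
# Example (on a real system, as root):
#   ensure_nm_renderer /etc/netplan/00-default-nm-renderer.yaml \
#       dbus-send --system --type=method_call --print-reply \
#       --dest=io.netplan.Netplan /io/netplan/Netplan io.netplan.Netplan.Apply
ensure_nm_renderer() {
    local cfg=$1
    shift  # the remaining arguments form the apply command
    if [ -f "$cfg" ]; then
        return 0  # override already present: do not apply again
    fi
    printf 'network:\n  renderer: NetworkManager\n' > "$cfg"
    "$@"
}
```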

Alfonso Sanchez-Beato (alfonsosanchezbeato) wrote:

@Lukas, by the way, is there a way to see the currently *applied* netplan configuration? "netplan get" shows the content of the files, but I am not sure that is what has been applied (if I understand correctly, you could apply, then write a file in /etc/netplan/, and "netplan get" would show it even though it has not been applied yet).

Lukas Märdian (slyon) wrote:

Thanks for trying to gather some more details!

Indeed, "netplan get" only shows the configuration currently on disk (i.e. in /etc/netplan/*.yaml). This is not necessarily what is currently applied to the system.

We'll be working on a new "netplan status" command that will do precisely that (spec FO049, if you're interested), but it's not done yet.
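Until "netplan status" is available, a rough heuristic is to compare modification times: a YAML file that is newer than every netplan-generated runtime file was probably edited after the last generate/apply. This is an assumption, not netplan behavior, and it cannot notice a YAML file that was deleted (such as the override removed by the reproducer script); the function name and its root-directory argument are hypothetical.

```shell
# Rough heuristic (an assumption, not a netplan feature): print each
# /etc/netplan/*.yaml file that is newer than the newest netplan-generated
# runtime file. Deleted YAML files are not detected. The first argument is
# a root directory ("" for the live system).
netplan_unapplied_yaml() {
    local root=$1 newest="" f
    # Find the most recently modified generated file for either backend.
    for f in "$root"/run/systemd/network/10-netplan-* \
             "$root"/run/NetworkManager/system-connections/netplan-*; do
        [ -e "$f" ] || continue  # skip unmatched glob patterns
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    # Report YAML files modified after that (all of them if none exist).
    for f in "$root"/etc/netplan/*.yaml; do
        [ -e "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            echo "$f"
        fi
    done
}
```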
