VLAN network device lost after the Ubuntu 23.04 installation completes

Bug #2022030 reported by mayijie
This bug affects 1 person

Affects                  Status      Importance  Assigned to  Milestone
Ubuntu on IBM z Systems  Incomplete  Undecided   Unassigned
netplan.io (Ubuntu)      New         Undecided   Unassigned
subiquity (Ubuntu)       New         Undecided   Unassigned

Bug Description

ubuntu@ibm:~$ cat /etc/os-release
PRETTY_NAME="Ubuntu 23.04"
NAME="Ubuntu"
VERSION_ID="23.04"
VERSION="23.04 (Lunar Lobster)"

System hardware: IBM LinuxONE

Details:
During the installation, I set up and activated a VLAN device. After the installation, when I logged in to the OS, the VLAN device had disappeared. I opened '/etc/netplan/00-installer-config.yaml' and double-checked the config file: VLAN 1300 is present. Then, when I ran 'netplan apply' to apply the configuration, an error message showed up:

ubuntu@ibm:~$ sudo netplan apply
Cannot call openvswitch: ovsdb-server.service is not running.

And during the boot stage, the 'Job systemd-networkd-wait-online.se…tart running' message shows up many times in the log; I'm not sure whether these are related.

[ **] Job systemd-networkd-wait-online.se…tart running (1min 11s / no limit)
[ *] Job systemd-networkd-wait-online.se…tart running (1min 12s / no limit)
[ **] Job systemd-networkd-wait-online.se…tart running (1min 12s / no limit)
[ ***] Job systemd-networkd-wait-online.se…tart running (1min 13s / no limit)
[ *** ] Job systemd-networkd-wait-online.se…tart running (1min 13s / no limit)
[ *** ] Job systemd-networkd-wait-online.se…tart running (1min 14s / no limit)
[*** ] Job systemd-networkd-wait-online.se…tart running (1min 14s / no limit)
[** ] Job systemd-networkd-wait-online.se…tart running (1min 15s / no limit)

Attached the operating system boot log.

Thanks.

Tags: netplan
mayijie (mayijie56) wrote :
Frank Heimes (fheimes) wrote :

Hello, it looks to me that you haven't done a default installation.

A default installation in a vlan network environment leads to such a netplan yaml file (/etc/netplan/00-installer-config.yaml):
--------%<----------------%<----------------%<----------------%<--------
# This is the network config written by 'subiquity'
network:
  ethernets:
    encc000: {}
  version: 2
  vlans:
    encc000.2653:
      addresses:
      - 10.245.236.15/24
      gateway4: 10.245.236.1
      id: 2653
      link: encc000
      nameservers:
        addresses:
        - 10.245.236.1
--------%<----------------%<----------------%<----------------%<--------
(I just did an installation from scratch, trying to reproduce.)

Yours is this:
--------%<----------------%<----------------%<----------------%<--------
# This is the network config written by 'subiquity'
network:
 ethernets:
   enc1000: {}
 version: 2
 vlans:
   enc1000.1300:
     addresses:
     - 10.20.103.65/24
     routes:
       - to: default
         via: 10.20.103.254
     id: 1300
     link: enc1000
     nameservers:
       addresses:
       - 10.20.0.2
--------%<----------------%<----------------%<----------------%<--------

Notice 'gateway4' vs 'routes ...'
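
The only difference between the two files above is how the default route is spelled. As a minimal sketch, assuming the YAML has already been loaded into plain Python dicts by some YAML parser, the rewrite from 'gateway4' to a 'routes' entry could look like this. The function name `gateway4_to_routes` is made up for illustration and is not part of netplan or subiquity.

```python
import copy


def gateway4_to_routes(iface_cfg):
    """Return a copy of one netplan interface mapping in which the
    'gateway4' key is replaced by an equivalent default route entry."""
    cfg = copy.deepcopy(iface_cfg)
    gateway = cfg.pop("gateway4", None)
    if gateway is not None:
        # Same shape as the 'routes' block in the reporter's file.
        cfg.setdefault("routes", []).append({"to": "default", "via": gateway})
    return cfg


# Hypothetical input mirroring the vlan entry shown above.
vlan = {
    "addresses": ["10.20.103.65/24"],
    "gateway4": "10.20.103.254",
    "id": 1300,
    "link": "enc1000",
}
print(gateway4_to_routes(vlan)["routes"])
```

The input mapping is left untouched, so both spellings can be compared side by side.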

Do you have an openvswitch (ovs) in your environment? If so, how is it setup?
Have you done any other network related (re-)configurations?

Can you please perform a default installation w/o ovs?

Could you also share the output of:
$ lszdev --online

Is your system a DPM system?
In case yes, can you also share:
sudo lszdev --auto-conf

Can you please share the exact settings that you've specified in your early network configuration step?
So after:
"Attempt interactive netboot from a URL?"
in the console?
(
like in my case:
Attempt interactive netboot from a URL?
yes no (default yes):
yes
Available qeth devices:
0.0.c000 0.0.c003 0.0.c006
zdev to activate (comma separated, optional):
c000
[ 83.508886] lcs: Loading LCS driver
[ 83.576712] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
[ 83.578446] qeth: register layer 2 discipline
[ 83.579381] qeth 0.0.c000: CHID: 14c CHPID: c0
[ 83.587240] qeth 0.0.c002: qdio: OSA on SC bae using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W
[ 83.612746] qeth 0.0.c000: Device is a OSD Express card (level: 0777)
[ 83.612746] with link type OSD_10GIG.
[ 83.613174] qeth 0.0.c000: The device represents a Bridge Capable Port
[ 83.614065] qeth 0.0.c000: MAC address 9e:30:60:41:bf:f8 successfully registered
[ 83.615392] qeth 0.0.c000 encc000: renamed from eth0
A manual update of the initial RAM-disk is required.
QETH device 0.0.c000:0.0.c001:0.0.c002 configured
Note: The initial RAM-disk must be updated for these changes to take effect:
- QETH device 0.0.c000:0.0.c001:0.0.c002
Two methods available for IP configuration:
* static: for static IP configuration
* dhcp: for automatic IP configuration
static dhcp (default 'dhcp'):
static
ip:
10.245.236.15
netmask (default 255.255.255.0):

gateway (default 10.245.236.1):

dns (default 10.245.236.1):

vlan id (optional):
2653
Available interfaces:
encc000
device (default encc000):
encc000
...

Changed in ubuntu-z-systems:
status: New → Incomplete

mayijie (mayijie56) wrote :

Hi Frank,

Thank you very much for the rapid and detailed reply. Here are my answers:

1) Notice 'gateway4' vs 'routes ...'
>> Yes, the default installation uses 'gateway4', but a 'gateway4 has been deprecated' warning popped up when I ran 'netplan apply' to generate the VLAN interface. To eliminate this warning, I changed 'gateway4' to 'routes' manually and ran 'apply' again; the warning disappeared, but the VLAN interface was still not there.

2) Do you have an openvswitch (ovs) in your environment? If so, how is it setup?
>> I don't think so. The only things I added were a 0.0.1000 OSA device and a VLAN based on 0.0.1000; the VLAN interface carries the IP address.

3) Have you done any other network related (re-)configurations?
>> No, except the manual update in step 1)

4) Can you please perform a default installation w/o ovs?
>> I'll try this, but in my previous successful trials on 20.04, 22.04... no ovs was configured.

5) Could you also share the output of:
$ lszdev --online
>>
ubuntu@ibm:~$ sudo lszdev --online
[sudo] password for ubuntu:

TYPE       ID                                              ON   PERS  NAMES
zfcp-host  0.0.9000                                        yes  yes
zfcp-host  0.0.9100                                        yes  yes
zfcp-lun   0.0.9000:0x5001738030bb0141:0x0000000000000000  yes  yes   sg2
zfcp-lun   0.0.9000:0x5001738030bb0141:0x0001000000000000  yes  yes   sdb sg3
zfcp-lun   0.0.9000:0x5001738030bb0151:0x0000000000000000  yes  yes   sg0
zfcp-lun   0.0.9000:0x5001738030bb0151:0x0001000000000000  yes  yes   sda sg1
zfcp-lun   0.0.9100:0x5001738030bb0142:0x0000000000000000  yes  yes   sg4
zfcp-lun   0.0.9100:0x5001738030bb0142:0x0001000000000000  yes  yes   sdc sg5
zfcp-lun   0.0.9100:0x5001738030bb0152:0x0000000000000000  yes  yes   sg6
zfcp-lun   0.0.9100:0x5001738030bb0152:0x0001000000000000  yes  yes   sdd sg7
qeth       0.0.1000:0.0.1001:0.0.1002                      yes  yes   enc1000

6) Is your system a DPM system?
In case yes, can you also share:
sudo lszdev --auto-conf
>> Yes,
ubuntu@ibm:~$ sudo lszdev --auto-conf
TYPE       ID                                              AUTO
zfcp-host  0.0.9000                                        yes
zfcp-host  0.0.9100                                        yes
zfcp-host  0.0.9200                                        yes
zfcp-host  0.0.9300                                        yes
zfcp-lun   0.0.9100:0x5001738030bb0142:0x0001000000000000  yes
qeth       0.0.1000:0.0.1001:0.0.1002                      yes

7) Can you please share the exact settings that you've specified in your early network configuration step?
>> I'll attach the full interaction messages in this ticket while installing. (ubuntu-23.04-install-osm)

8) Also, what have you specified in the semi-graphical (subiquity) installer screen?
>> Please see the screen shot 1 and 2 (network in subiquity 1.png; network in subiquity 2.png)

9) Please would you also share the entire /var/log/installer folder?
>> Zipped the folder and will attach it soon. (install.tar)

10) And thank you for the 'systemd-networkd-wait-online' sharing, I'll follow up ...

mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
Frank Heimes (fheimes) wrote :

Hello mayijie,
thanks for the additional info and data.

Based on this I tried to recreate on a local DPM system here (in a configuration that is as close as possible), but was not able - in other words: it worked fine for me.

I stayed with gateway4, since it's just a warning, no need to do anything (it's on us/Canonical to change this before the deprecation warning will turn into 'no longer supported').

I also attached multiple NICs, some that are connected and have a link and others that are not (to force the timeout during the install), but also this didn't cause any issues.

(see attached attempt to recreate)

But your case reminds me about LP#1996007.
May I ask if you are working on the same system and environment as "<email address hidden>"?

We did quite some investigations and I am pretty sure that this (his and maybe your) issue is based on the configuration of that local system.

Another indicator is that nobody else ever reported this - but it seems to be in that env. the same for 22.04 and 23.04.

mayijie (mayijie56) wrote :

Hi Frank,

Thanks for investigating.

Yes, Wei is my colleague. I contacted him, and his trial was on a similar system with almost the same configuration as my testing this time.

Following your and Wei's last comments in 1996007, I reproduced the same scenario, with the only change being that I set the network interface from trunk mode to access mode (leaving the 'vlan id (optional)' field blank). In this case, the IP address was not lost. (Actually, in the previous failed case, the enc** interface also existed; the lost interface was the VLAN interface, and since the IP address was assigned to the VLAN in trunk mode, the IP address was lost as well.)

This is the YAML config file I used with the access mode adapter today:

*************************************************************
ubuntu@ibm:~$ cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enc1000:
      addresses:
      - 10.20.92.248/24
      gateway4: 10.20.92.254
      nameservers:
        addresses:
        - 10.20.0.2
  version: 2
********************************************************

From this behavior, I understand your comment: "That let's me assume that it's related to the current switch configuration.", but the strange thing is that I have installed Ubuntu with a trunk mode network since older Ubuntu versions, and the installation always succeeded: the IP was assigned to the VLAN interface, and nothing was lost.

As a comparison, I installed Ubuntu 20.04 using trunk mode today, and the VLAN interface remained after the installation. This issue (VLAN interface lost) seems to exist only in 22.04 and 23.04.

I'll attach the install logs from today's attempts; I hope they help.

install 23.04 in access mode, success --> 2304-access-success-installer.tar
install 20.04 in trunk mode, success --> 2004-trunk-success-installer.tar
install 23.04 in trunk mode, vlan interface lost --> please refer to comment #7

Thanks.

mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :

Hi Frank,

More testing: in trunk mode, I used 'ip link' commands to

1. bring down enc1000
2. add VLAN 1300
3. assign the IP to VLAN 1300
4. bring up enc1000
5. bring up the VLAN interface

The VLAN interface appears in the 'ip a' output, and the IP is usable, but after I reboot the operating system, the VLAN interface is lost again.
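
The exact commands are only in the attached log, so the following is a sketch of what the five steps typically look like with iproute2, not a copy of what was actually typed. The helper name `vlan_bringup_commands` is hypothetical, and assigning the address with 'ip addr add' (rather than an 'ip link' subcommand) is an assumption.

```python
def vlan_bringup_commands(parent, vlan_id, address):
    """Build iproute2 command lines for the five manual steps:
    parent down, add VLAN, assign address, parent up, VLAN up."""
    vlan = f"{parent}.{vlan_id}"
    return [
        ["ip", "link", "set", "dev", parent, "down"],
        ["ip", "link", "add", "link", parent, "name", vlan,
         "type", "vlan", "id", str(vlan_id)],
        ["ip", "addr", "add", address, "dev", vlan],
        ["ip", "link", "set", "dev", parent, "up"],
        ["ip", "link", "set", "dev", vlan, "up"],
    ]


# Print the commands instead of running them; they need root privileges.
for argv in vlan_bringup_commands("enc1000", 1300, "10.20.103.65/24"):
    print("sudo " + " ".join(argv))
```

Nothing created this way is written to a config file, which is consistent with the interface being gone again after a reboot.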

Please refer to the bottom part of 'ip link down and up log'.

Thanks.

mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :

Hi Frank,

In your comment #8, you mentioned that testing in your DPM environment is fine. May I know your DPM version? My testing is on DPM R5.1.

These days, I tried adding 'rd.zdev=no-auto' to the list of Linux kernel parameters to disable the auto-configuration, but no luck; the issue is still there.

Thanks.

Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi mayijie,

May I ask you to create a tarball with all the files from /var/run/udev/rules.d and /var/run/systemd/network, right after booting the problematic system, and attach to the case, please?

Also, please include in the tarball the systemd journal from the systemd-networkd and systemd-udevd services:

journalctl -u systemd-networkd > systemd-networkd.log

journalctl -u systemd-udevd > systemd-udevd.log

This information will help us to understand what happened during the network configuration.

Thank you!

mayijie (mayijie56) wrote :

Hi Danilo,

Thanks, I'll attach the logs,

/var/run/udev/rules.d ==> rules.tar
/var/run/systemd/network ==> network.tar

And here are the current interfaces in the system, and the netplan config file:

**********************
ubuntu@a90ubut15:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enc1000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 02:ca:72:80:ac:b0 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::ca:72ff:fe80:acb0/64 scope link
       valid_lft forever preferred_lft forever
**************************

ubuntu@a90ubut15:~$ sudo cat /etc/netplan/00-installer-config.yaml
# This is the network config written by 'subiquity'
network:
  ethernets:
    enc1000: {}
  version: 2
  vlans:
    enc1000.1300:
      addresses:
      - 10.20.103.65/24
      gateway4: 10.20.103.254
      id: 1300
      link: enc1000
      nameservers:
        addresses:
        - 10.20.0.2
********************************

If you need any more traces, logs or tests, please feel free to let me know.

Thanks

mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
mayijie (mayijie56) wrote :
Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi there,

Thank you for the log files.

Your configuration looks fine.

This line from your systemd-networkd journal is suspicious though:

----
Jun 13 06:59:19 a90ubut15 systemd-networkd[980]: enc1000: Could not create stacked netdev: No such device
----

Apparently, when this error doesn't pop up, your vlan is created. It smells like a race in systemd or something like that.

Can you enable the debug logs and attach to the ticket, please?

Create the file /etc/systemd/system/systemd-networkd.service.d/override.conf with this content:

[Service]
Environment=SYSTEMD_LOG_LEVEL=debug

and reboot your system. Then collect the systemd-networkd journal again and attach to the ticket.

Thank you!

mayijie (mayijie56) wrote :

Hi Danilo,

Here is the new log file, thanks~

mayijie (mayijie56) wrote :
Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi, thanks for the logs.

Was the vlan created (and persisted) after this last reboot? Based on the logs, seems like it worked this time?

Thanks!

mayijie (mayijie56) wrote :

Hi Danilo,

During the installation, I think the vlan should be ok, because I can 'ssh installer@<ip>' to continue the installation.

After the installation completed, it requested a reboot. After the reboot completed, I found I couldn't 'ssh <uid>@<ip>'. I used the integrated console to log in and checked the interfaces with 'ip a': the enc1000 interface existed, but the VLAN interface was lost.

As described in comment #12, I can bring up the VLAN interface temporarily with some 'ip link' commands, but it disappears after a reboot.

In reply to your comment #24: no, it was not created and did not work this time.

Thanks.

mayijie (mayijie56) wrote :

Hi Danilo,

Any updates?

Thanks.

Lukas Märdian (slyon) wrote :

Hi mayijie, indeed from your latest systemd-networkd debug log it looks like the enc1000.1300 vlan interface was created successfully, I wonder why it doesn't show up in your "ip link" output..

What I find interesting, though, is that the enc1000 interface seems to come UP during boot, but then loses carrier briefly (cable disconnected? underlying VM reconfigured?), which interrupts the creation of the enc1000.1300 vlan.

Would you mind adding the "ignore-carrier: true" setting to your enc1000 Netplan configuration and check if this makes any difference?
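
For orientation, the setting goes under the enc1000 entry in the 'ethernets' section, not under the vlan. A minimal sketch on an already-parsed netplan mapping follows; the helper name `with_ignore_carrier` is made up for illustration, and loading and dumping the YAML file is left to whatever YAML parser is at hand.

```python
import copy


def with_ignore_carrier(netplan_cfg, iface):
    """Return a copy of a parsed netplan config with
    'ignore-carrier: true' set on the given ethernet interface."""
    cfg = copy.deepcopy(netplan_cfg)
    ethernets = cfg.setdefault("network", {}).setdefault("ethernets", {})
    ethernets.setdefault(iface, {})["ignore-carrier"] = True
    return cfg


# Hypothetical parsed form of 00-installer-config.yaml (vlans omitted).
config = {"network": {"version": 2, "ethernets": {"enc1000": {}}}}
print(with_ignore_carrier(config, "enc1000")["network"]["ethernets"])
```

After writing the result back, 'netplan apply' or a reboot is needed for it to take effect.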

Here's the relevant log where carrier is lost:
```
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: Flags change: +UP +LOWER_UP +RUNNING
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: Link UP
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: Gained carrier
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: found matching network '/run/systemd/network/10-netplan-enc1000.network'.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: LLDP Rx: Started LLDP client
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): link is not activated.
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: Received new foreign route (configured): dst: ff00::/8, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: local(255), proto: kernel, type: multicast, nexthop: 0, priority: 256, flags: n/a
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: Received new foreign route (configured): dst: fe80::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: kernel, type: unicast, nexthop: 0, priority: 256, flags: n/a
Jun 19 03:00:33 a90ubut15 systemd-networkd[964]: enc1000: link_check_ready(): stacked netdevs are not created.
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000.1300: Creating
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000: Flags change: -UP -LOWER_UP -RUNNING
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000: Link DOWN
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000: Lost carrier
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000: LLDP Rx: Stopping LLDP client
Jun 19 03:00:34 a90ubut15 systemd-networkd[964]: enc1000: Forgetting foreign route (n/a): dst: fe80::/64, src: n/a, gw: n/a, prefsrc: n/a, scope: global, table: main(254), proto: kernel, type: unicast, nexthop: 0, prio...

```

mayijie (mayijie56) wrote :

Hi Lukas,

Thanks for the investigation. I followed your instructions: I added "ignore-carrier: true", ran netplan apply and rebooted, and the VLAN interface shows up after the reboot, cool~

But what I don't understand is that with the trunk mode adapter there is actually neither a disconnected cable nor a VM, and from my previous tests, this only happens in Ubuntu 23.04; you can refer to my comment #9.

So I did another Ubuntu 23.04 installation using another trunk adapter as the backend adapter of the vNIC device, and the VLAN interface was lost again. I tried adding "ignore-carrier: true" to the YAML file, but this time, no luck.

Here is what the systemd-networkd.log reads:

'''
-- Boot 02eb864dc23541f0911df9f082c7d33e --
Jul 05 10:04:49 a90ubut21 systemd[1]: Starting systemd-networkd.service - Network Configuration...
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: lo: Link UP
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: lo: Gained carrier
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: Enumeration completed
Jul 05 10:04:49 a90ubut21 systemd[1]: Started systemd-networkd.service - Network Configuration.
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Configuring with /run/systemd/network/10-netplan-enc1000.network.
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Link UP
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Gained carrier
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Link DOWN
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Lost carrier
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Could not create stacked netdev: No such device
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Failed
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Link UP
Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Gained carrier
Jul 05 10:04:51 a90ubut21 systemd-networkd[1070]: enc1000: Gained IPv6LL
'''

From the log, it seems to be 'Lost carrier' again, hmm...
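
The pattern in this log (carrier lost, then the stacked netdev cannot be created) can be checked mechanically across several boots. The following is a minimal sketch that assumes plain 'journalctl -u systemd-networkd' text lines in the format shown above; the function name and the regular expression are illustrative and not part of any systemd tooling.

```python
import re

# Matches "systemd-networkd[PID]: IFACE: MESSAGE" journal lines.
LINE_RE = re.compile(
    r"systemd-networkd\[\d+\]: (?P<iface>[^\s:]+): (?P<msg>.*)$"
)


def netdev_failures_after_carrier_loss(log_lines):
    """Return interfaces whose stacked netdev creation failed while the
    most recent carrier event for that interface was 'Lost carrier'."""
    last_carrier_event = {}
    failed = []
    for line in log_lines:
        match = LINE_RE.search(line.rstrip())
        if not match:
            continue
        iface, msg = match.group("iface"), match.group("msg")
        if msg in ("Gained carrier", "Lost carrier"):
            last_carrier_event[iface] = msg
        elif msg.startswith("Could not create stacked netdev"):
            lost = last_carrier_event.get(iface) == "Lost carrier"
            if lost and iface not in failed:
                failed.append(iface)
    return failed


sample = [
    "Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Gained carrier",
    "Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Link DOWN",
    "Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: Lost carrier",
    "Jul 05 10:04:49 a90ubut21 systemd-networkd[1070]: enc1000: "
    "Could not create stacked netdev: No such device",
]
print(netdev_failures_after_carrier_loss(sample))
```

A non-empty result for a boot matches the failing sequence above; it says nothing about why the carrier dropped.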

I compared the 10-netplan-enc1000.network with the one on the repaired system and found no difference.

Here are the config files:

'''
cat 10-netplan-enc1000.network
[Match]
Name=enc1000

[Network]
LinkLocalAddressing=ipv6
ConfigureWithoutCarrier=yes
VLAN=enc1000.1300

cat 10-netplan-enc1000.1300.network
[Match]
Name=enc1000.1300

[Network]
LinkLocalAddressing=ipv6
Address=10.20.103.66/24
Gateway=10.20.103.254
DNS=10.20.0.2
ConfigureWithoutCarrier=yes

cat 10-netplan-enc1000.1300.netdev
[NetDev]
Name=enc1000.1300
Kind=vlan

[VLAN]
Id=1300
'''

Thanks~

mayijie (mayijie56) wrote :

Hi,
The thing is that only Ubuntu 23.04 has this issue; Ubuntu 22.04 and Ubuntu 20.04 are fine. The same adapter also works fine for RHEL and SUSE, so in other words, only Ubuntu 23.04 has this issue.
I tested all of the adapters, and all of them ran into this issue on Ubuntu 23.04. So I suspect it is an issue specific to Ubuntu 23.04, not an environment issue.
We are a solution test team using a customer-like environment, so this issue could happen on the customer side, and we must fix it before it is delivered to customers.
It looks like 'ignore-carrier: true' does not address the real cause of this issue; I found it did not work on a newly installed partition.
Could you help us take a look at this? Customers will update Ubuntu to 23.04 soon, so it will become a customer issue.

mayijie (mayijie56) wrote :

Hi,
We double-checked that the netplan YAML config file and the generated 10-netplan-enc1000.1300 config files are all correct. The VLAN interface is lost after the reboot operation, and we can add the VLAN interface with the 'ip link add' command. Is it possible that the loss happens during the IPL process, i.e. that the kernel cannot create the VLAN interface from the correct config files during the boot process?

ubuntu@a257ubut02:~$ uname -r
6.2.0-25-generic

Thanks.

mayijie (mayijie56) wrote :

Hi,

I found that restarting the network service adds the VLAN interface back to the system, and rebooting loses the interface again.

'sudo systemctl restart systemd-networkd.service'

I'll attach the systemd-networkd log, you can see the difference.

>>>>>>>>>>>>>>

timestamp to reboot the ubuntu os: Jul 17 04:15:10
timestamp to restart the systemd-networkd service: Jul 17 04:25:02

Thanks.

mayijie (mayijie56) wrote :

In the log, in the boot stage after the reboot, enc1000 goes link up and then link down due to lost carrier, but I'm sure the backend adapter was good. This only occurs on Ubuntu 23.04 with a trunk mode adapter.

Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Link UP
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Gained carrier
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Link DOWN
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Lost carrier
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Could not create stacked netdev: No such device
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Failed
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Link UP
Jul 17 04:15:51 a257ubut02 systemd-networkd[1066]: enc1000: Gained carrier
Jul 17 04:15:52 a257ubut02 systemd-networkd[1066]: enc1000: Gained IPv6LL

mayijie (mayijie56) wrote :
Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi there, thanks a lot for your recent tests.

It really feels like a systemd issue...

A new version of systemd was uploaded to Ubuntu Mantic recently, do you think you could upgrade your system to mantic, install systemd from the -proposed pocket and see if it fixes the issue?

The test would look like this:

1) upgrade to mantic
2) reboot and check if the problem persists
3) if the problem is still there, upgrade systemd to the version in mantic-proposed (which is 253.5)
4) do the same tests to see if the problem is still present

mayijie (mayijie56) wrote :

Hi Danilo, thanks. I'd like to try the Mantic release, but I found that the new release does not exist in the path from which I always download the ISO image: https://cdimage.ubuntu.com/releases/

Is there another URL where I can get Mantic for the System Z platform now?

Thanks~

mayijie (mayijie56) wrote :

Danilo, got the Mantic image and checking ...

mayijie (mayijie56) wrote :

Hi Danilo,

I tried your steps from comment #34, but no luck; from the systemd-networkd log, it looks like the same issue.

1) upgrade to mantic
>
ubuntu@a257ubut03:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu Mantic Minotaur (development branch)
Release: 23.10
Codename: mantic

2) reboot and check if the problem persists
>
Yes, the problem persists.

3) if the problem is still there, upgrade systemd to the version in mantic-proposed (which is 253.5)
>
ubuntu@a257ubut03:~$ dpkg -l | grep systemd
ii systemd 253.5-1ubuntu1 s390x system and service manager
ii systemd-dev 253.5-1ubuntu1 all systemd development files
ii systemd-hwe-hwdb 252.4.3 all udev rules for hardware enablement (HWE)
ii systemd-resolved 252.5-2ubuntu3 s390x systemd DNS resolver
ii systemd-sysv 252.5-2ubuntu3 s390x system and service manager - SysV compatibility symlinks
ii systemd-timesyncd 252.5-2ubuntu3 s390x minimalistic service to synchronize local time with NTP servers

4) do the same tests to see if the problem is still present
>
Yes, the VLAN interface disappears after 'sudo reboot' and shows up again after 'sudo systemctl restart systemd-networkd'. I'll attach the systemd-networkd log; the timestamps after the systemd upgrade are:
Jul 18 08:31 -> reboot the os
Jul 18 08:36 -> restart systemd-networkd service

Thanks.

mayijie (mayijie56) wrote :

Hi Frank,

May I know the DPM environment you used for the reproduction test in comment #8? I only have DPM R5.1 here, and I'd like to see whether I can change my DPM version to match yours.

Thanks.

Frank Heimes (fheimes) wrote :

Hi mayijie,
I tried to find out the DPM version, but no luck so far.
The only answer that I got so far (from IBM) is that there will be no newer DPM versions as soon as a new machine is out.
So we'll need to stick with what we have in the L1-III, since L1-4 is already out.
If you can give me a hint on how to find out the current version, I'll look it up...

Regarding mantic/future 23.10, it's the current development release, which is not yet out,
hence it cannot be found at https://cdimage.ubuntu.com/releases/
but at the dailies for now:
- smoke tested, but maybe a bit older
  https://cdimage.ubuntu.com/ubuntu-server/daily-live/current/
- untested (or maybe not successfully tested), but a bit newer
  https://cdimage.ubuntu.com/ubuntu-server/daily-live/pending/

mayijie (mayijie56) wrote :

Thanks for the information, Frank. I'm talking with the DPM colleagues and will let you know if there are any updates.

mayijie (mayijie56) wrote :

Hi,

I did some more testing: I changed the network daemon from systemd-networkd to NetworkManager (still using netplan), and the 'VLAN interface lost after reboot' issue is still there. Rebooting the OS loses the VLAN interface, and restarting the NM service brings it back. During the boot stage, the NM log shows: manager: "(netplan-enc1000.1292) couldn't create the device: no support for VLANs on interface enc1000 of type Ethernet"

Here is some NetworkManager command output from before and after 'restart nm service' for comparison, and the journalctl log file with the operation timestamps:
>>>>>>>>>>>
journalctl -u NetworkManager > nm.log

timestamp: Jul 27 08:27 -> reboot the ubut os

timestamp: Jul 27 08:32 -> restart the nm service
>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>
run 'nmcli d' and 'nmcli c' before and after 'restart nm'

> before NM restart

ubuntu@a90ubut15:~$ nmcli d
DEVICE TYPE STATE CONNECTION
enc1000 ethernet connected netplan-enc1000
lo loopback connected (externally) lo

ubuntu@a90ubut15:~$ nmcli c
NAME UUID TYPE DEVICE
lo 2f1c6dd9-26ca-4276-a210-8bf61256322a loopback lo
netplan-enc1000 20393913-b9a9-3c3a-81cb-20f5840b309b ethernet enc1000
netplan-enc1000.1292 491497ea-36cb-3b96-ad99-6519b363483d vlan --

> after restart NM

ubuntu@a90ubut15:~$ nmcli d
DEVICE TYPE STATE CONNECTION
enc1000.1292 vlan connected netplan-enc1000.1292
enc1000 ethernet connected netplan-enc1000
lo loopback connected (externally) lo

ubuntu@a90ubut15:~$ nmcli c
NAME UUID TYPE DEVICE
netplan-enc1000.1292 491497ea-36cb-3b96-ad99-6519b363483d vlan enc1000.1292
lo 2f1c6dd9-26ca-4276-a210-8bf61256322a loopback lo
netplan-enc1000 20393913-b9a9-3c3a-81cb-20f5840b309b ethernet enc1000
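
The before/after difference above comes down to the DEVICE column of 'nmcli c'. As a minimal sketch, assuming the default tabular output shown here and reading '--' as "no device attached", the affected connections can be picked out like this. The function name `connections_without_device` is made up for illustration.

```python
def connections_without_device(nmcli_c_output):
    """Return the names of connections whose DEVICE column is '--'
    in the default tabular output of 'nmcli c'."""
    names = []
    # Skip the header row (NAME UUID TYPE DEVICE).
    for line in nmcli_c_output.strip().splitlines()[1:]:
        # Split from the right so a NAME containing spaces stays intact.
        parts = line.rsplit(None, 3)
        if len(parts) == 4 and parts[3] == "--":
            names.append(parts[0])
    return names


before_restart = """\
NAME UUID TYPE DEVICE
lo 2f1c6dd9-26ca-4276-a210-8bf61256322a loopback lo
netplan-enc1000 20393913-b9a9-3c3a-81cb-20f5840b309b ethernet enc1000
netplan-enc1000.1292 491497ea-36cb-3b96-ad99-6519b363483d vlan --
"""
print(connections_without_device(before_restart))
```

Run against the output captured after the NM restart, the same function returns an empty list.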
>>>>>>>>>>>>>>>>>>>>>>>>>

>>>>>>>>>>>>>>>>>>>>>>>>>>
I also compared the output of the 'nmcli c show netplan-enc1000.1292' command before and after 'restart nm' and found that the output after the restart includes all the parameters from before the restart, plus the following parameters:

GENERAL.NAME: netplan-enc1000.1292
GENERAL.UUID: 491497ea-36cb-3b96-ad99-6519b363483d
GENERAL.DEVICES: enc1000.1292
GENERAL.IP-IFACE: enc1000.1292
GENERAL.STATE: activated
GENERAL.DEFAULT: yes
GENERAL.DEFAULT6: no
GENERAL.SPEC-OBJECT: --
GENERAL.VPN: no
GENERAL.DBUS-PATH: /org/freedesktop/NetworkManager/ActiveConnection/3
GENERAL.CON-PATH: /org/freedesktop/NetworkManager/Settings/2
GENERAL.ZONE: ...

mayijie (mayijie56) wrote :