Segfault Error when applying config with VRF

Bug #2016427 reported by Joshua Powell
This bug affects 2 people
Affects   Status        Importance  Assigned to  Milestone
Netplan   Fix Released  High        Unassigned

Bug Description

Hello,

This is my first bug report, so hopefully I've included all the info required.

I've just written a config that seems to have an issue when I try to apply it. The error is as follows:

Segmentation fault (core dumped)

Looking in syslog I can see the following:
Apr 16 19:19:23 router kernel: [154847.837589] netplan[599578]: segfault at 8 ip 00007f11b5f79d9a sp 00007fffeb163370 error 4 in libnetplan.so.0.0[7f11b5f64000+29000]
Apr 16 19:19:23 router kernel: [154847.837618] Code: 00 00 00 0f 85 6b 07 00 00 48 81 c4 d8 00 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 0f 1f 44 00 00 49 8b 86 00 01 00 00 45 31 ff <44> 8b 48 08 45 85 c9 75 41 e9 95 fd ff ff 0f 1f 84 00 00 00 00 00

The last bit of the dmesg log also shows the following:
[ 12.615499] kernel: audit: type=1400 audit(1681517929.186:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="lsb_release" pid=1072 comm="apparmor_parser"
[ 12.616738] kernel: audit: type=1400 audit(1681517929.190:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/bin/man" pid=1075 comm="apparmor_parser"
[ 12.616755] kernel: audit: type=1400 audit(1681517929.190:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_filter" pid=1075 comm="apparmor_parser"
[ 12.616767] kernel: audit: type=1400 audit(1681517929.190:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="man_groff" pid=1075 comm="apparmor_parser"
[ 12.617611] kernel: audit: type=1400 audit(1681517929.190:6): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=1073 comm="apparmor_parser"
[ 12.617618] kernel: audit: type=1400 audit(1681517929.190:7): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe//kmod" pid=1073 comm="apparmor_parser"
[ 12.625979] kernel: audit: type=1400 audit(1681517929.198:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine" pid=1077 comm="apparmor_parser"
[ 12.626007] kernel: audit: type=1400 audit(1681517929.198:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=1077 comm="apparmor_parser"
[ 12.901386] kernel: kauditd_printk_skb: 5 callbacks suppressed
[ 12.901392] kernel: audit: type=1400 audit(1681517929.474:15): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap-update-ns.lxd" pid=1145 comm="apparmor_parser"
[ 12.906645] kernel: audit: type=1400 audit(1681517929.478:16): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/18596/usr/lib/snapd/snap-confine" pid=1144 comm="apparmor_parser"
[ 12.906656] kernel: audit: type=1400 audit(1681517929.478:17): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/18596/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=1144 comm="apparmor_parser"
[ 12.908288] kernel: audit: type=1400 audit(1681517929.482:18): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/18357/usr/lib/snapd/snap-confine" pid=1143 comm="apparmor_parser"
[ 12.908312] kernel: audit: type=1400 audit(1681517929.482:19): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/snap/snapd/18357/usr/lib/snapd/snap-confine//mount-namespace-capture-helper" pid=1143 comm="apparmor_parser"
[ 12.912952] kernel: audit: type=1400 audit(1681517929.486:20): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.lxc" pid=1154 comm="apparmor_parser"
[ 12.914027] kernel: audit: type=1400 audit(1681517929.486:21): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.hook.install" pid=1152 comm="apparmor_parser"
[ 12.914110] kernel: audit: type=1400 audit(1681517929.486:22): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.hook.remove" pid=1153 comm="apparmor_parser"
[ 12.914363] kernel: audit: type=1400 audit(1681517929.486:23): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.check-kernel" pid=1149 comm="apparmor_parser"
[ 12.914433] kernel: audit: type=1400 audit(1681517929.486:24): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.lxd.buginfo" pid=1148 comm="apparmor_parser"
[ 13.766552] kernel: gre: GRE over IPv4 demultiplexor driver
[ 13.774548] kernel: ip_gre: GRE over IPv4 tunneling driver
[ 15.510978] kernel: mlx5_core 0000:04:00.1 ens1f1np1: Link up
[ 15.528629] kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens1f1np1: link becomes ready
[ 16.047816] kernel: mlx5_core 0000:04:00.0 ens1f0np0: Link up
[ 16.176234] kernel: 8021q: 802.1Q VLAN Support v1.8
[ 16.176252] kernel: 8021q: adding VLAN 0 to HW filter on device ens1f0np0
[ 16.189931] kernel: 8021q: adding VLAN 0 to HW filter on device ens1f1np1
[ 16.550048] kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ens1f0np0: link becomes ready

Should I post a pastebin link for the netplan config itself?

Tags: sru-next
Changed in netplan:
status: New → Incomplete
status: Incomplete → New
Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi there, thank you for your bug report.

Could you please tell us which netplan.io and Ubuntu versions you are running?

It would also be great to have the configuration that triggers this problem, if you could attach it to the bug report. Please redact any sensitive information before attaching it.

Meanwhile, I will try to investigate what could be wrong with VRF handling in libnetplan.

Thanks!

Danilo Egea Gondolfo (danilogondolfo) wrote :

I'm assuming the system runs Jammy with the most recent netplan.io from jammy-updates.

Based on the code dumped during the crash and the IP offset, I believe the crash happened inside adopt_and_validate_vrf_routes(), around this line https://git.launchpad.net/ubuntu/+source/netplan.io/tree/src/validation.c?h=ubuntu/jammy-updates#n477

Danilo Egea Gondolfo (danilogondolfo) wrote :

Found a reproducer:

cat > /etc/netplan/91-vrf.yaml <<EOF
network:
  vrfs:
    vrf20:
      table: 20
  bridges:
    br0:
      interfaces: []
EOF

cat > /etc/netplan/92-vrf.yaml <<EOF
network:
  vrfs:
    vrf20:
      routes:
        - to: 1.2.3.0/24
          via: 1.2.3.4
EOF

# netplan get
Segmentation fault (core dumped)

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7f8cd9a in adopt_and_validate_vrf_routes (npp=0x555555569060, error=0x7fffffffe318, netdefs=<optimized out>) at src/validation.c:477
477 for (size_t i = 0; i < nd->ip_rules->len; i++) {
(gdb) bt
#0 0x00007ffff7f8cd9a in adopt_and_validate_vrf_routes (npp=0x555555569060, error=0x7fffffffe318, netdefs=<optimized out>)
    at src/validation.c:477
#1 netplan_state_import_parser_results (np_state=0x55555556acc0, npp=0x555555569060, error=error@entry=0x7fffffffe318)
    at src/parse.c:3208
#2 0x0000555555556964 in main (argc=<optimized out>, argv=<optimized out>) at src/generate.c:251
(gdb) p nd->ip_rules
$1 = (GArray *) 0x0

Danilo Egea Gondolfo (danilogondolfo) wrote :

The problem still exists in Lunar with netplan.io 0.106-0ubuntu3.

Changed in netplan:
status: New → Confirmed
Lukas Märdian (slyon)
Changed in netplan:
status: Confirmed → Triaged
importance: Undecided → High
Danilo Egea Gondolfo (danilogondolfo) wrote :

Here is a simpler reproducer with a single file:

cat > /etc/netplan/90-vrf.yaml <<EOF
network:
  vrfs:
    vrf20:
      table: 20
      routes:
        - to: 1.2.3.0/24
          via: 1.2.3.4
EOF

# netplan get
Segmentation fault (core dumped)

The problem is that we walk through the list of routing policies (nd->ip_rules) without first checking that the list exists: when no routing-policy is defined for the VRF, nd->ip_rules is NULL.

Joshua Powell (joshpowell) wrote :

Heyo Danilo,

Yep, exactly. Not sure there's still a need to post my config, but I'm using Ubuntu 22.04 and the latest netplan.

If I can help with anything please let me know.

Kind regards
Josh

Danilo Egea Gondolfo (danilogondolfo) wrote :

Hi Joshua,

I don't think it's necessary, unless you are facing a different bug.

Adding a dummy "routing-policy" entry to your configuration should work around this:

network:
  vrfs:
    vrf20:
      table: 20
      routes:
        - to: 1.2.3.0/24
          via: 1.2.3.4

      routing-policy:
        - from: 123.123.123.123

Can you try it and let me know whether the segfault goes away?

I've prepared a fix for it: https://github.com/canonical/netplan/pull/341

Thank you!

Lukas Märdian (slyon)
Changed in netplan:
status: Triaged → Fix Committed
Lukas Märdian (slyon)
Changed in netplan:
status: Fix Committed → Fix Released
Dr. Jens Harbott (j-harbott) wrote :

Can we get this backported to Jammy?

Lukas Märdian (slyon)
tags: added: sru-next