focal kvm virtio_net set_features failed (-22)

Bug #1945868 reported by Markus Neubauer
This bug affects 1 person
Affects         Status     Importance  Assigned to  Milestone
linux (Ubuntu)  Confirmed  Undecided   Unassigned
qemu (Ubuntu)   Invalid    Undecided   Unassigned

Bug Description

Set up linux-modules-5.4.0-88-generic in a focal guest on an Ubuntu 20.04 (also focal) KVM bare-metal host.

With virtio as the network driver, the guest fails with:

net eth0: Fail to set guest offload.
virtio_net virtio1 eth0: set_features() failed (-22); wanted 0x0000008000174a29, left 0x000000800017ca29

followed by further kern log entries:
------------[ cut here ]------------
 netdevice: br0: failed to disable 0x0000000000008000 on eth0!
 WARNING: CPU: 0 PID: 1048 at net/core/dev.c:8683 __netdev_update_features+0x4d1/0x9f0
 Modules linked in: macvlan virtio_gpu ttm drm_kms_helper fb_sys_fops input_leds syscopyarea joydev sysfillrect serio_raw sysimgblt mac_hid sch_fq_codel 9p fscache 9pnet_virtio 9pnet bridge stp llc drm virtio_rng ip_tables x_tables autofs4 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c psmouse virtio_net i2c_piix4 virtio_blk net_failover failover pata_acpi floppy
 CPU: 0 PID: 1048 Comm: brctl Tainted: G W 5.4.0-88-generic #99-Ubuntu
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 RIP: 0010:__netdev_update_features+0x4d1/0x9f0
 Code: c7 c2 9b 30 9b a0 4c 8b 4d 88 48 85 c0 4d 0f 44 d4 4d 89 f0 48 8d 4d b8 4c 89 ca 4c 89 d6 48 c7 c7 d0 94 a3 a0 e8 dc 2d 19 00 <0f> 0b e9 5b fe ff ff 8b 45 9c 85 c0 0f 84 96 00 00 00 8b 45 9c f7
 RSP: 0018:ffffbf1840e0ba98 EFLAGS: 00010282
 RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000000006
 RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff9c2c13a178c0
 RBP: ffffbf1840e0bb10 R08: 00000000000002d1 R09: 0000000000000004
 R10: 0000000000000000 R11: 0000000000000001 R12: ffff9c2c0b258000
 R13: ffff9c2c0b2580c0 R14: ffff9c2c0615a000 R15: 000000000000000f
 FS: 00007f98be558580(0000) GS:ffff9c2c13a00000(0000) knlGS:0000000000000000
 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007fffc24a1080 CR3: 000000030b3e4000 CR4: 00000000000006f0
 Call Trace:
  netdev_update_features+0x25/0x60
  br_device_event+0x20f/0x300 [bridge]
  notifier_call_chain+0x55/0x80
  raw_notifier_call_chain+0x16/0x20
  call_netdevice_notifiers_info+0x2e/0x60
  netdev_update_features+0x59/0x60
  dev_disable_lro+0x21/0xf0
  br_add_if+0x301/0x650 [bridge]
  add_del_if+0x67/0x80 [bridge]
  br_dev_ioctl+0x54/0x80 [bridge]
  dev_ifsioc+0xe5/0x3e0
  ? _cond_resched+0x19/0x30
  dev_ioctl+0x2dd/0x470
  sock_do_ioctl+0xa8/0x140
  ? evict+0x14c/0x1b0
  sock_ioctl+0x24f/0x3c0
  ? _cond_resched+0x19/0x30
  ? get_max_files+0x20/0x20
  do_vfs_ioctl+0x407/0x670
  ksys_ioctl+0x67/0x90
  __x64_sys_ioctl+0x1a/0x20
  do_syscall_64+0x57/0x190
  entry_SYSCALL_64_after_hwframe+0x44/0xa9
 RIP: 0033:0x7f98be47c50b
 Code: 0f 1e fa 48 8b 05 85 39 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 55 39 0d 00 f7 d8 64 89 01 48
 RSP: 002b:00007ffd783321e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
 RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f98be47c50b
 RDX: 00007ffd78332210 RSI: 00000000000089a2 RDI: 0000000000000003
 RBP: 00007ffd78332210 R08: 0000000000000020 R09: fefefeff77686d74
 R10: 00007ffd78333e80 R11: 0000000000000246 R12: 00007ffd78333e7e
 R13: 00007ffd78333e7e R14: 0000000000000000 R15: 0000000000000000
 ---[ end trace 240d361752e8f3f0 ]---
 ------------[ cut here ]------------

A temporary workaround is to go back to the old rtl8139 driver, at the cost of a slower network.

Tags: focal kvm virtio
description: updated
Markus Neubauer (mn-ubuntu) wrote :

It affects kernel linux-image-5.4.0-88-generic in the guest and occurred after upgrading to focal. I went back to kernel 4.15.0-159-generic (bionic) in the guest, and virtio_net works as expected.
The idea from https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=165059#c21 did not solve the problem.

Paul White (paulw2u) wrote :

There is no 'focal' package in any current release of Ubuntu.
Assuming you meant to task against the kernel.
So moving to 'linux'.

affects: focal (Ubuntu) → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1945868

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Markus Neubauer (mn-ubuntu) wrote :

The relevant part of the kern.log is already in the description. As the server is in production, we had to go back to the previous kernel.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
affects: qemu-kvm (Ubuntu) → qemu (Ubuntu)
Christian Ehrhardt  (paelzer) wrote :

Hi Markus, thanks for the report.
While the good/bad switch between the kernels suggests the cause lies there, knowing more about the exact guest configuration would help in any case. The simple reason is that I and millions of others use virtio-net in focal guests just fine, so some detail in your setup must make it differ. Knowing that detail will help us understand.

So let me ask a few clarifications:
- You said 20.04 host, so I assume you are on qemu 1:4.2-3ubuntu6.17 and libvirt 6.0.0-0ubuntu8.14 (it could just as well be backports, and I want to be sure).
- You mentioned guest versions, but which exact host kernel version are you running when this happens?
- How did you configure your guest, and especially the network adapter? (The libvirt XML if you use it, or the qemu command line if you create the guest some other way.)
- Does this only apply to an old guest that has been kept running (upgrade to 20.04 in the guest, reboot the guest, but the qemu process has been up for a long time), or is it reproducible with a freshly started 20.04 guest on the same system?

Furthermore, let us know if anything more shows up when the issue happens in any of:
a) host kernel (dmesg)
b) host userspace (qemu log in /var/log/libvirt/guestname)
c) guest journal
d) host journal

From the code that complains in the guest we can check which feature it could not set.
The workaround you mentioned was about checksumming, but maybe in your case it is something different.
"wanted 0x0000008000174a29, left 0x000000800017ca29"

So it wanted to disable one, but could not.
This is already interesting as plenty of features are fixed when using virtio-net.
You can see that if you look at `$ sudo ethtool --show-features enp1s0` for example.

The list your guest requests is:
tx-scatter-gather
tx-checksum-ip-generic
highdma
rx-vlan-filter
tx-generic-segmentation
rx-gro
tx-tcp-segmentation
tx-gso-robust
tx-tcp-ecn-segmentation
tx-tcp6-segmentation
(This is somewhat hard to read, so I hope this is right)

But more important is the difference which it wanted to disable but could not.
That is 0x8000 which maps via netdev_features_t to "rx-lro"
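
For anyone who wants to redo this decoding, here is a minimal Python sketch. The two masks are copied from the set_features() message in the description. The bit-to-name table is an assumption: it covers only the features named in this comment, with bit positions as I read them from the kernel's include/linux/netdev_features.h, so please verify them against the kernel you actually run. Any other set bit is reported by number only.

```python
# Masks copied from the kernel message in the bug description.
WANTED = 0x0000008000174A29
LEFT = 0x000000800017CA29

# Assumed bit positions (check include/linux/netdev_features.h for your kernel).
# Only the features named in this comment are listed.
FEATURE_NAMES = {
    0: "tx-scatter-gather",
    3: "tx-checksum-ip-generic",
    5: "highdma",
    9: "rx-vlan-filter",
    11: "tx-generic-segmentation",
    14: "rx-gro",
    15: "rx-lro",
    16: "tx-tcp-segmentation",
    17: "tx-gso-robust",
    18: "tx-tcp-ecn-segmentation",
    20: "tx-tcp6-segmentation",
}


def decode(mask):
    """Return a name (or a bit number) for every bit set in mask."""
    return [
        FEATURE_NAMES.get(bit, f"bit {bit} (not named in this sketch)")
        for bit in range(64)
        if (mask >> bit) & 1
    ]


# Bits that differ between what the stack wanted and what the driver left set.
diff = WANTED ^ LEFT

print(f"wanted:     {decode(WANTED)}")
print(f"difference: {diff:#x} -> {decode(diff)}")
```

The XOR isolates the single feature the driver refused to change; the same helper can be pointed at any other "wanted"/"left" pair from a similar log line.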

Checking an example guest that I have I see this is default on but should be switchable.
This is from the newest Ubuntu release:
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on
$ sudo ethtool --features enp1s0 lro off
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off
$ sudo ethtool --features enp1s0 lro on
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: on

It might be worth running the above sequence on your working kernel to see whether LRO there is
a) off or on by default on start
b) can be turned off/on as requested
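
If it is easier than typing the sequence by hand, the a)/b) check could be scripted roughly like this. It is only a sketch: it assumes ethtool is installed in the guest, that the script runs as root, and that the interface name you pass in (enp1s0 in my example, eth0 in your log) is correct. The helper names parse_lro, show_lro and check_lro_toggle are mine, not part of any tool.

```python
import re
import subprocess


def parse_lro(features_text):
    """Return (state, fixed) for large-receive-offload, or None if the line is absent."""
    match = re.search(
        r"^\s*large-receive-offload:\s*(on|off)(\s*\[fixed\])?",
        features_text,
        re.MULTILINE,
    )
    if match is None:
        return None
    return match.group(1), match.group(2) is not None


def show_lro(iface):
    """Query the current LRO state of iface via ethtool."""
    result = subprocess.run(
        ["ethtool", "--show-features", iface],
        capture_output=True, text=True, check=True,
    )
    return parse_lro(result.stdout)


def check_lro_toggle(iface):
    """Record the default state, then try off and on, reporting what ethtool did."""
    report = {"default": show_lro(iface)}
    for state in ("off", "on"):
        # A non-zero exit code means ethtool could not apply the change.
        rc = subprocess.run(["ethtool", "--features", iface, "lro", state]).returncode
        report[state] = {"exit_code": rc, "after": show_lro(iface)}
    return report
```

Calling check_lro_toggle("eth0") from a Python prompt on the guest console returns the default state plus the result of each toggle, which is exactly the information asked for in a) and b).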

Since you lose networking once virtio-net fails, it would be even better if you could run the same sequence via e.g. "virsh console" or any other non-network access to the guest that you might have.

I then ran this sequence as a cross-check with a Focal host and a Focal guest:
1. 5.4.0-86-generic
$ sudo ethtool --show-features enp1s0 | grep large
large-receive-offload: off
$ sudo ethtool --features enp1s0 lro on
$ sudo ethtool --show-features enp1s0 | grep large
large...


Changed in qemu (Ubuntu):
status: New → Incomplete
Markus Neubauer (mn-ubuntu) wrote :

This could be closed in the meantime. I had to shut KVM down on this system and can no longer collect the required reports or information. Most recently it was working with the e1000 driver.

Christian Ehrhardt  (paelzer) wrote :

Thanks for the ping, Markus. Let us know if you run into it again on another system.
e1000 as a fallback is fine, but also slow IMHO.
So I hope we can one day find out what it was.

Changed in qemu (Ubuntu):
status: Incomplete → Invalid
fxpester (a-yurtaykin) wrote :

just hit it with 5.4.17-2011.6.2.el7uek.x86_64
switching to e1000 as well
