Qemu memory leak

Bug #1980401 reported by Lucas Zanella
Affects: qemu (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

With only Firefox and a 12 GB RAM VM open, all 32 GB of host RAM are filled. htop and gnome-system-monitor disagree on the RAM usage.

## Host environment
 - Operating system: Ubuntu 22.04
 - OS/kernel version: Linux pc 5.15.0-39-generic #42-Ubuntu SMP Thu Jun 9 23:42:32 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
 - Architecture: x86_64
 - QEMU flavor: virt-manager's qemu
 - QEMU version: QEMU emulator version 6.2.0 (Debian 1:6.2+dfsg-2ubuntu6.2)

## Emulated/Virtualized environment
 - Operating system: Ubuntu 22.04
 - OS/kernel version: Linux pc 5.15.0-40-generic #43-Ubuntu SMP Wed Jun 15 12:54:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
 - Architecture: x86_64

## Description of problem

After starting and using my VM for a while, RAM fills up to the 32 GB maximum and Firefox starts closing tabs. This didn't happen on Ubuntu 21.10 or earlier releases. I've been using virt-manager + QEMU for years and only hit this after upgrading to Ubuntu 22.04.

## Steps to reproduce
1. Launch a virt-manager Ubuntu VM with a 12 GB RAM maximum (as an example)
2. The entire 32 GB of host RAM gets filled, but nothing in gnome-system-monitor shows what is using all that RAM
3. Firefox starts closing tabs because RAM is full. Remember that only a 12 GB RAM VM and Firefox with a few tabs are running, yet all 32 GB of RAM fill up. RAM starts filling slowly and within about an hour the entire 32 GB is used. For some reason htop shows lower usage, but I'm fairly sure all 32 GB are being used, since the computer starts freezing and almost crashing (I think swap is being used, so it slows down but does not crash)
4. I have to restart the computer for RAM usage to return to normal

Image of gnome system monitor: https://gitlab.com/qemu-project/qemu/uploads/bd012aab280aad7e355c61260be3d8cc/image.png

Revision history for this message
Lena Voytek (lvoytek) wrote :

Hello,
Thank you for submitting this bug report. I attempted to reproduce the issue on my 22.04 system with 32 GB of RAM, allocating 12 GB to a QEMU VM instance, but didn't run into a memory leak. It may just be difficult to encounter unless the right conditions are met.

As the RAM usage is increasing, are you able to see any specific process taking up a large amount of it, either in the system monitor's Processes tab or in htop?

Also, would you be able to share the XML config of your virtual machine? It may help determine the cause of the leak.
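
If helpful, the config can be dumped with virsh (a sketch; substitute your actual domain name):

$ virsh dumpxml <your-domain-name>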

Thanks!

Changed in qemu (Ubuntu):
status: New → Incomplete
Revision history for this message
Lucas Zanella (lucaszanella) wrote :

No process is taking that amount of RAM. Firefox is taking 332 MB, Proton VPN 800 MB, and QEMU shows 149 MB. I have no other programs on my host Ubuntu, only virt-manager; I do everything inside Ubuntu VMs for safety.

I left my computer running without virt-manager and it consumed around 7 GB of RAM. Once I start my VM, RAM starts filling until the 32 GB are reached after an hour or so, and then Firefox starts closing tabs and I get freezes. It's getting difficult to work with this problem.

I tried without the virgl OpenGL renderer as well, and the problem is the same.

Here's my XML:

<domain type="kvm">
  <name>myvm1</name>
  <uuid>7b22391a-ee8c-4259-8883-daa88a4559ac</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">12288000</memory>
  <currentMemory unit="KiB">12288000</currentMemory>
  <memoryBacking>
    <source type="memfd"/>
    <access mode="shared"/>
  </memoryBacking>
  <vcpu placement="static">6</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-6.2">hvm</type>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on"/>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" discard="unmap"/>
      <source file="/var/lib/libvirt/images/orwell.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sda" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller...


Revision history for this message
Joril (jorilx) wrote :

I may have a similar problem... Can you post the output of the following?

$ slabtop -o -sc

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

 Active / Total Objects (% used) : 7182629 / 7337467 (97,9%)
 Active / Total Slabs (% used) : 146316 / 146316 (100,0%)
 Active / Total Caches (% used) : 157 / 216 (72,7%)
 Active / Total Size (% used) : 1159973,65K / 1193486,03K (97,2%)
 Minimum / Average / Maximum Object : 0,01K / 0,16K / 16,00K

  OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
5384128 5355671 99% 0,06K 84127 64 336508K vmap_area
 10632 10626 99% 16,00K 5316 2 170112K zio_buf_comb_16384
126060 126036 99% 0,97K 3820 33 122240K dnode_t
 10748 10674 99% 8,00K 2687 4 85984K kmalloc-8k
 57792 57568 99% 1,12K 2064 28 66048K zfs_znode_cache
 63072 62335 98% 1,00K 1971 32 63072K kmalloc-1k
150003 135089 90% 0,38K 7143 21 57144K arc_buf_hdr_t_full_crypt
167832 167254 99% 0,19K 7992 21 31968K dentry
 73521 70943 96% 0,38K 3501 21 28008K dmu_buf_impl_t
 36525 33211 90% 0,62K 1461 25 23376K inode_cache
 71370 61357 85% 0,20K 1830 39 14640K vm_area_struct
 25284 23666 93% 0,57K 903 28 14448K radix_tree_node
 57816 57780 99% 0,24K 1752 33 14016K sa_cache
124176 108407 87% 0,10K 3184 39 12736K abd_t
  2976 2817 94% 4,00K 372 8 11904K kmalloc-4k
  1384 1310 94% 7,88K 346 4 11072K task_struct
 70528 69835 99% 0,12K 2204 32 8816K kernfs_node_cache
121984 113143 92% 0,06K 1906 64 7624K kmalloc-64
  3136 2995 95% 2,00K 196 16 6272K kmalloc-2k
 11936 9739 81% 0,50K 373 32 5968K kmalloc-512
 57330 51105 89% 0,09K 1365 42 5460K kmalloc-96
160384 149885 93% 0,03K 1253 128 5012K kmalloc-32
  6348 6070 95% 0,70K 276 23 4416K proc_inode_cache
  5733 4807 83% 0,74K 273 21 4368K shmem_inode_cache
  4080 1774 43% 1,06K 136 30 4352K dmaengine-unmap-128
 16832 14145 84% 0,25K 526 32 4208K filp
   936 614 65% 4,00K 117 8 3744K zfs_btree_leaf_cache
 59136 59136 100% 0,06K 924 64 3696K kmalloc-rcl-64
123930 120281 97% 0,02K 729 170 2916K lsm_file_cache
 15057 14798 98% 0,19K 717 21 2868K kmalloc-192
 40704 29007 71% 0,06K 636 64 2544K anon_vma_chain
  7825 6235 79% 0,31K 313 25 2504K arc_buf_hdr_t_full
  3042 2565 84% 0,81K 78 39 2496K sock_inode_cache
  4960 2801 56% 0,50K 155 32 2480K zio_buf_comb_512
  9696 8076 83% 0,25K 303 32 2424K kmalloc-256
  8672 8398 96% 0,25K 271 ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Well - there is "only" ~1.2 GB in the slab.
That alone doesn't account for all the memory you are looking for.

Let us check in general whether we should look at the kernel or at userspace.
Maybe you could run and report here the output of

$ cat /proc/meminfo
$ sudo smem -tk

That might help to further include/exclude some things in our considerations.

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

cat /proc/meminfo
MemTotal: 32582696 kB
MemFree: 5706384 kB
MemAvailable: 5720280 kB
Buffers: 4564 kB
Cached: 2417768 kB
SwapCached: 120580 kB
Active: 1312224 kB
Inactive: 8002812 kB
Active(anon): 1095264 kB
Inactive(anon): 7849040 kB
Active(file): 216960 kB
Inactive(file): 153772 kB
Unevictable: 1155116 kB
Mlocked: 112 kB
SwapTotal: 2097148 kB
SwapFree: 1428220 kB
Dirty: 0 kB
Writeback: 0 kB
AnonPages: 7950296 kB
Mapped: 521400 kB
Shmem: 2053184 kB
KReclaimable: 86320 kB
Slab: 969548 kB
SReclaimable: 86320 kB
SUnreclaim: 883228 kB
KernelStack: 18688 kB
PageTables: 48116 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 18388496 kB
Committed_AS: 17846100 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 834476 kB
VmallocChunk: 0 kB
Percpu: 9664 kB
HardwareCorrupted: 0 kB
AnonHugePages: 4524032 kB
ShmemHugePages: 0 kB
ShmemPmdMapped: 0 kB
FileHugePages: 0 kB
FilePmdMapped: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 0 kB
DirectMap4k: 6036656 kB
DirectMap2M: 26173440 kB
DirectMap1G: 2097152 kB

 sudo smem -tk
  PID User Command Swap USS PSS RSS
 4682 lz sh -c /usr/bin/ibus-daemon 104.0K 88.0K 95.0K 1.1M
 4274 root fusermount3 -o rw,nosuid,no 4.0K 112.0K 121.0K 1.3M
 2486 root /usr/sbin/acpid 0 252.0K 261.0K 1.4M
 2535 avahi avahi-daemon: chroot helper 0 128.0K 273.0K 948.0K
 2492 root /usr/sbin/cron -f -P 0 292.0K 302.0K 1.7M
 3105 root /usr/sbin/dnsmasq --conf-fi 0 112.0K 330.0K 1.2M
 2988 rtkit /usr/libexec/rtkit-daemon 0 396.0K 415.0K 2.1M
 4385 lz /usr/libexec/goa-identity-s 1.7M 420.0K 465.0K 5.6M
 4363 lz /usr/libexec/gdm-wayland-se 128.0K 444.0K 471.0K 4.1M
 4797 lz /usr/libexec/ibus-memconf 276.0K 460.0K 494.0K 4.6M
 4440 lz /usr/libexec/gnome-session- 0 484.0K 508.0K 3.7M
 5147 kernoops /usr/sbin/kerneloops 4.0K 472.0K 516.0K 1.9M
 5139 kernoops /usr/sbin/kerneloops --test 0 480.0K 523.0K 1.9M
 4481 lz /usr/bin/dbus-daemon --conf 240.0K 452.0K 545.0K 2.7M
 2508 root /usr/sbin/irqbalance --fore 0 540.0K 560.0K 2.4M
 4814 lz /usr/libexec/ibus-portal 364.0K 532.0K 565.0K 4.6M
17121 lz /usr/bin/gpg-agent --superv 0 516.0K 571.0K 2.1M
 3104 libvirt-dnsmasq /usr/sbin/dnsmasq --conf-fi 0 356.0K 584.0K 2.0M
 4468 lz /usr/libexec/at-spi-bus-lau 348.0K 548.0K 584.0K 4.9M
 4261 lz /usr/libexec/xdg-permission 0 632.0K 660.0K 4.3M
3756028 root sudo smem -tk 0 11...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So, of your 32 GB there are:

Re-usable:
- 5.7G free
- 2.3G cache

Allocated
- 1.3G Active
- 8.0G Inactive
- 2G shmem
- 1G slab
- 4.5G anon-huge

A lot of those things might be double-accounted and overlap, so I won't go into more detail.
But that doesn't look too bad to me.
To be fair, though, your swap does have some pages in it, so at some point there was memory pressure pushing things out to swap for sure.

The qemu process only has ~6 GB, which is totally fine; I see no leak there.
Or does it grow beyond the allocated ~12 GB at some point and keep growing continuously?
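
If it is unclear whether the qemu process itself keeps growing, its resident size could be logged periodically, e.g. with a loop like this (just a sketch; the process name matches the emulator from your XML, and the interval and log file name are arbitrary):

$ while true; do date; ps -o rss=,vsz=,comm= -C qemu-system-x86_64; sleep 60; done >> qemu-mem.log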

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

I left the computer for 2 days without opening QEMU, and it stays at 7 GB of usage maximum. When I open QEMU, it suddenly grows to 31-something GB, even if the VM has only 6 GB of RAM, and then Firefox starts closing my tabs.

There is a mismatch between gnome-system-monitor, which shows ~31 GB of RAM used, and htop, which shows the 'real' usage. Anyway, my RAM is indeed full even if htop says it's only 16 GB, because Firefox will close all tabs and I get some freezes when it's possibly swapping some memory. So I don't know if those readings are accurate. I only know that before 22.04 I never had so much RAM usage.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Interesting, I'm re-reading your XML with that behavior in mind ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The two things that might be related would be:
a) backing definition
  <memoryBacking>
    <source type="memfd"/>
    <access mode="shared"/>
  </memoryBacking>

b) 3d enablement
    <graphics type="spice">
      <listen type="none"/>
      <image compression="off"/>
      <gl enable="yes" rendernode="/dev/dri/by-path/pci-0000:00:02.0-render"/>
    </graphics>
    <audio id="1" type="spice"/>
    <video>
      <model type="virtio" heads="1" primary="yes">
        <acceleration accel3d="yes"/>
      </model>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
    </video>

To be clear, neither of them "should" grow it to 32 GB, but they are the most related or uncommon things, in case they now use other allocations or pre-allocate some buffers. You mentioned that you already removed virglrenderer and it keeps doing that.

Could you please:
1. remove both of the above sections
2. start your guest again and check if it still behaves in the bad way
   2.1 if it does, keep the guest without those sections (it just makes it easier to debug)
   2.2 if it stops behaving that way, find out which setting exactly it was by adding them back one by one
3. Once you have found the minimal set which exposes the issue, please start your guest while running the smem call from above in a loop - maybe every 2 seconds (see the sketch below). We'd want to see whether it is qemu or something else that initially grows that much.
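
For example, a loop like the following could capture that (a sketch; the 2-second interval matches the suggestion above, and the log file name is arbitrary):

$ while true; do date; sudo smem -tk; sleep 2; done >> smem.log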

Revision history for this message
Lucas Zanella (lucaszanella) wrote :

Same problem. Here's my new XML:

<domain type="kvm">
  <name>supernova</name>
  <uuid>7b22391a-ee8c-4259-8883-daa88a4559ac</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://ubuntu.com/ubuntu/22.04"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit="KiB">20480000</memory>
  <currentMemory unit="KiB">20480000</currentMemory>
  <vcpu placement="static">8</vcpu>
  <os>
    <type arch="x86_64" machine="pc-q35-6.2">hvm</type>
    <boot dev="hd"/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <vmport state="off"/>
  </features>
  <cpu mode="host-passthrough" check="none" migratable="on"/>
  <clock offset="utc">
    <timer name="rtc" tickpolicy="catchup"/>
    <timer name="pit" tickpolicy="delay"/>
    <timer name="hpet" present="no"/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>destroy</on_crash>
  <pm>
    <suspend-to-mem enabled="no"/>
    <suspend-to-disk enabled="no"/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type="file" device="disk">
      <driver name="qemu" type="qcow2" discard="unmap"/>
      <source file="/var/lib/libvirt/images/orwell.qcow2"/>
      <target dev="vda" bus="virtio"/>
      <address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
    </disk>
    <disk type="file" device="cdrom">
      <driver name="qemu" type="raw"/>
      <target dev="sda" bus="sata"/>
      <readonly/>
      <address type="drive" controller="0" bus="0" target="0" unit="0"/>
    </disk>
    <controller type="usb" index="0" model="qemu-xhci" ports="15">
      <address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
    </controller>
    <controller type="pci" index="0" model="pcie-root"/>
    <controller type="pci" index="1" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="1" port="0x10"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
    </controller>
    <controller type="pci" index="2" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="2" port="0x11"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
    </controller>
    <controller type="pci" index="3" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="3" port="0x12"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
    </controller>
    <controller type="pci" index="4" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="4" port="0x13"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
    </controller>
    <controller type="pci" index="5" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="5" port="0x14"/>
      <address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
    </controller>
    <controller type="pci" index="6" model="pcie-root-port">
      <model name="pcie-root-port"/>
      <target chassis="6" port="0x15"/>
      <address type="...


Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I read your answer as "Same problem ... with those two sections removed".
Fine.

Have you gone any further?
I asked if you could try to find the minimal set that exposes the issue.

You can keep reducing further until you spot the one thing that causes this "eat all memory" behavior.
Reduce devices like spice, video, graphics, ... until you have found the minimal set where you can say "with option A it happens, without A it does not".

Also, as I said in comment #7 when analyzing your smem/meminfo data, your system was not overloaded at that time.
Could you provide smem/meminfo data from the time when memory consumption is blowing up (preferably with the simplified guest config)?

Because right now - we can't even see where this memory is consumed.

P.S. Not saying you do, but quite often people look at "free" memory too much, which isn't meant to stay free. Quite the opposite - the system is optimized to use as much as possible for caches of all kinds. Therefore, seeing where the 32 GB is actually being used while qemu runs will be important to be able to go further ...
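
As a general note (not specific to this report), the "available" column of free is usually a better indicator of real memory pressure than "free", since the page cache is reclaimed on demand:

$ free -h
# 'available' estimates how much memory can still be allocated without swapping;
# 'free' excludes the page cache, which the kernel reuses as needed.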

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for qemu (Ubuntu) because there has been no activity for 60 days.]

Changed in qemu (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Paride Legovini (paride) wrote :

@Lucas: I had another look at this bug report, and it looks like Christian's questions are still relevant. Should you find the time to answer, please do so and change the bug status back to New; we'll look at it again. Thanks!
