systemd-oomd frequently kills Firefox and Visual Studio Code

Bug #1972159 reported by ChM
This bug affects 13 people
Affects: Status / Importance / Assigned to / Milestone

Default settings and artwork for Baltix OS: New / Undecided / Unassigned
systemd (Fedora): Fix Released / Critical
systemd (Ubuntu): Fix Released / High / Nick Rosbrook
systemd (Ubuntu) Jammy: Fix Released / Undecided / Nick Rosbrook
systemd (Ubuntu) Kinetic: Fix Released / High / Nick Rosbrook

Bug Description

[Impact]

The "swap kill" side of systemd-oomd has caused unexpected behavior for desktop users. A user's browser, desktop session, or some other desktop application may be killed by systemd-oomd when SwapUsedLimit is reached, but system performance otherwise appears unaffected. This leaves users confused as to why their application was killed, and has a negative impact on their desktop experience.

For now, let's disable the swap kill functionality by default.

[Test Plan]

On Jammy desktop, check the ManagedOOMSwap property on -.slice:

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=kill # After the fix, this should print ManagedOOMSwap=auto

[Where problems could occur]

Disabling swap kill by default means that systemd-oomd will no longer act on cgroups with high swap usage, so users may experience degraded system performance when swap fills up.

[Other Info]

If a user wishes to restore the original systemd-oomd behavior, they can do so by creating the following override file:

 $ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
 [Slice]
 ManagedOOMSwap=kill
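Since the drop-in directory may not exist by default, restoring the old behavior would look roughly like this (a sketch; run as root, and the file name simply mirrors the one used for the Ubuntu default):

```shell
# Re-enable swap-based kills by overriding the root slice (requires root).
mkdir -p /etc/systemd/system/-.slice.d
cat > /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf <<'EOF'
[Slice]
ManagedOOMSwap=kill
EOF
systemctl daemon-reload
# Verify the property took effect:
systemctl show -- "-.slice" | grep '^ManagedOOMSwap'
```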

[Original Description]

Since I installed Ubuntu 22.04, Firefox and Visual Studio Code are frequently killed by systemd-oomd (roughly every 2 hours).

I have 8 GB of memory and never experienced this before the upgrade to Ubuntu 22.04. I therefore assume the claim that there is not enough memory is unjustified. Has 64 GB of memory become the minimum requirement to run Ubuntu?

The second problem is that this gives a very bad user experience, which is critical for new Ubuntu users.

There should be a warning before killing apps, to give the user an opportunity to save their data. At the very least, there should be an apology and an explanation after an app is killed.

The current behavior gives the impression that Ubuntu 22.04 is unreliable and unsafe to use, which is a problem for an LTS release that many people may want to use in critical production contexts.

There might be a configuration problem with systemd-oomd, or simply buggy behavior. I would recommend disabling or removing it entirely until this problem is resolved. That is what I will do myself, because I have work to do.


Revision history for this message
In , isaac.bernadus (isaac.bernadus-redhat-bugs) wrote :

Description of problem:

Systemd-oomd is very aggressive when it comes to memory management. In Fedora 33 I've been able to run quite a few apps without a problem, but in Fedora 34 the apps get killed way too quickly. Here's an example of Atom getting killed by systemd-oomd:

Mar 20 22:18:34 x505za systemd-oomd[1020]: Memory pressure for /user.slice/user-1000.slice/user@1000.service is greater than 10 for more than 10 seconds and there was reclaim activity
Mar 20 22:18:34 x505za systemd[1604]: app-gnome-atom-11930.scope: systemd-oomd killed 47 process(es) in this unit.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Deactivated successfully.
Mar 20 22:18:36 x505za systemd[1604]: app-gnome-atom-11930.scope: Consumed 7.557s CPU time

This event is triggered when around 70-80% of my memory is filled up despite still having space in swap.

Version-Release number of selected component (if applicable): systemd 248 (v248~rc4-1.fc34)

How reproducible:
Always

Steps to Reproduce:
1. Load up a bunch of apps to fill up memory
2. Wait for systemd-oomd to trigger reclaim activity
3.

Actual results:
Apps get killed very quickly

Expected results:
Apps to run normally until memory and swap is almost full

Additional info:

System Specs:

Ryzen 3 2200u
4GB RAM
Kernel 5.11.7-300.fc34.x86_64

Revision history for this message
In , isaac.bernadus (isaac.bernadus-redhat-bugs) wrote :

Seems like if the memory gets filled fast enough, systemd will even decide to kill GNOME

Revision history for this message
In , red (red-redhat-bugs-1) wrote :

Same problem here.
The fedora defaults are too aggressive. They make systemd-oomd very trigger-happy (ManagedOOMMemoryPressureLimit=10% for 10 seconds)

With 4GB and zram, you can barely use anything. With 2GB it is a task-massacre all the time.

The defaults suggested in the manual (60% & 30s) still prevent excessive spinning while working way more predictably.
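For reference, applying the values mentioned above would mean two drop-ins, one for the per-user-service pressure limit and one for the global duration (a sketch; the paths mirror the Fedora-shipped defaults quoted later in this thread, and the file names are illustrative):

```shell
# Raise the memory pressure limit for user services to 60% (run as root).
mkdir -p /etc/systemd/system/user@.service.d
cat > /etc/systemd/system/user@.service.d/99-oomd-pressure.conf <<'EOF'
[Service]
ManagedOOMMemoryPressureLimit=60%
EOF

# Lengthen the sampling window to 30s.
mkdir -p /etc/systemd/oomd.conf.d
cat > /etc/systemd/oomd.conf.d/99-oomd-duration.conf <<'EOF'
[OOM]
DefaultMemoryPressureDurationSec=30s
EOF

systemctl daemon-reload
systemctl restart systemd-oomd
oomctl   # confirm the new limits were picked up
```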

Revision history for this message
In , iolo (iolo-redhat-bugs) wrote :

I've noticed this too while testing Fedora Workstation 34 this week. I'll leave Netbeans or Brave running to go do something else, and I've got maybe about 4 GB of free RAM out of 8 GB total at that point. Then, some time later, I will try to go back to Netbeans or Brave, or whatever it is, only to find that it's been killed. I've never had anything like this happen before.

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

I will work on updating the pressure defaults now that the test week results have come in. I agree that the defaults are a bit aggressive, but that's what the test week and beta were meant to iron out.

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

I've submitted https://src.fedoraproject.org/rpms/systemd/pull-request/58# to bump pressure defaults to 50% for 20s. Hopefully these more conservative values will perform better for most people.

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

FEDORA-2021-8595b30af3 has been submitted as an update to Fedora 34. https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 testing repository.
Soon you'll be able to install the update with the following command:
`sudo dnf upgrade --enablerepo=updates-testing --advisory=FEDORA-2021-8595b30af3`
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2021-8595b30af3

See also https://fedoraproject.org/wiki/QA:Updates_Testing for more information on how to test updates.

Revision history for this message
In , updates (updates-redhat-bugs) wrote :

FEDORA-2021-8595b30af3 has been pushed to the Fedora 34 stable repository.
If the problem still persists, please make a note of it in this bug report.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

Created attachment 1769904
htop

Today oomd again killed my container.

$ cat /usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
[OOM]
DefaultMemoryPressureDurationSec=20s

$ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

Created attachment 1769916
system log

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

Created attachment 1769918
system log

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

@Mikhail Was the system responsive and performing well at 54% pressure on the user service cgroup? Also can you try stopping systemd-oomd (sudo systemctl stop systemd-oomd) and recording what the highest tolerable pressure value was from `/sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure` while your container is running? We can't control for all workloads but it's worthwhile to see what pressure is tolerable or not.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

(In reply to Anita Zhang from comment #12)
> @Mikhail Was the system responsive and performing well at 54% pressure on the user service cgroup? Also can you try stopping systemd-oomd (sudo systemctl stop systemd-oomd) and recording what the highest tolerable pressure value was from `/sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure` while your container is running? We can't control for all workloads but it's worthwhile to see what pressure is tolerable or not.

$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure
some avg10=0.00 avg60=0.00 avg300=0.13 total=1698253169
full avg10=0.00 avg60=0.00 avg300=0.11 total=1515028054

$ journalctl -b -u systemd-oomd --no-pager
-- Journal begins at Thu 2021-07-29 17:02:00 +05, ends at Wed 2021-09-08 00:51:09 +05. --
Sep 04 03:16:03 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
Sep 04 03:16:03 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Sep 08 00:23:23 primary-ws systemd-oomd[1552]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-887e6f17-fa6d-44cd-aa80-798d5c0c71ce.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 52.46% > 50.00% for > 20s with reclaim activity

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

^^^ This is F36 and systemd-oomd is still killing my terminal tabs.

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

(In reply to Mikhail from comment #13)
> $ journalctl -b -u systemd-oomd --no-pager
> -- Journal begins at Thu 2021-07-29 17:02:00 +05, ends at Wed 2021-09-08 00:51:09 +05. --
> Sep 04 03:16:03 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
> Sep 04 03:16:03 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
> Sep 08 00:23:23 primary-ws systemd-oomd[1552]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-887e6f17-fa6d-44cd-aa80-798d5c0c71ce.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 52.46% > 50.00% for > 20s with reclaim activity

You're pretty close to the default limits set up for Fedora so if you're fine with the added pressure you may want to try bumping them for your system with an override like so:

$ cat /etc/systemd/system/user@.service.d/99-oomd-override.conf
[Service]
ManagedOOMMemoryPressureLimit=65%
$ sudo systemctl daemon-reload
$ oomctl # check if new limit was applied

The default values will likely be reworked once https://github.com/systemd/systemd/pull/20690 is merged. This will allow setting more tuned pressure values on slices within a user session rather than relying on one value for all of <email address hidden>.

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

(In reply to Anita Zhang from comment #15)
>
> You're pretty close to the default limits set up for Fedora so if you're fine with the added pressure you may want to try bumping them for your system with an override like so:
>
> $ cat /etc/systemd/system/user@.service.d/99-oomd-override.conf

Directory `user@.service.d` is absent on my system.

$ ls /etc/systemd/system/user@.service.d
ls: cannot access '/etc/systemd/system/user@.service.d': No such file or directory

> [Service]
> ManagedOOMMemoryPressureLimit=65%
> $ sudo systemctl daemon-reload
> $ oomctl # check if new limit was applied
>
> The default values will likely be reworked once https://github.com/systemd/systemd/pull/20690 is merged. This will allow setting more tuned pressure values on slices within a user session rather than relying on one value for all of <email address hidden>.

As I understand it, by default the PressureLimit should be 50%

$ cat /usr/lib/systemd/oomd.conf.d/10-oomd-defaults.conf
[OOM]
DefaultMemoryPressureDurationSec=20s

$ cat /usr/lib/systemd/system/user@.service.d/10-oomd-user-service-defaults.conf
[Service]
ManagedOOMMemoryPressure=kill
ManagedOOMMemoryPressureLimit=50%

But oomctl shows 60%, why?

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 55.3G Total: 62.6G
        Swap: Used: 104.5M Total: 63.9G
Swap Monitored CGroups:
        Path: /
                Swap Usage: (see System Context)
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 50.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 14s
                Current Memory Usage: 49.6G
                Memory Min: 250.0M
                Memory Low: 0B
                Pgscan: 85039860
                Last Pgscan: 85039860

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

(In reply to Mikhail from comment #16)
> (In reply to Anita Zhang from comment #15)
> Directory `user@.service.d` is absent on my system.

You need to make it. Directories under /etc/systemd/system are managed by the system maintainer.

> As I understand it, by default the PressureLimit should be 50%
>
> But oomctl shows 60%, why?
>
> $ oomctl
> Dry Run: no
> Swap Used Limit: 90.00%
> Default Memory Pressure Limit: 60.00%
> Default Memory Pressure Duration: 20s
> System Context:
>         Memory: Used: 55.3G Total: 62.6G
>         Swap: Used: 104.5M Total: 63.9G
> Swap Monitored CGroups:
>         Path: /
>                 Swap Usage: (see System Context)
> Memory Pressure Monitored CGroups:
>         Path: /user.slice/user-1000.slice/user@1000.service
>                 Memory Pressure Limit: 50.00%
>                 Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 14s
>                 Current Memory Usage: 49.6G
>                 Memory Min: 250.0M
>                 Memory Low: 0B
>                 Pgscan: 85039860
>                 Last Pgscan: 85039860

The default memory pressure limit is 60%, meaning a unit that doesn't override it will use 60%. But here, since we ship a config for user@.service, the memory pressure limit is overridden to 50% (it appears in the output above the "Pressure" line).

Revision history for this message
In , mikhail.v.gavrilov (mikhail.v.gavrilov-redhat-bugs) wrote :

> You're pretty close to the default limits set up for Fedora so if you're fine with the added pressure you may want to try bumping them for your system with an override like so

ManagedOOMMemoryPressureLimit=65% didn't help :(

$ journalctl -b -u systemd-oomd --no-pager
-- Journal begins at Thu 2021-10-07 03:47:38 +05, ends at Fri 2021-11-12 19:54:14 +05. --
Nov 12 14:42:28 primary-ws systemd[1]: Starting Userspace Out-Of-Memory (OOM) Killer...
Nov 12 14:42:28 primary-ws systemd[1]: Started Userspace Out-Of-Memory (OOM) Killer.
Nov 12 17:50:48 primary-ws systemd-oomd[1172]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.gnome.Terminal.slice/vte-spawn-f92b7041-15da-41fb-8076-8221774567da.scope due to memory pressure for /user.slice/user-1000.slice/user@1000.service being 65.70% > 65.00% for > 20s with reclaim activity

$ cat /sys/fs/cgroup/user.slice/user-$UID.slice/user@$UID.service/memory.pressure
some avg10=3.68 avg60=31.49 avg300=25.59 total=424288160
full avg10=3.59 avg60=29.41 avg300=23.58 total=390639367

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 60.4G Total: 62.6G
        Swap: Used: 16.5G Total: 71.9G
Swap Monitored CGroups:
        Path: /
                Swap Usage: (see System Context)
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 65.00%
                Pressure: Avg10: 2.11 Avg60: 26.64 Avg300: 23.10 Total: 6min 30s
                Current Memory Usage: 25.5G
                Memory Min: 250.0M
                Memory Low: 0B
                Pgscan: 36702397
                Last Pgscan: 36690917

Revision history for this message
In , mohd.akram (mohd.akram-redhat-bugs) wrote :

How do I disable this completely? It's constantly killing apps despite plenty of RAM and swap space available. I do `sudo systemctl disable --now systemd-oomd.service` and it comes back when I restart. I don't want to use any userspace OOM killer.

Revision history for this message
In , red (red-redhat-bugs-1) wrote :

(In reply to Mohamed Akram from comment #19)
> How do I disable this completely?

You can disable it completely with "systemctl mask systemd-oomd"
(Masked services won't start even if you launch them manually.)

Revision history for this message
In , the.anitazha (the.anitazha-redhat-bugs) wrote :

(In reply to Mohamed Akram from comment #19)
> It's constantly killing apps despite plenty of RAM and swap space available.

Hey this sounds like a legit bug? Do you still have the logs from this event? They should be visible in the journal by doing `journalctl -u systemd-oomd -g Killed`

Revision history for this message
ChM (christophe-meessen) wrote :

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           7.7Gi       3.3Gi       2.2Gi       113Mi       2.2Gi       4.0Gi
Swap:          2.0Gi       1.1Gi       936Mi

Syslog:

May 9 09:55:32 xxx systemd[2839]: snap.firefox.firefox.b9635bb0-3585-4241-8d1b-8936cedebc3a.scope: systemd-oomd killed 288 process(es) in this unit.
May 9 09:55:32 xxx systemd[2839]: snap.firefox.firefox.b9635bb0-3585-4241-8d1b-8936cedebc3a.scope: Consumed 5min 25.300s CPU time.
May 9 09:55:33 xxx systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
May 9 09:55:37 xxx systemd-oomd[607]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope due to memory used (8181518336) / total (8280240128) and swap used (1969328128) / total (2147479552) being more than 90.00%
May 9 09:55:37 xxx systemd[2839]: snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope: systemd-oomd killed 82 process(es) in this unit.
May 9 09:55:38 xxx systemd[2839]: snap.code.code.9ab9bef1-a97e-46df-8879-2377452219ab.scope: Consumed 29min 25.392s CPU time.

Revision history for this message
ChM (christophe-meessen) wrote :

$ LANG=en free -h
               total used free shared buff/cache available
Mem: 7.7Gi 3.3Gi 2.1Gi 113Mi 2.2Gi 4.0Gi
Swap: 2.0Gi 1.1Gi 941Mi

Revision history for this message
Sebastien Bacher (seb128) wrote :

When did you get the 'free' information?

The log states

> due to memory used (8181518336) / total (8280240128) and swap used (1969328128) / total (2147479552)

it would be interesting to know whether you actually hit the limits or whether the computation is wrong.

Changed in systemd (Ubuntu):
importance: Undecided → High
status: New → Incomplete
Lukas Märdian (slyon)
tags: added: rls-kk-incoming
tags: added: rls-jj-incoming
removed: rls-kk-incoming
Revision history for this message
ChM (christophe-meessen) wrote :

The free -h information was collected by me after Firefox and Visual Studio Code were killed.

The values in the log were generated by systemd-oomd when it killed the apps, I assume.

I have disabled it, so I can't contribute any further. Sorry.

Lukas Märdian (slyon)
tags: removed: rls-jj-incoming
Revision history for this message
ChM (christophe-meessen) wrote :

I have upgraded my computer to 40GB of memory and restarted systemd-oomd.

Memory usage is stable. No memory leak to report. I can run further tests if needed.

Revision history for this message
Connor Nolan (thebrokenrail) wrote :

This also affects me:

May 26 08:47:22 <hostname> systemd-oomd[542]: Killed /user.slice/user-1000.slice/user@1000.service/app.slice/snap.firefox.firefox.cef581a2-89d2-4a72-86a9-8a0a30cfdb86.scope due to memory used (14922825728) / total (16526151680) and swap used (1944498176) / total (2147479552) being more than 90.00%

I never had any issues with Ubuntu 20.04 and this also occurs with QtCreator (upstream version not distribution version).

Revision history for this message
ChM (christophe-meessen) wrote :

The problem definitely disappeared once I upgraded memory to 40GB. But memory usage displayed in top never exceeds 6GB.

Revision history for this message
John S (johnps) wrote :

For me, I have 32GB of RAM and it was killing my entire user session, dropping me back to GDM. It took me over a week to realise this was happening; I was kicked back to the login screen, losing all work, at least 15 times, and having to log back in produced so much log spam that it hid the OOM message. I was close to having to reinstall or switch distros to fix it. I was using i3, but this was totally unusable for me and I had to remove the service. Does it just not play nice with applications that grab as much RAM as possible and free it when the system is under pressure?

Personally if this isn't fixed and OOM gets turned back on at some point it would be severe enough for me to switch distro.

Revision history for this message
Kevin (kevin-b-er) wrote :

This is greatly exacerbated because systemd, until v251, uses MemFree rather than MemAvailable to decide how much memory remains. Since Linux aggressively uses free memory for caching, MemFree is low even when plenty of memory is reclaimable, so systemd-oomd excessively kills applications.

There's a fix upstream (commit 030bc91cb98385904b28a839d1e04bb4160a52d2), which was released in v251 about a week ago.
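The distinction described above is visible directly in /proc/meminfo: MemFree counts only completely unused pages and shrinks as the page cache grows, while MemAvailable estimates how much memory could be obtained without swapping, which is the figure a userspace OOM killer ought to compare against. A quick check (Linux only):

```shell
# Compare the two fields; on a busy system MemAvailable is typically
# much larger than MemFree because it counts reclaimable page cache.
grep -E '^(MemFree|MemAvailable):' /proc/meminfo
```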

Revision history for this message
Adolfo Jayme Barrientos (fitojb) wrote :

I wish upstart was back

Revision history for this message
Sebastien Bacher (seb128) wrote :

@Kevin, we cherrypicked that patch in 22.04 before release

https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3

Revision history for this message
In , mail (mail-redhat-bugs) wrote :

Just wanted to chime in that I've had systemd-oomd kill my GNOME Shell four times in the past two months on my current install of F35 (leading to a pretty jarring experience). I'm a pretty lay Linux user; I will try to attach logs, let me know if anything else is helpful.

Revision history for this message
In , mail (mail-redhat-bugs) wrote :

Created attachment 1885395
journalctl -- oomd kills

Revision history for this message
In , bugzilla (bugzilla-redhat-bugs) wrote :

>Mar 25 19:00:46 fedora systemd-oomd[1612]: Killed /user.slice/user-1000.slice/user@<email address hidden> due to memory used (16360116224) / total (16541884416) and swap used (7752622080) / total (8589930496) being more than 90.00%

That something is being killed off at 90% swap usage makes sense, but not GNOME Shell. That exchanges one big problem for another; I'm not sure we can ever consider killing the desktop itself a solution to the swap performance problem. It makes me wonder whether the only thing we can do is ensure resource control limits everything well enough that the user retains the ability to choose which program gets killed, rather than doing it for them. That is, I'm wondering whether oomd really should only kill the most obvious candidates, and otherwise constrain the less obvious ones to whatever (90% or so) keeps the shell and a terminal responsive enough (i.e. not perfect) that the user doesn't reach for the power button, but instead reaches for top or systemd-cgtop to find out what's hogging resources and decides whether to clobber it or not.

Revision history for this message
In , mail (mail-redhat-bugs) wrote :

I agree.

I might add, though (this is my first time on Linux forums, so I'm not sure how this is generally addressed), that this problem can easily affect non-technical end users as well, who would likely not be comfortable with shell commands such as top or systemd-cgtop. In case it is helpful, I'll attach a screenshot of how macOS solves this problem: a GUI for force-quitting that shows current memory usage for each application. I think this is a nice way of empowering the user to make the decision. However, I could not find an existing Linux/GNOME GUI that does something similar.

In the meantime, perhaps improving the heuristics of the resource manager so core services like GNOME Shell are not killed would be helpful. And I think that so long as auto-killing is the active solution, it would be nice for any apps that are killed due to memory constraints for there to be a system alert informing the user as to this decision, since the event otherwise seems quite anomalous.

Revision history for this message
In , mail (mail-redhat-bugs) wrote :

Created attachment 1885855
macos force quit GUI screenshot

Revision history for this message
In , burkesamuel (burkesamuel-redhat-bugs) wrote :

Created attachment 1886712
journalctl -u systemd-oomd -g Killed | grep -v Boot

My daughter is using Fedora 35 with Cinnamon Desktop and has been complaining about randomly being logged out.

We'll see how things go with systemd-oomd disabled/masked.

Revision history for this message
Lester Carballo Pérez (lestcape) wrote (last edit ):

In my case this was occurring while compiling the kernel. The process takes 6 hours, and on three occasions it was killed before finishing. See:
https://gitlab.freedesktop.org/drm/amd/-/issues/1569#note_1409226

To ensure it finished, I dropped the caches several times during the build:
sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"

When I did that, everything was OK, but it is certainly not nice to have to run a process on a timer just to watch the cache and drop it when it grows too large.

LC_MESSAGES=C free -h
               total used free shared buff/cache available
Mem: 31Gi 16Gi 13Gi 277Mi 1.3Gi 14Gi
Swap: 39Gi 1.0Mi 39Gi

Revision history for this message
djchandler (djchandler) wrote :

I use a swap file of 16 GB, equal to the RAM size, on one of my systems that's been upgraded to 22.04, using Wayland and GNOME. No problems for me yet. Could simply increasing the swap file size alleviate this issue for most users until the patch(es) reach end users? (Rhetorical question, no feedback please.)

Revision history for this message
Sebastien Bacher (seb128) wrote :
Lukas Märdian (slyon)
tags: added: rls-jj-incoming
tags: added: rls-kk-incoming
Revision history for this message
Lukas Märdian (slyon) wrote :

As discussed on the mailing list, would you be able to test a modification to the systemd-oomd configuration by placing a new file in /etc and report back if that improves the situation for you (after a reboot)?

$ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
[Slice]
ManagedOOMSwap=auto

Changed in systemd (Ubuntu):
status: Incomplete → Confirmed
tags: added: fr-2482
tags: removed: rls-jj-incoming rls-kk-incoming
Revision history for this message
Damiön la Bagh (kat-amsterdam) wrote :

This is also occurring on Ubuntu Server on a VPS.

I have 7 Ubuntu 22.04 VPSes with 1 GB of physical RAM and a 3GB swap file. The programs I'm running get killed, which stops production, even though the swap file is far from full.

Revision history for this message
Nick Rosbrook (enr0n) wrote :

> This is also occuring on Ubuntu Server on VPS.

Did you install/enable systemd-oomd manually? AFAIK, systemd-oomd is not enabled by default on server. Or is something else killing your programs?

> swap file is far from full.

If it is systemd-oomd, do you have logs from this (journalctl -u systemd-oomd)?

Revision history for this message
Nick Rosbrook (enr0n) wrote :

> $ cat /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf
> [Slice]
> ManagedOOMSwap=auto

I have been running with this configuration this week, and have been running a script to log occurrences of my memory and swap usage each exceeding 90%. I have had several such occurrences, but have yet to experience any noticeable performance issues. For example, this morning `oomctl` reported the following usage, but I was able to continue using my system without any noticeable difference:

$ oomctl
Dry Run: no
Swap Used Limit: 90.00%
Default Memory Pressure Limit: 60.00%
Default Memory Pressure Duration: 20s
System Context:
        Memory: Used: 14.3G Total: 15.5G
        Swap: Used: 979.9M Total: 979.9M
Swap Monitored CGroups:
Memory Pressure Monitored CGroups:
        Path: /user.slice/user-1000.slice/user@1000.service
                Memory Pressure Limit: 50.00%
                Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 6s
                Current Memory Usage: 13.5G
                Memory Min: 0B
                Memory Low: 0B
                Pgscan: 13490147
                Last Pgscan: 13490147

This is just one data point of course, but it puts me in favor of disabling swap kill for Jammy.

Nick Rosbrook (enr0n)
description: updated
Lukas Märdian (slyon)
Changed in systemd (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Nick Rosbrook (enr0n)
Changed in systemd (Ubuntu Kinetic):
assignee: nobody → Nick Rosbrook (enr0n)
Revision history for this message
Lukas Märdian (slyon) wrote :

Sponsored into Jammy. The fix is also part of the systemd v251 merge pending for Kinetic.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

What is the status of this in kinetic? Is this staged to be included in the v251 systemd merge that is planned shortly? I'll review it assuming that this is the case.

Changed in systemd (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-jammy
Revision history for this message
Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello ChM, or anyone else affected,

Accepted systemd into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/249.11-0ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Łukasz Zemczak (sil2100) wrote :

The SRU looked good so I accepted it into -proposed, since this basically seems to be the current consensus regarding the systemd-oomd situation in jammy. It's a behavioral change, but it is not viable to call it a 'regression'.

Note to SRU members: let's make sure that the kinetic counterpart gets staged before this is released into -updates.

Revision history for this message
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/249.11-0ubuntu3.4)

All autopkgtests for the newly accepted systemd (249.11-0ubuntu3.4) for jammy have finished running.
The following regressions have been reported in tests triggered by the package:

prometheus-libvirt-exporter/unknown (s390x)
corosync-qdevice/unknown (ppc64el, s390x)
csync2/unknown (ppc64el, s390x)
pyudev/unknown (ppc64el)
dlm/unknown (s390x)
systemd/unknown (ppc64el)
tpm2-abrmd/unknown (ppc64el)
dbus/unknown (ppc64el)
linux-lowlatency/5.15.0-40.43 (arm64)
samba/unknown (ppc64el)
dovecot/unknown (ppc64el, arm64)
systemd/249.11-0ubuntu3.4 (armhf)
prometheus-exporter-exporter/unknown (ppc64el)
conntrack-tools/unknown (ppc64el, s390x)
cups/unknown (s390x)
dpdk/unknown (ppc64el, arm64)
network-manager/1.36.6-0ubuntu2 (arm64)
php8.1/unknown (s390x)
procps/unknown (ppc64el)
gvfs/1.48.2-0ubuntu1 (ppc64el)
tgt/unknown (ppc64el)
redis/unknown (ppc64el)
qlcplus/unknown (ppc64el, s390x)
prometheus-squid-exporter/unknown (s390x)
flatpak/1.12.7-1 (amd64)
dq/unknown (arm64)
rtkit/unknown (ppc64el)
cockpit/unknown (s390x)
netplan.io/unknown (s390x)
pdns-recursor/unknown (s390x)
remctl/unknown (ppc64el)
comitup/unknown (s390x)
policykit-1/unknown (ppc64el)
rust-whoami/unknown (ppc64el)
casync/2+20201210-1build1 (ppc64el)
dbus-broker/unknown (ppc64el, s390x)
rpcbind/unknown (ppc64el)
openzwave/unknown (s390x)
netplan.io/0.104-0ubuntu2 (amd64, arm64)
postgresql-14/unknown (ppc64el)
python-uinput/unknown (s390x)
libsoup2.4/unknown (ppc64el)
libsfml/unknown (ppc64el)
polkit-qt-1/unknown (ppc64el)
debspawn/unknown (ppc64el)
corosync/unknown (ppc64el, s390x)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/jammy/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Revision history for this message
Nick Rosbrook (enr0n) wrote :

I have tested systemd 249.11-0ubuntu3.4 from jammy-proposed to verify the fix:

$ systemctl --version
systemd 249 (249.11-0ubuntu3.3)
[...]

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=kill

$ sudo apt update && sudo apt install -y systemd
[...]

$ systemctl --version
systemd 249 (249.11-0ubuntu3.4)
[...]

$ systemctl show -- "-.slice" | grep "^ManagedOOMSwap"
ManagedOOMSwap=auto

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Dan Streetman (ddstreet) wrote :

Have you checked that systemd-oomd is actually killing anything in any situation now? Meaning, is this effectively the same as completely disabling systemd-oomd? And if it is the same as completely disabling systemd-oomd, would that be a better default?

Revision history for this message
Nick Rosbrook (enr0n) wrote :

Yes, I have run stress tests to confirm that the "memory pressure" kill logic is still enabled in systemd-oomd. I have not personally experienced such kills in my day-to-day use, however.

But to answer your question: no, this is not the same as entirely disabling systemd-oomd.
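A quick way to confirm this on an affected machine is to inspect the daemon's own view of things. This is a minimal sketch, assuming systemd 249+ with systemd-oomd running (as on Jammy); the grep pattern matches the property discussed in the test plan:

```shell
# systemd-oomd should still be running even with swap kill disabled.
systemctl is-active systemd-oomd

# The root slice should now report "auto" rather than "kill" for swap,
# while memory-pressure monitoring remains configured elsewhere.
systemctl show -- "-.slice" | grep "^ManagedOOMSwap"

# oomctl (shipped alongside systemd-oomd) dumps the cgroups the daemon
# is currently monitoring, including their memory pressure limits.
oomctl
```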

Lukas Märdian (slyon)
Changed in systemd (Ubuntu Kinetic):
status: Confirmed → Fix Committed
Revision history for this message
Lukas Märdian (slyon) wrote :

All autopkgtest regressions have been resolved by re-running them, using proper triggers.

Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (22.3 KiB)

This bug was fixed in the package systemd - 251.2-2ubuntu1

---------------
systemd (251.2-2ubuntu1) kinetic; urgency=medium

  [ Nick Rosbrook ]
  * Merge to Ubuntu from Debian unstable
    - Dropped patches (applied upstream)
      + test-explicitly-configure-oomd-stuff-via-dropins.patch
      + test-enable-systemd-oomd.service.patch
      + linux-5.15-compat-ioprio/shared-split-out-ioprio-related-stuff-into-ioprio-util.-c.patch
      + linux-5.15-compat-ioprio/variuos-add-missing-includes.patch
      + linux-5.15-compat-ioprio/man-don-t-mention-IOSchedulingClass-none-anymore-in-the-d.patch
      + linux-5.15-compat-ioprio/test-add-test-for-ioprio-normalization.patch
      + linux-5.15-compat-ioprio/Define-ioprio_-get-set-the-same-as-other-compat-syscalls.patch
      + linux-5.15-compat-ioprio/Get-rid-of-ioprio.h-and-add-a-minimalistic-reimplementati.patch
      + linux-5.15-compat-ioprio/ioprio-util-add-macro-for-default-ioprio-settings.patch
      + linux-5.15-compat-ioprio/ioprio-normalize-io-priority-values-in-configuration.patch
      + linux-5.15-compat-ioprio/core-normalize-ioprio-values-we-acquire-from-kernel.patch
      + test-also-show-the-memory-pressure-of-testchill.service.patch
      + test-make-test-55-oomd-less-flaky.patch
      + lp1964494-network-do-not-enable-IPv4-ACD-for-IPv4-link-local-a.patch
      + lp1966381-oomd-calculate-used-memory-with-MemAvailable-instead-of-M.patch
      + lp1926860-hwdb-remove-the-tablet-pad-entry-for-the-UC-Logic-1060N.patch
      + oomd-move-oomctl-to-bindir.patch
      + test-enable-debug-logging-of-systemd-oomd.patch
      + lp1943561/Add-additional-Dell-models-that-require-ACCEL_LOCATION-ba.patch
      + lp1943561/Use-SKU-to-identify-Dell-clamshell-models-for-acceleromet.patch
      + lp1929345/hwdb-Force-release-calculator-key-on-all-HP-OMEN-laptops.patch
      + lp1929345/hwdb-Add-force-release-for-HP-Omen-15-calculator-key.-205.patch
      + sysusers-split-up-systemd.conf.patch
      + hwdb-Add-mic-mute-key-mapping-for-HP-Elite-x360.patch
      + test-check-memory-pressure-more-frequently.patch
      + meson-minor-cleanup.patch
      + units-don-t-install-dbus-org.freedesktop.oom1.service-ali.patch
      + lp1950508-cgroup-check-if-any-controller-is-in-use-as-v1.patch
      + lp1952735-keymap-Add-microphone-mute-keymap-for-Dell-Machine.patch
      + test-tweak-parameters-for-TEST-55-OOMD.patch
      + deny-list-TEST-29-PORTABLE-and-TEST-50-DISSECT.patch
      + lp1955997-unmask-intel-hid-for-HP-machines.patch
      + lp1952733-hwdb-60-keyboard-Update-Dell-Privacy-Micmute-Hotkey-Map.patch
      + Merge-pull-request-20705-from-yuwata-test-oomd-util.patch
    - Refreshed patches
      + debian/Ubuntu-UseDomains-by-default.patch
      + debian/UBUNTU-Support-system-image-read-only-etc.patch
      + Revert-network-if-sys-is-rw-then-udev-should-be-around.patch
      + debian/UBUNTU-src-test-testmount-util.c-Skip-parts-of-test-mount-util-in-LXC.patch
      + lp1950794-Revert-sd-dhcp-do-not-use-detect_container-to-guess-.patch
      + 0001-Revert-tests-add-test-case-for-UMask-BindPaths-combi.patch
  * Drop debian/Ubuntu-core-in-execute-soft-fail-setting-Nice-priority-when.patch.
    This patch...

Changed in systemd (Ubuntu Kinetic):
status: Fix Committed → Fix Released
Revision history for this message
Tim Richardson (tim-richardson) wrote :

For me, systemd-oomd no longer kills at all. The memory pressure threshold is still active, but I think the default of 50% on the user slice is way too high. I can put a 4 GB test VM under extreme memory load and get so much swap activity that CPU load in a two-core VM exceeds 50, yet the memory pressure score is only 14%. I cannot conceive of what type of load would get it to 50%.

I have set the user slice threshold to 10%, and when I attempt to load 100 tabs, the browser is killed a couple of minutes after memory and swap are exhausted. It's not an aggressive kill, but it lets systemd-oomd actually kill something.

So far it has only ever killed the guilty app. I think if the aim is to never have systemd-oomd kill anything, a 50% memory pressure threshold with swap kill off achieves that goal; but if you want it to kill based on memory pressure, the threshold needs to be much lower. Killing on memory pressure was supposed to be one of the great things about systemd-oomd, I thought.

I note that systemd-cgtop shows there are many tasks under the user slice (I have about 400 when idle, and about 1200 when the browser is trying to load all those tabs). All the system slices have < 5 tasks. So one or two of those processes being stalled will result in a steep increase in the memory pressure KPI. But perhaps with so many tasks in the user slice, the KPI is highly "diluted" and needs a much lower threshold to be meaningful.

Maybe this is all very different on a Raspberry Pi.
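For anyone wanting to repeat this experiment, a lower threshold can be applied with a drop-in. This is an illustrative sketch (the drop-in file name is arbitrary, and 10% mirrors the value used in the comment above; Ubuntu ships the stock 50% default for user sessions as a drop-in on user@.service):

```shell
# Illustrative drop-in: lower the memory-pressure kill threshold for
# user sessions from the 50% default to 10%.
sudo mkdir -p /etc/systemd/system/user@.service.d
sudo tee /etc/systemd/system/user@.service.d/99-oomd-threshold.conf <<'EOF'
[Service]
ManagedOOMMemoryPressureLimit=10%
EOF
sudo systemctl daemon-reload
```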

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 249.11-0ubuntu3.4

---------------
systemd (249.11-0ubuntu3.4) jammy; urgency=medium

  [ Mustafa Kemal Gilor ]
  * d/p/lp1978079-efi-pstore-not-cleared-on-boot.patch: pstore: Run after
    modules are loaded. Thanks to Alexander Graf <email address hidden>.
    (LP: #1978079)
    Author: Mustafa Kemal Gilor
    File: debian/patches/lp1978079-efi-pstore-not-cleared-on-boot.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d990b13612810a296246011ad66a165b30166702

  [ Nick Rosbrook ]
  * systemd-oomd: set ManagedOOMSwap=auto on -.slice (LP: #1972159)
    This has the effect of disabling swap kill by default, so cgroups will
    only be monitored for memory pressure, and not swap usage.
    File: debian/extra/systemd-oomd-defaults/-.slice.d/10-oomd-root-slice-defaults.conf
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=e93c944c58ec376454301e9c9b55d35be7c14a89

 -- Nick Rosbrook <email address hidden> Mon, 27 Jun 2022 14:28:46 -0400

Changed in systemd (Ubuntu Jammy):
status: Fix Committed → Fix Released
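As noted in the bug description's [Other Info] section, users who prefer the previous behavior can opt back in. A sketch of creating that override (path and contents are the ones given in the bug description):

```shell
# Restore the original swap-kill behavior after this fix lands.
sudo mkdir -p /etc/systemd/system/-.slice.d
sudo tee /etc/systemd/system/-.slice.d/10-oomd-root-slice-defaults.conf <<'EOF'
[Slice]
ManagedOOMSwap=kill
EOF
sudo systemctl daemon-reload
```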
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
In , bryanhoop (bryanhoop-redhat-bugs) wrote :
Download full text (38.8 KiB)

I'm running into this issue, seemingly with plenty of RAM/swap available when running btrfs commands. btrbk and btrfs-progs commands are killed by systemd-oomd because they exceed the 50% memory pressure threshold for system.slice set in /usr/lib/systemd/system/system.slice.d/10-oomd-per-slice-defaults.conf.

Here's some logging output:
Dec 21 16:39:11 workstation systemd-oomd[1779]: Considered 74 cgroups for killing, top candidates were:
Dec 21 13:49:51 workstation systemd[1]: btrbk.service: systemd-oomd killed 5 process(es) in this unit.
Dec 21 13:49:51 workstation systemd-oomd[1779]: Killed /system.slice/btrbk.service due to memory pressure for /system.slice being 56.47% > 50.00% for > 20s with reclaim activity
Dec 21 13:49:51 workstation systemd-oomd[1779]: Last Pgscan: 1411
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pgscan: 1411
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Low: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Min: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Current Memory Usage: 9.0M
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 0
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Pressure Limit: 0.00%
Dec 21 13:49:51 workstation systemd-oomd[1779]: Path: /system.slice/pcscd.service
Dec 21 13:49:51 workstation systemd-oomd[1779]: Last Pgscan: 3337
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pgscan: 3337
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Low: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Min: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Current Memory Usage: 12.8M
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 1ms
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Pressure Limit: 0.00%
Dec 21 13:49:51 workstation systemd-oomd[1779]: Path: /system.slice/geoclue.service
Dec 21 13:49:51 workstation systemd-oomd[1779]: Last Pgscan: 12100
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pgscan: 12100
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Low: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Min: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Current Memory Usage: 16.8M
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pressure: Avg10: 0.00 Avg60: 0.00 Avg300: 0.00 Total: 49us
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Pressure Limit: 0.00%
Dec 21 13:49:51 workstation systemd-oomd[1779]: Path: /system.slice/udisks2.service
Dec 21 13:49:51 workstation systemd-oomd[1779]: Last Pgscan: 10448
Dec 21 13:49:51 workstation systemd-oomd[1779]: Pgscan: 10448
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Low: 0B
Dec 21 13:49:51 workstation systemd-oomd[1779]: Memory Min: 0B
Dec 21 13:4...
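For cases like this, where a known-good service keeps getting selected, systemd also supports a per-unit ManagedOOMPreference= setting (see systemd.resource-control(5)). A hedged sketch, using btrbk.service from the log above as the example unit:

```shell
# Illustrative drop-in: ask systemd-oomd to prefer other candidates over
# this unit when a memory-pressure kill is triggered.
sudo mkdir -p /etc/systemd/system/btrbk.service.d
sudo tee /etc/systemd/system/btrbk.service.d/99-oomd-avoid.conf <<'EOF'
[Service]
ManagedOOMPreference=avoid
EOF
sudo systemctl daemon-reload
```

Note this only deprioritizes the unit among kill candidates; it does not change the slice-level pressure threshold that triggers the kill.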

Revision history for this message
In , rkudyba (rkudyba-redhat-bugs) wrote :

Still on Fedora 36 from a MobaXterm client using ssh -X:
systemd[1]: systemd-oomd.service - Userspace Out-Of-Memory (OOM) Killer was skipped because of a failed condition check (ConditionControlGroupController=v2).

Default settings, systemd-oomd-defaults-250.9-1.fc36.noarch

Changed in systemd (Fedora):
importance: Unknown → Critical
status: Unknown → Fix Released