memory not present in /sys/fs/cgroup/cgroup.controllers after installation of rt patch

Bug #1978814 reported by SRINIVAS SADAGOPAN
This bug affects 3 people
Affects: ubuntu-realtime
Status: Fix Released
Importance: Medium
Assigned to: Joseph Salisbury

Bug Description

I installed Ubuntu 22.04 and then followed instructions at: https://ubuntu.com/blog/real-time-ubuntu-released
to install the realtime patch (ua enable realtime-kernel --beta) and rebooted the server. Subsequently, when I installed Kubernetes, I noticed that most of the system pods were not coming up. The error was as shown below:

  Warning Failed 33m kubelet Error: container create failed: time="2022-06-15T04:14:17Z" level=warning msg="unable to get oom kill count" error="openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod07a13da2_a5cb_4ab9_bfaf_945d21f68f20.slice/crio-bd8d06107dd81c3f680872405ad77a768ee9757634c2d51ffb4312e588b57630.scope/memory.events: no such file or directory"

I then noticed that "memory" was missing from "/sys/fs/cgroup/cgroup.controllers"

root@cran-ubuntu:~# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io hugetlb pids rdma misc

I then re-installed Ubuntu 22.04 (i.e., without the rt patch) and could see that "memory" is present in "/sys/fs/cgroup/cgroup.controllers":

root@cran-ubuntu:~# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

Also, the "memory.events" file can be found in the subdirectories under /sys/fs/cgroup/.

To summarize, when the rt patch is installed, "memory" is removed from "/sys/fs/cgroup/cgroup.controllers", and as a result several Kubernetes system pods fail to come up. Could this issue please be looked into?
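
The check described above can be scripted. This is a minimal sketch, assuming a cgroup v2 layout where the root cgroup.controllers file is a single space-separated line; `has_controller` is a hypothetical helper name, not part of any tool.

```shell
# Sketch: succeed if a cgroup v2 controller is listed as available.
# Usage: has_controller NAME [FILE]
# FILE defaults to the root cgroup.controllers file.
has_controller() (
    name=$1
    file=${2:-/sys/fs/cgroup/cgroup.controllers}
    # Put each controller on its own line and match whole lines only,
    # so that "cpu" does not match "cpuset".
    tr ' ' '\n' < "$file" | grep -qx "$name"
)
```

For example, `has_controller memory || echo "memory controller missing"` would print the message on the affected realtime kernel.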

Tags: jammy
Changed in ubuntu-realtime:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Joseph Salisbury (jsalisbury)
Joseph Salisbury (jsalisbury) wrote :

Thanks for the bug report, Srinivas.

I am going to set up an environment to try to reproduce this issue. I should have an update for you shortly.

Joseph Salisbury (jsalisbury) wrote :

We would like to collect some additional information about your system. From a terminal, please run the following:

apport-collect 1978814
or to a file:
apport-bug --save /tmp/report.1978814 linux

If apport can't be run:
1) uname -a > uname-a.log
2) dmesg > dmesg.log
3) sudo lspci -vvnn > lspci-vvnn.log
4) cat /proc/version_signature > version.log
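
The four fallback commands can be gathered into one helper. This is a minimal sketch; `collect_kernel_logs` is a hypothetical name, and the script should be run as root if dmesg or lspci output comes back restricted.

```shell
# Sketch: write the four fallback logs into one directory
# (default: the current directory) and list the result.
collect_kernel_logs() (
    out=${1:-.}
    mkdir -p "$out" || exit 1
    uname -a > "$out/uname-a.log"
    # dmesg may be restricted for unprivileged users.
    dmesg > "$out/dmesg.log" 2>&1 || echo "dmesg failed; retry as root" >&2
    # lspci comes from the pciutils package and may be absent.
    if command -v lspci >/dev/null 2>&1; then
        lspci -vvnn > "$out/lspci-vvnn.log" 2>&1 || echo "lspci failed" >&2
    else
        echo "lspci not found" >&2
    fi
    # /proc/version_signature is only present on Ubuntu kernels.
    if [ -r /proc/version_signature ]; then
        cat /proc/version_signature > "$out/version.log"
    fi
    ls "$out"
)
```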

tags: added: jammy
Joseph Salisbury (jsalisbury) wrote :

I built an upstream stable v5.15.35 with the upstream rt41 real-time patch set applied. This kernel can be downloaded from:

https://people.canonical.com/~jsalisbury/lp1978814/

Would it be possible to test this kernel to see if it exhibits this bug? This will tell us if the issue is caused by an Ubuntu specific commit or an upstream commit.

SRINIVAS SADAGOPAN (ssadagop) wrote :

I cannot run the "apport-bug --save /tmp/report.1978814 linux" command due to the following error. I will post the output of the individual commands you sent.

*** Collecting problem information

The collected information can be sent to the developers to improve the
application. This might take a few minutes.
..

*** Problem in linux-image-5.15.0-1014-realtime

The problem cannot be reported:

This does not seem to be an official Ubuntu package. Please retry after updating the indexes of available packages, if that does not work then remove related third party packages and try again.

Press any key to continue...

No pending crash reports. Try --help for more information.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Could you please let me know the steps to install the kernel image from the link you provided: https://people.canonical.com/~jsalisbury/lp1978814/

I tried running the dpkg command, but got a dependency error on all the packages provided in the link. The dependencies reported by linux-image and linux-modules seem to be circular. Let me know if I'm doing something wrong.

dpkg -i linux-headers-5.15.35-051535-generic_5.15.35-051535.202206152023_amd64.deb
dpkg: dependency problems prevent configuration of linux-headers-5.15.35-051535-generic:
 linux-headers-5.15.35-051535-generic depends on linux-headers-5.15.35-051535; however:
  Package linux-headers-5.15.35-051535 is not installed.

dpkg -i linux-image-unsigned-5.15.35-051535-generic_5.15.35-051535.202206152023_amd64.deb
dpkg: dependency problems prevent configuration of linux-image-unsigned-5.15.35-051535-generic:
 linux-image-unsigned-5.15.35-051535-generic depends on linux-modules-5.15.35-051535-generic; however:
  Package linux-modules-5.15.35-051535-generic is not configured yet.

dpkg -i linux-modules-5.15.35-051535-generic_5.15.35-051535.202206152023_amd64.deb
 linux-modules-5.15.35-051535-generic depends on linux-image-5.15.35-051535-generic | linux-image-unsigned-5.15.35-051535-generic; however:
  Package linux-image-5.15.35-051535-generic is not installed.
  Package linux-image-unsigned-5.15.35-051535-generic is not configured yet.

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

In the directory where you downloaded the .debs, you should be able to just run:
sudo dpkg -i *

That will allow dpkg to figure out the install order. You only really need the following two files to test:

linux-image-unsigned-5.15.35-051535-generic_5.15.35-051535.202206152023_amd64.deb
linux-modules-5.15.35-051535-generic_5.15.35-051535.202206152023_amd64.deb
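
Passing both packages to one dpkg invocation lets dpkg unpack them together before configuring either, which is what gets past the circular dependency. This is a minimal sketch, assuming the .debs sit in one directory; `install_test_kernel` and the `DRY_RUN` variable are hypothetical, and by default the command is only printed.

```shell
# Sketch: install the image and modules packages in a single dpkg call.
# Prints the command unless DRY_RUN=0 is set in the environment.
install_test_kernel() (
    dir=${1:-.}
    set -- "$dir"/linux-image-unsigned-*.deb "$dir"/linux-modules-*.deb
    for f in "$@"; do
        # An unmatched glob stays literal, so this also catches missing files.
        [ -e "$f" ] || { echo "missing package: $f" >&2; exit 1; }
    done
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo sudo dpkg -i "$@"
    else
        sudo dpkg -i "$@"
    fi
)
```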

Joseph Salisbury (jsalisbury) wrote :

Also, I've heard of others having a similar issue who were able to get around it by ignoring the pre-flight check during the Kubernetes installation.

Joseph Salisbury (jsalisbury) wrote :

You may see the circular dependency issue while removing the test kernels as well. I was able to remove the kernel with:

sudo dpkg --purge linux-image-unsigned-5.15.35-051535-generic linux-modules-5.15.35-051535-generic

So list both packages on the same line during removal.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph -

1. I was able to install the test kernel from the link you sent, after following your instructions. I do see "memory" present in /sys/fs/cgroup/cgroup.controllers after installation:

root@node1:~# uname -a
Linux node1 5.15.35-051535-generic #202206152023 SMP Wed Jun 15 20:26:47 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@node1:~# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc

However, with the rt-patched kernels I have used before, I generally see a PREEMPT_RT label in the output of "uname -a", which I do not see in the above output. There is also no "-realtime" suffix on the kernel version.

Also, I do not see CONFIG_PREEMPT_RT set in the test kernel (5.15.35-051535-generic), whereas the rt kernel that I installed using UA (5.15.0-1014-realtime) has this flag set:

root@node1:~# cat /boot/config-5.15.0-1014-realtime | grep CONFIG_PREEMPT_RT
CONFIG_PREEMPT_RT=y
root@node1:~# cat /boot/config-5.15.35-051535-generic | grep CONFIG_PREEMPT_RT
# CONFIG_PREEMPT_RT is not set

Could you please confirm that the test kernel you uploaded indeed has the RT patches applied?

2. You mentioned that others facing a similar issue were able to get past it by ignoring the pre-flight check during the Kubernetes installation.

Please note that on my system, after installing the rt kernel using the ua tool (ua enable realtime-kernel --beta), the Kubernetes installation completed without any errors. I can also see that the Kubernetes systemd services are running correctly, and I can run kubectl commands to check the status of the system pods that are launched by default. The problem is that some of the Kubernetes system pods show the error quoted in the description of this bug report.

In any case, if you know of any specific steps I can follow to work around this issue, please let me know.

Joseph Salisbury (jsalisbury) wrote :

It appears I built the upstream kernel for you with the generic configs. I re-built the kernel again, and ensured I used the real-time configs.

This kernel can be downloaded from:

https://people.canonical.com/~jsalisbury/lp1978814/

Would it be possible to test this kernel to see if it exhibits this bug?
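
Before retesting, it can help to confirm that the booted kernel really is an RT build, since the first test kernel was not. This is a minimal sketch that looks for the PREEMPT_RT tag RT builds show in their `uname -v` string; `is_rt_kernel` is a hypothetical helper name.

```shell
# Sketch: succeed if a kernel version string carries the PREEMPT_RT tag.
# With no argument, the running kernel's `uname -v` string is checked.
is_rt_kernel() (
    version=${1:-$(uname -v)}
    case "$version" in
        *PREEMPT_RT*) exit 0 ;;
        *) exit 1 ;;
    esac
)
```

For example, `is_rt_kernel && echo "RT kernel booted"`.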

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

I would also like to try and reproduce the issue you are seeing. It would speed up the debugging process, so I don't have to ask you for so much testing.

Could you list the steps I can perform to duplicate what you are doing? I have a system ready with the real-time kernel, but it would be great if you could list the steps you use to install and configure Kubernetes to reproduce this. Can I do this on a single VM, or would I need multiple VMs/systems?

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - I tried the latest kernel with real-time patches that you had put at: https://people.canonical.com/~jsalisbury/lp1978814/

After reboot, I still see the issue: some of the Kubernetes pods do not come up, and "/sys/fs/cgroup/cgroup.controllers" does not include "memory":

root@node1:~/rt-patch# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io hugetlb pids rdma misc

I checked the kernel source and noticed that, in linux-5.15.35/include/linux/cgroup_subsys.h, "memory" is added to the cgroup controllers only when CONFIG_MEMCG is enabled:

#if IS_ENABLED(CONFIG_MEMCG)
SUBSYS(memory)
#endif

Accordingly, CONFIG_MEMCG is enabled in the generic kernel in Ubuntu 22.04:

root@node1:~# cat /boot/config-5.15.0-30-generic | grep CONFIG_MEMCG
CONFIG_MEMCG=y
CONFIG_MEMCG_SWAP=y
CONFIG_MEMCG_KMEM=y

However, CONFIG_MEMCG is not enabled in RT patched versions

root@node1:~# cat /boot/config-5.15.0-1014-realtime | grep CONFIG_MEMCG
root@node1:~# cat /boot/config-5.15.35-051535-realtime | grep CONFIG_MEMCG
root@node1:~#
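
The comparison above can be scripted. This is a minimal sketch, assuming the usual /boot/config-<release> files; `kconfig_state` is a hypothetical helper, and it reports "unset" both when the symbol is absent and when the file only has a "# ... is not set" line.

```shell
# Sketch: print the value of a kernel config option (typically y or m),
# or "unset" if the option is not enabled in the given config file.
# Usage: kconfig_state OPTION [CONFIG_FILE]
kconfig_state() (
    opt=$1
    file=${2:-/boot/config-$(uname -r)}
    # Anchor on "OPTION=" so CONFIG_MEMCG does not match CONFIG_MEMCG_KMEM.
    value=$(sed -n "s/^${opt}=//p" "$file" | head -n 1)
    echo "${value:-unset}"
)
```

For example, `for c in /boot/config-*; do echo "$c: $(kconfig_state CONFIG_MEMCG "$c")"; done` prints one line per installed kernel.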

Can you please send me an RT-patched kernel with CONFIG_MEMCG, CONFIG_MEMCG_KMEM, and CONFIG_MEMCG_SWAP enabled? Currently you are uploading the linux-image, linux-headers, and linux-modules packages. Could you also include linux-devel and linux-modules-extra? With these additional packages I will be able to perform e2e tests.

If the issue is not resolved by this change, I will send the steps we follow to install Kubernetes.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hello Joseph - I also notice in the "patch-5.15.44-rt46.patch" file that "config MEMCG" is dependent on !PREEMPT_RT

 config MEMCG
        bool "Memory controller"
+       depends on !PREEMPT_RT
        select PAGE_COUNTER
        select EVENTFD
        help

Does that mean that, in this version of the kernel/kernel-rt, memory cgroup control is not supported with the rt patch? Do you know if anyone is successfully using Kubernetes with kernel 5.15 and patch-5.15.44-rt46.patch? If so, what steps are they following to ensure Kubernetes does not encounter the errors described in this issue?

NOTE: I also see that the RT patches for kernel 5.16 do not make MEMCG depend on !PREEMPT_RT.

Kindly share your thoughts and analysis
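
To check other rt patch files for the same Kconfig hunk, something like the following could be used. This is a minimal sketch, assuming a unified diff where the "config MEMCG" line appears as context; `patch_disables_memcg` is a hypothetical helper name.

```shell
# Sketch: succeed if a patch adds "depends on !PREEMPT_RT" inside the
# "config MEMCG" Kconfig entry. Reads the patch from FILE or stdin.
patch_disables_memcg() {
    awk '
        # Drop the leading diff marker (" ", "+" or "-") before matching.
        { line = substr($0, 2) }
        /^@@/ { in_memcg = 0 }
        line ~ /^config[ \t]+MEMCG[ \t]*$/ { in_memcg = 1; next }
        # Any other config entry ends the MEMCG entry.
        line ~ /^(menu)?config[ \t]/ { in_memcg = 0 }
        in_memcg && /^\+/ && line ~ /depends on !PREEMPT_RT/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$@"
}
```

For example, `patch_disables_memcg patch-5.15.44-rt46.patch && echo "MEMCG disabled on RT"`.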

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

Thanks for the detailed information!

I will investigate if "config MEMCG" can be enabled for real-time and why it could be dependent on !PREEMPT_RT.

Joseph Salisbury (jsalisbury) wrote :

It does look like MEMCG is disabled for real-time by a commit from Thomas Gleixner:

From 4a71ec0b2e4b0a5f1564d14895234dc6d51bc975 Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <email address hidden>
Date: Sun, 25 Jul 2021 21:35:46 +0200
Subject: [PATCH 098/167] mm/memcontrol: Disable on PREEMPT_RT

559271146efc ("mm/memcg: optimize user context object stock access") is a
classic example of optimizing for the cpu local BKL serialization without a
clear protection scope.

Disable MEMCG on RT for now.

Signed-off-by: Thomas Gleixner <email address hidden>

I will build you a test kernel with CONFIG_MEMCG, CONFIG_MEMCG_KMEM, and CONFIG_MEMCG_SWAP enabled to confirm that this is what causes the original issue. If that fixes the issue, we will have to research further to find a long-term solution.

Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with CONFIG_MEMCG, CONFIG_MEMCG_KMEM, and CONFIG_MEMCG_SWAP enabled. I also removed the following patch that makes "config MEMCG" dependent on !PREEMPT_RT:

mm/memcontrol: Disable on PREEMPT_RT

The test kernel can be downloaded from:

https://people.canonical.com/~jsalisbury/lp1978814/MEMCG_ENABLED/

Can you test this kernel to see if it exhibits the bug?

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - With the latest rt kernel package that you uploaded, all system pods of my Kubernetes cluster have come up:

root@node1:~# uname -a
Linux node1 5.15.0-1012-realtime #12 SMP PREEMPT_RT Tue Jun 21 22:18:56 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
root@node1:~# cat /sys/fs/cgroup/cgroup.controllers
cpuset cpu io memory hugetlb pids rdma misc
root@node1:~# kubectl get pods -A
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE
kube-system   calico-node-hf9m5                 1/1     Running   1          6d22h
kube-system   coredns-666959ff67-ck299          1/1     Running   1          6d22h
kube-system   dns-autoscaler-6656dfd4c6-psvbm   1/1     Running   3          6d22h
kube-system   etcd-node1                        1/1     Running   6          6d22h
kube-system   kube-apiserver-node1              1/1     Running   7          6d22h
kube-system   kube-controller-manager-node1     1/1     Running   8          6d22h
kube-system   kube-multus-ds-amd64-jfdpt        1/1     Running   1          6d22h
kube-system   kube-proxy-rqctk                  1/1     Running   6          6d22h
kube-system   kube-scheduler-node1              1/1     Running   8          6d22h
kube-system   nodelocaldns-95r7b                1/1     Running   1          6d22h
root@node1:~#
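
A pod listing like the one above can be filtered for problem pods. This is a minimal sketch that reads `kubectl get pods -A --no-headers` output on stdin; `unhealthy_pods` is a hypothetical helper name, and it would also flag pods that finished normally (for example, completed jobs).

```shell
# Sketch: print pods that are not Running or not fully ready.
# Expected columns: NAMESPACE NAME READY STATUS RESTARTS AGE
unhealthy_pods() {
    awk '{
        # READY has the form "ready/total", e.g. 1/1.
        split($3, ready, "/")
        if ($4 != "Running" || ready[1] != ready[2])
            print $1 "/" $2 ": " $4 " (" $3 ")"
    }'
}
```

For example, `kubectl get pods -A --no-headers | unhealthy_pods` prints nothing when every pod is up.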

If you look at the rt patches before and after "patch-5.15.44-rt46", for example "patch-5.10.78-rt55" and "patch-5.17.1-rt17", you can see that they contain other memcg-related changes that are not present in "patch-5.15.44-rt46". When MEMCG was disabled for PREEMPT_RT, the author probably also removed all the memcg-related changes from the rt patch.

So it looks like the long-term solution for the Ubuntu RT stream is to move to kernel 5.17 or greater.

Kindly share your thoughts on your plan for a long-term solution.

Thanks a lot for your wonderful support so far..

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

I'll do some research, and should have an update for you shortly.

Changed in ubuntu-realtime:
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

The current RT kernel for 22.04 will have to stay 5.15 based, since 22.04 is already GA. The future RT kernels will be 5.17 or greater, so this issue will not occur with them.

Since the 22.04 kernel must remain 5.15 based, we will have to figure out what other changes related to memcg are in 5.17 and greater and backport those changes to the Ubuntu 5.15 kernel.

I'll work on identifying what changes need to be backported.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks. Understood. I will wait for updates from you.

Joseph Salisbury (jsalisbury) wrote :

I am out of the office next week, so I wanted to give you a heads up in case my response is delayed.

Joseph Salisbury (jsalisbury) wrote :

I noticed that the real-time patch sets for 5.16 and newer have the following four patches that 5.15 and older do not:

0001-mm-memcg-Disable-threshold-event-handlers-on-PREEMPT.patch
0002-mm-memcg-Protect-per-CPU-counter-by-disabling-preemp.patch
0003-mm-memcg-Add-a-local_lock_t-for-IRQ-and-TASK-object.patch
0004-mm-memcg-Allow-the-task_obj-optimization-only-on-non.patch

5.16 and newer is also when the patch causing this bug gets dropped:
 mm/memcontrol: Disable on PREEMPT_RT

I'll investigate further, but the next step would be to test a 5.15 Ubuntu real-time kernel with these four patches added and the 'Disable on PREEMPT_RT' patch dropped.
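
To see which of the four patches a given patch directory already carries, a check along these lines could be used. This is a minimal sketch; `missing_memcg_patches` is a hypothetical helper, the directory layout is an assumption, and only the truncated name stems listed above are matched because numbering prefixes differ between series.

```shell
# Sketch: report which of the four memcg patches are absent from a directory.
# Exits 0 when all four are present, 1 otherwise.
missing_memcg_patches() (
    dir=${1:-.}
    status=0
    for stem in \
        mm-memcg-Disable-threshold-event-handlers-on-PREEMPT \
        mm-memcg-Protect-per-CPU-counter-by-disabling-preemp \
        mm-memcg-Add-a-local_lock_t-for-IRQ-and-TASK-object \
        mm-memcg-Allow-the-task_obj-optimization-only-on-non
    do
        # ls fails when the glob matches nothing.
        if ! ls "$dir"/*"$stem"* >/dev/null 2>&1; then
            echo "missing: $stem"
            status=1
        fi
    done
    exit $status
)
```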

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Agree with you on the next step. If you could apply the four patches you have mentioned and send across the kernel packages, I will use them for our validations.

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

I wanted to give you an update. I sent an email to the upstream devs and mailing list regarding this issue and asked for feedback. I've received a reply:

"I know we've had latency issues in the past with memory cgroups.

If there is a change in behavior in 5.16rt+ we can consider
backporting down to 5.15-rt

Clark"

I'll keep you up to date on any additional feedback received.

Joseph Salisbury (jsalisbury) wrote :

There is a discussion about a possible backport upstream:

https://<email address hidden>/

I will follow this thread to see what our options will be.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks for the updates so far. Glad to know that we have options for backporting MEMCG support to 5.15 Kernel. I will wait for your analysis to complete and further updates.

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

A patch set request landed on the RT mailing list today:
https://<email address hidden>/T/#mfa1b5a1d8dbec4b5966483e47cc1f16228a7b0e9

I will monitor this latest set to see if this is the one that finally lands in 5.15. When it does land, I can backport it into Ubuntu.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks for the update. Typically, how long does this take?

Joseph Salisbury (jsalisbury) wrote :

The time can vary, depending on the upstream maintainers' workload. I will monitor the thread to know when the patches are ready.

SRINIVAS SADAGOPAN (ssadagop) wrote :

Thanks a lot Joseph. I will wait for updates from you

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

Upstream will be reviewing the patch set this week.

While upstream reviews, I created a test kernel with the patches. Can you test the kernel found here, and see if it resolves the issue:

https://people.canonical.com/~jsalisbury/MEMCG_UNSUPPORTED_TEST_KERNEL/

Thanks in advance!

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks for the follow-up and update. I will test the kernel with patches. I would like to synchronize this testing with some other planned testing next week to optimize efforts. I will send you an update sometime next week with my observations

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hello Joseph - Looks like CONFIG_MEMCG is not enabled in the kernel packages put in https://people.canonical.com/~jsalisbury/MEMCG_UNSUPPORTED_TEST_KERNEL/

cat /boot/config-5.15.0-1018-realtime
CONFIG_PAGE_COUNTER=y
# CONFIG_MEMCG is not set
CONFIG_BLK_CGROUP=y

Could you please upload the packages with CONFIG_MEMCG enabled?

Joseph Salisbury (jsalisbury) wrote :

Ahh yes, I have to remove the rt patch that disables it. I will have another kernel ready for you shortly.

Joseph Salisbury (jsalisbury) wrote :

I re-built the test kernel. Can you give this one a try:
 https://people.canonical.com/~jsalisbury/MEMCG_UNSUPPORTED_TEST_KERNEL/

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks. I will take the Kernel Packages and test

Joseph Salisbury (jsalisbury) wrote :

The commits that fix this bug have landed upstream. They are available in the rt50[0] and newer patch sets.

The rt50 patch set depends on upstream stable 5.15.70. However, the Ubuntu real-time kernel is currently at upstream 5.15.60 and the rt48 patch set.

The Ubuntu kernel will be brought up to the rt50 patch set in the next SRU cycle or two (Kernel 5.15.0-1023.23 or 5.15.0-1024.24).

[0] https://mirrors.edge.kernel.org/pub/linux/kernel/projects/rt/5.15/older/patches-5.15.70-rt50.tar.gz
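
The version gap described above (rt50 needs 5.15.70, while the Ubuntu kernel is at 5.15.60) can be checked mechanically. This is a minimal sketch, assuming GNU `sort -V` is available; `version_ge` is a hypothetical helper name.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
version_ge() {
    # sort -V orders version strings; if the smaller of the two is $2,
    # then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, `version_ge 5.15.60 5.15.70 || echo "base too old for rt50"`. On Ubuntu the upstream base version can probably be read from the third field of /proc/version_signature, though that layout is an assumption here.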

SRINIVAS SADAGOPAN (ssadagop) wrote :

Hi Joseph - Thanks for the update

1. Could you please make a kernel package available with the latest patches and upstream stable kernel? I can run the tests to see if the stability issues mentioned in another bug, "https://bugs.launchpad.net/ubuntu-realtime/+bug/1990834", are resolved.

2. From your comment - "The Ubuntu kernel will be brought up to the rt50 patch set in the next SRU cycle or two (Kernel 5.15.0-1023.23 or 5.15.0-1024.24)."

    a. Could you please let me know ballpark ETA for this? It will help in our planning

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

Yes, I will attempt to build a test kernel with the rt50 patch set. It will require some backporting, since the rt kernel is only at the 5.15.60 stable updates and the patch set depends on 5.15.70. I should be able to make it work. I'll post directly in bug 1990834 when the test kernel is available.

For the dates, the -1023 kernel should be available in three weeks and the -1024 kernel in six weeks. Those schedules could slip if any urgent CVEs cause re-spins of the kernels. I'll keep this bug up to date on the progress of the SRU cycles.

Joseph Salisbury (jsalisbury) wrote :

Could you test the kernel available for download from:
https://people.canonical.com/~jsalisbury/lp1978814/MEMCG_ENABLED/

Thanks in advance!

Joseph Salisbury (jsalisbury) wrote :

Hi Srinivas,

Can you test the kernel available at:
https://people.canonical.com/~jsalisbury/5.15.0-1025_test_kernel/arm64/

This is a 'Pre-release' test kernel that has the MEMCG patches applied as well as rt52 real-time patchset.

Thanks,

Joe

Ihor (ikyrycho) wrote :

I am not the original reporter, but I had the same issue, and the kernel at https://people.canonical.com/~jsalisbury/lp1978814/MEMCG_ENABLED/amd64/ fixed it on my cluster. The memory cgroup is now available, so K8s can successfully run all pods.

Ihor (ikyrycho) wrote :

Not sure why, but it looks like the modified kernel has an issue with the `/sys/bus/pci/devices` path, or with how it is mounted in Kubernetes. On the Ubuntu host the file is writable:
```
root@worker-2:/sys/devices/pci0000:16/0000:16:02.0/0000:17:00.0# ls -la reset
--w------- 1 root root 4096 Nov 10 13:25 reset
root@worker-2:/sys/devices/pci0000:16/0000:16:02.0/0000:17:00.0# echo 1 > reset
```

However, Kubernetes has an issue with it:
    Message: failed to execute Function Level Reset for PF (0000:17:00.0): open /sys/bus/pci/devices/0000:17:00.0/reset: read-only file system

Changed in ubuntu-realtime:
status: In Progress → Fix Released