cgroup-bin should not move kthreadd into a default cgroup

Bug #693594 reported by Nelson Elhage
This bug affects 19 people
Affects: libcgroup (Ubuntu)
Status: Confirmed
Importance: High
Assigned to: Unassigned

Bug Description

Steps to reproduce:
1. Install cgroup-bin from universe on a stock Lucid machine (I've only tested amd64, but I suspect it shouldn't matter)
2. Load an arbitrary module (e.g. modprobe rds)
3. Unload the module loaded in (2) (e.g. rmmod rds)

The 'rmmod' process will hang unkillably in the kernel.

Here's an example `dmesg` output from the hung-task watchdog for rmmod:

 INFO: task rmmod:1608 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 rmmod D 0000000000000000 0 1608 1440 0x00000000
   ffff880014245c78 0000000000000082 0000000000015bc0 0000000000015bc0
   ffff880014804890 ffff880014245fd8 0000000000015bc0 ffff8800148044d0
   0000000000015bc0 ffff880014245fd8 0000000000015bc0 ffff880014804890
  Call Trace:
  [<ffffffff81541b6d>] schedule_timeout+0x22d/0x300
  [<ffffffff812b8716>] ? rb_erase+0xd6/0x160
  [<ffffffff81052a10>] ? __dequeue_entity+0x30/0x50
  [<ffffffff8154178b>] wait_for_common+0xdb/0x180
  [<ffffffff8105a220>] ? default_wake_function+0x0/0x20
  [<ffffffff815418ed>] wait_for_completion+0x1d/0x20
  [<ffffffff8107fe55>] flush_cpu_workqueue+0x65/0xa0
  [<ffffffff8107ff10>] ? wq_barrier_func+0x0/0x20
  [<ffffffff81080754>] flush_workqueue+0x54/0x80
  [<ffffffff810b5b24>] __stop_machine+0xf4/0x120
  [<ffffffff8109d8c0>] ? __try_stop_module+0x0/0x50
  [<ffffffff810b5d7e>] stop_machine+0x3e/0x60
  [<ffffffff8109cbd4>] ? find_module+0x34/0x70
  [<ffffffff8109e1ee>] sys_delete_module+0x17e/0x270
  [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b

The process is waiting on kstop/0 to wake up and service the stop_cpu workqueue work item that it has queued. kstop/0 is runnable (state R, i.e. TASK_RUNNING), but never appears to get scheduled:

    $ ps -f 1609
    UID PID PPID C STIME TTY STAT TIME CMD
    root 1609 2 0 14:47 ? R 0:00 [kstop/0]

    $ cat /proc/1609/stack
    [<ffffffff8107fb6a>] worker_thread+0xda/0x110
    [<ffffffff81084206>] kthread+0x96/0xa0
    [<ffffffff810131ea>] child_rip+0xa/0x20
    [<ffffffffffffffff>] 0xffffffffffffffff
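
To see which cpu cgroup a stuck kernel thread such as kstop/0 ended up in, its /proc/<pid>/cgroup file can be parsed. The following is only a sketch: it assumes the cgroup v1 line format "hierarchy-ID:subsystems:path", and the helper name cpu_cgroup_of and its optional proc-root argument are made up for illustration.

```sh
# Print the cgroup path of PID in the hierarchy that carries the "cpu"
# subsystem. Hypothetical helper; the second argument (default /proc)
# exists only so the function can be pointed at a copy of /proc.
cpu_cgroup_of() {
    pid=$1
    proc_root=${2:-/proc}
    # Each line looks like "ID:subsys[,subsys...]:path". Match "cpu" as a
    # whole entry of the comma-separated list, then print everything after
    # the second colon so that paths containing ':' survive.
    awk -F: '{
        n = split($2, subsys, ",")
        for (i = 1; i <= n; i++)
            if (subsys[i] == "cpu") {
                print substr($0, length($1) + length($2) + 3)
                exit
            }
    }' "$proc_root/$pid/cgroup"
}
```

For the thread above this would be called as "cpu_cgroup_of 1609"; the printed path can then be appended to wherever the cpu hierarchy is mounted to inspect that group's control files.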

I tracked this behavior down and reported it to the upstream kernel, but they say it's not a bug and that it's libcgroup's fault for moving kthreadd into a cgroup without RT privs: https://lkml.org/lkml/2011/1/5/53

Nelson Elhage (nelhage) wrote :

The bug is somehow related to how cgroup puts every process into a default cgroup. You can un-wedge the hung 'rmmod' via

  pgrep kstop > /mnt/cgroup/cpu/tasks
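
The cgroup v1 documentation describes the tasks file as taking one PID per write, so the one-liner above is only certain to work when pgrep prints a single PID. A hedged variant that writes each PID separately might look like this; move_pids_to is a hypothetical helper name, and /mnt/cgroup/cpu/tasks is simply the path used above.

```sh
# Read PIDs from stdin and write them, one write per PID, into the
# cgroup "tasks" file given as $1. Hypothetical helper.
move_pids_to() {
    tasks_file=$1
    while read -r pid; do
        # Skip blank lines.
        [ -n "$pid" ] || continue
        # A failed move (task already gone, or refused by the kernel)
        # is reported and the loop carries on with the next PID.
        echo "$pid" > "$tasks_file" || echo "could not move $pid" >&2
    done
}
```

Usage would be "pgrep kstop | move_pids_to /mnt/cgroup/cpu/tasks", run as root.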

Nelson Elhage (nelhage) wrote :

The problem appears to be that the 'sysdefault' cgroup has a rt_runtime_us value of 0, and the stop_machine kthreads are getting created inside that cgroup (since the init script moves everything into the sysdefault cgroup), and so never get run.

The 'cpu' cgroup subsys will prevent moving RT tasks into a cgroup with rt_runtime_us == 0, but there's no mechanism to prevent the kernel from forking an RT thread inside such a cgroup.

Maverick "fixes" this by using the migration/N tasks for stop_machine, which come into existence before the cgroup init script, and so are prevented from moving into the sysdefault cgroup because they're already RT.
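
To check whether a machine is exposed to this, one can look for cgroups whose realtime budget is zero. This is a rough sketch, not part of cgroup-bin: zero_rt_cgroups is a hypothetical helper, and the mount point passed to it (for example /mnt/cgroup/cpu, as used earlier) is an assumption about the local setup.

```sh
# List every cgroup directory below $1 whose cpu.rt_runtime_us is 0,
# i.e. groups in which a realtime task would never be given CPU time.
# Hypothetical helper.
zero_rt_cgroups() {
    root=$1
    find "$root" -type f -name cpu.rt_runtime_us | while read -r f; do
        if [ "$(cat "$f")" = "0" ]; then
            dirname "$f"
        fi
    done
}
```

On an affected Lucid system, "zero_rt_cgroups /mnt/cgroup/cpu" would be expected to list the sysdefault group.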

Nelson Elhage (nelhage) wrote :

Upstream kernel says this isn't a kernel bug: https://lkml.org/lkml/2011/1/5/53

affects: linux (Ubuntu) → libcgroup (Ubuntu)
summary: - rmmod hangs if cgroup-bin is installed
+ cgroup-bin should not move kthreadd into a default cgroup
description: updated
fnu (fnu) wrote :

Would it make sense to enable the real time scheduler for the [sys]default process group [rt_runtime_us = 1]? At least as a kind of workaround ... ?

Nelson Elhage (nelhage) wrote :

Yes, enabling real-time scheduling for sysdefault would also fix this, although you probably want something like rt_runtime_us = 950000, which is the default without any cgroups.

fnu (fnu) wrote :

Hey, thx for the reply.

I also ran into the problem, on a VDR. In case of failure, VDR is covered by a watchdog, which tries to unload the DVB modules. With cgroup-bin on Lucid, this ends with the application stopped, since rmmod never comes back.

In the meantime I'm strongly interested in a workaround, because cgroups do give me some benefit that I could otherwise only get by replacing my single-core CPU with a dual-core one. So, "rt_runtime_us" is an adjustable value rather than a boolean, right? I'll try to echo the value "950000" from "/etc/rc.local" as a workaround on a test machine, to see if this works.

Nelson Elhage (nelhage) wrote :

> "rt_runtime_us" is an adjustable value rather than a boolean, right?

Right. In particular, it is the number of microseconds out of every period (cpu.rt_period_us, one second by default) that processes scheduled with realtime priority are allowed to run for if they don't voluntarily yield the CPU, or something like that.
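
As a quick worked example of what a given value means, the budget can be expressed as a share of the period. This sketch assumes the period is the default 1000000 microseconds; rt_share_percent is a hypothetical helper and rounds down to whole percent.

```sh
# Percentage of each period that realtime tasks may consume.
# $1: runtime budget in microseconds
# $2: period in microseconds (assumed default: 1000000, i.e. one second)
rt_share_percent() {
    runtime_us=$1
    period_us=${2:-1000000}
    echo $(( runtime_us * 100 / period_us ))
}
```

With these assumptions, 950000 comes out as 95 percent, while the value 1 suggested earlier would allow only one microsecond per second and rounds down to 0 percent.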

fnu (fnu) wrote :

Ok, I tried it on my little Lucid@virtualbox (1 CPU & 512MB defined) and well, it works. Just put an "echo 950000 > /cgroups/cpu/sysdefault/cpu.rt_runtime_us" into "/etc/rc.local" and it is possible to load and unload kernel modules. Since I don't know how this works against cgroup at all, I would say this is a workaround, not more. The problem itself still has to be fixed in Lucid, and in Maverick too, since the issue is avoided there only by a lucky configuration circumstance ...
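
For /etc/rc.local it may be worth guarding the write so that boot does not trip over a missing or not-yet-mounted cgroup. A minimal sketch, where set_rt_budget is a hypothetical helper and the control file path is the one quoted above:

```sh
# Write a realtime budget into a cgroup control file, but only if the
# file is present and writable. Hypothetical helper.
# $1: path to cpu.rt_runtime_us   $2: value (defaults to 950000)
set_rt_budget() {
    ctl=$1
    value=${2:-950000}
    if [ -w "$ctl" ]; then
        echo "$value" > "$ctl"
    else
        echo "skipping: $ctl is not writable (cgroup not mounted?)" >&2
        return 1
    fi
}
```

In rc.local this would be called as "set_rt_budget /cgroups/cpu/sysdefault/cpu.rt_runtime_us || true" so that a failure does not abort the rest of the script.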

But cgroups still work; I can see/feel it on my very thinly equipped virtual Lucid: some builds are running in the background as I write this and do several other things. This was not possible with the original installation.

Changed in libcgroup (Ubuntu):
status: New → Confirmed
importance: Undecided → High
Serge Hallyn (serge-hallyn) wrote :

Hi,

the core of this bug appears to be fixed in natty's newest libcgroup version (0.37.1-1ubuntu2). It no longer moves tasks into a default cgroup.

I'm undecided as to whether to call that fixing this bug. I assume that changing the config files to set a default cgroup other than / will still cause kthreadd to be moved.

I am assigning this to jbernard to get his opinion. If you think this bug should be called 'fix released', please let me know. Otherwise, if you have a good idea how to trivially stop kernel threads like kthreadd from being moved and want to do it, great. Finally, if you think it should be 'properly' fixed but prefer for me to do it, please reassign it to me.

Changed in libcgroup (Ubuntu):
assignee: nobody → Jon Bernard (jbernard)
AlainKnaff (kubuntu-misc) wrote :

This problem also happens in lucid (which is supposed to be an LTS release). Please fix this ASAP

steubens (steubens) wrote :

a related issue with moving kernel threads is that one of them also breaks suspend :] it was fun to figure out!

Rioting_Pacifst (rioting-pacifist) wrote :

This bug has quite a few nasty symptoms (breaks suspend on SMP machines, etc.) and I guess cgroups are going to get more popular as people try the "superpatch", so could we get one of the following into 12.04, please:

1) Do not move all tasks into a default cgroup:
comment out CREATE_DEFAULT=yes and uncomment CREATE_DEFAULT=no in /etc/default/cgconfig

2) Put the following in /etc/cgconfig.conf:
group sysdefault {
    cpu {
        cpu.rt_runtime_us = 950000;
    }
}

3) Add a rule so that [kthreadd] is never put in the default group

4) Change the startup script to not move kthreadd

I think creating a superfluous default group is stupid anyway, so the first fix is by far the best.

Upstream has also released 0.38, but nothing in the changelog seems relevant.
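
Options 3 and 4 both need a way to recognise kernel threads. One possible check, sketched here under the assumption that kthreadd is PID 2 (consistent with the PPID of 2 shown for kstop/0 in the bug description), is to skip PID 2 and anything whose parent is PID 2. The helper name is_kernel_thread is hypothetical and this is not what the cgroup-bin init script currently does.

```sh
# Succeed if PID is kthreadd itself (assumed to be PID 2) or a direct
# child of it. Hypothetical helper; the second argument (default /proc)
# exists only so the function can be pointed at a copy of /proc.
is_kernel_thread() {
    pid=$1
    proc_root=${2:-/proc}
    [ "$pid" = "2" ] && return 0
    # /proc/<pid>/status has a "PPid:" line holding the parent PID.
    ppid=$(awk '$1 == "PPid:" { print $2 }' "$proc_root/$pid/status" 2>/dev/null)
    [ "$ppid" = "2" ]
}

# Example of how a startup script could use it (commented out, needs root):
# for p in $(ps -e -o pid=); do
#     is_kernel_thread "$p" || echo "$p" > /mnt/cgroup/cpu/sysdefault/tasks
# done
```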

Serge Hallyn (serge-hallyn) wrote : Re: [Bug 693594] Re: cgroup-bin should not move kthreadd into a default cgroup

Jon,

if you could do these soon in the Debian package, I think we should
still be able to get it into precise.

Rioting_Pacifst,

note that we generally recommend using cgroup-lite when possible.


Jon Bernard (jbernard) wrote :

You're absolutely right, I will allocate time this week to put together
a new upstream release.

Serge, would you have time to help me test and provide feedback?

    Jon

Serge Hallyn (serge-hallyn) wrote :

> You're absolutely right, I will allocate time this week to put together
> a new upstream release.
>
> Serge, would you have time to help me test and provide feedback?

I will be out the remainder of this week. I can test it next week
though.

Jon Bernard (jbernard)
Changed in libcgroup (Ubuntu):
assignee: Jon Bernard (jbernard) → nobody