2010-12-24 15:59:34

by Peter Zijlstra

[permalink] [raw]
Subject: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, 2010-12-24 at 13:16 +0100, Mike Galbraith wrote:
> On Fri, 2010-12-24 at 11:54 +0100, Peter Zijlstra wrote:

> > Right, so the cgroup core is supposed to already emit -EBUSY when there
> > are associated tasks with the cgroup, that _should_ be sufficient, the
> > pre_destroy() method is to frob some extra constraints or somesuch.
> >
> > Our problem looks to be that a task (afaict usually current) changes
> > cgroups without us getting notified of it. On destruction the task is
> > still enqueued in the cfs_rq being destroyed but is not actually part of
> > that cgroup according to the task->css bits.
>
> Could it be an exiting task? We're still preemptible, and iirc, you run
> a CONFIG_PREEMPT kernel. (grasp at all straws;)
>
> cgroup_exit:
> /* Reassign the task to the init_css_set. */
> task_lock(tsk);
> cg = tsk->cgroups;
> tsk->cgroups = &init_css_set;
> task_unlock(tsk);
> if (cg)
> put_css_set_taskexit(cg);
>

This straw appears true:

$ grep -e cpu_cgroup\\\|f491447c log9

...

kworker/-1196 0d..2. 1601180us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
kworker/-1196 0d..2. 1601186us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
kworker/-1196 0d..2. 1601188us : __dequeue_entity: f491447c from f492a480, 1 left
kworker/-1196 0d..2. 1601188us : pick_next_task_fair: picked: f491447c, modprobe/1210
kworker/-1196 0d..2. 1601192us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
modprobe-1210 0d..5. 1601802us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..5. 1601807us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601817us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601819us : __enqueue_entity: f491447c to f492a480, 1 tasks
modprobe-1210 0d..2. 1601826us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601832us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601839us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601848us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601854us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601860us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601865us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601871us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
kworker/-1196 0d..2. 1601872us : __dequeue_entity: f491447c from f492a480, 1 left
kworker/-1196 0d..2. 1601873us : pick_next_task_fair: picked: f491447c, modprobe/1210
kworker/-1196 0d..2. 1601876us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
modprobe-1210 0d..7. 1601895us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..7. 1601900us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601909us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601911us : __enqueue_entity: f491447c to f492a480, 1 tasks
modprobe-1210 0d..2. 1601918us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601924us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..2. 1601931us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602071us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602080us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602089us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602097us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602105us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
kworker/-1196 0d..2. 1602107us : __dequeue_entity: f491447c from f492a480, 1 left
kworker/-1196 0d..2. 1602108us : pick_next_task_fair: picked: f491447c, modprobe/1210
kworker/-1196 0d..2. 1602114us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
modprobe-1210 0d..3. 1602128us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 80, load: 1024, cgroup: /


So cgroup moves a task without calling cgroup_subsys::attach() which is
odd, but it does have an ::exit method, sadly it calls that _before_
re-assigning the task, which means we have to jump through some hoops.

The below seems to fix the problem for me..

---
Subject: sched, cgroup: Use exit hook to avoid use-after-free crash

By not notifying the controller of the on-exit move back to
init_css_set, we fail to move the task out of the previous cgroup's
cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and
free the cgroup (there are no active tasks left in it after all) to
which the not-quite dead task is still enqueued.

Cc: [email protected]
Reported-by: Miklos Vajna <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
---
kernel/sched.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 7e401f8..572625c 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -611,6 +611,9 @@ static inline struct task_group *task_group(struct task_struct *p)
struct task_group *tg;
struct cgroup_subsys_state *css;

+ if (p->flags & PF_EXITING)
+ return &root_task_group;
+
css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
lockdep_is_held(&task_rq(p)->lock));
tg = container_of(css, struct task_group, css);
@@ -8887,6 +8890,12 @@ cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
}
}

+static void
+cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
+{
+ sched_move_task(task);
+}
+
#ifdef CONFIG_FAIR_GROUP_SCHED
static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype,
u64 shareval)
@@ -8959,6 +8968,7 @@ struct cgroup_subsys cpu_cgroup_subsys = {
.destroy = cpu_cgroup_destroy,
.can_attach = cpu_cgroup_can_attach,
.attach = cpu_cgroup_attach,
+ .exit = cpu_cgroup_exit,
.populate = cpu_cgroup_populate,
.subsys_id = cpu_cgroup_subsys_id,
.early_init = 1,


2010-12-24 16:40:49

by Miklos Vajna

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

> Subject: sched, cgroup: Use exit hook to avoid use-after-free crash
>
> By not notifying the controller of the on-exit move back to
> init_css_set, we fail to move the task out of the previous cgroup's
> cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and
> free the cgroup (there are no active tasks left in it after all) to
> which the not-quite dead task is still enqueued.
>
> Cc: [email protected]
> Reported-by: Miklos Vajna <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)

Thanks! :)

Reported-and-tested-by: Miklos Vajna <[email protected]>


Attachments:
(No filename) (708.00 B)
(No filename) (198.00 B)
Download all attachments

2010-12-24 16:48:29

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, 2010-12-24 at 16:59 +0100, Peter Zijlstra wrote:

> So cgroup moves a task without calling cgroup_subsys::attach() which is
> odd, but it does have an ::exit method, sadly it calls that _before_
> re-assigning the task, which means we have to jump through some hoops.

Could you do the move in cgroup_exit() in the CONFIG_PREEMPT case?

-Mike

2010-12-24 17:07:55

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, 2010-12-24 at 17:48 +0100, Mike Galbraith wrote:
> On Fri, 2010-12-24 at 16:59 +0100, Peter Zijlstra wrote:
>
> > So cgroup moves a task without calling cgroup_subsys::attach() which is
> > odd, but it does have an ::exit method, sadly it calls that _before_
> > re-assigning the task, which means we have to jump through some hoops.
>
> Could you do the move in cgroup_exit() in the CONFIG_PREEMPT case?

I'm not really comfortable relying on that.. voluntary might just grow a
cond_resched() somewhere in the exit path and lead us down the same
path, also I think that !preempt smp might have the same race.

2010-12-24 17:25:02

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, 2010-12-24 at 18:07 +0100, Peter Zijlstra wrote:
> On Fri, 2010-12-24 at 17:48 +0100, Mike Galbraith wrote:
> > On Fri, 2010-12-24 at 16:59 +0100, Peter Zijlstra wrote:
> >
> > > So cgroup moves a task without calling cgroup_subsys::attach() which is
> > > odd, but it does have an ::exit method, sadly it calls that _before_
> > > re-assigning the task, which means we have to jump through some hoops.
> >
> > Could you do the move in cgroup_exit() in the CONFIG_PREEMPT case?
>
> I'm not really comfortable relying on that.. voluntary might just grow a
> cond_resched() somewhere in the exit path and lead us down the same
> path, also I think that !preempt smp might have the same race.

Yeah, good point.

-Mike

2010-12-25 19:33:49

by Balbir Singh

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

* Peter Zijlstra <[email protected]> [2010-12-24 16:59:13]:

> On Fri, 2010-12-24 at 13:16 +0100, Mike Galbraith wrote:
> > On Fri, 2010-12-24 at 11:54 +0100, Peter Zijlstra wrote:
>
> > > Right, so the cgroup core is supposed to already emit -EBUSY when there
> > > are associated tasks with the cgroup, that _should_ be sufficient, the
> > > pre_destroy() method is to frob some extra constraints or somesuch.
> > >
> > > Our problem looks to be that a task (afaict usually current) changes
> > > cgroups without us getting notified of it. On destruction the task is
> > > still enqueued in the cfs_rq being destroyed but is not actually part of
> > > that cgroup according to the task->css bits.
> >
> > Could it be an exiting task? We're still preemptible, and iirc, you run
> > a CONFIG_PREEMPT kernel. (grasp at all straws;)
> >
> > cgroup_exit:
> > /* Reassign the task to the init_css_set. */
> > task_lock(tsk);
> > cg = tsk->cgroups;
> > tsk->cgroups = &init_css_set;
> > task_unlock(tsk);
> > if (cg)
> > put_css_set_taskexit(cg);
> >
>
> This straw appears true:
>
> $ grep -e cpu_cgroup\\\|f491447c log9
>
> ...
>
> kworker/-1196 0d..2. 1601180us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> kworker/-1196 0d..2. 1601186us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> kworker/-1196 0d..2. 1601188us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1601188us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1601192us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /system/systemd-modules-load.service
> modprobe-1210 0d..5. 1601802us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..5. 1601807us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601817us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601819us : __enqueue_entity: f491447c to f492a480, 1 tasks
> modprobe-1210 0d..2. 1601826us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601832us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601839us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601848us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601854us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601860us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601865us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601871us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1601872us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1601873us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1601876us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 0, load: 1024, cgroup: /
> modprobe-1210 0d..7. 1601895us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..7. 1601900us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601909us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601911us : __enqueue_entity: f491447c to f492a480, 1 tasks
> modprobe-1210 0d..2. 1601918us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601924us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..2. 1601931us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602071us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602080us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602089us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602097us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602105us : __print_runqueue: se: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> kworker/-1196 0d..2. 1602107us : __dequeue_entity: f491447c from f492a480, 1 left
> kworker/-1196 0d..2. 1602108us : pick_next_task_fair: picked: f491447c, modprobe/1210
> kworker/-1196 0d..2. 1602114us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 16, load: 1024, cgroup: /
> modprobe-1210 0d..3. 1602128us : __print_runqueue: curr: f491447c, comm: modprobe/1210, state: 80, load: 1024, cgroup: /
>
>
> So cgroup moves a task without calling cgroup_subsys::attach() which is
> odd, but it does have an ::exit method, sadly it calls that _before_
> re-assigning the task, which means we have to jump through some hoops.
>
> The below seems to fix the problem for me..
>
> ---
> Subject: sched, cgroup: Use exit hook to avoid use-after-free crash
>
> By not notifying the controller of the on-exit move back to
> init_css_set, we fail to move the task out of the previous cgroup's
> cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and
> free the cgroup (there are no active tasks left in it after all) to
> which the not-quite dead task is still enqueued.
>
> Cc: [email protected]
> Reported-by: Miklos Vajna <[email protected]>
> Signed-off-by: Peter Zijlstra <[email protected]>
> ---
> kernel/sched.c | 10 ++++++++++
> 1 files changed, 10 insertions(+), 0 deletions(-)
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 7e401f8..572625c 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -611,6 +611,9 @@ static inline struct task_group *task_group(struct task_struct *p)
> struct task_group *tg;
> struct cgroup_subsys_state *css;
>
> + if (p->flags & PF_EXITING)
> + return &root_task_group;
> +
> css = task_subsys_state_check(p, cpu_cgroup_subsys_id,
> lockdep_is_held(&task_rq(p)->lock));
> tg = container_of(css, struct task_group, css);
> @@ -8887,6 +8890,12 @@ cpu_cgroup_attach(struct cgroup_subsys *ss, struct cgroup *cgrp,
> }
> }
>
> +static void
> +cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
> +{
> + sched_move_task(task);
> +}
> +
> #ifdef CONFIG_FAIR_GROUP_SCHED
> static int cpu_shares_write_u64(struct cgroup *cgrp, struct cftype *cftype,
> u64 shareval)
> @@ -8959,6 +8968,7 @@ struct cgroup_subsys cpu_cgroup_subsys = {
> .destroy = cpu_cgroup_destroy,
> .can_attach = cpu_cgroup_can_attach,
> .attach = cpu_cgroup_attach,
> + .exit = cpu_cgroup_exit,
> .populate = cpu_cgroup_populate,
> .subsys_id = cpu_cgroup_subsys_id,
> .early_init = 1,
>
>

Very good catch!

Looks very reasonable and correct to me

Acked-by: Balbir Singh <[email protected]>



--
Three Cheers,
Balbir

2010-12-25 20:59:33

by Paul Menage

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, Dec 24, 2010 at 3:59 PM, Peter Zijlstra <[email protected]> wrote:
> Subject: sched, cgroup: Use exit hook to avoid use-after-free crash
>
> By not notifying the controller of the on-exit move back to
> init_css_set, we fail to move the task out of the previous cgroup's
> cfs_rq. This leads to an opportunity for a cgroup-destroy to come in and
> free the cgroup (there are no active tasks left in it after all) to
> which the not-quite dead task is still enqueued.

While this patch is likely fine for solving the problem, it does add
extra work into the task exit path.

Could you instead just use the pre_destroy callback to return -EBUSY
if there are still any tasks on the cfs_rq? That way there'd only be a
penalty on cgroup destruction, which is a much rarer operation.

Paul

2010-12-29 15:26:33

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash


I tried this patch, but it causes a boot crash:

udevd[109]: delete_path: rmdir(/dev/.udev/failed) failed: Read-only file sgeneral protection fault: 0000 [#1] SMP
last sysfs file: /sys/bus/mdio_bus/drivers/Generic PHY/uevent
CPU 0
Modules linked in:

ystem
udevd[10Pid: 109, comm: udevd Not tainted 2.6.37-rc8-tip-01815-g4e7b4f4-dirty #76296 A8N-E/System Product Name
RIP: 0010:[<ffffffff810276ff>] 9]: delete_path: [<ffffffff810276ff>] task_group+0x4d/0x53
RSP: 0018:ffff88003d33bd50 EFLAGS: 00010046
RAX: 6b6b6b6b6b6b6b6b RBX: ffff88003d06f460 RCX: ffffffff00000000
RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88003d06f460
rmdir(/dev/.udeRBP: ffff88003d33bd50 R08: ffff88003e40ad68 R09: 000000000000005a
R10: ffff88003f7d62c8 R11: ffff88003d086d80 R12: 0000000000000000
v/failed) failedR13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f0b23077780(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
: Read-only fileCS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f0b222cb000 CR3: 000000003d36d000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
system
udevd[DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process udevd (pid: 109, threadinfo ffff88003d33a000, task ffff88003e63ee40)
Stack:
ffff88003d33bd70 ffffffff8102771c ffff88003d06f460 0000000000000000
ffff88003d33bd90 ffffffff81027c9c ffff88003d06f460 ffff88003fc11000
ffff88003d33bdd0 ffffffff81034d6d109]: delete_pat ffff88003d06f460 0000000000000246
Call Trace:
[<ffffffff8102771c>] set_task_rq+0x17/0x73
h: rmdir(/dev/.u [<ffffffff81027c9c>] task_move_group_fair+0x37/0x53
dev/failed) fail [<ffffffff81034d6d>] sched_move_task+0x71/0xbc
ed: Read-only fi [<ffffffff81034dc9>] cpu_cgroup_exit+0x11/0x13
[<ffffffff81070e88>] cgroup_exit+0x35/0xd4
le system
udev [<ffffffff8103706a>] copy_process+0xf3a/0xfc7
d[109]: delete_p [<ffffffff81037283>] do_fork+0x163/0x2d7
[<ffffffff810b456a>] ? do_munmap+0x2f2/0x30b
[<ffffffff8100a0c2>] sys_clone+0x28/0x2a
ath: rmdir(/dev/ [<ffffffff81002cf3>] stub_clone+0x13/0x20
[<ffffffff81002a8b>] ? system_call_fastpath+0x16/0x1b
Code: 15 cb 0f 4e 00 85 .udev/failed) fad2 74 12 48 3d 30 03 5a 81 75 0a 48 81 7f 30 90 ad 33 81 iled: Read-only 74 02 c9 c3 48 8b 87 00 04 00 00 48 8b 80 70 01 00 00 <48> 8b 40 08 eb ea 55 48 89 e5 41 54 53 e8 6f b1 fd ff 48 89 fb
RIP [<ffffffff810276ff>] task_group+0x4d/0x53
RSP <ffff88003d33bd50>
---[ end trace cde1903a0ae5345d ]---
Kernel panic - not syncing: Fatal exception
Pid: 109, comm: udevd Tainted: G D 2.6.37-rc8-tip-01815-g4e7b4f4-dirty #76296
Call Trace:
[<ffffffff8132a7d6>] ? panic+0x91/0x194
[<ffffffff8103931c>] ? kmsg_dump+0x11a/0x136
[<ffffffff8132d357>] ? oops_end+0xae/0xbe
[<ffffffff810060eb>] ? die+0x5a/0x63
[<ffffffff8132cdaa>] ? do_general_protection+0x133/0x13b
[<ffffffff8132c8af>] ? general_protection+0x1f/0x30
[<ffffffff810276ff>] ? task_group+0x4d/0x53
[<ffffffff8102771c>] ? set_task_rq+0x17/0x73
[<ffffffff81027c9c>] ? task_move_group_fair+0x37/0x53
[<ffffffff81034d6d>] ? sched_move_task+0x71/0xbc
[<ffffffff81034dc9>] ? cpu_cgroup_exit+0x11/0x13
[<ffffffff81070e88>] ? cgroup_exit+0x35/0xd4
[<ffffffff8103706a>] ? copy_process+0xf3a/0xfc7
[<ffffffff81037283>] ? do_fork+0x163/0x2d7
[<ffffffff810b456a>] ? do_munmap+0x2f2/0x30b
[<ffffffff8100a0c2>] ? sys_clone+0x28/0x2a
[<ffffffff81002cf3>] ? stub_clone+0x13/0x20
[<ffffffff81002a8b>] ? system_call_fastpath+0x16/0x1b


Config attached.

Ingo


Attachments:
(No filename) (3.43 kB)
config (65.22 kB)
Download all attachments

2010-12-29 23:07:54

by Miklos Vajna

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Wed, Dec 29, 2010 at 04:25:22PM +0100, Ingo Molnar <[email protected]> wrote:
> I tried this patch, but it causes a boot crash:

Hm, indeed. (I get a crash in qemu, but not on the host machine.)

qemu -enable-kvm -kernel kernel-build/arch/x86/boot/bzImage -append "root=/dev/sda1 debug sched_debug ignore_loglevel sysrq_always_enabled console=ttyS0 init=/bin/systemd" -hda systemd.img -serial stdio -m 1G -vnc :0

does not crash here, but

qemu -enable-kvm -kernel kernel-build/arch/x86/boot/bzImage -append "root=/dev/sda1 debug sched_debug ignore_loglevel sysrq_always_enabled console=ttyS0" -hda systemd.img -serial stdio -m 1G -vnc :0

does.

I'm attaching the config (what I already sent earlier in this thread)
and the output of the above two commands just in case that helps Peter.


Attachments:
(No filename) (0.00 B)
(No filename) (198.00 B)
Download all attachments

2010-12-31 08:33:16

by Mike Galbraith

[permalink] [raw]
Subject: [PATCH] Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Wed, 2010-12-29 at 16:25 +0100, Ingo Molnar wrote:

> I tried this patch, but it causes a boot crash:

The below should fix it.

sched: fix autogroup reference leak and cpu_cgroup_exit() explosion

In the event of a fork failure, the new cpu_cgroup_exit() method tries to
move an unhashed task. Since PF_EXITING isn't set in that case, autogroup
will dig aground in a freed signal_struct. Neither cgroups nor autogroup
has anything it needs to do with this shade, so don't go there.

This also uncovered a struct autogroup reference leak. copy_process() was
simply freeing vs putting the signal_struct, stranding a reference.

Signed-off-by: Mike Galbraith <[email protected]>

---
kernel/fork.c | 2 +-
kernel/sched.c | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)

Index: linux-2.6.37.git/kernel/fork.c
===================================================================
--- linux-2.6.37.git.orig/kernel/fork.c
+++ linux-2.6.37.git/kernel/fork.c
@@ -1318,7 +1318,7 @@ bad_fork_cleanup_mm:
}
bad_fork_cleanup_signal:
if (!(clone_flags & CLONE_THREAD))
- free_signal_struct(p->signal);
+ put_signal_struct(p->signal);
bad_fork_cleanup_sighand:
__cleanup_sighand(p->sighand);
bad_fork_cleanup_fs:
Index: linux-2.6.37.git/kernel/sched.c
===================================================================
--- linux-2.6.37.git.orig/kernel/sched.c
+++ linux-2.6.37.git/kernel/sched.c
@@ -9193,6 +9193,16 @@ cpu_cgroup_attach(struct cgroup_subsys *
static void
cpu_cgroup_exit(struct cgroup_subsys *ss, struct task_struct *task)
{
+ /*
+ * cgroup_exit() is called in the copy_process failure path.
+ * The task isn't hashed, and we don't want to make autogroup
+ * dig into a freed signal_struct, so just go away.
+ *
+ * XXX: why are cgroup methods diddling unattached tasks?
+ */
+ if (!(task->flags & PF_EXITING))
+ return;
+
sched_move_task(task);
}


2010-12-31 10:05:05

by Mike Galbraith

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Thu, 2010-12-30 at 00:07 +0100, Miklos Vajna wrote:
> On Wed, Dec 29, 2010 at 04:25:22PM +0100, Ingo Molnar <[email protected]> wrote:
> > I tried this patch, but it causes a boot crash:
>
> Hm, indeed. (I get a crash in qemu, but not on the host machine.)
>
> qemu -enable-kvm -kernel kernel-build/arch/x86/boot/bzImage -append "root=/dev/sda1 debug sched_debug ignore_loglevel sysrq_always_enabled console=ttyS0 init=/bin/systemd" -hda systemd.img -serial stdio -m 1G -vnc :0
>
> does not crash here, but
>
> qemu -enable-kvm -kernel kernel-build/arch/x86/boot/bzImage -append "root=/dev/sda1 debug sched_debug ignore_loglevel sysrq_always_enabled console=ttyS0" -hda systemd.img -serial stdio -m 1G -vnc :0
>
> does.
>
> I'm attaching the config (what I already sent earlier in this thread)
> and the output of the above two commands just in case that helps Peter.

Your crash seems to be completely independent of Peter's patch. It
crashes here with your image/config/2.6.36 here regardless, and not 100%
repeatably. Sometimes it boots, sometimes not.

-Mike

2010-12-31 10:46:45

by Miklos Vajna

[permalink] [raw]
Subject: Re: [PATCH] sched, cgroup: Use exit hook to avoid use-after-free crash

On Fri, Dec 31, 2010 at 11:04:19AM +0100, Mike Galbraith <[email protected]> wrote:
> Your crash seems to be completely independent of Peter's patch. It
> crashes here with your image/config/2.6.36 here regardless, and not 100%
> repeatably. Sometimes it boots, sometimes not.

Oh, then ignore that. It's possible it's something qemu-specific, I
can't reproduce this one on my real machine.


Attachments:
(No filename) (389.00 B)
(No filename) (198.00 B)
Download all attachments