2015-11-25 16:33:49

by Oleg Nesterov

Subject: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

Hello,

I am trying to backport cgroup_pids.c, and I fail to understand pids_fork(),
which does

css = task_get_css(task, pids_cgrp_id);
pids = css_pids(css);

/*
* If the association has changed, we have to revert and reapply the
* charge/uncharge on the wrong hierarchy to the current one. Since
* the association can only change due to an organisation event, its
* okay for us to ignore the limit in this case.
*/
if (pids != old_pids) {
pids_uncharge(old_pids, 1);
pids_charge(pids, 1);
}

But if the association has changed, hasn't pids_can_attach(), which moved the
child into another cgroup, already called pids_uncharge(old_pids) too?

IOW, suppose that the new child is moved right before cgroup_post_fork() does

for_each_subsys_which(...)
ss->fork(child);

doesn't this mean that after ss->fork() we do the same sequence

pids_uncharge(old_pids, 1);
pids_charge(pids, 1);

twice? Note that threadgroup_change_begin/end depend on CLONE_THREAD, so a
fork without CLONE_THREAD is not serialized against the migration and we can
actually hit the WARN_ON() in pids_cancel().

However, we can't simply remove this uncharge/charge afaics. We need it for
the case when the parent was moved to another cgroup before
cgroup_post_fork(), and then css_set_move_task() moves the child into the
parent's new css_set.



I know almost nothing about cgroups, so perhaps I missed something; please
correct me.

If I am right, how about the patch below? percpu_down_read() is cheap, and
we can simplify cgroup_pids after this change.
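The idea behind the patch is that taking the read side of the rwsem for every fork, not only for CLONE_THREAD, keeps a migration (which takes the write side) out of the window between can_fork() and ss->fork(). A minimal sketch of that exclusion, assuming a POSIX `pthread_rwlock_t` as a stand-in for `cgroup_threadgroup_rwsem` (which is a percpu-rwsem in the kernel, with very different performance characteristics); `struct model_pids`, `model_fork()` and `model_try_migrate()` are hypothetical. To stay deterministic it injects the migration attempt from the same thread with `pthread_rwlock_trywrlock()`, which fails with EBUSY while a read lock is held, where a real migration would simply block.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for cgroup_threadgroup_rwsem: fork reads, migrate writes. */
static pthread_rwlock_t model_tg_rwsem = PTHREAD_RWLOCK_INITIALIZER;

struct model_pids {
	long counter;	/* charged tasks in this (flat, hypothetical) cgroup */
};

/* Migration: returns 0 if the task was moved to @to, or EBUSY if a fork
 * currently holds the lock for reading (a real migration would block). */
int model_try_migrate(struct model_pids **task_cg, struct model_pids *to)
{
	int err = pthread_rwlock_trywrlock(&model_tg_rwsem);

	if (err)
		return err;
	to->counter++;			/* can_attach(): charge the destination */
	(*task_cg)->counter--;		/* ...and uncharge the source */
	*task_cg = to;
	pthread_rwlock_unlock(&model_tg_rwsem);
	return 0;
}

/*
 * Fork with the proposed change: the read lock is held from before
 * can_fork() until after ss->fork(), whether or not CLONE_THREAD is set.
 * A migration attempt is injected into that window; its result is returned.
 */
int model_fork(struct model_pids *parent_cg, struct model_pids **child_cg,
	       struct model_pids *other)
{
	int migrate_ret;

	pthread_rwlock_rdlock(&model_tg_rwsem);	/* threadgroup_change_begin() */
	parent_cg->counter++;			/* can_fork(): charge parent's cgroup */
	*child_cg = parent_cg;			/* child linked into that cgroup */
	migrate_ret = model_try_migrate(child_cg, other);
	assert(*child_cg == parent_cg);		/* ss->fork(): nothing to revert */
	pthread_rwlock_unlock(&model_tg_rwsem);	/* threadgroup_change_end() */
	return migrate_ret;
}
```

A call such as `model_fork(&a, &child, &b)` returns EBUSY and leaves the child charged to `a`; once the lock is dropped, `model_try_migrate(&child, &b)` succeeds and moves the charge exactly once. With the window closed, the revert/reapply in pids_fork() can shrink to a sanity check.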

We can probably also unify cgroup_threadgroup_rwsem and dup_mmap_sem.
Note that we take cgroup_threadgroup_rwsem for reading if CLONE_THREAD;
otherwise we take another percpu-rwsem, dup_mmap_sem, in dup_mmap().

Or am I totally confused?

Oleg.

--- x/kernel/fork.c
+++ x/kernel/fork.c
@@ -1368,8 +1368,7 @@ static struct task_struct *copy_process(
 	p->real_start_time = ktime_get_boot_ns();
 	p->io_context = NULL;
 	p->audit_context = NULL;
-	if (clone_flags & CLONE_THREAD)
-		threadgroup_change_begin(current);
+	threadgroup_change_begin(current);
 	cgroup_fork(p);
 #ifdef CONFIG_NUMA
 	p->mempolicy = mpol_dup(p->mempolicy);
@@ -1610,8 +1609,7 @@ static struct task_struct *copy_process(

 	proc_fork_connector(p);
 	cgroup_post_fork(p, cgrp_ss_priv);
-	if (clone_flags & CLONE_THREAD)
-		threadgroup_change_end(current);
+	threadgroup_change_end(current);
 	perf_event_fork(p);
 
 	trace_task_newtask(p, clone_flags);
--- x/kernel/cgroup_pids.c
+++ x/kernel/cgroup_pids.c
@@ -243,27 +243,10 @@ static void pids_cancel_fork(struct task

 static void pids_fork(struct task_struct *task, void *priv)
 {
-	struct cgroup_subsys_state *css;
-	struct cgroup_subsys_state *old_css = priv;
-	struct pids_cgroup *pids;
-	struct pids_cgroup *old_pids = css_pids(old_css);
-
-	css = task_get_css(task, pids_cgrp_id);
-	pids = css_pids(css);
-
-	/*
-	 * If the association has changed, we have to revert and reapply the
-	 * charge/uncharge on the wrong hierarchy to the current one. Since
-	 * the association can only change due to an organisation event, its
-	 * okay for us to ignore the limit in this case.
-	 */
-	if (pids != old_pids) {
-		pids_uncharge(old_pids, 1);
-		pids_charge(pids, 1);
-	}
+	struct cgroup_subsys_state *css = priv;
 
+	WARN_ON(task_css(task, pids_cgrp_id) != css);
 	css_put(css);
-	css_put(old_css);
 }
 
 static void pids_free(struct task_struct *task)


2015-11-25 19:51:47

by Tejun Heo

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

Hello, Oleg.

On Wed, Nov 25, 2015 at 05:34:27PM +0100, Oleg Nesterov wrote:
> IOW, suppose that the new child is moved right before cgroup_post_fork() does
>
> for_each_subsys_which(...)
> ss->fork(child);
>
> doesn't this mean that after ss->fork() we do the same sequence
>
> pids_uncharge(old_pids, 1);
> pids_charge(pids, 1);

You're absolutely right.

> twice? Note that threadgroup_change_begin/end depend on CLONE_THREAD, so a
> fork without CLONE_THREAD is not serialized against the migration and we can
> actually hit the WARN_ON() in pids_cancel().
>
> However, we can't simply remove this uncharge/charge afaics. We need it for
> the case when the parent was moved to another cgroup before
> cgroup_post_fork(), and then css_set_move_task() moves the child into the
> parent's new css_set.
>
> I know almost nothing about cgroups, so perhaps I missed something; please
> correct me.

I can't think of anything better than what you're proposing. Thanks a
lot for tracking it down and fixing it.

> If I am right, how about the patch below? percpu_down_read() is cheap, and
> we can simplify cgroup_pids after this change.
>
> We can probably also unify cgroup_threadgroup_rwsem and dup_mmap_sem.
> Note that we take cgroup_threadgroup_rwsem for reading if CLONE_THREAD;
> otherwise we take another percpu-rwsem, dup_mmap_sem, in dup_mmap().

Sounds perfect. As this needs to go through -stable, can you please
resend the patch with proper description and SOB? Please also update
the now incorrect comment in can_attach.

Thanks a lot!

--
tejun

2015-11-25 19:54:19

by Tejun Heo

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

On Wed, Nov 25, 2015 at 02:51:38PM -0500, Tejun Heo wrote:
> Sounds perfect. As this needs to go through -stable, can you please
> resend the patch with proper description and SOB? Please also update
> the now incorrect comment in can_attach.

Ooh, the patch triggers an RCU warning from task_css(). It's spurious,
and I think the right thing to do, at least for now, is to use
task_css_check() and explain what's going on.

Thanks.

--
tejun

2015-11-25 20:41:08

by Tejun Heo

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

On Wed, Nov 25, 2015 at 02:54:10PM -0500, Tejun Heo wrote:
> On Wed, Nov 25, 2015 at 02:51:38PM -0500, Tejun Heo wrote:
> > Sounds perfect. As this needs to go through -stable, can you please
> > resend the patch with proper description and SOB? Please also update
> > the now incorrect comment in can_attach.
>
> Ooh, the patch triggers an RCU warning from task_css(). It's spurious,
> and I think the right thing to do, at least for now, is to use
> task_css_check() and explain what's going on.

And I spotted a bug in the cgroup core, in the controller enable path.
While the race between fork and attach exists and needs to be fixed, I
think the cgroup core bug is the main source of the discrepancy. Will
update once I learn more.

Thanks.

--
tejun

2015-11-26 15:35:35

by Oleg Nesterov

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

On 11/25, Tejun Heo wrote:
>
> On Wed, Nov 25, 2015 at 02:51:38PM -0500, Tejun Heo wrote:
> > Sounds perfect. As this needs to go through -stable, can you please
> > resend the patch with proper description and SOB? Please also update
> > the now incorrect comment in can_attach.

OK, will do tomorrow (sorry, can't do today).

> Ooh, the patch triggers an RCU warning from task_css(). It's spurious,
> and I think the right thing to do, at least for now, is to use
> task_css_check() and explain what's going on.

You mean the WARN_ON() in pids_fork(), I guess. Thanks. I didn't expect you
would actually apply this patch; I didn't even try to compile it ;)

Plus, this patch forgets to make the other threadgroup_change_end(), in the
error path of copy_process(), unconditional as well.
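This is the usual pairing problem of an unconditional lock with a conditional unlock. A minimal sketch, assuming a plain counter `model_readers` as a hypothetical stand-in for the read-side holders of `cgroup_threadgroup_rwsem`, and two simplified, hypothetical stand-ins for copy_process() (none of these names are kernel APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: number of read-side holders of a stand-in for
 * cgroup_threadgroup_rwsem. Not kernel code. */
int model_readers;

static void model_tg_begin(void) { model_readers++; }
static void model_tg_end(void) { model_readers--; }

/* copy_process() as posted: begin is unconditional, but the error path
 * still calls end only for CLONE_THREAD. */
int model_copy_process_posted(bool clone_thread, bool fail)
{
	model_tg_begin();
	if (fail) {
		if (clone_thread)
			model_tg_end();
		return -1;
	}
	model_tg_end();
	return 0;
}

/* Error path made unconditional as well: every exit balances the begin. */
int model_copy_process_fixed(bool fail)
{
	model_tg_begin();
	if (fail) {
		model_tg_end();
		return -1;
	}
	model_tg_end();
	return 0;
}
```

`model_copy_process_posted(false, true)` returns -1 with `model_readers` left at 1, i.e. in this model a failed non-CLONE_THREAD fork would leak the read lock and any later writer (a migration) would wait forever; the fixed variant always returns with the count back where it started.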

Again, if we do this, we can make other cleanups/simplifications. For example,
we can kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] in copy_process().

But I see another email from you; I will reply in a minute.

Oleg.

2015-11-26 16:00:59

by Oleg Nesterov

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

On 11/25, Tejun Heo wrote:
>
> And I spotted a bug in the cgroup core, in the controller enable path.
> While the race between fork and attach exists and needs to be fixed, I
> think the cgroup core bug is the main source of the discrepancy. Will
> update once I learn more.

OK. I do not know what exactly you mean. Perhaps once you fix this problem
the race between fork and attach goes away, and in that case the fix I sent
is not needed? I'll wait for more info.

Oleg.

2015-11-26 23:35:59

by Aleksa Sarai

Subject: Re: [PATCH?] race between cgroup_subsys->fork() and cgroup_migrate()

> Again, if we do this, we can make other cleanups/simplifications. For example,
> we can kill cgrp_ss_priv[CGROUP_CANFORK_COUNT] in copy_process().

Thank god, I never liked that code. ;)

--
Aleksa Sarai (cyphar)
http://www.cyphar.com