2012-10-18 14:51:04

by Frederic Weisbecker

Subject: Re: Is not locking task_lock in cgroup_fork() safe?

2012/10/16 Tejun Heo <[email protected]>:
> Hey, Frederic.
>
> On Mon, Oct 08, 2012 at 02:48:58PM +0200, Frederic Weisbecker wrote:
>> Yeah I missed this one.
>> Now the whole cgroup_attach_task() is clusteracy without the
>
> Clusteracy?
>
>> threadgroup lock anyway:
>>
>> * The PF_EXITING check is racy (we are neither holding tsk->flags nor
>> threadgroup lock).
>
> PF_EXITING is *always* protected by threadgroup_change_begin/end().
>
>> * The cgrp == oldcgrp check is racy (exit() can change oldcgrp at any time).
>
> So, as long as this happens after PF_EXITING check, it should be safe.
>
>> * can_attach / attach / cancel_attach can race against fork/exit (and
>> post_fork if you consider those interested in cgroup task link like
>> the freezer. But that is racy in any case already even with
>> threadgroup lock)
>
> Against exit, no. Against forking a new process, can they? If so, we
> need to fix it.
>
>> It has been designed to be called under that lock. So I suspect the
>
> Ummm.... threadgroup_lock is a recent addition so things couldn't have
> been designed to be called under that lock. threadgroup_lock protects
> the *threadgroup* - creating a new task in the same process or a task
> of the process exiting. It doesn't do anything about other processes.
> In fact, the lock itself is per-process.
>
>> best, at least for now, is to threadgroup lock from
>> cgroup_attach_task_all(). And also make cgroup_attach_task() static to
>> avoid future unsafe callers.
>
> Oh, from that call path, sure. Can someone teach me why we need that
> one at all? I think we're confusing each other here. I was talking
> about the usual migration path not protected against forking a new
> process.

Ah right I was confused. Hmm, indeed we have a race here on
cgroup_fork(). How about using css_try_get() in cgroup_fork() and
refetching the parent's css until we succeed? This requires rcu_read_lock
though, and freeing the css_set under RCU.

Don't know which is better.
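A minimal userspace sketch of the retry idea above, assuming C11 atomics stand in for the kernel's refcounting. All toy_* names are hypothetical, and toy_css_set_tryget() only mimics the semantics of css_try_get() / atomic_inc_not_zero() (take a reference unless the count has already dropped to zero). It deliberately leaves out the rcu_read_lock() section and RCU-deferred freeing that a kernel version would need to keep a stale pointer safe to dereference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted css_set. */
struct toy_css_set {
	atomic_int refcnt;	/* 0 means the set is being torn down */
};

/* Hypothetical stand-in for the parent task. */
struct toy_task {
	/* May be switched at any time by a concurrent migration. */
	_Atomic(struct toy_css_set *) cgroups;
};

/*
 * Take a reference only if the count is still non-zero, in the
 * spirit of atomic_inc_not_zero(). Returns false if the set is dying.
 */
static bool toy_css_set_tryget(struct toy_css_set *cs)
{
	int old = atomic_load(&cs->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&cs->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Proposed cgroup_fork() logic: keep re-reading the parent's pointer
 * until a reference is obtained. A kernel version would run this under
 * rcu_read_lock() with css_sets freed via RCU; this model never frees
 * anything, so it omits that part.
 */
static struct toy_css_set *toy_fork_get_cgroups(struct toy_task *parent)
{
	for (;;) {
		struct toy_css_set *cs = atomic_load(&parent->cgroups);

		assert(cs != NULL);	/* a live parent always has a set */
		if (toy_css_set_tryget(cs))
			return cs;
		/* Lost a race with migration plus final put: refetch. */
	}
}
```

In this model the loop terminates because the parent holds a reference on whichever set its pointer currently names, so a refetch after a lost race sees a live object.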

Different problem, but I would really like us to sanitize the cgroup hooks
in fork. There is cgroup_fork(), cgroup_post_fork() which takes that
big css_set_lock, plus the big threadgroup lock... I hope we can
simplify the mess there.

>
>> There is no harm yet because the only user of it calls that with
>> current as the "task" parameter, in a place that is
>> not in the middle of a fork. So no need to worry about some stable backport.
>>
>> Also, looking at cgroup_attach_task_all(), what guarantee do we have
>> that "from" is not concurrently exiting and removing its cgrp? That
>> is a separate problem. But we probably need to do some css_set_get()
>> before playing with it.
>
> I really don't know. Why isn't it locking the threadgroup to begin
> with?

No idea, sounds like something to fix.


2012-10-18 20:07:12

by Tejun Heo

Subject: Re: Is not locking task_lock in cgroup_fork() safe?

Hello, Frederic.

On Thu, Oct 18, 2012 at 04:50:59PM +0200, Frederic Weisbecker wrote:
> Ah right I was confused. Hmm, indeed we have a race here on
> cgroup_fork(). How about using css_try_get() in cgroup_fork() and
> refetching the parent's css until we succeed? This requires rcu_read_lock
> though, and freeing the css_set under RCU.
>
> Don't know which is better.

For now, I'll revert the patches and cc stable. Let's think about
improving it later.

> Different problem, but I would really like us to sanitize the cgroup hooks
> in fork. There is cgroup_fork(), cgroup_post_fork() which takes that
> big css_set_lock, plus the big threadgroup lock... I hope we can
> simplify the mess there.

Oh yeah, I've been looking at that one too. There are a few problems
in that area. I think all we need is to set ->cgroups to NULL in
copy_process() and move all the rest to cgroup_post_fork().
I'd also like to make it very explicit that migration can't happen
before post_fork is complete.
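A rough userspace model of that proposal, assuming a single spinlock stands in for css_set_lock; every toy_* name is hypothetical and this is not the eventual kernel change. The child starts with a NULL set, post_fork attaches it to whatever set the parent holds at that moment, and migration refuses any task that has not passed post_fork yet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_css_set {
	atomic_int refcnt;
};

struct toy_task {
	struct toy_css_set *cgroups;	/* NULL until post_fork has run */
};

/* Single global lock standing in for css_set_lock. */
static atomic_flag toy_css_set_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_css_set_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&toy_css_set_lock, memory_order_release);
}

/* copy_process() stage: the child is not attached to any set yet. */
static void toy_copy_process(struct toy_task *child)
{
	child->cgroups = NULL;
}

/* post_fork stage: attach the child to the parent's current set. */
static void toy_post_fork(struct toy_task *child, struct toy_task *parent)
{
	toy_lock();
	assert(parent->cgroups != NULL);
	child->cgroups = parent->cgroups;
	atomic_fetch_add(&child->cgroups->refcnt, 1);
	toy_unlock();
}

/* Migration refuses tasks that have not completed post_fork. */
static bool toy_migrate(struct toy_task *t, struct toy_css_set *newcg)
{
	struct toy_css_set *oldcg;

	toy_lock();
	oldcg = t->cgroups;
	if (oldcg == NULL) {
		toy_unlock();
		return false;
	}
	atomic_fetch_add(&newcg->refcnt, 1);
	t->cgroups = newcg;
	toy_unlock();
	atomic_fetch_sub(&oldcg->refcnt, 1);
	return true;
}
```

Because both the post_fork read of the parent's pointer and the migration write happen under the same lock, a parent migrated mid-fork simply hands its new set to the child.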

> > I really don't know. Why isn't it locking the threadgroup to begin
> > with?
>
> No idea, sounds like something to fix.

Alrighty.

Thanks.

--
tejun

2012-10-18 20:53:51

by Frederic Weisbecker

Subject: Re: Is not locking task_lock in cgroup_fork() safe?

2012/10/18 Tejun Heo <[email protected]>:
> Hello, Frederic.
>
> On Thu, Oct 18, 2012 at 04:50:59PM +0200, Frederic Weisbecker wrote:
>> Ah right I was confused. Hmm, indeed we have a race here on
>> cgroup_fork(). How about using css_try_get() in cgroup_fork() and
>> refetching the parent's css until we succeed? This requires rcu_read_lock
>> though, and freeing the css_set under RCU.
>>
>> Don't know which is better.
>
> For now, I'll revert the patches and cc stable. Let's think about
> improving it later.

Ok for reverting in cgroup_fork(). Is it necessary for the
cgroup_post_fork() thing? I don't immediately see any race involved
there.

>> Different problem, but I would really like us to sanitize the cgroup hooks
>> in fork. There is cgroup_fork(), cgroup_post_fork() which takes that
>> big css_set_lock, plus the big threadgroup lock... I hope we can
>> simplify the mess there.
>
> Oh yeah, I've been looking at that one too. There are a few problems
> in that area. I think all we need is to set ->cgroups to NULL in
> copy_process() and move all the rest to cgroup_post_fork().
> I'd also like to make it very explicit that migration can't happen
> before post_fork is complete.

Sounds right.

>
>> > I really don't know. Why isn't it locking the threadgroup to begin
>> > with?
>>
>> No idea, sounds like something to fix.
>
> Alrighty.

Ok thanks.

> Thanks.
>
> --
> tejun

2012-10-19 00:38:42

by Tejun Heo

Subject: Re: Is not locking task_lock in cgroup_fork() safe?

Hello, Frederic.

On Thu, Oct 18, 2012 at 10:53:47PM +0200, Frederic Weisbecker wrote:
> > For now, I'll revert the patches and cc stable. Let's think about
> > improving it later.
>
> Ok for reverting in cgroup_fork(). Is it necessary for the
> cgroup_post_fork() thing? I don't immediately see any race involved
> there.

Even if there isn't an actual race, the comment is dead wrong. I'm
reverting the following three patches. Let's try again later.

7e381b0eb1 ("cgroup: Drop task_lock(parent) on cgroup_fork()")
7e3aa30ac8 ("cgroup: Remove task_lock() from cgroup_post_fork()")
c84cdf75cc ("cgroup: Remove unnecessary task_lock before fetching css_set on migration")
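For comparison, a simplified userspace model of the pattern the first revert restores: cgroup_fork() copies the parent's css_set pointer and takes a reference while holding task_lock() on the parent, and migration switches the pointer under the same lock and drops the old reference only afterwards. The toy_* names are hypothetical and an atomic_flag spinlock stands in for task_lock(); this is a sketch of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_css_set {
	atomic_int refcnt;
};

struct toy_task {
	atomic_flag alloc_lock;		/* stand-in for task_lock() */
	struct toy_css_set *cgroups;	/* written only under alloc_lock */
};

static void toy_task_lock(struct toy_task *t)
{
	while (atomic_flag_test_and_set_explicit(&t->alloc_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void toy_task_unlock(struct toy_task *t)
{
	atomic_flag_clear_explicit(&t->alloc_lock, memory_order_release);
}

static void toy_get_css_set(struct toy_css_set *cs)
{
	atomic_fetch_add(&cs->refcnt, 1);
}

static void toy_put_css_set(struct toy_css_set *cs)
{
	int old = atomic_fetch_sub(&cs->refcnt, 1);

	assert(old > 0);	/* never drop a reference we do not hold */
}

/* Fork side: the parent cannot be migrated while its lock is held, so
 * the set it points to still carries the parent's reference. */
static void toy_cgroup_fork(struct toy_task *child, struct toy_task *parent)
{
	toy_task_lock(parent);
	child->cgroups = parent->cgroups;
	toy_get_css_set(child->cgroups);
	toy_task_unlock(parent);
}

/* Migration side: switch under the same lock and drop the old
 * reference only after the switch is done. */
static void toy_migrate(struct toy_task *t, struct toy_css_set *newcg)
{
	struct toy_css_set *oldcg;

	toy_get_css_set(newcg);
	toy_task_lock(t);
	oldcg = t->cgroups;
	t->cgroups = newcg;
	toy_task_unlock(t);
	toy_put_css_set(oldcg);
}
```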

Thanks.

--
tejun

2012-10-19 00:58:08

by Tejun Heo

[permalink] [raw]
Subject: Re: Is not locking task_lock in cgroup_fork() safe?

Hello, again.

On Thu, Oct 18, 2012 at 05:38:35PM -0700, Tejun Heo wrote:
> Even if there isn't an actual race, the comment is dead wrong. I'm
> reverting the following three patches. Let's try again later.
>
> 7e381b0eb1 ("cgroup: Drop task_lock(parent) on cgroup_fork()")
> 7e3aa30ac8 ("cgroup: Remove task_lock() from cgroup_post_fork()")

So, after some more looking, I think the following is correct and
doesn't need to be reverted. It depends on threadgroup locking in the
migration path to synchronize against the exit path, which is always
performed.

> c84cdf75cc ("cgroup: Remove unnecessary task_lock before fetching css_set on migration")

Frederic, were you trying to say that the above commit is correct?
Li, do you agree?
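A toy model of the exit/migration synchronization described above, assuming (as we understand the code of this period) that threadgroup_change_begin/end() take the read side and threadgroup_lock() the write side of a per-process rwsem. A small reader/writer spinlock stands in for that rwsem; TOY_PF_EXITING, the integer cgroup id, and every toy_* name are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy reader/writer spinlock: state > 0 readers, -1 writer, 0 free. */
struct toy_rwsem {
	atomic_int state;
};

static void toy_down_read(struct toy_rwsem *s)
{
	for (;;) {
		int v = atomic_load(&s->state);

		if (v >= 0 && atomic_compare_exchange_weak(&s->state, &v, v + 1))
			return;
	}
}

static void toy_up_read(struct toy_rwsem *s)
{
	int old = atomic_fetch_sub(&s->state, 1);

	assert(old > 0);
}

static void toy_down_write(struct toy_rwsem *s)
{
	int expected = 0;

	while (!atomic_compare_exchange_weak(&s->state, &expected, -1))
		expected = 0;
}

static void toy_up_write(struct toy_rwsem *s)
{
	atomic_store(&s->state, 0);
}

#define TOY_PF_EXITING 0x4u	/* hypothetical flag bit */

struct toy_task {
	struct toy_rwsem *group_rwsem;	/* shared by the thread group */
	unsigned int flags;		/* written only by the task itself */
	int cgrp;			/* toy cgroup id */
};

/* Exit side: the flag is only set inside a read section, mirroring
 * threadgroup_change_begin()/end() around PF_EXITING. */
static void toy_exit_signals(struct toy_task *t)
{
	toy_down_read(t->group_rwsem);
	t->flags |= TOY_PF_EXITING;
	toy_up_read(t->group_rwsem);
}

enum toy_attach_result {
	TOY_ATTACHED,
	TOY_SKIPPED_EXITING,
	TOY_ALREADY_THERE,
};

/* Migration side: with the write side held, a clear flag means the
 * exit path (which must set the flag first) has not started, so cgrp
 * stays stable until the lock is dropped. */
static enum toy_attach_result toy_attach_task(struct toy_task *t, int cgrp)
{
	enum toy_attach_result ret;

	toy_down_write(t->group_rwsem);	/* threadgroup_lock() stand-in */
	if (t->flags & TOY_PF_EXITING)
		ret = TOY_SKIPPED_EXITING;
	else if (t->cgrp == cgrp)
		ret = TOY_ALREADY_THERE;
	else {
		t->cgrp = cgrp;
		ret = TOY_ATTACHED;
	}
	toy_up_write(t->group_rwsem);
	return ret;
}
```

The ordering matters: the cgrp comparison is only meaningful because it runs after the exiting check, inside the same write section.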

Thanks.

--
tejun

2012-10-19 08:51:23

by Zefan Li

Subject: Re: Is not locking task_lock in cgroup_fork() safe?

On 2012/10/19 8:58, Tejun Heo wrote:
> Hello, again.
>
> On Thu, Oct 18, 2012 at 05:38:35PM -0700, Tejun Heo wrote:
>> Even if there isn't an actual race, the comment is dead wrong. I'm
>> reverting the following three patches. Let's try again later.
>>
>> 7e381b0eb1 ("cgroup: Drop task_lock(parent) on cgroup_fork()")
>> 7e3aa30ac8 ("cgroup: Remove task_lock() from cgroup_post_fork()")
>
> So, after some more looking, I think the following is correct and
> doesn't need to be reverted. It depends on threadgroup locking in the
> migration path to synchronize against the exit path, which is always
> performed.
>
>> c84cdf75cc ("cgroup: Remove unnecessary task_lock before fetching css_set on migration")
>
> Frederic, were you trying to say that the above commit is correct?
> Li, do you agree?
>

This one does look innocent.