2008-06-13 21:39:12

by Chris Friesen

Subject: odd timing bug with cgroups?


I'm seeing some odd behaviour in the case where a parent task forks a
child and then immediately attempts to put the child into a group.

It appears that there is a window after task creation where the child
task is in "limbo" such that the parent cannot put it into a group. If
I run the parent under "strace" or else sleep for a bit before trying to
put the child into a group, then everything works fine.

This seems odd...I would think that as soon as the fork() call returns
in the parent we should be able to put that task into a group.

I'm just starting to look at the code, but I thought I'd mention it in
case someone knows exactly where to look.

Chris


2008-06-14 04:15:59

by Valdis Klētnieks

Subject: Re: odd timing bug with cgroups?

On Fri, 13 Jun 2008 15:38:29 MDT, Chris Friesen said:

> This seems odd...I would think that as soon as the fork() call returns
> in the parent we should be able to put that task into a group.

I'm admittedly shooting in the dark here, but remember that a successful
fork() call returns *twice*. Just because the *parent* has returned
doesn't mean that the *child* has finished all the processing and returned
as well - it may be delayed by other kernel threads etc and still not quite
ready for tweaking.

It sounds like a variant of the race conditions we had a while back where
lots of programs blew chunks when we started having "child runs first"
semantics so children could run and exit before the parent was ready
for it?



2008-06-16 16:45:34

by Chris Friesen

Subject: Re: odd timing bug with cgroups?

Valdis Klētnieks wrote:
> On Fri, 13 Jun 2008 15:38:29 MDT, Chris Friesen said:

>>This seems odd...I would think that as soon as the fork() call returns
>>in the parent we should be able to put that task into a group.

> I'm admittedly shooting in the dark here, but remember that a successful
> fork() call returns *twice*. Just because the *parent* has returned
> doesn't mean that the *child* has finished all the processing and returned
> as well - it may be delayed by other kernel threads etc and still not quite
> ready for tweaking.

I was thinking something like this as well, like maybe we can't move the
child to another group until it gets scheduled in once, or something
similar.

If that is the case, I think it's a bug--on return from fork() the
child's pid is visible (because the parent knows it) and so it should be
valid to use for any operation that takes a pid.

Chris

2008-06-16 21:30:16

by Chris Friesen

Subject: Re: odd timing bug with cgroups -- solved

Chris Friesen wrote:

> I was thinking something like this as well, like maybe we can't move the
> child to another group until it gets scheduled in once, or something
> similar.

Well, it appears I was way off base. My main task is SCHED_RR, and my
child tasks put themselves into SCHED_OTHER. I didn't have any realtime
groups configured, so it was bailing out in the first conditional in
cpu_cgroup_can_attach().

By also having the parent put the children into SCHED_OTHER, my testcase
will handle both cases (parent or child runs first) and everything seems
to be working fine.

Chris