Hey,
Giuseppe reported that the the affinity mask isn't updated when a
process is spawned directly into the target cgroup via
CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
mask to be updated (see the repro at [1].
I took a quick look and the issue seems to be that we don't call the
various attach handlers during CLONE_INTO_CGROUP whereas we do for
migration. So the solution seems to roughly be that we need to call the
various attach handlers during CLONE_INTO_CGROUP as well when the
parent's cgroups is different from the child cgroup. I think we need to
call all of them, can, cancel and attach.
The plumbing here might be a bit intricate since the arguments that the
fork handlers take are different from the attach handlers.
Christian
[1]: https://paste.centos.org/view/f434fa1a
On Tue, Mar 28, 2023 at 05:39:52PM +0200, Christian Brauner wrote:
> Hey,
>
> Giuseppe reported that the the affinity mask isn't updated when a
> process is spawned directly into the target cgroup via
> CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
> mask to be updated (see the repro at [1].
>
> I took a quick look and the issue seems to be that we don't call the
> various attach handlers during CLONE_INTO_CGROUP whereas we do for
> migration. So the solution seems to roughly be that we need to call the
> various attach handlers during CLONE_INTO_CGROUP as well when the
> parent's cgroups is different from the child cgroup. I think we need to
> call all of them, can, cancel and attach.
>
> The plumbing here might be a bit intricate since the arguments that the
> fork handlers take are different from the attach handlers.
But note, as Johannes already pointed out somewhere else, that there's
probably a lot of code that doesn't apply to the CLONE_INTO_CGROUP case
so it might also make sense to just move the missing pieces into the
fork handlers.
On 3/28/23 11:39, Christian Brauner wrote:
> Hey,
>
> Giuseppe reported that the the affinity mask isn't updated when a
> process is spawned directly into the target cgroup via
> CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
> mask to be updated (see the repro at [1].
>
> I took a quick look and the issue seems to be that we don't call the
> various attach handlers during CLONE_INTO_CGROUP whereas we do for
> migration. So the solution seems to roughly be that we need to call the
> various attach handlers during CLONE_INTO_CGROUP as well when the
> parent's cgroups is different from the child cgroup. I think we need to
> call all of them, can, cancel and attach.
>
> The plumbing here might be a bit intricate since the arguments that the
> fork handlers take are different from the attach handlers.
>
> Christian
>
> [1]: https://paste.centos.org/view/f434fa1a
>
I saw that the current cgroup code already have the can_fork, fork and
cancel_fork callbacks. Unfortunately such callbacks are not defined for
cpuset yet. That is why the cpu affinity isn't correctly updated. I can
post a patch to add those callback functions to cpuset which should then
able to correctly address this issue.
Cheers,
Longman
On 3/28/23 21:30, Waiman Long wrote:
> On 3/28/23 11:39, Christian Brauner wrote:
>> Hey,
>>
>> Giuseppe reported that the the affinity mask isn't updated when a
>> process is spawned directly into the target cgroup via
>> CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
>> mask to be updated (see the repro at [1].
>>
>> I took a quick look and the issue seems to be that we don't call the
>> various attach handlers during CLONE_INTO_CGROUP whereas we do for
>> migration. So the solution seems to roughly be that we need to call the
>> various attach handlers during CLONE_INTO_CGROUP as well when the
>> parent's cgroups is different from the child cgroup. I think we need to
>> call all of them, can, cancel and attach.
>>
>> The plumbing here might be a bit intricate since the arguments that the
>> fork handlers take are different from the attach handlers.
>>
>> Christian
>>
>> [1]: https://paste.centos.org/view/f434fa1a
>>
> I saw that the current cgroup code already have the can_fork, fork and
> cancel_fork callbacks. Unfortunately such callbacks are not defined
> for cpuset yet. That is why the cpu affinity isn't correctly updated.
> I can post a patch to add those callback functions to cpuset which
> should then able to correctly address this issue.
Looking further into this issue, I am thinking that forking into a
cgroup should be equivalent to write the child pid into the
"cgroup.threads" file of the target cgroup. By taking this route, all
the existing can_attach, attach and cancel_attach methods can be used. I
believe the original fork method is for the limited use case of forking
into the same cgroup. So right now, only the pids controller has the
fork methods. Otherwise, we will have to modify a number of different
controllers to add the necessary fork methods. They will be somewhat
similar to the existing attach methods and so it will be a lot of
duplication. What do you think about this idea?
Cheers,
Longman
On Tue, Mar 28, 2023 at 10:48:49PM -0400, Waiman Long wrote:
> On 3/28/23 21:30, Waiman Long wrote:
> > On 3/28/23 11:39, Christian Brauner wrote:
> > > Hey,
> > >
> > > Giuseppe reported that the the affinity mask isn't updated when a
> > > process is spawned directly into the target cgroup via
> > > CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
> > > mask to be updated (see the repro at [1].
> > >
> > > I took a quick look and the issue seems to be that we don't call the
> > > various attach handlers during CLONE_INTO_CGROUP whereas we do for
> > > migration. So the solution seems to roughly be that we need to call the
> > > various attach handlers during CLONE_INTO_CGROUP as well when the
> > > parent's cgroups is different from the child cgroup. I think we need to
> > > call all of them, can, cancel and attach.
> > >
> > > The plumbing here might be a bit intricate since the arguments that the
> > > fork handlers take are different from the attach handlers.
> > >
> > > Christian
> > >
> > > [1]: https://paste.centos.org/view/f434fa1a
> > >
> > I saw that the current cgroup code already have the can_fork, fork and
> > cancel_fork callbacks. Unfortunately such callbacks are not defined for
> > cpuset yet. That is why the cpu affinity isn't correctly updated. I can
> > post a patch to add those callback functions to cpuset which should then
> > able to correctly address this issue.
>
> Looking further into this issue, I am thinking that forking into a cgroup
> should be equivalent to write the child pid into the "cgroup.threads" file
> of the target cgroup. By taking this route, all the existing can_attach,
> attach and cancel_attach methods can be used. I believe the original fork
> method is for the limited use case of forking into the same cgroup. So right
> now, only the pids controller has the fork methods. Otherwise, we will have
> to modify a number of different controllers to add the necessary fork
> methods. They will be somewhat similar to the existing attach methods and so
> it will be a lot of duplication. What do you think about this idea?
The overall plan sounds good to me. I have one comment and question
about making this equivalent to a write of the child pid into the
cgroup.threads file.
The paragraph above seems to imply that CLONE_INTO_CGROUP currently
isn't equivalent to a write to cgroup.threads. But it's not that
straightforward. CLONE_INTO_CGROUP needs to handle both threads and
threadgroups aka being or-ed with CLONE_THREAD or not. It does that in
cgroup_css_set_fork() when calling
cgroup_attach_permissions([...] !(kargs->flags & CLONE_THREAD), [...]).
What it's missing is calling the relevant handlers that would be
executed in the migration path. They might be different between the
CLONE_THREAD and !CLONE_THREAD case. But the crux remains that
CLONE_INTO_CGROUP needs to handle both cases.
So afaict, what you're proposing is equivalent to what I sketched in the
initial mail? Or is there something else you mean by making this
equivalent to cgroup.threads that goes beyond adding the missing
handlers? Just trying to make sure we're not accidently changing
semantics.
On Tue, Mar 28, 2023 at 10:48:49PM -0400, Waiman Long wrote:
> On 3/28/23 21:30, Waiman Long wrote:
> > On 3/28/23 11:39, Christian Brauner wrote:
> > > Hey,
> > >
> > > Giuseppe reported that the the affinity mask isn't updated when a
> > > process is spawned directly into the target cgroup via
> > > CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
> > > mask to be updated (see the repro at [1].
> > >
> > > I took a quick look and the issue seems to be that we don't call the
> > > various attach handlers during CLONE_INTO_CGROUP whereas we do for
> > > migration. So the solution seems to roughly be that we need to call the
> > > various attach handlers during CLONE_INTO_CGROUP as well when the
> > > parent's cgroups is different from the child cgroup. I think we need to
> > > call all of them, can, cancel and attach.
> > >
> > > The plumbing here might be a bit intricate since the arguments that the
> > > fork handlers take are different from the attach handlers.
> > >
> > > Christian
> > >
> > > [1]: https://paste.centos.org/view/f434fa1a
> > >
> > I saw that the current cgroup code already have the can_fork, fork and
> > cancel_fork callbacks. Unfortunately such callbacks are not defined for
> > cpuset yet. That is why the cpu affinity isn't correctly updated. I can
> > post a patch to add those callback functions to cpuset which should then
> > able to correctly address this issue.
>
> Looking further into this issue, I am thinking that forking into a cgroup
> should be equivalent to write the child pid into the "cgroup.threads" file
> of the target cgroup. By taking this route, all the existing can_attach,
> attach and cancel_attach methods can be used. I believe the original fork
> method is for the limited use case of forking into the same cgroup. So right
> now, only the pids controller has the fork methods. Otherwise, we will have
> to modify a number of different controllers to add the necessary fork
> methods. They will be somewhat similar to the existing attach methods and so
> it will be a lot of duplication. What do you think about this idea?
That's what I thought at first too, but then I had some doubts.
The callback is called 'attach', but it's historically implemented
when moving an established task between two cgroups. Many controllers
use it to move state between groups (memcg, pids, cpuset). So in
practice it isn't the natural fit that its name would suggest, and it
would require reworking those controllers to handle both scenarios:
moving tasks between groups, and new tasks attaching to a cgroup.
Now I'm thinking it probably makes more sense to keep using attach for
moving between groups, and fork for being born into a cgroup. That's
what the pid controller does, and it handles CLONE_INTO_CGROUP fine.
There is naturally some overlap between the two operations. But it
seems cleaner to me to use common helpers for that, as opposed to
having both attach and fork callbacks handling forks.
On 3/29/23 10:52, Johannes Weiner wrote:
> On Tue, Mar 28, 2023 at 10:48:49PM -0400, Waiman Long wrote:
>> On 3/28/23 21:30, Waiman Long wrote:
>>> On 3/28/23 11:39, Christian Brauner wrote:
>>>> Hey,
>>>>
>>>> Giuseppe reported that the the affinity mask isn't updated when a
>>>> process is spawned directly into the target cgroup via
>>>> CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
>>>> mask to be updated (see the repro at [1].
>>>>
>>>> I took a quick look and the issue seems to be that we don't call the
>>>> various attach handlers during CLONE_INTO_CGROUP whereas we do for
>>>> migration. So the solution seems to roughly be that we need to call the
>>>> various attach handlers during CLONE_INTO_CGROUP as well when the
>>>> parent's cgroups is different from the child cgroup. I think we need to
>>>> call all of them, can, cancel and attach.
>>>>
>>>> The plumbing here might be a bit intricate since the arguments that the
>>>> fork handlers take are different from the attach handlers.
>>>>
>>>> Christian
>>>>
>>>> [1]: https://paste.centos.org/view/f434fa1a
>>>>
>>> I saw that the current cgroup code already have the can_fork, fork and
>>> cancel_fork callbacks. Unfortunately such callbacks are not defined for
>>> cpuset yet. That is why the cpu affinity isn't correctly updated. I can
>>> post a patch to add those callback functions to cpuset which should then
>>> able to correctly address this issue.
>> Looking further into this issue, I am thinking that forking into a cgroup
>> should be equivalent to write the child pid into the "cgroup.threads" file
>> of the target cgroup. By taking this route, all the existing can_attach,
>> attach and cancel_attach methods can be used. I believe the original fork
>> method is for the limited use case of forking into the same cgroup. So right
>> now, only the pids controller has the fork methods. Otherwise, we will have
>> to modify a number of different controllers to add the necessary fork
>> methods. They will be somewhat similar to the existing attach methods and so
>> it will be a lot of duplication. What do you think about this idea?
> That's what I thought at first too, but then I had some doubts.
>
> The callback is called 'attach', but it's historically implemented
> when moving an established task between two cgroups. Many controllers
> use it to move state between groups (memcg, pids, cpuset). So in
> practice it isn't the natural fit that its name would suggest, and it
> would require reworking those controllers to handle both scenarios:
> moving tasks between groups, and new tasks attaching to a cgroup.
>
> Now I'm thinking it probably makes more sense to keep using attach for
> moving between groups, and fork for being born into a cgroup. That's
> what the pid controller does, and it handles CLONE_INTO_CGROUP fine.
>
> There is naturally some overlap between the two operations. But it
> seems cleaner to me to use common helpers for that, as opposed to
> having both attach and fork callbacks handling forks.
I was thinking along the line of using common helpers for doing fork and
attach. However, the expected method function prototypes are quite
different. For example,
int (*can_attach)(struct cgroup_taskset *tset);
int (*can_fork)(struct task_struct *task, css_set *cset);
We need to make them more similar before we can use common helpers. I
can take a look at that.
Thanks,
Longman
On 3/29/23 03:19, Christian Brauner wrote:
> On Tue, Mar 28, 2023 at 10:48:49PM -0400, Waiman Long wrote:
>> On 3/28/23 21:30, Waiman Long wrote:
>>> On 3/28/23 11:39, Christian Brauner wrote:
>>>> Hey,
>>>>
>>>> Giuseppe reported that the the affinity mask isn't updated when a
>>>> process is spawned directly into the target cgroup via
>>>> CLONE_INTO_CGROUP. However, migrating a process will cause the affinity
>>>> mask to be updated (see the repro at [1].
>>>>
>>>> I took a quick look and the issue seems to be that we don't call the
>>>> various attach handlers during CLONE_INTO_CGROUP whereas we do for
>>>> migration. So the solution seems to roughly be that we need to call the
>>>> various attach handlers during CLONE_INTO_CGROUP as well when the
>>>> parent's cgroups is different from the child cgroup. I think we need to
>>>> call all of them, can, cancel and attach.
>>>>
>>>> The plumbing here might be a bit intricate since the arguments that the
>>>> fork handlers take are different from the attach handlers.
>>>>
>>>> Christian
>>>>
>>>> [1]: https://paste.centos.org/view/f434fa1a
>>>>
>>> I saw that the current cgroup code already have the can_fork, fork and
>>> cancel_fork callbacks. Unfortunately such callbacks are not defined for
>>> cpuset yet. That is why the cpu affinity isn't correctly updated. I can
>>> post a patch to add those callback functions to cpuset which should then
>>> able to correctly address this issue.
>> Looking further into this issue, I am thinking that forking into a cgroup
>> should be equivalent to write the child pid into the "cgroup.threads" file
>> of the target cgroup. By taking this route, all the existing can_attach,
>> attach and cancel_attach methods can be used. I believe the original fork
>> method is for the limited use case of forking into the same cgroup. So right
>> now, only the pids controller has the fork methods. Otherwise, we will have
>> to modify a number of different controllers to add the necessary fork
>> methods. They will be somewhat similar to the existing attach methods and so
>> it will be a lot of duplication. What do you think about this idea?
> The overall plan sounds good to me. I have one comment and question
> about making this equivalent to a write of the child pid into the
> cgroup.threads file.
>
> The paragraph above seems to imply that CLONE_INTO_CGROUP currently
> isn't equivalent to a write to cgroup.threads. But it's not that
> straightforward. CLONE_INTO_CGROUP needs to handle both threads and
> threadgroups aka being or-ed with CLONE_THREAD or not. It does that in
> cgroup_css_set_fork() when calling
> cgroup_attach_permissions([...] !(kargs->flags & CLONE_THREAD), [...]).
>
> What it's missing is calling the relevant handlers that would be
> executed in the migration path. They might be different between the
> CLONE_THREAD and !CLONE_THREAD case. But the crux remains that
> CLONE_INTO_CGROUP needs to handle both cases.
>
> So afaict, what you're proposing is equivalent to what I sketched in the
> initial mail? Or is there something else you mean by making this
> equivalent to cgroup.threads that goes beyond adding the missing
> handlers? Just trying to make sure we're not accidently changing
> semantics.
It turns out that cpuset does have a cpuset_fork() method defined and so
is the legacy freezer even though they don't have a can_fork method. So
the simplest way to fix this problem is to extend the existing
cpuset_fork() method to handle the CLONE_INTO_CGROUP case. I have sent
out an upstream patch to fix that issue.
Cheers,
Longman