2023-04-11 13:39:39

by Waiman Long

[permalink] [raw]
Subject: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls

The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
are only needed when the CLONE_INTO_CGROUP flag is set which is not
likely. Adding an extra cpuset_can_fork() call does introduce a bit
of performance overhead in the fork/clone fastpath. To reduce this
performance overhead, introduce a new clone_into_cgroup_can_fork flag
into the cgroup_subsys structure. This flag, when set, will call the
can_fork and cancel_fork methods only if the CLONE_INTO_CGROUP flag
is set.

The cpuset code is now modified to set this flag. The same cpuset
checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
to stay as the cgroups can be different, but the cpusets may still be
the same. So the same check must be present in both cpuset_fork() and
cpuset_can_fork() to make sure that attach_in_progress is correctly set.

Signed-off-by: Waiman Long <[email protected]>
---
include/linux/cgroup-defs.h | 6 ++++++
kernel/cgroup/cgroup.c | 23 ++++++++++++++++++-----
kernel/cgroup/cpuset.c | 1 +
3 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h
index 8a0d5466c7be..0087a47d80a2 100644
--- a/include/linux/cgroup-defs.h
+++ b/include/linux/cgroup-defs.h
@@ -701,6 +701,12 @@ struct cgroup_subsys {
*/
bool threaded:1;

+ /*
+ * If %true, the controller will call can_fork and cancel_fork
+ * methods only if CLONE_INTO_CGROUP flag is set.
+ */
+ bool clone_into_cgroup_can_fork:1;
+
/* the following two fields are initialized automatically during boot */
int id;
const char *name;
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 83ea13f2ccb1..23701e959ef5 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -6517,6 +6517,10 @@ int cgroup_can_fork(struct task_struct *child, struct kernel_clone_args *kargs)
return ret;

do_each_subsys_mask(ss, i, have_canfork_callback) {
+ if (ss->clone_into_cgroup_can_fork &&
+ !(kargs->flags & CLONE_INTO_CGROUP))
+ continue;
+
ret = ss->can_fork(child, kargs->cset);
if (ret)
goto out_revert;
@@ -6528,8 +6532,12 @@ int cgroup_can_fork(struct task_struct *child, struct kernel_clone_args *kargs)
for_each_subsys(ss, j) {
if (j >= i)
break;
- if (ss->cancel_fork)
- ss->cancel_fork(child, kargs->cset);
+ if (!ss->cancel_fork ||
+ (ss->clone_into_cgroup_can_fork &&
+ !(kargs->flags & CLONE_INTO_CGROUP)))
+ continue;
+
+ ss->cancel_fork(child, kargs->cset);
}

cgroup_css_set_put_fork(kargs);
@@ -6552,9 +6560,14 @@ void cgroup_cancel_fork(struct task_struct *child,
struct cgroup_subsys *ss;
int i;

- for_each_subsys(ss, i)
- if (ss->cancel_fork)
- ss->cancel_fork(child, kargs->cset);
+ for_each_subsys(ss, i) {
+ if (!ss->cancel_fork ||
+ (ss->clone_into_cgroup_can_fork &&
+ !(kargs->flags & CLONE_INTO_CGROUP)))
+ continue;
+
+ ss->cancel_fork(child, kargs->cset);
+ }

cgroup_css_set_put_fork(kargs);
}
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index e4ca2dd2b764..937ef4d60cd4 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3391,6 +3391,7 @@ struct cgroup_subsys cpuset_cgrp_subsys = {
.dfl_cftypes = dfl_files,
.early_init = true,
.threaded = true,
+ .clone_into_cgroup_can_fork = true,
};

/**
--
2.31.1


2023-04-12 18:36:26

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls

On Tue, Apr 11, 2023 at 09:36:01AM -0400, Waiman Long wrote:
> The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
> are only needed when the CLONE_INTO_CGROUP flag is set which is not
> likely. Adding an extra cpuset_can_fork() call does introduce a bit
> of performance overhead in the fork/clone fastpath. To reduce this
> performance overhead, introduce a new clone_into_cgroup_can_fork flag
> into the cgroup_subsys structure. This flag, when set, will call the
> can_fork and cancel_fork methods only if the CLONE_INTO_CGROUP flag
> is set.
>
> The cpuset code is now modified to set this flag. The same cpuset
> checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
> to stay as the cgroups can be different, but the cpusets may still be
> the same. So the same check must be present in both cpuset_fork() and
> cpuset_can_fork() to make sure that attach_in_progress is correctly set.
>
> Signed-off-by: Waiman Long <[email protected]>

Waiman, I'm not necessarily against this optimization but can we at least
have some performance numbers to show that this is actually meaningful?
Given how heavy our fork path is, I'm not too sure this would show up in any
meaningful way.

Thanks.

--
tejun

2023-04-12 18:42:32

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls


On 4/12/23 14:27, Tejun Heo wrote:
> On Tue, Apr 11, 2023 at 09:36:01AM -0400, Waiman Long wrote:
>> The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
>> are only needed when the CLONE_INTO_CGROUP flag is set which is not
>> likely. Adding an extra cpuset_can_fork() call does introduce a bit
>> of performance overhead in the fork/clone fastpath. To reduce this
>> performance overhead, introduce a new clone_into_cgroup_can_fork flag
>> into the cgroup_subsys structure. This flag, when set, will call the
>> can_fork and cancel_fork methods only if the CLONE_INTO_CGROUP flag
>> is set.
>>
>> The cpuset code is now modified to set this flag. The same cpuset
>> checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
>> to stay as the cgroups can be different, but the cpusets may still be
>> the same. So the same check must be present in both cpuset_fork() and
>> cpuset_can_fork() to make sure that attach_in_progress is correctly set.
>>
>> Signed-off-by: Waiman Long <[email protected]>
> Waiman, I'm not necessarily against this optimization but can we at least
> have some performance numbers to show that this is actually meaningful?
> Given how heavy our fork path is, I'm not too sure this would show up in any
> meaningful way.

That make sense to me. I am OK to leave it for now as it is an
optimization patch anyway.

BTW, another question that I have is about the cgroup_threadgroup_rwsem.
It is currently a percpu rwsem. Is it possible to change it into a
regular rwsem instead? It is causing quite a bit of latency for
workloads that require rather frequent changes to cgroups. I know we
have a "favordynmods" mount option to disable the percpu operation. This
will still be less performant than a normal rwsem. Of course the
downside is that the fork/exit fastpaths will be slowed down a bit.

Thanks,
Longman

2023-04-12 19:29:48

by Tejun Heo

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls

Hello,

On Wed, Apr 12, 2023 at 02:40:53PM -0400, Waiman Long wrote:
> On 4/12/23 14:27, Tejun Heo wrote:
> > On Tue, Apr 11, 2023 at 09:36:01AM -0400, Waiman Long wrote:
> > > The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
> > > are only needed when the CLONE_INTO_CGROUP flag is set which is not
> > > likely. Adding an extra cpuset_can_fork() call does introduce a bit
> > > of performance overhead in the fork/clone fastpath. To reduce this
> > > performance overhead, introduce a new clone_into_cgroup_can_fork flag
> > > into the cgroup_subsys structure. This flag, when set, will call the
> > > can_fork and cancel_fork methods only if the CLONE_INTO_CGROUP flag
> > > is set.
> > >
> > > The cpuset code is now modified to set this flag. The same cpuset
> > > checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
> > > to stay as the cgroups can be different, but the cpusets may still be
> > > the same. So the same check must be present in both cpuset_fork() and
> > > cpuset_can_fork() to make sure that attach_in_progress is correctly set.
> > >
> > > Signed-off-by: Waiman Long <[email protected]>
> > Waiman, I'm not necessarily against this optimization but can we at least
> > have some performance numbers to show that this is actually meaningful?
> > Given how heavy our fork path is, I'm not too sure this would show up in any
> > meaningful way.
>
> That make sense to me. I am OK to leave it for now as it is an optimization
> patch anyway.
>
> BTW, another question that I have is about the cgroup_threadgroup_rwsem. It
> is currently a percpu rwsem. Is it possible to change it into a regular
> rwsem instead? It is causing quite a bit of latency for workloads that
> require rather frequent changes to cgroups. I know we have a "favordynmods"
> mount option to disable the percpu operation. This will still be less
> performant than a normal rwsem. Of course the downside is that the fork/exit
> fastpaths will be slowed down a bit.

I don't know. Maybe? A rwsem actually has a scalability factor in that the
more CPUs are forking, the more expensive the rwsem becomes, so it is a bit
more of a concern. Another factor is that in majority of use cases we're
almost completely bypassing write-locking percpu_rwsem, so it feel a bit sad
to convert it to a regular rwsem. So, if favordynmods is good enough, I'd
like to keep it that way.

Thanks.

--
tejun

2023-04-12 19:42:28

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] cgroup/cpuset: Optimize out unneeded cpuset_can_fork/cpuset_cancel_fork calls

On 4/12/23 15:17, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 12, 2023 at 02:40:53PM -0400, Waiman Long wrote:
>> On 4/12/23 14:27, Tejun Heo wrote:
>>> On Tue, Apr 11, 2023 at 09:36:01AM -0400, Waiman Long wrote:
>>>> The newly introduced cpuset_can_fork() and cpuset_cancel_fork() calls
>>>> are only needed when the CLONE_INTO_CGROUP flag is set which is not
>>>> likely. Adding an extra cpuset_can_fork() call does introduce a bit
>>>> of performance overhead in the fork/clone fastpath. To reduce this
>>>> performance overhead, introduce a new clone_into_cgroup_can_fork flag
>>>> into the cgroup_subsys structure. This flag, when set, will call the
>>>> can_fork and cancel_fork methods only if the CLONE_INTO_CGROUP flag
>>>> is set.
>>>>
>>>> The cpuset code is now modified to set this flag. The same cpuset
>>>> checking code in cpuset_can_fork() and cpuset_cancel_fork() will have
>>>> to stay as the cgroups can be different, but the cpusets may still be
>>>> the same. So the same check must be present in both cpuset_fork() and
>>>> cpuset_can_fork() to make sure that attach_in_progress is correctly set.
>>>>
>>>> Signed-off-by: Waiman Long <[email protected]>
>>> Waiman, I'm not necessarily against this optimization but can we at least
>>> have some performance numbers to show that this is actually meaningful?
>>> Given how heavy our fork path is, I'm not too sure this would show up in any
>>> meaningful way.
>> That make sense to me. I am OK to leave it for now as it is an optimization
>> patch anyway.
>>
>> BTW, another question that I have is about the cgroup_threadgroup_rwsem. It
>> is currently a percpu rwsem. Is it possible to change it into a regular
>> rwsem instead? It is causing quite a bit of latency for workloads that
>> require rather frequent changes to cgroups. I know we have a "favordynmods"
>> mount option to disable the percpu operation. This will still be less
>> performant than a normal rwsem. Of course the downside is that the fork/exit
>> fastpaths will be slowed down a bit.
> I don't know. Maybe? A rwsem actually has a scalability factor in that the
> more CPUs are forking, the more expensive the rwsem becomes, so it is a bit
> more of a concern. Another factor is that in majority of use cases we're
> almost completely bypassing write-locking percpu_rwsem, so it feel a bit sad
> to convert it to a regular rwsem. So, if favordynmods is good enough, I'd
> like to keep it that way.

It is just a thought that I have since Juri is in the process of
reverting the change of cpuset_mutex to cpuset_rwsem. Percpu rwsem can
be a bit problematic in PREEMPT_RT kernel since it does not support
proper priority inheritance though.

Cheers,
Longman