2022-11-12 22:48:03

by Waiman Long

Subject: [PATCH 0/2] cgroup/cpuset: v2 optimization

This patchset contains 2 patches that optimize out unneeded work when
running in a cgroup v2 environment.

Waiman Long (2):
cgroup/cpuset: Skip spread flags update on v2
cgroup/cpuset: Optimize cpuset_attach() on v2

kernel/cgroup/cpuset.c | 36 +++++++++++++++++++++++++++++++-----
1 file changed, 31 insertions(+), 5 deletions(-)

--
2.31.1



2022-11-12 22:55:41

by Waiman Long

Subject: [PATCH 2/2] cgroup/cpuset: Optimize cpuset_attach() on v2

It was found that with the default hierarchy, enabling cpuset in the
child cgroups can trigger a cpuset_attach() call in each of the child
cgroups that have tasks with no change in effective cpus and mems. If
there are many processes in those child cgroups, it will burn quite a
lot of cpu cycles iterating all the tasks without doing useful work.

Optimize this case by comparing the old and new cpusets and skipping
the useless update if there is no change in effective cpus and mems.
Also, mems_allowed is less likely to change than cpus_allowed, so
skip changing mm if there is no change in effective_mems and
CS_MEMORY_MIGRATE is not set.

By inserting some instrumentation code and running a simple command in
a container 200 times on a cgroup v2 system, it was found that all the
cpuset_attach() calls were skipped (401 times in total) as there was no
change in effective cpus and mems.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cpuset.c | 24 +++++++++++++++++++++++-
1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 2525905cdf48..b8361f55ef36 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -2513,12 +2513,28 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 	struct cgroup_subsys_state *css;
 	struct cpuset *cs;
 	struct cpuset *oldcs = cpuset_attach_old_cs;
+	bool cpus_updated, mems_updated;

 	cgroup_taskset_first(tset, &css);
 	cs = css_cs(css);

 	lockdep_assert_cpus_held(); /* see cgroup_attach_lock() */
 	percpu_down_write(&cpuset_rwsem);
+	cpus_updated = !cpumask_equal(cs->effective_cpus,
+				      oldcs->effective_cpus);
+	mems_updated = !nodes_equal(cs->effective_mems, oldcs->effective_mems);
+
+	/*
+	 * In the default hierarchy, enabling cpuset in the child cgroups
+	 * will trigger a number of cpuset_attach() calls with no change
+	 * in effective cpus and mems. In that case, we can optimize out
+	 * by skipping the task iteration and update.
+	 */
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
+	    !cpus_updated && !mems_updated) {
+		cpuset_attach_nodemask_to = cs->effective_mems;
+		goto out;
+	}

 	guarantee_online_mems(cs, &cpuset_attach_nodemask_to);

@@ -2539,9 +2555,14 @@ static void cpuset_attach(struct cgroup_taskset *tset)

 	/*
 	 * Change mm for all threadgroup leaders. This is expensive and may
-	 * sleep and should be moved outside migration path proper.
+	 * sleep and should be moved outside migration path proper. Skip it
+	 * if there is no change in effective_mems and CS_MEMORY_MIGRATE is
+	 * not set.
 	 */
 	cpuset_attach_nodemask_to = cs->effective_mems;
+	if (!is_memory_migrate(cs) && !mems_updated)
+		goto out;
+
 	cgroup_taskset_for_each_leader(leader, css, tset) {
 		struct mm_struct *mm = get_task_mm(leader);

@@ -2564,6 +2585,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		}
 	}

+out:
 	cs->old_mems_allowed = cpuset_attach_nodemask_to;

 	cs->attach_in_progress--;
--
2.31.1
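
The decision logic this patch adds to cpuset_attach() can be modelled in userspace as a rough sketch. Everything below is hypothetical illustration, not kernel code: `struct cpuset_model`, `enum attach_work` and `attach_work_needed()` are invented names, plain 64-bit masks stand in for cpumask_t/nodemask_t, and `on_dfl` stands in for the result of cgroup_subsys_on_dfl(cpuset_cgrp_subsys).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cpuset; bitmasks replace the kernel mask types. */
struct cpuset_model {
	uint64_t effective_cpus;
	uint64_t effective_mems;
	bool memory_migrate;	/* models CS_MEMORY_MIGRATE */
};

/* Which parts of the attach work still have to run (illustrative only). */
enum attach_work {
	ATTACH_SKIP_ALL,	/* first "goto out": no task iteration at all */
	ATTACH_TASKS_ONLY,	/* per-task update runs, mm update is skipped */
	ATTACH_FULL,		/* per-task update and mm update both run */
};

enum attach_work attach_work_needed(const struct cpuset_model *oldcs,
				    const struct cpuset_model *cs,
				    bool on_dfl)
{
	bool cpus_updated = cs->effective_cpus != oldcs->effective_cpus;
	bool mems_updated = cs->effective_mems != oldcs->effective_mems;

	/* v2 only: nothing changed, so skip the task iteration entirely. */
	if (on_dfl && !cpus_updated && !mems_updated)
		return ATTACH_SKIP_ALL;

	/* Either hierarchy: mems unchanged and no migration requested. */
	if (!cs->memory_migrate && !mems_updated)
		return ATTACH_TASKS_ONLY;

	return ATTACH_FULL;
}
```

In this model, two cpusets with identical masks give ATTACH_SKIP_ALL on the default hierarchy and ATTACH_TASKS_ONLY on v1, which matches the patch limiting the first shortcut to v2.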


2022-11-12 23:00:22

by Waiman Long

Subject: [PATCH 1/2] cgroup/cpuset: Skip spread flags update on v2

Cpuset v2 has no spread flags to set, so we can skip the spread
flags update if cpuset v2 is being used. Also change the name to
cpuset_update_task_spread_flags() to indicate that there are multiple
spread flags.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cpuset.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index b474289c15b8..2525905cdf48 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -550,11 +550,15 @@ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask)
 /*
  * update task's spread flag if cpuset's page/slab spread flag is set
  *
- * Call with callback_lock or cpuset_rwsem held.
+ * Call with callback_lock or cpuset_rwsem held. The check can be skipped
+ * if on default hierarchy.
  */
-static void cpuset_update_task_spread_flag(struct cpuset *cs,
+static void cpuset_update_task_spread_flags(struct cpuset *cs,
 					struct task_struct *tsk)
 {
+	if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys))
+		return;
+
 	if (is_spread_page(cs))
 		task_set_spread_page(tsk);
 	else
@@ -2153,7 +2157,7 @@ static void update_tasks_flags(struct cpuset *cs)

 	css_task_iter_start(&cs->css, 0, &it);
 	while ((task = css_task_iter_next(&it)))
-		cpuset_update_task_spread_flag(cs, task);
+		cpuset_update_task_spread_flags(cs, task);
 	css_task_iter_end(&it);
 }

@@ -2530,7 +2534,7 @@ static void cpuset_attach(struct cgroup_taskset *tset)
 		WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));

 		cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to);
-		cpuset_update_task_spread_flag(cs, task);
+		cpuset_update_task_spread_flags(cs, task);
 	}

 	/*
--
2.31.1
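
The early return added here can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct task_model`, `struct cpuset_flags_model` and `update_task_spread_flags()` are hypothetical names, booleans replace the kernel's per-task flag helpers, and `on_dfl` again stands in for cgroup_subsys_on_dfl(cpuset_cgrp_subsys).

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the kernel keeps these as per-task and per-cpuset flag bits. */
struct task_model {
	bool spread_page;
	bool spread_slab;
};

struct cpuset_flags_model {
	bool spread_page;
	bool spread_slab;
};

void update_task_spread_flags(const struct cpuset_flags_model *cs,
			      struct task_model *tsk, bool on_dfl)
{
	/* v2 has no spread flags to propagate, so leave the task untouched. */
	if (on_dfl)
		return;

	/* v1: set or clear each task flag to mirror the cpuset's flag. */
	tsk->spread_page = cs->spread_page;
	tsk->spread_slab = cs->spread_slab;
}
```

Since both callers invoke the helper once per task, the early return saves the flag updates for every task iterated on a v2 system.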


2022-11-14 22:17:52

by Tejun Heo

Subject: Re: [PATCH 0/2] cgroup/cpuset: v2 optimization

On Sat, Nov 12, 2022 at 05:19:37PM -0500, Waiman Long wrote:
> This patchset contains 2 patches to optimize out unneeded works when
> running on a cgroup v2 environment.
>
> Waiman Long (2):
> cgroup/cpuset: Skip spread flags update on v2
> cgroup/cpuset: Optimize cpuset_attach() on v2

Applied 1-2 to cgroup/for-6.2.

Thanks.

--
tejun

2022-11-21 19:40:11

by Michal Koutný

Subject: Re: [PATCH 2/2] cgroup/cpuset: Optimize cpuset_attach() on v2

On Sat, Nov 12, 2022 at 05:19:39PM -0500, Waiman Long <[email protected]> wrote:
> + /*
> + * In the default hierarchy, enabling cpuset in the child cgroups
> + * will trigger a number of cpuset_attach() calls with no change
> + * in effective cpus and mems. In that case, we can optimize out
> + * by skipping the task iteration and update.
> + */
> + if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
> + !cpus_updated && !mems_updated) {

I'm just wondering: why is this limited to the default hierarchy only?
IOW, why can't v1 skip too (when the masks are unchanged between the cpusets)?

Thanks,
Michal


2022-11-21 20:36:46

by Waiman Long

Subject: Re: [PATCH 2/2] cgroup/cpuset: Optimize cpuset_attach() on v2


On 11/21/22 13:50, Michal Koutný wrote:
> On Sat, Nov 12, 2022 at 05:19:39PM -0500, Waiman Long <[email protected]> wrote:
>> + /*
>> + * In the default hierarchy, enabling cpuset in the child cgroups
>> + * will trigger a number of cpuset_attach() calls with no change
>> + * in effective cpus and mems. In that case, we can optimize out
>> + * by skipping the task iteration and update.
>> + */
>> + if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys) &&
>> + !cpus_updated && !mems_updated) {
> I'm just wondering -- why is this limited to the default hierarchy only?
> IOW why can't v1 skip too (when favorable constness between cpusets).

Cpuset v1 is a bit more complex. Besides cpu and node masks, it also
has other flags, like the spread flags, that we need to check for
changes. Unlike cpuset v2, I don't think it is likely that
cpuset_attach() will be called without changes in cpu and node masks.
That is the reason why this patch focuses on v2. If it is found that
this is not the case, we can always extend the support to v1.

Cheers,
Longman


Subject: Re: [PATCH 2/2] cgroup/cpuset: Optimize cpuset_attach() on v2

On 2022-11-12 17:19:39 [-0500], Waiman Long wrote:
> It was found that with the default hierarchy, enabling cpuset in the
> child cgroups can trigger a cpuset_attach() call in each of the child
> cgroups that have tasks with no change in effective cpus and mems. If
> there are many processes in those child cgroups, it will burn quite a
> lot of cpu cycles iterating all the tasks without doing useful work.

Thank you.

So this preserves the CPU mask upon attaching the cpuset container.

| ~# taskset -pc $$
| pid 1564's current affinity list: 0-2

default mask after boot due to isolcpus=

| ~# echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control ; echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
| ~# taskset -pc $$
| pid 1564's current affinity list: 0-2

okay.

| ~# echo 1-3 > /sys/fs/cgroup/user.slice/cpuset.cpus
| ~# taskset -pc $$
| pid 1564's current affinity list: 1-3

wiped away.

| ~# taskset -pc 2-3 $$
| pid 1564's current affinity list: 1-3
| pid 1564's new affinity list: 2,3
| ~# echo 2-4 > /sys/fs/cgroup/user.slice/cpuset.cpus
| ~# taskset -pc 2-3 $$
| pid 1564's current affinity list: 2,3
| pid 1564's new affinity list: 2,3

But it works if the mask was changed on purpose.

Sebastian
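
The taskset checks above can also be made from a program. The following is a minimal sketch, assuming Linux with glibc; `format_cpu_list()` and `print_affinity()` are hypothetical helper names, and the output line only imitates the taskset format shown in the transcript.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* needed for cpu_set_t and sched_getaffinity() on glibc */
#endif
#include <sched.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format the CPUs in `set` as a list such as "0-2" or "2,3" into buf.
 * Runs of exactly two CPUs are printed as "a,b" to mirror the taskset
 * output quoted above. Returns the string length, or -1 if buf is too small.
 */
int format_cpu_list(const cpu_set_t *set, char *buf, size_t len)
{
	size_t n = 0;
	int cpu = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (cpu < CPU_SETSIZE) {
		int start, end, w;

		if (!CPU_ISSET(cpu, set)) {
			cpu++;
			continue;
		}
		start = cpu;
		while (cpu + 1 < CPU_SETSIZE && CPU_ISSET(cpu + 1, set))
			cpu++;
		end = cpu++;

		if (end == start)
			w = snprintf(buf + n, len - n, "%s%d", n ? "," : "", start);
		else
			w = snprintf(buf + n, len - n, "%s%d%c%d", n ? "," : "",
				     start, end == start + 1 ? ',' : '-', end);
		if (w < 0 || (size_t)w >= len - n)
			return -1;
		n += (size_t)w;
	}
	return (int)n;
}

/* Print the affinity of `pid` (0 means the calling thread). Returns 0 on success. */
int print_affinity(pid_t pid)
{
	cpu_set_t set;
	char buf[4096];

	if (sched_getaffinity(pid, sizeof(set), &set) != 0)
		return -1;
	if (format_cpu_list(&set, buf, sizeof(buf)) < 0)
		return -1;
	printf("pid %d's current affinity list: %s\n", (int)pid, buf);
	return 0;
}
```

Calling print_affinity() for the same pid before and after each write to cpuset.cpus would reproduce the sequence of checks in the transcript.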