2023-03-17 15:16:39

by Waiman Long

[permalink] [raw]
Subject: [PATCH v2 0/4] cgroup/cpuset: Miscellaneous updates

v2:
- Add a new patch 1 that fixes a bug introduced by recent v6.2 commit
7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate
mask for task").
- Make a small twist and additional comment to patch 2 ("cgroup/cpuset:
Skip task update if hotplug doesn't affect current cpuset") as
suggested by Michal.
- Remove v1 patches 3/4 for now for further discussion.

This patch series includes miscellaneous update to the cpuset and its
testing code.

Patch 1 fixes a bug caused by commit 7a2127e66a00 ("cpuset: Call
set_cpus_allowed_ptr() with appropriate mask for task") in the partition
handling code. This fix was verified by running the test_cpuset_prs.sh
test.

Patch 2 is for a hotplug optimization.

Patch 3 is actually a follow-up of commit 3fb906e7fabb ("cgroup/cpuset:
Don't filter offline CPUs in cpuset_cpus_allowed() for top cpuset tasks").

Patch 4 reduces verbosity when running test_cpuset_prs.sh test script
unless explicitly enabled with the -v option.

Waiman Long (4):
cgroup/cpuset: Fix partition root's cpuset.cpus update bug
cgroup/cpuset: Skip task update if hotplug doesn't affect current
cpuset
cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset
are updated
cgroup/cpuset: Minor updates to test_cpuset_prs.sh

kernel/cgroup/cpuset.c | 38 +++++++++++++------
.../selftests/cgroup/test_cpuset_prs.sh | 25 ++++++------
2 files changed, 41 insertions(+), 22 deletions(-)

--
2.31.1



2023-03-17 15:16:41

by Waiman Long

[permalink] [raw]
Subject: [PATCH v2 3/4] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated

Similar to commit 3fb906e7fabb ("group/cpuset: Don't filter offline
CPUs in cpuset_cpus_allowed() for top cpuset tasks"), the whole set of
possible CPUs including offline ones should be used for setting cpumasks
for tasks in the top cpuset when a cpuset partition is modified as the
hotplug code won't update cpumasks for tasks in the top cpuset when
CPUs become online or offline.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cpuset.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 5b8d763555b0..db8793a2082f 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1209,7 +1209,9 @@ void rebuild_sched_domains(void)
*
* Iterate through each task of @cs updating its cpus_allowed to the
* effective cpuset's. As this function is called with cpuset_rwsem held,
- * cpuset membership stays stable.
+ * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
+ * is used instead of effective_cpus to make sure all offline CPUs are also
+ * included as hotplug code won't update cpumasks for tasks in top_cpuset.
*/
static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)
{
@@ -1219,15 +1221,18 @@ static void update_tasks_cpumask(struct cpuset *cs, struct cpumask *new_cpus)

css_task_iter_start(&cs->css, 0, &it);
while ((task = css_task_iter_next(&it))) {
- /*
- * Percpu kthreads in top_cpuset are ignored
- */
- if (top_cs && (task->flags & PF_KTHREAD) &&
- kthread_is_per_cpu(task))
- continue;
+ const struct cpumask *possible_mask = task_cpu_possible_mask(task);

- cpumask_and(new_cpus, cs->effective_cpus,
- task_cpu_possible_mask(task));
+ if (top_cs) {
+ /*
+ * Percpu kthreads in top_cpuset are ignored
+ */
+ if ((task->flags & PF_KTHREAD) && kthread_is_per_cpu(task))
+ continue;
+ cpumask_andnot(new_cpus, possible_mask, cs->subparts_cpus);
+ } else {
+ cpumask_and(new_cpus, possible_mask, cs->effective_cpus);
+ }
set_cpus_allowed_ptr(task, new_cpus);
}
css_task_iter_end(&it);
--
2.31.1


2023-03-17 15:16:46

by Waiman Long

[permalink] [raw]
Subject: [PATCH v2 1/4] cgroup/cpuset: Fix partition root's cpuset.cpus update bug

It was found that commit 7a2127e66a00 ("cpuset: Call
set_cpus_allowed_ptr() with appropriate mask for task") introduced a bug
that corrupted "cpuset.cpus" of a partition root when it was updated.

It is because the tmp->new_cpus field of the passed tmp parameter
of update_parent_subparts_cpumask() should not be used at all as
it contains important cpumask data that should not be overwritten.
Fix it by using tmp->addmask instead.

Also update update_cpumask() to make sure that trialcs->cpu_allowed
will not be corrupted until it is no longer needed.

Fixes: 7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate mask for task")
Signed-off-by: Waiman Long <[email protected]>
---
kernel/cgroup/cpuset.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 636f1c682ac0..f310915d1257 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -1513,7 +1513,7 @@ static int update_parent_subparts_cpumask(struct cpuset *cs, int cmd,
spin_unlock_irq(&callback_lock);

if (adding || deleting)
- update_tasks_cpumask(parent, tmp->new_cpus);
+ update_tasks_cpumask(parent, tmp->addmask);

/*
* Set or clear CS_SCHED_LOAD_BALANCE when partcmd_update, if necessary.
@@ -1770,10 +1770,13 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
/*
* Use the cpumasks in trialcs for tmpmasks when they are pointers
* to allocated cpumasks.
+ *
+ * Note that update_parent_subparts_cpumask() uses only addmask &
+ * delmask, but not new_cpus.
*/
tmp.addmask = trialcs->subparts_cpus;
tmp.delmask = trialcs->effective_cpus;
- tmp.new_cpus = trialcs->cpus_allowed;
+ tmp.new_cpus = NULL;
#endif

retval = validate_change(cs, trialcs);
@@ -1838,6 +1841,11 @@ static int update_cpumask(struct cpuset *cs, struct cpuset *trialcs,
}
spin_unlock_irq(&callback_lock);

+#ifdef CONFIG_CPUMASK_OFFSTACK
+ /* Now trialcs->cpus_allowed is available */
+ tmp.new_cpus = trialcs->cpus_allowed;
+#endif
+
/* effective_cpus will be updated here */
update_cpumasks_hier(cs, &tmp, false);

--
2.31.1


2023-03-17 15:16:54

by Waiman Long

[permalink] [raw]
Subject: [PATCH v2 2/4] cgroup/cpuset: Skip task update if hotplug doesn't affect current cpuset

If a hotplug event doesn't affect the current cpuset, there is no point
to call hotplug_update_tasks() or hotplug_update_tasks_legacy(). So
just skip it.

Signed-off-by: Waiman Long <[email protected]>
Reviewed-by: Michal Koutný <[email protected]>
---
kernel/cgroup/cpuset.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index f310915d1257..5b8d763555b0 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -3516,6 +3516,8 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
update_tasks:
cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
mems_updated = !nodes_equal(new_mems, cs->effective_mems);
+ if (!cpus_updated && !mems_updated)
+ goto unlock; /* Hotplug doesn't affect this cpuset */

if (mems_updated)
check_insane_mems_config(&new_mems);
@@ -3527,6 +3529,7 @@ static void cpuset_hotplug_update_tasks(struct cpuset *cs, struct tmpmasks *tmp)
hotplug_update_tasks_legacy(cs, &new_cpus, &new_mems,
cpus_updated, mems_updated);

+unlock:
percpu_up_write(&cpuset_rwsem);
}

--
2.31.1


2023-03-17 15:16:58

by Waiman Long

[permalink] [raw]
Subject: [PATCH v2 4/4] cgroup/cpuset: Minor updates to test_cpuset_prs.sh

This patch makes the following minor updates to the cpuset partition
testing script test_cpuset_prs.sh.

- Remove online_cpus function call as it will be called anyway on exit
in cleanup.
- Make the enabling of sched/verbose debugfs flag conditional on the
"-v" verbose option and set DELAY_FACTOR to 2 in this case as cpuset
partition operations are likely to be slowed down by enabling that.

Signed-off-by: Waiman Long <[email protected]>
---
.../selftests/cgroup/test_cpuset_prs.sh | 25 +++++++++++--------
1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/cgroup/test_cpuset_prs.sh b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
index 75c100de90ff..2b5215cc599f 100755
--- a/tools/testing/selftests/cgroup/test_cpuset_prs.sh
+++ b/tools/testing/selftests/cgroup/test_cpuset_prs.sh
@@ -15,13 +15,6 @@ skip_test() {

[[ $(id -u) -eq 0 ]] || skip_test "Test must be run as root!"

-# Set sched verbose flag, if available
-if [[ -d /sys/kernel/debug/sched ]]
-then
- # Used to restore the original setting during cleanup
- SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose)
- echo Y > /sys/kernel/debug/sched/verbose
-fi

# Get wait_inotify location
WAIT_INOTIFY=$(cd $(dirname $0); pwd)/wait_inotify
@@ -37,10 +30,14 @@ CPUS=$(lscpu | grep "^CPU(s):" | sed -e "s/.*:[[:space:]]*//")
PROG=$1
VERBOSE=
DELAY_FACTOR=1
+SCHED_DEBUG=
while [[ "$1" = -* ]]
do
case "$1" in
-v) VERBOSE=1
+ # Enable sched/verbose can slow thing down
+ [[ $DELAY_FACTOR -eq 1 ]] &&
+ DELAY_FACTOR=2
break
;;
-d) DELAY_FACTOR=$2
@@ -54,6 +51,14 @@ do
shift
done

+# Set sched verbose flag if available when "-v" option is specified
+if [[ -n "$VERBOSE" && -d /sys/kernel/debug/sched ]]
+then
+ # Used to restore the original setting during cleanup
+ SCHED_DEBUG=$(cat /sys/kernel/debug/sched/verbose)
+ echo Y > /sys/kernel/debug/sched/verbose
+fi
+
cd $CGROUP2
echo +cpuset > cgroup.subtree_control
[[ -d test ]] || mkdir test
@@ -65,7 +70,8 @@ cleanup()
rmdir A1/A2/A3 A1/A2 A1 B1 > /dev/null 2>&1
cd ..
rmdir test > /dev/null 2>&1
- echo "$SCHED_DEBUG" > /sys/kernel/debug/sched/verbose
+ [[ -n "$SCHED_DEBUG" ]] &&
+ echo "$SCHED_DEBUG" > /sys/kernel/debug/sched/verbose
}

# Pause in ms
@@ -571,7 +577,6 @@ run_state_test()
echo "Test $TEST[$I] failed result check!"
eval echo \"\${$TEST[$I]}\"
dump_states
- online_cpus
exit 1
}

@@ -582,7 +587,6 @@ run_state_test()
eval echo \"\${$TEST[$I]}\"
echo
dump_states
- online_cpus
exit 1
}
}
@@ -594,7 +598,6 @@ run_state_test()
eval echo \"\${$TEST[$I]}\"
echo
dump_states
- online_cpus
exit 1
}
}
--
2.31.1


2023-03-17 18:02:06

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated

Hello.

On Fri, Mar 17, 2023 at 11:15:07AM -0400, Waiman Long <[email protected]> wrote:
> * Iterate through each task of @cs updating its cpus_allowed to the
> * effective cpuset's. As this function is called with cpuset_rwsem held,
> - * cpuset membership stays stable.
> + * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
> + * is used instead of effective_cpus to make sure all offline CPUs are also
> + * included as hotplug code won't update cpumasks for tasks in top_cpuset.
> */

On Wed, Mar 15, 2023 at 11:06:20AM +0100, Michal Koutn? <[email protected]> wrote:
> I see now that it returns offlined cpus to top cpuset's tasks.

I considered only the "base" set change cs->effective_cpus ->
possible_mask. (Apologies for that mistake.)

However, I now read the note about subparts_cpus

> * effective_cpus contains only onlined CPUs, but subparts_cpus
> * may have offlined ones.

So if subpart_cpus keeps offlined CPUs, they will be subtracted from
possible_mask and absent in the resulting new_cpus, i.e. undesirable for
the tasks in that cpuset :-/

Michal


Attachments:
(No filename) (1.09 kB)
signature.asc (228.00 B)
Download all attachments

2023-03-17 18:06:26

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated


On 3/17/23 14:01, Michal Koutný wrote:
> Hello.
>
> On Fri, Mar 17, 2023 at 11:15:07AM -0400, Waiman Long <[email protected]> wrote:
>> * Iterate through each task of @cs updating its cpus_allowed to the
>> * effective cpuset's. As this function is called with cpuset_rwsem held,
>> - * cpuset membership stays stable.
>> + * cpuset membership stays stable. For top_cpuset, task_cpu_possible_mask()
>> + * is used instead of effective_cpus to make sure all offline CPUs are also
>> + * included as hotplug code won't update cpumasks for tasks in top_cpuset.
>> */
> On Wed, Mar 15, 2023 at 11:06:20AM +0100, Michal Koutný <[email protected]> wrote:
>> I see now that it returns offlined cpus to top cpuset's tasks.
> I considered only the "base" set change cs->effective_cpus ->
> possible_mask. (Apologies for that mistake.)
>
> However, I now read the note about subparts_cpus
>
>> * effective_cpus contains only onlined CPUs, but subparts_cpus
>> * may have offlined ones.
> So if subpart_cpus keeps offlined CPUs, they will be subtracted from
> possible_mask and absent in the resulting new_cpus, i.e. undesirable for
> the tasks in that cpuset :-/

A cpu will be in the subparts_cpus only if it has been given to the
child partition. So when it becomes online, it will become part of the
scheduling domain that child partition. Only the tasks in that child
partition will get their cpumasks updated to use it, not those in the
top cpuset.

Cheers,
Longman


2023-03-20 15:09:45

by Michal Koutný

[permalink] [raw]
Subject: Re: [PATCH v2 3/4] cgroup/cpuset: Include offline CPUs when tasks' cpumasks in top_cpuset are updated

On Fri, Mar 17, 2023 at 02:05:32PM -0400, Waiman Long <[email protected]> wrote:
> A cpu will be in the subparts_cpus only if it has been given to the child
> partition. So when it becomes online, it will become part of the scheduling
> domain that child partition. Only the tasks in that child partition will get
> their cpumasks updated to use it, not those in the top cpuset.

Right, it's actually the difference between offlining a CPU and giving it
to a sub-partition (hence a removed child (or switched to member) before
CPU onlining). It's clear to me now.

Thanks,
Michal


Attachments:
(No filename) (581.00 B)
signature.asc (228.00 B)
Download all attachments

2023-03-28 13:47:23

by Will Deacon

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] cgroup/cpuset: Miscellaneous updates

Hi Waiman,

On Fri, Mar 17, 2023 at 11:15:04AM -0400, Waiman Long wrote:
> v2:
> - Add a new patch 1 that fixes a bug introduced by recent v6.2 commit
> 7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate
> mask for task").
> - Make a small twist and additional comment to patch 2 ("cgroup/cpuset:
> Skip task update if hotplug doesn't affect current cpuset") as
> suggested by Michal.
> - Remove v1 patches 3/4 for now for further discussion.
>
> This patch series includes miscellaneous update to the cpuset and its
> testing code.

FWIW, this series also passes my asymmetric 32-bit tests.

Cheers,

Will

2023-03-29 01:18:55

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v2 0/4] cgroup/cpuset: Miscellaneous updates

On 3/28/23 09:40, Will Deacon wrote:
> Hi Waiman,
>
> On Fri, Mar 17, 2023 at 11:15:04AM -0400, Waiman Long wrote:
>> v2:
>> - Add a new patch 1 that fixes a bug introduced by recent v6.2 commit
>> 7a2127e66a00 ("cpuset: Call set_cpus_allowed_ptr() with appropriate
>> mask for task").
>> - Make a small twist and additional comment to patch 2 ("cgroup/cpuset:
>> Skip task update if hotplug doesn't affect current cpuset") as
>> suggested by Michal.
>> - Remove v1 patches 3/4 for now for further discussion.
>>
>> This patch series includes miscellaneous update to the cpuset and its
>> testing code.
> FWIW, this series also passes my asymmetric 32-bit tests.

Thanks Will!

Tejun, do you have time to take a look at this series, especially the
first patch which is a fix that may need to go to stable?

Cheers,
Longman