Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user provided CPU affinity via sched_setaffinity(2) is
perserved even if the task is being moved to a different cpuset. However,
that affinity is also being inherited by any subsequently created child
processes which may not want or be aware of that affinity.
One way to solve this problem is to provide a way to back off from
that user provided CPU affinity. This patch implements such a scheme
by using an empty cpumask to signal a reset of the cpumasks to the
default as allowed by the current cpuset.
Before this patch, passing in an empty cpumask to sched_setaffinity(2)
will always return an EINVAL error. With this patch, an error will no
longer be returned if sched_setaffinity(2) has been called before to
set up user_cpus_ptr. Instead, the user_cpus_ptr that stores the user
provided affinity will be cleared and the task's CPU affinity will be
reset to that of the current cpuset. No error will be returned in this
case to signal that a reset has happened.
If sched_setaffinity(2) has not been called previously, an EINVAL error
will be returned with an empty cpumask just like before. As a result,
tests or tools that rely on this behavior will not be affected unless
they have somehow called sched_setaffinity(2) before.
We will have to update the sched_setaffinity(2) manpage to document
this possible side effect of passing in an empty cpumask.
Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 35 ++++++++++++++++++++++++++---------
1 file changed, 26 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index c52c2eba7c73..9c4c27d4d4a9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8317,7 +8317,12 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
}
cpuset_cpus_allowed(p, cpus_allowed);
- cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+
+ /* Default to cpus_allowed with NULL new_mask */
+ if (ctx->new_mask)
+ cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+ else
+ cpumask_copy(new_mask, cpus_allowed);
ctx->new_mask = new_mask;
ctx->flags |= SCA_CHECK;
@@ -8366,6 +8371,7 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
{
+ bool reset_cpumasks = cpumask_empty(in_mask);
struct affinity_context ac;
struct cpumask *user_mask;
struct task_struct *p;
@@ -8403,15 +8409,26 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
goto out_put_task;
/*
- * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
- * alloc_user_cpus_ptr() returns NULL.
+ * If an empty cpumask is passed in and user_cpus_ptr is set,
+ * clear user_cpus_ptr and reset the current cpu affinity to the
+ * default for the current cpuset. If user_cpus_ptr isn't set,
+ * -EINVAL will be returned as before.
*/
- user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
- if (user_mask) {
- cpumask_copy(user_mask, in_mask);
- } else if (IS_ENABLED(CONFIG_SMP)) {
- retval = -ENOMEM;
- goto out_put_task;
+ if (reset_cpumasks && p->user_cpus_ptr) {
+ in_mask = NULL; /* To be updated in __sched_setaffinity */
+ user_mask = NULL;
+ } else {
+ /*
+ * With non-SMP configs, user_cpus_ptr/user_mask isn't used
+ * and alloc_user_cpus_ptr() returns NULL.
+ */
+ user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+ if (user_mask) {
+ cpumask_copy(user_mask, in_mask);
+ } else if (IS_ENABLED(CONFIG_SMP)) {
+ retval = -ENOMEM;
+ goto out_put_task;
+ }
}
ac = (struct affinity_context){
--
2.31.1
On Mon, Jul 17, 2023 at 02:02:43PM -0400, Waiman Long wrote:
> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> perserved even if the task is being moved to a different cpuset. However,
> that affinity is also being inherited by any subsequently created child
> processes which may not want or be aware of that affinity.
>
> One way to solve this problem is to provide a way to back off from
> that user provided CPU affinity. This patch implements such a scheme
> by using an empty cpumask to signal a reset of the cpumasks to the
> default as allowed by the current cpuset.
>
> Before this patch, passing in an empty cpumask to sched_setaffinity(2)
> will always return an EINVAL error. With this patch, an error will no
> longer be returned if sched_setaffinity(2) has been called before to
> set up user_cpus_ptr. Instead, the user_cpus_ptr that stores the user
> provided affinity will be cleared and the task's CPU affinity will be
> reset to that of the current cpuset. No error will be returned in this
> case to signal that a reset has happened.
>
> If sched_setaffinity(2) has not been called previously, an EINVAL error
> will be returned with an empty cpumask just like before. As a result,
> tests or tools that rely on this behavior will not be affected unless
> they have somehow called sched_setaffinity(2) before.
>
> We will have to update the sched_setaffinity(2) manpage to document
> this possible side effect of passing in an empty cpumask.
So a normal task, that hasn't had it's affinity changed will have
possible_mask.
So why not use in_mask == possible_mask to clear the user state?
Hello,
kernel test robot noticed "kernel-selftests.vDSO.vdso_test_correctness.fail" on:
commit: 8bd21fae1e37284f0f78fa42fb4d20a6cadfa68a ("[PATCH v2] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()")
url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/sched-core-Use-empty-mask-to-reset-cpumasks-in-sched_setaffinity/20230718-194823
base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2ae2fb98b77339277a2c2f18dfec605dfd8dd321
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v2] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:
group: group-03
compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]
# timeout set to 300
# selftests: vDSO: vdso_test_correctness
# [RUN] Testing clock_gettime for clock CLOCK_REALTIME (0)...
# 1690070252.747617798 1690070252.747630478 1690070252.747630848
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_MONOTONIC (1)...
# 736.920352329 736.920352966 736.920353262
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_PROCESS_CPUTIME_ID (2)...
# 0.019803172 0.019806151 0.019807948
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_THREAD_CPUTIME_ID (3)...
# 0.019810633 0.019812071 0.019813400
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_MONOTONIC_RAW (4)...
# 736.105961155 736.105961651 736.105961997
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_REALTIME_COARSE (5)...
# 1690070252.747112013 1690070252.747112013 1690070252.747112013
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_MONOTONIC_COARSE (6)...
# 736.919830840 736.919830840 736.919830840
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_BOOTTIME (7)...
# 736.920375040 736.920375446 736.920375746
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_REALTIME_ALARM (8)...
# 1690070252.747659265 1690070252.747660189 1690070252.747661034
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_BOOTTIME_ALARM (9)...
# 736.920381384 736.920382382 736.920383280
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock CLOCK_SGI_CYCLE (10)...
# [OK] No such clock.
# [RUN] Testing clock_gettime for clock CLOCK_TAI (11)...
# 1690070252.747666904 1690070252.747667314 1690070252.747667593
# [OK] Test Passed.
# [RUN] Testing clock_gettime for clock invalid (-1)...
# [OK] No such clock.
# [RUN] Testing clock_gettime for clock invalid (-2147483648)...
# [OK] No such clock.
# [RUN] Testing clock_gettime for clock invalid (2147483647)...
# [OK] No such clock.
# [SKIP] No vDSO, so skipping clock_gettime64() tests
# [RUN] Testing gettimeofday...
# 1690070252.747671 1690070252.747672 1690070252.747673
# [OK] timezones match: minuteswest=0, dsttime=0
# [RUN] Testing getcpu...
# [OK] CPU 0: syscall: cpu 0, node 0 vdso: cpu 0, node 0 vsyscall: cpu 0, node 0
# [OK] CPU 1: syscall: cpu 1, node 0 vdso: cpu 1, node 0 vsyscall: cpu 1, node 0
# [OK] CPU 2: syscall: cpu 2, node 0 vdso: cpu 2, node 0 vsyscall: cpu 2, node 0
# [OK] CPU 3: syscall: cpu 3, node 0 vdso: cpu 3, node 0 vsyscall: cpu 3, node 0
# [OK] CPU 4: syscall: cpu 4, node 0 vdso: cpu 4, node 0 vsyscall: cpu 4, node 0
# [OK] CPU 5: syscall: cpu 5, node 0 vdso: cpu 5, node 0 vsyscall: cpu 5, node 0
# [OK] CPU 6: syscall: cpu 6, node 0 vdso: cpu 6, node 0 vsyscall: cpu 6, node 0
# [OK] CPU 7: syscall: cpu 7, node 0 vdso: cpu 7, node 0 vsyscall: cpu 7, node 0
# [OK] CPU 8: syscall: cpu 8, node 0 vdso: cpu 8, node 0 vsyscall: cpu 8, node 0
# [OK] CPU 9: syscall: cpu 9, node 0 vdso: cpu 9, node 0 vsyscall: cpu 9, node 0
# [OK] CPU 10: syscall: cpu 10, node 0 vdso: cpu 10, node 0 vsyscall: cpu 10, node 0
# [OK] CPU 11: syscall: cpu 11, node 0 vdso: cpu 11, node 0 vsyscall: cpu 11, node 0
# [OK] CPU 12: syscall: cpu 12, node 0 vdso: cpu 12, node 0 vsyscall: cpu 12, node 0
# [OK] CPU 13: syscall: cpu 13, node 0 vdso: cpu 13, node 0 vsyscall: cpu 13, node 0
# [OK] CPU 14: syscall: cpu 14, node 0 vdso: cpu 14, node 0 vsyscall: cpu 14, node 0
# [OK] CPU 15: syscall: cpu 15, node 0 vdso: cpu 15, node 0 vsyscall: cpu 15, node 0
# [OK] CPU 16: syscall: cpu 16, node 0 vdso: cpu 16, node 0 vsyscall: cpu 16, node 0
# [OK] CPU 17: syscall: cpu 17, node 0 vdso: cpu 17, node 0 vsyscall: cpu 17, node 0
# [OK] CPU 18: syscall: cpu 18, node 0 vdso: cpu 18, node 0 vsyscall: cpu 18, node 0
# [OK] CPU 19: syscall: cpu 19, node 0 vdso: cpu 19, node 0 vsyscall: cpu 19, node 0
# [OK] CPU 20: syscall: cpu 20, node 0 vdso: cpu 20, node 0 vsyscall: cpu 20, node 0
# [OK] CPU 21: syscall: cpu 21, node 0 vdso: cpu 21, node 0 vsyscall: cpu 21, node 0
# [OK] CPU 22: syscall: cpu 22, node 0 vdso: cpu 22, node 0 vsyscall: cpu 22, node 0
# [OK] CPU 23: syscall: cpu 23, node 0 vdso: cpu 23, node 0 vsyscall: cpu 23, node 0
# [OK] CPU 24: syscall: cpu 24, node 0 vdso: cpu 24, node 0 vsyscall: cpu 24, node 0
# [OK] CPU 25: syscall: cpu 25, node 0 vdso: cpu 25, node 0 vsyscall: cpu 25, node 0
# [OK] CPU 26: syscall: cpu 26, node 0 vdso: cpu 26, node 0 vsyscall: cpu 26, node 0
# [OK] CPU 27: syscall: cpu 27, node 0 vdso: cpu 27, node 0 vsyscall: cpu 27, node 0
# [OK] CPU 28: syscall: cpu 28, node 0 vdso: cpu 28, node 0 vsyscall: cpu 28, node 0
# [OK] CPU 29: syscall: cpu 29, node 0 vdso: cpu 29, node 0 vsyscall: cpu 29, node 0
# [OK] CPU 30: syscall: cpu 30, node 0 vdso: cpu 30, node 0 vsyscall: cpu 30, node 0
# [OK] CPU 31: syscall: cpu 31, node 0 vdso: cpu 31, node 0 vsyscall: cpu 31, node 0
# [OK] CPU 32: syscall: cpu 32, node 0 vdso: cpu 32, node 0 vsyscall: cpu 32, node 0
# [OK] CPU 33: syscall: cpu 33, node 0 vdso: cpu 33, node 0 vsyscall: cpu 33, node 0
# [OK] CPU 34: syscall: cpu 34, node 0 vdso: cpu 34, node 0 vsyscall: cpu 34, node 0
# [OK] CPU 35: syscall: cpu 35, node 0 vdso: cpu 35, node 0 vsyscall: cpu 35, node 0
# [FAIL] CPU 36: syscall: cpu 35, node 0 vdso: cpu 35, node 0 vsyscall: cpu 35, node 0
not ok 6 selftests: vDSO: vdso_test_correctness # exit=1
To reproduce:
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On 7/25/23 09:07, kernel test robot wrote:
>
> Hello,
>
> kernel test robot noticed "kernel-selftests.vDSO.vdso_test_correctness.fail" on:
>
> commit: 8bd21fae1e37284f0f78fa42fb4d20a6cadfa68a ("[PATCH v2] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()")
> url: https://github.com/intel-lab-lkp/linux/commits/Waiman-Long/sched-core-Use-empty-mask-to-reset-cpumasks-in-sched_setaffinity/20230718-194823
> base: https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git 2ae2fb98b77339277a2c2f18dfec605dfd8dd321
> patch link: https://lore.kernel.org/all/[email protected]/
> patch subject: [PATCH v2] sched/core: Use empty mask to reset cpumasks in sched_setaffinity()
>
> in testcase: kernel-selftests
> version: kernel-selftests-x86_64-60acb023-1_20230329
> with following parameters:
>
> group: group-03
>
>
>
> compiler: gcc-12
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
> # [OK] CPU 34: syscall: cpu 34, node 0 vdso: cpu 34, node 0 vsyscall: cpu 34, node 0
> # [OK] CPU 35: syscall: cpu 35, node 0 vdso: cpu 35, node 0 vsyscall: cpu 35, node 0
> # [FAIL] CPU 36: syscall: cpu 35, node 0 vdso: cpu 35, node 0 vsyscall: cpu 35, node 0
> not ok 6 selftests: vDSO: vdso_test_correctness # exit=1
I have figured out what this fail. The vdso_test_correctness.c test
calls sched_setaffinity() from cpu 0 and up until it hits an error
return. There are 36 CPUs in the test system and once the CPU number hit
36, the cpumask_empty() test in sched_setaffinity() returns true causing
a non-error return.
It seems like I need to always return an error code to not break
existing test program, but perhaps with a different error code to
indicate some action is done.
Cheers,
Longman
On 7/21/23 05:42, Peter Zijlstra wrote:
> On Mon, Jul 17, 2023 at 02:02:43PM -0400, Waiman Long wrote:
>> Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
>> cpumask"), user provided CPU affinity via sched_setaffinity(2) is
>> perserved even if the task is being moved to a different cpuset. However,
>> that affinity is also being inherited by any subsequently created child
>> processes which may not want or be aware of that affinity.
>>
>> One way to solve this problem is to provide a way to back off from
>> that user provided CPU affinity. This patch implements such a scheme
>> by using an empty cpumask to signal a reset of the cpumasks to the
>> default as allowed by the current cpuset.
>>
>> Before this patch, passing in an empty cpumask to sched_setaffinity(2)
>> will always return an EINVAL error. With this patch, an error will no
>> longer be returned if sched_setaffinity(2) has been called before to
>> set up user_cpus_ptr. Instead, the user_cpus_ptr that stores the user
>> provided affinity will be cleared and the task's CPU affinity will be
>> reset to that of the current cpuset. No error will be returned in this
>> case to signal that a reset has happened.
>>
>> If sched_setaffinity(2) has not been called previously, an EINVAL error
>> will be returned with an empty cpumask just like before. As a result,
>> tests or tools that rely on this behavior will not be affected unless
>> they have somehow called sched_setaffinity(2) before.
>>
>> We will have to update the sched_setaffinity(2) manpage to document
>> this possible side effect of passing in an empty cpumask.
> So a normal task, that hasn't had it's affinity changed will have
> possible_mask.
>
> So why not use in_mask == possible_mask to clear the user state?
It is not straight forward for a user application to figure what exactly
is possible_mask. Using a empty mask, however, is much easier with the
CPU_ZERO() macro.
In many cases, tasks may be running under a cpuset with cpu list that is
a proper subset of possible_mask. Since possible_mask is a valid
cpumask, we can't use it to reset to the cpuset's default which is what
this patch is.
Cheers,
Longman