2023-10-03 20:58:55

by Waiman Long

[permalink] [raw]
Subject: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()

Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask"), user provided CPU affinity via sched_setaffinity(2) is
perserved even if the task is being moved to a different cpuset. However,
that affinity is also being inherited by any subsequently created child
processes which may not want or be aware of that affinity.

One way to solve this problem is to provide a way to back off from that
user provided CPU affinity. This patch implements such a scheme by
using an input cpumask length of 0 to signal a reset of the cpumasks
to the default as allowed by the current cpuset. A non-NULL cpumask
should still be provided to avoid problem with older kernel.

If sched_setaffinity(2) has been called previously to set a user
supplied cpumask, a value of 0 will be returned to indicate success.
Otherwise, an error value of -EINVAL will be returned.

We may have to update the sched_setaffinity(2) manpage to document
this new side effect of passing in an input length of 0.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 43 ++++++++++++++++++++++++++++++++++---------
1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 802551e0009b..a10d507a05df 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -8315,7 +8315,12 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
}

cpuset_cpus_allowed(p, cpus_allowed);
- cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+
+ /* Default to cpus_allowed with NULL new_mask */
+ if (ctx->new_mask)
+ cpumask_and(new_mask, ctx->new_mask, cpus_allowed);
+ else
+ cpumask_copy(new_mask, cpus_allowed);

ctx->new_mask = new_mask;
ctx->flags |= SCA_CHECK;
@@ -8401,15 +8406,29 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
goto out_put_task;

/*
- * With non-SMP configs, user_cpus_ptr/user_mask isn't used and
- * alloc_user_cpus_ptr() returns NULL.
+ * If a NULL cpumask is passed in and user_cpus_ptr is set,
+ * clear user_cpus_ptr and reset the current cpu affinity to the
+ * default for the current cpuset. If user_cpus_ptr isn't set,
+ * -EINVAL will be returned.
*/
- user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
- if (user_mask) {
- cpumask_copy(user_mask, in_mask);
- } else if (IS_ENABLED(CONFIG_SMP)) {
- retval = -ENOMEM;
- goto out_put_task;
+ if (!in_mask) {
+ if (!p->user_cpus_ptr) {
+ retval = -EINVAL;
+ goto out_put_task;
+ }
+ user_mask = NULL;
+ } else {
+ /*
+ * With non-SMP configs, user_cpus_ptr/user_mask isn't used
+ * and alloc_user_cpus_ptr() returns NULL.
+ */
+ user_mask = alloc_user_cpus_ptr(NUMA_NO_NODE);
+ if (user_mask) {
+ cpumask_copy(user_mask, in_mask);
+ } else if (IS_ENABLED(CONFIG_SMP)) {
+ retval = -ENOMEM;
+ goto out_put_task;
+ }
}

ac = (struct affinity_context){
@@ -8451,6 +8470,12 @@ SYSCALL_DEFINE3(sched_setaffinity, pid_t, pid, unsigned int, len,
cpumask_var_t new_mask;
int retval;

+ /*
+ * A len of 0 will reset a previously set user cpumask.
+ */
+ if (!len)
+ return sched_setaffinity(pid, NULL);
+
if (!alloc_cpumask_var(&new_mask, GFP_KERNEL))
return -ENOMEM;

--
2.39.3


2023-10-04 09:24:00

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()


* Peter Zijlstra <[email protected]> wrote:

> On Tue, Oct 03, 2023 at 04:57:35PM -0400, Waiman Long wrote:
> > Since commit 8f9ea86fdf99 ("sched: Always preserve the user requested
> > cpumask"), user provided CPU affinity via sched_setaffinity(2) is
> > perserved even if the task is being moved to a different cpuset. However,
> > that affinity is also being inherited by any subsequently created child
> > processes which may not want or be aware of that affinity.
> >
> > One way to solve this problem is to provide a way to back off from that
> > user provided CPU affinity. This patch implements such a scheme by
> > using an input cpumask length of 0 to signal a reset of the cpumasks
> > to the default as allowed by the current cpuset. A non-NULL cpumask
> > should still be provided to avoid problem with older kernel.
> >
> > If sched_setaffinity(2) has been called previously to set a user
> > supplied cpumask, a value of 0 will be returned to indicate success.
> > Otherwise, an error value of -EINVAL will be returned.
> >
> > We may have to update the sched_setaffinity(2) manpage to document
> > this new side effect of passing in an input length of 0.
>
> Bah.. so while this is less horrible than some of the previous hacks,
> but I still think an all set mask is the sanest option.
>
> Adding FreeBSD's CPU_FILL() to glibc() isn't the hardest thing ever, but
> even without that, it's a single memset() away.
>
>
> Would not the below two patches, one kernel, one glibc, be all it takes?

I'd much prefer this ABI variant, it's a pretty natural extension of the
existing ABI and principles:

> if (user_mask) {
> - cpumask_copy(user_mask, in_mask);
> + /*
> + * All-set user cpumask resets affinity and drops the explicit
> + * user mask.
> + */
> + cpumask_and(user_mask, in_mask, cpu_possible_mask);
> + if (cpumask_equal(user_mask, cpu_possible_mask)) {
> + kfree(user_mask);
> + user_mask = NULL;
> + }

Question: is there any observable behavioral difference between current
(old) all-set cpumask calls and the patched (new) one?

Thanks,

Ingo

2023-10-04 10:06:47

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()


* Peter Zijlstra <[email protected]> wrote:

> On Wed, Oct 04, 2023 at 11:23:41AM +0200, Ingo Molnar wrote:
>
> > > if (user_mask) {
> > > - cpumask_copy(user_mask, in_mask);
> > > + /*
> > > + * All-set user cpumask resets affinity and drops the explicit
> > > + * user mask.
> > > + */
> > > + cpumask_and(user_mask, in_mask, cpu_possible_mask);
> > > + if (cpumask_equal(user_mask, cpu_possible_mask)) {
> > > + kfree(user_mask);
> > > + user_mask = NULL;
> > > + }
> >
> > Question: is there any observable behavioral difference between current
> > (old) all-set cpumask calls and the patched (new) one?
>
> Very little I think -- the main difference is that we no longer carry
> the ->user_cpus_ptr mask around, and that saves a little masking.

So calling with a full mask would actually work fine on 'old' kernels too,
as it's a 'reset' event in essence. (With a bit of allocation & masking
overhead.)

This pretty unambiguously marks the full-mask solution as the superior ABI ...

Thanks,

Ingo

2023-10-04 12:21:15

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()

On 10/4/23 06:06, Ingo Molnar wrote:
> * Peter Zijlstra <[email protected]> wrote:
>
>> On Wed, Oct 04, 2023 at 11:23:41AM +0200, Ingo Molnar wrote:
>>
>>>> if (user_mask) {
>>>> - cpumask_copy(user_mask, in_mask);
>>>> + /*
>>>> + * All-set user cpumask resets affinity and drops the explicit
>>>> + * user mask.
>>>> + */
>>>> + cpumask_and(user_mask, in_mask, cpu_possible_mask);
>>>> + if (cpumask_equal(user_mask, cpu_possible_mask)) {
>>>> + kfree(user_mask);
>>>> + user_mask = NULL;
>>>> + }
>>> Question: is there any observable behavioral difference between current
>>> (old) all-set cpumask calls and the patched (new) one?
>> Very little I think -- the main difference is that we no longer carry
>> the ->user_cpus_ptr mask around, and that saves a little masking.
> So calling with a full mask would actually work fine on 'old' kernels too,
> as it's a 'reset' event in essence. (With a bit of allocation & masking
> overhead.)
>
> This pretty unambiguously marks the full-mask solution as the superior ABI ...

I am fine with that one too. I do have a little bit concern about that
the difference in behavior when the full mask is passed in, but that is
reverting to the old behavior before commit 8f9ea86fdf99 ("sched: Always
preserve the user requested cpumask").

BTW, we can probably check the in_mask directly earlier to skip an
unnecessary cpumask allocation and free in this particular case.

Cheers,
Longman

2023-10-04 12:35:12

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()

* Peter Zijlstra:

> Subject: sched: Add CPU_FILL()
>
> Add the CPU_FILL() macros to easily create an all-set cpumask.
>
> FreeBSD also provides this macro with this semantic.
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>

My main concer is that obtaining the size of the mask, or at least an
approximiation is not exactly easy. If there's an expectation that
applications reset the mask more often than they do today (I don't have
the full context here), then we'd some decent interface to get the
approriate size.

Thanks,
Florian

2023-10-04 12:42:52

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()


On 10/4/23 08:34, Florian Weimer wrote:
> * Peter Zijlstra:
>
>> Subject: sched: Add CPU_FILL()
>>
>> Add the CPU_FILL() macros to easily create an all-set cpumask.
>>
>> FreeBSD also provides this macro with this semantic.
>>
>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> My main concer is that obtaining the size of the mask, or at least an
> approximiation is not exactly easy. If there's an expectation that
> applications reset the mask more often than they do today (I don't have
> the full context here), then we'd some decent interface to get the
> approriate size.

I believe the macro just use sizeof(cpu_set_t) as the size of the
bitmask. It is the same case as in CPU_ZERO().

Regards,
Longman

2023-10-04 12:56:27

by Florian Weimer

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()

* Waiman Long:

> On 10/4/23 08:34, Florian Weimer wrote:
>> * Peter Zijlstra:
>>
>>> Subject: sched: Add CPU_FILL()
>>>
>>> Add the CPU_FILL() macros to easily create an all-set cpumask.
>>>
>>> FreeBSD also provides this macro with this semantic.
>>>
>>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>> My main concer is that obtaining the size of the mask, or at least an
>> approximiation is not exactly easy. If there's an expectation that
>> applications reset the mask more often than they do today (I don't have
>> the full context here), then we'd some decent interface to get the
>> approriate size.
>
> I believe the macro just use sizeof(cpu_set_t) as the size of the
> bitmask. It is the same case as in CPU_ZERO().

I mean the CPU_FILL_S macro also defined in the patch. Correctly
written applications should not use CPU_FILL and statically sized CPU
sets.

Thanks,
Florian

2023-10-04 16:26:13

by Waiman Long

[permalink] [raw]
Subject: Re: [PATCH v4] sched/core: Use zero length to reset cpumasks in sched_setaffinity()


On 10/4/23 08:55, Florian Weimer wrote:
> * Waiman Long:
>
>> On 10/4/23 08:34, Florian Weimer wrote:
>>> * Peter Zijlstra:
>>>
>>>> Subject: sched: Add CPU_FILL()
>>>>
>>>> Add the CPU_FILL() macros to easily create an all-set cpumask.
>>>>
>>>> FreeBSD also provides this macro with this semantic.
>>>>
>>>> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
>>> My main concer is that obtaining the size of the mask, or at least an
>>> approximiation is not exactly easy. If there's an expectation that
>>> applications reset the mask more often than they do today (I don't have
>>> the full context here), then we'd some decent interface to get the
>>> approriate size.
>> I believe the macro just use sizeof(cpu_set_t) as the size of the
>> bitmask. It is the same case as in CPU_ZERO().
> I mean the CPU_FILL_S macro also defined in the patch. Correctly
> written applications should not use CPU_FILL and statically sized CPU
> sets.

Right, that can be a problem. If the input bitmask size is less than
cpumask_size(), CPU_FILL_S() won't work to reset the cpumask. In fact,
it will treat that bitmask just like a regular sched_setaffinity() call
and set it accordingly.

With that caveat, I would prefer to keep using a length of 0 for the
reset then.

Cheers,
Longman