2022-09-16 18:45:04

by Waiman Long

Subject: [PATCH v9 0/7] sched: Persistent user requested affinity

v9:
- Fix non-SMP config build errors in patch 4 reported by kernel test
robot.
- Also disable user_cpus_ptr masking when any of the SCA_MIGRATE_*
flags are set.

v8:
- Add patches "sched: Introduce affinity_context structure" and
"sched: Always clear user_cpus_ptr in do_set_cpus_allowed()" as
suggested by PeterZ.
- Better handle cpuset and sched_setaffinity() race in patch 5.

v7:
- Make the changes recommended by Peter, such as allocating
scratch_mask earlier and using a percpu variable to pass information.

v6:
- Drop the cpuset patch and make user_cpus_ptr masking applicable to
all callers of set_cpus_allowed_ptr().
- Update user_cpus_ptr and cpus_mask in a single lock critical
section to avoid race problems.

The user_cpus_ptr field was added by commit b90ca8badbd1 ("sched:
Introduce task_struct::user_cpus_ptr to track requested affinity"),
which uses it narrowly to keep a task's CPU affinity intact on
asymmetric CPU setups.
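
For context, the field sits next to the scheduler's effective affinity
mask in task_struct. A simplified excerpt (member layout abridged from
mainline; surrounding fields elided):

  struct task_struct {
          /* ... */
          const cpumask_t  *cpus_ptr;       /* effective mask the scheduler uses */
          cpumask_t        *user_cpus_ptr;  /* saved user-requested mask, NULL until set */
          cpumask_t        cpus_mask;       /* storage cpus_ptr normally points at */
          /* ... */
  };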

This patchset extends user_cpus_ptr to store the CPU affinity requested
by the user via the sched_setaffinity() API. With that information
available, cpuset and other callers of set_cpus_allowed_ptr(), such as
CPU hotplug, can keep a task's CPU affinity as close to what the user
requested as possible within the CPU list constraint of the current
cpuset. Otherwise, a change to the cpuset hierarchy or a hotplug event
may reset the cpumask of the tasks in the affected cpusets to the
default cpuset value, even if those tasks had their CPU affinity
explicitly set by the user beforehand.

It also means that once sched_setaffinity() is called successfully,
user_cpus_ptr will remain allocated until the task exits, except in
some rare circumstances.
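
As a concrete illustration of the user-facing behavior, here is a
minimal userspace sketch (not part of the series; the CPU number is
chosen arbitrarily) that pins the calling thread and relies on the
kernel remembering the request:

  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>

  int main(void)
  {
          cpu_set_t set;

          CPU_ZERO(&set);
          CPU_SET(1, &set);       /* request CPU 1 only */

          /* pid 0 means the calling thread */
          if (sched_setaffinity(0, sizeof(set), &set)) {
                  perror("sched_setaffinity");
                  return 1;
          }

          /*
           * With this series applied, the requested mask is saved in
           * user_cpus_ptr, so a later cpuset change or hotplug event
           * will try to keep the task on CPU 1 (or restore it there)
           * instead of resetting it to the cpuset default.
           */
          return 0;
  }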

Waiman Long (7):
sched: Add __releases annotations to affine_move_task()
sched: Use user_cpus_ptr for saving user provided cpumask in
sched_setaffinity()
sched: Enforce user requested affinity
sched: Introduce affinity_context structure
sched: Handle set_cpus_allowed_ptr() & sched_setaffinity() race
sched: Fix sched_setaffinity() and fork/clone() race
sched: Always clear user_cpus_ptr in do_set_cpus_allowed()

kernel/sched/core.c | 231 +++++++++++++++++++++++++---------------
kernel/sched/deadline.c | 7 +-
kernel/sched/sched.h | 22 +++-
3 files changed, 169 insertions(+), 91 deletions(-)

--
2.31.1


2022-09-16 19:24:07

by Waiman Long

Subject: [PATCH v9 5/7] sched: Handle set_cpus_allowed_ptr() & sched_setaffinity() race

A race is possible between set_cpus_allowed_ptr() and sched_setaffinity(),
or between multiple sched_setaffinity() calls from different CPUs. To
resolve these race conditions, update both user_cpus_ptr and cpus_mask
in a single lock critical section instead of in separate ones. This
requires moving the user_cpus_ptr update into set_cpus_allowed_common()
by putting the user_mask into the affinity_context structure.

This patch also changes the handling of the race between a
sched_setaffinity() call and a change to the cpumask of the current
cpuset. If the new mask conflicts with the newly updated cpuset, the
cpus_mask will be reset to the cpuset cpumask and an error value of
-EINVAL will be returned. If a previous user_cpus_ptr value exists,
it will be swapped back in and the new_mask will be further restricted
to what is allowed by the cpumask pointed to by the old user_cpus_ptr.

Signed-off-by: Waiman Long <[email protected]>
---
kernel/sched/core.c | 44 +++++++++++++++++++++++++++-----------------
kernel/sched/sched.h | 1 +
2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b662d8ddc169..c748e56ba254 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2546,6 +2546,12 @@ void set_cpus_allowed_common(struct task_struct *p, struct affinity_context *ctx

cpumask_copy(&p->cpus_mask, ctx->new_mask);
p->nr_cpus_allowed = cpumask_weight(ctx->new_mask);
+
+ /*
+ * Swap in a new user_cpus_ptr if SCA_USER flag set
+ */
+ if (ctx->flags & SCA_USER)
+ swap(p->user_cpus_ptr, ctx->user_mask);
}

static void
@@ -8104,7 +8110,7 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
retval = dl_task_check_affinity(p, new_mask);
if (retval)
goto out_free_new_mask;
-again:
+
retval = __set_cpus_allowed_ptr(p, ctx);
if (retval)
goto out_free_new_mask;
@@ -8116,7 +8122,24 @@ __sched_setaffinity(struct task_struct *p, struct affinity_context *ctx)
* Just reset the cpumask to the cpuset's cpus_allowed.
*/
cpumask_copy(new_mask, cpus_allowed);
- goto again;
+
+ /*
+ * If SCA_USER is set, a 2nd call to __set_cpus_allowed_ptr()
+ * will restore the previous user_cpus_ptr value.
+ *
+ * In the unlikely event a previous user_cpus_ptr exists,
+ * we need to further restrict the mask to what is allowed
+ * by that old user_cpus_ptr.
+ */
+ if (unlikely((ctx->flags & SCA_USER) && ctx->user_mask)) {
+ bool empty = !cpumask_and(new_mask, new_mask,
+ ctx->user_mask);
+
+ if (WARN_ON_ONCE(empty))
+ cpumask_copy(new_mask, cpus_allowed);
+ }
+ __set_cpus_allowed_ptr(p, ctx);
+ retval = -EINVAL;
}

out_free_new_mask:
@@ -8172,25 +8195,12 @@ long sched_setaffinity(pid_t pid, const struct cpumask *in_mask)
cpumask_copy(user_mask, in_mask);
ac = (struct affinity_context){
.new_mask = in_mask,
+ .user_mask = user_mask,
.flags = SCA_USER,
};

retval = __sched_setaffinity(p, &ac);
-
- /*
- * Save in_mask into user_cpus_ptr after a successful
- * __sched_setaffinity() call. pi_lock is used to synchronize
- * changes to user_cpus_ptr.
- */
- if (!retval) {
- unsigned long flags;
-
- /* Use pi_lock to synchronize changes to user_cpus_ptr */
- raw_spin_lock_irqsave(&p->pi_lock, flags);
- swap(p->user_cpus_ptr, user_mask);
- raw_spin_unlock_irqrestore(&p->pi_lock, flags);
- }
- kfree(user_mask);
+ kfree(ac.user_mask);

out_put_task:
put_task_struct(p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1927c02f68fa..110e13b7d78b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2159,6 +2159,7 @@ extern const u32 sched_prio_to_wmult[40];

struct affinity_context {
const struct cpumask *new_mask;
+ struct cpumask *user_mask;
unsigned int flags;
};

--
2.31.1

2022-09-22 09:16:47

by Peter Zijlstra

Subject: Re: [PATCH v9 0/7] sched: Persistent user requested affinity

On Fri, Sep 16, 2022 at 02:32:10PM -0400, Waiman Long wrote:

> Waiman Long (7):
> sched: Add __releases annotations to affine_move_task()
> sched: Use user_cpus_ptr for saving user provided cpumask in
> sched_setaffinity()
> sched: Enforce user requested affinity
> sched: Introduce affinity_context structure
> sched: Handle set_cpus_allowed_ptr() & sched_setaffinity() race
> sched: Fix sched_setaffinity() and fork/clone() race
> sched: Always clear user_cpus_ptr in do_set_cpus_allowed()

I still think this is ordered all wrong. Esp. with that affinity
context, doing the right thing isn't onerous at all.

So please, re-order this and fold all fixes back in so that it becomes
a sane series. Basically patches 5 and 6 shouldn't exist, they fix
issues created by the earlier patches.

2022-09-22 12:36:30

by Waiman Long

Subject: Re: [PATCH v9 0/7] sched: Persistent user requested affinity

On 9/22/22 04:45, Peter Zijlstra wrote:
> On Fri, Sep 16, 2022 at 02:32:10PM -0400, Waiman Long wrote:
>
>> Waiman Long (7):
>> sched: Add __releases annotations to affine_move_task()
>> sched: Use user_cpus_ptr for saving user provided cpumask in
>> sched_setaffinity()
>> sched: Enforce user requested affinity
>> sched: Introduce affinity_context structure
>> sched: Handle set_cpus_allowed_ptr() & sched_setaffinity() race
>> sched: Fix sched_setaffinity() and fork/clone() race
>> sched: Always clear user_cpus_ptr in do_set_cpus_allowed()
> I still think this is ordered all wrong. Esp. with that affinity
> context, doing the right thing isn't onerous at all.
>
> So please, re-order this and fold all fixes back in so that it becomes
> a sane series. Basically patches 5 and 6 shouldn't exist, they fix
> issues created by the earlier patches.
>
Thanks for the suggestion. Will rework the series as suggested.

Cheers,
Longman