2024-02-10 00:23:48

by John Stultz

Subject: [PATCH v8 0/7] Preparatory changes for Proxy Execution v8

After sending out v7 of Proxy Execution, I got feedback that the
patch series was getting a bit unwieldy to review, and Qais
suggested I break out just the cleanups/preparatory components of
the patch series and submit them on their own, in the hope that
we can start merging the less complex bits and focus discussion
on the more complicated portions afterwards.

So for the v8 of this series, I’m only submitting those earlier
cleanup/preparatory changes here. However, work on the full series
has continued, with some nice progress on the performance front.

If you are interested, the full v8 series can be found here:
https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3

One bit of feedback I got earlier from K Prateek Nayak[1] was
that while the Proxy Execution feature didn’t impact performance
in most benchmarks he tried, it did cause major regressions with
the `perf bench` test. I isolated this down to the patch series
disabling optimistic spinning, which had been a long-standing
performance concern with the change.

I spent some time looking for a way to walk the chain (across
cpus) to see if the runnable owner was actually running, which
would allow us to spin and avoid migrations. However, I realized
that is actually a separate and more difficult problem (ie:
migration avoidance) from simple optimistic spinning (a task
trying to take a mutex where the owner is running). With Proxy
Execution, the main focus is how we effectively boost
*non-running* mutex owners using the blocked tasks. But when the
contended mutex owner is running, we don’t need to do anything
different from the existing optimistic spinning logic.
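
For reference, the spin condition we now leave intact looks
roughly like the following (a simplified sketch of the logic in
mutex_spin_on_owner(), not a verbatim excerpt):

  /* Spin only while the owner is actually running on a cpu */
  while (__mutex_owner(lock) == owner) {
          if (!owner->on_cpu || need_resched())
                  break;  /* owner scheduled out: block instead */
          cpu_relax();
  }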

So I’ve dropped the patches disabling optimistic spinning, and
fixed up some further issues as the change opened some new races
the later Proxy Execution patches weren’t expecting. The perf
bench test, which previously regressed so badly, now shows very
close results (within 1.5%) with and without Proxy Execution.
This alleviates a major concern I had with the series.

Re-enabling optimistic spinning does open the question of whether
we should allow optimistic spinning’s lock-stealing (instead of
lock-handoff) when the runnable mutex owner is running as a proxy
for a more important task. Ideally we probably want to hand it
back up the chain that donated the time (to avoid migrating
the chain to the stealer’s cpu), so I have a patch I’m testing
that special cases that situation, but so far in my testing I’m
not seeing any large increase in latencies from the potential
unfairness of lock-stealing. But I’ll be doing more investigation
on this.

New in v8:
---------
* Re-ordered patch set to only submit cleanup & preparatory
changes, leaving more complicated changes for a future
submission (suggested by Qais Yousef)

* Lots of spelling fixups and minor changes suggested by Randy
Dunlap and Metin Kaya

* The cover letter was getting long-winded with boilerplate
sections, so please see earlier versions of the patch series or
this LWN article [2] for an overview of the functionality.

* Re-enabled mutex optimistic spinning to resolve a major
performance regression in earlier versions of the series

* Fixups for races discovered with optimistic spinning

* Included trace events patch from Metin to better help us
analyze the costs of proxying.

* Extended sched_football to use rtmutexes when
!CONFIG_SCHED_PROXY_EXEC

* Tweaked sched_football to bail if spawned players don’t check
in within 30s

* Chased down a bug uncovered by sched_football that was
preventing unconstrained RT tasks from being put on the
pushable list, causing improper task starvation and
sched_football failures.

Performance:
---------
With the optimistic spinning re-enabled, the biggest concern I
had wrt performance is much relieved. However, there is still the
potential extra overhead of doing rq migrations/return migrations
for the proxy case. The big hypothesis of this series is that the
overall benefit from unblocking important tasks will well
outweigh any extra cost, but if folks see anything of concern,
I’d love to hear about it.

Issues still to address:
---------
* Enabling optimistic spinning uncovered a few interesting races
that were much harder to hit in v7. I’ve resolved most of
these, but there is a case I’ve occasionally seen where, in
try_to_wake_up(), the blocked_on_state transitions are too lax,
causing a transition to BO_RUNNABLE when we have not had a
chance to evaluate for return-migration (allowing tasks to
potentially run on cpus they shouldn’t). Unfortunately, in
tightening the state handling to be more careful, I’ve run into
cases where we don’t transition to BO_RUNNABLE, and effectively
lose the wakeup. I’ve started to suspect the conceptual overlap
between task->blocked_on_state and task->__state suggests I
should consider condensing the BO_BLOCKED/BO_WAKING states into
new task states (ie: TASK_MUTEX_BLOCKED, TASK_PROXY_MIGRATED),
so we can handle the state transitions more correctly (see the
rough sketch following this list).

* Closer evaluation of pros/cons of doing lock handoff vs
allowing lock-stealing when proxy tasks release locks.

* The chain migration functionality needs further iterations and
better validation to ensure it truly maintains the RT/DL load
balancing invariants.

* Xuewen Yan earlier pointed out that we may see task
mis-placement on EAS systems if we do return migration based
strictly on cpu allowability. I tried an optimization to always
try to return migrate to the wake_cpu (which was saved on
proxy-migration), but this seemed to undo a chunk of the
benefit I saw in moving return migration back to ttwu, at least
with my prio-inversion-demo microbenchmark. Need to do some
broader performance analysis with the variations there.

* It would be nice to find an optimization to avoid migrating
blocked tasks if the runnable lock-owner at the end of the
blocked_on chain is already running (though this is difficult
due to the limitations from locking rules needed to traverse
the blocked on chain across run queues).

* CFS load balancing. There was concern that blocked tasks may
carry forward load (PELT) to the lock owner's CPU, so the CPU
may look like it is overloaded. This needs investigation.

* The sleeping owner handling (where we deactivate waiting tasks
and enqueue them onto a list, then reactivate them when the
owner wakes up) doesn’t feel great. This is in part because
when we want to activate tasks, we’re already holding a
task->pi_lock and a rq->lock, just not the locks for the task
we’re activating, nor the rq we’re enqueuing it onto. So there
has to be a bit of lock juggling to drop and acquire the right
locks (in the right order). It feels like there’s got to be a
better way.

* As discussed at OSPM[3], I’d like to split pick_next_task() up
into two phases, selecting and setting the next task, as
currently pick_next_task() assumes the returned task will be
run, which results in various side effects in sched class
logic. I tried to take a pass at this earlier, but it’s hairy
and lower on the priority list for now.
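
As a rough sketch of the state-condensing idea from the
try_to_wake_up() item above (purely illustrative: the BO_* names
are from the current series, the bit values are placeholders, and
none of this is in the posted patches):

  /* today: side-band state tracked next to task->__state */
  enum blocked_on_state { BO_RUNNABLE, BO_BLOCKED, BO_WAKING };

  /*
   * idea: fold these into task->__state so try_to_wake_up()'s
   * existing state matching gates the transitions for us
   */
  #define TASK_MUTEX_BLOCKED      0x00002000      /* placeholder */
  #define TASK_PROXY_MIGRATED     0x00004000      /* placeholder */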

Credit/Disclaimer:
---------
As mentioned previously, this Proxy Execution series has a long
history: it was first described in a paper[4] by Watkins, Straub,
and Niehaus, then prototyped in patches from Peter Zijlstra, and
extended with lots of work by Juri Lelli, Valentin Schneider,
and Connor O'Brien. (And thank you to Steven Rostedt for
providing additional details here!)

So again, many thanks to those above, as all the credit for this
series really is due to them, while the mistakes are likely
mine.

As always, feedback is greatly appreciated.

Thanks so much!
-john

[1] https://lore.kernel.org/lkml/[email protected]/
[2] https://lwn.net/Articles/934114/
[3] https://youtu.be/QEWqRhVS3lI (video of my OSPM talk)
[4] https://static.lwn.net/images/conf/rtlws11/papers/proc/p38.pdf

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]

Connor O'Brien (2):
sched: Add do_push_task helper
sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (2):
locking/mutex: Remove wakeups from under mutex::wait_lock
sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
locking/mutex: Make mutex::wait_lock irq safe
locking/mutex: Expose __mutex_owner()

Peter Zijlstra (1):
sched: Split scheduler and execution contexts

kernel/locking/mutex.c | 60 +++++++----------
kernel/locking/mutex.h | 25 +++++++
kernel/locking/rtmutex.c | 26 +++++---
kernel/locking/rwbase_rt.c | 4 +-
kernel/locking/rwsem.c | 4 +-
kernel/locking/spinlock_rt.c | 3 +-
kernel/locking/ww_mutex.h | 49 ++++++++------
kernel/sched/core.c | 122 +++++++++++++++++++++--------------
kernel/sched/deadline.c | 53 ++++++---------
kernel/sched/fair.c | 18 +++---
kernel/sched/rt.c | 59 +++++++----------
kernel/sched/sched.h | 44 ++++++++++++-
12 files changed, 268 insertions(+), 199 deletions(-)

--
2.43.0.687.g38aa6559b0-goog



2024-02-10 00:23:57

by John Stultz

Subject: [PATCH v8 1/7] locking/mutex: Remove wakeups from under mutex::wait_lock

In preparation for nesting mutex::wait_lock under rq::lock, we need to remove
wakeups from under it.
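
The pattern used throughout is to collect wakeups into an on-stack
wake_q while holding wait_lock, and to issue them only once the
lock has been dropped. A minimal sketch (not a verbatim hunk from
the diff below):

  DEFINE_WAKE_Q(wake_q);

  raw_spin_lock(&lock->wait_lock);
  /* ... decide who must be woken, e.g. a wounded waiter ... */
  wake_q_add(&wake_q, waiter->task);
  raw_spin_unlock(&lock->wait_lock);

  wake_up_q(&wake_q);     /* safe: wait_lock is no longer held */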

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[Heavily changed after 55f036ca7e74 ("locking: WW mutex cleanup") and
08295b3b5bee ("locking: Implement an algorithm choice for Wound-Wait
mutexes")]
Signed-off-by: Juri Lelli <[email protected]>
[jstultz: rebased to mainline, added extra wake_up_q & init
to avoid hangs, similar to Connor's rework of this patch]
Signed-off-by: John Stultz <[email protected]>
---
v5:
* Reverted back to an earlier version of this patch to undo
the change that kept the wake_q in the ctx structure, as
that broke the rule that the wake_q must always be on the
stack, as it's not safe for concurrent use.
v6:
* Made tweaks suggested by Waiman Long
v7:
* Fixups to pass wake_qs down for PREEMPT_RT logic
---
kernel/locking/mutex.c | 17 +++++++++++++----
kernel/locking/rtmutex.c | 26 +++++++++++++++++---------
kernel/locking/rwbase_rt.c | 4 +++-
kernel/locking/rwsem.c | 4 ++--
kernel/locking/spinlock_rt.c | 3 ++-
kernel/locking/ww_mutex.h | 29 ++++++++++++++++++-----------
6 files changed, 55 insertions(+), 28 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index cbae8c0b89ab..980ce630232c 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -575,6 +575,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
struct lockdep_map *nest_lock, unsigned long ip,
struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
{
+ DEFINE_WAKE_Q(wake_q);
struct mutex_waiter waiter;
struct ww_mutex *ww;
int ret;
@@ -625,7 +626,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
*/
if (__mutex_trylock(lock)) {
if (ww_ctx)
- __ww_mutex_check_waiters(lock, ww_ctx);
+ __ww_mutex_check_waiters(lock, ww_ctx, &wake_q);

goto skip_wait;
}
@@ -645,7 +646,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
* Add in stamp order, waking up waiters that must kill
* themselves.
*/
- ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx);
+ ret = __ww_mutex_add_waiter(&waiter, lock, ww_ctx, &wake_q);
if (ret)
goto err_early_kill;
}
@@ -681,6 +682,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
}

raw_spin_unlock(&lock->wait_lock);
+ /* Make sure we do wakeups before calling schedule */
+ if (!wake_q_empty(&wake_q)) {
+ wake_up_q(&wake_q);
+ wake_q_init(&wake_q);
+ }
schedule_preempt_disabled();

first = __mutex_waiter_is_first(lock, &waiter);
@@ -714,7 +720,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
*/
if (!ww_ctx->is_wait_die &&
!__mutex_waiter_is_first(lock, &waiter))
- __ww_mutex_check_waiters(lock, ww_ctx);
+ __ww_mutex_check_waiters(lock, ww_ctx, &wake_q);
}

__mutex_remove_waiter(lock, &waiter);
@@ -730,6 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
ww_mutex_lock_acquired(ww, ww_ctx);

raw_spin_unlock(&lock->wait_lock);
+ wake_up_q(&wake_q);
preempt_enable();
return 0;

@@ -741,6 +748,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
raw_spin_unlock(&lock->wait_lock);
debug_mutex_free_waiter(&waiter);
mutex_release(&lock->dep_map, ip);
+ wake_up_q(&wake_q);
preempt_enable();
return ret;
}
@@ -934,6 +942,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
}
}

+ preempt_disable();
raw_spin_lock(&lock->wait_lock);
debug_mutex_unlock(lock);
if (!list_empty(&lock->wait_list)) {
@@ -952,8 +961,8 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
__mutex_handoff(lock, next);

raw_spin_unlock(&lock->wait_lock);
-
wake_up_q(&wake_q);
+ preempt_enable();
}

#ifndef CONFIG_DEBUG_LOCK_ALLOC
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 4a10e8c16fd2..eaac8b196a69 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -34,13 +34,15 @@

static inline int __ww_mutex_add_waiter(struct rt_mutex_waiter *waiter,
struct rt_mutex *lock,
- struct ww_acquire_ctx *ww_ctx)
+ struct ww_acquire_ctx *ww_ctx,
+ struct wake_q_head *wake_q)
{
return 0;
}

static inline void __ww_mutex_check_waiters(struct rt_mutex *lock,
- struct ww_acquire_ctx *ww_ctx)
+ struct ww_acquire_ctx *ww_ctx,
+ struct wake_q_head *wake_q)
{
}

@@ -1206,6 +1208,7 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,
struct rt_mutex_waiter *top_waiter = waiter;
struct rt_mutex_base *next_lock;
int chain_walk = 0, res;
+ DEFINE_WAKE_Q(wake_q);

lockdep_assert_held(&lock->wait_lock);

@@ -1244,7 +1247,8 @@ static int __sched task_blocks_on_rt_mutex(struct rt_mutex_base *lock,

/* Check whether the waiter should back out immediately */
rtm = container_of(lock, struct rt_mutex, rtmutex);
- res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx);
+ res = __ww_mutex_add_waiter(waiter, rtm, ww_ctx, &wake_q);
+ wake_up_q(&wake_q);
if (res) {
raw_spin_lock(&task->pi_lock);
rt_mutex_dequeue(lock, waiter);
@@ -1677,7 +1681,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
struct ww_acquire_ctx *ww_ctx,
unsigned int state,
enum rtmutex_chainwalk chwalk,
- struct rt_mutex_waiter *waiter)
+ struct rt_mutex_waiter *waiter,
+ struct wake_q_head *wake_q)
{
struct rt_mutex *rtm = container_of(lock, struct rt_mutex, rtmutex);
struct ww_mutex *ww = ww_container_of(rtm);
@@ -1688,7 +1693,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
/* Try to acquire the lock again: */
if (try_to_take_rt_mutex(lock, current, NULL)) {
if (build_ww_mutex() && ww_ctx) {
- __ww_mutex_check_waiters(rtm, ww_ctx);
+ __ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
ww_mutex_lock_acquired(ww, ww_ctx);
}
return 0;
@@ -1706,7 +1711,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
/* acquired the lock */
if (build_ww_mutex() && ww_ctx) {
if (!ww_ctx->is_wait_die)
- __ww_mutex_check_waiters(rtm, ww_ctx);
+ __ww_mutex_check_waiters(rtm, ww_ctx, wake_q);
ww_mutex_lock_acquired(ww, ww_ctx);
}
} else {
@@ -1728,7 +1733,8 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,

static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
struct ww_acquire_ctx *ww_ctx,
- unsigned int state)
+ unsigned int state,
+ struct wake_q_head *wake_q)
{
struct rt_mutex_waiter waiter;
int ret;
@@ -1737,7 +1743,7 @@ static inline int __rt_mutex_slowlock_locked(struct rt_mutex_base *lock,
waiter.ww_ctx = ww_ctx;

ret = __rt_mutex_slowlock(lock, ww_ctx, state, RT_MUTEX_MIN_CHAINWALK,
- &waiter);
+ &waiter, wake_q);

debug_rt_mutex_free_waiter(&waiter);
return ret;
@@ -1753,6 +1759,7 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
struct ww_acquire_ctx *ww_ctx,
unsigned int state)
{
+ DEFINE_WAKE_Q(wake_q);
unsigned long flags;
int ret;

@@ -1774,8 +1781,9 @@ static int __sched rt_mutex_slowlock(struct rt_mutex_base *lock,
* irqsave/restore variants.
*/
raw_spin_lock_irqsave(&lock->wait_lock, flags);
- ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state);
+ ret = __rt_mutex_slowlock_locked(lock, ww_ctx, state, &wake_q);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+ wake_up_q(&wake_q);
rt_mutex_post_schedule();

return ret;
diff --git a/kernel/locking/rwbase_rt.c b/kernel/locking/rwbase_rt.c
index 34a59569db6b..e9d2f38b70f3 100644
--- a/kernel/locking/rwbase_rt.c
+++ b/kernel/locking/rwbase_rt.c
@@ -69,6 +69,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
unsigned int state)
{
struct rt_mutex_base *rtm = &rwb->rtmutex;
+ DEFINE_WAKE_Q(wake_q);
int ret;

rwbase_pre_schedule();
@@ -110,7 +111,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
* For rwlocks this returns 0 unconditionally, so the below
* !ret conditionals are optimized out.
*/
- ret = rwbase_rtmutex_slowlock_locked(rtm, state);
+ ret = rwbase_rtmutex_slowlock_locked(rtm, state, &wake_q);

/*
* On success the rtmutex is held, so there can't be a writer
@@ -122,6 +123,7 @@ static int __sched __rwbase_read_lock(struct rwbase_rt *rwb,
if (!ret)
atomic_inc(&rwb->readers);
raw_spin_unlock_irq(&rtm->wait_lock);
+ wake_up_q(&wake_q);
if (!ret)
rwbase_rtmutex_unlock(rtm);

diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
index 2340b6d90ec6..74ebb2915d63 100644
--- a/kernel/locking/rwsem.c
+++ b/kernel/locking/rwsem.c
@@ -1415,8 +1415,8 @@ static inline void __downgrade_write(struct rw_semaphore *sem)
#define rwbase_rtmutex_lock_state(rtm, state) \
__rt_mutex_lock(rtm, state)

-#define rwbase_rtmutex_slowlock_locked(rtm, state) \
- __rt_mutex_slowlock_locked(rtm, NULL, state)
+#define rwbase_rtmutex_slowlock_locked(rtm, state, wq) \
+ __rt_mutex_slowlock_locked(rtm, NULL, state, wq)

#define rwbase_rtmutex_unlock(rtm) \
__rt_mutex_unlock(rtm)
diff --git a/kernel/locking/spinlock_rt.c b/kernel/locking/spinlock_rt.c
index 38e292454fcc..fb1810a14c9d 100644
--- a/kernel/locking/spinlock_rt.c
+++ b/kernel/locking/spinlock_rt.c
@@ -162,7 +162,8 @@ rwbase_rtmutex_lock_state(struct rt_mutex_base *rtm, unsigned int state)
}

static __always_inline int
-rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state)
+rwbase_rtmutex_slowlock_locked(struct rt_mutex_base *rtm, unsigned int state,
+ struct wake_q_head *wake_q)
{
rtlock_slowlock_locked(rtm);
return 0;
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 3ad2cc4823e5..7189c6631d90 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -275,7 +275,7 @@ __ww_ctx_less(struct ww_acquire_ctx *a, struct ww_acquire_ctx *b)
*/
static bool
__ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
- struct ww_acquire_ctx *ww_ctx)
+ struct ww_acquire_ctx *ww_ctx, struct wake_q_head *wake_q)
{
if (!ww_ctx->is_wait_die)
return false;
@@ -284,7 +284,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
#ifndef WW_RT
debug_mutex_wake_waiter(lock, waiter);
#endif
- wake_up_process(waiter->task);
+ wake_q_add(wake_q, waiter->task);
}

return true;
@@ -299,7 +299,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
*/
static bool __ww_mutex_wound(struct MUTEX *lock,
struct ww_acquire_ctx *ww_ctx,
- struct ww_acquire_ctx *hold_ctx)
+ struct ww_acquire_ctx *hold_ctx,
+ struct wake_q_head *wake_q)
{
struct task_struct *owner = __ww_mutex_owner(lock);

@@ -331,7 +332,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* wakeup pending to re-read the wounded state.
*/
if (owner != current)
- wake_up_process(owner);
+ wake_q_add(wake_q, owner);

return true;
}
@@ -352,7 +353,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* The current task must not be on the wait list.
*/
static void
-__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
+__ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx,
+ struct wake_q_head *wake_q)
{
struct MUTEX_WAITER *cur;

@@ -364,8 +366,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
if (!cur->ww_ctx)
continue;

- if (__ww_mutex_die(lock, cur, ww_ctx) ||
- __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx))
+ if (__ww_mutex_die(lock, cur, ww_ctx, wake_q) ||
+ __ww_mutex_wound(lock, cur->ww_ctx, ww_ctx, wake_q))
break;
}
}
@@ -377,6 +379,8 @@ __ww_mutex_check_waiters(struct MUTEX *lock, struct ww_acquire_ctx *ww_ctx)
static __always_inline void
ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
+ DEFINE_WAKE_Q(wake_q);
+
ww_mutex_lock_acquired(lock, ctx);

/*
@@ -405,8 +409,10 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
* die or wound us.
*/
lock_wait_lock(&lock->base);
- __ww_mutex_check_waiters(&lock->base, ctx);
+ __ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
unlock_wait_lock(&lock->base);
+
+ wake_up_q(&wake_q);
}

static __always_inline int
@@ -488,7 +494,8 @@ __ww_mutex_check_kill(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
static inline int
__ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
struct MUTEX *lock,
- struct ww_acquire_ctx *ww_ctx)
+ struct ww_acquire_ctx *ww_ctx,
+ struct wake_q_head *wake_q)
{
struct MUTEX_WAITER *cur, *pos = NULL;
bool is_wait_die;
@@ -532,7 +539,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
pos = cur;

/* Wait-Die: ensure younger waiters die. */
- __ww_mutex_die(lock, cur, ww_ctx);
+ __ww_mutex_die(lock, cur, ww_ctx, wake_q);
}

__ww_waiter_add(lock, waiter, pos);
@@ -550,7 +557,7 @@ __ww_mutex_add_waiter(struct MUTEX_WAITER *waiter,
* such that either we or the fastpath will wound @ww->ctx.
*/
smp_mb();
- __ww_mutex_wound(lock, ww_ctx, ww->ctx);
+ __ww_mutex_wound(lock, ww_ctx, ww->ctx, wake_q);
}

return 0;
--
2.43.0.687.g38aa6559b0-goog


2024-02-10 00:24:08

by John Stultz

Subject: [PATCH v8 2/7] locking/mutex: Make mutex::wait_lock irq safe

From: Juri Lelli <[email protected]>

mutex::wait_lock might be nested under rq->lock.

Make it irq safe then.
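
The concern (sketched here, not taken from a real splat): rq->lock
is always taken with interrupts disabled, so once wait_lock can be
nested inside it, taking wait_lock with interrupts enabled can
deadlock against an interrupt on the same cpu that ends up
acquiring rq->lock and then wait_lock. Hence the conversion:

  unsigned long flags;

  raw_spin_lock_irqsave(&lock->wait_lock, flags);
  /* ... wait list manipulation ... */
  raw_spin_unlock_irqrestore(&lock->wait_lock, flags);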

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
[rebase & fix {un,}lock_wait_lock helpers in ww_mutex.h]
Signed-off-by: Connor O'Brien <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
v3:
* Re-added this patch after it was dropped in v2 which
caused lockdep warnings to trip.
v7:
* Fix function definition for PREEMPT_RT case, as pointed out
by Metin Kaya.
* Fix incorrect flags handling in PREEMPT_RT case as found by
Metin Kaya
---
kernel/locking/mutex.c | 18 ++++++++++--------
kernel/locking/ww_mutex.h | 22 +++++++++++-----------
2 files changed, 21 insertions(+), 19 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 980ce630232c..7de72c610c65 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -578,6 +578,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
DEFINE_WAKE_Q(wake_q);
struct mutex_waiter waiter;
struct ww_mutex *ww;
+ unsigned long flags;
int ret;

if (!use_ww_ctx)
@@ -620,7 +621,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
return 0;
}

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
/*
* After waiting to acquire the wait_lock, try again.
*/
@@ -681,7 +682,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
goto err;
}

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
/* Make sure we do wakeups before calling schedule */
if (!wake_q_empty(&wake_q)) {
wake_up_q(&wake_q);
@@ -707,9 +708,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
trace_contention_begin(lock, LCB_F_MUTEX);
}

- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
}
- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
acquired:
__set_current_state(TASK_RUNNING);

@@ -735,7 +736,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
if (ww_ctx)
ww_mutex_lock_acquired(ww, ww_ctx);

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
wake_up_q(&wake_q);
preempt_enable();
return 0;
@@ -745,7 +746,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
__mutex_remove_waiter(lock, &waiter);
err_early_kill:
trace_contention_end(lock, ret);
- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
debug_mutex_free_waiter(&waiter);
mutex_release(&lock->dep_map, ip);
wake_up_q(&wake_q);
@@ -916,6 +917,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
struct task_struct *next = NULL;
DEFINE_WAKE_Q(wake_q);
unsigned long owner;
+ unsigned long flags;

mutex_release(&lock->dep_map, ip);

@@ -943,7 +945,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
}

preempt_disable();
- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
debug_mutex_unlock(lock);
if (!list_empty(&lock->wait_list)) {
/* get the first entry from the wait-list: */
@@ -960,7 +962,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
if (owner & MUTEX_FLAG_HANDOFF)
__mutex_handoff(lock, next);

- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
wake_up_q(&wake_q);
preempt_enable();
}
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 7189c6631d90..9facc0ddfdd3 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -70,14 +70,14 @@ __ww_mutex_has_waiters(struct mutex *lock)
return atomic_long_read(&lock->owner) & MUTEX_FLAG_WAITERS;
}

-static inline void lock_wait_lock(struct mutex *lock)
+static inline void lock_wait_lock(struct mutex *lock, unsigned long *flags)
{
- raw_spin_lock(&lock->wait_lock);
+ raw_spin_lock_irqsave(&lock->wait_lock, *flags);
}

-static inline void unlock_wait_lock(struct mutex *lock)
+static inline void unlock_wait_lock(struct mutex *lock, unsigned long *flags)
{
- raw_spin_unlock(&lock->wait_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, *flags);
}

static inline void lockdep_assert_wait_lock_held(struct mutex *lock)
@@ -144,14 +144,14 @@ __ww_mutex_has_waiters(struct rt_mutex *lock)
return rt_mutex_has_waiters(&lock->rtmutex);
}

-static inline void lock_wait_lock(struct rt_mutex *lock)
+static inline void lock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
{
- raw_spin_lock(&lock->rtmutex.wait_lock);
+ raw_spin_lock_irqsave(&lock->rtmutex.wait_lock, *flags);
}

-static inline void unlock_wait_lock(struct rt_mutex *lock)
+static inline void unlock_wait_lock(struct rt_mutex *lock, unsigned long *flags)
{
- raw_spin_unlock(&lock->rtmutex.wait_lock);
+ raw_spin_unlock_irqrestore(&lock->rtmutex.wait_lock, *flags);
}

static inline void lockdep_assert_wait_lock_held(struct rt_mutex *lock)
@@ -380,6 +380,7 @@ static __always_inline void
ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
DEFINE_WAKE_Q(wake_q);
+ unsigned long flags;

ww_mutex_lock_acquired(lock, ctx);

@@ -408,10 +409,9 @@ ww_mutex_set_context_fastpath(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
* Uh oh, we raced in fastpath, check if any of the waiters need to
* die or wound us.
*/
- lock_wait_lock(&lock->base);
+ lock_wait_lock(&lock->base, &flags);
__ww_mutex_check_waiters(&lock->base, ctx, &wake_q);
- unlock_wait_lock(&lock->base);
-
+ unlock_wait_lock(&lock->base, &flags);
wake_up_q(&wake_q);
}

--
2.43.0.687.g38aa6559b0-goog


2024-02-10 00:24:27

by John Stultz

Subject: [PATCH v8 3/7] locking/mutex: Expose __mutex_owner()

From: Juri Lelli <[email protected]>

Implementing proxy execution requires that scheduler code be able to
identify the current owner of a mutex. Expose __mutex_owner() for
this purpose (alone!).
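
As a hypothetical sketch of the kind of scheduler-side use this
enables (the blocked_on linkage comes from the later patches in
the full series, and the helper name is illustrative only):

  /* find who currently owns the mutex @p is blocked on */
  static struct task_struct *proxy_owner(struct task_struct *p)
  {
          struct mutex *m = p->blocked_on;

          return m ? __mutex_owner(m) : NULL;
  }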

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Juri Lelli <[email protected]>
[Removed the EXPORT_SYMBOL]
Signed-off-by: Valentin Schneider <[email protected]>
Signed-off-by: Connor O'Brien <[email protected]>
[jstultz: Reworked per Peter's suggestions]
Signed-off-by: John Stultz <[email protected]>
---
v4:
* Move __mutex_owner() to kernel/locking/mutex.h instead of
adding a new globally available accessor function to keep
the exposure of this low, along with keeping it an inline
function, as suggested by PeterZ
---
kernel/locking/mutex.c | 25 -------------------------
kernel/locking/mutex.h | 25 +++++++++++++++++++++++++
2 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 7de72c610c65..5741641be914 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -56,31 +56,6 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
}
EXPORT_SYMBOL(__mutex_init);

-/*
- * @owner: contains: 'struct task_struct *' to the current lock owner,
- * NULL means not owned. Since task_struct pointers are aligned at
- * at least L1_CACHE_BYTES, we have low bits to store extra state.
- *
- * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
- * Bit1 indicates unlock needs to hand the lock to the top-waiter
- * Bit2 indicates handoff has been done and we're waiting for pickup.
- */
-#define MUTEX_FLAG_WAITERS 0x01
-#define MUTEX_FLAG_HANDOFF 0x02
-#define MUTEX_FLAG_PICKUP 0x04
-
-#define MUTEX_FLAGS 0x07
-
-/*
- * Internal helper function; C doesn't allow us to hide it :/
- *
- * DO NOT USE (outside of mutex code).
- */
-static inline struct task_struct *__mutex_owner(struct mutex *lock)
-{
- return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
-}
-
static inline struct task_struct *__owner_task(unsigned long owner)
{
return (struct task_struct *)(owner & ~MUTEX_FLAGS);
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 0b2a79c4013b..1c7d3d32def8 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -20,6 +20,31 @@ struct mutex_waiter {
#endif
};

+/*
+ * @owner: contains: 'struct task_struct *' to the current lock owner,
+ * NULL means not owned. Since task_struct pointers are aligned at
+ * at least L1_CACHE_BYTES, we have low bits to store extra state.
+ *
+ * Bit0 indicates a non-empty waiter list; unlock must issue a wakeup.
+ * Bit1 indicates unlock needs to hand the lock to the top-waiter
+ * Bit2 indicates handoff has been done and we're waiting for pickup.
+ */
+#define MUTEX_FLAG_WAITERS 0x01
+#define MUTEX_FLAG_HANDOFF 0x02
+#define MUTEX_FLAG_PICKUP 0x04
+
+#define MUTEX_FLAGS 0x07
+
+/*
+ * Internal helper function; C doesn't allow us to hide it :/
+ *
+ * DO NOT USE (outside of mutex & scheduler code).
+ */
+static inline struct task_struct *__mutex_owner(struct mutex *lock)
+{
+ return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
+}
+
#ifdef CONFIG_DEBUG_MUTEXES
extern void debug_mutex_lock_common(struct mutex *lock,
struct mutex_waiter *waiter);
--
2.43.0.687.g38aa6559b0-goog


2024-02-10 00:24:49

by John Stultz

Subject: [PATCH v8 4/7] sched: Add do_push_task helper

From: Connor O'Brien <[email protected]>

Switch logic that deactivates, sets the task cpu,
and reactivates a task on a different rq to use a
helper that will later be extended to push entire
blocked task chains.

This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.
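
For context, the callers being converted all follow roughly this
shape (simplified from push_rt_task()), which is also the locking
precondition do_push_task() inherits: both runqueue locks held
across the move:

  lowest_rq = find_lock_lowest_rq(next_task, rq); /* locks both rqs */
  if (lowest_rq) {
          do_push_task(rq, lowest_rq, next_task);
          resched_curr(lowest_rq);
          double_unlock_balance(rq, lowest_rq);
  }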

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Connor O'Brien <[email protected]>
[jstultz: split out from larger chain migration patch]
Signed-off-by: John Stultz <[email protected]>
---
v8:
* Renamed from push_task_chain to do_push_task so it makes
more sense without proxy-execution
---
kernel/sched/core.c | 4 +---
kernel/sched/deadline.c | 8 ++------
kernel/sched/rt.c | 8 ++------
kernel/sched/sched.h | 9 +++++++++
4 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9116bcc90346..ad4748327651 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2714,9 +2714,7 @@ int push_cpu_stop(void *arg)

// XXX validate p is still the highest prio task
if (task_rq(p) == rq) {
- deactivate_task(rq, p, 0);
- set_task_cpu(p, lowest_rq->cpu);
- activate_task(lowest_rq, p, 0);
+ do_push_task(rq, lowest_rq, p);
resched_curr(lowest_rq);
}

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a04a436af8cc..e68d88963e89 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2443,9 +2443,7 @@ static int push_dl_task(struct rq *rq)
goto retry;
}

- deactivate_task(rq, next_task, 0);
- set_task_cpu(next_task, later_rq->cpu);
- activate_task(later_rq, next_task, 0);
+ do_push_task(rq, later_rq, next_task);
ret = 1;

resched_curr(later_rq);
@@ -2531,9 +2529,7 @@ static void pull_dl_task(struct rq *this_rq)
if (is_migration_disabled(p)) {
push_task = get_push_task(src_rq);
} else {
- deactivate_task(src_rq, p, 0);
- set_task_cpu(p, this_cpu);
- activate_task(this_rq, p, 0);
+ do_push_task(src_rq, this_rq, p);
dmin = p->dl.deadline;
resched = true;
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 3261b067b67e..dd072d11cc02 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2106,9 +2106,7 @@ static int push_rt_task(struct rq *rq, bool pull)
goto retry;
}

- deactivate_task(rq, next_task, 0);
- set_task_cpu(next_task, lowest_rq->cpu);
- activate_task(lowest_rq, next_task, 0);
+ do_push_task(rq, lowest_rq, next_task);
resched_curr(lowest_rq);
ret = 1;

@@ -2379,9 +2377,7 @@ static void pull_rt_task(struct rq *this_rq)
if (is_migration_disabled(p)) {
push_task = get_push_task(src_rq);
} else {
- deactivate_task(src_rq, p, 0);
- set_task_cpu(p, this_cpu);
- activate_task(this_rq, p, 0);
+ do_push_task(src_rq, this_rq, p);
resched = true;
}
/*
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 001fe047bd5d..6ca83837e0f4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3472,5 +3472,14 @@ static inline void init_sched_mm_cid(struct task_struct *t) { }

extern u64 avg_vruntime(struct cfs_rq *cfs_rq);
extern int entity_eligible(struct cfs_rq *cfs_rq, struct sched_entity *se);
+#ifdef CONFIG_SMP
+static inline
+void do_push_task(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
+{
+ deactivate_task(rq, task, 0);
+ set_task_cpu(task, dst_rq->cpu);
+ activate_task(dst_rq, task, 0);
+}
+#endif

#endif /* _KERNEL_SCHED_SCHED_H */
--
2.43.0.687.g38aa6559b0-goog


2024-02-10 00:25:01

by John Stultz

Subject: [PATCH v8 5/7] sched: Consolidate pick_*_task to task_is_pushable helper

From: Connor O'Brien <[email protected]>

This patch consolidates the rt and deadline pick_*_task()
functions into a task_is_pushable() helper.

This patch was broken out from a larger chain migration
patch originally by Connor O'Brien.

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Connor O'Brien <[email protected]>
[jstultz: split out from larger chain migration patch,
renamed helper function]
Signed-off-by: John Stultz <[email protected]>
---
v7:
* Split from chain migration patch
* Renamed function
---
kernel/sched/deadline.c | 10 +---------
kernel/sched/rt.c | 11 +----------
kernel/sched/sched.h | 10 ++++++++++
3 files changed, 12 insertions(+), 19 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index e68d88963e89..1b9cdb507498 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2179,14 +2179,6 @@ static void task_fork_dl(struct task_struct *p)
/* Only try algorithms three times */
#define DL_MAX_TRIES 3

-static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
-{
- if (!task_on_cpu(rq, p) &&
- cpumask_test_cpu(cpu, &p->cpus_mask))
- return 1;
- return 0;
-}
-
/*
* Return the earliest pushable rq's task, which is suitable to be executed
* on the CPU, NULL otherwise:
@@ -2205,7 +2197,7 @@ static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu
if (next_node) {
p = __node_2_pdl(next_node);

- if (pick_dl_task(rq, p, cpu))
+ if (task_is_pushable(rq, p, cpu) == 1)
return p;

next_node = rb_next(next_node);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index dd072d11cc02..638e7c158ae4 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1791,15 +1791,6 @@ static void put_prev_task_rt(struct rq *rq, struct task_struct *p)
/* Only try algorithms three times */
#define RT_MAX_TRIES 3

-static int pick_rt_task(struct rq *rq, struct task_struct *p, int cpu)
-{
- if (!task_on_cpu(rq, p) &&
- cpumask_test_cpu(cpu, &p->cpus_mask))
- return 1;
-
- return 0;
-}
-
/*
* Return the highest pushable rq's task, which is suitable to be executed
* on the CPU, NULL otherwise
@@ -1813,7 +1804,7 @@ static struct task_struct *pick_highest_pushable_task(struct rq *rq, int cpu)
return NULL;

plist_for_each_entry(p, head, pushable_tasks) {
- if (pick_rt_task(rq, p, cpu))
+ if (task_is_pushable(rq, p, cpu) == 1)
return p;
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 6ca83837e0f4..c83e5e0672dc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -3480,6 +3480,16 @@ void do_push_task(struct rq *rq, struct rq *dst_rq, struct task_struct *task)
set_task_cpu(task, dst_rq->cpu);
activate_task(dst_rq, task, 0);
}
+
+static inline
+int task_is_pushable(struct rq *rq, struct task_struct *p, int cpu)
+{
+ if (!task_on_cpu(rq, p) &&
+ cpumask_test_cpu(cpu, &p->cpus_mask))
+ return 1;
+
+ return 0;
+}
#endif

#endif /* _KERNEL_SCHED_SCHED_H */
--
2.43.0.687.g38aa6559b0-goog


2024-02-10 00:25:41

by John Stultz

Subject: [PATCH v8 7/7] sched: Split scheduler and execution contexts

From: Peter Zijlstra <[email protected]>

Let's define the scheduling context as all the scheduler state
in task_struct for the task selected to run, and the execution
context as all state required to actually run the task.

Currently both are intertwined in task_struct. We want to
logically split these such that we can use the scheduling
context of the task selected to be scheduled, but use the
execution context of a different task to actually be run.

To this purpose, introduce the rq_selected() macro to point to
the task_struct selected from the runqueue by the scheduler,
which will be used for scheduler state, and preserve rq->curr to
indicate the execution context of the task that will actually be
run.
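
As a rough sketch, the accessors introduced here have the
following shape (field name assumed from the v7 "curr_selected"
rename noted below; see the sched.h hunk for the real thing):

  #ifdef CONFIG_SCHED_PROXY_EXEC
  #define rq_selected(rq)         ((rq)->curr_selected)
  #else
  #define rq_selected(rq)         ((rq)->curr)
  #endif
  /* rq_set_selected(rq, p) updates it correspondingly */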

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Juri Lelli <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
[add additional comments and update more sched_class code to use
rq::proxy]
Signed-off-by: Connor O'Brien <[email protected]>
[jstultz: Rebased and resolved minor collisions, reworked to use
accessors, tweaked update_curr_common to use rq_proxy fixing rt
scheduling issues]
Signed-off-by: John Stultz <[email protected]>
---
v2:
* Reworked to use accessors
* Fixed update_curr_common to use proxy instead of curr
v3:
* Tweaked wrapper names
* Swapped proxy for selected for clarity
v4:
* Minor variable name tweaks for readability
* Use a macro instead of a inline function and drop
other helper functions as suggested by Peter.
* Remove verbose comments/questions to avoid review
distractions, as suggested by Dietmar
v5:
* Add CONFIG_PROXY_EXEC option to this patch so the
new logic can be tested with this change
* Minor fix to grab rq_selected when holding the rq lock
v7:
* Minor spelling fix and unused argument fixes suggested by
Metin Kaya
* Switch to curr_selected for consistency, and minor rewording
of commit message for clarity
* Rename variables selected instead of curr when we're using
rq_selected()
* Reduce macros in CONFIG_SCHED_PROXY_EXEC ifdef sections,
as suggested by Metin Kaya
v8:
* Use rq->curr, not rq_selected with task_tick, as suggested by
Valentin
* Minor rework to reorder this with CONFIG_SCHED_PROXY_EXEC patch
---
kernel/sched/core.c | 46 ++++++++++++++++++++++++++---------------
kernel/sched/deadline.c | 35 ++++++++++++++++---------------
kernel/sched/fair.c | 18 ++++++++--------
kernel/sched/rt.c | 40 +++++++++++++++++------------------
kernel/sched/sched.h | 25 ++++++++++++++++++++--
5 files changed, 99 insertions(+), 65 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b537e5f501ea..c17f91d6ceba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,7 +794,7 @@ static enum hrtimer_restart hrtick(struct hrtimer *timer)

rq_lock(rq, &rf);
update_rq_clock(rq);
- rq->curr->sched_class->task_tick(rq, rq->curr, 1);
+ rq_selected(rq)->sched_class->task_tick(rq, rq->curr, 1);
rq_unlock(rq, &rf);

return HRTIMER_NORESTART;
@@ -2238,16 +2238,18 @@ static inline void check_class_changed(struct rq *rq, struct task_struct *p,

void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
{
- if (p->sched_class == rq->curr->sched_class)
- rq->curr->sched_class->wakeup_preempt(rq, p, flags);
- else if (sched_class_above(p->sched_class, rq->curr->sched_class))
+ struct task_struct *selected = rq_selected(rq);
+
+ if (p->sched_class == selected->sched_class)
+ selected->sched_class->wakeup_preempt(rq, p, flags);
+ else if (sched_class_above(p->sched_class, selected->sched_class))
resched_curr(rq);

/*
* A queue event has occurred, and we're going to schedule. In
* this case, we can save a useless back to back clock update.
*/
- if (task_on_rq_queued(rq->curr) && test_tsk_need_resched(rq->curr))
+ if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr))
rq_clock_skip_update(rq);
}

@@ -2774,7 +2776,7 @@ __do_set_cpus_allowed(struct task_struct *p, struct affinity_context *ctx)
lockdep_assert_held(&p->pi_lock);

queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);

if (queued) {
/*
@@ -5587,7 +5589,7 @@ unsigned long long task_sched_runtime(struct task_struct *p)
* project cycles that may never be accounted to this
* thread, breaking clock_gettime().
*/
- if (task_current(rq, p) && task_on_rq_queued(p)) {
+ if (task_current_selected(rq, p) && task_on_rq_queued(p)) {
prefetch_curr_exec_start(p);
update_rq_clock(rq);
p->sched_class->update_curr(rq);
@@ -5655,7 +5657,8 @@ void scheduler_tick(void)
{
int cpu = smp_processor_id();
struct rq *rq = cpu_rq(cpu);
- struct task_struct *curr = rq->curr;
+ /* accounting goes to the selected task */
+ struct task_struct *selected;
struct rq_flags rf;
unsigned long thermal_pressure;
u64 resched_latency;
@@ -5666,16 +5669,17 @@ void scheduler_tick(void)
sched_clock_tick();

rq_lock(rq, &rf);
+ selected = rq_selected(rq);

update_rq_clock(rq);
thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));
update_thermal_load_avg(rq_clock_thermal(rq), rq, thermal_pressure);
- curr->sched_class->task_tick(rq, curr, 0);
+ selected->sched_class->task_tick(rq, selected, 0);
if (sched_feat(LATENCY_WARN))
resched_latency = cpu_resched_latency(rq);
calc_global_load_tick(rq);
sched_core_tick(rq);
- task_tick_mm_cid(rq, curr);
+ task_tick_mm_cid(rq, selected);

rq_unlock(rq, &rf);

@@ -5684,8 +5688,8 @@ void scheduler_tick(void)

perf_event_task_tick();

- if (curr->flags & PF_WQ_WORKER)
- wq_worker_tick(curr);
+ if (selected->flags & PF_WQ_WORKER)
+ wq_worker_tick(selected);

#ifdef CONFIG_SMP
rq->idle_balance = idle_cpu(cpu);
@@ -5750,6 +5754,12 @@ static void sched_tick_remote(struct work_struct *work)
struct task_struct *curr = rq->curr;

if (cpu_online(cpu)) {
+ /*
+ * Since this is a remote tick for full dynticks mode,
+ * we are always sure that there is no proxy (only a
+ * single task is running).
+ */
+ SCHED_WARN_ON(rq->curr != rq_selected(rq));
update_rq_clock(rq);

if (!is_idle_task(curr)) {
@@ -6701,6 +6711,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
}

next = pick_next_task(rq, prev, &rf);
+ rq_set_selected(rq, next);
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
#ifdef CONFIG_SCHED_DEBUG
@@ -7201,7 +7212,7 @@ void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task)

prev_class = p->sched_class;
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, queue_flag);
if (running)
@@ -7291,7 +7302,7 @@ void set_user_nice(struct task_struct *p, long nice)
}

queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, DEQUEUE_SAVE | DEQUEUE_NOCLOCK);
if (running)
@@ -7870,7 +7881,7 @@ static int __sched_setscheduler(struct task_struct *p,
}

queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);
if (queued)
dequeue_task(rq, p, queue_flags);
if (running)
@@ -9297,6 +9308,7 @@ void __init init_idle(struct task_struct *idle, int cpu)
rcu_read_unlock();

rq->idle = idle;
+ rq_set_selected(rq, idle);
rcu_assign_pointer(rq->curr, idle);
idle->on_rq = TASK_ON_RQ_QUEUED;
#ifdef CONFIG_SMP
@@ -9386,7 +9398,7 @@ void sched_setnuma(struct task_struct *p, int nid)

rq = task_rq_lock(p, &rf);
queued = task_on_rq_queued(p);
- running = task_current(rq, p);
+ running = task_current_selected(rq, p);

if (queued)
dequeue_task(rq, p, DEQUEUE_SAVE);
@@ -10491,7 +10503,7 @@ void sched_move_task(struct task_struct *tsk)

update_rq_clock(rq);

- running = task_current(rq, tsk);
+ running = task_current_selected(rq, tsk);
queued = task_on_rq_queued(tsk);

if (queued)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1b9cdb507498..c30b592d6e9d 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1218,7 +1218,7 @@ static enum hrtimer_restart dl_task_timer(struct hrtimer *timer)
#endif

enqueue_task_dl(rq, p, ENQUEUE_REPLENISH);
- if (dl_task(rq->curr))
+ if (dl_task(rq_selected(rq)))
wakeup_preempt_dl(rq, p, 0);
else
resched_curr(rq);
@@ -1442,7 +1442,7 @@ void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
*/
static void update_curr_dl(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *curr = rq_selected(rq);
struct sched_dl_entity *dl_se = &curr->dl;
s64 delta_exec;

@@ -1899,7 +1899,7 @@ static int find_later_rq(struct task_struct *task);
static int
select_task_rq_dl(struct task_struct *p, int cpu, int flags)
{
- struct task_struct *curr;
+ struct task_struct *curr, *selected;
bool select_rq;
struct rq *rq;

@@ -1910,6 +1910,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)

rcu_read_lock();
curr = READ_ONCE(rq->curr); /* unlocked access */
+ selected = READ_ONCE(rq_selected(rq));

/*
* If we are dealing with a -deadline task, we must
@@ -1920,9 +1921,9 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
* other hand, if it has a shorter deadline, we
* try to make it stay here, it might be important.
*/
- select_rq = unlikely(dl_task(curr)) &&
+ select_rq = unlikely(dl_task(selected)) &&
(curr->nr_cpus_allowed < 2 ||
- !dl_entity_preempt(&p->dl, &curr->dl)) &&
+ !dl_entity_preempt(&p->dl, &selected->dl)) &&
p->nr_cpus_allowed > 1;

/*
@@ -1985,7 +1986,7 @@ static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
* let's hope p can move out.
*/
if (rq->curr->nr_cpus_allowed == 1 ||
- !cpudl_find(&rq->rd->cpudl, rq->curr, NULL))
+ !cpudl_find(&rq->rd->cpudl, rq_selected(rq), NULL))
return;

/*
@@ -2024,7 +2025,7 @@ static int balance_dl(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
int flags)
{
- if (dl_entity_preempt(&p->dl, &rq->curr->dl)) {
+ if (dl_entity_preempt(&p->dl, &rq_selected(rq)->dl)) {
resched_curr(rq);
return;
}
@@ -2034,7 +2035,7 @@ static void wakeup_preempt_dl(struct rq *rq, struct task_struct *p,
* In the unlikely case current and p have the same deadline
* let us try to decide what's the best thing to do...
*/
- if ((p->dl.deadline == rq->curr->dl.deadline) &&
+ if ((p->dl.deadline == rq_selected(rq)->dl.deadline) &&
!test_tsk_need_resched(rq->curr))
check_preempt_equal_dl(rq, p);
#endif /* CONFIG_SMP */
@@ -2066,7 +2067,7 @@ static void set_next_task_dl(struct rq *rq, struct task_struct *p, bool first)
if (!first)
return;

- if (rq->curr->sched_class != &dl_sched_class)
+ if (rq_selected(rq)->sched_class != &dl_sched_class)
update_dl_rq_load_avg(rq_clock_pelt(rq), rq, 0);

deadline_queue_push_tasks(rq);
@@ -2391,8 +2392,8 @@ static int push_dl_task(struct rq *rq)
* can move away, it makes sense to just reschedule
* without going further in pushing next_task.
*/
- if (dl_task(rq->curr) &&
- dl_time_before(next_task->dl.deadline, rq->curr->dl.deadline) &&
+ if (dl_task(rq_selected(rq)) &&
+ dl_time_before(next_task->dl.deadline, rq_selected(rq)->dl.deadline) &&
rq->curr->nr_cpus_allowed > 1) {
resched_curr(rq);
return 0;
@@ -2515,7 +2516,7 @@ static void pull_dl_task(struct rq *this_rq)
* deadline than the current task of its runqueue.
*/
if (dl_time_before(p->dl.deadline,
- src_rq->curr->dl.deadline))
+ rq_selected(src_rq)->dl.deadline))
goto skip;

if (is_migration_disabled(p)) {
@@ -2554,9 +2555,9 @@ static void task_woken_dl(struct rq *rq, struct task_struct *p)
if (!task_on_cpu(rq, p) &&
!test_tsk_need_resched(rq->curr) &&
p->nr_cpus_allowed > 1 &&
- dl_task(rq->curr) &&
+ dl_task(rq_selected(rq)) &&
(rq->curr->nr_cpus_allowed < 2 ||
- !dl_entity_preempt(&p->dl, &rq->curr->dl))) {
+ !dl_entity_preempt(&p->dl, &rq_selected(rq)->dl))) {
push_dl_tasks(rq);
}
}
@@ -2731,12 +2732,12 @@ static void switched_to_dl(struct rq *rq, struct task_struct *p)
return;
}

- if (rq->curr != p) {
+ if (rq_selected(rq) != p) {
#ifdef CONFIG_SMP
if (p->nr_cpus_allowed > 1 && rq->dl.overloaded)
deadline_queue_push_tasks(rq);
#endif
- if (dl_task(rq->curr))
+ if (dl_task(rq_selected(rq)))
wakeup_preempt_dl(rq, p, 0);
else
resched_curr(rq);
@@ -2765,7 +2766,7 @@ static void prio_changed_dl(struct rq *rq, struct task_struct *p,
if (!rq->dl.overloaded)
deadline_queue_pull_task(rq);

- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
/*
* If we now have a earlier deadline task than p,
* then reschedule, provided p is still on this
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 533547e3c90a..dc342e1fc420 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1140,7 +1140,7 @@ static inline void update_curr_task(struct task_struct *p, s64 delta_exec)
*/
s64 update_curr_common(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *curr = rq_selected(rq);
s64 delta_exec;

delta_exec = update_curr_se(rq, &curr->se);
@@ -1177,7 +1177,7 @@ static void update_curr(struct cfs_rq *cfs_rq)

static void update_curr_fair(struct rq *rq)
{
- update_curr(cfs_rq_of(&rq->curr->se));
+ update_curr(cfs_rq_of(&rq_selected(rq)->se));
}

static inline void
@@ -6627,7 +6627,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
s64 delta = slice - ran;

if (delta < 0) {
- if (task_current(rq, p))
+ if (task_current_selected(rq, p))
resched_curr(rq);
return;
}
@@ -6642,7 +6642,7 @@ static void hrtick_start_fair(struct rq *rq, struct task_struct *p)
*/
static void hrtick_update(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *curr = rq_selected(rq);

if (!hrtick_enabled_fair(rq) || curr->sched_class != &fair_sched_class)
return;
@@ -8267,7 +8267,7 @@ static void set_next_buddy(struct sched_entity *se)
*/
static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int wake_flags)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *curr = rq_selected(rq);
struct sched_entity *se = &curr->se, *pse = &p->se;
struct cfs_rq *cfs_rq = task_cfs_rq(curr);
int cse_is_idle, pse_is_idle;
@@ -8298,7 +8298,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
* prevents us from potentially nominating it as a false LAST_BUDDY
* below.
*/
- if (test_tsk_need_resched(curr))
+ if (test_tsk_need_resched(rq->curr))
return;

/* Idle tasks are by definition preempted by non-idle tasks. */
@@ -9282,7 +9282,7 @@ static bool __update_blocked_others(struct rq *rq, bool *done)
* update_load_avg() can call cpufreq_update_util(). Make sure that RT,
* DL and IRQ signals have been updated before updating CFS.
*/
- curr_class = rq->curr->sched_class;
+ curr_class = rq_selected(rq)->sched_class;

thermal_pressure = arch_scale_thermal_pressure(cpu_of(rq));

@@ -12673,7 +12673,7 @@ prio_changed_fair(struct rq *rq, struct task_struct *p, int oldprio)
* our priority decreased, or if we are not currently running on
* this runqueue and our priority is higher than the current's
*/
- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
if (p->prio > oldprio)
resched_curr(rq);
} else
@@ -12776,7 +12776,7 @@ static void switched_to_fair(struct rq *rq, struct task_struct *p)
* kick off the schedule if running, otherwise just see
* if we can still preempt the current task.
*/
- if (task_current(rq, p))
+ if (task_current_selected(rq, p))
resched_curr(rq);
else
wakeup_preempt(rq, p, 0);
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 638e7c158ae4..48fc7a198f1a 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -530,7 +530,7 @@ static void dequeue_rt_entity(struct sched_rt_entity *rt_se, unsigned int flags)

static void sched_rt_rq_enqueue(struct rt_rq *rt_rq)
{
- struct task_struct *curr = rq_of_rt_rq(rt_rq)->curr;
+ struct task_struct *curr = rq_selected(rq_of_rt_rq(rt_rq));
struct rq *rq = rq_of_rt_rq(rt_rq);
struct sched_rt_entity *rt_se;

@@ -1000,7 +1000,7 @@ static int sched_rt_runtime_exceeded(struct rt_rq *rt_rq)
*/
static void update_curr_rt(struct rq *rq)
{
- struct task_struct *curr = rq->curr;
+ struct task_struct *curr = rq_selected(rq);
struct sched_rt_entity *rt_se = &curr->rt;
s64 delta_exec;

@@ -1543,7 +1543,7 @@ static int find_lowest_rq(struct task_struct *task);
static int
select_task_rq_rt(struct task_struct *p, int cpu, int flags)
{
- struct task_struct *curr;
+ struct task_struct *curr, *selected;
struct rq *rq;
bool test;

@@ -1555,6 +1555,7 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)

rcu_read_lock();
curr = READ_ONCE(rq->curr); /* unlocked access */
+ selected = READ_ONCE(rq_selected(rq));

/*
* If the current task on @p's runqueue is an RT task, then
@@ -1583,8 +1584,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)
* systems like big.LITTLE.
*/
test = curr &&
- unlikely(rt_task(curr)) &&
- (curr->nr_cpus_allowed < 2 || curr->prio <= p->prio);
+ unlikely(rt_task(selected)) &&
+ (curr->nr_cpus_allowed < 2 || selected->prio <= p->prio);

if (test || !rt_task_fits_capacity(p, cpu)) {
int target = find_lowest_rq(p);
@@ -1614,12 +1615,8 @@ select_task_rq_rt(struct task_struct *p, int cpu, int flags)

static void check_preempt_equal_prio(struct rq *rq, struct task_struct *p)
{
- /*
- * Current can't be migrated, useless to reschedule,
- * let's hope p can move out.
- */
if (rq->curr->nr_cpus_allowed == 1 ||
- !cpupri_find(&rq->rd->cpupri, rq->curr, NULL))
+ !cpupri_find(&rq->rd->cpupri, rq_selected(rq), NULL))
return;

/*
@@ -1662,7 +1659,9 @@ static int balance_rt(struct rq *rq, struct task_struct *p, struct rq_flags *rf)
*/
static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
{
- if (p->prio < rq->curr->prio) {
+ struct task_struct *curr = rq_selected(rq);
+
+ if (p->prio < curr->prio) {
resched_curr(rq);
return;
}
@@ -1680,7 +1679,7 @@ static void wakeup_preempt_rt(struct rq *rq, struct task_struct *p, int flags)
* to move current somewhere else, making room for our non-migratable
* task.
*/
- if (p->prio == rq->curr->prio && !test_tsk_need_resched(rq->curr))
+ if (p->prio == curr->prio && !test_tsk_need_resched(rq->curr))
check_preempt_equal_prio(rq, p);
#endif
}
@@ -1705,7 +1704,7 @@ static inline void set_next_task_rt(struct rq *rq, struct task_struct *p, bool f
* utilization. We only care of the case where we start to schedule a
* rt task
*/
- if (rq->curr->sched_class != &rt_sched_class)
+ if (rq_selected(rq)->sched_class != &rt_sched_class)
update_rt_rq_load_avg(rq_clock_pelt(rq), rq, 0);

rt_queue_push_tasks(rq);
@@ -1977,6 +1976,7 @@ static struct task_struct *pick_next_pushable_task(struct rq *rq)

BUG_ON(rq->cpu != task_cpu(p));
BUG_ON(task_current(rq, p));
+ BUG_ON(task_current_selected(rq, p));
BUG_ON(p->nr_cpus_allowed <= 1);

BUG_ON(!task_on_rq_queued(p));
@@ -2009,7 +2009,7 @@ static int push_rt_task(struct rq *rq, bool pull)
* higher priority than current. If that's the case
* just reschedule current.
*/
- if (unlikely(next_task->prio < rq->curr->prio)) {
+ if (unlikely(next_task->prio < rq_selected(rq)->prio)) {
resched_curr(rq);
return 0;
}
@@ -2362,7 +2362,7 @@ static void pull_rt_task(struct rq *this_rq)
* p if it is lower in priority than the
* current task on the run queue
*/
- if (p->prio < src_rq->curr->prio)
+ if (p->prio < rq_selected(src_rq)->prio)
goto skip;

if (is_migration_disabled(p)) {
@@ -2404,9 +2404,9 @@ static void task_woken_rt(struct rq *rq, struct task_struct *p)
bool need_to_push = !task_on_cpu(rq, p) &&
!test_tsk_need_resched(rq->curr) &&
p->nr_cpus_allowed > 1 &&
- (dl_task(rq->curr) || rt_task(rq->curr)) &&
+ (dl_task(rq_selected(rq)) || rt_task(rq_selected(rq))) &&
(rq->curr->nr_cpus_allowed < 2 ||
- rq->curr->prio <= p->prio);
+ rq_selected(rq)->prio <= p->prio);

if (need_to_push)
push_rt_tasks(rq);
@@ -2490,7 +2490,7 @@ static void switched_to_rt(struct rq *rq, struct task_struct *p)
if (p->nr_cpus_allowed > 1 && rq->rt.overloaded)
rt_queue_push_tasks(rq);
#endif /* CONFIG_SMP */
- if (p->prio < rq->curr->prio && cpu_online(cpu_of(rq)))
+ if (p->prio < rq_selected(rq)->prio && cpu_online(cpu_of(rq)))
resched_curr(rq);
}
}
@@ -2505,7 +2505,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
if (!task_on_rq_queued(p))
return;

- if (task_current(rq, p)) {
+ if (task_current_selected(rq, p)) {
#ifdef CONFIG_SMP
/*
* If our priority decreases while running, we
@@ -2531,7 +2531,7 @@ prio_changed_rt(struct rq *rq, struct task_struct *p, int oldprio)
* greater than the current running task
* then reschedule.
*/
- if (p->prio < rq->curr->prio)
+ if (p->prio < rq_selected(rq)->prio)
resched_curr(rq);
}
}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index c83e5e0672dc..808d6ee8ae33 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1030,7 +1030,7 @@ struct rq {
*/
unsigned int nr_uninterruptible;

- struct task_struct __rcu *curr;
+ struct task_struct __rcu *curr; /* Execution context */
struct task_struct *idle;
struct task_struct *stop;
unsigned long next_balance;
@@ -1225,6 +1225,13 @@ DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
#define cpu_curr(cpu) (cpu_rq(cpu)->curr)
#define raw_rq() raw_cpu_ptr(&runqueues)

+/* For now, rq_selected == rq->curr */
+#define rq_selected(rq) ((rq)->curr)
+static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
+{
+ /* Do nothing */
+}
+
struct sched_group;
#ifdef CONFIG_SCHED_CORE
static inline struct cpumask *sched_group_span(struct sched_group *sg);
@@ -2148,11 +2155,25 @@ static inline u64 global_rt_runtime(void)
return (u64)sysctl_sched_rt_runtime * NSEC_PER_USEC;
}

+/*
+ * Is p the current execution context?
+ */
static inline int task_current(struct rq *rq, struct task_struct *p)
{
return rq->curr == p;
}

+/*
+ * Is p the current scheduling context?
+ *
+ * Note that it might be the current execution context at the same time if
+ * rq->curr == rq_selected() == p.
+ */
+static inline int task_current_selected(struct rq *rq, struct task_struct *p)
+{
+ return rq_selected(rq) == p;
+}
+
static inline int task_on_cpu(struct rq *rq, struct task_struct *p)
{
#ifdef CONFIG_SMP
@@ -2322,7 +2343,7 @@ struct sched_class {

static inline void put_prev_task(struct rq *rq, struct task_struct *prev)
{
- WARN_ON_ONCE(rq->curr != prev);
+ WARN_ON_ONCE(rq_selected(rq) != prev);
prev->sched_class->put_prev_task(rq, prev);
}

--
2.43.0.687.g38aa6559b0-goog
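
For context, the sched.h hunk above only adds placeholders: rq_selected()
still aliases rq->curr and rq_set_selected() is a no-op. A minimal sketch
of the shape this is building toward once the scheduling and execution
contexts actually split (the curr_selected field name and the accessor
bodies below are assumptions for illustration, not part of this patch):

	/* in struct rq: */
	struct task_struct __rcu *curr;			/* Execution context */
	struct task_struct __rcu *curr_selected;	/* Scheduling context; assumed name */

	/* accessors replacing the placeholders above: */
	#define rq_selected(rq)		((rq)->curr_selected)
	static inline void rq_set_selected(struct rq *rq, struct task_struct *t)
	{
		rcu_assign_pointer(rq->curr_selected, t);
	}

With a split like that, the paths converted to rq_selected() or
task_current_selected() (time accounting, preemption and priority checks)
follow the task that was picked to run, while rq->curr keeps naming the
task whose context is physically on the CPU.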


2024-02-10 00:26:14

by John Stultz

[permalink] [raw]
Subject: [PATCH v8 6/7] sched: Split out __schedule() deactivate task logic into a helper

As we're going to re-use the deactivation logic later in the
series, split it out into a helper function.

Cc: Joel Fernandes <[email protected]>
Cc: Qais Yousef <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Ben Segall <[email protected]>
Cc: Zimuzo Ezeozue <[email protected]>
Cc: Youssef Esmat <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Daniel Bristot de Oliveira <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Waiman Long <[email protected]>
Cc: Boqun Feng <[email protected]>
Cc: "Paul E. McKenney" <[email protected]>
Cc: Metin Kaya <[email protected]>
Cc: Xuewen Yan <[email protected]>
Cc: K Prateek Nayak <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Signed-off-by: John Stultz <[email protected]>
---
v6:
* Define the function as static to avoid "no previous prototype"
warnings, as Reported-by: kernel test robot <[email protected]>
v7:
* Rename the state argument to task_state to be more clear, as
suggested by Metin Kaya
---
kernel/sched/core.c | 72 +++++++++++++++++++++++++++------------------
1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ad4748327651..b537e5f501ea 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6563,6 +6563,48 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
# define SM_MASK_PREEMPT SM_PREEMPT
#endif

+/*
+ * Helper function for __schedule()
+ *
+ * If the task has no signals pending, deactivate it and return true.
+ * Otherwise, mark the task's __state as TASK_RUNNING and return false.
+ */
+static bool try_to_deactivate_task(struct rq *rq, struct task_struct *p,
+ unsigned long task_state)
+{
+ if (signal_pending_state(task_state, p)) {
+ WRITE_ONCE(p->__state, TASK_RUNNING);
+ } else {
+ p->sched_contributes_to_load =
+ (task_state & TASK_UNINTERRUPTIBLE) &&
+ !(task_state & TASK_NOLOAD) &&
+ !(task_state & TASK_FROZEN);
+
+ if (p->sched_contributes_to_load)
+ rq->nr_uninterruptible++;
+
+ /*
+ * __schedule() ttwu()
+ * prev_state = prev->state; if (p->on_rq && ...)
+ * if (prev_state) goto out;
+ * p->on_rq = 0; smp_acquire__after_ctrl_dep();
+ * p->state = TASK_WAKING
+ *
+ * Where __schedule() and ttwu() have matching control dependencies.
+ *
+ * After this, schedule() must not care about p->state any more.
+ */
+ deactivate_task(rq, p, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
+
+ if (p->in_iowait) {
+ atomic_inc(&rq->nr_iowait);
+ delayacct_blkio_start();
+ }
+ return true;
+ }
+ return false;
+}
+
/*
* __schedule() is the main scheduler function.
*
@@ -6654,35 +6696,7 @@ static void __sched notrace __schedule(unsigned int sched_mode)
*/
prev_state = READ_ONCE(prev->__state);
if (!(sched_mode & SM_MASK_PREEMPT) && prev_state) {
- if (signal_pending_state(prev_state, prev)) {
- WRITE_ONCE(prev->__state, TASK_RUNNING);
- } else {
- prev->sched_contributes_to_load =
- (prev_state & TASK_UNINTERRUPTIBLE) &&
- !(prev_state & TASK_NOLOAD) &&
- !(prev_state & TASK_FROZEN);
-
- if (prev->sched_contributes_to_load)
- rq->nr_uninterruptible++;
-
- /*
- * __schedule() ttwu()
- * prev_state = prev->state; if (p->on_rq && ...)
- * if (prev_state) goto out;
- * p->on_rq = 0; smp_acquire__after_ctrl_dep();
- * p->state = TASK_WAKING
- *
- * Where __schedule() and ttwu() have matching control dependencies.
- *
- * After this, schedule() must not care about p->state any more.
- */
- deactivate_task(rq, prev, DEQUEUE_SLEEP | DEQUEUE_NOCLOCK);
-
- if (prev->in_iowait) {
- atomic_inc(&rq->nr_iowait);
- delayacct_blkio_start();
- }
- }
+ try_to_deactivate_task(rq, prev, prev_state);
switch_count = &prev->nvcsw;
}

--
2.43.0.687.g38aa6559b0-goog
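
One thing worth noting about the helper above: __schedule() ignores its
bool return for now. A small usage sketch of how a later caller might
consume it (illustrative only; the caller below is hypothetical, not
taken from this series):

	if (try_to_deactivate_task(rq, p, task_state)) {
		/* p had no pending signals and has been dequeued asleep */
	} else {
		/* a signal was pending: p->__state was reset to TASK_RUNNING */
	}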


2024-02-10 16:51:49

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v8 7/7] sched: Split scheduler and execution contexts

Hi John,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on tip/master linus/master v6.8-rc3 next-20240209]
[cannot apply to tip/locking/core tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/John-Stultz/locking-mutex-Remove-wakeups-from-under-mutex-wait_lock/20240210-082732
base: tip/sched/core
patch link: https://lore.kernel.org/r/20240210002328.4126422-8-jstultz%40google.com
patch subject: [PATCH v8 7/7] sched: Split scheduler and execution contexts
config: x86_64-randconfig-r121-20240210 (https://download.01.org/0day-ci/archive/20240211/[email protected]/config)
compiler: clang version 17.0.6 (https://github.com/llvm/llvm-project 6009708b4367171ccdbf4b5905cb6a803753fe18)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240211/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

sparse warnings: (new ones prefixed by >>)
kernel/sched/core.c:264:56: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:264:56: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:264:56: sparse: struct task_struct *
kernel/sched/core.c:796:55: sparse: sparse: incorrect type in argument 2 (different address spaces) @@ expected struct task_struct *p @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:796:55: sparse: expected struct task_struct *p
kernel/sched/core.c:796:55: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:1042:38: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct task_struct *curr @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:1042:38: sparse: expected struct task_struct *curr
kernel/sched/core.c:1042:38: sparse: got struct task_struct [noderef] __rcu *curr
>> kernel/sched/core.c:2240:40: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct task_struct *selected @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:2240:40: sparse: expected struct task_struct *selected
kernel/sched/core.c:2240:40: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:2251:68: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct *tsk @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:2251:68: sparse: expected struct task_struct *tsk
kernel/sched/core.c:2251:68: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:3745:17: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sched_domain *[assigned] sd @@ got struct sched_domain [noderef] __rcu *parent @@
kernel/sched/core.c:3745:17: sparse: expected struct sched_domain *[assigned] sd
kernel/sched/core.c:3745:17: sparse: got struct sched_domain [noderef] __rcu *parent
kernel/sched/core.c:3952:36: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct const *p @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:3952:36: sparse: expected struct task_struct const *p
kernel/sched/core.c:3952:36: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:9472:43: sparse: sparse: incorrect type in initializer (different address spaces) @@ expected struct task_struct *push_task @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:9472:43: sparse: expected struct task_struct *push_task
kernel/sched/core.c:9472:43: sparse: got struct task_struct [noderef] __rcu *curr
>> kernel/sched/core.c:5671:18: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct task_struct *selected @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:5671:18: sparse: expected struct task_struct *selected
kernel/sched/core.c:5671:18: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:6315:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:6315:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:6315:32: sparse: struct task_struct *
kernel/sched/core.c:6346:23: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:6346:23: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:6346:23: sparse: struct task_struct *
kernel/sched/core.c:6354:46: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:6354:46: sparse: struct task_struct *
kernel/sched/core.c:6354:46: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:6414:9: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct sched_domain *[assigned] sd @@ got struct sched_domain [noderef] __rcu *parent @@
kernel/sched/core.c:6414:9: sparse: expected struct sched_domain *[assigned] sd
kernel/sched/core.c:6414:9: sparse: got struct sched_domain [noderef] __rcu *parent
kernel/sched/core.c:6667:14: sparse: sparse: incorrect type in assignment (different address spaces) @@ expected struct task_struct *prev @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:6667:14: sparse: expected struct task_struct *prev
kernel/sched/core.c:6667:14: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:7201:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:7201:17: sparse: struct task_struct *
kernel/sched/core.c:7201:17: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:7416:22: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:7416:22: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:7416:22: sparse: struct task_struct *
kernel/sched/core.c:7463:48: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:7463:48: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:7463:48: sparse: struct task_struct *
kernel/sched/core.c:11505:25: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct *p @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/core.c:11505:25: sparse: expected struct task_struct *p
kernel/sched/core.c:11505:25: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:338:9: sparse: sparse: context imbalance in 'sched_core_lock' - different lock contexts for basic block
kernel/sched/core.c:348:17: sparse: sparse: context imbalance in 'sched_core_unlock' - unexpected unlock
kernel/sched/core.c:551:6: sparse: sparse: context imbalance in 'raw_spin_rq_lock_nested' - wrong count at exit
kernel/sched/core.c:584:23: sparse: sparse: context imbalance in 'raw_spin_rq_trylock' - wrong count at exit
kernel/sched/core.c:600:6: sparse: sparse: context imbalance in 'raw_spin_rq_unlock' - unexpected unlock
kernel/sched/core.c: note: in included file:
kernel/sched/sched.h:1619:9: sparse: sparse: context imbalance in '__task_rq_lock' - wrong count at exit
kernel/sched/sched.h:1619:9: sparse: sparse: context imbalance in 'task_rq_lock' - wrong count at exit
kernel/sched/core.c: note: in included file:
kernel/sched/pelt.h:97:13: sparse: sparse: incorrect type in argument 1 (different address spaces) @@ expected struct task_struct const *p @@ got struct task_struct [noderef] __rcu *curr @@
kernel/sched/pelt.h:97:13: sparse: expected struct task_struct const *p
kernel/sched/pelt.h:97:13: sparse: got struct task_struct [noderef] __rcu *curr
kernel/sched/core.c:796:9: sparse: sparse: dereference of noderef expression
kernel/sched/core.c:1502:13: sparse: sparse: context imbalance in 'uclamp_update_util_min_rt_default' - wrong count at exit
kernel/sched/core.c:1834:9: sparse: sparse: context imbalance in 'uclamp_sync_util_min_rt_default' - wrong count at exit
kernel/sched/core.c:2321:35: sparse: sparse: context imbalance in 'wait_task_inactive' - different lock contexts for basic block
kernel/sched/core.c: note: in included file:
kernel/sched/sched.h:2163:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2163:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2163:32: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/core.c:2215:38: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/core.c:2215:38: sparse: struct task_struct [noderef] __rcu *
kernel/sched/core.c:2215:38: sparse: struct task_struct const *
kernel/sched/sched.h:2163:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2163:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2163:32: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2163:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2163:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2163:32: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2163:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2163:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2163:32: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/sched.h:2163:32: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2163:32: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2163:32: sparse: struct task_struct *
kernel/sched/sched.h:2335:9: sparse: sparse: incompatible types in comparison expression (different address spaces):
kernel/sched/sched.h:2335:9: sparse: struct task_struct [noderef] __rcu *
kernel/sched/sched.h:2335:9: sparse: struct task_struct *
kernel/sched/core.c: note: in included file (through arch/x86/include/asm/uaccess.h, include/linux/uaccess.h, include/linux/sched/task.h, ...):
arch/x86/include/asm/uaccess_64.h:88:24: sparse: sparse: cast removes address space '__user' of expression
kernel/sched/core.c:8333:5: sparse: sparse: context imbalance in 'dl_task_check_affinity' - wrong count at exit
kernel/sched/core.c:8437:9: sparse: sparse: context imbalance in 'sched_setaffinity' - different lock contexts for basic block
kernel/sched/core.c:8499:6: sparse: sparse: context imbalance in 'sched_getaffinity' - wrong count at exit
kernel/sched/core.c:8662:28: sparse: sparse: context imbalance in '__cond_resched_lock' - unexpected unlock
kernel/sched/core.c:8680:17: sparse: sparse: context imbalance in '__cond_resched_rwlock_read' - unexpected unlock
kernel/sched/core.c:8698:17: sparse: sparse: context imbalance in '__cond_resched_rwlock_write' - unexpected unlock
kernel/sched/core.c:11670:5: sparse: sparse: context imbalance in '__sched_mm_cid_migrate_from_fetch_cid' - different lock contexts for basic block

vim +2240 kernel/sched/core.c

2237
2238 void wakeup_preempt(struct rq *rq, struct task_struct *p, int flags)
2239 {
> 2240 struct task_struct *selected = rq_selected(rq);
2241
2242 if (p->sched_class == selected->sched_class)
2243 selected->sched_class->wakeup_preempt(rq, p, flags);
2244 else if (sched_class_above(p->sched_class, selected->sched_class))
2245 resched_curr(rq);
2246
2247 /*
2248 * A queue event has occurred, and we're going to schedule. In
2249 * this case, we can save a useless back to back clock update.
2250 */
2251 if (task_on_rq_queued(selected) && test_tsk_need_resched(rq->curr))
2252 rq_clock_skip_update(rq);
2253 }
2254

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
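
The new (>>) warnings above stem from rq_selected() expanding to the
__rcu-annotated rq->curr (see the sched.h hunk in the earlier patch), so
assigning its result to a plain struct task_struct * crosses sparse
address spaces; the READ_ONCE() wrapper used in the rt.c hunk does not
strip that annotation either. One conventional way to quiet this class of
warning (an assumption about how it could be resolved, not a fix taken
from this thread) is to do the RCU dereference inside the accessor so
callers get a plain pointer:

	/* Sketch: sparse-clean placeholder; assumes callers hold the rq
	 * lock or are otherwise in a sched-RCU read-side section. */
	#define rq_selected(rq)	(rcu_dereference_sched((rq)->curr))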

2024-02-12 15:50:53

by Metin Kaya

[permalink] [raw]
Subject: Re: [PATCH v8 0/7] Preparatory changes for Proxy Execution v8

On 10/02/2024 12:23 am, John Stultz wrote:
> After sending out v7 of Proxy Execution, I got feedback that the
> patch series was getting a bit unwieldy to review, and Qais
> suggested I break out just the cleanups/preparatory components of
> the patch series and submit them on their own in the hope we can
> start to merge the less complex bits and discussion can focus on
> the more complicated portions afterwards.
>
> So for the v8 of this series, I’m only submitting those earlier
> cleanup/preparatory changes here. However work on the full series
> has continued, with some nice progress on the performance front.

I guess we can also cherry-pick the 1st [1], 20th [2] and 22nd [3] patches
of v7 into this series?

[1]
https://lore.kernel.org/lkml/[email protected]/T/#m1e6ef390bf9044ca69818549bccec1ada221cd32
[2]
https://lore.kernel.org/lkml/[email protected]/T/#m638575b42057e1d2cc3a82d4bdea7308eb85b461
[3]
https://lore.kernel.org/lkml/[email protected]/T/#mafcb432d20c6b545998da56ba92fc51ba8a07b42

>
> If you are interested, the full v8 series, it can be found here:
> https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
> https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3

[...]


2024-02-12 18:39:24

by John Stultz

[permalink] [raw]
Subject: Re: [PATCH v8 0/7] Preparatory changes for Proxy Execution v8

On Mon, Feb 12, 2024 at 7:50 AM Metin Kaya <[email protected]> wrote:
> On 10/02/2024 12:23 am, John Stultz wrote:
> > After sending out v7 of Proxy Execution, I got feedback that the
> > patch series was getting a bit unwieldy to review, and Qais
> > suggested I break out just the cleanups/preparatory components of
> > the patch series and submit them on their own in the hope we can
> > start to merge the less complex bits and discussion can focus on
> > the more complicated portions afterwards.
> >
> > So for the v8 of this series, I’m only submitting those earlier
> > cleanup/preparatory changes here. However work on the full series
> > has continued, with some nice progress on the performance front.
>
> I guess we can also cherry-pick the 1st [1], 20th [2] and 22nd [3] patches
> of v7 into this series?
>
> [1]
> https://lore.kernel.org/lkml/[email protected]/T/#m1e6ef390bf9044ca69818549bccec1ada221cd32

That one already landed upstream w/ the deadline server changes:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d69eca542ee17c618f9a55da52191d5e28b435f

> [2]
> https://lore.kernel.org/lkml/[email protected]/T/#m638575b42057e1d2cc3a82d4bdea7308eb85b461

On this one, I wanted to get some more review of the chain-migration
logic before really pushing it.

> [3]
> https://lore.kernel.org/lkml/[email protected]/T/#mafcb432d20c6b545998da56ba92fc51ba8a07b42

I do agree this is more of a cleanup, but I also want to refactor it a
touch to consolidate the logic, so it's not quite ready.

So I'm just submitting the set here for now to try to get the ball
rolling, but I'll continue working on the larger series and will pull
things down as they make sense.

thanks
-john