2022-04-27 09:21:44

by Hao Jia

[permalink] [raw]
Subject: [PATCH v3 0/2] sched/core: Avoid obvious double update_rq_clock warning

These two patches are about the kernel scheduler:

patch 1: fixed the issue that kernel may trigger WARN_DOUBLE_CLOCK warning.
patch 2: removed some no longer needed comments in the deadline scheduler
and cleaned up the code.

Thanks for suggestion from Dietmar Eggemann & Peter Zijlstra.

v2->v3:
- Modify some descriptions and errors in the commit information.
- Fixed a bug that might trigger the WARN_DOUBLE_CLOCK warning in
migrate_task_rq_dl().
- Add Reviewed-by: Dietmar Eggemann <[email protected]> for
patch 2.

v1->v2:
- Added double_rq_clock_clear_update inline helper to clear RQCF_UPDATED
of rq->clock_update_flags.
- split into two separate patches.

[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/

Hao Jia (2):
sched/core: Avoid obvious double update_rq_clock warning
sched/deadline: Remove superfluous rq clock update in push_dl_task()

kernel/sched/core.c | 6 +++---
kernel/sched/deadline.c | 13 ++++---------
kernel/sched/rt.c | 5 +++--
kernel/sched/sched.h | 31 +++++++++++++++++++++++++++----
4 files changed, 37 insertions(+), 18 deletions(-)

--
2.32.0


2022-04-27 10:24:09

by Hao Jia

[permalink] [raw]
Subject: [PATCH v3 1/2] sched/core: Avoid obvious double update_rq_clock warning

When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.

Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other cores,
the core rq->clock_update_flags may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.

So we need to clear RQCF_UPDATED of rq->clock_update_flags synchronously
to avoid the WARN_DOUBLE_CLOCK warning.

For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().

For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.

Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.

Signed-off-by: Hao Jia <[email protected]>
---
kernel/sched/core.c | 6 +++---
kernel/sched/deadline.c | 5 +++--
kernel/sched/rt.c | 5 +++--
kernel/sched/sched.h | 31 +++++++++++++++++++++++++++----
4 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaabac3e4..84538271b4eb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -610,10 +610,10 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
swap(rq1, rq2);

raw_spin_rq_lock(rq1);
- if (__rq_lockp(rq1) == __rq_lockp(rq2))
- return;
+ if (__rq_lockp(rq1) != __rq_lockp(rq2))
+ raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);

- raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(rq1, rq2);
}
#endif

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fb4255ae0b2c..a562bb271031 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1833,6 +1833,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
{
struct rq *rq;
+ struct rq_flags rf;

if (READ_ONCE(p->__state) != TASK_WAKING)
return;
@@ -1843,7 +1844,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
* from try_to_wake_up(). Hence, p->pi_lock is locked, but
* rq->lock is not... So, lock it
*/
- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
if (p->dl.dl_non_contending) {
update_rq_clock(rq);
sub_running_bw(&p->dl, &rq->dl);
@@ -1859,7 +1860,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
put_task_struct(p);
}
sub_rq_bw(&p->dl, &rq->dl);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a32c46889af8..7891c0f0e1ff 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -871,6 +871,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
int enqueue = 0;
struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
struct rq *rq = rq_of_rt_rq(rt_rq);
+ struct rq_flags rf;
int skip;

/*
@@ -885,7 +886,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (skip)
continue;

- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
update_rq_clock(rq);

if (rt_rq->rt_time) {
@@ -923,7 +924,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)

if (enqueue)
sched_rt_rq_enqueue(rt_rq);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8dccb34eb190..975e7d990a0b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2478,6 +2478,27 @@ unsigned long arch_scale_freq_capacity(int cpu)
}
#endif

+#ifdef CONFIG_SCHED_DEBUG
+/*
+ * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to acquire
+ * rq lock instead of rq_lock(). So at the end of these two functions we need to
+ * call double_rq_clock_clear_update() synchronously to clear RQCF_UPDATED of
+ * rq->clock_update_flags to avoid the WARN_DOUBLE_CLOCK warning.
+ */
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
+{
+ rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+ /*
+ * If CONFIG_SMP is not defined, rq1 and rq2 are the same,
+ * so we just clear RQCF_UPDATED one of them.
+ */
+#ifdef CONFIG_SMP
+ rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+#endif
+}
+#else
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2) {}
+#endif

#ifdef CONFIG_SMP

@@ -2543,14 +2564,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
__acquires(busiest->lock)
__acquires(this_rq->lock)
{
- if (__rq_lockp(this_rq) == __rq_lockp(busiest))
- return 0;
-
- if (likely(raw_spin_rq_trylock(busiest)))
+ if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
+ likely(raw_spin_rq_trylock(busiest))) {
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
+ }

if (rq_order_less(this_rq, busiest)) {
raw_spin_rq_lock_nested(busiest, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
}

@@ -2644,6 +2666,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
BUG_ON(rq1 != rq2);
raw_spin_rq_lock(rq1);
__acquire(rq2->lock); /* Fake it out ;) */
+ double_rq_clock_clear_update(rq1, rq2);
}

/*
--
2.32.0

2022-04-27 11:03:15

by Hao Jia

[permalink] [raw]
Subject: [PATCH v3 2/2] sched/deadline: Remove superfluous rq clock update in push_dl_task()

The change to call update_rq_clock() before activate_task()
commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
when pushing a task") is no longer needed since commit f4904815f97a
("sched/deadline: Fix double accounting of rq/running bw in push & pull")
removed the add_running_bw() before the activate_task().

So we remove some comments that are no longer needed and update
rq clock in activate_task().

Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a562bb271031..83636a7fcd32 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2320,13 +2320,7 @@ static int push_dl_task(struct rq *rq)

deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
-
- /*
- * Update the later_rq clock here, because the clock is used
- * by the cpufreq_update_util() inside __add_running_bw().
- */
- update_rq_clock(later_rq);
- activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
+ activate_task(later_rq, next_task, 0);
ret = 1;

resched_curr(later_rq);
--
2.32.0

2022-04-29 13:52:34

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH v3 1/2] sched/core: Avoid obvious double update_rq_clock warning

On 27/04/2022 10:00, Hao Jia wrote:

[...]

LGTM, comments are only nit-picks.

> Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
> rq_lock(), there is no corresponding change to rq->clock_update_flags.
> In particular, we have obtained the rq lock of other cores,

s/cores/CPUs, with SMT, a core can have multiple (logical) CPUs.

[...]

> So we need to clear RQCF_UPDATED of rq->clock_update_flags synchronously
> to avoid the WARN_DOUBLE_CLOCK warning.

Why you mention `synchronously` here? Isn't this obvious? (1)

[...]

> @@ -1833,6 +1833,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
> static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
> {
> struct rq *rq;
> + struct rq_flags rf;

- struct rq *rq;
struct rq_flags rf;
+ struct rq *rq;

Use reverse fir tree order for variable declarations.
(./Documentation/process/maintainer-tip.rst)

[...]

> +#ifdef CONFIG_SCHED_DEBUG
> +/*
> + * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to acquire
> + * rq lock instead of rq_lock(). So at the end of these two functions we need to
> + * call double_rq_clock_clear_update() synchronously to clear RQCF_UPDATED of
^^^^^^^^^^^^^
(1)

> + * rq->clock_update_flags to avoid the WARN_DOUBLE_CLOCK warning.
> + */
> +static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
> +{
> + rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
> + /*
> + * If CONFIG_SMP is not defined, rq1 and rq2 are the same,
> + * so we just clear RQCF_UPDATED one of them.
> + */

Maybe shorter:

/* rq1 == rq2 for !CONFIG_SMP, so just clear RQCF_UPDATED once. */

[...]

> @@ -2543,14 +2564,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
> __acquires(busiest->lock)
> __acquires(this_rq->lock)
> {
> - if (__rq_lockp(this_rq) == __rq_lockp(busiest))
> - return 0;
> -
> - if (likely(raw_spin_rq_trylock(busiest)))
> + if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
> + likely(raw_spin_rq_trylock(busiest))) {

indention:

if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
- likely(raw_spin_rq_trylock(busiest))) {
+ likely(raw_spin_rq_trylock(busiest))) {

[...]

Reviewed-by: Dietmar Eggemann <[email protected]>

Tested on Arm64 Juno with mainline & preempt-rt (linux-5.17.y-rt).

Subject: Re: [PATCH v3 2/2] sched/deadline: Remove superfluous rq clock update in push_dl_task()

On 4/27/22 10:00, Hao Jia wrote:
> The change to call update_rq_clock() before activate_task()
> commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
> when pushing a task") is no longer needed since commit f4904815f97a
> ("sched/deadline: Fix double accounting of rq/running bw in push & pull")
> removed the add_running_bw() before the activate_task().
>
> So we remove some comments that are no longer needed and update
> rq clock in activate_task().
>
> Signed-off-by: Hao Jia <[email protected]>
> Reviewed-by: Dietmar Eggemann <[email protected]>

Reviewed-by: Daniel Bristot de Oliveira <[email protected]>

-- Daniel

2022-05-03 01:17:13

by Hao Jia

[permalink] [raw]
Subject: Re: [External] Re: [PATCH v3 1/2] sched/core: Avoid obvious double update_rq_clock warning



On 2022/4/29 Dietmar Eggemann wrote:
> On 27/04/2022 10:00, Hao Jia wrote:
>
> [...]
>
> LGTM, comments are only nit-picks.
>
>> Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
>> rq_lock(), there is no corresponding change to rq->clock_update_flags.
>> In particular, we have obtained the rq lock of other cores,
>
> s/cores/CPUs, with SMT, a core can have multiple (logical) CPUs.

Thanks for your review comments.
I will do it in patch v4.
Thanks.

>
> [...]
>
>> So we need to clear RQCF_UPDATED of rq->clock_update_flags synchronously
>> to avoid the WARN_DOUBLE_CLOCK warning.
>
> Why you mention `synchronously` here? Isn't this obvious? (1)

I will do it in patch v4.
Thanks.

>
> [...]
>
>> @@ -1833,6 +1833,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)
>> static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
>> {
>> struct rq *rq;
>> + struct rq_flags rf;
>
> - struct rq *rq;
> struct rq_flags rf;
> + struct rq *rq;
>
> Use reverse fir tree order for variable declarations.
> (./Documentation/process/maintainer-tip.rst)

I will do it in patch v4.
Thanks.

>
> [...]
>
>> +#ifdef CONFIG_SCHED_DEBUG
>> +/*
>> + * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to acquire
>> + * rq lock instead of rq_lock(). So at the end of these two functions we need to
>> + * call double_rq_clock_clear_update() synchronously to clear RQCF_UPDATED of
> ^^^^^^^^^^^^^
> (1)
>
>> + * rq->clock_update_flags to avoid the WARN_DOUBLE_CLOCK warning.
>> + */
>> +static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
>> +{
>> + rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
>> + /*
>> + * If CONFIG_SMP is not defined, rq1 and rq2 are the same,
>> + * so we just clear RQCF_UPDATED one of them.
>> + */
>
> Maybe shorter:
>
> /* rq1 == rq2 for !CONFIG_SMP, so just clear RQCF_UPDATED once. */

I will do it in patch v4.
Thanks.
>
> [...]
>
>> @@ -2543,14 +2564,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
>> __acquires(busiest->lock)
>> __acquires(this_rq->lock)
>> {
>> - if (__rq_lockp(this_rq) == __rq_lockp(busiest))
>> - return 0;
>> -
>> - if (likely(raw_spin_rq_trylock(busiest)))
>> + if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
>> + likely(raw_spin_rq_trylock(busiest))) {
>
> indention:
>
> if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
> - likely(raw_spin_rq_trylock(busiest))) {
> + likely(raw_spin_rq_trylock(busiest))) {

Thanks again for your review and suggestion.
I will do it in patch v4.
Thanks.

>
> [...]
>
> Reviewed-by: Dietmar Eggemann <[email protected]>
>
> Tested on Arm64 Juno with mainline & preempt-rt (linux-5.17.y-rt).