LinuxLists.cc - [PATCH v4 0/2] sched/core: Avoid obvious double update_rq

2022-05-02 02:58:13

Subject: [PATCH v4 0/2] sched/core: Avoid obvious double update_rq_clock warning

These two patches are about the kernel scheduler:

patch 1: fixed the issue that kernel may trigger WARN_DOUBLE_CLOCK warning.
patch 2: removed some no longer needed comments in the deadline scheduler
and cleaned up the code.

Thanks for suggestion from Dietmar Eggemann & Peter Zijlstra.

v3->v4:
- Modify some code formats and commit messages.
- Add Reviewed-by: Dietmar Eggemann <[email protected]> for
patch 1.
- Add Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
for patch 2.

v2->v3:
- Modify some descriptions and errors in the commit information.
- Fixed a bug that might trigger the WARN_DOUBLE_CLOCK warning in
migrate_task_rq_dl().
- Add Reviewed-by: Dietmar Eggemann <[email protected]> for
patch 2.

v1->v2:
- Added double_rq_clock_clear_update inline helper to clear RQCF_UPDATED
of rq->clock_update_flags.
- split into two separate patches.

[v1] https://lore.kernel.org/lkml/[email protected]/
[v2] https://lore.kernel.org/lkml/[email protected]/
[v3] https://lore.kernel.org/all/[email protected]/

Hao Jia (2):
sched/core: Avoid obvious double update_rq_clock warning
sched/deadline: Remove superfluous rq clock update in push_dl_task()

kernel/sched/core.c | 6 +++---
kernel/sched/deadline.c | 13 ++++---------
kernel/sched/rt.c | 5 +++--
kernel/sched/sched.h | 28 ++++++++++++++++++++++++----
4 files changed, 34 insertions(+), 18 deletions(-)

--
2.32.0

2022-05-02 13:32:06

by Hao Jia

[permalink] [raw]

Subject: [PATCH v4 2/2] sched/deadline: Remove superfluous rq clock update in push_dl_task()

The change to call update_rq_clock() before activate_task()
commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
when pushing a task") is no longer needed since commit f4904815f97a
("sched/deadline: Fix double accounting of rq/running bw in push & pull")
removed the add_running_bw() before the activate_task().

So we remove some comments that are no longer needed and update
rq clock in activate_task().

Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
---
kernel/sched/deadline.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b61281d10458..877f55f3a60e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2320,13 +2320,7 @@ static int push_dl_task(struct rq *rq)

deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
-
- /*
- * Update the later_rq clock here, because the clock is used
- * by the cpufreq_update_util() inside __add_running_bw().
- */
- update_rq_clock(later_rq);
- activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
+ activate_task(later_rq, next_task, 0);
ret = 1;

resched_curr(later_rq);
--
2.32.0

2022-05-03 01:01:51

by Hao Jia

[permalink] [raw]

Subject: [PATCH v4 1/2] sched/core: Avoid obvious double update_rq_clock warning

When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.

Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other CPUs, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.

So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.

For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().

For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.

Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.

Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/core.c | 6 +++---
kernel/sched/deadline.c | 5 +++--
kernel/sched/rt.c | 5 +++--
kernel/sched/sched.h | 28 ++++++++++++++++++++++++----
4 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 51efaabac3e4..84538271b4eb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -610,10 +610,10 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
swap(rq1, rq2);

raw_spin_rq_lock(rq1);
- if (__rq_lockp(rq1) == __rq_lockp(rq2))
- return;
+ if (__rq_lockp(rq1) != __rq_lockp(rq2))
+ raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);

- raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(rq1, rq2);
}
#endif

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index fb4255ae0b2c..b61281d10458 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1832,6 +1832,7 @@ select_task_rq_dl(struct task_struct *p, int cpu, int flags)

static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
{
+ struct rq_flags rf;
struct rq *rq;

if (READ_ONCE(p->__state) != TASK_WAKING)
@@ -1843,7 +1844,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
* from try_to_wake_up(). Hence, p->pi_lock is locked, but
* rq->lock is not... So, lock it
*/
- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
if (p->dl.dl_non_contending) {
update_rq_clock(rq);
sub_running_bw(&p->dl, &rq->dl);
@@ -1859,7 +1860,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
put_task_struct(p);
}
sub_rq_bw(&p->dl, &rq->dl);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a32c46889af8..7891c0f0e1ff 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -871,6 +871,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
int enqueue = 0;
struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
struct rq *rq = rq_of_rt_rq(rt_rq);
+ struct rq_flags rf;
int skip;

/*
@@ -885,7 +886,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (skip)
continue;

- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
update_rq_clock(rq);

if (rt_rq->rt_time) {
@@ -923,7 +924,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)

if (enqueue)
sched_rt_rq_enqueue(rt_rq);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 8dccb34eb190..2133aea22086 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2478,6 +2478,24 @@ unsigned long arch_scale_freq_capacity(int cpu)
}
#endif

+#ifdef CONFIG_SCHED_DEBUG
+/*
+ * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to
+ * acquire rq lock instead of rq_lock(). So at the end of these two functions
+ * we need to call double_rq_clock_clear_update() to clear RQCF_UPDATED of
+ * rq->clock_update_flags to avoid the WARN_DOUBLE_CLOCK warning.
+ */
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
+{
+ rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+ /* rq1 == rq2 for !CONFIG_SMP, so just clear RQCF_UPDATED once. */
+#ifdef CONFIG_SMP
+ rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+#endif
+}
+#else
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2) {}
+#endif

#ifdef CONFIG_SMP

@@ -2543,14 +2561,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
__acquires(busiest->lock)
__acquires(this_rq->lock)
{
- if (__rq_lockp(this_rq) == __rq_lockp(busiest))
- return 0;
-
- if (likely(raw_spin_rq_trylock(busiest)))
+ if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
+ likely(raw_spin_rq_trylock(busiest))) {
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
+ }

if (rq_order_less(this_rq, busiest)) {
raw_spin_rq_lock_nested(busiest, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
}

@@ -2644,6 +2663,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
BUG_ON(rq1 != rq2);
raw_spin_rq_lock(rq1);
__acquire(rq2->lock); /* Fake it out ;) */
+ double_rq_clock_clear_update(rq1, rq2);
}

/*
--
2.32.0

2022-05-09 10:07:17

by Hao Jia

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] sched/core: Avoid obvious double update_rq_clock warning

Friendly ping...

Hi Dietmar Eggemann & Peter Zijlstra,
If you have time, please review these two patches.

On 2022/4/30 Hao Jia wrote:
> These two patches are about the kernel scheduler:
>
> patch 1: fixed the issue that kernel may trigger WARN_DOUBLE_CLOCK warning.
> patch 2: removed some no longer needed comments in the deadline scheduler
> and cleaned up the code.
>
> Thanks for suggestion from Dietmar Eggemann & Peter Zijlstra.
>
> v3->v4:
> - Modify some code formats and commit messages.
> - Add Reviewed-by: Dietmar Eggemann <[email protected]> for
> patch 1.
> - Add Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
> for patch 2.
>
> v2->v3:
> - Modify some descriptions and errors in the commit information.
> - Fixed a bug that might trigger the WARN_DOUBLE_CLOCK warning in
> migrate_task_rq_dl().
> - Add Reviewed-by: Dietmar Eggemann <[email protected]> for
> patch 2.
>
> v1->v2:
> - Added double_rq_clock_clear_update inline helper to clear RQCF_UPDATED
> of rq->clock_update_flags.
> - split into two separate patches.
>
> [v1] https://lore.kernel.org/lkml/[email protected]/
> [v2] https://lore.kernel.org/lkml/[email protected]/
> [v3] https://lore.kernel.org/all/[email protected]/
>
> Hao Jia (2):
> sched/core: Avoid obvious double update_rq_clock warning
> sched/deadline: Remove superfluous rq clock update in push_dl_task()
>
> kernel/sched/core.c | 6 +++---
> kernel/sched/deadline.c | 13 ++++---------
> kernel/sched/rt.c | 5 +++--
> kernel/sched/sched.h | 28 ++++++++++++++++++++++++----
> 4 files changed, 34 insertions(+), 18 deletions(-)
>

2022-05-09 10:35:00

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH v4 0/2] sched/core: Avoid obvious double update_rq_clock warning

On Mon, May 09, 2022 at 11:16:39AM +0800, Hao Jia wrote:
> Friendly ping...
>
> Hi Dietmar Eggemann & Peter Zijlstra,
> If you have time, please review these two patches.

I've picked them up, shall push them to tip soonish.

Thanks!

2022-05-11 22:01:33

by tip-bot2 for Hou Wenlong

[permalink] [raw]

Subject: [tip: sched/core] sched/deadline: Remove superfluous rq clock update in push_dl_task()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 734387ec2f9d77b00276042b1fa7c95f48ee879d
Gitweb: https://git.kernel.org/tip/734387ec2f9d77b00276042b1fa7c95f48ee879d
Author: Hao Jia <[email protected]>
AuthorDate: Sat, 30 Apr 2022 16:58:43 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 11 May 2022 16:27:12 +02:00

sched/deadline: Remove superfluous rq clock update in push_dl_task()

The change to call update_rq_clock() before activate_task()
commit 840d719604b0 ("sched/deadline: Update rq_clock of later_rq
when pushing a task") is no longer needed since commit f4904815f97a
("sched/deadline: Fix double accounting of rq/running bw in push & pull")
removed the add_running_bw() before the activate_task().

So we remove some comments that are no longer needed and update
rq clock in activate_task().

Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Reviewed-by: Daniel Bristot de Oliveira <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/deadline.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 0ad2818..936817a 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2318,13 +2318,7 @@ retry:

deactivate_task(rq, next_task, 0);
set_task_cpu(next_task, later_rq->cpu);
-
- /*
- * Update the later_rq clock here, because the clock is used
- * by the cpufreq_update_util() inside __add_running_bw().
- */
- update_rq_clock(later_rq);
- activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
+ activate_task(later_rq, next_task, 0);
ret = 1;

resched_curr(later_rq);

2022-05-12 08:23:53

by tip-bot2 for Hou Wenlong

[permalink] [raw]

Subject: [tip: sched/core] sched/core: Avoid obvious double update_rq_clock warning

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 2679a83731d51a744657f718fc02c3b077e47562
Gitweb: https://git.kernel.org/tip/2679a83731d51a744657f718fc02c3b077e47562
Author: Hao Jia <[email protected]>
AuthorDate: Sat, 30 Apr 2022 16:58:42 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Wed, 11 May 2022 16:27:11 +02:00

sched/core: Avoid obvious double update_rq_clock warning

When we use raw_spin_rq_lock() to acquire the rq lock and have to
update the rq clock while holding the lock, the kernel may issue
a WARN_DOUBLE_CLOCK warning.

Since we directly use raw_spin_rq_lock() to acquire rq lock instead of
rq_lock(), there is no corresponding change to rq->clock_update_flags.
In particular, we have obtained the rq lock of other CPUs, the
rq->clock_update_flags of this CPU may be RQCF_UPDATED at this time, and
then calling update_rq_clock() will trigger the WARN_DOUBLE_CLOCK warning.

So we need to clear RQCF_UPDATED of rq->clock_update_flags to avoid
the WARN_DOUBLE_CLOCK warning.

For the sched_rt_period_timer() and migrate_task_rq_dl() cases
we simply replace raw_spin_rq_lock()/raw_spin_rq_unlock() with
rq_lock()/rq_unlock().

For the {pull,push}_{rt,dl}_task() cases, we add the
double_rq_clock_clear_update() function to clear RQCF_UPDATED of
rq->clock_update_flags, and call double_rq_clock_clear_update()
before double_lock_balance()/double_rq_lock() returns to avoid the
WARN_DOUBLE_CLOCK warning.

Some call trace reports:
Call Trace 1:
<IRQ>
sched_rt_period_timer+0x10f/0x3a0
? enqueue_top_rt_rq+0x110/0x110
__hrtimer_run_queues+0x1a9/0x490
hrtimer_interrupt+0x10b/0x240
__sysvec_apic_timer_interrupt+0x8a/0x250
sysvec_apic_timer_interrupt+0x9a/0xd0
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x12/0x20

Call Trace 2:
<TASK>
activate_task+0x8b/0x110
push_rt_task.part.108+0x241/0x2c0
push_rt_tasks+0x15/0x30
finish_task_switch+0xaa/0x2e0
? __switch_to+0x134/0x420
__schedule+0x343/0x8e0
? hrtimer_start_range_ns+0x101/0x340
schedule+0x4e/0xb0
do_nanosleep+0x8e/0x160
hrtimer_nanosleep+0x89/0x120
? hrtimer_init_sleeper+0x90/0x90
__x64_sys_nanosleep+0x96/0xd0
do_syscall_64+0x34/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 3:
<TASK>
deactivate_task+0x93/0xe0
pull_rt_task+0x33e/0x400
balance_rt+0x7e/0x90
__schedule+0x62f/0x8e0
do_task_dead+0x3f/0x50
do_exit+0x7b8/0xbb0
do_group_exit+0x2d/0x90
get_signal+0x9df/0x9e0
? preempt_count_add+0x56/0xa0
? __remove_hrtimer+0x35/0x70
arch_do_signal_or_restart+0x36/0x720
? nanosleep_copyout+0x39/0x50
? do_nanosleep+0x131/0x160
? audit_filter_inodes+0xf5/0x120
exit_to_user_mode_prepare+0x10f/0x1e0
syscall_exit_to_user_mode+0x17/0x30
do_syscall_64+0x40/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae

Call Trace 4:
update_rq_clock+0x128/0x1a0
migrate_task_rq_dl+0xec/0x310
set_task_cpu+0x84/0x1e4
try_to_wake_up+0x1d8/0x5c0
wake_up_process+0x1c/0x30
hrtimer_wakeup+0x24/0x3c
__hrtimer_run_queues+0x114/0x270
hrtimer_interrupt+0xe8/0x244
arch_timer_handler_phys+0x30/0x50
handle_percpu_devid_irq+0x88/0x140
generic_handle_domain_irq+0x40/0x60
gic_handle_irq+0x48/0xe0
call_on_irq_stack+0x2c/0x60
do_interrupt_handler+0x80/0x84

Steps to reproduce:
1. Enable CONFIG_SCHED_DEBUG when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
echo "WARN_DOUBLE_CLOCK" > /sys/kernel/debug/sched/features
echo "NO_RT_PUSH_IPI" > /sys/kernel/debug/sched/features
3. Run some rt/dl tasks that periodically work and sleep, e.g.
Create 2*n rt or dl (90% running) tasks via rt-app (on a system
with n CPUs), and Dietmar Eggemann reports Call Trace 4 when running
on PREEMPT_RT kernel.

Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Dietmar Eggemann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/core.c | 6 +++---
kernel/sched/deadline.c | 5 +++--
kernel/sched/rt.c | 5 +++--
kernel/sched/sched.h | 28 ++++++++++++++++++++++++----
4 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 48cfad1..2c9f5e9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -613,10 +613,10 @@ void double_rq_lock(struct rq *rq1, struct rq *rq2)
swap(rq1, rq2);

raw_spin_rq_lock(rq1);
- if (__rq_lockp(rq1) == __rq_lockp(rq2))
- return;
+ if (__rq_lockp(rq1) != __rq_lockp(rq2))
+ raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);

- raw_spin_rq_lock_nested(rq2, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(rq1, rq2);
}
#endif

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 6ae4236..0ad2818 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1830,6 +1830,7 @@ out:

static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused)
{
+ struct rq_flags rf;
struct rq *rq;

if (READ_ONCE(p->__state) != TASK_WAKING)
@@ -1841,7 +1842,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
* from try_to_wake_up(). Hence, p->pi_lock is locked, but
* rq->lock is not... So, lock it
*/
- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
if (p->dl.dl_non_contending) {
update_rq_clock(rq);
sub_running_bw(&p->dl, &rq->dl);
@@ -1857,7 +1858,7 @@ static void migrate_task_rq_dl(struct task_struct *p, int new_cpu __maybe_unused
put_task_struct(p);
}
sub_rq_bw(&p->dl, &rq->dl);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

static void check_preempt_equal_dl(struct rq *rq, struct task_struct *p)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index a32c468..7891c0f 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -871,6 +871,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
int enqueue = 0;
struct rt_rq *rt_rq = sched_rt_period_rt_rq(rt_b, i);
struct rq *rq = rq_of_rt_rq(rt_rq);
+ struct rq_flags rf;
int skip;

/*
@@ -885,7 +886,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)
if (skip)
continue;

- raw_spin_rq_lock(rq);
+ rq_lock(rq, &rf);
update_rq_clock(rq);

if (rt_rq->rt_time) {
@@ -923,7 +924,7 @@ static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun)

if (enqueue)
sched_rt_rq_enqueue(rt_rq);
- raw_spin_rq_unlock(rq);
+ rq_unlock(rq, &rf);
}

if (!throttled && (!rt_bandwidth_enabled() || rt_b->rt_runtime == RUNTIME_INF))
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7f338c5..fe4d1ac 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2474,6 +2474,24 @@ unsigned long arch_scale_freq_capacity(int cpu)
}
#endif

+#ifdef CONFIG_SCHED_DEBUG
+/*
+ * In double_lock_balance()/double_rq_lock(), we use raw_spin_rq_lock() to
+ * acquire rq lock instead of rq_lock(). So at the end of these two functions
+ * we need to call double_rq_clock_clear_update() to clear RQCF_UPDATED of
+ * rq->clock_update_flags to avoid the WARN_DOUBLE_CLOCK warning.
+ */
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2)
+{
+ rq1->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+ /* rq1 == rq2 for !CONFIG_SMP, so just clear RQCF_UPDATED once. */
+#ifdef CONFIG_SMP
+ rq2->clock_update_flags &= (RQCF_REQ_SKIP|RQCF_ACT_SKIP);
+#endif
+}
+#else
+static inline void double_rq_clock_clear_update(struct rq *rq1, struct rq *rq2) {}
+#endif

#ifdef CONFIG_SMP

@@ -2539,14 +2557,15 @@ static inline int _double_lock_balance(struct rq *this_rq, struct rq *busiest)
__acquires(busiest->lock)
__acquires(this_rq->lock)
{
- if (__rq_lockp(this_rq) == __rq_lockp(busiest))
- return 0;
-
- if (likely(raw_spin_rq_trylock(busiest)))
+ if (__rq_lockp(this_rq) == __rq_lockp(busiest) ||
+ likely(raw_spin_rq_trylock(busiest))) {
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
+ }

if (rq_order_less(this_rq, busiest)) {
raw_spin_rq_lock_nested(busiest, SINGLE_DEPTH_NESTING);
+ double_rq_clock_clear_update(this_rq, busiest);
return 0;
}

@@ -2640,6 +2659,7 @@ static inline void double_rq_lock(struct rq *rq1, struct rq *rq2)
BUG_ON(rq1 != rq2);
raw_spin_rq_lock(rq1);
__acquire(rq2->lock); /* Fake it out ;) */
+ double_rq_clock_clear_update(rq1, rq2);
}

/*