2023-06-08 06:45:03

by Hao Jia

Subject: [PATCH v4 0/4] Fix some warnings about rq clock

These four patches fix several warnings related to the rq clock.

Patch1 fixes a warning about using a stale rq clock caused by a
missing rq clock update.

Patch2-4 fix the warning that the rq clock is updated multiple
times while holding the rq lock.

v3->v4:
- Add Reviewed-by: Vincent Guittot <[email protected]> for
all patches.

v2->v3:
- Modify the commit information of patch1 to make the description clearer.
- Split v2 patch2 into 3 separate patches.

v1->v2:
- For patch2, Vincent Guittot suggested using rq_clock_start_loop_update()
to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq(),
instead of removing update_rq_clock() from unthrottle_cfs_rq() and
updating the rq clock before calling unthrottle_cfs_rq().

[v2] https://lore.kernel.org/all/[email protected]
[v1] https://lore.kernel.org/all/[email protected]


Hao Jia (4):
sched/core: Fix missing rq clock update before calling
set_rq_offline()
sched/core: Avoid calling update_rq_clock() twice in
__balance_push_cpu_stop()
sched/core: Avoid multiple calls to update_rq_clock() in
__cfsb_csd_unthrottle()
sched/core: Avoid multiple calls to update_rq_clock() in
unthrottle_offline_cfs_rqs()

kernel/sched/core.c | 7 ++++---
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
kernel/sched/topology.c | 10 ++++++----
4 files changed, 49 insertions(+), 7 deletions(-)

--
2.37.0 (Apple Git-136)



2023-06-08 06:46:25

by Hao Jia

Subject: [PATCH v4 1/4] sched/core: Fix missing rq clock update before calling set_rq_offline()

The following warning is triggered during CPU offline when
CONFIG_CPU_FREQ is enabled and the cpufreq governor is set to
powersave:
------------[ cut here ]------------
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 24 PID: 754 at kernel/sched/sched.h:1496
enqueue_top_rt_rq+0x139/0x160
Call Trace:
<TASK>
? intel_pstate_update_util+0x3b0/0x3b0
rq_offline_rt+0x1b7/0x250
set_rq_offline.part.120+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
? __schedule+0x65e/0x1310
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
? percpu_rwsem_wait+0x140/0x140
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0x120
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x116/0x410
? __schedule+0x65e/0x1310
? internal_add_timer+0x42/0x60
? _raw_spin_unlock_irqrestore+0x23/0x40
? add_timer_on+0xd5/0x130
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x56/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

More detailed key function call graph:
rq_offline_rt()
  __disable_runtime()
    sched_rt_rq_enqueue()
      enqueue_top_rt_rq()
        cpufreq_update_util() <-- depends on CONFIG_CPU_FREQ
          data->func(data, *rq_clock(rq)*, flags)
            intel_pstate_update_util() <-- powersave policy callback function

Before calling set_rq_offline(), update the rq clock to avoid using
a stale rq clock, and replace raw_spin_rq_lock_irqsave()/
raw_spin_rq_unlock_irqrestore() with rq_lock_irqsave()/
rq_unlock_irqrestore(), which clear rq->clock_update_flags before
the rq clock is updated.

Steps to reproduce:
1. Enable CONFIG_SMP and CONFIG_CPU_FREQ when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
3. cpupower -c all frequency-set -g powersave
4. Run some rt tasks, e.g. create 5*n rt (100% running) tasks (on a
system with n CPUs)
5. Offline the CPUs one by one until the warning is triggered

Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/topology.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6682535e37c8..b89497696880 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -487,15 +487,17 @@ static void free_rootdomain(struct rcu_head *rcu)
void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
struct root_domain *old_rd = NULL;
- unsigned long flags;
+ struct rq_flags rf;

- raw_spin_rq_lock_irqsave(rq, flags);
+ rq_lock_irqsave(rq, &rf);

if (rq->rd) {
old_rd = rq->rd;

- if (cpumask_test_cpu(rq->cpu, old_rd->online))
+ if (cpumask_test_cpu(rq->cpu, old_rd->online)) {
+ update_rq_clock(rq);
set_rq_offline(rq);
+ }

cpumask_clear_cpu(rq->cpu, old_rd->span);

@@ -515,7 +517,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq);

- raw_spin_rq_unlock_irqrestore(rq, flags);
+ rq_unlock_irqrestore(rq, &rf);

if (old_rd)
call_rcu(&old_rd->rcu, free_rootdomain);
--
2.37.0 (Apple Git-136)


2023-06-08 06:48:11

by Hao Jia

Subject: [PATCH v4 4/4] sched/core: Avoid multiple calls to update_rq_clock() in unthrottle_offline_cfs_rqs()

This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
rq_offline_fair+0x89/0x90
set_rq_offline.part.118+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0xf0
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x291/0x410
? __schedule+0x65e/0x1310
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x92/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

The rq clock has already been updated before set_rq_offline()
runs, so there is no need to call update_rq_clock() again in
unthrottle_offline_cfs_rqs().
We only need to call rq_clock_start_loop_update() before the loop
starts and rq_clock_stop_loop_update() after the loop to avoid
this warning.

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af9604f4b135..9e961e0ec971 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6124,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)

lockdep_assert_rq_held(rq);

+ /*
+ * The rq clock has already been updated before
+ * set_rq_offline() runs, so skip updating the rq
+ * clock again in unthrottle_cfs_rq().
+ */
+ rq_clock_start_loop_update(rq);
+
rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6146,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
unthrottle_cfs_rq(cfs_rq);
}
rcu_read_unlock();
+
+ rq_clock_stop_loop_update(rq);
}

#else /* CONFIG_CFS_BANDWIDTH */
--
2.37.0 (Apple Git-136)


2023-06-08 07:02:32

by Hao Jia

Subject: [PATCH v4 3/4] sched/core: Avoid multiple calls to update_rq_clock() in __cfsb_csd_unthrottle()

After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs
bandwidth"), the rq clock may be updated multiple times in the loop
of __cfsb_csd_unthrottle(), which triggers the following warning.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 54 PID: 0 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
__cfsb_csd_unthrottle+0xe0/0x100
__flush_smp_call_function_queue+0xaf/0x1d0
flush_smp_call_function_queue+0x49/0x90
do_idle+0x17c/0x270
cpu_startup_entry+0x19/0x20
start_secondary+0xfa/0x120
secondary_startup_64_no_verify+0xce/0xdb

Before the loop starts, update the rq clock once and call
rq_clock_start_loop_update() to prevent further rq clock updates
inside the loop. After the loop, call rq_clock_stop_loop_update()
to clear rq->clock_update_flags.

Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth")
Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..af9604f4b135 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5576,6 +5576,14 @@ static void __cfsb_csd_unthrottle(void *arg)

rq_lock(rq, &rf);

+ /*
+ * Iterating over the list can trigger several calls to
+ * update_rq_clock() in unthrottle_cfs_rq().
+ * Do it once and skip the potential next ones.
+ */
+ update_rq_clock(rq);
+ rq_clock_start_loop_update(rq);
+
/*
* Since we hold rq lock we're safe from concurrent manipulation of
* the CSD list. However, this RCU critical section annotates the
@@ -5595,6 +5603,7 @@ static void __cfsb_csd_unthrottle(void *arg)

rcu_read_unlock();

+ rq_clock_stop_loop_update(rq);
rq_unlock(rq, &rf);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..50446e401b9f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,27 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
rq->clock_update_flags &= ~RQCF_REQ_SKIP;
}

+/*
+ * During CPU offlining and rq-wide unthrottling, update_rq_clock()
+ * can be triggered several times for cfs and rt runqueues (typically
+ * when using list_for_each_entry_*).
+ * rq_clock_start_loop_update() can be called after updating the clock
+ * once and before iterating over the list to prevent multiple updates.
+ * After the iterative traversal, call rq_clock_stop_loop_update() to
+ * clear RQCF_ACT_SKIP in rq->clock_update_flags.
+ */
+static inline void rq_clock_start_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags |= RQCF_ACT_SKIP;
+}
+
+static inline void rq_clock_stop_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+}
+
struct rq_flags {
unsigned long flags;
struct pin_cookie cookie;
--
2.37.0 (Apple Git-136)


2023-06-08 21:35:59

by Benjamin Segall

Subject: Re: [PATCH v4 4/4] sched/core: Avoid multiple calls to update_rq_clock() in unthrottle_offline_cfs_rqs()

Hao Jia <[email protected]> writes:

> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
> ------------[ cut here ]------------
> rq->clock_update_flags & RQCF_UPDATED
> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
> update_rq_clock+0xaf/0x180
> Call Trace:
> <TASK>
> unthrottle_cfs_rq+0x4b/0x300
> rq_offline_fair+0x89/0x90
> set_rq_offline.part.118+0x28/0x60
> rq_attach_root+0xc4/0xd0
> cpu_attach_domain+0x3dc/0x7f0
> partition_sched_domains_locked+0x2a5/0x3c0
> rebuild_sched_domains_locked+0x477/0x830
> rebuild_sched_domains+0x1b/0x30
> cpuset_hotplug_workfn+0x2ca/0xc90
> ? balance_push+0x56/0xf0
> ? _raw_spin_unlock+0x15/0x30
> ? finish_task_switch+0x98/0x2f0
> ? __switch_to+0x291/0x410
> ? __schedule+0x65e/0x1310
> process_one_work+0x1bc/0x3d0
> worker_thread+0x4c/0x380
> ? preempt_count_add+0x92/0xa0
> ? rescuer_thread+0x310/0x310
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
>
> The rq clock has been updated before the set_rq_offline()
> function runs, so we don't need to call update_rq_clock() in
> unthrottle_offline_cfs_rqs().

I don't think we do in the path from rq_attach_root (though that's easy
enough to fix, of course).


2023-06-09 03:37:48

by Hao Jia

Subject: Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calls to update_rq_clock() in unthrottle_offline_cfs_rqs()



On 2023/6/9 Benjamin Segall wrote:
> Hao Jia <[email protected]> writes:
>
>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>> ------------[ cut here ]------------
>> rq->clock_update_flags & RQCF_UPDATED
>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>> update_rq_clock+0xaf/0x180
>> Call Trace:
>> <TASK>
>> unthrottle_cfs_rq+0x4b/0x300
>> rq_offline_fair+0x89/0x90
>> set_rq_offline.part.118+0x28/0x60
>> rq_attach_root+0xc4/0xd0
>> cpu_attach_domain+0x3dc/0x7f0
>> partition_sched_domains_locked+0x2a5/0x3c0
>> rebuild_sched_domains_locked+0x477/0x830
>> rebuild_sched_domains+0x1b/0x30
>> cpuset_hotplug_workfn+0x2ca/0xc90
>> ? balance_push+0x56/0xf0
>> ? _raw_spin_unlock+0x15/0x30
>> ? finish_task_switch+0x98/0x2f0
>> ? __switch_to+0x291/0x410
>> ? __schedule+0x65e/0x1310
>> process_one_work+0x1bc/0x3d0
>> worker_thread+0x4c/0x380
>> ? preempt_count_add+0x92/0xa0
>> ? rescuer_thread+0x310/0x310
>> kthread+0xe6/0x110
>> ? kthread_complete_and_exit+0x20/0x20
>> ret_from_fork+0x1f/0x30
>>
>> The rq clock has been updated before the set_rq_offline()
>> function runs, so we don't need to call update_rq_clock() in
>> unthrottle_offline_cfs_rqs().
>
> I don't think we do in the path from rq_attach_root (though that's easy
> enough to fix, of course).
>

Thanks for your review.

Now our fix method is that after applying patch1, we update the rq clock
before set_rq_offline(). Then use rq_clock_{start, stop}_loop_update to
avoid updating the rq clock multiple times in unthrottle_cfs_rq().

Do you have any better suggestions?

Thanks,
Hao

2023-06-09 20:13:32

by Benjamin Segall

Subject: Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calls to update_rq_clock() in unthrottle_offline_cfs_rqs()

Hao Jia <[email protected]> writes:

> On 2023/6/9 Benjamin Segall wrote:
>> Hao Jia <[email protected]> writes:
>>
>>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>>> ------------[ cut here ]------------
>>> rq->clock_update_flags & RQCF_UPDATED
>>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>>> update_rq_clock+0xaf/0x180
>>> Call Trace:
>>> <TASK>
>>> unthrottle_cfs_rq+0x4b/0x300
>>> rq_offline_fair+0x89/0x90
>>> set_rq_offline.part.118+0x28/0x60
>>> rq_attach_root+0xc4/0xd0
>>> cpu_attach_domain+0x3dc/0x7f0
>>> partition_sched_domains_locked+0x2a5/0x3c0
>>> rebuild_sched_domains_locked+0x477/0x830
>>> rebuild_sched_domains+0x1b/0x30
>>> cpuset_hotplug_workfn+0x2ca/0xc90
>>> ? balance_push+0x56/0xf0
>>> ? _raw_spin_unlock+0x15/0x30
>>> ? finish_task_switch+0x98/0x2f0
>>> ? __switch_to+0x291/0x410
>>> ? __schedule+0x65e/0x1310
>>> process_one_work+0x1bc/0x3d0
>>> worker_thread+0x4c/0x380
>>> ? preempt_count_add+0x92/0xa0
>>> ? rescuer_thread+0x310/0x310
>>> kthread+0xe6/0x110
>>> ? kthread_complete_and_exit+0x20/0x20
>>> ret_from_fork+0x1f/0x30
>>>
>>> The rq clock has been updated before the set_rq_offline()
>>> function runs, so we don't need to call update_rq_clock() in
>>> unthrottle_offline_cfs_rqs().
>> I don't think we do in the path from rq_attach_root (though that's easy
>> enough to fix, of course).
>>
>
> Thanks for your review.
>
> Now our fix method is that after applying patch1, we update the rq clock before
> set_rq_offline(). Then use rq_clock_{start, stop}_loop_update to avoid updating
> the rq clock multiple times in unthrottle_cfs_rq().
>
> Do you have any better suggestions?
>
> Thanks,
> Hao

Yeah, the obvious fixes are to either add an update_rq_clock in
rq_attach_root as you suggest, or put it in set_rq_offline instead of
the callers.

2023-06-12 03:05:36

by Hao Jia

Subject: Re: [External] Re: [PATCH v4 4/4] sched/core: Avoid multiple calls to update_rq_clock() in unthrottle_offline_cfs_rqs()



On 2023/6/10 Benjamin Segall wrote:
> Hao Jia <[email protected]> writes:
>
>> On 2023/6/9 Benjamin Segall wrote:
>>> Hao Jia <[email protected]> writes:
>>>
>>>> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
>>>> ------------[ cut here ]------------
>>>> rq->clock_update_flags & RQCF_UPDATED
>>>> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
>>>> update_rq_clock+0xaf/0x180
>>>> Call Trace:
>>>> <TASK>
>>>> unthrottle_cfs_rq+0x4b/0x300
>>>> rq_offline_fair+0x89/0x90
>>>> set_rq_offline.part.118+0x28/0x60
>>>> rq_attach_root+0xc4/0xd0
>>>> cpu_attach_domain+0x3dc/0x7f0
>>>> partition_sched_domains_locked+0x2a5/0x3c0
>>>> rebuild_sched_domains_locked+0x477/0x830
>>>> rebuild_sched_domains+0x1b/0x30
>>>> cpuset_hotplug_workfn+0x2ca/0xc90
>>>> ? balance_push+0x56/0xf0
>>>> ? _raw_spin_unlock+0x15/0x30
>>>> ? finish_task_switch+0x98/0x2f0
>>>> ? __switch_to+0x291/0x410
>>>> ? __schedule+0x65e/0x1310
>>>> process_one_work+0x1bc/0x3d0
>>>> worker_thread+0x4c/0x380
>>>> ? preempt_count_add+0x92/0xa0
>>>> ? rescuer_thread+0x310/0x310
>>>> kthread+0xe6/0x110
>>>> ? kthread_complete_and_exit+0x20/0x20
>>>> ret_from_fork+0x1f/0x30
>>>>
>>>> The rq clock has been updated before the set_rq_offline()
>>>> function runs, so we don't need to call update_rq_clock() in
>>>> unthrottle_offline_cfs_rqs().
>>> I don't think we do in the path from rq_attach_root (though that's easy
>>> enough to fix, of course).
>>>
>>
>> Thanks for your review.
>>
>> Now our fix method is that after applying patch1, we update the rq clock before
>> set_rq_offline(). Then use rq_clock_{start, stop}_loop_update to avoid updating
>> the rq clock multiple times in unthrottle_cfs_rq().
>>
>> Do you have any better suggestions?
>>
>> Thanks,
>> Hao
>
> Yeah, the obvious fixes are to either add an update_rq_clock in
> rq_attach_root as you suggest, or put it in set_rq_offline instead of
> the callers.

Thanks for your suggestion. I will do it in the next version.

Thanks,
Hao