2023-05-15 07:16:02

by Hao Jia

[permalink] [raw]
Subject: [PATCH v3 0/4] Fix some warnings about rq clock

These four patches fix some warnings about rq clock.

Patch1 fixes the warning of using the old rq clock caused by
missing update rq clock.

Patch2-4 fixes the warning that the rq clock was updated multiple
times while holding the rq lock.

v2->v3:
- Modify the commit information of patch1 to make the description clearer.
- Split v2 patch2 into 3 separate patches.

v1->v2:
- Vincent Guittot suggested using rq_clock_start_loop_update()
to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq()
instead of removing update_rq_clock() from unthrottle_cfs_rq()
and updating the rq clock before unthrottle_cfs_rq() for patch2.

[v2] https://lore.kernel.org/all/[email protected]
[v1] https://lore.kernel.org/all/[email protected]


Hao Jia (4):
sched/core: Fixed missing rq clock update before calling
set_rq_offline()
sched/core: Avoid double calling update_rq_clock() in
__balance_push_cpu_stop()
sched/core: Avoid multiple calling update_rq_clock() in
__cfsb_csd_unthrottle()
sched/core: Avoid multiple calling update_rq_clock() in
unthrottle_offline_cfs_rqs()

kernel/sched/core.c | 7 ++++---
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
kernel/sched/topology.c | 10 ++++++----
4 files changed, 49 insertions(+), 7 deletions(-)

--
2.37.0



2023-05-15 07:17:21

by Hao Jia

[permalink] [raw]
Subject: [PATCH v3 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()

This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
rq_offline_fair+0x89/0x90
set_rq_offline.part.118+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0xf0
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x291/0x410
? __schedule+0x65e/0x1310
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x92/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

The rq clock has been updated before the set_rq_offline()
function runs, so we don't need to call update_rq_clock() in
unthrottle_offline_cfs_rqs().
We only need to call rq_clock_start_loop_update() before the
loop starts and rq_clock_stop_loop_update() after the loop
to avoid this warning.

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af9604f4b135..9e961e0ec971 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6124,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)

lockdep_assert_rq_held(rq);

+ /*
+ * The rq clock has already been updated before the
+ * set_rq_offline() runs, so we should skip updating
+ * the rq clock again in unthrottle_cfs_rq().
+ */
+ rq_clock_start_loop_update(rq);
+
rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6146,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
unthrottle_cfs_rq(cfs_rq);
}
rcu_read_unlock();
+
+ rq_clock_stop_loop_update(rq);
}

#else /* CONFIG_CFS_BANDWIDTH */
--
2.37.0


2023-05-15 13:51:37

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()

On Mon, 15 May 2023 at 08:39, Hao Jia <[email protected]> wrote:
>
> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
> ------------[ cut here ]------------
> rq->clock_update_flags & RQCF_UPDATED
> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
> update_rq_clock+0xaf/0x180
> Call Trace:
> <TASK>
> unthrottle_cfs_rq+0x4b/0x300
> rq_offline_fair+0x89/0x90
> set_rq_offline.part.118+0x28/0x60
> rq_attach_root+0xc4/0xd0
> cpu_attach_domain+0x3dc/0x7f0
> partition_sched_domains_locked+0x2a5/0x3c0
> rebuild_sched_domains_locked+0x477/0x830
> rebuild_sched_domains+0x1b/0x30
> cpuset_hotplug_workfn+0x2ca/0xc90
> ? balance_push+0x56/0xf0
> ? _raw_spin_unlock+0x15/0x30
> ? finish_task_switch+0x98/0x2f0
> ? __switch_to+0x291/0x410
> ? __schedule+0x65e/0x1310
> process_one_work+0x1bc/0x3d0
> worker_thread+0x4c/0x380
> ? preempt_count_add+0x92/0xa0
> ? rescuer_thread+0x310/0x310
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
>
> The rq clock has been updated before the set_rq_offline()
> function runs, so we don't need to call update_rq_clock() in
> unthrottle_offline_cfs_rqs().
> We only need to call rq_clock_start_loop_update() before the
> loop starts and rq_clock_stop_loop_update() after the loop
> to avoid this warning.
>
> Suggested-by: Vincent Guittot <[email protected]>
> Signed-off-by: Hao Jia <[email protected]>

Reviewed-by: Vincent Guittot <[email protected]>

> ---
> kernel/sched/fair.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index af9604f4b135..9e961e0ec971 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6124,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
>
> lockdep_assert_rq_held(rq);
>
> + /*
> + * The rq clock has already been updated before the
> + * set_rq_offline() runs, so we should skip updating
> + * the rq clock again in unthrottle_cfs_rq().
> + */
> + rq_clock_start_loop_update(rq);
> +
> rcu_read_lock();
> list_for_each_entry_rcu(tg, &task_groups, list) {
> struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
> @@ -6146,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
> unthrottle_cfs_rq(cfs_rq);
> }
> rcu_read_unlock();
> +
> + rq_clock_stop_loop_update(rq);
> }
>
> #else /* CONFIG_CFS_BANDWIDTH */
> --
> 2.37.0
>

2023-05-23 08:14:30

by Hao Jia

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Fix some warnings about rq clock

kindly ping...


On 2023/5/15 Hao Jia wrote:
> These four patches fix some warnings about rq clock.
>
> Patch1 fixes the warning of using the old rq clock caused by
> missing update rq clock.
>
> Patch2-4 fixes the warning that the rq clock was updated multiple
> times while holding the rq lock.
>
> v2->v3:
> - Modify the commit information of patch1 to make the description clearer.
> - Split v2 patch2 into 3 separate patches.
>
> v1->v2:
> - Vincent Guittot suggested using rq_clock_start_loop_update()
> to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq()
> instead of removing update_rq_clock() from unthrottle_cfs_rq()
> and updating the rq clock before unthrottle_cfs_rq() for patch2.
>
> [v2] https://lore.kernel.org/all/[email protected]
> [v1] https://lore.kernel.org/all/[email protected]
>
>
> Hao Jia (4):
> sched/core: Fixed missing rq clock update before calling
> set_rq_offline()
> sched/core: Avoid double calling update_rq_clock() in
> __balance_push_cpu_stop()
> sched/core: Avoid multiple calling update_rq_clock() in
> __cfsb_csd_unthrottle()
> sched/core: Avoid multiple calling update_rq_clock() in
> unthrottle_offline_cfs_rqs()
>
> kernel/sched/core.c | 7 ++++---
> kernel/sched/fair.c | 18 ++++++++++++++++++
> kernel/sched/sched.h | 21 +++++++++++++++++++++
> kernel/sched/topology.c | 10 ++++++----
> 4 files changed, 49 insertions(+), 7 deletions(-)
>

2023-06-01 09:34:33

by Hao Jia

[permalink] [raw]
Subject: Re: [PATCH v3 0/4] Fix some warnings about rq clock

Kindly ping, any comment?

Thanks,
Hao

On 2023/5/15 Hao Jia wrote:
> These four patches fix some warnings about rq clock.
>
> Patch1 fixes the warning of using the old rq clock caused by
> missing update rq clock.
>
> Patch2-4 fixes the warning that the rq clock was updated multiple
> times while holding the rq lock.
>
> v2->v3:
> - Modify the commit information of patch1 to make the description clearer.
> - Split v2 patch2 into 3 separate patches.
>
> v1->v2:
> - Vincent Guittot suggested using rq_clock_start_loop_update()
> to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq()
> instead of removing update_rq_clock() from unthrottle_cfs_rq()
> and updating the rq clock before unthrottle_cfs_rq() for patch2.
>
> [v2] https://lore.kernel.org/all/[email protected]
> [v1] https://lore.kernel.org/all/[email protected]
>
>
> Hao Jia (4):
> sched/core: Fixed missing rq clock update before calling
> set_rq_offline()
> sched/core: Avoid double calling update_rq_clock() in
> __balance_push_cpu_stop()
> sched/core: Avoid multiple calling update_rq_clock() in
> __cfsb_csd_unthrottle()
> sched/core: Avoid multiple calling update_rq_clock() in
> unthrottle_offline_cfs_rqs()
>
> kernel/sched/core.c | 7 ++++---
> kernel/sched/fair.c | 18 ++++++++++++++++++
> kernel/sched/sched.h | 21 +++++++++++++++++++++
> kernel/sched/topology.c | 10 ++++++----
> 4 files changed, 49 insertions(+), 7 deletions(-)
>