LinuxLists.cc - [PATCH v5 0/4] Fix some warnings about rq clock

2023-06-13 08:52:14

Subject: [PATCH v5 0/4] Fix some warnings about rq clock

These four patches fix some warnings about rq clock.

Patch1 fixes the warning of using the old rq clock caused by
missing update rq clock.

Patch2-4 fixes the warning that the rq clock was updated multiple
times while holding the rq lock.

v4->v5:
- Benjamin Segall suggested adding update_rq_clock() to set_rq_offline()
instead of calling update_rq_clock() before set_rq_offline() runs.

v3->v4:
- Add Reviewed-by: Vincent Guittot <[email protected]> for
all patches.

v2->v3:
- Modify the commit information of patch1 to make the description clearer.
- Split v2 patch2 into 3 separate patches.

v1->v2:
- Vincent Guittot suggested using rq_clock_start_loop_update()
to prevent multiple calls to update_rq_clock() in unthrottle_cfs_rq()
instead of removing update_rq_clock() from unthrottle_cfs_rq()
and updating the rq clock before unthrottle_cfs_rq() for patch2.

[v4] https://lore.kernel.org/all/[email protected]
[v3] https://lore.kernel.org/all/[email protected]
[v2] https://lore.kernel.org/all/[email protected]
[v1] https://lore.kernel.org/all/[email protected]

Hao Jia (4):
sched/core: Fixed missing rq clock update before calling
set_rq_offline()
sched/core: Avoid double calling update_rq_clock() in
__balance_push_cpu_stop()
sched/core: Avoid multiple calling update_rq_clock() in
__cfsb_csd_unthrottle()
sched/core: Avoid multiple calling update_rq_clock() in
unthrottle_offline_cfs_rqs()

kernel/sched/core.c | 9 +++++----
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
kernel/sched/topology.c | 6 +++---
4 files changed, 47 insertions(+), 7 deletions(-)

--
2.20.1

2023-06-13 08:53:37

by Hao Jia

[permalink] [raw]

Subject: [PATCH v5 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

The WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 17 PID: 138 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
__balance_push_cpu_stop+0x146/0x180
? migration_cpu_stop+0x2a0/0x2a0
cpu_stopper_thread+0xa3/0x140
smpboot_thread_fn+0x14f/0x210
? sort_range+0x20/0x20
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

To avoid this warning, we remove update_rq_clock() from
the __migrate_task() function. And in order to avoid
missing rq clock update, add update_rq_clock() call before
migration_cpu_stop() calls __migrate_task().

Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a8be5415daba..1eca36299d8b 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2398,7 +2398,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
if (!is_cpu_allowed(p, dest_cpu))
return rq;

- update_rq_clock(rq);
rq = move_queued_task(rq, rf, p, dest_cpu);

return rq;
@@ -2456,10 +2455,12 @@ static int migration_cpu_stop(void *data)
goto out;
}

- if (task_on_rq_queued(p))
+ if (task_on_rq_queued(p)) {
+ update_rq_clock(rq);
rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
- else
+ } else {
p->wake_cpu = arg->dest_cpu;
+ }

/*
* XXX __migrate_task() can fail, at which point we might end
--
2.20.1

2023-06-13 09:02:39

by Hao Jia

[permalink] [raw]

Subject: [PATCH v5 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline()

This is triggered during cpu offline when CONFIG_CPU_FREQ is enabled
and cpufreq is set to powersave:
------------[ cut here ]------------
rq->clock_update_flags < RQCF_ACT_SKIP
WARNING: CPU: 24 PID: 754 at kernel/sched/sched.h:1496
enqueue_top_rt_rq+0x139/0x160
Call Trace:
<TASK>
? intel_pstate_update_util+0x3b0/0x3b0
rq_offline_rt+0x1b7/0x250
set_rq_offline.part.120+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
? __schedule+0x65e/0x1310
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
? percpu_rwsem_wait+0x140/0x140
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0x120
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x116/0x410
? __schedule+0x65e/0x1310 ? internal_add_timer+0x42/0x60
? _raw_spin_unlock_irqrestore+0x23/0x40
? add_timer_on+0xd5/0x130
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x56/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

More detailed key function call graph:
rq_offline_rt()
__disable_runtime()
sched_rt_rq_enqueue()
enqueue_top_rt_rq()
cpufreq_update_util() <-- depends on CONFIG_CPU_FREQ
data->func(data, *rq_clock(rq)*, flags)
intel_pstate_update_util() <-- powersave policy callback function

Before calling rq_offline_rt() we need to update the rq clock to avoid
using the old rq clock, So we add update_rq_clock() to set_rq_offline()
to update rq clock. And we use rq_lock_irqsave()/rq_unlock_irqrestore()
to replace raw_spin_rq_lock_irqsave()/raw_spin_rq_unlock_irqrestore()
in rq_attach_root() to ensure that rq->clock_update_flags are cleared
before updating the rq clock.

Steps to reproduce:
1. Enable CONFIG_SMP and CONFIG_CPU_FREQ when compiling the kernel
2. echo 1 > /sys/kernel/debug/clear_warn_once
3. cpupower -c all frequency-set -g powersave
4. Run some rt tasks e.g. Create 5*n rt (100% running) tasks (on a
system with n CPUs)
5. Offline cpu one by one until the warninng is triggered

Suggested-by: Ben Segall <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
---
kernel/sched/core.c | 2 +-
kernel/sched/topology.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a68d1276bab0..a8be5415daba 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9548,6 +9548,7 @@ void set_rq_offline(struct rq *rq)
if (rq->online) {
const struct sched_class *class;

+ update_rq_clock(rq);
for_each_class(class) {
if (class->rq_offline)
class->rq_offline(rq);
@@ -9689,7 +9690,6 @@ int sched_cpu_deactivate(unsigned int cpu)

rq_lock_irqsave(rq, &rf);
if (rq->rd) {
- update_rq_clock(rq);
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
set_rq_offline(rq);
}
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 6682535e37c8..52976eadfee9 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -487,9 +487,9 @@ static void free_rootdomain(struct rcu_head *rcu)
void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
struct root_domain *old_rd = NULL;
- unsigned long flags;
+ struct rq_flags rf;

- raw_spin_rq_lock_irqsave(rq, flags);
+ rq_lock_irqsave(rq, &rf);

if (rq->rd) {
old_rd = rq->rd;
@@ -515,7 +515,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq);

- raw_spin_rq_unlock_irqrestore(rq, flags);
+ rq_unlock_irqrestore(rq, &rf);

if (old_rd)
call_rcu(&old_rd->rcu, free_rootdomain);
--
2.20.1

2023-06-13 09:05:08

by Hao Jia

[permalink] [raw]

Subject: [PATCH v5 3/4] sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()

After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs
bandwidth"), we may update the rq clock multiple times in the loop of
__cfsb_csd_unthrottle(). At that time the following warning will be
triggered.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 54 PID: 0 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
__cfsb_csd_unthrottle+0xe0/0x100
__flush_smp_call_function_queue+0xaf/0x1d0
flush_smp_call_function_queue+0x49/0x90
do_idle+0x17c/0x270
cpu_startup_entry+0x19/0x20
start_secondary+0xfa/0x120
secondary_startup_64_no_verify+0xce/0xdb

Before the loop starts, we update the rq clock once and call
rq_clock_start_loop_update() to prevent updating the rq clock
multiple times. And call rq_clock_stop_loop_update() After
the loop to clear rq->clock_update_flags.

Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth")

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
kernel/sched/sched.h | 21 +++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 373ff5f55884..af9604f4b135 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5576,6 +5576,14 @@ static void __cfsb_csd_unthrottle(void *arg)

rq_lock(rq, &rf);

+ /*
+ * Iterating over the list can trigger several call to
+ * update_rq_clock() in unthrottle_cfs_rq().
+ * Do it once and skip the potential next ones.
+ */
+ update_rq_clock(rq);
+ rq_clock_start_loop_update(rq);
+
/*
* Since we hold rq lock we're safe from concurrent manipulation of
* the CSD list. However, this RCU critical section annotates the
@@ -5595,6 +5603,7 @@ static void __cfsb_csd_unthrottle(void *arg)

rcu_read_unlock();

+ rq_clock_stop_loop_update(rq);
rq_unlock(rq, &rf);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index ec7b3e0a2b20..50446e401b9f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,27 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
rq->clock_update_flags &= ~RQCF_REQ_SKIP;
}

+/*
+ * During cpu offlining and rq wide unthrottling, we can trigger
+ * an update_rq_clock() for several cfs and rt runqueues (Typically
+ * when using list_for_each_entry_*)
+ * rq_clock_start_loop_update() can be called after updating the clock
+ * once and before iterating over the list to prevent multiple update.
+ * After the iterative traversal, we need to call rq_clock_stop_loop_update()
+ * to clear RQCF_ACT_SKIP of rq->clock_update_flags.
+ */
+static inline void rq_clock_start_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags |= RQCF_ACT_SKIP;
+}
+
+static inline void rq_clock_stop_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+}
+
struct rq_flags {
unsigned long flags;
struct pin_cookie cookie;
--
2.20.1

2023-06-13 09:07:36

by Hao Jia

[permalink] [raw]

Subject: Re: [PATCH v5 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline()

On 2023/6/13 Hao Jia wrote:
> This is triggered during cpu offline when CONFIG_CPU_FREQ is enabled
> and cpufreq is set to powersave:
> ------------[ cut here ]------------
> rq->clock_update_flags < RQCF_ACT_SKIP
> WARNING: CPU: 24 PID: 754 at kernel/sched/sched.h:1496
> enqueue_top_rt_rq+0x139/0x160
> Call Trace:
> <TASK>
> ? intel_pstate_update_util+0x3b0/0x3b0
> rq_offline_rt+0x1b7/0x250
> set_rq_offline.part.120+0x28/0x60
> rq_attach_root+0xc4/0xd0
> cpu_attach_domain+0x3dc/0x7f0
> ? __schedule+0x65e/0x1310
> partition_sched_domains_locked+0x2a5/0x3c0
> rebuild_sched_domains_locked+0x477/0x830
> ? percpu_rwsem_wait+0x140/0x140
> rebuild_sched_domains+0x1b/0x30
> cpuset_hotplug_workfn+0x2ca/0xc90
> ? balance_push+0x56/0x120
> ? _raw_spin_unlock+0x15/0x30
> ? finish_task_switch+0x98/0x2f0
> ? __switch_to+0x116/0x410
> ? __schedule+0x65e/0x1310 ? internal_add_timer+0x42/0x60
> ? _raw_spin_unlock_irqrestore+0x23/0x40
> ? add_timer_on+0xd5/0x130
> process_one_work+0x1bc/0x3d0
> worker_thread+0x4c/0x380
> ? preempt_count_add+0x56/0xa0
> ? rescuer_thread+0x310/0x310
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
>
> More detailed key function call graph:
> rq_offline_rt()
> __disable_runtime()
> sched_rt_rq_enqueue()
> enqueue_top_rt_rq()
> cpufreq_update_util() <-- depends on CONFIG_CPU_FREQ
> data->func(data, *rq_clock(rq)*, flags)
> intel_pstate_update_util() <-- powersave policy callback function
>
> Before calling rq_offline_rt() we need to update the rq clock to avoid
> using the old rq clock, So we add update_rq_clock() to set_rq_offline()
> to update rq clock. And we use rq_lock_irqsave()/rq_unlock_irqrestore()
> to replace raw_spin_rq_lock_irqsave()/raw_spin_rq_unlock_irqrestore()
> in rq_attach_root() to ensure that rq->clock_update_flags are cleared
> before updating the rq clock.
>
> Steps to reproduce:
> 1. Enable CONFIG_SMP and CONFIG_CPU_FREQ when compiling the kernel
> 2. echo 1 > /sys/kernel/debug/clear_warn_once
> 3. cpupower -c all frequency-set -g powersave
> 4. Run some rt tasks e.g. Create 5*n rt (100% running) tasks (on a
> system with n CPUs)
> 5. Offline cpu one by one until the warninng is triggered
>
> Suggested-by: Ben Segall <[email protected]>
> Signed-off-by: Hao Jia <[email protected]>

Hi Vincent,

Benjamin Segall suggested adding update_rq_clock() to set_rq_offline()
instead of calling update_rq_clock() before set_rq_offline() runs.
The code of patch1 has changed, so I removed your tag "Reviewed-by".
Please review again.

Thanks,
Hao

> ---
> kernel/sched/core.c | 2 +-
> kernel/sched/topology.c | 6 +++---
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a68d1276bab0..a8be5415daba 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9548,6 +9548,7 @@ void set_rq_offline(struct rq *rq)
> if (rq->online) {
> const struct sched_class *class;
>
> + update_rq_clock(rq);
> for_each_class(class) {
> if (class->rq_offline)
> class->rq_offline(rq);
> @@ -9689,7 +9690,6 @@ int sched_cpu_deactivate(unsigned int cpu)
>
> rq_lock_irqsave(rq, &rf);
> if (rq->rd) {
> - update_rq_clock(rq);
> BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> set_rq_offline(rq);
> }
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 6682535e37c8..52976eadfee9 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -487,9 +487,9 @@ static void free_rootdomain(struct rcu_head *rcu)
> void rq_attach_root(struct rq *rq, struct root_domain *rd)
> {
> struct root_domain *old_rd = NULL;
> - unsigned long flags;
> + struct rq_flags rf;
>
> - raw_spin_rq_lock_irqsave(rq, flags);
> + rq_lock_irqsave(rq, &rf);
>
> if (rq->rd) {
> old_rd = rq->rd;
> @@ -515,7 +515,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
> if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
> set_rq_online(rq);
>
> - raw_spin_rq_unlock_irqrestore(rq, flags);
> + rq_unlock_irqrestore(rq, &rf);
>
> if (old_rd)
> call_rcu(&old_rd->rcu, free_rootdomain);

2023-06-13 09:07:38

by Hao Jia

[permalink] [raw]

Subject: [PATCH v5 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()

This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
------------[ cut here ]------------
rq->clock_update_flags & RQCF_UPDATED
WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
update_rq_clock+0xaf/0x180
Call Trace:
<TASK>
unthrottle_cfs_rq+0x4b/0x300
rq_offline_fair+0x89/0x90
set_rq_offline.part.118+0x28/0x60
rq_attach_root+0xc4/0xd0
cpu_attach_domain+0x3dc/0x7f0
partition_sched_domains_locked+0x2a5/0x3c0
rebuild_sched_domains_locked+0x477/0x830
rebuild_sched_domains+0x1b/0x30
cpuset_hotplug_workfn+0x2ca/0xc90
? balance_push+0x56/0xf0
? _raw_spin_unlock+0x15/0x30
? finish_task_switch+0x98/0x2f0
? __switch_to+0x291/0x410
? __schedule+0x65e/0x1310
process_one_work+0x1bc/0x3d0
worker_thread+0x4c/0x380
? preempt_count_add+0x92/0xa0
? rescuer_thread+0x310/0x310
kthread+0xe6/0x110
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30

The rq clock has been updated in the set_rq_offline(),
so we don't need to call update_rq_clock() in
unthrottle_offline_cfs_rqs().
We only need to call rq_clock_start_loop_update() before the
loop starts and rq_clock_stop_loop_update() after the loop
to avoid this warning.

Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index af9604f4b135..4da5f3541762 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6124,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)

lockdep_assert_rq_held(rq);

+ /*
+ * The rq clock has already been updated in the
+ * set_rq_offline(), so we should skip updating
+ * the rq clock again in unthrottle_cfs_rq().
+ */
+ rq_clock_start_loop_update(rq);
+
rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6146,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
unthrottle_cfs_rq(cfs_rq);
}
rcu_read_unlock();
+
+ rq_clock_stop_loop_update(rq);
}

#else /* CONFIG_CFS_BANDWIDTH */
--
2.20.1

2023-06-13 09:36:28

by Vincent Guittot

[permalink] [raw]

Subject: Re: [PATCH v5 1/4] sched/core: Fixed missing rq clock update before calling set_rq_offline()

On Tue, 13 Jun 2023 at 10:31, Hao Jia <[email protected]> wrote:
>
>
>
> On 2023/6/13 Hao Jia wrote:
> > This is triggered during cpu offline when CONFIG_CPU_FREQ is enabled
> > and cpufreq is set to powersave:
> > ------------[ cut here ]------------
> > rq->clock_update_flags < RQCF_ACT_SKIP
> > WARNING: CPU: 24 PID: 754 at kernel/sched/sched.h:1496
> > enqueue_top_rt_rq+0x139/0x160
> > Call Trace:
> > <TASK>
> > ? intel_pstate_update_util+0x3b0/0x3b0
> > rq_offline_rt+0x1b7/0x250
> > set_rq_offline.part.120+0x28/0x60
> > rq_attach_root+0xc4/0xd0
> > cpu_attach_domain+0x3dc/0x7f0
> > ? __schedule+0x65e/0x1310
> > partition_sched_domains_locked+0x2a5/0x3c0
> > rebuild_sched_domains_locked+0x477/0x830
> > ? percpu_rwsem_wait+0x140/0x140
> > rebuild_sched_domains+0x1b/0x30
> > cpuset_hotplug_workfn+0x2ca/0xc90
> > ? balance_push+0x56/0x120
> > ? _raw_spin_unlock+0x15/0x30
> > ? finish_task_switch+0x98/0x2f0
> > ? __switch_to+0x116/0x410
> > ? __schedule+0x65e/0x1310 ? internal_add_timer+0x42/0x60
> > ? _raw_spin_unlock_irqrestore+0x23/0x40
> > ? add_timer_on+0xd5/0x130
> > process_one_work+0x1bc/0x3d0
> > worker_thread+0x4c/0x380
> > ? preempt_count_add+0x56/0xa0
> > ? rescuer_thread+0x310/0x310
> > kthread+0xe6/0x110
> > ? kthread_complete_and_exit+0x20/0x20
> > ret_from_fork+0x1f/0x30
> >
> > More detailed key function call graph:
> > rq_offline_rt()
> > __disable_runtime()
> > sched_rt_rq_enqueue()
> > enqueue_top_rt_rq()
> > cpufreq_update_util() <-- depends on CONFIG_CPU_FREQ
> > data->func(data, *rq_clock(rq)*, flags)
> > intel_pstate_update_util() <-- powersave policy callback function
> >
> > Before calling rq_offline_rt() we need to update the rq clock to avoid
> > using the old rq clock, So we add update_rq_clock() to set_rq_offline()
> > to update rq clock. And we use rq_lock_irqsave()/rq_unlock_irqrestore()
> > to replace raw_spin_rq_lock_irqsave()/raw_spin_rq_unlock_irqrestore()
> > in rq_attach_root() to ensure that rq->clock_update_flags are cleared
> > before updating the rq clock.
> >
> > Steps to reproduce:
> > 1. Enable CONFIG_SMP and CONFIG_CPU_FREQ when compiling the kernel
> > 2. echo 1 > /sys/kernel/debug/clear_warn_once
> > 3. cpupower -c all frequency-set -g powersave
> > 4. Run some rt tasks e.g. Create 5*n rt (100% running) tasks (on a
> > system with n CPUs)
> > 5. Offline cpu one by one until the warninng is triggered
> >
> > Suggested-by: Ben Segall <[email protected]>
> > Signed-off-by: Hao Jia <[email protected]>
>
> Hi Vincent,
>
> Benjamin Segall suggested adding update_rq_clock() to set_rq_offline()
> instead of calling update_rq_clock() before set_rq_offline() runs.
> The code of patch1 has changed, so I removed your tag "Reviewed-by".
> Please review again.

v5 looks good to me

Reviewed-by: Vincent Guittot <[email protected]>

>
> Thanks,
> Hao
>
> > ---
> > kernel/sched/core.c | 2 +-
> > kernel/sched/topology.c | 6 +++---
> > 2 files changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a68d1276bab0..a8be5415daba 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -9548,6 +9548,7 @@ void set_rq_offline(struct rq *rq)
> > if (rq->online) {
> > const struct sched_class *class;
> >
> > + update_rq_clock(rq);
> > for_each_class(class) {
> > if (class->rq_offline)
> > class->rq_offline(rq);
> > @@ -9689,7 +9690,6 @@ int sched_cpu_deactivate(unsigned int cpu)
> >
> > rq_lock_irqsave(rq, &rf);
> > if (rq->rd) {
> > - update_rq_clock(rq);
> > BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> > set_rq_offline(rq);
> > }
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 6682535e37c8..52976eadfee9 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -487,9 +487,9 @@ static void free_rootdomain(struct rcu_head *rcu)
> > void rq_attach_root(struct rq *rq, struct root_domain *rd)
> > {
> > struct root_domain *old_rd = NULL;
> > - unsigned long flags;
> > + struct rq_flags rf;
> >
> > - raw_spin_rq_lock_irqsave(rq, flags);
> > + rq_lock_irqsave(rq, &rf);
> >
> > if (rq->rd) {
> > old_rd = rq->rd;
> > @@ -515,7 +515,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
> > if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
> > set_rq_online(rq);
> >
> > - raw_spin_rq_unlock_irqrestore(rq, flags);
> > + rq_unlock_irqrestore(rq, &rf);
> >
> > if (old_rd)
> > call_rcu(&old_rd->rcu, free_rootdomain);

2023-06-13 22:10:07

by Benjamin Segall

[permalink] [raw]

Subject: Re: [PATCH v5 4/4] sched/core: Avoid multiple calling update_rq_clock() in unthrottle_offline_cfs_rqs()

Hao Jia <[email protected]> writes:

> This WARN_DOUBLE_CLOCK warning is triggered during cpu offline.
> ------------[ cut here ]------------
> rq->clock_update_flags & RQCF_UPDATED
> WARNING: CPU: 0 PID: 3323 at kernel/sched/core.c:741
> update_rq_clock+0xaf/0x180
> Call Trace:
> <TASK>
> unthrottle_cfs_rq+0x4b/0x300
> rq_offline_fair+0x89/0x90
> set_rq_offline.part.118+0x28/0x60
> rq_attach_root+0xc4/0xd0
> cpu_attach_domain+0x3dc/0x7f0
> partition_sched_domains_locked+0x2a5/0x3c0
> rebuild_sched_domains_locked+0x477/0x830
> rebuild_sched_domains+0x1b/0x30
> cpuset_hotplug_workfn+0x2ca/0xc90
> ? balance_push+0x56/0xf0
> ? _raw_spin_unlock+0x15/0x30
> ? finish_task_switch+0x98/0x2f0
> ? __switch_to+0x291/0x410
> ? __schedule+0x65e/0x1310
> process_one_work+0x1bc/0x3d0
> worker_thread+0x4c/0x380
> ? preempt_count_add+0x92/0xa0
> ? rescuer_thread+0x310/0x310
> kthread+0xe6/0x110
> ? kthread_complete_and_exit+0x20/0x20
> ret_from_fork+0x1f/0x30
>
> The rq clock has been updated in the set_rq_offline(),
> so we don't need to call update_rq_clock() in
> unthrottle_offline_cfs_rqs().
> We only need to call rq_clock_start_loop_update() before the
> loop starts and rq_clock_stop_loop_update() after the loop
> to avoid this warning.
>

Both of these cfsb patches look sensible to me.

Reviewed-By: Ben Segall <[email protected]>

2023-06-15 13:02:34

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [PATCH v5 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

On Tue, Jun 13, 2023 at 04:20:10PM +0800, Hao Jia wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a8be5415daba..1eca36299d8b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2398,7 +2398,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
> if (!is_cpu_allowed(p, dest_cpu))
> return rq;
>
> - update_rq_clock(rq);
> rq = move_queued_task(rq, rf, p, dest_cpu);
>
> return rq;
> @@ -2456,10 +2455,12 @@ static int migration_cpu_stop(void *data)
> goto out;
> }
>
> - if (task_on_rq_queued(p))
> + if (task_on_rq_queued(p)) {
> + update_rq_clock(rq);
> rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
> - else
> + } else {
> p->wake_cpu = arg->dest_cpu;
> + }
>
> /*
> * XXX __migrate_task() can fail, at which point we might end

So now you've got update_rq_clock() in both callers, why not remove it
from __balance_push_cpu_stop() ?

Afaict nothing actually needs it before __migrate_task().

---
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9435,8 +9435,6 @@ static int __balance_push_cpu_stop(void
raw_spin_lock_irq(&p->pi_lock);
rq_lock(rq, &rf);

- update_rq_clock(rq);
-
if (task_rq(p) == rq && task_on_rq_queued(p)) {
cpu = select_fallback_rq(rq->cpu, p);
rq = __migrate_task(rq, &rf, p, cpu);

2023-06-16 08:33:51

by Hao Jia

[permalink] [raw]

Subject: Re: [External] Re: [PATCH v5 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

On 2023/6/15 Peter Zijlstra wrote:
> On Tue, Jun 13, 2023 at 04:20:10PM +0800, Hao Jia wrote:
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index a8be5415daba..1eca36299d8b 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2398,7 +2398,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>> if (!is_cpu_allowed(p, dest_cpu))
>> return rq;
>>
>> - update_rq_clock(rq);
>> rq = move_queued_task(rq, rf, p, dest_cpu);
>>
>> return rq;
>> @@ -2456,10 +2455,12 @@ static int migration_cpu_stop(void *data)
>> goto out;
>> }
>>
>> - if (task_on_rq_queued(p))
>> + if (task_on_rq_queued(p)) {
>> + update_rq_clock(rq);
>> rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
>> - else
>> + } else {
>> p->wake_cpu = arg->dest_cpu;
>> + }
>>
>> /*
>> * XXX __migrate_task() can fail, at which point we might end
>
> So now you've got update_rq_clock() in both callers, why not remove it
> from __balance_push_cpu_stop() ?
>
> Afaict nothing actually needs it before __migrate_task().
>

Thanks for your review.
I'm afraid not, the rq clock also needs to be updated before
select_fallback_rq() is called.

> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -9435,8 +9435,6 @@ static int __balance_push_cpu_stop(void
> raw_spin_lock_irq(&p->pi_lock);
> rq_lock(rq, &rf);
>
> - update_rq_clock(rq);
> -
> if (task_rq(p) == rq && task_on_rq_queued(p)) {
> cpu = select_fallback_rq(rq->cpu, p);
> rq = __migrate_task(rq, &rf, p, cpu);

If we just remove update_rq_clock() from __balance_push_cpu_stop(), we
will get this warning.

[ 1260.960166] rq->clock_update_flags < RQCF_ACT_SKIP
[ 1260.960170] WARNING: CPU: 25 PID: 196 at kernel/sched/sched.h:1496
update_curr+0xf6/0x1f0

[ 1260.960318] Call Trace:
[ 1260.960320] <TASK>
[ 1260.960323] ? show_regs+0x5b/0x60
[ 1260.960327] ? __warn+0x89/0x140
[ 1260.960331] ? update_curr+0xf6/0x1f0
[ 1260.960334] ? report_bug+0x1b7/0x1e0
[ 1260.960338] ? __wake_up_klogd.part.25+0x5a/0x80
[ 1260.960342] ? handle_bug+0x45/0x80
[ 1260.960346] ? exc_invalid_op+0x18/0x70
[ 1260.960348] ? asm_exc_invalid_op+0x1b/0x20
[ 1260.960354] ? update_curr+0xf6/0x1f0
[ 1260.960356] ? update_curr+0xf6/0x1f0
[ 1260.960359] dequeue_entity+0x3b/0x410
[ 1260.960361] dequeue_task_fair+0xc7/0x3c0
[ 1260.960363] dequeue_task+0x30/0xf0
[ 1260.960365] __do_set_cpus_allowed+0x94/0x130
[ 1260.960366] do_set_cpus_allowed+0x38/0x60
[ 1260.960368] cpuset_cpus_allowed_fallback+0x70/0x80
[ 1260.960372] select_fallback_rq+0x20f/0x250 <----
[ 1260.960374] __balance_push_cpu_stop+0x13f/0x1a0
[ 1260.960377] ? migration_cpu_stop+0x2b0/0x2b0
[ 1260.960379] cpu_stopper_thread+0xaf/0x140
[ 1260.960382] smpboot_thread_fn+0x158/0x220
[ 1260.960387] ? sort_range+0x30/0x30
[ 1260.960390] kthread+0xfe/0x130
[ 1260.960392] ? kthread_complete_and_exit+0x20/0x20
[ 1260.960394] ret_from_fork+0x1f/0x30
[ 1260.960399] </TASK>

Thanks,
Hao

2023-06-16 09:30:45

by Peter Zijlstra

[permalink] [raw]

Subject: Re: [External] Re: [PATCH v5 2/4] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

On Fri, Jun 16, 2023 at 03:56:25PM +0800, Hao Jia wrote:

> I'm afraid not, the rq clock also needs to be updated before
> select_fallback_rq() is called.

> If we just remove update_rq_clock() from __balance_push_cpu_stop(), we will
> get this warning.
>
>
> [ 1260.960166] rq->clock_update_flags < RQCF_ACT_SKIP
> [ 1260.960170] WARNING: CPU: 25 PID: 196 at kernel/sched/sched.h:1496
> update_curr+0xf6/0x1f0
>
> [ 1260.960318] Call Trace:
> [ 1260.960320] <TASK>
> [ 1260.960359] dequeue_entity+0x3b/0x410
> [ 1260.960361] dequeue_task_fair+0xc7/0x3c0
> [ 1260.960363] dequeue_task+0x30/0xf0
> [ 1260.960365] __do_set_cpus_allowed+0x94/0x130
> [ 1260.960366] do_set_cpus_allowed+0x38/0x60
> [ 1260.960368] cpuset_cpus_allowed_fallback+0x70/0x80
> [ 1260.960372] select_fallback_rq+0x20f/0x250 <----

Urgh... :-(

2023-06-19 11:26:05

by tip-bot2 for Jacob Pan

[permalink] [raw]

Subject: [tip: sched/core] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: 96500560f0c73c71bca1b27536c6254fa0e8ce37
Gitweb: https://git.kernel.org/tip/96500560f0c73c71bca1b27536c6254fa0e8ce37
Author: Hao Jia <[email protected]>
AuthorDate: Tue, 13 Jun 2023 16:20:10 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 16 Jun 2023 22:08:12 +02:00

sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

There is a double update_rq_clock() invocation:

__balance_push_cpu_stop()
update_rq_clock()
__migrate_task()
update_rq_clock()

Sadly select_fallback_rq() also needs update_rq_clock() for
__do_set_cpus_allowed(), it is not possible to remove the update from
__balance_push_cpu_stop(). So remove it from __migrate_task() and
ensure all callers of this function call update_rq_clock() prior to
calling it.

Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 442efe5..c7db597 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2546,7 +2546,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
if (!is_cpu_allowed(p, dest_cpu))
return rq;

- update_rq_clock(rq);
rq = move_queued_task(rq, rf, p, dest_cpu);

return rq;
@@ -2604,10 +2603,12 @@ static int migration_cpu_stop(void *data)
goto out;
}

- if (task_on_rq_queued(p))
+ if (task_on_rq_queued(p)) {
+ update_rq_clock(rq);
rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
- else
+ } else {
p->wake_cpu = arg->dest_cpu;
+ }

/*
* XXX __migrate_task() can fail, at which point we might end

2023-06-19 11:41:42

by tip-bot2 for Jacob Pan

[permalink] [raw]

Subject: [tip: sched/core] sched/core: Fixed missing rq clock update before calling set_rq_offline()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: cab3ecaed5cdcc9c36a96874b4c45056a46ece45
Gitweb: https://git.kernel.org/tip/cab3ecaed5cdcc9c36a96874b4c45056a46ece45
Author: Hao Jia <[email protected]>
AuthorDate: Tue, 13 Jun 2023 16:20:09 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 16 Jun 2023 22:08:12 +02:00

sched/core: Fixed missing rq clock update before calling set_rq_offline()

When using a cpufreq governor that uses
cpufreq_add_update_util_hook(), it is possible to trigger a missing
update_rq_clock() warning for the CPU hotplug path:

rq_attach_root()
set_rq_offline()
rq_offline_rt()
__disable_runtime()
sched_rt_rq_enqueue()
enqueue_top_rt_rq()
cpufreq_update_util()
data->func(data, rq_clock(rq), flags)

Move update_rq_clock() from sched_cpu_deactivate() (one of it's
callers) into set_rq_offline() such that it covers all
set_rq_offline() usage.

Additionally change rq_attach_root() to use rq_lock_irqsave() so that
it will properly manage the runqueue clock flags.

Suggested-by: Ben Segall <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/core.c | 2 +-
kernel/sched/topology.c | 6 +++---
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ac38225..442efe5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -9585,6 +9585,7 @@ void set_rq_offline(struct rq *rq)
if (rq->online) {
const struct sched_class *class;

+ update_rq_clock(rq);
for_each_class(class) {
if (class->rq_offline)
class->rq_offline(rq);
@@ -9726,7 +9727,6 @@ int sched_cpu_deactivate(unsigned int cpu)

rq_lock_irqsave(rq, &rf);
if (rq->rd) {
- update_rq_clock(rq);
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
set_rq_offline(rq);
}
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index cb92dc5..d3a3b26 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -487,9 +487,9 @@ static void free_rootdomain(struct rcu_head *rcu)
void rq_attach_root(struct rq *rq, struct root_domain *rd)
{
struct root_domain *old_rd = NULL;
- unsigned long flags;
+ struct rq_flags rf;

- raw_spin_rq_lock_irqsave(rq, flags);
+ rq_lock_irqsave(rq, &rf);

if (rq->rd) {
old_rd = rq->rd;
@@ -515,7 +515,7 @@ void rq_attach_root(struct rq *rq, struct root_domain *rd)
if (cpumask_test_cpu(rq->cpu, cpu_active_mask))
set_rq_online(rq);

- raw_spin_rq_unlock_irqrestore(rq, flags);
+ rq_unlock_irqrestore(rq, &rf);

if (old_rd)
call_rcu(&old_rd->rcu, free_rootdomain);

2023-06-19 11:46:48

by tip-bot2 for Jacob Pan

[permalink] [raw]

Subject: [tip: sched/core] sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()

The following commit has been merged into the sched/core branch of tip:

Commit-ID: ebb83d84e49b54369b0db67136a5fe1087124dcc
Gitweb: https://git.kernel.org/tip/ebb83d84e49b54369b0db67136a5fe1087124dcc
Author: Hao Jia <[email protected]>
AuthorDate: Tue, 13 Jun 2023 16:20:11 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Fri, 16 Jun 2023 22:08:13 +02:00

sched/core: Avoid multiple calling update_rq_clock() in __cfsb_csd_unthrottle()

After commit 8ad075c2eb1f ("sched: Async unthrottling for cfs
bandwidth"), we may update the rq clock multiple times in the loop of
__cfsb_csd_unthrottle().

A prior (although less common) instance of this problem exists in
unthrottle_offline_cfs_rqs().

Cure both by ensuring update_rq_clock() is called before the loop and
setting RQCF_ACT_SKIP during the loop, to supress further updates.
The alternative would be pulling update_rq_clock() out of
unthrottle_cfs_rq(), but that gives an even bigger mess.

Fixes: 8ad075c2eb1f ("sched: Async unthrottling for cfs bandwidth")
Reviewed-By: Ben Segall <[email protected]>
Suggested-by: Vincent Guittot <[email protected]>
Signed-off-by: Hao Jia <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 18 ++++++++++++++++++
kernel/sched/sched.h | 22 ++++++++++++++++++++++
2 files changed, 40 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7666dbc..a80a739 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5577,6 +5577,14 @@ static void __cfsb_csd_unthrottle(void *arg)
rq_lock(rq, &rf);

/*
+ * Iterating over the list can trigger several call to
+ * update_rq_clock() in unthrottle_cfs_rq().
+ * Do it once and skip the potential next ones.
+ */
+ update_rq_clock(rq);
+ rq_clock_start_loop_update(rq);
+
+ /*
* Since we hold rq lock we're safe from concurrent manipulation of
* the CSD list. However, this RCU critical section annotates the
* fact that we pair with sched_free_group_rcu(), so that we cannot
@@ -5595,6 +5603,7 @@ static void __cfsb_csd_unthrottle(void *arg)

rcu_read_unlock();

+ rq_clock_stop_loop_update(rq);
rq_unlock(rq, &rf);
}

@@ -6115,6 +6124,13 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)

lockdep_assert_rq_held(rq);

+ /*
+ * The rq clock has already been updated in the
+ * set_rq_offline(), so we should skip updating
+ * the rq clock again in unthrottle_cfs_rq().
+ */
+ rq_clock_start_loop_update(rq);
+
rcu_read_lock();
list_for_each_entry_rcu(tg, &task_groups, list) {
struct cfs_rq *cfs_rq = tg->cfs_rq[cpu_of(rq)];
@@ -6137,6 +6153,8 @@ static void __maybe_unused unthrottle_offline_cfs_rqs(struct rq *rq)
unthrottle_cfs_rq(cfs_rq);
}
rcu_read_unlock();
+
+ rq_clock_stop_loop_update(rq);
}

#else /* CONFIG_CFS_BANDWIDTH */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 36e23e4..50d4b61 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1546,6 +1546,28 @@ static inline void rq_clock_cancel_skipupdate(struct rq *rq)
rq->clock_update_flags &= ~RQCF_REQ_SKIP;
}

+/*
+ * During cpu offlining and rq wide unthrottling, we can trigger
+ * an update_rq_clock() for several cfs and rt runqueues (Typically
+ * when using list_for_each_entry_*)
+ * rq_clock_start_loop_update() can be called after updating the clock
+ * once and before iterating over the list to prevent multiple update.
+ * After the iterative traversal, we need to call rq_clock_stop_loop_update()
+ * to clear RQCF_ACT_SKIP of rq->clock_update_flags.
+ */
+static inline void rq_clock_start_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ SCHED_WARN_ON(rq->clock_update_flags & RQCF_ACT_SKIP);
+ rq->clock_update_flags |= RQCF_ACT_SKIP;
+}
+
+static inline void rq_clock_stop_loop_update(struct rq *rq)
+{
+ lockdep_assert_rq_held(rq);
+ rq->clock_update_flags &= ~RQCF_ACT_SKIP;
+}
+
struct rq_flags {
unsigned long flags;
struct pin_cookie cookie;

2023-06-19 12:25:10

by Hao Jia

[permalink] [raw]

Subject: Re: [External] [tip: sched/core] sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()

On 2023/6/19 tip-bot2 for Hao Jia wrote:
> The following commit has been merged into the sched/core branch of tip:
>
> Commit-ID: 96500560f0c73c71bca1b27536c6254fa0e8ce37
> Gitweb: https://git.kernel.org/tip/96500560f0c73c71bca1b27536c6254fa0e8ce37
> Author: Hao Jia <[email protected]>
> AuthorDate: Tue, 13 Jun 2023 16:20:10 +08:00
> Committer: Peter Zijlstra <[email protected]>
> CommitterDate: Fri, 16 Jun 2023 22:08:12 +02:00
>
> sched/core: Avoid double calling update_rq_clock() in __balance_push_cpu_stop()
>
> There is a double update_rq_clock() invocation:
>
> __balance_push_cpu_stop()
> update_rq_clock()
> __migrate_task()
> update_rq_clock()
>
> Sadly select_fallback_rq() also needs update_rq_clock() for
> __do_set_cpus_allowed(), it is not possible to remove the update from
> __balance_push_cpu_stop(). So remove it from __migrate_task() and
> ensure all callers of this function call update_rq_clock() prior to
> calling it.
>
Hi Peter,

Thank you very much for refactoring these commit messages to make it
clearer and more concise.

Thanks,
Hao

> Signed-off-by: Hao Jia <[email protected]>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> Reviewed-by: Vincent Guittot <[email protected]>
> Link: https://lkml.kernel.org/r/[email protected]
> ---
> kernel/sched/core.c | 7 ++++---
> 1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 442efe5..c7db597 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2546,7 +2546,6 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
> if (!is_cpu_allowed(p, dest_cpu))
> return rq;
>
> - update_rq_clock(rq);
> rq = move_queued_task(rq, rf, p, dest_cpu);
>
> return rq;
> @@ -2604,10 +2603,12 @@ static int migration_cpu_stop(void *data)
> goto out;
> }
>
> - if (task_on_rq_queued(p))
> + if (task_on_rq_queued(p)) {
> + update_rq_clock(rq);
> rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
> - else
> + } else {
> p->wake_cpu = arg->dest_cpu;
> + }
>
> /*
> * XXX __migrate_task() can fail, at which point we might end