2020-10-05 15:11:34

by Peter Zijlstra

Subject: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

Since we now migrate tasks away before DYING, we should also move
bandwidth unthrottle, otherwise we can gain tasks from unthrottle
after we expect all tasks to be gone already.

Also; it looks like the RT balancers don't respect cpu_active() and
instead rely on rq->online in part, complete this. This too requires
we do set_rq_offline() earlier to match the cpu_active() semantics.
(The bigger patch is to convert RT to cpu_active() entirely)

Since set_rq_online() is called from sched_cpu_activate(), place
set_rq_offline() in sched_cpu_deactivate().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
kernel/sched/core.c | 14 ++++++++++----
kernel/sched/deadline.c | 5 +----
kernel/sched/rt.c | 5 +----
3 files changed, 12 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6979,6 +6979,8 @@ int sched_cpu_activate(unsigned int cpu)

int sched_cpu_deactivate(unsigned int cpu)
{
+ struct rq *rq = cpu_rq(cpu);
+ struct rq_flags rf;
int ret;

set_cpu_active(cpu, false);
@@ -6993,6 +6995,14 @@ int sched_cpu_deactivate(unsigned int cp

balance_push_set(cpu, true);

+ rq_lock_irqsave(rq, &rf);
+ if (rq->rd) {
+ update_rq_clock();
+ BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+ set_rq_offline(rq);
+ }
+ rq_unlock_irqrestore(rq, &rf);
+
#ifdef CONFIG_SCHED_SMT
/*
* When going down, decrement the number of cores with SMT present.
@@ -7074,10 +7084,6 @@ int sched_cpu_dying(unsigned int cpu)
sched_tick_stop(cpu);

rq_lock_irqsave(rq, &rf);
- if (rq->rd) {
- BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
- set_rq_offline(rq);
- }
BUG_ON(rq->nr_running != 1);
rq_unlock_irqrestore(rq, &rf);

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -543,7 +543,7 @@ static int push_dl_task(struct rq *rq);

static inline bool need_pull_dl_task(struct rq *rq, struct task_struct *prev)
{
- return dl_task(prev);
+ return rq->online && dl_task(prev);
}

static DEFINE_PER_CPU(struct callback_head, dl_push_head);
@@ -2326,9 +2326,6 @@ static void rq_online_dl(struct rq *rq)
/* Assumes rq->lock is held */
static void rq_offline_dl(struct rq *rq)
{
- if (rq->dl.overloaded)
- dl_clear_overload(rq);
-
cpudl_clear(&rq->rd->cpudl, rq->cpu);
cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
}
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -265,7 +265,7 @@ static void pull_rt_task(struct rq *this
static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
{
/* Try to pull RT tasks here if we lower this rq's prio */
- return rq->rt.highest_prio.curr > prev->prio;
+ return rq->online && rq->rt.highest_prio.curr > prev->prio;
}

static inline int rt_overloaded(struct rq *rq)
@@ -2245,9 +2245,6 @@ static void rq_online_rt(struct rq *rq)
/* Assumes rq->lock is held */
static void rq_offline_rt(struct rq *rq)
{
- if (rq->rt.overloaded)
- rt_clear_overload(rq);
-
__disable_runtime(rq);

cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);



2020-10-06 12:50:03

by Vincent Guittot

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On Mon, 5 Oct 2020 at 17:09, Peter Zijlstra <[email protected]> wrote:
>
> Since we now migrate tasks away before DYING, we should also move
> bandwidth unthrottle, otherwise we can gain tasks from unthrottle
> after we expect all tasks to be gone already.
>
> Also; it looks like the RT balancers don't respect cpu_active() and
> instead rely on rq->online in part, complete this. This too requires
> we do set_rq_offline() earlier to match the cpu_active() semantics.
> (The bigger patch is to convert RT to cpu_active() entirely)
>
> Since set_rq_online() is called from sched_cpu_activate(), place
> set_rq_offline() in sched_cpu_deactivate().
>
> Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
> ---
> kernel/sched/core.c | 14 ++++++++++----
> kernel/sched/deadline.c | 5 +----
> kernel/sched/rt.c | 5 +----
> 3 files changed, 12 insertions(+), 12 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6979,6 +6979,8 @@ int sched_cpu_activate(unsigned int cpu)
>
> int sched_cpu_deactivate(unsigned int cpu)
> {
> + struct rq *rq = cpu_rq(cpu);
> + struct rq_flags rf;
> int ret;
>
> set_cpu_active(cpu, false);
> @@ -6993,6 +6995,14 @@ int sched_cpu_deactivate(unsigned int cp
>
> balance_push_set(cpu, true);
>
> + rq_lock_irqsave(rq, &rf);
> + if (rq->rd) {
> + update_rq_clock();

Tried to compile but rq parameter is missing

> + BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> + set_rq_offline(rq);
> + }
> + rq_unlock_irqrestore(rq, &rf);
> +
> #ifdef CONFIG_SCHED_SMT
> /*
> * When going down, decrement the number of cores with SMT present.
> @@ -7074,10 +7084,6 @@ int sched_cpu_dying(unsigned int cpu)
> sched_tick_stop(cpu);
>
> rq_lock_irqsave(rq, &rf);
> - if (rq->rd) {
> - BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
> - set_rq_offline(rq);
> - }
> BUG_ON(rq->nr_running != 1);
> rq_unlock_irqrestore(rq, &rf);
>
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -543,7 +543,7 @@ static int push_dl_task(struct rq *rq);
>
> static inline bool need_pull_dl_task(struct rq *rq, struct task_struct *prev)
> {
> - return dl_task(prev);
> + return rq->online && dl_task(prev);
> }
>
> static DEFINE_PER_CPU(struct callback_head, dl_push_head);
> @@ -2326,9 +2326,6 @@ static void rq_online_dl(struct rq *rq)
> /* Assumes rq->lock is held */
> static void rq_offline_dl(struct rq *rq)
> {
> - if (rq->dl.overloaded)
> - dl_clear_overload(rq);
> -
> cpudl_clear(&rq->rd->cpudl, rq->cpu);
> cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
> }
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -265,7 +265,7 @@ static void pull_rt_task(struct rq *this
> static inline bool need_pull_rt_task(struct rq *rq, struct task_struct *prev)
> {
> /* Try to pull RT tasks here if we lower this rq's prio */
> - return rq->rt.highest_prio.curr > prev->prio;
> + return rq->online && rq->rt.highest_prio.curr > prev->prio;
> }
>
> static inline int rt_overloaded(struct rq *rq)
> @@ -2245,9 +2245,6 @@ static void rq_online_rt(struct rq *rq)
> /* Assumes rq->lock is held */
> static void rq_offline_rt(struct rq *rq)
> {
> - if (rq->rt.overloaded)
> - rt_clear_overload(rq);
> -
> __disable_runtime(rq);
>
> cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);
>
>

2020-10-06 13:38:15

by Peter Zijlstra

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On Tue, Oct 06, 2020 at 02:46:28PM +0200, Vincent Guittot wrote:

> > @@ -6993,6 +6995,14 @@ int sched_cpu_deactivate(unsigned int cp
> >
> > balance_push_set(cpu, true);
> >
> > + rq_lock_irqsave(rq, &rf);
> > + if (rq->rd) {
> > + update_rq_clock();
>
> Tried to compile but rq parameter is missing

Damn :/ I'm sure I fixed that; must've lost a refresh before sending.
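
For reference, the hunk presumably meant to read as below (hypothetical
respin of the same hunk; update_rq_clock() takes the rq):

+ rq_lock_irqsave(rq, &rf);
+ if (rq->rd) {
+ update_rq_clock(rq);
+ BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
+ set_rq_offline(rq);
+ }
+ rq_unlock_irqrestore(rq, &rf);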

2020-10-10 04:04:42

by Dietmar Eggemann

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On 05/10/2020 16:57, Peter Zijlstra wrote:
> Since we now migrate tasks away before DYING, we should also move
> bandwidth unthrottle, otherwise we can gain tasks from unthrottle
> after we expect all tasks to be gone already.
>
> Also; it looks like the RT balancers don't respect cpu_active() and
> instead rely on rq->online in part, complete this. This too requires
> we do set_rq_offline() earlier to match the cpu_active() semantics.
> (The bigger patch is to convert RT to cpu_active() entirely)
>
> Since set_rq_online() is called from sched_cpu_activate(), place
> set_rq_offline() in sched_cpu_deactivate().

[ 76.215229] WARNING: CPU: 1 PID: 1913 at kernel/irq_work.c:95 irq_work_queue_on+0x108/0x110
[ 76.223589] Modules linked in:
[ 76.226646] CPU: 1 PID: 1913 Comm: task0-1 Not tainted 5.9.0-rc1-00159-g231df48234cb-dirty #40
[ 76.235268] Hardware name: ARM Juno development board (r0) (DT)
[ 76.241194] pstate: 60000085 (nZCv daIf -PAN -UAO BTYPE=--)
[ 76.246772] pc : irq_work_queue_on+0x108/0x110
[ 76.251220] lr : pull_rt_task+0x58/0x68
[ 76.255577] sp : ffff800013a83be0
[ 76.258890] x29: ffff800013a83be0 x28: ffff000972f34600
[ 76.264208] x27: ffff000972f34b90 x26: ffff8000114156c0
[ 76.269524] x25: 0080000000000000 x24: ffff800011cd7000
[ 76.274840] x23: ffff000972f34600 x22: ffff800010d627c8
[ 76.280157] x21: 0000000000000000 x20: 0000000000000000
[ 76.285473] x19: ffff00097ef701c0 x18: 0000000000000010
[ 76.290788] x17: 0000000000000000 x16: 0000000000000000
[ 76.296104] x15: ffff000972f34a80 x14: ffffffffffffffff
[ 76.301420] x13: ffff800093a83987 x12: ffff800013a8398f
[ 76.306736] x11: ffff800011ac2000 x10: ffff800011ce8690
[ 76.312051] x9 : 0000000000000000 x8 : ffff800011ce9000
[ 76.317367] x7 : ffff8000106e9bb8 x6 : 0000000000000144
[ 76.322682] x5 : 0000000000000000 x4 : ffff800011aaa1c0
[ 76.327998] x3 : 0000000000000000 x2 : 0000000000000000
[ 76.333314] x1 : ffff800011ce72a0 x0 : 0000000000000002
[ 76.338630] Call trace:
[ 76.341076] irq_work_queue_on+0x108/0x110
[ 76.349185] pull_rt_task+0x58/0x68
[ 76.352673] balance_rt+0x84/0x88
[ 76.355990] __schedule+0x4a4/0x670
[ 76.359478] schedule+0x70/0x108
[ 76.362706] do_nanosleep+0x8c/0x178
[ 76.366283] hrtimer_nanosleep+0xa0/0x118
[ 76.370294] common_nsleep_timens+0x68/0x98
[ 76.374479] __arm64_sys_clock_nanosleep+0xc0/0x138
[ 76.379361] el0_svc_common.constprop.0+0x6c/0x168
[ 76.384155] do_el0_svc+0x24/0x90
[ 76.387471] el0_sync_handler+0x90/0x198
[ 76.391394] el0_sync+0x158/0x180


balance_rt() checks via need_pull_rt_task() that the rq is online, but
it looks like with RT_PUSH_IPI, pull_rt_task() -> tell_cpu_to_push()
calls irq_work_queue_on() with cpu = rto_next_cpu(rq->rd), and that CPU
can now be offline here as well.
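
The path in question looks roughly like this (a simplified sketch of
kernel/sched/rt.c, not the verbatim source; the rto_loop/rto_start
handling is stripped):

static void tell_cpu_to_push(struct rq *rq)
{
	int cpu = -1;

	raw_spin_lock(&rq->rd->rto_lock);
	if (rq->rd->rto_cpu < 0)
		cpu = rto_next_cpu(rq->rd);	/* next CPU set in rd->rto_mask */
	raw_spin_unlock(&rq->rd->rto_lock);

	if (cpu >= 0)
		/*
		 * irq_work_queue_on() contains WARN_ON_ONCE(cpu_is_offline(cpu)).
		 * Nothing above guarantees 'cpu' is still online once the
		 * offlined CPU is no longer cleared out of rto_mask.
		 */
		irq_work_queue_on(&rq->rd->rto_push_work, cpu);
}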

2020-10-12 13:36:12

by Peter Zijlstra

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On Mon, Oct 12, 2020 at 02:52:00PM +0200, Peter Zijlstra wrote:
> On Fri, Oct 09, 2020 at 10:41:11PM +0200, Dietmar Eggemann wrote:
> > On 05/10/2020 16:57, Peter Zijlstra wrote:
> > > Since we now migrate tasks away before DYING, we should also move
> > > bandwidth unthrottle, otherwise we can gain tasks from unthrottle
> > > after we expect all tasks to be gone already.
> > >
> > > Also; it looks like the RT balancers don't respect cpu_active() and
> > > instead rely on rq->online in part, complete this. This too requires
> > > we do set_rq_offline() earlier to match the cpu_active() semantics.
> > > (The bigger patch is to convert RT to cpu_active() entirely)
> > >
> > > Since set_rq_online() is called from sched_cpu_activate(), place
> > > set_rq_offline() in sched_cpu_deactivate().
>
> > [ 76.215229] WARNING: CPU: 1 PID: 1913 at kernel/irq_work.c:95 irq_work_queue_on+0x108/0x110
>
> > [ 76.341076] irq_work_queue_on+0x108/0x110
> > [ 76.349185] pull_rt_task+0x58/0x68
> > [ 76.352673] balance_rt+0x84/0x88
>
> > balance_rt() checks via need_pull_rt_task() that the rq is online, but
> > it looks like with RT_PUSH_IPI, pull_rt_task() -> tell_cpu_to_push()
> > calls irq_work_queue_on() with cpu = rto_next_cpu(rq->rd), and that CPU
> > can now be offline here as well.
>
> Hurmph... so if I read this right, we reach offline with overload set?
>
> Oooh, I think I see how that happens..

I think the below two hunks need to be reverted from the patch. Can you
confirm?

--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2326,9 +2326,6 @@ static void rq_online_dl(struct rq *rq)
/* Assumes rq->lock is held */
static void rq_offline_dl(struct rq *rq)
{
- if (rq->dl.overloaded)
- dl_clear_overload(rq);
-
cpudl_clear(&rq->rd->cpudl, rq->cpu);
cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
}
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -2245,9 +2245,6 @@ static void rq_online_rt(struct rq *rq)
/* Assumes rq->lock is held */
static void rq_offline_rt(struct rq *rq)
{
- if (rq->rt.overloaded)
- rt_clear_overload(rq);
-
__disable_runtime(rq);

cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);

2020-10-12 14:18:39

by Dietmar Eggemann

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On 12/10/2020 15:18, Peter Zijlstra wrote:
> On Mon, Oct 12, 2020 at 02:52:00PM +0200, Peter Zijlstra wrote:
>> On Fri, Oct 09, 2020 at 10:41:11PM +0200, Dietmar Eggemann wrote:
>>> On 05/10/2020 16:57, Peter Zijlstra wrote:
>>>> Since we now migrate tasks away before DYING, we should also move
>>>> bandwidth unthrottle, otherwise we can gain tasks from unthrottle
>>>> after we expect all tasks to be gone already.
>>>>
>>>> Also; it looks like the RT balancers don't respect cpu_active() and
>>>> instead rely on rq->online in part, complete this. This too requires
>>>> we do set_rq_offline() earlier to match the cpu_active() semantics.
>>>> (The bigger patch is to convert RT to cpu_active() entirely)
>>>>
>>>> Since set_rq_online() is called from sched_cpu_activate(), place
>>>> set_rq_offline() in sched_cpu_deactivate().
>>
>>> [ 76.215229] WARNING: CPU: 1 PID: 1913 at kernel/irq_work.c:95 irq_work_queue_on+0x108/0x110
>>
>>> [ 76.341076] irq_work_queue_on+0x108/0x110
>>> [ 76.349185] pull_rt_task+0x58/0x68
>>> [ 76.352673] balance_rt+0x84/0x88
>>
>>> balance_rt() checks via need_pull_rt_task() that the rq is online, but
>>> it looks like with RT_PUSH_IPI, pull_rt_task() -> tell_cpu_to_push()
>>> calls irq_work_queue_on() with cpu = rto_next_cpu(rq->rd), and that CPU
>>> can now be offline here as well.
>>
>> Hurmph... so if I read this right, we reach offline with overload set?
>>
>> Oooh, I think I see how that happens..
>
> I think the below two hunks need to be reverted from the patch. Can you
> confirm?
>
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2326,9 +2326,6 @@ static void rq_online_dl(struct rq *rq)
> /* Assumes rq->lock is held */
> static void rq_offline_dl(struct rq *rq)
> {
> - if (rq->dl.overloaded)
> - dl_clear_overload(rq);
> -
> cpudl_clear(&rq->rd->cpudl, rq->cpu);
> cpudl_clear_freecpu(&rq->rd->cpudl, rq->cpu);
> }
> --- a/kernel/sched/rt.c
> +++ b/kernel/sched/rt.c
> @@ -2245,9 +2245,6 @@ static void rq_online_rt(struct rq *rq)
> /* Assumes rq->lock is held */
> static void rq_offline_rt(struct rq *rq)
> {
> - if (rq->rt.overloaded)
> - rt_clear_overload(rq);
> -
> __disable_runtime(rq);
>
> cpupri_set(&rq->rd->cpupri, rq->cpu, CPUPRI_INVALID);
>

Yes, this seems to fix it. Tested with RT testcase for > 20min. This
issue did appear within 30 secs w/o this fix.

Looks like we now bail out of pull_rt_task() in one of the
rt_overload_count related conditions before we reach the RT_PUSH_IPI
functionality (tell_cpu_to_push()).
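
For context, a simplified sketch of those early checks in
pull_rt_task() (not the verbatim source):

static void pull_rt_task(struct rq *this_rq)
{
	/* rt_overloaded() is basically atomic_read(&this_rq->rd->rto_count) */
	int rt_overload_count = rt_overloaded(this_rq);

	if (likely(!rt_overload_count))
		return;

	smp_rmb();

	/* If we are the only overloaded CPU, there is nothing to pull. */
	if (rt_overload_count == 1 &&
	    cpumask_test_cpu(this_rq->cpu, this_rq->rd->rto_mask))
		return;

	if (sched_feat(RT_PUSH_IPI)) {
		tell_cpu_to_push(this_rq);
		return;
	}

	/* ... otherwise pull directly from the CPUs in rto_mask ... */
}

With rt_clear_overload() restored in rq_offline_rt(), the offlined CPU
drops out of rto_mask/rto_count, so one of the returns above is taken
instead.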

Debug snippet w/o this fix with extra output in tell_cpu_to_push():

[ 128.608880] sched: RT throttling activated
[ 204.240351] CPU2 cpu=0 is offline rt_overloaded=1
[ 204.245069] ------------[ cut here ]------------
[ 204.249697] WARNING: CPU: 2 PID: 19 at kernel/irq_work.c:95 irq_work_queue_on+0x108/0x110

2020-10-12 21:26:40

by Peter Zijlstra

Subject: Re: [PATCH -v2 07/17] sched: Fix hotplug vs CPU bandwidth control

On Fri, Oct 09, 2020 at 10:41:11PM +0200, Dietmar Eggemann wrote:
> On 05/10/2020 16:57, Peter Zijlstra wrote:
> > Since we now migrate tasks away before DYING, we should also move
> > bandwidth unthrottle, otherwise we can gain tasks from unthrottle
> > after we expect all tasks to be gone already.
> >
> > Also; it looks like the RT balancers don't respect cpu_active() and
> > instead rely on rq->online in part, complete this. This too requires
> > we do set_rq_offline() earlier to match the cpu_active() semantics.
> > (The bigger patch is to convert RT to cpu_active() entirely)
> >
> > Since set_rq_online() is called from sched_cpu_activate(), place
> > set_rq_offline() in sched_cpu_deactivate().

> [ 76.215229] WARNING: CPU: 1 PID: 1913 at kernel/irq_work.c:95 irq_work_queue_on+0x108/0x110

> [ 76.341076] irq_work_queue_on+0x108/0x110
> [ 76.349185] pull_rt_task+0x58/0x68
> [ 76.352673] balance_rt+0x84/0x88

> balance_rt() checks via need_pull_rt_task() that the rq is online, but
> it looks like with RT_PUSH_IPI, pull_rt_task() -> tell_cpu_to_push()
> calls irq_work_queue_on() with cpu = rto_next_cpu(rq->rd), and that CPU
> can now be offline here as well.

Hurmph... so if I read this right, we reach offline with overload set?

Oooh, I think I see how that happens..
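
A rough sketch of the overload bookkeeping in kernel/sched/rt.c
(reconstructed from memory, not verbatim) shows why the bit can survive
the offline once rq_offline_rt() stops clearing it:

static inline void rt_set_overload(struct rq *rq)
{
	if (!rq->online)
		return;

	cpumask_set_cpu(rq->cpu, rq->rd->rto_mask);
	smp_wmb();	/* make the mask visible before bumping the count */
	atomic_inc(&rq->rd->rto_count);
}

static inline void rt_clear_overload(struct rq *rq)
{
	if (!rq->online)
		return;

	atomic_dec(&rq->rd->rto_count);
	cpumask_clear_cpu(rq->cpu, rq->rd->rto_mask);
}

rq_offline_rt() runs from set_rq_offline() while rq->online is still
set, so the (reverted) rt_clear_overload() call there still takes
effect and drops the CPU from rto_mask before the rq goes offline.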