2019-07-26 08:30:28

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 0/5] sched/deadline: Fix double accounting in push_dl_task() & some cleanups

While running a simple DL workload (1 DL (12000, 100000, 100000) task
per CPU) on Arm64 & x86 systems I noticed that some of the
SCHED_WARN_ON() in the rq/running bandwidth (bw) functions trigger.
Patch 1/5 contains a proposal to fix this.

Patch 2-5/5 contain various smaller cleanups I discovered while
debugging the actual issue.

Dietmar Eggemann (5):
sched/deadline: Fix double accounting of rq/running bw in
push_dl_task()
sched/deadline: Remove unused int flags from __dequeue_task_dl()
sched/deadline: Use __sub_running_bw() throughout
dl_change_utilization()
sched/deadline: Cleanup on_dl_rq() handling
sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

kernel/sched/deadline.c | 28 ++++++++++------------------
1 file changed, 10 insertions(+), 18 deletions(-)

--
2.17.1



2019-07-26 08:30:32

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

push_dl_task() always calls deactivate_task() with flags=0 which sets
p->on_rq=TASK_ON_RQ_MIGRATING.
push_dl_task()->deactivate_task()->dequeue_task()->dequeue_task_dl()
calls sub_[running/rq]_bw() since p->on_rq=TASK_ON_RQ_MIGRATING.
So sub_[running/rq]_bw() in push_dl_task() is double-accounting for
that task.

The same is true for add_[rq/running]_bw() and activate_task() on the
destination (later) CPU.
push_dl_task()->activate_task()->enqueue_task()->enqueue_task_dl()
calls add_[rq/running]_bw() again since p->on_rq is still set to
TASK_ON_RQ_MIGRATING.
So the add_[rq/running]_bw() in enqueue_task_dl() is double-accounting
for that task.

Fix this by removing the rq/running bw accounting in push_dl_task().

Trace (CONFIG_SCHED_DEBUG=y) before the fix on a 6 CPUs system with 6
DL (12000, 100000, 100000) tasks showing the issue:

[ 48.147868] dl_rq->running_bw > old
[ 48.147886] WARNING: CPU: 1 PID: 0 at kernel/sched/deadline.c:98
...
[ 48.274832] inactive_task_timer+0x468/0x4e8
[ 48.279057] __hrtimer_run_queues+0x10c/0x3b8
[ 48.283364] hrtimer_interrupt+0xd4/0x250
[ 48.287330] tick_handle_oneshot_broadcast+0x198/0x1d0
...
[ 48.360057] dl_rq->running_bw > dl_rq->this_bw
[ 48.360065] WARNING: CPU: 1 PID: 0 at kernel/sched/deadline.c:86
...
[ 48.488294] task_contending+0x1a0/0x208
[ 48.492172] enqueue_task_dl+0x3b8/0x970
[ 48.496050] activate_task+0x70/0xd0
[ 48.499584] ttwu_do_activate+0x50/0x78
[ 48.503375] try_to_wake_up+0x270/0x7a0
[ 48.507167] wake_up_process+0x14/0x20
[ 48.510873] hrtimer_wakeup+0x1c/0x30
...
[ 50.062867] dl_rq->this_bw > old
[ 50.062885] WARNING: CPU: 1 PID: 2048 at kernel/sched/deadline.c:122
...
[ 50.190520] dequeue_task_dl+0x1e4/0x1f8
[ 50.194400] __sched_setscheduler+0x1d0/0x860
[ 50.198707] _sched_setscheduler+0x74/0x98
[ 50.202757] do_sched_setscheduler+0xa8/0x110
[ 50.207065] __arm64_sys_sched_setscheduler+0x1c/0x30

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 4 ----
1 file changed, 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index de2bd006fe93..d1aeada374e1 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2121,17 +2121,13 @@ static int push_dl_task(struct rq *rq)
}

deactivate_task(rq, next_task, 0);
- sub_running_bw(&next_task->dl, &rq->dl);
- sub_rq_bw(&next_task->dl, &rq->dl);
set_task_cpu(next_task, later_rq->cpu);
- add_rq_bw(&next_task->dl, &later_rq->dl);

/*
* Update the later_rq clock here, because the clock is used
* by the cpufreq_update_util() inside __add_running_bw().
*/
update_rq_clock(later_rq);
- add_running_bw(&next_task->dl, &later_rq->dl);
activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
ret = 1;

--
2.17.1


2019-07-26 08:30:39

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 3/5] sched/deadline: Use __sub_running_bw() throughout dl_change_utilization()

dl_change_utilization() has a BUG_ON() to check that no schedutil
kthread (sugov) is entering this function. So instead of calling
sub_running_bw() which checks for the special entity related to a
sugov thread, call the underlying function __sub_running_bw().

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 99d4c24a8637..1fa005f79307 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -164,7 +164,7 @@ void dl_change_utilization(struct task_struct *p, u64 new_bw)

rq = task_rq(p);
if (p->dl.dl_non_contending) {
- sub_running_bw(&p->dl, &rq->dl);
+ __sub_running_bw(p->dl.dl_bw, &rq->dl);
p->dl.dl_non_contending = 0;
/*
* If the timer handler is currently running and the
--
2.17.1


2019-07-26 08:32:05

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

To make the decision whether to set rq or running bw to 0 in underflow
case use the return value of SCHED_WARN_ON() rather than an extra if
condition.

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index a9cb52ceb761..66c594b5507e 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -95,8 +95,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq *dl_rq)

lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
dl_rq->running_bw -= dl_bw;
- SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
- if (dl_rq->running_bw > old)
+ if (SCHED_WARN_ON(dl_rq->running_bw > old)) /* underflow */
dl_rq->running_bw = 0;
/* kick cpufreq (see the comment in kernel/sched/sched.h). */
cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
@@ -119,8 +118,7 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)

lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
dl_rq->this_bw -= dl_bw;
- SCHED_WARN_ON(dl_rq->this_bw > old); /* underflow */
- if (dl_rq->this_bw > old)
+ if (SCHED_WARN_ON(dl_rq->this_bw > old)) /* underflow */
dl_rq->this_bw = 0;
SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
}
--
2.17.1


2019-07-26 08:32:20

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
enqueue_dl_entity().

Move the check that the dl_se is not on the dl_rq from
__dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
side and use the on_dl_rq() helper function.

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 1fa005f79307..a9cb52ceb761 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
struct sched_dl_entity *entry;
int leftmost = 1;

- BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
-
while (*link) {
parent = *link;
entry = rb_entry(parent, struct sched_dl_entity, rb_node);
@@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
{
struct dl_rq *dl_rq = dl_rq_of_se(dl_se);

- if (RB_EMPTY_NODE(&dl_se->rb_node))
- return;
-
rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
RB_CLEAR_NODE(&dl_se->rb_node);

@@ -1466,6 +1461,9 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,

static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
{
+ if (!on_dl_rq(dl_se))
+ return;
+
__dequeue_dl_entity(dl_se);
}

--
2.17.1


2019-07-26 08:32:49

by Dietmar Eggemann

[permalink] [raw]
Subject: [PATCH 2/5] sched/deadline: Remove unused int flags from __dequeue_task_dl()

The int flags parameter is not used in __dequeue_task_dl(). Remove it.

Signed-off-by: Dietmar Eggemann <[email protected]>
---
kernel/sched/deadline.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index d1aeada374e1..99d4c24a8637 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -637,7 +637,7 @@ static inline void deadline_queue_pull_task(struct rq *rq)
#endif /* CONFIG_SMP */

static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags);
-static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags);
+static void __dequeue_task_dl(struct rq *rq, struct task_struct *p);
static void check_preempt_curr_dl(struct rq *rq, struct task_struct *p, int flags);

/*
@@ -1245,7 +1245,7 @@ static void update_curr_dl(struct rq *rq)
(dl_se->flags & SCHED_FLAG_DL_OVERRUN))
dl_se->dl_overrun = 1;

- __dequeue_task_dl(rq, curr, 0);
+ __dequeue_task_dl(rq, curr);
if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr)))
enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);

@@ -1535,7 +1535,7 @@ static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags)
enqueue_pushable_dl_task(rq, p);
}

-static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
+static void __dequeue_task_dl(struct rq *rq, struct task_struct *p)
{
dequeue_dl_entity(&p->dl);
dequeue_pushable_dl_task(rq, p);
@@ -1544,7 +1544,7 @@ static void __dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
{
update_curr_dl(rq);
- __dequeue_task_dl(rq, p, flags);
+ __dequeue_task_dl(rq, p);

if (p->on_rq == TASK_ON_RQ_MIGRATING || flags & DEQUEUE_SAVE) {
sub_running_bw(&p->dl, &rq->dl);
--
2.17.1


2019-07-26 08:38:47

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 26/07/2019 09:27, Dietmar Eggemann wrote:
> Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> enqueue_dl_entity().
>
> Move the check that the dl_se is not on the dl_rq from
> __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> side and use the on_dl_rq() helper function.
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 1fa005f79307..a9cb52ceb761 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> struct sched_dl_entity *entry;
> int leftmost = 1;
>
> - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> -
> while (*link) {
> parent = *link;
> entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> {
> struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>
> - if (RB_EMPTY_NODE(&dl_se->rb_node))
> - return;
> -

Any idea why a similar error leads to a BUG_ON() in the enqueue path but
only a silent return on the dequeue path? I would expect the handling to be
almost identical.


2019-07-26 09:00:30

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 07/26/19 09:37, Valentin Schneider wrote:
> On 26/07/2019 09:27, Dietmar Eggemann wrote:
> > Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> > enqueue_dl_entity().
> >
> > Move the check that the dl_se is not on the dl_rq from
> > __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> > side and use the on_dl_rq() helper function.
> >
> > Signed-off-by: Dietmar Eggemann <[email protected]>
> > ---
> > kernel/sched/deadline.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 1fa005f79307..a9cb52ceb761 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> > struct sched_dl_entity *entry;
> > int leftmost = 1;
> >
> > - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> > -
> > while (*link) {
> > parent = *link;
> > entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> > @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > {
> > struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> >
> > - if (RB_EMPTY_NODE(&dl_se->rb_node))
> > - return;
> > -
>
> Any idea why a similar error leads to a BUG_ON() in the enqueue path but
> only a silent return on the dequeue path? I would expect the handling to be
> almost identical.

nit: s/BUG_ON/WARN_ON/

--
Qais Yousef

2019-07-26 09:22:34

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

Hi,

On 26/07/19 09:37, Valentin Schneider wrote:
> On 26/07/2019 09:27, Dietmar Eggemann wrote:
> > Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> > enqueue_dl_entity().
> >
> > Move the check that the dl_se is not on the dl_rq from
> > __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> > side and use the on_dl_rq() helper function.
> >
> > Signed-off-by: Dietmar Eggemann <[email protected]>
> > ---
> > kernel/sched/deadline.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 1fa005f79307..a9cb52ceb761 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> > struct sched_dl_entity *entry;
> > int leftmost = 1;
> >
> > - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> > -
> > while (*link) {
> > parent = *link;
> > entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> > @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > {
> > struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> >
> > - if (RB_EMPTY_NODE(&dl_se->rb_node))
> > - return;
> > -
>
> Any idea why a similar error leads to a BUG_ON() in the enqueue path but
> only a silent return on the dequeue path? I would expect the handling to be
> almost identical.
>

Task could have already been dequeued by update_curr_dl()->throttle
called by dequeue_task_dl() before calling __dequeue_task_dl().

Thanks,

Juri

2019-07-26 09:35:01

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 26/07/2019 10:20, Juri Lelli wrote:
>> Any idea why a similar error leads to a BUG_ON() in the enqueue path but
>> only a silent return on the dequeue path? I would expect the handling to be
>> almost identical.
>>
>
> Task could have already been dequeued by update_curr_dl()->throttle
> called by dequeue_task_dl() before calling __dequeue_task_dl().
>

Got it, thanks.

> Thanks,
>
> Juri
>

2019-07-26 10:13:51

by luca abeni

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

Hi Dietmar,

On Fri, 26 Jul 2019 09:27:52 +0100
Dietmar Eggemann <[email protected]> wrote:

> push_dl_task() always calls deactivate_task() with flags=0 which sets
> p->on_rq=TASK_ON_RQ_MIGRATING.

Uhm... This is a recent change in the deactivate_task() behaviour,
right? Because I tested SCHED_DEADLINE a lot, but I've never seen this
issue :)


Anyway, looking at the current code the change looks OK. Thanks for
fixing this issue!


Luca

> push_dl_task()->deactivate_task()->dequeue_task()->dequeue_task_dl()
> calls sub_[running/rq]_bw() since p->on_rq=TASK_ON_RQ_MIGRATING.
> So sub_[running/rq]_bw() in push_dl_task() is double-accounting for
> that task.
>
> The same is true for add_[rq/running]_bw() and activate_task() on the
> destination (later) CPU.
> push_dl_task()->activate_task()->enqueue_task()->enqueue_task_dl()
> calls add_[rq/running]_bw() again since p->on_rq is still set to
> TASK_ON_RQ_MIGRATING.
> So the add_[rq/running]_bw() in enqueue_task_dl() is double-accounting
> for that task.
>
> Fix this by removing the rq/running bw accounting in push_dl_task().
>
> Trace (CONFIG_SCHED_DEBUG=y) before the fix on a 6 CPUs system with 6
> DL (12000, 100000, 100000) tasks showing the issue:
>
> [ 48.147868] dl_rq->running_bw > old
> [ 48.147886] WARNING: CPU: 1 PID: 0 at kernel/sched/deadline.c:98
> ...
> [ 48.274832] inactive_task_timer+0x468/0x4e8
> [ 48.279057] __hrtimer_run_queues+0x10c/0x3b8
> [ 48.283364] hrtimer_interrupt+0xd4/0x250
> [ 48.287330] tick_handle_oneshot_broadcast+0x198/0x1d0
> ...
> [ 48.360057] dl_rq->running_bw > dl_rq->this_bw
> [ 48.360065] WARNING: CPU: 1 PID: 0 at kernel/sched/deadline.c:86
> ...
> [ 48.488294] task_contending+0x1a0/0x208
> [ 48.492172] enqueue_task_dl+0x3b8/0x970
> [ 48.496050] activate_task+0x70/0xd0
> [ 48.499584] ttwu_do_activate+0x50/0x78
> [ 48.503375] try_to_wake_up+0x270/0x7a0
> [ 48.507167] wake_up_process+0x14/0x20
> [ 48.510873] hrtimer_wakeup+0x1c/0x30
> ...
> [ 50.062867] dl_rq->this_bw > old
> [ 50.062885] WARNING: CPU: 1 PID: 2048 at
> kernel/sched/deadline.c:122 ...
> [ 50.190520] dequeue_task_dl+0x1e4/0x1f8
> [ 50.194400] __sched_setscheduler+0x1d0/0x860
> [ 50.198707] _sched_setscheduler+0x74/0x98
> [ 50.202757] do_sched_setscheduler+0xa8/0x110
> [ 50.207065] __arm64_sys_sched_setscheduler+0x1c/0x30
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 4 ----
> 1 file changed, 4 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index de2bd006fe93..d1aeada374e1 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2121,17 +2121,13 @@ static int push_dl_task(struct rq *rq)
> }
>
> deactivate_task(rq, next_task, 0);
> - sub_running_bw(&next_task->dl, &rq->dl);
> - sub_rq_bw(&next_task->dl, &rq->dl);
> set_task_cpu(next_task, later_rq->cpu);
> - add_rq_bw(&next_task->dl, &later_rq->dl);
>
> /*
> * Update the later_rq clock here, because the clock is used
> * by the cpufreq_update_util() inside __add_running_bw().
> */
> update_rq_clock(later_rq);
> - add_running_bw(&next_task->dl, &later_rq->dl);
> activate_task(later_rq, next_task, ENQUEUE_NOCLOCK);
> ret = 1;
>


2019-07-26 10:58:22

by luca abeni

[permalink] [raw]
Subject: Re: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

Hi Dietmar,

On Fri, 26 Jul 2019 09:27:56 +0100
Dietmar Eggemann <[email protected]> wrote:

> To make the decision whether to set rq or running bw to 0 in underflow
> case use the return value of SCHED_WARN_ON() rather than an extra if
> condition.

I think I tried this at some point, but if I remember well this
solution does not work correctly when SCHED_DEBUG is not enabled.



Luca


>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index a9cb52ceb761..66c594b5507e 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -95,8 +95,7 @@ void __sub_running_bw(u64 dl_bw, struct dl_rq
> *dl_rq)
> lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
> dl_rq->running_bw -= dl_bw;
> - SCHED_WARN_ON(dl_rq->running_bw > old); /* underflow */
> - if (dl_rq->running_bw > old)
> + if (SCHED_WARN_ON(dl_rq->running_bw > old)) /* underflow */
> dl_rq->running_bw = 0;
> /* kick cpufreq (see the comment in kernel/sched/sched.h). */
> cpufreq_update_util(rq_of_dl_rq(dl_rq), 0);
> @@ -119,8 +118,7 @@ void __sub_rq_bw(u64 dl_bw, struct dl_rq *dl_rq)
>
> lockdep_assert_held(&(rq_of_dl_rq(dl_rq))->lock);
> dl_rq->this_bw -= dl_bw;
> - SCHED_WARN_ON(dl_rq->this_bw > old); /* underflow */
> - if (dl_rq->this_bw > old)
> + if (SCHED_WARN_ON(dl_rq->this_bw > old)) /* underflow */
> dl_rq->this_bw = 0;
> SCHED_WARN_ON(dl_rq->running_bw > dl_rq->this_bw);
> }


2019-07-26 13:32:06

by luca abeni

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

Hi,

On Fri, 26 Jul 2019 09:27:52 +0100
Dietmar Eggemann <[email protected]> wrote:
[...]
> @@ -2121,17 +2121,13 @@ static int push_dl_task(struct rq *rq)
> }
>
> deactivate_task(rq, next_task, 0);
> - sub_running_bw(&next_task->dl, &rq->dl);
> - sub_rq_bw(&next_task->dl, &rq->dl);
> set_task_cpu(next_task, later_rq->cpu);
> - add_rq_bw(&next_task->dl, &later_rq->dl);
>
> /*
> * Update the later_rq clock here, because the clock is used
> * by the cpufreq_update_util() inside __add_running_bw().
> */
> update_rq_clock(later_rq);
> - add_running_bw(&next_task->dl, &later_rq->dl);

Looking at the code again and thinking a little bit more about this
issue, I suspect a similar change is needed in pull_dl_task() too, no?



Luca

2019-07-29 09:08:38

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

On 7/26/19 11:11 AM, luca abeni wrote:
> Hi Dietmar,
>
> On Fri, 26 Jul 2019 09:27:52 +0100
> Dietmar Eggemann <[email protected]> wrote:
>
>> push_dl_task() always calls deactivate_task() with flags=0 which sets
>> p->on_rq=TASK_ON_RQ_MIGRATING.
>
> Uhm... This is a recent change in the deactivate_task() behaviour,
> right? Because I tested SCHED_DEADLINE a lot, but I've never seen this
> issue :)

Looks like it was v5.2 commit 7dd778841164 ("sched/core: Unify p->on_rq
updates").

> Anyway, looking at the current code the change looks OK. Thanks for
> fixing this issue!

[...]

2019-07-29 09:10:47

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

On 7/26/19 2:30 PM, luca abeni wrote:
> Hi,
>
> On Fri, 26 Jul 2019 09:27:52 +0100
> Dietmar Eggemann <[email protected]> wrote:
> [...]
>> @@ -2121,17 +2121,13 @@ static int push_dl_task(struct rq *rq)
>> }
>>
>> deactivate_task(rq, next_task, 0);
>> - sub_running_bw(&next_task->dl, &rq->dl);
>> - sub_rq_bw(&next_task->dl, &rq->dl);
>> set_task_cpu(next_task, later_rq->cpu);
>> - add_rq_bw(&next_task->dl, &later_rq->dl);
>>
>> /*
>> * Update the later_rq clock here, because the clock is used
>> * by the cpufreq_update_util() inside __add_running_bw().
>> */
>> update_rq_clock(later_rq);
>> - add_running_bw(&next_task->dl, &later_rq->dl);
>
> Looking at the code again and thinking a little bit more about this
> issue, I suspect a similar change is needed in pull_dl_task() too, no?

The code looks the same. Let me try to test it. I will add it in v2 then.

2019-07-29 16:41:02

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

On Mon, Jul 29, 2019 at 09:59:28AM +0100, Dietmar Eggemann wrote:
> On 7/26/19 11:11 AM, luca abeni wrote:
> > Hi Dietmar,
> >
> > On Fri, 26 Jul 2019 09:27:52 +0100
> > Dietmar Eggemann <[email protected]> wrote:
> >
> >> push_dl_task() always calls deactivate_task() with flags=0 which sets
> >> p->on_rq=TASK_ON_RQ_MIGRATING.
> >
> > Uhm... This is a recent change in the deactivate_task() behaviour,
> > right? Because I tested SCHED_DEADLINE a lot, but I've never seen this
> > issue :)
>
> Looks like it was v5.2 commit 7dd778841164 ("sched/core: Unify p->on_rq
> updates").

Argh, that wasn't intentional (obviously); please post v2 with that as
Fixes and I'll get it in sched/urgent and stable.

2019-07-29 16:52:51

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/5] sched/deadline: Remove unused int flags from __dequeue_task_dl()

On Fri, Jul 26, 2019 at 09:27:53AM +0100, Dietmar Eggemann wrote:
> The int flags parameter is not used in __dequeue_task_dl(). Remove it.

I just posted a patch(es) that will actually make use of it and extends
the flags argument into dequeue_dl_entity().

https://lkml.kernel.org/r/[email protected]

2019-07-29 17:01:39

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 3/5] sched/deadline: Use __sub_running_bw() throughout dl_change_utilization()

On Fri, Jul 26, 2019 at 09:27:54AM +0100, Dietmar Eggemann wrote:
> dl_change_utilization() has a BUG_ON() to check that no schedutil
> kthread (sugov) is entering this function. So instead of calling
> sub_running_bw() which checks for the special entity related to a
> sugov thread, call the underlying function __sub_running_bw().
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 99d4c24a8637..1fa005f79307 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -164,7 +164,7 @@ void dl_change_utilization(struct task_struct *p, u64 new_bw)
>
> rq = task_rq(p);
> if (p->dl.dl_non_contending) {
> - sub_running_bw(&p->dl, &rq->dl);
> + __sub_running_bw(p->dl.dl_bw, &rq->dl);
> p->dl.dl_non_contending = 0;
> /*
> * If the timer handler is currently running and the

I'm confused; the only called of dl_change_utilization() is
sched_dl_overflow(), and that already checks FLAG_SUGOV and exits before
calling.

So how can this matter?

2019-07-29 17:02:44

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On Fri, Jul 26, 2019 at 09:27:55AM +0100, Dietmar Eggemann wrote:
> Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> enqueue_dl_entity().
>
> Move the check that the dl_se is not on the dl_rq from
> __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> side and use the on_dl_rq() helper function.
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
> ---
> kernel/sched/deadline.c | 8 +++-----
> 1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index 1fa005f79307..a9cb52ceb761 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> struct sched_dl_entity *entry;
> int leftmost = 1;
>
> - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> -
> while (*link) {
> parent = *link;
> entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> {
> struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>
> - if (RB_EMPTY_NODE(&dl_se->rb_node))
> - return;
> -
> rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
> RB_CLEAR_NODE(&dl_se->rb_node);
>
> @@ -1466,6 +1461,9 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
>
> static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
> {
> + if (!on_dl_rq(dl_se))
> + return;

Why allow double dequeue instead of WARN?

> +
> __dequeue_dl_entity(dl_se);
> }
>
> --
> 2.17.1
>

2019-07-29 17:06:36

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

On Fri, Jul 26, 2019 at 12:18:19PM +0200, luca abeni wrote:
> Hi Dietmar,
>
> On Fri, 26 Jul 2019 09:27:56 +0100
> Dietmar Eggemann <[email protected]> wrote:
>
> > To make the decision whether to set rq or running bw to 0 in underflow
> > case use the return value of SCHED_WARN_ON() rather than an extra if
> > condition.
>
> I think I tried this at some point, but if I remember well this
> solution does not work correctly when SCHED_DEBUG is not enabled.

Well, it 'works' in so far that it compiles. But it might not be what
one expects. That is, for !SCHED_DEBUG the return value is an
unconditional false.

In this case I think that's fine, the WARN _should_ not be happending.

2019-07-29 17:07:25

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

On 7/29/19 5:54 PM, Peter Zijlstra wrote:
> On Fri, Jul 26, 2019 at 12:18:19PM +0200, luca abeni wrote:
>> Hi Dietmar,
>>
>> On Fri, 26 Jul 2019 09:27:56 +0100
>> Dietmar Eggemann <[email protected]> wrote:
>>
>>> To make the decision whether to set rq or running bw to 0 in underflow
>>> case use the return value of SCHED_WARN_ON() rather than an extra if
>>> condition.
>>
>> I think I tried this at some point, but if I remember well this
>> solution does not work correctly when SCHED_DEBUG is not enabled.
>
> Well, it 'works' in so far that it compiles. But it might not be what
> one expects. That is, for !SCHED_DEBUG the return value is an
> unconditional false.
>
> In this case I think that's fine, the WARN _should_ not be happending.

But there is commit 6d3aed3d ("sched/debug: Fix SCHED_WARN_ON() to
return a value on !CONFIG_SCHED_DEBUG as well")?

But it doesn't work with !CONFIG_SCHED_DEBUG

What compiles and work is (at least on Arm64).

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 4012f98b9d26..494a767a4091 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -78,7 +78,7 @@
#ifdef CONFIG_SCHED_DEBUG
# define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
#else
-# define SCHED_WARN_ON(x) ({ (void)(x), 0; })
+# define SCHED_WARN_ON(x) ({ (void)(x), x; })


2019-07-29 17:13:51

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 2/5] sched/deadline: Remove unused int flags from __dequeue_task_dl()

On 7/29/19 5:35 PM, Peter Zijlstra wrote:
> On Fri, Jul 26, 2019 at 09:27:53AM +0100, Dietmar Eggemann wrote:
>> The int flags parameter is not used in __dequeue_task_dl(). Remove it.
>
> I just posted a patch(es) that will actually make use of it and extends
> the flags argument into dequeue_dl_entity().
>
> https://lkml.kernel.org/r/[email protected]

I see, I will skip this one then.


2019-07-29 22:12:20

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 3/5] sched/deadline: Use __sub_running_bw() throughout dl_change_utilization()

On 7/29/19 5:47 PM, Peter Zijlstra wrote:
> On Fri, Jul 26, 2019 at 09:27:54AM +0100, Dietmar Eggemann wrote:
>> dl_change_utilization() has a BUG_ON() to check that no schedutil
>> kthread (sugov) is entering this function. So instead of calling
>> sub_running_bw() which checks for the special entity related to a
>> sugov thread, call the underlying function __sub_running_bw().
>>
>> Signed-off-by: Dietmar Eggemann <[email protected]>
>> ---
>> kernel/sched/deadline.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>> index 99d4c24a8637..1fa005f79307 100644
>> --- a/kernel/sched/deadline.c
>> +++ b/kernel/sched/deadline.c
>> @@ -164,7 +164,7 @@ void dl_change_utilization(struct task_struct *p, u64 new_bw)
>>
>> rq = task_rq(p);
>> if (p->dl.dl_non_contending) {
>> - sub_running_bw(&p->dl, &rq->dl);
>> + __sub_running_bw(p->dl.dl_bw, &rq->dl);
>> p->dl.dl_non_contending = 0;
>> /*
>> * If the timer handler is currently running and the
>
> I'm confused; the only called of dl_change_utilization() is
> sched_dl_overflow(), and that already checks FLAG_SUGOV and exits before
> calling.

That's right. There is even a BUG_ON(p->dl.flags & SCHED_FLAG_SUGOV) at
the beginning of dl_change_utilization().

> So how can this matter?

We save the 'if (!dl_entity_is_special(dl_se))' from sub_running_bw()
when we call __sub_running_bw() since we know it can't be a special task.

Later in dl_change_utilization() we already use the inner bw accounting
functions __sub_rq_bw() and __add_rq_bw().

2019-07-30 10:19:17

by Juri Lelli

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 29/07/19 18:49, Peter Zijlstra wrote:
> On Fri, Jul 26, 2019 at 09:27:55AM +0100, Dietmar Eggemann wrote:
> > Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> > enqueue_dl_entity().
> >
> > Move the check that the dl_se is not on the dl_rq from
> > __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> > side and use the on_dl_rq() helper function.
> >
> > Signed-off-by: Dietmar Eggemann <[email protected]>
> > ---
> > kernel/sched/deadline.c | 8 +++-----
> > 1 file changed, 3 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 1fa005f79307..a9cb52ceb761 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> > struct sched_dl_entity *entry;
> > int leftmost = 1;
> >
> > - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> > -
> > while (*link) {
> > parent = *link;
> > entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> > @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > {
> > struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> >
> > - if (RB_EMPTY_NODE(&dl_se->rb_node))
> > - return;
> > -
> > rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
> > RB_CLEAR_NODE(&dl_se->rb_node);
> >
> > @@ -1466,6 +1461,9 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
> >
> > static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > {
> > + if (!on_dl_rq(dl_se))
> > + return;
>
> Why allow double dequeue instead of WARN?

As I was saying to Valentin, it can currently happen that a task could
have already been dequeued by update_curr_dl()->throttle called by
dequeue_task_dl() before calling __dequeue_task_dl(). Do you think we
should check for this condition before calling into dequeue_dl_entity()?

2019-07-30 11:08:04

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On Tue, Jul 30, 2019 at 08:41:15AM +0200, Juri Lelli wrote:
> On 29/07/19 18:49, Peter Zijlstra wrote:
> > On Fri, Jul 26, 2019 at 09:27:55AM +0100, Dietmar Eggemann wrote:
> > > Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
> > > enqueue_dl_entity().
> > >
> > > Move the check that the dl_se is not on the dl_rq from
> > > __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
> > > side and use the on_dl_rq() helper function.
> > >
> > > Signed-off-by: Dietmar Eggemann <[email protected]>
> > > ---
> > > kernel/sched/deadline.c | 8 +++-----
> > > 1 file changed, 3 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > > index 1fa005f79307..a9cb52ceb761 100644
> > > --- a/kernel/sched/deadline.c
> > > +++ b/kernel/sched/deadline.c
> > > @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
> > > struct sched_dl_entity *entry;
> > > int leftmost = 1;
> > >
> > > - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
> > > -
> > > while (*link) {
> > > parent = *link;
> > > entry = rb_entry(parent, struct sched_dl_entity, rb_node);
> > > @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > > {
> > > struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
> > >
> > > - if (RB_EMPTY_NODE(&dl_se->rb_node))
> > > - return;
> > > -
> > > rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
> > > RB_CLEAR_NODE(&dl_se->rb_node);
> > >
> > > @@ -1466,6 +1461,9 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
> > >
> > > static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
> > > {
> > > + if (!on_dl_rq(dl_se))
> > > + return;
> >
> > Why allow double dequeue instead of WARN?
>
> As I was saying to Valentin, it can currently happen that a task could
> have already been dequeued by update_curr_dl()->throttle called by
> dequeue_task_dl() before calling __dequeue_task_dl(). Do you think we
> should check for this condition before calling into dequeue_dl_entity()?

Yes, that's what ->dl_throttled is for, right? And !->dl_throttled &&
!on_dl_rq() is a BUG.

2019-07-30 14:52:29

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

On Mon, Jul 29, 2019 at 05:59:04PM +0100, Dietmar Eggemann wrote:
> On 7/29/19 5:54 PM, Peter Zijlstra wrote:
> > On Fri, Jul 26, 2019 at 12:18:19PM +0200, luca abeni wrote:
> >> Hi Dietmar,
> >>
> >> On Fri, 26 Jul 2019 09:27:56 +0100
> >> Dietmar Eggemann <[email protected]> wrote:
> >>
> >>> To make the decision whether to set rq or running bw to 0 in underflow
> >>> case use the return value of SCHED_WARN_ON() rather than an extra if
> >>> condition.
> >>
> >> I think I tried this at some point, but if I remember well this
> >> solution does not work correctly when SCHED_DEBUG is not enabled.
> >
> > Well, it 'works' in so far that it compiles. But it might not be what
> > one expects. That is, for !SCHED_DEBUG the return value is an
> > unconditional false.
> >
> > In this case I think that's fine, the WARN _should_ not be happending.
>
> But there is commit 6d3aed3d ("sched/debug: Fix SCHED_WARN_ON() to
> return a value on !CONFIG_SCHED_DEBUG as well")?
>
> But it doesn't work with !CONFIG_SCHED_DEBUG
>
> What compiles and work is (at least on Arm64).
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 4012f98b9d26..494a767a4091 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -78,7 +78,7 @@
> #ifdef CONFIG_SCHED_DEBUG
> # define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
> #else
> -# define SCHED_WARN_ON(x) ({ (void)(x), 0; })
> +# define SCHED_WARN_ON(x) ({ (void)(x), x; })

Why doesn't the ,0 compile? That should work just fine. The thing is,
the two existing users:

kernel/sched/fair.c: if (SCHED_WARN_ON(!se->on_rq))
kernel/sched/fair.c: if (SCHED_WARN_ON(!se->on_rq))

seem to compile just fine with it.

2019-07-30 19:38:39

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 5/5] sched/deadline: Use return value of SCHED_WARN_ON() in bw accounting

On 7/30/19 9:23 AM, Peter Zijlstra wrote:
> On Mon, Jul 29, 2019 at 05:59:04PM +0100, Dietmar Eggemann wrote:
>> On 7/29/19 5:54 PM, Peter Zijlstra wrote:
>>> On Fri, Jul 26, 2019 at 12:18:19PM +0200, luca abeni wrote:
>>>> Hi Dietmar,
>>>>
>>>> On Fri, 26 Jul 2019 09:27:56 +0100
>>>> Dietmar Eggemann <[email protected]> wrote:
>>>>
>>>>> To make the decision whether to set rq or running bw to 0 in underflow
>>>>> case use the return value of SCHED_WARN_ON() rather than an extra if
>>>>> condition.
>>>>
>>>> I think I tried this at some point, but if I remember well this
>>>> solution does not work correctly when SCHED_DEBUG is not enabled.
>>>
>>> Well, it 'works' in so far that it compiles. But it might not be what
>>> one expects. That is, for !SCHED_DEBUG the return value is an
>>> unconditional false.
>>>
>>> In this case I think that's fine, the WARN _should_ not be happending.
>>
>> But there is commit 6d3aed3d ("sched/debug: Fix SCHED_WARN_ON() to
>> return a value on !CONFIG_SCHED_DEBUG as well")?
>>
>> But it doesn't work with !CONFIG_SCHED_DEBUG
>>
>> What compiles and work is (at least on Arm64).
>>
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 4012f98b9d26..494a767a4091 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -78,7 +78,7 @@
>> #ifdef CONFIG_SCHED_DEBUG
>> # define SCHED_WARN_ON(x) WARN_ONCE(x, #x)
>> #else
>> -# define SCHED_WARN_ON(x) ({ (void)(x), 0; })
>> +# define SCHED_WARN_ON(x) ({ (void)(x), x; })
>
> Why doesn't the ,0 compile? That should work just fine. The thing is,
> the two existing users:
>
> kernel/sched/fair.c: if (SCHED_WARN_ON(!se->on_rq))
> kernel/sched/fair.c: if (SCHED_WARN_ON(!se->on_rq))
>
> seem to compile just fine with it.

You're right, compiling is not an issue. But it looks like the code does
different things depending on CONFIG_SCHED_DEBUG.

E.g. in set_last_buddy() we would return if '!se->on_rq' with
CONFIG_SCHED_DEBUG and continue the for_each_sched_entity() with
!CONFIG_SCHED_DEBUG.

IMHO, this forced Luca e.g. in __sub_running_bw() to code:

SCHED_WARN_ON(dl_rq->running_bw > old);
if (dl_rq->running_bw > old)
dl_rq->running_bw = 0;

and not:

if (SCHED_WARN_ON(dl_rq->running_bw > old))
dl_rq->running_bw = 0;

I experimented with '# define SCHED_WARN_ON(x) ({ (void)(x), x; })' and
in this case code inside an 'if (SCHED_WARN_ON(cond))' is also executed
with !CONFIG_SCHED_DEBUG.

2019-07-31 12:18:29

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 1/5] sched/deadline: Fix double accounting of rq/running bw in push_dl_task()

On 7/29/19 10:00 AM, Dietmar Eggemann wrote:
> On 7/26/19 2:30 PM, luca abeni wrote:
>> Hi,
>>
>> On Fri, 26 Jul 2019 09:27:52 +0100
>> Dietmar Eggemann <[email protected]> wrote:
>> [...]
>>> @@ -2121,17 +2121,13 @@ static int push_dl_task(struct rq *rq)
>>> }
>>>
>>> deactivate_task(rq, next_task, 0);
>>> - sub_running_bw(&next_task->dl, &rq->dl);
>>> - sub_rq_bw(&next_task->dl, &rq->dl);
>>> set_task_cpu(next_task, later_rq->cpu);
>>> - add_rq_bw(&next_task->dl, &later_rq->dl);
>>>
>>> /*
>>> * Update the later_rq clock here, because the clock is used
>>> * by the cpufreq_update_util() inside __add_running_bw().
>>> */
>>> update_rq_clock(later_rq);
>>> - add_running_bw(&next_task->dl, &later_rq->dl);
>>
>> Looking at the code again and thinking a little bit more about this
>> issue, I suspect a similar change is needed in pull_dl_task() too, no?
>
> The code looks the same. Let me try to test it. I will add it in v2 then.

Like you expected, it happens on the pull side as well.

[ 129.813720] --> CPU7: pull_dl_task: p=[thread0-6 1313] dl_bw=125829
src_rq->dl.running_bw=251658 this_rq->dl.running_bw=125829
[ 129.825167] <-- CPU7: pull_dl_task: p=[thread0-6 1313] dl_bw=125829
src_rq->dl.running_bw=0 this_rq->dl.running_bw=377487
...
[ 129.948528] dl_rq->running_bw > old
[ 129.948568] WARNING: CPU: 7 PID: 0 at kernel/sched/deadline.c:117
inactive_task_timer+0x5e8/0x6b8
...
[ 130.077158] inactive_task_timer+0x5e8/0x6b8
[ 130.081427] __hrtimer_run_queues+0x12c/0x398
[ 130.085782] hrtimer_interrupt+0xfc/0x330
...

2019-07-31 19:36:08

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 7/30/19 9:21 AM, Peter Zijlstra wrote:
> On Tue, Jul 30, 2019 at 08:41:15AM +0200, Juri Lelli wrote:
>> On 29/07/19 18:49, Peter Zijlstra wrote:
>>> On Fri, Jul 26, 2019 at 09:27:55AM +0100, Dietmar Eggemann wrote:
>>>> Remove BUG_ON() in __enqueue_dl_entity() since there is already one in
>>>> enqueue_dl_entity().
>>>>
>>>> Move the check that the dl_se is not on the dl_rq from
>>>> __dequeue_dl_entity() to dequeue_dl_entity() to align with the enqueue
>>>> side and use the on_dl_rq() helper function.
>>>>
>>>> Signed-off-by: Dietmar Eggemann <[email protected]>
>>>> ---
>>>> kernel/sched/deadline.c | 8 +++-----
>>>> 1 file changed, 3 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
>>>> index 1fa005f79307..a9cb52ceb761 100644
>>>> --- a/kernel/sched/deadline.c
>>>> +++ b/kernel/sched/deadline.c
>>>> @@ -1407,8 +1407,6 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
>>>> struct sched_dl_entity *entry;
>>>> int leftmost = 1;
>>>>
>>>> - BUG_ON(!RB_EMPTY_NODE(&dl_se->rb_node));
>>>> -
>>>> while (*link) {
>>>> parent = *link;
>>>> entry = rb_entry(parent, struct sched_dl_entity, rb_node);
>>>> @@ -1430,9 +1428,6 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
>>>> {
>>>> struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
>>>>
>>>> - if (RB_EMPTY_NODE(&dl_se->rb_node))
>>>> - return;
>>>> -
>>>> rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
>>>> RB_CLEAR_NODE(&dl_se->rb_node);
>>>>
>>>> @@ -1466,6 +1461,9 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,
>>>>
>>>> static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
>>>> {
>>>> + if (!on_dl_rq(dl_se))
>>>> + return;
>>>
>>> Why allow double dequeue instead of WARN?
>>
>> As I was saying to Valentin, it can currently happen that a task could
>> have already been dequeued by update_curr_dl()->throttle called by
>> dequeue_task_dl() before calling __dequeue_task_dl(). Do you think we
>> should check for this condition before calling into dequeue_dl_entity()?
>
> Yes, that's what ->dl_throttled is for, right? And !->dl_throttled &&
> !on_dl_rq() is a BUG.

OK, I will add the following snippet to the patch.
Although it's easy to provoke a situation in which DL tasks are throttled,
I haven't seen a throttling happening when the task is being dequeued.

--->8---

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b6d2f263e0a4..a009762097fa 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -1507,8 +1507,7 @@ enqueue_dl_entity(struct sched_dl_entity *dl_se,

static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
{
- if (!on_dl_rq(dl_se))
- return;
+ BUG_ON(!on_dl_rq(dl_se));

__dequeue_dl_entity(dl_se);
}
@@ -1592,6 +1591,10 @@ static void __dequeue_task_dl(struct rq *rq, struct task_struct *p)
static void dequeue_task_dl(struct rq *rq, struct task_struct *p, int flags)
{
update_curr_dl(rq);
+
+ if (p->dl.dl_throttled)
+ return;
+
__dequeue_task_dl(rq, p);

2019-07-31 22:18:39

by luca abeni

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On Wed, 31 Jul 2019 18:32:47 +0100
Dietmar Eggemann <[email protected]> wrote:
[...]
> >>>> static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
> >>>> {
> >>>> + if (!on_dl_rq(dl_se))
> >>>> + return;
> >>>
> >>> Why allow double dequeue instead of WARN?
> >>
> >> As I was saying to Valentin, it can currently happen that a task
> >> could have already been dequeued by update_curr_dl()->throttle
> >> called by dequeue_task_dl() before calling __dequeue_task_dl(). Do
> >> you think we should check for this condition before calling into
> >> dequeue_dl_entity()?
> >
> > Yes, that's what ->dl_throttled is for, right? And !->dl_throttled
> > && !on_dl_rq() is a BUG.
>
> OK, I will add the following snippet to the patch.
> Although it's easy to provoke a situation in which DL tasks are
> throttled, I haven't seen a throttling happening when the task is
> being dequeued.

This is a not-so-common situation, that can happen with periodic tasks
(a-la rt-app) blocking on clock_nanosleep() (or similar) after
executing for an amount of time comparable with the SCHED_DEADLINE
runtime.

It might happen that the task consumed a little bit more than the
remaining runtime (but has not been throttled yet, because the
accounting happens at every tick)... So, when dequeue_task_dl() invokes
update_task_dl() the runtime becomes negative and the task is throttled.

This happens infrequently, but if you try rt-app tasksets with multiple
tasks and execution times near to the runtime you will see it
happening, sooner or later.


[...]
> @@ -1592,6 +1591,10 @@ static void __dequeue_task_dl(struct rq *rq,
> struct task_struct *p) static void dequeue_task_dl(struct rq *rq,
> struct task_struct *p, int flags) {
> update_curr_dl(rq);
> +
> + if (p->dl.dl_throttled)
> + return;

Sorry, I missed part of the previous discussion, so maybe I am missing
something... But I suspect this "return" might be wrong (you risk to
miss a call to task_non_contending(), coming later in this function).

Maybe you cound use
if (!p->dl_throttled)
__dequeue_task_dl(rq, p)

Or did I misunderstand something?



Thanks,
Luca

2019-08-01 21:44:25

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH 4/5] sched/deadline: Cleanup on_dl_rq() handling

On 7/31/19 9:20 PM, luca abeni wrote:
> On Wed, 31 Jul 2019 18:32:47 +0100
> Dietmar Eggemann <[email protected]> wrote:
> [...]
>>>>>> static void dequeue_dl_entity(struct sched_dl_entity *dl_se)
>>>>>> {
>>>>>> + if (!on_dl_rq(dl_se))
>>>>>> + return;
>>>>>
>>>>> Why allow double dequeue instead of WARN?
>>>>
>>>> As I was saying to Valentin, it can currently happen that a task
>>>> could have already been dequeued by update_curr_dl()->throttle
>>>> called by dequeue_task_dl() before calling __dequeue_task_dl(). Do
>>>> you think we should check for this condition before calling into
>>>> dequeue_dl_entity()?
>>>
>>> Yes, that's what ->dl_throttled is for, right? And !->dl_throttled
>>> && !on_dl_rq() is a BUG.
>>
>> OK, I will add the following snippet to the patch.
>> Although it's easy to provoke a situation in which DL tasks are
>> throttled, I haven't seen a throttling happening when the task is
>> being dequeued.
>
> This is a not-so-common situation, that can happen with periodic tasks
> (a-la rt-app) blocking on clock_nanosleep() (or similar) after
> executing for an amount of time comparable with the SCHED_DEADLINE
> runtime.
>
> It might happen that the task consumed a little bit more than the
> remaining runtime (but has not been throttled yet, because the
> accounting happens at every tick)... So, when dequeue_task_dl() invokes
> update_task_dl() the runtime becomes negative and the task is throttled.
>
> This happens infrequently, but if you try rt-app tasksets with multiple
> tasks and execution times near to the runtime you will see it
> happening, sooner or later.
>
>
> [...]
>> @@ -1592,6 +1591,10 @@ static void __dequeue_task_dl(struct rq *rq,
>> struct task_struct *p) static void dequeue_task_dl(struct rq *rq,
>> struct task_struct *p, int flags) {
>> update_curr_dl(rq);
>> +
>> + if (p->dl.dl_throttled)
>> + return;
>
> Sorry, I missed part of the previous discussion, so maybe I am missing
> something... But I suspect this "return" might be wrong (you risk to
> miss a call to task_non_contending(), coming later in this function).
>
> Maybe you cound use
> if (!p->dl_throttled)
> __dequeue_task_dl(rq, p)
>

I see. With the following rt-app file on h960 (8 CPUs) I'm able to
recreate the situation relatively frequently.

...
"tasks" : {
"thread0" : {
"instance" : 12,
"run" : 11950,
"timer" : { "ref" : "unique", "period" : 100000, "mode" : "absolute"},
"dl-runtime" : 12000,
"dl-period" : 100000,
"dl-deadline" : 100000
}
}

...
[ 1912.086664] CPU1: p=[thread0-9 3070] throttled p->on_rq=0 flags=0x9
[ 1912.086726] CPU2: p=[thread0-10 3071] throttled p->on_rq=0 flags=0x9
[ 1924.738912] CPU6: p=[thread0-10 3149] throttled p->on_rq=0 flags=0x9
...

And the flag DEQUEUE_SLEEP is set so like you said
task_non_contending(p) should be called.

I'm going to use your proposal. Thank you for the help!