2021-02-24 13:03:19

by Aubrey Li

Subject: [PATCH v2] sched/fair: reduce long-tail newly idle balance cost

A long-tail load balance cost is observed on the newly idle path.
It is caused by a race window between the first nr_running check
of the busiest runqueue and its nr_running recheck in detach_tasks.

Before the busiest runqueue is locked, its tasks can be pulled by
other CPUs, so nr_running of the busiest runqueue can drop to 1, or
even to 0 if the running task becomes idle. This causes detach_tasks
to break out with the LBF_ALL_PINNED flag still set, which triggers
a load_balance redo at the same sched_domain level.

In order to find the new busiest sched_group and CPU, load balance will
recompute and update the various load statistics, which eventually leads
to the long-tail load balance cost.

This patch clears LBF_ALL_PINNED flag for this race condition, and hence
reduces the long-tail cost of newly idle balance.

Cc: Vincent Guittot <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Srinivas Pandruvada <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Signed-off-by: Aubrey Li <[email protected]>
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 04a3ce2..5c67804 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7675,6 +7675,15 @@ static int detach_tasks(struct lb_env *env)

 	lockdep_assert_held(&env->src_rq->lock);

+	/*
+	 * Source run queue has been emptied by another CPU, clear
+	 * LBF_ALL_PINNED flag as we will not test any task.
+	 */
+	if (env->src_rq->nr_running <= 1) {
+		env->flags &= ~LBF_ALL_PINNED;
+		return 0;
+	}
+
 	if (env->imbalance <= 0)
 		return 0;

--
2.7.4
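
The race described in the changelog can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
kernel code: toy_rq, toy_lb_env, toy_detach_tasks() and toy_load_balance()
are hypothetical stand-ins, every queued task is assumed to be migratable,
and every candidate runqueue is assumed to have lost the race (the worst
case). Each pass of the loop in toy_load_balance() stands for one
recomputation of the load statistics.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
#define LBF_ALL_PINNED 0x01

struct toy_rq {
	unsigned int nr_running;	/* includes the currently running task */
};

struct toy_lb_env {
	struct toy_rq *src_rq;
	unsigned int flags;
	long imbalance;
};

/*
 * Toy model of detach_tasks(). 'with_fix' selects whether the early exit
 * added by the patch is present. Returns the number of tasks detached.
 */
static int toy_detach_tasks(struct toy_lb_env *env, int with_fix)
{
	int detached = 0;

	/* The check added by the patch: nothing to test, so clear the flag. */
	if (with_fix && env->src_rq->nr_running <= 1) {
		env->flags &= ~LBF_ALL_PINNED;
		return 0;
	}

	if (env->imbalance <= 0)
		return 0;

	/*
	 * Model of the scan loop: it stops before examining any task once
	 * only the running task is left. Assumption: every examined task
	 * is migratable, so examining one clears LBF_ALL_PINNED.
	 */
	while (env->src_rq->nr_running > 1 && env->imbalance > 0) {
		env->flags &= ~LBF_ALL_PINNED;
		env->src_rq->nr_running--;
		env->imbalance--;
		detached++;
	}

	return detached;
}

/*
 * Toy model of the redo loop in load_balance(). Returns how many passes
 * were needed. Assumption: each candidate runqueue holds
 * 'nr_running_at_lock' tasks by the time it is locked, and each redo
 * drops one candidate CPU until 'max_cpus' candidates are exhausted.
 */
static int toy_load_balance(unsigned int nr_running_at_lock, int with_fix,
			    int max_cpus)
{
	int passes = 0;
	int cpus_left = max_cpus;

	for (;;) {
		struct toy_rq rq = { .nr_running = nr_running_at_lock };
		struct toy_lb_env env = {
			.src_rq = &rq,
			.flags = LBF_ALL_PINNED,	/* set before detaching */
			.imbalance = 1,
		};

		passes++;
		toy_detach_tasks(&env, with_fix);

		if (!(env.flags & LBF_ALL_PINNED))
			break;		/* no redo needed */
		if (--cpus_left == 0)
			break;		/* no candidate CPU left to retry */
	}

	return passes;
}
```

Under these assumptions, toy_load_balance(1, 0, 4) takes four passes,
because the flag is never cleared and every candidate is retried, while
toy_load_balance(1, 1, 4) finishes after a single pass.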


2021-03-16 11:36:22

by Li, Aubrey

Subject: Re: [PATCH v2] sched/fair: reduce long-tail newly idle balance cost

On 2021/2/24 16:15, Aubrey Li wrote:
> A long-tail load balance cost is observed on the newly idle path.
> It is caused by a race window between the first nr_running check
> of the busiest runqueue and its nr_running recheck in detach_tasks.
>
> Before the busiest runqueue is locked, its tasks can be pulled by
> other CPUs, so nr_running of the busiest runqueue can drop to 1, or
> even to 0 if the running task becomes idle. This causes detach_tasks
> to break out with the LBF_ALL_PINNED flag still set, which triggers
> a load_balance redo at the same sched_domain level.
>
> In order to find the new busiest sched_group and CPU, load balance will
> recompute and update the various load statistics, which eventually leads
> to the long-tail load balance cost.
>
> This patch clears LBF_ALL_PINNED flag for this race condition, and hence
> reduces the long-tail cost of newly idle balance.

Ping...

>
> Cc: Vincent Guittot <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Andi Kleen <[email protected]>
> Cc: Tim Chen <[email protected]>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Signed-off-by: Aubrey Li <[email protected]>
> ---
> kernel/sched/fair.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 04a3ce2..5c67804 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7675,6 +7675,15 @@ static int detach_tasks(struct lb_env *env)
>
>  	lockdep_assert_held(&env->src_rq->lock);
>
> +	/*
> +	 * Source run queue has been emptied by another CPU, clear
> +	 * LBF_ALL_PINNED flag as we will not test any task.
> +	 */
> +	if (env->src_rq->nr_running <= 1) {
> +		env->flags &= ~LBF_ALL_PINNED;
> +		return 0;
> +	}
> +
>  	if (env->imbalance <= 0)
>  		return 0;
>
>

2021-03-23 14:54:51

by Peter Zijlstra

Subject: Re: [PATCH v2] sched/fair: reduce long-tail newly idle balance cost

On Tue, Mar 23, 2021 at 02:44:57PM +0100, Vincent Guittot wrote:
> Hi Aubrey,
>
> On Tue, 16 Mar 2021 at 05:27, Li, Aubrey <[email protected]> wrote:
> >
> > On 2021/2/24 16:15, Aubrey Li wrote:
> > > > A long-tail load balance cost is observed on the newly idle path.
> > > > It is caused by a race window between the first nr_running check
> > > > of the busiest runqueue and its nr_running recheck in detach_tasks.
> > > >
> > > > Before the busiest runqueue is locked, its tasks can be pulled by
> > > > other CPUs, so nr_running of the busiest runqueue can drop to 1, or
> > > > even to 0 if the running task becomes idle. This causes detach_tasks
> > > > to break out with the LBF_ALL_PINNED flag still set, which triggers
> > > > a load_balance redo at the same sched_domain level.
> > >
> > > In order to find the new busiest sched_group and CPU, load balance will
> > > recompute and update the various load statistics, which eventually leads
> > > to the long-tail load balance cost.
> > >
> > > This patch clears LBF_ALL_PINNED flag for this race condition, and hence
> > > reduces the long-tail cost of newly idle balance.
> >
> > Ping...
>
> Reviewed-by: Vincent Guittot <[email protected]>

Thanks!

Subject: [tip: sched/core] sched/fair: Reduce long-tail newly idle balance cost

The following commit has been merged into the sched/core branch of tip:

Commit-ID: acb4decc1e900468d51b33c5f1ee445278e716a7
Gitweb: https://git.kernel.org/tip/acb4decc1e900468d51b33c5f1ee445278e716a7
Author: Aubrey Li <[email protected]>
AuthorDate: Wed, 24 Feb 2021 16:15:49 +08:00
Committer: Peter Zijlstra <[email protected]>
CommitterDate: Tue, 23 Mar 2021 16:01:59 +01:00

sched/fair: Reduce long-tail newly idle balance cost

A long-tail load balance cost is observed on the newly idle path.
It is caused by a race window between the first nr_running check
of the busiest runqueue and its nr_running recheck in detach_tasks.

Before the busiest runqueue is locked, its tasks can be pulled by
other CPUs, so nr_running of the busiest runqueue can drop to 1, or
even to 0 if the running task becomes idle. This causes detach_tasks
to break out with the LBF_ALL_PINNED flag still set, which triggers
a load_balance redo at the same sched_domain level.

In order to find the new busiest sched_group and CPU, load balance will
recompute and update the various load statistics, which eventually leads
to the long-tail load balance cost.

This patch clears LBF_ALL_PINNED flag for this race condition, and hence
reduces the long-tail cost of newly idle balance.

Signed-off-by: Aubrey Li <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Reviewed-by: Vincent Guittot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
---
kernel/sched/fair.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index aaa0dfa..6d73bdb 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7687,6 +7687,15 @@ static int detach_tasks(struct lb_env *env)

 	lockdep_assert_held(&env->src_rq->lock);

+	/*
+	 * Source run queue has been emptied by another CPU, clear
+	 * LBF_ALL_PINNED flag as we will not test any task.
+	 */
+	if (env->src_rq->nr_running <= 1) {
+		env->flags &= ~LBF_ALL_PINNED;
+		return 0;
+	}
+
 	if (env->imbalance <= 0)
 		return 0;

2021-03-24 01:16:30

by Vincent Guittot

Subject: Re: [PATCH v2] sched/fair: reduce long-tail newly idle balance cost

Hi Aubrey,

On Tue, 16 Mar 2021 at 05:27, Li, Aubrey <[email protected]> wrote:
>
> On 2021/2/24 16:15, Aubrey Li wrote:
> > A long-tail load balance cost is observed on the newly idle path.
> > It is caused by a race window between the first nr_running check
> > of the busiest runqueue and its nr_running recheck in detach_tasks.
> >
> > Before the busiest runqueue is locked, its tasks can be pulled by
> > other CPUs, so nr_running of the busiest runqueue can drop to 1, or
> > even to 0 if the running task becomes idle. This causes detach_tasks
> > to break out with the LBF_ALL_PINNED flag still set, which triggers
> > a load_balance redo at the same sched_domain level.
> >
> > In order to find the new busiest sched_group and CPU, load balance will
> > recompute and update the various load statistics, which eventually leads
> > to the long-tail load balance cost.
> >
> > This patch clears LBF_ALL_PINNED flag for this race condition, and hence
> > reduces the long-tail cost of newly idle balance.
>
> Ping...

Reviewed-by: Vincent Guittot <[email protected]>

>
> >
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Andi Kleen <[email protected]>
> > Cc: Tim Chen <[email protected]>
> > Cc: Srinivas Pandruvada <[email protected]>
> > Cc: Rafael J. Wysocki <[email protected]>
> > Signed-off-by: Aubrey Li <[email protected]>
> > ---
> > kernel/sched/fair.c | 9 +++++++++
> > 1 file changed, 9 insertions(+)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 04a3ce2..5c67804 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7675,6 +7675,15 @@ static int detach_tasks(struct lb_env *env)
> >
> >  	lockdep_assert_held(&env->src_rq->lock);
> >
> > +	/*
> > +	 * Source run queue has been emptied by another CPU, clear
> > +	 * LBF_ALL_PINNED flag as we will not test any task.
> > +	 */
> > +	if (env->src_rq->nr_running <= 1) {
> > +		env->flags &= ~LBF_ALL_PINNED;
> > +		return 0;
> > +	}
> > +
> >  	if (env->imbalance <= 0)
> >  		return 0;
> >
> >
>