wake_affine_idle() can return prev_cpu. Even in such a scenario,
scheduler was going ahead and updating schedstats related to wake
affine. i.e even if the task is not moved across LLC domains,
schedstats would have accounted.
Hence add a check before updating schedstats.
Cc: LKML <[email protected]>
Cc: Gautham R Shenoy <[email protected]>
Cc: Parth Shah <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Valentin Schneider <[email protected]>
Cc: Dietmar Eggemann <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vincent Guittot <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: Srikar Dronamraju <[email protected]>
---
kernel/sched/fair.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 794c2cb945f8..a258a84cfdfd 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5884,8 +5884,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
if (target == nr_cpumask_bits)
return prev_cpu;
- schedstat_inc(sd->ttwu_move_affine);
- schedstat_inc(p->se.statistics.nr_wakeups_affine);
+ if (!cpus_share_cache(prev_cpu, target)) {
+ schedstat_inc(sd->ttwu_move_affine);
+ schedstat_inc(p->se.statistics.nr_wakeups_affine);
+ }
return target;
}
--
2.18.2
On 06/05/21 22:15, Srikar Dronamraju wrote:
> wake_affine_idle() can return prev_cpu. Even in such a scenario,
> scheduler was going ahead and updating schedstats related to wake
> affine. i.e even if the task is not moved across LLC domains,
> schedstats would have accounted.
>
> Hence add a check before updating schedstats.
>
I briefly glanced at the git history but didn't find any proper description
of that stat. As it stands, it counts the number of times wake_affine()
purposedly steered a task towards a particular CPU (waker or wakee's prev),
so nr_wakeups_affine / nr_wakeups_affine_attempts is your wake_affine()
"success rate" - how often could it make a choice with the available data.
I could see a point in only incrementing the count if wake_affine() steers
towards the waker rather than the wakee (i.e. don't increment if choice is
prev), but then that has no link with LLC spans
> Cc: LKML <[email protected]>
> Cc: Gautham R Shenoy <[email protected]>
> Cc: Parth Shah <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Cc: Valentin Schneider <[email protected]>
> Cc: Dietmar Eggemann <[email protected]>
> Cc: Mel Gorman <[email protected]>
> Cc: Vincent Guittot <[email protected]>
> Cc: Rik van Riel <[email protected]>
> Signed-off-by: Srikar Dronamraju <[email protected]>
> ---
> kernel/sched/fair.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 794c2cb945f8..a258a84cfdfd 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5884,8 +5884,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> if (target == nr_cpumask_bits)
> return prev_cpu;
>
> - schedstat_inc(sd->ttwu_move_affine);
> - schedstat_inc(p->se.statistics.nr_wakeups_affine);
> + if (!cpus_share_cache(prev_cpu, target)) {
Per the above, why? Why not just if(target == this_cpu) ?
> + schedstat_inc(sd->ttwu_move_affine);
> + schedstat_inc(p->se.statistics.nr_wakeups_affine);
> + }
> return target;
> }
>
> --
> 2.18.2
* Valentin Schneider <[email protected]> [2021-05-07 17:08:17]:
> On 06/05/21 22:15, Srikar Dronamraju wrote:
> > wake_affine_idle() can return prev_cpu. Even in such a scenario,
> > scheduler was going ahead and updating schedstats related to wake
> > affine. i.e even if the task is not moved across LLC domains,
> > schedstats would have accounted.
> >
> > Hence add a check before updating schedstats.
> >
Thanks Valentin for taking a look at the patch.
>
> I briefly glanced at the git history but didn't find any proper description
> of that stat. As it stands, it counts the number of times wake_affine()
> purposedly steered a task towards a particular CPU (waker or wakee's prev),
> so nr_wakeups_affine / nr_wakeups_affine_attempts is your wake_affine()
> "success rate" - how often could it make a choice with the available data.
>
> I could see a point in only incrementing the count if wake_affine() steers
> towards the waker rather than the wakee (i.e. don't increment if choice is
> prev), but then that has no link with LLC spans
Lets say if prev CPU and this CPU were part of the same LLC, and the prev
CPU was busy (or busier than this CPU), should consider this as a wake
affine? If prev was idle, we would have surely consider prev CPU. Also since
both are part of same LLC, we cant say this CPU is more affine than prev
CPU. Or may be I am confusing wake_affine with cache_affine.
>
> > Cc: LKML <[email protected]>
> > Cc: Gautham R Shenoy <[email protected]>
> > Cc: Parth Shah <[email protected]>
> > Cc: Ingo Molnar <[email protected]>
> > Cc: Peter Zijlstra <[email protected]>
> > Cc: Valentin Schneider <[email protected]>
> > Cc: Dietmar Eggemann <[email protected]>
> > Cc: Mel Gorman <[email protected]>
> > Cc: Vincent Guittot <[email protected]>
> > Cc: Rik van Riel <[email protected]>
> > Signed-off-by: Srikar Dronamraju <[email protected]>
> > ---
> > kernel/sched/fair.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 794c2cb945f8..a258a84cfdfd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -5884,8 +5884,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> > if (target == nr_cpumask_bits)
> > return prev_cpu;
> >
> > - schedstat_inc(sd->ttwu_move_affine);
> > - schedstat_inc(p->se.statistics.nr_wakeups_affine);
> > + if (!cpus_share_cache(prev_cpu, target)) {
>
> Per the above, why? Why not just if(target == this_cpu) ?
We could use target == this_cpu. However if prev CPU and this CPU share the
same LLC, then should we consider moving to this_cpu as an affine wakeup?
I could have probably moved this patch a later in the patch series, but one
of the patch that introduces wake_affine_idler_llc() may end up returning
neither this_cpu, prev_cpu or nr_cpumask_bits. In such a case where it
returns a CPU closer to this_cpu, then I would still mark it as wake_affine.
>
> > + schedstat_inc(sd->ttwu_move_affine);
> > + schedstat_inc(p->se.statistics.nr_wakeups_affine);
> > + }
> > return target;
> > }
> >
> > --
> > 2.18.2
--
Thanks and Regards
Srikar Dronamraju
On 07/05/21 22:35, Srikar Dronamraju wrote:
> * Valentin Schneider <[email protected]> [2021-05-07 17:08:17]:
>
>> On 06/05/21 22:15, Srikar Dronamraju wrote:
>> > wake_affine_idle() can return prev_cpu. Even in such a scenario,
>> > scheduler was going ahead and updating schedstats related to wake
>> > affine. i.e even if the task is not moved across LLC domains,
>> > schedstats would have accounted.
>> >
>> > Hence add a check before updating schedstats.
>> >
>
> Thanks Valentin for taking a look at the patch.
>
>>
>> I briefly glanced at the git history but didn't find any proper description
>> of that stat. As it stands, it counts the number of times wake_affine()
>> purposedly steered a task towards a particular CPU (waker or wakee's prev),
>> so nr_wakeups_affine / nr_wakeups_affine_attempts is your wake_affine()
>> "success rate" - how often could it make a choice with the available data.
>>
>> I could see a point in only incrementing the count if wake_affine() steers
>> towards the waker rather than the wakee (i.e. don't increment if choice is
>> prev), but then that has no link with LLC spans
>
> Lets say if prev CPU and this CPU were part of the same LLC, and the prev
> CPU was busy (or busier than this CPU), should consider this as a wake
> affine? If prev was idle, we would have surely consider prev CPU. Also since
> both are part of same LLC, we cant say this CPU is more affine than prev
> CPU. Or may be I am confusing wake_affine with cache_affine.
>
SD_WAKE_AFFINE says: "Consider waking task on waking CPU.", with that I
read wake_affine() as: "should I place the wakee close to the waker or
close to its previous CPU?". This can be yes or no even if both are in the
same LLC.
>>
>> > Cc: LKML <[email protected]>
>> > Cc: Gautham R Shenoy <[email protected]>
>> > Cc: Parth Shah <[email protected]>
>> > Cc: Ingo Molnar <[email protected]>
>> > Cc: Peter Zijlstra <[email protected]>
>> > Cc: Valentin Schneider <[email protected]>
>> > Cc: Dietmar Eggemann <[email protected]>
>> > Cc: Mel Gorman <[email protected]>
>> > Cc: Vincent Guittot <[email protected]>
>> > Cc: Rik van Riel <[email protected]>
>> > Signed-off-by: Srikar Dronamraju <[email protected]>
>> > ---
>> > kernel/sched/fair.c | 6 ++++--
>> > 1 file changed, 4 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 794c2cb945f8..a258a84cfdfd 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -5884,8 +5884,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
>> > if (target == nr_cpumask_bits)
>> > return prev_cpu;
>> >
>> > - schedstat_inc(sd->ttwu_move_affine);
>> > - schedstat_inc(p->se.statistics.nr_wakeups_affine);
>> > + if (!cpus_share_cache(prev_cpu, target)) {
>>
>> Per the above, why? Why not just if(target == this_cpu) ?
>
> We could use target == this_cpu. However if prev CPU and this CPU share the
> same LLC, then should we consider moving to this_cpu as an affine wakeup?
>
It would make sense if it's a sync wakeup, which wake_affine() does try to
do ATM (regardless of LLC actually, if I'm reading it correctly).
* Valentin Schneider <[email protected]> [2021-05-11 12:51:52]:
> On 07/05/21 22:35, Srikar Dronamraju wrote:
> > * Valentin Schneider <[email protected]> [2021-05-07 17:08:17]:
> >
> >> On 06/05/21 22:15, Srikar Dronamraju wrote:
> >> > wake_affine_idle() can return prev_cpu. Even in such a scenario,
> >> > scheduler was going ahead and updating schedstats related to wake
> >> > affine. i.e even if the task is not moved across LLC domains,
> >> > schedstats would have accounted.
> >
<snip>
> > Lets say if prev CPU and this CPU were part of the same LLC, and the prev
> > CPU was busy (or busier than this CPU), should consider this as a wake
> > affine? If prev was idle, we would have surely consider prev CPU. Also since
> > both are part of same LLC, we cant say this CPU is more affine than prev
> > CPU. Or may be I am confusing wake_affine with cache_affine.
> >
>
> SD_WAKE_AFFINE says: "Consider waking task on waking CPU.", with that I
> read wake_affine() as: "should I place the wakee close to the waker or
> close to its previous CPU?". This can be yes or no even if both are in the
> same LLC.
>
Okay.
<snip>
> >> > --- a/kernel/sched/fair.c
> >> > +++ b/kernel/sched/fair.c
> >> > @@ -5884,8 +5884,10 @@ static int wake_affine(struct sched_domain *sd, struct task_struct *p,
> >> > if (target == nr_cpumask_bits)
> >> > return prev_cpu;
> >> >
> >> > - schedstat_inc(sd->ttwu_move_affine);
> >> > - schedstat_inc(p->se.statistics.nr_wakeups_affine);
> >> > + if (!cpus_share_cache(prev_cpu, target)) {
> >>
> >> Per the above, why? Why not just if(target == this_cpu) ?
> >
> > We could use target == this_cpu. However if prev CPU and this CPU share the
> > same LLC, then should we consider moving to this_cpu as an affine wakeup?
> >
>
> It would make sense if it's a sync wakeup, which wake_affine() does try to
> do ATM (regardless of LLC actually, if I'm reading it correctly).
Okay, I will replace the cpus_share_cache check with target == this_cpu.
--
Thanks and Regards
Srikar Dronamraju