2024-01-14 18:36:19

by Vincent Guittot

[permalink] [raw]
Subject: [PATCH] sched/fair: Fix frequency selection for non invariant case

When frequency invariance is not enabled, get_capacity_ref_freq(policy)
returns the current frequency and the performance margin applied by
map_util_perf(), enabled the utilization to go above the maximum compute
capacity and to select a higher frequency than the current one.

The performance margin is now applied earlier in the path to take into
account some utilization clampings and we can't get an utilization higher
than the maximum compute capacity.

We must use a frequency above the current frequency to get a chance to
select a higher OPP when the current one becomes fully used. Apply
the same margin and returns a frequency 25% higher than the current one in
order to switch to the next OPP before we fully use the cpu at the current
one.

Reported-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
Reported-by: Wyes Karny <[email protected]>
Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
Signed-off-by: Vincent Guittot <[email protected]>
Tested-by: Wyes Karny <[email protected]>
---
kernel/sched/cpufreq_schedutil.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c097083e..d12e95d30e2e 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
if (arch_scale_freq_invariant())
return policy->cpuinfo.max_freq;

- return policy->cur;
+ /*
+ * Apply a 25% margin so that we select a higher frequency than
+ * the current one before the CPU is full busy
+ */
+ return policy->cur + (policy->cur >> 2);
}

/**
--
2.34.1



2024-01-15 12:15:15

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On 01/14/24 19:36, Vincent Guittot wrote:
> When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> returns the current frequency and the performance margin applied by
> map_util_perf(), enabled the utilization to go above the maximum compute
> capacity and to select a higher frequency than the current one.
>
> The performance margin is now applied earlier in the path to take into
> account some utilization clampings and we can't get an utilization higher
> than the maximum compute capacity.
>
> We must use a frequency above the current frequency to get a chance to
> select a higher OPP when the current one becomes fully used. Apply
> the same margin and returns a frequency 25% higher than the current one in
> order to switch to the next OPP before we fully use the cpu at the current
> one.
>
> Reported-by: Linus Torvalds <[email protected]>
> Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
> Reported-by: Wyes Karny <[email protected]>
> Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
> Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> Signed-off-by: Vincent Guittot <[email protected]>
> Tested-by: Wyes Karny <[email protected]>
> ---
> kernel/sched/cpufreq_schedutil.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..d12e95d30e2e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> if (arch_scale_freq_invariant())
> return policy->cpuinfo.max_freq;
>
> - return policy->cur;
> + /*
> + * Apply a 25% margin so that we select a higher frequency than
> + * the current one before the CPU is full busy
> + */
> + return policy->cur + (policy->cur >> 2);

I think we can do better, but this does re-instate the previous behavior at
least for this merge window. So FWIW

Reviewed-and-tested-by: Qais Yousef <[email protected]>

> }
>
> /**
> --
> 2.34.1
>

2024-01-15 20:06:47

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On 14/01/2024 19:36, Vincent Guittot wrote:
> When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> returns the current frequency and the performance margin applied by
> map_util_perf(), enabled the utilization to go above the maximum compute
> capacity and to select a higher frequency than the current one.
>
> The performance margin is now applied earlier in the path to take into
> account some utilization clampings and we can't get an utilization higher
> than the maximum compute capacity.
>
> We must use a frequency above the current frequency to get a chance to
> select a higher OPP when the current one becomes fully used. Apply
> the same margin and returns a frequency 25% higher than the current one in
> order to switch to the next OPP before we fully use the cpu at the current
> one.
>
> Reported-by: Linus Torvalds <[email protected]>
> Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
> Reported-by: Wyes Karny <[email protected]>
> Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
> Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> Signed-off-by: Vincent Guittot <[email protected]>
> Tested-by: Wyes Karny <[email protected]>
> ---
> kernel/sched/cpufreq_schedutil.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..d12e95d30e2e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> if (arch_scale_freq_invariant())
> return policy->cpuinfo.max_freq;
>
> - return policy->cur;
> + /*
> + * Apply a 25% margin so that we select a higher frequency than
> + * the current one before the CPU is full busy
> + */
> + return policy->cur + (policy->cur >> 2);
> }
>
> /**

Reviewed-by: Dietmar Eggemann <[email protected]>
Tested-by: Dietmar Eggemann <[email protected]>

(on Intel Xeon CPU E5-2690 v2 with frequency invariance disabled)

2024-01-16 10:01:08

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case


* Vincent Guittot <[email protected]> wrote:

> When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> returns the current frequency and the performance margin applied by
> map_util_perf(), enabled the utilization to go above the maximum compute
> capacity and to select a higher frequency than the current one.
>
> The performance margin is now applied earlier in the path to take into
> account some utilization clampings and we can't get an utilization higher
> than the maximum compute capacity.
>
> We must use a frequency above the current frequency to get a chance to
> select a higher OPP when the current one becomes fully used. Apply
> the same margin and returns a frequency 25% higher than the current one in
> order to switch to the next OPP before we fully use the cpu at the current
> one.
>
> Reported-by: Linus Torvalds <[email protected]>
> Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
> Reported-by: Wyes Karny <[email protected]>
> Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
> Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> Signed-off-by: Vincent Guittot <[email protected]>
> Tested-by: Wyes Karny <[email protected]>
> ---
> kernel/sched/cpufreq_schedutil.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..d12e95d30e2e 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> if (arch_scale_freq_invariant())
> return policy->cpuinfo.max_freq;
>
> - return policy->cur;
> + /*
> + * Apply a 25% margin so that we select a higher frequency than
> + * the current one before the CPU is full busy
> + */
> + return policy->cur + (policy->cur >> 2);
> }

I've updated the changelog to better express what was broken and how we
fixed it. Ack?

Ingo

==========================>
From: Vincent Guittot <[email protected]>
Date: Sun, 14 Jan 2024 19:36:00 +0100
Subject: [PATCH] sched/fair: Fix frequency selection for non-invariant case

Linus reported a ~50% performance regression on single-threaded
workloads on his AMD Ryzen system, and bisected it to:

9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")

When frequency invariance is not enabled, get_capacity_ref_freq(policy)
is supposed to return the current frequency and the performance margin
applied by map_util_perf(), enabling the utilization to go above the
maximum compute capacity and to select a higher frequency than the current one.

After the changes in 9c0b4bb7f630, the performance margin was applied
earlier in the path to take into account utilization clampings and
we couldn't get a utilization higher than the maximum compute capacity,
and the CPU remained 'stuck' at lower frequencies.

To fix this, we must use a frequency above the current frequency to
get a chance to select a higher OPP when the current one becomes fully used.
Apply the same margin and return a frequency 25% higher than the current
one in order to switch to the next OPP before we fully use the CPU
at the current one.

[ mingo: Clarified the changelog. ]

Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
Reported-by: Linus Torvalds <[email protected]>
Bisected-by: Linus Torvalds <[email protected]>
Reported-by: Wyes Karny <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Wyes Karny <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/cpufreq_schedutil.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c097083e..eece6244f9d2 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
if (arch_scale_freq_invariant())
return policy->cpuinfo.max_freq;

- return policy->cur;
+ /*
+ * Apply a 25% margin so that we select a higher frequency than
+ * the current one before the CPU is fully busy:
+ */
+ return policy->cur + (policy->cur >> 2);
}

/**

2024-01-16 10:04:33

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On Tue, 16 Jan 2024 at 10:59, Ingo Molnar <[email protected]> wrote:
>
>
> * Vincent Guittot <[email protected]> wrote:
>
> > When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> > returns the current frequency and the performance margin applied by
> > map_util_perf(), enabled the utilization to go above the maximum compute
> > capacity and to select a higher frequency than the current one.
> >
> > The performance margin is now applied earlier in the path to take into
> > account some utilization clampings and we can't get an utilization higher
> > than the maximum compute capacity.
> >
> > We must use a frequency above the current frequency to get a chance to
> > select a higher OPP when the current one becomes fully used. Apply
> > the same margin and returns a frequency 25% higher than the current one in
> > order to switch to the next OPP before we fully use the cpu at the current
> > one.
> >
> > Reported-by: Linus Torvalds <[email protected]>
> > Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
> > Reported-by: Wyes Karny <[email protected]>
> > Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
> > Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> > Signed-off-by: Vincent Guittot <[email protected]>
> > Tested-by: Wyes Karny <[email protected]>
> > ---
> > kernel/sched/cpufreq_schedutil.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 95c3c097083e..d12e95d30e2e 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > if (arch_scale_freq_invariant())
> > return policy->cpuinfo.max_freq;
> >
> > - return policy->cur;
> > + /*
> > + * Apply a 25% margin so that we select a higher frequency than
> > + * the current one before the CPU is full busy
> > + */
> > + return policy->cur + (policy->cur >> 2);
> > }
>
> I've updated the changelog to better express what was broken and how we
> fixed it. Ack?

Looks good

Thanks

>
> Ingo
>
> ==========================>
> From: Vincent Guittot <[email protected]>
> Date: Sun, 14 Jan 2024 19:36:00 +0100
> Subject: [PATCH] sched/fair: Fix frequency selection for non-invariant case
>
> Linus reported a ~50% performance regression on single-threaded
> workloads on his AMD Ryzen system, and bisected it to:
>
> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
>
> When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> is supposed to return the current frequency and the performance margin
> applied by map_util_perf(), enabling the utilization to go above the
> maximum compute capacity and to select a higher frequency than the current one.
>
> After the changes in 9c0b4bb7f630, the performance margin was applied
> earlier in the path to take into account utilization clampings and
> we couldn't get a utilization higher than the maximum compute capacity,
> and the CPU remained 'stuck' at lower frequencies.
>
> To fix this, we must use a frequency above the current frequency to
> get a chance to select a higher OPP when the current one becomes fully used.
> Apply the same margin and return a frequency 25% higher than the current
> one in order to switch to the next OPP before we fully use the CPU
> at the current one.
>
> [ mingo: Clarified the changelog. ]
>
> Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> Reported-by: Linus Torvalds <[email protected]>
> Bisected-by: Linus Torvalds <[email protected]>
> Reported-by: Wyes Karny <[email protected]>
> Signed-off-by: Vincent Guittot <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> Tested-by: Wyes Karny <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> ---
> kernel/sched/cpufreq_schedutil.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 95c3c097083e..eece6244f9d2 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> if (arch_scale_freq_invariant())
> return policy->cpuinfo.max_freq;
>
> - return policy->cur;
> + /*
> + * Apply a 25% margin so that we select a higher frequency than
> + * the current one before the CPU is fully busy:
> + */
> + return policy->cur + (policy->cur >> 2);
> }
>
> /**

Subject: [tip: sched/urgent] sched/fair: Fix frequency selection for non-invariant case

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID: e37617c8e53a1f7fcba6d5e1041f4fd8a2425c27
Gitweb: https://git.kernel.org/tip/e37617c8e53a1f7fcba6d5e1041f4fd8a2425c27
Author: Vincent Guittot <[email protected]>
AuthorDate: Sun, 14 Jan 2024 19:36:00 +01:00
Committer: Ingo Molnar <[email protected]>
CommitterDate: Tue, 16 Jan 2024 10:41:25 +01:00

sched/fair: Fix frequency selection for non-invariant case

Linus reported a ~50% performance regression on single-threaded
workloads on his AMD Ryzen system, and bisected it to:

9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")

When frequency invariance is not enabled, get_capacity_ref_freq(policy)
is supposed to return the current frequency and the performance margin
applied by map_util_perf(), enabling the utilization to go above the
maximum compute capacity and to select a higher frequency than the current one.

After the changes in 9c0b4bb7f630, the performance margin was applied
earlier in the path to take into account utilization clampings and
we couldn't get a utilization higher than the maximum compute capacity,
and the CPU remained 'stuck' at lower frequencies.

To fix this, we must use a frequency above the current frequency to
get a chance to select a higher OPP when the current one becomes fully used.
Apply the same margin and return a frequency 25% higher than the current
one in order to switch to the next OPP before we fully use the CPU
at the current one.

[ mingo: Clarified the changelog. ]

Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
Reported-by: Linus Torvalds <[email protected]>
Bisected-by: Linus Torvalds <[email protected]>
Reported-by: Wyes Karny <[email protected]>
Signed-off-by: Vincent Guittot <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Tested-by: Wyes Karny <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
---
kernel/sched/cpufreq_schedutil.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 95c3c09..eece624 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
if (arch_scale_freq_invariant())
return policy->cpuinfo.max_freq;

- return policy->cur;
+ /*
+ * Apply a 25% margin so that we select a higher frequency than
+ * the current one before the CPU is fully busy:
+ */
+ return policy->cur + (policy->cur >> 2);
}

/**

Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On 16.01.24 10:59, Ingo Molnar wrote:
>
> * Vincent Guittot <[email protected]> wrote:
>
>> When frequency invariance is not enabled, get_capacity_ref_freq(policy)
>> returns the current frequency and the performance margin applied by
>> map_util_perf(), enabled the utilization to go above the maximum compute
>> capacity and to select a higher frequency than the current one.
>> [...]
>> Reported-by: Linus Torvalds <[email protected]>
>> Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
>> Reported-by: Wyes Karny <[email protected]>
>> Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/

Thx for resolving this everyone. Allow me a quick question:

I noticed the two Closes: tags above are missing in the actual commit:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=e37617c8e53a1f7fcba6d5e1041f4fd8a2425c27

Is there a overeager script here that removes them when it shouldn't?

Just asking, because the lack of these tags makes regression tracking
hard. And Linus really wants those tags in cases like this[1].

Ciao, Thorsten

[1] for details, see:
https://lore.kernel.org/all/CAHk-=wjMmSZzMJ3Xnskdg4+GGz=5p5p+GSYyFBTh0f-DgvdBWg@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wgs38ZrfPvy=nOwVkVzjpM3VFU1zobP37Fwd_h9iAD5JQ@mail.gmail.com/
https://lore.kernel.org/all/CAHk-=wjxzafG-=J8oT30s7upn4RhBs6TX-uVFZ5rME+L5_DoJA@mail.gmail.com/

2024-02-14 17:24:32

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

Hi John,

On Wed, 14 Feb 2024 at 18:12, Jon Hunter <[email protected]> wrote:
>
> Hi Vincent,
>
> On 14/01/2024 18:36, Vincent Guittot wrote:
> > When frequency invariance is not enabled, get_capacity_ref_freq(policy)
> > returns the current frequency and the performance margin applied by
> > map_util_perf(), enabled the utilization to go above the maximum compute
> > capacity and to select a higher frequency than the current one.
> >
> > The performance margin is now applied earlier in the path to take into
> > account some utilization clampings and we can't get an utilization higher
> > than the maximum compute capacity.
> >
> > We must use a frequency above the current frequency to get a chance to
> > select a higher OPP when the current one becomes fully used. Apply
> > the same margin and returns a frequency 25% higher than the current one in
> > order to switch to the next OPP before we fully use the cpu at the current
> > one.
> >
> > Reported-by: Linus Torvalds <[email protected]>
> > Closes: https://lore.kernel.org/lkml/CAHk-=wgWcYX2oXKtgvNN2LLDXP7kXkbo-xTfumEjmPbjSer2RQ@mail.gmail.com/
> > Reported-by: Wyes Karny <[email protected]>
> > Closes: https://lore.kernel.org/lkml/20240114091240.xzdvqk75ifgfj5yx@wyes-pc/
> > Fixes: 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor performance estimation")
> > Signed-off-by: Vincent Guittot <[email protected]>
> > Tested-by: Wyes Karny <[email protected]>
> > ---
> > kernel/sched/cpufreq_schedutil.c | 6 +++++-
> > 1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 95c3c097083e..d12e95d30e2e 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -133,7 +133,11 @@ unsigned long get_capacity_ref_freq(struct cpufreq_policy *policy)
> > if (arch_scale_freq_invariant())
> > return policy->cpuinfo.max_freq;
> >
> > - return policy->cur;
> > + /*
> > + * Apply a 25% margin so that we select a higher frequency than
> > + * the current one before the CPU is full busy
> > + */
> > + return policy->cur + (policy->cur >> 2);
> > }
> >
> > /**
>
>
> We have also observed a performance degradation on our Tegra platforms
> with v6.8-rc1. Unfortunately, the above change does not fix the problem
> for us and we are still seeing a performance issue with v6.8-rc4. For
> example, running Dhrystone on Tegra234 I am seeing the following ...
>
> Linux v6.7:
> [ 2216.301949] CPU0: Dhrystones per Second: 31976326 (18199 DMIPS)
> [ 2220.993877] CPU1: Dhrystones per Second: 49568123 (28211 DMIPS)
> [ 2225.685280] CPU2: Dhrystones per Second: 49568123 (28211 DMIPS)
> [ 2230.364423] CPU3: Dhrystones per Second: 49632220 (28248 DMIPS)
>
> Linux v6.8-rc4:
> [ 44.661686] CPU0: Dhrystones per Second: 16068483 (9145 DMIPS)
> [ 51.895107] CPU1: Dhrystones per Second: 16077457 (9150 DMIPS)
> [ 59.105410] CPU2: Dhrystones per Second: 16095436 (9160 DMIPS)
> [ 66.333297] CPU3: Dhrystones per Second: 16064000 (9142 DMIPS)
>
> If I revert this change and the following ...
>
> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> f12560779f9d ("sched/cpufreq: Rework iowait boost")
> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
>
> ... then the perf is similar to where it was ...
>
> Linux v6.8-rc4 plus reverts:
> [ 31.768189] CPU0: Dhrystones per Second: 48421678 (27559 DMIPS)
> [ 36.556838] CPU1: Dhrystones per Second: 48401324 (27547 DMIPS)
> [ 41.343343] CPU2: Dhrystones per Second: 48421678 (27559 DMIPS)
> [ 46.163347] CPU3: Dhrystones per Second: 47679814 (27137 DMIPS)
>
> All CPUs are running with the schedutil CPUFREQ governor. We have not
> looked any further but wanted to report this in case you have any more
> thoughts or suggestions for us to try.

Have you tried this :
https://lore.kernel.org/lkml/[email protected]/

It's in driver-core-linus' branch and should be sent to Linus soon

Vincent

>
> Thanks
> Jon
>
> --
> nvpublic

2024-02-14 17:25:26

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
>
> We have also observed a performance degradation on our Tegra platforms
> with v6.8-rc1. Unfortunately, the above change does not fix the problem
> for us and we are still seeing a performance issue with v6.8-rc4. For
> example, running Dhrystone on Tegra234 I am seeing the following ...
>
> Linux v6.7:
> [ 2216.301949] CPU0: Dhrystones per Second: 31976326 (18199 DMIPS)
> [ 2220.993877] CPU1: Dhrystones per Second: 49568123 (28211 DMIPS)
> [ 2225.685280] CPU2: Dhrystones per Second: 49568123 (28211 DMIPS)
> [ 2230.364423] CPU3: Dhrystones per Second: 49632220 (28248 DMIPS)
>
> Linux v6.8-rc4:
> [ 44.661686] CPU0: Dhrystones per Second: 16068483 (9145 DMIPS)
> [ 51.895107] CPU1: Dhrystones per Second: 16077457 (9150 DMIPS)
> [ 59.105410] CPU2: Dhrystones per Second: 16095436 (9160 DMIPS)
> [ 66.333297] CPU3: Dhrystones per Second: 16064000 (9142 DMIPS)
>
> If I revert this change and the following ...
>
> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> f12560779f9d ("sched/cpufreq: Rework iowait boost")
> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
>
> ... then the perf is similar to where it was ...

Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
completely buggered.

Please tell me why we shouldn't just revert things as per above?

Sure, the problem _I_ experienced is fixed, but apparently there are
others just lurking, and they are even bigger degradations than the
one I saw.

We're now at rc4, we're not releasing a 6.8 with the above kinds of
numbers. So either there's another obvious one-liner fix, or we need
to revert this whole thing.

Yes, dhrystones is a truly crappy benchmark, but partly _because_ it's
such a horribly bad benchmark it's also a very simple case. It's pure
CPU load with absolutely nothing interesting going on. Regressing on
that by a factor of three is a sign of complete failure.

Linus

2024-02-14 17:26:31

by Vincent Guittot

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On Wed, 14 Feb 2024 at 18:20, Linus Torvalds
<[email protected]> wrote:
>
> On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
> >
> > We have also observed a performance degradation on our Tegra platforms
> > with v6.8-rc1. Unfortunately, the above change does not fix the problem
> > for us and we are still seeing a performance issue with v6.8-rc4. For
> > example, running Dhrystone on Tegra234 I am seeing the following ...
> >
> > Linux v6.7:
> > [ 2216.301949] CPU0: Dhrystones per Second: 31976326 (18199 DMIPS)
> > [ 2220.993877] CPU1: Dhrystones per Second: 49568123 (28211 DMIPS)
> > [ 2225.685280] CPU2: Dhrystones per Second: 49568123 (28211 DMIPS)
> > [ 2230.364423] CPU3: Dhrystones per Second: 49632220 (28248 DMIPS)
> >
> > Linux v6.8-rc4:
> > [ 44.661686] CPU0: Dhrystones per Second: 16068483 (9145 DMIPS)
> > [ 51.895107] CPU1: Dhrystones per Second: 16077457 (9150 DMIPS)
> > [ 59.105410] CPU2: Dhrystones per Second: 16095436 (9160 DMIPS)
> > [ 66.333297] CPU3: Dhrystones per Second: 16064000 (9142 DMIPS)
> >
> > If I revert this change and the following ...
> >
> > b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> > f12560779f9d ("sched/cpufreq: Rework iowait boost")
> > 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> >
> > ... then the perf is similar to where it was ...
>
> Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
> completely buggered.
>
> Please tell me why we shouldn't just revert things as per above?
>
> Sure, the problem _I_ experienced is fixed, but apparently there are
> others just lurking, and they are even bigger degradations than the
> one I saw.
>
> We're now at rc4, we're not releasing a 6.8 with the above kinds of
> numbers. So either there's another obvious one-liner fix, or we need
> to revert this whole thing.

This should fix it:
https://lore.kernel.org/lkml/[email protected]/

>
> Yes, dhrystones is a truly crappy benchmark, but partly _because_ it's
> such a horribly bad benchmark it's also a very simple case. It's pure
> CPU load with absolutely nothing interesting going on. Regressing on
> that by a factor of three is a sign of complete failure.
>
> Linus

2024-02-14 17:59:00

by Jon Hunter

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case


On 14/02/2024 17:22, Vincent Guittot wrote:
> On Wed, 14 Feb 2024 at 18:20, Linus Torvalds
> <[email protected]> wrote:
>>
>> On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
>>>
>>> We have also observed a performance degradation on our Tegra platforms
>>> with v6.8-rc1. Unfortunately, the above change does not fix the problem
>>> for us and we are still seeing a performance issue with v6.8-rc4. For
>>> example, running Dhrystone on Tegra234 I am seeing the following ...
>>>
>>> Linux v6.7:
>>> [ 2216.301949] CPU0: Dhrystones per Second: 31976326 (18199 DMIPS)
>>> [ 2220.993877] CPU1: Dhrystones per Second: 49568123 (28211 DMIPS)
>>> [ 2225.685280] CPU2: Dhrystones per Second: 49568123 (28211 DMIPS)
>>> [ 2230.364423] CPU3: Dhrystones per Second: 49632220 (28248 DMIPS)
>>>
>>> Linux v6.8-rc4:
>>> [ 44.661686] CPU0: Dhrystones per Second: 16068483 (9145 DMIPS)
>>> [ 51.895107] CPU1: Dhrystones per Second: 16077457 (9150 DMIPS)
>>> [ 59.105410] CPU2: Dhrystones per Second: 16095436 (9160 DMIPS)
>>> [ 66.333297] CPU3: Dhrystones per Second: 16064000 (9142 DMIPS)
>>>
>>> If I revert this change and the following ...
>>>
>>> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
>>> f12560779f9d ("sched/cpufreq: Rework iowait boost")
>>> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
>>>
>>> ... then the perf is similar to where it was ...
>>
>> Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
>> completely buggered.
>>
>> Please tell me why we shouldn't just revert things as per above?
>>
>> Sure, the problem _I_ experienced is fixed, but apparently there are
>> others just lurking, and they are even bigger degradations than the
>> one I saw.
>>
>> We're now at rc4, we're not releasing a 6.8 with the above kinds of
>> numbers. So either there's another obvious one-liner fix, or we need
>> to revert this whole thing.
>
> This should fix it:
> https://lore.kernel.org/lkml/[email protected]/


Yes I can confirm that this does fix it ...

[ 29.440836] CPU0: Dhrystones per Second: 48340366 (27513 DMIPS)
[ 34.221323] CPU1: Dhrystones per Second: 48585127 (27652 DMIPS)
[ 38.988036] CPU2: Dhrystones per Second: 48667266 (27699 DMIPS)
[ 43.769430] CPU3: Dhrystones per Second: 48544161 (27629 DMIPS)

>> Yes, dhrystones is a truly crappy benchmark, but partly _because_ it's
>> such a horribly bad benchmark it's also a very simple case. It's pure
>> CPU load with absolutely nothing interesting going on. Regressing on
>> that by a factor of three is a sign of complete failure.


We have a few other more extensive tests that have been failing due to
the perf issue. We will run those with the above and if we see any more
issues I will let everyone know.

Thanks
Jon

--
nvpublic

Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

Linus, what...

On 14.02.24 18:22, Vincent Guittot wrote:
> On Wed, 14 Feb 2024 at 18:20, Linus Torvalds
> <[email protected]> wrote:
>> On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
>>> We have also observed a performance degradation on our Tegra platforms
>>> with v6.8-rc1. Unfortunately, the above change does not fix the problem
>>> for us and we are still seeing a performance issue with v6.8-rc4. For
>>> example, running Dhrystone on Tegra234 I am seeing the following ...
>>> [...]
>>> If I revert this change and the following ...
>>> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
>>> f12560779f9d ("sched/cpufreq: Rework iowait boost")
>>> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
>>> ... then the perf is similar to where it was ...
>>
>> Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
>> completely buggered.
>> [...]
> This should fix it:
> https://lore.kernel.org/lkml/[email protected]/

..do you want me to do in situations like this? I'm asking, as I see
situations like this frequently -- e.g. people reporting problems a
second, third, or fourth time while the fix is already sitting in -next
for a few days.

Want me to list them in the weekly reports so that you can cherry-pick
them from -next if you want?

Ciao, Thorsten

2024-02-15 09:14:16

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On Thu, Feb 15, 2024 at 09:45:01AM +0100, Thorsten Leemhuis wrote:
> Linus, what...
>
> On 14.02.24 18:22, Vincent Guittot wrote:
> > On Wed, 14 Feb 2024 at 18:20, Linus Torvalds
> > <[email protected]> wrote:
> >> On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
> >>> We have also observed a performance degradation on our Tegra platforms
> >>> with v6.8-rc1. Unfortunately, the above change does not fix the problem
> >>> for us and we are still seeing a performance issue with v6.8-rc4. For
> >>> example, running Dhrystone on Tegra234 I am seeing the following ...
> >>> [...]
> >>> If I revert this change and the following ...
> >>> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
> >>> f12560779f9d ("sched/cpufreq: Rework iowait boost")
> >>> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
> >>> ... then the perf is similar to where it was ...
> >>
> >> Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
> >> completely buggered.
> >> [...]
> > This should fix it:
> > https://lore.kernel.org/lkml/[email protected]/
>
> ...do you want me to do in situations like this? I'm asking, as I see
> situations like this frequently -- e.g. people reporting problems a
> second, third, or fourth time while the fix is already sitting in -next
> for a few days.
>
> Want me to list them in the weekly reports so that you can cherry-pick
> them from -next if you want?

Poke the maintainer to get off their butt and submit the pull request to
Linus (note, this is me in this case...)

I'll get it into the next -rc, sorry for the delay, other things got in
the way, my fault.

greg k-h

Subject: Re: [PATCH] sched/fair: Fix frequency selection for non invariant case

On 15.02.24 10:13, Greg KH wrote:
> On Thu, Feb 15, 2024 at 09:45:01AM +0100, Thorsten Leemhuis wrote:
>> Linus, what...
>>
>> On 14.02.24 18:22, Vincent Guittot wrote:
>>> On Wed, 14 Feb 2024 at 18:20, Linus Torvalds
>>> <[email protected]> wrote:
>>>> On Wed, 14 Feb 2024 at 09:12, Jon Hunter <[email protected]> wrote:
>>>>> We have also observed a performance degradation on our Tegra platforms
>>>>> with v6.8-rc1. Unfortunately, the above change does not fix the problem
>>>>> for us and we are still seeing a performance issue with v6.8-rc4. For
>>>>> example, running Dhrystone on Tegra234 I am seeing the following ...
>>>>> [...]
>>>>> If I revert this change and the following ...
>>>>> b3edde44e5d4 ("cpufreq/schedutil: Use a fixed reference frequency")
>>>>> f12560779f9d ("sched/cpufreq: Rework iowait boost")
>>>>> 9c0b4bb7f630 ("sched/cpufreq: Rework schedutil governor
>>>>> ... then the perf is similar to where it was ...
>>>>
>>>> Ok, guys, this whole scheduler / cpufreq rewrite seems to have been
>>>> completely buggered.
>>>> [...]
>>> This should fix it:
>>> https://lore.kernel.org/lkml/[email protected]/
>>
>> ...do you want me to do in situations like this? I'm asking, as I see
>> situations like this frequently -- e.g. people reporting problems a
>> second, third, or fourth time while the fix is already sitting in -next
>> for a few days.
>>
>> Want me to list them in the weekly reports so that you can cherry-pick
>> them from -next if you want?
>
> Poke the maintainer to get off their butt and submit the pull request to
> Linus

Well, I did that sometimes and will continue to do so. But some
maintainers then feel pestered and become annoyed by my efforts -- which
in the long-term is counter productive, as regression tracking will only
work well if maintainers and I work well together. That's why I'm a bit
careful with such things (side note: don't worry, I know that some
conflict is inevitable -- but I don't have your or Linus standing, so I
have to choose my fights carefully...).

I sometimes also got replies along the lines of "we are only at -rc2,
this can wait till -rc5 or -rc6" -- and I have no quote from Linus at
hand I can point maintainers to that says something along the lines of
"if a regression fix was in -next for at least two days, submit it to
mainline before the next -rc, unless there is a strong reason why that
particular fix needs more testing" (or whatever he actually wants).

> (note, this is me in this case...)
> I'll get it into the next -rc, sorry for the delay, other things got in
> the way, my fault.

Happens, I don't care too much about this specific event, more about the
general problem. Especially the -mm tree bugs me sometimes, as I noticed
that Andrew often lets regression fixes linger in -next for round about
a week; he furthermore sometimes sends this stuff to Linus on Mondays or
Tuesdays. Due to that the fixes often miss at least one, sometimes two
-rcs. That is especially hard to watch if the regression made it to a
stable kernel and you are waiting for the fix to get mainlined.

Ciao, Thorsten