2024-06-06 07:07:34

by Xuewen Yan

[permalink] [raw]
Subject: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

Because the effective_cpu_util() would return a util which
maybe bigger than the actual_cpu_capacity, this could cause
the pd_busy_time calculation errors.
So clamp the cpu_busy_time with the eenv->cpu_cap, which is
the actual_cpu_capacity.

Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
Signed-off-by: Xuewen Yan <[email protected]>
---
kernel/sched/fair.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 8a5b1ae0aa55..8939d725023a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
for_each_cpu(cpu, pd_cpus) {
unsigned long util = cpu_util(cpu, p, -1, 0);

- busy_time += effective_cpu_util(cpu, util, NULL, NULL);
+ util = effective_cpu_util(cpu, util, NULL, NULL);
+ util = min(eenv->cpu_cap, util);
+ busy_time += util;
}

eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
--
2.25.1



2024-06-07 07:32:25

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On 06/06/2024 09:06, Xuewen Yan wrote:
> Because the effective_cpu_util() would return a util which
> maybe bigger than the actual_cpu_capacity, this could cause
> the pd_busy_time calculation errors.

Doesn't return effective_cpu_util() either scale or min(scale, util)
with scale = arch_scale_cpu_capacity(cpu)? So the util sum over the PD
cannot exceed eenv->cpu_cap?
Looks like this was the case with 3e8c6c9aac42 already.

> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> the actual_cpu_capacity.
>
> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> Signed-off-by: Xuewen Yan <[email protected]>
> ---
> kernel/sched/fair.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..8939d725023a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> for_each_cpu(cpu, pd_cpus) {
> unsigned long util = cpu_util(cpu, p, -1, 0);
>
> - busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> + util = effective_cpu_util(cpu, util, NULL, NULL);
> + util = min(eenv->cpu_cap, util);
> + busy_time += util;
> }
>
> eenv->pd_busy_time = min(eenv->pd_cap, busy_time);


2024-06-07 08:21:09

by Xuewen Yan

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

Hi Dietmar

On Fri, Jun 7, 2024 at 3:19 PM Dietmar Eggemann
<[email protected]> wrote:
>
> On 06/06/2024 09:06, Xuewen Yan wrote:
> > Because the effective_cpu_util() would return a util which
> > maybe bigger than the actual_cpu_capacity, this could cause
> > the pd_busy_time calculation errors.
>
> Doesn't return effective_cpu_util() either scale or min(scale, util)
> with scale = arch_scale_cpu_capacity(cpu)? So the util sum over the PD
> cannot exceed eenv->cpu_cap?

In effective_cpu_util, the scale = arch_scale_cpu_capacity(cpu);
Although there is the clamp of eenv->pd_cap, but let us consider the
following simple scenario:
The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
of cpufreq-limit,
the cpu_actual_cap = 512. Then the eenv->cpu_cap = 512, the eenv->pd_cap = 2048;
effective_cpu_util(4) = 1024;
effective_cpu_util(5) = 1024;
effective_cpu_util(6) = 256;
effective_cpu_util(7) = 0;

Without this patch, the eenv->pd_busy_time = 2048, because the clamp
of eenv->pd_cap.
However, indeed, the cpu4 and cpu5's util would not exceed the actual_cap(512),
so the cpu_util4/5 = 512, instead of 1024.
And with this patch, the eenv->pd_busy_time = 512+512+256 = 1280.
And the 1280 should be more reasonable.

BR
--
xuewen

> Looks like this was the case with 3e8c6c9aac42 already.
>
> > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > the actual_cpu_capacity.
> >
> > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > Signed-off-by: Xuewen Yan <[email protected]>
> > ---
> > kernel/sched/fair.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..8939d725023a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> > for_each_cpu(cpu, pd_cpus) {
> > unsigned long util = cpu_util(cpu, p, -1, 0);
> >
> > - busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > + util = effective_cpu_util(cpu, util, NULL, NULL);
> > + util = min(eenv->cpu_cap, util);
> > + busy_time += util;
> > }
> >
> > eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
>

2024-06-07 10:30:25

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On 07/06/2024 10:20, Xuewen Yan wrote:
> Hi Dietmar
>
> On Fri, Jun 7, 2024 at 3:19 PM Dietmar Eggemann
> <[email protected]> wrote:
>>
>> On 06/06/2024 09:06, Xuewen Yan wrote:
>>> Because the effective_cpu_util() would return a util which
>>> maybe bigger than the actual_cpu_capacity, this could cause
>>> the pd_busy_time calculation errors.
>>
>> Doesn't return effective_cpu_util() either scale or min(scale, util)
>> with scale = arch_scale_cpu_capacity(cpu)? So the util sum over the PD
>> cannot exceed eenv->cpu_cap?
>
> In effective_cpu_util, the scale = arch_scale_cpu_capacity(cpu);
> Although there is the clamp of eenv->pd_cap, but let us consider the
> following simple scenario:
> The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> of cpufreq-limit,

Ah, this is due to:

find_energy_efficient_cpu()

...
for (; pd; pd = pd->next)
...
cpu_actual_cap = get_actual_cpu_capacity(cpu)

for_each_cpu(cpu, cpus)
...
eenv.pd_cap += cpu_actual_cap

and:

get_actual_cpu_capacity()

...
capacity = arch_scale_cpu_capacity(cpu)

capacity -= max(hw_load_avg(cpu_rq(cpu)), cpufreq_get_pressure(cpu))

which got introduced by f1f8d0a22422 ("sched/cpufreq: Take cpufreq
feedback into account").

[...]

2024-06-07 10:38:27

by Xuewen Yan

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On Fri, Jun 7, 2024 at 6:30 PM Dietmar Eggemann
<[email protected]> wrote:
>
> On 07/06/2024 10:20, Xuewen Yan wrote:
> > Hi Dietmar
> >
> > On Fri, Jun 7, 2024 at 3:19 PM Dietmar Eggemann
> > <[email protected]> wrote:
> >>
> >> On 06/06/2024 09:06, Xuewen Yan wrote:
> >>> Because the effective_cpu_util() would return a util which
> >>> maybe bigger than the actual_cpu_capacity, this could cause
> >>> the pd_busy_time calculation errors.
> >>
> >> Doesn't return effective_cpu_util() either scale or min(scale, util)
> >> with scale = arch_scale_cpu_capacity(cpu)? So the util sum over the PD
> >> cannot exceed eenv->cpu_cap?
> >
> > In effective_cpu_util, the scale = arch_scale_cpu_capacity(cpu);
> > Although there is the clamp of eenv->pd_cap, but let us consider the
> > following simple scenario:
> > The pd cpus are 4-7, and the arch_scale_capacity is 1024, and because
> > of cpufreq-limit,
>
> Ah, this is due to:
>
> find_energy_efficient_cpu()
>
> ...
> for (; pd; pd = pd->next)
> ...
> cpu_actual_cap = get_actual_cpu_capacity(cpu)
>
> for_each_cpu(cpu, cpus)
> ...
> eenv.pd_cap += cpu_actual_cap
>
> and:
>
> get_actual_cpu_capacity()
>
> ...
> capacity = arch_scale_cpu_capacity(cpu)
>
> capacity -= max(hw_load_avg(cpu_rq(cpu)), cpufreq_get_pressure(cpu))
>
> which got introduced by f1f8d0a22422 ("sched/cpufreq: Take cpufreq
> feedback into account").

I don't think it was introduced by f1f8d0a22422, because f1f8d0a22422
just replaced the cpu_thermal_cap with get_actual_cpu_capacity(cpu).
The eenv.cpu_cap was introduced by 3e8c6c9aac42 ("sched/fair: Remove
task_util from effective utilization in feec()").

BR
---
xuewen

>
> [...]

2024-06-09 22:55:33

by Qais Yousef

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On 06/06/24 15:06, Xuewen Yan wrote:
> Because the effective_cpu_util() would return a util which
> maybe bigger than the actual_cpu_capacity, this could cause
> the pd_busy_time calculation errors.
> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> the actual_cpu_capacity.

I actually think capping by pd_cap is something we should remove. Saturated
systems aren't calculated properly especially when uclamp_max is used.

But this might a bigger change and out of scope of what you're proposing..

Did this 'wrong' calculation cause an actual problem for task placement?
I assume the pd looked 'busier' because some CPUs were too busy.

Was the system in overutilzied state? Usually if one CPU is that busy
overutilized should be set and we'd skip EAS - unless uclamp_max was used.

>
> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> Signed-off-by: Xuewen Yan <[email protected]>
> ---
> kernel/sched/fair.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..8939d725023a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> for_each_cpu(cpu, pd_cpus) {
> unsigned long util = cpu_util(cpu, p, -1, 0);
>
> - busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> + util = effective_cpu_util(cpu, util, NULL, NULL);
> + util = min(eenv->cpu_cap, util);
> + busy_time += util;
> }
>
> eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> --
> 2.25.1
>

2024-06-11 09:32:25

by Christian Loehle

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

On 6/6/24 08:06, Xuewen Yan wrote:
> Because the effective_cpu_util() would return a util which
> maybe bigger than the actual_cpu_capacity, this could cause
> the pd_busy_time calculation errors.
> So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> the actual_cpu_capacity.
>
> Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> Signed-off-by: Xuewen Yan <[email protected]>
> ---
> kernel/sched/fair.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 8a5b1ae0aa55..8939d725023a 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> for_each_cpu(cpu, pd_cpus) {
> unsigned long util = cpu_util(cpu, p, -1, 0);
>
> - busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> + util = effective_cpu_util(cpu, util, NULL, NULL);
> + util = min(eenv->cpu_cap, util);
> + busy_time += util;
> }
>
> eenv->pd_busy_time = min(eenv->pd_cap, busy_time);

I can reproduce the issue and the fix, so
Tested-by: Christian Loehle <[email protected]>
(@Qais, this is on a non-overutilized system).
I'm unsure about the other callers of effective_cpu_util(), or rather sched_cpu_util()
in particular which includes thermal and powercap, they should be off too.
Anyway I'll try to reproduce for them too.

Kind Regards,
Christian

2024-06-12 08:11:53

by Xuewen Yan

[permalink] [raw]
Subject: Re: [PATCH] sched/fair: Prevent cpu_busy_time from exceeding actual_cpu_capacity

Hi Qais

On Mon, Jun 10, 2024 at 6:55 AM Qais Yousef <[email protected]> wrote:
>
> On 06/06/24 15:06, Xuewen Yan wrote:
> > Because the effective_cpu_util() would return a util which
> > maybe bigger than the actual_cpu_capacity, this could cause
> > the pd_busy_time calculation errors.
> > So clamp the cpu_busy_time with the eenv->cpu_cap, which is
> > the actual_cpu_capacity.
>
> I actually think capping by pd_cap is something we should remove. Saturated
> systems aren't calculated properly especially when uclamp_max is used.
>
> But this might a bigger change and out of scope of what you're proposing..

I agree, there are other things to consider before doing this.

>
> Did this 'wrong' calculation cause an actual problem for task placement?
> I assume the pd looked 'busier' because some CPUs were too busy.

This will not only affect calculations in scenarios with high temperatures.
Sometimes, users will set scalimg_max_freq to actively limit the CPU frequency,
so that even if the CPU load is large, the CPU frequency will not be very high.
At this time, even if tasks are placed on other CPUs in the same PD,
the energy increment may not be too large, thus affecting core selection.

>
> Was the system in overutilzied state? Usually if one CPU is that busy
> overutilized should be set and we'd skip EAS - unless uclamp_max was used.

As Christian said, This case occurs not only in the overutil scenario,
this scenario holds true as long as the actual-cpu-capacity caused by
the reduction in max cpu frequency is smaller than the cpu util.

Thanks!
---
xuewen
>
> >
> > Fixes: 3e8c6c9aac42 ("sched/fair: Remove task_util from effective utilization in feec()")
> > Signed-off-by: Xuewen Yan <[email protected]>
> > ---
> > kernel/sched/fair.c | 4 +++-
> > 1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 8a5b1ae0aa55..8939d725023a 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7870,7 +7870,9 @@ static inline void eenv_pd_busy_time(struct energy_env *eenv,
> > for_each_cpu(cpu, pd_cpus) {
> > unsigned long util = cpu_util(cpu, p, -1, 0);
> >
> > - busy_time += effective_cpu_util(cpu, util, NULL, NULL);
> > + util = effective_cpu_util(cpu, util, NULL, NULL);
> > + util = min(eenv->cpu_cap, util);
> > + busy_time += util;
> > }
> >
> > eenv->pd_busy_time = min(eenv->pd_cap, busy_time);
> > --
> > 2.25.1
> >