2023-09-18 11:41:29

by Pierre Gondois

[permalink] [raw]
Subject: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

The Energy Aware Scheduler (EAS) relies on the schedutil governor.
When moving to/from the schedutil governor, sched domains must be
rebuilt to allow re-evaluating the enablement conditions of EAS.
This is done through sched_cpufreq_governor_change().

Having a cpufreq governor assumes having a cpufreq driver running.
Inserting/removing a cpufreq driver should trigger a re-evaluation
of EAS enablement conditions, avoiding to see EAS enabled when
removing a running cpufreq driver.

Add a sched_cpufreq_governor_change() call in cpufreq driver removal
path.

Signed-off-by: Pierre Gondois <[email protected]>
---
drivers/cpufreq/cpufreq.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 60ed89000e82..0a4979c34fd1 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1673,6 +1673,8 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
cpufreq_driver->exit(policy);
policy->freq_table = NULL;
}
+
+ sched_cpufreq_governor_change(policy, policy->governor);
}

static int cpufreq_offline(unsigned int cpu)
--
2.25.1


2023-09-18 23:06:50

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

On 18/09/2023 13:29, Pierre Gondois wrote:
> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
> When moving to/from the schedutil governor, sched domains must be
> rebuilt to allow re-evaluating the enablement conditions of EAS.
> This is done through sched_cpufreq_governor_change().
>
> Having a cpufreq governor assumes having a cpufreq driver running.
> Inserting/removing a cpufreq driver should trigger a re-evaluation
> of EAS enablement conditions, avoiding to see EAS enabled when
> removing a running cpufreq driver.
>
> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
> path.

Rebuilding SDs when inserting the driver is already covered by

cpufreq_online()
cpufreq_set_policy()
sched_cpufreq_governor_change()
if (old or new gov eq. schedutil)
schedule_work(&rebuild_sd_work)

So what's missing is only a sched_cpufreq_governor_change() call when
removing the driver, right?

>
> Signed-off-by: Pierre Gondois <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 60ed89000e82..0a4979c34fd1 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1673,6 +1673,8 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
> cpufreq_driver->exit(policy);
> policy->freq_table = NULL;
> }
> +
> + sched_cpufreq_governor_change(policy, policy->governor);
> }
>
> static int cpufreq_offline(unsigned int cpu)

2023-09-19 07:53:36

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver



On 9/19/23 01:03, Dietmar Eggemann wrote:
> On 18/09/2023 13:29, Pierre Gondois wrote:
>> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
>> When moving to/from the schedutil governor, sched domains must be
>> rebuilt to allow re-evaluating the enablement conditions of EAS.
>> This is done through sched_cpufreq_governor_change().
>>
>> Having a cpufreq governor assumes having a cpufreq driver running.
>> Inserting/removing a cpufreq driver should trigger a re-evaluation
>> of EAS enablement conditions, avoiding to see EAS enabled when
>> removing a running cpufreq driver.
>>
>> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
>> path.
>
> Rebuilding SDs when inserting the driver is already covered by
>
> cpufreq_online()
> cpufreq_set_policy()
> sched_cpufreq_governor_change()
> if (old or new gov eq. schedutil)
> schedule_work(&rebuild_sd_work)
>
> So what's missing is only a sched_cpufreq_governor_change() call when
> removing the driver, right?

Yes exact, removing a cpufreq driver (e.g. `rmmod cppc_cpufreq.ko`) goes
through:
cpufreq_remove_dev()
\-__cpufreq_offline()

so the path you mentioned is not used in this case.

Regards,
Pierre

2023-09-20 16:18:28

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver



On 9/19/23 1:19 PM, Pierre Gondois wrote:
>
>
> On 9/19/23 01:03, Dietmar Eggemann wrote:
>> On 18/09/2023 13:29, Pierre Gondois wrote:
>>> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
>>> When moving to/from the schedutil governor, sched domains must be
>>> rebuilt to allow re-evaluating the enablement conditions of EAS.
>>> This is done through sched_cpufreq_governor_change().
>>>

Hi Pierre. It looks correct to update when removing the
cpufreq governer.


>>> Having a cpufreq governor assumes having a cpufreq driver running.
>>> Inserting/removing a cpufreq driver should trigger a re-evaluation
>>> of EAS enablement conditions, avoiding to see EAS enabled when
>>> removing a running cpufreq driver.
>>>
>>> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
>>> path.
>>
>> Rebuilding SDs when inserting the driver is already covered by
>>
>>    cpufreq_online()
>>      cpufreq_set_policy()
>>        sched_cpufreq_governor_change()
>>          if (old or new gov eq. schedutil)
>>            schedule_work(&rebuild_sd_work)
>>
>> So what's missing is only a sched_cpufreq_governor_change() call when
>> removing the driver, right?
>
> Yes exact, removing a cpufreq driver (e.g. `rmmod cppc_cpufreq.ko`) goes
> through:
> cpufreq_remove_dev()
> \-__cpufreq_offline()
>
> so the path you mentioned is not used in this case.
>

One Doubt, while looking through code. Not well versed with this area.

cpuhp_cpufreq_offline is being registered with CPU hotplug. That ends up
calling cpufreq_offline. This may cause non desired issues.
1. rebuild of sched domains twice instead, once by CPU hotplug and once by this.
2. offline/online of CPU (non-SMT) may not disabling EAS.




> Regards,
> Pierre

2023-09-21 22:18:35

by Dietmar Eggemann

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

On 20/09/2023 07:29, Shrikanth Hegde wrote:
>
> On 9/19/23 1:19 PM, Pierre Gondois wrote:
>>
>> On 9/19/23 01:03, Dietmar Eggemann wrote:
>>> On 18/09/2023 13:29, Pierre Gondois wrote:

[...]

>>> Rebuilding SDs when inserting the driver is already covered by
>>>
>>>    cpufreq_online()
>>>      cpufreq_set_policy()
>>>        sched_cpufreq_governor_change()
>>>          if (old or new gov eq. schedutil)
>>>            schedule_work(&rebuild_sd_work)
>>>
>>> So what's missing is only a sched_cpufreq_governor_change() call when
>>> removing the driver, right?
>>
>> Yes exact, removing a cpufreq driver (e.g. `rmmod cppc_cpufreq.ko`) goes
>> through:
>> cpufreq_remove_dev()
>> \-__cpufreq_offline()
>>
>> so the path you mentioned is not used in this case.

But IMHO the name of sched_cpufreq_governor_change() is now misleading
when called from the driver removal path. It's not the governor which
has changed. We want to disable EAS when unloading the driver in (3):

...
if (!arch_scale_freq_invariant()) {
if (sched_debug()) {
pr_warn("rd %*pbl: Disabling EAS: frequency-invariant load tracking
not yet supported",
cpumask_pr_args(cpu_map));
}
goto free;
}
...

> One Doubt, while looking through code. Not well versed with this area.
>
> cpuhp_cpufreq_offline is being registered with CPU hotplug. That ends up
> calling cpufreq_offline. This may cause non desired issues.
> 1. rebuild of sched domains twice instead, once by CPU hotplug and once by this.

That's possible when you CPU hp out the last CPU of a policy (or Perf
Domain (PD).

Otherwise `cpuhp_cpufreq_offline -> cpufreq_offline() ->
__cpufreq_offline()` returns early in `if (!policy_is_inactive(policy))`
condition.

partition_sched_domains_locked() does:

(1) detach_destroy_domains()
(2) build_sched_domains()
(3) build_perf_domains()

But only the first workqueue event (from cpuset_hotplug_workfn or
rebuild_sd_workfn) will rebuild Sched Domains and PDs (1)-(3), the
second one will only build PDs (3) again.
That's not nice but AFAICS it is not functional incorrect.

> 2. offline/online of CPU (non-SMT) may not disabling EAS.

Can't see this issue right now. When we offline the last CPU of a PD on
a 2 PD system, EAS should be stopped since the root domain does not have
any asymmetric CPU capacities left and when we online it again, EAS
should be started (sched_energy_set()).

2023-09-28 12:55:13

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

On 18-09-23, 13:29, Pierre Gondois wrote:
> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
> When moving to/from the schedutil governor, sched domains must be
> rebuilt to allow re-evaluating the enablement conditions of EAS.
> This is done through sched_cpufreq_governor_change().
>
> Having a cpufreq governor assumes having a cpufreq driver running.
> Inserting/removing a cpufreq driver should trigger a re-evaluation
> of EAS enablement conditions, avoiding to see EAS enabled when
> removing a running cpufreq driver.
>
> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
> path.
>
> Signed-off-by: Pierre Gondois <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 60ed89000e82..0a4979c34fd1 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1673,6 +1673,8 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
> cpufreq_driver->exit(policy);
> policy->freq_table = NULL;
> }
> +
> + sched_cpufreq_governor_change(policy, policy->governor);

This is called from many paths. Consider using cpufreq_remove_dev() instead,
before the policy is freed.

> }
>
> static int cpufreq_offline(unsigned int cpu)
> --
> 2.25.1

--
viresh

2023-09-28 13:28:34

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

Hello Viresh, Dietmar,

On 9/28/23 09:18, Viresh Kumar wrote:
> On 18-09-23, 13:29, Pierre Gondois wrote:
>> The Energy Aware Scheduler (EAS) relies on the schedutil governor.
>> When moving to/from the schedutil governor, sched domains must be
>> rebuilt to allow re-evaluating the enablement conditions of EAS.
>> This is done through sched_cpufreq_governor_change().
>>
>> Having a cpufreq governor assumes having a cpufreq driver running.
>> Inserting/removing a cpufreq driver should trigger a re-evaluation
>> of EAS enablement conditions, avoiding to see EAS enabled when
>> removing a running cpufreq driver.
>>
>> Add a sched_cpufreq_governor_change() call in cpufreq driver removal
>> path.
>>
>> Signed-off-by: Pierre Gondois <[email protected]>
>> ---
>> drivers/cpufreq/cpufreq.c | 2 ++
>> 1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 60ed89000e82..0a4979c34fd1 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1673,6 +1673,8 @@ static void __cpufreq_offline(unsigned int cpu, struct cpufreq_policy *policy)
>> cpufreq_driver->exit(policy);
>> policy->freq_table = NULL;
>> }
>> +
>> + sched_cpufreq_governor_change(policy, policy->governor);
>
> This is called from many paths. Consider using cpufreq_remove_dev() instead,
> before the policy is freed.

Another solution would be to call sched_cpufreq_governor_change()
from cpufreq_schedutil's init()/exit() callbacks. This would make
more sense as EAS/schedutil cpufreq are tightly bound, and it would
allow to cover all the possible paths.

When tried locally, it seems to cover all scenarios:
- insmod/rmmod a cpufreq driver
- changing the governor policy
- offlining all the CPUs of a pd

Dietmar:
The function would be static to cpufreq_schedutil, and the name can
be changed at the same time to (unless you think of a better name ?):
static void sugov_check_eas_state(void);

Regards,
Pierre

2023-09-29 04:48:58

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver

On 28-09-23, 14:49, Pierre Gondois wrote:
> Another solution would be to call sched_cpufreq_governor_change()
> from cpufreq_schedutil's init()/exit() callbacks. This would make
> more sense as EAS/schedutil cpufreq are tightly bound, and it would
> allow to cover all the possible paths.
>
> When tried locally, it seems to cover all scenarios:
> - insmod/rmmod a cpufreq driver
> - changing the governor policy
> - offlining all the CPUs of a pd

Right now it is done for all governors. We don't need that ? We just care about
schedutil here ?

--
viresh

2023-09-29 07:23:04

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Rebuild sched-domains when removing cpufreq driver



On 9/29/23 05:27, Viresh Kumar wrote:
> On 28-09-23, 14:49, Pierre Gondois wrote:
>> Another solution would be to call sched_cpufreq_governor_change()
>> from cpufreq_schedutil's init()/exit() callbacks. This would make
>> more sense as EAS/schedutil cpufreq are tightly bound, and it would
>> allow to cover all the possible paths.
>>
>> When tried locally, it seems to cover all scenarios:
>> - insmod/rmmod a cpufreq driver
>> - changing the governor policy
>> - offlining all the CPUs of a pd
>
> Right now it is done for all governors. We don't need that ? We just care about
> schedutil here ?
>

Yes exact, EAS requires the schedutil governor to be used by all policies. The purpose
of this function is to re-evaluate if EAS enablement conditions are met. I.e., if one
policy changed from/to schedutil, if EAS must be [de|re]activated.