2023-09-05 16:42:03

by Pierre Gondois

Subject: Re: [PATCH v2] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

Hello Shrikanth,
I tried the patch (on a platform using the cppc_cpufreq driver). The platform
normally has EAS enabled, but with the patch applied the sched_energy_aware
sysctl ends up removed. It seems the following happens (in this order):

1. sched_energy_aware_sysctl_init()
Doesn't set sysctl_sched_energy_aware as cpufreq_freq_invariance isn't set
and arch_scale_freq_invariant() returns false

2. cpufreq_register_driver()
Sets cpufreq_freq_invariance during cpufreq initialization

3. sched_energy_set()
Is called with has_eas=0, since build_perf_domains() doesn't see the platform
as EAS-compatible (indeed, sysctl_sched_energy_aware=0 at this point).
So with sysctl_sched_energy_aware=0 and has_eas=0, the sched_energy_aware
sysctl is never registered and EAS cannot be enabled, even though it should
be possible on this platform.
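
For reference, here is the init function as it looks with the patch applied
(same code as in the hunk quoted below), with comments added by me to show
where it goes wrong on this platform:

	static int __init sched_energy_aware_sysctl_init(void)
	{
		int cpu = cpumask_first(cpu_active_mask);

		/*
		 * On this platform cpufreq_freq_invariance is only set later,
		 * in cpufreq_register_driver(), so arch_scale_freq_invariant()
		 * is still false here and we return without registering the
		 * sysctl; sysctl_sched_energy_aware stays 0.
		 */
		if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
		    !arch_scale_freq_invariant())
			return 0;

		sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
		sysctl_sched_energy_aware = 1;
		return 0;
	}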


On 9/1/23 08:52, Shrikanth Hegde wrote:
> Currently sysctl_sched_energy_aware doesn't alter the said behaviour on
> some of the architectures. IIUC its meant to either force rebuild the
> perf domains or cleanup the perf domains by echoing 1 or 0 respectively.

There is a definition of the sysctl at:
Documentation/admin-guide/sysctl/kernel.rst::sched_energy_aware

Also, a personal comment about the commit message (FWIW): I think it should
be a bit more impersonal and factual. It currently seems to describe the code
rather than the desired behaviour.

>
> perf domains are not built when there is SMT, or when there is no
> Asymmetric CPU topologies or when there is no frequency invariance.
> Since such cases EAS is not set and perf domains are not built. By
> changing the values of sysctl_sched_energy_aware, its not possible to
> force build the perf domains. Hence remove this sysctl on such platforms
> that dont support it. Some of the settings can be changed later
> such as smt_active by offlining the CPU's, In those cases if
> build_perf_domains returns true, re-enable the sysctl.
>
> Anytime, when sysctl_sched_energy_aware is changed sched_energy_update
> is set when building the perf domains. Making use of that to find out if
> the change is happening by sysctl or dynamic system change.
>
> Taking different cases:
> Case1. system while booting has EAS capability, sysctl will be set 1. Hence
> perf domains will be built if needed. On changing the sysctl to 0, since
> sched_energy_update is true, perf domains would be freed and sysctl will
> not be removed. later sysctl is changed to 1, enabling the perf domains
> rebuild again. Since sysctl is already there, it will skip register.
>
> Case2. System while booting doesn't have EAS Capability. Later after system
> change it becomes capable of EAS. sched_energy_update is false. Though
> sysctl is 0, will go ahead and try to enable eas. This is the current
> behaviour. if has_eas is true, then sysctl will be registered. After
> that any sysctl change is same as Case1.
>
> Case3. System becomes not capable of EAS due to system change. Here since
> sched_energy_update is false, build_perf_domains return has_eas as false
> due to one of the checks and Since this is dynamic change remove the sysctl.
> Any further change which enables EAS is Case2
>
> Note: This hasn't been tested on platform which supports EAS. If the
> change can be verified on that it would really help. This has been
> tested on power10 which doesn't support EAS. sysctl_sched_energy_aware
> is removed with patch.
>
> changes since v1:
> Chen Yu had pointed out that this will not destroy the perf domains on
> architectures where EAS is supported by changing the sysctl. This patch
> addresses that.
> [v1] Link: https://lore.kernel.org/lkml/[email protected]/#t
>
> Signed-off-by: Shrikanth Hegde <[email protected]>
> ---
> kernel/sched/topology.c | 45 +++++++++++++++++++++++++++++++++--------
> 1 file changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05a5bc678c08..4d16269ac21a 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -208,7 +208,8 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>
> #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> -static unsigned int sysctl_sched_energy_aware = 1;
> +static unsigned int sysctl_sched_energy_aware;
> +static struct ctl_table_header *sysctl_eas_header;

The variables around the presence/absence of EAS are:
- sched_energy_present:
EAS is up and running

- sysctl_sched_energy_aware:
The user wants to use EAS (or not). Doesn't mean EAS can run on the
platform.

- sched_energy_set/partition_sched_domains_locked's "has_eas":
Local variable. Represents whether EAS can run on the platform.

IMO it would be simpler to (un)register the sched_energy_aware sysctl
in partition_sched_domains_locked(), based on the value of "has_eas".
This would allow keeping all the logic as it is right now inside
build_perf_domains(), and only advertising the sched_energy_aware sysctl
when EAS can actually run on the platform.
sched_energy_aware_sysctl_init() could then be deleted.
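
Something along these lines (a rough, untested sketch just to illustrate the
idea; sched_energy_aware_sysctl_update() is a name I made up, and it reuses
the sysctl_eas_header/sched_energy_aware_sysctls from your patch):

	static void sched_energy_aware_sysctl_update(bool has_eas)
	{
	#ifdef CONFIG_PROC_SYSCTL
		/* Expose the sysctl iff EAS can actually run on the platform. */
		if (has_eas && !sysctl_eas_header)
			sysctl_eas_header = register_sysctl("kernel",
							    sched_energy_aware_sysctls);
		else if (!has_eas && sysctl_eas_header) {
			unregister_sysctl_table(sysctl_eas_header);
			sysctl_eas_header = NULL;
		}
	#endif
	}

called from partition_sched_domains_locked(), next to the existing
sched_energy_set(has_eas) call:

		sched_energy_aware_sysctl_update(has_eas);
		sched_energy_set(has_eas);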


> static DEFINE_MUTEX(sched_energy_mutex);
> static bool sched_energy_update;
>
> @@ -226,6 +227,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> void *buffer, size_t *lenp, loff_t *ppos)
> {
> int ret, state;
> + int prev_val = sysctl_sched_energy_aware;
>
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -233,8 +235,11 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> if (!ret && write) {
> state = static_branch_unlikely(&sched_energy_present);
> - if (state != sysctl_sched_energy_aware)
> + if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> + if (sysctl_sched_energy_aware && !state)
> + pr_warn("Attempt to build energy domains when EAS is disabled\n");
> rebuild_sched_domains_energy();
> + }
> }
>
> return ret;
> @@ -255,7 +260,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
>
> static int __init sched_energy_aware_sysctl_init(void)
> {
> - register_sysctl_init("kernel", sched_energy_aware_sysctls);
> + int cpu = cpumask_first(cpu_active_mask);
> +
> + if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> + !arch_scale_freq_invariant())
> + return 0;
> +
> + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> + sysctl_sched_energy_aware = 1;
> return 0;
> }
>
> @@ -336,10 +348,28 @@ static void sched_energy_set(bool has_eas)
> if (sched_debug())
> pr_info("%s: stopping EAS\n", __func__);
> static_branch_disable_cpuslocked(&sched_energy_present);
> +#ifdef CONFIG_PROC_SYSCTL
> + /*
> + * if the architecture supports EAS and forcefully
> + * perf domains are destroyed, there should be a sysctl
> + * to enable it later. If this was due to dynamic system
> + * change such as SMT<->NON_SMT then remove sysctl.
> + */
> + if (sysctl_eas_header && !sched_energy_update) {
> + unregister_sysctl_table(sysctl_eas_header);
> + sysctl_eas_header = NULL;
> + }
> +#endif
> + sysctl_sched_energy_aware = 0;
> } else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> if (sched_debug())
> pr_info("%s: starting EAS\n", __func__);
> static_branch_enable_cpuslocked(&sched_energy_present);
> +#ifdef CONFIG_PROC_SYSCTL
> + if (!sysctl_eas_header)
> + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> +#endif
> + sysctl_sched_energy_aware = 1;
> }
> }
>
> @@ -380,15 +410,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> struct cpufreq_policy *policy;
> struct cpufreq_governor *gov;
>
> - if (!sysctl_sched_energy_aware)
> + if (!sysctl_sched_energy_aware && sched_energy_update)
> goto free;
>
> /* EAS is enabled for asymmetric CPU capacity topologies. */
> if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> - if (sched_debug()) {
> - pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> - cpumask_pr_args(cpu_map));
> - }
> + if (sched_debug())
> + pr_info("rd %*pbl: Disabling EAS, CPUs do not have asymmetric capacities\n",
> + cpumask_pr_args(cpu_map));
> goto free;
> }
>
> --
> 2.31.1
>
>

Regards,
Pierre