2023-09-14 04:25:46

by Shrikanth Hegde

[permalink] [raw]
Subject: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

sysctl_sched_energy_aware is available for the admin to disable/enable
energy aware scheduling(EAS). EAS is enabled only if few conditions are
met by the platform. They are, asymmetric CPU capacity, no SMT,
valid cpufreq policy, frequency invariant load tracking. It is possible
platform when booting may not have EAS capability, but can do that after.
For example, changing/registering the cpufreq policy.

At present, though platform doesn't support EAS, this sysctl is still
present and it ends up calling rebuild of sched domain on write to 1 and
NOP when writing to 0. That is confusing and un-necessary.

Desired behaviour can be, have the sysctl only when the platform can do
EAS. i.e when platform becomes capable enable the sysctl and when it can't
remove the sysctl. On Supported platform using this sysctl, admin should be
able to enable/disable EAS.

Update the sysctl guide as well.

Different Scenarios:
Scenario 1: System while booting has EAS capability.
Expected: sysctl will be present and admin can enable/disable EAS by writing
1 or 0 respectively. This operation shouldn't remove the sysctl specially when
disabling as sysctl would be needed to enable it later.
Scenario 2: System becomes capable of EAS later
Expected: At boot, sysctl will not be present. Once eas is enabled by passing
all the checks, perf domains will be built and sysctl will be enabled. Any
further change to sysctl should be treated same as Scenario 1.
Scenario 3: system becomes not capable of EAS.
Expected: Since EAS is going to be disabled now, remove the sysctl in this
scenario. If it becomes capable of EAS later again, that would be Scenario 2.

v2->v3:
Chen Yu and Pierre Gondois both pointed out that if platform becomes
capable of EAS later, this patch was not allowing that to happen.
Addressed that by using a variable to indicate the sysctl change
and re-worded the commit message with desired behaviour,
v1->v2:
Chen Yu had pointed out that this will not destroy the perf domains on
architectures where EAS is supported by changing the sysctl.
[v1] Link: https://lore.kernel.org/lkml/[email protected]/
[v2] Link: https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Shrikanth Hegde <[email protected]>
---
Documentation/admin-guide/sysctl/kernel.rst | 3 +-
kernel/sched/topology.c | 49 +++++++++++++++++----
2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
index 3800fab1619b..455e12f1331b 100644
--- a/Documentation/admin-guide/sysctl/kernel.rst
+++ b/Documentation/admin-guide/sysctl/kernel.rst
@@ -1134,7 +1134,8 @@ automatically on platforms where it can run (that is,
platforms with asymmetric CPU topologies and having an Energy
Model available). If your platform happens to meet the
requirements for EAS but you do not want to use it, change
-this value to 0.
+this value to 0. If platform doesn't support EAS at this moment,
+this would be removed.

task_delayacct
===============
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a5bc678c08..57df938d5ec0 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -208,9 +208,11 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)

#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
DEFINE_STATIC_KEY_FALSE(sched_energy_present);
-static unsigned int sysctl_sched_energy_aware = 1;
+static unsigned int sysctl_sched_energy_aware;
+static struct ctl_table_header *sysctl_eas_header;
static DEFINE_MUTEX(sched_energy_mutex);
static bool sched_energy_update;
+static bool is_sysctl_eas_changing;

void rebuild_sched_domains_energy(void)
{
@@ -226,6 +228,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
int ret, state;
+ int prev_val = sysctl_sched_energy_aware;

if (write && !capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -233,8 +236,13 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) {
state = static_branch_unlikely(&sched_energy_present);
- if (state != sysctl_sched_energy_aware)
+ if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
+ is_sysctl_eas_changing = true;
+ if (sysctl_sched_energy_aware && !state)
+ pr_warn("Attempt to build energy domains when EAS is disabled\n");
rebuild_sched_domains_energy();
+ is_sysctl_eas_changing = false;
+ }
}

return ret;
@@ -255,7 +263,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {

static int __init sched_energy_aware_sysctl_init(void)
{
- register_sysctl_init("kernel", sched_energy_aware_sysctls);
+ int cpu = cpumask_first(cpu_active_mask);
+
+ if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
+ !arch_scale_freq_invariant())
+ return 0;
+
+ sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+ sysctl_sched_energy_aware = 1;
return 0;
}

@@ -336,10 +351,29 @@ static void sched_energy_set(bool has_eas)
if (sched_debug())
pr_info("%s: stopping EAS\n", __func__);
static_branch_disable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+ /*
+ * if the architecture supports EAS and forcefully
+ * perf domains are destroyed, there should be a sysctl
+ * to enable it later. If this was due to dynamic system
+ * change such as SMT<->NON_SMT then remove sysctl.
+ */
+ if (sysctl_eas_header && !is_sysctl_eas_changing) {
+ unregister_sysctl_table(sysctl_eas_header);
+ sysctl_eas_header = NULL;
+ sysctl_sched_energy_aware = 0;
+ }
+#endif
} else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
if (sched_debug())
pr_info("%s: starting EAS\n", __func__);
static_branch_enable_cpuslocked(&sched_energy_present);
+#ifdef CONFIG_PROC_SYSCTL
+ if (!sysctl_eas_header) {
+ sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
+ sysctl_sched_energy_aware = 1;
+ }
+#endif
}
}

@@ -380,15 +414,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
struct cpufreq_policy *policy;
struct cpufreq_governor *gov;

- if (!sysctl_sched_energy_aware)
+ if (!sysctl_sched_energy_aware && is_sysctl_eas_changing)
goto free;

/* EAS is enabled for asymmetric CPU capacity topologies. */
if (!per_cpu(sd_asym_cpucapacity, cpu)) {
- if (sched_debug()) {
- pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
- cpumask_pr_args(cpu_map));
- }
+ if (sched_debug())
+ pr_info("rd %*pbl: Disabling EAS, CPUs do not have asymmetric capacities\n",
+ cpumask_pr_args(cpu_map));
goto free;
}

--
2.31.1


2023-09-14 21:52:02

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

On 13/09/23 17:18, Shrikanth Hegde wrote:
> sysctl_sched_energy_aware is available for the admin to disable/enable
> energy aware scheduling(EAS). EAS is enabled only if few conditions are
> met by the platform. They are, asymmetric CPU capacity, no SMT,
> valid cpufreq policy, frequency invariant load tracking. It is possible
> platform when booting may not have EAS capability, but can do that after.
> For example, changing/registering the cpufreq policy.
>
> At present, though platform doesn't support EAS, this sysctl is still
> present and it ends up calling rebuild of sched domain on write to 1 and
> NOP when writing to 0. That is confusing and un-necessary.
>

But why would you write to it in the first place? Or do you mean to use
this as an indicator for userspace that EAS is supported?

2023-09-14 23:03:05

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/14/23 5:31 AM, kernel test robot wrote:
> Hi Shrikanth,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on tip/sched/core]
> [also build test WARNING on linus/master v6.6-rc1 next-20230913]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url: https://github.com/intel-lab-lkp/linux/commits/Shrikanth-Hegde/sched-topology-remove-sysctl_sched_energy_aware-depending-on-the-architecture/20230913-195055
> base: tip/sched/core
> patch link: https://lore.kernel.org/r/20230913114807.665094-1-sshegde%40linux.vnet.ibm.com
> patch subject: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
> config: x86_64-buildonly-randconfig-003-20230914 (https://download.01.org/0day-ci/archive/20230914/[email protected]/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/[email protected]/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <[email protected]>
> | Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/
>
> All warnings (new ones prefixed by >>):
>
> In file included from kernel/sched/build_utility.c:89:
>>> kernel/sched/topology.c:212:33: warning: 'sysctl_eas_header' defined but not used [-Wunused-variable]
> 212 | static struct ctl_table_header *sysctl_eas_header;
> | ^~~~~~~~~~~~~~~~~
>

Hi.

Thanks for pointing it out. This could be done by setting CONFIG_PROC_SYSCTL=n
have refactored the code and it likely make it simpler. will send out v4.


>
> vim +/sysctl_eas_header +212 kernel/sched/topology.c
>
> 208
> 209 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> 210 DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> 211 static unsigned int sysctl_sched_energy_aware;
> > 212 static struct ctl_table_header *sysctl_eas_header;
> 213 static DEFINE_MUTEX(sched_energy_mutex);
> 214 static bool sched_energy_update;
> 215 static bool is_sysctl_eas_changing;
> 216
>

2023-09-15 00:21:29

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/14/23 9:51 PM, Valentin Schneider wrote:
> On 13/09/23 17:18, Shrikanth Hegde wrote:
>> sysctl_sched_energy_aware is available for the admin to disable/enable
>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>> valid cpufreq policy, frequency invariant load tracking. It is possible
>> platform when booting may not have EAS capability, but can do that after.
>> For example, changing/registering the cpufreq policy.
>>
>> At present, though platform doesn't support EAS, this sysctl is still
>> present and it ends up calling rebuild of sched domain on write to 1 and
>> NOP when writing to 0. That is confusing and un-necessary.
>>
>

Hi Valentin, Thanks for taking a look at this patch.

> But why would you write to it in the first place? Or do you mean to use
> this as an indicator for userspace that EAS is supported?
>

Since this sysctl is present and its value being 1, it gives the
impression to the user that EAS is supported when it is not.
So its an attempt to correct that part.

So, yes, this can be an indicator whether platform can do EAS at
the moment or platform can do EAS and admin has explicitly disabled it.

2023-09-15 08:06:07

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

Hi Shrikanth,

kernel test robot noticed the following build warnings:

[auto build test WARNING on tip/sched/core]
[also build test WARNING on linus/master v6.6-rc1 next-20230913]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/Shrikanth-Hegde/sched-topology-remove-sysctl_sched_energy_aware-depending-on-the-architecture/20230913-195055
base: tip/sched/core
patch link: https://lore.kernel.org/r/20230913114807.665094-1-sshegde%40linux.vnet.ibm.com
patch subject: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture
config: x86_64-buildonly-randconfig-003-20230914 (https://download.01.org/0day-ci/archive/20230914/[email protected]/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20230914/[email protected]/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/

All warnings (new ones prefixed by >>):

In file included from kernel/sched/build_utility.c:89:
>> kernel/sched/topology.c:212:33: warning: 'sysctl_eas_header' defined but not used [-Wunused-variable]
212 | static struct ctl_table_header *sysctl_eas_header;
| ^~~~~~~~~~~~~~~~~


vim +/sysctl_eas_header +212 kernel/sched/topology.c

208
209 #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
210 DEFINE_STATIC_KEY_FALSE(sched_energy_present);
211 static unsigned int sysctl_sched_energy_aware;
> 212 static struct ctl_table_header *sysctl_eas_header;
213 static DEFINE_MUTEX(sched_energy_mutex);
214 static bool sched_energy_update;
215 static bool is_sysctl_eas_changing;
216

--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

2023-09-15 13:50:06

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

Hello Valentin,

On 9/15/23 14:00, Valentin Schneider wrote:
> On 14/09/23 23:26, Shrikanth Hegde wrote:
>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>> platform when booting may not have EAS capability, but can do that after.
>>>> For example, changing/registering the cpufreq policy.
>>>>
>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>
>>>
>>
>> Hi Valentin, Thanks for taking a look at this patch.
>>
>>> But why would you write to it in the first place? Or do you mean to use
>>> this as an indicator for userspace that EAS is supported?
>>>
>>
>> Since this sysctl is present and its value being 1, it gives the
>> impression to the user that EAS is supported when it is not.
>> So its an attempt to correct that part.
>>
>
> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
> supported? And on top of it, prevent all writes when EAS isn't supported
> (perf domains cannot be built, so there would be no point in forcing a
> rebuild that will do nothing).

I think the issue comes from the fact there is no variable representing
whether EAS is supported or not. sched_energy_enabled()/sched_energy_present
tells whether EAS is actively running on the system instead.

So on a system with EAS running, I think what would happen is:
# Disable EAS and set sched_energy_present=0
echo 0 > /proc/sys/kernel/sched_energy_aware

# sched_energy_present==0, so we get -EOPNOTSUPP
echo 1 > /proc/sys/kernel/sched_energy_aware


>
> I can never remember how to properly use the sysctl API, so that's a very
> crude implementation, but something like so?
>
> ---
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05a5bc678c089..dadfc5afc4121 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (!sched_energy_enabled()) {
> + if (write)
> + return -EOPNOTSUPP;
> + else {
> + size_t len;
> +
> + if (*ppos) {
> + *lenp = 0;
> + return 0;
> + }
> +
> + len = snprintf((char *)buffer, 3, "0\n");
> +
> + *lenp = len;
> + *ppos += len;
> + return 0;
> + }
> + }
> +
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> if (!ret && write) {
> - state = static_branch_unlikely(&sched_energy_present);
> + state = sched_energy_enabled();
> if (state != sysctl_sched_energy_aware)
> rebuild_sched_domains_energy();
> }
>

Regards,
Pierre

2023-09-15 22:16:49

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

On 14/09/23 23:26, Shrikanth Hegde wrote:
> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>> platform when booting may not have EAS capability, but can do that after.
>>> For example, changing/registering the cpufreq policy.
>>>
>>> At present, though platform doesn't support EAS, this sysctl is still
>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>> NOP when writing to 0. That is confusing and un-necessary.
>>>
>>
>
> Hi Valentin, Thanks for taking a look at this patch.
>
>> But why would you write to it in the first place? Or do you mean to use
>> this as an indicator for userspace that EAS is supported?
>>
>
> Since this sysctl is present and its value being 1, it gives the
> impression to the user that EAS is supported when it is not.
> So its an attempt to correct that part.
>

Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
supported? And on top of it, prevent all writes when EAS isn't supported
(perf domains cannot be built, so there would be no point in forcing a
rebuild that will do nothing).

I can never remember how to properly use the sysctl API, so that's a very
crude implementation, but something like so?

---

diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 05a5bc678c089..dadfc5afc4121 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
if (write && !capable(CAP_SYS_ADMIN))
return -EPERM;

+ if (!sched_energy_enabled()) {
+ if (write)
+ return -EOPNOTSUPP;
+ else {
+ size_t len;
+
+ if (*ppos) {
+ *lenp = 0;
+ return 0;
+ }
+
+ len = snprintf((char *)buffer, 3, "0\n");
+
+ *lenp = len;
+ *ppos += len;
+ return 0;
+ }
+ }
+
ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
if (!ret && write) {
- state = static_branch_unlikely(&sched_energy_present);
+ state = sched_energy_enabled();
if (state != sysctl_sched_energy_aware)
rebuild_sched_domains_energy();
}

2023-09-16 00:33:23

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/15/23 5:30 PM, Valentin Schneider wrote:
> On 14/09/23 23:26, Shrikanth Hegde wrote:
>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>> platform when booting may not have EAS capability, but can do that after.
>>>> For example, changing/registering the cpufreq policy.
>>>>
>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>
>>>
>>
>> Hi Valentin, Thanks for taking a look at this patch.
>>
>>> But why would you write to it in the first place? Or do you mean to use
>>> this as an indicator for userspace that EAS is supported?
>>>
>>
>> Since this sysctl is present and its value being 1, it gives the
>> impression to the user that EAS is supported when it is not.
>> So its an attempt to correct that part.
>>
>
> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
> supported? And on top of it, prevent all writes when EAS isn't supported
> (perf domains cannot be built, so there would be no point in forcing a
> rebuild that will do nothing).

Yes. That's another way. Thats what I had as possible approach in
https://lore.kernel.org/lkml/[email protected]/



>
> I can never remember how to properly use the sysctl API, so that's a very
> crude implementation, but something like so?
>
> ---
>
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05a5bc678c089..dadfc5afc4121 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (!sched_energy_enabled()) {

Use of sched_energy_enabled won't work as Pierre has indicated.

Instead this can be done by adding those checks in a helper function to
do similar checks as done build_perf_domains.

I can send v4 with this approach if it makes more sense. Please let me know.

> + if (write)
> + return -EOPNOTSUPP;
> + else {
> + size_t len;
> +
> + if (*ppos) {
> + *lenp = 0;
> + return 0;
> + }
> +
> + len = snprintf((char *)buffer, 3, "0\n");
> +
> + *lenp = len;
> + *ppos += len;
> + return 0;
> + }
> + }
> +
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> if (!ret && write) {
> - state = static_branch_unlikely(&sched_energy_present);
> + state = sched_energy_enabled();
> if (state != sysctl_sched_energy_aware)
> rebuild_sched_domains_energy();
> }
>

2023-09-18 12:34:59

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

Hello Shrikanth,

On 9/13/23 13:48, Shrikanth Hegde wrote:
> sysctl_sched_energy_aware is available for the admin to disable/enable
> energy aware scheduling(EAS). EAS is enabled only if few conditions are
> met by the platform. They are, asymmetric CPU capacity, no SMT,
> valid cpufreq policy, frequency invariant load tracking. It is possible
> platform when booting may not have EAS capability, but can do that after.

I think:
"A platform may not boot without EAS capability, but could gain such capability
at runtime (and vice versa)."

> For example, changing/registering the cpufreq policy.
>
> At present, though platform doesn't support EAS, this sysctl is still
> present and it ends up calling rebuild of sched domain on write to 1 and
> NOP when writing to 0. That is confusing and un-necessary.

Maybe:
"Platforms without EAS capability still have this sysctl. Its effects
are unnecessary on such platforms (rebuilding sched-domains) and its presence
can be confusing."

>
> Desired behaviour can be, have the sysctl only when the platform can do
> EAS. i.e when platform becomes capable enable the sysctl and when it can't
> remove the sysctl. On Supported platform using this sysctl, admin should be
> able to enable/disable EAS.

Maybe just:
"Dynamically hide/show sysctl_sched_energy_aware by re-evaluating EAS capability
conditions."

>
> Update the sysctl guide as well.
>
> Different Scenarios:
> Scenario 1: System while booting has EAS capability.
> Expected: sysctl will be present and admin can enable/disable EAS by writing
> 1 or 0 respectively. This operation shouldn't remove the sysctl specially when
> disabling as sysctl would be needed to enable it later.
> Scenario 2: System becomes capable of EAS later
> Expected: At boot, sysctl will not be present. Once eas is enabled by passing
> all the checks, perf domains will be built and sysctl will be enabled. Any
> further change to sysctl should be treated same as Scenario 1.
> Scenario 3: system becomes not capable of EAS.
> Expected: Since EAS is going to be disabled now, remove the sysctl in this
> scenario. If it becomes capable of EAS later again, that would be Scenario 2.

I don't think detailing all the possible scenarios above is necessary here.

>
> v2->v3:
> Chen Yu and Pierre Gondois both pointed out that if platform becomes
> capable of EAS later, this patch was not allowing that to happen.
> Addressed that by using a variable to indicate the sysctl change
> and re-worded the commit message with desired behaviour,
> v1->v2:
> Chen Yu had pointed out that this will not destroy the perf domains on
> architectures where EAS is supported by changing the sysctl.
> [v1] Link: https://lore.kernel.org/lkml/[email protected]/
> [v2] Link: https://lore.kernel.org/lkml/[email protected]/
>
> Signed-off-by: Shrikanth Hegde <[email protected]>
> ---
> Documentation/admin-guide/sysctl/kernel.rst | 3 +-
> kernel/sched/topology.c | 49 +++++++++++++++++----
> 2 files changed, 43 insertions(+), 9 deletions(-)
>
> diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> index 3800fab1619b..455e12f1331b 100644
> --- a/Documentation/admin-guide/sysctl/kernel.rst
> +++ b/Documentation/admin-guide/sysctl/kernel.rst
> @@ -1134,7 +1134,8 @@ automatically on platforms where it can run (that is,
> platforms with asymmetric CPU topologies and having an Energy
> Model available). If your platform happens to meet the
> requirements for EAS but you do not want to use it, change
> -this value to 0.
> +this value to 0. If platform doesn't support EAS at this moment,
> +this would be removed.

Maybe:
"This file is only advertised if your platform meets EAS requirements."

(feel free to deny the rewording suggestions)

>
> task_delayacct
> ===============
> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05a5bc678c08..57df938d5ec0 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -208,9 +208,11 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
>
> #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> -static unsigned int sysctl_sched_energy_aware = 1;
> +static unsigned int sysctl_sched_energy_aware;
> +static struct ctl_table_header *sysctl_eas_header;
> static DEFINE_MUTEX(sched_energy_mutex);
> static bool sched_energy_update;
> +static bool is_sysctl_eas_changing;
>
> void rebuild_sched_domains_energy(void)
> {
> @@ -226,6 +228,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> void *buffer, size_t *lenp, loff_t *ppos)
> {
> int ret, state;
> + int prev_val = sysctl_sched_energy_aware;
>
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
> @@ -233,8 +236,13 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> if (!ret && write) {
> state = static_branch_unlikely(&sched_energy_present);
> - if (state != sysctl_sched_energy_aware)
> + if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> + is_sysctl_eas_changing = true;
> + if (sysctl_sched_energy_aware && !state)
> + pr_warn("Attempt to build energy domains when EAS is disabled\n");
> rebuild_sched_domains_energy();
> + is_sysctl_eas_changing = false;
> + }
> }
>
> return ret;
> @@ -255,7 +263,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
>
> static int __init sched_energy_aware_sysctl_init(void)
> {
> - register_sysctl_init("kernel", sched_energy_aware_sysctls);
> + int cpu = cpumask_first(cpu_active_mask);
> +
> + if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> + !arch_scale_freq_invariant())
> + return 0;
> +
> + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> + sysctl_sched_energy_aware = 1;
> return 0;
> }
>
> @@ -336,10 +351,29 @@ static void sched_energy_set(bool has_eas)
> if (sched_debug())
> pr_info("%s: stopping EAS\n", __func__);
> static_branch_disable_cpuslocked(&sched_energy_present);
> +#ifdef CONFIG_PROC_SYSCTL
> + /*
> + * if the architecture supports EAS and forcefully
> + * perf domains are destroyed, there should be a sysctl
> + * to enable it later. If this was due to dynamic system
> + * change such as SMT<->NON_SMT then remove sysctl.
> + */
> + if (sysctl_eas_header && !is_sysctl_eas_changing) {
> + unregister_sysctl_table(sysctl_eas_header);
> + sysctl_eas_header = NULL;
> + sysctl_sched_energy_aware = 0;
> + }
> +#endif
> } else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> if (sched_debug())
> pr_info("%s: starting EAS\n", __func__);
> static_branch_enable_cpuslocked(&sched_energy_present);
> +#ifdef CONFIG_PROC_SYSCTL
> + if (!sysctl_eas_header) {
> + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> + sysctl_sched_energy_aware = 1;
> + }
> +#endif

With a kernel which doesn't have a running cpufreq driver, the following scenario fails
for me:
# insmod [cpufreq driver].ko
# cat /proc/sys/kernel/sched_energy_aware
1
# echo 0 > /proc/sys/kernel/sched_energy_aware
# rmmod cpufreq driver].ko
# cat /proc/sys/kernel/sched_energy_aware
0

sched_energy_aware should have been removed at that point. In sched_energy_set(),
sysctl_eas_header sysctl should be unregistered if we go from the state where:
- EAS is disabled, but possible
to the state:
- EAS is disabled, but not possible

I think the following logic should somehow be extracted in a separate function,
named sched_energy_aware_possible() for instance (or other as you appreciate).
The logic should be checked to register/unregister the sysctl.
---
if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
!arch_scale_freq_invariant())
---

Also it seemed there was a miss in rebuilding the sched domains when
a cpufreq driver is removed, but the issue described above is still appearing
with the following patch applied:
https://lore.kernel.org/all/[email protected]/

Regards,
Pierre

> }
> }
>
> @@ -380,15 +414,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> struct cpufreq_policy *policy;
> struct cpufreq_governor *gov;
>
> - if (!sysctl_sched_energy_aware)
> + if (!sysctl_sched_energy_aware && is_sysctl_eas_changing)
> goto free;
>
> /* EAS is enabled for asymmetric CPU capacity topologies. */
> if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> - if (sched_debug()) {
> - pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> - cpumask_pr_args(cpu_map));
> - }
> + if (sched_debug())
> + pr_info("rd %*pbl: Disabling EAS, CPUs do not have asymmetric capacities\n",
> + cpumask_pr_args(cpu_map));
> goto free;
> }
>
> --
> 2.31.1
>

2023-09-18 13:02:02

by Phil Auld

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

On Mon, Sep 18, 2023 at 02:00:11PM +0200 Pierre Gondois wrote:
> Hello Shrikanth,
>
> On 9/13/23 13:48, Shrikanth Hegde wrote:
> > sysctl_sched_energy_aware is available for the admin to disable/enable
> > energy aware scheduling(EAS). EAS is enabled only if few conditions are
> > met by the platform. They are, asymmetric CPU capacity, no SMT,
> > valid cpufreq policy, frequency invariant load tracking. It is possible
> > platform when booting may not have EAS capability, but can do that after.
>
> I think:
> "A platform may not boot without EAS capability, but could gain such capability
> at runtime (and vice versa)."
>

s/not//

Otherwise it's a double negative and the sentence doesn't mean anything. Also
it sounds like a failure to boot, which is not the intent I think.


Cheers,
Phil

> > For example, changing/registering the cpufreq policy.
> >
> > At present, though platform doesn't support EAS, this sysctl is still
> > present and it ends up calling rebuild of sched domain on write to 1 and
> > NOP when writing to 0. That is confusing and un-necessary.
>
> Maybe:
> "Platforms without EAS capability still have this sysctl. Its effects
> are unnecessary on such platforms (rebuilding sched-domains) and its presence
> can be confusing."
>
> >
> > Desired behaviour can be, have the sysctl only when the platform can do
> > EAS. i.e when platform becomes capable enable the sysctl and when it can't
> > remove the sysctl. On Supported platform using this sysctl, admin should be
> > able to enable/disable EAS.
>
> Maybe just:
> "Dynamically hide/show sysctl_sched_energy_aware by re-evaluating EAS capability
> conditions."
>
> >
> > Update the sysctl guide as well.
> >
> > Different Scenarios:
> > Scenario 1: System while booting has EAS capability.
> > Expected: sysctl will be present and admin can enable/disable EAS by writing
> > 1 or 0 respectively. This operation shouldn't remove the sysctl specially when
> > disabling as sysctl would be needed to enable it later.
> > Scenario 2: System becomes capable of EAS later
> > Expected: At boot, sysctl will not be present. Once eas is enabled by passing
> > all the checks, perf domains will be built and sysctl will be enabled. Any
> > further change to sysctl should be treated same as Scenario 1.
> > Scenario 3: system becomes not capable of EAS.
> > Expected: Since EAS is going to be disabled now, remove the sysctl in this
> > scenario. If it becomes capable of EAS later again, that would be Scenario 2.
>
> I don't think detailing all the possible scenarios above is necessary here.
>
> >
> > v2->v3:
> > Chen Yu and Pierre Gondois both pointed out that if platform becomes
> > capable of EAS later, this patch was not allowing that to happen.
> > Addressed that by using a variable to indicate the sysctl change
> > and re-worded the commit message with desired behaviour,
> > v1->v2:
> > Chen Yu had pointed out that this will not destroy the perf domains on
> > architectures where EAS is supported by changing the sysctl.
> > [v1] Link: https://lore.kernel.org/lkml/[email protected]/
> > [v2] Link: https://lore.kernel.org/lkml/[email protected]/
> >
> > Signed-off-by: Shrikanth Hegde <[email protected]>
> > ---
> > Documentation/admin-guide/sysctl/kernel.rst | 3 +-
> > kernel/sched/topology.c | 49 +++++++++++++++++----
> > 2 files changed, 43 insertions(+), 9 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst
> > index 3800fab1619b..455e12f1331b 100644
> > --- a/Documentation/admin-guide/sysctl/kernel.rst
> > +++ b/Documentation/admin-guide/sysctl/kernel.rst
> > @@ -1134,7 +1134,8 @@ automatically on platforms where it can run (that is,
> > platforms with asymmetric CPU topologies and having an Energy
> > Model available). If your platform happens to meet the
> > requirements for EAS but you do not want to use it, change
> > -this value to 0.
> > +this value to 0. If platform doesn't support EAS at this moment,
> > +this would be removed.
>
> Maybe:
> "This file is only advertised if your platform meets EAS requirements."
>
> (feel free to deny the rewording suggestions)
>
> >
> > task_delayacct
> > ===============
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 05a5bc678c08..57df938d5ec0 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -208,9 +208,11 @@ sd_parent_degenerate(struct sched_domain *sd, struct sched_domain *parent)
> >
> > #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL)
> > DEFINE_STATIC_KEY_FALSE(sched_energy_present);
> > -static unsigned int sysctl_sched_energy_aware = 1;
> > +static unsigned int sysctl_sched_energy_aware;
> > +static struct ctl_table_header *sysctl_eas_header;
> > static DEFINE_MUTEX(sched_energy_mutex);
> > static bool sched_energy_update;
> > +static bool is_sysctl_eas_changing;
> >
> > void rebuild_sched_domains_energy(void)
> > {
> > @@ -226,6 +228,7 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> > void *buffer, size_t *lenp, loff_t *ppos)
> > {
> > int ret, state;
> > + int prev_val = sysctl_sched_energy_aware;
> >
> > if (write && !capable(CAP_SYS_ADMIN))
> > return -EPERM;
> > @@ -233,8 +236,13 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> > ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> > if (!ret && write) {
> > state = static_branch_unlikely(&sched_energy_present);
> > - if (state != sysctl_sched_energy_aware)
> > + if (state != sysctl_sched_energy_aware && prev_val != sysctl_sched_energy_aware) {
> > + is_sysctl_eas_changing = true;
> > + if (sysctl_sched_energy_aware && !state)
> > + pr_warn("Attempt to build energy domains when EAS is disabled\n");
> > rebuild_sched_domains_energy();
> > + is_sysctl_eas_changing = false;
> > + }
> > }
> >
> > return ret;
> > @@ -255,7 +263,14 @@ static struct ctl_table sched_energy_aware_sysctls[] = {
> >
> > static int __init sched_energy_aware_sysctl_init(void)
> > {
> > - register_sysctl_init("kernel", sched_energy_aware_sysctls);
> > + int cpu = cpumask_first(cpu_active_mask);
> > +
> > + if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> > + !arch_scale_freq_invariant())
> > + return 0;
> > +
> > + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > + sysctl_sched_energy_aware = 1;
> > return 0;
> > }
> >
> > @@ -336,10 +351,29 @@ static void sched_energy_set(bool has_eas)
> > if (sched_debug())
> > pr_info("%s: stopping EAS\n", __func__);
> > static_branch_disable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > + /*
> > + * if the architecture supports EAS and forcefully
> > + * perf domains are destroyed, there should be a sysctl
> > + * to enable it later. If this was due to dynamic system
> > + * change such as SMT<->NON_SMT then remove sysctl.
> > + */
> > + if (sysctl_eas_header && !is_sysctl_eas_changing) {
> > + unregister_sysctl_table(sysctl_eas_header);
> > + sysctl_eas_header = NULL;
> > + sysctl_sched_energy_aware = 0;
> > + }
> > +#endif
> > } else if (has_eas && !static_branch_unlikely(&sched_energy_present)) {
> > if (sched_debug())
> > pr_info("%s: starting EAS\n", __func__);
> > static_branch_enable_cpuslocked(&sched_energy_present);
> > +#ifdef CONFIG_PROC_SYSCTL
> > + if (!sysctl_eas_header) {
> > + sysctl_eas_header = register_sysctl("kernel", sched_energy_aware_sysctls);
> > + sysctl_sched_energy_aware = 1;
> > + }
> > +#endif
>
> With a kernel which doesn't have a running cpufreq driver, the following scenario fails
> for me:
> # insmod [cpufreq driver].ko
> # cat /proc/sys/kernel/sched_energy_aware
> 1
> # echo 0 > /proc/sys/kernel/sched_energy_aware
> # rmmod cpufreq driver].ko
> # cat /proc/sys/kernel/sched_energy_aware
> 0
>
> sched_energy_aware should have been removed at that point. In sched_energy_set(),
> sysctl_eas_header sysctl should be unregistered if we go from the state where:
> - EAS is disabled, but possible
> to the state:
> - EAS is disabled, but not possible
>
> I think the following logic should somehow be extracted in a separate function,
> named sched_energy_aware_possible() for instance (or other as you appreciate).
> The logic should be checked to register/unregister the sysctl.
> ---
> if (sched_smt_active() || !per_cpu(sd_asym_cpucapacity, cpu) ||
> !arch_scale_freq_invariant())
> ---
>
> Also it seemed there was a miss in rebuilding the sched domains when
> a cpufreq driver is removed, but the issue described above is still appearing
> with the following patch applied:
> https://lore.kernel.org/all/[email protected]/
>
> Regards,
> Pierre
>
> > }
> > }
> >
> > @@ -380,15 +414,14 @@ static bool build_perf_domains(const struct cpumask *cpu_map)
> > struct cpufreq_policy *policy;
> > struct cpufreq_governor *gov;
> >
> > - if (!sysctl_sched_energy_aware)
> > + if (!sysctl_sched_energy_aware && is_sysctl_eas_changing)
> > goto free;
> >
> > /* EAS is enabled for asymmetric CPU capacity topologies. */
> > if (!per_cpu(sd_asym_cpucapacity, cpu)) {
> > - if (sched_debug()) {
> > - pr_info("rd %*pbl: CPUs do not have asymmetric capacities\n",
> > - cpumask_pr_args(cpu_map));
> > - }
> > + if (sched_debug())
> > + pr_info("rd %*pbl: Disabling EAS, CPUs do not have asymmetric capacities\n",
> > + cpumask_pr_args(cpu_map));
> > goto free;
> > }
> >
> > --
> > 2.31.1
> >
>

--

2023-09-18 23:01:07

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/18/23 5:52 PM, Valentin Schneider wrote:
> On 15/09/23 23:40, Shrikanth Hegde wrote:
>> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>>>> platform when booting may not have EAS capability, but can do that after.
>>>>>> For example, changing/registering the cpufreq policy.
>>>>>>
>>>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>>
>>>>>
>>>>
>>>> Hi Valentin, Thanks for taking a look at this patch.
>>>>
>>>>> But why would you write to it in the first place? Or do you mean to use
>>>>> this as an indicator for userspace that EAS is supported?
>>>>>
>>>>
>>>> Since this sysctl is present and its value being 1, it gives the
>>>> impression to the user that EAS is supported when it is not.
>>>> So its an attempt to correct that part.
>>>>
>>>
>>> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
>>> supported? And on top of it, prevent all writes when EAS isn't supported
>>> (perf domains cannot be built, so there would be no point in forcing a
>>> rebuild that will do nothing).
>>
>> Yes. That's another way. Thats what I had as possible approach in
>> https://lore.kernel.org/lkml/[email protected]/
>>
>
> Thanks for the link; and apologies for bringing up topics that have been
> discussed already.
>
>>
>>
>>>
>>> I can never remember how to properly use the sysctl API, so that's a very
>>> crude implementation, but something like so?
>>>
>>> ---
>>>
>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>> index 05a5bc678c089..dadfc5afc4121 100644
>>> --- a/kernel/sched/topology.c
>>> +++ b/kernel/sched/topology.c
>>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>> if (write && !capable(CAP_SYS_ADMIN))
>>> return -EPERM;
>>>
>>> + if (!sched_energy_enabled()) {
>>
>> Use of sched_energy_enabled won't work as Pierre has indicated.
>>
>> Instead this can be done by adding those checks in a helper function to
>> do similar checks as done build_perf_domains.
>>
>> I can send v4 with this approach if it makes more sense. Please let me know.
>>
>
> So what I'm thinking is the standard approach seems to be to keep the knobs
> visible, but change how reads/writes to them are handled.
>
> For instance, SMT support has
>
> /sys/devices/system/cpu/smt
> /control
> /active
>
> And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
> have:
>
> /control = notsupported
> /active = 0
>
> So IMO it would make sense to keep sched_energy_aware around, but make it
> read 0 and prevent writes for systems that have the software support
> compiled but don't have the actual hardware support.

ok.

>
> In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
> though that's obvious enough with CONFIG_SCHED_DEBUG=y.
>
ok. This would be simpler to implement as well. Removing it would have
few tricky corner case scenarios as pierre has indicated.

Should be able to send out v4 sometime soon. I am on a holiday till Sep 19.

Pierre and Phil, thanks for the suggestions to commit message. I will
incorporate the suggestions.

2023-09-19 07:35:33

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

On 15/09/23 15:35, Pierre Gondois wrote:
> Hello Valentin,
>
> On 9/15/23 14:00, Valentin Schneider wrote:
>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>>> platform when booting may not have EAS capability, but can do that after.
>>>>> For example, changing/registering the cpufreq policy.
>>>>>
>>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>
>>>>
>>>
>>> Hi Valentin, Thanks for taking a look at this patch.
>>>
>>>> But why would you write to it in the first place? Or do you mean to use
>>>> this as an indicator for userspace that EAS is supported?
>>>>
>>>
>>> Since this sysctl is present and its value being 1, it gives the
>>> impression to the user that EAS is supported when it is not.
>>> So its an attempt to correct that part.
>>>
>>
>> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
>> supported? And on top of it, prevent all writes when EAS isn't supported
>> (perf domains cannot be built, so there would be no point in forcing a
>> rebuild that will do nothing).
>
> I think the issue comes from the fact there is no variable representing
> whether EAS is supported or not. sched_energy_enabled()/sched_energy_present
> tells whether EAS is actively running on the system instead.
>
> So on a system with EAS running, I think what would happen is:
> # Disable EAS and set sched_energy_present=0
> echo 0 > /proc/sys/kernel/sched_energy_aware
>
> # sched_energy_present==0, so we get -EOPNOTSUPP
> echo 1 > /proc/sys/kernel/sched_energy_aware
>

Ah, quite so, I didn't think this through!

2023-09-19 14:30:02

by Valentin Schneider

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture

On 15/09/23 23:40, Shrikanth Hegde wrote:
> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>>> platform when booting may not have EAS capability, but can do that after.
>>>>> For example, changing/registering the cpufreq policy.
>>>>>
>>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>
>>>>
>>>
>>> Hi Valentin, Thanks for taking a look at this patch.
>>>
>>>> But why would you write to it in the first place? Or do you mean to use
>>>> this as an indicator for userspace that EAS is supported?
>>>>
>>>
>>> Since this sysctl is present and its value being 1, it gives the
>>> impression to the user that EAS is supported when it is not.
>>> So its an attempt to correct that part.
>>>
>>
>> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
>> supported? And on top of it, prevent all writes when EAS isn't supported
>> (perf domains cannot be built, so there would be no point in forcing a
>> rebuild that will do nothing).
>
> Yes. That's another way. Thats what I had as possible approach in
> https://lore.kernel.org/lkml/[email protected]/
>

Thanks for the link; and apologies for bringing up topics that have been
discussed already.

>
>
>>
>> I can never remember how to properly use the sysctl API, so that's a very
>> crude implementation, but something like so?
>>
>> ---
>>
>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>> index 05a5bc678c089..dadfc5afc4121 100644
>> --- a/kernel/sched/topology.c
>> +++ b/kernel/sched/topology.c
>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>> if (write && !capable(CAP_SYS_ADMIN))
>> return -EPERM;
>>
>> + if (!sched_energy_enabled()) {
>
> Use of sched_energy_enabled won't work as Pierre has indicated.
>
> Instead this can be done by adding those checks in a helper function to
> do similar checks as done build_perf_domains.
>
> I can send v4 with this approach if it makes more sense. Please let me know.
>

So what I'm thinking is the standard approach seems to be to keep the knobs
visible, but change how reads/writes to them are handled.

For instance, SMT support has

/sys/devices/system/cpu/smt
/control
/active

And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
have:

/control = notsupported
/active = 0

So IMO it would make sense to keep sched_energy_aware around, but make it
read 0 and prevent writes for systems that have the software support
compiled but don't have the actual hardware support.

In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
though that's obvious enough with CONFIG_SCHED_DEBUG=y.

2023-09-20 21:43:41

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/15/23 5:30 PM, Valentin Schneider wrote:
> On 14/09/23 23:26, Shrikanth Hegde wrote:
>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>> platform when booting may not have EAS capability, but can do that after.
>>>> For example, changing/registering the cpufreq policy.
>>>>
>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>
>>>
>>
>> Hi Valentin, Thanks for taking a look at this patch.
>>
>>> But why would you write to it in the first place? Or do you mean to use
>>> this as an indicator for userspace that EAS is supported?
>>>
>>
>> Since this sysctl is present and its value being 1, it gives the
>> impression to the user that EAS is supported when it is not.
>> So its an attempt to correct that part.
>>
>
> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
> supported? And on top of it, prevent all writes when EAS isn't supported
> (perf domains cannot be built, so there would be no point in forcing a
> rebuild that will do nothing).
>
> I can never remember how to properly use the sysctl API, so that's a very
> crude implementation, but something like so?
>
> ---
>

I tried the below method, instead of sched_energy_enabled, using a helper
function which does similar checks as in build_perf_domains and use the
same helper function in build_perf_domains as well.

# cat sched_energy_aware
# echo 0 > sched_energy_aware
-bash: echo: write error: Operation not supported
# echo 1 > sched_energy_aware
-bash: echo: write error: Operation not supported
#



> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> index 05a5bc678c089..dadfc5afc4121 100644
> --- a/kernel/sched/topology.c
> +++ b/kernel/sched/topology.c
> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
> if (write && !capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (!sched_energy_enabled()) {
> + if (write)
> + return -EOPNOTSUPP;
> + else {
> + size_t len;
> +
> + if (*ppos) {
> + *lenp = 0;
> + return 0;

Is it possible for *ppos to be 0? if so in which scenario?
and
Does it make sense to make length as 0 unconditionally if eas
is not possible?

> + }
> +
> + len = snprintf((char *)buffer, 3, "0\n");
> +
> + *lenp = len;
> + *ppos += len;
> + return 0;
> + }
> +> ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);

2023-09-22 14:51:50

by Pierre Gondois

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/18/23 14:22, Valentin Schneider wrote:
> On 15/09/23 23:40, Shrikanth Hegde wrote:
>> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>>> sysctl_sched_energy_aware is available for the admin to disable/enable
>>>>>> energy aware scheduling(EAS). EAS is enabled only if few conditions are
>>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>>> valid cpufreq policy, frequency invariant load tracking. It is possible
>>>>>> platform when booting may not have EAS capability, but can do that after.
>>>>>> For example, changing/registering the cpufreq policy.
>>>>>>
>>>>>> At present, though platform doesn't support EAS, this sysctl is still
>>>>>> present and it ends up calling rebuild of sched domain on write to 1 and
>>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>>
>>>>>
>>>>
>>>> Hi Valentin, Thanks for taking a look at this patch.
>>>>
>>>>> But why would you write to it in the first place? Or do you mean to use
>>>>> this as an indicator for userspace that EAS is supported?
>>>>>
>>>>
>>>> Since this sysctl is present and its value being 1, it gives the
>>>> impression to the user that EAS is supported when it is not.
>>>> So its an attempt to correct that part.
>>>>
>>>
>>> Ah, I see. Then how about just making the sysctl return 0 when EAS isn't
>>> supported? And on top of it, prevent all writes when EAS isn't supported
>>> (perf domains cannot be built, so there would be no point in forcing a
>>> rebuild that will do nothing).
>>
>> Yes. That's another way. Thats what I had as possible approach in
>> https://lore.kernel.org/lkml/[email protected]/
>>
>
> Thanks for the link; and apologies for bringing up topics that have been
> discussed already.
>
>>
>>
>>>
>>> I can never remember how to properly use the sysctl API, so that's a very
>>> crude implementation, but something like so?
>>>
>>> ---
>>>
>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>> index 05a5bc678c089..dadfc5afc4121 100644
>>> --- a/kernel/sched/topology.c
>>> +++ b/kernel/sched/topology.c
>>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct ctl_table *table, int write,
>>> if (write && !capable(CAP_SYS_ADMIN))
>>> return -EPERM;
>>>
>>> + if (!sched_energy_enabled()) {
>>
>> Use of sched_energy_enabled won't work as Pierre has indicated.
>>
>> Instead this can be done by adding those checks in a helper function to
>> do similar checks as done build_perf_domains.
>>
>> I can send v4 with this approach if it makes more sense. Please let me know.
>>
>
> So what I'm thinking is the standard approach seems to be to keep the knobs
> visible, but change how reads/writes to them are handled.
>
> For instance, SMT support has
>
> /sys/devices/system/cpu/smt
> /control
> /active
>
> And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
> have:
>
> /control = notsupported
> /active = 0


Having such interface for EAS would be ideal no ?
/active:
would be the equivalent of the current sysctl_sched_energy_aware

/control:
would show whether CONFIG_SCHED_DEBUG was set and all the conditions
to have EAS enabled are satisfied.

Possible states for SMT:
---
static const char *smt_states[] = {
[CPU_SMT_ENABLED] = "on", // EAS possible and running
[CPU_SMT_DISABLED] = "off", // EAS possible and not running
[CPU_SMT_FORCE_DISABLED] = "forceoff", // not applicable for EAS
[CPU_SMT_NOT_SUPPORTED] = "notsupported", // system with smt or not asymmetric or no freq invariance
[CPU_SMT_NOT_IMPLEMENTED] = "notimplemented", // CONFIG_SCHED_DEBUG=n
};
---


>
> So IMO it would make sense to keep sched_energy_aware around, but make it
> read 0 and prevent writes for systems that have the software support
> compiled but don't have the actual hardware support.
>
> In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
> though that's obvious enough with CONFIG_SCHED_DEBUG=y.
>

2023-09-23 00:14:40

by Shrikanth Hegde

[permalink] [raw]
Subject: Re: [PATCH v3] sched/topology: remove sysctl_sched_energy_aware depending on the architecture



On 9/22/23 8:14 PM, Pierre Gondois wrote:
>
>
> On 9/18/23 14:22, Valentin Schneider wrote:
>> On 15/09/23 23:40, Shrikanth Hegde wrote:
>>> On 9/15/23 5:30 PM, Valentin Schneider wrote:
>>>> On 14/09/23 23:26, Shrikanth Hegde wrote:
>>>>> On 9/14/23 9:51 PM, Valentin Schneider wrote:
>>>>>> On 13/09/23 17:18, Shrikanth Hegde wrote:
>>>>>>> sysctl_sched_energy_aware is available for the admin to
>>>>>>> disable/enable
>>>>>>> energy aware scheduling(EAS). EAS is enabled only if few
>>>>>>> conditions are
>>>>>>> met by the platform. They are, asymmetric CPU capacity, no SMT,
>>>>>>> valid cpufreq policy, frequency invariant load tracking. It is
>>>>>>> possible
>>>>>>> platform when booting may not have EAS capability, but can do
>>>>>>> that after.
>>>>>>> For example, changing/registering the cpufreq policy.
>>>>>>>
>>>>>>> At present, though platform doesn't support EAS, this sysctl is
>>>>>>> still
>>>>>>> present and it ends up calling rebuild of sched domain on write
>>>>>>> to 1 and
>>>>>>> NOP when writing to 0. That is confusing and un-necessary.
>>>>>>>
>>>>>>
>>>>>
>>>>> Hi Valentin, Thanks for taking a look at this patch.
>>>>>
>>>>>> But why would you write to it in the first place? Or do you mean
>>>>>> to use
>>>>>> this as an indicator for userspace that EAS is supported?
>>>>>>
>>>>>
>>>>> Since this sysctl is present and its value being 1, it gives the
>>>>> impression to the user that EAS is supported when it is not.
>>>>> So its an attempt to correct that part.
>>>>>
>>>>
>>>> Ah, I see. Then how about just making the sysctl return 0 when EAS
>>>> isn't
>>>> supported? And on top of it, prevent all writes when EAS isn't
>>>> supported
>>>> (perf domains cannot be built, so there would be no point in forcing a
>>>> rebuild that will do nothing).
>>>
>>> Yes. That's another way. Thats what I had as possible approach in
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>
>> Thanks for the link; and apologies for bringing up topics that have been
>> discussed already.
>>
>>>
>>>
>>>>
>>>> I can never remember how to properly use the sysctl API, so that's a
>>>> very
>>>> crude implementation, but something like so?
>>>>
>>>> ---
>>>>
>>>> diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
>>>> index 05a5bc678c089..dadfc5afc4121 100644
>>>> --- a/kernel/sched/topology.c
>>>> +++ b/kernel/sched/topology.c
>>>> @@ -230,9 +230,28 @@ static int sched_energy_aware_handler(struct
>>>> ctl_table *table, int write,
>>>>       if (write && !capable(CAP_SYS_ADMIN))
>>>>           return -EPERM;
>>>>
>>>> +    if (!sched_energy_enabled()) {
>>>
>>> Use of sched_energy_enabled won't work as Pierre has indicated.
>>>
>>> Instead this can be done by adding those checks in a helper function to
>>> do similar checks as done build_perf_domains.
>>>
>>> I can send v4 with this approach if it makes more sense. Please let
>>> me know.
>>>
>>
>> So what I'm thinking is the standard approach seems to be to keep the
>> knobs
>> visible, but change how reads/writes to them are handled.
>>
>> For instance, SMT support has
>>
>>    /sys/devices/system/cpu/smt
>>      /control
>>      /active
>>
>> And a system with CONFIG_HOTPLUG_SMT=y but no actual hardware SMT will
>> have:
>>
>>      /control = notsupported
>>      /active  = 0
>
>
> Having such interface for EAS would be ideal no > /active:
> would be the equivalent of the current sysctl_sched_energy_aware
>
> /control:
> would show whether CONFIG_SCHED_DEBUG was set and all the conditions
> to have EAS enabled are satisfied.
>
> Possible states for SMT:
> ---
> static const char *smt_states[] = {
>     [CPU_SMT_ENABLED]        = "on",             // EAS possible and
> running
>     [CPU_SMT_DISABLED]        = "off",            // EAS possible and
> not running
>     [CPU_SMT_FORCE_DISABLED]    = "forceoff",       // not applicable
> for EAS
>     [CPU_SMT_NOT_SUPPORTED]        = "notsupported",   // system with
> smt or not asymmetric or no freq invariance
>     [CPU_SMT_NOT_IMPLEMENTED]    = "notimplemented", //
> CONFIG_SCHED_DEBUG=n
> };
> ---
>

Likely the current simpler approach more or less achieves the same i think.

With V4 (not yet sent). Didnt send it out yet, as i am not
sure of proc handler's internal way of handling buffers and corner cases. I am
thinking to make *lenp=0 unconditionally and return if EAS is not possible. With
that I have these possibilities.

If sysctl is not there at all, that mean CONFIG_ENERGY_MODEL was not selected
If sysctl return empty string, that mean EAS is not possible at the moment.
One can figure out whats the reason from dmesg.
If sysctl return 1 or 0 - EAS is possible and its either enabled or disabled.

>
>>
>> So IMO it would make sense to keep sched_energy_aware around, but make it
>> read 0 and prevent writes for systems that have the software support
>> compiled but don't have the actual hardware support.
>>
>> In a pinch it also helps to know if CONFIG_ENERGY_MODEL was selected,
>> though that's obvious enough with CONFIG_SCHED_DEBUG=y.
>>