2019-03-06 15:58:08

by Lingutla Chandrasekhar

Subject: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

If a user updates any cpu's cpu_capacity, the new value gets applied to
all of its online sibling cpus. This is not always correct, as sibling
cpus (in ARM, cpus with the same micro-architecture) can have different
cpu_capacity values and different performance characteristics. So
propagating a user-supplied cpu_capacity to all cpu siblings is not
correct.

Another problem is that the current code assumes 'all cpus in a cluster,
or with the same package_id (core_siblings), have the same cpu_capacity'.
But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove
cpu topology sibling masks"), when a cpu is hotplugged out, its
information gets cleared from its sibling cpus. A user-supplied
cpu_capacity is then applied only to the sibling cpus online at that
time. If a cpu is later hotplugged back in, it ends up with a different
cpu_capacity than its siblings, which breaks the above assumption.

So instead of mucking around with the core sibling mask for user-supplied
values, use the device tree to set cpu capacity. And make the
cpu_capacity node read-only, so it only exposes the asymmetry between
cpus in the system.

Signed-off-by: Lingutla Chandrasekhar <[email protected]>
---
drivers/base/arch_topology.c | 33 +--------------------------------
1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..d455897 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -7,7 +7,6 @@
*/

#include <linux/acpi.h>
-#include <linux/arch_topology.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/device.h>
@@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
static void update_topology_flags_workfn(struct work_struct *work);
static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);

-static ssize_t cpu_capacity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf,
- size_t count)
-{
- struct cpu *cpu = container_of(dev, struct cpu, dev);
- int this_cpu = cpu->dev.id;
- int i;
- unsigned long new_capacity;
- ssize_t ret;
-
- if (!count)
- return 0;
-
- ret = kstrtoul(buf, 0, &new_capacity);
- if (ret)
- return ret;
- if (new_capacity > SCHED_CAPACITY_SCALE)
- return -EINVAL;
-
- mutex_lock(&cpu_scale_mutex);
- for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
- topology_set_cpu_scale(i, new_capacity);
- mutex_unlock(&cpu_scale_mutex);
-
- schedule_work(&update_topology_flags_work);
-
- return count;
-}
-
-static DEVICE_ATTR_RW(cpu_capacity);
+static DEVICE_ATTR_RO(cpu_capacity);

static int register_cpu_capacity_sysctl(void)
{
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.



2019-03-07 07:29:36

by Juri Lelli

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

Hi,

On 06/03/19 20:57, Lingutla Chandrasekhar wrote:
> If a user updates any cpu's cpu_capacity, the new value gets applied to
> all of its online sibling cpus. This is not always correct, as sibling
> cpus (in ARM, cpus with the same micro-architecture) can have different
> cpu_capacity values and different performance characteristics. So
> propagating a user-supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes 'all cpus in a cluster,
> or with the same package_id (core_siblings), have the same cpu_capacity'.
> But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove
> cpu topology sibling masks"), when a cpu is hotplugged out, its
> information gets cleared from its sibling cpus. A user-supplied
> cpu_capacity is then applied only to the sibling cpus online at that
> time. If a cpu is later hotplugged back in, it ends up with a different
> cpu_capacity than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for user-supplied
> values, use the device tree to set cpu capacity. And make the
> cpu_capacity node read-only, so it only exposes the asymmetry between
> cpus in the system.
>
> Signed-off-by: Lingutla Chandrasekhar <[email protected]>
> ---
> drivers/base/arch_topology.c | 33 +--------------------------------
> 1 file changed, 1 insertion(+), 32 deletions(-)
>
> diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
> index edfcf8d..d455897 100644
> --- a/drivers/base/arch_topology.c
> +++ b/drivers/base/arch_topology.c
> @@ -7,7 +7,6 @@
> */
>
> #include <linux/acpi.h>
> -#include <linux/arch_topology.h>
> #include <linux/cpu.h>
> #include <linux/cpufreq.h>
> #include <linux/device.h>
> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
> static void update_topology_flags_workfn(struct work_struct *work);
> static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>
> -static ssize_t cpu_capacity_store(struct device *dev,
> - struct device_attribute *attr,
> - const char *buf,
> - size_t count)
> -{
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> - int this_cpu = cpu->dev.id;
> - int i;
> - unsigned long new_capacity;
> - ssize_t ret;
> -
> - if (!count)
> - return 0;
> -
> - ret = kstrtoul(buf, 0, &new_capacity);
> - if (ret)
> - return ret;
> - if (new_capacity > SCHED_CAPACITY_SCALE)
> - return -EINVAL;
> -
> - mutex_lock(&cpu_scale_mutex);
> - for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> - topology_set_cpu_scale(i, new_capacity);
> - mutex_unlock(&cpu_scale_mutex);
> -
> - schedule_work(&update_topology_flags_work);
> -
> - return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);

There are cases in which this needs to be RW, as recently discussed
https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

IMHO, if the core_sibling assumption doesn't work in all cases, one
should be looking into fixing it, rather than making this RO.
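For illustration, one minimal "fix" along those lines would be to keep
the node writable but drop the sibling propagation, i.e. the removed
store handler minus the core_sibling loop. This is a sketch only, not
something posted in this thread, reusing the helpers visible in the
hunk above:

static ssize_t cpu_capacity_store(struct device *dev,
				  struct device_attribute *attr,
				  const char *buf,
				  size_t count)
{
	struct cpu *cpu = container_of(dev, struct cpu, dev);
	unsigned long new_capacity;
	ssize_t ret;

	if (!count)
		return 0;

	ret = kstrtoul(buf, 0, &new_capacity);
	if (ret)
		return ret;
	if (new_capacity > SCHED_CAPACITY_SCALE)
		return -EINVAL;

	/* Update only the written cpu; no core_sibling propagation. */
	mutex_lock(&cpu_scale_mutex);
	topology_set_cpu_scale(cpu->dev.id, new_capacity);
	mutex_unlock(&cpu_scale_mutex);

	schedule_work(&update_topology_flags_work);

	return count;
}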

Best,

- Juri

2019-03-07 09:32:21

by Quentin Perret

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

Hi Juri,

On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> There are cases in which this needs to be RW, as recently discussed
> https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/

Yeah, there's that problem when you can't fix your DT ... But I guess
this is a problem for _all_ values in the DT, not just capacities, right?
For those other values, I'd expect they just can't be fixed from
userspace most of the time; you just have to live with sub-optimal
values. So I don't find it unreasonable to do the same for capacities.

> IMHO, if the core_sibling assumption doesn't work in all cases, one
> should be looking into fixing it, rather than making this RO.

It's just that this thing keeps causing more harm than good, IMO.
It's quite severely broken ATM, and it prevents us from assuming
'stable' capacity values in places where we'd like to do so (e.g. EAS).

And I'm not aware of a single platform where this is used. So, I'm
personally all for removing the write capability if we can.

Thanks,
Quentin

2019-03-07 09:59:46

by Juri Lelli

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

Hi,

On 07/03/19 09:31, Quentin Perret wrote:
> Hi Juri,
>
> On Thursday 07 Mar 2019 at 08:28:56 (+0100), Juri Lelli wrote:
> > There are cases in which this needs to be RW, as recently discussed
> > https://lore.kernel.org/lkml/20181123135807.GA14964@e107155-lin/
>
> Yeah, there's that problem when you can't fix your DT ... But I guess
> this is a problem for _all_ values in the DT, not just capacities, right?
> For those other values, I'd expect they just can't be fixed from
> userspace most of the time; you just have to live with sub-optimal
> values. So I don't find it unreasonable to do the same for capacities.
>
> > IMHO, if the core_sibling assumption doesn't work in all cases, one
> > should be looking into fixing it, rather than making this RO.
>
> It's just that this thing keeps causing more harm than good, IMO.
> It's quite severely broken ATM, and it prevents us from assuming
> 'stable' capacity values in places where we'd like to do so (e.g. EAS).
>
> And I'm not aware of a single platform where this is used. So, I'm
> personally all for removing the write capability if we can.

If people think it's best to simply make this RO, I won't be against it.
Just pointed out a conversation we recently had. Guess we could also
make it RW again (properly) in the future if somebody complains.

Best,

- Juri

2019-03-07 12:14:49

by Quentin Perret

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> If people think it's best to simply make this RO, I won't be against it.
> Just pointed out a conversation we recently had. Guess we could also
> make it RW again (properly) in the future if somebody complains.

Right, now is probably the time to give it a go before folks start
depending on it. And if I am wrong (and that happens more often than I'd
like unfortunately :-)) and there are users of that thing, then the
revert should be trivial.

Thanks,
Quentin

2019-03-07 15:07:17

by Sudeep Holla

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

On Thu, Mar 07, 2019 at 12:14:03PM +0000, Quentin Perret wrote:
> On Thursday 07 Mar 2019 at 10:57:50 (+0100), Juri Lelli wrote:
> > If people think it's best to simply make this RO, I won't be against it.
> > Just pointed out a conversation we recently had. Guess we could also
> > make it RW again (properly) in the future if somebody complains.
>
> Right, now is probably the time to give it a go before folks start
> depending on it. And if I am wrong (and that happens more often than I'd
> like unfortunately :-)) and there are users of that thing, then the
> revert should be trivial.
>

+1 on all the points above ;) (I may also be getting things wrong here,
but I am not convinced we can resolve the issue for all the possible
ARM vendor combinations we may have to address.)

We would need to come up with some *magical* cpumask to use if we want
to retain this write capability, and the only way I see to do that is
using DT, which in turn eliminates the need for write capability on
this sysfs node.

So I am going to ack the $subject patch for now.

--
Regards,
Sudeep

2019-03-07 15:20:50

by Sudeep Holla

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

On Wed, Mar 06, 2019 at 08:57:53PM +0530, Lingutla Chandrasekhar wrote:
> If a user updates any cpu's cpu_capacity, the new value gets applied to
> all of its online sibling cpus. This is not always correct, as sibling
> cpus (in ARM, cpus with the same micro-architecture) can have different
> cpu_capacity values and different performance characteristics. So
> propagating a user-supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes 'all cpus in a cluster,
> or with the same package_id (core_siblings), have the same cpu_capacity'.
> But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove
> cpu topology sibling masks"), when a cpu is hotplugged out, its
> information gets cleared from its sibling cpus. A user-supplied
> cpu_capacity is then applied only to the sibling cpus online at that
> time. If a cpu is later hotplugged back in, it ends up with a different
> cpu_capacity than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for user-supplied
> values, use the device tree to set cpu capacity. And make the
> cpu_capacity node read-only, so it only exposes the asymmetry between
> cpus in the system.
>

Acked-by: Sudeep Holla <[email protected]>

IIRC this was added for two possibilities. Though I don't completely
agree, no one had any objections at the time (including me, though I
wonder how/why I didn't notice it back then; anyway, it's too late now):

1. For systems that don't provide this information via device tree or
any firmware, even though that is the highly recommended way. With more
complex topologies on the horizon, I can't think of fetching/deducing
this information *correctly* in any other sane way.

2. For some sort of tuning (avoiding a rebuild and reboot), but that's
questionable, as this is not a software characteristic. It's more like
deriving hardware characteristics through software experiments. For me,
this is comparable to hardware latencies such as the CPU idle
entry/exit latencies: they get tuned, but not in production kernels.
So if there's a case for adding this back as a writable interface, I
would prefer it in debugfs (a rough sketch follows below) and keep
this sysfs node as a read-only ABI.
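
For illustration only, a minimal sketch of such a debugfs knob: one
file per cpu that updates only that cpu's capacity, with no
sibling-mask propagation. This is not part of the patch; the file and
function names are made up for the example, and it assumes the code
lives in arch_topology.c so the existing helpers
(topology_{get,set}_cpu_scale(), SCHED_CAPACITY_SCALE,
update_topology_flags_work) are in scope:

#include <linux/debugfs.h>

static int cpu_capacity_debug_set(void *data, u64 val)
{
	int cpu = (long)data;

	if (val > SCHED_CAPACITY_SCALE)
		return -EINVAL;

	/* Update only this cpu, nothing is propagated to siblings. */
	topology_set_cpu_scale(cpu, val);
	schedule_work(&update_topology_flags_work);
	return 0;
}

static int cpu_capacity_debug_get(void *data, u64 *val)
{
	*val = topology_get_cpu_scale(NULL, (long)data);
	return 0;
}

DEFINE_DEBUGFS_ATTRIBUTE(cpu_capacity_debug_fops, cpu_capacity_debug_get,
			 cpu_capacity_debug_set, "%llu\n");

static int __init cpu_capacity_debugfs_init(void)
{
	struct dentry *dir = debugfs_create_dir("cpu_capacity", NULL);
	char name[16];
	int cpu;

	/* One writable file per possible cpu, e.g. cpu_capacity/cpu3 */
	for_each_possible_cpu(cpu) {
		snprintf(name, sizeof(name), "cpu%d", cpu);
		debugfs_create_file_unsafe(name, 0644, dir,
					   (void *)(long)cpu,
					   &cpu_capacity_debug_fops);
	}
	return 0;
}
late_initcall(cpu_capacity_debugfs_init);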

Hope that helps.

--
Regards,
Sudeep

2019-03-08 11:46:11

by Dietmar Eggemann

Subject: Re: [PATCH v1] arch_topology: Make cpu_capacity sysfs node as read-only

On 3/6/19 4:27 PM, Lingutla Chandrasekhar wrote:

[...]

> @@ -51,37 +50,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
> static void update_topology_flags_workfn(struct work_struct *work);
> static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
>
> -static ssize_t cpu_capacity_store(struct device *dev,
> - struct device_attribute *attr,
> - const char *buf,
> - size_t count)
> -{
> - struct cpu *cpu = container_of(dev, struct cpu, dev);
> - int this_cpu = cpu->dev.id;
> - int i;
> - unsigned long new_capacity;
> - ssize_t ret;
> -
> - if (!count)
> - return 0;
> -
> - ret = kstrtoul(buf, 0, &new_capacity);
> - if (ret)
> - return ret;
> - if (new_capacity > SCHED_CAPACITY_SCALE)
> - return -EINVAL;
> -
> - mutex_lock(&cpu_scale_mutex);

Since we can't write to cpu_scale from here anymore, we could get rid of
cpu_scale_mutex.
topology_normalize_cpu_scale()->topology_set_cpu_scale() is now only
called from:

[ 0.202628] topology_normalize_cpu_scale+0x28/0x30
[ 0.207529] init_cpu_topology+0x168/0x1e8
[ 0.211644] smp_prepare_cpus+0x2c/0x108
[ 0.215585] kernel_init_freeable+0x104/0x518
[ 0.219963] kernel_init+0x18/0x110
[ 0.223469] ret_from_fork+0x10/0x1c

for dts capacity-dmips-mhz properties

and

[ 3.130180] topology_normalize_cpu_scale.part.0+0xac/0xd0
[ 3.135619] init_cpu_capacity_callback+0x100/0x178
[ 3.140459] notifier_call_chain+0x5c/0xa0
[ 3.144522] blocking_notifier_call_chain+0x64/0x88
[ 3.149363] cpufreq_set_policy+0xd8/0x3c8
[ 3.153427] cpufreq_init_policy+0x78/0xc8

for cpufreq max frequency related adjustments to cpu capacity.

The mutex was introduced for the sysfs interface here:
https://lore.kernel.org/lkml/[email protected]
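
With the store handler gone there is no sysfs writer left for the mutex
to serialize against; the sysfs side reduces to the existing show
handler behind DEVICE_ATTR_RO(). Roughly (a sketch reconstructed from
the hunk context, not part of the diff):

static ssize_t cpu_capacity_show(struct device *dev,
				 struct device_attribute *attr,
				 char *buf)
{
	struct cpu *cpu = container_of(dev, struct cpu, dev);

	return sprintf(buf, "%lu\n",
		       topology_get_cpu_scale(NULL, cpu->dev.id));
}

static DEVICE_ATTR_RO(cpu_capacity);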

> - for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
> - topology_set_cpu_scale(i, new_capacity);
> - mutex_unlock(&cpu_scale_mutex);
> -
> - schedule_work(&update_topology_flags_work);
> -
> - return count;
> -}
> -
> -static DEVICE_ATTR_RW(cpu_capacity);
> +static DEVICE_ATTR_RO(cpu_capacity);
>
> static int register_cpu_capacity_sysctl(void)
> {
>

Tested-by: Dietmar Eggemann <[email protected]>

on Arm64 Juno with v5.0

2019-03-08 12:41:27

by Lingutla Chandrasekhar

Subject: [PATCH v2] arch_topology: Make cpu_capacity sysfs node as read-only

If a user updates any cpu's cpu_capacity, the new value gets applied to
all of its online sibling cpus. This is not always correct, as sibling
cpus (in ARM, cpus with the same micro-architecture) can have different
cpu_capacity values and different performance characteristics. So
propagating a user-supplied cpu_capacity to all cpu siblings is not
correct.

Another problem is that the current code assumes 'all cpus in a cluster,
or with the same package_id (core_siblings), have the same cpu_capacity'.
But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove
cpu topology sibling masks"), when a cpu is hotplugged out, its
information gets cleared from its sibling cpus. A user-supplied
cpu_capacity is then applied only to the sibling cpus online at that
time. If a cpu is later hotplugged back in, it ends up with a different
cpu_capacity than its siblings, which breaks the above assumption.

So instead of mucking around with the core sibling mask for user-supplied
values, use the device tree to set cpu capacity. And make the
cpu_capacity node read-only, so it only exposes the asymmetry between
cpus in the system. While at it, remove the cpu_scale_mutex, which was
only used for sysfs write protection.

Tested-by: Dietmar Eggemann <[email protected]>
Acked-by: Sudeep Holla <[email protected]>
Signed-off-by: Lingutla Chandrasekhar <[email protected]>
---

Changes from v1:
- Removed cpu_scale_mutex usage, suggested by Dietmar Eggemann.
---
drivers/base/arch_topology.c | 36 +-----------------------------------
1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index edfcf8d..1739d7e 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -7,7 +7,6 @@
*/

#include <linux/acpi.h>
-#include <linux/arch_topology.h>
#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/device.h>
@@ -31,7 +30,6 @@ void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
per_cpu(freq_scale, i) = scale;
}

-static DEFINE_MUTEX(cpu_scale_mutex);
DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;

void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
@@ -51,37 +49,7 @@ static ssize_t cpu_capacity_show(struct device *dev,
static void update_topology_flags_workfn(struct work_struct *work);
static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);

-static ssize_t cpu_capacity_store(struct device *dev,
- struct device_attribute *attr,
- const char *buf,
- size_t count)
-{
- struct cpu *cpu = container_of(dev, struct cpu, dev);
- int this_cpu = cpu->dev.id;
- int i;
- unsigned long new_capacity;
- ssize_t ret;
-
- if (!count)
- return 0;
-
- ret = kstrtoul(buf, 0, &new_capacity);
- if (ret)
- return ret;
- if (new_capacity > SCHED_CAPACITY_SCALE)
- return -EINVAL;
-
- mutex_lock(&cpu_scale_mutex);
- for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
- topology_set_cpu_scale(i, new_capacity);
- mutex_unlock(&cpu_scale_mutex);
-
- schedule_work(&update_topology_flags_work);
-
- return count;
-}
-
-static DEVICE_ATTR_RW(cpu_capacity);
+static DEVICE_ATTR_RO(cpu_capacity);

static int register_cpu_capacity_sysctl(void)
{
@@ -141,7 +109,6 @@ void topology_normalize_cpu_scale(void)
return;

pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
- mutex_lock(&cpu_scale_mutex);
for_each_possible_cpu(cpu) {
pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
cpu, raw_capacity[cpu]);
@@ -151,7 +118,6 @@ void topology_normalize_cpu_scale(void)
pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
cpu, topology_get_cpu_scale(NULL, cpu));
}
- mutex_unlock(&cpu_scale_mutex);
}

bool __init topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu)
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.


2019-03-27 10:59:03

by Quentin Perret

Subject: Re: [PATCH v2] arch_topology: Make cpu_capacity sysfs node as read-only

Hi,

On Friday 08 Mar 2019 at 18:08:48 (+0530), Lingutla Chandrasekhar wrote:
> If a user updates any cpu's cpu_capacity, the new value gets applied to
> all of its online sibling cpus. This is not always correct, as sibling
> cpus (in ARM, cpus with the same micro-architecture) can have different
> cpu_capacity values and different performance characteristics. So
> propagating a user-supplied cpu_capacity to all cpu siblings is not
> correct.
>
> Another problem is that the current code assumes 'all cpus in a cluster,
> or with the same package_id (core_siblings), have the same cpu_capacity'.
> But with commit 5bdd2b3f0f8 ("arm64: topology: add support to remove
> cpu topology sibling masks"), when a cpu is hotplugged out, its
> information gets cleared from its sibling cpus. A user-supplied
> cpu_capacity is then applied only to the sibling cpus online at that
> time. If a cpu is later hotplugged back in, it ends up with a different
> cpu_capacity than its siblings, which breaks the above assumption.
>
> So instead of mucking around with the core sibling mask for user-supplied
> values, use the device tree to set cpu capacity. And make the
> cpu_capacity node read-only, so it only exposes the asymmetry between
> cpus in the system. While at it, remove the cpu_scale_mutex, which was
> only used for sysfs write protection.
>
> Tested-by: Dietmar Eggemann <[email protected]>
> Acked-by: Sudeep Holla <[email protected]>
> Signed-off-by: Lingutla Chandrasekhar <[email protected]>

Reviewed-by: Quentin Perret <[email protected]>
Tested-by: Quentin Perret <[email protected]>

Thanks for doing this,
Quentin