Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
functions.") the core cpumask has to be modified during cpu hotplug
operations.
("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
[1] fixed that but revealed another issue on TC2, i.e. in its cpufreq
driver.
During CPU hp stress operations on multiple CPUs, policy->related_cpus
can be altered. This is wrong since this cpumask should contain the
online and offline CPUs.
The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
cpufreq_online() triggers in this case.
The core cpumask can't be used to set the policy->cpus in
ve_spc_cpufreq_init() anymore in case it is called via
cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
An empty online() callback can be used to prevent the driver's init()
function from being called during CPU hotplug, so that
policy->related_cpus will not be changed.
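For reference, the dispatch in the cpufreq core looks roughly like the
following (a paraphrased sketch of cpufreq_online() around v5.4/v5.5, not a
verbatim copy):

	/* The CPU comes back online and already has a policy */
	policy = per_cpu(cpufreq_cpu_data, cpu);
	new_policy = !policy;
	if (policy)
		WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));

	if (!new_policy && cpufreq_driver->online) {
		/* Light-weight path: the driver's init() is not called again */
		ret = cpufreq_driver->online(policy);

		/* Recover policy->cpus using related_cpus */
		cpumask_copy(policy->cpus, policy->related_cpus);
	} else {
		cpumask_copy(policy->cpus, cpumask_of(cpu));
		ret = cpufreq_driver->init(policy);
	}

So once online()/offline() are provided, CPU hotplug takes the light-weight
path and ve_spc_cpufreq_init(), which derives policy->cpus from the core
cpumask, only runs when the policy is first created.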
Implementing an online() also requires an offline() callback.
Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
the same time).
[1]
https://lore.kernel.org/r/[email protected]
Signed-off-by: Dietmar Eggemann <[email protected]>
---
drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/cpufreq/vexpress-spc-cpufreq.c b/drivers/cpufreq/vexpress-spc-cpufreq.c
index 506e3f2bf53a..857dcb535e97 100644
--- a/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ b/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -492,6 +492,16 @@ static void ve_spc_cpufreq_ready(struct cpufreq_policy *policy)
cdev[cur_cluster] = of_cpufreq_cooling_register(policy);
}
+static int ve_spc_cpufreq_online(struct cpufreq_policy *policy)
+{
+ return 0;
+}
+
+static int ve_spc_cpufreq_offline(struct cpufreq_policy *policy)
+{
+ return 0;
+}
+
static struct cpufreq_driver ve_spc_cpufreq_driver = {
.name = "vexpress-spc",
.flags = CPUFREQ_STICKY |
@@ -503,6 +513,8 @@ static struct cpufreq_driver ve_spc_cpufreq_driver = {
.init = ve_spc_cpufreq_init,
.exit = ve_spc_cpufreq_exit,
.ready = ve_spc_cpufreq_ready,
+ .online = ve_spc_cpufreq_online,
+ .offline = ve_spc_cpufreq_offline,
.attr = cpufreq_generic_attr,
};
--
2.17.1
On 27-11-19, 12:48, Dietmar Eggemann wrote:
> Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> functions.") the core cpumask has to be modified during cpu hotplug
> operations.
>
> ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> [1] fixed that but revealed another issue on TC2, i.e in its cpufreq
> driver.
>
> During CPU hp stress operations on multiple CPUs, policy->related_cpus
> can be altered. This is wrong since this cpumask should contain the
> online and offline CPUs.
>
> The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> cpufreq_online() triggers in this case.
>
> The core cpumask can't be used to set the policy->cpus in
> ve_spc_cpufreq_init() anymore in case it is called via
> cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
>
> An empty online() callback can be used to prevent the driver's init()
> function from being called during CPU hotplug, so that
> policy->related_cpus will not be changed.
>
> Implementing an online() also requires an offline() callback.
>
> Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
> the same time).
>
> [1]
> https://lore.kernel.org/r/[email protected]
>
> Signed-off-by: Dietmar Eggemann <[email protected]>
Wanna provide a Fixes tag?
> ---
> drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
This is 5.5 material or 5.6 ?
--
viresh
On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:
> Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> functions.") the core cpumask has to be modified during cpu hotplug
> operations.
>
> ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> [1] fixed that but revealed another issue on TC2, i.e in its cpufreq
> driver.
>
> During CPU hp stress operations on multiple CPUs, policy->related_cpus
> can be altered. This is wrong since this cpumask should contain the
> online and offline CPUs.
>
> The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> cpufreq_online() triggers in this case.
>
> The core cpumask can't be used to set the policy->cpus in
> ve_spc_cpufreq_init() anymore in case it is called via
> cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
>
> An empty online() callback can be used to prevent the driver's init()
> function from being called during CPU hotplug, so that
> policy->related_cpus will not be changed.
>
Unlike DT based drivers, it is not easy to get the fixed cpumask unless
we add some mechanism to extract it based on the clks/OPPs added. I
prefer this simple solution instead.
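For comparison, a DT based driver can typically fill policy->cpus in its
->init() straight from the operating-points-v2 table. A rough sketch only,
with foo_cpufreq_init() being a made-up driver init(), assuming the CPU
nodes point to a shared OPP table:

	static int foo_cpufreq_init(struct cpufreq_policy *policy)
	{
		struct device *cpu_dev = get_cpu_device(policy->cpu);
		int ret;

		if (!cpu_dev)
			return -ENODEV;

		/* policy->cpus := all CPUs sharing the OPP table in DT */
		ret = dev_pm_opp_of_get_sharing_cpus(cpu_dev, policy->cpus);
		if (ret)
			return ret;

		/* clock/OPP setup would continue here */
		return 0;
	}

TC2 has no such information in DT, hence the cpumask has to come from
somewhere else.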
> Implementing an online() also requires an offline() callback.
>
> Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
> the same time).
>
> [1]
> https://lore.kernel.org/r/[email protected]
>
Reviewed-by: Sudeep Holla <[email protected]>
Tested-by: Sudeep Holla <[email protected]>
--
Regards,
Sudeep
On Wed, Nov 27, 2019 at 05:37:44PM +0530, Viresh Kumar wrote:
> On 27-11-19, 12:48, Dietmar Eggemann wrote:
> > Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> > functions.") the core cpumask has to be modified during cpu hotplug
> > operations.
> >
> > ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> > [1] fixed that but revealed another issue on TC2, i.e in its cpufreq
> > driver.
> >
> > During CPU hp stress operations on multiple CPUs, policy->related_cpus
> > can be altered. This is wrong since this cpumask should contain the
> > online and offline CPUs.
> >
> > The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> > cpufreq_online() triggers in this case.
> >
> > The core cpumask can't be used to set the policy->cpus in
> > ve_spc_cpufreq_init() anymore in case it is called via
> > cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
> >
> > An empty online() callback can be used to prevent the driver's init()
> > function from being called during CPU hotplug, so that
> > policy->related_cpus will not be changed.
> >
> > Implementing an online() also requires an offline() callback.
> >
> > Tested on TC2 with CPU hp stress test (CPU hp from multiple CPUs at
> > the same time).
> >
> > [1]
> > https://lore.kernel.org/r/[email protected]
> >
> > Signed-off-by: Dietmar Eggemann <[email protected]>
>
> Wanna provide any fixes tag ?
>
> > ---
> > drivers/cpufreq/vexpress-spc-cpufreq.c | 12 ++++++++++++
> > 1 file changed, 12 insertions(+)
>
> This is 5.5 material or 5.6 ?
>
v5.5 for sure; it is broken even on v5.4, but unless someone really cares
for stable on TC2, I am happy to skip it.
--
Regards,
Sudeep
On 27-11-19, 12:08, Sudeep Holla wrote:
> On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:
> > Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> > functions.") the core cpumask has to be modified during cpu hotplug
> > operations.
> >
> > ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> > [1] fixed that but revealed another issue on TC2, i.e in its cpufreq
> > driver.
> >
> > During CPU hp stress operations on multiple CPUs, policy->related_cpus
> > can be altered. This is wrong since this cpumask should contain the
> > online and offline CPUs.
> >
> > The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> > cpufreq_online() triggers in this case.
> >
> > The core cpumask can't be used to set the policy->cpus in
> > ve_spc_cpufreq_init() anymore in case it is called via
> > cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
> >
> > An empty online() callback can be used to prevent the driver's init()
> > function from being called during CPU hotplug, so that
> > policy->related_cpus will not be changed.
> >
>
> Unlike DT based drivers, it is not easy to get the fixed cpumask unless
> we add some mechanism to extract it based on the clks/OPPs added. I
> prefer this simple solution instead.
I will call this a work-around for the problem and not really the
solution, though I won't necessarily oppose it. There are cases which
will break even with this solution.
- Boot board with cpufreq driver as module.
- Offline all CPUs except CPU0.
- insert cpufreq driver.
- online all CPUs.
Now there is no guarantee that the last online will get the mask
properly, if I have understood the problem well :)
But yeah, who does this kind of messy work anyway :)
FWIW, we need a proper way (maybe from architecture code) to find the
list of all CPUs that share a clock line.
--
viresh
On Wed, Nov 27, 2019 at 05:44:02PM +0530, Viresh Kumar wrote:
> On 27-11-19, 12:08, Sudeep Holla wrote:
> > On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:
> > > Since commit ca74b316df96 ("arm: Use common cpu_topology structure and
> > > functions.") the core cpumask has to be modified during cpu hotplug
> > > operations.
> > >
> > > ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> > > [1] fixed that but revealed another issue on TC2, i.e in its cpufreq
> > > driver.
> > >
> > > During CPU hp stress operations on multiple CPUs, policy->related_cpus
> > > can be altered. This is wrong since this cpumask should contain the
> > > online and offline CPUs.
> > >
> > > The WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus)) in
> > > cpufreq_online() triggers in this case.
> > >
> > > The core cpumask can't be used to set the policy->cpus in
> > > ve_spc_cpufreq_init() anymore in case it is called via
> > > cpuhp_cpufreq_online()->cpufreq_online()->cpufreq_driver->init().
> > >
> > > An empty online() callback can be used to prevent the driver's init()
> > > function from being called during CPU hotplug, so that
> > > policy->related_cpus will not be changed.
> > >
> >
> > Unlike DT based drivers, it is not easy to get the fixed cpumask unless
> > we add some mechanism to extract it based on the clks/OPPs added. I
> > prefer this simple solution instead.
>
> I will call this a work-around for the problem and not really the
> solution, though I won't necessarily oppose it. There are cases which
> will break even with this solution.
>
I agree and that's the reason I spoke my thoughts aloud here :)
> - Boot board with cpufreq driver as module.
> - Offline all CPUs except CPU0.
> - insert cpufreq driver.
> - online all CPUs.
>
Indeed, and not just at boot; it can happen anytime since it's a module :)
> Now there is no guarantee that the last online will get the mask
> properly, if I have understood the problem well :)
>
Yes
> But yeah, who does this kind of messy work anyway :)
>
I won't bet on that ;)
> FWIW, we need a proper way (maybe from architecture code) to find the
> list of all CPUs that share a clock line.
>
Yes, but there's no architectural way. I need to revisit tc2_pm.c
to check if we can do any magic there.
--
Regards,
Sudeep
On 27/11/2019 14:32, Sudeep Holla wrote:
> On Wed, Nov 27, 2019 at 05:44:02PM +0530, Viresh Kumar wrote:
>> On 27-11-19, 12:08, Sudeep Holla wrote:
>>> On Wed, Nov 27, 2019 at 12:48:01PM +0100, Dietmar Eggemann wrote:
[...]
>>> Unlike DT based drivers, it is not easy to get the fixed cpumask unless
>>> we add some mechanism to extract it based on the clks/OPPs added. I
>>> prefer this simple solution instead.
>>
>> I will call this a work-around for the problem and not really the
>> solution, though I won't necessarily oppose it. There are cases which
>> will break even with this solution.
>>
>
> I agree and that's the reason I spoke out my thought aloud here :)
>
>> - Boot board with cpufreq driver as module.
>> - Offline all CPUs except CPU0.
>> - insert cpufreq driver.
>> - online all CPUs.
>>
>
> Indeed, not just boot anytime since it's a module :)
>
>> Now there is no guarantee that the last online will get the mask
>> properly, if I have understood the problem well :)
>>
>
> Yes
>
>> But yeah, who does this kind of messy work anyway :)
>>
>
> I won't bet on that ;)
>
>> FWIW, we need a proper way (maybe from architecture code) to find the
>> list of all CPUs that share a clock line.
>>
>
> Yes, but there's no architectural way. I need to revisit tc2_pm.c
> to check if we can do any magic there.
I'm fine with finding a better solution to return a fixed topology core
cpumask or calling this patch a workaround. AFAICS, only TC2 is affected.
("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
is needed for other systems as well in case we have commit ca74b316df96
("arm: Use common cpu_topology structure and functions."). We probably
don't want to revert commit ca74b316df96?
We do CPU hp stress tests in our EAS mainline integration test suite
https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling/eas-mainline-development
and that is where we initially encountered this issue on TC2.
On Wed, Nov 27, 2019 at 03:58:49PM +0100, Dietmar Eggemann wrote:
> On 27/11/2019 14:32, Sudeep Holla wrote:
[...]
> >
> > Yes, but there's no architectural way. I need to revisit tc2_pm.c
> > to check if we can do any magic there.
>
> I'm fine with finding a better solution to return a fixed topology core
> cpumask or calling this patch a workaround. AFAICS, only TC2 is affected.
>
> ("arm: Fix topology setup in case of CPU hotplug for CONFIG_SCHED_MC")
> is needed for other systems as well in case we have commit ca74b316df96
> ("arm: Use common cpu_topology structure and functions."). We probably
> don't want to revert commit ca74b316df96?
>
Correct
> We do CPU hp stress tests in our EAS mainline integration test suite
> https://developer.arm.com/tools-and-software/open-source-software/linux-kernel/energy-aware-scheduling/eas-mainline-development
> and that is where we initially encountered this issue on TC2.
I came up with the patch below. If it is any cleaner and acceptable,
I am happy to post it. One advantage of moving the use of
topology_core_cpumask inside ve_spc_clk_init() is that it is just a
device_initcall and not a module. It allows ve_spc_cpufreq to still be
handled as a module. I prefer this to the previous solution/workaround.
Let me know.
Regards,
Sudeep
-->8
diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
index 354e0e7025ae..e0e2e789a0b7 100644
--- i/arch/arm/mach-vexpress/spc.c
+++ w/arch/arm/mach-vexpress/spc.c
@@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)
static int __init ve_spc_clk_init(void)
{
- int cpu;
+ int cpu, cluster;
struct clk *clk;
+ bool init_opp_table[MAX_CLUSTERS] = { false };
if (!info)
return 0; /* Continue only if SPC is initialised */
@@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
continue;
}
+ cluster = topology_physical_package_id(cpu_dev->id);
+ if (init_opp_table[cluster])
+ continue;
+
if (ve_init_opp_table(cpu_dev))
pr_warn("failed to initialise cpu%d opp table\n", cpu);
+ else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
+ topology_core_cpumask(cpu_dev->id)))
+ pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
+
+ init_opp_table[cluster] = true;
}
platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);
diff --git i/drivers/cpufreq/vexpress-spc-cpufreq.c w/drivers/cpufreq/vexpress-spc-cpufreq.c
index 506e3f2bf53a..83c85d3d67e3 100644
--- i/drivers/cpufreq/vexpress-spc-cpufreq.c
+++ w/drivers/cpufreq/vexpress-spc-cpufreq.c
@@ -434,7 +434,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
if (cur_cluster < MAX_CLUSTERS) {
int cpu;
- cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
+ dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);
for_each_cpu(cpu, policy->cpus)
per_cpu(physical_cluster, cpu) = cur_cluster;
On Wed, Nov 27, 2019 at 03:40:29PM +0000, Sudeep Holla wrote:
>
> diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
> index 354e0e7025ae..e0e2e789a0b7 100644
> --- i/arch/arm/mach-vexpress/spc.c
> +++ w/arch/arm/mach-vexpress/spc.c
> @@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)
>
> static int __init ve_spc_clk_init(void)
> {
> - int cpu;
> + int cpu, cluster;
> struct clk *clk;
> + bool init_opp_table[MAX_CLUSTERS] = { false };
>
> if (!info)
> return 0; /* Continue only if SPC is initialised */
> @@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
> continue;
> }
>
> + cluster = topology_physical_package_id(cpu_dev->id);
> + if (init_opp_table[cluster])
> + continue;
> +
> if (ve_init_opp_table(cpu_dev))
> pr_warn("failed to initialise cpu%d opp table\n", cpu);
> + else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
> + topology_core_cpumask(cpu_dev->id)))
> + pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
There is an else missing here; I found it when I was looking at the patch
to stash/commit it locally.
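i.e. presumably the tail of the loop should read:

		if (ve_init_opp_table(cpu_dev))
			pr_warn("failed to initialise cpu%d opp table\n", cpu);
		else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
			 topology_core_cpumask(cpu_dev->id)))
			pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
		else
			init_opp_table[cluster] = true;

so that the cluster is only marked as done once its OPP table has actually
been initialised and marked shared.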
> +
> + init_opp_table[cluster] = true;
> }
>
> platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);
--
Regards,
Sudeep
On 27-11-19, 15:40, Sudeep Holla wrote:
> diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
> index 354e0e7025ae..e0e2e789a0b7 100644
> --- i/arch/arm/mach-vexpress/spc.c
> +++ w/arch/arm/mach-vexpress/spc.c
> @@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)
>
> static int __init ve_spc_clk_init(void)
> {
> - int cpu;
> + int cpu, cluster;
> struct clk *clk;
> + bool init_opp_table[MAX_CLUSTERS] = { false };
>
> if (!info)
> return 0; /* Continue only if SPC is initialised */
> @@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
> continue;
> }
>
> + cluster = topology_physical_package_id(cpu_dev->id);
> + if (init_opp_table[cluster])
> + continue;
> +
> if (ve_init_opp_table(cpu_dev))
> pr_warn("failed to initialise cpu%d opp table\n", cpu);
> + else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
> + topology_core_cpumask(cpu_dev->id)))
> + pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
> +
> + init_opp_table[cluster] = true;
> }
>
> platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);
> diff --git i/drivers/cpufreq/vexpress-spc-cpufreq.c w/drivers/cpufreq/vexpress-spc-cpufreq.c
> index 506e3f2bf53a..83c85d3d67e3 100644
> --- i/drivers/cpufreq/vexpress-spc-cpufreq.c
> +++ w/drivers/cpufreq/vexpress-spc-cpufreq.c
> @@ -434,7 +434,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
> if (cur_cluster < MAX_CLUSTERS) {
> int cpu;
>
> - cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
> + dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);
>
> for_each_cpu(cpu, policy->cpus)
> per_cpu(physical_cluster, cpu) = cur_cluster;
This is a better *work-around* I would say, as we can't break it the
way I explained earlier :)
--
viresh
On 28/11/2019 03:31, Viresh Kumar wrote:
> On 27-11-19, 15:40, Sudeep Holla wrote:
>> diff --git i/arch/arm/mach-vexpress/spc.c w/arch/arm/mach-vexpress/spc.c
>> index 354e0e7025ae..e0e2e789a0b7 100644
>> --- i/arch/arm/mach-vexpress/spc.c
>> +++ w/arch/arm/mach-vexpress/spc.c
>> @@ -551,8 +551,9 @@ static struct clk *ve_spc_clk_register(struct device *cpu_dev)
>>
>> static int __init ve_spc_clk_init(void)
>> {
>> - int cpu;
>> + int cpu, cluster;
>> struct clk *clk;
>> + bool init_opp_table[MAX_CLUSTERS] = { false };
>>
>> if (!info)
>> return 0; /* Continue only if SPC is initialised */
>> @@ -578,8 +579,17 @@ static int __init ve_spc_clk_init(void)
>> continue;
>> }
>>
>> + cluster = topology_physical_package_id(cpu_dev->id);
>> + if (init_opp_table[cluster])
>> + continue;
>> +
>> if (ve_init_opp_table(cpu_dev))
>> pr_warn("failed to initialise cpu%d opp table\n", cpu);
>> + else if (dev_pm_opp_set_sharing_cpus(cpu_dev,
>> + topology_core_cpumask(cpu_dev->id)))
>> + pr_warn("failed to mark OPPs shared for cpu%d\n", cpu);
>> +
>> + init_opp_table[cluster] = true;
>> }
>>
>> platform_device_register_simple("vexpress-spc-cpufreq", -1, NULL, 0);
>> diff --git i/drivers/cpufreq/vexpress-spc-cpufreq.c w/drivers/cpufreq/vexpress-spc-cpufreq.c
>> index 506e3f2bf53a..83c85d3d67e3 100644
>> --- i/drivers/cpufreq/vexpress-spc-cpufreq.c
>> +++ w/drivers/cpufreq/vexpress-spc-cpufreq.c
>> @@ -434,7 +434,7 @@ static int ve_spc_cpufreq_init(struct cpufreq_policy *policy)
>> if (cur_cluster < MAX_CLUSTERS) {
>> int cpu;
>>
>> - cpumask_copy(policy->cpus, topology_core_cpumask(policy->cpu));
>> + dev_pm_opp_get_sharing_cpus(cpu_dev, policy->cpus);
>>
>> for_each_cpu(cpu, policy->cpus)
>> per_cpu(physical_cluster, cpu) = cur_cluster;
>
> This is a better *work-around* I would say, as we can't break it the
> way I explained earlier :)
I do agree. Tested CPU hp stress on TC2 and it looks good.
Tested-by: Dietmar Eggemann <[email protected]>