2015-07-20 09:47:19

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

Consider a dual core (0/1) system with two CPUs:
- sharing clock/voltage rails and hence cpufreq-policy
- CPU1 is offline while the cpufreq driver is registered

- cpufreq_add_dev() is called from subsys callback for CPU0 and we
create the policy for the CPUs and create link for CPU1.
- cpufreq_add_dev() is called from subsys callback for CPU1, we find
that the cpu is offline and we try to create a sysfs link for CPU1.
- This results in double addition of the sysfs link and we get this:

WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
[<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
[<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
r4:d74abbd0 r3:d74c0000
[<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
[<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
r3:d6b4dfe7 r2:c0930750
[<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
r6:d75a8960 r5:c0993280 r4:d00aba20
[<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
[<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
[<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
r5:00000001 r4:00000000
[<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
r4:c0ae7e20
[<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)


The check for offline-cpu in cpufreq_add_dev() is present to ensure that
link gets added for the CPUs, that weren't physically present earlier
and we missed the case where a CPU is offline while registering the
driver.

Fix this by keeping track of CPUs for which link is already created, and
avoiding duplicate sysfs entries.

Fixes: 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")
Reported-by: Russell King <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
Russell,

Can you please give this a try? (completely untested).

drivers/cpufreq/cpufreq.c | 26 +++++++++++++++++++++++---
include/linux/cpufreq.h | 1 +
2 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cdbe0676d246..12d089b78cad 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -984,17 +984,26 @@ EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
static int add_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
{
struct device *cpu_dev;
+ int ret;

pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);

if (!policy)
return 0;

+ /* Already added for offline CPUS from subsys callback */
+ if (cpumask_test_cpu(cpu, policy->symlinks))
+ return 0;
+
cpu_dev = get_cpu_device(cpu);
if (WARN_ON(!cpu_dev))
return 0;

- return sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");
+ ret = sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");
+ if (!ret)
+ cpumask_set_cpu(cpu, policy->symlinks);
+
+ return ret;
}

static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
@@ -1007,6 +1016,7 @@ static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
if (WARN_ON(!cpu_dev))
return;

+ cpumask_clear_cpu(cpu, policy->symlinks);
sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
}

@@ -1038,6 +1048,10 @@ static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy)
if (j == policy->kobj_cpu)
continue;

+ /* Already removed for offline CPUS from subsys callback */
+ if (!cpumask_test_cpu(j, policy->symlinks))
+ continue;
+
remove_cpu_dev_symlink(policy, j);
}
}
@@ -1172,11 +1186,14 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
goto err_free_cpumask;

+ if (!zalloc_cpumask_var(&policy->symlinks, GFP_KERNEL))
+ goto err_free_related_cpumask;
+
ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj,
"cpufreq");
if (ret) {
pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
- goto err_free_rcpumask;
+ goto err_free_symlink_cpumask;
}

INIT_LIST_HEAD(&policy->policy_list);
@@ -1193,7 +1210,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)

return policy;

-err_free_rcpumask:
+err_free_symlink_cpumask:
+ free_cpumask_var(policy->symlinks);
+err_free_related_cpumask:
free_cpumask_var(policy->related_cpus);
err_free_cpumask:
free_cpumask_var(policy->cpus);
@@ -1243,6 +1262,7 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify)
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

cpufreq_policy_put_kobj(policy, notify);
+ free_cpumask_var(policy->symlinks);
free_cpumask_var(policy->related_cpus);
free_cpumask_var(policy->cpus);
kfree(policy);
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index a82049683016..b4812f6977c6 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -62,6 +62,7 @@ struct cpufreq_policy {
/* CPUs sharing clock, require sw coordination */
cpumask_var_t cpus; /* Online CPUs only */
cpumask_var_t related_cpus; /* Online + Offline CPUs */
+ cpumask_var_t symlinks; /* CPUs for which cpufreq sysfs directory is present */

unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs
should set cpufreq */
--
2.4.0


2015-07-20 10:36:35

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On Mon, Jul 20, 2015 at 03:17:10PM +0530, Viresh Kumar wrote:
> Consider a dual core (0/1) system with two CPUs:
> - sharing clock/voltage rails and hence cpufreq-policy
> - CPU1 is offline while the cpufreq driver is registered
>
> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> create the policy for the CPUs and create link for CPU1.
> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> that the cpu is offline and we try to create a sysfs link for CPU1.
> - This results in double addition of the sysfs link and we get this:
>
> WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
> sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
> r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
> [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
> [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
> r4:d74abbd0 r3:d74c0000
> [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
> r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
> [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
> r3:d6b4dfe7 r2:c0930750
> [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
> r6:d75a8960 r5:c0993280 r4:d00aba20
> [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
> r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
> [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
> [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
> r5:00000001 r4:00000000
> [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
> r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
> r4:c0ae7e20
> [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
>
>
> The check for offline-cpu in cpufreq_add_dev() is present to ensure that
> link gets added for the CPUs, that weren't physically present earlier
> and we missed the case where a CPU is offline while registering the
> driver.
>
> Fix this by keeping track of CPUs for which link is already created, and
> avoiding duplicate sysfs entries.

Why do we try to create the symlink for CPU devices which we haven't
"detected" yet (iow, we haven't had cpufreq_add_dev() called for)?
Surely we are guaranteed to have cpufreq_add_dev() called for every
CPU which exists in sysfs? So why not _only_ create the sysfs symlinks
when cpufreq_add_dev() is notified that a CPU subsys interface is
present?

Sure, if the policy changes, we need to do maintanence on these symlinks,
but I see only one path down into cpufreq_add_dev_symlink(), which is:

cpufreq_add_dev() -> cpufreq_add_dev_interface() ->
cpufreq_add_dev_symlink()

In other words, only when we see a new CPU interface appears, not when
the policy changes. If the set of related CPUs is policy independent,
why is this information carried in the cpufreq_policy struct?

If it is policy dependent, then I see no code which handles the effect
of a policy change where the policy->related_cpus is different. To me,
that sounds like a rather huge design hole.

Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are
only ever created - they're created for the set of _possible_ CPUs in
the system, not those which are possible and present, and there is no
unregister_cpu() API, only a register_cpu() API. So, cpufreq_remove_dev()
won't be called for CPUs which were present and are no longer present.
This appears to be a misunderstanding of CPU hotplug...

So, cpufreq_remove_dev() will only get called when you call
subsys_interface_unregister(), not when the CPU present mask changes.
I suspect that the code in cpufreq_remove_dev() dealing with "offline"
CPUs even works... I'd recommend reading Documentation/cpu-hotplug.txt:

| cpu_present_mask: Bitmap of CPUs currently present in the system. Not all
| of them may be online. When physical hotplug is processed by the relevant
| subsystem (e.g ACPI) can change and new bit either be added or removed
| from the map depending on the event is hot-add/hot-remove. There are
| currently no locking rules as of now. Typical usage is to init topology
| during boot, at which time hotplug is disabled.
|
| You really dont need to manipulate any of the system cpu maps. They should
| be read-only for most use. When setting up per-cpu resources almost always
| use cpu_possible_mask/for_each_possible_cpu() to iterate.

In other words, I think your usage of cpu_present_mask in this code is
buggy in itself.

Please rethink the design of this code - I think your original change is
mis-designed.

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-21 00:47:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

Hi Russell,

On Mon, Jul 20, 2015 at 12:36 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Mon, Jul 20, 2015 at 03:17:10PM +0530, Viresh Kumar wrote:
>> Consider a dual core (0/1) system with two CPUs:
>> - sharing clock/voltage rails and hence cpufreq-policy
>> - CPU1 is offline while the cpufreq driver is registered
>>
>> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
>> create the policy for the CPUs and create link for CPU1.
>> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
>> that the cpu is offline and we try to create a sysfs link for CPU1.
>> - This results in double addition of the sysfs link and we get this:
>>
>> WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
>> sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
>> Modules linked in:
>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
>> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>> Backtrace:
>> [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
>> r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
>> [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
>> [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
>> r4:d74abbd0 r3:d74c0000
>> [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
>> r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
>> [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
>> r3:d6b4dfe7 r2:c0930750
>> [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
>> r6:d75a8960 r5:c0993280 r4:d00aba20
>> [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
>> r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
>> [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
>> [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
>> r5:00000001 r4:00000000
>> [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
>> r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
>> r4:c0ae7e20
>> [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
>>
>>
>> The check for offline-cpu in cpufreq_add_dev() is present to ensure that
>> link gets added for the CPUs, that weren't physically present earlier
>> and we missed the case where a CPU is offline while registering the
>> driver.
>>
>> Fix this by keeping track of CPUs for which link is already created, and
>> avoiding duplicate sysfs entries.
>
> Why do we try to create the symlink for CPU devices which we haven't
> "detected" yet (iow, we haven't had cpufreq_add_dev() called for)?
> Surely we are guaranteed to have cpufreq_add_dev() called for every
> CPU which exists in sysfs? So why not _only_ create the sysfs symlinks
> when cpufreq_add_dev() is notified that a CPU subsys interface is
> present?

That's something I've overlooked.

Yes, we should be doing exactly that.

> Sure, if the policy changes, we need to do maintanence on these symlinks,
> but I see only one path down into cpufreq_add_dev_symlink(), which is:
>
> cpufreq_add_dev() -> cpufreq_add_dev_interface() ->
> cpufreq_add_dev_symlink()
>
> In other words, only when we see a new CPU interface appears, not when
> the policy changes. If the set of related CPUs is policy independent,
> why is this information carried in the cpufreq_policy struct?

It is not policy-dependent, but the way that information is gathered
is not exactly straightforward. It generally depends on what the
platform firmware tells us about the topology.

> If it is policy dependent, then I see no code which handles the effect
> of a policy change where the policy->related_cpus is different. To me,
> that sounds like a rather huge design hole.
>
> Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are
> only ever created - they're created for the set of _possible_ CPUs in
> the system, not those which are possible and present, and there is no
> unregister_cpu() API, only a register_cpu() API.

There is unregister_cpu() API too, but it is called from
arch_unregister_cpu(). And it calls device_unregister() and all of
the appropriate things happen AFAICS. Eventually,
cpufreq_remove_dev() is called from that path.

Thanks,
Rafael

2015-07-21 01:14:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On Tue, Jul 21, 2015 at 2:47 AM, Rafael J. Wysocki <[email protected]> wrote:
> Hi Russell,
>
> On Mon, Jul 20, 2015 at 12:36 PM, Russell King - ARM Linux
> <[email protected]> wrote:
>> On Mon, Jul 20, 2015 at 03:17:10PM +0530, Viresh Kumar wrote:
>>> Consider a dual core (0/1) system with two CPUs:
>>> - sharing clock/voltage rails and hence cpufreq-policy
>>> - CPU1 is offline while the cpufreq driver is registered
>>>
>>> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
>>> create the policy for the CPUs and create link for CPU1.
>>> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
>>> that the cpu is offline and we try to create a sysfs link for CPU1.
>>> - This results in double addition of the sysfs link and we get this:
>>>
>>> WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
>>> sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
>>> Modules linked in:
>>> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
>>> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
>>> Backtrace:
>>> [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
>>> r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
>>> [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
>>> [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
>>> r4:d74abbd0 r3:d74c0000
>>> [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
>>> r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
>>> [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
>>> r3:d6b4dfe7 r2:c0930750
>>> [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
>>> r6:d75a8960 r5:c0993280 r4:d00aba20
>>> [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
>>> r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
>>> [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
>>> [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
>>> r5:00000001 r4:00000000
>>> [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
>>> r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
>>> r4:c0ae7e20
>>> [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
>>>
>>>
>>> The check for offline-cpu in cpufreq_add_dev() is present to ensure that
>>> link gets added for the CPUs, that weren't physically present earlier
>>> and we missed the case where a CPU is offline while registering the
>>> driver.
>>>
>>> Fix this by keeping track of CPUs for which link is already created, and
>>> avoiding duplicate sysfs entries.
>>
>> Why do we try to create the symlink for CPU devices which we haven't
>> "detected" yet (iow, we haven't had cpufreq_add_dev() called for)?
>> Surely we are guaranteed to have cpufreq_add_dev() called for every
>> CPU which exists in sysfs? So why not _only_ create the sysfs symlinks
>> when cpufreq_add_dev() is notified that a CPU subsys interface is
>> present?
>
> That's something I've overlooked.
>
> Yes, we should be doing exactly that.
>
>> Sure, if the policy changes, we need to do maintanence on these symlinks,
>> but I see only one path down into cpufreq_add_dev_symlink(), which is:
>>
>> cpufreq_add_dev() -> cpufreq_add_dev_interface() ->
>> cpufreq_add_dev_symlink()
>>
>> In other words, only when we see a new CPU interface appears, not when
>> the policy changes. If the set of related CPUs is policy independent,
>> why is this information carried in the cpufreq_policy struct?
>
> It is not policy-dependent, but the way that information is gathered
> is not exactly straightforward. It generally depends on what the
> platform firmware tells us about the topology.
>
>> If it is policy dependent, then I see no code which handles the effect
>> of a policy change where the policy->related_cpus is different. To me,
>> that sounds like a rather huge design hole.
>>
>> Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are
>> only ever created - they're created for the set of _possible_ CPUs in
>> the system, not those which are possible and present, and there is no
>> unregister_cpu() API, only a register_cpu() API.
>
> There is unregister_cpu() API too, but it is called from
> arch_unregister_cpu(). And it calls device_unregister() and all of
> the appropriate things happen AFAICS. Eventually,
> cpufreq_remove_dev() is called from that path.

That said, cpu_present_mask may only be updated after calling
arch_unregister_cpu(), so checking it in cpufreq_remove_dev() doesn't
really help.

It looks like using cpufreq_remove_dev() as the subsys ->remove_dev
callback is a mistake as it cannot really tell the difference between
that code path and the CPU offline one.

Thanks,
Rafael

2015-07-21 10:43:21

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On 20 July 2015 at 16:06, Russell King - ARM Linux
<[email protected]> wrote:
> Why do we try to create the symlink for CPU devices which we haven't
> "detected" yet (iow, we haven't had cpufreq_add_dev() called for)?
> Surely we are guaranteed to have cpufreq_add_dev() called for every
> CPU which exists in sysfs? So why not _only_ create the sysfs symlinks
> when cpufreq_add_dev() is notified that a CPU subsys interface is
> present?
>
> Sure, if the policy changes, we need to do maintanence on these symlinks,
> but I see only one path down into cpufreq_add_dev_symlink(), which is:
>
> cpufreq_add_dev() -> cpufreq_add_dev_interface() ->
> cpufreq_add_dev_symlink()
>
> In other words, only when we see a new CPU interface appears, not when
> the policy changes. If the set of related CPUs is policy independent,
> why is this information carried in the cpufreq_policy struct?
>
> If it is policy dependent, then I see no code which handles the effect
> of a policy change where the policy->related_cpus is different. To me,
> that sounds like a rather huge design hole.
>
> Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are
> only ever created - they're created for the set of _possible_ CPUs in
> the system, not those which are possible and present, and there is no
> unregister_cpu() API, only a register_cpu() API. So, cpufreq_remove_dev()
> won't be called for CPUs which were present and are no longer present.
> This appears to be a misunderstanding of CPU hotplug...
>
> So, cpufreq_remove_dev() will only get called when you call
> subsys_interface_unregister(), not when the CPU present mask changes.
> I suspect that the code in cpufreq_remove_dev() dealing with "offline"
> CPUs even works... I'd recommend reading Documentation/cpu-hotplug.txt:
>
> | cpu_present_mask: Bitmap of CPUs currently present in the system. Not all
> | of them may be online. When physical hotplug is processed by the relevant
> | subsystem (e.g ACPI) can change and new bit either be added or removed
> | from the map depending on the event is hot-add/hot-remove. There are
> | currently no locking rules as of now. Typical usage is to init topology
> | during boot, at which time hotplug is disabled.
> |
> | You really dont need to manipulate any of the system cpu maps. They should
> | be read-only for most use. When setting up per-cpu resources almost always
> | use cpu_possible_mask/for_each_possible_cpu() to iterate.
>
> In other words, I think your usage of cpu_present_mask in this code is
> buggy in itself.
>
> Please rethink the design of this code - I think your original change is
> mis-designed.

I wasn't able to get time in last few days for this, sorry about that..

Will try my best tomorrow to come back to this..

2015-07-21 23:15:05

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

Hi VIresh,

On Mon, Jul 20, 2015 at 11:47 AM, Viresh Kumar <[email protected]> wrote:
> Consider a dual core (0/1) system with two CPUs:
> - sharing clock/voltage rails and hence cpufreq-policy
> - CPU1 is offline while the cpufreq driver is registered
>
> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> create the policy for the CPUs and create link for CPU1.
> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> that the cpu is offline and we try to create a sysfs link for CPU1.

So the problem is that the cpu_is_offline(cpu) check in
cpufreq_add_dev() matches two distinct cases: (1) the CPU was not
present before and it is just being hot-added and (2) the CPU is
initially offline, but present, and this is the first time its device
is registered. In the first case we can expect that the CPU will
become online shortly (although that is not guaranteed too), but in
the second case that very well may not happen.

We need to be able to distinguish between those two cases and your
patch does that, but I'm not sure if this really is the most
straightforward way to do it.

I'm also unsure why you're changing the removal code paths. Is there
any particular failure scenario you're concerned about?

Thanks,
Rafael

2015-07-22 01:29:41

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On Wednesday, July 22, 2015 01:15:01 AM Rafael J. Wysocki wrote:
> Hi VIresh,
>
> On Mon, Jul 20, 2015 at 11:47 AM, Viresh Kumar <[email protected]> wrote:
> > Consider a dual core (0/1) system with two CPUs:
> > - sharing clock/voltage rails and hence cpufreq-policy
> > - CPU1 is offline while the cpufreq driver is registered
> >
> > - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> > create the policy for the CPUs and create link for CPU1.
> > - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> > that the cpu is offline and we try to create a sysfs link for CPU1.
>
> So the problem is that the cpu_is_offline(cpu) check in
> cpufreq_add_dev() matches two distinct cases: (1) the CPU was not
> present before and it is just being hot-added and (2) the CPU is
> initially offline, but present, and this is the first time its device
> is registered. In the first case we can expect that the CPU will
> become online shortly (although that is not guaranteed too), but in
> the second case that very well may not happen.
>
> We need to be able to distinguish between those two cases and your
> patch does that, but I'm not sure if this really is the most
> straightforward way to do it.

It looks like we need a mask of related CPUs that are present. Or,
alternatively, a mask of CPUs that would have been related had they
been present.

That's sort of what your patch is adding, but not quite.

Thanks,
Rafael

2015-07-22 02:33:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On Wednesday, July 22, 2015 03:56:21 AM Rafael J. Wysocki wrote:
> On Wednesday, July 22, 2015 01:15:01 AM Rafael J. Wysocki wrote:
> > Hi VIresh,
> >
> > On Mon, Jul 20, 2015 at 11:47 AM, Viresh Kumar <[email protected]> wrote:
> > > Consider a dual core (0/1) system with two CPUs:
> > > - sharing clock/voltage rails and hence cpufreq-policy
> > > - CPU1 is offline while the cpufreq driver is registered
> > >
> > > - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> > > create the policy for the CPUs and create link for CPU1.
> > > - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> > > that the cpu is offline and we try to create a sysfs link for CPU1.
> >
> > So the problem is that the cpu_is_offline(cpu) check in
> > cpufreq_add_dev() matches two distinct cases: (1) the CPU was not
> > present before and it is just being hot-added and (2) the CPU is
> > initially offline, but present, and this is the first time its device
> > is registered. In the first case we can expect that the CPU will
> > become online shortly (although that is not guaranteed too), but in
> > the second case that very well may not happen.
> >
> > We need to be able to distinguish between those two cases and your
> > patch does that, but I'm not sure if this really is the most
> > straightforward way to do it.
>
> It looks like we need a mask of related CPUs that are present. Or,
> alternatively, a mask of CPUs that would have been related had they
> been present.
>
> That's sort of what your patch is adding, but not quite.

OK, below is my take on this (untested), on top of https://patchwork.kernel.org/patch/6839031/

We still only create policies for online CPUs which may be confusing in some
cases, but that's because drivers may not be able to handle CPUs that aren't
online.


---
drivers/cpufreq/cpufreq.c | 41 +++++++++++++++++++++++++----------------
include/linux/cpufreq.h | 1 +
2 files changed, 26 insertions(+), 16 deletions(-)

Index: linux-pm/include/linux/cpufreq.h
===================================================================
--- linux-pm.orig/include/linux/cpufreq.h
+++ linux-pm/include/linux/cpufreq.h
@@ -62,6 +62,7 @@ struct cpufreq_policy {
/* CPUs sharing clock, require sw coordination */
cpumask_var_t cpus; /* Online CPUs only */
cpumask_var_t related_cpus; /* Online + Offline CPUs */
+ cpumask_var_t real_cpus; /* Related and present */

unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs
should set cpufreq */
Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -1002,7 +1002,7 @@ static int cpufreq_add_dev_symlink(struc
int ret = 0;

/* Some related CPUs might not be present (physically hotplugged) */
- for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
+ for_each_cpu(j, policy->real_cpus) {
if (j == policy->kobj_cpu)
continue;

@@ -1019,7 +1019,7 @@ static void cpufreq_remove_dev_symlink(s
unsigned int j;

/* Some related CPUs might not be present (physically hotplugged) */
- for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
+ for_each_cpu(j, policy->real_cpus) {
if (j == policy->kobj_cpu)
continue;

@@ -1157,11 +1157,14 @@ static struct cpufreq_policy *cpufreq_po
if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
goto err_free_cpumask;

+ if (!zalloc_cpumask_var(&policy->real_cpus, GFP_KERNEL))
+ goto err_free_rcpumask;
+
ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj,
"cpufreq");
if (ret) {
pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
- goto err_free_rcpumask;
+ goto err_free_real_cpus;
}

INIT_LIST_HEAD(&policy->policy_list);
@@ -1178,6 +1181,8 @@ static struct cpufreq_policy *cpufreq_po

return policy;

+err_free_real_cpus:
+ free_cpumask_var(policy->real_cpus);
err_free_rcpumask:
free_cpumask_var(policy->related_cpus);
err_free_cpumask:
@@ -1228,6 +1233,7 @@ static void cpufreq_policy_free(struct c
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

cpufreq_policy_put_kobj(policy, notify);
+ free_cpumask_var(policy->real_cpus);
free_cpumask_var(policy->related_cpus);
free_cpumask_var(policy->cpus);
kfree(policy);
@@ -1252,14 +1258,17 @@ static int cpufreq_add_dev(struct device

pr_debug("adding CPU %u\n", cpu);

- /*
- * Only possible if 'cpu' wasn't physically present earlier and we are
- * here from subsys_interface add callback. A hotplug notifier will
- * follow and we will handle it like logical CPU hotplug then. For now,
- * just create the sysfs link.
- */
- if (cpu_is_offline(cpu))
- return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
+ if (cpu_is_offline(cpu)) {
+ /*
+ * Only possible if we are here from the subsys_interface add
+ * callback. A hotplug notifier will follow and we will handle
+ * it as logical CPU hotplug then. For now, just create the
+ * sysfs link, unless there is no policy.
+ */
+ policy = per_cpu(cpufreq_cpu_data, cpu);
+ return policy && !cpumask_test_and_set_cpu(cpu, policy->real_cpus)
+ ? add_cpu_dev_symlink(policy, cpu) : 0;
+ }

if (!down_read_trylock(&cpufreq_rwsem))
return 0;
@@ -1301,6 +1310,9 @@ static int cpufreq_add_dev(struct device
/* related cpus should atleast have policy->cpus */
cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);

+ cpumask_and(policy->cpus, policy->cpus, cpu_present_mask);
+ cpumask_or(policy->real_cpus, policy->real_cpus, policy->cpus);
+
/*
* affected cpus must always be the one, which are online. We aren't
* managing offline cpus here.
@@ -1525,19 +1537,16 @@ static int cpufreq_remove_dev(struct dev
* link or free policy here.
*/
if (cpu_is_offline(cpu)) {
- struct cpumask mask;
-
if (!policy)
return 0;

- cpumask_copy(&mask, policy->related_cpus);
- cpumask_clear_cpu(cpu, &mask);
+ cpumask_clear_cpu(cpu, policy->real_cpus);

/*
* Free policy only if all policy->related_cpus are removed
* physically.
*/
- if (cpumask_intersects(&mask, cpu_present_mask)) {
+ if (!cpumask_empty(policy->real_cpus)) {
remove_cpu_dev_symlink(policy, cpu);
return 0;
}

2015-07-22 06:57:58

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

Back now, sorry for the delay ..

On 20-07-15, 11:36, Russell King - ARM Linux wrote:
> Why do we try to create the symlink for CPU devices which we haven't
> "detected" yet (iow, we haven't had cpufreq_add_dev() called for)?
> Surely we are guaranteed to have cpufreq_add_dev() called for every
> CPU which exists in sysfs? So why not _only_ create the sysfs symlinks
> when cpufreq_add_dev() is notified that a CPU subsys interface is
> present?

There are lot of combinations in which things can happen and so it was
done to simplify things a bit, but I agree to what you are saying.
Makes sense, let me put some brain again on that path.

> Sure, if the policy changes, we need to do maintanence on these symlinks,

What do you mean by policy changes? The complete policy structure get
reallocated? or any of its properties changes?

The policy structure is only freed today, if either the driver gets
unregistered or its CPUs are physically hotplugged out. No maintenance
then.

> but I see only one path down into cpufreq_add_dev_symlink(), which is:
>
> cpufreq_add_dev() -> cpufreq_add_dev_interface() ->
> cpufreq_add_dev_symlink()
>
> In other words, only when we see a new CPU interface appears, not when
> the policy changes.

Right.

> If the set of related CPUs is policy independent,
> why is this information carried in the cpufreq_policy struct?

Management of policy is done based on what's there in related_cpus and
so its present here. IOW, these are the CPUs which own this policy.

> If it is policy dependent, then I see no code which handles the effect
> of a policy change where the policy->related_cpus is different.

Sorry, but I don't exactly understand the point here. What's policy
change? When a policy is destroyed we take care of all CPUs which are
there in ->cpus or related_cpus mask.. What else are we missing ?

> To me,
> that sounds like a rather huge design hole.

Maybe, just that I haven't understood it well yet.

> Things get worse. Reading drivers/base/cpu.c, CPU interface nodes are
> only ever created - they're created for the set of _possible_ CPUs in
> the system, not those which are possible and present, and there is no
> unregister_cpu() API, only a register_cpu() API. So, cpufreq_remove_dev()
> won't be called for CPUs which were present and are no longer present.
> This appears to be a misunderstanding of CPU hotplug...

You really got me worrying on this one and good that I read Rafael's
reply on this about it being called from arch code.

To be honest, I had no idea how the physical hotplug thing really
works and shouted couple of times on the list for help while working
on this set. Finally I took help of Srivatsa Bhat, who did lot of work
in the hotplug code and my patch was based on his suggestions. I
didn't looked at cpu.c in details to follow all code paths.

> So, cpufreq_remove_dev() will only get called when you call
> subsys_interface_unregister(), not when the CPU present mask changes.

I hope this statement doesn't hold true anymore.

> I suspect that the code in cpufreq_remove_dev() dealing with "offline"
> CPUs even works... I'd recommend reading Documentation/cpu-hotplug.txt:

I never tested it, our lovely ARM world doesn't have a case where we
can do physical hotplug :) .. Or do we have a virtual test to get that
done in some way? That would be helpful for future testing, in case
you are aware of any way out.

> | cpu_present_mask: Bitmap of CPUs currently present in the system. Not all
> | of them may be online. When physical hotplug is processed by the relevant
> | subsystem (e.g ACPI) can change and new bit either be added or removed
> | from the map depending on the event is hot-add/hot-remove. There are
> | currently no locking rules as of now. Typical usage is to init topology
> | during boot, at which time hotplug is disabled.
> |
> | You really dont need to manipulate any of the system cpu maps. They should
> | be read-only for most use. When setting up per-cpu resources almost always
> | use cpu_possible_mask/for_each_possible_cpu() to iterate.
>
> In other words, I think your usage of cpu_present_mask in this code is
> buggy in itself.

I do not accept it yet, I thought it was just fine.

> Please rethink the design of this code - I think your original change is
> mis-designed.

Yeah, based on the suggestion at the top, things need to change a bit
for sure. Thanks for looking into the details of the crappy design :)

--
viresh

2015-07-22 07:04:55

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On 21-07-15, 03:14, Rafael J. Wysocki wrote:
> That said, cpu_present_mask may only be updated after calling
> arch_unregister_cpu(), so checking it in cpufreq_remove_dev() doesn't
> really help.

No, it is indeed useful. This is a snippet from the latest code we
have:

cpumask_copy(&mask, policy->related_cpus);
cpumask_clear_cpu(cpu, &mask);

/*
* Free policy only if all policy->related_cpus are removed
* physically.
*/
if (cpumask_intersects(&mask, cpu_present_mask)) {
remove_cpu_dev_symlink(policy, cpu);
return 0;
}

cpufreq_policy_free(policy, true);



So what we are checking in the 'if' block is: "Is any CPU from
related_cpus, apart from the one getting removed now, present in the
system."

If not, then free the policy.

> It looks like using cpufreq_remove_dev() as the subsys ->remove_dev
> callback is a mistake as it cannot really tell the difference between
> that code path and the CPU offline one.

What do you mean by this? Doesn't the sif parameter confirms that its
called from subsys path ?

--
viresh

2015-07-22 07:12:46

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Avoid double addition/removal of sysfs links

On 22-07-15, 01:15, Rafael J. Wysocki wrote:
> So the problem is that the cpu_is_offline(cpu) check in
> cpufreq_add_dev() matches two distinct cases: (1) the CPU was not
> present before and it is just being hot-added and (2) the CPU is
> initially offline, but present, and this is the first time its device
> is registered. In the first case we can expect that the CPU will
> become online shortly (although that is not guaranteed too), but in
> the second case that very well may not happen.

Yeah.

> We need to be able to distinguish between those two cases and your
> patch does that, but I'm not sure if this really is the most
> straightforward way to do it.

Maybe yeah. I will take another look into that after considering
Russell's input.

> I'm also unsure why you're changing the removal code paths. Is there
> any particular failure scenario you're concerned about?

The same issue is present here too. The problem was that cpu_offline()
check was getting hit for a CPU that is present in related_cpus mask.
While allocating/freeing the policy, we create links for all
related_cpus and the cpu_offline() check was adding/removing the link
again.

--
viresh