2015-07-22 12:07:37

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH V2] cpufreq: Fix double addition of sysfs links

Consider a dual core (0/1) system with two CPUs:
- sharing clock/voltage rails and hence cpufreq-policy
- CPU1 is offline while the cpufreq driver is registered
- cpufreq_add_dev() is called from subsys callback for CPU0 and we
create the policy for the group of CPUs and create links for all
present CPUs, i.e. CPU1 as well.
- cpufreq_add_dev() is called from subsys callback for CPU1, we find
that the cpu is offline and we try to create a sysfs link for CPU1.

This results in double addtion of the sysfs link and we will get this:

WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
[<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
[<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
r4:d74abbd0 r3:d74c0000
[<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
[<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
r3:d6b4dfe7 r2:c0930750
[<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
r6:d75a8960 r5:c0993280 r4:d00aba20
[<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
[<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
[<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
r5:00000001 r4:00000000
[<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
r4:c0ae7e20
[<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)

The check for offline-cpu in cpufreq_add_dev() is to ensure that link
gets added for the CPUs which weren't physically present earlier and
that misses the case where a CPU is offline while registering the
driver.

To fix this properly, don't create these links when the policy get
initialized. Rather wait for individual subsys callback for CPUs to
add/remove these links. This simplifies most of the code leaving
cpufreq_remove_dev().

The problem is that, we might remove cpu which was owner of policy->kobj
in sysfs, before other CPUs are removed. Fix this by the solution we
have been using until very recently, in which we move the kobject to any
other CPU, for which remove is yet to be called.

Tested on dual core exynos board with cpufreq-dt driver. The driver was
compiled as module and inserted/removed multiple times on a running
kernel.

Fixes: 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")
Reported-and-suggested-by: Russell King <[email protected]>
Signed-off-by: Viresh Kumar <[email protected]>
---
V1->V2: Completely changed, please review again :)

@Rafael: I didn't review your solution and gave this one because I
thought Russell suggested the right thing. i.e. don't create links in
the beginning.

This is based of 4.2-rc3 and so your other patch,
https://patchwork.kernel.org/patch/6839031/ has to be rebased over it.

I didn't rebase this patch over yours for two reasons:
- Yours wasn't necessarily 4.2 material.
- I already mentioned a problem in that patch.

@Russell: I hope this will look much better than V1 to you. Please give
it a try once you get some time.

drivers/cpufreq/cpufreq.c | 165 ++++++++++++++++++----------------------------
include/linux/cpufreq.h | 1 +
2 files changed, 65 insertions(+), 101 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 26063afb3eba..81c2417e52f4 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -966,67 +966,6 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr)
}
EXPORT_SYMBOL(cpufreq_sysfs_remove_file);

-static int add_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
-{
- struct device *cpu_dev;
-
- pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
-
- if (!policy)
- return 0;
-
- cpu_dev = get_cpu_device(cpu);
- if (WARN_ON(!cpu_dev))
- return 0;
-
- return sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");
-}
-
-static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
-{
- struct device *cpu_dev;
-
- pr_debug("%s: Removing symlink for CPU: %u\n", __func__, cpu);
-
- cpu_dev = get_cpu_device(cpu);
- if (WARN_ON(!cpu_dev))
- return;
-
- sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
-}
-
-/* Add/remove symlinks for all related CPUs */
-static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy)
-{
- unsigned int j;
- int ret = 0;
-
- /* Some related CPUs might not be present (physically hotplugged) */
- for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
- if (j == policy->kobj_cpu)
- continue;
-
- ret = add_cpu_dev_symlink(policy, j);
- if (ret)
- break;
- }
-
- return ret;
-}
-
-static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy)
-{
- unsigned int j;
-
- /* Some related CPUs might not be present (physically hotplugged) */
- for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
- if (j == policy->kobj_cpu)
- continue;
-
- remove_cpu_dev_symlink(policy, j);
- }
-}
-
static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
struct device *dev)
{
@@ -1057,7 +996,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
return ret;
}

- return cpufreq_add_dev_symlink(policy);
+ return 0;
}

static void cpufreq_init_policy(struct cpufreq_policy *policy)
@@ -1163,11 +1102,14 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
goto err_free_cpumask;

+ if (!zalloc_cpumask_var(&policy->symlinks, GFP_KERNEL))
+ goto err_free_related_cpumask;
+
ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj,
"cpufreq");
if (ret) {
pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
- goto err_free_rcpumask;
+ goto err_free_symlink_cpumask;
}

INIT_LIST_HEAD(&policy->policy_list);
@@ -1184,7 +1126,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)

return policy;

-err_free_rcpumask:
+err_free_symlink_cpumask:
+ free_cpumask_var(policy->symlinks);
+err_free_related_cpumask:
free_cpumask_var(policy->related_cpus);
err_free_cpumask:
free_cpumask_var(policy->cpus);
@@ -1204,7 +1148,6 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy, bool notify)
CPUFREQ_REMOVE_POLICY, policy);

down_write(&policy->rwsem);
- cpufreq_remove_dev_symlink(policy);
kobj = &policy->kobj;
cmp = &policy->kobj_unregister;
up_write(&policy->rwsem);
@@ -1234,6 +1177,7 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify)
write_unlock_irqrestore(&cpufreq_driver_lock, flags);

cpufreq_policy_put_kobj(policy, notify);
+ free_cpumask_var(policy->symlinks);
free_cpumask_var(policy->related_cpus);
free_cpumask_var(policy->cpus);
kfree(policy);
@@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
unsigned int j, cpu = dev->id;
int ret = -ENOMEM;
- struct cpufreq_policy *policy;
+ struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
unsigned long flags;
bool recover_policy = !sif;

pr_debug("adding CPU %u\n", cpu);

+ /* sysfs links are only created on subsys callback */
+ if (sif && policy) {
+ pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
+ ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
+ if (ret) {
+ dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
+ __func__, cpu, ret);
+ return ret;
+ }
+
+ /* Track CPUs for which sysfs links are created */
+ cpumask_set_cpu(cpu, policy->symlinks);
+ }
+
/*
- * Only possible if 'cpu' wasn't physically present earlier and we are
- * here from subsys_interface add callback. A hotplug notifier will
- * follow and we will handle it like logical CPU hotplug then. For now,
- * just create the sysfs link.
+ * A hotplug notifier will follow and we will take care of rest
+ * of the initialization then.
*/
if (cpu_is_offline(cpu))
- return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
+ return 0;

if (!down_read_trylock(&cpufreq_rwsem))
return 0;

/* Check if this CPU already has a policy to manage it */
- policy = per_cpu(cpufreq_cpu_data, cpu);
if (policy && !policy_is_inactive(policy)) {
WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
ret = cpufreq_add_policy_cpu(policy, cpu, dev);
@@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);

- /* Free the policy only if the driver is getting removed. */
- if (sif)
- cpufreq_policy_free(policy, true);
-
return 0;
}

@@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
{
unsigned int cpu = dev->id;
+ struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
int ret;

- /*
- * Only possible if 'cpu' is getting physically removed now. A hotplug
- * notifier should have already been called and we just need to remove
- * link or free policy here.
- */
- if (cpu_is_offline(cpu)) {
- struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
- struct cpumask mask;
+ if (!policy)
+ return 0;

- if (!policy)
- return 0;
+ if (cpu_online(cpu)) {
+ ret = __cpufreq_remove_dev_prepare(dev, sif);
+ if (!ret)
+ ret = __cpufreq_remove_dev_finish(dev, sif);
+ if (ret)
+ return ret;
+ }

- cpumask_copy(&mask, policy->related_cpus);
- cpumask_clear_cpu(cpu, &mask);
+ /* sysfs links are removed only on subsys callback */
+ if (cpumask_test_cpu(cpu, policy->symlinks)) {
+ dev_dbg(dev, "%s: Removing symlink for CPU: %u\n", __func__,
+ cpu);
+ cpumask_clear_cpu(cpu, policy->symlinks);
+ sysfs_remove_link(&dev->kobj, "cpufreq");
+ return 0;
+ }

+ if (cpumask_weight(policy->symlinks)) {
/*
- * Free policy only if all policy->related_cpus are removed
- * physically.
+ * Okay, we still have some CPUs left. Transfer the ownership of
+ * policy to one of them. Would be better to pass that to
+ * cpumask_last() as that will be the last CPU to get removed,
+ * but there is no API to get last cpu of the mask. Lets move it
+ * to the first cpu in the mask.
*/
- if (cpumask_intersects(&mask, cpu_present_mask)) {
- remove_cpu_dev_symlink(policy, cpu);
- return 0;
- }
+ int new_cpu = cpumask_first(policy->symlinks);
+ struct device *new_dev = get_cpu_device(new_cpu);

- cpufreq_policy_free(policy, true);
- return 0;
- }
+ dev_dbg(dev, "%s: Migrating kobj from %d to %d\n", __func__,
+ cpu, new_cpu);

- ret = __cpufreq_remove_dev_prepare(dev, sif);
+ cpumask_clear_cpu(new_cpu, policy->symlinks);
+ sysfs_remove_link(&new_dev->kobj, "cpufreq");

- if (!ret)
- ret = __cpufreq_remove_dev_finish(dev, sif);
+ policy->kobj_cpu = new_cpu;
+ WARN_ON(kobject_move(&policy->kobj, &new_dev->kobj));
+ } else {
+ /* This is the last CPU to be removed */
+ cpufreq_policy_free(policy, true);
+ }

- return ret;
+ return 0;
}

static void handle_update(struct work_struct *work)
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index 29ad97c34fd5..c748d1cd0815 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -62,6 +62,7 @@ struct cpufreq_policy {
/* CPUs sharing clock, require sw coordination */
cpumask_var_t cpus; /* Online CPUs only */
cpumask_var_t related_cpus; /* Online + Offline CPUs */
+ cpumask_var_t symlinks; /* CPUs for which cpufreq sysfs directory is present */

unsigned int shared_type; /* ACPI: ANY or ALL affected CPUs
should set cpufreq */
--
2.4.0


2015-07-22 12:44:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

Hi Viresh,

On Wed, Jul 22, 2015 at 2:07 PM, Viresh Kumar <[email protected]> wrote:
> Consider a dual core (0/1) system with two CPUs:
> - sharing clock/voltage rails and hence cpufreq-policy
> - CPU1 is offline while the cpufreq driver is registered
> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> create the policy for the group of CPUs and create links for all
> present CPUs, i.e. CPU1 as well.
> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> that the cpu is offline and we try to create a sysfs link for CPU1.
>
> This results in double addtion of the sysfs link and we will get this:
>
> WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
> sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
> r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
> [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
> [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
> r4:d74abbd0 r3:d74c0000
> [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
> r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
> [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
> r3:d6b4dfe7 r2:c0930750
> [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
> r6:d75a8960 r5:c0993280 r4:d00aba20
> [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
> r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
> [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
> [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
> r5:00000001 r4:00000000
> [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
> r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
> r4:c0ae7e20
> [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
>
> The check for offline-cpu in cpufreq_add_dev() is to ensure that link
> gets added for the CPUs which weren't physically present earlier and
> that misses the case where a CPU is offline while registering the
> driver.
>
> To fix this properly, don't create these links when the policy get
> initialized. Rather wait for individual subsys callback for CPUs to
> add/remove these links. This simplifies most of the code leaving
> cpufreq_remove_dev().
>
> The problem is that, we might remove cpu which was owner of policy->kobj
> in sysfs, before other CPUs are removed. Fix this by the solution we
> have been using until very recently, in which we move the kobject to any
> other CPU, for which remove is yet to be called.
>
> Tested on dual core exynos board with cpufreq-dt driver. The driver was
> compiled as module and inserted/removed multiple times on a running
> kernel.
>
> Fixes: 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")
> Reported-and-suggested-by: Russell King <[email protected]>
> Signed-off-by: Viresh Kumar <[email protected]>

That looks good to me overall, but please let me rename your new
"symlinks" CPU mask to "dependent_cpus".

> ---
> V1->V2: Completely changed, please review again :)
>
> @Rafael: I didn't review your solution and gave this one because I
> thought Russell suggested the right thing. i.e. don't create links in
> the beginning.

Sure. I prefer this approach too.

> This is based of 4.2-rc3 and so your other patch,
> https://patchwork.kernel.org/patch/6839031/ has to be rebased over it.

OK

> I didn't rebase this patch over yours for two reasons:
> - Yours wasn't necessarily 4.2 material.

Right.

> - I already mentioned a problem in that patch.

I'm not sure if the problem is really there, but after the changes in
this patch it doesn't really matter. :-)

Thanks,
Rafael

2015-07-22 13:16:35

by Russell King - ARM Linux

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

On Wed, Jul 22, 2015 at 05:37:18PM +0530, Viresh Kumar wrote:
> Consider a dual core (0/1) system with two CPUs:
> - sharing clock/voltage rails and hence cpufreq-policy
> - CPU1 is offline while the cpufreq driver is registered
> - cpufreq_add_dev() is called from subsys callback for CPU0 and we
> create the policy for the group of CPUs and create links for all
> present CPUs, i.e. CPU1 as well.
> - cpufreq_add_dev() is called from subsys callback for CPU1, we find
> that the cpu is offline and we try to create a sysfs link for CPU1.
>
> This results in double addtion of the sysfs link and we will get this:
>
> WARNING: CPU: 0 PID: 1 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x60/0x7c()
> sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.2.0-rc2+ #1704
> Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> Backtrace:
> [<c0013248>] (dump_backtrace) from [<c00133e4>] (show_stack+0x18/0x1c)
> r6:c01a1f30 r5:0000001f r4:00000000 r3:00000000
> [<c00133cc>] (show_stack) from [<c076920c>] (dump_stack+0x7c/0x98)
> [<c0769190>] (dump_stack) from [<c0029ab4>] (warn_slowpath_common+0x80/0xbc)
> r4:d74abbd0 r3:d74c0000
> [<c0029a34>] (warn_slowpath_common) from [<c0029b94>] (warn_slowpath_fmt+0x38/0x40)
> r8:ffffffef r7:00000000 r6:d75a8960 r5:c0993280 r4:d6b4d000
> [<c0029b60>] (warn_slowpath_fmt) from [<c01a1f30>] (sysfs_warn_dup+0x60/0x7c)
> r3:d6b4dfe7 r2:c0930750
> [<c01a1ed0>] (sysfs_warn_dup) from [<c01a22c8>] (sysfs_do_create_link_sd+0xb8/0xc0)
> r6:d75a8960 r5:c0993280 r4:d00aba20
> [<c01a2210>] (sysfs_do_create_link_sd) from [<c01a22fc>] (sysfs_create_link+0x2c/0x3c)
> r10:00000001 r8:c14db3c8 r7:d7b89010 r6:c0ae7c60 r5:d7b89010 r4:d00d1200
> [<c01a22d0>] (sysfs_create_link) from [<c0506160>] (add_cpu_dev_symlink+0x34/0x5c)
> [<c050612c>] (add_cpu_dev_symlink) from [<c05084d0>] (cpufreq_add_dev+0x674/0x794)
> r5:00000001 r4:00000000
> [<c0507e5c>] (cpufreq_add_dev) from [<c03db114>] (subsys_interface_register+0x8c/0xd0)
> r10:00000003 r9:d7bb01f0 r8:c14db3c8 r7:00106738 r6:c0ae7c60 r5:c0acbd08
> r4:c0ae7e20
> [<c03db088>] (subsys_interface_register) from [<c0508a2c>] (cpufreq_register_driver+0x104/0x1f4)
>
> The check for offline-cpu in cpufreq_add_dev() is to ensure that link
> gets added for the CPUs which weren't physically present earlier and
> that misses the case where a CPU is offline while registering the
> driver.
>
> To fix this properly, don't create these links when the policy get
> initialized. Rather wait for individual subsys callback for CPUs to
> add/remove these links. This simplifies most of the code leaving
> cpufreq_remove_dev().
>
> The problem is that, we might remove cpu which was owner of policy->kobj
> in sysfs, before other CPUs are removed. Fix this by the solution we
> have been using until very recently, in which we move the kobject to any
> other CPU, for which remove is yet to be called.
>
> Tested on dual core exynos board with cpufreq-dt driver. The driver was
> compiled as module and inserted/removed multiple times on a running
> kernel.
>
> Fixes: 87549141d516 ("cpufreq: Stop migrating sysfs files on hotplug")
> Reported-and-suggested-by: Russell King <[email protected]>
> Signed-off-by: Viresh Kumar <[email protected]>
> ---
> V1->V2: Completely changed, please review again :)
>
> @Rafael: I didn't review your solution and gave this one because I
> thought Russell suggested the right thing. i.e. don't create links in
> the beginning.
>
> This is based of 4.2-rc3 and so your other patch,
> https://patchwork.kernel.org/patch/6839031/ has to be rebased over it.
>
> I didn't rebase this patch over yours for two reasons:
> - Yours wasn't necessarily 4.2 material.
> - I already mentioned a problem in that patch.
>
> @Russell: I hope this will look much better than V1 to you. Please give
> it a try once you get some time.
>
> drivers/cpufreq/cpufreq.c | 165 ++++++++++++++++++----------------------------
> include/linux/cpufreq.h | 1 +
> 2 files changed, 65 insertions(+), 101 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 26063afb3eba..81c2417e52f4 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -966,67 +966,6 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr)
> }
> EXPORT_SYMBOL(cpufreq_sysfs_remove_file);
>
> -static int add_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
> -{
> - struct device *cpu_dev;
> -
> - pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
> -
> - if (!policy)
> - return 0;
> -
> - cpu_dev = get_cpu_device(cpu);
> - if (WARN_ON(!cpu_dev))
> - return 0;
> -
> - return sysfs_create_link(&cpu_dev->kobj, &policy->kobj, "cpufreq");
> -}
> -
> -static void remove_cpu_dev_symlink(struct cpufreq_policy *policy, int cpu)
> -{
> - struct device *cpu_dev;
> -
> - pr_debug("%s: Removing symlink for CPU: %u\n", __func__, cpu);
> -
> - cpu_dev = get_cpu_device(cpu);
> - if (WARN_ON(!cpu_dev))
> - return;
> -
> - sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
> -}
> -
> -/* Add/remove symlinks for all related CPUs */
> -static int cpufreq_add_dev_symlink(struct cpufreq_policy *policy)
> -{
> - unsigned int j;
> - int ret = 0;
> -
> - /* Some related CPUs might not be present (physically hotplugged) */
> - for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
> - if (j == policy->kobj_cpu)
> - continue;
> -
> - ret = add_cpu_dev_symlink(policy, j);
> - if (ret)
> - break;
> - }
> -
> - return ret;
> -}
> -
> -static void cpufreq_remove_dev_symlink(struct cpufreq_policy *policy)
> -{
> - unsigned int j;
> -
> - /* Some related CPUs might not be present (physically hotplugged) */
> - for_each_cpu_and(j, policy->related_cpus, cpu_present_mask) {
> - if (j == policy->kobj_cpu)
> - continue;
> -
> - remove_cpu_dev_symlink(policy, j);
> - }
> -}
> -
> static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
> struct device *dev)
> {
> @@ -1057,7 +996,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
> return ret;
> }
>
> - return cpufreq_add_dev_symlink(policy);
> + return 0;
> }
>
> static void cpufreq_init_policy(struct cpufreq_policy *policy)
> @@ -1163,11 +1102,14 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
> if (!zalloc_cpumask_var(&policy->related_cpus, GFP_KERNEL))
> goto err_free_cpumask;
>
> + if (!zalloc_cpumask_var(&policy->symlinks, GFP_KERNEL))
> + goto err_free_related_cpumask;
> +
> ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, &dev->kobj,
> "cpufreq");
> if (ret) {
> pr_err("%s: failed to init policy->kobj: %d\n", __func__, ret);
> - goto err_free_rcpumask;
> + goto err_free_symlink_cpumask;
> }
>
> INIT_LIST_HEAD(&policy->policy_list);
> @@ -1184,7 +1126,9 @@ static struct cpufreq_policy *cpufreq_policy_alloc(struct device *dev)
>
> return policy;
>
> -err_free_rcpumask:
> +err_free_symlink_cpumask:
> + free_cpumask_var(policy->symlinks);
> +err_free_related_cpumask:
> free_cpumask_var(policy->related_cpus);
> err_free_cpumask:
> free_cpumask_var(policy->cpus);
> @@ -1204,7 +1148,6 @@ static void cpufreq_policy_put_kobj(struct cpufreq_policy *policy, bool notify)
> CPUFREQ_REMOVE_POLICY, policy);
>
> down_write(&policy->rwsem);
> - cpufreq_remove_dev_symlink(policy);
> kobj = &policy->kobj;
> cmp = &policy->kobj_unregister;
> up_write(&policy->rwsem);
> @@ -1234,6 +1177,7 @@ static void cpufreq_policy_free(struct cpufreq_policy *policy, bool notify)
> write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>
> cpufreq_policy_put_kobj(policy, notify);
> + free_cpumask_var(policy->symlinks);
> free_cpumask_var(policy->related_cpus);
> free_cpumask_var(policy->cpus);
> kfree(policy);
> @@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int j, cpu = dev->id;
> int ret = -ENOMEM;
> - struct cpufreq_policy *policy;
> + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> unsigned long flags;
> bool recover_policy = !sif;
>
> pr_debug("adding CPU %u\n", cpu);
>
> + /* sysfs links are only created on subsys callback */
> + if (sif && policy) {
> + pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);

dev_dbg() ?

> + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> + if (ret) {
> + dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
> + __func__, cpu, ret);

I wonder why we print the CPU number - since it's from dev->id, isn't it
included in the struct device name printed by dev_err() already?

> + return ret;
> + }
> +
> + /* Track CPUs for which sysfs links are created */
> + cpumask_set_cpu(cpu, policy->symlinks);
> + }
> +

I guess this will do for -rc, but it's not particularly nice. Can I
suggest splitting the two operations here - the add_dev callback from
the subsys interface, and the handling of hotplug online/offline
notifications.

You only need to do the above for the subsys interface, and you only
need to do the remainder if the CPU was online.

Also, what about the CPU "owning" the policy?

So, this would become:

static int cpufreq_cpu_online(struct device *dev)
{
pr_debug("bringing CPU%d online\n", dev->id);
... stuff to do when CPU is online ...
}

static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
{
unsigned int cpu = dev->id;
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);

pr_debug("adding CPU %u\n", cpu);

if (policy && policy->kobj_cpu != cpu) {
dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
if (ret) {
dev_err(dev,
"%s: Failed to create cpufreq symlink (%d)\n",
__func__, ret);
return ret;
}

/* Track CPUs for which sysfs links are created */
cpumask_set_cpu(cpu, policy->symlinks);
}

/* Now do the remainder if the CPU is already online */
if (cpu_online(cpu))
return cpufreq_cpu_online(dev);

return 0;
}

Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call
to cpufreq_cpu_online(dev) instead.

Doing the similar thing for the cpufreq_remove_dev() path would also make
sense.

> /*
> - * Only possible if 'cpu' wasn't physically present earlier and we are
> - * here from subsys_interface add callback. A hotplug notifier will
> - * follow and we will handle it like logical CPU hotplug then. For now,
> - * just create the sysfs link.
> + * A hotplug notifier will follow and we will take care of rest
> + * of the initialization then.
> */
> if (cpu_is_offline(cpu))
> - return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
> + return 0;
>
> if (!down_read_trylock(&cpufreq_rwsem))
> return 0;
>
> /* Check if this CPU already has a policy to manage it */
> - policy = per_cpu(cpufreq_cpu_data, cpu);
> if (policy && !policy_is_inactive(policy)) {
> WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
> ret = cpufreq_add_policy_cpu(policy, cpu, dev);
> @@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> if (cpufreq_driver->exit)
> cpufreq_driver->exit(policy);
>
> - /* Free the policy only if the driver is getting removed. */
> - if (sif)
> - cpufreq_policy_free(policy, true);
> -
> return 0;
> }
>
> @@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int cpu = dev->id;
> + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> int ret;
>
> - /*
> - * Only possible if 'cpu' is getting physically removed now. A hotplug
> - * notifier should have already been called and we just need to remove
> - * link or free policy here.
> - */
> - if (cpu_is_offline(cpu)) {
> - struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> - struct cpumask mask;
> + if (!policy)
> + return 0;
>
> - if (!policy)
> - return 0;
> + if (cpu_online(cpu)) {
> + ret = __cpufreq_remove_dev_prepare(dev, sif);
> + if (!ret)
> + ret = __cpufreq_remove_dev_finish(dev, sif);
> + if (ret)
> + return ret;

Here, I have to wonder about this. If you look at the code in
drivers/base/bus.c, you'll notice that the ->remove_dev return code is
not used (personally, I hate interfaces which are created with an int
return type for a removal operation, but then ignore the return code.
Either have the return code and use it, or don't confuse driver authors
by having one.)

What this means is that in the remove path, the device _is_ going away,
whether you like it or not. So, if you have an error early in your
remove path, returning that error does you no good - you end up leaking
memory because subsequent cleanup doesn't get done.

It's better to either ensure that your removal path can't fail, or if it
can, to reasonably clean up as much as you can (which here, means
continuing to remove the symlink.)

If you adopt my suggestion above, then cpufreq_remove_dev() becomes
something like:

static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
{
unsigned int cpu = dev->id;
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);

if (cpu_is_online(cpu))
cpufreq_cpu_offline(dev);

if (policy) {
if (cpumask_test_cpu(cpu, policy->symlinks)) {
dev_dbg(dev, "%s: Removing cpufreq symlink\n",
__func__);
cpumask_clear_cpu(cpu, policy->symlinks);
sysfs_remove_link(&dev->kobj, "cpufreq");
}

if (policy->kobj_cpu == cpu) {
... migration code and final CPU deletion code ...
}
}

return 0;
}

which IMHO is easier to read and follow, and more symetrical with
cpufreq_add_dev().

Now, I'm left wondering about a few things:

1. whether having a CPU "own" the policy, and having the cpufreq/ directory
beneath the cpuN node is a good idea, or whether it would be better to
place this in the /sys/devices/system/cpufreq/ subdirectory and always
symlink to there. It strikes me that would simplify the code a little.

2. whether using a kref to track the usage of the policy would be better
than tracking symlink weight (or in the case of (1) being adopted,
whether the symlink cpumask becomes empty.) Note that the symlink
weight becoming zero without (1) (in other words, no symlinks) is not
the correct condition for freeing the policy - we still have one CPU,
that being the CPU for policy->kobj_cpu.

3. what happens when 'policy' is NULL at the point when the first (few) CPUs
are added - how do the symlinks get created later if/when policy becomes
non-NULL (can it?)

4. what about policy->related_cpus ? What if one of the CPUs being added is
not in policy->related_cpus? Should we still go ahead and create the
symlink?

--
FTTC broadband for 0.8mile line: currently at 10.5Mbps down 400kbps up
according to speedtest.net.

2015-07-22 16:42:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

Hi Russell,

On Wed, Jul 22, 2015 at 3:15 PM, Russell King - ARM Linux
<[email protected]> wrote:
> On Wed, Jul 22, 2015 at 05:37:18PM +0530, Viresh Kumar wrote:

[cut]

>> @@ -1252,26 +1196,37 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>> {
>> unsigned int j, cpu = dev->id;
>> int ret = -ENOMEM;
>> - struct cpufreq_policy *policy;
>> + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>> unsigned long flags;
>> bool recover_policy = !sif;
>>
>> pr_debug("adding CPU %u\n", cpu);
>>
>> + /* sysfs links are only created on subsys callback */
>> + if (sif && policy) {
>> + pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
>
> dev_dbg() ?

Right.

>> + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
>> + if (ret) {
>> + dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",
>> + __func__, cpu, ret);
>
> I wonder why we print the CPU number - since it's from dev->id, isn't it
> included in the struct device name printed by dev_err() already?

It is AFAICS.

>> + return ret;
>> + }
>> +
>> + /* Track CPUs for which sysfs links are created */
>> + cpumask_set_cpu(cpu, policy->symlinks);
>> + }
>> +
>
> I guess this will do for -rc, but it's not particularly nice.

Right.

This is what I'm queuing up for -rc:
http://git.kernel.org/cgit/linux/kernel/git/rafael/linux-pm.git/commit/?h=bleeding-edge&id=9b07109f06a1edd6e636b1e7397157eae0e6baa4

I've made a few other minor changes (discussed with Viresh on IRC) to it.

> Can I suggest splitting the two operations here - the add_dev callback from
> the subsys interface, and the handling of hotplug online/offline
> notifications.
>
> You only need to do the above for the subsys interface, and you only
> need to do the remainder if the CPU was online.
>
> Also, what about the CPU "owning" the policy?
>
> So, this would become:
>
> static int cpufreq_cpu_online(struct device *dev)
> {
> pr_debug("bringing CPU%d online\n", dev->id);
> ... stuff to do when CPU is online ...
> }
>
> static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int cpu = dev->id;
> struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>
> pr_debug("adding CPU %u\n", cpu);
>
> if (policy && policy->kobj_cpu != cpu) {
> dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
> ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> if (ret) {
> dev_err(dev,
> "%s: Failed to create cpufreq symlink (%d)\n",
> __func__, ret);
> return ret;
> }
>
> /* Track CPUs for which sysfs links are created */
> cpumask_set_cpu(cpu, policy->symlinks);
> }
>
> /* Now do the remainder if the CPU is already online */
> if (cpu_online(cpu))
> return cpufreq_cpu_online(dev);
>
> return 0;
> }
>
> Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call
> to cpufreq_cpu_online(dev) instead.

These are good suggestions. I actually have a plan to do that cleanup
on top of the VIresh's patch.

> Doing the similar thing for the cpufreq_remove_dev() path would also make
> sense.

cpufreq_remove_dev() is only called from bus.c already, but it also
has to handle the driver removal case.

And I already have a patch to drop the "sif" argument from
__cpufreq_remove_dev_prepare/finish() that are called on CPU offline.

>> /*
>> - * Only possible if 'cpu' wasn't physically present earlier and we are
>> - * here from subsys_interface add callback. A hotplug notifier will
>> - * follow and we will handle it like logical CPU hotplug then. For now,
>> - * just create the sysfs link.
>> + * A hotplug notifier will follow and we will take care of rest
>> + * of the initialization then.
>> */
>> if (cpu_is_offline(cpu))
>> - return add_cpu_dev_symlink(per_cpu(cpufreq_cpu_data, cpu), cpu);
>> + return 0;
>>
>> if (!down_read_trylock(&cpufreq_rwsem))
>> return 0;
>>
>> /* Check if this CPU already has a policy to manage it */
>> - policy = per_cpu(cpufreq_cpu_data, cpu);
>> if (policy && !policy_is_inactive(policy)) {
>> WARN_ON(!cpumask_test_cpu(cpu, policy->related_cpus));
>> ret = cpufreq_add_policy_cpu(policy, cpu, dev);
>> @@ -1506,10 +1461,6 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>> if (cpufreq_driver->exit)
>> cpufreq_driver->exit(policy);
>>
>> - /* Free the policy only if the driver is getting removed. */
>> - if (sif)
>> - cpufreq_policy_free(policy, true);
>> -
>> return 0;
>> }
>>
>> @@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
>> static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>> {
>> unsigned int cpu = dev->id;
>> + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>> int ret;
>>
>> - /*
>> - * Only possible if 'cpu' is getting physically removed now. A hotplug
>> - * notifier should have already been called and we just need to remove
>> - * link or free policy here.
>> - */
>> - if (cpu_is_offline(cpu)) {
>> - struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>> - struct cpumask mask;
>> + if (!policy)
>> + return 0;
>>
>> - if (!policy)
>> - return 0;
>> + if (cpu_online(cpu)) {
>> + ret = __cpufreq_remove_dev_prepare(dev, sif);
>> + if (!ret)
>> + ret = __cpufreq_remove_dev_finish(dev, sif);
>> + if (ret)
>> + return ret;
>
> Here, I have to wonder about this. If you look at the code in
> drivers/base/bus.c, you'll notice that the ->remove_dev return code is
> not used (personally, I hate interfaces which are created with an int
> return type for a removal operation, but then ignore the return code.
> Either have the return code and use it, or don't confuse driver authors
> by having one.)
>
> What this means is that in the remove path, the device _is_ going away,
> whether you like it or not. So, if you have an error early in your
> remove path, returning that error does you no good - you end up leaking
> memory because subsequent cleanup doesn't get done.

Right.

> It's better to either ensure that your removal path can't fail, or if it
> can, to reasonably clean up as much as you can (which here, means
> continuing to remove the symlink.)
>
> If you adopt my suggestion above, then cpufreq_remove_dev() becomes
> something like:
>
> static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int cpu = dev->id;
> struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>
> if (cpu_is_online(cpu))
> cpufreq_cpu_offline(dev);
>
> if (policy) {
> if (cpumask_test_cpu(cpu, policy->symlinks)) {
> dev_dbg(dev, "%s: Removing cpufreq symlink\n",
> __func__);
> cpumask_clear_cpu(cpu, policy->symlinks);
> sysfs_remove_link(&dev->kobj, "cpufreq");
> }
>
> if (policy->kobj_cpu == cpu) {
> ... migration code and final CPU deletion code ...
> }
> }
>
> return 0;
> }
>
> which IMHO is easier to read and follow, and more symetrical with
> cpufreq_add_dev().

Agreed (in general).

> Now, I'm left wondering about a few things:
>
> 1. whether having a CPU "own" the policy, and having the cpufreq/ directory
> beneath the cpuN node is a good idea, or whether it would be better to
> place this in the /sys/devices/system/cpufreq/ subdirectory and always
> symlink to there. It strikes me that would simplify the code a little.

That is a good idea IMO. A small complication here is that there may
be multiple policies in the system and a kobject is needed for each of
them.

> 2. whether using a kref to track the usage of the policy would be better
> than tracking symlink weight (or in the case of (1) being adopted,
> whether the symlink cpumask becomes empty.) Note that the symlink
> weight becoming zero without (1) (in other words, no symlinks) is not
> the correct condition for freeing the policy - we still have one CPU,
> that being the CPU for policy->kobj_cpu.

Well, if we do (1), it certainly would be more straightforward to use
krefs for that.

> 3. what happens when 'policy' is NULL at the point when the first (few) CPUs
> are added - how do the symlinks get created later if/when policy becomes
> non-NULL (can it?)

Yes, it can, and we have a design issue here that bothers me a bit.
Namley, we need a driver's ->init callback to populate policy->cpus
for us, but this is not the only thing it is doing, so the concern is
that it may not be able to deal with CPUs that aren't online.

I was thinking about an additional driver callback that would *only*
populate a mask of CPUs that should use the same policy as the given
one. We'd be able to call that from cpufreq_add_dev() for offline
CPUs too and this way the policy object could be created for the first
CPU using the policy that is registered instead of being added for the
first CPU using that policy that becomes online (which happens today).

> 4. what about policy->related_cpus ? What if one of the CPUs being added is
> not in policy->related_cpus?

It will need a new policy.

> Should we still go ahead and create the symlink?

There's nothing to link to then. The policy object will be created
when that CPU becomes online (as per the above).

Thanks,
Rafael

2015-07-23 05:55:10

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

On 22-07-15, 14:15, Russell King - ARM Linux wrote:
> > + /* sysfs links are only created on subsys callback */
> > + if (sif && policy) {
> > + pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
>
> dev_dbg() ?

Hmm, right.

> > + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> > + if (ret) {
> > + dev_err(dev, "%s: Failed to create link for cpu %d (%d)\n",

Rafael updated this instead with dev_dbg :), I am sending separate
patches to fix that now.

> > + __func__, cpu, ret);
>
> I wonder why we print the CPU number - since it's from dev->id, isn't it
> included in the struct device name printed by dev_err() already?

:(

> > + return ret;
> > + }
> > +
> > + /* Track CPUs for which sysfs links are created */
> > + cpumask_set_cpu(cpu, policy->symlinks);
> > + }
> > +
>
> I guess this will do for -rc, but it's not particularly nice. Can I
> suggest splitting the two operations here - the add_dev callback from
> the subsys interface, and the handling of hotplug online/offline
> notifications.
>
> You only need to do the above for the subsys interface, and you only
> need to do the remainder if the CPU was online.
>
> Also, what about the CPU "owning" the policy?
>
> So, this would become:
>
> static int cpufreq_cpu_online(struct device *dev)
> {
> pr_debug("bringing CPU%d online\n", dev->id);
> ... stuff to do when CPU is online ...
> }
>
> static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int cpu = dev->id;
> struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>
> pr_debug("adding CPU %u\n", cpu);
>
> if (policy && policy->kobj_cpu != cpu) {
> dev_dbg(dev, "%s: Adding cpufreq symlink\n", __func__);
> ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> if (ret) {
> dev_err(dev,
> "%s: Failed to create cpufreq symlink (%d)\n",
> __func__, ret);
> return ret;
> }
>
> /* Track CPUs for which sysfs links are created */
> cpumask_set_cpu(cpu, policy->symlinks);
> }
>
> /* Now do the remainder if the CPU is already online */
> if (cpu_online(cpu))
> return cpufreq_cpu_online(dev);
>
> return 0;
> }
>
> Next, change the cpufreq_add_dev(dev, NULL) in the hotplug notifier call
> to cpufreq_cpu_online(dev) instead.
>
> Doing the similar thing for the cpufreq_remove_dev() path would also make
> sense.

Hmmm, Looks better ofcourse.

> > @@ -1521,42 +1472,54 @@ static int __cpufreq_remove_dev_finish(struct device *dev,
> > static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
> > {
> > unsigned int cpu = dev->id;
> > + struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> > int ret;
> >
> > - /*
> > - * Only possible if 'cpu' is getting physically removed now. A hotplug
> > - * notifier should have already been called and we just need to remove
> > - * link or free policy here.
> > - */
> > - if (cpu_is_offline(cpu)) {
> > - struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
> > - struct cpumask mask;
> > + if (!policy)
> > + return 0;
> >
> > - if (!policy)
> > - return 0;
> > + if (cpu_online(cpu)) {
> > + ret = __cpufreq_remove_dev_prepare(dev, sif);
> > + if (!ret)
> > + ret = __cpufreq_remove_dev_finish(dev, sif);
> > + if (ret)
> > + return ret;
>
> Here, I have to wonder about this. If you look at the code in
> drivers/base/bus.c, you'll notice that the ->remove_dev return code is
> not used

Its not even using the return type of ->add_dev :), I have send an
update for that recently as that was required for cpufreq-drivers.
Greg must be applying that for 4.3 I hope :)

> (personally, I hate interfaces which are created with an int
> return type for a removal operation, but then ignore the return code.
> Either have the return code and use it, or don't confuse driver authors
> by having one.)

+1

> What this means is that in the remove path, the device _is_ going away,
> whether you like it or not. So, if you have an error early in your
> remove path, returning that error does you no good - you end up leaking
> memory because subsequent cleanup doesn't get done.
>
> It's better to either ensure that your removal path can't fail, or if it
> can, to reasonably clean up as much as you can (which here, means
> continuing to remove the symlink.)
>
> If you adopt my suggestion above, then cpufreq_remove_dev() becomes
> something like:
>
> static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
> {
> unsigned int cpu = dev->id;
> struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>
> if (cpu_is_online(cpu))
> cpufreq_cpu_offline(dev);
>
> if (policy) {
> if (cpumask_test_cpu(cpu, policy->symlinks)) {
> dev_dbg(dev, "%s: Removing cpufreq symlink\n",
> __func__);
> cpumask_clear_cpu(cpu, policy->symlinks);
> sysfs_remove_link(&dev->kobj, "cpufreq");
> }
>
> if (policy->kobj_cpu == cpu) {
> ... migration code and final CPU deletion code ...
> }
> }
>
> return 0;
> }
>
> which IMHO is easier to read and follow, and more symetrical with
> cpufreq_add_dev().

Ack.

> Now, I'm left wondering about a few things:
>
> 1. whether having a CPU "own" the policy, and having the cpufreq/ directory
> beneath the cpuN node is a good idea, or whether it would be better to
> place this in the /sys/devices/system/cpufreq/ subdirectory and always
> symlink to there. It strikes me that would simplify the code a little.

Hmm, but there can be multiple policies in a system and that would
surely confuse people.

> 2. whether using a kref to track the usage of the policy would be better
> than tracking symlink weight (or in the case of (1) being adopted,
> whether the symlink cpumask becomes empty.)


> Note that the symlink
> weight becoming zero without (1) (in other words, no symlinks) is not
> the correct condition for freeing the policy - we still have one CPU,
> that being the CPU for policy->kobj_cpu.

But that's the cpu which is getting removed now, so it was really the
last cpu and we can free the policy.

> 3. what happens when 'policy' is NULL at the point when the first (few) CPUs
> are added

The first CPU that comes up has to create the policy.

> - how do the symlinks get created later if/when policy becomes
> non-NULL (can it?)

It can't.

> 4. what about policy->related_cpus ? What if one of the CPUs being added is
> not in policy->related_cpus? Should we still go ahead and create the
> symlink?

Let me explain a bit around how policy are managed, you might already
know this but I got a bit confused by your question.

Consider a octa-core big LITTLE platform. All big core share
clock/voltage rails and all LITTLE too..

The system will have two policies:
- big: This will manage four CPUs (0-3)
- policy->related_cpus = 0 1 2 3
- policy->cpus = all online CPUs from 0-3
- LITTLE: This will manage four CPUs (4-7)
- policy->related_cpus = 4 5 6 7
- policy->cpus = all online CPUs from 4-7

So if a CPU (say 5) doesn't find a place in big cluster's
policy->related_cpus, then it must belong to a different policy.

Does that clear your query? Or did I completely miss your concern ?

--
viresh

2015-07-23 06:09:51

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

On 22-07-15, 18:42, Rafael J. Wysocki wrote:
> > 3. what happens when 'policy' is NULL at the point when the first (few) CPUs
> > are added - how do the symlinks get created later if/when policy becomes
> > non-NULL (can it?)
>
> Yes, it can, and we have a design issue here that bothers me a bit.

I replied to Russell with a NO here as the first CPU should have
created the policy. BUT...

> Namley, we need a driver's ->init callback to populate policy->cpus
> for us, but this is not the only thing it is doing, so the concern is
> that it may not be able to deal with CPUs that aren't online.

... the first few CPUs could have been offline and so we might not
have tried to add the policy at all.. Need to fix that for sure.

> I was thinking about an additional driver callback that would *only*
> populate a mask of CPUs that should use the same policy as the given
> one.

Why so ? Drivers today are required to set policy->cpus with all CPUs
that should be managed by that policy. i.e. all online+offline. So,
actually ->init() fills policy->cpus with the value of
policy->related_cpus.

Yes, I thought earlier to change that by setting policy->related_cpus
from drivers, instead of policy->cpus and wasn't sure if I should do
that :)

> We'd be able to call that from cpufreq_add_dev() for offline
> CPUs too and this way the policy object could be created for the first
> CPU using the policy that is registered instead of being added for the
> first CPU using that policy that becomes online (which happens today).

Creating policy for offline CPUs doesn't look that great to me.

What we can do to fix the problem in hand, is to update a global mask
of CPUs (with policy == NULL) which were offline when
cpufreq_add_dev() was called for them. And when we create the policy,
we can add links for all such CPUs.

--
viresh

2015-07-23 08:15:37

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH 1/3] cpufreq: print error messages with dev_err()

By mistake dev_err was replaced by dev_dbg in a recent patch, fix that.

Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links")
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index fa718644f1ee..84504ae3fb38 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1173,7 +1173,7 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
if (ret) {
- dev_dbg(dev, "%s: Failed to create link (%d)\n",
+ dev_err(dev, "%s: Failed to create link (%d)\n",
__func__, ret);
return ret;
}
--
2.4.0

2015-07-23 08:15:50

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH 2/3] cpufreq: Create links for offline CPUs that got added earlier

If subsys callback ->add_dev() is called for an offline CPU, before its
policy is allocated, we will miss adding its sysfs symlink.

Fix this by tracking such CPUs in a separate mask.

Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links")
Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 74 +++++++++++++++++++++++++++++++++++++++--------
1 file changed, 62 insertions(+), 12 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 84504ae3fb38..d01cad993fa7 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -31,6 +31,12 @@
#include <linux/tick.h>
#include <trace/events/power.h>

+/*
+ * CPUs that were offline when a request to allocate policy was issued, symlinks
+ * for them should be created once the policy is available for them.
+ */
+cpumask_t linked_cpus_pending;
+
static LIST_HEAD(cpufreq_policy_list);

static inline bool policy_is_inactive(struct cpufreq_policy *policy)
@@ -938,6 +944,47 @@ void cpufreq_sysfs_remove_file(const struct attribute *attr)
}
EXPORT_SYMBOL(cpufreq_sysfs_remove_file);

+static int cpufreq_add_symlink(struct cpufreq_policy *policy,
+ struct device *dev)
+{
+ int ret, cpu = dev->id;
+
+ dev_dbg(dev, "%s: Adding symlink for CPU: %u\n", __func__, cpu);
+ ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
+ if (ret) {
+ dev_err(dev, "%s: Failed to create link (%d)\n", __func__, ret);
+ return ret;
+ }
+
+ /* Track CPUs for which sysfs links are created */
+ cpumask_set_cpu(cpu, policy->linked_cpus);
+ return 0;
+}
+
+/*
+ * Create symlinks for CPUs which are already added via subsys callbacks (and
+ * were offline then), before the policy was created.
+ */
+static int cpufreq_add_pending_symlinks(struct cpufreq_policy *policy)
+{
+ struct cpumask mask;
+ int cpu, ret;
+
+ cpumask_and(&mask, policy->related_cpus, &linked_cpus_pending);
+
+ if (cpumask_empty(&mask))
+ return 0;
+
+ for_each_cpu(cpu, &mask) {
+ ret = cpufreq_add_symlink(policy, get_cpu_device(cpu));
+ if (ret)
+ return ret;
+ cpumask_clear_cpu(cpu, &linked_cpus_pending);
+ }
+
+ return 0;
+}
+
static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
struct device *dev)
{
@@ -968,7 +1015,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
return ret;
}

- return 0;
+ return cpufreq_add_pending_symlinks(policy);
}

static int cpufreq_init_policy(struct cpufreq_policy *policy)
@@ -1170,24 +1217,21 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)

/* sysfs links are only created on subsys callback */
if (sif && policy) {
- pr_debug("%s: Adding symlink for CPU: %u\n", __func__, cpu);
- ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
- if (ret) {
- dev_err(dev, "%s: Failed to create link (%d)\n",
- __func__, ret);
+ ret = cpufreq_add_symlink(policy, dev);
+ if (ret)
return ret;
- }
-
- /* Track CPUs for which sysfs links are created */
- cpumask_set_cpu(cpu, policy->linked_cpus);
}

/*
* A hotplug notifier will follow and we will take care of rest
* of the initialization then.
*/
- if (cpu_is_offline(cpu))
+ if (cpu_is_offline(cpu)) {
+ /* symlink should be added for this CPU later */
+ if (!policy)
+ cpumask_set_cpu(cpu, &linked_cpus_pending);
return 0;
+ }

/* Check if this CPU already has a policy to manage it */
if (policy && !policy_is_inactive(policy)) {
@@ -1440,8 +1484,10 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
int ret;

- if (!policy)
+ if (!policy) {
+ cpumask_clear_cpu(cpu, &linked_cpus_pending);
return 0;
+ }

if (cpu_online(cpu)) {
ret = __cpufreq_remove_dev_prepare(dev, sif);
@@ -2533,10 +2579,14 @@ int cpufreq_unregister_driver(struct cpufreq_driver *driver)

/* Protect against concurrent cpu hotplug */
get_online_cpus();
+
subsys_interface_unregister(&cpufreq_interface);
if (cpufreq_boost_supported())
cpufreq_sysfs_remove_file(&boost.attr);

+ if (WARN_ON(!cpumask_empty(&linked_cpus_pending)))
+ cpumask_clear(&linked_cpus_pending);
+
unregister_hotcpu_notifier(&cpufreq_cpu_notifier);

write_lock_irqsave(&cpufreq_driver_lock, flags);
--
2.4.0

2015-07-23 08:15:55

by Viresh Kumar

[permalink] [raw]
Subject: [PATCH 3/3] cpufreq: use cpumask_test_and_clear_cpu() instead of separate routines

We need to clear cpumask only if the relevant cpu is set and we could
have used cpumask_test_and_clear_cpu() and set instead.

Signed-off-by: Viresh Kumar <[email protected]>
---
drivers/cpufreq/cpufreq.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index d01cad993fa7..b223c9c5296b 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1498,10 +1498,9 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
}

/* sysfs links are removed only on subsys callback */
- if (cpumask_test_cpu(cpu, policy->linked_cpus)) {
+ if (cpumask_test_and_clear_cpu(cpu, policy->linked_cpus)) {
dev_dbg(dev, "%s: Removing symlink for CPU: %u\n", __func__,
cpu);
- cpumask_clear_cpu(cpu, policy->linked_cpus);
sysfs_remove_link(&dev->kobj, "cpufreq");
return 0;
}
--
2.4.0

2015-07-23 17:22:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

Hi Viresh,

On Thu, Jul 23, 2015 at 8:09 AM, Viresh Kumar <[email protected]> wrote:
> On 22-07-15, 18:42, Rafael J. Wysocki wrote:
>> > 3. what happens when 'policy' is NULL at the point when the first (few) CPUs
>> > are added - how do the symlinks get created later if/when policy becomes
>> > non-NULL (can it?)
>>
>> Yes, it can, and we have a design issue here that bothers me a bit.
>
> I replied to Russell with a NO here as the first CPU should have
> created the policy. BUT...
>
>> Namley, we need a driver's ->init callback to populate policy->cpus
>> for us, but this is not the only thing it is doing, so the concern is
>> that it may not be able to deal with CPUs that aren't online.
>
> ... the first few CPUs could have been offline and so we might not
> have tried to add the policy at all.. Need to fix that for sure.

Wait here.

The current Linus' tree doesn't have that problem as far as I can say.

Say cpufreq_interface->add_dev() is called for an offline CPU (say
CPU2). It points to cpufreq_add_dev(), so we see that the CPU is
offline and call add_cpu_dev_symlink() for it. But the first argument
we pass to that is per_cpu(cpufreq_cpu_data, cpu) and that is NULL,
because the policy is not there yet. So we just return 0 (and the CPU
has no policy and no link).

Now say cpufreq_interface->add_dev() is called for an online CPU (say
CPU3). It goes and creates the policy for it and the driver's
->init() tells us that CPU2 is related to it. So
cpufreq_add_dev_interface() creates the link for CPU2 and we're fine.

Now say CPU3 was offline too when cpufreq_interface->add_dev() was
called for it. We don't create a policy or a link for it. Now say
CPU2 becomes online. cpufreq_cpu_callback() calls cpufreq_add_dev()
for it and we land in the previous case.

The *broken* case is when CPU2 is online to start with and it had
created the link for CPU3, so when an offline CPU3 is now being added,
we try to create the link for it again. That is the case we need to
address in -rc without introducing new problems. The $subject patch
adresses that issue, but it introduces the above problem. On the
other hand, my patch at https://patchwork.kernel.org/patch/6839151/
should take care of this too (unless it is broken in a way I'm not
seeing now).

>> I was thinking about an additional driver callback that would *only*
>> populate a mask of CPUs that should use the same policy as the given
>> one.
>
> Why so ? Drivers today are required to set policy->cpus with all CPUs
> that should be managed by that policy. i.e. all online+offline. So,
> actually ->init() fills policy->cpus with the value of
> policy->related_cpus.

So the problem is that the setting of policy->cpus is not the *only*
thing that ->init() is supposed to do. It can go and write to
registers etc and is that guaranteed to work with offline CPUs? I
honestly don't think so.

> Yes, I thought earlier to change that by setting policy->related_cpus
> from drivers, instead of policy->cpus and wasn't sure if I should do
> that :)
>
>> We'd be able to call that from cpufreq_add_dev() for offline
>> CPUs too and this way the policy object could be created for the first
>> CPU using the policy that is registered instead of being added for the
>> first CPU using that policy that becomes online (which happens today).
>
> Creating policy for offline CPUs doesn't look that great to me.

Then we have a problem that CPUs that are not initially online do not
have policies, but if they go online and *then* offline subsequently,
the policies will be there. So there are two different "offline"
statuses for a CPU as far as cpufreq is concerned, depending on
whether or not the CPU has ever been online. That's weird and IMO we
should avoid it.

> What we can do to fix the problem in hand, is to update a global mask
> of CPUs (with policy == NULL) which were offline when
> cpufreq_add_dev() was called for them. And when we create the policy,
> we can add links for all such CPUs.

For -rc, why don't we have a mask of CPUs that are both "related" and
present and create links only for those? Which is what the patch at
https://patchwork.kernel.org/patch/6839151/ is doing?

And then we can target a major rework at the next merge window.

Thanks,
Rafael

2015-07-23 19:28:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 2/3] cpufreq: Create links for offline CPUs that got added earlier

Hi Viresh,

On Thu, Jul 23, 2015 at 10:13 AM, Viresh Kumar <[email protected]> wrote:
> If subsys callback ->add_dev() is called for an offline CPU, before its
> policy is allocated, we will miss adding its sysfs symlink.
>
> Fix this by tracking such CPUs in a separate mask.
>
> Fixes: 9b07109f06a1 ("cpufreq: Fix double addition of sysfs links")
> Signed-off-by: Viresh Kumar <[email protected]>

No, we need to go to back to square one.

No fixes of fixes of fixes etc please.

Let me prepare a patch for -rc that won't introduce *new* problems and
we can make major changes as 4.3 material, OK?

Thanks,
Rafael

2015-07-23 19:29:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH V2] cpufreq: Fix double addition of sysfs links

On Thu, Jul 23, 2015 at 7:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> Hi Viresh,
>
> On Thu, Jul 23, 2015 at 8:09 AM, Viresh Kumar <[email protected]> wrote:
>> On 22-07-15, 18:42, Rafael J. Wysocki wrote:
>>> > 3. what happens when 'policy' is NULL at the point when the first (few) CPUs
>>> > are added - how do the symlinks get created later if/when policy becomes
>>> > non-NULL (can it?)
>>>
>>> Yes, it can, and we have a design issue here that bothers me a bit.
>>
>> I replied to Russell with a NO here as the first CPU should have
>> created the policy. BUT...
>>
>>> Namley, we need a driver's ->init callback to populate policy->cpus
>>> for us, but this is not the only thing it is doing, so the concern is
>>> that it may not be able to deal with CPUs that aren't online.
>>
>> ... the first few CPUs could have been offline and so we might not
>> have tried to add the policy at all.. Need to fix that for sure.
>
> Wait here.
>
> The current Linus' tree doesn't have that problem as far as I can say.
>
> Say cpufreq_interface->add_dev() is called for an offline CPU (say
> CPU2). It points to cpufreq_add_dev(), so we see that the CPU is
> offline and call add_cpu_dev_symlink() for it. But the first argument
> we pass to that is per_cpu(cpufreq_cpu_data, cpu) and that is NULL,
> because the policy is not there yet. So we just return 0 (and the CPU
> has no policy and no link).
>
> Now say cpufreq_interface->add_dev() is called for an online CPU (say
> CPU3). It goes and creates the policy for it and the driver's
> ->init() tells us that CPU2 is related to it. So
> cpufreq_add_dev_interface() creates the link for CPU2 and we're fine.
>
> Now say CPU3 was offline too when cpufreq_interface->add_dev() was
> called for it. We don't create a policy or a link for it. Now say
> CPU2 becomes online. cpufreq_cpu_callback() calls cpufreq_add_dev()
> for it and we land in the previous case.
>
> The *broken* case is when CPU2 is online to start with and it had
> created the link for CPU3, so when an offline CPU3 is now being added,
> we try to create the link for it again. That is the case we need to
> address in -rc without introducing new problems. The $subject patch
> adresses that issue, but it introduces the above problem. On the
> other hand, my patch at https://patchwork.kernel.org/patch/6839151/
> should take care of this too (unless it is broken in a way I'm not
> seeing now).

It doesn't address the case when the CPU being removed is the policy owner.

Let me prepare a new version of it and we'll start over from there.

Thanks,
Rafael