2014-02-24 06:58:05

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

The existing code sets the per CPU policy to a non-NULL value before all
the steps performed during the hotplug online path is done. Specifically,
this is done before the policy min/max, governors, etc are initialized for
the policy. This in turn means that calls to cpufreq_cpu_get() return a
non-NULL policy before the policy/CPU is ready to be used.

To fix this, move the update of per CPU policy to a valid value after all
the initialization steps for the policy are completed.

Example kernel panic without this fix:
[ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[ 512.146195] pgd = c0003000
[ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
[ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
<snip>
[ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
[ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
<snip>
[ 512.149740] ---[ end trace f23a8defea6cd706 ]---
[ 512.149761] Kernel panic - not syncing: Fatal exception
[ 513.152016] CPU0: stopping
[ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396
<snip>
[ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
[ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
[ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
[ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
[ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
[ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
[ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
[ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
[ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
[ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
[ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
[ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
[ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
[ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)

In this specific case, CPU0 set's CPU1's policy->governor in
cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in
__cpufreq_governor().

Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/cpufreq/cpufreq.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cb003a6..d5ceb43 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
goto err_set_policy_cpu;
}

- write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus)
- per_cpu(cpufreq_cpu_data, j) = policy;
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
if (cpufreq_driver->get) {
policy->cur = cpufreq_driver->get(policy->cpu);
if (!policy->cur) {
@@ -1207,6 +1202,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
policy->user_policy.governor = policy->governor;
}

+ write_lock_irqsave(&cpufreq_driver_lock, flags);
+ for_each_cpu(j, policy->cpus)
+ per_cpu(cpufreq_cpu_data, j) = policy;
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
kobject_uevent(&policy->kobj, KOBJ_ADD);
up_read(&cpufreq_rwsem);

--
1.8.2.1

The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation


2014-02-24 07:48:09

by Srivatsa S. Bhat

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 02/24/2014 12:27 PM, Saravana Kannan wrote:
> The existing code sets the per CPU policy to a non-NULL value before all
> the steps performed during the hotplug online path is done. Specifically,
> this is done before the policy min/max, governors, etc are initialized for
> the policy. This in turn means that calls to cpufreq_cpu_get() return a
> non-NULL policy before the policy/CPU is ready to be used.
>
> To fix this, move the update of per CPU policy to a valid value after all
> the initialization steps for the policy are completed.
>
> Example kernel panic without this fix:
> [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020
> [ 512.146195] pgd = c0003000
> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
> <snip>
> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
> <snip>
> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
> [ 512.149761] Kernel panic - not syncing: Fatal exception
> [ 513.152016] CPU0: stopping
> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396
> <snip>
> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)
>
> In this specific case, CPU0 set's CPU1's policy->governor in
> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in
> __cpufreq_governor().
>

Whoa! That's horrible! Can you please explain how exactly you
triggered this? I'm curious...

> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index cb003a6..d5ceb43 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
> goto err_set_policy_cpu;
> }
>
> - write_lock_irqsave(&cpufreq_driver_lock, flags);
> - for_each_cpu(j, policy->cpus)
> - per_cpu(cpufreq_cpu_data, j) = policy;
> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> -
> if (cpufreq_driver->get) {
> policy->cur = cpufreq_driver->get(policy->cpu);

If you move the per-cpu init further down, then what happens to the
cpufreq_generic_get() that gets invoked here by some of the drivers?
It will almost always fail (because policy will be NULL) and hence
CPU online will be unsuccessful (though you wont observe it because
the error code is not bubbled up to the CPU hotplug core; perhaps we
should).


IMHO, we should fix the synchronization in cpufreq_init_policy().
I notice cpufreq_update_policy() as well as __cpufreq_governor() in
your stack trace, which means cpufreq_set_policy() was involved.
cpufreq_update_policy() takes the policy->rwsem for write, whereas
cpufreq_init_policy() doesn't take that lock at all, which is clearly
wrong.

My guess is that, if we fix that locking, everything will be fine and
you won't hit the bug. Would you like to give that a shot?

Regards,
Srivatsa S. Bhat

> if (!policy->cur) {
> @@ -1207,6 +1202,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
> policy->user_policy.governor = policy->governor;
> }
>
> + write_lock_irqsave(&cpufreq_driver_lock, flags);
> + for_each_cpu(j, policy->cpus)
> + per_cpu(cpufreq_cpu_data, j) = policy;
> + write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +
> kobject_uevent(&policy->kobj, KOBJ_ADD);
> up_read(&cpufreq_rwsem);
>

2014-02-24 08:11:28

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 24 February 2014 13:12, Srivatsa S. Bhat
<[email protected]> wrote:
> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>> The existing code sets the per CPU policy to a non-NULL value before all
>> the steps performed during the hotplug online path is done. Specifically,
>> this is done before the policy min/max, governors, etc are initialized for
>> the policy. This in turn means that calls to cpufreq_cpu_get() return a
>> non-NULL policy before the policy/CPU is ready to be used.
>>
>> To fix this, move the update of per CPU policy to a valid value after all
>> the initialization steps for the policy are completed.
>>
>> Example kernel panic without this fix:
>> [ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020
>> [ 512.146195] pgd = c0003000
>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>> <snip>
>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
>> <snip>
>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
>> [ 512.149761] Kernel panic - not syncing: Fatal exception
>> [ 513.152016] CPU0: stopping
>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396
>> <snip>
>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)
>>
>> In this specific case, CPU0 set's CPU1's policy->governor in
>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor in
>> __cpufreq_governor().
>>
>
> Whoa! That's horrible! Can you please explain how exactly you
> triggered this? I'm curious...

:)

Actually I didn't understood his problem very well until now. I couldn't
get from his trace that which pointer was actually set to NULL here?
And hence what caused this crash?

Also, what Saravana just wrote here didn't looked like relevant at all.
i.e.: policy->governor being set to NULL.

So what? It was set to NULL with a reason and it shouldn't cause
any crashes I hope. And also its sort of wrong to say that CPU0
has set governor for CPU1, as governor was set for a policy and
both CPUs were part of the same policy.

>> Signed-off-by: Saravana Kannan <[email protected]>
>> ---
>> drivers/cpufreq/cpufreq.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index cb003a6..d5ceb43 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>> goto err_set_policy_cpu;
>> }
>>
>> - write_lock_irqsave(&cpufreq_driver_lock, flags);
>> - for_each_cpu(j, policy->cpus)
>> - per_cpu(cpufreq_cpu_data, j) = policy;
>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>> -
>> if (cpufreq_driver->get) {
>> policy->cur = cpufreq_driver->get(policy->cpu);
>
> If you move the per-cpu init further down, then what happens to the
> cpufreq_generic_get() that gets invoked here by some of the drivers?
> It will almost always fail (because policy will be NULL) and hence
> CPU online will be unsuccessful

Thanks, I was about to point that as well.

> (though you wont observe it because
> the error code is not bubbled up to the CPU hotplug core; perhaps we
> should).

Don't know :)

> IMHO, we should fix the synchronization in cpufreq_init_policy().
> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
> your stack trace, which means cpufreq_set_policy() was involved.
> cpufreq_update_policy() takes the policy->rwsem for write, whereas
> cpufreq_init_policy() doesn't take that lock at all, which is clearly
> wrong.
>
> My guess is that, if we fix that locking, everything will be fine and
> you won't hit the bug. Would you like to give that a shot?

I am not sure about that guess as well. I first want to know what
went bad exactly..

2014-02-24 08:39:12

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done


Srivatsa S. Bhat wrote:
> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>> The existing code sets the per CPU policy to a non-NULL value before all
>> the steps performed during the hotplug online path is done.
>> Specifically,
>> this is done before the policy min/max, governors, etc are initialized
>> for
>> the policy. This in turn means that calls to cpufreq_cpu_get() return a
>> non-NULL policy before the policy/CPU is ready to be used.
>>
>> To fix this, move the update of per CPU policy to a valid value after
>> all
>> the initialization steps for the policy are completed.
>>
>> Example kernel panic without this fix:
>> [ 512.146185] Unable to handle kernel NULL pointer dereference at
>> virtual address 00000020
>> [ 512.146195] pgd = c0003000
>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>> <snip>
>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
>> <snip>
>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
>> [ 512.149761] Kernel panic - not syncing: Fatal exception
>> [ 513.152016] CPU0: stopping
>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W
>> 3.10.0-gd727407-00074-g979ede8 #396
>> <snip>
>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from
>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from
>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from
>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from
>> [<c0afe180>] (notifier_call_chain+0x40/0x68)
>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>> [<c02812dc>] (__cpu_notify+0x28/0x44)
>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>]
>> (_cpu_up+0xf4/0x1dc)
>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>]
>> (cpu_up+0x5c/0x78)
>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>]
>> (store_online+0x44/0x74)
>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>]
>> (sysfs_write_file+0x108/0x14c)
>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from
>> [<c03517d4>] (vfs_write+0xd0/0x180)
>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>]
>> (SyS_write+0x38/0x68)
>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>]
>> (ret_fast_syscall+0x0/0x30)
>>
>> In this specific case, CPU0 set's CPU1's policy->governor in
>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>> in
>> __cpufreq_governor().
>>
>
> Whoa! That's horrible! Can you please explain how exactly you
> triggered this? I'm curious...

Just call cpufreq_update_policy(cpu) on a CPU and have another thread
online/offline it.

>
>> Signed-off-by: Saravana Kannan <[email protected]>
>> ---
>> drivers/cpufreq/cpufreq.c | 10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index cb003a6..d5ceb43 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev,
>> struct subsys_interface *sif,
>> goto err_set_policy_cpu;
>> }
>>
>> - write_lock_irqsave(&cpufreq_driver_lock, flags);
>> - for_each_cpu(j, policy->cpus)
>> - per_cpu(cpufreq_cpu_data, j) = policy;
>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>> -
>> if (cpufreq_driver->get) {
>> policy->cur = cpufreq_driver->get(policy->cpu);
>
> If you move the per-cpu init further down, then what happens to the
> cpufreq_generic_get() that gets invoked here by some of the drivers?

While cpufreq_generic_get() was a good refactor, I think it's causing
unnecessary cyclic dependency. You need that function to not fail for a
policy to get added properly and you need a proper policy for the function
to work. I care more about fixing the panic than trying to keep
cpufreq_generic_get().

> It will almost always fail (because policy will be NULL) and hence
> CPU online will be unsuccessful (though you wont observe it because
> the error code is not bubbled up to the CPU hotplug core; perhaps we
> should).

Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq
code base. Since this new function isn't there at that point, I missed it.
Even if I did use the latest kernel, I wouldn't have hit this issue,
because the MSM cpufreq driver doesn't use this function.

>
> IMHO, we should fix the synchronization in cpufreq_init_policy().
> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
> your stack trace, which means cpufreq_set_policy() was involved.
> cpufreq_update_policy() takes the policy->rwsem for write, whereas
> cpufreq_init_policy() doesn't take that lock at all, which is clearly
> wrong.
>
> My guess is that, if we fix that locking, everything will be fine and
> you won't hit the bug. Would you like to give that a shot?

While locking might need fixing, this is not about the locking though.
Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy
that they consider valid and also use it as a way to check and reject API
calls trying to manipulate an offline CPU.

Without this fix, the framework is advertising an incomplete policy object
as available. I think that breaks the CPUfreq framework APIs in a very
fundamental sense. This is a "no-no" in programming.

It's like trying to register a CPUfreq driver before getting the clocks to
control the CPU.

As for the ->get() issue, I think the framework should just call
clk_get_rate() instead of calling .get() if policy->clk is not NULL. No
point in doing framework -> driver -> framework.

Thanks,
Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-02-24 08:41:46

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done


Viresh Kumar wrote:
> On 24 February 2014 13:12, Srivatsa S. Bhat
> <[email protected]> wrote:
>> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>>> The existing code sets the per CPU policy to a non-NULL value before
>>> all
>>> the steps performed during the hotplug online path is done.
>>> Specifically,
>>> this is done before the policy min/max, governors, etc are initialized
>>> for
>>> the policy. This in turn means that calls to cpufreq_cpu_get() return
>>> a
>>> non-NULL policy before the policy/CPU is ready to be used.
>>>
>>> To fix this, move the update of per CPU policy to a valid value after
>>> all
>>> the initialization steps for the policy are completed.
>>>
>>> Example kernel panic without this fix:
>>> [ 512.146185] Unable to handle kernel NULL pointer dereference at
>>> virtual address 00000020
>>> [ 512.146195] pgd = c0003000
>>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>>> <snip>
>>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
>>> <snip>
>>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
>>> [ 512.149761] Kernel panic - not syncing: Fatal exception
>>> [ 513.152016] CPU0: stopping
>>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W
>>> 3.10.0-gd727407-00074-g979ede8 #396
>>> <snip>
>>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from
>>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from
>>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>>> from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from
>>> [<c0afe180>] (notifier_call_chain+0x40/0x68)
>>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>> [<c02812dc>] (__cpu_notify+0x28/0x44)
>>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>]
>>> (_cpu_up+0xf4/0x1dc)
>>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>]
>>> (cpu_up+0x5c/0x78)
>>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>]
>>> (store_online+0x44/0x74)
>>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>]
>>> (sysfs_write_file+0x108/0x14c)
>>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from
>>> [<c03517d4>] (vfs_write+0xd0/0x180)
>>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>]
>>> (SyS_write+0x38/0x68)
>>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>]
>>> (ret_fast_syscall+0x0/0x30)
>>>
>>> In this specific case, CPU0 set's CPU1's policy->governor in
>>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>>> in
>>> __cpufreq_governor().
>>>
>>
>> Whoa! That's horrible! Can you please explain how exactly you
>> triggered this? I'm curious...
>
> :)
>
> Actually I didn't understood his problem very well until now. I couldn't
> get from his trace that which pointer was actually set to NULL here?
> And hence what caused this crash?
>
> Also, what Saravana just wrote here didn't looked like relevant at all.
> i.e.: policy->governor being set to NULL.
>
> So what? It was set to NULL with a reason and it shouldn't cause
> any crashes I hope. And also its sort of wrong to say that CPU0
> has set governor for CPU1, as governor was set for a policy and
> both CPUs were part of the same policy.
>
>>> Signed-off-by: Saravana Kannan <[email protected]>
>>> ---
>>> drivers/cpufreq/cpufreq.c | 10 +++++-----
>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index cb003a6..d5ceb43 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev,
>>> struct subsys_interface *sif,
>>> goto err_set_policy_cpu;
>>> }
>>>
>>> - write_lock_irqsave(&cpufreq_driver_lock, flags);
>>> - for_each_cpu(j, policy->cpus)
>>> - per_cpu(cpufreq_cpu_data, j) = policy;
>>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>> -
>>> if (cpufreq_driver->get) {
>>> policy->cur = cpufreq_driver->get(policy->cpu);
>>
>> If you move the per-cpu init further down, then what happens to the
>> cpufreq_generic_get() that gets invoked here by some of the drivers?
>> It will almost always fail (because policy will be NULL) and hence
>> CPU online will be unsuccessful
>
> Thanks, I was about to point that as well.
>
>> (though you wont observe it because
>> the error code is not bubbled up to the CPU hotplug core; perhaps we
>> should).
>
> Don't know :)
>
>> IMHO, we should fix the synchronization in cpufreq_init_policy().
>> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
>> your stack trace, which means cpufreq_set_policy() was involved.
>> cpufreq_update_policy() takes the policy->rwsem for write, whereas
>> cpufreq_init_policy() doesn't take that lock at all, which is clearly
>> wrong.
>>
>> My guess is that, if we fix that locking, everything will be fine and
>> you won't hit the bug. Would you like to give that a shot?
>
> I am not sure about that guess as well. I first want to know what
> went bad exactly..
>

I just replied to the other email. I think I answered both your questions
there. Sorry about mixing up CPU and policy. In my case, each CPU is
independently scalable -- so for now take CPU == policy. I'll fix it up
once we agree on the fix.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-02-24 08:46:03

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 24 February 2014 14:11, <[email protected]> wrote:
> I just replied to the other email. I think I answered both your questions
> there. Sorry about mixing up CPU and policy. In my case, each CPU is
> independently scalable -- so for now take CPU == policy. I'll fix it up
> once we agree on the fix.

But why do you say this then?

>>> In this specific case, CPU0 set's CPU1's policy->governor in
>>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>>> in __cpufreq_governor().

2014-02-24 08:47:20

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done


Viresh Kumar wrote:
> On 24 February 2014 14:11, <[email protected]> wrote:
>> I just replied to the other email. I think I answered both your
>> questions
>> there. Sorry about mixing up CPU and policy. In my case, each CPU is
>> independently scalable -- so for now take CPU == policy. I'll fix it up
>> once we agree on the fix.
>
> But why do you say this then?

Sorry, not sure I understand what you mean.

I agree, wording in my commit text might be unclear. I'll fix it after we
agree on the code fix. In the MSM case, each CPU has it's own policy.

I'm assuming your original complaint was about my confusing wording. Maybe
that's not what you were pointing out?

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-02-24 08:50:53

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 24 February 2014 14:17, <[email protected]> wrote:
> Sorry, not sure I understand what you mean.
>
> I agree, wording in my commit text might be unclear. I'll fix it after we
> agree on the code fix. In the MSM case, each CPU has it's own policy.
>
> I'm assuming your original complaint was about my confusing wording. Maybe
> that's not what you were pointing out?

In your case each CPU has a separate policy structure as they have separately
controllable clocks. But you also said that CPU0 is setting CPU1's governor to
NULL. I don't see that happening. Each CPU sets its own governor to NULL on
init().

2014-02-24 09:00:08

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done


Viresh Kumar wrote:
> On 24 February 2014 14:17, <[email protected]> wrote:
>> Sorry, not sure I understand what you mean.
>>
>> I agree, wording in my commit text might be unclear. I'll fix it after
>> we
>> agree on the code fix. In the MSM case, each CPU has it's own policy.
>>
>> I'm assuming your original complaint was about my confusing wording.
>> Maybe
>> that's not what you were pointing out?
>
> In your case each CPU has a separate policy structure as they have
> separately
> controllable clocks. But you also said that CPU0 is setting CPU1's
> governor to
> NULL. I don't see that happening. Each CPU sets its own governor to NULL
> on
> init().

When I said "CPU0 is setting CPU1's governor to NULL", I meant
thread/context running in CPU0 is setting CPU1's governor as part of
CPU1's ONLINE notifier.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

2014-02-24 10:56:02

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 24 February 2014 14:09, <[email protected]> wrote:
>
> Srivatsa S. Bhat wrote:
>> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>>> The existing code sets the per CPU policy to a non-NULL value before all
>>> the steps performed during the hotplug online path is done.
>>> Specifically,
>>> this is done before the policy min/max, governors, etc are initialized
>>> for
>>> the policy. This in turn means that calls to cpufreq_cpu_get() return a
>>> non-NULL policy before the policy/CPU is ready to be used.
>>>
>>> To fix this, move the update of per CPU policy to a valid value after
>>> all
>>> the initialization steps for the policy are completed.
>>>
>>> Example kernel panic without this fix:
>>> [ 512.146185] Unable to handle kernel NULL pointer dereference at
>>> virtual address 00000020
>>> [ 512.146195] pgd = c0003000
>>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>>> <snip>
>>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
>>> <snip>
>>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
>>> [ 512.149761] Kernel panic - not syncing: Fatal exception
>>> [ 513.152016] CPU0: stopping
>>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W
>>> 3.10.0-gd727407-00074-g979ede8 #396
>>> <snip>
>>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from
>>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from
>>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from
>>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from
>>> [<c0afe180>] (notifier_call_chain+0x40/0x68)
>>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>> [<c02812dc>] (__cpu_notify+0x28/0x44)
>>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>]
>>> (_cpu_up+0xf4/0x1dc)
>>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>]
>>> (cpu_up+0x5c/0x78)
>>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>]
>>> (store_online+0x44/0x74)
>>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>]
>>> (sysfs_write_file+0x108/0x14c)
>>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from
>>> [<c03517d4>] (vfs_write+0xd0/0x180)
>>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>]
>>> (SyS_write+0x38/0x68)
>>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>]
>>> (ret_fast_syscall+0x0/0x30)
>>>
>>> In this specific case, CPU0 set's CPU1's policy->governor in
>>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>>> in
>>> __cpufreq_governor().
>>>
>>
>> Whoa! That's horrible! Can you please explain how exactly you
>> triggered this? I'm curious...
>
> Just call cpufreq_update_policy(cpu) on a CPU and have another thread
> online/offline it.

But would you do that? Means why would you call them this way?
cpufreq_update_policy() shouldn't be called that way I believe..

>>> Signed-off-by: Saravana Kannan <[email protected]>
>>> ---
>>> drivers/cpufreq/cpufreq.c | 10 +++++-----
>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>> index cb003a6..d5ceb43 100644
>>> --- a/drivers/cpufreq/cpufreq.c
>>> +++ b/drivers/cpufreq/cpufreq.c
>>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev,
>>> struct subsys_interface *sif,
>>> goto err_set_policy_cpu;
>>> }
>>>
>>> - write_lock_irqsave(&cpufreq_driver_lock, flags);
>>> - for_each_cpu(j, policy->cpus)
>>> - per_cpu(cpufreq_cpu_data, j) = policy;
>>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>> -
>>> if (cpufreq_driver->get) {
>>> policy->cur = cpufreq_driver->get(policy->cpu);
>>
>> If you move the per-cpu init further down, then what happens to the
>> cpufreq_generic_get() that gets invoked here by some of the drivers?
>
> While cpufreq_generic_get() was a good refactor, I think it's causing
> unnecessary cyclic dependency. You need that function to not fail for a
> policy to get added properly and you need a proper policy for the function
> to work. I care more about fixing the panic than trying to keep
> cpufreq_generic_get().

Surely, I will like a solution which would keep this as is :)..
cpufreq_generic_get() should pass..

>> It will almost always fail (because policy will be NULL) and hence
>> CPU online will be unsuccessful (though you wont observe it because
>> the error code is not bubbled up to the CPU hotplug core; perhaps we
>> should).
>
> Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq
> code base. Since this new function isn't there at that point, I missed it.
> Even if I did use the latest kernel, I wouldn't have hit this issue,
> because the MSM cpufreq driver doesn't use this function.
>
>>
>> IMHO, we should fix the synchronization in cpufreq_init_policy().
>> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
>> your stack trace, which means cpufreq_set_policy() was involved.
>> cpufreq_update_policy() takes the policy->rwsem for write, whereas
>> cpufreq_init_policy() doesn't take that lock at all, which is clearly
>> wrong.
>>
>> My guess is that, if we fix that locking, everything will be fine and
>> you won't hit the bug. Would you like to give that a shot?
>
> While locking might need fixing, this is not about the locking though.
> Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy
> that they consider valid and also use it as a way to check and reject API
> calls trying to manipulate an offline CPU.
>
> Without this fix, the framework is advertising an incomplete policy object
> as available. I think that breaks the CPUfreq framework APIs in a very
> fundamental sense. This is a "no-no" in programming.
>
> It's like trying to register a CPUfreq driver before getting the clocks to
> control the CPU.

So the real question here is: What fields of 'policy' do we need to guarantee
to be initialized before policy is made available for others? And probably
that is what we need to fix.

Even your current solution would break things. For example, below will happen
before policy is set in per-cpu variable:
- CPUFREQ_CREATE_POLICY notifier will be fired and calls to do cpufreq_cpu_get()
there will fail. And hence cpufreq-stats sysfs will break.
- Governors also use cpufreq_cpu_get() and so those would also fail as they
are started from cpufreq_init_policy()..

..

2014-02-24 20:23:20

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 02/24/2014 02:55 AM, Viresh Kumar wrote:
> On 24 February 2014 14:09, <[email protected]> wrote:
>>
>> Srivatsa S. Bhat wrote:
>>> On 02/24/2014 12:27 PM, Saravana Kannan wrote:
>>>> The existing code sets the per CPU policy to a non-NULL value before all
>>>> the steps performed during the hotplug online path is done.
>>>> Specifically,
>>>> this is done before the policy min/max, governors, etc are initialized
>>>> for
>>>> the policy. This in turn means that calls to cpufreq_cpu_get() return a
>>>> non-NULL policy before the policy/CPU is ready to be used.
>>>>
>>>> To fix this, move the update of per CPU policy to a valid value after
>>>> all
>>>> the initialization steps for the policy are completed.
>>>>
>>>> Example kernel panic without this fix:
>>>> [ 512.146185] Unable to handle kernel NULL pointer dereference at
>>>> virtual address 00000020
>>>> [ 512.146195] pgd = c0003000
>>>> [ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
>>>> [ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
>>>> <snip>
>>>> [ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
>>>> [ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
>>>> <snip>
>>>> [ 512.149740] ---[ end trace f23a8defea6cd706 ]---
>>>> [ 512.149761] Kernel panic - not syncing: Fatal exception
>>>> [ 513.152016] CPU0: stopping
>>>> [ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W
>>>> 3.10.0-gd727407-00074-g979ede8 #396
>>>> <snip>
>>>> [ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>>> [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>>> [ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
>>>> from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>>> [ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
>>>> from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
>>>> [ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from
>>>> [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
>>>> [ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from
>>>> [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
>>>> [ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from
>>>> [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
>>>> [ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from
>>>> [<c0afe180>] (notifier_call_chain+0x40/0x68)
>>>> [ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from
>>>> [<c02812dc>] (__cpu_notify+0x28/0x44)
>>>> [ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>]
>>>> (_cpu_up+0xf4/0x1dc)
>>>> [ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>]
>>>> (cpu_up+0x5c/0x78)
>>>> [ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>]
>>>> (store_online+0x44/0x74)
>>>> [ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>]
>>>> (sysfs_write_file+0x108/0x14c)
>>>> [ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from
>>>> [<c03517d4>] (vfs_write+0xd0/0x180)
>>>> [ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>]
>>>> (SyS_write+0x38/0x68)
>>>> [ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>]
>>>> (ret_fast_syscall+0x0/0x30)
>>>>
>>>> In this specific case, CPU0 set's CPU1's policy->governor in
>>>> cpufreq_init_policy() to NULL while CPU1 is using the policy->governor
>>>> in
>>>> __cpufreq_governor().
>>>>
>>>
>>> Whoa! That's horrible! Can you please explain how exactly you
>>> triggered this? I'm curious...
>>
>> Just call cpufreq_update_policy(cpu) on a CPU and have another thread
>> online/offline it.
>
> But would you do that? Means why would you call them this way?
> cpufreq_update_policy() shouldn't be called that way I believe..

I was simplifying the scenario that causes it. We change the min/max
using ADJUST notifiers for multiple reasons -- thermal being one of them.

thermal/cpu_cooling is one example of it.

So, cpufreq_update_policy() can be called on any CPU. If that races with
someone offlining a CPU and onlining it, you'll get this crash.

>>>> Signed-off-by: Saravana Kannan <[email protected]>
>>>> ---
>>>> drivers/cpufreq/cpufreq.c | 10 +++++-----
>>>> 1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index cb003a6..d5ceb43 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -1109,11 +1109,6 @@ static int __cpufreq_add_dev(struct device *dev,
>>>> struct subsys_interface *sif,
>>>> goto err_set_policy_cpu;
>>>> }
>>>>
>>>> - write_lock_irqsave(&cpufreq_driver_lock, flags);
>>>> - for_each_cpu(j, policy->cpus)
>>>> - per_cpu(cpufreq_cpu_data, j) = policy;
>>>> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
>>>> -
>>>> if (cpufreq_driver->get) {
>>>> policy->cur = cpufreq_driver->get(policy->cpu);
>>>
>>> If you move the per-cpu init further down, then what happens to the
>>> cpufreq_generic_get() that gets invoked here by some of the drivers?
>>
>> While cpufreq_generic_get() was a good refactor, I think it's causing
>> unnecessary cyclic dependency. You need that function to not fail for a
>> policy to get added properly and you need a proper policy for the function
>> to work. I care more about fixing the panic than trying to keep
>> cpufreq_generic_get().
>
> Surely, I will like a solution which would keep this as is :)..
> cpufreq_generic_get() should pass..

The idea would exist, but we can just call cpufreq_generic_get() and
pass it policy->clk if it is not NULL. Does that work for you?

>>> It will almost always fail (because policy will be NULL) and hence
>>> CPU online will be unsuccessful (though you wont observe it because
>>> the error code is not bubbled up to the CPU hotplug core; perhaps we
>>> should).
>>
>> Good catch. I actually hit this issue, fixed and test it on a 3.12 cpufreq
>> code base. Since this new function isn't there at that point, I missed it.
>> Even if I did use the latest kernel, I wouldn't have hit this issue,
>> because the MSM cpufreq driver doesn't use this function.
>>
>>>
>>> IMHO, we should fix the synchronization in cpufreq_init_policy().
>>> I notice cpufreq_update_policy() as well as __cpufreq_governor() in
>>> your stack trace, which means cpufreq_set_policy() was involved.
>>> cpufreq_update_policy() takes the policy->rwsem for write, whereas
>>> cpufreq_init_policy() doesn't take that lock at all, which is clearly
>>> wrong.
>>>
>>> My guess is that, if we fix that locking, everything will be fine and
>>> you won't hit the bug. Would you like to give that a shot?
>>
>> While locking might need fixing, this is not about the locking though.
>> Plenty of drivers and the framework use cpufreq_cpu_get() to get a policy
>> that they consider valid and also use it as a way to check and reject API
>> calls trying to manipulate an offline CPU.
>>
>> Without this fix, the framework is advertising an incomplete policy object
>> as available. I think that breaks the CPUfreq framework APIs in a very
>> fundamental sense. This is a "no-no" in programming.
>>
>> It's like trying to register a CPUfreq driver before getting the clocks to
>> control the CPU.
>
> So the real question here is: What fields of 'policy' do we need to guarantee
> to be initialized before policy is made available for others? And probably
> that is what we need to fix.

That's going to be hard to define since future uses could be looking at
different fields. What is the API semantics of cpufreq_cpu_get(). I
can't ever imagine it being correct for any API to return a partially
initialized data structure.

> Even your current solution would break things. For example, below will happen
> before policy is set in per-cpu variable:
> - CPUFREQ_CREATE_POLICY notifier will be fired and calls to do cpufreq_cpu_get()
> there will fail. And hence cpufreq-stats sysfs will break.
I did this on 3.12. I'll fix it up to handle this one.

> - Governors also use cpufreq_cpu_get() and so those would also fail as they
> are started from cpufreq_init_policy()..
The only use of this in governors is in update_sampling_rate() and the
behavior after this fix would be correct in that case. If the policy is
not fully initialized -- that update doesn't get to go through.

All other uses of this API fall under one of these categories:
* CPUs are already fully offline whenever it's called.
* Already get the real policy as part of the notifier so don't need to
call cpufreq_cpu_get

If I find any that doesn't fit the above, I would be glad to fix that if
it's pointed out.

Regards,
Saravana
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-25 08:51:01

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 25 February 2014 01:53, Saravana Kannan <[email protected]> wrote:
> I was simplifying the scenario that causes it. We change the min/max using
> ADJUST notifiers for multiple reasons -- thermal being one of them.
>
> thermal/cpu_cooling is one example of it.

Just to understand the clear picture, you are actually hitting this bug? Or
is this only a theoretical bug?

> So, cpufreq_update_policy() can be called on any CPU. If that races with
> someone offlining a CPU and onlining it, you'll get this crash.

Then shouldn't that be fixed by locks? I think yes. That makes me agree with
Srivatsa more here.

Though I would say that your argument was also valid that 'policy' shouldn't be
up for sale unless it is prepared to. And for that reason only I
floated that question
earlier: What exactly we need to make sure is initialized in policy? Because
policy might keep changing in future as well and that needs locks to protect
that stuff. Like min/max/governor/ etc..

So, probably a solution here might be a mix of both. Initialize policy to this
minimum level and then make sure locking is used correctly..

> The idea would exist, but we can just call cpufreq_generic_get() and pass it
> policy->clk if it is not NULL. Does that work for you?

No. Not all drivers implement clk interface. And so clk doesn't look to be the
right parameter. I thought maybe 'policy' can be the right parameter and
then people can get use policy->cpu to get cpu id out of it.

But even that doesn't look to be a great idea. X86 drivers may share policy
structure for CPUs that don't actually share a clock line. And so they do need
right CPU number as parameter instead of policy. As they might be doing
some tricky stuff there. Also, we need to make sure that ->get() returns
the frequency at which CPU x is running.

>> So the real question here is: What fields of 'policy' do we need to
>> guarantee
>> to be initialized before policy is made available for others? And probably
>> that is what we need to fix.
>
> That's going to be hard to define since future uses could be looking at
> different fields. What is the API semantics of cpufreq_cpu_get(). I can't
> ever imagine it being correct for any API to return a partially initialized
> data structure.

I do agree that we can't broadcast 'policy' before it is usable. And that's
why I am asking about the right place where we are sure that it is ready..

>> Even your current solution would break things. For example, below will
>> happen
>> before policy is set in per-cpu variable:
>> - CPUFREQ_CREATE_POLICY notifier will be fired and calls to do
>> cpufreq_cpu_get()
>> there will fail. And hence cpufreq-stats sysfs will break.
>
> I did this on 3.12. I'll fix it up to handle this one.

This can be moved down without much issues I believe.

>> - Governors also use cpufreq_cpu_get() and so those would also fail as
>> they
>> are started from cpufreq_init_policy()..
>
> The only use of this in governors is in update_sampling_rate() and the
> behavior after this fix would be correct in that case. If the policy is not
> fully initialized -- that update doesn't get to go through.

hmm.. but even that looks odd. We have reached upto governors but
policy isn't available to be used? It should have been ready by now?

> All other uses of this API fall under one of these categories:
> * CPUs are already fully offline whenever it's called.
> * Already get the real policy as part of the notifier so don't need to
> call cpufreq_cpu_get
>
> If I find any that doesn't fit the above, I would be glad to fix that if
> it's pointed out.

I have just sent out a patchset to you and others, would be great if
you can give it a review/test ..

2014-02-25 12:53:32

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:
> On 25 February 2014 01:53, Saravana Kannan <[email protected]> wrote:
> > I was simplifying the scenario that causes it. We change the min/max using
> > ADJUST notifiers for multiple reasons -- thermal being one of them.
> >
> > thermal/cpu_cooling is one example of it.
>
> Just to understand the clear picture, you are actually hitting this bug? Or
> is this only a theoretical bug?
>
> > So, cpufreq_update_policy() can be called on any CPU. If that races with
> > someone offlining a CPU and onlining it, you'll get this crash.
>
> Then shouldn't that be fixed by locks? I think yes. That makes me agree with
> Srivatsa more here.
>
> Though I would say that your argument was also valid that 'policy' shouldn't be
> up for sale unless it is prepared to. And for that reason only I
> floated that question
> earlier: What exactly we need to make sure is initialized in policy? Because
> policy might keep changing in future as well and that needs locks to protect
> that stuff. Like min/max/governor/ etc..

Well, that depends on what the current users expect it to look like initially.
It should be initialized to the point in which all of them can handle it
correctly.

> So, probably a solution here might be a mix of both. Initialize policy to this
> minimum level and then make sure locking is used correctly..

Yes.

> > The idea would exist, but we can just call cpufreq_generic_get() and pass it
> > policy->clk if it is not NULL. Does that work for you?
>
> No. Not all drivers implement clk interface. And so clk doesn't look to be the
> right parameter. I thought maybe 'policy' can be the right parameter and
> then people can get use policy->cpu to get cpu id out of it.
>
> But even that doesn't look to be a great idea. X86 drivers may share policy
> structure for CPUs that don't actually share a clock line. And so they do need
> right CPU number as parameter instead of policy. As they might be doing
> some tricky stuff there. Also, we need to make sure that ->get() returns
> the frequency at which CPU x is running.

That's not going to work in at least some cases anyway, because for some types
of HW we simply can't retrieve the current frequency in a non-racy way.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-02-25 14:40:57

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 25 February 2014 18:34, Rafael J. Wysocki <[email protected]> wrote:
> Well, that depends on what the current users expect it to look like initially.
> It should be initialized to the point in which all of them can handle it
> correctly.

Exactly.

> That's not going to work in at least some cases anyway, because for some types
> of HW we simply can't retrieve the current frequency in a non-racy way.

I agree. But this is all that we can guarantee here :)

2014-02-25 21:11:32

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote:
> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:
>> On 25 February 2014 01:53, Saravana Kannan <[email protected]> wrote:
>>> I was simplifying the scenario that causes it. We change the min/max using
>>> ADJUST notifiers for multiple reasons -- thermal being one of them.
>>>
>>> thermal/cpu_cooling is one example of it.
>>
>> Just to understand the clear picture, you are actually hitting this bug? Or
>> is this only a theoretical bug?
>>
This is a real bug. But the actual caller of cpufreq_update_policy() is
a driver that's local to our tree. I'm just giving examples of upstream
code that act in a similar way.

>>> So, cpufreq_update_policy() can be called on any CPU. If that races with
>>> someone offlining a CPU and onlining it, you'll get this crash.
>>
>> Then shouldn't that be fixed by locks? I think yes. That makes me agree with
>> Srivatsa more here.
>>
>> Though I would say that your argument was also valid that 'policy' shouldn't be
>> up for sale unless it is prepared to. And for that reason only I
>> floated that question
>> earlier: What exactly we need to make sure is initialized in policy? Because
>> policy might keep changing in future as well and that needs locks to protect
>> that stuff. Like min/max/governor/ etc..
>
> Well, that depends on what the current users expect it to look like initially.
> It should be initialized to the point in which all of them can handle it
> correctly.

Yes, so let's not make it available until all of it is initialized. I
don't like the piece meal check. 6 months down the lane someone making
changes might not remember this. The problem also applies for drivers
that might not be upstreamed, etc.

>> So, probably a solution here might be a mix of both. Initialize policy to this
>> minimum level and then make sure locking is used correctly..
>
> Yes.

Rafael, It's not clear what you mean by the yes. Do you want to
initialize it partly and make it available. I think that's always wrong.

>>> The idea would exist, but we can just call cpufreq_generic_get() and pass it
>>> policy->clk if it is not NULL. Does that work for you?
>>
>> No. Not all drivers implement clk interface. And so clk doesn't look to be the
>> right parameter. I thought maybe 'policy' can be the right parameter and
>> then people can get use policy->cpu to get cpu id out of it.
>>
>> But even that doesn't look to be a great idea. X86 drivers may share policy
>> structure for CPUs that don't actually share a clock line. And so they do need
>> right CPU number as parameter instead of policy. As they might be doing
>> some tricky stuff there. Also, we need to make sure that ->get() returns
>> the frequency at which CPU x is running.
>
> That's not going to work in at least some cases anyway, because for some types
> of HW we simply can't retrieve the current frequency in a non-racy way.

I think there's been a misunderstanding of what I'm proposing. The
reference to policy->clk is only to get rid of the dependency that
cpufreq_generic_get() has on the per cpu policy variable being filled.
You can do that by just removing calls to get _IF_ clk is filled in.

Viresh,

I'll look at the patches later today and respond. Although, I would have
been nice you had helped me fix any issues with my patch than coming up
with new ones. Kinda removes to motivation for submitting patches upstream.

Regards,
Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-25 22:26:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On Tuesday, February 25, 2014 01:11:27 PM Saravana Kannan wrote:
> On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote:
> > On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:
> >> On 25 February 2014 01:53, Saravana Kannan <[email protected]> wrote:
> >>> I was simplifying the scenario that causes it. We change the min/max using
> >>> ADJUST notifiers for multiple reasons -- thermal being one of them.
> >>>
> >>> thermal/cpu_cooling is one example of it.
> >>
> >> Just to understand the clear picture, you are actually hitting this bug? Or
> >> is this only a theoretical bug?
> >>
> This is a real bug. But the actual caller of cpufreq_update_policy() is
> a driver that's local to our tree. I'm just giving examples of upstream
> code that act in a similar way.
>
> >>> So, cpufreq_update_policy() can be called on any CPU. If that races with
> >>> someone offlining a CPU and onlining it, you'll get this crash.
> >>
> >> Then shouldn't that be fixed by locks? I think yes. That makes me agree with
> >> Srivatsa more here.
> >>
> >> Though I would say that your argument was also valid that 'policy' shouldn't be
> >> up for sale unless it is prepared to. And for that reason only I
> >> floated that question
> >> earlier: What exactly we need to make sure is initialized in policy? Because
> >> policy might keep changing in future as well and that needs locks to protect
> >> that stuff. Like min/max/governor/ etc..
> >
> > Well, that depends on what the current users expect it to look like initially.
> > It should be initialized to the point in which all of them can handle it
> > correctly.
>
> Yes, so let's not make it available until all of it is initialized.

And is "fully initialized" actually well defined?

> I don't like the piece meal check. 6 months down the lane someone making
> changes might not remember this. The problem also applies for drivers
> that might not be upstreamed, etc.

Please don't bring up out-of-the-tree drivers argument in mainline discussions,
it is meaningless here.

> >> So, probably a solution here might be a mix of both. Initialize policy to this
> >> minimum level and then make sure locking is used correctly..
> >
> > Yes.
>
> Rafael, It's not clear what you mean by the yes. Do you want to
> initialize it partly and make it available. I think that's always wrong.

So what actually is your porposal?

> >>> The idea would exist, but we can just call cpufreq_generic_get() and pass it
> >>> policy->clk if it is not NULL. Does that work for you?
> >>
> >> No. Not all drivers implement clk interface. And so clk doesn't look to be the
> >> right parameter. I thought maybe 'policy' can be the right parameter and
> >> then people can get use policy->cpu to get cpu id out of it.
> >>
> >> But even that doesn't look to be a great idea. X86 drivers may share policy
> >> structure for CPUs that don't actually share a clock line. And so they do need
> >> right CPU number as parameter instead of policy. As they might be doing
> >> some tricky stuff there. Also, we need to make sure that ->get() returns
> >> the frequency at which CPU x is running.
> >
> > That's not going to work in at least some cases anyway, because for some types
> > of HW we simply can't retrieve the current frequency in a non-racy way.
>
> I think there's been a misunderstanding of what I'm proposing. The
> reference to policy->clk is only to get rid of the dependency that
> cpufreq_generic_get() has on the per cpu policy variable being filled.
> You can do that by just removing calls to get _IF_ clk is filled in.

I still have a little problem understanding what you mean exactly. At least
please explain the last sentence.

> Viresh,
>
> I'll look at the patches later today and respond. Although, I would have
> been nice you had helped me fix any issues with my patch than coming up
> with new ones. Kinda removes to motivation for submitting patches upstream.

Everyone is free to post patches at any time during the discussion. Viresh is
as well as you are.

Thanks!

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2014-02-26 01:48:41

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote:
> On Tuesday, February 25, 2014 01:11:27 PM Saravana Kannan wrote:
>> On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote:
>>> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:
>>>> On 25 February 2014 01:53, Saravana Kannan <[email protected]> wrote:
>>>>> I was simplifying the scenario that causes it. We change the min/max using
>>>>> ADJUST notifiers for multiple reasons -- thermal being one of them.
>>>>>
>>>>> thermal/cpu_cooling is one example of it.
>>>>
>>>> Just to understand the clear picture, you are actually hitting this bug? Or
>>>> is this only a theoretical bug?
>>>>
>> This is a real bug. But the actual caller of cpufreq_update_policy() is
>> a driver that's local to our tree. I'm just giving examples of upstream
>> code that act in a similar way.
>>
>>>>> So, cpufreq_update_policy() can be called on any CPU. If that races with
>>>>> someone offlining a CPU and onlining it, you'll get this crash.
>>>>
>>>> Then shouldn't that be fixed by locks? I think yes. That makes me agree with
>>>> Srivatsa more here.
>>>>
>>>> Though I would say that your argument was also valid that 'policy' shouldn't be
>>>> up for sale unless it is prepared to. And for that reason only I
>>>> floated that question
>>>> earlier: What exactly we need to make sure is initialized in policy? Because
>>>> policy might keep changing in future as well and that needs locks to protect
>>>> that stuff. Like min/max/governor/ etc..
>>>
>>> Well, that depends on what the current users expect it to look like initially.
>>> It should be initialized to the point in which all of them can handle it
>>> correctly.
>>
>> Yes, so let's not make it available until all of it is initialized.
>
> And is "fully initialized" actually well defined?

The point in add dev/hot plug path after which we will no longer change
policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU /
CPUFRE_NOTIFY notifiers.

Pretty much the end of __cpufreq_add_dev() so that it's after:
- cpufreq_init_policy()
- And the update of userpolicy fields that after thie init call

>> I don't like the piece meal check. 6 months down the lane someone making
>> changes might not remember this. The problem also applies for drivers
>> that might not be upstreamed, etc.
>
> Please don't bring up out-of-the-tree drivers argument in mainline discussions,
> it is meaningless here.

Sure. This also applies to in-tree code. The 6 months down the lane,
what fields are being looked at could still change and we might slip up
on making sure that the policy is not made available before that field
is initialized.

>>>> So, probably a solution here might be a mix of both. Initialize policy to this
>>>> minimum level and then make sure locking is used correctly..
>>>
>>> Yes.
>>
>> Rafael, It's not clear what you mean by the yes. Do you want to
>> initialize it partly and make it available. I think that's always wrong.
>
> So what actually is your porposal?
>

Same as reply to "fully initialized". We make the policy after it's
fully initialized. After that point, any changes to the policy can be
tracked by registering for CPUFREQ_UPDATE_POLICY_CPU / CPUFREQ_NOTIFY.

>>>>> The idea would exist, but we can just call cpufreq_generic_get() and pass it
>>>>> policy->clk if it is not NULL. Does that work for you?
>>>>
>>>> No. Not all drivers implement clk interface. And so clk doesn't look to be the
>>>> right parameter. I thought maybe 'policy' can be the right parameter and
>>>> then people can get use policy->cpu to get cpu id out of it.
>>>>
>>>> But even that doesn't look to be a great idea. X86 drivers may share policy
>>>> structure for CPUs that don't actually share a clock line. And so they do need
>>>> right CPU number as parameter instead of policy. As they might be doing
>>>> some tricky stuff there. Also, we need to make sure that ->get() returns
>>>> the frequency at which CPU x is running.
>>>
>>> That's not going to work in at least some cases anyway, because for some types
>>> of HW we simply can't retrieve the current frequency in a non-racy way.
>>
>> I think there's been a misunderstanding of what I'm proposing. The
>> reference to policy->clk is only to get rid of the dependency that
>> cpufreq_generic_get() has on the per cpu policy variable being filled.
>> You can do that by just removing calls to get _IF_ clk is filled in.
>
> I still have a little problem understanding what you mean exactly. At least
> please explain the last sentence.

Ok, here's some pseudo code to explain it better:

Something like, replace all calls to cpufreq_driver->get with
__cpufreq_driver_get() with the fn being something like:

unsigned int __cpufreq_driver_get(cpufreq_policy *policy)
{
if (policy->clk)
return clk_get_rate(policy->clk) / 1000;
else
return cpufreq_driver->get(policy->cpu);
}

That way, we don't have to depend on cpufreq_cpu_get() to return a
policy in cpufreq_generic_get() for the existing add dev code path to
complete without an error (after my patch to advertise the policy struct
after it's fully initialized)

This also avoid unnecessary function calls to ->get() when we just as
well know that it's going to call back into cpufreq_generic_get() and
get the clock rate. We can just do that in the framework.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-26 03:38:46

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call

__cpufreq_stats_create_table always gets pass the valid and real policy
struct. So, there's no need to call cpufreq_cpu_get() to get the policy
again.

Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71
Signed-off-by: Saravana Kannan <[email protected]>
---

Viresh/Rafael,

These 3 patches is the approximate code I have in mind.

Approximate because:
* I inserted one question as a comment into the code.
* If the patch doesn't have any bugs, the plan is to remove
cpufreq_generic_get() and references to it.

This takes care of the "don't advertise before it's ready for use" rule.

Viresh,

I think the locking updates needs to be done in addition to this.

Regards,
Saravana

drivers/cpufreq/cpufreq_stats.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index 5793e14..e4bd27f 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
{
unsigned int i, j, count = 0, ret = 0;
struct cpufreq_stats *stat;
- struct cpufreq_policy *current_policy;
unsigned int alloc_size;
unsigned int cpu = policy->cpu;
if (per_cpu(cpufreq_stats_table, cpu))
@@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
if ((stat) == NULL)
return -ENOMEM;

- current_policy = cpufreq_cpu_get(cpu);
- if (current_policy == NULL) {
- ret = -EINVAL;
- goto error_get_fail;
- }
-
- ret = sysfs_create_group(&current_policy->kobj, &stats_attr_group);
+ ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
if (ret)
goto error_out;

@@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
stat->last_time = get_jiffies_64();
stat->last_index = freq_table_get_index(stat, policy->cur);
spin_unlock(&cpufreq_stats_lock);
- cpufreq_cpu_put(current_policy);
return 0;
error_out:
- cpufreq_cpu_put(current_policy);
-error_get_fail:
kfree(stat);
per_cpu(cpufreq_stats_table, cpu) = NULL;
return ret;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-26 03:39:26

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table()

Remove sysfs group if __cpufreq_stats_create_table() fails after creating
one.

Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/cpufreq/cpufreq_stats.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index e4bd27f..c52b440 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL);
if (!stat->time_in_state) {
ret = -ENOMEM;
- goto error_out;
+ goto error_alloc;
}
stat->freq_table = (unsigned int *)(stat->time_in_state + count);

@@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
stat->last_index = freq_table_get_index(stat, policy->cur);
spin_unlock(&cpufreq_stats_lock);
return 0;
+error_alloc:
+ sysfs_remove_group(&policy->kobj, &stats_attr_group);
error_out:
kfree(stat);
per_cpu(cpufreq_stats_table, cpu) = NULL;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-26 03:39:28

by Saravana Kannan

[permalink] [raw]
Subject: [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done

The existing code sets the per CPU policy to a non-NULL value before all
the steps performed during the hotplug online path is done. Specifically,
this is done before the policy min/max, governors, etc are initialized for
the policy. This in turn means that calls to cpufreq_cpu_get() return a
non-NULL policy before the policy/CPU is ready to be used.

To fix this, move the update of per CPU policy to a valid value after all
the initialization steps for the policy are completed.

Example kernel panic without this fix:
[ 512.146185] Unable to handle kernel NULL pointer dereference at virtual address 00000020
[ 512.146195] pgd = c0003000
[ 512.146213] [00000020] *pgd=80000000004003, *pmd=00000000
[ 512.146228] Internal error: Oops: 206 [#1] PREEMPT SMP ARM
<snip>
[ 512.146297] PC is at __cpufreq_governor+0x10/0x1ac
[ 512.146312] LR is at cpufreq_update_policy+0x114/0x150
<snip>
[ 512.149740] ---[ end trace f23a8defea6cd706 ]---
[ 512.149761] Kernel panic - not syncing: Fatal exception
[ 513.152016] CPU0: stopping
[ 513.154710] CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396
<snip>
[ 513.317224] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
[ 513.327809] [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
[ 513.339182] [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
[ 513.349594] [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
[ 513.359231] [<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[ 513.369560] [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
[ 513.379978] [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
[ 513.389704] [<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
[ 513.398728] [<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
[ 513.406797] [<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
[ 513.414357] [<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
[ 513.422253] [<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
[ 513.431195] [<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
[ 513.439958] [<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
[ 513.447947] [<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)

In this specific case, thread A set's CPU1's policy->governor in
cpufreq_init_policy() to NULL while thread B is using the policy->governor in
__cpufreq_governor().

Change-Id: I0f6f4e51ac3b7127a1ea56a1cb8e7ae1bcf8d6b6
Signed-off-by: Saravana Kannan <[email protected]>
---
drivers/cpufreq/cpufreq.c | 52 ++++++++++++++++++++++++++++-------------------
1 file changed, 31 insertions(+), 21 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index cb003a6..5caefa9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -849,7 +849,7 @@ static int cpufreq_add_dev_interface(struct cpufreq_policy *policy,
goto err_out_kobj_put;
drv_attr++;
}
- if (cpufreq_driver->get) {
+ if (cpufreq_driver->get || policy->clk) {
ret = sysfs_create_file(&policy->kobj, &cpuinfo_cur_freq.attr);
if (ret)
goto err_out_kobj_put;
@@ -877,6 +877,22 @@ err_out_kobj_put:
return ret;
}

+static unsigned int __cpufreq_get_freq(struct cpufreq_policy *policy)
+{
+ unsigned long freq;
+
+ if (policy->clk) {
+ freq = clk_get_rate(policy->clk);
+ if(!IS_ERR_VALUE(freq))
+ return freq / 1000;
+ }
+
+ if (cpufreq_driver->get)
+ return cpufreq_driver->get(policy->cpu);
+
+ return 0;
+}
+
static void cpufreq_init_policy(struct cpufreq_policy *policy)
{
struct cpufreq_policy new_policy;
@@ -1109,17 +1125,10 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
goto err_set_policy_cpu;
}

- write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus)
- per_cpu(cpufreq_cpu_data, j) = policy;
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
- if (cpufreq_driver->get) {
- policy->cur = cpufreq_driver->get(policy->cpu);
- if (!policy->cur) {
- pr_err("%s: ->get() failed\n", __func__);
- goto err_get_freq;
- }
+ policy->cur = __cpufreq_get_freq(policy);
+ if (!policy->cur) {
+ pr_err("%s: get freq failed\n", __func__);
+ goto err_get_freq;
}

/*
@@ -1207,6 +1216,11 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
policy->user_policy.governor = policy->governor;
}

+ write_lock_irqsave(&cpufreq_driver_lock, flags);
+ for_each_cpu(j, policy->cpus)
+ per_cpu(cpufreq_cpu_data, j) = policy;
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
kobject_uevent(&policy->kobj, KOBJ_ADD);
up_read(&cpufreq_rwsem);

@@ -1216,11 +1230,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,

err_out_unregister:
err_get_freq:
- write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus)
- per_cpu(cpufreq_cpu_data, j) = NULL;
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
if (cpufreq_driver->exit)
cpufreq_driver->exit(policy);
err_set_policy_cpu:
@@ -1482,6 +1491,8 @@ unsigned int cpufreq_quick_get(unsigned int cpu)
struct cpufreq_policy *policy;
unsigned int ret_freq = 0;

+ /* What's up with the setpolicy check? Why the call to get only for
+ * arch's that implement setpolicy? */
if (cpufreq_driver && cpufreq_driver->setpolicy && cpufreq_driver->get)
return cpufreq_driver->get(cpu);

@@ -1520,11 +1531,10 @@ static unsigned int __cpufreq_get(unsigned int cpu)
struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
unsigned int ret_freq = 0;

- if (!cpufreq_driver->get)
+ ret_freq = __cpufreq_get_freq(policy);
+ if (!ret_freq)
return ret_freq;

- ret_freq = cpufreq_driver->get(cpu);
-
if (ret_freq && policy->cur &&
!(cpufreq_driver->flags & CPUFREQ_CONST_LOOPS)) {
/* verify no discrepancy between actual and
@@ -2148,7 +2158,7 @@ int cpufreq_update_policy(unsigned int cpu)
* BIOS might change freq behind our back
* -> ask driver for current freq and notify governors about a change
*/
- if (cpufreq_driver->get) {
+ if (cpufreq_driver->get || policy->clk) {
new_policy.cur = cpufreq_driver->get(cpu);
if (!policy->cur) {
pr_debug("Driver did not initialize current freq");
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-26 05:07:00

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call

On 26 February 2014 09:08, Saravana Kannan <[email protected]> wrote:
> __cpufreq_stats_create_table always gets pass the valid and real policy
> struct. So, there's no need to call cpufreq_cpu_get() to get the policy
> again.
>
> Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71

??

> Signed-off-by: Saravana Kannan <[email protected]>
> ---
>
> Viresh/Rafael,
>
> These 3 patches is the approximate code I have in mind.
>
> Approximate because:
> * I inserted one question as a comment into the code.
> * If the patch doesn't have any bugs, the plan is to remove
> cpufreq_generic_get() and references to it.
>
> This takes care of the "don't advertise before it's ready for use" rule.
>
> Viresh,
>
> I think the locking updates needs to be done in addition to this.
>
> Regards,
> Saravana
>
> drivers/cpufreq/cpufreq_stats.c | 12 +-----------
> 1 file changed, 1 insertion(+), 11 deletions(-)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index 5793e14..e4bd27f 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
> {
> unsigned int i, j, count = 0, ret = 0;
> struct cpufreq_stats *stat;
> - struct cpufreq_policy *current_policy;
> unsigned int alloc_size;
> unsigned int cpu = policy->cpu;
> if (per_cpu(cpufreq_stats_table, cpu))
> @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
> if ((stat) == NULL)
> return -ENOMEM;
>
> - current_policy = cpufreq_cpu_get(cpu);
> - if (current_policy == NULL) {
> - ret = -EINVAL;
> - goto error_get_fail;
> - }
> -
> - ret = sysfs_create_group(&current_policy->kobj, &stats_attr_group);
> + ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
> if (ret)
> goto error_out;
>
> @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
> stat->last_time = get_jiffies_64();
> stat->last_index = freq_table_get_index(stat, policy->cur);
> spin_unlock(&cpufreq_stats_lock);
> - cpufreq_cpu_put(current_policy);
> return 0;
> error_out:
> - cpufreq_cpu_put(current_policy);
> -error_get_fail:
> kfree(stat);
> per_cpu(cpufreq_stats_table, cpu) = NULL;
> return ret;

I was damn sure that this wasn't a waste of time. This was some meaningful
code when I visited it earlier. And we absolutely required a new
cpufreq_cpu_get()..

Reason: Earlier tables were created for this notifier: CPUFREQ_NOTIFY and
it used to come with another changed copy of 'policy' and so we were required
to get the real copy of policy to get to the right kobj.

But recently I have simplified stuff there and these tables are now added with
CPUFREQ_CREATE_POLICY and so this replication isn't required anymore.

So, Acked-by: Viresh Kumar <[email protected]>

While you are at it please get this part into __cpufreq_stats_create_table()
routine:

table = cpufreq_frequency_get_table(cpu);
if (!table)
return 0;

As it is replicated at two places currently.

2014-02-26 05:11:40

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 2/3] cpufreq: stats: Fix error handling in __cpufreq_stats_create_table()

On 26 February 2014 09:08, Saravana Kannan <[email protected]> wrote:
> Remove sysfs group if __cpufreq_stats_create_table() fails after creating
> one.
>
> Change-Id: Icb0b44424cc4eb6c88be255e2839ef51c3f8779c
> Signed-off-by: Saravana Kannan <[email protected]>
> ---
> drivers/cpufreq/cpufreq_stats.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index e4bd27f..c52b440 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -216,7 +216,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
> stat->time_in_state = kzalloc(alloc_size, GFP_KERNEL);
> if (!stat->time_in_state) {
> ret = -ENOMEM;
> - goto error_out;
> + goto error_alloc;
> }
> stat->freq_table = (unsigned int *)(stat->time_in_state + count);
>
> @@ -237,6 +237,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
> stat->last_index = freq_table_get_index(stat, policy->cur);
> spin_unlock(&cpufreq_stats_lock);
> return 0;
> +error_alloc:
> + sysfs_remove_group(&policy->kobj, &stats_attr_group);
> error_out:
> kfree(stat);
> per_cpu(cpufreq_stats_table, cpu) = NULL;

Acked-by: Viresh Kumar <[email protected]>

2014-02-26 05:20:46

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 26 February 2014 02:41, Saravana Kannan <[email protected]> wrote:
> On 02/25/2014 05:04 AM, Rafael J. Wysocki wrote:
>> On Tuesday, February 25, 2014 02:20:57 PM Viresh Kumar wrote:

> I think there's been a misunderstanding of what I'm proposing. The reference
> to policy->clk is only to get rid of the dependency that
> cpufreq_generic_get() has on the per cpu policy variable being filled. You
> can do that by just removing calls to get _IF_ clk is filled in.

cpufreq_cpu_get() can be called by other drivers as well, which may not have
clock interface implemented for them. We can't stop them from calling it.

> I'll look at the patches later today and respond. Although, I would have
> been nice you had helped me fix any issues with my patch than coming up with
> new ones. Kinda removes to motivation for submitting patches upstream.

Sorry if I demotivated you at all :)

I just wanted to get to a quick-fix and wasn't interested in getting
my patch count
up. Thought that isn't always bad :)

I sent my patches as they were very different then your approach. Obviously, I
wouldn't have done this otherwise :)

2014-02-26 06:02:22

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 26 February 2014 07:18, Saravana Kannan <[email protected]> wrote:
> On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote:

>> And is "fully initialized" actually well defined?
>
> The point in add dev/hot plug path after which we will no longer change
> policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU /
> CPUFRE_NOTIFY notifiers.

Okay..

> Pretty much the end of __cpufreq_add_dev() so that it's after:
> - cpufreq_init_policy()
> - And the update of userpolicy fields that after thie init call

No. In that case it can be considered initialized before cpufreq_init_policy().
As we do send CPUFREQ_NOTIFY after that from cpufreq_init_policy()->
cpufreq_set_policy().

There are two types of fields within policy, some are very basic: cpu/min/max/
affected_cpus/related_cpus

some are advanced: sysfs/governors/..

And as a rule you have to get policy->rwsem lock before accessing policy
members. We might not have followed it very well for small things like cpu.

And so if you are doing anything over that, please use a lock and that is
already present in cpufreq_update_policy().

With my latest patchset that I sent yesterday, locking is improved and now
a policy will be usable only after the rwsem is released. And that should be
fine. And so making it available in the per-cpu variable after all the necessary
fields are filled looks fine to me. And so I don't think we need to move it
after call to cpufreq_init_policy(maybe a better name to this function is
required)..

> Ok, here's some pseudo code to explain it better:
>
> Something like, replace all calls to cpufreq_driver->get with
> __cpufreq_driver_get() with the fn being something like:
>
> unsigned int __cpufreq_driver_get(cpufreq_policy *policy)
> {
> if (policy->clk)
> return clk_get_rate(policy->clk) / 1000;
> else
> return cpufreq_driver->get(policy->cpu);

This part may still use cpufreq_cpu_get().

> }

Drivers are free to have their implementation of ->get() even
if they have a valid policy->clk field..

2014-02-26 06:14:08

by Viresh Kumar

[permalink] [raw]
Subject: Re: [PATCH 3/3] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 26 February 2014 09:08, Saravana Kannan <[email protected]> wrote:
> The existing code sets the per CPU policy to a non-NULL value before all
> the steps performed during the hotplug online path is done. Specifically,
> this is done before the policy min/max, governors, etc are initialized for
> the policy. This in turn means that calls to cpufreq_cpu_get() return a
> non-NULL policy before the policy/CPU is ready to be used.

First two patches are fine but I would still say that take the three patches
I posted instead of this. Reasoning already given in other mails.

2014-02-26 20:04:49

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH 1/3] cpufreq: stats: Remove redundant cpufreq_cpu_get() call

On 02/25/2014 09:06 PM, Viresh Kumar wrote:
> On 26 February 2014 09:08, Saravana Kannan <[email protected]> wrote:
>> __cpufreq_stats_create_table always gets pass the valid and real policy
>> struct. So, there's no need to call cpufreq_cpu_get() to get the policy
>> again.
>>
>> Change-Id: I0136b3e67018ee3af2335906407f55d8c6219f71
>
> ??
>
>> Signed-off-by: Saravana Kannan <[email protected]>
>> ---
>>
>> Viresh/Rafael,
>>
>> These 3 patches is the approximate code I have in mind.
>>
>> Approximate because:
>> * I inserted one question as a comment into the code.
>> * If the patch doesn't have any bugs, the plan is to remove
>> cpufreq_generic_get() and references to it.
>>
>> This takes care of the "don't advertise before it's ready for use" rule.
>>
>> Viresh,
>>
>> I think the locking updates needs to be done in addition to this.
>>
>> Regards,
>> Saravana
>>
>> drivers/cpufreq/cpufreq_stats.c | 12 +-----------
>> 1 file changed, 1 insertion(+), 11 deletions(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
>> index 5793e14..e4bd27f 100644
>> --- a/drivers/cpufreq/cpufreq_stats.c
>> +++ b/drivers/cpufreq/cpufreq_stats.c
>> @@ -185,7 +185,6 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
>> {
>> unsigned int i, j, count = 0, ret = 0;
>> struct cpufreq_stats *stat;
>> - struct cpufreq_policy *current_policy;
>> unsigned int alloc_size;
>> unsigned int cpu = policy->cpu;
>> if (per_cpu(cpufreq_stats_table, cpu))
>> @@ -194,13 +193,7 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
>> if ((stat) == NULL)
>> return -ENOMEM;
>>
>> - current_policy = cpufreq_cpu_get(cpu);
>> - if (current_policy == NULL) {
>> - ret = -EINVAL;
>> - goto error_get_fail;
>> - }
>> -
>> - ret = sysfs_create_group(&current_policy->kobj, &stats_attr_group);
>> + ret = sysfs_create_group(&policy->kobj, &stats_attr_group);
>> if (ret)
>> goto error_out;
>>
>> @@ -243,11 +236,8 @@ static int __cpufreq_stats_create_table(struct cpufreq_policy *policy,
>> stat->last_time = get_jiffies_64();
>> stat->last_index = freq_table_get_index(stat, policy->cur);
>> spin_unlock(&cpufreq_stats_lock);
>> - cpufreq_cpu_put(current_policy);
>> return 0;
>> error_out:
>> - cpufreq_cpu_put(current_policy);
>> -error_get_fail:
>> kfree(stat);
>> per_cpu(cpufreq_stats_table, cpu) = NULL;
>> return ret;
>
> I was damn sure that this wasn't a waste of time. This was some meaningful
> code when I visited it earlier. And we absolutely required a new
> cpufreq_cpu_get()..
>
> Reason: Earlier tables were created for this notifier: CPUFREQ_NOTIFY and
> it used to come with another changed copy of 'policy' and so we were required
> to get the real copy of policy to get to the right kobj.
>
> But recently I have simplified stuff there and these tables are now added with
> CPUFREQ_CREATE_POLICY and so this replication isn't required anymore.

Agreed. I already knew it had a good reason. :) Just that it's not
needed anymore.

> So, Acked-by: Viresh Kumar <[email protected]>

Thanks.

>
> While you are at it please get this part into __cpufreq_stats_create_table()
> routine:
>
> table = cpufreq_frequency_get_table(cpu);
> if (!table)
> return 0;
>
> As it is replicated at two places currently.
>

Doing it as a separate patch since it's technically unrelated to these
changes.

-Saravana

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

2014-02-26 20:20:29

by Saravana Kannan

[permalink] [raw]
Subject: Re: [PATCH] cpufreq: Set policy to non-NULL only after all hotplug online work is done

On 02/25/2014 10:02 PM, Viresh Kumar wrote:
> On 26 February 2014 07:18, Saravana Kannan <[email protected]> wrote:
>> On 02/25/2014 02:41 PM, Rafael J. Wysocki wrote:
>
>>> And is "fully initialized" actually well defined?
>>
>> The point in add dev/hot plug path after which we will no longer change
>> policy fields without sending further CPUFREQ_UPDATE_POLICY_CPU /
>> CPUFRE_NOTIFY notifiers.
>
> Okay..
>
>> Pretty much the end of __cpufreq_add_dev() so that it's after:
>> - cpufreq_init_policy()
>> - And the update of userpolicy fields that after thie init call
>
> No. In that case it can be considered initialized before cpufreq_init_policy().
> As we do send CPUFREQ_NOTIFY after that from cpufreq_init_policy()->
> cpufreq_set_policy().

Ok, valid hole in my definition of "fully initialized".
>
> There are two types of fields within policy, some are very basic: cpu/min/max/
> affected_cpus/related_cpus
>
> some are advanced: sysfs/governors/..
>
> And as a rule you have to get policy->rwsem lock before accessing policy
> members. We might not have followed it very well for small things like cpu.
>
> And so if you are doing anything over that, please use a lock and that is
> already present in cpufreq_update_policy().
>
> With my latest patchset that I sent yesterday, locking is improved and now
> a policy will be usable only after the rwsem is released. And that should be
> fine. And so making it available in the per-cpu variable after all the necessary
> fields are filled looks fine to me. And so I don't think we need to move it
> after call to cpufreq_init_policy(maybe a better name to this function is
> required)..

I'll take a closer look. Internal tree cpufreq code is in 3.12, so
back-porting all the cpufreq changes and testing it can take a bit of
time. Will get back on this.

-Saravana


--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation