LinuxLists.cc - [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

2013-06-30 18:56:21

Subject: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On 06/30/2013 10:35 PM, Toralf Förster wrote:
> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>> Toralf, can you please
>> try out the below patch and see if it improves anything? (Don't revert anything,
>> just apply the below diff on a problematic kernel and see if it solves your
>> issue).
>
> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
>

Cool! So here is the proper patch, with changelog added.

Regards,
Srivatsa S. Bhat

-----------------------------------------------------------------------------

From: Srivatsa S. Bhat <[email protected]>
Subject: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

Toralf Förster reported that the cpufreq ondemand governor behaves erratically
(doesn't scale well) after a suspend/resume cycle. The problem was that the
cpufreq subsystem's idea of the cpu frequencies differed from the actual
frequencies set in the hardware after a suspend/resume cycle. Toralf bisected
the problem to commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume).

Among other (harmless) things, that commit skipped the call to
cpufreq_update_policy() in the resume path. But cpufreq_update_policy() plays
an important role during resume, because it is responsible for checking if
the BIOS changed the cpu frequencies behind our back and resynchronize the
cpufreq subsystem's knowledge of the cpu frequencies, and update them
accordingly.

So, restore the call to cpufreq_update_policy() in the resume path to fix
the cpufreq regression.

Reported-by: Toralf Förster <[email protected]>
Tested-by: Toralf Förster <[email protected]>
Signed-off-by: Srivatsa S. Bhat <[email protected]>
---

drivers/cpufreq/cpufreq_stats.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index fb65dec..591b6fb 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -349,6 +349,7 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,

switch (action) {
case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
cpufreq_update_policy(cpu);
break;
case CPU_DOWN_PREPARE:

2013-06-30 22:36:33

by Rafael J. Wysocki

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On Monday, July 01, 2013 12:22:47 AM Srivatsa S. Bhat wrote:
> On 06/30/2013 10:35 PM, Toralf Förster wrote:
> > On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
> >> Toralf, can you please
> >> try out the below patch and see if it improves anything? (Don't revert anything,
> >> just apply the below diff on a problematic kernel and see if it solves your
> >> issue).
> >
> > applied on top of a66b2e5 - issue went away (either fixed or hidden now)
> >
>
> Cool! So here is the proper patch, with changelog added.
>
> Regards,
> Srivatsa S. Bhat
>
>
> -----------------------------------------------------------------------------
>
> From: Srivatsa S. Bhat <[email protected]>
> Subject: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume
>
> Toralf Förster reported that the cpufreq ondemand governor behaves erratically
> (doesn't scale well) after a suspend/resume cycle. The problem was that the
> cpufreq subsystem's idea of the cpu frequencies differed from the actual
> frequencies set in the hardware after a suspend/resume cycle. Toralf bisected
> the problem to commit a66b2e5 (cpufreq: Preserve sysfs files across
> suspend/resume).
>
> Among other (harmless) things, that commit skipped the call to
> cpufreq_update_policy() in the resume path. But cpufreq_update_policy() plays
> an important role during resume, because it is responsible for checking if
> the BIOS changed the cpu frequencies behind our back and resynchronize the
> cpufreq subsystem's knowledge of the cpu frequencies, and update them
> accordingly.
>
> So, restore the call to cpufreq_update_policy() in the resume path to fix
> the cpufreq regression.
>
> Reported-by: Toralf Förster <[email protected]>
> Tested-by: Toralf Förster <[email protected]>
> Signed-off-by: Srivatsa S. Bhat <[email protected]>

Thanks Srivatsa, I'll queue it up as 3.11 (and 3.10-stable) material.

Thanks,
Rafael

> ---
>
> drivers/cpufreq/cpufreq_stats.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index fb65dec..591b6fb 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -349,6 +349,7 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,
>
> switch (action) {
> case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> cpufreq_update_policy(cpu);
> break;
> case CPU_DOWN_PREPARE:
>
>
--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-10 20:50:15

by Toralf Förster

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

I tested the patch several times on top of a66b2e5 - the origin issue is
fixed but - erratically another issue now appears : all 4 cores are runs
after wakeup at 2.6 GHz.
The temporary hot fix is to switch between governor performance and
ondemand for all 4 cores.

On 06/30/2013 08:52 PM, Srivatsa S. Bhat wrote:
> On 06/30/2013 10:35 PM, Toralf Förster wrote:
>> On 06/30/2013 06:33 PM, Srivatsa S. Bhat wrote:
>>> Toralf, can you please
>>> try out the below patch and see if it improves anything? (Don't revert anything,
>>> just apply the below diff on a problematic kernel and see if it solves your
>>> issue).
>>
>> applied on top of a66b2e5 - issue went away (either fixed or hidden now)
>>
>
> Cool! So here is the proper patch, with changelog added.
>
> Regards,
> Srivatsa S. Bhat
>
>
> -----------------------------------------------------------------------------
>
> From: Srivatsa S. Bhat <[email protected]>
> Subject: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume
>
> Toralf Förster reported that the cpufreq ondemand governor behaves erratically
> (doesn't scale well) after a suspend/resume cycle. The problem was that the
> cpufreq subsystem's idea of the cpu frequencies differed from the actual
> frequencies set in the hardware after a suspend/resume cycle. Toralf bisected
> the problem to commit a66b2e5 (cpufreq: Preserve sysfs files across
> suspend/resume).
>
> Among other (harmless) things, that commit skipped the call to
> cpufreq_update_policy() in the resume path. But cpufreq_update_policy() plays
> an important role during resume, because it is responsible for checking if
> the BIOS changed the cpu frequencies behind our back and resynchronize the
> cpufreq subsystem's knowledge of the cpu frequencies, and update them
> accordingly.
>
> So, restore the call to cpufreq_update_policy() in the resume path to fix
> the cpufreq regression.
>
> Reported-by: Toralf Förster <[email protected]>
> Tested-by: Toralf Förster <[email protected]>
> Signed-off-by: Srivatsa S. Bhat <[email protected]>
> ---
>
> drivers/cpufreq/cpufreq_stats.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
> index fb65dec..591b6fb 100644
> --- a/drivers/cpufreq/cpufreq_stats.c
> +++ b/drivers/cpufreq/cpufreq_stats.c
> @@ -349,6 +349,7 @@ static int __cpuinit cpufreq_stat_cpu_callback(struct notifier_block *nfb,
>
> switch (action) {
> case CPU_ONLINE:
> + case CPU_ONLINE_FROZEN:
> cpufreq_update_policy(cpu);
> break;
> case CPU_DOWN_PREPARE:
>
>
>

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3

2013-07-10 22:33:28

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

Hi Toralf,

On 07/11/2013 02:20 AM, Toralf Förster wrote:
> I tested the patch several times on top of a66b2e5 - the origin issue is
> fixed but - erratically another issue now appears : all 4 cores are runs
> after wakeup at 2.6 GHz.
> The temporary hot fix is to switch between governor performance and
> ondemand for all 4 cores.
>
>

Thanks for all your testing efforts. The commit a66b2e5 took a shortcut to
achieve its goals but failed subtly in many aspects. Below is a patch (only
compile-tested) which tries to achieve the goal (preserve sysfs files) in
the proper way, by separating out the cpufreq-core bits from the sysfs bits.
You might want to give it a try on current mainline and see if it improves
anything.

Regards,
Srivatsa S. Bhat

(Applies on current mainline)
---

drivers/cpufreq/cpufreq.c | 144 ++++++++++++++++++++++++++-------------------
1 file changed, 82 insertions(+), 62 deletions(-)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 6a015ad..28c690f 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -834,11 +834,8 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
struct cpufreq_policy *policy,
struct device *dev)
{
- struct cpufreq_policy new_policy;
struct freq_attr **drv_attr;
- unsigned long flags;
int ret = 0;
- unsigned int j;

/* prepare interface data */
ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
@@ -870,31 +867,10 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
goto err_out_kobj_put;
}

- write_lock_irqsave(&cpufreq_driver_lock, flags);
- for_each_cpu(j, policy->cpus) {
- per_cpu(cpufreq_cpu_data, j) = policy;
- per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
- }
- write_unlock_irqrestore(&cpufreq_driver_lock, flags);
-
ret = cpufreq_add_dev_symlink(cpu, policy);
if (ret)
goto err_out_kobj_put;

- memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
- /* assure that the starting sequence is run in __cpufreq_set_policy */
- policy->governor = NULL;
-
- /* set default policy */
- ret = __cpufreq_set_policy(policy, &new_policy);
- policy->user_policy.policy = policy->policy;
- policy->user_policy.governor = policy->governor;
-
- if (ret) {
- pr_debug("setting policy failed\n");
- if (cpufreq_driver->exit)
- cpufreq_driver->exit(policy);
- }
return ret;

err_out_kobj_put:
@@ -905,7 +881,7 @@ err_out_kobj_put:

#ifdef CONFIG_HOTPLUG_CPU
static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
- struct device *dev)
+ struct device *dev, bool init_sysfs)
{
struct cpufreq_policy *policy;
int ret = 0, has_target = !!cpufreq_driver->target;
@@ -933,30 +909,25 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
__cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
}

- ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
- if (ret) {
- cpufreq_cpu_put(policy);
- return ret;
+ if (init_sysfs) {
+ ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
+ if (ret) {
+ cpufreq_cpu_put(policy);
+ return ret;
+ }
}

return 0;
}
#endif

-/**
- * cpufreq_add_dev - add a CPU device
- *
- * Adds the cpufreq interface for a CPU device.
- *
- * The Oracle says: try running cpufreq registration/unregistration concurrently
- * with with cpu hotplugging and all hell will break loose. Tried to clean this
- * mess up, but more thorough testing is needed. - Mathieu
- */
-static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
+static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
+ bool init_sysfs)
{
unsigned int j, cpu = dev->id;
int ret = -ENOMEM;
struct cpufreq_policy *policy;
+ struct cpufreq_policy new_policy;
unsigned long flags;
#ifdef CONFIG_HOTPLUG_CPU
struct cpufreq_governor *gov;
@@ -984,7 +955,8 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
- return cpufreq_add_policy_cpu(cpu, sibling, dev);
+ return cpufreq_add_policy_cpu(cpu, sibling, dev,
+ init_sysfs);
}
}
read_unlock_irqrestore(&cpufreq_driver_lock, flags);
@@ -1049,9 +1021,33 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
}
#endif

- ret = cpufreq_add_dev_interface(cpu, policy, dev);
- if (ret)
- goto err_out_unregister;
+ write_lock_irqsave(&cpufreq_driver_lock, flags);
+ for_each_cpu(j, policy->cpus) {
+ per_cpu(cpufreq_cpu_data, j) = policy;
+ per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
+ }
+ write_unlock_irqrestore(&cpufreq_driver_lock, flags);
+
+ if (init_sysfs) {
+ ret = cpufreq_add_dev_interface(cpu, policy, dev);
+ if (ret)
+ goto err_out_unregister;
+ }
+
+ memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
+ /* assure that the starting sequence is run in __cpufreq_set_policy */
+ policy->governor = NULL;
+
+ /* set default policy */
+ ret = __cpufreq_set_policy(policy, &new_policy);
+ policy->user_policy.policy = policy->policy;
+ policy->user_policy.governor = policy->governor;
+
+ if (ret) {
+ pr_debug("setting policy failed\n");
+ if (cpufreq_driver->exit)
+ cpufreq_driver->exit(policy);
+ }

kobject_uevent(&policy->kobj, KOBJ_ADD);
module_put(cpufreq_driver->owner);
@@ -1081,6 +1077,20 @@ module_out:
return ret;
}

+/**
+ * cpufreq_add_dev - add a CPU device
+ *
+ * Adds the cpufreq interface for a CPU device.
+ *
+ * The Oracle says: try running cpufreq registration/unregistration concurrently
+ * with with cpu hotplugging and all hell will break loose. Tried to clean this
+ * mess up, but more thorough testing is needed. - Mathieu
+ */
+static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
+{
+ return __cpufreq_add_dev(dev, sif, true);
+}
+
static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
{
int j;
@@ -1106,7 +1116,7 @@ static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
* This routine frees the rwsem before returning.
*/
static int __cpufreq_remove_dev(struct device *dev,
- struct subsys_interface *sif)
+ struct subsys_interface *sif, bool remove_sysfs)
{
unsigned int cpu = dev->id, ret, cpus;
unsigned long flags;
@@ -1145,9 +1155,9 @@ static int __cpufreq_remove_dev(struct device *dev,
cpumask_clear_cpu(cpu, data->cpus);
unlock_policy_rwsem_write(cpu);

- if (cpu != data->cpu) {
+ if (cpu != data->cpu && remove_sysfs) {
sysfs_remove_link(&dev->kobj, "cpufreq");
- } else if (cpus > 1) {
+ } else if (cpus > 1 && remove_sysfs) {
/* first sibling now owns the new sysfs dir */
cpu_dev = get_cpu_device(cpumask_first(data->cpus));
sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
@@ -1184,26 +1194,25 @@ static int __cpufreq_remove_dev(struct device *dev,

/* If cpu is last user of policy, free policy */
if (cpus == 1) {
- lock_policy_rwsem_read(cpu);
- kobj = &data->kobj;
- cmp = &data->kobj_unregister;
- unlock_policy_rwsem_read(cpu);
- kobject_put(kobj);
-
- /* we need to make sure that the underlying kobj is actually
- * not referenced anymore by anybody before we proceed with
- * unloading.
- */
- pr_debug("waiting for dropping of refcount\n");
- wait_for_completion(cmp);
- pr_debug("wait complete\n");
-
if (cpufreq_driver->exit)
cpufreq_driver->exit(data);

free_cpumask_var(data->related_cpus);
free_cpumask_var(data->cpus);
kfree(data);
+
+ if (remove_sysfs) {
+ lock_policy_rwsem_read(cpu);
+ kobj = &data->kobj;
+ cmp = &data->kobj_unregister;
+ unlock_policy_rwsem_read(cpu);
+ kobject_put(kobj);
+
+ pr_debug("waiting for dropping of refcount\n");
+ wait_for_completion(cmp);
+ pr_debug("wait complete\n");
+ }
+
} else if (cpufreq_driver->target) {
__cpufreq_governor(data, CPUFREQ_GOV_START);
__cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
@@ -1221,7 +1230,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
if (cpu_is_offline(cpu))
return 0;

- retval = __cpufreq_remove_dev(dev, sif);
+ retval = __cpufreq_remove_dev(dev, sif, true);
return retval;
}

@@ -1943,13 +1952,24 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb,
case CPU_ONLINE:
cpufreq_add_dev(dev, NULL);
break;
+ case CPU_ONLINE_FROZEN:
+ __cpufreq_add_dev(dev, NULL, false);
+ break;
+
case CPU_DOWN_PREPARE:
case CPU_UP_CANCELED_FROZEN:
- __cpufreq_remove_dev(dev, NULL);
+ __cpufreq_remove_dev(dev, NULL, true);
+ break;
+ case CPU_DOWN_PREPARE_FROZEN:
+ __cpufreq_remove_dev(dev, NULL, false);
break;
+
case CPU_DOWN_FAILED:
cpufreq_add_dev(dev, NULL);
break;
+ case CPU_DOWN_FAILED_FROZEN:
+ __cpufreq_add_dev(dev, NULL, false);
+ break;
}
}
return NOTIFY_OK;

2013-07-11 05:41:04

by Tianyu Lan

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

2013/7/11 Srivatsa S. Bhat <[email protected]>:
> Hi Toralf,
>
> On 07/11/2013 02:20 AM, Toralf F?rster wrote:
>> I tested the patch several times on top of a66b2e5 - the origin issue is
>> fixed but - erratically another issue now appears : all 4 cores are runs
>> after wakeup at 2.6 GHz.
>> The temporary hot fix is to switch between governor performance and
>> ondemand for all 4 cores.
>>
>>
>
> Thanks for all your testing efforts. The commit a66b2e5 took a shortcut to
> achieve its goals but failed subtly in many aspects. Below is a patch (only
> compile-tested) which tries to achieve the goal (preserve sysfs files) in
> the proper way, by separating out the cpufreq-core bits from the sysfs bits.
> You might want to give it a try on current mainline and see if it improves
> anything.
>
> Regards,
> Srivatsa S. Bhat
>
>
> (Applies on current mainline)
> ---
>
> drivers/cpufreq/cpufreq.c | 144 ++++++++++++++++++++++++++-------------------
> 1 file changed, 82 insertions(+), 62 deletions(-)
>
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 6a015ad..28c690f 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -834,11 +834,8 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
> struct cpufreq_policy *policy,
> struct device *dev)
> {
> - struct cpufreq_policy new_policy;
> struct freq_attr **drv_attr;
> - unsigned long flags;
> int ret = 0;
> - unsigned int j;
>
> /* prepare interface data */
> ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq,
> @@ -870,31 +867,10 @@ static int cpufreq_add_dev_interface(unsigned int cpu,
> goto err_out_kobj_put;
> }
>
> - write_lock_irqsave(&cpufreq_driver_lock, flags);
> - for_each_cpu(j, policy->cpus) {
> - per_cpu(cpufreq_cpu_data, j) = policy;
> - per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
> - }
> - write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> -
> ret = cpufreq_add_dev_symlink(cpu, policy);
> if (ret)
> goto err_out_kobj_put;
>
> - memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
> - /* assure that the starting sequence is run in __cpufreq_set_policy */
> - policy->governor = NULL;
> -
> - /* set default policy */
> - ret = __cpufreq_set_policy(policy, &new_policy);
> - policy->user_policy.policy = policy->policy;
> - policy->user_policy.governor = policy->governor;
> -
> - if (ret) {
> - pr_debug("setting policy failed\n");
> - if (cpufreq_driver->exit)
> - cpufreq_driver->exit(policy);
> - }
> return ret;
>
> err_out_kobj_put:
> @@ -905,7 +881,7 @@ err_out_kobj_put:
>
> #ifdef CONFIG_HOTPLUG_CPU
> static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
> - struct device *dev)
> + struct device *dev, bool init_sysfs)
> {
> struct cpufreq_policy *policy;
> int ret = 0, has_target = !!cpufreq_driver->target;
> @@ -933,30 +909,25 @@ static int cpufreq_add_policy_cpu(unsigned int cpu, unsigned int sibling,
> __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
> }
>
> - ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> - if (ret) {
> - cpufreq_cpu_put(policy);
> - return ret;
> + if (init_sysfs) {
> + ret = sysfs_create_link(&dev->kobj, &policy->kobj, "cpufreq");
> + if (ret) {
> + cpufreq_cpu_put(policy);
> + return ret;
> + }
> }
>
> return 0;
> }
> #endif
>
> -/**
> - * cpufreq_add_dev - add a CPU device
> - *
> - * Adds the cpufreq interface for a CPU device.
> - *
> - * The Oracle says: try running cpufreq registration/unregistration concurrently
> - * with with cpu hotplugging and all hell will break loose. Tried to clean this
> - * mess up, but more thorough testing is needed. - Mathieu
> - */
> -static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> +static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
> + bool init_sysfs)
> {
> unsigned int j, cpu = dev->id;
> int ret = -ENOMEM;
> struct cpufreq_policy *policy;
> + struct cpufreq_policy new_policy;
> unsigned long flags;
> #ifdef CONFIG_HOTPLUG_CPU
> struct cpufreq_governor *gov;
> @@ -984,7 +955,8 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> struct cpufreq_policy *cp = per_cpu(cpufreq_cpu_data, sibling);
> if (cp && cpumask_test_cpu(cpu, cp->related_cpus)) {
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> - return cpufreq_add_policy_cpu(cpu, sibling, dev);
> + return cpufreq_add_policy_cpu(cpu, sibling, dev,
> + init_sysfs);
> }
> }
> read_unlock_irqrestore(&cpufreq_driver_lock, flags);
> @@ -1049,9 +1021,33 @@ static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> }
> #endif
>
> - ret = cpufreq_add_dev_interface(cpu, policy, dev);
> - if (ret)
> - goto err_out_unregister;
> + write_lock_irqsave(&cpufreq_driver_lock, flags);
> + for_each_cpu(j, policy->cpus) {
> + per_cpu(cpufreq_cpu_data, j) = policy;
> + per_cpu(cpufreq_policy_cpu, j) = policy->cpu;
> + }
> + write_unlock_irqrestore(&cpufreq_driver_lock, flags);
> +
> + if (init_sysfs) {
> + ret = cpufreq_add_dev_interface(cpu, policy, dev);
> + if (ret)
> + goto err_out_unregister;
> + }
> +
> + memcpy(&new_policy, policy, sizeof(struct cpufreq_policy));
> + /* assure that the starting sequence is run in __cpufreq_set_policy */
> + policy->governor = NULL;
> +
> + /* set default policy */
> + ret = __cpufreq_set_policy(policy, &new_policy);
> + policy->user_policy.policy = policy->policy;
> + policy->user_policy.governor = policy->governor;
> +
> + if (ret) {
> + pr_debug("setting policy failed\n");
> + if (cpufreq_driver->exit)
> + cpufreq_driver->exit(policy);
> + }
>
> kobject_uevent(&policy->kobj, KOBJ_ADD);
> module_put(cpufreq_driver->owner);
> @@ -1081,6 +1077,20 @@ module_out:
> return ret;
> }
>
> +/**
> + * cpufreq_add_dev - add a CPU device
> + *
> + * Adds the cpufreq interface for a CPU device.
> + *
> + * The Oracle says: try running cpufreq registration/unregistration concurrently
> + * with with cpu hotplugging and all hell will break loose. Tried to clean this
> + * mess up, but more thorough testing is needed. - Mathieu
> + */
> +static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
> +{
> + return __cpufreq_add_dev(dev, sif, true);
> +}
> +
> static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> {
> int j;
> @@ -1106,7 +1116,7 @@ static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
> * This routine frees the rwsem before returning.
> */
> static int __cpufreq_remove_dev(struct device *dev,
> - struct subsys_interface *sif)
> + struct subsys_interface *sif, bool remove_sysfs)
> {
> unsigned int cpu = dev->id, ret, cpus;
> unsigned long flags;
> @@ -1145,9 +1155,9 @@ static int __cpufreq_remove_dev(struct device *dev,
> cpumask_clear_cpu(cpu, data->cpus);
> unlock_policy_rwsem_write(cpu);
>
> - if (cpu != data->cpu) {
> + if (cpu != data->cpu && remove_sysfs) {
> sysfs_remove_link(&dev->kobj, "cpufreq");
> - } else if (cpus > 1) {
> + } else if (cpus > 1 && remove_sysfs) {
> /* first sibling now owns the new sysfs dir */
> cpu_dev = get_cpu_device(cpumask_first(data->cpus));
> sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
> @@ -1184,26 +1194,25 @@ static int __cpufreq_remove_dev(struct device *dev,
>
> /* If cpu is last user of policy, free policy */
> if (cpus == 1) {
> - lock_policy_rwsem_read(cpu);
> - kobj = &data->kobj;
> - cmp = &data->kobj_unregister;
> - unlock_policy_rwsem_read(cpu);
> - kobject_put(kobj);
> -
> - /* we need to make sure that the underlying kobj is actually
> - * not referenced anymore by anybody before we proceed with
> - * unloading.
> - */
> - pr_debug("waiting for dropping of refcount\n");
> - wait_for_completion(cmp);
> - pr_debug("wait complete\n");
> -
> if (cpufreq_driver->exit)
> cpufreq_driver->exit(data);
>
> free_cpumask_var(data->related_cpus);
> free_cpumask_var(data->cpus);
> kfree(data);
> +
> + if (remove_sysfs) {
> + lock_policy_rwsem_read(cpu);
> + kobj = &data->kobj;
> + cmp = &data->kobj_unregister;

This looks no right. "data" has already been freed but data-> kobj still is
referenced.

> + unlock_policy_rwsem_read(cpu);
> + kobject_put(kobj);
> +
> + pr_debug("waiting for dropping of refcount\n");
> + wait_for_completion(cmp);
> + pr_debug("wait complete\n");
> + }
> +
> } else if (cpufreq_driver->target) {
> __cpufreq_governor(data, CPUFREQ_GOV_START);
> __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
> @@ -1221,7 +1230,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
> if (cpu_is_offline(cpu))
> return 0;
>
> - retval = __cpufreq_remove_dev(dev, sif);
> + retval = __cpufreq_remove_dev(dev, sif, true);
> return retval;
> }
>
> @@ -1943,13 +1952,24 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb,
> case CPU_ONLINE:
> cpufreq_add_dev(dev, NULL);
> break;
> + case CPU_ONLINE_FROZEN:
> + __cpufreq_add_dev(dev, NULL, false);
> + break;
> +
> case CPU_DOWN_PREPARE:
> case CPU_UP_CANCELED_FROZEN:
> - __cpufreq_remove_dev(dev, NULL);
> + __cpufreq_remove_dev(dev, NULL, true);
> + break;
> + case CPU_DOWN_PREPARE_FROZEN:
> + __cpufreq_remove_dev(dev, NULL, false);
> break;
> +
> case CPU_DOWN_FAILED:
> cpufreq_add_dev(dev, NULL);
> break;
> + case CPU_DOWN_FAILED_FROZEN:
> + __cpufreq_add_dev(dev, NULL, false);
> + break;
> }
> }
> return NOTIFY_OK;
>
> --
> To unsubscribe from this list: send the line "unsubscribe cpufreq" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best regards
Tianyu Lan

2013-07-11 06:26:57

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On 07/11/2013 11:10 AM, Lan Tianyu wrote:
> 2013/7/11 Srivatsa S. Bhat <[email protected]>:
>> Hi Toralf,
>>
>> On 07/11/2013 02:20 AM, Toralf F?rster wrote:
>>> I tested the patch several times on top of a66b2e5 - the origin issue is
>>> fixed but - erratically another issue now appears : all 4 cores are runs
>>> after wakeup at 2.6 GHz.
>>> The temporary hot fix is to switch between governor performance and
>>> ondemand for all 4 cores.
>>>
>>>
>>
>> Thanks for all your testing efforts. The commit a66b2e5 took a shortcut to
>> achieve its goals but failed subtly in many aspects. Below is a patch (only
>> compile-tested) which tries to achieve the goal (preserve sysfs files) in
>> the proper way, by separating out the cpufreq-core bits from the sysfs bits.
>> You might want to give it a try on current mainline and see if it improves
>> anything.
>>
>> Regards,
>> Srivatsa S. Bhat
>>
>>
>> (Applies on current mainline)
>> ---
>>
>> drivers/cpufreq/cpufreq.c | 144 ++++++++++++++++++++++++++-------------------
>> 1 file changed, 82 insertions(+), 62 deletions(-)
>>
[...]
>> +/**
>> + * cpufreq_add_dev - add a CPU device
>> + *
>> + * Adds the cpufreq interface for a CPU device.
>> + *
>> + * The Oracle says: try running cpufreq registration/unregistration concurrently
>> + * with with cpu hotplugging and all hell will break loose. Tried to clean this
>> + * mess up, but more thorough testing is needed. - Mathieu
>> + */
>> +static int cpufreq_add_dev(struct device *dev, struct subsys_interface *sif)
>> +{
>> + return __cpufreq_add_dev(dev, sif, true);
>> +}
>> +
>> static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
>> {
>> int j;
>> @@ -1106,7 +1116,7 @@ static void update_policy_cpu(struct cpufreq_policy *policy, unsigned int cpu)
>> * This routine frees the rwsem before returning.
>> */
>> static int __cpufreq_remove_dev(struct device *dev,
>> - struct subsys_interface *sif)
>> + struct subsys_interface *sif, bool remove_sysfs)
>> {
>> unsigned int cpu = dev->id, ret, cpus;
>> unsigned long flags;
>> @@ -1145,9 +1155,9 @@ static int __cpufreq_remove_dev(struct device *dev,
>> cpumask_clear_cpu(cpu, data->cpus);
>> unlock_policy_rwsem_write(cpu);
>>
>> - if (cpu != data->cpu) {
>> + if (cpu != data->cpu && remove_sysfs) {
>> sysfs_remove_link(&dev->kobj, "cpufreq");
>> - } else if (cpus > 1) {
>> + } else if (cpus > 1 && remove_sysfs) {
>> /* first sibling now owns the new sysfs dir */
>> cpu_dev = get_cpu_device(cpumask_first(data->cpus));
>> sysfs_remove_link(&cpu_dev->kobj, "cpufreq");
>> @@ -1184,26 +1194,25 @@ static int __cpufreq_remove_dev(struct device *dev,
>>
>> /* If cpu is last user of policy, free policy */
>> if (cpus == 1) {
>> - lock_policy_rwsem_read(cpu);
>> - kobj = &data->kobj;
>> - cmp = &data->kobj_unregister;
>> - unlock_policy_rwsem_read(cpu);
>> - kobject_put(kobj);
>> -
>> - /* we need to make sure that the underlying kobj is actually
>> - * not referenced anymore by anybody before we proceed with
>> - * unloading.
>> - */
>> - pr_debug("waiting for dropping of refcount\n");
>> - wait_for_completion(cmp);
>> - pr_debug("wait complete\n");
>> -
>> if (cpufreq_driver->exit)
>> cpufreq_driver->exit(data);
>>
>> free_cpumask_var(data->related_cpus);
>> free_cpumask_var(data->cpus);
>> kfree(data);
>> +
>> + if (remove_sysfs) {
>> + lock_policy_rwsem_read(cpu);
>> + kobj = &data->kobj;
>> + cmp = &data->kobj_unregister;
>
> This looks no right. "data" has already been freed but data-> kobj still is
> referenced.
>

Oops! You are right. Hmm, this looks quite difficult to get right :(
There are multiple challenges here:

1. The sysfs files must not be removed during cpu_down, and not initialized

during cpu_up. That would help us preserve the file permissions.
2. But we should ensure that we really do the cpufreq-core parts of the cpu
initialization during cpu_up. If we fail to free some of the data-structures
during cpu_down, the cpu_up callback will think that a full-init is not
required and not do its job. That will make cpufreq behave erratically after
suspend/resume and take us back to square one.

3. A full re-init in the cpu_up callback also involves memory allocations.
So if we don't release the memory in the cpu_down callback, we'll end up
in a memory leak.

I tried to address all these in this patch, but you found yet another serious
loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
to get this right, then I'm all ears. Else, we'll just revert the original
commit like Rafael suggested and leave it upto userspace to save and restore
the permissions across suspend/resume if it wants ;-(

>> + unlock_policy_rwsem_read(cpu);
>> + kobject_put(kobj);
>> +
>> + pr_debug("waiting for dropping of refcount\n");
>> + wait_for_completion(cmp);
>> + pr_debug("wait complete\n");
>> + }
>> +
>> } else if (cpufreq_driver->target) {
>> __cpufreq_governor(data, CPUFREQ_GOV_START);
>> __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
>> @@ -1221,7 +1230,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>> if (cpu_is_offline(cpu))
>> return 0;
>>
>> - retval = __cpufreq_remove_dev(dev, sif);
>> + retval = __cpufreq_remove_dev(dev, sif, true);
>> return retval;
>> }

Regards,
Srivatsa S. Bhat

2013-07-11 14:03:42

by Tianyu Lan

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

2013/7/11 Srivatsa S. Bhat <[email protected]>:
> On 07/11/2013 11:10 AM, Lan Tianyu wrote:
>> 2013/7/11 Srivatsa S. Bhat <[email protected]>:
> Oops! You are right. Hmm, this looks quite difficult to get right :(
> There are multiple challenges here:

If I understand correctly, the most concern is how to deal with per cpus'
cpufreq_policy structure. If something wrong , please correct me.

>
> 1. The sysfs files must not be removed during cpu_down, and not initialized
>
> during cpu_up. That would help us preserve the file permissions.

For this case, cpufreq_policy must be reserved since all related cpufreq data
and kobj is also store into it. We can't release it.

> 2. But we should ensure that we really do the cpufreq-core parts of the cpu
> initialization during cpu_up. If we fail to free some of the data-structures
> during cpu_down, the cpu_up callback will think that a full-init is not
> required and not do its job. That will make cpufreq behave erratically after
> suspend/resume and take us back to square one.
>

cpufreq_add_dev() checks whether the cpu has been attached cpufreq_cpu_data
and associated kobj has been created. If yes, it would try to release it by
cpufreq_cpu_put() and return directly. This seems to be conflicted
with the above.

> 3. A full re-init in the cpu_up callback also involves memory allocations.
> So if we don't release the memory in the cpu_down callback, we'll end up
> in a memory leak.
>

Even we remove the previous check, cpu scaling driver's init() also will change
some data of struct cpufreq_policy and this is also what we don't like to see.

> I tried to address all these in this patch, but you found yet another serious
> loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
> to get this right, then I'm all ears. Else, we'll just revert the original
> commit like Rafael suggested and leave it upto userspace to save and restore
> the permissions across suspend/resume if it wants ;-(
>

How about implement scaling driver's suspend/resume callback()? Although this
needs to be dealt with case by case. If one's callbacks hasn't been implemented,
it would have to follow current rule.

>>> + unlock_policy_rwsem_read(cpu);
>>> + kobject_put(kobj);
>>> +
>>> + pr_debug("waiting for dropping of refcount\n");
>>> + wait_for_completion(cmp);
>>> + pr_debug("wait complete\n");
>>> + }
>>> +
>>> } else if (cpufreq_driver->target) {
>>> __cpufreq_governor(data, CPUFREQ_GOV_START);
>>> __cpufreq_governor(data, CPUFREQ_GOV_LIMITS);
>>> @@ -1221,7 +1230,7 @@ static int cpufreq_remove_dev(struct device *dev, struct subsys_interface *sif)
>>> if (cpu_is_offline(cpu))
>>> return 0;
>>>
>>> - retval = __cpufreq_remove_dev(dev, sif);
>>> + retval = __cpufreq_remove_dev(dev, sif, true);
>>> return retval;
>>> }
>
> Regards,
> Srivatsa S. Bhat
>

--
Best regards
Tianyu Lan

2013-07-11 14:23:21

by Alan Stern

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On Thu, 11 Jul 2013, Srivatsa S. Bhat wrote:

> Oops! You are right. Hmm, this looks quite difficult to get right :(
> There are multiple challenges here:
>
> 1. The sysfs files must not be removed during cpu_down, and not initialized
>
> during cpu_up. That would help us preserve the file permissions.
> 2. But we should ensure that we really do the cpufreq-core parts of the cpu
> initialization during cpu_up. If we fail to free some of the data-structures
> during cpu_down, the cpu_up callback will think that a full-init is not
> required and not do its job. That will make cpufreq behave erratically after
> suspend/resume and take us back to square one.
>
> 3. A full re-init in the cpu_up callback also involves memory allocations.
> So if we don't release the memory in the cpu_down callback, we'll end up
> in a memory leak.
>
> I tried to address all these in this patch, but you found yet another serious
> loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
> to get this right, then I'm all ears. Else, we'll just revert the original
> commit like Rafael suggested and leave it upto userspace to save and restore
> the permissions across suspend/resume if it wants ;-(

Asking as a naive outsider who is completely unfamiliar with the code,
why are any of these things at all troublesome?

Can't cpu_up tell the difference between activating a brand-new
CPU and reactivating one that was present before but was
temporarily disabled?

Doesn't cpu_up know which data structures get freed when an
active CPU is temporarily deactivated?

Doesn't cpu_down know what memory gets allocated in cpu_up?
Can't it deallocate just the right parts for the type of
transition it is doing?

It sounds like you're really asking how to make sure that cpu_up and
cpu_down both know what the other is doing, so that each can do the
opposite of the other. That doesn't sound hard.

Alan Stern

2013-07-11 14:28:16

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On 07/11/2013 07:33 PM, Lan Tianyu wrote:
> 2013/7/11 Srivatsa S. Bhat <[email protected]>:

>> I tried to address all these in this patch, but you found yet another serious
>> loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
>> to get this right, then I'm all ears. Else, we'll just revert the original
>> commit like Rafael suggested and leave it upto userspace to save and restore
>> the permissions across suspend/resume if it wants ;-(
>>
>
> How about implement scaling driver's suspend/resume callback()? Although this
> needs to be dealt with case by case. If one's callbacks hasn't been implemented,
> it would have to follow current rule.
>

Well, I'm now trying a slightly different approach at reorganizing the code,
and so far I think I'll be able to get it right this time. Let's see how it goes.

Regards,
Srivatsa S. Bhat

2013-07-11 14:42:29

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On 07/11/2013 07:53 PM, Alan Stern wrote:
> On Thu, 11 Jul 2013, Srivatsa S. Bhat wrote:
>
>> Oops! You are right. Hmm, this looks quite difficult to get right :(
>> There are multiple challenges here:
>>
>> 1. The sysfs files must not be removed during cpu_down, and not initialized
>>
>> during cpu_up. That would help us preserve the file permissions.
>> 2. But we should ensure that we really do the cpufreq-core parts of the cpu
>> initialization during cpu_up. If we fail to free some of the data-structures
>> during cpu_down, the cpu_up callback will think that a full-init is not
>> required and not do its job. That will make cpufreq behave erratically after
>> suspend/resume and take us back to square one.
>>
>> 3. A full re-init in the cpu_up callback also involves memory allocations.
>> So if we don't release the memory in the cpu_down callback, we'll end up
>> in a memory leak.
>>
>> I tried to address all these in this patch, but you found yet another serious
>> loop-hole. I guess I'm out of ideas now... if anybody has any thoughts on how
>> to get this right, then I'm all ears. Else, we'll just revert the original
>> commit like Rafael suggested and leave it upto userspace to save and restore
>> the permissions across suspend/resume if it wants ;-(
>
> Asking as a naive outsider who is completely unfamiliar with the code,
> why are any of these things at all troublesome?
>
> Can't cpu_up tell the difference between activating a brand-new
> CPU and reactivating one that was present before but was
> temporarily disabled?
>
> Doesn't cpu_up know which data structures get freed when an
> active CPU is temporarily deactivated?
>
> Doesn't cpu_down know what memory gets allocated in cpu_up?
> Can't it deallocate just the right parts for the type of
> transition it is doing?
>
> It sounds like you're really asking how to make sure that cpu_up and
> cpu_down both know what the other is doing, so that each can do the
> opposite of the other. That doesn't sound hard.
>

It would not have been hard if the code had been already structured that
way. Currently, the code is quite a bit entangled, and it doesn't distinguish
the "temporarily deactivated" and the "fully torn-down" cases. To add to the
mess, the code is generously sprinkled with notifiers that are invoked every
now and then (which depend on previous init steps etc).

But the bottomline is that this is purely a code-reorganization issue,
nothing more than that. And hopefully we'll be able to separate out the
"temporarily deactivated" and the "fully down" cases reasonably well, such
that we can preserve the sysfs file permissions easily across suspend/resume.
That's the code reorganization I'm working on at the moment; I'll post it
out as soon as I'm done.

Regards,
Srivatsa S. Bhat

2013-07-13 10:16:39

by Paul Bolle

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On Wed, 2013-07-10 at 22:50 +0200, Toralf Förster wrote:
> I tested the patch several times on top of a66b2e5 - the origin issue is
> fixed

Srivatsa's patch became commit f51e1eb63d ("cpufreq: Fix cpufreq
regression after suspend/resume").

> but - erratically another issue now appears : all 4 cores are runs
> after wakeup at 2.6 GHz.

Well, a laptop I use seems to run into something related: the second of
its two cores can get stuck at a fixed frequency after resume. Often
it's its maximum frequency. That makes the CPU run hot, without actually
doing much work. But it can also get stuck at its lowest frequency.

Please note that commit f51e1eb63d, which is part of v3.10.1-rc1,
doesn't seem to cause this behavior. This issue can already be seen when
running v3.10.0. So perhaps that commit just made this issue more
noticeable to Toralf. But it's not clear to me what Toralf's origin(al?)
issue actually was.

Anyhow, I suspect that the stuck frequency is a regression introduced in
v3.10.0. But I have been unable to pinpoint it to a commit added in the
v3.10 cycle through bisecting. I must have marked a commit good that was
actually bad, just because a few suspend and resume cycles didn't
trigger this issue. To be continued...

> The temporary hot fix is to switch between governor performance and
> ondemand for all 4 cores.

That workaround works here too. I switch the core that is stuck at a
particular frequency to some other governor and then back to ondemand.

Thanks,

Paul Bolle

2013-07-13 12:52:56

by Paul Bolle

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On Sat, 2013-07-13 at 12:16 +0200, Paul Bolle wrote:
> I suspect that the stuck frequency is a regression introduced in v3.10.0.

The culprit apparently is commit a66b2e503f ("cpufreq: Preserve sysfs
files across suspend/resume"). Srivatsa submitted a patch to revert that
commit (see https://lkml.org/lkml/2013/7/11/661 ). That revert seems to
fix this issue too.

Paul Bolle

2013-07-15 06:17:22

by Srivatsa S. Bhat

[permalink] [raw]

Subject: Re: [PATCH] cpufreq: Fix cpufreq regression after suspend/resume

On 07/13/2013 06:22 PM, Paul Bolle wrote:
> On Sat, 2013-07-13 at 12:16 +0200, Paul Bolle wrote:
>> I suspect that the stuck frequency is a regression introduced in v3.10.0.
>
> The culprit apparently is commit a66b2e503f ("cpufreq: Preserve sysfs
> files across suspend/resume"). Srivatsa submitted a patch to revert that
> commit (see https://lkml.org/lkml/2013/7/11/661 ). That revert seems to
> fix this issue too.
>
>

Thanks a lot for your tests and for confirming that the complete revert
fixes your cpufreq issue.

Regards,
Srivatsa S. Bhat