Subject: Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
> > Hi Srivatsa,
> >
> > I may be wrong but it looks like something is wrong in this patch.
> >
> > On 12 July 2013 03:47, Srivatsa S. Bhat
> > <[email protected]> wrote:
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >
> >> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
> >> if ((cpus == 1) && (cpufreq_driver->target))
> >> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
> >>
> >> - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> - cpufreq_cpu_put(data);
> >> + if (!frozen) {
> >> + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
> >> + cpufreq_cpu_put(data);
> >
> > So, we don't decrement usage count here. But we are still increasing
> > counts on cpufreq_add_dev after resume, isn't it?
> >
> > So, we wouldn't be able to free policy struct once all the cpus of a
> > policy are removed after suspend/resume has happened once.
> >
>
> Actually even I was wondering about this while writing the patch and
> I even tested shutdown after multiple suspend/resume cycles, to verify that
> the refcount is messed up. But surprisingly, things worked just fine.
>
> Logically there should've been a refcount mismatch and things should have
> failed, but everything worked fine during my tests. Apart from suspend/resume
> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
> (echo 0/1 to sysfs online files), but nothing stood out.
>
> Sorry, I forgot to document this in the patch. Either the patch is wrong
> or something else is silently fixing this up. Not sure what is the exact
> situation.

OK, so I'm not going to queue [2-8/8] up until we find out what's going on
here (and until Toralf tells me that it doesn't break his system any more).

I've queued up [1/8] for 3.11 already.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2013-07-15 11:27:43

by Rafael J. Wysocki

Subject: Re: [PATCH 2/8] cpufreq: Fix misplaced call to cpufreq_update_policy()

On Monday, July 15, 2013 11:50:24 AM Srivatsa S. Bhat wrote:
> On 07/12/2013 12:36 PM, Viresh Kumar wrote:
> > On 12 July 2013 03:45, Srivatsa S. Bhat
> > <[email protected]> wrote:
> >> The call to cpufreq_update_policy() is placed in the CPU hotplug callback
> >> of cpufreq_stats, which has a higher priority than the CPU hotplug callback
> >> of cpufreq-core. As a result, during CPU_ONLINE/CPU_ONLINE_FROZEN, we end up
> >> calling cpufreq_update_policy() *before* calling cpufreq_add_dev() !
> >> And for uninitialized CPUs, it just returns silently, not doing anything.
> >
> > Hmm..
> >
> >> To add to it, cpufreq_stats is not even the right place to call
> >> cpufreq_update_policy() to begin with. The cpufreq core ought to handle
> >> this in its own callback, from an elegance/relevance perspective.
> >>
> >> So move the invocation of cpufreq_update_policy() to cpufreq_cpu_callback,
> >> and place it *after* cpufreq_add_dev().
> >>
> >> Signed-off-by: Srivatsa S. Bhat <[email protected]>
> >> ---
> >>
> >> drivers/cpufreq/cpufreq.c | 1 +
> >> drivers/cpufreq/cpufreq_stats.c | 6 ------
> >> 2 files changed, 1 insertion(+), 6 deletions(-)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index ccc6eab..f8c3100 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -1943,6 +1943,7 @@ static int __cpuinit cpufreq_cpu_callback(struct notifier_block *nfb,
> >> case CPU_ONLINE:
> >> case CPU_ONLINE_FROZEN:
> >> cpufreq_add_dev(dev, NULL);
> >> + cpufreq_update_policy(cpu);
> >
> > Do we need to call this for every hotplug of cpu? I am not
> > talking about suspend/resume here.
> >
>
> I don't think we need to, but I think it would be better to postpone
> optimizations until all the cpufreq regressions get fixed. Later perhaps
> we could revisit these minor optimizations if desired.

Agreed.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
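
The ordering problem described in the quoted changelog can be sketched with a toy notifier chain. This is a userspace model under stated assumptions: the `model_*` names, the callback bodies and the priority values 0 and 1 are hypothetical; the only facts taken from the changelog are that the cpufreq_stats callback has the higher priority and that an update on an uninitialized CPU returns without doing anything.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy notifier block: callbacks run from highest to lowest priority. */
struct model_notifier {
	void (*call)(void);
	int priority;
	struct model_notifier *next;
};

static bool policy_ready;      /* set once the modelled add_dev has run */
static int effective_updates;  /* updates that found an initialized policy */
static bool update_after_add;  /* true models the patched core callback */

static void model_update_policy(void)
{
	if (policy_ready)      /* uninitialized CPU: return silently */
		effective_updates++;
}

static void stats_cb_old(void) { model_update_policy(); }
static void stats_cb_new(void) { /* update call removed from stats */ }

static void core_cb(void)
{
	policy_ready = true;   /* stands in for cpufreq_add_dev() */
	if (update_after_add)
		model_update_policy();
}

/* Insert so that higher-priority entries come first in the chain. */
static void model_register(struct model_notifier **head,
			   struct model_notifier *nb)
{
	while (*head && (*head)->priority >= nb->priority)
		head = &(*head)->next;
	nb->next = *head;
	*head = nb;
}

/* Replays one CPU_ONLINE event; returns how many updates took effect. */
int model_cpu_online(bool patched)
{
	struct model_notifier *head = NULL;
	struct model_notifier core = { .call = core_cb, .priority = 0 };
	struct model_notifier stats = {
		.call = patched ? stats_cb_new : stats_cb_old,
		.priority = 1,
	};
	struct model_notifier *nb;

	policy_ready = false;
	effective_updates = 0;
	update_after_add = patched;

	model_register(&head, &core);
	model_register(&head, &stats);

	for (nb = head; nb; nb = nb->next)
		nb->call();
	return effective_updates;
}
```

With `patched` false, the stats callback runs first and its update is a no-op, so no update takes effect. With `patched` true, the update happens in the core callback after the policy exists.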

2013-07-15 11:56:31

by Srivatsa S. Bhat

Subject: Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

On 07/15/2013 03:51 PM, Viresh Kumar wrote:
> On 15 July 2013 15:35, Srivatsa S. Bhat
> <[email protected]> wrote:
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>
> What kind of system have you tested it on?
>

The system has 2 sockets with 8 cores each, and has Intel Sandy Bridge
CPUs. I had used a local patch to simulate CPU hotplug in the suspend-to-ram
path using the freeze state of pm_test (because I had other problems in
using the 'processors' state of pm_test). The patch is shown below:

diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index ece0422..fe07b77 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -342,8 +342,13 @@ static int enter_state(suspend_state_t state)
 	if (error)
 		goto Unlock;
 
-	if (suspend_test(TEST_FREEZER))
+	if (suspend_test(TEST_FREEZER)) {
+		pr_debug("Disabling nonboot CPUs\n");
+		disable_nonboot_cpus();
+		pr_debug("Enabling nonboot CPUs\n");
+		enable_nonboot_cpus();
 		goto Finish;
+	}
 
 	pr_debug("PM: Entering %s sleep\n", pm_states[state]);
 	pm_restrict_gfp_mask();

Regards,
Srivatsa S. Bhat

2013-07-15 11:57:17

by Srivatsa S. Bhat

Subject: Re: [PATCH 7/8] cpufreq: Preserve policy structure across suspend/resume

On 07/15/2013 05:05 PM, Rafael J. Wysocki wrote:
> On Monday, July 15, 2013 03:35:04 PM Srivatsa S. Bhat wrote:
>> On 07/15/2013 03:25 PM, Viresh Kumar wrote:
>>> Hi Srivatsa,
>>>
>>> I may be wrong but it looks like something is wrong in this patch.
>>>
>>> On 12 July 2013 03:47, Srivatsa S. Bhat
>>> <[email protected]> wrote:
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>
>>>> @@ -1239,29 +1263,40 @@ static int __cpufreq_remove_dev(struct device *dev,
>>>> if ((cpus == 1) && (cpufreq_driver->target))
>>>> __cpufreq_governor(data, CPUFREQ_GOV_POLICY_EXIT);
>>>>
>>>> - pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>>>> - cpufreq_cpu_put(data);
>>>> + if (!frozen) {
>>>> + pr_debug("%s: removing link, cpu: %d\n", __func__, cpu);
>>>> + cpufreq_cpu_put(data);
>>>
>>> So, we don't decrement usage count here. But we are still increasing
>>> counts on cpufreq_add_dev after resume, isn't it?
>>>
>>> So, we wouldn't be able to free policy struct once all the cpus of a
>>> policy are removed after suspend/resume has happened once.
>>>
>>
>> Actually even I was wondering about this while writing the patch and
>> I even tested shutdown after multiple suspend/resume cycles, to verify that
>> the refcount is messed up. But surprisingly, things worked just fine.
>>
>> Logically there should've been a refcount mismatch and things should have
>> failed, but everything worked fine during my tests. Apart from suspend/resume
>> and shutdown tests, I even tried mixing a few regular CPU hotplug operations
>> (echo 0/1 to sysfs online files), but nothing stood out.
>>
>> Sorry, I forgot to document this in the patch. Either the patch is wrong
>> or something else is silently fixing this up. Not sure what is the exact
>> situation.
>
> OK, so I'm not going to queue [2-8/8] up until we find out what's going on
> here (and until Toralf tells me that it doesn't break his system any more).
>

Ok, that sounds good.

> I've queued up [1/8] for 3.11 already.
>

Thank you!

Regards,
Srivatsa S. Bhat

2013-07-15 17:39:00

On 07/21/2013 03:10 PM, Toralf Förster wrote:
> On 07/21/2013 10:43 AM, Srivatsa S. Bhat wrote:
>> On 07/17/2013 09:19 PM, Srivatsa S. Bhat wrote:
>>> On 07/17/2013 08:57 PM, Toralf Förster wrote:
>>>> On 07/16/2013 11:32 PM, Rafael J. Wysocki wrote:
>>>>> On Tuesday, July 16, 2013 05:15:14 PM Toralf Förster wrote:
>> [...]
>>>>>> sry - here again with full quote of the email :
>>>>>>
>>>>>> I applied patch [1/8] on top of v3.11-rc1-8-g47188d3 passes two s2ram/wakeup
>>>>>> cycles fine and crashed the system at the 3rd attempt / one times just at
>>>>>> the 4th (blinking power led, no sysrq, ...).
>>>>>>
>>>>>> Applying patch 1-8 on top of that tree differs in that way that it
>>>>>> crashes now the system even at the 1st attempt or at least at the 2nd
>>>>>>
>>>>>> My hardware is a ThinkPad T420 with latest BIOS and a 32 bit stable
>>>>>> Gentoo Linux - FWIW .config attached.
>>>>>
>>>>> I think you'll need the fixes first, basically [1/8] from this series and
>>>>> this: https://patchwork.kernel.org/patch/2827512/ .
>>>>>
>>>>> Please try to run with these two things applied only and see how that goes.
>>>>>
>>>>> Thanks,
>>>>> Rafael
>>>>>
>>>>>
>>>> That was it.
>>>>
>>>> Applying https://patchwork.kernel.org/patch/2827512/ and then patch
>>>> [1/8] on top of v3.11-rc1-8-g47188d3 works fine and solved the reported
>>>> issue.
>>>>
>>>> Furthermore applying patches 2-8 works too - suspend/wakeup works fine
>>>> and frequencies are scaled right after wakeup at the T420.
>>>>
>>>
>>> Phew! Finally :-)
>>>
>>> Thank you for all your testing efforts!
>>>
>>
>> Rafael, Viresh, any thoughts on picking up patches 2-8 from this series
>> for 3.12?
>
> What about the additional patch Rafael advised me to apply in this thread:
> https://patchwork.kernel.org/patch/2827512/
> ?
>
> On my ThinkPad T420 I do have to apply this on top of 3.10.1 otherwise I
> do suffer from the hang during s2ram.
>

Rafael already picked it up and in fact it's now in the mainline kernel
as commit:

commit e8d05276f236ee6435e78411f62be9714e0b9377
Author: Srivatsa S. Bhat <[email protected]>
Date: Tue Jul 16 22:46:48 2013 +0200

cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression

Thanks for following up on that fix, Toralf!

Regards,
Srivatsa S. Bhat

2013-07-21 12:49:32

by Rafael J. Wysocki

Subject: Re: [PATCH 0/8] Cpufreq, cpu hotplug, suspend/resume related fixes

On Sunday, July 21, 2013 02:13:42 PM Srivatsa S. Bhat wrote:
> On 07/17/2013 09:19 PM, Srivatsa S. Bhat wrote:
> > On 07/17/2013 08:57 PM, Toralf Förster wrote:
> >> On 07/16/2013 11:32 PM, Rafael J. Wysocki wrote:
> >>> On Tuesday, July 16, 2013 05:15:14 PM Toralf Förster wrote:
> [...]
> >>>> sry - here again with full quote of the email :
> >>>>
> >>>> I applied patch [1/8] on top of v3.11-rc1-8-g47188d3 passes two s2ram/wakeup
> >>>> cycles fine and crashed the system at the 3rd attempt / one times just at
> >>>> the 4th (blinking power led, no sysrq, ...).
> >>>>
> >>>> Applying patch 1-8 on top of that tree differs in that way that it
> >>>> crashes now the system even at the 1st attempt or at least at the 2nd
> >>>>
> >>>> My hardware is a ThinkPad T420 with latest BIOS and a 32 bit stable
> >>>> Gentoo Linux - FWIW .config attached.
> >>>
> >>> I think you'll need the fixes first, basically [1/8] from this series and
> >>> this: https://patchwork.kernel.org/patch/2827512/ .
> >>>
> >>> Please try to run with these two things applied only and see how that goes.
> >>>
> >>> Thanks,
> >>> Rafael
> >>>
> >>>
> >> That was it.
> >>
> >> Applying https://patchwork.kernel.org/patch/2827512/ and then patch
> >> [1/8] on top of v3.11-rc1-8-g47188d3 works fine and solved the reported
> >> issue.
> >>
> >> Furthermore applying patches 2-8 works too - suspend/wakeup works fine
> >> and frequencies are scaled right after wakeup at the T420.
> >>
> >
> > Phew! Finally :-)
> >
> > Thank you for all your testing efforts!
> >
>
> Rafael, Viresh, any thoughts on picking up patches 2-8 from this series
> for 3.12?
>
> From the discussions on this thread so far, there are no pending issues:
> Toralf verified that these patches work fine on his system, as he mentioned
> above, and Tianyu Lan independently tested this patchset and found no
> issues with them. Also, Viresh analyzed the refcounting used in the patches
> and we came to the conclusion that there is no problem with them either.

Yes, I'm going to queue up that series for 3.12.

Thanks,
Rafael

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.