2018-01-23 21:59:24

by Bo Yan

Subject: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
  --> dpm_suspend_start
    --> dpm_prepare
      --> device_prepare : this function errors out
    --> dpm_suspend: this is skipped due to dpm_prepare failure
        this means cpufreq_suspend is skipped over
  --> goto Recover_platform, due to previous error
    --> goto Resume_devices
      --> dpm_resume_end
        --> dpm_resume
          --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));
....

This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

        if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
                return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check cpufreq_suspended first, if it's false, that means
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan <[email protected]>
---
drivers/cpufreq/cpufreq.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
        if (!cpufreq_driver)
                return;

+       if (unlikely(!cpufreq_suspended)) {
+               pr_warn("%s: resume after failing suspend\n", __func__);
+               return;
+       }
        cpufreq_suspended = false;

        if (!has_target() && !cpufreq_driver->resume)
--
2.7.4
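
For illustration, the failure sequence described in the commit message can be modeled in user space. The following is a minimal sketch, not kernel code: every model_* name is a hypothetical stand-in (model_registered models per_cpu(cpufreq_update_util_data, cpu), model_governor_start() models sugov_start()), and the guarded parameter switches the proposed cpufreq_suspended check on or off.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct update_util_data. */
struct model_hook {
	void (*func)(void);
};

/* Hypothetical stand-in for the per-CPU schedutil state (sg_cpu). */
struct model_sg_cpu {
	struct model_hook update_util;
};

static struct model_sg_cpu model_cpu;
static struct model_hook *model_registered; /* models cpufreq_update_util_data */
static bool model_suspended;                /* models cpufreq_suspended */

/* Placeholder for the frequency-update callback. */
void model_freq_update(void)
{
}

/* Models cpufreq_add_update_util_hook(): refuses if a hook is still set. */
void model_add_hook(struct model_hook *hook, void (*func)(void))
{
	if (model_registered)	/* the kernel does WARN_ON(...) and returns */
		return;
	hook->func = func;
	model_registered = hook;
}

/* Models sugov_start(): wipe the state, then register the hook again. */
void model_governor_start(void)
{
	memset(&model_cpu, 0, sizeof(model_cpu));
	model_add_hook(&model_cpu.update_util, model_freq_update);
}

/* Models the suspend side: the governor is stopped, so the hook is removed. */
void model_cpufreq_suspend(void)
{
	model_registered = NULL;
	model_suspended = true;
}

/* Models cpufreq_resume(); 'guarded' enables the proposed check. */
bool model_cpufreq_resume(bool guarded)
{
	if (guarded && !model_suspended)
		return false;	/* never suspended: nothing to resume */
	model_suspended = false;
	model_governor_start();
	return true;
}

/* Puts the model into its initial running state. */
void model_boot(void)
{
	model_registered = NULL;
	model_suspended = false;
	model_governor_start();
}
```

Calling model_boot() followed by model_cpufreq_resume(false) leaves the registered hook with a NULL func, which corresponds to the state that leads to the crash described above; with guarded set, the callback is left untouched.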



2018-01-24 02:04:20

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
>
> suspend_devices_and_enter
> --> dpm_suspend_start
> --> dpm_prepare
> --> device_prepare : this function errors out
> --> dpm_suspend: this is skipped due to dpm_prepare failure
> this means cpufreq_suspend is skipped over
> --> goto Recover_platform, due to previous error
> --> goto Resume_devices
> --> dpm_resume_end
> --> dpm_resume
> --> cpufreq_resume
>
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
>
> memset(sg_cpu, 0, sizeof(*sg_cpu));
> ....
>
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
>
> if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> return;
>
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
>
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
>
> Signed-off-by: Bo Yan <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> if (!cpufreq_driver)
> return;
>
> + if (unlikely(!cpufreq_suspended)) {
> + pr_warn("%s: resume after failing suspend\n", __func__);
> + return;
> + }
> cpufreq_suspended = false;
>
> if (!has_target() && !cpufreq_driver->resume)
>

Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been called.

Thanks,
Rafael



2018-01-24 20:54:12

by Bo Yan

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>> drivers/cpufreq/cpufreq.c | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index 41d148af7748..95b1c4afe14e 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>> if (!cpufreq_driver)
>> return;
>>
>> + if (unlikely(!cpufreq_suspended)) {
>> + pr_warn("%s: resume after failing suspend\n", __func__);
>> + return;
>> + }
>> cpufreq_suspended = false;
>>
>> if (!has_target() && !cpufreq_driver->resume)
>>
> Good catch, but rather than doing this it would be better to avoid
> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
Yes, I thought about that, but there is no good way to skip over it
without introducing another flag. cpufreq_resume is called by
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
case, dpm_resume is called, but dpm_suspend is not. So on a higher level
it's already unbalanced.

One possibility is to rely on the pm_transition flag. So something like:


diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
        struct device *dev;
+       bool suspended = (pm_transition.event != PM_EVENT_ON);
        ktime_t starttime = ktime_get();

        trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
        async_synchronize_full();
        dpm_show_time(starttime, state, NULL);

-       cpufreq_resume();
+       if (likely(suspended))
+               cpufreq_resume();
        trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }

This relies on the fact that pm_transition stays at PMSG_ON if
dpm_prepare fails: in that case dpm_suspend is skipped, so
pm_transition.event remains PM_EVENT_ON (0) until dpm_resume.

dpm_suspend sets pm_transition to whatever state it receives, which
is never PMSG_ON, and nothing changes pm_transition back to PMSG_ON
before dpm_resume. This is my understanding. Does this make sense?


>
> Thanks,
> Rafael
>
>
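
The idea can be illustrated with a small user-space model. This is only a sketch under the assumption stated above, namely that pm_transition still holds PMSG_ON when dpm_prepare fails. It models a single suspend attempt at a time, and all model_* names and MODEL_EVENT_* constants are hypothetical stand-ins rather than kernel definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for PM_EVENT_ON and PM_EVENT_SUSPEND. */
enum model_event {
	MODEL_EVENT_ON = 0,
	MODEL_EVENT_SUSPEND = 2,
};

/* Hypothetical stand-in for pm_message_t. */
struct model_pm_message {
	int event;
};

static struct model_pm_message model_transition = { MODEL_EVENT_ON };
static int model_cpufreq_resume_calls;

/* Models dpm_suspend(): records the requested transition. */
void model_dpm_suspend(struct model_pm_message state)
{
	model_transition = state;
}

/* Models the patched dpm_resume(): resume cpufreq only after a suspend. */
void model_dpm_resume(void)
{
	bool suspended = (model_transition.event != MODEL_EVENT_ON);

	if (suspended)
		model_cpufreq_resume_calls++;
}

/* Models one pass through suspend_devices_and_enter(). */
void model_suspend_attempt(bool prepare_fails)
{
	struct model_pm_message state = { MODEL_EVENT_SUSPEND };

	model_transition.event = MODEL_EVENT_ON;	/* assumed starting state */
	if (!prepare_fails)
		model_dpm_suspend(state);
	model_dpm_resume();	/* runs on both the success and error paths */
}
```

The model resets the transition to the ON state at the start of every attempt. Whether the kernel guarantees that state before dpm_resume runs is exactly the assumption the proposal depends on, and the sketch does not verify it.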


2018-01-25 19:17:08

by Bo Yan

Subject: [PATCH v2] cpufreq: skip cpufreq resume if it's not suspended

cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
  --> dpm_suspend_start
    --> dpm_prepare
      --> device_prepare : this function errors out
    --> dpm_suspend: this is skipped due to dpm_prepare failure
        this means cpufreq_suspend is skipped over
  --> goto Recover_platform, due to previous error
    --> goto Resume_devices
      --> dpm_resume_end
        --> dpm_resume
          --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));
....

This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

        if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
                return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check the pm_transition status in dpm_resume. If
pm_transition.event == PM_EVENT_ON, we know for sure dpm_suspend has not
been called, so do not call cpufreq_resume.

Signed-off-by: Bo Yan <[email protected]>
---
drivers/base/power/main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 08744b572af6..39829d7a9311 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
        struct device *dev;
+       bool suspended = (pm_transition.event != PM_EVENT_ON);
        ktime_t starttime = ktime_get();

        trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state)
        async_synchronize_full();
        dpm_show_time(starttime, state, 0, NULL);

-       cpufreq_resume();
+       if (likely(suspended))
+               cpufreq_resume();
        trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }

--
2.7.4


2018-02-02 11:56:46

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>
> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
> > On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> >> drivers/cpufreq/cpufreq.c | 4 ++++
> >> 1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> >> index 41d148af7748..95b1c4afe14e 100644
> >> --- a/drivers/cpufreq/cpufreq.c
> >> +++ b/drivers/cpufreq/cpufreq.c
> >> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> >> if (!cpufreq_driver)
> >> return;
> >>
> >> + if (unlikely(!cpufreq_suspended)) {
> >> + pr_warn("%s: resume after failing suspend\n", __func__);
> >> + return;
> >> + }
> >> cpufreq_suspended = false;
> >>
> >> if (!has_target() && !cpufreq_driver->resume)
> >>
> > Good catch, but rather than doing this it would be better to avoid
> > calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
> Yes, I thought about that, but there is no good way to skip over it
> without introducing another flag. cpufreq_resume is called by
> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
> it's already unbalanced.
>
> One possibility is to rely on the pm_transition flag. So something like:
>
>
> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
> index dc259d20c967..8469e6fc2b2c 100644
> --- a/drivers/base/power/main.c
> +++ b/drivers/base/power/main.c
> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
> cookie)
> void dpm_resume(pm_message_t state)
> {
> struct device *dev;
> + bool suspended = (pm_transition.event != PM_EVENT_ON);
> ktime_t starttime = ktime_get();
>
> trace_suspend_resume(TPS("dpm_resume"), state.event, true);
> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
> async_synchronize_full();
> dpm_show_time(starttime, state, NULL);
>
> - cpufreq_resume();
> + if (likely(suspended))
> + cpufreq_resume();
> trace_suspend_resume(TPS("dpm_resume"), state.event, false);
> }

I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message. Just combine the cpufreq_suspended check with the cpufreq_driver
one and the unlikely() thing is not necessary.

Thanks,
Rafael
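
A sketch of what the suggested variant might look like, written as a user-space model rather than the actual kernel function: the cpufreq_suspended check is folded into the cpufreq_driver check, with no message and no unlikely(). The model_* names are hypothetical, and this is not necessarily the code that was eventually applied.

```c
#include <stdbool.h>

static bool model_driver_present;	/* models cpufreq_driver != NULL */
static bool model_suspended;		/* models cpufreq_suspended */
static int model_policies_resumed;	/* counts how often resume work ran */

/* Models cpufreq_resume() with both early-exit conditions combined. */
void model_cpufreq_resume(void)
{
	if (!model_driver_present || !model_suspended)
		return;

	model_suspended = false;
	model_policies_resumed++;	/* placeholder for the real resume work */
}
```

With this shape, a resume that was not preceded by a suspend falls through the same early return as the missing-driver case, so no separate branch is needed.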


2018-02-02 21:36:52

by Bo Yan

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
>> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>>
>>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>>>    drivers/cpufreq/cpufreq.c | 4 ++++
>>>>>    1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>>> index 41d148af7748..95b1c4afe14e 100644
>>>>> --- a/drivers/cpufreq/cpufreq.c
>>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>>>        if (!cpufreq_driver)
>>>>>            return;
>>>>>
>>>>> +    if (unlikely(!cpufreq_suspended)) {
>>>>> +        pr_warn("%s: resume after failing suspend\n", __func__);
>>>>> +        return;
>>>>> +    }
>>>>>        cpufreq_suspended = false;
>>>>>
>>>>>        if (!has_target() && !cpufreq_driver->resume)
>>>>>
>>>> Good catch, but rather than doing this it would be better to avoid
>>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been
>>>> called.
>>> Yes, I thought about that, but there is no good way to skip over it
>>> without introducing another flag. cpufreq_resume is called by
>>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>>> case, dpm_resume is called, but dpm_suspend is not. So on a higher
>>> level
>>> it's already unbalanced.
>>>
>>> One possibility is to rely on the pm_transition flag. So something
>>> like:
>>>
>>>
>>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>>> index dc259d20c967..8469e6fc2b2c 100644
>>> --- a/drivers/base/power/main.c
>>> +++ b/drivers/base/power/main.c
>>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>>> cookie)
>>>    void dpm_resume(pm_message_t state)
>>>    {
>>>           struct device *dev;
>>> +       bool suspended = (pm_transition.event != PM_EVENT_ON);
>>>           ktime_t starttime = ktime_get();
>>>
>>>           trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>>>           async_synchronize_full();
>>>           dpm_show_time(starttime, state, NULL);
>>>
>>> -       cpufreq_resume();
>>> +       if (likely(suspended))
>>> +               cpufreq_resume();
>>>           trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>>>    }
>>
>> I was thinking about something else.
>>
>> Anyway, I think your original patch is OK too, but without printing the
>> message.  Just combine the cpufreq_suspended check with the
>> cpufreq_driver
>> one and the unlikely() thing is not necessary.
>>
>
> I rather have this fixed in the dpm_suspend/resume() code. This is
> just masking the first issue that's being caused by unbalanced error
> handling. If that means adding flags in dpm_suspend/resume() then
> that's what we should do right now and clean it up later if it can be
> improved. Making cpufreq more messy doesn't seem like the right answer.
>
> Thanks,
> Saravana
>
>
dpm_suspend and dpm_resume by themselves are not balanced in this
particular case. As it's currently structured, dpm_resume can't be
omitted even if dpm_suspend is skipped due to earlier failure.  I think
checking cpufreq_suspended flag is a reasonable compromise. If we can
find a way to make dpm_suspend/dpm_resume also balanced, that will be best.


2018-02-02 22:06:35

by Saravana Kannan

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:
> On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:
>>
>> On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:
>>> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
>>>> drivers/cpufreq/cpufreq.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>>>> index 41d148af7748..95b1c4afe14e 100644
>>>> --- a/drivers/cpufreq/cpufreq.c
>>>> +++ b/drivers/cpufreq/cpufreq.c
>>>> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
>>>> if (!cpufreq_driver)
>>>> return;
>>>>
>>>> + if (unlikely(!cpufreq_suspended)) {
>>>> + pr_warn("%s: resume after failing suspend\n", __func__);
>>>> + return;
>>>> + }
>>>> cpufreq_suspended = false;
>>>>
>>>> if (!has_target() && !cpufreq_driver->resume)
>>>>
>>> Good catch, but rather than doing this it would be better to avoid
>>> calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
>> Yes, I thought about that, but there is no good way to skip over it
>> without introducing another flag. cpufreq_resume is called by
>> dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
>> case, dpm_resume is called, but dpm_suspend is not. So on a higher level
>> it's already unbalanced.
>>
>> One possibility is to rely on the pm_transition flag. So something like:
>>
>>
>> diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
>> index dc259d20c967..8469e6fc2b2c 100644
>> --- a/drivers/base/power/main.c
>> +++ b/drivers/base/power/main.c
>> @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
>> cookie)
>> void dpm_resume(pm_message_t state)
>> {
>> struct device *dev;
>> + bool suspended = (pm_transition.event != PM_EVENT_ON);
>> ktime_t starttime = ktime_get();
>>
>> trace_suspend_resume(TPS("dpm_resume"), state.event, true);
>> @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
>> async_synchronize_full();
>> dpm_show_time(starttime, state, NULL);
>>
>> - cpufreq_resume();
>> + if (likely(suspended))
>> + cpufreq_resume();
>> trace_suspend_resume(TPS("dpm_resume"), state.event, false);
>> }
>
> I was thinking about something else.
>
> Anyway, I think your original patch is OK too, but without printing the
> message. Just combine the cpufreq_suspended check with the cpufreq_driver
> one and the unlikely() thing is not necessary.
>

I rather have this fixed in the dpm_suspend/resume() code. This is just
masking the first issue that's being caused by unbalanced error
handling. If that means adding flags in dpm_suspend/resume() then that's
what we should do right now and clean it up later if it can be improved.
Making cpufreq more messy doesn't seem like the right answer.

Thanks,
Saravana


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

2018-02-05 04:04:27

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 02-02-18, 13:28, Bo Yan wrote:
> On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> >I rather have this fixed in the dpm_suspend/resume() code. This is just
> >masking the first issue that's being caused by unbalanced error handling.
> >If that means adding flags in dpm_suspend/resume() then that's what we
> >should do right now and clean it up later if it can be improved. Making
> >cpufreq more messy doesn't seem like the right answer.

+1

> dpm_suspend and dpm_resume by themselves are not balanced in this particular
> case. As it's currently structured, dpm_resume can't be omitted even if
> dpm_suspend is skipped due to earlier failure. I think checking
> cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> make dpm_suspend/dpm_resume also balanced, that will be best.

I think cpufreq is just one of the users which broke. Others didn't break
because:

- They don't have a complicated resume part.
- Or we just don't know that they broke.

Resuming something that never suspended is just broken by design. Yeah, it's much
simpler in this particular case to fix cpufreq core but the
suspend/resume/hibernation part is really core kernel and should be fixed to
avoid such band-aids.

--
viresh

2018-02-05 08:53:42

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On Monday, February 5, 2018 5:01:18 AM CET Viresh Kumar wrote:
> On 02-02-18, 13:28, Bo Yan wrote:
> > On 02/02/2018 11:34 AM, Saravana Kannan wrote:
> > >I rather have this fixed in the dpm_suspend/resume() code. This is just
> > >masking the first issue that's being caused by unbalanced error handling.
> > >If that means adding flags in dpm_suspend/resume() then that's what we
> > >should do right now and clean it up later if it can be improved. Making
> > >cpufreq more messy doesn't seem like the right answer.
>
> +1
>
> > dpm_suspend and dpm_resume by themselves are not balanced in this particular
> > case. As it's currently structured, dpm_resume can't be omitted even if
> > dpm_suspend is skipped due to earlier failure. I think checking
> > cpufreq_suspended flag is a reasonable compromise. If we can find a way to
> > make dpm_suspend/dpm_resume also balanced, that will be best.
>
> I think cpufreq is just one of the users which broke. Others didn't break
> because:
>
> - They don't have a complicated resume part.
> - Or we just don't know that they broke.

No and no.

> Resuming something that never suspended is just broken by design. Yeah, it's much
> simpler in this particular case to fix cpufreq core but the
> suspend/resume/hibernation part is really core kernel and should be fixed to
> avoid such band-aids.

By design (which I admit may be confusing) it should be fine to call
dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
for the failure is. cpufreq_suspend/resume() don't take that into account,
everybody else does.

Thanks,
Rafael


2018-02-05 09:07:26

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> By design (which I admit may be confusing) it should be fine to call
> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> for the failure is. cpufreq_suspend/resume() don't take that into account,
> everybody else does.

Hmm, I see. Can't do much then, just fix the only broken piece of code :)

--
viresh

2018-02-05 09:24:23

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> cpufreq_resume can be called even without preceding cpufreq_suspend.
> This can happen in following scenario:
>
> suspend_devices_and_enter
> --> dpm_suspend_start
> --> dpm_prepare
> --> device_prepare : this function errors out
> --> dpm_suspend: this is skipped due to dpm_prepare failure
> this means cpufreq_suspend is skipped over
> --> goto Recover_platform, due to previous error
> --> goto Resume_devices
> --> dpm_resume_end
> --> dpm_resume
> --> cpufreq_resume
>
> In case schedutil is used as frequency governor, cpufreq_resume will
> eventually call sugov_start, which does following:
>
> memset(sg_cpu, 0, sizeof(*sg_cpu));
> ....
>
> This effectively erases function pointer for frequency update, causing
> crash later on. The function pointer would have been set correctly if
> subsequent cpufreq_add_update_util_hook runs successfully, but that
> function returns earlier because cpufreq_suspend was not called:
>
> if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
> return;
>
> Ideally, suspend should succeed, then things will be fine. But even
> in case of suspend failure, system should not crash.
>
> The fix is to check cpufreq_suspended first, if it's false, that means
> cpufreq_suspend was not called in the first place, so do not resume
> cpufreq.
>
> Signed-off-by: Bo Yan <[email protected]>
> ---
> drivers/cpufreq/cpufreq.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index 41d148af7748..95b1c4afe14e 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> if (!cpufreq_driver)
> return;
>
> + if (unlikely(!cpufreq_suspended)) {
> + pr_warn("%s: resume after failing suspend\n", __func__);
> + return;
> + }
> cpufreq_suspended = false;
>
> if (!has_target() && !cpufreq_driver->resume)

I've just edited this patch somewhat (mostly by dropping the pr_warn())
and queued it up.

Thanks,
Rafael


2018-02-05 09:25:00

by Viresh Kumar

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 05-02-18, 10:19, Rafael J. Wysocki wrote:
> On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 41d148af7748..95b1c4afe14e 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
> > if (!cpufreq_driver)
> > return;
> >
> > + if (unlikely(!cpufreq_suspended)) {
> > + pr_warn("%s: resume after failing suspend\n", __func__);
> > + return;
> > + }
> > cpufreq_suspended = false;
> >
> > if (!has_target() && !cpufreq_driver->resume)
>
> I've just edited this patch somewhat (mostly by dropping the pr_warn())
> and queued it up.

You can add my Ack as well.

Acked-by: Viresh Kumar <[email protected]>

--
viresh

2018-02-16 17:36:16

by Saravana Kannan

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> On 05-02-18, 09:50, Rafael J. Wysocki wrote:
>> By design (which I admit may be confusing) it should be fine to call
>> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
>> for the failure is. cpufreq_suspend/resume() don't take that into account,
>> everybody else does.
>
> Hmm, I see. Can't do much then, just fix the only broken piece of code :)
>

Sorry for the late reply, this email didn't get filtered into the right
folder.

I think the design of dpm_suspend_start() and dpm_resume_end() generally
works fine because we seem to keep track of what devices have been
suspended so far (in the dpm_suspended_list) and call resume only of
those. So, why isn't the right fix to have cpufreq get put into that
list? Instead of just always call it on the resume path even if it
wasn't suspended? That seems to be the real issue.

So, we should either have dpm_suspend/resume() have a flag to keep track
of if cpufreq_suspend/resume() was called and make sure they are called
in proper pairs. Or have cpufreq register in a way that gets it put in
the suspend/resume list.

I'd still like to NACK this change.

-Saravana

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
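
The first alternative proposed above, a flag in the dpm layer that keeps the two calls paired, could look roughly like the following user-space sketch. The flag and all model_* names are hypothetical; no such patch was posted in this thread.

```c
#include <stdbool.h>

static bool model_cpufreq_suspend_done;	/* hypothetical pairing flag */
static int model_suspend_calls;		/* times cpufreq_suspend() would run */
static int model_resume_calls;		/* times cpufreq_resume() would run */

/* Models dpm_suspend(): calls cpufreq_suspend() and remembers that it did. */
void model_dpm_suspend(void)
{
	model_suspend_calls++;
	model_cpufreq_suspend_done = true;
}

/* Models dpm_resume(): calls cpufreq_resume() only to balance a suspend. */
void model_dpm_resume(void)
{
	if (!model_cpufreq_suspend_done)
		return;
	model_cpufreq_suspend_done = false;
	model_resume_calls++;
}

/* Models one suspend attempt; dpm_resume() runs even if prepare fails. */
void model_suspend_attempt(bool prepare_fails)
{
	if (!prepare_fails)
		model_dpm_suspend();
	model_dpm_resume();
}
```

In this model the number of resume calls can never exceed the number of suspend calls, which is the pairing property being asked for; the cost is one more piece of state in the dpm code.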

2018-02-16 18:06:46

by Rafael J. Wysocki

Subject: Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

On Thursday, February 15, 2018 10:27:10 PM CET Saravana Kannan wrote:
> On 02/05/2018 01:05 AM, Viresh Kumar wrote:
> > On 05-02-18, 09:50, Rafael J. Wysocki wrote:
> >> By design (which I admit may be confusing) it should be fine to call
> >> dpm_resume_end() after a failing dpm_suspend_start(), whatever the reason
> >> for the failure is. cpufreq_suspend/resume() don't take that into account,
> >> everybody else does.
> >
> > Hmm, I see. Can't do much then, just fix the only broken piece of code :)
> >
>
> Sorry for the late reply, this email didn't get filtered into the right
> folder.
>
> I think the design of dpm_suspend_start() and dpm_resume_end() generally
> works fine because we seem to keep track of what devices have been
> suspended so far (in the dpm_suspended_list) and call resume only of
> those. So, why isn't the right fix to have cpufreq get put into that
> list?

Because it is more complicated?

> Instead of just always call it on the resume path even if it
> wasn't suspended? That seems to be the real issue.
>
> So, we should either have dpm_suspend/resume() have a flag to keep track
> of if cpufreq_suspend/resume() was called and make sure they are called
> in proper pairs.

Why?

> Or have cpufreq register in a way that gets it put in
> the suspend/resume list.
>
> I'd still like to NACK this change.

It's gone in already, sorry.