2018-01-02 00:57:34

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

From: Rafael J. Wysocki <[email protected]>

One of the limitations of pm_runtime_force_suspend/resume() is that
if a parent driver wants to use these functions, all of its child
drivers have to do that too because of the parent usage counter
manipulations necessary to get the correct state of the parent during
system-wide transitions to the working state (system resume).
However, that limitation turns out to be artificial, so remove it.

First, notice that it is not really necessary to change the device's
runtime PM status in pm_runtime_force_suspend(). Moreover, if
pm_runtime_set_suspended() is not called for a device, the children
counter of its parent (if there is a parent) will not drop, so it
is not necessary to increment the parent's usage counter in that
case, as long as the children counters of devices are checked
along with their usage counters in order to decide whether or not
the devices may be left in suspend on the way out (that is, during
the subsequent system-wide transition to the working state).

Accordingly, modify pm_runtime_force_suspend() to only call
pm_runtime_set_suspended() for devices whose usage and children
counters are at the "no references" level and drop the parent usage
counter incrementation from it.

Second, notice that with the above change in pm_runtime_force_suspend(),
if the runtime PM status of the device in pm_runtime_force_resume() is
"suspended", then either the device had remained in runtime suspend
before pm_runtime_force_suspend() was called for it (in which case it
is safe to leave it in suspend), or its runtime PM status has been
changed by pm_runtime_force_suspend() itself. Of course, the latter
case is only possible if the usage and children counters of the device
are at the "no reference" levels, so the device may be left in suspend
then.

In turn, if the runtime PM status of the device in
pm_runtime_force_resume() is "active", pm_runtime_force_suspend()
called for it previously found active references to it, so it needs
to be resumed.

Hence, with the above change in pm_runtime_force_suspend(),
pm_runtime_force_resume() only needs to check the device's runtime PM
status to decide whether or not to resume it and it doesn't need to
manipulate the device parent's usage counter any more, so modify it
accordingly.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 61 +++++++++++++------------------------------
1 file changed, 19 insertions(+), 42 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1618,12 +1618,15 @@ void pm_runtime_drop_link(struct device
* @dev: Device to suspend.
*
* Disable runtime PM so we safely can check the device's runtime PM status and
- * if it is active, invoke it's .runtime_suspend callback to bring it into
- * suspend state. Keep runtime PM disabled to preserve the state unless we
- * encounter errors.
+ * if it is active, invoke it's ->runtime_suspend callback to suspend it. Also,
+ * if the device's usage and children counters don't indicate that the device
+ * was in use before the system-wide transition under way, change its runtime PM
+ * status to suspended after suspending it. Keep runtime PM disabled to
+ * preserve the state unless we encounter errors.
*
* Typically this function may be invoked from a system suspend callback to make
- * sure the device is put into low power state.
+ * sure the device is put into low power state. It assumes that the analogous
+ * pm_runtime_force_resume() will be used to resume the device.
*/
int pm_runtime_force_suspend(struct device *dev)
{
@@ -1645,18 +1648,12 @@ int pm_runtime_force_suspend(struct devi
if (ret)
goto err;

- /*
- * Increase the runtime PM usage count for the device's parent, in case
- * when we find the device being used when system suspend was invoked.
- * This informs pm_runtime_force_resume() to resume the parent
- * immediately, which is needed to be able to resume its children,
- * when not deferring the resume to be managed via runtime PM.
- */
- if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
- pm_runtime_get_noresume(dev->parent);
+ if (atomic_read(&dev->power.usage_count) <= 1 &&
+ atomic_read(&dev->power.child_count) == 0)
+ pm_runtime_set_suspended(dev);

- pm_runtime_set_suspended(dev);
return 0;
+
err:
pm_runtime_enable(dev);
return ret;
@@ -1669,13 +1666,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
*
* Prior invoking this function we expect the user to have brought the device
* into low power state by a call to pm_runtime_force_suspend(). Here we reverse
- * those actions and brings the device into full power, if it is expected to be
- * used on system resume. To distinguish that, we check whether the runtime PM
- * usage count is greater than 1 (the PM core increases the usage count in the
- * system PM prepare phase), as that indicates a real user (such as a subsystem,
- * driver, userspace, etc.) is using it. If that is the case, the device is
- * expected to be used on system resume as well, so then we resume it. In the
- * other case, we defer the resume to be managed via runtime PM.
+ * those actions and bring the device into full power, if it is expected to be
+ * used on system resume. In the other case, we defer the resume to be managed
+ * via runtime PM.
*
* Typically this function may be invoked from a system resume callback.
*/
@@ -1684,6 +1677,9 @@ int pm_runtime_force_resume(struct devic
int (*callback)(struct device *);
int ret = 0;

+ if (pm_runtime_status_suspended(dev))
+ goto out;
+
callback = RPM_GET_CALLBACK(dev, runtime_resume);

if (!callback) {
@@ -1691,31 +1687,12 @@ int pm_runtime_force_resume(struct devic
goto out;
}

- if (!pm_runtime_status_suspended(dev))
- goto out;
-
- /*
- * Decrease the parent's runtime PM usage count, if we increased it
- * during system suspend in pm_runtime_force_suspend().
- */
- if (atomic_read(&dev->power.usage_count) > 1) {
- if (dev->parent)
- pm_runtime_put_noidle(dev->parent);
- } else {
- goto out;
- }
-
- ret = pm_runtime_set_active(dev);
- if (ret)
- goto out;
-
ret = callback(dev);
- if (ret) {
- pm_runtime_set_suspended(dev);
+ if (ret)
goto out;
- }

pm_runtime_mark_last_busy(dev);
+
out:
pm_runtime_enable(dev);
return ret;


2018-01-02 10:51:09

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
> + if (atomic_read(&dev->power.usage_count) <= 1 &&
> + atomic_read(&dev->power.child_count) == 0)
> + pm_runtime_set_suspended(dev);
>
> - pm_runtime_set_suspended(dev);

The ->runtime_suspend callback *has* been executed at this point.
If the status is only updated conditionally, it may not reflect
the device's actual power state correctly. That doesn't seem to
be a good idea.

The kerneldoc says:

Typically this function may be invoked from a system suspend callback
to make sure the device is put into low power state.

That portion is not modified by your patch.

"Typically" implies that it's legal to call pm_runtime_force_suspend() in
*other* contexts than as a ->suspend hook.

Let's say pm_runtime_force_suspend() is invoked at runtime, outside of
a system sleep transition. Due to updating the power state only
conditionally, users will see an incorrect power state in sysfs.

The reason I'm speaking up here or taking an interest in the
pm_runtime_force_*() functions is that I would like to use them
for manual power control in vga_switcheroo, the kernel subsystem
for switchable Laptop graphics. It currently supports two modes of
operation for each GPU, automatic power control (autosuspend via
runtime PM) or manual power control (by echoing ON and OFF to a
debugfs file).

Manual power control is currently a mess because it switches the GPU
off behind the PM core's back, causing all sorts of issues during a
system sleep transition.

Potentially pm_runtime_force_*() could be used as a clean way to
reimplement manual power control because it wouldn't happen behind
the PM core's back. However your patch seems to break this potential
use case because:

a) the device's power state may not be reflected correctly because
it's only updated conditionally (see above),

b) pm_runtime_force_resume(), despite its name, is no longer guaranteed
to force the device on (it now allows the device to continue
slumbering).

One addition that would be really helpful: pm_runtime_force_suspend()
should also force-suspend all children and consumers of the given
device. Likewise, those should be resumed on pm_runtime_force_resume().
Then I could just add a device link from the audio PCI device on the GPU
to the graphics PCI device and just call pm_runtime_force_*() on the
graphics device (supplier) to magically power them both off and on.

Thanks,

Lukas

2018-01-02 11:02:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
> On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
>> + if (atomic_read(&dev->power.usage_count) <= 1 &&
>> + atomic_read(&dev->power.child_count) == 0)
>> + pm_runtime_set_suspended(dev);
>>
>> - pm_runtime_set_suspended(dev);
>
> The ->runtime_suspend callback *has* been executed at this point.
> If the status is only updated conditionally, it may not reflect
> the device's actual power state correctly. That doesn't seem to
> be a good idea.

It doesn't matter, because this is done with runtime PM disabled, isn't it?

So what matters is the status after the subsequent
pm_runtime_force_resume() returns, and that should reflect the actual
state of the device correctly. Doesn't it?

> The kerneldoc says:
>
> Typically this function may be invoked from a system suspend callback
> to make sure the device is put into low power state.
>
> That portion is not modified by your patch.
>
> "Typically" implies that it's legal to call pm_runtime_force_suspend() in
> *other* contexts than as a ->suspend hook.

It should only be used during system suspend anyway, however.

> Let's say pm_runtime_force_suspend() is invoked at runtime, outside of
> a system sleep transition.

That will disable runtime PM for you until you call
pm_runtime_force_resume() for the device.

> Due to updating the power state only
> conditionally, users will see an incorrect power state in sysfs.

But that's with runtime PM disabled, so it doesn't matter.

> The reason I'm speaking up here or taking an interest in the
> pm_runtime_force_*() functions is that I would like to use them
> for manual power control in vga_switcheroo, the kernel subsystem
> for switchable Laptop graphics. It currently supports two modes of
> operation for each GPU, automatic power control (autosuspend via
> runtime PM) or manual power control (by echoing ON and OFF to a
> debugfs file).
>
> Manual power control is currently a mess because it switches the GPU
> off behind the PM core's back, causing all sorts of issues during a
> system sleep transition.
>
> Potentially pm_runtime_force_*() could be used as a clean way to
> reimplement manual power control because it wouldn't happen behind
> the PM core's back. However your patch seems to break this potential
> use case because:

No, it doesn't break anything.

It actually doesn't even change anything for real except for dropping
some manipulations of the device's parent usage counter.

> a) the device's power state may not be reflected correctly because
> it's only updated conditionally (see above),
>
> b) pm_runtime_force_resume(), despite its name, is no longer guaranteed
> to force the device on (it now allows the device to continue
> slumbering).

It does that without the patch too. The name is just confusing. :-)

> One addition that would be really helpful: pm_runtime_force_suspend()
> should also force-suspend all children and consumers of the given
> device. Likewise, those should be resumed on pm_runtime_force_resume().
> Then I could just add a device link from the audio PCI device on the GPU
> to the graphics PCI device and just call pm_runtime_force_*() on the
> graphics device (supplier) to magically power them both off and on.

Actually, the assumption is that pm_runtime_force_suspend() must be
called for the children before it is called for the parent even
without my patch, so it is just not going to work this way.

Thanks,
Rafael

2018-01-02 11:21:45

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 2, 2018 at 12:02 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
>> On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:

[cut]

>
>> One addition that would be really helpful: pm_runtime_force_suspend()
>> should also force-suspend all children and consumers of the given
>> device. Likewise, those should be resumed on pm_runtime_force_resume().
>> Then I could just add a device link from the audio PCI device on the GPU
>> to the graphics PCI device and just call pm_runtime_force_*() on the
>> graphics device (supplier) to magically power them both off and on.
>
> Actually, the assumption is that pm_runtime_force_suspend() must be
> called for the children before it is called for the parent even
> without my patch, so it is just not going to work this way.

Moreover, what if those devices have nonzero usage counters? There
may be other reasons for that than just dependencies, like for example
user space might have written "on" to their "control" files in sysfs.

What you are looking for doesn't seem to match the runtime PM
framework's assumptions, I'm afraid.

Thanks,
Rafael

2018-01-02 13:05:20

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 02, 2018 at 12:02:18PM +0100, Rafael J. Wysocki wrote:
> On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
> > On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
> >> + if (atomic_read(&dev->power.usage_count) <= 1 &&
> >> + atomic_read(&dev->power.child_count) == 0)
> >> + pm_runtime_set_suspended(dev);
> >>
> >> - pm_runtime_set_suspended(dev);
> >
> > The ->runtime_suspend callback *has* been executed at this point.
> > If the status is only updated conditionally, it may not reflect
> > the device's actual power state correctly. That doesn't seem to
> > be a good idea.
>
> It doesn't matter, because this is done with runtime PM disabled, isn't it?

It might not make a difference for the use case I have in mind, but
pm_runtime_status_suspended() will return an incorrect result and is
called from 47 files in 4.15-rc6 according to lxr.free-electrons.com.


> > The kerneldoc says:
> >
> > Typically this function may be invoked from a system suspend callback
> > to make sure the device is put into low power state.
> >
> > That portion is not modified by your patch.
> >
> > "Typically" implies that it's legal to call pm_runtime_force_suspend() in
> > *other* contexts than as a ->suspend hook.
>
> It should only be used during system suspend anyway, however.

Then the kerneldoc is wrong.


> >> One addition that would be really helpful: pm_runtime_force_suspend()
> >> should also force-suspend all children and consumers of the given
> >> device. Likewise, those should be resumed on pm_runtime_force_resume().
> >> Then I could just add a device link from the audio PCI device on the GPU
> >> to the graphics PCI device and just call pm_runtime_force_*() on the
> >> graphics device (supplier) to magically power them both off and on.
> >
> > Actually, the assumption is that pm_runtime_force_suspend() must be
> > called for the children before it is called for the parent even
> > without my patch, so it is just not going to work this way.
>
> Moreover, what if those devices have nonzero usage counters? There
> may be other reasons for that than just dependencies, like for example
> user space might have written "on" to their "control" files in sysfs.

In that case pm_runtime_force_suspend() should return a negative errno.

I envision amending control_store() so that "off" can be written to the
"control" file, allowing userspace to invoke pm_runtime_force_suspend()
to force certain devices into runtime suspend. The user would get back
an error if the call failed for some reason (such as an active child or
consumer of the to be force-suspended device). That would be a clean
replacement for the ON/OFF options we currently have for the
vga_switcheroo debugfs control file.

Thanks,

Lukas

2018-01-02 19:17:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tuesday, January 2, 2018 2:04:04 PM CET Lukas Wunner wrote:
> On Tue, Jan 02, 2018 at 12:02:18PM +0100, Rafael J. Wysocki wrote:
> > On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
> > > On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
> > >> + if (atomic_read(&dev->power.usage_count) <= 1 &&
> > >> + atomic_read(&dev->power.child_count) == 0)
> > >> + pm_runtime_set_suspended(dev);
> > >>
> > >> - pm_runtime_set_suspended(dev);
> > >
> > > The ->runtime_suspend callback *has* been executed at this point.
> > > If the status is only updated conditionally, it may not reflect
> > > the device's actual power state correctly. That doesn't seem to
> > > be a good idea.
> >
> > It doesn't matter, because this is done with runtime PM disabled, isn't it?
>
> It might not make a difference for the use case I have in mind, but
> pm_runtime_status_suspended() will return an incorrect result and is
> called from 47 files in 4.15-rc6 according to lxr.free-electrons.com.

Generally, the runtime PM status is only meaningful for devices with runtime PM
enabled.

There is an exception, which is during system suspend/resume, when runtime PM
is automatically disabled by the core, but that only under certain assumptions.

Basically, you have to assume that no one else will mess up with the device
between the times you call pm_runtime_status_suspended() to check its runtime
PM status (or between the first time you do that and the last time runtime PM
has been enabled for the device).

This patch doesn't change the situation in that respect.

> > > The kerneldoc says:
> > >
> > > Typically this function may be invoked from a system suspend callback
> > > to make sure the device is put into low power state.
> > >
> > > That portion is not modified by your patch.
> > >
> > > "Typically" implies that it's legal to call pm_runtime_force_suspend() in
> > > *other* contexts than as a ->suspend hook.
> >
> > It should only be used during system suspend anyway, however.
>
> Then the kerneldoc is wrong.

It isn't wrong. It may be incomplete, but the information in it is correct.

And making it more complete is not part of this patch IMO.

> > >> One addition that would be really helpful: pm_runtime_force_suspend()
> > >> should also force-suspend all children and consumers of the given
> > >> device. Likewise, those should be resumed on pm_runtime_force_resume().
> > >> Then I could just add a device link from the audio PCI device on the GPU
> > >> to the graphics PCI device and just call pm_runtime_force_*() on the
> > >> graphics device (supplier) to magically power them both off and on.
> > >
> > > Actually, the assumption is that pm_runtime_force_suspend() must be
> > > called for the children before it is called for the parent even
> > > without my patch, so it is just not going to work this way.
> >
> > Moreover, what if those devices have nonzero usage counters? There
> > may be other reasons for that than just dependencies, like for example
> > user space might have written "on" to their "control" files in sysfs.
>
> In that case pm_runtime_force_suspend() should return a negative errno.

In which case it wouldn't be suitable for the system-wide PM callback role.

> I envision amending control_store() so that "off" can be written to the
> "control" file, allowing userspace to invoke pm_runtime_force_suspend()
> to force certain devices into runtime suspend.

But it isn't a good tool for the purpose which I'm trying to tell you.

You need something else.

> The user would get back
> an error if the call failed for some reason (such as an active child or
> consumer of the to be force-suspended device). That would be a clean
> replacement for the ON/OFF options we currently have for the
> vga_switcheroo debugfs control file.

Well, it looks like you are looking for an interface to invoke
pm_runtime_suspend() for the device as it behaves exactly the way you want. :-)

Thanks,
Rafael

2018-01-02 23:21:47

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 2, 2018 at 8:07 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, January 2, 2018 2:04:04 PM CET Lukas Wunner wrote:
>> On Tue, Jan 02, 2018 at 12:02:18PM +0100, Rafael J. Wysocki wrote:
>> > On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
>> > > On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
>> > >> + if (atomic_read(&dev->power.usage_count) <= 1 &&
>> > >> + atomic_read(&dev->power.child_count) == 0)
>> > >> + pm_runtime_set_suspended(dev);
>> > >>
>> > >> - pm_runtime_set_suspended(dev);
>> > >
>> > > The ->runtime_suspend callback *has* been executed at this point.
>> > > If the status is only updated conditionally, it may not reflect
>> > > the device's actual power state correctly. That doesn't seem to
>> > > be a good idea.
>> >
>> > It doesn't matter, because this is done with runtime PM disabled, isn't it?
>>
>> It might not make a difference for the use case I have in mind, but
>> pm_runtime_status_suspended() will return an incorrect result and is
>> called from 47 files in 4.15-rc6 according to lxr.free-electrons.com.
>
> Generally, the runtime PM status is only meaningful for devices with runtime PM
> enabled.
>
> There is an exception, which is during system suspend/resume, when runtime PM
> is automatically disabled by the core, but that only under certain assumptions.
>
> Basically, you have to assume that no one else will mess up with the device
> between the times you call pm_runtime_status_suspended() to check its runtime
> PM status (or between the first time you do that and the last time runtime PM
> has been enabled for the device).
>
> This patch doesn't change the situation in that respect.

BTW, I'm not sure why you are worrying about the "status" field alone
and not about the usage counter that can be greater than 0 after
pm_runtime_force_suspend() which is inconsistent with the device's
physical state (and with the "status" field too for that matter -
always without the patch and in some cases with it) then. As a matter
of fact, the information left by the runtime PM framework is messed up
with here this way or another and so anyway the only party that can
make sense of it after pm_runtime_force_suspend() is the subsequent
pm_runtime_force_resume().

Thanks,
Rafael

2018-01-03 11:02:55

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Wednesday, January 3, 2018 12:21:36 AM CET Rafael J. Wysocki wrote:
> On Tue, Jan 2, 2018 at 8:07 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Tuesday, January 2, 2018 2:04:04 PM CET Lukas Wunner wrote:
> >> On Tue, Jan 02, 2018 at 12:02:18PM +0100, Rafael J. Wysocki wrote:
> >> > On Tue, Jan 2, 2018 at 11:51 AM, Lukas Wunner <[email protected]> wrote:
> >> > > On Tue, Jan 02, 2018 at 01:56:28AM +0100, Rafael J. Wysocki wrote:
> >> > >> + if (atomic_read(&dev->power.usage_count) <= 1 &&
> >> > >> + atomic_read(&dev->power.child_count) == 0)
> >> > >> + pm_runtime_set_suspended(dev);
> >> > >>
> >> > >> - pm_runtime_set_suspended(dev);
> >> > >
> >> > > The ->runtime_suspend callback *has* been executed at this point.
> >> > > If the status is only updated conditionally, it may not reflect
> >> > > the device's actual power state correctly. That doesn't seem to
> >> > > be a good idea.
> >> >
> >> > It doesn't matter, because this is done with runtime PM disabled, isn't it?
> >>
> >> It might not make a difference for the use case I have in mind, but
> >> pm_runtime_status_suspended() will return an incorrect result and is
> >> called from 47 files in 4.15-rc6 according to lxr.free-electrons.com.
> >
> > Generally, the runtime PM status is only meaningful for devices with runtime PM
> > enabled.
> >
> > There is an exception, which is during system suspend/resume, when runtime PM
> > is automatically disabled by the core, but that only under certain assumptions.
> >
> > Basically, you have to assume that no one else will mess up with the device
> > between the times you call pm_runtime_status_suspended() to check its runtime
> > PM status (or between the first time you do that and the last time runtime PM
> > has been enabled for the device).
> >
> > This patch doesn't change the situation in that respect.
>
> BTW, I'm not sure why you are worrying about the "status" field alone
> and not about the usage counter that can be greater than 0 after
> pm_runtime_force_suspend() which is inconsistent with the device's
> physical state (and with the "status" field too for that matter -
> always without the patch and in some cases with it) then. As a matter
> of fact, the information left by the runtime PM framework is messed up
> with here this way or another and so anyway the only party that can
> make sense of it after pm_runtime_force_suspend() is the subsequent
> pm_runtime_force_resume().

All of that said, I have overlooked the fact that pm_runtime_force_suspend()
itself can be called twice in a row for the same device during the same
system-wide PM transition (genpd can do that, confusingly enough).

I'll send a v2 in a moment.

Thanks,
Rafael

2018-01-03 11:07:48

by Rafael J. Wysocki

[permalink] [raw]
Subject: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

From: Rafael J. Wysocki <[email protected]>

One of the limitations of pm_runtime_force_suspend/resume() is that
if a parent driver wants to use these functions, all of its child
drivers have to do that too because of the parent usage counter
manipulations necessary to get the correct state of the parent during
system-wide transitions to the working state (system resume).
However, that limitation turns out to be artificial, so remove it.

Namely, pm_runtime_force_suspend() only needs to update the children
counter of its parent (if there's is a parent) when the device can
stay in suspend after the subsequent system resume transition, as
that counter is correct already otherwise. Now, if the parent's
children counter is not updated, it is not necessary to increment
the parent's usage counter in that case any more, as long as the
children counters of devices are checked along with their usage
counters in order to decide whether or not the devices may be left
in suspend after the subsequent system resume transition.

Accordingly, modify pm_runtime_force_suspend() to only call
pm_runtime_set_suspended() for devices whose usage and children
counters are at the "no references" level (the runtime PM status
of the device needs to be updated to "suspended" anyway in case
this function is called once again for the same device during the
transition under way), drop the parent usage counter incrementation
from it and update pm_runtime_force_resume() to compensate for these
changes.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 74 +++++++++++++++++++------------------------
1 file changed, 34 insertions(+), 40 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
spin_unlock_irq(&dev->power.lock);
}

+static bool pm_runtime_need_not_resume(struct device *dev)
+{
+ return atomic_read(&dev->power.usage_count) <= 1 &&
+ atomic_read(&dev->power.child_count) == 0;
+}
+
/**
* pm_runtime_force_suspend - Force a device into suspend state if needed.
* @dev: Device to suspend.
*
* Disable runtime PM so we safely can check the device's runtime PM status and
- * if it is active, invoke it's .runtime_suspend callback to bring it into
- * suspend state. Keep runtime PM disabled to preserve the state unless we
- * encounter errors.
+ * if it is active, invoke its ->runtime_suspend callback to suspend it and
+ * change its runtime PM status field to RPM_SUSPENDED. Also, if the device's
+ * usage and children counters don't indicate that the device was in use before
+ * the system-wide transition under way, decrement its parent's children counter
+ * (if there is a parent). Keep runtime PM disabled to preserve the state
+ * unless we encounter errors.
*
* Typically this function may be invoked from a system suspend callback to make
- * sure the device is put into low power state.
+ * sure the device is put into low power state and it should only be used during
+ * system-wide PM transitions to sleep states. It assumes that the analogous
+ * pm_runtime_force_resume() will be used to resume the device.
*/
int pm_runtime_force_suspend(struct device *dev)
{
@@ -1646,17 +1657,18 @@ int pm_runtime_force_suspend(struct devi
goto err;

/*
- * Increase the runtime PM usage count for the device's parent, in case
- * when we find the device being used when system suspend was invoked.
- * This informs pm_runtime_force_resume() to resume the parent
- * immediately, which is needed to be able to resume its children,
- * when not deferring the resume to be managed via runtime PM.
+ * If the device can stay in suspend after the system-wide transition
+ * to the working state that will follow, drop the children counter of
+ * its parent, but set its status to RPM_SUSPENDED anyway in case this
+ * function will be called again for it in the meantime.
*/
- if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
- pm_runtime_get_noresume(dev->parent);
+ if (pm_runtime_need_not_resume(dev))
+ pm_runtime_set_suspended(dev);
+ else
+ __update_runtime_status(dev, RPM_SUSPENDED);

- pm_runtime_set_suspended(dev);
return 0;
+
err:
pm_runtime_enable(dev);
return ret;
@@ -1669,13 +1681,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
*
* Prior invoking this function we expect the user to have brought the device
* into low power state by a call to pm_runtime_force_suspend(). Here we reverse
- * those actions and brings the device into full power, if it is expected to be
- * used on system resume. To distinguish that, we check whether the runtime PM
- * usage count is greater than 1 (the PM core increases the usage count in the
- * system PM prepare phase), as that indicates a real user (such as a subsystem,
- * driver, userspace, etc.) is using it. If that is the case, the device is
- * expected to be used on system resume as well, so then we resume it. In the
- * other case, we defer the resume to be managed via runtime PM.
+ * those actions and bring the device into full power, if it is expected to be
+ * used on system resume. In the other case, we defer the resume to be managed
+ * via runtime PM.
*
* Typically this function may be invoked from a system resume callback.
*/
@@ -1684,32 +1692,18 @@ int pm_runtime_force_resume(struct devic
int (*callback)(struct device *);
int ret = 0;

- callback = RPM_GET_CALLBACK(dev, runtime_resume);
-
- if (!callback) {
- ret = -ENOSYS;
- goto out;
- }
-
- if (!pm_runtime_status_suspended(dev))
+ if (!pm_runtime_status_suspended(dev) || pm_runtime_need_not_resume(dev))
goto out;

/*
- * Decrease the parent's runtime PM usage count, if we increased it
- * during system suspend in pm_runtime_force_suspend().
- */
- if (atomic_read(&dev->power.usage_count) > 1) {
- if (dev->parent)
- pm_runtime_put_noidle(dev->parent);
- } else {
- goto out;
- }
+ * The value of the parent's children counter is correct already, so
+ * just update the status of the device.
+ */
+ __update_runtime_status(dev, RPM_ACTIVE);

- ret = pm_runtime_set_active(dev);
- if (ret)
- goto out;
+ callback = RPM_GET_CALLBACK(dev, runtime_resume);

- ret = callback(dev);
+ ret = callback ? callback(dev) : -ENOSYS;
if (ret) {
pm_runtime_set_suspended(dev);
goto out;

2018-01-09 06:08:25

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On 3 January 2018 at 12:06, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> One of the limitations of pm_runtime_force_suspend/resume() is that
> if a parent driver wants to use these functions, all of its child
> drivers have to do that too because of the parent usage counter
> manipulations necessary to get the correct state of the parent during
> system-wide transitions to the working state (system resume).

I understand your point, however this isn't describing the full story,
because there are a some more alternatives when
pm_runtime_force_suspend|resume() works well, when used for parent
devices. Let me clarify, just to make sure we get this correct.

1) If the child device isn't managed by runtime PM at all (it always
has runtime PM disabled), pm_runtime_force_suspend|resume() can be
used for the parent device. This works because the runtime PM status
if the child remains in the default status, RPM_SUSPENDED.

2) If, somehow, during system suspend/resume, the driver for the child
make sure to synchronize the runtime PM status of the child device,
according to the state of the HW, this should work too.

I leave it to you how to take this fact into consideration, if and how
to update the changelog.

> However, that limitation turns out to be artificial, so remove it.

Agree!

>
> Namely, pm_runtime_force_suspend() only needs to update the children
> counter of its parent (if there's is a parent) when the device can
> stay in suspend after the subsequent system resume transition, as
> that counter is correct already otherwise. Now, if the parent's
> children counter is not updated, it is not necessary to increment
> the parent's usage counter in that case any more, as long as the
> children counters of devices are checked along with their usage
> counters in order to decide whether or not the devices may be left
> in suspend after the subsequent system resume transition.
>
> Accordingly, modify pm_runtime_force_suspend() to only call
> pm_runtime_set_suspended() for devices whose usage and children
> counters are at the "no references" level (the runtime PM status
> of the device needs to be updated to "suspended" anyway in case
> this function is called once again for the same device during the
> transition under way), drop the parent usage counter incrementation
> from it and update pm_runtime_force_resume() to compensate for these
> changes.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> ---
> drivers/base/power/runtime.c | 74 +++++++++++++++++++------------------------
> 1 file changed, 34 insertions(+), 40 deletions(-)
>
> Index: linux-pm/drivers/base/power/runtime.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/runtime.c
> +++ linux-pm/drivers/base/power/runtime.c
> @@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
> spin_unlock_irq(&dev->power.lock);
> }
>
> +static bool pm_runtime_need_not_resume(struct device *dev)
> +{
> + return atomic_read(&dev->power.usage_count) <= 1 &&
> + atomic_read(&dev->power.child_count) == 0;

We should take into account the ignore_children flag here, I think.
Something like this:

return atomic_read(&dev->power.usage_count) <= 1 &&
(atomic_read(&dev->power.child_count) == 0 ||
dev->power.ignore_children)

> +}
> +
> /**
> * pm_runtime_force_suspend - Force a device into suspend state if needed.
> * @dev: Device to suspend.
> *
> * Disable runtime PM so we safely can check the device's runtime PM status and
> - * if it is active, invoke it's .runtime_suspend callback to bring it into
> - * suspend state. Keep runtime PM disabled to preserve the state unless we
> - * encounter errors.
> + * if it is active, invoke its ->runtime_suspend callback to suspend it and
> + * change its runtime PM status field to RPM_SUSPENDED. Also, if the device's
> + * usage and children counters don't indicate that the device was in use before
> + * the system-wide transition under way, decrement its parent's children counter
> + * (if there is a parent). Keep runtime PM disabled to preserve the state
> + * unless we encounter errors.
> *
> * Typically this function may be invoked from a system suspend callback to make
> - * sure the device is put into low power state.
> + * sure the device is put into low power state and it should only be used during
> + * system-wide PM transitions to sleep states. It assumes that the analogous
> + * pm_runtime_force_resume() will be used to resume the device.
> */
> int pm_runtime_force_suspend(struct device *dev)
> {
> @@ -1646,17 +1657,18 @@ int pm_runtime_force_suspend(struct devi
> goto err;
>
> /*
> - * Increase the runtime PM usage count for the device's parent, in case
> - * when we find the device being used when system suspend was invoked.
> - * This informs pm_runtime_force_resume() to resume the parent
> - * immediately, which is needed to be able to resume its children,
> - * when not deferring the resume to be managed via runtime PM.
> + * If the device can stay in suspend after the system-wide transition
> + * to the working state that will follow, drop the children counter of
> + * its parent, but set its status to RPM_SUSPENDED anyway in case this
> + * function will be called again for it in the meantime.
> */
> - if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
> - pm_runtime_get_noresume(dev->parent);
> + if (pm_runtime_need_not_resume(dev))
> + pm_runtime_set_suspended(dev);
> + else
> + __update_runtime_status(dev, RPM_SUSPENDED);
>

This is clever!

I recall when working on commit 1d9174fbc55e ("PM / Runtime: Defer
resuming of the device in pm_runtime_force_resume()"), that I wanted
to take into account the dev->power.child_count, but couldn't figure
out exactly how. :-)

> - pm_runtime_set_suspended(dev);
> return 0;
> +
> err:
> pm_runtime_enable(dev);
> return ret;
> @@ -1669,13 +1681,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
> *
> * Prior invoking this function we expect the user to have brought the device
> * into low power state by a call to pm_runtime_force_suspend(). Here we reverse
> - * those actions and brings the device into full power, if it is expected to be
> - * used on system resume. To distinguish that, we check whether the runtime PM
> - * usage count is greater than 1 (the PM core increases the usage count in the
> - * system PM prepare phase), as that indicates a real user (such as a subsystem,
> - * driver, userspace, etc.) is using it. If that is the case, the device is
> - * expected to be used on system resume as well, so then we resume it. In the
> - * other case, we defer the resume to be managed via runtime PM.
> + * those actions and bring the device into full power, if it is expected to be
> + * used on system resume. In the other case, we defer the resume to be managed
> + * via runtime PM.
> *
> * Typically this function may be invoked from a system resume callback.
> */
> @@ -1684,32 +1692,18 @@ int pm_runtime_force_resume(struct devic
> int (*callback)(struct device *);
> int ret = 0;
>
> - callback = RPM_GET_CALLBACK(dev, runtime_resume);
> -
> - if (!callback) {
> - ret = -ENOSYS;
> - goto out;
> - }

This make me realize that, I think this is probably for legacy reasons.

I think we should allow the callback to be NULL, don't you think? Of
course, this deserves its own change.

> -
> - if (!pm_runtime_status_suspended(dev))
> + if (!pm_runtime_status_suspended(dev) || pm_runtime_need_not_resume(dev))
> goto out;
>
> /*
> - * Decrease the parent's runtime PM usage count, if we increased it
> - * during system suspend in pm_runtime_force_suspend().
> - */
> - if (atomic_read(&dev->power.usage_count) > 1) {
> - if (dev->parent)
> - pm_runtime_put_noidle(dev->parent);
> - } else {
> - goto out;
> - }
> + * The value of the parent's children counter is correct already, so
> + * just update the status of the device.
> + */
> + __update_runtime_status(dev, RPM_ACTIVE);
>
> - ret = pm_runtime_set_active(dev);
> - if (ret)
> - goto out;
> + callback = RPM_GET_CALLBACK(dev, runtime_resume);
>
> - ret = callback(dev);
> + ret = callback ? callback(dev) : -ENOSYS;
> if (ret) {
> pm_runtime_set_suspended(dev);
> goto out;
>

Thanks for looking into this!

Kind regards
Uffe

2018-01-09 12:35:21

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tuesday, January 9, 2018 7:08:21 AM CET Ulf Hansson wrote:
> On 3 January 2018 at 12:06, Rafael J. Wysocki <[email protected]> wrote:
> > From: Rafael J. Wysocki <[email protected]>
> >
> > One of the limitations of pm_runtime_force_suspend/resume() is that
> > if a parent driver wants to use these functions, all of its child
> > drivers have to do that too because of the parent usage counter
> > manipulations necessary to get the correct state of the parent during
> > system-wide transitions to the working state (system resume).
>
> I understand your point, however this isn't describing the full story,
> because there are a some more alternatives when
> pm_runtime_force_suspend|resume() works well, when used for parent
> devices. Let me clarify, just to make sure we get this correct.
>
> 1) If the child device isn't managed by runtime PM at all (it always
> has runtime PM disabled), pm_runtime_force_suspend|resume() can be
> used for the parent device. This works because the runtime PM status
> if the child remains in the default status, RPM_SUSPENDED.
>
> 2) If, somehow, during system suspend/resume, the driver for the child
> make sure to synchronize the runtime PM status of the child device,
> according to the state of the HW, this should work too.
>
> I leave it to you how to take this fact into consideration, if and how
> to update the changelog.

OK, let me see.

> > However, that limitation turns out to be artificial, so remove it.
>
> Agree!
>
> >
> > Namely, pm_runtime_force_suspend() only needs to update the children
> > counter of its parent (if there's is a parent) when the device can
> > stay in suspend after the subsequent system resume transition, as
> > that counter is correct already otherwise. Now, if the parent's
> > children counter is not updated, it is not necessary to increment
> > the parent's usage counter in that case any more, as long as the
> > children counters of devices are checked along with their usage
> > counters in order to decide whether or not the devices may be left
> > in suspend after the subsequent system resume transition.
> >
> > Accordingly, modify pm_runtime_force_suspend() to only call
> > pm_runtime_set_suspended() for devices whose usage and children
> > counters are at the "no references" level (the runtime PM status
> > of the device needs to be updated to "suspended" anyway in case
> > this function is called once again for the same device during the
> > transition under way), drop the parent usage counter incrementation
> > from it and update pm_runtime_force_resume() to compensate for these
> > changes.
> >
> > Signed-off-by: Rafael J. Wysocki <[email protected]>
> > ---
> > drivers/base/power/runtime.c | 74 +++++++++++++++++++------------------------
> > 1 file changed, 34 insertions(+), 40 deletions(-)
> >
> > Index: linux-pm/drivers/base/power/runtime.c
> > ===================================================================
> > --- linux-pm.orig/drivers/base/power/runtime.c
> > +++ linux-pm/drivers/base/power/runtime.c
> > @@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
> > spin_unlock_irq(&dev->power.lock);
> > }
> >
> > +static bool pm_runtime_need_not_resume(struct device *dev)
> > +{
> > + return atomic_read(&dev->power.usage_count) <= 1 &&
> > + atomic_read(&dev->power.child_count) == 0;
>
> We should take into account the ignore_children flag here, I think.
> Something like this:
>
> return atomic_read(&dev->power.usage_count) <= 1 &&
> (atomic_read(&dev->power.child_count) == 0 ||
> dev->power.ignore_children)

The current code doesn't quite take ignore_children into account, however.

Regardless of which way the change is made, there will be one corner case
that is not going to be covered. First, if ignore_children is taken into
account, some cases in which the current code increments the parent's
usage counter will be treated as "need not resume". Second, if ignore_children
is ignored, some cases in which the parent's usage counter is not incremented
today will cause the parent to resume after the change.

Frankly, I prefer to ignore ignore_children at least for the time being,
because resuming the parent unnecessarily is not a tragedy (it will likely
suspend shortly anyway), but if it is necessary to resume it and it is not
be resumed, things will visibly break.

So, I'd prefer to leave this patch as is and do a second one adding the
ignore_children check. Then, if things break due to the second patch, it
can be reverted easily (unless that can be fixed differently).

> > +}
> > +
> > /**
> > * pm_runtime_force_suspend - Force a device into suspend state if needed.
> > * @dev: Device to suspend.
> > *
> > * Disable runtime PM so we safely can check the device's runtime PM status and
> > - * if it is active, invoke it's .runtime_suspend callback to bring it into
> > - * suspend state. Keep runtime PM disabled to preserve the state unless we
> > - * encounter errors.
> > + * if it is active, invoke its ->runtime_suspend callback to suspend it and
> > + * change its runtime PM status field to RPM_SUSPENDED. Also, if the device's
> > + * usage and children counters don't indicate that the device was in use before
> > + * the system-wide transition under way, decrement its parent's children counter
> > + * (if there is a parent). Keep runtime PM disabled to preserve the state
> > + * unless we encounter errors.
> > *
> > * Typically this function may be invoked from a system suspend callback to make
> > - * sure the device is put into low power state.
> > + * sure the device is put into low power state and it should only be used during
> > + * system-wide PM transitions to sleep states. It assumes that the analogous
> > + * pm_runtime_force_resume() will be used to resume the device.
> > */
> > int pm_runtime_force_suspend(struct device *dev)
> > {
> > @@ -1646,17 +1657,18 @@ int pm_runtime_force_suspend(struct devi
> > goto err;
> >
> > /*
> > - * Increase the runtime PM usage count for the device's parent, in case
> > - * when we find the device being used when system suspend was invoked.
> > - * This informs pm_runtime_force_resume() to resume the parent
> > - * immediately, which is needed to be able to resume its children,
> > - * when not deferring the resume to be managed via runtime PM.
> > + * If the device can stay in suspend after the system-wide transition
> > + * to the working state that will follow, drop the children counter of
> > + * its parent, but set its status to RPM_SUSPENDED anyway in case this
> > + * function will be called again for it in the meantime.
> > */
> > - if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
> > - pm_runtime_get_noresume(dev->parent);
> > + if (pm_runtime_need_not_resume(dev))
> > + pm_runtime_set_suspended(dev);
> > + else
> > + __update_runtime_status(dev, RPM_SUSPENDED);
> >
>
> This is clever!
>
> I recall when working on commit 1d9174fbc55e ("PM / Runtime: Defer
> resuming of the device in pm_runtime_force_resume()"), that I wanted
> to take into account the dev->power.child_count, but couldn't figure
> out exactly how. :-)
>
> > - pm_runtime_set_suspended(dev);
> > return 0;
> > +
> > err:
> > pm_runtime_enable(dev);
> > return ret;
> > @@ -1669,13 +1681,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
> > *
> > * Prior invoking this function we expect the user to have brought the device
> > * into low power state by a call to pm_runtime_force_suspend(). Here we reverse
> > - * those actions and brings the device into full power, if it is expected to be
> > - * used on system resume. To distinguish that, we check whether the runtime PM
> > - * usage count is greater than 1 (the PM core increases the usage count in the
> > - * system PM prepare phase), as that indicates a real user (such as a subsystem,
> > - * driver, userspace, etc.) is using it. If that is the case, the device is
> > - * expected to be used on system resume as well, so then we resume it. In the
> > - * other case, we defer the resume to be managed via runtime PM.
> > + * those actions and bring the device into full power, if it is expected to be
> > + * used on system resume. In the other case, we defer the resume to be managed
> > + * via runtime PM.
> > *
> > * Typically this function may be invoked from a system resume callback.
> > */
> > @@ -1684,32 +1692,18 @@ int pm_runtime_force_resume(struct devic
> > int (*callback)(struct device *);
> > int ret = 0;
> >
> > - callback = RPM_GET_CALLBACK(dev, runtime_resume);
> > -
> > - if (!callback) {
> > - ret = -ENOSYS;
> > - goto out;
> > - }
>
> This make me realize that, I think this is probably for legacy reasons.
>
> I think we should allow the callback to be NULL, don't you think?

Well, it probably doesn't matter too much, but OK.

> Of course, this deserves its own change.

Right.

> > -
> > - if (!pm_runtime_status_suspended(dev))
> > + if (!pm_runtime_status_suspended(dev) || pm_runtime_need_not_resume(dev))
> > goto out;
> >
> > /*
> > - * Decrease the parent's runtime PM usage count, if we increased it
> > - * during system suspend in pm_runtime_force_suspend().
> > - */
> > - if (atomic_read(&dev->power.usage_count) > 1) {
> > - if (dev->parent)
> > - pm_runtime_put_noidle(dev->parent);
> > - } else {
> > - goto out;
> > - }
> > + * The value of the parent's children counter is correct already, so
> > + * just update the status of the device.
> > + */
> > + __update_runtime_status(dev, RPM_ACTIVE);
> >
> > - ret = pm_runtime_set_active(dev);
> > - if (ret)
> > - goto out;
> > + callback = RPM_GET_CALLBACK(dev, runtime_resume);
> >
> > - ret = callback(dev);
> > + ret = callback ? callback(dev) : -ENOSYS;
> > if (ret) {
> > pm_runtime_set_suspended(dev);
> > goto out;
> >
>
> Thanks for looking into this!

No problem.

Thanks,
Rafael

2018-01-09 13:37:49

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

Hi Rafael,

On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
> From: Rafael J. Wysocki <[email protected]>
>
> One of the limitations of pm_runtime_force_suspend/resume() is that
> if a parent driver wants to use these functions, all of its child
> drivers have to do that too because of the parent usage counter
> manipulations necessary to get the correct state of the parent during
> system-wide transitions to the working state (system resume).
> However, that limitation turns out to be artificial, so remove it.
>
> Namely, pm_runtime_force_suspend() only needs to update the children
> counter of its parent (if there's is a parent) when the device can
> stay in suspend after the subsequent system resume transition, as
> that counter is correct already otherwise. Now, if the parent's
> children counter is not updated, it is not necessary to increment
> the parent's usage counter in that case any more, as long as the
> children counters of devices are checked along with their usage
> counters in order to decide whether or not the devices may be left
> in suspend after the subsequent system resume transition.
>
> Accordingly, modify pm_runtime_force_suspend() to only call
> pm_runtime_set_suspended() for devices whose usage and children
> counters are at the "no references" level (the runtime PM status
> of the device needs to be updated to "suspended" anyway in case
> this function is called once again for the same device during the
> transition under way), drop the parent usage counter incrementation
> from it and update pm_runtime_force_resume() to compensate for these
> changes.
>
> Signed-off-by: Rafael J. Wysocki <[email protected]>

This patch causes a regression during system resume on Renesas Salvator-XS
with R-Car H3 ES2.0:

SError Interrupt on CPU3, code 0xbf000002 -- SError
SError Interrupt on CPU2, code 0xbf000002 -- SError
CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
Hardware name: Renesas Salvator-X 2nd version board based on
r8a7795 ES2.0+ (DT)
Hardware name: Renesas Salvator-X 2nd version board based on
r8a7795 ES2.0+ (DT)
Workqueue: events_unbound async_run_entry_fn
Workqueue: events_unbound async_run_entry_fn
pstate: 60000005 (nZCv daif -PAN -UAO)
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : rcar_gen3_phy_usb2_init+0x34/0xf8
pc : rcar_gen3_phy_usb2_init+0x34/0xf8
lr : phy_init+0x64/0xcc
lr : phy_init+0x64/0xcc
...
Kernel panic - not syncing: Asynchronous SError Interrupt

Note that before, it printed a warning instead:

Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
active children
WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
pm_runtime_enable+0x94/0xd8

Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
pm_runtime_force_suspend/resume()") fixes the crash.

Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
deployment and fix an issue"
(https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
does not fix the crash.

With more debug code added, it seems the EHCI module clocks (701-703) are
enabled earlier than before. I guess this triggers the workqueue to perform
an operation while another related device (HSUSB 704?) is still disabled?

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-09 14:13:57

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

[...]

>> > +++ linux-pm/drivers/base/power/runtime.c
>> > @@ -1613,17 +1613,28 @@ void pm_runtime_drop_link(struct device
>> > spin_unlock_irq(&dev->power.lock);
>> > }
>> >
>> > +static bool pm_runtime_need_not_resume(struct device *dev)
>> > +{
>> > + return atomic_read(&dev->power.usage_count) <= 1 &&
>> > + atomic_read(&dev->power.child_count) == 0;
>>
>> We should take into account the ignore_children flag here, I think.
>> Something like this:
>>
>> return atomic_read(&dev->power.usage_count) <= 1 &&
>> (atomic_read(&dev->power.child_count) == 0 ||
>> dev->power.ignore_children)
>
> The current code doesn't quite take ignore_children into account, however.
>
> Regardless of which way the change is made, there will be one corner case
> that is not going to be covered. First, if ignore_children is taken into
> account, some cases in which the current code increments the parent's
> usage counter will be treated as "need not resume". Second, if ignore_children
> is ignored, some cases in which the parent's usage counter is not incremented
> today will cause the parent to resume after the change.
>
> Frankly, I prefer to ignore ignore_children at least for the time being,
> because resuming the parent unnecessarily is not a tragedy (it will likely
> suspend shortly anyway), but if it is necessary to resume it and it is not
> be resumed, things will visibly break.
>
> So, I'd prefer to leave this patch as is and do a second one adding the
> ignore_children check. Then, if things break due to the second patch, it
> can be reverted easily (unless that can be fixed differently).

Okay, that seems like a reasonable approach.

Feel free to add:

Reviewed-by: Ulf Hansson <[email protected]>

[...]

Kind regards
Uffe

2018-01-09 14:27:36

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On 9 January 2018 at 14:37, Geert Uytterhoeven <[email protected]> wrote:
> Hi Rafael,
>
> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> From: Rafael J. Wysocki <[email protected]>
>>
>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> if a parent driver wants to use these functions, all of its child
>> drivers have to do that too because of the parent usage counter
>> manipulations necessary to get the correct state of the parent during
>> system-wide transitions to the working state (system resume).
>> However, that limitation turns out to be artificial, so remove it.
>>
>> Namely, pm_runtime_force_suspend() only needs to update the children
>> counter of its parent (if there's is a parent) when the device can
>> stay in suspend after the subsequent system resume transition, as
>> that counter is correct already otherwise. Now, if the parent's
>> children counter is not updated, it is not necessary to increment
>> the parent's usage counter in that case any more, as long as the
>> children counters of devices are checked along with their usage
>> counters in order to decide whether or not the devices may be left
>> in suspend after the subsequent system resume transition.
>>
>> Accordingly, modify pm_runtime_force_suspend() to only call
>> pm_runtime_set_suspended() for devices whose usage and children
>> counters are at the "no references" level (the runtime PM status
>> of the device needs to be updated to "suspended" anyway in case
>> this function is called once again for the same device during the
>> transition under way), drop the parent usage counter incrementation
>> from it and update pm_runtime_force_resume() to compensate for these
>> changes.
>>
>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> This patch causes a regression during system resume on Renesas Salvator-XS
> with R-Car H3 ES2.0:
>
> SError Interrupt on CPU3, code 0xbf000002 -- SError
> SError Interrupt on CPU2, code 0xbf000002 -- SError
> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Workqueue: events_unbound async_run_entry_fn
> Workqueue: events_unbound async_run_entry_fn
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> lr : phy_init+0x64/0xcc
> lr : phy_init+0x64/0xcc
> ...
> Kernel panic - not syncing: Asynchronous SError Interrupt
>
> Note that before, it printed a warning instead:
>
> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> active children
> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> pm_runtime_enable+0x94/0xd8
>
> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> pm_runtime_force_suspend/resume()") fixes the crash.
>
> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> deployment and fix an issue"
> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> does not fix the crash.

What happens if you apply both the revert and the above series?

Does the WARNING still disappear? Or have something else been changed?

>
> With more debug code added, it seems the EHCI module clocks (701-703) are
> enabled earlier than before. I guess this triggers the workqueue to perform
> an operation while another related device (HSUSB 704?) is still disabled?

Huh, this seems broken in several aspects.

Could this be the classic case of having the wrong suspend/resume
order of devices?

[...]

Kind regards
Uffe

2018-01-09 14:34:19

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

Hi Ulf,

On Tue, Jan 9, 2018 at 3:27 PM, Ulf Hansson <[email protected]> wrote:
> On 9 January 2018 at 14:37, Geert Uytterhoeven <[email protected]> wrote:
>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>>> if a parent driver wants to use these functions, all of its child
>>> drivers have to do that too because of the parent usage counter
>>> manipulations necessary to get the correct state of the parent during
>>> system-wide transitions to the working state (system resume).
>>> However, that limitation turns out to be artificial, so remove it.
>>>
>>> Namely, pm_runtime_force_suspend() only needs to update the children
>>> counter of its parent (if there's is a parent) when the device can
>>> stay in suspend after the subsequent system resume transition, as
>>> that counter is correct already otherwise. Now, if the parent's
>>> children counter is not updated, it is not necessary to increment
>>> the parent's usage counter in that case any more, as long as the
>>> children counters of devices are checked along with their usage
>>> counters in order to decide whether or not the devices may be left
>>> in suspend after the subsequent system resume transition.
>>>
>>> Accordingly, modify pm_runtime_force_suspend() to only call
>>> pm_runtime_set_suspended() for devices whose usage and children
>>> counters are at the "no references" level (the runtime PM status
>>> of the device needs to be updated to "suspended" anyway in case
>>> this function is called once again for the same device during the
>>> transition under way), drop the parent usage counter incrementation
>>> from it and update pm_runtime_force_resume() to compensate for these
>>> changes.
>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>
>> This patch causes a regression during system resume on Renesas Salvator-XS
>> with R-Car H3 ES2.0:
>>
>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Workqueue: events_unbound async_run_entry_fn
>> Workqueue: events_unbound async_run_entry_fn
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> lr : phy_init+0x64/0xcc
>> lr : phy_init+0x64/0xcc
>> ...
>> Kernel panic - not syncing: Asynchronous SError Interrupt
>>
>> Note that before, it printed a warning instead:
>>
>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> active children
>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> pm_runtime_enable+0x94/0xd8
>>
>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> pm_runtime_force_suspend/resume()") fixes the crash.
>>
>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> deployment and fix an issue"
>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> does not fix the crash.
>
> What happens if you apply both the revert and the above series?
>
> Does the WARNING still disappear? Or have something else been changed?

The warning doesn't seem to go away in that case.

Although Shimoda-san reported before it does. So something else has changed?

>> With more debug code added, it seems the EHCI module clocks (701-703) are
>> enabled earlier than before. I guess this triggers the workqueue to perform
>> an operation while another related device (HSUSB 704?) is still disabled?
>
> Huh, this seems broken in several aspects.
>
> Could this be the classic case of having the wrong suspend/resume
> order of devices?

Possibly.

/me no USB expert.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-09 15:00:39

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
> Hi Rafael,
>
> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> From: Rafael J. Wysocki <[email protected]>
>>
>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> if a parent driver wants to use these functions, all of its child
>> drivers have to do that too because of the parent usage counter
>> manipulations necessary to get the correct state of the parent during
>> system-wide transitions to the working state (system resume).
>> However, that limitation turns out to be artificial, so remove it.
>>
>> Namely, pm_runtime_force_suspend() only needs to update the children
>> counter of its parent (if there's is a parent) when the device can
>> stay in suspend after the subsequent system resume transition, as
>> that counter is correct already otherwise. Now, if the parent's
>> children counter is not updated, it is not necessary to increment
>> the parent's usage counter in that case any more, as long as the
>> children counters of devices are checked along with their usage
>> counters in order to decide whether or not the devices may be left
>> in suspend after the subsequent system resume transition.
>>
>> Accordingly, modify pm_runtime_force_suspend() to only call
>> pm_runtime_set_suspended() for devices whose usage and children
>> counters are at the "no references" level (the runtime PM status
>> of the device needs to be updated to "suspended" anyway in case
>> this function is called once again for the same device during the
>> transition under way), drop the parent usage counter incrementation
>> from it and update pm_runtime_force_resume() to compensate for these
>> changes.
>>
>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> This patch causes a regression during system resume on Renesas Salvator-XS
> with R-Car H3 ES2.0:

I have dropped it for now, but we need to address the underlying issue.

> SError Interrupt on CPU3, code 0xbf000002 -- SError
> SError Interrupt on CPU2, code 0xbf000002 -- SError
> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Workqueue: events_unbound async_run_entry_fn
> Workqueue: events_unbound async_run_entry_fn
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> lr : phy_init+0x64/0xcc
> lr : phy_init+0x64/0xcc
> ...
> Kernel panic - not syncing: Asynchronous SError Interrupt
>
> Note that before, it printed a warning instead:
>
> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> active children
> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> pm_runtime_enable+0x94/0xd8
>
> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> pm_runtime_force_suspend/resume()") fixes the crash.
>
> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> deployment and fix an issue"
> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> does not fix the crash.

OK

> With more debug code added, it seems the EHCI module clocks (701-703) are
> enabled earlier than before. I guess this triggers the workqueue to perform
> an operation while another related device (HSUSB 704?) is still disabled?

Probably.

Likely a device that wasn't resumed before resumes now and that causes
the issue to appear.

I'm wondering if adding the ignore_children check to the patch helps.
If not, there clearly is a resume ordering issue that is papered over
by the current code.

Thanks,
Rafael

2018-01-09 15:31:04

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

Hi Rafael,

CC usb

On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>>> if a parent driver wants to use these functions, all of its child
>>> drivers have to do that too because of the parent usage counter
>>> manipulations necessary to get the correct state of the parent during
>>> system-wide transitions to the working state (system resume).
>>> However, that limitation turns out to be artificial, so remove it.
>>>
>>> Namely, pm_runtime_force_suspend() only needs to update the children
>>> counter of its parent (if there's is a parent) when the device can
>>> stay in suspend after the subsequent system resume transition, as
>>> that counter is correct already otherwise. Now, if the parent's
>>> children counter is not updated, it is not necessary to increment
>>> the parent's usage counter in that case any more, as long as the
>>> children counters of devices are checked along with their usage
>>> counters in order to decide whether or not the devices may be left
>>> in suspend after the subsequent system resume transition.
>>>
>>> Accordingly, modify pm_runtime_force_suspend() to only call
>>> pm_runtime_set_suspended() for devices whose usage and children
>>> counters are at the "no references" level (the runtime PM status
>>> of the device needs to be updated to "suspended" anyway in case
>>> this function is called once again for the same device during the
>>> transition under way), drop the parent usage counter incrementation
>>> from it and update pm_runtime_force_resume() to compensate for these
>>> changes.
>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>
>> This patch causes a regression during system resume on Renesas Salvator-XS
>> with R-Car H3 ES2.0:
>
> I have dropped it for now, but we need to address the underlying issue.
>
>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Workqueue: events_unbound async_run_entry_fn
>> Workqueue: events_unbound async_run_entry_fn
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> lr : phy_init+0x64/0xcc
>> lr : phy_init+0x64/0xcc
>> ...
>> Kernel panic - not syncing: Asynchronous SError Interrupt
>>
>> Note that before, it printed a warning instead:
>>
>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> active children
>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> pm_runtime_enable+0x94/0xd8
>>
>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> pm_runtime_force_suspend/resume()") fixes the crash.
>>
>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> deployment and fix an issue"
>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> does not fix the crash.
>
> OK
>
>> With more debug code added, it seems the EHCI module clocks (701-703) are
>> enabled earlier than before. I guess this triggers the workqueue to perform
>> an operation while another related device (HSUSB 704?) is still disabled?
>
> Probably.
>
> Likely a device that wasn't resumed before resumes now and that causes
> the issue to appear.
>
> I'm wondering if adding the ignore_children check to the patch helps.
> If not, there clearly is a resume ordering issue that is papered over
> by the current code.

Something fishy is going on. Status of the USB PHYs differ before/after
system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:

- /devices/platform/soc/ee0a0200.usb-phy active
- /devices/platform/soc/ee0c0200.usb-phy active
- /devices/platform/soc/ee080200.usb-phy active
+ /devices/platform/soc/ee0a0200.usb-phy suspended
+ /devices/platform/soc/ee0c0200.usb-phy suspended
+ /devices/platform/soc/ee080200.usb-phy suspended

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-09 16:03:23

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:
> Hi Rafael,
>
> CC usb
>
> On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
>> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>>>> From: Rafael J. Wysocki <[email protected]>
>>>>
>>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>>>> if a parent driver wants to use these functions, all of its child
>>>> drivers have to do that too because of the parent usage counter
>>>> manipulations necessary to get the correct state of the parent during
>>>> system-wide transitions to the working state (system resume).
>>>> However, that limitation turns out to be artificial, so remove it.
>>>>
>>>> Namely, pm_runtime_force_suspend() only needs to update the children
>>>> counter of its parent (if there's is a parent) when the device can
>>>> stay in suspend after the subsequent system resume transition, as
>>>> that counter is correct already otherwise. Now, if the parent's
>>>> children counter is not updated, it is not necessary to increment
>>>> the parent's usage counter in that case any more, as long as the
>>>> children counters of devices are checked along with their usage
>>>> counters in order to decide whether or not the devices may be left
>>>> in suspend after the subsequent system resume transition.
>>>>
>>>> Accordingly, modify pm_runtime_force_suspend() to only call
>>>> pm_runtime_set_suspended() for devices whose usage and children
>>>> counters are at the "no references" level (the runtime PM status
>>>> of the device needs to be updated to "suspended" anyway in case
>>>> this function is called once again for the same device during the
>>>> transition under way), drop the parent usage counter incrementation
>>>> from it and update pm_runtime_force_resume() to compensate for these
>>>> changes.
>>>>
>>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>>
>>> This patch causes a regression during system resume on Renesas Salvator-XS
>>> with R-Car H3 ES2.0:
>>
>> I have dropped it for now, but we need to address the underlying issue.
>>
>>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>>> Hardware name: Renesas Salvator-X 2nd version board based on
>>> r8a7795 ES2.0+ (DT)
>>> Hardware name: Renesas Salvator-X 2nd version board based on
>>> r8a7795 ES2.0+ (DT)
>>> Workqueue: events_unbound async_run_entry_fn
>>> Workqueue: events_unbound async_run_entry_fn
>>> pstate: 60000005 (nZCv daif -PAN -UAO)
>>> pstate: 60000005 (nZCv daif -PAN -UAO)
>>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>>> lr : phy_init+0x64/0xcc
>>> lr : phy_init+0x64/0xcc
>>> ...
>>> Kernel panic - not syncing: Asynchronous SError Interrupt
>>>
>>> Note that before, it printed a warning instead:
>>>
>>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>>> active children
>>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>>> pm_runtime_enable+0x94/0xd8
>>>
>>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>>> pm_runtime_force_suspend/resume()") fixes the crash.
>>>
>>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>>> deployment and fix an issue"
>>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>>> does not fix the crash.
>>
>> OK
>>
>>> With more debug code added, it seems the EHCI module clocks (701-703) are
>>> enabled earlier than before. I guess this triggers the workqueue to perform
>>> an operation while another related device (HSUSB 704?) is still disabled?
>>
>> Probably.
>>
>> Likely a device that wasn't resumed before resumes now and that causes
>> the issue to appear.
>>
>> I'm wondering if adding the ignore_children check to the patch helps.
>> If not, there clearly is a resume ordering issue that is papered over
>> by the current code.
>
> Something fishy is going on. Status of the USB PHYs differ before/after
> system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:
>
> - /devices/platform/soc/ee0a0200.usb-phy active
> - /devices/platform/soc/ee0c0200.usb-phy active
> - /devices/platform/soc/ee080200.usb-phy active
> + /devices/platform/soc/ee0a0200.usb-phy suspended
> + /devices/platform/soc/ee0c0200.usb-phy suspended
> + /devices/platform/soc/ee080200.usb-phy suspended

Yeah.

That's because of the pm_runtime_force_suspend/resume() called by
genpd. These functions generally may cause devices active before
system suspend to be left in suspend after it. That generally is a
good idea if the device was not really in use before the system
suspend, but there is a problem that the driver of it may not be
prepared for that to happen (and also the way to determine the "not
really in use" case may not be as expected by the driver).

Thanks,
Rafael

2018-01-09 16:27:30

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tuesday, January 9, 2018 4:00:35 PM CET Rafael J. Wysocki wrote:
> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
> > Hi Rafael,
> >
> > On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> From: Rafael J. Wysocki <[email protected]>
> >>
> >> One of the limitations of pm_runtime_force_suspend/resume() is that
> >> if a parent driver wants to use these functions, all of its child
> >> drivers have to do that too because of the parent usage counter
> >> manipulations necessary to get the correct state of the parent during
> >> system-wide transitions to the working state (system resume).
> >> However, that limitation turns out to be artificial, so remove it.
> >>
> >> Namely, pm_runtime_force_suspend() only needs to update the children
> >> counter of its parent (if there's is a parent) when the device can
> >> stay in suspend after the subsequent system resume transition, as
> >> that counter is correct already otherwise. Now, if the parent's
> >> children counter is not updated, it is not necessary to increment
> >> the parent's usage counter in that case any more, as long as the
> >> children counters of devices are checked along with their usage
> >> counters in order to decide whether or not the devices may be left
> >> in suspend after the subsequent system resume transition.
> >>
> >> Accordingly, modify pm_runtime_force_suspend() to only call
> >> pm_runtime_set_suspended() for devices whose usage and children
> >> counters are at the "no references" level (the runtime PM status
> >> of the device needs to be updated to "suspended" anyway in case
> >> this function is called once again for the same device during the
> >> transition under way), drop the parent usage counter incrementation
> >> from it and update pm_runtime_force_resume() to compensate for these
> >> changes.
> >>
> >> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >
> > This patch causes a regression during system resume on Renesas Salvator-XS
> > with R-Car H3 ES2.0:
>
> I have dropped it for now, but we need to address the underlying issue.
>
> > SError Interrupt on CPU3, code 0xbf000002 -- SError
> > SError Interrupt on CPU2, code 0xbf000002 -- SError
> > CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> > 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> > CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> > 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> > Hardware name: Renesas Salvator-X 2nd version board based on
> > r8a7795 ES2.0+ (DT)
> > Hardware name: Renesas Salvator-X 2nd version board based on
> > r8a7795 ES2.0+ (DT)
> > Workqueue: events_unbound async_run_entry_fn
> > Workqueue: events_unbound async_run_entry_fn
> > pstate: 60000005 (nZCv daif -PAN -UAO)
> > pstate: 60000005 (nZCv daif -PAN -UAO)
> > pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> > pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> > lr : phy_init+0x64/0xcc
> > lr : phy_init+0x64/0xcc
> > ...
> > Kernel panic - not syncing: Asynchronous SError Interrupt
> >
> > Note that before, it printed a warning instead:
> >
> > Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> > active children
> > WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> > pm_runtime_enable+0x94/0xd8
> >
> > Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> > pm_runtime_force_suspend/resume()") fixes the crash.
> >
> > Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> > deployment and fix an issue"
> > (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> > does not fix the crash.
>
> OK
>
> > With more debug code added, it seems the EHCI module clocks (701-703) are
> > enabled earlier than before. I guess this triggers the workqueue to perform
> > an operation while another related device (HSUSB 704?) is still disabled?
>
> Probably.
>
> Likely a device that wasn't resumed before resumes now and that causes
> the issue to appear.
>
> I'm wondering if adding the ignore_children check to the patch helps.
> If not, there clearly is a resume ordering issue that is papered over
> by the current code.

Below is an alternative version of the patch that caused the problem
to happen which checks the ignore_children flag of the device. It may
avoid the issue by accident, but then I'd rather make it not happen. :-)

Please try this patch anyway, though, and let me know if the crash is still
there.

Thanks,
Rafael

---
From: Rafael J. Wysocki <[email protected]>
Subject: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

One of the limitations of pm_runtime_force_suspend/resume() is that
if a parent driver wants to use these functions, all of its child
drivers have to do that too because of the parent usage counter
manipulations necessary to get the correct state of the parent during
system-wide transitions to the working state (system resume).
However, that limitation turns out to be artificial, so remove it.

Namely, pm_runtime_force_suspend() only needs to update the children
counter of its parent (if there's is a parent) when the device can
stay in suspend after the subsequent system resume transition, as
that counter is correct already otherwise. Now, if the parent's
children counter is not updated, it is not necessary to increment
the parent's usage counter in that case any more, as long as the
children counters of devices are checked along with their usage
counters in order to decide whether or not the devices may be left
in suspend after the subsequent system resume transition.

Accordingly, modify pm_runtime_force_suspend() to only call
pm_runtime_set_suspended() for devices whose usage and children
counters are at the "no references" level (the runtime PM status
of the device needs to be updated to "suspended" anyway in case
this function is called once again for the same device during the
transition under way), drop the parent usage counter incrementation
from it and update pm_runtime_force_resume() to compensate for these
changes.

Signed-off-by: Rafael J. Wysocki <[email protected]>
---
drivers/base/power/runtime.c | 75 ++++++++++++++++++++-----------------------
1 file changed, 35 insertions(+), 40 deletions(-)

Index: linux-pm/drivers/base/power/runtime.c
===================================================================
--- linux-pm.orig/drivers/base/power/runtime.c
+++ linux-pm/drivers/base/power/runtime.c
@@ -1613,17 +1613,29 @@ void pm_runtime_drop_link(struct device
spin_unlock_irq(&dev->power.lock);
}

+static bool pm_runtime_need_not_resume(struct device *dev)
+{
+ return atomic_read(&dev->power.usage_count) <= 1 &&
+ (atomic_read(&dev->power.child_count) == 0 ||
+ dev->power.ignore_children);
+}
+
/**
* pm_runtime_force_suspend - Force a device into suspend state if needed.
* @dev: Device to suspend.
*
* Disable runtime PM so we safely can check the device's runtime PM status and
- * if it is active, invoke it's .runtime_suspend callback to bring it into
- * suspend state. Keep runtime PM disabled to preserve the state unless we
- * encounter errors.
+ * if it is active, invoke its ->runtime_suspend callback to suspend it and
+ * change its runtime PM status field to RPM_SUSPENDED. Also, if the device's
+ * usage and children counters don't indicate that the device was in use before
+ * the system-wide transition under way, decrement its parent's children counter
+ * (if there is a parent). Keep runtime PM disabled to preserve the state
+ * unless we encounter errors.
*
* Typically this function may be invoked from a system suspend callback to make
- * sure the device is put into low power state.
+ * sure the device is put into low power state and it should only be used during
+ * system-wide PM transitions to sleep states. It assumes that the analogous
+ * pm_runtime_force_resume() will be used to resume the device.
*/
int pm_runtime_force_suspend(struct device *dev)
{
@@ -1646,17 +1658,18 @@ int pm_runtime_force_suspend(struct devi
goto err;

/*
- * Increase the runtime PM usage count for the device's parent, in case
- * when we find the device being used when system suspend was invoked.
- * This informs pm_runtime_force_resume() to resume the parent
- * immediately, which is needed to be able to resume its children,
- * when not deferring the resume to be managed via runtime PM.
+ * If the device can stay in suspend after the system-wide transition
+ * to the working state that will follow, drop the children counter of
+ * its parent, but set its status to RPM_SUSPENDED anyway in case this
+ * function will be called again for it in the meantime.
*/
- if (dev->parent && atomic_read(&dev->power.usage_count) > 1)
- pm_runtime_get_noresume(dev->parent);
+ if (pm_runtime_need_not_resume(dev))
+ pm_runtime_set_suspended(dev);
+ else
+ __update_runtime_status(dev, RPM_SUSPENDED);

- pm_runtime_set_suspended(dev);
return 0;
+
err:
pm_runtime_enable(dev);
return ret;
@@ -1669,13 +1682,9 @@ EXPORT_SYMBOL_GPL(pm_runtime_force_suspe
*
* Prior invoking this function we expect the user to have brought the device
* into low power state by a call to pm_runtime_force_suspend(). Here we reverse
- * those actions and brings the device into full power, if it is expected to be
- * used on system resume. To distinguish that, we check whether the runtime PM
- * usage count is greater than 1 (the PM core increases the usage count in the
- * system PM prepare phase), as that indicates a real user (such as a subsystem,
- * driver, userspace, etc.) is using it. If that is the case, the device is
- * expected to be used on system resume as well, so then we resume it. In the
- * other case, we defer the resume to be managed via runtime PM.
+ * those actions and bring the device into full power, if it is expected to be
+ * used on system resume. In the other case, we defer the resume to be managed
+ * via runtime PM.
*
* Typically this function may be invoked from a system resume callback.
*/
@@ -1684,32 +1693,18 @@ int pm_runtime_force_resume(struct devic
int (*callback)(struct device *);
int ret = 0;

- callback = RPM_GET_CALLBACK(dev, runtime_resume);
-
- if (!callback) {
- ret = -ENOSYS;
- goto out;
- }
-
- if (!pm_runtime_status_suspended(dev))
+ if (!pm_runtime_status_suspended(dev) || pm_runtime_need_not_resume(dev))
goto out;

/*
- * Decrease the parent's runtime PM usage count, if we increased it
- * during system suspend in pm_runtime_force_suspend().
- */
- if (atomic_read(&dev->power.usage_count) > 1) {
- if (dev->parent)
- pm_runtime_put_noidle(dev->parent);
- } else {
- goto out;
- }
+ * The value of the parent's children counter is correct already, so
+ * just update the status of the device.
+ */
+ __update_runtime_status(dev, RPM_ACTIVE);

- ret = pm_runtime_set_active(dev);
- if (ret)
- goto out;
+ callback = RPM_GET_CALLBACK(dev, runtime_resume);

- ret = callback(dev);
+ ret = callback ? callback(dev) : -ENOSYS;
if (ret) {
pm_runtime_set_suspended(dev);
goto out;

2018-01-09 16:29:50

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tuesday, January 9, 2018 5:03:18 PM CET Rafael J. Wysocki wrote:
> On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:
> > Hi Rafael,
> >
> > CC usb
> >
> > On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
> >>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
> >>>> From: Rafael J. Wysocki <[email protected]>
> >>>>
> >>>> One of the limitations of pm_runtime_force_suspend/resume() is that
> >>>> if a parent driver wants to use these functions, all of its child
> >>>> drivers have to do that too because of the parent usage counter
> >>>> manipulations necessary to get the correct state of the parent during
> >>>> system-wide transitions to the working state (system resume).
> >>>> However, that limitation turns out to be artificial, so remove it.
> >>>>
> >>>> Namely, pm_runtime_force_suspend() only needs to update the children
> >>>> counter of its parent (if there's is a parent) when the device can
> >>>> stay in suspend after the subsequent system resume transition, as
> >>>> that counter is correct already otherwise. Now, if the parent's
> >>>> children counter is not updated, it is not necessary to increment
> >>>> the parent's usage counter in that case any more, as long as the
> >>>> children counters of devices are checked along with their usage
> >>>> counters in order to decide whether or not the devices may be left
> >>>> in suspend after the subsequent system resume transition.
> >>>>
> >>>> Accordingly, modify pm_runtime_force_suspend() to only call
> >>>> pm_runtime_set_suspended() for devices whose usage and children
> >>>> counters are at the "no references" level (the runtime PM status
> >>>> of the device needs to be updated to "suspended" anyway in case
> >>>> this function is called once again for the same device during the
> >>>> transition under way), drop the parent usage counter incrementation
> >>>> from it and update pm_runtime_force_resume() to compensate for these
> >>>> changes.
> >>>>
> >>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >>>
> >>> This patch causes a regression during system resume on Renesas Salvator-XS
> >>> with R-Car H3 ES2.0:
> >>
> >> I have dropped it for now, but we need to address the underlying issue.
> >>
> >>> SError Interrupt on CPU3, code 0xbf000002 -- SError
> >>> SError Interrupt on CPU2, code 0xbf000002 -- SError
> >>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> >>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> >>> Hardware name: Renesas Salvator-X 2nd version board based on
> >>> r8a7795 ES2.0+ (DT)
> >>> Hardware name: Renesas Salvator-X 2nd version board based on
> >>> r8a7795 ES2.0+ (DT)
> >>> Workqueue: events_unbound async_run_entry_fn
> >>> Workqueue: events_unbound async_run_entry_fn
> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> >>> lr : phy_init+0x64/0xcc
> >>> lr : phy_init+0x64/0xcc
> >>> ...
> >>> Kernel panic - not syncing: Asynchronous SError Interrupt
> >>>
> >>> Note that before, it printed a warning instead:
> >>>
> >>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> >>> active children
> >>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> >>> pm_runtime_enable+0x94/0xd8
> >>>
> >>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> >>> pm_runtime_force_suspend/resume()") fixes the crash.
> >>>
> >>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> >>> deployment and fix an issue"
> >>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> >>> does not fix the crash.
> >>
> >> OK
> >>
> >>> With more debug code added, it seems the EHCI module clocks (701-703) are
> >>> enabled earlier than before. I guess this triggers the workqueue to perform
> >>> an operation while another related device (HSUSB 704?) is still disabled?
> >>
> >> Probably.
> >>
> >> Likely a device that wasn't resumed before resumes now and that causes
> >> the issue to appear.
> >>
> >> I'm wondering if adding the ignore_children check to the patch helps.
> >> If not, there clearly is a resume ordering issue that is papered over
> >> by the current code.
> >
> > Something fishy is going on. Status of the USB PHYs differ before/after
> > system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:
> >
> > - /devices/platform/soc/ee0a0200.usb-phy active
> > - /devices/platform/soc/ee0c0200.usb-phy active
> > - /devices/platform/soc/ee080200.usb-phy active
> > + /devices/platform/soc/ee0a0200.usb-phy suspended
> > + /devices/platform/soc/ee0c0200.usb-phy suspended
> > + /devices/platform/soc/ee080200.usb-phy suspended
>
> Yeah.
>
> That's because of the pm_runtime_force_suspend/resume() called by
> genpd. These functions generally may cause devices active before
> system suspend to be left in suspend after it. That generally is a
> good idea if the device was not really in use before the system
> suspend, but there is a problem that the driver of it may not be
> prepared for that to happen (and also the way to determine the "not
> really in use" case may not be as expected by the driver).

So below is something to try (untested).

I know that Ulf will be protesting, but that's what I would do unless there
are really *really* good reasons not to.


---
drivers/base/power/domain.c | 34 +++++++++++++++++++++-------------
1 file changed, 21 insertions(+), 13 deletions(-)

Index: linux-pm/drivers/base/power/domain.c
===================================================================
--- linux-pm.orig/drivers/base/power/domain.c
+++ linux-pm/drivers/base/power/domain.c
@@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
if (ret)
return ret;

- if (genpd->dev_ops.stop && genpd->dev_ops.start) {
- ret = pm_runtime_force_suspend(dev);
+ if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+ !pm_runtime_status_suspended(dev)) {
+ ret = genpd_stop_dev(genpd, dev);
if (ret)
return ret;
}
@@ -1101,8 +1102,12 @@ static int genpd_resume_noirq(struct dev
genpd->suspended_count--;
genpd_unlock(genpd);

- if (genpd->dev_ops.stop && genpd->dev_ops.start)
- ret = pm_runtime_force_resume(dev);
+ if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+ !pm_runtime_status_suspended(dev)) {
+ ret = genpd_start_dev(genpd, dev);
+ if (ret)
+ return ret;
+ }

ret = pm_generic_resume_noirq(dev);
if (ret)
@@ -1123,7 +1128,7 @@ static int genpd_resume_noirq(struct dev
static int genpd_freeze_noirq(struct device *dev)
{
const struct generic_pm_domain *genpd;
- int ret = 0;
+ int ret;

dev_dbg(dev, "%s()\n", __func__);

@@ -1135,8 +1140,9 @@ static int genpd_freeze_noirq(struct dev
if (ret)
return ret;

- if (genpd->dev_ops.stop && genpd->dev_ops.start)
- ret = pm_runtime_force_suspend(dev);
+ if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+ !pm_runtime_status_suspended(dev))
+ ret = genpd_stop_dev(genpd, dev);

return ret;
}
@@ -1151,7 +1157,7 @@ static int genpd_freeze_noirq(struct dev
static int genpd_thaw_noirq(struct device *dev)
{
const struct generic_pm_domain *genpd;
- int ret = 0;
+ int ret;

dev_dbg(dev, "%s()\n", __func__);

@@ -1159,8 +1165,9 @@ static int genpd_thaw_noirq(struct devic
if (IS_ERR(genpd))
return -EINVAL;

- if (genpd->dev_ops.stop && genpd->dev_ops.start) {
- ret = pm_runtime_force_resume(dev);
+ if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+ !pm_runtime_status_suspended(dev)) {
+ ret = genpd_start_dev(genpd, dev);
if (ret)
return ret;
}
@@ -1193,7 +1200,7 @@ static int genpd_poweroff_noirq(struct d
static int genpd_restore_noirq(struct device *dev)
{
struct generic_pm_domain *genpd;
- int ret = 0;
+ int ret;

dev_dbg(dev, "%s()\n", __func__);

@@ -1217,8 +1224,9 @@ static int genpd_restore_noirq(struct de
genpd_sync_power_on(genpd, true, 0);
genpd_unlock(genpd);

- if (genpd->dev_ops.stop && genpd->dev_ops.start) {
- ret = pm_runtime_force_resume(dev);
+ if (genpd->dev_ops.stop && genpd->dev_ops.start &&
+ !pm_runtime_status_suspended(dev)) {
+ ret = genpd_start_dev(genpd, dev);
if (ret)
return ret;
}

2018-01-09 18:46:33

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
> Hi Rafael,
>
> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> From: Rafael J. Wysocki <[email protected]>
>>
>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> if a parent driver wants to use these functions, all of its child
>> drivers have to do that too because of the parent usage counter
>> manipulations necessary to get the correct state of the parent during
>> system-wide transitions to the working state (system resume).
>> However, that limitation turns out to be artificial, so remove it.
>>
>> Namely, pm_runtime_force_suspend() only needs to update the children
>> counter of its parent (if there's is a parent) when the device can
>> stay in suspend after the subsequent system resume transition, as
>> that counter is correct already otherwise. Now, if the parent's
>> children counter is not updated, it is not necessary to increment
>> the parent's usage counter in that case any more, as long as the
>> children counters of devices are checked along with their usage
>> counters in order to decide whether or not the devices may be left
>> in suspend after the subsequent system resume transition.
>>
>> Accordingly, modify pm_runtime_force_suspend() to only call
>> pm_runtime_set_suspended() for devices whose usage and children
>> counters are at the "no references" level (the runtime PM status
>> of the device needs to be updated to "suspended" anyway in case
>> this function is called once again for the same device during the
>> transition under way), drop the parent usage counter incrementation
>> from it and update pm_runtime_force_resume() to compensate for these
>> changes.
>>
>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>
> This patch causes a regression during system resume on Renesas Salvator-XS
> with R-Car H3 ES2.0:
>
> SError Interrupt on CPU3, code 0xbf000002 -- SError
> SError Interrupt on CPU2, code 0xbf000002 -- SError
> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Hardware name: Renesas Salvator-X 2nd version board based on
> r8a7795 ES2.0+ (DT)
> Workqueue: events_unbound async_run_entry_fn
> Workqueue: events_unbound async_run_entry_fn
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pstate: 60000005 (nZCv daif -PAN -UAO)
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> lr : phy_init+0x64/0xcc
> lr : phy_init+0x64/0xcc
> ...
> Kernel panic - not syncing: Asynchronous SError Interrupt
>
> Note that before, it printed a warning instead:
>
> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> active children
> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> pm_runtime_enable+0x94/0xd8
>
> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> pm_runtime_force_suspend/resume()") fixes the crash.

It looks like what happens without the Ulf's patch is as follows.

usb-phy has children with runtime PM enabled that are not in the
domain, so without the $subject patch the pm_runtime_force_suspend()
in genpd_finish_suspend() checks the usage counter of usb-phy and
since it is 1, the parent's usage counter is not incremented and
genpd_runtime_suspend() is run for usb-phy. On resume, the
pm_runtime_force_resume() in genpd_resume_noirq() finds that the usage
counter of usb-phy is 1, so the parent's usage counter is not
decremented (correctly) and the function returns (arguably incorrectly
if the runtime PM status of the children is "active", because it is
necessary to resume the children in that case, but the children have
no PM callbacks and even if they had had them, they would have been
run later anyway). The parent of usb-phy is skipped by the
pm_runtime_force_resume() too, because its usage counter is 1 when it
is checked by this function.

With the $subject patch pm_runtime_force_suspend() in
genpd_finish_suspend() calls pm_runtime_need_not_resume() and that
returns "false" for usb-phy if the runtime PM status or at least one
of its children is "active" (which it is for the "phy" devices). That
is correct, but for this reason the parent's children counter is not
decremented and both usb-phy and its parent will be resumed by the
subsequent pm_runtime_force_resume().

The pm_runtime_force_resume() in genpd_resume(_noirq) now finds that
pm_runtime_need_not_resume() returns "false" for both usb-phy and its
parent and attempts to resume them both via genpd_runtime_resume()
which is too early, because stuff they depend on hasn't been resumed
yet. That triggers the crash (so
https://patchwork.kernel.org/patch/10152767/ will cause the crash to
happen too).

In conclusion, without the $subject patch it all works pretty much by
accident, basically because the pm_runtime_force_resume()
inadvertently decides to skip the resume of some devices which avoids
the premature execution of genpd_runtime_resume() for them.

> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> deployment and fix an issue"
> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> does not fix the crash.

I'm not sure why the crash is still there in this case, but quite
likely it is related (again) to something with a child outside of the
domain that causes its parent to be treated as a device that must be
resumed and that triggers a premature execution of
genpd_runtime_resume() in genpd_resume_noirq() for the parent (because
something the parents depends on quite likely is outside of the domain
and will be resumed later).

Granted, the (indirect) invocation of genpd_runtime_suspend() in
genpd_finish_suspend() is too late for drivers that are not expecting
it to be called then (but also harmless if the drivers check the
status of the devices) and the (indirect) invocation of
genpd_runtime_resume() from genpd_resume_noirq() is too early,
especially if there are dependencies on devices outside the domains
handled by genpd.

So something like https://patchwork.kernel.org/patch/10152873/ is needed IMO.

Thanks,
Rafael

2018-01-09 19:02:22

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 9, 2018 at 7:46 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>> Hi Rafael,
>>
>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>

[cut]

>>>
>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>>
>> This patch causes a regression during system resume on Renesas Salvator-XS
>> with R-Car H3 ES2.0:
>>
>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Hardware name: Renesas Salvator-X 2nd version board based on
>> r8a7795 ES2.0+ (DT)
>> Workqueue: events_unbound async_run_entry_fn
>> Workqueue: events_unbound async_run_entry_fn
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> lr : phy_init+0x64/0xcc
>> lr : phy_init+0x64/0xcc
>> ...
>> Kernel panic - not syncing: Asynchronous SError Interrupt
>>
>> Note that before, it printed a warning instead:
>>
>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> active children
>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> pm_runtime_enable+0x94/0xd8
>>
>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> pm_runtime_force_suspend/resume()") fixes the crash.
>
> It looks like what happens without the Ulf's patch is as follows.
>
> usb-phy has children with runtime PM enabled that are not in the
> domain, so without the $subject patch the pm_runtime_force_suspend()
> in genpd_finish_suspend() checks the usage counter of usb-phy and
> since it is 1, the parent's usage counter is not incremented and
> genpd_runtime_suspend() is run for usb-phy. On resume, the
> pm_runtime_force_resume() in genpd_resume_noirq() finds that the usage
> counter of usb-phy is 1, so the parent's usage counter is not
> decremented (correctly) and the function returns (arguably incorrectly
> if the runtime PM status of the children is "active", because it is
> necessary to resume the children in that case, but the children have
> no PM callbacks and even if they had had them, they would have been
> run later anyway). The parent of usb-phy is skipped by the
> pm_runtime_force_resume() too, because its usage counter is 1 when it
> is checked by this function.

And note that it shouldn't be skipped by pm_runtime_force_resume() in
principle, because there are active children under usb-phy.

> With the $subject patch pm_runtime_force_suspend() in
> genpd_finish_suspend() calls pm_runtime_need_not_resume() and that
> returns "false" for usb-phy if the runtime PM status or at least one
> of its children is "active" (which it is for the "phy" devices). That
> is correct, but for this reason the parent's children counter is not
> decremented and both usb-phy and its parent will be resumed by the
> subsequent pm_runtime_force_resume().
>
> The pm_runtime_force_resume() in genpd_resume(_noirq) now finds that
> pm_runtime_need_not_resume() returns "false" for both usb-phy and its
> parent and attempts to resume them both via genpd_runtime_resume()
> which is too early, because stuff they depend on hasn't been resumed
> yet. That triggers the crash (so
> https://patchwork.kernel.org/patch/10152767/ will cause the crash to
> happen too).
>
> In conclusion, without the $subject patch it all works pretty much by
> accident, basically because the pm_runtime_force_resume()
> inadvertently decides to skip the resume of some devices which avoids
> the premature execution of genpd_runtime_resume() for them.
>
>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> deployment and fix an issue"
>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> does not fix the crash.
>
> I'm not sure why the crash is still there in this case,

It is there, because usb-phy itself is now reference-counted by the
phy layer, so its usage counter is greater than 1 in
genpd_finish_suspend() and its parent's children counter is not
decremented then.

Next, on resume, the pm_runtime_force_resume() in genpd_resume_noirq()
will attempt to resume both usb-phy and its parent via
genpd_runtime_suspend() and that (again) is too early.

I'm not sure, however, why the crash isn't there with
https://www.spinics.net/lists/linux-renesas-soc/msg21719.html alone,
because in theory it should be there too in that case.

Thanks,
Rafael

2018-01-10 09:26:41

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On 9 January 2018 at 17:28, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, January 9, 2018 5:03:18 PM CET Rafael J. Wysocki wrote:
>> On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:
>> > Hi Rafael,
>> >
>> > CC usb
>> >
>> > On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>> >>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>>> From: Rafael J. Wysocki <[email protected]>
>> >>>>
>> >>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> >>>> if a parent driver wants to use these functions, all of its child
>> >>>> drivers have to do that too because of the parent usage counter
>> >>>> manipulations necessary to get the correct state of the parent during
>> >>>> system-wide transitions to the working state (system resume).
>> >>>> However, that limitation turns out to be artificial, so remove it.
>> >>>>
>> >>>> Namely, pm_runtime_force_suspend() only needs to update the children
>> >>>> counter of its parent (if there's is a parent) when the device can
>> >>>> stay in suspend after the subsequent system resume transition, as
>> >>>> that counter is correct already otherwise. Now, if the parent's
>> >>>> children counter is not updated, it is not necessary to increment
>> >>>> the parent's usage counter in that case any more, as long as the
>> >>>> children counters of devices are checked along with their usage
>> >>>> counters in order to decide whether or not the devices may be left
>> >>>> in suspend after the subsequent system resume transition.
>> >>>>
>> >>>> Accordingly, modify pm_runtime_force_suspend() to only call
>> >>>> pm_runtime_set_suspended() for devices whose usage and children
>> >>>> counters are at the "no references" level (the runtime PM status
>> >>>> of the device needs to be updated to "suspended" anyway in case
>> >>>> this function is called once again for the same device during the
>> >>>> transition under way), drop the parent usage counter incrementation
>> >>>> from it and update pm_runtime_force_resume() to compensate for these
>> >>>> changes.
>> >>>>
>> >>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>> >>>
>> >>> This patch causes a regression during system resume on Renesas Salvator-XS
>> >>> with R-Car H3 ES2.0:
>> >>
>> >> I have dropped it for now, but we need to address the underlying issue.
>> >>
>> >>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>> >>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>> >>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> >>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> >>> Hardware name: Renesas Salvator-X 2nd version board based on
>> >>> r8a7795 ES2.0+ (DT)
>> >>> Hardware name: Renesas Salvator-X 2nd version board based on
>> >>> r8a7795 ES2.0+ (DT)
>> >>> Workqueue: events_unbound async_run_entry_fn
>> >>> Workqueue: events_unbound async_run_entry_fn
>> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> >>> lr : phy_init+0x64/0xcc
>> >>> lr : phy_init+0x64/0xcc
>> >>> ...
>> >>> Kernel panic - not syncing: Asynchronous SError Interrupt
>> >>>
>> >>> Note that before, it printed a warning instead:
>> >>>
>> >>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> >>> active children
>> >>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> >>> pm_runtime_enable+0x94/0xd8
>> >>>
>> >>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> >>> pm_runtime_force_suspend/resume()") fixes the crash.
>> >>>
>> >>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> >>> deployment and fix an issue"
>> >>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> >>> does not fix the crash.
>> >>
>> >> OK
>> >>
>> >>> With more debug code added, it seems the EHCI module clocks (701-703) are
>> >>> enabled earlier than before. I guess this triggers the workqueue to perform
>> >>> an operation while another related device (HSUSB 704?) is still disabled?
>> >>
>> >> Probably.
>> >>
>> >> Likely a device that wasn't resumed before resumes now and that causes
>> >> the issue to appear.
>> >>
>> >> I'm wondering if adding the ignore_children check to the patch helps.
>> >> If not, there clearly is a resume ordering issue that is papered over
>> >> by the current code.
>> >
>> > Something fishy is going on. Status of the USB PHYs differ before/after
>> > system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:
>> >
>> > - /devices/platform/soc/ee0a0200.usb-phy active
>> > - /devices/platform/soc/ee0c0200.usb-phy active
>> > - /devices/platform/soc/ee080200.usb-phy active
>> > + /devices/platform/soc/ee0a0200.usb-phy suspended
>> > + /devices/platform/soc/ee0c0200.usb-phy suspended
>> > + /devices/platform/soc/ee080200.usb-phy suspended
>>
>> Yeah.
>>
>> That's because of the pm_runtime_force_suspend/resume() called by
>> genpd. These functions generally may cause devices active before
>> system suspend to be left in suspend after it. That generally is a
>> good idea if the device was not really in use before the system
>> suspend, but there is a problem that the driver of it may not be
>> prepared for that to happen (and also the way to determine the "not
>> really in use" case may not be as expected by the driver).
>
> So below is something to try (untested).
>
> I know that Ulf will be protesting, but that's what I would do unless there
> are really *really* good reasons not to.

I am not protesting! :-)

Instead I think we are rather in agreement that we are concerned about
that genpd are invoking pm_runtime_force_suspend|resume().

>
>
> ---
> drivers/base/power/domain.c | 34 +++++++++++++++++++++-------------
> 1 file changed, 21 insertions(+), 13 deletions(-)
>
> Index: linux-pm/drivers/base/power/domain.c
> ===================================================================
> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
> if (ret)
> return ret;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_suspend(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_stop_dev(genpd, dev);

Something like this existed there before, but because of other
internal genpd reasons. It's an option but there are issues with it.

I think it's fragile because invoking the ->stop() callback may
trigger calls to "disable" resources (like clocks, pinctrls, etc) for
the device. Doing this at ->suspend_noirq() may be too late, as that
subsystem/driver for the resource(s) may already be system suspended.
This is the classic problem of suspending (and later resuming) devices
in in-correct order.

Today we deal with this, by drivers/subsystem assigning the right
level of callback, as well as making sure the "dpm_list" is sorted
correctly (yeah, we have device links as well).

The point is, we can go for this solution, but is it good enough?

Another option, is to simply to remove (and not replace with something
else) the calls to pm_runtime_force_ suspend|resume() from genpd. This
moves the entire responsibility to the driver, to put its device into
low power state during system suspend, but with the benefit of that it
can decide to use the correct level of suspend/resume callbacks.

No matter how we do it, changing this may introduce some potential
regressions from a power consumption point of view, however although
only for those SoCs that uses the ->start|stop() callbacks. To
mitigate these power regressions, some drivers needs to be fixed, but
that seems inevitable anyway, doesn't it?

[...]

Kind regards
Uffe

2018-01-11 00:46:52

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Wed, Jan 10, 2018 at 10:26 AM, Ulf Hansson <[email protected]> wrote:
> On 9 January 2018 at 17:28, Rafael J. Wysocki <[email protected]> wrote:
>> On Tuesday, January 9, 2018 5:03:18 PM CET Rafael J. Wysocki wrote:
>>> On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:

[cut]

>>> That's because of the pm_runtime_force_suspend/resume() called by
>>> genpd. These functions generally may cause devices active before
>>> system suspend to be left in suspend after it. That generally is a
>>> good idea if the device was not really in use before the system
>>> suspend, but there is a problem that the driver of it may not be
>>> prepared for that to happen (and also the way to determine the "not
>>> really in use" case may not be as expected by the driver).
>>
>> So below is something to try (untested).
>>
>> I know that Ulf will be protesting, but that's what I would do unless there
>> are really *really* good reasons not to.
>
> I am not protesting! :-)

OK

That makes things a lot easier in principle. :-)

> Instead I think we are rather in agreement that we are concerned about
> that genpd are invoking pm_runtime_force_suspend|resume().
>
>>
>>
>> ---
>> drivers/base/power/domain.c | 34 +++++++++++++++++++++-------------
>> 1 file changed, 21 insertions(+), 13 deletions(-)
>>
>> Index: linux-pm/drivers/base/power/domain.c
>> ===================================================================
>> --- linux-pm.orig/drivers/base/power/domain.c
>> +++ linux-pm/drivers/base/power/domain.c
>> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
>> if (ret)
>> return ret;
>>
>> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
>> - ret = pm_runtime_force_suspend(dev);
>> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
>> + !pm_runtime_status_suspended(dev)) {
>> + ret = genpd_stop_dev(genpd, dev);
>
> Something like this existed there before, but because of other
> internal genpd reasons. It's an option but there are issues with it.

OK

> I think it's fragile because invoking the ->stop() callback may
> trigger calls to "disable" resources (like clocks, pinctrls, etc) for
> the device. Doing this at ->suspend_noirq() may be too late, as that
> subsystem/driver for the resource(s) may already be system suspended.
> This is the classic problem of suspending (and later resuming) devices
> in in-correct order.

Well, this is what happens with the current code too.

I mean if unmodified genpd_finish_suspend() runs, it will call
pm_runtime_force_suspend() and that will check the device's runtime PM
status in the first place. If that is "suspended", it will return
right away without doing anything (after disabling runtime PM, but
that is irrelevant here as runtime PM is already disabled). In turn,
if the runtime PM status is "active", it will run
genpd_runtime_suspend() (as the callback) and that will run the
driver's runtime PM callback and the genpd_stop_dev() right after it.
Thus, if calling genpd_stop_dev() at that point is problematic, it is
problematic already today.

What the patch does is essentially to drop the driver's runtime PM
callback (should not be necessary) and the domain power off (certainly
not necessary) from that path, but to keep the genpd_stop_dev()
invocation that may be necessary in principle (the device is "active",
so genpd_runtime_suspend() has not run for it, so genpd_stop_dev() has
not been called for it yet, but it may be good to stop the clocks
before powering off the domain).

The resume part is basically running genpd_start_dev() for the devices
for which genpd_stop_dev() was run by genpd_finish_suspend(), which is
because the subsequent driver callbacks may expect the clocks to be
enabled.

> Today we deal with this, by drivers/subsystem assigning the right
> level of callback, as well as making sure the "dpm_list" is sorted
> correctly (yeah, we have device links as well).
>
> The point is, we can go for this solution, but is it good enough?

I would like to treat it as a step away from what is there today,
retaining some of the existing functionality.

>From a quick look at the existing users of genpd that use the device
start/stop functionality, running genpd_stop_dev()/genpd_start_dev()
in the "noirq" phases should not be problematic for them at least.

> Another option, is to simply to remove (and not replace with something
> else) the calls to pm_runtime_force_ suspend|resume() from genpd.

Right.

> This moves the entire responsibility to the driver, to put its device into
> low power state during system suspend, but with the benefit of that it
> can decide to use the correct level of suspend/resume callbacks.

Drivers of the devices in the "stop/start" domains will have to use
pm_runtime_force_ suspend|resume() internally then to benefit from the
domain's management of clocks, but that just may be the only way to
avoid more problems.

> No matter how we do it, changing this may introduce some potential
> regressions from a power consumption point of view, however although
> only for those SoCs that uses the ->start|stop() callbacks. To
> mitigate these power regressions, some drivers needs to be fixed, but
> that seems inevitable anyway, doesn't it?

Yes, it does.

I would say let's try to go with this patch of mine as submitted and
see how far we can get with that.

That essentially doesn't prevent us from dropping the
genpd_stop_dev()/genpd_start_dev() from the "noirq" stages if need be
after all, but dropping them right away may cause the fallout to be
more dramatic, at least in theory.

Thanks,
Rafael

2018-01-11 12:32:20

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

[...]

>>> Index: linux-pm/drivers/base/power/domain.c
>>> ===================================================================
>>> --- linux-pm.orig/drivers/base/power/domain.c
>>> +++ linux-pm/drivers/base/power/domain.c
>>> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
>>> if (ret)
>>> return ret;
>>>
>>> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
>>> - ret = pm_runtime_force_suspend(dev);
>>> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
>>> + !pm_runtime_status_suspended(dev)) {
>>> + ret = genpd_stop_dev(genpd, dev);
>>
>> Something like this existed there before, but because of other
>> internal genpd reasons. It's an option but there are issues with it.
>
> OK
>
>> I think it's fragile because invoking the ->stop() callback may
>> trigger calls to "disable" resources (like clocks, pinctrls, etc) for
>> the device. Doing this at ->suspend_noirq() may be too late, as that
>> subsystem/driver for the resource(s) may already be system suspended.
>> This is the classic problem of suspending (and later resuming) devices
>> in in-correct order.
>
> Well, this is what happens with the current code too.

Correct.

>
> I mean if unmodified genpd_finish_suspend() runs, it will call
> pm_runtime_force_suspend() and that will check the device's runtime PM
> status in the first place. If that is "suspended", it will return
> right away without doing anything (after disabling runtime PM, but
> that is irrelevant here as runtime PM is already disabled). In turn,
> if the runtime PM status is "active", it will run
> genpd_runtime_suspend() (as the callback) and that will run the
> driver's runtime PM callback and the genpd_stop_dev() right after it.
> Thus, if calling genpd_stop_dev() at that point is problematic, it is
> problematic already today.
>
> What the patch does is essentially to drop the driver's runtime PM
> callback (should not be necessary) and the domain power off (certainly
> not necessary) from that path, but to keep the genpd_stop_dev()
> invocation that may be necessary in principle (the device is "active",
> so genpd_runtime_suspend() has not run for it, so genpd_stop_dev() has
> not been called for it yet, but it may be good to stop the clocks
> before powering off the domain).

I understand your suggested approach and it may work.

>
> The resume part is basically running genpd_start_dev() for the devices
> for which genpd_stop_dev() was run by genpd_finish_suspend(), which is
> because the subsequent driver callbacks may expect the clocks to be
> enabled.
>
>> Today we deal with this, by drivers/subsystem assigning the right
>> level of callback, as well as making sure the "dpm_list" is sorted
>> correctly (yeah, we have device links as well).
>>
>> The point is, we can go for this solution, but is it good enough?
>
> I would like to treat it as a step away from what is there today,
> retaining some of the existing functionality.
>
> From a quick look at the existing users of genpd that use the device
> start/stop functionality, running genpd_stop_dev()/genpd_start_dev()
> in the "noirq" phases should not be problematic for them at least.

I guess so.

Still, the error report by Geert worries me, but I don't think it
worth to speculate, but rather test and see.

>
>> Another option, is to simply to remove (and not replace with something
>> else) the calls to pm_runtime_force_ suspend|resume() from genpd.
>
> Right.
>
>> This moves the entire responsibility to the driver, to put its device into
>> low power state during system suspend, but with the benefit of that it
>> can decide to use the correct level of suspend/resume callbacks.
>
> Drivers of the devices in the "stop/start" domains will have to use
> pm_runtime_force_ suspend|resume() internally then to benefit from the
> domain's management of clocks, but that just may be the only way to
> avoid more problems.

Agree.

>
>> No matter how we do it, changing this may introduce some potential
>> regressions from a power consumption point of view, however although
>> only for those SoCs that uses the ->start|stop() callbacks. To
>> mitigate these power regressions, some drivers needs to be fixed, but
>> that seems inevitable anyway, doesn't it?
>
> Yes, it does.
>
> I would say let's try to go with this patch of mine as submitted and
> see how far we can get with that.
>
> That essentially doesn't prevent us from dropping the
> genpd_stop_dev()/genpd_start_dev() from the "noirq" stages if need be
> after all, but dropping them right away may cause the fallout to be
> more dramatic, at least in theory.
>

Well, my guesses is that would be a step to make the behavior a bit
more deterministic and perhaps less fragile than today.

On the other hand, I am also concerned about the future users of the
->stop|start() callbacks, including related drivers dealing with the
affected devices. That in a sense, that converting the code in genpd
to what you suggest, would still not encourage related drivers to
behave correctly during system suspend/resume. Then down the road,
when additional new users of the ->stop|start() callbacks may have
been added, it may be even harder to drop calling them in from the
noirq phases.

So to conclude, I would prefer to drop calling
pm_runtime_force_suspend|resume() from genpd, without a replacement,
but I am willing to accept also your suggested alternative.

Kind regards
Uffe

2018-01-11 18:44:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Thu, Jan 11, 2018 at 1:32 PM, Ulf Hansson <[email protected]> wrote:

[cut]

>>>
>>> The point is, we can go for this solution, but is it good enough?
>>
>> I would like to treat it as a step away from what is there today,
>> retaining some of the existing functionality.
>>
>> From a quick look at the existing users of genpd that use the device
>> start/stop functionality, running genpd_stop_dev()/genpd_start_dev()
>> in the "noirq" phases should not be problematic for them at least.
>
> I guess so.
>
> Still, the error report by Geert worries me, but I don't think it
> worth to speculate, but rather test and see.

Agreed.

>>> Another option, is to simply to remove (and not replace with something
>>> else) the calls to pm_runtime_force_ suspend|resume() from genpd.
>>
>> Right.
>>
>>> This moves the entire responsibility to the driver, to put its device into
>>> low power state during system suspend, but with the benefit of that it
>>> can decide to use the correct level of suspend/resume callbacks.
>>
>> Drivers of the devices in the "stop/start" domains will have to use
>> pm_runtime_force_ suspend|resume() internally then to benefit from the
>> domain's management of clocks, but that just may be the only way to
>> avoid more problems.
>
> Agree.

But this doesn't look particularly attractive to me, because it would
add a requirement for drivers to implement their system-wide PM
callbacks in a specific way and, in my view, no such requirements
should be imposed by code that advertises itself as "generic" (by the
name at least). In particular, drivers written to this requirement
might not be able to work correctly with other middle-layer code.

If genpd was a bus type, that wouldn't be particularly objectionable,
even though it is not particularly straightforward either, because
drivers register for a particular bus type and so they need to take
its specifics into account. Drivers usually don't register for a
specific PM domain, however, and at least some of them may need to
work with different PM domain code on different platforms.

>>> No matter how we do it, changing this may introduce some potential
>>> regressions from a power consumption point of view, however although
>>> only for those SoCs that uses the ->start|stop() callbacks. To
>>> mitigate these power regressions, some drivers needs to be fixed, but
>>> that seems inevitable anyway, doesn't it?
>>
>> Yes, it does.
>>
>> I would say let's try to go with this patch of mine as submitted and
>> see how far we can get with that.
>>
>> That essentially doesn't prevent us from dropping the
>> genpd_stop_dev()/genpd_start_dev() from the "noirq" stages if need be
>> after all, but dropping them right away may cause the fallout to be
>> more dramatic, at least in theory.
>>
>
> Well, my guesses is that would be a step to make the behavior a bit
> more deterministic and perhaps less fragile than today.
>
> On the other hand, I am also concerned about the future users of the
> ->stop|start() callbacks, including related drivers dealing with the
> affected devices. That in a sense, that converting the code in genpd
> to what you suggest, would still not encourage related drivers to
> behave correctly during system suspend/resume.

I'm not sure what you mean here.

Drivers in principle should not need to worry about the ->stop|start()
part, because that belongs to a different code layer.

They only need to be able to expect certain things (like clocks) to be
there (or in a specific state) when their code is running in certain
cases.

Now, I realize that this is pure theory and that in practice, if
->stop|start() are there, drivers working with genpd are probably
platform-specific anyway, more or less, so it is not likely that they
will ever have to work with any other middle-layer code.

>Then down the road, when additional new users of the ->stop|start() callbacks
> may have been added, it may be even harder to drop calling them in from the
> noirq phases.
>
> So to conclude, I would prefer to drop calling
> pm_runtime_force_suspend|resume() from genpd, without a replacement,
> but I am willing to accept also your suggested alternative.

OK

Thanks,
Rafael

2018-01-12 01:55:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Tue, Jan 9, 2018 at 7:08 AM, Ulf Hansson <[email protected]> wrote:
> On 3 January 2018 at 12:06, Rafael J. Wysocki <[email protected]> wrote:
>> From: Rafael J. Wysocki <[email protected]>
>>
>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> if a parent driver wants to use these functions, all of its child
>> drivers have to do that too because of the parent usage counter
>> manipulations necessary to get the correct state of the parent during
>> system-wide transitions to the working state (system resume).
>
> I understand your point, however this isn't describing the full story,
> because there are a some more alternatives when
> pm_runtime_force_suspend|resume() works well, when used for parent
> devices. Let me clarify, just to make sure we get this correct.
>
> 1) If the child device isn't managed by runtime PM at all (it always
> has runtime PM disabled), pm_runtime_force_suspend|resume() can be
> used for the parent device. This works because the runtime PM status
> if the child remains in the default status, RPM_SUSPENDED.

Say the child is resumed every time during system-wide resume (which
is the only possibility for drivers that don't support runtime PM) and
it needs the parent to be accessible for that.

If the parent uses pm_runtime_force_suspend|resume(), the child needs
to bump up the usage counter of the parent during system-wide suspend
to prevent pm_runtime_force_resume() from leaving the parent in
suspend during the subsequent system-wide resume and that obviously is
not going to happen if the child driver doesn't support runtime PM.

Note that the $subject patch doesn't really change anything in that
respect, because the children counter of the parent will be zero (or
at least will not cover the child in question) then.

Of course, this means that it is only safe to use
pm_runtime_force_suspend|resume() in parent drivers if all of their
child drivers support runtime PM.

> 2) If, somehow, during system suspend/resume, the driver for the child
> make sure to synchronize the runtime PM status of the child device,
> according to the state of the HW, this should work too.

Not really.

Again, problematic is the case when the child is resumed during
system-wide resume and needs the parent to be accessible for that.

Note that the runtime PM status of the child doesn't matter for
pm_runtime_force_resume() running for the parent. What matters in
there is the parent's usage counter alone. If that usage counter is
not greater than 1, pm_runtime_force_resume() will leave the parent in
suspend and note that it must run before the child's system-wide
resume callbacks for the ordering between the parent resume and the
child resume to be correct. This means that the parent will still be
suspended when the child's system-wide resume callbacks run and that
is a problem if the parent has to be accessible for them to succeed.

Thus the runtime PM status of the child has no bearing on whether or
not this is going to work. The child has to bump up the parent's
usage counter during system-wide suspend to ensure that it will not be
left in suspend by the subsequent pm_runtime_force_resume() and that
practically requires the child driver to also use
pm_runtime_force_suspend|resume().

Now, the $subject patch changes the situation here for child drivers
that only resume their devices during system-wide resume if they were
"active" before the preceding system-wide suspend, because in that
case the children counter of the parent will be nonzero. If the child
resumes during system-wide resume even though it was suspended before
the preceding system-wide suspend, pm_runtime_force_resume() running
for its parent won't realize that the parent needs to resume too even
after this patch.

I generally think that leaving devices in suspend during system-wide
resume is a very aggressive optimization and it should be done with
extra care. It generally is not sufficient to rely on information
coming from the runtime PM framework to do that safely in all cases.
IMO it should only be done for drivers that explicitly expect it to be
done and only if all of the drivers depending on them also expect it
to be done. Otherwise it always will be potentially unsafe.

Thanks,
Rafael

2018-01-12 11:20:33

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

Hi Rafael,

On Tue, Jan 9, 2018 at 5:25 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, January 9, 2018 4:00:35 PM CET Rafael J. Wysocki wrote:
>> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>> > On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> From: Rafael J. Wysocki <[email protected]>
>> >>
>> >> One of the limitations of pm_runtime_force_suspend/resume() is that
>> >> if a parent driver wants to use these functions, all of its child
>> >> drivers have to do that too because of the parent usage counter
>> >> manipulations necessary to get the correct state of the parent during
>> >> system-wide transitions to the working state (system resume).
>> >> However, that limitation turns out to be artificial, so remove it.
>> >>
>> >> Namely, pm_runtime_force_suspend() only needs to update the children
>> >> counter of its parent (if there's is a parent) when the device can
>> >> stay in suspend after the subsequent system resume transition, as
>> >> that counter is correct already otherwise. Now, if the parent's
>> >> children counter is not updated, it is not necessary to increment
>> >> the parent's usage counter in that case any more, as long as the
>> >> children counters of devices are checked along with their usage
>> >> counters in order to decide whether or not the devices may be left
>> >> in suspend after the subsequent system resume transition.
>> >>
>> >> Accordingly, modify pm_runtime_force_suspend() to only call
>> >> pm_runtime_set_suspended() for devices whose usage and children
>> >> counters are at the "no references" level (the runtime PM status
>> >> of the device needs to be updated to "suspended" anyway in case
>> >> this function is called once again for the same device during the
>> >> transition under way), drop the parent usage counter incrementation
>> >> from it and update pm_runtime_force_resume() to compensate for these
>> >> changes.
>> >>
>> >> Signed-off-by: Rafael J. Wysocki <[email protected]>
>> >
>> > This patch causes a regression during system resume on Renesas Salvator-XS
>> > with R-Car H3 ES2.0:
>>
>> I have dropped it for now, but we need to address the underlying issue.
>>
>> > SError Interrupt on CPU3, code 0xbf000002 -- SError
>> > SError Interrupt on CPU2, code 0xbf000002 -- SError
>> > CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> > 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> > CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> > 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> > Hardware name: Renesas Salvator-X 2nd version board based on
>> > r8a7795 ES2.0+ (DT)
>> > Hardware name: Renesas Salvator-X 2nd version board based on
>> > r8a7795 ES2.0+ (DT)
>> > Workqueue: events_unbound async_run_entry_fn
>> > Workqueue: events_unbound async_run_entry_fn
>> > pstate: 60000005 (nZCv daif -PAN -UAO)
>> > pstate: 60000005 (nZCv daif -PAN -UAO)
>> > pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> > pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> > lr : phy_init+0x64/0xcc
>> > lr : phy_init+0x64/0xcc
>> > ...
>> > Kernel panic - not syncing: Asynchronous SError Interrupt
>> >
>> > Note that before, it printed a warning instead:
>> >
>> > Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> > active children
>> > WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> > pm_runtime_enable+0x94/0xd8
>> >
>> > Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> > pm_runtime_force_suspend/resume()") fixes the crash.
>> >
>> > Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> > deployment and fix an issue"
>> > (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> > does not fix the crash.
>>
>> OK
>>
>> > With more debug code added, it seems the EHCI module clocks (701-703) are
>> > enabled earlier than before. I guess this triggers the workqueue to perform
>> > an operation while another related device (HSUSB 704?) is still disabled?
>>
>> Probably.
>>
>> Likely a device that wasn't resumed before resumes now and that causes
>> the issue to appear.
>>
>> I'm wondering if adding the ignore_children check to the patch helps.
>> If not, there clearly is a resume ordering issue that is papered over
>> by the current code.
>
> Below is an alternative version of the patch that caused the problem
> to happen which checks the ignore_children flag of the device. It may
> avoid the issue by accident, but then I'd rather make it not happen. :-)

So the delta with the previous version is:

--- a/drivers/base/power/runtime.c
+++ b/drivers/base/power/runtime.c
@@ -1616,7 +1616,8 @@ void pm_runtime_drop_link(struct device *dev)
static bool pm_runtime_need_not_resume(struct device *dev)
{
return atomic_read(&dev->power.usage_count) <= 1 &&
- atomic_read(&dev->power.child_count) == 0;
+ (atomic_read(&dev->power.child_count) == 0 ||
+ dev->power.ignore_children);
}

/**

> Please try this patch anyway, though, and let me know if the crash is still
> there.

Unfortunately it still fails the same way.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-12 11:27:00

by Geert Uytterhoeven

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

Hi Rafael,

On Tue, Jan 9, 2018 at 5:28 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, January 9, 2018 5:03:18 PM CET Rafael J. Wysocki wrote:
>> On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:
>> > On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
>> >>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
>> >>>> From: Rafael J. Wysocki <[email protected]>
>> >>>>
>> >>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>> >>>> if a parent driver wants to use these functions, all of its child
>> >>>> drivers have to do that too because of the parent usage counter
>> >>>> manipulations necessary to get the correct state of the parent during
>> >>>> system-wide transitions to the working state (system resume).
>> >>>> However, that limitation turns out to be artificial, so remove it.
>> >>>>
>> >>>> Namely, pm_runtime_force_suspend() only needs to update the children
>> >>>> counter of its parent (if there's is a parent) when the device can
>> >>>> stay in suspend after the subsequent system resume transition, as
>> >>>> that counter is correct already otherwise. Now, if the parent's
>> >>>> children counter is not updated, it is not necessary to increment
>> >>>> the parent's usage counter in that case any more, as long as the
>> >>>> children counters of devices are checked along with their usage
>> >>>> counters in order to decide whether or not the devices may be left
>> >>>> in suspend after the subsequent system resume transition.
>> >>>>
>> >>>> Accordingly, modify pm_runtime_force_suspend() to only call
>> >>>> pm_runtime_set_suspended() for devices whose usage and children
>> >>>> counters are at the "no references" level (the runtime PM status
>> >>>> of the device needs to be updated to "suspended" anyway in case
>> >>>> this function is called once again for the same device during the
>> >>>> transition under way), drop the parent usage counter incrementation
>> >>>> from it and update pm_runtime_force_resume() to compensate for these
>> >>>> changes.
>> >>>>
>> >>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
>> >>>
>> >>> This patch causes a regression during system resume on Renesas Salvator-XS
>> >>> with R-Car H3 ES2.0:
>> >>
>> >> I have dropped it for now, but we need to address the underlying issue.
>> >>
>> >>> SError Interrupt on CPU3, code 0xbf000002 -- SError
>> >>> SError Interrupt on CPU2, code 0xbf000002 -- SError
>> >>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
>> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> >>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
>> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
>> >>> Hardware name: Renesas Salvator-X 2nd version board based on
>> >>> r8a7795 ES2.0+ (DT)
>> >>> Hardware name: Renesas Salvator-X 2nd version board based on
>> >>> r8a7795 ES2.0+ (DT)
>> >>> Workqueue: events_unbound async_run_entry_fn
>> >>> Workqueue: events_unbound async_run_entry_fn
>> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
>> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
>> >>> lr : phy_init+0x64/0xcc
>> >>> lr : phy_init+0x64/0xcc
>> >>> ...
>> >>> Kernel panic - not syncing: Asynchronous SError Interrupt
>> >>>
>> >>> Note that before, it printed a warning instead:
>> >>>
>> >>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
>> >>> active children
>> >>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
>> >>> pm_runtime_enable+0x94/0xd8
>> >>>
>> >>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
>> >>> pm_runtime_force_suspend/resume()") fixes the crash.
>> >>>
>> >>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
>> >>> deployment and fix an issue"
>> >>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
>> >>> does not fix the crash.
>> >>
>> >> OK
>> >>
>> >>> With more debug code added, it seems the EHCI module clocks (701-703) are
>> >>> enabled earlier than before. I guess this triggers the workqueue to perform
>> >>> an operation while another related device (HSUSB 704?) is still disabled?
>> >>
>> >> Probably.
>> >>
>> >> Likely a device that wasn't resumed before resumes now and that causes
>> >> the issue to appear.
>> >>
>> >> I'm wondering if adding the ignore_children check to the patch helps.
>> >> If not, there clearly is a resume ordering issue that is papered over
>> >> by the current code.
>> >
>> > Something fishy is going on. Status of the USB PHYs differ before/after
>> > system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:
>> >
>> > - /devices/platform/soc/ee0a0200.usb-phy active
>> > - /devices/platform/soc/ee0c0200.usb-phy active
>> > - /devices/platform/soc/ee080200.usb-phy active
>> > + /devices/platform/soc/ee0a0200.usb-phy suspended
>> > + /devices/platform/soc/ee0c0200.usb-phy suspended
>> > + /devices/platform/soc/ee080200.usb-phy suspended
>>
>> Yeah.
>>
>> That's because of the pm_runtime_force_suspend/resume() called by
>> genpd. These functions generally may cause devices active before
>> system suspend to be left in suspend after it. That generally is a
>> good idea if the device was not really in use before the system
>> suspend, but there is a problem that the driver of it may not be
>> prepared for that to happen (and also the way to determine the "not
>> really in use" case may not be as expected by the driver).
>
> So below is something to try (untested).
>
> I know that Ulf will be protesting, but that's what I would do unless there
> are really *really* good reasons not to.

With the below, the issue is gone, both with and without the
dev->power.ignore_children check.

> --- linux-pm.orig/drivers/base/power/domain.c
> +++ linux-pm/drivers/base/power/domain.c
> @@ -1048,8 +1048,9 @@ static int genpd_finish_suspend(struct d
> if (ret)
> return ret;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_suspend(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_stop_dev(genpd, dev);
> if (ret)
> return ret;
> }
> @@ -1101,8 +1102,12 @@ static int genpd_resume_noirq(struct dev
> genpd->suspended_count--;
> genpd_unlock(genpd);
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start)
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> + if (ret)
> + return ret;
> + }
>
> ret = pm_generic_resume_noirq(dev);
> if (ret)
> @@ -1123,7 +1128,7 @@ static int genpd_resume_noirq(struct dev
> static int genpd_freeze_noirq(struct device *dev)
> {
> const struct generic_pm_domain *genpd;
> - int ret = 0;
> + int ret;
>
> dev_dbg(dev, "%s()\n", __func__);
>
> @@ -1135,8 +1140,9 @@ static int genpd_freeze_noirq(struct dev
> if (ret)
> return ret;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start)
> - ret = pm_runtime_force_suspend(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev))
> + ret = genpd_stop_dev(genpd, dev);
>
> return ret;
> }
> @@ -1151,7 +1157,7 @@ static int genpd_freeze_noirq(struct dev
> static int genpd_thaw_noirq(struct device *dev)
> {
> const struct generic_pm_domain *genpd;
> - int ret = 0;
> + int ret;
>
> dev_dbg(dev, "%s()\n", __func__);
>
> @@ -1159,8 +1165,9 @@ static int genpd_thaw_noirq(struct devic
> if (IS_ERR(genpd))
> return -EINVAL;
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> if (ret)
> return ret;
> }
> @@ -1193,7 +1200,7 @@ static int genpd_poweroff_noirq(struct d
> static int genpd_restore_noirq(struct device *dev)
> {
> struct generic_pm_domain *genpd;
> - int ret = 0;
> + int ret;
>
> dev_dbg(dev, "%s()\n", __func__);
>
> @@ -1217,8 +1224,9 @@ static int genpd_restore_noirq(struct de
> genpd_sync_power_on(genpd, true, 0);
> genpd_unlock(genpd);
>
> - if (genpd->dev_ops.stop && genpd->dev_ops.start) {
> - ret = pm_runtime_force_resume(dev);
> + if (genpd->dev_ops.stop && genpd->dev_ops.start &&
> + !pm_runtime_status_suspended(dev)) {
> + ret = genpd_start_dev(genpd, dev);
> if (ret)
> return ret;
> }

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2018-01-12 12:11:13

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Friday, January 12, 2018 12:26:56 PM CET Geert Uytterhoeven wrote:
> Hi Rafael,
>
> On Tue, Jan 9, 2018 at 5:28 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Tuesday, January 9, 2018 5:03:18 PM CET Rafael J. Wysocki wrote:
> >> On Tue, Jan 9, 2018 at 4:30 PM, Geert Uytterhoeven <[email protected]> wrote:
> >> > On Tue, Jan 9, 2018 at 4:00 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> >> On Tue, Jan 9, 2018 at 2:37 PM, Geert Uytterhoeven <[email protected]> wrote:
> >> >>> On Wed, Jan 3, 2018 at 12:06 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> >>>> From: Rafael J. Wysocki <[email protected]>
> >> >>>>
> >> >>>> One of the limitations of pm_runtime_force_suspend/resume() is that
> >> >>>> if a parent driver wants to use these functions, all of its child
> >> >>>> drivers have to do that too because of the parent usage counter
> >> >>>> manipulations necessary to get the correct state of the parent during
> >> >>>> system-wide transitions to the working state (system resume).
> >> >>>> However, that limitation turns out to be artificial, so remove it.
> >> >>>>
> >> >>>> Namely, pm_runtime_force_suspend() only needs to update the children
> >> >>>> counter of its parent (if there's is a parent) when the device can
> >> >>>> stay in suspend after the subsequent system resume transition, as
> >> >>>> that counter is correct already otherwise. Now, if the parent's
> >> >>>> children counter is not updated, it is not necessary to increment
> >> >>>> the parent's usage counter in that case any more, as long as the
> >> >>>> children counters of devices are checked along with their usage
> >> >>>> counters in order to decide whether or not the devices may be left
> >> >>>> in suspend after the subsequent system resume transition.
> >> >>>>
> >> >>>> Accordingly, modify pm_runtime_force_suspend() to only call
> >> >>>> pm_runtime_set_suspended() for devices whose usage and children
> >> >>>> counters are at the "no references" level (the runtime PM status
> >> >>>> of the device needs to be updated to "suspended" anyway in case
> >> >>>> this function is called once again for the same device during the
> >> >>>> transition under way), drop the parent usage counter incrementation
> >> >>>> from it and update pm_runtime_force_resume() to compensate for these
> >> >>>> changes.
> >> >>>>
> >> >>>> Signed-off-by: Rafael J. Wysocki <[email protected]>
> >> >>>
> >> >>> This patch causes a regression during system resume on Renesas Salvator-XS
> >> >>> with R-Car H3 ES2.0:
> >> >>
> >> >> I have dropped it for now, but we need to address the underlying issue.
> >> >>
> >> >>> SError Interrupt on CPU3, code 0xbf000002 -- SError
> >> >>> SError Interrupt on CPU2, code 0xbf000002 -- SError
> >> >>> CPU: 3 PID: 1769 Comm: kworker/u16:13 Not tainted
> >> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> >> >>> CPU: 2 PID: 1774 Comm: kworker/u16:18 Not tainted
> >> >>> 4.15.0-rc7-arm64-renesas-05338-gf14cf570a813c9ca-dirty #97
> >> >>> Hardware name: Renesas Salvator-X 2nd version board based on
> >> >>> r8a7795 ES2.0+ (DT)
> >> >>> Hardware name: Renesas Salvator-X 2nd version board based on
> >> >>> r8a7795 ES2.0+ (DT)
> >> >>> Workqueue: events_unbound async_run_entry_fn
> >> >>> Workqueue: events_unbound async_run_entry_fn
> >> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
> >> >>> pstate: 60000005 (nZCv daif -PAN -UAO)
> >> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> >> >>> pc : rcar_gen3_phy_usb2_init+0x34/0xf8
> >> >>> lr : phy_init+0x64/0xcc
> >> >>> lr : phy_init+0x64/0xcc
> >> >>> ...
> >> >>> Kernel panic - not syncing: Asynchronous SError Interrupt
> >> >>>
> >> >>> Note that before, it printed a warning instead:
> >> >>>
> >> >>> Enabling runtime PM for inactive device (ee0a0200.usb-phy) with
> >> >>> active children
> >> >>> WARNING: CPU: 0 PID: 1741 at drivers/base/power/runtime.c:1300
> >> >>> pm_runtime_enable+0x94/0xd8
> >> >>>
> >> >>> Reverting commit 0408584d580d4a2c ("PM / runtime: Rework
> >> >>> pm_runtime_force_suspend/resume()") fixes the crash.
> >> >>>
> >> >>> Note that applying Ulf's "[PATCH v2 0/3] phy: core: Re-work runtime PM
> >> >>> deployment and fix an issue"
> >> >>> (https://www.spinics.net/lists/linux-renesas-soc/msg21719.html) instead
> >> >>> does not fix the crash.
> >> >>
> >> >> OK
> >> >>
> >> >>> With more debug code added, it seems the EHCI module clocks (701-703) are
> >> >>> enabled earlier than before. I guess this triggers the workqueue to perform
> >> >>> an operation while another related device (HSUSB 704?) is still disabled?
> >> >>
> >> >> Probably.
> >> >>
> >> >> Likely a device that wasn't resumed before resumes now and that causes
> >> >> the issue to appear.
> >> >>
> >> >> I'm wondering if adding the ignore_children check to the patch helps.
> >> >> If not, there clearly is a resume ordering issue that is papered over
> >> >> by the current code.
> >> >
> >> > Something fishy is going on. Status of the USB PHYs differ before/after
> >> > system suspend, according to /sys/kernel/debug/pm_genpd/pm_genpd_summary:
> >> >
> >> > - /devices/platform/soc/ee0a0200.usb-phy active
> >> > - /devices/platform/soc/ee0c0200.usb-phy active
> >> > - /devices/platform/soc/ee080200.usb-phy active
> >> > + /devices/platform/soc/ee0a0200.usb-phy suspended
> >> > + /devices/platform/soc/ee0c0200.usb-phy suspended
> >> > + /devices/platform/soc/ee080200.usb-phy suspended
> >>
> >> Yeah.
> >>
> >> That's because of the pm_runtime_force_suspend/resume() called by
> >> genpd. These functions generally may cause devices active before
> >> system suspend to be left in suspend after it. That generally is a
> >> good idea if the device was not really in use before the system
> >> suspend, but there is a problem that the driver of it may not be
> >> prepared for that to happen (and also the way to determine the "not
> >> really in use" case may not be as expected by the driver).
> >
> > So below is something to try (untested).
> >
> > I know that Ulf will be protesting, but that's what I would do unless there
> > are really *really* good reasons not to.
>
> With the below, the issue is gone,

Cool, thanks for testing!

> both with and without the dev->power.ignore_children check.

Yes, that check is irrelevant here.

Thanks,
Rafael

2018-01-12 13:46:20

by Ulf Hansson

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On 12 January 2018 at 02:55, Rafael J. Wysocki <[email protected]> wrote:
> On Tue, Jan 9, 2018 at 7:08 AM, Ulf Hansson <[email protected]> wrote:
>> On 3 January 2018 at 12:06, Rafael J. Wysocki <[email protected]> wrote:
>>> From: Rafael J. Wysocki <[email protected]>
>>>
>>> One of the limitations of pm_runtime_force_suspend/resume() is that
>>> if a parent driver wants to use these functions, all of its child
>>> drivers have to do that too because of the parent usage counter
>>> manipulations necessary to get the correct state of the parent during
>>> system-wide transitions to the working state (system resume).
>>
>> I understand your point, however this isn't describing the full story,
>> because there are a some more alternatives when
>> pm_runtime_force_suspend|resume() works well, when used for parent
>> devices. Let me clarify, just to make sure we get this correct.
>>
>> 1) If the child device isn't managed by runtime PM at all (it always
>> has runtime PM disabled), pm_runtime_force_suspend|resume() can be
>> used for the parent device. This works because the runtime PM status
>> if the child remains in the default status, RPM_SUSPENDED.
>
> Say the child is resumed every time during system-wide resume (which
> is the only possibility for drivers that don't support runtime PM) and
> it needs the parent to be accessible for that.
>
> If the parent uses pm_runtime_force_suspend|resume(), the child needs
> to bump up the usage counter of the parent during system-wide suspend
> to prevent pm_runtime_force_resume() from leaving the parent in
> suspend during the subsequent system-wide resume and that obviously is
> not going to happen if the child driver doesn't support runtime PM.

My take on this, is that this isn't a system suspend/resume specific
issue, thus also not something that pm_runtime_force_suspend|resume()
should need need to care about.

The reason is why I think so is because, normally during runtime, the
runtime PM core don't prevent parents from being runtime suspended, in
case their children isn't managed by runtime PM. In other words, the
problem exists already and isn't specific to system suspend.

>
> Note that the $subject patch doesn't really change anything in that
> respect, because the children counter of the parent will be zero (or
> at least will not cover the child in question) then.
>
> Of course, this means that it is only safe to use
> pm_runtime_force_suspend|resume() in parent drivers if all of their
> child drivers support runtime PM.

I understand your concern, however, hopefully the about comment can address it!?

>
>> 2) If, somehow, during system suspend/resume, the driver for the child
>> make sure to synchronize the runtime PM status of the child device,
>> according to the state of the HW, this should work too.
>
> Not really.
>
> Again, problematic is the case when the child is resumed during
> system-wide resume and needs the parent to be accessible for that.
>
> Note that the runtime PM status of the child doesn't matter for
> pm_runtime_force_resume() running for the parent. What matters in
> there is the parent's usage counter alone. If that usage counter is
> not greater than 1, pm_runtime_force_resume() will leave the parent in
> suspend and note that it must run before the child's system-wide
> resume callbacks for the ordering between the parent resume and the
> child resume to be correct. This means that the parent will still be
> suspended when the child's system-wide resume callbacks run and that
> is a problem if the parent has to be accessible for them to succeed.
>
> Thus the runtime PM status of the child has no bearing on whether or
> not this is going to work. The child has to bump up the parent's
> usage counter during system-wide suspend to ensure that it will not be
> left in suspend by the subsequent pm_runtime_force_resume() and that
> practically requires the child driver to also use
> pm_runtime_force_suspend|resume().
>

Yeah, you are right!

I was thinking about the special case when the child driver has called
pm_runtime_get_sync() during ->probe(), then during system suspend it
puts the child device into low power state and updates the runtime PM
status of the child device, by using pm_runtime_set_suspended(). In
system resume the child driver then reverse those actions.

Anyway, let's forget about these cases, as they are rather specific.
Instead I think it's fair to say that currently:

"If the child device is managed by runtime PM, and the parent driver
wants to use pm_runtime_force_suspend|resume() for the parent device,
so must the child driver for the child device" - and subject patch
removes that artificial limitation.

> Now, the $subject patch changes the situation here for child drivers
> that only resume their devices during system-wide resume if they were
> "active" before the preceding system-wide suspend, because in that
> case the children counter of the parent will be nonzero. If the child
> resumes during system-wide resume even though it was suspended before
> the preceding system-wide suspend, pm_runtime_force_resume() running
> for its parent won't realize that the parent needs to resume too even
> after this patch.
>
> I generally think that leaving devices in suspend during system-wide
> resume is a very aggressive optimization and it should be done with
> extra care. It generally is not sufficient to rely on information
> coming from the runtime PM framework to do that safely in all cases.
> IMO it should only be done for drivers that explicitly expect it to be
> done and only if all of the drivers depending on them also expect it
> to be done. Otherwise it always will be potentially unsafe.

That's true, but I think your concern is related to the case when the
child isn't managed by runtime PM, while the parent is?

In regards to $subject patch, I really think it replaces the crappy
magic I invented to try to deal with child devices, with something
correct and clever. I would really appreciate it getting applied.

Kind regards
Uffe

2018-01-13 01:25:09

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH v2] PM / runtime: Rework pm_runtime_force_suspend/resume()

On Friday, January 12, 2018 2:46:12 PM CET Ulf Hansson wrote:
> On 12 January 2018 at 02:55, Rafael J. Wysocki <[email protected]> wrote:
> > On Tue, Jan 9, 2018 at 7:08 AM, Ulf Hansson <[email protected]> wrote:
> >> On 3 January 2018 at 12:06, Rafael J. Wysocki <[email protected]> wrote:
> >>> From: Rafael J. Wysocki <[email protected]>
> >>>
> >>> One of the limitations of pm_runtime_force_suspend/resume() is that
> >>> if a parent driver wants to use these functions, all of its child
> >>> drivers have to do that too because of the parent usage counter
> >>> manipulations necessary to get the correct state of the parent during
> >>> system-wide transitions to the working state (system resume).
> >>
> >> I understand your point, however this isn't describing the full story,
> >> because there are a some more alternatives when
> >> pm_runtime_force_suspend|resume() works well, when used for parent
> >> devices. Let me clarify, just to make sure we get this correct.
> >>
> >> 1) If the child device isn't managed by runtime PM at all (it always
> >> has runtime PM disabled), pm_runtime_force_suspend|resume() can be
> >> used for the parent device. This works because the runtime PM status
> >> if the child remains in the default status, RPM_SUSPENDED.
> >
> > Say the child is resumed every time during system-wide resume (which
> > is the only possibility for drivers that don't support runtime PM) and
> > it needs the parent to be accessible for that.
> >
> > If the parent uses pm_runtime_force_suspend|resume(), the child needs
> > to bump up the usage counter of the parent during system-wide suspend
> > to prevent pm_runtime_force_resume() from leaving the parent in
> > suspend during the subsequent system-wide resume and that obviously is
> > not going to happen if the child driver doesn't support runtime PM.
>
> My take on this, is that this isn't a system suspend/resume specific
> issue, thus also not something that pm_runtime_force_suspend|resume()
> should need need to care about.
>
> The reason is why I think so is because, normally during runtime, the
> runtime PM core don't prevent parents from being runtime suspended, in
> case their children isn't managed by runtime PM. In other words, the
> problem exists already and isn't specific to system suspend.

Well, you said above that "If the child device isn't managed by runtime PM at
all (it always has runtime PM disabled), pm_runtime_force_suspend|resume() can
be used for the parent device", but that's not the case according to my
analysis and I was responding specifically to that statement.

For pm_runtime_force_suspend|resume() to be used safely in a parent driver,
all of its child drivers must in fact support runtime PM as stated below
(and, of course, that's related to the fact that runtime PM can only be safely
supported by the parent driver, in general, if all of its child drivers support
it).

> >
> > Note that the $subject patch doesn't really change anything in that
> > respect, because the children counter of the parent will be zero (or
> > at least will not cover the child in question) then.
> >
> > Of course, this means that it is only safe to use
> > pm_runtime_force_suspend|resume() in parent drivers if all of their
> > child drivers support runtime PM.
>
> I understand your concern, however, hopefully the about comment can address it!?

This is an observation, not a concern.

It is sort of obvious that the driver using pm_runtime_force_suspend|resume()
must support runtime PM and for that to really work all of its child drivers
must support runtime PM too (in general).

I just don't know why you were arguing otherwise to start with. :-)

> >
> >> 2) If, somehow, during system suspend/resume, the driver for the child
> >> make sure to synchronize the runtime PM status of the child device,
> >> according to the state of the HW, this should work too.
> >
> > Not really.
> >
> > Again, problematic is the case when the child is resumed during
> > system-wide resume and needs the parent to be accessible for that.
> >
> > Note that the runtime PM status of the child doesn't matter for
> > pm_runtime_force_resume() running for the parent. What matters in
> > there is the parent's usage counter alone. If that usage counter is
> > not greater than 1, pm_runtime_force_resume() will leave the parent in
> > suspend and note that it must run before the child's system-wide
> > resume callbacks for the ordering between the parent resume and the
> > child resume to be correct. This means that the parent will still be
> > suspended when the child's system-wide resume callbacks run and that
> > is a problem if the parent has to be accessible for them to succeed.
> >
> > Thus the runtime PM status of the child has no bearing on whether or
> > not this is going to work. The child has to bump up the parent's
> > usage counter during system-wide suspend to ensure that it will not be
> > left in suspend by the subsequent pm_runtime_force_resume() and that
> > practically requires the child driver to also use
> > pm_runtime_force_suspend|resume().
> >
>
> Yeah, you are right!
>
> I was thinking about the special case when the child driver has called
> pm_runtime_get_sync() during ->probe(), then during system suspend it
> puts the child device into low power state and updates the runtime PM
> status of the child device, by using pm_runtime_set_suspended(). In
> system resume the child driver then reverse those actions.
>
> Anyway, let's forget about these cases, as they are rather specific.
> Instead I think it's fair to say that currently:
>
> "If the child device is managed by runtime PM, and the parent driver
> wants to use pm_runtime_force_suspend|resume() for the parent device,
> so must the child driver for the child device" - and subject patch
> removes that artificial limitation.

This isn't entirely accurate, because all of that implicitly assumes
runtime PM support in all of the involved drivers anyway.

> > Now, the $subject patch changes the situation here for child drivers
> > that only resume their devices during system-wide resume if they were
> > "active" before the preceding system-wide suspend, because in that
> > case the children counter of the parent will be nonzero. If the child
> > resumes during system-wide resume even though it was suspended before
> > the preceding system-wide suspend, pm_runtime_force_resume() running
> > for its parent won't realize that the parent needs to resume too even
> > after this patch.
> >
> > I generally think that leaving devices in suspend during system-wide
> > resume is a very aggressive optimization and it should be done with
> > extra care. It generally is not sufficient to rely on information
> > coming from the runtime PM framework to do that safely in all cases.
> > IMO it should only be done for drivers that explicitly expect it to be
> > done and only if all of the drivers depending on them also expect it
> > to be done. Otherwise it always will be potentially unsafe.
>
> That's true, but I think your concern is related to the case when the
> child isn't managed by runtime PM, while the parent is?

No, that case is not interesting at all (again, runtime PM must be supported
by all of the involved drivers for this to work at all).

Anyway, my point is that optimizations like this one only work under specific
assumptions, but here there's nothing to ensure that these assumptions hold.
It is sort of like a loaded gun with no safety.

> In regards to $subject patch, I really think it replaces the crappy
> magic I invented to try to deal with child devices, with something
> correct and clever. I would really appreciate it getting applied.

Sure, but that requires genpd to be fixed too.

Thanks,
Rafael