Make sure that the controller is runtime resumed when system suspending
to avoid an external abort when accessing the interrupt registers:
Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
...
[<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
[<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
[<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
This is easily reproduced on a BBB by enabling the peripheral port only
(as the host port may enable the shared clock) and keeping it
disconnected so that the controller is runtime suspended. (Well, you
would also need to the not-yet-merged am33xx-suspend patches by Dave
Gerlach to be able to suspend the BBB.)
This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
device to runtime suspend and thereby exposed a couple of older issues:
Register accesses without explicitly making sure the controller is
runtime resumed during suspend was first introduced by commit
c338412b5ded ("usb: musb: unconditionally save and restore the context
on suspend") in 3.14.
Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
resume") later started setting the RPM status to active during resume
without first making sure that the parent was runtime resumed. This was
also implicitly relying on the parent always being active. Since commit
71723f95463d ("PM / runtime: print error when activating a child to
unactive parent") this now also results in following warning:
musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
musb-hdrc.0 but parent (47401400.usb) is not active
This patch has been verified on 4.13-rc2, 4.12 and 4.9 using a BBB
(the dsps glue would always be active also in 4.8).
Fixes: c338412b5ded ("usb: musb: unconditionally save and restore the context on suspend")
Fixes: a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on resume")
Fixes: 1c4d0b4e1806 ("usb: musb: Remove pm_runtime_set_irq_safe")
Cc: stable <[email protected]> # 4.8
Cc: Alan Stern <[email protected]>
Cc: Daniel Mack <[email protected]>
Cc: Dave Gerlach <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Tony Lindgren <[email protected]>
Signed-off-by: Johan Hovold <[email protected]>
---
drivers/usb/musb/musb_core.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index 87cbd56cc761..b67692857daf 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2671,6 +2671,13 @@ static int musb_suspend(struct device *dev)
{
struct musb *musb = dev_to_musb(dev);
unsigned long flags;
+ int ret;
+
+ ret = pm_runtime_get_sync(dev);
+ if (ret < 0) {
+ pm_runtime_put_noidle(dev);
+ return ret;
+ }
musb_platform_disable(musb);
musb_disable_interrupts(musb);
@@ -2721,14 +2728,6 @@ static int musb_resume(struct device *dev)
if ((devctl & mask) != (musb->context.devctl & mask))
musb->port1_status = 0;
- /*
- * The USB HUB code expects the device to be in RPM_ACTIVE once it came
- * out of suspend
- */
- pm_runtime_disable(dev);
- pm_runtime_set_active(dev);
- pm_runtime_enable(dev);
-
musb_start(musb);
spin_lock_irqsave(&musb->lock, flags);
@@ -2738,6 +2737,9 @@ static int musb_resume(struct device *dev)
error);
spin_unlock_irqrestore(&musb->lock, flags);
+ pm_runtime_mark_last_busy(dev);
+ pm_runtime_put_autosuspend(dev);
+
return 0;
}
--
2.13.3
On Mon, 24 Jul 2017, Johan Hovold wrote:
> Make sure that the controller is runtime resumed when system suspending
> to avoid an external abort when accessing the interrupt registers:
>
> Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
> ...
> [<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
> [<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
> [<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
>
> This is easily reproduced on a BBB by enabling the peripheral port only
> (as the host port may enable the shared clock) and keeping it
> disconnected so that the controller is runtime suspended. (Well, you
> would also need to the not-yet-merged am33xx-suspend patches by Dave
> Gerlach to be able to suspend the BBB.)
>
> This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
> musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
> device to runtime suspend and thereby exposed a couple of older issues:
>
> Register accesses without explicitly making sure the controller is
> runtime resumed during suspend was first introduced by commit
> c338412b5ded ("usb: musb: unconditionally save and restore the context
> on suspend") in 3.14.
>
> Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
> resume") later started setting the RPM status to active during resume
> without first making sure that the parent was runtime resumed. This was
> also implicitly relying on the parent always being active. Since commit
> 71723f95463d ("PM / runtime: print error when activating a child to
> unactive parent") this now also results in following warning:
>
> musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
> musb-hdrc.0 but parent (47401400.usb) is not active
I don't understand this. Why wouldn't the parent be in RPM_ACTIVE at
this time? After all, how could the system be expected to resume a
child device if its parent wasn't fully active?
In general, during a system resume callback we should bring a device
back to full power, tell the PM core that this has been done, and leave
it at full power until the whole system resume is finished. For
efficiency we can avoid doing this in cases where the device was in
runtime suspend before the system suspend began, but you have to be
very careful about it -- see the documentation for the ->prepare
callback in Documentation/driver-api/pm/devices.rst.
Alan Stern
On Mon, Jul 24, 2017 at 10:38:41AM -0400, Alan Stern wrote:
> On Mon, 24 Jul 2017, Johan Hovold wrote:
>
> > Make sure that the controller is runtime resumed when system suspending
> > to avoid an external abort when accessing the interrupt registers:
> >
> > Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
> > ...
> > [<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
> > [<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
> > [<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
> >
> > This is easily reproduced on a BBB by enabling the peripheral port only
> > (as the host port may enable the shared clock) and keeping it
> > disconnected so that the controller is runtime suspended. (Well, you
> > would also need to the not-yet-merged am33xx-suspend patches by Dave
> > Gerlach to be able to suspend the BBB.)
> >
> > This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
> > musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
> > device to runtime suspend and thereby exposed a couple of older issues:
> >
> > Register accesses without explicitly making sure the controller is
> > runtime resumed during suspend was first introduced by commit
> > c338412b5ded ("usb: musb: unconditionally save and restore the context
> > on suspend") in 3.14.
> >
> > Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
> > resume") later started setting the RPM status to active during resume
> > without first making sure that the parent was runtime resumed. This was
> > also implicitly relying on the parent always being active. Since commit
> > 71723f95463d ("PM / runtime: print error when activating a child to
> > unactive parent") this now also results in following warning:
> >
> > musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
> > musb-hdrc.0 but parent (47401400.usb) is not active
>
> I don't understand this. Why wouldn't the parent be in RPM_ACTIVE at
> this time? After all, how could the system be expected to resume a
> child device if its parent wasn't fully active?
The parent for a musb controller is a "glue" device (e.g. musb_dsps)
which previously was always kept active, but that's no longer the case
as mentioned above.
In a system with two controllers (e.g. a Beagle Bone Black), the host
port may be active and keep the shared clock enabled (managed by the
grandparent device). Thereby the external-abort crash can be avoided
when suspending a disconnected (and runtime suspended) peripheral port.
When the system is later resumed, you would hit that broken activation
code of the runtime suspended device, with a likewise runtime suspended
parent, and the warning would be printed.
> In general, during a system resume callback we should bring a device
> back to full power, tell the PM core that this has been done, and leave
> it at full power until the whole system resume is finished. For
> efficiency we can avoid doing this in cases where the device was in
> runtime suspend before the system suspend began, but you have to be
> very careful about it -- see the documentation for the ->prepare
> callback in Documentation/driver-api/pm/devices.rst.
Right, this is how things should have been implemented if it is at all
possible too keep the device runtime suspended across system suspend.
Thanks,
Johan
On Mon, 24 Jul 2017, Johan Hovold wrote:
> On Mon, Jul 24, 2017 at 10:38:41AM -0400, Alan Stern wrote:
> > On Mon, 24 Jul 2017, Johan Hovold wrote:
> >
> > > Make sure that the controller is runtime resumed when system suspending
> > > to avoid an external abort when accessing the interrupt registers:
> > >
> > > Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
> > > ...
> > > [<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
> > > [<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
> > > [<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
> > >
> > > This is easily reproduced on a BBB by enabling the peripheral port only
> > > (as the host port may enable the shared clock) and keeping it
> > > disconnected so that the controller is runtime suspended. (Well, you
> > > would also need to the not-yet-merged am33xx-suspend patches by Dave
> > > Gerlach to be able to suspend the BBB.)
> > >
> > > This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
> > > musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
> > > device to runtime suspend and thereby exposed a couple of older issues:
> > >
> > > Register accesses without explicitly making sure the controller is
> > > runtime resumed during suspend was first introduced by commit
> > > c338412b5ded ("usb: musb: unconditionally save and restore the context
> > > on suspend") in 3.14.
> > >
> > > Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
> > > resume") later started setting the RPM status to active during resume
> > > without first making sure that the parent was runtime resumed. This was
> > > also implicitly relying on the parent always being active. Since commit
> > > 71723f95463d ("PM / runtime: print error when activating a child to
> > > unactive parent") this now also results in following warning:
> > >
> > > musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
> > > musb-hdrc.0 but parent (47401400.usb) is not active
> >
> > I don't understand this. Why wouldn't the parent be in RPM_ACTIVE at
> > this time? After all, how could the system be expected to resume a
> > child device if its parent wasn't fully active?
>
> The parent for a musb controller is a "glue" device (e.g. musb_dsps)
> which previously was always kept active, but that's no longer the case
> as mentioned above.
Even if the parent is not always kept active, it should still be active
during a system resume. Starting from the time its resume routine
runs, it should remain at full power until the system resume is
finished.
> In a system with two controllers (e.g. a Beagle Bone Black),
Do you mean a host controller and a peripheral controller?
> the host
> port may be active and keep the shared clock enabled (managed by the
> grandparent device). Thereby the external-abort crash can be avoided
> when suspending a disconnected (and runtime suspended) peripheral port.
So what? There are lots of ways of avoiding such crashes. (Disabling
the driver entirely, for example.) They aren't relevant for this
discussion.
> When the system is later resumed, you would hit that broken activation
> code of the runtime suspended device, with a likewise runtime suspended
> parent, and the warning would be printed.
Why would the parent be runtime suspended? Why wouldn't it still be in
the full-power state, the way its own resume routine should have left
it?
Maybe I'm being slow and dumb here, but I don't see how any of this
answers the question I raised earlier.
Alan Stern
> > In general, during a system resume callback we should bring a device
> > back to full power, tell the PM core that this has been done, and leave
> > it at full power until the whole system resume is finished. For
> > efficiency we can avoid doing this in cases where the device was in
> > runtime suspend before the system suspend began, but you have to be
> > very careful about it -- see the documentation for the ->prepare
> > callback in Documentation/driver-api/pm/devices.rst.
>
> Right, this is how things should have been implemented if it is at all
> possible too keep the device runtime suspended across system suspend.
>
> Thanks,
> Johan
On Mon, Jul 24, 2017 at 01:13:22PM -0400, Alan Stern wrote:
> On Mon, 24 Jul 2017, Johan Hovold wrote:
>
> > On Mon, Jul 24, 2017 at 10:38:41AM -0400, Alan Stern wrote:
> > > On Mon, 24 Jul 2017, Johan Hovold wrote:
> > >
> > > > Make sure that the controller is runtime resumed when system suspending
> > > > to avoid an external abort when accessing the interrupt registers:
> > > >
> > > > Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
> > > > ...
> > > > [<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
> > > > [<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
> > > > [<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
> > > >
> > > > This is easily reproduced on a BBB by enabling the peripheral port only
> > > > (as the host port may enable the shared clock) and keeping it
> > > > disconnected so that the controller is runtime suspended. (Well, you
> > > > would also need to the not-yet-merged am33xx-suspend patches by Dave
> > > > Gerlach to be able to suspend the BBB.)
> > > >
> > > > This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
> > > > musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
> > > > device to runtime suspend and thereby exposed a couple of older issues:
> > > >
> > > > Register accesses without explicitly making sure the controller is
> > > > runtime resumed during suspend was first introduced by commit
> > > > c338412b5ded ("usb: musb: unconditionally save and restore the context
> > > > on suspend") in 3.14.
> > > >
> > > > Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
> > > > resume") later started setting the RPM status to active during resume
> > > > without first making sure that the parent was runtime resumed. This was
> > > > also implicitly relying on the parent always being active. Since commit
> > > > 71723f95463d ("PM / runtime: print error when activating a child to
> > > > unactive parent") this now also results in following warning:
> > > >
> > > > musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
> > > > musb-hdrc.0 but parent (47401400.usb) is not active
> > >
> > > I don't understand this. Why wouldn't the parent be in RPM_ACTIVE at
> > > this time? After all, how could the system be expected to resume a
> > > child device if its parent wasn't fully active?
> >
> > The parent for a musb controller is a "glue" device (e.g. musb_dsps)
> > which previously was always kept active, but that's no longer the case
> > as mentioned above.
>
> Even if the parent is not always kept active, it should still be active
> during a system resume. Starting from the time its resume routine
> runs, it should remain at full power until the system resume is
> finished.
It is powered, but its runtime PM status does not reflect that, and that
is the problem. This patch makes sure that the child, and thereby
parent, are both runtime resumed throughout system suspend, but perhaps
that should be done explicitly in the parent driver as well (more
below).
> > In a system with two controllers (e.g. a Beagle Bone Black),
>
> Do you mean a host controller and a peripheral controller?
Yes, in this example (the BBB has two OTG controllers), but it could
just as well be two controllers in peripheral mode where one is active.
> > the host
> > port may be active and keep the shared clock enabled (managed by the
> > grandparent device). Thereby the external-abort crash can be avoided
> > when suspending a disconnected (and runtime suspended) peripheral port.
>
> So what? There are lots of ways of avoiding such crashes. (Disabling
> the driver entirely, for example.) They aren't relevant for this
> discussion.
Perhaps I read your question too literally above; I'm trying to explain
how you can end up with a runtime suspended parent during resume, without
hitting the external abort during suspend, with the current kernel.
This can be done by keeping the sibling/cousin controller enabled, but
could of course also have been achieved by preventing the grandparent
(omap) device (which controls the clock) from suspending by other means.
I'm just describing how this could happen with the current
implementation; I'm not claiming that the implementation is correct.
> > When the system is later resumed, you would hit that broken activation
> > code of the runtime suspended device, with a likewise runtime suspended
> > parent, and the warning would be printed.
>
> Why would the parent be runtime suspended? Why wouldn't it still be in
> the full-power state, the way its own resume routine should have left
> it?
>
> Maybe I'm being slow and dumb here, but I don't see how any of this
> answers the question I raised earlier.
I think understand what you're getting at and yes, the parent *should*
be RPM_ACTIVE, while I'm saying that it *currently* is not guaranteed.
As mentioned above, this patch does make sure that child and parent are
both runtime resume when suspending and therefore remain RPM_ACTIVE
throughout suspend. This specifically means that the explicit activation
code on resume can now be removed.
But I should fix that paragraph and not blame the explicit activation
code for not "making sure that the parent was runtime resumed".
In fact, some of the parent glue drivers also do register accesses in
their suspend/resume callbacks which ought to have been preceded by an
explicit runtime resume. These glue drivers are a bit special however
and does check for a registered child in their pm callbacks so it's not
a problem in practise. I think I'll add them anyway for clarity in a
follow up patch.
Thanks,
Johan
On Tue, 25 Jul 2017, Johan Hovold wrote:
> On Mon, Jul 24, 2017 at 01:13:22PM -0400, Alan Stern wrote:
> > On Mon, 24 Jul 2017, Johan Hovold wrote:
> >
> > > On Mon, Jul 24, 2017 at 10:38:41AM -0400, Alan Stern wrote:
> > > > On Mon, 24 Jul 2017, Johan Hovold wrote:
> > > >
> > > > > Make sure that the controller is runtime resumed when system suspending
> > > > > to avoid an external abort when accessing the interrupt registers:
> > > > >
> > > > > Unhandled fault: external abort on non-linefetch (0x1008) at 0xd025840a
> > > > > ...
> > > > > [<c05481a4>] (musb_default_readb) from [<c0545abc>] (musb_disable_interrupts+0x84/0xa8)
> > > > > [<c0545abc>] (musb_disable_interrupts) from [<c0546b08>] (musb_suspend+0x38/0xb8)
> > > > > [<c0546b08>] (musb_suspend) from [<c04a57f8>] (platform_pm_suspend+0x3c/0x64)
> > > > >
> > > > > This is easily reproduced on a BBB by enabling the peripheral port only
> > > > > (as the host port may enable the shared clock) and keeping it
> > > > > disconnected so that the controller is runtime suspended. (Well, you
> > > > > would also need to the not-yet-merged am33xx-suspend patches by Dave
> > > > > Gerlach to be able to suspend the BBB.)
> > > > >
> > > > > This is a regression that was introduced by commit 1c4d0b4e1806 ("usb:
> > > > > musb: Remove pm_runtime_set_irq_safe") which allowed the parent glue
> > > > > device to runtime suspend and thereby exposed a couple of older issues:
> > > > >
> > > > > Register accesses without explicitly making sure the controller is
> > > > > runtime resumed during suspend was first introduced by commit
> > > > > c338412b5ded ("usb: musb: unconditionally save and restore the context
> > > > > on suspend") in 3.14.
> > > > >
> > > > > Commit a1fc1920aaaa ("usb: musb: core: make sure musb is in RPM_ACTIVE on
> > > > > resume") later started setting the RPM status to active during resume
> > > > > without first making sure that the parent was runtime resumed. This was
> > > > > also implicitly relying on the parent always being active. Since commit
> > > > > 71723f95463d ("PM / runtime: print error when activating a child to
> > > > > unactive parent") this now also results in following warning:
> > > > >
> > > > > musb-hdrc musb-hdrc.0: runtime PM trying to activate child device
> > > > > musb-hdrc.0 but parent (47401400.usb) is not active
> > > >
> > > > I don't understand this. Why wouldn't the parent be in RPM_ACTIVE at
> > > > this time? After all, how could the system be expected to resume a
> > > > child device if its parent wasn't fully active?
> > >
> > > The parent for a musb controller is a "glue" device (e.g. musb_dsps)
> > > which previously was always kept active, but that's no longer the case
> > > as mentioned above.
> >
> > Even if the parent is not always kept active, it should still be active
> > during a system resume. Starting from the time its resume routine
> > runs, it should remain at full power until the system resume is
> > finished.
>
> It is powered, but its runtime PM status does not reflect that, and that
> is the problem. This patch makes sure that the child, and thereby
> parent, are both runtime resumed throughout system suspend, but perhaps
> that should be done explicitly in the parent driver as well (more
> below).
>
> > > In a system with two controllers (e.g. a Beagle Bone Black),
> >
> > Do you mean a host controller and a peripheral controller?
>
> Yes, in this example (the BBB has two OTG controllers), but it could
> just as well be two controllers in peripheral mode where one is active.
>
> > > the host
> > > port may be active and keep the shared clock enabled (managed by the
> > > grandparent device). Thereby the external-abort crash can be avoided
> > > when suspending a disconnected (and runtime suspended) peripheral port.
> >
> > So what? There are lots of ways of avoiding such crashes. (Disabling
> > the driver entirely, for example.) They aren't relevant for this
> > discussion.
>
> Perhaps I read your question too literally above; I'm trying to explain
> how you can end up with a runtime suspended parent during resume, without
> hitting the external abort during suspend, with the current kernel.
>
> This can be done by keeping the sibling/cousin controller enabled, but
> could of course also have been achieved by preventing the grandparent
> (omap) device (which controls the clock) from suspending by other means.
>
> I'm just describing how this could happen with the current
> implementation; I'm not claiming that the implementation is correct.
>
> > > When the system is later resumed, you would hit that broken activation
> > > code of the runtime suspended device, with a likewise runtime suspended
> > > parent, and the warning would be printed.
> >
> > Why would the parent be runtime suspended? Why wouldn't it still be in
> > the full-power state, the way its own resume routine should have left
> > it?
> >
> > Maybe I'm being slow and dumb here, but I don't see how any of this
> > answers the question I raised earlier.
>
> I think understand what you're getting at and yes, the parent *should*
> be RPM_ACTIVE, while I'm saying that it *currently* is not guaranteed.
>
> As mentioned above, this patch does make sure that child and parent are
> both runtime resume when suspending and therefore remain RPM_ACTIVE
> throughout suspend. This specifically means that the explicit activation
> code on resume can now be removed.
>
> But I should fix that paragraph and not blame the explicit activation
> code for not "making sure that the parent was runtime resumed".
>
> In fact, some of the parent glue drivers also do register accesses in
> their suspend/resume callbacks which ought to have been preceded by an
> explicit runtime resume. These glue drivers are a bit special however
> and does check for a registered child in their pm callbacks so it's not
> a problem in practise. I think I'll add them anyway for clarity in a
> follow up patch.
I see. Thanks for the explanation.
Alan Stern