LinuxLists.cc - Re: [linux-pm] subtle pm_runtime_put

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

"Rafael J. Wysocki" <[email protected]> writes:

> On Saturday, December 18, 2010, Rafael J. Wysocki wrote:
>> On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
>> > On Sat, Dec 18, 2010 at 5:07 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > > On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
>> > >> On Sat, Dec 11, 2010 at 4:50 PM, Alan Stern <[email protected]> wrote:
>> > >> >> Think of a device which you have no way to reset other than powering
>> > >> >> it down and up again.
>> > >> >>
>> > >> >> If that device is managed by runtime PM, the only way to do that is by
>> > >> >> using put_sync() followed by a get_sync(). Sure, if someone else
>> > >> >> increased the usage_count of that device this won't work, but then of
>> > >> >> course it means that the driver is not using runtime PM correctly.
>> > >> >
>> > >> > Not so. Even if a driver uses runtime PM correctly, there will still
>> > >> > be times when someone else has increased the usage_count. This happens
>> > >> > during probing and during system resume, for example.
>> > >>
>> > >> I'm aware of these two examples; normally we're good with them since
>> > >> during probing we're not toggling the power, and during suspend/resume
>> > >> the SDIO core is responsible for manipulating the power (and it does
>> > >> so directly). Are there (or do you think there will be) additional
>> > >> examples where this can happen ?
>> > >>
>> > >> But this leads me to a real problem which we have encountered.
>> > >>
>> > >> During system suspend, our driver is asked (by mac80211's suspend
>> > >> handler) to power off its device. When this happens, the driver has no
>> > >> idea that the system is suspending - regular driver code (responsible
>> > >> to remove the wlan interface and stop the device) is being called.
>> > >
>> > > That's where the problem is. If there's a difference, from the driver's
>> > > point of view, between suspend and some other operation, there should be a
>> > > way to tell the driver what case it actually is dealing with.
>> >
>> > Yes, the problem will be solved if the driver would bypass the runtime
>> > PM framework on system suspend. mac80211 obviously has this
>> > information, and technically it's very easy to let the driver know
>> > about it.
>> >
>> > But the difference between suspend and normal operation is really
>> > artificial: in both cases mac80211 just asks the driver to power its
>> > device down, and the end result is exactly the same (a GPIO line of
>> > the device is de-asserted in our case). The difference between these
>> > two scenarios
>> > exist only because runtime PM is effectively disabled during system
>> > suspend, and therefore the driver has to look for an alternative way
>> > to power down the device.
>>
>> That's not correct, sorry.
>>
>> There is a bug and the bug is that you use the runtime PM to power down
>> the device in every situation, although you obviously shouldn't do that
>> (e.g. because it may be disabled by user space or it _is_ disabled by
>> the PM core during system suspend).
>>
>> > > BTW, what would you do in that case if the runtime PM of the device were
>> > > disabled by user space by writing "on" to the device's
>> > > /sys/devices/.../power/control file?
>> >
>> > That's a good point.
>> >
>> > Blocking runtime PM for this device is fatal since this particular
>> > device has functionality tied up with its power control (no other way
>> > to reset it).
>> >
>> > It might call for a device-specific dev_pm_info bit flag to prohibit this...
>>
>> No way.
>
> That said, I think we may do something different that perhaps will make your
> life somewhat easier.
>
> Namely, if your driver doesn't implement any system suspend callbacks
> (i.e. ->suspend(), ->resume(), ->freeze(), ->thaw() etc.) and it doesn't
> want the analogous subsystem callbacks to be executed for the device, it will
> make sense to flag the device as "runtime only", or something like this,
> which make the PM core skip the device in dpm_suspend() etc.
>
> In that case, if a device if flagged as "runtime only", we can avoid
> calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
> calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
> to fail system suspend (or hibernation) if a "runtime only" device has the
> power.runtime_auto flag unset.
>
> So, I think we can add a "runtime only" flag working as described above.
> I guess it will also help in the case I've been discussing with Kevin for some
> time (i2c device using runtime PM used by an RTC in a semi-transparent
> fashion).

[sorry for being late to the discussion]

Something like thi would certainly work for the case we've discussed at
LPC and some other similar ones we have on OMAP where we have "runtime
only" drivers.

That being said, another thing we discussed briefly at LPC was wondering
about reason(s) behind the DPM core locking out runtime PM transitions
in the first place.

Currently runtime PM transitions are blocked in dpm_prepare() and only
allowed again in dpm_complete(). How about locking out runtime PM
transitions only until the DPM suspend operation is complete. IOW,
rather than waiting for dpm_complete() to re-allow runtime PM
transitions, what about allowing them after dpm_suspend()? I haven't
actually tested this yet, since I'm busy with getting OMAP PM stuff
ready for the merge window, so it's just and idea so far. Of course
similar will be needed to block runtime PM transitions during
dpm_resume().

Kevin

2010-12-29 06:34:46

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, Dec 28, 2010 at 11:46 PM, Alan Stern <[email protected]> wrote:
> What's the relation between mmc_power_off() and mmc_power_save_host()?

Essentially they are the same - mmc_power_off() is the one that
actually powers off the card.

mmc_power_save_host() just invokes first a bus-specific ->power_save()
handler (if one exists).

> Does one merely go into a low-power state whereas the other does the
> complete-power-off reset?

No, think of them as the same. Both will lead to a complete power off.
The mmc_power_off() is just an internal function which belongs to the
MMC core.

> During wlan-interface-down, it's not necessary to reduce the power
> level; it's merely desirable. ?That's exactly the sort of thing runtime
> PM is meant for. ?Hence the existing call to pm_runtime_put_sync() is
> sufficient.

Not exactly - think of airplane mode, where we must ensure the radios
are disabled, without being blocked by /sys/devices/.../power/control
- we will need to bypass runtime PM in this scenario too.

> Put it another way: ?Suppose an SDIO card has more than one function
> and you need to reset the wl1271 function. ?It sounds like there's no
> way to do this without resetting the other functions as well. ?What
> happens if another function's driver is busy and can't allow a reset
> just then?

That's a chip specific thing. In our case, there is no other active
SDIO function.

Other vendors which does employ several SDIO functions have a separate
reset for each function _in addition_ to the power line. So for them
the runtime PM model is just great - they never care if the actual
power is down or not - it's really just about power consumption, and
nothing else. When they need an unconditional reset, they just use the
appropriate reset line.

In our case we have a single control for both power and reset
functionality (actually we do have a separate power line to the
device, but it is not controlled by software - it is fed directly from
the battery rail).

2010-12-22 12:30:10

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Wednesday, December 22, 2010, Alan Stern wrote:
> On Tue, 21 Dec 2010, Rafael J. Wysocki wrote:
>
> > It basically goes like this. There's device A that is only resumed when it's
> > needed to carry out an operation and is suspended immediately after that.
> > There's another device B that needs A to do something during its suspend.
> > So, when the suspend of B is started, A is woken up, does its work and is
> > suspended again (using pm_runtime_suspend()). Then B is suspended.
> >
> > We currently require that ->suspend() and ->resume() callbacks be defined
> > for A (presumably pointing to the same code as its runtime callbacks) so that
> > things work correctly, but perhaps we can just relax this requirement a bit?
> > I'm not 100% sure that's a good idea, just considering it.
>
> I still don't know. It would require a lot of special conditions: no
> child devices, not runtime-PM-disabled, not runtime-PM-forbidden...
> Also, A's parent would have to be coded carefully; otherwise A's
> runtime resume would prevent the parent from suspending.
>
> This just doesn't fit very well with the runtime PM model, or at least,
> not in the form you described.
>
> Consider this instead: Since A is required to be functional before B
> can be used, A must be registered before B and hence B gets suspended
> before A. Therefore during the prepare phase we can runtime-resume A
> and leave it powered up; when B needs to suspend, it won't matter that
> the runtime-PM calls are ineffective.

We don't really need to do that, because the runtime resume _is_ functional
during system suspend. The only thing missing is a ->suspend() callback for A
(and a corresponding ->resume() callback to make sure A will be available to
B during system resume).

> Then when A's dpm_suspend occurs, it can safely go to a low-power state and
> stay there.

Agreed.

Thanks,
Rafael

2010-12-26 05:55:27

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 4:48 AM, Alan Stern <[email protected]> wrote:
> Is there a separate handler responsible for powering the device back
> on? ?What are the names of these handlers?

wl1271_sdio_power_on() and wl1271_sdio_power_off().

If you're taking a look, please do so using the latest code as seen on mmc-next:

git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git mmc-next

> That's not a race

It is.

If system suspend will continue uninterruptedly, and the driver will
be suspended (by the SDIO subsystem), then the device will be powered
down, and everything will work.

But if something will interrupt the suspend transition _before_ our
driver is suspended (e.g. some other suspend() handler will fail),
then we have a problem, because the device will not be reset as the
driver needs it to be.

> What is the name of the driver's runtime_suspend routine?

The driver itself doesn't have one; power is being controlled by the
MMC/SDIO subsystem, so the actual runtime_suspend handler is
mmc_runtime_suspend().

> And while we're at it, what is the name of the routine that _actually_ changes the
> device's power level?

mmc_power_save_host(), which is called by mmc_runtime_suspend().

(well, more accurately it is actually fixed_voltage_disable(), but
that's a few levels deeper than I think you'll be interested in)

> Evidently you are facing two distinct problems, and they need to be
> solved together. ?In fact, solving one problem (bypassing runtime PM)
> will probably help to solve the other (doing system resume correctly).

I agree.

> Well, I'd like to help you find the best solution, but I can't because
> I don't understand how the subsystem and driver are structured, and you
> aren't providing enough details. ?We won't make any further progress on
> this unless you abandon the incomplete high-level overviews and instead
> give a more or less complete lower-level description -- including the
> subroutine names!

In fact, I'd appreciate if we can abandon the high-level discussions too :)

I have provided pointers to the actual code, please tell me if you're
interested in any other parts or have questions about it.

> Maybe you mean that you want to prevent dpm_complete() from powering
> the device down while it is in use.

Yes.

> That should never be a problem --
> the device should never be used without the PM core being aware.

That's why I said we will have to both directly manipulate the power
of the device, and also maintain usage_count.

> Therefore you need to have a single routine that actually powers-down
> the device, and you need to call this routine in different settings:
> during system sleep or runtime suspend (the same code in the driver
> handles these two cases, right?) and when a reset is needed.

Yes. and sometimes also just because the user turned wlan off.

Thanks,
Ohad.

2010-12-28 19:34:31

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, Dec 28, 2010 at 9:21 PM, Rafael J. Wysocki <[email protected]> wrote:
> It looks like you could simply do a power down-power up cycle before trying to
> load new firmware, just in case. ?I guess that's suboptimal for some reason?

It would work, but we will not be able to unconditionally disable the
radios (e.g. airplane mode comes to mind).

> Please pretend that the runtime PM framework doesn't exist for a while. ?How
> would you design things in that case?

Duplicate most of runtime-PM's plumbing into the MMC/SDIO subsystem.
Off the top of my hat:
- We need the device hierarchy and the suspend/resume dependencies (a
single SDIO card has several logical sub-devices, a.k.a SDIO
functions)
- We need to maintain usage_count for each device, and expose the same
API to handle it
- We need autosuspend for MMC cards (power them off on inactivity)
- We need the same, or similar, locking plumbing
- We probably need the same sysfs ABI as well: autosuspend_delay_ms,
and even /sys/devices/.../power/control itself is useful (not for our
device, but for others, sure)
- ...

Thanks,
Ohad.

2010-12-23 07:51:43

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 19, 2010 at 12:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> That said, I think we may do something different that perhaps will make your
> life somewhat easier.
...
> So, I think we can add a "runtime only" flag working as described above.

That will definitely solve the suspend/resume issue we're having.

But due to /sys/devices/.../power/control, we will still have to
bypass runtime PM for this particular device.

We need a way to unconditionally power down the device, and as long as
runtime PM can be disabled by the user, we can't use it.

Thanks,
Ohad.

2010-12-21 21:32:57

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tuesday, December 21, 2010, Alan Stern wrote:
> On Mon, 20 Dec 2010, Rafael J. Wysocki wrote:
>
> > > > In that case, if a device if flagged as "runtime only", we can avoid
> > > > calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
> > > > calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
> > > > to fail system suspend (or hibernation) if a "runtime only" device has the
> > > > power.runtime_auto flag unset.
> > >
> > > Or more generally, if pm_runtime_suspended() doesn't return 'true' for
> > > the device.
> >
> > That's not necessary, because the device may be suspended using
> > pm_runtime_suspend() later than we check pm_runtime_suspended().
>
> What if the device has a child in the RPM_ACTIVE state? Then
> pm_runtime_suspend() won't do anything, even if the child really is
> dpm-suspended.

Well, in fact I was thinking of leaf devices. Following all of the children
for non-leaf devices would pretty much nullify the whole possible gain.

> > I'd use the "runtime only" (or perhaps better "no_dpm") flag as a declaration
> > (if set) that the device is going to be suspended with the help of "runtime"
> > callbacks and the driver takes the responsibility for getting things right.
>
> I'm still not sure about this; the design isn't clear. Are these
> runtime callbacks going to come from the PM core or from the driver?
> If from the driver, how will the driver know when to issue them? What
> about coordinating async suspends (the device must be suspended after
> its children and before its parent)?

It basically goes like this. There's device A that is only resumed when it's
needed to carry out an operation and is suspended immediately after that.
There's another device B that needs A to do something during its suspend.
So, when the suspend of B is started, A is woken up, does its work and is
suspended again (using pm_runtime_suspend()). Then B is suspended.

We currently require that ->suspend() and ->resume() callbacks be defined
for A (presumably pointing to the same code as its runtime callbacks) so that
things work correctly, but perhaps we can just relax this requirement a bit?
I'm not 100% sure that's a good idea, just considering it.

Rafael

2010-12-26 14:54:17

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
> (It still is not a race, though.)

Thread 1
=======

suspend_handler_of_a_random_device()
{
return failure;
}

Thread 2
=======

suspend_handler_of_our_mmc_host_controller()
{
invoke our sdio suspend handler and power down the card (directly);
return success;
}

If thread 2 wins => everything is cool
If thread 1 wins => our driver will not be able to resume

The race begins after mac80211 invokes our driver's power off handler,
which is using pm_runtime_put_sync().

2010-12-19 07:48:29

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 19, 2010 at 12:47 AM, Rafael J. Wysocki <[email protected]> wrote:
> On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
>> On Sat, Dec 18, 2010 at 5:07 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
>> >> On Sat, Dec 11, 2010 at 4:50 PM, Alan Stern <[email protected]> wrote:
>> >> >> Think of a device which you have no way to reset other than powering
>> >> >> it down and up again.
>> >> >>
>> >> >> If that device is managed by runtime PM, the only way to do that is by
>> >> >> using put_sync() followed by a get_sync(). Sure, if someone else
>> >> >> increased the usage_count of that device this won't work, but then of
>> >> >> course it means that the driver is not using runtime PM correctly.
>> >> >
>> >> > Not so. ?Even if a driver uses runtime PM correctly, there will still
>> >> > be times when someone else has increased the usage_count. ?This happens
>> >> > during probing and during system resume, for example.
>> >>
>> >> I'm aware of these two examples; normally we're good with them since
>> >> during probing we're not toggling the power, and during suspend/resume
>> >> the SDIO core is responsible for manipulating the power (and it does
>> >> so directly). Are there (or do you think there will be) additional
>> >> examples where this can happen ?
>> >>
>> >> But this leads me to a real problem which we have encountered.
>> >>
>> >> During system suspend, our driver is asked (by mac80211's suspend
>> >> handler) to power off its device. When this happens, the driver has no
>> >> idea that the system is suspending - regular driver code (responsible
>> >> to remove the wlan interface and stop the device) is being called.
>> >
>> > That's where the problem is. ?If there's a difference, from the driver's
>> > point of view, between suspend and some other operation, there should be a
>> > way to tell the driver what case it actually is dealing with.
>>
>> Yes, the problem will be solved if the driver would bypass the runtime
>> PM framework on system suspend. mac80211 obviously has this
>> information, and technically it's very easy to let the driver know
>> about it.
>>
>> But the difference between suspend and normal operation is really
>> artificial: in both cases mac80211 just asks the driver to power its
>> device down, and the end result is exactly the same (a GPIO line of
>> the device is de-asserted in our case). The difference between these
>> two scenarios
>> exist only because runtime PM is effectively disabled during system
>> suspend, and therefore the driver has to look for an alternative way
>> to power down the device.
>
> That's not correct, sorry.
>
> There is a bug and the bug is that you use the runtime PM to power down
> the device in every situation, although you obviously shouldn't do that
> (e.g. because it may be disabled by user space or it _is_ disabled by
> the PM core during system suspend).

That completely makes sense, and helps to precisely define the problem
and justify the solution.

We will continue to use runtime PM within the SDIO core, in order to
opportunistically power down the device (e.g. when a driver is not
loaded). In this case, it is perfectly OK if runtime PM is disabled,
since there is no functional difference if power is taken down or not
- it's purely about minimizing power consumption.

OTOH, within the wl12xx driver, where powering down the device is tied
to functionality (e.g. error recovery or just hard reset), and must be
carried out unconditionally, we will bypass runtime PM.

Rafael, Alan, thank you for your time and attention, it really helped.

Thanks,
Ohad.

2010-12-30 04:25:37

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Wed, 29 Dec 2010, Ohad Ben-Cohen wrote:

> On Tue, Dec 28, 2010 at 11:46 PM, Alan Stern <[email protected]> wrote:
> > What's the relation between mmc_power_off() and mmc_power_save_host()?
>
> Essentially they are the same - mmc_power_off() is the one that
> actually powers off the card.
>
> mmc_power_save_host() just invokes first a bus-specific ->power_save()
> handler (if one exists).
>
> > Does one merely go into a low-power state whereas the other does the
> > complete-power-off reset?
>
> No, think of them as the same. Both will lead to a complete power off.
> The mmc_power_off() is just an internal function which belongs to the
> MMC core.

Then what routine does the power down without the full reset?

That's why you run into trouble, isn't it? The device has been powered
down, but if the system sleep transition is aborted then the device
doesn't get reset (by mmc_power_save_host?), so you can't wake it up
again.

Also, what routine tries to do the failing wakeup, and what is its call
path?

> > During wlan-interface-down, it's not necessary to reduce the power
> > level; it's merely desirable. ?That's exactly the sort of thing runtime
> > PM is meant for. ?Hence the existing call to pm_runtime_put_sync() is
> > sufficient.
>
> Not exactly - think of airplane mode, where we must ensure the radios
> are disabled, without being blocked by /sys/devices/.../power/control
> - we will need to bypass runtime PM in this scenario too.

I don't see why. Turning off the radio is different from powering the
device down -- you should be able to do the first without doing the
second (although probably not vice versa).

Alan Stern

2010-12-26 17:00:28

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, 26 Dec 2010, Ohad Ben-Cohen wrote:

> On Sun, Dec 26, 2010 at 4:48 AM, Alan Stern <[email protected]> wrote:
> > Is there a separate handler responsible for powering the device back
> > on? ?What are the names of these handlers?
>
> wl1271_sdio_power_on() and wl1271_sdio_power_off().
>
> If you're taking a look, please do so using the latest code as seen on mmc-next:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git mmc-next

Hmm. It's a little difficult to untangle the web of dev_pm_ops
pointers and other stuff. There's wl1271_suspend() and wl1271_resume()
(which don't do anything) plus wl1271_sdio_set_power(), which calls
either wl1271_sdio_power_on() or wl1271_sdio_power_off(), and these
routines in turn call pm_runtime_get_sync() and pm_runtime_put_sync().

Under what circumstances does the MMC/SDIO core call
wl1271_sdio_set_power(), and where are those function calls?

> > That's not a race
>
> It is.
>
> If system suspend will continue uninterruptedly, and the driver will
> be suspended (by the SDIO subsystem), then the device will be powered
> down, and everything will work.
>
> But if something will interrupt the suspend transition _before_ our
> driver is suspended (e.g. some other suspend() handler will fail),
> then we have a problem, because the device will not be reset as the
> driver needs it to be.

The normal case is that system suspend succeeds. Then there is no
race. If system suspend fails then the failure may occur either before
or after the wl1271 device is powered down; _that_ is the race
(assuming you are using asynchronous PM). But normally it doesn't
occur.

> > What is the name of the driver's runtime_suspend routine?
>
> The driver itself doesn't have one; power is being controlled by the
> MMC/SDIO subsystem, so the actual runtime_suspend handler is
> mmc_runtime_suspend().

You mean, the wl1271 device has no PM methods of its own; its power is
managed entirely by the parent MMC/SDIO controller? If that's right
then you should call pm_runtime_no_callbacks() for the wl1271 device.

Is it possible for there to be more than one device connected to the
same MMC/SDIO controller? If it is, how do you reset one without
resetting all the others?

> > And while we're at it, what is the name of the routine that _actually_ changes the
> > device's power level?
>
> mmc_power_save_host(), which is called by mmc_runtime_suspend().
>
> (well, more accurately it is actually fixed_voltage_disable(), but
> that's a few levels deeper than I think you'll be interested in)

That's for runtime PM. What about system sleep? It looks like
sdio_bus_type has no callbacks for suspend or resume, so the driver's
callbacks get used instead -- that would be wl1271_suspend() and
wl1271_resume(). But they don't do anything, so there's no power
change until the parent MMC/SDIO controller is suspended. That would
be in mmc_sdio_suspend() and mmc_sdio_resume()? But I can't tell what
pmops->suspend and pmops->resume methods they end up calling. Are
these the entries in wl1271_sdio_pm_ops? Does that mean
wl1271_suspend() and wl1271_resume() get called twice?

I think I already understand how the runtime suspend/resume paths work:
There are no runtime PM methods for the wl1271 device, so the PM core
simply propagates changes up to the parent MMC/SDIO device and ends up
calling mmc_runtime_suspend() or mmc_runtime_resume(). Right?

What is the pathway for reset (or other similar activities that
require the device to be powered-down while it is in use)?
wl1271_sdio_reset() doesn't do anything.

Alan Stern

2010-12-18 22:51:26

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Saturday, December 18, 2010, Johannes Berg wrote:
> On Sat, 2010-12-18 at 18:00 +0200, Ohad Ben-Cohen wrote:
>
> > > That's where the problem is. If there's a difference, from the driver's
> > > point of view, between suspend and some other operation, there should be a
> > > way to tell the driver what case it actually is dealing with.
> >
> > Yes, the problem will be solved if the driver would bypass the runtime
> > PM framework on system suspend. mac80211 obviously has this
> > information, and technically it's very easy to let the driver know
> > about it.
> >
> > But the difference between suspend and normal operation is really
> > artificial: in both cases mac80211 just asks the driver to power its
> > device down, and the end result is exactly the same (a GPIO line of
> > the device is de-asserted in our case). The difference between these
> > two scenarios
> > exist only because runtime PM is effectively disabled during system
> > suspend, and therefore the driver has to look for an alternative way
> > to power down the device.
>
> Sounds to me like the difference isn't really in the driver, but the
> core PM subsystem. Why does it care when powering off a device whether
> it's during suspend, or during runtime?

The problem is that the driver is using something it shouldn't use for
the particular purpose. Plain and simple.

Namely, it uses the runtime PM _framework_ (which has specific and _documented_
limitations) for general power management of the device it handles. It is
wrong and it has nothing to do with the PM subsystem.

Thanks,
Rafael

2010-12-22 01:48:15

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, 21 Dec 2010, Kevin Hilman wrote:

> That being said, another thing we discussed briefly at LPC was wondering
> about reason(s) behind the DPM core locking out runtime PM transitions
> in the first place.

The reason is to prevent confusion from unwanted runtime-PM state
changes during a system sleep transition.

> Currently runtime PM transitions are blocked in dpm_prepare() and only
> allowed again in dpm_complete(). How about locking out runtime PM
> transitions only until the DPM suspend operation is complete. IOW,
> rather than waiting for dpm_complete() to re-allow runtime PM
> transitions, what about allowing them after dpm_suspend()? I haven't
> actually tested this yet, since I'm busy with getting OMAP PM stuff
> ready for the merge window, so it's just and idea so far. Of course
> similar will be needed to block runtime PM transitions during
> dpm_resume().

That would defeat the purpose. We need to prevent unwanted state
changes during the entire sleep transition, not just during the time
that dpm_suspend is running.

Alan Stern

2010-12-20 21:17:48

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Monday, December 20, 2010, Alan Stern wrote:
> On Sun, 19 Dec 2010, Rafael J. Wysocki wrote:
>
> > That said, I think we may do something different that perhaps will make your
> > life somewhat easier.
> >
> > Namely, if your driver doesn't implement any system suspend callbacks
> > (i.e. ->suspend(), ->resume(), ->freeze(), ->thaw() etc.) and it doesn't
> > want the analogous subsystem callbacks to be executed for the device, it will
> > make sense to flag the device as "runtime only", or something like this,
> > which make the PM core skip the device in dpm_suspend() etc.
> >
> > In that case, if a device if flagged as "runtime only", we can avoid
> > calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
> > calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
> > to fail system suspend (or hibernation) if a "runtime only" device has the
> > power.runtime_auto flag unset.
>
> Or more generally, if pm_runtime_suspended() doesn't return 'true' for
> the device.

That's not necessary, because the device may be suspended using
pm_runtime_suspend() later than we check pm_runtime_suspended().

I'd use the "runtime only" (or perhaps better "no_dpm") flag as a declaration
(if set) that the device is going to be suspended with the help of "runtime"
callbacks and the driver takes the responsibility for getting things right.

> But if the device gets suspended asynchronously, this may
> very well happen. For example, an i2c device is originally runtime
> suspended, but its device_suspend() call occurs at the same time as the
> call for the RTC device, so the i2c device actually happens to be
> resumed at that time in order to communicate with the RTC.
>
> > So, I think we can add a "runtime only" flag working as described above.
> > I guess it will also help in the case I've been discussing with Kevin for some
> > time (i2c device using runtime PM used by an RTC in a semi-transparent
> > fashion).
> >
> > Alan, what do you think?
>
> I'm not sure. In this situation, should we worry more that we usually
> do about the possibility of a runtime resume occurring after the device
> has gone through device_suspend()? Or just depend on the driver to do
> everything correctly?

The latter.

Thanks,
Rafael

2010-12-25 21:54:20

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

Hi Ohad,

On Sat, Dec 25, 2010 at 9:58 PM, Ohad Ben-Cohen <[email protected]> wrote:
> On Sat, Dec 25, 2010 at 6:21 PM, Alan Stern <[email protected]> wrote:
>> Right. ?You may or may not realize it, but this requirement means that
>> the driver _must_ bypass runtime PM sometimes.
>
> http://www.spinics.net/lists/linux-pm/msg22864.html
>
>> Now you've lost me. ?Which of the driver's handlers are you talking
>> about?
>
> The driver's handler, which is called by mac80211, and is responsible
> to power off the device.
> The _same_ handler is being called either during runtime or during a
> system transition to suspend
>
>> What races?
>
> Driver thinks power is off and device is now fully reset, but it's isn't really

maybe it's worth starting off with the description of chip power
states and how they are mapped to runtime PM and static PM?

Most of the WLAN chips have some very low power modes, but you're
talking about _complete_ shutdown as a suspended state for both
runtime PM and static PM, is that correct?

Thanks,
Vitaly

2010-12-26 12:44:00

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
> Why does the driver need the device to be reset even though it hasn't
> been suspeneded yet?

Because it is asked to stop the hardware by mac80211.

2010-12-28 19:12:05

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 8:35 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Sunday, December 26, 2010, Ohad Ben-Cohen wrote:
>> On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > Why does the driver need the device to be reset even though it hasn't
>> > been suspeneded yet?
>>
>> Because it is asked to stop the hardware by mac80211.
>
> So I guess the mac80211 layer should ask it to start it again.

It does...

And at this point the driver will try to boot a new firmware, but it
will only succeed if the device was indeed powered off. If it wasn't
(system suspend was cancelled before the host controller suspended),
the driver will fail to resume the device (because it can't reset it).

We can change man80211 to let us know the system is suspending, and
then we will power down the device directly. Or we can use something
like your "runtime only" proposal, and then pm_runtime_put_sync() will
just work for us regardless of the system state.

But that will not solve the /sys/devices/.../power/control problem.
For that we will either have always to bypass runtime PM, or introduce
something like "always auto"...

Thanks,
Ohad.

2010-12-28 19:15:48

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 8:37 PM, Rafael J. Wysocki <[email protected]> wrote:
> So, it only happens during asynchronous suspend? ?In other words, if suspend
> is synchronous, everything should be fine, right?

Not necessarily.

Consider this simple scenario, where a device was added after the mmc
host controller, but before mac80211. In this case its suspend handler
will have the chance to abort system suspend after mac80211 already
told our driver to power down the device (but the device wasn't
powered down yet, because the driver used pm_runtime_put_sync() which
is disabled).

Thanks,
Ohad.

2010-12-30 04:30:22

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Wed, 29 Dec 2010, Ohad Ben-Cohen wrote:

> On Tue, Dec 28, 2010 at 11:46 PM, Alan Stern <[email protected]> wrote:
> > Similarly, during system suspend mmc_suspend_host() should be
> > responsible for doing all the necessary power-down operations. ?Runtime
> > PM is completely out of the picture at this time. ?And this should be
> > independent of mac80211 -- in fact, the card should be powered down
> > appropriately even if the kernel doesn't have a mac80211 layer.
>
> And it is.
>
> But in order to boot the firmware on resume, we need to reset the
> device. And if system suspend has been cancelled before
> mmc_power_off() was invoked, we will not be able to. So, in this case
> too, the driver will have to power the device off.

If the suspend transition was cancelled before mmc_power_off(), why do
you need to boot the firmware on resume? Since the device hasn't
actually been powered off, can't it continue using the same firmware as
before?

What routine is responsible for deciding to boot the firmware, and what
is its call path?

> It all really boils down to the same solution - we will always have to
> bypass runtime PM and directly control the power of the device.

While this is probably true, I'm not at all convinced that the current
design is layered correctly. Fixing the design might reduce the amount
of bypassing required.

Alan Stern

2010-12-25 20:58:26

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, Dec 25, 2010 at 6:21 PM, Alan Stern <[email protected]> wrote:
> Right. ?You may or may not realize it, but this requirement means that
> the driver _must_ bypass runtime PM sometimes.

http://www.spinics.net/lists/linux-pm/msg22864.html

> Now you've lost me. ?Which of the driver's handlers are you talking
> about?

The driver's handler, which is called by mac80211, and is responsible
to power off the device.
The _same_ handler is being called either during runtime or during a
system transition to suspend

> What races?

Driver thinks power is off and device is now fully reset, but it's isn't really

> Why does the driver have to _assume_ the device was powered off -- why
> doesn't it simply _do_ the power down?

runtime PM is disabled during suspend. and the driver doesn't know
that the system is suspending, and it is using pm_runtime_put_sync().
It's the same code that executes during runtime and while system is
suspending

Even if we changed mac80211 to tell us the system is suspending, so we
would power off the device directly in this case, it won't solve all
of our problems, because even during runtime we need to bypass runtime
PM due to /sys/devices/.../power/control.

> Maybe it would help if you provided a list of the relevant subroutines
> together with descriptions of the circumstances under which they are
> called and what they are expected to do. ?For example, a brief
> pseudo-code listing.

Frankly, I don't think it's worth it.

We're just a single use case and Rafael already said he won't support it.

> It's not necessary to maintain usage_count while you're bypassing
> runtime PM. ?Just make sure when you're done that everything is back in
> sync.

I think you are missing the fact that we will have to bypass runtime
PM also during runtime, and not only in suspend/resume, and that's due
to /sys/devices/.../power/control.

That's why we will also have to maintain usage_count - to prevent
dpm_complete() from powering the device down, when it is really up.

> Why shouldn't dpm_complete cause the device to be powered down
> (assuming the device isn't in use, of course)?

Because it _is_ in use

2010-12-25 07:34:55

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Thu, Dec 23, 2010 at 6:03 PM, Alan Stern <[email protected]> wrote:
> I'm still not aware of the details of your situation. ?Nevertheless, it
> shouldn't be hard to set up a suspend() routine that would do whatever
> was needed to power down the device, whether this means calling the
> runtime_suspend() routine directly or something else. ?That's basically
> what every other driver does.

And that's what we already do as well, but as I wrote earlier, in our
specific scenario this _breaks_ if system suspend is cancelled (for
whatever reason) _before_ our suspend handler kicks in (and after
mac80211 has suspended our driver).

Moreover, even if it did work, it wouldn't have been enough.

I'll try to explain, and if something is still not clear, please let me know.

Our wlan device is an ARM microcontroller running some flavor of RTOS
(i.e. the firmware); as I mentioned before, its power and reset
functionalities are tied together (as far as software is concerned. a
small detail I won't get into now). The ability to unconditionally
power it down is needed for error recovery, for booting a new firmware
(and for unconditionally stopping all radios...).

The driver assumes it can unconditionally power down the device, and
is built around this assumption, so a user interface such as
/sys/devices/.../power/control has no sense, and if set to 'on', will
be fatal for this driver (e.g. it will not be possible to bring the
wlan interface down and up).

There is no way around this. The driver must be able to
unconditionally power the hardware down.

About the suspend/resume issue:

This is a mac80211 driver. It is basically a set of "dumb" handlers,
which mac80211 uses to abstract the hardware differences between its
various drivers. For example, there are handlers to take down/up an
interface, to start/stop the hardware, etc..

When one of the driver's handlers is being called, the driver doesn't
really know (or care) if this is during runtime or not. If it is asked
to stop its hardware, it should just do so. And when (in our case)
this handler is invoked during system suspend, any disability to power
off the device immediately opens a window for races due to the
driver's assumption, on resume, that the device was powered off
successfully. So, yeah, we do have a suspend() handler, which powers
the device off directly, but we need the "runtime" handler of the
driver to immediately succeed too in order to prevent the race (and
then it's fully powered off and we wouldn't need to wait for the
suspend() handler to do so too). For that to happen, we need runtime
PM not to be disabled during system suspend (or by
/sys/devices/.../power/control).

Tweaking the driver around this limitation (to realize somehow if the
hardware was really powered down or not) doesn't make sense and
frankly isn't worth the effort, since the driver anyway has to be able
to unconditionally power down the device for the aforementioned
reasons (error recovery, reboot a new fw, disable radios, ..).

A small comment: SDIO drivers' suspend handler is actually triggered
by the mmc host controller's suspend routine (through the SDIO
subsystem); it's not the classic dpm-triggered suspend handler, so
Rafael's notion of "runtime only" flag will work here.

Why runtime PM:

The device has an SDIO interface, so one of its incarnations is an SDIO driver.

Short background: MMC/SDIO cards are dynamically probed, identified
and configured by the MMC/SDIO subsystem, so toggling their power must
take place from within the MMC/SDIO subsystem itself.

Until recently, MMC/SDIO cards were kept powered on from the moment
they were inserted, up until they were removed (exception: power was
removed on system suspend and brought up back on resume. there is an
exception to this exception, too, but I won't get into that now).

Recently, we have added runtime PM support to the MMC/SDIO subsystem,
so cards can be powered down when they're not in use. E.g., an SDIO
card is powered down if no driver is available or requires power to
it, and an MMC card might be powered off after some period of
inactivity (the latter is just being discussed and has not yet hit
mainline).

As far as the MMC/SDIO subsystem is concerned, this is merely a power
optimization, and it's perfectly fine if the user will disable runtime
PM for the card and by that disallow powering it down.

But for our particular device this is fatal; as explained, the driver
must have unconditional control of the device's power.

So we need runtime PM at the subsystem level (to allow the MMC/SDIO
subsystem power it off in case the driver is not loaded), but we will
have no choice but bypass runtime PM at the driver level, in order to
avoid the aforementioned suspend race, and a potential
/sys/devices/.../power/control block.

To maintain coherency with the runtime-PM's usage_count (and by that
prevent dpm_complete() from powering off our device after a system
resume) we will also keep calling the pm_runtime_{get, put} API
appropriately.

It's not pretty, but this is the only way we can make this work
(unless, of course, our use case will be supported within the
runtime-PM framework itself).

Thanks,
Ohad.

2010-12-19 10:22:52

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Saturday, December 18, 2010, Rafael J. Wysocki wrote:
> On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
> > On Sat, Dec 18, 2010 at 5:07 PM, Rafael J. Wysocki <[email protected]> wrote:
> > > On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
> > >> On Sat, Dec 11, 2010 at 4:50 PM, Alan Stern <[email protected]> wrote:
> > >> >> Think of a device which you have no way to reset other than powering
> > >> >> it down and up again.
> > >> >>
> > >> >> If that device is managed by runtime PM, the only way to do that is by
> > >> >> using put_sync() followed by a get_sync(). Sure, if someone else
> > >> >> increased the usage_count of that device this won't work, but then of
> > >> >> course it means that the driver is not using runtime PM correctly.
> > >> >
> > >> > Not so. Even if a driver uses runtime PM correctly, there will still
> > >> > be times when someone else has increased the usage_count. This happens
> > >> > during probing and during system resume, for example.
> > >>
> > >> I'm aware of these two examples; normally we're good with them since
> > >> during probing we're not toggling the power, and during suspend/resume
> > >> the SDIO core is responsible for manipulating the power (and it does
> > >> so directly). Are there (or do you think there will be) additional
> > >> examples where this can happen ?
> > >>
> > >> But this leads me to a real problem which we have encountered.
> > >>
> > >> During system suspend, our driver is asked (by mac80211's suspend
> > >> handler) to power off its device. When this happens, the driver has no
> > >> idea that the system is suspending - regular driver code (responsible
> > >> to remove the wlan interface and stop the device) is being called.
> > >
> > > That's where the problem is. If there's a difference, from the driver's
> > > point of view, between suspend and some other operation, there should be a
> > > way to tell the driver what case it actually is dealing with.
> >
> > Yes, the problem will be solved if the driver would bypass the runtime
> > PM framework on system suspend. mac80211 obviously has this
> > information, and technically it's very easy to let the driver know
> > about it.
> >
> > But the difference between suspend and normal operation is really
> > artificial: in both cases mac80211 just asks the driver to power its
> > device down, and the end result is exactly the same (a GPIO line of
> > the device is de-asserted in our case). The difference between these
> > two scenarios
> > exist only because runtime PM is effectively disabled during system
> > suspend, and therefore the driver has to look for an alternative way
> > to power down the device.
>
> That's not correct, sorry.
>
> There is a bug and the bug is that you use the runtime PM to power down
> the device in every situation, although you obviously shouldn't do that
> (e.g. because it may be disabled by user space or it _is_ disabled by
> the PM core during system suspend).
>
> > > BTW, what would you do in that case if the runtime PM of the device were
> > > disabled by user space by writing "on" to the device's
> > > /sys/devices/.../power/control file?
> >
> > That's a good point.
> >
> > Blocking runtime PM for this device is fatal since this particular
> > device has functionality tied up with its power control (no other way
> > to reset it).
> >
> > It might call for a device-specific dev_pm_info bit flag to prohibit this...
>
> No way.

That said, I think we may do something different that perhaps will make your
life somewhat easier.

Namely, if your driver doesn't implement any system suspend callbacks
(i.e. ->suspend(), ->resume(), ->freeze(), ->thaw() etc.) and it doesn't
want the analogous subsystem callbacks to be executed for the device, it will
make sense to flag the device as "runtime only", or something like this,
which make the PM core skip the device in dpm_suspend() etc.

In that case, if a device if flagged as "runtime only", we can avoid
calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
to fail system suspend (or hibernation) if a "runtime only" device has the
power.runtime_auto flag unset.

So, I think we can add a "runtime only" flag working as described above.
I guess it will also help in the case I've been discussing with Kevin for some
time (i2c device using runtime PM used by an RTC in a semi-transparent
fashion).

Alan, what do you think?

Rafael

2010-12-18 22:47:38

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
> On Sat, Dec 18, 2010 at 5:07 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
> >> On Sat, Dec 11, 2010 at 4:50 PM, Alan Stern <[email protected]> wrote:
> >> >> Think of a device which you have no way to reset other than powering
> >> >> it down and up again.
> >> >>
> >> >> If that device is managed by runtime PM, the only way to do that is by
> >> >> using put_sync() followed by a get_sync(). Sure, if someone else
> >> >> increased the usage_count of that device this won't work, but then of
> >> >> course it means that the driver is not using runtime PM correctly.
> >> >
> >> > Not so. Even if a driver uses runtime PM correctly, there will still
> >> > be times when someone else has increased the usage_count. This happens
> >> > during probing and during system resume, for example.
> >>
> >> I'm aware of these two examples; normally we're good with them since
> >> during probing we're not toggling the power, and during suspend/resume
> >> the SDIO core is responsible for manipulating the power (and it does
> >> so directly). Are there (or do you think there will be) additional
> >> examples where this can happen ?
> >>
> >> But this leads me to a real problem which we have encountered.
> >>
> >> During system suspend, our driver is asked (by mac80211's suspend
> >> handler) to power off its device. When this happens, the driver has no
> >> idea that the system is suspending - regular driver code (responsible
> >> to remove the wlan interface and stop the device) is being called.
> >
> > That's where the problem is. If there's a difference, from the driver's
> > point of view, between suspend and some other operation, there should be a
> > way to tell the driver what case it actually is dealing with.
>
> Yes, the problem will be solved if the driver would bypass the runtime
> PM framework on system suspend. mac80211 obviously has this
> information, and technically it's very easy to let the driver know
> about it.
>
> But the difference between suspend and normal operation is really
> artificial: in both cases mac80211 just asks the driver to power its
> device down, and the end result is exactly the same (a GPIO line of
> the device is de-asserted in our case). The difference between these
> two scenarios
> exist only because runtime PM is effectively disabled during system
> suspend, and therefore the driver has to look for an alternative way
> to power down the device.

That's not correct, sorry.

There is a bug and the bug is that you use the runtime PM to power down
the device in every situation, although you obviously shouldn't do that
(e.g. because it may be disabled by user space or it _is_ disabled by
the PM core during system suspend).

> > BTW, what would you do in that case if the runtime PM of the device were
> > disabled by user space by writing "on" to the device's
> > /sys/devices/.../power/control file?
>
> That's a good point.
>
> Blocking runtime PM for this device is fatal since this particular
> device has functionality tied up with its power control (no other way
> to reset it).
>
> It might call for a device-specific dev_pm_info bit flag to prohibit this...

No way.

> (Or.. not using runtime PM at all for this device. But that would call
> for SDIO bus changes, because there is no other way to power off SDIO
> devices in the absence of their drivers. Moreover, the device would
> not gain from runtime PM's system integration, e.g. powertop
> statistics, etc..)

Let me repeat, please. The runtime PM framework is _not_ _suitable_ for
_general_ runtime PM of devices, because (a) it may be disabled by user space
at any time and (b) it is disabled by the PM core during system suspend.

Thanks,
Rafael

2010-12-28 21:46:34

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, 28 Dec 2010, Ohad Ben-Cohen wrote:

> On Sun, Dec 26, 2010 at 7:00 PM, Alan Stern <[email protected]> wrote:
> > Hmm. ?It's a little difficult to untangle the web of dev_pm_ops
> > pointers and other stuff.
>
> Yeah, SDIO suspend/resume is very different from other subsystems.
>
> There are several layers of abstractions involved, from the host
> controller driver, to the MMC core code, to the SDIO core code, to
> finally get to the actual SDIO function driver.

Actually that sounds quite a lot like the USB subsystem, except perhaps
for the fact that you have two layers of core code instead of just one.
The cards are much like USB devices (indivisible for the purposes of
power management) and the functions are much like USB interfaces (bound
to individual drivers but not independently power-managed or
resettable).

> I'll try to explain it below by addressing your questions. If
> something is still unclear, feel free to ask me.
>
> > There's wl1271_suspend() and wl1271_resume()
> > (which don't do anything)
>
> It just looks like they don't. In fact, by returning 0,
> wl1271_suspend() is telling the SDIO subsystem that it's OK to power
> down the SDIO function (in general an SDIO card can have several SDIO
> functions, each of which may be driven by a separate SDIO driver). If
> all the SDIO functions agreed, then SDIO core (mmc_sdio_suspend())
> tells MMC core (mmc_suspend_host()) it's OK to power down the card (by
> directly calling mmc_power_off()).

What's the relation between mmc_power_off() and mmc_power_save_host()?

When you say "power down", which of these routines do you mean? I get
the feeling that sometimes you mean one and sometimes the other, which
leads to problems in understanding.

Why are there two separate routines?

Does one merely go into a low-power state whereas the other does the
complete-power-off reset?

If so, which does which?

What's the reason for not going into the complete-power-off state every
time?

If it is latency, shouldn't the runtime-PM path use the low-latency
routine and the system-sleep path use the high-latency routine?
Although I can't tell for certain, it seems to be the other way around.

> > Under what circumstances does the MMC/SDIO core call
> > wl1271_sdio_set_power(), and where are those function calls?
>
> The relations are actually reversed: the MMC/SDIO core never calls
> this function.
>
> This function is called by the wl1271 driver itself, mostly as a
> response to a request by the mac80211 layer (but also as a response to
> a hardware error), that requires manipulation of the power of the
> card.
>
> There are a handful of reasons this may happen:
>
> error recovery:
>
> wl1271_recovery_work() ->
> __wl1271_op_remove_interface() ->
> wl1271_power_off() ->
> wl1271_sdio_set_power() ->
> wl1271_sdio_power_off() ->
> pm_runtime_put_sync()
>
> wlan interface goes down:
>
> ieee80211_do_stop() ->
> wl1271_op_remove_interface() ->
> __wl1271_op_remove_interface() ->
> ...
>
> system suspend:
>
> __ieee80211_suspend ->
> wl1271_op_remove_interface() ->
> ...

Okay. These paths should be interrelated with power management in a
different way.

It's correct for wl1271_op_remove_interface() to be called in these
three situations, and it's correct to do a pm_runtime_put_sync(). But
it's not correct to rely on that to actually change any power levels or
do a reset.

Instead, wl1271_error_recovery_work() should call either
mmc_suspend_host() or mmc_power_save_host() (whichever one does the
actual power-off reset) directly, after calling wl1271_power_off().
In fact, you might even want to do a pm_runtime_get_noresume() before
calling wl1271_power_off(), to prevent the PM core from interfering
with the reset.

Similarly, during system suspend mmc_suspend_host() should be
responsible for doing all the necessary power-down operations. Runtime
PM is completely out of the picture at this time. And this should be
independent of mac80211 -- in fact, the card should be powered down
appropriately even if the kernel doesn't have a mac80211 layer.

During wlan-interface-down, it's not necessary to reduce the power
level; it's merely desirable. That's exactly the sort of thing runtime
PM is meant for. Hence the existing call to pm_runtime_put_sync() is
sufficient.

> > Is it possible for there to be more than one device connected to the
> > same MMC/SDIO controller?
>
> Generally yes, host controllers may have several slots, but to the
> MMC/SDIO core, it is presented as each slot is a separate host
> controller (personally I have never seen such hardware, but I did find
> an example for that in the host controller's source folder).

Put it another way: Suppose an SDIO card has more than one function
and you need to reset the wl1271 function. It sounds like there's no
way to do this without resetting the other functions as well. What
happens if another function's driver is busy and can't allow a reset
just then?

Alan Stern

2010-12-18 21:29:14

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, 18 Dec 2010, Johannes Berg wrote:

> Sounds to me like the difference isn't really in the driver, but the
> core PM subsystem. Why does it care when powering off a device whether
> it's during suspend, or during runtime?

The two operations are quite different. One involves preparing the
device to have its power removed (when the system goes to sleep), and
the other involves actually reducing the device's power. (In other
words, system suspend involves telling the driver "The computer is
going to sleep in a moment, so get ready", whereas runtime suspend
involves telling the driver "Your device isn't being used now so feel
free to reduce its power".) There also are differences with regard to
how wakeup events are signalled. Obviously there's a great deal of
similarity and overlap, but they are _not_ the same in general.

Now these differences don't matter so much to the PM core, but they
should matter to subsystems and drivers. The PM core cares just enough
to know that it has to invoke different callbacks for the different
operations.

Alan Stern

2010-12-26 05:27:36

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, Dec 25, 2010 at 11:50 PM, Vitaly Wool <[email protected]> wrote:
> you're talking about _complete_ shutdown as a suspended state for both
> runtime PM and static PM, is that correct?

Exactly.

This is the power of the card as seen by the MMC/SDIO subsystem.

2010-12-18 22:53:07

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Saturday, December 18, 2010, Ohad Ben-Cohen wrote:
> On Sat, Dec 18, 2010 at 6:40 PM, Johannes Berg
> <[email protected]> wrote:
> > On Sat, 2010-12-18 at 18:00 +0200, Ohad Ben-Cohen wrote:
> >
> >> > That's where the problem is. If there's a difference, from the driver's
> >> > point of view, between suspend and some other operation, there should be a
> >> > way to tell the driver what case it actually is dealing with.
> >>
> >> Yes, the problem will be solved if the driver would bypass the runtime
> >> PM framework on system suspend. mac80211 obviously has this
> >> information, and technically it's very easy to let the driver know
> >> about it.
> >>
> >> But the difference between suspend and normal operation is really
> >> artificial: in both cases mac80211 just asks the driver to power its
> >> device down, and the end result is exactly the same (a GPIO line of
> >> the device is de-asserted in our case). The difference between these
> >> two scenarios
> >> exist only because runtime PM is effectively disabled during system
> >> suspend, and therefore the driver has to look for an alternative way
> >> to power down the device.
> >
> > Sounds to me like the difference isn't really in the driver, but the
> > core PM subsystem. Why does it care when powering off a device whether
> > it's during suspend, or during runtime?
>
> Agree.
>
> If we can add a dev_pm_info bit, that would allow using runtime PM API
> during suspend/resume transitions, the driver will not have to care.
>
> Rafael what do you think ? Is that totally unacceptable ?

Already said. It is not acceptable at all.

Thanks,
Rafael

2010-12-18 19:09:16

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, Dec 18, 2010 at 6:40 PM, Johannes Berg
<[email protected]> wrote:
> On Sat, 2010-12-18 at 18:00 +0200, Ohad Ben-Cohen wrote:
>
>> > That's where the problem is. ?If there's a difference, from the driver's
>> > point of view, between suspend and some other operation, there should be a
>> > way to tell the driver what case it actually is dealing with.
>>
>> Yes, the problem will be solved if the driver would bypass the runtime
>> PM framework on system suspend. mac80211 obviously has this
>> information, and technically it's very easy to let the driver know
>> about it.
>>
>> But the difference between suspend and normal operation is really
>> artificial: in both cases mac80211 just asks the driver to power its
>> device down, and the end result is exactly the same (a GPIO line of
>> the device is de-asserted in our case). The difference between these
>> two scenarios
>> exist only because runtime PM is effectively disabled during system
>> suspend, and therefore the driver has to look for an alternative way
>> to power down the device.
>
> Sounds to me like the difference isn't really in the driver, but the
> core PM subsystem. Why does it care when powering off a device whether
> it's during suspend, or during runtime?

Agree.

If we can add a dev_pm_info bit, that would allow using runtime PM API
during suspend/resume transitions, the driver will not have to care.

Rafael what do you think ? Is that totally unacceptable ?

Thanks,
Ohad.

2010-12-28 20:37:07

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tuesday, December 28, 2010, Ohad Ben-Cohen wrote:
> On Tue, Dec 28, 2010 at 9:21 PM, Rafael J. Wysocki <[email protected]> wrote:
> > It looks like you could simply do a power down-power up cycle before trying to
> > load new firmware, just in case. I guess that's suboptimal for some reason?
>
> It would work, but we will not be able to unconditionally disable the
> radios (e.g. airplane mode comes to mind).

For _unconditional_ disabling you need to bypass the runtime PM anyway.

I guess it's best to simply disable it and invoke whatever-function-powers-off
your device directly in those cases.

> > Please pretend that the runtime PM framework doesn't exist for a while. How
> > would you design things in that case?
>
> Duplicate most of runtime-PM's plumbing into the MMC/SDIO subsystem.
> Off the top of my hat:
> - We need the device hierarchy and the suspend/resume dependencies (a
> single SDIO card has several logical sub-devices, a.k.a SDIO
> functions)
> - We need to maintain usage_count for each device, and expose the same
> API to handle it
> - We need autosuspend for MMC cards (power them off on inactivity)
> - We need the same, or similar, locking plumbing
> - We probably need the same sysfs ABI as well: autosuspend_delay_ms,
> and even /sys/devices/.../power/control itself is useful (not for our
> device, but for others, sure)
> - ...

I didn't mean that, never mind.

To solve the problem with system suspend (I think) you need to:
(1) Add a ->prepare() callback to your driver, calling pm_runtime_resume()
for the device. From that point on you know that its runtime callbacks
won't be called and if all of the pm_runtime_get_* and pm_runtime_put_*
things are in balance, everything will work out nicely during resume.
(2) Add ->suspend() and ->resume() callbacks to your driver that will
set the device for system suspend and bring it back to the "power on"
state during system resume (that will cover your error code path issue in
particular).
During resume, the pm_runtime_put_sync() in dpm_complete() should put the
device back into the low power state (if it was in that state before).

Thanks,
Rafael

2010-12-23 16:03:24

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Thu, 23 Dec 2010, Ohad Ben-Cohen wrote:

> On Sun, Dec 19, 2010 at 12:22 PM, Rafael J. Wysocki <[email protected]> wrote:
> > That said, I think we may do something different that perhaps will make your
> > life somewhat easier.
> ...
> > So, I think we can add a "runtime only" flag working as described above.
>
> That will definitely solve the suspend/resume issue we're having.
>
> But due to /sys/devices/.../power/control, we will still have to
> bypass runtime PM for this particular device.
>
> We need a way to unconditionally power down the device, and as long as
> runtime PM can be disabled by the user, we can't use it.

I'm still not aware of the details of your situation. Nevertheless, it
shouldn't be hard to set up a suspend() routine that would do whatever
was needed to power down the device, whether this means calling the
runtime_suspend() routine directly or something else. That's basically
what every other driver does.

By definition, system sleep transitions require bypassing runtime PM.
There's nothing odd or unusual about it.

Alan Stern

2010-12-26 18:36:15

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sunday, December 26, 2010, Ohad Ben-Cohen wrote:
> On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
> > Why does the driver need the device to be reset even though it hasn't
> > been suspeneded yet?
>
> Because it is asked to stop the hardware by mac80211.

So I guess the mac80211 layer should ask it to start it again.

Thanks,
Rafael

2010-12-18 21:30:19

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, 18 Dec 2010, Ohad Ben-Cohen wrote:

> > Sounds to me like the difference isn't really in the driver, but the
> > core PM subsystem. Why does it care when powering off a device whether
> > it's during suspend, or during runtime?
>
> Agree.
>
> If we can add a dev_pm_info bit, that would allow using runtime PM API
> during suspend/resume transitions, the driver will not have to care.
>
> Rafael what do you think ? Is that totally unacceptable ?

Have you forgotten about the "echo on >.../power/control" scenario?

Alan Stern

2010-12-26 02:48:11

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, 25 Dec 2010, Ohad Ben-Cohen wrote:

> On Sat, Dec 25, 2010 at 6:21 PM, Alan Stern <[email protected]> wrote:
> > Right. ?You may or may not realize it, but this requirement means that
> > the driver _must_ bypass runtime PM sometimes.
>
> http://www.spinics.net/lists/linux-pm/msg22864.html
>
> > Now you've lost me. ?Which of the driver's handlers are you talking
> > about?
>
> The driver's handler, which is called by mac80211, and is responsible
> to power off the device.
> The _same_ handler is being called either during runtime or during a
> system transition to suspend

Is there a separate handler responsible for powering the device back
on? What are the names of these handlers?

> > What races?
>
> Driver thinks power is off and device is now fully reset, but it's isn't really

That's not a race; it's just a misunderstanding. A race is when you
have two threads of control executing concurrently and either one can
end up carrying out some action first.

> > Why does the driver have to _assume_ the device was powered off -- why
> > doesn't it simply _do_ the power down?
>
> runtime PM is disabled during suspend. and the driver doesn't know
> that the system is suspending, and it is using pm_runtime_put_sync().

What is the name of the routine that calls pm_runtime_put_sync()?
What is the name of the driver's runtime_suspend routine? Is either of
them the same as the handler you mentioned above? And while we're at
it, what is the name of the routine that _actually_ changes the
device's power level?

It's very difficult to talk about different pieces of code without
knowing their names.

> It's the same code that executes during runtime and while system is
> suspending

Okay, there's nothing wrong with that. But the code that runs during
system suspend should not call pm_runtime_put_sync().

> Even if we changed mac80211 to tell us the system is suspending, so we
> would power off the device directly in this case, it won't solve all
> of our problems, because even during runtime we need to bypass runtime
> PM due to /sys/devices/.../power/control.

Evidently you are facing two distinct problems, and they need to be
solved together. In fact, solving one problem (bypassing runtime PM)
will probably help to solve the other (doing system resume correctly).

> > Maybe it would help if you provided a list of the relevant subroutines
> > together with descriptions of the circumstances under which they are
> > called and what they are expected to do. ?For example, a brief
> > pseudo-code listing.
>
> Frankly, I don't think it's worth it.
>
> We're just a single use case and Rafael already said he won't support it.

Well, I'd like to help you find the best solution, but I can't because
I don't understand how the subsystem and driver are structured, and you
aren't providing enough details. We won't make any further progress on
this unless you abandon the incomplete high-level overviews and instead
give a more or less complete lower-level description -- including the
subroutine names!

> > It's not necessary to maintain usage_count while you're bypassing
> > runtime PM. ?Just make sure when you're done that everything is back in
> > sync.
>
> I think you are missing the fact that we will have to bypass runtime
> PM also during runtime, and not only in suspend/resume, and that's due
> to /sys/devices/.../power/control.

No, I realize that.

> That's why we will also have to maintain usage_count - to prevent
> dpm_complete() from powering the device down, when it is really up.

When else should dpm_complete() power the device down? You certainly
don't want to power the device down when it is already down! :-)

Maybe you mean that you want to prevent dpm_complete() from powering
the device down while it is in use. That should never be a problem --
the device should never be used without the PM core being aware. The
unusual cases you have described all involve powering the device down
at times when it is supposed to be in use (e.g., in order to do a
reset). If you do these power-downs directly instead of trying to use
runtime PM, then the PM core will never think the device is idle when
it really is being used.

Ultimately it boils down to this. You have several possible reasons
for powering-down the device: runtime PM, system sleep, and reset (or
other similar driver-specific things). Presumably the reset case
occurs only while the device is in use, and with sufficient locking to
prevent the driver from trying to access the device while the reset is
in progress.

Therefore you need to have a single routine that actually powers-down
the device, and you need to call this routine in different settings:
during system sleep or runtime suspend (the same code in the driver
handles these two cases, right?) and when a reset is needed.

Alan Stern

2010-12-21 00:57:50

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Mon, 20 Dec 2010, Rafael J. Wysocki wrote:

> > > In that case, if a device if flagged as "runtime only", we can avoid
> > > calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
> > > calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
> > > to fail system suspend (or hibernation) if a "runtime only" device has the
> > > power.runtime_auto flag unset.
> >
> > Or more generally, if pm_runtime_suspended() doesn't return 'true' for
> > the device.
>
> That's not necessary, because the device may be suspended using
> pm_runtime_suspend() later than we check pm_runtime_suspended().

What if the device has a child in the RPM_ACTIVE state? Then
pm_runtime_suspend() won't do anything, even if the child really is
dpm-suspended.

> I'd use the "runtime only" (or perhaps better "no_dpm") flag as a declaration
> (if set) that the device is going to be suspended with the help of "runtime"
> callbacks and the driver takes the responsibility for getting things right.

I'm still not sure about this; the design isn't clear. Are these
runtime callbacks going to come from the PM core or from the driver?
If from the driver, how will the driver know when to issue them? What
about coordinating async suspends (the device must be suspended after
its children and before its parent)?

Alan Stern

2010-12-28 20:41:50

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, Dec 28, 2010 at 10:04 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Tuesday, December 28, 2010, Ohad Ben-Cohen wrote:
>> On Sun, Dec 26, 2010 at 8:37 PM, Rafael J. Wysocki <[email protected]> wrote:
>> > So, it only happens during asynchronous suspend? ?In other words, if suspend
>> > is synchronous, everything should be fine, right?
>>
>> Not necessarily.
>
> So it's not a race after all, is it?

There are several scenarios to the same problem.. In one of them, we
were racing against an event that caused a suspend handler of an
entirely unrelated device to fail. I'm trying very hard not to
overload this thread with irrelevant details.. but anyway this forked
discussion is a bit moot IMHO

> Second, what you'd really want to do (I guess) is:
>
> pm_runtime_put_noidle(device);
> device->bus->pm->runtime_suspend(device);

Yes, our workaround is somewhere in these lines. We're using it
regardless of the system state (runtime or suspending), and frankly,
we're happy with it, just like I said in:

http://www.spinics.net/linux/lists/linux-mmc/msg05052.html

I still call it a workaround though...

2010-12-26 11:46:22

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sunday, December 26, 2010, Ohad Ben-Cohen wrote:
> On Sun, Dec 26, 2010 at 4:48 AM, Alan Stern <[email protected]> wrote:
> > Is there a separate handler responsible for powering the device back
> > on? What are the names of these handlers?
>
> wl1271_sdio_power_on() and wl1271_sdio_power_off().
>
> If you're taking a look, please do so using the latest code as seen on mmc-next:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git mmc-next
>
> > That's not a race
>
> It is.
>
> If system suspend will continue uninterruptedly, and the driver will
> be suspended (by the SDIO subsystem), then the device will be powered
> down, and everything will work.
>
> But if something will interrupt the suspend transition _before_ our
> driver is suspended (e.g. some other suspend() handler will fail),
> then we have a problem, because the device will not be reset as the
> driver needs it to be.

Now, that's interesting. (It still is not a race, though.)
Why does the driver need the device to be reset even though it hasn't
been suspeneded yet?

Rafael

2010-12-26 18:37:44

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sunday, December 26, 2010, Ohad Ben-Cohen wrote:
> On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
> > (It still is not a race, though.)
>
> Thread 1
> =======
>
> suspend_handler_of_a_random_device()
> {
> return failure;
> }
>
> Thread 2
> =======
>
> suspend_handler_of_our_mmc_host_controller()
> {
> invoke our sdio suspend handler and power down the card (directly);
> return success;
> }
>
> If thread 2 wins => everything is cool
> If thread 1 wins => our driver will not be able to resume
>
> The race begins after mac80211 invokes our driver's power off handler,
> which is using pm_runtime_put_sync().

So, it only happens during asynchronous suspend? In other words, if suspend
is synchronous, everything should be fine, right?

Rafael

2010-12-28 19:22:35

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tuesday, December 28, 2010, Ohad Ben-Cohen wrote:
> On Sun, Dec 26, 2010 at 8:35 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Sunday, December 26, 2010, Ohad Ben-Cohen wrote:
> >> On Sun, Dec 26, 2010 at 1:45 PM, Rafael J. Wysocki <[email protected]> wrote:
> >> > Why does the driver need the device to be reset even though it hasn't
> >> > been suspeneded yet?
> >>
> >> Because it is asked to stop the hardware by mac80211.
> >
> > So I guess the mac80211 layer should ask it to start it again.
>
> It does...
>
> And at this point the driver will try to boot a new firmware, but it
> will only succeed if the device was indeed powered off. If it wasn't
> (system suspend was cancelled before the host controller suspended),
> the driver will fail to resume the device (because it can't reset it).

It looks like you could simply do a power down-power up cycle before trying to
load new firmware, just in case. I guess that's suboptimal for some reason?

> We can change man80211 to let us know the system is suspending, and
> then we will power down the device directly. Or we can use something
> like your "runtime only" proposal, and then pm_runtime_put_sync() will
> just work for us regardless of the system state.

The pm_runtime_put_sync() is irrelevant at this point IMHO. First, we should
figure out what needs to be done at the low level and _then_ think how to
code it. From that we'll learn if you _really_ need anything new from the PM
core, but quite frankly I seriously doubt it right now.

> But that will not solve the /sys/devices/.../power/control problem.
> For that we will either have always to bypass runtime PM, or introduce
> something like "always auto"...

Please pretend that the runtime PM framework doesn't exist for a while. How
would you design things in that case?

Rafael

2010-12-28 19:05:08

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, Dec 26, 2010 at 7:00 PM, Alan Stern <[email protected]> wrote:
> Hmm. ?It's a little difficult to untangle the web of dev_pm_ops
> pointers and other stuff.

Yeah, SDIO suspend/resume is very different from other subsystems.

There are several layers of abstractions involved, from the host
controller driver, to the MMC core code, to the SDIO core code, to
finally get to the actual SDIO function driver.

I'll try to explain it below by addressing your questions. If
something is still unclear, feel free to ask me.

> There's wl1271_suspend() and wl1271_resume()
> (which don't do anything)

It just looks like they don't. In fact, by returning 0,
wl1271_suspend() is telling the SDIO subsystem that it's OK to power
down the SDIO function (in general an SDIO card can have several SDIO
functions, each of which may be driven by a separate SDIO driver). If
all the SDIO functions agreed, then SDIO core (mmc_sdio_suspend())
tells MMC core (mmc_suspend_host()) it's OK to power down the card (by
directly calling mmc_power_off()).

> Under what circumstances does the MMC/SDIO core call
> wl1271_sdio_set_power(), and where are those function calls?

The relations are actually reversed: the MMC/SDIO core never calls
this function.

This function is called by the wl1271 driver itself, mostly as a
response to a request by the mac80211 layer (but also as a response to
a hardware error), that requires manipulation of the power of the
card.

There are a handful of reasons this may happen:

error recovery:

wl1271_recovery_work() ->
__wl1271_op_remove_interface() ->
wl1271_power_off() ->
wl1271_sdio_set_power() ->
wl1271_sdio_power_off() ->
pm_runtime_put_sync()

wlan interface goes down:

ieee80211_do_stop() ->
wl1271_op_remove_interface() ->
__wl1271_op_remove_interface() ->
...

system suspend:

__ieee80211_suspend ->
wl1271_op_remove_interface() ->
...

> The normal case is that system suspend succeeds. ?Then there is no
> race. ?If system suspend fails then the failure may occur either before
> or after the wl1271 device is powered down; _that_ is the race
> (assuming you are using asynchronous PM). ?But normally it doesn't
> occur.

True. On my setup I have stressed the solution for long nights (using
/sys/power/pm_test) without any error. But we found other setups with
different circumstances where this error showed up.

But it's not only due to the asynchronous PM race - a device that was
added to the device tree after the host controller, but before
mac80211, has a chance to abort a system suspend (by returning an
error in its suspend handler) at a point in time which will leave our
driver thinking (erroneously) that the wl1271 was powered off.

> You mean, the wl1271 device has no PM methods of its own; its power is
> managed entirely by the parent MMC/SDIO controller?

Yes.

> If that's right
> then you should call pm_runtime_no_callbacks() for the wl1271 device.

Yeah, that's in my queue...

>
> Is it possible for there to be more than one device connected to the
> same MMC/SDIO controller?

Generally yes, host controllers may have several slots, but to the
MMC/SDIO core, it is presented as each slot is a separate host
controller (personally I have never seen such hardware, but I did find
an example for that in the host controller's source folder).

> That's for runtime PM. ?What about system sleep? ?It looks like
> sdio_bus_type has no callbacks for suspend or resume, so the driver's
> callbacks get used instead -- that would be wl1271_suspend() and
> wl1271_resume(). ?But they don't do anything, so there's no power
> change until the parent MMC/SDIO controller is suspended. ?That would
> be in mmc_sdio_suspend() and mmc_sdio_resume()? ?But I can't tell what
> pmops->suspend and pmops->resume methods they end up calling. ?Are
> these the entries in wl1271_sdio_pm_ops? ?Does that mean
> wl1271_suspend() and wl1271_resume() get called twice?

No, wl1271_suspend() and wl1271_resume() are called once, but it's all
triggered by the host controller's suspend/resume handlers.

For example:

omap_hsmmc_suspend() ->
mmc_suspend_host() ->
mmc_sdio_suspend() ->
wl1271_suspend()

And then, if there are no errors, mmc_suspend_host() will eventually
call mmc_power_off(), which powers off the card directly. So only at
this point will our device get eventually shut down.

> I think I already understand how the runtime suspend/resume paths work:
> There are no runtime PM methods for the wl1271 device, so the PM core
> simply propagates changes up to the parent MMC/SDIO device and ends up
> calling mmc_runtime_suspend() or mmc_runtime_resume(). ?Right?

Right.

> What is the pathway for reset (or other similar activities that
> require the device to be powered-down while it is in use)?

During runtime, the device will have to be powered off for error
recovery and every time the wlan interface goes down. Please see the
pathways above.

Thanks,
Ohad.

2010-12-29 08:02:01

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, Dec 28, 2010 at 11:46 PM, Alan Stern <[email protected]> wrote:
> Similarly, during system suspend mmc_suspend_host() should be
> responsible for doing all the necessary power-down operations. ?Runtime
> PM is completely out of the picture at this time. ?And this should be
> independent of mac80211 -- in fact, the card should be powered down
> appropriately even if the kernel doesn't have a mac80211 layer.

And it is.

But in order to boot the firmware on resume, we need to reset the
device. And if system suspend has been cancelled before
mmc_power_off() was invoked, we will not be able to. So, in this case
too, the driver will have to power the device off.

It all really boils down to the same solution - we will always have to
bypass runtime PM and directly control the power of the device.

2010-12-18 16:40:17

by Johannes Berg

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, 2010-12-18 at 18:00 +0200, Ohad Ben-Cohen wrote:

> > That's where the problem is. If there's a difference, from the driver's
> > point of view, between suspend and some other operation, there should be a
> > way to tell the driver what case it actually is dealing with.
>
> Yes, the problem will be solved if the driver would bypass the runtime
> PM framework on system suspend. mac80211 obviously has this
> information, and technically it's very easy to let the driver know
> about it.
>
> But the difference between suspend and normal operation is really
> artificial: in both cases mac80211 just asks the driver to power its
> device down, and the end result is exactly the same (a GPIO line of
> the device is de-asserted in our case). The difference between these
> two scenarios
> exist only because runtime PM is effectively disabled during system
> suspend, and therefore the driver has to look for an alternative way
> to power down the device.

Sounds to me like the difference isn't really in the driver, but the
core PM subsystem. Why does it care when powering off a device whether
it's during suspend, or during runtime?

johannes

2010-12-20 03:38:01

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sun, 19 Dec 2010, Rafael J. Wysocki wrote:

> That said, I think we may do something different that perhaps will make your
> life somewhat easier.
>
> Namely, if your driver doesn't implement any system suspend callbacks
> (i.e. ->suspend(), ->resume(), ->freeze(), ->thaw() etc.) and it doesn't
> want the analogous subsystem callbacks to be executed for the device, it will
> make sense to flag the device as "runtime only", or something like this,
> which make the PM core skip the device in dpm_suspend() etc.
>
> In that case, if a device if flagged as "runtime only", we can avoid
> calling pm_runtime_get_noirq() for it in dpm_prepare() and, analogously,
> calling pm_runtime_put_sync() for it in dpm_complete(). However, we will have
> to fail system suspend (or hibernation) if a "runtime only" device has the
> power.runtime_auto flag unset.

Or more generally, if pm_runtime_suspended() doesn't return 'true' for
the device. But if the device gets suspended asynchronously, this may
very well happen. For example, an i2c device is originally runtime
suspended, but its device_suspend() call occurs at the same time as the
call for the RTC device, so the i2c device actually happens to be
resumed at that time in order to communicate with the RTC.

> So, I think we can add a "runtime only" flag working as described above.
> I guess it will also help in the case I've been discussing with Kevin for some
> time (i2c device using runtime PM used by an RTC in a semi-transparent
> fashion).
>
> Alan, what do you think?

I'm not sure. In this situation, should we worry more that we usually
do about the possibility of a runtime resume occurring after the device
has gone through device_suspend()? Or just depend on the driver to do
everything correctly?

Alan Stern

2010-12-25 21:50:55

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

2010-12-22 01:43:00

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tue, 21 Dec 2010, Rafael J. Wysocki wrote:

> It basically goes like this. There's device A that is only resumed when it's
> needed to carry out an operation and is suspended immediately after that.
> There's another device B that needs A to do something during its suspend.
> So, when the suspend of B is started, A is woken up, does its work and is
> suspended again (using pm_runtime_suspend()). Then B is suspended.
>
> We currently require that ->suspend() and ->resume() callbacks be defined
> for A (presumably pointing to the same code as its runtime callbacks) so that
> things work correctly, but perhaps we can just relax this requirement a bit?
> I'm not 100% sure that's a good idea, just considering it.

I still don't know. It would require a lot of special conditions: no
child devices, not runtime-PM-disabled, not runtime-PM-forbidden...
Also, A's parent would have to be coded carefully; otherwise A's
runtime resume would prevent the parent from suspending.

This just doesn't fit very well with the runtime PM model, or at least,
not in the form you described.

Consider this instead: Since A is required to be functional before B
can be used, A must be registered before B and hence B gets suspended
before A. Therefore during the prepare phase we can runtime-resume A
and leave it powered up; when B needs to suspend, it won't matter that
the runtime-PM calls are ineffective. Then when A's dpm_suspend
occurs, it can safely go to a low-power state and stay there.

Alan Stern

2010-12-25 16:21:45

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Sat, 25 Dec 2010, Ohad Ben-Cohen wrote:

> I'll try to explain, and if something is still not clear, please let me know.
>
> Our wlan device is an ARM microcontroller running some flavor of RTOS
> (i.e. the firmware); as I mentioned before, its power and reset
> functionalities are tied together (as far as software is concerned. a
> small detail I won't get into now). The ability to unconditionally
> power it down is needed for error recovery, for booting a new firmware
> (and for unconditionally stopping all radios...).

Okay.

> The driver assumes it can unconditionally power down the device, and
> is built around this assumption, so a user interface such as
> /sys/devices/.../power/control has no sense, and if set to 'on', will
> be fatal for this driver (e.g. it will not be possible to bring the
> wlan interface down and up).
>
> There is no way around this. The driver must be able to
> unconditionally power the hardware down.

Right. You may or may not realize it, but this requirement means that
the driver _must_ bypass runtime PM sometimes.

> About the suspend/resume issue:
>
> This is a mac80211 driver. It is basically a set of "dumb" handlers,
> which mac80211 uses to abstract the hardware differences between its
> various drivers. For example, there are handlers to take down/up an
> interface, to start/stop the hardware, etc..
>
> When one of the driver's handlers is being called, the driver doesn't
> really know (or care) if this is during runtime or not. If it is asked
> to stop its hardware, it should just do so.

With you so far.

> And when (in our case)
> this handler is invoked during system suspend, any disability to power
> off the device immediately opens a window for races due to the
> driver's assumption, on resume, that the device was powered off
> successfully.

Now you've lost me. Which of the driver's handlers are you talking
about? The one responsible for powering down the device? What races?
Why does the driver have to _assume_ the device was powered off -- why
doesn't it simply _do_ the power down? (As you said above, if it is
asked to stop its hardware, it should just do so.)

> So, yeah, we do have a suspend() handler, which powers
> the device off directly, but we need the "runtime" handler of the

Which "runtime" handler? Are you talking about the runtime_suspend
method, the runtime_resume method, the runtime_idle method, or
something else?

> driver to immediately succeed too in order to prevent the race (and
> then it's fully powered off and we wouldn't need to wait for the
> suspend() handler to do so too). For that to happen, we need runtime
> PM not to be disabled during system suspend (or by
> /sys/devices/.../power/control).

At this point I'm so confused, it's hard to ask a rational question.
What goes wrong if runtime PM is disabled during system suspend?
Since the driver must bypass runtime PM anyway, what difference does it
make if runtime PM is disabled?

> Tweaking the driver around this limitation (to realize somehow if the
> hardware was really powered down or not) doesn't make sense and
> frankly isn't worth the effort, since the driver anyway has to be able
> to unconditionally power down the device for the aforementioned
> reasons (error recovery, reboot a new fw, disable radios, ..).
>
> A small comment: SDIO drivers' suspend handler is actually triggered
> by the mmc host controller's suspend routine (through the SDIO
> subsystem); it's not the classic dpm-triggered suspend handler, so
> Rafael's notion of "runtime only" flag will work here.

Maybe it would help if you provided a list of the relevant subroutines
together with descriptions of the circumstances under which they are
called and what they are expected to do. For example, a brief
pseudo-code listing.

> Why runtime PM:
>
> The device has an SDIO interface, so one of its incarnations is an SDIO driver.
>
> Short background: MMC/SDIO cards are dynamically probed, identified
> and configured by the MMC/SDIO subsystem, so toggling their power must
> take place from within the MMC/SDIO subsystem itself.
>
> Until recently, MMC/SDIO cards were kept powered on from the moment
> they were inserted, up until they were removed (exception: power was
> removed on system suspend and brought up back on resume. there is an
> exception to this exception, too, but I won't get into that now).
>
> Recently, we have added runtime PM support to the MMC/SDIO subsystem,
> so cards can be powered down when they're not in use. E.g., an SDIO
> card is powered down if no driver is available or requires power to
> it, and an MMC card might be powered off after some period of
> inactivity (the latter is just being discussed and has not yet hit
> mainline).
>
> As far as the MMC/SDIO subsystem is concerned, this is merely a power
> optimization, and it's perfectly fine if the user will disable runtime
> PM for the card and by that disallow powering it down.
>
> But for our particular device this is fatal; as explained, the driver
> must have unconditional control of the device's power.
>
> So we need runtime PM at the subsystem level (to allow the MMC/SDIO
> subsystem power it off in case the driver is not loaded), but we will
> have no choice but bypass runtime PM at the driver level, in order to
> avoid the aforementioned suspend race, and a potential
> /sys/devices/.../power/control block.

That all makes sense, and there's nothing wrong with it. (Except that
I still don't know what suspend race you mean.)

> To maintain coherency with the runtime-PM's usage_count

It's not necessary to maintain usage_count while you're bypassing
runtime PM. Just make sure when you're done that everything is back in
sync.

For example, the USB subsystem bypasses runtime PM for interface
drivers; the interface driver is told to suspend at the time the
interface's parent device driver suspends. This means it's possible to
see that an interface's runtime PM status is RPM_SUSPENDED even though
the interface driver's suspend method hasn't been called. In the end
it all works out okay.

> (and by that
> prevent dpm_complete() from powering off our device after a system
> resume)

Why shouldn't dpm_complete cause the device to be powered down
(assuming the device isn't in use, of course)?

> we will also keep calling the pm_runtime_{get, put} API
> appropriately.
>
> It's not pretty, but this is the only way we can make this work
> (unless, of course, our use case will be supported within the
> runtime-PM framework itself).

It doesn't sound any less pretty than the situation in USB, so you
shouldn't be too concerned about the aesthetics. :-)

Alan Stern

2010-12-28 20:04:58

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Tuesday, December 28, 2010, Ohad Ben-Cohen wrote:
> On Sun, Dec 26, 2010 at 8:37 PM, Rafael J. Wysocki <[email protected]> wrote:
> > So, it only happens during asynchronous suspend? In other words, if suspend
> > is synchronous, everything should be fine, right?
>
> Not necessarily.

So it's not a race after all, is it?

> Consider this simple scenario, where a device was added after the mmc
> host controller, but before mac80211. In this case its suspend handler
> will have the chance to abort system suspend after mac80211 already
> told our driver to power down the device (but the device wasn't
> powered down yet, because the driver used pm_runtime_put_sync() which
> is disabled).

Well, first, you shouldn't rely on pm_runtime_put_sync() to actually _suspend_
the device at any point. What it does is to call pm_runtime_idle() for the
device, which isn't guaranteed to suspend it. If you want the device to
be suspended, you should use

pm_runtime_put_noidle(device);
pm_runtime_suspend(device);

or, alternatively

pm_runtime_put_sync_suspend(device);

(which equivalent to the above pair of callbacks, but is not available in
kernels prior to 2.6.37-rc1).

Second, what you'd really want to do (I guess) is:

pm_runtime_put_noidle(device);
device->bus->pm->runtime_suspend(device);

(I have omitted all of the usual checks for simplicity), because that would
_unconditionally_ put your device into a low-power state. No?

The problem is at this point the PM core will think the device is still
RPM_ACTIVE, so it will be necessary to additionally do something like:

pm_runtime_disable(device);
pm_runtime_set_suspended(device);
pm_runtime_enable(device);

Of course, you'll need to ensure there are no races between that and
any other code path that may want to resume the device simultaneously.
And here it backfires, because you have to synchronize not only with
runtime resume, but also with system suspend and possibly resume.

Rafael

2010-12-18 22:57:59

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Saturday, December 18, 2010, Alan Stern wrote:
> On Sat, 18 Dec 2010, Ohad Ben-Cohen wrote:
>
> > > Sounds to me like the difference isn't really in the driver, but the
> > > core PM subsystem. Why does it care when powering off a device whether
> > > it's during suspend, or during runtime?
> >
> > Agree.
> >
> > If we can add a dev_pm_info bit, that would allow using runtime PM API
> > during suspend/resume transitions, the driver will not have to care.
> >
> > Rafael what do you think ? Is that totally unacceptable ?
>
> Have you forgotten about the "echo on >.../power/control" scenario?

Well, that change would basically require the runtime PM framework to ignore
the usage count for this particular device, which would defeat the framework's
purpose to some extent, but it would cover the "echo on > ..." case.

However, I'm not going to agree to make that change.

Thanks,
Rafael

2011-01-27 23:12:17

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Thursday, January 27, 2011, Alan Stern wrote:
> On Wed, 26 Jan 2011, Kevin Hilman wrote:
>
> > >> Consider this instead: Since A is required to be functional before B
> > >> can be used, A must be registered before B and hence B gets suspended
> > >> before A. Therefore during the prepare phase we can runtime-resume A
> > >> and leave it powered up; when B needs to suspend, it won't matter that
> > >> the runtime-PM calls are ineffective.
> > >
> > > We don't really need to do that, because the runtime resume _is_ functional
> > > during system suspend.
>
> Not asynchronous runtime resume, because the workqueue is frozen. But
> that's not the issue here.
>
> > > The only thing missing is a ->suspend() callback for A
> > > (and a corresponding ->resume() callback to make sure A will be available to
> > > B during system resume).
> > >
> >
> > OK, I'm finally back to debugging this problem and looking for a final
> > solution.
> >
> > I agree that what is needed is ->suspend() and ->resume() callbacks for
> > A, but the question remains how to implement them.
> >
> > In my case, A doesn't need runtime callbacks, but *does* require that
> > the subsystem callbacks are called because the subsystem actually does
> > all the real PM work. On OMAP, the device PM code (clock mgmt, device
> > low-power states, etc.) is common for all on-chip devices, so is handled
> > in common code at the subsystem level (in this case, platform_bus.)
> >
> > Therefore, what is ideally needed is the ability for A's suspend to
> > simply call pm_runtime_suspend() so the subsystem can do the work.
> > However, since runtime transitions are locked out by this time, that
> > doesn't work. IOW, what is needed is a way for a system suspend to say
> > "please finish the runtime suspend that was already requested."
>
> Calling the runtime_suspend method directly is the way to do it.
>
> > What I've done to work around this in driver A is to manually check
> > pm_runtime_suspended() and directly call the subsystem's runtime
> > suspend/resume (patch below[1]. NOTE, I've used the _noirq methods to
> > ensure device A is available when device B needs it.)
>
> Hmm. The pm_runtime_suspended() check may not be needed (if A were
> suspended already then B would have encountered problems). But
> including it doesn't hurt.
>
> Using the _noirq method isn't a good idea, unless you know for certain
> that the runtime_suspend handler doesn't need to sleep.

That requirement is long gone, because timer interrupts are still enabled
when the _noirq() callbacks are run. The _noirq part means that the driver's
own interrupt handler is guaranteed not to run concurrently with the routines
(unless this also is a timer interrupt, that is).

Rafael

2011-01-26 23:35:21

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

"Rafael J. Wysocki" <[email protected]> writes:

> On Wednesday, December 22, 2010, Alan Stern wrote:
>> On Tue, 21 Dec 2010, Rafael J. Wysocki wrote:
>>
>> > It basically goes like this. There's device A that is only resumed when it's
>> > needed to carry out an operation and is suspended immediately after that.
>> > There's another device B that needs A to do something during its suspend.
>> > So, when the suspend of B is started, A is woken up, does its work and is
>> > suspended again (using pm_runtime_suspend()). Then B is suspended.
>> >
>> > We currently require that ->suspend() and ->resume() callbacks be defined
>> > for A (presumably pointing to the same code as its runtime callbacks) so that
>> > things work correctly, but perhaps we can just relax this requirement a bit?
>> > I'm not 100% sure that's a good idea, just considering it.
>>
>> I still don't know. It would require a lot of special conditions: no
>> child devices, not runtime-PM-disabled, not runtime-PM-forbidden...
>> Also, A's parent would have to be coded carefully; otherwise A's
>> runtime resume would prevent the parent from suspending.
>>
>> This just doesn't fit very well with the runtime PM model, or at least,
>> not in the form you described.
>>
>> Consider this instead: Since A is required to be functional before B
>> can be used, A must be registered before B and hence B gets suspended
>> before A. Therefore during the prepare phase we can runtime-resume A
>> and leave it powered up; when B needs to suspend, it won't matter that
>> the runtime-PM calls are ineffective.
>
> We don't really need to do that, because the runtime resume _is_ functional
> during system suspend. The only thing missing is a ->suspend() callback for A
> (and a corresponding ->resume() callback to make sure A will be available to
> B during system resume).
>

OK, I'm finally back to debugging this problem and looking for a final
solution.

I agree that what is needed is ->suspend() and ->resume() callbacks for
A, but the question remains how to implement them.

In my case, A doesn't need runtime callbacks, but *does* require that
the subsystem callbacks are called because the subsystem actually does
all the real PM work. On OMAP, the device PM code (clock mgmt, device
low-power states, etc.) is common for all on-chip devices, so is handled
in common code at the subsystem level (in this case, platform_bus.)

Therefore, what is ideally needed is the ability for A's suspend to
simply call pm_runtime_suspend() so the subsystem can do the work.
However, since runtime transitions are locked out by this time, that
doesn't work. IOW, what is needed is a way for a system suspend to say
"please finish the runtime suspend that was already requested."

What I've done to work around this in driver A is to manually check
pm_runtime_suspended() and directly call the subsystem's runtime
suspend/resume (patch below[1]. NOTE, I've used the _noirq methods to
ensure device A is available when device B needs it.)

While this works, I'm not crazy about it since it requires the driver
know about the subsystem (in this case the bus) where the real PM work
is done. IMO, it would be much more intuitive (and readable) if the
driver's suspend hooks could simply trigger a runtime suspend (either a
new one, or one already requested.)

FWIW, another hack that I've experimented with is just to just re-enable
runtime transitions by doing a pm_runtime_put_sync() in the
suspend_noirq method and a pm_runtime_get_noresume() in the resume_noirq
method. This allows driver A to suspend since it has already requested
runtime, but I don't expect this second approach to be popluar. :) This
second approach also doen't address system suspend/resume when the user
has disabled runtime PM via /sys/devices/.../power/control.

Kevin

[1]
diff --git a/drivers/i2c/busses/i2c-omap.c b/drivers/i2c/busses/i2c-omap.c
index b605ff3..a4bc15a 100644
--- a/drivers/i2c/busses/i2c-omap.c
+++ b/drivers/i2c/busses/i2c-omap.c
@@ -1137,12 +1137,40 @@ omap_i2c_remove(struct platform_device *pdev)
return 0;
}

+#ifdef CONFIG_SUSPEND
+static int omap_i2c_suspend(struct device *dev)
+{
+ if (!pm_runtime_suspended(dev))
+ if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_suspend)
+ dev->bus->pm->runtime_suspend(dev);
+
+ return 0;
+}
+
+static int omap_i2c_resume(struct device *dev)
+{
+ if (!pm_runtime_suspended(dev))
+ if (dev->bus && dev->bus->pm && dev->bus->pm->runtime_resume)
+ dev->bus->pm->runtime_resume(dev);
+
+ return 0;
+}
+
+static struct dev_pm_ops omap_i2c_pm_ops = {
+ .suspend_noirq = omap_i2c_suspend,
+ .resume_noirq = omap_i2c_resume,
+};
+#else
+#define omap_i2c_pm_ops NULL
+#endif
+
static struct platform_driver omap_i2c_driver = {
.probe = omap_i2c_probe,
.remove = omap_i2c_remove,
.driver = {
.name = "omap_i2c",
.owner = THIS_MODULE,
+ .pm = &omap_i2c_pm_ops,
},
};

2011-01-27 20:15:17

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

Alan Stern <[email protected]> writes:

> On Thu, 27 Jan 2011, Kevin Hilman wrote:
>
>> > Calling the runtime_suspend method directly is the way to do it.
>> >
>>
>> Do you mean the driver's runtime_suspend method, or the subsystem's
>> runtime_suspend method?
>
> The subsystem's. If the driver has a runtime_suspend method then the
> subsystem's method will call it.

Yes. Thanks for the clarification.

>> >> While this works, I'm not crazy about it since it requires the driver
>> >> know about the subsystem (in this case the bus) where the real PM work
>> >> is done. IMO, it would be much more intuitive (and readable) if the
>> >> driver's suspend hooks could simply trigger a runtime suspend (either a
>> >> new one, or one already requested.)
>> >
>> > This isn't clear to me. Isn't the driver registered on the bus in
>> > question? Can't the driver therefore call the bus's runtime_suspend
>> > routine directly, instead of dereferencing the bus->pm->runtime_suspend
>> > pointer?
>>
>> Not sure what you mean by directly. The platform_bus doesn't expose
>> its runtime PM methods since they can be customized at runtime, so they
>> have to be called via bus->pm.
>>
>> Or do you mean using dev->driver instead of dev->bus?
>
> You're doing all of this for OMAP, right? What is the subsystem's
> runtime_suspend routine? Is it omap_pm_runtime_suspend()?

Yes.

> If it is, then you can call omap_pm_runtime_suspend() directly instead
> of calling dev->bus->pm->runtime_suspend().

Personally, I prefer going through dev->bus as we try to avoid SoC
specific calls in the driver.

This same HW block might be re-used on non-OMAP SoCs (e.g. TI DaVinci)
that would have different PM at the susbystem level.

So, to summarize, as long as folks are OK with drivers directly calling
the subsystem runtime PM callbacks, I'll go this route.

Kevin

2011-01-27 18:20:12

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

Hi Kevin,

> Therefore, what is ideally needed is the ability for A's suspend to
> simply call pm_runtime_suspend() so the subsystem can do the work.
> However, since runtime transitions are locked out by this time, that
> doesn't work. ?IOW, what is needed is a way for a system suspend to say
> "please finish the runtime suspend that was already requested."
>
> What I've done to work around this in driver A is to manually check
> pm_runtime_suspended() and directly call the subsystem's runtime
> suspend/resume (patch below[1]. ?NOTE, I've used the _noirq methods to
> ensure device A is available when device B needs it.)

suppose this driver runs on a platform that has runtime PM disabled.
How is it going to work then?

Thanks,
Vitaly

2011-01-27 23:49:43

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

"Rafael J. Wysocki" <[email protected]> writes:

[...]

>>
>> So, to summarize, as long as folks are OK with drivers directly calling
>> the subsystem runtime PM callbacks, I'll go this route.
>
> This is perfectly fine as long as you ensure that they won't be called
> concurrently through the runtime PM framework.

Yes, as long as they're only used in the suspend/resume (or _noirq
versions) this is guaranteed since the DPM core is already preventing
runtime PM transitions during suspend/resume.

Kevin

2011-01-27 18:54:24

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

Vitaly Wool <[email protected]> writes:

> Hi Kevin,
>
>> Therefore, what is ideally needed is the ability for A's suspend to
>> simply call pm_runtime_suspend() so the subsystem can do the work.
>> However, since runtime transitions are locked out by this time, that
>> doesn't work. IOW, what is needed is a way for a system suspend to say
>> "please finish the runtime suspend that was already requested."
>>
>> What I've done to work around this in driver A is to manually check
>> pm_runtime_suspended() and directly call the subsystem's runtime
>> suspend/resume (patch below[1]. NOTE, I've used the _noirq methods to
>> ensure device A is available when device B needs it.)
>
> suppose this driver runs on a platform that has runtime PM disabled.
> How is it going to work then?
>

Then pm_runtime_suspended() will be false, and the bus methods will
suspend the device.

Kevin

2011-01-27 19:49:02

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Thu, 27 Jan 2011, Kevin Hilman wrote:

> > Calling the runtime_suspend method directly is the way to do it.
> >
>
> Do you mean the driver's runtime_suspend method, or the subsystem's
> runtime_suspend method?

The subsystem's. If the driver has a runtime_suspend method then the
subsystem's method will call it.

> >> While this works, I'm not crazy about it since it requires the driver
> >> know about the subsystem (in this case the bus) where the real PM work
> >> is done. IMO, it would be much more intuitive (and readable) if the
> >> driver's suspend hooks could simply trigger a runtime suspend (either a
> >> new one, or one already requested.)
> >
> > This isn't clear to me. Isn't the driver registered on the bus in
> > question? Can't the driver therefore call the bus's runtime_suspend
> > routine directly, instead of dereferencing the bus->pm->runtime_suspend
> > pointer?
>
> Not sure what you mean by directly. The platform_bus doesn't expose
> its runtime PM methods since they can be customized at runtime, so they
> have to be called via bus->pm.
>
> Or do you mean using dev->driver instead of dev->bus?

You're doing all of this for OMAP, right? What is the subsystem's
runtime_suspend routine? Is it omap_pm_runtime_suspend()?

If it is, then you can call omap_pm_runtime_suspend() directly instead
of calling dev->bus->pm->runtime_suspend().

Alan Stern

2011-01-27 18:13:36

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Wed, 26 Jan 2011, Kevin Hilman wrote:

> >> Consider this instead: Since A is required to be functional before B
> >> can be used, A must be registered before B and hence B gets suspended
> >> before A. Therefore during the prepare phase we can runtime-resume A
> >> and leave it powered up; when B needs to suspend, it won't matter that
> >> the runtime-PM calls are ineffective.
> >
> > We don't really need to do that, because the runtime resume _is_ functional
> > during system suspend.

Not asynchronous runtime resume, because the workqueue is frozen. But
that's not the issue here.

> > The only thing missing is a ->suspend() callback for A
> > (and a corresponding ->resume() callback to make sure A will be available to
> > B during system resume).
> >
>
> OK, I'm finally back to debugging this problem and looking for a final
> solution.
>
> I agree that what is needed is ->suspend() and ->resume() callbacks for
> A, but the question remains how to implement them.
>
> In my case, A doesn't need runtime callbacks, but *does* require that
> the subsystem callbacks are called because the subsystem actually does
> all the real PM work. On OMAP, the device PM code (clock mgmt, device
> low-power states, etc.) is common for all on-chip devices, so is handled
> in common code at the subsystem level (in this case, platform_bus.)
>
> Therefore, what is ideally needed is the ability for A's suspend to
> simply call pm_runtime_suspend() so the subsystem can do the work.
> However, since runtime transitions are locked out by this time, that
> doesn't work. IOW, what is needed is a way for a system suspend to say
> "please finish the runtime suspend that was already requested."

Calling the runtime_suspend method directly is the way to do it.

> What I've done to work around this in driver A is to manually check
> pm_runtime_suspended() and directly call the subsystem's runtime
> suspend/resume (patch below[1]. NOTE, I've used the _noirq methods to
> ensure device A is available when device B needs it.)

Hmm. The pm_runtime_suspended() check may not be needed (if A were
suspended already then B would have encountered problems). But
including it doesn't hurt.

Using the _noirq method isn't a good idea, unless you know for certain
that the runtime_suspend handler doesn't need to sleep. Using the
normal suspend method should work okay, because B always has to suspend
before A.

> While this works, I'm not crazy about it since it requires the driver
> know about the subsystem (in this case the bus) where the real PM work
> is done. IMO, it would be much more intuitive (and readable) if the
> driver's suspend hooks could simply trigger a runtime suspend (either a
> new one, or one already requested.)

This isn't clear to me. Isn't the driver registered on the bus in
question? Can't the driver therefore call the bus's runtime_suspend
routine directly, instead of dereferencing the bus->pm->runtime_suspend
pointer?

Alan Stern

2011-01-27 23:21:54

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

On Thursday, January 27, 2011, Kevin Hilman wrote:
> Alan Stern <[email protected]> writes:
>
> > On Thu, 27 Jan 2011, Kevin Hilman wrote:
> >
> >> > Calling the runtime_suspend method directly is the way to do it.
> >> >
> >>
> >> Do you mean the driver's runtime_suspend method, or the subsystem's
> >> runtime_suspend method?
> >
> > The subsystem's. If the driver has a runtime_suspend method then the
> > subsystem's method will call it.
>
> Yes. Thanks for the clarification.
>
> >> >> While this works, I'm not crazy about it since it requires the driver
> >> >> know about the subsystem (in this case the bus) where the real PM work
> >> >> is done. IMO, it would be much more intuitive (and readable) if the
> >> >> driver's suspend hooks could simply trigger a runtime suspend (either a
> >> >> new one, or one already requested.)
> >> >
> >> > This isn't clear to me. Isn't the driver registered on the bus in
> >> > question? Can't the driver therefore call the bus's runtime_suspend
> >> > routine directly, instead of dereferencing the bus->pm->runtime_suspend
> >> > pointer?
> >>
> >> Not sure what you mean by directly. The platform_bus doesn't expose
> >> its runtime PM methods since they can be customized at runtime, so they
> >> have to be called via bus->pm.
> >>
> >> Or do you mean using dev->driver instead of dev->bus?
> >
> > You're doing all of this for OMAP, right? What is the subsystem's
> > runtime_suspend routine? Is it omap_pm_runtime_suspend()?
>
> Yes.
>
> > If it is, then you can call omap_pm_runtime_suspend() directly instead
> > of calling dev->bus->pm->runtime_suspend().
>
> Personally, I prefer going through dev->bus as we try to avoid SoC
> specific calls in the driver.
>
> This same HW block might be re-used on non-OMAP SoCs (e.g. TI DaVinci)
> that would have different PM at the susbystem level.
>
> So, to summarize, as long as folks are OK with drivers directly calling
> the subsystem runtime PM callbacks, I'll go this route.

This is perfectly fine as long as you ensure that they won't be called
concurrently through the runtime PM framework.

Thanks,
Rafael

2011-01-27 19:23:03

[permalink] [raw]

Subject: Re: [linux-pm] subtle pm_runtime_put_sync race and sdio functions

Alan Stern <[email protected]> writes:

> On Wed, 26 Jan 2011, Kevin Hilman wrote:
>
>> >> Consider this instead: Since A is required to be functional before B
>> >> can be used, A must be registered before B and hence B gets suspended
>> >> before A. Therefore during the prepare phase we can runtime-resume A
>> >> and leave it powered up; when B needs to suspend, it won't matter that
>> >> the runtime-PM calls are ineffective.
>> >
>> > We don't really need to do that, because the runtime resume _is_ functional
>> > during system suspend.
>
> Not asynchronous runtime resume, because the workqueue is frozen. But
> that's not the issue here.
>
>> > The only thing missing is a ->suspend() callback for A
>> > (and a corresponding ->resume() callback to make sure A will be available to
>> > B during system resume).
>> >
>>
>> OK, I'm finally back to debugging this problem and looking for a final
>> solution.
>>
>> I agree that what is needed is ->suspend() and ->resume() callbacks for
>> A, but the question remains how to implement them.
>>
>> In my case, A doesn't need runtime callbacks, but *does* require that
>> the subsystem callbacks are called because the subsystem actually does
>> all the real PM work. On OMAP, the device PM code (clock mgmt, device
>> low-power states, etc.) is common for all on-chip devices, so is handled
>> in common code at the subsystem level (in this case, platform_bus.)
>>
>> Therefore, what is ideally needed is the ability for A's suspend to
>> simply call pm_runtime_suspend() so the subsystem can do the work.
>> However, since runtime transitions are locked out by this time, that
>> doesn't work. IOW, what is needed is a way for a system suspend to say
>> "please finish the runtime suspend that was already requested."
>
> Calling the runtime_suspend method directly is the way to do it.
>

Do you mean the driver's runtime_suspend method, or the subsystem's
runtime_suspend method? In my case, the driver has no runtime_suspend
method since the subystem methods are doing the heavy lifting.

>> What I've done to work around this in driver A is to manually check
>> pm_runtime_suspended() and directly call the subsystem's runtime
>> suspend/resume (patch below[1]. NOTE, I've used the _noirq methods to
>> ensure device A is available when device B needs it.)
>
> Hmm. The pm_runtime_suspended() check may not be needed (if A were
> suspended already then B would have encountered problems). But
> including it doesn't hurt.
>
> Using the _noirq method isn't a good idea, unless you know for certain
> that the runtime_suspend handler doesn't need to sleep.
> Using the normal suspend method should work okay, because B always has
> to suspend before A.

I do know that it doesn't need to sleep, but I think I can use the
normal methods anyways in case that changes in the future.

>> While this works, I'm not crazy about it since it requires the driver
>> know about the subsystem (in this case the bus) where the real PM work
>> is done. IMO, it would be much more intuitive (and readable) if the
>> driver's suspend hooks could simply trigger a runtime suspend (either a
>> new one, or one already requested.)
>
> This isn't clear to me. Isn't the driver registered on the bus in
> question? Can't the driver therefore call the bus's runtime_suspend
> routine directly, instead of dereferencing the bus->pm->runtime_suspend
> pointer?

Not sure what you mean by directly. The platform_bus doesn't expose
its runtime PM methods since they can be customized at runtime, so they
have to be called via bus->pm.

Or do you mean using dev->driver instead of dev->bus?

Kevin

2011-01-27 22:18:28