On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> You're the one mentioning x86, not me. I already explained that some
> MSM hardware (the G1 for example) has lower power consumption in S3
> (which I'm using as an ACPI shorthand for suspend to ram) than any
> suspend from idle C state. The fact that current x86 hardware has the
> same problem may be true, but it's not entirely relevant.
As long as you can set a wakeup timer, an S state is just a C state with
side effects. The significant one is that entering an S state stops the
process scheduler and any in-kernel timers. I don't think Google care at
all about whether suspend is entered through an explicit transition or
something hooked into cpuidle - the relevant issue is that they want to
be able to express a set of constraints that lets them control whether
or not the scheduler keeps on scheduling, and which doesn't let them
lose wakeup events in the process.
--
Matthew Garrett | [email protected]
On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
>
> > You're the one mentioning x86, not me. I already explained that some
> > MSM hardware (the G1 for example) has lower power consumption in S3
> > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > suspend from idle C state. The fact that current x86 hardware has the
> > same problem may be true, but it's not entirely relevant.
>
> As long as you can set a wakeup timer, an S state is just a C state with
> side effects. The significant one is that entering an S state stops the
> process scheduler and any in-kernel timers. I don't think Google care at
> all about whether suspend is entered through an explicit transition or
> something hooked into cpuidle - the relevant issue is that they want to
> be able to express a set of constraints that lets them control whether
> or not the scheduler keeps on scheduling, and which doesn't let them
> lose wakeup events in the process.
Exactly, so my understanding of where we currently are is:
1. pm_qos will be updated to be able to express the android suspend
blockers as interactivity constraints (exact name TBD, but
probably /dev/cpu_interactivity)
2. pm_qos will be updated to be callable from atomic context
3. pm_qos will be updated to export statistics initially closely
matching what suspend blockers provides (simple update of the rw
interface?)
After this is done, the current android suspend block patch becomes a
re-expression in kernel space in terms of pm_qos, with the current
userspace wakelocks being adapted by the android framework into pm_qos
requirements expressed to /dev/cpu_interactivity (or whatever name is
chosen). Then opportunistic suspend is either a small add-on kernel
patch they have in their tree to suspend when the interactivity
constraint goes to NONE, or it could be done entirely by a userspace
process. Long term this could migrate to the freezer and suspend from
idle approach as the various problem timers get fixed.
I think the big unresolved issue is the stats extension. For android,
we need just a name written along with the value, so we have something
to hang the stats off ... current pm_qos userspace users just write a
value, so the name would be optional. From the kernel, we probably just
need an additional API that takes a stats name or NULL if none
(pm_qos_add_request_named()?). Then reading the stats could be done by
implementing a fops read routine on the misc device.
Did I miss anything?
James
--- On Tue, 6/1/10, James Bottomley <[email protected]> wrote:
> > As long as you can set a wakeup timer, an S state is just a C state with
> > side effects.
I've seen similar statements on this endless
thread before; they're not quite true...
> > The significant one is that entering an
> > S state stops the process scheduler and
> > any in-kernel timers.
There's a structural difference too, related
to peripheral device activity and power states.
Specifically, peripherals can be active in C
states (erforming I/O, maybe with DMA etc) and
will in general not be in lowest power states
(PCI etc). Whereas entry to ACPI S-states
involves calling the AML code to put those
peripherals into lowest power modes ... ones
they can't in general enter at run time. (An
additional task of that bytecode is to activate
any wakeup logic, which again is not generally
functional in except in S-states).
The point being perhaps more that ACPI doesn't
map well to the more power-efficient architectures
(often built on ARM) ... hardware vendors provide
all kinds of PM hooks, and Linux can choose between
them so it's more power-miserly than if it tried
to emulate an ACPI based platform.
I've seen some Linux systems which put DRAM into
self-refresh during certain idle modes, for example,
not just during suspend-to-RAM, if it's known that
no DMA is active. (Why not save that power if it's
safe?) Likewise, disable some oscillators and PLLs
if they're not needed (the clock API allows that to
be done regardless of "C-states" etc).
The notion of "suspend" gets introduced on such
systems primarily to match the ACPI-ish models that
exist ... rather than because they necessarily make
good matches for the hardware. Which has left a
puzzle: how and why to use such "suspend" models?
Maybe that's underlying some of the pushback for
the notion of automagic entry to "suspend" states.
- Dave
On Tuesday 01 June 2010, James Bottomley wrote:
> On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> >
> > > You're the one mentioning x86, not me. I already explained that some
> > > MSM hardware (the G1 for example) has lower power consumption in S3
> > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > > suspend from idle C state. The fact that current x86 hardware has the
> > > same problem may be true, but it's not entirely relevant.
> >
> > As long as you can set a wakeup timer, an S state is just a C state with
> > side effects. The significant one is that entering an S state stops the
> > process scheduler and any in-kernel timers. I don't think Google care at
> > all about whether suspend is entered through an explicit transition or
> > something hooked into cpuidle - the relevant issue is that they want to
> > be able to express a set of constraints that lets them control whether
> > or not the scheduler keeps on scheduling, and which doesn't let them
> > lose wakeup events in the process.
>
> Exactly, so my understanding of where we currently are is:
Thanks for the recap.
> 1. pm_qos will be updated to be able to express the android suspend
> blockers as interactivity constraints (exact name TBD, but
> probably /dev/cpu_interactivity)
I think that's not been decided yet precisely enough. I saw a few ideas
here and there in the thread, but which of them are we going to follow?
> 2. pm_qos will be updated to be callable from atomic context
> 3. pm_qos will be updated to export statistics initially closely
> matching what suspend blockers provides (simple update of the rw
> interface?)
>
> After this is done, the current android suspend block patch becomes a
> re-expression in kernel space in terms of pm_qos, with the current
> userspace wakelocks being adapted by the android framework into pm_qos
> requirements expressed to /dev/cpu_interactivity (or whatever name is
> chosen). Then opportunistic suspend is either a small add-on kernel
> patch they have in their tree to suspend when the interactivity
> constraint goes to NONE, or it could be done entirely by a userspace
> process. Long term this could migrate to the freezer and suspend from
> idle approach as the various problem timers get fixed.
>
> I think the big unresolved issue is the stats extension. For android,
> we need just a name written along with the value, so we have something
> to hang the stats off ... current pm_qos userspace users just write a
> value, so the name would be optional. From the kernel, we probably just
> need an additional API that takes a stats name or NULL if none
> (pm_qos_add_request_named()?). Then reading the stats could be done by
> implementing a fops read routine on the misc device.
Is the original idea of having that information in debugfs objectionable?
Rafael
On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
> On Tuesday 01 June 2010, James Bottomley wrote:
> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> > >
> > > > You're the one mentioning x86, not me. I already explained that some
> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > > > suspend from idle C state. The fact that current x86 hardware has the
> > > > same problem may be true, but it's not entirely relevant.
> > >
> > > As long as you can set a wakeup timer, an S state is just a C state with
> > > side effects. The significant one is that entering an S state stops the
> > > process scheduler and any in-kernel timers. I don't think Google care at
> > > all about whether suspend is entered through an explicit transition or
> > > something hooked into cpuidle - the relevant issue is that they want to
> > > be able to express a set of constraints that lets them control whether
> > > or not the scheduler keeps on scheduling, and which doesn't let them
> > > lose wakeup events in the process.
> >
> > Exactly, so my understanding of where we currently are is:
>
> Thanks for the recap.
>
> > 1. pm_qos will be updated to be able to express the android suspend
> > blockers as interactivity constraints (exact name TBD, but
> > probably /dev/cpu_interactivity)
>
> I think that's not been decided yet precisely enough. I saw a few ideas
> here and there in the thread, but which of them are we going to follow?
Well, android only needs two states (block and don't block), so that
gets translated as 2 s32 values (say 0 and INT_MAX). I've seen defines
like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
describe these, but if all we're arguing over is the define name, that's
progress.
The other piece they need is the suspend block name, which comes with
the stats API, and finally we need to decide what the actual constraint
is called (which is how the dev node gets its name) ...
> > 2. pm_qos will be updated to be callable from atomic context
> > 3. pm_qos will be updated to export statistics initially closely
> > matching what suspend blockers provides (simple update of the rw
> > interface?)
> >
> > After this is done, the current android suspend block patch becomes a
> > re-expression in kernel space in terms of pm_qos, with the current
> > userspace wakelocks being adapted by the android framework into pm_qos
> > requirements expressed to /dev/cpu_interactivity (or whatever name is
> > chosen). Then opportunistic suspend is either a small add-on kernel
> > patch they have in their tree to suspend when the interactivity
> > constraint goes to NONE, or it could be done entirely by a userspace
> > process. Long term this could migrate to the freezer and suspend from
> > idle approach as the various problem timers get fixed.
> >
> > I think the big unresolved issue is the stats extension. For android,
> > we need just a name written along with the value, so we have something
> > to hang the stats off ... current pm_qos userspace users just write a
> > value, so the name would be optional. From the kernel, we probably just
> > need an additional API that takes a stats name or NULL if none
> > (pm_qos_add_request_named()?). Then reading the stats could be done by
> > implementing a fops read routine on the misc device.
>
> Is the original idea of having that information in debugfs objectionable?
Well ... debugfs is usually used to get around the sysfs rules. In this
case, pm_qos has a dev interface ... I don't specifically object to
using debugfs, but I don't see any reason to forbid it from being a
simple dev read interface either.
James
On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <[email protected]> wrote:
> On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
>> On Tuesday 01 June 2010, James Bottomley wrote:
>> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
>> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
>> > >
>> > > > You're the one mentioning x86, not me. ?I already explained that some
>> > > > MSM hardware (the G1 for example) has lower power consumption in S3
>> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
>> > > > suspend from idle C state. ?The fact that current x86 hardware has the
>> > > > same problem may be true, but it's not entirely relevant.
>> > >
>> > > As long as you can set a wakeup timer, an S state is just a C state with
>> > > side effects. The significant one is that entering an S state stops the
>> > > process scheduler and any in-kernel timers. I don't think Google care at
>> > > all about whether suspend is entered through an explicit transition or
>> > > something hooked into cpuidle - the relevant issue is that they want to
>> > > be able to express a set of constraints that lets them control whether
>> > > or not the scheduler keeps on scheduling, and which doesn't let them
>> > > lose wakeup events in the process.
>> >
>> > Exactly, so my understanding of where we currently are is:
>>
>> Thanks for the recap.
>>
>> > ? ? ?1. pm_qos will be updated to be able to express the android suspend
>> > ? ? ? ? blockers as interactivity constraints (exact name TBD, but
>> > ? ? ? ? probably /dev/cpu_interactivity)
>>
>> I think that's not been decided yet precisely enough. ?I saw a few ideas
>> here and there in the thread, but which of them are we going to follow?
>
> Well, android only needs two states (block and don't block), so that
> gets translated as 2 s32 values (say 0 and INT_MAX). ?I've seen defines
> like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
> describe these, but if all we're arguing over is the define name, that's
> progress.
I think we need separate state constraints for suspend and idle low
power modes. On the msm platform only a subset of the interrupts can
wake up from the low power mode, so we block the use if the low power
mode from idle while other interrupts are enabled. We do not block
suspend however if those interrupts are not marked as wakeup
interrupts. Most constraints that prevent suspend are not hardware
specific and should not prevent entering low power modes from idle. In
other words we may need to prevent low power idle modes while allowing
suspend, and we may need to prevent suspend while allowing low power
idle modes.
It would also be good to not have an implementation that gets slower
and slower the more clients you have. With binary constraints this is
trivial.
>
> The other piece they need is the suspend block name, which comes with
> the stats API, and finally we need to decide what the actual constraint
> is called (which is how the dev node gets its name) ...
>
>> > ? ? ?2. pm_qos will be updated to be callable from atomic context
>> > ? ? ?3. pm_qos will be updated to export statistics initially closely
>> > ? ? ? ? matching what suspend blockers provides (simple update of the rw
>> > ? ? ? ? interface?)
4. It would be useful to change pm_qos_add_request to not allocate
anything so can add constraints from init functions that currently
cannot fail.
>> >
>> > After this is done, the current android suspend block patch becomes a
>> > re-expression in kernel space in terms of pm_qos, with the current
>> > userspace wakelocks being adapted by the android framework into pm_qos
>> > requirements expressed to /dev/cpu_interactivity (or whatever name is
>> > chosen). ?Then opportunistic suspend is either a small add-on kernel
>> > patch they have in their tree to suspend when the interactivity
>> > constraint goes to NONE, or it could be done entirely by a userspace
>> > process. ?Long term this could migrate to the freezer and suspend from
>> > idle approach as the various problem timers get fixed.
>> >
>> > I think the big unresolved issue is the stats extension. ?For android,
>> > we need just a name written along with the value, so we have something
>> > to hang the stats off ... current pm_qos userspace users just write a
>> > value, so the name would be optional. ?From the kernel, we probably just
>> > need an additional API that takes a stats name or NULL if none
>> > (pm_qos_add_request_named()?). ?Then reading the stats could be done by
>> > implementing a fops read routine on the misc device.
>>
>> Is the original idea of having that information in debugfs objectionable?
>
> Well ... debugfs is usually used to get around the sysfs rules. ?In this
> case, pm_qos has a dev interface ... I don't specifically object to
> using debugfs, but I don't see any reason to forbid it from being a
> simple dev read interface either.
>
We don't currently have a dev interface for stats so this is not an
immediate requirement. The suspend blocker debugfs interface is just
as good as the proc interface we have for wakelocks.
--
Arve Hj?nnev?g
>-----Original Message-----
>From: Arve Hj?nnev?g [mailto:[email protected]]
>Sent: Tuesday, June 01, 2010 6:11 PM
>To: James Bottomley
>Cc: Rafael J. Wysocki; Matthew Garrett; Thomas Gleixner; Peter Zijlstra;
>[email protected]; LKML; Florian Mickler; Linux PM; Linux OMAP Mailing List;
>[email protected]; Alan Cox; Alan Stern; Gross, Mark; Neil Brown
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <[email protected]>
>wrote:
>> On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
>>> On Tuesday 01 June 2010, James Bottomley wrote:
>>> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
>>> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
>>> > >
>>> > > > You're the one mentioning x86, not me. ?I already explained that
>some
>>> > > > MSM hardware (the G1 for example) has lower power consumption in
>S3
>>> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
>>> > > > suspend from idle C state. ?The fact that current x86 hardware has
>the
>>> > > > same problem may be true, but it's not entirely relevant.
>>> > >
>>> > > As long as you can set a wakeup timer, an S state is just a C state
>with
>>> > > side effects. The significant one is that entering an S state stops
>the
>>> > > process scheduler and any in-kernel timers. I don't think Google
>care at
>>> > > all about whether suspend is entered through an explicit transition
>or
>>> > > something hooked into cpuidle - the relevant issue is that they want
>to
>>> > > be able to express a set of constraints that lets them control
>whether
>>> > > or not the scheduler keeps on scheduling, and which doesn't let them
>>> > > lose wakeup events in the process.
>>> >
>>> > Exactly, so my understanding of where we currently are is:
>>>
>>> Thanks for the recap.
>>>
>>> > ? ? ?1. pm_qos will be updated to be able to express the android
>suspend
>>> > ? ? ? ? blockers as interactivity constraints (exact name TBD, but
>>> > ? ? ? ? probably /dev/cpu_interactivity)
>>>
>>> I think that's not been decided yet precisely enough. ?I saw a few ideas
>>> here and there in the thread, but which of them are we going to follow?
>>
>> Well, android only needs two states (block and don't block), so that
>> gets translated as 2 s32 values (say 0 and INT_MAX). ?I've seen defines
>> like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
>> describe these, but if all we're arguing over is the define name, that's
>> progress.
>
>I think we need separate state constraints for suspend and idle low
>power modes. On the msm platform only a subset of the interrupts can
>wake up from the low power mode, so we block the use if the low power
>mode from idle while other interrupts are enabled. We do not block
>suspend however if those interrupts are not marked as wakeup
>interrupts. Most constraints that prevent suspend are not hardware
>specific and should not prevent entering low power modes from idle. In
>other words we may need to prevent low power idle modes while allowing
>suspend, and we may need to prevent suspend while allowing low power
>idle modes.
>
>It would also be good to not have an implementation that gets slower
>and slower the more clients you have. With binary constraints this is
>trivial.
[mtg: ] agreed.
>
>>
>> The other piece they need is the suspend block name, which comes with
>> the stats API, and finally we need to decide what the actual constraint
>> is called (which is how the dev node gets its name) ...
>>
>>> > ? ? ?2. pm_qos will be updated to be callable from atomic context
>>> > ? ? ?3. pm_qos will be updated to export statistics initially closely
>>> > ? ? ? ? matching what suspend blockers provides (simple update of the
>rw
>>> > ? ? ? ? interface?)
>
>4. It would be useful to change pm_qos_add_request to not allocate
>anything so can add constraints from init functions that currently
>cannot fail.
[mtg: ] I'm not sure how to do this but I agree it would be good. I guess we could have a block of pm_qos requests pre-allocated statically and re-use them. In practice there will not be more than a handful of requests ever. Dynamic allocation does seem like a bit of a waste.
>
>>> >
>>> > After this is done, the current android suspend block patch becomes a
>>> > re-expression in kernel space in terms of pm_qos, with the current
>>> > userspace wakelocks being adapted by the android framework into pm_qos
>>> > requirements expressed to /dev/cpu_interactivity (or whatever name is
>>> > chosen). ?Then opportunistic suspend is either a small add-on kernel
>>> > patch they have in their tree to suspend when the interactivity
>>> > constraint goes to NONE, or it could be done entirely by a userspace
>>> > process. ?Long term this could migrate to the freezer and suspend from
>>> > idle approach as the various problem timers get fixed.
>>> >
>>> > I think the big unresolved issue is the stats extension. ?For android,
>>> > we need just a name written along with the value, so we have something
>>> > to hang the stats off ... current pm_qos userspace users just write a
>>> > value, so the name would be optional. ?From the kernel, we probably
>just
>>> > need an additional API that takes a stats name or NULL if none
>>> > (pm_qos_add_request_named()?). ?Then reading the stats could be done
>by
>>> > implementing a fops read routine on the misc device.
>>>
>>> Is the original idea of having that information in debugfs
>objectionable?
>>
>> Well ... debugfs is usually used to get around the sysfs rules. ?In this
>> case, pm_qos has a dev interface ... I don't specifically object to
>> using debugfs, but I don't see any reason to forbid it from being a
>> simple dev read interface either.
>>
>
>We don't currently have a dev interface for stats so this is not an
>immediate requirement. The suspend blocker debugfs interface is just
>as good as the proc interface we have for wakelocks.
>
>--
>Arve Hj?nnev?g
On Tue, Jun 01, 2010 at 04:01:25PM -0500, James Bottomley wrote:
> On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> >
> > > You're the one mentioning x86, not me. I already explained that some
> > > MSM hardware (the G1 for example) has lower power consumption in S3
> > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > > suspend from idle C state. The fact that current x86 hardware has the
> > > same problem may be true, but it's not entirely relevant.
> >
> > As long as you can set a wakeup timer, an S state is just a C state with
> > side effects. The significant one is that entering an S state stops the
> > process scheduler and any in-kernel timers. I don't think Google care at
> > all about whether suspend is entered through an explicit transition or
> > something hooked into cpuidle - the relevant issue is that they want to
> > be able to express a set of constraints that lets them control whether
> > or not the scheduler keeps on scheduling, and which doesn't let them
> > lose wakeup events in the process.
>
> Exactly, so my understanding of where we currently are is:
>
> 1. pm_qos will be updated to be able to express the android suspend
> blockers as interactivity constraints (exact name TBD, but
> probably /dev/cpu_interactivity)
> 2. pm_qos will be updated to be callable from atomic context
> 3. pm_qos will be updated to export statistics initially closely
> matching what suspend blockers provides (simple update of the rw
> interface?)
>
> After this is done, the current android suspend block patch becomes a
> re-expression in kernel space in terms of pm_qos, with the current
> userspace wakelocks being adapted by the android framework into pm_qos
> requirements expressed to /dev/cpu_interactivity (or whatever name is
> chosen). Then opportunistic suspend is either a small add-on kernel
> patch they have in their tree to suspend when the interactivity
> constraint goes to NONE, or it could be done entirely by a userspace
> process. Long term this could migrate to the freezer and suspend from
> idle approach as the various problem timers get fixed.
This is all nice but, all this does is implement the exact same thing as
the wake lock / suspend blocker API as a pm_qos request-class. It
leaves the overlapping constraint issue from ISR to user mode in place
depending on exactly how the oppertunistic suspend is implemented.
I expect it will be via a notifier on the pm_qos request-class update
that would do exactly what the wake lock code does today. just load up
an a "suspend_on_non_interactivity" driver that registers for the call
back, have it enabled by the user mode PM, and you have the equivelent
architecture as what was proposed by the wake lock patches.
it gives the Android guys what they want, without adding a new
subsystem, minimizing the changes and makes most of the architecture
much more politicaly acceptible.
But doesn't it have the same issues with getting the overlapping
constraints right from wake up source to user mode and dealing with the
wake up envents in a sane way? Instead of sprinkling suspend-blockers
about the kernel we'll sprinkle pm_qos_requests about. I like getting
more users of pm_qos, but isn't this the same thing?
>
> I think the big unresolved issue is the stats extension. For android,
> we need just a name written along with the value, so we have something
> to hang the stats off ... current pm_qos userspace users just write a
> value, so the name would be optional. From the kernel, we probably just
> need an additional API that takes a stats name or NULL if none
> (pm_qos_add_request_named()?). Then reading the stats could be done by
> implementing a fops read routine on the misc device.
I don't think the status would be a big deal to add.
However; I am really burned out by this discussion. I am willing to
stub this out ASAP if it puts this behind us if the principles in the
discussion are in more or less agreement.
--mgross
For the record, I still like my low power event idea, which could
coexist with the above.
2010/6/1 Gross, Mark <[email protected]>:
...
>>4. It would be useful to change pm_qos_add_request to not allocate
>>anything so can add constraints from init functions that currently
>>cannot fail.
> [mtg: ] I'm not sure how to do this but I agree it would be good. ?I guess we could have a block of pm_qos requests pre-allocated statically and re-use them. ?In practice there will not be more than a handful of requests ever. ?Dynamic allocation does seem like a bit of a waste.
The calling code will have to store a pointer to your structure
anyway, you may as well have them provide the whole structure.
--
Arve Hj?nnev?g
>-----Original Message-----
>From: Arve Hj?nnev?g [mailto:[email protected]]
>Sent: Tuesday, June 01, 2010 8:15 PM
>To: Gross, Mark
>Cc: James Bottomley; Rafael J. Wysocki; Matthew Garrett; Thomas Gleixner;
>Peter Zijlstra; [email protected]; LKML; Florian Mickler; Linux PM; Linux OMAP
>Mailing List; [email protected]; Alan Cox; Alan Stern; Neil Brown
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>2010/6/1 Gross, Mark <[email protected]>:
>...
>>>4. It would be useful to change pm_qos_add_request to not allocate
>>>anything so can add constraints from init functions that currently
>>>cannot fail.
>> [mtg: ] I'm not sure how to do this but I agree it would be good. ?I
>guess we could have a block of pm_qos requests pre-allocated statically and
>re-use them. ?In practice there will not be more than a handful of requests
>ever. ?Dynamic allocation does seem like a bit of a waste.
>
>The calling code will have to store a pointer to your structure
>anyway, you may as well have them provide the whole structure.
[mtg: ] duh! You are right. Make the caller's hold the structure. Its been a long day. That would be easy todo.
--gmross
On Tue, 2010-06-01 at 18:10 -0700, Arve Hjønnevåg wrote:
> On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <[email protected]> wrote:
> > On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
> >> On Tuesday 01 June 2010, James Bottomley wrote:
> >> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> >> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> >> > >
> >> > > > You're the one mentioning x86, not me. I already explained that some
> >> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> >> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> >> > > > suspend from idle C state. The fact that current x86 hardware has the
> >> > > > same problem may be true, but it's not entirely relevant.
> >> > >
> >> > > As long as you can set a wakeup timer, an S state is just a C state with
> >> > > side effects. The significant one is that entering an S state stops the
> >> > > process scheduler and any in-kernel timers. I don't think Google care at
> >> > > all about whether suspend is entered through an explicit transition or
> >> > > something hooked into cpuidle - the relevant issue is that they want to
> >> > > be able to express a set of constraints that lets them control whether
> >> > > or not the scheduler keeps on scheduling, and which doesn't let them
> >> > > lose wakeup events in the process.
> >> >
> >> > Exactly, so my understanding of where we currently are is:
> >>
> >> Thanks for the recap.
> >>
> >> > 1. pm_qos will be updated to be able to express the android suspend
> >> > blockers as interactivity constraints (exact name TBD, but
> >> > probably /dev/cpu_interactivity)
> >>
> >> I think that's not been decided yet precisely enough. I saw a few ideas
> >> here and there in the thread, but which of them are we going to follow?
> >
> > Well, android only needs two states (block and don't block), so that
> > gets translated as 2 s32 values (say 0 and INT_MAX). I've seen defines
> > like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
> > describe these, but if all we're arguing over is the define name, that's
> > progress.
>
> I think we need separate state constraints for suspend and idle low
> power modes. On the msm platform only a subset of the interrupts can
> wake up from the low power mode, so we block the use if the low power
> mode from idle while other interrupts are enabled. We do not block
> suspend however if those interrupts are not marked as wakeup
> interrupts. Most constraints that prevent suspend are not hardware
> specific and should not prevent entering low power modes from idle. In
> other words we may need to prevent low power idle modes while allowing
> suspend, and we may need to prevent suspend while allowing low power
> idle modes.
Well, as I said, pm_qos is s32 ... it's easy to make the constraint
ternary instead of binary.
> It would also be good to not have an implementation that gets slower
> and slower the more clients you have. With binary constraints this is
> trivial.
Well, that's an implementation detail ... ordering the list or using a
btree would significantly fix that. However, the most number of
constraint users I've seen in android is around 60 ... that's not huge
from a kernel linear list perspective, so is this really a concern? ...
particularly when most uses don't necessarily change the constrain, so a
list search isn't done.
> > The other piece they need is the suspend block name, which comes with
> > the stats API, and finally we need to decide what the actual constraint
> > is called (which is how the dev node gets its name) ...
> >
> >> > 2. pm_qos will be updated to be callable from atomic context
> >> > 3. pm_qos will be updated to export statistics initially closely
> >> > matching what suspend blockers provides (simple update of the rw
> >> > interface?)
>
> 4. It would be useful to change pm_qos_add_request to not allocate
> anything so can add constraints from init functions that currently
> cannot fail.
Sure .. we do that for the delayed work queues, it's just an API which
takes the structure as an argument leaving it the responsibility of the
caller to free.
> >> > After this is done, the current android suspend block patch becomes a
> >> > re-expression in kernel space in terms of pm_qos, with the current
> >> > userspace wakelocks being adapted by the android framework into pm_qos
> >> > requirements expressed to /dev/cpu_interactivity (or whatever name is
> >> > chosen). Then opportunistic suspend is either a small add-on kernel
> >> > patch they have in their tree to suspend when the interactivity
> >> > constraint goes to NONE, or it could be done entirely by a userspace
> >> > process. Long term this could migrate to the freezer and suspend from
> >> > idle approach as the various problem timers get fixed.
> >> >
> >> > I think the big unresolved issue is the stats extension. For android,
> >> > we need just a name written along with the value, so we have something
> >> > to hang the stats off ... current pm_qos userspace users just write a
> >> > value, so the name would be optional. From the kernel, we probably just
> >> > need an additional API that takes a stats name or NULL if none
> >> > (pm_qos_add_request_named()?). Then reading the stats could be done by
> >> > implementing a fops read routine on the misc device.
> >>
> >> Is the original idea of having that information in debugfs objectionable?
> >
> > Well ... debugfs is usually used to get around the sysfs rules. In this
> > case, pm_qos has a dev interface ... I don't specifically object to
> > using debugfs, but I don't see any reason to forbid it from being a
> > simple dev read interface either.
> >
>
> We don't currently have a dev interface for stats so this is not an
> immediate requirement. The suspend blocker debugfs interface is just
> as good as the proc interface we have for wakelocks.
OK, great ... what actually exports the statistics is just an
implementation detail.
James
On Tue, 2010-06-01 at 19:45 -0700, mark gross wrote:
> On Tue, Jun 01, 2010 at 04:01:25PM -0500, James Bottomley wrote:
> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> > >
> > > > You're the one mentioning x86, not me. I already explained that some
> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> > > > suspend from idle C state. The fact that current x86 hardware has the
> > > > same problem may be true, but it's not entirely relevant.
> > >
> > > As long as you can set a wakeup timer, an S state is just a C state with
> > > side effects. The significant one is that entering an S state stops the
> > > process scheduler and any in-kernel timers. I don't think Google care at
> > > all about whether suspend is entered through an explicit transition or
> > > something hooked into cpuidle - the relevant issue is that they want to
> > > be able to express a set of constraints that lets them control whether
> > > or not the scheduler keeps on scheduling, and which doesn't let them
> > > lose wakeup events in the process.
> >
> > Exactly, so my understanding of where we currently are is:
> >
> > 1. pm_qos will be updated to be able to express the android suspend
> > blockers as interactivity constraints (exact name TBD, but
> > probably /dev/cpu_interactivity)
> > 2. pm_qos will be updated to be callable from atomic context
> > 3. pm_qos will be updated to export statistics initially closely
> > matching what suspend blockers provides (simple update of the rw
> > interface?)
> >
> > After this is done, the current android suspend block patch becomes a
> > re-expression in kernel space in terms of pm_qos, with the current
> > userspace wakelocks being adapted by the android framework into pm_qos
> > requirements expressed to /dev/cpu_interactivity (or whatever name is
> > chosen). Then opportunistic suspend is either a small add-on kernel
> > patch they have in their tree to suspend when the interactivity
> > constraint goes to NONE, or it could be done entirely by a userspace
> > process. Long term this could migrate to the freezer and suspend from
> > idle approach as the various problem timers get fixed.
>
> This is all nice but, all this does is implement the exact same thing as
> the wake lock / suspend blocker API as a pm_qos request-class.
funny that ...
> It
> leaves the overlapping constraint issue from ISR to user mode in place
> depending on exactly how the oppertunistic suspend is implemented.
if the vanilla kernel is simply consuming the pm_qos infrastructure and
using suspend from idle, this is irrelevant. As I said, S3 suspend
*can* be implemented via a suspend manager process from userspace (the
alan stern proposal). However, if I were coding the android kernel, I'd
do it as a tiny add on kernel patch. The main goal of making the
android kernel close enough to the vanilla kernel for there not to be
two separate upstreams for the device driver writers has been achieved
regardless of which path is taken.
> I expect it will be via a notifier on the pm_qos request-class update
> that would do exactly what the wake lock code does today. just load up
> an a "suspend_on_non_interactivity" driver that registers for the call
> back, have it enabled by the user mode PM, and you have the equivelent
> architecture as what was proposed by the wake lock patches.
>
> it gives the Android guys what they want, without adding a new
> subsystem, minimizing the changes and makes most of the architecture
> much more politicaly acceptible.
>
> But doesn't it have the same issues with getting the overlapping
> constraints right from wake up source to user mode and dealing with the
> wake up envents in a sane way? Instead of sprinkling suspend-blockers
> about the kernel we'll sprinkle pm_qos_requests about. I like getting
> more users of pm_qos, but isn't this the same thing?
Suspend from idle doesn't have the wakeup problem. it only manifests if
you want to take the system down via the S states. I think long term,
making suspend from idle work for all hardware is the agreed goal, even
if android can't implement it today and has to use an S state work
around.
> > I think the big unresolved issue is the stats extension. For android,
> > we need just a name written along with the value, so we have something
> > to hang the stats off ... current pm_qos userspace users just write a
> > value, so the name would be optional. From the kernel, we probably just
> > need an additional API that takes a stats name or NULL if none
> > (pm_qos_add_request_named()?). Then reading the stats could be done by
> > implementing a fops read routine on the misc device.
>
> I don't think the status would be a big deal to add.
>
>
> However; I am really burned out by this discussion. I am willing to
> stub this out ASAP if it puts this behind us if the principles in the
> discussion are in more or less agreement.
>
> --mgross
>
> For the record, I still like my low power event idea, which could
> coexist with the above.
The proposal is isomorphic to what I said above ... just
s/pm_qos/whatever the lp API is/
James
2010/6/1 James Bottomley <[email protected]>:
> On Tue, 2010-06-01 at 18:10 -0700, Arve Hj?nnev?g wrote:
>> On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <[email protected]> wrote:
>> > On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
>> >> On Tuesday 01 June 2010, James Bottomley wrote:
>> >> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
>> >> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
>> >> > >
>> >> > > > You're the one mentioning x86, not me. ?I already explained that some
>> >> > > > MSM hardware (the G1 for example) has lower power consumption in S3
>> >> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
>> >> > > > suspend from idle C state. ?The fact that current x86 hardware has the
>> >> > > > same problem may be true, but it's not entirely relevant.
>> >> > >
>> >> > > As long as you can set a wakeup timer, an S state is just a C state with
>> >> > > side effects. The significant one is that entering an S state stops the
>> >> > > process scheduler and any in-kernel timers. I don't think Google care at
>> >> > > all about whether suspend is entered through an explicit transition or
>> >> > > something hooked into cpuidle - the relevant issue is that they want to
>> >> > > be able to express a set of constraints that lets them control whether
>> >> > > or not the scheduler keeps on scheduling, and which doesn't let them
>> >> > > lose wakeup events in the process.
>> >> >
>> >> > Exactly, so my understanding of where we currently are is:
>> >>
>> >> Thanks for the recap.
>> >>
>> >> > ? ? ?1. pm_qos will be updated to be able to express the android suspend
>> >> > ? ? ? ? blockers as interactivity constraints (exact name TBD, but
>> >> > ? ? ? ? probably /dev/cpu_interactivity)
>> >>
>> >> I think that's not been decided yet precisely enough. ?I saw a few ideas
>> >> here and there in the thread, but which of them are we going to follow?
>> >
>> > Well, android only needs two states (block and don't block), so that
>> > gets translated as 2 s32 values (say 0 and INT_MAX). ?I've seen defines
>> > like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
>> > describe these, but if all we're arguing over is the define name, that's
>> > progress.
>>
>> I think we need separate state constraints for suspend and idle low
>> power modes. On the msm platform only a subset of the interrupts can
>> wake up from the low power mode, so we block the use if the low power
>> mode from idle while other interrupts are enabled. We do not block
>> suspend however if those interrupts are not marked as wakeup
>> interrupts. Most constraints that prevent suspend are not hardware
>> specific and should not prevent entering low power modes from idle. In
>> other words we may need to prevent low power idle modes while allowing
>> suspend, and we may need to prevent suspend while allowing low power
>> idle modes.
>
> Well, as I said, pm_qos is s32 ... it's easy to make the constraint
> ternary instead of binary.
No, they have to be two separate constraints, otherwise a constraint
to block suspend would override a constraint to block a low power idle
mode or the other way around.
>
>> It would also be good to not have an implementation that gets slower
>> and slower the more clients you have. With binary constraints this is
>> trivial.
>
> Well, that's an implementation detail ... ordering the list or using a
True. I think we also need timeout support in the short term though
which is also somewhat simpler to implement in an efficient way for
binary constraints.
> btree would significantly fix that. ?However, the most number of
> constraint users I've seen in android is around 60 ... that's not huge
> from a kernel linear list perspective, so is this really a concern? ...
> particularly when most uses don't necessarily change the constrain, so a
> list search isn't done.
The search is done every time any of them changes.
>
>> > The other piece they need is the suspend block name, which comes with
>> > the stats API, and finally we need to decide what the actual constraint
>> > is called (which is how the dev node gets its name) ...
>> >
>> >> > ? ? ?2. pm_qos will be updated to be callable from atomic context
>> >> > ? ? ?3. pm_qos will be updated to export statistics initially closely
>> >> > ? ? ? ? matching what suspend blockers provides (simple update of the rw
>> >> > ? ? ? ? interface?)
>>
>> 4. It would be useful to change pm_qos_add_request to not allocate
>> anything so can add constraints from init functions that currently
>> cannot fail.
>
> Sure .. we do that for the delayed work queues, it's just an API which
> takes the structure as an argument leaving it the responsibility of the
> caller to free.
>
>> >> > After this is done, the current android suspend block patch becomes a
>> >> > re-expression in kernel space in terms of pm_qos, with the current
>> >> > userspace wakelocks being adapted by the android framework into pm_qos
>> >> > requirements expressed to /dev/cpu_interactivity (or whatever name is
>> >> > chosen). ?Then opportunistic suspend is either a small add-on kernel
>> >> > patch they have in their tree to suspend when the interactivity
>> >> > constraint goes to NONE, or it could be done entirely by a userspace
>> >> > process. ?Long term this could migrate to the freezer and suspend from
>> >> > idle approach as the various problem timers get fixed.
>> >> >
>> >> > I think the big unresolved issue is the stats extension. ?For android,
>> >> > we need just a name written along with the value, so we have something
>> >> > to hang the stats off ... current pm_qos userspace users just write a
>> >> > value, so the name would be optional. ?From the kernel, we probably just
>> >> > need an additional API that takes a stats name or NULL if none
>> >> > (pm_qos_add_request_named()?). ?Then reading the stats could be done by
>> >> > implementing a fops read routine on the misc device.
>> >>
>> >> Is the original idea of having that information in debugfs objectionable?
>> >
>> > Well ... debugfs is usually used to get around the sysfs rules. ?In this
>> > case, pm_qos has a dev interface ... I don't specifically object to
>> > using debugfs, but I don't see any reason to forbid it from being a
>> > simple dev read interface either.
>> >
>>
>> We don't currently have a dev interface for stats so this is not an
>> immediate requirement. The suspend blocker debugfs interface is just
>> as good as the proc interface we have for wakelocks.
>
> OK, great ... what actually exports the statistics is just an
> implementation detail.
>
> James
>
>
>
>
--
Arve Hj?nnev?g
On Tue, 2010-06-01 at 21:41 -0700, Arve Hjønnevåg wrote:
> 2010/6/1 James Bottomley <[email protected]>:
> > On Tue, 2010-06-01 at 18:10 -0700, Arve Hjønnevåg wrote:
> >> On Tue, Jun 1, 2010 at 3:36 PM, James Bottomley <[email protected]> wrote:
> >> > On Wed, 2010-06-02 at 00:24 +0200, Rafael J. Wysocki wrote:
> >> >> On Tuesday 01 June 2010, James Bottomley wrote:
> >> >> > On Tue, 2010-06-01 at 14:51 +0100, Matthew Garrett wrote:
> >> >> > > On Mon, May 31, 2010 at 04:21:09PM -0500, James Bottomley wrote:
> >> >> > >
> >> >> > > > You're the one mentioning x86, not me. I already explained that some
> >> >> > > > MSM hardware (the G1 for example) has lower power consumption in S3
> >> >> > > > (which I'm using as an ACPI shorthand for suspend to ram) than any
> >> >> > > > suspend from idle C state. The fact that current x86 hardware has the
> >> >> > > > same problem may be true, but it's not entirely relevant.
> >> >> > >
> >> >> > > As long as you can set a wakeup timer, an S state is just a C state with
> >> >> > > side effects. The significant one is that entering an S state stops the
> >> >> > > process scheduler and any in-kernel timers. I don't think Google care at
> >> >> > > all about whether suspend is entered through an explicit transition or
> >> >> > > something hooked into cpuidle - the relevant issue is that they want to
> >> >> > > be able to express a set of constraints that lets them control whether
> >> >> > > or not the scheduler keeps on scheduling, and which doesn't let them
> >> >> > > lose wakeup events in the process.
> >> >> >
> >> >> > Exactly, so my understanding of where we currently are is:
> >> >>
> >> >> Thanks for the recap.
> >> >>
> >> >> > 1. pm_qos will be updated to be able to express the android suspend
> >> >> > blockers as interactivity constraints (exact name TBD, but
> >> >> > probably /dev/cpu_interactivity)
> >> >>
> >> >> I think that's not been decided yet precisely enough. I saw a few ideas
> >> >> here and there in the thread, but which of them are we going to follow?
> >> >
> >> > Well, android only needs two states (block and don't block), so that
> >> > gets translated as 2 s32 values (say 0 and INT_MAX). I've seen defines
> >> > like QOS_INTERACTIVE and QOS_NONE (or QOS_DRECKLY or QOS_MANANA) to
> >> > describe these, but if all we're arguing over is the define name, that's
> >> > progress.
> >>
> >> I think we need separate state constraints for suspend and idle low
> >> power modes. On the msm platform only a subset of the interrupts can
> >> wake up from the low power mode, so we block the use if the low power
> >> mode from idle while other interrupts are enabled. We do not block
> >> suspend however if those interrupts are not marked as wakeup
> >> interrupts. Most constraints that prevent suspend are not hardware
> >> specific and should not prevent entering low power modes from idle. In
> >> other words we may need to prevent low power idle modes while allowing
> >> suspend, and we may need to prevent suspend while allowing low power
> >> idle modes.
> >
> > Well, as I said, pm_qos is s32 ... it's easy to make the constraint
> > ternary instead of binary.
>
> No, they have to be two separate constraints, otherwise a constraint
> to block suspend would override a constraint to block a low power idle
> mode or the other way around.
Depends. If you block the system from going into low power idle, does
that mean you still want it to be fully suspended?
If yes, then we do have independent constraints. If not, they have a
hierarchy:
* Fully Interactive (no low power idle or suspend)
* Partially Interactive (may go into low power idle but not
suspend)
* None (may go into low power idle or suspend)
Which is expressable as a ternary constraint.
James
On Wed, 02 Jun 2010 10:05:11 -0500
James Bottomley <[email protected]> wrote:
> On Tue, 2010-06-01 at 21:41 -0700, Arve Hj?nnev?g wrote:
> > No, they have to be two separate constraints, otherwise a constraint
> > to block suspend would override a constraint to block a low power idle
> > mode or the other way around.
>
> Depends. If you block the system from going into low power idle, does
> that mean you still want it to be fully suspended?
>
> If yes, then we do have independent constraints. If not, they have a
> hierarchy:
>
> * Fully Interactive (no low power idle or suspend)
> * Partially Interactive (may go into low power idle but not
> suspend)
> * None (may go into low power idle or suspend)
>
> Which is expressable as a ternary constraint.
>
> James
>
But unblocking suspend at the moment is independent to getting idle.
If you have the requirement to stay in the highest-idle level (i.e.
best latency you can get) that does not (currently) mean, that you can
not suspend.
To preserve that explicit fall-through while still having working
run-time-powermanagement I think the qos-constraints need to be
separated.
<disclaimer: just from what I read>
Provided you can reach the same power state from idle, current suspend
could probably also be implemented by just the freezing part and a hint
to the idle-loop to provide accelerated fall-through to lowest power.
</disclaimer>
At that point, you could probably merge the constraints.
But the freezing part is also the hard part, isn't it? (I have no
idea. Thomas seems to think about cgroups for that and doing smth about the timers.)
Cheers,
Flo
On Wed, 2010-06-02 at 21:47 +0200, Florian Mickler wrote:
> On Wed, 02 Jun 2010 10:05:11 -0500
> James Bottomley <[email protected]> wrote:
>
> > On Tue, 2010-06-01 at 21:41 -0700, Arve Hjønnevåg wrote:
> > > No, they have to be two separate constraints, otherwise a constraint
> > > to block suspend would override a constraint to block a low power idle
> > > mode or the other way around.
> >
> > Depends. If you block the system from going into low power idle, does
> > that mean you still want it to be fully suspended?
> >
> > If yes, then we do have independent constraints. If not, they have a
> > hierarchy:
> >
> > * Fully Interactive (no low power idle or suspend)
> > * Partially Interactive (may go into low power idle but not
> > suspend)
> > * None (may go into low power idle or suspend)
> >
> > Which is expressable as a ternary constraint.
> >
> > James
> >
>
> But unblocking suspend at the moment is independent to getting idle.
> If you have the requirement to stay in the highest-idle level (i.e.
> best latency you can get) that does not (currently) mean, that you can
> not suspend.
I don't understand that as a reason. If we looks at this a qos
constraints, you're saying that the system may not drop into certain low
power states because it might turn something off that's currently being
used by a driver or a process. Suspend is certainly the lowest state of
that because it turns everything off, why would it be legal to drop into
that?
I also couldn't find this notion of separation of idleness power from
suspend blocking in the original suspend block patch set ... if you can
either tell me where it is, or give me an example of the separated use
cases, I'd understand better.
> To preserve that explicit fall-through while still having working
> run-time-powermanagement I think the qos-constraints need to be
> separated.
OK, as above, why? or better yet, just give an example.
> <disclaimer: just from what I read>
> Provided you can reach the same power state from idle, current suspend
> could probably also be implemented by just the freezing part and a hint
> to the idle-loop to provide accelerated fall-through to lowest power.
> </disclaimer>
>
> At that point, you could probably merge the constraints.
>
> But the freezing part is also the hard part, isn't it? (I have no
> idea. Thomas seems to think about cgroups for that and doing smth about the timers.)
Um, well, as I said, I think using suspend from idle and freezer is
longer term. I think if we express the constraints as qos android can
then use them to gate when to enter S3 .. which is functionally
equivalent to suspend blockers. And the vanilla kernel can use them to
gate power states for the drivers in suspend from idle.
James
On Wed, Jun 2, 2010 at 1:41 PM, James Bottomley <[email protected]> wrote:
> On Wed, 2010-06-02 at 21:47 +0200, Florian Mickler wrote:
>> On Wed, 02 Jun 2010 10:05:11 -0500
>> James Bottomley <[email protected]> wrote:
>>
>> > On Tue, 2010-06-01 at 21:41 -0700, Arve Hj?nnev?g wrote:
>> > > No, they have to be two separate constraints, otherwise a constraint
>> > > to block suspend would override a constraint to block a low power idle
>> > > mode or the other way around.
>> >
>> > Depends. ?If you block the system from going into low power idle, does
>> > that mean you still want it to be fully suspended?
>> >
>> > If yes, then we do have independent constraints. ?If not, they have a
>> > hierarchy:
>> >
>> > ? ? ? * Fully Interactive (no low power idle or suspend)
>> > ? ? ? * Partially Interactive (may go into low power idle but not
>> > ? ? ? ? suspend)
>> > ? ? ? * None (may go into low power idle or suspend)
>> >
>> > Which is expressable as a ternary constraint.
>> >
>> > James
>> >
>>
>> But unblocking suspend at the moment is independent to getting idle.
>> If you have the requirement to stay in the highest-idle level (i.e.
>> best latency you can get) that does not (currently) mean, that you can
>> not suspend.
>
> I don't understand that as a reason. ?If we looks at this a qos
> constraints, you're saying that the system may not drop into certain low
> power states because it might turn something off that's currently being
> used by a driver or a process. ?Suspend is certainly the lowest state of
> that because it turns everything off, why would it be legal to drop into
> that?
Because the driver gets called on suspend which gives it a change to
stop using it.
>
> I also couldn't find this notion of separation of idleness power from
> suspend blocking in the original suspend block patch set ... if you can
> either tell me where it is, or give me an example of the separated use
> cases, I'd understand better.
>
The suspend block patchset only deals with suspend, not low power idle
modes. The original wakelock patchset had two wakelock types, idle and
suspend.
>> To preserve that explicit fall-through while still having working
>> run-time-powermanagement I think the qos-constraints need to be
>> separated.
>
> OK, as above, why? ?or better yet, just give an example.
>
The i2c bus on the Nexus One is used by the other core to turn off the
power you our core when we enter the lowest power mode. This means
that we cannot enter that low power mode while the i2c bus is active,
so we block low power idle modes. At some point we also tries to block
suspend in this case, but this caused a lot of failed suspend attempts
since the frequency scaling code would try to ramp up while freezing
processes which in turn aborted the process freezing attempts.
>> <disclaimer: just from what I read>
>> Provided you can reach the same power state from idle, current suspend
>> could probably also be implemented by just the freezing part and a hint
>> to the idle-loop to provide accelerated fall-through to lowest power.
>> </disclaimer>
>>
>> At that point, you could probably merge the constraints.
>>
>> But the freezing part is also the hard part, isn't it? (I have no
>> idea. Thomas seems to think about cgroups for that and doing smth about the timers.)
>
> Um, well, as I said, I think using suspend from idle and freezer is
> longer term. ?I think if we express the constraints as qos android can
> then use them to gate when to enter S3 .. which is functionally
> equivalent to suspend blockers. ?And the vanilla kernel can use them to
> gate power states for the drivers in suspend from idle.
>
> James
>
>
>
--
Arve Hj?nnev?g
On Wed, 2010-06-02 at 15:27 -0700, Arve Hjønnevåg wrote:
> On Wed, Jun 2, 2010 at 1:41 PM, James Bottomley <[email protected]> wrote:
> > On Wed, 2010-06-02 at 21:47 +0200, Florian Mickler wrote:
> >> On Wed, 02 Jun 2010 10:05:11 -0500
> >> James Bottomley <[email protected]> wrote:
> >>
> >> > On Tue, 2010-06-01 at 21:41 -0700, Arve Hjønnevåg wrote:
> >> > > No, they have to be two separate constraints, otherwise a constraint
> >> > > to block suspend would override a constraint to block a low power idle
> >> > > mode or the other way around.
> >> >
> >> > Depends. If you block the system from going into low power idle, does
> >> > that mean you still want it to be fully suspended?
> >> >
> >> > If yes, then we do have independent constraints. If not, they have a
> >> > hierarchy:
> >> >
> >> > * Fully Interactive (no low power idle or suspend)
> >> > * Partially Interactive (may go into low power idle but not
> >> > suspend)
> >> > * None (may go into low power idle or suspend)
> >> >
> >> > Which is expressable as a ternary constraint.
> >> >
> >> > James
> >> >
> >>
> >> But unblocking suspend at the moment is independent to getting idle.
> >> If you have the requirement to stay in the highest-idle level (i.e.
> >> best latency you can get) that does not (currently) mean, that you can
> >> not suspend.
> >
> > I don't understand that as a reason. If we looks at this a qos
> > constraints, you're saying that the system may not drop into certain low
> > power states because it might turn something off that's currently being
> > used by a driver or a process. Suspend is certainly the lowest state of
> > that because it turns everything off, why would it be legal to drop into
> > that?
>
> Because the driver gets called on suspend which gives it a change to
> stop using it.
>
> >
> > I also couldn't find this notion of separation of idleness power from
> > suspend blocking in the original suspend block patch set ... if you can
> > either tell me where it is, or give me an example of the separated use
> > cases, I'd understand better.
> >
>
> The suspend block patchset only deals with suspend, not low power idle
> modes. The original wakelock patchset had two wakelock types, idle and
> suspend.
OK, so that explains why I didn't see it ...
> >> To preserve that explicit fall-through while still having working
> >> run-time-powermanagement I think the qos-constraints need to be
> >> separated.
> >
> > OK, as above, why? or better yet, just give an example.
> >
>
> The i2c bus on the Nexus One is used by the other core to turn off the
> power you our core when we enter the lowest power mode. This means
> that we cannot enter that low power mode while the i2c bus is active,
> so we block low power idle modes. At some point we also tries to block
> suspend in this case, but this caused a lot of failed suspend attempts
> since the frequency scaling code would try to ramp up while freezing
> processes which in turn aborted the process freezing attempts.
OK, so this is a device specific power constraint state. I suppose it
makes sense to have a bunch of those, because the device isn't
necessarily going to know what idle power mode it can't go into, so the
cpu govenor should sort it out rather than have the device specify a
minimum state.
James
On Wed, 02 Jun 2010 15:41:11 -0500
James Bottomley <[email protected]> wrote:
> On Wed, 2010-06-02 at 21:47 +0200, Florian Mickler wrote:
> > On Wed, 02 Jun 2010 10:05:11 -0500
> > James Bottomley <[email protected]> wrote:
> >
> > > On Tue, 2010-06-01 at 21:41 -0700, Arve Hj?nnev?g wrote:
> > > > No, they have to be two separate constraints, otherwise a constraint
> > > > to block suspend would override a constraint to block a low power idle
> > > > mode or the other way around.
> > >
> > > Depends. If you block the system from going into low power idle, does
> > > that mean you still want it to be fully suspended?
> > >
> > > If yes, then we do have independent constraints. If not, they have a
> > > hierarchy:
> > >
> > > * Fully Interactive (no low power idle or suspend)
> > > * Partially Interactive (may go into low power idle but not
> > > suspend)
> > > * None (may go into low power idle or suspend)
> > >
> > > Which is expressable as a ternary constraint.
> > >
> > > James
> > >
> >
> > But unblocking suspend at the moment is independent to getting idle.
> > If you have the requirement to stay in the highest-idle level (i.e.
> > best latency you can get) that does not (currently) mean, that you can
> > not suspend.
>
> I don't understand that as a reason. If we looks at this a qos
> constraints, you're saying that the system may not drop into certain low
> power states because it might turn something off that's currently being
> used by a driver or a process. Suspend is certainly the lowest state of
> that because it turns everything off, why would it be legal to drop into
> that?
>
> I also couldn't find this notion of separation of idleness power from
> suspend blocking in the original suspend block patch set ... if you can
> either tell me where it is, or give me an example of the separated use
> cases, I'd understand better.
>
> > To preserve that explicit fall-through while still having working
> > run-time-powermanagement I think the qos-constraints need to be
> > separated.
>
> OK, as above, why? or better yet, just give an example.
Hm. Maybe it is me who doesn't understand.
With proposed patchset:
1. As soon as we unblock suspend we go down. (i.e. suspending)
2. While suspend is blocked, the idle-loop does it's things. (i.e.
runtime power managment -> can give same power-result as suspend)
possible cases:
1:
- qos-latency-constraints: 1ms, [here: forbids anything other than
C1 idle state.]
- suspend is blocked
2: - qos latency-constraints: as in 1
- suspend unblocked
3: - qos latency-constraints: infinity, cpu in lowest power state.
- suspend is blocked
4: - qos latency-constraints: infinity, cpu in lowest power state.
- suspend unblocked
in case 2 and 4 we would suspend, regardeless of the qos-latency.
in case 1 and 3 we would stay awake, regardeless of the qos-latency
constraint.
If only one constraint, then case 2 (or 3) wouldn't be possible. But it
is possible now.
A possible use case as an example?
(hmm... i'm trying my imagination hard now):
Your sound needs low latency, so that could be a cause for the
qos-latency constraint.
And unblocking suspend could nonetheless happen:
For example... you have an firefox open and don't want to
prevent suspend for that case when the display is turned off
Cheers,
Flo
>-----Original Message-----
>From: Florian Mickler [mailto:[email protected]]
>Sent: Wednesday, June 02, 2010 4:06 PM
>To: James Bottomley
>Cc: Arve Hj?nnev?g; Neil Brown; [email protected]; Peter Zijlstra; LKML; Alan
>Cox; Gross, Mark; Thomas Gleixner; Linux OMAP Mailing List; Linux PM;
>[email protected]
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>On Wed, 02 Jun 2010 15:41:11 -0500
>James Bottomley <[email protected]> wrote:
>
>> On Wed, 2010-06-02 at 21:47 +0200, Florian Mickler wrote:
>> > On Wed, 02 Jun 2010 10:05:11 -0500
>> > James Bottomley <[email protected]> wrote:
>> >
>> > > On Tue, 2010-06-01 at 21:41 -0700, Arve Hj?nnev?g wrote:
>> > > > No, they have to be two separate constraints, otherwise a
>constraint
>> > > > to block suspend would override a constraint to block a low power
>idle
>> > > > mode or the other way around.
>> > >
>> > > Depends. If you block the system from going into low power idle,
>does
>> > > that mean you still want it to be fully suspended?
>> > >
>> > > If yes, then we do have independent constraints. If not, they have a
>> > > hierarchy:
>> > >
>> > > * Fully Interactive (no low power idle or suspend)
>> > > * Partially Interactive (may go into low power idle but not
>> > > suspend)
>> > > * None (may go into low power idle or suspend)
>> > >
>> > > Which is expressable as a ternary constraint.
>> > >
>> > > James
>> > >
>> >
>> > But unblocking suspend at the moment is independent to getting idle.
>> > If you have the requirement to stay in the highest-idle level (i.e.
>> > best latency you can get) that does not (currently) mean, that you can
>> > not suspend.
>>
>> I don't understand that as a reason. If we looks at this a qos
>> constraints, you're saying that the system may not drop into certain low
>> power states because it might turn something off that's currently being
>> used by a driver or a process. Suspend is certainly the lowest state of
>> that because it turns everything off, why would it be legal to drop into
>> that?
>>
>> I also couldn't find this notion of separation of idleness power from
>> suspend blocking in the original suspend block patch set ... if you can
>> either tell me where it is, or give me an example of the separated use
>> cases, I'd understand better.
>>
>> > To preserve that explicit fall-through while still having working
>> > run-time-powermanagement I think the qos-constraints need to be
>> > separated.
>>
>> OK, as above, why? or better yet, just give an example.
>
>Hm. Maybe it is me who doesn't understand.
>
>With proposed patchset:
>1. As soon as we unblock suspend we go down. (i.e. suspending)
>2. While suspend is blocked, the idle-loop does it's things. (i.e.
>runtime power managment -> can give same power-result as suspend)
>
>possible cases:
>1:
> - qos-latency-constraints: 1ms, [here: forbids anything other than
> C1 idle state.]
> - suspend is blocked
>
>2: - qos latency-constraints: as in 1
> - suspend unblocked
>
>3: - qos latency-constraints: infinity, cpu in lowest power state.
> - suspend is blocked
>
>4: - qos latency-constraints: infinity, cpu in lowest power state.
> - suspend unblocked
>
>
>in case 2 and 4 we would suspend, regardeless of the qos-latency.
>
>in case 1 and 3 we would stay awake, regardeless of the qos-latency
>constraint.
>
>
>If only one constraint, then case 2 (or 3) wouldn't be possible. But it
>is possible now.
>
>A possible use case as an example?
>(hmm... i'm trying my imagination hard now):
> Your sound needs low latency, so that could be a cause for the
> qos-latency constraint.
>
> And unblocking suspend could nonetheless happen:
> For example... you have an firefox open and don't want to
> prevent suspend for that case when the display is turned off
>
>
[mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
We've had a number of attempts at fixing this, but I think the proper fix is to bolt a "disable C-states > x" interface into cpu_idle that bypases pm_qos altogether. Or, perhaps add a new pm_qos API that does the equivalent operation, overriding whatever constraint is active.
--mgross
> [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
That should be trivial in the usual case because 99% of the time you can
hot path
the QoS entry changing is the latest one
there have been no other changes
If it is valid I can use the cached previous aggregate I cunningly
saved in the top QoS entry when I computed the new one
(ie most of the time from the kernel side you have a QoS stack)
> We've had a number of attempts at fixing this, but I think the proper fix is to bolt a "disable C-states > x" interface into cpu_idle that bypases pm_qos altogether. Or, perhaps add a new pm_qos API that does the equivalent operation, overriding whatever constraint is active.
We need some of this anyway for deep power saving because there is
hardware which can't wake from soem states, which in turn means if that
device is active we need to be above the state in question.
On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> > [mtg: ] This has been a pain point for the PM_QOS implementation.
> They change the constrain back and forth at the transaction level of
> the i2c driver. The pm_qos code really wasn't made to deal with such
> hot path use, as each such change triggers a re-computation of what
> the aggregate qos request is.
>
> That should be trivial in the usual case because 99% of the time you can
> hot path
>
> the QoS entry changing is the latest one
> there have been no other changes
> If it is valid I can use the cached previous aggregate I cunningly
> saved in the top QoS entry when I computed the new one
>
> (ie most of the time from the kernel side you have a QoS stack)
Why would the kernel change the QoS state of a task? Why not have two
interacting QoS variables, one for the task, one for the subsystem in
question, and the action depends on their relative value?
> > We've had a number of attempts at fixing this, but I think the
> proper fix is to bolt a "disable C-states > x" interface into cpu_idle
> that bypases pm_qos altogether. Or, perhaps add a new pm_qos API that
> does the equivalent operation, overriding whatever constraint is
> active.
>
> We need some of this anyway for deep power saving because there is
> hardware which can't wake from soem states, which in turn means if that
> device is active we need to be above the state in question.
Right, and I can imagine that depending on the platform details and not
the device details, so we get platform hooks in the drivers, or possible
up in the generic stack because I don't think NICs actually know if
there are open connections.
On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> > [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
>
> That should be trivial in the usual case because 99% of the time you can
> hot path
>
> the QoS entry changing is the latest one
> there have been no other changes
> If it is valid I can use the cached previous aggregate I cunningly
> saved in the top QoS entry when I computed the new one
>
> (ie most of the time from the kernel side you have a QoS stack)
It's not just the list based computation: that's trivial to fix, as you
say ... the other problem is the notifier chain, because that's blocking
and could be long. Could we invoke the notifier through a workqueue?
It doesn't seem to have veto power, so it's pure notification, does it
matter if the notice is delayed (as long as it's in order)?
> > We've had a number of attempts at fixing this, but I think the proper fix is to bolt a "disable C-states > x" interface into cpu_idle that bypases pm_qos altogether. Or, perhaps add a new pm_qos API that does the equivalent operation, overriding whatever constraint is active.
>
> We need some of this anyway for deep power saving because there is
> hardware which can't wake from soem states, which in turn means if that
> device is active we need to be above the state in question.
James
On Thu, 03 Jun 2010 08:24:31 -0500
James Bottomley <[email protected]> wrote:
> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> > > [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
> >
> > That should be trivial in the usual case because 99% of the time you can
> > hot path
> >
> > the QoS entry changing is the latest one
> > there have been no other changes
> > If it is valid I can use the cached previous aggregate I cunningly
> > saved in the top QoS entry when I computed the new one
> >
> > (ie most of the time from the kernel side you have a QoS stack)
>
> It's not just the list based computation: that's trivial to fix, as you
> say ... the other problem is the notifier chain, because that's blocking
> and could be long. Could we invoke the notifier through a workqueue?
> It doesn't seem to have veto power, so it's pure notification, does it
> matter if the notice is delayed (as long as it's in order)?
I think schedule_work() (worqueue.h) can take care of that.
Thats how the rfkill subsystem does it.
Cheers,
Flo
>-----Original Message-----
>From: James Bottomley [mailto:[email protected]]
>Sent: Thursday, June 03, 2010 6:25 AM
>To: Alan Cox
>Cc: Gross, Mark; Florian Mickler; Arve Hj?nnev?g; Neil Brown;
>[email protected]; Peter Zijlstra; LKML; Thomas Gleixner; Linux OMAP Mailing
>List; Linux PM; [email protected]
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
>> > [mtg: ] This has been a pain point for the PM_QOS implementation. They
>change the constrain back and forth at the transaction level of the i2c
>driver. The pm_qos code really wasn't made to deal with such hot path use,
>as each such change triggers a re-computation of what the aggregate qos
>request is.
>>
>> That should be trivial in the usual case because 99% of the time you can
>> hot path
>>
>> the QoS entry changing is the latest one
>> there have been no other changes
>> If it is valid I can use the cached previous aggregate I cunningly
>> saved in the top QoS entry when I computed the new one
>>
>> (ie most of the time from the kernel side you have a QoS stack)
>
>It's not just the list based computation: that's trivial to fix, as you
>say ... the other problem is the notifier chain, because that's blocking
>and could be long. Could we invoke the notifier through a workqueue?
>It doesn't seem to have veto power, so it's pure notification, does it
>matter if the notice is delayed (as long as it's in order)?
[mtg: ] true. The notifications "could be" done on as a scheduled work item
in most cases. I think there is only one user of the notification so far
any way. Most pm_qos users do a pole of the current value for whatever parameter they are interested in.
--mgross
>
>> > We've had a number of attempts at fixing this, but I think the proper
>fix is to bolt a "disable C-states > x" interface into cpu_idle that
>bypases pm_qos altogether. Or, perhaps add a new pm_qos API that does the
>equivalent operation, overriding whatever constraint is active.
>>
>> We need some of this anyway for deep power saving because there is
>> hardware which can't wake from soem states, which in turn means if that
>> device is active we need to be above the state in question.
>
>James
>
On Thu, 3 Jun 2010, James Bottomley wrote:
> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> > > [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
> >
> > That should be trivial in the usual case because 99% of the time you can
> > hot path
> >
> > the QoS entry changing is the latest one
> > there have been no other changes
> > If it is valid I can use the cached previous aggregate I cunningly
> > saved in the top QoS entry when I computed the new one
> >
> > (ie most of the time from the kernel side you have a QoS stack)
>
> It's not just the list based computation: that's trivial to fix, as you
> say ... the other problem is the notifier chain, because that's blocking
> and could be long. Could we invoke the notifier through a workqueue?
> It doesn't seem to have veto power, so it's pure notification, does it
> matter if the notice is delayed (as long as it's in order)?
It depends on the information type and for a lot of things we might
get away without notifiers.
The only real issue is when you need to get other cores out of their
deep idle state to make a new constraint work. That's what we do with
the DMA latency notifier right now.
Thanks,
tglx
Peter Zijlstra <[email protected]> writes:
> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
>> > [mtg: ] This has been a pain point for the PM_QOS implementation.
>> They change the constrain back and forth at the transaction level of
>> the i2c driver. The pm_qos code really wasn't made to deal with such
>> hot path use, as each such change triggers a re-computation of what
>> the aggregate qos request is.
>>
>> That should be trivial in the usual case because 99% of the time you can
>> hot path
>>
>> the QoS entry changing is the latest one
>> there have been no other changes
>> If it is valid I can use the cached previous aggregate I cunningly
>> saved in the top QoS entry when I computed the new one
>>
>> (ie most of the time from the kernel side you have a QoS stack)
>
> Why would the kernel change the QoS state of a task? Why not have two
> interacting QoS variables, one for the task, one for the subsystem in
> question, and the action depends on their relative value?
Yes, having a QoS parameter per-subsystem (or even per-device) is very
important for SoCs that have independently controlled powerdomains.
If all devices/subsystems in a particular powerdomain have QoS
parameters that permit, the power state of that powerdomain can be
lowered independently from system-wide power state and power states of
other power domains.
Kevin
On Thu, 2010-06-03 at 16:35 +0200, Thomas Gleixner wrote:
> On Thu, 3 Jun 2010, James Bottomley wrote:
>
> > On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> > > > [mtg: ] This has been a pain point for the PM_QOS implementation. They change the constrain back and forth at the transaction level of the i2c driver. The pm_qos code really wasn't made to deal with such hot path use, as each such change triggers a re-computation of what the aggregate qos request is.
> > >
> > > That should be trivial in the usual case because 99% of the time you can
> > > hot path
> > >
> > > the QoS entry changing is the latest one
> > > there have been no other changes
> > > If it is valid I can use the cached previous aggregate I cunningly
> > > saved in the top QoS entry when I computed the new one
> > >
> > > (ie most of the time from the kernel side you have a QoS stack)
> >
> > It's not just the list based computation: that's trivial to fix, as you
> > say ... the other problem is the notifier chain, because that's blocking
> > and could be long. Could we invoke the notifier through a workqueue?
> > It doesn't seem to have veto power, so it's pure notification, does it
> > matter if the notice is delayed (as long as it's in order)?
>
> It depends on the information type and for a lot of things we might
> get away without notifiers.
>
> The only real issue is when you need to get other cores out of their
> deep idle state to make a new constraint work. That's what we do with
> the DMA latency notifier right now.
But the only DMA latency notifier is cpuidle_latency_notifier. That
looks callable from atomic context, so we could have two chains: one
atomic and one not.
The only other notifier in use is the ieee80211_max_network_latency,
which uses mutexes, so does require user context.
James
>-----Original Message-----
>From: Kevin Hilman [mailto:[email protected]]
>Sent: Thursday, June 03, 2010 7:43 AM
>To: Peter Zijlstra
>Cc: Alan Cox; Gross, Mark; Florian Mickler; James Bottomley; Arve
>Hj?nnev?g; Neil Brown; [email protected]; LKML; Thomas Gleixner; Linux OMAP
>Mailing List; Linux PM; [email protected]
>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>
>Peter Zijlstra <[email protected]> writes:
>
>> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
>>> > [mtg: ] This has been a pain point for the PM_QOS implementation.
>>> They change the constrain back and forth at the transaction level of
>>> the i2c driver. The pm_qos code really wasn't made to deal with such
>>> hot path use, as each such change triggers a re-computation of what
>>> the aggregate qos request is.
>>>
>>> That should be trivial in the usual case because 99% of the time you can
>>> hot path
>>>
>>> the QoS entry changing is the latest one
>>> there have been no other changes
>>> If it is valid I can use the cached previous aggregate I cunningly
>>> saved in the top QoS entry when I computed the new one
>>>
>>> (ie most of the time from the kernel side you have a QoS stack)
>>
>> Why would the kernel change the QoS state of a task? Why not have two
>> interacting QoS variables, one for the task, one for the subsystem in
>> question, and the action depends on their relative value?
>
>Yes, having a QoS parameter per-subsystem (or even per-device) is very
>important for SoCs that have independently controlled powerdomains.
>If all devices/subsystems in a particular powerdomain have QoS
>parameters that permit, the power state of that powerdomain can be
>lowered independently from system-wide power state and power states of
>other power domains.
>
This seems similar to that pm_qos generalization into bus drivers we where
waving our hands at during the collab summit in April? We never did get
into meaningful detail at that time.
--mgross
"Gross, Mark" <[email protected]> writes:
>>-----Original Message-----
>>From: Kevin Hilman [mailto:[email protected]]
>>Sent: Thursday, June 03, 2010 7:43 AM
>>To: Peter Zijlstra
>>Cc: Alan Cox; Gross, Mark; Florian Mickler; James Bottomley; Arve
>>Hj?nnev?g; Neil Brown; [email protected]; LKML; Thomas Gleixner; Linux OMAP
>>Mailing List; Linux PM; [email protected]
>>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>>
>>Peter Zijlstra <[email protected]> writes:
>>
>>> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
>>>> > [mtg: ] This has been a pain point for the PM_QOS implementation.
>>>> They change the constrain back and forth at the transaction level of
>>>> the i2c driver. The pm_qos code really wasn't made to deal with such
>>>> hot path use, as each such change triggers a re-computation of what
>>>> the aggregate qos request is.
>>>>
>>>> That should be trivial in the usual case because 99% of the time you can
>>>> hot path
>>>>
>>>> the QoS entry changing is the latest one
>>>> there have been no other changes
>>>> If it is valid I can use the cached previous aggregate I cunningly
>>>> saved in the top QoS entry when I computed the new one
>>>>
>>>> (ie most of the time from the kernel side you have a QoS stack)
>>>
>>> Why would the kernel change the QoS state of a task? Why not have two
>>> interacting QoS variables, one for the task, one for the subsystem in
>>> question, and the action depends on their relative value?
>>
>>Yes, having a QoS parameter per-subsystem (or even per-device) is very
>>important for SoCs that have independently controlled powerdomains.
>>If all devices/subsystems in a particular powerdomain have QoS
>>parameters that permit, the power state of that powerdomain can be
>>lowered independently from system-wide power state and power states of
>>other power domains.
>>
> This seems similar to that pm_qos generalization into bus drivers we where
> waving our hands at during the collab summit in April? We never did get
> into meaningful detail at that time.
The hand-waving was around how to generalize it into the driver-model,
or PM QoS. We're already doing this for OMAP, but in an OMAP-specific
way, but it's become clear that this is something useful to
generalize.
Kevin
On Thu, 2010-06-03 at 09:58 -0700, Kevin Hilman wrote:
> "Gross, Mark" <[email protected]> writes:
>
> >>-----Original Message-----
> >>From: Kevin Hilman [mailto:[email protected]]
> >>Sent: Thursday, June 03, 2010 7:43 AM
> >>To: Peter Zijlstra
> >>Cc: Alan Cox; Gross, Mark; Florian Mickler; James Bottomley; Arve
> >>Hjønnevåg; Neil Brown; [email protected]; LKML; Thomas Gleixner; Linux OMAP
> >>Mailing List; Linux PM; [email protected]
> >>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
> >>
> >>Peter Zijlstra <[email protected]> writes:
> >>
> >>> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
> >>>> > [mtg: ] This has been a pain point for the PM_QOS implementation.
> >>>> They change the constrain back and forth at the transaction level of
> >>>> the i2c driver. The pm_qos code really wasn't made to deal with such
> >>>> hot path use, as each such change triggers a re-computation of what
> >>>> the aggregate qos request is.
> >>>>
> >>>> That should be trivial in the usual case because 99% of the time you can
> >>>> hot path
> >>>>
> >>>> the QoS entry changing is the latest one
> >>>> there have been no other changes
> >>>> If it is valid I can use the cached previous aggregate I cunningly
> >>>> saved in the top QoS entry when I computed the new one
> >>>>
> >>>> (ie most of the time from the kernel side you have a QoS stack)
> >>>
> >>> Why would the kernel change the QoS state of a task? Why not have two
> >>> interacting QoS variables, one for the task, one for the subsystem in
> >>> question, and the action depends on their relative value?
> >>
> >>Yes, having a QoS parameter per-subsystem (or even per-device) is very
> >>important for SoCs that have independently controlled powerdomains.
> >>If all devices/subsystems in a particular powerdomain have QoS
> >>parameters that permit, the power state of that powerdomain can be
> >>lowered independently from system-wide power state and power states of
> >>other power domains.
> >>
> > This seems similar to that pm_qos generalization into bus drivers we where
> > waving our hands at during the collab summit in April? We never did get
> > into meaningful detail at that time.
>
> The hand-waving was around how to generalize it into the driver-model,
> or PM QoS. We're already doing this for OMAP, but in an OMAP-specific
> way, but it's become clear that this is something useful to
> generalize.
Do you have a pointer to the source and description? It might be useful
to look at to do a reality check on what we're talking about.
James
Hi Kevin, Mark, all,
Yes, from our brief discussions at ELC, and all the ensuing discussions that have happened in the last few weeks, it certainly seems like a good time to think about:
- what is a good model to tie up device idleness, latencies, constraints with cpu idle infrastructure - extensions to PM_QOS, part of what is being discussed, especially Kevin's earlier mail about QOS parameter per subsystem/device that may have independent clock/power domain control.
- what is a good infrastructure to subsequently allow platform-specific low power state - extensions to cpuidle infrastructure to allow platform-wide low power state? Exact conditions for such entry/exit into low power state (latency, wake, etc.) could be platform specific.
Is it a good idea to discuss about a model that could be applicable to other SOCs/platforms as well?
Thanks
Rajeev
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Kevin Hilman
Sent: Thursday, June 03, 2010 10:28 PM
To: Gross, Mark
Cc: Neil Brown; [email protected]; Peter Zijlstra; [email protected]; LKML; Florian Mickler; James Bottomley; Thomas Gleixner; Linux OMAP Mailing List; Linux PM; Alan Cox
Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
"Gross, Mark" <[email protected]> writes:
>>-----Original Message-----
>>From: Kevin Hilman [mailto:[email protected]]
>>Sent: Thursday, June 03, 2010 7:43 AM
>>To: Peter Zijlstra
>>Cc: Alan Cox; Gross, Mark; Florian Mickler; James Bottomley; Arve
>>Hj?nnev?g; Neil Brown; [email protected]; LKML; Thomas Gleixner; Linux OMAP
>>Mailing List; Linux PM; [email protected]
>>Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)
>>
>>Peter Zijlstra <[email protected]> writes:
>>
>>> On Thu, 2010-06-03 at 11:03 +0100, Alan Cox wrote:
>>>> > [mtg: ] This has been a pain point for the PM_QOS implementation.
>>>> They change the constrain back and forth at the transaction level of
>>>> the i2c driver. The pm_qos code really wasn't made to deal with such
>>>> hot path use, as each such change triggers a re-computation of what
>>>> the aggregate qos request is.
>>>>
>>>> That should be trivial in the usual case because 99% of the time you can
>>>> hot path
>>>>
>>>> the QoS entry changing is the latest one
>>>> there have been no other changes
>>>> If it is valid I can use the cached previous aggregate I cunningly
>>>> saved in the top QoS entry when I computed the new one
>>>>
>>>> (ie most of the time from the kernel side you have a QoS stack)
>>>
>>> Why would the kernel change the QoS state of a task? Why not have two
>>> interacting QoS variables, one for the task, one for the subsystem in
>>> question, and the action depends on their relative value?
>>
>>Yes, having a QoS parameter per-subsystem (or even per-device) is very
>>important for SoCs that have independently controlled powerdomains.
>>If all devices/subsystems in a particular powerdomain have QoS
>>parameters that permit, the power state of that powerdomain can be
>>lowered independently from system-wide power state and power states of
>>other power domains.
>>
> This seems similar to that pm_qos generalization into bus drivers we where
> waving our hands at during the collab summit in April? We never did get
> into meaningful detail at that time.
The hand-waving was around how to generalize it into the driver-model,
or PM QoS. We're already doing this for OMAP, but in an OMAP-specific
way, but it's become clear that this is something useful to
generalize.
Kevin
_______________________________________________
linux-pm mailing list
[email protected]
https://lists.linux-foundation.org/mailman/listinfo/linux-pm
>> Yes, having a QoS parameter per-subsystem (or even per-device) is very
>> important for SoCs that have independently controlled powerdomains.
>> If all devices/subsystems in a particular powerdomain have QoS
>> parameters that permit, the power state of that powerdomain can be
>> lowered independently from system-wide power state and power states of
>> other power domains.
>>
> This seems similar to that pm_qos generalization into bus drivers we where
> waving our hands at during the collab summit in April? We never did get
> into meaningful detail at that time.
>
> --mgross
I think there is definitely a need for QoS parameters per-device. I've
been pondering how to incorporate this concept into runtime_pm. One
idea would be to add pm_qos-like callbacks to struct dev_pm_ops, e.g.
runtime_pm_qos_add/update/remove_requirement(). Requirements would be
passed up the tree to the first parent that cares, usually a bus driver.
Is this similar to what you guys were discussing at the collab summit?
Thanks.
- Bryan