2010-06-01 10:50:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Tue, 1 Jun 2010, Neil Brown wrote:
>
> I think you have acknowledged that there is a race with suspend - thanks.
> Next step was "can it be closed".
> You seem to suggest that it can, but you describe it as a "work around"
> rather than a "bug fix"...
>
> Do you agree that the race is a "bug", and therefore it is appropriate to
> "fix" it assuming an acceptable fix can be found (which I think it can)?

If we can fix it, yes we definitely should do and not work around it.

Thanks,

tglx


2010-06-02 05:32:53

by NeilBrown

[permalink] [raw]
Subject: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Tue, 1 Jun 2010, Neil Brown wrote:
> >
> > I think you have acknowledged that there is a race with suspend - thanks.
> > Next step was "can it be closed".
> > You seem to suggest that it can, but you describe it as a "work around"
> > rather than a "bug fix"...
> >
> > Do you agree that the race is a "bug", and therefore it is appropriate to
> > "fix" it assuming an acceptable fix can be found (which I think it can)?
>
> If we can fix it, yes we definitely should do and not work around it.
>
> Thanks,
>
> tglx

OK.
Here is my suggestion.

While I think this patch would actually work, and hope the ugly aspects are
reasonably balanced by the simplicity, I present it primarily as a base for
improvement.
The important part is to present how drivers and user-space can co-operate
to avoid losing wake-events. The details of what happens in the kernel are
certainly up for discussion (as is everything else really of course).

The user-space suspend daemon avoids losing wake-events by using
fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
is ready to be read by user-space. This may involve:
- the one daemon processing all wake events
- Both the suspend daemon and the main event handling daemon opening any
given device that delivers wake events (this should work with input
events ... unless grabbing is needed)
- The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
using poll/select to get the events itself.

When 'mem' is written to /sys/power/state, suspend_prepare waits in an
interruptible wait until any wake-event that might have been initiated before
the suspend was request, has had a chance to be queued for user-space and
trigger kill_fasync.
Currently this wait is a configurable time after the last wake-event was
initiated. This is hackish, but simple and probably adequate.
If more precise timing is needed and achievable, that can be added later. It
would be an entirely internal change and would not affect the API further.
Some of the code developed for suspend-blockers might be a starting point for
this.

Drivers should call pm_suspend_delay() whenever a wake-event occurs. This
simply records the time so that the suspend process knows if there is in fact
any need to wait at all.

The delay to wait after the last pm_suspend_delay() is limited to 10 seconds
and can be set by kernel parameter suspend_block_delay=number-of-milliseconds
It defaults to 2 jiffies (which is possibly too short).

- Would this fix the "bug"??
- and address the issues that suspend-blockers was created to address?
- or are the requirements on user-space too onerous?


Thanks,
NeilBrown

Signed-off-by: NeilBrown <[email protected]>

diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 5e781d8..ccbadd0 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -142,11 +142,13 @@ extern void arch_suspend_disable_irqs(void);
extern void arch_suspend_enable_irqs(void);

extern int pm_suspend(suspend_state_t state);
+extern void pm_suspend_delay(void);
#else /* !CONFIG_SUSPEND */
#define suspend_valid_only_mem NULL

static inline void suspend_set_ops(struct platform_suspend_ops *ops) {}
static inline int pm_suspend(suspend_state_t state) { return -ENOSYS; }
+static inlint void pm_suspend_delay(void) {}
#endif /* !CONFIG_SUSPEND */

/* struct pbe is used for creating lists of pages that should be restored
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 56e7dbb..07897b9 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -46,6 +46,69 @@ bool valid_state(suspend_state_t state)
return suspend_ops && suspend_ops->valid && suspend_ops->valid(state);
}

+/*
+ * Devices that process potential wake-up events report each
+ * wake-up events by pm_suspend_delay();
+ * This ensures that suspend won't happen for a "little while"
+ * so the event has a chance to get to user-space.
+ * pm_suspend calls wait_for_blockers to wait the required
+ * "little while" and to check for signals.
+ * A process that requests a suspend should arrange (via
+ * fcntl(F_GETOWN)) to get signalled whenever a wake-up event
+ * is queued for user-space. This will ensure that if a suspend
+ * is requested at much the same time as a wakeup event arrives, either
+ * the suspend will be interrupted, or it will complete quickly.
+ *
+ * The "little while" is a heuristic to avoid having to explicitly
+ * track every event through the kernel with associated locking and unlocking.
+ * It should be more than the longest time it can take between an interrupt
+ * occurring and the corresponding event being queued to userspace
+ * (and the accompanying kill_fasync call).
+ * This duration is configurable at boot time, has a lower limit of 2
+ * jiffies and an upper limit of 10 seconds. It defaults to the minimum.
+ */
+static unsigned long little_while_jiffies = 2;
+static int __init setup_suspend_block_delay(char *str)
+{
+ unsigned long msec;
+ if (sscanf(str, "%lu", &msec) != 1)
+ return 1;
+ if (msec > 10000)
+ msec = 10000;
+ little_while_jiffies = msecs_to_jiffies(msec);
+ if (little_while_jiffies < 2)
+ little_while_jiffies = 2;
+ return 1;
+}
+__setup("suspend_block_delay=", setup_suspend_block_delay);
+
+static unsigned long next_little_while;
+void pm_suspend_delay()
+{
+ unsigned long then = jiffies + little_while_jiffies;
+
+ if (then != next_little_while)
+ next_little_while = then;
+}
+EXPORT_SYMBOL_GPL(pm_suspend_delay);
+
+static int wait_for_blockers(void)
+{
+ unsigned long timeout;
+
+ if (time_after(jiffies, next_little_while))
+ return 0;
+ timeout = next_little_while - jiffies;
+ if (timeout > msecs_to_jiffies(10000))
+ /* jiffy wrap */
+ return 0;
+
+ while (timeout && !signal_pending(current))
+ timeout = schedule_timeout_interruptible(timeout);
+ if (signal_pending(current))
+ return -EINTR;
+ return 0;
+}
/**
* suspend_valid_only_mem - generic memory-only valid callback
*
@@ -89,6 +152,10 @@ static int suspend_prepare(void)
if (error)
goto Finish;

+ error = wait_for_blockers();
+ if (error)
+ goto Finish;
+
error = usermodehelper_disable();
if (error)
goto Finish;

2010-06-02 07:05:18

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Tue, Jun 1, 2010 at 10:32 PM, Neil Brown <[email protected]> wrote:
> On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
>> On Tue, 1 Jun 2010, Neil Brown wrote:
>> >
>> > I think you have acknowledged that there is a race with suspend - thanks.
>> > Next step was "can it be closed".
>> > You seem to suggest that it can, but you describe it as a "work around"
>> > rather than a "bug fix"...
>> >
>> > Do you agree that the race is a "bug", and therefore it is appropriate to
>> > "fix" it assuming an acceptable fix can be found (which I think it can)?
>>
>> If we can fix it, yes we definitely should do and not work around it.
>>
>> Thanks,
>>
>> ? ? ? tglx
>
> OK.
> Here is my suggestion.
>
> While I think this patch would actually work, and hope the ugly aspects are
> reasonably balanced by the simplicity, I present it primarily as a base for
> improvement.
> The important part is to present how drivers and user-space can co-operate
> to avoid losing wake-events. ?The details of what happens in the kernel are
> certainly up for discussion (as is everything else really of course).
>
> The user-space suspend daemon avoids losing wake-events by using
> fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> is ready to be read by user-space. ?This may involve:
> ?- the one daemon processing all wake events

Wake up events are not all processed by one daemon.

> ?- Both the suspend daemon and the main event handling daemon opening any
> ? ?given device that delivers wake events (this should work with input
> ? ?events ... unless grabbing is needed)

Not all wakeup events are broadcast like input events so they cannot
be read by both daemons. Not that this really matters, since reading
the event from the suspend daemon does not mean that it has been
delivered to and processed by the other daemon.

> ?- The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
> ? ?using poll/select to get the events itself.

I don't like the idea of using signals for this. Without the hack Alan
Stern suggested, you will temporarily block suspend if the wakeup
event happened before the suspend daemon thread made it to the kernel,
but abort suspend if it happened right after.

>
> When 'mem' is written to /sys/power/state, suspend_prepare waits in an
> interruptible wait until any wake-event that might have been initiated before
> the suspend was request, has had a chance to be queued for user-space and
> trigger kill_fasync.

And what happens if you are not waiting when this happens?

> Currently this wait is a configurable time after the last wake-event was
> initiated. ?This is hackish, but simple and probably adequate.

Waiting after a wake event is the same as suspend_block_timeout. This
is useful when passing events through layers of code that does no
block suspend, but we should strive to avoid it. Other people had much
stronger objections to this, which is why it is not included in the
last suspend blocker patchset.

It also does not work for drivers that need to block suspend for more
than a few seconds. For instance the gpio keypad matrix driver needs
to block suspend while keys are pressed so it can scan the keypad.

> If more precise timing is needed and achievable, that can be added later. ?It
> would be an entirely internal change and would not affect the API further.
> Some of the code developed for suspend-blockers might be a starting point for
> this.
>
> Drivers should call pm_suspend_delay() whenever a wake-event occurs. ?This
> simply records the time so that the suspend process knows if there is in fact
> any need to wait at all.
>
> The delay to wait after the last pm_suspend_delay() is limited to 10 seconds
> and can be set by kernel parameter suspend_block_delay=number-of-milliseconds
> It defaults to 2 jiffies (which is possibly too short).
>
> - Would this fix the "bug"??
> - and address the issues that suspend-blockers was created to address?
> - or are the requirements on user-space too onerous?
>
>
> Thanks,
> NeilBrown
>
> Signed-off-by: NeilBrown <[email protected]>
>
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index 5e781d8..ccbadd0 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -142,11 +142,13 @@ extern void arch_suspend_disable_irqs(void);
> ?extern void arch_suspend_enable_irqs(void);
>
> ?extern int pm_suspend(suspend_state_t state);
> +extern void pm_suspend_delay(void);
> ?#else /* !CONFIG_SUSPEND */
> ?#define suspend_valid_only_mem NULL
>
> ?static inline void suspend_set_ops(struct platform_suspend_ops *ops) {}
> ?static inline int pm_suspend(suspend_state_t state) { return -ENOSYS; }
> +static inlint void pm_suspend_delay(void) {}
> ?#endif /* !CONFIG_SUSPEND */
>
> ?/* struct pbe is used for creating lists of pages that should be restored
> diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
> index 56e7dbb..07897b9 100644
> --- a/kernel/power/suspend.c
> +++ b/kernel/power/suspend.c
> @@ -46,6 +46,69 @@ bool valid_state(suspend_state_t state)
> ? ? ? ?return suspend_ops && suspend_ops->valid && suspend_ops->valid(state);
> ?}
>
> +/*
> + * Devices that process potential wake-up events report each
> + * wake-up events by pm_suspend_delay();
> + * This ensures that suspend won't happen for a "little while"
> + * so the event has a chance to get to user-space.
> + * pm_suspend calls wait_for_blockers to wait the required
> + * "little while" and to check for signals.
> + * A process that requests a suspend should arrange (via
> + * fcntl(F_GETOWN)) to get signalled whenever a wake-up event
> + * is queued for user-space. ?This will ensure that if a suspend
> + * is requested at much the same time as a wakeup event arrives, either
> + * the suspend will be interrupted, or it will complete quickly.
> + *
> + * The "little while" is a heuristic to avoid having to explicitly
> + * track every event through the kernel with associated locking and unlocking.
> + * It should be more than the longest time it can take between an interrupt
> + * occurring and the corresponding event being queued to userspace
> + * (and the accompanying kill_fasync call).
> + * This duration is configurable at boot time, has a lower limit of 2
> + * jiffies and an upper limit of 10 seconds. ?It defaults to the minimum.
> + */
> +static unsigned long little_while_jiffies = 2;
> +static int __init setup_suspend_block_delay(char *str)
> +{
> + ? ? ? unsigned long msec;
> + ? ? ? if (sscanf(str, "%lu", &msec) != 1)
> + ? ? ? ? ? ? ? return 1;
> + ? ? ? if (msec > 10000)
> + ? ? ? ? ? ? ? msec = 10000;
> + ? ? ? little_while_jiffies = msecs_to_jiffies(msec);
> + ? ? ? if (little_while_jiffies < 2)
> + ? ? ? ? ? ? ? little_while_jiffies = 2;
> + ? ? ? return 1;
> +}
> +__setup("suspend_block_delay=", setup_suspend_block_delay);
> +
> +static unsigned long next_little_while;
> +void pm_suspend_delay()
> +{
> + ? ? ? unsigned long then = jiffies + little_while_jiffies;
> +
> + ? ? ? if (then != next_little_while)
> + ? ? ? ? ? ? ? next_little_while = then;
> +}
> +EXPORT_SYMBOL_GPL(pm_suspend_delay);
> +
> +static int wait_for_blockers(void)
> +{
> + ? ? ? unsigned long timeout;
> +
> + ? ? ? if (time_after(jiffies, next_little_while))
> + ? ? ? ? ? ? ? return 0;
> + ? ? ? timeout = next_little_while - jiffies;
> + ? ? ? if (timeout > msecs_to_jiffies(10000))
> + ? ? ? ? ? ? ? /* jiffy wrap */
> + ? ? ? ? ? ? ? return 0;
> +
> + ? ? ? while (timeout && !signal_pending(current))
> + ? ? ? ? ? ? ? timeout = schedule_timeout_interruptible(timeout);
> + ? ? ? if (signal_pending(current))
> + ? ? ? ? ? ? ? return -EINTR;
> + ? ? ? return 0;
> +}
> ?/**
> ?* suspend_valid_only_mem - generic memory-only valid callback
> ?*
> @@ -89,6 +152,10 @@ static int suspend_prepare(void)
> ? ? ? ?if (error)
> ? ? ? ? ? ? ? ?goto Finish;
>
> + ? ? ? error = wait_for_blockers();
> + ? ? ? if (error)
> + ? ? ? ? ? ? ? goto Finish;
> +
> ? ? ? ?error = usermodehelper_disable();
> ? ? ? ?if (error)
> ? ? ? ? ? ? ? ?goto Finish;
>
>



--
Arve Hj?nnev?g

2010-06-02 08:06:32

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 00:05:14 -0700
Arve Hjønnevåg <[email protected]> wrote:

> On Tue, Jun 1, 2010 at 10:32 PM, Neil Brown <[email protected]> wrote:
> > On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> >
> >> On Tue, 1 Jun 2010, Neil Brown wrote:
> >> >
> >> > I think you have acknowledged that there is a race with suspend - thanks.
> >> > Next step was "can it be closed".
> >> > You seem to suggest that it can, but you describe it as a "work around"
> >> > rather than a "bug fix"...
> >> >
> >> > Do you agree that the race is a "bug", and therefore it is appropriate to
> >> > "fix" it assuming an acceptable fix can be found (which I think it can)?
> >>
> >> If we can fix it, yes we definitely should do and not work around it.
> >>
> >> Thanks,
> >>
> >>       tglx
> >
> > OK.
> > Here is my suggestion.
> >
> > While I think this patch would actually work, and hope the ugly aspects are
> > reasonably balanced by the simplicity, I present it primarily as a base for
> > improvement.
> > The important part is to present how drivers and user-space can co-operate
> > to avoid losing wake-events.  The details of what happens in the kernel are
> > certainly up for discussion (as is everything else really of course).
> >
> > The user-space suspend daemon avoids losing wake-events by using
> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> > is ready to be read by user-space.  This may involve:
> >  - the one daemon processing all wake events
>
> Wake up events are not all processed by one daemon.

Not with your current user-space code, no. Are you saying that you are not
open to any significant change in the Android user-space code? That would
make the situation a lot harder to resolve.

>
> >  - Both the suspend daemon and the main event handling daemon opening any
> >    given device that delivers wake events (this should work with input
> >    events ... unless grabbing is needed)
>
> Not all wakeup events are broadcast like input events so they cannot
> be read by both daemons. Not that this really matters, since reading
> the event from the suspend daemon does not mean that it has been
> delivered to and processed by the other daemon.

There would still need to be some sort of communication between the the
suspend daemon on any event daemon to ensure that the events had been
processed. This could be very light weight interaction. The point though is
that with this patch it becomes possible to avoid races. Possible is better
than impossible.

>
> >  - The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
> >    using poll/select to get the events itself.
>
> I don't like the idea of using signals for this. Without the hack Alan
> Stern suggested, you will temporarily block suspend if the wakeup
> event happened before the suspend daemon thread made it to the kernel,
> but abort suspend if it happened right after.

I'm not sure why that difference matters. But I'm also not sure that it is
true.
When any wakeup event happen, a signal will be delivered to the suspend
daemon.
This will interrupt a pending suspend request, or a sleep, or whatever else
the daemon is doing.
It can then go back to waiting for a good time to suspend, and then try to
suspend again.


>
> >
> > When 'mem' is written to /sys/power/state, suspend_prepare waits in an
> > interruptible wait until any wake-event that might have been initiated before
> > the suspend was request, has had a chance to be queued for user-space and
> > trigger kill_fasync.
>
> And what happens if you are not waiting when this happens?

I'm not sure I understand the question. Could you explain it please?

Either the initial event happens late enough to abort/resume the suspend, or
the signal happens early enough to abort the suspend, or alert the daemon not
to do a suspend. Either way we don't get stuck in suspend.


>
> > Currently this wait is a configurable time after the last wake-event was
> > initiated.  This is hackish, but simple and probably adequate.
>
> Waiting after a wake event is the same as suspend_block_timeout. This
> is useful when passing events through layers of code that does no
> block suspend, but we should strive to avoid it. Other people had much
> stronger objections to this, which is why it is not included in the
> last suspend blocker patchset.

Absolutely agree. The idea of of waiting was just a simple way to present
code that actually could work. There are doubtlessly better ways and I
assume they have been implemented in the suspend-blocker code.
We just need some way to wait for the suspend-block count to reach zero, with
some confidence that this amount of time is limited.

(though to be honest ... the incredible simplicity of waiting a little while
is very compelling.... :-))

>
> It also does not work for drivers that need to block suspend for more
> than a few seconds. For instance the gpio keypad matrix driver needs
> to block suspend while keys are pressed so it can scan the keypad.

I cannot imagine why it would take multiple seconds to scan a keypad.
Can you explain that?

Do you mean while keys are held pressed? Maybe you don't get a wake-up event
on key-release? In that case your user-space daemon could block suspend
while there are any pressed keys.... confused.


Thanks for the review,

NeilBrown

2010-06-02 08:50:53

by Florian Mickler

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 18:06:14 +1000
Neil Brown <[email protected]> wrote:

> I cannot imagine why it would take multiple seconds to scan a keypad.
> Can you explain that?
>
> Do you mean while keys are held pressed? Maybe you don't get a wake-up event
> on key-release? In that case your user-space daemon could block suspend
> while there are any pressed keys.... confused.

IIRC, the device sends interrupts only for first key-down and
last key-up.
To detect simultaneous key-presses you must actively scan it after the
first key-down.


>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-omap" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

2010-06-02 09:12:13

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010/6/2 Neil Brown <[email protected]>:
> On Wed, 2 Jun 2010 00:05:14 -0700
> Arve Hj?nnev?g <[email protected]> wrote:
>
>> On Tue, Jun 1, 2010 at 10:32 PM, Neil Brown <[email protected]> wrote:
>> > On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
>> > Thomas Gleixner <[email protected]> wrote:
>> >
>> >> On Tue, 1 Jun 2010, Neil Brown wrote:
>> >> >
>> >> > I think you have acknowledged that there is a race with suspend - thanks.
>> >> > Next step was "can it be closed".
>> >> > You seem to suggest that it can, but you describe it as a "work around"
>> >> > rather than a "bug fix"...
>> >> >
>> >> > Do you agree that the race is a "bug", and therefore it is appropriate to
>> >> > "fix" it assuming an acceptable fix can be found (which I think it can)?
>> >>
>> >> If we can fix it, yes we definitely should do and not work around it.
>> >>
>> >> Thanks,
>> >>
>> >> ? ? ? tglx
>> >
>> > OK.
>> > Here is my suggestion.
>> >
>> > While I think this patch would actually work, and hope the ugly aspects are
>> > reasonably balanced by the simplicity, I present it primarily as a base for
>> > improvement.
>> > The important part is to present how drivers and user-space can co-operate
>> > to avoid losing wake-events. ?The details of what happens in the kernel are
>> > certainly up for discussion (as is everything else really of course).
>> >
>> > The user-space suspend daemon avoids losing wake-events by using
>> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
>> > is ready to be read by user-space. ?This may involve:
>> > ?- the one daemon processing all wake events
>>
>> Wake up events are not all processed by one daemon.
>
> Not with your current user-space code, no. ?Are you saying that you are not
> open to any significant change in the Android user-space code? ?That would
> make the situation a lot harder to resolve.
>

Some wakeup events like the an incoming phone may be handled by a
vendor supplied daemon that I do not have the source code for. And, no
I'm not open to a change that would require all wakeup events to go to
a single process.

>>
>> > ?- Both the suspend daemon and the main event handling daemon opening any
>> > ? ?given device that delivers wake events (this should work with input
>> > ? ?events ... unless grabbing is needed)
>>
>> Not all wakeup events are broadcast like input events so they cannot
>> be read by both daemons. Not that this really matters, since reading
>> the event from the suspend daemon does not mean that it has been
>> delivered to and processed by the other daemon.
>
> There would still need to be some sort of communication between the the
> suspend daemon on any event daemon to ensure that the events had been
> processed. ?This could be very light weight interaction. ?The point though is
> that with this patch it becomes possible to avoid races. ?Possible is better
> than impossible.
>

We already have a solution. I don't think rejecting our solution but
merging a worse solution should be the goal.

>>
>> > ?- The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
>> > ? ?using poll/select to get the events itself.
>>
>> I don't like the idea of using signals for this. Without the hack Alan
>> Stern suggested, you will temporarily block suspend if the wakeup
>> event happened before the suspend daemon thread made it to the kernel,
>> but abort suspend if it happened right after.
>
> I'm not sure why that difference matters. ?But I'm also not sure that it is
> true.
> When any wakeup event happen, a signal will be delivered to the suspend
> daemon.
> This will interrupt a pending suspend request, or a sleep, or whatever else
> the daemon is doing.
> It can then go back to waiting for a good time to suspend, and then try to
> suspend again.
>

This is inferior to the solution that is in the android kernel and the
suspend blocker patchset. Both suspend as soon as possible and do not
require signal handlers that modify the argument to your kernel call.

>
>>
>> >
>> > When 'mem' is written to /sys/power/state, suspend_prepare waits in an
>> > interruptible wait until any wake-event that might have been initiated before
>> > the suspend was request, has had a chance to be queued for user-space and
>> > trigger kill_fasync.
>>
>> And what happens if you are not waiting when this happens?
>
> I'm not sure I understand the question. ?Could you explain it please?
>

If the thread is not already in the kernel how does your signal
handler abort suspend.

> Either the initial event happens late enough to abort/resume the suspend, or
> the signal happens early enough to abort the suspend, or alert the daemon not
> to do a suspend. ?Either way we don't get stuck in suspend.
>
>
>>
>> > Currently this wait is a configurable time after the last wake-event was
>> > initiated. ?This is hackish, but simple and probably adequate.
>>
>> Waiting after a wake event is the same as suspend_block_timeout. This
>> is useful when passing events through layers of code that does no
>> block suspend, but we should strive to avoid it. Other people had much
>> stronger objections to this, which is why it is not included in the
>> last suspend blocker patchset.
>
> Absolutely agree. ?The idea of of waiting was just a simple way to present
> code that actually could work. ?There are doubtlessly better ways and I
> assume they have been implemented in the suspend-blocker code.
> We just need some way to wait for the suspend-block count to reach zero, with
> some confidence that this amount of time is limited.
>
> (though to be honest ... the incredible simplicity of waiting a little while
> is very compelling.... :-))
>

Sure, but forcing that as the only way to prevent suspend is taking to too far.

>>
>> It also does not work for drivers that need to block suspend for more
>> than a few seconds. For instance the gpio keypad matrix driver needs
>> to block suspend while keys are pressed so it can scan the keypad.
>
> I cannot imagine why it would take multiple seconds to scan a keypad.
> Can you explain that?
>
> Do you mean while keys are held pressed?

Yes.

> ?Maybe you don't get a wake-up event
> on key-release?

We should.

> ?In that case your user-space daemon could block suspend
> while there are any pressed keys.... ?confused.
>

The user-space daemon should not need to know which keys are in a
matrix. We also have other drivers that need to block suspend. For
instance, some devices need to block suspend while connected to a USB
host.

>
> Thanks for the review,
>
> NeilBrown
>
>



--
Arve Hj?nnev?g

2010-06-02 09:33:46

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010, Arve Hj?nnev?g wrote:
> 2010/6/2 Neil Brown <[email protected]>:
> > There would still need to be some sort of communication between the the
> > suspend daemon on any event daemon to ensure that the events had been
> > processed. ?This could be very light weight interaction. ?The point though is
> > that with this patch it becomes possible to avoid races. ?Possible is better
> > than impossible.
> >
>
> We already have a solution. I don't think rejecting our solution but
> merging a worse solution should be the goal.

That's not the goal at all. We want a solution which is acceptable for
android and OTOH does not get into the way of other approaches.

The main problem I have is that suspend blockers are only addressing
one particular problem space of power management.

We have more requirements than that, e.g. an active device transfer
requires to prevent the idle code to select a deep power state due to
latency requirements.

So we then have to implement two mechanisms in the relevant drivers:

1) telling the idle code to limit latency
2) telling the suspend code not to suspend

My main interest is to limit it to one mechanism, which is QoS based
and let idle and suspend make the appropriate decisions based on that
information.

Thanks,

tglx



2010-06-02 09:53:18

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010/6/2 Thomas Gleixner <[email protected]>:
> On Wed, 2 Jun 2010, Arve Hj?nnev?g wrote:
>> 2010/6/2 Neil Brown <[email protected]>:
>> > There would still need to be some sort of communication between the the
>> > suspend daemon on any event daemon to ensure that the events had been
>> > processed. ?This could be very light weight interaction. ?The point though is
>> > that with this patch it becomes possible to avoid races. ?Possible is better
>> > than impossible.
>> >
>>
>> We already have a solution. I don't think rejecting our solution but
>> merging a worse solution should be the goal.
>
> That's not the goal at all. We want a solution which is acceptable for
> android and OTOH does not get into the way of other approaches.
>

I don't actually think the suspend blocker patchset get in the way of
anything else.

> The main problem I have is that suspend blockers are only addressing
> one particular problem space of power management.
>
> We have more requirements than that, e.g. an active device transfer
> requires to prevent the idle code to select a deep power state due to
> latency requirements.
>
> So we then have to implement two mechanisms in the relevant drivers:
>
> ? 1) telling the idle code to limit latency
> ? 2) telling the suspend code not to suspend

And 3) telling the idle code to not enter low power modes that disrupt
active interrupts or clocks.

Our wakelock code handles 2 and 3, but I removed support for 3 on
request since you can hack it by specifying a latency value that you
know the low power mode cannot support.

>
> My main interest is to limit it to one mechanism, which is QoS based
> and let idle and suspend make the appropriate decisions based on that
> information.
>

We can use one mechanism for this, but we still have to specify both.
To me this is just another naming argument and not a good reason to
not merge the suspend blocker code. You have to modify the same
drivers if you call suspend_block() as if you call
pm_qos_update_requirement(don't suspend). We have to specify when it
is not safe to suspend independent of when it is not safe to enter low
power idle modes so unless you want to have a bitmap of constraints
you don't save any calls. And, if we later get a constraint framework
that supports everything, we can switch to it then and we will then
already have some drivers annotated.

--
Arve Hj?nnev?g

2010-06-02 10:23:15

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 10:50:39 +0200
Florian Mickler <[email protected]> wrote:

> On Wed, 2 Jun 2010 18:06:14 +1000
> Neil Brown <[email protected]> wrote:
>
> > I cannot imagine why it would take multiple seconds to scan a keypad.
> > Can you explain that?
> >
> > Do you mean while keys are held pressed? Maybe you don't get a wake-up event
> > on key-release? In that case your user-space daemon could block suspend
> > while there are any pressed keys.... confused.
>
> IIRC, the device sends interrupts only for first key-down and
> last key-up.
> To detect simultaneous key-presses you must actively scan it after the
> first key-down.

That makes sense - thanks.
Presumably the first key-press gets to user-space promptly, so the user-space
suspend daemon can be told not to suspend until the last key-up.

Thanks,
NeilBrown

2010-06-02 11:02:41

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 02:12:10 -0700
Arve Hjønnevåg <[email protected]> wrote:

> 2010/6/2 Neil Brown <[email protected]>:
> > On Wed, 2 Jun 2010 00:05:14 -0700
> > Arve Hjønnevåg <[email protected]> wrote:
> >
> >> On Tue, Jun 1, 2010 at 10:32 PM, Neil Brown <[email protected]> wrote:
> >> > On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
> >> > Thomas Gleixner <[email protected]> wrote:
> >> >
> >> >> On Tue, 1 Jun 2010, Neil Brown wrote:
> >> >> >
> >> >> > I think you have acknowledged that there is a race with suspend - thanks.
> >> >> > Next step was "can it be closed".
> >> >> > You seem to suggest that it can, but you describe it as a "work around"
> >> >> > rather than a "bug fix"...
> >> >> >
> >> >> > Do you agree that the race is a "bug", and therefore it is appropriate to
> >> >> > "fix" it assuming an acceptable fix can be found (which I think it can)?
> >> >>
> >> >> If we can fix it, yes we definitely should do and not work around it.
> >> >>
> >> >> Thanks,
> >> >>
> >> >>       tglx
> >> >
> >> > OK.
> >> > Here is my suggestion.
> >> >
> >> > While I think this patch would actually work, and hope the ugly aspects are
> >> > reasonably balanced by the simplicity, I present it primarily as a base for
> >> > improvement.
> >> > The important part is to present how drivers and user-space can co-operate
> >> > to avoid losing wake-events.  The details of what happens in the kernel are
> >> > certainly up for discussion (as is everything else really of course).
> >> >
> >> > The user-space suspend daemon avoids losing wake-events by using
> >> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> >> > is ready to be read by user-space.  This may involve:
> >> >  - the one daemon processing all wake events
> >>
> >> Wake up events are not all processed by one daemon.
> >
> > Not with your current user-space code, no.  Are you saying that you are not
> > open to any significant change in the Android user-space code?  That would
> > make the situation a lot harder to resolve.
> >
>
> Some wakeup events like the an incoming phone may be handled by a
> vendor supplied daemon that I do not have the source code for. And, no
> I'm not open to a change that would require all wakeup events to go to
> a single process.

Ahh.. Well I have no answer for the "I must support a closed-source app"
card that has not been heard 1000 times already.

My proposal doesn't require all wakeup events to go through one single
process - it was just one of (at least) 3 options.

>
> >>
> >> >  - Both the suspend daemon and the main event handling daemon opening any
> >> >    given device that delivers wake events (this should work with input
> >> >    events ... unless grabbing is needed)
> >>
> >> Not all wakeup events are broadcast like input events so they cannot
> >> be read by both daemons. Not that this really matters, since reading
> >> the event from the suspend daemon does not mean that it has been
> >> delivered to and processed by the other daemon.
> >
> > There would still need to be some sort of communication between the the
> > suspend daemon on any event daemon to ensure that the events had been
> > processed.  This could be very light weight interaction.  The point though is
> > that with this patch it becomes possible to avoid races.  Possible is better
> > than impossible.
> >
>
> We already have a solution. I don't think rejecting our solution but
> merging a worse solution should be the goal.
>
> >>
> >> >  - The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
> >> >    using poll/select to get the events itself.
> >>
> >> I don't like the idea of using signals for this. Without the hack Alan
> >> Stern suggested, you will temporarily block suspend if the wakeup
> >> event happened before the suspend daemon thread made it to the kernel,
> >> but abort suspend if it happened right after.
> >
> > I'm not sure why that difference matters.  But I'm also not sure that it is
> > true.
> > When any wakeup event happen, a signal will be delivered to the suspend
> > daemon.
> > This will interrupt a pending suspend request, or a sleep, or whatever else
> > the daemon is doing.
> > It can then go back to waiting for a good time to suspend, and then try to
> > suspend again.
> >
>
> This is inferior to the solution that is in the android kernel and the
> suspend blocker patchset. Both suspend as soon as possible and do not
> require signal handlers that modify the argument to your kernel call.
>

The solution in the android kernel and the suspend blocker patchset both
share one fairly fatal flaw - they are not being accepted upstream.
I am trying to find a minimal suitable solution that does not share that
flaw.
I do not know yet if it does or not, but as it is fixing a real (design) bug,
I feel it has some chance. Of course if it doesn't meet your need, then
that becomes irrelevant....

And there is no requirement to modify any arguments in any signal handlers.

> >
> >>
> >> >
> >> > When 'mem' is written to /sys/power/state, suspend_prepare waits in an
> >> > interruptible wait until any wake-event that might have been initiated before
> >> > the suspend was request, has had a chance to be queued for user-space and
> >> > trigger kill_fasync.
> >>
> >> And what happens if you are not waiting when this happens?
> >
> > I'm not sure I understand the question.  Could you explain it please?
> >
>
> If the thread is not already in the kernel how does your signal
> handler abort suspend.

setjmp / longjmp. This is the time-honoured method for allowing a signal to
break the flow of a program and re-start somewhere else.

>
> > Either the initial event happens late enough to abort/resume the suspend, or
> > the signal happens early enough to abort the suspend, or alert the daemon not
> > to do a suspend.  Either way we don't get stuck in suspend.
> >
> >
> >>
> >> > Currently this wait is a configurable time after the last wake-event was
> >> > initiated.  This is hackish, but simple and probably adequate.
> >>
> >> Waiting after a wake event is the same as suspend_block_timeout. This
> >> is useful when passing events through layers of code that does no
> >> block suspend, but we should strive to avoid it. Other people had much
> >> stronger objections to this, which is why it is not included in the
> >> last suspend blocker patchset.
> >
> > Absolutely agree.  The idea of of waiting was just a simple way to present
> > code that actually could work.  There are doubtlessly better ways and I
> > assume they have been implemented in the suspend-blocker code.
> > We just need some way to wait for the suspend-block count to reach zero, with
> > some confidence that this amount of time is limited.
> >
> > (though to be honest ... the incredible simplicity of waiting a little while
> > is very compelling.... :-))
> >
>
> Sure, but forcing that as the only way to prevent suspend is taking to too far.
>
> >>
> >> It also does not work for drivers that need to block suspend for more
> >> than a few seconds. For instance the gpio keypad matrix driver needs
> >> to block suspend while keys are pressed so it can scan the keypad.
> >
> > I cannot imagine why it would take multiple seconds to scan a keypad.
> > Can you explain that?
> >
> > Do you mean while keys are held pressed?
>
> Yes.
>
> >  Maybe you don't get a wake-up event
> > on key-release?
>
> We should.
>
> >  In that case your user-space daemon could block suspend
> > while there are any pressed keys....  confused.
> >
>
> The user-space daemon should not need to know which keys are in a
> matrix. We also have other drivers that need to block suspend. For
> instance, some devices need to block suspend while connected to a USB
> host.

And this decision (to block suspend) really needs to be made in the driver,
not in userspace?

You could get those drivers to return EBUSY from PM_SUSPEND_PREPARE (which
would need to be a configurable option), but then I guess you have no way to
wait for the device to become non-busy.

If user-space really cannot tell if the driver is busy or not, then I would
suggest that the driver is fairly poorly designed.

It would seem then that a user-space requested suspend is not sufficient for
your needs even if we remove the race window, as you have drivers that want
to avoid suspend indefinitely, and that "need to avoid suspend" status is not
visible from user-space.
It is a pity that this extra requirement was not clear from your introduction
to the "Opportunistic suspend support" patch.

If that be the case, I'll stop bothering you with suggestions that can never
work.
Thanks for your time,
NeilBrown

2010-06-02 12:27:24

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010, Arve Hj?nnev?g wrote:

> 2010/6/2 Thomas Gleixner <[email protected]>:
> > On Wed, 2 Jun 2010, Arve Hj?nnev?g wrote:
> >> 2010/6/2 Neil Brown <[email protected]>:
> >> > There would still need to be some sort of communication between the the
> >> > suspend daemon on any event daemon to ensure that the events had been
> >> > processed. ?This could be very light weight interaction. ?The point though is
> >> > that with this patch it becomes possible to avoid races. ?Possible is better
> >> > than impossible.
> >> >
> >>
> >> We already have a solution. I don't think rejecting our solution but
> >> merging a worse solution should be the goal.
> >
> > That's not the goal at all. We want a solution which is acceptable for
> > android and OTOH does not get into the way of other approaches.
> >
>
> I don't actually think the suspend blocker patchset get in the way of
> anything else.
>
> > The main problem I have is that suspend blockers are only addressing
> > one particular problem space of power management.
> >
> > We have more requirements than that, e.g. an active device transfer
> > requires to prevent the idle code to select a deep power state due to
> > latency requirements.
> >
> > So we then have to implement two mechanisms in the relevant drivers:
> >
> > ? 1) telling the idle code to limit latency
> > ? 2) telling the suspend code not to suspend
>
> And 3) telling the idle code to not enter low power modes that disrupt
> active interrupts or clocks.
>
> Our wakelock code handles 2 and 3, but I removed support for 3 on
> request since you can hack it by specifying a latency value that you
> know the low power mode cannot support.

You are mixing concepts.

clock domains and power domains are a separate issue which are already
handled by the run time power management code and the clock framework.

The interrupt latency is a QoS requirement and has nothing to do with
power domains and clock domains simply because I can go deeper w/o
violating the clock and power domain constraints when the latency
allows it.

> > My main interest is to limit it to one mechanism, which is QoS based
> > and let idle and suspend make the appropriate decisions based on that
> > information.
> >
>
> We can use one mechanism for this, but we still have to specify both.
> To me this is just another naming argument and not a good reason to
> not merge the suspend blocker code. You have to modify the same

The main objection against suspend blocker is the user space interface
/ ABI issue, not the driver code which we can fix in no time. But we
cannot fix it once it is glued into a user space interface.

I don't care about adding two empty static inlines into a header file,
which allows to merge the android drivers, but I care much about
giving a guaranteed behaviour to user space.

Thanks,

tglx

2010-06-02 18:05:31

by Brian Swetland

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 2, 2010 at 1:06 AM, Neil Brown <[email protected]> wrote:
> On Wed, 2 Jun 2010 00:05:14 -0700
> Arve Hjønnevåg <[email protected]> wrote:
>> > The user-space suspend daemon avoids losing wake-events by using
>> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
>> > is ready to be read by user-space.  This may involve:
>> >  - the one daemon processing all wake events
>>
>> Wake up events are not all processed by one daemon.
>
> Not with your current user-space code, no.  Are you saying that you are not
> open to any significant change in the Android user-space code?  That would
> make the situation a lot harder to resolve.

There are many wakeup events possible in a typical system --
keypresses or other input events, network traffic, telephony events,
media events (fill audio buffer, fill video decoder buffer, etc), and
I think requiring that all wakeup event processing bottleneck through
a single userspace process is non-optimal here.

The current suspend-blocker proposal already involves userspace
changes (it's different than our existing wakelock interface), and
we're certainly not opposed to any/all userspace changes on principle,
but on the other hand we're not interested in significant reworks of
userspace unless they actually improve the situation somehow. I think
bottlenecking events through a central daemon would represent a step
backwards.

Brian

2010-06-02 19:05:40

by Florian Mickler

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 21:02:24 +1000
Neil Brown <[email protected]> wrote:
>
> And this decision (to block suspend) really needs to be made in the driver,
> not in userspace?

Well, it fits. The requirement is a direct consequence of the intimate
knowledge the driver has about the driven devices.
Or if you get in an upper layer, the knowledge that the subsystem has
about it's requirements to function properly. Why would you separate it out?

If all your drivers specify their needed requirements, the pm-core (or
idle) has the maximum flexibility to implement any strategy and is
guaranteed a stable system as long as it honours the given requirements.
(That's the theory, of course.)

>
> You could get those drivers to return EBUSY from PM_SUSPEND_PREPARE (which
> would need to be a configurable option), but then I guess you have no way to
> wait for the device to become non-busy.
>
> If user-space really cannot tell if the driver is busy or not, then I would
> suggest that the driver is fairly poorly designed.

In general, userspace has no business knowing if the driver is busy or
not. It would need intimate knowledge about the driver to determine if
it is busy or not. That is what the kernel is all about, to hide that
knowledge from userspace and masking it with a one-fits-all-API.


>
> It would seem then that a user-space requested suspend is not sufficient for
> your needs even if we remove the race window, as you have drivers that want
> to avoid suspend indefinitely, and that "need to avoid suspend" status is not
> visible from user-space.

Well, but the kernel-solution and the userspace solution should be
pretty independent. The tricky part here seems to be to have a
kernel-userspace-boundary that doesn't require a specific kernel
implementation and works.

Could someone perhaps make a recap on what are the problems with the
API? I have no clear eye (experience?) for that (or so it seems).

> It is a pity that this extra requirement was not clear from your introduction
> to the "Opportunistic suspend support" patch.

I think that the main problem was that _all_ the requirements were
not communicated well. That caused everybody to think that their
solution would be a better fit. You are not alone.

> If that be the case, I'll stop bothering you with suggestions that can never
> work.
> Thanks for your time,
> NeilBrown

Don't be frustrated. What should Arve be? :)

Cheers,
Flo

2010-06-02 20:40:12

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wednesday 02 June 2010, Neil Brown wrote:
> On Tue, 1 Jun 2010 12:50:01 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
> > On Tue, 1 Jun 2010, Neil Brown wrote:
> > >
> > > I think you have acknowledged that there is a race with suspend - thanks.
> > > Next step was "can it be closed".
> > > You seem to suggest that it can, but you describe it as a "work around"
> > > rather than a "bug fix"...
> > >
> > > Do you agree that the race is a "bug", and therefore it is appropriate to
> > > "fix" it assuming an acceptable fix can be found (which I think it can)?
> >
> > If we can fix it, yes we definitely should do and not work around it.
> >
> > Thanks,
> >
> > tglx
>
> OK.
> Here is my suggestion.
>
> While I think this patch would actually work, and hope the ugly aspects are
> reasonably balanced by the simplicity, I present it primarily as a base for
> improvement.
> The important part is to present how drivers and user-space can co-operate
> to avoid losing wake-events. The details of what happens in the kernel are
> certainly up for discussion (as is everything else really of course).
>
> The user-space suspend daemon avoids losing wake-events by using
> fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> is ready to be read by user-space. This may involve:
> - the one daemon processing all wake events
> - Both the suspend daemon and the main event handling daemon opening any
> given device that delivers wake events (this should work with input
> events ... unless grabbing is needed)
> - The event handling daemon giving the suspend-daemon's pid as F_OWNER, and
> using poll/select to get the events itself.
>
> When 'mem' is written to /sys/power/state, suspend_prepare waits in an
> interruptible wait until any wake-event that might have been initiated before
> the suspend was request, has had a chance to be queued for user-space and
> trigger kill_fasync.
> Currently this wait is a configurable time after the last wake-event was
> initiated. This is hackish, but simple and probably adequate.
> If more precise timing is needed and achievable, that can be added later. It
> would be an entirely internal change and would not affect the API further.
> Some of the code developed for suspend-blockers might be a starting point for
> this.
>
> Drivers should call pm_suspend_delay() whenever a wake-event occurs. This
> simply records the time so that the suspend process knows if there is in fact
> any need to wait at all.
>
> The delay to wait after the last pm_suspend_delay() is limited to 10 seconds
> and can be set by kernel parameter suspend_block_delay=number-of-milliseconds
> It defaults to 2 jiffies (which is possibly too short).
>
> - Would this fix the "bug"??
> - and address the issues that suspend-blockers was created to address?
> - or are the requirements on user-space too onerous?

In theory wakeup events can also happen after wait_for_blockers() has returned
0 and I guess we should rollback the suspend in such cases.

Rafael

2010-06-02 22:06:19

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 22:41:14 +0200
"Rafael J. Wysocki" <[email protected]> wrote:

> On Wednesday 02 June 2010, Neil Brown wrote:
> > - Would this fix the "bug"??
> > - and address the issues that suspend-blockers was created to address?
> > - or are the requirements on user-space too onerous?
>
> In theory wakeup events can also happen after wait_for_blockers() has returned
> 0 and I guess we should rollback the suspend in such cases.
>

I naively assumed this was already the case, but on a slightly closer look at
the code it seems not.

Presumably there is some point deep in the suspend code, probably after the
call to sysdev_suspend, where interrupts are disabled and we are about to
actually suspend. At that point a simple "is a roll-back required" test
could abort the suspend.
Then any driver that handles wake-up events, if it gets and event that (would
normally cause a wakeup) PM_SUSPEND_PREPARE and PM_POST_SUSPEND, could set
the "roll-back is required" flag.

??

NeilBrown

2010-06-02 22:14:26

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Thursday 03 June 2010, Neil Brown wrote:
> On Wed, 2 Jun 2010 22:41:14 +0200
> "Rafael J. Wysocki" <[email protected]> wrote:
>
> > On Wednesday 02 June 2010, Neil Brown wrote:
> > > - Would this fix the "bug"??
> > > - and address the issues that suspend-blockers was created to address?
> > > - or are the requirements on user-space too onerous?
> >
> > In theory wakeup events can also happen after wait_for_blockers() has returned
> > 0 and I guess we should rollback the suspend in such cases.
> >
>
> I naively assumed this was already the case, but on a slightly closer look at
> the code it seems not.
>
> Presumably there is some point deep in the suspend code, probably after the
> call to sysdev_suspend, where interrupts are disabled and we are about to
> actually suspend. At that point a simple "is a roll-back required" test
> could abort the suspend.

Yes.

> Then any driver that handles wake-up events, if it gets and event that (would
> normally cause a wakeup) PM_SUSPEND_PREPARE and PM_POST_SUSPEND, could set
> the "roll-back is required" flag.

That's my idea, but instead of setting a flag, I'd use a counter increased
every time there is a wakeup event. Then, the core suspend core code
would store a pre-suspend value of the counter and compare it with
the current value after all wakeup event sources had been set. If they
were different, the suspend would be aborted.

Rafael

2010-06-02 23:21:56

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 21:05:21 +0200
Florian Mickler <[email protected]> wrote:

> Could someone perhaps make a recap on what are the problems with the
> API? I have no clear eye (experience?) for that (or so it seems).

Good interface design is an acquired taste. And it isn't always easy to
explain satisfactorily. But let me try to explain what I see.

A key aspect of a good interface is unity, and sometimes uniformity.
For example, the file descriptor is a key element to the unity of the
Unix (and hence Posix and Linux) interface. "everything is a file" and
even when it isn't, everything is accessed via a file descriptor.
This is one of the reasons that signals cause so much problem when
programming in Unix - they aren't files, don't have file descriptors and
don't look them at all. That is why signalfd was created, to try to tie
signals back in to the 'file descriptor' model.

So unity is important. Adding new concepts is best done as an extension of
an existing concept. That means that all the infrastructure, not only code
and design but also developer understanding, can be leveraged to help get the
new concept *right* first time. It also means that using the new concept is
easier to learn.

So the problem with the wake-locks / suspend-blockers (and I've actually come
to like the first name much more) is that it introduces a new concept without
properly leveraging existing concepts.

The new concept is opportunistic suspend, though maybe a better name would be
automatic suspend - not sure.

There appear to be two ways you can get opportunistic suspend leveraging
already-existing concepts.

One is to leverage the current "power/state = mem" architecture and just let
userspace choose the opportune moment. The user-space daemon that chooses
this moment would need full information about states of various things to do
this, but sysfs is good at providing full information about what is in the
kernel, and there are dozens of ways for user-space processes to communicate
their state to each other. So this is all doable today without introducing
new design concepts.
Except there is a race between suspending and new events, so we just need to
fix the race. Hence my patch.

The other is to leverage the more general power management infrastructure.
We can already detect when the processor won't be needed for a while, and put
it into a low-power state. We can already put devices to sleep when they
aren't being used. We can just generalise this so that we can detect when
all devices are either unused, or capable of supporting an S3 transition, and
detect when the next timer interrupt is far enough in the future that S3
latency wont be a problem - set the rtc alarm to match the next timer and go
to S3. All completely transparent. (I admit I'm not entirely sure what the
qos that is being discussed brings us, but I assume it is better quality
rather than correctness).

So there are at least two different ways that opportunistic suspend could be
integrated into existing infrastructure with virtually no change of interface
and no new concepts - just refining or extending existing concepts.

Yet the approach used and preferred by android is to create something
substantially new. Yes, it does use the existing suspend infrastructure, but
in a very new and different way. Suspend is now initiated by the kernel, but
in a completely different way to the ways that the kernel already initiates
power saving. So we have two infrastructures for essentially one task.
Looked at the other way, it moves the initiation of suspend from user-space
into the kernel, and then allows user-space to tell the kernel not to suspend.
That to me is very ugly.
In general, the kernel should provide information to user-space, and provide
services to user-space, and user-space should use that information to decide
what services to request. This is the essence the "policy lives in
user-space" maxim.
The Android code has user-space giving information to the kernel, so the
kernel can make a policy decision. This approach is generally less flexible
and is best avoided.

Just as a bit of background, let's think about some of the areas in the
kernel where the kernel does make policy decisions based on user-space input.
- the scheduler - based on 'nice' setting it decided who should run when
- the VM - based on read-ahead settings, madvise/fadvise, recent-use
heuristics, it tries to decide what to keep in memory and what to swap out.
I think those are the main ones. There are other smaller fish like the
choice of IO scheduler and various ways to tune network connections.

But the two big ones are perfect examples of subsystems that have proven very
hard to get *right*, and have been substantially re-written more than once.
In each case, the difficulty wasn't how to perform the work, it was the
choice of what work to perform. It probably also involved getting different
sorts of information about the current state.

That perspective leaves me very sceptical of any design that involves making
policy decisions in the kernel. It is too easy to get wrong, then too hard
to change.
Admittedly the power subsystem does seem to make policy decisions in the
kernel, via the various governors. Though I don't know much about how these
work, it seems significant that there is a pluggable infrastructure with
multiple governors, and one of them leaves the decisions to user-space.

So that is what I see as wrong with the android API : it doesn't bring unity
by simply leveraging existing infrastructure, and it makes policy decisions
in the kernel.


>
> > It is a pity that this extra requirement was not clear from your introduction
> > to the "Opportunistic suspend support" patch.
>
> I think that the main problem was that _all_ the requirements were
> not communicated well. That caused everybody to think that their
> solution would be a better fit. You are not alone.
>
> > If that be the case, I'll stop bothering you with suggestions that can never
> > work.
> > Thanks for your time,
> > NeilBrown
>
> Don't be frustrated. What should Arve be? :)
>

Sometimes appearing frustrated can elicit a different style of response to
appearing polite and constructive... can be helpful.
And yes: I fully understand that Arve would be frustrated. There seems to
be a big disconnect in perceptions of what problem is trying to be solved, and
thus disconnects in what the solution should look like, and I suspect that
would be very frustrating all around.

NeilBrown

2010-06-02 23:32:54

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
> On Wed, 2 Jun 2010 21:02:24 +1000
> Neil Brown <[email protected]> wrote:
> >
> > And this decision (to block suspend) really needs to be made in the driver,
> > not in userspace?
>
> Well, it fits. The requirement is a direct consequence of the intimate
> knowledge the driver has about the driven devices.

That is not really true. A driver does have intimate knowledge of the
device, however it does not necessarily have an idea about the data read
from the device. Consider the gpio_matrix driver: Arve says that it has
to continue scanning matrix once first interrupt arrvies. But it really
depends on what key has been pressed - if user pressed KEY_SUSPEND or
KEY_POWER it cmight be better if we did not wait for key release but
initiated the action right away. The decision on how system reacts to a
key press does not belong to the driver but really to userspace.

--
Dmitry

2010-06-03 01:27:59

by Florian Mickler

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 16:32:44 -0700
Dmitry Torokhov <[email protected]> wrote:

> On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
> > On Wed, 2 Jun 2010 21:02:24 +1000
> > Neil Brown <[email protected]> wrote:
> > >
> > > And this decision (to block suspend) really needs to be made in the driver,
> > > not in userspace?
> >
> > Well, it fits. The requirement is a direct consequence of the intimate
> > knowledge the driver has about the driven devices.
>
> That is not really true. A driver does have intimate knowledge of the
> device, however it does not necessarily have an idea about the data read
> from the device. Consider the gpio_matrix driver: Arve says that it has
> to continue scanning matrix once first interrupt arrvies. But it really
> depends on what key has been pressed - if user pressed KEY_SUSPEND or
> KEY_POWER it cmight be better if we did not wait for key release but
> initiated the action right away. The decision on how system reacts to a
> key press does not belong to the driver but really to userspace.
>

I can't follow the gpio_matrix_driver example, but your point is
obviously true.
A device should never register a constraint because of the data it
handles. That belongs into userspace. Or wherever the data is
consumed.

I'm obviously not trying to say that a network driver should block
suspend while I'm on facebook. Or unblock when visiting m$.com. That
would be absurd.

But, there are of course places in the kernel, where some kernel code
listens to data. For example the packet-filtering. Or sysrq-keys.
But I don't know how that relates to suspend_blockers now... ?

Mind if I rephrase the quote?
From: "Well, it fits. The requirement is a direct consequence of the
intimate knowledge the driver has about the driven devices."
To: "It fits, when the requirement is a direct consequence of the
intimate knowledge the driver has about the driven devices."

Cheers,
Flo

p.s.: tsss.... language... what a broken concept. And yet we have to
work with it.

2010-06-03 02:45:04

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 2, 2010 at 4:32 PM, Dmitry Torokhov
<[email protected]> wrote:
> On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
>> On Wed, 2 Jun 2010 21:02:24 +1000
>> Neil Brown <[email protected]> wrote:
>> >
>> > And this decision (to block suspend) really needs to be made in the driver,
>> > not in userspace?
>>
>> Well, it fits. The requirement is a direct consequence of the intimate
>> knowledge the driver has about the driven devices.
>
> That is not really true. A driver does have intimate knowledge of the
> device, however it does not necessarily have an idea about the data read
> from the device. Consider the gpio_matrix driver: Arve says that it has
> to continue scanning matrix once first interrupt arrvies. But it really
> depends on what key has been pressed - if user pressed KEY_SUSPEND or
> KEY_POWER it cmight be better if we did not wait for key release but
> initiated the action right away. The decision on how system reacts to a
> key press does not belong to the driver but really to userspace.

If we just suspend while the keypad scan is active, a second press of
KEY_POWER would not be able wake the device up. The gpio keypad matrix
driver has two operating modes. No keys are pressed: all the outputs
are active so a key press will generate an interrupt in one of the
inputs. Keys are pressed: Activate a single output at a time so we can
determine which keys are pressed. The second mode is entered when the
driver gets an interrupt in the first mode. The first mode is entered
if the scan detected that no keys are pressed. The driver could be
modified to stop the scan on suspend and forcibly put the device back
in no-keys-pressed state, but pressing additional keys connected to
the same input can no longer generate any events (the input does not
change).

--
Arve Hj?nnev?g

2010-06-03 03:27:07

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 19:44:59 -0700
Arve Hjønnevåg <[email protected]> wrote:

> On Wed, Jun 2, 2010 at 4:32 PM, Dmitry Torokhov
> <[email protected]> wrote:
> > On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
> >> On Wed, 2 Jun 2010 21:02:24 +1000
> >> Neil Brown <[email protected]> wrote:
> >> >
> >> > And this decision (to block suspend) really needs to be made in the driver,
> >> > not in userspace?
> >>
> >> Well, it fits. The requirement is a direct consequence of the intimate
> >> knowledge the driver has about the driven devices.
> >
> > That is not really true. A driver does have intimate knowledge of the
> > device, however it does not necessarily have an idea about the data read
> > from the device. Consider the gpio_matrix driver: Arve says that it has
> > to continue scanning matrix once first interrupt arrvies. But it really
> > depends on what key has been pressed - if user pressed KEY_SUSPEND or
> > KEY_POWER it cmight be better if we did not wait for key release but
> > initiated the action right away. The decision on how system reacts to a
> > key press does not belong to the driver but really to userspace.
>
> If we just suspend while the keypad scan is active, a second press of
> KEY_POWER would not be able wake the device up. The gpio keypad matrix
> driver has two operating modes. No keys are pressed: all the outputs
> are active so a key press will generate an interrupt in one of the
> inputs. Keys are pressed: Activate a single output at a time so we can
> determine which keys are pressed. The second mode is entered when the
> driver gets an interrupt in the first mode. The first mode is entered
> if the scan detected that no keys are pressed. The driver could be
> modified to stop the scan on suspend and forcibly put the device back
> in no-keys-pressed state, but pressing additional keys connected to
> the same input can no longer generate any events (the input does not
> change).
>

Thanks for the detailed explanation. That helps a lot.

I see that if follows that performing a suspend while keys are pressed would
not be good, and keys could be pressed for arbitrarily long periods.
It does not follow that the keypad driver should therefore explicitly disable
suspend. It could simply inform user-space that the keypad is in the
alternate scan mode, so user-space can choose not to enter suspend in at that
time (i.e. policy in user-space).

I can see also how one might want to express this behaviour as a PM_QOS
constraint (now that I have read a bit about PM_QOS), but I cannot see that
you would need to. As the keypad driver is polling during this mode, it
would presumably set a timer that would fire in the near future, and the very
existence of this timer (with a duration shorter than the S3 latency) would
be enough to ensure S3 wasn't automatically entered by PM.
At most you might use set_timer_slack to give a slightly higher upper bound
to the timeout.

So if we take the suspend-from-idle approach, this driver needs no
annotation, and if we take the 'fix the suspend/wake-event race' then this
driver needs no special treatment - just enough to close the race, and some
user-space policy support.

i.e. it doesn't seem to be a special case.

NeilBrown

2010-06-03 06:04:33

by 640E9920

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] - race-free suspend. Was: Re: [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 02, 2010 at 11:05:18AM -0700, Brian Swetland wrote:
> On Wed, Jun 2, 2010 at 1:06 AM, Neil Brown <[email protected]> wrote:
> > On Wed, 2 Jun 2010 00:05:14 -0700
> > Arve Hj?nnev?g <[email protected]> wrote:
> >> > The user-space suspend daemon avoids losing wake-events by using
> >> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> >> > is ready to be read by user-space. ?This may involve:
> >> > ?- the one daemon processing all wake events
> >>
> >> Wake up events are not all processed by one daemon.
> >
> > Not with your current user-space code, no. ?Are you saying that you are not
> > open to any significant change in the Android user-space code? ?That would
> > make the situation a lot harder to resolve.
>
> There are many wakeup events possible in a typical system --
> keypresses or other input events, network traffic, telephony events,
> media events (fill audio buffer, fill video decoder buffer, etc), and
> I think requiring that all wakeup event processing bottleneck through
> a single userspace process is non-optimal here.

Um doesn't the android framework bottleneck the user mode lock
processing through the powermanager and any wake up event processing
eventually has to grab a lock through this bottleneck anyway?

>
> The current suspend-blocker proposal already involves userspace
> changes (it's different than our existing wakelock interface), and
> we're certainly not opposed to any/all userspace changes on principle,
> but on the other hand we're not interested in significant reworks of
> userspace unless they actually improve the situation somehow. I think
> bottlenecking events through a central daemon would represent a step
> backwards.

I'm not sure its a step in any direction, but I do understand the
avoidance of having to rework a lot of code.

--mgross

2010-06-03 06:12:48

by Brian Swetland

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] - race-free suspend. Was: Re: [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 2, 2010 at 11:04 PM, mark gross <[email protected]> wrote:
>>
>> There are many wakeup events possible in a typical system --
>> keypresses or other input events, network traffic, telephony events,
>> media events (fill audio buffer, fill video decoder buffer, etc), and
>> I think requiring that all wakeup event processing bottleneck through
>> a single userspace process is non-optimal here.
>
> Um doesn't the android framework bottleneck the user mode lock
> processing through the powermanager and any wake up event processing
> eventually has to grab a lock through this bottleneck anyway?

For "high level" framework/application level wakelocks, yes, but lower
level components beneath the java api layer use the kernel interface
directly.

Brian

2010-06-03 06:33:24

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, 2 Jun 2010 11:05:18 -0700
Brian Swetland <[email protected]> wrote:

> On Wed, Jun 2, 2010 at 1:06 AM, Neil Brown <[email protected]> wrote:
> > On Wed, 2 Jun 2010 00:05:14 -0700
> > Arve Hjønnevåg <[email protected]> wrote:
> >> > The user-space suspend daemon avoids losing wake-events by using
> >> > fcntl(F_OWNER) to ensure it gets a signal whenever any important wake-event
> >> > is ready to be read by user-space.  This may involve:
> >> >  - the one daemon processing all wake events
> >>
> >> Wake up events are not all processed by one daemon.
> >
> > Not with your current user-space code, no.  Are you saying that you are not
> > open to any significant change in the Android user-space code?  That would
> > make the situation a lot harder to resolve.
>
> There are many wakeup events possible in a typical system --
> keypresses or other input events, network traffic, telephony events,
> media events (fill audio buffer, fill video decoder buffer, etc), and
> I think requiring that all wakeup event processing bottleneck through
> a single userspace process is non-optimal here.

Just to be clear: I'm not suggesting all wake-events need to go through one
process. That was just one example of how the interface I proposed could be
used. There were two other examples.
However one process would need to know about any wakeup event that happens.
I don't think that needs to be a significant bottleneck, but I don't really
know enough about all the requirement to try devising a demonstration.

>
> The current suspend-blocker proposal already involves userspace
> changes (it's different than our existing wakelock interface), and
> we're certainly not opposed to any/all userspace changes on principle,
> but on the other hand we're not interested in significant reworks of
> userspace unless they actually improve the situation somehow. I think
> bottlenecking events through a central daemon would represent a step
> backwards.

I guess it becomes an question of economics for you then. Does the cost of
whatever user-space changes are required exceed the value of using an upstream
kernel? Both the cost and the value would be very hard to estimate in
advance. I don't envy you the decision...

NeilBrown

2010-06-03 06:43:13

by Brian Swetland

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 2, 2010 at 11:33 PM, Neil Brown <[email protected]> wrote:
>>
>> The current suspend-blocker proposal already involves userspace
>> changes (it's different than our existing wakelock interface), and
>> we're certainly not opposed to any/all userspace changes on principle,
>> but on the other hand we're not interested in significant reworks of
>> userspace unless they actually improve the situation somehow.  I think
>> bottlenecking events through a central daemon would represent a step
>> backwards.
>
> I guess it becomes an question of economics for you then.  Does the cost of
> whatever user-space changes are required exceed the value of using an upstream
> kernel?  Both the cost and the value would be very hard to estimate in
> advance.  I don't envy you the decision...

Well, at this point we've invested more engineering hours in the
various rewrites of this (single) patchset and discussion around it
than we have spent on rebasing our trees on roughly every other
mainline release since 2.6.16 or so, across five years of Android
development. We think there's some good value to be had (all the
usual reasons) by heading upstream, so we're still discussing these
patches and exploring alternatives, but yes, from one way of looking
at it, it'd certainly be cheaper to just keep maintaining our own
trees.

Brian

2010-06-03 13:36:36

by 640E9920

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] - race-free suspend. Was: Re: [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 02, 2010 at 11:12:39PM -0700, Brian Swetland wrote:
> On Wed, Jun 2, 2010 at 11:04 PM, mark gross <[email protected]> wrote:
> >>
> >> There are many wakeup events possible in a typical system --
> >> keypresses or other input events, network traffic, telephony events,
> >> media events (fill audio buffer, fill video decoder buffer, etc), and
> >> I think requiring that all wakeup event processing bottleneck through
> >> a single userspace process is non-optimal here.
> >
> > Um doesn't the android framework bottleneck the user mode lock
> > processing through the powermanager and any wake up event processing
> > eventually has to grab a lock through this bottleneck anyway?
>
> For "high level" framework/application level wakelocks, yes, but lower
> level components beneath the java api layer use the kernel interface
> directly.
>
Oh. I thought everything went through
hardware/libhardware_legacy/power/power.c
who else is hitting /sys/power/* in the android user mode then?

I'll have to go hunting for them I guess.

--mgross

2010-06-03 14:22:23

by Theodore Ts'o

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 02, 2010 at 11:43:06PM -0700, Brian Swetland wrote:
> > I guess it becomes an question of economics for you then. ?Does the cost of
> > whatever user-space changes are required exceed the value of using an upstream
> > kernel? ?Both the cost and the value would be very hard to estimate in
> > advance. ?I don't envy you the decision...
>
> Well, at this point we've invested more engineering hours in the
> various rewrites of this (single) patchset and discussion around it
> than we have spent on rebasing our trees on roughly every other
> mainline release since 2.6.16 or so, across five years of Android
> development. We think there's some good value to be had (all the
> usual reasons) by heading upstream, so we're still discussing these
> patches and exploring alternatives, but yes, from one way of looking
> at it, it'd certainly be cheaper to just keep maintaining our own
> trees.

And let's be blunt. If in the future the Android team (which I'm not
a member of) decides that they have invested more engineering time
than they can justify from a business perspective, the next time
someone starts whining on a blog, or on Slashdot, or at a conference,
about how "OMG! Google is forking the kernel!!!", or "Google is
making the lives of device driver writers for the embedded world
difficult", it will now be possible from a political point of view to
point and the hundreds, if not thousands, of e-mail messages of LKML
developers wanting to redesign this effort and saying "Nyet! Nyet!
Nyet!" to the original patchset, to point out that Google has a made
an effort, and gone far beyond what is required by the GPL. Not only
has the source code been made available, but hundreds of engineering
hours have been made trying to accomodate the demands of LKML --- and
LKML has said no to suspend blockers/wakelocks.

Realistically, a company makes decisions about whether to pursue
engineering efforts based on business plans, and this is true whether
the company is Red Hat, or Novell, or IBM, or Google. One of the
cost/benefits can be things that aren't easily measured such as public
relations. But past a certain point, it may be that the right answer
is to make a single public appeal to Linus, and if he says no, or if
he ignores the pull request, then the company at hand can say, "We've
done the best that we could, but the Linux developer community, and
Linus specifically, has refused our patch". And yes, it's all about
PR, but let's not kid ourselves --- the GPL and bad PR can't be used
as blackmail to force a company to invest arbitrarily unlimited
amounts of engineering effort just to satisfy the mandarins of the
LKML and/or Linus.

And if people want to call this a fork, Linus has in the past said
that sometimes forks are healthy, and necessary, and we can see how
things play out in the marketplace.

Disclosure: I work at Google, but I'm not at all involved in the
Android development team, and it's not at all my call when Brian and
his team should make a decision that they've invested more time/energy
than can be justified to their management --- heck, they even roll up
to an completely different VP than I do. :-)

- Ted

2010-06-03 15:41:43

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Thu, 2010-06-03 at 10:21 -0400, [email protected] wrote:

> And let's be blunt. If in the future the Android team (which I'm not
> a member of) decides that they have invested more engineering time
> than they can justify from a business perspective, the next time
> someone starts whining on a blog, or on Slashdot, or at a conference,
> about how "OMG! Google is forking the kernel!!!", or "Google is
> making the lives of device driver writers for the embedded world
> difficult", it will now be possible from a political point of view to
> point and the hundreds, if not thousands, of e-mail messages of LKML
> developers wanting to redesign this effort and saying "Nyet! Nyet!
> Nyet!" to the original patchset, to point out that Google has a made
> an effort, and gone far beyond what is required by the GPL. Not only
> has the source code been made available, but hundreds of engineering
> hours have been made trying to accomodate the demands of LKML --- and
> LKML has said no to suspend blockers/wakelocks.


In the spirit of being blunt, so what?

We say no for good technical reasons. Also when did it become sensible
to push features after they were shipped?

It never works to develop stuff like this out-of-tree and then push for
inclusion. You always get to rewrite (at least 3 times).

If Google indeed decides it doesn't want to play upstream, then sad. But
I don't see how we would be unjust in complaining about it.

There is more to our community than the letter of the GPL, and you
should know that. So I really don't see the point of your argument (was
there one besides the management gibberish?).

2010-06-03 17:27:05

by Brian Swetland

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH] - race-free suspend. Was: Re: [PATCH 0/8] Suspend block api (version 8)

On Thu, Jun 3, 2010 at 6:36 AM, mark gross <[email protected]> wrote:
> On Wed, Jun 02, 2010 at 11:12:39PM -0700, Brian Swetland wrote:
>> On Wed, Jun 2, 2010 at 11:04 PM, mark gross <[email protected]> wrote:
>> >>
>> >> There are many wakeup events possible in a typical system --
>> >> keypresses or other input events, network traffic, telephony events,
>> >> media events (fill audio buffer, fill video decoder buffer, etc), and
>> >> I think requiring that all wakeup event processing bottleneck through
>> >> a single userspace process is non-optimal here.
>> >
>> > Um doesn't the android framework bottleneck the user mode lock
>> > processing through the powermanager and any wake up event processing
>> > eventually has to grab a lock through this bottleneck anyway?
>>
>> For "high level" framework/application level wakelocks, yes, but lower
>> level components beneath the java api layer use the kernel interface
>> directly.
>>
> Oh.  I thought everything went through
> hardware/libhardware_legacy/power/power.c
> who else is hitting /sys/power/* in the android user mode then?

I believe everything does -- that's the thin wrapper around the kernel
interface (which will have to change slightly to meet the
suspend_blocker device/fd vs wakelock proc/sys interface, etc), which
is used by the powermanager service, the RIL, and any other low level
code. At the App/Service level, wakelocks are provided by a java
level API that is a remote interface to the powermanager.

Brian

2010-06-04 07:14:55

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

On Wed, Jun 02, 2010 at 07:44:59PM -0700, Arve Hj?nnev?g wrote:
> On Wed, Jun 2, 2010 at 4:32 PM, Dmitry Torokhov
> <[email protected]> wrote:
> > On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
> >> On Wed, 2 Jun 2010 21:02:24 +1000
> >> Neil Brown <[email protected]> wrote:
> >> >
> >> > And this decision (to block suspend) really needs to be made in the driver,
> >> > not in userspace?
> >>
> >> Well, it fits. The requirement is a direct consequence of the intimate
> >> knowledge the driver has about the driven devices.
> >
> > That is not really true. A driver does have intimate knowledge of the
> > device, however it does not necessarily have an idea about the data read
> > from the device. Consider the gpio_matrix driver: Arve says that it has
> > to continue scanning matrix once first interrupt arrvies. But it really
> > depends on what key has been pressed - if user pressed KEY_SUSPEND or
> > KEY_POWER it cmight be better if we did not wait for key release but
> > initiated the action right away. The decision on how system reacts to a
> > key press does not belong to the driver but really to userspace.
>
> If we just suspend while the keypad scan is active, a second press of
> KEY_POWER would not be able wake the device up. The gpio keypad matrix
> driver has two operating modes. No keys are pressed: all the outputs
> are active so a key press will generate an interrupt in one of the
> inputs. Keys are pressed: Activate a single output at a time so we can
> determine which keys are pressed. The second mode is entered when the
> driver gets an interrupt in the first mode. The first mode is entered
> if the scan detected that no keys are pressed. The driver could be
> modified to stop the scan on suspend and forcibly put the device back
> in no-keys-pressed state, but pressing additional keys connected to
> the same input can no longer generate any events (the input does not
> change).

Would that be still the case if you reprogram the device as wakeup
source while suspending?

Anyway, it does not really matter. Your case (current suspend blockers)
would delay putting device to sleep till you release all the keys,
including KEY_POWER. If you delegate the decision to userspace it would
have an _option_ of putting the device to sleep earlier, however in both
cases user has to release all keys before the device can be resumed.

--
Dmitry

2010-06-04 07:55:13

by Arve Hjønnevåg

[permalink] [raw]
Subject: Re: [PATCH] - race-free suspend. Was: Re: [linux-pm] [PATCH 0/8] Suspend block api (version 8)

2010/6/4 Dmitry Torokhov <[email protected]>:
> On Wed, Jun 02, 2010 at 07:44:59PM -0700, Arve Hj?nnev?g wrote:
>> On Wed, Jun 2, 2010 at 4:32 PM, Dmitry Torokhov
>> <[email protected]> wrote:
>> > On Wed, Jun 02, 2010 at 09:05:21PM +0200, Florian Mickler wrote:
>> >> On Wed, 2 Jun 2010 21:02:24 +1000
>> >> Neil Brown <[email protected]> wrote:
>> >> >
>> >> > And this decision (to block suspend) really needs to be made in the driver,
>> >> > not in userspace?
>> >>
>> >> Well, it fits. The requirement is a direct consequence of the intimate
>> >> knowledge the driver has about the driven devices.
>> >
>> > That is not really true. A driver does have intimate knowledge of the
>> > device, however it does not necessarily have an idea about the data read
>> > from the device. Consider the gpio_matrix driver: Arve says that it has
>> > to continue scanning matrix once first interrupt arrvies. But it really
>> > depends on what key has been pressed - if user pressed KEY_SUSPEND or
>> > KEY_POWER it cmight be better if we did not wait for key release but
>> > initiated the action right away. The decision on how system reacts to a
>> > key press does not belong to the driver but really to userspace.
>>
>> If we just suspend while the keypad scan is active, a second press of
>> KEY_POWER would not be able wake the device up. The gpio keypad matrix
>> driver has two operating modes. No keys are pressed: all the outputs
>> are active so a key press will generate an interrupt in one of the
>> inputs. Keys are pressed: Activate a single output at a time so we can
>> determine which keys are pressed. The second mode is entered when the
>> driver gets an interrupt in the first mode. The first mode is entered
>> if the scan detected that no keys are pressed. The driver could be
>> modified to stop the scan on suspend and forcibly put the device back
>> in no-keys-pressed state, but pressing additional keys connected to
>> the same input can no longer generate any events (the input does not
>> change).
>
> Would that be still the case if you reprogram the device as wakeup
> source while suspending?

It depends on what part you are referring to. It is impossible to
detect some keys presses if you suspend while a key is held down. That
key will connect one output to one input. If the interrupt is edge
triggered you can just turn all the outputs back on (and clear the
interrupt that this will generate) and suspend, but no additional key
presses on keys connected to the same input will cause an interrupt
until all those keys have been released first or a key connected to
another input is pressed. If the interrupt is level triggered (and
fixed polarity) you cannot do this. You must either disable the input
interrupt or the output. This means that you if the user releases the
key and press it again, you cannot wakeup on this key. You can also
not wake up on any other keys connected to the disables input or
output. So you have some choice in what events you loose, but there
will always be some key press sequence that will not work if you
suspend but will work if you prevent suspend in the middle.

>
> Anyway, it does not really matter. Your case (current suspend blockers)
> would delay putting device to sleep till you release all the keys,
> including KEY_POWER. If you delegate the decision to userspace it would
> have an _option_ of putting the device to sleep earlier, however in both
> cases user has to release all keys before the device can be resumed.

We deliver all keys to user space so that is has the option of
reacting to them. Yes if we did not do this user space would have the
option of suspending while keys are pressed, but it would need
detailed knowledge about the driver to make this decision (will the
driver loose key events if we suspend, which keys can be lost, does
the condition clear automatically when all the keys are released or do
we need another wakeup event to get out of it).

--
Arve Hj?nnev?g