2014-10-21 08:50:35

by Bastien Nocera

Subject: A desktop environment[1] kernel wishlist

Hey,

GNOME has had discussions with kernel developers in the past, and,
fortunately, in some cases we were able to make headway.

There are, however, a number of items that we still don't have solutions
for: items that kernel developers might not realise we'd like to rely
on, or might not know we'd make use of if merged.

I've posted this list at:
https://wiki.gnome.org/BastienNocera/KernelWishlist

Let me know on-list or off-list if you have any comments about those, so
I can update the list.

Cheers

PS: Please CC: me as I'm not subscribed to LKML itself.

[1]: The desktop environment in question is GNOME, as the URL shows.
I'm confident that most of these changes would also be useful for other
desktops and for embedded uses, such as a NAS or Android devices.


2014-10-21 13:15:10

by Shnatsel

Subject: Re: A desktop environment[1] kernel wishlist

Hey everyone,

I'm glad we're having some discussion on this, because we have almost
exactly the same kernel wishlist internally for elementary OS / Pantheon DE.

I believe I can further elaborate on the VFS monitoring part. We need a file
monitoring facility that's scalable (unlike inotify) and can provide a
decent level of detail (unlike fanotify). In particular, we need to be able
to detect file/directory creation, renaming and removal events, as well as
close_write events. And, in an ideal world, all of that without requiring
root permissions.
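A minimal sketch of why inotify scales poorly for this, assuming Linux with glibc: inotify watches are per directory and not recursive, so an indexer has to walk the tree and add one watch per directory, and every watch counts against fs.inotify.max_user_watches. The mask covers the events listed above; watch_tree() is a hypothetical helper, not code from rlocate or any existing indexer.

```c
#define _GNU_SOURCE
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* Events named above: creation, renames, removal and close_write. */
#define WATCH_MASK (IN_CREATE | IN_MOVED_FROM | IN_MOVED_TO | \
                    IN_DELETE | IN_DELETE_SELF | IN_CLOSE_WRITE)

/* nftw() passes no user pointer, so the walk state lives in globals. */
static int tree_fd = -1;   /* inotify instance used by the callback */
static int tree_watches;   /* number of watches added so far */

static int add_dir_watch(const char *path, const struct stat *sb,
                         int typeflag, struct FTW *ftwbuf)
{
    (void)sb;
    (void)ftwbuf;
    if (typeflag != FTW_D)
        return 0;          /* watches are per directory, not per file */
    if (inotify_add_watch(tree_fd, path, WATCH_MASK) < 0) {
        perror(path);      /* e.g. ENOSPC once max_user_watches is hit */
        return -1;         /* a non-zero return stops nftw() */
    }
    tree_watches++;
    return 0;
}

/* Hypothetical helper: watch every directory under root.
 * Returns the number of watches added, or -1 on error. */
int watch_tree(int inotify_fd, const char *root)
{
    tree_fd = inotify_fd;
    tree_watches = 0;
    if (nftw(root, add_dir_watch, 16, FTW_PHYS) != 0)
        return -1;
    return tree_watches;
}
```

Directories created after the walk need their own watch, added when an IN_CREATE event arrives with IN_ISDIR set, and anything created inside them before that watch exists is missed, so the daemon still has to rescan.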

This can almost be accomplished by combining the output of fanotify with that of
a custom LSM module that just reports events to userspace (e.g. rlocate uses
such a thing). There are two problems with this: first, it's a hideous hack,
and second, it doesn't detect deletions.
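For comparison, a sketch of the fanotify half, assuming the notification-class fanotify API of this era: one mount-wide mark replaces all the per-directory watches, but fanotify_init() needs CAP_SYS_ADMIN, and the available events include FAN_CLOSE_WRITE but no create, rename or delete events, which is the gap the LSM module is filling. watch_mount_close_write() is a hypothetical name.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Hypothetical helper: report close_write for every file on the mount
 * that contains `path`. Returns the fanotify fd, or -1 with errno set
 * (typically EPERM without CAP_SYS_ADMIN). */
int watch_mount_close_write(const char *path)
{
    int fd = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, O_RDONLY);
    if (fd < 0)
        return -1;

    /* FAN_MARK_MOUNT covers the whole mount, so unlike inotify there is
     * no per-directory bookkeeping. */
    if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_CLOSE_WRITE, AT_FDCWD, path) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Each event is read as a struct fanotify_event_metadata carrying an open fd for the file, so the daemon can resolve the path through /proc/self/fd, but an unlinked file simply never shows up.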

This is a big deal because without it we're stuck with always presenting the
user with the filesystem. If you've seen library-based music players like
Rhythmbox or Banshee, you know that they group and sort all your music by
artist and album, but not by directory and file name, and that you can
efficiently search all that metadata. We're trying to get the same thing
into more applications, but the absence of VFS features described above is
blocking us. Even after moving all the database management to a single
daemon that does all the monitoring and very rarely has to rescan anything,
the system either slows to a crawl (inotify) or the database gets out of
date quickly (fanotify+LSM).

In case I didn't make myself clear, a more detailed writeup on the design
can be found here: http://tiny.cc/tearing-up-files

Regarding the other items, AFAIK the kernel implements mechanism, not
policy, so instead of "zswap selectively enabled by default" we just want
"stable reliable zswap". We had to give up on zram previously (in pre-3.10
days) because of kernel regressions leading to panics when zram was enabled.
And we don't have the "Power management" part on our list because we haven't
really delved into that yet. But our lists are identical in all the other
areas, so that's not "just GNOME".

PS: I'm not subscribed to LKML either, so please CC me.

Cheers!

2014-10-21 17:04:10

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]> wrote:
> I've posted this list at:
> https://wiki.gnome.org/BastienNocera/KernelWishlist

As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
documentation'

Can you expand more on the rationale for the need here? Is this for UI
for power debugging, or something else?

thanks
-john

2014-10-21 17:14:43

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

Hey,

On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> documentation'
>
> Can you expand more on the rationale for the need here? Is this for UI
> for power debugging, or something else?

No, it would be used for automating backups, or implementing
suspend->hibernation transitions. For example, right before the machine
suspends, I would schedule it to wake up in an hour. If I get woken up by
the rtc alarm (and not by the user through a lid open), I might:
- check that I'm plugged into the AC, it's night, and in the vicinity of
the server that handles my backups, and if so, back up the system.
- check whether the battery is low, and hibernate the machine (if it
supports it, obviously).

We cannot do that if we can't make out whether the wake-up came from a
user action, or the alarm we set.
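The first half of this, arming the RTC right before suspend, can be done through the wakealarm attribute mentioned in the wishlist item. A minimal sketch, assuming the wake-capable RTC is rtc0 and the caller may write to sysfs (normally root): the RTC sysfs code refuses to overwrite an alarm that is already enabled (EBUSY), so the sketch writes "0" to clear it first and then uses the relative "+seconds" form. set_wakealarm() is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: ask the RTC behind `wakealarm_path` (normally
 * /sys/class/rtc/rtc0/wakealarm) to wake the machine `seconds` from now.
 * Returns 0 on success, -1 on failure. */
int set_wakealarm(const char *wakealarm_path, unsigned long seconds)
{
    FILE *f;

    /* Clear any pending alarm first. */
    f = fopen(wakealarm_path, "w");
    if (f == NULL)
        return -1;
    if (fputs("0", f) == EOF) {
        fclose(f);
        return -1;
    }
    if (fclose(f) != 0)      /* sysfs errors surface when stdio flushes */
        return -1;

    /* A leading '+' makes the value relative to the RTC's current time. */
    f = fopen(wakealarm_path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "+%lu", seconds) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

This only arms the wake-up; it still says nothing about what actually caused the resume, which is the part this item asks for.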

Cheers

2014-10-21 18:00:41

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> Hey,
>
> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> documentation'
>>
>> Can you expand more on the rationale for the need here? Is this for UI
>> for power debugging, or something else?
>
> No, it would be used for automating backups, or implementing
> suspend->hibernation transitions. For example, right before the machine
> suspends, I would schedule it to wake up in an hour. If I get woken up by
> the rtc alarm (and not by the user through a lid open), I might:
> - check that I'm plugged into the AC, it's night, and in the vicinity of
> the server that handles my backups, and if so, back up the system.
> - check whether the battery is low, and hibernate the machine (if it
> supports it, obviously).
>
> We cannot do that if we can't make out whether the wake-up came from a
> user action, or the alarm we set.

I suspect wakeup type reporting is maybe not the best way to go about
this, since there may be a number of causes for wakeups and they can
arrive closely together in different orders, which can result in
races.

For instance, if the machine suspends, and sets an alarm to be woken
up at midnight to do a backup, if the user resumes their laptop at
11:59:59, should the backup still proceed at midnight? What happens
if the user starts to use their machine at 12:00:01? What about if
the user walked away from their machine at 11:55:01, and the system
would suspend at 12:00:01, should the backup commence at 12:00:00?

Thus you probably want to have a "user present" status, then use the
timerfd() ALARM clockids to set any wakeups you'd like, and when they
trigger (if the system was suspended or not), decide to do your backup
based on the conditionals you had above, using the user-present status in
a similar way to how you use AC status.

I'd suggest looking into some of the details on how Android does its
wakelock logic, as well as the timerfd ALARM clockids, since I think
this would provide what you need.
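A rough sketch of the timerfd suggestion, assuming glibc's sys/timerfd.h: CLOCK_BOOTTIME_ALARM counts time spent suspended and resumes the machine when it expires, but creating it requires CAP_WAKE_ALARM, so this hypothetical helper falls back to plain CLOCK_BOOTTIME (which counts suspend time but will not wake the system) when the alarm clock is refused.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7          /* value from linux/time.h */
#endif
#ifndef CLOCK_BOOTTIME_ALARM
#define CLOCK_BOOTTIME_ALARM 9    /* value from linux/time.h */
#endif

/* Hypothetical helper: arm a one-shot timer `seconds` from now.
 * Sets *can_wake to 1 if the timer will also resume a suspended system.
 * Returns a timerfd that becomes readable on expiry, or -1 with errno set. */
int arm_backup_timer(time_t seconds, int *can_wake)
{
    int fd = timerfd_create(CLOCK_BOOTTIME_ALARM, TFD_CLOEXEC);
    *can_wake = (fd >= 0);
    if (fd < 0)                   /* e.g. EPERM without CAP_WAKE_ALARM */
        fd = timerfd_create(CLOCK_BOOTTIME, TFD_CLOEXEC);
    if (fd < 0)
        return -1;

    struct itimerspec spec = {
        .it_interval = { .tv_sec = 0, .tv_nsec = 0 },        /* one-shot */
        .it_value = { .tv_sec = seconds, .tv_nsec = 0 },     /* relative */
    };
    if (timerfd_settime(fd, 0, &spec, NULL) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

A daemon would poll() this fd together with its lid, power button and AC sources, and when it becomes readable, evaluate the backup or hibernate conditions from the current state rather than from the wake reason.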

My bigger concern here with your use case though, is that you might be
able to use ALARM timers more commonly, but that for much existing
hardware, corner cases like programmatic resuming of a laptop while
it's packed in a bag somewhere might have thermal risks. For mobile
devices this is an expected design point, but for off-the-shelf
laptops with big fans and exhaust vents, I'm not sure how safe this
would be, so you may need to constrain this functionality somehow (or
look to see if an enforced low-power resume is possible).

thanks
-john

2014-10-21 18:10:16

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> > Hey,
> >
> > On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >> documentation'
> >>
> >> Can you expand more on the rationale for the need here? Is this for UI
> >> for power debugging, or something else?
> >
> > No, it would be used for automating backups, or implementing
> > suspend->hibernation transitions. For example, right before the machine
> > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > the rtc alarm (and not by the user through a lid open), I might:
> > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > the server that handles my backups, and if so, back up the system.
> > - check whether the battery is low, and hibernate the machine (if it
> > supports it, obviously).
> >
> > We cannot do that if we can't make out whether the wake-up came from a
> > user action, or the alarm we set.
>
> I suspect wakeup type reporting is maybe not the best way to go about
> this, since there may be a number of causes for wakeups and they can
> arrive closely together in different orders, which can result in
> races.
>
> For instance, if the machine suspends, and sets an alarm to be woken
> up at midnight to do a backup, if the user resumes their laptop at
> 11:59:59, should the backup still proceed at midnight?

No. And I would expect that we would get a wake up type of "power
button" or "lid open" in this case.

> What happens
> if the user starts to use their machine at 12:00:01?

I would expect the backup to stop and be tried again later.

> What about if
> the user walked away from their machine at 11:55:01, and the system
> would suspend at 12:00:01, should the backup commence at 12:00:00?

That wouldn't happen because we'd set the wake up time when suspending.

> Thus you probably want to have a "user present" status,

We can do any sort of thing once the laptop is awake. But right now
there's no way to know whether the resume is due to a user action or
not.

> then use the
> timerfd() ALARM clockids to set any wakeups you'd like, and when they
> trigger (if the system was suspended or not), decide to do your backup
> based on the conditionals you had above, using the user-present status in
> a similar way to how you use AC status.
>
> I'd suggest looking into some of the details on how Android does its
> wakelock logic, as well as the timerfd ALARM clockids, since I think
> this would provide what you need.

It doesn't. There's still a whole class of hardware that isn't always on
as mobile SoCs are, and wakelocks aren't going to help if the kernel
isn't running and we don't know why it started running again.

> My bigger concern here with your use case though, is that you might be
> able to use ALARM timers more commonly, but that for much existing
> hardware, corner cases like programmatic resuming of a laptop while
> it's packed in a bag somewhere might have thermal risks.

I'm pretty sure that Windows has done this for years before we did. If
the laptop cannot suspend reliably, then the user would disable it. We
cannot keep designing around broken software.

> For mobile
> devices this is an expected design point, but for off-the-shelf
> laptops with big fans and exhaust vents, I'm not sure how safe this
> would be, so you may need to constrain this functionality somehow (or
> look to see if an enforced low-power resume is possible).

I think that we won't know whether it's a problem until the point that
somebody actually implements it.

2014-10-21 18:24:34

by Geert Uytterhoeven

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 7:14 PM, Bastien Nocera <[email protected]> wrote:
>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> documentation'
>>
>> Can you expand more on the rationale for the need here? Is this for UI
>> for power debugging, or something else?
>
> No, it would be used for automating backups, or implementing
> suspend->hibernation transitions. For example, right before the machine
> suspends, I would schedule it to wake up in an hour. If I get woken up by
> the rtc alarm (and not by the user through a lid open), I might:
> - check that I'm plugged into the AC, it's night, and in the vicinity of
> the server that handles my backups, and if so, back up the system.
> - check whether the battery is low, and hibernate the machine (if it
> supports it, obviously).

Isn't this already available through /sys/kernel/debug/wakeup_sources
and/or the various power/wake* files in sysfs?
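A sketch of this suggestion as a snapshot-and-compare: record each source's wakeup_count before suspending and see which counters went up after resume. It assumes the debugfs column order name, active_count, event_count, wakeup_count (worth checking against the header line on a given kernel) and source names without spaces; debugfs usually needs root, and the struct and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

struct wakeup_source_stat {
    char name[64];
    unsigned long active_count;
    unsigned long event_count;
    unsigned long wakeup_count;
};

/* Hypothetical parser for one line of /sys/kernel/debug/wakeup_sources.
 * Returns 0 on success, -1 for the header line or anything that does not
 * start with a name followed by three counters. */
int parse_wakeup_source(const char *line, struct wakeup_source_stat *out)
{
    /* The header's second field is "active_count", so %lu fails there. */
    if (sscanf(line, "%63s %lu %lu %lu", out->name, &out->active_count,
               &out->event_count, &out->wakeup_count) != 4)
        return -1;
    return 0;
}

/* Did this source's wakeup_count go up between the two snapshots? */
int woke_system(const struct wakeup_source_stat *before,
                const struct wakeup_source_stat *after)
{
    return strcmp(before->name, after->name) == 0 &&
           after->wakeup_count > before->wakeup_count;
}
```

If several sources fire close together, more than one counter can move, so this tells you what happened during suspend rather than why.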

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2014-10-21 19:10:39

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 11:09 AM, Bastien Nocera <[email protected]> wrote:
> On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
>> I suspect wakeup type reporting is maybe not the best way to go about
>> this, since there may be a number of causes for wakeups and they can
>> arrive closely together in different orders, which can result in
>> races.
>>
>> For instance, if the machine suspends, and sets an alarm to be woken
>> up at midnight to do a backup, if the user resumes their laptop at
>> 11:59:59, should the backup still proceed at midnight?
>
> No. And I would expect that we would get a wake up type of "power
> button" or "lid open" in this case.
>
>> What happens
>> if the user starts to use their machine at 12:00:01?
>
> I would expect the backup to stop and be tried again later.

What "event" would you be using to trigger stopping the backup?


>> What about if
>> the user walked away from their machine at 11:55:01, and the system
>> would suspend at 12:00:01, should the backup commence at 12:00:00?
>
> That wouldn't happen because we'd set the wake up time when suspending.
>
>> Thus you probably want to have a "user present" status,
>
> We can do any sort of thing once the laptop is awake. But right now
> there's no way to know whether the resume is due to a user action or
> not.

I'm suggesting it's best if you don't care which specific irq brought
the device out of suspend mode.

You may want to monitor various events like the lid-open or
power-button (as well as the timerfd for alarms), and use those for
your logic, but again, because a number of different irqs might bring
the system out of suspend and those irqs can possibly occur almost
simultaneously, which specific one landed first and woke the system is
really not that useful (and again prone to races).

Now, I do think knowing which IRQ did bring you out of suspend is
useful, but mostly for power-debugging when you're trying to optimize
battery life. But for userland logic, I think it's far too prone to
races.
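One way to read this in code: fold input events into a small piece of state instead of asking which IRQ won. A minimal sketch using the evdev struct input_event from linux/input.h; user_state, track_input() and user_present() are hypothetical names, and a real daemon would read the events from /dev/input/event* rather than build them by hand.

```c
#include <linux/input.h>
#include <stdbool.h>
#include <time.h>

struct user_state {
    bool lid_closed;          /* last SW_LID value seen */
    time_t last_user_input;   /* time of the last key press or lid open */
};

/* Fold one evdev event into the state; `now` is passed in so the
 * caller controls the clock. */
void track_input(struct user_state *s, const struct input_event *ev,
                 time_t now)
{
    if (ev->type == EV_SW && ev->code == SW_LID) {
        s->lid_closed = ev->value != 0;   /* 1 = closed, 0 = open */
        if (!s->lid_closed)
            s->last_user_input = now;     /* opening the lid is activity */
    } else if (ev->type == EV_KEY && ev->value == 1) {
        s->last_user_input = now;         /* any key press, incl. KEY_POWER */
    }
}

/* "User present": lid open and input within the last `window` seconds. */
bool user_present(const struct user_state *s, time_t now, time_t window)
{
    return !s->lid_closed && now - s->last_user_input <= window;
}
```

EV_KEY events use value 1 for a press, 0 for a release and 2 for autorepeat, which is why only value 1 counts as activity here.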


>> then use the
>> timerfd() ALARM clockids to set any wakeups you'd like, and when they
>> trigger (if the system was suspended or not), decide to do your backup
>> based on the conditionals you had above, using the user-present status in
>> a similar way to how you use AC status.
>>
>> I'd suggest looking into some of the details on how Android does its
>> wakelock logic, as well as the timerfd ALARM clockids, since I think
>> this would provide what you need.
>
> It doesn't. There's still a whole class of hardware that isn't always on
> as mobile SoCs are, and wakelocks aren't going to help if the kernel
> isn't running and we don't know why it started running again.

I'm not sure I parsed this properly. Mobile SoCs are quite frequently
in suspend and not always on. They frequently resume due to wakeup
alarms, modem call IRQs, and user interaction such as button presses.


>> My bigger concern here with your use case though, is that you might be
>> able to use ALARM timers more commonly, but that for much existing
>> hardware, corner cases like programmatic resuming of a laptop while
>> it's packed in a bag somewhere might have thermal risks.
>
> I'm pretty sure that Windows has done this for years before we did. If
> the laptop cannot suspend reliably, then the user would disable it. We
> cannot keep designing around broken software.

Sure. But it's not reliably suspending I'm worried about, it's
accidentally resuming in an environment the hardware wasn't designed
for. It's really more of a hardware design issue. I'm not suggesting
you don't do it, but I just suspect you'll need to be careful about
automatically enabling this on older hardware.

thanks
-john

2014-10-21 19:23:26

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On 10/21/2014 11:09 AM, Bastien Nocera wrote:
> On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
>> I suspect wakeup type reporting is maybe not the best way to go about
>> this, since there may be a number of causes for wakeups and they can
>> arrive closely together in different orders, which can result in
>> races.
>>
>> For instance, if the machine suspends, and sets an alarm to be woken
>> up at midnight to do a backup, if the user resumes their laptop at
>> 11:59:59, should the backup still proceed at midnight?
>
> No. And I would expect that we would get a wake up type of "power
> button" or "lid open" in this case.
>
>> What happens
>> if the user starts to use their machine at 12:00:01?
>
> I would expect the backup to stop and be tried again later.
>
>> What about if
>> the user walked away from their machine at 11:55:01, and the system
>> would suspend at 12:00:01, should the backup commence at 12:00:00?
>
> That wouldn't happen because we'd set the wake up time when suspending.
>
>> Thus you probably want to have a "user present" status,
>
> We can do any sort of thing once the laptop is awake. But right now
> there's no way to know whether the resume is due to a user action or
> not.

Let me try saying what I think John is saying a little differently:

I think it shouldn't really matter what caused the wakeup. I think it
should matter what the state of the laptop is post-wakeup.

For example, suppose that the user opens the lid at effectively the same
time as an alarm fires. By the time userspace code can run, the alarm
has fired *and* the lid is open.

Shouldn't the user code just check whether the alarm has fired and
whether the lid is open?

Admittedly, this gets a bit hairy for things like wake buttons (which my
laptop has two of), but I still think it's more important whether the
button was pressed recently, not whether the button press actually
caused the wakeup.

(With S0ix and similar technologies, I imagine that determining what
caused a "wakeup" could be extremely confusing.)
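This reduces to a decision over facts that can all be read after resume. A sketch of Bastien's two use cases written that way; the struct fields and after_resume() are hypothetical, and gathering each fact (timerfd expiry count, lid switch, power_supply status, network reachability) is left out.

```c
#include <stdbool.h>

enum resume_action { ACTION_NONE, ACTION_BACKUP, ACTION_HIBERNATE };

struct post_resume_state {
    bool alarm_fired;             /* our timerfd has expired (read() > 0) */
    bool lid_open;                /* current switch state, not wake source */
    bool on_ac;
    bool battery_low;
    bool can_hibernate;
    bool backup_server_reachable;
    bool is_night;
};

/* Decide what to do from the state after resume, ignoring which IRQ
 * happened to wake the machine. */
enum resume_action after_resume(const struct post_resume_state *s)
{
    if (!s->alarm_fired || s->lid_open)
        return ACTION_NONE;       /* not our alarm, or the user is there */
    if (!s->on_ac && s->battery_low && s->can_hibernate)
        return ACTION_HIBERNATE;
    if (s->on_ac && s->is_night && s->backup_server_reachable)
        return ACTION_BACKUP;
    return ACTION_NONE;           /* e.g. re-arm the alarm and suspend */
}
```

The lid-opened-as-the-alarm-fired case from the example above simply yields no action, with no need to know which event came first.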

--Andy

2014-10-21 19:28:17

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On 10/21/2014 01:49 AM, Bastien Nocera wrote:
> I've posted this list at:
> https://wiki.gnome.org/BastienNocera/KernelWishlist

I don't know much about desktop environment infrastructure, but I think
the kernel probably already has a lot of what's needed for LinuxApps.

Tools like Sandstorm [1] (shameless plug, but it's a good example here)
can already sandbox normal-ish programs, and those sandboxes can be
launched without privilege [2].

Why is kdbus needed? Why are overlays better than, say, btrfs
lightweight copies here? Also, overlayfs might actually make it for 3.19.

[1] sandstorm.io
[2]
https://github.com/sandstorm-io/sandstorm/blob/master/src/sandstorm/supervisor-main.c%2B%2B

As for childfs, I implemented procfs polling a couple years ago, but it
never went anywhere:

http://lkml.kernel.org/g/1840e47fc4113af16989a4250d98bed62a9bce53.1354559528.git.luto@amacapital.net

If that would help, I can try to dust it off and get it into the kernel.

--Andy

2014-10-21 19:43:43

by Al Viro

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 12:28:12PM -0700, Andy Lutomirski wrote:

> Why is kdbus needed?

Presumably because freedesktop crowd has made an architectural mistake and
pushed dbus as solution to all problems. And ran into limitations of
that, er, solution. Then, instead of perhaps reconsidering the wisdom of
their inspired decision, went for "let's push it kernelwards, it might
somewhat reduce the overhead and problems will be easier to chalk up to
something wrong being done by the kernel".

2014-10-21 19:48:10

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 12:43 PM, Al Viro <[email protected]> wrote:
> On Tue, Oct 21, 2014 at 12:28:12PM -0700, Andy Lutomirski wrote:
>
>> Why is kdbus needed?
>
> Presumably because freedesktop crowd has made an architectural mistake and
> pushed dbus as solution to all problems. And ran into limitations of
> that, er, solution. Then, instead of perhaps reconsidering the wisdom of
> their inspired decision, went for "let's push it kernelwards, it might
> somewhat reduce the overhead and problems will be easier to chalk up to
> something wrong being done by the kernel".

Well, yes, but the question I was actually trying to ask is: why does
a containerized app need any kernel help at all for communication with
the rest of the system (using dbus or anything else)? Passing socket
fds into a container works just fine.
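For reference, passing an fd over a Unix-domain socket uses SCM_RIGHTS ancillary data, so a broker outside the container can hand the app an already-connected socket. A minimal sketch using only standard socket calls; send_fd() and recv_fd() are illustrative names.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Send one file descriptor over a connected AF_UNIX socket. */
int send_fd(int sock, int fd)
{
    char byte = 'F';                     /* must carry at least one byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                              /* correctly aligned cmsg buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    memset(&u, 0, sizeof u);
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof u.buf,
    };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number referring to the same open socket, so the sandboxed side never needs to reach the filesystem path or bus address itself.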

--Andy

2014-10-22 02:58:19

by Minchan Kim

Subject: Re: A desktop environment[1] kernel wishlist

Hello,

About zram, pre-3.10 is really old. At that time there were some problems
in zram, but I believe most of the known bugs have been fixed since it was
merged out of staging in 3.14 and enhanced a lot, so I suggest you try a
recent zram. If you find a problem, let me know.

Thanks.

On Tue, Oct 21, 2014 at 01:11:07PM +0000, Sergey wrote:
> We had to give up on zram previously (in pre-3.10
> days) because of kernel regressions leading to panics when zram was enabled.

--
Kind regards,
Minchan Kim

2014-10-22 16:52:41

by Dan Streetman

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 9:11 AM, Sergey <[email protected]> wrote:
> Regarding the other items, AFAIK the kernel implements mechanism, not
> policy, so instead of "zswap selectively enabled by default" we just want
> "stable reliable zswap".

Can you elaborate on what your problems with zswap are? What are you
seeing when it's unstable or unreliable?


2014-10-22 17:14:06

by Zygo Blaxell

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 21, 2014 at 08:09:38PM +0200, Bastien Nocera wrote:
> On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> > On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> > >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> > >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> > >> documentation'
> > >>
> > >> Can you expand more on the rationale for the need here? Is this for UI
> > >> for power debugging, or something else?
> > >
> > > No, it would be used for automating backups, or implementing
> > > suspend->hibernation transitions. For example, right before the machine
> > > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > > the rtc alarm (and not by the user through a lid open), I might:
> > > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > > the server that handles my backups and so backup the system.
> > > - check whether the battery is low, and hibernate the machine (if it
> > > supports it, obviously).
> > >
> > > We cannot do that if we can't make out whether the wake-up came from a
> > > user action, or the alarm we set.
> >
> > I suspect wakeup type reporting is maybe not the best way to go about
> > this, since there may be a number of causes for wakeups and they can
> > arrive closely together in different orders, which can result in
> > races.
> >
> > For instance, if the machine suspends, and sets an alarm to be woken
> > up at midnight to do a backup, if the user resumes their laptop at
> > 11:59:59, should the backup still proceed at midnight?
>
> No. And I would expect that we would get a wake up type of "power
> button" or "lid open" in this case.

I have been using something like this for the last 7 years or so.
The relevant inputs are:

1. is the user present (is there recent input on HID devices,
keyboard/mouse, but ignore devices like light sensors, 3D
accelerometers, and ACPI virtual keys)?

2. which network connection(s) are available to reach the
backup server?

3. how much power is available (if on battery, how much run
time left?)

4. what is the policy (do backups happen at a specific time
of day, or whenever they can?)

5. was a backup completed successfully in the last N hours?

Note the absence of any information about the cause of recent
suspend/resume activity, or any input from suspend/resume at all.

Most of the inputs are used for table lookup with a bit of logic for
dependent configuration parameters. e.g. from input #2, if the network
connection is not home or office, use a different threshold for the
amount of battery power required by input #3, assuming that when I'm in
those specific places I am never more than 5 minutes away from AC power.

In my setup a daemon evaluates all the input conditions whenever
any of them change, and if the result is "machine should sleep" then
it simulates #4 over time, sets an alarm for the next time "machine
should not sleep" is asserted, and goes to sleep. If the next wakeup
event is less than 60 seconds in the future, we might miss the wakeup
alarm, so we just stay on. The daemon controls the backup process (and
dozens of other power-state-dependent processes) with freezer cgroups.
The backup process knows nothing about power management or scheduling
(nor should it), it is simply running, frozen, or not running according
to the current conditions and the table.
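A minimal Python sketch of the decision step described above. The 60-second margin comes from the text; the function names, the `Decision` type, and the use of a cgroup-v1 freezer hierarchy (writing `FROZEN`/`THAWED` to `freezer.state`) are illustrative assumptions, not a description of the actual daemon.

```python
import os
from dataclasses import dataclass
from typing import Optional

# From the description above: if the next "machine should not sleep" moment
# is under 60 seconds away, the wakeup alarm might be missed, so stay on.
MIN_SLEEP_SECONDS = 60.0


@dataclass
class Decision:
    action: str                      # "stay_awake" or "suspend"
    wake_at: Optional[float] = None  # absolute time to program the wake alarm


def decide(now: float, should_sleep: bool,
           next_awake_time: Optional[float]) -> Decision:
    """Re-evaluated whenever any input changes (hypothetical helper).

    next_awake_time is when the simulated policy next says "machine should
    not sleep", or None if that never happens.
    """
    if not should_sleep:
        return Decision("stay_awake")
    if next_awake_time is not None and next_awake_time - now < MIN_SLEEP_SECONDS:
        return Decision("stay_awake")
    return Decision("suspend", wake_at=next_awake_time)


def set_frozen(cgroup_dir: str, frozen: bool) -> None:
    """Freeze or thaw every task in a cgroup-v1 freezer group.

    The managed process (e.g. the backup) needs no knowledge of power
    management; it is simply frozen or thawed from outside.
    """
    with open(os.path.join(cgroup_dir, "freezer.state"), "w") as f:
        f.write("FROZEN" if frozen else "THAWED")
```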

All five inputs are relevant. I don't want backups while I'm using the
machine as a desktop because of the latency impact on disk and network;
however, when I stop to get a coffee, the backups can proceed until I
get back. If the only network available incurs per-bit usage charges
or it is a shared busy network with no excess capacity, there will be
no backups regardless of the other conditions. If the machine has less
than six hours of battery power and no AC feed, there are no backups.

The best part is that in this scheme the backups can be scheduled
opportunistically, i.e. whenever the machine is awake and required
resources are abundant or underutilized, so by the time midnight rolls
around there are rarely any outstanding backups left to do, and the
machine just sleeps straight through the night.

The same schema gets used for other processes like web browsers, but
with different values (e.g. the web browser runs only when there is
a network connection and one of AC power or recent user input, and is
frozen at other times so that the battery isn't wasted rotating banner
ads that nobody is looking at).
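The table-driven policy described above might look like the following Python sketch for the two processes mentioned (backups and the web browser). A `False` result means "freeze it or don't start it", which the daemon would apply through the freezer cgroups mentioned above. Six hours of battery is the figure from the text; the network labels and the lower home/office threshold are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inputs:
    user_present: bool       # recent keyboard/mouse input
    network: Optional[str]   # e.g. "home", "office", "metered", or None
    on_ac: bool
    battery_hours: float     # estimated run time left on battery
    backup_due: bool         # no successful backup in the last N hours


# Battery run time required for backups without AC, looked up by network.
# 6 hours is from the text; the home/office value is a hypothetical stand-in
# for "never more than 5 minutes away from AC power".
BACKUP_BATTERY_HOURS = {"home": 1.0, "office": 1.0}
DEFAULT_BACKUP_BATTERY_HOURS = 6.0
# Hypothetical labels for per-bit-charged or saturated networks.
NO_BACKUP_NETWORKS = {"metered", "busy"}


def backup_may_run(i: Inputs) -> bool:
    if not i.backup_due or i.user_present:
        return False
    if i.network is None or i.network in NO_BACKUP_NETWORKS:
        return False
    if not i.on_ac:
        needed = BACKUP_BATTERY_HOURS.get(i.network, DEFAULT_BACKUP_BATTERY_HOURS)
        if i.battery_hours < needed:
            return False
    return True


def browser_may_run(i: Inputs) -> bool:
    # A network connection plus either AC power or recent user input.
    return i.network is not None and (i.on_ac or i.user_present)
```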

> > What happens
> > if the user starts to use their machine at 12:00:01?
>
> I would expect the backup to stop and be tried again later.

Suspend/resume has nothing to do with this. Backups should treat
suspend state like a routine transient network failure. If I suspend
my laptop because I'm moving from one meeting room to another, the TCP
connections used by the backups should survive and backups will continue
as soon as the machine wakes up. If the machine is suspended too long,
the backup TCP connections will fail, and the backup process will retry
if it's still in-window and stop if it's not.
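Under this approach the backup job needs nothing suspend-specific, only a retry loop bounded by its time window. A minimal sketch; `transfer` is a hypothetical callable that runs (or resumes) one backup pass, the clock and sleep functions are injectable for testing, and the 30-second retry delay is an arbitrary choice.

```python
import time
from typing import Callable


def run_backup(transfer: Callable[[], None], window_end: float,
               clock: Callable[[], float] = time.time,
               sleep: Callable[[float], None] = time.sleep,
               retry_delay: float = 30.0) -> bool:
    """Retry `transfer` on network errors until it succeeds or the window closes.

    A suspend that outlives the TCP connections surfaces here as an ordinary
    connection reset or timeout and is retried like any other transient
    network failure. Returns True on success, False if the window closed.
    """
    while clock() < window_end:
        try:
            transfer()
            return True
        except (ConnectionError, TimeoutError):
            sleep(retry_delay)
    return False
```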

> > Thus you probably want to have a "user present" status,
>
> We can do any sort of thing once the laptop is awake. But right now
> there's no way to know whether the resume is due to a user action or
> not.

This can be a useful thing to have, especially if it can be in the
form of a timestamped log of input events that occurred while userspace
was sleeping, and if it's not available any other way (e.g. by reading
the current state of wakeup button or lid sensor).

That said, I wouldn't advise building anything in userspace on top of
it, or at least not without combining that input with the other inputs.
The specific use cases you mentioned are much better served by ordinary
sensor inputs and userspace state tracking available after the kernel
is running. The "wake reason" could be another of those inputs.

> > then use the
> > timerfd() ALARM clockids to set any wakeups you'd like, and when they
> > trigger (if the system was suspended or not), decide to do your backup
> > based on the conditionals you had above, using the user-present status in
> > a similar way to how you use AC status.
> >
> > I'd suggest looking into some of the details on how Android does its
> > wakelock logic, as well as the timerfd ALARM clockids, since I think
> > this would provide what you need.
>
> It doesn't. There's still a whole class of hardware that isn't always on
> as mobile SoCs are, and wakelocks aren't going to help if the kernel
> isn't running and we don't know why it started running again.

Funny, that never stopped me from implementing these use cases on such
hardware. I didn't even need wakelocks, although I might have implemented
a functional equivalent in userspace.

> > My bigger concern here with your use case though, is that you might be
> > able to use ALARM timers more commonly, but that for much existing
> > hardware, corner cases like programmatic resuming of a laptop while
> > it's packed in a bag somewhere might have thermal risks.
>
> I'm pretty sure that Windows has done this for years before we did. If
> the laptop cannot suspend reliably, then the user would disable it. We
> cannot keep designing around broken software.

> > For mobile
> > devices this is an expected design point, but for off-the-shelf
> > laptops with big fans and exhaust vents, I'm not sure how safe this
> > would be, so you may need to constrain this functionality somehow (or
> > look to see if an enforced low-power resume is possible).
>
> I think that we won't know whether it's a problem until the point that
> somebody actually implements it.

I have implemented this starting about 9 years ago with a variety of
off-the-shelf laptops, netbooks, DIY Beagleboard-based hardware, etc.
There are two cases: devices that will run happily inside a laptop
bag, and devices that won't. The first case is trivial and requires no
further discussion. The second case requires reliable implementation
of one simple policy:

If the machine wakes up with no AC and the lid closed, assume the machine
is in a laptop bag and go straight back to sleep again. There is usually
just enough time to do this--and nothing else--before a dangerous amount
of heat builds up (assuming you're waking from S3 suspend or similar...if
you're waking from suspend-to-disk then you've already cooked the laptop
before userspace is running).
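A minimal sketch of the "probably in a laptop bag" check. The sensor paths are assumptions: lid and AC adapter names (`LID0`, `AC`, `ACAD`, ...) vary between machines, and `/proc/acpi/button/lid/*/state` only exists on ACPI systems.

```python
from pathlib import Path

# Typical ACPI laptop locations; treat these paths as machine-specific guesses.
LID_STATE = Path("/proc/acpi/button/lid/LID0/state")
AC_ONLINE = Path("/sys/class/power_supply/AC/online")


def lid_is_closed(text: str) -> bool:
    # The ACPI lid state file reads like "state:      closed".
    return text.split(":", 1)[1].strip() == "closed"


def ac_is_online(text: str) -> bool:
    # The power_supply "online" attribute is "1" when the adapter is connected.
    return text.strip() == "1"


def must_go_back_to_sleep(lid_closed: bool, on_ac: bool) -> bool:
    """Lid closed and no AC: assume we are in a bag and suspend at once."""
    return lid_closed and not on_ac


def check_now(lid_path: Path = LID_STATE, ac_path: Path = AC_ONLINE) -> bool:
    return must_go_back_to_sleep(lid_is_closed(lid_path.read_text()),
                                 ac_is_online(ac_path.read_text()))
```

Such a check would be re-evaluated on every lid or AC change, not only at resume; when it returns True the daemon suspends immediately, for example by writing `mem` to `/sys/power/state`.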

If the kernel crashes and fails to suspend or power off, the laptop _will_
be damaged (at the very least there will be a permanent reduction of
battery capacity), but that's true of OSX and Windows too. If there is
a choice between crashing and powering off, power off by default (an
option to power off on kernel panic would be particularly useful).

Note that there is never a "why are we awake?" input here. If you really
get to the bottom of this, the only relevant input is "is it safe to
run now, or do I need to suspend immediately?" and you need to monitor
and respond to that input all the time, not just when waking up. I don't
have a reliable "am I on fire yet?" input on random laptop hardware, so
I must infer from the closed lid and battery power that some exception
to normal or safe use cases may be in progress.

I've had machines packed in laptop bags that ended up being handled so
roughly by airport people that the power/wakeup buttons on the keyboard
were pressed *through* the display panel. The machines woke up in a
laptop bag, drained the battery, and generated enough heat to make the
case too hot to touch. Technically, these machines were woken by user
action, but that doesn't mean they were in a safe operating environment.
The following day I implemented the "immediately go back to sleep
when the lid is closed and not on AC" policy and never looked back.
(of course this assumes reliable lid and AC sensor inputs...)

One note for the kernel PM people here: if userspace tells the kernel to
suspend, it means suspend. Immediately. Right Fscking Now. Not "wait
20 seconds for some random process stuck in iowait with a network file
server or broken USB device, and then resume again with only kernel
log messages to distinguish between that failure case and a user who
just closed the lid and then changed their mind." From the kernel,
don't call sync(), and suspend any buffer flushing already in progress.
Dirty buffers will still be in RAM on resume, and can be flushed when
the disks come back up.



2014-10-22 20:38:57

by Heinrich Schuchardt

Subject: Re: A desktop environment[1] kernel wishlist

On 21.10.2014 15:11, Sergey wrote:
> Hey everyone,
>
> I'm glad we're having some discussion on this, because we have almost
> exactly the same kernel wishlist internally for elementary OS / Pantheon DE.
>
> I believe I can further elaborate on the VFS monitoring part. We need a file
> monitoring facility that's scalable (unlike inotify) and can provide a
> decent level of detail (unlike fanotify). In particular, we need to be able
> to detect file/directory creation, renaming and removal events, as well as
> close_write event. And, in an ideal world, all of that without requiring
> root permissions.
Reading your wish, I would guess that adding the capability to watch
mounts with inotify would satisfy your needs. Would you agree?
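For context on why inotify is described as not scalable: it has no recursive mode, so a desktop indexer must walk the tree and add one watch per directory (and keep adding watches as new directories appear), each one consuming kernel memory and counting against `/proc/sys/fs/inotify/max_user_watches`. A minimal Linux-only sketch through ctypes, since Python's standard library has no inotify binding; a per-mount watch as proposed here would presumably replace this walk.

```python
import ctypes
import ctypes.util
import os
import struct

# Event bits from <sys/inotify.h>
IN_CLOSE_WRITE = 0x00000008
IN_MOVED_FROM = 0x00000040
IN_MOVED_TO = 0x00000080
IN_CREATE = 0x00000100
IN_DELETE = 0x00000200
IN_ONLYDIR = 0x01000000
WANTED = IN_CREATE | IN_DELETE | IN_MOVED_FROM | IN_MOVED_TO | IN_CLOSE_WRITE

_libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# struct inotify_event: int wd; uint32_t mask, cookie, len; char name[len];
_EVENT = struct.Struct("iIII")


def _check(ret: int, what: str) -> int:
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), what)
    return ret


def watch_tree(root: str):
    """Add one watch per directory below root; return (fd, {wd: path})."""
    fd = _check(_libc.inotify_init(), "inotify_init")
    watches = {}
    for dirpath, _dirs, _files in os.walk(root):
        wd = _check(_libc.inotify_add_watch(fd, os.fsencode(dirpath),
                                             WANTED | IN_ONLYDIR), dirpath)
        watches[wd] = dirpath
    return fd, watches


def read_events(fd: int):
    """Block until events are available, then decode them."""
    buf = os.read(fd, 64 * 1024)
    events, off = [], 0
    while off < len(buf):
        wd, mask, cookie, length = _EVENT.unpack_from(buf, off)
        off += _EVENT.size
        name = os.fsdecode(buf[off:off + length].rstrip(b"\0"))
        off += length
        events.append((wd, mask, cookie, name))
    return events
```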

Best regards

Heinrich Schuchardt

2014-10-27 09:23:18

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

On Tue 2014-10-21 13:11:07, Sergey wrote:
> Hey everyone,
>
> I'm glad we're having some discussion on this, because we have almost
> exactly the same kernel wishlist internally for elementary OS / Pantheon DE.
>
> I believe I can further elaborate on the VFS monitoring part. We need a file
> monitoring facility that's scalable (unlike inotify) and can provide a
> decent level of detail (unlike fanotify). In particular, we need to be able
> to detect file/directory creation, renaming and removal events, as well as
> close_write event. And, in an ideal world, all of that without requiring
> root permissions.
>
> This can be almost accomplished by combining output of fanotify with that of
> a custom LSM module that just reports events to userspace (e.g. rlocate uses
> such a thing). There are two problems with this: first, it's a hideous hack,
> and second, it doesn't detect deletions.

And third, it will eat unbounded amounts of memory, right?

If "recursive mtime" were available, would that work for you?
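To make the "recursive mtime" idea concrete: if each directory exposed the time of the newest change anywhere beneath it, an indexer could skip unchanged subtrees on a rescan without reading them. This is not an existing kernel interface; `rmtime` and `subdirs` below are hypothetical functions supplied by the caller, and the sketch only shows the pruning logic.

```python
from typing import Callable, Iterable, List


def dirs_to_rescan(root: str, last_scan: float,
                   rmtime: Callable[[str], float],
                   subdirs: Callable[[str], Iterable[str]]) -> List[str]:
    """Return directories that may contain changes since last_scan.

    rmtime(d) is assumed to give the newest modification time anywhere
    below d; any subtree not newer than last_scan is pruned unread.
    """
    result, stack = [], [root]
    while stack:
        d = stack.pop()
        if rmtime(d) <= last_scan:
            continue              # whole subtree unchanged: skip it
        result.append(d)          # something at or below d changed
        stack.extend(subdirs(d))
    return result
```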
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-27 09:28:12

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

Hi!

> > I suspect wakeup type reporting is maybe not the best way to go about
> > this, since there may be a number of causes for wakeups and they can
> > arrive closely together in different orders, which can result in
> > races.
> >
> > For instance, if the machine suspends, and sets an alarm to be woken
> > up at midnight to do a backup, if the user resumes their laptop at
> > 11:59:59, should the backup still proceed at midnight?
>
> No. And I would expect that we would get a wake up type of "power
> button" or "lid open" in this case.

I believe you should really use "is lid opened or AC or dock
connected" to determine if it was automatic resume or not. It should
work better and you can actually do it today.

> > For mobile
> > devices this is an expected design point, but for off-the-shelf
> > laptops with big fans and exhaust vents, I'm not sure how safe this
> > would be, so you may need to constrain this functionality somehow (or
> > look to see if an enforced low-power resume is possible).
>
> I think that we won't know whether it's a problem until the point that
> somebody actually implements it.

The kernel does not stop you at this point, right?

Suspend-to-partition is also doable today (see suspend.sf.net), or you
can just swapon before starting. You can take it off the list, I
believe.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-27 13:56:21

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-21 at 12:28 -0700, Andy Lutomirski wrote:
> On 10/21/2014 01:49 AM, Bastien Nocera wrote:
> > Hey,
> >
> > GNOME has had discussions with kernel developers in the past, and,
> > fortunately, in some cases we were able to make headway.
> >
> > There are however a number of items that we still don't have solutions
> > for, items that kernel developers might not realise we'd like to rely
> > on, or don't know that we'd make use of if merged.
> >
> > I've posted this list at:
> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >
> > Let me know on-list or off-list if you have any comments about those, so
> > I can update the list.
>
> I don't know much about desktop environment infrastructure, but I think
> the kernel probably already has a lot of what's needed for LinuxApps.
>
> Tools like Sandstorm [1] (shameless plug, but it's a good example here)
> can already sandbox normal-ish programs, and those sandboxes can be
> launched without privilege [2].
>
> Why is kdbus needed?

Because it sucks less than passing fd's and using home-made protocols on
top of it.

> Why are overlays better than, say, btrfs
> lightweight copies here? Also, overlayfs might actually make it for 3.19.

Overlayfs works on more than just btrfs, which is useful to not rely on
a particular filesystem to implement those features.

> As for childfs, I implemented procfs polling a couple years ago, but it
> never went anywhere:
>
> http://lkml.kernel.org/g/1840e47fc4113af16989a4250d98bed62a9bce53.1354559528.git.luto@amacapital.net
>
> If that would help, I can try to dust it off and get it in to the kernel.

I'll pass that on to Ryan who requested this feature.

Cheers

2014-10-27 14:20:32

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-21 at 12:10 -0700, John Stultz wrote:
> On Tue, Oct 21, 2014 at 11:09 AM, Bastien Nocera <[email protected]> wrote:
> > On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> >> I suspect wakeup type reporting is maybe not the best way to go about
> >> this, since there may be a number of causes for wakeups and they can
> >> arrive closely together in different orders, which can result in
> >> races.
> >>
> >> For instance, if the machine suspends, and sets an alarm to be woken
> >> up at midnight to do a backup, if the user resumes their laptop at
> >> 11:59:59, should the backup still proceed at midnight?
> >
> > No. And I would expect that we would get a wake up type of "power
> > button" or "lid open" in this case.
> >
> >> What happens
> >> if the user starts to use their machine at 12:00:01?
> >
> > I would expect the backup to stop and be tried again later.
>
> What "event" would you be using to trigger stopping the backup?
>
>
> >> What about if
> >> the user walked away from their machine at 11:55:01, and the system
> >> would suspend at 12:00:01, should the backup commence at 12:00:00?
> >
> > That wouldn't happen because we'd set the wake up time when suspending.
> >
> >> Thus you probably want to have a "user present" status,
> >
> > We can do any sort of thing once the laptop is awake. But right now
> > there's no way to know whether the resume is due to a user action or
> > not.
>
> I'm suggesting it's best if you don't care which specific irq brought
> the device out of suspend mode.
>
> You may want to monitor various events like the lid-open or
> power-button (as well as the timerfd for alarms), and use those for
> your logic, but again, because a number of different irqs might bring
> the system out of suspend and those irqs can possibly occur almost
> simultaneously, which specific one landed first and woke the system is
> really not that useful (and again prone to races).
>
> Now, I do think knowing which IRQ did bring you out of suspend is
> useful, but mostly for power-debugging when you're trying to optimize
> battery life. But for userland logic, I think it's far too prone to
> races.

I also cannot know, from user-space, whether Wake-On-LAN,
Wake-On-Wireless-LAN, or the Wi-Fi card's "network proximity" triggered
coming out of suspend for example.

I can certainly check for the status of the lid, but I wouldn't know
whether a button was pressed to turn the machine back on, as the
firmware would eat that.

To make it short, I don't have a way to know, from user-space, whether
the event that took it out of suspend was programmatic, or user action.
I would add that, even if we said that races can occur, I have no easy
way to know, from user-space, whether the last thing that occurred was
the Wi-Fi card waking the machine up or the power button being pressed.

I'm sure all that information is available inside the kernel, but the
user-space interface for it is lacking.
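Part of this can be approximated from user space today with the per-device `power/wakeup_count` attributes (documented in Documentation/ABI/testing/sysfs-devices-power): snapshot them before suspend and compare after resume. A minimal sketch; it only yields candidates, since counts also rise for wakeup events while the system is awake, several devices can tick at once (the race John describes), and a button press handled by the firmware may not be counted at all.

```python
import os
from typing import Dict, List


def snapshot_wakeup_counts(root: str = "/sys/devices") -> Dict[str, int]:
    """Map device directory -> power/wakeup_count for wakeup-enabled devices."""
    counts = {}
    # os.walk does not follow symlinks by default, so sysfs's many
    # cross-links are not traversed.
    for dirpath, _dirs, files in os.walk(root):
        if os.path.basename(dirpath) != "power" or "wakeup_count" not in files:
            continue
        try:
            with open(os.path.join(dirpath, "wakeup_count")) as f:
                text = f.read().strip()
        except OSError:
            continue
        if text:  # reads empty when the device has no wakeup source attached
            counts[os.path.dirname(dirpath)] = int(text)
    return counts


def wake_candidates(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Devices whose wakeup count went up between the two snapshots."""
    return sorted(dev for dev, n in after.items() if n > before.get(dev, 0))
```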

> >> then use the
> >> timerfd() ALARM clockids to set any wakeups you'd like, and when they
> >> trigger (if the system was suspended or not), decide to do your backup
> >> based on the conditionals you had above, using the user-present status in
> >> a similar way to how you use AC status.
> >>
> >> I'd suggest looking into some of the details on how Android does its
> >> wakelock logic, as well as the timerfd ALARM clockids, since I think
> >> this would provide what you need.
> >
> > It doesn't. There's still a whole class of hardware that isn't always on
> > as mobile SoCs are, and wakelocks aren't going to help if the kernel
> > isn't running and we don't know why it started running again.
>
> I'm not sure I parsed this properly. Mobile SoCs are quite frequently
> in suspend and not always on. They frequently resume both due to
> wakeup alarms, modem call irqs, and as a result of user-interaction
> like button presses.
>
>
> >> My bigger concern here with your use case though, is that you might be
> >> able to use ALARM timers more commonly, but that for much existing
> >> hardware, corner cases like programmatic resuming of a laptop while
> >> it's packed in a bag somewhere might have thermal risks.
> >
> > I'm pretty sure that Windows has done this for years before we did. If
> > the laptop cannot suspend reliably, then the user would disable it. We
> > cannot keep designing around broken software.
>
> Sure. But it's not reliably suspending I'm worried about, it's
> accidentally resuming in an environment the hardware wasn't designed
> for. It's really more of a hardware design issue. I'm not suggesting
> you don't do it, but I just suspect you'll need to be careful about
> automatically enabling this on older hardware.

It could be opt-in if that's actually a problem.

2014-10-27 14:21:50

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-21 at 20:24 +0200, Geert Uytterhoeven wrote:
> On Tue, Oct 21, 2014 at 7:14 PM, Bastien Nocera <[email protected]> wrote:
> >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >> documentation'
> >>
> >> Can you expand more on the rationale for the need here? Is this for UI
> >> for power debugging, or something else?
> >
> > No, it would be used for automating backups, or implementing
> > suspend->hibernation transitions. For example, right before the machine
> > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > the rtc alarm (and not by the user through a lid open), I might:
> > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > the server that handles my backups and so backup the system.
> > - check whether the battery is low, and hibernate the machine (if it
> > supports it, obviously).
>
> Isn't this already available through /sys/kernel/debug/wakeup_sources
> and/or the various power/wake* files in sysfs?

That might very well be, but /sys/kernel/debug/wakeup_sources really
isn't much of a user-space API. Where's the documentation for the
various power/wake* files?

As in my mail to John, that information might already be available
within the kernel, but it's not exported in a sensible way to
user-space.
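For readers following the wakealarm part: the existing interface is used roughly like this. A minimal sketch, assuming `rtc0` is the wake-capable RTC and that it keeps UTC; the attribute takes an absolute time in seconds since the epoch and reads empty when no alarm is armed.

```python
import time
from typing import Optional

# Assumption: rtc0 is the wake-capable RTC; the name can differ per machine.
WAKEALARM = "/sys/class/rtc/rtc0/wakealarm"


def set_wakealarm(seconds_from_now: int, path: str = WAKEALARM) -> int:
    """Arm the RTC alarm; return the absolute wake time (epoch seconds)."""
    wake_at = int(time.time()) + seconds_from_now
    # Writing 0 clears any armed alarm first; the kernel typically refuses
    # (EBUSY) to overwrite an alarm that is still enabled.
    with open(path, "w") as f:
        f.write("0")
    with open(path, "w") as f:
        f.write(str(wake_at))
    return wake_at


def read_wakealarm(path: str = WAKEALARM) -> Optional[int]:
    """Return the armed wake time, or None when the attribute reads empty."""
    with open(path) as f:
        text = f.read().strip()
    return int(text) if text else None
```

Right before suspending one would call `set_wakealarm(3600)` and then write `mem` to `/sys/power/state`. Nothing in this interface tells user space, after resume, whether this alarm or a lid or button event woke the machine, which is the missing piece being asked for.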

2014-10-27 14:29:00

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, 2014-10-22 at 13:04 -0400, Zygo Blaxell wrote:
> On Tue, Oct 21, 2014 at 08:09:38PM +0200, Bastien Nocera wrote:
> > On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> > > On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> > > >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> > > >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> > > >> documentation'
> > > >>
> > > >> Can you expand more on the rationale for the need here? Is this for UI
> > > >> for power debugging, or something else?
> > > >
> > > > No, it would be used for automating backups, or implementing
> > > > suspend->hibernation transitions. For example, right before the machine
> > > > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > > > the rtc alarm (and not by the user through a lid open), I might:
> > > > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > > > the server that handles my backups and so backup the system.
> > > > - check whether the battery is low, and hibernate the machine (if it
> > > > supports it, obviously).
> > > >
> > > > We cannot do that if we can't make out whether the wake-up came from a
> > > > user action, or the alarm we set.
> > >
> > > I suspect wakeup type reporting is maybe not the best way to go about
> > > this, since there may be a number of causes for wakeups and they can
> > > arrive closely together in different orders, which can result in
> > > races.
> > >
> > > For instance, if the machine suspends, and sets an alarm to be woken
> > > up at midnight to do a backup, if the user resumes their laptop at
> > > 11:59:59, should the backup still proceed at midnight?
> >
> > No. And I would expect that we would get a wake up type of "power
> > button" or "lid open" in this case.
>
> I have been using something like this for the last 7 years or so.
> The relevant inputs are:
>
> 1. is the user present (is there recent input on HID devices,
> keyboard/mouse, but ignore devices like light sensors, 3D
> accelerometers, and ACPI virtual keys)?

If the user woke the machine up through the power button, you wouldn't
see that from user-space. You could detect that the lid was opened,
because you have state.

> 2. which network connection(s) are available to reach the
> backup server?
>
> 3. how much power is available (if on battery, how much run
> time left?)
>
> 4. what is the policy (do backups happen at a specific time
> of day, or whenever they can?)
>
> 5. was a backup completed successfully in the last N hours?
>
> Note the absence of any information about the cause of recent
> suspend/resume activity, or any input from suspend/resume at all.

How do I tell my environment not to wake the screen up when the machine
was woken up by an alarm I scheduled to launch a backup? Or not to
resume audio playback when I get woken up to handle network events
(through connected suspend, or Wake-On-(Wireless-)LAN)?

2014-10-27 14:32:30

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 10:28 +0100, Pavel Machek wrote:
> Hi!
>
> > > I suspect wakeup type reporting is maybe not the best way to go about
> > > this, since there may be a number of causes for wakeups and they can
> > > arrive closely together in different orders, which can result in
> > > races.
> > >
> > > For instance, if the machine suspends, and sets an alarm to be woken
> > > up at midnight to do a backup, if the user resumes their laptop at
> > > 11:59:59, should the backup still proceed at midnight?
> >
> > No. And I would expect that we would get a wake up type of "power
> > button" or "lid open" in this case.
>
> I believe you should really use "is lid opened or AC or dock
> connected" to determine if it was automatic resume or not. It should
> work better and you can actually do it today.

There are no lids or docks on tablets.

> > > For mobile
> > > devices this is an expected design point, but for off-the-shelf
> > > laptops with big fans and exhaust vents, I'm not sure how safe this
> > > would be, so you may need to constrain this functionality somehow (or
> > > look to see if an enforced low-power resume is possible).
> >
> > I think that we won't know whether it's a problem until the point that
> > somebody actually implements it.
>
> The kernel does not stop you at this point, right?
>
> Suspend-to-partition is also doable today (see suspend.sf.net),

Is it perfect? Because no releases in 3 years kind of scares me.

> or you
> can just swapon before starting. You can take it off the list, I
> believe.

Or we could create a new filesystem type, separate from swap and never
used by it, that distributions' installers could create. Then I wouldn't
need to hope that the system didn't start using the swap between my
enabling it and the suspend actually occurring (which would make it
impossible to disable afterwards).

2014-10-27 15:12:49

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On Oct 27, 2014 6:56 AM, "Bastien Nocera" <[email protected]> wrote:
>
> On Tue, 2014-10-21 at 12:28 -0700, Andy Lutomirski wrote:
> > On 10/21/2014 01:49 AM, Bastien Nocera wrote:
> > > Hey,
> > >
> > > GNOME has had discussions with kernel developers in the past, and,
> > > fortunately, in some cases we were able to make headway.
> > >
> > > There are however a number of items that we still don't have solutions
> > > for, items that kernel developers might not realise we'd like to rely
> > > on, or don't know that we'd make use of if merged.
> > >
> > > I've posted this list at:
> > > https://wiki.gnome.org/BastienNocera/KernelWishlist
> > >
> > > Let me know on-list or off-list if you have any comments about those, so
> > > I can update the list.
> >
> > I don't know much about desktop environment infrastructure, but I think
> > the kernel probably already has a lot of what's needed for LinuxApps.
> >
> > Tools like Sandstorm [1] (shameless plug, but it's a good example here)
> > can already sandbox normal-ish programs, and those sandboxes can be
> > launched without privilege [2].
> >
> > Why is kdbus needed?
>
> Because it sucks less than passing fd's and using home-made protocols on
> top of it.

For securely communicating with a container, "it sucks less" is hard
to use as a design criterion.

What's wrong with fds, and how does kdbus solve it?
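For reference, the fd-passing baseline being compared with kdbus: descriptors travel over a Unix domain socket as SCM_RIGHTS ancillary data. A minimal sketch using Python's `socket.send_fds`/`socket.recv_fds` (Python 3.9+, Unix only); the framing, authentication and policy that each "home-made protocol" layers on top, which is what this sub-thread is actually arguing about, are not shown.

```python
import os
import socket
from typing import Tuple


def send_fd(sock: socket.socket, fd: int, tag: bytes = b"fd") -> None:
    """Send one descriptor as SCM_RIGHTS ancillary data.

    At least one byte of ordinary data has to accompany it; `tag` is a
    placeholder for whatever the application protocol puts there.
    """
    socket.send_fds(sock, [tag], [fd])


def recv_fd(sock: socket.socket) -> Tuple[bytes, int]:
    """Receive a message carrying one descriptor; returns (data, new_fd)."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    if not fds:
        raise RuntimeError("peer sent no file descriptor")
    return data, fds[0]
```

The receiver gets a new descriptor number referring to the same open file, so, for example, the read end of a pipe sent this way can be read directly by the other process.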

--Andy

2014-10-27 15:31:28

by Geert Uytterhoeven

Subject: Re: A desktop environment[1] kernel wishlist

Hi Bastien,

On Mon, Oct 27, 2014 at 3:20 PM, Bastien Nocera <[email protected]> wrote:
> On Tue, 2014-10-21 at 20:24 +0200, Geert Uytterhoeven wrote:
>> On Tue, Oct 21, 2014 at 7:14 PM, Bastien Nocera <[email protected]> wrote:
>> >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> >> documentation'
>> >>
>> >> Can you expand more on the rationale for the need here? Is this for UI
>> >> for power debugging, or something else?
>> >
>> > No, it would be used for automating backups, or implementing
>> > suspend->hibernation transitions. For example, right before the machine
>> > suspends, I would schedule it to wake up in an hour. If I get woken up by
>> > the rtc alarm (and not by the user through a lid open), I might:
>> > - check that I'm plugged into the AC, it's night, and in the vicinity of
>> > the server that handles my backups and so backup the system.
>> > - check whether the battery is low, and hibernate the machine (if it
>> > supports it, obviously).
>>
>> Isn't this already available through /sys/kernel/debug/wakeup_sources
>> and/or the various power/wake* files in sysfs?
>
> That might very well be, but /sys/kernel/debug/wakeup_sources really
> isn't much of a user-space API. Where's the documentation for the
> various power/wake* files?

The debugfs entry is indeed not documented.
Documentation for the others is just a git grep away:

$ git grep power/wake -- Documentation/
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup
Documentation/ABI/testing/sysfs-devices-power: The /sys/devices/.../power/wakeup attribute allows the user
Documentation/ABI/testing/sysfs-devices-power: have one of the following two values for the sysfs power/wakeup
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_count
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_active_count
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_abort_count
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_expire_count
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_active
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_total_time_ms
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_max_time_ms
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_last_time_ms
Documentation/ABI/testing/sysfs-devices-power:What: /sys/devices/.../power/wakeup_prevent_sleep_time_ms
Documentation/ABI/testing/sysfs-power:What: /sys/power/wakeup_count
Documentation/ABI/testing/sysfs-power: The /sys/power/wakeup_count file allows user space to put the
Documentation/ABI/testing/sysfs-power:What: /sys/power/wake_lock
Documentation/ABI/testing/sysfs-power: The /sys/power/wake_lock file allows user space to create
Documentation/ABI/testing/sysfs-power: /sys/power/wakeup_count file block or return false). When a
Documentation/ABI/testing/sysfs-power: string without white space is written to /sys/power/wake_lock,
Documentation/ABI/testing/sysfs-power: If a string written to /sys/power/wake_lock contains white
Documentation/ABI/testing/sysfs-power:What: /sys/power/wake_unlock
Documentation/ABI/testing/sysfs-power: The /sys/power/wake_unlock file allows user space to deactivate
Documentation/ABI/testing/sysfs-power: wakeup sources created with the help of /sys/power/wake_lock.
Documentation/ABI/testing/sysfs-power: When a string is written to /sys/power/wake_unlock, it will be
Documentation/ABI/testing/sysfs-power: wakeup sources created with the help of /sys/power/wake_lock
Documentation/power/devices.txt: using the relevant /sys/devices/.../power/wakeup file (for Ethernet
Documentation/power/devices.txt:/sys/devices/.../power/wakeup files
Documentation/power/devices.txt:"power/wakeup" file. User space can write the strings "enabled" or "disabled"
Documentation/power/devices.txt:The "power/wakeup" file is supposed to contain the "disabled" string initially
Documentation/power/devices.txt:exists and the corresponding "power/wakeup" file contains the string "enabled".
Documentation/usb/power-management.txt: power/wakeup

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- [email protected]

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

2014-10-27 15:45:53

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 16:31 +0100, Geert Uytterhoeven wrote:
> Hi Bastien,
>
> On Mon, Oct 27, 2014 at 3:20 PM, Bastien Nocera <[email protected]> wrote:
> > On Tue, 2014-10-21 at 20:24 +0200, Geert Uytterhoeven wrote:
> >> On Tue, Oct 21, 2014 at 7:14 PM, Bastien Nocera <[email protected]> wrote:
> >> >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >> >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >> >> documentation'
> >> >>
> >> >> Can you expand more on the rationale for the need here? Is this for UI
> >> >> for power debugging, or something else?
> >> >
> >> > No, it would be used for automating backups, or implementing
> >> > suspend->hibernation transitions. For example, right before the machine
> >> > suspends, I would schedule it to wake up in an hour. If I get woken up by
> >> > the rtc alarm (and not by the user through a lid open), I might:
> >> > - check that I'm plugged into the AC, it's night, and in the vicinity of
> >> > the server that handles my backups and so back up the system.
> >> > - check whether the battery is low, and hibernate the machine (if it
> >> > supports it, obviously).
> >>
> >> Isn't this already available through /sys/kernel/debug/wakeup_sources
> >> and/or the various power/wake* files in sysfs?
> >
> > That might very well be, but /sys/kernel/debug/wakeup_sources really
> > isn't much of a user-space API. Where's the documentation for the
> > various power/wake* files?
>
> The debugfs entry is indeed not documented.
> Documentation for the others is just a git grep away:

Fair enough, but apart from tallying up wakeup_count by hand, I'm not
sure how I'm supposed to interface with that.

It also doesn't fix the problem of knowing which event caused a wakeup
(or when the last event occurred on a particular device) when this
device can have multiple different reasons to wake up (see the
WOL/connected suspend mentions earlier in the thread).
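"Tallying up wakeup_count by hand" could look roughly like the Python sketch below. It is an illustration under assumptions, not an existing tool, and the function names are hypothetical: it walks /sys/devices, reads each device's power/wakeup_count (one of the attributes in Documentation/ABI/testing/sysfs-devices-power listed above), and reports which counters went up between two snapshots taken around a suspend. As Bastien notes, this only says which devices signalled wakeup events; it cannot say which one ended the suspend, or in what order they fired.

```python
import os


def snapshot_wakeup_counts(root="/sys/devices"):
    """Map each device directory under root to its power/wakeup_count.

    Devices without the attribute, or with an empty value (wakeup not
    enabled for that device), are skipped. os.walk does not follow
    symlinks by default, so sysfs link loops are not an issue.
    """
    counts = {}
    for dirpath, _dirnames, _filenames in os.walk(root):
        attr = os.path.join(dirpath, "power", "wakeup_count")
        try:
            with open(attr) as f:
                text = f.read().strip()
        except OSError:
            continue  # no such attribute here, or not readable
        if text.isdigit():
            counts[dirpath] = int(text)
    return counts


def devices_that_signalled(before, after):
    """Devices whose wakeup_count increased between two snapshots."""
    return sorted(dev for dev, n in after.items() if n > before.get(dev, 0))
```

Usage would be to call snapshot_wakeup_counts() right before suspending and again after resume, then pass both dictionaries to devices_that_signalled(). Note that the counters also move for wakeup events that happen while the system is running, not only for system wakeups.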

2014-10-27 15:46:20

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 08:12 -0700, Andy Lutomirski wrote:
> On Oct 27, 2014 6:56 AM, "Bastien Nocera" <[email protected]> wrote:
> >
> > On Tue, 2014-10-21 at 12:28 -0700, Andy Lutomirski wrote:
> > > On 10/21/2014 01:49 AM, Bastien Nocera wrote:
> > > > Hey,
> > > >
> > > > GNOME has had discussions with kernel developers in the past, and,
> > > > fortunately, in some cases we were able to make headway.
> > > >
> > > > There are however a number of items that we still don't have solutions
> > > > for, items that kernel developers might not realise we'd like to rely
> > > > on, or don't know that we'd make use of if merged.
> > > >
> > > > I've posted this list at:
> > > > https://wiki.gnome.org/BastienNocera/KernelWishlist
> > > >
> > > > Let me know on-list or off-list if you have any comments about those, so
> > > > I can update the list.
> > >
> > > I don't know much about desktop environment infrastructure, but I think
> > > the kernel probably already has a lot of what's needed for LinuxApps.
> > >
> > > Tools like Sandstorm [1] (shameless plug, but it's a good example here)
> > > can already sandbox normal-ish programs, and those sandboxes can be
> > > launched without privilege [2].
> > >
> > > Why is kdbus needed?
> >
> > Because it sucks less than passing fd's and using home-made protocols on
> > top of it.
>
> For securely communicating with a container, "it sucks less" is hard
> to use as a design criterion.

Sucking less is a requirement when it comes to being able to use it. At
the very least, when it comes to security, the fact that the protocol
can be captured and analysed in wireshark is already of great help to
inspect what each component of the system is doing. More so than passing
fd's and using a custom protocol on the server and client sides.

> What's wrong with fds, and how does kdbus solve it?

By having a well-known protocol and defined semantics on top of that
communication channel. I could try and re-explain why kdbus is needed,
but I wouldn't do as good a job as the people working on it, so best to
refer to the individual threads about kdbus on this list.
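For readers following Andy's question: "passing fd's" here typically means sending descriptors as SCM_RIGHTS ancillary data over an AF_UNIX socket. A minimal Python (3.9+) sketch using socket.send_fds and socket.recv_fds follows. The helper names and the "label" string that travels with the descriptor are a hypothetical, ad-hoc convention; that convention is exactly the kind of home-made protocol layer Bastien refers to. The sketch does not show what kdbus itself would add.

```python
import os
import socket


def send_descriptor(sock, fd, label):
    """Send one fd plus a short label; the kernel transfers the fd as SCM_RIGHTS."""
    socket.send_fds(sock, [label.encode("utf-8")], [fd])


def receive_descriptor(sock, max_label=256):
    """Receive one label and one fd; the fd arrives as a new number in this process."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, max_label, 1)
    if not fds:
        raise RuntimeError("expected a file descriptor with the message")
    return msg.decode("utf-8"), fds[0]
```

Everything beyond the descriptor itself (what the label means, who is allowed to send what, how replies are matched to requests) has to be agreed on by the two endpoints.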

2014-10-27 16:02:59

by Shnatsel

Subject: Re: A desktop environment[1] kernel wishlist

> If "recursive mtime" was available, would that work for you?

It would work for detecting "offline" changes. I suppose recursive
mtime is not viable for online monitoring, mostly because detecting file
renaming would be a massive PITA (and we already have fanotify with
exactly this problem).

Another approach to offline monitoring is using things like btrfs
changelogs - e.g. incremental "btrfs send", but I don't think that's
viable for other filesystems and I don't know how efficient it is even
on btrfs.

Actually, being able to read btrfs changelog from userspace more or
less as it happens (<200ms latency on commodity x86) would fulfill our
requirements for online VFS monitoring facility, and we could use
"btrfs send" approach for replaying offline changes. The good thing
about this approach is that it can use a fixed-size buffer for feeding
the info to userspace - if userspace can't keep up, the data is not
lost but recorded to the filesystem and can be retrieved from it
later. In this case we'd be tied to a specific filesystem, but that's
worlds better than nothing at all.
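To make the rename problem concrete, here is a minimal userspace sketch of detecting "offline" changes with no kernel help at all. It is a swapped-in technique, not recursive mtime or btrfs send: snapshot a tree keyed by (st_dev, st_ino), then compare with a later snapshot. Keying by inode is what lets it report renames, but it has to walk and stat the whole tree every time, hard links collapse onto one entry, and an inode number reused after a delete can be misreported as a rename. The function names are hypothetical.

```python
import os


def snapshot(root):
    """Map (st_dev, st_ino) -> (path relative to root, st_mtime_ns)."""
    entries = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # removed while we were walking
            # Hard links share an inode, so only one of their paths survives here.
            entries[(st.st_dev, st.st_ino)] = (os.path.relpath(path, root), st.st_mtime_ns)
    return entries


def diff(old, new):
    """Classify changes between two snapshots taken with snapshot()."""
    changes = {"created": [], "deleted": [], "renamed": [], "modified": []}
    for key, (path, mtime) in new.items():
        if key not in old:
            changes["created"].append(path)
            continue
        old_path, old_mtime = old[key]
        if old_path != path:
            changes["renamed"].append((old_path, path))
        elif old_mtime != mtime:
            changes["modified"].append(path)
    for key, (path, _mtime) in old.items():
        if key not in new:
            changes["deleted"].append(path)
    for items in changes.values():
        items.sort()
    return changes
```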

--
Sergey "Shnatsel" Davidoff

2014-10-27 16:08:43

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, Oct 27, 2014 at 8:45 AM, Bastien Nocera <[email protected]> wrote:
> On Mon, 2014-10-27 at 08:12 -0700, Andy Lutomirski wrote:
>> On Oct 27, 2014 6:56 AM, "Bastien Nocera" <[email protected]> wrote:
>> >
>> > On Tue, 2014-10-21 at 12:28 -0700, Andy Lutomirski wrote:
>> > > On 10/21/2014 01:49 AM, Bastien Nocera wrote:
>> > > > Hey,
>> > > >
>> > > > GNOME has had discussions with kernel developers in the past, and,
>> > > > fortunately, in some cases we were able to make headway.
>> > > >
>> > > > There are however a number of items that we still don't have solutions
>> > > > for, items that kernel developers might not realise we'd like to rely
>> > > > on, or don't know that we'd make use of if merged.
>> > > >
>> > > > I've posted this list at:
>> > > > https://wiki.gnome.org/BastienNocera/KernelWishlist
>> > > >
>> > > > Let me know on-list or off-list if you have any comments about those, so
>> > > > I can update the list.
>> > >
>> > > I don't know much about desktop environment infrastructure, but I think
>> > > the kernel probably already has a lot of what's needed for LinuxApps.
>> > >
>> > > Tools like Sandstorm [1] (shameless plug, but it's a good example here)
>> > > can already sandbox normal-ish programs, and those sandboxes can be
>> > > launched without privilege [2].
>> > >
>> > > Why is kdbus needed?
>> >
>> > Because it sucks less than passing fd's and using home-made protocols on
>> > top of it.
>>
>> For securely communicating with a container, "it sucks less" is hard
>> to use as a design criterion.
>
> Sucking less is a requirement when it comes to being able to use it. At
> the very least, when it comes to security, the fact that the protocol
> can be captured and analysed in wireshark is already of great help to
> inspect what each component of the system is doing. More so than passing
> fd's and using a custom protocol on the server and client sides.
>
>> What's wrong with fds, and how does kdbus solve it?
>
> By having a well-known protocol and defined semantics on top of that
> communication channel. I could try and re-explain why kdbus is needed,
> but I wouldn't do as good a job as the people working on it, so best to
> refer to the individual threads about kdbus on this list.
>

I didn't do a good job asking the question, then.

What's wrong with fds in the context of communicating with a
container? What does kdbus do container-wise that helps?

--Andy

2014-10-27 16:10:01

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 09:08 -0700, Andy Lutomirski wrote:
> On Mon, Oct 27, 2014 at 8:45 AM, Bastien Nocera <[email protected]> wrote:
> > On Mon, 2014-10-27 at 08:12 -0700, Andy Lutomirski wrote:
> >> On Oct 27, 2014 6:56 AM, "Bastien Nocera" <[email protected]> wrote:
> >> >
> >> > On Tue, 2014-10-21 at 12:28 -0700, Andy Lutomirski wrote:
> >> > > On 10/21/2014 01:49 AM, Bastien Nocera wrote:
> >> > > > Hey,
> >> > > >
> >> > > > GNOME has had discussions with kernel developers in the past, and,
> >> > > > fortunately, in some cases we were able to make headway.
> >> > > >
> >> > > > There are however a number of items that we still don't have solutions
> >> > > > for, items that kernel developers might not realise we'd like to rely
> >> > > > on, or don't know that we'd make use of if merged.
> >> > > >
> >> > > > I've posted this list at:
> >> > > > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >> > > >
> >> > > > Let me know on-list or off-list if you have any comments about those, so
> >> > > > I can update the list.
> >> > >
> >> > > I don't know much about desktop environment infrastructure, but I think
> >> > > the kernel probably already has a lot of what's needed for LinuxApps.
> >> > >
> >> > > Tools like Sandstorm [1] (shameless plug, but it's a good example here)
> >> > > can already sandbox normal-ish programs, and those sandboxes can be
> >> > > launched without privilege [2].
> >> > >
> >> > > Why is kdbus needed?
> >> >
> >> > Because it sucks less than passing fd's and using home-made protocols on
> >> > top of it.
> >>
> >> For securely communicating with a container, "it sucks less" is hard
> >> to use as a design criterion.
> >
> > Sucking less is a requirement when it comes to being able to use it. At
> > the very least, when it comes to security, the fact that the protocol
> > can be captured and analysed in wireshark is already of great help to
> > inspect what each component of the system is doing. More so than passing
> > fd's and using a custom protocol on the server and client sides.
> >
> >> What's wrong with fds, and how does kdbus solve it?
> >
> > By having a well-known protocol and defined semantics on top of that
> > communication channel. I could try and re-explain why kdbus is needed,
> > but I wouldn't do as good a job as the people working on it, so best to
> > refer to the individual threads about kdbus on this list.
> >
>
> I didn't do a good job asking the question, then.
>
> What's wrong with fds in the context of communicating with a
> container? What does kdbus do container-wise that helps?

Nothing's wrong with using fd's. They're just a very poor API.

2014-10-27 16:11:09

by Shnatsel

Subject: Re: A desktop environment[1] kernel wishlist

> When I read your wish, I guess adding the capability to watch mounts with
> inotify would satisfy your needs. Would you agree?

The problem with inotify is that it's very inefficient for monitoring
hundreds of thousands of files, which is a fairly common case on the
desktop. I cannot see how support for watching mounts would fix that.
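The scalability issue follows from how inotify works: a watch covers one directory's immediate entries, not a subtree. Watching a home directory therefore means walking it and adding one watch per directory, bounded by the per-user limit in /proc/sys/fs/inotify/max_user_watches. The sketch below (hypothetical helper names) only estimates that cost for a given tree; it does not set up any watches.

```python
import os


def inotify_watches_needed(root):
    """inotify is not recursive: one watch per directory, root included."""
    return sum(1 for _ in os.walk(root))


def inotify_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Per-user watch limit, or None if it cannot be read (e.g. not Linux)."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Even when the count fits under the limit, the watcher still pays for a full walk at startup, and a directory created before its watch is added can be missed.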

--
Sergey "Shnatsel" Davidoff

2014-10-27 16:22:25

by Al Viro

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, Oct 27, 2014 at 05:09:06PM +0100, Bastien Nocera wrote:

> > > By having a well-known protocol and defined semantics on top of that
> > > communication channel. I could try and re-explain why kdbus is needed,
> > > but I wouldn't do as good a job as the people working on it, so best to
> > > refer to the individual threads about kdbus on this list.
> > >
> >
> > I didn't do a good job asking the question, then.
> >
> > What's wrong with fds in the context of communicating with a
> > container? What does kdbus do container-wise that helps?
>
> Nothing's wrong with using fd's. They're just a very poor API.

For a proof by assertion it's too transparent, for any other kind of
explanation - far too short on details...

2014-10-27 16:57:00

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, Oct 27, 2014 at 7:19 AM, Bastien Nocera <[email protected]> wrote:
> I also cannot know, from user-space, whether Wake-On-LAN,
> Wake-On-Wireless-LAN, or the Wi-Fi card's "network proximity" triggered
> coming out of suspend for example.
>
> I can certainly check for the status of the lid, but I wouldn't know
> whether a button was pressed to turn the machine back on, as the
> firmware would eat that.

If the firmware eats that button (which I hope it wouldn't, but I
probably should know better than to expect sane behavior), how does
the kernel know anything more?

> To make it short, I don't have a way to know, from user-space, whether
> the event that took it out of suspend was programmatic, or user action.
> I would add that, even if we said that races can occur, I have no easy
> way to know, from user-space, whether the last thing that occurred was
> the Wi-Fi card waking the machine up or the power button being pressed.


Again, I think you just want to know if the power button (or lid
trigger) was pressed. Not if it was the cause for resume (since a
wake-on-lan or alarm could fire right as the user presses the power
button). If those button presses don't reliably get communicated, I
think that's a better problem to solve in the kernel.

Again, part of the reason I'm pushing back here is that there may be
a lot of things going on on a system, and systems may suspend and
resume quite often while being in use, so applications really should
handle events consistently whether the system was suspended or not
(another lesson from Android: suspend blocking is a more flexible
approach than having applications initiate suspend, since you avoid
all the races of multiple applications trying to manage initiating
suspend state).

But the other part of why I'm pushing back is that on future hardware,
we may not have a "suspend" mode, and systems may just be in a deep
idle, with selected interrupts disabled (event filtering, in other
words). So I think it's better if you design around events (button
presses, lid triggers, mouse movements, timers firing), rather than
specific system suspend state.
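A minimal sketch of "designing around events": each handler reacts to one event and never asks whether, or why, the system was suspended. Because the handlers below only set flags or bump counters, the final state is the same whatever order the events arrive in, which sidesteps the "which came last" question. The event names and state fields are hypothetical.

```python
from dataclasses import dataclass

# Events taken as a sign that a person is interacting with the machine.
USER_EVENTS = {"power_button", "lid_open", "key_press"}


@dataclass
class SessionState:
    screen_on: bool = False
    backups_started: int = 0
    network_syncs: int = 0


def handle_event(state, event):
    """Apply one event; no handler depends on what woke the system."""
    if event in USER_EVENTS:
        state.screen_on = True
    elif event == "backup_timer":
        state.backups_started += 1  # runs in the background, leaves the screen alone
    elif event == "wol_packet":
        state.network_syncs += 1
    return state


def process(events):
    state = SessionState()
    for event in events:
        handle_event(state, event)
    return state
```

Events that undo state, such as a lid close, are not order-independent in this way and would need timestamps or explicit ordering rules; the sketch leaves them out.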


>> Sure. But it's not reliably suspending I'm worried about, it's
>> accidentally resuming in an environment the hardware wasn't designed
>> for. It's really more of a hardware design issue. I'm not suggesting
>> you don't do it, but I just suspect you'll need to be careful about
>> automatically enabling this on older hardware.
>
> It could be opt-in if that's actually a problem.

Yea. And again, I don't mean to throw water on the idea, I just wanted
to make sure considerations were being made. It's good that folks are
working to keep function/feature parity with other modern desktop
OSes, but quite often those OSes aren't expected to run on the same
variety of hardware. So finding a way to detect safe hardware-designs
would be useful for your effort.

thanks
-john

2014-10-27 20:59:40

by Zygo Blaxell

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, Oct 27, 2014 at 03:28:04PM +0100, Bastien Nocera wrote:
> On Wed, 2014-10-22 at 13:04 -0400, Zygo Blaxell wrote:
> > On Tue, Oct 21, 2014 at 08:09:38PM +0200, Bastien Nocera wrote:
> > > On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> > > > On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> > > > >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> > > > >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> > > > >> documentation'
> > > > >>
> > > > >> Can you expand more on the rationale for the need here? Is this for UI
> > > > >> for power debugging, or something else?
> > > > >
> > > > > No, it would be used for automating backups, or implementing
> > > > > suspend->hibernation transitions. For example, right before the machine
> > > > > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > > > > the rtc alarm (and not by the user through a lid open), I might:
> > > > > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > > > > the server that handles my backups and so back up the system.
> > > > > - check whether the battery is low, and hibernate the machine (if it
> > > > > supports it, obviously).
> > > > >
> > > > > We cannot do that if we can't make out whether the wake-up came from a
> > > > > user action, or the alarm we set.
> > > >
> > > > I suspect wakeup type reporting is maybe not the best way to go about
> > > > this, since there may be a number of causes for wakeups and they can
> > > > arrive closely together in different orders, which can result in
> > > > races.
> > > >
> > > > For instance, if the machine suspends, and sets an alarm to be woken
> > > > up at midnight to do a backup, if the user resumes their laptop at
> > > > 11:59:59, should the backup still proceed at midnight?
> > >
> > > No. And I would expect that we would get a wake up type of "power
> > > button" or "lid open" in this case.
> >
> > I have been using something like this for the last 7 years or so.
> > The relevant inputs are:
> >
> > 1. is the user present (is there recent input on HID devices,
> > keyboard/mouse, but ignore devices like light sensors, 3D
> > accelerometers, and ACPI virtual keys)?
>
> If the user woke the machine up through the power button, you wouldn't
> see that from user-space. You could detect that the lid was opened,
> because you have state.
>
> > 2. which network connection(s) are available to reach the
> > backup server?
> >
> > 3. how much power is available (if on battery, how much run
> > time left?)
> >
> > 4. what is the policy (do backups happen at a specific time
> > of day, or whenever they can?)
> >
> > 5. was a backup completed successfully in the last N hours?
> >
> > Note the absence of any information about the cause of recent
> > suspend/resume activity, or any input from suspend/resume at all.
>
> How do I tell my environment not to wake the screen up when the machine
> was woken up by an alarm I scheduled to launch a backup?

Lid closed? Screen off (nobody can see it, it wastes power on battery,
and lengthens charge time on AC).

Lid open and user input? Screen on.

Lid open and user disabled time-based screen power saving? Screen on.

These are not symmetrical. To wake up the screen, the screen needs to
be visible, and the policy conditions to wake up the screen need to be
met. Spontaneously firing up a backlight at full power in a dark room
in the middle of the night may not be appreciated by the system owner.

I could see the utility of capturing ACPI state here, but only as a way
to sense the "user present" condition.
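Zygo's screen rules can be written down as a small, hypothetical policy function. The inputs are assumptions made for illustration: lid state, whether there was recent input from devices that indicate a person (keyboard, mouse) rather than sensors or ACPI virtual keys, and whether the user turned off idle blanking. Note that the wake-up cause is not an input.

```python
# Device classes whose input counts as evidence that a person is present.
# The class names are illustrative; light sensors, accelerometers and ACPI
# virtual keys are deliberately not in the set.
PRESENCE_CLASSES = {"keyboard", "mouse", "touchpad", "touchscreen"}


def user_present(last_input, now, window_s=30.0):
    """last_input maps a device class to the time of its most recent event."""
    return any(
        now - t <= window_s
        for cls, t in last_input.items()
        if cls in PRESENCE_CLASSES
    )


def screen_should_be_on(lid_open, recent_user_input, idle_blanking_disabled):
    if not lid_open:
        return False  # nobody can see it, and it wastes power
    return recent_user_input or idle_blanking_disabled
```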

> Or not to
> resume audio playback when I get woken up to handle network events
> (through connected suspend, or Wake-On-(Wireless-)LAN)?

Audio playback can wait for evidence of user presence too (unless we're
implementing an alarm clock, and need to wake up the audio ourselves--but
we'd know that because of the time).
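The five inputs Zygo lists earlier in this message can likewise be combined without any reference to suspend/resume. In this sketch the field names, the thresholds, and the choice to defer while the user is present are assumptions made for illustration, not part of his description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupInputs:
    user_present: bool                # 1. recent HID input
    server_reachable: bool            # 2. a network path to the backup server exists
    on_ac: bool                       # 3. power source...
    battery_minutes: Optional[float]  #    ...and run time left when on battery
    policy_allows_now: bool           # 4. e.g. inside a configured time-of-day window
    hours_since_backup: float         # 5. age of the last successful backup


def should_back_up(i, max_age_hours=24.0, min_battery_minutes=60.0,
                   defer_when_user_present=True):
    if defer_when_user_present and i.user_present:
        return False
    if not (i.server_reachable and i.policy_allows_now):
        return False
    if i.hours_since_backup < max_age_hours:
        return False  # a recent backup already succeeded
    if i.on_ac:
        return True
    return i.battery_minutes is not None and i.battery_minutes >= min_battery_minutes
```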



2014-10-28 11:37:20

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 16:59 -0400, Zygo Blaxell wrote:
> On Mon, Oct 27, 2014 at 03:28:04PM +0100, Bastien Nocera wrote:
> > On Wed, 2014-10-22 at 13:04 -0400, Zygo Blaxell wrote:
> > > On Tue, Oct 21, 2014 at 08:09:38PM +0200, Bastien Nocera wrote:
> > > > On Tue, 2014-10-21 at 11:00 -0700, John Stultz wrote:
> > > > > On Tue, Oct 21, 2014 at 10:14 AM, Bastien Nocera <[email protected]> wrote:
> > > > > >> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> > > > > >> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> > > > > >> documentation'
> > > > > >>
> > > > > >> Can you expand more on the rationale for the need here? Is this for UI
> > > > > >> for power debugging, or something else?
> > > > > >
> > > > > > No, it would be used for automating backups, or implementing
> > > > > > suspend->hibernation transitions. For example, right before the machine
> > > > > > suspends, I would schedule it to wake up in an hour. If I get woken up by
> > > > > > the rtc alarm (and not by the user through a lid open), I might:
> > > > > > - check that I'm plugged into the AC, it's night, and in the vicinity of
> > > > > > the server that handles my backups and so back up the system.
> > > > > > - check whether the battery is low, and hibernate the machine (if it
> > > > > > supports it, obviously).
> > > > > >
> > > > > > We cannot do that if we can't make out whether the wake-up came from a
> > > > > > user action, or the alarm we set.
> > > > >
> > > > > I suspect wakeup type reporting is maybe not the best way to go about
> > > > > this, since there may be a number of causes for wakeups and they can
> > > > > arrive closely together in different orders, which can result in
> > > > > races.
> > > > >
> > > > > For instance, if the machine suspends, and sets an alarm to be woken
> > > > > up at midnight to do a backup, if the user resumes their laptop at
> > > > > 11:59:59, should the backup still proceed at midnight?
> > > >
> > > > No. And I would expect that we would get a wake up type of "power
> > > > button" or "lid open" in this case.
> > >
> > > I have been using something like this for the last 7 years or so.
> > > The relevant inputs are:
> > >
> > > 1. is the user present (is there recent input on HID devices,
> > > keyboard/mouse, but ignore devices like light sensors, 3D
> > > accelerometers, and ACPI virtual keys)?
> >
> > If the user woke the machine up through the power button, you wouldn't
> > see that from user-space. You could detect that the lid was opened,
> > because you have state.
> >
> > > 2. which network connection(s) are available to reach the
> > > backup server?
> > >
> > > 3. how much power is available (if on battery, how much run
> > > time left?)
> > >
> > > 4. what is the policy (do backups happen at a specific time
> > > of day, or whenever they can?)
> > >
> > > 5. was a backup completed successfully in the last N hours?
> > >
> > > Note the absence of any information about the cause of recent
> > > suspend/resume activity, or any input from suspend/resume at all.
> >
> > How do I tell my environment not to wake the screen up when the machine
> > was woken up by an alarm I scheduled to launch a backup?
>
> Lid closed? Screen off (nobody can see it, it wastes power on battery,
> and lengthens charge time on AC).
>
> Lid open and user input? Screen on.
>
> Lid open and user disabled time-based screen power saving? Screen on.
>
> These are not symmetrical. To wake up the screen, the screen needs to
> be visible, and the policy conditions to wake up the screen need to be
> met. Spontaneously firing up a backlight at full power in a dark room
> in the middle of the night may not be appreciated by the system owner.

How do I detect that the screen is visible on a tablet? I could turn on
the webcam to see if a face is detected ;)

Maybe the wake-up reason isn't good enough on its own, but how do I know
which one of the possible wake-up reasons was the last one to trigger?

2014-10-28 14:43:27

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, Oct 28, 2014 at 5:36 AM, Bastien Nocera <[email protected]> wrote:
> Maybe the wake-up reason isn't good enough on its own, but how do I know
> which one of the possible wake-up reasons was the last one to trigger?

So I feel like I'm still missing why it's so critical to know what the
last event was. To me it seems a number of events have occurred, and
they should all be processed. Since they're all asynchronous, they
could come in any order, so it seems best to handle them one by one
rather than have any requirement on which one happened last.

What does the exact timeline of the events provide for you?

thanks
-john

2014-10-28 18:50:34

by Pavel Machek

Subject: suspend to partition Re: A desktop environment[1] kernel wishlist

Hi!

> > > > For mobile
> > > > devices this is an expected design point, but for off-the-shelf
> > > > laptops with big fans and exhaust vents, I'm not sure how safe this
> > > > would be, so you may need to constrain this functionality somehow (or
> > > > look to see if an enforced low-power resume is possible).
> > >
> > > I think that we won't know whether it's a problem until the point that
> > > somebody actually implements it.
> >
> > Kernel does not stop you at this point, right?
> >
> > Suspend-to-partition is also doable today (see suspend.sf.net),
>
> Is it perfect? Because no releases in 3 years kind of scares me.

Yes, it is perfect :-).

[It is in use by SUSE, so it should be good; and if anything, it shows
that suspend-to-partition does not need kernel help.]

> > or you
> > can just swapon before starting. You can take it off the list, I
> > believe.
>
> Or we could create a new filesystem type that isn't swap, that isn't
> used by swap at all, but could be created by distributions' installers.
> Then I wouldn't need to hope that the swap didn't start being used in
> between me enabling it, and the suspend actually occurring (making it
> impossible to disable afterwards).

I see there's a race, but is it big enough to matter in practice? I
don't think so. Anyway -- suspend.sf.net is the way to go for advanced
features.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-28 22:42:41

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> I believe you should really use "is lid opened or AC or dock
> connected" to determine if it was automatic resume or not. It should
> work better and you can actually do it today.

There is no useful LID information on anything else either.

Alan

2014-10-28 22:57:50

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> If the firmware eats that button (which I hope it wouldn't, but I
> probably should know better than to expect sane behavior), how does
> the kernel know anything more?

The firmware is generally going to do whatever it believes is "correct",
which may or may not be determined by what the hardware itself does
(if wakeup is a GPIO off the controller, then what is eaten will be
determined by the widget at the other end).

> button). If those button presses don't reliably get communicated, I
> think that's a better problem to solve in the kernel.

You'd have to solve it in the firmware.

> But the other part of why I'm pushing back is that on future hardware,
> we may not have a "suspend" mode, and systems may just be in a deep
> idle, with selected interrupts disabled

Nothing future about this. Some ARM devices have had that kind of mindset
for a long time, some x86 platforms can run that way but don't currently
do so under Linux (eg Baytrail/T). On those x86 platforms 'suspend' and
'resume' are in fact entirely Linux constructs to fake legacy behaviour
on top of an ultra low power idle.

If you are planning for the future then I wouldn't be too hooked on ideas
like "suspend", "lid switches" or assumptions that a "closed" device
should be kept suspended. It's a broken model. It's bad enough that
systemd tries to do magic hackery to fake this up and gets it wrong in
some cases (despite making a very good effort) without propagating the mess
further.

- S3 has already gone away on some Intel SoC devices
- Suspend/Resume on such machines are a Linux fake to keep legacy code
happy
- In such an environment your "wakeup" model changes entirely because you
drop into deep idle whenever there isn't stuff annoying the CPU
regularly. With the right kinds of video and audio that could even mean
doing it between keypresses (feature parity approaching 1980s 8bit
laptops ;-) )

Instead think long term that

- There may be no such thing as suspend or resume, just make your code
very well behaved on wakeup events, and closing unneeded
devices/resources whenever it can.
- On/off is an extreme action rarely taken (feature parity with 1970s
VAXen ;-) )
- The "blob with a lid" model of construction is no longer useful. Even a
keyboarded device is quite likely to have a removable keyboard.

Alan

2014-10-29 19:20:02

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On 10/27/2014 07:31 AM, Bastien Nocera wrote:
> On Mon, 2014-10-27 at 10:28 +0100, Pavel Machek wrote:
>> Hi!
>>
>>>> I suspect wakeup type reporting is maybe not the best way to go about
>>>> this, since there may be a number of causes for wakeups and they can
>>>> arrive closely together in different orders, which can result in
>>>> races.
>>>>
>>>> For instance, if the machine suspends, and sets an alarm to be woken
>>>> up at midnight to do a backup, if the user resumes their laptop at
>>>> 11:59:59, should the backup still proceed at midnight?
>>>
>>> No. And I would expect that we would get a wake up type of "power
>>> button" or "lid open" in this case.
>>
>> I believe you should really use "is lid opened or AC or dock
>> connected" to determine if it was automatic resume or not. It should
>> work better and you can actually do it today.
>
> There's no LID or docks on a tablet.

For a tablet, isn't the relevant piece of information whether the power
button was recently pressed, not whether the power button caused the wakeup?

It would be really annoying if there were a window around every RTC
wakeup during which pressing the power button didn't actually turn on
the screen.
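A minimal sketch of Andy's alternative, assuming a 64-bit Linux userspace where `struct input_event` is a `struct timeval` followed by `__u16 type`, `__u16 code` and `__s32 value` (24 bytes). Instead of asking what caused the wakeup, it scans events already read from the power button's evdev node for a KEY_POWER press after a cut-off time. The device node, the cut-off and the function names are assumptions for illustration, and the approach only helps if the kernel actually queues the press that woke the machine, which is exactly what is disputed in this thread.

```python
import struct

# Assumes 64-bit Linux: struct input_event is
# { struct timeval time; __u16 type; __u16 code; __s32 value; }
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 24 bytes on 64-bit Linux

EV_KEY = 0x01     # event type for keys/buttons, <linux/input-event-codes.h>
KEY_POWER = 116   # power button key code, <linux/input-event-codes.h>


def parse_events(raw: bytes):
    """Yield (seconds, type, code, value) for each complete event in raw."""
    for offset in range(0, len(raw) - EVENT_SIZE + 1, EVENT_SIZE):
        sec, usec, etype, code, value = struct.unpack_from(EVENT_FORMAT, raw, offset)
        yield sec + usec / 1_000_000, etype, code, value


def power_pressed_since(raw: bytes, since: float) -> bool:
    """True if a KEY_POWER press (value 1) was reported at or after `since`."""
    return any(
        etype == EV_KEY and code == KEY_POWER and value == 1 and ts >= since
        for ts, etype, code, value in parse_events(raw)
    )
```

A caller would read whatever is pending from the button's `/dev/input/event*` node (which number is device-specific) and pass the bytes in, taking the cut-off from the same clock the events are stamped with.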

--Andy

2014-10-29 20:26:23

by Theodore Ts'o

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> For a tablet, isn't the relevant piece of information whether the power
> button was recently pressed, not whether the power button caused the wakeup?

For Android L devices, it has been reported that the device might
power up its screen fully (note I didn't say 'wake up') automatically
when it detects that you are picking it up, or when you double-tap the
screen. It also reportedly has a low power black and white "ambient
display" (à la the Android Wear devices) which allows you to see
notifications without waking up the phone all the way[1]. (All of
this assuming appropriate hardware support, of course.)

[1] http://www.androidauthority.com/ambient-display-lollipop-541198/

Which goes back to the point that having a "suspend" mode is legacy
thinking. Modern devices will soon have not just "awake" and
"asleep" modes; there will be (well, there is now) a much wider spectrum of
modes, with the goal of using the minimum amount of power while still
providing useful functionality to the user.

- Ted

2014-10-29 21:16:13

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
> On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> > For a tablet, isn't the relevant piece of information whether the power
> > button was recently pressed, not whether the power button caused the wakeup?
>
> For Android L devices, it has been reported that the device might
> power up its screen fully (note I didn't say 'wake up') automatically
> when it detects that you are picking it up, or when you double-tap the
> screen. It also reportedly has a low power black and white "ambient
> display" (à la the Android Wear devices) which allows you to see
> notifications without waking up the phone all the way[1]. (All of
> this assuming appropriate hardware support, of course.)
>
> [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
>
> Which goes back to the point that having a "suspend" mode is legacy
> thinking. Modern devices will soon have not just "awake" and
> "asleep" modes; there will be (well, there is now) a much wider spectrum of
> modes, with the goal of using the minimum amount of power while still
> providing useful functionality to the user.

Actually Maemo people (on Nokia N900 and friends) got it right: unlike
android devices, it does not suspend to RAM at any point, and still
has reasonable battery life.

So I agree -- using suspend to RAM on "active" cell phone is just a
bad design.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-30 13:58:10

by Bastien Nocera

Subject: Re: suspend to partition Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-28 at 19:50 +0100, Pavel Machek wrote:
> Hi!
>
> > > > > For mobile
> > > > > devices this is an expected design point, but for off-the-shelf
> > > > > laptops with big fans and exhaust vents, I'm not sure how safe this
> > > > > would be, so you may need to constrain this functionality somehow (or
> > > > > look to see if a enforced low-power resume is possible).
> > > >
> > > > I think that we won't know whether it's a problem until the point that
> > > > somebody actually implements it.
> > >
> > > Kernel does not stop you at this point, right?
> > >
> > > Suspend-to-partition is also doable today (see suspend.sf.net),
> >
> > Is it perfect? Because no releases in 3 years kind of scares me.
>
> Yes, it is perfect :-).
>
> [It is in use by SUSE, so it should be good; and if anything, it shows
> that suspend-to-partition does not need kernel help.]

Does SUSE have patches to integrate this functionality into systemd, by
any chance?

> > > or you
> > > can just swapon before starting. You can take it off the list, I
> > > believe.
> >
> > Or we could create a new filesystem type that isn't swap, that isn't
> > used by swap at all, but could be created by distributions' installers.
> > Then I wouldn't need to hope that the swap didn't start being used in
> > between me enabling it, and the suspend actually occurring (making it
> > impossible to disable afterwards).
>
> I see there's a race, but is it big enough to matter in practice? I
> don't think so.

I'd certainly prefer not to chance it...
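To make the race concrete: after `swapon`, about the only thing userspace can check is the "Used" column (in KiB) of /proc/swaps. The sketch below, with a placeholder device name and hypothetical helper names, reads that column. It can confirm the partition is still empty at the instant you look, but pages may still be swapped out between this check and the actual suspend, so it narrows the window Bastien worries about rather than closing it.

```python
def swap_usage_kib(proc_swaps: str) -> dict:
    """Map each active swap area in /proc/swaps text to its "Used" column (KiB)."""
    usage = {}
    for line in proc_swaps.splitlines()[1:]:  # skip the header line
        fields = line.split()  # Filename Type Size Used Priority
        if len(fields) >= 5:
            usage[fields[0]] = int(fields[3])
    return usage


def swap_still_empty(proc_swaps: str, device: str) -> bool:
    """True if `device` is active swap and nothing has been paged out to it yet.

    Only a snapshot: pages can still be swapped out after this returns.
    """
    return swap_usage_kib(proc_swaps).get(device) == 0
```

Typical use would be `swap_still_empty(open("/proc/swaps").read(), "/dev/sda2")`, where `/dev/sda2` stands in for whatever partition the installer reserved.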

> Anyway -- suspend.sf.net is the way to go for advanced
> features.

Noted, thanks.

2014-10-30 14:36:24

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 2014-10-27 at 09:56 -0700, John Stultz wrote:
> On Mon, Oct 27, 2014 at 7:19 AM, Bastien Nocera <[email protected]> wrote:
> > I also cannot know, from user-space, whether Wake-On-LAN,
> > Wake-On-Wireless-LAN, or the Wi-Fi card's "network proximity" triggered
> > coming out of suspend for example.
> >
> > I can certainly check for the status of the lid, but I wouldn't know
> > whether a button was pressed to turn the machine back on, as the
> > firmware would eat that.
>
> If the firmware eats that button (which I hope it wouldn't, but I
> probably should know better than to expect sane behavior), how does
> the kernel know anything more?

The kernel receives an interrupt, likely on a different device. Again,
I'm talking about "legacy" devices, for which suspend is actually a
state. If the device is only in low-power mode, you'd probably get the
event on the input device, which is accessible from user-space.

> > To make it short, I don't have a way to know, from user-space, whether
> > the event that took it out of suspend was programmatic, or user action.
> > I would add that, even if we said that races can occur, I have no easy
> > way to know, from user-space, whether the last thing that occurred was
> > the Wi-Fi card waking the machine up or the power button being pressed.
>
>
> Again, I think you just want to know if the power button (or lid
> trigger) was pressed. Not if it was the cause for resume (since a
> wake-on-lan or alarm could fire right as the user presses the power
> button). If those button presses don't reliably get communicated, I
> think that's a better problem to solve in the kernel.
>
> Again, part of the reason I'm pushing back here is that there may be
> a lot of things going on on a system, and systems may suspend and
> resume quite often while being in use, so applications really should
> handle events consistently whether the system was suspended or not
> (another lesson from android: suspend blocking is a more flexible
> approach than having applications initiate suspend, since you avoid
> all the races of multiple applications trying to manage initiating
> suspend state).
>
> But the other part of why I'm pushing back is that on future hardware,
> we may not have a "suspend" mode, and systems may just be in a deep
> idle, with selected interrupts disabled (event filtering, in other
> words). So I think it's better if you design around events (button
> presses, lid triggers, mouse movements, timers firing), rather than
> specific system suspend state.

Knowing why the Wi-Fi card woke up is also important when there isn't a
full "suspend" state. As was mentioned, it's useful for power debugging,
but it's also useful because that tells things outside the network card
driver what happened.

As I mentioned in more recent emails on this thread, maybe we don't want
to know what woke the system up, but knowing that a wake-up event
occurred on this device, at this time, would allow us to make the
software act accordingly. The fact that we don't know that means that we
cannot take appropriate action.
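Something close to "a wake-up event occurred on this device" can already be approximated from sysfs: each wakeup-capable device has a `power/wakeup_count` attribute, which is empty when wakeup is not enabled for that device and otherwise holds the number of wakeup events it has signalled. A rough sketch, assuming sysfs is mounted at /sys and using hypothetical helper names: snapshot the counters, then diff a later snapshot against it. It cannot say which event caused a resume, and the counter alone carries no timestamp (the same `power/` directory also has `wakeup_last_time_ms` on kernels that provide it).

```python
import os


def snapshot_wakeup_counts(sysfs_root: str = "/sys/devices") -> dict:
    """Map device directory -> power/wakeup_count for devices with wakeup enabled."""
    counts = {}
    # os.walk does not follow symlinks by default, which matters in sysfs,
    # where links such as "subsystem" and "driver" would otherwise create cycles.
    for dirpath, _dirnames, filenames in os.walk(sysfs_root):
        if os.path.basename(dirpath) != "power" or "wakeup_count" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "wakeup_count")) as f:
                text = f.read().strip()
        except OSError:
            continue
        if text:  # empty when the device is not enabled to wake the system
            counts[os.path.dirname(dirpath)] = int(text)
    return counts


def devices_with_new_wakeups(before: dict, after: dict) -> list:
    """Devices whose wakeup counter increased between two snapshots."""
    return sorted(dev for dev, n in after.items() if n > before.get(dev, 0))
```

Since the counters also advance for wakeup events signalled while the system is running, a diff taken across a suspend/resume cycle lists every device that reported something in that interval, which matches the "event on this device" semantics rather than a single "reason".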

Cheers

2014-10-30 14:41:45

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-28 at 22:57 +0000, One Thousand Gnomes wrote:
> > If the firmware eats that button (which I hope it wouldn't, but I
> > probably should know better than to expect sane behavior), how does
> > the kernel know anything more?
>
> The firmware is generally going to do whatever it believes is "correct",
> which may or may not be determined by what the hardware itself does
> (if wakeup is a GPIO off the controller, then what gets eaten will be
> determined by the widget at the other end).
>
> > button). If those button presses don't reliably get communicated, I
> > think that's a better problem to solve in the kernel.
>
> You'd have to solve it in the firmware.

Not if the kernel can tell us that the event occurred and when.

> > But the other part of why I'm pushing back is that on future hardware,
> > we may not have a "suspend" mode, and systems may just be in a deep
> > idle, with selected interrupts disabled
>
> Nothing future about this. Some ARM devices have had that kind of mindset
> for a long time, some x86 platforms can run that way but don't currently
> do so under Linux (eg Baytrail/T). On those x86 platforms 'suspend' and
> 'resume' are in fact entirely Linux constructs to fake legacy behaviour
> on top of an ultra low power idle.
>
> If you are planning for the future then I wouldn't be too hooked on ideas
> like "suspend", "lid switches" or assumptions that a "closed" device
> should be kept suspended. It's a broken model. It's bad enough that
> systemd tries to do magic hackery to fake this up and gets it wrong in
> some cases (despite making a very good effort) without propagating the mess
> further.
>
> - S3 has already gone away on some Intel SoC devices

And I think I have one of those devices, an Intel Baytrail tablet.

> - Suspend/Resume on such machines are a Linux fake to keep legacy code
> happy

Do you have a link to how this is implemented currently?

> - In such an environment your "wakeup" model changes entirely because you
> drop into deep idle whenever there isn't stuff annoying the CPU
> regularly. With the right kinds of video and audio that could even mean
> doing it between keypresses (feature parity approaching 1980s 8bit
> laptops ;-) )
>
> Instead think long term that
>
> - There may be no such thing as suspend or resume; just make your code
> very well behaved on wakeup events, and have it close unneeded
> devices/resources whenever it can.
> - On/off is an extreme action rarely taken (feature parity with 1970s
> VAXen ;-) )
> - The "blob with a lid" model of construction is no longer useful. Even a
> keyboarded device is quite likely to have a removable keyboard.

Except that what I requested (at least the amended version[1]) would
also work with devices that don't have suspend. And would also work on
the millions of other devices that do have a suspend state, and exist in
the wild.

[1]: Reason for wake-up for each wake-up-able device, along with a
timestamp.

2014-10-30 14:43:24

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, 2014-10-29 at 16:26 -0400, Theodore Ts'o wrote:
> On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> > For a tablet, isn't the relevant piece of information whether the power
> > button was recently pressed, not whether the power button caused the wakeup?
>
> For Android L devices, it has been reported that the device might
> power up its screen fully (note I didn't say 'wake up') automatically
> when it detects that you are picking it up, or when you double-tap the
> screen.

Power up the screen or the touchscreen?

> It also reportedly has a low power black and white "ambient
> display" (à la the Android Wear devices) which allows you to see
> notifications without waking up the phone all the way[1]. (All of
> this assuming appropriate hardware support, of course.)
>
> [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
>
> Which goes back to the point that having a "suspend" mode is legacy
> thinking. Modern devices will soon have not just "awake" and
> "asleep" modes; there will be (well, there is now) a much wider spectrum of
> modes, with the goal of using the minimum amount of power while still
> providing useful functionality to the user.

We still have a huge install base that does use suspend though, and for
which this information would be useful.

2014-10-30 14:45:52

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, 2014-10-29 at 22:16 +0100, Pavel Machek wrote:
> On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
> > On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> > > For a tablet, isn't the relevant piece of information whether the power
> > > button was recently pressed, not whether the power button caused the wakeup?
> >
> > For Android L devices, it has been reported that the device might
> > power up its screen fully (note I didn't say 'wake up') automatically
> > when it detects that you are picking it up, or when you double-tap the
> > screen. It also reportedly has a low power black and white "ambient
> > display" (à la the Android Wear devices) which allows you to see
> > notifications without waking up the phone all the way[1]. (All of
> > this assuming appropriate hardware support, of course.)
> >
> > [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
> >
> > Which goes back to the point that having a "suspend" mode is legacy
> > thinking. Modern devices will soon have not just "awake" and
> > "asleep" modes; there will be (well, there is now) a much wider spectrum of
> > modes, with the goal of using the minimum amount of power while still
> > providing useful functionality to the user.
>
> Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> android devices, it does not suspend to RAM at any point, and still
> has reasonable battery life.

Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.

> So I agree -- using suspend to RAM on "active" cell phone is just a
> bad design.

I don't think anyone was discussing cell phones in particular in this
thread, and knowing when user-space got woken up because of the baseband
processor having information for us would still be useful.

2014-10-30 14:53:28

by Andy Lutomirski

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, Oct 30, 2014 at 7:45 AM, Bastien Nocera <[email protected]> wrote:
> On Wed, 2014-10-29 at 22:16 +0100, Pavel Machek wrote:
>> On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
>> > On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
>> > > For a tablet, isn't the relevant piece of information whether the power
>> > > button was recently pressed, not whether the power button caused the wakeup?
>> >
>> > For Android L devices, it has been reported that the device might
>> > power up its screen fully (note I didn't say 'wake up') automatically
>> > when it detects that you are picking it up, or when you double-tap the
>> > screen. It also reportedly has a low power black and white "ambient
>> > display" (à la the Android Wear devices) which allows you to see
>> > notifications without waking up the phone all the way[1]. (All of
>> > this assuming appropriate hardware support, of course.)
>> >
>> > [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
>> >
>> > Which goes back to the point that having a "suspend" mode is legacy
>> > thinking. Modern devices will soon have not just "awake" and
>> > "asleep" modes; there will be (well, there is now) a much wider spectrum of
>> > modes, with the goal of using the minimum amount of power while still
>> > providing useful functionality to the user.
>>
>> Actually Maemo people (on Nokia N900 and friends) got it right: unlike
>> android devices, it does not suspend to RAM at any point, and still
>> has reasonable battery life.
>
> Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
>
>> So I agree -- using suspend to RAM on "active" cell phone is just a
>> bad design.
>
> I don't think anyone was discussing cell phones in particular in this
> thread, and knowing when user-space got woken up because of the baseband
> processor having information for us would still be useful.
>

You still haven't addressed what problem this solves that isn't solved
by merely knowing whether the baseband processor has useful
information.

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC

2014-10-30 15:05:24

by Theodore Ts'o

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, Oct 30, 2014 at 03:45:02PM +0100, Bastien Nocera wrote:
> > Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > android devices, it does not suspend to RAM at any point, and still
> > has reasonable battery life.
>
> Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.

Actually, Android devices have historically always suspended the CPU
whenever there wasn't a wakelock keeping the device from suspending. You
might not consider this "suspend to RAM" but in fact it uses the
identical kernel and hardware facilities as the legacy "suspend to
RAM" mechanism.
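For reference, mainline kernels built with CONFIG_PM_WAKELOCKS expose an Android-style userspace interface: writing a name to /sys/power/wake_lock creates and activates a wakeup source with that name, and writing the same name to /sys/power/wake_unlock deactivates it. Combined with /sys/power/autosleep, the kernel then suspends opportunistically whenever no such source is active. A minimal sketch of holding one for the duration of a task, assuming root and that option enabled; the context-manager name and the `power_dir` parameter (there only so it can be pointed at a scratch directory) are illustrative.

```python
import os
from contextlib import contextmanager


@contextmanager
def wakelock(name: str, power_dir: str = "/sys/power"):
    """Hold a user-space wakelock named `name` while the with-block runs."""
    with open(os.path.join(power_dir, "wake_lock"), "w") as f:
        f.write(name)  # creates and activates a wakeup source called `name`
    try:
        yield
    finally:
        with open(os.path.join(power_dir, "wake_unlock"), "w") as f:
            f.write(name)  # deactivates it, so autosleep may suspend again
```

A backup job would wrap its work in `with wakelock("nightly-backup"):` (the lock name is arbitrary), which is the "suspend blocking" model John describes rather than applications initiating suspend.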

> I don't think anyone was discussing cell phones in particular in this
> thread, and knowing when user-space got woken up because of the baseband
> processor having information for us would still be useful.

It matters because for laptops, what's important is whether the lid is
closed or not. Whether and how the laptop was "woken" is really
beside the point, as others have argued. Your counter argument is
that tablets don't have lids. But tablets are going to be using
schemes similar to Android, Tizen, and Maemo, and they are *not* going
to be using the legacy suspend-to-RAM model, because it's not
sufficiently good at power saving.

Cheers,

- Ted

2014-10-30 15:08:20

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 07:53 -0700, Andy Lutomirski wrote:
> On Thu, Oct 30, 2014 at 7:45 AM, Bastien Nocera <[email protected]> wrote:
> > On Wed, 2014-10-29 at 22:16 +0100, Pavel Machek wrote:
> >> On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
> >> > On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> >> > > For a tablet, isn't the relevant piece of information whether the power
> >> > > button was recently pressed, not whether the power button caused the wakeup?
> >> >
> >> > For Android L devices, it has been reported that the device might
> >> > power up its screen fully (note I didn't say 'wake up') automatically
> >> > when it detects that you are picking it up, or when you double-tap the
> >> > screen. It also reportedly has a low power black and white "ambient
> >> > display" (à la the Android Wear devices) which allows you to see
> >> > notifications without waking up the phone all the way[1]. (All of
> >> > this assuming appropriate hardware support, of course.)
> >> >
> >> > [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
> >> >
> >> > Which goes back to the point that having a "suspend" mode is legacy
> >> > thinking. Modern devices will soon have not just "awake" and
> >> > "asleep" modes; there will be (well, there is now) a much wider spectrum of
> >> > modes, with the goal of using the minimum amount of power while still
> >> > providing useful functionality to the user.
> >>
> >> Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> >> android devices, it does not suspend to RAM at any point, and still
> >> has reasonable battery life.
> >
> > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
> >
> >> So I agree -- using suspend to RAM on "active" cell phone is just a
> >> bad design.
> >
> > I don't think anyone was discussing cell phones in particular in this
> > thread, and knowing when user-space got woken up because of the baseband
> > processor having information for us would still be useful.
> >
>
> You still haven't addressed what problem this solves that isn't solved
> by merely knowing whether the baseband processor has useful
> information.

We don't know that the baseband processor has useful information if
we're not in the "path" for the interrupt that it would send. This event
might end up dying somewhere inside the kernel for all we know.

I've given the example of the Wi-Fi card, which definitely has multiple
ways of being woken up, and doesn't export those in any way that
user-space could use.

2014-10-30 15:15:59

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 11:05 -0400, Theodore Ts'o wrote:
> On Thu, Oct 30, 2014 at 03:45:02PM +0100, Bastien Nocera wrote:
> > > Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > > android devices, it does not suspend to RAM at any point, and still
> > > has reasonable battery life.
> >
> > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
>
> Actually, Android devices have historically always suspended the CPU
> whenever there wasn't a wakelock keeping the device from suspending. You
> might not consider this "suspend to RAM" but in fact it uses the
> identical kernel and hardware facilities as the legacy "suspend to
> RAM" mechanism.

I wouldn't consider this "suspend to RAM", but that's because I expect
the firmware to implement most of that. Anyway, that's splitting hairs.

> > I don't think anyone was discussing cell phones in particular in this
> > thread, and knowing when user-space got woken up because of the baseband
> > processor having information for us would still be useful.
>
> It matters because for laptops, what's important is whether the lid is
> closed or not. Whether and how the laptop was "woken" is really
> beside the point, as others have argued. Your counter argument is
> that tablets don't have lids. But tablets are going to be using
> schemes similar to Android, Tizen, and Maemo, and they are *not* going
> to be using the legacy suspend-to-RAM model, because it's not
> sufficiently good at power saving.

There are plenty of tablets around that aren't Android devices. There
are plenty of laptops that can be switched to a tablet mode for which
this wouldn't apply either.

2014-10-30 15:34:44

by Theodore Ts'o

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, Oct 30, 2014 at 04:15:15PM +0100, Bastien Nocera wrote:
>
> There are plenty of tablets around that aren't Android devices. There
> are plenty of laptops that can be switched to a tablet mode for which
> this wouldn't apply either.

These "tablets" will either have such a big battery that they will be a
commercial failure, because your arm will fall off trying to use one as
a "tablet"; or they won't have a laptop-sized battery, in which case they
will be using a more advanced power management scheme than "suspend to
RAM" --- or they will be a commercial failure because no one likes using
a tablet with minuscule battery life.

Cheers,

- Ted

2014-10-30 15:37:22

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 11:34 -0400, Theodore Ts'o wrote:
> On Thu, Oct 30, 2014 at 04:15:15PM +0100, Bastien Nocera wrote:
> >
> > There are plenty of tablets around that aren't Android devices. There
> > are plenty of laptops that can be switched to a tablet mode for which
> > this wouldn't apply either.
>
> These "tablets" will either have such a big battery that they will be a
> commercial failure, because your arm will fall off trying to use one as
> a "tablet"; or they won't have a laptop-sized battery, in which case they
> will be using a more advanced power management scheme than "suspend to
> RAM" --- or they will be a commercial failure because no one likes using
> a tablet with minuscule battery life.

And why would that be relevant?

2014-10-30 17:41:40

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

On Thu 2014-10-30 16:15:15, Bastien Nocera wrote:
> On Thu, 2014-10-30 at 11:05 -0400, Theodore Ts'o wrote:
> > On Thu, Oct 30, 2014 at 03:45:02PM +0100, Bastien Nocera wrote:
> > > > Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > > > android devices, it does not suspend to RAM at any point, and still
> > > > has reasonable battery life.
> > >
> > > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
> >
> > Actually, Android devices have historically always suspended the CPU
> > whenever there wasn't a wakelock keeping the device from suspending. You
> > might not consider this "suspend to RAM" but in fact it uses the
> > identical kernel and hardware facilities as the legacy "suspend to
> > RAM" mechanism.
>
> I wouldn't consider this "suspend to RAM", but that's because I expect
> the firmware to implement most of that. Anyway, that's splitting
> hairs.

Could you rephrase that?

Anyway, this is "echo mem > /sys/power/state" or
suspend-to-RAM. Android does the same, with more tricky wakeup logic.
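A bare `echo mem > /sys/power/state` is racy with respect to wakeup events that arrive just before the write. The kernel's documented handshake for this is /sys/power/wakeup_count: read it (the read may block while wakeup events are being processed), do any last checks, write the same value back (the write is rejected if new wakeup events were registered in the meantime), then request the sleep state, which the kernel aborts if further events arrive. A sketch of that sequence, assuming root and the standard sysfs layout; the function name is illustrative and `power_dir` is a parameter only so it can be exercised against a scratch directory.

```python
import os


def suspend_if_no_new_wakeups(power_dir: str = "/sys/power") -> bool:
    """Suspend to RAM unless a wakeup event races with the request.

    Returns True once the system has suspended and resumed, False if the
    kernel refused because wakeup events were registered in the meantime.
    """
    count_path = os.path.join(power_dir, "wakeup_count")
    with open(count_path) as f:
        count = f.read().strip()  # may block while wakeup events are in progress

    # A real daemon would check here that nothing in user space still needs to run.

    try:
        with open(count_path, "w") as f:
            f.write(count)  # rejected if the count changed since it was read
        with open(os.path.join(power_dir, "state"), "w") as f:
            f.write("mem")  # same as `echo mem > /sys/power/state`
    except OSError:
        return False
    return True
```

Because Python buffers the writes, a kernel error may surface when the file is closed rather than at `write()`; both happen inside the `try`, so either way the function reports an aborted suspend.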

> > > I don't think anyone was discussing cell phones in particular in this
> > > thread, and knowing when user-space got woken up because of the baseband
> > > processor having information for us would still be useful.
> >
> > It matters because for laptops, what's important is whether the lid is
> > closed or not. Whether and how the laptop was "woken" is really
> > beside the point, as others have argued. Your counter argument is
> > that tablets don't have lids. But tablets are going to be using
> > schemes similar to Android, Tizen, and Maemo, and they are *not* going
> > to be using the legacy suspend-to-RAM model, because it's not
> > sufficiently good at power saving.
>
> There are plenty of tablets around that aren't Android devices. There
> are plenty of laptops that can be switched to a tablet mode for which
> this wouldn't apply either.

Yes, still the right question is "was the power button pressed while
userland was suspended" not "was the system woken by power
button"... and yes, I guess kernel should add the "power button" event
to the input queue, even if that press was used to wake up the system.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-30 18:23:36

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

On Thu 2014-10-30 16:07:35, Bastien Nocera wrote:
> On Thu, 2014-10-30 at 07:53 -0700, Andy Lutomirski wrote:
> > On Thu, Oct 30, 2014 at 7:45 AM, Bastien Nocera <[email protected]> wrote:
> > > On Wed, 2014-10-29 at 22:16 +0100, Pavel Machek wrote:
> > >> On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
> > >> > On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> > >> > > For a tablet, isn't the relevant piece of information whether the power
> > >> > > button was recently pressed, not whether the power button caused the wakeup?
> > >> >
> > >> > For Android L devices, it has been reported that the device might
> > >> > power up its screen fully (note I didn't say 'wake up') automatically
> > >> > when it detects that you are picking it up, or when you double-tap the
> > >> > screen. It also reportedly has a low power black and white "ambient
> > >> > display" (à la the Android Wear devices) which allows you to see
> > >> > notifications without waking up the phone all the way[1]. (All of
> > >> > this assuming appropriate hardware support, of course.)
> > >> >
> > >> > [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
> > >> >
> > >> > Which goes back to the point that having a "suspend" mode is legacy
> > >> > thinking. Modern devices will soon have not just "awake" and
> > >> > "asleep" modes; there will be (well, there is now) a much wider spectrum of
> > >> > modes, with the goal of using the minimum amount of power while still
> > >> > providing useful functionality to the user.
> > >>
> > >> Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > >> android devices, it does not suspend to RAM at any point, and still
> > >> has reasonable battery life.
> > >
> > > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
> > >
> > >> So I agree -- using suspend to RAM on "active" cell phone is just a
> > >> bad design.
> > >
> > > I don't think anyone was discussing cell phones in particular in this
> > > thread, and knowing when user-space got woken up because of the baseband
> > > processor having information for us would still be useful.
> > >
> >
> > You still haven't addressed what problem this solves that isn't solved
> > by merely knowing whether the baseband processor has useful
> > information.
>
> We don't know that the baseband processor has useful information if
> we're not in the "path" for the interrupt that it would send. This event
> might end up dying somewhere inside the kernel for all we know.

See, that would be something to fix. If baseband has incoming data,
and kernel fails to notify userspace about that data after resume,
that is a kernel bug.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2014-10-30 23:19:20

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> It matters because for laptops, what's important is whether the lid is
> closed or not. Whether and how the laptop was "woken" is really
> beside the point, as others have argued. Your counter argument is
> that tablets don't have lids. But tablets are going to be using
> schemes similar to Android, Tizen, and Maemo, and they are *not* going
> to be using the legacy suspend-to-RAM model, because it's not
> sufficiently good at power saving.

There's no longer a tablet/laptop divide in the general case. Many
tablets are convertibles, and many that are not are only
"non-convertible" in the sense that they are screwed together rather
than fitted with funky connectors.

There are also a host of reasons (ever tried making a high-speed bus go
round a hinge?) why this is going to become more common.

2014-10-30 23:21:44

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> I wouldn't consider this "suspend to RAM", but that's because I expect
> the firmware to implement most of that. Anyway, that's splitting hairs.

Quite the reverse in many cases. If your hardware has low power idle you
probably have almost no firmware involved (if any). It's the old world
model of S3 which is all firmware powered.

The extreme case of this is that there are processors out there where
the equivalent of a 'wait for IRQ' instruction is the entire thing.

Alan

2014-10-30 23:25:42

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> The kernel receives an interrupt, likely on a different device. Again,
> I'm talking about "legacy" devices, for which suspend is actually a
> state. If the device is only in low-power mode, you'd probably get the
> event on the input device, which is accessible from user-space.

I don't believe so - the firmware ate it.

> Knowing why the Wi-Fi card woke up is also important when there isn't a
> full "suspend" state. As was mentioned, it's useful for power debugging,
> but it's also useful because that tells things outside the network card
> driver what happened.

Wifi devices that are smart generally have a fair bit of info they
provide themselves on this. In particular if you are using a deep idle
type behaviour they may well wake every minute or so just to poke a
packet out to keep any NAT mapping alive.

> As I mentioned in more recent emails on this thread, maybe we don't want
> to know what woke the system up, but knowing that a wake-up event
> occurred on this device, at this time, would allow us to make the
> software act accordingly. The fact that we don't know that means that we
> cannot take appropriate action.

What woke the system up may also not be a singular item. Suppose the
alarm goes off as the user opens the lid and the wireless gets a wakeup
packet in the same window ?

Alan

2014-10-30 23:39:19

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> > You'd have to solve it in the firmware.
>
> Not if the kernel can tell us that the event occurred and when.

Which it can only do if the firmware told the kernel meaningfully !

> And I think I have one of those devices, an Intel Baytrail tablet.
>
> > - Suspend/Resume on such machines are a Linux fake to keep legacy code
> > happy
>
> Do you have a link to how this is implemented currently?

You ask for suspend and we put all the devices into lowest power state if
they are not already there then sit on our backsides issuing mwaits
asking for C7 state on BYT (C10 I think on HSW).

If your box is ever passive enough you can even randomly enter this state
in the idle loop. You generally won't do this on current devices because
you won't have suitable panels and most desktop OSes are far too noisy on
wakeups. There's nothing preventing you from having half your processors in
deep idle.

That's where it is all heading though. Suspend will eventually go away.

> [1]: Reason for wake-up for each wake-up-able device, along with a
> timestamp.

We may not know and the answer in many cases will be extremely device
specific. It's a reasonable ask but answers even if available are likely
to be things like "because GPE36" and GPE36 will just be some connection
to something that could be anything from a lid switch to a light sensor
or even a smart wifi chip deciding it wants the CPU to help out because
you are out of range of the base station. We may not even know what it
relates to.

A non-suspend system will exit deep idle type status because they got
an IRQ or perhaps some DMA needed the cache coherency. That doesn't mean
they've got the foggiest which IRQ kicked them out of idle, just that hey
I'm awake and there are four pending interrupts. That of course is
assuming it even noticed it entered a deep idle state - you don't want to
wake an idle CPU to tell it that it's more idle than it was before.

Alan

2014-10-31 09:36:26

by Jan Kara

Subject: Re: A desktop environment[1] kernel wishlist

On Mon 27-10-14 20:02:51, Sergey "Shnatsel" Davidoff wrote:
> > If "recursive mtime" was available, would that work for you?
>
> It would work for detecting "offline" changes. I suppose recursive
> mtime is not viable for online monitoring, mostly because detecting file
> renaming would be a massive PITA (and we already have fanotify with
> exactly this problem).
Yes, you'll get only "something has changed in a subtree" information for
each directory. You'd then have to rescan the directory to find out what
has changed. But there's no simple solution for this - either you have to
process tons of events for busy directory tree or you have to somehow
reduce the amount of information provided to userspace...
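A minimal sketch of the rescan step described above, assuming the caller has already read both directory listings (for example with readdir()) and sorted them with strcmp(). Only the name comparison is shown; a real rescan would also compare mtimes to catch files modified in place. Note that a rename surfaces as one removal plus one addition, which is the information loss Sergey mentions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Compare two snapshots of one directory, each sorted with strcmp(), and
 * report names that disappeared ('-') or appeared ('+'). 'report' may be
 * NULL. Returns the number of differences. A rename shows up as '-' + '+'.
 */
size_t diff_listings(const char *const *old_names, size_t n_old,
                     const char *const *new_names, size_t n_new,
                     void (*report)(char kind, const char *name))
{
    size_t i = 0, j = 0, changes = 0;

    /* Merge-style walk over both sorted arrays. */
    while (i < n_old || j < n_new) {
        int cmp;
        if (i == n_old)
            cmp = 1;            /* only new names left: additions */
        else if (j == n_new)
            cmp = -1;           /* only old names left: removals */
        else
            cmp = strcmp(old_names[i], new_names[j]);

        if (cmp == 0) {         /* present in both scans */
            i++;
            j++;
        } else if (cmp < 0) {   /* gone from the new scan */
            if (report)
                report('-', old_names[i]);
            i++;
            changes++;
        } else {                /* new in this scan */
            if (report)
                report('+', new_names[j]);
            j++;
            changes++;
        }
    }
    return changes;
}
```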

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-10-31 13:55:26

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-28 at 07:36 -0700, John Stultz wrote:
> On Tue, Oct 28, 2014 at 5:36 AM, Bastien Nocera <[email protected]> wrote:
> > Maybe the wake-up reason isn't good enough on its own, but how do I know
> > which one of the possible wake-up reasons was the last one to trigger?
>
> So I feel like I'm still missing why it's so critical to know what the
> last-event was? To me it seems a number of events have occurred, and
> they should all be processed. Since they're all asynchronous, they
> could come in any order, so it seems best to handle them one by one
> rather than have any requirement on which one happened last.
>
> What does the exact timeline of the events provide for you?

What's most important is the reason for (device) wake-up. I could see if
the event that woke up the machine was Wake-On-LAN (which would wake up
the screen), or the proximity of wireless networks (Wi-Fi card firmwares
can do that on their own) (which wouldn't wake the screen up).

The timestamp makes it possible to avoid races that were mentioned
earlier in the thread.

Knowing whether an alarm or a button woke the machine would be useful as
well (even if we would need to monitor additional resources, such as the
input devices, or accelerometers, to know what state the machine
currently is in).

As others mentioned, this is also useful as a power debugging tool. The
important part here would be that each device would report its own
wake-up reason, which the driver can know better than user-space or
other parts of the kernel. User-space would be responsible for
coalescing that information.
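One way userspace could do that coalescing with counters the kernel already exposes, sketched under assumptions: wakeup-capable devices are assumed to have a power/wakeup_count attribute in sysfs (see Documentation/ABI/testing/sysfs-devices-power), the caller has read those counters into one snapshot before suspend and one after resume, and the device names used are hypothetical. A counter that went up means the device signalled a wakeup event in that window; as Alan noted above, several devices can fire in the same window, and this gives no ordering between them.

```c
#include <stddef.h>
#include <string.h>

/* One counter sample; device names here are illustrative only. */
struct wakeup_sample {
    const char *device;  /* e.g. a sysfs device name */
    unsigned long count; /* value read from that device's power/wakeup_count */
};

/*
 * Store in 'fired' the devices whose counter increased between the
 * 'before' snapshot (taken before suspend) and the 'after' snapshot
 * (taken after resume). A device missing from 'before' counts from zero.
 * Returns how many names were stored (at most max_fired). The result is
 * in snapshot order, which says nothing about which event came first.
 */
size_t devices_with_new_wakeups(const struct wakeup_sample *before, size_t n_before,
                                const struct wakeup_sample *after, size_t n_after,
                                const char **fired, size_t max_fired)
{
    size_t n = 0;
    for (size_t i = 0; i < n_after && n < max_fired; i++) {
        unsigned long prev = 0;
        for (size_t j = 0; j < n_before; j++) {
            if (strcmp(before[j].device, after[i].device) == 0) {
                prev = before[j].count;
                break;
            }
        }
        if (after[i].count > prev)
            fired[n++] = after[i].device;
    }
    return n;
}
```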

2014-10-31 13:58:16

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 19:23 +0100, Pavel Machek wrote:
> On Thu 2014-10-30 16:07:35, Bastien Nocera wrote:
> > On Thu, 2014-10-30 at 07:53 -0700, Andy Lutomirski wrote:
> > > On Thu, Oct 30, 2014 at 7:45 AM, Bastien Nocera <[email protected]> wrote:
> > > > On Wed, 2014-10-29 at 22:16 +0100, Pavel Machek wrote:
> > > >> On Wed 2014-10-29 16:26:16, Theodore Ts'o wrote:
> > > >> > On Wed, Oct 29, 2014 at 12:19:56PM -0700, Andy Lutomirski wrote:
> > > >> > > For a tablet, isn't the relevant piece of information whether the power
> > > >> > > button was recently pressed, not whether the power button caused the wakeup?
> > > >> >
> > > >> > For Android L devices, it has been reported that the device might
> > > >> > power up its screen fully (note I didn't say 'wake up') automatically
> > > >> > when it detects that you are picking it up, or when you double-tap the
> > > >> > screen. It also reportedly has a low power black and white "ambient
> > > >> > display" (ala the Android Wear devices) which allows you to see
> > > >> > notifications without waking up the phone all the way[1]. (All of
> > > >> > this assuming appropriate hardware support, of course.)
> > > >> >
> > > >> > [1] http://www.androidauthority.com/ambient-display-lollipop-541198/
> > > >> >
> > > >> > Which goes back to the concept of having a "suspend" mode is legacy
> > > >> > thinking. Modern devices will soon have not just an "awake" and an
> > > >> > "asleep" mode; there will be (well, is now) a much wider spectrum of
> > > >> > modes, with the goal of using the minimum amount of power while still
> > > >> > providing useful functionality to the user.
> > > >>
> > > >> Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > > >> android devices, it does not suspend to RAM at any point, and still
> > > >> has reasonable battery life.
> > > >
> > > > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
> > > >
> > > >> So I agree -- using suspend to RAM on "active" cell phone is just a
> > > >> bad design.
> > > >
> > > > I don't think anyone was discussing cell phones in particular in this
> > > > thread, and knowing when user-space got woken up because of the baseband
> > > > processor having information for us would still be useful.
> > > >
> > >
> > > You still haven't addressed what problem this solves that isn't solved
> > > by merely knowing whether the baseband processor has useful
> > > information.
> >
> > We don't know that the baseband processor has useful information if
> > we're not in the "path" for the interrupt that it would send. This event
> > might end up dying somewhere inside the kernel for all we know.
>
> See, that would be something to fix. If baseband has incoming data,
> and kernel fails to notify userspace about that data after resume,
> that is a kernel bug.

It might notify through a specific socket, through a specific service,
but the outside observer, one that's not directly in the path to that
event, will be unaware. Coalescing all the information taking different
routes from the hardware to user-space can be very difficult, and it's
already difficult to observe the reason for wakeup without being part of
that event's path.

Having drivers export that information would allow us to know why
something was woken up.

2014-10-31 14:01:06

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 18:41 +0100, Pavel Machek wrote:
> On Thu 2014-10-30 16:15:15, Bastien Nocera wrote:
> > On Thu, 2014-10-30 at 11:05 -0400, Theodore Ts'o wrote:
> > > On Thu, Oct 30, 2014 at 03:45:02PM +0100, Bastien Nocera wrote:
> > > > > Actually Maemo people (on Nokia N900 and friends) got it right: unlike
> > > > > android devices, it does not suspend to RAM at any point, and still
> > > > > has reasonable battery life.
> > > >
> > > > Android devices don't suspend to RAM. Neither do Tizen devices AFAIK.
> > >
> > > Actually, Android devices have historically always suspended the CPU
> > > whenever there wasn't a wakelock keeping the device from suspending. You
> > > might not consider this "suspend to RAM" but in fact it uses the
> > > identical kernel and hardware facilities as the legacy "suspend to
> > > RAM" mechanism.
> >
> > I wouldn't consider this "suspend to RAM", but that's because I expect
> > the firmware to implement most of that. Anyway, that's splitting
> > hairs.
>
> Could you rephrase that?
>
> Anyway, this is "echo mem > /sys/power/state" or
> suspend-to-RAM. Android does the same, with more tricky wakeup logic.
>
> > > > I don't think anyone was discussing cell phones in particular in this
> > > > thread, and knowing when user-space got woken up because of the baseband
> > > > processor having information for us would still be useful.
> > >
> > > It matters because for laptops, what's important is whether the lid is
> > > closed or not. Whether and how the laptop was "woken" is really
> > > beside the point, as others have argued. Your counter argument is
> > > that tablets don't have lids. But tablets are going to be using
> > > schemes similar to Android, Tizen, and Maemo, and they are *not* going
> > > to be using the legacy suspend-to-RAM model, because it's not
> > > sufficiently good at power saving.
> >
> > There are plenty of tablets around that aren't Android devices. There
> > are plenty of laptops that can be switched to a tablet mode for which
> > this wouldn't apply either.
>
> Yes, still the right question is "was the power button pressed while
> userland was suspended" not "was the system woken by power
> button"...

"Was the power button pressed while userland was suspended" is
presumably also racy.
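For the race between "is there still a reason to stay awake?" and "suspend now", the kernel already provides a handshake through /sys/power/wakeup_count (described in Documentation/ABI/testing/sysfs-power): read the counter, make the final checks, write the same value back, and only then write "mem" to /sys/power/state. The write-back only succeeds if no new wakeup events were registered since the read, and once it succeeds the kernel aborts the following suspend if one arrives. A minimal sketch, with both paths passed in so it can be exercised against ordinary files; error handling is simplified and the actual policy checks are left as a comment.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write a string to an existing attribute file such as a sysfs file. */
static int write_attr(const char *path, const char *value)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int saved = errno; /* keep write()'s errno across close() */
    close(fd);
    errno = saved;
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/*
 * Suspend unless a wakeup event is registered between reading the counter
 * and requesting suspend. Returns 0 once "mem" was accepted (on real
 * hardware: after resume), 1 if writing the counter back failed (on real
 * sysfs: a wakeup event arrived in between), -1 on any other error.
 */
int suspend_if_no_wakeups(const char *count_path, const char *state_path)
{
    char count[32];
    int fd = open(count_path, O_RDONLY);
    if (fd < 0)
        return -1;
    /* On real sysfs this read may block while wakeup events are in progress. */
    ssize_t n = read(fd, count, sizeof(count) - 1);
    close(fd);
    if (n <= 0)
        return -1;
    count[n] = '\0';
    count[strcspn(count, "\n")] = '\0';

    /* Final userspace checks (lid state, pending input, ...) would go here. */

    if (write_attr(count_path, count) != 0)
        return 1;
    return write_attr(state_path, "mem") == 0 ? 0 : -1;
}
```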

> and yes, I guess kernel should add the "power button" event
> to the input queue, even if that press was used to wake up the system.

And how would one know whether to suspend or resume in this case?
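If the kernel did re-inject the wake-up press as Pavel suggests, userspace would see it as an ordinary evdev event: a struct input_event read from /dev/input/eventN with type EV_KEY, code KEY_POWER and value 1 (press). A minimal sketch of scanning one read() batch for that press; by itself it does not answer the suspend-or-resume question, since the press looks the same either way.

```c
#include <linux/input.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Scan one batch of evdev events (as read() from /dev/input/eventN returns
 * them) for a power-button press. For EV_KEY events, value 1 is a press,
 * 0 a release and 2 an autorepeat.
 */
bool has_power_press(const struct input_event *ev, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (ev[i].type == EV_KEY && ev[i].code == KEY_POWER && ev[i].value == 1)
            return true;
    }
    return false;
}
```

A caller would read() into an array of struct input_event and pass the number of whole events returned (bytes read divided by sizeof(struct input_event)).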

2014-10-31 14:02:14

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 23:25 +0000, One Thousand Gnomes wrote:
> > The kernel receives an interrupt, likely on a different device. Again,
> > I'm talking about "legacy" devices, for which suspend is actually a
> > state. If the device is only in low-power mode, you'd probably get the
> > event on the input device, which is accessible from user-space.
>
> I don't believe so - the firmware ate it.
>
> > Knowing why the Wi-Fi card woke up is also important when there isn't a
> > full "suspend" state. As was mentioned, it's useful for power debugging,
> > but it's also useful because that tells things outside the network card
> > driver what happened.
>
> Wifi devices that are smart generally have a fair bit of info they
> provide themselves on this. In particular if you are using a deep idle
> type behaviour they may well wake every minute or so just to poke a
> packet out to keep any NAT mapping alive.
>
> > As I mentioned in more recent emails on this thread, maybe we don't want
> > to know what woke the system up, but knowing that a wake-up event
> > occurred on this device, at this time, would allow us to make the
> > software act accordingly. The fact that we don't know that means that we
> > cannot take appropriate action.
>
> What woke the system up may also not be a singular item. Suppose the
> alarm goes off as the user opens the lid and the wireless gets a wakeup
> packet in the same window ?

Then each one of those devices would have its own wakeup reason set,
with a timestamp.

2014-10-31 14:04:25

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, 2014-10-30 at 23:39 +0000, One Thousand Gnomes wrote:
> > > You'd have to solve it in the firmware.
> >
> > Not if the kernel can tell us that the event occurred and when.
>
> Which it can only do if the firmware told the kernel meaningfully !
>
> > And I think I have one of those devices, an Intel Baytrail tablet.
> >
> > > - Suspend/Resume on such machines are a Linux fake to keep legacy code
> > > happy
> >
> > Do you have a link to how this is implemented currently?
>
> You ask for suspend and we put all the devices into lowest power state if
> they are not already there then sit on our backsides issuing mwaits
> asking for C7 state on BYT (C10 I think on HSW).
>
> If your box is ever passive enough you can even randomly enter this state
> in the idle loop. You generally won't do this on current devices because
> you won't have suitable panels and most desktop OSes are far too noisy on
> wakeups. There's nothing preventing you from having half your processors in
> deep idle.
>
> That's where it is all heading though. Suspend will eventually go away.
>
> > [1]: Reason for wake-up for each wake-up-able device, along with a
> > timestamp.
>
> We may not know and the answer in many cases will be extremely device
> specific.

Which is why I'm interested in the device drivers providing that
information.

> It's a reasonable ask but answers even if available are likely
> to be things like "because GPE36" and GPE36 will just be some connection
> to something that could be anything from a lid switch to a light sensor
> or even a smart wifi chip deciding it wants the CPU to help out because
> you are out of range of the base station. We may not even know what it
> relates to.

But the device or platform driver would know that, presumably.

> A non-suspend system will exit deep idle type status because they got
> an IRQ or perhaps some DMA needed the cache coherency. That doesn't mean
> they've got the foggiest which IRQ kicked them out of idle, just that hey
> I'm awake and there are four pending interrupts. That of course is
> assuming it even noticed it entered a deep idle state - you don't want to
> wake an idle CPU to tell it that it's more idle than it was before.

Sure, the CPU might not be the best example of a device for which we
need to track the wakeup reason. The device drivers however...

2014-10-31 17:38:21

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Fri, Oct 31, 2014 at 6:54 AM, Bastien Nocera <[email protected]> wrote:
> On Tue, 2014-10-28 at 07:36 -0700, John Stultz wrote:
>> On Tue, Oct 28, 2014 at 5:36 AM, Bastien Nocera <[email protected]> wrote:
>> > Maybe the wake-up reason isn't good enough on its own, but how do I know
>> > which one of the possible wake-up reasons was the last one to trigger?
>>
>> So I feel like I'm still missing why it's so critical to know what the
>> last-event was? To me it seems a number of events have occurred, and
>> they should all be processed. Since they're all asynchronous, they
>> could come in any order, so it seems best to handle them one by one
>> rather than have any requirement on which one happened last.
>>
>> What does the exact timeline of the events provide for you?
>
> What's most important is the reason for (device) wake-up.

You keep on insisting this, but I still don't quite understand *why*.

> I could see if
> the event that woke up the machine was Wake-On-LAN (which would wake up
> the screen), or the proximity of wireless networks (Wi-Fi card firmwares
> can do that on their own) (which wouldn't wake the screen up).
>
> The timestamp makes it possible to avoid races that were mentioned
> earlier in the thread.

But even as Alan said, we may not actually know the real ordering even
in the kernel. We come out of resume and there may be three interrupts
pending, and they'll just be handled in priority order, not in real
time order.

> Knowing whether an alarm or a button woke the machine would be useful as
> well (even if we would need to monitor additional resources, such as the
> input devices, or accelerometers, to know what state the machine
> currently is in).
>
> As others mentioned, this is also useful as a power debugging tool. The
> important part here would be that each device would report its own
> wake-up reason, which the driver can know better than user-space or
> other parts of the kernel. User-space would be responsible for
> coalescing that information.

I agree that it can be useful for power statistics and debugging, and
there is work being done by Android folks and others to improve this.
So you might follow along with that work.

But I still am concerned that the model you're seeming to want to use
- depending on what specifically caused the state transition - is
flawed, and instead recommend a more event based model.

Even so, I worry we're just talking past each other, so I'll stop
being a wet blanket here and let you get on with doing development and
hopefully proving me wrong. :)

thanks
-john

2014-11-03 14:17:54

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> > It's a reasonable ask but answers even if available are likely
> > to be things like "because GPE36" and GPE36 will just be some connection
> > to something that could be anything from a lid switch to a light sensor
> > or even a smart wifi chip deciding it wants the CPU to help out because
> > you are out of range of the base station. We may not even know what it
> > relates to.
>
> But the device or platform driver would know that, presumably.

Quite often it has no idea - maybe the firmware knows but it isn't
telling us. It's an internal detail.
>
> > A non-suspend system will exit deep idle type status because they got
> > an IRQ or perhaps some DMA needed the cache coherency. That doesn't mean
> > they've got the foggiest which IRQ kicked them out of idle, just that hey
> > I'm awake and there are four pending interrupts. That of course is
> > assuming it even noticed it entered a deep idle state - you don't want to
> > wake an idle CPU to tell it that it's more idle than it was before.
>
> Sure, the CPU might not be the best example of a device for which we
> need to track the wakeup reason. The device drivers however...

You keep assuming a wakeup is "special" - it's quite possibly not. The
RTC driver knows whether an alarm went off, it's got no idea if that
caused a wakeup or even what a wakeup is, or if the platform has wakeups
or just deep sleeps.

Same for most other stuff - plugging in a display may well bump a machine
out of deep sleep but the graphics driver isn't going to know anything
other than "I'm handling a cable change".

Events can also be processed entirely by firmware so we just wake up, look
round, scratch our head and go back to sleep.

So I really think "why did I wake up" is actually the wrong question to
be asking.

What you probably should be asking (and what the kernel effectively asks)
is "Why am I not idle ?". That plus "what state changes have occurred
that I care about". Both of those are questions you can ask at any time
without caring how sleeping may or may not happen.

ie you don't care if a lid event woke you, you care if the lid is open or
shut. You don't care whether a wireless event woke you, you care that the
wireless has been lost etc..

Ie instead of doing

	if (woken && cause == BATTERY_LOW)
		suspend_to_disk();

you want to be doing (as part of the normal flow)

	if (battery < BATTERY_LOW &&
	    battery_prev >= BATTERY_LOW + HYSTERESIS)
		suspend_to_disk();

because it's kind of irrelevant whether it woke you for this, you need to
do it anyway.
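The second fragment above, written out as compilable C under assumptions: BATTERY_LOW and HYSTERESIS are illustrative percentages, and the caller acts on a true result by hibernating. Instead of comparing against the previous reading, this sketch keeps an explicit "armed" flag, a common hysteresis pattern that does not depend on how far the level moved between two readings. The point is the same: the check runs on every reading and never asks what caused the wakeup.

```c
#include <stdbool.h>

#define BATTERY_LOW 10 /* percent; illustrative value */
#define HYSTERESIS  5  /* percent; illustrative value */

struct battery_monitor {
    bool armed; /* initialise to true so a low first reading still triggers */
};

/*
 * Feed every battery reading through this, whether or not it arrived
 * around a wakeup. Returns true once per drop below BATTERY_LOW; the
 * trigger re-arms only after the level is back at BATTERY_LOW + HYSTERESIS.
 */
bool battery_should_suspend_to_disk(struct battery_monitor *m, int level)
{
    if (level >= BATTERY_LOW + HYSTERESIS) {
        m->armed = true;
        return false;
    }
    if (m->armed && level < BATTERY_LOW) {
        m->armed = false;
        return true;
    }
    return false;
}
```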

Alan

2014-11-03 18:22:12

by Heinrich Schuchardt

Subject: Re: A desktop environment[1] kernel wishlist

On 31.10.2014 10:36, Jan Kara wrote:
> On Mon 27-10-14 20:02:51, Sergey "Shnatsel" Davidoff wrote:
>>> If "recursive mtime" was available, would that work for you?
>>
>> It would work for detecting "offline" changes. I suppose recursive
>> mtime is not viable for online monitoring, mostly because detecting file
>> renaming would be a massive PITA (and we already have fanotify with
>> exactly this problem).
> Yes, you'll get only "something has changed in a subtree" information for
> each directory. You'd then have to rescan the directory to find out what
> has changed. But there's no simple solution for this - either you have to
> process tons of events for busy directory tree or you have to somehow
> reduce the amount of information provided to userspace...

If inotify_add_watch() allowed marking a complete mount (like
fanotify_mark() called with FAN_MARK_MOUNT), events for all files on this
mount could be detected. If, furthermore, inotify_read() returned the
path of the changed file relative to the mount in inotify_event->name,
the meaning of each event would be obvious.

Best regards

Heinrich Schuchardt

2014-11-04 09:28:18

by Jan Kara

Subject: Re: A desktop environment[1] kernel wishlist

On Mon 03-11-14 19:21:43, Heinrich Schuchardt wrote:
> On 31.10.2014 10:36, Jan Kara wrote:
> >On Mon 27-10-14 20:02:51, Sergey "Shnatsel" Davidoff wrote:
> >>>If "recursive mtime" was available, would that work for you?
> >>
> >>It would work for detecting "offline" changes. I suppose recursive
> >>mtime is not viable for online monitoring, mostly because detecting file
> >>renaming would be a massive PITA (and we already have fanotify with
> >>exactly this problem).
> > Yes, you'll get only "something has changed in a subtree" information for
> >each directory. You'd then have to rescan the directory to find out what
> >has changed. But there's no simple solution for this - either you have to
> >process tons of events for busy directory tree or you have to somehow
> >reduce the amount of information provided to userspace...
>
> If inotify_add_watch() allowed marking a complete mount (like
> fanotify_mark() called with FAN_MARK_MOUNT), events for all files on this
> mount could be detected. If, furthermore, inotify_read() returned
> the path of the changed file relative to the mount
> in inotify_event->name, the meaning of each
> event would be obvious.
There are two catches though:
1) Getting the path itself is unreliable due to the presence of other fs
changes happening in parallel with you traversing the directory tree.

2) The name you get from inotify_event->name is unreliable because by the
time you read the event, directory tree may be completely different.

These are the reasons why fanotify passes file descriptors with the
events instead of names.

Also for mountpoint-wide notification your app has to be much faster at
processing events so that the event queue doesn't overflow (and thus force
you to do a full rescan). fanotify deals with this by not limiting event
queue length at all, but that's one of the reasons why it's restricted to
CAP_SYS_ADMIN users.
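A minimal sketch of the fanotify model described above, assuming a Linux system where the caller has CAP_SYS_ADMIN (fanotify_init() required it at the time of this thread). FAN_MARK_MOUNT marks a whole mount; each event carries an open file descriptor rather than a name, and the path is recovered afterwards through /proc/self/fd, so it reflects the tree at read time, not at event time. FAN_Q_OVERFLOW is where a real indexer would fall back to a full rescan. Only FAN_CLOSE_WRITE is requested, since fanotify at this point offered no create, delete or rename events.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/fanotify.h>
#include <unistd.h>

/* Resolve an event's file descriptor to a path via /proc/self/fd. */
int fd_to_path(int fd, char *out, size_t out_len)
{
    char link[64];
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    ssize_t n = readlink(link, out, out_len - 1);
    if (n < 0)
        return -1;
    out[n] = '\0'; /* readlink() does not NUL-terminate */
    return 0;
}

/*
 * Print every file on the mount containing 'path' that is closed after
 * being written. Needs CAP_SYS_ADMIN; returns -1 on setup or read error.
 */
int watch_mount(const char *path)
{
    int fan = fanotify_init(FAN_CLASS_NOTIF | FAN_CLOEXEC, O_RDONLY);
    if (fan < 0)
        return -1;
    if (fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_CLOSE_WRITE,
                      AT_FDCWD, path) < 0) {
        close(fan);
        return -1;
    }

    char buf[4096] __attribute__((aligned(__alignof__(struct fanotify_event_metadata))));
    char name[4096];
    for (;;) {
        ssize_t len = read(fan, buf, sizeof(buf));
        if (len <= 0)
            break;
        struct fanotify_event_metadata *md = (struct fanotify_event_metadata *)buf;
        for (; FAN_EVENT_OK(md, len); md = FAN_EVENT_NEXT(md, len)) {
            if (md->vers != FANOTIFY_METADATA_VERSION)
                goto out;
            if (md->mask & FAN_Q_OVERFLOW) {
                /* Events were dropped: only a full rescan is reliable now. */
                puts("queue overflow, full rescan needed");
                continue;
            }
            if (md->fd >= 0) {
                if (fd_to_path(md->fd, name, sizeof(name)) == 0)
                    printf("written: %s\n", name);
                close(md->fd);
            }
        }
    }
out:
    close(fan);
    return -1;
}
```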

So all in all I would find it better to extend fanotify to provide
directory events (that was originally planned but the support was dropped
due to technical issues), solve problems with unlimited event queues, add
some permission checking for passed file descriptors and audit fanotify to
verify it's otherwise safe for regular users.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-11-04 19:55:51

by Heinrich Schuchardt

Subject: Re: A desktop environment[1] kernel wishlist

On 04.11.2014 10:28, Jan Kara wrote:

>> If inotify_add_watch() allowed marking a complete mount (like
>> fanotify_mark() called with FAN_MARK_MOUNT), events for all files on this
>> mount could be detected. If, furthermore, inotify_read() returned
>> the path of the changed file relative to the mount
>> in inotify_event->name, the meaning of each
>> event would be obvious.
> There are two catches though:
> 1) Getting the path itself is unreliable due to the presence of other fs
> changes happening in parallel with you traversing the directory tree.
>
> 2) The name you get from inotify_event->name is unreliable because by the
> time you read the event, directory tree may be completely different.
>
> These are the reasons why fanotify passes file descriptors with the
> events instead of names.
>
> Also for mountpoint-wide notification your app has to be much faster at
> processing events so that the event queue doesn't overflow (and thus force
> you to do a full rescan). fanotify deals with this by not limiting event
> queue length at all, but that's one of the reasons why it's restricted to
> CAP_SYS_ADMIN users.

This is reflected in fanotify_init already by explicitly checking
CAP_SYS_ADMIN when using flag FAN_UNLIMITED_QUEUE.

>
> So all in all I would find it better to extend fanotify to provide
> directory events (that was originally planned but the support was dropped
> due to technical issues),

I had a brief look at the systemd code. It uses at least:
IN_ATTRIB
IN_CLOSE_WRITE
IN_CLOSE_NOWRITE
IN_CREATE
IN_DELETE
IN_DELETE_SELF
IN_MODIFY
IN_MOVE_SELF
IN_MOVED_FROM
IN_MOVED_TO
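For comparison, a minimal sketch of those flags with today's inotify API (an illustration, not systemd's actual code): the watch covers a single directory rather than a mount, and each event carries only a bare name relative to that directory. Events are variable-length, so the read buffer has to be walked header by header.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

/* The event set listed above. */
#define WATCH_MASK (IN_ATTRIB | IN_CLOSE_WRITE | IN_CLOSE_NOWRITE | IN_CREATE | \
                    IN_DELETE | IN_DELETE_SELF | IN_MODIFY | IN_MOVE_SELF |     \
                    IN_MOVED_FROM | IN_MOVED_TO)

/* Watch one directory (not its subdirectories). Returns the inotify fd or -1. */
int watch_dir(const char *dir)
{
    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;
    if (inotify_add_watch(fd, dir, WATCH_MASK) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Block until an event names 'name' (a bare entry name inside the watched
 * directory) and store that event's mask. Returns 0, or -1 on read error.
 */
int wait_for_name(int fd, const char *name, uint32_t *mask)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));
        if (len <= 0)
            return -1;
        /* Each event is a fixed header followed by ev->len bytes of name. */
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->len > 0 && strcmp(ev->name, name) == 0) {
                *mask = ev->mask;
                return 0;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```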

Most of the sources of these events are passed as dentry and not as
path. Bug 86781 describes that fanotify does not create events for
double-mounted file systems. It suggests using the mount list of the
superblock (which the dentry points to) instead of the mount indicated
by a path. https://bugzilla.kernel.org/show_bug.cgi?id=86781

Using dentries would minimize code changes outside the core of
the notification framework. Hence this is a central design decision to
take before implementing the additional fanotify events.

> solve problems with unlimited event queues,

Isn't this already solved by
group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS?

> add some permission checking for passed file descriptors

This brings us back to
https://lkml.org/lkml/2014/4/19/151
(fanotify: check permissions when creating file descriptor)

> and audit fanotify to
> verify it's otherwise safe for regular users.

Best regards

Heinrich Schuchardt

2014-11-05 17:18:25

by Jan Kara

Subject: Re: A desktop environment[1] kernel wishlist

On Tue 04-11-14 20:55:15, Heinrich Schuchardt wrote:
> On 04.11.2014 10:28, Jan Kara wrote:
>
> >>If inotify_add_watch() allowed marking a complete mount (like
> >>fanotify_mark() called with FAN_MARK_MOUNT), events for all files on this
> >>mount could be detected. If, furthermore, inotify_read() returned
> >>the path of the changed file relative to the mount
> >>in inotify_event->name, the meaning of each
> >>event would be obvious.
> > There are two catches though:
> >1) Getting the path itself is unreliable due to the presence of other fs
> >changes happening in parallel with you traversing the directory tree.
> >
> >2) The name you get from inotify_event->name is unreliable because by the
> >time you read the event, directory tree may be completely different.
> >
> > These are the reasons why fanotify passes file descriptors with the
> >events instead of names.
> >
> > Also for mountpoint-wide notification your app has to be much faster at
> >processing events so that the event queue doesn't overflow (and thus force
> >you to do a full rescan). fanotify deals with this by not limiting event
> >queue length at all, but that's one of the reasons why it's restricted to
> >CAP_SYS_ADMIN users.
>
> This is reflected in fanotify_init already by explicitly checking
> CAP_SYS_ADMIN when using flag FAN_UNLIMITED_QUEUE.
Right. So this is already handled, although we'd possibly need to have
the user limit tunable, not a compile time constant.

> > So all in all I would find it better to extend fanotify to provide
> >directory events (that was originally planned but the support was dropped
> >due to technical issues),
>
> I had a brief look at the systemd code. It uses at least:
> IN_ATTRIB
> IN_CLOSE_WRITE
> IN_CLOSE_NOWRITE
> IN_CREATE
> IN_DELETE
> IN_DELETE_SELF
> IN_MODIFY
> IN_MOVE_SELF
> IN_MOVED_FROM
> IN_MOVED_TO
>
> Most of the sources of these events are passed as dentry and not as
> path. Bug 86781 describes that fanotify does not create events for
> double-mounted file systems. It suggests using the mount list of the
> superblock (which the dentry points to) instead of the mount
> indicated by a path.
> https://bugzilla.kernel.org/show_bug.cgi?id=86781
I think there was a reason for this behavior but I forgot...

> Using dentries would minimize code changes outside the core
> of the notification framework. Hence this is a central design
> decision to take before implementing the additional fanotify events.
>
> > solve problems with unlimited event queues,
>
> Isn't this already solved by
> group->max_events = FANOTIFY_DEFAULT_MAX_EVENTS?
Mostly.

> > add some permission checking for passed file descriptors
>
> This brings us back to
> https://lkml.org/lkml/2014/4/19/151
> (fanotify: check permissions when creating file descriptor)
Yeah, somewhat. So when we'd open fanotify for ordinary users, we have to
add permission checks, not only to the inode whose change the event
reports but also to the whole path that leads to the inode (and that's going
to be tricky). Otherwise you could get file descriptors of files you'd
otherwise never be able to open. Hum, that's ugly.

Honza
--
Jan Kara <[email protected]>
SUSE Labs, CR

2014-11-21 19:08:45

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

Hi!
On Mon 2014-10-27 15:19:39, Bastien Nocera wrote:

> > Now, I do think knowing which IRQ did bring you out of suspend is
> > useful, but mostly for power-debugging when you're trying to optimize
> > battery life. But for userland logic, I think its far too prone to
> > races.
>
> I also cannot know, from user-space, whether Wake-On-LAN,
> Wake-On-Wireless-LAN, or the Wi-Fi card's "network proximity" triggered
> coming out of suspend for example.
>
> I can certainly check for the status of the lid, but I wouldn't know
> whether a button was pressed to turn the machine back on, as the
> firmware would eat that.

If the firmware eats that, I'd call it a "firmware bug"... and if we
could work around it by synthesizing a button press, yes, we should do
that.
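For reference, a synthesized press is the same event stream a real key produces: key down, a sync report, key up, and another sync report. Below is a minimal sketch that only builds that sequence with the structures from <linux/input.h>. The choice of KEY_WAKEUP and where the events would actually be injected (an in-kernel input device, or /dev/uinput from userspace) are assumptions left outside the sketch, and `build_key_tap` is a hypothetical name.

```c
#include <linux/input.h>
#include <string.h>

/* Fill out[0..3] with press, SYN_REPORT, release, SYN_REPORT for `code`.
 * Timestamps are left zero here. */
static size_t build_key_tap(struct input_event out[4], unsigned short code)
{
    memset(out, 0, 4 * sizeof out[0]);
    out[0].type = EV_KEY; out[0].code = code; out[0].value = 1; /* press */
    out[1].type = EV_SYN; out[1].code = SYN_REPORT;
    out[2].type = EV_KEY; out[2].code = code; out[2].value = 0; /* release */
    out[3].type = EV_SYN; out[3].code = SYN_REPORT;
    return 4;
}
```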
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-04-30 16:25:25

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
> wrote:
> > Hey,
> >
> > GNOME has had discussions with kernel developers in the past, and,
> > fortunately, in some cases we were able to make headway.
> >
> > There are however a number of items that we still don't have solutions
> > for, items that kernel developers might not realise we'd like to rely
> > on, or don't know that we'd make use of if merged.
> >
> > I've posted this list at:
> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >
> > Let me know on-list or off-list if you have any comments about those,
> > so I can update the list.
>
> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> documentation'
>
> Can you expand more on the rationale for the need here? Is this for UI
> for power debugging, or something else?

This is pretty much what I had in mind:
https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep

I guess I didn't make myself understood.

2015-04-30 17:10:28

by John Stultz

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
>> wrote:
>> > Hey,
>> >
>> > GNOME has had discussions with kernel developers in the past, and,
>> > fortunately, in some cases we were able to make headway.
>> >
>> > There are however a number of items that we still don't have solutions
>> > for, items that kernel developers might not realise we'd like to rely
>> > on, or don't know that we'd make use of if merged.
>> >
>> > I've posted this list at:
>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>> >
>> > Let me know on-list or off-list if you have any comments about those,
>> > so I can update the list.
>>
>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> documentation'
>>
>> Can you expand more on the rationale for the need here? Is this for UI
>> for power debugging, or something else?
>
> This is pretty much what I had in mind:
> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>
> I guess I didn't make myself understood.

My admittedly quick skim of that design document seems to suggest
that lucid sleep would be a new kernel state. That would keep the
kernel in charge of determining the state transitions (i.e.:
SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
It seems userspace would then be able to query the current state. This
avoids some of the races I was concerned about when trying to detect,
from userspace, which IRQ woke us from suspend.
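Read literally, that chain is a small state machine. The following C sketch exists purely to make the proposal concrete. The enum, event and function names are hypothetical rather than any existing kernel interface, and the SUSPEND to AWAKE transition on the power button is an assumption not spelled out in the chain.

```c
/* Hypothetical states and wake events named after the transition chain. */
enum sleep_state { ST_AWAKE, ST_SUSPEND, ST_LUCID };
enum wake_event { WK_ALARM, WK_WAKELOCK_RELEASE, WK_POWER_BUTTON };

static enum sleep_state next_state(enum sleep_state s, enum wake_event e)
{
    switch (s) {
    case ST_SUSPEND:
        if (e == WK_ALARM)
            return ST_LUCID;        /* dark resume: no UI */
        if (e == WK_POWER_BUTTON)
            return ST_AWAKE;        /* assumed: user wakes the machine */
        break;
    case ST_LUCID:
        if (e == WK_WAKELOCK_RELEASE)
            return ST_SUSPEND;      /* background work finished */
        if (e == WK_POWER_BUTTON)
            return ST_AWAKE;        /* user activity promotes to awake */
        break;
    case ST_AWAKE:
        break;                      /* suspending is an explicit request */
    }
    return s;                       /* other events leave the state alone */
}
```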

That said, the Power Manager section in that document sounds a little
racy, as it seems to rely on asking userspace whether suspend is OK rather
than using userspace wakelocks, so I'm not sure how well baked this
doc is.
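For comparison, userspace wakelocks (CONFIG_PM_WAKELOCKS) are driven through two sysfs files. Writing a name, optionally followed by a timeout in nanoseconds, to /sys/power/wake_lock takes a lock. Writing the same name to /sys/power/wake_unlock drops it. Below is a minimal sketch. The helper names are hypothetical, and the directory is a parameter only so that the code can be exercised against ordinary files.

```c
#include <stdio.h>
#include <string.h>

/* Write one line to <dir>/<file>. sysfs reports errors when the data is
 * flushed, so the result of fclose() is checked as well. */
static int write_attr(const char *dir, const char *file, const char *line)
{
    char path[256];
    if (snprintf(path, sizeof path, "%s/%s", dir, file) >= (int)sizeof path)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(line, f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Take a wakelock. timeout_ns == 0 means "until released". The kernel
 * treats the name as ending at the first whitespace, so reject those. */
static int take_wake_lock(const char *power_dir, const char *name,
                          unsigned long long timeout_ns)
{
    char line[128];
    if (name[0] == '\0' || strpbrk(name, " \t\n"))
        return -1;
    if (timeout_ns)
        snprintf(line, sizeof line, "%s %llu\n", name, timeout_ns);
    else
        snprintf(line, sizeof line, "%s\n", name);
    return write_attr(power_dir, "wake_lock", line);
}

static int release_wake_lock(const char *power_dir, const char *name)
{
    char line[128];
    snprintf(line, sizeof line, "%s\n", name);
    return write_attr(power_dir, "wake_unlock", line);
}
```

A daemon would take the lock before starting its work and drop it when done. While the lock is held, the kernel treats it as an active wakeup source.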

Olof: Can you comment on who's working on that design doc? Also, the
discussion around using freezing cgroups separately to distinguish
between lucid and awake is interesting, but I wonder whether we need to
make wakeup_sources/wakelocks cgroup-aware. Or has that already been
done?

thanks
-john

2015-04-30 17:23:59

by Olof Johansson

Subject: Re: A desktop environment[1] kernel wishlist

Hi,

On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
>>> wrote:
>>> > Hey,
>>> >
>>> > GNOME has had discussions with kernel developers in the past, and,
>>> > fortunately, in some cases we were able to make headway.
>>> >
>>> > There are however a number of items that we still don't have solutions
>>> > for, items that kernel developers might not realise we'd like to rely
>>> > on, or don't know that we'd make use of if merged.
>>> >
>>> > I've posted this list at:
>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>>> >
>>> > Let me know on-list or off-list if you have any comments about those,
>>> > so I can update the list.
>>>
>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>>> documentation'
>>>
>>> Can you expand more on the rationale for the need here? Is this for UI
>>> for power debugging, or something else?
>>
>> This is pretty much what I had in mind:
>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>>
>> I guess I didn't make myself understood.
>
> My admittedly quick skim of that design document seems to suggest
> that lucid sleep would be a new kernel state. That would keep the
> kernel in charge of determining the state transitions (i.e.:
> SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
> It seems userspace would then be able to query the current state. This
> avoids some of the races I was concerned about when trying to detect,
> from userspace, which IRQ woke us from suspend.
>
> That said, the Power Manager section in that document sounds a little
> racy, as it seems to rely on asking userspace whether suspend is OK rather
> than using userspace wakelocks, so I'm not sure how well baked this
> doc is.
>
> Olof: Can you comment on who's working on that design doc? Also, the
> discussion around using freezing cgroups separately to distinguish
> between lucid and awake is interesting, but I wonder whether we need to
> make wakeup_sources/wakelocks cgroup-aware. Or has that already been
> done?


Sameer and Chirantan have both been deeply involved in that project,
adding them on cc here.


-Olof

2015-04-30 18:54:57

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
> Hi,
>
> On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
>> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
>>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
>>>> wrote:
>>>> > Hey,
>>>> >
>>>> > GNOME has had discussions with kernel developers in the past, and,
>>>> > fortunately, in some cases we were able to make headway.
>>>> >
>>>> > There are however a number of items that we still don't have solutions
>>>> > for, items that kernel developers might not realise we'd like to rely
>>>> > on, or don't know that we'd make use of if merged.
>>>> >
>>>> > I've posted this list at:
>>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>>>> >
>>>> > Let me know on-list or off-list if you have any comments about those,
>>>> > so I can update the list.
>>>>
>>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>>>> documentation'
>>>>
>>>> Can you expand more on the rationale for the need here? Is this for UI
>>>> for power debugging, or something else?
>>>
>>> This is pretty much what I had in mind:
>>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>>>
>>> I guess I didn't make myself understood.
>>
>> My admittedly quick skim of that design document seems to suggest
>> that lucid sleep would be a new kernel state. That would keep the
>> kernel in charge of determining the state transitions (i.e.:
>> SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
>> It seems userspace would then be able to query the current state. This
>> avoids some of the races I was concerned about when trying to detect,
>> from userspace, which IRQ woke us from suspend.
>>

Tomeu has been working on making things so that we don't need a new
kernel state. Adding him on cc so he can correct me if I say
something wrong. The current idea is to have userspace runtime
suspend any unneeded devices before starting a suspend. This way the
kernel will leave them alone at resume. This behavior already exists
in the mainline kernel. Getting the wakeup reason can be accomplished
by having the kernel emit a uevent with the wakeup reason. This is
the only change that would be necessary.
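On the userspace side, letting unneeded devices runtime-suspend mostly comes down to the per-device power/control sysfs attribute. Writing "auto" allows the driver to runtime-suspend the device once it is idle, while "on" keeps it powered. The power/runtime_status attribute reports the current state. Below is a minimal sketch. The device directory (for example an entry under /sys/bus/usb/devices/) is a parameter, which devices count as unneeded is a policy question left out here, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Build <devdir>/power/<attr>, refusing a devdir that is not a directory. */
static int pm_attr_path(char *out, size_t len, const char *devdir,
                        const char *attr)
{
    struct stat st;
    if (stat(devdir, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;
    return snprintf(out, len, "%s/power/%s", devdir, attr) < (int)len ? 0 : -1;
}

/* "auto" lets the driver runtime-suspend the device once it is idle;
 * "on" keeps it powered up. */
static int set_runtime_pm(const char *devdir, int allow_suspend)
{
    char path[512];
    if (pm_attr_path(path, sizeof path, devdir, "control") != 0)
        return -1;
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(allow_suspend ? "auto\n" : "on\n", f) >= 0;
    if (fclose(f) != 0)     /* sysfs reports errors on flush */
        ok = 0;
    return ok ? 0 : -1;
}

/* Read power/runtime_status, e.g. "active" or "suspended". */
static int runtime_status(const char *devdir, char *buf, int len)
{
    char path[512];
    if (pm_attr_path(path, sizeof path, devdir, "runtime_status") != 0)
        return -1;
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *got = fgets(buf, len, f);
    fclose(f);
    if (!got)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```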

>> That said, the Power Manager section in that document sounds a little
>> racy, as it seems to rely on asking userspace whether suspend is OK rather
>> than using userspace wakelocks, so I'm not sure how well baked this
>> doc is.
>>

I'm not sure I understand what you are saying here. If you're saying
that the kernel is asking userspace if suspend is ok, then I can say
that that's definitely not the case. Currently from the kernel's
perspective a lucid sleep resume isn't really different from a regular
resume. We have a hack in each driver that we care about that
basically boils down to an if statement that skips re-initialization
if we are entering lucid sleep. If userspace tries to access that
device in lucid sleep, it just gets an error. This has actually
caused us some headache (see the GPU process section of the doc),
which is why we'd like to switch to using the runtime suspend approach
I mentioned above.

If instead you're saying that the power manager needs to ask the rest
of userspace whether suspend is ok, you can think of the current
mechanism as a sort of timed wake lock. Daemons that care about lucid
sleep register with the power manager when they start up. The power
manager then waits for these daemons to report readiness while in
lucid sleep before starting another suspend. So each daemon
effectively acquires a wake lock when the system enters lucid sleep
and releases the wake lock when it reports readiness to the power
manager or the timeout occurs.
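The readiness scheme can be modelled as one implicit lock per registered daemon, released either by an explicit report or by a shared deadline. Below is a minimal single-threaded sketch of that bookkeeping. The structure and function names, and passing the clock in explicitly, are assumptions for illustration and are not taken from powerd. This bookkeeping only decides when userspace is done. It does nothing about wakeup events that arrive while the next suspend is being started.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DAEMONS 8

/* Hypothetical registry kept by the power manager. */
struct lucid_registry {
    size_t count;
    bool ready[MAX_DAEMONS];  /* daemon reported readiness */
    long long deadline_ms;    /* when the implicit locks expire */
};

/* Register a daemon at startup; returns its id or -1 if full. */
static int lucid_register(struct lucid_registry *r)
{
    if (r->count >= MAX_DAEMONS)
        return -1;
    r->ready[r->count] = false;
    return (int)r->count++;
}

/* Entering lucid sleep: every registered daemon implicitly takes a lock. */
static void lucid_enter(struct lucid_registry *r, long long now_ms,
                        long long timeout_ms)
{
    for (size_t i = 0; i < r->count; i++)
        r->ready[i] = false;
    r->deadline_ms = now_ms + timeout_ms;
}

/* A daemon reports readiness, releasing its lock. */
static void lucid_report_ready(struct lucid_registry *r, int id)
{
    if (id >= 0 && (size_t)id < r->count)
        r->ready[id] = true;
}

/* Suspend again once all locks are released or the timeout has passed. */
static bool lucid_may_resuspend(const struct lucid_registry *r,
                                long long now_ms)
{
    if (now_ms >= r->deadline_ms)
        return true;
    for (size_t i = 0; i < r->count; i++)
        if (!r->ready[i])
            return false;
    return true;
}
```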

>> Olof: Can you comment on who's working on that design doc? Also, the
>> discussion around using freezing cgroups separately to distinguish
>> between lucid and awake is interesting, but I wonder whether we need to
>> make wakeup_sources/wakelocks cgroup-aware. Or has that already been
>> done?
>

Currently cgroup process management is handled by the Chrome browser
since the only processes we freeze are Chrome renderers. A renderer
is basically a process that hosts the content for a single tab,
extension, plugin, etc. Freezing occurs when the system is about to
enter suspend from the fully awake state and thawing occurs during the
reverse transition. We've made no changes to the cgroup code and have
been using it as is.

>
> Sameer and Chirantan have both been deeply involved in that project,
> adding them on cc here.
>
>
> -Olof

2015-05-01 09:02:46

by Tomeu Vizoso

Subject: Re: A desktop environment[1] kernel wishlist

On 30 April 2015 at 20:54, Chirantan Ekbote <[email protected]> wrote:
> On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
>> Hi,
>>
>> On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
>>> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
>>>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>>>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
>>>>> wrote:
>>>>> > Hey,
>>>>> >
>>>>> > GNOME has had discussions with kernel developers in the past, and,
>>>>> > fortunately, in some cases we were able to make headway.
>>>>> >
>>>>> > There are however a number of items that we still don't have solutions
>>>>> > for, items that kernel developers might not realise we'd like to rely
>>>>> > on, or don't know that we'd make use of if merged.
>>>>> >
>>>>> > I've posted this list at:
>>>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>>>>> >
>>>>> > Let me know on-list or off-list if you have any comments about those,
>>>>> > so I can update the list.
>>>>>
>>>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>>>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>>>>> documentation'
>>>>>
>>>>> Can you expand more on the rationale for the need here? Is this for UI
>>>>> for power debugging, or something else?
>>>>
>>>> This is pretty much what I had in mind:
>>>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>>>>
>>>> I guess I didn't make myself understood.
>>>
>>> My admittedly quick skim of that design document seems to suggest
>>> that lucid sleep would be a new kernel state. That would keep the
>>> kernel in charge of determining the state transitions (i.e.:
>>> SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
>>> It seems userspace would then be able to query the current state. This
>>> avoids some of the races I was concerned about when trying to detect,
>>> from userspace, which IRQ woke us from suspend.
>>>
>
> Tomeu has been working on making things so that we don't need a new
> kernel state. Adding him on cc so he can correct me if I say
> something wrong. The current idea is to have userspace runtime
> suspend any unneeded devices before starting a suspend. This way the
> kernel will leave them alone at resume. This behavior already exists
> in the mainline kernel.

That's right. I have one series in flight about removing any
non-runtime device resumes from my test platform (a nyan-big) and
another about entering suspend-to-idle with ticks frozen (also on
Tegra124/nyan-big).

> Getting the wakeup reason can be accomplished
> by having the kernel emit a uevent with the wakeup reason. This is
> the only change that would be necessary.

My current opinion is that for ChromeOS there's no need for the kernel
to communicate a wakeup reason to userspace. From what I know, it
should be enough for powerd/upower/whatever to constantly monitor all
relevant input devices; that should tell whether or not the wakeup was
caused by user activity.

Once userspace is thawed, the power management daemon would read any
pending events from the input devices and decide whether to stay in
dark resume or to unfreeze the renderer processes, power up the display,
unpause any audio streams, etc.
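A rough sketch of that check follows. It assumes the daemon already holds file descriptors for the relevant evdev nodes, opened with O_NONBLOCK. The idea is to drain whatever is queued and count any key press, relative motion, or absolute (touch) event as user activity. This is not the ChromeOS implementation, only an illustration of the idea. `drain_user_activity` and its classification rule are assumptions.

```c
#include <errno.h>
#include <linux/input.h>
#include <stdbool.h>
#include <unistd.h>

/* Does this event look like a person interacting with the machine? */
static bool is_user_activity(const struct input_event *ev)
{
    switch (ev->type) {
    case EV_KEY:
        return ev->value == 1;   /* key or button press, not release */
    case EV_REL:                 /* mouse motion, wheel */
    case EV_ABS:                 /* touchpad, touchscreen */
        return true;
    default:
        return false;            /* EV_SYN, EV_MSC, ... */
    }
}

/* Read every queued event from fd. Returns 1 if any of them was user
 * activity, 0 if none, -1 on a read error. With O_NONBLOCK the loop ends
 * on EAGAIN once the queue is empty. */
static int drain_user_activity(int fd)
{
    struct input_event ev[64];
    int seen = 0;

    for (;;) {
        ssize_t n = read(fd, ev, sizeof ev);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? seen : -1;
        }
        if (n == 0)
            return seen;         /* end of file, e.g. a closed pipe */
        for (size_t i = 0; i < (size_t)n / sizeof ev[0]; i++)
            if (is_user_activity(&ev[i]))
                seen = 1;
    }
}
```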

For the GNOME folks, this is how it's done in ChromeOS:

https://chromium.googlesource.com/chromiumos/platform2/+/master/power_manager/powerd/system/input_watcher.cc

Of course, this depends on events not getting lost. In the ChromeOS
case, the firmware is under the control of the OS developers, so any
bugs can be fixed.

For GNOME and other desktop environments that aim to run on systems
whose firmware cannot be fixed, I think it could make sense for the
kernel to synthesize such events if it happens to have enough
information to do so.

Besides this issue, I think that what "only" remains to be done in the
kernel is to speed up resumes so no hacks are needed downstream. The
infrastructure for this already exists in the form of
power.direct_complete and suspend-to-idle and the work that remains is
mostly platform-specific.

This is to say that GNOME could start implementing lucid sleep right
now, though the user experience might not be ideal yet because resumes
might take longer than desired. But it might not matter to GNOME as
much as it does to ChromeOS because they aren't in control of the hw
anyway?

Regards,

Tomeu

>>> That said, the Power Manager section in that document sounds a little
>>> racy, as it seems to rely on asking userspace whether suspend is OK rather
>>> than using userspace wakelocks, so I'm not sure how well baked this
>>> doc is.
>>>
>
> I'm not sure I understand what you are saying here. If you're saying
> that the kernel is asking userspace if suspend is ok, then I can say
> that that's definitely not the case. Currently from the kernel's
> perspective a lucid sleep resume isn't really different from a regular
> resume. We have a hack in each driver that we care about that
> basically boils down to an if statement that skips re-initialization
> if we are entering lucid sleep. If userspace tries to access that
> device in lucid sleep, it just gets an error. This has actually
> caused us some headache (see the GPU process section of the doc),
> which is why we'd like to switch to using the runtime suspend approach
> I mentioned above.
>
> If instead you're saying that the power manager needs to ask the rest
> of userspace whether suspend is ok, you can think of the current
> mechanism as a sort of timed wake lock. Daemons that care about lucid
> sleep register with the power manager when they start up. The power
> manager then waits for these daemons to report readiness while in
> lucid sleep before starting another suspend. So each daemon
> effectively acquires a wake lock when the system enters lucid sleep
> and releases the wake lock when it reports readiness to the power
> manager or the timeout occurs.
>
>>> Olof: Can you comment on who's working on that design doc? Also, the
>>> discussion around using freezing cgroups separately to distinguish
>>> between lucid and awake is interesting, but I wonder whether we need to
>>> make wakeup_sources/wakelocks cgroup-aware. Or has that already been
>>> done?
>>
>
> Currently cgroup process management is handled by the Chrome browser
> since the only processes we freeze are Chrome renderers. A renderer
> is basically a process that hosts the content for a single tab,
> extension, plugin, etc. Freezing occurs when the system is about to
> enter suspend from the fully awake state and thawing occurs during the
> reverse transition. We've made no changes to the cgroup code and have
> been using it as is.
>
>>
>> Sameer and Chirantan have both been deeply involved in that project,
>> adding them on cc here.
>>
>>
>> -Olof
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2015-05-04 21:47:24

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Thursday, April 30, 2015 11:54:51 AM Chirantan Ekbote wrote:
> On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
> > Hi,
> >
> > On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
> >> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
> >>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:

Thanks for CCing me, John!

Let's also CC linux-pm as the people on that list may be generally interested
in this thread.

> >>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
> >>>> wrote:
> >>>> > Hey,
> >>>> >
> >>>> > GNOME has had discussions with kernel developers in the past, and,
> >>>> > fortunately, in some cases we were able to make headway.
> >>>> >
> >>>> > There are however a number of items that we still don't have solutions
> >>>> > for, items that kernel developers might not realise we'd like to rely
> >>>> > on, or don't know that we'd make use of if merged.
> >>>> >
> >>>> > I've posted this list at:
> >>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >>>> >
> >>>> > Let me know on-list or off-list if you have any comments about those,
> >>>> > so I can update the list.
> >>>>
> >>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >>>> documentation'
> >>>>
> >>>> Can you expand more on the rationale for the need here? Is this for UI
> >>>> for power debugging, or something else?
> >>>
> >>> This is pretty much what I had in mind:
> >>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
> >>>
> >>> I guess I didn't make myself understood.
> >>
> >> My admittedly quick skim of that design document seems to suggest
> >> that lucid sleep would be a new kernel state. That would keep the
> >> kernel in charge of determining the state transitions (i.e.:
> >> SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
> >> It seems userspace would then be able to query the current state. This
> >> avoids some of the races I was concerned about when trying to detect,
> >> from userspace, which IRQ woke us from suspend.
> >>
>
> Tomeu has been working on making things so that we don't need a new
> kernel state.

Which is good, because adding a new kernel state like that to the mainline is
out of the question as far as I'm concerned.

> Adding him on cc so he can correct me if I say
> something wrong. The current idea is to have userspace runtime
> suspend any unneeded devices before starting a suspend. This way the
> kernel will leave them alone at resume. This behavior already exists
> in the mainline kernel. Getting the wakeup reason can be accomplished
> by having the kernel emit a uevent with the wakeup reason. This is
> the only change that would be necessary.

Well, that needs to be thought through carefully in my view, or it will
always be racy.

You cannot rely only on wakeup events that have already happened,
because something requiring you to bring up the full UI may happen at
any time. In particular, it may happen when you're about to suspend again.

For this reason, it looks like you need something along the lines of
the wakeup_count interface, but acting on subsets of devices.
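For reference, the existing wakeup_count handshake (described in the kernel's sysfs-power ABI documentation) goes roughly like this. First read /sys/power/wakeup_count (the read blocks while wakeup events are being processed). Then handle whatever is pending, write the same value back, and only then write to /sys/power/state. The write-back fails if new wakeup events were registered since the read. After a successful write, the kernel also aborts the suspend if another event arrives while it is in progress. Below is a minimal sketch. The directory is a parameter so the code can run against ordinary files, "mem" is an assumed target state, and the helper names are hypothetical.

```c
#include <stdio.h>

/* Open <dir>/<file>; returns NULL on failure. */
static FILE *open_attr(const char *dir, const char *file, const char *mode)
{
    char path[256];
    if (snprintf(path, sizeof path, "%s/%s", dir, file) >= (int)sizeof path)
        return NULL;
    return fopen(path, mode);
}

/* Step 1: snapshot the counter before deciding whether to suspend. */
static int read_wakeup_count(const char *dir, unsigned long *count)
{
    FILE *f = open_attr(dir, "wakeup_count", "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%lu", count) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Step 2: commit. Returns 0 after writing `state`, 1 if the write-back
 * failed (treated as new wakeup events: handle them and start over), or
 * -1 on other errors. */
static int suspend_if_unchanged(const char *dir, unsigned long count,
                                const char *state)
{
    FILE *f = open_attr(dir, "wakeup_count", "w");
    if (!f)
        return -1;
    fprintf(f, "%lu", count);
    if (fclose(f) != 0)          /* the write happens at flush time */
        return 1;
    f = open_attr(dir, "state", "w");
    if (!f)
        return -1;
    fputs(state, f);
    return fclose(f) == 0 ? 0 : -1;
}
```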

It actually shouldn't be too difficult to split the existing wakeup
counter into a number of subcounters each tracking a subset of wakeup
sources and one of them might be used as the "full UI wakeup" condition
trigger in principle.
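One way to picture that suggestion: keep the global counter that wakeup_count exposes, but also bump a per-class subcounter for every event, and let userspace snapshot both before handling events. The model below is purely illustrative. The classes, struct and function names are hypothetical and not an existing kernel interface. To be race-free, a real interface would have to do the comparison in the kernel at commit time, as wakeup_count does; the model only shows the bookkeeping and the resulting decision.

```c
/* Hypothetical classes of wakeup sources. */
enum wake_class { WC_BACKGROUND, WC_FULL_UI, WC_NCLASSES };

struct wake_counters {
    unsigned long total;             /* what wakeup_count reports today */
    unsigned long sub[WC_NCLASSES];  /* one subcounter per class */
};

/* Conceptually done in the kernel for every wakeup event. */
static void wake_event(struct wake_counters *c, enum wake_class cls)
{
    c->total++;
    c->sub[cls]++;
}

enum decision { RESUSPEND, STAY_DARK, BRING_UP_UI };

/* Compare a snapshot with the current values: any new event vetoes
 * re-suspending, and a new event in the full-UI class additionally
 * means the UI has to come up. */
static enum decision decide(const struct wake_counters *snap,
                            const struct wake_counters *now)
{
    if (now->sub[WC_FULL_UI] != snap->sub[WC_FULL_UI])
        return BRING_UP_UI;
    if (now->total != snap->total)
        return STAY_DARK;    /* handle the new events, then retry */
    return RESUSPEND;
}
```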

> >> That said, the Power Manager section in that document sounds a little
> >> racy, as it seems to rely on asking userspace whether suspend is OK rather
> >> than using userspace wakelocks, so I'm not sure how well baked this
> >> doc is.

You cannot be "a little" racy. Either you are racy, or you aren't. If you
are, it's only a matter of time until someone hits the race. How often
that will happen depends on how hard the race is to trigger and how many
users of the feature there are. Given enough users, quite a number of them
may be unhappy.

> I'm not sure I understand what you are saying here. If you're saying
> that the kernel is asking userspace if suspend is ok, then I can say
> that that's definitely not the case. Currently from the kernel's
> perspective a lucid sleep resume isn't really different from a regular
> resume. We have a hack in each driver that we care about that
> basically boils down to an if statement that skips re-initialization
> if we are entering lucid sleep. If userspace tries to access that
> device in lucid sleep, it just gets an error. This has actually
> caused us some headache (see the GPU process section of the doc),
> which is why we'd like to switch to using the runtime suspend approach
> I mentioned above.

That's a good plan, because that's the only way you can satisfy all of the
dependencies that may be involved.

> If instead you're saying that the power manager needs to ask the rest
> of userspace whether suspend is ok, you can think of the current
> mechanism as a sort of timed wake lock. Daemons that care about lucid
> sleep register with the power manager when they start up. The power
> manager then waits for these daemons to report readiness while in
> lucid sleep before starting another suspend. So each daemon
> effectively acquires a wake lock when the system enters lucid sleep
> and releases the wake lock when it reports readiness to the power
> manager or the timeout occurs.

I think what John meant was exactly what I said above: you need a
race-free mechanism to verify, when you're about to suspend again,
whether or not it is OK to do so (i.e. whether or not there are any
unhandled events that would have woken you up had they happened while
suspended). You *also* need to be able to determine (in a race-free way)
whether or not any of them require you to bring up the UI.

It looks like your use case is actually more complex than Android's. :-)


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-04 21:54:30

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Friday, May 01, 2015 11:02:19 AM Tomeu Vizoso wrote:
> On 30 April 2015 at 20:54, Chirantan Ekbote <[email protected]> wrote:
> > On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
> >> Hi,
> >>
> >> On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
> >>> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
> >>>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> >>>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]>
> >>>>> wrote:
> >>>>> > Hey,
> >>>>> >
> >>>>> > GNOME has had discussions with kernel developers in the past, and,
> >>>>> > fortunately, in some cases we were able to make headway.
> >>>>> >
> >>>>> > There are however a number of items that we still don't have solutions
> >>>>> > for, items that kernel developers might not realise we'd like to rely
> >>>>> > on, or don't know that we'd make use of if merged.
> >>>>> >
> >>>>> > I've posted this list at:
> >>>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >>>>> >
> >>>>> > Let me know on-list or off-list if you have any comments about those,
> >>>>> > so I can update the list.
> >>>>>
> >>>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >>>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >>>>> documentation'
> >>>>>
> >>>>> Can you expand more on the rationale for the need here? Is this for UI
> >>>>> for power debugging, or something else?
> >>>>
> >>>> This is pretty much what I had in mind:
> >>>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
> >>>>
> >>>> I guess I didn't make myself understood.
> >>>
> >>> My admittedly quick skim of that design document seems to suggest
> >>> that lucid sleep would be a new kernel state. That would keep the
> >>> kernel in charge of determining the state transitions (i.e.:
> >>> SUSPEND-(alarm)->LUCID-(wakelock release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE).
> >>> It seems userspace would then be able to query the current state. This
> >>> avoids some of the races I was concerned about when trying to detect,
> >>> from userspace, which IRQ woke us from suspend.
> >>>
> >
> > Tomeu has been working on making things so that we don't need a new
> > kernel state. Adding him on cc so he can correct me if I say
> > something wrong. The current idea is to have userspace runtime
> > suspend any unneeded devices before starting a suspend. This way the
> > kernel will leave them alone at resume. This behavior already exists
> > in the mainline kernel.
>
> That's right. I have one series in flight about removing any
> non-runtime device resumes from my test platform (a nyan-big) and
> another about entering suspend-to-idle with ticks frozen (also on
> Tegra124/nyan-big).
>
> > Getting the wakeup reason can be accomplished
> > by having the kernel emit a uevent with the wakeup reason. This is
> > the only change that would be necessary.
>
> My current opinion is that for ChromeOS there's no need for the kernel
> to communicate a wakeup reason to userspace. From what I know, it
> should be enough for powerd/upower/whatever to constantly monitor all
> relevant input devices; that should tell whether or not the wakeup was
> caused by user activity.

Dream on.

What about input events that leave no trace in the input buffers?

What about events occurring between the time you decided to suspend and
the time you actually suspended?

> Once userspace is thawed, the power management daemon would read any
> pending events from the input devices and decide whether to stay in
> dark resume or to unfreeze the renderer processes, power up the display,
> unpause any audio streams, etc.

And what if it decides to suspend again and a wakeup event
requiring you to bring up the UI occurs exactly at that time?

You really have the same race as Android does here, but in addition to that
you want to determine how far to go with the resume on the basis of which
wakeup sources are involved.

I'm leaving the below for the benefit of linux-pm readers.

> For the GNOME folks, this is how it's done in ChromeOS:
>
> https://chromium.googlesource.com/chromiumos/platform2/+/master/power_manager/powerd/system/input_watcher.cc
>
> Of course, this depends on events not getting lost. In the ChromeOS
> case, the firmware is under the control of the OS developers, so any
> bugs can be fixed.
>
> For GNOME and other desktop environments that aim to run on systems
> whose firmware cannot be fixed, I think it could make sense for the
> kernel to synthesize such events if it happens to have enough
> information to do so.
>
> Besides this issue, I think that what "only" remains to be done in the
> kernel is to speed up resumes so no hacks are needed downstream. The
> infrastructure for this already exists in the form of
> power.direct_complete and suspend-to-idle and the work that remains is
> mostly platform-specific.
>
> This is to say that GNOME could start implementing lucid sleep right
> now, though the user experience might not be ideal yet because resumes
> might take longer than desired. But it might not matter to GNOME as
> much as it does to ChromeOS because they aren't in control of the hw
> anyway?
>
> Regards,
>
> Tomeu
>
> >>> That said, the Power Manager section in that document sounds a little
> >>> racy, as it seems to rely on asking userspace whether suspend is OK rather
> >>> than using userspace wakelocks, so I'm not sure how well baked this
> >>> doc is.
> >>>
> >
> > I'm not sure I understand what you are saying here. If you're saying
> > that the kernel is asking userspace if suspend is ok, then I can say
> > that that's definitely not the case. Currently from the kernel's
> > perspective a lucid sleep resume isn't really different from a regular
> > resume. We have a hack in each driver that we care about that
> > basically boils down to an if statement that skips re-initialization
> > if we are entering lucid sleep. If userspace tries to access that
> > device in lucid sleep, it just gets an error. This has actually
> > caused us some headache (see the GPU process section of the doc),
> > which is why we'd like to switch to using the runtime suspend approach
> > I mentioned above.
> >
> > If instead you're saying that the power manager needs to ask the rest
> > of userspace whether suspend is ok, you can think of the current
> > mechanism as a sort of timed wake lock. Daemons that care about lucid
> > sleep register with the power manager when they start up. The power
> > manager then waits for these daemons to report readiness while in
> > lucid sleep before starting another suspend. So each daemon
> > effectively acquires a wake lock when the system enters lucid sleep
> > and releases the wake lock when it reports readiness to the power
> > manager or the timeout occurs.
> >
> >>> Olof: Can you comment on who's working on that design doc? Also, the
> >>> discussion around using freezing cgroups separately to distinguish
> >>> between lucid and awake is interesting, but I wonder whether we need to
> >>> make wakeup_sources/wakelocks cgroup-aware. Or has that already been
> >>> done?
> >>
> >
> > Currently cgroup process management is handled by the Chrome browser
> > since the only processes we freeze are Chrome renderers. A renderer
> > is basically a process that hosts the content for a single tab,
> > extension, plugin, etc. Freezing occurs when the system is about to
> > enter suspend from the fully awake state and thawing occurs during the
> > reverse transition. We've made no changes to the cgroup code and have
> > been using it as is.
> >
> >>
> >> Sameer and Chirantan have both been deeply involved in that project,
> >> adding them on cc here.
> >>
> >>
> >> -Olof
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-04 23:30:12

by Chirantan Ekbote


On Mon, May 4, 2015 at 3:12 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Thursday, April 30, 2015 11:54:51 AM Chirantan Ekbote wrote:
>> On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
>> > Hi,
>> >
>> > On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
>> >> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
>> >>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>
> Thanks for CCing me, John!
>
> Let's also CC linux-pm as the people on that list may be generally interested
> in this thread.
>
>> >>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]> wrote:
>> >>>> > Hey,
>> >>>> >
>> >>>> > GNOME has had discussions with kernel developers in the past, and,
>> >>>> > fortunately, in some cases we were able to make headway.
>> >>>> >
>> >>>> > There are however a number of items that we still don't have
>> >>>> > solutions for, items that kernel developers might not realise we'd
>> >>>> > like to rely on, or don't know that we'd make use of if merged.
>> >>>> >
>> >>>> > I've posted this list at:
>> >>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>> >>>> >
>> >>>> > Let me know on-list or off-list if you have any comments about
>> >>>> > those, so I can update the list.
>> >>>>
>> >>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> >>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> >>>> documentation'
>> >>>>
>> >>>> Can you expand more on the rationale for the need here? Is this for UI
>> >>>> for power debugging, or something else?
>> >>>
>> >>> This is pretty much what I had in mind:
>> >>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>> >>>
>> >>> I guess I didn't make myself understood.
>> >>
>> >> My, admittedly quick skim, of that design document seems to suggest
>> >> that lucid sleep would be a new kernel state. That would keep the
>> >> kernel in charge of determining the state transitions (ie:
>> >> SUSPEND-(alarm)->LUCID-(wakelock
>> >> release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE). Then it seems
>> >> userspace would be able to query the current state. This avoids some
>> >> of the races I was concerned with trying to detect which irq woke us
>> >> from suspend from userspace.
>> >>
>>
>> Tomeu has been working on making things so that we don't need a new
>> kernel state.
>
> Which is good, because adding a new kernel state like that to the mainline is
> out of the question as far as I'm concerned.
>
>> Adding him on cc so he can correct me if I say
>> something wrong. The current idea is to have userspace runtime
>> suspend any unneeded devices before starting a suspend. This way the
>> kernel will leave them alone at resume. This behavior already exists
>> in the mainline kernel. Getting the wakeup reason can be accomplished
>> by having the kernel emit a uevent with the wakeup reason. This is
>> the only change that would be necessary.
>
> Well, that needs to be thought through carefully in my view, or it will
> always be racy.
>
> You cannot really only rely on wakeup events that have already happened,
> because something requiring you to bring up the full UI may happen at
> any time. In particular, it may happen when you're about to suspend again.
>
> For this reason, it looks like you need something along the lines of
> the wakeup_count interface, but acting on subsets of devices.
>
> It actually shouldn't be too difficult to split the existing wakeup
> counter into a number of subcounters each tracking a subset of wakeup
> sources and one of them might be used as the "full UI wakeup" condition
> trigger in principle.
>

In the interest of brevity, I didn't go into the design of suspend /
resume in userspace in my last email but it seems like there's no way
around it.

Ignoring lucid sleep for a moment, here is how a regular suspend works
in the power manager. The power manager sees a suspend request either
because the user has been idle for X amount of time (usually 15
minutes) or the user explicitly requested it by closing the lid. The
power manager reads the value of /sys/power/wakeup_count and then
announces an imminent suspend to the rest of the system via DBus.
Interested applications (like the network manager and Chrome) will
have registered with the power manager to be informed about this when
they started up. For example, this is when Chrome would freeze its
renderer processes. The power manager will now wait for them to
report back their readiness to suspend. Once all applications have
reported ready (or the maximum timeout occurs), the power manager
performs some final preparations (like setting the resume brightness
for the display). The last thing the power manager does, right before
writing "mem" to /sys/power/state, is write the wakeup_count that it
read earlier to /sys/power/wakeup_count. If the write fails, the
power manager considers the suspend attempt failed, reads the new
wakeup_count, and starts a timer (usually 10 seconds) to retry the
suspend. The same thing happens if the write to /sys/power/state
fails.
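
A minimal C sketch of the wakeup_count handshake described above follows. It is not the power manager's actual code. The helper names are hypothetical, and the sysfs directory (normally /sys/power) is passed in only so the logic can be exercised against ordinary files. On a real system, reading wakeup_count may block while wakeup events are being processed, and writing the saved value back fails (typically with EINVAL) if new wakeup events were registered since the read.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Outcome of the final commit step; names are illustrative only. */
enum suspend_result {
    SUSPEND_OK = 0,
    SUSPEND_COUNT_MISMATCH, /* writing wakeup_count back failed */
    SUSPEND_STATE_FAILED    /* writing "mem" to state failed */
};

/* Step 1: read <dir>/wakeup_count before announcing the suspend. */
static int read_wakeup_count(const char *dir, unsigned long *count)
{
    char path[256], buf[32];
    snprintf(path, sizeof(path), "%s/wakeup_count", dir);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    char *line = fgets(buf, sizeof(buf), f);
    fclose(f);
    if (!line)
        return -1;
    char *end;
    errno = 0;
    *count = strtoul(buf, &end, 10);
    return (end == buf || errno != 0) ? -1 : 0;
}

/* Write a string to <dir>/<name> in one write(); *err receives errno on failure. */
static int write_sysfs(const char *dir, const char *name, const char *buf, int *err)
{
    char path[256];
    snprintf(path, sizeof(path), "%s/%s", dir, name);
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0) {
        *err = errno;
        return -1;
    }
    size_t len = strlen(buf);
    ssize_t n = write(fd, buf, len);
    if (n < 0 || (size_t)n != len) {
        *err = n < 0 ? errno : EIO;
        close(fd);
        return -1;
    }
    if (close(fd) < 0) {
        *err = errno;
        return -1;
    }
    return 0;
}

/* Step 2: after clients are ready, commit the saved count, then suspend. */
static enum suspend_result commit_suspend(const char *dir, unsigned long saved, int *err)
{
    char buf[32];
    snprintf(buf, sizeof(buf), "%lu", saved);
    if (write_sysfs(dir, "wakeup_count", buf, err) < 0)
        return SUSPEND_COUNT_MISMATCH; /* caller re-reads the count and retries later */
    if (write_sysfs(dir, "state", "mem", err) < 0)
        return SUSPEND_STATE_FAILED;
    return SUSPEND_OK;
}
```

Between read_wakeup_count() and commit_suspend() the daemon would broadcast its suspend announcement and wait for clients. A failed commit simply means reading a fresh count and scheduling a retry.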

It may be the case that the event that incremented the count happened
because a user was trying to cancel the suspend. The user could have
pressed some keys on the keyboard, touched the trackpad, opened the
lid, pressed the power button, etc, etc. For the keyboard and
trackpad these events make their way up to Chrome, which sends a user
activity message to the power manager. This is a message that Chrome
sends to the power manager even during regular activity, up to five
times a second, to prevent the idle timeout from occurring. When the
power manager sees this user activity message while waiting to retry
the suspend, it cancels the retry and sends out a DBus signal
informing the system that the suspend is now complete. For events
like opening the lid or pressing the power button, the power manager
monitors those events in /dev/input and simulates a user activity
message from Chrome if any of them fire.
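
The retry-and-cancel behaviour in the previous paragraph amounts to a small state machine. The sketch below models it with hypothetical names and a millisecond clock supplied by the caller. The 10 second delay comes from the description above; everything else is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the suspend retry logic; not powerd's code. */
struct retry_state {
    bool retry_pending;
    int64_t retry_at_ms;    /* when the next suspend attempt should start */
    int64_t retry_delay_ms; /* e.g. 10000 */
};

enum retry_event { RETRY_NONE, RETRY_NOW, RETRY_SUSPEND_DONE };

/* A suspend attempt failed: schedule a retry. */
static void on_suspend_failed(struct retry_state *s, int64_t now_ms)
{
    s->retry_pending = true;
    s->retry_at_ms = now_ms + s->retry_delay_ms;
}

/* User activity, real (from Chrome) or simulated (lid, power button). */
static enum retry_event on_user_activity(struct retry_state *s)
{
    if (!s->retry_pending)
        return RETRY_NONE;        /* only resets the idle timer elsewhere */
    s->retry_pending = false;     /* cancel the retry ... */
    return RETRY_SUSPEND_DONE;    /* ... and announce that suspend is over */
}

/* Main-loop tick: has the retry timer fired? */
static enum retry_event on_tick(struct retry_state *s, int64_t now_ms)
{
    if (s->retry_pending && now_ms >= s->retry_at_ms) {
        s->retry_pending = false;
        return RETRY_NOW;
    }
    return RETRY_NONE;
}
```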


Ok so now I can talk about how lucid sleep fits into all of this. The
power manager only considers a suspend attempt successful if the write
to /sys/power/state was successful. If the suspend was successful,
the power manager then reads another sysfs file
(/sys/power/wakeup_type in our kernel) to determine if the wakeup
event was user-triggered. If the event was user-triggered it sends
out a DBus signal announcing the end of the suspend, Chrome thaws its
renderer processes, the full UI comes back up, and the user can start
working. If the event was _not_ user-triggered (if it was the RTC or
NIC), the power manager sends out a different DBus signal announcing
that the system is in lucid sleep and will re-suspend soon. It will
then wait for all registered applications to report readiness to
suspend or for the max timeout to expire.
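
Waiting for every registered application, or for the maximum timeout, is the "timed wake lock" behaviour described elsewhere in this thread. Here is a hypothetical sketch of that bookkeeping. It assumes clients are identified by small integer IDs handed out at registration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_CLIENTS 16

/* Hypothetical state for clients registered for suspend notifications. */
struct readiness {
    int n_clients;
    bool ready[MAX_CLIENTS];
    int64_t deadline_ms;
};

/* A suspend or lucid-sleep round starts: nobody has reported ready yet. */
static void round_start(struct readiness *r, int64_t now_ms, int64_t timeout_ms)
{
    memset(r->ready, 0, sizeof(r->ready));
    r->deadline_ms = now_ms + timeout_ms;
}

/* A client reports readiness, i.e. it releases its timed wake lock. */
static void report_ready(struct readiness *r, int client)
{
    if (client >= 0 && client < r->n_clients)
        r->ready[client] = true;
}

/* True once every client is ready or the deadline has passed. */
static bool may_proceed(const struct readiness *r, int64_t now_ms)
{
    if (now_ms >= r->deadline_ms)
        return true;
    for (int i = 0; i < r->n_clients; i++)
        if (!r->ready[i])
            return false;
    return true;
}
```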

If it so happens that a user interacts with the system while it is in
this state, the power manager detects it via the user activity message
that Chrome sends. This could be real (keyboard, trackpad) or
simulated (lid, power button). Either way, the power manager responds
the same way: it announces the end of the suspend, Chrome thaws its
renderers, the full UI comes back up, and the user starts working. If
there is no user activity and all applications report ready, the power
manager gets ready to suspend the system again. Since the NIC is now
a wakeup source, the power manager doesn't read the wakeup_count until
after all applications have reported ready because accessing the
network could increment the wakeup_count and cause false positives.
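
The ordering rule in the previous paragraph fits in a tiny decision function. This is a hypothetical sketch. Its inputs are assumed to be computed elsewhere: user activity from Chrome or /dev/input, and client readiness or the timeout.

```c
#include <stdbool.h>

enum lucid_action {
    LUCID_WAIT,        /* keep waiting in lucid sleep */
    LUCID_FULL_RESUME, /* user activity seen: announce the end of suspend */
    LUCID_RESUSPEND    /* everyone ready: read wakeup_count, then suspend */
};

/* Hypothetical step evaluated whenever something changes in lucid sleep. */
static enum lucid_action lucid_step(bool user_activity, bool all_ready_or_timed_out)
{
    if (user_activity)
        return LUCID_FULL_RESUME; /* user activity always wins */
    if (all_ready_or_timed_out)
        return LUCID_RESUSPEND;   /* wakeup_count is sampled only now, so network
                                     traffic from clients' work is not counted */
    return LUCID_WAIT;
}
```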

If either the write to /sys/power/wakeup_count or /sys/power/state
fails from lucid sleep, the power manager re-announces that the system
is in lucid sleep and will re-suspend soon. It's actually a little
smart about this: it will only re-announce lucid sleep if there was a
wakeup_count mismatch or if the write to /sys/power/state returned
EBUSY. Other failures only trigger a simple retry and no DBus signal.
We do this because these wakeup events may legitimately trigger lucid
sleep. For example, more packets may arrive from the network or the
RTC may go off and applications don't perform work until they hear
from the power manager that the system is in lucid sleep. At this
point the power manager is back to waiting for applications to report
ready (or for the retry timer to fire). This process may repeat
multiple times if we keep getting wakeup events right as the system
tries to re-suspend.
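
The failure handling above reduces to a classification function. In this hypothetical sketch, any failed write to wakeup_count counts as a mismatch. Apart from EBUSY on /sys/power/state, the thread does not say which errno values the real power manager checks.

```c
#include <errno.h>

/* Which write failed on the way into suspend. */
enum failed_write { FAILED_WAKEUP_COUNT, FAILED_STATE };

enum failure_action {
    ACTION_RETRY,                 /* plain retry, no D-Bus signal */
    ACTION_REANNOUNCE_LUCID_SLEEP /* tell clients we are (still) in lucid sleep */
};

/* Hypothetical helper mirroring the rule described above. */
static enum failure_action classify_lucid_failure(enum failed_write which, int err)
{
    /* Assumption: a failed wakeup_count write means new wakeup events. */
    if (which == FAILED_WAKEUP_COUNT)
        return ACTION_REANNOUNCE_LUCID_SLEEP;
    /* EBUSY from /sys/power/state is treated the same way. */
    if (which == FAILED_STATE && err == EBUSY)
        return ACTION_REANNOUNCE_LUCID_SLEEP;
    return ACTION_RETRY;
}
```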


So that was a little long-winded but hopefully I've addressed all your
concerns about potential race conditions in this code. I simplified a
few bits because they would just complicate the discussion, but for the
most part this is how the feature works now. Having the kernel emit a
uevent with the wakeup event type would take the place of the power
manager reading from /sys/power/wakeup_type in this system but
wouldn't really affect anything else.


>> >> That said, the Power Manager section in that document sounds a little
>> >> racy as it seems to rely on asking userspace if suspend is ok, rather
>> >> than using userspace wakelocks, so I'm not sure how well baked this
>> >> doc is.
>
> You cannot be "a little" racy. Either you are racy, or you aren't. If you
> are, it's only a matter of time until someone hits the race. How often
> that will happen depends on how hard the race is to trigger and how many
> users of the feature there are. Given enough users, quite a number of them
> may be unhappy.
>
>> I'm not sure I understand what you are saying here. If you're saying
>> that the kernel is asking userspace if suspend is ok, then I can say
>> that that's definitely not the case. Currently from the kernel's
>> perspective a lucid sleep resume isn't really different from a regular
>> resume. We have a hack in each driver that we care about that
>> basically boils down to an if statement that skips re-initialization
>> if we are entering lucid sleep. If userspace tries to access that
>> device in lucid sleep, it just gets an error. This has actually
>> caused us some headache (see the GPU process section of the doc),
>> which is why we'd like to switch to using the runtime suspend approach
>> I mentioned above.
>
> That's a good plan, because that's the only way you can satisfy all of the
> dependencies that may be involved.
>
>> If instead you're saying that the power manager needs to ask the rest
>> of userspace whether suspend is ok, you can think of the current
>> mechanism as a sort of timed wake lock. Daemons that care about lucid
>> sleep register with the power manager when they start up. The power
>> manager then waits for these daemons to report readiness while in
>> lucid sleep before starting another suspend. So each daemon
>> effectively acquires a wake lock when the system enters lucid sleep
>> and releases the wake lock when it reports readiness to the power
>> manager or the timeout occurs.
>
> I think what John meant was exactly what I said above: You need a race
> free mechanism to verify whether or not it is OK to suspend again (ie.
> whether or not there are any unhandled events that would have woken you up
> had they happened while suspended) when you're about to. You *also* need
> to be able to determine (in a race free way) whether or not any of them
> require you to bring up the UI.
>
> It looks like your use case is actually more complex than the Android's one. :-)
>
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-05 06:05:59

by Tomeu Vizoso


On 5 May 2015 at 00:19, Rafael J. Wysocki <[email protected]> wrote:
> On Friday, May 01, 2015 11:02:19 AM Tomeu Vizoso wrote:
>> On 30 April 2015 at 20:54, Chirantan Ekbote <[email protected]> wrote:
>> > On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
>> >>> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
>> >>>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
>> >>>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]> wrote:
>> >>>>> > Hey,
>> >>>>> >
>> >>>>> > GNOME has had discussions with kernel developers in the past, and,
>> >>>>> > fortunately, in some cases we were able to make headway.
>> >>>>> >
>> >>>>> > There are however a number of items that we still don't have
>> >>>>> > solutions for, items that kernel developers might not realise we'd
>> >>>>> > like to rely on, or don't know that we'd make use of if merged.
>> >>>>> >
>> >>>>> > I've posted this list at:
>> >>>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
>> >>>>> >
>> >>>>> > Let me know on-list or off-list if you have any comments about
>> >>>>> > those, so I can update the list.
>> >>>>>
>> >>>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
>> >>>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
>> >>>>> documentation'
>> >>>>>
>> >>>>> Can you expand more on the rationale for the need here? Is this for UI
>> >>>>> for power debugging, or something else?
>> >>>>
>> >>>> This is pretty much what I had in mind:
>> >>>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
>> >>>>
>> >>>> I guess I didn't make myself understood.
>> >>>
>> >>> My, admittedly quick skim, of that design document seems to suggest
>> >>> that lucid sleep would be a new kernel state. That would keep the
>> >>> kernel in charge of determining the state transitions (ie:
>> >>> SUSPEND-(alarm)->LUCID-(wakelock
>> >>> release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE). Then it seems
>> >>> userspace would be able to query the current state. This avoids some
>> >>> of the races I was concerned with trying to detect which irq woke us
>> >>> from suspend from userspace.
>> >>>
>> >
>> > Tomeu has been working on making things so that we don't need a new
>> > kernel state. Adding him on cc so he can correct me if I say
>> > something wrong. The current idea is to have userspace runtime
>> > suspend any unneeded devices before starting a suspend. This way the
>> > kernel will leave them alone at resume. This behavior already exists
>> > in the mainline kernel.
>>
>> This is right, I have one series in flight about removing any
>> non-runtime device resumes from my test platform (a nyan-big) and
>> another about entering suspend-to-idle with ticks frozen (also on
>> Tegra124/nyan-big).
>>
>> > Getting the wakeup reason can be accomplished
>> > by having the kernel emit a uevent with the wakeup reason. This is
>> > the only change that would be necessary.
>>
>> My current opinion is that for ChromeOS there's no need for the kernel
>> to communicate a wakeup reason to userspace. From what I know it
>> should be enough for powerd/upower/whatever to constantly monitor all
>> relevant input devices and that should tell if the wakeup was caused
>> by user activity or not.
>
> Dream on.
>
> What about input events that leave no trace in the input buffers?

You mean that it can happen that, barring bugs on hw or fw, the kernel
knows what device "caused" the wakeup but doesn't have enough
information to report the full event (key pressed, touchpad motion,
etc) to userspace?

> What about events occurring between the time you've decided to suspend and
> you actually suspended?

What about them? How would wakeup_type help us there?

Regards,

Tomeu

>> Once userspace is thawed, the power management daemon would read any
>> pending events from the input devices and would decide whether to stay
>> on dark resume or whether to unfreeze renderer daemons and power up
>> the display, unpause any audio streams, etc.
>
> And what about if it decides to suspend again and a wakeup event
> requiring you to bring up the UI occurs exactly at that time?
>
> You really have the same race as Android does here, but in addition to that
> you want to determine how far to go with the resume on the basis of what
> wakeup sources are involved.
>
> I'm leaving the below for the benefit of linux-pm readers.
>
>> For the GNOME folks, this is how it's done in ChromeOS:
>>
>> https://chromium.googlesource.com/chromiumos/platform2/+/master/power_manager/powerd/system/input_watcher.cc
>>
>> Of course, this depends on events not getting lost. In the ChromeOS
>> case, the firmware is under the control of the OS developers, so any
>> bugs can be fixed.
>>
>> For GNOME and other desktop environments who aim to run on systems
>> whose firmware cannot be fixed, I think it could make sense for the
>> kernel to synthesize such events if it happens to have enough
>> information to do so.
>>
>> Besides this issue, I think that what "only" remains to be done in the
>> kernel is to speed up resumes so no hacks are needed downstream. The
>> infrastructure for this already exists in the form of
>> power.direct_complete and suspend-to-idle and the work that remains is
>> mostly platform-specific.
>>
>> This is to say that GNOME could start implementing lucid sleep right
>> now, though the user experience might not be ideal yet because resumes
>> might take longer than desired. But it might not matter to GNOME as
>> much as it does to ChromeOS because they aren't in control of the hw
>> anyway?
>>
>> Regards,
>>
>> Tomeu
>>
>> >>> That said, the Power Manager section in that document sounds a little
>> >>> racy as it seems to rely on asking userspace if suspend is ok, rather
>> >>> than using userspace wakelocks, so I'm not sure how well baked this
>> >>> doc is.
>> >>>
>> >
>> > I'm not sure I understand what you are saying here. If you're saying
>> > that the kernel is asking userspace if suspend is ok, then I can say
>> > that that's definitely not the case. Currently from the kernel's
>> > perspective a lucid sleep resume isn't really different from a regular
>> > resume. We have a hack in each driver that we care about that
>> > basically boils down to an if statement that skips re-initialization
>> > if we are entering lucid sleep. If userspace tries to access that
>> > device in lucid sleep, it just gets an error. This has actually
>> > caused us some headache (see the GPU process section of the doc),
>> > which is why we'd like to switch to using the runtime suspend approach
>> > I mentioned above.
>> >
>> > If instead you're saying that the power manager needs to ask the rest
>> > of userspace whether suspend is ok, you can think of the current
>> > mechanism as a sort of timed wake lock. Daemons that care about lucid
>> > sleep register with the power manager when they start up. The power
>> > manager then waits for these daemons to report readiness while in
>> > lucid sleep before starting another suspend. So each daemon
>> > effectively acquires a wake lock when the system enters lucid sleep
>> > and releases the wake lock when it reports readiness to the power
>> > manager or the timeout occurs.
>> >
>> >>> Olof: Can you comment on who's working on that design doc? Also the
>> >>> discussion around using freezing cgroups separately to distinguish
>> >>> between lucid and awake is interesting, but I wonder if we need to
>> >>> make wakeup_sources/wakelocks cgroup aware, or has that already been
>> >>> done?
>> >>
>> >
>> > Currently cgroup process management is handled by the Chrome browser
>> > since the only processes we freeze are Chrome renderers. A renderer
>> > is basically a process that hosts the content for a single tab,
>> > extension, plugin, etc. Freezing occurs when the system is about to
>> > enter suspend from the fully awake state and thawing occurs during the
>> > reverse transition. We've made no changes to the cgroup code and have
>> > been using it as is.
>> >
>> >>
>> >> Sameer and Chirantan have both been deeply involved in that project,
>> >> adding them on cc here.
>> >>
>> >>
>> >> -Olof
>
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-05 10:46:52

by Bastien Nocera


On Mon, 2015-05-04 at 16:30 -0700, Chirantan Ekbote wrote:
>
<snip>
> In the interest of brevity, I didn't go into the design of suspend /
> resume in userspace in my last email but it seems like there's no way
> around it.
>
> Ignoring lucid sleep for a moment, here is how a regular suspend works
> in the power manager. The power manager sees a suspend request either
> because the user has been idle for X amount of time (usually 15
> minutes) or the user explicitly requested it by closing the lid. The
> power manager reads the value of /sys/power/wakeup_count and then
> announces an imminent suspend to the rest of the system via DBus.
> Interested applications (like the network manager and Chrome) will
> have registered with the power manager to be informed about this when
> they started up. For example, this is when Chrome would freeze its
> renderer processes. The power manager will now wait for them to
> report back their readiness to suspend. Once all applications have
> reported ready (or the maximum timeout occurs), the power manager
> performs some final preparations (like setting the resume brightness
> for the display).

logind in systemd also does this, using block or delay inhibitors over
D-Bus:
http://www.freedesktop.org/wiki/Software/systemd/inhibit/

> The last thing the power manager does, right before
> writing "mem" to /sys/power/state, is write the wakeup_count that it
> read earlier to /sys/power/wakeup_count. If the write fails, the
> power manager considers the suspend attempt failed, reads the new
> wakeup_count, and starts a timer (usually 10 seconds) to retry the
> suspend. The same thing happens if the write to /sys/power/state
> fails.

Is this something that logind should do as well?

> It may be the case that the event that incremented the count happened
> because a user was trying to cancel the suspend. The user could have
> pressed some keys on the keyboard, touched the trackpad, opened the
> lid, pressed the power button, etc, etc. For the keyboard and
> trackpad these events make their way up to Chrome, which sends a user
> activity message to the power manager. This is a message that Chrome
> sends to the power manager even during regular activity, up to five
> times a second, to prevent the idle timeout from occurring. When the
> power manager sees this user activity message while waiting to retry
> the suspend, it cancels the retry and sends out a DBus signal
> informing the system that the suspend is now complete. For events
> like opening the lid or pressing the power button, the power manager
> monitors those events in /dev/input and simulates a user activity
> message from Chrome if any of them fire.
>
>
> Ok so now I can talk about how lucid sleep fits into all of this. The
> power manager only considers a suspend attempt successful if the write
> to /sys/power/state was successful. If the suspend was successful,
> the power manager then reads another sysfs file
> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
> event was user-triggered.

This is where your lucid sleep implementation matches up with what I
had in mind for that wishlist.

> If the event was user-triggered it sends
> out a DBus signal announcing the end of the suspend, Chrome thaws its
> renderer processes, the full UI comes back up, and the user can start
> working. If the event was _not_ user-triggered (if it was the RTC or
> NIC), the power manager sends out a different DBus signal announcing
> that the system is in lucid sleep and will re-suspend soon. It will
> then wait for all registered applications to report readiness to
> suspend or for the max timeout to expire.
>
> If it so happens that a user interacts with the system while it is in
> this state, the power manager detects it via the user activity message
> that Chrome sends. This could be real (keyboard, trackpad) or
> simulated (lid, power button). Either way, the power manager responds
> the same way: it announces the end of the suspend, Chrome thaws its
> renderers, the full UI comes back up, and the user starts working. If
> there is no user activity and all applications report ready, the power
> manager gets ready to suspend the system again. Since the NIC is now
> a wakeup source, the power manager doesn't read the wakeup_count until
> after all applications have reported ready because accessing the
> network could increment the wakeup_count and cause false positives.
>
> If either the write to /sys/power/wakeup_count or /sys/power/state
> fails from lucid sleep, the power manager re-announces that the system
> is in lucid sleep and will re-suspend soon. It's actually a little
> smart about this: it will only re-announce lucid sleep if there was a
> wakeup_count mismatch or if the write to /sys/power/state returned
> EBUSY. Other failures only trigger a simple retry and no DBus signal.
> We do this because these wakeup events may legitimately trigger lucid
> sleep. For example, more packets may arrive from the network or the
> RTC may go off and applications don't perform work until they hear
> from the power manager that the system is in lucid sleep. At this
> point the power manager is back to waiting for applications to report
> ready (or for the retry timer to fire). This process may repeat
> multiple times if we keep getting wakeup events right as the system
> tries to re-suspend.
>
>
> So that was a little long-winded but hopefully I've addressed all your
> concerns about potential race conditions in this code. I simplified a
> few bits because they would just complicate the discussion, but for the
> most part this is how the feature works now. Having the kernel emit a
> uevent with the wakeup event type would take the place of the power
> manager reading from /sys/power/wakeup_type in this system but
> wouldn't really affect anything else.

Could you or Tomeu explain the driver changes that are necessary? Do
only drivers that we want to be usable as wakeup sources need to be
changed? Do you have examples of such a patch?

Cheers

2015-05-05 12:07:12

by Rafael J. Wysocki


On Tuesday, May 05, 2015 08:05:32 AM Tomeu Vizoso wrote:
> On 5 May 2015 at 00:19, Rafael J. Wysocki <[email protected]> wrote:
> > On Friday, May 01, 2015 11:02:19 AM Tomeu Vizoso wrote:
> >> On 30 April 2015 at 20:54, Chirantan Ekbote <[email protected]> wrote:
> >> > On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson <[email protected]> wrote:
> >> >> Hi,
> >> >>
> >> >> On Thu, Apr 30, 2015 at 10:10 AM, John Stultz <[email protected]> wrote:
> >> >>> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera <[email protected]> wrote:
> >> >>>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> >> >>>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera <[email protected]> wrote:
> >> >>>>> > Hey,
> >> >>>>> >
> >> >>>>> > GNOME has had discussions with kernel developers in the past, and,
> >> >>>>> > fortunately, in some cases we were able to make headway.
> >> >>>>> >
> >> >>>>> > There are however a number of items that we still don't have
> >> >>>>> > solutions for, items that kernel developers might not realise we'd
> >> >>>>> > like to rely on, or don't know that we'd make use of if merged.
> >> >>>>> >
> >> >>>>> > I've posted this list at:
> >> >>>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >> >>>>> >
> >> >>>>> > Let me know on-list or off-list if you have any comments about
> >> >>>>> > those, so I can update the list.
> >> >>>>>
> >> >>>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >> >>>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >> >>>>> documentation'
> >> >>>>>
> >> >>>>> Can you expand more on the rationale for the need here? Is this for UI
> >> >>>>> for power debugging, or something else?
> >> >>>>
> >> >>>> This is pretty much what I had in mind:
> >> >>>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
> >> >>>>
> >> >>>> I guess I didn't make myself understood.
> >> >>>
> >> >>> My, admittedly quick skim, of that design document seems to suggest
> >> >>> that lucid sleep would be a new kernel state. That would keep the
> >> >>> kernel in charge of determining the state transitions (ie:
> >> >>> SUSPEND-(alarm)->LUCID-(wakelock
> >> >>> release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE). Then it seems
> >> >>> userspace would be able to query the current state. This avoids some
> >> >>> of the races I was concerned with trying to detect which irq woke us
> >> >>> from suspend from userspace.
> >> >>>
> >> >
> >> > Tomeu has been working on making things so that we don't need a new
> >> > kernel state. Adding him on cc so he can correct me if I say
> >> > something wrong. The current idea is to have userspace runtime
> >> > suspend any unneeded devices before starting a suspend. This way the
> >> > kernel will leave them alone at resume. This behavior already exists
> >> > in the mainline kernel.
> >>
> >> This is right, I have one series in flight about removing any
> >> non-runtime device resumes from my test platform (a nyan-big) and
> >> another about entering suspend-to-idle with ticks frozen (also on
> >> Tegra124/nyan-big).
> >>
> >> > Getting the wakeup reason can be accomplished
> >> > by having the kernel emit a uevent with the wakeup reason. This is
> >> > the only change that would be necessary.
> >>
> >> My current opinion is that for ChromeOS there's no need for the kernel
> >> to communicate a wakeup reason to userspace. From what I know it
> >> should be enough for powerd/upower/whatever to constantly monitor all
> >> relevant input devices and that should tell if the wakeup was caused
> >> by user activity or not.
> >
> > Dream on.
> >
> > What about input events that leave no trace in the input buffers?
>
> You mean that it can happen that, barring bugs on hw or fw, the kernel
> knows what device "caused" the wakeup but doesn't have enough
> information to report the full event (key pressed, touchpad motion,
> etc) to userspace?

Yes.

For example, when you wake up from S3 on ACPI-based systems, the best you
can get is which devices generated the wakeup events, but no input data
is available from that (for example, you won't know which key was
pressed). You may not even get that: you may only know which GPEs
caused the wakeup, and they may be shared.

For PCI wakeup, the wakeup event may be out of band. You need to walk
the hierarchy and check the PME status bits to identify the wakeup device,
and then you need to be careful not to reset it while putting it into
D0, so that the input data associated with the event remains available.
I'm not sure how many device/driver combinations this actually works for.

For USB wakeup, you get the wakeup event from the controller which may be
a PCI device. Getting to the USB device itself from there requires some
work and even then the device may not "remember" what exactly happened.

Further, if you wake up via the PC keyboard from suspend-to-idle, the
wakeup key code is not available; the only thing you know is that the
interrupt has occurred (that may change, but it's how the current
code works).

I guess I could continue with that.
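To illustrate the limit described above: about the most userspace can do on mainline is snapshot the per-source counters in debugfs before suspending and diff them after resume. The sketch below assumes debugfs is mounted at /sys/kernel/debug, that the caller can read it (normally root only), and that the file keeps its tab-separated layout with an `event_count` column. It tells you which wakeup sources fired, never which key was pressed, and it also counts events that happened while the system was still awake.

```python
from pathlib import Path

# Assumption: debugfs is mounted here and readable by the caller (usually root).
WAKEUP_SOURCES = Path("/sys/kernel/debug/wakeup_sources")


def _fields(line: str) -> list[str]:
    # Columns are tab-separated; names may be space-padded, so strip each field.
    return [f.strip() for f in line.split("\t") if f.strip()]


def parse_wakeup_sources(text: str) -> dict[str, int]:
    """Map each wakeup source name to its event_count value."""
    lines = text.splitlines()
    col = _fields(lines[0]).index("event_count")
    counts = {}
    for line in lines[1:]:
        fields = _fields(line)
        if len(fields) > col:
            counts[fields[0]] = int(fields[col])
    return counts


def snapshot(path: Path = WAKEUP_SOURCES) -> dict[str, int]:
    return parse_wakeup_sources(path.read_text())


def sources_that_fired(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Names whose event_count grew between the two snapshots."""
    return sorted(n for n, c in after.items() if c > before.get(n, 0))
```

A power manager would call `snapshot()` right before writing to /sys/power/state and again after resume; shared GPEs or out-of-band PCI wakeups may still leave the diff empty or ambiguous.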

> > What about events occurring between the time you've decided to suspend and
> > you actually suspended?
>
> What about them? How would wakeup_type help us there?

Well, what do you do to ensure that you don't miss them? That's what wakeup_count
can be used for (if that's what you mean).
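For readers unfamiliar with it, the wakeup_count handshake goes roughly like this: read the count, do the userspace preparation, write the same value back, and only then write the sleep state. The kernel rejects the write-back if wakeup events were registered in between, and once it has been accepted, later wakeup events abort the suspend, so nothing falls through the gap. A minimal sketch, with the sysfs directory made a parameter so it can be exercised against ordinary files:

```python
from pathlib import Path

POWER_DIR = Path("/sys/power")


def read_wakeup_count(power_dir: Path = POWER_DIR) -> str:
    # On a real system this read blocks while wakeup events are being processed.
    return (power_dir / "wakeup_count").read_text().strip()


def try_suspend(saved_count: str, power_dir: Path = POWER_DIR) -> bool:
    """Attempt a suspend-to-RAM; return False if it must be retried later."""
    try:
        # Rejected by the kernel if any wakeup event was registered after
        # saved_count was read; the error surfaces as OSError.
        (power_dir / "wakeup_count").write_text(saved_count)
        # Returns after resume, or raises (e.g. EBUSY) if the suspend aborted.
        (power_dir / "state").write_text("mem")
    except OSError:
        return False
    return True
```

The caller reads the count first, notifies other processes and waits for them, then calls `try_suspend()`; on False it re-reads the count before trying again.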

Thanks,
Rafael


> >> Once userspace is thawed, the power management daemon would read any
> >> pending events from the input devices and would decide whether to stay
> >> on dark resume or whether to unfreeze renderer daemons and power up
> >> the display, unpause any audio streams, etc.
> >
> > And what if it decides to suspend again and a wakeup event
> > requiring you to bring up the UI occurs exactly at that time?
> >
> > You really have the same race as Android does here, but in addition to that
> > you want to determine how far to go with the resume on the basis of what
> > wakeup sources are involved.
> >
> > I'm leaving the below for the benefit of linux-pm readers.
> >
> >> For the GNOME folks, this is how it's done in ChromeOS:
> >>
> >> https://chromium.googlesource.com/chromiumos/platform2/+/master/power_manager/powerd/system/input_watcher.cc
> >>
> >> Of course, this depends on events not getting lost. In the ChromeOS
> >> case, the firmware is under the control of the OS developers, so any
> >> bugs can be fixed.
> >>
> >> For GNOME and other desktop environments who aim to run on systems
> >> whose firmware cannot be fixed, I think it could make sense for the
> >> kernel to synthesize such events if it happens to have enough
> >> information to do so.
> >>
> >> Besides this issue, I think that what "only" remains to be done in the
> >> kernel is to speed up resumes so no hacks are needed downstream. The
> >> infrastructure for this already exists in the form of
> >> power.direct_complete and suspend-to-idle and the work that remains is
> >> mostly platform-specific.
> >>
> >> This is to say that GNOME could start implementing lucid sleep right
> >> now, though the user experience might not be ideal yet because resumes
> >> might take longer than desired. But it might not matter to GNOME as
> >> much as it does to ChromeOS because they aren't in control of the hw
> >> anyway?
> >>
> >> Regards,
> >>
> >> Tomeu
> >>
> >> >>> That said, the Power Manager section in that document sounds a little
> >> >>> racy as it seems to rely on asking userspace if suspend is ok, rather
> >> >>> than using userspace wakelocks, so I'm not sure how well baked this
> >> >>> doc is.
> >> >>>
> >> >
> >> > I'm not sure I understand what you are saying here. If you're saying
> >> > that the kernel is asking userspace if suspend is ok, then I can say
> >> > that that's definitely not the case. Currently from the kernel's
> >> > perspective a lucid sleep resume isn't really different from a regular
> >> > resume. We have a hack in each driver that we care about that
> >> > basically boils down to an if statement that skips re-initialization
> >> > if we are entering lucid sleep. If userspace tries to access that
> >> > device in lucid sleep, it just gets an error. This has actually
> >> > caused us some headache (see the GPU process section of the doc),
> >> > which is why we'd like to switch to using the runtime suspend approach
> >> > I mentioned above.
> >> >
> >> > If instead you're saying that the power manager needs to ask the rest
> >> > of userspace whether suspend is ok, you can think of the current
> >> > mechanism as a sort of timed wake lock. Daemons that care about lucid
> >> > sleep register with the power manager when they start up. The power
> >> > manager then waits for these daemons to report readiness while in
> >> > lucid sleep before starting another suspend. So each daemon
> >> > effectively acquires a wake lock when the system enters lucid sleep
> >> > and releases the wake lock when it reports readiness to the power
> >> > manager or the timeout occurs.
> >> >
> >> >>> Olof: Can you comment on who's working on that design doc? Also the
> >> >>> discussion around using freezing cgroups separately to distinguish
> >> >>> between lucid and awake is interesting, but I wonder if we need to
> >> >>> make wakeup_sources/wakelocks cgroup aware, or has that already been
> >> >>> done?
> >> >>
> >> >
> >> > Currently cgroup process management is handled by the Chrome browser
> >> > since the only processes we freeze are Chrome renderers. A renderer
> >> > is basically a process that hosts the content for a single tab,
> >> > extension, plugin, etc. Freezing occurs when the system is about to
> >> > enter suspend from the fully awake state and thawing occurs during the
> >> > reverse transition. We've made no changes to the cgroup code and have
> >> > been using it as is.
> >> >
> >> >>
> >> >> Sameer and Chirantan have both been deeply involved in that project,
> >> >> adding them on cc here.
> >> >>
> >> >>
> >> >> -Olof
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-05 16:07:36

by Alan Stern

Subject: Re: A desktop environment[1] kernel wishlist

On Mon, 4 May 2015, Chirantan Ekbote wrote:

> Ok so now I can talk about how lucid sleep fits into all of this. The
> power manager only considers a suspend attempt successful if the write
> to /sys/power/state was successful. If the suspend was successful,
> the power manager then reads another sysfs file
> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
> event was user-triggered. If the event was user-triggered it sends
> out a DBus signal announcing the end of the suspend, Chrome thaws its
> renderer processes, the full UI comes back up, and the user can start
> working. If the event was _not_ user-triggered (if it was the RTC or
> NIC), the power manager sends out a different DBus signal announcing
> that the system is in lucid sleep and will re-suspend soon. It will
> then wait for all registered applications to report readiness to
> suspend or for the max timeout to expire.
>
> If it so happens that a user interacts with the system while it is in
> this state, the power manager detects it via the user activity message
> that Chrome sends. This could be real (keyboard, trackpad) or
> simulated (lid, power button). Either way, the power manager responds
> the same way: it announces the end of the suspend, Chrome thaws its
> renderers, the full UI comes back up, and the user starts working. If
> there is no user activity and all applications report ready, the power
> manager gets ready to suspend the system again. Since the NIC is now
> a wakeup source, the power manager doesn't read the wakeup_count until
> after all applications have reported ready because accessing the
> network could increment the wakeup_count and cause false positives.

This gives only an implicit description of the difference between
"lucid sleep" and normal resume. I gather that "lucid sleep" means you
avoid reactivating some parts of the hardware (in particular the parts
used by the renderers), and perhaps also you shorten the inactivity
delay before the next suspend.

Is that right? The main point of "lucid sleep" is to allow the system
to do a partial resume, in which some of the most power-hungry parts
remain suspended?

If that's true then some of the recent work on "direct-complete" may be
sufficient to do what you want. See commit f71495f3f0c5 (PM / sleep:
Update device PM documentation to cover direct_complete) and the
preceding commits. The idea is that under some circumstances, devices
that were in runtime-suspend before a system sleep can remain in
runtime-suspend when the system wakes up.

If you can force the rendering hardware to go into runtime-suspend when
starting a system suspend, it can remain powered down when the system
wakes up. Then if it isn't needed for anything (because you suspend
the system again before doing any UI activity), it can remain powered
down the entire time. No need for a separate "lucid sleep" state.
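The userspace half of this suggestion uses existing runtime PM attributes: writing "auto" to a device's power/control allows the PM core to runtime-suspend it once it is idle, and power/runtime_status reports the result. A sketch under these assumptions: the caller already knows which device directories belong to the rendering hardware (any paths are hypothetical), and it polls again or gives up on devices still listed as pending. Whether a suspended device actually stays off across the system resume still depends on its driver and the direct_complete conditions, which userspace cannot force.

```python
from pathlib import Path


def allow_runtime_suspend(device: Path) -> None:
    # "auto": the PM core may runtime-suspend the device when it is idle.
    # "on" would keep it active.
    (device / "power" / "control").write_text("auto")


def is_runtime_suspended(device: Path) -> bool:
    status = (device / "power" / "runtime_status").read_text().strip()
    return status == "suspended"


def prepare_for_lucid_resume(devices: list[Path]) -> list[Path]:
    """Allow runtime suspend on each device; return those not suspended yet."""
    pending = []
    for dev in devices:
        allow_runtime_suspend(dev)
        if not is_runtime_suspended(dev):
            pending.append(dev)
    return pending
```

Writing "auto" does not suspend the device immediately; the autosuspend delay still applies, which is why the function reports stragglers instead of assuming success.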

Alan Stern

2015-05-05 17:58:39

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, May 5, 2015 at 7:39 AM, Alan Stern wrote:
> On Mon, 4 May 2015, Chirantan Ekbote wrote:
>
>> Ok so now I can talk about how lucid sleep fits into all of this. The
>> power manager only considers a suspend attempt successful if the write
>> to /sys/power/state was successful. If the suspend was successful,
>> the power manager then reads another sysfs file
>> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
>> event was user-triggered. If the event was user-triggered it sends
>> out a DBus signal announcing the end of the suspend, Chrome thaws its
>> renderer processes, the full UI comes back up, and the user can start
>> working. If the event was _not_ user-triggered (if it was the RTC or
>> NIC), the power manager sends out a different DBus signal announcing
>> that the system is in lucid sleep and will re-suspend soon. It will
>> then wait for all registered applications to report readiness to
>> suspend or for the max timeout to expire.
>>
>> If it so happens that a user interacts with the system while it is in
>> this state, the power manager detects it via the user activity message
>> that Chrome sends. This could be real (keyboard, trackpad) or
>> simulated (lid, power button). Either way, the power manager responds
>> the same way: it announces the end of the suspend, Chrome thaws its
>> renderers, the full UI comes back up, and the user starts working. If
>> there is no user activity and all applications report ready, the power
>> manager gets ready to suspend the system again. Since the NIC is now
>> a wakeup source, the power manager doesn't read the wakeup_count until
>> after all applications have reported ready because accessing the
>> network could increment the wakeup_count and cause false positives.
>
> This gives only an implicit description of the difference between
> "lucid sleep" and normal resume. I gather that "lucid sleep" means you
> avoid reactivating some parts of the hardware (in particular the parts
> used by the renderers), and perhaps also you shorten the inactivity
> delay before the next suspend.
>
> Is that right? The main point of "lucid sleep" is to allow the system
> to do a partial resume, in which some of the most power-hungry parts
> remain suspended?
>

Yes, that is correct.

> If that's true then some of the recent work on "direct-complete" may be
> sufficient to do what you want. See commit f71495f3f0c5 (PM / sleep:
> Update device PM documentation to cover direct_complete) and the
> preceding commits. The idea is that under some circumstances, devices
> that were in runtime-suspend before a system sleep can remain in
> runtime-suspend when the system wakes up.
>
> If you can force the rendering hardware to go into runtime-suspend when
> starting a system suspend, it can remain powered down when the system
> wakes up. Then if it isn't needed for anything (because you suspend
> the system again before doing any UI activity), it can remain powered
> down the entire time. No need for a separate "lucid sleep" state.
>

This is our plan for the next version (see my email earlier in this
thread). Keeping a hybrid power state with hacks in the drivers isn't
really maintainable, scalable, or upstream-able and has caused us some
headaches already. Unfortunately we were working with the 3.14 kernel
at the time, which didn't have the framework necessary to do anything
else. The new version of lucid sleep will have the power manager
runtime suspend power-hungry devices before a suspend so that they
remain powered off at resume time. The power manager can then decide
to resume those devices based on whether the wakeup event was
user-triggered.

Being able to determine the wakeup type from userspace is the only
functionality we need from the kernel that doesn't already exist in
mainline.
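The resume side of that plan might look like the following sketch. It assumes the ChromeOS-only /sys/power/wakeup_type file reports the string "user" for user-triggered wakeups (the actual value format is not given in this thread, so treat it as hypothetical), and that the UI devices were left runtime-suspended before the system suspend. Writing "on" to power/control forbids runtime PM and therefore runtime-resumes the device.

```python
from pathlib import Path


def read_wakeup_type(power_dir: Path = Path("/sys/power")) -> str:
    # wakeup_type exists only in the ChromeOS kernel, not in mainline.
    return (power_dir / "wakeup_type").read_text().strip()


def handle_resume(wakeup_type: str, ui_devices: list[Path]) -> str:
    """Bring the UI hardware back only for user-triggered wakeups."""
    if wakeup_type == "user":  # hypothetical value
        for dev in ui_devices:
            # "on" forbids runtime PM, which runtime-resumes the device.
            (dev / "power" / "control").write_text("on")
        return "full-resume"
    # RTC, NIC, ...: leave the UI devices off and stay in lucid sleep.
    return "lucid-sleep"
```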

> Alan Stern
>

2015-05-05 19:22:11

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, May 5, 2015 at 3:46 AM, Bastien Nocera wrote:
> On Mon, 2015-05-04 at 16:30 -0700, Chirantan Ekbote wrote:
>>
> <snip>
>> In the interest of brevity, I didn't go into the design of suspend /
>> resume in userspace in my last email but it seems like there's no way
>> around it.
>>
>> Ignoring lucid sleep for a moment, here is how a regular suspend works
>> in the power manager. The power manager sees a suspend request either
>> because the user has been idle for X amount of time (usually 15
>> minutes) or the user explicitly requested it by closing the lid. The
>> power manager reads the value of /sys/power/wakeup_count and then
>> announces an imminent suspend to the rest of the system via DBus.
>> Interested applications (like the network manager and Chrome) will
>> have registered with the power manager to be informed about this when
>> they started up. For example, this is when Chrome would freeze its
>> renderer processes. The power manager will now wait for them to
>> report back their readiness to suspend. Once all applications have
>> reported ready (or the maximum timeout occurs), the power manager
>> performs some final preparations (like setting the resume brightness
>> for the display).
>
> logind in systemd also does this, using block or delay inhibitors over
> D-Bus:
> http://www.freedesktop.org/wiki/Software/systemd/inhibit/
>
>> The last thing the power manager does, right before
>> writing "mem" to /sys/power/state, is write the wakeup_count that it
>> read earlier to /sys/power/wakeup_count. If the write fails, the
>> power manager considers the suspend attempt failed, reads the new
>> wakeup_count, and starts a timer (usually 10 seconds) to retry the
>> suspend. The same thing happens if the write to /sys/power/state
>> fails.
>
> Is this something that logind should do as well?
>

We do it to avoid a race condition where a wakeup event occurs after
userspace has started the suspend process but before anything writes
"mem" to /sys/power/state. I'm guessing that this is something logind
should be doing as well since the chances of missing a wakeup event
increase the longer any given delay inhibitor takes to delay a
suspend.
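The failure handling described above, including the lucid-sleep refinement mentioned later in the thread (re-announce only on a wakeup_count mismatch or EBUSY from /sys/power/state), fits in one decision function that logind or any other power manager could reuse. This is a sketch of the described behavior, not code from ChromeOS's power manager; the action names are made up.

```python
import errno
from enum import Enum, auto
from typing import Optional

RETRY_DELAY_S = 10  # "usually 10 seconds" in the description above


class Action(Enum):
    CANCEL_SUSPEND = auto()     # user is back: drop the retry, announce suspend done
    REANNOUNCE_LUCID = auto()   # tell apps lucid sleep continues, then retry
    RETRY = auto()              # quiet retry after RETRY_DELAY_S


def on_suspend_failure(count_mismatch: bool, state_errno: Optional[int],
                       in_lucid_sleep: bool, user_active: bool) -> Action:
    """Decide what to do after writing wakeup_count or state failed.

    user_active: a user-activity message arrived while the retry was pending.
    """
    if user_active:
        return Action.CANCEL_SUSPEND
    if in_lucid_sleep and (count_mismatch or state_errno == errno.EBUSY):
        return Action.REANNOUNCE_LUCID
    return Action.RETRY
```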

>> It may be the case that the event that incremented the count happened
>> because a user was trying to cancel the suspend. The user could have
>> pressed some keys on the keyboard, touched the trackpad, opened the
>> lid, pressed the power button, etc, etc. For the keyboard and
>> trackpad these events make their way up to Chrome, which sends a user
>> activity message to the power manager. This is a message that Chrome
>> sends to the power manager even during regular activity, up to five
>> times a second, to prevent the idle timeout from occurring. When the
>> power manager sees this user activity message while waiting to retry
>> the suspend, it cancels the retry and sends out a DBus signal
>> informing the system that the suspend is now complete. For events
>> like opening the lid or pressing the power button, the power manager
>> monitors those events in /dev/input and simulates a user activity
>> message from chrome if any of them fire.
>>
>> <snip>
>>
>> So that was a little long-winded, but hopefully I've addressed all your
>> concerns about potential race conditions in this code. I simplified a
>> few bits because they would just complicate the discussion, but for the
>> most part this is how the feature works now. Having the kernel emit a
>> uevent with the wakeup event type would take the place of the power
>> manager reading from /sys/power/wakeup_type in this system but
>> wouldn't really affect anything else.
>
> Could you or Tomeu explain the driver changes that are necessary? Do
> only drivers that we want to be usable as wakeup sources need to be
> changed? Do you have examples of such a patch?
>

You will need driver support for anything that needs to trigger a
wakeup event. The RTC already supports this but for wifi we needed
both driver and firmware support from Intel. Assuming you choose to
go down the "runtime-suspend devices to keep them suspended on resume"
path, then you would also need runtime suspend support in all the
drivers that you wanted to leave off. In our case, we only do this
for the display and the USB ports because we don't want the screen to
flash or the lights on any USB-connected device to start blinking.
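On the userspace side, wakeup-capable devices expose a power/wakeup attribute that can be set to "enabled" or "disabled"; devices whose drivers lack wakeup support do not have the attribute at all. A small helper, with hypothetical device paths, that only enables wakeup where the driver allows it. WoWLAN triggers for wifi are configured separately through nl80211 (for example with the iw tool), which this does not cover.

```python
from pathlib import Path


def enable_wakeup(device: Path) -> bool:
    """Enable wakeup for device; return False if it is not wakeup-capable."""
    attr = device / "power" / "wakeup"
    if not attr.exists():
        # No attribute: the driver has not declared the device wakeup-capable.
        return False
    attr.write_text("enabled")
    return True


def enable_all(devices: list[Path]) -> list[Path]:
    """Enable wakeup on every capable device; return the ones that were skipped."""
    return [dev for dev in devices if not enable_wakeup(dev)]
```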

Here are some of the patches for wifi that we needed to get this working:

6abb9cb cfg80211: make WoWLAN configuration available to drivers
cd8f7cb cfg80211/mac80211: support reporting wakeup reason
a92eecb cfg80211: fix WoWLAN wakeup tracing
8cd4d45 cfg80211: add wowlan net-detect support


> Cheers
>

2015-05-05 19:35:58

by Alan Stern

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 5 May 2015, Chirantan Ekbote wrote:

> This is our plan for the next version (see my email earlier in this
> thread). Keeping a hybrid power state with hacks in the drivers isn't
> really maintainable, scalable, or upstream-able and has caused us some
> headaches already. Unfortunately we were working with the 3.14 kernel
> at the time, which didn't have the framework necessary to do anything
> else. The new version of lucid sleep will have the power manager
> runtime suspend power-hungry devices before a suspend so that they
> remain powered off at resume time. The power manager can then decide
> to resume those devices based on whether the wakeup event was
> user-triggered.
>
> Being able to determine the wakeup type from userspace is the only
> functionality we need from the kernel that doesn't already exist in
> mainline.

Maybe you can simplify the problem. You don't really need to know the
wakeup type, or to determine which device was responsible for the
wakeup. All you really need to know is whether the wakeup was
user-triggered. That may be much easier to discover.

Alan Stern

2015-05-05 20:58:19

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, May 5, 2015 at 12:35 PM, Alan Stern wrote:
> On Tue, 5 May 2015, Chirantan Ekbote wrote:
>
>> This is our plan for the next version (see my email earlier in this
>> thread). Keeping a hybrid power state with hacks in the drivers isn't
>> really maintainable, scalable, or upstream-able and has caused us some
>> headaches already. Unfortunately we were working with the 3.14 kernel
>> at the time, which didn't have the framework necessary to do anything
>> else. The new version of lucid sleep will have the power manager
>> runtime suspend power-hungry devices before a suspend so that they
>> remain powered off at resume time. The power manager can then decide
>> to resume those devices based on whether the wakeup event was
>> user-triggered.
>>
>> Being able to determine the wakeup type from userspace is the only
>> functionality we need from the kernel that doesn't already exist in
>> mainline.
>
> Maybe you can simplify the problem. You don't really need to know the
> wakeup type, or to determine which device was responsible for the
> wakeup. All you really need to know is whether the wakeup was
> user-triggered. That may be much easier to discover.
>

You are, of course, correct. Ultimately the only requirement we have
is that there exists a way for userspace to determine if the system
woke up because of a user-triggered event. The actual mechanism by
which this determination is made isn't something I feel strongly
about. The reason I had been focusing on exposing the actual wakeup
event to userspace is because classifying wakeup events as
user-triggered or not feels to me like a policy decision that should
be left to userspace. If the kernel maintainers are ok with doing
this work in the kernel instead and only exposing a binary yes/no bit
to userspace for user-triggered wakeups, that's perfectly fine because
it still meets our requirements.
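Keeping the classification in userspace could be as simple as a policy set matched against whatever wakeup sources the kernel reports as having fired. In this sketch the source names are hypothetical examples (ACPI lid and power-button device names differ between machines), and an empty report falls back to a full resume, since, as Rafael pointed out earlier in the thread, the kernel may not be able to identify the source at all.

```python
# Hypothetical policy: which wakeup sources count as "user-triggered".
# Real names come from the kernel and vary between machines.
USER_SOURCES = frozenset({"PNP0C0D:00", "PNP0C0C:00", "i8042"})


def classify_wakeup(fired: set[str], policy: frozenset = USER_SOURCES) -> str:
    """Return "full" to bring the UI up, "lucid" to stay in lucid sleep."""
    if not fired:
        # Cause unknown: err on the side of the user and bring the UI up.
        return "full"
    return "full" if fired & policy else "lucid"
```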

> Alan Stern
>

2015-05-05 23:39:13

by David Lang

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, 6 May 2015, Rafael J. Wysocki wrote:

>> You are, of course, correct. Ultimately the only requirement we have
>> is that there exists a way for userspace to determine if the system
>> woke up because of a user-triggered event. The actual mechanism by
>> which this determination is made isn't something I feel strongly
>> about. The reason I had been focusing on exposing the actual wakeup
>> event to userspace is because classifying wakeup events as
>> user-triggered or not feels to me like a policy decision that should
>> be left to userspace. If the kernel maintainers are ok with doing
>> this work in the kernel instead and only exposing a binary yes/no bit
>> to userspace for user-triggered wakeups, that's perfectly fine because
>> it still meets our requirements.
>
> Well, please see the message I've just sent.
>
> All wakeup devices have a wakeup source object associated with them. In
> principle, we can expose a "priority" attribute from that for user space to
> set as it wants to. There may be two values of it, like "normal" and "high"
> for example.
>
> Then, all that remains is to introduce separate wakeup counts for the "high"
> priority and "normal" priority wakeup sources and teach the power manager to
> use them.
>
> That leaves no policy in the kernel, but it actually has a chance to work.

how about instead of setting two states and defining that one must be a subset
of the other you instead have the existing feed of events and then allow
software that cares to define additional feeds that take the current feed and
filter it. We allow bpf filters in the kernel, so use those to filter what
events the additional feed is going to receive.

remember that the interesting numbers in CS are 0, 1, and many, not 2 :-)

don't limit things to two feeds with one always being a subset of the other,
create a mechanism to allow an arbitrary number of feeds that can be filtered in
different ways
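A toy userspace model of that idea, with a plain Python predicate standing in for an attached BPF program. No such kernel interface exists; every name here is hypothetical and only illustrates how an arbitrary number of filtered feeds generalizes the two-counter scheme.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Feed:
    name: str
    accept: Callable[[str], bool]  # stand-in for a BPF filter on the source name
    count: int = 0


class WakeupFeeds:
    """One unfiltered feed plus any number of filtered ones (hypothetical)."""

    def __init__(self) -> None:
        self.feeds = [Feed("all", lambda source: True)]

    def add_feed(self, name: str, accept: Callable[[str], bool]) -> None:
        self.feeds.append(Feed(name, accept))

    def report(self, source: str) -> None:
        # Every wakeup event is offered to every feed; each filter decides.
        for feed in self.feeds:
            if feed.accept(source):
                feed.count += 1

    def counts(self) -> dict:
        return {feed.name: feed.count for feed in self.feeds}
```

Feeds need not be subsets of one another: a "ui" feed and a "net" feed can coexist, each with its own count to hand to a power manager.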

David Lang

2015-05-05 23:22:04

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Monday, May 04, 2015 04:30:00 PM Chirantan Ekbote wrote:
> On Mon, May 4, 2015 at 3:12 PM, Rafael J. Wysocki wrote:
> > On Thursday, April 30, 2015 11:54:51 AM Chirantan Ekbote wrote:
> >> On Thu, Apr 30, 2015 at 10:23 AM, Olof Johansson wrote:
> >> > Hi,
> >> >
> >> > On Thu, Apr 30, 2015 at 10:10 AM, John Stultz wrote:
> >> >> On Thu, Apr 30, 2015 at 9:25 AM, Bastien Nocera wrote:
> >> >>> On Tue, 2014-10-21 at 10:04 -0700, John Stultz wrote:
> >
> > Thanks for CCing me, John!
> >
> > Let's also CC linux-pm as the people on that list may be generally interested
> > in this thread.
> >
> >> >>>> On Tue, Oct 21, 2014 at 1:49 AM, Bastien Nocera wrote:
> >> >>>> > Hey,
> >> >>>> >
> >> >>>> > GNOME has had discussions with kernel developers in the past, and,
> >> >>>> > fortunately, in some cases we were able to make headway.
> >> >>>> >
> >> >>>> > There are however a number of items that we still don't have solutions
> >> >>>> > for, items that kernel developers might not realise we'd like to rely
> >> >>>> > on, or don't know that we'd make use of if merged.
> >> >>>> >
> >> >>>> > I've posted this list at:
> >> >>>> > https://wiki.gnome.org/BastienNocera/KernelWishlist
> >> >>>> >
> >> >>>> > Let me know on-list or off-list if you have any comments about those, so
> >> >>>> > I can update the list.
> >> >>>>
> >> >>>> As for: 'Export of "wake reason" when the system wakes up (rtc alarm,
> >> >>>> lid open, etc.) and wakealarm (/sys/class/rtc/foo/wakealarm)
> >> >>>> documentation'
> >> >>>>
> >> >>>> Can you expand more on the rationale for the need here? Is this for UI
> >> >>>> for power debugging, or something else?
> >> >>>
> >> >>> This is pretty much what I had in mind:
> >> >>> https://www.chromium.org/chromium-os/chromiumos-design-docs/lucid-sleep
> >> >>>
> >> >>> I guess I didn't make myself understood.
> >> >>
> >> >> My, admittedly quick skim, of that design document seems to suggest
> >> >> that lucid sleep would be a new kernel state. That would keep the
> >> >> kernel in charge of determining the state transitions (ie:
> >> >> SUSPEND-(alarm)->LUCID-(wakelock
> >> >> release)->SUSPEND-(alarm)->LUCID-(power-button)->AWAKE). Then it seems
> >> >> userspace would be able to query the current state. This avoids some
> >> >> of the races I was concerned with trying to detect which irq woke us
> >> >> from suspend from userspace.
> >> >>
> >>
> >> Tomeu has been working on making things so that we don't need a new
> >> kernel state.
> >
> > Which is good, because adding a new kernel state like that to the mainline is
> > out of the question as far as I'm concerned.
> >
> >> Adding him on cc so he can correct me if I say
> >> something wrong. The current idea is to have userspace runtime
> >> suspend any unneeded devices before starting a suspend. This way the
> >> kernel will leave them alone at resume. This behavior already exists
> >> in the mainline kernel. Getting the wakeup reason can be accomplished
> >> by having the kernel emit a uevent with the wakeup reason. This is
> >> the only change that would be necessary.
> >
> > Well, that needs to be thought through carefully in my view, or it will
> > always be racy.
> >
> > You cannot really only rely on wakeup events that have already happened,
> > because something requiring you to bring up the full UI may happen at
> > any time. In particular, it may happen when you're about to suspend again.
> >
> > For this reason, it looks like you need something along the lines of
> > the wakeup_count interface, but acting on subsets of devices.
> >
> > It actually shouldn't be too difficult to split the existing wakeup
> > counter into a number of subcounters each tracking a subset of wakeup
> > sources and one of them might be used as the "full UI wakeup" condition
> > trigger in principle.
> >
>
> In the interest of brevity, I didn't go into the design of suspend /
> resume in userspace in my last email but it seems like there's no way
> around it.

Well, thanks for the effort. :-)

> Ignoring lucid sleep for a moment, here is how a regular suspend works
> in the power manager. The power manager sees a suspend request either
> because the user has been idle for X amount of time (usually 15
> minutes) or the user explicitly requested it by closing the lid. The
> power manager reads the value of /sys/power/wakeup_count and then
> announces an imminent suspend to the rest of the system via DBus.
> Interested applications (like the network manager and Chrome) will
> have registered with the power manager to be informed about this when
> they started up. For example, this is when Chrome would freeze its
> renderer processes. The power manager will now wait for them to
> report back their readiness to suspend. Once all applications have
> reported ready (or the maximum timeout occurs), the power manager
> performs some final preparations (like setting the resume brightness
> for the display). The last thing the power manager does, right before
> writing "mem" to /sys/power/state, is write the wakeup_count that it
> read earlier to /sys/power/wakeup_count. If the write fails, the
> power manager considers the suspend attempt failed, reads the new
> wakeup_count, and starts a timer (usually 10 seconds) to retry the
> suspend. The same thing happens if the write to /sys/power/state
> fails.

That sounds reasonable to me.

> It may be the case that the event that incremented the count happened
> because a user was trying to cancel the suspend. The user could have
> pressed some keys on the keyboard, touched the trackpad, opened the
> lid, pressed the power button, etc, etc. For the keyboard and
> trackpad these events make their way up to Chrome, which sends a user
> activity message to the power manager. This is a message that Chrome
> sends to the power manager even during regular activity, up to five
> times a second, to prevent the idle timeout from occurring. When the
> power manager sees this user activity message while waiting to retry
> the suspend, it cancels the retry and sends out a DBus signal
> informing the system that the suspend is now complete. For events
> like opening the lid or pressing the power button, the power manager
> monitors those events in /dev/input and simulates a user activity
> message from Chrome if any of them fire.

So far, so good.

> Ok so now I can talk about how lucid sleep fits into all of this. The
> power manager only considers a suspend attempt successful if the write
> to /sys/power/state was successful. If the suspend was successful,
> the power manager then reads another sysfs file
> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
> event was user-triggered.

Well, that's where things are likely to get ugly depending on how the
/sys/power/wakeup_type attribute works (more below).

> If the event was user-triggered it sends
> out a DBus signal announcing the end of the suspend, Chrome thaws its
> renderer processes, the full UI comes back up, and the user can start
> working. If the event was _not_ user-triggered (if it was the RTC or
> NIC), the power manager sends out a different DBus signal announcing
> that the system is in lucid sleep and will re-suspend soon. It will
> then wait for all registered applications to report readiness to
> suspend or for the max timeout to expire.

First let me say that the "user-triggered" vs "non-user-triggered" distinction
seems somewhat artificial to me. It all boils down to having a special class
of wakeup events that are supposed to make the power manager behave differently
after resuming. Whether or not they are actually triggered by the user
doesn't really matter technically.
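Rafael's point can be made concrete. Once the kernel reports some wakeup classification, the power manager's decision is just a set-membership test. The sketch assumes `/sys/power/wakeup_type` contains a single word and that `"user"` marks the special class. Both the values and the helper names are hypothetical, since the Chromium OS patches are not reproduced here.

```python
from enum import Enum


class ResumeAction(Enum):
    FULL_RESUME = "announce end of suspend; thaw renderers; bring up the UI"
    LUCID_SLEEP = "announce lucid sleep; wait for applications; re-suspend"


# Hypothetical: the special class of wakeup types that should bring up the UI.
UI_WAKEUP_TYPES = frozenset({"user"})


def action_after_resume(wakeup_type_contents: str) -> ResumeAction:
    """Map the contents of the (Chromium OS specific) wakeup_type file to an action."""
    if wakeup_type_contents.strip() in UI_WAKEUP_TYPES:
        return ResumeAction.FULL_RESUME
    return ResumeAction.LUCID_SLEEP
```

What the class is called does not change the code; only membership in it matters.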

> If it so happens that a user interacts with the system while it is in
> this state, the power manager detects it via the user activity message
> that Chrome sends. This could be real (keyboard, trackpad) or
> simulated (lid, power button). Either way, the power manager responds
> the same way: it announces the end of the suspend, Chrome thaws its
> renderers, the full UI comes back up, and the user starts working. If
> there is no user activity and all applications report ready, the power
> manager gets ready to suspend the system again. Since the NIC is now
> a wakeup source, the power manager doesn't read the wakeup_count until
> after all applications have reported ready because accessing the
> network could increment the wakeup_count and cause false positives.
>
> If either the write to /sys/power/wakeup_count or /sys/power/state
> fails from lucid sleep, the power manager re-announces that the system
> is in lucid sleep and will re-suspend soon. It's actually a little
> smart about this: it will only re-announce lucid sleep if there was a
> wakeup_count mismatch or if the write to /sys/power/state returned
> EBUSY. Other failures only trigger a simple retry and no DBus signal.
> We do this because these wakeup events may legitimately trigger lucid
> sleep. For example, more packets may arrive from the network or the
> RTC may go off and applications don't perform work until they hear
> from the power manager that the system is in lucid sleep. At this
> point the power manager is back to waiting for applications to report
> ready (or for the retry timer to fire). This process may repeat
> multiple times if we keep getting wakeup events right as the system
> tries to re-suspend.
>
>
> So that was a little long-winded but hopefully I've addressed all your
> concerns about potential race conditions in this code. I simplified a
> few bits because they would just complicate the discussion, but for the most
> part this is how the feature works now. Having the kernel emit a
> uevent with the wakeup event type would take the place of the power
> manager reading from /sys/power/wakeup_type in this system but
> wouldn't really affect anything else.

Which loops back to my previous remark: things may get ugly if /sys/power/wakeup_type
doesn't do the right thing (the uevent mechanism you'd like to replace it with
will really need to do the same, so I'm not quite sure it's worth the effort).
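The re-suspend failure handling described in the quoted mail (re-announce lucid sleep only on a wakeup_count mismatch or an EBUSY from the state write, plain retry otherwise) reduces to a small decision function. This is a hedged sketch: the result strings are made up, and it assumes that any rejected wakeup_count write means a count mismatch.

```python
import errno

REANNOUNCE = "reannounce lucid sleep"  # new work may be pending for applications
RETRY = "retry silently"               # schedule another attempt, no DBus signal


def on_resuspend_failure(failed_file: str, err: int) -> str:
    """Decide how to react when re-suspending from lucid sleep fails.

    failed_file: "wakeup_count" or "state", the sysfs write that failed.
    err: the errno reported by that write.
    """
    if failed_file == "wakeup_count":
        # Assumed to mean the count changed: a wakeup event arrived after the read.
        return REANNOUNCE
    if failed_file == "state" and err == errno.EBUSY:
        # EBUSY typically means the kernel aborted because a wakeup event was pending.
        return REANNOUNCE
    return RETRY
```

The ordering detail from the mail still applies: the count used for the next attempt should be read only after all applications report ready, so their own network traffic does not register as a wakeup event.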

Namely, it really has to cover all events that might have woken you up and
happened before stuff has started to be added to the input buffers that Chrome
cares about. It is difficult to identify the exact point where that takes place
in the resume sequence, but it should be somewhere in dpm_resume_end(). Why so?
Because it really doesn't matter why exactly the system is waking up. What
matters is whether or not an event that you should react to by bringing up the
UI happens *at* *any* *time* between (and including) the actual wakeup and the
point when you can rely on the input buffers to contain any useful information
consumable by Chrome.

This pretty much means that /sys/power/wakeup_type needs to behave almost like
/sys/power/wakeup_count, but is limited to a subset of wakeup sources. That's
why I was talking about splitting the wakeup count.

So instead of adding an entirely new mechanism for that, why don't you add
something like "priority" or "weight" to struct wakeup_source and assign
higher values of that to the wakeup sources associated with the events
you want to bring up the UI after resume? And make those "higher-priority"
wakeup sources use a separate wakeup counter, so you can easily verify if
any of them has triggered by reading that or making it trigger a uevent if
you want to?
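To make the proposal concrete, here is a toy user-space model of it, not kernel code. `struct wakeup_source` does carry an `event_count` field, but the `priority` attribute and the per-priority counters are what is being proposed here and do not exist yet. All names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WakeupSource:
    name: str
    priority: str = "normal"  # hypothetical attribute: "normal" or "high"
    event_count: int = 0


class WakeupCounters:
    """Toy model: one counter per priority class instead of a single combined count."""

    def __init__(self) -> None:
        self.counts = {"normal": 0, "high": 0}

    def report_event(self, ws: WakeupSource) -> None:
        # Every event bumps the source's own count and its class counter.
        ws.event_count += 1
        self.counts[ws.priority] += 1

    def read(self, priority: str) -> int:
        return self.counts[priority]


def ui_needed(counters: WakeupCounters, saved_high: int) -> bool:
    """After resume: did any high-priority source fire since the count was saved?"""
    return counters.read("high") != saved_high
```

Because the high-priority counter keeps counting through the whole resume sequence, a keyboard event that arrives after an RTC wakeup still changes it, which is the property asked for above.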


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-05 23:51:53

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Wed, May 6, 2015 at 1:38 AM, David Lang <[email protected]> wrote:
> On Wed, 6 May 2015, Rafael J. Wysocki wrote:
>
>>> You are, of course, correct. Ultimately the only requirement we have
>>> is that there exists a way for userspace to determine if the system
>>> woke up because of a user-triggered event. The actual mechanism by
>>> which this determination is made isn't something I feel strongly
>>> about. The reason I had been focusing on exposing the actual wakeup
>>> event to userspace is because classifying wakeup events as
>>> user-triggered or not feels to me like a policy decision that should
>>> be left to userspace. If the kernel maintainers are ok with doing
>>> this work in the kernel instead and only exposing a binary yes/no bit
>>> to userspace for user-triggered wakeups, that's perfectly fine because
>>> it still meets our requirements.
>>
>>
>> Well, please see the message I've just sent.
>>
>> All wakeup devices have a wakeup source object associated with them. In
>> principle, we can expose a "priority" attribute from that for user space
>> to set as it wants to. There may be two values of it, like "normal" and
>> "high" for example.
>>
>> Then, all that remains is to introduce separate wakeup counts for the
>> "high" priority and "normal" priority wakeup sources and teach the power
>> manager to use them.
>>
>> That leaves no policy in the kernel, but it actually has a chance to work.
>
>
> how about instead of setting two states and defining that one must be a
> subset of the other you instead have the existing feed of events and then
> allow software that cares to define additional feeds that take the current
> feed and filter it. We allow bpf filters in the kernel, so use those to
> filter what events the additional feed is going to receive.
>
> remember that the interesting numbers in CS are 0, 1, and many, not 2 :-)
>
> don't limit things to two feeds with one always being a subset of the other,
> create a mechanism to allow an arbitrary number of feeds that can be
> filtered in different ways

In my example above "high" is not a subset of "normal". They are
separate sets. Each of them is a subset of the set of all wakeup
sources, but that's obvious.

And yes, you can create more of them, but then you'll need an
interface to manipulate them which will probably be overkill for the
use case at hand.

Do you envision a use case where more than two sets would be necessary?
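The difference between the two positions can be sketched as a toy model in which each feed is a named predicate over wakeup sources with its own counter. Python callables stand in for the BPF filters David suggests; nothing here corresponds to an existing kernel interface. Rafael's scheme is the special case of two feeds with disjoint predicates.

```python
from typing import Callable, Dict


class WakeupFeeds:
    """Toy model of an arbitrary number of filtered wakeup counters."""

    def __init__(self) -> None:
        self.filters: Dict[str, Callable[[str], bool]] = {}
        self.counts: Dict[str, int] = {}

    def add_feed(self, name: str, accept: Callable[[str], bool]) -> None:
        self.filters[name] = accept
        self.counts[name] = 0

    def report_event(self, source: str) -> None:
        # A single event may be counted by several feeds if their filters overlap.
        for name, accept in self.filters.items():
            if accept(source):
                self.counts[name] += 1
```

The cost Rafael points to corresponds to `add_feed`: the kernel would need an interface for user space to create and manage such feeds, whereas two fixed classes need only one per-source attribute.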

Rafael

2015-05-05 23:31:51

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Tuesday, May 05, 2015 01:58:12 PM Chirantan Ekbote wrote:
> On Tue, May 5, 2015 at 12:35 PM, Alan Stern <[email protected]> wrote:
> > On Tue, 5 May 2015, Chirantan Ekbote wrote:
> >
> >> This is our plan for the next version (see my email earlier in this
> >> thread). Keeping a hybrid power state with hacks in the drivers isn't
> >> really maintainable, scalable, or upstream-able and has caused us some
> >> headaches already. Unfortunately we were working with the 3.14 kernel
> >> at the time, which didn't have the framework necessary to do anything
> >> else. The new version of lucid sleep will have the power manager
> >> runtime suspend power-hungry devices before a suspend so that they
> >> remain powered off at resume time. The power manager can then decide
> >> to resume those devices based on whether the wakeup event was
> >> user-triggered.
> >>
> >> Being able to determine the wakeup type from userspace is the only
> >> functionality we need from the kernel that doesn't already exist in
> >> mainline.
> >
> > Maybe you can simplify the problem. You don't really need to know the
> > wakeup type, or to determine which device was responsible for the
> > wakeup. All you really need to know is whether the wakeup was
> > user-triggered. That may be much easier to discover.
> >
>
> You are, of course, correct. Ultimately the only requirement we have
> is that there exists a way for userspace to determine if the system
> woke up because of a user-triggered event. The actual mechanism by
> which this determination is made isn't something I feel strongly
> about. The reason I had been focusing on exposing the actual wakeup
> event to userspace is because classifying wakeup events as
> user-triggered or not feels to me like a policy decision that should
> be left to userspace. If the kernel maintainers are ok with doing
> this work in the kernel instead and only exposing a binary yes/no bit
> to userspace for user-triggered wakeups, that's perfectly fine because
> it still meets our requirements.

Well, please see the message I've just sent.

All wakeup devices have a wakeup source object associated with them. In
principle, we can expose a "priority" attribute from that for user space to
set as it wants to. There may be two values of it, like "normal" and "high"
for example.

Then, all that remains is to introduce separate wakeup counts for the "high"
priority and "normal" priority wakeup sources and teach the power manager to
use them.

That leaves no policy in the kernel, but it actually has a chance to work.


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-06 12:41:46

by Bastien Nocera

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 2015-05-05 at 12:22 -0700, Chirantan Ekbote wrote:
> On Tue, May 5, 2015 at 3:46 AM, Bastien Nocera <[email protected]>
> wrote:
> >
> > > The last thing the power manager does, right before
> > > writing "mem" to /sys/power/state, is write the wakeup_count
> > > that it read earlier to /sys/power/wakeup_count. If the write
> > > fails, the power manager considers the suspend attempt failed,
> > > reads the new wakeup_count, and starts a timer (usually 10
> > > seconds) to retry the suspend. The same thing happens if the
> > > write to /sys/power/state fails.
> >
> > Is this something that logind should do as well?
> >
>
> We do it to avoid a race condition where a wakeup event occurs after
> userspace has started the suspend process but before anything writes
> "mem" to /sys/power/state. I'm guessing that this is something logind
> should be doing as well since the chances of missing a wakeup event
> increase the longer any given delay inhibitor takes to delay a
> suspend.

File https://bugzilla.freedesktop.org/show_bug.cgi?id=90339

Cheers

2015-05-06 17:40:57

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, May 5, 2015 at 4:47 PM, Rafael J. Wysocki <[email protected]> wrote:
> On Monday, May 04, 2015 04:30:00 PM Chirantan Ekbote wrote:
>
> <snip>
>
>> Ok so now I can talk about how lucid sleep fits into all of this. The
>> power manager only considers a suspend attempt successful if the write
>> to /sys/power/state was successful. If the suspend was successful,
>> the power manager then reads another sysfs file
>> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
>> event was user-triggered.
>
> Well, that's where things are likely to get ugly depending on how the
> /sys/power/wakeup_type attribute works (more below).
>
>> If the event was user-triggered it sends
>> out a DBus signal announcing the end of the suspend, Chrome thaws its
>> renderer processes, the full UI comes back up, and the user can start
>> working. If the event was _not_ user-triggered (if it was the RTC or
>> NIC), the power manager sends out a different DBus signal announcing
>> that the system is in lucid sleep and will re-suspend soon. It will
>> then wait for all registered applications to report readiness to
>> suspend or for the max timeout to expire.
>
> First let me say that the "user-triggered" vs "non-user-triggered" distinction
> seems somewhat artificial to me. It all boils down to having a special class
> of wakeup events that are supposed to make the power manager behave differently
> after resuming. Whether or not they are actually triggered by the user
> doesn't really matter technically.
>

This is true. It's just easier to talk about them as user-triggered
or not. For reference, [1] and [2] are the patches that implement
wakeup_type in our kernel.

>> If it so happens that a user interacts with the system while it is in
>> this state, the power manager detects it via the user activity message
>> that Chrome sends. This could be real (keyboard, trackpad) or
>> simulated (lid, power button). Either way, the power manager responds
>> the same way: it announces the end of the suspend, Chrome thaws its
>> renderers, the full UI comes back up, and the user starts working. If
>> there is no user activity and all applications report ready, the power
>> manager gets ready to suspend the system again. Since the NIC is now
>> a wakeup source, the power manager doesn't read the wakeup_count until
>> after all applications have reported ready because accessing the
>> network could increment the wakeup_count and cause false positives.
>>
>> If either the write to /sys/power/wakeup_count or /sys/power/state
>> fails from lucid sleep, the power manager re-announces that the system
>> is in lucid sleep and will re-suspend soon. It's actually a little
>> smart about this: it will only re-announce lucid sleep if there was a
>> wakeup_count mismatch or if the write to /sys/power/state returned
>> EBUSY. Other failures only trigger a simple retry and no DBus signal.
>> We do this because these wakeup events may legitimately trigger lucid
>> sleep. For example, more packets may arrive from the network or the
>> RTC may go off and applications don't perform work until they hear
>> from the power manager that the system is in lucid sleep. At this
>> point the power manager is back to waiting for applications to report
>> ready (or for the retry timer to fire). This process may repeat
>> multiple times if we keep getting wakeup events right as the system
>> tries to re-suspend.
>>
>>
>> So that was a little long-winded but hopefully I've addressed all your
>> concerns about potential race conditions in this code. I simplified a
>> few bits because they would just complicate the discussion, but for the most
>> part this is how the feature works now. Having the kernel emit a
>> uevent with the wakeup event type would take the place of the power
>> manager reading from /sys/power/wakeup_type in this system but
>> wouldn't really affect anything else.
>
> Which loops back to my previous remark: things may get ugly if /sys/power/wakeup_type
> doesn't do the right thing (the uevent mechanism you'd like to replace it with
> will really need to do the same, so I'm not quite sure it's worth the effort).
>
> Namely, it really has to cover all events that might have woken you up and
> happened before stuff has started to be added to the input buffers that Chrome
> cares about. It is difficult to identify the exact point where that takes place
> in the resume sequence, but it should be somewhere in dpm_resume_end(). Why so?
> Because it really doesn't matter why exactly the system is waking up. What
> matters is whether or not an event that you should react to by bringing up the
> UI happens *at* *any* *time* between (and including) the actual wakeup and the
> point when you can rely on the input buffers to contain any useful information
> consumable by Chrome.
>

So this is something that we don't catch right now. Our
implementation queries the firmware via an ACPI call to get the wakeup
source and I was assuming that any events after that *would*
eventually make their way up to Chrome or userspace. But based on
what you are saying, it seems like we do drop any events that occur
between the wakeup and the time when the input buffers contain useful
information.

I'm guessing that this window is pretty small though and when we do
end up missing an event, we can consider ourselves lucky because the
typical user reaction when their computer doesn't wake up is to start
pressing random keys on the keyboard and we would definitely catch
those.

> This pretty much means that /sys/power/wakeup_type needs to behave almost like
> /sys/power/wakeup_count, but is limited to a subset of wakeup sources. That's
> why I was talking about splitting the wakeup count.
>
> So instead of adding an entirely new mechanism for that, why don't you add
> something like "priority" or "weight" to struct wakeup_source and assign
> higher values of that to the wakeup sources associated with the events
> you want to bring up the UI after resume? And make those "higher-priority"
> wakeup sources use a separate wakeup counter, so you can easily verify if
> any of them has triggered by reading that or making it trigger a uevent if
> you want to?
>
>

Sounds good to me.



[1] https://chromium-review.googlesource.com/#/c/226932
[2] https://chromium-review.googlesource.com/#/c/226933

> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-07 16:55:27

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

On Tue, 05 May 2015 14:31:26 +0200

> For example, when you wake up from S3 on ACPI-based systems, the best you
> can get is what devices have generated the wakeup events, but there's
> no input available from that (like you won't know which key has been
> pressed). You may not get that even. You may only know what GPEs have
> caused the wakeup to happen and they may be shared.
>
> For PCI wakeup, the wakeup event may be out of band. You need to walk
> the hierarchy and check the PME status bits to identify the wakeup device
> and then you need to be careful enough not to reset it while putting into
> D0 for the input data associated with the event to be available. I'm not
> sure how many device/driver combinations this actually works for.
>
> For USB wakeup, you get the wakeup event from the controller which may be
> a PCI device. Getting to the USB device itself from there requires some
> work and even then the device may not "remember" what exactly happened.
>
> Further, if you wake up via the PC keyboard from suspend-to-idle, the
> wakeup key code is not available, the only thing you know is that the
> interrupt has occurred (that may be changed, but it's how the current
> code works).

It's probably got to change, otherwise once machines get able to sleep
between keypresses it's going to suck every time you pause and think for
a minute then begin typing. Remember display being off for suspend is
purely a limitation of most current display panels.


Alan

2015-05-07 17:04:13

by Alan Cox

Subject: Re: A desktop environment[1] kernel wishlist

> You are, of course, correct. Ultimately the only requirement we have
> is that there exists a way for userspace to determine if the system
> woke up because of a user-triggered event. The actual mechanism by

No. That is irrelevant. You need a way to ascertain if a user triggered
event has occurred since you suspended.

The two are not the same thing.

If your box wakes up due to something like a wireless card deciding it
needs to poke the base station and the user hits a key a microsecond
after wakeup then you want the display on.

The question is never "did the user wake the machine" the question is "did
the user do something that takes me out of 'lucid sleep/snooze/whatever'
since I suspended". Every user event could equally occur a microsecond
after a wakeup from a non user source, so every time you must ask the
"since suspend" question.

In fact, if you had some kind of hypothetical event counter incremented
by the device on it causing a wakeup event *or* an event while active
(and no way to tell them apart), that would provide a correct, race-free
interface to figure out if the display ought to be on.

It doesn't solve the powering off as a key is hit race but that's a
different beast.

Alan
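Alan's distinction can be shown with a tiny timeline model (hypothetical names, times in microseconds after suspend). A Wi-Fi wakeup followed one microsecond later by a key press fools the "what woke us" check but not the "since suspend" check.

```python
from typing import NamedTuple, Sequence


class Event(NamedTuple):
    t_us: int    # microseconds after suspend; events assumed sorted by time
    source: str
    user: bool   # whether this source belongs to the "bring up the UI" class


def woke_by_user(events: Sequence[Event]) -> bool:
    """The wrong question: was the event that woke the machine a user event?"""
    return bool(events) and events[0].user


def user_since_suspend(events: Sequence[Event], now_us: int) -> bool:
    """The right question: has any user event happened between suspend and now?"""
    return any(e.user and e.t_us <= now_us for e in events)


timeline = [Event(1_000, "wifi", False), Event(1_001, "keyboard", True)]
```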

2015-05-07 18:21:52

by Chirantan Ekbote

Subject: Re: A desktop environment[1] kernel wishlist

On Thu, May 7, 2015 at 10:03 AM, One Thousand Gnomes
<[email protected]> wrote:
>> You are, of course, correct. Ultimately the only requirement we have
>> is that there exists a way for userspace to determine if the system
>> woke up because of a user-triggered event. The actual mechanism by
>
> No. That is irrelevant. You need a way to ascertain if a user triggered
> event has occurred since you suspended.
>
> The two are not the same thing.
>
> If your box wakes up due to something like a wireless card deciding it
> needs to poke the base station and the user hits a key a microsecond
> after wakeup then you want the display on.
>
> The question is never "did the user wake the machine" the question is "did
> the user do something that takes me out of 'lucid sleep/snooze/whatever'
> since I suspended". Every user event could equally occur a microsecond
> after a wakeup from a non user source, so every time you must ask the
> "since suspend" question.
>

Yes, this is what Rafael said earlier and if you had read my reply,
you would have seen that I have already admitted that this is a
situation that our current implementation does not handle properly.
However, this hasn't been a very big issue for us in practice because
1) the window during which an event could get dropped like this is
presumably very small and 2) the standard user behavior when their
computer doesn't wake up is to start pressing random keys, which we do
end up catching. No, this is not a great user experience and we want
to fix it for the next version.

> In fact, if you had some kind of hypothetical event counter incremented
> by the device on it causing a wakeup event *or* an event while active
> (and no way to tell them apart), that would provide a correct, race-free
> interface to figure out if the display ought to be on.
>

Again, this is basically what Rafael suggested earlier and I've
already said that it sounds like a perfectly reasonable solution to
me.

> It doesn't solve the powering off as a key is hit race but that's a
> different beast.
>
> Alan

2015-05-07 20:38:30

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Thursday, May 07, 2015 05:54:56 PM One Thousand Gnomes wrote:
> On Tue, 05 May 2015 14:31:26 +0200
>
> > For example, when you wake up from S3 on ACPI-based systems, the best you
> > can get is what devices have generated the wakeup events, but there's
> > no input available from that (like you won't know which key has been
> > pressed). You may not get that even. You may only know what GPEs have
> > caused the wakeup to happen and they may be shared.
> >
> > For PCI wakeup, the wakeup event may be out of band. You need to walk
> > the hierarchy and check the PME status bits to identify the wakeup device
> > and then you need to be careful enough not to reset it while putting into
> > D0 for the input data associated with the event to be available. I'm not
> > sure how many device/driver combinations this actually works for.
> >
> > For USB wakeup, you get the wakeup event from the controller which may be
> > a PCI device. Getting to the USB device itself from there requires some
> > work and even then the device may not "remember" what exactly happened.
> >
> > Further, if you wake up via the PC keyboard from suspend-to-idle, the
> > wakeup key code is not available, the only thing you know is that the
> > interrupt has occurred (that may be changed, but it's how the current
> > code works).
>
> It's probably got to change, otherwise once machines get able to sleep
> between keypresses it's going to suck every time you pause and think for
> a minute then begin typing. Remember display being off for suspend is
> purely a limitation of most current display panels.

Right.

It is just one example, though.

Take a PCI device in D3hot for another one. It may not even have a buffer
to store input data while in that state. The only thing it may be able to
do is to signal a PME from it.

Rafael
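For the PCI case, what software can inspect is essentially the device's Power Management Control/Status Register (PMCSR). The sketch below decodes a raw 16-bit PMCSR value using the bit masks defined as `PCI_PM_CTRL_*` in the kernel's `pci_regs.h`. How the value is obtained (for example, from the device's config space after locating its PM capability) is left out. None of the fields describes *what* input caused the PME.

```python
# Bit masks mirroring PCI_PM_CTRL_* in include/uapi/linux/pci_regs.h.
PCI_PM_CTRL_STATE_MASK = 0x0003     # PowerState field, bits 1:0
PCI_PM_CTRL_NO_SOFT_RESET = 0x0008  # D3hot -> D0 does not reset internal state
PCI_PM_CTRL_PME_ENABLE = 0x0100     # device is allowed to assert PME
PCI_PM_CTRL_PME_STATUS = 0x8000     # device has signaled a PME (write 1 to clear)

POWER_STATES = ("D0", "D1", "D2", "D3hot")


def decode_pmcsr(pmcsr: int) -> dict:
    """Decode the fields of a 16-bit PMCSR value relevant to wakeup."""
    return {
        "state": POWER_STATES[pmcsr & PCI_PM_CTRL_STATE_MASK],
        "no_soft_reset": bool(pmcsr & PCI_PM_CTRL_NO_SOFT_RESET),
        "pme_enabled": bool(pmcsr & PCI_PM_CTRL_PME_ENABLE),
        "pme_status": bool(pmcsr & PCI_PM_CTRL_PME_STATUS),
    }
```

The No_Soft_Reset bit is relevant to the earlier remark about not resetting a device while putting it into D0: if it is clear, the D3hot to D0 transition may discard whatever state the device kept.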

2015-05-07 22:54:01

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Wednesday, May 06, 2015 10:40:53 AM Chirantan Ekbote wrote:
> On Tue, May 5, 2015 at 4:47 PM, Rafael J. Wysocki <[email protected]> wrote:
> > On Monday, May 04, 2015 04:30:00 PM Chirantan Ekbote wrote:
> >
> > <snip>
> >
> >> Ok so now I can talk about how lucid sleep fits into all of this. The
> >> power manager only considers a suspend attempt successful if the write
> >> to /sys/power/state was successful. If the suspend was successful,
> >> the power manager then reads another sysfs file
> >> (/sys/power/wakeup_type in our kernel) to determine if the wakeup
> >> event was user-triggered.
> >
> > Well, that's where things are likely to get ugly depending on how the
> > /sys/power/wakeup_type attribute works (more below).
> >
> >> If the event was user-triggered it sends
> >> out a DBus signal announcing the end of the suspend, Chrome thaws its
> >> renderer processes, the full UI comes back up, and the user can start
> >> working. If the event was _not_ user-triggered (if it was the RTC or
> >> NIC), the power manager sends out a different DBus signal announcing
> >> that the system is in lucid sleep and will re-suspend soon. It will
> >> then wait for all registered applications to report readiness to
> >> suspend or for the max timeout to expire.
> >
> > First let me say that the "user-triggered" vs "non-user-triggered" distinction
> > seems somewhat artificial to me. It all boils down to having a special class
> > of wakeup events that are supposed to make the power manager behave differently
> > after resuming. Whether or not they are actually triggered by the user
> > doesn't really matter technically.
> >
>
> This is true. It's just easier to talk about them as user-triggered
> or not. For reference, [1] and [2] are the patches that implement
> wakeup_type in our kernel.
>
> >> If it so happens that a user interacts with the system while it is in
> >> this state, the power manager detects it via the user activity message
> >> that Chrome sends. This could be real (keyboard, trackpad) or
> >> simulated (lid, power button). Either way, the power manager responds
> >> the same way: it announces the end of the suspend, Chrome thaws its
> >> renderers, the full UI comes back up, and the user starts working. If
> >> there is no user activity and all applications report ready, the power
> >> manager gets ready to suspend the system again. Since the NIC is now
> >> a wakeup source, the power manager doesn't read the wakeup_count until
> >> after all applications have reported ready because accessing the
> >> network could increment the wakeup_count and cause false positives.
> >>
> >> If either the write to /sys/power/wakeup_count or /sys/power/state
> >> fails from lucid sleep, the power manager re-announces that the system
> >> is in lucid sleep and will re-suspend soon. It's actually a little
> >> smart about this: it will only re-announce lucid sleep if there was a
> >> wakeup_count mismatch or if the write to /sys/power/state returned
> >> EBUSY. Other failures only trigger a simple retry and no DBus signal.
> >> We do this because these wakeup events may legitimately trigger lucid
> >> sleep. For example, more packets may arrive from the network or the
> >> RTC may go off and applications don't perform work until they hear
> >> from the power manager that the system is in lucid sleep. At this
> >> point the power manager is back to waiting for applications to report
> >> ready (or for the retry timer to fire). This process may repeat
> >> multiple times if we keep getting wakeup events right as the system
> >> tries to re-suspend.
> >>
> >>
> >> So that was a little long-winded but hopefully I've addressed all your
> >> concerns about potential race conditions in this code. I simplified a
> >> few bits because they would just complicate the discussion, but for the most
> >> part this is how the feature works now. Having the kernel emit a
> >> uevent with the wakeup event type would take the place of the power
> >> manager reading from /sys/power/wakeup_type in this system but
> >> wouldn't really affect anything else.
> >
> > Which loops back to my previous remark: things may get ugly if /sys/power/wakeup_type
> > doesn't do the right thing (the uevent mechanism you'd like to replace it with
> > will really need to do the same, so I'm not quite sure it's worth the effort).
> >
> > Namely, it really has to cover all events that might have woken you up and
> > happened before stuff has started to be added to the input buffers that Chrome
> > cares about. It is difficult to identify the exact point where that takes place
> > in the resume sequence, but it should be somewhere in dpm_resume_end(). Why so?
> > Because it really doesn't matter why exactly the system is waking up. What
> > matters is whether or not an event that you should react to by bringing up the
> > UI happens *at* *any* *time* between (and including) the actual wakeup and the
> > point when you can rely on the input buffers to contain any useful information
> > consumable by Chrome.
> >
>
> So this is something that we don't catch right now. Our
> implementation queries the firmware via an ACPI call to get the wakeup
> source and I was assuming that any events after that *would*
> eventually make their way up to Chrome or userspace. But based on
> what you are saying, it seems like we do drop any events that occur
> between the wakeup and the time when the input buffers contain useful
> information.
>
> I'm guessing that this window is pretty small though and when we do
> end up missing an event, we can consider ourselves lucky because the
> typical user reaction when their computer doesn't wake up is to start
> pressing random keys on the keyboard and we would definitely catch
> those.

That is, provided that the "user" is actually a human at the keyboard, which
need not be the case in general.

> > This pretty much means that /sys/power/wakeup_type needs to behave almost like
> > /sys/power/wakeup_count, but is limited to a subset of wakeup sources. That's
> > why I was talking about splitting the wakeup count.
> >
> > So instead of adding an entirely new mechanism for that, why don't you add
> > something like "priority" or "weight" to struct wakeup_source and assign
> > higher values of that to the wakeup sources associated with the events
> > you want to bring up the UI after resume? And make those "higher-priority"
> > wakeup sources use a separate wakeup counter, so you can easily verify if
> > any of them has triggered by reading that or making it trigger a uevent if
> > you want to?
> >
> >
>
> Sounds good to me.

OK

So I can prototype high-level support for that in the wakeup sources framework.

In addition to that we would need to ensure that the wakeup sources would be
activated in response to the wakeup interrupts handled by the IRQ core between
suspending and resuming device IRQs. That is slightly less straightforward
than one might think, but should be doable.

Would you be interested in that?


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

2015-05-08 07:09:34

by Tomeu Vizoso

Subject: Re: A desktop environment[1] kernel wishlist

On 05/07/2015 11:03 PM, Rafael J. Wysocki wrote:
> On Thursday, May 07, 2015 05:54:56 PM One Thousand Gnomes wrote:
>> On Tue, 05 May 2015 14:31:26 +0200
>>
>>> For example, when you wake up from S3 on ACPI-based systems, the best you
>>> can get is which devices have generated the wakeup events, but there's
>>> no input available from that (so you won't know which key has been
>>> pressed). You may not even get that. You may only know which GPEs have
>>> caused the wakeup to happen, and they may be shared.
>>>
>>> For PCI wakeup, the wakeup event may be out of band. You need to walk
>>> the hierarchy and check the PME status bits to identify the wakeup device
>>> and then you need to be careful enough not to reset it while putting it into
>>> D0 for the input data associated with the event to be available. I'm not
>>> sure how many device/driver combinations this actually works for.
>>>
>>> For USB wakeup, you get the wakeup event from the controller which may be
>>> a PCI device. Getting to the USB device itself from there requires some
>>> work and even then the device may not "remember" what exactly happened.
>>>
>>> Further, if you wake up via the PC keyboard from suspend-to-idle, the
>>> wakeup key code is not available; the only thing you know is that the
>>> interrupt has occurred (that may be changed, but it's how the current
>>> code works).
>>
>> It's probably got to change, otherwise once machines are able to sleep
>> between keypresses it's going to suck every time you pause and think for
>> a minute and then begin typing. Remember that the display being off during
>> suspend is purely a limitation of most current display panels.
>
> Right.
>
> It is just one example, though.
>
> Take a PCI device in D3hot for another one. It may not even have a buffer
> to store input data while in that state. The only thing it may be able to
> do is to signal a PME from it.

Yeah, I tried to make clear that I don't think that this is generally
achievable. But in the ChromeOS hardware that I have here, the input
event is there for userspace to read when it wakes up.

But if there's traction for adding a more generic mechanism upstream
that works on a broader range of machines, I'm all for it.

Regards,

Tomeu

2015-05-11 22:12:36

by Pavel Machek

Subject: Re: A desktop environment[1] kernel wishlist

Hi!

> > If the event was user-triggered it sends
> > out a DBus signal announcing the end of the suspend, Chrome thaws its
> > renderer processes, the full UI comes back up, and the user can start
> > working. If the event was _not_ user-triggred (if it was the RTC or
> > NIC), the power manager sends out a different DBus signal announcing
> > that the system is in lucid sleep and will re-suspend soon. It will
> > then wait for all registered applications to report readiness to
> > suspend or for the max timeout to expire.
>
> First let me say that the "user-triggered" vs "non-user-triggered" distinction
> seems somewhat artificial to me. It all boils down to having a special class
> of wakeup events that are supposed to make the power manager behave differently
> after resuming. Whether or not they are actually triggered by the user
> doesn't really matter technically.
...
> > So that was a little long-winded but hopefully I've addressed all your
> > concerns about potential race conditions in this code. I simplified a
> > few bits because they would just complicate the discussion, but for the most
> > part this is how the feature works now. Having the kernel emit a
> > uevent with the wakeup event type would take the place of the power
> > manager reading from /sys/power/wakeup_type in this system but
> > wouldn't really affect anything else.
>
> Which loops back to my previous remark: Things may get ugly if /sys/power/wakeup_type
> doesn't do the right thing (the uevent mechanism you'd like to replace it with
> will really need to do the same, so I'm not quite sure it's worth the effort).
>
> Namely, it really has to cover all events that might have woken you up and
> happened before stuff has started to be added to the input buffers that Chrome
> cares about. It is difficult to identify the exact point where that takes place
> in the resume sequence, but it should be somewhere in dpm_resume_end(). Why so?
> Because it really doesn't matter why exactly the system is waking up. What
> matters is whether or not an event that you should react to by bringing up the
> UI happens *at* *any* *time* between (and including) the actual wakeup and the
> point when you can rely on the input buffers to contain any useful information
> consumable by Chrome.
>
> This pretty much means that /sys/power/wakeup_type needs to behave almost like
> /sys/power/wakeup_count, but is limited to a subset of wakeup sources. That's
> why I was talking about splitting the wakeup count.
>
> So instead of adding an entirely new mechanism for that, why don't you add
> something like "priority" or "weight" to struct wakeup_source and assign
> higher values of that to the wakeup sources associated with the events
> you want to bring up the UI after resume? And make those "higher-priority"
> wakeup sources use a separate wakeup counter, so you can easily verify if
> any of them has triggered by reading that or making it trigger a uevent if
> you want to?

Does it do all we want? What if one device wants to generate both
"normal" and "higher-priority" wakeup events? (*)

Shouldn't we have a normal interface for keyboards (and similar devices)
where we could ask "did something interesting happen while we were
sleeping"? Actually... maybe the device can queue the events
that happened during sleep and deliver them after wakeup? If the user
pressed a key during sleep, there should be a key event waiting on
/dev/input/event3...
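If a device did queue its events like this, user space could answer "did a key get pressed during sleep" by draining the evdev node after resume. A minimal sketch, assuming struct input_event is a struct timeval stored as two native longs (as on 64-bit Linux) followed by __u16 type, __u16 code and __s32 value, and taking EV_KEY = 0x01 from linux/input-event-codes.h. It works on raw bytes so it can be exercised without hardware; actually opening /dev/input/event3 (for example with os.open and O_NONBLOCK) is left out.

```python
import struct
from typing import Iterator, Tuple

# Assumed struct input_event layout: timeval (two native longs), then
# __u16 type, __u16 code, __s32 value, with native alignment.
INPUT_EVENT = struct.Struct("llHHi")
EV_KEY = 0x01  # from linux/input-event-codes.h


def iter_events(buf: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (type, code, value) for every complete record in buf,
    ignoring a trailing partial record."""
    usable = len(buf) - len(buf) % INPUT_EVENT.size
    for _sec, _usec, ev_type, code, value in INPUT_EVENT.iter_unpack(buf[:usable]):
        yield ev_type, code, value


def key_pressed_during_sleep(buf: bytes) -> bool:
    """True if the queued events contain a key press (EV_KEY with value 1)."""
    return any(t == EV_KEY and v == 1 for t, _code, v in iter_events(buf))
```

A value of 1 is a press, 0 a release and 2 an autorepeat, so only presses are counted here; a real policy might want releases too, in case the press itself was lost.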

Pavel
(*) An Ethernet card might be an example. If the machine received a wake-on-LAN
packet, it will want to wake up with the screen on. If the machine received a
normal packet, it might want to process the packet and go back to sleep.
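The footnote implies that priority would belong to the individual event rather than to the device. A hypothetical sketch of what that would mean for a NIC deciding per packet: only a wake-on-LAN magic packet (six 0xFF bytes followed by sixteen repetitions of the target MAC address, anywhere in the payload) counts as high priority, and everything else is normal. The enum and the policy are illustrative assumptions, not an existing kernel interface.

```python
from enum import Enum


class WakePriority(Enum):
    NORMAL = "normal"  # process the packet, then go back to sleep
    HIGH = "high"      # wake up with the screen on


def is_magic_packet(payload: bytes, mac: bytes) -> bool:
    """Wake-on-LAN magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    if len(mac) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return (b"\xff" * 6 + mac * 16) in payload


def classify_wakeup(payload: bytes, mac: bytes) -> WakePriority:
    # Hypothetical policy: only a magic packet justifies bringing up the UI.
    if is_magic_packet(payload, mac):
        return WakePriority.HIGH
    return WakePriority.NORMAL
```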

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2015-05-12 00:20:46

by Rafael J. Wysocki

Subject: Re: A desktop environment[1] kernel wishlist

On Tuesday, May 12, 2015 12:12:30 AM Pavel Machek wrote:
> Hi!
>
> > > If the event was user-triggered it sends
> > > out a DBus signal announcing the end of the suspend, Chrome thaws its
> > > renderer processes, the full UI comes back up, and the user can start
> > > working. If the event was _not_ user-triggered (if it was the RTC or
> > > NIC), the power manager sends out a different DBus signal announcing
> > > that the system is in lucid sleep and will re-suspend soon. It will
> > > then wait for all registered applications to report readiness to
> > > suspend or for the max timeout to expire.
> >
> > First let me say that the "user-triggered" vs "non-user-triggered" distinction
> > seems somewhat artificial to me. It all boils down to having a special class
> > of wakeup events that are supposed to make the power manager behave differently
> > after resuming. Whether or not they are actually triggered by the user
> > doesn't really matter technically.
> ...
> > > So that was a little long-winded but hopefully I've addressed all your
> > > concerns about potential race conditions in this code. I simplified a
> > > few bits because they would just complicate the discussion, but for the most
> > > part this is how the feature works now. Having the kernel emit a
> > > uevent with the wakeup event type would take the place of the power
> > > manager reading from /sys/power/wakeup_type in this system but
> > > wouldn't really affect anything else.
> >
> > Which loops back to my previous remark: Things may get ugly if /sys/power/wakeup_type
> > doesn't do the right thing (the uevent mechanism you'd like to replace it with
> > will really need to do the same, so I'm not quite sure it's worth the effort).
> >
> > Namely, it really has to cover all events that might have woken you up and
> > happened before stuff has started to be added to the input buffers that Chrome
> > cares about. It is difficult to identify the exact point where that takes place
> > in the resume sequence, but it should be somewhere in dpm_resume_end(). Why so?
> > Because it really doesn't matter why exactly the system is waking up. What
> > matters is whether or not an event that you should react to by bringing up the
> > UI happens *at* *any* *time* between (and including) the actual wakeup and the
> > point when you can rely on the input buffers to contain any useful information
> > consumable by Chrome.
> >
> > This pretty much means that /sys/power/wakeup_type needs to behave almost like
> > /sys/power/wakeup_count, but is limited to a subset of wakeup sources. That's
> > why I was talking about splitting the wakeup count.
> >
> > So instead of adding an entirely new mechanism for that, why don't you add
> > something like "priority" or "weight" to struct wakeup_source and assign
> > higher values of that to the wakeup sources associated with the events
> > you want to bring up the UI after resume? And make those "higher-priority"
> > wakeup sources use a separate wakeup counter, so you can easily verify if
> > any of them has triggered by reading that or making it trigger a uevent if
> > you want to?
>
> Does it do all we want?

I believe so.

> What if one device wants to generate both "normal" and "higher-priority"
> wakeup events? (*)
>
> Shouldn't we have a normal interface for keyboards (and similar devices)
> where we could ask "did something interesting happen while we were
> sleeping"? Actually... maybe the device can queue the events
> that happened during sleep and deliver them after wakeup? If the user
> pressed a key during sleep, there should be a key event waiting on
> /dev/input/event3...

If it can queue up all of them, it can be "normal" priority just fine,
and user space can read the whole queue and decide what to do then.

The "high-priority" idea is for devices that can't do that, at least at one
point during the suspend-resume cycle. In those cases we can't simply go
back and check what the event was, so we need to rely on the device's
"importance" or "class" (with respect to wakeup).
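The division of labour described here can be sketched from the power manager's side. The /sys/power/wakeup_count handshake is the existing interface: read the count, write the same value back, and only then write to /sys/power/state; the write-back fails if wakeup events were registered in the meantime, and a successful write makes the kernel abort a later sleep transition if new events arrive. The separate high-priority counter is the hypothetical interface under discussion, so every sysfs access is passed in as a callable rather than given a path.

```python
from typing import Callable


def suspend_with_wakeup_count(
    read_count: Callable[[], int],
    write_count: Callable[[int], bool],
    enter_sleep: Callable[[], None],
) -> bool:
    """Suspend only if no wakeup event raced with the decision to sleep.

    read_count/write_count stand in for reading and writing
    /sys/power/wakeup_count; write_count returns False where the real
    write would fail. enter_sleep stands in for writing e.g. "mem" to
    /sys/power/state.
    """
    count = read_count()
    if not write_count(count):
        return False  # a wakeup event arrived; stay awake and handle it
    enter_sleep()
    return True


def should_bring_up_ui(
    high_before: int,
    high_after: int,
    queued_events_want_ui: Callable[[], bool],
) -> bool:
    """Decide how to resume, given a hypothetical high-priority counter.

    Devices that cannot buffer their events are covered by the counter;
    devices that can are treated as normal priority, and their queues are
    inspected instead.
    """
    if high_after != high_before:
        return True
    return queued_events_want_ui()
```

In this split, the kernel only has to classify sources that lose their event data; everything else stays a user-space policy decision over the queued events.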


--
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.