2005-12-27 21:35:01

by Pavel Machek

[permalink] [raw]
Subject: [patch] pm: fix runtime powermanagement's /sys interface

/sys/devices/..../power interface is currently very broken. It takes
integer from user, and passes it to drivers as pm_message_t.event
... without even checking it. This changes the interface to pass
strings, and introduces checks.

Signed-off-by: Pavel Machek <[email protected]>

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -27,22 +27,25 @@

static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
{
- return sprintf(buf, "%u\n", dev->power.power_state.event);
+ if (dev->power.power_state.event)
+ return sprintf(buf, "suspend\n");
+ else
+ return sprintf(buf, "on\n");
}

static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
{
pm_message_t state;
- char * rest;
- int error = 0;
+ int error = -EINVAL;

- state.event = simple_strtoul(buf, &rest, 10);
- if (*rest)
- return -EINVAL;
- if (state.event)
- error = dpm_runtime_suspend(dev, state);
- else
+ state.event = PM_EVENT_SUSPEND;
+ if ((n == 2) && !strncmp(buf, "on", 2)) {
dpm_runtime_resume(dev);
+ error = 0;
+ }
+ if ((n == 7) && !strncmp(buf, "suspend", 7))
+ error = dpm_runtime_suspend(dev, state);
+
return error ? error : n;
}


--
Thanks, Sharp!


2005-12-27 21:55:20

by Dmitry Torokhov

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On 12/27/05, Pavel Machek <[email protected]> wrote:
>
> static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
> {
> - return sprintf(buf, "%u\n", dev->power.power_state.event);
> + if (dev->power.power_state.event)
> + return sprintf(buf, "suspend\n");
> + else
> + return sprintf(buf, "on\n");
> }

Are you sure that having only 2 options (suspend/on) is enough at the
core level? I could envision having more levels, like "poweroff", etc?

--
Dmitry

2005-12-27 22:05:57

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

Hi!

> > static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
> > {
> > - return sprintf(buf, "%u\n", dev->power.power_state.event);
> > + if (dev->power.power_state.event)
> > + return sprintf(buf, "suspend\n");
> > + else
> > + return sprintf(buf, "on\n");
> > }
>
> Are you sure that having only 2 options (suspend/on) is enough at the
> core level? I could envision having more levels, like "poweroff", etc?

Note that interface is 0/2, currently, so this is improvement :-).

One day, when we find device that needs it, we may want to add more
states. I don't know about such device currently.

Pavel
--
Thanks, Sharp!

2005-12-28 04:22:17

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface



On Tue, 27 Dec 2005, Pavel Machek wrote:

> Hi!
>
> > > static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
> > > {
> > > - return sprintf(buf, "%u\n", dev->power.power_state.event);
> > > + if (dev->power.power_state.event)
> > > + return sprintf(buf, "suspend\n");
> > > + else
> > > + return sprintf(buf, "on\n");
> > > }
> >
> > Are you sure that having only 2 options (suspend/on) is enough at the
> > core level? I could envision having more levels, like "poweroff", etc?
>
> Note that interface is 0/2, currently, so this is improvement :-).

Heh, not really. You're not really solving any problems, only giving the
illusion that you've actually fixed something.

As I mentioned in the thread (currently happening, BTW) on the linux-pm
list, what you want to do is accept a string that reflects an actual state
that the device supports. For PCI devices that support low-power states,
this would be "D1", "D2", "D3", etc. For USB devices, which only support
an "on" and "suspended" state, the values that this patch parses would
actually work.

One way or another, you want the drivers to export the power states that
they support in some fashion. Not all devices support PM, and the current
interface is admittedly lacking in that respect. As I mentioned, the
proper thing to do would be to not add this file for *every* device, but
leave it up to the buses to add it for devices that support PM (and that
have drivers bound to them that implement it).

The reason is that some drivers and devices will support more than just
"on" and "suspended". which states those are, the power savings that they
offer, and the tradeoff in latency for resuming from those states are real
values and things to be considered for wanting to enter those states.

Your patch does nothing to actually help support those things, and doesn't
do anything to improve the broken interface.

> One day, when we find device that needs it, we may want to add more
> states. I don't know about such device currently.

There are many devices already do - there are PCI, PCI-X, PCI Express,
ACPI devices, etc that do. But, you simply cannot create a single decent
runtime power management interface for every single type of device. The
power states are inherently specific to the bus that they are on. Some of
them are specific to the device.

This is not suspend - you won't be able to get away with a few arbitrary
values that work for most systems. The proper interface should allow the
buses and drivers to use *factual* identifiers to express the states they
support. Anything else (including the current interface, which I wrote) is
simply a hack.

Thanks,


Patrick

2006-01-04 21:34:16

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On ?t 27-12-05 20:22:04, Patrick Mochel wrote:
>
>
> On Tue, 27 Dec 2005, Pavel Machek wrote:
>
> > Hi!
> >
> > > > static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
> > > > {
> > > > - return sprintf(buf, "%u\n", dev->power.power_state.event);
> > > > + if (dev->power.power_state.event)
> > > > + return sprintf(buf, "suspend\n");
> > > > + else
> > > > + return sprintf(buf, "on\n");
> > > > }
> > >
> > > Are you sure that having only 2 options (suspend/on) is enough at the
> > > core level? I could envision having more levels, like "poweroff", etc?
> >
> > Note that interface is 0/2, currently, so this is improvement :-).
>
> Heh, not really. You're not really solving any problems, only giving the
> illusion that you've actually fixed something.

Except perhaps userspace passing invalid values down to drivers in
pm_message_t.event?

> As I mentioned in the thread (currently happening, BTW) on the linux-pm
> list, what you want to do is accept a string that reflects an actual state
> that the device supports. For PCI devices that support low-power states,
> this would be "D1", "D2", "D3", etc. For USB devices, which only support
> an "on" and "suspended" state, the values that this patch parses would
> actually work.

We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
"D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
and I'm not sure about those "D1" and "D2" parts. Userspace should not
have to know about details, it will mostly use "on"/"suspend" anyway.

> > One day, when we find device that needs it, we may want to add more
> > states. I don't know about such device currently.
>
> There are many devices already do - there are PCI, PCI-X, PCI Express,
> ACPI devices, etc that do. But, you simply cannot create a single
> decent

I asked for an example.
Pavel
--
Thanks, Sharp!

2006-01-04 22:06:14

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Wed, 4 Jan 2006, Pavel Machek wrote:

> > As I mentioned in the thread (currently happening, BTW) on the linux-pm
> > list, what you want to do is accept a string that reflects an actual state
> > that the device supports. For PCI devices that support low-power states,
> > this would be "D1", "D2", "D3", etc. For USB devices, which only support
> > an "on" and "suspended" state, the values that this patch parses would
> > actually work.
>
> We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> and I'm not sure about those "D1" and "D2" parts. Userspace should not
> have to know about details, it will mostly use "on"/"suspend" anyway.

It would be good to make the details available so that they are there when
needed. For instance, we might export "D0", "on", "D1", "D2", "D3", and
"suspend", treating "on" as a synonym for "D0" and "suspend" as a synonym
for "D3".

Alan Stern

2006-01-04 22:17:11

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On St 04-01-06 17:06:09, Alan Stern wrote:
> On Wed, 4 Jan 2006, Pavel Machek wrote:
>
> > > As I mentioned in the thread (currently happening, BTW) on the linux-pm
> > > list, what you want to do is accept a string that reflects an actual state
> > > that the device supports. For PCI devices that support low-power states,
> > > this would be "D1", "D2", "D3", etc. For USB devices, which only support
> > > an "on" and "suspended" state, the values that this patch parses would
> > > actually work.
> >
> > We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> > "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> > and I'm not sure about those "D1" and "D2" parts. Userspace should not
> > have to know about details, it will mostly use "on"/"suspend" anyway.
>
> It would be good to make the details available so that they are there when
> needed. For instance, we might export "D0", "on", "D1", "D2", "D3", and
> "suspend", treating "on" as a synonym for "D0" and "suspend" as a synonym
> for "D3".

Why to make it this complex?

I do not think there's any confusion possible. "on" always corresponds
to "D0", and "suspend" is "D3". Anyone who knows what "D2" means,
should know that, too...
Pavel
--
Thanks, Sharp!

2006-01-05 21:42:43

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


On Wed, 4 Jan 2006, Pavel Machek wrote:

> On ?t 27-12-05 20:22:04, Patrick Mochel wrote:

> > Heh, not really. You're not really solving any problems, only giving the
> > illusion that you've actually fixed something.
>
> Except perhaps userspace passing invalid values down to drivers in
> pm_message_t.event?

It is up to the layer parsing the value to filter out bad values.

> > As I mentioned in the thread (currently happening, BTW) on the linux-pm
> > list, what you want to do is accept a string that reflects an actual state
> > that the device supports. For PCI devices that support low-power states,
> > this would be "D1", "D2", "D3", etc. For USB devices, which only support
> > an "on" and "suspended" state, the values that this patch parses would
> > actually work.
>
> We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> and I'm not sure about those "D1" and "D2" parts. Userspace should not
> have to know about details, it will mostly use "on"/"suspend" anyway.

D0 - D3 are common for all PCI devices. "on" and "suspend" are not device
states. They are conceptual representations of device states.

I understand where you are coming from. Most users will only care that a
particular device is "on" or "off". That is fine. They will click through
a gui that turns off a device and never think any more about it.

However, we are not developing an interface for end-users. We're
developing an interface that the guis may use. And, along with the guis,
there are also people that care about everything in between "on" and
"suspend".

If we export exactly the device states that a device supports, then we can
leave it up to the layers taking user input to translate between the real
device states and the user conceptions ("on", "suspend", etc), instead of
doing the translation ourselves.

That way, the kernel PM layers (as part of the bus drivers), can easily
and simply support every device that a device may have. What happens when
device manufacturers start coming out with D4, D5, etc (like they have
with the CPU C? States)? "suspend" takes on a completely different
meaning. With a simple (and accurate) kernel interface, this will never be
an issue.

> > > One day, when we find device that needs it, we may want to add more
> > > states. I don't know about such device currently.
> >
> > There are many devices already do - there are PCI, PCI-X, PCI Express,
> > ACPI devices, etc that do. But, you simply cannot create a single
> > decent
>
> I asked for an example.

Hey, calm down.

See for yourself:

for i in `lspci | cut -d ' ' -f 1 `
do
pm_str=`sudo lspci -s $i -vv | grep -A 1 "Power Management" | grep
Flags | cut -d ' ' -f 4-5`;

if [ ! -z "$pm_str" ]
then
printf "Device: %s ==> Power States: %s\n" $i "$pm_str"
fi
done


I have a firewire controller in a desktop system, and a ATI Radeon in a
T42 that support D1 and D2..

Thanks,


Patrick

2006-01-05 21:43:39

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


On Wed, 4 Jan 2006, Alan Stern wrote:

> On Wed, 4 Jan 2006, Pavel Machek wrote:
>
> > > As I mentioned in the thread (currently happening, BTW) on the linux-pm
> > > list, what you want to do is accept a string that reflects an actual state
> > > that the device supports. For PCI devices that support low-power states,
> > > this would be "D1", "D2", "D3", etc. For USB devices, which only support
> > > an "on" and "suspended" state, the values that this patch parses would
> > > actually work.
> >
> > We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> > "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> > and I'm not sure about those "D1" and "D2" parts. Userspace should not
> > have to know about details, it will mostly use "on"/"suspend" anyway.
>
> It would be good to make the details available so that they are there when
> needed. For instance, we might export "D0", "on", "D1", "D2", "D3", and
> "suspend", treating "on" as a synonym for "D0" and "suspend" as a synonym
> for "D3".

Do it in userspace; the kernel doesn't need to know about "on" or
"suspend". It should just validate and forward requests to enter specific
states.

Thanks,


Patrick

2006-01-05 21:56:04

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 13:42:33, Patrick Mochel wrote:
>
> On Wed, 4 Jan 2006, Pavel Machek wrote:
>
> > On Út 27-12-05 20:22:04, Patrick Mochel wrote:
>
> > > Heh, not really. You're not really solving any problems, only giving the
> > > illusion that you've actually fixed something.
> >
> > Except perhaps userspace passing invalid values down to drivers in
> > pm_message_t.event?
>
> It is up to the layer parsing the value to filter out bad values.

Do you have the patch to filter bad values? I submitted one. You
rejected it, because it does not support D1. Never mind that original
code does not support D1, either. [Should I retransmit the patch?]

If you suggest to just add check for == 0 or == 2... I think I can do
that, but that's going to break userspace, anyway (it passes _3_
there) and have no reasonable path to sane interface.

> > > As I mentioned in the thread (currently happening, BTW) on the linux-pm
> > > list, what you want to do is accept a string that reflects an actual state
> > > that the device supports. For PCI devices that support low-power states,
> > > this would be "D1", "D2", "D3", etc. For USB devices, which only support
> > > an "on" and "suspended" state, the values that this patch parses would
> > > actually work.
> >
> > We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> > "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> > and I'm not sure about those "D1" and "D2" parts. Userspace should not
> > have to know about details, it will mostly use "on"/"suspend" anyway.
>
> D0 - D3 are common for all PCI devices. "on" and "suspend" are not device
> states. They are conceptual representations of device states.
>
> I understand where you are coming from. Most users will only care that a
> particular device is "on" or "off". That is fine. They will click through
> a gui that turns off a device and never think any more about it.
>
> However, we are not developing an interface for end-users. We're
> developing an interface that the guis may use. And, along with the guis,
> there are also people that care about everything in between "on" and
> "suspend".

There may be more than "D1" and "D2" between "D0" and "D3". There may
be two different ways to put particular device into D1 sleep. This
means buses and devices contributing states, and all this is
complexity noone really wants.

> If we export exactly the device states that a device supports, then
>we can

Exporting multiple states is quite a lot of code, and it needs driver
changes. There's no clear benefit.

> I have a firewire controller in a desktop system, and a ATI Radeon in a
> T42 that support D1 and D2..

Ok, now we have a concrete example. So Radeon supports D1. But putting
radeon into D1 means you probably want to blank your screen and turn
the backlight off; that takes *long* time anyway. So you can simply
put your radeon into D3 and save a bit more power.

So yes, Radeon supports D1, but we probably do not want to use it.

[You may still want to do D1 on radeon for
debugging/testing/something. Fine, but we are trying to build
power-management infrastructure, not debugging one.]

Pavel
--
Thanks, Sharp!

2006-01-05 22:06:48

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, 5 Jan 2006, Patrick Mochel wrote:

> > It would be good to make the details available so that they are there when
> > needed. For instance, we might export "D0", "on", "D1", "D2", "D3", and
> > "suspend", treating "on" as a synonym for "D0" and "suspend" as a synonym
> > for "D3".
>
> Do it in userspace; the kernel doesn't need to know about "on" or
> "suspend". It should just validate and forward requests to enter specific
> states.

The problem is that the set of devices, drivers, and states is not
bounded. A single userspace tool might not know about all the
device-specific states some weird driver supports. The tool should still
be able to ask the kernel to suspend the device without needing to know
exactly which device state corresponds to "suspended".


On Thu, 5 Jan 2006, Pavel Machek wrote:

> Its okay with me to add more states _when they are needed_. Just now,
> many drivers do not even handle system suspend/resume correctly.

> We are not adding random crap to kernel just because "someone may need
> it". And yes, having aliases counts as "random crap". Perfectly legal
> but totally useless things count as a random crap, too.
>
> Bring example hardware that needs more than two states, implement
> driver support for that, and then we can talk about adding more than
> two states into core code.

Embedded devices are a great example. Consider putting Linux on a
portable phone. The individual components can have many different power
states, depending on which clock and power lines are enabled. "on" and
"suspend" won't be sufficient to handle the vendor's needs.

Alan Stern

2006-01-05 22:13:37

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, Jan 05, 2006 at 10:55:29PM +0100, Pavel Machek wrote:
> > I have a firewire controller in a desktop system, and a ATI Radeon in a
> > T42 that support D1 and D2..
>
> Ok, now we have a concrete example. So Radeon supports D1. But putting
> radeon into D1 means you probably want to blank your screen and turn
> the backlight off; that takes *long* time anyway. So you can simply
> put your radeon into D3 and save a bit more power.

Using your logic, you never want to put your CPU into C2 power-saving state
instead of C3 or C4. Which is ridiculous. There are technical reasons why
you want to put devices into different power-saving states. E.g. wakeup
latency, ability to receive wakeup signals, snooping and so on.

In addition, your patch breaks pcmcia / pcmciautils which already uses
numbers (which I already had to change from "3" to "2" before...).

Dominik

2006-01-05 22:15:53

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


On Thu, 5 Jan 2006, Pavel Machek wrote:

> Do you have the patch to filter bad values? I submitted one. You
> rejected it, because it does not support D1. Never mind that original
> code does not support D1, either. [Should I retransmit the patch?]

No, I offered guidance in one of the first emails. The code that exports a
'power' file for every single device from the driver model core should be
removed.

It should be replaced with a file exported by the bus driver that exports
the actual states that the device supports. The parsing can easily happen
at this point, because the bus knows what a good value is.

> If you suggest to just add check for == 0 or == 2... I think I can do
> that, but that's going to break userspace, anyway (it passes _3_
> there) and have no reasonable path to sane interface.

The userspace interface is broken. We can keep it for compatability
reasons, but there needs to be a new interface.

> There may be more than "D1" and "D2" between "D0" and "D3". There may
> be two different ways to put particular device into D1 sleep. This
> means buses and devices contributing states, and all this is
> complexity noone really wants.

It is not unnecessary, though. If the buses are the ones exporting the
states, then the number of states and the names of the states can be
accurate.

> > If we export exactly the device states that a device supports, then
> >we can
>
> Exporting multiple states is quite a lot of code, and it needs driver
> changes. There's no clear benefit.

I don't understand what you're saying. If I have a driver that I want to
make support another power state and I'm willing to write that code, then
there is a clear benefit to having the infrastructure for it to "just
work".

> > I have a firewire controller in a desktop system, and a ATI Radeon in a
> > T42 that support D1 and D2..
>
> Ok, now we have a concrete example. So Radeon supports D1. But putting
> radeon into D1 means you probably want to blank your screen and turn
> the backlight off; that takes *long* time anyway. So you can simply
> put your radeon into D3 and save a bit more power.
>
> So yes, Radeon supports D1, but we probably do not want to use it.

I'm sorry, but that's not your call to make.

If the device supports it, and if the driver supports it (or if it is made
to support it), then we don't want to modify the kernel (the core driver
model code, no less) just to add the ability to pass one more state down
to the driver, when we could get it for free now (by having the PCI bus
export the PCI device states it sees the device support via the config
space).

If you want a more concrete example, consider the possibility where it may
be possible to reinitialize the device from D1 or D2, but not from D3. For
the radeon, this is true in some cases (if I understand Ben H correctly).
There is also the case where screen blanking is either free or
non-existent (when the GPU is used for computation).

> [You may still want to do D1 on radeon for
> debugging/testing/something. Fine, but we are trying to build
> power-management infrastructure, not debugging one.]

I see those things as means to the same end. We want a good debugging
interface so people will actually try these things and use them. Plus,
testing is a big piece and going to get bigger - Many people and companies
want to know how much battery life they can squeeze out of a laptop, and
that evaluation involves experiementing with different configurations of
devices being in various states of suspend.

Thanks,


Patrick

2006-01-05 22:23:52

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 23:13:34, Dominik Brodowski wrote:
> On Thu, Jan 05, 2006 at 10:55:29PM +0100, Pavel Machek wrote:
> > > I have a firewire controller in a desktop system, and a ATI Radeon in a
> > > T42 that support D1 and D2..
> >
> > Ok, now we have a concrete example. So Radeon supports D1. But putting
> > radeon into D1 means you probably want to blank your screen and turn
> > the backlight off; that takes *long* time anyway. So you can simply
> > put your radeon into D3 and save a bit more power.
>
> Using your logic, you never want to put your CPU into C2 power-saving state
> instead of C3 or C4. Which is ridiculous. There are technical reasons why
> you want to put devices into different power-saving states. E.g. wakeup
> latency, ability to receive wakeup signals, snooping and so on.

Well, yes. Two years before we supported hlt, it would be stupid to
try to support multiple hlt states. We do not support hlt for any
devices today.

> In addition, your patch breaks pcmcia / pcmciautils which already uses
> numbers (which I already had to change from "3" to "2" before...).

pcmcia actually uses this? Ouch. Do you just read the power file, or
do you write to it, too?
Pavel

--
Thanks, Sharp!

2006-01-05 22:27:09

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, Jan 05, 2006 at 11:23:38PM +0100, Pavel Machek wrote:
> > In addition, your patch breaks pcmcia / pcmciautils which already uses
> > numbers (which I already had to change from "3" to "2" before...).
>
> pcmcia actually uses this? Ouch. Do you just read the power file, or
> do you write to it, too?

Reading and writing. Replacement for "cardctl suspend" and "cardctl resume".


static int pccardctl_power_one(unsigned long socket_no, unsigned int device,
unsigned int power)
{
int ret;
char file[SYSFS_PATH_MAX];
struct sysfs_attribute *attr;

snprintf(file, SYSFS_PATH_MAX,
"/sys/bus/pcmcia/devices/%lu.%u/power/state",
socket_no, device);

attr = sysfs_open_attribute(file);
if (!attr)
return -ENODEV;

ret = sysfs_write_attribute(attr, power ? "2" : "0", 1);

sysfs_close_attribute(attr);

return (ret);
}


NB: it will break one day, one way or another, when gregkh makes the
/sys/class -> /sys/devices conversion. However, I'd want to try not to break
the new pcmciautils userspace too often...

Dominik

2006-01-05 22:29:05

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

> On Thu, 5 Jan 2006, Pavel Machek wrote:
>
> > Its okay with me to add more states _when they are needed_. Just now,
> > many drivers do not even handle system suspend/resume correctly.
>
> > We are not adding random crap to kernel just because "someone may need
> > it". And yes, having aliases counts as "random crap". Perfectly legal
> > but totally useless things count as a random crap, too.
> >
> > Bring example hardware that needs more than two states, implement
> > driver support for that, and then we can talk about adding more than
> > two states into core code.
>
> Embedded devices are a great example. Consider putting Linux on a
> portable phone. The individual components can have many different power
> states, depending on which clock and power lines are enabled. "on" and
> "suspend" won't be sufficient to handle the vendor's needs.

Consider putting Linux on a portable phone. It could contain 13-bit
bytes. Does that mean we should support 13-bit bytes in linux?

No.

People are running Linux on cellphones today, and they are running for
weeks in standby, too. And runtime suspend support is still terminally
broken, so they somehow get around that.

Do you have example of device that has multiple "sleep states" and it
makes sense to use them?

There are examples of devices that have multiple "on states". IIRC
graphics cards can already control their clock tick rates. I hope we
are not trying to solve that one here.
Pavel
--
Thanks, Sharp!

2006-01-05 22:44:24

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 14:15:39, Patrick Mochel wrote:
>
> On Thu, 5 Jan 2006, Pavel Machek wrote:
>
> > Do you have the patch to filter bad values? I submitted one. You
> > rejected it, because it does not support D1. Never mind that original
> > code does not support D1, either. [Should I retransmit the patch?]
>
> No, I offered guidance in one of the first emails. The code that exports a
> 'power' file for every single device from the driver model core should be
> removed.
>
> It should be replaced with a file exported by the bus driver that exports
> the actual states that the device supports. The parsing can easily happen
> at this point, because the bus knows what a good value is.

(1) would change core<->driver interface

(2) is quite a lot of work

(3) ...with very little benefit, until drivers support >2 states

I want to fix invalid values being passed down to drivers, not rewrite
half of driver model.

If you want to rewrite driver model for >2 states, great, but that is
going to take at least a year AFAICS, so please let me at least fix
the bugs in meantime.

> > If you suggest to just add check for == 0 or == 2... I think I can do
> > that, but that's going to break userspace, anyway (it passes _3_
> > there) and have no reasonable path to sane interface.
>
> The userspace interface is broken. We can keep it for compatability
> reasons, but there needs to be a new interface.

I assumed we could fix the interface without actually introducing >2
states support. That can be done in reasonable ammount of code.

> > > If we export exactly the device states that a device supports, then
> > >we can
> >
> > Exporting multiple states is quite a lot of code, and it needs driver
> > changes. There's no clear benefit.
>
> I don't understand what you're saying. If I have a driver that Iwant to
~~~~~~~~~~~~~~~~~~
> make support another power state and I'm willing to write that code, then
> there is a clear benefit to having the infrastructure for it to "just
> work".

I do not see such drivers around me, that's all. It seems fair to me
that first driver author wanting that is the one who introduces >2
states support to generic infrastructure.

> If you want a more concrete example, consider the possibility where it may
> be possible to reinitialize the device from D1 or D2, but not from D3. For
> the radeon, this is true in some cases (if I understand Ben H
> correctly).

...which seems like one more reason to only export "on" and "off" in
radeon case. We don't want userspace to write "D3" to radeon, then
wondering why it failed.

Passing "on"/"off" down to radeon lets the driver decide what power
state it should enter.
Pavel
--
Thanks, Sharp!

2006-01-05 23:00:10

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 23:27:05, Dominik Brodowski wrote:
> On Thu, Jan 05, 2006 at 11:23:38PM +0100, Pavel Machek wrote:
> > > In addition, your patch breaks pcmcia / pcmciautils which already uses
> > > numbers (which I already had to change from "3" to "2" before...).
> >
> > pcmcia actually uses this? Ouch. Do you just read the power file, or
> > do you write to it, too?
>
> Reading and writing. Replacement for "cardctl suspend" and "cardctl resume".
>
>
> static int pccardctl_power_one(unsigned long socket_no, unsigned int device,
> unsigned int power)
> {
> int ret;
> char file[SYSFS_PATH_MAX];
> struct sysfs_attribute *attr;
>
> snprintf(file, SYSFS_PATH_MAX,
> "/sys/bus/pcmcia/devices/%lu.%u/power/state",
> socket_no, device);
>
> attr = sysfs_open_attribute(file);
> if (!attr)
> return -ENODEV;
>
> ret = sysfs_write_attribute(attr, power ? "2" : "0", 1);
>
> sysfs_close_attribute(attr);
>
> return (ret);
> }
>
>
> NB: it will break one day, one way or another, when gregkh makes the
> /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> the new pcmciautils userspace too often...

Do you know how soon that day is?

I'd really like to change it into strings so that interface will not
have to change when/if someone will introduce multiple power states.

I could accept both "0" and "on" on input, but have to output single
value.
Pavel

--
Thanks, Sharp!

2006-01-05 23:32:28

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 23:27:05, Dominik Brodowski wrote:
> On Thu, Jan 05, 2006 at 11:23:38PM +0100, Pavel Machek wrote:
> > > In addition, your patch breaks pcmcia / pcmciautils which already uses
> > > numbers (which I already had to change from "3" to "2" before...).
> >
> > pcmcia actually uses this? Ouch. Do you just read the power file, or
> > do you write to it, too?
>
> Reading and writing. Replacement for "cardctl suspend" and "cardctl resume".
>
>
> static int pccardctl_power_one(unsigned long socket_no, unsigned int device,
> unsigned int power)
> {
> int ret;
> char file[SYSFS_PATH_MAX];
> struct sysfs_attribute *attr;
>
> snprintf(file, SYSFS_PATH_MAX,
> "/sys/bus/pcmcia/devices/%lu.%u/power/state",
> socket_no, device);
>
> attr = sysfs_open_attribute(file);
> if (!attr)
> return -ENODEV;
>
> ret = sysfs_write_attribute(attr, power ? "2" : "0", 1);
>
> sysfs_close_attribute(attr);
>
> return (ret);
> }
>
>
> NB: it will break one day, one way or another, when gregkh makes the
> /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> the new pcmciautils userspace too often...

Ok, so lets at least add value-checking to .../power file, and prevent
userspace see changes to PM_EVENT_SUSPEND value. 2 and 0 are now
"arbitrary cookies". I'd like to use "on" and "off", but pcmcia
apparently depends on "2" and "0", so...

Any objections?

Signed-off-by: Pavel Machek <[email protected]>
Pavel

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -27,22 +27,25 @@

static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
{
- return sprintf(buf, "%u\n", dev->power.power_state.event);
+ if (dev->power.power_state.event)
+ return sprintf(buf, "2\n");
+ else
+ return sprintf(buf, "0\n");
}

static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
{
pm_message_t state;
- char * rest;
- int error = 0;
+ int error = -EINVAL;

- state.event = simple_strtoul(buf, &rest, 10);
- if (*rest)
- return -EINVAL;
- if (state.event)
- error = dpm_runtime_suspend(dev, state);
- else
+ state.event = PM_EVENT_SUSPEND;
+ if ((n == 1) && !strncmp(buf, "2", 1)) {
dpm_runtime_resume(dev);
+ error = 0;
+ }
+ if ((n == 1) && !strncmp(buf, "0", 1))
+ error = dpm_runtime_suspend(dev, state);
+
return error ? error : n;
}



--
Thanks, Sharp!

2006-01-05 23:46:32

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Fri, Jan 06, 2006 at 12:08:49AM +0100, Pavel Machek wrote:
> Ok, so lets at least add value-checking to .../power file, and prevent
> userspace see changes to PM_EVENT_SUSPEND value. 2 and 0 are now
> "arbitrary cookies". I'd like to use "on" and "off", but pcmcia
> apparently depends on "2" and "0", so...
>
> Any objections?

Sorry, but yes -- the same as before, minus the PCMCIA breakage.

Dominik

2006-01-05 23:54:36

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


On Thu, 5 Jan 2006, Pavel Machek wrote:

> On Čt 05-01-06 14:15:39, Patrick Mochel wrote:

> > It should be replaced with a file exported by the bus driver that exports
> > the actual states that the device supports. The parsing can easily happen
> > at this point, because the bus knows what a good value is.
>
> (1) would change core<->driver interface

It's broken anyway for runtime power management.

> (2) is quite a lot of work

In the long run, it's not.

> (3) ...with very little benefit, until drivers support >2 states

Without it, you are preventing drivers from being able to support > 2
states.

> I want to fix invalid values being passed down to drivers, not rewrite
> half of driver model.

Please don't exaggerate the issue.

> If you want to rewrite driver model for >2 states, great, but that is
> going to take at least a year AFAICS, so please let me at least fix
> the bugs in meantime.

It's a band-aid; it is not a long-term solution.

> > The userspace interface is broken. We can keep it for compatability
> > reasons, but there needs to be a new interface.
>
> I assumed we could fix the interface without actually introducing >2
> states support. That can be done in reasonable ammount of code.

The interface is irreparably broken. You can't fix it with an infinite
number of band aids.

> > I don't understand what you're saying. If I have a driver that Iwant to
> ~~~~~~~~~~~~~~~~~~
> > make support another power state and I'm willing to write that code, then
> > there is a clear benefit to having the infrastructure for it to "just
> > work".
>
> I do not see such drivers around me, that's all. It seems fair to me
> that first driver author wanting that is the one who introduces >2
> states support to generic infrastructure.

Just because you personally have not seen such things does not mean they
do not exist.

> > If you want a more concrete example, consider the possibility where it may
> > be possible to reinitialize the device from D1 or D2, but not from D3. For
> > the radeon, this is true in some cases (if I understand Ben H
> > correctly).
>
> ...which seems like one more reason to only export "on" and "off" in
> radeon case. We don't want userspace to write "D3" to radeon, then
> wondering why it failed.

User application translates e.g. -EINVAL into "State not supported."

> Passing "on"/"off" down to radeon lets the driver decide what power
> state it should enter.

Driver implements power state policy? Sounds like that policy would find a
much more comfortable home in userspace.

Thanks,


Patrick

2006-01-05 23:59:15

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On P? 06-01-06 00:46:29, Dominik Brodowski wrote:
> On Fri, Jan 06, 2006 at 12:08:49AM +0100, Pavel Machek wrote:
> > Ok, so lets at least add value-checking to .../power file, and prevent
> > userspace see changes to PM_EVENT_SUSPEND value. 2 and 0 are now
> > "arbitrary cookies". I'd like to use "on" and "off", but pcmcia
> > apparently depends on "2" and "0", so...
> >
> > Any objections?
>
> Sorry, but yes -- the same as before, minus the PCMCIA breakage.

I don't understand at this point.

Current code takes value from user, and passes it down to driver,
without any checking. If user writes 666 into the file, it will
happily pass down 666 to the driver. Driver does not expect 666 in
pm_message_t.event. It may oops, BUG_ON() or anything like that.

Shall I change

#define PM_EVENT_SUSPEND 2

to

#define PM_EVENT_SUSPEND 1324

to get my point across? This is kernel-specific value, it should not
be exported to userland.
Pavel

--
Thanks, Sharp!

2006-01-06 00:04:28

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


On Fri, 6 Jan 2006, Pavel Machek wrote:

> On P? 06-01-06 00:46:29, Dominik Brodowski wrote:
> > On Fri, Jan 06, 2006 at 12:08:49AM +0100, Pavel Machek wrote:
> > > Ok, so lets at least add value-checking to .../power file, and prevent
> > > userspace see changes to PM_EVENT_SUSPEND value. 2 and 0 are now
> > > "arbitrary cookies". I'd like to use "on" and "off", but pcmcia
> > > apparently depends on "2" and "0", so...
> > >
> > > Any objections?
> >
> > Sorry, but yes -- the same as before, minus the PCMCIA breakage.
>
> I don't understand at this point.
>
> Current code takes value from user, and passes it down to driver,
> without any checking. If user writes 666 into the file, it will
> happily pass down 666 to the driver. Driver does not expect 666 in
> pm_message_t.event. It may oops, BUG_ON() or anything like that.
>
> Shall I change
>
> #define PM_EVENT_SUSPEND 2
>
> to
>
> #define PM_EVENT_SUSPEND 1324
>
> to get my point across? This is kernel-specific value, it should not
> be exported to userland.

A better point, and one that would actually be useful, would be to remove
the file altogether. Let Dominik export a power file, with complete
control over the values, for each pcmcia device. Then you never have to
worry about breaking PCMCIA again.

Thanks,


Patrick

2006-01-06 00:07:24

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 15:54:15, Patrick Mochel wrote:
>
> On Thu, 5 Jan 2006, Pavel Machek wrote:
>
> > On Čt 05-01-06 14:15:39, Patrick Mochel wrote:
>
> > > It should be replaced with a file exported by the bus driver that exports
> > > the actual states that the device supports. The parsing can easily happen
> > > at this point, because the bus knows what a good value is.
> >
> > (1) would change core<->driver interface
>
> It's broken anyway for runtime power management.

Please explain. As far as I can see, it is fairly simple, but good
enough. pm_message_t.flags indicating it is runtime suspend would be
nice, but I do not think it is broken.

> > (2) is quite a lot of work
>
> In the long run, it's not.

Nobody fixed it in a year, so apparently it is a lot of work.

> > (3) ...with very little benefit, until drivers support >2 states
>
> Without it, you are preventing drivers from being able to support > 2
> states.

0 drivers support > 2 states. So it is indeed very little benefit.

> > If you want to rewrite driver model for >2 states, great, but that is
> > going to take at least a year AFAICS, so please let me at least fix
> > the bugs in meantime.
>
> It's a band-aid; it is not a long-term solution.

But band-aid is apparently neccessary unless you want drivers to see
invalid values.

> > > The userspace interface is broken. We can keep it for compatability
> > > reasons, but there needs to be a new interface.
> >
> > I assumed we could fix the interface without actually introducing >2
> > states support. That can be done in reasonable ammount of code.
>
> The interface is irreparably broken. You can't fix it with an infinite
> number of band aids.

Without "band aids", you'll get BUG()s all over the kernel.

> > > I don't understand what you're saying. If I have a driver that Iwant to
> > ~~~~~~~~~~~~~~~~~~
> > > make support another power state and I'm willing to write that code, then
> > > there is a clear benefit to having the infrastructure for it to "just
> > > work".
> >
> > I do not see such drivers around me, that's all. It seems fair to me
> > that first driver author wanting that is the one who introduces >2
> > states support to generic infrastructure.
>
> Just because you personally have not seen such things does not mean they
> do not exist.

Just because you claim they exist does not mean they exist.

> > Passing "on"/"off" down to radeon lets the driver decide what power
> > state it should enter.
>
> Driver implements power state policy? Sounds like that policy would find a
> much more comfortable home in userspace.

Userspace can't know that driver does not support D3 on this
particular hardware version because of hardware problems... or simply
because it is not yet implemented.
Pavel
--
Thanks, Sharp!

2006-01-06 00:13:11

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 16:04:07, Patrick Mochel wrote:
>
> On Fri, 6 Jan 2006, Pavel Machek wrote:
>
> > On Pá 06-01-06 00:46:29, Dominik Brodowski wrote:
> > > On Fri, Jan 06, 2006 at 12:08:49AM +0100, Pavel Machek wrote:
> > > > Ok, so lets at least add value-checking to .../power file, and prevent
> > > > userspace see changes to PM_EVENT_SUSPEND value. 2 and 0 are now
> > > > "arbitrary cookies". I'd like to use "on" and "off", but pcmcia
> > > > apparently depends on "2" and "0", so...
> > > >
> > > > Any objections?
> > >
> > > Sorry, but yes -- the same as before, minus the PCMCIA breakage.
> >
> > I don't understand at this point.
> >
> > Current code takes value from user, and passes it down to driver,
> > without any checking. If user writes 666 into the file, it will
> > happily pass down 666 to the driver. Driver does not expect 666 in
> > pm_message_t.event. It may oops, BUG_ON() or anything like that.
> >
> > Shall I change
> >
> > #define PM_EVENT_SUSPEND 2
> >
> > to
> >
> > #define PM_EVENT_SUSPEND 1324
> >
> > to get my point across? This is kernel-specific value, it should not
> > be exported to userland.
>
> A better point, and one that would actually be useful, would be to remove
> the file altogether. Let Dominik export a power file, with complete
> control over the values, for each pcmcia device. Then you never have to
> worry about breaking PCMCIA again.

Fine with me.

[except that now you we went from supporting 2 runtime device states
to supporting just 1... I don't think this is much of progress...]

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -6,49 +6,6 @@
#include <linux/string.h>
#include "power.h"

-
-/**
- * state - Control current power state of device
- *
- * show() returns the current power state of the device. '0' indicates
- * the device is on. Other values (1-3) indicate the device is in a low
- * power state.
- *
- * store() sets the current power state, which is an integer value
- * between 0-3. If the device is on ('0'), and the value written is
- * greater than 0, then the device is placed directly into the low-power
- * state (via its driver's ->suspend() method).
- * If the device is currently in a low-power state, and the value is 0,
- * the device is powered back on (via the ->resume() method).
- * If the device is in a low-power state, and a different low-power state
- * is requested, the device is first resumed, then suspended into the new
- * low-power state.
- */
-
-static ssize_t state_show(struct device * dev, struct device_attribute *attr, char * buf)
-{
- return sprintf(buf, "%u\n", dev->power.power_state.event);
-}
-
-static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
-{
- pm_message_t state;
- char * rest;
- int error = 0;
-
- state.event = simple_strtoul(buf, &rest, 10);
- if (*rest)
- return -EINVAL;
- if (state.event)
- error = dpm_runtime_suspend(dev, state);
- else
- dpm_runtime_resume(dev);
- return error ? error : n;
-}
-
-static DEVICE_ATTR(state, 0644, state_show, state_store);
-
-
/*
* wakeup - Report/change current wakeup option for device
*
@@ -122,7 +79,6 @@ static DEVICE_ATTR(wakeup, 0644, wake_sh


static struct attribute * power_attrs[] = {
- &dev_attr_state.attr,
&dev_attr_wakeup.attr,
NULL,
};

--
Thanks, Sharp!

2006-01-06 01:37:43

by Patrick Mochel

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface



On Fri, 6 Jan 2006, Pavel Machek wrote:

> On Čt 05-01-06 16:04:07, Patrick Mochel wrote:

> > A better point, and one that would actually be useful, would be to remove
> > the file altogether. Let Dominik export a power file, with complete
> > control over the values, for each pcmcia device. Then you never have to
> > worry about breaking PCMCIA again.
>
> Fine with me.

ACK, you beat me to it.

And, appended is a patch to export PM controls for PCI devices. The file
"pm_possible_states" exports the states a device supports, and "pm_state"
exports the current state (and provides the interface for entering a
state).

Eventually, some drivers will want to fix up those values so that it can
mask of states that it doesn't support, as well as offer possible device-
specific states.

What's interesting is that with this patch, I can see that two more
devices on my system support D1 and D2 -- the cardbus controllers, which
are actually bridges whose PM capabilities aren't exported via lspci.

Thanks,

Patrick


diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 6707df9..628f3a3 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -36,6 +36,10 @@ obj-$(CONFIG_ACPI) += pci-acpi.o
# Cardbus & CompactPCI use setup-bus
obj-$(CONFIG_HOTPLUG) += setup-bus.o

+
+# Power Management functionality
+obj-$(CONFIG_PM) += pm.o
+
ifndef CONFIG_X86
obj-y += syscall.o
endif
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index eed67d9..83045d9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -85,6 +85,8 @@ void __devinit pci_bus_add_device(struct
list_add_tail(&dev->global_list, &pci_devices);
spin_unlock(&pci_bus_lock);

+ pci_pm_init(dev);
+
pci_proc_attach_device(dev);
pci_create_sysfs_dev_files(dev);
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6527b36..6d7afbc 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -63,6 +63,23 @@ extern int pcie_mch_quirk;
extern struct device_attribute pci_dev_attrs[];
extern struct class_device_attribute class_device_attr_cpuaffinity;

+
+#ifdef CONFIG_PM
+extern int pci_pm_init(struct pci_dev *);
+extern void pci_pm_exit(struct pci_dev *);
+#else /* CONFIG_PM */
+static inline int pci_pm_init(struct pci_dev *)
+{
+ return 0;
+}
+
+static inline void pci_pm_exit(struct pci_dev *)
+{
+
+}
+
+#endif
+
/**
* pci_match_one_device - Tell if a PCI device structure has a matching
* PCI device id structure
diff --git a/drivers/pci/pm.c b/drivers/pci/pm.c
new file mode 100644
index 0000000..ce476e4
--- /dev/null
+++ b/drivers/pci/pm.c
@@ -0,0 +1,138 @@
+/*
+ * drivers/pci/pm.c - Power management support for PCI devices
+ */
+
+#include <linux/pci.h>
+
+
+static ssize_t pm_possible_states_show(struct device * d,
+ struct device_attribute * a,
+ char * buffer)
+{
+ struct pci_dev * dev = to_pci_dev(d);
+ char * s = buffer;
+
+ s += sprintf(s, "d0");
+ if (dev->poss_states[PCI_D1])
+ s += sprintf(s, " d1");
+ if (dev->poss_states[PCI_D2])
+ s += sprintf(s, " d2");
+ if (dev->poss_states[PCI_D3hot])
+ s += sprintf(s, " d3");
+ s += sprintf(s, "\n");
+ return (s - buffer);
+}
+
+static DEVICE_ATTR(pm_possible_states, 0444, pm_possible_states_show, NULL);
+
+
+static ssize_t pm_state_show(struct device * d, struct device_attribute * a,
+ char * buffer)
+{
+ struct pci_dev * dev = to_pci_dev(d);
+ const char * str;
+
+ switch (dev->current_state) {
+ case PCI_D0:
+ str = "d0";
+ break;
+ case PCI_D1:
+ str = "d1";
+ break;
+ case PCI_D2:
+ str = "d2";
+ break;
+ case PCI_D3hot:
+ str = "d3";
+ break;
+ default:
+ str = "d?";
+ break;
+ }
+
+ return sprintf(buffer, "%s\n", str);
+}
+
+
+static ssize_t pm_state_store(struct device * d, struct device_attribute * a,
+ const char * buffer, size_t len)
+{
+ struct pci_dev * dev = to_pci_dev(d);
+ pci_power_t state;
+ int ret;
+
+ if (!strnicmp(buffer, "d0", len))
+ state = PCI_D0;
+ else if (!strnicmp(buffer, "d1", len))
+ state = PCI_D1;
+ else if (!strnicmp(buffer, "d2", len))
+ state = PCI_D2;
+ else if (!strnicmp(buffer, "d3", len))
+ state = PCI_D3hot;
+ else
+ return -EINVAL;
+
+ if (state == dev->current_state)
+ return 0;
+
+ if (dev->poss_states[state])
+ ret = pci_set_power_state(dev, state);
+ else
+ ret = -EINVAL;
+
+ return ret == 0 ? len : ret;
+}
+
+static DEVICE_ATTR(pm_state, 0644, pm_state_show, pm_state_store);
+
+
+static int find_states(struct pci_dev * dev)
+{
+ int cap;
+ u16 pmc;
+
+
+ /*
+ * Every device supports D0
+ */
+ dev->poss_states[PCI_D0] = 1;
+
+ /*
+ * Check if the device has PM capabilties in the config space
+ */
+ cap = pci_find_capability(dev, PCI_CAP_ID_PM);
+ if (!cap)
+ return -EIO;
+
+ /*
+ * If it supports PM capabilities, it will support D3
+ */
+ dev->poss_states[PCI_D3hot] = 1;
+
+ /*
+ * Check D1 and D2 support
+ */
+ pci_read_config_word(dev, cap + PCI_PM_PMC, &pmc);
+ if (pmc & PCI_PM_CAP_D1)
+ dev->poss_states[PCI_D1] = 1;
+ if (pmc & PCI_PM_CAP_D2)
+ dev->poss_states[PCI_D2] = 1;
+ return 0;
+}
+
+
+int pci_pm_init(struct pci_dev * dev)
+{
+ if (find_states(dev))
+ return 0;
+
+ device_create_file(&dev->dev, &dev_attr_pm_possible_states);
+ return device_create_file(&dev->dev, &dev_attr_pm_state);
+}
+
+void pci_pm_exit(struct pci_dev * dev)
+{
+ device_remove_file(&dev->dev, &dev_attr_pm_state);
+ device_remove_file(&dev->dev, &dev_attr_pm_possible_states);
+}
+
diff --git a/include/linux/pci.h b/include/linux/pci.h
index de690ca..2600119 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -106,6 +106,7 @@ struct pci_dev {
this if your device has broken DMA
or supports 64-bit transfers. */

+ u32 poss_states[4];
pci_power_t current_state; /* Current operating state. In ACPI-speak,
this is D0-D3, D0 being fully functional,
and D3 being off. */

2006-01-06 02:50:28

by Greg KH

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, Jan 05, 2006 at 11:27:05PM +0100, Dominik Brodowski wrote:
> On Thu, Jan 05, 2006 at 11:23:38PM +0100, Pavel Machek wrote:
> > > In addition, your patch breaks pcmcia / pcmciautils which already uses
> > > numbers (which I already had to change from "3" to "2" before...).
> >
> > pcmcia actually uses this? Ouch. Do you just read the power file, or
> > do you write to it, too?
>
> Reading and writing. Replacement for "cardctl suspend" and "cardctl resume".
>
>
> static int pccardctl_power_one(unsigned long socket_no, unsigned int device,
> unsigned int power)
> {
> int ret;
> char file[SYSFS_PATH_MAX];
> struct sysfs_attribute *attr;
>
> snprintf(file, SYSFS_PATH_MAX,
> "/sys/bus/pcmcia/devices/%lu.%u/power/state",
> socket_no, device);
>
> attr = sysfs_open_attribute(file);
> if (!attr)
> return -ENODEV;
>
> ret = sysfs_write_attribute(attr, power ? "2" : "0", 1);
>
> sysfs_close_attribute(attr);
>
> return (ret);
> }
>
>
> NB: it will break one day, one way or another, when gregkh makes the
> /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> the new pcmciautils userspace too often...

Why would the conversion that I'm working on break this userspace code?
You are only using the device directory, which will not change at all.

And my goal in the rework is to not break userspace code, although
libsysfs does need a one line change to handle this, hopefully they have
intregrated that already...

thanks,

greg k-h

2006-01-06 09:01:39

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Čt 05-01-06 17:37:30, Patrick Mochel wrote:
>
>
> On Fri, 6 Jan 2006, Pavel Machek wrote:
>
> > On Čt 05-01-06 16:04:07, Patrick Mochel wrote:
>
> > > A better point, and one that would actually be useful, would be to remove
> > > the file altogether. Let Dominik export a power file, with complete
> > > control over the values, for each pcmcia device. Then you never have to
> > > worry about breaking PCMCIA again.
> >
> > Fine with me.
>
> ACK, you beat me to it.
>
> And, appended is a patch to export PM controls for PCI devices. The file
> "pm_possible_states" exports the states a device supports, and "pm_state"
> exports the current state (and provides the interface for entering a
> state).
>
> Eventually, some drivers will want to fix up those values so that it can
> mask of states that it doesn't support, as well as offer possible device-
> specific states.
>
> What's interesting is that with this patch, I can see that two more
> devices on my system support D1 and D2 -- the cardbus controllers, which
> are actually bridges whose PM capabilities aren't exported via
lspci.

> +static ssize_t pm_possible_states_show(struct device * d,
> + struct device_attribute * a,
> + char * buffer)
> +{
> + struct pci_dev * dev = to_pci_dev(d);
> + char * s = buffer;
> +
> + s += sprintf(s, "d0");
> + if (dev->poss_states[PCI_D1])
> + s += sprintf(s, " d1");
> + if (dev->poss_states[PCI_D2])
> + s += sprintf(s, " d2");
> + if (dev->poss_states[PCI_D3hot])
> + s += sprintf(s, " d3");

Could we use same states as rest of code, i.e. "D2" instead of "d2"
and "D3hot" instead of "d3"?

> +static ssize_t pm_state_show(struct device * d, struct device_attribute * a,
> + char * buffer)

Please no space between "struct foo *" and "d".

> + if (state == dev->current_state)
> + return 0;
> +
> + if (dev->poss_states[state])
> + ret = pci_set_power_state(dev, state);

So you just set the PCI power, without any coordination with driver?
How can this work?

> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -106,6 +106,7 @@ struct pci_dev {
> this if your device has broken DMA
> or supports 64-bit transfers. */
>
> + u32 poss_states[4];

It is boolean... Can we have int instead of u32?

Pavel

--
Thanks, Sharp!

2006-01-06 09:03:11

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

One more nit...

> + s += sprintf(s, "d0");
> + if (dev->poss_states[PCI_D1])
> + s += sprintf(s, " d1");
> + if (dev->poss_states[PCI_D2])
> + s += sprintf(s, " d2");
> + if (dev->poss_states[PCI_D3hot])
> + s += sprintf(s, " d3");
...
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -106,6 +106,7 @@ struct pci_dev {
> this if your device has broken DMA
> or supports 64-bit transfers. */
>
> + u32 poss_states[4];

So this probably should be poss_states[PCI_D3hot]; or something,
instead of explicit "4".
Pavel

--
Thanks, Sharp!

2006-01-06 14:35:35

by Tom Marshall

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

> > I have a firewire controller in a desktop system, and a ATI Radeon in a
> > T42 that support D1 and D2..
>
> Ok, now we have a concrete example. So Radeon supports D1. But putting
> radeon into D1 means you probably want to blank your screen and turn
> the backlight off; that takes *long* time anyway. So you can simply
> put your radeon into D3 and save a bit more power.
>
> So yes, Radeon supports D1, but we probably do not want to use it.

I understand this is a theoretical argument. However, in reality, a
significant number of T42 owners get less than 12 hours of battery life in
S3 suspend because the Radeon chip sucks a huge amount of power with the
current code. We would be eternally grateful if someone could figure out
why only some models are affected and, more importantly, submit a patch that
will reliably enter D2 (or D3 if it is supported?) for all Radeon 7500
chips. I've found a couple patches that were submitted to this list but,
for whatever reason, nobody seems to have found a solution that is
acceptable yet. I've been manually patching my kernels with code to enter
D2 for the last year or so, and from the volume of google results, it looks
like quite a lot of other folks are doing the same... :(

2006-01-06 15:00:09

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

Hi,

On Thu, Jan 05, 2006 at 05:37:30PM -0800, Patrick Mochel wrote:
> And, appended is a patch to export PM controls for PCI devices. The file
> "pm_possible_states" exports the states a device supports, and "pm_state"
> exports the current state (and provides the interface for entering a
> state).

Your patch doesn't handle the PM dependencies, unfortunately...

Thanks,
Dominik

2006-01-06 15:03:36

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, Jan 05, 2006 at 04:38:06PM -0800, Greg KH wrote:
> > NB: it will break one day, one way or another, when gregkh makes the
> > /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> > the new pcmciautils userspace too often...
>
> Why would the conversion that I'm working on break this userspace code?
> You are only using the device directory, which will not change at all.

Actually, pcmciautils uses both paths starting with
/sys/bus/pcmcia/devices/ and /sys/class/pcmcia_socket/pcmcia_socket%d/ --
and I was expecting that the latter path won't be available somewhen in
future?

Thanks,
Dominik

2006-01-06 15:42:26

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Thu, 5 Jan 2006, Patrick Mochel wrote:

> On Fri, 6 Jan 2006, Pavel Machek wrote:
>
> > On 05-01-06 16:04:07, Patrick Mochel wrote:
>
> > > A better point, and one that would actually be useful, would be to remove
> > > the file altogether. Let Dominik export a power file, with complete
> > > control over the values, for each pcmcia device. Then you never have to
> > > worry about breaking PCMCIA again.
> >
> > Fine with me.
>
> ACK, you beat me to it.
>
> And, appended is a patch to export PM controls for PCI devices. The file
> "pm_possible_states" exports the states a device supports, and "pm_state"
> exports the current state (and provides the interface for entering a
> state).
>
> Eventually, some drivers will want to fix up those values so that it can
> mask of states that it doesn't support, as well as offer possible device-
> specific states.
>
> What's interesting is that with this patch, I can see that two more
> devices on my system support D1 and D2 -- the cardbus controllers, which
> are actually bridges whose PM capabilities aren't exported via lspci.

This trend is extremely alarming!!

It's a very bad idea to make bus drivers export and manage the syfs power
interface. It means that lots of code gets repeated and different buses
do things differently.

Already we have PCI exporting "pm_possible_states" and "pm_state" while
PCMCIA exports "suspend". How many other different schemes are going to
crop up? How much bus-specific information will have to be built into a
user utility?

If possible states are represented as arrays of pointers to strings, then
the PM core can easily supply the sysfs interface. If Patrick's patch
were re-written so that the sysfs interface were moved into the PM core,
leaving only the PCI-specific portions in the PCI drivers, I would be much
happier. This would also mean that Dominik's patch could be replaced by
something a good deal smaller.

And it wouldn't hurt to add some mechanism for indicating which of the
possible states is the generic "suspend" state (usually D3 for PCI
devices, but not necessarily).

Alan Stern


2006-01-06 16:20:59

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On P? 06-01-06 06:34:14, Tom Marshall wrote:
> > > I have a firewire controller in a desktop system, and a ATI Radeon in a
> > > T42 that support D1 and D2..
> >
> > Ok, now we have a concrete example. So Radeon supports D1. But putting
> > radeon into D1 means you probably want to blank your screen and turn
> > the backlight off; that takes *long* time anyway. So you can simply
> > put your radeon into D3 and save a bit more power.
> >
> > So yes, Radeon supports D1, but we probably do not want to use it.
>
> I understand this is a theoretical argument. However, in reality, a
> significant number of T42 owners get less than 12 hours of battery life in
> S3 suspend because the Radeon chip sucks a huge amount of power with the
> current code. We would be eternally grateful if someone could figure out
> why only some models are affected and, more importantly, submit a patch that
> will reliably enter D2 (or D3 if it is supported?) for all Radeon 7500
> chips. I've found a couple patches that were submitted to this list but,
> for whatever reason, nobody seems to have found a solution that is
> acceptable yet. I've been manually patching my kernels with code to enter
> D2 for the last year or so, and from the volume of google results, it looks
> like quite a lot of other folks are doing the same... :(

We were talking runtime pm here.
Pavel

--
Thanks, Sharp!

2006-01-06 16:25:25

by Kay Sievers

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Fri, Jan 06, 2006 at 04:03:34PM +0100, Dominik Brodowski wrote:
> On Thu, Jan 05, 2006 at 04:38:06PM -0800, Greg KH wrote:
> > > NB: it will break one day, one way or another, when gregkh makes the
> > > /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> > > the new pcmciautils userspace too often...
> >
> > Why would the conversion that I'm working on break this userspace code?
> > You are only using the device directory, which will not change at all.
>
> Actually, pcmciautils uses both paths starting with
> /sys/bus/pcmcia/devices/ and /sys/class/pcmcia_socket/pcmcia_socket%d/ --
> and I was expecting that the latter path won't be available somewhen in
> future?

That will just be resolved by a symlink. The access path to the device will
stay the same, but the location of the "real" object will move to /devices.
All classification will point into the unified /devices tree but keep
the pathes as we have today. /sys/class will contain only symlinks in
the end.

Kay

2006-01-07 00:09:08

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On P? 06-01-06 10:42:24, Alan Stern wrote:
> On Thu, 5 Jan 2006, Patrick Mochel wrote:
> > On Fri, 6 Jan 2006, Pavel Machek wrote:
> > > On 05-01-06 16:04:07, Patrick Mochel wrote:
> >
> > > > A better point, and one that would actually be useful, would be to remove
> > > > the file altogether. Let Dominik export a power file, with complete
> > > > control over the values, for each pcmcia device. Then you never have to
> > > > worry about breaking PCMCIA again.
> > >
> > > Fine with me.
> >
> > ACK, you beat me to it.
> >
> > And, appended is a patch to export PM controls for PCI devices. The file
> > "pm_possible_states" exports the states a device supports, and "pm_state"
> > exports the current state (and provides the interface for entering a
> > state).
> >
> > Eventually, some drivers will want to fix up those values so that it can
> > mask of states that it doesn't support, as well as offer possible device-
> > specific states.
> >
> > What's interesting is that with this patch, I can see that two more
> > devices on my system support D1 and D2 -- the cardbus controllers, which
> > are actually bridges whose PM capabilities aren't exported via lspci.
>
> This trend is extremely alarming!!

It scares me a bit, too.

> It's a very bad idea to make bus drivers export and manage the syfs power
> interface. It means that lots of code gets repeated and different buses
> do things differently.
>
> Already we have PCI exporting "pm_possible_states" and "pm_state" while
> PCMCIA exports "suspend". How many other different schemes are going to
> crop up? How much bus-specific information will have to be built into a
> user utility?
>
> If possible states are represented as arrays of pointers to strings, then
> the PM core can easily supply the sysfs interface. If Patrick's patch
> were re-written so that the sysfs interface were moved into the PM core,
> leaving only the PCI-specific portions in the PCI drivers, I would be much
> happier. This would also mean that Dominik's patch could be replaced by
> something a good deal smaller.
>
> And it wouldn't hurt to add some mechanism for indicating which of the
> possible states is the generic "suspend" state (usually D3 for PCI
> devices, but not necessarily).

I think we should start with string-based interface, with just two
states ("on" and "off"). That is easily extensible into future, and
suits current PCMCIA nicely. It also allows us to experiment with PCI
power management... I can cook up a patch, but it will be simple
reintroduction of .../power file under different name.
Pavel
--
Thanks, Sharp!

2006-01-07 03:19:09

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Sat, 7 Jan 2006, Pavel Machek wrote:

> > This trend is extremely alarming!!
>
> It scares me a bit, too.

> I think we should start with string-based interface, with just two
> states ("on" and "off"). That is easily extensible into future, and
> suits current PCMCIA nicely. It also allows us to experiment with PCI
> power management... I can cook up a patch, but it will be simple
> reintroduction of .../power file under different name.

That sounds good to me, provided you add to dev_pm_info a pointer to an
array of string pointers, the names of the available states. For now all
devices can share a single array with just two entries ("on" and "off",
or even "0" and "2"). In the future, bus drivers will be able to
substitute a pointer to a private array with device-specific state names.

Of course, this change will require you to add something like

const char *name;

to the pm_message_t struct. PM-aware drivers will be able to use it to
see which state was requested, and other drivers can ignore it.

In fact, for PM-aware drivers the name field makes the event field
unnecessary. But until all drivers are PM-aware we can't get rid of it.

Alan Stern

2006-01-07 05:45:27

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Fri, Jan 06, 2006 at 09:59:21AM +0100, Pavel Machek wrote:
> On ??t 05-01-06 17:37:30, Patrick Mochel wrote:
> > +static ssize_t pm_possible_states_show(struct device * d,
> > + struct device_attribute * a,
> > + char * buffer)
> > +{
> > + struct pci_dev * dev = to_pci_dev(d);
> > + char * s = buffer;
> > +
> > + s += sprintf(s, "d0");
> > + if (dev->poss_states[PCI_D1])
> > + s += sprintf(s, " d1");
> > + if (dev->poss_states[PCI_D2])
> > + s += sprintf(s, " d2");
> > + if (dev->poss_states[PCI_D3hot])
> > + s += sprintf(s, " d3");
>
> Could we use same states as rest of code, i.e. "D2" instead of "d2"
> and "D3hot" instead of "d3"?

"D3hot" and "D3cold" most likely going to be removed and replaced with just
"D3" in my next set of changes. Although this distinction is mentioned in
the spec, d3cold cannot actually be specified. Rather, D3cold is entered when
the parent bridge is powered off.

Thanks,
Adam

2006-01-07 05:55:42

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Fri, Jan 06, 2006 at 04:00:05PM +0100, Dominik Brodowski wrote:
> Hi,
>
> On Thu, Jan 05, 2006 at 05:37:30PM -0800, Patrick Mochel wrote:
> > And, appended is a patch to export PM controls for PCI devices. The file
> > "pm_possible_states" exports the states a device supports, and "pm_state"
> > exports the current state (and provides the interface for entering a
> > state).
>
> Your patch doesn't handle the PM dependencies, unfortunately...
>
> Thanks,
> Dominik

Physical power dependencies are a very bus and platform specific matter.
I don't expect to see much core level help or infustructure. I think in
the PCI case, we would first require PCI bus power management support.
This feature is currently nonexistent. As I understand, PCI express links
are capable of handling much of this at the hardware level, so it's not
especially important for newer hardware.

At the moment, I think the biggest X86 pm dependency issue is ACPI power
resources.

Logical dependencies might be another matter. For example, if we have a
bus-instance structure at the driver core level, it may make sense to
call some sort of ->prepare_for_children mechansim before starting up a
child device.

Thanks,
Adam

2006-01-07 07:39:20

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Fri, Jan 06, 2006 at 10:42:24AM -0500, Alan Stern wrote:
> On Thu, 5 Jan 2006, Patrick Mochel wrote:
>
> > On Fri, 6 Jan 2006, Pavel Machek wrote:
> >
> > > On 05-01-06 16:04:07, Patrick Mochel wrote:
> >
> > > > A better point, and one that would actually be useful, would be to remove
> > > > the file altogether. Let Dominik export a power file, with complete
> > > > control over the values, for each pcmcia device. Then you never have to
> > > > worry about breaking PCMCIA again.
> > >
> > > Fine with me.
> >
> > ACK, you beat me to it.
> >
> > And, appended is a patch to export PM controls for PCI devices. The file
> > "pm_possible_states" exports the states a device supports, and "pm_state"
> > exports the current state (and provides the interface for entering a
> > state).
> >
> > Eventually, some drivers will want to fix up those values so that it can
> > mask of states that it doesn't support, as well as offer possible device-
> > specific states.
> >
> > What's interesting is that with this patch, I can see that two more
> > devices on my system support D1 and D2 -- the cardbus controllers, which
> > are actually bridges whose PM capabilities aren't exported via lspci.
>
> This trend is extremely alarming!!
>
> It's a very bad idea to make bus drivers export and manage the syfs power
> interface. It means that lots of code gets repeated and different buses
> do things differently.

In my opinion, the vast and often fundamentally different power
management specifications contribute greatly to the problem of
coordinated operating system controlled power management. ACPI has
defined D0 - D3, and frankly, on x86 platforms, limiting the core interface
to those four states can be very functional. Of course this isn't
pratical for the Linux PM layer because there several other important
platforms. With that in mind, any generic representation of power
states has a tendency to be either overly complex or unacceptably limiting.

Considering these factors, I think allowing each bus to define its own
power management states and capabilities is a sensible option. However,
I'm not convinced that it is necessary for these bus specific interfaces
to provide direct control of a device's power management states in most
situations. That's not to say that some platforms won't need this
functionality but rather that PCI, USB, ACPI, and many others may not
want to provide userspace control of these low-level details.

As an alternative, it might be possible to allow each driver to export a
list of runtime power management states. These states might revolve around
high level device class definitions rather than bus and platform interfaces.
The mechanism of reading and controlling these states could be similar to
the one you previously proposed for bus-level states.

As an example, a typical ethernet driver might export "on" and "off".
It doesn't matter if the ethernet device is PCI, ACPI, USB, etc. The
key matter is that, for the "net" device class, most drivers will want
to providee "on" and "off" as they correspond to "up" and "down". For the
PCI case, "off" will mean the highest (most off) D-state capable of
supporting the user's current wake settings. This might be D2 if link
detect is enabled or D3 if it is disabled. The actual PCI state can
be changed by the driver at any time, but the driver level state
dictates the drivers current intentions ("off" meaning save as much
power as would be possible while satisfying constraints).

Most sound card drivers will probably have more complex states.
They might be "on", "sleep", and "off". "sleep" could be invoked as a
low latency state when the input and output lines have been quiet for
a certain uesr specified timeout period. "off" could be be much higher
latency (some output might be lost i.e. skipping) and only invoked when
the audio interface has been closed from the userspace end. Once again,
these states are not required to have a direct relation to bus level states.
A PCI sound card might remain in "D0" during the "sleep" state but turn off
many sub-components of the card and still save some power.

I think runtime power management is really all about what functionality
the drivers are willing provide. If we focus on presenting bus-level power
management capabilities under a unified interface, then at best we are
ignoring the various subtleties of each specification (even ACPI and PCI
have minor differences), and at worst we're preventing drivers from revealing
the states that are actually important. In other words, power management
can also be seen as a behavior, not just a power level. Afterall, even
devices without bus level PM suport can save power just by doing things like
stopping DMA. Even "virtual" devices can be seen as power-managable.

In short I'm suggesting the following:

1.) Every bus and device has its own unique PM mechanisms and specifications.
Representing this in a single unified model of any sort is nearly impossible.
Therefore, it may be best to allow each bus to define its own PM
infustructure and sysfs files (perhaps in a way similar to Pat's recent
patch).

2.) Device drivers on the other hand exist at a more abstract level and,
as a result, we have greater flexability and more options. Therefore, I
think this is an excellent place to define power states and driver core PM
infustructure.

3.) System suspend and runtime power management are not even close to
similar. Trying to use the same ->suspend and ->resume API is
ridiculious because it prevents intermediate power states and doesn't
properly perpare devices and device classes for a runtime environment.
Therefore, I'm in favor of a seperate interface tailored specifically for
runtime power management.

4.) If we're going to make any meaningful progress, we need to also
focus on device classes and class orriented power policy. For example,
the "net" device class should provide infustructure and helper functions
for runtime power management of that flavor. This might include some
generic "net" PM sysfs files.

Thanks,
Adam

2006-01-07 07:56:25

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Sat, Jan 07, 2006 at 01:08:26AM +0100, Pavel Machek wrote:
> On P? 06-01-06 10:42:24, Alan Stern wrote:
> > On Thu, 5 Jan 2006, Patrick Mochel wrote:
> > It's a very bad idea to make bus drivers export and manage the syfs power
> > interface. It means that lots of code gets repeated and different buses
> > do things differently.
> >
> > Already we have PCI exporting "pm_possible_states" and "pm_state" while
> > PCMCIA exports "suspend". How many other different schemes are going to
> > crop up? How much bus-specific information will have to be built into a
> > user utility?
> >
> > If possible states are represented as arrays of pointers to strings, then
> > the PM core can easily supply the sysfs interface. If Patrick's patch
> > were re-written so that the sysfs interface were moved into the PM core,
> > leaving only the PCI-specific portions in the PCI drivers, I would be much
> > happier. This would also mean that Dominik's patch could be replaced by
> > something a good deal smaller.
> >
> > And it wouldn't hurt to add some mechanism for indicating which of the
> > possible states is the generic "suspend" state (usually D3 for PCI
> > devices, but not necessarily).
>
> I think we should start with string-based interface, with just two
> states ("on" and "off"). That is easily extensible into future, and
> suits current PCMCIA nicely. It also allows us to experiment with PCI
> power management... I can cook up a patch, but it will be simple
> reintroduction of .../power file under different name.

The driver core can provide some infustructure, but let's leave the states
up to the drivers. Afterall, some drivers might only be interested in "on"
during runtime. Also, drivers might support some sort of partial-off
but not "off".

And no, this does not allow us to experiment with PCI power management.
The runtime PM in drivers and some aspects of the PCI PM code (e.g. PME events
and saving/restoring device context) are terminally broken. We really need a
good foundation for each bus PM spec. There's a lot more to it than
just setting some bit in the PCI config space.

Also there's nothing "runtime" about the PCMCIA PM API. It's much more
like calling ->remove() as it disabled the device all together. I'm more
interested in saving power without crippling functionality.

Thanks,
Adam

2006-01-07 08:33:36

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Wed, Jan 04, 2006 at 10:34:05PM +0100, Pavel Machek wrote:
> On ?t 27-12-05 20:22:04, Patrick Mochel wrote:
> We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> and I'm not sure about those "D1" and "D2" parts. Userspace should not
> have to know about details, it will mostly use "on"/"suspend" anyway.
>
> > > One day, when we find device that needs it, we may want to add more
> > > states. I don't know about such device currently.
> >
> > There are many devices already do - there are PCI, PCI-X, PCI Express,
> > ACPI devices, etc that do. But, you simply cannot create a single
> > decent
>
> I asked for an example.

Look at the ACPI spec, it has several examples...

1.) most sound cards have more than two states. (once again latency over
power savings trade offs)
2.) many PCI devices with wake support use different D-levels depending
on wake settings
3.) PCI buses have B0, B1, B2, and B3 (and yes these are very different
from D-states). This catagory also includes cardbus bridges.
4.) IDE hard drives and other storage media have "sleep", "suspend",
etc.
5.) SATA controllers have more states than just "on" and "off". Also
these states are independent of the PCI d-states.
6.) many video cards implement D1 and D2 as you've already seen. This
is often more a matter of "we only know how to restore from such and such
states"
7.) Many processors support of wealth of different power states

This list is not exhaustive. Also, embedded platforms probably have
serveral more examples.

Thanks,
Adam

2006-01-07 10:20:59

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On So 07-01-06 02:58:51, Adam Belay wrote:
> On Sat, Jan 07, 2006 at 01:08:26AM +0100, Pavel Machek wrote:
> > On P? 06-01-06 10:42:24, Alan Stern wrote:
> > > On Thu, 5 Jan 2006, Patrick Mochel wrote:
> > > It's a very bad idea to make bus drivers export and manage the syfs power
> > > interface. It means that lots of code gets repeated and different buses
> > > do things differently.
> > >
> > > Already we have PCI exporting "pm_possible_states" and "pm_state" while
> > > PCMCIA exports "suspend". How many other different schemes are going to
> > > crop up? How much bus-specific information will have to be built into a
> > > user utility?
> > >
> > > If possible states are represented as arrays of pointers to strings, then
> > > the PM core can easily supply the sysfs interface. If Patrick's patch
> > > were re-written so that the sysfs interface were moved into the PM core,
> > > leaving only the PCI-specific portions in the PCI drivers, I would be much
> > > happier. This would also mean that Dominik's patch could be replaced by
> > > something a good deal smaller.
> > >
> > > And it wouldn't hurt to add some mechanism for indicating which of the
> > > possible states is the generic "suspend" state (usually D3 for PCI
> > > devices, but not necessarily).
> >
> > I think we should start with string-based interface, with just two
> > states ("on" and "off"). That is easily extensible into future, and
> > suits current PCMCIA nicely. It also allows us to experiment with PCI
> > power management... I can cook up a patch, but it will be simple
> > reintroduction of .../power file under different name.
>
> The driver core can provide some infustructure, but let's leave the states
> up to the drivers. Afterall, some drivers might only be interested in "on"
> during runtime. Also, drivers might support some sort of partial-off
> but not "off".

Then they map "off" to "as much off as I can". Big deal.

> And no, this does not allow us to experiment with PCI power
> management.

Well, suse was already doing experiments with 2.6.14 sysfs...

> Also there's nothing "runtime" about the PCMCIA PM API. It's much more
> like calling ->remove() as it disabled the device all together.

It looks enough runtime to me.

> I'm more
> interested in saving power without crippling functionality.

You are on wrong mailing list, then. Seriously. If you can save power
without affecting functionality, then just doing, you don't need any
userland interface.
Pavel

--
Thanks, Sharp!

2006-01-07 10:26:22

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On So 07-01-06 03:36:02, Adam Belay wrote:
> On Wed, Jan 04, 2006 at 10:34:05PM +0100, Pavel Machek wrote:
> > On ?t 27-12-05 20:22:04, Patrick Mochel wrote:
> > We want _common_ values, anyway. So, we do not want "D0", "D1", "D2",
> > "D3hot" in PCI cases. We probably want "on", "D1", "D2", "suspend",
> > and I'm not sure about those "D1" and "D2" parts. Userspace should not
> > have to know about details, it will mostly use "on"/"suspend" anyway.
> >
> > > > One day, when we find device that needs it, we may want to add more
> > > > states. I don't know about such device currently.
> > >
> > > There are many devices already do - there are PCI, PCI-X, PCI Express,
> > > ACPI devices, etc that do. But, you simply cannot create a single
> > > decent
> >
> > I asked for an example.
>
> Look at the ACPI spec, it has several examples...
>
> 1.) most sound cards have more than two states. (once again latency over
> power savings trade offs)

What is the latency in typical "most sound card" case?

> 2.) many PCI devices with wake support use different D-levels depending
> on wake settings

...can be done internally in driver.

> 4.) IDE hard drives and other storage media have "sleep", "suspend",
> etc.

Yep; but spindown takes 5 seconds, so if you need to reset ide bus or
not to get it back is driver detail. Plus notice how power consuption
in sleep and suspend is almost same; motor not running is big deal
there. Ouch and hdparm already handles these.

> 5.) SATA controllers have more states than just "on" and "off". Also
> these states are independent of the PCI d-states.

...so "bus provides list of states" ideas do not really work.

> 6.) many video cards implement D1 and D2 as you've already seen. This
> is often more a matter of "we only know how to restore from such and such
> states"

Excatly, so "on"/"off" is enough for them.

> 7.) Many processors support of wealth of different power states

Processors are handled specially, anyway.
Pavel
--
Thanks, Sharp!

2006-01-07 12:43:16

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Sat, Jan 07, 2006 at 11:25:54AM +0100, Pavel Machek wrote:
> > > I asked for an example.
> >
> > Look at the ACPI spec, it has several examples...
> >
> > 1.) most sound cards have more than two states. (once again latency over
> > power savings trade offs)
>
> What is the latency in typical "most sound card" case?

ACPI requires within 100ms for D1 and D2. D3 can be longer.

> > 4.) IDE hard drives and other storage media have "sleep", "suspend",
> > etc.
>
> Yep; but spindown takes 5 seconds, so if you need to reset ide bus or
> not to get it back is driver detail. Plus notice how power consuption
> in sleep and suspend is almost same; motor not running is big deal
> there. Ouch and hdparm already handles these.

5 seconds is more of an upper bound. My laptop hardrive can do spindown
and spinup in around 1 second. hdparm doesn't handle the deeper suspend
case (it can enter but not recover). Kernel level support is required.

>
> > 6.) many video cards implement D1 and D2 as you've already seen. This
> > is often more a matter of "we only know how to restore from such and such
> > states"
>
> Excatly, so "on"/"off" is enough for them.

Not really. The ACPI spec suggests latency requirements for each state.
If we had better drivers, we might enter a lower latency video card
state when the screen is blanked that's not quite an "off" state.

The ACPI spec also defines intermediate states for input devices,
various display classes, and modems.

-Adam

2006-01-07 13:04:23

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Sat, Jan 07, 2006 at 11:20:13AM +0100, Pavel Machek wrote:
> On So 07-01-06 02:58:51, Adam Belay wrote:
> > The driver core can provide some infustructure, but let's leave the states
> > up to the drivers. Afterall, some drivers might only be interested in "on"
> > during runtime. Also, drivers might support some sort of partial-off
> > but not "off".
>
> Then they map "off" to "as much off as I can". Big deal.

Yes, but it may not be the kernel's place to make this generalization.

>
> > And no, this does not allow us to experiment with PCI power
> > management.
>
> Well, suse was already doing experiments with 2.6.14 sysfs...

Sure one can see how much power is saved by entering D3, but than can
even be done through userspace. I think it's more important to
experiment with schemes that will actually be useful and reliable to the
end user.

>
> > Also there's nothing "runtime" about the PCMCIA PM API. It's much more
> > like calling ->remove() as it disabled the device all together.
>
> It looks enough runtime to me.

As was already discussed, we don't want to modify every userspace
application to check if the device it needs is "on" (resumed) or
"off" (suspended). It's just two painful with third party apps etc.
Therefore, the kernel needs to handle this more seemlessly. In my
opinion, this is what runtime power management is all about, saving
power without getting in the user's way.

>
> > I'm more
> > interested in saving power without crippling functionality.
>
> You are on wrong mailing list, then. Seriously. If you can save power
> without affecting functionality, then just doing, you don't need any
> userland interface.

Cool it, affecting functionality and disabling it are not the same
thing. I'm fairly certain this mailing list is interested in exploring
more than suspend/resume (i.e. "on" and "off"), even if it can be hacked
to happen during runtime.

This issue depends on the platform and device class. Some device
classes will have more user initiated power transitions than others.
With that in mind, even drivers with exclusively kernel-space control
will want to export knobs and tunables for timeout values etc. A
userland interface if important regardless of kernel-side
implementation.

-Adam

2006-01-07 15:24:52

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

On Sat, 7 Jan 2006, Adam Belay wrote:

> In short I'm suggesting the following:
>
> 1.) Every bus and device has its own unique PM mechanisms and specifications.
> Representing this in a single unified model of any sort is nearly impossible.
> Therefore, it may be best to allow each bus to define its own PM
> infustructure and sysfs files (perhaps in a way similar to Pat's recent
> patch).

Bus and/or driver. However even though the mechanisms and specifications
are all different, that doesn't mean the user interfaces have to be
different. Using simple character string names (like Pat did and like I
did several months ago) should be general enough to work everywhere.

Similarly, the sysfs files themselves should be uniform (even if their
contents aren't). The core should provide the user interface mechanism
and infrastructure, while leaving it up to the bus and the device drivers
to interpret the meanings of the strings.

> 2.) Device drivers on the other hand exist at a more abstract level and,
> as a result, we have greater flexability and more options. Therefore, I
> think this is an excellent place to define power states and driver core PM
> infustructure.

It's a good place to define and implement power states. But the driver
core PM infrastructure obviously should be defined and implemented in the
core, not in the drivers.

> 3.) System suspend and runtime power management are not even close to
> similar. Trying to use the same ->suspend and ->resume API is
> ridiculious because it prevents intermediate power states and doesn't
> properly perpare devices and device classes for a runtime environment.
> Therefore, I'm in favor of a seperate interface tailored specifically for
> runtime power management.

Personally I don't care much one way or another. The APIs can be kept
separate or overlapped somehow; it doesn't really matter. Just so long as
it's done the same way in all drivers and in the core.

> 4.) If we're going to make any meaningful progress, we need to also
> focus on device classes and class orriented power policy. For example,
> the "net" device class should provide infustructure and helper functions
> for runtime power management of that flavor. This might include some
> generic "net" PM sysfs files.

Can you think of device classes other than "net" requiring this special
approach? Network interfaces already get special treatment in some ways
(for example, they don't have entries in /dev). This could just be
another form of special treatment for them.

Alan Stern

2006-01-07 19:16:05

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

> > > Also there's nothing "runtime" about the PCMCIA PM API. It's much more
> > > like calling ->remove() as it disabled the device all together.
> >
> > It looks enough runtime to me.
>
> As was already discussed, we don't want to modify every userspace
> application to check if the device it needs is "on" (resumed) or
> "off" (suspended). It's just two painful with third party apps etc.
> Therefore, the kernel needs to handle this more seemlessly. In my

But we do not want to reactivate device on first access. Certainly not
in PCMCIA case. Reactivation is separate problem.

Pavel

--
Thanks, Sharp!

2006-01-07 19:22:20

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface


> > > 1.) most sound cards have more than two states. (once again latency over
> > > power savings trade offs)
> >
> > What is the latency in typical "most sound card" case?
>
> ACPI requires within 100ms for D1 and D2. D3 can be longer.

Spec is useless here. 1usec is <100msec. Do you know any
soundcard that takes >10msec from D3?
>
> > > 4.) IDE hard drives and other storage media have "sleep", "suspend",
> > > etc.
> >
> > Yep; but spindown takes 5 seconds, so if you need to reset ide bus or
> > not to get it back is driver detail. Plus notice how power consuption
> > in sleep and suspend is almost same; motor not running is big deal
> > there. Ouch and hdparm already handles these.
>
> 5 seconds is more of an upper bound. My laptop hardrive can do spindown
> and spinup in around 1 second. hdparm doesn't handle the deeper suspend
> case (it can enter but not recover). Kernel level support is required.

...but this has very little to do with framework we are trying to
build.
(and no support is needed for hdparm -y).

> > > 6.) many video cards implement D1 and D2 as you've already seen. This
> > > is often more a matter of "we only know how to restore from such and such
> > > states"
> >
> > Excatly, so "on"/"off" is enough for them.
>
> Not really. The ACPI spec suggests latency requirements for each state.
> If we had better drivers, we might enter a lower latency video card
> state when the screen is blanked that's not quite an "off" state.

Backlight takes ages, anyway.


--
Thanks, Sharp!

2006-01-09 20:10:38

by Dominik Brodowski

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

Hi,

On Fri, Jan 06, 2006 at 05:25:16PM +0100, Kay Sievers wrote:
> On Fri, Jan 06, 2006 at 04:03:34PM +0100, Dominik Brodowski wrote:
> > On Thu, Jan 05, 2006 at 04:38:06PM -0800, Greg KH wrote:
> > > > NB: it will break one day, one way or another, when gregkh makes the
> > > > /sys/class -> /sys/devices conversion. However, I'd want to try not to break
> > > > the new pcmciautils userspace too often...
> > >
> > > Why would the conversion that I'm working on break this userspace code?
> > > You are only using the device directory, which will not change at all.
> >
> > Actually, pcmciautils uses both paths starting with
> > /sys/bus/pcmcia/devices/ and /sys/class/pcmcia_socket/pcmcia_socket%d/ --
> > and I was expecting that the latter path won't be available somewhen in
> > future?
>
> That will just be resolved by a symlink. The access path to the device will
> stay the same, but the location of the "real" object will move to /devices.
> All classification will point into the unified /devices tree but keep
> the pathes as we have today. /sys/class will contain only symlinks in
> the end.

OK, thanks for this clarification!

Dominik

2006-01-13 19:55:51

by Takashi Iwai

[permalink] [raw]
Subject: Re: [linux-pm] [patch] pm: fix runtime powermanagement's /sys interface

At Fri, 6 Jan 2006 04:24:22 +0000,
Pavel Machek wrote:
>
>
> > > > 1.) most sound cards have more than two states. (once again latency over
> > > > power savings trade offs)
> > >
> > > What is the latency in typical "most sound card" case?
> >
> > ACPI requires within 100ms for D1 and D2. D3 can be longer.
>
> Spec is useless here. 1usec is <100msec. Do you know any
> soundcard that takes >10msec from D3?

In the case of D3, the driver needs to initialize fully, and a latency
longer than 10ms may easily happen espeically with HZ=100.

Most of codec chips have powersave modes (disable converters and
amps), so having a "powersave" state makes sense, not only "on" and
"suspend".


Takashi