2006-12-19 19:15:41

by Matthew Garrett

[permalink] [raw]
Subject: Changes to sysfs PM layer break userspace

Commit 047bda36150d11422b2c7bacca1df324c909c0b3 broke userspace.
Previously, /sys/bus/pci/devices/foo/power/state could have values
echoed into it for triggering suspend/resume calls in the driver. The
breakage is handily mentioned in the comment:

"Devices with bus.suspend_late(), or bus.resume_early() methods fail
this operation; those methods couldn't be called."

but there's no mention of what previously working code is supposed to do
now. That's the second time in the past year or so that this interface
has been broken - can we have it working again, please, especially as
there doesn't appear to be an alternative yet?

--
Matthew Garrett | [email protected]


2006-12-19 19:35:00

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 2006-12-19 at 18:52 +0000, Matthew Garrett wrote:
> Commit 047bda36150d11422b2c7bacca1df324c909c0b3 broke userspace.
> Previously, /sys/bus/pci/devices/foo/power/state could have values
> echoed into it for triggering suspend/resume calls in the driver. The
> breakage is handily mentioned in the comment:
>
> "Devices with bus.suspend_late(), or bus.resume_early() methods fail
> this operation; those methods couldn't be called."
>
> but there's no mention of what previously working code is supposed to do
> now. That's the second time in the past year or so that this interface
> has been broken - can we have it working again, please, especially as
> there doesn't appear to be an alternative yet?


which userspace is using this btw?


2006-12-19 19:44:26

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, Dec 19, 2006 at 08:34:48PM +0100, Arjan van de Ven wrote:

> which userspace is using this btw?

Ubuntu uses it to disable wireless hardware under certain circumstances.
I believe that Suse's powernowd uses it to power down wired ethernet
hardware when it's not in use.

--
Matthew Garrett | [email protected]

2006-12-19 20:03:25

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 2006-12-19 at 19:44 +0000, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 08:34:48PM +0100, Arjan van de Ven wrote:
>
> > which userspace is using this btw?
>
> Ubuntu uses it to disable wireless hardware under certain circumstances.
> I believe that Suse's powernowd uses it to power down wired ethernet
> hardware when it's not in use.

humm shouldn't the driver do this when the interface is brought down?
sounds like you're playing with fire to do this behind the drivers'
back....

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-19 20:08:16

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, Dec 19, 2006 at 09:03:21PM +0100, Arjan van de Ven wrote:

> humm shouldn't the driver do this when the interface is brought down?
> sounds like you're playing with fire to do this behind the drivers'
> back....

I'm not sure. Suspending the chip means you lose things like link beat
detection, so it's not something you necessarily want to automatically
tie to something like interface status. Some chips support more
fine-grained power management, so we could do something more sensible in
that case - but right now, there doesn't seem to be a lot of driver
support for it.

--
Matthew Garrett | [email protected]

2006-12-19 20:23:09

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 2006-12-19 at 20:08 +0000, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 09:03:21PM +0100, Arjan van de Ven wrote:
>
> > humm shouldn't the driver do this when the interface is brought down?
> > sounds like you're playing with fire to do this behind the drivers'
> > back....
>
> I'm not sure. Suspending the chip means you lose things like link beat
> detection, so it's not something you necessarily want to automatically
> tie to something like interface status.

right now the "spec" for Linux network drivers assumes that you put the
NIC into D3 on down, except for cases where Wake-on-Lan is enabled etc.

> Some chips support more
> fine-grained power management, so we could do something more sensible in
> that case - but right now, there doesn't seem to be a lot of driver
> support for it.

sounds like that's the right approach at least .. not talking to the PCI
hardware directly from userspace...

I can see the point of having more than just "UP" and "DOWN" as
interface states; "UP", "DOWN" and "OFF" for example...


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-19 20:33:05

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, Dec 19, 2006 at 09:23:05PM +0100, Arjan van de Ven wrote:
> On Tue, 2006-12-19 at 20:08 +0000, Matthew Garrett wrote:
> > I'm not sure. Suspending the chip means you lose things like link beat
> > detection, so it's not something you necessarily want to automatically
> > tie to something like interface status.
>
> right now the "spec" for Linux network drivers assumes that you put the
> NIC into D3 on down, except for cases where Wake-on-Lan is enabled etc.

Really? I can't find any drivers that seem to do this. The only calls to
pci_set_power_state seem to be in the suspend, resume, init and exit
routines.

> > Some chips support more
> > fine-grained power management, so we could do something more sensible in
> > that case - but right now, there doesn't seem to be a lot of driver
> > support for it.
>
> sounds like that's the right approach at least .. not talking to the PCI
> hardware directly from userspace...

I'd certainly agree that that's the right thing to do, but userspace has
a habit of using whatever functionality /is/ available to get to a given
end. The semantics of the device/power/state file were never made
terribly clear, and it did have the desired effect of suspending the
device. Removing it without providing warning or a transition pathway is
a pain.

> I can see the point of having more than just "UP" and "DOWN" as
> interface states; "UP", "DOWN" and "OFF" for example...

Agreed.

--
Matthew Garrett | [email protected]

2006-12-19 20:55:47

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 2006-12-19 at 20:32 +0000, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 09:23:05PM +0100, Arjan van de Ven wrote:
> > On Tue, 2006-12-19 at 20:08 +0000, Matthew Garrett wrote:
> > > I'm not sure. Suspending the chip means you lose things like link beat
> > > detection, so it's not something you necessarily want to automatically
> > > tie to something like interface status.
> >
> > right now the "spec" for Linux network drivers assumes that you put the
> > NIC into D3 on down, except for cases where Wake-on-Lan is enabled etc.
>
> Really? I can't find any drivers that seem to do this. The only calls to
> pci_set_power_state seem to be in the suspend, resume, init and exit
> routines.

your grep missed tg3.c for example, which has a wrapper around the power
state code and goes to D3hot on downing of the interface. (just the
first one I looked at as a "reference driver", others probably do the
same thing)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-19 21:22:18

by David Brownell

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tuesday 19 December 2006 10:52 am, Matthew Garrett wrote:
> Commit 047bda36150d11422b2c7bacca1df324c909c0b3 broke userspace.

Actually, no ... that just prevented breakage enabled by
commit cbd69dbbf1adfce6e048f15afc8629901ca9dae5 which taught
PCI how to use the new irqs-off suspend_late()/resume_early()
driver model calls.


> Previously, /sys/bus/pci/devices/foo/power/state could have values
> echoed into it for triggering suspend/resume calls in the driver.

In general, it still can; though not for PCI, because of the change
I pointed out above. What the patch you mentioned changes is something
else: it refuses to do so when those calls should require leaving
the system in an IRQs-off mode.

Rather obviously, IRQs-off is fine for entering system-wide suspend
states. But Linux can't stay in that mode for normal operations ...


> The
> breakage is handily mentioned in the comment:
>
> "Devices with bus.suspend_late(), or bus.resume_early() methods fail
> this operation; those methods couldn't be called."

And the reason they couldn't be called is: that they guarantee IRQs
would stay off between suspend_late() and resume_early().


> but there's no mention of what previously working code is supposed to do
> now.

Stop trying to use broken and deprecated APIs; and realize that "previously
working" meant you just hadn't tripped over the serious bugs yet.

Work with driver framework developers to come up with a solution for
the _real_ problem, which IMO will look more like "teach <x> stack about
power management" than "bypass all the driver layers and go right to a
PCI-specific mechanism, even for non-PCI drivers".


>> Ubuntu uses it to disable wireless hardware under certain circumstances.
>> I believe that Suse's powernowd uses it to power down wired ethernet
>> hardware when it's not in use.

Drivers can and should know how to do that sort of stuff themselves,
so the power savings are reliable and consistent no matter what user
space tools are (or aren't) used.

Drivers know how to get power savings a lot better than any userspace
code ever could ... with the exception of hints like "ifdown eth0"
letting the driver know that right now is a good time to power down
almost everything, since it's not going to be used until "ifup".
(Agreed that other hints may be desirable, but that's a different
issue ... probably best addressed at the framework level, e.g. WLAN,
rather than by hacks to individual drivers.)


> That's the second time in the past year or so that this interface
> has been broken - can we have it working again, please, especially as
> there doesn't appear to be an alternative yet?

As a generic mechanism, that interface has *ALWAYS* been "broken
by design"; I'd call it unfixable. It's deprecated, and scheduled
to vanish; see Documentation/feature-removal-schedule.txt ...

- Dave

2006-12-19 21:34:53

by David Brownell

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tuesday 19 December 2006 12:32 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 09:23:05PM +0100, Arjan van de Ven wrote:

> > right now the "spec" for Linux network drivers assumes that you put the
> > NIC into D3 on down, except for cases where Wake-on-Lan is enabled etc.
>
> Really? I can't find any drivers that seem to do this. The only calls to
> pci_set_power_state seem to be in the suspend, resume, init and exit
> routines.

More drivers should do this, even though not many do so right now. In the
ideal case, the PHY can be active for link detection without the rest of
the adapter being powered up...


> > > Some chips support more
> > > fine-grained power management, so we could do something more sensible in
> > > that case - but right now, there doesn't seem to be a lot of driver
> > > support for it.
> >
> > sounds like that's the right approach at least .. not talking to the PCI
> > hardware directly from userspace...
>
> I'd certainly agree that that's the right thing to do, but userspace has
> a habit of using whatever functionality /is/ available to get to a given
> end. The semantics of the device/power/state file were never made
> terribly clear, and it did have the desired effect of suspending the
> device. Removing it without providing warning or a transition pathway is
> a pain.

Documentation/feature-removal-schedule.txt has warned about this since
August, and the PM list has discussed how broken that model is numerous
times over the past several years. (I'm pretty sure that discussion has
leaked out to LKML on occasion.) It shouldn't be news today.


> > I can see the point of having more than just "UP" and "DOWN" as
> > interface states; "UP", "DOWN" and "OFF" for example...
>
> Agreed.

More than one driver stack probably needs to start paying attention to
its power management models. I think Arjan is right about the basic
mindset of "down == off" for network drivers. If there are cases where
that doesn't work, those can start driving improvements.

- Dave

2006-12-19 22:57:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, Dec 19, 2006 at 01:22:12PM -0800, David Brownell wrote:

> Stop trying to use broken and deprecated APIs; and realize that "previously
> working" meant you just hadn't tripped over the serious bugs yet.

I'm happy to stop using broken and deprecated APIs, providing that
there's *actually a replacement*.

> Drivers can and should know how to do that sort of stuff themselves,
> so the power savings are reliable and consistent no matter what user
> space tools are (or aren't) used.

To the extent that that's possible, I entirely agree - however, it
clearly isn't always. Not all hardware can be powered down without
losing functionality, and so in those cases it's important that there be
a mechanism to allow that decision to be made by userspace.

> Drivers know how to get power savings a lot better than any userspace
> code ever could ... with the exception of hints like "ifdown eth0"
> letting the driver know that right now is a good time to power down
> almost everything, since it's not going to be used until "ifup".

But in the cases where you don't have fine grained power management in
the hardware, it's still desirable to be able to suspend an unused
network agent. The driver can't do it by default, because that would
break existing behaviour.

> As a generic mechanism, that interface has *ALWAYS* been "broken
> by design"; I'd call it unfixable. It's deprecated, and scheduled
> to vanish; see Documentation/feature-removal-schedule.txt ...

The fact that something is scheduled to be removed in July 2007 does
*not* mean it's acceptable to break it in 2006. We need to find a way to
fix this functionality in the meantime.

--
Matthew Garrett | [email protected]

2006-12-19 23:36:32

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 01:22:12PM -0800, David Brownell wrote:

> > As a generic mechanism, that interface has *ALWAYS* been "broken
> > by design"; I'd call it unfixable. It's deprecated, and scheduled
> > to vanish; see Documentation/feature-removal-schedule.txt ...
>
> The fact that something is scheduled to be removed in July 2007 does
> *not* mean it's acceptable to break it in 2006. We need to find a way to
> fix this functionality in the meantime.

The disconnect here is analagous to: I tell you the alleged perpetual
motion machine never worked, and can't ever work; and you push back and
say that you need a perpetual motion machine that works, NOW please,
because you need something that pushes those widgets around. (There are
better ways to push widgets than side effects of a broken machine...)


Given that your examples are network adapters, you should really look
more at why "ifdown eth0" (etc) having drivers put the device into a
low power state (like PCI D3hot, or maybe D2) wouldn't work in any
particular case. If you actually have such cases, then maybe those
specific drivers need to drive new power management interfaces.

That's a workable approach to resolving the underlying problem in the
long term. In the short term, notice the system still works correctly
if you don't try writing those files.

I'd not be keen on reverting Linus' patch [1] myself, even though few
drivers have started to use that mechanism yet; that would be a step
backwards, and would perpetuate users of that broken sysfs file.

- Dave

[1] cbd69dbbf1adfce6e048f15afc8629901ca9dae5

2006-12-20 00:10:13

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> > The fact that something is scheduled to be removed in July 2007 does
> > *not* mean it's acceptable to break it in 2006. We need to find a way to
> > fix this functionality in the meantime.
>
> The disconnect here is analagous to: I tell you the alleged perpetual
> motion machine never worked, and can't ever work; and you push back and
> say that you need a perpetual motion machine that works, NOW please,
> because you need something that pushes those widgets around. (There are
> better ways to push widgets than side effects of a broken machine...)

But it *did* work. Userspace could ask the device to suspend, and (in
general) that would result in the device going into a lower power state.
You've broken that without providing an alternative.

> Given that your examples are network adapters, you should really look
> more at why "ifdown eth0" (etc) having drivers put the device into a
> low power state (like PCI D3hot, or maybe D2) wouldn't work in any
> particular case. If you actually have such cases, then maybe those
> specific drivers need to drive new power management interfaces.

We seem to be arguing at cross purposes here. I've absolutely no
objection to this approach in the long run, just as I've got no
objection to flying cars or food pills or moon pods. When these things
exist, the world will indeed be a glorious place. But that doesn't
justify me slashing your tyres, poisoning your crops or setting fire to
whatever the real-world analogue of a moon pod is. I had something that
worked. Now I don't, but instead have the promise that at some point
I'll have something better. Understand why I'm a touch irritated?

> That's a workable approach to resolving the underlying problem in the
> long term. In the short term, notice the system still works correctly
> if you don't try writing those files.

Well, except I'm now burning an extra couple of watts of power. I
consider that pretty broken.

> I'd not be keen on reverting Linus' patch [1] myself, even though few
> drivers have started to use that mechanism yet; that would be a step
> backwards, and would perpetuate users of that broken sysfs file.

I'm sorry, which bit of "Don't break userspace API without adequate
prior warning and with a workable replacement" is difficult to
understand?

--
Matthew Garrett | [email protected]

2006-12-20 00:26:00

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August, and the PM list has discussed how broken that model is numerous
> times over the past several years. (I'm pretty sure that discussion has
> leaked out to LKML on occasion.) It shouldn't be news today.

1) feature-removal-schedule.txt says that it'll be removed in July 2007.
This isn't July 2007.

2) The functionality was disabled in 2.6.19. The addition to
feature-removal-schedule.txt was in, uh, 2.6.19.

3) "The whole _point_ of a kernel is to act as a abstraction layer and
resource management between user programs and hardware/outside world.
That's why kernels _exist_. Breaking user-land API's is thus by
definition something totally idiotic.

If you need to break something, you create a new interface, and try to
translate between the two, and maybe you deprecate the old one so that
it can be removed once it's not in use any more. If you can't see that
this is how a kernel should work, you're missing the point of having a
kernel in the first place."

Linus, http://lkml.org/lkml/2006/10/4/327

--
Matthew Garrett | [email protected]

2006-12-20 02:15:42

by Andrew Morton

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 19 Dec 2006 13:34:49 -0800
David Brownell <[email protected]> wrote:

> Documentation/feature-removal-schedule.txt has warned about this since
> August

Nobody reads that.

Please, wherever possible, put a nice printk("this is going away") in the code
when planning these things.

2006-12-20 02:34:48

by Randy Dunlap

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:

> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[email protected]> wrote:
>
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
>
> Nobody reads that.

Ugh, I read it.

> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.

Can notices go in both places, or is in the source code (printk)
now the preferred way?

I think that we can point people to Doc/feature-removal-schedule.txt
easier (and more effectively) than we can source code (or noisy kernel
logs).

---
~Randy

2006-12-20 03:19:40

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 03:36:28PM -0800, David Brownell wrote:
> > On Tuesday 19 December 2006 2:57 pm, Matthew Garrett wrote:
> > > The fact that something is scheduled to be removed in July 2007 does
> > > *not* mean it's acceptable to break it in 2006. We need to find a way to
> > > fix this functionality in the meantime.
> >
> > The disconnect here is analagous to: I tell you the alleged perpetual
> > motion machine never worked, and can't ever work; and you push back and
> > say that you need a perpetual motion machine that works, NOW please,
> > because you need something that pushes those widgets around. (There are
> > better ways to push widgets than side effects of a broken machine...)
>
> But it *did* work.

Having been on the other side ... I can testify that if you
think it actually worked, it's because you're ignoring all
the nasty failure modes.


> > I'd not be keen on reverting Linus' patch [1] myself, even though few
> > drivers have started to use that mechanism yet; that would be a step
> > backwards, and would perpetuate users of that broken sysfs file.
>
> I'm sorry, which bit of "Don't break userspace API without adequate
> prior warning and with a workable replacement" is difficult to
> understand?

What part of "it was already broken" do YOU not understand? The
whole notion is unsustainable. It doesn't work cross-platform, or
for multiple bus types. It confuses system-wide suspend mechanisms
with runtime mechanisms. It breaks guaranteed parent/child ordering
of suspend/resume calls. (And more...)


Let us know when you get tired of whining and want to move on to
getting a real solution to the set of problems here. I've pointed
out that reverting Linus' patch would be one option to get your
short term issue rsolved ... that would remove a capability from
PCI drivers, but you could then use that deprecated mechanism.
I've also pointed out that you could start working towards a real
long term solution.

Do you have an alternate solution?

- Dave

2006-12-20 03:29:17

by David Brownell

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tuesday 19 December 2006 6:15 pm, Andrew Morton wrote:
> On Tue, 19 Dec 2006 13:34:49 -0800
> David Brownell <[email protected]> wrote:
>
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August
>
> Nobody reads that.
>
> Please, wherever possible, put a nice printk("this is going away") in the code
> when planning these things.


Signed-off-by: David Brownell <[email protected]>

Index: g26/drivers/base/power/sysfs.c
===================================================================
--- g26.orig/drivers/base/power/sysfs.c 2006-09-27 16:19:00.000000000 -0700
+++ g26/drivers/base/power/sysfs.c 2006-12-19 19:27:25.000000000 -0800
@@ -42,9 +42,17 @@ static ssize_t state_show(struct device

static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
{
+ static int warned;
pm_message_t state;
int error = -EINVAL;

+ if (!warned) {
+ printk(KERN_WARNING
+ "*** WARNING *** sysfs devices/.../power/state files "
+ "are only for testing, and will be removed\n");
+ warned = error;
+ }
+
/* disallow incomplete suspend sequences */
if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
return error;

2006-12-20 03:43:27

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tue, Dec 19, 2006 at 07:19:36PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 4:09 pm, Matthew Garrett wrote:
> > I'm sorry, which bit of "Don't break userspace API without adequate
> > prior warning and with a workable replacement" is difficult to
> > understand?
>
> What part of "it was already broken" do YOU not understand? The
> whole notion is unsustainable. It doesn't work cross-platform, or
> for multiple bus types. It confuses system-wide suspend mechanisms
> with runtime mechanisms. It breaks guaranteed parent/child ordering
> of suspend/resume calls. (And more...)

Linux is utterly riddled with broken APIs. It's possible to see that as
a downside of the "Release early, release often" model, but the
advantage is that we get the opportunity to determine how these
interfaces are broken. Based on that, we can either improve the existing
interface or decide that it's broken beyond repair and design a new one.

What we don't do is decide that an interface is broken, deprecate it
and in the same release break it even for the cases where it
previously worked. That's just insane.

> Let us know when you get tired of whining and want to move on to
> getting a real solution to the set of problems here. I've pointed
> out that reverting Linus' patch would be one option to get your
> short term issue rsolved ... that would remove a capability from
> PCI drivers, but you could then use that deprecated mechanism.
> I've also pointed out that you could start working towards a real
> long term solution.

I could, and in the long run I intend to. On the other hand, I don't
expect to have enough time to fix every single in-tree network driver
before 2.6.20, so...

> Do you have an alternate solution?

How about something like this? Entirely untested, but I think it shows
the basic idea.

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..4865918 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,17 @@ static int platform_resume(struct device * dev)
return ret;
}

+static int platform_requires_disabled_interrupts(struct device * dev)
+{
+ int ret = 0;
+
+ if (dev->driver && (dev->driver->resume_early
+ || dev->driver->suspend_late))
+ ret = 1;
+
+ return ret;
+}
+
struct bus_type platform_bus_type = {
.name = "platform",
.dev_attrs = platform_dev_attrs,
@@ -604,8 +615,9 @@ struct bus_type platform_bus_type = {
.uevent = platform_uevent,
.suspend = platform_suspend,
.suspend_late = platform_suspend_late,
- .resume_early = platform_resume_early,
+ .resume_early = platform_resume_early,
.resume = platform_resume,
+ .requires_disabled_interrupts = platform_requires_disabled_interrupts,
};
EXPORT_SYMBOL_GPL(platform_bus_type);

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..97c6d65 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
int error = -EINVAL;

/* disallow incomplete suspend sequences */
- if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+ if (dev->bus && dev->bus->requires_disabled_interrupts
+ && dev->bus->requries_disabled_interrupts())
return error;

state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..9808d42 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,18 @@ static int pci_device_resume(struct device * dev)
return error;
}

+static int pci_device_requires_disabled_interrupts(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && (drv->resume_early || drv_suspend_late))
+ error = 1;
+
+ return error;
+}
+
static int pci_device_resume_early(struct device * dev)
{
int error = 0;
@@ -569,6 +581,7 @@ struct bus_type pci_bus_type = {
.suspend_late = pci_device_suspend_late,
.resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .requires_disabled_interrupts = pci_requires_disabled_interrupts,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..0686234 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
int (*suspend)(struct device * dev, pm_message_t state);
int (*suspend_late)(struct device * dev, pm_message_t state);
int (*resume_early)(struct device * dev);
+ int (*requires_disabled_interrupts)(struct device * dev);
int (*resume)(struct device * dev);
};

--
Matthew Garrett | [email protected]

2006-12-20 03:50:18

by Andrew Morton

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 19 Dec 2006 18:35:39 -0800
Randy Dunlap <[email protected]> wrote:

> On Tue, 19 Dec 2006 18:15:24 -0800 Andrew Morton wrote:
>
> > On Tue, 19 Dec 2006 13:34:49 -0800
> > David Brownell <[email protected]> wrote:
> >
> > > Documentation/feature-removal-schedule.txt has warned about this since
> > > August
> >
> > Nobody reads that.
>
> Ugh, I read it.
>
> > Please, wherever possible, put a nice printk("this is going away") in the code
> > when planning these things.
>
> Can notices go in both places, or is in the source code (printk)
> now the preferred way?

I think printks grab a lot more attention. It's not surprising that people
get surprised when the feature they're using goes away.

Plus they may not even know that that they're using the feature. A printk
fixes that.

> I think that we can point people to Doc/feature-removal-schedule.txt
> easier (and more effectively) than we can source code (or noisy kernel
> logs).

Hopefully developers who see the printk will think to look in
feature-removal-schedule.txt for more details.

2006-12-20 03:59:48

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 01:34:49PM -0800, David Brownell wrote:
>
> > Documentation/feature-removal-schedule.txt has warned about this since
> > August, and the PM list has discussed how broken that model is numerous
> > times over the past several years. (I'm pretty sure that discussion has
> > leaked out to LKML on occasion.) It shouldn't be news today.
>
> 1) feature-removal-schedule.txt says that it'll be removed in July 2007.
> This isn't July 2007.

Which is why the functionality is still there.


> 2) The functionality was disabled in 2.6.19. The addition to
> feature-removal-schedule.txt was in, uh, 2.6.19.

Please respond to the technical explanation I provided, and stop
referring to the functionality ** which is still there and works **
as being disabled.

The fact that PCI exposes a mechanism that conflicts with that is
a separate issue.

Whining does not help.

I can't help it if that schedule.txt patch took until 2.6.19 to get
upstream; ISTR it was available before 2.6.18 shipped. Maybe patches
to that file should be accelerated, even into the stable series.


> 3) "The whole _point_ of a kernel is to act as a abstraction layer and
> resource management between user programs and hardware/outside world.
> That's why kernels _exist_. Breaking user-land API's is thus by
> definition something totally idiotic.
>
> If you need to break something, you create a new interface, and try to
> translate between the two, and maybe you deprecate the old one so that
> it can be removed once it's not in use any more. If you can't see that
> this is how a kernel should work, you're missing the point of having a
> kernel in the first place."
>
> Linus, http://lkml.org/lkml/2006/10/4/327

So I'm amused that the problem you refer to is the direct consequence
of Linus' patch to add the suspend_late()/resume_early() mechanism
into the PCI driver framework. (Again, see the technical explanation;
and please try to have a technical discussion, not a flamefest.)


One of the missing steps in Linus' formulation there is that not all
interfaces are equivalent in terms of support guarantee. Bugs are
interfaces, for example, and sometimes folk wrongly depend on them
when they persist for a long time (like, cough, this one).

His comment was specifically about breaking a widely used API that
many people have been relying on since, oh, about 1996, and had been
well proven in that time. And the change was a "system doesn't work"
level change.

In contrast, the /sys/devices/.../power/state API has never had many
users beyond developers trying to test their drivers (without taking
the whole system into a low power state, which probably didn't work
in any case), and has *always* been problematic. And the change you
object to doesn't "break" anything fundamental, either. Everything
still works.

In terms of any reasonable expectations about support, those two
changes aren't comparable.

- Dave

2006-12-20 04:15:19

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tuesday 19 December 2006 7:43 pm, Matthew Garrett wrote:

> > Do you have an alternate solution?
>
> How about something like this? Entirely untested, but I think it shows
> the basic idea.

Other than indentation/whitespace bugs, it seems to encapsulate the
layering violation needed to get those deprecated files working again
for PCI (and platform_bus). I'd rename the new bus method though;
maybe "pm_has_noirq_stage()" or somesuch. Your name is so generic that
it'd be a surprise if the answer were ever "no"!

You should also list this new call in the feature-removal.txt entry for
stuff that gets removed with /sys/devices/.../power/state files, since
it's another mechanism that only exists to prop up that broken API,
and should vanish at the same time that API does.

- Dave

2006-12-20 04:27:05

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 4:25 pm, Matthew Garrett wrote:
> > 1) feature-removal-schedule.txt says that it'll be removed in July 2007.
> > This isn't July 2007.
>
> Which is why the functionality is still there.

Merely broken in the majority of cases...

> > 2) The functionality was disabled in 2.6.19. The addition to
> > feature-removal-schedule.txt was in, uh, 2.6.19.
>
> Please respond to the technical explanation I provided, and stop
> referring to the functionality ** which is still there and works **
> as being disabled.

The breakage is that devices that are happy to suspend with enabled
interrupts can no longer be suspended from userspace. Refusing to
suspend a single device on the basis that some other driver on the bus
may, potentially, at some point require some suspend code to be run with
disabled interrupts is not a sensible choice. Especially since I can't
actually find a single driver in the kernel tree that currently uses
this functionality.

> I can't help it if that schedule.txt patch took until 2.6.19 to get
> upstream; ISTR it was available before 2.6.18 shipped. Maybe patches
> to that file should be accelerated, even into the stable series.

That would still not have provided anywhere near enough warning.

> One of the missing steps in Linus' formulation there is that not all
> interfaces are equivalent in terms of support guarantee. Bugs are
> interfaces, for example, and sometimes folk wrongly depend on them
> when they persist for a long time (like, cough, this one).

The existence of the power/state interface wasn't a bug - it was a
deliberate decision to add it. It's the only reason the
dpm_runtime_suspend() interface exists. It's perfectly reasonable to
refer to it as a flawed interface, or perhaps even a buggy one. But in
itself, it's clearly not a bug. And it's perfectly reasonable for
userland to depend on interfaces that are deliberately exposed by the
kernel.

> In contrast, the /sys/devices/.../power/state API has never had many
> users beyond developers trying to test their drivers (without taking
> the whole system into a low power state, which probably didn't work
> in any case), and has *always* been problematic. And the change you
> object to doesn't "break" anything fundamental, either. Everything
> still works.

It's used on every Ubuntu and Suse system, and the change means that
certain functionality no longer works - it's now impossible to prevent
my wireless hardware from drawing power when I'm not using it, for
example. If the WE power operations were deliberately disabled, then
that would also be a bug.

--
Matthew Garrett | [email protected]

2006-12-20 04:56:16

by Matthew Garrett

[permalink] [raw]
Subject: [PATCH 1/2] Fix /sys/device/.../power/state

Recent changes in the PM system made it impossible to perform runtime
suspend of any PCI or platform devices. This patch restores the
functionality for any devices that don't require any of their suspend or
resume code to be run with interrupts disabled.

Signed-off-by: Matthew Garrett <[email protected]>

diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..6bf1218 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,16 @@ static int platform_resume(struct device * dev)
return ret;
}

+static int platform_pm_has_noirq_stage(struct device * dev)
+{
+ int ret = 0;
+ struct platform_driver *drv = to_platform_driver(dev->driver);
+
+ if (dev->driver && (drv->resume_early || drv->suspend_late))
+ ret = 1;
+ return ret;
+}
+
struct bus_type platform_bus_type = {
.name = "platform",
.dev_attrs = platform_dev_attrs,
@@ -606,6 +616,7 @@ struct bus_type platform_bus_type = {
.suspend_late = platform_suspend_late,
.resume_early = platform_resume_early,
.resume = platform_resume,
+ .pm_has_noirq_stage = platform_pm_has_noirq_stage,
};
EXPORT_SYMBOL_GPL(platform_bus_type);

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..03d3f81 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
int error = -EINVAL;

/* disallow incomplete suspend sequences */
- if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+ if (dev->bus && dev->bus->pm_has_noirq_stage
+ && dev->bus->pm_has_noirq_stage(dev))
return error;

state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..c0e4e7a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,17 @@ static int pci_device_resume(struct device * dev)
return error;
}

+static int pci_device_pm_has_noirq_stage(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && (drv->resume_early || drv->suspend_late))
+ error = 1;
+ return error;
+}
+
static int pci_device_resume_early(struct device * dev)
{
int error = 0;
@@ -569,6 +580,7 @@ struct bus_type pci_bus_type = {
.suspend_late = pci_device_suspend_late,
.resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .pm_has_noirq_stage = pci_device_pm_has_noirq_stage,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..1c663c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
int (*suspend)(struct device * dev, pm_message_t state);
int (*suspend_late)(struct device * dev, pm_message_t state);
int (*resume_early)(struct device * dev);
+ int (*pm_has_noirq_stage)(struct device * dev);
int (*resume)(struct device * dev);
};
--
Matthew Garrett | [email protected]

2006-12-20 04:56:23

by Matthew Garrett

[permalink] [raw]
Subject: [PATCH 2/2] Update feature-removal-schedule.txt

Add pm_has_noirq_stage to feature-removal-schedule as part of the
/sys/devices/.../power/state removal. Also note that this functionality
won't be removed until alternative functionality is implemented, in
order to avoid having this argument again in July.

Signed-off-by: Matthew Garrett <[email protected]>

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 30f3c8c..8a91689 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -9,7 +9,8 @@ be removed from this file.
What: /sys/devices/.../power/state
dev->power.power_state
dpm_runtime_{suspend,resume)()
-When: July 2007
+ bus->pm_has_noirq_stage()
+When: Once alternative functionality has been implemented
Why: Broken design for runtime control over driver power states, confusing
driver-internal runtime power management with: mechanisms to support
system-wide sleep state transitions; event codes that distinguish

--
Matthew Garrett | [email protected]

2006-12-20 05:14:53

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:

> The existence of the power/state interface wasn't a bug - it was a
> deliberate decision to add it. It's the only reason the
> dpm_runtime_suspend() interface exists.

All that buggy infrastructure talks together, yes. Those dpm_*()
calls are in the same "will remove" task item.


> It's perfectly reasonable to
> refer to it as a flawed interface, or perhaps even a buggy one. But in
> itself, it's clearly not a bug.

This class of bug is also called a "design bug" or sometimes "mistake".


> > In contrast, the /sys/devices/.../power/state API has never had many
> > users beyond developers trying to test their drivers (without taking
> > the whole system into a low power state, which probably didn't work
> > in any case), and has *always* been problematic. And the change you
> > object to doesn't "break" anything fundamental, either. Everything
> > still works.
>
> It's used on every Ubuntu and Suse system,

Odd how the relevant Suse developers didn't mention any issues with
those files going away, any of the times problems with them were
discussed on the PM list. Also, I have a Suse system that doesn't
use those files for anything ... maybe only newer release use it.

I've got some Ubuntu going too, which hasn't (visibly) suffered from
any of these changes.

- dave

2006-12-20 05:34:54

by Greg KH

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tue, Dec 19, 2006 at 09:14:49PM -0800, David Brownell wrote:
> On Tuesday 19 December 2006 8:26 pm, Matthew Garrett wrote:
> > On Tue, Dec 19, 2006 at 07:59:42PM -0800, David Brownell wrote:
> > It's perfectly reasonable to
> > refer to it as a flawed interface, or perhaps even a buggy one. But in
> > itself, it's clearly not a bug.
>
> This class of bug is also called a "design bug" or sometimes "mistake".

Exactly, those "power" files actually pre-date the actual tree of
devices itself. They were just holders for what the original developer
thought was going to be needed, but was never properly implemented due
to some job changes (note, this was not myself...)

> > > In contrast, the /sys/devices/.../power/state API has never had many
> > > users beyond developers trying to test their drivers (without taking
> > > the whole system into a low power state, which probably didn't work
> > > in any case), and has *always* been problematic. And the change you
> > > object to doesn't "break" anything fundamental, either. Everything
> > > still works.
> >
> > It's used on every Ubuntu and Suse system,
>
> Odd how the relevant Suse developers didn't mention any issues with
> those files going away, any of the times problems with them were
> discussed on the PM list. Also, I have a Suse system that doesn't
> use those files for anything ... maybe only newer release use it.

I would be very interested to see any newer SuSE programs using that
interface. Just point them out to me and I'll quickly fix them.

And yes, as a SuSE developer (and one of the people in charge of the
SuSE kernels), I have no problem with these files just going away.
Because, as David keeps repeating, they are broken and wrong.

thanks,

greg k-h

2006-12-20 05:52:25

by Matthew Garrett

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Tue, Dec 19, 2006 at 09:34:17PM -0800, Greg KH wrote:

> I would be very interested to see any newer SuSE programs using that
> interface. Just point them out to me and I'll quickly fix them.

As far as I can tell, powersaved still uses these.. I'm not quite sure
how you can fix it without just removing the functionality from it...

> And yes, as a SuSE developer (and one of the people in charge of the
> SuSE kernels), I have no problem with these files just going away.
> Because, as David keeps repeating, they are broken and wrong.

In the common case, it works perfectly well for the management of
individual PCI devices. Yes it's "wrong", in much the same way as (say)
the IDE bus registration/unregistration code. But we keep that around
because despite it being even more broken than devices/.../power/state,
people are still actually using it and we haven't provided any sort of
alternative.

Seriously. How many pieces of userspace-visible functionality have
recently been removed without there being any sort of alternative?
--
Matthew Garrett | [email protected]

2006-12-20 07:50:41

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace


> Seriously. How many pieces of userspace-visible functionality have
> recently been removed without there being any sort of alternative?

There IS an alternative, you're using it for networking:

You *down the interface*.

If there's a NIC that doesn't support that let us (or preferably netdev)
know and it'll get fixed quickly I'm sure.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 12:53:34

by Matthew Garrett

[permalink] [raw]
Subject: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 08:50:24AM +0100, Arjan van de Ven wrote:

(Adding netdev - context is the altering of the runtime power
management interface, with the effect that it's no longer possible for
userspace to request that drivers suspend a device, so Arjan has
suggested that we do it via other existing interfaces)

> > Seriously. How many pieces of userspace-visible functionality have
> > recently been removed without there being any sort of alternative?
>
> There IS an alternative, you're using it for networking:
>
> You *down the interface*.
>
> If there's a NIC that doesn't support that let us (or preferably netdev)
> know and it'll get fixed quickly I'm sure.

As far as I can tell, the following network devices don't put the
hardware into D3 on interface down:

3c59x
8139too
acenic
amd8111e
b44
cassini
defxx
dl2k
e100
e1000
epic100
fealnx
forcedeth
hamachi
hp100
ioc3-eth
natsemi
ne2k-pci
ns83820
pcnet32
qla3xxx
rtl8169
rrunner
s2io
saa9730
sis190
sis900
skge
sky2
spider_net
starfire
sundance
sungem
sunhme
tc35815
tlan
via-rhine
yellowfin

while these ones do:

bnx2
tg3
typhoon
via-velocity

tulip is somewhere in between - it puts the chip in a lower power state,
but not D3. It's possible that some of the other drivers do something
similar, but nothing leapt out at me.

The situation is more complicated for wireless. Userspace expects to be
able to get scan results from the card even if the interface is down. In
that case, I'm pretty sure we need a third state rather than just "up"
or "down".
--
Matthew Garrett | [email protected]

2006-12-20 13:38:55

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

about your driver list;
do you have an idea of what the top 5 relevant ones would be?
I'd be surprised if the top 5 together had less than 95% market share,
so if we fix those we'd be mostly done already.

> The situation is more complicated for wireless. Userspace expects to be
> able to get scan results from the card even if the interface is down.

if it's down userspace cannot currently expect this (if it does it's
broken), just as it currently can't expect link notifications when the
interface is down. It needs to have the interface up for this.
(but point taken for a 3rd state)

> In
> that case, I'm pretty sure we need a third state rather than just "up"
> or "down".

so what do you want from this 3rd state? rough guess based on what I
think the desktop wants (so please correct/append)

In the third state you
* don't expect to get or send "regular" packets
* don't have a dhcp lease or anything like that
* you do expect to get link change notification [1]
* you do expect to be able to scan for access points [2]

open questions
* what if you get a WOL event?



[1] What kind of latency would be allowed? Would an implementation be
allowed to power up the phy say once per minute or once per 5 minutes to
see if there is link? The implementation could do this progressively;
first poll every X seconds, then after an hour, every minute etc.

[2] would it be permissible to temporarily power up the device on scan?
Eg how frequently does the desktop expect to poll for scanning, and what
kind of latency would be tolerable?

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 14:31:50

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 02:38:51PM +0100, Arjan van de Ven wrote:
> about your driver list;
> do you have an idea of what the top 5 relevant ones would be?
> I'd be surprised if the top 5 together had less than 95% market share,
> so if we fix those we'd be mostly done already.

In terms of what I've seen on vaguely modern hardware, I'd guess at
e1000 and sky2 as the top ones. b44 is still common in cheaper hardware,
with via-rhine appearing at the very low end. I'll try to grep through
our hardware database results to get a stronger idea about percentages.

> > The situation is more complicated for wireless. Userspace expects to be
> > able to get scan results from the card even if the interface is down.
>
> if it's down userspace cannot currently expect this (if it does it's
> broken), just as it currently can't expect link notifications when the
> interface is down. It needs to have the interface up for this.
> (but point taken for a 3rd state)

The documentation for what userspace can legitimately expect of the
kernel is distinctly lacking, and as far as I can tell most of the
common drivers support scanning while the interface is down. It would be
immensely helpful if we could have a better idea of which kernel
behaviour is deliberate, and which bits of kernel functionality are
accidental and might go away at any time. Right now, I have various
scripts depending on this behaviour because there's absolutely no
indication that I'm not supposed to be.

(Of course, I may have missed it somewhere - I've never been able to
find terribly comprehensive documentation on WE)

> so what do you want from this 3rd state? rough guess based on what I
> think the desktop wants (so please correct/append)

Just to be clear: in this world view, "down" maps to "fully powered
down", so this third state is a "low power consumption mode"? If so:

> In the third state you
> * don't expect to get or send "regular" packets
> * don't have a dhcp lease or anything like that
> * you do expect to get link change notification [1]
> * you do expect to be able to scan for access points [2]

Yes, I think that's a fair summary.

> open questions
> * what if you get a WOL event?

In an ideal world, I think the information would be passed to userspace
and it would get to make the distinction. I appreciate that the hardware
may have different ideas about what's appropriate...

> [1] What kind of latency would be allowed? Would an implementation be
> allowed to power up the phy say once per minute or once per 5 minutes to
> see if there is link? The implementation could do this progressively;
> first poll every X seconds, then after an hour, every minute etc.

Yeah, I guess that's a problem. From a user perspective, the
functionality is only really useful if the latency is very small. I
think where possible we'd want to power down the chip while keeping the
phy up, but it would be nice to know how much power that would actually
cost us.

(We have a similar issue when it comes to stuff like monitor hotplug -
it's the sort of thing that many users are willing to accept losing some
battery for, and there probably isn't a single right answer)

> [2] would it be permissible to temporarily power up the device on scan?
> Eg how frequently does the desktop expect to poll for scanning, and what
> kind of latency would be tolerable?

network-manager's behaviour when the interface is inactive is to scan
every 2 minutes. I don't think latency is too much of an issue.

--
Matthew Garrett | [email protected]

2006-12-20 14:32:07

by Jiri Benc

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006 12:53:14 +0000, Matthew Garrett wrote:
> The situation is more complicated for wireless. Userspace expects to be
> able to get scan results from the card even if the interface is down.

User space should get an error when trying to get scan results from the
interface that is down. Some drivers are broken and don't do this but
when they're fixed there is no problem here.

> In that case, I'm pretty sure we need a third state rather than just
> "up" or "down".

We have that third state, it's IFF_DORMANT. Not supported yet by any
wireless driver/stack, unfortunately.

Thanks,

Jiri

--
Jiri Benc
SUSE Labs

2006-12-20 15:27:05

by Olivier Galibert

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 02:38:51PM +0100, Arjan van de Ven wrote:
> [1] What kind of latency would be allowed? Would an implementation be
> allowed to power up the phy say once per minute or once per 5 minutes to
> see if there is link? The implementation could do this progressively;
> first poll every X seconds, then after an hour, every minute etc.

I suspect that the hard maximum latency is the time needed by the user
to start the network himself, be it opening a root xterm and doing the
appropriate invocation or pulling up and clicking where appropriate in
a GUI. That's probably around 5 seconds. Over that, and they won't
even notice there is an autodetection running.

But still, 5 seconds is probably too much too, because it's going to
look like it's unreliable. The user has to see something happen
within half-a-second or so, otherwise he's going to start doing it by
hand. The "see" part is distribution/desktop-dependant and not the
kernel problem, but the top chrono happens when the rj45 is plugged
in.

OG.

2006-12-20 15:34:24

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-12-20 at 16:27 +0100, Olivier Galibert wrote:
> On Wed, Dec 20, 2006 at 02:38:51PM +0100, Arjan van de Ven wrote:
> > [1] What kind of latency would be allowed? Would an implementation be
> > allowed to power up the phy say once per minute or once per 5 minutes to
> > see if there is link? The implementation could do this progressively;
> > first poll every X seconds, then after an hour, every minute etc.
>
> I suspect that the hard maximum latency is the time needed by the user
> to start the network himself, be it opening a root xterm and doing the
> appropriate invocation or pulling up and clicking where appropriate in
> a GUI. That's probably around 5 seconds. Over that, and they won't
> even notice there is an autodetection running.
>
> But still, 5 seconds is probably too much too, because it's going to
> look like it's unreliable. The user has to see something happen
> within half-a-second or so, otherwise he's going to start doing it by
> hand. The "see" part is distribution/desktop-dependant and not the
> kernel problem, but the top chrono happens when the rj45 is plugged
> in.

5 seconds is unfair and unrealistic though. The *hardware* negotiation
before link is seen can easily take upto 45 seconds already.
That's a network topology/hardware issue (spanning tree fun) that
software or even the hardware in your PC can do nothing about.

this means that the "power up time" needs to be at least 45 seconds, if
it's then down 5 seconds inbetween... that's not real power savings.

> .
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 15:51:43

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down


> Yeah, I guess that's a problem. From a user perspective, the
> functionality is only really useful if the latency is very small. I
> think where possible we'd want to power down the chip while keeping the
> phy up, but it would be nice to know how much power that would actually
> cost us.


I'm no expert but afaik the PHY is the power hungry part, the rest is
peanuts. So if we can get the PHY to sleep most of the time that would
be great.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 16:04:08

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006, Matthew Garrett wrote:

> As far as I can tell, the following network devices don't put the
> hardware into D3 on interface down:
[...]
> defxx

No support in the hardware for that. Even revision 3 of the board which
is the last one and the only to support PCI 2.2 says:

Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 PME-Enable- DSel=0 DScale=0 PME-

;-)

Maciej

2006-12-20 16:40:57

by Olivier Galibert

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 04:34:17PM +0100, Arjan van de Ven wrote:
> 5 seconds is unfair and unrealistic though. The *hardware* negotiation
> before link is seen can easily take upto 45 seconds already.
> That's a network topology/hardware issue (spanning tree fun) that
> software or even the hardware in your PC can do nothing about.

It's about ergonomics, not technical capabilities or fairness.


> this means that the "power up time" needs to be at least 45 seconds, if
> it's then down 5 seconds inbetween... that's not real power savings.

Then that means you can't have usable autodetection and power savings
at the same time. That's a pefectly acceptable answer, you just have
to give the choice between the two to the user. From the kernel
p.o.v, it just means that you probably need 3 modes:
1- active and exchanging packets

2- inactive but waiting for plugging and able to tell something is
going on fast (like 0.5s fast)

3- powered off

and they probably already exist (UP+addr/procmisc. set, UP and DOWN).
And if the second mode can't be lower power than the first, that's
just life. An hypothetical mode 4 identical to 2 without the "fast"
part is just not worth bothering with.

OG.

2006-12-20 17:22:22

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-12-20 at 17:40 +0100, Olivier Galibert wrote:
> On Wed, Dec 20, 2006 at 04:34:17PM +0100, Arjan van de Ven wrote:
> > 5 seconds is unfair and unrealistic though. The *hardware* negotiation
> > before link is seen can easily take upto 45 seconds already.
> > That's a network topology/hardware issue (spanning tree fun) that
> > software or even the hardware in your PC can do nothing about.
>
> It's about ergonomics, not technical capabilities or fairness.
not entirely.

>
>
> > this means that the "power up time" needs to be at least 45 seconds, if
> > it's then down 5 seconds inbetween... that's not real power savings.
>
> Then that means you can't have usable autodetection and power savings
> at the same time.

even if you have NO power savings you still don't meet your criteria.
That's basic ethernet for you....

That's what I was trying to say; your criteria is unrealistic regardless
of what the kernel does, ethernet already dictates 30 to 45 seconds
there.


--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 18:11:21

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-12-20 at 15:00 +0100, Jiri Benc wrote:
> On Wed, 20 Dec 2006 12:53:14 +0000, Matthew Garrett wrote:
> > The situation is more complicated for wireless. Userspace expects to be
> > able to get scan results from the card even if the interface is down.
>
> User space should get an error when trying to get scan results from the
> interface that is down. Some drivers are broken and don't do this but
> when they're fixed there is no problem here.

Entirely correct. If the card is DOWN, the radio should be off (both TX
& RX) and it should be in max power save mode. If userspace expects to
be able to get the card to do _anything_ when it's down, that's just
110% wrong. You can't get link events for many wired cards when they
are down, so I fail to see where userspace could expect to do anything
with a wireless card when it's down too.

> > In that case, I'm pretty sure we need a third state rather than just
> > "up" or "down".
>
> We have that third state, it's IFF_DORMANT. Not supported yet by any
> wireless driver/stack, unfortunately.

So we have 3 states? What purpose does DORMANT serve and what is
allowed in DORMANT?

Also, how does rfkill fit into this? rfkill implies killing TX, but do
we have the granularity to still receive while the transmit paths are
powered down?

Dan

> Thanks,
>
> Jiri
>

2006-12-20 20:40:53

by Benny Amorsen

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

>>>>> "AvdV" == Arjan van de Ven <[email protected]> writes:

AvdV> even if you have NO power savings you still don't meet your
AvdV> criteria. That's basic ethernet for you....

AvdV> That's what I was trying to say; your criteria is unrealistic
AvdV> regardless of what the kernel does, ethernet already dictates 30
AvdV> to 45 seconds there.

Can you get to such high numbers without STP?


/Benny


2006-12-20 21:19:42

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix /sys/device/.../power/state

On Tuesday 19 December 2006 8:56 pm, Matthew Garrett wrote:

> --- a/drivers/base/power/sysfs.c
> +++ b/drivers/base/power/sysfs.c
> @@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
> int error = -EINVAL;
>
> /* disallow incomplete suspend sequences */
> - if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> + if (dev->bus && dev->bus->pm_has_noirq_stage
> + && dev->bus->pm_has_noirq_stage(dev))
> return error;
>

I'm suspecting these two patches won't be merged, but this fragment has
two bugs. One is the whitespace bug already mentioned. The other is that
the original test must still be used if that bus primitve doesn't exist.

And in a different vein, I'm a bit surprised that the update to the
feature-removal-schedule.txt file is a separate patch, but:

> +???????bus->pm_has_noirq_stage()
> -When:??July 2007
> +When:??Once alternative functionality has been implemented

The "When" shouldn't change.

2006-12-20 21:22:38

by Stefan Rompf

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Am Mittwoch, 20. Dezember 2006 16:34 schrieb Arjan van de Ven:

> 5 seconds is unfair and unrealistic though. The *hardware* negotiation
> before link is seen can easily take upto 45 seconds already.
> That's a network topology/hardware issue (spanning tree fun) that
> software or even the hardware in your PC can do nothing about.

Spanning tree decides whether or not a port forwards traffic. It has nothing
to do with link beat detection and autonegotation, so it shouldn't be an
issue here.

Stefan

2006-12-20 21:49:38

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-12-20 at 21:40 +0100, Benny Amorsen wrote:
> >>>>> "AvdV" == Arjan van de Ven <[email protected]> writes:
>
> AvdV> even if you have NO power savings you still don't meet your
> AvdV> criteria. That's basic ethernet for you....
>
> AvdV> That's what I was trying to say; your criteria is unrealistic
> AvdV> regardless of what the kernel does, ethernet already dictates 30
> AvdV> to 45 seconds there.
>
> Can you get to such high numbers without STP?

you can get the 30 seconds yes. Usually not with home equipment though,
but with longer cables and expensive switches.. it does happen.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-20 22:49:17

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006 16:51:39 +0100
Arjan van de Ven <[email protected]> wrote:

>
> > Yeah, I guess that's a problem. From a user perspective, the
> > functionality is only really useful if the latency is very small. I
> > think where possible we'd want to power down the chip while keeping the
> > phy up, but it would be nice to know how much power that would actually
> > cost us.
>
>
> I'm no expert but afaik the PHY is the power hungry part, the rest is
> peanuts. So if we can get the PHY to sleep most of the time that would
> be great.
>

There are two different problems:

1) Behavior seems to be different depending on device driver
author. We should document the expected semantics better.

IMHO:
When device is down, it should:
a) use as few resources as possible:
- not grab memory for buffers
- not assign IRQ unless it could get one
- turn off all power consumption possible
b) allow setting parameters like speed/duplex/autonegotiation,
ring buffers, ... with ethtool, and remember the state
c) not accept data coming in, and drop packets queued

When device is up, it should:
a) Start negotiation if needed
b) Not bring up carrier till it is ready
c) Allow reconfiguration

Wake on Lan should be disabled by default, until changed.

2) Network device infrastructure should make it easier for devices:
bring interface down on suspend and bring it up after resume
(if it was running when suspended). This would allow many devices to
have no suspend/resume hook; except those that have some better power
control over hardware.




2006-12-20 23:52:17

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006 15:37:41 -0800
Rick Jones <[email protected]> wrote:

> > There are two different problems:
> >
> > 1) Behavior seems to be different depending on device driver
> > author. We should document the expected semantics better.
> >
> > IMHO:
> > When device is down, it should:
> > a) use as few resources as possible:
> > - not grab memory for buffers
> > - not assign IRQ unless it could get one
> > - turn off all power consumption possible
> > b) allow setting parameters like speed/duplex/autonegotiation,
> > ring buffers, ... with ethtool, and remember the state
> > c) not accept data coming in, and drop packets queued
>
> What implications does c have for something like tcpdump?
>
> rick jones

None, you can bring up the device without actually assigning an address to it.

--
Stephen Hemminger <[email protected]>

2006-12-20 23:55:42

by Rick Jones

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

> There are two different problems:
>
> 1) Behavior seems to be different depending on device driver
> author. We should document the expected semantics better.
>
> IMHO:
> When device is down, it should:
> a) use as few resources as possible:
> - not grab memory for buffers
> - not assign IRQ unless it could get one
> - turn off all power consumption possible
> b) allow setting parameters like speed/duplex/autonegotiation,
> ring buffers, ... with ethtool, and remember the state
> c) not accept data coming in, and drop packets queued

What implications does c have for something like tcpdump?

rick jones

2006-12-21 00:09:27

by Kyle Moffett

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Dec 19, 2006, at 15:55:43, Arjan van de Ven wrote:
> On Tue, 2006-12-19 at 20:32 +0000, Matthew Garrett wrote:
>> On Tue, Dec 19, 2006 at 09:23:05PM +0100, Arjan van de Ven wrote:
>>> On Tue, 2006-12-19 at 20:08 +0000, Matthew Garrett wrote:
>>>> I'm not sure. Suspending the chip means you lose things like
>>>> link beat detection, so it's not something you necessarily want
>>>> to automatically tie to something like interface status.
>>>
>>> right now the "spec" for Linux network drivers assumes that you
>>> put the NIC into D3 on down, except for cases where Wake-on-Lan
>>> is enabled etc.
>>
>> Really? I can't find any drivers that seem to do this. The only
>> calls to pci_set_power_state seem to be in the suspend, resume,
>> init and exit routines.
>
> your grep missed tg3.c for example, which has a wrapper around the
> power state code and goes to D3hot on downing of the interface.
> (just the first one I looked at as a "reference driver", others
> probably do the same thing)

I actually kind of like this feature; it makes it possible for
"ifdown" to virtually "unplug" the cable, such that the remote end
doesn't have an activated link. Admittedly it would be slightly more
useful to have the ifdown and the virtual-unplug be separate events.
There have been at least a few times where my Linux box is connected
to a network port I don't control that maintains some independent
state, and it's handy to be able to reset that state remotely.

Cheers,
Kyle Moffett

2006-12-21 00:14:31

by Francois Romieu

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Stephen Hemminger <[email protected]> :
[...]
> IMHO:
> When device is down, it should:
> a) use as few resources as possible:
> - not grab memory for buffers
> - not assign IRQ unless it could get one
> - turn off all power consumption possible
> b) allow setting parameters like speed/duplex/autonegotiation,
> ring buffers, ... with ethtool, and remember the state
> c) not accept data coming in, and drop packets queued

<nit>
Imho speed/duplex/autoneg is not the business of the device: they belong
to the phy and it's up to it to decide if its state allows to set the
requested parameters or not.
</nit>

--
Ueimor

2006-12-21 00:27:04

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 21 Dec 2006 01:11:12 +0100
Francois Romieu <[email protected]> wrote:

> Stephen Hemminger <[email protected]> :
> [...]
> > IMHO:
> > When device is down, it should:
> > a) use as few resources as possible:
> > - not grab memory for buffers
> > - not assign IRQ unless it could get one
> > - turn off all power consumption possible
> > b) allow setting parameters like speed/duplex/autonegotiation,
> > ring buffers, ... with ethtool, and remember the state
> > c) not accept data coming in, and drop packets queued
>
> <nit>
> Imho speed/duplex/autoneg is not the business of the device: they belong
> to the phy and it's up to it to decide if its state allows to set the
> requested parameters or not.
> </nit>
>

We need to allow ethtool setting to be done before device has been brought
up and started autonegotiation. The current MII library doesn't really support
it.

--
Stephen Hemminger <[email protected]>

2006-12-21 01:12:23

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 02:49:06PM -0800, Stephen Hemminger wrote:

> When device is down, it should:
> a) use as few resources as possible:
> - not grab memory for buffers
> - not assign IRQ unless it could get one
> - turn off all power consumption possible
> b) allow setting parameters like speed/duplex/autonegotiation,
> ring buffers, ... with ethtool, and remember the state

Veering off at something of a tangent - how much of this should be true
for wireless devices? Softmac seems to be unhappy about setting the
essid unless the card is up, which breaks various assumptions...

Beyond that, I think your descriptions of up and down states make sense
for userspace. As Arjan suggests, there's then the intermediate state of
"disable as much as possible while still providing scanning and link
detection".

> 2) Network device infrastructure should make it easier for devices:
> bring interface down on suspend and bring it up after resume
> (if it was running when suspended). This would allow many devices to
> have no suspend/resume hook; except those that have some better power
> control over hardware.

I'd have some concerns over how that would interact with the rest of the
PM infrastructure, but it certainly sounds good in principle.

--
Matthew Garrett | [email protected]

2006-12-21 01:15:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 01:12:51PM -0500, Dan Williams wrote:

> Entirely correct. If the card is DOWN, the radio should be off (both TX
> & RX) and it should be in max power save mode. If userspace expects to
> be able to get the card to do _anything_ when it's down, that's just
> 110% wrong. You can't get link events for many wired cards when they
> are down, so I fail to see where userspace could expect to do anything
> with a wireless card when it's down too.

Because it works on the common hardware? If there's documentation about
what userspace can legitimately expect, then I'm happy to defer to that.
But in the absence of any indication as to what functionality users can
depend on, deciding that existing functionality is a bug is, well,
impolite.

> Also, how does rfkill fit into this? rfkill implies killing TX, but do
> we have the granularity to still receive while the transmit paths are
> powered down?

Is rfkill not just primarily an interface for us getting events when the
radio changes state? Every time I read up on it I get a little more
confused - some time I really need to make sense of it...

--
Matthew Garrett | [email protected]

2006-12-21 01:29:42

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix /sys/device/.../power/state

On Wed, Dec 20, 2006 at 01:18:06PM -0800, David Brownell wrote:
> > /* disallow incomplete suspend sequences */
> > - if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> > + if (dev->bus && dev->bus->pm_has_noirq_stage
> > + && dev->bus->pm_has_noirq_stage(dev))
> > return error;
> >
>
> I'm suspecting these two patches won't be merged, but this fragment has
> two bugs. One is the whitespace bug already mentioned.

I'm a bit curious about the whitespace issue - CodingStyle doesn't seem
to discuss what to do with if statements that end up longer than 80
characters, which is (I think) what you're talking about?

> The other is that
> the original test must still be used if that bus primitve doesn't exist.

I dislike that. We're asking to suspend an individual device - whether
the bus supports devices that need to suspend with interrupts disabled
is irrelevent, it's the device that we care about. We should just make
it necessary for every bus to support this method until the interface is
removed.

> And in a different vein, I'm a bit surprised that the update to the
> feature-removal-schedule.txt file is a separate patch, but:

It seemed like a logically distinct change, but I'm happy to merge them.

> > +???????bus->pm_has_noirq_stage()
> > -When:??July 2007
> > +When:??Once alternative functionality has been implemented
>
> The "When" shouldn't change.

We shouldn't remove interfaces that userland uses until there's been a
replacement for long enough that userland can switch over. Setting a
date for removing this interface when most drivers don't implement the
replacement isn't reasonable.

--
Matthew Garrett | [email protected]

2006-12-21 02:05:53

by Michael Wu

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wednesday 20 December 2006 20:12, Matthew Garrett wrote:
> Veering off at something of a tangent - how much of this should be true
> for wireless devices? Softmac seems to be unhappy about setting the
> essid unless the card is up, which breaks various assumptions...
>
Softmac isn't the only wireless code that likes to be configured after going
up first. Configuring after the card goes up has generally been more
reliable, though that should not be necessary and is a bug IMHO.

> Beyond that, I think your descriptions of up and down states make sense
> for userspace. As Arjan suggests, there's then the intermediate state of
> "disable as much as possible while still providing scanning and link
> detection".
>
In order to scan, we need to have the radio on and we need to be able to send
and receive. What are you gonna turn off?

-Michael Wu


Attachments:
(No filename) (858.00 B)
(No filename) (189.00 B)
Download all attachments

2006-12-21 02:10:53

by Jesse Brandeburg

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On 12/20/06, Arjan van de Ven <[email protected]> wrote:
>
> > Yeah, I guess that's a problem. From a user perspective, the
> > functionality is only really useful if the latency is very small. I
> > think where possible we'd want to power down the chip while keeping the
> > phy up, but it would be nice to know how much power that would actually
> > cost us.
>
> I'm no expert but afaik the PHY is the power hungry part, the rest is
> peanuts. So if we can get the PHY to sleep most of the time that would
> be great.

The MAC uses some part of power, but FYI at least e1000 already does
phy power management when IF_DOWN, if wake on lan isn't enabled, smbus
isn't enabled, etc etc. If we started using D3 power management its
possible a whole bunch of code would go away out of e1000.

Is there some reason why we can't have the OS just do the D3
transition for all drivers that register support? I mean, this power
management using D states is actually driver *independent* and at
least way back in the day was supposed to be implemented for "OS power
management"

2006-12-21 02:18:46

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 09:05:27PM -0500, Michael Wu wrote:

> Softmac isn't the only wireless code that likes to be configured after going
> up first. Configuring after the card goes up has generally been more
> reliable, though that should not be necessary and is a bug IMHO.

Ok, that's nice to know.

> In order to scan, we need to have the radio on and we need to be able to send
> and receive. What are you gonna turn off?

The obvious route would be to power the card down, but come back up
every two minutes to perform a scan, or if userspace explicitly requests
one. Would this cause problems in some cases?

--
Matthew Garrett | [email protected]

2006-12-21 02:21:11

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 08:57:05PM -0500, Michael Wu wrote:

(allowing scanning while the interface is down)

> No, it's absolutely a bug. It just so happens that some drivers incorrectly
> allowed it.

Ok. Would it be reasonable to add warnings to any devices that allow it?
--
Matthew Garrett | [email protected]

2006-12-21 02:22:29

by Michael Wu

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wednesday 20 December 2006 20:15, Matthew Garrett wrote:
> Because it works on the common hardware? If there's documentation about
> what userspace can legitimately expect, then I'm happy to defer to that.
> But in the absence of any indication as to what functionality users can
> depend on, deciding that existing functionality is a bug is, well,
> impolite.
>
No, it's absolutely a bug. It just so happens that some drivers incorrectly
allowed it.

-Michael Wu


Attachments:
(No filename) (468.00 B)
(No filename) (189.00 B)
Download all attachments

2006-12-21 02:28:53

by Daniel Drake

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Matthew Garrett wrote:
> Veering off at something of a tangent - how much of this should be true
> for wireless devices? Softmac seems to be unhappy about setting the
> essid unless the card is up, which breaks various assumptions...

You might regard that as a bug - I agree it probably makes sense for you
to be able to set certain configuration variables before the interface
is up, within reason.

However, the mentality adopted by most wireless drivers is the SIWESSID
wireless extension ioctl means *associate*, something which obviously
shouldn't be possible when the interface is down (radio off, etc).

While you might blame drivers for this possible misinterpretation, it
can also be viewed as a design flaw in WE: the drivers have to handle
the ioctl's directly, meaning that if you want some kind of
configuration management then you have to do it on the driver level, and
this doesn't feel right.

The situation is also made worse due to WE generally being hard to
implement, and also the lack of documentation (really the only source
here is the iwconfig man page).

This screams out for an 802.11-centric configuration system, and it
looks like we have one on the way: cfg80211
From reading some mails, it looks like the drivers will simply have to
provide functions for "associate", "scan", etc, and the configuration
management will be offloaded to the upper layers.

For the time being, I suggest you bring the interface up before setting
the configuration. Regardless of the inconsistency of the current
situation, and lack documentation saying which way it should be done,
you are at least playing it safe and guaranteeing it works on all drivers.

Daniel

2006-12-21 02:37:38

by Daniel Drake

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Matthew Garrett wrote:
>> In order to scan, we need to have the radio on and we need to be able to send
>> and receive. What are you gonna turn off?
>
> The obvious route would be to power the card down, but come back up
> every two minutes to perform a scan, or if userspace explicitly requests
> one. Would this cause problems in some cases?

I don't think it makes sense. For zd1211 the power consumption and heat
emission goes up considerably when the interface is brought up (radio
on, interrupts enabled, etc), and this is also a relatively long
operation in terms of duration needed to bring the interface up and
down. A scanning operation requires radio on, interrupts enabled, lots
of register reading, RF calibration, RX/TX ringbuffers allocation, etc.

I don't think that supporting scanning when the interface is supposed to
be disabled is sensible. If you want to scan, you are simply sending and
receiving frames, it's no different from having the interface up and
sending/receiving data frames.

There are additional implementation problems: scanning requires 2
different ioctl calls: siwscan, then several giwscan. If you want the
driver to effectively temporarily bring the interface up when userspace
requests a scan but the interface was down, then how does the driver
know when to bring it down again?

Daniel

2006-12-21 02:45:52

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 09:38:20PM -0500, Daniel Drake wrote:

> I don't think that supporting scanning when the interface is supposed to
> be disabled is sensible. If you want to scan, you are simply sending and
> receiving frames, it's no different from having the interface up and
> sending/receiving data frames.

>From a usability point of view, it's helpful to power the card down as
much as possible while it's not being actively used. However, it's also
helpful to be able to provide a list of available wireless networks,
though some degree of latency would be acceptable in that. These two
desires are obviously not entirely compatible with one another, so it
would be helpful if there was some means of providing an intermediate
state.

> There are additional implementation problems: scanning requires 2
> different ioctl calls: siwscan, then several giwscan. If you want the
> driver to effectively temporarily bring the interface up when userspace
> requests a scan but the interface was down, then how does the driver
> know when to bring it down again?

Hm. Does the spec not set any upper bound on how long it might take for
APs to respond? I'm afraid that my 802.11 knowledge is pretty slim.
Picking a number out of thin air would be one answer, but clearly less
than ideal. This may be a case of us not being able to satisfy everyone,
and so just having to force the user to choose between low power or
wireless scanning.

--
Matthew Garrett | [email protected]

2006-12-21 03:00:19

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 02:20 +0000, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 08:57:05PM -0500, Michael Wu wrote:
>
> (allowing scanning while the interface is down)
>
> > No, it's absolutely a bug. It just so happens that some drivers incorrectly
> > allowed it.
>
> Ok. Would it be reasonable to add warnings to any devices that allow it?

Quite reasonable.

Dan


2006-12-21 03:04:45

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 01:15 +0000, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 01:12:51PM -0500, Dan Williams wrote:
>
> > Entirely correct. If the card is DOWN, the radio should be off (both TX
> > & RX) and it should be in max power save mode. If userspace expects to
> > be able to get the card to do _anything_ when it's down, that's just
> > 110% wrong. You can't get link events for many wired cards when they
> > are down, so I fail to see where userspace could expect to do anything
> > with a wireless card when it's down too.
>
> Because it works on the common hardware? If there's documentation about
> what userspace can legitimately expect, then I'm happy to defer to that.
> But in the absence of any indication as to what functionality users can
> depend on, deciding that existing functionality is a bug is, well,
> impolite.
>
> > Also, how does rfkill fit into this? rfkill implies killing TX, but do
> > we have the granularity to still receive while the transmit paths are
> > powered down?
>
> Is rfkill not just primarily an interface for us getting events when the
> radio changes state? Every time I read up on it I get a little more
> confused - some time I really need to make sense of it...

That's OK, it's really complicated. There are 3 cases of rfkill
switches AFAICT:

a) tied to the wireless hardware, switch kills hardware directly
b) tied to wireless hardware, but driver handles the kill request
c) just another key, a separate key driver handles the event and asks
the wireless driver to kill the card

It's also complicated because some switches are supposed to rfkill both
an 802.11 module _and_ a bluetooth module at the same time, or I guess
some laptops may even have one rfkill switch for each wireless device.
Furthermore, some people want to 'softkill' the hardware via software
without pushing the key, which is a subset of (b) or (c) above.

It sucks. But we _need_ a unified interface to handle it.

Dan


2006-12-21 03:07:41

by Daniel Drake

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Matthew Garrett wrote:
>> There are additional implementation problems: scanning requires 2
>> different ioctl calls: siwscan, then several giwscan. If you want the
>> driver to effectively temporarily bring the interface up when userspace
>> requests a scan but the interface was down, then how does the driver
>> know when to bring it down again?
>
> Hm. Does the spec not set any upper bound on how long it might take for
> APs to respond? I'm afraid that my 802.11 knowledge is pretty slim.

I'm not sure, but thats not entirely relevant either. The time it takes
for the AP to respond is not related to the delay between userspace
sending the siwscan and giwscan ioctls (unless you're thinking of
userspace being too quick, but GIWSCAN already returns -EINPROGRESS when
appropriate so this is detectable)

> Picking a number out of thin air would be one answer, but clearly less
> than ideal. This may be a case of us not being able to satisfy everyone,
> and so just having to force the user to choose between low power or
> wireless scanning.

I think it's reasonable to keep the interface down, but then when the
user does want to connect, bring the interface up, scan, present scan
results. Scanning is quick, there would be minimal wait needed here.

Alternatively, if you do want to prepare scan results in the background
every 2 minutes, use a sequence something like:

- bring interface up
- siwscan
- giwscan [...]
- bring interface down
- repeat after 2 mins

If this kind of thing was implemented at the driver level, in most cases
it would be identical to doing the above anyway.

Daniel

2006-12-21 03:12:04

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 02:18 +0000, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 09:05:27PM -0500, Michael Wu wrote:
>
> > Softmac isn't the only wireless code that likes to be configured after going
> > up first. Configuring after the card goes up has generally been more
> > reliable, though that should not be necessary and is a bug IMHO.
>
> Ok, that's nice to know.
>
> > In order to scan, we need to have the radio on and we need to be able to send
> > and receive. What are you gonna turn off?
>
> The obvious route would be to power the card down, but come back up
> every two minutes to perform a scan, or if userspace explicitly requests
> one. Would this cause problems in some cases?

Seriously, having all these different capabilities when the card is
"down" is just madness. Down == Down!!! Furthermore, every card is
going to support some other subset of capabilities when it's "down".
When you bring "up" prism54 fullmac card, you have to power up the
hardware, reload the firmware, let the firmware boot, and then talk to
it. Doing that every 2 minutes is just a waste of time, effort, and
power.

If you want to scan, just bring the darn card up to do it. It's so much
simpler that way, and I just don't see what having all this "every 2
minutes do a scan" policy really buys us. That doesn't belong in the
kernel. If something wants to scan, userspace can wake the card up and
do the scan. It's userspace that's using the scan results to configure
the card anyway, so userspace can do the scan.

Simple == good. Down == down. Lets just agree on that and save
ourselves a lot of pain.

Dan


2006-12-21 03:14:31

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 10:06:51PM -0500, Dan Williams wrote:

> a) tied to the wireless hardware, switch kills hardware directly
> b) tied to wireless hardware, but driver handles the kill request
> c) just another key, a separate key driver handles the event and asks
> the wireless driver to kill the card
>
> It's also complicated because some switches are supposed to rfkill both
> an 802.11 module _and_ a bluetooth module at the same time, or I guess
> some laptops may even have one rfkill switch for each wireless device.
> Furthermore, some people want to 'softkill' the hardware via software
> without pushing the key, which is a subset of (b) or (c) above.

If we define interface down as meaning that the device is powered down
and the radio switched off, then (b) and (c) would presumably just need
to ensure that the interface is downed. (a) is a slightly more special
case - if the switch disables the radio, I guess we then want the driver
to down the interface as well.

In the (a) case, drivers should presumably refuse to bring the interface
up if the radio is disabled?
--
Matthew Garrett | [email protected]

2006-12-21 03:26:07

by Matthew Garrett

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, Dec 20, 2006 at 10:08:22PM -0500, Daniel Drake wrote:
> Matthew Garrett wrote:
> >Hm. Does the spec not set any upper bound on how long it might take for
> >APs to respond? I'm afraid that my 802.11 knowledge is pretty slim.
>
> I'm not sure, but thats not entirely relevant either. The time it takes
> for the AP to respond is not related to the delay between userspace
> sending the siwscan and giwscan ioctls (unless you're thinking of
> userspace being too quick, but GIWSCAN already returns -EINPROGRESS when
> appropriate so this is detectable)

Ah - I've mostly been looking at the ipw* drivers, where giwscan just
seems to return information cached by the ieee80211 layer. A quick scan
suggests that most cards behave like this, but prism54 seems to refer to
the hardware. I can see why that would cause problems.

> I think it's reasonable to keep the interface down, but then when the
> user does want to connect, bring the interface up, scan, present scan
> results. Scanning is quick, there would be minimal wait needed here.

Yeah, that's true.

--
Matthew Garrett | [email protected]

2006-12-21 03:27:11

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-12-20 at 22:08 -0500, Daniel Drake wrote:
> Matthew Garrett wrote:
> >> There are additional implementation problems: scanning requires 2
> >> different ioctl calls: siwscan, then several giwscan. If you want the
> >> driver to effectively temporarily bring the interface up when userspace
> >> requests a scan but the interface was down, then how does the driver
> >> know when to bring it down again?
> >
> > Hm. Does the spec not set any upper bound on how long it might take for
> > APs to respond? I'm afraid that my 802.11 knowledge is pretty slim.
>
> I'm not sure, but thats not entirely relevant either. The time it takes
> for the AP to respond is not related to the delay between userspace
> sending the siwscan and giwscan ioctls (unless you're thinking of
> userspace being too quick, but GIWSCAN already returns -EINPROGRESS when
> appropriate so this is detectable)

Channel dwell time for a passive scan is usually around 100ms - 250ms,
depending on how accurate you want your scan results (== wait longer),
and how much power you want to save (== don't wait long).

Correct userspace apps should:

1) Set a timer for, say, 8 seconds
2) Issue an SIWSCAN command
3) Wait for the GIWSCAN netlink event from the card, get results via
GIWSCAN command when it comes; cancel the timer from (2)
4) If the timer fires because no GIWSCAN event was received, try to get
scan results via GIWSCAN command from the driver anyway

<rant>
Note that NDIS requires a driver to return _something_ within 2 seconds
of a scan request. Even if you're an 802.11a card (madwifi *cough*, I'm
starting a new thing where I cough after...). So it's certainly
possible to return scan results in a timely manner, since the Windows
drivers for these cards are obviously doing it just fine.

Drivers should buffer scan results from past scans, age them
appropriately, and purge them when they get too old. Drivers should
never, ever, clear the scan result list when SIWSCAN or GIWSCAN is
called, because that means there's a window when a scan result request
from some other app could illegitimately return no BSSID records.
</rant>

> > Picking a number out of thin air would be one answer, but clearly less
> > than ideal. This may be a case of us not being able to satisfy everyone,
> > and so just having to force the user to choose between low power or
> > wireless scanning.
>
> I think it's reasonable to keep the interface down, but then when the
> user does want to connect, bring the interface up, scan, present scan
> results. Scanning is quick, there would be minimal wait needed here.

Unless you're madwifi *cough* and then you may have to wait up to _14_
seconds for a full scan of all a/bg channels. That's just insane. I
have no idea why that's the case (or at least was up to earlier this
year) but it's just unacceptable.

> Alternatively, if you do want to prepare scan results in the background
> every 2 minutes, use a sequence something like:
>
> - bring interface up
> - siwscan
> - giwscan [...]
> - bring interface down
> - repeat after 2 mins
>
> If this kind of thing was implemented at the driver level, in most cases
> it would be identical to doing the above anyway.

Right. It should 100% be in userspace and not in the drivers. Who says
2 minutes is the right interval? Putting that stuff, and the get/set
commands for changing that interval, in the driver is just plain wrong.

Dan

> Daniel
> -
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2006-12-21 03:30:44

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 03:14 +0000, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 10:06:51PM -0500, Dan Williams wrote:
>
> > a) tied to the wireless hardware, switch kills hardware directly
> > b) tied to wireless hardware, but driver handles the kill request
> > c) just another key, a separate key driver handles the event and asks
> > the wireless driver to kill the card
> >
> > It's also complicated because some switches are supposed to rfkill both
> > an 802.11 module _and_ a bluetooth module at the same time, or I guess
> > some laptops may even have one rfkill switch for each wireless device.
> > Furthermore, some people want to 'softkill' the hardware via software
> > without pushing the key, which is a subset of (b) or (c) above.
>
> If we define interface down as meaning that the device is powered down
> and the radio switched off, then (b) and (c) would presumably just need
> to ensure that the interface is downed. (a) is a slightly more special
> case - if the switch disables the radio, I guess we then want the driver
> to down the interface as well.

Correct.

> In the (a) case, drivers should presumably refuse to bring the interface
> up if the radio is disabled?

Right; the driver simply can't do anything about it, because the switch
is hardwired to the card and either the card's firmware takes care of
it, or the chipset takes care of it. The driver has no say whatsoever
in the state of the card's radio for this case. I tend to think this
case is on it's way out in the same way that fullmac cards are falling
out of favor (ie, do everything in software and save $$$), but they are
around and we need to support them.

In this case, down really does mean down too. The driver cannot honor
requests to set SSID, frequency, etc, because it's simply not possible
at that time.

Dan


2006-12-21 03:31:20

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix /sys/device/.../power/state

On Wednesday 20 December 2006 5:29 pm, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 01:18:06PM -0800, David Brownell wrote:
> > > /* disallow incomplete suspend sequences */
> > > - if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> > > + if (dev->bus && dev->bus->pm_has_noirq_stage
> > > + && dev->bus->pm_has_noirq_stage(dev))
> > > return error;
> > >
> >
> > I'm suspecting these two patches won't be merged,

Make that "strongly suspecting" given what Greg said ... he normally
gets the final say over drivers/core/* things, and you seem alone in
wanting to help those sysfs files extend their withered existence.


> > but this fragment has
> > two bugs. One is the whitespace bug already mentioned.
>
> I'm a bit curious about the whitespace issue - CodingStyle doesn't seem
> to discuss what to do with if statements that end up longer than 80
> characters, which is (I think) what you're talking about?

It does say that indents must use only tabs, which that clearly doesn't.
I think you'll find that

if (some_very_long_condition
&& probably_not_quite_as_long
&& or_too_long_for_one_line) {
do_this;
and_this;
}

is widely accepted. (The conditions get an extra indent so they don't
look like they're part of the block executing if the test is true.)


>
> > The other is that
> > the original test must still be used if that bus primitve doesn't exist.
>
> I dislike that.

Tough noogies, as they say. In a tradeoff between correctness and your
personal taste (or even mine, sigh!), the normal tradeoff is in favor
of correctness.


> We're asking to suspend an individual device - whether
> the bus supports devices that need to suspend with interrupts disabled
> is irrelevent, it's the device that we care about. We should just make
> it necessary for every bus to support this method until the interface is
> removed.

But you _didn't_ do anything to "make it necessary". Which means that
your patch *WILL* cause bugs whenever a driver uses those calls, and
courtesy of your patch userspace tries to suspend that device ...


> > > +???????bus->pm_has_noirq_stage()
> > > -When:??July 2007
> > > +When:??Once alternative functionality has been implemented
> >
> > The "When" shouldn't change.
>
> We shouldn't remove interfaces that userland uses until there's been a
> replacement for long enough that userland can switch over.

Userland can stop using this **TODAY** and just "ifdown", so that
argument seems weak. For all your examples, the userland interface
is already available.

- Dave

2006-12-21 03:37:37

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 03:25 +0000, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 10:08:22PM -0500, Daniel Drake wrote:
> > Matthew Garrett wrote:
> > >Hm. Does the spec not set any upper bound on how long it might take for
> > >APs to respond? I'm afraid that my 802.11 knowledge is pretty slim.
> >
> > I'm not sure, but thats not entirely relevant either. The time it takes
> > for the AP to respond is not related to the delay between userspace
> > sending the siwscan and giwscan ioctls (unless you're thinking of
> > userspace being too quick, but GIWSCAN already returns -EINPROGRESS when
> > appropriate so this is detectable)
>
> Ah - I've mostly been looking at the ipw* drivers, where giwscan just
> seems to return information cached by the ieee80211 layer. A quick scan
> suggests that most cards behave like this, but prism54 seems to refer to
> the hardware. I can see why that would cause problems.

Prism54 (fullmac) does background scanning all the time when the
interface is up. You can't turn it off AFAIK. If you look at SIWSCAN,
you'll see that it's essentially a NOP since the firmware is always
scanning, and you'll always have up-to-date scan results. Ideally, all
drivers would do it like prism54 does (and some later ipw versions, I
think).

Dan

>
> > I think it's reasonable to keep the interface down, but then when the
> > user does want to connect, bring the interface up, scan, present scan
> > results. Scanning is quick, there would be minimal wait needed here.
>
> Yeah, that's true.
>

2006-12-21 03:51:55

by Andrew Morton

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Tue, 19 Dec 2006 19:29:13 -0800
David Brownell <[email protected]> wrote:

> On Tuesday 19 December 2006 6:15 pm, Andrew Morton wrote:
> > On Tue, 19 Dec 2006 13:34:49 -0800
> > David Brownell <[email protected]> wrote:
> >
> > > Documentation/feature-removal-schedule.txt has warned about this since
> > > August
> >
> > Nobody reads that.
> >
> > Please, wherever possible, put a nice printk("this is going away") in the code
> > when planning these things.
>
>
> Signed-off-by: David Brownell <[email protected]>
>
> Index: g26/drivers/base/power/sysfs.c
> ===================================================================
> --- g26.orig/drivers/base/power/sysfs.c 2006-09-27 16:19:00.000000000 -0700
> +++ g26/drivers/base/power/sysfs.c 2006-12-19 19:27:25.000000000 -0800
> @@ -42,9 +42,17 @@ static ssize_t state_show(struct device
>
> static ssize_t state_store(struct device * dev, struct device_attribute *attr, const char * buf, size_t n)
> {
> + static int warned;
> pm_message_t state;
> int error = -EINVAL;
>
> + if (!warned) {
> + printk(KERN_WARNING
> + "*** WARNING *** sysfs devices/.../power/state files "
> + "are only for testing, and will be removed\n");
> + warned = error;
> + }
> +
> /* disallow incomplete suspend sequences */
> if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> return error;

Well that's not much use. It tells people "hey, we broke it". They
already knew that.

What we should do is to revert 047bda36150d11422b2c7bacca1df324c909c0b3 and
add a printk("hey, we'll be breaking this soon").

2006-12-21 04:07:04

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix /sys/device/.../power/state

On Wed, Dec 20, 2006 at 07:04:28PM -0800, David Brownell wrote:
> On Wednesday 20 December 2006 5:29 pm, Matthew Garrett wrote:
> > I dislike that.
>
> Tough noogies, as they say. In a tradeoff between correctness and your
> personal taste (or even mine, sigh!), the normal tradeoff is in favor
> of correctness.

But it's not correct - the test prohibits suspending devices even if
it would be safe to do so.

> > We're asking to suspend an individual device - whether
> > the bus supports devices that need to suspend with interrupts disabled
> > is irrelevent, it's the device that we care about. We should just make
> > it necessary for every bus to support this method until the interface is
> > removed.
>
> But you _didn't_ do anything to "make it necessary". Which means that
> your patch *WILL* cause bugs whenever a driver uses those calls, and
> courtesy of your patch userspace tries to suspend that device ...

New patch attached.

> > We shouldn't remove interfaces that userland uses until there's been a
> > replacement for long enough that userland can switch over.
>
> Userland can stop using this **TODAY** and just "ifdown", so that
> argument seems weak. For all your examples, the userland interface
> is already available.

No. The proposed interface is currently implemented by three network
drivers out of over seventy. Once the rest (or even the majority of the
rest) are converted to support that, I'd have no objections to the
removal being scheduled.

Signed-off-by: Matthew Garrett <[email protected]>

diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 30f3c8c..8a91689 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -9,7 +9,8 @@ be removed from this file.
What: /sys/devices/.../power/state
dev->power.power_state
dpm_runtime_{suspend,resume)()
-When: July 2007
+ bus->pm_has_noirq_stage()
+When: Once alternative functionality has been implemented
Why: Broken design for runtime control over driver power states, confusing
driver-internal runtime power management with: mechanisms to support
system-wide sleep state transitions; event codes that distinguish
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index d0e79d5..345cca4 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -79,6 +79,7 @@ struct bus_type {

int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);
+ int (*pm_has_noirq_stage)(struct device *dev);
};

Bus drivers implement those methods as appropriate for the hardware and
@@ -236,6 +237,10 @@ The phases are seen by driver notifications issued in this order:
may stay partly usable until this late. This "late" call may also
help when coping with hardware that behaves badly.

+ If a bus implements the suspend_late method, it must also provide a
+ pm_has_noirq_stage function in order to determine whether devices
+ may be suspended during runtime.
+
The pm_message_t parameter is currently used to refine those semantics
(described later).

@@ -348,7 +353,9 @@ The phases are seen by driver notifications issued in this order:
won't be supported on busses that require IRQs in order to
interact with devices.

- This reverses the effects of bus.suspend_late().
+ This reverses the effects of bus.suspend_late(). As with suspend_late,
+ if a bus implements this function it must provide a pm_has_noirq_stage
+ function.

2 bus.resume(dev) is called next. This may be morphed into a device
driver call with bus-specific parameters; implementations may sleep.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..6bf1218 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,16 @@ static int platform_resume(struct device * dev)
return ret;
}

+static int platform_pm_has_noirq_stage(struct device * dev)
+{
+ int ret = 0;
+ struct platform_driver *drv = to_platform_driver(dev->driver);
+
+ if (dev->driver && (drv->resume_early || drv->suspend_late))
+ ret = 1;
+ return ret;
+}
+
struct bus_type platform_bus_type = {
.name = "platform",
.dev_attrs = platform_dev_attrs,
@@ -606,6 +616,7 @@ struct bus_type platform_bus_type = {
.suspend_late = platform_suspend_late,
.resume_early = platform_resume_early,
.resume = platform_resume,
+ .pm_has_noirq_stage = platform_pm_has_noirq_stage,
};
EXPORT_SYMBOL_GPL(platform_bus_type);

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..c4ce060 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
int error = -EINVAL;

/* disallow incomplete suspend sequences */
- if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+ if (dev->bus && dev->bus->pm_has_noirq_stage
+ && dev->bus->pm_has_noirq_stage(dev))
return error;

state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..c0e4e7a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,17 @@ static int pci_device_resume(struct device * dev)
return error;
}

+static int pci_device_pm_has_noirq_stage(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && (drv->resume_early || drv->suspend_late))
+ error = 1;
+ return error;
+}
+
static int pci_device_resume_early(struct device * dev)
{
int error = 0;
@@ -569,6 +580,7 @@ struct bus_type pci_bus_type = {
.suspend_late = pci_device_suspend_late,
.resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .pm_has_noirq_stage = pci_device_pm_has_noirq_stage,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..1c663c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
int (*suspend)(struct device * dev, pm_message_t state);
int (*suspend_late)(struct device * dev, pm_message_t state);
int (*resume_early)(struct device * dev);
+ int (*pm_has_noirq_stage)(struct device * dev);
int (*resume)(struct device * dev);
};

--
Matthew Garrett | [email protected]

2006-12-21 04:51:11

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH 1/2] Fix /sys/device/.../power/state

On Wednesday 20 December 2006 8:06 pm, Matthew Garrett wrote:
> On Wed, Dec 20, 2006 at 07:04:28PM -0800, David Brownell wrote:
> > On Wednesday 20 December 2006 5:29 pm, Matthew Garrett wrote:
> > > I dislike that.
> >
> > Tough noogies, as they say. In a tradeoff between correctness and your
> > personal taste (or even mine, sigh!), the normal tradeoff is in favor
> > of correctness.
>
> But it's not correct - the test prohibits suspending devices even if
> it would be safe to do so.

It prohibits suspending them unless it's known to be safe. What your
patch does is add some more ways to know it's safe. My comment was
that while adding ways is safe, it's incorrect to allow things which
aren't known to be safe.


> > > We're asking to suspend an individual device - whether
> > > the bus supports devices that need to suspend with interrupts disabled
> > > is irrelevent, it's the device that we care about. We should just make
> > > it necessary for every bus to support this method until the interface is
> > > removed.
> >
> > But you _didn't_ do anything to "make it necessary". Which means that
> > your patch *WILL* cause bugs whenever a driver uses those calls, and
> > courtesy of your patch userspace tries to suspend that device ...
>
> New patch attached.

I'll have a look at it after I get past some other stuff. I take it that
you tested this by now? I assume it'd work, but you know how that goes.

- Dave

2006-12-21 04:56:32

by David Brownell

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Wednesday 20 December 2006 7:51 pm, Andrew Morton wrote:

> > + if (!warned) {
> > + printk(KERN_WARNING
> > + "*** WARNING *** sysfs devices/.../power/state files "
> > + "are only for testing, and will be removed\n");
> > + warned = error;
> > + }
> > +
> > /* disallow incomplete suspend sequences */
> > if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> > return error;
>
> Well that's not much use. It tells people "hey, we broke it". They
> already knew that.

No, it only does what you asked for: warning people that they're using
something that's going away. It says nothing about "broke".


> What we should do is to revert 047bda36150d11422b2c7bacca1df324c909c0b3 and

Bad answer ... see my original reply in this thread. If "the answer" is
to involve making PCI devices work again, better solutions include reverting
the patch I mentioned (adding the suspend_late/resume_early support to PCI)
or a version of what Matthew has produced (poking through bus layers so
that test can be made to fail when the bus supports those methods but the
specific device's driver doesn't use them).

- Dave

2006-12-21 05:02:25

by Andrew Morton

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Wed, 20 Dec 2006 20:56:27 -0800
David Brownell <[email protected]> wrote:

> On Wednesday 20 December 2006 7:51 pm, Andrew Morton wrote:
>
> > > + if (!warned) {
> > > + printk(KERN_WARNING
> > > + "*** WARNING *** sysfs devices/.../power/state files "
> > > + "are only for testing, and will be removed\n");
> > > + warned = error;
> > > + }
> > > +
> > > /* disallow incomplete suspend sequences */
> > > if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
> > > return error;
> >
> > Well that's not much use. It tells people "hey, we broke it". They
> > already knew that.
>
> No, it only does what you asked for: warning people that they're using
> something that's going away. It says nothing about "broke".
>

But it's still broken, is it not?

>
> > What we should do is to revert 047bda36150d11422b2c7bacca1df324c909c0b3 and
>
> Bad answer

Is better than breaking stuff.

> ... see my original reply in this thread. If "the answer" is
> to involve making PCI devices work again, better solutions include reverting
> the patch I mentioned (adding the suspend_late/resume_early support to PCI)
> or a version of what Matthew has produced (poking through bus layers so
> that test can be made to fail when the bus supports those methods but the
> specific device's driver doesn't use them).
>

We appear to have a choice of three options. But I see no fix in Greg's
tree. Please let's not just accidentally forget to do this.

2006-12-21 06:38:07

by David Brownell

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Hmm, this reminds me of a thread from last summer, following up on
some PM discussions at OLS. Thread "Runtime power management for
network interfaces", at the end of July.


> 2) Network device infrastructure should make it easier for devices:
> bring interface down on suspend and bring it up after resume
> (if it was running when suspended). This would allow many devices to
> have no suspend/resume hook; except those that have some better power
> control over hardware.

The _intent_ of the class suspend() and resume() methods is to let
infrastructure (the network stack was explicitly mentioned!) handle
pretty much everything except putting the hardware in low power
modes ... which last step might, for PCI devices at least, most
naturally be done in suspend_late(). That way it'd be decoupled
cleanly from anything else.

Now, I recently tried refreshing a patch that used those class
suspend() and resume() methods, and for some reason they're not
getting called. I believe they used to get called, although it's
true their parameter wasn't very useful ... it was called with the
underlying device, not the class_device holding state that the
class driver manages.

I just wanted to point out that yes, this ground has been covered
before, with some agreement on that approach. It'd be good to see
it pursued. :)

- Dave

2006-12-21 07:06:03

by David Brownell

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Wednesday 20 December 2006 9:02 pm, Andrew Morton wrote:

> > ... see my original reply in this thread. If "the answer" is
> > to involve making PCI devices work again, better solutions include reverting
> > the patch I mentioned (adding the suspend_late/resume_early support to PCI)
> > or a version of what Matthew has produced (poking through bus layers so
> > that test can be made to fail when the bus supports those methods but the
> > specific device's driver doesn't use them).
> >
>
> We appear to have a choice of three options. But I see no fix in Greg's
> tree. Please let's not just accidentally forget to do this.

Plus the fourth "leave it be" option, which I guess you're voting against.
Of those options, I'd go for something like Matthew's patch to add a new
layer-punching hook. (I'll look at his latest tomorrow, and do something
appropriate with it.)

It's interesting that there was no evident motion on these network PM
issues after the OLS (and then netdev) discussion last summer ... but
there is now much more active discussion. Evidently PM issues are still
ignored until a fire gets set.

- Dave

2006-12-21 07:10:53

by Stephen Hemminger

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

David Brownell wrote:
> Hmm, this reminds me of a thread from last summer, following up on
> some PM discussions at OLS. Thread "Runtime power management for
> network interfaces", at the end of July.
>
>
>
>> 2) Network device infrastructure should make it easier for devices:
>> bring interface down on suspend and bring it up after resume
>> (if it was running when suspended). This would allow many devices to
>> have no suspend/resume hook; except those that have some better power
>> control over hardware.
>>
>
> The _intent_ of the class suspend() and resume() methods is to let
> infrastructure (the network stack was explicitly mentioned!) handle
> pretty much everything except putting the hardware in low power
> modes ... which last step might, for PCI devices at least, most
> naturally be done in suspend_late(). That way it'd be decoupled
> cleanly from anything else.
>
The class methods don't work right for that because the physical class
(PCI) gets
called before the virtual class (network devices).

> Now, I recently tried refreshing a patch that used those class
> suspend() and resume() methods, and for some reason they're not
> getting called. I believe they used to get called, although it's
> true their parameter wasn't very useful ... it was called with the
> underlying device, not the class_device holding state that the
> class driver manages.
>
> I just wanted to point out that yes, this ground has been covered
> before, with some agreement on that approach. It'd be good to see
> it pursued. :)
>
> - Dave
>
>

2006-12-21 08:11:29

by David Brownell

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wednesday 20 December 2006 11:08 pm, Stephen Hemminger wrote:
> David Brownell wrote:
> > Hmm, this reminds me of a thread from last summer, following up on
> > some PM discussions at OLS. Thread "Runtime power management for
> > network interfaces", at the end of July.
> >
> >
> >
> >> 2) Network device infrastructure should make it easier for devices:
> >> bring interface down on suspend and bring it up after resume
> >> (if it was running when suspended). This would allow many devices to
> >> have no suspend/resume hook; except those that have some better power
> >> control over hardware.
> >>
> >
> > The _intent_ of the class suspend() and resume() methods is to let
> > infrastructure (the network stack was explicitly mentioned!) handle
> > pretty much everything except putting the hardware in low power
> > modes ... which last step might, for PCI devices at least, most
> > naturally be done in suspend_late(). That way it'd be decoupled
> > cleanly from anything else.
> >
> The class methods don't work right for that because the physical class
> (PCI) gets called before the virtual class (network devices).

I'd say they don't work just now because the virtual class code just
doesn't get called ... at least, without someone setting up a field
(device.class) that's flagged as "optional" and might be disappearing.

But if you read the PM code, you'll observe that the class suspend
method gets called BEFORE the bus/device suspend method. And that's
how it's documented in Documentation/power/devices.txt too.


... However notice that "interface down" operations won't have that
particular problem, they have net_device.class_dev.dev already ready
for whatever PM operation is appropriate.

- Dave


> > Now, I recently tried refreshing a patch that used those class
> > suspend() and resume() methods, and for some reason they're not
> > getting called. I believe they used to get called, although it's
> > true their parameter wasn't very useful ... it was called with the
> > underlying device, not the class_device holding state that the
> > class driver manages.
> >
> > I just wanted to point out that yes, this ground has been covered
> > before, with some agreement on that approach. It'd be good to see
> > it pursued. :)
> >
> > - Dave
> >
> >
>

2006-12-21 08:27:25

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace


> >
> > > What we should do is to revert 047bda36150d11422b2c7bacca1df324c909c0b3 and
> >
> > Bad answer
>
> Is better than breaking stuff.

.. stuff that made assumptions about something and did stuff it probably
shouldn't have been doing for the intent it had ;)

the semantics of this thing were clear as mud, and actually
disfunctional.... (and the user of it that "broke" actually wanted
something else, but didn't because a few drivers didn't implement it
quite the right way)
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-21 08:54:18

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down


> Is there some reason why we can't have the OS just do the D3
> transition for all drivers that register support? I mean, this power
> management using D states is actually driver *independent* and at
> least way back in the day was supposed to be implemented for "OS power
> management"

all you need to do is 1 function call from your interface down code.. so
it's really not a big deal to just do that call ;)
(well and you want the D0 call in the up code, but that's ok)

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-21 11:22:40

by Francois Romieu

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Stephen Hemminger <[email protected]> :
[...]
> We need to allow ethtool setting to be done before device has been brought
> up and started autonegotiation. The current MII library doesn't really support
> it.

I completely agree.

--
Ueimor

2006-12-21 13:14:34

by jamal

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 2006-20-12 at 22:14 -0500, Dan Williams wrote:
...
....
> Simple == good. Down == down. Lets just agree on that and save
> ourselves a lot of pain.

netdevices have well defined operational and administrative state
machines. And very well defined relationship between operational and
administrative status. IOW, care should be invoked not to reinvent.

Power management to me seems like an operational state.
A link could only transition to operational or down depending on
whether it is "powered" up or down.

To be complete, since a netdevice is a generic construct, nota bene:
- a link could be a wireless association or ethernet cable or a PPP
session or a ATM PVC, or an infrared channel etc.
- events that result in operational link transitions could be anything
from powering up an ethernet phy with an active cable plugged to an
802.1x auth on a wireless association to a on-demand ppp link seeing an
outgoing packet.

IMO, for this discussion to be meaningful, it would be useful to read
Documentation/networking/operstates.txt
And if you are keen you can then read RFC 2863...

cheers,
jamal

2006-12-21 13:40:22

by Sven-Haegar Koch

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006, Dan Williams wrote:

>> If we define interface down as meaning that the device is powered down
>> and the radio switched off, then (b) and (c) would presumably just need
>> to ensure that the interface is downed. (a) is a slightly more special
>> case - if the switch disables the radio, I guess we then want the driver
>> to down the interface as well.
>
> Correct.
>
>> In the (a) case, drivers should presumably refuse to bring the interface
>> up if the radio is disabled?
>
> Right; the driver simply can't do anything about it, because the switch
> is hardwired to the card and either the card's firmware takes care of
> it, or the chipset takes care of it. The driver has no say whatsoever
> in the state of the card's radio for this case. I tend to think this
> case is on it's way out in the same way that fullmac cards are falling
> out of favor (ie, do everything in software and save $$$), but they are
> around and we need to support them.
>
> In this case, down really does mean down too. The driver cannot honor
> requests to set SSID, frequency, etc, because it's simply not possible
> at that time.

What do you mean with this exactly?
Should the user not be able to set these values, or should the driver not
be able to activate them?

I think it is correct when the driver does not activate them, but I think
the user should be able to configure them, have them stored inside
cfg80211/the driver, and have them activated/used when uping the
interface, or when the rfkill switch has been deactivated. Otherwise it
will get impossible to boot with rfkill disabled, toggle the switch later
on and have everything working.

And another side to this:
if a disabled rfkill switch downs the interface (opposed to just
disabling it but staying "ifconfig up") - what happens to the ip config
of this interface? What reconfigures the needed routes when a re-enabled
rfkill switch reactivates the interface? Will manual route add and
ifconfig statements be impossible and we'll get forced to use some crappy
distri-scripts and daemons for it?

And third point just coming to my mind:
how is changing the mac address of the card supposed to work? Chaning it
through ifconfig only works when the interface is downed, so the newly
wanted mac address has to be saved somewhere before the interface is
reenabled and reinitialized on the next "ifconfig up".
(And I think it is an absolute requirement that NO packet with the
old/default mac address may be sent into the air whatsoever)

c'ya
sven

--

The Internet treats censorship as a routing problem, and routes around it.
(John Gilmore on http://www.cygnus.com/~gnu/)

2006-12-21 17:14:11

by Dan Williams

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, 2006-12-21 at 14:19 +0100, Sven-Haegar Koch wrote:
> On Wed, 20 Dec 2006, Dan Williams wrote:
>
> >> If we define interface down as meaning that the device is powered down
> >> and the radio switched off, then (b) and (c) would presumably just need
> >> to ensure that the interface is downed. (a) is a slightly more special
> >> case - if the switch disables the radio, I guess we then want the driver
> >> to down the interface as well.
> >
> > Correct.
> >
> >> In the (a) case, drivers should presumably refuse to bring the interface
> >> up if the radio is disabled?
> >
> > Right; the driver simply can't do anything about it, because the switch
> > is hardwired to the card and either the card's firmware takes care of
> > it, or the chipset takes care of it. The driver has no say whatsoever
> > in the state of the card's radio for this case. I tend to think this
> > case is on it's way out in the same way that fullmac cards are falling
> > out of favor (ie, do everything in software and save $$$), but they are
> > around and we need to support them.
> >
> > In this case, down really does mean down too. The driver cannot honor
> > requests to set SSID, frequency, etc, because it's simply not possible
> > at that time.
>
> What do you mean with this exactly?
> Should the user not be able to set these values, or should the driver not
> be able to activate them?
>
> I think it is correct when the driver does not activate them, but I think
> the user should be able to configure them, have them stored inside
> cfg80211/the driver, and have them activated/used when uping the
> interface, or when the rfkill switch has been deactivated. Otherwise it
> will get impossible to boot with rfkill disabled, toggle the switch later
> on and have everything working.

This would be an optimization. You could possibly _set_ values, but
obviously an 'associate' command would fail, and so it should. But
there's really not that much of a point to doing this, because cfg80211
should support "packaging" up all the config for a particular
association request into one call, and then just blasting that to the
card. Ideally configuration wouldn't be pushed to the card piecemeal.
As WEXT stands right now, setting the SSID on the card is essentially
the "associate" command, which obviously wouldn't work when the card is
down. cfg80211 can fix that, you're right.

> And another side to this:
> if a disabled rfkill switch downs the interface (opposed to just
> disabling it but staying "ifconfig up") - what happens to the ip config
> of this interface? What reconfigures the needed routes when a re-enabled
> rfkill switch reactivates the interface? Will manual route add and
> ifconfig statements be impossible and we'll get forced to use some crappy
> distri-scripts and daemons for it?

For anything other than unencrypted and WEP-only networks, you already
need a userspace program to configure your wireless card. Dynamic WEP,
LEAP, WPA, WPA2, 802.1x all require much, much more handshake and
validation that should _ever_ be in a driver. You should _never_, ever
be configuring your wireless card with module parameters. I'm sure
something like iwconfig would be fine to configure your card with.

When the card goes down, it normally looses it's association to the
access point anyway, and you need to start the assocaition and
authentication over completely. At that point, it's no longer
guaranteed that you could ever get a previous IP address back.

What does downing an ethernet device do? It clears out routes
associated with that device, and clears assigned addresses (I think?).
Wireless is and should not be any different here. When you bring the
device back up, you need to go through some amount of renegotiation
anyway.

> And third point just coming to my mind:
> how is changing the mac address of the card supposed to work? Chaning it
> through ifconfig only works when the interface is downed, so the newly
> wanted mac address has to be saved somewhere before the interface is
> reenabled and reinitialized on the next "ifconfig up".
> (And I think it is an absolute requirement that NO packet with the
> old/default mac address may be sent into the air whatsoever)

That's how it should work. If you want to change the MAC address, the
card shouldn't probably be down.

Dan

2006-12-21 18:30:09

by Valdis Klētnieks

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Wed, 20 Dec 2006 22:06:51 EST, Dan Williams said:
> It's also complicated because some switches are supposed to rfkill both
> an 802.11 module _and_ a bluetooth module at the same time, or I guess
> some laptops may even have one rfkill switch for each wireless device.

On my Dell D820, it's bios-selectable if the switch is enabled, or if
it controls just the 802.11 card, or 802.11 and bluetooth, or just bluetooth,
or 802.11 and mobile broadband, or ...

This way lies madness. :)

(Oddest part - said bios config screen offers the choices for bluetooth
and mobile broadband even though the hardware config doesn't include it. ;)


Attachments:
(No filename) (226.00 B)

2006-12-22 01:03:55

by Herbert Xu

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Matthew Garrett <[email protected]> wrote:
>
> In terms of what I've seen on vaguely modern hardware, I'd guess at
> e1000 and sky2 as the top ones. b44 is still common in cheaper hardware,
> with via-rhine appearing at the very low end. I'll try to grep through
> our hardware database results to get a stronger idea about percentages.

The Sony laptop that I bought a year ago still has an e100 chipset so
that's probably worth fixing too.

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2006-12-22 01:35:21

by Matt Domsch

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

On Thu, Dec 21, 2006 at 01:27:55PM -0500, [email protected] wrote:
> On Wed, 20 Dec 2006 22:06:51 EST, Dan Williams said:
> > It's also complicated because some switches are supposed to rfkill both
> > an 802.11 module _and_ a bluetooth module at the same time, or I guess
> > some laptops may even have one rfkill switch for each wireless device.
>
> On my Dell D820, it's bios-selectable if the switch is enabled, or if
> it controls just the 802.11 card, or 802.11 and bluetooth, or just bluetooth,
> or 802.11 and mobile broadband, or ...
>
> This way lies madness. :)
>
> (Oddest part - said bios config screen offers the choices for bluetooth
> and mobile broadband even though the hardware config doesn't include it. ;)

In this case changing the UI based on presence (and thus the printed
docs etc.) winds up being difficult. Think of it as an embedded
advertisement - you too could have bluetooth and mobile broadband... :-)

-Matt

--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & http://www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

2006-12-23 09:37:15

by Pavel Machek

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

Hi!

> > which userspace is using this btw?
>
> Ubuntu uses it to disable wireless hardware under certain circumstances.
> I believe that Suse's powernowd uses it to power down wired ethernet
> hardware when it's not in use.

I flamed seife for this. It was always broken for 20%-or-so of
hardware. It is _not_ simple to fix.
Pavel
--
Thanks for all the (sleeping) penguins.

2006-12-23 09:37:53

by Pavel Machek

[permalink] [raw]
Subject: Re: Network drivers that don't suspend on interface down

Hi!

> > about your driver list;
> > do you have an idea of what the top 5 relevant ones would be?
> > I'd be surprised if the top 5 together had less than 95% market share,
> > so if we fix those we'd be mostly done already.
>
> In terms of what I've seen on vaguely modern hardware, I'd guess at
> e1000 and sky2 as the top ones. b44 is still common in cheaper hardware,

e1000 already powersaves when cable is not plugged in. Difference is
~0.5W, IIRC.
Pavel
--
Thanks for all the (sleeping) penguins.

2006-12-23 09:37:56

by Pavel Machek

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

Hi!

> > The existence of the power/state interface wasn't a bug - it was a
> > deliberate decision to add it. It's the only reason the
> > dpm_runtime_suspend() interface exists.

Actually, if we noticed power/state during PM framework review, it
would have been killed. It is just way too ugly.

> > > In contrast, the /sys/devices/.../power/state API has never had many
> > > users beyond developers trying to test their drivers (without taking
> > > the whole system into a low power state, which probably didn't work
> > > in any case), and has *always* been problematic. And the change you
> > > object to doesn't "break" anything fundamental, either. Everything
> > > still works.
> >
> > It's used on every Ubuntu and Suse system,
>
> Odd how the relevant Suse developers didn't mention any issues with
> those files going away, any of the times problems with them were
> discussed on the PM list. Also, I have a Suse system that doesn't
> use those files for anything ... maybe only newer release use it.

Not on *every* suse system. power/state is known to oops kernels, so
it is only enabled when user explicitely asks for 'dangerous aggresive
experimental power saving' or something like that.
--
Thanks for all the (sleeping) penguins.

2006-12-23 09:37:56

by Pavel Machek

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

Hi!

> > That's a workable approach to resolving the underlying problem in the
> > long term. In the short term, notice the system still works correctly
> > if you don't try writing those files.
>
> Well, except I'm now burning an extra couple of watts of power. I
> consider that pretty broken.

Couple of watts is not that bad, considering usb still eats 4W more
than it should.

> > I'd not be keen on reverting Linus' patch [1] myself, even though few
> > drivers have started to use that mechanism yet; that would be a step
> > backwards, and would perpetuate users of that broken sysfs file.
>
> I'm sorry, which bit of "Don't break userspace API without adequate
> prior warning and with a workable replacement" is difficult to
> understand?

It should not break any userspace... but you do not get the power
savings any more. Sorry. This kind of powersaving is not available on
recent kernels.


Right fix is to extend wifi stack... and have ifconfig wlan0
powerdown, or something like that.
Pavel
--
Thanks for all the (sleeping) penguins.

2006-12-23 14:19:26

by Stefan Seyfried

[permalink] [raw]
Subject: Re: Changes to sysfs PM layer break userspace

On Fri, Dec 22, 2006 at 08:44:01PM +0000, Pavel Machek wrote:
> Hi!
>
> > > which userspace is using this btw?
> >
> > Ubuntu uses it to disable wireless hardware under certain circumstances.
> > I believe that Suse's powernowd uses it to power down wired ethernet
> > hardware when it's not in use.

Powersaved had implemented this, but it was always declared an experimental
feature and AFAIK is gone since quite some time.

> I flamed seife for this. It was always broken for 20%-or-so of
> hardware. It is _not_ simple to fix.

It was an experimental feature in the words sense:
For experimentation. I never accepted any bugreports for that but told
the reporters to go away :-)
--
Stefan Seyfried
QA / R&D Team Mobile Devices | "Any ideas, John?"
SUSE LINUX Products GmbH, N?rnberg | "Well, surrounding them's out."

2006-12-24 07:02:10

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Friday 22 December 2006 1:09 pm, Pavel Machek wrote:
> Actually, if we noticed power/state during PM framework review, it
> would have been killed. It is just way too ugly.
>
> > > > In contrast, the /sys/devices/.../power/state API has never had many
> > > > users beyond developers trying to test their drivers ...
> > >
> > > It's used on every Ubuntu and Suse system,
> >
> > Odd how the relevant Suse developers didn't mention any issues with
> > those files going away, any of the times problems with them were
> > discussed on the PM list. Also, I have a Suse system that doesn't
> > use those files for anything ... maybe only newer release use it.
>
> Not on *every* suse system. power/state is known to oops kernels, so
> it is only enabled when user explicitely asks for 'dangerous aggresive
> experimental power saving' or something like that.

So exactly what tool on Ubuntu uses this? Without any "dangerous!
aggressive! experimental!" read-lights-siren-alarms-ringing alert level?


Seems to me anyone really desperate to put PCI devices into a low
power mode, without driver support at the "ifdown" level, would be
able just "rmmod driver; setpci". Without risking software bugs.

2006-12-28 13:22:56

by Alan

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

> Seems to me anyone really desperate to put PCI devices into a low
> power mode, without driver support at the "ifdown" level, would be
> able just "rmmod driver; setpci".

Incorrect for very obvious reasons - there may be two devices driven by
the same driver one up and one down.

Alan

2006-12-28 16:05:00

by Arjan van de Ven

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Thu, 2006-12-28 at 13:31 +0000, Alan wrote:
> > Seems to me anyone really desperate to put PCI devices into a low
> > power mode, without driver support at the "ifdown" level, would be
> > able just "rmmod driver; setpci".
>
> Incorrect for very obvious reasons - there may be two devices driven by
> the same driver one up and one down.

btw this same "incorrect" applies to the sysfs method, that also does a
more or less uncontrolled/uncoordinated power state switch.

All the more reason to have the "normal" device interfaces do the right
thing, so that the kernel has a standing chance to coordinate it
properly.

--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org

2006-12-29 05:30:18

by David Brownell

[permalink] [raw]
Subject: Re: Changes to PM layer break userspace

On Thursday 28 December 2006 5:31 am, Alan wrote:
> > Seems to me anyone really desperate to put PCI devices into a low
> > power mode, without driver support at the "ifdown" level, would be
> > able just "rmmod driver; setpci".
>
> Incorrect for very obvious reasons - there may be two devices driven by
> the same driver one up and one down.

Let me emphasize "desperate". ;)

The examples given were all cases where that didn't seem to be an issue.

But agreed, the best approach is really to make devices not in active
use (i.e. before "ifup", after "ifdown" ... maybe even whenever no
driver is bound to the device) stay in low power states.

- Dave

2007-01-25 05:00:30

by Matthew Garrett

[permalink] [raw]
Subject: [PATCH] Fix /sys/device/.../power/state regression

In 2.6.19, support for splitting driver suspend and resume callbacks
into interrupt and non-interrupt contexts was added. Unfortunately, this
broke /sys/device/.../power/state support for all devices. In the long
run, this should be obsoleted by power management support in the
individual drivers - however, in the case of network drivers (for
example), currently only three drivers implement any sort of useful
run-time power management.

This patch allows the bus driver to check whether a specific driver
requires the split. If not, the 2.6.18 functionality is restored. It
also alters feature-removals.txt to note that the deprecated
functionality should not be removed until a replacement actually exists.

Signed-off-by: Matthew Garrett <[email protected]>

--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -9,7 +9,8 @@ be removed from this file.
What: /sys/devices/.../power/state
dev->power.power_state
dpm_runtime_{suspend,resume)()
-When: July 2007
+ bus->pm_has_noirq_stage()
+When: Once alternative functionality has been implemented
Why: Broken design for runtime control over driver power states, confusing
driver-internal runtime power management with: mechanisms to support
system-wide sleep state transitions; event codes that distinguish
diff --git a/Documentation/power/devices.txt b/Documentation/power/devices.txt
index d0e79d5..345cca4 100644
--- a/Documentation/power/devices.txt
+++ b/Documentation/power/devices.txt
@@ -79,6 +79,7 @@ struct bus_type {

int (*resume_early)(struct device *dev);
int (*resume)(struct device *dev);
+ int (*pm_has_noirq_stage)(struct device *dev);
};

Bus drivers implement those methods as appropriate for the hardware and
@@ -236,6 +237,10 @@ The phases are seen by driver notifications issued in this order:
may stay partly usable until this late. This "late" call may also
help when coping with hardware that behaves badly.

+ If a bus implements the suspend_late method, it must also provide a
+ pm_has_noirq_stage function in order to determine whether devices
+ may be suspended during runtime.
+
The pm_message_t parameter is currently used to refine those semantics
(described later).

@@ -348,7 +353,9 @@ The phases are seen by driver notifications issued in this order:
won't be supported on busses that require IRQs in order to
interact with devices.

- This reverses the effects of bus.suspend_late().
+ This reverses the effects of bus.suspend_late(). As with suspend_late,
+ if a bus implements this function it must provide a pm_has_noirq_stage
+ function.

2 bus.resume(dev) is called next. This may be morphed into a device
driver call with bus-specific parameters; implementations may sleep.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index f9c903b..6bf1218 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -597,6 +597,16 @@ static int platform_resume(struct device * dev)
return ret;
}

+static int platform_pm_has_noirq_stage(struct device * dev)
+{
+ int ret = 0;
+ struct platform_driver *drv = to_platform_driver(dev->driver);
+
+ if (dev->driver && (drv->resume_early || drv->suspend_late))
+ ret = 1;
+ return ret;
+}
+
struct bus_type platform_bus_type = {
.name = "platform",
.dev_attrs = platform_dev_attrs,
@@ -606,6 +616,7 @@ struct bus_type platform_bus_type = {
.suspend_late = platform_suspend_late,
.resume_early = platform_resume_early,
.resume = platform_resume,
+ .pm_has_noirq_stage = platform_pm_has_noirq_stage,
};
EXPORT_SYMBOL_GPL(platform_bus_type);

diff --git a/drivers/base/power/sysfs.c b/drivers/base/power/sysfs.c
index 2d47517..c4ce060 100644
--- a/drivers/base/power/sysfs.c
+++ b/drivers/base/power/sysfs.c
@@ -46,7 +46,8 @@ static ssize_t state_store(struct device * dev, struct device_attribute *attr, c
int error = -EINVAL;

/* disallow incomplete suspend sequences */
- if (dev->bus && (dev->bus->suspend_late || dev->bus->resume_early))
+ if (dev->bus && dev->bus->pm_has_noirq_stage
+ && dev->bus->pm_has_noirq_stage(dev))
return error;

state.event = PM_EVENT_SUSPEND;
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index e5ae3a0..c0e4e7a 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -351,6 +351,17 @@ static int pci_device_resume(struct device * dev)
return error;
}

+static int pci_device_pm_has_noirq_stage(struct device * dev)
+{
+ int error = 0;
+ struct pci_dev * pci_dev = to_pci_dev(dev);
+ struct pci_driver * drv = pci_dev->driver;
+
+ if (drv && (drv->resume_early || drv->suspend_late))
+ error = 1;
+ return error;
+}
+
static int pci_device_resume_early(struct device * dev)
{
int error = 0;
@@ -569,6 +580,7 @@ struct bus_type pci_bus_type = {
.suspend_late = pci_device_suspend_late,
.resume_early = pci_device_resume_early,
.resume = pci_device_resume,
+ .pm_has_noirq_stage = pci_device_pm_has_noirq_stage,
.shutdown = pci_device_shutdown,
.dev_attrs = pci_dev_attrs,
};
diff --git a/include/linux/device.h b/include/linux/device.h
index 49ab53c..1c663c4 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -59,6 +59,7 @@ struct bus_type {
int (*suspend)(struct device * dev, pm_message_t state);
int (*suspend_late)(struct device * dev, pm_message_t state);
int (*resume_early)(struct device * dev);
+ int (*pm_has_noirq_stage)(struct device * dev);
int (*resume)(struct device * dev);
};

--
Matthew Garrett | [email protected]

2007-01-26 19:59:18

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Thu, 25 Jan 2007 05:00:09 +0000
Matthew Garrett <[email protected]> wrote:

> In 2.6.19, support for splitting driver suspend and resume callbacks
> into interrupt and non-interrupt contexts was added. Unfortunately, this
> broke /sys/device/.../power/state support for all devices. In the long
> run, this should be obsoleted by power management support in the
> individual drivers - however, in the case of network drivers (for
> example), currently only three drivers implement any sort of useful
> run-time power management.
>
> This patch allows the bus driver to check whether a specific driver
> requires the split. If not, the 2.6.18 functionality is restored. It
> also alters feature-removals.txt to note that the deprecated
> functionality should not be removed until a replacement actually exists.

That sounds like material for 2.6.20 as well as for 2.6.19.x.

2007-01-26 20:42:41

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Thu, Jan 25, 2007 at 05:00:09AM +0000, Matthew Garrett wrote:
> In 2.6.19, support for splitting driver suspend and resume callbacks
> into interrupt and non-interrupt contexts was added. Unfortunately, this
> broke /sys/device/.../power/state support for all devices. In the long
> run, this should be obsoleted by power management support in the
> individual drivers - however, in the case of network drivers (for
> example), currently only three drivers implement any sort of useful
> run-time power management.
>
> This patch allows the bus driver to check whether a specific driver
> requires the split. If not, the 2.6.18 functionality is restored. It
> also alters feature-removals.txt to note that the deprecated
> functionality should not be removed until a replacement actually exists.

Ick, no.

As stated before, it was broken in the past, and no userspace tools use
it because of that. So it was disabled.

Or am I misstating that long thread? David, your thoughts?

thanks,

greg k-h

2007-01-26 20:43:22

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, Jan 26, 2007 at 11:59:05AM -0800, Andrew Morton wrote:
> On Thu, 25 Jan 2007 05:00:09 +0000
> Matthew Garrett <[email protected]> wrote:
>
> > In 2.6.19, support for splitting driver suspend and resume callbacks
> > into interrupt and non-interrupt contexts was added. Unfortunately, this
> > broke /sys/device/.../power/state support for all devices. In the long
> > run, this should be obsoleted by power management support in the
> > individual drivers - however, in the case of network drivers (for
> > example), currently only three drivers implement any sort of useful
> > run-time power management.
> >
> > This patch allows the bus driver to check whether a specific driver
> > requires the split. If not, the 2.6.18 functionality is restored. It
> > also alters feature-removals.txt to note that the deprecated
> > functionality should not be removed until a replacement actually exists.
>
> That sounds like material for 2.6.20 as well as for 2.6.19.x.

No, it's not stable material, as drivers would have to be modified to
support it, and that is adding new stuff. See my other comment about
why this was changed because it was broken...

thanks,

greg k-h

2007-01-26 21:56:47

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Friday 26 January 2007 12:41 pm, Greg KH wrote:
> On Thu, Jan 25, 2007 at 05:00:09AM +0000, Matthew Garrett wrote:

> > This patch allows the bus driver to check whether a specific driver
> > requires the split. If not, the 2.6.18 functionality is restored. It
> > also alters feature-removals.txt to note that the deprecated
> > functionality should not be removed until a replacement actually exists.
>
> Ick, no.

Especially changing the feature-removal.txt file ...


> As stated before, it was broken in the past, and no userspace tools use
> it because of that. So it was disabled.

Matthew's problem seems to be that he (?) wrote such a tool to
power down certain wireless adapters. We did ask around, a while
back, to see if anyone was attempting to use that broken mechanism.

The only real user was pcmcia (which no longer does so), although
it was sometimes used for developer testing (with extreme caution).

Evidently that tool was written after we asked, but before anyone
got around to listing that mechanism as "gonna go"; or maybe it just
never turned up when we looked for users.


It's certainly NOT been news in the PM community, for the past
several years, that the /sys/devices/.../power/state files are
conceptually broken, so userspace tools should NOT use them.

And since the semantics of that file have never been stable even
for PCI devices (much less other busses) or even documented, I
can't have much sympathy towards software using it.

Anyone depending on unspecified behavior (a.k.a. "bugs") is
walking on thin ice. Anyone trying to use behavior that's been
discussed (among the relevant maintainers) as broken is actively
jumping up and down on that ice. Even though there was a gap
(sorry!) between the recognition of that mechanism as broken,
and it actually getting scheduled for removal.


> Or am I misstating that long thread? David, your thoughts?

My recollection of that thread was that *NOBODY* claimed that a
replacement of that (broken-by-design) mechanism would ever exist.

However, there was some interesting discussion of just how to fix
the wireless adapters. There appeared to be consensus that the
right solution involved "ifdown wlan0" (or whatever) ought to power
down that adapter, in the way Matthew wanted.

I thought the resolution was that fixing a few of those drivers
should solve the problem Matthew needed resolved, and that in
the meanwhile "rmmod drivername" should suffice. There also seemed
to be agreement that power management for wireless devices needed
more work; there might need to be a state between "down/off" and
"configured and able to talk IP".

- Dave

2007-01-26 22:20:29

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, Jan 26, 2007 at 01:56:41PM -0800, David Brownell wrote:
> > Or am I misstating that long thread? David, your thoughts?
>
> My recollection of that thread was that *NOBODY* claimed that a
> replacement of that (broken-by-design) mechanism would ever exist.
>
> However, there was some interesting discussion of just how to fix
> the wireless adapters. There appeared to be consensus that the
> right solution involved "ifdown wlan0" (or whatever) ought to power
> down that adapter, in the way Matthew wanted.
>
> I thought the resolution was that fixing a few of those drivers
> should solve the problem Matthew needed resolved, and that in
> the meanwhile "rmmod drivername" should suffice. There also seemed
> to be agreement that power management for wireless devices needed
> more work; there might need to be a state between "down/off" and
> "configured and able to talk IP".

Ok, that sounds like what I thought the thread resolved too.

Andrew, please drop this patch from your queue, and I'll make sure the
stable tree doesn't get it either.

thanks,

greg k-h

2007-01-26 23:15:39

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, Jan 26, 2007 at 01:56:41PM -0800, David Brownell wrote:

> I thought the resolution was that fixing a few of those drivers
> should solve the problem Matthew needed resolved, and that in
> the meanwhile "rmmod drivername" should suffice. There also seemed
> to be agreement that power management for wireless devices needed
> more work; there might need to be a state between "down/off" and
> "configured and able to talk IP".

It's certainly the case that fixing those drivers would result in a
better long-term situation - however, nobody currently seems to have any
interest in doing so, and I've only got access to a subset of the
hardware and approximately none of the documentation. It's likely that
I'll end up spending time on that, but right now I'm afraid that other
things are taking a higher priority.

I'm not sure what you mean by using rmmod instead. Most drivers don't
explicitly set the power state when unbinding, and I can't see anywhere
in the generic PCI code that will do it.

As I've said before, I think it's unreasonable to cripple interfaces for
(mostly) aesthetic reasons without ensuring that equivalent
functionality already exists. This patch restores useful functionality
without breaking the extra sanity checks that you've added. I appreciate
that it's not an interface that you want to support in the long term
(well, even the short term...), and that suggesting its removal provides
a useful incentive to fix things properly in the long term. But it would
be nice if we didn't make it impossible to do this until the right thing
is implemented.

--
Matthew Garrett | [email protected]

2007-01-27 00:43:35

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Friday 26 January 2007 3:15 pm, Matthew Garrett wrote:
> On Fri, Jan 26, 2007 at 01:56:41PM -0800, David Brownell wrote:
>
> > I thought the resolution was that fixing a few of those drivers
> > should solve the problem Matthew needed resolved, and that in
> > the meanwhile "rmmod drivername" should suffice. There also seemed
> > to be agreement that power management for wireless devices needed
> > more work; there might need to be a state between "down/off" and
> > "configured and able to talk IP".
>
> It's certainly the case that fixing those drivers would result in a
> better long-term situation - however, nobody currently seems to have any
> interest in doing so...

And the way these things work, unfortunately, is that merging your patch
would ensure nobody ever gets such interest. Removing that "state" file
(and its bogus infrastructure) has already taken a few years too long.


> I'm not sure what you mean by using rmmod instead. Most drivers don't
> explicitly set the power state when unbinding, and I can't see anywhere
> in the generic PCI code that will do it.

There are broadly two things happening in a driver suspend()
method:

- One of them *always* happens on rmmod: stopping all driver
activity. That eliminates the dynamic power usage, which as
a rule is what consumes most of the power.

- And issuing a pci_set_power_state() call. That eliminates
some more power usage, but as a rule it's not as much.

Plus some drivers will also enable the device as system wakeup
source. That behaves poorly on PC Linux (ACPI doesn't handle it
well yet), but on more power-aware Linux systems it's important
as a way to let the system stay in low power states most of the
time, without sacrificing system response to external actions.

If you measure and find that setting the power state matters,
making those drivers do that on rmmod should be easy. (And IMO
it would be worth trying to make PCI use those states by default
for driverless devices. Different issue though.)


> As I've said before, I think it's unreasonable to cripple interfaces for
> (mostly) aesthetic reasons without ensuring that equivalent
> functionality already exists.

I don't recall anyone raising aesthetic concerns. And bug-equivalence
has never been a goal of Linux.


> This patch restores useful functionality
> without breaking the extra sanity checks that you've added. I appreciate
> that it's not an interface that you want to support in the long term
> (well, even the short term...),

You imply that it _was_ once supported, which is not true. Like any
other bug (in this case "design bug"), it was there and could be abused.
And like some other bugs, fixing it can trigger complaints from (ab)users.

It's been several years now that this interface has been well recognized as
trouble. Years in which all _other_ users went and did the Right Thing:
they stopped using it, or never started.

If you want sympathy for an application that took the time during which
that mechanism was getting phased out, and then decided to phase *IN* a
new use ... sorry, I'm not constitutionally able to give sympathy. I can
maybe offer sympathy that you didn't know it was being phased out (since
the decision to do that ISTR predated the "feature removal" schedule, with
its additional publicity), but not much more than that.

- Dave

2007-01-27 00:57:09

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, 26 Jan 2007 16:42:56 -0800
David Brownell <[email protected]> wrote:

> On Friday 26 January 2007 3:15 pm, Matthew Garrett wrote:
> > On Fri, Jan 26, 2007 at 01:56:41PM -0800, David Brownell wrote:
> >
> > > I thought the resolution was that fixing a few of those drivers
> > > should solve the problem Matthew needed resolved, and that in
> > > the meanwhile "rmmod drivername" should suffice. There also seemed
> > > to be agreement that power management for wireless devices needed
> > > more work; there might need to be a state between "down/off" and
> > > "configured and able to talk IP".
> >
> > It's certainly the case that fixing those drivers would result in a
> > better long-term situation - however, nobody currently seems to have any
> > interest in doing so...
>
> And the way these things work, unfortunately, is that merging your patch
> would ensure nobody ever gets such interest. Removing that "state" file
> (and its bogus infrastructure) has already taken a few years too long.
>

No, we shouldn't just break stuff for our users in the hope that said
breakage will force some other developer to come in and fix things later.

We should revert the breakage-causing patch, with the expectation that its
submitter will ensure that all prerequisites are in place before it is
reapplied.

>
> > As I've said before, I think it's unreasonable to cripple interfaces for
> > (mostly) aesthetic reasons without ensuring that equivalent
> > functionality already exists.
>
> I don't recall anyone raising aesthetic concerns. And bug-equivalence
> has never been a goal of Linux.
>

Not breaking things for end-users is a goal. Prime directive, indeed.

>
> > This patch restores useful functionality
> > without breaking the extra sanity checks that you've added. I appreciate
> > that it's not an interface that you want to support in the long term
> > (well, even the short term...),
>
> You imply that it _was_ once supported, which is not true. Like any
> other bug (in this case "design bug"), it was there and could be abused.
> And like some other bugs, fixing it can trigger complaints from (ab)users.

Could someone please explain in easy-to-understand terms what the
real-world impact of this bug is upon our users? How many are affected,
and under what circumstances, and with what effects?

2007-01-27 01:20:06

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, Jan 26, 2007 at 04:42:56PM -0800, David Brownell wrote:
> On Friday 26 January 2007 3:15 pm, Matthew Garrett wrote:
> > It's certainly the case that fixing those drivers would result in a
> > better long-term situation - however, nobody currently seems to have any
> > interest in doing so...
>
> And the way these things work, unfortunately, is that merging your patch
> would ensure nobody ever gets such interest. Removing that "state" file
> (and its bogus infrastructure) has already taken a few years too long.

I'd argue that the onus is on those who wish to remove the interface to
ensure that equivalent functionality exists first. It's not as if it's
especially /hard/, just tedious and requires access to a large quantity
of hardware.

> > This patch restores useful functionality
> > without breaking the extra sanity checks that you've added. I appreciate
> > that it's not an interface that you want to support in the long term
> > (well, even the short term...),
>
> You imply that it _was_ once supported, which is not true. Like any
> other bug (in this case "design bug"), it was there and could be abused.
> And like some other bugs, fixing it can trigger complaints from (ab)users.
>
> It's been several years now that this interface has been well recognized as
> trouble. Years in which all _other_ users went and did the Right Thing:
> they stopped using it, or never started.

Now this is what throws me somewhat. Last May, you argued strongly in
favour of keeping the interface:

"Which IMO makes removing this a Bad Thing. It needs to have some
kind of replacement in place before the "magic numbers" go away."

(http://lists.osdl.org/pipermail/linux-pm/2006-May/002376.html)

"How could we schedule the removal before we have even had a couple
releases to fine-tune its replacement, and verify that the main issues
with the current thing are fully resolved?"

(http://lists.osdl.org/pipermail/linux-pm/2006-May/002382.html)

"> Maybe date will need to be shifted...

How about "one year after its replacement is ready"?"

(http://lists.osdl.org/pipermail/linux-pm/2006-May/002384.html)

I don't think there /has/ been any strong indication from the PM
community that this interface was going to go away in the near future -
the last time somebody tried to remove it (rather than incidentally
breaking it), there were complaints and the matter seemed to drop.

Now, the current situation is that the interface is scheduled for
removal in July (not something I especially agree with, and something
you seemed to disagree with less than a year ago). However, the changes
in 2.6.19 effectively crippled it. The current situation is that the
only widely-used bus where this interface still works is USB, so to a
large extent the interface has already been removed. If the interface is
that broken, then it should just be removed now - if it's going to be
carried around until July, it should be fixed so that it works as well
as it did in 2.6.18 and below. The patch does that.

--
Matthew Garrett | [email protected]

2007-01-27 01:26:40

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, Jan 26, 2007 at 12:42:26PM -0800, Greg KH wrote:

> No, it's not stable material, as drivers would have to be modified to
> support it, and that is adding new stuff. See my other comment about
> why this was changed because it was broken...

Which drivers? The current code simply bails if the bus (not the device)
supports the late_suspend method. In the PCI core, that function simply
calls the device's late_suspend method and does nothing else. My patch
simply alters the check so that the bus can veto the request if the
driver has such a method, and allow it to pass if it doesn't. To
summarise:

2.6.18 situation:

/sys/devices/.../power/state call will succeed for all PCI devices, even
if the device suspend method must be called with interrupts disabled

2.6.19 situation:

/sys/devices/.../power/state call will fail for all PCI devices, even if
the PCI bus's suspend_late function is effectively a noop for that
device

2.6.19+my patch situation:

/sys/devices/.../power/state call will fail for PCI devices that
implement a suspend_late method, and succeed for other PCI devices

Surely the latter of these is the closest to the expected behaviour?
--
Matthew Garrett | [email protected]

2007-01-27 02:40:22

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

> > > It's certainly the case that fixing those drivers would result in a
> > > better long-term situation - however, nobody currently seems to have any
> > > interest in doing so...
> >
> > And the way these things work, unfortunately, is that merging your patch
> > would ensure nobody ever gets such interest. Removing that "state" file
> > (and its bogus infrastructure) has already taken a few years too long.
>
> No, we shouldn't just break stuff for our users in the hope that said
> breakage will force some other developer to come in and fix things later.

That's not what happened though. From my perspective, what I see is some
software which chose to START using an interface that's been on its way
out for a few years now. (And deservedly so; nobody has been able to
demonstrate that it's not been broken-as-designed since day one.) That
broke the process which had been working to finally get rid of that broken
interface.

Don't the people writing such software have responsibilities too? Like,
to not start using undocumented and unstable interfaces that have been
acknowledged as broken and on-the-way-out? Joining some of the Linux PM
mailing list discussions over the past few years would have been good too,
especially ones where userspace-driven PM was the topic. (I kept asking
for real world examples, and nobody had any. Which, among other things,
suggests that tool Matthew is concerned with isn't at all well known...)

(Plus, keep in mind that this is not "breakage" in any conventional
sense of something not working. There's no oopsing or slowdown.)


> We should revert the breakage-causing patch, with the expectation that its
> submitter will ensure that all prerequisites are in place before it is
> reapplied.

Well, Linus is the one who submitted the original patch to add the
late_suspend()/early_resume() mechanism ... and there have been a few
other patches fixing issues turned up by that patch. Matthew objected
to one side effect of that. The "bypass layers" part of his patch is
as clean a workaround as can be had, I suppose.


> > > As I've said before, I think it's unreasonable to cripple interfaces for
> > > (mostly) aesthetic reasons without ensuring that equivalent
> > > functionality already exists.
> >
> > I don't recall anyone raising aesthetic concerns. And bug-equivalence
> > has never been a goal of Linux.
> >
>
> Not breaking things for end-users is a goal. Prime directive, indeed.

With limitations. Any time a bug gets fixed, it will break software
that relies on that bug. Like, hmm, this case...

We **STILL** haven't gotten solid information on the software that
broke. From what I've heard on this list it would seem to be used
on certain Ubuntu systems, but even the _name_ of the utility has
never been mentioned. When we asked for more details, what came
back was the information that the problem was really a handful of
wireless drivers.


> > > This patch restores useful functionality
> > > without breaking the extra sanity checks that you've added. I appreciate
> > > that it's not an interface that you want to support in the long term
> > > (well, even the short term...),
> >
> > You imply that it _was_ once supported, which is not true. Like any
> > other bug (in this case "design bug"), it was there and could be abused.
> > And like some other bugs, fixing it can trigger complaints from (ab)users.
>
> Could someone please explain in easy-to-understand terms what the
> real-world impact of this bug is upon our users? How many are affected,
> and under what circumstances, and with what effects?

>From what I can see, a small subset of Ubuntu users will notice that
battery life on some laptops isn't as long as it might be. Most of
that issue can be resolved by "rmmod $WLAN_DRIVER", a workaround with
a long and successful history in Linux.

- Dave

2007-01-27 03:02:41

by David Brownell

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Friday 26 January 2007 5:19 pm, Matthew Garrett wrote:
> On Fri, Jan 26, 2007 at 04:42:56PM -0800, David Brownell wrote:
> > On Friday 26 January 2007 3:15 pm, Matthew Garrett wrote:
> > > It's certainly the case that fixing those drivers would result in a
> > > better long-term situation - however, nobody currently seems to have any
> > > interest in doing so...
> >
> > And the way these things work, unfortunately, is that merging your patch
> > would ensure nobody ever gets such interest. Removing that "state" file
> > (and its bogus infrastructure) has already taken a few years too long.
>
> I'd argue that the onus is on those who wish to remove the interface to
> ensure that equivalent functionality exists first.

Are you now arguing that "rmmod $DRIVER" doesn't suffice for what you
were wanting to do? If so, why? What's the delta in power usage?


> Now this is what throws me somewhat. Last May, you argued strongly in
> favour of keeping the interface:
>
> "Which IMO makes removing this a Bad Thing. It needs to have some
> kind of replacement in place before the "magic numbers" go away."
>
> (http://lists.osdl.org/pipermail/linux-pm/2006-May/002376.html)

Specifically to support driver testing. Recall that such testing was
at that time the only known quasi-viable use of that interface. (And
despite your pushback, it still seems that way to me...)

I've changed my mind about that; it's just as easy to whip up custom test
logic, and in any case the stuff that I most needed to test (wakeup
events) can't be tested like that. Bad testing infrastructure doesn't
really do anyone favors, anyway; too much time spent with workarounds,
many of which cover up the bugs you're trying to uncover by testing.

- Dave

2007-01-27 17:05:26

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Hi!

> In 2.6.19, support for splitting driver suspend and resume callbacks
> into interrupt and non-interrupt contexts was added. Unfortunately, this
> broke /sys/device/.../power/state support for all devices. In the long
> run, this should be obsoleted by power management support in the
> individual drivers - however, in the case of network drivers (for
> example), currently only three drivers implement any sort of useful
> run-time power management.

Well... solution seems to be 'implement useful pm for more drivers'
not 'discourage people from doing so by re-enabling broken interface'.

> --- a/Documentation/feature-removal-schedule.txt
> +++ b/Documentation/feature-removal-schedule.txt
> @@ -9,7 +9,8 @@ be removed from this file.
> What: /sys/devices/.../power/state
> dev->power.power_state
> dpm_runtime_{suspend,resume)()
> -When: July 2007
> + bus->pm_has_noirq_stage()
> +When: Once alternative functionality has been implemented

.../power/state never worked properly. You have been warned and it is
going to be removed. It oopses kernels... while 'only' providing power
savings. If you are interested in power savings, please help doing
them right.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-01-27 17:19:25

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Hi!

> > This patch allows the bus driver to check whether a specific driver
> > requires the split. If not, the 2.6.18 functionality is restored. It
> > also alters feature-removals.txt to note that the deprecated
> > functionality should not be removed until a replacement actually exists.
>
> That sounds like material for 2.6.20 as well as for 2.6.19.x.

No. It re-enbles mechanism that never worked properly and that is
known to oops the kernels.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-01-27 17:42:29

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Hi!

> > > > I thought the resolution was that fixing a few of those drivers
> > > > should solve the problem Matthew needed resolved, and that in
> > > > the meanwhile "rmmod drivername" should suffice. There also seemed
> > > > to be agreement that power management for wireless devices needed
> > > > more work; there might need to be a state between "down/off" and
> > > > "configured and able to talk IP".
> > >
> > > It's certainly the case that fixing those drivers would result in a
> > > better long-term situation - however, nobody currently seems to have any
> > > interest in doing so...
> >
> > And the way these things work, unfortunately, is that merging your patch
> > would ensure nobody ever gets such interest. Removing that "state" file
> > (and its bogus infrastructure) has already taken a few years too long.
> >
>
> No, we shouldn't just break stuff for our users in the hope that said
> breakage will force some other developer to come in and fix things later.

We are not breaking anything. We just make power consumption go up for
_very_ small minority of our users... and we removed quite a few ways
to oops a kernel that way.

Drivers suspend/resume methods were not designed to be run at runtime;
if you want to re-enable that, you should audit the drivers before
reenable. And at that point, it should be easy to just do it properly.

> We should revert the breakage-causing patch, with the expectation that its
> submitter will ensure that all prerequisites are in place before it is
> reapplied.

Change breaking that was 'introduce suspend early to fix suspend on
mac mini', by Linus, IIRC. So no, it is not easy to revert this one.

> > You imply that it _was_ once supported, which is not true. Like any
> > other bug (in this case "design bug"), it was there and could be abused.
> > And like some other bugs, fixing it can trigger complaints from (ab)users.
>
> Could someone please explain in easy-to-understand terms what the
> real-world impact of this bug is upon our users? How many are affected,
> and under what circumstances, and with what effects?

Oops, if you echo 3 to wrong file, or if you are hit by race.

Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-01-27 22:42:33

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Sat, Jan 27, 2007 at 05:38:04PM +0000, Pavel Machek wrote:

> Change breaking that was 'introduce suspend early to fix suspend on
> mac mini', by Linus, IIRC. So no, it is not easy to revert this one.

But it's easy to fix it. Either drivers need suspend routines that are
called without interrupts enabled, or they don't. The current situation
is that the interface is broken regardless of which is the case - the
situation with the patch is that the interface only stops working for
drivers that need the suspend routine to be called with disabled
interrupts.

--
Matthew Garrett | [email protected]

2007-01-29 17:40:22

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Fri, 26 Jan 2007 19:02:37 -0800
David Brownell <[email protected]> wrote:

> On Friday 26 January 2007 5:19 pm, Matthew Garrett wrote:
> > On Fri, Jan 26, 2007 at 04:42:56PM -0800, David Brownell wrote:
> > > On Friday 26 January 2007 3:15 pm, Matthew Garrett wrote:
> > > > It's certainly the case that fixing those drivers would result in a
> > > > better long-term situation - however, nobody currently seems to have any
> > > > interest in doing so...
> > >
> > > And the way these things work, unfortunately, is that merging your patch
> > > would ensure nobody ever gets such interest. Removing that "state" file
> > > (and its bogus infrastructure) has already taken a few years too long.
> >
> > I'd argue that the onus is on those who wish to remove the interface to
> > ensure that equivalent functionality exists first.
>
> Are you now arguing that "rmmod $DRIVER" doesn't suffice for what you
> were wanting to do? If so, why? What's the delta in power usage?

For the case that started the discussion (wireless network devices).
The expected behavior is that the device remains in a low power state
until it enabled (set to up). If really smart a wired network device
can also stay in low power state until carrier is detected. There are
also other network device states (dormant, lower layer down), not currently
in use that could also be useful.

The point is that using the /sys/device/.../power/state file
is not the right way to handle network devices. Power usage should
correlate to device status, not be controlled differently.

--
Stephen Hemminger <[email protected]>

2007-01-30 18:43:25

by Éric Piel

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

01/27/2007 02:17 PM, Pavel Machek wrote/a écrit:
> Hi!
>
>> In 2.6.19, support for splitting driver suspend and resume callbacks
>> into interrupt and non-interrupt contexts was added. Unfortunately, this
>> broke /sys/device/.../power/state support for all devices. In the long
>> run, this should be obsoleted by power management support in the
>> individual drivers - however, in the case of network drivers (for
>> example), currently only three drivers implement any sort of useful
>> run-time power management.
>
> Well... solution seems to be 'implement useful pm for more drivers'
> not 'discourage people from doing so by re-enabling broken interface'.
>
>> --- a/Documentation/feature-removal-schedule.txt
>> +++ b/Documentation/feature-removal-schedule.txt
>> @@ -9,7 +9,8 @@ be removed from this file.
>> What: /sys/devices/.../power/state
>> dev->power.power_state
>> dpm_runtime_{suspend,resume)()
>> -When: July 2007
>> + bus->pm_has_noirq_stage()
>> +When: Once alternative functionality has been implemented
>
> .../power/state never worked properly. You have been warned and it is
> going to be removed. It oopses kernels... while 'only' providing power
> savings. If you are interested in power savings, please help doing
> them right.

Hi,
I realize that I'm arriving just far too late on this thread to bring
any meaning, so that's just for the info...

I have been using this interface for a _long_ time (before june 2005, cf
http://marc.theaimsgroup.com/?l=linux-usb-devel&m=111869558800526&w=2)
and for different things than power saving: I can turn off my (usb)
optical mouse when watching a movie or going to bed.

Is there any hardware independent alternative to turn off USB devices?

See you,
Eric

2007-01-30 18:47:06

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Am Dienstag, 30. Januar 2007 19:43 schrieb Eric Piel:

> I have been using this interface for a _long_ time (before june 2005, cf
> http://marc.theaimsgroup.com/?l=linux-usb-devel&m=111869558800526&w=2)
> and for different things than power saving: I can turn off my (usb)
> optical mouse when watching a movie or going to bed.
>
> Is there any hardware independent alternative to turn off USB devices?

Would you test a patch for autosuspension of hid devices?

Regards
Oliver

2007-01-30 19:01:36

by Éric Piel

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

01/30/2007 07:47 PM, Oliver Neukum wrote/a écrit:
> Am Dienstag, 30. Januar 2007 19:43 schrieb Eric Piel:
>
>> I have been using this interface for a _long_ time (before june 2005, cf
>> http://marc.theaimsgroup.com/?l=linux-usb-devel&m=111869558800526&w=2)
>> and for different things than power saving: I can turn off my (usb)
>> optical mouse when watching a movie or going to bed.
>>
>> Is there any hardware independent alternative to turn off USB devices?
>
> Would you test a patch for autosuspension of hid devices?
>
Sure, that sounds pretty interesting :-)

Eric

2007-01-31 10:35:08

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Hi!

> > Change breaking that was 'introduce suspend early to fix suspend on
> > mac mini', by Linus, IIRC. So no, it is not easy to revert this one.
>
> But it's easy to fix it. Either drivers need suspend routines that

Yes, that's right. It is easy to fix, but not as easy as "just revert".

> called without interrupts enabled, or they don't. The current situation
> is that the interface is broken regardless of which is the case - the
> situation with the patch is that the interface only stops working for
> drivers that need the suspend routine to be called with disabled
> interrupts.

...unfortunately that patch was pretty ugly, and it still does not
solve the "will it oops when I try to use it" part of problem.

(Thanks for your ipw2100 runtime pm work, btw).
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2007-01-31 10:56:10

by Jiri Kosina

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

On Tue, 30 Jan 2007, Oliver Neukum wrote:

> > Is there any hardware independent alternative to turn off USB devices?
> Would you test a patch for autosuspension of hid devices?

Hi Oliver,

do you have such patch already? I would also like to see it; if it looks
OK, I will push it to -mm through my tree for wider testing.

Thanks,

--
Jiri Kosina

2007-01-31 11:03:17

by Oliver Neukum

[permalink] [raw]
Subject: Re: [PATCH] Fix /sys/device/.../power/state regression

Am Mittwoch, 31. Januar 2007 11:55 schrieb Jiri Kosina:
> On Tue, 30 Jan 2007, Oliver Neukum wrote:
>
> > > Is there any hardware independent alternative to turn off USB devices?
> > Would you test a patch for autosuspension of hid devices?
>
> Hi Oliver,
>
> do you have such patch already? I would also like to see it; if it looks
> OK, I will push it to -mm through my tree for wider testing.

It started working ten minutes ago and is not in a nice state now.
But you'll get it today.

Regards
Oliver