2005-04-25 19:06:51

by Greg KH

[permalink] [raw]
Subject: [PATCH] PCI: Add pci shutdown ability

Well it seems that people are starting to want to hook the reboot
notifier, or the device shutdown facility in order to properly shutdown
pci drivers to make kexec work nicer.

So here's a patch for the PCI core that allows pci drivers to now just
add a "shutdown" notifier function that will be called when the system
is being shutdown. It happens just after the reboot notifier happens,
and it should happen in the proper device tree order, so everyone should
be happy.

Any objections to this patch?

thanks,

greg k-h
------

PCI: Add pci shutdown ability

Now pci drivers can know when the system is going down without having to
add a reboot notifier event.

Signed-off-by: Greg Kroah-Hartman <[email protected]>

--- gregkh-2.6.orig/include/linux/pci.h 2005-04-20 21:25:11.000000000 -0700
+++ gregkh-2.6/include/linux/pci.h 2005-04-25 11:54:20.000000000 -0700
@@ -671,6 +671,7 @@
int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
int (*resume) (struct pci_dev *dev); /* Device woken up */
int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
+ void (*shutdown) (struct pci_dev *dev);

struct device_driver driver;
struct pci_dynids dynids;
Index: gregkh-2.6/drivers/pci/pci-driver.c
===================================================================
--- gregkh-2.6.orig/drivers/pci/pci-driver.c 2005-04-06 11:47:47.000000000 -0700
+++ gregkh-2.6/drivers/pci/pci-driver.c 2005-04-25 12:02:12.000000000 -0700
@@ -318,6 +318,14 @@
return 0;
}

+static void pci_device_shutdown(struct device *dev)
+{
+ struct pci_dev *pci_dev = to_pci_dev(dev);
+ struct pci_driver *drv = pci_dev->driver;
+
+ if (drv && drv->shutdown)
+ drv->shutdown(pci_dev);
+}

#define kobj_to_pci_driver(obj) container_of(obj, struct device_driver, kobj)
#define attr_to_driver_attribute(obj) container_of(obj, struct driver_attribute, attr)
@@ -385,6 +393,7 @@
drv->driver.bus = &pci_bus_type;
drv->driver.probe = pci_device_probe;
drv->driver.remove = pci_device_remove;
+ drv->driver.shutdown = pci_device_shutdown,
drv->driver.owner = drv->owner;
drv->driver.kobj.ktype = &pci_driver_kobj_type;
pci_init_dynids(&drv->dynids);


2005-04-25 19:27:39

by Jeff Garzik

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Greg KH wrote:
> Well it seems that people are starting to want to hook the reboot
> notifier, or the device shutdown facility in order to properly shutdown
> pci drivers to make kexec work nicer.
>
> So here's a patch for the PCI core that allows pci drivers to now just
> add a "shutdown" notifier function that will be called when the system
> is being shutdown. It happens just after the reboot notifier happens,
> and it should happen in the proper device tree order, so everyone should
> be happy.
>
> Any objections to this patch?

Traditionally the proper place -has- been
* the reboot notifier
* the ->remove hook (hot unplug, and module remove)

which covers all the cases.

Add a ->shutdown hook is more of a hack. If you want to introduce this
facility in a systematic way, introduce a 'kexec reboot' option which
walks the device tree and shuts down hardware.

->shutdown is just a piecemeal, uncoordinated effort (uncoordinated in
the sense that driver shutdowns occur in an undefined order).

Jeff



2005-04-25 19:45:51

by Alexander Nyberg

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

> Well it seems that people are starting to want to hook the reboot
> notifier, or the device shutdown facility in order to properly shutdown
> pci drivers to make kexec work nicer.
>
> So here's a patch for the PCI core that allows pci drivers to now just
> add a "shutdown" notifier function that will be called when the system
> is being shutdown. It happens just after the reboot notifier happens,
> and it should happen in the proper device tree order, so everyone should
> be happy.
>
> Any objections to this patch?

Not sure what you mean by "make kexec work nicer" but if it is because
some devices don't work after a kexec I have some objections.
What about the kexec-on-panic?

In the end at least every storage device should work after a
kexec-on-panic or else there might be cases where we cannot get dumps of
what happened. My guess is that having access to the network might come
in handy after a kexec-on-panic as well.

So if this patch is because some devices don't work across kexec I don't
think this is a good idea because the same devices won't work after a
kexec-on-panic.

2005-04-25 20:08:36

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 03:23:09PM -0400, Jeff Garzik wrote:
> Greg KH wrote:
> >Well it seems that people are starting to want to hook the reboot
> >notifier, or the device shutdown facility in order to properly shutdown
> >pci drivers to make kexec work nicer.
> >
> >So here's a patch for the PCI core that allows pci drivers to now just
> >add a "shutdown" notifier function that will be called when the system
> >is being shutdown. It happens just after the reboot notifier happens,
> >and it should happen in the proper device tree order, so everyone should
> >be happy.
> >
> >Any objections to this patch?
>
> Traditionally the proper place -has- been
> * the reboot notifier
> * the ->remove hook (hot unplug, and module remove)

The latter doesn't get called on power-down, which is what the recent
patches for the kexec "fixes" seem to want.

> which covers all the cases.

But do we really want every pci driver adding a reboot notifier? It's
simple, yes, but a lot of extra code everywhere, that I'm pretty sure
the shutdown() hook was ment to handle.

> Add a ->shutdown hook is more of a hack. If you want to introduce this
> facility in a systematic way, introduce a 'kexec reboot' option which
> walks the device tree and shuts down hardware.

Why would "kexec reboot" be any different from the "normal" system
shutdown?

> ->shutdown is just a piecemeal, uncoordinated effort (uncoordinated in
> the sense that driver shutdowns occur in an undefined order).

->shutdown looks like it walks the device tree and shuts down the
hardware in the proper order, why do you think it is an undefined order?

thanks,

greg k-h

2005-04-25 20:12:51

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 12:06:06PM -0700, Greg KH wrote:
> Well it seems that people are starting to want to hook the reboot
> notifier, or the device shutdown facility in order to properly shutdown
> pci drivers to make kexec work nicer.
>
> So here's a patch for the PCI core that allows pci drivers to now just
> add a "shutdown" notifier function that will be called when the system
> is being shutdown. It happens just after the reboot notifier happens,
> and it should happen in the proper device tree order, so everyone should
> be happy.
>
> Any objections to this patch?
>
> thanks,
>
> greg k-h
> ------

Hi Greg,

I think this could be important for any type of device, so the power
management subsystem and driver core should handle it. I'm not really
sure if it's useful in pci alone, as it lacks the necessary ordering and
coordination.

I'm currently developing an interface for quieting devices without turning
them off in my Power Management model. Pavel seems to also have plans along
those lines:

(from the current pm.h)
> * There are 4 important states driver can be in:
> * ON -- driver is working
> * FREEZE -- stop operations and apply whatever policy is applicable to a suspended driver
> * of that class, freeze queues for block like IDE does, drop packets for
> * ethernet, etc... stop DMA engine too etc... so a consistent image can be
> * saved; but do not power any hardware down.
> * SUSPEND - like FREEZE, but hardware is doing as much powersaving as possible. Roughly
> * pci D3.

Thanks,
Adam

>
> PCI: Add pci shutdown ability
>
> Now pci drivers can know when the system is going down without having to
> add a reboot notifier event.
>
> Signed-off-by: Greg Kroah-Hartman <[email protected]>
>
> --- gregkh-2.6.orig/include/linux/pci.h 2005-04-20 21:25:11.000000000 -0700
> +++ gregkh-2.6/include/linux/pci.h 2005-04-25 11:54:20.000000000 -0700
> @@ -671,6 +671,7 @@
> int (*suspend) (struct pci_dev *dev, pm_message_t state); /* Device suspended */
> int (*resume) (struct pci_dev *dev); /* Device woken up */
> int (*enable_wake) (struct pci_dev *dev, pci_power_t state, int enable); /* Enable wake event */
> + void (*shutdown) (struct pci_dev *dev);
>
> struct device_driver driver;
> struct pci_dynids dynids;
> Index: gregkh-2.6/drivers/pci/pci-driver.c
> ===================================================================
> --- gregkh-2.6.orig/drivers/pci/pci-driver.c 2005-04-06 11:47:47.000000000 -0700
> +++ gregkh-2.6/drivers/pci/pci-driver.c 2005-04-25 12:02:12.000000000 -0700
> @@ -318,6 +318,14 @@
> return 0;
> }
>
> +static void pci_device_shutdown(struct device *dev)
> +{
> + struct pci_dev *pci_dev = to_pci_dev(dev);
> + struct pci_driver *drv = pci_dev->driver;
> +
> + if (drv && drv->shutdown)
> + drv->shutdown(pci_dev);
> +}
>
> #define kobj_to_pci_driver(obj) container_of(obj, struct device_driver, kobj)
> #define attr_to_driver_attribute(obj) container_of(obj, struct driver_attribute, attr)
> @@ -385,6 +393,7 @@
> drv->driver.bus = &pci_bus_type;
> drv->driver.probe = pci_device_probe;
> drv->driver.remove = pci_device_remove;
> + drv->driver.shutdown = pci_device_shutdown,
> drv->driver.owner = drv->owner;
> drv->driver.kobj.ktype = &pci_driver_kobj_type;
> pci_init_dynids(&drv->dynids);
> -

2005-04-25 20:15:37

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 09:45:25PM +0200, Alexander Nyberg wrote:
> > Well it seems that people are starting to want to hook the reboot
> > notifier, or the device shutdown facility in order to properly shutdown
> > pci drivers to make kexec work nicer.
> >
> > So here's a patch for the PCI core that allows pci drivers to now just
> > add a "shutdown" notifier function that will be called when the system
> > is being shutdown. It happens just after the reboot notifier happens,
> > and it should happen in the proper device tree order, so everyone should
> > be happy.
> >
> > Any objections to this patch?
>
> Not sure what you mean by "make kexec work nicer" but if it is because
> some devices don't work after a kexec I have some objections.
> What about the kexec-on-panic?

People are starting to submit patches for pci drivers that add "reboot
notifier" hooks, under the guise of fixing up kexec issues with those
drivers.

That is why I proposed this patch, to make it easier for such drivers to
shutdown properly, without needing a reboot notifier hook (which takes
up more code and memory.

thanks,

greg k-h

2005-04-25 20:16:35

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, 25 Apr 2005, Alexander Nyberg wrote:

> Not sure what you mean by "make kexec work nicer" but if it is because
> some devices don't work after a kexec I have some objections.

That was indeed the reason, at least in my case. The newly-rebooted
kernel doesn't work too well when there are active devices, with no driver
loaded, doing DMA and issuing IRQs because they were never shut down.

> What about the kexec-on-panic?
>
> In the end at least every storage device should work after a
> kexec-on-panic or else there might be cases where we cannot get dumps of
> what happened. My guess is that having access to the network might come
> in handy after a kexec-on-panic as well.
>
> So if this patch is because some devices don't work across kexec I don't
> think this is a good idea because the same devices won't work after a
> kexec-on-panic.

Do I understand your argument correctly? You seem to be saying that
because this new facility sometimes won't work (the kexec-on-panic case)
it shouldn't be added at all. What about all the other times when it will
work?

Alan Stern

2005-04-25 20:19:28

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 03:23:09PM -0400, Jeff Garzik wrote:
> Greg KH wrote:
> >Well it seems that people are starting to want to hook the reboot
> >notifier, or the device shutdown facility in order to properly shutdown
> >pci drivers to make kexec work nicer.
> >
> >So here's a patch for the PCI core that allows pci drivers to now just
> >add a "shutdown" notifier function that will be called when the system
> >is being shutdown. It happens just after the reboot notifier happens,
> >and it should happen in the proper device tree order, so everyone should
> >be happy.
> >
> >Any objections to this patch?
>
> Traditionally the proper place -has- been
> * the reboot notifier
> * the ->remove hook (hot unplug, and module remove)
>
> which covers all the cases.
>
> Add a ->shutdown hook is more of a hack. If you want to introduce this
> facility in a systematic way, introduce a 'kexec reboot' option which
> walks the device tree and shuts down hardware.
>
> ->shutdown is just a piecemeal, uncoordinated effort (uncoordinated in
> the sense that driver shutdowns occur in an undefined order).
>
> Jeff

I agree, though I think "->remove" may be more than we need. Another
potential use of this might be to prepare devices just before removing power.

Thanks,
Adam

2005-04-25 20:25:18

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 04:08:25PM -0400, Adam Belay wrote:
> I think this could be important for any type of device, so the power
> management subsystem and driver core should handle it. I'm not really
> sure if it's useful in pci alone, as it lacks the necessary ordering and
> coordination.

The driver core today _does_ handle this properly, and in the correct
order. I'm just allowing pci drivers access to that functionality, as
today they can not take advantage of it. That's all this patch does.

> I'm currently developing an interface for quieting devices without turning
> them off in my Power Management model. Pavel seems to also have plans along
> those lines:

<snip>

Great, then it will tie into the current driver model code, which will
then call the proper pci driver code, and everyone will be happy :)

thanks,

greg k-h

2005-04-25 20:35:27

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 01:19:49PM -0700, Greg KH wrote:
> On Mon, Apr 25, 2005 at 04:08:25PM -0400, Adam Belay wrote:
> > I think this could be important for any type of device, so the power
> > management subsystem and driver core should handle it. I'm not really
> > sure if it's useful in pci alone, as it lacks the necessary ordering and
> > coordination.
>
> The driver core today _does_ handle this properly, and in the correct
> order. I'm just allowing pci drivers access to that functionality, as
> today they can not take advantage of it. That's all this patch does.

sorry, I didn't notice this after a quick glance :)

> + drv->driver.shutdown = pci_device_shutdown,

Ok, great. I understand.

>
> > I'm currently developing an interface for quieting devices without turning
> > them off in my Power Management model. Pavel seems to also have plans along
> > those lines:
>
> <snip>

I think the intention here may have been to use PM_FREEZE via *suspend. It
isn't currently supported though.

>
> Great, then it will tie into the current driver model code, which will
> then call the proper pci driver code, and everyone will be happy :)
>
> thanks,
>
> greg k-h

Thanks,
Adam

2005-04-25 20:46:31

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> Well it seems that people are starting to want to hook the reboot
> notifier, or the device shutdown facility in order to properly shutdown
> pci drivers to make kexec work nicer.
>
> So here's a patch for the PCI core that allows pci drivers to now just
> add a "shutdown" notifier function that will be called when the system
> is being shutdown. It happens just after the reboot notifier happens,
> and it should happen in the proper device tree order, so everyone should
> be happy.
>
> Any objections to this patch?

Yes.

I believe it should just do suspend(PMSG_SUSPEND) before system
shutdown. If you think distintion between shutdown and suspend is
important (I am not 100% convinced it is), we can just add flag
saying "this is system shutdown".

Actually this patch should be in the queue somewhere... We had it in
suse trees for a long time, and IMO it can solve problem easily.

Pavel

--- clean-git/kernel/sys.c 2005-04-23 23:21:55.000000000 +0200
+++ linux/kernel/sys.c 2005-04-24 00:20:47.000000000 +0200
@@ -404,6 +404,7 @@
case LINUX_REBOOT_CMD_HALT:
notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
system_state = SYSTEM_HALT;
+ device_suspend(PMSG_SUSPEND);
device_shutdown();
printk(KERN_EMERG "System halted.\n");
machine_halt();
@@ -414,6 +415,7 @@
case LINUX_REBOOT_CMD_POWER_OFF:
notifier_call_chain(&reboot_notifier_list, SYS_POWER_OFF, NULL);
system_state = SYSTEM_POWER_OFF;
+ device_suspend(PMSG_SUSPEND);
device_shutdown();
printk(KERN_EMERG "Power down.\n");
machine_power_off();
@@ -430,6 +432,7 @@

notifier_call_chain(&reboot_notifier_list, SYS_RESTART, buffer);
system_state = SYSTEM_RESTART;
+ device_suspend(PMSG_FREEZE);
device_shutdown();
printk(KERN_EMERG "Restarting system with command '%s'.\n", buffer);
machine_restart(buffer);


--
Boycott Kodak -- for their patent abuse against Java.

2005-04-25 20:52:20

by Alexander Nyberg

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

> > Not sure what you mean by "make kexec work nicer" but if it is because
> > some devices don't work after a kexec I have some objections.
>
> That was indeed the reason, at least in my case. The newly-rebooted
> kernel doesn't work too well when there are active devices, with no driver
> loaded, doing DMA and issuing IRQs because they were never shut down.
>
> > What about the kexec-on-panic?
> >
> > In the end at least every storage device should work after a
> > kexec-on-panic or else there might be cases where we cannot get dumps of
> > what happened. My guess is that having access to the network might come
> > in handy after a kexec-on-panic as well.
> >
> > So if this patch is because some devices don't work across kexec I don't
> > think this is a good idea because the same devices won't work after a
> > kexec-on-panic.
>
> Do I understand your argument correctly? You seem to be saying that
> because this new facility sometimes won't work (the kexec-on-panic case)
> it shouldn't be added at all. What about all the other times when it will
> work?

No, I was saying that this approach doesn't solve all problems that
exist with kexec and kexec-on-panic is a very important coming
functionality. If there is going to be prepatory work for its coming we
might as well at least try to consider all the problems that are at
hand.
Otherwise the same problems will just appear again when kexec-on-panic
starts to get used in the real world.

2005-04-25 20:59:33

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 10:42:07PM +0200, Pavel Machek wrote:
> Hi!
>
> > Well it seems that people are starting to want to hook the reboot
> > notifier, or the device shutdown facility in order to properly shutdown
> > pci drivers to make kexec work nicer.
> >
> > So here's a patch for the PCI core that allows pci drivers to now just
> > add a "shutdown" notifier function that will be called when the system
> > is being shutdown. It happens just after the reboot notifier happens,
> > and it should happen in the proper device tree order, so everyone should
> > be happy.
> >
> > Any objections to this patch?
>
> Yes.
>
> I believe it should just do suspend(PMSG_SUSPEND) before system
> shutdown. If you think distintion between shutdown and suspend is
> important (I am not 100% convinced it is), we can just add flag
> saying "this is system shutdown".

So if I understand this correctly, you'd like to manually turn off devices
during a power off. I believe the ACPI spec recommends this for S4 (but also
to leave on wake devices), but not necessarily S5. Still it may be a good
idea. Comments?

>
> Actually this patch should be in the queue somewhere... We had it in
> suse trees for a long time, and IMO it can solve problem easily.

Yeah, that's what I had in mind when I mentioned PMSG_FREEZE. It seems
to replace "shutdown" in many ways, is this correct?

Thanks,
Adam

2005-04-25 21:03:42

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 10:42:07PM +0200, Pavel Machek wrote:
> Hi!
>
> > Well it seems that people are starting to want to hook the reboot
> > notifier, or the device shutdown facility in order to properly shutdown
> > pci drivers to make kexec work nicer.
> >
> > So here's a patch for the PCI core that allows pci drivers to now just
> > add a "shutdown" notifier function that will be called when the system
> > is being shutdown. It happens just after the reboot notifier happens,
> > and it should happen in the proper device tree order, so everyone should
> > be happy.
> >
> > Any objections to this patch?
>
> Yes.
>
> I believe it should just do suspend(PMSG_SUSPEND) before system
> shutdown. If you think distintion between shutdown and suspend is
> important (I am not 100% convinced it is), we can just add flag
> saying "this is system shutdown".

Then why even have the device_shutdown() call and notifier in the struct
device_driver?

> Actually this patch should be in the queue somewhere... We had it in
> suse trees for a long time, and IMO it can solve problem easily.
>
> Pavel
>
> --- clean-git/kernel/sys.c 2005-04-23 23:21:55.000000000 +0200
> +++ linux/kernel/sys.c 2005-04-24 00:20:47.000000000 +0200
> @@ -404,6 +404,7 @@
> case LINUX_REBOOT_CMD_HALT:
> notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
> system_state = SYSTEM_HALT;
> + device_suspend(PMSG_SUSPEND);
> device_shutdown();

Again, why keep device_shutdown() around at all then?

thanks,

greg k-h

2005-04-25 21:12:42

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, 25 Apr 2005, Alexander Nyberg wrote:

> > Do I understand your argument correctly? You seem to be saying that
> > because this new facility sometimes won't work (the kexec-on-panic case)
> > it shouldn't be added at all. What about all the other times when it will
> > work?
>
> No, I was saying that this approach doesn't solve all problems that
> exist with kexec and kexec-on-panic is a very important coming
> functionality. If there is going to be prepatory work for its coming we
> might as well at least try to consider all the problems that are at
> hand.
> Otherwise the same problems will just appear again when kexec-on-panic
> starts to get used in the real world.

I doubt anything can solve _all_ problems that exist with kexec. :-)

Do you have any suggestions for a better way to stop a device from issuing
IRQs and doing DMA at any point between the time when the old kernel
panics and the time when the new kernel loads the device's driver?

I looked into the possibility of having the PCI core disable interrupt
generation and DMA on each new device as it is discovered. Unfortunately
there is no dependable, universal way to do this for IRQs. (A notable gap
in the original PCI specification, IMHO.) Another problem with this
approach is that on some platforms the initial console is a PCI device
driven by the firmware, but the firmware won't tell Linux which device it
is! Disable the wrong device and away goes your screen display.

Alan Stern

2005-04-25 21:14:02

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> > > Well it seems that people are starting to want to hook the reboot
> > > notifier, or the device shutdown facility in order to properly shutdown
> > > pci drivers to make kexec work nicer.
> > >
> > > So here's a patch for the PCI core that allows pci drivers to now just
> > > add a "shutdown" notifier function that will be called when the system
> > > is being shutdown. It happens just after the reboot notifier happens,
> > > and it should happen in the proper device tree order, so everyone should
> > > be happy.
> > >
> > > Any objections to this patch?
> >
> > Yes.
> >
> > I believe it should just do suspend(PMSG_SUSPEND) before system
> > shutdown. If you think distintion between shutdown and suspend is
> > important (I am not 100% convinced it is), we can just add flag
> > saying "this is system shutdown".
>
> So if I understand this correctly, you'd like to manually turn off devices
> during a power off. I believe the ACPI spec recommends this for S4 (but also
> to leave on wake devices), but not necessarily S5. Still it may be a good
> idea. Comments?

It is neccessary for some machines (interrupt controller) or machine
will not power down...

> > Actually this patch should be in the queue somewhere... We had it in
> > suse trees for a long time, and IMO it can solve problem easily.
>
> Yeah, that's what I had in mind when I mentioned PMSG_FREEZE. It seems
> to replace "shutdown" in many ways, is this correct?

Yes. (Actually I'm not sure if PMSG_FREEZE or PMSG_SUSPEND is right
thing to do for suspend.)
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-25 21:16:46

by David Brownell

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability

> > So here's a patch for the PCI core that allows pci drivers to now just
> > add a "shutdown" notifier function that will be called when the system
> > is being shutdown. It happens just after the reboot notifier happens,
> > and it should happen in the proper device tree order, so everyone should
> > be happy.

Both kexec and sys_shutdown always call notifiers first, then the
device shutdown stuff, as I understand. (I didn't see the patch,
so I don't know what it should do...)

If there's going to be a "shutdown" primitive, I certainly think
that it should work for PCI devices.

There's a separate issue about needing to clone such stuff into
every version of bus glue. For example, OHCI runs on PCI ... but
also on lots of non-PCI hardware. It's actually cleaner to use
a notifier than to write almost-identical shutdown methods for
each different bus it may be glued to. Same thing with EHCI;
other bus-neutral register APIs will have the same issue.


> > Any objections to this patch?
>
> Yes.
>
> I believe it should just do suspend(PMSG_SUSPEND) before system
> shutdown. If you think distintion between shutdown and suspend is
> important (I am not 100% convinced it is), we can just add flag
> saying "this is system shutdown".

I've made that point before -- essentially that shutdown() in
the driver model itself is superfluous, either remove() or
maybe suspend() would achieve the same result, without adding
any special code that's run/tested rather infrequently ...

That is: if shutdown() isn't going to be removed, then the
code that shuts down devices should invoke remove() or even
suspend() in cases that there's no shutdown() method.

But the pushback was that shutdown() methods are allowed to
be much lighter weight, and really ought to work even when
big parts of the system are misbehaving. So they'd just do
stuff like resetting chips, sanitizing GPIOs, and so on ...
and nothing that'd be likely to break if significant parts
of kernel memory were corrupted.

- Dave


2005-04-25 21:34:19

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> > > Well it seems that people are starting to want to hook the reboot
> > > notifier, or the device shutdown facility in order to properly shutdown
> > > pci drivers to make kexec work nicer.
> > >
> > > So here's a patch for the PCI core that allows pci drivers to now just
> > > add a "shutdown" notifier function that will be called when the system
> > > is being shutdown. It happens just after the reboot notifier happens,
> > > and it should happen in the proper device tree order, so everyone should
> > > be happy.
> > >
> > > Any objections to this patch?
> >
> > Yes.
> >
> > I believe it should just do suspend(PMSG_SUSPEND) before system
> > shutdown. If you think distintion between shutdown and suspend is
> > important (I am not 100% convinced it is), we can just add flag
> > saying "this is system shutdown".
>
> Then why even have the device_shutdown() call and notifier in the struct
> device_driver?

Yes, I guess it is redundant. Perhaps it should be removed...

Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-25 22:03:07

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Alan Stern <[email protected]> wrote:
>
> On Mon, 25 Apr 2005, Alexander Nyberg wrote:
>
> > Not sure what you mean by "make kexec work nicer" but if it is because
> > some devices don't work after a kexec I have some objections.
>
> That was indeed the reason, at least in my case. The newly-rebooted
> kernel doesn't work too well when there are active devices, with no driver
> loaded, doing DMA and issuing IRQs because they were never shut down.

I have vague memories of this being discussed at some length last year.
Nothing comprehensive came of it, except that perhaps the kdump code should
spin with irqs off for a couple of seconds so the DMA and IRQs stop.

(Ongoing DMA is not a problem actually, because the kdump kernel won't be
using that memory anyway)

> > What about the kexec-on-panic?
> >
> > In the end at least every storage device should work after a
> > kexec-on-panic or else there might be cases where we cannot get dumps of
> > what happened. My guess is that having access to the network might come
> > in handy after a kexec-on-panic as well.
> >
> > So if this patch is because some devices don't work across kexec I don't
> > think this is a good idea because the same devices won't work after a
> > kexec-on-panic.
>
> Do I understand your argument correctly? You seem to be saying that
> because this new facility sometimes won't work (the kexec-on-panic case)
> it shouldn't be added at all. What about all the other times when it will
> work?

For kdump we really don't want to be doing fancy driver shutdown inside a
crashed kernel. It would be better to just jump to the new kernel and
to reset the hardware from there. DMA doesn't matter, and maybe IRQs can
be handled with a sustained local_irq_disable() (hard).

But for the normal kexec path, yes, graceful device shutdown is desirable.

So the requirements for the two different kexec scenarios are quite
different and it seems unlikely that any single approach to device shutdown
will suit both situations.

2005-04-25 22:14:21

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 02:58:31PM -0700, Andrew Morton wrote:
> Alan Stern <[email protected]> wrote:
> >
> > On Mon, 25 Apr 2005, Alexander Nyberg wrote:
> >
> > > Not sure what you mean by "make kexec work nicer" but if it is because
> > > some devices don't work after a kexec I have some objections.
> >
> > That was indeed the reason, at least in my case. The newly-rebooted
> > kernel doesn't work too well when there are active devices, with no driver
> > loaded, doing DMA and issuing IRQs because they were never shut down.
>
> I have vague memories of this being discussed at some length last year.
> Nothing comprehensive came of it, except that perhaps the kdump code should
> spin with irqs off for a couple of seconds so the DMA and IRQs stop.
>
> (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> using that memory anyway)

Actually, some cpufreq drivers *should* do their speed transitions with
all PCI mastering disabled. The lack of any infrastructure to quiesce drivers
and prevent new DMA transactions from occuring whilst the transition occurs
means that currently.. we don't. So +1 for any driver model work that
may lead to something we can use here.

This is the main reason the longhaul cpufreq driver is currently busted.
That it ever worked at all is a miracle.

Dave

2005-04-25 23:27:37

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 06:13:27PM -0400, Dave Jones wrote:
> On Mon, Apr 25, 2005 at 02:58:31PM -0700, Andrew Morton wrote:
> > Alan Stern <[email protected]> wrote:
> > >
> > > On Mon, 25 Apr 2005, Alexander Nyberg wrote:
> > >
> > > > Not sure what you mean by "make kexec work nicer" but if it is because
> > > > some devices don't work after a kexec I have some objections.
> > >
> > > That was indeed the reason, at least in my case. The newly-rebooted
> > > kernel doesn't work too well when there are active devices, with no driver
> > > loaded, doing DMA and issuing IRQs because they were never shut down.
> >
> > I have vague memories of this being discussed at some length last year.
> > Nothing comprehensive came of it, except that perhaps the kdump code should
> > spin with irqs off for a couple of seconds so the DMA and IRQs stop.
> >
> > (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> > using that memory anyway)
>
> Actually, some cpufreq drivers *should* do their speed transitions with
> all PCI mastering disabled. The lack of any infrastructure to quiesce drivers
> and prevent new DMA transactions from occuring whilst the transition occurs
> means that currently.. we don't. So +1 for any driver model work that
> may lead to something we can use here.
>
> This is the main reason the longhaul cpufreq driver is currently busted.
> That it ever worked at all is a miracle.
>
> Dave

I've been considering for a while that, in addition to ->probe and ->remove, we
have the following:

"struct device" -->
->attach - binds to the device and allocates data structures
->probe - detects and sets up the hardware
->start - begins transactions (like DMA)
->stop - stops transactions
->remove - prepares the hardware for no driver control
->detach - frees stuff and unbinds the device

->start and ->stop would be optional, and only used where they apply.

->probe and ->remove would be useful for resource rebalancing

Power management functions could (and usually should) manually call some of
these. Also this would be useful for error recovery and restarting devices.

Still, cpufreq seems like a difficult problem. What's to prevent,
hypothetically, an SMP system from stoping a device while the upper class
layer tries to use it. If the class level locks control of the device, then
DMA can't be stopped. Also, attempting to stop device activity may fail
if the driver decides it's not possible.

Thanks,
Adam

2005-04-26 03:40:14

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability


> Yes.
>
> I believe it should just do suspend(PMSG_SUSPEND) before system
> shutdown. If you think distintion between shutdown and suspend is
> important (I am not 100% convinced it is), we can just add flag
> saying "this is system shutdown".
>
> Actually this patch should be in the queue somewhere... We had it in
> suse trees for a long time, and IMO it can solve problem easily.

I think we probably want a distinct state for shutdown ...

Ben.


2005-04-26 03:43:00

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability


> > I believe it should just do suspend(PMSG_SUSPEND) before system
> > shutdown. If you think distintion between shutdown and suspend is
> > important (I am not 100% convinced it is), we can just add flag
> > saying "this is system shutdown".
>
> Then why even have the device_shutdown() call and notifier in the struct
> device_driver?

Historical bad ideas...

> > Actually this patch should be in the queue somewhere... We had it in
> > suse trees for a long time, and IMO it can solve problem easily.
> >
> > Pavel
> >
> > --- clean-git/kernel/sys.c 2005-04-23 23:21:55.000000000 +0200
> > +++ linux/kernel/sys.c 2005-04-24 00:20:47.000000000 +0200
> > @@ -404,6 +404,7 @@
> > case LINUX_REBOOT_CMD_HALT:
> > notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
> > system_state = SYSTEM_HALT;
> > + device_suspend(PMSG_SUSPEND);
> > device_shutdown();
>
> Again, why keep device_shutdown() around at all then?

I've argued for folding shutdown and suspend for some time now, though
some drivers who rely on shutdown today will need fixing I suppose.

Also, I think kexec shouldn't use "shutdown" but a different message.
There are some conceptual differences, things like stopping the platters
on disk etc... things you want to do on one and not the other. In a way,
kexec needs are very similar to suspend-to-disk "freeze" state. I'd
rather call PMSG_FREEZE there.

Ben.


2005-04-26 03:47:05

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, 2005-04-25 at 14:58 -0700, Andrew Morton wrote:
> Alan Stern <[email protected]> wrote:
> >
> > On Mon, 25 Apr 2005, Alexander Nyberg wrote:
> >
> > > Not sure what you mean by "make kexec work nicer" but if it is because
> > > some devices don't work after a kexec I have some objections.
> >
> > That was indeed the reason, at least in my case. The newly-rebooted
> > kernel doesn't work too well when there are active devices, with no driver
> > loaded, doing DMA and issuing IRQs because they were never shut down.
>
> I have vague memories of this being discussed at some length last year.
> Nothing comprehensive came of it, except that perhaps the kdump code should
> spin with irqs off for a couple of seconds so the DMA and IRQs stop.

That i bogus, USB will never stop DMA unless told to do so for example.

> (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> using that memory anyway)

Ok, good. So kdump don't need to call PMSG_FREEZE, normal kexec still
does though.

> > > What about the kexec-on-panic?
> > >
> > > In the end at least every storage device should work after a
> > > kexec-on-panic or else there might be cases where we cannot get dumps of
> > > what happened. My guess is that having access to the network might come
> > > in handy after a kexec-on-panic as well.
> > >
> > > So if this patch is because some devices don't work across kexec I don't
> > > think this is a good idea because the same devices won't work after a
> > > kexec-on-panic.
> >
> > Do I understand your argument correctly? You seem to be saying that
> > because this new facility sometimes won't work (the kexec-on-panic case)
> > it shouldn't be added at all. What about all the other times when it will
> > work?
>
> For kdump we really don't want to be doing fancy driver shutdown inside a
> crashed kernel. It would be better to just jump to the new kernel and
> to reset the hardware from there. DMA doesn't matter, and maybe IRQs can
> be handled with a sustained local_irq_disable() (hard).

Yup.

> But for the normal kexec path, yes, graceful device shutdown is desirable.
>
> So the requirements for the two different kexec scenarios are quite
> different and it seems unlikely that any single approach to device shutdown
> will suit both situations.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Benjamin Herrenschmidt <[email protected]>

2005-04-26 03:53:19

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability


> > I have vague memories of this being discussed at some length last year.
> > Nothing comprehensive came of it, except that perhaps the kdump code should
> > spin with irqs off for a couple of seconds so the DMA and IRQs stop.
> >
> > (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> > using that memory anyway)
>
> Actually, some cpufreq drivers *should* do their speed transitions with
> all PCI mastering disabled. The lack of any infrastructure to quiesce drivers
> and prevent new DMA transactions from occuring whilst the transition occurs
> means that currently.. we don't. So +1 for any driver model work that
> may lead to something we can use here.

True, I have the same problem on pmac with some machines that use PMU
based speed switch. On those, the CPU is hard rebooted, so we need to
flush all caches which can't always be done in a completely "safe" way
with pending DMAs...

> This is the main reason the longhaul cpufreq driver is currently busted.
> That it ever worked at all is a miracle.

Well, In my case, I disp-flush so much more than is normally necessary
that I end up with something that seem stable, but I agree it's dodgy.

The PMSG_FREEZE semantics that we defined for suspend-to-disk however is
just what we need here. It basically tells driver to stop any DMA
activity and freeze processing. It should be used for kexec too.

The problem is, as far as I understand what David told me a while ago,
some USB chips simply _cannot_ disable DMA without actually suspending
the bus, which itself is a complex process that takes some time and can
involve all sort of problems with devices / drivers that don't deal with
suspended busses properly. I suspect other kind of chips may be
similarily busted by design.





2005-04-26 04:00:19

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, 2005-04-25 at 13:12 -0700, Greg KH wrote:

> People are starting to submit patches for pci drivers that add "reboot
> notifier" hooks, under the guise of fixing up kexec issues with those
> drivers.
>
> That is why I proposed this patch, to make it easier for such drivers to
> shutdown properly, without needing a reboot notifier hook (which takes
> up more code and memory.

But it isn't the right fix. It should be a suspend() call with
PMSG_FREEZE

Ben.


2005-04-26 04:32:07

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

> > So if I understand this correctly, you'd like to manually turn off devices
> > during a power off. I believe the ACPI spec recommends this for S4 (but also
> > to leave on wake devices), but not necessarily S5. Still it may be a good
> > idea. Comments?
>
> It is neccessary for some machines (interrupt controller) or machine
> will not power down...

Additionally, some machines won't properly park/flush the disk, it's
necessary to send the proper suspend commands to IDE hard disks prior to
shutting down or we risk data loss.

> > > Actually this patch should be in the queue somewhere... We had it in
> > > suse trees for a long time, and IMO it can solve problem easily.
> >
> > Yeah, that's what I had in mind when I mentioned PMSG_FREEZE. It seems
> > to replace "shutdown" in many ways, is this correct?
>
> Yes. (Actually I'm not sure if PMSG_FREEZE or PMSG_SUSPEND is right
> thing to do for suspend.)


I think FREEZE for kexec and SUSPEND for shutdown, though I suppose we
may want a separate one for the later eventually...

Ben.


2005-04-26 04:33:56

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

> I've been considering for a while that, in addition to ->probe and ->remove, we
> have the following:
>
> "struct device" -->
> ->attach - binds to the device and allocates data structures
> ->probe - detects and sets up the hardware
> ->start - begins transactions (like DMA)
> ->stop - stops transactions
> ->remove - prepares the hardware for no driver control
> ->detach - frees stuff and unbinds the device
>
> ->start and ->stop would be optional, and only used where they apply.

>From my experience, this doesn't work. You actually want to have power
transitions and start/stop semantics to be "atomic" as far as drvier
state change is concerned. You can't for example stop all drivers, then
in a second pass, change the power state, since after you have stopped
drivers, you parent (bus) driver may not let you talk to your device
anymore for obvious reasons (and thus may prevent you from doing the
power state change).

We really want all this to be part of the normal power management
infrastructure. In this specific state, it's just basically a system
state, that has already been discussed at lenght and that we nicknamed
'freeze' since it's exactly what suspend-to-disk needs before
snapshoting the system image.

> ->probe and ->remove would be useful for resource rebalancing
>
> Power management functions could (and usually should) manually call some of
> these. Also this would be useful for error recovery and restarting devices.
>
> Still, cpufreq seems like a difficult problem. What's to prevent,
> hypothetically, an SMP system from stoping a device while the upper class
> layer tries to use it.

Proper locking in the driver should prevent that. if you have a problem
with "SMP", then you have a problem with preempt, and others ... then
your model is flawed.

> If the class level locks control of the device, then
> DMA can't be stopped. Also, attempting to stop device activity may fail
> if the driver decides it's not possible.

No locking should be at the class level. All locking should be local to
the device, unless the notion of device state is managed outside of the
driver.

I don't like this notion of "stop" separated from power states anyway, I
think it just doesn't work in practice.

Ben.

> Thanks,
> Adam
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Benjamin Herrenschmidt <[email protected]>

2005-04-26 06:27:22

by Adam Belay

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 02:32:29PM +1000, Benjamin Herrenschmidt wrote:

-->snip

>
> I don't like this notion of "stop" separated from power states anyway, I
> think it just doesn't work in practice.

Yeah, after giving it some additional thought, I think there are better ways.

>
> Ben.
>

Ok, here's a new idea.

For many devices "->suspend" and "->resume" with pm_message_t is exactly what
we need. However, as we support more advanced power management features, such
as runtime power management, or power containers, we need something a little
more specific. The exact power state must be specified among other issues.

We might do something like this:

Keep "->suspend" and "->resume" around unchanged. (so the states would
probably remain as PMSG_FREEZE and PMSG_SUSPEND). If the driver doesn't
support the more advanced PM methods just use these. They work well enough
for system sleep states etc.

Alternatively drivers could support a more rich power management interface
via the following methods:


change_state - changes a device's power state

change_state(struct device * dev, pm_state_t state, struct system_state * sys_state, int reason);
@dev - the device
@state - the target device-specific power state
@sys_state - a data structure containing information about the intended global system power state
@reason - why the state must be changed (ex. RUNTIME_PM, SYSTEM_SLEEP, SYSTEM_RESUME, etc.)


halt - acts somewhat like PMSG_FREEZE, stops device activity, doesn't change power state

halt(struct device * dev, struct system_state * sys_state, int reason);
@dev - the device
@sys_state - a data structure containing information about the intended global system power state
@reason - why we are halting operation (ex. RUNTIME_CHANGES (like cpufreq), SYSTEM_SLEEP, SHUTDOWN, REBOOT)


contine - resumes from a "halt"

continue(struct device * dev, struct system_state * sys_state, int reason);
@dev - the device
@sys_state - a data structure containing information about the intended global system power state
@reason - why we are resuming operation (ex. RUNTIME_CHANGES (like cpufreq), SYSTEM_RESUME)


When changing system state, we call "change_state" for every device with power
resources. Devices that do not directly consume power or have power states
will not implement "change_state" so we will call "halt" and "continue"
instead.

When shutting down the system, halt has the option of turning off the device,
as it will see the SHUTDOWN reason. So it's a driver-knows-best approach
instead of assuming everything must be turned off, or everything must just be
stopped.

So in theory, with cpufreq, we could stop userspace, ->halt every device
(drivers won't do anything if they know it's not necessary), change the
frequency, and then resume operation.

We may want to create structures like pm_message_t for "change_state", "halt",
and "continue". Pavel, do you have any thoughts on this?

This is just a rough idea... I look forward to any comments or suggestions.

Thanks,
Adam

2005-04-26 06:38:07

by Adam Belay

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 01:39:20PM +1000, Benjamin Herrenschmidt wrote:
>
> > Yes.
> >
> > I believe it should just do suspend(PMSG_SUSPEND) before system
> > shutdown. If you think distintion between shutdown and suspend is
> > important (I am not 100% convinced it is), we can just add flag
> > saying "this is system shutdown".
> >
> > Actually this patch should be in the queue somewhere... We had it in
> > suse trees for a long time, and IMO it can solve problem easily.
>
> I think we probably want a distinct state for shutdown ...
>
> Ben.

Yes, it's possible we do. I'm not sure what every platform requires. The
question is should we make this change to pm_message_t in the short term?
"->shutdown", even though incorrect, seems to fill this role.

Greg, have you seen shutdown methods doing anything special other than
standard "->suspend" stuff?

Thanks,
Adam

2005-04-26 06:47:36

by Greg KH

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 02:33:28AM -0400, Adam Belay wrote:
> On Tue, Apr 26, 2005 at 01:39:20PM +1000, Benjamin Herrenschmidt wrote:
> >
> > > I believe it should just do suspend(PMSG_SUSPEND) before system
> > > shutdown. If you think distintion between shutdown and suspend is
> > > important (I am not 100% convinced it is), we can just add flag
> > > saying "this is system shutdown".
> > >
> > > Actually this patch should be in the queue somewhere... We had it in
> > > suse trees for a long time, and IMO it can solve problem easily.
> >
> > I think we probably want a distinct state for shutdown ...
>
> Yes, it's possible we do. I'm not sure what every platform requires. The
> question is should we make this change to pm_message_t in the short term?
> "->shutdown", even though incorrect, seems to fill this role.
>
> Greg, have you seen shutdown methods doing anything special other than
> standard "->suspend" stuff?

Don't know, never looked. Try looking at the system and platform
drivers, I think those are the only ones that have that callback hooked
up right now.

thanks,

greg k-h

2005-04-26 07:16:01

by Nigel Cunningham

[permalink] [raw]
Subject: Re: [linux-pm] Re: [PATCH] PCI: Add pci shutdown ability

Hi.

On Tue, 2005-04-26 at 16:23, Adam Belay wrote:
> Ok, here's a new idea.
>
> For many devices "->suspend" and "->resume" with pm_message_t is exactly what
> we need. However, as we support more advanced power management features, such
> as runtime power management, or power containers, we need something a little
> more specific. The exact power state must be specified among other issues.
>
> We might do something like this:
>
> Keep "->suspend" and "->resume" around unchanged. (so the states would
> probably remain as PMSG_FREEZE and PMSG_SUSPEND). If the driver doesn't
> support the more advanced PM methods just use these. They work well enough
> for system sleep states etc.

Ok, so for each driver we have either ->suspend & ->resume or
->change_state, ->halt and ->continue, right?

Is it safe to assume that these later methods would get called from the
same places in device_suspend and device_resume that ->suspend &
->resume are called from at the moment? That would seem to me to be the
cleanest way of addressing ordering ... but then I find myself asking,
why not just do one or the other?

I have to admit that I've never liked ->suspend & ->resume. They imply
too much knowledge in the caller about what start the device was in. I'd
like, instead, to see all of the decision making as to what state to
actually be in exist in the driver and the helpers it uses. Then the
semantics becomes requests and notifications of system/child/parent
state changes rather than _commands_ to suspend/resume. This should be a
lot cleaner in the context of runtime power management as well.

> When changing system state, we call "change_state" for every device with power
> resources. Devices that do not directly consume power or have power states
> will not implement "change_state" so we will call "halt" and "continue"
> instead.

Now you're confusing me...

if (driver->suspend | driver->resume)
driver->suspend|resume
elseif (driver->change_state)
driver->change_state
elseif (driver->halt | driver->continue)
driver->halt|continue
else
printk("Urgh. No driver power management methods for %s.\n");


> When shutting down the system, halt has the option of turning off the device,
> as it will see the SHUTDOWN reason. So it's a driver-knows-best approach
> instead of assuming everything must be turned off, or everything must just be
> stopped.

I do like this idea. Especially if we can say "I'm rebooting rather than
powering off."

> So in theory, with cpufreq, we could stop userspace, ->halt every device
> (drivers won't do anything if they know it's not necessary), change the
> frequency, and then resume operation.

That looks like a good idea too. But it also sounds like an abuse of
suspend|resume. Perhaps it implies the need/desire for more genericness
to pm_message_t (sounds familiar!).

Regards,

Nigel
--
Nigel Cunningham
Software Engineer, Canberra, Australia
http://www.cyclades.com
Bus: +61 (2) 6291 9554; Hme: +61 (2) 6292 8028; Mob: +61 (417) 100 574

Maintainer of Suspend2 Kernel Patches http://suspend2.net

2005-04-26 09:16:45

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> > I don't like this notion of "stop" separated from power states anyway, I
> > think it just doesn't work in practice.
>
> Yeah, after giving it some additional thought, I think there are better ways.
>
> >
> > Ben.
> >
>
> Ok, here's a new idea.
>
> For many devices "->suspend" and "->resume" with pm_message_t is exactly what
> we need. However, as we support more advanced power management features, such
> as runtime power management, or power containers, we need something a little
> more specific. The exact power state must be specified among other
> issues.

Okay, maybe. But not by adding 3 new callbacks that mirror existing
functionality.

> We might do something like this:
>
> Keep "->suspend" and "->resume" around unchanged. (so the states would
> probably remain as PMSG_FREEZE and PMSG_SUSPEND). If the driver doesn't
> support the more advanced PM methods just use these. They work well enough
> for system sleep states etc.
>
> Alternatively drivers could support a more rich power management interface
> via the following methods:
>
>
> change_state - changes a device's power state
>
> change_state(struct device * dev, pm_state_t state, struct system_state * sys_state, int reason);
> @dev - the device
> @state - the target device-specific power state
> @sys_state - a data structure containing information about the intended global system power state
> @reason - why the state must be changed (ex. RUNTIME_PM,
> SYSTEM_SLEEP, SYSTEM_RESUME, etc.)

If drivers really need to know system state and reason, just put it
into pm_message_t. I wanted to add "flags" there from the begining,
serving similar purpose as your "reason".

> halt - acts somewhat like PMSG_FREEZE, stops device activity, doesn't change power state
>
> halt(struct device * dev, struct system_state * sys_state, int reason);
> @dev - the device
> @sys_state - a data structure containing information about the intended global system power state
> @reason - why we are halting operation (ex. RUNTIME_CHANGES (like cpufreq), SYSTEM_SLEEP, SHUTDOWN, REBOOT)

If it is similar to PMSG_FREEZE, just pass PMSG_FREEZE and put
* sys_state and reason into pm_message_t.

> contine - resumes from a "halt"
>
> continue(struct device * dev, struct system_state * sys_state, int reason);
> @dev - the device
> @sys_state - a data structure containing information about the intended global system power state
> @reason - why we are resuming operation (ex. RUNTIME_CHANGES (like cpufreq), SYSTEM_RESUME)

Now, here you have a point. resume() should get pm_message_t,
too. This should be rather easy to change (simple matter of coding),
and we have agreed before that it is good idea. Patches welcome.

Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-26 09:40:31

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Po 25-04-05 18:13:27, Dave Jones wrote:
> On Mon, Apr 25, 2005 at 02:58:31PM -0700, Andrew Morton wrote:
> > Alan Stern <[email protected]> wrote:
> > >
> > > On Mon, 25 Apr 2005, Alexander Nyberg wrote:
> > >
> > > > Not sure what you mean by "make kexec work nicer" but if it is because
> > > > some devices don't work after a kexec I have some objections.
> > >
> > > That was indeed the reason, at least in my case. The newly-rebooted
> > > kernel doesn't work too well when there are active devices, with no driver
> > > loaded, doing DMA and issuing IRQs because they were never shut down.
> >
> > I have vague memories of this being discussed at some length last year.
> > Nothing comprehensive came of it, except that perhaps the kdump code should
> > spin with irqs off for a couple of seconds so the DMA and IRQs stop.
> >
> > (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> > using that memory anyway)
>
> Actually, some cpufreq drivers *should* do their speed transitions with
> all PCI mastering disabled. The lack of any infrastructure to quiesce drivers
> and prevent new DMA transactions from occuring whilst the transition occurs
> means that currently.. we don't. So +1 for any driver model work that
> may lead to something we can use here.

Well, you can do "half suspend to ram; change your frequency; half
resume" today, and it should work, but I do not think you'll like the
speed.

In a ideal world, calling device_suspend(PMSG_FREEZE) gets you exactly
that, and we'll do our best to make it fast enough.

OTOH it *needs* to switch consoles to text one (because X may be
running DMA, right?); I do not think you'll like that one.
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-26 09:44:23

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability


> I've been considering for a while that, in addition to ->probe and ->remove, we
> have the following:
>
> "struct device" -->
> ->attach - binds to the device and allocates data structures
> ->probe - detects and sets up the hardware
> ->start - begins transactions (like DMA)
> ->stop - stops transactions
> ->remove - prepares the hardware for no driver control
> ->detach - frees stuff and unbinds the device

No.

Stop trying to add more hooks to struct device, we have too many
already.

Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-26 10:13:22

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> > > Actually this patch should be in the queue somewhere... We had it in
> > > suse trees for a long time, and IMO it can solve problem easily.
> > >
> > > Pavel
> > >
> > > --- clean-git/kernel/sys.c 2005-04-23 23:21:55.000000000 +0200
> > > +++ linux/kernel/sys.c 2005-04-24 00:20:47.000000000 +0200
> > > @@ -404,6 +404,7 @@
> > > case LINUX_REBOOT_CMD_HALT:
> > > notifier_call_chain(&reboot_notifier_list, SYS_HALT, NULL);
> > > system_state = SYSTEM_HALT;
> > > + device_suspend(PMSG_SUSPEND);
> > > device_shutdown();
> >
> > Again, why keep device_shutdown() around at all then?
>
> I've argued for folding shutdown and suspend for some time now, though
> some drivers who rely on shutdown today will need fixing I suppose.
>
> Also, I think kexec shouldn't use "shutdown" but a different message.
> There are some conceptual differences, things like stopping the platters
> on disk etc... things you want to do on one and not the other. In a way,
> kexec needs are very similar to suspend-to-disk "freeze" state. I'd
> rather call PMSG_FREEZE there.

Agreed. If hardware is going to be physically powered down, we need
PMSG_SUSPEND. If it is not (kexec), PMSG_FREEZE should be enough.

Now, if we want separate PMSG_SHUTDOWN. ... I think it is similar
enough to PMSG_SUSPEND that we can keep them same "major" value and
just use different flags. I do not think many drivers will
care.
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-26 13:44:35

by David Brownell

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability

On Monday 25 April 2005 2:06 pm, Pavel Machek wrote:

> Yes. (Actually I'm not sure if PMSG_FREEZE or PMSG_SUSPEND is right
> thing to do for suspend.)

Until they have different values, it's a moot point isn't it?

2005-04-26 15:12:14

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, 25 Apr 2005, Andrew Morton wrote:

> I have vague memories of this being discussed at some length last year.
> Nothing comprehensive came of it, except that perhaps the kdump code should
> spin with irqs off for a couple of seconds so the DMA and IRQs stop.

Like Pavel said, this won't work.

> (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> using that memory anyway)

For PCI devices at least, DMA _can_ be disabled in a uniform way as
devices are discovered. Some platforms might not want to do this for fear
it would kill the initial console display.

IRQs _cannot_ be disabled in a uniform way. So they remain a problem.

> For kdump we really don't want to be doing fancy driver shutdown inside a
> crashed kernel. It would be better to just jump to the new kernel and
> to reset the hardware from there. DMA doesn't matter, and maybe IRQs can
> be handled with a sustained local_irq_disable() (hard).

But at some point you have to enable local IRQs, and then you're in
trouble if a device without a driver is generating requests. Unless the
new kernel can run with interrupts entirely disabled? Seems pretty
unlikely.

The real problem is that, in general, hardware _can't_ be reset properly
by a new kernel. There are things that can help, like the PCI USB quirks
code. That might be enough to handle the most pressing existing problems;
certainly it would avoid the USB issues we've seen. (But it needs to be
updated to avoid interfering with normal operations during
resume-from-disk.)

Alan Stern

2005-04-26 15:14:44

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 26 Apr 2005, Benjamin Herrenschmidt wrote:

> The problem is, as far as I understand what David told me a while ago,
> some USB chips simply _cannot_ disable DMA without actually suspending
> the bus, which itself is a complex process that takes some time and can
> involve all sort of problems with devices / drivers that don't deal with
> suspended busses properly. I suspect other kind of chips may be
> similarily busted by design.

That's correct. However, during shutdown we don't really need to take the
time and we don't care about problems with drivers not handling suspended
buses properly. (USB devices, at least, _can_ handle such things -- it's
part of the spec.)

Alan Stern

2005-04-26 15:39:46

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 04:14:13PM -0400, Alan Stern wrote:
> > Not sure what you mean by "make kexec work nicer" but if it is because
> > some devices don't work after a kexec I have some objections.
>
> That was indeed the reason, at least in my case. The newly-rebooted
> kernel doesn't work too well when there are active devices, with no driver
> loaded, doing DMA and issuing IRQs because they were never shut down.

This is also a problem at "normal" boot time. BIOS may leave devices
still doing DMA if BIOS (or the arch equivalent) was using the device.
This problem is obvious for systems with an IOMMU (e.g. parisc).
See drivers/parisc/sba_iommu.c for an example of where I try to
deal with active DMA at boot time *before* PCI bus walks have occurred.
Masking IRQs is trivial in comparison to dealing with active DMA.

> > What about the kexec-on-panic?

Same problem - just much more likely to hit the issue and completely
crash the box or corrupt memory.

grant

2005-04-26 15:51:12

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Mon, Apr 25, 2005 at 05:12:09PM -0400, Alan Stern wrote:
> Do you have any suggestions for a better way to stop a device from issuing
> IRQs and doing DMA at any point between the time when the old kernel
> panics and the time when the new kernel loads the device's driver?

PCI Bus resets?

That means linux kernel needs to reconfigure everything unless
the BIOS gets invoked again (machine reset). Linux has *most*
of the code to do this but I'm pretty sure enough existing
configs won't work that this will be very painful.

> I looked into the possibility of having the PCI core disable interrupt
> generation and DMA on each new device as it is discovered. Unfortunately
> there is no dependable, universal way to do this for IRQs.

Sure there is. Every IRQ line goes to an IRQ controller.
Arch specific code deals with programming the controller and can
mask all interrupts (or not). Historically, they've been left unmasked
for ISA IRQ discovery and debugging misrouted IRQ lines.

> (A notable gap in the original PCI specification, IMHO.) Another problem
> with this
> approach is that on some platforms the initial console is a PCI device
> driven by the firmware, but the firmware won't tell Linux which device it
> is! Disable the wrong device and away goes your screen display.

Exactly. That's really the main problem with disabling DMA globally.
I don't have any brilliant ideas on how to fix this either
except shutdown console before PCI bus walks and switch to
linux console after PCI bus walks. Makes debugging PCI issues
a bit more difficult though.

grant

2005-04-26 16:06:25

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 26 Apr 2005, Grant Grundler wrote:

> > I looked into the possibility of having the PCI core disable interrupt
> > generation and DMA on each new device as it is discovered. Unfortunately
> > there is no dependable, universal way to do this for IRQs.
>
> Sure there is. Every IRQ line goes to an IRQ controller.
> Arch specific code deals with programming the controller and can
> mask all interrupts (or not). Historically, they've been left unmasked
> for ISA IRQ discovery and debugging misrouted IRQ lines.

This doesn't help. Consider what happens when two devices share an IRQ
line. Suppose device B is generating interrupt requests when the driver
for device A is probed. The driver registers its handler, which causes
the IRQ line to be unmasked. Then a multitude of IRQs arrive from B, none
of which can be handled by A's driver. So the kernel shuts the IRQ line
down permanently...

Alan Stern

2005-04-26 16:06:28

by Alexander Nyberg

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

tis 2005-04-26 klockan 11:11 -0400 skrev Alan Stern:
> On Mon, 25 Apr 2005, Andrew Morton wrote:
>
> > I have vague memories of this being discussed at some length last year.
> > Nothing comprehensive came of it, except that perhaps the kdump code should
> > spin with irqs off for a couple of seconds so the DMA and IRQs stop.
>
> Like Pavel said, this won't work.
>
> > (Ongoing DMA is not a problem actually, because the kdump kernel won't be
> > using that memory anyway)
>
> For PCI devices at least, DMA _can_ be disabled in a uniform way as
> devices are discovered. Some platforms might not want to do this for fear
> it would kill the initial console display.
>
> IRQs _cannot_ be disabled in a uniform way. So they remain a problem.
>
> > For kdump we really don't want to be doing fancy driver shutdown inside a
> > crashed kernel. It would be better to just jump to the new kernel and
> > to reset the hardware from there. DMA doesn't matter, and maybe IRQs can
> > be handled with a sustained local_irq_disable() (hard).
>
> But at some point you have to enable local IRQs, and then you're in
> trouble if a device without a driver is generating requests. Unless the
> new kernel can run with interrupts entirely disabled? Seems pretty
> unlikely.

At least on x86 & x64 both i8259A and optional IOAPIC are initially
fully masked until a driver decides to open up a line.

If driver initialization fails then it should never open up the line on
the interrupt controller. So this shouldn't be a problem with interrupts
or am I missing something?
Shared interrupt lines do present a problem however as someone else gets
the chance to scream on an open line ultimately killing it, hmmm.

> The real problem is that, in general, hardware _can't_ be reset properly
> by a new kernel. There are things that can help, like the PCI USB quirks
> code. That might be enough to handle the most pressing existing problems;
> certainly it would avoid the USB issues we've seen. (But it needs to be
> updated to avoid interfering with normal operations during
> resume-from-disk.)



2005-04-26 16:11:25

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 02:30:12PM +1000, Benjamin Herrenschmidt wrote:
> Additionally, some machines won't properly park/flush the disk, it's
> necessary to send the proper suspend commands to IDE hard disks prior to
> shutting down or we risk data loss.

That sounds like the disk is caching write data.

SCSI has a "Write Cache Enable" bit in the "caching mode page".
See "scsiinfo -c" output.

10 years ago I measured/compared performance of disabling WCE
(plus queue depth of 8) and enabling WCE (queue depth 1).
It's a wash for the workloads I was testing. I was told "stupid OSs"
that didn't know about tagged queueing needed WCE for benchmarks.
HP cared for exactly the same reasons: Bus Resets or Disk power failure
would cause data loss with WCE enabled.

Disabling WCE on current OSs would solve the above problem
IFF IDE/SATA also supports "cache mode" page.

grant

2005-04-26 16:15:12

by linux-os (Dick Johnson)

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 26 Apr 2005, Grant Grundler wrote:

> On Mon, Apr 25, 2005 at 04:14:13PM -0400, Alan Stern wrote:
>>> Not sure what you mean by "make kexec work nicer" but if it is because
>>> some devices don't work after a kexec I have some objections.
>>
>> That was indeed the reason, at least in my case. The newly-rebooted
>> kernel doesn't work too well when there are active devices, with no driver
>> loaded, doing DMA and issuing IRQs because they were never shut down.
>
> This is also a problem at "normal" boot time. BIOS may leave devices
> still doing DMA if BIOS (or the arch equivalent) was using the device.
> This problem is obvious for systems with an IOMMU (e.g. parisc).
> See drivers/parisc/sba_iommu.c for an example of where I try to
> deal with active DMA at boot time *before* PCI bus walks have occurred.
> Masking IRQs is trivial in comparison to dealing with active DMA.
>
>>> What about the kexec-on-panic?
>
> Same problem - just much more likely to hit the issue and completely
> crash the box or corrupt memory.
>
> grant

DMAs don't go on "forever" and at the time they are started they
are doing DMA to or from memory that is "owned" by the user of
the DMA device. In order for DMAs to continue past that point,
there needs to be something that got notified of a previous
completion to start the next one that you state is erroneous.
If such an erroneous DMA is occurring, it can only occur as
as result of an interrupt and ISR that is still in-place to
reprogram and restart DMA on the interrupting device. Therefore,
all you need to do to quiet any such erroneous DMA operations,
if they are occurring at all, is to mask off the interrupts
on anything that hasn't been initialized yet.

The newer PCI code design has a built-in problem that you
can't find out the interrupt it will use until you enable
the device. This means that there is some possibility of
a race between getting that information and putting in a
ISR to quiet the device which may still be active because
it was used during the boot. It think this is the more
likely scenario than some DMA that is still active.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
Notice : All mail here is now cached for review by Dictator Bush.
98.36% of all statistics are fiction.

2005-04-26 16:23:50

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 12:07:41PM -0400, Richard B. Johnson wrote:
> DMAs don't go on "forever"

They don't. But we also don't know when they will stop.
E.g. NICs will stop DMA when the RX descriptor ring is full.
I don't know when USB stop on it's own.

> and at the time they are started they
> are doing DMA to or from memory that is "owned" by the user of
> the DMA device. In order for DMAs to continue past that point,
> there needs to be something that got notified of a previous
> completion to start the next one that you state is erroneous.
> If such an erroneous DMA is occurring, it can only occur as
> as result of an interrupt and ISR that is still in-place to
> reprogram and restart DMA on the interrupting device.

No. BIOS (parisc PDC in my case) left the device enabled.
PDC does NOT use interrupts which is exactly why they've left
the device enabled for DMA.

> Therefore,
> all you need to do to quiet any such erroneous DMA operations,
> if they are occurring at all, is to mask off the interrupts
> on anything that hasn't been initialized yet.
>
> The newer PCI code design has a built-in problem that you
> can't find out the interrupt it will use until you enable
> the device.

DMA does not need to be enabled to read PCI_INTERRUPT_LINE (config space).
Or grab the IRQ # from pci_dev->irq if PCI is already initialized.

grant

> This means that there is some possibility of
> a race between getting that information and putting in a
> ISR to quiet the device which may still be active because
> it was used during the boot. It think this is the more
> likely scenario than some DMA that is still active.
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.6.11 on an i686 machine (5537.79 BogoMips).
> Notice : All mail here is now cached for review by Dictator Bush.
> 98.36% of all statistics are fiction.

2005-04-26 16:40:53

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 12:04:00PM -0400, Alan Stern wrote:
> > Sure there is. Every IRQ line goes to an IRQ controller.
> > Arch specific code deals with programming the controller and can
> > mask all interrupts (or not). Historically, they've been left unmasked
> > for ISA IRQ discovery and debugging misrouted IRQ lines.
>
> This doesn't help. Consider what happens when two devices share an IRQ
> line. Suppose device B is generating interrupt requests when the driver
> for device A is probed. The driver registers its handler, which causes
> the IRQ line to be unmasked. Then a multitude of IRQs arrive from B, none
> of which can be handled by A's driver. So the kernel shuts the IRQ line
> down permanently...

Agreed - but this is a different problem than "shutting down IRQs".
My point was arch specific code knows how to mask all IRQs.
irq_disable() is expected to work regardless of what state the
driver is in. On kexec "reboot", kernel drivers can unmask IRQs
as they normally would during initialization. No?

grant

2005-04-26 17:12:43

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 26 Apr 2005, Grant Grundler wrote:

> On Tue, Apr 26, 2005 at 12:07:41PM -0400, Richard B. Johnson wrote:
> > DMAs don't go on "forever"
>
> They don't. But we also don't know when they will stop.
> E.g. NICs will stop DMA when the RX descriptor ring is full.
> I don't know when USB stop on it's own.

USB doesn't stop DMA on its own. It goes on forever until it's told to
stop or it encounters an error.

Alan Stern

2005-04-26 17:15:16

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 26 Apr 2005, Grant Grundler wrote:

> Agreed - but this is a different problem than "shutting down IRQs".
> My point was arch specific code knows how to mask all IRQs.
> irq_disable() is expected to work regardless of what state the
> driver is in. On kexec "reboot", kernel drivers can unmask IRQs
> as they normally would during initialization. No?

One has to be careful when talking about enabling/disable IRQs, because
there are (at least) two choke points: one on the device and one on the
computer's interrupt controller. Masking IRQs takes place on the
controller, but I was talking about stopping interrupt requests at their
source on the device. It's the only way to avoid problems when IRQ lines
are shared.

Alan Stern

2005-04-26 17:24:15

by Lee Revell

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, 2005-04-26 at 13:12 -0400, Alan Stern wrote:
> On Tue, 26 Apr 2005, Grant Grundler wrote:
>
> > On Tue, Apr 26, 2005 at 12:07:41PM -0400, Richard B. Johnson wrote:
> > > DMAs don't go on "forever"
> >
> > They don't. But we also don't know when they will stop.
> > E.g. NICs will stop DMA when the RX descriptor ring is full.
> > I don't know when USB stop on it's own.
>
> USB doesn't stop DMA on its own. It goes on forever until it's told to
> stop or it encounters an error.

Ditto sound cards. Once you start capture or playback the device will
DMA to/from the assigned area forever.

Lee

2005-04-26 17:40:43

by Grant Grundler

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 01:14:38PM -0400, Alan Stern wrote:
> On Tue, 26 Apr 2005, Grant Grundler wrote:
>
> > Agreed - but this is a different problem than "shutting down IRQs".
> > My point was arch specific code knows how to mask all IRQs.
> > irq_disable() is expected to work regardless of what state the
> > driver is in. On kexec "reboot", kernel drivers can unmask IRQs
> > as they normally would during initialization. No?
>
> One has to be careful when talking about enabling/disable IRQs, because
> there are (at least) two choke points: one on the device and one on the
> computer's interrupt controller. Masking IRQs takes place on the
> controller, but I was talking about stopping interrupt requests at their
> source on the device. It's the only way to avoid problems when IRQ lines
> are shared.

Yes, got it. I wouldn't object if someone said kexec can not
support shared IRQs unless both drivers sharing the IRQ implement
PCI "suspend" hook (and disable IRQ on the respective devices).
We really don't have many options to fix Shared IRQ problems.

thanks,
grant

2005-04-26 17:53:38

by Dave Jones

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Tue, Apr 26, 2005 at 11:39:39AM +0200, Pavel Machek wrote:

> Well, you can do "half suspend to ram; change your frequency; half
> resume" today, and it should work, but I do not think you'll like the
> speed.

Indeed. With people running things like cpuspeed daemons to dynamically
scale speed, this is going to be really painful.
Of course, any operation where we have to quiesce DMA is going to mean
we're increasing latency around the scaling operation, but we don't
have to go through all the hoops that are necessary when suspending.

Thankfully some of the more recent implementations of speed/voltage
scaling don't have this requirement.

> In a ideal world, calling device_suspend(PMSG_FREEZE) gets you exactly
> that, and we'll do our best to make it fast enough.
>
> OTOH it *needs* to switch consoles to text one (because X may be
> running DMA, right?); I do not think you'll like that one.

That would be insane, and make cpufreq totally useless for anyone
running X, so no. This is one of the reasons the kernel needs to
arbitrate DMA on behalf of X. It just needs someone to do the work.

Dave

2005-04-26 20:23:53

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

Hi!

> > Well, you can do "half suspend to ram; change your frequency; half
> > resume" today, and it should work, but I do not think you'll like the
> > speed.
>
> Indeed. With people running things like cpuspeed daemons to dynamically
> scale speed, this is going to be really painful.
> Of course, any operation where we have to quiesce DMA is going to mean
> we're increasing latency around the scaling operation, but we don't
> have to go through all the hoops that are necessary when suspending.

> Thankfully some of the more recent implementations of speed/voltage
> scaling don't have this requirement.

Good, because some devices really need DMA. (Won't audio skip, and USB
break when you disable DMA? I do not see how cpufreq doing DMA disable
can be usefull.)

> > In a ideal world, calling device_suspend(PMSG_FREEZE) gets you exactly
> > that, and we'll do our best to make it fast enough.
> >
> > OTOH it *needs* to switch consoles to text one (because X may be
> > running DMA, right?); I do not think you'll like that one.
>
> That would be insane, and make cpufreq totally useless for anyone
> running X, so no. This is one of the reasons the kernel needs to
> arbitrate DMA on behalf of X. It just needs someone to do the work.

Yes... But it also looks like a lot of work :-(.
Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-04-26 21:16:13

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-usb-devel] Re: [PATCH] PCI: Add pci shutdown ability


> > Yes. (Actually I'm not sure if PMSG_FREEZE or PMSG_SUSPEND is right
> > thing to do for suspend.)
>
> Until they have different values, it's a moot point isn't it?

They already have different values in my and SuSE trees, and I plan to
propagate that upstream very soon after 2.6.12. Until then... yes its
moot.

Pavel
--
Boycott Kodak -- for their patent abuse against Java.

2005-05-11 05:33:33

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

> I looked into the possibility of having the PCI core disable interrupt
> generation and DMA on each new device as it is discovered. Unfortunately
> there is no dependable, universal way to do this for IRQs. (A notable gap
> in the original PCI specification, IMHO.)

PCI specification 2.3 onwards, command register bit 10 can be used for
disabling the interrupts from respective device. And the very reason for
introducing this bit seems to be to not allow the device issue interrupts
until a suitable driver for the device has been loaded. Have a look at
following message.

http://www.pcisig.com/reflector/msg05302.html

Probably this feature can be used to disable the interrupts from the devices
and enable these back when respective driver is loaded. This will resolve
the problem of drivers not getting initialized in second kernel due to shared
interrupts in kdump and reliability of capturing dump can be increased.

Thanks
Vivek

2005-05-11 14:42:06

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] PCI: Add pci shutdown ability

On Wed, 11 May 2005, Vivek Goyal wrote:

> > I looked into the possibility of having the PCI core disable interrupt
> > generation and DMA on each new device as it is discovered. Unfortunately
> > there is no dependable, universal way to do this for IRQs. (A notable gap
> > in the original PCI specification, IMHO.)
>
> PCI specification 2.3 onwards, command register bit 10 can be used for
> disabling the interrupts from respective device. And the very reason for
> introducing this bit seems to be to not allow the device issue interrupts
> until a suitable driver for the device has been loaded. Have a look at
> following message.
>
> http://www.pcisig.com/reflector/msg05302.html
>
> Probably this feature can be used to disable the interrupts from the devices
> and enable these back when respective driver is loaded. This will resolve
> the problem of drivers not getting initialized in second kernel due to shared
> interrupts in kdump and reliability of capturing dump can be increased.

That's good, and it's certainly a step in the right direction. It won't
help with any pre-PCI-2.3 devices out there but it might be worth trying
to implement.

Alan Stern