2006-10-11 20:41:25

by Alan Stern

[permalink] [raw]
Subject: Bug in PCI core

When a PCI device is suspended, its driver calls pci_save_state() so that
the config space can be restored when the device is resumed. Then the
driver calls pci_set_power_state().

However pci_set_power_state() calls pci_block_user_cfg_access(), and that
routine calls pci_save_state() again. This overwrites the saved state
with data in which memory, I/O, and bus master accesses are disabled. As
a result, when the device is resumed it doesn't work.

Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the
right way to fix it; hopefully somebody else does.

Alan Stern


2006-10-13 01:01:23

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Wed, 2006-10-11 at 16:41 -0400, Alan Stern wrote:
> When a PCI device is suspended, its driver calls pci_save_state() so that
> the config space can be restored when the device is resumed. Then the
> driver calls pci_set_power_state().
>
> However pci_set_power_state() calls pci_block_user_cfg_access(), and that
> routine calls pci_save_state() again. This overwrites the saved state
> with data in which memory, I/O, and bus master accesses are disabled. As
> a result, when the device is resumed it doesn't work.
>
> Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the
> right way to fix it; hopefully somebody else does.

Well, blocking user cfg access snapshots the config space to be able to
respond to user space while the device is offline. Maybe it should be
done from a separate config space image buffer ? ugh....

Ben.


2006-10-13 08:40:35

by Adam M Belay

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 11:01 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2006-10-11 at 16:41 -0400, Alan Stern wrote:
> > When a PCI device is suspended, its driver calls pci_save_state() so that
> > the config space can be restored when the device is resumed. Then the
> > driver calls pci_set_power_state().
> >
> > However pci_set_power_state() calls pci_block_user_cfg_access(), and that
> > routine calls pci_save_state() again. This overwrites the saved state
> > with data in which memory, I/O, and bus master accesses are disabled. As
> > a result, when the device is resumed it doesn't work.
> >
> > Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the
> > right way to fix it; hopefully somebody else does.
>
> Well, blocking user cfg access snapshots the config space to be able to
> respond to user space while the device is offline. Maybe it should be
> done from a separate config space image buffer ? ugh....
>
> Ben.

Personally, I don't think exposing a cached version of the PCI config
space when direct device access is prohibited is the right approach
here. We really shouldn't be lying about the internal state of PCI
devices (the cached version could be quite inaccurate). After all, if
the device is in D3cold, then the spec claims it's perfectly valid for
it to not respond to PCI configuration access.

I can only assume this hack was done to satisfy some terribly broken
userspace app. It's not surprising that even reading PCI config can
easily crash systems. However, it's the responsibility of those apps
with permission to access the PCI sysfs interface, not the kernel, to be
aware of these constraints.

The PCI configuration space cache was originally introduced to support
power management. However, it's mostly incorrect, as it unnecessarily
stores the values of read only registers (and even BIST which is
potentially dangerous). A while back I posted a series of patches that
address this issue, and the net result was that the config cache stays
around wasting memory because of the pci_block_user_cfg_access()
dependency despite being useless to PCI PM.

I'd like to propose that we have the pci config sysfs interface return
-EIO when it's blocked (e.g. active BIST or D3cold). This accurately
reflects the state of the device to userspace, reduces complexity, and
could potentially save some memory per PCI device instance.

Thanks,
Adam


2006-10-13 09:17:16

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core


> Personally, I don't think exposing a cached version of the PCI config
> space when direct device access is prohibited is the right approach
> here. We really shouldn't be lying about the internal state of PCI
> devices (the cached version could be quite inaccurate). After all, if
> the device is in D3cold, then the spec claims it's perfectly valid for
> it to not respond to PCI configuration access.

Yes, but the problem is that lspci etc... will suddenly see a bunch of
ffff's all over the place instead of the device. In fact, I think we do
use -some- cached infos already there but not for everything.

And that breaks .... distro installers :0

For example, currently, if I power off the ethernet of my mac, or the
firewire chip (which are powered off if the module isn't loaded), lspci
will get the device id and vendor id right ... but won't get the class
code.

I'm not sure about kudzu/udev/whoever, I've had problems with distros
not loading the modules because they couldn't find the chip because it
was off because the module wasn't loaded ... (typical of chips like
firewire which are loaded by class code).

So I agree... but :)

In a perfect world were everthing goes via sysfs and we have files that
expose all the necessary cached infos for identifying the device
(vendor, device, subsystem ids, class code, etc... which we do have in
sysfs), then yes, we can probably make config space accesses to an off
device just return ff's or an error (on some platform, they have to be
blocked as they can lockup, happens with some cells on the mac when
unclocked).

> I can only assume this hack was done to satisfy some terribly broken
> userspace app.

Yes, distro HW detection tools mostly.

> It's not surprising that even reading PCI config can
> easily crash systems. However, it's the responsibility of those apps
> with permission to access the PCI sysfs interface, not the kernel, to be
> aware of these constraints.

I agree.

> The PCI configuration space cache was originally introduced to support
> power management. However, it's mostly incorrect, as it unnecessarily
> stores the values of read only registers (and even BIST which is
> potentially dangerous). A while back I posted a series of patches that
> address this issue, and the net result was that the config cache stays
> around wasting memory because of the pci_block_user_cfg_access()
> dependency despite being useless to PCI PM.
>
> I'd like to propose that we have the pci config sysfs interface return
> -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> reflects the state of the device to userspace, reduces complexity, and
> could potentially save some memory per PCI device instance.

We need to enquire which userspace apps have a problem here. It's mostly
a distro matter... or we can force their hand :)

The problem is mostly obsolete crap not using sysfs, like ... lspci :)

Ben.


2006-10-13 09:31:55

by Martin Mares

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Hi!

> For example, currently, if I power off the ethernet of my mac, or the
> firewire chip (which are powered off if the module isn't loaded), lspci
> will get the device id and vendor id right ... but won't get the class
> code.

Ehm, you aren't using any recent pciutils, are you? ;-)

Have a nice fortnight
--
Martin `MJ' Mares <[email protected]> http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Lottery -- a tax on people who can't do math.

2006-10-13 12:29:14

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 11:31 +0200, Martin Mares wrote:
> Hi!
>
> > For example, currently, if I power off the ethernet of my mac, or the
> > firewire chip (which are powered off if the module isn't loaded), lspci
> > will get the device id and vendor id right ... but won't get the class
> > code.
>
> Ehm, you aren't using any recent pciutils, are you? ;-)

Whatever came with the distro that complained about the problem back
then :) I agree that the problem is fixed on the kernel level (sysfs)
and I'm happy to hear that pciutils is fixed too :) So we can probably
do what Adam suggest and just return errors or ff's

Ben.


2006-10-13 14:29:28

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 13 Oct 2006, Adam Belay wrote:

> Personally, I don't think exposing a cached version of the PCI config
> space when direct device access is prohibited is the right approach
> here. We really shouldn't be lying about the internal state of PCI
> devices (the cached version could be quite inaccurate). After all, if
> the device is in D3cold, then the spec claims it's perfectly valid for
> it to not respond to PCI configuration access.
>
> I can only assume this hack was done to satisfy some terribly broken
> userspace app. It's not surprising that even reading PCI config can
> easily crash systems. However, it's the responsibility of those apps
> with permission to access the PCI sysfs interface, not the kernel, to be
> aware of these constraints.

According to the comment in the code, access is blocked only during the
time that the device is making the transition between different power
states.

> The PCI configuration space cache was originally introduced to support
> power management. However, it's mostly incorrect, as it unnecessarily
> stores the values of read only registers (and even BIST which is
> potentially dangerous). A while back I posted a series of patches that
> address this issue, and the net result was that the config cache stays
> around wasting memory because of the pci_block_user_cfg_access()
> dependency despite being useless to PCI PM.
>
> I'd like to propose that we have the pci config sysfs interface return
> -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> reflects the state of the device to userspace, reduces complexity, and
> could potentially save some memory per PCI device instance.

Could you resubmit your old patches and include a corresponding fix for
this access problem?

BTW, is it possible for userspace to try accessing a device in D3cold?
Doesn't the fact that it is D3cold rather than D3hot mean the computer is
turned off? Or have I been missing out on new developments?

Alan Stern

2006-10-13 15:02:32

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern:
> > I'd like to propose that we have the pci config sysfs interface return
> > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> > reflects the state of the device to userspace, reduces complexity, and
> > could potentially save some memory per PCI device instance.
>
> Could you resubmit your old patches and include a corresponding fix for
> this access problem?

And then you can fix the applications it breaks, like the X server which
does actually want to know where all the devices are located in PCI
space.

Alan

2006-10-13 15:30:08

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 16:26 +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern:
> > > I'd like to propose that we have the pci config sysfs interface return
> > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> > > reflects the state of the device to userspace, reduces complexity, and
> > > could potentially save some memory per PCI device instance.
> >
> > Could you resubmit your old patches and include a corresponding fix for
> > this access problem?
>
> And then you can fix the applications it breaks, like the X server which
> does actually want to know where all the devices are located in PCI
> space.
>

.. but which could equally well mmap the resource from sysfs ;)


also the thing this patch does is ONLY when the device is effectively
off the bus return -EIO.
One can argue that -EAGAIN is nicer since it's only a temporary
condition though....


2006-10-13 15:40:10

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Ar Gwe, 2006-10-13 am 17:29 +0200, ysgrifennodd Arjan van de Ven:
> > And then you can fix the applications it breaks, like the X server which
> > does actually want to know where all the devices are located in PCI
> > space.
> >
>
> .. but which could equally well mmap the resource from sysfs ;)

That doesn't help deal with the location and PCI control side of things
X has to perform and deal with. You also forgot to attach the tested
patch set for the X server and other affected apps.

The cached stuff was put in place precisely because stuff broke

2006-10-13 16:24:27

by Adam M Belay

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 17:06 +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 17:29 +0200, ysgrifennodd Arjan van de Ven:
> > > And then you can fix the applications it breaks, like the X server which
> > > does actually want to know where all the devices are located in PCI
> > > space.
> > >
> >
> > .. but which could equally well mmap the resource from sysfs ;)
>
> That doesn't help deal with the location and PCI control side of things
> X has to perform and deal with. You also forgot to attach the tested
> patch set for the X server and other affected apps.
>
> The cached stuff was put in place precisely because stuff broke
>

I agree this needs to be fixed. However, as I previously mentioned,
this isn't the right place to attack the problem. Remember, this wasn't
originally a kernel regression. Rather it's a workaround for a known
X/lspci/whatever bug. It's not the kernel's job to babysit userspace.
If a userspace app that has the proper permissions decides to take a
course of action that could potentially crash the system, then it has a
right to do so. There are probably dozens of vectors for these sorts of
problems (e.g. mmap as Arjan has mentioned) so why stop at the pci
config sysfs interface?

In this specific case, the workaround for this userspace bug actually
makes it impossible for programs that are implemented correctly (i.e.
understand that PCI configuration space can be inaccessible under
certain conditions) from working optimally because the kernel gives
inaccurate PCI config data rather than reporting the reality of the
situation. I'd much rather give correct code the advantage then work
around buggy software that really needs to be fixed directly.

Finally, it's worth noting that this issue is really a corner-case, and
in most systems it's extremely rare that even incorrect userspace apps
would have any issue.

Thanks,
Adam


2006-10-13 16:30:40

by Adam M Belay

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 17:29 +0200, Arjan van de Ven wrote:
> On Fri, 2006-10-13 at 16:26 +0100, Alan Cox wrote:
> > Ar Gwe, 2006-10-13 am 10:29 -0400, ysgrifennodd Alan Stern:
> > > > I'd like to propose that we have the pci config sysfs interface return
> > > > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> > > > reflects the state of the device to userspace, reduces complexity, and
> > > > could potentially save some memory per PCI device instance.
> > >
> > > Could you resubmit your old patches and include a corresponding fix for
> > > this access problem?
> >
> > And then you can fix the applications it breaks, like the X server which
> > does actually want to know where all the devices are located in PCI
> > space.
> >
>
> .. but which could equally well mmap the resource from sysfs ;)
>
>
> also the thing this patch does is ONLY when the device is effectively
> off the bus return -EIO.
> One can argue that -EAGAIN is nicer since it's only a temporary
> condition though....
>
>

Yeah, perhaps -EAGAIN would be more appropriate, especially in the power
state transition and BIST cases. An interesting possibility might be to
have the file actually block, although I'm not sure if the O_NONBLOCK
flag or polling for that matter can be supported through the
sysfs/driver-core API.

-Adam


2006-10-13 16:36:38

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, Oct 13, 2006 at 12:34:20PM -0400, Adam Belay wrote:
> I agree this needs to be fixed. However, as I previously mentioned,
> this isn't the right place to attack the problem. Remember, this wasn't
> originally a kernel regression. Rather it's a workaround for a known
> X/lspci/whatever bug. It's not the kernel's job to babysit userspace.
> If a userspace app that has the proper permissions decides to take a
> course of action that could potentially crash the system, then it has a
> right to do so. There are probably dozens of vectors for these sorts of
> problems (e.g. mmap as Arjan has mentioned) so why stop at the pci
> config sysfs interface?

The patch I posted (to deny user access while the device is
transitioning D-states) is to fix a bug where *any* local user can bring
the system into undefined territory, simply by typing lspci at the right
moment. No special permission is needed.

I hadn't realised that pci_block_user_cfg_access() would call
pci_save_state(). There's only one other user of pci_block_user_cfg_access()
-- drivers/scsi/ipr.c and I think it could be induced to call
pci_save_state() itself. It's an odd asymmetry anyway -- block calls
save state, but unblock doesn't call restore_state.

2006-10-13 16:43:06

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay:
> I agree this needs to be fixed. However, as I previously mentioned,
> this isn't the right place to attack the problem. Remember, this wasn't
> originally a kernel regression. Rather it's a workaround for a known

It's a kernel regression. It used to be reliable to read X resource
addresses at any time.

> Finally, it's worth noting that this issue is really a corner-case, and
> in most systems it's extremely rare that even incorrect userspace apps
> would have any issue.

Except just occasionally and randomly in the field, probably almost
undebuggable and irreproducable - the very worst conceivable kind of
bug.

2006-10-13 16:49:36

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, Oct 13, 2006 at 06:09:09PM +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay:
> > I agree this needs to be fixed. However, as I previously mentioned,
> > this isn't the right place to attack the problem. Remember, this wasn't
> > originally a kernel regression. Rather it's a workaround for a known
>
> It's a kernel regression. It used to be reliable to read X resource
> addresses at any time.

No it didn't. It's undefined behaviour to perform *any* PCI config
access to the device while it's doing a D-state transition. It may have
happened to work with the chips you tried it with, but more likely you
never hit that window because X simply didn't try to do that.

> > Finally, it's worth noting that this issue is really a corner-case, and
> > in most systems it's extremely rare that even incorrect userspace apps
> > would have any issue.
>
> Except just occasionally and randomly in the field, probably almost
> undebuggable and irreproducable - the very worst conceivable kind of
> bug.

Indeed. Only now we have software producing it, rather than hardware
producing it. That's actually an improvement I think, since it forces
awareness of the issue.

2006-10-13 16:52:09

by Adam M Belay

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 18:09 +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 12:34 -0400, ysgrifennodd Adam Belay:
> > I agree this needs to be fixed. However, as I previously mentioned,
> > this isn't the right place to attack the problem. Remember, this wasn't
> > originally a kernel regression. Rather it's a workaround for a known
>
> It's a kernel regression. It used to be reliable to read X resource
> addresses at any time.

Not true, reading these registers during a BIST has always been a
problem.

>
> > Finally, it's worth noting that this issue is really a corner-case, and
> > in most systems it's extremely rare that even incorrect userspace apps
> > would have any issue.
>
> Except just occasionally and randomly in the field, probably almost
> undebuggable and irreproducable - the very worst conceivable kind of
> bug.
>

Which is why returning an error code during device transitions is a
reasonable compromise between correctness and practicality.

-Adam


2006-10-13 17:08:48

by Alan

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox:
> No it didn't. It's undefined behaviour to perform *any* PCI config
> access to the device while it's doing a D-state transition. It may have

I think you missed the earlier parts of the story - the kernel caches
the base config register state.

> happened to work with the chips you tried it with, but more likely you
> never hit that window because X simply didn't try to do that.

Which is why the kernel caches the register state. This all came up long
ago and the solution we currently have was the one chosen after
considerable debate and analysis about things like locking. We preserved
the historical reliable interface going back to the early Linux PCI
support and used by all the apps.


There are several problems with making it return an error

- What does user space do ?

while(pci_...() == -EAGAIN) yield();

which is useful how - there is no select operation for waiting here, and
while it could be added it just gets uglier

- Who actually wants to get an error in that specific case ?

If you can find someone who desperately wants an error code then code in
O_DIRECT support to do it and preserve the existing sane API.

The job of the kernel is not to expose hardware directly, it is to
provide sane interfaces to it. We don't have separate interfaces to
conf1, conf2, pcibios etc for good reason. Exposing everyone to ugly
minor details of the PCI transition handling isn't progress.


Alan

2006-10-13 17:14:09

by Arjan van de Ven

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox:
> > No it didn't. It's undefined behaviour to perform *any* PCI config
> > access to the device while it's doing a D-state transition. It may have
>
> I think you missed the earlier parts of the story - the kernel caches
> the base config register state.
>
> > happened to work with the chips you tried it with, but more likely you
> > never hit that window because X simply didn't try to do that.
>
> Which is why the kernel caches the register state.

but... it didn't USE the cache in the case we're protecting here.
Instead the hardware would just go splat.


2006-10-13 17:57:52

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 13 Oct 2006, Alan Cox wrote:

> Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox:
> > No it didn't. It's undefined behaviour to perform *any* PCI config
> > access to the device while it's doing a D-state transition. It may have
>
> I think you missed the earlier parts of the story - the kernel caches
> the base config register state.
>
> > happened to work with the chips you tried it with, but more likely you
> > never hit that window because X simply didn't try to do that.
>
> Which is why the kernel caches the register state. This all came up long
> ago and the solution we currently have was the one chosen after
> considerable debate and analysis about things like locking. We preserved
> the historical reliable interface going back to the early Linux PCI
> support and used by all the apps.

Would it be okay for pci_block_user_cfg_access() to use its own cache, so
it doesn't interfere with data previously cached by pci_save_state()?

Alan Stern

2006-10-13 19:18:56

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, Oct 13, 2006 at 01:57:48PM -0400, Alan Stern wrote:
> Would it be okay for pci_block_user_cfg_access() to use its own cache, so
> it doesn't interfere with data previously cached by pci_save_state()?

My suggestion is just to require that the callers have previously called
pci_save_state(). The PCI PM stack already has, and it's a one-line
change to the IPR driver.

2006-10-13 19:20:35

by Adam M Belay

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote:
Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox:
> > No it didn't. It's undefined behaviour to perform *any* PCI config
> > access to the device while it's doing a D-state transition. It may have
>
> I think you missed the earlier parts of the story - the kernel caches
> the base config register state.
>
> > happened to work with the chips you tried it with, but more likely you
> > never hit that window because X simply didn't try to do that.
>
> Which is why the kernel caches the register state. This all came up long
> ago and the solution we currently have was the one chosen after
> considerable debate and analysis about things like locking. We preserved
> the historical reliable interface going back to the early Linux PCI
> support and used by all the apps.
>
>
> There are several problems with making it return an error
>
> - What does user space do ?
>
> while(pci_...() == -EAGAIN) yield();
>
> which is useful how - there is no select operation for waiting here, and
> while it could be added it just gets uglier
>

If the sysfs file blocked, this could be handled quite cleanly, and
would reflect accurate PCI config state.

> - Who actually wants to get an error in that specific case ?
>

Let's say the device is in D3cold (i.e. the parent bridge has been
powered down). In that case, you might want to get an error (probably
-EIO, but maybe FF...). A buffered copy would be incorrect if used by a
userspace driver, as this would be hiding a legitimate failure
condition.

> If you can find someone who desperately wants an error code then code in
> O_DIRECT support to do it and preserve the existing sane API.
>
> The job of the kernel is not to expose hardware directly, it is to
> provide sane interfaces to it. We don't have separate interfaces to
> conf1, conf2, pcibios etc for good reason. Exposing everyone to ugly
> minor details of the PCI transition handling isn't progress.
>

I suppose we have very different ideas about the actual role and purpose
of this sysfs interface. As I see it, it provides direct access to
hardware for userspace device drivers (software that actually cares
about the ugly PCI details). It's much lower-level than the highly
abstracted "vendor", "device", "resourceX", etc. interfaces. As such,
it's very important that it accurately reflects what's actually going on
in hardware, even if this is of potentially greater hassle to userspace.
Now that's not to suggest that we shouldn't block this interface when
making a power state transition. But I think it's best to expose the
hardware failure and powered off cases as errors.

On the other hand you seem to suggest that it is a potentially
approximate cache of the pci config space that primarily serves to
provide pci configuration data to userspace hardware detection
mechanisms. However, in this case, I think it may as well be marked as
deprecated, as it's clearly inferior to the higher order sysfs
attributes ("vendor", "device", "irq", "class", etc.) with regard to
accuracy, code complexity (both for the kernel and userspace), and
ease-of-use. In other words, I don't see a reason any userspace app
should ever use it other than for debugging (i.e. lspci).

Adam


2006-10-13 20:48:55

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Hi!

> > The PCI configuration space cache was originally introduced to support
> > power management. However, it's mostly incorrect, as it unnecessarily
> > stores the values of read only registers (and even BIST which is
> > potentially dangerous). A while back I posted a series of patches that
> > address this issue, and the net result was that the config cache stays
> > around wasting memory because of the pci_block_user_cfg_access()
> > dependency despite being useless to PCI PM.
> >
> > I'd like to propose that we have the pci config sysfs interface return
> > -EIO when it's blocked (e.g. active BIST or D3cold). This accurately
> > reflects the state of the device to userspace, reduces complexity, and
> > could potentially save some memory per PCI device instance.
>
> Could you resubmit your old patches and include a corresponding fix for
> this access problem?
>
> BTW, is it possible for userspace to try accessing a device in D3cold?
> Doesn't the fact that it is D3cold rather than D3hot mean the computer is
> turned off? Or have I been missing out on new developments?

I'm not sure about D3cold vs. D3hot... IIRC D3hot means that it is
possible to wake up the system, while D3cold means slot is completely
powered down.
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2006-10-13 20:59:49

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 13 Oct 2006, Matthew Wilcox wrote:

> On Fri, Oct 13, 2006 at 01:57:48PM -0400, Alan Stern wrote:
> > Would it be okay for pci_block_user_cfg_access() to use its own cache, so
> > it doesn't interfere with data previously cached by pci_save_state()?
>
> My suggestion is just to require that the callers have previously called
> pci_save_state(). The PCI PM stack already has, and it's a one-line
> change to the IPR driver.

Okay. Would you like to write a patch with that fix? Be sure to add a
comment explaining the need for a previous call to pci_save_state().

At least it will get things going for now, even if it isn't perfectly
correct in the long run.

Alan Stern

2006-10-13 23:04:09

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 18:34 +0100, Alan Cox wrote:
> Ar Gwe, 2006-10-13 am 10:49 -0600, ysgrifennodd Matthew Wilcox:
> > No it didn't. It's undefined behaviour to perform *any* PCI config
> > access to the device while it's doing a D-state transition. It may have
>
> I think you missed the earlier parts of the story - the kernel caches
> the base config register state.
>
> > happened to work with the chips you tried it with, but more likely you
> > never hit that window because X simply didn't try to do that.
>
> Which is why the kernel caches the register state. This all came up long
> ago and the solution we currently have was the one chosen after
> considerable debate and analysis about things like locking. We preserved
> the historical reliable interface going back to the early Linux PCI
> support and used by all the apps.

Well, we have two different things here.

One is short term block. For example, PM transitions, or BIST. In that
case, I reckon it might be worth just making the user space PCI config
space accessors block in the kernel during the transition (a wait
queue ?)

One is long term block: the device is off. That's where it becomes
tricky. For D3, I suppose it's actually correct to return cached infos
provided that those do actually cache the PM capability indicating D3
state (that is we need to update the cache after the transition). And
it's correct to prevent writes too I suppose.

Then there are problems with things like embedded or some Apple ASICs
where we toggle power/clock lines of devices but not directly using PCI
PM (in fact, those devices might not even have PCI PM capability
exposed). Returning cached info is fine, but we can't tell userland
about the powered off (or unclocked) state of the device that way.

Ben.


2006-10-14 02:33:07

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Sat, 14 Oct 2006, Benjamin Herrenschmidt wrote:

> Well, we have two different things here.
>
> One is short term block. For example, PM transitions, or BIST. In that
> case, I reckon it might be worth just making the user space PCI config
> space accessors block in the kernel during the transition (a wait
> queue ?)

That seems like a reasonable thing to do. (BTW, can anyone explain
quickly what "BIST" means?)

> One is long term block: the device is off. That's where it becomes
> tricky. For D3, I suppose it's actually correct to return cached infos
> provided that those do actually cache the PM capability indicating D3
> state (that is we need to update the cache after the transition). And
> it's correct to prevent writes too I suppose.
>
> Then there are problems with things like embedded or some Apple ASICs
> where we toggle power/clock lines of devices but not directly using PCI
> PM (in fact, those devices might not even have PCI PM capability
> exposed). Returning cached info is fine, but we can't tell userland
> about the powered off (or unclocked) state of the device that way.

Now you're starting to tread in the dangerous waters of runtime PM
userspace APIs. So far nobody has figured out a good general way of
exposing internal power states to userspace. There may not even be such a
thing.

Alan Stern

2006-10-14 03:04:50

by Roland Dreier

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

Alan> That seems like a reasonable thing to do. (BTW, can anyone
Alan> explain quickly what "BIST" means?)

"Built-In Self Test" -- make the device go away and run some internal
tests for a while and then report back.

2006-10-14 03:07:48

by Matthew Wilcox

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, Oct 13, 2006 at 10:33:04PM -0400, Alan Stern wrote:
> That seems like a reasonable thing to do. (BTW, can anyone explain
> quickly what "BIST" means?)

Built In Self Test. Devices can take up to 2 seconds to complete the
test, and some drop off the PCI bus while doing it.

2006-10-14 03:15:09

by Bill Randle

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, 2006-10-13 at 22:33 -0400, Alan Stern wrote:
> That seems like a reasonable thing to do. (BTW, can anyone explain
> quickly what "BIST" means?)

Built-In Self Test

-Bill


2006-10-14 05:36:40

by Greg KH

[permalink] [raw]
Subject: Re: Bug in PCI core

On Wed, Oct 11, 2006 at 04:41:22PM -0400, Alan Stern wrote:
> When a PCI device is suspended, its driver calls pci_save_state() so that
> the config space can be restored when the device is resumed. Then the
> driver calls pci_set_power_state().
>
> However pci_set_power_state() calls pci_block_user_cfg_access(), and that
> routine calls pci_save_state() again. This overwrites the saved state
> with data in which memory, I/O, and bus master accesses are disabled. As
> a result, when the device is resumed it doesn't work.
>
> Obviously pci_block_user_cfg_access() needs to be fixed. I don't know the
> right way to fix it; hopefully somebody else does.

Yeah, I'm just going to drop the patch from Matthew that added this.
Andrew also noted that it causes bad things to happen on his laptop.

Thanks for pointing it out.

I gotta get suspend working on a machine here so I can test these things
better...

thanks,

greg k-h

2006-10-14 05:47:52

by Greg KH

[permalink] [raw]
Subject: Re: [linux-pm] Bug in PCI core

On Fri, Oct 13, 2006 at 10:33:04PM -0400, Alan Stern wrote:
> (BTW, can anyone explain quickly what "BIST" means?)

"Built In Self Test"