2020-07-14 19:13:46

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

[trimmed the cc list; it's still too large but maybe arch folks care]

On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> <[email protected]> wrote:
> > This goal of these series is to move the definition of *all*
> > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use
> > within there. All other tree specific definition will be left for
> > intact. Maybe they can be renamed.
> >
> > PCIBIOS* is an x86 concept as defined by the PCI spec. The
> > returned error codes of PCIBIOS* are positive values and this
> > introduces some complexities which other archs need not incur.
>
> I think the intention is good, but I find the series in its current
> form very hard to review, in particular the way you touch some
> functions three times with trivial changes. Instead of
>
> 1) replace PCIBIOS_SUCCESSFUL with 0
> 2) drop pointless 0-comparison
> 3) reformat whitespace
>
> I would suggest to combine the first two steps into one patch per
> subsystem and drop the third step.

I agree. BUT please don't just run out and post new patches to do
this. Let's talk about Arnd's further ideas below first.

> ...
> Maybe the work can be split up differently, with a similar end
> result but fewer and easier reviewed patches. The way I'd look at
> the problem, there are three main areas that can be dealt with one
> at a time:
>
> a) callers of the high-level config space accessors
> pci_{write,read}_config_{byte,word,dword}, mostly in device
> drivers.
> b) low-level implementation of the config space accessors
> through struct pci_ops
> c) all other occurrences of these constants
>
> Starting with a), my first question is whether any high-level
> drivers even need to care about errors from these functions. I see
> 4913 callers that ignore the return code, and 576 that actually
> check it, and almost none care about the specific error (as you
> found as well). Unless we conclude that most PCI drivers are wrong,
> could we just change the return type to 'void' and assume they never
> fail for valid arguments on a valid pci_device* ?

I really like this idea.

pci_write_config_*() has one return value, and only 100ish of 2500
callers check for errors. It's sometimes possible for config
accessors to detect PCI errors and return failure, e.g., device was
removed or didn't respond, but most of them don't, and detecting these
errors is not really that valuable.

pci_read_config_*() is much more interesting because it returns two
things, the function return value and the value read from the PCI
device, and it's complicated to check both.

Again it's sometimes possible for config read accessors to detect PCI
errors, but in most cases a PCI error means the accessor returns
success and the value from PCI is ~0.

Checking the function return value catches programming errors (bad
alignment, etc) but misses most of the interesting errors (device was
unplugged or reported a PCI error).

Checking the value returned from PCI is tricky because ~0 is a valid
value for some config registers, and only the driver knows for sure.
If the driver knows that ~0 is a possible value, it would have to do
something else, e.g., another config read of a register that *cannot*
be ~0, to see whether it's really an error.

I suspect that if we had a single value to look at it would be easier
to get right. Error checking with current interface would look like
this:

err = pci_read_config_word(dev, addr, &val);
if (err)
return -EINVAL;

if (PCI_POSSIBLE_ERROR(val)) {
/* if driver knows ~0 is invalid */
return -EINVAL;

/* if ~0 is potentially a valid value */
err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
if (err)
return -EINVAL;

if (PCI_POSSIBLE_ERROR(val2))
return -EINVAL;
}

Error checking with a possible interface that returned only a single
value could look like this:

val = pci_config_read_word(dev, addr);
if (PCI_POSSIBLE_ERROR(val)) {
/* if driver knows ~0 is invalid */
return -EINVAL;

/* if ~0 is potentially a valid value */
val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
if (PCI_POSSIBLE_ERROR(val2))
return -EINVAL;
}

Am I understanding you correctly?

> For b), it might be nice to also change other aspects of the
> interface, e.g. passing a pci_host_bridge pointer plus bus number
> instead of a pci_bus pointer, or having the callback in the
> pci_host_bridge structure.

I like this idea a lot, too. I think the fact that
pci_bus_read_config_word() requires a pci_bus * complicates things in
a few places.

I think it's completely separate, as you say, and we should defer it
for now because even part a) is a lot of work. I added it to my list
of possible future projects.

Bjorn


2020-07-14 21:05:27

by Kjetil Oftedal

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On 14/07/2020, Bjorn Helgaas <[email protected]> wrote:

>>
>> a) callers of the high-level config space accessors
>> pci_{write,read}_config_{byte,word,dword}, mostly in device
>> drivers.
>> b) low-level implementation of the config space accessors
>> through struct pci_ops
>> c) all other occurrences of these constants
>>
>> Starting with a), my first question is whether any high-level
>> drivers even need to care about errors from these functions. I see
>> 4913 callers that ignore the return code, and 576 that actually
>> check it, and almost none care about the specific error (as you
>> found as well). Unless we conclude that most PCI drivers are wrong,
>> could we just change the return type to 'void' and assume they never
>> fail for valid arguments on a valid pci_device* ?
>
> I really like this idea.
>
> pci_write_config_*() has one return value, and only 100ish of 2500
> callers check for errors. It's sometimes possible for config
> accessors to detect PCI errors and return failure, e.g., device was
> removed or didn't respond, but most of them don't, and detecting these
> errors is not really that valuable.
>
> pci_read_config_*() is much more interesting because it returns two
> things, the function return value and the value read from the PCI
> device, and it's complicated to check both.
>
> Again it's sometimes possible for config read accessors to detect PCI
> errors, but in most cases a PCI error means the accessor returns
> success and the value from PCI is ~0.
>
> Checking the function return value catches programming errors (bad
> alignment, etc) but misses most of the interesting errors (device was
> unplugged or reported a PCI error).
>
> Checking the value returned from PCI is tricky because ~0 is a valid
> value for some config registers, and only the driver knows for sure.
> If the driver knows that ~0 is a possible value, it would have to do
> something else, e.g., another config read of a register that *cannot*
> be ~0, to see whether it's really an error.
>
> I suspect that if we had a single value to look at it would be easier
> to get right. Error checking with current interface would look like
> this:
>
> err = pci_read_config_word(dev, addr, &val);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Error checking with a possible interface that returned only a single
> value could look like this:
>
> val = pci_config_read_word(dev, addr);
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Am I understanding you correctly?

Let us not do this. Reading config space is really expensive on some
architectures. Requiring a driver to do it twice on some values does not
improve upon that situation. And is quite redundant if the Root Complex
driver already knows that the first access has failed.

Additionally since multiple config accesses to the same devices is not
allowed in the spec, the hardware must block and wait for a timeout if
a config access does not get a response.
(Can happen if a intermediate link between the RC and endpoint has to retrain)
Having to block twice is very much not ideal. And in the case with
retraining the secondary access might even succeed. As the link might
recover between reading the first config word and reading PCI_VENDOR_ID.
Thus allowing the driver to accept invalid data from the device.

>
>> For b), it might be nice to also change other aspects of the
>> interface, e.g. passing a pci_host_bridge pointer plus bus number
>> instead of a pci_bus pointer, or having the callback in the
>> pci_host_bridge structure.
>
> I like this idea a lot, too. I think the fact that
> pci_bus_read_config_word() requires a pci_bus * complicates things in
> a few places.
>
> I think it's completely separate, as you say, and we should defer it
> for now because even part a) is a lot of work. I added it to my list
> of possible future projects.
>

What about strange PCI devices such as Non-Transparent bridges?
They will require their own PCI Config space accessors that is not
connected to a host bridge if one wants to do some sort of
punch-through enumeration.
I guess the kernel doesn't care much about them?

Best regards,
Kjetil Oftedal

2020-07-14 22:03:23

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Tue, Jul 14, 2020 at 8:45 PM Bjorn Helgaas <[email protected]> wrote:
> On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> > Starting with a), my first question is whether any high-level
> > drivers even need to care about errors from these functions. I see
> > 4913 callers that ignore the return code, and 576 that actually
> > check it, and almost none care about the specific error (as you
> > found as well). Unless we conclude that most PCI drivers are wrong,
> > could we just change the return type to 'void' and assume they never
> > fail for valid arguments on a valid pci_device* ?
>
> I really like this idea.
>
> pci_write_config_*() has one return value, and only 100ish of 2500
> callers check for errors. It's sometimes possible for config
> accessors to detect PCI errors and return failure, e.g., device was
> removed or didn't respond, but most of them don't, and detecting these
> errors is not really that valuable.
>
> pci_read_config_*() is much more interesting because it returns two
> things, the function return value and the value read from the PCI
> device, and it's complicated to check both.
>
> Again it's sometimes possible for config read accessors to detect PCI
> errors, but in most cases a PCI error means the accessor returns
> success and the value from PCI is ~0.
>
> Checking the function return value catches programming errors (bad
> alignment, etc) but misses most of the interesting errors (device was
> unplugged or reported a PCI error).

My thinking was more that most of the time the error checking may
be completely bogus to start with, and I would just not check for
errors at all.

> Checking the value returned from PCI is tricky because ~0 is a valid
> value for some config registers, and only the driver knows for sure.
> If the driver knows that ~0 is a possible value, it would have to do
> something else, e.g., another config read of a register that *cannot*
> be ~0, to see whether it's really an error.
>
> I suspect that if we had a single value to look at it would be easier
> to get right. Error checking with current interface would look like
> this:
>
> err = pci_read_config_word(dev, addr, &val);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Error checking with a possible interface that returned only a single
> value could look like this:
>
> val = pci_config_read_word(dev, addr);
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Am I understanding you correctly?

That would require changing all callers of the function, which
I think would involve changing some 700 files. What I was
suggesting was to only change the return type to void and
categorize all drivers that today check it as either

a) checking the return code is not helpful, or possibly even
wrong, so we just stop doing it. I expect those to be the
vast majority of callers, but that could be wrong.

b) Code that legitimately check the error code and need to
take an appropriate action. These could be changed to
calling a different interface such as 'pci_bus_read_config_word'
or a new 'pci_device_last_error()' function.

The reasons I suspect that most callers don't actually need
to check for errors are:

- Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER
only happens if you pass an invalid register number, but most
callers pass a compile-time constant register number that is
known to be correct, or the driver would never work. Similarly,
PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen
since you pass a valid pci_device pointer that was already
probed.

- config space accesses are very rare compared to memory
space access and on the hardware side the error handling
would be similar, but readl/writel don't return errors, they just
access wrong registers or return 0xffffffff.
arch/powerpc/kernel/eeh.c has a ton extra code written to
deal with it, but no other architectures do.

- If we add code to detect errors in pci_read_config_*
and do some of the stuff from powerpc's
eeh_dev_check_failure(), we are more likely to catch
intermittent failures when drivers don't check, or bugs
with invalid arguments in device drivers than relying on
drivers to get their error handling right when those code
paths don't ever get covered in normal testing.

Looking at a couple of random drivers that do check the
return codes, I find:

drivers/edac/amd8131_edac.c: prints the register number,
then keeps going. This is not useful

drivers/net/ethernet/mellanox/mlx4/reset.c: error handling
in mlx4_reset() seems reasonable, but it gets called
from mlx4_pci_resume(), which has a 'void' return code and
cannot propagate the error further. My guess is that it
would try to keep going after a failed resume and run into
random other problems then.

drivers/ata/pata_cs5536.c: error code gets passed to
caller and then always ignored. Can clearly be changed

drivers/net/wireless/intersil/prism54/islpci_hotplug.c:
Out of two calls, only one is checked, which seems bogus

drivers/usb/host/pci-quirks.c: only one of many instances
has a check, again this seems bogus.

drivers/leds/leds-ss4200.c: called from probe(), which
seems to correctly deal with errors by failing the probe.
Not sure this can ever fail though, since the driver only does
it after pci_enable_device() succeeds first. Note that
pci_enable_device() ignores pci_read_config_byte()
errors but sanity-checks the register contents/

Arnd

2020-07-14 23:17:38

by Rob Herring (Arm)

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Tue, Jul 14, 2020 at 12:45 PM Bjorn Helgaas <[email protected]> wrote:
>
> [trimmed the cc list; it's still too large but maybe arch folks care]
>
> On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> > <[email protected]> wrote:
> > > This goal of these series is to move the definition of *all*
> > > PCIBIOS* from include/linux/pci.h to arch/x86 and limit their use
> > > within there. All other tree specific definition will be left for
> > > intact. Maybe they can be renamed.
> > >
> > > PCIBIOS* is an x86 concept as defined by the PCI spec. The
> > > returned error codes of PCIBIOS* are positive values and this
> > > introduces some complexities which other archs need not incur.
> >
> > I think the intention is good, but I find the series in its current
> > form very hard to review, in particular the way you touch some
> > functions three times with trivial changes. Instead of
> >
> > 1) replace PCIBIOS_SUCCESSFUL with 0
> > 2) drop pointless 0-comparison
> > 3) reformat whitespace
> >
> > I would suggest to combine the first two steps into one patch per
> > subsystem and drop the third step.
>
> I agree. BUT please don't just run out and post new patches to do
> this. Let's talk about Arnd's further ideas below first.
>
> > ...
> > Maybe the work can be split up differently, with a similar end
> > result but fewer and easier reviewed patches. The way I'd look at
> > the problem, there are three main areas that can be dealt with one
> > at a time:
> >
> > a) callers of the high-level config space accessors
> > pci_{write,read}_config_{byte,word,dword}, mostly in device
> > drivers.
> > b) low-level implementation of the config space accessors
> > through struct pci_ops
> > c) all other occurrences of these constants
> >
> > Starting with a), my first question is whether any high-level
> > drivers even need to care about errors from these functions. I see
> > 4913 callers that ignore the return code, and 576 that actually
> > check it, and almost none care about the specific error (as you
> > found as well). Unless we conclude that most PCI drivers are wrong,
> > could we just change the return type to 'void' and assume they never
> > fail for valid arguments on a valid pci_device* ?
>
> I really like this idea.
>
> pci_write_config_*() has one return value, and only 100ish of 2500
> callers check for errors. It's sometimes possible for config
> accessors to detect PCI errors and return failure, e.g., device was
> removed or didn't respond, but most of them don't, and detecting these
> errors is not really that valuable.
>
> pci_read_config_*() is much more interesting because it returns two
> things, the function return value and the value read from the PCI
> device, and it's complicated to check both.
>
> Again it's sometimes possible for config read accessors to detect PCI
> errors, but in most cases a PCI error means the accessor returns
> success and the value from PCI is ~0.
>
> Checking the function return value catches programming errors (bad
> alignment, etc) but misses most of the interesting errors (device was
> unplugged or reported a PCI error).
>
> Checking the value returned from PCI is tricky because ~0 is a valid
> value for some config registers, and only the driver knows for sure.
> If the driver knows that ~0 is a possible value, it would have to do
> something else, e.g., another config read of a register that *cannot*
> be ~0, to see whether it's really an error.
>
> I suspect that if we had a single value to look at it would be easier
> to get right. Error checking with current interface would look like
> this:
>
> err = pci_read_config_word(dev, addr, &val);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
> if (err)
> return -EINVAL;
>
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Error checking with a possible interface that returned only a single
> value could look like this:
>
> val = pci_config_read_word(dev, addr);
> if (PCI_POSSIBLE_ERROR(val)) {
> /* if driver knows ~0 is invalid */
> return -EINVAL;
>
> /* if ~0 is potentially a valid value */
> val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
> if (PCI_POSSIBLE_ERROR(val2))
> return -EINVAL;
> }
>
> Am I understanding you correctly?
>
> > For b), it might be nice to also change other aspects of the
> > interface, e.g. passing a pci_host_bridge pointer plus bus number
> > instead of a pci_bus pointer, or having the callback in the
> > pci_host_bridge structure.
>
> I like this idea a lot, too. I think the fact that
> pci_bus_read_config_word() requires a pci_bus * complicates things in
> a few places.

I've been looking at the various host implementations of config
accessors as well as probe functions. Needing the pci_bus pointer is a
big reason why host drivers will have 2 sets of config accessors or
don't use the generic ones. Often that's just for the root bus config
space init before pci_host_probe() is called. Perhaps that's better
addressed with a fixup hook for the host bridge? ftpci100i is a good
example of this.

The root bus accesses are often different from the rest of config
space. Determining if an access is for the root bus or not is all over
the map, but often involves a private bus number variable. I have a
series to use pci_is_root_bus() instead and eliminate a bunch of bus
number handling in the host drivers (I'm sure there's a bunch of hosts
that would be broken if the root bus is not 0). The majority of hosts
don't really need to know anything about the bus number. The more I've
thought about it, it would be better if the PCI core handled this and
picked the right ops to call. We already have several cases of host
drivers with their own ops for this and we could eliminate several
layers of indirection (looking at you, DWC). Any thoughts on direction
here would be helpful.

> I think it's completely separate, as you say, and we should defer it
> for now because even part a) is a lot of work. I added it to my list
> of possible future projects.

Got that published somewhere? :)

Rob

2020-07-15 00:10:52

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

[+cc Kjetil]

On Wed, Jul 15, 2020 at 12:01:56AM +0200, Arnd Bergmann wrote:
> On Tue, Jul 14, 2020 at 8:45 PM Bjorn Helgaas <[email protected]> wrote:
> > On Mon, Jul 13, 2020 at 05:08:10PM +0200, Arnd Bergmann wrote:
> > > On Mon, Jul 13, 2020 at 3:22 PM Saheed O. Bolarinwa
> > > Starting with a), my first question is whether any high-level
> > > drivers even need to care about errors from these functions. I see
> > > 4913 callers that ignore the return code, and 576 that actually
> > > check it, and almost none care about the specific error (as you
> > > found as well). Unless we conclude that most PCI drivers are wrong,
> > > could we just change the return type to 'void' and assume they never
> > > fail for valid arguments on a valid pci_device* ?
> >
> > I really like this idea.
> >
> > pci_write_config_*() has one return value, and only 100ish of 2500
> > callers check for errors. It's sometimes possible for config
> > accessors to detect PCI errors and return failure, e.g., device was
> > removed or didn't respond, but most of them don't, and detecting these
> > errors is not really that valuable.
> >
> > pci_read_config_*() is much more interesting because it returns two
> > things, the function return value and the value read from the PCI
> > device, and it's complicated to check both.
> >
> > Again it's sometimes possible for config read accessors to detect PCI
> > errors, but in most cases a PCI error means the accessor returns
> > success and the value from PCI is ~0.
> >
> > Checking the function return value catches programming errors (bad
> > alignment, etc) but misses most of the interesting errors (device was
> > unplugged or reported a PCI error).
>
> My thinking was more that most of the time the error checking may
> be completely bogus to start with, and I would just not check for
> errors at all.

Yes. I have no problem with that. There are a few cases where it's
important to check for errors, e.g., we read a status register and do
something based on a bit being set. A failure will return all bits
set, and we may do the wrong thing. But most of the errors we care
about will be on MMIO reads, not config reads, so we can probably
ignore most config read errors.

> > Checking the value returned from PCI is tricky because ~0 is a valid
> > value for some config registers, and only the driver knows for sure.
> > If the driver knows that ~0 is a possible value, it would have to do
> > something else, e.g., another config read of a register that *cannot*
> > be ~0, to see whether it's really an error.
> >
> > I suspect that if we had a single value to look at it would be easier
> > to get right. Error checking with current interface would look like
> > this:
> >
> > err = pci_read_config_word(dev, addr, &val);
> > if (err)
> > return -EINVAL;
> >
> > if (PCI_POSSIBLE_ERROR(val)) {
> > /* if driver knows ~0 is invalid */
> > return -EINVAL;
> >
> > /* if ~0 is potentially a valid value */
> > err = pci_read_config_word(dev, PCI_VENDOR_ID, &val2);
> > if (err)
> > return -EINVAL;
> >
> > if (PCI_POSSIBLE_ERROR(val2))
> > return -EINVAL;
> > }
> >
> > Error checking with a possible interface that returned only a single
> > value could look like this:
> >
> > val = pci_config_read_word(dev, addr);
> > if (PCI_POSSIBLE_ERROR(val)) {
> > /* if driver knows ~0 is invalid */
> > return -EINVAL;
> >
> > /* if ~0 is potentially a valid value */
> > val2 = pci_config_read_word(dev, PCI_VENDOR_ID);
> > if (PCI_POSSIBLE_ERROR(val2))
> > return -EINVAL;
> > }
> >
> > Am I understanding you correctly?
>
> That would require changing all callers of the function, which
> I think would involve changing some 700 files.

Yeah, that would be a disaster. So something like:

void pci_read_config_word(struct pci_dev *dev, int where, u16 *val)

and where we used to return anything non-zero, we just set *val = ~0
instead? I think we do that already in most, maybe all, cases.

> What I was suggesting was to only change the return type to void and
> categorize all drivers that today check it as either
>
> a) checking the return code is not helpful, or possibly even
> wrong, so we just stop doing it. I expect those to be the
> vast majority of callers, but that could be wrong.
>
> b) Code that legitimately check the error code and need to
> take an appropriate action. These could be changed to
> calling a different interface such as 'pci_bus_read_config_word'
> or a new 'pci_device_last_error()' function.

Yep, makes sense.

> The reasons I suspect that most callers don't actually need
> to check for errors are:
>
> - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER
> only happens if you pass an invalid register number, but most
> callers pass a compile-time constant register number that is
> known to be correct, or the driver would never work. Similarly,
> PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen
> since you pass a valid pci_device pointer that was already
> probed.

Yep, except for things like device removal or other PCI errors.

> - config space accesses are very rare compared to memory
> space access and on the hardware side the error handling
> would be similar, but readl/writel don't return errors, they just
> access wrong registers or return 0xffffffff.
> arch/powerpc/kernel/eeh.c has a ton extra code written to
> deal with it, but no other architectures do.
>
> - If we add code to detect errors in pci_read_config_*
> and do some of the stuff from powerpc's
> eeh_dev_check_failure(), we are more likely to catch
> intermittent failures when drivers don't check, or bugs
> with invalid arguments in device drivers than relying on
> drivers to get their error handling right when those code
> paths don't ever get covered in normal testing.

Yeah, this makes sense and sounds like a potential follow-on project.

> Looking at a couple of random drivers that do check the
> return codes, I find:
>
> drivers/edac/amd8131_edac.c: prints the register number,
> then keeps going. This is not useful
>
> drivers/net/ethernet/mellanox/mlx4/reset.c: error handling
> in mlx4_reset() seems reasonable, but it gets called
> from mlx4_pci_resume(), which has a 'void' return code and
> cannot propagate the error further. My guess is that it
> would try to keep going after a failed resume and run into
> random other problems then.
>
> drivers/ata/pata_cs5536.c: error code gets passed to
> caller and then always ignored. Can clearly be changed
>
> drivers/net/wireless/intersil/prism54/islpci_hotplug.c:
> Out of two calls, only one is checked, which seems bogus
>
> drivers/usb/host/pci-quirks.c: only one of many instances
> has a check, again this seems bogus.
>
> drivers/leds/leds-ss4200.c: called from probe(), which
> seems to correctly deal with errors by failing the probe.
> Not sure this can ever fail though, since the driver only does
> it after pci_enable_device() succeeds first. Note that
> pci_enable_device() ignores pci_read_config_byte()
> errors but sanity-checks the register contents/

So maybe a good place to start is by removing some of the useless
error checking for pci_read_config_*() and pci_write_config_*().
That's a decent-sized but not impractical project that could be done
per subsystem or something:

git grep -E "(if|return|=).*\<pci_(read|write)_config" drivers

finds about 400 matches.

Some of those callers probably really *do* want to check for errors,
and I guess we'd have to identify them and do them separately as you
mentioned.

Bjorn

2020-07-15 02:30:09

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Tue, 2020-07-14 at 13:45 -0500, Bjorn Helgaas wrote:
>
> > fail for valid arguments on a valid pci_device* ?
>
> I really like this idea.
>
> pci_write_config_*() has one return value, and only 100ish of 2500
> callers check for errors. It's sometimes possible for config
> accessors to detect PCI errors and return failure, e.g., device was
> removed or didn't respond, but most of them don't, and detecting
> these
> errors is not really that valuable.
>
> pci_read_config_*() is much more interesting because it returns two
> things, the function return value and the value read from the PCI
> device, and it's complicated to check both.

.../...

I agree. It's a mess at the moment.

We have separate mechanism to convey PCI errors (among other things the
channel state) which should apply to config space when detection is
possible.

I think returning all 1's is the right thing to do here and avoids odd
duplicate error detection logic which I bet you is never properly
tested.

> > For b), it might be nice to also change other aspects of the
> > interface, e.g. passing a pci_host_bridge pointer plus bus number
> > instead of a pci_bus pointer, or having the callback in the
> > pci_host_bridge structure.
>
> I like this idea a lot, too. I think the fact that
> pci_bus_read_config_word() requires a pci_bus * complicates things in
> a few places.
>
> I think it's completely separate, as you say, and we should defer it
> for now because even part a) is a lot of work. I added it to my list
> of possible future projects.

Agreed on both points.

Cheers,
Ben.


2020-07-15 02:30:43

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Tue, 2020-07-14 at 23:02 +0200, Kjetil Oftedal wrote:
> >
> > > For b), it might be nice to also change other aspects of the
> > > interface, e.g. passing a pci_host_bridge pointer plus bus number
> > > instead of a pci_bus pointer, or having the callback in the
> > > pci_host_bridge structure.
> >
> > I like this idea a lot, too. I think the fact that
> > pci_bus_read_config_word() requires a pci_bus * complicates things in
> > a few places.
> >
> > I think it's completely separate, as you say, and we should defer it
> > for now because even part a) is a lot of work. I added it to my list
> > of possible future projects.
> >
>
> What about strange PCI devices such as Non-Transparent bridges?
> They will require their own PCI Config space accessors that is not
> connected to a host bridge if one wants to do some sort of
> punch-through enumeration.
> I guess the kernel doesn't care much about them?

Well, today they would require a pci_bus anyway.. . so if you want to do
that sort of funny trick you may as well create a "virtual" host bridge.

Cheers,
Ben.


2020-07-15 02:32:40

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Tue, 2020-07-14 at 18:46 -0500, Bjorn Helgaas wrote:
> Yes. I have no problem with that. There are a few cases where it's
> important to check for errors, e.g., we read a status register and do
> something based on a bit being set. A failure will return all bits
> set, and we may do the wrong thing. But most of the errors we care
> about will be on MMIO reads, not config reads, so we can probably
> ignore most config read errors.

And in both cases, we don't have the plumbing to provide accurate
and reliable error returns for all platforms anyways (esp. not for
MMIO).

I think it makes sense to stick to the good old "if all 1's, then go
out of line" including for config space.

../..

> Yep, except for things like device removal or other PCI errors.

A whole bunch of these are reported asynchronously, esp for writes (and
yes, including config writes, they are supposed to be non-posted but
more often than not, the path from the CPU to the PCI bridge remains
posted for writes including config ones).

> So maybe a good place to start is by removing some of the useless
> error checking for pci_read_config_*() and pci_write_config_*().
> That's a decent-sized but not impractical project that could be done
> per subsystem or something:
>
> git grep -E "(if|return|=).*\<pci_(read|write)_config" drivers
>
> finds about 400 matches.
>
> Some of those callers probably really *do* want to check for errors,
> and I guess we'd have to identify them and do them separately as you
> mentioned.

I'd be curious about these considering how unreliable our error return
is accross the board.

Cheers,
Ben.


2020-07-15 04:21:04

by Oliver O'Halloran

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann <[email protected]> wrote:
>
> - Most error checking is static: PCIBIOS_BAD_REGISTER_NUMBER
> only happens if you pass an invalid register number, but most
> callers pass a compile-time constant register number that is
> known to be correct, or the driver would never work. Similarly,
> PCIBIOS_DEVICE_NOT_FOUND wouldn't normally happen
> since you pass a valid pci_device pointer that was already
> probed.

Having some feedback about obvious programming errors is still useful
when doing driver development. Reporting those via printk() would
probably be more useful to those who care though.

> - config space accesses are very rare compared to memory
> space access and on the hardware side the error handling
> would be similar, but readl/writel don't return errors, they just
> access wrong registers or return 0xffffffff.
> arch/powerpc/kernel/eeh.c has a ton extra code written to
> deal with it, but no other architectures do.

TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
detected via MMIO are almost always asynchronous to the error itself
so you usually just wind up with a misleading stack trace rather than
any kind of useful synchronous error reporting. It seems like most
drivers don't bother checking for 0xFFs either and rely on the
asynchronous reporting via .error_detected() instead, so I have to
wonder what the point is. I've been thinking of removing the MMIO
hooks and using a background poller to check for errors on each PHB
periodically (assuming we don't have an EEH interrupt) instead. That
would remove the requirement for eeh_dev_check_failure() to be
interrupt safe too, so it might even let us fix all the godawful races
in EEH.

> - If we add code to detect errors in pci_read_config_*
> and do some of the stuff from powerpc's
> eeh_dev_check_failure(), we are more likely to catch
> intermittent failures when drivers don't check, or bugs
> with invalid arguments in device drivers than relying on
> drivers to get their error handling right when those code
> paths don't ever get covered in normal testing.

Adding some kind of error detection to the generic config accessors
wouldn't hurt, but detection is only half the problem. The main job of
eeh_dev_check_failure() is waking up the EEH recovery thread which
actually handles notifying drivers, device resets, etc and you'd want
something in the PCI core. Right now there's two implementations of
that reporting logic: one for EEH in arch/powerpc/eeh_driver.c and one
for AER/DPC in drivers/pci/pcie/err.c. I think the latter could be
moved into the PCI core easily enough since there's not much about it
that's really specific to PCIe. Ideally we could drop the EEH specific
one too, but I'm not sure how to implement that without it devolving
into callback spaghetti.

Oliver

2020-07-15 06:48:34

by Arnd Bergmann

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, Jul 15, 2020 at 1:46 AM Bjorn Helgaas <[email protected]> wrote:

So something like:
>
> void pci_read_config_word(struct pci_dev *dev, int where, u16 *val)
>
> and where we used to return anything non-zero, we just set *val = ~0
> instead? I think we do that already in most, maybe all, cases.

Right, this is what I had in mind. If we start by removing the handling
of the return code in all files that clearly don't need it, looking at
whatever remains will give a much better idea of what a good interface
should be.

> git grep -E "(if|return|=).*\<pci_(read|write)_config" drivers
> finds about 400 matches.

Right, and this is some 112 files to look at.

I had a slightly different regex, which found more false-positives, but
also these:

arch/x86/kernel/amd_nb.c: : pci_read_config_dword(root, 0x64, value));
drivers/i2c/busses/i2c-sis630.c: pci_write_config_byte(sis630_dev,
SIS630_BIOS_CTL_REG, b | 0x80)) {
drivers/i2c/busses/i2c-viapro.c: !pci_read_config_word(pdev,
SMBBA2, &vt596_smba) &&
drivers/ide/rz1000.c: !pci_write_config_word(dev, 0x40, reg & 0xdfff)) {
drivers/net/ethernet/realtek/r8169_main.c:
pci_write_config_byte(pdev, 0x070f, val) == PCIBIOS_SUCCESSFUL)
include/linux/rtsx_pci.h:#define rtsx_pci_read_config_dword(pcr,
where, val) pci_read_config_dword((pcr)->pci, where, val)
include/linux/rtsx_pci.h:#define rtsx_pci_write_config_dword(pcr,
where, val) pci_write_config_dword((pcr)->pci, where, val)
drivers/misc/cardreader/rts5261.c: retval =
rtsx_pci_read_config_dword(pcr,
drivers/misc/cardreader/rts5261.c: retval =
rtsx_pci_write_config_dword(pcr, PCR_SETTING_REG2, lval);

That last one is interesting because I think this is a case in which we
actually want to check for errors, as the driver seems to use it
to ensure that accessing extended config space at offset 0x814
works before relying on the value. Unfortunately the implementation
seems buggy as it a) keeps using the possibly uninitialized value after
printing a warning and b) returns the PCIBIOS_* value in place of a
negative errno and then ignores it in the caller.

Arnd

2020-07-15 15:10:44

by David Laight

[permalink] [raw]
Subject: RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

From: Oliver O'Halloran
> Sent: 15 July 2020 05:19
>
> On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann <[email protected]> wrote:
...
> > - config space accesses are very rare compared to memory
> > space access and on the hardware side the error handling
> > would be similar, but readl/writel don't return errors, they just
> > access wrong registers or return 0xffffffff.
> > arch/powerpc/kernel/eeh.c has a ton extra code written to
> > deal with it, but no other architectures do.
>
> TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
> detected via MMIO are almost always asynchronous to the error itself
> so you usually just wind up with a misleading stack trace rather than
> any kind of useful synchronous error reporting. It seems like most
> drivers don't bother checking for 0xFFs either and rely on the
> asynchronous reporting via .error_detected() instead, so I have to
> wonder what the point is. I've been thinking of removing the MMIO
> hooks and using a background poller to check for errors on each PHB
> periodically (assuming we don't have an EEH interrupt) instead. That
> would remove the requirement for eeh_dev_check_failure() to be
> interrupt safe too, so it might even let us fix all the godawful races
> in EEH.

I've 'played' with PCIe error handling - without much success.
What might be useful is for a driver that has just read ~0u to
be able to ask 'has there been an error signalled for this device?'.

I got an error generated by doing an MMIO access that was inside
the address range forwarded to the slave, but outside any of its BARs.
(Two BARs of different sizes leaves a nice gap.)
This got reported up to the bridge nearest the slave (which supported
error handling), but not to the root bridge (which I don't think does).
ISTR a message about EEH being handled by the hardware (the machine
is up but dmesg is full of messages from a bouncing USB mouse).

With such partial error reporting useful info can still be extracted.

Of course, what actually happens on a PCIe error is that the signal
gets routed to some 'board support logic' and then passed back into
the kernel as an NMI - which then crashes the kernel!
This even happens when the PCIe link goes down after we've done a
soft-remove of the device itself!
Rather makes updating the board's FPGA without a reboot tricky.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

2020-07-15 22:13:50

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, Jul 15, 2020 at 02:38:29PM +0000, David Laight wrote:
> From: Oliver O'Halloran
> > Sent: 15 July 2020 05:19
> >
> > On Wed, Jul 15, 2020 at 8:03 AM Arnd Bergmann <[email protected]> wrote:
> ...
> > > - config space accesses are very rare compared to memory
> > > space access and on the hardware side the error handling
> > > would be similar, but readl/writel don't return errors, they just
> > > access wrong registers or return 0xffffffff.
> > > arch/powerpc/kernel/eeh.c has a ton extra code written to
> > > deal with it, but no other architectures do.
> >
> > TBH the EEH MMIO hooks were probably a mistake to begin with. Errors
> > detected via MMIO are almost always asynchronous to the error itself
> > so you usually just wind up with a misleading stack trace rather than
> > any kind of useful synchronous error reporting. It seems like most
> > drivers don't bother checking for 0xFFs either and rely on the
> > asynchronous reporting via .error_detected() instead, so I have to
> > wonder what the point is. I've been thinking of removing the MMIO
> > hooks and using a background poller to check for errors on each PHB
> > periodically (assuming we don't have an EEH interrupt) instead. That
> > would remove the requirement for eeh_dev_check_failure() to be
> > interrupt safe too, so it might even let us fix all the godawful races
> > in EEH.
>
> I've 'played' with PCIe error handling - without much success.
> What might be useful is for a driver that has just read ~0u to
> be able to ask 'has there been an error signalled for this device?'.

In many cases a driver will know that ~0 is not a valid value for the
register it's reading. But if ~0 *could* be valid, an interface like
you suggest could be useful. I don't think we have anything like that
today, but maybe we could. It would certainly be nice if the PCI core
noticed, logged, and cleared errors. We have some of that for AER,
but that's an optional feature, and support for the error bits in the
garden-variety PCI_STATUS register is pretty haphazard. As you note
below, this sort of SERR/PERR reporting is frequently hard-wired in
ways that takes it out of our purview.

Bjorn

2020-07-15 22:30:58

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, 2020-07-15 at 08:47 +0200, Arnd Bergmann wrote:
> drivers/misc/cardreader/rts5261.c: retval =
> rtsx_pci_write_config_dword(pcr, PCR_SETTING_REG2, lval);
>
> That last one is interesting because I think this is a case in which
> we
> actually want to check for errors, as the driver seems to use it
> to ensure that accessing extended config space at offset 0x814
> works before relying on the value. Unfortunately the implementation
> seems buggy as it a) keeps using the possibly uninitialized value
> after
> printing a warning and b) returns the PCIBIOS_* value in place of a
> negative errno and then ignores it in the caller.

In cases like this, usually checking against ~0 is sufficient

Cheers,
Ben.


2020-07-15 22:53:07

by Benjamin Herrenschmidt

[permalink] [raw]
Subject: Re: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > I've 'played' with PCIe error handling - without much success.
> > What might be useful is for a driver that has just read ~0u to
> > be able to ask 'has there been an error signalled for this device?'.
>
> In many cases a driver will know that ~0 is not a valid value for the
> register it's reading. But if ~0 *could* be valid, an interface like
> you suggest could be useful. I don't think we have anything like that
> today, but maybe we could. It would certainly be nice if the PCI core
> noticed, logged, and cleared errors. We have some of that for AER,
> but that's an optional feature, and support for the error bits in the
> garden-variety PCI_STATUS register is pretty haphazard. As you note
> below, this sort of SERR/PERR reporting is frequently hard-wired in
> ways that takes it out of our purview.

We do have pci_channel_state (via pci_channel_offline()) which covers
the cases where the underlying error handling (such as EEH or unplug)
results in the device being offlined though this tend to be
asynchronous so it might take a few ~0's before you get it.

It's typically used to break potentially infinite loops in some
drivers.

There is no interface to check whether *an* error happened though for
the most cases it will be captured in the status register, which is
harvested (and cleared ?) by some EDAC drivers iirc...

All this lacks coordination, I agree.

Cheers,
Ben.


2020-07-16 08:10:17

by David Laight

[permalink] [raw]
Subject: RE: [RFC PATCH 00/35] Move all PCIBIOS* definitions into arch/x86

From: Benjamin Herrenschmidt
> Sent: 15 July 2020 23:49
> On Wed, 2020-07-15 at 17:12 -0500, Bjorn Helgaas wrote:
> > > I've 'played' with PCIe error handling - without much success.
> > > What might be useful is for a driver that has just read ~0u to
> > > be able to ask 'has there been an error signalled for this device?'.
> >
> > In many cases a driver will know that ~0 is not a valid value for the
> > register it's reading. But if ~0 *could* be valid, an interface like
> > you suggest could be useful. I don't think we have anything like that
> > today, but maybe we could. It would certainly be nice if the PCI core
> > noticed, logged, and cleared errors. We have some of that for AER,
> > but that's an optional feature, and support for the error bits in the
> > garden-variety PCI_STATUS register is pretty haphazard. As you note
> > below, this sort of SERR/PERR reporting is frequently hard-wired in
> > ways that takes it out of our purview.
>
> We do have pci_channel_state (via pci_channel_offline()) which covers
> the cases where the underlying error handling (such as EEH or unplug)
> results in the device being offlined though this tend to be
> asynchronous so it might take a few ~0's before you get it.

On one of my systems I don't think the error TLP from the target
made its way past the first bridge - I could see the error in it's
status registers.
But I couldn't find any of the AER status registers in the root bridge.
So I think you'd need a software poll of the bridge registers to
find out (and clear) the error.

The NMI on the dell system (which is supposed to meet some special
NEBS? server requirements) is just stupid.
Too late to be synchronous and impossible for the OS to handle.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)