The return value of pci_read_config_*() may not indicate a device error.
However, the value read by these functions is more likely to indicate
this kind of error. This presents two overlapping ways of reporting
errors and complicates error checking.
It is possible to move to one single way of checking for error if the
dependencies on the return value of these functions are removed, then it
can later be made to return void.
Remove all uses of the return value of pci_read_config_*().
Check the actual value read for ~0. In this case, ~0 is an invalid value
thus it indicates some kind of error.
In some cases it madkes sence to make the calling function return void
without causing any bug. Future callers can use the value obtained from
these functions for validation. This case pertain to cs5536_read() and
edac_pci_read_dword()
MERGE:
There is no dependency.
Merge individually
Saheed O. Bolarinwa (17):
ata: Drop uses of pci_read_config_*() return value
atm: Drop uses of pci_read_config_*() return value
bcma: Drop uses of pci_read_config_*() return value
hwrng: Drop uses of pci_read_config_*() return value
dmaengine: ioat: Drop uses of pci_read_config_*() return value
edac: Drop uses of pci_read_config_*() return value
fpga: altera-cvp: Drop uses of pci_read_config_*() return value
gpio: Drop uses of pci_read_config_*() return value
drm/i915/vga: Drop uses of pci_read_config_*() return value
hwmon: Drop uses of pci_read_config_*() return value
intel_th: pci: Drop uses of pci_read_config_*() return value
i2c: Drop uses of pci_read_config_*() return value
ide: Drop uses of pci_read_config_*() return value
IB: Drop uses of pci_read_config_*() return value
iommu/vt-d: Drop uses of pci_read_config_*() return value
mtd: Drop uses of pci_read_config_*() return value
net: Drop uses of pci_read_config_*() return value
drivers/ata/pata_cs5536.c | 6 +--
drivers/ata/pata_rz1000.c | 3 +-
drivers/atm/eni.c | 3 +-
drivers/atm/he.c | 12 +++--
drivers/atm/idt77252.c | 9 ++--
drivers/atm/iphase.c | 46 ++++++++++---------
drivers/atm/lanai.c | 4 +-
drivers/atm/nicstar.c | 3 +-
drivers/atm/zatm.c | 9 ++--
drivers/bcma/host_pci.c | 6 ++-
drivers/char/hw_random/amd-rng.c | 6 +--
drivers/dma/ioat/dma.c | 6 +--
drivers/edac/amd64_edac.c | 8 ++--
drivers/edac/amd8111_edac.c | 16 ++-----
drivers/edac/amd8131_edac.c | 6 +--
drivers/edac/i82443bxgx_edac.c | 3 +-
drivers/edac/sb_edac.c | 12 +++--
drivers/edac/skx_common.c | 18 +++++---
drivers/fpga/altera-cvp.c | 8 ++--
drivers/gpio/gpio-amd8111.c | 7 ++-
drivers/gpio/gpio-rdc321x.c | 21 +++++----
drivers/gpu/drm/i915/display/intel_vga.c | 3 +-
drivers/hwmon/i5k_amb.c | 12 +++--
drivers/hwmon/vt8231.c | 8 ++--
drivers/hwtracing/intel_th/pci.c | 12 ++---
drivers/i2c/busses/i2c-ali15x3.c | 6 ++-
drivers/i2c/busses/i2c-elektor.c | 3 +-
drivers/i2c/busses/i2c-nforce2.c | 4 +-
drivers/i2c/busses/i2c-sis5595.c | 17 ++++---
drivers/i2c/busses/i2c-sis630.c | 7 +--
drivers/i2c/busses/i2c-viapro.c | 11 +++--
drivers/ide/cs5536.c | 6 +--
drivers/ide/rz1000.c | 3 +-
drivers/ide/setup-pci.c | 26 +++++++----
drivers/infiniband/hw/hfi1/pcie.c | 38 +++++++--------
drivers/infiniband/hw/mthca/mthca_reset.c | 19 ++++----
drivers/iommu/intel/iommu.c | 6 ++-
drivers/mtd/maps/ichxrom.c | 4 +-
drivers/net/can/peak_canfd/peak_pciefd_main.c | 6 ++-
drivers/net/can/sja1000/peak_pci.c | 6 ++-
drivers/net/ethernet/agere/et131x.c | 11 +++--
.../ethernet/broadcom/bnx2x/bnx2x_ethtool.c | 5 +-
.../cavium/liquidio/cn23xx_pf_device.c | 4 +-
drivers/net/ethernet/marvell/sky2.c | 5 +-
drivers/net/ethernet/mellanox/mlx4/catas.c | 7 +--
drivers/net/ethernet/mellanox/mlx4/reset.c | 10 ++--
.../net/ethernet/myricom/myri10ge/myri10ge.c | 4 +-
drivers/net/wan/farsync.c | 5 +-
.../broadcom/brcm80211/brcmfmac/pcie.c | 4 +-
.../net/wireless/intel/iwlwifi/pcie/trans.c | 15 ++++--
50 files changed, 270 insertions(+), 209 deletions(-)
--
2.18.4
On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> Because the value ~0 has a meaning to some drivers and only
No, ~0 means that the PCI read failed. For *every* PCI device I know.
Here's me reading from 0xf0 offset of my hostbridge:
# setpci -s 00:00.0 0xf0.l
01000000
That device doesn't have extended config space, so the last valid byte
is 0xff. Let's read beyond that:
# setpci -s 00:00.0 0x100.l
ffffffff
> Again, only the drivers can determine if ~0 is a valid value. This
> information is not available inside pci_config_read*().
Of course it is.
*every* change you've done in 6/17 - this is the only patch I have
received - checks for == ~0. So that check can just as well be moved
inside pci_config_read_*().
Here's how one could do it:
#define PCI_OP_READ(size, type, len) \
int noinline pci_bus_read_config_##size \
(struct pci_bus *bus, unsigned int devfn, int pos, type *value) \
{ \
int res; \
unsigned long flags; \
u32 data = 0; \
if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \
pci_lock_config(flags); \
res = bus->ops->read(bus, devfn, pos, len, &data); \
/* Check we actually read something which is not all 1s.*/
if (data == ~0)
return PCIBIOS_READ_FAILED;
*value = (type)data; \
pci_unlock_config(flags); \
return res; \
}
Also, I'd prefer a function to *not* return void but return either
an error or success. In the success case, the @value argument can be
consumed by the caller and otherwise not.
In any case, that change is a step in the wrong direction and I don't
like it, sorry.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sun, Aug 02, 2020 at 08:46:48PM +0200, Borislav Petkov wrote:
> On Sun, Aug 02, 2020 at 07:28:00PM +0200, Saheed Bolarinwa wrote:
> > Because the value ~0 has a meaning to some drivers and only
>
> No, ~0 means that the PCI read failed. For *every* PCI device I know.
Wait, I'm not convinced yet. I know that if a PCI read fails, you
normally get ~0 data because the host bridge fabricates it to complete
the CPU load.
But what guarantees that a PCI config register cannot contain ~0?
If there's something about that in the spec I'd love to know where it
is because it would simplify a lot of things.
I don't think we should merge any of these patches as-is. If we *do*
want to go this direction, we at least need some kind of macro or
function that tests for ~0 so we have a clue about what's happening
and can grep for it.
Bjorn
On Sun, Aug 02, 2020 at 02:14:06PM -0500, Bjorn Helgaas wrote:
> Wait, I'm not convinced yet. I know that if a PCI read fails, you
> normally get ~0 data because the host bridge fabricates it to complete
> the CPU load.
>
> But what guarantees that a PCI config register cannot contain ~0?
Well, I don't think you can differentiate that case, right?
I guess this is where the driver knowledge comes into play: if the read
returns ~0, the pci_read_config* should probably return in that case
something like:
PCIBIOS_READ_MAYBE_FAILED
to denote it is all 1s and then the caller should be able to determine,
based on any of domain:bus:slot.func and whatever else the driver knows
about its hardware, whether the 1s are a valid value or an error.
Hopefully.
Or something better of which I cannot think of right now...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Sun, Aug 02, 2020 at 02:14:06PM -0500, Bjorn Helgaas wrote:
> But what guarantees that a PCI config register cannot contain ~0?
> If there's something about that in the spec I'd love to know where it
> is because it would simplify a lot of things.
There isn't. An we even have cases like the NVMe controller memory
buffer and persistent memory region, which are BARs that store
abritrary values for later retreival, so it can't. (now those
features have a major issue with error detection, but that is another
issue)