2014-10-30 17:56:47

by Myron Stowe

[permalink] [raw]
Subject: [PATCH 0/3] PCI: Fix detection of read-only BARs

While non-conformant, PCI devices having read-only (r/o) BARs - registers
that when read return fixed, non-zero, values regardless of whether or
not their being sized - occasionally turn up. Pre-git commit 1307ef662199
[1] was the initial attempt to detect such BARs. The detection mechanism
used however ended up exposing further unexpected behaviors on enough
devices that it had to be reverted [2].

A subsequent solution, which is still currently in use, was put in place
with pre-git commit 2c79a80ab7b7 ("[PCI] Correctly size hardwired empty
BARs"). This solution's logic detects r/o BARs via the following (see
'pci_size()'):
/* base == maxbase can be valid only if the BAR has
already been programmed with all 1s. */
if (base == maxbase && ((base | size) & mask) != mask)
return 0;

Later, commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") was
introduced, re-factoring PCI's core BAR sizing logic. The commit altered
__pci_read_base's local variable 'l', stripping off its lower,
non-addressing related bits, prior to being passed in as the 'base'
parameter to pci_size(). This masking broke the r/o BAR detection logic's
first comparison check for r/o BARs that have any of their lower order
bits, bits that are not part of a BARs "base address" field, set. For
such cases, the 'base == maxbase' comparison check will no longer ever be
"true".

This series restores the r/o BAR detection logic so that it will once
again catch, and ignore, such occurrences as have been encountered to
date:
- AGP aperture BAR of AMD-7xx host bridges; if the AGP window
disabled, this BAR is read-only and read as 0x00000008 [1]
- BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [3]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [4]


[1] From Thomas Gleixner's "Linux kernel history" repository:
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
pre-git commit 1307ef662199 "PCI: probing read-only Bars"
[2] Pre-git commit 182d090b9dfe "Undo due to weird behaviour on various
boxes"
[3] https://bugzilla.kernel.org/show_bug.cgi?id=43331
[4] https://bugzilla.kernel.org/show_bug.cgi?id=85991


Myron Stowe (3):
PCI: Restore detection of read-only BARs
PCI: Re-factor __pci_read_base
PCI: Add a informational printk for invalid BARs


drivers/pci/probe.c | 75 +++++++++++++++++++++++----------------------------
1 file changed, 34 insertions(+), 41 deletions(-)

--


2014-10-30 17:56:37

by Myron Stowe

[permalink] [raw]
Subject: [PATCH 1/3] PCI: Restore detection of read-only BARs

Commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") altered
__pci_read_base's local variable 'l', masking off its lower non-addressing
related bits, prior to it being passed in as the 'base' parameter to
pci_size(). This masking broke pci_size's r/o BAR detection logic's
comparison check for r/o BARs that have lower order bits set. For such
occurrences, the 'base == maxbase' check will no longer ever be "true".

This patch resolves this issue by also masking off the non-addressing
related bits of 'sz' before passing it into pci_size() as the 'maxbase'
parameter. With this change the r/o detection logic of pci_size() will
once again catch known occurrences that have been encountered to date:
- AGP aperture BAR of AMD-7xx host bridges; if the AGP window
disabled, this BAR is read-only and read as 0x00000008 [1]
- BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
- Intel Sandy Bridge - Thermal Management Controller [8086:0103];
BAR 0 returning 0xfed98004 [2]
- Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
Bar 0 returning 0x00001a [3]


[1] From Thomas Gleixner's "Linux kernel history" repository:
https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
pre-git commit 1307ef662199 "PCI: probing read-only Bars"
[2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
[3] https://bugzilla.kernel.org/show_bug.cgi?id=85991


Reported-by: William Unruh <[email protected]>
Reported-by: Martin Lucina <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
---
drivers/pci/probe.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5ed9930..19dc247 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -216,14 +216,17 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
res->flags |= IORESOURCE_SIZEALIGN;
if (res->flags & IORESOURCE_IO) {
l &= PCI_BASE_ADDRESS_IO_MASK;
+ sz &= PCI_BASE_ADDRESS_IO_MASK;
mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
} else {
l &= PCI_BASE_ADDRESS_MEM_MASK;
+ sz &= PCI_BASE_ADDRESS_MEM_MASK;
mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
}
} else {
res->flags |= (l & IORESOURCE_ROM_ENABLE);
l &= PCI_ROM_ADDRESS_MASK;
+ sz &= PCI_ROM_ADDRESS_MASK;
mask = (u32)PCI_ROM_ADDRESS_MASK;
}

2014-10-30 17:56:42

by Myron Stowe

[permalink] [raw]
Subject: [PATCH 2/3] PCI: Re-factor __pci_read_base

__pci_read_base() disables a device's decoding while sizing its BARs. A
ramification of the disabling is printks needing to be withheld which has
lead to some rather messy exit logic.

This patch attempts to resolve such by coalescing the routine's sizing
logic in order to minimize the duration in which the device's decoding is
disabled. This allows the error condition related printks to now reside
within their detection blocks.

The re-factoring also takes advantage of the symmetry of obtaining the
BAR's extent (pci_size) and storing the result as the 'region' for both
the 32 bit and 64 bit BARs, consolidating both cases.

No functional change intended.

Reported-by: William Unruh <[email protected]>
Reported-by: Martin Lucina <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
---
drivers/pci/probe.c | 77 +++++++++++++++++++++------------------------------
1 file changed, 32 insertions(+), 45 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 19dc247..5c13279 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -175,7 +175,6 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
u64 l64, sz64, mask64;
u16 orig_cmd;
struct pci_bus_region region, inverted_region;
- bool bar_too_big = false, bar_too_high = false, bar_invalid = false;

mask = type ? PCI_ROM_ADDRESS_MASK : ~0;

@@ -201,8 +200,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
* memory BAR or a ROM, bit 0 must be clear; if it's an io BAR, bit
* 1 must be clear.
*/
- if (!sz || sz == 0xffffffff)
- goto fail;
+ if (sz == 0xffffffff)
+ sz = 0;

/*
* I don't know how l can have all bits set. Copied from old code.
@@ -215,26 +214,22 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
res->flags = decode_bar(dev, l);
res->flags |= IORESOURCE_SIZEALIGN;
if (res->flags & IORESOURCE_IO) {
- l &= PCI_BASE_ADDRESS_IO_MASK;
- sz &= PCI_BASE_ADDRESS_IO_MASK;
- mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
+ l64 = l & PCI_BASE_ADDRESS_IO_MASK;
+ sz64 = sz & PCI_BASE_ADDRESS_IO_MASK;
+ mask64 = PCI_BASE_ADDRESS_IO_MASK & (u32)IO_SPACE_LIMIT;
} else {
- l &= PCI_BASE_ADDRESS_MEM_MASK;
- sz &= PCI_BASE_ADDRESS_MEM_MASK;
- mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
+ l64 = l & PCI_BASE_ADDRESS_MEM_MASK;
+ sz64 = sz & PCI_BASE_ADDRESS_MEM_MASK;
+ mask64 = (u32)PCI_BASE_ADDRESS_MEM_MASK;
}
} else {
res->flags |= (l & IORESOURCE_ROM_ENABLE);
- l &= PCI_ROM_ADDRESS_MASK;
- sz &= PCI_ROM_ADDRESS_MASK;
- mask = (u32)PCI_ROM_ADDRESS_MASK;
+ l64 = l & PCI_ROM_ADDRESS_MASK;
+ sz64 = sz & PCI_ROM_ADDRESS_MASK;
+ mask64 = (u32)PCI_ROM_ADDRESS_MASK;
}

if (res->flags & IORESOURCE_MEM_64) {
- l64 = l;
- sz64 = sz;
- mask64 = mask | (u64)~0 << 32;
-
pci_read_config_dword(dev, pos + 4, &l);
pci_write_config_dword(dev, pos + 4, ~0);
pci_read_config_dword(dev, pos + 4, &sz);
@@ -242,18 +237,24 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,

l64 |= ((u64)l << 32);
sz64 |= ((u64)sz << 32);
+ mask64 |= ((u64)~0 << 32);
+ }

- sz64 = pci_size(l64, sz64, mask64);
+ if (!dev->mmio_always_on &&
+ (orig_cmd & PCI_COMMAND_DECODE_ENABLE))
+ pci_write_config_word(dev, PCI_COMMAND, orig_cmd);

- if (!sz64)
- goto fail;
+ if (!sz64)
+ goto fail;

+ if (res->flags & IORESOURCE_MEM_64) {
if ((sizeof(dma_addr_t) < 8 || sizeof(resource_size_t) < 8) &&
sz64 > 0x100000000ULL) {
res->flags |= IORESOURCE_UNSET | IORESOURCE_DISABLED;
res->start = 0;
res->end = 0;
- bar_too_big = true;
+ dev_err(&dev->dev, "reg 0x%x: can't handle BAR larger than 4GB (size %#010llx)\n",
+ pos, (unsigned long long)sz64);
goto out;
}

@@ -262,21 +263,19 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
res->flags |= IORESOURCE_UNSET;
res->start = 0;
res->end = sz64;
- bar_too_high = true;
+ dev_info(&dev->dev, "reg 0x%x: can't handle BAR above 4G (bus address %#010llx)\n",
+ pos, (unsigned long long)l64);
goto out;
- } else {
- region.start = l64;
- region.end = l64 + sz64;
}
- } else {
- sz = pci_size(l, sz, mask);
+ }

- if (!sz)
- goto fail;
+ sz64 = pci_size(l64, sz64, mask64);

- region.start = l;
- region.end = l + sz;
- }
+ if (!sz64)
+ goto fail;
+
+ region.start = l64;
+ region.end = l64 + sz64;

pcibios_bus_to_resource(dev->bus, res, &region);
pcibios_resource_to_bus(dev->bus, &inverted_region, res);
@@ -296,7 +295,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
res->flags |= IORESOURCE_UNSET;
res->start = 0;
res->end = region.end - region.start;
- bar_invalid = true;
+ dev_info(&dev->dev, "reg 0x%x: initial BAR value %#010llx invalid\n",
+ pos, (unsigned long long)region.start);
}

goto out;
@@ -305,19 +305,6 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
fail:
res->flags = 0;
out:
- if (!dev->mmio_always_on &&
- (orig_cmd & PCI_COMMAND_DECODE_ENABLE))
- pci_write_config_word(dev, PCI_COMMAND, orig_cmd);
-
- if (bar_too_big)
- dev_err(&dev->dev, "reg 0x%x: can't handle BAR larger than 4GB (size %#010llx)\n",
- pos, (unsigned long long) sz64);
- if (bar_too_high)
- dev_info(&dev->dev, "reg 0x%x: can't handle BAR above 4G (bus address %#010llx)\n",
- pos, (unsigned long long) l64);
- if (bar_invalid)
- dev_info(&dev->dev, "reg 0x%x: initial BAR value %#010llx invalid\n",
- pos, (unsigned long long) region.start);
if (res->flags)
dev_printk(KERN_DEBUG, &dev->dev, "reg 0x%x: %pR\n", pos, res);

2014-10-30 17:56:51

by Myron Stowe

[permalink] [raw]
Subject: [PATCH 3/3] PCI: Add a informational printk for invalid BARs

As a consequence of restoring the detection of invalid BARs, add a new
informational printk like the following when such occurrences are
encountered.

pci ssss:bb:dd.f: [Firmware Bug]: reg 0xXX: invalid BAR (can't size)

Reported-by: William Unruh <[email protected]>
Reported-by: Martin Lucina <[email protected]>
Signed-off-by: Myron Stowe <[email protected]>
Cc: Matthew Wilcox <[email protected]>
---
drivers/pci/probe.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5c13279..2953c1d 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -271,8 +271,11 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,

sz64 = pci_size(l64, sz64, mask64);

- if (!sz64)
+ if (!sz64) {
+ dev_info(&dev->dev, "%sreg 0x%x: invalid BAR (can't size)\n",
+ FW_BUG, pos);
goto fail;
+ }

region.start = l64;
region.end = l64 + sz64;

2014-11-11 01:02:38

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 1/3] PCI: Restore detection of read-only BARs

On Thu, Oct 30, 2014 at 11:54:37AM -0600, Myron Stowe wrote:
> Commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") altered
> __pci_read_base's local variable 'l', masking off its lower non-addressing
> related bits, prior to it being passed in as the 'base' parameter to
> pci_size(). This masking broke pci_size's r/o BAR detection logic's
> comparison check for r/o BARs that have lower order bits set. For such
> occurrences, the 'base == maxbase' check will no longer ever be "true".
>
> This patch resolves this issue by also masking off the non-addressing
> related bits of 'sz' before passing it into pci_size() as the 'maxbase'
> parameter. With this change the r/o detection logic of pci_size() will
> once again catch known occurrences that have been encountered to date:
> - AGP aperture BAR of AMD-7xx host bridges; if the AGP window
> disabled, this BAR is read-only and read as 0x00000008 [1]
> - BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
> - Intel Sandy Bridge - Thermal Management Controller [8086:0103];
> BAR 0 returning 0xfed98004 [2]
> - Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
> Bar 0 returning 0x00001a [3]
>
>
> [1] From Thomas Gleixner's "Linux kernel history" repository:
> https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
> pre-git commit 1307ef662199 "PCI: probing read-only Bars"
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=43331
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=85991

I like this patch and I think it's correct.

The only thing that bothers me is that Martin reported that lspci showed
"Memory at <ignored>" for this BAR:
https://bugzilla.kernel.org/show_bug.cgi?id=43331#c36 .

I guess this is because lspci reads the BAR values from
/sys/bus/pci/devices/.../config, which uses pci_read_config(), which
actually reads config space again, so it sees whatever is actually in the
BAR, not the cleared out resource in the pci_dev. Oh well, I guess that's
OK.

> Reported-by: William Unruh <[email protected]>
> Reported-by: Martin Lucina <[email protected]>
> Signed-off-by: Myron Stowe <[email protected]>
> Cc: Matthew Wilcox <[email protected]>
> ---
> drivers/pci/probe.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 5ed9930..19dc247 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -216,14 +216,17 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
> res->flags |= IORESOURCE_SIZEALIGN;
> if (res->flags & IORESOURCE_IO) {
> l &= PCI_BASE_ADDRESS_IO_MASK;
> + sz &= PCI_BASE_ADDRESS_IO_MASK;
> mask = PCI_BASE_ADDRESS_IO_MASK & (u32) IO_SPACE_LIMIT;
> } else {
> l &= PCI_BASE_ADDRESS_MEM_MASK;
> + sz &= PCI_BASE_ADDRESS_MEM_MASK;
> mask = (u32)PCI_BASE_ADDRESS_MEM_MASK;
> }
> } else {
> res->flags |= (l & IORESOURCE_ROM_ENABLE);
> l &= PCI_ROM_ADDRESS_MASK;
> + sz &= PCI_ROM_ADDRESS_MASK;
> mask = (u32)PCI_ROM_ADDRESS_MASK;
> }
>
>

2014-11-11 03:32:01

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH 0/3] PCI: Fix detection of read-only BARs

On Thu, Oct 30, 2014 at 11:54:31AM -0600, Myron Stowe wrote:
> While non-conformant, PCI devices having read-only (r/o) BARs - registers
> that when read return fixed, non-zero, values regardless of whether or
> not their being sized - occasionally turn up. Pre-git commit 1307ef662199
> [1] was the initial attempt to detect such BARs. The detection mechanism
> used however ended up exposing further unexpected behaviors on enough
> devices that it had to be reverted [2].
>
> A subsequent solution, which is still currently in use, was put in place
> with pre-git commit 2c79a80ab7b7 ("[PCI] Correctly size hardwired empty
> BARs"). This solution's logic detects r/o BARs via the following (see
> 'pci_size()'):
> /* base == maxbase can be valid only if the BAR has
> already been programmed with all 1s. */
> if (base == maxbase && ((base | size) & mask) != mask)
> return 0;
>
> Later, commit 6ac665c63dca ("PCI: rewrite PCI BAR reading code") was
> introduced, re-factoring PCI's core BAR sizing logic. The commit altered
> __pci_read_base's local variable 'l', stripping off its lower,
> non-addressing related bits, prior to being passed in as the 'base'
> parameter to pci_size(). This masking broke the r/o BAR detection logic's
> first comparison check for r/o BARs that have any of their lower order
> bits, bits that are not part of a BARs "base address" field, set. For
> such cases, the 'base == maxbase' comparison check will no longer ever be
> "true".
>
> This series restores the r/o BAR detection logic so that it will once
> again catch, and ignore, such occurrences as have been encountered to
> date:
> - AGP aperture BAR of AMD-7xx host bridges; if the AGP window
> disabled, this BAR is read-only and read as 0x00000008 [1]
> - BAR0-4 of ALi IDE controllers can be non-zero and read-only [1]
> - Intel Sandy Bridge - Thermal Management Controller [8086:0103];
> BAR 0 returning 0xfed98004 [3]
> - Intel Xeon E5 v3/Core i7 Power Control Unit [8086:2fc0];
> Bar 0 returning 0x00001a [4]
>
>
> [1] From Thomas Gleixner's "Linux kernel history" repository:
> https://git.kernel.org/cgit/linux/kernel/git/tglx/history.git/commit/drivers/pci/probe.c?id=1307ef6621991f1c4bc3cec1b5a4ebd6fd3d66b9
> pre-git commit 1307ef662199 "PCI: probing read-only Bars"
> [2] Pre-git commit 182d090b9dfe "Undo due to weird behaviour on various
> boxes"
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=43331
> [4] https://bugzilla.kernel.org/show_bug.cgi?id=85991
>
>
> Myron Stowe (3):
> PCI: Restore detection of read-only BARs
> PCI: Re-factor __pci_read_base
> PCI: Add a informational printk for invalid BARs

I applied all three of these to pci/enumeration for v3.19. It's great to
get this long-standing problem fixed and the code cleaned up a bit.

Bjorn