2015-05-26 17:54:13

by Alex Williamson

Subject: [PATCH 0/2] ACPI / PCI: Fix _PRT lookup for ARI enabled devices

In most cases we only use ARI with SR-IOV VFs, which do not support
INTx and therefore never hit this problem. However, some non-SR-IOV
implementations create multiple PFs, extending beyond the standard
3-bit function numbers with ARI, and do support INTx for those
additional functions. This can happen with Solarflare SFC9120
adapters. The host driver typically doesn't use INTx, so we also
haven't noticed this problem on bare metal, but when we attempt to
assign the device to a VM using vfio-pci, we fail trying to set up
default INTx signaling. Thanks,

Alex

---

Alex Williamson (2):
PCI: Move pci_ari_enabled() to global header
ACPI / PCI: Account for ARI in _PRT lookups


drivers/acpi/pci_irq.c | 4 ++--
drivers/pci/pci.h | 11 -----------
include/linux/pci.h | 11 +++++++++++
3 files changed, 13 insertions(+), 13 deletions(-)
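
For readers unfamiliar with ARI, the following is a minimal userspace sketch of the numbering issue described above, not kernel code. PCI_SLOT() and PCI_FUNC() use the same bit layout as the kernel macros; make_devfn(), ari_function() and routing_device() are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as the kernel's PCI_SLOT()/PCI_FUNC() macros. */
#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)
#define PCI_FUNC(devfn)	((devfn) & 0x07)

/* Hypothetical helper: pack a traditional slot.function pair into devfn. */
static inline uint8_t make_devfn(unsigned int slot, unsigned int func)
{
	assert(slot < 32 && func < 8);
	return (uint8_t)((slot << 3) | func);
}

/*
 * Hypothetical helper: the function number as the hardware sees it.
 * With ARI forwarding enabled, all 8 bits of devfn form the function
 * number; without it, only the low 3 bits do.
 */
static inline unsigned int ari_function(uint8_t devfn, int ari_enabled)
{
	return ari_enabled ? devfn : PCI_FUNC(devfn);
}

/*
 * Hypothetical helper: the device number to use for INTx routing.
 * An ARI device has an implied device number of 0.
 */
static inline unsigned int routing_device(uint8_t devfn, int ari_enabled)
{
	return ari_enabled ? 0 : PCI_SLOT(devfn);
}
```

Under these assumptions, a function shown as 01.1 has devfn 0x09. With ARI it is function 9 of device 0, so routing_device() yields 0 even though PCI_SLOT() yields 1.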


2015-05-26 17:54:08

by Alex Williamson

Subject: [PATCH 1/2] PCI: Move pci_ari_enabled() to global header

This is useful outside of drivers/pci, particularly for deriving INTx
routing via ACPI _PRT. Also convert to bool return.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.h | 11 -----------
include/linux/pci.h | 11 +++++++++++
2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 9bd762c2..c1b2a43 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -216,17 +216,6 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
 				struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 
-/**
- * pci_ari_enabled - query ARI forwarding status
- * @bus: the PCI bus
- *
- * Returns 1 if ARI forwarding is enabled, or 0 if not enabled;
- */
-static inline int pci_ari_enabled(struct pci_bus *bus)
-{
-	return bus->self && bus->self->ari_enabled;
-}
-
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 353db8d..2925561 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1905,4 +1905,15 @@ static inline bool pci_is_dev_assigned(struct pci_dev *pdev)
 {
 	return (pdev->dev_flags & PCI_DEV_FLAGS_ASSIGNED) == PCI_DEV_FLAGS_ASSIGNED;
 }
+
+/**
+ * pci_ari_enabled - query ARI forwarding status
+ * @bus: the PCI bus
+ *
+ * Returns true if ARI forwarding is enabled.
+ */
+static inline bool pci_ari_enabled(struct pci_bus *bus)
+{
+	return bus->self && bus->self->ari_enabled;
+}
 #endif /* LINUX_PCI_H */

2015-05-26 17:54:16

by Alex Williamson

Subject: [PATCH 2/2] ACPI / PCI: Account for ARI in _PRT lookups

The PCIe specification, rev 3.0, section 2.2.8.1, contains the
following implementation note:

Virtual Wire Mapping for INTx Interrupts From ARI Devices

The implied Device Number for an ARI Device is 0. When ARI-aware
software (including BIOS and operating system) enables ARI
Forwarding in the Downstream Port immediately above an ARI Device
in order to access its Extended Functions, software must
comprehend that the Downstream Port will use Device Number 0 for
the virtual wire mappings of INTx interrupts coming from all
Functions of the ARI Device. If non-ARI-aware software attempts
to determine the virtual wire mappings for Extended Functions, it
can come up with incorrect mappings by examining the traditional
Device Number field and finding it to be non-0.

We account for this in pci_swizzle_interrupt_pin(), but it looks like
we miss it here, looking for a _PRT entry with a slot matching the
ARI device slot number. This can cause errors like:

pcieport 0000:80:03.0: can't derive routing for PCI INT B
sfc 0000:82:01.1: PCI INT B: no GSI

pci_dev.irq is then invalid, resulting in errors for drivers that
attempt to enable INTx on the device. Fix by using slot 0 for ARI
enabled devices.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/acpi/pci_irq.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
index b1def41..65e83cd 100644
--- a/drivers/acpi/pci_irq.c
+++ b/drivers/acpi/pci_irq.c
@@ -163,7 +163,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
 {
 	int segment = pci_domain_nr(dev->bus);
 	int bus = dev->bus->number;
-	int device = PCI_SLOT(dev->devfn);
+	int device = pci_ari_enabled(dev->bus) ? 0 : PCI_SLOT(dev->devfn);
 	struct acpi_prt_entry *entry;
 
 	if (((prt->address >> 16) & 0xffff) != device ||
@@ -181,7 +181,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
 	 */
 	entry->id.segment = segment;
 	entry->id.bus = bus;
-	entry->id.device = (prt->address >> 16) & 0xFFFF;
+	entry->id.device = PCI_SLOT(dev->devfn);
 	entry->pin = prt->pin + 1;
 
 	do_prt_fixups(entry, prt);
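
To illustrate the effect of the change, here is a simplified userspace model of the device/pin comparison, not the kernel implementation. struct prt_model, routing_device(), prt_matches() and swizzle_pin() are hypothetical names used only for this sketch. The model assumes the usual _PRT address layout, with the device number in the high word and 0xFFFF in the low word, and uses the same swizzle arithmetic as pci_swizzle_interrupt_pin().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/* Simplified stand-in for one _PRT entry (not the ACPICA structure). */
struct prt_model {
	uint64_t address;	/* device number in bits 31:16, 0xFFFF below */
	uint32_t pin;		/* 0 = INTA ... 3 = INTD */
};

/* Device number used for routing: always 0 when ARI forwarding is on. */
static unsigned int routing_device(uint8_t devfn, bool ari)
{
	return ari ? 0 : PCI_SLOT(devfn);
}

/*
 * Models the comparison at the top of acpi_pci_irq_check_entry().
 * @pin is 1-based (1 = INTA), while the _PRT pin is 0-based.
 */
static bool prt_matches(const struct prt_model *prt, uint8_t devfn,
			bool ari, unsigned int pin)
{
	unsigned int device = routing_device(devfn, ari);

	return ((prt->address >> 16) & 0xffff) == device &&
	       prt->pin + 1 == pin;
}

/* Bridge swizzle with the ARI rule applied; @pin is 1-based. */
static unsigned int swizzle_pin(uint8_t devfn, bool ari, unsigned int pin)
{
	assert(pin >= 1 && pin <= 4);
	return (((pin - 1) + routing_device(devfn, ari)) % 4) + 1;
}
```

In this model, a _PRT entry for device 0, INTB (address 0x0000FFFF, pin 1) matches devfn 0x09 (01.1) requesting INTB only when ARI is taken into account. Looking it up by the traditional slot number 1 finds no match, which corresponds to the "can't derive routing" failure quoted above.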

2015-05-26 20:06:56

by Donald Dutile

Subject: Re: [PATCH 2/2] ACPI / PCI: Account for ARI in _PRT lookups

On 05/26/2015 01:54 PM, Alex Williamson wrote:
> The PCIe specification, rev 3.0, section 2.2.8.1, contains the
> following implementation note:
>
> Virtual Wire Mapping for INTx Interrupts From ARI Devices
>
> The implied Device Number for an ARI Device is 0. When ARI-aware
> software (including BIOS and operating system) enables ARI
> Forwarding in the Downstream Port immediately above an ARI Device
> in order to access its Extended Functions, software must
> comprehend that the Downstream Port will use Device Number 0 for
> the virtual wire mappings of INTx interrupts coming from all
> Functions of the ARI Device. If non-ARI-aware software attempts
> to determine the virtual wire mappings for Extended Functions, it
> can come up with incorrect mappings by examining the traditional
> Device Number field and finding it to be non-0.
>
> We account for this in pci_swizzle_interrupt_pin(), but it looks like
> we miss it here, looking for a _PRT entry with a slot matching the
> ARI device slot number. This can cause errors like:
>
> pcieport 0000:80:03.0: can't derive routing for PCI INT B
> sfc 0000:82:01.1: PCI INT B: no GSI
>
> pci_dev.irq is then invalid, resulting in errors for drivers that
> attempt to enable INTx on the device. Fix by using slot 0 for ARI
> enabled devices.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/acpi/pci_irq.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> index b1def41..65e83cd 100644
> --- a/drivers/acpi/pci_irq.c
> +++ b/drivers/acpi/pci_irq.c
> @@ -163,7 +163,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
> {
> int segment = pci_domain_nr(dev->bus);
> int bus = dev->bus->number;
> - int device = PCI_SLOT(dev->devfn);
> + int device = pci_ari_enabled(dev->bus) ? 0 : PCI_SLOT(dev->devfn);
> struct acpi_prt_entry *entry;
>
> if (((prt->address >> 16) & 0xffff) != device ||
> @@ -181,7 +181,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
> */
> entry->id.segment = segment;
> entry->id.bus = bus;
> - entry->id.device = (prt->address >> 16) & 0xFFFF;
> + entry->id.device = PCI_SLOT(dev->devfn);
I would expect that this should be = device, not PCI_SLOT(dev->devfn),
esp if used by ACPI core, since it'll be expecting a swizzle from device 0,
per above spec.

Additionally, if you look at the beginning of this function, this check is performed:
if (((prt->address >> 16) & 0xffff) != device ||
prt->pin + 1 != pin)
return -ENODEV;

So, that implies you leave this assignment as is,
or set it to device -- six of one, half-dozen another.


> entry->pin = prt->pin + 1;
>
> do_prt_fixups(entry, prt);
>

2015-05-26 20:42:32

by Alex Williamson

Subject: Re: [PATCH 2/2] ACPI / PCI: Account for ARI in _PRT lookups

On Tue, 2015-05-26 at 16:06 -0400, Don Dutile wrote:
> On 05/26/2015 01:54 PM, Alex Williamson wrote:
> > The PCIe specification, rev 3.0, section 2.2.8.1, contains the
> > following implementation note:
> >
> > Virtual Wire Mapping for INTx Interrupts From ARI Devices
> >
> > The implied Device Number for an ARI Device is 0. When ARI-aware
> > software (including BIOS and operating system) enables ARI
> > Forwarding in the Downstream Port immediately above an ARI Device
> > in order to access its Extended Functions, software must
> > comprehend that the Downstream Port will use Device Number 0 for
> > the virtual wire mappings of INTx interrupts coming from all
> > Functions of the ARI Device. If non-ARI-aware software attempts
> > to determine the virtual wire mappings for Extended Functions, it
> > can come up with incorrect mappings by examining the traditional
> > Device Number field and finding it to be non-0.
> >
> > We account for this in pci_swizzle_interrupt_pin(), but it looks like
> > we miss it here, looking for a _PRT entry with a slot matching the
> > ARI device slot number. This can cause errors like:
> >
> > pcieport 0000:80:03.0: can't derive routing for PCI INT B
> > sfc 0000:82:01.1: PCI INT B: no GSI
> >
> > pci_dev.irq is then invalid, resulting in errors for drivers that
> > attempt to enable INTx on the device. Fix by using slot 0 for ARI
> > enabled devices.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > drivers/acpi/pci_irq.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
> > index b1def41..65e83cd 100644
> > --- a/drivers/acpi/pci_irq.c
> > +++ b/drivers/acpi/pci_irq.c
> > @@ -163,7 +163,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
> > {
> > int segment = pci_domain_nr(dev->bus);
> > int bus = dev->bus->number;
> > - int device = PCI_SLOT(dev->devfn);
> > + int device = pci_ari_enabled(dev->bus) ? 0 : PCI_SLOT(dev->devfn);
> > struct acpi_prt_entry *entry;
> >
> > if (((prt->address >> 16) & 0xffff) != device ||
> > @@ -181,7 +181,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
> > */
> > entry->id.segment = segment;
> > entry->id.bus = bus;
> > - entry->id.device = (prt->address >> 16) & 0xFFFF;
> > + entry->id.device = PCI_SLOT(dev->devfn);
> I would expect that this should be = device, not PCI_SLOT(dev->devfn),
> esp if used by ACPI core, since it'll be expecting a swizzle from device 0,
> per above spec.

But it's not used by ACPI core.

> Additionally, if you look at the beginning of this function, this check is performed:
> if (((prt->address >> 16) & 0xffff) != device ||
> prt->pin + 1 != pin)
> return -ENODEV;
>
> So, that implies you leave this assignment as is,
> or set it to device -- six of one, half-dozen another.

TBH, I didn't really know what to do with this field. struct
acpi_prt_entry is defined locally to this file, so we're not passing it
out to ACPI core for anything. The only consumer of entry.id in this
call path is the debug print at the bottom of the function:

ACPI_DEBUG_PRINT_RAW((ACPI_DB_INFO,
" %04x:%02x:%02x[%c] -> %s[%d]\n",
entry->id.segment, entry->id.bus,
entry->id.device, pin_name(entry->pin),
prt->source, entry->index));

Which is the reason I chose to use the value that I did, because using
'device', aka 0, in the ARI path would be confusing.

I think that the only reason entry.id exists is for the fixup code in
this file. I'm happy to leave it as 'device' or the original
'(prt->address >> 16) & 0xFFFF', but what I have feels more correct for
the debug printk if nothing else. Thanks,

Alex

> > entry->pin = prt->pin + 1;
> >
> > do_prt_fixups(entry, prt);
> >


2015-05-26 20:58:56

by Donald Dutile

Subject: Re: [PATCH 2/2] ACPI / PCI: Account for ARI in _PRT lookups

On 05/26/2015 04:42 PM, Alex Williamson wrote:
> On Tue, 2015-05-26 at 16:06 -0400, Don Dutile wrote:
>> On 05/26/2015 01:54 PM, Alex Williamson wrote:
>>> The PCIe specification, rev 3.0, section 2.2.8.1, contains the
>>> following implementation note:
>>>
>>> Virtual Wire Mapping for INTx Interrupts From ARI Devices
>>>
>>> The implied Device Number for an ARI Device is 0. When ARI-aware
>>> software (including BIOS and operating system) enables ARI
>>> Forwarding in the Downstream Port immediately above an ARI Device
>>> in order to access its Extended Functions, software must
>>> comprehend that the Downstream Port will use Device Number 0 for
>>> the virtual wire mappings of INTx interrupts coming from all
>>> Functions of the ARI Device. If non-ARI-aware software attempts
>>> to determine the virtual wire mappings for Extended Functions, it
>>> can come up with incorrect mappings by examining the traditional
>>> Device Number field and finding it to be non-0.
>>>
>>> We account for this in pci_swizzle_interrupt_pin(), but it looks like
>>> we miss it here, looking for a _PRT entry with a slot matching the
>>> ARI device slot number. This can cause errors like:
>>>
>>> pcieport 0000:80:03.0: can't derive routing for PCI INT B
>>> sfc 0000:82:01.1: PCI INT B: no GSI
>>>
>>> pci_dev.irq is then invalid, resulting in errors for drivers that
>>> attempt to enable INTx on the device. Fix by using slot 0 for ARI
>>> enabled devices.
>>>
>>> Signed-off-by: Alex Williamson <[email protected]>
>>> ---
>>> drivers/acpi/pci_irq.c | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/acpi/pci_irq.c b/drivers/acpi/pci_irq.c
>>> index b1def41..65e83cd 100644
>>> --- a/drivers/acpi/pci_irq.c
>>> +++ b/drivers/acpi/pci_irq.c
>>> @@ -163,7 +163,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
>>> {
>>> int segment = pci_domain_nr(dev->bus);
>>> int bus = dev->bus->number;
>>> - int device = PCI_SLOT(dev->devfn);
>>> + int device = pci_ari_enabled(dev->bus) ? 0 : PCI_SLOT(dev->devfn);
>>> struct acpi_prt_entry *entry;
>>>
>>> if (((prt->address >> 16) & 0xffff) != device ||
>>> @@ -181,7 +181,7 @@ static int acpi_pci_irq_check_entry(acpi_handle handle, struct pci_dev *dev,
>>> */
>>> entry->id.segment = segment;
>>> entry->id.bus = bus;
>>> - entry->id.device = (prt->address >> 16) & 0xFFFF;
>>> + entry->id.device = PCI_SLOT(dev->devfn);
>> I would expect that this should be = device, not PCI_SLOT(dev->devfn),
>> esp if used by ACPI core, since it'll be expecting a swizzle from device 0,
>> per above spec.
>
> But it's not used by ACPI core.
>
>> Additionally, if you look at the beginning of this function, this check is performed:
>> if (((prt->address >> 16) & 0xffff) != device ||
>> prt->pin + 1 != pin)
>> return -ENODEV;
>>
>> So, that implies you leave this assignment as is,
>> or set it to device -- six of one, half-dozen another.
>
> TBH, I didn't really know what to do with this field. struct
> acpi_prt_entry is defined locally to this file, so we're not passing it
> out to ACPI core for anything. The only consumer of entry.id in this
> call path is the debug print at the bottom of the function:
Well, do_prt_fixups(entry, prt) is called right after this assignment,
and although your patch leaves id.device in the state the current
fixup list needs in order to match, that may not hold later.

I'd leave id.device as it was, set to (prt->address >> 16) & 0xFFFF,
for proper matching all around.

If you want the debug print to change, patch it to use PCI_SLOT(dev->devfn) instead of entry->id.device.
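
A rough userspace sketch of that alternative, assuming the entry keeps the _PRT-derived device number and only the debug output reports the real slot. struct prt_entry_model, pin_letter() and format_prt_debug() are hypothetical stand-ins for illustration; the kernel uses ACPI_DEBUG_PRINT_RAW() and the file-local struct acpi_prt_entry instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_SLOT(devfn)	(((devfn) >> 3) & 0x1f)

/* Simplified stand-in for the file-local struct acpi_prt_entry. */
struct prt_entry_model {
	unsigned int segment;
	unsigned int bus;
	unsigned int device;	/* stays as derived from the _PRT address */
	unsigned int pin;	/* 1 = INTA ... 4 = INTD */
	unsigned int index;
};

/* Map a 1-based pin number to its letter. */
static char pin_letter(unsigned int pin)
{
	assert(pin >= 1 && pin <= 4);
	return (char)('A' + pin - 1);
}

/*
 * Format the debug line using the device's real slot number rather
 * than e->device, so ARI functions do not all show up as slot 0.
 */
static int format_prt_debug(char *buf, size_t len,
			    const struct prt_entry_model *e,
			    uint8_t devfn, const char *source)
{
	return snprintf(buf, len, " %04x:%02x:%02x[%c] -> %s[%u]\n",
			e->segment, e->bus, (unsigned int)PCI_SLOT(devfn),
			pin_letter(e->pin), source, e->index);
}
```

With this split, the fixup matching would still see device 0 for an ARI function, while the log line would show the slot from devfn. The source name passed in is whatever the _PRT entry reports; any value used when trying this out is made up.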


>
> ACPI_DEBUG_PRINT_RAW((ACPI_DB_INFO,
> " %04x:%02x:%02x[%c] -> %s[%d]\n",
> entry->id.segment, entry->id.bus,
> entry->id.device, pin_name(entry->pin),
> prt->source, entry->index));
>
> Which is the reason I chose to use the value that I did, because using
> 'device', aka 0, in the ARI path would be confusing.
>
> I think that the only reason entry.id exists is for the fixup code in
> this file. I'm happy to leave it as 'device' or the original
> '(prt->address >> 16) & 0xFFFF', but what I have feels more correct for
> the debug printk if nothing else. Thanks,
>
> Alex
>
>>> entry->pin = prt->pin + 1;
>>>
>>> do_prt_fixups(entry, prt);
>>>