2018-08-09 19:38:41

by Ashok Raj

[permalink] [raw]
Subject: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.

Some SRIOV devices have some bugs in RTL and VF's end up reading 1
instead of 0 for the PIN.

Since this is a spec required value, rather than having a device specific
quirk, we could fix it permanently in vfio.

Reworked suggestions from Alex https://lkml.org/lkml/2018/7/16/1052

Reported-by: Gage Eads <[email protected]>
Tested-by: Gage Eads <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Joerg Roedel <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Gage Eads <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 12 +++++++++---
drivers/vfio/pci/vfio_pci_config.c | 3 ++-
2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index b423a30..32943dd 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -433,10 +433,16 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
{
if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
u8 pin;
- pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
- if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
- return 1;
+ /*
+ * INTx must be 0 for all VF's. Enforce that for all
+ * VF's since this is a spec requirement.
+ */
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
+ vdev->pdev->is_virtfn)
+ return 0;

+ pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
+ return (pin ? 1 : 0);
} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
u8 pos;
u16 flags;
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 115a36f..e36b7c3 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1676,7 +1676,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
}

- if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
+ pdev->is_virtfn)
vconfig[PCI_INTERRUPT_PIN] = 0;

ret = vfio_cap_init(vdev);
--
2.7.4



2018-08-09 19:45:39

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Thu, 9 Aug 2018 12:37:06 -0700
Ashok Raj <[email protected]> wrote:

> PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.
>
> Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> instead of 0 for the PIN.

Hi Ashok,

One question, can we identify which VFs are known to have this issue so
that users (and downstreams) can know how to prioritize this patch?
Thanks,

Alex

> Since this is a spec required value, rather than having a device specific
> quirk, we could fix it permanently in vfio.
>
> Reworked suggestions from Alex https://lkml.org/lkml/2018/7/16/1052
>
> Reported-by: Gage Eads <[email protected]>
> Tested-by: Gage Eads <[email protected]>
> Signed-off-by: Ashok Raj <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]
> Cc: Joerg Roedel <[email protected]>
> Cc: Bjorn Helgaas <[email protected]>
> Cc: Gage Eads <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci.c | 12 +++++++++---
> drivers/vfio/pci/vfio_pci_config.c | 3 ++-
> 2 files changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index b423a30..32943dd 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -433,10 +433,16 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> {
> if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> u8 pin;
> - pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> - if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
> - return 1;
> + /*
> + * INTx must be 0 for all VF's. Enforce that for all
> + * VF's since this is a spec requirement.
> + */
> + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> + vdev->pdev->is_virtfn)
> + return 0;
>
> + pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> + return (pin ? 1 : 0);
> } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> u8 pos;
> u16 flags;
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 115a36f..e36b7c3 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -1676,7 +1676,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
> *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
> }
>
> - if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
> + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> + pdev->is_virtfn)
> vconfig[PCI_INTERRUPT_PIN] = 0;
>
> ret = vfio_cap_init(vdev);


2018-08-09 23:06:35

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Thu, Aug 09, 2018 at 01:44:17PM -0600, Alex Williamson wrote:
> On Thu, 9 Aug 2018 12:37:06 -0700
> Ashok Raj <[email protected]> wrote:
>
> > PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.
> >
> > Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> > instead of 0 for the PIN.
>
> Hi Ashok,
>
> One question, can we identify which VFs are known to have this issue so
> that users (and downstreams) can know how to prioritize this patch?
> Thanks,

Hi Alex,

The hardware isn't public yet, so can't talk about it :-(. Once this patch gets
merged, will let the OSV engagement folks drive it for inclusions. We
could mark this for stable, but i would rather wait until we know the
timeline when they are expecting it to be in. It shouldn't break anything
since we are just enforcing the spec.

Cheers,
Ashok

2018-08-10 16:50:35

by Alan Cox

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

> The hardware isn't public yet, so can't talk about it :-(. Once this patch gets
> merged, will let the OSV engagement folks drive it for inclusions. We
> could mark this for stable, but i would rather wait until we know the
> timeline when they are expecting it to be in. It shouldn't break anything
> since we are just enforcing the spec.

Until a new better spec appears...

I know there is always fun when it comes to the people involved in
such a screwup having to admit it in public but this should IMHO be
tied to a PCI identifier table so that we know what the afflicted
hardware is. Otherwise some day in the future SRIOV will grow new features
and we'll have no idea what particular devices we need to narrow the
workaround too and indeed not necessarily even know this device is the
only one, as we'll silently fix other stuff then have it break on us
later.

Alan

2018-08-10 16:55:36

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Fri, Aug 10, 2018 at 05:48:36PM +0100, Alan Cox wrote:
> > The hardware isn't public yet, so can't talk about it :-(. Once this patch gets
> > merged, will let the OSV engagement folks drive it for inclusions. We
> > could mark this for stable, but i would rather wait until we know the
> > timeline when they are expecting it to be in. It shouldn't break anything
> > since we are just enforcing the spec.
>
> Until a new better spec appears...

Well, here we are talking about PIN interrupts and VF's. IMO it would be
weird to allow Virtual functions and physical interrupts other than MSIx.

There are other things in the PCIe spec which could be enforced in SW
or expect devices to implement them.

What might be ok is probably doing the right thing in SW as done in this
patch, additionaly maybe we can flag them and say "Your device is busted"
message.

>
> I know there is always fun when it comes to the people involved in
> such a screwup having to admit it in public but this should IMHO be
> tied to a PCI identifier table so that we know what the afflicted
> hardware is. Otherwise some day in the future SRIOV will grow new features
> and we'll have no idea what particular devices we need to narrow the
> workaround too and indeed not necessarily even know this device is the
> only one, as we'll silently fix other stuff then have it break on us
> later.
>
> Alan

2018-09-12 17:49:53

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Thu, Aug 09, 2018 at 01:44:17PM -0600, Alex Williamson wrote:
> On Thu, 9 Aug 2018 12:37:06 -0700
> Ashok Raj <[email protected]> wrote:
>
> > PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual Functions.
> >
> > Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> > instead of 0 for the PIN.
>
> Hi Ashok,
>
> One question, can we identify which VFs are known to have this issue so
> that users (and downstreams) can know how to prioritize this patch?

Hi Alex

Sorry it took some time to hunt this down.

The offending VF has a device ID : 0x270C
The corresponding PF has a device ID: 0x270B.

> Thanks,
>
> Alex
>
> > Since this is a spec required value, rather than having a device specific
> > quirk, we could fix it permanently in vfio.
> >
> > Reworked suggestions from Alex https://lkml.org/lkml/2018/7/16/1052
> >
> > Reported-by: Gage Eads <[email protected]>
> > Tested-by: Gage Eads <[email protected]>
> > Signed-off-by: Ashok Raj <[email protected]>
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: [email protected]
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Bjorn Helgaas <[email protected]>
> > Cc: Gage Eads <[email protected]>
> > ---
> > drivers/vfio/pci/vfio_pci.c | 12 +++++++++---
> > drivers/vfio/pci/vfio_pci_config.c | 3 ++-
> > 2 files changed, 11 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> > index b423a30..32943dd 100644
> > --- a/drivers/vfio/pci/vfio_pci.c
> > +++ b/drivers/vfio/pci/vfio_pci.c
> > @@ -433,10 +433,16 @@ static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> > {
> > if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
> > u8 pin;
> > - pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > - if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
> > - return 1;
> > + /*
> > + * INTx must be 0 for all VF's. Enforce that for all
> > + * VF's since this is a spec requirement.
> > + */
> > + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > + vdev->pdev->is_virtfn)
> > + return 0;
> >
> > + pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
> > + return (pin ? 1 : 0);
> > } else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
> > u8 pos;
> > u16 flags;
> > diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> > index 115a36f..e36b7c3 100644
> > --- a/drivers/vfio/pci/vfio_pci_config.c
> > +++ b/drivers/vfio/pci/vfio_pci_config.c
> > @@ -1676,7 +1676,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
> > *(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
> > }
> >
> > - if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
> > + if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
> > + pdev->is_virtfn)
> > vconfig[PCI_INTERRUPT_PIN] = 0;
> >
> > ret = vfio_cap_init(vdev);
>

2018-09-19 04:00:26

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Wed, 12 Sep 2018 10:46:19 -0700
"Raj, Ashok" <[email protected]> wrote:

> On Thu, Aug 09, 2018 at 01:44:17PM -0600, Alex Williamson wrote:
> > On Thu, 9 Aug 2018 12:37:06 -0700
> > Ashok Raj <[email protected]> wrote:
> >
> > > PCI_INTERRUPT_PIN should always read 0 for SRIOV Virtual
> > > Functions.
> > >
> > > Some SRIOV devices have some bugs in RTL and VF's end up reading 1
> > > instead of 0 for the PIN.
> >
> > Hi Ashok,
> >
> > One question, can we identify which VFs are known to have this
> > issue so that users (and downstreams) can know how to prioritize
> > this patch?
>
> Hi Alex
>
> Sorry it took some time to hunt this down.
>
> The offending VF has a device ID : 0x270C
> The corresponding PF has a device ID: 0x270B.

Ok, I interpret Alan's previous comment about the patch[1] to suggest a
structure a bit more like that below. IOW, we know that 8086:270c
inspires this change, but once it's included we won't know who else
relies on it. We can perhaps encourage better hardware validation, or
at least better tracking of who needs this with a warning and
whitelist. Testing, especially positive and negative testing against
the warning, and reviews welcome. Thanks,

Alex

[1]https://lkml.org/lkml/2018/8/10/462

commit d780da26a81c6f47522ae0aeff03abd4d08b89b5
Author: Alex Williamson <[email protected]>
Date: Tue Sep 18 21:27:57 2018 -0600

vfio/pci: Mask buggy SR-IOV VF INTx support

The SR-IOV spec requires that VFs must report zero for the INTx pin
register as VFs are precluded from INTx support. It's much easier for
the host kernel to understand whether a device is a VF and therefore
whether a non-zero pin register value is bogus than it is to do the
same in userspace. Override the INTx count for such devices and
virtualize the pin register to provide a consistent view of the device
to the user.

As this is clearly a spec violation, warn about it to support hardware
validation, but also provide a known whitelist as it doesn't do much
good to continue complaining if the hardware vendor doesn't plan to
fix it.

Known devices with this issue: 8086:270c

Signed-off-by: Alex Williamson <[email protected]>

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index cddb453a1ba5..8af3f6f35f32 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -430,14 +430,41 @@ static int vfio_pci_open(void *device_data)
return ret;
}

+static const struct pci_device_id known_bogus_vf_intx_pin[] = {
+ { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x270c) },
+ {}
+};
+
static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
{
if (irq_type == VFIO_PCI_INTX_IRQ_INDEX) {
u8 pin;
+
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
+ return 0;
+
pci_read_config_byte(vdev->pdev, PCI_INTERRUPT_PIN, &pin);
- if (IS_ENABLED(CONFIG_VFIO_PCI_INTX) && !vdev->nointx && pin)
- return 1;

+ /*
+ * Per SR-IOV spec rev 1.1, 3.4.1.18 the interrupt pin register
+ * does not apply to VFs and VFs must implement this register
+ * as read-only with value zero. Userspace is not readily
+ * able to identify a device as a VF and thus that the pin
+ * definition on the device is bogus should a device violate
+ * this requirement. For such devices, override the bogus
+ * value and provide a warning to support hardware validation
+ * (or be quite if it's known). PCI config space emulation
+ * will virtualize this register for all VFs.
+ */
+ if (pin && vdev->pdev->is_virtfn) {
+ if (!pci_match_id(known_bogus_vf_intx_pin, vdev->pdev))
+ dev_warn_once(&vdev->pdev->dev,
+ "VF reports bogus INTx pin %d\n",
+ pin);
+ return 0;
+ }
+
+ return pin ? 1 : 0;
} else if (irq_type == VFIO_PCI_MSI_IRQ_INDEX) {
u8 pos;
u16 flags;
diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
index 62023b4a373b..25130fa6e265 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -1678,7 +1678,8 @@ int vfio_config_init(struct vfio_pci_device *vdev)
*(__le16 *)&vconfig[PCI_DEVICE_ID] = cpu_to_le16(pdev->device);
}

- if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx)
+ if (!IS_ENABLED(CONFIG_VFIO_PCI_INTX) || vdev->nointx ||
+ pdev->is_virtfn)
vconfig[PCI_INTERRUPT_PIN] = 0;

ret = vfio_cap_init(vdev);

2018-09-19 22:27:18

by Eads, Gage

[permalink] [raw]
Subject: RE: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

> This looks good and also addresses Alan's concern that don't silently hide under
> the rug for all devices. We'll also queue it for testing just to confirm and keep
> you posted.
>
> Reviewed-by: Ashok Raj <[email protected]>
>

Hi Alex,

I've confirmed that the patch works as intended for the 8086:270c device, and negative tested the warning by commenting out the device's entry in known_bogus_vf_intx_pin.

For what it's worth, my positive test case - launching a QEMU VM with a VFIO-owned 0x270c device - did not trigger the warning. This is because the change to vfio_pci_config.c caused QEMU to read a PCI_INTERRUPT_PIN value of 0, and so didn't execute a codepath that calls vfio_pci_get_irq_count(). Instead, I used a simple C program that calls the VFIO_DEVICE_GET_IRQ_INFO ioctl for negative testing.

Also, there's one typo in the comment: 'quite' -> 'quiet'

Tested-by: Gage Eads <[email protected]>

Thanks,
Gage

2018-09-19 22:52:03

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH] vfio/pci: Some buggy virtual functions incorrectly report 1 for intx.

On Wed, 19 Sep 2018 22:25:03 +0000
"Eads, Gage" <[email protected]> wrote:

> > This looks good and also addresses Alan's concern that don't
> > silently hide under the rug for all devices. We'll also queue it
> > for testing just to confirm and keep you posted.
> >
> > Reviewed-by: Ashok Raj <[email protected]>
> >
>
> Hi Alex,
>
> I've confirmed that the patch works as intended for the 8086:270c
> device, and negative tested the warning by commenting out the
> device's entry in known_bogus_vf_intx_pin.
>
> For what it's worth, my positive test case - launching a QEMU VM with
> a VFIO-owned 0x270c device - did not trigger the warning. This is
> because the change to vfio_pci_config.c caused QEMU to read a
> PCI_INTERRUPT_PIN value of 0, and so didn't execute a codepath that
> calls vfio_pci_get_irq_count(). Instead, I used a simple C program
> that calls the VFIO_DEVICE_GET_IRQ_INFO ioctl for negative testing.
>
> Also, there's one typo in the comment: 'quite' -> 'quiet'
>
> Tested-by: Gage Eads <[email protected]>

Thanks for the thorough testing, Gage! So it sounds like we're not
quite generating the desired result. We'd really like QEMU, as the
predominant userspace driver for VFIO, to generate the warning,
otherwise there's hardly any purpose to having it. That suggests I
should have put the warning where the config space emulation is setup
rather than the irq count path. Thanks,

Alex