2017-04-07 14:32:32

by Joerg Roedel

Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

From: Joerg Roedel <[email protected]>

ATS is broken on this hardware and causes IOMMU stalls and
system failure. Disable ATS on these devices to make them
usable again with IOMMU enabled.

Note that the commit in the Fixes-tag is not buggy, it
just uncovers the problem in the hardware by increasing
the ATS-flush rate.

Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
Signed-off-by: Joerg Roedel <[email protected]>
---
drivers/pci/quirks.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 6736836..7cbe316 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
+
+#ifdef CONFIG_PCI_ATS
+/*
+ * Some devices have a broken ATS implementation causing IOMMU stalls.
+ * Don't use ATS for those devices.
+ */
+static void quirk_disable_ats(struct pci_dev *pdev)
+{
+	/*
+	 * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
+	 * early.
+	 */
+	dev_info(&pdev->dev, "QUIRK: Disabling ATS");
+	pdev->ats_cap = 0;
+}
+
+/* AMD Stoney platform GPU */
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
+#endif /* CONFIG_PCI_ATS */
--
1.9.1
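
The mechanism the patch relies on can be sketched in plain user-space C. This is only a model under stated assumptions, not kernel code: the `model_*` structs and functions are hypothetical stand-ins for `struct pci_dev`, the `DECLARE_PCI_FIXUP_FINAL()` table and `pci_enable_ats()`, and the `-EINVAL` return value is an assumption about how the early bail-out is reported. Only the vendor/device pair (ATI, 0x98e4) and the `ats_cap = 0` assignment come from the patch itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_VENDOR_ID_ATI 0x1002	/* value of PCI_VENDOR_ID_ATI */

/* Hypothetical, heavily reduced stand-in for struct pci_dev. */
struct model_pci_dev {
	uint16_t vendor;
	uint16_t device;
	uint16_t ats_cap;	/* offset of the ATS capability, 0 if none */
	int ats_enabled;
};

/* Hypothetical stand-in for one entry of the final-fixup table. */
struct model_fixup {
	uint16_t vendor;
	uint16_t device;
	void (*hook)(struct model_pci_dev *dev);
};

/* Same idea as quirk_disable_ats(): hide the capability from later code. */
static void model_quirk_disable_ats(struct model_pci_dev *dev)
{
	dev->ats_cap = 0;
}

static const struct model_fixup model_final_fixups[] = {
	{ MODEL_VENDOR_ID_ATI, 0x98e4, model_quirk_disable_ats },
};

/* Run every fixup whose vendor/device pair matches the device. */
void model_run_final_fixups(struct model_pci_dev *dev)
{
	size_t n = sizeof(model_final_fixups) / sizeof(model_final_fixups[0]);

	assert(dev != NULL);
	for (size_t i = 0; i < n; i++) {
		const struct model_fixup *f = &model_final_fixups[i];

		if (f->vendor == dev->vendor && f->device == dev->device)
			f->hook(dev);
	}
}

/* Model of the early bail-out: no ATS capability, no ATS. */
int model_enable_ats(struct model_pci_dev *dev)
{
	assert(dev != NULL);
	if (!dev->ats_cap)
		return -EINVAL;	/* assumed error code for illustration */
	dev->ats_enabled = 1;
	return 0;
}
```

In this model a device with IDs 0x1002:0x98e4 loses its `ats_cap` once the fixups have run, so a later `model_enable_ats()` call fails and ATS stays off, while any other device is left untouched.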


2017-04-07 16:46:51

by Deucher, Alexander

Subject: RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Friday, April 07, 2017 10:32 AM
> To: Bjorn Helgaas
> Cc: [email protected]; [email protected]; Daniel Drake;
> Deucher, Alexander; Samuel Sieb; David Woodhouse; Joerg Roedel
> Subject: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>
> From: Joerg Roedel <[email protected]>
>
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
>
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
>
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
> drivers/pci/quirks.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> + /*
> + * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> + * early.
> + */
> + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> + pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1


2017-04-08 07:40:59

by Lukas Wunner

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.

AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
to the GPU, right?

In that case the quirk should go to arch/x86. Paul Menzel (+cc)
has just complained on linux-pci@ that final fixups are taking half
a second, and I think that could be reduced if more efforts were
spent to move arch-specific quirks out of the catch-all in
drivers/pci/quirks.c.

Thanks,

Lukas

>
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
>
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> drivers/pci/quirks.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> + /*
> + * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> + * early.
> + */
> + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> + pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1
>

2017-04-20 12:11:50

by Jörg Rödel

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Sat, Apr 08, 2017 at 09:41:07AM +0200, Lukas Wunner wrote:
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <[email protected]>
> >
> > ATS is broken on this hardware and causes IOMMU stalls and
> > system failure. Disable ATS on these devices to make them
> > usable again with IOMMU enabled.
>
> AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
> to the GPU, right?
>
> In that case the quirk should go to arch/x86. Paul Menzel (+cc)
> has just complained on linux-pci@ that final fixups are taking half
> a second, and I think that could be reduced if more efforts were
> spent to move arch-specific quirks out of the catch-all in
> drivers/pci/quirks.c.

The affected hardware here might be x86-only, but ATS is not. If a
broken ATS-capable plug-in card appears, we need this in generic code
anyway.

Also has anyone profiled why the fixups take so long (and on what
hardware)? Maybe the fixup-device matching can be improved instead of
cluttering arch-code with pci-fixups.


Joerg
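
The question about fixup-device matching can be made concrete with a small user-space C model. It assumes, for illustration only, that matching is a linear walk over the whole fixup table for every device; the `model_*` names and the `MODEL_ANY_ID` wildcard are hypothetical, and the sketch counts comparisons rather than measuring any real boot time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_ANY_ID 0xffff	/* hypothetical wildcard, matches any ID */

struct model_id {
	uint16_t vendor;
	uint16_t device;
};

struct model_scan_stats {
	size_t entries_examined;	/* table entries compared in total */
	size_t hooks_matched;		/* entries whose hook would run */
};

/* Walk the full table once per device and count the work done. */
struct model_scan_stats model_scan(const struct model_id *table,
				   size_t n_entries,
				   const struct model_id *devs,
				   size_t n_devs)
{
	struct model_scan_stats s = { 0, 0 };

	assert(table != NULL && devs != NULL);
	for (size_t d = 0; d < n_devs; d++) {
		for (size_t e = 0; e < n_entries; e++) {
			int vendor_ok = table[e].vendor == devs[d].vendor ||
					table[e].vendor == MODEL_ANY_ID;
			int device_ok = table[e].device == devs[d].device ||
					table[e].device == MODEL_ANY_ID;

			s.entries_examined++;
			if (vendor_ok && device_ok)
				s.hooks_matched++;
		}
	}
	return s;
}
```

Under this assumption the number of comparisons is the number of devices times the number of table entries, independent of how many hooks match. Whether that product or the matched hooks themselves account for the reported half second is exactly what profiling would have to show.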

2017-06-15 14:04:30

by Jörg Rödel

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

Hi Bjorn,

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
>
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
>
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <[email protected]>
> ---
> drivers/pci/quirks.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)

Any more objections on this patch? Please let me know if you want to
have something changed.


Regards,

Joerg

2017-06-15 17:02:04

by Samuel Sieb

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On 06/15/2017 07:04 AM, Joerg Roedel wrote:
> Hi Bjorn,
>
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel <[email protected]>
>>
>> ATS is broken on this hardware and causes IOMMU stalls and
>> system failure. Disable ATS on these devices to make them
>> usable again with IOMMU enabled.
>>
>> Note that the commit in the Fixes-tag is not buggy, it
>> just uncovers the problem in the hardware by increasing
>> the ATS-flush rate.
>>
>> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
>> Signed-off-by: Joerg Roedel <[email protected]>
>> ---
>> drivers/pci/quirks.c | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>
> Any more objections on this patch? Please let me know if you want to
> have something changed.

The other patch seems to fix this issue without disabling ATS. Isn't
that better?

2017-06-15 17:12:14

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Thu, Apr 20, 2017 at 02:11:42PM +0200, Joerg Roedel wrote:
> On Sat, Apr 08, 2017 at 09:41:07AM +0200, Lukas Wunner wrote:
> > On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > > From: Joerg Roedel <[email protected]>
> > >
> > > ATS is broken on this hardware and causes IOMMU stalls and
> > > system failure. Disable ATS on these devices to make them
> > > usable again with IOMMU enabled.
> >
> > AMD Stoney Ridge is an x86 CPU + GPU combo and this quirk pertains
> > to the GPU, right?
> >
> > In that case the quirk should go to arch/x86. Paul Menzel (+cc)
> > has just complained on linux-pci@ that final fixups are taking half
> > a second, and I think that could be reduced if more efforts were
> > spent to move arch-specific quirks out of the catch-all in
> > drivers/pci/quirks.c.
>
> The affected hardware here might be x86-only, but ATS is not. If a
> broken ATS-capable plug-in card appears, we need this in generic code
> anyway.

It could go in either arch/x86/pci/fixup.c or drivers/pci/quirks.c.

It's not clear to me exactly what the hardware defect is or where it
is. If it's in the CPU or in a GPU that can only be found on x86, I
think arch/x86/pci/fixup.c is the appropriate place.

If it's in a GPU that could be found on other arches,
drivers/pci/quirks.c would be the appropriate place.

I don't personally think the possibility of a plugin card with broken
ATS is a real reason to put this quirk in drivers/pci/quirks.c. It's
a trivial patch and easy to copy or move later if we need to.

Bjorn

2017-06-15 18:14:00

by Deucher, Alexander

Subject: RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

> -----Original Message-----
> From: Samuel Sieb [mailto:[email protected]]
> Sent: Thursday, June 15, 2017 1:02 PM
> To: Joerg Roedel; Bjorn Helgaas
> Cc: [email protected]; [email protected]; Daniel Drake;
> Deucher, Alexander; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>
> On 06/15/2017 07:04 AM, Joerg Roedel wrote:
> > Hi Bjorn,
> >
> > On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> >> From: Joerg Roedel <[email protected]>
> >>
> >> ATS is broken on this hardware and causes IOMMU stalls and
> >> system failure. Disable ATS on these devices to make them
> >> usable again with IOMMU enabled.
> >>
> >> Note that the commit in the Fixes-tag is not buggy, it
> >> just uncovers the problem in the hardware by increasing
> >> the ATS-flush rate.
> >>
> >> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> >> Signed-off-by: Joerg Roedel <[email protected]>
> >> ---
> >> drivers/pci/quirks.c | 19 +++++++++++++++++++
> >> 1 file changed, 19 insertions(+)
> >
> > Any more objections on this patch? Please let me know if you want to
> > have something changed.
>
> The other patch seems to fix this issue without disabling ATS. Isn't
> that better?

I talked to our validation team and ATS was validated on Stoney, so this patch is just working around something else. The other patch fixes it and is a valid optimization (it should be applied eventually), but apparently the current behavior is allowed even if it's not optimal. I'm not really an ATS expert.

Alex


2017-06-15 19:15:49

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Thu, Jun 15, 2017 at 04:04:21PM +0200, Joerg Roedel wrote:
> Hi Bjorn,
>
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> > From: Joerg Roedel <[email protected]>
> >
> > ATS is broken on this hardware and causes IOMMU stalls and
> > system failure. Disable ATS on these devices to make them
> > usable again with IOMMU enabled.
> >
> > Note that the commit in the Fixes-tag is not buggy, it
> > just uncovers the problem in the hardware by increasing
> > the ATS-flush rate.
> >
> > Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> > Signed-off-by: Joerg Roedel <[email protected]>
> > ---
> > drivers/pci/quirks.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
>
> Any more objections on this patch? Please let me know if you want to
> have something changed.

It was marked "superseded" in patchwork and thus off my radar. I
don't remember if I did that or why. I changed it back to "New" so I
won't forget about it.

You mention (May 24) the original bug report. Can you include the URL
for that?

I admit I just don't have warm fuzzies that the problem is well
understood.

Bjorn

2017-06-16 16:29:28

by Jörg Rödel

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

Hi Bjorn,

On Thu, Jun 15, 2017 at 02:15:45PM -0500, Bjorn Helgaas wrote:
> It was marked "superseded" in patchwork and thus off my radar. I
> don't remember if I did that or why. I changed it back to "New" so I
> won't forget about it.

Great!

> You mention (May 24) the original bug report. Can you include the URL
> for that?

I think there were multiple reports, here is one I could still find:

https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020836.html

> I admit I just don't have warm fuzzies that the problem is well
> understood.

The current understanding (without my ability to debug the hardware
involved) is that the GPU in the Stoney systems gets into a weird state
when ATS invalidations are sent too fast and stops responding to the
iommu.

The iommu then can't complete the invalidation commands and the driver
throws completion-wait loop timeout messages out.



Joerg
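
The failure mode described above, a device that stops acknowledging invalidations so that the wait for completion runs into its timeout, can be sketched as a bounded polling loop. This is a user-space model under assumptions: `model_device`, `model_completion_wait()`, the poll budget `MODEL_LOOP_TIMEOUT` and the `-EIO` return value are all hypothetical and do not reproduce the AMD IOMMU driver's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LOOP_TIMEOUT 100000UL	/* hypothetical poll budget */

struct model_device {
	bool responsive;	/* false models the stalled GPU */
	unsigned int pending;	/* invalidations not yet acknowledged */
};

/* One poll step: a responsive device acknowledges one invalidation. */
static void model_device_step(struct model_device *dev)
{
	if (dev->responsive && dev->pending > 0)
		dev->pending--;
}

/*
 * Poll until every invalidation is acknowledged or the budget is used up.
 * Returns 0 on completion and -EIO on timeout; *polls receives the number
 * of poll steps that were needed.
 */
int model_completion_wait(struct model_device *dev, unsigned long *polls)
{
	assert(dev != NULL && polls != NULL);
	for (*polls = 0; *polls < MODEL_LOOP_TIMEOUT; (*polls)++) {
		if (dev->pending == 0)
			return 0;
		model_device_step(dev);
	}
	return dev->pending == 0 ? 0 : -EIO;
}
```

A responsive device drains its pending invalidations after a few polls. A device modelled as stalled never does, so the loop exhausts its budget and reports a timeout, which corresponds to the completion-wait loop timeout messages mentioned above.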

2017-07-10 16:54:17

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Fri, Jun 16, 2017 at 06:29:23PM +0200, Joerg Roedel wrote:
> Hi Bjorn,
>
> On Thu, Jun 15, 2017 at 02:15:45PM -0500, Bjorn Helgaas wrote:
> > It was marked "superseded" in patchwork and thus off my radar. I
> > don't remember if I did that or why. I changed it back to "New" so I
> > won't forget about it.
>
> Great!
>
> > You mention (May 24) the original bug report. Can you include the URL
> > for that?
>
> I think there were multiple reports, here is one I could still find:
>
> https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020836.html
>
> > I admit I just don't have warm fuzzies that the problem is well
> > understood.
>
> The current understanding (without my ability to debug the hardware
> involved) is that the GPU in the Stoney systems gets into a weird state
> when ATS invalidations are sent too fast and stops responding to the
> iommu.
>
> The iommu then can't complete the invalidation commands and the driver
> throws completion-wait loop timeout messages out.

I'm still confused. Per Samuel
([email protected]):

Samuel> The other patch seems to fix this issue without disabling ATS.
Samuel> Isn't that better?

and Alex
(BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):

Alex> I talked to our validation team and ATS was validated on Stoney,
Alex> so this patch is just working around something else. The other
Alex> patch fixes it and is a valid optimization ...

I'm confused about what this "other patch" is and whether we want that
one, this one, or both.

Bjorn

2017-07-11 11:49:54

by Jörg Rödel

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

Hi Bjorn,

On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> I'm still confused. Per Samuel
> ([email protected]):
>
> Samuel> The other patch seems to fix this issue without disabling ATS.
> Samuel> Isn't that better?
>
> and Alex
> (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):
>
> Alex> I talked to our validation team and ATS was validated on Stoney,
> Alex> so this patch is just working around something else. The other
> Alex> patch fixes it and is a valid optimization ...
>
> I'm confused about what this "other patch" is and whether we want that
> one, this one, or both.

The other patches floating around lowered the ATS flush-rate from the
AMD IOMMU driver, which makes the issue disappear as well. But the issue
only disappeared, it is not solved and could probably still be
reproduced with a GPU usage pattern that increases the ATS flush-rate.

So blacklisting the device for ATS is still the safest thing we could do
here.


Regards,

Joerg

2017-07-11 19:08:07

by Deucher, Alexander

Subject: RE: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

> -----Original Message-----
> From: Joerg Roedel [mailto:[email protected]]
> Sent: Tuesday, July 11, 2017 7:50 AM
> To: Bjorn Helgaas
> Cc: Bjorn Helgaas; [email protected]; [email protected];
> Daniel Drake; Deucher, Alexander; Samuel Sieb; David Woodhouse
> Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs
>
> Hi Bjorn,
>
> On Mon, Jul 10, 2017 at 11:53:58AM -0500, Bjorn Helgaas wrote:
> > I'm still confused. Per Samuel
> > ([email protected]):
> >
> > Samuel> The other patch seems to fix this issue without disabling ATS.
> > Samuel> Isn't that better?
> >
> > and Alex
> > (BN6PR12MB1652DF4130FC792B71DD9974F7C00@BN6PR12MB1652.namprd12.prod.outlook.com):
> >
> > Alex> I talked to our validation team and ATS was validated on Stoney,
> > Alex> so this patch is just working around something else. The other
> > Alex> patch fixes it and is a valid optimization ...
> >
> > I'm confused about what this "other patch" is and whether we want that
> > one, this one, or both.

Here's the other patch:
https://lists.freedesktop.org/archives/amd-gfx/2017-May/009421.html

>
> The other patches floating around lowered the ATS flush-rate from the
> AMD IOMMU driver, which makes the issue disappear as well. But the issue
> only disappeared, it is not solved and could probably still be
> reproduced with a GPU usage pattern that increases the ATS flush-rate.
>
> So blacklisting the device for ATS is still the safest thing we could do
> here.

I don't have any objection per se, but I'd hate to add a quirk to disable it only to remove it again in the future if we needed ATS related functionality later. We are in the process of upstreaming KFD support for Carrizo (which is a bigger version of Stoney) and that utilizes ATS related functionality to provide GPU access to pageable memory. There are no immediate requirements for Stoney, but that may change.

Alex


2017-07-13 02:56:12

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <[email protected]>
>
> ATS is broken on this hardware and causes IOMMU stalls and
> system failure. Disable ATS on these devices to make them
> usable again with IOMMU enabled.
>
> Note that the commit in the Fixes-tag is not buggy, it
> just uncovers the problem in the hardware by increasing
> the ATS-flush rate.
>
> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> Signed-off-by: Joerg Roedel <[email protected]>

Applied with Alex's ack to pci/virtualization for v4.14, thanks!

Alex, you seemed a little ambivalent later. If you want to rescind
your ack, let me know and I'll remove it.

> ---
> drivers/pci/quirks.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 6736836..7cbe316 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4634,3 +4634,22 @@ static void quirk_no_aersid(struct pci_dev *pdev)
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2031, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2032, quirk_no_aersid);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x2033, quirk_no_aersid);
> +
> +#ifdef CONFIG_PCI_ATS
> +/*
> + * Some devices have a broken ATS implementation causing IOMMU stalls.
> + * Don't use ATS for those devices.
> + */
> +static void quirk_disable_ats(struct pci_dev *pdev)
> +{
> + /*
> + * Set pdev->ats_cap = 0 to make pci_enable_ats() bail out
> + * early.
> + */
> + dev_info(&pdev->dev, "QUIRK: Disabling ATS");
> + pdev->ats_cap = 0;
> +}
> +
> +/* AMD Stoney platform GPU */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x98e4, quirk_disable_ats);
> +#endif /* CONFIG_PCI_ATS */
> --
> 1.9.1
>

2017-08-29 20:11:10

by Samuel Sieb

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
>> From: Joerg Roedel <[email protected]>
>>
>> ATS is broken on this hardware and causes IOMMU stalls and
>> system failure. Disable ATS on these devices to make them
>> usable again with IOMMU enabled.
>>
>> Note that the commit in the Fixes-tag is not buggy, it
>> just uncovers the problem in the hardware by increasing
>> the ATS-flush rate.
>>
>> Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
>> Signed-off-by: Joerg Roedel <[email protected]>
>
> Applied with Alex's ack to pci/virtualization for v4.14, thanks!

Is there any chance of getting this into an earlier kernel? This is a
pretty devastating bug for users! I'm currently providing patched
kernels, but if they run upgrades and get a new kernel without noticing
it, any filesystem they access will get completely mangled.

2017-08-29 20:50:02

by Bjorn Helgaas

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Tue, Aug 29, 2017 at 01:02:41PM -0700, Samuel Sieb wrote:
> On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> >On Fri, Apr 07, 2017 at 04:32:18PM +0200, Joerg Roedel wrote:
> >>From: Joerg Roedel <[email protected]>
> >>
> >>ATS is broken on this hardware and causes IOMMU stalls and
> >>system failure. Disable ATS on these devices to make them
> >>usable again with IOMMU enabled.
> >>
> >>Note that the commit in the Fixes-tag is not buggy, it
> >>just uncovers the problem in the hardware by increasing
> >>the ATS-flush rate.
> >>
> >>Fixes: b1516a14657a ('iommu/amd: Implement flush queue')
> >>Signed-off-by: Joerg Roedel <[email protected]>
> >
> >Applied with Alex's ack to pci/virtualization for v4.14, thanks!
>
> Is there any chance of getting this into an earlier kernel? This is
> a pretty devastating bug for users! I'm currently providing patched
> kernels, but if they run upgrades and get a new kernel without
> noticing it, any filesystem they access will get completely mangled.

I assume you're looking to get this into stable kernels or distro
update kernels. I don't personally deal with either of those, but for
stable kernels, see
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/stable-kernel-rules.rst

For distro update kernels, you'd have to talk to the distro folks, and
I don't have contacts for those.

Bjorn

2017-08-30 11:44:17

by Joerg Roedel

Subject: Re: [PATCH v2] PCI: Add ATS-disable quirk for AMD Stoney GPUs

On Tue, Aug 29, 2017 at 01:02:41PM -0700, Samuel Sieb wrote:
> On 07/12/2017 07:56 PM, Bjorn Helgaas wrote:
> >Applied with Alex's ack to pci/virtualization for v4.14, thanks!
>
> Is there any chance of getting this into an earlier kernel? This is a
> pretty devastating bug for users! I'm currently providing patched kernels,
> but if they run upgrades and get a new kernel without noticing it, any
> filesystem they access will get completely mangled.

The patch is queued for v4.14, so it will probably be picked up by
distributions when v4.14-rc1 is released. At least this will be the case
for the affected (Open)SUSE distributions.


Joerg