2014-11-21 18:24:07

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 0/4] PCI: Reset exclusions

This is really 2 sets of 2 patches, but they both add bits to
dev_flags so are included together.

This fixes two problems we've seen with resets. The first is for
devices that advertise a PM reset mechanism, but it doesn't appear to
do anything. We add a quirk and flags bit to indicate the PM reset
mechanism isn't viable. This happens on some AMD GPUs and causes
vfio-pci to assume the PM reset was successful when it really did
nothing and we should have escalated to a PCI bus reset. Alex
Deucher confirms that PM reset isn't used by graphics drivers. The
exclusion here is for users of pci_reset_function(), which mostly
only includes pci-sysfs and drivers like legacy KVM device assignment
and VFIO.

The second issue is a problem identified with an Atheros wifi chip
where in performing a bus reset of the device, we not only permanently
lose access to the device, but it introduces host stability issues if
we try to access it. I've been unsuccessful in finding any way to
make the device behave or in finding anyone with access to hardware
documentation and errata for this device, so it seems like the most
appropriate path is to blacklist bus resets for topologies including
this device. The second two patches add infrastructure and quirks to
do this. Thanks,

Alex

---

Alex Williamson (4):
PCI: quirk Atheros AR93xx to avoid bus reset
PCI: Allow device quirks to exclude bus reset
PCI: quirk AMD/ATI VGA cards to avoid PM reset
PCI: Allow device quirks to exclude D3->D0 PM reset


drivers/pci/pci.c | 42 +++++++++++++++++++++++++++++++++++++-----
drivers/pci/quirks.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/pci.h | 4 ++++
3 files changed, 76 insertions(+), 5 deletions(-)


2014-11-21 18:24:15

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 1/4] PCI: Allow device quirks to exclude D3->D0 PM reset

The PCI PM spec indicates that when No_Soft_Reset is clear
(NoSoftRst-) that the device performs an internal reset when
transitioning From D3hot to D0. Configuration context is lost and
the device requires a full reinitialization sequence.

Unfortunately the definition of "internal reset", beyond the
application of the configuration context, is largely left to the
interpretation of the specific device. For some devices, setting
NoSoftRst- appears arbitrary and no obvious "internal reset" occurs
on D3hot to D0 transition.

We still need to honor the PCI specification and restore PCI config
context in the event that we do a PM reset, so we don't cache and
modify the PCI_PM_CTRL_NO_SOFT_RESET bit for the device, but for
interfaces where the intention is to reset the device, like
pci_reset_function(), we need a mechanism to flag that PM reset
doesn't perform any significant "internal reset" of the device.

Signed-off-by: Alex Williamson <[email protected]>
---

drivers/pci/pci.c | 2 +-
include/linux/pci.h | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 625a4ac..ba54a5a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3206,7 +3206,7 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
{
u16 csr;

- if (!dev->pm_cap)
+ if (!dev->pm_cap || dev->dev_flags & PCI_DEV_FLAGS_NO_PM_RESET)
return -ENOTTY;

pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &csr);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 5be8db4..aea347d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -175,6 +175,8 @@ enum pci_dev_flags {
PCI_DEV_FLAGS_DMA_ALIAS_DEVFN = (__force pci_dev_flags_t) (1 << 4),
/* Use a PCIe-to-PCI bridge alias even if !pci_is_pcie */
PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 5),
+ /* Do not use PM reset even if device advertises NoSoftRst- */
+ PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 6),
};

enum pci_irq_reroute_variant {

2014-11-21 18:24:24

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 2/4] PCI: quirk AMD/ATI VGA cards to avoid PM reset

Some AMD/ATI GPUs report that they support PM reset (NoSoftRst-) but
initiating such a reset has no apparent affect on the device. The
monitor remains sync'd, the framebuffer contents are retained, etc.
Callers of pci_reset_function() don't necessarily have a way to
validate whether a reset was effective, so we really want to avoid
making use of a known non-effective reset. Returning an error in
such cases appears to be the better option. For users like vfio-pci,
this allows the driver to escalate to the bus reset interfaces.

If a device lives on the root bus, there's really no further
escalation path, so we exempt PM reset as potentially better than
nothing.

Signed-off-by: Alex Williamson <[email protected]>
Cc: Alex Deucher <[email protected]>
---

drivers/pci/quirks.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 90acb32..561e10d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3008,6 +3008,27 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169,
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
quirk_broken_intx_masking);

+static void quirk_no_pm_reset(struct pci_dev *dev)
+{
+ /*
+ * A non-effective PM reset may be better than nothing
+ * if we can't do a bus reset
+ */
+ if (!pci_is_root_bus(dev->bus))
+ dev->dev_flags |= PCI_DEV_FLAGS_NO_PM_RESET;
+}
+
+/*
+ * Some AMD/ATI GPUS (HD8570 - Oland) report supporting PM reset via D3->D0
+ * transition (NoSoftRst-). This reset mechanims seems to have no effect
+ * whatsoever on the device, even retaining the framebuffer contents and
+ * monitor sync. Advertising this support makes other layers, like VFIO
+ * assume pci_reset_function() is viable for this device. Mark it as
+ * unavailable to skip it when testing reset methods.
+ */
+DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
+ PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
+
#ifdef CONFIG_ACPI
/*
* Apple: Shutdown Cactus Ridge Thunderbolt controller.

2014-11-21 18:24:29

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 3/4] PCI: Allow device quirks to exclude bus reset

Enable a mechanism for devices to quirk that they do not behave when
doing a PCI bus reset. We require a modest level of spec compliant
behavior in order to do a reset, for instance the device should come
out of reset without throwing errors and PCI config space should be
accessible after reset. This is too much to ask for some devices.

Signed-off-by: Alex Williamson <[email protected]>
---

drivers/pci/pci.c | 40 ++++++++++++++++++++++++++++++++++++----
include/linux/pci.h | 2 ++
2 files changed, 38 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ba54a5a..7d6f87e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3280,7 +3280,8 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
{
struct pci_dev *pdev;

- if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
+ if (pci_is_root_bus(dev->bus) || dev->subordinate ||
+ !dev->bus->self || dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
return -ENOTTY;

list_for_each_entry(pdev, &dev->bus->devices, bus_list)
@@ -3314,7 +3315,8 @@ static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe)
{
struct pci_dev *pdev;

- if (dev->subordinate || !dev->slot)
+ if (dev->subordinate || !dev->slot ||
+ dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET)
return -ENOTTY;

list_for_each_entry(pdev, &dev->bus->devices, bus_list)
@@ -3566,6 +3568,20 @@ int pci_try_reset_function(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_try_reset_function);

+/* Do any devices on or below this bus prevent a bus reset */
+static bool pci_bus_resetable(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ if (dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+ (dev->subordinate && !pci_bus_resetable(dev->subordinate)))
+ return false;
+ }
+
+ return true;
+}
+
/* Lock devices from the top of the tree down */
static void pci_bus_lock(struct pci_bus *bus)
{
@@ -3616,6 +3632,22 @@ unlock:
return 0;
}

+/* Do any devices on or below this slot prevent a bus reset */
+static bool pci_slot_resetable(struct pci_slot *slot)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &slot->bus->devices, bus_list) {
+ if (!dev->slot || dev->slot != slot)
+ continue;
+ if (dev->dev_flags & PCI_DEV_FLAGS_NO_BUS_RESET ||
+ (dev->subordinate && !pci_bus_resetable(dev->subordinate)))
+ return false;
+ }
+
+ return true;
+}
+
/* Lock devices from the top of the tree down */
static void pci_slot_lock(struct pci_slot *slot)
{
@@ -3737,7 +3769,7 @@ static int pci_slot_reset(struct pci_slot *slot, int probe)
{
int rc;

- if (!slot)
+ if (!slot || !pci_slot_resetable(slot))
return -ENOTTY;

if (!probe)
@@ -3829,7 +3861,7 @@ EXPORT_SYMBOL_GPL(pci_try_reset_slot);

static int pci_bus_reset(struct pci_bus *bus, int probe)
{
- if (!bus->self)
+ if (!bus->self || !pci_bus_resetable(bus))
return -ENOTTY;

if (probe)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index aea347d..c9a8904 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -177,6 +177,8 @@ enum pci_dev_flags {
PCI_DEV_FLAG_PCIE_BRIDGE_ALIAS = (__force pci_dev_flags_t) (1 << 5),
/* Do not use PM reset even if device advertises NoSoftRst- */
PCI_DEV_FLAGS_NO_PM_RESET = (__force pci_dev_flags_t) (1 << 6),
+ /* Do not use bus resets for device */
+ PCI_DEV_FLAGS_NO_BUS_RESET = (__force pci_dev_flags_t) (1 << 7),
};

enum pci_irq_reroute_variant {

2014-11-21 18:24:32

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

Reports against the TL-WDN4800 card indicate that PCI bus reset of
this Atheros device cause system lock-ups and resets. I've also
been able to confirm this behavior on multiple systems. The device
never returns from reset and attempts to access config space of the
device after reset result in hangs. Blacklist bus reset for the
device to avoid this issue.

Reported-by: Andreas Hartmann <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
Tested-by: Andreas Hartmann <[email protected]>
---

drivers/pci/quirks.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 561e10d..ebbd5b4 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);

+static void quirk_no_bus_reset(struct pci_dev *dev)
+{
+ dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
+}
+
+/*
+ * Atheros AR93xx chips do not behave after a bus reset. The device will
+ * throw a Link Down error on AER capable system and regardless of AER,
+ * config space of the device is never accessible again and typically
+ * causes the system to hang or reset when access is attempted.
+ * http://www.spinics.net/lists/linux-pci/msg34797.html
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
+
#ifdef CONFIG_ACPI
/*
* Apple: Shutdown Cactus Ridge Thunderbolt controller.

2014-11-21 19:00:43

by Deucher, Alexander

[permalink] [raw]
Subject: RE: [PATCH 2/4] PCI: quirk AMD/ATI VGA cards to avoid PM reset

> -----Original Message-----
> From: Alex Williamson [mailto:[email protected]]
> Sent: Friday, November 21, 2014 1:24 PM
> To: [email protected]
> Cc: [email protected]; [email protected]; Alex
> Williamson; Deucher, Alexander
> Subject: [PATCH 2/4] PCI: quirk AMD/ATI VGA cards to avoid PM reset
>
> Some AMD/ATI GPUs report that they support PM reset (NoSoftRst-) but
> initiating such a reset has no apparent affect on the device. The
> monitor remains sync'd, the framebuffer contents are retained, etc.
> Callers of pci_reset_function() don't necessarily have a way to
> validate whether a reset was effective, so we really want to avoid
> making use of a known non-effective reset. Returning an error in
> such cases appears to be the better option. For users like vfio-pci,
> this allows the driver to escalate to the bus reset interfaces.
>
> If a device lives on the root bus, there's really no further
> escalation path, so we exempt PM reset as potentially better than
> nothing.
>
> Signed-off-by: Alex Williamson <[email protected]>
> Cc: Alex Deucher <[email protected]>

Acked-by: Alex Deucher <[email protected]>

> ---
>
> drivers/pci/quirks.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 90acb32..561e10d 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3008,6 +3008,27 @@
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_REALTEK, 0x8169,
> DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
> quirk_broken_intx_masking);
>
> +static void quirk_no_pm_reset(struct pci_dev *dev)
> +{
> + /*
> + * A non-effective PM reset may be better than nothing
> + * if we can't do a bus reset
> + */
> + if (!pci_is_root_bus(dev->bus))
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_PM_RESET;
> +}
> +
> +/*
> + * Some AMD/ATI GPUS (HD8570 - Oland) report supporting PM reset via
> D3->D0
> + * transition (NoSoftRst-). This reset mechanims seems to have no effect
> + * whatsoever on the device, even retaining the framebuffer contents and
> + * monitor sync. Advertising this support makes other layers, like VFIO
> + * assume pci_reset_function() is viable for this device. Mark it as
> + * unavailable to skip it when testing reset methods.
> + */
> +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
> + PCI_CLASS_DISPLAY_VGA, 8,
> quirk_no_pm_reset);
> +
> #ifdef CONFIG_ACPI
> /*
> * Apple: Shutdown Cactus Ridge Thunderbolt controller.

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?

2014-12-26 08:17:45

by Andreas Hartmann

[permalink] [raw]
Subject: Re: [PATCH 4/4] PCI: quirk Atheros AR93xx to avoid bus reset

Hello Bjorn,

I'm running this patch and the corresponding "[PATCH 3/4] PCI: Allow
device quirks to exclude bus reset" patch meanwhile since a month w/
kernel 3.14.x and couldn't find any problem. Would it be possible to
apply these patches to main kernel? Or even to lt-kernel 3.14?


Thanks.
kind regards,
Andreas Hartmann


Alex Williamson wrote:
> Reports against the TL-WDN4800 card indicate that PCI bus reset of
> this Atheros device cause system lock-ups and resets. I've also
> been able to confirm this behavior on multiple systems. The device
> never returns from reset and attempts to access config space of the
> device after reset result in hangs. Blacklist bus reset for the
> device to avoid this issue.
>
> Reported-by: Andreas Hartmann <[email protected]>
> Signed-off-by: Alex Williamson <[email protected]>
> Tested-by: Andreas Hartmann <[email protected]>
> ---
>
> drivers/pci/quirks.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 561e10d..ebbd5b4 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3029,6 +3029,20 @@ static void quirk_no_pm_reset(struct pci_dev *dev)
> DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID,
> PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset);
>
> +static void quirk_no_bus_reset(struct pci_dev *dev)
> +{
> + dev->dev_flags |= PCI_DEV_FLAGS_NO_BUS_RESET;
> +}
> +
> +/*
> + * Atheros AR93xx chips do not behave after a bus reset. The device will
> + * throw a Link Down error on AER capable system and regardless of AER,
> + * config space of the device is never accessible again and typically
> + * causes the system to hang or reset when access is attempted.
> + * http://www.spinics.net/lists/linux-pci/msg34797.html
> + */
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATHEROS, 0x0030, quirk_no_bus_reset);
> +
> #ifdef CONFIG_ACPI
> /*
> * Apple: Shutdown Cactus Ridge Thunderbolt controller.
>