2017-08-27 17:41:03

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH V13 1/4] PCI: Don't ignore valid response before CRS timeout

From: Bjorn Helgaas <[email protected]>

While waiting for a device to become ready (i.e., to return a non-CRS
completion to a read of its Vendor ID), if we got a valid response to the
very last read before timing out, we printed a warning and gave up on the
device even though it was actually ready.

For a typical 60s timeout, we wait about 65s (it's not exact because of the
exponential backoff), but we treated devices that became ready between 33s
and 65s as though they failed.

Move the Device ID read later so we check whether the device is ready
immediately, before checking for a timeout.

Signed-off-by: Bjorn Helgaas <[email protected]>
[okaya: reorder reads so that we check device presence after sleep]
Signed-off-by: Sinan Kaya <[email protected]>
---
drivers/pci/probe.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index c31310d..2849e0e 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1847,17 +1847,18 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
if (!crs_timeout)
return false;

- msleep(delay);
- delay *= 2;
- if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
- return false;
- /* Card hasn't responded in 60 seconds? Must be stuck. */
if (delay > crs_timeout) {
printk(KERN_WARNING "pci %04x:%02x:%02x.%d: not responding\n",
pci_domain_nr(bus), bus->number, PCI_SLOT(devfn),
PCI_FUNC(devfn));
return false;
}
+
+ msleep(delay);
+ delay *= 2;
+
+ if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
+ return false;
}

return true;
--
1.9.1


2017-08-27 17:41:09

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH V13 4/4] PCI: Warn periodically while waiting for device to become ready

Add a print statement in pci_bus_wait_crs() so that user observes the
progress of device polling instead of silently waiting for timeout to be
reached.

Signed-off-by: Sinan Kaya <[email protected]>
[bhelgaas: check for timeout first so we don't print "waiting, giving up",
always print time we've slept (not the actual timeout, print a "ready"
message if we've printed a "waiting" message]
Signed-off-by: Bjorn Helgaas <[email protected]>
---
drivers/pci/probe.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index d834a20..8f7ba16 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1847,11 +1847,16 @@ static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 l,
*/
while ((l & 0xffff) == 0x0001) {
if (delay > timeout) {
- printk(KERN_WARNING "pci %04x:%02x:%02x.%d: not responding\n",
- pci_domain_nr(bus), bus->number, PCI_SLOT(devfn),
- PCI_FUNC(devfn));
+ pr_warn("pci %04x:%02x:%02x.%d: not ready after %dms; giving up\n",
+ pci_domain_nr(bus), bus->number,
+ PCI_SLOT(devfn), PCI_FUNC(devfn), delay - 1);
+
return false;
}
+ if (delay >= 1000)
+ pr_info("pci %04x:%02x:%02x.%d: not ready after %dms; waiting\n",
+ pci_domain_nr(bus), bus->number,
+ PCI_SLOT(devfn), PCI_FUNC(devfn), delay - 1);

msleep(delay);
delay *= 2;
@@ -1860,6 +1865,11 @@ static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 l,
return false;
}

+ if (delay >= 1000)
+ pr_info("pci %04x:%02x:%02x.%d: ready after %dms\n",
+ pci_domain_nr(bus), bus->number,
+ PCI_SLOT(devfn), PCI_FUNC(devfn), delay - 1);
+
return true;
}

--
1.9.1

2017-08-27 17:41:08

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH V13 3/4] PCI: Handle CRS ('device not ready') returned by device after FLR

Sporadic reset issues have been observed with Intel 750 NVMe drive while
assigning the physical function to the guest machine. The sequence of
events observed is as follows:

- perform a Function Level Reset (FLR)
- sleep up to 1000ms total
- read ~0 from PCI_COMMAND
- warn that the device didn't return from FLR
- touch the device before it's ready
- device drops config writes when we restore register settings
- incomplete register restore leaves device in inconsistent state
- device probe fails because device is in inconsistent state

After reset, an endpoint may respond to config requests with Configuration
Request Retry Status (CRS) to indicate that it is not ready to accept new
requests. See PCIe r3.1, sec 2.3.1 and 6.6.2.

Increase the timeout value from 1 second to 60 seconds to cover the period
where device responds with CRS and also report polling progress.

Signed-off-by: Sinan Kaya <[email protected]>
---
drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++++---------------
1 file changed, 37 insertions(+), 15 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index af0cc34..abab64b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3811,27 +3811,49 @@ int pci_wait_for_pending_transaction(struct pci_dev *dev)
}
EXPORT_SYMBOL(pci_wait_for_pending_transaction);

-/*
- * We should only need to wait 100ms after FLR, but some devices take longer.
- * Wait for up to 1000ms for config space to return something other than -1.
- * Intel IGD requires this when an LCD panel is attached. We read the 2nd
- * dword because VFs don't implement the 1st dword.
- */
static void pci_flr_wait(struct pci_dev *dev)
{
- int i = 0;
+ int delay = 1, timeout = 60000;
u32 id;

- do {
- msleep(100);
+ /*
+ * Per PCIe r3.1, sec 6.6.2, a device must complete an FLR within
+ * 100ms, but may silently discard requests while the FLR is in
+ * progress. Wait 100ms before trying to access the device.
+ */
+ msleep(100);
+
+ /*
+ * After 100ms, the device should not silently discard config
+ * requests, but it may still indicate that it needs more time by
+ * responding to them with CRS completions. The Root Port will
+ * generally synthesize ~0 data to complete the read (except when
+ * CRS SV is enabled and the read was for the Vendor ID; in that
+ * case it synthesizes 0x0001 data).
+ *
+ * Wait for the device to return a non-CRS completion. Read the
+ * Command register instead of Vendor ID so we don't have to
+ * contend with the CRS SV value.
+ */
+ pci_read_config_dword(dev, PCI_COMMAND, &id);
+ while (id == ~0) {
+ if (delay > timeout) {
+ dev_warn(&dev->dev, "not ready %dms after FLR; giving up\n",
+ delay - 1);
+ return;
+ }
+
+ if (delay > 1000)
+ dev_info(&dev->dev, "not ready %dms after FLR; waiting\n",
+ delay - 1);
+
+ msleep(delay);
+ delay *= 2;
pci_read_config_dword(dev, PCI_COMMAND, &id);
- } while (i++ < 10 && id == ~0);
+ }

- if (id == ~0)
- dev_warn(&dev->dev, "Failed to return from FLR\n");
- else if (i > 1)
- dev_info(&dev->dev, "Required additional %dms to return from FLR\n",
- (i - 1) * 100);
+ if (delay > 1000)
+ dev_info(&dev->dev, "ready %dms after FLR\n", delay - 1);
}

/**
--
1.9.1

2017-08-27 17:41:46

by Sinan Kaya

[permalink] [raw]
Subject: [PATCH V13 2/4] PCI: Factor out pci_bus_wait_crs()

Configuration Request Retry Status (CRS) was previously hidden inside
pci_bus_read_dev_vendor_id(). We want to add support for CRS in other
situations, such as waiting for a device to become ready after a Function
Level Reset.

Move CRS handling into pci_bus_wait_crs() so it can be called from other
places and also introduce pci_bus_crs_visibility_pending() to determine
when we should wait.

Signed-off-by: Sinan Kaya <[email protected]>
---
drivers/pci/probe.c | 52 ++++++++++++++++++++++++++++++++++------------------
1 file changed, 34 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2849e0e..d834a20 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1824,30 +1824,29 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
}
EXPORT_SYMBOL(pci_alloc_dev);

-bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
- int crs_timeout)
+static inline bool pci_bus_crs_visibility_pending(u32 l)
+{
+ return (l & 0xffff) == 0x0001;
+}
+
+static bool pci_bus_wait_crs(struct pci_bus *bus, int devfn, u32 l,
+ int timeout)
{
int delay = 1;

- if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
- return false;
+ if ((l & 0xffff) != 0x0001)
+ return true; /* not a CRS completion */

- /* some broken boards return 0 or ~0 if a slot is empty: */
- if (*l == 0xffffffff || *l == 0x00000000 ||
- *l == 0x0000ffff || *l == 0xffff0000)
- return false;
+ if (!timeout)
+ return false; /* CRS, but caller doesn't want to wait */

/*
- * Configuration Request Retry Status. Some root ports return the
- * actual device ID instead of the synthetic ID (0xFFFF) required
- * by the PCIe spec. Ignore the device ID and only check for
- * (vendor id == 1).
+ * We got the reserved Vendor ID that indicates a completion with
+ * Configuration Request Retry Status (CRS). Retry until we get a
+ * valid Vendor ID or we time out.
*/
- while ((*l & 0xffff) == 0x0001) {
- if (!crs_timeout)
- return false;
-
- if (delay > crs_timeout) {
+ while ((l & 0xffff) == 0x0001) {
+ if (delay > timeout) {
printk(KERN_WARNING "pci %04x:%02x:%02x.%d: not responding\n",
pci_domain_nr(bus), bus->number, PCI_SLOT(devfn),
PCI_FUNC(devfn));
@@ -1857,12 +1856,29 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
msleep(delay);
delay *= 2;

- if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
+ if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, &l))
return false;
}

return true;
}
+
+bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
+ int timeout)
+{
+ if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
+ return false;
+
+ /* some broken boards return 0 or ~0 if a slot is empty: */
+ if (*l == 0xffffffff || *l == 0x00000000 ||
+ *l == 0x0000ffff || *l == 0xffff0000)
+ return false;
+
+ if (pci_bus_crs_visibility_pending(*l))
+ return pci_bus_wait_crs(bus, devfn, *l, timeout);
+
+ return true;
+}
EXPORT_SYMBOL(pci_bus_read_dev_vendor_id);

/*
--
1.9.1

2017-08-29 19:53:27

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH V13 1/4] PCI: Don't ignore valid response before CRS timeout

On Sun, Aug 27, 2017 at 01:40:48PM -0400, Sinan Kaya wrote:
> From: Bjorn Helgaas <[email protected]>
>
> While waiting for a device to become ready (i.e., to return a non-CRS
> completion to a read of its Vendor ID), if we got a valid response to the
> very last read before timing out, we printed a warning and gave up on the
> device even though it was actually ready.
>
> For a typical 60s timeout, we wait about 65s (it's not exact because of the
> exponential backoff), but we treated devices that became ready between 33s
> and 65s as though they failed.
>
> Move the Device ID read later so we check whether the device is ready
> immediately, before checking for a timeout.
>
> Signed-off-by: Bjorn Helgaas <[email protected]>
> [okaya: reorder reads so that we check device presence after sleep]
> Signed-off-by: Sinan Kaya <[email protected]>

Applied this series to pci/enumeration for v4.14. You didn't include a
cover letter, but the series includes:

[V13 1/4] PCI: Don't ignore valid response before CRS timeout
[V13 2/4] PCI: Factor out pci_bus_wait_crs()
[V13 3/4] PCI: Handle CRS ('device not ready') returned by device af
[V13 4/4] PCI: Warn periodically while waiting for device to become

I made some changes:

- Renamed pci_bus_crs_visibility_pending() to pci_bus_crs_vendor_id()
because the CRS completion is not "pending". It is not waiting
somewhere for us to do something about it. The CRS completion has
already occurred and is over. All we have now is a magic Vendor ID
value that tells us that it happened.

- Split addition of pci_bus_crs_vendor_id() to a separate patch, move
it to probe.c, and make it static.

- Pass a pointer, not a value, to pci_bus_wait_crs() so the caller gets
correct Vendor ID when we're finished waiting.

- Add in the 100ms mandatory sleep in the delays we print in
pci_flr_wait() so the printed value reflects the entire time since the
FLR was started.

> ---
> drivers/pci/probe.c | 11 ++++++-----
> 1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index c31310d..2849e0e 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -1847,17 +1847,18 @@ bool pci_bus_read_dev_vendor_id(struct pci_bus *bus, int devfn, u32 *l,
> if (!crs_timeout)
> return false;
>
> - msleep(delay);
> - delay *= 2;
> - if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
> - return false;
> - /* Card hasn't responded in 60 seconds? Must be stuck. */
> if (delay > crs_timeout) {
> printk(KERN_WARNING "pci %04x:%02x:%02x.%d: not responding\n",
> pci_domain_nr(bus), bus->number, PCI_SLOT(devfn),
> PCI_FUNC(devfn));
> return false;
> }
> +
> + msleep(delay);
> + delay *= 2;
> +
> + if (pci_bus_read_config_dword(bus, devfn, PCI_VENDOR_ID, l))
> + return false;
> }
>
> return true;
> --
> 1.9.1
>

2017-09-03 22:13:55

by Yinghai Lu

[permalink] [raw]
Subject: Re: [PATCH V13 1/4] PCI: Don't ignore valid response before CRS timeout

On Tue, Aug 29, 2017 at 12:53 PM, Bjorn Helgaas <[email protected]> wrote:

>
> Applied this series to pci/enumeration for v4.14. You didn't include a
> cover letter, but the series includes:
>
> [V13 1/4] PCI: Don't ignore valid response before CRS timeout
> [V13 2/4] PCI: Factor out pci_bus_wait_crs()
> [V13 3/4] PCI: Handle CRS ('device not ready') returned by device af
> [V13 4/4] PCI: Warn periodically while waiting for device to become
>
> I made some changes:
>
> - Renamed pci_bus_crs_visibility_pending() to pci_bus_crs_vendor_id()
> because the CRS completion is not "pending". It is not waiting
> somewhere for us to do something about it. The CRS completion has
> already occurred and is over. All we have now is a magic Vendor ID
> value that tells us that it happened.
>
> - Split addition of pci_bus_crs_vendor_id() to a separate patch, move
> it to probe.c, and make it static.

the calling of pci_bus_crs_vendor_id() in pci_bus_read_dev_vendor_id()
could be removed. pci_bus_wait_crs() have that calling inside.

-Yinghai