2013-08-01 16:55:16

by Alex Williamson

Subject: [PATCH v3 0/9] pci: bus and slot reset interfaces

I posted a v2 of this back in May, but nack'd it because it had issues
in cases where PCI hotplug was built as a module. I pulled some of
the code into core pci to alleviate that and module options are being
removed from core pci hotplug and pciehp, so I think it's completely
fixed now. Since the RFC a couple of days ago I've done more testing and
added the last two patches. The first tunes the delays for holding the
bus in reset and for allowing initialization time afterwards. These are
my best interpretation of the spec; if anyone wants to adjust them,
please suggest new values along with spec or hardware evidence to
support them. The second removes the duplicate bus reset code in AER.
I'm including my most recent description of why we need this series
below. Thanks,

Alex

This series adds PCI bus and slot reset interfaces to the already
existing function reset interface. I need this for two reasons. The
first is that not all devices support function level reset. Even
some of those that we detect as supporting a PM reset on D3hot->D0
transition actually don't do any reset. Others have no reset
capability at all. We currently implement a secondary bus reset
escalation from the function reset path, but only when there is a
single devfn on the bus. Drivers like vfio can have ownership of
all of the devices on a bus and should therefore have a path to
initiate a secondary bus reset with multiple devices. This is
particularly required for use of GPUs by userspace, where none of
the predominant GPUs implement a useful function level reset.

The second reason is that even the current function reset escalating
to a secondary bus reset can cause problems with hotplug controllers.
If a root port supports PCIe HP with surprise removal, a bus reset
can trigger a presence detection change, which results in an attempt
to remove the struct device. By having a slot reset interface, we
can involve the hotplug controllers to allow for a controlled bus
reset and avoid this spurious removal attempt.

---

Alex Williamson (9):
pci: Create pci_reset_bridge_secondary_bus()
pci: Add hotplug_slot_ops.reset_slot()
pci: Implement reset_slot for pciehp
pci: Add slot reset option to pci_dev_reset
pci: Split out pci_dev lock/unlock and save/restore
pci: Add slot and bus reset interfaces
pci: Wake-up devices before save for reset
pci: Tune secondary bus reset timing
pci: Remove aer_do_secondary_bus_reset()


drivers/pci/hotplug/pciehp.h | 1
drivers/pci/hotplug/pciehp_core.c | 12 +
drivers/pci/hotplug/pciehp_hpc.c | 31 +++
drivers/pci/pci.c | 331 +++++++++++++++++++++++++++++++++---
drivers/pci/pcie/aer/aerdrv.c | 2
drivers/pci/pcie/aer/aerdrv.h | 1
drivers/pci/pcie/aer/aerdrv_core.c | 35 ----
include/linux/pci.h | 3
include/linux/pci_hotplug.h | 4
9 files changed, 358 insertions(+), 62 deletions(-)


2013-08-01 16:55:25

by Alex Williamson

Subject: [PATCH v3 1/9] pci: Create pci_reset_bridge_secondary_bus()

Move the secondary bus reset code from pci_parent_bus_reset() into its own
function. Export it as we'll later be calling it from hotplug controllers
and elsewhere.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 32 +++++++++++++++++++++++---------
include/linux/pci.h | 1 +
2 files changed, 24 insertions(+), 9 deletions(-)
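
Not part of the patch: a minimal userspace sketch of the read-modify-write
that pci_reset_bridge_secondary_bus() performs on the bridge control
register, so that only the reset bit changes and the other control bits
survive. The model_* names are made up for illustration; the 0x40 value
mirrors PCI_BRIDGE_CTL_BUS_RESET. The delays between the two writes are
left out.

```c
#include <stdint.h>

/* Mirrors PCI_BRIDGE_CTL_BUS_RESET; the model_ prefix marks illustration only. */
#define MODEL_BRIDGE_CTL_BUS_RESET 0x40

/* Value to write to assert secondary bus reset, keeping all other bits. */
static uint16_t model_assert_reset(uint16_t ctrl)
{
	return (uint16_t)(ctrl | MODEL_BRIDGE_CTL_BUS_RESET);
}

/* Value to write to release the reset again, keeping all other bits. */
static uint16_t model_deassert_reset(uint16_t ctrl)
{
	return (uint16_t)(ctrl & ~MODEL_BRIDGE_CTL_BUS_RESET);
}
```

Starting from a register value of 0x0003, asserting gives 0x0043 and
de-asserting returns to 0x0003.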

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index a599a6b..bfd629c 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3276,9 +3276,30 @@ static int pci_pm_reset(struct pci_dev *dev, int probe)
return 0;
}

-static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
+/**
+ * pci_reset_bridge_secondary_bus - Reset the secondary bus on a PCI bridge.
+ * @dev: Bridge device
+ *
+ * Use the bridge control register to assert reset on the secondary bus.
+ * Devices on the secondary bus are left in power-on state.
+ */
+void pci_reset_bridge_secondary_bus(struct pci_dev *dev)
{
u16 ctrl;
+
+ pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
+ ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
+ pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+ msleep(100);
+
+ ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
+ pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
+ msleep(100);
+}
+EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);
+
+static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
+{
struct pci_dev *pdev;

if (pci_is_root_bus(dev->bus) || dev->subordinate || !dev->bus->self)
@@ -3291,14 +3312,7 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
if (probe)
return 0;

- pci_read_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, &ctrl);
- ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
- pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
- msleep(100);
-
- ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
- pci_write_config_word(dev->bus->self, PCI_BRIDGE_CONTROL, ctrl);
- msleep(100);
+ pci_reset_bridge_secondary_bus(dev->bus->self);

return 0;
}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0fd1f15..35c1bc4 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -924,6 +924,7 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
int __pci_reset_function(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
+void pci_reset_bridge_secondary_bus(struct pci_dev *dev);
void pci_update_resource(struct pci_dev *dev, int resno);
int __must_check pci_assign_resource(struct pci_dev *dev, int i);
int __must_check pci_reassign_resource(struct pci_dev *dev, int i, resource_size_t add_size, resource_size_t align);

2013-08-01 16:55:35

by Alex Williamson

Subject: [PATCH v3 3/9] pci: Implement reset_slot for pciehp

PCIe hotplug has a bus per slot, so we can just use a normal
secondary bus reset. However, if a slot supports surprise removal
then a bus reset can be seen as a presence detection change triggering
a hot-remove followed by a hot-add. Disable presence detection from
triggering an interrupt or being polled around the bus reset.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/hotplug/pciehp.h | 1 +
drivers/pci/hotplug/pciehp_core.c | 12 ++++++++++++
drivers/pci/hotplug/pciehp_hpc.c | 31 +++++++++++++++++++++++++++++++
3 files changed, 44 insertions(+)
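
Not part of the patch: a simplified userspace model of why the presence
detect notification is masked around the reset. Everything here is
hypothetical (struct model_ctrl and the model_* helpers do not exist in
pciehp), and it assumes that on a surprise-capable slot the reset latches
a presence detect change, which is the behaviour described above.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of a pciehp controller; not kernel code. */
struct model_ctrl {
	bool surprise;	/* slot advertises surprise removal */
	bool pdce;	/* presence detect changed notification enabled */
	bool pdc;	/* presence detect changed status, latched */
	int removals;	/* hot-remove events the handler has acted on */
};

/* The handler only acts on a latched change while notification is enabled. */
static void model_handle_events(struct model_ctrl *c)
{
	if (c->pdce && c->pdc) {
		c->removals++;
		c->pdc = false;
	}
}

/* Assumption: on a surprise-capable slot the reset latches a presence change. */
static void model_bus_reset(struct model_ctrl *c)
{
	if (c->surprise)
		c->pdc = true;
	model_handle_events(c);
}

/* Same ordering as pciehp_reset_slot(): mask, reset, clear status, unmask. */
static void model_reset_slot(struct model_ctrl *c)
{
	if (c->surprise)
		c->pdce = false;

	model_bus_reset(c);

	if (c->surprise) {
		c->pdc = false;
		c->pdce = true;
	}
	model_handle_events(c);
}
```

In this model a bare model_bus_reset() on a surprise-capable slot counts
one removal, while model_reset_slot() counts none and leaves notification
enabled afterwards.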

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 7fb3269..541bbe6 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -155,6 +155,7 @@ void pciehp_green_led_off(struct slot *slot);
void pciehp_green_led_blink(struct slot *slot);
int pciehp_check_link_status(struct controller *ctrl);
void pciehp_release_ctrl(struct controller *ctrl);
+int pciehp_reset_slot(struct slot *slot, int probe);

static inline const char *slot_name(struct slot *slot)
{
diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 7d72c5e..f4a18f5 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -69,6 +69,7 @@ static int get_power_status (struct hotplug_slot *slot, u8 *value);
static int get_attention_status (struct hotplug_slot *slot, u8 *value);
static int get_latch_status (struct hotplug_slot *slot, u8 *value);
static int get_adapter_status (struct hotplug_slot *slot, u8 *value);
+static int reset_slot (struct hotplug_slot *slot, int probe);

/**
* release_slot - free up the memory used by a slot
@@ -111,6 +112,7 @@ static int init_slot(struct controller *ctrl)
ops->disable_slot = disable_slot;
ops->get_power_status = get_power_status;
ops->get_adapter_status = get_adapter_status;
+ ops->reset_slot = reset_slot;
if (MRL_SENS(ctrl))
ops->get_latch_status = get_latch_status;
if (ATTN_LED(ctrl)) {
@@ -223,6 +225,16 @@ static int get_adapter_status(struct hotplug_slot *hotplug_slot, u8 *value)
return pciehp_get_adapter_status(slot, value);
}

+static int reset_slot(struct hotplug_slot *hotplug_slot, int probe)
+{
+ struct slot *slot = hotplug_slot->private;
+
+ ctrl_dbg(slot->ctrl, "%s: physical_slot = %s\n",
+ __func__, slot_name(slot));
+
+ return pciehp_reset_slot(slot, probe);
+}
+
static int pciehp_probe(struct pcie_device *dev)
{
int rc;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index b225573..51f56ef 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -749,6 +749,37 @@ static void pcie_disable_notification(struct controller *ctrl)
ctrl_warn(ctrl, "Cannot disable software notification\n");
}

+/*
+ * pciehp has a 1:1 bus:slot relationship so we ultimately want a secondary
+ * bus reset of the bridge, but if the slot supports surprise removal we need
+ * to disable presence detection around the bus reset and clear any spurious
+ * events after.
+ */
+int pciehp_reset_slot(struct slot *slot, int probe)
+{
+ struct controller *ctrl = slot->ctrl;
+
+ if (probe)
+ return 0;
+
+ if (HP_SUPR_RM(ctrl)) {
+ pcie_write_cmd(ctrl, 0, PCI_EXP_SLTCTL_PDCE);
+ if (pciehp_poll_mode)
+ del_timer_sync(&ctrl->poll_timer);
+ }
+
+ pci_reset_bridge_secondary_bus(ctrl->pcie->port);
+
+ if (HP_SUPR_RM(ctrl)) {
+ pciehp_writew(ctrl, PCI_EXP_SLTSTA, PCI_EXP_SLTSTA_PDC);
+ pcie_write_cmd(ctrl, PCI_EXP_SLTCTL_PDCE, PCI_EXP_SLTCTL_PDCE);
+ if (pciehp_poll_mode)
+ int_poll_timeout(ctrl->poll_timer.data);
+ }
+
+ return 0;
+}
+
int pcie_init_notification(struct controller *ctrl)
{
if (pciehp_request_irq(ctrl))

2013-08-01 16:55:47

by Alex Williamson

Subject: [PATCH v3 5/9] pci: Split out pci_dev lock/unlock and save/restore

Only cosmetic changes to existing paths.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index db90692..f6c757a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3378,22 +3378,46 @@ done:
return rc;
}

+static void pci_dev_lock(struct pci_dev *dev)
+{
+ pci_cfg_access_lock(dev);
+ /* block PM suspend, driver probe, etc. */
+ device_lock(&dev->dev);
+}
+
+static void pci_dev_unlock(struct pci_dev *dev)
+{
+ device_unlock(&dev->dev);
+ pci_cfg_access_unlock(dev);
+}
+
+static void pci_dev_save_and_disable(struct pci_dev *dev)
+{
+ pci_save_state(dev);
+ /*
+ * both INTx and MSI are disabled after the Interrupt Disable bit
+ * is set and the Bus Master bit is cleared.
+ */
+ pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
+}
+
+static void pci_dev_restore(struct pci_dev *dev)
+{
+ pci_restore_state(dev);
+}
+
static int pci_dev_reset(struct pci_dev *dev, int probe)
{
int rc;

- if (!probe) {
- pci_cfg_access_lock(dev);
- /* block PM suspend, driver probe, etc. */
- device_lock(&dev->dev);
- }
+ if (!probe)
+ pci_dev_lock(dev);

rc = __pci_dev_reset(dev, probe);

- if (!probe) {
- device_unlock(&dev->dev);
- pci_cfg_access_unlock(dev);
- }
+ if (!probe)
+ pci_dev_unlock(dev);
+
return rc;
}
/**
@@ -3484,17 +3508,11 @@ int pci_reset_function(struct pci_dev *dev)
if (rc)
return rc;

- pci_save_state(dev);
-
- /*
- * both INTx and MSI are disabled after the Interrupt Disable bit
- * is set and the Bus Master bit is cleared.
- */
- pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
+ pci_dev_save_and_disable(dev);

rc = pci_dev_reset(dev, 0);

- pci_restore_state(dev);
+ pci_dev_restore(dev);

return rc;
}

2013-08-01 16:55:41

by Alex Williamson

Subject: [PATCH v3 4/9] pci: Add slot reset option to pci_dev_reset

If the hotplug controller provides a way to reset a slot, use that
before a direct parent bus reset. Like the bus reset option, this is
only available when a single pci_dev occupies the slot.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
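
Not part of the patch: a userspace sketch of the escalation convention
that __pci_dev_reset() relies on, where -ENOTTY means "this method does
not apply, try the next one" and any other result ends the search. The
model_* names and the function-pointer table are illustrative
assumptions; the kernel code spells the chain out call by call.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical reset method: returns 0, -ENOTTY (not applicable), or an error. */
typedef int (*model_reset_fn)(int probe);

/* Try each method in order until one claims the request. */
static int model_try_resets(const model_reset_fn *methods, size_t count, int probe)
{
	int rc = -ENOTTY;
	size_t i;

	for (i = 0; i < count; i++) {
		rc = methods[i](probe);
		if (rc != -ENOTTY)
			break;	/* success or a real error ends the search */
	}
	return rc;
}

/* Stand-ins for reset methods with different outcomes. */
static int model_not_applicable(int probe) { (void)probe; return -ENOTTY; }
static int model_slot_reset(int probe) { (void)probe; return 0; }
static int model_broken_reset(int probe) { (void)probe; return -EIO; }
```

Placing the slot reset ahead of the parent bus reset in such a table is
what lets the hotplug controller handle the reset first when it can.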

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bfd629c..db90692 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -22,6 +22,7 @@
#include <linux/interrupt.h>
#include <linux/device.h>
#include <linux/pm_runtime.h>
+#include <linux/pci_hotplug.h>
#include <asm-generic/pci-bridge.h>
#include <asm/setup.h>
#include "pci.h"
@@ -3317,6 +3318,35 @@ static int pci_parent_bus_reset(struct pci_dev *dev, int probe)
return 0;
}

+static int pci_reset_hotplug_slot(struct hotplug_slot *hotplug, int probe)
+{
+ int rc = -ENOTTY;
+
+ if (!hotplug || !try_module_get(hotplug->ops->owner))
+ return rc;
+
+ if (hotplug->ops->reset_slot)
+ rc = hotplug->ops->reset_slot(hotplug, probe);
+
+ module_put(hotplug->ops->owner);
+
+ return rc;
+}
+
+static int pci_dev_reset_slot_function(struct pci_dev *dev, int probe)
+{
+ struct pci_dev *pdev;
+
+ if (dev->subordinate || !dev->slot)
+ return -ENOTTY;
+
+ list_for_each_entry(pdev, &dev->bus->devices, bus_list)
+ if (pdev != dev && pdev->slot == dev->slot)
+ return -ENOTTY;
+
+ return pci_reset_hotplug_slot(dev->slot->hotplug, probe);
+}
+
static int __pci_dev_reset(struct pci_dev *dev, int probe)
{
int rc;
@@ -3339,6 +3369,10 @@ static int __pci_dev_reset(struct pci_dev *dev, int probe)
if (rc != -ENOTTY)
goto done;

+ rc = pci_dev_reset_slot_function(dev, probe);
+ if (rc != -ENOTTY)
+ goto done;
+
rc = pci_parent_bus_reset(dev, probe);
done:
return rc;

2013-08-01 16:55:54

by Alex Williamson

Subject: [PATCH v3 6/9] pci: Add slot and bus reset interfaces

Sometimes pci_reset_function is not sufficient. We have cases where
devices do not support any kind of reset, but there might be multiple
functions on the bus preventing pci_reset_function from doing a
secondary bus reset. We also have cases where a device will advertise
that it supports a PM reset, but really does nothing on D3hot->D0
(graphics cards are notorious for this). These devices often also
have more than one function, so even blacklisting PM reset for them
wouldn't allow a secondary bus reset through pci_reset_function.

If a driver supports multiple devices it should have the ability to
induce a bus reset when it needs to. This patch provides that ability
through pci_reset_slot and pci_reset_bus. It's the caller's
responsibility when using these interfaces to understand that all of
the devices in or below the slot (or on or below the bus) will be
reset and therefore should be under control of the caller. PCI state
of all the affected devices is saved and restored around these resets,
but internal state of all of the affected devices is reset (which
should be the intention).

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci.h | 2 +
2 files changed, 197 insertions(+)
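
Not part of the patch: since the caller is responsible for controlling
every device on or below the bus, here is a userspace sketch of the kind
of recursive check a caller might do before asking for a bus reset. The
owner id, the structures and model_bus_owned_by() are all hypothetical;
the walk simply mirrors the recursion used by pci_bus_lock() and friends.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_bus;

/* Hypothetical device: an owner id plus an optional bus behind a bridge. */
struct model_dev {
	int owner;
	struct model_bus *subordinate;	/* non-NULL only for bridges */
	struct model_dev *next;		/* next device on the same bus */
};

struct model_bus {
	struct model_dev *devices;
};

/* True only if every device on or below the bus belongs to the owner. */
static bool model_bus_owned_by(const struct model_bus *bus, int owner)
{
	const struct model_dev *dev;

	for (dev = bus->devices; dev; dev = dev->next) {
		if (dev->owner != owner)
			return false;
		if (dev->subordinate &&
		    !model_bus_owned_by(dev->subordinate, owner))
			return false;
	}
	return true;
}
```

A single foreign device anywhere below the bus, including behind a
bridge, makes the check fail.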

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index f6c757a..91f7bc4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3518,6 +3518,201 @@ int pci_reset_function(struct pci_dev *dev)
}
EXPORT_SYMBOL_GPL(pci_reset_function);

+static void pci_bus_lock(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_dev_lock(dev);
+ if (dev->subordinate)
+ pci_bus_lock(dev->subordinate);
+ }
+}
+
+static void pci_bus_unlock(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_dev_unlock(dev);
+ if (dev->subordinate)
+ pci_bus_unlock(dev->subordinate);
+ }
+}
+
+static void pci_slot_lock(struct pci_slot *slot)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &slot->bus->devices, bus_list) {
+ if (!dev->slot || dev->slot != slot)
+ continue;
+ pci_dev_lock(dev);
+ if (dev->subordinate)
+ pci_bus_lock(dev->subordinate);
+ }
+}
+
+static void pci_slot_unlock(struct pci_slot *slot)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &slot->bus->devices, bus_list) {
+ if (!dev->slot || dev->slot != slot)
+ continue;
+ pci_dev_unlock(dev);
+ if (dev->subordinate)
+ pci_bus_unlock(dev->subordinate);
+ }
+}
+
+static void pci_bus_save_and_disable(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_dev_save_and_disable(dev);
+ if (dev->subordinate)
+ pci_bus_save_and_disable(dev->subordinate);
+ }
+}
+
+static void pci_bus_restore(struct pci_bus *bus)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &bus->devices, bus_list) {
+ pci_dev_restore(dev);
+ if (dev->subordinate)
+ pci_bus_restore(dev->subordinate);
+ }
+}
+
+static void pci_slot_save_and_disable(struct pci_slot *slot)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &slot->bus->devices, bus_list) {
+ if (!dev->slot || dev->slot != slot)
+ continue;
+ pci_dev_save_and_disable(dev);
+ if (dev->subordinate)
+ pci_bus_save_and_disable(dev->subordinate);
+ }
+}
+
+static void pci_slot_restore(struct pci_slot *slot)
+{
+ struct pci_dev *dev;
+
+ list_for_each_entry(dev, &slot->bus->devices, bus_list) {
+ if (!dev->slot || dev->slot != slot)
+ continue;
+ pci_dev_restore(dev);
+ if (dev->subordinate)
+ pci_bus_restore(dev->subordinate);
+ }
+}
+
+static int pci_slot_reset(struct pci_slot *slot, int probe)
+{
+ int rc;
+
+ if (!slot)
+ return -ENOTTY;
+
+ if (!probe)
+ pci_slot_lock(slot);
+
+ might_sleep();
+
+ rc = pci_reset_hotplug_slot(slot->hotplug, probe);
+
+ if (!probe)
+ pci_slot_unlock(slot);
+
+ return rc;
+}
+
+/**
+ * pci_reset_slot - reset a PCI slot
+ * @slot: PCI slot to reset
+ *
+ * A PCI bus may host multiple slots, each slot may support a reset mechanism
+ * independent of other slots. For instance, some slots may support slot power
+ * control. In the case of a 1:1 bus to slot architecture, this function may
+ * wrap the bus reset to avoid spurious slot related events such as hotplug.
+ * Generally a slot reset should be attempted before a bus reset. All of the
+ * functions of the slot and any subordinate buses behind the slot are reset
+ * through this function. PCI config space of all devices in the slot and
+ * behind the slot is saved before and restored after reset.
+ *
+ * Return 0 on success, non-zero on error.
+ */
+int pci_reset_slot(struct pci_slot *slot)
+{
+ int rc;
+
+ rc = pci_slot_reset(slot, 1);
+ if (rc)
+ return rc;
+
+ pci_slot_save_and_disable(slot);
+
+ rc = pci_slot_reset(slot, 0);
+
+ pci_slot_restore(slot);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pci_reset_slot);
+
+static int pci_bus_reset(struct pci_bus *bus, int probe)
+{
+ if (!bus->self)
+ return -ENOTTY;
+
+ if (probe)
+ return 0;
+
+ pci_bus_lock(bus);
+
+ might_sleep();
+
+ pci_reset_bridge_secondary_bus(bus->self);
+
+ pci_bus_unlock(bus);
+
+ return 0;
+}
+
+/**
+ * pci_reset_bus - reset a PCI bus
+ * @bus: top level PCI bus to reset
+ *
+ * Do a bus reset on the given bus and any subordinate buses, saving
+ * and restoring state of all devices.
+ *
+ * Return 0 on success, non-zero on error.
+ */
+int pci_reset_bus(struct pci_bus *bus)
+{
+ int rc;
+
+ rc = pci_bus_reset(bus, 1);
+ if (rc)
+ return rc;
+
+ pci_bus_save_and_disable(bus);
+
+ rc = pci_bus_reset(bus, 0);
+
+ pci_bus_restore(bus);
+
+ return rc;
+}
+EXPORT_SYMBOL_GPL(pci_reset_bus);
+
/**
* pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
* @dev: PCI device to query
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 35c1bc4..1a8fd34 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -924,6 +924,8 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
int __pci_reset_function(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
+int pci_reset_slot(struct pci_slot *slot);
+int pci_reset_bus(struct pci_bus *bus);
void pci_reset_bridge_secondary_bus(struct pci_dev *dev);
void pci_update_resource(struct pci_dev *dev, int resno);
int __must_check pci_assign_resource(struct pci_dev *dev, int i);

2013-08-01 16:56:06

by Alex Williamson

Subject: [PATCH v3 8/9] pci: Tune secondary bus reset timing

The PCI spec indicates that with stable power, reset needs to be
asserted for a minimum of 1ms (Trst). Seems like we should be able
to assume power is stable for a runtime secondary bus reset. The
current code has always used 100ms with no explanation where that
came from. The aer_do_secondary_bus_reset() function uses 2ms, but
that seems to be a misinterpretation of the PCIe spec, where hot
reset is implemented by TS1 ordered sets containing the hot reset
command. After a 2ms delay the state machine enters the detect state,
but to generate a link down, only two consecutive TS1 hot reset
ordered sets are required. 1ms should be plenty for that.

After reset is de-asserted we must wait for devices to complete
initialization. The specs refer to this as "recovery time" (Trhfa).
For PCI this is 2^25 clock cycles or 2^26 for PCI-X. For minimum
bus speeds, both of those come to 1s. PCIe "softens" this
requirement with the Configuration Request Retry Status (CRS)
completion status. Theoretically we could use CRS to shorten the
wait time. We don't make use of that here, using a fixed 1s delay
to allow devices to re-initialize.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
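
Not part of the patch: the recovery time arithmetic as a small userspace
helper. The clock rates are assumptions (33 MHz for conventional PCI as
in the code comment below, 66 MHz taken here for PCI-X), and
model_cycles_to_ms() is a made-up name.

```c
#include <stdint.h>

/*
 * Convert 2^log2_cycles clock cycles at clock_hz into milliseconds,
 * rounding up. log2_cycles is expected to stay well below 54 so the
 * intermediate product fits in 64 bits.
 */
static uint64_t model_cycles_to_ms(unsigned int log2_cycles, uint64_t clock_hz)
{
	uint64_t cycles = (uint64_t)1 << log2_cycles;

	return (cycles * 1000 + clock_hz - 1) / clock_hz;
}
```

With these assumed clocks, 2^25 cycles at 33 MHz and 2^26 cycles at
66 MHz both come out at 1017 ms, which is the roughly 1s figure quoted
above.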

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 3e71887..a5c6a9b 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3291,11 +3291,22 @@ void pci_reset_bridge_secondary_bus(struct pci_dev *dev)
pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
- msleep(100);
+ /*
+ * PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms.
+ */
+ msleep(1);

ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
- msleep(100);
+
+ /*
+ * Trhfa for conventional PCI is 2^25 clock cycles.
+ * Assuming a minimum 33MHz clock this results in a 1s
+ * delay before we can consider subordinate devices to
+ * be re-initialized. PCIe has some ways to shorten this,
+ * but we don't make use of them yet.
+ */
+ ssleep(1);
}
EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);

2013-08-01 16:56:01

by Alex Williamson

Subject: [PATCH v3 7/9] pci: Wake-up devices before save for reset

Devices come out of reset in D0. Restoring a device to a different
post-reset state takes more smarts than our simple config space
restore, which can leave devices in an inconsistent state. For
example, if a device is reset in D3, but the restore doesn't
successfully return the device to D3, then the actual state of the
device and dev->current_state are contradictory. Put everything
in D0 going into the reset, then we don't need to do anything
special on the way out.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pci.c | 7 +++++++
1 file changed, 7 insertions(+)
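
Not part of the patch: a userspace model of the inconsistency described
above. It assumes the config space restore does not manage to return the
device to D3, and all names (struct model_dev, model_*) are hypothetical.

```c
enum model_pstate { MODEL_D0 = 0, MODEL_D3HOT = 3 };

struct model_dev {
	enum model_pstate hw_state;		/* what the hardware is really in */
	enum model_pstate current_state;	/* what software believes */
};

/* A power state change moves hardware and bookkeeping together. */
static void model_set_power_state(struct model_dev *d, enum model_pstate s)
{
	d->hw_state = s;
	d->current_state = s;
}

/* Reset puts the hardware in D0 but leaves software bookkeeping alone. */
static void model_reset(struct model_dev *d)
{
	d->hw_state = MODEL_D0;
}

/* The approach of this patch: wake the device first, then reset. */
static void model_reset_after_wake(struct model_dev *d)
{
	model_set_power_state(d, MODEL_D0);
	model_reset(d);
}

static int model_consistent(const struct model_dev *d)
{
	return d->hw_state == d->current_state;
}
```

Resetting a device that starts in D3hot leaves the two fields
disagreeing; waking it first keeps them both at D0.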

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 91f7bc4..3e71887 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3393,6 +3393,13 @@ static void pci_dev_unlock(struct pci_dev *dev)

static void pci_dev_save_and_disable(struct pci_dev *dev)
{
+ /*
+ * Wake-up device prior to save. PM registers default to D0 after
+ * reset and a simple register restore doesn't reliably return
+ * to a non-D0 state anyway.
+ */
+ pci_set_power_state(dev, PCI_D0);
+
pci_save_state(dev);
/*
* both INTx and MSI are disabled after the Interrupt Disable bit

2013-08-01 16:56:11

by Alex Williamson

Subject: [PATCH v3 9/9] pci: Remove aer_do_secondary_bus_reset()

One PCI bus reset function to rule them all.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/pci/pcie/aer/aerdrv.c | 2 +-
drivers/pci/pcie/aer/aerdrv.h | 1 -
drivers/pci/pcie/aer/aerdrv_core.c | 35 +----------------------------------
3 files changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/pcie/aer/aerdrv.c b/drivers/pci/pcie/aer/aerdrv.c
index 76ef634..0bf82a2 100644
--- a/drivers/pci/pcie/aer/aerdrv.c
+++ b/drivers/pci/pcie/aer/aerdrv.c
@@ -352,7 +352,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev)
reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK;
pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32);

- aer_do_secondary_bus_reset(dev);
+ pci_reset_bridge_secondary_bus(dev);
dev_printk(KERN_DEBUG, &dev->dev, "Root Port link has been reset\n");

/* Clear Root Error Status */
diff --git a/drivers/pci/pcie/aer/aerdrv.h b/drivers/pci/pcie/aer/aerdrv.h
index 90ea3e8..84420b7 100644
--- a/drivers/pci/pcie/aer/aerdrv.h
+++ b/drivers/pci/pcie/aer/aerdrv.h
@@ -106,7 +106,6 @@ static inline pci_ers_result_t merge_result(enum pci_ers_result orig,
}

extern struct bus_type pcie_port_bus_type;
-void aer_do_secondary_bus_reset(struct pci_dev *dev);
int aer_init(struct pcie_device *dev);
void aer_isr(struct work_struct *work);
void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
diff --git a/drivers/pci/pcie/aer/aerdrv_core.c b/drivers/pci/pcie/aer/aerdrv_core.c
index 8b68ae5..85ca36f 100644
--- a/drivers/pci/pcie/aer/aerdrv_core.c
+++ b/drivers/pci/pcie/aer/aerdrv_core.c
@@ -367,39 +367,6 @@ static pci_ers_result_t broadcast_error_message(struct pci_dev *dev,
}

/**
- * aer_do_secondary_bus_reset - perform secondary bus reset
- * @dev: pointer to bridge's pci_dev data structure
- *
- * Invoked when performing link reset at Root Port or Downstream Port.
- */
-void aer_do_secondary_bus_reset(struct pci_dev *dev)
-{
- u16 p2p_ctrl;
-
- /* Assert Secondary Bus Reset */
- pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &p2p_ctrl);
- p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
- pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
-
- /*
- * we should send hot reset message for 2ms to allow it time to
- * propagate to all downstream ports
- */
- msleep(2);
-
- /* De-assert Secondary Bus Reset */
- p2p_ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
- pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
-
- /*
- * System software must wait for at least 100ms from the end
- * of a reset of one or more device before it is permitted
- * to issue Configuration Requests to those devices.
- */
- msleep(200);
-}
-
-/**
* default_reset_link - default reset function
* @dev: pointer to pci_dev data structure
*
@@ -408,7 +375,7 @@ void aer_do_secondary_bus_reset(struct pci_dev *dev)
*/
static pci_ers_result_t default_reset_link(struct pci_dev *dev)
{
- aer_do_secondary_bus_reset(dev);
+ pci_reset_bridge_secondary_bus(dev);
dev_printk(KERN_DEBUG, &dev->dev, "downstream link has been reset\n");
return PCI_ERS_RESULT_RECOVERED;
}

2013-08-01 16:55:59

by Alex Williamson

Subject: [PATCH v3 2/9] pci: Add hotplug_slot_ops.reset_slot()

This optional callback allows hotplug controllers to perform slot
specific resets. These may be necessary in cases where a normal
secondary bus reset can interact with controller logic and expose
spurious hotplugs.

Signed-off-by: Alex Williamson <[email protected]>
---
include/linux/pci_hotplug.h | 4 ++++
1 file changed, 4 insertions(+)
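
Not part of the patch: a userspace sketch of how an optional callback in
an ops table is typically consumed, returning -ENOTTY when a controller
does not provide it. The model_* types are stand-ins, not the real
hotplug_slot structures, and module reference counting is omitted.

```c
#include <errno.h>
#include <stddef.h>

struct model_slot;

struct model_slot_ops {
	int (*reset_slot)(struct model_slot *slot, int probe);	/* optional */
};

struct model_slot {
	const struct model_slot_ops *ops;
	int resets;	/* how many real resets were performed */
};

/* Dispatch to the controller if it implements the callback. */
static int model_reset_slot(struct model_slot *slot, int probe)
{
	if (!slot || !slot->ops || !slot->ops->reset_slot)
		return -ENOTTY;
	return slot->ops->reset_slot(slot, probe);
}

/* Example implementation: probe only reports support, a real call resets. */
static int model_example_reset(struct model_slot *slot, int probe)
{
	if (!probe)
		slot->resets++;
	return 0;
}
```

A probe call reports that the reset is available without performing it,
which is how callers can check for support before saving device state.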

diff --git a/include/linux/pci_hotplug.h b/include/linux/pci_hotplug.h
index 8db71dc..bd32109 100644
--- a/include/linux/pci_hotplug.h
+++ b/include/linux/pci_hotplug.h
@@ -63,6 +63,9 @@ enum pcie_link_width {
* @get_adapter_status: Called to get see if an adapter is present in the slot or not.
* If this field is NULL, the value passed in the struct hotplug_slot_info
* will be used when this value is requested by a user.
+ * @reset_slot: Optional interface to allow override of a bus reset for the
+ * slot for cases where a secondary bus reset can result in spurious
+ * hotplug events or where a slot can be reset independent of the bus.
*
* The table of function pointers that is passed to the hotplug pci core by a
* hotplug pci driver. These functions are called by the hotplug pci core when
@@ -80,6 +83,7 @@ struct hotplug_slot_ops {
int (*get_attention_status) (struct hotplug_slot *slot, u8 *value);
int (*get_latch_status) (struct hotplug_slot *slot, u8 *value);
int (*get_adapter_status) (struct hotplug_slot *slot, u8 *value);
+ int (*reset_slot) (struct hotplug_slot *slot, int probe);
};

/**

2013-08-01 20:59:21

by Donald Dutile

Subject: Re: [PATCH v3 5/9] pci: Split out pci_dev lock/unlock and save/restore

On 08/01/2013 12:55 PM, Alex Williamson wrote:
> Only cosmetic changes to existing paths.
>
> Signed-off-by: Alex Williamson<[email protected]>
> ---
> drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 35 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index db90692..f6c757a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3378,22 +3378,46 @@ done:
> return rc;
> }
>
> +static void pci_dev_lock(struct pci_dev *dev)
> +{
> + pci_cfg_access_lock(dev);
> + /* block PM suspend, driver probe, etc. */
> + device_lock(&dev->dev);
> +}
> +
> +static void pci_dev_unlock(struct pci_dev *dev)
> +{
> + device_unlock(&dev->dev);
> + pci_cfg_access_unlock(dev);
> +}
> +
> +static void pci_dev_save_and_disable(struct pci_dev *dev)
> +{
> + pci_save_state(dev);
> + /*
> + * both INTx and MSI are disabled after the Interrupt Disable bit
> + * is set and the Bus Master bit is cleared.
> + */
> + pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
> +}
Although the above is just a shuffling of the previous code, does the
comment need to point out that BARs are disabled by this write as well?
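
For illustration only, a userspace model of the effect in question: the
write replaces the whole command register with just the INTx disable
bit, so the I/O and memory decode enables and the bus master bit all end
up clear. The model_* names are made up; the bit values mirror the
PCI_COMMAND_* definitions.

```c
#include <stdint.h>

/* Values mirror PCI_COMMAND_IO, _MEMORY, _MASTER and _INTX_DISABLE. */
#define MODEL_COMMAND_IO		0x0001
#define MODEL_COMMAND_MEMORY		0x0002
#define MODEL_COMMAND_MASTER		0x0004
#define MODEL_COMMAND_INTX_DISABLE	0x0400

/* Whole-register write: the previous value does not influence the result. */
static uint16_t model_command_after_disable(uint16_t before)
{
	(void)before;
	return MODEL_COMMAND_INTX_DISABLE;
}

/* BARs are only decoded while I/O or memory space is enabled. */
static int model_decodes_bars(uint16_t cmd)
{
	return (cmd & (MODEL_COMMAND_IO | MODEL_COMMAND_MEMORY)) != 0;
}
```

Starting from 0x0007 (I/O, memory and bus master enabled), the register
ends up as 0x0400 and the device no longer decodes its BARs.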

> +
> +static void pci_dev_restore(struct pci_dev *dev)
> +{
> + pci_restore_state(dev);
> +}
> +
> static int pci_dev_reset(struct pci_dev *dev, int probe)
> {
> int rc;
>
> - if (!probe) {
> - pci_cfg_access_lock(dev);
> - /* block PM suspend, driver probe, etc. */
> - device_lock(&dev->dev);
> - }
> + if (!probe)
> + pci_dev_lock(dev);
>
> rc = __pci_dev_reset(dev, probe);
>
> - if (!probe) {
> - device_unlock(&dev->dev);
> - pci_cfg_access_unlock(dev);
> - }
> + if (!probe)
> + pci_dev_unlock(dev);
> +
> return rc;
> }
> /**
> @@ -3484,17 +3508,11 @@ int pci_reset_function(struct pci_dev *dev)
> if (rc)
> return rc;
>
> - pci_save_state(dev);
> -
> - /*
> - * both INTx and MSI are disabled after the Interrupt Disable bit
> - * is set and the Bus Master bit is cleared.
> - */
> - pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
> + pci_dev_save_and_disable(dev);
>
> rc = pci_dev_reset(dev, 0);
>
> - pci_restore_state(dev);
> + pci_dev_restore(dev);
>
> return rc;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2013-08-01 21:04:46

by Alex Williamson

Subject: Re: [PATCH v3 5/9] pci: Split out pci_dev lock/unlock and save/restore

On Thu, 2013-08-01 at 16:59 -0400, Don Dutile wrote:
> On 08/01/2013 12:55 PM, Alex Williamson wrote:
> > Only cosmetic changes to existing paths.
> >
> > Signed-off-by: Alex Williamson<[email protected]>
> > ---
> > drivers/pci/pci.c | 52 +++++++++++++++++++++++++++++++++++-----------------
> > 1 file changed, 35 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index db90692..f6c757a 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3378,22 +3378,46 @@ done:
> > return rc;
> > }
> >
> > +static void pci_dev_lock(struct pci_dev *dev)
> > +{
> > + pci_cfg_access_lock(dev);
> > + /* block PM suspend, driver probe, etc. */
> > + device_lock(&dev->dev);
> > +}
> > +
> > +static void pci_dev_unlock(struct pci_dev *dev)
> > +{
> > + device_unlock(&dev->dev);
> > + pci_cfg_access_unlock(dev);
> > +}
> > +
> > +static void pci_dev_save_and_disable(struct pci_dev *dev)
> > +{
> > + pci_save_state(dev);
> > + /*
> > + * both INTx and MSI are disabled after the Interrupt Disable bit
> > + * is set and the Bus Master bit is cleared.
> > + */
> > + pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
> > +}
> Although the above is just a shuffling of the previous code,
> does the comment need to point out that BARs are disabled by this write as well?

I could certainly add such a comment. That's also why I added the
_and_disable to this function name instead of just pci_dev_save().
Thanks,

Alex
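
The point about BARs follows from the helper writing a whole new value to the command register rather than setting one bit: every other bit, including the I/O and memory decode enables, ends up cleared. A minimal userspace sketch of that effect, assuming the standard bit values from the kernel's pci_regs.h; the function name command_after_disable() is hypothetical and only models the register value:

```c
#include <stdint.h>

/* Command register bits, values as in the kernel's pci_regs.h. */
#define PCI_COMMAND_IO            0x0001  /* respond to I/O space (I/O BARs) */
#define PCI_COMMAND_MEMORY        0x0002  /* respond to memory space (memory BARs) */
#define PCI_COMMAND_MASTER        0x0004  /* bus mastering (DMA, MSI writes) */
#define PCI_COMMAND_INTX_DISABLE  0x0400  /* INTx assertion disabled */

/*
 * Hypothetical model of the write in pci_dev_save_and_disable(): the old
 * value is ignored because the register is overwritten, not
 * read-modify-written.
 */
static uint16_t command_after_disable(uint16_t old_command)
{
	(void)old_command;
	return PCI_COMMAND_INTX_DISABLE;
}
```

With I/O, memory and bus master all enabled beforehand, the resulting value has only the INTx disable bit set, so the device no longer decodes accesses to its BARs.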

> > +
> > +static void pci_dev_restore(struct pci_dev *dev)
> > +{
> > + pci_restore_state(dev);
> > +}
> > +
> > static int pci_dev_reset(struct pci_dev *dev, int probe)
> > {
> > int rc;
> >
> > - if (!probe) {
> > - pci_cfg_access_lock(dev);
> > - /* block PM suspend, driver probe, etc. */
> > - device_lock(&dev->dev);
> > - }
> > + if (!probe)
> > + pci_dev_lock(dev);
> >
> > rc = __pci_dev_reset(dev, probe);
> >
> > - if (!probe) {
> > - device_unlock(&dev->dev);
> > - pci_cfg_access_unlock(dev);
> > - }
> > + if (!probe)
> > + pci_dev_unlock(dev);
> > +
> > return rc;
> > }
> > /**
> > @@ -3484,17 +3508,11 @@ int pci_reset_function(struct pci_dev *dev)
> > if (rc)
> > return rc;
> >
> > - pci_save_state(dev);
> > -
> > - /*
> > - * both INTx and MSI are disabled after the Interrupt Disable bit
> > - * is set and the Bus Master bit is cleared.
> > - */
> > - pci_write_config_word(dev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);
> > + pci_dev_save_and_disable(dev);
> >
> > rc = pci_dev_reset(dev, 0);
> >
> > - pci_restore_state(dev);
> > + pci_dev_restore(dev);
> >
> > return rc;
> > }
> >
>


2013-08-01 21:23:06

by Donald Dutile

Subject: Re: [PATCH v3 6/9] pci: Add slot and bus reset interfaces

On 08/01/2013 12:55 PM, Alex Williamson wrote:
> Sometimes pci_reset_function is not sufficient. We have cases where
> devices do not support any kind of reset, but there might be multiple
> functions on the bus preventing pci_reset_function from doing a
> secondary bus reset. We also have cases where a device will advertise
> that it supports a PM reset, but really does nothing on D3hot->D0
> (graphics cards are notorious for this). These devices often also
> have more than one function, so even blacklisting PM reset for them
> wouldn't allow a secondary bus reset through pci_reset_function.
>
> If a driver supports multiple devices it should have the ability to
> induce a bus reset when it needs to. This patch provides that ability
> through pci_reset_slot and pci_reset_bus. It's the caller's
> responsibility when using these interfaces to understand that all of
> the devices in or below the slot (or on or below the bus) will be
> reset and therefore should be under control of the caller. PCI state
> of all the affected devices is saved and restored around these resets,
> but internal state of all of the affected devices is reset (which
> should be the intention).
>
> Signed-off-by: Alex Williamson<[email protected]>
> ---
> drivers/pci/pci.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci.h | 2 +
> 2 files changed, 197 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index f6c757a..91f7bc4 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3518,6 +3518,201 @@ int pci_reset_function(struct pci_dev *dev)
> }
> EXPORT_SYMBOL_GPL(pci_reset_function);
>
> +static void pci_bus_lock(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_dev_lock(dev);
> + if (dev->subordinate)
> + pci_bus_lock(dev->subordinate);
> + }
> +}
> +
> +static void pci_bus_unlock(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_dev_unlock(dev);
> + if (dev->subordinate)
> + pci_bus_unlock(dev->subordinate);
> + }
> +}
So, I'm wondering: if the above locks from the top of the PCI tree down,
should the unlock work from the bottom back up to the top?
i.e.,
	list_for_each_entry(dev, &bus->devices, bus_list) {
		if (dev->subordinate)
			pci_bus_unlock(dev->subordinate);
		pci_dev_unlock(dev);
	}
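
To make the two traversal orders concrete, here is a small userspace sketch that records the order in which devices would be locked and unlocked. It is only a model: struct node, struct trace and the bus_lock()/bus_unlock() functions below are hypothetical stand-ins, not the kernel's pci_dev or the functions in the patch.

```c
#include <stddef.h>

#define MAX_EVENTS 32

/* Hypothetical stand-in for struct pci_dev: an id plus child devices. */
struct node {
	int id;
	struct node *children;	/* devices on the subordinate bus, or NULL */
	size_t nchildren;
};

/* Records the order in which devices are visited. */
struct trace {
	int ids[MAX_EVENTS];
	size_t len;
};

static void record(struct trace *t, int id)
{
	if (t->len < MAX_EVENTS)
		t->ids[t->len++] = id;
}

/* Lock top-down: a device first, then everything below it. */
static void bus_lock(struct node *devs, size_t n, struct trace *t)
{
	for (size_t i = 0; i < n; i++) {
		record(t, devs[i].id);
		if (devs[i].children)
			bus_lock(devs[i].children, devs[i].nchildren, t);
	}
}

/* Unlock bottom-up: everything below a device first, then the device. */
static void bus_unlock(struct node *devs, size_t n, struct trace *t)
{
	for (size_t i = 0; i < n; i++) {
		if (devs[i].children)
			bus_unlock(devs[i].children, devs[i].nchildren, t);
		record(t, devs[i].id);
	}
}
```

For a bridge 1 with devices 2 and 3 behind it and a sibling device 4, this locks in the order 1, 2, 3, 4 and unlocks in the order 2, 3, 1, 4. Siblings are still walked front to back, so the unlock order is not a strict mirror image of the lock order; it only guarantees that a bridge is released after everything below it.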

> +static void pci_slot_lock(struct pci_slot *slot)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> + if (!dev->slot || dev->slot != slot)
> + continue;
> + pci_dev_lock(dev);
> + if (dev->subordinate)
> + pci_bus_lock(dev->subordinate);
> + }
> +}
> +
> +static void pci_slot_unlock(struct pci_slot *slot)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> + if (!dev->slot || dev->slot != slot)
> + continue;
> + pci_dev_unlock(dev);
> + if (dev->subordinate)
> + pci_bus_unlock(dev->subordinate);
> + }
> +}
ditto.

> +
> +static void pci_bus_save_and_disable(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_dev_save_and_disable(dev);
> + if (dev->subordinate)
> + pci_bus_save_and_disable(dev->subordinate);
> + }
> +}
> +
> +static void pci_bus_restore(struct pci_bus *bus)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &bus->devices, bus_list) {
> + pci_dev_restore(dev);
> + if (dev->subordinate)
> + pci_bus_restore(dev->subordinate);
> + }
> +}
But not in reverse here, because the top-most device needs to be restored
first to get to the next level in the PCI tree (it could be a PCI-to-PCI
bridge (PPB) that needs restoring to reach the next-level bus/device(s)).
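
The reason restore has to stay top-down can be shown with a toy model in which a device's config space is reachable only once its upstream bridge has been restored. This is a sketch under that single assumption; struct dev, reachable() and restore_in_order() are hypothetical names and do not correspond to kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: a device is reachable only through a restored parent. */
struct dev {
	struct dev *parent;	/* upstream bridge, NULL at the top of the reset */
	bool restored;
};

/* Config access below a bridge fails until the bridge itself is restored. */
static bool reachable(const struct dev *d)
{
	return d->parent == NULL || d->parent->restored;
}

/* Restore devices in the given order; count those that could not be reached. */
static size_t restore_in_order(struct dev **order, size_t n)
{
	size_t failures = 0;

	for (size_t i = 0; i < n; i++) {
		if (reachable(order[i]))
			order[i]->restored = true;
		else
			failures++;
	}
	return failures;
}
```

Restoring a bridge and then the endpoint behind it succeeds for both, while attempting the endpoint first leaves it unrestored in this model, which is why the unlock order can be reversed but the restore order cannot.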

> +
> +static void pci_slot_save_and_disable(struct pci_slot *slot)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> + if (!dev->slot || dev->slot != slot)
> + continue;
> + pci_dev_save_and_disable(dev);
> + if (dev->subordinate)
> + pci_bus_save_and_disable(dev->subordinate);
> + }
> +}
> +
> +static void pci_slot_restore(struct pci_slot *slot)
> +{
> + struct pci_dev *dev;
> +
> + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> + if (!dev->slot || dev->slot != slot)
> + continue;
> + pci_dev_restore(dev);
> + if (dev->subordinate)
> + pci_bus_restore(dev->subordinate);
> + }
> +}
ditto.

> +
> +static int pci_slot_reset(struct pci_slot *slot, int probe)
> +{
> + int rc;
> +
> + if (!slot)
> + return -ENOTTY;
> +
> + if (!probe)
> + pci_slot_lock(slot);
> +
> + might_sleep();
> +
> + rc = pci_reset_hotplug_slot(slot->hotplug, probe);
> +
> + if (!probe)
> + pci_slot_unlock(slot);
> +
> + return rc;
> +}
> +
> +/**
> + * pci_reset_slot - reset a PCI slot
> + * @slot: PCI slot to reset
> + *
> + * A PCI bus may host multiple slots, each slot may support a reset mechanism
> + * independent of other slots. For instance, some slots may support slot power
> + * control. In the case of a 1:1 bus to slot architecture, this function may
> + * wrap the bus reset to avoid spurious slot related events such as hotplug.
> + * Generally a slot reset should be attempted before a bus reset. All of the
> + * function of the slot and any subordinate buses behind the slot are reset
> + * through this function. PCI config space of all devices in the slot and
> + * behind the slot is saved before and restored after reset.
> + *
> + * Return 0 on success, non-zero on error.
> + */
> +int pci_reset_slot(struct pci_slot *slot)
> +{
> + int rc;
> +
> + rc = pci_slot_reset(slot, 1);
> + if (rc)
> + return rc;
> +
> + pci_slot_save_and_disable(slot);
> +
> + rc = pci_slot_reset(slot, 0);
> +
> + pci_slot_restore(slot);
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(pci_reset_slot);
> +
> +static int pci_bus_reset(struct pci_bus *bus, int probe)
> +{
> + if (!bus->self)
> + return -ENOTTY;
> +
> + if (probe)
> + return 0;
> +
> + pci_bus_lock(bus);
> +
> + might_sleep();
> +
> + pci_reset_bridge_secondary_bus(bus->self);
> +
> + pci_bus_unlock(bus);
> +
> + return 0;
> +}
> +
> +/**
> + * pci_reset_bus - reset a PCI bus
> + * @bus: top level PCI bus to reset
> + *
> + * Do a bus reset on the given bus and any subordinate buses, saving
> + * and restoring state of all devices.
> + *
> + * Return 0 on success, non-zero on error.
> + */
> +int pci_reset_bus(struct pci_bus *bus)
> +{
> + int rc;
> +
> + rc = pci_bus_reset(bus, 1);
> + if (rc)
> + return rc;
> +
> + pci_bus_save_and_disable(bus);
> +
> + rc = pci_bus_reset(bus, 0);
> +
> + pci_bus_restore(bus);
> +
> + return rc;
> +}
> +EXPORT_SYMBOL_GPL(pci_reset_bus);
> +
> /**
> * pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
> * @dev: PCI device to query
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 35c1bc4..1a8fd34 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -924,6 +924,8 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
> int __pci_reset_function(struct pci_dev *dev);
> int __pci_reset_function_locked(struct pci_dev *dev);
> int pci_reset_function(struct pci_dev *dev);
> +int pci_reset_slot(struct pci_slot *slot);
> +int pci_reset_bus(struct pci_bus *bus);
> void pci_reset_bridge_secondary_bus(struct pci_dev *dev);
> void pci_update_resource(struct pci_dev *dev, int resno);
> int __must_check pci_assign_resource(struct pci_dev *dev, int i);
>

2013-08-01 21:29:34

by Donald Dutile

Subject: Re: [PATCH v3 8/9] pci: Tune secondary bus reset timing

On 08/01/2013 12:55 PM, Alex Williamson wrote:
> The PCI spec indicates that with stable power, reset needs to be
> asserted for a minimum of 1ms (Trst). Seems like we should be able
> to assume power is stable for a runtime secondary bus reset. The
> current code has always used 100ms with no explanation where that
> came from. The aer_do_secondary_bus_reset() function uses 2ms, but
> that seems to be a misinterpretation of the PCIe spec, where hot
> reset is implemented by TS1 ordered sets containing the hot reset
> command. After a 2ms delay the state machine enters the detect state,
> but to generate a link down, only two consecutive TS1 hot reset
> ordered sets are required. 1ms should be plenty for that.
>
> After reset is de-asserted we must wait for devices to complete
> initialization. The specs refer to this as "recovery time" (Trhfa).
> For PCI this is 2^25 clock cycles or 2^26 for PCI-X. For minimum
> bus speeds, both of those come to 1s. PCIe "softens" this
> requirement with the Configuration Request Retry Status (CRS)
> completion status. Theoretically we could use CRS to shorten the
> wait time. We don't make use of that here, using a fixed 1s delay
> to allow devices to re-initialize.
Unfortunately, I don't think CRS is widely enough supported to make it
worth the additional checking & use, atm.

>
> Signed-off-by: Alex Williamson<[email protected]>
> ---
> drivers/pci/pci.c | 15 +++++++++++++--
> 1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 3e71887..a5c6a9b 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3291,11 +3291,22 @@ void pci_reset_bridge_secondary_bus(struct pci_dev *dev)
> pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
> ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> - msleep(100);
> + /*
> + * PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms.
> + */
> + msleep(1);
>
> ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> - msleep(100);
> +
> + /*
> + * Trhfa for conventional PCI is 2^25 clock cycles.
> + * Assuming a minimum 33MHz clock this results in a 1s
> + * delay before we can consider subordinate devices to
> + * be re-initialized. PCIe has some ways to shorten this,
> + * but we don't make use of them yet.
> + */
> + ssleep(1);
Can't the bus speed be determined from (config space) status bits, so
that this time can be minimized, esp. on modern PCIe buses/links?
There aren't many 33MHz legacy PCI buses for which this type of
timing is wanted, or on which this reset will be done (for device
assignment/vfio). :-/

> }
> EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);
>
>

2013-08-01 21:32:21

by Alex Williamson

Subject: Re: [PATCH v3 6/9] pci: Add slot and bus reset interfaces

On Thu, 2013-08-01 at 17:22 -0400, Don Dutile wrote:
> On 08/01/2013 12:55 PM, Alex Williamson wrote:
> > Sometimes pci_reset_function is not sufficient. We have cases where
> > devices do not support any kind of reset, but there might be multiple
> > functions on the bus preventing pci_reset_function from doing a
> > secondary bus reset. We also have cases where a device will advertise
> > that it supports a PM reset, but really does nothing on D3hot->D0
> > (graphics cards are notorious for this). These devices often also
> > have more than one function, so even blacklisting PM reset for them
> > wouldn't allow a secondary bus reset through pci_reset_function.
> >
> > If a driver supports multiple devices it should have the ability to
> > induce a bus reset when it needs to. This patch provides that ability
> > through pci_reset_slot and pci_reset_bus. It's the caller's
> > responsibility when using these interfaces to understand that all of
> > the devices in or below the slot (or on or below the bus) will be
> > reset and therefore should be under control of the caller. PCI state
> > of all the affected devices is saved and restored around these resets,
> > but internal state of all of the affected devices is reset (which
> > should be the intention).
> >
> > Signed-off-by: Alex Williamson<[email protected]>
> > ---
> > drivers/pci/pci.c | 195 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/pci.h | 2 +
> > 2 files changed, 197 insertions(+)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index f6c757a..91f7bc4 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3518,6 +3518,201 @@ int pci_reset_function(struct pci_dev *dev)
> > }
> > EXPORT_SYMBOL_GPL(pci_reset_function);
> >
> > +static void pci_bus_lock(struct pci_bus *bus)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_dev_lock(dev);
> > + if (dev->subordinate)
> > + pci_bus_lock(dev->subordinate);
> > + }
> > +}
> > +
> > +static void pci_bus_unlock(struct pci_bus *bus)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_dev_unlock(dev);
> > + if (dev->subordinate)
> > + pci_bus_unlock(dev->subordinate);
> > + }
> > +}
> So, I'm wondering: if the above locks from the top of the PCI tree down,
> should the unlock work from the bottom back up to the top?
> i.e.,
> 	list_for_each_entry(dev, &bus->devices, bus_list) {
> 		if (dev->subordinate)
> 			pci_bus_unlock(dev->subordinate);
> 		pci_dev_unlock(dev);
> 	}


I hadn't considered that; it does seem like a good idea, though.

> > +static void pci_slot_lock(struct pci_slot *slot)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> > + if (!dev->slot || dev->slot != slot)
> > + continue;
> > + pci_dev_lock(dev);
> > + if (dev->subordinate)
> > + pci_bus_lock(dev->subordinate);
> > + }
> > +}
> > +
> > +static void pci_slot_unlock(struct pci_slot *slot)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> > + if (!dev->slot || dev->slot != slot)
> > + continue;
> > + pci_dev_unlock(dev);
> > + if (dev->subordinate)
> > + pci_bus_unlock(dev->subordinate);
> > + }
> > +}
> ditto.
>
> > +
> > +static void pci_bus_save_and_disable(struct pci_bus *bus)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_dev_save_and_disable(dev);
> > + if (dev->subordinate)
> > + pci_bus_save_and_disable(dev->subordinate);
> > + }
> > +}
> > +
> > +static void pci_bus_restore(struct pci_bus *bus)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &bus->devices, bus_list) {
> > + pci_dev_restore(dev);
> > + if (dev->subordinate)
> > + pci_bus_restore(dev->subordinate);
> > + }
> > +}
> But not in reverse here, because the top-most device needs to be restored
> first to get to the next level in the PCI tree (it could be a PCI-to-PCI
> bridge (PPB) that needs restoring to reach the next-level bus/device(s)).

Right. Thanks, I'll see if there are any more comments and respin.

Alex
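
The exported pci_reset_bus() in the quoted patch follows a probe-then-act pattern: it first asks whether a reset is possible, and only then saves state, resets, and restores. The userspace sketch below models that control flow, including the -ENOTTY result for a bus with no upstream bridge. The struct layouts, counters and model_* names are hypothetical and exist only to make the flow observable.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins: a bridge counts resets, a bus counts save/restore. */
struct bridge {
	int resets;
};

struct bus {
	struct bridge *self;	/* upstream bridge, NULL for a root bus */
	int saves;
	int restores;
};

/* Models pci_bus_reset(): probe reports feasibility, non-probe resets. */
static int model_bus_reset(struct bus *bus, int probe)
{
	if (!bus->self)
		return -ENOTTY;	/* no bridge to issue a secondary bus reset */
	if (probe)
		return 0;
	bus->self->resets++;
	return 0;
}

/* Models pci_reset_bus(): probe first, then save, reset, restore. */
static int model_reset_bus(struct bus *bus)
{
	int rc = model_bus_reset(bus, 1);

	if (rc)
		return rc;	/* nothing saved or touched on failure */

	bus->saves++;
	rc = model_bus_reset(bus, 0);
	bus->restores++;
	return rc;
}
```

Probing before saving means an unsupported reset returns early without disabling any device, which is the property the probe argument is there to provide.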

> > +
> > +static void pci_slot_save_and_disable(struct pci_slot *slot)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> > + if (!dev->slot || dev->slot != slot)
> > + continue;
> > + pci_dev_save_and_disable(dev);
> > + if (dev->subordinate)
> > + pci_bus_save_and_disable(dev->subordinate);
> > + }
> > +}
> > +
> > +static void pci_slot_restore(struct pci_slot *slot)
> > +{
> > + struct pci_dev *dev;
> > +
> > + list_for_each_entry(dev, &slot->bus->devices, bus_list) {
> > + if (!dev->slot || dev->slot != slot)
> > + continue;
> > + pci_dev_restore(dev);
> > + if (dev->subordinate)
> > + pci_bus_restore(dev->subordinate);
> > + }
> > +}
> ditto.
>
> > +
> > +static int pci_slot_reset(struct pci_slot *slot, int probe)
> > +{
> > + int rc;
> > +
> > + if (!slot)
> > + return -ENOTTY;
> > +
> > + if (!probe)
> > + pci_slot_lock(slot);
> > +
> > + might_sleep();
> > +
> > + rc = pci_reset_hotplug_slot(slot->hotplug, probe);
> > +
> > + if (!probe)
> > + pci_slot_unlock(slot);
> > +
> > + return rc;
> > +}
> > +
> > +/**
> > + * pci_reset_slot - reset a PCI slot
> > + * @slot: PCI slot to reset
> > + *
> > + * A PCI bus may host multiple slots, each slot may support a reset mechanism
> > + * independent of other slots. For instance, some slots may support slot power
> > + * control. In the case of a 1:1 bus to slot architecture, this function may
> > + * wrap the bus reset to avoid spurious slot related events such as hotplug.
> > + * Generally a slot reset should be attempted before a bus reset. All of the
> > + * function of the slot and any subordinate buses behind the slot are reset
> > + * through this function. PCI config space of all devices in the slot and
> > + * behind the slot is saved before and restored after reset.
> > + *
> > + * Return 0 on success, non-zero on error.
> > + */
> > +int pci_reset_slot(struct pci_slot *slot)
> > +{
> > + int rc;
> > +
> > + rc = pci_slot_reset(slot, 1);
> > + if (rc)
> > + return rc;
> > +
> > + pci_slot_save_and_disable(slot);
> > +
> > + rc = pci_slot_reset(slot, 0);
> > +
> > + pci_slot_restore(slot);
> > +
> > + return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_reset_slot);
> > +
> > +static int pci_bus_reset(struct pci_bus *bus, int probe)
> > +{
> > + if (!bus->self)
> > + return -ENOTTY;
> > +
> > + if (probe)
> > + return 0;
> > +
> > + pci_bus_lock(bus);
> > +
> > + might_sleep();
> > +
> > + pci_reset_bridge_secondary_bus(bus->self);
> > +
> > + pci_bus_unlock(bus);
> > +
> > + return 0;
> > +}
> > +
> > +/**
> > + * pci_reset_bus - reset a PCI bus
> > + * @bus: top level PCI bus to reset
> > + *
> > + * Do a bus reset on the given bus and any subordinate buses, saving
> > + * and restoring state of all devices.
> > + *
> > + * Return 0 on success, non-zero on error.
> > + */
> > +int pci_reset_bus(struct pci_bus *bus)
> > +{
> > + int rc;
> > +
> > + rc = pci_bus_reset(bus, 1);
> > + if (rc)
> > + return rc;
> > +
> > + pci_bus_save_and_disable(bus);
> > +
> > + rc = pci_bus_reset(bus, 0);
> > +
> > + pci_bus_restore(bus);
> > +
> > + return rc;
> > +}
> > +EXPORT_SYMBOL_GPL(pci_reset_bus);
> > +
> > /**
> > * pcix_get_max_mmrbc - get PCI-X maximum designed memory read byte count
> > * @dev: PCI device to query
> > diff --git a/include/linux/pci.h b/include/linux/pci.h
> > index 35c1bc4..1a8fd34 100644
> > --- a/include/linux/pci.h
> > +++ b/include/linux/pci.h
> > @@ -924,6 +924,8 @@ int pcie_set_mps(struct pci_dev *dev, int mps);
> > int __pci_reset_function(struct pci_dev *dev);
> > int __pci_reset_function_locked(struct pci_dev *dev);
> > int pci_reset_function(struct pci_dev *dev);
> > +int pci_reset_slot(struct pci_slot *slot);
> > +int pci_reset_bus(struct pci_bus *bus);
> > void pci_reset_bridge_secondary_bus(struct pci_dev *dev);
> > void pci_update_resource(struct pci_dev *dev, int resno);
> > int __must_check pci_assign_resource(struct pci_dev *dev, int i);
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> > the body of a message to [email protected]
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-pci" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


2013-08-01 21:41:25

by Alex Williamson

Subject: Re: [PATCH v3 8/9] pci: Tune secondary bus reset timing

On Thu, 2013-08-01 at 17:29 -0400, Don Dutile wrote:
> On 08/01/2013 12:55 PM, Alex Williamson wrote:
> > The PCI spec indicates that with stable power, reset needs to be
> > asserted for a minimum of 1ms (Trst). Seems like we should be able
> > to assume power is stable for a runtime secondary bus reset. The
> > current code has always used 100ms with no explanation where that
> > came from. The aer_do_secondary_bus_reset() function uses 2ms, but
> > that seems to be a misinterpretation of the PCIe spec, where hot
> > reset is implemented by TS1 ordered sets containing the hot reset
> > command. After a 2ms delay the state machine enters the detect state,
> > but to generate a link down, only two consecutive TS1 hot reset
> > ordered sets are required. 1ms should be plenty for that.
> >
> > After reset is de-asserted we must wait for devices to complete
> > initialization. The specs refer to this as "recovery time" (Trhfa).
> > For PCI this is 2^25 clock cycles or 2^26 for PCI-X. For minimum
> > bus speeds, both of those come to 1s. PCIe "softens" this
> > requirement with the Configuration Request Retry Status (CRS)
> > completion status. Theoretically we could use CRS to shorten the
> > wait time. We don't make use of that here, using a fixed 1s delay
> > to allow devices to re-initialize.
> Unfortunately, I don't think CRS is widely enough supported to make it
> worth the additional checking & use, atm.
>
> >
> > Signed-off-by: Alex Williamson<[email protected]>
> > ---
> > drivers/pci/pci.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 3e71887..a5c6a9b 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3291,11 +3291,22 @@ void pci_reset_bridge_secondary_bus(struct pci_dev *dev)
> > pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
> > ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> > - msleep(100);
> > + /*
> > + * PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms.
> > + */
> > + msleep(1);
> >
> > ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
> > pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> > - msleep(100);
> > +
> > + /*
> > + * Trhfa for conventional PCI is 2^25 clock cycles.
> > + * Assuming a minimum 33MHz clock this results in a 1s
> > + * delay before we can consider subordinate devices to
> > + * be re-initialized. PCIe has some ways to shorten this,
> > + * but we don't make use of them yet.
> > + */
> > + ssleep(1);
> Can't the bus speed be determined from (config space) status bits, so
> that this time can be minimized, esp. on modern PCIe buses/links?
> There aren't many 33MHz legacy PCI buses for which this type of
> timing is wanted, or on which this reset will be done (for device
> assignment/vfio). :-/

Just like CRS, is it worth it? The PCIe spec seems to indicate a 1s
Trhfa regardless of bus speed. Even if it didn't, we'd need to walk
down through all the subordinate buses to find a least common
denominator. It seems sufficiently complicated to save it for a later
optimization. Thanks,

Alex
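
The 1s figure in the patch description comes from dividing the Trhfa cycle count by the bus clock. A minimal sketch of that arithmetic follows; the helper name trhfa_seconds() is hypothetical, and the clock rates a caller passes in (for example a nominal 33MHz for conventional PCI and 66MHz for PCI-X) are assumptions, since the thread only states the 33MHz case explicitly.

```c
#include <stdint.h>

/* Recovery time in seconds for a given number of clock cycles at clock_hz. */
static double trhfa_seconds(uint64_t cycles, double clock_hz)
{
	return (double)cycles / clock_hz;
}
```

Calling trhfa_seconds(1ULL << 25, 33e6) or trhfa_seconds(1ULL << 26, 66e6) gives a value slightly above one second in both cases (about 1.017s), so the fixed ssleep(1) is a round-number approximation of that bound rather than an exact match.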

> > }
> > EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);
> >
> >
>


2013-08-01 21:55:43

by Donald Dutile

Subject: Re: [PATCH v3 8/9] pci: Tune secondary bus reset timing

On 08/01/2013 05:41 PM, Alex Williamson wrote:
> On Thu, 2013-08-01 at 17:29 -0400, Don Dutile wrote:
>> On 08/01/2013 12:55 PM, Alex Williamson wrote:
>>> The PCI spec indicates that with stable power, reset needs to be
>>> asserted for a minimum of 1ms (Trst). Seems like we should be able
>>> to assume power is stable for a runtime secondary bus reset. The
>>> current code has always used 100ms with no explanation where that
>>> came from. The aer_do_secondary_bus_reset() function uses 2ms, but
>>> that seems to be a misinterpretation of the PCIe spec, where hot
>>> reset is implemented by TS1 ordered sets containing the hot reset
>>> command. After a 2ms delay the state machine enters the detect state,
>>> but to generate a link down, only two consecutive TS1 hot reset
>>> ordered sets are required. 1ms should be plenty for that.
>>>
>>> After reset is de-asserted we must wait for devices to complete
>>> initialization. The specs refer to this as "recovery time" (Trhfa).
>>> For PCI this is 2^25 clock cycles or 2^26 for PCI-X. For minimum
>>> bus speeds, both of those come to 1s. PCIe "softens" this
>>> requirement with the Configuration Request Retry Status (CRS)
>>> completion status. Theoretically we could use CRS to shorten the
>>> wait time. We don't make use of that here, using a fixed 1s delay
>>> to allow devices to re-initialize.
>> Unfortunately, I don't think CRS is widely enough supported to make it
>> worth the additional checking & use, atm.
>>
>>>
>>> Signed-off-by: Alex Williamson<[email protected]>
>>> ---
>>> drivers/pci/pci.c | 15 +++++++++++++--
>>> 1 file changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
>>> index 3e71887..a5c6a9b 100644
>>> --- a/drivers/pci/pci.c
>>> +++ b/drivers/pci/pci.c
>>> @@ -3291,11 +3291,22 @@ void pci_reset_bridge_secondary_bus(struct pci_dev *dev)
>>> pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &ctrl);
>>> ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
>>> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>>> - msleep(100);
>>> + /*
>>> + * PCI spec v3.0 7.6.4.2 requires minimum Trst of 1ms.
>>> + */
>>> + msleep(1);
>>>
>>> ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>>> pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
>>> - msleep(100);
>>> +
>>> + /*
>>> + * Trhfa for conventional PCI is 2^25 clock cycles.
>>> + * Assuming a minimum 33MHz clock this results in a 1s
>>> + * delay before we can consider subordinate devices to
>>> + * be re-initialized. PCIe has some ways to shorten this,
>>> + * but we don't make use of them yet.
>>> + */
>>> + ssleep(1);
>> Can't the bus speed be determined from (config space) status bits, so
>> that this time can be minimized, esp. on modern PCIe buses/links?
>> There aren't many 33MHz legacy PCI buses for which this type of
>> timing is wanted, or on which this reset will be done (for device
>> assignment/vfio). :-/
>
> Just like CRS, is it worth it? The PCIe spec seems to indicate a 1s
> Trhfa regardless of bus speed. Even if it didn't, we'd need to walk
> down through all the subordinate buses to find a least common
> denominator. It seems sufficiently complicated to save it for a later
> optimization. Thanks,
>
> Alex
>
ya sure.. I figured you had the spec memorized w/all these reset patches
you've worked on! ;-)

>>> }
>>> EXPORT_SYMBOL_GPL(pci_reset_bridge_secondary_bus);
>>>
>>>
>>
>
>
>