According to the old PCIe spec, PDS would always have to come up with
or before DLLLA. For various reasons, not all OEMs followed this
requirement. As a result, in PCIe 4.0(*), there is a new way to disable
in-band presence reporting, such that PDS always reports the status of
the out-of-band presence (if implemented).
For us, this means it is legal to get a Card Present event after the
link is active, and the driver was loaded. This causes an erroneous
removal of the device driver, followed by immediate re-probing.
In a fully PCIe-compliant world, we could leave inband presence
enabled and not require any software changes. However, certain OEMs
simply disable inband presence without implementing the requisite
query and control bits. I present two solutions to resolve this.
The first and last patch in the series are required. Of the remaining
two, only one is required to complete the series. Since we don't yet have
a go/no-go on which method to use, both solutions are included:
- 2/4: Try to wait for PDS _before_ loading the driver
- 3/4: Load as usual, and recognize the delayed PDS event as such
(*) ECN was approved in Nov 2018, and is normative spec text. A lot of
the leaked PCIe 4.0 specs do not have this change.
Changes since v2:
* Dropped [RFC] from title
* Addressed style issue found by Lukas
Alexandru Gagniuc (4):
PCI: hotplug: Add support for disabling in-band presence
PCI: pciehp: Do not turn off slot if presence comes up after link
PCI: hotplug: Wait for PDS when in-band presence is disabled
PCI: hotplug: Add quirk For Dell nvme pcie switches
drivers/pci/hotplug/pciehp.h | 1 +
drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++
drivers/pci/hotplug/pciehp_hpc.c | 52 ++++++++++++++++++++++++++++++-
include/linux/pci.h | 1 +
include/uapi/linux/pci_regs.h | 2 ++
5 files changed, 79 insertions(+), 1 deletion(-)
--
2.20.1
The presence detect state (PDS) is normally a logical or of in-band
and out-of-band presence. In PCIe 4.0, there is the option to disable
in-band presence so that the PDS bit always reflects the state of the
out-of-band presence.
The recommendation of the PCIe spec is to disable in-band presence
whenever supported.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/hotplug/pciehp.h | 1 +
drivers/pci/hotplug/pciehp_hpc.c | 9 ++++++++-
include/uapi/linux/pci_regs.h | 2 ++
3 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 506e1d923a1f..6f729ce4a7b9 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -98,6 +98,7 @@ struct controller {
struct pcie_device *pcie;
u32 slot_cap; /* capabilities and quirks */
+ unsigned int inband_presence_disabled:1;
u16 slot_ctrl; /* control register access */
struct mutex ctrl_lock;
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6a2365cd794e..078d78a7437d 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -837,7 +837,7 @@ static inline void dbg_ctrl(struct controller *ctrl)
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
- u32 slot_cap, link_cap;
+ u32 slot_cap, slot_cap2, link_cap;
u8 poweron;
struct pci_dev *pdev = dev->port;
struct pci_bus *subordinate = pdev->subordinate;
@@ -895,6 +895,13 @@ struct controller *pcie_init(struct pcie_device *dev)
FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
+ pcie_capability_read_dword(pdev, PCI_EXP_SLTCAP2, &slot_cap2);
+ if (slot_cap2 & PCI_EXP_SLTCAP2_IBPD) {
+ pcie_write_cmd_nowait(ctrl, PCI_EXP_SLTCTL_IBPD_DISABLE,
+ PCI_EXP_SLTCTL_IBPD_DISABLE);
+ ctrl->inband_presence_disabled = 1;
+ }
+
/*
* If empty slot's power status is on, turn power off. The IRQ isn't
* requested yet, so avoid triggering a notification with this command.
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 5c98133f2c94..c7afdc4c098c 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -597,6 +597,7 @@
#define PCI_EXP_SLTCTL_PWR_OFF 0x0400 /* Power Off */
#define PCI_EXP_SLTCTL_EIC 0x0800 /* Electromechanical Interlock Control */
#define PCI_EXP_SLTCTL_DLLSCE 0x1000 /* Data Link Layer State Changed Enable */
+#define PCI_EXP_SLTCTL_IBPD_DISABLE 0x4000 /* In-band PD disable */
#define PCI_EXP_SLTSTA 26 /* Slot Status */
#define PCI_EXP_SLTSTA_ABP 0x0001 /* Attention Button Pressed */
#define PCI_EXP_SLTSTA_PFD 0x0002 /* Power Fault Detected */
@@ -667,6 +668,7 @@
#define PCI_EXP_LNKSTA2 50 /* Link Status 2 */
#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */
#define PCI_EXP_SLTCAP2 52 /* Slot Capabilities 2 */
+#define PCI_EXP_SLTCAP2_IBPD 0x0001 /* In-band PD Disable Supported */
#define PCI_EXP_SLTCTL2 56 /* Slot Control 2 */
#define PCI_EXP_SLTSTA2 58 /* Slot Status 2 */
--
2.20.1
According to PCIe 3.0, the presence detect state is a logical OR of
in-band and out-of-band presence. With this, we'd expect the presence
state to always be asserted when the link comes up.
Not all hardware follows this, and it is possible for the presence to
come up after the link. In this case, the PCIe device would be
erroneously disabled and re-probed. It is possible to distinguish
between a delayed presence and a card swap by looking at the DLL state
changed bit -- The link has to come down if the card is removed.
Thus, for a device that is probed, present and has its link active, a
lack of a link state change event guarantees we have the same device,
and shutdown is not needed.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/hotplug/pciehp_ctrl.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c b/drivers/pci/hotplug/pciehp_ctrl.c
index 905282a8ddaa..d46724f0b4ce 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -217,6 +217,22 @@ void pciehp_handle_disable_request(struct controller *ctrl)
ctrl->request_result = pciehp_disable_slot(ctrl, SAFE_REMOVAL);
}
+static bool is_delayed_presence_up_event(struct controller *ctrl, u32 events)
+{
+ bool present, link_active;
+
+ if (!ctrl->inband_presence_disabled)
+ return false;
+
+ present = pciehp_card_present(ctrl);
+ link_active = pciehp_check_link_active(ctrl);
+
+ if (!present || !link_active || events & PCI_EXP_SLTSTA_DLLSC)
+ return false;
+
+ return true;
+}
+
void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
{
bool present, link_active;
@@ -224,6 +240,9 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
/*
* If the slot is on and presence or link has changed, turn it off.
* Even if it's occupied again, we cannot assume the card is the same.
+ *
+ * An exception is a delayed "Card present" after a "Link Up". This can
+ * happen on controllers with in-band presence disabled.
*/
mutex_lock(&ctrl->state_lock);
switch (ctrl->state) {
@@ -231,6 +250,11 @@ void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
cancel_delayed_work(&ctrl->button_work);
/* fall through */
case ON_STATE:
+ if (is_delayed_presence_up_event(ctrl, events)) {
+ mutex_unlock(&ctrl->state_lock);
+ ctrl_dbg(ctrl, "Presence state came up after link");
+ return;
+ }
ctrl->state = POWEROFF_STATE;
mutex_unlock(&ctrl->state_lock);
if (events & PCI_EXP_SLTSTA_DLLSC)
--
2.20.1
These switches are used to fornicate the motherboard's x16 PCIe ports
into four x4 ports for NVMe drives. In conjunction with the storage
backplane, the PDS bit reports only the out-of-band presence. The fact
that inband presence is disabled is not reported in the slot
capabilities 2 (SLTCAP2) register.
Because this does not conform to the PCIe spec, add a quirk to let
hotplug code know to expect and handle this.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/hotplug/pciehp_hpc.c | 21 +++++++++++++++++++++
include/linux/pci.h | 1 +
2 files changed, 22 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6cd2c4fb4edb..681043f59fa5 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -881,6 +881,9 @@ struct controller *pcie_init(struct pcie_device *dev)
if (pdev->is_thunderbolt)
slot_cap |= PCI_EXP_SLTCAP_NCCS;
+ if (pdev->no_in_band_presence)
+ ctrl->inband_presence_disabled = 1;
+
ctrl->slot_cap = slot_cap;
mutex_init(&ctrl->ctrl_lock);
mutex_init(&ctrl->state_lock);
@@ -964,3 +967,21 @@ DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_QCOM, 0x0401,
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
DECLARE_PCI_FIXUP_CLASS_EARLY(PCI_VENDOR_ID_HXT, 0x0401,
PCI_CLASS_BRIDGE_PCI, 8, quirk_cmd_compl);
+
+static void fixup_dell_nvme_backplane_switches(struct pci_dev *pdev)
+{
+ if (!pci_is_pcie(pdev))
+ return;
+
+ if (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM)
+ return;
+
+ if (pdev->subsystem_vendor != PCI_VENDOR_ID_DELL ||
+ pdev->subsystem_device != 0x1fc7)
+ return;
+
+ pdev->no_in_band_presence = 1;
+}
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_PLX, 0x9733,
+ PCI_CLASS_BRIDGE_PCI, 8,
+ fixup_dell_nvme_backplane_switches);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 77448215ef5b..9594a313064d 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -416,6 +416,7 @@ struct pci_dev {
unsigned int non_compliant_bars:1; /* Broken BARs; ignore them */
unsigned int is_probed:1; /* Device probing in progress */
unsigned int link_active_reporting:1;/* Device capable of reporting link active */
+ unsigned int no_in_band_presence:1; /* Device does not report in-band presence */
unsigned int no_vf_scan:1; /* Don't scan for VFs after IOV enablement */
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
--
2.20.1
When inband presence is disabled, PDS may come up at any time, or not
at all. PDS being low may indicate that the card is still mating, and
we could expect contact bounce to bring down the link as well.
It is reasonable to assume that most cards will mate in a hotplug slot
in less than a second. Thus, when we know PDS only reflects
out-of-band presence, it's worthwhile to wait the extra second and
make sure the card is properly mated before loading the driver.
Signed-off-by: Alexandru Gagniuc <[email protected]>
---
drivers/pci/hotplug/pciehp_hpc.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 078d78a7437d..6cd2c4fb4edb 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -240,6 +240,25 @@ static bool pci_bus_check_dev(struct pci_bus *bus, int devfn)
return found;
}
+static void pcie_wait_for_presence(struct pci_dev *pdev)
+{
+ int timeout = 1000;
+ bool pds;
+ u16 slot_status;
+
+ while(true) {
+ pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &slot_status);
+ pds = !!(slot_status & PCI_EXP_SLTSTA_PDS);
+ if (pds || timeout <= 0)
+ break;
+ msleep(10);
+ timeout -= 10;
+ }
+
+ if (!pds)
+ pci_info(pdev, "Presence Detect state not set in 1000 msec\n");
+}
+
int pciehp_check_link_status(struct controller *ctrl)
{
struct pci_dev *pdev = ctrl_dev(ctrl);
@@ -249,6 +268,9 @@ int pciehp_check_link_status(struct controller *ctrl)
if (!pcie_wait_for_link(pdev, true))
return -1;
+ if (ctrl->inband_presence_disabled)
+ pcie_wait_for_presence(pdev);
+
found = pci_bus_check_dev(ctrl->pcie->port->subordinate,
PCI_DEVFN(0, 0));
--
2.20.1
Alexandru Gagniuc <[email protected]> writes:
> These switches are used to fornicate the motherboard's x16 PCIe ports
^^^^^^^^^
I don't think that's the word you intended to use.
--
Alan J. Wylie https://www.wylie.me.uk/
Dance like no-one's watching. / Encrypt like everyone is.
Security is inversely proportional to convenience