2023-02-05 15:49:04

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures

Hi,

This is v6 of the change to work around a PCIe link training phenomenon
where a pair of devices both capable of operating at a link speed above
2.5GT/s seems unable to negotiate the link speed and continues training
indefinitely with the Link Training bit switching on and off repeatedly
and the data link layer never reaching the active state.

Following Bjorn's suggestion from the previous iteration:
<https://lore.kernel.org/lkml/20221109050418.GA529724@bhelgaas/> I have
moved the workaround into the PCI core. I have kept the part specific to
ASMedia (to lift the speed restriction after a successful retrain) within,
although I find it a good candidate for a standalone quirk. It seems to
me we'd have to add additional classes of fixups however to move this part
to drivers/pci/quirks.c, which I think would be an overkill. So I've only
made it explicitly guarded by CONFIG_PCI_QUIRKS; I can see there's prior
art with this approach.

In the course of the update I have realised that commit 6b2f1351af56
("PCI: Wait for device to become ready after secondary bus reset") makes
no sense and was about to figure out what to do here about it, but then
found Lukas's recent patch series addressing this issue (thanks, Lukas,
you made my life easier!), so I have rebased my patch set on top of
Lukas's:
<https://lore.kernel.org/all/da77c92796b99ec568bd070cbe4725074a117038.1673769517.git.lukas@wunner.de/>.

This has resulted in mild ugliness in that `pcie_downstream_link_retrain'
may be called from `pci_bridge_wait_for_secondary_bus' twice, first time
via `pcie_wait_for_link_delay' and second time via `pci_dev_wait'. This
second call to `pcie_downstream_link_retrain' will do nothing, because for
`link_active_reporting' devices `pcie_wait_for_link_delay' will have
ensured the link has gone up or the second call won't have been reached.

I have also decided to move the initialisation of `link_active_reporting'
earlier on, so as to have a single way to check for the feature. This has
brought an extra patch and its 3 clean-up dependencies into the series.

This was originally observed in a configuration featuring a downstream
port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the
Pericom PI7C9X2G304 Gen 2 switch. However in the course of review I have
come to the conclusion that similarly to the earlier similar change to
U-Boot it is indeed expected to be safe to apply this workaround to any
downstream port that has failed link negotiation provided that:

1. the port is capable of reporting the data link layer link active
status (because unlike U-Boot we cannot busy-loop continuously polling
the link training bit),

and:

2. we don't attempt to lift the 2.5GT/s speed restriction, imposed as the
basis of the workaround, for devices not explicitly known to continue
working in that case.

It is expected to be safe because the workaround is applied to a failed
link, that is one that does not (at the time this code is executed) work
anyway, so trying to bring it up cannot make the situation worse.

This has been verified with a SiFive HiFive unmatched board, with and
without CONFIG_PCI_QUIRKS enabled, booting with or without the workaround
activated in U-Boot, which covered both the link retraining part of the
quirk and the lifting of speed restriction already imposed by U-Boot.

I have also issued resets via sysfs to see how this change behaves. For
the problematic link this required a hack to remove a `dev->subordinate'
check from `pci_parent_bus_reset', which in turn triggered the workaround
as expected and brought the link up (but otherwise clobbered downstream
devices as one would inevitably expect).

I have no way to verify these patches with power management or hot-plug
events, but owing to Lukas's effort they get into the same infrastructure,
so I expect the workaround to do its job as expected. I note that there
is an extra call to `pcie_wait_for_link' from `pciehp_check_link_status',
but I expect it to work too. For `link_active_reporting' devices it will
call `pcie_downstream_link_retrain' and for`!link_active_reporting' ones
we have no means to do anything anyway.

The 3 extra clean-ups were only compile-tested (with PowerPC and x86-64
configurations, as appropriate), because I have no suitable hardware
available.

Please see individual change descriptions for further details.

Let me know if this is going in the right direction.

Maciej


2023-02-05 15:49:07

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 1/7] PCI: Export PCI link retrain timeout

Rename LINK_RETRAIN_TIMEOUT to PCIE_LINK_RETRAIN_TIMEOUT and make it
available via "pci.h" for PCI drivers to use.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
No change from v5.

New change in v5.
---
drivers/pci/pci.h | 2 ++
drivers/pci/pcie/aspm.c | 4 +---
2 files changed, 3 insertions(+), 3 deletions(-)

linux-pcie-link-retrain-timeout.diff
Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -11,6 +11,8 @@

#define PCI_VSEC_ID_INTEL_TBT 0x1234 /* Thunderbolt */

+#define PCIE_LINK_RETRAIN_TIMEOUT HZ
+
extern const unsigned char pcie_link_speed[];
extern bool pci_early_dump;

Index: linux-macro/drivers/pci/pcie/aspm.c
===================================================================
--- linux-macro.orig/drivers/pci/pcie/aspm.c
+++ linux-macro/drivers/pci/pcie/aspm.c
@@ -90,8 +90,6 @@ static const char *policy_str[] = {
[POLICY_POWER_SUPERSAVE] = "powersupersave"
};

-#define LINK_RETRAIN_TIMEOUT HZ
-
/*
* The L1 PM substate capability is only implemented in function 0 in a
* multi function device.
@@ -213,7 +211,7 @@ static bool pcie_retrain_link(struct pci
}

/* Wait for link training end. Break out after waiting for timeout */
- end_jiffies = jiffies + LINK_RETRAIN_TIMEOUT;
+ end_jiffies = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
do {
pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
if (!(reg16 & PCI_EXP_LNKSTA_LT))

2023-02-05 15:49:13

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 2/7] PCI: Execute `quirk_enable_clear_retrain_link' earlier

Make `quirk_enable_clear_retrain_link' `pci_fixup_early' so that any later
fixups can rely on `clear_retrain_link' to have been already initialised.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
No change from v5.

New change in v5.
---
drivers/pci/quirks.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

linux-pcie-clear-retrain-link-early.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -2407,9 +2407,9 @@ static void quirk_enable_clear_retrain_l
dev->clear_retrain_link = 1;
pci_info(dev, "Enable PCIe Retrain Link quirk\n");
}
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe110, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe111, quirk_enable_clear_retrain_link);
+DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_PERICOM, 0xe130, quirk_enable_clear_retrain_link);

static void fixup_rev1_53c810(struct pci_dev *dev)
{

2023-02-05 15:49:23

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 3/7] PCI: Initialize `link_active_reporting' earlier

Determine whether Data Link Layer Link Active Reporting is available
ahead of calling any fixups so that the cached value can be used there
and later on.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
New change in v6.
---
drivers/pci/probe.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

linux-pcie-link-active-reporting-early.diff
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -819,7 +819,6 @@ static void pci_set_bus_speed(struct pci

pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
- bridge->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);

pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
pcie_update_link_speed(bus, linksta);
@@ -1828,6 +1827,7 @@ int pci_setup_device(struct pci_dev *dev
int pos = 0;
struct pci_bus_region region;
struct resource *res;
+ u32 linkcap;

hdr_type = pci_hdr_type(dev);

@@ -1873,6 +1873,10 @@ int pci_setup_device(struct pci_dev *dev
/* "Unknown power state" */
dev->current_state = PCI_UNKNOWN;

+ /* Set it early to make it available to fixups, etc. */
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+ dev->link_active_reporting = !!(linkcap & PCI_EXP_LNKCAP_DLLLARC);
+
/* Early fixups, before probing the BARs */
pci_fixup_device(pci_fixup_early, dev);


2023-02-05 15:49:43

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 4/7] powerpc/eeh: Rely on `link_active_reporting'

Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
NB this has been compile-tested only with a PPC64LE configuration.

New change in v6.
---
arch/powerpc/kernel/eeh_pe.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

linux-pcie-link-active-reporting-eeh.diff
Index: linux-macro/arch/powerpc/kernel/eeh_pe.c
===================================================================
--- linux-macro.orig/arch/powerpc/kernel/eeh_pe.c
+++ linux-macro/arch/powerpc/kernel/eeh_pe.c
@@ -671,9 +671,8 @@ static void eeh_bridge_check_link(struct
eeh_ops->write_config(edev, cap + PCI_EXP_LNKCTL, 2, val);

/* Check link */
- eeh_ops->read_config(edev, cap + PCI_EXP_LNKCAP, 4, &val);
- if (!(val & PCI_EXP_LNKCAP_DLLLARC)) {
- eeh_edev_dbg(edev, "No link reporting capability (0x%08x) \n", val);
+ if (!edev->pdev->link_active_reporting) {
+ eeh_edev_dbg(edev, "No link reporting capability\n");
msleep(1000);
return;
}

2023-02-05 15:49:55

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 5/7] net/mlx5: Rely on `link_active_reporting'

Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.

New change in v6.
---
drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

linux-pcie-link-active-reporting-mlx5.diff
Index: linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
===================================================================
--- linux-macro.orig/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ linux-macro/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -294,7 +294,6 @@ static int mlx5_pci_link_toggle(struct m
unsigned long timeout;
struct pci_dev *sdev;
int cap, err;
- u32 reg32;

/* Check that all functions under the pci bridge are PFs of
* this device otherwise fail this function.
@@ -333,11 +332,8 @@ static int mlx5_pci_link_toggle(struct m
return err;

/* Check link */
- err = pci_read_config_dword(bridge, cap + PCI_EXP_LNKCAP, &reg32);
- if (err)
- return err;
- if (!(reg32 & PCI_EXP_LNKCAP_DLLLARC)) {
- mlx5_core_warn(dev, "No PCI link reporting capability (0x%08x)\n", reg32);
+ if (!bridge->link_active_reporting) {
+ mlx5_core_warn(dev, "No PCI link reporting capability\n");
msleep(1000);
goto restore;
}

2023-02-05 15:50:01

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'

Use `link_active_reporting' to determine whether Data Link Layer Link
Active Reporting is available rather than re-retrieving the capability.

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
NB this has been compile-tested only with PPC64LE and x86-64
configurations.

New change in v6.
---
drivers/pci/hotplug/pciehp_hpc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

linux-pcie-link-active-reporting-hpc.diff
Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
===================================================================
--- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
+++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
@@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
struct controller *pcie_init(struct pcie_device *dev)
{
struct controller *ctrl;
- u32 slot_cap, slot_cap2, link_cap;
+ u32 slot_cap, slot_cap2;
u8 poweron;
struct pci_dev *pdev = dev->port;
struct pci_bus *subordinate = pdev->subordinate;
@@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
if (dmi_first_match(inband_presence_disabled_dmi_table))
ctrl->inband_presence_disabled = 1;

- /* Check if Data Link Layer Link Active Reporting is implemented */
- pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
-
/* Clear all remaining event bits in Slot Status register. */
pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
@@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
- FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
+ FLAG(pdev->link_active_reporting, true),
pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");

/*

2023-02-05 15:54:54

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v6 7/7] PCI: Work around PCIe link training failures

Attempt to handle cases such as with a downstream port of the ASMedia
ASM2824 PCIe switch where link training never completes and the link
continues switching between speeds indefinitely with the data link layer
never reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.

Instead the link continues oscillating between the two speeds, at the
rate of 34-35 times per second, with link training reported repeatedly
active ~84% of the time. Forcibly limiting the target link speed to
2.5GT/s with the upstream ASM2824 device however makes the two switches
communicate correctly. Removing the speed restriction afterwards makes
the two devices switch to 5.0GT/s then.

Make use of these observations then and detect the inability to train
the link, by checking for the Data Link Layer Link Active status bit
being off while the Link Bandwidth Management Status indicating that
hardware has changed the link speed or width in an attempt to correct
unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this
turns out successful, then lift the restriction, letting the devices
negotiate a higher speed.

Also check for a 2.5GT/s speed restriction the firmware may have already
arranged and lift it too with ports of devices known to continue working
afterwards, currently the ASM2824 only, that already report their data
link being up.

Signed-off-by: Maciej W. Rozycki <[email protected]>
Link: https://lore.kernel.org/lkml/[email protected]/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
---
Changes from v5:

- Move from a quirk into PCI core and call at device probing, hot-plug,
reset and resume. Keep the ASMedia part under CONFIG_PCI_QUIRKS.

- Rely on `dev->link_active_reporting' rather than re-retrieving the
capability.

Changes from v4:

- Remove <linux/bug.h> inclusion no longer needed.

- Make the quirk generic based on probing device features rather than
specific to the ASM2824 part only; take the Retrain Link bit erratum
into account.

- Still lift the 2.5GT/s speed restriction with the ASM2824 only.

- Increase retrain timeout from 200ms to 1s (PCIE_LINK_RETRAIN_TIMEOUT).

- Remove retrain success notification.

- Use PCIe helpers rather than generic PCI functions throughout.

- Trim down and update the wording of the change description for the
switch from an ASM2824-specific to a generic fixup.

Changes from v3:

- Remove the <linux/pci_ids.h> entry for the ASM2824.

Changes from v2:

- Regenerate for 5.17-rc2 for a merge conflict.

- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
early return.

Changes from v1:

- Regenerate for a merge conflict.
---
drivers/pci/pci.c | 154 ++++++++++++++++++++++++++++++++++++++++++++++++++--
drivers/pci/pci.h | 1
drivers/pci/probe.c | 2
3 files changed, 152 insertions(+), 5 deletions(-)

linux-pcie-asm2824-manual-retrain.diff
Index: linux-macro/drivers/pci/pci.c
===================================================================
--- linux-macro.orig/drivers/pci/pci.c
+++ linux-macro/drivers/pci/pci.c
@@ -859,6 +859,132 @@ int pci_wait_for_pending(struct pci_dev
return 0;
}

+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time. Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however. And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation. For a port that has been left unconnected
+ * both bits will be clear. So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Since the exact nature of the problem isn't known and in principle this
+ * could trigger where an ASM2824 device is downstream rather upstream,
+ * apply this erratum workaround to any downstream ports as long as they
+ * support Link Active reporting and have the Link Control 2 register.
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.
+ *
+ * If this turns out successful and we know by the Vendor:Device ID it is
+ * safe to do so, then lift the restriction, letting the devices negotiate
+ * a higher speed. Also check for a similar 2.5GT/s speed restriction the
+ * firmware may have already arranged and lift it with ports that already
+ * report their data link being up.
+ *
+ * Return 0 if the link has been successfully retrained, otherwise -1.
+ */
+int pcie_downstream_link_retrain(struct pci_dev *dev)
+{
+ static const struct pci_device_id ids[] = {
+ { PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
+ {}
+ };
+ u16 lnksta, lnkctl2;
+
+ if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+ !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+ return -1;
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+ PCI_EXP_LNKSTA_LBMS) {
+ unsigned long timeout;
+ u16 lnkctl;
+
+ pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
+
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+ /*
+ * Due to an erratum in some devices the Retrain Link bit
+ * needs to be cleared again manually to allow the link
+ * training to succeed.
+ */
+ lnkctl &= ~PCI_EXP_LNKCTL_RL;
+ if (dev->clear_retrain_link)
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL,
+ lnkctl);
+
+ timeout = jiffies + PCIE_LINK_RETRAIN_TIMEOUT;
+ do {
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA,
+ &lnksta);
+ if (lnksta & PCI_EXP_LNKSTA_DLLLA)
+ break;
+ usleep_range(10000, 20000);
+ } while (time_before(jiffies, timeout));
+
+ if (!(lnksta & PCI_EXP_LNKSTA_DLLLA)) {
+ pci_info(dev, "retraining failed\n");
+ return -1;
+ }
+ }
+
+ if (IS_ENABLED(CONFIG_PCI_QUIRKS) && (lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+ (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+ pci_match_id(ids, dev)) {
+ u32 lnkcap;
+ u16 lnkctl;
+
+ pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+ pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+ pcie_capability_read_word(dev, PCI_EXP_LNKCTL, &lnkctl);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+ pcie_capability_write_word(dev, PCI_EXP_LNKCTL, lnkctl);
+ }
+
+ return 0;
+}
+
+/* Same as above, but called for a downstream device. */
+static int pcie_upstream_link_retrain(struct pci_dev *dev)
+{
+ struct pci_dev *bridge;
+
+ bridge = pci_upstream_bridge(dev);
+ if (bridge)
+ return pcie_downstream_link_retrain(bridge);
+ else
+ return -1;
+}
+
static int pci_acs_enable;

/**
@@ -1148,8 +1274,8 @@ void pci_resume_bus(struct pci_bus *bus)

static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout)
{
+ int retrain = 0;
int delay = 1;
- u32 id;

/*
* After reset, the device should not silently discard config
@@ -1163,21 +1289,37 @@ static int pci_dev_wait(struct pci_dev *
* Command register instead of Vendor ID so we don't have to
* contend with the CRS SV value.
*/
- pci_read_config_dword(dev, PCI_COMMAND, &id);
- while (PCI_POSSIBLE_ERROR(id)) {
+ for (;;) {
+ u32 id;
+
+ pci_read_config_dword(dev, PCI_COMMAND, &id);
+ if (!PCI_POSSIBLE_ERROR(id)) {
+ if (delay > PCI_RESET_WAIT)
+ pci_info(dev, "ready %dms after %s\n",
+ delay - 1, reset_type);
+ break;
+ }
+
if (delay > timeout) {
pci_warn(dev, "not ready %dms after %s; giving up\n",
delay - 1, reset_type);
return -ENOTTY;
}

- if (delay > PCI_RESET_WAIT)
+ if (delay > PCI_RESET_WAIT) {
+ if (!retrain) {
+ retrain = 1;
+ if (pcie_upstream_link_retrain(dev) == 0) {
+ delay = 1;
+ continue;
+ }
+ }
pci_info(dev, "not ready %dms after %s; waiting\n",
delay - 1, reset_type);
+ }

msleep(delay);
delay *= 2;
- pci_read_config_dword(dev, PCI_COMMAND, &id);
}

if (delay > PCI_RESET_WAIT)
@@ -4901,6 +5043,8 @@ static bool pcie_wait_for_link_delay(str
msleep(10);
timeout -= 10;
}
+ if (active && !ret)
+ ret = pcie_downstream_link_retrain(pdev) == 0;
if (active && ret)
msleep(delay);

Index: linux-macro/drivers/pci/pci.h
===================================================================
--- linux-macro.orig/drivers/pci/pci.h
+++ linux-macro/drivers/pci/pci.h
@@ -37,6 +37,7 @@ int pci_mmap_fits(struct pci_dev *pdev,
enum pci_mmap_api mmap_api);

bool pci_reset_supported(struct pci_dev *dev);
+int pcie_downstream_link_retrain(struct pci_dev *dev);
void pci_init_reset_methods(struct pci_dev *dev);
int pci_bridge_secondary_bus_reset(struct pci_dev *dev);
int pci_bus_error_reset(struct pci_dev *dev);
Index: linux-macro/drivers/pci/probe.c
===================================================================
--- linux-macro.orig/drivers/pci/probe.c
+++ linux-macro/drivers/pci/probe.c
@@ -2546,6 +2546,8 @@ void pci_device_add(struct pci_dev *dev,
dma_set_max_seg_size(&dev->dev, 65536);
dma_set_seg_boundary(&dev->dev, 0xffffffff);

+ pcie_downstream_link_retrain(dev);
+
/* Fix up broken headers */
pci_fixup_device(pci_fixup_header, dev);


2023-02-13 13:53:42

by Lukas Wunner

[permalink] [raw]
Subject: Re: [PATCH v6 6/7] PCI: pciehp: Rely on `link_active_reporting'

On Sun, Feb 05, 2023 at 03:49:21PM +0000, Maciej W. Rozycki wrote:
> Use `link_active_reporting' to determine whether Data Link Layer Link
> Active Reporting is available rather than re-retrieving the capability.
>
> Signed-off-by: Maciej W. Rozycki <[email protected]>

Reviewed-by: Lukas Wunner <[email protected]>

I believe this should work without the preceding patches in the series,
hence can be applied independently.

Thanks,

Lukas

> drivers/pci/hotplug/pciehp_hpc.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> linux-pcie-link-active-reporting-hpc.diff
> Index: linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/hotplug/pciehp_hpc.c
> +++ linux-macro/drivers/pci/hotplug/pciehp_hpc.c
> @@ -984,7 +984,7 @@ static inline int pcie_hotplug_depth(str
> struct controller *pcie_init(struct pcie_device *dev)
> {
> struct controller *ctrl;
> - u32 slot_cap, slot_cap2, link_cap;
> + u32 slot_cap, slot_cap2;
> u8 poweron;
> struct pci_dev *pdev = dev->port;
> struct pci_bus *subordinate = pdev->subordinate;
> @@ -1030,9 +1030,6 @@ struct controller *pcie_init(struct pcie
> if (dmi_first_match(inband_presence_disabled_dmi_table))
> ctrl->inband_presence_disabled = 1;
>
> - /* Check if Data Link Layer Link Active Reporting is implemented */
> - pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &link_cap);
> -
> /* Clear all remaining event bits in Slot Status register. */
> pcie_capability_write_word(pdev, PCI_EXP_SLTSTA,
> PCI_EXP_SLTSTA_ABP | PCI_EXP_SLTSTA_PFD |
> @@ -1051,7 +1048,7 @@ struct controller *pcie_init(struct pcie
> FLAG(slot_cap, PCI_EXP_SLTCAP_EIP),
> FLAG(slot_cap, PCI_EXP_SLTCAP_NCCS),
> FLAG(slot_cap2, PCI_EXP_SLTCAP2_IBPD),
> - FLAG(link_cap, PCI_EXP_LNKCAP_DLLLARC),
> + FLAG(pdev->link_active_reporting, true),
> pdev->broken_cmd_compl ? " (with Cmd Compl erratum)" : "");
>
> /*

2023-02-19 18:53:08

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PING][PATCH v6 0/7] pci: Work around ASMedia ASM2824 PCIe link training failures

On Sun, 5 Feb 2023, Maciej W. Rozycki wrote:

> This is v6 of the change to work around a PCIe link training phenomenon
> where a pair of devices both capable of operating at a link speed above
> 2.5GT/s seems unable to negotiate the link speed and continues training
> indefinitely with the Link Training bit switching on and off repeatedly
> and the data link layer never reaching the active state.

Ping for:
<https://lore.kernel.org/linux-pci/[email protected]/>.

Maciej