Attempt to handle cases with a downstream port of the ASMedia ASM2824
PCIe switch where link training never completes and the link continues
switching between speeds indefinitely with the data link layer never
reaching the active state.
It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
switches are supposed to negotiate the link speed of preferably 5.0GT/s,
falling back to 2.5GT/s.
However the link continues oscillating between the two speeds, at the
rate of 34-35 times per second, with link training reported repeatedly
active ~84% of the time, e.g.:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
Forcibly limiting the target link speed to 2.5GT/s with the upstream
ASM2824 device makes the two switches communicate correctly however:
02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
[...]
Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
[...]
Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
and then:
05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
[...]
Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
[...]
Capabilities: [c0] Express (v2) Upstream Port, MSI 00
[...]
LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
[...]
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
[...]
Removing the speed restriction afterwards makes the two devices switch
to 5.0GT/s then.
Make use of these observations then and detect the inability to train
the link, by checking for the Data Link Layer Link Active status bit
implemented by the ASM2824 being off while the Link Bandwidth Management
Status indicating that hardware has changed the link speed or width in
an attempt to correct unreliable link operation.
Restrict the speed to 2.5GT/s then with the Target Link Speed field,
request a retrain and wait 200ms for the data link to go up. If this
turns out successful, then lift the restriction, letting the devices
negotiate a higher speed. Also check for a 2.5GT/s speed restriction
the firmware may have already arranged and lift it too with ports that
already report their data link being up.
Signed-off-by: Maciej W. Rozycki <[email protected]>
---
Hi,
Patch regenerated and re-verified against 5.17-rc2. On this occasion I
decided to remove BUG_ON after all for a supposedly impossible condition
of a missing PCI Express capability for this PCIe switch and instead use
WARN_ON and an early return. It should be safer this way.
NB the corresponding U-boot change has since gone in with commit
a398a51ccc68 ("pci: Work around PCIe link training failures"). See
<https://lore.kernel.org/lkml/[email protected]/>
for previous background information.
Questions, comments, concerns? Otherwise please apply.
Maciej
Changes from v2:
- Regenerate for 5.17-rc2 for a merge conflict.
- Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
early return.
Changes from v1:
- Regenerate for a merge conflict.
---
drivers/pci/quirks.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/pci_ids.h | 1
2 files changed, 99 insertions(+)
linux-pcie-asm2824-manual-retrain.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -12,6 +12,7 @@
* file, where their drivers can use them.
*/
+#include <linux/bug.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/export.h>
@@ -5879,3 +5880,100 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
+
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time. Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however. And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation. For a port that has been left unconnected
+ * both bits will be clear. So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up. If this
+ * turns out successful, then lift the restriction, letting the devices
+ * negotiate a higher speed. Also check for a 2.5GT/s speed restriction
+ * the firmware may have already arranged and lift it too with ports that
+ * already report their data link being up.
+ */
+static void pcie_downstream_link_retrain(struct pci_dev *dev)
+{
+ u16 lnksta, lnkctl2;
+ u8 pos;
+
+ pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
+ WARN_ON(!pos);
+ if (!pos)
+ return;
+
+ pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
+ pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
+ if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+ PCI_EXP_LNKSTA_LBMS) {
+ unsigned long timeout;
+ u16 lnkctl;
+
+ pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
+
+ pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
+
+ timeout = jiffies + msecs_to_jiffies(200);
+ do {
+ pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
+ &lnksta);
+ if (lnksta & PCI_EXP_LNKSTA_DLLLA)
+ break;
+ usleep_range(10000, 20000);
+ } while (time_before(jiffies, timeout));
+
+ pci_info(dev, "retraining %s!\n",
+ lnksta & PCI_EXP_LNKSTA_DLLLA ?
+ "succeeded" : "failed");
+ }
+
+ if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+ (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
+ u32 lnkcap;
+ u16 lnkctl;
+
+ pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+ pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
+ pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &lnkcap);
+ lnkctl |= PCI_EXP_LNKCTL_RL;
+ lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+ lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
+ pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASMEDIA,
+ PCI_DEVICE_ID_ASMEDIA_ASM2824,
+ pcie_downstream_link_retrain);
+DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_ASMEDIA,
+ PCI_DEVICE_ID_ASMEDIA_ASM2824,
+ pcie_downstream_link_retrain);
Index: linux-macro/include/linux/pci_ids.h
===================================================================
--- linux-macro.orig/include/linux/pci_ids.h
+++ linux-macro/include/linux/pci_ids.h
@@ -2545,6 +2545,7 @@
#define PCI_SUBDEVICE_ID_QEMU 0x1100
#define PCI_VENDOR_ID_ASMEDIA 0x1b21
+#define PCI_DEVICE_ID_ASMEDIA_ASM2824 0x2824
#define PCI_VENDOR_ID_REDHAT 0x1b36
On Tue, 1 Feb 2022, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of the ASMedia ASM2824
> PCIe switch where link training never completes and the link continues
> switching between speeds indefinitely with the data link layer never
> reaching the active state.
Ping for:
<https://lore.kernel.org/all/[email protected]/>
Maciej
On Tue, Feb 01, 2022 at 10:50:37AM +0000, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of the ASMedia ASM2824
> PCIe switch where link training never completes and the link continues
> switching between speeds indefinitely with the data link layer never
> reaching the active state.
>
> It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
> switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
> switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
> P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
> switches are supposed to negotiate the link speed of preferably 5.0GT/s,
> falling back to 2.5GT/s.
Can you collect the complete dmesg and lspci output, please?
From hints on the web, I guess the ASM2824 is built-in to the SiFive
board, right? So I suspect the topology is something like this:
00:00.0 Root Port to [bus 01-??] # soldered on to Unmatched
01:00.0 ASM2824 Upstream Port to [bus 02-09] # soldered on
02:03.0 ASM2824 Downstream Port to [bus 05-09] # soldered on, to slot
05:00.0 PI7C9X2G304 Upstream Port to [bus 06-09] # on Delock riser card
06:??.? PI7C9X2G304 Downstream Port to [bus ??] # slot 0 on Delock card
06:??.? PI7C9X2G304 Downstream Port to [bus ??] # slot 1 on Delock card
Do we have other reports of this bridge being broken? Or could this
be some kind of signal integrity problem on Unmatched or the slot?
The ASM2824 has been around for a while and seems to be used in other
systems, e.g., https://linux-hardware.org/?id=pci:1b21-2824, so if
it's an ASM2824 issue, we should see it elsewhere, too.
> However the link continues oscillating between the two speeds, at the
> rate of 34-35 times per second, with link training reported repeatedly
> active ~84% of the time, e.g.:
>
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> [...]
> Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
> TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> [...]
> LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> [...]
>
> Forcibly limiting the target link speed to 2.5GT/s with the upstream
> ASM2824 device makes the two switches communicate correctly however:
>
> 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> [...]
> Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> [...]
> Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> [...]
> LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
> TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> [...]
> LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> [...]
>
> and then:
>
> 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> [...]
> Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> [...]
> Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> [...]
> LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> [...]
> LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> Compliance De-emphasis: -6dB
> [...]
I think the above is supposed to show 02:03.0 operating at 5GT/s or
2.5GT/s, and 05:00.0 operating at 2.5GT/s. But there's a lot of noise
that makes it hard to pick out what's important.
> Removing the speed restriction afterwards makes the two devices switch
> to 5.0GT/s then.
>
> Make use of these observations then and detect the inability to train
> the link, by checking for the Data Link Layer Link Active status bit
> implemented by the ASM2824 being off while the Link Bandwidth Management
> Status indicating that hardware has changed the link speed or width in
> an attempt to correct unreliable link operation.
>
> Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> request a retrain and wait 200ms for the data link to go up. If this
> turns out successful, then lift the restriction, letting the devices
> negotiate a higher speed. Also check for a 2.5GT/s speed restriction
> the firmware may have already arranged and lift it too with ports that
> already report their data link being up.
>
> Signed-off-by: Maciej W. Rozycki <[email protected]>
> ---
> Hi,
>
> Patch regenerated and re-verified against 5.17-rc2. On this occasion I
> decided to remove BUG_ON after all for a supposedly impossible condition
> of a missing PCI Express capability for this PCIe switch and instead use
> WARN_ON and an early return. It should be safer this way.
>
> NB the corresponding U-boot change has since gone in with commit
> a398a51ccc68 ("pci: Work around PCIe link training failures"). See
Please include a URL for this commit. Without knowing where to use
it, the SHA1 by itself isn't much use.
> <https://lore.kernel.org/lkml/[email protected]/>
> for previous background information.
>
> Questions, comments, concerns? Otherwise please apply.
>
> Maciej
>
> Changes from v2:
>
> - Regenerate for 5.17-rc2 for a merge conflict.
>
> - Replace BUG_ON for a missing PCI Express capability with WARN_ON and an
> early return.
>
> Changes from v1:
>
> - Regenerate for a merge conflict.
> ---
> drivers/pci/quirks.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/pci_ids.h | 1
> 2 files changed, 99 insertions(+)
>
> linux-pcie-asm2824-manual-retrain.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -12,6 +12,7 @@
> * file, where their drivers can use them.
> */
>
> +#include <linux/bug.h>
> #include <linux/types.h>
> #include <linux/kernel.h>
> #include <linux/export.h>
> @@ -5879,3 +5880,100 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_IN
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1536, rom_bar_overlap_defect);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1537, rom_bar_overlap_defect);
> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1538, rom_bar_overlap_defect);
> +
> +/*
> + * Retrain the link of a downstream PCIe port by hand if necessary.
> + *
> + * This is needed at least where a downstream port of the ASMedia ASM2824
> + * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
> + * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
> + * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
> + * board.
> + *
> + * In such a configuration the switches are supposed to negotiate the link
> + * speed of preferably 5.0GT/s, falling back to 2.5GT/s. However the link
> + * continues switching between the two speeds indefinitely and the data
> + * link layer never reaches the active state, with link training reported
> + * repeatedly active ~84% of the time. Forcing the target link speed to
> + * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
> + * each other correctly however. And more interestingly retraining with a
> + * higher target link speed afterwards lets the two successfully negotiate
> + * 5.0GT/s.
> + *
> + * With the ASM2824 we can rely on the otherwise optional Data Link Layer
> + * Link Active status bit and in the failed link training scenario it will
> + * be off along with the Link Bandwidth Management Status indicating that
> + * hardware has changed the link speed or width in an attempt to correct
> + * unreliable link operation. For a port that has been left unconnected
> + * both bits will be clear. So use this information to detect the problem
> + * rather than polling the Link Training bit and watching out for flips or
> + * at least the active status.
> + *
> + * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
> + * request a retrain and wait 200ms for the data link to go up. If this
> + * turns out successful, then lift the restriction, letting the devices
> + * negotiate a higher speed. Also check for a 2.5GT/s speed restriction
> + * the firmware may have already arranged and lift it too with ports that
> + * already report their data link being up.
> + */
> +static void pcie_downstream_link_retrain(struct pci_dev *dev)
> +{
> + u16 lnksta, lnkctl2;
> + u8 pos;
> +
> + pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> + WARN_ON(!pos);
> + if (!pos)
> + return;
> +
> + pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
> + pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
> + if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> + PCI_EXP_LNKSTA_LBMS) {
> + unsigned long timeout;
> + u16 lnkctl;
> +
> + pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
> +
> + pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> + lnkctl |= PCI_EXP_LNKCTL_RL;
> + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> + lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> +
> + timeout = jiffies + msecs_to_jiffies(200);
> + do {
> + pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
> + &lnksta);
> + if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> + break;
> + usleep_range(10000, 20000);
> + } while (time_before(jiffies, timeout));
> +
> + pci_info(dev, "retraining %s!\n",
> + lnksta & PCI_EXP_LNKSTA_DLLLA ?
> + "succeeded" : "failed");
> + }
Perhaps a place to use pcie_retrain_link() (would require a little
rework to make it usable outside aspm.c).
> +
> + if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> + (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT) {
> + u32 lnkcap;
> + u16 lnkctl;
> +
> + pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> + pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> + pci_read_config_dword(dev, pos + PCI_EXP_LNKCAP, &lnkcap);
> + lnkctl |= PCI_EXP_LNKCTL_RL;
> + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> + lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
> + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> + }
> +}
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ASMEDIA,
> + PCI_DEVICE_ID_ASMEDIA_ASM2824,
> + pcie_downstream_link_retrain);
> +DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_ASMEDIA,
> + PCI_DEVICE_ID_ASMEDIA_ASM2824,
> + pcie_downstream_link_retrain);
> Index: linux-macro/include/linux/pci_ids.h
> ===================================================================
> --- linux-macro.orig/include/linux/pci_ids.h
> +++ linux-macro/include/linux/pci_ids.h
> @@ -2545,6 +2545,7 @@
> #define PCI_SUBDEVICE_ID_QEMU 0x1100
>
> #define PCI_VENDOR_ID_ASMEDIA 0x1b21
> +#define PCI_DEVICE_ID_ASMEDIA_ASM2824 0x2824
We only add to pci_ids.h when the ID is used in more than one place.
On Tue, 1 Feb 2022, Maciej W. Rozycki wrote:
> Attempt to handle cases with a downstream port of the ASMedia ASM2824
> PCIe switch where link training never completes and the link continues
> switching between speeds indefinitely with the data link layer never
> reaching the active state.
Ping for:
<https://lore.kernel.org/all/[email protected]/>
Maciej
On Tue, 1 Mar 2022, Bjorn Helgaas wrote:
> > Attempt to handle cases with a downstream port of the ASMedia ASM2824
> > PCIe switch where link training never completes and the link continues
> > switching between speeds indefinitely with the data link layer never
> > reaching the active state.
> >
> > It has been observed with a downstream port of the ASMedia ASM2824 Gen 3
> > switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2
> > switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device,
> > P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the
> > switches are supposed to negotiate the link speed of preferably 5.0GT/s,
> > falling back to 2.5GT/s.
>
> Can you collect the complete dmesg and lspci output, please?
Please find them attached (including output from U-boot), both from an
unchanged kernel and from one with my patch proposed applied.
Note that I have the U-boot-side workaround already installed and while
the firmware is on a uSD card and therefore easy to reflash I have the
system in a remote lab hundreds of kilometres/miles away so I dare not
fiddle with it while not there and for the purpose of this experiment I
have manually removed the sticky 2.5GT/s clamp installed by U-boot with:
=> pci write.w 02.03.0 0xb0 0x0063
This is indicated in the console logs.
> >From hints on the web, I guess the ASM2824 is built-in to the SiFive
> board, right? So I suspect the topology is something like this:
>
> 00:00.0 Root Port to [bus 01-??] # soldered on to Unmatched
> 01:00.0 ASM2824 Upstream Port to [bus 02-09] # soldered on
> 02:03.0 ASM2824 Downstream Port to [bus 05-09] # soldered on, to slot
> 05:00.0 PI7C9X2G304 Upstream Port to [bus 06-09] # on Delock riser card
> 06:??.? PI7C9X2G304 Downstream Port to [bus ??] # slot 0 on Delock card
> 06:??.? PI7C9X2G304 Downstream Port to [bus ??] # slot 1 on Delock card
Correct.
> Do we have other reports of this bridge being broken? Or could this
> be some kind of signal integrity problem on Unmatched or the slot?
This appears to be the same issue on the ASM2824's upstream port side:
<https://lore.kernel.org/lkml/[email protected]/>.
I have replied downthread there already, pointing here.
On the ASM2824's downstream ports side it happens with the Delock device
regardless of whether port 02:03.0 (M.2 Key E slot, with an adapter to a
regular PCIe x1 slot installed) or 02:08.0 (regular PCIe x8 slot) is used,
so it's clearly not slot-specific.
Furthermore I highly doubt it's a signal integrity issue. It is because
the link is correctly negotiated and continues working at its maximum of
5GT/s for weeks since the bootstrap as long as it's first manually clamped
to 2.5GT/s and negotiated at that speed and then manually retrained with
the clamp removed. Except for the longer interval between negotiations
it's not much different AFAIK to automatic link training (where 2.5GT/s is
first negotiated anyway before a higher speed is negotiated according to
link capability information exchanged between the endpoints). Therefore I
highly suspect it is a protocol issue and either or both devices become
confused and restart link training in an attempt to recover.
Sadly ASMedia and Pericom declined to comment, all Delock stated was
their device is fine and works correctly with millions of systems out
there, and SiFive guys while sympathetic do not have the expertise
required to further debug this issue.
> The ASM2824 has been around for a while and seems to be used in other
> systems, e.g., https://linux-hardware.org/?id=pci:1b21-2824, so if
> it's an ASM2824 issue, we should see it elsewhere, too.
Also e.g.: <https://www.amazon.com/dp/B07PRN2QCV> and I could get it for
further experiments, but somehow I lack the incentive to spend any money
on a device and therefore indirectly pay its manufacturer where said
manufacturer does not bother to respond to a legitimate issue report.
> > However the link continues oscillating between the two speeds, at the
> > rate of 34-35 times per second, with link training reported repeatedly
> > active ~84% of the time, e.g.:
> >
> > 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> > [...]
> > Bus: primary=02, secondary=05, subordinate=05, sec-latency=0
> > [...]
> > Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> > [...]
> > LnkSta: Speed 5GT/s (downgraded), Width x1 (ok)
> > TrErr- Train+ SlotClk+ DLActive- BWMgmt+ ABWMgmt-
> > [...]
> > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > [...]
> >
> > Forcibly limiting the target link speed to 2.5GT/s with the upstream
> > ASM2824 device makes the two switches communicate correctly however:
> >
> > 02:03.0 PCI bridge [0604]: ASMedia Technology Inc. ASM2824 PCIe Gen3 Packet Switch [1b21:2824] (rev 01) (prog-if 00 [Normal decode])
> > [...]
> > Bus: primary=02, secondary=05, subordinate=09, sec-latency=0
> > [...]
> > Capabilities: [80] Express (v2) Downstream Port (Slot+), MSI 00
> > [...]
> > LnkSta: Speed 2.5GT/s (downgraded), Width x1 (ok)
> > TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
> > [...]
> > LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis+, Selectable De-emphasis: -3.5dB
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > [...]
> >
> > and then:
> >
> > 05:00.0 PCI bridge [0604]: Pericom Semiconductor PI7C9X2G304 EL/SL PCIe2 3-Port/4-Lane Packet Switch [12d8:2304] (rev 05) (prog-if 00 [Normal decode])
> > [...]
> > Bus: primary=05, secondary=06, subordinate=09, sec-latency=0
> > [...]
> > Capabilities: [c0] Express (v2) Upstream Port, MSI 00
> > [...]
> > LnkSta: Speed 2.5GT/s (downgraded), Width x1 (downgraded)
> > TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
> > [...]
> > LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-
> > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
> > Compliance De-emphasis: -6dB
> > [...]
>
> I think the above is supposed to show 02:03.0 operating at 5GT/s or
> 2.5GT/s, and 05:00.0 operating at 2.5GT/s. But there's a lot of noise
> that makes it hard to pick out what's important.
With the 2.5GT/s clamp in place for 02:03.0 (Target Link Speed: 2.5GT/s
in LnkCtl2) it is correctly shown that the device operates at 2.5GT/s
(Speed 2.5GT/s (downgraded) in LnkSta). Conversely 05:00.0 has no clamp
in place (Target Link Speed: 5GT/s in LnkCtl2), so it only operates at
2.5GT/s (Speed 2.5GT/s (downgraded) in LnkSta) due to the clamp upstream.
> > NB the corresponding U-boot change has since gone in with commit
> > a398a51ccc68 ("pci: Work around PCIe link training failures"). See
>
> Please include a URL for this commit. Without knowing where to use
> it, the SHA1 by itself isn't much use.
Sure: <https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68>.
> > +static void pcie_downstream_link_retrain(struct pci_dev *dev)
> > +{
> > + u16 lnksta, lnkctl2;
> > + u8 pos;
> > +
> > + pos = pci_find_capability(dev, PCI_CAP_ID_EXP);
> > + WARN_ON(!pos);
> > + if (!pos)
> > + return;
> > +
> > + pci_read_config_word(dev, pos + PCI_EXP_LNKCTL2, &lnkctl2);
> > + pci_read_config_word(dev, pos + PCI_EXP_LNKSTA, &lnksta);
> > + if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
> > + PCI_EXP_LNKSTA_LBMS) {
> > + unsigned long timeout;
> > + u16 lnkctl;
> > +
> > + pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s...\n");
> > +
> > + pci_read_config_word(dev, pos + PCI_EXP_LNKCTL, &lnkctl);
> > + lnkctl |= PCI_EXP_LNKCTL_RL;
> > + lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
> > + lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
> > + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL2, lnkctl2);
> > + pci_write_config_word(dev, pos + PCI_EXP_LNKCTL, lnkctl);
> > +
> > + timeout = jiffies + msecs_to_jiffies(200);
> > + do {
> > + pci_read_config_word(dev, pos + PCI_EXP_LNKSTA,
> > + &lnksta);
> > + if (lnksta & PCI_EXP_LNKSTA_DLLLA)
> > + break;
> > + usleep_range(10000, 20000);
> > + } while (time_before(jiffies, timeout));
> > +
> > + pci_info(dev, "retraining %s!\n",
> > + lnksta & PCI_EXP_LNKSTA_DLLLA ?
> > + "succeeded" : "failed");
> > + }
>
> Perhaps a place to use pcie_retrain_link() (would require a little
> rework to make it usable outside aspm.c).
I'd rather rely on PCI_EXP_LNKSTA_DLLLA and not PCI_EXP_LNKSTA_LT here
(maybe it's OK for ASPM). The thing is with the ASM2824 PCI_EXP_LNKSTA_LT
will oscillate in the case of an unsuccessful link negotiation, as noted
in the change description and consequently:
if (!(reg16 & PCI_EXP_LNKSTA_LT))
break;
may break out of the loop prematurely. This is prevented in the U-boot
version of the workaround (which is not ASM2824-specific and therefore
does not rely on the presence of PCI_EXP_LNKSTA_DLLLA), by taking samples
in a busy loop, which is something the firmware can do, but we cannot in a
running OS, and then doing simple statistics.
NB in an earlier version of the U-boot workaround I actually iterated
clamping over all speeds supported rather than at 2.5GT/s only, in which
case ensuring the link has stabilised was even more important. But then I
realised it will be better if the firmware only tries the single reduced
speed and then leaves the clamp in, taking advantage of the stickiness of
the bit in case the OS booted does not have a workaround and resetting the
port with the clamp removed as the OS boots would break the link again.
> > Index: linux-macro/include/linux/pci_ids.h
> > ===================================================================
> > --- linux-macro.orig/include/linux/pci_ids.h
> > +++ linux-macro/include/linux/pci_ids.h
> > @@ -2545,6 +2545,7 @@
> > #define PCI_SUBDEVICE_ID_QEMU 0x1100
> >
> > #define PCI_VENDOR_ID_ASMEDIA 0x1b21
> > +#define PCI_DEVICE_ID_ASMEDIA_ASM2824 0x2824
>
> We only add to pci_ids.h when the ID is used in more than one place.
I find it weird as working with magic numbers sprinkled throughout code
is problematic (and I spent a considerable amount of time to clean such
stuff up in the MIPS port), but won't insist if that's a prerequisite for
acceptance.
Please let me know if you need the change updated beyond the removal of
the pci_ids.h entry and I will post a new version. Also let me know if
you need any further details.
Thank you for your review.
Maciej