v2 -- Changed binding property 'brcm,completion-timeout-msec' to
'brcm,completion-timeout-us'. (StefanW for standard suffix).
-- Warn when clamping timeout value, and include clamped
region in message. Also add min and max in YAML. (StefanW)
-- Qualify description of "brcm,completion-timeout-us" so that
it refers to PCIe transactions. (StefanW)
-- Remvove mention of Linux specifics in binding description. (StefanW)
-- s/clkreq#/CLKREQ#/g (Bjorn)
-- Refactor completion-timeout-us code to compare max and min to
value given by the property (as opposed to the computed value).
v1 -- The current driver assumes the downstream devices can
provide CLKREQ# for ASPM. These commits accomodate devices
w/ or w/o clkreq# and also handle L1SS-capable devices.
-- The Raspian Linux folks have already been using a PCIe RC
property "brcm,enable-l1ss". These commits use the same
property, in a backward-compatible manner, and the implementaion
adds more detail and also automatically identifies devices w/o
a clkreq# signal, i.e. most devices plugged into an RPi CM4
IO board.
Jim Quinlan (3):
PCI: brcmstb: CLKREQ# accomodations of downstream device
PCI: brcmstb: Set PCIe transaction completion timeout
blah blah
drivers/pci/controller/pcie-brcmstb.c | 99 ++++++++++++++++++++++++---
init/do_mounts.c | 16 ++++-
2 files changed, 102 insertions(+), 13 deletions(-)
base-commit: 76f598ba7d8e2bfb4855b5298caedd5af0c374a8
prerequisite-patch-id: f38de8681d8746126d60b3430eaf218d2dd169cd
prerequisite-patch-id: 23e13189f200358976abf5bf3600973a20cf386c
prerequisite-patch-id: edfbe6ea39ed6a4937e2cec3bb8ee0e60091546d
prerequisite-patch-id: c87dd155e8506a2a277726c47d85bf3fa83727d5
prerequisite-patch-id: 579841e1dc179517506a7a7c42e0e651b3bc3649
prerequisite-patch-id: b5b079998ea451821edffd7c52cd3d89d06046a1
prerequisite-patch-id: b51b3918e554e279b2ace1f68ed6b4176f8ccc24
prerequisite-patch-id: 333e5188fb27d0ed010f5359e83e539172a67690
prerequisite-patch-id: bb107ee7b4811a9719508ea667cad2466933dec0
prerequisite-patch-id: 1258db336e778eb57d5cbea88834cd25aa1346ba
--
2.17.1
The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
deliberately set by the probe() into one of three mutually exclusive modes:
(a) No CLKREQ# expected or required, refclk is always available.
(b) CLKREQ# is expected to be driven by downstream device when needed.
(c) Bidirectional CLKREQ# for L1SS capable devices.
Previously, only (b) was supported by the driver, as almost all STB/CM
boards operate in this mode. But now there is interest in activating L1SS
power savings from STB/CM customers, and also interest in accomodating mode
(a) for designs such as the RPi CM4 with IO board.
The HW+driver is able to tell us when mode (a) mode is needed. But there
is no easy way to tell if L1SS mode should be configured. In certain
situations, getting this wrong may cause a panic during boot time. So we
rely on the DT prop "brcm,enable-l1ss" to tell us when mode (c) is desired.
Using this mode only makes sense when the downstream device is L1SS-capable
and the OS has been configured to activate L1SS
(e.g. policy==powersupersave).
This property has already been in use by Raspian Linux, but this
immplementation adds more details and discerns between (a) and (b)
automatically.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
Signed-off-by: Jim Quinlan <[email protected]>
---
drivers/pci/controller/pcie-brcmstb.c | 69 +++++++++++++++++++++++----
1 file changed, 59 insertions(+), 10 deletions(-)
diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index edf283e2b5dd..56b96aa02221 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -48,10 +48,17 @@
#define PCIE_RC_CFG_PRIV1_LINK_CAPABILITY 0x04dc
#define PCIE_RC_CFG_PRIV1_LINK_CAPABILITY_ASPM_SUPPORT_MASK 0xc00
+#define PCIE_RC_CFG_PRIV1_ROOT_CAP 0x4f8
+#define PCIE_RC_CFG_PRIV1_ROOT_CAP_L1SS_MODE_MASK 0xf8
+
#define PCIE_RC_DL_MDIO_ADDR 0x1100
#define PCIE_RC_DL_MDIO_WR_DATA 0x1104
#define PCIE_RC_DL_MDIO_RD_DATA 0x1108
+#define PCIE_0_RC_PL_PHY_DBG_CLKREQ2_0 0x1e30
+#define CLKREQ2_0_CLKREQ_IN_CNT_MASK 0x3f000000
+#define CLKREQ2_0_CLKREQ_IN_MASK 0x40000000
+
#define PCIE_MISC_MISC_CTRL 0x4008
#define PCIE_MISC_MISC_CTRL_PCIE_RCB_64B_MODE_MASK 0x80
#define PCIE_MISC_MISC_CTRL_PCIE_RCB_MPS_MODE_MASK 0x400
@@ -121,9 +128,12 @@
#define PCIE_MISC_HARD_PCIE_HARD_DEBUG 0x4204
#define PCIE_MISC_HARD_PCIE_HARD_DEBUG_CLKREQ_DEBUG_ENABLE_MASK 0x2
+#define PCIE_MISC_HARD_PCIE_HARD_DEBUG_L1SS_ENABLE_MASK 0x200000
#define PCIE_MISC_HARD_PCIE_HARD_DEBUG_SERDES_IDDQ_MASK 0x08000000
#define PCIE_BMIPS_MISC_HARD_PCIE_HARD_DEBUG_SERDES_IDDQ_MASK 0x00800000
-
+#define PCIE_CLKREQ_MASK \
+ (PCIE_MISC_HARD_PCIE_HARD_DEBUG_CLKREQ_DEBUG_ENABLE_MASK | \
+ PCIE_MISC_HARD_PCIE_HARD_DEBUG_L1SS_ENABLE_MASK)
#define PCIE_INTR2_CPU_BASE 0x4300
#define PCIE_MSI_INTR2_BASE 0x4500
@@ -1024,13 +1034,58 @@ static int brcm_pcie_setup(struct brcm_pcie *pcie)
return 0;
}
+static void brcm_config_clkreq(struct brcm_pcie *pcie)
+{
+ bool l1ss = of_property_read_bool(pcie->np, "brcm,enable-l1ss");
+ void __iomem *base = pcie->base;
+ u32 clkreq_set, tmp = readl(base + PCIE_0_RC_PL_PHY_DBG_CLKREQ2_0);
+ bool clkreq_in_seen;
+
+ /*
+ * We have "seen" CLKREQ# if it is asserted or has been in the past.
+ * Note that the CLKREQ_IN_MASK is 1 if CLKREQ# is asserted.
+ */
+ clkreq_in_seen = !!(tmp & CLKREQ2_0_CLKREQ_IN_MASK) ||
+ !!FIELD_GET(CLKREQ2_0_CLKREQ_IN_CNT_MASK, tmp);
+
+ /* Start with safest setting where we provide refclk regardless */
+ clkreq_set = readl(pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG) &
+ ~PCIE_CLKREQ_MASK;
+
+ if (l1ss && IS_ENABLED(CONFIG_PCIEASPM)) {
+ /*
+ * Note: This (l1ss) mode may not meet requirement for
+ * downstream devices that require CLKREQ# assertion to
+ * clock active within 400ns.
+ */
+ clkreq_set |= PCIE_MISC_HARD_PCIE_HARD_DEBUG_L1SS_ENABLE_MASK;
+ dev_info(pcie->dev, "bi-dir CLKREQ# for l1ss-capable devs\n");
+ dev_info(pcie->dev, "ASPM policy should be set to \"powersupersave\"\n");
+ } else {
+ if (clkreq_in_seen && IS_ENABLED(CONFIG_PCIEASPM)) {
+ clkreq_set |= PCIE_MISC_HARD_PCIE_HARD_DEBUG_CLKREQ_DEBUG_ENABLE_MASK;
+ dev_info(pcie->dev, "uni-dir CLKREQ# for ASPM\n");
+ } else {
+ dev_info(pcie->dev, "CLKREQ# ignored; no ASPM\n");
+ /* Might as well unadvertise ASPM */
+ tmp = readl(base + PCIE_RC_CFG_PRIV1_LINK_CAPABILITY) &
+ ~PCIE_RC_CFG_PRIV1_LINK_CAPABILITY_ASPM_SUPPORT_MASK;
+ writel(tmp, base + PCIE_RC_CFG_PRIV1_LINK_CAPABILITY);
+ }
+ /* Setting the field to 2 unadvertises L1SS support */
+ tmp = readl(base + PCIE_RC_CFG_PRIV1_ROOT_CAP);
+ u32p_replace_bits(&tmp, 2, PCIE_RC_CFG_PRIV1_ROOT_CAP_L1SS_MODE_MASK);
+ writel(tmp, base + PCIE_RC_CFG_PRIV1_ROOT_CAP);
+ }
+ writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
+}
+
static int brcm_pcie_start_link(struct brcm_pcie *pcie)
{
struct device *dev = pcie->dev;
void __iomem *base = pcie->base;
u16 nlw, cls, lnksta;
bool ssc_good = false;
- u32 tmp;
int ret, i;
/* Unassert the fundamental reset */
@@ -1055,6 +1110,8 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
return -ENODEV;
}
+ brcm_config_clkreq(pcie);
+
if (pcie->gen)
brcm_pcie_set_gen(pcie, pcie->gen);
@@ -1073,14 +1130,6 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
pci_speed_string(pcie_link_speed[cls]), nlw,
ssc_good ? "(SSC)" : "(!SSC)");
- /*
- * Refclk from RC should be gated with CLKREQ# input when ASPM L0s,L1
- * is enabled => setting the CLKREQ_DEBUG_ENABLE field to 1.
- */
- tmp = readl(base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
- tmp |= PCIE_MISC_HARD_PCIE_HARD_DEBUG_CLKREQ_DEBUG_ENABLE_MASK;
- writel(tmp, base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
-
return 0;
}
--
2.17.1
Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
completion timeout abort, we might as well extend the default timeout
limit. Further, different devices and systems may requires a larger or
smaller amount commensurate with their L1SS exit time, so the property
"brcm,completion-abort-us" may be used to set a custom timeout value.
Signed-off-by: Jim Quinlan <[email protected]>
---
drivers/pci/controller/pcie-brcmstb.c | 30 +++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
index 56b96aa02221..9610066f8c80 100644
--- a/drivers/pci/controller/pcie-brcmstb.c
+++ b/drivers/pci/controller/pcie-brcmstb.c
@@ -1080,6 +1080,35 @@ static void brcm_config_clkreq(struct brcm_pcie *pcie)
writel(clkreq_set, pcie->base + PCIE_MISC_HARD_PCIE_HARD_DEBUG);
}
+static void brcm_config_completion_timeout(struct brcm_pcie *pcie)
+{
+ /* TIMEOUT register is two registers before RGR1_SW_INIT_1 */
+ const char *fmt = "brcm,completion-timeout-us clamped to region [%u..%u]\n";
+ const unsigned int REG_OFFSET = PCIE_RGR1_SW_INIT_1(pcie) - 8;
+ const u32 timeout_us_min = 16;
+ const u32 timeout_us_max = 19884107;
+ u32 timeout_us = 1000000; /* Our default, 1 second */
+ int rval, ret;
+
+ ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
+ &timeout_us);
+ if (ret && ret != -EINVAL)
+ dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
+
+ /* If needed, clamp the requested timeout value and issue a warning */
+ if (timeout_us < timeout_us_min) {
+ timeout_us = timeout_us_min;
+ dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+ } else if (timeout_us > timeout_us_max) {
+ timeout_us = timeout_us_max;
+ dev_warn(pcie->dev, fmt, timeout_us_min, timeout_us_max);
+ }
+
+ /* Each unit in timeout register is 1/216,000,000 seconds */
+ rval = 216 * timeout_us;
+ writel(rval, pcie->base + REG_OFFSET);
+}
+
static int brcm_pcie_start_link(struct brcm_pcie *pcie)
{
struct device *dev = pcie->dev;
@@ -1110,6 +1139,7 @@ static int brcm_pcie_start_link(struct brcm_pcie *pcie)
return -ENODEV;
}
+ brcm_config_completion_timeout(pcie);
brcm_config_clkreq(pcie);
if (pcie->gen)
--
2.17.1
Regarding "brcm,enable-l1ss":
The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
requires the driver probe() to deliberately place the HW one of three
CLKREQ# modes:
(a) CLKREQ# driven by the RC unconditionally
(b) CLKREQ# driven by the EP for ASPM L0s, L1
(c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
The HW+driver can tell the difference between downstream devices that
need (a) and (b), but does not know when to configure (c). Further, the
HW may cause a CPU abort on boot if guesses wrong regarding the need for
(c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
that (c) is desired. Setting this property only makes sense when the
downstream device is L1SS-capable and the OS is configured to activate
this mode (e.g. policy==superpowersave).
This property is already present in the Raspian version of Linux, but the
upstream driver implementaion that will follow adds more details and
discerns between (a) and (b).
Regarding "brcm,completion-timeout-us"
Our HW will cause a CPU abort if the L1SS exit time is longer than the
PCIe transaction completion abort timeout. We've been asked to make this
configurable, so we are introducing "brcm,completion-timeout-us".
Signed-off-by: Jim Quinlan <[email protected]>
---
.../devicetree/bindings/pci/brcm,stb-pcie.yaml | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
index 7e15aae7d69e..f7fc2f6561bb 100644
--- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
+++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
@@ -64,6 +64,22 @@ properties:
aspm-no-l0s: true
+ brcm,enable-l1ss:
+ description: Indicates that PCIe L1SS power savings
+ are desired, the downstream device is L1SS-capable, and the
+ OS has been configured to enable this mode. Note that when
+ in this mode, this particular HW may not meet the requirement
+ that requires CLKREQ# assertion to clock active to be
+ within 400ns.
+ type: boolean
+
+ brcm,completion-timeout-us:
+ description: Number of microseconds before PCI transaction
+ completion timeout abort is signalled.
+ minimum: 16
+ default: 1000000
+ maximum: 19884107
+
brcm,scb-sizes:
description: u64 giving the 64bit PCIe memory
viewport size of a memory controller. There may be up to
--
2.17.1
Hi Jim,
Jim Quinlan <[email protected]> (2023-04-11):
> Since the STB PCIe HW will cause a CPU abort on a PCIe transaction
> completion timeout abort, we might as well extend the default timeout
> limit. Further, different devices and systems may requires a larger or
> smaller amount commensurate with their L1SS exit time, so the property
> "brcm,completion-abort-us" may be used to set a custom timeout value.
^^^^^^^^^^^^^^^^^^^^^^^^
> + ret = of_property_read_u32(pcie->np, "brcm,completion-timeout-us",
> + &timeout_us);
> + if (ret && ret != -EINVAL)
> + dev_err(pcie->dev, "malformed/invalid 'brcm,completion-timeout-us'\n");
v2 renames brcm,completion-abort-msecs into brcm,completion-timeout-us
but the commit message mentions the half-way brcm,completion-abort-us
property instead.
(Also spotted “immplementation” in 2/3 but I thought I'd spare everyone
an extra mail.)
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
On 11/04/2023 18:59, Jim Quinlan wrote:
> Regarding "brcm,enable-l1ss":
>
> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> requires the driver probe() to deliberately place the HW one of three
> CLKREQ# modes:
>
> (a) CLKREQ# driven by the RC unconditionally
> (b) CLKREQ# driven by the EP for ASPM L0s, L1
> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
>
> The HW+driver can tell the difference between downstream devices that
> need (a) and (b), but does not know when to configure (c). Further, the
> HW may cause a CPU abort on boot if guesses wrong regarding the need for
> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
> that (c) is desired. Setting this property only makes sense when the
> downstream device is L1SS-capable and the OS is configured to activate
> this mode (e.g. policy==superpowersave).
>
> This property is already present in the Raspian version of Linux, but the
> upstream driver implementaion that will follow adds more details and
typo, implementation
> discerns between (a) and (b).
>
> Regarding "brcm,completion-timeout-us"
>
> Our HW will cause a CPU abort if the L1SS exit time is longer than the
> PCIe transaction completion abort timeout. We've been asked to make this
> configurable, so we are introducing "brcm,completion-timeout-us".
>
> Signed-off-by: Jim Quinlan <[email protected]>
What happened here? Where is the changelog?
Best regards,
Krzysztof
On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
> On 11/04/2023 18:59, Jim Quinlan wrote:
>> Regarding "brcm,enable-l1ss":
>>
>> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
>> requires the driver probe() to deliberately place the HW one of three
>> CLKREQ# modes:
>>
>> (a) CLKREQ# driven by the RC unconditionally
>> (b) CLKREQ# driven by the EP for ASPM L0s, L1
>> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
>>
>> The HW+driver can tell the difference between downstream devices that
>> need (a) and (b), but does not know when to configure (c). Further, the
>> HW may cause a CPU abort on boot if guesses wrong regarding the need for
>> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
>> that (c) is desired. Setting this property only makes sense when the
>> downstream device is L1SS-capable and the OS is configured to activate
>> this mode (e.g. policy==superpowersave).
>>
>> This property is already present in the Raspian version of Linux, but the
>> upstream driver implementaion that will follow adds more details and
>
> typo, implementation
>
>> discerns between (a) and (b).
>>
>> Regarding "brcm,completion-timeout-us"
>>
>> Our HW will cause a CPU abort if the L1SS exit time is longer than the
>> PCIe transaction completion abort timeout. We've been asked to make this
>> configurable, so we are introducing "brcm,completion-timeout-us".
>>
>> Signed-off-by: Jim Quinlan <[email protected]>
>
> What happened here? Where is the changelog?
It is in the cover letter:
https://lore.kernel.org/all/[email protected]/
but it does not look like the cover letter was copied to you or Rob.
--
Florian
On 12/04/2023 13:49, Florian Fainelli wrote:
>
>
> On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
>> On 11/04/2023 18:59, Jim Quinlan wrote:
>>> Regarding "brcm,enable-l1ss":
>>>
>>> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
>>> requires the driver probe() to deliberately place the HW one of three
>>> CLKREQ# modes:
>>>
>>> (a) CLKREQ# driven by the RC unconditionally
>>> (b) CLKREQ# driven by the EP for ASPM L0s, L1
>>> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
>>>
>>> The HW+driver can tell the difference between downstream devices that
>>> need (a) and (b), but does not know when to configure (c). Further, the
>>> HW may cause a CPU abort on boot if guesses wrong regarding the need for
>>> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
>>> that (c) is desired. Setting this property only makes sense when the
>>> downstream device is L1SS-capable and the OS is configured to activate
>>> this mode (e.g. policy==superpowersave).
>>>
>>> This property is already present in the Raspian version of Linux, but the
>>> upstream driver implementaion that will follow adds more details and
>>
>> typo, implementation
>>
>>> discerns between (a) and (b).
>>>
>>> Regarding "brcm,completion-timeout-us"
>>>
>>> Our HW will cause a CPU abort if the L1SS exit time is longer than the
>>> PCIe transaction completion abort timeout. We've been asked to make this
>>> configurable, so we are introducing "brcm,completion-timeout-us".
>>>
>>> Signed-off-by: Jim Quinlan <[email protected]>
>>
>> What happened here? Where is the changelog?
>
> It is in the cover letter:
>
> https://lore.kernel.org/all/[email protected]/
>
> but it does not look like the cover letter was copied to you or Rob.
As you said, I did not get it.
Best regards,
Krzysztof
On Wed, Apr 12, 2023 at 7:56 AM Krzysztof Kozlowski
<[email protected]> wrote:
>
> On 12/04/2023 13:49, Florian Fainelli wrote:
> >
> >
> > On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
> >> On 11/04/2023 18:59, Jim Quinlan wrote:
> >>> Regarding "brcm,enable-l1ss":
> >>>
> >>> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> >>> requires the driver probe() to deliberately place the HW one of three
> >>> CLKREQ# modes:
> >>>
> >>> (a) CLKREQ# driven by the RC unconditionally
> >>> (b) CLKREQ# driven by the EP for ASPM L0s, L1
> >>> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
> >>>
> >>> The HW+driver can tell the difference between downstream devices that
> >>> need (a) and (b), but does not know when to configure (c). Further, the
> >>> HW may cause a CPU abort on boot if guesses wrong regarding the need for
> >>> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
> >>> that (c) is desired. Setting this property only makes sense when the
> >>> downstream device is L1SS-capable and the OS is configured to activate
> >>> this mode (e.g. policy==superpowersave).
> >>>
> >>> This property is already present in the Raspian version of Linux, but the
> >>> upstream driver implementaion that will follow adds more details and
> >>
> >> typo, implementation
> >>
> >>> discerns between (a) and (b).
> >>>
> >>> Regarding "brcm,completion-timeout-us"
> >>>
> >>> Our HW will cause a CPU abort if the L1SS exit time is longer than the
> >>> PCIe transaction completion abort timeout. We've been asked to make this
> >>> configurable, so we are introducing "brcm,completion-timeout-us".
> >>>
> >>> Signed-off-by: Jim Quinlan <[email protected]>
> >>
> >> What happened here? Where is the changelog?
> >
> > It is in the cover letter:
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > but it does not look like the cover letter was copied to you or Rob.
>
> As you said, I did not get it.
Yes, sorry about that; I use a wrapper over the "cocci_cc" script and
I need to modify one or both scripts to send the cover to the
superset of recipients in the constituent commits.
Regards,
Jim Quinan
Broadcom STB
>
> Best regards,
> Krzysztof
>
On Wed, Apr 12, 2023 at 10:14:46AM -0400, Jim Quinlan wrote:
> On Wed, Apr 12, 2023 at 7:56 AM Krzysztof Kozlowski
> <[email protected]> wrote:
> >
> > On 12/04/2023 13:49, Florian Fainelli wrote:
> > >
> > >
> > > On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
> > >> On 11/04/2023 18:59, Jim Quinlan wrote:
> > >>> Regarding "brcm,enable-l1ss":
> > >>>
> > >>> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> > >>> requires the driver probe() to deliberately place the HW one of three
> > >>> CLKREQ# modes:
> > >>>
> > >>> (a) CLKREQ# driven by the RC unconditionally
> > >>> (b) CLKREQ# driven by the EP for ASPM L0s, L1
> > >>> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
> > >>>
> > >>> The HW+driver can tell the difference between downstream devices that
> > >>> need (a) and (b), but does not know when to configure (c). Further, the
> > >>> HW may cause a CPU abort on boot if guesses wrong regarding the need for
> > >>> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
> > >>> that (c) is desired. Setting this property only makes sense when the
> > >>> downstream device is L1SS-capable and the OS is configured to activate
> > >>> this mode (e.g. policy==superpowersave).
> > >>>
> > >>> This property is already present in the Raspian version of Linux, but the
> > >>> upstream driver implementaion that will follow adds more details and
> > >>
> > >> typo, implementation
> > >>
> > >>> discerns between (a) and (b).
> > >>>
> > >>> Regarding "brcm,completion-timeout-us"
> > >>>
> > >>> Our HW will cause a CPU abort if the L1SS exit time is longer than the
> > >>> PCIe transaction completion abort timeout. We've been asked to make this
> > >>> configurable, so we are introducing "brcm,completion-timeout-us".
> > >>>
> > >>> Signed-off-by: Jim Quinlan <[email protected]>
> > >>
> > >> What happened here? Where is the changelog?
> > >
> > > It is in the cover letter:
> > >
> > > https://lore.kernel.org/all/[email protected]/
> > >
> > > but it does not look like the cover letter was copied to you or Rob.
> >
> > As you said, I did not get it.
>
> Yes, sorry about that; I use a wrapper over the "cocci_cc" script and
> I need to modify one or both scripts to send the cover to the
> superset of recipients in the constituent commits.
Try out 'b4'. It's much easier.
In any case, I don't read cover letters. Changes to a patch belong with
the patch.
Rob
On 4/12/23 08:37, Rob Herring wrote:
> On Wed, Apr 12, 2023 at 10:14:46AM -0400, Jim Quinlan wrote:
>> On Wed, Apr 12, 2023 at 7:56 AM Krzysztof Kozlowski
>> <[email protected]> wrote:
>>>
>>> On 12/04/2023 13:49, Florian Fainelli wrote:
>>>>
>>>>
>>>> On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
>>>>> On 11/04/2023 18:59, Jim Quinlan wrote:
>>>>>> Regarding "brcm,enable-l1ss":
>>>>>>
>>>>>> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
>>>>>> requires the driver probe() to deliberately place the HW one of three
>>>>>> CLKREQ# modes:
>>>>>>
>>>>>> (a) CLKREQ# driven by the RC unconditionally
>>>>>> (b) CLKREQ# driven by the EP for ASPM L0s, L1
>>>>>> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
>>>>>>
>>>>>> The HW+driver can tell the difference between downstream devices that
>>>>>> need (a) and (b), but does not know when to configure (c). Further, the
>>>>>> HW may cause a CPU abort on boot if guesses wrong regarding the need for
>>>>>> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
>>>>>> that (c) is desired. Setting this property only makes sense when the
>>>>>> downstream device is L1SS-capable and the OS is configured to activate
>>>>>> this mode (e.g. policy==superpowersave).
>>>>>>
>>>>>> This property is already present in the Raspian version of Linux, but the
>>>>>> upstream driver implementaion that will follow adds more details and
>>>>>
>>>>> typo, implementation
>>>>>
>>>>>> discerns between (a) and (b).
>>>>>>
>>>>>> Regarding "brcm,completion-timeout-us"
>>>>>>
>>>>>> Our HW will cause a CPU abort if the L1SS exit time is longer than the
>>>>>> PCIe transaction completion abort timeout. We've been asked to make this
>>>>>> configurable, so we are introducing "brcm,completion-timeout-us".
>>>>>>
>>>>>> Signed-off-by: Jim Quinlan <[email protected]>
>>>>>
>>>>> What happened here? Where is the changelog?
>>>>
>>>> It is in the cover letter:
>>>>
>>>> https://lore.kernel.org/all/[email protected]/
>>>>
>>>> but it does not look like the cover letter was copied to you or Rob.
>>>
>>> As you said, I did not get it.
>>
>> Yes, sorry about that; I use a wrapper over the "cocci_cc" script and
>> I need to modify one or both scripts to send the cover to the
>> superset of recipients in the constituent commits.
>
> Try out 'b4'. It's much easier.
>
> In any case, I don't read cover letters. Changes to a patch belong with
> the patch.
This is not what most other maintainers do, and there does not appear to
be a general consensus amongst maintainers that the changes belong in
the individual patches, or in the cover letter. Some trees like the
networking tree do merge commits of patch sets where the cover letter is
used as part of the merge commit message. Other maintainers don't, and
some want the change log after the '---' and some do not.
--
Florian
Hi Jim,
Jim Quinlan <[email protected]> (2023-04-11):
> […]
> This property has already been in use by Raspian Linux, but this
> immplementation adds more details and discerns between (a) and (b)
^^^^^^^^^^^^^^^
implementation
> automatically.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> Signed-off-by: Jim Quinlan <[email protected]>
Sorry, it seems like my initial tests with v1 (applied on top of
v6.3-rc5-137-gf2afccfefe7b) weren't thorough enough, and I'm seeing the
same problems with v2 (applied on top of v6.3-rc6-46-gde4664485abb):
- same setup as in https://bugzilla.kernel.org/show_bug.cgi?id=217276
- the kernel panic is indeed gone;
- a USB keyboard connected on that SupaHub PCIe-to-multiple-USB adapter
isn't seen by the kernel;
- a USB memory stick connected on the same adapter isn't seen by the
kernel either;
- of course both USB devices are confirmed to work fine if they're
plugged directly on the CM4's USB ports.
Logs with v2:
root@cm4:~# dmesg|grep -i pci
[ 0.610997] PCI: CLS 0 bytes, default 64
[ 1.664886] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 1.672083] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
[ 1.679125] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
[ 1.688279] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
[ 1.696463] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0400000000
[ 1.705282] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
[ 1.711629] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 1.717172] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
[ 1.727653] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
[ 1.733768] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 1.740235] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 1.855826] brcm-pcie fd500000.pcie: CLKREQ# ignored; no ASPM
[ 1.863666] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
[ 1.870115] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
[ 1.876205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
[ 1.883177] pci 0000:01:00.0: PME# supported from D0 D3hot
[ 1.888881] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 1.895581] pci 0000:00:00.0: BAR 14: assigned [mem 0x600000000-0x6000fffff]
[ 1.902707] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600001fff 64bit]
[ 1.910279] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 1.915293] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6000fffff]
[ 1.922412] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[ 1.928633] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
[ 1.934609] pcieport 0000:00:00.0: AER: enabled with IRQ 23
[ 1.940340] pci 0000:01:00.0: enabling device (0000 -> 0002)
[ 6.946090] pci 0000:01:00.0: xHCI HW not ready after 5 sec (HC bug?) status = 0x1801
[ 6.954026] pci 0000:01:00.0: quirk_usb_early_handoff+0x0/0x968 took 4896180 usecs
Please let me know what I can do to help.
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
Hello Cyril,
Can you provide (a) the full boot log prior to applying the patch
series and (b) full boot log after applying the series, using an
IDENTICAL setup.
If it fails on both then it has little to do with my patch series.
In my last series your testing somehow conflated the effect of an
unrelated MMC interrupt issue so please be precise.
Regards,
Jim Quinlan
Broadcom STB
On Thu, Apr 13, 2023 at 10:39 AM Cyril Brulebois <[email protected]> wrote:
>
> Hi Jim,
>
> Jim Quinlan <[email protected]> (2023-04-11):
> > […]
> > This property has already been in use by Raspian Linux, but this
> > immplementation adds more details and discerns between (a) and (b)
> ^^^^^^^^^^^^^^^
> implementation
>
> > automatically.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> > Signed-off-by: Jim Quinlan <[email protected]>
>
> Sorry, it seems like my initial tests with v1 (applied on top of
> v6.3-rc5-137-gf2afccfefe7b) weren't thorough enough, and I'm seeing the
> same problems with v2 (applied on top of v6.3-rc6-46-gde4664485abb):
> - same setup as in https://bugzilla.kernel.org/show_bug.cgi?id=217276
> - the kernel panic is indeed gone;
> - a USB keyboard connected on that SupaHub PCIe-to-multiple-USB adapter
> isn't seen by the kernel;
> - a USB memory stick connected on the same adapter isn't seen by the
> kernel either;
> - of course both USB devices are confirmed to work fine if they're
> plugged directly on the CM4's USB ports.
>
> Logs with v2:
>
> root@cm4:~# dmesg|grep -i pci
> [ 0.610997] PCI: CLS 0 bytes, default 64
> [ 1.664886] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [ 1.672083] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> [ 1.679125] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
> [ 1.688279] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
> [ 1.696463] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0400000000
> [ 1.705282] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> [ 1.711629] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 1.717172] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
> [ 1.727653] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> [ 1.733768] pci 0000:00:00.0: PME# supported from D0 D3hot
> [ 1.740235] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 1.855826] brcm-pcie fd500000.pcie: CLKREQ# ignored; no ASPM
> [ 1.863666] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> [ 1.870115] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
> [ 1.876205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> [ 1.883177] pci 0000:01:00.0: PME# supported from D0 D3hot
> [ 1.888881] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> [ 1.895581] pci 0000:00:00.0: BAR 14: assigned [mem 0x600000000-0x6000fffff]
> [ 1.902707] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600001fff 64bit]
> [ 1.910279] pci 0000:00:00.0: PCI bridge to [bus 01]
> [ 1.915293] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6000fffff]
> [ 1.922412] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
> [ 1.928633] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
> [ 1.934609] pcieport 0000:00:00.0: AER: enabled with IRQ 23
> [ 1.940340] pci 0000:01:00.0: enabling device (0000 -> 0002)
> [ 6.946090] pci 0000:01:00.0: xHCI HW not ready after 5 sec (HC bug?) status = 0x1801
> [ 6.954026] pci 0000:01:00.0: quirk_usb_early_handoff+0x0/0x968 took 4896180 usecs
>
> Please let me know what I can do to help.
>
>
> Cheers,
> --
> Cyril Brulebois ([email protected]) <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant
On 4/13/2023 7:39 AM, Cyril Brulebois wrote:
> Hi Jim,
>
> Jim Quinlan <[email protected]> (2023-04-11):
>> […]
>> This property has already been in use by Raspian Linux, but this
>> immplementation adds more details and discerns between (a) and (b)
> ^^^^^^^^^^^^^^^
> implementation
>
>> automatically.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
>> Signed-off-by: Jim Quinlan <[email protected]>
>
> Sorry, it seems like my initial tests with v1 (applied on top of
> v6.3-rc5-137-gf2afccfefe7b) weren't thorough enough, and I'm seeing the
> same problems with v2 (applied on top of v6.3-rc6-46-gde4664485abb):
> - same setup as in https://bugzilla.kernel.org/show_bug.cgi?id=217276
> - the kernel panic is indeed gone;
> - a USB keyboard connected on that SupaHub PCIe-to-multiple-USB adapter
> isn't seen by the kernel;
> - a USB memory stick connected on the same adapter isn't seen by the
> kernel either;
> - of course both USB devices are confirmed to work fine if they're
> plugged directly on the CM4's USB ports.
>
> Logs with v2:
>
> root@cm4:~# dmesg|grep -i pci
> [ 0.610997] PCI: CLS 0 bytes, default 64
> [ 1.664886] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
> [ 1.672083] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges:
> [ 1.679125] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff]
> [ 1.688279] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x0603ffffff -> 0x00f8000000
> [ 1.696463] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00ffffffff -> 0x0400000000
> [ 1.705282] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00
> [ 1.711629] pci_bus 0000:00: root bus resource [bus 00-ff]
> [ 1.717172] pci_bus 0000:00: root bus resource [mem 0x600000000-0x603ffffff] (bus address [0xf8000000-0xfbffffff])
> [ 1.727653] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400
> [ 1.733768] pci 0000:00:00.0: PME# supported from D0 D3hot
> [ 1.740235] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> [ 1.855826] brcm-pcie fd500000.pcie: CLKREQ# ignored; no ASPM
> [ 1.863666] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC)
> [ 1.870115] pci 0000:01:00.0: [1912:0014] type 00 class 0x0c0330
> [ 1.876205] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> [ 1.883177] pci 0000:01:00.0: PME# supported from D0 D3hot
> [ 1.888881] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> [ 1.895581] pci 0000:00:00.0: BAR 14: assigned [mem 0x600000000-0x6000fffff]
> [ 1.902707] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600001fff 64bit]
> [ 1.910279] pci 0000:00:00.0: PCI bridge to [bus 01]
> [ 1.915293] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6000fffff]
> [ 1.922412] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
> [ 1.928633] pcieport 0000:00:00.0: PME: Signaling with IRQ 23
> [ 1.934609] pcieport 0000:00:00.0: AER: enabled with IRQ 23
> [ 1.940340] pci 0000:01:00.0: enabling device (0000 -> 0002)
> [ 6.946090] pci 0000:01:00.0: xHCI HW not ready after 5 sec (HC bug?) status = 0x1801
> [ 6.954026] pci 0000:01:00.0: quirk_usb_early_handoff+0x0/0x968 took 4896180 usecs
>
> Please let me know what I can do to help.
Could you please attach your .config so we can check a few things?
--
Florian
On 4/11/23 09:59, Jim Quinlan wrote:
> v2 -- Changed binding property 'brcm,completion-timeout-msec' to
> 'brcm,completion-timeout-us'. (StefanW for standard suffix).
> -- Warn when clamping timeout value, and include clamped
> region in message. Also add min and max in YAML. (StefanW)
> -- Qualify description of "brcm,completion-timeout-us" so that
> it refers to PCIe transactions. (StefanW)
> -- Remvove mention of Linux specifics in binding description. (StefanW)
> -- s/clkreq#/CLKREQ#/g (Bjorn)
> -- Refactor completion-timeout-us code to compare max and min to
> value given by the property (as opposed to the computed value).
>
> v1 -- The current driver assumes the downstream devices can
> provide CLKREQ# for ASPM. These commits accomodate devices
> w/ or w/o clkreq# and also handle L1SS-capable devices.
>
> -- The Raspian Linux folks have already been using a PCIe RC
> property "brcm,enable-l1ss". These commits use the same
> property, in a backward-compatible manner, and the implementaion
> adds more detail and also automatically identifies devices w/o
> a clkreq# signal, i.e. most devices plugged into an RPi CM4
> IO board.
>
> Jim Quinlan (3):
> PCI: brcmstb: CLKREQ# accomodations of downstream device
> PCI: brcmstb: Set PCIe transaction completion timeout
> blah blah
Tested-by: Florian Fainelli <[email protected]>
On a 7216 system test with:
01:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
and on the CM4 I/O board with:
01:00.0 Network controller: Intel Corporation Wireless 7260 (rev 73)
01:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network
Connection
01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM43224
802.11a/b/g/n (rev 01)
01:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9125 PCIe
SATA 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme
BCM5751 Gigabit Ethernet PCI Express (rev 21)
01:00.0 PCI bridge: PLX Technology, Inc. PEX8112 x1 Lane PCI
Express-to-PCI Bridge (rev aa)
02:00.0 Multiport serial controller: Pepperl+Fuchs RocketPort EXPRESS
8-port w/Octa Cable
01:00.0 Ethernet controller: Qualcomm Atheros AR5008 Wireless Network
Adapter (rev 01)
01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4311
802.11a/b/g (rev 01)
01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM4322
802.11a/b/g/n Wireless LAN Controller (rev 01)
01:00.0 Network controller: Broadcom Inc. and subsidiaries BCM43602
802.11ac Wireless LAN SoC (rev 01)
and finally with a 4 port switch:
-[0000:00]---00.0-[01-07]----00.0-[02-07]--+-01.0-[03]----00.0 Intel
Corporation 82574L Gigabit Network Connection
+-03.0-[04-05]----00.0-[05]----00.0 Pepperl+Fuchs RocketPort EXPRESS
8-port w/Octa Cable
+-05.0-[06]----00.0
Broadcom Inc. and subsidiaries NetXtreme BCM5751 Gigabit Ethernet PCI
Express
\-07.0-[07]----00.0 Intel
Corporation 82574L Gigabit Network Connection
And than I ran out of devices that I could plug, the others were x4, x8
or x16.
Most (all?) would previously fail, so definitively an improvement!
Thanks!
--
--
Florian
Hi Jim,
Jim Quinlan <[email protected]> (2023-04-13):
> Can you provide (a) the full boot log prior to applying the patch
> series and (b) full boot log after applying the series, using an
> IDENTICAL setup. If it fails on both then it has little to do with my
> patch series.
Just to be clear, the issue I reported was with:
- Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
- Raspberry Pi Compute Module 4 IO Board
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
This was my minimal reproducer for the kernel panic at boot-up, which
goes away with either v1 or v2. When I realized I didn't actually check
whether the SupaHub board was working correctly, I plugged 2 devices to
obtain this setup:
- Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
- Raspberry Pi Compute Module 4 IO Board
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
- Kingston DataTraveler G4 32GB on USB-A port #1 of the SupaHub board.
- Logitech K120 keyboard on USB-A port #2 of the SupaHub board.
It turns out that this particular revision of the SupaHub board isn't
supported by xhci_hcd directly (failing to probe with error -110) and
one needs to enable CONFIG_USB_XHCI_PCI_RENESAS=m and also ship its
accompanying firmware (/lib/firmware/renesas_usb_fw.mem). With this
updated kernel config, I'm able to use the keyboard and to read data
from the memory stick without problems (70 MB/s).
> In my last series your testing somehow conflated the effect of an
> unrelated MMC interrupt issue so please be precise.
I wish things would be simpler and didn't involve combinatorics, let
alone other bugs/regressions at times, but I'm really trying my best to
navigate and report issues and test patches when I can spare some time…
In my defence, the very similar board PCE6U1C-R02, VER 006 boots without
a kernel panic, and works fine with xhci_hcd without that extra module
and its firmware. It's based on the exact same chip (Renesas Technology
Corp. uPD720201) though, that's why I didn't realize the need for an
extra module until now for the PCE6U1C-R02, VER 006S one. (Florian,
thanks for mentioning .config earlier…)
To sum up, it seems both (sub)versions of that SupaHub PCE6U1C-R02 board
are usable with this patch series: the kernel panic at boot-up is gone,
and USB devices plugged into those boards are working as expected.
But, speaking of combinatorics, while the patch series helps fix
https://bugzilla.kernel.org/show_bug.cgi?id=217276 on that particular
initial combination of CM4 and SupaHub PCE6U1C-R02, VER 006S, it also
generates a regression if I use a different CM4…
Setup:
- Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
- Raspberry Pi Compute Module 4 IO Board
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
You'll find attached (a) and (b) as requested above, for this setup, as
well as the kernel configuration file:
- (a) = dmesg-unpatched.txt = boots fine (and USB devices work fine if
memory sticks or keyboards are plugged in).
- (b) = dmesg-patched.txt = kernel panic.
- cm4+pcie.config
I'm getting similar results with the other SupaHub board:
- Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
- Raspberry Pi Compute Module 4 IO Board
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006
This is probably best summarized this way:
| unpatched | patched
------------------------------+--------------+--------------
CM4 Rev 1.1 4G/32G + VER 006 | OK | OK
CM4 Rev 1.1 4G/32G + VER 006S | Kernel panic | OK ← Bugzilla entry
CM4 Rev 1.0 8G/32G + VER 006 | OK | Kernel panic
CM4 Rev 1.0 8G/32G + VER 006S | OK | Kernel panic
(And I'm of course using the same settings regarding the serial console,
to get traces when needed.)
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
On Thu, Apr 13, 2023 at 4:06 PM Cyril Brulebois <[email protected]> wrote:
>
> Hi Jim,
>
> Jim Quinlan <[email protected]> (2023-04-13):
> > Can you provide (a) the full boot log prior to applying the patch
> > series and (b) full boot log after applying the series, using an
> > IDENTICAL setup. If it fails on both then it has little to do with my
> > patch series.
>
> Just to be clear, the issue I reported was with:
> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>
> This was my minimal reproducer for the kernel panic at boot-up, which
> goes away with either v1 or v2. When I realized I didn't actually check
> whether the SupaHub board was working correctly, I plugged 2 devices to
> obtain this setup:
> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
> - Kingston DataTraveler G4 32GB on USB-A port #1 of the SupaHub board.
> - Logitech K120 keyboard on USB-A port #2 of the SupaHub board.
>
> It turns out that this particular revision of the SupaHub board isn't
> supported by xhci_hcd directly (failing to probe with error -110) and
> one needs to enable CONFIG_USB_XHCI_PCI_RENESAS=m and also ship its
> accompanying firmware (/lib/firmware/renesas_usb_fw.mem). With this
> updated kernel config, I'm able to use the keyboard and to read data
> from the memory stick without problems (70 MB/s).
>
> > In my last series your testing somehow conflated the effect of an
> > unrelated MMC interrupt issue so please be precise.
>
> I wish things would be simpler and didn't involve combinatorics, let
> alone other bugs/regressions at times, but I'm really trying my best to
> navigate and report issues and test patches when I can spare some time…
Hi Cyril,
I want to encourage you and others doing testing and bug reporting:
everyone wins when a bug or issue is reported, fixed, and tested.
I'm just asking that when you have negative results, that you provide
information on the "before" and "after" test results of
the patch series, and run both on the same test environment.
Regards,
Jim Quinlan
Broadcom STB
>
>
> In my defence, the very similar board PCE6U1C-R02, VER 006 boots without
> a kernel panic, and works fine with xhci_hcd without that extra module
> and its firmware. It's based on the exact same chip (Renesas Technology
> Corp. uPD720201) though, that's why I didn't realize the need for an
> extra module until now for the PCE6U1C-R02, VER 006S one. (Florian,
> thanks for mentioning .config earlier…)
>
> To sum up, it seems both (sub)versions of that SupaHub PCE6U1C-R02 board
> are usable with this patch series: the kernel panic at boot-up is gone,
> and USB devices plugged into those boards are working as expected.
>
>
> But, speaking of combinatorics, while the patch series helps fix
> https://bugzilla.kernel.org/show_bug.cgi?id=217276 on that particular
> initial combination of CM4 and SupaHub PCE6U1C-R02, VER 006S, it also
> generates a regression if I use a different CM4…
>
> Setup:
> - Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>
> You'll find attached (a) and (b) as requested above, for this setup, as
> well as the kernel configuration file:
> - (a) = dmesg-unpatched.txt = boots fine (and USB devices work fine if
> memory sticks or keyboards are plugged in).
> - (b) = dmesg-patched.txt = kernel panic.
> - cm4+pcie.config
>
> I'm getting similar results with the other SupaHub board:
> - Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006
>
>
> This is probably best summarized this way:
>
> | unpatched | patched
> ------------------------------+--------------+--------------
> CM4 Rev 1.1 4G/32G + VER 006 | OK | OK
> CM4 Rev 1.1 4G/32G + VER 006S | Kernel panic | OK ← Bugzilla entry
> CM4 Rev 1.0 8G/32G + VER 006 | OK | Kernel panic
> CM4 Rev 1.0 8G/32G + VER 006S | OK | Kernel panic
>
>
> (And I'm of course using the same settings regarding the serial console,
> to get traces when needed.)
>
>
> Cheers,
> --
> Cyril Brulebois ([email protected]) <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant
On 4/14/2023 5:14 AM, Jim Quinlan wrote:
> On Thu, Apr 13, 2023 at 4:06 PM Cyril Brulebois <[email protected]> wrote:
>>
>> Hi Jim,
>>
>> Jim Quinlan <[email protected]> (2023-04-13):
>>> Can you provide (a) the full boot log prior to applying the patch
>>> series and (b) full boot log after applying the series, using an
>>> IDENTICAL setup. If it fails on both then it has little to do with my
>>> patch series.
>>
>> Just to be clear, the issue I reported was with:
>> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
>> - Raspberry Pi Compute Module 4 IO Board
>> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>>
>> This was my minimal reproducer for the kernel panic at boot-up, which
>> goes away with either v1 or v2. When I realized I didn't actually check
>> whether the SupaHub board was working correctly, I plugged 2 devices to
>> obtain this setup:
>> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
>> - Raspberry Pi Compute Module 4 IO Board
>> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>> - Kingston DataTraveler G4 32GB on USB-A port #1 of the SupaHub board.
>> - Logitech K120 keyboard on USB-A port #2 of the SupaHub board.
>>
>> It turns out that this particular revision of the SupaHub board isn't
>> supported by xhci_hcd directly (failing to probe with error -110) and
>> one needs to enable CONFIG_USB_XHCI_PCI_RENESAS=m and also ship its
>> accompanying firmware (/lib/firmware/renesas_usb_fw.mem). With this
>> updated kernel config, I'm able to use the keyboard and to read data
>> from the memory stick without problems (70 MB/s).
>>
>>> In my last series your testing somehow conflated the effect of an
>>> unrelated MMC interrupt issue so please be precise.
>>
>> I wish things would be simpler and didn't involve combinatorics, let
>> alone other bugs/regressions at times, but I'm really trying my best to
>> navigate and report issues and test patches when I can spare some time…
>
> Hi Cyril,
>
> I want to encourage you and others doing testing and bug reporting:
> everyone wins when a bug or issue is reported, fixed, and tested.
> I'm just asking that when you have negative results, that you provide
> information on the "before" and "after" test results of
> the patch series, and run both on the same test environment.
Cyril, based upon the table and logs you provided whereby you have used
the following:
- Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
- Raspberry Pi Compute Module 4 IO Board
- SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
in the before/unpatched case we have a PCIe link down and in the
after/patched we have a PCIe link up but a kernel panic. Neither are
great nor resulting in a fully functional PCIe device.
Looking at:
https://www.amazon.co.uk/SupaHub-Express-BandWidth-Capable-Expanding/dp/B092ZQWG5B
it would appear that it can accept an external power supply, do you have
one connected to that USB expansion card by any chance? Are you able to
boot the kernel before/after if you disconnect any USB peripheral?
This looks like a broader electrical problem than the scope of this
patch, though it would be neat if we could find a combination that
works. At least with Jim's patch we have a PCIe link with
uni-directional CLKREQ# so we could try a variety of things.
Does that SupaHub board plugged to the CM4 1.0 system work fine in the
Raspberry Pi kernel tree?
--
Florian
On Fri, Apr 14, 2023 at 8:27 AM Florian Fainelli <[email protected]> wrote:
>
>
>
> On 4/14/2023 5:14 AM, Jim Quinlan wrote:
> > On Thu, Apr 13, 2023 at 4:06 PM Cyril Brulebois <[email protected]> wrote:
> >>
> >> Hi Jim,
> >>
> >> Jim Quinlan <[email protected]> (2023-04-13):
> >>> Can you provide (a) the full boot log prior to applying the patch
> >>> series and (b) full boot log after applying the series, using an
> >>> IDENTICAL setup. If it fails on both then it has little to do with my
> >>> patch series.
> >>
> >> Just to be clear, the issue I reported was with:
> >> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
> >> - Raspberry Pi Compute Module 4 IO Board
> >> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
> >>
> >> This was my minimal reproducer for the kernel panic at boot-up, which
> >> goes away with either v1 or v2. When I realized I didn't actually check
> >> whether the SupaHub board was working correctly, I plugged 2 devices to
> >> obtain this setup:
> >> - Raspberry Pi Compute Module 4 (Rev 1.1, 4G RAM, 32G storage)
> >> - Raspberry Pi Compute Module 4 IO Board
> >> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
> >> - Kingston DataTraveler G4 32GB on USB-A port #1 of the SupaHub board.
> >> - Logitech K120 keyboard on USB-A port #2 of the SupaHub board.
> >>
> >> It turns out that this particular revision of the SupaHub board isn't
> >> supported by xhci_hcd directly (failing to probe with error -110) and
> >> one needs to enable CONFIG_USB_XHCI_PCI_RENESAS=m and also ship its
> >> accompanying firmware (/lib/firmware/renesas_usb_fw.mem). With this
> >> updated kernel config, I'm able to use the keyboard and to read data
> >> from the memory stick without problems (70 MB/s).
> >>
> >>> In my last series your testing somehow conflated the effect of an
> >>> unrelated MMC interrupt issue so please be precise.
> >>
> >> I wish things would be simpler and didn't involve combinatorics, let
> >> alone other bugs/regressions at times, but I'm really trying my best to
> >> navigate and report issues and test patches when I can spare some time…
> >
> > Hi Cyril,
> >
> > I want to encourage you and others doing testing and bug reporting:
> > everyone wins when a bug or issue is reported, fixed, and tested.
> > I'm just asking that when you have negative results, that you provide
> > information on the "before" and "after" test results of
> > the patch series, and run both on the same test environment.
>
> Cyril, based upon the table and logs you provided whereby you have used
> the following:
>
> - Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>
> in the before/unpatched case we have a PCIe link down and in the
> after/patched we have a PCIe link up but a kernel panic. Neither are
> great nor resulting in a fully functional PCIe device.
The data do not make sense to me. My new code is executed AFTER pcie
link up/down determination. Both test results should have the same
link status -- either "link up" or "link down".
If the system is wishy-washy, i.e. it has link-up in 5/10 boots, then
we need to repeat the experiment to compare the "link up" cases only.
Or discount the test completely
If the system is not wishy-washy, then something has been changed
between the "before" and "after" tests.
>
> Looking at:
>
> https://www.amazon.co.uk/SupaHub-Express-BandWidth-Capable-Expanding/dp/B092ZQWG5B
>
> it would appear that it can accept an external power supply, do you have
> one connected to that USB expansion card by any chance? Are you able to
> boot the kernel before/after if you disconnect any USB peripheral?
>
> This looks like a broader electrical problem than the scope of this
> patch, though it would be neat if we could find a combination that
> works. At least with Jim's patch we have a PCIe link with
> uni-directional CLKREQ# so we could try a variety of things.
>
> Does that SupaHub board plugged to the CM4 1.0 system work fine in the
> Raspberry Pi kernel tree?
> --
> Florian
Florian Fainelli <[email protected]> (2023-04-14):
> Cyril, based upon the table and logs you provided whereby you have used the
> following:
>
> - Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
> - Raspberry Pi Compute Module 4 IO Board
> - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
>
> in the before/unpatched case we have a PCIe link down and in the
> after/patched we have a PCIe link up but a kernel panic. Neither are great
> nor resulting in a fully functional PCIe device.
Based on Jim's feedback, I'm attaching a new dmesg for the aforementioned
setup, with an unpatched kernel (dmesg-unpatched-pcie-link-up.txt).
I fear that initially I might have not waited enough before power off and
power on when replugging the SupaHub adapter, and I've only seen the PCIe
link down a few times (now that I'm actively paying attention to that
part). I'm indeed seeing PCIe link up consistently (100%) when using the
following method:
for i in $(seq 1 20); do
# power on, let the system boot up and settle:
sispmctl -t 4; sleep 2m
ssh cm4 sh -c "dmesg > dmesg-$i.txt; poweroff"
# let the system power down, and power off:
sleep 30; sispmctl -t 4
# wait before power cycling:
sleep 10
done
> Looking at:
> https://www.amazon.co.uk/SupaHub-Express-BandWidth-Capable-Expanding/dp/B092ZQWG5B
>
> it would appear that it can accept an external power supply, do you have one
> connected to that USB expansion card by any chance?
With the patched kernel, I have tried several things (leaving the regular
12V adapter plugged in all the time):
- No external power supply (that's what I've been using in all previous
attempts).
- Using a 5V PSU with 2 pins (ground and 5V) plugged on the Ext PSU
4-pin header on the IO Board.
- Using a 5V PSU with a SATA connector plugged directly onto the SupaHub
adapter.
I'm getting the same trace in all cases.
> Are you able to boot the kernel before/after if you disconnect any USB
> peripheral?
The trace is obtained without any USB peripheral (on the extension card or
on the onboard USB slots). Besides the SupaHub adapter on the PCIe slot, I
only have Ethernet and HDMI plugged in (I'm getting traces via serial
console, and operating the system over SSH when it boots fine).
As soon as I unplug the SupaHub board itself, the system boots fine.
> Does that SupaHub board plugged to the CM4 1.0 system work fine in the
> Raspberry Pi kernel tree?
With the Raspberry Pi OS (64-bit) > Raspberry Pi OS Lite image
(2023-02-21-raspios-bullseye-arm64-lite.img.xz), the system at least
boots fine; I haven't tried adding devices to the mix just yet, trying
to stick with that particular setup.
A full dmesg is attached (dmesg-raspios.txt).
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
It'd be nice to mention the property names (maybe omit the "brcm,"
prefix if that helps) in the commit log so "git log --oneline" is more
useful:
959e000f0463 ("dt-bindings: PCI: brcmstb: Add two optional props")
ea372f45cfff ("dt-bindings: PCI: Add bindings for Brcmstb EP voltage regulators")
504253e44a9d ("dt-bindings: PCI: Correct brcmstb interrupts, interrupt-map.")
145790e55d82 ("dt-bindings: PCI: Add compatible string for Brcmstb 74[23]5 MIPs SOCs")
5e8a7d26d935 ("dt-bindings: PCI: brcmstb: compatible is required")
f435ce7ebf8c ("dt-bindings: PCI: brcmstb: add BCM4908 binding")
On Tue, Apr 11, 2023 at 12:59:16PM -0400, Jim Quinlan wrote:
> Regarding "brcm,enable-l1ss":
>
> The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> requires the driver probe() to deliberately place the HW one of three
> CLKREQ# modes:
>
> (a) CLKREQ# driven by the RC unconditionally
> (b) CLKREQ# driven by the EP for ASPM L0s, L1
> (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
>
> The HW+driver can tell the difference between downstream devices that
> need (a) and (b), but does not know when to configure (c). Further, the
> HW may cause a CPU abort on boot if guesses wrong regarding the need for
> (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
> that (c) is desired. Setting this property only makes sense when the
> downstream device is L1SS-capable and the OS is configured to activate
> this mode (e.g. policy==superpowersave).
>
> This property is already present in the Raspian version of Linux, but the
> upstream driver implementaion that will follow adds more details and
> discerns between (a) and (b).
>
> Regarding "brcm,completion-timeout-us"
>
> Our HW will cause a CPU abort if the L1SS exit time is longer than the
> PCIe transaction completion abort timeout. We've been asked to make this
> configurable, so we are introducing "brcm,completion-timeout-us".
Completion Timeout is a generic PCIe concept. Do we want a generic
(non-brcm) name that would be documented elsewhere? Rob?
> Signed-off-by: Jim Quinlan <[email protected]>
> ---
> .../devicetree/bindings/pci/brcm,stb-pcie.yaml | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> index 7e15aae7d69e..f7fc2f6561bb 100644
> --- a/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> +++ b/Documentation/devicetree/bindings/pci/brcm,stb-pcie.yaml
> @@ -64,6 +64,22 @@ properties:
>
> aspm-no-l0s: true
>
> + brcm,enable-l1ss:
> + description: Indicates that PCIe L1SS power savings
> + are desired, the downstream device is L1SS-capable, and the
> + OS has been configured to enable this mode. Note that when
> + in this mode, this particular HW may not meet the requirement
> + that requires CLKREQ# assertion to clock active to be
> + within 400ns.
Maybe a pointer to the source of the 400ns requirement?
"requirement that requires" is a little redundant, maybe "... may not
meet the requirement that Refclk be valid within 400ns of CLKREQ#
assertion"?
(I don't actually know whether this refers to Refclk or if that would
be a true statement; this is just a possible sentence structure.)
This subject line no verb. Can you add a leading verb to suggest what
this patch does?
s/accomodations/accommodations/
On Tue, Apr 11, 2023 at 12:59:17PM -0400, Jim Quinlan wrote:
> The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> deliberately set by the probe() into one of three mutually exclusive modes:
>
> (a) No CLKREQ# expected or required, refclk is always available.
> (b) CLKREQ# is expected to be driven by downstream device when needed.
> (c) Bidirectional CLKREQ# for L1SS capable devices.
>
> Previously, only (b) was supported by the driver, as almost all STB/CM
> boards operate in this mode. But now there is interest in activating L1SS
> power savings from STB/CM customers, and also interest in accomodating mode
> (a) for designs such as the RPi CM4 with IO board.
accommodating
> The HW+driver is able to tell us when mode (a) mode is needed. But there
> is no easy way to tell if L1SS mode should be configured. In certain
> situations, getting this wrong may cause a panic during boot time. So we
> rely on the DT prop "brcm,enable-l1ss" to tell us when mode (c) is desired.
> Using this mode only makes sense when the downstream device is L1SS-capable
> and the OS has been configured to activate L1SS
> (e.g. policy==powersupersave).
I'm really concerned about the user experience here. I assume users
do not want to edit the DT based on what device they plug in. They
shouldn't need to (and probably won't) know whether the device
supports L1SS.
I hate kernel/module parameters, but I think even that would be better
then having to edit the DT.
There's obviously a period of time when L1SS is supported but not yet
enabled, so I'm *guessing* the "OS has been configured to activate
L1SS" is not actually a requirement, and choosing (c) really just
opens the possibility that L1SS can be used?
Would be nice to have a hint (maybe a line or two of the panic
message) to help users find the fix for a problem they're seeing.
Obviously the ideal would be if we could use (c) in all cases, so I
assume that's where a panic might happen. What situation would that
be? An endpoint that doesn't support L1SS? One that supports L1SS
but it's not enabled? Maybe if L1SS isn't configured correctly, e.g.,
LTR values programmed wrong?
Bjorn
On 4/14/23 13:27, Bjorn Helgaas wrote:
> This subject line no verb. Can you add a leading verb to suggest what
> this patch does?
>
> s/accomodations/accommodations/
>
> On Tue, Apr 11, 2023 at 12:59:17PM -0400, Jim Quinlan wrote:
>> The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
>> deliberately set by the probe() into one of three mutually exclusive modes:
>>
>> (a) No CLKREQ# expected or required, refclk is always available.
>> (b) CLKREQ# is expected to be driven by downstream device when needed.
>> (c) Bidirectional CLKREQ# for L1SS capable devices.
>>
>> Previously, only (b) was supported by the driver, as almost all STB/CM
>> boards operate in this mode. But now there is interest in activating L1SS
>> power savings from STB/CM customers, and also interest in accomodating mode
>> (a) for designs such as the RPi CM4 with IO board.
>
> accommodating
>
>> The HW+driver is able to tell us when mode (a) mode is needed. But there
>> is no easy way to tell if L1SS mode should be configured. In certain
>> situations, getting this wrong may cause a panic during boot time. So we
>> rely on the DT prop "brcm,enable-l1ss" to tell us when mode (c) is desired.
>> Using this mode only makes sense when the downstream device is L1SS-capable
>> and the OS has been configured to activate L1SS
>> (e.g. policy==powersupersave).
>
> I'm really concerned about the user experience here. I assume users
> do not want to edit the DT based on what device they plug in. They
> shouldn't need to (and probably won't) know whether the device
> supports L1SS.
>
> I hate kernel/module parameters, but I think even that would be better
> then having to edit the DT.
The problem I see with kernel/module parameters is that it is really
hard to differentiate whether they should be applied across all
instances of the device/drivers or just for one in particular.
The Raspberry Pi 4 is a single pcie-brcmstb instance, but we have other
systems supported by that driver on Set-top-box and Cable Modem chips
for instance where we may have different types of end-points being
connected, some of which could be Multi-chip-module (MCM) where
everything is known ahead of time, and sometimes cards that are plugged
to full-sized PCIe or mini-PCIe connectors, where some amount of runtime
discoverability is involved.
Without inventing some custom modular parameter syntax, it may not work
that well. The Device Tree properties at least easily allow for
per-instance configuration.
>
> There's obviously a period of time when L1SS is supported but not yet
> enabled, so I'm *guessing* the "OS has been configured to activate
> L1SS" is not actually a requirement, and choosing (c) really just
> opens the possibility that L1SS can be used?
>
> Would be nice to have a hint (maybe a line or two of the panic
> message) to help users find the fix for a problem they're seeing.
>
> Obviously the ideal would be if we could use (c) in all cases, so I
> assume that's where a panic might happen. What situation would that
> be? An endpoint that doesn't support L1SS? One that supports L1SS
> but it's not enabled? Maybe if L1SS isn't configured correctly, e.g.,
> LTR values programmed wrong?
>
> Bjorn
--
Florian
On Fri, Apr 14, 2023 at 4:27 PM Bjorn Helgaas <[email protected]> wrote:
>
> This subject line no verb. Can you add a leading verb to suggest what
> this patch does?
>
> s/accomodations/accommodations/
>
> On Tue, Apr 11, 2023 at 12:59:17PM -0400, Jim Quinlan wrote:
> > The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> > deliberately set by the probe() into one of three mutually exclusive modes:
> >
> > (a) No CLKREQ# expected or required, refclk is always available.
> > (b) CLKREQ# is expected to be driven by downstream device when needed.
> > (c) Bidirectional CLKREQ# for L1SS capable devices.
> >
> > Previously, only (b) was supported by the driver, as almost all STB/CM
> > boards operate in this mode. But now there is interest in activating L1SS
> > power savings from STB/CM customers, and also interest in accomodating mode
> > (a) for designs such as the RPi CM4 with IO board.
>
> accommodating
>
> > The HW+driver is able to tell us when mode (a) mode is needed. But there
> > is no easy way to tell if L1SS mode should be configured. In certain
> > situations, getting this wrong may cause a panic during boot time. So we
> > rely on the DT prop "brcm,enable-l1ss" to tell us when mode (c) is desired.
> > Using this mode only makes sense when the downstream device is L1SS-capable
> > and the OS has been configured to activate L1SS
> > (e.g. policy==powersupersave).
>
> I'm really concerned about the user experience here. I assume users
> do not want to edit the DT based on what device they plug in. They
> shouldn't need to (and probably won't) know whether the device
> supports L1SS.
>
> I hate kernel/module parameters, but I think even that would be better
> then having to edit the DT.
>
> There's obviously a period of time when L1SS is supported but not yet
> enabled, so I'm *guessing* the "OS has been configured to activate
> L1SS" is not actually a requirement, and choosing (c) really just
> opens the possibility that L1SS can be used?
Yes. Before this patch series we had two panic scenarios:
(a) Endpoint devices with no CLKREQ# connection
(b) Endpoints that are L1SS-capable
Even without the "brcm,enable-l1ss" property present, both (a) and
(b) should be eliminated.
The reason (b) is eliminated is because the RC driver now unadvertises
RC L1SS by default; subsequently, Linux does
not turn it on. So the default setting should be fine for all devices.
For those folks who have L1SS capable devices and desire L1SS power
savings, they can add
the brcm,enable-l1ss property. But everyone should have functionality
w/o doing anything.
As I am typing this I realize that my comments and dev_info()s are not
aligned with what I am saying so I will change them in V3. Sorry
about the confusion.
>
> Would be nice to have a hint (maybe a line or two of the panic
> message) to help users find the fix for a problem they're seeing.SS
>
> Obviously the ideal would be if we could use (c) in all cases, so I
> assume that's where a panic might happen. What situation would that
> be? An endpoint that doesn't support L1SS? One that supports L1SS
> but it's not enabled? Maybe if L1SS isn't configured correctly, e.g.,
> LTR values programmed wrong?
Let me test everything on Monday and get back to you before I make any
statements.
Regards,
Jim Quinilan
Broadcom STB
>
> Bjorn
On Fri, Apr 14, 2023 at 01:33:29PM -0700, Florian Fainelli wrote:
> On 4/14/23 13:27, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 12:59:17PM -0400, Jim Quinlan wrote:
> ...
> > > The HW+driver is able to tell us when mode (a) mode is needed. But there
> > > is no easy way to tell if L1SS mode should be configured. In certain
> > > situations, getting this wrong may cause a panic during boot time. So we
> > > rely on the DT prop "brcm,enable-l1ss" to tell us when mode (c) is desired.
> > > Using this mode only makes sense when the downstream device is L1SS-capable
> > > and the OS has been configured to activate L1SS
> > > (e.g. policy==powersupersave).
> >
> > I'm really concerned about the user experience here. I assume users
> > do not want to edit the DT based on what device they plug in. They
> > shouldn't need to (and probably won't) know whether the device
> > supports L1SS.
> >
> > I hate kernel/module parameters, but I think even that would be better
> > then having to edit the DT.
>
> The problem I see with kernel/module parameters is that it is really hard to
> differentiate whether they should be applied across all instances of the
> device/drivers or just for one in particular.
>
> The Raspberry Pi 4 is a single pcie-brcmstb instance, but we have other
> systems supported by that driver on Set-top-box and Cable Modem chips for
> instance where we may have different types of end-points being connected,
> some of which could be Multi-chip-module (MCM) where everything is known
> ahead of time, and sometimes cards that are plugged to full-sized PCIe or
> mini-PCIe connectors, where some amount of runtime discoverability is
> involved.
>
> Without inventing some custom modular parameter syntax, it may not work that
> well. The Device Tree properties at least easily allow for per-instance
> configuration.
We do have this already (from
Documentation/admin-guide/kernel-parameters.txt):
pci=option[,option...] [PCI] various PCI subsystem options.
Some options herein operate on a specific device
or a set of devices (<pci_dev>). These are
specified in one of the following formats:
[<domain>:]<bus>:<dev>.<func>[/<dev>.<func>]*
pci:<vendor>:<device>[:<subvendor>:<subdevice>]
Note: the first format specifies a PCI
bus/device/function address which may change
if new hardware is inserted, if motherboard
firmware changes, or due to changes caused
by other kernel parameters. If the
domain is left unspecified, it is
taken to be zero. Optionally, a path
to a device through multiple device/function
addresses can be specified after the base
address (this is more robust against
renumbering issues). The second format
selects devices using IDs from the
configuration space which may match multiple
devices in the system.
On Wed, Apr 12, 2023 at 09:12:07AM -0700, Florian Fainelli wrote:
> On 4/12/23 08:37, Rob Herring wrote:
> > On Wed, Apr 12, 2023 at 10:14:46AM -0400, Jim Quinlan wrote:
> > > On Wed, Apr 12, 2023 at 7:56 AM Krzysztof Kozlowski
> > > <[email protected]> wrote:
> > > >
> > > > On 12/04/2023 13:49, Florian Fainelli wrote:
> > > > >
> > > > >
> > > > > On 4/12/2023 1:09 AM, Krzysztof Kozlowski wrote:
> > > > > > On 11/04/2023 18:59, Jim Quinlan wrote:
> > > > > > > Regarding "brcm,enable-l1ss":
> > > > > > >
> > > > > > > The Broadcom STB/CM PCIe HW -- a core that is also used by RPi SOCs --
> > > > > > > requires the driver probe() to deliberately place the HW one of three
> > > > > > > CLKREQ# modes:
> > > > > > >
> > > > > > > (a) CLKREQ# driven by the RC unconditionally
> > > > > > > (b) CLKREQ# driven by the EP for ASPM L0s, L1
> > > > > > > (c) Bidirectional CLKREQ#, as used for L1 Substates (L1SS).
> > > > > > >
> > > > > > > The HW+driver can tell the difference between downstream devices that
> > > > > > > need (a) and (b), but does not know when to configure (c). Further, the
> > > > > > > HW may cause a CPU abort on boot if guesses wrong regarding the need for
> > > > > > > (c). So we introduce the boolean "brcm,enable-l1ss" property to indicate
> > > > > > > that (c) is desired. Setting this property only makes sense when the
> > > > > > > downstream device is L1SS-capable and the OS is configured to activate
> > > > > > > this mode (e.g. policy==superpowersave).
> > > > > > >
> > > > > > > This property is already present in the Raspian version of Linux, but the
> > > > > > > upstream driver implementaion that will follow adds more details and
> > > > > >
> > > > > > typo, implementation
> > > > > >
> > > > > > > discerns between (a) and (b).
> > > > > > >
> > > > > > > Regarding "brcm,completion-timeout-us"
> > > > > > >
> > > > > > > Our HW will cause a CPU abort if the L1SS exit time is longer than the
> > > > > > > PCIe transaction completion abort timeout. We've been asked to make this
> > > > > > > configurable, so we are introducing "brcm,completion-timeout-us".
> > > > > > >
> > > > > > > Signed-off-by: Jim Quinlan <[email protected]>
> > > > > >
> > > > > > What happened here? Where is the changelog?
> > > > >
> > > > > It is in the cover letter:
> > > > >
> > > > > https://lore.kernel.org/all/[email protected]/
> > > > >
> > > > > but it does not look like the cover letter was copied to you or Rob.
> > > >
> > > > As you said, I did not get it.
> > >
> > > Yes, sorry about that; I use a wrapper over the "cocci_cc" script and
> > > I need to modify one or both scripts to send the cover to the
> > > superset of recipients in the constituent commits.
> >
> > Try out 'b4'. It's much easier.
> >
> > In any case, I don't read cover letters. Changes to a patch belong with
> > the patch.
>
> This is not what most other maintainers do, and there does not appear to be
> a general consensus amongst maintainers that the changes belong in the
> individual patches, or in the cover letter.
Well, I stole that phrase from someone else (gregkh).
> Some trees like the networking
> tree do merge commits of patch sets where the cover letter is used as part
> of the merge commit message. Other maintainers don't, and some want the
> change log after the '---' and some do not.
I'm not aware of anyone except for DRM wanting the changelog in the
final commits, but that's really a different issue.
I'm pretty sure no one will complain about a changelog in the patches. I
guess you just have to duplicate it if you think it should be in both.
b4 could be taught to do that I suppose. IMO, the cover letter should
have a higher level changelog than the individual patches.
Rob
On Fri, Apr 14, 2023 at 12:19 PM Cyril Brulebois <[email protected]> wrote:
>
> Florian Fainelli <[email protected]> (2023-04-14):
> > Cyril, based upon the table and logs you provided whereby you have used the
> > following:
> >
> > - Raspberry Pi Compute Module 4 (Rev 1.0, 8G RAM, 32G storage)
> > - Raspberry Pi Compute Module 4 IO Board
> > - SupaHub PCIe-to-multiple-USB adapter, reference PCE6U1C-R02, VER 006S
> >
> > in the before/unpatched case we have a PCIe link down and in the
> > after/patched we have a PCIe link up but a kernel panic. Neither are great
> > nor resulting in a fully functional PCIe device.
>
> Based on Jim's feedback, I'm attaching a new dmesg for the aforementioned
> setup, with an unpatched kernel (dmesg-unpatched-pcie-link-up.txt).
Hello Cyril,
I'm still seeing things in the logs that do not make sense to me.
First, the "unpatched" version logs -- including the Raspian case --
all have the following lines:
[ 0.000000] Movable zone start for each node
/* ... */
[ 0.000000] pcpu-alloc: s88232 r8192 d30552 u126976 alloc=31*4096
[ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3
The above lines are also in my logs. But they are missing from your
"after patch" logs -- what did you change to have these lines
disappear on the patched logs?
Second, you say that you used a different "CM4" from the one used in
the Bugzilla report -- what do you mean by that? Are you referring
to the CM4 module proper, whose only change was going from 4GB to 8GB,
or the IO board being used? My testing is done with a standard RPi
CM4 and standard RPi IO board. Before this patch series, the only way
this standard configuration can work for most hobbyist PCI cards (i.e.
floating CLKREQ# pin) is by using Raspian AND adding
"brcm,enable-l1ss" to the DT node.
I'm guessing that you are using the configuration that you described
to me in a personal email: "[the] chip is embedded directly on the
modified PCB, as opposed to plugged into the PCIe slot on the official
CM4 IO Board". If true, you are testing on a configuration that (a)
is unique to you and your group and (b) must be doing something with
the CLKREQ# signal so that your "before" case does not panic. Is this
the case?
Finally, and this is a big one, is the fact that in both of the
"before" and "after" cases the value written to the internal Brcm
CLKREQ# register is the same.
This is evident by this line in the "after" patch: "brcm-pcie
fd500000.pcie: uni-dir CLKREQ# for ASPM". This mode is the only mode
supported by the "before"
case. Now there are some other things going on in the patch series
-- reading two registers and extending the timeout value of a
completion abort, but I'm having
a hard time imagining how they could cause a panic.
Regards,
Jim
PS Can you please include in your logs the output of "sudo lspci
-vvv" for those cases that boot to prompt?
>
> I fear that initially I might have not waited enough before power off and
> power on when replugging the SupaHub adapter, and I've only seen the PCIe
> link down a few times (now that I'm actively paying attention to that
> part). I'm indeed seeing PCIe link up consistently (100%) when using the
> following method:
>
> for i in $(seq 1 20); do
> # power on, let the system boot up and settle:
> sispmctl -t 4; sleep 2m
> ssh cm4 sh -c "dmesg > dmesg-$i.txt; poweroff"
> # let the system power down, and power off:
> sleep 30; sispmctl -t 4
> # wait before power cycling:
> sleep 10
> done
>
> > Looking at:
> > https://www.amazon.co.uk/SupaHub-Express-BandWidth-Capable-Expanding/dp/B092ZQWG5B
> >
> > it would appear that it can accept an external power supply, do you have one
> > connected to that USB expansion card by any chance?
>
> With the patched kernel, I have tried several things (leaving the regular
> 12V adapter plugged in all the time):
> - No external power supply (that's what I've been using in all previous
> attempts).
> - Using a 5V PSU with 2 pins (ground and 5V) plugged on the Ext PSU
> 4-pin header on the IO Board.
> - Using a 5V PSU with a SATA connector plugged directly onto the SupaHub
> adapter.
>
> I'm getting the same trace in all cases.
>
> > Are you able to boot the kernel before/after if you disconnect any USB
> > peripheral?
>
> The trace is obtained without any USB peripheral (on the extension card or
> on the onboard USB slots). Besides the SupaHub adapter on the PCIe slot, I
> only have Ethernet and HDMI plugged in (I'm getting traces via serial
> console, and operating the system over SSH when it boots fine).
>
> As soon as I unplug the SupaHub board itself, the system boots fine.
>
> > Does that SupaHub board plugged to the CM4 1.0 system work fine in the
> > Raspberry Pi kernel tree?
>
> With the Raspberry Pi OS (64-bit) > Raspberry Pi OS Lite image
> (2023-02-21-raspios-bullseye-arm64-lite.img.xz), the system at least
> boots fine; I haven't tried adding devices to the mix just yet, trying
> to stick with that particular setup.
>
> A full dmesg is attached (dmesg-raspios.txt).
>
>
> Cheers,
> --
> Cyril Brulebois ([email protected]) <https://debamax.com/>
> D-I release manager -- Release team member -- Freelance Consultant
Hi Jim,
It might take a few days before getting back to you regarding the
various questions you asked. To be on the safe side, I'll probably run
a cold boot for each setup a number of times (e.g. 10), so that any
possible outlier can be spotted/rejected. And maybe share results off
list once we have a better understanding of what's going on. Does that
make sense to you?
I'll cover a particular topic right away though.
Jim Quinlan <[email protected]> (2023-04-19):
> Second, you say that you used a different "CM4" from the one used in
> the Bugzilla report -- what do you mean by that? Are you referring
> to the CM4 module proper, whose only change was going from 4GB to 8GB,
> or the IO board being used? My testing is done with a standard RPi
> CM4 and standard RPi IO board. Before this patch series, the only way
> this standard configuration can work for most hobbyist PCI cards (i.e.
> floating CLKREQ# pin) is by using Raspian AND adding
> "brcm,enable-l1ss" to the DT node.
Regarding the IO Board, I'm using the official Compute Module 4 IO
Board: https://www.raspberrypi.com/products/compute-module-4-io-board/
I've been using the very same IO Board for all my testing, and what I'm
changing is the standard RPi CM4 plugged on it.
> I'm guessing that you are using the configuration that you described
> to me in a personal email: "[the] chip is embedded directly on the
> modified PCB, as opposed to plugged into the PCIe slot on the official
> CM4 IO Board". If true, you are testing on a configuration that (a)
> is unique to you and your group and (b) must be doing something with
> the CLKREQ# signal so that your "before" case does not panic. Is this
> the case?
That's definitely not the case.
True, as mentioned in a personal mail, we've seen problems with a custom
PCB, derived from the CM4 IO Board design, but of course there could be
some faulty design at work there… So we've first researched whether the
same problem could be produced with consumer grade products, and once
we've verified that, I opened #217276 on Bugzilla.
Since Florian's testing seems overwhelmingly positive, and since I'm
seeing definitive improvement with at least one CM4, maybe it would make
sense not to block the patch series on the kernel panic I'm seeing with
the other CM4, and track that particular issue via a separate bug?
Cheers,
--
Cyril Brulebois ([email protected]) <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
April 18, 2023 2:35 PM, "Rob Herring" <[email protected]> wrote:
>> Some trees like the networking
>> tree do merge commits of patch sets where the cover letter is used as part
>> of the merge commit message. Other maintainers don't, and some want the
>> change log after the '---' and some do not.
>
> I'm not aware of anyone except for DRM wanting the changelog in the
> final commits, but that's really a different issue.
I don't think anyone wants changelogs in actual final commits, they usually go under "---" in patch submissions.
> I'm pretty sure no one will complain about a changelog in the patches. I
> guess you just have to duplicate it if you think it should be in both.
> b4 could be taught to do that I suppose. IMO, the cover letter should
> have a higher level changelog than the individual patches.
b4 doesn't really need to manage per-patch changelogs -- they should just go under "---" in the commit. When you send the series either via "b4 send" or via git-send-email, the changelogs will be properly included in the message, but they won't make it into the tree after the maintainer runs "git am", because git will drop anything under the first "---" in the commit message.
The cover letter changelog is supposed to be higher level than individual patch changelogs, so I don't think it makes sense for b4 to collect them from individual patches.
-K