2024-01-09 15:21:14

by Lucas Karpinski

Subject: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

pcie2a and pcie3a both cause interrupt storms to occur. However, when
both are enabled simultaneously, the two combined interrupt storms will
lead to rcu stalls. Red Hat is the only company still using this board
and since we still need pcie3a, just disable pcie2a.

Signed-off-by: Lucas Karpinski <[email protected]>
---
v2:
- don't remove the entire pcie2a node, just set status to disabled.
- update commit message.

arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
index b04f72ec097c..177b9dad6ff7 100644
--- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
+++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
@@ -376,14 +376,14 @@ &pcie2a {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pcie2a_default>;

-	status = "okay";
+	status = "disabled";
 };

 &pcie2a_phy {
 	vdda-phy-supply = <&vreg_l11a>;
 	vdda-pll-supply = <&vreg_l3a>;

-	status = "okay";
+	status = "disabled";
 };

 &pcie3a {
--
2.43.0



2024-01-11 14:02:59

by Brian Masney

Subject: Re: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>
> Signed-off-by: Lucas Karpinski <[email protected]>

Reviewed-by: Brian Masney <[email protected]>

To elaborate further: with both pcie2a and pcie3a enabled, the board hits
RCU stalls and fails to boot. We have the latest firmware that we've been
able to get from QC. Disabling one of the PCIe nodes works around the
boot issue. There's nothing interesting on pcie2a on the development
board, and pcie3a stays enabled because it has 10 Gb Ethernet that works
upstream.

The interrupt storm on pcie3a can still occur on this platform; however,
that's a separate issue.

Brian


2024-01-11 15:16:35

by Andrew Halaney

Subject: Re: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

On Thu, Jan 11, 2024 at 09:02:41AM -0500, Brian Masney wrote:
> On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> > pcie2a and pcie3a both cause interrupt storms to occur. However, when
> > both are enabled simultaneously, the two combined interrupt storms will
> > lead to rcu stalls. Red Hat is the only company still using this board
> > and since we still need pcie3a, just disable pcie2a.
> >
> > Signed-off-by: Lucas Karpinski <[email protected]>
>
> Reviewed-by: Brian Masney <[email protected]>
>
> To elaborate further: with both pcie2a and pcie3a enabled, the board hits
> RCU stalls and fails to boot. We have the latest firmware that we've been
> able to get from QC. Disabling one of the PCIe nodes works around the
> boot issue. There's nothing interesting on pcie2a on the development
> board, and pcie3a stays enabled because it has 10 Gb Ethernet that works
> upstream.
>
> The interrupt storm on pcie3a can still occur on this platform; however,
> that's a separate issue.

Here is a related workaround for that, in case anyone is interested in
the paper trail:

https://lore.kernel.org/all/[email protected]/


2024-01-30 21:29:26

by Bjorn Andersson

Subject: Re: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>

Why are there interrupt storms? What interrupt(s) is(are) involved?

Do you consider this a temporary fix?

Are you okay with pcie3a misbehaving?

Regards,
Bjorn


2024-01-30 22:16:26

by Lucas Karpinski

Subject: Re: Re: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node

> Why are there interrupt storms? What interrupt(s) is(are) involved?
As described in the link Andrew posted earlier, the DesignWare PCIe
driver uses a chained interrupt to demultiplex the downstream MSI
interrupts. That meant we couldn't identify the MSI interrupt source, so
it is not clear what is causing the hardware to misbehave.
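
For anyone not familiar with chained demultiplexing, here is a rough
user-space model of the idea. It is only a sketch, not the DesignWare
driver code: struct msi_ctrl, msi_raise(), msi_chained_handler() and
NUM_VECTORS are all hypothetical names made up for illustration. One
shared parent line fires, and the handler reads a status word and
dispatches every vector whose bit is set.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VECTORS 32	/* hypothetical: one status bit per MSI vector */

/* Hypothetical stand-in for a controller's MSI status register. */
struct msi_ctrl {
	uint32_t status;			/* bit n set => vector n pending */
	unsigned long hits[NUM_VECTORS];	/* per-vector dispatch count */
	unsigned long parent_hits;		/* times the parent line fired */
};

/* Model a device raising MSI vector 'vec'. */
static void msi_raise(struct msi_ctrl *c, unsigned int vec)
{
	assert(vec < NUM_VECTORS);
	c->status |= UINT32_C(1) << vec;
}

/*
 * Model of a chained handler: called once per parent interrupt, it
 * snapshots and clears the status word, then dispatches each pending
 * vector. Returns the number of vectors dispatched; 0 means the parent
 * fired without any identifiable source.
 */
static unsigned int msi_chained_handler(struct msi_ctrl *c)
{
	uint32_t pending = c->status;
	unsigned int handled = 0;

	c->parent_hits++;
	c->status = 0;	/* acknowledge what we are about to dispatch */

	for (unsigned int vec = 0; vec < NUM_VECTORS; vec++) {
		if (pending & (UINT32_C(1) << vec)) {
			c->hits[vec]++;
			handled++;
		}
	}
	return handled;
}
```

In this model, raising vectors 3 and 7 and then calling
msi_chained_handler() returns 2 and bumps hits[3] and hits[7]. Calling it
again with nothing pending returns 0 while parent_hits still goes up,
which is one way a flood on the parent line could show up without a
source to blame. That is an assumption for illustration, not a diagnosis
of this board.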

> Do you consider this a temporary fix?
This will likely be a permanent fix. Qualcomm disabled pcie2a in their
downstream kernel as well, quite some time ago, so this may never
actually be fixed.

> Are you okay with pcie3a misbehaving?
Yes, it would be great if the underlying issue were addressed, but at
least the boards are usable with just pcie3a enabled and the NIC will be
available.

Lucas




2024-02-20 18:03:52

by Bjorn Andersson

Subject: Re: [PATCH v2] arm64: dts: qcom: sa8540p-ride: disable pcie2a node


On Tue, 09 Jan 2024 10:20:50 -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>
>

Applied, thanks!

[1/1] arm64: dts: qcom: sa8540p-ride: disable pcie2a node
commit: 07bbe3fd0704ab47d365756a31f45a86e3b45c0a

Best regards,
--
Bjorn Andersson <[email protected]>