pcie2a and pcie3a both cause interrupt storms to occur. However, when
both are enabled simultaneously, the two combined interrupt storms will
lead to rcu stalls. Red Hat is the only company still using this board
and since we still need pcie3a, just disable pcie2a.
Signed-off-by: Lucas Karpinski <[email protected]>
---
v2:
- don't remove the entire pcie2a node, just set status to disabled.
- update commit message.
arch/arm64/boot/dts/qcom/sa8540p-ride.dts | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
index b04f72ec097c..177b9dad6ff7 100644
--- a/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
+++ b/arch/arm64/boot/dts/qcom/sa8540p-ride.dts
@@ -376,14 +376,14 @@ &pcie2a {
 	pinctrl-names = "default";
 	pinctrl-0 = <&pcie2a_default>;
 
-	status = "okay";
+	status = "disabled";
 };
 
 &pcie2a_phy {
 	vdda-phy-supply = <&vreg_l11a>;
 	vdda-pll-supply = <&vreg_l3a>;
 
-	status = "okay";
+	status = "disabled";
 };
 
 &pcie3a {
--
2.43.0
On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>
> Signed-off-by: Lucas Karpinski <[email protected]>
Reviewed-by: Brian Masney <[email protected]>
To elaborate further: leaving both pcie2a and pcie3a enabled leads to RCU
stalls, and the board fails to boot. We have the latest firmware that
we've been able to get from QC.
Disabling one of the pcie nodes works around the boot issue. There's
nothing interesting on pcie2a on the development board, and pcie3a is
enabled because it has 10 Gb Ethernet that works upstream.
The interrupt storm on pcie3a can still occur on this platform, however
that's a separate issue.
Brian
On Thu, Jan 11, 2024 at 09:02:41AM -0500, Brian Masney wrote:
> The interrupt storm on pcie3a can still occur on this platform, however
> that's a separate issue.
Related work-around to that in case anyone is interested in the paper
trail:
https://lore.kernel.org/all/[email protected]/
On Tue, Jan 09, 2024 at 10:20:50AM -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>
Why are there interrupt storms? What interrupt(s) is(are) involved?
Do you consider this a temporary fix?
Are you okay with pcie3a misbehaving?
Regards,
Bjorn
> Why are there interrupt storms? What interrupt(s) is(are) involved?
As described in the link that Andrew mentioned earlier, the DesignWare
PCIe driver uses a chained interrupt to demultiplex the downstream MSI
interrupts. Because of that, we couldn't identify which MSI interrupt is
the source, so it is not clear what is causing the hardware to misbehave.
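For anyone unfamiliar with chained interrupts, here is a minimal
user-space sketch of the demultiplexing idea in C. It is not the
DesignWare driver code: NUM_VECTORS, record_vector(), demux_msi() and
both counters are hypothetical names made up for illustration, and the
single 32-bit status word is an assumption. It only shows that one parent
interrupt line fans out to many child vectors, so a count on the parent
line alone does not say which child vector fired.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_VECTORS 32	/* assumption: one 32-bit MSI status word */

/* Per-vector hit counters, standing in for the per-MSI handlers. */
static unsigned int vector_hits[NUM_VECTORS];
/* Hit counter for the single parent (chained) interrupt line. */
static unsigned int parent_hits;

/* Account one hit for a child vector. */
static void record_vector(unsigned int bit)
{
	assert(bit < NUM_VECTORS);
	vector_hits[bit]++;
}

/*
 * Model of a chained handler: the parent interrupt fires once, and the
 * handler walks the status word to dispatch every pending child vector.
 * Returns the number of child vectors dispatched.
 */
static unsigned int demux_msi(uint32_t status)
{
	unsigned int dispatched = 0;

	parent_hits++;
	for (unsigned int bit = 0; bit < NUM_VECTORS; bit++) {
		if (status & (UINT32_C(1) << bit)) {
			record_vector(bit);
			dispatched++;
		}
	}
	return dispatched;
}
```

Calling demux_msi(0x5) bumps parent_hits once and vector_hits[0] and
vector_hits[2] once each. Calling demux_msi(0) still bumps parent_hits
but dispatches nothing, so in this model a parent interrupt with an empty
status word cannot be attributed to any vector.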
> Do you consider this a temporary fix?
This will likely be a permanent fix. Qualcomm disabled pcie2a in their
downstream kernel as well, quite some time ago, so this may never be
actually fixed.
> Are you okay with pcie3a misbehaving?
Yes. It would be great if the underlying issue were addressed, but at
least the boards are usable with just pcie3a enabled, and the NIC will be
available.
Lucas
On Tue, 09 Jan 2024 10:20:50 -0500, Lucas Karpinski wrote:
> pcie2a and pcie3a both cause interrupt storms to occur. However, when
> both are enabled simultaneously, the two combined interrupt storms will
> lead to rcu stalls. Red Hat is the only company still using this board
> and since we still need pcie3a, just disable pcie2a.
>
>
Applied, thanks!
[1/1] arm64: dts: qcom: sa8540p-ride: disable pcie2a node
commit: 07bbe3fd0704ab47d365756a31f45a86e3b45c0a
Best regards,
--
Bjorn Andersson <[email protected]>