When the USB host controller is stressed in host mode, an "HC died"
warning can come up under certain conditions. Fix this by disabling
SuperSpeed instances in park mode on SC7280 and SC7180.
Krishna Kurapati (2):
arm64: dts: qcom: sc7180: Disable SS instances in park mode
arm64: dts: qcom: sc7280: Disable SS instances in park mode
arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
2 files changed, 2 insertions(+)
--
2.34.1
On SC7180, in host mode, it is observed that stressing out controller
in host mode results in HC died error and only restarting the host
mode fixes it. Disable SS instances in park mode for these targets to
avoid host controller being dead.
Reported-by: Doug Anderson <[email protected]>
Cc: <[email protected]>
Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
Signed-off-by: Krishna Kurapati <[email protected]>
---
arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
index 2b481e20ae38..cc93b5675d5d 100644
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
@@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
 				iommus = <&apps_smmu 0x540 0>;
 				snps,dis_u2_susphy_quirk;
 				snps,dis_enblslpm_quirk;
+				snps,parkmode-disable-ss-quirk;
 				phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
 				phy-names = "usb2-phy", "usb3-phy";
 				maximum-speed = "super-speed";
--
2.34.1
On Thu, May 30, 2024 at 01:55:55PM +0530, Krishna Kurapati wrote:
> On SC7180, in host mode, it is observed that stressing out controller
> in host mode results in HC died error and only restarting the host
> mode fixes it. Disable SS instances in park mode for these targets to
> avoid host controller being dead.
Just out of curiosity, what is the park mode?
>
> Reported-by: Doug Anderson <[email protected]>
> Cc: <[email protected]>
> Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
> Signed-off-by: Krishna Kurapati <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> index 2b481e20ae38..cc93b5675d5d 100644
> --- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> @@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
> iommus = <&apps_smmu 0x540 0>;
> snps,dis_u2_susphy_quirk;
> snps,dis_enblslpm_quirk;
> + snps,parkmode-disable-ss-quirk;
> phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
> phy-names = "usb2-phy", "usb3-phy";
> maximum-speed = "super-speed";
> --
> 2.34.1
>
--
With best wishes
Dmitry
Hi,
On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
<[email protected]> wrote:
>
> On SC7180, in host mode, it is observed that stressing out controller
> in host mode results in HC died error and only restarting the host
> mode fixes it. Disable SS instances in park mode for these targets to
> avoid host controller being dead.
>
> Reported-by: Doug Anderson <[email protected]>
> Cc: <[email protected]>
> Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
> Signed-off-by: Krishna Kurapati <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> 1 file changed, 1 insertion(+)
Reviewed-by: Douglas Anderson <[email protected]>
Tested-by: Douglas Anderson <[email protected]>
Hi,
On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
<[email protected]> wrote:
>
> When working in host mode, in certain conditions, when the USB
> host controller is stressed, there is a HC died warning that comes up.
> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>
> Krishna Kurapati (2):
> arm64: dts: qcom: sc7180: Disable SS instances in park mode
> arm64: dts: qcom: sc7280: Disable SS instances in park mode
>
> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
> 2 files changed, 2 insertions(+)
FWIW, the test case I used to reproduce this:
1. Plug in a USB dock w/ Ethernet
2. Plug a USB 3 SD card reader into the dock.
3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
4. From a shell, read from the card reader by running:
     for i in $(seq 5); do dd if=/dev/sdb of=/dev/null bs=4M; done
5. At the same time, stress the Internet. If you've got a very fast
Internet connection then running Google's "Internet speed test" did
it, but I could also reproduce by just running this from a PC
connected to the same network as my DUT:
     ssh ${DUT} "dd of=/dev/null" < /dev/zero
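For anyone wanting to script the DUT side of the steps above, a rough
sketch follows. It rests on assumptions: DEV, PASSES and DRY_RUN are
made-up knobs, /dev/sdb is simply the device name used in step 4 and may
differ on another DUT, and the "HC died" match is the string quoted in
this thread rather than a verified full log line.

```shell
#!/bin/sh
# Sketch only: DEV, PASSES and DRY_RUN are hypothetical parameters.
DEV="${DEV:-/dev/sdb}"      # card reader block device (assumed; check lsblk)
PASSES="${PASSES:-5}"       # number of full reads, as in "seq 5"
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 4: read the whole card PASSES times to load the USB3 storage path.
storage_stress() {
    i=1
    while [ "$i" -le "$PASSES" ]; do
        run dd if="$DEV" of=/dev/null bs=4M
        i=$((i + 1))
    done
}

# Succeeds if the kernel log fed on stdin mentions "HC died".
saw_hc_died() {
    grep -q "HC died"
}
```

With DRY_RUN=0, storage_stress would run on the DUT while the ssh/dd
load from step 5 runs from the PC; afterwards "dmesg | saw_hc_died"
reports whether the controller died.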
I would also note that, though I personally reproduced this on sc7180
and sc7280 boards and thus Krishna posted the patch for those boards,
there's no reason to believe that this problem doesn't affect all of
Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
followup patch fixing this everywhere.
-Doug
On 30.05.2024 3:34 PM, Doug Anderson wrote:
> Hi,
>
> On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
> <[email protected]> wrote:
>>
>> When working in host mode, in certain conditions, when the USB
>> host controller is stressed, there is a HC died warning that comes up.
>> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>>
>> Krishna Kurapati (2):
>> arm64: dts: qcom: sc7180: Disable SS instances in park mode
>> arm64: dts: qcom: sc7280: Disable SS instances in park mode
>>
>> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>> arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>> 2 files changed, 2 insertions(+)
>
> FWIW, the test case I used to reproduce this:
>
> 1. Plug in a USB dock w/ Ethernet
> 2. Plug a USB 3 SD card reader into the dock.
> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
> bs=4M; done to read from the card reader.
> 5. At the same time, stress the Internet. If you've got a very fast
> Internet connection then running Google's "Internet speed test" did
> it, but I could also reproduce by just running this from a PC
> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
> < /dev/zero
>
> I would also note that, though I personally reproduced this on sc7180
> and sc7280 boards and thus Krishna posted the patch for those boards,
> there's no reason to believe that this problem doesn't affect all of
> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
> followup patch fixing this everywhere.
Right, this sounds like a more widespread issue
That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
8280 isn't affected). My setup was:
- USB3 5 Gb/s hub plugged into one of the side USBs
- on-hub 1 Gb/s network hub connected straight to my router with a
600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
- M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
adapter isn't particularly speedy)
So it stands to reason that it might not have been enough to trigger it.
Konrad
On 31.05.2024 4:17 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <[email protected]> wrote:
>>
>> On 30.05.2024 3:34 PM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
>>> <[email protected]> wrote:
>>>>
>>>> When working in host mode, in certain conditions, when the USB
>>>> host controller is stressed, there is a HC died warning that comes up.
>>>> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>>>>
>>>> Krishna Kurapati (2):
>>>> arm64: dts: qcom: sc7180: Disable SS instances in park mode
>>>> arm64: dts: qcom: sc7280: Disable SS instances in park mode
>>>>
>>>> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>>>> arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>>>> 2 files changed, 2 insertions(+)
>>>
>>> FWIW, the test case I used to reproduce this:
>>>
>>> 1. Plug in a USB dock w/ Ethernet
>>> 2. Plug a USB 3 SD card reader into the dock.
>>> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
>>> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
>>> bs=4M; done to read from the card reader.
>>> 5. At the same time, stress the Internet. If you've got a very fast
>>> Internet connection then running Google's "Internet speed test" did
>>> it, but I could also reproduce by just running this from a PC
>>> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
>>> < /dev/zero
>>>
>>> I would also note that, though I personally reproduced this on sc7180
>>> and sc7280 boards and thus Krishna posted the patch for those boards,
>>> there's no reason to believe that this problem doesn't affect all of
>>> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
>>> followup patch fixing this everywhere.
>>
>> Right, this sounds like a more widespread issue
>>
>> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
>> 8280 isn't affected). My setup was:
>>
>> - USB3 5 Gb/s hub plugged into one of the side USBs
>> - on-hub 1 Gb/s network hub connected straight to my router with a
>> 600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
>> - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
>> adapter isn't particularly speedy)
>>
>> So it stands to reason that it might not have been enough to trigger it.
>
> In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
> just using a normal USB3 SD card reader. That being said, multiple
> people at Qualcomm were able to replicate the issue without lots of
> back and forth, so I'd guess that the problem isn't that sensitive to
> the exact storage device. I will also note that it's not sensitive to
> the exact network device as I replicated it with two Ethernet adapters
> with very different chipsets.
>
> My only guess is that somehow SC8280XP is faster and that changes the
> timing of how it handles interrupts. I guess you could try capping
> your cpufreq in sysfs and see if that makes a difference in
> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> the IP where they've fixed this?
Well, great minds think alike :P I did cap it to f_min on all cores, but
that didn't change the situation. It might have been worth powering
off all cores except 0; I might try that at some point.
My guess is that with a process node change, they might have used some
newer/better IP revision though. Remains to be seen.
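The two experiments mentioned here (capping every cpufreq policy at its
minimum, and offlining all cores except 0) can be sketched in shell as
below. The paths are the standard cpufreq and CPU hotplug sysfs ones;
CPU_ROOT is a hypothetical override added only so the functions can be
pointed at a scratch directory, and whether either step makes the issue
reproduce on SC8280XP is untested.

```shell
#!/bin/sh
# Sketch only: CPU_ROOT is a hypothetical override for dry testing.
CPU_ROOT="${CPU_ROOT:-/sys/devices/system/cpu}"

# Cap each cpufreq policy's maximum at its hardware minimum (f_min).
cap_cpufreq() {
    for policy in "$CPU_ROOT"/cpufreq/policy*; do
        [ -r "$policy/cpuinfo_min_freq" ] || continue
        cat "$policy/cpuinfo_min_freq" > "$policy/scaling_max_freq"
    done
}

# Take every CPU except cpu0 offline via the hotplug interface.
offline_secondary_cpus() {
    for cpu in "$CPU_ROOT"/cpu[1-9]*; do
        [ -w "$cpu/online" ] || continue
        echo 0 > "$cpu/online"
    done
}
```

On a real system both functions need root, and writing 1 back to each
"online" file (and the old value to scaling_max_freq) undoes them.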
Konrad
>
> It would be interesting if someone with a SDM845 dragonboard could try
> replicating since that seems highly likely to reproduce, at least.
>
> -Doug
On 5/31/2024 7:47 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <[email protected]> wrote:
>>
>> On 30.05.2024 3:34 PM, Doug Anderson wrote:
>>> Hi,
>>>
>>> On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
>>> <[email protected]> wrote:
>>>>
>>>> When working in host mode, in certain conditions, when the USB
>>>> host controller is stressed, there is a HC died warning that comes up.
>>>> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
>>>>
>>>> Krishna Kurapati (2):
>>>> arm64: dts: qcom: sc7180: Disable SS instances in park mode
>>>> arm64: dts: qcom: sc7280: Disable SS instances in park mode
>>>>
>>>> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
>>>> arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
>>>> 2 files changed, 2 insertions(+)
>>>
>>> FWIW, the test case I used to reproduce this:
>>>
>>> 1. Plug in a USB dock w/ Ethernet
>>> 2. Plug a USB 3 SD card reader into the dock.
>>> 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
>>> 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
>>> bs=4M; done to read from the card reader.
>>> 5. At the same time, stress the Internet. If you've got a very fast
>>> Internet connection then running Google's "Internet speed test" did
>>> it, but I could also reproduce by just running this from a PC
>>> connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
>>> < /dev/zero
>>>
>>> I would also note that, though I personally reproduced this on sc7180
>>> and sc7280 boards and thus Krishna posted the patch for those boards,
>>> there's no reason to believe that this problem doesn't affect all of
>>> Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
>>> followup patch fixing this everywhere.
>>
>> Right, this sounds like a more widespread issue
>>
>> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
>> 8280 isn't affected). My setup was:
>>
>> - USB3 5 Gb/s hub plugged into one of the side USBs
>> - on-hub 1 Gb/s network hub connected straight to my router with a
>> 600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
>> - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
>> adapter isn't particularly speedy)
>>
>> So it stands to reason that it might not have been enough to trigger it.
>
> In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
> just using a normal USB3 SD card reader. That being said, multiple
> people at Qualcomm were able to replicate the issue without lots of
> back and forth, so I'd guess that the problem isn't that sensitive to
> the exact storage device. I will also note that it's not sensitive to
> the exact network device as I replicated it with two Ethernet adapters
> with very different chipsets.
>
> My only guess is that somehow SC8280XP is faster and that changes the
> timing of how it handles interrupts. I guess you could try capping
> your cpufreq in sysfs and see if that makes a difference in
> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> the IP where they've fixed this?
>
> It would be interesting if someone with a SDM845 dragonboard could try
> replicating since that seems highly likely to reproduce, at least.
>
Hi Konrad, Doug,
Downstream, we usually set this quirk on all Gen-1 targets (and only
those), not specifically for this test case but to avoid this kind of
controller-going-dead issue. I can filter out the Gen-1 targets (other
than sc7280/sc7180) and send a separate series adding this quirk to all
of them.
Regards,
Krishna
Hi,
On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
<[email protected]> wrote:
>
> > My only guess is that somehow SC8280XP is faster and that changes the
> > timing of how it handles interrupts. I guess you could try capping
> > your cpufreq in sysfs and see if that makes a difference in
> > reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
> > the IP where they've fixed this?
> >
> > It would be interesting if someone with a SDM845 dragonboard could try
> > replicating since that seems highly likely to reproduce, at least.
> >
>
> Hi Konrad, Doug,
>
> Usually on downstream we set this quirk only for all Gen-1 targets
> (not particularly for this testcase) but to avoid these kind of
> controller going dead issues. I can filter out the gen-1 targets (other
> than sc7280/sc7180) and send a separate series to add this quirk in all
> of them.
Sounds like a plan to me!
-Doug
On 31.05.2024 4:31 PM, Doug Anderson wrote:
> Hi,
>
> On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
> <[email protected]> wrote:
>>
>>> My only guess is that somehow SC8280XP is faster and that changes the
>>> timing of how it handles interrupts. I guess you could try capping
>>> your cpufreq in sysfs and see if that makes a difference in
>>> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
>>> the IP where they've fixed this?
>>>
>>> It would be interesting if someone with a SDM845 dragonboard could try
>>> replicating since that seems highly likely to reproduce, at least.
>>>
>>
>> Hi Konrad, Doug,
>>
>> Usually on downstream we set this quirk only for all Gen-1 targets
>> (not particularly for this testcase) but to avoid these kind of
>> controller going dead issues. I can filter out the gen-1 targets (other
>> than sc7280/sc7180) and send a separate series to add this quirk in all
>> of them.
>
> Sounds like a plan to me!
Yep!
In case there are more gen1 platforms than what we have upstream, it would
be of great utility if you could list them all, so that we can have a reference
for future additions, Krishna.
Konrad
Hi,
On Fri, May 31, 2024 at 5:33 AM Konrad Dybcio <konrad.dybcio@linaro.org> wrote:
>
> On 30.05.2024 3:34 PM, Doug Anderson wrote:
> > Hi,
> >
> > On Thu, May 30, 2024 at 1:26 AM Krishna Kurapati
> > <[email protected]> wrote:
> >>
> >> When working in host mode, in certain conditions, when the USB
> >> host controller is stressed, there is a HC died warning that comes up.
> >> Fix this up by disabling SS instances in park mode for SC7280 and SC7180.
> >>
> >> Krishna Kurapati (2):
> >> arm64: dts: qcom: sc7180: Disable SS instances in park mode
> >> arm64: dts: qcom: sc7280: Disable SS instances in park mode
> >>
> >> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> >> arch/arm64/boot/dts/qcom/sc7280.dtsi | 1 +
> >> 2 files changed, 2 insertions(+)
> >
> > FWIW, the test case I used to reproduce this:
> >
> > 1. Plug in a USB dock w/ Ethernet
> > 2. Plug a USB 3 SD card reader into the dock.
> > 3. Use lsusb -t to confirm both Ethernet and card reader are on USB3.
> > 4. From a shell, run for i in $(seq 5); do dd if=/dev/sdb of=/dev/null
> > bs=4M; done to read from the card reader.
> > 5. At the same time, stress the Internet. If you've got a very fast
> > Internet connection then running Google's "Internet speed test" did
> > it, but I could also reproduce by just running this from a PC
> > connected to the same network as my DUT: ssh ${DUT} "dd of=/dev/null"
> > < /dev/zero
> >
> > I would also note that, though I personally reproduced this on sc7180
> > and sc7280 boards and thus Krishna posted the patch for those boards,
> > there's no reason to believe that this problem doesn't affect all of
> > Qualcomm's SoCs. It would be nice if someone at Qualcomm could post a
> > followup patch fixing this everywhere.
>
> Right, this sounds like a more widespread issue
>
> That said, I couldn't reproduce it on SC8280XP / X13s (which does NOT mean
> 8280 isn't affected). My setup was:
>
> - USB3 5 Gb/s hub plugged into one of the side USBs
> - on-hub 1 Gb/s network hub connected straight to my router with a
> 600 / 60 Mbps link, spamming speedtest-cli and dd-over-ssh
> - M.2 SSD connected over a USB adapter, nearing 280 MB/s speeds (the
> adapter isn't particularly speedy)
>
> So it stands to reason that it might not have been enough to trigger it.
In my case I wasn't using anything nearly as fast as a M.2 SSD. I was
just using a normal USB3 SD card reader. That being said, multiple
people at Qualcomm were able to replicate the issue without lots of
back and forth, so I'd guess that the problem isn't that sensitive to
the exact storage device. I will also note that it's not sensitive to
the exact network device as I replicated it with two Ethernet adapters
with very different chipsets.
My only guess is that somehow SC8280XP is faster and that changes the
timing of how it handles interrupts. I guess you could try capping
your cpufreq in sysfs and see if that makes a difference in
reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
the IP where they've fixed this?
It would be interesting if someone with a SDM845 dragonboard could try
replicating since that seems highly likely to reproduce, at least.
-Doug
On 5/31/2024 8:11 PM, Konrad Dybcio wrote:
> On 31.05.2024 4:31 PM, Doug Anderson wrote:
>> Hi,
>>
>> On Fri, May 31, 2024 at 7:27 AM Krishna Kurapati PSSNV
>> <[email protected]> wrote:
>>>
>>>> My only guess is that somehow SC8280XP is faster and that changes the
>>>> timing of how it handles interrupts. I guess you could try capping
>>>> your cpufreq in sysfs and see if that makes a difference in
>>>> reproducing. ;-) ...or maybe somehow SC8280XP has a newer version of
>>>> the IP where they've fixed this?
>>>>
>>>> It would be interesting if someone with a SDM845 dragonboard could try
>>>> replicating since that seems highly likely to reproduce, at least.
>>>>
>>>
>>> Hi Konrad, Doug,
>>>
>>> Usually on downstream we set this quirk only for all Gen-1 targets
>>> (not particularly for this testcase) but to avoid these kind of
>>> controller going dead issues. I can filter out the gen-1 targets (other
>>> than sc7280/sc7180) and send a separate series to add this quirk in all
>>> of them.
>>
>> Sounds like a plan to me!
>
> Yep!
>
> In case there are more gen1 platforms than what we have upstream, it would
> be of great utility if you could list them all, so that we can have a reference
> for future additions, Krishna.
>
I am not sure if I can give out info on targets that are not
upstream. I will check internally whether I can do that. Otherwise we can
just ensure that from now on, whenever a Gen-1 target is upstreamed,
this quirk is set.
Regards,
Krishna
On Thu, May 30, 2024 at 01:55:55PM GMT, Krishna Kurapati wrote:
> On SC7180, in host mode, it is observed that stressing out controller
> in host mode results in HC died error and only restarting the host
Could you please include a copy of that error message, so that others
searching for that error message will be able to find this commit?
Also, there are three "in host mode"s in this sentence.
> mode fixes it. Disable SS instances in park mode for these targets to
Please spell out SS as SuperSpeed.
Regards,
Bjorn
> avoid host controller being dead.
>
> Reported-by: Doug Anderson <[email protected]>
> Cc: <[email protected]>
> Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes")
> Signed-off-by: Krishna Kurapati <[email protected]>
> ---
> arch/arm64/boot/dts/qcom/sc7180.dtsi | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/boot/dts/qcom/sc7180.dtsi b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> index 2b481e20ae38..cc93b5675d5d 100644
> --- a/arch/arm64/boot/dts/qcom/sc7180.dtsi
> +++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi
> @@ -3063,6 +3063,7 @@ usb_1_dwc3: usb@a600000 {
> iommus = <&apps_smmu 0x540 0>;
> snps,dis_u2_susphy_quirk;
> snps,dis_enblslpm_quirk;
> + snps,parkmode-disable-ss-quirk;
> phys = <&usb_1_hsphy>, <&usb_1_qmpphy QMP_USB43DP_USB3_PHY>;
> phy-names = "usb2-phy", "usb3-phy";
> maximum-speed = "super-speed";
> --
> 2.34.1
>