2024-03-05 06:20:58

by karthikeyan

Subject: dmaengine: CPU stalls while loading bluetooth module

Hi all,

we have encountered CPU stalls in the mainline kernel while loading the
Bluetooth module. We have a custom board based on the Rockchip RV1109 SoC
with a Realtek 8821cs Bluetooth chipset. The CPU stalls while loading the
Realtek 8821cs module.

Bug/Regression:
In current mainline, the CPU stalls when we load the Bluetooth
module. git bisect identifies commit 22a9d9585812440211b0b34a6bc02ade62314be4
as the first bad commit, i.e. the one that introduces the CPU stalls.

git show 22a9d9585812440211b0b34a6bc02ade62314be4
commit 22a9d9585812440211b0b34a6bc02ade62314be4
Author: Bumyong Lee <[email protected]>
Date: Tue Dec 19 14:50:26 2023 +0900

dmaengine: pl330: issue_pending waits until WFP state

According to DMA-330 errata notice[1] 71930, DMAKILL
cannot clear internal signal, named pipeline_req_active.
it makes that pl330 would wait forever in WFP state
although dma already send dma request if pl330 gets
dma request before entering WFP state.

The errata suggests that polling until entering WFP state
as workaround and then peripherals allows to issue dma request.

[1]: https://developer.arm.com/documentation/genc008428/latest

Signed-off-by: Bumyong Lee <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 3cf0b38387ae..c29744bfdf2c 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -1053,6 +1053,9 @@ static bool _trigger(struct pl330_thread *thrd)

 	thrd->req_running = idx;
 
+	if (desc->rqtype == DMA_MEM_TO_DEV || desc->rqtype == DMA_DEV_TO_MEM)
+		UNTIL(thrd, PL330_STATE_WFP);
+
 	return true;
 }

With this commit reverted, the Bluetooth module loads successfully.
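The patch above makes _trigger() busy-wait until the channel thread reports the WFP (wait-for-peripheral) state. Assuming UNTIL() in drivers/dma/pl330.c is a plain polling loop with no timeout (worth checking against the source), a thread that never reaches WFP would keep the CPU spinning, which would fit the RCU stall shown below. The userspace C sketch that follows only illustrates that hazard next to a bounded alternative. struct sim_thread, read_thread_state(), the two poll helpers and the SIM_STATE_* bits are hypothetical stand-ins, not driver code, and a bounded poll is not claimed to be a fix for the erratum.

```c
#include <stdbool.h>

/* Hypothetical thread-state bits (NOT the pl330.c values). */
#define SIM_STATE_EXECUTING (1u << 0)
#define SIM_STATE_WFP       (1u << 1) /* waiting for peripheral */

/*
 * Simulated channel thread: its status read reports EXECUTING for the
 * first `reads_until_wfp` reads and WFP afterwards. A negative value
 * models a thread that never reaches WFP.
 */
struct sim_thread {
	int reads_until_wfp;
	int reads;
};

static unsigned int read_thread_state(struct sim_thread *t)
{
	t->reads++;
	if (t->reads_until_wfp >= 0 && t->reads > t->reads_until_wfp)
		return SIM_STATE_WFP;
	return SIM_STATE_EXECUTING;
}

/*
 * Unbounded poll: the only exit is seeing the awaited state, so it
 * never returns for a thread that never reaches it. Shown for contrast.
 */
static void poll_until_state_unbounded(struct sim_thread *t, unsigned int mask)
{
	while (!(read_thread_state(t) & mask))
		; /* kernel code would typically call cpu_relax() here */
}

/*
 * Bounded poll: returns true as soon as any bit in `mask` is reported,
 * or false after `max_reads` reads without seeing it.
 */
static bool poll_until_state(struct sim_thread *t, unsigned int mask,
			     int max_reads)
{
	for (int i = 0; i < max_reads; i++) {
		if (read_thread_state(t) & mask)
			return true;
	}
	return false;
}
```

With reads_until_wfp = 3, poll_until_state(&t, SIM_STATE_WFP, 1000) returns true after four status reads; with reads_until_wfp = -1 it gives up and returns false after 1000 reads, where the unbounded variant would never return.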


Output of CPU stalls:
# modprobe hci_uart
[ 27.024749] Bluetooth: HCI UART driver ver 2.3
[ 27.025284] Bluetooth: HCI UART protocol Three-wire (H5) registered
# [ 28.125338] dwmmc_rockchip ffc70000.mmc: Unexpected interrupt latency
[ 33.245339] dwmmc_rockchip ffc50000.mmc: Unexpected interrupt latency
[ 326.195321] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 326.195880] rcu: 0-...0: (3 ticks this GP) idle=e5f4/1/0x40000000 softirq=551/552 fqs=420
[ 326.196621] rcu: hardirqs softirqs csw/system
[ 326.197115] rcu: number: 0 0 0
[ 326.197612] rcu: cputime: 0 0 0 ==> 10500(ms)
[ 326.198231] rcu: (detected by 1, t=2105 jiffies, g=-455, q=17 ncpus=2)
[ 326.198823] Sending NMI from CPU 1 to CPUs 0:

Expected Output:
# modprobe hci_uart
[ 30.690321] Bluetooth: HCI UART driver ver 2.3
[ 30.690852] Bluetooth: HCI UART protocol Three-wire (H5) registered
# [ 31.453586] Bluetooth: hci0: RTL: examining hci_ver=08 hci_rev=000c lmp_ver=08 lmp_subver=8821
[ 31.458061] Bluetooth: hci0: RTL: rom_version status=0 version=1
[ 31.458608] Bluetooth: hci0: RTL: loading rtl_bt/rtl8821cs_fw.bin
[ 31.465029] Bluetooth: hci0: RTL: loading rtl_bt/rtl8821cs_config.bin
[ 31.483926] Bluetooth: hci0: RTL: cfg_sz 25, total sz 36953
[ 32.213105] Bluetooth: hci0: RTL: fw version 0x75b8f098
[ 32.274216] Bluetooth: MGMT ver 1.22
[ 32.285376] NET: Registered PF_ALG protocol family

Thanks,
Karthikeyan K


2024-03-08 10:07:50

by bumyong.lee

Subject: RE: dmaengine: CPU stalls while loading bluetooth module

Hello

> Hmmm. 6.8 final is due. Is that something we can live with? Or would it be
> a good idea to revert above commit for now and reapply it when something
> better emerged? I doubt that the answer is "yes, let's do that", but I
> have to ask.

I couldn't find a better way for now.
I think it's better to do as you suggested.



Subject: Re: dmaengine: CPU stalls while loading bluetooth module

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 05.03.24 08:13, bumyong.lee wrote:
>> we have encountered CPU stalls in mainline kernel while loading the
>> bluetooth module. We have custom board based on rockchip rv1109 soc and
>> there is bluetooth chipset of relatek 8821cs. CPU is stalls while realtek
>> 8821cs module.
>>
>> Bug/Regression:
>> In current mainline, we found CPU is stalls when we load bluetooth module.
>> git bisect shows commit 22a9d9585812440211b0b34a6bc02ade62314be4
>> as a bad, which produce CPU stalls.
>>
>> git show 22a9d9585812440211b0b34a6bc02ade62314be4
>> commit 22a9d9585812440211b0b34a6bc02ade62314be4
>> Author: Bumyong Lee <[email protected]>
>> Date: Tue Dec 19 14:50:26 2023 +0900
>>
>> dmaengine: pl330: issue_pending waits until WFP state
>>
> [...]
>>
>> By reverting this commit, we have success in loading of bluetooth module.
>
>> Output of CPU stalls:
> [...]
>
> I discussed this issue. Could you refer to this[1]?
> I haven't received anymore reply from him after that.
> If you have any more opinion, please let me know.
> [1]: https://lore.kernel.org/lkml/[email protected]/T/

Hmmm. 6.8 final is due. Is that something we can live with? Or would it
be a good idea to revert the above commit for now and reapply it once
something better has emerged? I doubt that the answer is "yes, let's do
that", but I have to ask.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:

#regzbot report /
#regzbot introduced 22a9d9585812440211b
#regzbot duplicate: https://lore.kernel.org/lkml/[email protected]/
#regzbot title dmaengine: CPU stalls while loading bluetooth module
#regzbot ignore-activity

2024-03-05 07:20:19

by bumyong.lee

Subject: RE: dmaengine: CPU stalls while loading bluetooth module

Hello.

> we have encountered CPU stalls in mainline kernel while loading the
> bluetooth module. [...]
>
> git bisect shows commit 22a9d9585812440211b0b34a6bc02ade62314be4
> as a bad, which produce CPU stalls.
> [...]

I have discussed this issue before; could you refer to [1]?
I haven't received any further reply from him since then.
If you have any other opinion, please let me know.

[1]: https://lore.kernel.org/lkml/[email protected]/T/

Best Regards


2024-03-20 00:49:51

by bumyong.lee

Subject: RE: dmaengine: CPU stalls while loading bluetooth module

Hello.

> >> Hmmm. 6.8 final is due. Is that something we can live with? Or would
> >> it be a good idea to revert above commit for now and reapply it when
> >> something better emerged? I doubt that the answer is "yes, let's do
> >> that", but I have to ask.
> >
> > I couldn't find better way now.
> > I think it's better to follow you mentioned
>
> 6.8 is out, but that issue afaics was not resolved, so allow me to ask:
> did "submit a revert" fell through the cracks or is there some other
> solution in the works? Or am I missing something?

"Submit a revert" would fix this issue, but it would make another issue
happen: the one described in errata[1] 719340.

Sometimes DMA doesn't work correctly when dma_issue_pending() is called
after dma_terminate().

Best regards

[1]: https://developer.arm.com/documentation/genc008428/latest


Subject: Re: dmaengine: CPU stalls while loading bluetooth module

On 20.03.24 01:49, bumyong.lee wrote:
>>>> Hmmm. 6.8 final is due. Is that something we can live with? Or would
>>>> it be a good idea to revert above commit for now and reapply it when
>>>> something better emerged? I doubt that the answer is "yes, let's do
>>>> that", but I have to ask.
>>>
>>> I couldn't find better way now.
>>> I think it's better to follow you mentioned
>>
>> 6.8 is out, but that issue afaics was not resolved, so allow me to ask:
>> did "submit a revert" fell through the cracks or is there some other
>> solution in the works? Or am I missing something?
>
> "submit a revert" would fix the issue. but it would make another issue
> that the errata[1] 719340 described.

"Make" as in "that other issue was present before the culprit was
applied"? Then that other issue does not matter due to the "no
regression" rule and how Linus afaics wants to see it applied in
practice. For details on the latter, see the quotes from him here:
https://docs.kernel.org/process/handling-regressions.html
Hence please submit a revert (or tell me if I misunderstood something)
-- or of course a workaround for the other issue that does not cause the
regression people reported.

> [...]
> [1]: https://developer.arm.com/documentation/genc008428/latest

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

Subject: Re: dmaengine: CPU stalls while loading bluetooth module

On 08.03.24 11:07, bumyong.lee wrote:
>
>> Hmmm. 6.8 final is due. Is that something we can live with? Or would it be
>> a good idea to revert above commit for now and reapply it when something
>> better emerged? I doubt that the answer is "yes, let's do that", but I
>> have to ask.
>
> I couldn't find better way now.
> I think it's better to follow you mentioned

6.8 is out, but that issue afaics was not resolved, so allow me to ask:
did "submit a revert" fall through the cracks or is there some other
solution in the works? Or am I missing something?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke

Subject: Re: dmaengine: CPU stalls while loading bluetooth module

Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.

Vinod Koul, what's your opinion here? We have two reports about
regressions caused by 22a9d958581244 ("dmaengine: pl330: issue_pending
waits until WFP state") [v6.8-rc1] now:

https://lore.kernel.org/lkml/[email protected]/

https://lore.kernel.org/all/[email protected]/
[the first link points to the start of this thread]

To me it sounds like this change should be reverted, but you are of
course the better judge here.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

On 20.03.24 07:28, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 20.03.24 01:49, bumyong.lee wrote:
> [...]
>> "submit a revert" would fix the issue. but it would make another issue
>> that the errata[1] 719340 described.
>
> "Make" as it "that other issue was present before the culprit was
> applied"? Then that other issue does not matter due to the "no
> regression" rule and how Linus afaics wants to see it applied in
> practice. For details on the latter, see the quotes from him here:
> https://docs.kernel.org/process/handling-regressions.html
> Hence please submit a revert (or tell me if I misunderstood something)
> -- or of course a workaround for the other issue that does not cause the
> regression people reported.
> [...]

#regzbot poke

2024-03-28 06:51:40

by Vinod Koul

Subject: Re: dmaengine: CPU stalls while loading bluetooth module

On 26-03-24, 14:50, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Vinod Koul, what's your option here? We have two reports about
> regressions caused by 22a9d958581244 ("dmaengine: pl330: issue_pending
> waits until WFP state") [v6.8-rc1] now:
>
> https://lore.kernel.org/lkml/[email protected]/
>
> https://lore.kernel.org/all/[email protected]/
> [the first link points to the start of this thread]
>
> To me it sounds like this is a change that better should be reverted,
> but you are of course the better judge here.

Sure, I have reverted this, so the original issue exists as is now...


--
~Vinod

Subject: Re: dmaengine: CPU stalls while loading bluetooth module

On 28.03.24 07:51, Vinod Koul wrote:
> On 26-03-24, 14:50, Linux regression tracking (Thorsten Leemhuis) wrote:
>> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
>> for once, to make this easily accessible to everyone.
>>
>> Vinod Koul, what's your option here? We have two reports about
>> regressions caused by 22a9d958581244 ("dmaengine: pl330: issue_pending
>> waits until WFP state") [v6.8-rc1] now:
>>
>> https://lore.kernel.org/lkml/[email protected]/
>>
>> https://lore.kernel.org/all/[email protected]/
>> [the first link points to the start of this thread]
>>
>> To me it sounds like this is a change that better should be reverted,
>> but you are of course the better judge here.
>
> Sure I have reverted this,

Thx!

> so original issue exist as is now...

Yeah, that's a downside, but that's afaik how Linus wants these
situations to be handled. Hopefully it will motivate someone to fix the
original issue without causing a regression.

Thx again! Ciao, Thorsten

P.S.:

#regzbot fix: dmaengine: Revert "dmaengine: pl330: issue_pending waits until WFP state"

(that's
https://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git/commit/?h=fixes&id=afc89870ea677bd5a44516eb981f7a259b74280c
currently)

Subject: Re: dmaengine: CPU stalls while loading bluetooth module

On 28.03.24 16:06, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 28.03.24 07:51, Vinod Koul wrote:
>> On 26-03-24, 14:50, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>
>>> Vinod Koul, what's your option here? We have two reports about
>>> regressions caused by 22a9d958581244 ("dmaengine: pl330: issue_pending
>>> waits until WFP state") [v6.8-rc1] now:
>>>
>>> https://lore.kernel.org/lkml/[email protected]/
>>>
>>> https://lore.kernel.org/all/[email protected]/
>>> [the first link points to the start of this thread]
>>>
>>> To me it sounds like this is a change that better should be reverted,
>>> but you are of course the better judge here.
>>
>> Sure I have reverted this,
>
> Thx!

That revert afaics has not made it to Linus yet. Is that intentional, or
did it just fall through the cracks?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

#regzbot poke