2023-07-15 13:32:24

by Sean Mollet

Subject: [PATCH] RTW88 firmware download issues - improvement, but not perfect

I’m attempting to debug intermittent firmware download issues that occur with rtw88 on an RTL8821CU. I did not have these problems with the Realtek OOT driver (using morrownr’s version).

Based on GitHub issues I’ve found, this same problem seems to also occasionally occur with other chips in the family, including PCI-E ones.

Example dmesg including some telemetry I added to narrow down the issue:

[ 28.486954] rtx88: Loading firmware rtw88/rtw8821c_fw.bin
[ 28.492907] rtw_8821cu 1-1.5:1.0: Firmware version 24.11.0, H2C version 12
[ 28.988012] check_hw_ready failed
[ 28.991624] rtx_88 failed in download_firmware_validate
[ 28.998626] rtw_8821cu 1-1.5:1.0: failed to download firmware
[ 29.012373] rtw_8821cu 1-1.5:1.0: failed to setup chip efuse info
[ 29.018749] rtw_8821cu 1-1.5:1.0: failed to setup chip information
[ 29.029496] rtw_8821cu: probe of 1-1.5:1.0 failed with error -22

It’s failing in mac.c, in the call to download_firmware_validate. The register contains 0x4078 instead of the 0xC078 that is usually present after a download.
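For reference, here is a tiny userspace check. It uses the hypothetical names FW_CTRL_OK and FW_CTRL_FAIL and is not driver code. It shows that the failing value differs from the expected one in exactly one bit, bit 15. Without chip documentation, what that bit signals remains an open question; the sketch only localizes the difference.

```c
#include <stdint.h>

/* Register values quoted above: the usual value after a good download
 * and the value seen on failure. Names are illustrative only. */
#define FW_CTRL_OK   0xC078u
#define FW_CTRL_FAIL 0x4078u

/* Bits set in one value but not the other. */
static uint32_t differing_bits(uint32_t a, uint32_t b)
{
	return a ^ b;
}

/* Index of the lowest set bit, or -1 if x == 0. */
static int lowest_set_bit(uint32_t x)
{
	for (int i = 0; i < 32; i++)
		if (x & (UINT32_C(1) << i))
			return i;
	return -1;
}
```

Here differing_bits(FW_CTRL_OK, FW_CTRL_FAIL) is 0x8000, and lowest_set_bit() of that is 15.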

Comparing this to the OOT driver, I noticed that the order of operations at the end of the process is different. In rtw88, lte_coe_backup is restored before checking the register; in the OOT driver, it is restored after. The other difference is that the OOT driver's check loop uses a much larger iteration count and a longer delay. I applied both changes to rtw88 and the failure rate decreased significantly, but it is still nonzero.

I happen to have a nearly ideal test lab for exploring this problem: a factory full of embedded systems with the RTL8821CU chip attached to an automated boot/provisioning system. I can make a change, then deploy and test it on tens or even hundreds of devices in a few minutes.

I don’t have chip documentation, so I’m shooting in the dark a bit here. My suspicion is that this is a race condition in the rtw88 driver, in the hardware, or in the interaction between the two. It also seems to be exacerbated by heavy I/O load on the host CPU during driver loading. I’m continuing to explore the differences between how the OOT driver and rtw88 handle firmware download, since I’ve never seen this happen with the OOT driver.

References:

https://github.com/morrownr/8821cu-20210916/blob/main/hal/halmac/halmac_88xx/halmac_fw_88xx.c
Line 205 restores the lte_coe_backup. The equivalent of the check in download_firmware_validate happens inside the call to flfw_end_flow_88xx on line 201, which is the opposite order compared to what rtw88 does.
Line 666 sets the check loop (the equivalent of rtw88's download_firmware_validate function) to 5000 iterations, and line 678 uses a delay of 50 us.

diff --git a/mac.c b/mac.c
index 298663b..d595711 100644
--- a/mac.c
+++ b/mac.c
@@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,

 	wlan_cpu_enable(rtwdev, true);
 
-	if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
-		ret = -EBUSY;
-		goto dlfw_fail;
-	}
-
 	ret = download_firmware_validate(rtwdev);
 	if (ret)
 		goto dlfw_fail;
 
+	if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
+		ret = -EBUSY;
+		goto dlfw_fail;
+	}
+
 	/* reset desc and index */
 	rtw_hci_setup(rtwdev);
 
diff --git a/util.c b/util.c
index ff3c269..fbd6599 100644
--- a/util.c
+++ b/util.c
@@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
 {
 	u32 cnt;
 
-	for (cnt = 0; cnt < 1000; cnt++) {
+	for (cnt = 0; cnt < 5000; cnt++) {
 		if (rtw_read32_mask(rtwdev, addr, mask) == target)
 			return true;
 
-		udelay(10);
+		udelay(50);
 	}
 
 	return false;
--


2023-07-16 18:12:02

by Larry Finger

Subject: Re: [PATCH] RTW88 firmware download issues - improvement, but not perfect

On 7/15/23 08:23, Sean Mollet wrote:
> I’m attempting to debug intermittent firmware download issues that occur with rtw88 on an RTL8821CU. I did not have these problems with the Realtek OOT driver (using morrownr’s version).
>
> Based on GitHub issues I’ve found, this same problem seems to also occasionally occur with other chips in the family, including PCI-E ones.
>
> Example dmesg including some telemetry I added to narrow down the issue:
>
> [ 28.486954] rtx88: Loading firmware rtw88/rtw8821c_fw.bin
> [ 28.492907] rtw_8821cu 1-1.5:1.0: Firmware version 24.11.0, H2C version 12
> [ 28.988012] check_hw_ready failed
> [ 28.991624] rtx_88 failed in download_firmware_validate
> [ 28.998626] rtw_8821cu 1-1.5:1.0: failed to download firmware
> [ 29.012373] rtw_8821cu 1-1.5:1.0: failed to setup chip efuse info
> [ 29.018749] rtw_8821cu 1-1.5:1.0: failed to setup chip information
> [ 29.029496] rtw_8821cu: probe of 1-1.5:1.0 failed with error -22
>
> It’s failing in mac.c, in the call to download_firmware_validate. The register contains 0x4078 instead of the 0xC078 that is usually present after a download.
>
> Comparing this to the OOT driver, I noticed that the order of operations at the end of the process is different. In rtw88, lte_coe_backup is restored before checking the register. In the OOT, it’s done after. Another difference is that the check loop in OOT has a much larger count and a larger delay. I applied both of these changes to rtw88 and the failure rate decreased significantly. It is still non 0.
>
> I happen to have a nearly ideal test lab for exploring this problem in that I’ve got a factory full of embedded systems with the RTL8821CU chip attached to an automated boot/provisioning system. I can make a change and deploy+test on 10s or even hundreds of devices in a few minutes.
>
> I don’t have chip documentation, so I’m shooting in a dark a bit here. My suspicion is that this is a race condition either in the rtw88 driver or in the hardware or the interaction between the two. It also seems to be exacerbated by high IO on the host CPU during driver loading. I’m further exploring the differences between the OOT driver and rtw88’s handling of firmware download, since I’ve never seen this happen with the OOT driver.
>
> References:
>
> https://github.com/morrownr/8821cu-20210916/blob/main/hal/halmac/halmac_88xx/halmac_fw_88xx.c
> Line 205 restores the lte_coe_backup. The equivalent of the check in download_firmware_validate happens inside the call to flfw_end_flow_88xx on line 201, the opposite order compared to what
> Line 666 sets the check loop, equivalent to rtw88's download_firmware_validate function to 5000 cycles and the delay used is 50 uS on line 678.
>

Patches for the rtlwifi, rtw88, and rtw89 drivers should be addressed to Ping-Ke
Shih <[email protected]>. Any patch for a driver in drivers/net/wireless/...
should be sent to Kalle Valo <[email protected]>. Cc linux-wireless.

The subject line should be "wifi: rtw88: ......"

There is no signed-off-by tag. This is essential.

This commit message is not appropriate. There is too much detail on how the
sausage was made. It should state what the problem is, and a little about what
was done to fix it.

Your first reference is to the vendor driver, which is a totally different
beast. I would say no more than "The vendor driver was consulted for ideas." If
a specific change was made based on the vendor driver, then describe it.

When fixing a bug, you need to add a Fixes tag. See
Documentation/process/submitting-patches.rst in your kernel source tree for
instructions. You should also include "fixes" in the subject, and likely add a
Cc for [email protected]. That will ensure that the fix is propagated to
stable kernels.

Your mailer sent HTML mail. For patches, it must be plain text. In addition,
your mailer line wrapped a couple of lines, and the tabs at the start of lines
got changed to spaces. It took a lot of work to get the patches to apply to my
rtw88 repo.

> diff --git a/mac.c b/mac.c
> index 298663b..d595711 100644
> --- a/mac.c
> +++ b/mac.c

These files are not found in the kernel source tree. You are confusing the rtw88
files with the kernel source. Any wifi patches should be based on the current
wireless-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git.
Repository rtw88 is fed from wireless-next, not the other way around.

> @@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,
>
> wlan_cpu_enable(rtwdev, true);
>
> - if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> - ret = -EBUSY;
> - goto dlfw_fail;
> - }
> -
> ret = download_firmware_validate(rtwdev);
> if (ret)
> goto dlfw_fail;
>
> + if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> + ret = -EBUSY;
> + goto dlfw_fail;
> + }
> +
> /* reset desc and index */
> rtw_hci_setup(rtwdev);
>
> diff --git a/util.c b/util.c
> index ff3c269..fbd6599 100644
> --- a/util.c
> +++ b/util.c
> @@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
> {
> u32 cnt;
>
> - for (cnt = 0; cnt < 1000; cnt++) {
> + for (cnt = 0; cnt < 5000; cnt++) {
> if (rtw_read32_mask(rtwdev, addr, mask) == target)
> return true;
>
> - udelay(10);
> + udelay(50);

You have increased the maximum stall time from 10 msec to 250 msec. Do you
really need to lock up a CPU for that long? This is a place where you should
document how long it actually takes, if it really is more than 10 msec. On my
rtw8821ce card, the longest it took was 6.25 msec. The USB device will likely
take longer, but I would be interested in your worst case. FYI, I changed
check_hw_ready() to read

	for (cnt = 0; cnt < 5000; cnt++) {
		if (rtw_read32_mask(rtwdev, addr, mask) == target) {
			if (cnt > 50)
				pr_info("hw_ready at count %u\n", cnt);
			return true;
		}

		udelay(50);
	}
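The stall figures above follow from count × delay. Below is a minimal sketch in plain C with hypothetical helper names. It ignores register-read time, which understates the per-iteration cost on USB/SDIO.

```c
/* Worst-case time a check_hw_ready()-style loop spends waiting when the
 * target value never appears: every iteration misses and then delays.
 * Register-read time is ignored. */
static unsigned long poll_budget_us(unsigned long count, unsigned long delay_us)
{
	return count * delay_us;
}

/* Iterations needed to cover a given wait at a given per-iteration
 * delay, rounded up. */
static unsigned long iterations_for_us(unsigned long wait_us, unsigned long delay_us)
{
	return (wait_us + delay_us - 1) / delay_us;
}
```

With these, poll_budget_us(1000, 10) is 10,000 us and poll_budget_us(5000, 50) is 250,000 us. The 6.25 ms worst case on rtw8821ce would need iterations_for_us(6250, 10) = 625 iterations, which fits inside the current 1000-iteration budget if read time is negligible.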



> }
>
> return false;
> --

Thanks for working on this problem. I hope we can get the submitted patch into
good shape.

Larry


2023-07-17 02:23:38

by Ping-Ke Shih

Subject: RE: [PATCH] RTW88 firmware download issues - improvement, but not perfect



> -----Original Message-----
> From: Larry Finger <[email protected]> On Behalf Of Larry Finger
> Sent: Monday, July 17, 2023 2:02 AM
> To: Sean Mollet <[email protected]>; [email protected]
> Subject: Re: [PATCH] RTW88 firmware download issues - improvement, but not perfect
>
>
> Patches for the rtlwifi, rtw88, and rtw89 drivers should be addressed to Ping-Ke
> Shih <[email protected]>. Any patch for a driver in drivers/net/wireless/...
> should be sent to Kalle Valo <[email protected]>. Cc linux-wireless.

I subscribe to this mailing list and filter rtlwifi/rtw88/rtw89 mail specially, so
I don't think I will miss it. :-)

>
> The subject line should be "wifi: rtw88: ......"

I think the subject should use "RFC" instead of "PATCH".
> > @@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,
> >
> > wlan_cpu_enable(rtwdev, true);
> >
> > - if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> > - ret = -EBUSY;
> > - goto dlfw_fail;
> > - }
> > -
> > ret = download_firmware_validate(rtwdev);
> > if (ret)
> > goto dlfw_fail;
> >
> > + if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> > + ret = -EBUSY;
> > + goto dlfw_fail;
> > + }
> > +

It looks reasonable to restore 0x38 after validating the firmware. Do you have results
showing how much this change improves things?

> > /* reset desc and index */
> > rtw_hci_setup(rtwdev);
> >
> > diff --git a/util.c b/util.c
> > index ff3c269..fbd6599 100644
> > --- a/util.c
> > +++ b/util.c
> > @@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
> > {
> > u32 cnt;
> >
> > - for (cnt = 0; cnt < 1000; cnt++) {
> > + for (cnt = 0; cnt < 5000; cnt++) {
> > if (rtw_read32_mask(rtwdev, addr, mask) == target)
> > return true;
> >
> > - udelay(10);
> > + udelay(50);

I looked into the latest vendor driver: it uses a count of 10,000 and a delay
of 50 us, the same delay as your change.

>
> You have increased the maximum stall time from 10 msec to 250 msec. Do you
> really need to lock up a CPU for that long? This is a place where you should
> document how long it actually takes, if it really is more than 10 msec. On my
> rtw8821ce card, the longest it took was 6.25 msec. The USB device will likely
> take longer, but I would be interested in your worst case.

Maybe we can set cnt/udelay according to 'rtwdev->hci.type', and change udelay()
to fsleep() if all callers run in thread (sleepable) context.
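Here is one way to read the suggestion, sketched as userspace C under stated assumptions. The bus type selects the poll budget, and the delay is a callback, so in the kernel it could be fsleep(), which is only valid if every caller can sleep. The enum hci_kind, poll_params_for(), and the numbers are hypothetical illustrations, not rtw88 code. The fake register and fake delay exist only to exercise the loop.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bus types behind rtwdev->hci.type. */
enum hci_kind { HCI_PCIE, HCI_USB, HCI_SDIO };

struct poll_params {
	unsigned int count;
	unsigned int delay_us;
};

/* Illustrative budgets only: PCIe keeps today's 1000 x 10 us, the
 * slower buses get the 5000 x 50 us from the proposed patch. */
static struct poll_params poll_params_for(enum hci_kind hci)
{
	if (hci == HCI_PCIE)
		return (struct poll_params){ .count = 1000, .delay_us = 10 };
	return (struct poll_params){ .count = 5000, .delay_us = 50 };
}

/* Poll until (read_reg(ctx) & mask) == target or the bus-specific budget
 * runs out. delay(us) is where a sleeping wait such as fsleep() would go
 * in the kernel. The masked value is compared without shifting. */
static bool poll_ready(enum hci_kind hci, uint32_t (*read_reg)(void *ctx),
		       void (*delay)(unsigned int us), void *ctx,
		       uint32_t mask, uint32_t target)
{
	struct poll_params p = poll_params_for(hci);

	for (unsigned int cnt = 0; cnt < p.count; cnt++) {
		if ((read_reg(ctx) & mask) == target)
			return true;
		delay(p.delay_us);
	}
	return false;
}

/* Fake register for userspace testing: reads as 0 until `ready_after`
 * reads have happened, then returns `value`. */
struct fake_reg {
	unsigned int reads;
	unsigned int ready_after;
	uint32_t value;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	return r->reads++ >= r->ready_after ? r->value : 0;
}

/* Fake delay that only accounts for the time it would have slept. */
static unsigned long fake_slept_us;

static void fake_delay(unsigned int us)
{
	fake_slept_us += us;
}
```

For example, with a fake register that becomes ready on the fourth read, poll_ready(HCI_USB, ...) returns true after three 50 us delays. A per-bus budget avoids stretching the PCIe stall while giving USB the longer window being tested in this thread.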

Another note: check_hw_ready() is also used in many other places.

> FYI, I changed
> check_hw_ready() to read
>
> for (cnt = 0; cnt < 5000; cnt++) {
> if (rtw_read32_mask(rtwdev, addr, mask) == target) {
> if (cnt > 50)
> pr_info("hw_ready at count %d\n", cnt);
> return true;
> }
>
> udelay(50);

This looks weird. If udelay() isn't in the loop, a PCIe device can run through it quickly
and get a "not ready" result. But for slow-I/O USB/SDIO, this might be fine.

Ping-Ke

2023-07-17 02:30:13

by Sean Mollet

Subject: [RFC] RTW88 firmware download issues - improvement, but not perfect


On Jul 16, 2023, at 9:05 PM, Ping-Ke Shih <[email protected]> wrote:
>
> 
>
>> -----Original Message-----
>> From: Larry Finger <[email protected]> On Behalf Of Larry Finger
>> Sent: Monday, July 17, 2023 2:02 AM
>> To: Sean Mollet <[email protected]>; [email protected]
>> Subject: Re: [PATCH] RTW88 firmware download issues - improvement, but not perfect
>>
>>
>> Patches for the rtlwifi, rtw88, and rtw89 drivers should be addressed to Ping-Ke
>> Shih <[email protected]>. Any patch for a driver in drivers/net/wireless/...
>> should be sent to Kalle Valo <[email protected]>. Cc linux-wireless.
>
> I subscribe this mailing list and treat rtlwifi/rtw88/rtw89 with special filter, so
> I think I don't miss this mail. :-)
>
>>
>> The subject line should be "wifi: rtw88: ......"
>
> I think this should be "RFC" instead of "PATCH" as subject.

Done.

>>> @@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,
>>>
>>> wlan_cpu_enable(rtwdev, true);
>>>
>>> - if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
>>> - ret = -EBUSY;
>>> - goto dlfw_fail;
>>> - }
>>> -
>>> ret = download_firmware_validate(rtwdev);
>>> if (ret)
>>> goto dlfw_fail;
>>>
>>> + if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
>>> + ret = -EBUSY;
>>> + goto dlfw_fail;
>>> + }
>>> +
>
> This looks reason to restore 0x38 after validating firmware. Do you have a result
> how this change can improve?
>

Using a Raspberry Pi Compute Module 4 as the host, this reduces failures from 1 in 5 to 1 in 20.

I don’t know why, but it makes a measurable difference.

>>> /* reset desc and index */
>>> rtw_hci_setup(rtwdev);
>>>
>>> diff --git a/util.c b/util.c
>>> index ff3c269..fbd6599 100644
>>> --- a/util.c
>>> +++ b/util.c
>>> @@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
>>> {
>>> u32 cnt;
>>>
>>> - for (cnt = 0; cnt < 1000; cnt++) {
>>> + for (cnt = 0; cnt < 5000; cnt++) {
>>> if (rtw_read32_mask(rtwdev, addr, mask) == target)
>>> return true;
>>>
>>> - udelay(10);
>>> + udelay(50);
>
> I look into the latest vendor driver, it shows that cnt becomes 10,000 and delay
> is 50us as your change.
Interesting. Is it possible that the real problem is simply not waiting long enough?

Can you share some details of what the chip is doing and how long it should take?


>
>>
>> You have increased the maximum stall time from 10 msec to 250 msec. Do you
>> really need to lock up a CPU for that long? This is a place where you should
>> document how long it actually takes, if it really is more than 10 msec. On my
>> rtw8821ce card, the longest it took was 6.25 msec. The USB device will likely
>> take longer, but I would be interested in your worst case.
>
> Maybe, we can set cnt/udelay according to 'rtwdev->hci.type', and change udelay()
> to fsleep() if all callers are running on thread context.
I like this. If we fsleep all the time, there’s no real cost to having a large ucnt.

>
> Another note is that check_hw_ready() is also used by many other places.
>
Yes, but nearly all of those calls return right away (I added prints to check this).

>> FYI, I changed
>> check_hw_ready() to read
>>
>> for (cnt = 0; cnt < 5000; cnt++) {
>> if (rtw_read32_mask(rtwdev, addr, mask) == target) {
>> if (cnt > 50)
>> pr_info("hw_ready at count %d\n", cnt);
>> return true;
>> }
>>
>> udelay(50);
>
> This looks weird. If udelay() isn't in loop, PCIE device can run quickly and get
> a result "not ready". But, for slow IO USB/SDIO, this might be fine.
>

When the firmware load succeeds, my average count is less than 10. Tomorrow I’ll try increasing ucnt to 10,000 to see whether some polls complete after more than 5000 iterations.
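To answer the question about the worst case rather than the average, the campaign could record the maximum count alongside the mean. Below is a sketch of such a tally. The struct ready_stats and its helpers are hypothetical and not part of rtw88; they could be fed from wherever the ready count is already being printed.

```c
/* Hypothetical tally of the loop count at which check_hw_ready() saw
 * the target value, so the worst case can be reported, not only the
 * average. */
struct ready_stats {
	unsigned long samples;
	unsigned long sum;
	unsigned int max;
};

static void ready_stats_add(struct ready_stats *s, unsigned int cnt)
{
	s->samples++;
	s->sum += cnt;
	if (cnt > s->max)
		s->max = cnt;
}

/* Mean count, or 0.0 if no samples were recorded. */
static double ready_stats_mean(const struct ready_stats *s)
{
	return s->samples ? (double)s->sum / (double)s->samples : 0.0;
}

/* Worst observed wait: a hit at count N followed N delays. Assumes a
 * fixed per-iteration delay and negligible register-read time. */
static unsigned long ready_stats_worst_us(const struct ready_stats *s,
					  unsigned int delay_us)
{
	return (unsigned long)s->max * delay_us;
}
```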

> Ping-Ke

Sean

>

2023-07-17 09:14:05

by Ping-Ke Shih

Subject: RE: [RFC] RTW88 firmware download issues - improvement, but not perfect



> -----Original Message-----
> From: Sean Mollet <[email protected]>
> Sent: Monday, July 17, 2023 10:24 AM
> To: Ping-Ke Shih <[email protected]>
> Cc: Larry Finger <[email protected]>; [email protected]
> Subject: [RFC] RTW88 firmware download issues - improvement, but not perfect
>
> On Jul 16, 2023, at 9:05 PM, Ping-Ke Shih <[email protected]> wrote:
> >
> >
> >
> >>> @@ -794,15 +794,15 @@ static int __rtw_download_firmware(struct rtw_dev *rtwdev,
> >>>
> >>> wlan_cpu_enable(rtwdev, true);
> >>>
> >>> - if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> >>> - ret = -EBUSY;
> >>> - goto dlfw_fail;
> >>> - }
> >>> -
> >>> ret = download_firmware_validate(rtwdev);
> >>> if (ret)
> >>> goto dlfw_fail;
> >>>
> >>> + if (!ltecoex_reg_write(rtwdev, 0x38, ltecoex_bckp)) {
> >>> + ret = -EBUSY;
> >>> + goto dlfw_fail;
> >>> + }
> >>> +
> >
> > This looks reason to restore 0x38 after validating firmware. Do you have a result
> > how this change can improve?
> >
>
> Using a Pi 4 CM as host, this reduces failures from 1 in 5 to 1 in 20.
>
> I don’t know why, but it makes a measurable difference.

I will check this with my colleague to see if we can apply this change.

>
> >>> /* reset desc and index */
> >>> rtw_hci_setup(rtwdev);
> >>>
> >>> diff --git a/util.c b/util.c
> >>> index ff3c269..fbd6599 100644
> >>> --- a/util.c
> >>> +++ b/util.c
> >>> @@ -10,11 +10,11 @@ bool check_hw_ready(struct rtw_dev *rtwdev, u32 addr, u32 mask, u32 target)
> >>> {
> >>> u32 cnt;
> >>>
> >>> - for (cnt = 0; cnt < 1000; cnt++) {
> >>> + for (cnt = 0; cnt < 5000; cnt++) {
> >>> if (rtw_read32_mask(rtwdev, addr, mask) == target)
> >>> return true;
> >>>
> >>> - udelay(10);
> >>> + udelay(50);
> >
> > I look into the latest vendor driver, it shows that cnt becomes 10,000 and delay
> > is 50us as your change.
> Interesting. Is it possible that the real problem is simply not waiting long enough?
>
> Can you share some details of what the chip is doing and how long it should take?
>

It seems I misread the code; the latest version uses 5,000, as you mentioned.

If polling for ready fails, please read and print out registers 0x1C4 and 0x10FC
20 times, with a delay of 1 ms or more between reads. These hold a firmware PC-like
address, so we can check whether the firmware is running or stuck.
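Here is a sketch of the requested dump, written as userspace C with callbacks so it can be tried without hardware. In the driver, read32 would presumably be rtw_read32(rtwdev, addr) and the delay msleep(). The REG_DBG_* names and the fake device are illustrative assumptions, not rtw88 definitions. Call it with rounds = 20 and gap_ms = 1 (or more) to match the request.

```c
#include <stdint.h>
#include <stdio.h>

/* Registers requested above; the macro names are illustrative. */
#define REG_DBG_1C4  0x1C4u
#define REG_DBG_10FC 0x10FCu

/* Sample 0x1C4 and 0x10FC `rounds` times, calling delay_ms() between
 * samples. Each pair is printed; the 0x10FC values are also stored in
 * pcs[] when pcs is non-NULL. */
static void dump_fw_pc(uint32_t (*read32)(void *ctx, uint32_t addr),
		       void (*delay_ms)(void *ctx, unsigned int ms), void *ctx,
		       unsigned int rounds, unsigned int gap_ms, uint32_t *pcs)
{
	for (unsigned int i = 0; i < rounds; i++) {
		uint32_t r1c4 = read32(ctx, REG_DBG_1C4);
		uint32_t r10fc = read32(ctx, REG_DBG_10FC);

		printf("%u/%u 0x1C4: %x, 0x10FC: %x\n", i, rounds,
		       (unsigned int)r1c4, (unsigned int)r10fc);
		if (pcs)
			pcs[i] = r10fc;
		if (i + 1 < rounds)
			delay_ms(ctx, gap_ms);
	}
}

/* Fake device for trying the helper in userspace. */
struct fake_dev {
	unsigned int pc_reads;
	unsigned int slept_ms;
};

static uint32_t fake_read32(void *ctx, uint32_t addr)
{
	struct fake_dev *d = ctx;

	if (addr == REG_DBG_1C4)
		return 0xfe000000u;
	return 0x800350a0u + d->pc_reads++ % 4;
}

static void fake_delay_ms(void *ctx, unsigned int ms)
{
	struct fake_dev *d = ctx;

	d->slept_ms += ms;
}
```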

Ping-Ke

2023-07-17 18:54:53

by Sean Mollet

Subject: Re: [RFC] RTW88 firmware download issues - improvement, but not perfect

I added a check of those two registers and rebooted 20 units 10 times. The failure rate was consistent, and the traces all looked like this:

[ 30.227933] rtx_88 failed in download_firmware_validate
Support information:
[ 30.228392] rtx_88 0/50 0x1C4: 0, 0x10FC: 0
[ 30.244149] rtx_88 1/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.251142] rtx_88 2/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.258269] rtx_88 3/50 0x1C4: fe000000, 0x10FC: 800350f5
[ 30.265399] rtx_88 4/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.272388] rtx_88 5/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.279387] rtx_88 6/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.286387] rtx_88 7/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.293392] rtx_88 8/50 0x1C4: fe000000, 0x10FC: 800350f5
[ 30.300386] rtx_88 9/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.307387] rtx_88 10/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.314518] rtx_88 11/50 0x1C4: fe000000, 0x10FC: 800350a5
[ 30.321654] rtx_88 12/50 0x1C4: fe000000, 0x10FC: 800350a6
[ 30.329913] rtx_88 13/50 0x1C4: fe000000, 0x10FC: 800350f6
[ 30.338722] rtx_88 14/50 0x1C4: fe000000, 0x10FC: 800350a6

The pattern and addresses continue and are the same on every device that fails. Taking your statement that 0x10FC is a PC-like register, it looks like the firmware is caught in an infinite loop.
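The 0x10FC samples above can be summarized mechanically. The helper below is hypothetical plain C, not driver code. It computes the lowest and highest sampled address and the number of distinct values. For samples 1 through 14, it gives four distinct addresses between 0x800350a5 and 0x800350f6, a span of 0x51 (81) bytes. That is consistent with the firmware spinning in a short loop. However, samples taken roughly 7 ms apart can alias, and they do not show what the loop is waiting for.

```c
#include <stddef.h>
#include <stdint.h>

struct pc_span {
	uint32_t lo;
	uint32_t hi;
	size_t distinct;
};

/* Range and distinct-value count of sampled PC-like addresses.
 * For n == 0 the result has distinct == 0 and lo > hi. */
static struct pc_span pc_span_of(const uint32_t *pc, size_t n)
{
	struct pc_span s = { UINT32_MAX, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		int seen = 0;

		/* Quadratic scan is fine for a few dozen samples. */
		for (size_t j = 0; j < i; j++) {
			if (pc[j] == pc[i]) {
				seen = 1;
				break;
			}
		}
		if (!seen)
			s.distinct++;
		if (pc[i] < s.lo)
			s.lo = pc[i];
		if (pc[i] > s.hi)
			s.hi = pc[i];
	}
	return s;
}
```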


Sean

