2016-05-18 17:57:36

by Daniel Lenski

[permalink] [raw]
Subject: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Here is a patch to increase the polling timeout for rtl8xxxu firmware
startup.

This patch now applies cleanly to Jes Sorensen's rtl8xxxu-devel tree
(86c89dd4782030c4f9e0c82424a8b92f9fdb35aa).

The patch to make this a module parameter was removed.

Dan Lenski (1):
rtl8xxxu: Increase default polling timeout for firmware startup

drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--
2.8.2



2016-05-21 02:40:42

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

On Fri, May 20, 2016 at 7:08 AM, Jes Sorensen <[email protected]> wrote:
>
> Daniel Lenski <[email protected]> writes:
> > Unfortunately, I ran into a case today where even 5000 loops was not
> > enough after a cold boot. 5000 loops meant about 1.5 second delay
> > between finishing the firmware checksum poll, while waiting for the
> > firmware to start. It now appears to me that the number of required
> > polling loops must be strongly bimodal.
> >
> > I added some logging, so that the driver reports to me the number of
> > loops required for the firmware to start.
>
> This is bizarre, I wonder if the hardware is having issues in your
> laptop? Another thing to try would be to do an additional reset of the
> chip and wait a bit before trying to load the firmware?

I tried various versions of running rtl8xxxu_init_device in a loop, with
delays in between retries, and did not have any success.

If the device doesn't want to start on the first load after boot, running
various parts of init over and over just doesn't fix it. But unloading the
driver and reloading does seem to fix it.

So then I wondered...

- Why does the firmware always (?) start on the *second try* of
loading rtl8xxxu, even if it failed to load after thousands of loops on
the first load attempt.
- What would be the difference between the two cases?
- As far as I understand it, the main effect on the hardware of unloading
the driver and then reloading it is that the device is power-cycled
(rtl8xxxu_power_off in rtl8xxxu_disable_device).
- Is it possible that the device sometimes starts up in an unknown state
after a cold boot?
- Hypothesis: since the rtl8xxxu driver does not explicitly power off the
device before attempting to power it on, if it boots up in an unknown
state, it will remain in this state until explicitly power-cycled.

So then I tried powering off the device for 500ms after a failure in
rtl8xxxu_init_device, before a retry:

for (retry=5; retry>=0; retry--) {
ret = rtl8xxxu_init_device(hw);
if (ret==0) {
break;
} else {
dev_err(&udev->dev, "Failed to init device,
will retry %d more times.\n", retry);
if (retry==0)
goto exit;
else {
/* power off for 500 ms before retry */
rtl8xxxu_power_off(priv);
msleep(500);
}
}
}

So far, this always seems to work on the second try, even with a very
short firmware_poll_max (50).

I even tried forcing 50 power-cycles and inits in a row, and the firmware
still starts up on the 51st cycle, and everything works fine.

I don't understand *why* this works, but it seems like it might be a
more reliable solution, since it addresses the experience common to
the multiple bug reports, wherein the failure is only on the first attempt
after cold boot.

Thanks,
Dan

2016-05-20 20:51:19

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Jes Sorensen <[email protected]> writes:
> > Unfortunately, I ran into a case today where even 5000 loops was not
> > enough after a cold boot. 5000 loops meant about 1.5 second delay
> > between finishing the firmware checksum poll, while waiting for the
> > firmware to start. It now appears to me that the number of required
> > polling loops must be strongly bimodal.
> >
> > I added some logging, so that the driver reports to me the number of
> > loops required for the firmware to start.
>
> This is bizarre, I wonder if the hardware is having issues in your
> laptop?

The other bug reports I linked to seem identical to what's happening to
mine, so I doubt it is an issue with one specific instance of the hardware.

> Another thing to try would be to do an additional reset of the
> chip and wait a bit before trying to load the firmware?

Thanks, I will give this a shot.

Am I understanding your idea correctly? To put a loop around the 8051
reset and the firmware polling loop, with a delay between failure and retry?

for (int retry=1; ii<=MAX_RETRIES; ii++) {
for (retry=1; retry<=5; retry++) {
/* Reset the 8051 */
priv->fops->reset_8051(priv);

/* Wait for firmware to become ready */
for (i=0; i<RTL8XXXXU_FIRMWARE_POLL_MAX; i++) {
...
}

/* On failure, insert a delay before trying again */
if (i==RTL8XXXU_FIRMWARE_POLL_MAX)
msleep(500);
}
}

Thanks,
Dan

2016-05-23 19:24:34

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Daniel Lenski <[email protected]> writes:
> I tried various versions of running rtl8xxxu_init_device in a loop, with
> delays in between retries, and did not have any success.
>
> If the device doesn't want to start on the first load after boot, running
> various parts of init over and over just doesn't fix it. But unloading the
> driver and reloading does seem to fix it.
>
> So then I wondered...
>
> - Why does the firmware always (?) start on the *second try* of
> loading rtl8xxxu, even if it failed to load after thousands of loops on
> the first load attempt.
> - What would be the difference between the two cases?
> - As far as I understand it, the main effect on the hardware of unloading
> the driver and then reloading it is that the device is power-cycled
> (rtl8xxxu_power_off in rtl8xxxu_disable_device).
> - Is it possible that the device sometimes starts up in an unknown state
> after a cold boot?
> - Hypothesis: since the rtl8xxxu driver does not explicitly power off the
> device before attempting to power it on, if it boots up in an unknown
> state, it will remain in this state until explicitly power-cycled.
>
> So then I tried powering off the device for 500ms after a failure in
> rtl8xxxu_init_device, before a retry:
>
> for (retry=5; retry>=0; retry--) {
> ret = rtl8xxxu_init_device(hw);
> if (ret==0) {
> break;
> } else {
> dev_err(&udev->dev, "Failed to init device,
> will retry %d more times.\n", retry);
> if (retry==0)
> goto exit;
> else {
> /* power off for 500 ms before retry */
> rtl8xxxu_power_off(priv);
> msleep(500);
> }
> }
> }
>
> So far, this always seems to work on the second try, even with a very
> short firmware_poll_max (50).
>
> I even tried forcing 50 power-cycles and inits in a row, and the firmware
> still starts up on the 51st cycle, and everything works fine.
>
> I don't understand *why* this works, but it seems like it might be a
> more reliable solution, since it addresses the experience common to
> the multiple bug reports, wherein the failure is only on the first attempt
> after cold boot.

Interesting, so if I understand you correctly, if you run
rtl8xxxu_power_off() once, the driver comes up correctly?

It is possible that rtl8xxxu_power_off() resets something that isn't being
initialized normally in the init sequence. It would be interesting to
try to break it down to find out which piece of the _power_off() code
we're missing.

Nice work!

Cheers,
Jes

2016-05-21 03:13:12

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Daniel Lenski <[email protected]> writes:
> On Fri, May 20, 2016 at 1:50 PM, Daniel Lenski <[email protected]> wrote:
>> Am I understanding your idea correctly? To put a loop around the 8051
>> reset and the firmware polling loop, with a delay between failure and retry?
>
> That was some sloppy pseudo-code by me... here's what I think you're
> suggesting:

Actually I was more thinking along resetting it properly before tryinhg
to launch the firmware. Ie. before enabling execution.

Cheers,
Jes

2016-05-20 04:52:25

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

You're welcome, and thanks for writing this great driver. It really
makes a huge difference for stability of the Yoga 13 wifi!

Unfortunately, I ran into a case today where even 5000 loops was not
enough after a cold boot. 5000 loops meant about 1.5 second delay
between finishing the firmware checksum poll, while waiting for the
firmware to start. It now appears to me that the number of required
polling loops must be strongly bimodal.

I added some logging, so that the driver reports to me the number of
loops required for the firmware to start.

[ 7.726197] usb 1-1.4: Vendor: Realtek
[ 7.726200] usb 1-1.4: Product: 802.11n WLAN Adapter
[ 7.726203] usb 1-1.4: RTL8723AU rev B (TSMC) 1T1R, TX queues 2,
WiFi=1, BT=1, GPS=0, HI PA=0
[ 7.726204] usb 1-1.4: RTL8723AU MAC: 20:16:d8:ce:5e:29
[ 7.726206] usb 1-1.4: rtl8xxxu: Loading firmware
rtlwifi/rtl8723aufw_B_NoBT.bin
[ 7.747313] usb 1-1.4: Firmware revision 31.0 (signature 0x2302)
[ 7.842450] usb 1-1.4: Firmware checksum poll completed after 0 loops
[ 9.271023] usb 1-1.4: Firmware failed to start after 5000 loops
[ 9.271026] usb 1-1.4: Enabling HT_20_40 on the 2.4GHz band

After manual rmmod and modprobe, the firmware startup then completes
in a small number of loops (23 is most common):

[ 207.334435] usbcore: deregistering interface driver rtl8xxxu
[ 207.337394] usb 1-1.4: rtl8xxxu_int_complete: Error -108
[ 207.342254] usb 1-1.4: disconnecting
[ 207.586628] usb 1-1.4: Vendor: Realtek
[ 207.586638] usb 1-1.4: Product: 802.11n WLAN Adapter
[ 207.586646] usb 1-1.4: RTL8723AU rev B (TSMC) 1T1R, TX queues 2,
WiFi=1, BT=1, GPS=0, HI PA=0
[ 207.586651] usb 1-1.4: RTL8723AU MAC: 20:16:d8:ce:5e:29
[ 207.586657] usb 1-1.4: rtl8xxxu: Loading firmware
rtlwifi/rtl8723aufw_B_NoBT.bin
[ 207.586748] usb 1-1.4: Firmware revision 31.0 (signature 0x2302)
[ 207.669288] usb 1-1.4: Firmware checksum poll completed after 0 loops
[ 207.675421] usb 1-1.4: Firmware ready to start after 23 loops
[ 208.118745] usb 1-1.4: Enabling HT_20_40 on the 2.4GHz band

After wakeup from suspend, the firmware startup again completes in 23 loops:

[ 281.660333] usb 1-1.4: Vendor: Realtek
[ 281.660337] usb 1-1.4: Product: 802.11n WLAN Adapter
[ 281.660343] usb 1-1.4: RTL8723AU rev B (TSMC) 1T1R, TX queues 2,
WiFi=1, BT=1, GPS=0, HI PA=0
[ 281.660347] usb 1-1.4: RTL8723AU MAC: 20:16:d8:ce:5e:29
[ 281.660351] usb 1-1.4: rtl8xxxu: Loading firmware
rtlwifi/rtl8723aufw_B_NoBT.bin
[ 281.660369] usb 1-1.4: Firmware revision 31.0 (signature 0x2302)
[ 281.742107] usb 1-1.4: Firmware checksum poll completed after 0 loops
[ 281.748214] usb 1-1.4: Firmware ready to start after 23 loops
[ 282.191088] usb 1-1.4: Enabling HT_20_40 on the 2.4GHz band
[ 282.192740] PM: Finishing wakeup.

I would like to do some more testing and logging try to better map the
bimodal distribution, but I can't figure out a way to automatically
produce a large number of cold boots.

Thanks,
Dan

On May 18, 2016 7:08 PM, "Jes Sorensen" <[email protected]> wrote:
>
> Dan Lenski <[email protected]> writes:
> > Here is a patch to increase the polling timeout for rtl8xxxu firmware
> > startup.
> >
> > This patch now applies cleanly to Jes Sorensen's rtl8xxxu-devel tree
> > (86c89dd4782030c4f9e0c82424a8b92f9fdb35aa).
> >
> > The patch to make this a module parameter was removed.
> >
> > Dan Lenski (1):
> > rtl8xxxu: Increase default polling timeout for firmware startup
> >
> > drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Dan,
>
> Thanks for the patch, I'll apply it shortly.
>
> Cheers,
> Jes

2016-05-31 18:00:08

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

On Tue, May 31, 2016 at 8:04 AM, Jes Sorensen <[email protected]> wrote:
> Given your follow-on discoveries, do you still want me to apply the poll
> timeout patch, or should we wait until we nail down the reset code issue
> first?

No, I don't think the timeout patch is worth appying. I believe it
was just dumb luck that it worked when I initially tested it,
because I did not understand how to reproducibly generate the
incomplete-reset condition then.

One the other hand, the power-cycle-and-retry approach has been 100%
reliable for me. It has never failed to bring up the device in the
last 8-10 days and probably a few dozen reboots.

The patch for this could be applied as-is and I expect it will fix the
issue for other users who've reported it, though I understand that
you're interested in identifying the root cause issue with the reset
sequence as well.

-Dan

ps- Just checked my dmesg. Sure enough, worked on the second try today too:

[ 7.802484] usb 1-1.4: RTL8723AU rev B (TSMC) 1T1R, TX queues 2,
WiFi=1, BT=1, GPS=0, HI PA=0
[ 7.802485] usb 1-1.4: RTL8723AU MAC: 20:16:d8:ce:5e:29
[ 7.802487] usb 1-1.4: rtl8xxxu: Loading firmware
rtlwifi/rtl8723aufw_B_NoBT.bin
[ 7.820331] usb 1-1.4: Firmware revision 31.0 (signature 0x2302)
[ 7.913493] usb 1-1.4: Firmware checksum poll completed after 0 loops
[ 9.381455] usb 1-1.4: Firmware failed to start after 5000 loops
[ 9.381460] usb 1-1.4: Failed to init device, will retry 5 more times.
[ 9.969912] usb 1-1.4: Firmware checksum poll completed after 0 loops
[ 9.976034] usb 1-1.4: Firmware started after 23 loops
[ 10.417327] usb 1-1.4: Enabling HT_20_40 on the 2.4GHz band

2016-05-23 21:48:04

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

On Mon, May 23, 2016 at 12:24 PM, Jes Sorensen <[email protected]> wrote:
> Interesting, so if I understand you correctly, if you run
> rtl8xxxu_power_off() once, the driver comes up correctly?

Right. If rtl8xxxu_init_device() fails, I simply call
rtl8xxxu_power_off(), and then rtl8xxxu_init_device() again. It never
fails on the second try.

for (retry=0; retry<2; retry++) {
ret = rtl8xxxu_init_device(hw);
if (ret==0) {
break;
} else if (retry==1) {
dev_err(&udev->dev, "Fatal - failed to init device.\n");
goto exit;
} else {
dev_warn(&udev->dev, "Failed to init device,
will power off and retry.\n");
rtl8xxxu_power_off(priv);
msleep(50);
}
}

> It is possible that rtl8xxxu_power_off() resets something that isn't being
> initialized normally in the init sequence. It would be interesting to
> try to break it down to find out which piece of the _power_off() code
> we're missing.

Here's the evidence that I have:

1. I can reliably induce the failure-to-start condition by warm-booting via
kexec --force, without rtl8xxxu_power_off()

2. I can reliably recover from this artificially-induced failure-to-start using
rtl8xxxu_power_off(), and it appears to work for the "naturally occurring"
failure-to-start too.

3. I can *not* recover from artificially induced failure-to-start by lengthening
the polling timeout.

(Which means that my previous patch to lengthen the timeout was
bogus, and it must have just worked by luck or some detail of how I was
rebooting while testing it.)

> Nice work!

Thanks. If someone else is able to reproduce the artificial failure-to-start,
I'll be more confident that I've actually solved it this time.

Thanks,
Dan

2016-05-31 18:24:02

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Daniel Lenski <[email protected]> writes:
> On Tue, May 31, 2016 at 8:04 AM, Jes Sorensen <[email protected]> wrote:
>> Given your follow-on discoveries, do you still want me to apply the poll
>> timeout patch, or should we wait until we nail down the reset code issue
>> first?
>
> No, I don't think the timeout patch is worth appying. I believe it
> was just dumb luck that it worked when I initially tested it,
> because I did not understand how to reproducibly generate the
> incomplete-reset condition then.
>
> One the other hand, the power-cycle-and-retry approach has been 100%
> reliable for me. It has never failed to bring up the device in the
> last 8-10 days and probably a few dozen reboots.
>
> The patch for this could be applied as-is and I expect it will fix the
> issue for other users who've reported it, though I understand that
> you're interested in identifying the root cause issue with the reset
> sequence as well.

Thanks for the update!

I would like if you could try to break down the power_off sequence of
code to try to figure out exactly what part of it does the trick.

Maybe try if this sequence does the trick first:

/* Reset MCU IO Wrapper */
val8 = rtl8xxxu_read8(priv, REG_RSV_CTRL + 1);
val8 &= ~BIT(0);
rtl8xxxu_write8(priv, REG_RSV_CTRL + 1, val8);

val8 = rtl8xxxu_read8(priv, REG_RSV_CTRL + 1);
val8 |= BIT(0);
rtl8xxxu_write8(priv, REG_RSV_CTRL + 1, val8);

/* RSV_CTRL 0x1C[7:0] = 0x0e lock ISO/CLK/Power control register */
rtl8xxxu_write8(priv, REG_RSV_CTRL, 0x0e);

If that doesn't do it, try to call rtl8xxxu_emu_to_disabled() instead;

Cheers,
Jes

2016-05-18 17:57:37

by Daniel Lenski

[permalink] [raw]
Subject: [PATCH] rtl8xxxu: Increase default polling timeout for firmware startup

This patch increases the default value for the maximum number of polling
loops to wait for the rtl8xxxu MCU to start after the firmware is loaded
(from 1000 to 5000).

With RTL8723AU chipset, I frequently encounter "Firmware failed to start"
errors from rtl8xxxu after a cold boot.

It appears that other chipsets supported by the driver have the same
problem. Here are a couple of relevant bug reports:
- http://ubuntuforums.org/showthread.php?t=2321756
- https://www.mail-archive.com/[email protected]/msg4942468.html

This issue seems to occur because RTL8XXXU_FIRMWARE_POLL_MAX (1000) is
too short, and the MCU fails to start up as quickly as expected.

With a longer value (5000), the driver starts up consistently and
successfully after cold-boot.

Signed-off-by: Dan Lenski <[email protected]>
---
drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
index 870c9cd..972767f 100644
--- a/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
+++ b/drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h
@@ -55,7 +55,7 @@
#define TX_PAGE_NUM_NORM_PQ_8192E 0x00

#define RTL_FW_PAGE_SIZE 4096
-#define RTL8XXXU_FIRMWARE_POLL_MAX 1000
+#define RTL8XXXU_FIRMWARE_POLL_MAX 5000

#define RTL8723A_CHANNEL_GROUPS 3
#define RTL8723A_MAX_RF_PATHS 2
--
2.8.2


2016-05-20 20:54:51

by Daniel Lenski

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

On Fri, May 20, 2016 at 1:50 PM, Daniel Lenski <[email protected]> wrote:
> Am I understanding your idea correctly? To put a loop around the 8051
> reset and the firmware polling loop, with a delay between failure and retry?

That was some sloppy pseudo-code by me... here's what I think you're
suggesting:

for (retry=1; retry<=5; retry++) {
/* Reset the 8051 */
priv->fops->reset_8051(priv);

/* Wait for firmware to become ready */
for (i=0; i<RTL8XXXXU_FIRMWARE_POLL_MAX; i++) {
...
}

/* On failure, insert a delay before trying again */
if (i==RTL8XXXU_FIRMWARE_POLL_MAX)
msleep(500);
}

Thanks,
Dan

2016-05-31 15:04:03

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Jes Sorensen <[email protected]> writes:
> Dan Lenski <[email protected]> writes:
>> Here is a patch to increase the polling timeout for rtl8xxxu firmware
>> startup.
>>
>> This patch now applies cleanly to Jes Sorensen's rtl8xxxu-devel tree
>> (86c89dd4782030c4f9e0c82424a8b92f9fdb35aa).
>>
>> The patch to make this a module parameter was removed.
>>
>> Dan Lenski (1):
>> rtl8xxxu: Increase default polling timeout for firmware startup
>>
>> drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Dan,
>
> Thanks for the patch, I'll apply it shortly.

Dan,

Given your follow-on discoveries, do you still want me to apply the poll
timeout patch, or should we wait until we nail down the reset code issue
first?

Cheers,
Jes

2016-05-20 14:08:32

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Daniel Lenski <[email protected]> writes:
> You're welcome, and thanks for writing this great driver. It really
> makes a huge difference for stability of the Yoga 13 wifi!

You're welcome! It made a big difference for my own Yoga 13 and it's
been an exciting project. I learned a lot in the process.

> Unfortunately, I ran into a case today where even 5000 loops was not
> enough after a cold boot. 5000 loops meant about 1.5 second delay
> between finishing the firmware checksum poll, while waiting for the
> firmware to start. It now appears to me that the number of required
> polling loops must be strongly bimodal.
>
> I added some logging, so that the driver reports to me the number of
> loops required for the firmware to start.

This is bizarre, I wonder if the hardware is having issues in your
laptop? Another thing to try would be to do an additional reset of the
chip and wait a bit before trying to load the firmware?

Cheers,
Jes

2016-05-19 02:08:55

by Jes Sorensen

[permalink] [raw]
Subject: Re: [PATCH] rtl8xxxu: increase polling timeout for firmware startup

Dan Lenski <[email protected]> writes:
> Here is a patch to increase the polling timeout for rtl8xxxu firmware
> startup.
>
> This patch now applies cleanly to Jes Sorensen's rtl8xxxu-devel tree
> (86c89dd4782030c4f9e0c82424a8b92f9fdb35aa).
>
> The patch to make this a module parameter was removed.
>
> Dan Lenski (1):
> rtl8xxxu: Increase default polling timeout for firmware startup
>
> drivers/net/wireless/realtek/rtl8xxxu/rtl8xxxu.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)

Dan,

Thanks for the patch, I'll apply it shortly.

Cheers,
Jes