2014-04-08 15:28:26

by Yegor Yefremov

[permalink] [raw]
Subject: wl1271: tx stuck in firmware

I'm using kernel 3.14.0-rc4-12738-g674748b-dirty on a am335x-evmsk as
also on a custom board. I get following issue on both systems, when
working with various calibrator functions:

# calibrator wlan0 plt power_mode on
[ 252.935208] wlcore: power up
[ 253.502531] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
# [ 399.656192] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
[ 399.663410] ------------[ cut here ]------------
[ 399.668606] WARNING: CPU: 0 PID: 761 at
drivers/net/wireless/ti/wlcore/main.c:789
wl12xx_queue_recovery_work+0x60/0x6c()
[ 399.680181] Modules linked in:
[ 399.683465] CPU: 0 PID: 761 Comm: kworker/u2:2 Not tainted
3.14.0-rc4-12738-g674748b-dirty #247
[ 399.692767] Workqueue: phy0 wl12xx_tx_watchdog_work
[ 399.698090] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[ 399.706394] [<c0011e38>] (show_stack) from [<c05c56b0>]
(dump_stack+0x7c/0x94)
[ 399.714143] [<c05c56b0>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[ 399.722708] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[ 399.732100] [<c0040334>] (warn_slowpath_null) from [<c038de00>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[ 399.742043] [<c038de00>] (wl12xx_queue_recovery_work) from
[<c038df18>] (wl12xx_tx_watchdog_work+0x10c/0x140)
[ 399.752623] [<c038df18>] (wl12xx_tx_watchdog_work) from
[<c0058fac>] (process_one_work+0x1ac/0x4c4)
[ 399.762279] [<c0058fac>] (process_one_work) from [<c0059e40>]
(worker_thread+0x114/0x3b4)
[ 399.771033] [<c0059e40>] (worker_thread) from [<c005f9fc>]
(kthread+0xcc/0xe8)
[ 399.778777] [<c005f9fc>] (kthread) from [<c000e388>]
(ret_from_fork+0x14/0x2c)
[ 399.786493] ---[ end trace 422890a806d48616 ]---

or following error, when I start NetworkManager and wait for 5 minutes or so:

[ 717.608550] wlcore: Scan completed due to error.
[ 717.613422] ------------[ cut here ]------------
[ 717.618405] WARNING: CPU: 0 PID: 940 at
drivers/net/wireless/ti/wlcore/main.c:789
wl12xx_queue_recovery_work+0x60/0x6c()
[ 717.629832] Modules linked in:
[ 717.633055] CPU: 0 PID: 940 Comm: kworker/u2:2 Tainted: G W
3.14.0-rc4-12737-gefb78a0-dirty #214
[ 717.643264] Workqueue: phy0 wl1271_scan_complete_work
[ 717.648649] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[ 717.656762] [<c0011e38>] (show_stack) from [<c05a1944>]
(dump_stack+0x7c/0x94)
[ 717.664372] [<c05a1944>] (dump_stack) from [<c00402bc>]
(warn_slowpath_common+0x6c/0x90)
[ 717.672876] [<c00402bc>] (warn_slowpath_common) from [<c004037c>]
(warn_slowpath_null+0x1c/0x24)
[ 717.682110] [<c004037c>] (warn_slowpath_null) from [<c035be0c>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[ 717.691889] [<c035be0c>] (wl12xx_queue_recovery_work) from
[<c036d804>] (wl1271_scan_complete_work+0xf8/0x108)
[ 717.702391] [<c036d804>] (wl1271_scan_complete_work) from
[<c0058ff4>] (process_one_work+0x1ac/0x4c4)
[ 717.712110] [<c0058ff4>] (process_one_work) from [<c0059e88>]
(worker_thread+0x114/0x3b4)
[ 717.720707] [<c0059e88>] (worker_thread) from [<c005fa44>]
(kthread+0xcc/0xe8)
[ 717.728306] [<c005fa44>] (kthread) from [<c000e388>]
(ret_from_fork+0x14/0x2c)
[ 717.735856] ---[ end trace f4d374bb2ca37664 ]---
[ 717.743587] wlcore: Hardware recovery in progress. FW ver: Rev 6.3.10.0.133
[ 717.752600] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 6
[ 717.759609] wlcore: down
[ 717.764108] ieee80211 phy0: Hardware restart was requested
[ 718.250930] wlcore: firmware booted (Rev 6.3.10.0.133)

Any idea, what could be wrong?

Regards,
Yegor


2014-04-09 13:25:55

by Arik Nemtsov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
<[email protected]> wrote:
> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <[email protected]> wrote:
> Both changes helped a lot. At least one can make a lot of tests
> without a crash, though I had one
>
> # calibrator wlan0 plt tune_channel 0 1
> # calibrator wlan0 plt tx_tone 1 9000
> [ 180.041505] wlcore: ERROR command complete timeout

It's a FW error.
Not sure which change you tried, but if it was one of Luca's, I guess
the FW doesn't like it when you ignore its requests for a dummy
packet. Did you try my change with just disabling the re-arm?

Arik

2014-04-09 11:49:43

by Luca Coelho

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, 2014-04-09 at 12:16 +0200, Yegor Yefremov wrote:
> On Wed, Apr 9, 2014 at 11:21 AM, Luca Coelho <[email protected]> wrote:
> > On Wed, 2014-04-09 at 11:14 +0200, Yegor Yefremov wrote:
> >> On Tue, Apr 8, 2014 at 10:29 PM, Luca Coelho <[email protected]> wrote:
> >> > On Tue, 2014-04-08 at 17:28 +0200, Yegor Yefremov wrote:
> >> >> Any idea, what could be wrong?
> >> >
> >> > First of all, try a clean kernel.
> >>
> >> It is not simple to get a clean kernel. "Official" support for
> >> am335x-evmsk WLAN was added first in the upcoming 3.15 kernel. I've
> >> tried the merge window version (3.14.0-12041-g75ff24f-dirty) and get
> >> following issue:
> >>
> >> # calibrator wlan0 plt power_mode on
> >> wlcore: power up
> >> wlcore: ERROR timeout waiting for the hardware to complete initialization
> >> wlcore: ERROR timeout waiting for the hardware to complete initialization
> >> wlcore: ERROR timeout waiting for the hardware to complete initialization
> >> wlcore: ERROR firmware boot in PLT mode failed despite 3 retries
> >> command failed: Input/output error (-5)
> >
> > Okay, the main question here is, did you make any changes in the
> > wlcore/wl12xx code? As I said, the TX watchdog should *not* be
> > initialized in PLT mode.
>
> I made not changes to any file in drivers/net/wireless/ti/
>
> > Maybe you could add some prints in the driver to see when the watchdog
> > timer is set up? Try to add a printk in every place where
> > wl12xx_rearm_tx_watchdog_locked() is called to see where it is being
> > armed.
>
> drivers/net/wireless/ti/wlcore/tx.c in wl1271_tx_allocate()
>
> # calibrator wlan0 plt power_mode on
> [ 93.107215] wlcore: power up
> [ 93.688714] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
> # [ 139.232802] YY: wl1271_tx_allocate
> [ 144.241849] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery

Okay, you have a starting point. :)

Now try to look from where this is being called and so on. I don't
think we should be calling wl1271_tx_allocate() in PLT mode either. Try
tracing back until you find where this is coming from.

--
Luca.


2014-04-09 15:29:19

by Yegor Yefremov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <[email protected]> wrote:
> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
> <[email protected]> wrote:
>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <[email protected]> wrote:
>> Both changes helped a lot. At least one can make a lot of tests
>> without a crash, though I had one
>>
>> # calibrator wlan0 plt tune_channel 0 1
>> # calibrator wlan0 plt tx_tone 1 9000
>> [ 180.041505] wlcore: ERROR command complete timeout
>
> It's a FW error.
> Not sure which change you tried, but if it was one of Luca's, I guess
> the FW doesn't like it when you ignore its requests for a dummy
> packet. Did you try my change with just disabling the re-arm?

I have tried both changes. At first disabling the re-arm, it comes
with the same crash as when I implement both changes at once.

Yegor

2014-04-09 12:42:06

by Luca Coelho

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, 2014-04-09 at 15:20 +0300, Arik Nemtsov wrote:
> >>
> >> # calibrator wlan0 plt power_mode on
> >> [ 93.107215] wlcore: power up
> >> [ 93.688714] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
> >> # [ 139.232802] YY: wl1271_tx_allocate
> >> [ 144.241849] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
> >
> > Okay, you have a starting point. :)
> >
> > Now try to look from where this is being called and so on. I don't
> > think we should be calling wl1271_tx_allocate() in PLT mode either. Try
> > tracing back until you find where this is coming from.
>
> Actually we've seen this bug before. At the time it was caused by FW
> requests for dummy packets during PLT mode, which forced us to issue
> Tx..

Ah, interesting, I hadn't heard about it. Would it send it even if the
event was masked out? Masking it out (if it's not already) for PLT would
probably be the right thing to do.


> It was fixed in the FW, but I'm not sure this was ever upstreamed
> (customer specific fix, etc). Anyway it's pretty safe to disable the
> Tx watchdog during PLT - try something like this:
>
> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
> {
> if (wl->plt)
> return;

Maybe this could be blocked even earlier, like here:

diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
index 1f9a360..c44d3c0 100644
--- a/drivers/net/wireless/ti/wlcore/event.c
+++ b/drivers/net/wireless/ti/wlcore/event.c
@@ -158,6 +158,9 @@ EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);

void wlcore_event_dummy_packet(struct wl1271 *wl)
{
+ if (wl->plt)
+ return;
+
wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
wl1271_tx_dummy_packet(wl);
}

--
Luca.


2014-04-10 07:43:39

by Arik Nemtsov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Thu, Apr 10, 2014 at 10:21 AM, Yegor Yefremov
<[email protected]> wrote:
> On Wed, Apr 9, 2014 at 5:28 PM, Yegor Yefremov
> <[email protected]> wrote:
>> On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <[email protected]> wrote:
>>> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
>>> <[email protected]> wrote:
>>>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <[email protected]> wrote:
>>>> Both changes helped a lot. At least one can make a lot of tests
>>>> without a crash, though I had one
>>>>
>>>> # calibrator wlan0 plt tune_channel 0 1
>>>> # calibrator wlan0 plt tx_tone 1 9000
>>>> [ 180.041505] wlcore: ERROR command complete timeout
>>>
>>> It's a FW error.
>>> Not sure which change you tried, but if it was one of Luca's, I guess
>>> the FW doesn't like it when you ignore its requests for a dummy
>>> packet. Did you try my change with just disabling the re-arm?
>>
>> I have tried both changes. At first disabling the re-arm, it comes
>> with the same crash as when I implement both changes at once.
>
> What to do with usual fw and scanning?
>
> # [ 165.608684] YY: wl1271_scan_complete_work
> [ 165.614218] wlcore: Scan completed due to error.

Ah you didn't mention you also had problems with a regular FW. It's
probably either an API sync issue between the FW and driver, or a
platform integration issue where you're sometimes not getting
interrupts.

Arik

2014-04-10 07:22:01

by Yegor Yefremov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 5:28 PM, Yegor Yefremov
<[email protected]> wrote:
> On Wed, Apr 9, 2014 at 3:25 PM, Arik Nemtsov <[email protected]> wrote:
>> On Wed, Apr 9, 2014 at 4:20 PM, Yegor Yefremov
>> <[email protected]> wrote:
>>> On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <[email protected]> wrote:
>>> Both changes helped a lot. At least one can make a lot of tests
>>> without a crash, though I had one
>>>
>>> # calibrator wlan0 plt tune_channel 0 1
>>> # calibrator wlan0 plt tx_tone 1 9000
>>> [ 180.041505] wlcore: ERROR command complete timeout
>>
>> It's a FW error.
>> Not sure which change you tried, but if it was one of Luca's, I guess
>> the FW doesn't like it when you ignore its requests for a dummy
>> packet. Did you try my change with just disabling the re-arm?
>
> I have tried both changes. At first disabling the re-arm, it comes
> with the same crash as when I implement both changes at once.

What to do with usual fw and scanning?

# [ 165.608684] YY: wl1271_scan_complete_work
[ 165.614218] wlcore: Scan completed due to error.
[ 165.619356] ------------[ cut here ]------------
[ 165.624315] WARNING: CPU: 0 PID: 761 at
drivers/net/wireless/ti/wlcore/main.c:803
wl12xx_queue_recovery_work+0x60/0x6c()
[ 165.635889] Modules linked in:
[ 165.639335] CPU: 0 PID: 761 Comm: kworker/u2:2 Not tainted
3.14.0-rc4-12738-g674748b-dirty #253
[ 165.648663] Workqueue: phy0 wl1271_scan_complete_work
[ 165.654087] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[ 165.662412] [<c0011e38>] (show_stack) from [<c05c5794>]
(dump_stack+0x7c/0x94)
[ 165.670162] [<c05c5794>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[ 165.678819] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[ 165.688213] [<c0040334>] (warn_slowpath_null) from [<c038de34>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[ 165.698165] [<c038de34>] (wl12xx_queue_recovery_work) from
[<c039f8d8>] (wl1271_scan_complete_work+0xf4/0x108)
[ 165.708911] [<c039f8d8>] (wl1271_scan_complete_work) from
[<c0058fac>] (process_one_work+0x1ac/0x4c4)
[ 165.718772] [<c0058fac>] (process_one_work) from [<c0059e40>]
(worker_thread+0x114/0x3b4)
[ 165.727531] [<c0059e40>] (worker_thread) from [<c005f9fc>]
(kthread+0xcc/0xe8)
[ 165.735187] [<c005f9fc>] (kthread) from [<c000e388>]
(ret_from_fork+0x14/0x2c)
[ 165.742915] ---[ end trace 9c62bb183fd26be3 ]---
[ 165.755662] wlcore: Hardware recovery in progress. FW ver: Rev 6.3.10.0.133
[ 165.769379] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 1
[ 165.778031] wlcore: down
[ 165.790946] ieee80211 phy0: Hardware restart was requested
[ 166.337122] wlcore: firmware booted (Rev 6.3.10.0.133)
[ 188.977108] YY: wl1271_scan_complete_work
[ 191.333964] YY: wl1271_scan_complete_work

2014-04-09 09:21:46

by Luca Coelho

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, 2014-04-09 at 11:14 +0200, Yegor Yefremov wrote:
> On Tue, Apr 8, 2014 at 10:29 PM, Luca Coelho <[email protected]> wrote:
> > On Tue, 2014-04-08 at 17:28 +0200, Yegor Yefremov wrote:
> >> Any idea, what could be wrong?
> >
> > First of all, try a clean kernel.
>
> It is not simple to get a clean kernel. "Official" support for
> am335x-evmsk WLAN was added first in the upcoming 3.15 kernel. I've
> tried the merge window version (3.14.0-12041-g75ff24f-dirty) and get
> following issue:
>
> # calibrator wlan0 plt power_mode on
> wlcore: power up
> wlcore: ERROR timeout waiting for the hardware to complete initialization
> wlcore: ERROR timeout waiting for the hardware to complete initialization
> wlcore: ERROR timeout waiting for the hardware to complete initialization
> wlcore: ERROR firmware boot in PLT mode failed despite 3 retries
> command failed: Input/output error (-5)

Okay, the main question here is, did you make any changes in the
wlcore/wl12xx code? As I said, the TX watchdog should *not* be
initialized in PLT mode.

Maybe you could add some prints in the driver to see when the watchdog
timer is set up? Try to add a printk in every place where
wl12xx_rearm_tx_watchdog_locked() is called to see where it is being
armed.


> If I use the "normal" firmware, i.e. ifconfig wlan0 up, it gets loaded
> and I can search for the stations. But PLT doesn't want to boot. Do
> you still have a board, where you could try the latest
> kernel/firmware?

I do have a board, but I *really* don't have the time to set it up and
everything. Sorry about that...

--
Cheers,
Luca.


2014-04-08 20:29:22

by Luca Coelho

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Tue, 2014-04-08 at 17:28 +0200, Yegor Yefremov wrote:
> I'm using kernel 3.14.0-rc4-12738-g674748b-dirty on a am335x-evmsk as
> also on a custom board. I get following issue on both systems, when
> working with various calibrator functions:

That's not a very nice kernel, right? How am I supposed to know what the
12738 patches you have on top of 3.14.0-rc4 are? Even worse because your
tree is dirty. :(


> # calibrator wlan0 plt power_mode on
> [ 252.935208] wlcore: power up
> [ 253.502531] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
> # [ 399.656192] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery

This sounds really strange. The TX watchdog should not be armed in PLT
mode.

> or following error, when I start NetworkManager and wait for 5 minutes or so:
>
> [ 717.608550] wlcore: Scan completed due to error.
> [ 717.613422] ------------[ cut here ]------------
> [ 717.618405] WARNING: CPU: 0 PID: 940 at
> drivers/net/wireless/ti/wlcore/main.c:789
> wl12xx_queue_recovery_work+0x60/0x6c()
> [ 717.629832] Modules linked in:
> [ 717.633055] CPU: 0 PID: 940 Comm: kworker/u2:2 Tainted: G W
> 3.14.0-rc4-12737-gefb78a0-dirty #214
> [ 717.643264] Workqueue: phy0 wl1271_scan_complete_work
> [ 717.648649] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
> (show_stack+0x10/0x14)
> [ 717.656762] [<c0011e38>] (show_stack) from [<c05a1944>]
> (dump_stack+0x7c/0x94)
> [ 717.664372] [<c05a1944>] (dump_stack) from [<c00402bc>]
> (warn_slowpath_common+0x6c/0x90)
> [ 717.672876] [<c00402bc>] (warn_slowpath_common) from [<c004037c>]
> (warn_slowpath_null+0x1c/0x24)
> [ 717.682110] [<c004037c>] (warn_slowpath_null) from [<c035be0c>]
> (wl12xx_queue_recovery_work+0x60/0x6c)
> [ 717.691889] [<c035be0c>] (wl12xx_queue_recovery_work) from
> [<c036d804>] (wl1271_scan_complete_work+0xf8/0x108)
> [ 717.702391] [<c036d804>] (wl1271_scan_complete_work) from
> [<c0058ff4>] (process_one_work+0x1ac/0x4c4)
> [ 717.712110] [<c0058ff4>] (process_one_work) from [<c0059e88>]
> (worker_thread+0x114/0x3b4)
> [ 717.720707] [<c0059e88>] (worker_thread) from [<c005fa44>]
> (kthread+0xcc/0xe8)
> [ 717.728306] [<c005fa44>] (kthread) from [<c000e388>]
> (ret_from_fork+0x14/0x2c)
> [ 717.735856] ---[ end trace f4d374bb2ca37664 ]---
> [ 717.743587] wlcore: Hardware recovery in progress. FW ver: Rev 6.3.10.0.133
> [ 717.752600] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 6
> [ 717.759609] wlcore: down
> [ 717.764108] ieee80211 phy0: Hardware restart was requested
> [ 718.250930] wlcore: firmware booted (Rev 6.3.10.0.133)
>
> Any idea, what could be wrong?

First of all, try a clean kernel.

--
Luca.


2014-04-09 13:21:13

by Yegor Yefremov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 3:07 PM, Arik Nemtsov <[email protected]> wrote:
> On Wed, Apr 9, 2014 at 3:41 PM, Luca Coelho <[email protected]> wrote:
>>> Actually we've seen this bug before. At the time it was caused by FW
>>> requests for dummy packets during PLT mode, which forced us to issue
>>> Tx..
>>
>> Ah, interesting, I hadn't heard about it. Would it send it even if the
>> event was masked out? Masking it out (if it's not already) for PLT would
>> probably be the right thing to do.
>>
>>
>>> It was fixed in the FW, but I'm not sure this was ever upstreamed
>>> (customer specific fix, etc). Anyway it's pretty safe to disable the
>>> Tx watchdog during PLT - try something like this:
>>>
>>> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
>>> {
>>> if (wl->plt)
>>> return;
>>
>> Maybe this could be blocked even earlier, like here:
>>
>> diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
>> index 1f9a360..c44d3c0 100644
>> --- a/drivers/net/wireless/ti/wlcore/event.c
>> +++ b/drivers/net/wireless/ti/wlcore/event.c
>> @@ -158,6 +158,9 @@ EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);
>>
>> void wlcore_event_dummy_packet(struct wl1271 *wl)
>> {
>> + if (wl->plt)
>> + return;
>> +
>> wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
>> wl1271_tx_dummy_packet(wl);
>> }
>>
>> --
>> Luca.
>>
>
> These are good suggestions as well.

Both changes helped a lot. At least one can make a lot of tests
without a crash, though I had one

# calibrator wlan0 plt tune_channel 0 1
# calibrator wlan0 plt tx_tone 1 9000
[ 180.041505] wlcore: ERROR command complete timeout
[ 180.046622] ------------[ cut here ]------------
[ 180.051756] WARNING: CPU: 0 PID: 1072 at
drivers/net/wireless/ti/wlcore/main.c:803
wl12xx_queue_recovery_work+0x60/0x6c()
[ 180.063451] Modules linked in:
[ 180.066734] CPU: 0 PID: 1072 Comm: calibrator Not tainted
3.14.0-rc4-12739-g17534e5-dirty #251
[ 180.075991] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[ 180.084311] [<c0011e38>] (show_stack) from [<c05c57d4>]
(dump_stack+0x7c/0x94)
[ 180.092062] [<c05c57d4>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[ 180.100721] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[ 180.110028] [<c0040334>] (warn_slowpath_null) from [<c038de68>]
(wl12xx_queue_recovery_work+0x60/0x6c)
[ 180.119976] [<c038de68>] (wl12xx_queue_recovery_work) from
[<c0390710>] (wlcore_cmd_send_failsafe+0x74/0x420)
[ 180.130596] [<c0390710>] (wlcore_cmd_send_failsafe) from
[<c0390b30>] (wl1271_cmd_test+0x4c/0x9c)
[ 180.140005] [<c0390b30>] (wl1271_cmd_test) from [<c03a0184>]
(wl1271_tm_cmd+0x370/0x5ac)
[ 180.148740] [<c03a0184>] (wl1271_tm_cmd) from [<c055a710>]
(nl80211_testmode_do+0x94/0x148)
[ 180.157698] [<c055a710>] (nl80211_testmode_do) from [<c0496434>]
(genl_rcv_msg+0x224/0x390)
[ 180.166635] [<c0496434>] (genl_rcv_msg) from [<c0495a1c>]
(netlink_rcv_skb+0xc4/0xd8)
[ 180.175015] [<c0495a1c>] (netlink_rcv_skb) from [<c04961fc>]
(genl_rcv+0x20/0x34)
[ 180.183023] [<c04961fc>] (genl_rcv) from [<c0495364>]
(netlink_unicast+0x15c/0x22c)
[ 180.191219] [<c0495364>] (netlink_unicast) from [<c0495794>]
(netlink_sendmsg+0x2bc/0x37c)
[ 180.199974] [<c0495794>] (netlink_sendmsg) from [<c045f950>]
(sock_sendmsg+0x84/0xa8)
[ 180.208360] [<c045f950>] (sock_sendmsg) from [<c0460ea8>]
(___sys_sendmsg.part.28+0x268/0x278)
[ 180.217568] [<c0460ea8>] (___sys_sendmsg.part.28) from [<c0461f10>]
(__sys_sendmsg+0x4c/0x7c)
[ 180.226686] [<c0461f10>] (__sys_sendmsg) from [<c000e2c0>]
(ret_fast_syscall+0x0/0x48)
[ 180.235143] ---[ end trace c13b2d4f8f45be2b ]---
[ 180.241635] wlcore: WARNING TEST command failed
[ 180.246453] wlcore: WARNING testmode cmd test failed: -110
command failed: Connection timed out (-110)

But after reboot I could successfully invoke calibrator wlan0 plt tx_tone 1 9000

2014-04-09 12:20:32

by Arik Nemtsov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

>>
>> # calibrator wlan0 plt power_mode on
>> [ 93.107215] wlcore: power up
>> [ 93.688714] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
>> # [ 139.232802] YY: wl1271_tx_allocate
>> [ 144.241849] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
>
> Okay, you have a starting point. :)
>
> Now try to look from where this is being called and so on. I don't
> think we should be calling wl1271_tx_allocate() in PLT mode either. Try
> tracing back until you find where this is coming from.

Actually we've seen this bug before. At the time it was caused by FW
requests for dummy packets during PLT mode, which forced us to issue
Tx..

It was fixed in the FW, but I'm not sure this was ever upstreamed
(customer specific fix, etc). Anyway it's pretty safe to disable the
Tx watchdog during PLT - try something like this:

void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
{
if (wl->plt)
return;

Arik

2014-04-09 10:17:18

by Yegor Yefremov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 11:21 AM, Luca Coelho <[email protected]> wrote:
> On Wed, 2014-04-09 at 11:14 +0200, Yegor Yefremov wrote:
>> On Tue, Apr 8, 2014 at 10:29 PM, Luca Coelho <[email protected]> wrote:
>> > On Tue, 2014-04-08 at 17:28 +0200, Yegor Yefremov wrote:
>> >> Any idea, what could be wrong?
>> >
>> > First of all, try a clean kernel.
>>
>> It is not simple to get a clean kernel. "Official" support for
>> am335x-evmsk WLAN was added first in the upcoming 3.15 kernel. I've
>> tried the merge window version (3.14.0-12041-g75ff24f-dirty) and get
>> following issue:
>>
>> # calibrator wlan0 plt power_mode on
>> wlcore: power up
>> wlcore: ERROR timeout waiting for the hardware to complete initialization
>> wlcore: ERROR timeout waiting for the hardware to complete initialization
>> wlcore: ERROR timeout waiting for the hardware to complete initialization
>> wlcore: ERROR firmware boot in PLT mode failed despite 3 retries
>> command failed: Input/output error (-5)
>
> Okay, the main question here is, did you make any changes in the
> wlcore/wl12xx code? As I said, the TX watchdog should *not* be
> initialized in PLT mode.

I made not changes to any file in drivers/net/wireless/ti/

> Maybe you could add some prints in the driver to see when the watchdog
> timer is set up? Try to add a printk in every place where
> wl12xx_rearm_tx_watchdog_locked() is called to see where it is being
> armed.

drivers/net/wireless/ti/wlcore/tx.c in wl1271_tx_allocate()

# calibrator wlan0 plt power_mode on
[ 93.107215] wlcore: power up
[ 93.688714] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
# [ 139.232802] YY: wl1271_tx_allocate
[ 144.241849] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
[ 144.249064] ------------[ cut here ]------------
[ 144.254329] WARNING: CPU: 0 PID: 761 at
drivers/net/wireless/ti/wlcore/main.c:800
wl12xx_queue_recovery_work+0x60/0x68()
[ 144.265917] Modules linked in:
[ 144.269294] CPU: 0 PID: 761 Comm: kworker/u2:2 Not tainted
3.14.0-rc4-12739-g17534e5-dirty #248
[ 144.278507] Workqueue: phy0 wl12xx_tx_watchdog_work
[ 144.283836] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
(show_stack+0x10/0x14)
[ 144.292156] [<c0011e38>] (show_stack) from [<c05c5778>]
(dump_stack+0x7c/0x94)
[ 144.299909] [<c05c5778>] (dump_stack) from [<c0040274>]
(warn_slowpath_common+0x6c/0x90)
[ 144.308474] [<c0040274>] (warn_slowpath_common) from [<c0040334>]
(warn_slowpath_null+0x1c/0x24)
[ 144.317869] [<c0040334>] (warn_slowpath_null) from [<c038de34>]
(wl12xx_queue_recovery_work+0x60/0x68)
[ 144.327818] [<c038de34>] (wl12xx_queue_recovery_work) from
[<c038df04>] (wl12xx_tx_watchdog_work+0xc8/0x1b8)
[ 144.338383] [<c038df04>] (wl12xx_tx_watchdog_work) from
[<c0058fac>] (process_one_work+0x1ac/0x4c4)
[ 144.348059] [<c0058fac>] (process_one_work) from [<c0059e40>]
(worker_thread+0x114/0x3b4)
[ 144.356820] [<c0059e40>] (worker_thread) from [<c005f9fc>]
(kthread+0xcc/0xe8)
[ 144.364568] [<c005f9fc>] (kthread) from [<c000e388>]
(ret_from_fork+0x14/0x2c)
[ 144.372292] ---[ end trace c815a12d05c76600 ]---

Yegor

2014-04-09 09:14:30

by Yegor Yefremov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Tue, Apr 8, 2014 at 10:29 PM, Luca Coelho <[email protected]> wrote:
> On Tue, 2014-04-08 at 17:28 +0200, Yegor Yefremov wrote:
>> I'm using kernel 3.14.0-rc4-12738-g674748b-dirty on a am335x-evmsk as
>> also on a custom board. I get following issue on both systems, when
>> working with various calibrator functions:
>
> That's not a very nice kernel, right? How am I supposed to know what the
> 12738 patches you have on top of 3.14.0-rc4 are? Even worse because your
> tree is dirty. :(
>
>
>> # calibrator wlan0 plt power_mode on
>> [ 252.935208] wlcore: power up
>> [ 253.502531] wlcore: firmware booted in PLT mode PLT_ON (PLT 6.3.10.0.133)
>> # [ 399.656192] wlcore: ERROR Tx stuck (in FW) for 5000 ms. Starting recovery
>
> This sounds really strange. The TX watchdog should not be armed in PLT
> mode.
>
>> or following error, when I start NetworkManager and wait for 5 minutes or so:
>>
>> [ 717.608550] wlcore: Scan completed due to error.
>> [ 717.613422] ------------[ cut here ]------------
>> [ 717.618405] WARNING: CPU: 0 PID: 940 at
>> drivers/net/wireless/ti/wlcore/main.c:789
>> wl12xx_queue_recovery_work+0x60/0x6c()
>> [ 717.629832] Modules linked in:
>> [ 717.633055] CPU: 0 PID: 940 Comm: kworker/u2:2 Tainted: G W
>> 3.14.0-rc4-12737-gefb78a0-dirty #214
>> [ 717.643264] Workqueue: phy0 wl1271_scan_complete_work
>> [ 717.648649] [<c00151e0>] (unwind_backtrace) from [<c0011e38>]
>> (show_stack+0x10/0x14)
>> [ 717.656762] [<c0011e38>] (show_stack) from [<c05a1944>]
>> (dump_stack+0x7c/0x94)
>> [ 717.664372] [<c05a1944>] (dump_stack) from [<c00402bc>]
>> (warn_slowpath_common+0x6c/0x90)
>> [ 717.672876] [<c00402bc>] (warn_slowpath_common) from [<c004037c>]
>> (warn_slowpath_null+0x1c/0x24)
>> [ 717.682110] [<c004037c>] (warn_slowpath_null) from [<c035be0c>]
>> (wl12xx_queue_recovery_work+0x60/0x6c)
>> [ 717.691889] [<c035be0c>] (wl12xx_queue_recovery_work) from
>> [<c036d804>] (wl1271_scan_complete_work+0xf8/0x108)
>> [ 717.702391] [<c036d804>] (wl1271_scan_complete_work) from
>> [<c0058ff4>] (process_one_work+0x1ac/0x4c4)
>> [ 717.712110] [<c0058ff4>] (process_one_work) from [<c0059e88>]
>> (worker_thread+0x114/0x3b4)
>> [ 717.720707] [<c0059e88>] (worker_thread) from [<c005fa44>]
>> (kthread+0xcc/0xe8)
>> [ 717.728306] [<c005fa44>] (kthread) from [<c000e388>]
>> (ret_from_fork+0x14/0x2c)
>> [ 717.735856] ---[ end trace f4d374bb2ca37664 ]---
>> [ 717.743587] wlcore: Hardware recovery in progress. FW ver: Rev 6.3.10.0.133
>> [ 717.752600] wlcore: pc: 0x0, hint_sts: 0x00000020 count: 6
>> [ 717.759609] wlcore: down
>> [ 717.764108] ieee80211 phy0: Hardware restart was requested
>> [ 718.250930] wlcore: firmware booted (Rev 6.3.10.0.133)
>>
>> Any idea, what could be wrong?
>
> First of all, try a clean kernel.

It is not simple to get a clean kernel. "Official" support for
am335x-evmsk WLAN was added first in the upcoming 3.15 kernel. I've
tried the merge window version (3.14.0-12041-g75ff24f-dirty) and get
following issue:

# calibrator wlan0 plt power_mode on
wlcore: power up
wlcore: ERROR timeout waiting for the hardware to complete initialization
wlcore: ERROR timeout waiting for the hardware to complete initialization
wlcore: ERROR timeout waiting for the hardware to complete initialization
wlcore: ERROR firmware boot in PLT mode failed despite 3 retries
command failed: Input/output error (-5)

If I use the "normal" firmware, i.e. ifconfig wlan0 up, it gets loaded
and I can search for the stations. But PLT doesn't want to boot. Do
you still have a board, where you could try the latest
kernel/firmware?

Yegor

2014-04-09 13:07:49

by Arik Nemtsov

[permalink] [raw]
Subject: Re: wl1271: tx stuck in firmware

On Wed, Apr 9, 2014 at 3:41 PM, Luca Coelho <[email protected]> wrote:
>> Actually we've seen this bug before. At the time it was caused by FW
>> requests for dummy packets during PLT mode, which forced us to issue
>> Tx..
>
> Ah, interesting, I hadn't heard about it. Would it send it even if the
> event was masked out? Masking it out (if it's not already) for PLT would
> probably be the right thing to do.
>
>
>> It was fixed in the FW, but I'm not sure this was ever upstreamed
>> (customer specific fix, etc). Anyway it's pretty safe to disable the
>> Tx watchdog during PLT - try something like this:
>>
>> void wl12xx_rearm_tx_watchdog_locked(struct wl1271 *wl)
>> {
>> if (wl->plt)
>> return;
>
> Maybe this could be blocked even earlier, like here:
>
> diff --git a/drivers/net/wireless/ti/wlcore/event.c b/drivers/net/wireless/ti/wlcore/event.c
> index 1f9a360..c44d3c0 100644
> --- a/drivers/net/wireless/ti/wlcore/event.c
> +++ b/drivers/net/wireless/ti/wlcore/event.c
> @@ -158,6 +158,9 @@ EXPORT_SYMBOL_GPL(wlcore_event_channel_switch);
>
> void wlcore_event_dummy_packet(struct wl1271 *wl)
> {
> + if (wl->plt)
> + return;
> +
> wl1271_debug(DEBUG_EVENT, "DUMMY_PACKET_ID_EVENT_ID");
> wl1271_tx_dummy_packet(wl);
> }
>
> --
> Luca.
>

These are good suggestions as well.

Arik