2013-02-19 18:08:39

by Daniel Drake

[permalink] [raw]
Subject: mwifiex crash when removing interface while scanning

Hi,

Testing mwifiex_sdio on XO-4 (8787) on Linux 3.8. In OLPC's use case
we quite regularly will power down the wireless card when it is not
being actively used. This is currently proving problematic.

Here is a test case reproducing one of the failures we are seeing:

First, run this script:

insmod mwifiex_sdio.ko
sleep 1
ifconfig eth0 up
iwlist eth0 scan &
sleep 0.5
rmmod mwifiex_sdio

It normally runs fine the first time, no errors, but run it again and
it terminates with these errors:

mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
eth0 Failed to read scan data : No such device

>From this point, things are pretty screwed up. Try to load the driver again...

insmod mwifiex_sdio.ko
mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
mwifiex_sdio mmc0:0001:1: WLAN FW is active
mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id
(331.391420) = 0xa9, act = 0x0
mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
mwifiex_sdio mmc0:0001:1: last_cmd_index = 1
mwifiex_sdio mmc0:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
mwifiex_sdio mmc0:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 0
mwifiex_sdio mmc0:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
mwifiex_sdio mmc0:0001:1: last_event_index = 0
mwifiex_sdio mmc0:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
mwifiex_sdio mmc0:0001:1: cmd timeout
mwifiex_sdio: Resetting card...
WARNING: driver mwifiex_sdio did not remove its interrupt handler!
mmc0: card 0001 removed

and it doesn't come back.

Let me know how we can help further debugging.

Thanks
Daniel


2013-02-21 00:00:57

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> >> > mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
> >> > eth0 Failed to read scan data : No such device
> >
> > After this error message, do you see the sd8787 card is removed from the bus?
> >
> > "mmc0: card 0001 removed"
>
> No.

That's strange.
"mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1" is from
mwifiex_remove_card().

>
> > My card works fine with a Ricoh controller on x86_64 machine. Of course the power is not cut when I
> remove the mwifiex_sdio module.
>
> Under this kernel, I'm not sure if the power is or isn't being cut at
> this point. I don't think this is relevant though; regular

"we quite regularly will power down the wireless card when it is not
being actively used..."

I interpreted this statement as "the power to the card is being cut while inactive".
mwifiex_remove_card() is called seems agrees with it. Perhaps I'm missing something here.

> insmod/rmmod cycles work fine, the problem only appears when tearing
> down the interface while a scan is in progress.

Could you add some debug logs in the tear-down path to show where exactly it gets stuck?

>
> > I can upgrade my XO-4 to kernel 3.8 if feasible. Please let me know if a HOW-TO doc is available.
>
> You will need a serial console connected, as there is no output on the
> LCD at this time when running the upstream kernel.
>
> http://wiki.laptop.org/go/User:DanielDrake/Run_upstream_kernel_on_XO-4

Will do.

Thanks,
Bing

>
> Daniel

2013-02-21 14:20:10

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

On Wed, Feb 20, 2013 at 6:00 PM, Bing Zhao <[email protected]> wrote:
> Hi Daniel,
>
>> >> > mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
>> >> > eth0 Failed to read scan data : No such device
>> >
>> > After this error message, do you see the sd8787 card is removed from the bus?
>> >
>> > "mmc0: card 0001 removed"
>>
>> No.
>
> That's strange.
> "mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1" is from
> mwifiex_remove_card().

I haven't dug into the code, but it seems quite normal to me. It makes
sense for mwifiex_remove_card to be called in the rmmod path. And it
makes sense for the kernel not to report that the card was removed,
since it wasn't actually removed. Only the module was unloaded.

>> Under this kernel, I'm not sure if the power is or isn't being cut at
>> this point. I don't think this is relevant though; regular
>
> "we quite regularly will power down the wireless card when it is not
> being actively used..."
>
> I interpreted this statement as "the power to the card is being cut while inactive".
> mwifiex_remove_card() is called seems agrees with it. Perhaps I'm missing something here.

Yes, we will want to make sure that the power is physically cut. I
just haven't verified what the exact behaviour is under 3.8.
Nevertheless, I think this detail is irrelevant, given that
rmmod/insmod cycles work fine, but the problem only occurs when
scanning is involved.

>> insmod/rmmod cycles work fine, the problem only appears when tearing
>> down the interface while a scan is in progress.
>
> Could you add some debug logs in the tear-down path to show where exactly it gets stuck?

I mis-spoke there, sorry. The problem happens on the next modprobe
*after* a teardown when a scan was in progress. The teardown itself
seems to have completed without getting stuck.

If you have any suggestions for where to add debug messages, that
would be appreciated. Otherwise I will look into sprinkling a few
throughout the probe sequences to see if any communication succeeds
before the first error is printed.

Thanks,
Daniel

2013-02-20 17:47:23

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

On Tue, Feb 19, 2013 at 11:06 PM, Bing Zhao <[email protected]> wrote:
>> > mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
>> > eth0 Failed to read scan data : No such device
>
> After this error message, do you see the sd8787 card is removed from the bus?
>
> "mmc0: card 0001 removed"

No.

> My card works fine with a Ricoh controller on x86_64 machine. Of course the power is not cut when I remove the mwifiex_sdio module.

Under this kernel, I'm not sure if the power is or isn't being cut at
this point. I don't think this is relevant though; regular
insmod/rmmod cycles work fine, the problem only appears when tearing
down the interface while a scan is in progress.

> I can upgrade my XO-4 to kernel 3.8 if feasible. Please let me know if a HOW-TO doc is available.

You will need a serial console connected, as there is no output on the
LCD at this time when running the upstream kernel.

http://wiki.laptop.org/go/User:DanielDrake/Run_upstream_kernel_on_XO-4

Daniel

2013-02-21 19:57:06

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > That's strange.
> > "mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1" is from
> > mwifiex_remove_card().
>
> I haven't dug into the code, but it seems quite normal to me. It makes
> sense for mwifiex_remove_card to be called in the rmmod path. And it

Ah, you are right. I overlooked that rmmod will lead to mwifiex_remove_card.

> makes sense for the kernel not to report that the card was removed,
> since it wasn't actually removed. Only the module was unloaded.
>
> >> Under this kernel, I'm not sure if the power is or isn't being cut at
> >> this point. I don't think this is relevant though; regular
> >
> > "we quite regularly will power down the wireless card when it is not
> > being actively used..."
> >
> > I interpreted this statement as "the power to the card is being cut while inactive".
> > mwifiex_remove_card() is called seems agrees with it. Perhaps I'm missing something here.
>
> Yes, we will want to make sure that the power is physically cut. I
> just haven't verified what the exact behaviour is under 3.8.

MMC core will report "mmc0: card 0001 removed" when the power is cut.

> Nevertheless, I think this detail is irrelevant, given that
> rmmod/insmod cycles work fine, but the problem only occurs when
> scanning is involved.
>
> >> insmod/rmmod cycles work fine, the problem only appears when tearing
> >> down the interface while a scan is in progress.
> >
> > Could you add some debug logs in the tear-down path to show where exactly it gets stuck?
>
> I mis-spoke there, sorry. The problem happens on the next modprobe
> *after* a teardown when a scan was in progress. The teardown itself
> seems to have completed without getting stuck.
>
> If you have any suggestions for where to add debug messages, that
> would be appreciated. Otherwise I will look into sprinkling a few
> throughout the probe sequences to see if any communication succeeds
> before the first error is printed.

For now, let's use dynamic_debug to collect some logs.

CONFIG_DYNAMIC_DEBUG=y

# run these commands before running the test script:
echo "module mwifiex +p" > /sys/kernel/debug/dynamic_debug/control
echo "module mwifiex_sdio +p" > /sys/kernel/debug/dynamic_debug/control

Thanks,
Bing

2013-02-19 23:32:02

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> Hi,
>
> Testing mwifiex_sdio on XO-4 (8787) on Linux 3.8. In OLPC's use case
> we quite regularly will power down the wireless card when it is not
> being actively used. This is currently proving problematic.

I have an XO-4. Perhaps I can update to kernel 3.8?

>
> Here is a test case reproducing one of the failures we are seeing:
>
> First, run this script:
>
> insmod mwifiex_sdio.ko
> sleep 1
> ifconfig eth0 up
> iwlist eth0 scan &
> sleep 0.5
> rmmod mwifiex_sdio
>
> It normally runs fine the first time, no errors, but run it again and
> it terminates with these errors:
>
> mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
> eth0 Failed to read scan data : No such device
>
> From this point, things are pretty screwed up. Try to load the driver again...
>
> insmod mwifiex_sdio.ko
> mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
> mwifiex_sdio mmc0:0001:1: WLAN FW is active
> mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id
> (331.391420) = 0xa9, act = 0x0
> mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
> mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
> mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
> mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
> mwifiex_sdio mmc0:0001:1: last_cmd_index = 1
> mwifiex_sdio mmc0:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
> mwifiex_sdio mmc0:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
> mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 0
> mwifiex_sdio mmc0:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
> mwifiex_sdio mmc0:0001:1: last_event_index = 0
> mwifiex_sdio mmc0:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
> mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
> mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
> mwifiex_sdio mmc0:0001:1: cmd timeout
> mwifiex_sdio: Resetting card...
> WARNING: driver mwifiex_sdio did not remove its interrupt handler!
> mmc0: card 0001 removed
>
> and it doesn't come back.
>
> Let me know how we can help further debugging.

I will try this test with my SD8787 card and get back to you later.

Thanks,
Bing

>
> Thanks
> Daniel

2013-02-20 05:13:57

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > insmod mwifiex_sdio.ko
> > sleep 1
> > ifconfig eth0 up
> > iwlist eth0 scan &
> > sleep 0.5
> > rmmod mwifiex_sdio
> >
> > It normally runs fine the first time, no errors, but run it again and
> > it terminates with these errors:
> >
> > mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
> > eth0 Failed to read scan data : No such device

After this error message, do you see the sd8787 card is removed from the bus?

"mmc0: card 0001 removed"

> >
> > From this point, things are pretty screwed up. Try to load the driver again...
> >
> > insmod mwifiex_sdio.ko
> > mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
> > mwifiex_sdio mmc0:0001:1: WLAN FW is active
> > mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func: Timeout cmd id
> > (331.391420) = 0xa9, act = 0x0
> > mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
> > mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
> > mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
> > mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
> > mwifiex_sdio mmc0:0001:1: last_cmd_index = 1
> > mwifiex_sdio mmc0:0001:1: last_cmd_id: 00 00 a9 00 00 00 00 00 00 00
> > mwifiex_sdio mmc0:0001:1: last_cmd_act: 00 00 00 00 00 00 00 00 00 00
> > mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 0
> > mwifiex_sdio mmc0:0001:1: last_cmd_resp_id: 00 00 00 00 00 00 00 00 00 00
> > mwifiex_sdio mmc0:0001:1: last_event_index = 0
> > mwifiex_sdio mmc0:0001:1: last_event: 00 00 00 00 00 00 00 00 00 00
> > mwifiex_sdio mmc0:0001:1: data_sent=1 cmd_sent=1
> > mwifiex_sdio mmc0:0001:1: ps_mode=0 ps_state=0
> > mwifiex_sdio mmc0:0001:1: cmd timeout
> > mwifiex_sdio: Resetting card...
> > WARNING: driver mwifiex_sdio did not remove its interrupt handler!
> > mmc0: card 0001 removed
> >
> > and it doesn't come back.
> >
> > Let me know how we can help further debugging.
>
> I will try this test with my SD8787 card and get back to you later.

My card works fine with a Ricoh controller on x86_64 machine. Of course the power is not cut when I remove the mwifiex_sdio module.

I can upgrade my XO-4 to kernel 3.8 if feasible. Please let me know if a HOW-TO doc is available.

Thanks,
Bing


2013-03-11 20:12:09

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

Hi,

On Thu, Mar 7, 2013 at 11:00 PM, Bing Zhao <[email protected]> wrote:
> Running your test script on my XO-4 with 3.8 kernel does reproduce command timeout.
> Although the timeout happens at different path than yours they could have the same cause.
>
> Attached please find two patches that seems fix the FUNC_SHUTDOWN timeout on my XO-4.
> Hope it helps.

Thanks, this seems like an improvement.
With NetworkManager disabled, I can now run the original test script
in a loop, and after a few minutes it is still running.

I still see this message:
mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
(cmd_pending is sometimes -2 as well).
Does this suggest a leak?

Even after these patches it still seems easy to confuse this driver
with just a trivial setup. Here's one example:

NetworkManager enabled again, clean reboot, using test.sh from the
first mail, the following failure seems 100% reproducible:

# bash test.sh
[ 17.129827] lis3lv02d_i2c 4-0019: Failed to get supply 'Vdd': -517
[ 17.136003] i2c 4-0019: Driver lis3lv02d_i2c requests probe deferral
[ 17.915844] mwifiex_sdio mmc0:0001:1: WLAN FW is active
eth0: ERROR while getting interface flags: No such device
eth0 Interface doesn't support scanning.
[ 18.407286] mwifiex_sdio mmc0:0001:1: ignoring F/W country code US
[ 18.430167] mwifiex_sdio mmc0:0001:1: driver_version = mwifiex 1.0
(14.66.9.p96)
[ 18.510658] systemd-udevd[235]: renamed network interface mlan0 to eth0

Oops - seems like my script doesn't wait long enough for the network
interface rename to happen, so that wasn't a complete test. This seems
to happen frequently on first run, but subsequent runs work better.
Lets run it again:

# bash test.sh
[ 21.653722] mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
[ 21.662303] lis3lv02d_i2c 4-0019: Failed to get supply 'Vdd': -517
[ 21.673173] i2c 4-0019: Driver lis3lv02d_i2c requests probe deferral
[ 21.681210] mwifiex_sdio mmc0:0001:1: WLAN FW is active
[ 21.794164] mwifiex_sdio mmc0:0001:1: ignoring F/W country code US
[ 21.807990] mwifiex_sdio mmc0:0001:1: driver_version = mwifiex 1.0
(14.66.9.p96)
[ 21.893378] systemd-udevd[235]: renamed network interface mlan0 to eth0
[ 21.984014] ieee80211 phy1: uap0: changing to 2 not supported
[ 22.054609] ieee80211 phy1: uap0: changing to 2 not supported
[ 22.095906] ieee80211 phy1: uap0: changing to 2 not supported
[ 22.115854] ieee80211 phy1: uap0: changing to 2 not supported
[ 22.146579] ieee80211 phy1: uap0: changing to 2 not supported
eth0 Interface doesn't support scanning : Device or resource busy
[ 24.092789] mwifiex_sdio mmc0:0001:1: DNLD_CMD: FW in reset state,
ignore cmd 0x6
[ 24.101801] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
cmd_pending=-2

I guess the scan did not succeed there because NetworkManager was
scanning. No problem, things look OK (apart from the last 2 messages
which seem a little concerning).

Now lets load the module manually and do a scan:

# insmod mwifiex_sdio.ko
[ 26.639096] lis3lv02d_i2c 4-0019: Failed to get supply 'Vdd': -517
[ 26.645260] i2c 4-0019: Driver lis3lv02d_i2c requests probe deferral
[ 26.656082] mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
[ 26.666128] mwifiex_sdio mmc0:0001:1: WLAN FW is active
[ 26.777635] mwifiex_sdio mmc0:0001:1: ignoring F/W country code US
[ 26.798631] mwifiex_sdio mmc0:0001:1: driver_version = mwifiex 1.0
(14.66.9.p96)
[ 26.862487] systemd-udevd[235]: renamed network interface mlan0 to eth0
[ 27.129054] ieee80211 phy2: uap0: changing to 2 not supported
[ 27.168360] ieee80211 phy2: uap0: changing to 2 not supported
[ 27.196832] ieee80211 phy2: uap0: changing to 2 not supported
[ 27.216715] ieee80211 phy2: uap0: changing to 2 not supported
[ 27.239068] ieee80211 phy2: uap0: changing to 2 not supported

# iwlist eth0 scan
eth0 Interface doesn't support scanning : Device or resource busy

OK, NetworkManager is scanning. I keep trying this every few seconds:

# iwlist eth0 scan
eth0 Interface doesn't support scanning : Device or resource busy
# iwlist eth0 scan
eth0 Interface doesn't support scanning : Device or resource busy
# iwlist eth0 scan
eth0 Interface doesn't support scanning : Device or resource busy
# iwlist eth0 scan
[ 49.567711] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func:
Timeout cmd id (49.555599) = 0x6, act = 0x3
[ 49.577420] mwifiex_sdio mmc0:0001:1: num_data_h2c_failure = 0
[ 49.583240] mwifiex_sdio mmc0:0001:1: num_cmd_h2c_failure = 0
[ 49.588986] mwifiex_sdio mmc0:0001:1: num_cmd_timeout = 1
[ 49.588986] mwifiex_sdio mmc0:0001:1: num_tx_timeout = 0
[ 49.599612] mwifiex_sdio mmc0:0001:1: last_cmd_index = 0
[ 49.604892] mwifiex_sdio mmc0:0001:1: last_cmd_id: 06 00 71 00 06
00 06 00 5e 00
[ 49.612243] mwifiex_sdio mmc0:0001:1: last_cmd_act: 03 00 00 00 03
00 03 00 01 00
[ 49.619683] mwifiex_sdio mmc0:0001:1: last_cmd_resp_index = 4
[ 49.619693] mwifiex_sdio mmc0:0001:1: last_cmd_resp_id: 12 80 71 80
06 80 06 80 5e 80
[ 49.633176] mwifiex_sdio mmc0:0001:1: last_event_index = 3
[ 49.638627] mwifiex_sdio mmc0:0001:1: last_event: 17 00 0a 00 08 00
17 00 17 00
[ 49.645883] mwifiex_sdio mmc0:0001:1: data_sent=0 cmd_sent=1
[ 49.651499] mwifiex_sdio mmc0:0001:1: ps_mode=1 ps_state=0
[ 49.656950] mwifiex_sdio mmc0:0001:1: cmd timeout
[ 49.661697] mwifiex_sdio: Resetting card...
[ 49.676122] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
cmd_pending=-4
[ 49.683632] ieee80211 phy2: crypto keys added
[ 49.689716] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
eth0 Interface doesn't support scanning : No such device
[ 49.717762] mwifiex_sdio mmc0:0001:1: PREP_CMD: card is removed
[ 49.808245] mmc0: card 0001 removed

...and the device doesn't come back.

Daniel

2013-03-29 18:45:42

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> On Wed, Mar 27, 2013 at 8:24 PM, Bing Zhao <[email protected]> wrote:
> > This is caused by decreasing cmd_pending while cleanup scan commands in scan_pending_q.
> > cmd_pending only counts the commands queued in cmd_pending_q.
> >
> > Attached v2 patch should fix it. Please try it out.
>
> Thanks, applied both patches, that seems to be working now.

Thanks for testing. I will resend the patches to the list.

Thanks,
Bing

2013-03-28 02:26:37

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > Please try attached patch on top of the 3 patches I sent last Friday.
>
> Thanks for continuing to look at this.
>
> Unfortuntaely the cmd_pending warning is still there and the numbers
> are now -4 and -5 e.g.
> [ 128.269912] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
> cmd_pending=-5

This is caused by decreasing cmd_pending while cleanup scan commands in scan_pending_q.
cmd_pending only counts the commands queued in cmd_pending_q.

Attached v2 patch should fix it. Please try it out.

Thanks,
Bing

>
> This is using the same test script used in the opening message on this
> thread, with uap0 and p2p0 auto-creation disabled, with or without
> NetworkManager running at the same time.
>
> Thanks
> Daniel


Attachments:
0001-v2-mwifiex-fix-negative-cmd_pending-count.patch (8.59 kB)
0001-v2-mwifiex-fix-negative-cmd_pending-count.patch
0002-v2-mwifiex-complete-last-internal-scan.patch (1.67 kB)
0002-v2-mwifiex-complete-last-internal-scan.patch
Download all attachments

2013-03-19 06:19:47

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > That seems to avoid the issue. I'm worried that it suggests some more
> > significant lower-level bug, but I guess for now it would make sense
> > to finish off those 2 patches you already posted, and investigate the
> > possible cmd_pending leak, before revisiting this.
>
> Thanks for testing. I will fine tune the 2 patches and resend the updated version to the list.
> The cmd_pending leak will be investigated too.

Please try attached patch on top of the 3 patches I sent last Friday.

Thanks,
Bing


Attachments:
0001-mwifiex-fix-negative-cmd_pending-count.patch (3.34 kB)
0001-mwifiex-fix-negative-cmd_pending-count.patch

2013-03-29 13:46:29

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

On Wed, Mar 27, 2013 at 8:24 PM, Bing Zhao <[email protected]> wrote:
> This is caused by decreasing cmd_pending while cleanup scan commands in scan_pending_q.
> cmd_pending only counts the commands queued in cmd_pending_q.
>
> Attached v2 patch should fix it. Please try it out.

Thanks, applied both patches, that seems to be working now.

Daniel

2013-03-25 21:49:42

by John Rhodes

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

We are now also seeing cmd_pending warnings in testing at Marvell in Santa Clara. Bing is working on a revised patch for this issue. We'll keep you posted on the progress. I'll also update the tracking system ticket #479039.

John

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Daniel Drake
Sent: Monday, March 25, 2013 12:39 PM
To: Bing Zhao
Cc: [email protected]; John Rhodes; Amitkumar Karwar
Subject: Re: mwifiex crash when removing interface while scanning

Hi,

On Tue, Mar 19, 2013 at 12:19 AM, Bing Zhao <[email protected]> wrote:
> Hi Daniel,
>
>> > That seems to avoid the issue. I'm worried that it suggests some
>> > more significant lower-level bug, but I guess for now it would make
>> > sense to finish off those 2 patches you already posted, and
>> > investigate the possible cmd_pending leak, before revisiting this.
>>
>> Thanks for testing. I will fine tune the 2 patches and resend the updated version to the list.
>> The cmd_pending leak will be investigated too.
>
> Please try attached patch on top of the 3 patches I sent last Friday.

Thanks for continuing to look at this.

Unfortuntaely the cmd_pending warning is still there and the numbers are now -4 and -5 e.g.
[ 128.269912] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
cmd_pending=-5

This is using the same test script used in the opening message on this thread, with uap0 and p2p0 auto-creation disabled, with or without NetworkManager running at the same time.

Thanks
Daniel

2013-03-15 03:40:17

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > Tim Shepard reported a scan timeout issue back in kernel 3.5 when NetworkManager is enabled, and he
> had a HACK workaround for that. I wonder if this is the same issue in 3.8.
> > Could you please try that workaround again?
>
> That seems to avoid the issue. I'm worried that it suggests some more
> significant lower-level bug, but I guess for now it would make sense
> to finish off those 2 patches you already posted, and investigate the
> possible cmd_pending leak, before revisiting this.

Thanks for testing. I will fine tune the 2 patches and resend the updated version to the list.
The cmd_pending leak will be investigated too.

Thanks,
Bing

2013-03-25 16:38:56

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

Hi,

On Tue, Mar 19, 2013 at 12:19 AM, Bing Zhao <[email protected]> wrote:
> Hi Daniel,
>
>> > That seems to avoid the issue. I'm worried that it suggests some more
>> > significant lower-level bug, but I guess for now it would make sense
>> > to finish off those 2 patches you already posted, and investigate the
>> > possible cmd_pending leak, before revisiting this.
>>
>> Thanks for testing. I will fine tune the 2 patches and resend the updated version to the list.
>> The cmd_pending leak will be investigated too.
>
> Please try attached patch on top of the 3 patches I sent last Friday.

Thanks for continuing to look at this.

Unfortuntaely the cmd_pending warning is still there and the numbers
are now -4 and -5 e.g.
[ 128.269912] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
cmd_pending=-5

This is using the same test script used in the opening message on this
thread, with uap0 and p2p0 auto-creation disabled, with or without
NetworkManager running at the same time.

Thanks
Daniel

2013-03-08 05:00:54

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> > If you have any suggestions for where to add debug messages, that
> > would be appreciated. Otherwise I will look into sprinkling a few
> > throughout the probe sequences to see if any communication succeeds
> > before the first error is printed.
>
> For now, let's use dynamic_debug to collect some logs.
>
> CONFIG_DYNAMIC_DEBUG=y
>
> # run these commands before running the test script:
> echo "module mwifiex +p" > /sys/kernel/debug/dynamic_debug/control
> echo "module mwifiex_sdio +p" > /sys/kernel/debug/dynamic_debug/control

Running your test script on my XO-4 with 3.8 kernel does reproduce command timeout.
Although the timeout happens at different path than yours they could have the same cause.

Attached please find two patches that seems fix the FUNC_SHUTDOWN timeout on my XO-4.
Hope it helps.

Thanks,
Bing


Attachments:
0002-mwifiex-skip-pending-commands-after-function-shutdow.patch (1.79 kB)
0002-mwifiex-skip-pending-commands-after-function-shutdow.patch
0001-mwifiex-fix-race-when-queuing-commands.patch (4.93 kB)
0001-mwifiex-fix-race-when-queuing-commands.patch
Download all attachments

2013-03-14 16:26:25

by Daniel Drake

[permalink] [raw]
Subject: Re: mwifiex crash when removing interface while scanning

On Tue, Mar 12, 2013 at 9:04 PM, Bing Zhao <[email protected]> wrote:
> Tim Shepard reported a scan timeout issue back in kernel 3.5 when NetworkManager is enabled, and he had a HACK workaround for that. I wonder if this is the same issue in 3.8.
> Could you please try that workaround again?

That seems to avoid the issue. I'm worried that it suggests some more
significant lower-level bug, but I guess for now it would make sense
to finish off those 2 patches you already posted, and investigate the
possible cmd_pending leak, before revisiting this.

Thanks.
Daniel

2013-03-13 03:06:23

by Bing Zhao

[permalink] [raw]
Subject: RE: mwifiex crash when removing interface while scanning

Hi Daniel,

> Hi,
>
> On Thu, Mar 7, 2013 at 11:00 PM, Bing Zhao <[email protected]> wrote:
> > Running your test script on my XO-4 with 3.8 kernel does reproduce command timeout.
> > Although the timeout happens at different path than yours they could have the same cause.
> >
> > Attached please find two patches that seems fix the FUNC_SHUTDOWN timeout on my XO-4.
> > Hope it helps.
>
> Thanks, this seems like an improvement.
> With NetworkManager disabled, I can now run the original test script
> in a loop, and after a few minutes it is still running.

Good to know it improves.

>
> I still see this message:
> mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0, cmd_pending=-1
> (cmd_pending is sometimes -2 as well).
> Does this suggest a leak?

Yeah, I also noticed this. I will check.

>
> Even after these patches it still seems easy to confuse this driver
> with just a trivial setup. Here's one example:
>
> NetworkManager enabled again, clean reboot, using test.sh from the
> first mail, the following failure seems 100% reproducible:
>

[...]

> [ 24.092789] mwifiex_sdio mmc0:0001:1: DNLD_CMD: FW in reset state,
> ignore cmd 0x6
> [ 24.101801] mwifiex_sdio mmc0:0001:1: rx_pending=0, tx_pending=0,
> cmd_pending=-2
>
> I guess the scan did not succeed there because NetworkManager was
> scanning. No problem, things look OK (apart from the last 2 messages
> which seem a little concerning).

The "FW in reset state, ignore cmd 0x06" message is correct. During driver/firmware shutdown process we must skip these commands.

>
> Now lets load the module manually and do a scan:
>
> # insmod mwifiex_sdio.ko
> [ 26.639096] lis3lv02d_i2c 4-0019: Failed to get supply 'Vdd': -517
> [ 26.645260] i2c 4-0019: Driver lis3lv02d_i2c requests probe deferral
> [ 26.656082] mwifiex_sdio mmc0:0001:1: WLAN FW already running! Skip FW dnld
> [ 26.666128] mwifiex_sdio mmc0:0001:1: WLAN FW is active
> [ 26.777635] mwifiex_sdio mmc0:0001:1: ignoring F/W country code US
> [ 26.798631] mwifiex_sdio mmc0:0001:1: driver_version = mwifiex 1.0
> (14.66.9.p96)
> [ 26.862487] systemd-udevd[235]: renamed network interface mlan0 to eth0
> [ 27.129054] ieee80211 phy2: uap0: changing to 2 not supported
> [ 27.168360] ieee80211 phy2: uap0: changing to 2 not supported
> [ 27.196832] ieee80211 phy2: uap0: changing to 2 not supported
> [ 27.216715] ieee80211 phy2: uap0: changing to 2 not supported
> [ 27.239068] ieee80211 phy2: uap0: changing to 2 not supported
>
> # iwlist eth0 scan
> eth0 Interface doesn't support scanning : Device or resource busy
>
> OK, NetworkManager is scanning. I keep trying this every few seconds:
>
> # iwlist eth0 scan
> eth0 Interface doesn't support scanning : Device or resource busy
> # iwlist eth0 scan
> eth0 Interface doesn't support scanning : Device or resource busy
> # iwlist eth0 scan
> eth0 Interface doesn't support scanning : Device or resource busy
> # iwlist eth0 scan
> [ 49.567711] mwifiex_sdio mmc0:0001:1: mwifiex_cmd_timeout_func:
> Timeout cmd id (49.555599) = 0x6, act = 0x3

Tim Shepard reported a scan timeout issue back in kernel 3.5 when NetworkManager is enabled, and he had a HACK workaround for that. I wonder if this is the same issue in 3.8.
Could you please try that workaround again?

diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifie
index 121443a..a451d18 100644
--- a/drivers/net/wireless/mwifiex/main.c
+++ b/drivers/net/wireless/mwifiex/main.c
@@ -364,6 +364,7 @@ static void mwifiex_fw_dpc(const struct firmware *firmware
goto err_add_intf;
}

+#if 0
/* Create AP interface by default */
if (!mwifiex_add_virtual_intf(adapter->wiphy, "uap%d",
NL80211_IFTYPE_AP, NULL, NULL)) {
@@ -377,6 +378,7 @@ static void mwifiex_fw_dpc(const struct firmware *firmware
dev_err(adapter->dev, "cannot create default P2P interface\n")
goto err_add_intf;
}
+#endif
rtnl_unlock();

mwifiex_drv_get_driver_version(adapter, fmt, sizeof(fmt) - 1);

Thanks,
Bing