2014-09-11 08:57:01

by Belisko Marek

[permalink] [raw]
Subject: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hello,

I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
some throughput testing (with iperf) in 5GHz I got following failures:
[ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed

I checked which which size fails to allocate and it's 4096 bytes. I
was looking to changes in never kernel releases but I cannot find
anything obvious. When connected to 2.4GHz I cannot reproduce issue
though. I'm using FW version mwifiex 1.0 (14.68.29.p26).

Thank for any help/hints,

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com


2014-09-11 19:15:04

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,

On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi BR,
>
>>
>> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing some
>> throughput testing (with iperf) in 5GHz I got following failures:
>> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
>
> This is skb allocation failure returned by kernel. 4k buffer is always allocated for Rx packets. This issue doesn't seem to be specific to 5Ghz.
OK. But I cannot reproduce this issue when connected to 2.4GHz (same
steps as described below).
>.
>>
>> I checked which which size fails to allocate and it's 4096 bytes. I was
>> looking to changes in never kernel releases but I cannot find anything
>> obvious. When connected to 2.4GHz I cannot reproduce issue though. I'm
>> using FW version mwifiex 1.0 (14.68.29.p26).
>>
>
> Could you please provide the platform details?
> How often the problem occurs during throughput testing? Are there any specific steps?
I can reproduce problem 100% by running iperf as server on board where
wifi is connected to (iperf -s -u -w1M -i1)
and connecting with iperf client from other PC (iperf -c IPADDR -u -b 100m).
Then immediately I can see allocation failures. If you need more info
please let me know. Thanks.
>
> Regards,
> Amitkumar Karwar

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-09-17 21:57:31

by James Cameron

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

On Wed, Sep 17, 2014 at 03:52:52AM -0700, Amitkumar Karwar wrote:
> Hi BR,
>
> > Dear Amitkumar Karwar,
> >
> > some additional info.
> >
> > On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]>
> > wrote:
> > > Hi BR,
> > >
> > >>
> > >> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
> > >> some throughput testing (with iperf) in 5GHz I got following
> > failures:
> > >> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
> > >> failed
> > >
> > > This is skb allocation failure returned by kernel. 4k buffer is
> > always allocated for Rx packets. This issue doesn't seem to be specific
> > to 5Ghz.
> > Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
> > testing as mentioned in other email) by pinging device with card.
> > >
> > >>
> > >> I checked which which size fails to allocate and it's 4096 bytes. I
> > >> was looking to changes in never kernel releases but I cannot find
> > >> anything obvious. When connected to 2.4GHz I cannot reproduce issue
> > >> though. I'm using FW version mwifiex 1.0 (14.68.29.p26).
> > >>
> > >
> > > Could you please provide the platform details?
> > > How often the problem occurs during throughput testing? Are there any
> > specific steps?
> > One more observation is that when problem occurred complete system is
> > unresponsive (console is almost completely dead).
>
> Thanks for the more information.
> Skb alloc failure should be gracefully handled. We will look into
> this issue.

If you get time, I'd also appreciate a look into the issue on sdio.c
during data receive.

When dev_alloc_skb fails the interrupt handler does not rewind
the driver state in preparation for a retry. This is not graceful.

http://dev.laptop.org/ticket/12694 has details, and an adequate
solution we are using in 3.5 to rewind the driver state:

http://dev.laptop.org/git/olpc-kernel/commit/?h=arm-3.5&id=59fcaf10cce5bbdc370ec1c262b12aeb66ed1dca

We're using 8787.

>
> > I can workaround issue by decreasing iperf bandwidth to ~40m. I think
> > in this situation we're running out of memory by exhaustive skb
> > allocations.
>
> Actually 6 4K size buffers are being allocated for Rx and Tx data during traffic.
> Probably your platform runs out of memory after these allocations.
>
> Could you please try changing this number(MWIFIEX_TX_DATA_URB/MWIFIEX_RX_DATA_URB macros) to 3?
>
> Regards,
> Amitkumar Karwar
> N?????r??y????b?X??ǧv?^?)޺{.n?+????{??*ޕ?,?{ay?ʇڙ?,j??f???h???z??w??? ???j:+v???w?j?m????????zZ+?????ݢj"??!

--
James Cameron
http://quozl.linux.org.au/

2014-09-17 10:52:59

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

SGkgQlIsDQoNCj4gRGVhciBBbWl0a3VtYXIgS2Fyd2FyLA0KPiANCj4gc29tZSBhZGRpdGlvbmFs
IGluZm8uDQo+IA0KPiBPbiBUaHUsIFNlcCAxMSwgMjAxNCBhdCA1OjA5IFBNLCBBbWl0a3VtYXIg
S2Fyd2FyIDxha2Fyd2FyQG1hcnZlbGwuY29tPg0KPiB3cm90ZToNCj4gPiBIaSBCUiwNCj4gPg0K
PiA+Pg0KPiA+PiBJJ20gdXNpbmcgMy45IG1haW5saW5lIG13aWZpZXggZHJpdmVyIGZvciB3aXJl
bGVzcyB1c2IgY2FyZC4gRG9pbmcNCj4gPj4gc29tZSB0aHJvdWdocHV0IHRlc3RpbmcgKHdpdGgg
aXBlcmYpIGluIDVHSHogSSBnb3QgZm9sbG93aW5nDQo+IGZhaWx1cmVzOg0KPiA+PiBbIDIyMS41
MjE3OTldIHVzYiAxLTE6IG13aWZpZXhfdXNiX3N1Ym1pdF9yeF91cmI6IGRldl9hbGxvY19za2IN
Cj4gPj4gZmFpbGVkDQo+ID4NCj4gPiBUaGlzIGlzIHNrYiBhbGxvY2F0aW9uIGZhaWx1cmUgcmV0
dXJuZWQgYnkga2VybmVsLiA0ayBidWZmZXIgaXMNCj4gYWx3YXlzIGFsbG9jYXRlZCBmb3IgUngg
cGFja2V0cy4gVGhpcyBpc3N1ZSBkb2Vzbid0IHNlZW0gdG8gYmUgc3BlY2lmaWMNCj4gdG8gNUdo
ei4NCj4gWWVzIHlvdSdyZSByaWdodC4gSSBjYW4gcmVwcm9kdWNlIGlzc3VlIGFsc28gd2l0aCAy
LjRHSHogKGRvaW5nIGlwZXJmDQo+IHRlc3RpbmcgYXMgbWVudGlvbmVkIGluIG90aGVyIGVtYWls
KSBieSBwaW5naW5nIGRldmljZSB3aXRoIGNhcmQuDQo+ID4NCj4gPj4NCj4gPj4gSSBjaGVja2Vk
IHdoaWNoIHdoaWNoIHNpemUgZmFpbHMgdG8gYWxsb2NhdGUgYW5kIGl0J3MgNDA5NiBieXRlcy4g
SQ0KPiA+PiB3YXMgbG9va2luZyB0byBjaGFuZ2VzIGluIG5ldmVyIGtlcm5lbCByZWxlYXNlcyBi
dXQgSSBjYW5ub3QgZmluZA0KPiA+PiBhbnl0aGluZyBvYnZpb3VzLiBXaGVuIGNvbm5lY3RlZCB0
byAyLjRHSHogSSBjYW5ub3QgcmVwcm9kdWNlIGlzc3VlDQo+ID4+IHRob3VnaC4gSSdtIHVzaW5n
IEZXIHZlcnNpb24gbXdpZmlleCAxLjAgKDE0LjY4LjI5LnAyNikuDQo+ID4+DQo+ID4NCj4gPiBD
b3VsZCB5b3UgcGxlYXNlIHByb3ZpZGUgdGhlIHBsYXRmb3JtIGRldGFpbHM/DQo+ID4gSG93IG9m
dGVuIHRoZSBwcm9ibGVtIG9jY3VycyBkdXJpbmcgdGhyb3VnaHB1dCB0ZXN0aW5nPyBBcmUgdGhl
cmUgYW55DQo+IHNwZWNpZmljIHN0ZXBzPw0KPiBPbmUgbW9yZSBvYnNlcnZhdGlvbiBpcyB0aGF0
IHdoZW4gcHJvYmxlbSBvY2N1cnJlZCBjb21wbGV0ZSBzeXN0ZW0gaXMNCj4gdW5yZXNwb25zaXZl
IChjb25zb2xlIGlzIGFsbW9zdCBjb21wbGV0ZWx5IGRlYWQpLg0KDQpUaGFua3MgZm9yIHRoZSBt
b3JlIGluZm9ybWF0aW9uLg0KU2tiIGFsbG9jIGZhaWx1cmUgc2hvdWxkIGJlIGdyYWNlZnVsbHkg
aGFuZGxlZC4gV2Ugd2lsbCBsb29rIGludG8gdGhpcyBpc3N1ZS4NCg0KPiBJIGNhbiB3b3JrYXJv
dW5kIGlzc3VlIGJ5IGRlY3JlYXNpbmcgaXBlcmYgYmFuZHdpZHRoIHRvIH40MG0uIEkgdGhpbmsN
Cj4gaW4gdGhpcyBzaXR1YXRpb24gd2UncmUgcnVubmluZyBvdXQgb2YgbWVtb3J5IGJ5IGV4aGF1
c3RpdmUgc2tiDQo+IGFsbG9jYXRpb25zLg0KDQpBY3R1YWxseSA2IDRLIHNpemUgYnVmZmVycyBh
cmUgYmVpbmcgYWxsb2NhdGVkIGZvciBSeCBhbmQgVHggZGF0YSBkdXJpbmcgdHJhZmZpYy4NClBy
b2JhYmx5IHlvdXIgcGxhdGZvcm0gcnVucyBvdXQgb2YgbWVtb3J5IGFmdGVyIHRoZXNlIGFsbG9j
YXRpb25zLg0KDQpDb3VsZCB5b3UgcGxlYXNlIHRyeSBjaGFuZ2luZyB0aGlzIG51bWJlcihNV0lG
SUVYX1RYX0RBVEFfVVJCL01XSUZJRVhfUlhfREFUQV9VUkIgbWFjcm9zKSB0byAzPw0KDQpSZWdh
cmRzLA0KQW1pdGt1bWFyIEthcndhcg0K

2014-09-17 09:57:14

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,

some additional info.

On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi BR,
>
>>
>> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing some
>> throughput testing (with iperf) in 5GHz I got following failures:
>> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
>
> This is skb allocation failure returned by kernel. 4k buffer is always allocated for Rx packets. This issue doesn't seem to be specific to 5Ghz.
Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
testing as mentioned in other email) by pinging device with card.
>
>>
>> I checked which which size fails to allocate and it's 4096 bytes. I was
>> looking to changes in never kernel releases but I cannot find anything
>> obvious. When connected to 2.4GHz I cannot reproduce issue though. I'm
>> using FW version mwifiex 1.0 (14.68.29.p26).
>>
>
> Could you please provide the platform details?
> How often the problem occurs during throughput testing? Are there any specific steps?
One more observation is that when problem occurred complete system is
unresponsive (console is almost completely dead).
I can workaround issue by decreasing iperf bandwidth to ~40m. I think
in this situation we're running out of memory by exhaustive skb
allocations.
>
> Regards,
> Amitkumar Karwar

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-09-17 12:18:42

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi,

I was too fast when hitting Send :)

On Wed, Sep 17, 2014 at 2:07 PM, Belisko Marek <[email protected]> wrote:
> Dear Amitkumar Karwar,
>
> On Wed, Sep 17, 2014 at 12:52 PM, Amitkumar Karwar <[email protected]> wrote:
>> Hi BR,
>>
>>> Dear Amitkumar Karwar,
>>>
>>> some additional info.
>>>
>>> On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]>
>>> wrote:
>>> > Hi BR,
>>> >
>>> >>
>>> >> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
>>> >> some throughput testing (with iperf) in 5GHz I got following
>>> failures:
>>> >> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
>>> >> failed
>>> >
>>> > This is skb allocation failure returned by kernel. 4k buffer is
>>> always allocated for Rx packets. This issue doesn't seem to be specific
>>> to 5Ghz.
>>> Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
>>> testing as mentioned in other email) by pinging device with card.
>>> >
>>> >>
>>> >> I checked which which size fails to allocate and it's 4096 bytes. I
>>> >> was looking to changes in never kernel releases but I cannot find
>>> >> anything obvious. When connected to 2.4GHz I cannot reproduce issue
>>> >> though. I'm using FW version mwifiex 1.0 (14.68.29.p26).
>>> >>
>>> >
>>> > Could you please provide the platform details?
>>> > How often the problem occurs during throughput testing? Are there any
>>> specific steps?
>>> One more observation is that when problem occurred complete system is
>>> unresponsive (console is almost completely dead).
>>
>> Thanks for the more information.
>> Skb alloc failure should be gracefully handled. We will look into this issue.
> OK. Thanks. Can you please
Can you please ping me when you will have some patch so I can test it
also on my device.
>>
>>> I can workaround issue by decreasing iperf bandwidth to ~40m. I think
>>> in this situation we're running out of memory by exhaustive skb
>>> allocations.
>>
>> Actually 6 4K size buffers are being allocated for Rx and Tx data during traffic.
>> Probably your platform runs out of memory after these allocations.
> This could be the issue. I'm running some apps when performing
> throughput test and this could lead
> to out of memory. When I stop all apps and perform the test it behave
> better (I cannot reproduce issue) except of
> fact that whole system is unresponsive for ~ 10 secs as was already mentioned.
This is not fully true. I run loop iperf test (-t 600) and it will
fail after 4 minutes with (there was around 190MB free memory):

[ 875.249551] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 875.256645] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 875.263694] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 875.270940] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 875.277993] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 875.285068] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 3] 0.0- 1.0 sec 81.8 KBytes 670 Kbits/sec 0.034 ms 0/ 57 (0%)
[ 3] 1.0- 2.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 2.0- 3.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 3.0- 4.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 4.0- 5.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 5.0- 6.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 6.0- 7.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 7.0- 8.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 8.0- 9.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 9.0-10.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 10.0-11.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 11.0-12.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 12.0-13.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 13.0-14.0 sec 0.00 Bytes 0.00 bits/sec 0.034 ms 0/ 0 (nan%)
[ 3] 14.0-15.0 sec 81.8 KBytes 670 Kbits/sec 24.506 ms 60/ 117 (51%)


[ 1019.644302] usb 1-1: mwifiex_cmd_timeout_func: Timeout cmd id
(1410955919.418091) = 0x6, act = 0x3
[ 1019.653826] usb 1-1: num_data_h2c_failure = 0
[ 1019.658409] usb 1-1: num_cmd_h2c_failure = 0
[ 1019.662907] usb 1-1: num_cmd_timeout = 1
[ 1019.667072] usb 1-1: num_tx_timeout = 0
[ 1019.671117] usb 1-1: last_cmd_index = 4
[ 1019.675186] usb 1-1: last_cmd_id: 28 00 28 00 28 00 5b 00 06 00
[ 1019.681426] usb 1-1: last_cmd_act: 13 01 13 01 13 01 01 00 03 00
[ 1019.687778] usb 1-1: last_cmd_resp_index = 3
[ 1019.692281] usb 1-1: last_cmd_resp_id: 28 80 28 80 28 80 5b 80 28 80
[ 1019.698997] usb 1-1: last_event_index = 2
[ 1019.703223] usb 1-1: last_event: 17 00 33 00 08 00 33 00 08 00
[ 1019.709389] usb 1-1: data_sent=0 cmd_sent=0
[ 1019.713820] usb 1-1: ps_mode=1 ps_state=0
[ 1019.718211] usb 1-1: cmd timeout

>>
>> Could you please try changing this number(MWIFIEX_TX_DATA_URB/MWIFIEX_RX_DATA_URB macros) to 3?
I tried but it doesn't help (also sometimes I see odd driver crashes -
NULL pointer access)
>>
>> Regards,
>> Amitkumar Karwar
>
> Best Regards,
>
> marekb
>
> --
> as simple and primitive as possible
> -------------------------------------------------
> Marek Belisko - OPEN-NANDRA
> Freelance Developer
>
> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
> Tel: +421 915 052 184
> skype: marekwhite
> twitter: #opennandra
> web: http://open-nandra.com

Best Regards,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-09-17 12:07:24

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,

On Wed, Sep 17, 2014 at 12:52 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi BR,
>
>> Dear Amitkumar Karwar,
>>
>> some additional info.
>>
>> On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]>
>> wrote:
>> > Hi BR,
>> >
>> >>
>> >> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
>> >> some throughput testing (with iperf) in 5GHz I got following
>> failures:
>> >> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
>> >> failed
>> >
>> > This is skb allocation failure returned by kernel. 4k buffer is
>> always allocated for Rx packets. This issue doesn't seem to be specific
>> to 5Ghz.
>> Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
>> testing as mentioned in other email) by pinging device with card.
>> >
>> >>
>> >> I checked which which size fails to allocate and it's 4096 bytes. I
>> >> was looking to changes in never kernel releases but I cannot find
>> >> anything obvious. When connected to 2.4GHz I cannot reproduce issue
>> >> though. I'm using FW version mwifiex 1.0 (14.68.29.p26).
>> >>
>> >
>> > Could you please provide the platform details?
>> > How often the problem occurs during throughput testing? Are there any
>> specific steps?
>> One more observation is that when problem occurred complete system is
>> unresponsive (console is almost completely dead).
>
> Thanks for the more information.
> Skb alloc failure should be gracefully handled. We will look into this issue.
OK. Thanks. Can you please
>
>> I can workaround issue by decreasing iperf bandwidth to ~40m. I think
>> in this situation we're running out of memory by exhaustive skb
>> allocations.
>
> Actually 6 4K size buffers are being allocated for Rx and Tx data during traffic.
> Probably your platform runs out of memory after these allocations.
This could be the issue. I'm running some apps when performing
throughput test and this could lead
to out of memory. When I stop all apps and perform the test it behave
better (I cannot reproduce issue) except of
fact that whole system is unresponsive for ~ 10 secs as was already mentioned.
>
> Could you please try changing this number(MWIFIEX_TX_DATA_URB/MWIFIEX_RX_DATA_URB macros) to 3?
>
> Regards,
> Amitkumar Karwar

Best Regards,

marekb

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-09-11 15:09:18

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

SGkgQlIsDQoNCj4gDQo+IEknbSB1c2luZyAzLjkgbWFpbmxpbmUgbXdpZmlleCBkcml2ZXIgZm9y
IHdpcmVsZXNzIHVzYiBjYXJkLiBEb2luZyBzb21lDQo+IHRocm91Z2hwdXQgdGVzdGluZyAod2l0
aCBpcGVyZikgaW4gNUdIeiBJIGdvdCBmb2xsb3dpbmcgZmFpbHVyZXM6DQo+IFsgMjIxLjUyMTc5
OV0gdXNiIDEtMTogbXdpZmlleF91c2Jfc3VibWl0X3J4X3VyYjogZGV2X2FsbG9jX3NrYiBmYWls
ZWQNCg0KVGhpcyBpcyBza2IgYWxsb2NhdGlvbiBmYWlsdXJlIHJldHVybmVkIGJ5IGtlcm5lbC4g
NGsgYnVmZmVyIGlzIGFsd2F5cyBhbGxvY2F0ZWQgZm9yIFJ4IHBhY2tldHMuIFRoaXMgaXNzdWUg
ZG9lc24ndCBzZWVtIHRvIGJlIHNwZWNpZmljIHRvIDVHaHouIA0KDQo+IA0KPiBJIGNoZWNrZWQg
d2hpY2ggd2hpY2ggc2l6ZSBmYWlscyB0byBhbGxvY2F0ZSBhbmQgaXQncyA0MDk2IGJ5dGVzLiBJ
IHdhcw0KPiBsb29raW5nIHRvIGNoYW5nZXMgaW4gbmV2ZXIga2VybmVsIHJlbGVhc2VzIGJ1dCBJ
IGNhbm5vdCBmaW5kIGFueXRoaW5nDQo+IG9idmlvdXMuIFdoZW4gY29ubmVjdGVkIHRvIDIuNEdI
eiBJIGNhbm5vdCByZXByb2R1Y2UgaXNzdWUgdGhvdWdoLiBJJ20NCj4gdXNpbmcgRlcgdmVyc2lv
biBtd2lmaWV4IDEuMCAoMTQuNjguMjkucDI2KS4NCj4gDQoNCkNvdWxkIHlvdSBwbGVhc2UgcHJv
dmlkZSB0aGUgcGxhdGZvcm0gZGV0YWlscz8NCkhvdyBvZnRlbiB0aGUgcHJvYmxlbSBvY2N1cnMg
ZHVyaW5nIHRocm91Z2hwdXQgdGVzdGluZz8gQXJlIHRoZXJlIGFueSBzcGVjaWZpYyBzdGVwcz8N
Cg0KUmVnYXJkcywNCkFtaXRrdW1hciBLYXJ3YXINCg==

2014-10-14 10:48:08

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear James Cameron,

On Tue, Oct 14, 2014 at 12:20 PM, James Cameron <[email protected]> wrote:
> On Tue, Oct 14, 2014 at 10:25:01AM +0200, Belisko Marek wrote:
>> Hi Amitkumar,
>>
>> On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar <[email protected]> wrote:
>> > Hi Marek,
>> >
>> > > I tried both (slightly modified as we're in 3.9 kernel) but
>> > > issue is still reproducible. My patch against 3.9 sources:
>> >
>> > Thanks a lot for the tests.
>> >
>> > > One thing which is not yet still clear to me why kernel console
>> > > is completely unresponsive when receiving packets in high
>> > > rates. When use iperf (on client) with -b40m it is OK but when
>> > > increase to -b100m then console is completely unresponsive until
>> > > iperf finish.
>> >
>> > Does the system recover when "-b100M" iperf is finished? Can we
>> > run iperf with "-b40M" later? Do you see "dev_alloc_skb failed"
>> > messages in dmesg when console is unresponsive?
>> When we get "dev_alloc_skb failed" then interface is dead (cannot
>> ping ...) so no recovery is possible only system reboot.
>
> This symptom was familiar to me, but on sdio.c, which is very
> different code.
>
> I've had a brief look at usb.c and offer the following comments:
>
> - a list of six data endpoint urb is allocated in mwifiex_usb_rx_init,
> because MWIFIEX_RX_DATA_URB is 6,
>
> - when data endpoint urb is submitted, a new skb is allocated, in
> mwifiex_usb_submit_rx_urb, and this is the only source of
> "dev_alloc_skb failed" message,
>
> - in normal situation, when data endpoint urb is complete, skb is
> either freed or handed up to mwifiex_usb_recv, and the urb is
> resubmitted, which causes a new skb to be allocated.
>
> - if "dev_alloc_skb failed" message appears, one data endpoint urb has
> been lost and is not re-used,
>
> - if six "dev_alloc_skb failed" messages appear, the interface should
> be dead for data receive only.
>
> Amitkumar mentioned this on 9th October; "corresponding URB won't get
> submitted". I think this should be fixed; dev_alloc_skb should be
> harmless failure, please retry.
>
> I don't see why interface is dead with only one "dev_alloc_skb
> failed" message.
Maybe my description wasn't not correct. I see 6 "dev_alloc_skb
failed" messages and then interface is dead
as you wrote.
>
>> I don't see dev_alloc_skb failed when console is unresponsive.
>> >
>> > > Any other ideas
>> > > what to change to check? Thanks.
>> >
>> > Could you please share dmesg log with dynamic debug enabled (using
>> > attached script) captured when the problem occurs?
>> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
>> reproduce issue (running test > 30 minutes without allocation
>> failure).
>
> Yes, I've seen similar; turn on debugging, and timing critical bug
> goes away.
>
> Serial console? If so, try turning it off, and logging to dmesg
> buffer only.
When turning off serial console how then I get kernel messages from
dmesg buffer?
>
> --
> James Cameron
> http://quozl.linux.org.au/

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-30 13:56:08

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Amitkumar,

On Thu, Oct 30, 2014 at 11:01 AM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
> Thanks a lot for testing. I have added flow control logic to limit pending Rx packets. Could you please check if attached patch fixes out of memory issue?
> You can also backport below patch to your 3.9 kernel for rx_pending count imbalance issue.
With rx_pending count imbalance + your patch on top I cannot reproduce
allocation failure issue (tested on 1 hour iperf session).
Reason why console was completely blocked was that iperf server eats ~
55% cpu + kworker ~27 + ksoftirq ~14 so I think this is cause
why console was unresponsive. Thanks for patches. You can add my Tested-by ;)
>
> http://www.spinics.net/lists/linux-wireless/msg115536.html
>
> Your hacks to decrement rx_pending count won't be needed now.
>
> Regards,
> Amitkumar

BR,

marek
--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-09 11:57:42

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,


On Thu, Oct 9, 2014 at 11:30 AM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
> Sorry for late reply. We have tried to simulate dev_alloc_skb() failure on our reference platform after 100th, 200th, 300th etc. packets. Our observation is when the failure occurs, corresponding URB won't get submitted, but traffic continues. Traffic stops when the failure count reaches 6 (mwifiex Rx URB count). We don't see any crash or system unresponsiveness.
No I don't see such behavior. In my case usb still sending packets
over URB (also proven by below hack). My platform is am335x (same chip
as on beaglebone).
>
>
>> I did small investigation (will my limited networking knowledge :)) and
>> to avoid usb issue I did small hack to free received packet in skb
>> (with specific size
>> 1574 which sends iperf) before sending for processing to driver
>> workqueue. With this small hack I can run iperf -b100m on client size
>> without any allocation issue.
>
> That's good :) Actually kernel will take care of freeing skb when driver submits received packet using netif_rx(). Could you please share your hack?
Yes it should be freed when netif_rx() is processed but I have feeling
that sometimes (my case) during high load (-b 100m) packets are not
send to kernel (not free skb)
until skb allocation fails. I need to better understand 11n reordering code :).

Hack (it's dirty I know but packets of that size are send only during
iperf session):

diff --git a/drivers/net/wireless/mwifiex/usb.c
b/drivers/net/wireless/mwifiex/usb.c
index f90fe21..cc548a1 100644
--- a/drivers/net/wireless/mwifiex/usb.c
+++ b/drivers/net/wireless/mwifiex/usb.c
@@ -178,6 +178,12 @@ static void mwifiex_usb_rx_complete(struct urb *urb)
dev_dbg(adapter->dev, "info: recv_length=%d, status=%d\n",
recv_length, status);
if (status == -EINPROGRESS) {
+
+ if (skb->len >= 1574) {
+ dev_kfree_skb_any(skb);
+ goto setup_for_next;
+ }
+
queue_work(adapter->workqueue, &adapter->main_work);

/* urb for data_ep is re-submitted now;
>
> Regards,
> Amit

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-23 12:41:00

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

>> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
>> reproduce issue (running test > 30 minutes without allocation
>> failure).

Thanks for the testing. Yes. Sometimes timing issues won't get reproduced with debug messages enabled.

>Any update on this? Should I provide some other logs?

What's the size of Rx data packets? Is the Rx data AMSDU aggregated?(You can check if "if (rx_pkt_type == PKT_TYPE_AMSDU)" check is passed in mwifiex code) If so, disable AMSDU option in AP and try to reproduce the issue.

As you suspected earlier, we might have missed to free skbs allocated for Rx data which leads to SKB allocation failure. There is very less probability for this. But we can try below experiment.

1) I observed that debug variable "adapter->rx_pending" doesn;t get decremented when packet is submitted to kernel. Add below code change(valid only for AMSDU disabled case. Because multiple packets are submitted to kernel when AMSDU is enabled)

----------
--- a/drivers/net/wireless/mwifiex/util.c
+++ b/drivers/net/wireless/mwifiex/util.c
@@ -218,6 +218,7 @@ int mwifiex_recv_packet(struct mwifiex_private *priv, struct sk_buff *skb)

priv->stats.rx_bytes += skb->len;
priv->stats.rx_packets++;
+ atomic_dec(&priv->adapter->rx_pending);
if (in_interrupt())
netif_rx(skb);
----------

2) Add BUG_ON when first time SKB allocation is failed and print "rx_pending". If it's a huge number, we have missed freeing allocated SKB.

3) Also, get rx_pending info when Rx traffic works fine with 40M bandwidth option.

Btw, could you move to the firmware image (14.68.29.p38)
shared recently?

By any means(redirecting the log to serial console etc.), could you please capture and share kernel trace logs when system becomes unresponsive. It may give some clue and help us debug further.

Best Regards,
Amit

2014-10-13 13:42:09

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

>>
>> That's good :) Actually kernel will take care of freeing skb when
>driver submits received packet using netif_rx(). Could you please share
>your hack?
>Yes it should be freed when netif_rx() is processed but I have feeling
>that sometimes (my case) during high load (-b 100m) packets are not send
>to kernel (not free skb) until skb allocation fails. I need to better
>understand 11n reordering code :).

Rx reorder logic can buffer upto 32 packets, if they aren't received in correct order. You are right. Your platform may run out of memory in this case.
Could you please check if the problem fixes with any of the attached changes?
1) Change Rx window size from 32 to 8.
2) Disable AMPDU feature.

>
>Hack (it's dirty I know but packets of that size are send only during
>iperf session):
>
>diff --git a/drivers/net/wireless/mwifiex/usb.c
>b/drivers/net/wireless/mwifiex/usb.c
>index f90fe21..cc548a1 100644
>--- a/drivers/net/wireless/mwifiex/usb.c
>+++ b/drivers/net/wireless/mwifiex/usb.c
>@@ -178,6 +178,12 @@ static void mwifiex_usb_rx_complete(struct urb
>*urb)
> dev_dbg(adapter->dev, "info: recv_length=%d,
>status=%d\n",
> recv_length, status);
> if (status == -EINPROGRESS) {
>+
>+ if (skb->len >= 1574) {
>+ dev_kfree_skb_any(skb);
>+ goto setup_for_next;
>+ }
>+

The change doesn't look correct. The skb is being processed/used. We cannot free it here. This may create problem.

Regards,
Amit


Attachments:
disable_ampdu_feature.diff (1.01 kB)
disable_ampdu_feature.diff
reduce_Rx_window_size.diff (816.00 B)
reduce_Rx_window_size.diff
Download all attachments

2014-10-30 10:01:09

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

Thanks a lot for testing. I have added flow control logic to limit pending Rx packets. Could you please check if attached patch fixes out of memory issue?
You can also backport below patch to your 3.9 kernel for rx_pending count imbalance issue.

http://www.spinics.net/lists/linux-wireless/msg115536.html

Your hacks to decrement rx_pending count won't be needed now.

Regards,
Amitkumar


Attachments:
0001-mwifiex-add-flow-control-logic-to-Rx-data-on-USB-int.patch (3.44 kB)
0001-mwifiex-add-flow-control-logic-to-Rx-data-on-USB-int.patch

2014-10-22 13:36:26

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Below reply I post by accident in html which was rejected by ML.
Re-posting again. Sorry about that.

On Tue, Oct 14, 2014 at 10:25 AM, Belisko Marek <[email protected]> wrote:
> Hi Amitkumar,
>
> On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar <[email protected]> wrote:
>> Hi Marek,
>>
>>>I tried both (slightly modified as we're in 3.9 kernel) but issue is
>>>still reproducible. My patch against 3.9 sources:
>>
>> Thanks a lot for the tests.
>>
>>>One thing which is not yet still clear to me why kernel console is
>>>completely unresponsive when receiving packets in high rates. When use
>>>iperf (on client) with -b40m it is OK but when increase to -b100m then
>>>console is completely unresponsive until iperf finish.
>>
>> Does the system recover when "-b100M" iperf is finished? Can we run iperf with "-b40M" later?
>> Do you see "dev_alloc_skb failed" messages in dmesg when console is unresponsive?
> When we get "dev_alloc_skb failed" then interface is dead (cannot ping
> ...) so no recovery is possible only system reboot.
> I don't see dev_alloc_skb failed when console is unresponsive.
>>
>>>Any other ideas
>>>what to change to check? Thanks.
>>
>> Could you please share dmesg log with dynamic debug enabled (using attached script) captured when the problem occurs?
> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
> reproduce issue (running test > 30 minutes without allocation
> failure).
Any update on this? Should I provide some other logs?
>>
>> Best regards,
>> Amit
>
> BR,
>
> marek
>
> --
> as simple and primitive as possible
> -------------------------------------------------
> Marek Belisko - OPEN-NANDRA
> Freelance Developer
>
> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
> Tel: +421 915 052 184
> skype: marekwhite
> twitter: #opennandra
> web: http://open-nandra.com

Thanks,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-29 09:54:28

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

We have one fix in Rx reorder logic and have handled dev_alloc_skb failure by resubmitting the urbs.
Could you please test attached experimental patches?

>[ 76.769662] usb 1-1: Received between 2 BAR:2918
>[ 76.775047] usb 1-1: Networking send size:2914
>[ 77.067491] usb 1-1: rx_pending:10155
>[ 77.177524] usb 1-1: rx_pending kernel:10383
>According log it seems rx_pending is slowly increasing until allocation
>fails. Code hacks are attached.

I am checking your hacks. The reason for huge rx_pending count is we might have missed to decrement the count at some place. The difference between " Received between 2 BAR" and "Networking send size" could be genuine leaks. We will look into this.

Regards,
Amit



Attachments:
0001-mwifiex-restart-rxreorder-timer-correctly.patch (5.87 kB)
0001-mwifiex-restart-rxreorder-timer-correctly.patch
0001-mwifiex-resubmit-Rx-data-URBS-when-dev_alloc_skb-fai.patch (3.13 kB)
0001-mwifiex-resubmit-Rx-data-URBS-when-dev_alloc_skb-fai.patch
Download all attachments

2014-10-14 10:21:06

by James Cameron

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

On Tue, Oct 14, 2014 at 10:25:01AM +0200, Belisko Marek wrote:
> Hi Amitkumar,
>
> On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar <[email protected]> wrote:
> > Hi Marek,
> >
> > > I tried both (slightly modified as we're in 3.9 kernel) but
> > > issue is still reproducible. My patch against 3.9 sources:
> >
> > Thanks a lot for the tests.
> >
> > > One thing which is not yet still clear to me why kernel console
> > > is completely unresponsive when receiving packets in high
> > > rates. When use iperf (on client) with -b40m it is OK but when
> > > increase to -b100m then console is completely unresponsive until
> > > iperf finish.
> >
> > Does the system recover when "-b100M" iperf is finished? Can we
> > run iperf with "-b40M" later? Do you see "dev_alloc_skb failed"
> > messages in dmesg when console is unresponsive?
> When we get "dev_alloc_skb failed" then interface is dead (cannot
> ping ...) so no recovery is possible only system reboot.

This symptom was familiar to me, but on sdio.c, which is very
different code.

I've had a brief look at usb.c and offer the following comments:

- a list of six data endpoint urb is allocated in mwifiex_usb_rx_init,
because MWIFIEX_RX_DATA_URB is 6,

- when data endpoint urb is submitted, a new skb is allocated, in
mwifiex_usb_submit_rx_urb, and this is the only source of
"dev_alloc_skb failed" message,

- in normal situation, when data endpoint urb is complete, skb is
either freed or handed up to mwifiex_usb_recv, and the urb is
resubmitted, which causes a new skb to be allocated.

- if "dev_alloc_skb failed" message appears, one data endpoint urb has
been lost and is not re-used,

- if six "dev_alloc_skb failed" messages appear, the interface should
be dead for data receive only.

Amitkumar mentioned this on 9th October; "corresponding URB won't get
submitted". I think this should be fixed; dev_alloc_skb should be
harmless failure, please retry.

I don't see why interface is dead with only one "dev_alloc_skb
failed" message.

> I don't see dev_alloc_skb failed when console is unresponsive.
> >
> > > Any other ideas
> > > what to change to check? Thanks.
> >
> > Could you please share dmesg log with dynamic debug enabled (using
> > attached script) captured when the problem occurs?
> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
> reproduce issue (running test > 30 minutes without allocation
> failure).

Yes, I've seen similar; turn on debugging, and timing critical bug
goes away.

Serial console? If so, try turning it off, and logging to dmesg
buffer only.

--
James Cameron
http://quozl.linux.org.au/

2014-10-29 10:50:32

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Amitkumar,

On Wed, Oct 29, 2014 at 10:54 AM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
> We have one fix in Rx reorder logic and have handled dev_alloc_skb failure by resubmitting the urbs.
> Could you please test attached experimental patches?
I tested both patches and I can still reproduce issue. With
0001-mwifiex-resubmit-Rx-data-URBS-when-dev_alloc_skb-fai.patch
I get kernel crash when allocation fails happens:
[ 118.904307] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 118.911431] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 118.918328] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 118.925204] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 118.932291] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 118.939403] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed
[ 120.171372] ------------[ cut here ]------------
[ 120.176288] WARNING: at drivers/usb/core/urb.c:327 usb_submit_urb+0x35/0x4c()
[ 120.183776] URB ce411840 submitted while active
[ 120.188528] Modules linked in: mwifiex_usb mwifiex btusb
beo_hh1_misc(O) beo_hh1_lsl(O) beo_hh1_summit(O) beo_hh1_tunnel(O)
beo_hh1_rtc(O) beo_hh1_power(O) beo_hh1_dsp(O) beo_hh1_fwupdate(O)
beo_hh1_input(O) beo_hh1_leds(O) beo_hh1_fep(O) beo_ase_fephw(O)
beo_ase_dsp(]
[ 120.217461] [<c000ff19>] (unwind_backtrace+0x1/0x98) from
[<c0026e17>] (warn_slowpath_common+0x33/0x4c)
[ 120.227332] [<c0026e17>] (warn_slowpath_common+0x33/0x4c) from
[<c0026e89>] (warn_slowpath_fmt+0x1d/0x28)
[ 120.237386] [<c0026e89>] (warn_slowpath_fmt+0x1d/0x28) from
[<c022fd4d>] (usb_submit_urb+0x35/0x4c)
[ 120.246907] [<c022fd4d>] (usb_submit_urb+0x35/0x4c) from
[<bf8a321b>] (mwifiex_usb_submit_rx_urb+0x72/0x10c [mwifiex_usb])
[ 120.258592] [<bf8a321b>] (mwifiex_usb_submit_rx_urb+0x72/0x10c
[mwifiex_usb]) from [<bf8a34a9>] (mwifiex_usb_rx_complete+0xe0/0x2b0
[mwifiex_usb])
[ 120.272388] [<bf8a34a9>] (mwifiex_usb_rx_complete+0xe0/0x2b0
[mwifiex_usb]) from [<c022dfd5>] (usb_hcd_giveback_urb+0x29/0x74)
[ 120.284360] [<c022dfd5>] (usb_hcd_giveback_urb+0x29/0x74) from
[<c024730b>] (musb_giveback+0x23/0x2c)
[ 120.294051] [<c024730b>] (musb_giveback+0x23/0x2c) from
[<c0247f21>] (musb_advance_schedule+0x35/0x16c)
[ 120.303922] [<c0247f21>] (musb_advance_schedule+0x35/0x16c) from
[<c0246605>] (musb_interrupt+0x61/0x5e0)
[ 120.313974] [<c0246605>] (musb_interrupt+0x61/0x5e0) from
[<c02492b5>] (dsps_interrupt+0x17d/0x234)
[ 120.323492] [<c02492b5>] (dsps_interrupt+0x17d/0x234) from
[<c005cc73>] (handle_irq_event_percpu+0x33/0x11c)
[ 120.333819] [<c005cc73>] (handle_irq_event_percpu+0x33/0x11c) from
[<c005cd85>] (handle_irq_event+0x29/0x3c)
[ 120.344144] [<c005cd85>] (handle_irq_event+0x29/0x3c) from
[<c005e2f3>] (handle_level_irq+0x4f/0x88)
[ 120.353738] [<c005e2f3>] (handle_level_irq+0x4f/0x88) from
[<c005c80f>] (generic_handle_irq+0x13/0x1c)
[ 120.363525] [<c005c80f>] (generic_handle_irq+0x13/0x1c) from
[<c000d595>] (handle_IRQ+0x1d/0x54)
[ 120.372758] [<c000d595>] (handle_IRQ+0x1d/0x54) from [<c0008495>]
(omap3_intc_handle_irq+0x5d/0x68)
[ 120.382262] [<c0008495>] (omap3_intc_handle_irq+0x5d/0x68) from
[<c000c7ff>] (__irq_svc+0x3f/0x64)
[ 120.391666] Exception stack(0xce569d68 to 0xce569db0)
[ 120.396977] 9d60: 00000001 ce122f98 00000000
00000001 a0000013 cf31e010
[ 120.405567] 9d80: a0000013 00000000 cf00e480 ce015b04 ce411ac0
00000020 ce122b40 ce569db0
[ 120.414151] 9da0: c0050b97 c037e914 40000033 ffffffff
[ 120.419473] [<c000c7ff>] (__irq_svc+0x3f/0x64) from [<c037e914>]
(_raw_spin_unlock_irqrestore+0x24/0x30)
[ 120.429435] [<c037e914>] (_raw_spin_unlock_irqrestore+0x24/0x30)
from [<c0247a4d>] (musb_urb_enqueue+0x4d/0x3b8)
[ 120.440128] [<c0247a4d>] (musb_urb_enqueue+0x4d/0x3b8) from
[<c022e9d7>] (usb_hcd_submit_urb+0x6f/0x57c)
[ 120.450094] [<c022e9d7>] (usb_hcd_submit_urb+0x6f/0x57c) from
[<bf8a321b>] (mwifiex_usb_submit_rx_urb+0x72/0x10c [mwifiex_usb])
[ 120.462229] [<bf8a321b>] (mwifiex_usb_submit_rx_urb+0x72/0x10c
[mwifiex_usb]) from [<bf87ec27>] (mwifiex_main_process+0x2a2/0x3ac
[mwifiex])
[ 120.475507] [<bf87ec27>] (mwifiex_main_process+0x2a2/0x3ac
[mwifiex]) from [<bf87ed3f>] (mwifiex_main_work_queue+0xe/0x10
[mwifiex])
[ 120.488049] [<bf87ed3f>] (mwifiex_main_work_queue+0xe/0x10
[mwifiex]) from [<c0036cef>] (process_one_work+0x117/0x2b4)
[ 120.499289] [<c0036cef>] (process_one_work+0x117/0x2b4) from
[<c00370e1>] (worker_thread+0xd9/0x280)
[ 120.508893] [<c00370e1>] (worker_thread+0xd9/0x280) from
[<c003a1cf>] (kthread+0x6b/0x74)
[ 120.517487] [<c003a1cf>] (kthread+0x6b/0x74) from [<c000cd95>]
(ret_from_fork+0x11/0x3c)
[ 120.525978] ---[ end trace 67ca4bd7a80b7c08 ]---
[ 120.530838] usb 1-1: usb_submit_urb failed

>
>>[ 76.769662] usb 1-1: Received between 2 BAR:2918
>>[ 76.775047] usb 1-1: Networking send size:2914
>>[ 77.067491] usb 1-1: rx_pending:10155
>>[ 77.177524] usb 1-1: rx_pending kernel:10383
>>According log it seems rx_pending is slowly increasing until allocation
>>fails. Code hacks are attached.
>
> I am checking your hacks. The reason for huge rx_pending count is we might have missed to decrement the count at some place. The difference between " Received between 2 BAR" and "Networking send size" could be genuine leaks. We will look into this.
Thanks.
>
> Regards,
> Amit
>
>

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-14 06:37:46

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,

On Mon, Oct 13, 2014 at 3:41 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
>>>
>>> That's good :) Actually kernel will take care of freeing skb when
>>driver submits received packet using netif_rx(). Could you please share
>>your hack?
>>Yes it should be freed when netif_rx() is processed but I have feeling
>>that sometimes (my case) during high load (-b 100m) packets are not send
>>to kernel (not free skb) until skb allocation fails. I need to better
>>understand 11n reordering code :).
>
> Rx reorder logic can buffer upto 32 packets, if they aren't received in correct order. You are right. Your platform may run out of memory in this case.
> Could you please check if the problem fixes with any of the attached changes?
> 1) Change Rx window size from 32 to 8.
> 2) Disable AMPDU feature.
Thanks for patches.
I tried both (slightly modified as we're in 3.9 kernel) but issue is
still reproducible. My patch against 3.9 sources:
diff --git a/drivers/net/wireless/mwifiex/decl.h
b/drivers/net/wireless/mwifiex/decl.h
index e8a569a..916924c 100644
--- a/drivers/net/wireless/mwifiex/decl.h
+++ b/drivers/net/wireless/mwifiex/decl.h
@@ -42,7 +42,7 @@
#define MWIFIEX_MAX_RX_BASTREAM_SUPPORTED 16

#define MWIFIEX_AMPDU_DEF_TXWINSIZE 32
-#define MWIFIEX_AMPDU_DEF_RXWINSIZE 16
+#define MWIFIEX_AMPDU_DEF_RXWINSIZE 8
#define MWIFIEX_DEFAULT_BLOCK_ACK_TIMEOUT 0xffff

#define MWIFIEX_RATE_BITMAP_MCS0 32
diff --git a/drivers/net/wireless/mwifiex/sta_event.c
b/drivers/net/wireless/mwifiex/sta_event.c
index 41aafc7..05a36f0 100644
--- a/drivers/net/wireless/mwifiex/sta_event.c
+++ b/drivers/net/wireless/mwifiex/sta_event.c
@@ -378,10 +378,7 @@ int mwifiex_process_sta_event(struct mwifiex_private *priv)
HostCmd_ACT_GEN_GET, 0, NULL);
break;
case EVENT_ADDBA:
- dev_dbg(adapter->dev, "event: ADDBA Request\n");
- mwifiex_send_cmd_async(priv, HostCmd_CMD_11N_ADDBA_RSP,
- HostCmd_ACT_GEN_SET, 0,
- adapter->event_body);
+ dev_dbg(adapter->dev, "event: ADDBA Request ignoring\n");
break;
case EVENT_DELBA:
dev_dbg(adapter->dev, "event: DELBA Request\n");
diff --git a/drivers/net/wireless/mwifiex/wmm.c
b/drivers/net/wireless/mwifiex/wmm.c
index 2670676..3d0aa02 100644
--- a/drivers/net/wireless/mwifiex/wmm.c
+++ b/drivers/net/wireless/mwifiex/wmm.c
@@ -415,8 +415,8 @@ mwifiex_wmm_init(struct mwifiex_adapter *adapter)

for (i = 0; i < MAX_NUM_TID; ++i) {
priv->aggr_prio_tbl[i].amsdu = tos_to_tid_inv[i];
- priv->aggr_prio_tbl[i].ampdu_ap = tos_to_tid_inv[i];
- priv->aggr_prio_tbl[i].ampdu_user = tos_to_tid_inv[i];
+ priv->aggr_prio_tbl[i].ampdu_ap = BA_STREAM_NOT_ALLOWED;
+ priv->aggr_prio_tbl[i].ampdu_user =
BA_STREAM_NOT_ALLOWED;
}

priv->aggr_prio_tbl[6].amsdu

One thing which is not yet still clear to me why kernel console is
completely unresponsive when
receiving packets in high rates. When use iperf (on client) with -b40m
it is OK but when increase to -b100m then console
is completely unresponsive until iperf finish. Any other ideas what to
change to check? Thanks.

>
>>
>>Hack (it's dirty I know but packets of that size are send only during
>>iperf session):
>>
>>diff --git a/drivers/net/wireless/mwifiex/usb.c
>>b/drivers/net/wireless/mwifiex/usb.c
>>index f90fe21..cc548a1 100644
>>--- a/drivers/net/wireless/mwifiex/usb.c
>>+++ b/drivers/net/wireless/mwifiex/usb.c
>>@@ -178,6 +178,12 @@ static void mwifiex_usb_rx_complete(struct urb
>>*urb)
>> dev_dbg(adapter->dev, "info: recv_length=%d,
>>status=%d\n",
>> recv_length, status);
>> if (status == -EINPROGRESS) {
>>+
>>+ if (skb->len >= 1574) {
>>+ dev_kfree_skb_any(skb);
>>+ goto setup_for_next;
>>+ }
>>+
>
> The change doesn't look correct. The skb is being processed/used. We cannot free it here. This may create problem.
>
> Regards,
> Amit

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-14 07:08:09

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

>I tried both (slightly modified as we're in 3.9 kernel) but issue is
>still reproducible. My patch against 3.9 sources:

Thanks a lot for the tests.

>One thing which is not yet still clear to me why kernel console is
>completely unresponsive when receiving packets in high rates. When use
>iperf (on client) with -b40m it is OK but when increase to -b100m then
>console is completely unresponsive until iperf finish.

Does the system recover when "-b100M" iperf is finished? Can we run iperf with "-b40M" later?
Do you see "dev_alloc_skb failed" messages in dmesg when console is unresponsive?

>Any other ideas
>what to change to check? Thanks.

Could you please share dmesg log with dynamic debug enabled (using attached script) captured when the problem occurs?

Best regards,
Amit


Attachments:
mwifiex_dynamic_debug.sh (429.00 B)
mwifiex_dynamic_debug.sh

2014-10-30 16:21:10

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Marek,

>With rx_pending count imbalance + your patch on top I cannot reproduce
>allocation failure issue (tested on 1 hour iperf session).

Thanks for confirmation. I will prepare new patch for latest code and submit. It will be slightly different than the one shared with you due Rx work queue implementation in latest code.

>Reason why console was completely blocked was that iperf server eats ~
>55% cpu + kworker ~27 + ksoftirq ~14 so I think this is cause
>why console was unresponsive.

I see.

>Thanks for patches. You can add my Tested-by ;)

Sure.

Regards,
Amitkumar

2014-10-09 08:31:54

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Dear Amitkumar Karwar,

On Wed, Sep 17, 2014 at 12:52 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi BR,
>
>> Dear Amitkumar Karwar,
>>
>> some additional info.
>>
>> On Thu, Sep 11, 2014 at 5:09 PM, Amitkumar Karwar <[email protected]>
>> wrote:
>> > Hi BR,
>> >
>> >>
>> >> I'm using 3.9 mainline mwifiex driver for wireless usb card. Doing
>> >> some throughput testing (with iperf) in 5GHz I got following
>> failures:
>> >> [ 221.521799] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
>> >> failed
>> >
>> > This is skb allocation failure returned by kernel. 4k buffer is
>> always allocated for Rx packets. This issue doesn't seem to be specific
>> to 5Ghz.
>> Yes you're right. I can reproduce issue also with 2.4GHz (doing iperf
>> testing as mentioned in other email) by pinging device with card.
>> >
>> >>
>> >> I checked which which size fails to allocate and it's 4096 bytes. I
>> >> was looking to changes in never kernel releases but I cannot find
>> >> anything obvious. When connected to 2.4GHz I cannot reproduce issue
>> >> though. I'm using FW version mwifiex 1.0 (14.68.29.p26).
>> >>
>> >
>> > Could you please provide the platform details?
>> > How often the problem occurs during throughput testing? Are there any
>> specific steps?
>> One more observation is that when problem occurred complete system is
>> unresponsive (console is almost completely dead).
>
> Thanks for the more information.
> Skb alloc failure should be gracefully handled. We will look into this issue.
I did small investigation (will my limited networking knowledge :))
and to avoid usb issue
I did small hack to free received packet in skb (with specific size
1574 which sends iperf) before sending for
processing to driver workqueue. With this small hack I can run iperf
-b100m on client size without any
allocation issue. Bit more testing shows that 11n rx reordering is in
place when receiving packets send from iperf.
Is there any know issue in this area which could lead to issues
described in my report?

BTW I tested backports driver from 3.17-rc3 and issue is still present.

>
>> I can workaround issue by decreasing iperf bandwidth to ~40m. I think
>> in this situation we're running out of memory by exhaustive skb
>> allocations.
>
> Actually 6 4K size buffers are being allocated for Rx and Tx data during traffic.
> Probably your platform runs out of memory after these allocations.
>
> Could you please try changing this number(MWIFIEX_TX_DATA_URB/MWIFIEX_RX_DATA_URB macros) to 3?
>
> Regards,
> Amitkumar Karwar

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-09 09:30:12

by Amitkumar Karwar

[permalink] [raw]
Subject: RE: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

SGkgTWFyZWssDQoNClNvcnJ5IGZvciBsYXRlIHJlcGx5LiBXZSBoYXZlIHRyaWVkIHRvIHNpbXVs
YXRlIGRldl9hbGxvY19za2IoKSBmYWlsdXJlIG9uIG91ciByZWZlcmVuY2UgcGxhdGZvcm0gYWZ0
ZXIgMTAwdGgsIDIwMHRoLCAzMDB0aCBldGMuIHBhY2tldHMuIE91ciBvYnNlcnZhdGlvbiBpcyB3
aGVuIHRoZSBmYWlsdXJlIG9jY3VycywgY29ycmVzcG9uZGluZyBVUkIgd29uJ3QgZ2V0IHN1Ym1p
dHRlZCwgYnV0IHRyYWZmaWMgY29udGludWVzLiBUcmFmZmljIHN0b3BzIHdoZW4gdGhlIGZhaWx1
cmUgY291bnQgcmVhY2hlcyA2IChtd2lmaWV4IFJ4IFVSQiBjb3VudCkuIFdlIGRvbid0IHNlZSBh
bnkgY3Jhc2ggb3Igc3lzdGVtIHVucmVzcG9uc2l2ZW5lc3MuDQoNCg0KPiBJIGRpZCBzbWFsbCBp
bnZlc3RpZ2F0aW9uICh3aWxsIG15IGxpbWl0ZWQgbmV0d29ya2luZyBrbm93bGVkZ2UgOikpIGFu
ZA0KPiB0byBhdm9pZCB1c2IgaXNzdWUgSSBkaWQgc21hbGwgaGFjayB0byBmcmVlIHJlY2VpdmVk
IHBhY2tldCBpbiBza2INCj4gKHdpdGggc3BlY2lmaWMgc2l6ZQ0KPiAxNTc0IHdoaWNoIHNlbmRz
IGlwZXJmKSBiZWZvcmUgc2VuZGluZyBmb3IgcHJvY2Vzc2luZyB0byBkcml2ZXINCj4gd29ya3F1
ZXVlLiBXaXRoIHRoaXMgc21hbGwgaGFjayBJIGNhbiBydW4gaXBlcmYgLWIxMDBtIG9uIGNsaWVu
dCBzaXplDQo+IHdpdGhvdXQgYW55IGFsbG9jYXRpb24gaXNzdWUuDQoNClRoYXQncyBnb29kIDop
IEFjdHVhbGx5IGtlcm5lbCB3aWxsIHRha2UgY2FyZSBvZiBmcmVlaW5nIHNrYiB3aGVuIGRyaXZl
ciBzdWJtaXRzIHJlY2VpdmVkIHBhY2tldCB1c2luZyBuZXRpZl9yeCgpLiBDb3VsZCB5b3UgcGxl
YXNlIHNoYXJlIHlvdXIgaGFjaz8NCg0KUmVnYXJkcywNCkFtaXQNCg==

2014-10-14 20:44:53

by James Cameron

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

On Tue, Oct 14, 2014 at 12:48:07PM +0200, Belisko Marek wrote:
> Dear James Cameron,
>
> On Tue, Oct 14, 2014 at 12:20 PM, James Cameron <[email protected]> wrote:
> > On Tue, Oct 14, 2014 at 10:25:01AM +0200, Belisko Marek wrote:
> >> Hi Amitkumar,
> >>
> >> On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar <[email protected]> wrote:
> >> > Hi Marek,
> >> >
> >> > > I tried both (slightly modified as we're in 3.9 kernel) but
> >> > > issue is still reproducible. My patch against 3.9 sources:
> >> >
> >> > Thanks a lot for the tests.
> >> >
> >> > > One thing which is not yet still clear to me why kernel console
> >> > > is completely unresponsive when receiving packets in high
> >> > > rates. When use iperf (on client) with -b40m it is OK but when
> >> > > increase to -b100m then console is completely unresponsive until
> >> > > iperf finish.
> >> >
> >> > Does the system recover when "-b100M" iperf is finished? Can we
> >> > run iperf with "-b40M" later? Do you see "dev_alloc_skb failed"
> >> > messages in dmesg when console is unresponsive?
> >> When we get "dev_alloc_skb failed" then interface is dead (cannot
> >> ping ...) so no recovery is possible only system reboot.
> >
> > This symptom was familiar to me, but on sdio.c, which is very
> > different code.
> >
> > I've had a brief look at usb.c and offer the following comments:
> >
> > - a list of six data endpoint urb is allocated in mwifiex_usb_rx_init,
> > because MWIFIEX_RX_DATA_URB is 6,
> >
> > - when data endpoint urb is submitted, a new skb is allocated, in
> > mwifiex_usb_submit_rx_urb, and this is the only source of
> > "dev_alloc_skb failed" message,
> >
> > - in normal situation, when data endpoint urb is complete, skb is
> > either freed or handed up to mwifiex_usb_recv, and the urb is
> > resubmitted, which causes a new skb to be allocated.
> >
> > - if "dev_alloc_skb failed" message appears, one data endpoint urb has
> > been lost and is not re-used,
> >
> > - if six "dev_alloc_skb failed" messages appear, the interface should
> > be dead for data receive only.
> >
> > Amitkumar mentioned this on 9th October; "corresponding URB won't get
> > submitted". I think this should be fixed; dev_alloc_skb should be
> > harmless failure, please retry.
> >
> > I don't see why interface is dead with only one "dev_alloc_skb
> > failed" message.
> Maybe my description wasn't not correct. I see 6 "dev_alloc_skb
> failed" messages and then interface is dead
> as you wrote.

Thanks. That is what I expected.

> >> I don't see dev_alloc_skb failed when console is unresponsive.
> >> >
> >> > > Any other ideas
> >> > > what to change to check? Thanks.
> >> >
> >> > Could you please share dmesg log with dynamic debug enabled (using
> >> > attached script) captured when the problem occurs?
> >> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
> >> reproduce issue (running test > 30 minutes without allocation
> >> failure).
> >
> > Yes, I've seen similar; turn on debugging, and timing critical bug
> > goes away.
> >
> > Serial console? If so, try turning it off, and logging to dmesg
> > buffer only.
> When turning off serial console how then I get kernel messages from
> dmesg buffer?

I use three techniques depending on how usable the system is afterwards:

0. capture kernel messages to disk, using rsyslog or other user space
assistance; often there is enough information saved before the system
is unresponsive,

1. type dmesg at a console shell prompt,

2. watchdog trigger system reset, and in firmware locate the dmesg
buffer structures in memory, and dump them to serial.

http://lists.laptop.org/pipermail/devel/2012-January/034253.html

--
James Cameron
http://quozl.linux.org.au/

2014-10-14 08:25:02

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Amitkumar,

On Tue, Oct 14, 2014 at 9:08 AM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
>>I tried both (slightly modified as we're in 3.9 kernel) but issue is
>>still reproducible. My patch against 3.9 sources:
>
> Thanks a lot for the tests.
>
>>One thing which is not yet still clear to me why kernel console is
>>completely unresponsive when receiving packets in high rates. When use
>>iperf (on client) with -b40m it is OK but when increase to -b100m then
>>console is completely unresponsive until iperf finish.
>
> Does the system recover when "-b100M" iperf is finished? Can we run iperf with "-b40M" later?
> Do you see "dev_alloc_skb failed" messages in dmesg when console is unresponsive?
When we get "dev_alloc_skb failed" then interface is dead (cannot ping
...) so no recovery is possible only system reboot.
I don't see dev_alloc_skb failed when console is unresponsive.
>
>>Any other ideas
>>what to change to check? Thanks.
>
> Could you please share dmesg log with dynamic debug enabled (using attached script) captured when the problem occurs?
I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
reproduce issue (running test > 30 minutes without allocation
failure).
>
> Best regards,
> Amit

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com

2014-10-28 12:35:58

by Belisko Marek

[permalink] [raw]
Subject: Re: mwifiex_usb_submit_rx_urb: dev_alloc_skb failed when conected to 5GHz

Hi Amitkumar,

On Thu, Oct 23, 2014 at 2:40 PM, Amitkumar Karwar <[email protected]> wrote:
> Hi Marek,
>
>>> I tried to capture logs but when enable DYNAMIC_DEBUG I cannot
>>> reproduce issue (running test > 30 minutes without allocation
>>> failure).
>
> Thanks for the testing. Yes. Sometimes timing issues won't get reproduced with debug messages enabled.
>
>>Any update on this? Should I provide some other logs?
>
> What's the size of Rx data packets? Is the Rx data AMSDU aggregated?(You can check if "if (rx_pkt_type == PKT_TYPE_AMSDU)" check is passed in mwifiex code) If so, disable AMSDU option in AP and try to reproduce the issue.
Size of Rx data packets are : pkt type == 2, size - 1574bytes + BAR
pkt type, size - 34bytes. AMSDU isn't enabled on AP (verified by
adding debug message in mwifiex_process_sta_rx_packet()).
>
> As you suspected earlier, we might have missed to free skbs allocated for Rx data which leads to SKB allocation failure. There is very less probability for this. But we can try below experiment.
>
> 1) I observed that debug variable "adapter->rx_pending" doesn;t get decremented when packet is submitted to kernel. Add below code change(valid only for AMSDU disabled case. Because multiple packets are submitted to kernel when AMSDU is enabled)
>
> ----------
> --- a/drivers/net/wireless/mwifiex/util.c
> +++ b/drivers/net/wireless/mwifiex/util.c
> @@ -218,6 +218,7 @@ int mwifiex_recv_packet(struct mwifiex_private *priv, struct sk_buff *skb)
>
> priv->stats.rx_bytes += skb->len;
> priv->stats.rx_packets++;
> + atomic_dec(&priv->adapter->rx_pending);
> if (in_interrupt())
> netif_rx(skb);
> ----------
OK, patch applied.
>
> 2) Add BUG_ON when first time SKB allocation is failed and print "rx_pending". If it's a huge number, we have missed freeing allocated SKB.
[ 167.624452] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed, rx_pending:26893
[ 167.632885] ------------[ cut here ]------------
[ 167.637743] Kernel BUG at bf8a22ae [verbose debug info unavailable]
so number seems to be huge and we seems miss free allocated skb.

I did some hacks to code which shows how many packets are received
between 2 BAR packets + I print every 500ms rx_pending packets (when
packet is received) and
also when packet is send to kernel. I also update counter how many
packets after reordering are sent to kernel. Log:
[ 71.973800] usb 1-1: rx_pending:11
[ 72.077308] usb 1-1: rx_pending kernel:10
[ 72.477546] usb 1-1: rx_pending:868
[ 72.587877] usb 1-1: rx_pending kernel:1096
[ 72.818041] usb 1-1: Received between 2 BAR:6275
[ 72.823127] usb 1-1: Networking send size:6271
[ 72.987375] usb 1-1: rx_pending:1940
[ 73.097504] usb 1-1: rx_pending kernel:2159
[ 73.431973] usb 1-1: Received between 2 BAR:1602
[ 73.437106] usb 1-1: Networking send size:1608
[ 73.497381] usb 1-1: rx_pending:2983
[ 73.608315] usb 1-1: rx_pending kernel:3091
[ 74.007379] usb 1-1: rx_pending:3767
[ 74.117879] usb 1-1: rx_pending kernel:3998
[ 74.517375] usb 1-1: rx_pending:4854
[ 74.543168] usb 1-1: Received between 2 BAR:3152
[ 74.548371] usb 1-1: Networking send size:3152
[ 74.627509] usb 1-1: rx_pending kernel:5062
[ 74.743362] usb 1-1: Received between 2 BAR:495
[ 74.748483] usb 1-1: Networking send size:494
[ 75.027523] usb 1-1: rx_pending:5872
[ 75.137961] usb 1-1: rx_pending kernel:6106
[ 75.537485] usb 1-1: rx_pending:6959
[ 75.647934] usb 1-1: rx_pending kernel:7188
[ 75.656273] usb 1-1: Received between 2 BAR:2383
[ 75.661528] usb 1-1: Networking send size:2382
[ 76.047441] usb 1-1: rx_pending:8004
[ 76.157712] usb 1-1: rx_pending kernel:8240
[ 76.557547] usb 1-1: rx_pending:9095
[ 76.667991] usb 1-1: rx_pending kernel:9326
[ 76.769662] usb 1-1: Received between 2 BAR:2918
[ 76.775047] usb 1-1: Networking send size:2914
[ 77.067491] usb 1-1: rx_pending:10155
[ 77.177524] usb 1-1: rx_pending kernel:10383
[ 77.577547] usb 1-1: rx_pending:11237
[ 77.687859] usb 1-1: rx_pending kernel:11461
[ 78.087401] usb 1-1: rx_pending:12304
[ 78.197992] usb 1-1: rx_pending kernel:12539
[ 78.597442] usb 1-1: rx_pending:13391
[ 78.707786] usb 1-1: rx_pending kernel:13628
[ 79.107487] usb 1-1: rx_pending:14469
[ 79.217812] usb 1-1: rx_pending kernel:14704
[ 79.617528] usb 1-1: rx_pending:15555
[ 79.727734] usb 1-1: rx_pending kernel:15781
[ 80.127486] usb 1-1: rx_pending:16624
[ 80.237756] usb 1-1: rx_pending kernel:16855
[ 80.637423] usb 1-1: rx_pending:17699
[ 80.747868] usb 1-1: rx_pending kernel:17926
[ 81.147446] usb 1-1: rx_pending:18769
[ 81.257800] usb 1-1: rx_pending kernel:18995
[ 81.657399] usb 1-1: rx_pending:19851
[ 81.767785] usb 1-1: rx_pending kernel:20083
[ 82.167341] usb 1-1: rx_pending:20938
[ 82.278005] usb 1-1: rx_pending kernel:21174
[ 82.677527] usb 1-1: rx_pending:22025
[ 82.787508] usb 1-1: rx_pending kernel:22246
[ 83.187390] usb 1-1: rx_pending:23103
[ 83.297690] usb 1-1: rx_pending kernel:23326
[ 83.697376] usb 1-1: rx_pending:24182
[ 83.807711] usb 1-1: rx_pending kernel:24404
[ 84.207487] usb 1-1: rx_pending:25236
[ 84.317747] usb 1-1: rx_pending kernel:25466
[ 84.717378] usb 1-1: rx_pending:26308
[ 84.827806] usb 1-1: rx_pending kernel:26522
[ 85.070931] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27017
[ 85.079788] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27019
[ 85.088424] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27020
[ 85.097029] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27020
[ 85.105845] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27022
[ 85.114468] usb 1-1: mwifiex_usb_submit_rx_urb: dev_alloc_skb
failed - rx_pending:27023
[ 85.337391] usb 1-1: rx_pending kernel:22327
[ 85.400153] usb 1-1: Received between 2 BAR:27875
[ 85.405135] usb 1-1: Networking send size:27881
[ 85.766194] usb 1-1: Received between 2 BAR:7781
[ 85.771152] usb 1-1: Networking send size:7781
[ 85.847374] usb 1-1: rx_pending kernel:11681
[ 86.112809] usb 1-1: Received between 2 BAR:7263
[ 86.117755] usb 1-1: Networking send size:7262
[ 86.357378] usb 1-1: rx_pending kernel:827

According log it seems rx_pending is slowly increasing until
allocation fails. Code hacks are attached.

>
> 3) Also, get rx_pending info when Rx traffic works fine with 40M bandwidth option.
when -b40 is set then IMO 11n reordering isn't applied. I added some
debug code to mwifiex_11n_rx_reorder_pkt()
.. snip ..
tbl = mwifiex_11n_get_rx_reorder_tbl(priv, tid, ta);
if (!tbl) {
.. snip ..

is always true (!tbl) so rx_pending after session is 0.

>
> Btw, could you move to the firmware image (14.68.29.p38)
> shared recently?
Tested with 14.68.29.p38 FW version. Issue still reproducible.
>
> By any means(redirecting the log to serial console etc.), could you please capture and share kernel trace logs when system becomes unresponsive. It may give some clue and help us debug further.
As I wrote earlier with DYNAMIC_DEBUG I cannot reproduce issue. Maybe
when we drop some debug code (received packet over USB) issue can be
reproducible again, but it's just guess.
>
> Best Regards,
> Amit

BR,

marek

--
as simple and primitive as possible
-------------------------------------------------
Marek Belisko - OPEN-NANDRA
Freelance Developer

Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
Tel: +421 915 052 184
skype: marekwhite
twitter: #opennandra
web: http://open-nandra.com


Attachments:
patch.txt (8.55 kB)