2017-08-30 19:54:52

by Ganapathi Bhat

[permalink] [raw]
Subject: [PATCH 0/2] mwifiex: address URB submission failure

This patch series fixes a driver issue with USB interface. When
URB submission fails due to skb allocation, it is retried in
main thread.

Ganapathi Bhat (1):
mwifiex: print URB submit failure error after threshold attemtps

James Cao (1):
mwifiex: resubmit failed to submit RX URBs in main thread

drivers/net/wireless/marvell/mwifiex/main.c | 11 +++++++++++
drivers/net/wireless/marvell/mwifiex/usb.c | 11 +++++++++--
drivers/net/wireless/marvell/mwifiex/usb.h | 2 ++
3 files changed, 22 insertions(+), 2 deletions(-)

--
1.9.1


2017-08-30 19:57:00

by Ganapathi Bhat

[permalink] [raw]
Subject: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Current driver prints dev_alloc_skb failures everytime while
submitting RX URBs. This failure might be frequent in some
low resource platforms. So, wait for a threshold failure
count before start priting the error. This change is a follow
up for the 'commit 7b368e3d15c3
("mwifiex: resubmit failed to submit RX URBs in main thread")'

Signed-off-by: Cathy Luo <[email protected]>
Signed-off-by: Ganapathi Bhat <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/usb.c | 11 +++++++++--
drivers/net/wireless/marvell/mwifiex/usb.h | 2 ++
2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c
index f4f2b9b..98f6973 100644
--- a/drivers/net/wireless/marvell/mwifiex/usb.c
+++ b/drivers/net/wireless/marvell/mwifiex/usb.c
@@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct urb_context *ctx, int size)
if (card->rx_cmd_ep != ctx->ep) {
ctx->skb = dev_alloc_skb(size);
if (!ctx->skb) {
- mwifiex_dbg(adapter, ERROR,
- "%s: dev_alloc_skb failed\n", __func__);
+ if (++card->rx_urb_failure_count >
+ MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
+ mwifiex_dbg(adapter, ERROR,
+ "%s: dev_alloc_skb failed, failure count = %u\n",
+ __func__,
+ card->rx_urb_failure_count);
+ }
return -ENOMEM;
+ } else {
+ card->rx_urb_failure_count = 0;
}
}

diff --git a/drivers/net/wireless/marvell/mwifiex/usb.h b/drivers/net/wireless/marvell/mwifiex/usb.h
index 37abd22..dc4750b 100644
--- a/drivers/net/wireless/marvell/mwifiex/usb.h
+++ b/drivers/net/wireless/marvell/mwifiex/usb.h
@@ -43,6 +43,7 @@
#define MWIFIEX_TX_DATA_URB 6
#define MWIFIEX_RX_DATA_URB 6
#define MWIFIEX_USB_TIMEOUT 100
+#define MWIFIEX_RX_URB_FAILURE_THRESHOLD 20

#define USB8766_DEFAULT_FW_NAME "mrvl/usb8766_uapsta.bin"
#define USB8797_DEFAULT_FW_NAME "mrvl/usb8797_uapsta.bin"
@@ -117,6 +118,7 @@ struct usb_card_rec {
u8 rx_cmd_interval;
int tx_cmd_ep_type;
u8 tx_cmd_interval;
+ u32 rx_urb_failure_count;
};

struct fw_header {
--
1.9.1

2017-08-30 19:55:58

by Ganapathi Bhat

[permalink] [raw]
Subject: [PATCH 1/2] mwifiex: resubmit failed to submit RX URBs in main thread

From: James Cao <[email protected]>

Current driver has 6 Rx data URBs. Once any packet received
kernel calls our callback, in which the same URB will be
resubmitted after Rx indication. In URB submission function a new
skb will be allocated since the previous one is passed to upper
layer (freed later). Since the skb is from a special pool (not
regular memory), skb allocation may fail when kernel holds a lot
of Rx packets on some low resource platforms. The URB will not be
resubmitted in this no free skb case. If driver fails to resubmit
all 6 URBs, Rx will stop. To cover this scenario check and
resubmit Rx URBs in main thread.

Signed-off-by: James Cao <[email protected]>
Signed-off-by: Cathy Luo <[email protected]>
Signed-off-by: Ganapathi Bhat <[email protected]>
---
drivers/net/wireless/marvell/mwifiex/main.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ee40b73..c78014b 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -247,6 +247,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
{
int ret = 0;
unsigned long flags;
+ struct usb_card_rec *usb_card;

spin_lock_irqsave(&adapter->main_proc_lock, flags);

@@ -278,6 +279,16 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
break;
}

+ /* Try to resubmit RX URB if sunmission failed earlier */
+ if (!atomic_read(&adapter->rx_pending) &&
+ adapter->iface_type == MWIFIEX_USB) {
+ usb_card = adapter->card;
+ if (atomic_read(&usb_card->rx_data_urb_pending) <
+ MWIFIEX_RX_DATA_URB &&
+ adapter->if_ops.submit_rem_rx_urbs)
+ adapter->if_ops.submit_rem_rx_urbs(adapter);
+ }
+
/* Handle pending interrupt if any */
if (adapter->int_status) {
if (adapter->hs_activated)
--
1.9.1

2017-09-15 09:46:39

by Ganapathi Bhat

[permalink] [raw]
Subject: RE: [EXT] Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Hi Brian,
>
> Hi Ganapathi,
>
> On Thu, Sep 14, 2017 at 02:14:24PM +0000, Ganapathi Bhat wrote:
> > > On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
> > > > Current driver prints dev_alloc_skb failures everytime while
> > > > submitting RX URBs. This failure might be frequent in some low
> > > > resource platforms. So, wait for a threshold failure count before
> > > > start priting the error. This change is a follow up for the
> > > > 'commit
> > > > 7b368e3d15c3
> > > > ("mwifiex: resubmit failed to submit RX URBs in main thread")'
> > >
> > > []
> > >
> > > > diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c
> > > > b/drivers/net/wireless/marvell/mwifiex/usb.c
> > > []
> > > > @@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct
> > > urb_context *ctx, int size)
> > > > if (card->rx_cmd_ep != ctx->ep) {
> > > > ctx->skb = dev_alloc_skb(size);
> > > > if (!ctx->skb) {
> > > > - mwifiex_dbg(adapter, ERROR,
> > > > - "%s: dev_alloc_skb failed\n",
> __func__);
> > > > + if (++card->rx_urb_failure_count >
> > > > + MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
> > > > + mwifiex_dbg(adapter, ERROR,
> > > > + "%s: dev_alloc_skb failed,
> failure
> > > count = %u\n",
> > > > + __func__,
> > > > + card->rx_urb_failure_count);
> > > > + }
> > > > return -ENOMEM;
> > >
> > > Why not use a ratelimit?
> > Since this is for receive, the packets are from AP side and we cannot
> > lower the rate from AP. On some low performance systems this change
> > will be helpful.
>
> I think Joe was referring to things like printk_ratelimited() or
> dev_err_ratelimited(). Those automatically ratelimit prints for you,
> using a static counter. You'd just need to make a small warpper for
> mwifiex_dbg() using __ratelimit().

Got it. Yet it looks he meant the same. Thank you.

>
> Those sort of rate limits are significantly different than yours
> though.
> You were looking to avoid printing errors when there are only a few
> failures in a row, whereas the existing rate-limiting infrastructure
> looks to avoid printing errors if too many happen in a row. Those are
> different goals.
>
> Brian

Ok.

Hi Joe,

Let us know your comments on the above.

Thanks,
Ganapathi

2017-09-01 04:05:31

by Joe Perches

[permalink] [raw]
Subject: Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
> Current driver prints dev_alloc_skb failures everytime while
> submitting RX URBs. This failure might be frequent in some
> low resource platforms. So, wait for a threshold failure
> count before start priting the error. This change is a follow
> up for the 'commit 7b368e3d15c3
> ("mwifiex: resubmit failed to submit RX URBs in main thread")'

[]

> diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c b/drivers/net/wireless/marvell/mwifiex/usb.c
[]
> @@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct urb_context *ctx, int size)
> if (card->rx_cmd_ep != ctx->ep) {
> ctx->skb = dev_alloc_skb(size);
> if (!ctx->skb) {
> - mwifiex_dbg(adapter, ERROR,
> - "%s: dev_alloc_skb failed\n", __func__);
> + if (++card->rx_urb_failure_count >
> + MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
> + mwifiex_dbg(adapter, ERROR,
> + "%s: dev_alloc_skb failed, failure count = %u\n",
> + __func__,
> + card->rx_urb_failure_count);
> + }
> return -ENOMEM;

Why not use a ratelimit?

2017-09-20 12:13:10

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 1/2] mwifiex: resubmit failed to submit RX URBs in main thread

Ganapathi Bhat <[email protected]> writes:

> From: James Cao <[email protected]>
>
> Current driver has 6 Rx data URBs. Once any packet received
> kernel calls our callback, in which the same URB will be
> resubmitted after Rx indication. In URB submission function a new
> skb will be allocated since the previous one is passed to upper
> layer (freed later). Since the skb is from a special pool (not
> regular memory), skb allocation may fail when kernel holds a lot
> of Rx packets on some low resource platforms.

The special pool being GFP_ATOMIC allocations or what?

> The URB will not be resubmitted in this no free skb case. If driver
> fails to resubmit all 6 URBs, Rx will stop. To cover this scenario
> check and resubmit Rx URBs in main thread.
>
> Signed-off-by: James Cao <[email protected]>
> Signed-off-by: Cathy Luo <[email protected]>
> Signed-off-by: Ganapathi Bhat <[email protected]>

[...]

> @@ -278,6 +279,16 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
> break;
> }
>
> + /* Try to resubmit RX URB if sunmission failed earlier */
> + if (!atomic_read(&adapter->rx_pending) &&
> + adapter->iface_type == MWIFIEX_USB) {
> + usb_card = adapter->card;
> + if (atomic_read(&usb_card->rx_data_urb_pending) <
> + MWIFIEX_RX_DATA_URB &&
> + adapter->if_ops.submit_rem_rx_urbs)
> + adapter->if_ops.submit_rem_rx_urbs(adapter);
> + }

To me this just feels wrong. Normally the proceduce is to drop the frame
if allocations fail, not try to reallocate. I need more convincing that
this really is the right approach.

--
Kalle Valo

2017-09-20 04:30:33

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Brian Norris <[email protected]> writes:

> Hi Ganapathi,
>
> On Thu, Sep 14, 2017 at 02:14:24PM +0000, Ganapathi Bhat wrote:
>> > On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
>> > > @@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct
>> > urb_context *ctx, int size)
>> > > if (card->rx_cmd_ep != ctx->ep) {
>> > > ctx->skb = dev_alloc_skb(size);
>> > > if (!ctx->skb) {
>> > > - mwifiex_dbg(adapter, ERROR,
>> > > - "%s: dev_alloc_skb failed\n", __func__);
>> > > + if (++card->rx_urb_failure_count >
>> > > + MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
>> > > + mwifiex_dbg(adapter, ERROR,
>> > > + "%s: dev_alloc_skb failed, failure
>> > count = %u\n",
>> > > + __func__,
>> > > + card->rx_urb_failure_count);
>> > > + }
>> > > return -ENOMEM;
>> >
>> > Why not use a ratelimit?
>> Since this is for receive, the packets are from AP side and we cannot
>> lower the rate from AP. On some low performance systems this change
>> will be helpful.
>
> I think Joe was referring to things like printk_ratelimited() or
> dev_err_ratelimited(). Those automatically ratelimit prints for you,
> using a static counter. You'd just need to make a small warpper for
> mwifiex_dbg() using __ratelimit().
>
> Those sort of rate limits are significantly different than yours though.
> You were looking to avoid printing errors when there are only a few
> failures in a row, whereas the existing rate-limiting infrastructure
> looks to avoid printing errors if too many happen in a row. Those are
> different goals.

Are you saying that this patch is good to take? Or should Ganapathi
submit v2?

--
Kalle Valo

2017-09-20 06:02:49

by Brian Norris

[permalink] [raw]
Subject: Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Hi Kalle,

On Wed, Sep 20, 2017 at 07:30:29AM +0300, Kalle Valo wrote:
> Brian Norris <[email protected]> writes:
>
> > Hi Ganapathi,
> >
> > On Thu, Sep 14, 2017 at 02:14:24PM +0000, Ganapathi Bhat wrote:
> >> > On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:

> >> > Why not use a ratelimit?
> >> Since this is for receive, the packets are from AP side and we cannot
> >> lower the rate from AP. On some low performance systems this change
> >> will be helpful.
> >
> > I think Joe was referring to things like printk_ratelimited() or
> > dev_err_ratelimited(). Those automatically ratelimit prints for you,
> > using a static counter. You'd just need to make a small warpper for
> > mwifiex_dbg() using __ratelimit().
> >
> > Those sort of rate limits are significantly different than yours though.
> > You were looking to avoid printing errors when there are only a few
> > failures in a row, whereas the existing rate-limiting infrastructure
> > looks to avoid printing errors if too many happen in a row. Those are
> > different goals.
>
> Are you saying that this patch is good to take? Or should Ganapathi
> submit v2?

If you're asking me...

All I was saying was that I don't think Joe's suggestion will help
Ganapathi. I'd expect Ganapathi could confirm/deny that part. (Or Joe
could correct me if my interpretation is wrong.)

I'm also not familiar with how we expect dev_alloc_skb() failures to be
handled. If that's a common expected failure mode in low-memory
situations (seems reasonable?) and the driver handles these failure just
fine (completely unreviewed by me, so far; I suspect it doesn't do this
completely correctly), then sure, being less noisy about it as done in
this patch should be fine.

IOW, I don't have concrete feedback for Ganapathi to address, but I'm
not exactly "ack"ing it myself.

Brian

2017-09-20 12:09:40

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Brian Norris <[email protected]> writes:

> On Wed, Sep 20, 2017 at 07:30:29AM +0300, Kalle Valo wrote:
>> Brian Norris <[email protected]> writes:
>>
>> > Hi Ganapathi,
>> >
>> > On Thu, Sep 14, 2017 at 02:14:24PM +0000, Ganapathi Bhat wrote:
>> >> > On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
>
>> >> > Why not use a ratelimit?
>> >> Since this is for receive, the packets are from AP side and we cannot
>> >> lower the rate from AP. On some low performance systems this change
>> >> will be helpful.
>> >
>> > I think Joe was referring to things like printk_ratelimited() or
>> > dev_err_ratelimited(). Those automatically ratelimit prints for you,
>> > using a static counter. You'd just need to make a small warpper for
>> > mwifiex_dbg() using __ratelimit().
>> >
>> > Those sort of rate limits are significantly different than yours though.
>> > You were looking to avoid printing errors when there are only a few
>> > failures in a row, whereas the existing rate-limiting infrastructure
>> > looks to avoid printing errors if too many happen in a row. Those are
>> > different goals.
>>
>> Are you saying that this patch is good to take? Or should Ganapathi
>> submit v2?
>
> If you're asking me...

Yeah, I was asking you because to me this patch looks like an ugly
workaround to a bug. And now that looked patch 1 more closely it feels
the same.

> All I was saying was that I don't think Joe's suggestion will help
> Ganapathi. I'd expect Ganapathi could confirm/deny that part. (Or Joe
> could correct me if my interpretation is wrong.)

Ok.

> I'm also not familiar with how we expect dev_alloc_skb() failures to be
> handled. If that's a common expected failure mode in low-memory
> situations (seems reasonable?) and the driver handles these failure just
> fine (completely unreviewed by me, so far; I suspect it doesn't do this
> completely correctly), then sure, being less noisy about it as done in
> this patch should be fine.

But this is a debug message so it should not bother normal users, right?
I think that having a threshold like this is just hiding problems and
not solving them. The real issue here is that dev_alloc_skb() is failing
and that's what should be solved, not to paper it over by limiting debug
messages. It just means that the real issue will be even more difficult
to detect in the future.

> IOW, I don't have concrete feedback for Ganapathi to address, but I'm
> not exactly "ack"ing it myself.

I'm not very confident about this patch either, it's not just making any
sense.

--
Kalle Valo

2017-09-14 14:14:32

by Ganapathi Bhat

[permalink] [raw]
Subject: RE: [EXT] Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Hi Joe,

> On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
> > Current driver prints dev_alloc_skb failures everytime while
> > submitting RX URBs. This failure might be frequent in some low
> > resource platforms. So, wait for a threshold failure count before
> > start priting the error. This change is a follow up for the 'commit
> > 7b368e3d15c3
> > ("mwifiex: resubmit failed to submit RX URBs in main thread")'
>
> []
>
> > diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c
> > b/drivers/net/wireless/marvell/mwifiex/usb.c
> []
> > @@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct
> urb_context *ctx, int size)
> > if (card->rx_cmd_ep != ctx->ep) {
> > ctx->skb = dev_alloc_skb(size);
> > if (!ctx->skb) {
> > - mwifiex_dbg(adapter, ERROR,
> > - "%s: dev_alloc_skb failed\n", __func__);
> > + if (++card->rx_urb_failure_count >
> > + MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
> > + mwifiex_dbg(adapter, ERROR,
> > + "%s: dev_alloc_skb failed, failure
> count = %u\n",
> > + __func__,
> > + card->rx_urb_failure_count);
> > + }
> > return -ENOMEM;
>
> Why not use a ratelimit?
Since this is for receive, the packets are from AP side and we cannot lower the rate from AP. On some low performance systems this change will be helpful.

Regards,
Ganapathi

2017-09-14 22:00:00

by Brian Norris

[permalink] [raw]
Subject: Re: [EXT] Re: [PATCH 2/2] mwifiex: print URB submit failure error after threshold attemtps

Hi Ganapathi,

On Thu, Sep 14, 2017 at 02:14:24PM +0000, Ganapathi Bhat wrote:
> > On Thu, 2017-08-31 at 01:21 +0530, Ganapathi Bhat wrote:
> > > Current driver prints dev_alloc_skb failures everytime while
> > > submitting RX URBs. This failure might be frequent in some low
> > > resource platforms. So, wait for a threshold failure count before
> > > start priting the error. This change is a follow up for the 'commit
> > > 7b368e3d15c3
> > > ("mwifiex: resubmit failed to submit RX URBs in main thread")'
> >
> > []
> >
> > > diff --git a/drivers/net/wireless/marvell/mwifiex/usb.c
> > > b/drivers/net/wireless/marvell/mwifiex/usb.c
> > []
> > > @@ -300,9 +300,16 @@ static int mwifiex_usb_submit_rx_urb(struct
> > urb_context *ctx, int size)
> > > if (card->rx_cmd_ep != ctx->ep) {
> > > ctx->skb = dev_alloc_skb(size);
> > > if (!ctx->skb) {
> > > - mwifiex_dbg(adapter, ERROR,
> > > - "%s: dev_alloc_skb failed\n", __func__);
> > > + if (++card->rx_urb_failure_count >
> > > + MWIFIEX_RX_URB_FAILURE_THRESHOLD) {
> > > + mwifiex_dbg(adapter, ERROR,
> > > + "%s: dev_alloc_skb failed, failure
> > count = %u\n",
> > > + __func__,
> > > + card->rx_urb_failure_count);
> > > + }
> > > return -ENOMEM;
> >
> > Why not use a ratelimit?
> Since this is for receive, the packets are from AP side and we cannot
> lower the rate from AP. On some low performance systems this change
> will be helpful.

I think Joe was referring to things like printk_ratelimited() or
dev_err_ratelimited(). Those automatically ratelimit prints for you,
using a static counter. You'd just need to make a small warpper for
mwifiex_dbg() using __ratelimit().

Those sort of rate limits are significantly different than yours though.
You were looking to avoid printing errors when there are only a few
failures in a row, whereas the existing rate-limiting infrastructure
looks to avoid printing errors if too many happen in a row. Those are
different goals.

Brian