2013-09-16 11:39:17

by Daniel Mack

[permalink] [raw]
Subject: [PATCH] mwifiex: fix command timeout with SDIO interrupts enabled

We've been hunting a command timeout issue in the mwifiex driver which
occurs on an AM33xx platform when Andreas Fenkart's omap_hsmmc SDIO IRQ
patches are applied (they are not yet in the mainline kernel). We first
suspected the mmc host driver to be racy, but that wasn't the case.

I dug a little through the changes between 3.7 and 3.10 on the mwifiex
driver, and it turned out that the culprit is 601216e12 ("mwifiex:
process RX packets in SDIO IRQ thread directly"), especially the
situation where mwifiex_main_process() bails when mwifiex_processing is
set.

In other words: if an SDIO irq arrives while the driver is processing
mwifiex_main_process() from the work queue, we effectively loose the
interrupt, which then results in a command timeout.

I've written a little test case scenario that calls
mwifiex_dump_station_info() through the SIOCGIWSTATS ioctl excessively,
and that reproduces the problem after a couple of seconds already:

https://gist.github.com/zonque/6579314


The fix for this issue is quite simple and works very well.


Thanks,
Daniel


Daniel Mack (1):
mwifiex: queue main work from main process when bailing on races

drivers/net/wireless/mwifiex/main.c | 1 +
1 file changed, 1 insertion(+)

--
1.8.3.1



2013-09-16 11:39:17

by Daniel Mack

[permalink] [raw]
Subject: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races

Queue main_work in case mwifiex_main_process() bails due to an already
processed transaction. This is particularly necessary because
mwifiex_main_process() is called from both the SDIO interrupt handler
and the workqueue. In case an interrupt occurs while the main process
is currently executed from the workqueue, the interrupt is lost,
resulting in a command timeout and consequently a card reset.

I'm marking this for stable kernel in version 3.7+, because on our
platform, the issue appears since 601216e12c ("mwifiex: process RX
packets in SDIO IRQ thread directly") went in.

Signed-off-by: Daniel Mack <[email protected]>
Reported-by: Sven Neumann <[email protected]>
Reported-by: Andreas Fenkart <[email protected]>
Cc: Bing Zhao <[email protected]>
Cc: <[email protected]> [v3.7+]
---
drivers/net/wireless/mwifiex/main.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
index ff4ed96..0700bc2 100644
--- a/drivers/net/wireless/mwifiex/main.c
+++ b/drivers/net/wireless/mwifiex/main.c
@@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
/* Check if already processing */
if (adapter->mwifiex_processing) {
spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
+ queue_work(adapter->workqueue, &adapter->main_work);
goto exit_main_proc;
} else {
adapter->mwifiex_processing = true;
--
1.8.3.1


2013-09-16 21:15:00

by Bing Zhao

[permalink] [raw]
Subject: RE: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races

Hi Daniel,

Thanks for the patch.

> Queue main_work in case mwifiex_main_process() bails due to an already
> processed transaction. This is particularly necessary because
> mwifiex_main_process() is called from both the SDIO interrupt handler
> and the workqueue. In case an interrupt occurs while the main process
> is currently executed from the workqueue, the interrupt is lost,
> resulting in a command timeout and consequently a card reset.
>
> I'm marking this for stable kernel in version 3.7+, because on our
> platform, the issue appears since 601216e12c ("mwifiex: process RX
> packets in SDIO IRQ thread directly") went in.
>
> Signed-off-by: Daniel Mack <[email protected]>
> Reported-by: Sven Neumann <[email protected]>
> Reported-by: Andreas Fenkart <[email protected]>
> Cc: Bing Zhao <[email protected]>
> Cc: <[email protected]> [v3.7+]
> ---
> drivers/net/wireless/mwifiex/main.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
> index ff4ed96..0700bc2 100644
> --- a/drivers/net/wireless/mwifiex/main.c
> +++ b/drivers/net/wireless/mwifiex/main.c
> @@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
> /* Check if already processing */
> if (adapter->mwifiex_processing) {
> spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
> + queue_work(adapter->workqueue, &adapter->main_work);

This is specific to SDIO interface, so we'd add a check here.

+ if (adapter->iface_type == MWIFIEX_SDIO)
+ queue_work(adapter->workqueue, &adapter->main_work);

Thanks,
Bing

> goto exit_main_proc;
> } else {
> adapter->mwifiex_processing = true;
> --
> 1.8.3.1


2013-09-17 18:20:36

by Bing Zhao

[permalink] [raw]
Subject: RE: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races

Hi Daniel,

> >> /* Check if already processing */
> >> if (adapter->mwifiex_processing) {
> >> spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
> >> + queue_work(adapter->workqueue, &adapter->main_work);
> >
> > This is specific to SDIO interface,
>
> Is it really? By checking adapter->mwifiex_processing, the driver seems
> to expect mwifiex_main_process() to be called from multiple execution
> paths, and in that case, we will always loose one execution cycle in

You are right. I overlooked it.

> case we bail early. I actually wonder why this didn't hit us earlier,
> but I might miss a detail.

I guess, in your case, the interrupt comes in at line 363 where you have passed the int_status or RX_RCVD checking but the mwifiex_processing flag is still true.

361 if ((adapter->int_status) || IS_CARD_RX_RCVD(adapter))
362 goto process_start;
363
364 spin_lock_irqsave(&adapter->main_proc_lock, flags);
365 adapter->mwifiex_processing = false;
366 spin_unlock_irqrestore(&adapter->main_proc_lock, flags);

The interrupt thread exits because mwifiex_processing is true.
Therefore the mwifiex_main_process misses this interrupt.

>
> OTOH, the worst thing that can happen if the function is executed too
> often is that it exits early and does nothing.
>
> > + if (adapter->iface_type == MWIFIEX_SDIO)
> > + queue_work(adapter->workqueue, &adapter->main_work);
>
> I can of course add this, but I don't fully understand why the driver
> takes care of concurrently running executing paths and then just bails
> silently in case a race is detected.

No. Your original patch is fine.
Could you resend it as [PATCH 3.12]? I will ACK in that thread.

Thanks,
Bing




2013-09-16 23:00:21

by Daniel Mack

[permalink] [raw]
Subject: Re: [PATCH 1/1] mwifiex: queue main work from main process when bailing on races

Hi Bing,

On 16.09.2013 23:14, Bing Zhao wrote:
>> diff --git a/drivers/net/wireless/mwifiex/main.c b/drivers/net/wireless/mwifiex/main.c
>> index ff4ed96..0700bc2 100644
>> --- a/drivers/net/wireless/mwifiex/main.c
>> +++ b/drivers/net/wireless/mwifiex/main.c
>> @@ -235,6 +235,7 @@ int mwifiex_main_process(struct mwifiex_adapter *adapter)
>> /* Check if already processing */
>> if (adapter->mwifiex_processing) {
>> spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
>> + queue_work(adapter->workqueue, &adapter->main_work);
>
> This is specific to SDIO interface,

Is it really? By checking adapter->mwifiex_processing, the driver seems
to expect mwifiex_main_process() to be called from multiple execution
paths, and in that case, we will always loose one execution cycle in
case we bail early. I actually wonder why this didn't hit us earlier,
but I might miss a detail.

OTOH, the worst thing that can happen if the function is executed too
often is that it exits early and does nothing.

> + if (adapter->iface_type == MWIFIEX_SDIO)
> + queue_work(adapter->workqueue, &adapter->main_work);

I can of course add this, but I don't fully understand why the driver
takes care of concurrently running executing paths and then just bails
silently in case a race is detected.


Best regards,
Daniel