Subject: [PATCH] ath6kl: Fix random system lockup

From: Raja Mani <[email protected]>

The commit "ath6kl: Use a mutex_lock to avoid
race in diabling and handling irq" introduces a
state where ath6kl_sdio_irq_handler() would be waiting
to claim the sdio function for receive indefinitely
when things happen in the following order.

ath6kl_sdio_irq_handler()
- aquires mtx_irq
- sdio_release_host()
ath6kl_sdio_irq_disable()
- sdio_claim_host()
- sleep on mtx_irq
ath6kl_hif_intr_bh_handler()
- (indefinitely) wait for the sdio
function to be released to exclusively claim
it again for receive operation.

Fix this by replacing the mtx_irq with an atomic
variable and a wait_queue.

Signed-off-by: Raja Mani <[email protected]>
Signed-off-by: Vasanthakumar Thiagarajan <[email protected]>
---
drivers/net/wireless/ath/ath6kl/sdio.c | 27 +++++++++++++++++++--------
1 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/drivers/net/wireless/ath/ath6kl/sdio.c b/drivers/net/wireless/ath/ath6kl/sdio.c
index f4306e4..a7c0d8d 100644
--- a/drivers/net/wireless/ath/ath6kl/sdio.c
+++ b/drivers/net/wireless/ath/ath6kl/sdio.c
@@ -49,8 +49,8 @@ struct ath6kl_sdio {
/* scatter request list head */
struct list_head scat_req;

- /* Avoids disabling irq while the interrupts being handled */
- struct mutex mtx_irq;
+ atomic_t irq_handling;
+ wait_queue_head_t irq_wq;

spinlock_t scat_lock;
bool scatter_enabled;
@@ -462,7 +462,7 @@ static void ath6kl_sdio_irq_handler(struct sdio_func *func)
ath6kl_dbg(ATH6KL_DBG_SDIO, "irq\n");

ar_sdio = sdio_get_drvdata(func);
- mutex_lock(&ar_sdio->mtx_irq);
+ atomic_set(&ar_sdio->irq_handling, 1);
/*
* Release the host during interrups so we can pick it back up when
* we process commands.
@@ -471,7 +471,10 @@ static void ath6kl_sdio_irq_handler(struct sdio_func *func)

status = ath6kl_hif_intr_bh_handler(ar_sdio->ar);
sdio_claim_host(ar_sdio->func);
- mutex_unlock(&ar_sdio->mtx_irq);
+
+ atomic_set(&ar_sdio->irq_handling, 0);
+ wake_up(&ar_sdio->irq_wq);
+
WARN_ON(status && status != -ECANCELED);
}

@@ -579,14 +582,21 @@ static void ath6kl_sdio_irq_disable(struct ath6kl *ar)

sdio_claim_host(ar_sdio->func);

- mutex_lock(&ar_sdio->mtx_irq);
+ if (atomic_read(&ar_sdio->irq_handling)) {
+ sdio_release_host(ar_sdio->func);
+
+ ret = wait_event_interruptible(ar_sdio->irq_wq,
+ !atomic_read(&ar_sdio->irq_handling));
+ if (ret)
+ return;
+
+ sdio_claim_host(ar_sdio->func);
+ }

ret = sdio_release_irq(ar_sdio->func);
if (ret)
ath6kl_err("Failed to release sdio irq: %d\n", ret);

- mutex_unlock(&ar_sdio->mtx_irq);
-
sdio_release_host(ar_sdio->func);
}

@@ -1285,7 +1295,6 @@ static int ath6kl_sdio_probe(struct sdio_func *func,
spin_lock_init(&ar_sdio->scat_lock);
spin_lock_init(&ar_sdio->wr_async_lock);
mutex_init(&ar_sdio->dma_buffer_mutex);
- mutex_init(&ar_sdio->mtx_irq);

INIT_LIST_HEAD(&ar_sdio->scat_req);
INIT_LIST_HEAD(&ar_sdio->bus_req_freeq);
@@ -1293,6 +1302,8 @@ static int ath6kl_sdio_probe(struct sdio_func *func,

INIT_WORK(&ar_sdio->wr_async_work, ath6kl_sdio_write_async_work);

+ init_waitqueue_head(&ar_sdio->irq_wq);
+
for (count = 0; count < BUS_REQUEST_MAX_NUM; count++)
ath6kl_sdio_free_bus_req(ar_sdio, &ar_sdio->bus_req[count]);

--
1.7.0.4



2012-02-28 07:57:09

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath6kl: Fix random system lockup

On 02/28/2012 07:28 AM, Vasanthakumar Thiagarajan wrote:
> On Mon, Feb 27, 2012 at 04:17:41PM +0200, Kalle Valo wrote:
>> On 02/09/2012 09:27 AM, Vasanthakumar Thiagarajan wrote:
>>> From: Raja Mani <[email protected]>
>>>
>>> The commit "ath6kl: Use a mutex_lock to avoid
>>> race in diabling and handling irq" introduces a
>>> state where ath6kl_sdio_irq_handler() would be waiting
>>> to claim the sdio function for receive indefinitely
>>> when things happen in the following order.
>>>
>>> ath6kl_sdio_irq_handler()
>>> - aquires mtx_irq
>>> - sdio_release_host()
>>> ath6kl_sdio_irq_disable()
>>> - sdio_claim_host()
>>> - sleep on mtx_irq
>>> ath6kl_hif_intr_bh_handler()
>>> - (indefinitely) wait for the sdio
>>> function to be released to exclusively claim
>>> it again for receive operation.
>>>
>>> Fix this by replacing the mtx_irq with an atomic
>>> variable and a wait_queue.
>>>
>>> Signed-off-by: Raja Mani <[email protected]>
>>> Signed-off-by: Vasanthakumar Thiagarajan <[email protected]>
>>
>> I would really like to avoid using atomic variable if at all possible. I
>> was trying to think other options and what if we take in
>> ath6kl_sdio_irq_disable() mtx_irq before calling sdio_claim_host().
>> Wouldn't that solve the deadlock?
>
> Ok, the goal is to process any pending interrupts before disabling the interrupt.
> Taking mutex before sdio_claim_host() would have a deadlock condition like the following
>
> sdio_claim_host() ath6kl_sdio_irq_disable()
> - Acquire mtx_irq
> ath6kl_sdio_irq_handler() - Trying to claim sdio func()
> - waiting on mtx_irq

Yeah, that's true. I really would not want to apply this patch but I
guess there is no other option :(

Kalle

2012-02-27 14:17:46

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath6kl: Fix random system lockup

On 02/09/2012 09:27 AM, Vasanthakumar Thiagarajan wrote:
> From: Raja Mani <[email protected]>
>
> The commit "ath6kl: Use a mutex_lock to avoid
> race in diabling and handling irq" introduces a
> state where ath6kl_sdio_irq_handler() would be waiting
> to claim the sdio function for receive indefinitely
> when things happen in the following order.
>
> ath6kl_sdio_irq_handler()
> - aquires mtx_irq
> - sdio_release_host()
> ath6kl_sdio_irq_disable()
> - sdio_claim_host()
> - sleep on mtx_irq
> ath6kl_hif_intr_bh_handler()
> - (indefinitely) wait for the sdio
> function to be released to exclusively claim
> it again for receive operation.
>
> Fix this by replacing the mtx_irq with an atomic
> variable and a wait_queue.
>
> Signed-off-by: Raja Mani <[email protected]>
> Signed-off-by: Vasanthakumar Thiagarajan <[email protected]>

I would really like to avoid using atomic variable if at all possible. I
was trying to think other options and what if we take in
ath6kl_sdio_irq_disable() mtx_irq before calling sdio_claim_host().
Wouldn't that solve the deadlock?

Kalle

Subject: Re: [PATCH] ath6kl: Fix random system lockup

On Mon, Feb 27, 2012 at 04:17:41PM +0200, Kalle Valo wrote:
> On 02/09/2012 09:27 AM, Vasanthakumar Thiagarajan wrote:
> > From: Raja Mani <[email protected]>
> >
> > The commit "ath6kl: Use a mutex_lock to avoid
> > race in diabling and handling irq" introduces a
> > state where ath6kl_sdio_irq_handler() would be waiting
> > to claim the sdio function for receive indefinitely
> > when things happen in the following order.
> >
> > ath6kl_sdio_irq_handler()
> > - aquires mtx_irq
> > - sdio_release_host()
> > ath6kl_sdio_irq_disable()
> > - sdio_claim_host()
> > - sleep on mtx_irq
> > ath6kl_hif_intr_bh_handler()
> > - (indefinitely) wait for the sdio
> > function to be released to exclusively claim
> > it again for receive operation.
> >
> > Fix this by replacing the mtx_irq with an atomic
> > variable and a wait_queue.
> >
> > Signed-off-by: Raja Mani <[email protected]>
> > Signed-off-by: Vasanthakumar Thiagarajan <[email protected]>
>
> I would really like to avoid using atomic variable if at all possible. I
> was trying to think other options and what if we take in
> ath6kl_sdio_irq_disable() mtx_irq before calling sdio_claim_host().
> Wouldn't that solve the deadlock?

Ok, the goal is to process any pending interrupts before disabling the interrupt.
Taking mutex before sdio_claim_host() would have a deadlock condition like the following

sdio_claim_host() ath6kl_sdio_irq_disable()
- Acquire mtx_irq
ath6kl_sdio_irq_handler() - Trying to claim sdio func()
- waiting on mtx_irq


Vasanth

2012-03-01 07:35:58

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH] ath6kl: Fix random system lockup

On 02/09/2012 09:27 AM, Vasanthakumar Thiagarajan wrote:
> From: Raja Mani <[email protected]>
>
> The commit "ath6kl: Use a mutex_lock to avoid
> race in diabling and handling irq" introduces a
> state where ath6kl_sdio_irq_handler() would be waiting
> to claim the sdio function for receive indefinitely
> when things happen in the following order.
>
> ath6kl_sdio_irq_handler()
> - aquires mtx_irq
> - sdio_release_host()
> ath6kl_sdio_irq_disable()
> - sdio_claim_host()
> - sleep on mtx_irq
> ath6kl_hif_intr_bh_handler()
> - (indefinitely) wait for the sdio
> function to be released to exclusively claim
> it again for receive operation.
>
> Fix this by replacing the mtx_irq with an atomic
> variable and a wait_queue.
>
> Signed-off-by: Raja Mani <[email protected]>
> Signed-off-by: Vasanthakumar Thiagarajan <[email protected]>

Thanks, applied with a minor change due to indentation:

static bool ath6kl_sdio_is_on_irq(struct ath6kl *ar)
{
struct ath6kl_sdio *ar_sdio = ath6kl_sdio_priv(ar);

return !atomic_read(&ar_sdio->irq_handling);
}

ret = wait_event_interruptible(ar_sdio->irq_wq,
ath6kl_sdio_is_on_irq(ar));

Kalle