2012-08-08 14:25:08

by Felix Fietkau

Subject: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

commit b74713d04effbacd3d126ce94cec18742187b6ce
"ath9k: Handle fatal interrupts properly" introduced a race condition: IRQs
are left enabled, but the IRQ handler returns IRQ_HANDLED while the reset is
still queued, without addressing the IRQ cause.
This leads to an IRQ storm that prevents the system from even getting to
the reset code.

Fix this by disabling IRQs in the handler without touching intr_ref_cnt.

Cc: Rajkumar Manoharan <[email protected]>
Cc: Sujith Manoharan <[email protected]>
Signed-off-by: Felix Fietkau <[email protected]>
---
drivers/net/wireless/ath/ath9k/mac.c | 18 ++++++++++++------
drivers/net/wireless/ath/ath9k/mac.h | 1 +
drivers/net/wireless/ath/ath9k/main.c | 4 +++-
3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/mac.c b/drivers/net/wireless/ath/ath9k/mac.c
index 7990cd5..b42be91 100644
--- a/drivers/net/wireless/ath/ath9k/mac.c
+++ b/drivers/net/wireless/ath/ath9k/mac.c
@@ -773,15 +773,10 @@ bool ath9k_hw_intrpend(struct ath_hw *ah)
 }
 EXPORT_SYMBOL(ath9k_hw_intrpend);
 
-void ath9k_hw_disable_interrupts(struct ath_hw *ah)
+void ath9k_hw_kill_interrupts(struct ath_hw *ah)
 {
 	struct ath_common *common = ath9k_hw_common(ah);
 
-	if (!(ah->imask & ATH9K_INT_GLOBAL))
-		atomic_set(&ah->intr_ref_cnt, -1);
-	else
-		atomic_dec(&ah->intr_ref_cnt);
-
 	ath_dbg(common, INTERRUPT, "disable IER\n");
 	REG_WRITE(ah, AR_IER, AR_IER_DISABLE);
 	(void) REG_READ(ah, AR_IER);
@@ -793,6 +788,17 @@ void ath9k_hw_disable_interrupts(struct ath_hw *ah)
 		(void) REG_READ(ah, AR_INTR_SYNC_ENABLE);
 	}
 }
+EXPORT_SYMBOL(ath9k_hw_kill_interrupts);
+
+void ath9k_hw_disable_interrupts(struct ath_hw *ah)
+{
+	if (!(ah->imask & ATH9K_INT_GLOBAL))
+		atomic_set(&ah->intr_ref_cnt, -1);
+	else
+		atomic_dec(&ah->intr_ref_cnt);
+
+	ath9k_hw_kill_interrupts(ah);
+}
 EXPORT_SYMBOL(ath9k_hw_disable_interrupts);

void ath9k_hw_enable_interrupts(struct ath_hw *ah)
diff --git a/drivers/net/wireless/ath/ath9k/mac.h b/drivers/net/wireless/ath/ath9k/mac.h
index 0eba36d..4a745e6 100644
--- a/drivers/net/wireless/ath/ath9k/mac.h
+++ b/drivers/net/wireless/ath/ath9k/mac.h
@@ -738,6 +738,7 @@ bool ath9k_hw_intrpend(struct ath_hw *ah);
 void ath9k_hw_set_interrupts(struct ath_hw *ah);
 void ath9k_hw_enable_interrupts(struct ath_hw *ah);
 void ath9k_hw_disable_interrupts(struct ath_hw *ah);
+void ath9k_hw_kill_interrupts(struct ath_hw *ah);
 
 void ar9002_hw_attach_mac_ops(struct ath_hw *ah);

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 6049d8b..a22df74 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -462,8 +462,10 @@ irqreturn_t ath_isr(int irq, void *dev)
 	if (!ath9k_hw_intrpend(ah))
 		return IRQ_NONE;
 
-	if(test_bit(SC_OP_HW_RESET, &sc->sc_flags))
+	if (test_bit(SC_OP_HW_RESET, &sc->sc_flags)) {
+		ath9k_hw_kill_interrupts(ah);
 		return IRQ_HANDLED;
+	}
 
 	/*
 	 * Figure out the reason(s) for the interrupt. Note
--
1.7.9.6 (Apple Git-31.1)
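
The distinction the patch draws between ath9k_hw_kill_interrupts() and ath9k_hw_disable_interrupts() can be illustrated with a small user-space model. This is only a sketch, not driver code: struct fake_hw and the fake_* functions are hypothetical stand-ins, the counter is a plain int rather than an atomic_t, the enable path (unmask only when the count returns to zero) is an assumption about ath9k_hw_enable_interrupts(), which the patch does not show, and the reset path is simplified to one disable followed by one matching enable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parts of struct ath_hw used here. */
struct fake_hw {
	int intr_ref_cnt;	/* models ah->intr_ref_cnt (plain int, no atomics) */
	bool ier_enabled;	/* models the AR_IER hardware enable state */
};

/* Models ath9k_hw_kill_interrupts(): mask the hardware, leave the count alone. */
static void fake_kill_interrupts(struct fake_hw *hw)
{
	hw->ier_enabled = false;
}

/*
 * Models ath9k_hw_disable_interrupts() for the case where
 * ATH9K_INT_GLOBAL is set in imask: drop the count, then mask.
 */
static void fake_disable_interrupts(struct fake_hw *hw)
{
	hw->intr_ref_cnt--;
	fake_kill_interrupts(hw);
}

/* ASSUMPTION: enabling only unmasks once the count is back at zero. */
static void fake_enable_interrupts(struct fake_hw *hw)
{
	if (++hw->intr_ref_cnt != 0)
		return;
	hw->ier_enabled = true;
}

/*
 * Simplified scenario: the ISR fires while a reset is queued and masks
 * the hardware, then the reset runs one disable and one matching enable.
 * Returns whether interrupts end up enabled afterwards.
 */
static bool irqs_enabled_after_reset(bool isr_uses_kill)
{
	struct fake_hw hw = { .intr_ref_cnt = 0, .ier_enabled = true };

	if (isr_uses_kill)
		fake_kill_interrupts(&hw);	/* count stays balanced */
	else
		fake_disable_interrupts(&hw);	/* extra, unmatched decrement */

	fake_disable_interrupts(&hw);
	fake_enable_interrupts(&hw);

	assert(hw.intr_ref_cnt <= 0);
	return hw.ier_enabled;
}
```

Under these assumptions, irqs_enabled_after_reset(true) leaves interrupts enabled once the reset has finished, whereas irqs_enabled_after_reset(false) leaves the count at -1 and the hardware masked. That imbalance is what masking "without touching intr_ref_cnt" avoids.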



2012-08-08 15:00:45

by Felix Fietkau

Subject: Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On 2012-08-08 4:43 PM, Rajkumar Manoharan wrote:
> On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
>> commit b74713d04effbacd3d126ce94cec18742187b6ce
>> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
>> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
>> while the reset is still queued without addressing the IRQ cause.
>> This leads to an IRQ storm that prevents the system from even getting to
>> the reset code.
>>
>> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
>>
> It is safer not to re-enable interrupts on FATAL errors rather than enabling
> it and then checking it on irq for bailing out. It would be better if you kill
> the interrupts on processing fatal interrupts.
A fatal interrupt isn't the only place where this race shows up.
Anything that queues a reset is affected, so skipping the interrupt
enable in the IRQ handler is not enough (aside from the fact that it
would mess up irq disable refcounting).

Also, how is it safer? It's not like the interrupt handler does any real
processing before running into that check.

- Felix
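
The storm itself can be modelled in user space. The sketch below assumes a level-triggered interrupt line that stays asserted for as long as the cause is pending and interrupts are unmasked; that model, struct fake_dev, and every function name here are hypothetical and only serve to illustrate the argument. The handler bails out while a reset is queued, either with or without masking first.

```c
#include <stdbool.h>

/* Hypothetical device model, not the ath9k data structures. */
struct fake_dev {
	bool cause_pending;	/* hardware condition that raised the IRQ */
	bool ier_enabled;	/* interrupts unmasked at the device */
	bool reset_queued;	/* models the SC_OP_HW_RESET flag */
};

/* ASSUMPTION: level-triggered line, asserted while pending and unmasked. */
static bool irq_line_asserted(const struct fake_dev *d)
{
	return d->cause_pending && d->ier_enabled;
}

/*
 * Models the early exit in the handler. With mask_on_reset set, it
 * behaves like the patched code and masks before returning.
 */
static void fake_isr(struct fake_dev *d, bool mask_on_reset)
{
	if (d->reset_queued) {
		if (mask_on_reset)
			d->ier_enabled = false;
		return;		/* "handled", but the cause is still pending */
	}
	d->cause_pending = false;	/* normal path services the cause */
}

/* Counts handler invocations until the line deasserts, capped at limit. */
static int count_isr_runs(bool mask_on_reset, int limit)
{
	struct fake_dev d = {
		.cause_pending = true,
		.ier_enabled = true,
		.reset_queued = true,
	};
	int runs = 0;

	while (irq_line_asserted(&d) && runs < limit) {
		fake_isr(&d, mask_on_reset);
		runs++;
	}
	return runs;
}
```

In this model the unmasked variant never leaves the loop on its own and only stops at the cap, while the masking variant runs the handler once. The model does not depend on what queued the reset, which matches the point that any queued reset is affected, not only fatal interrupts.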

2012-08-08 15:26:25

by Sujith Manoharan

Subject: Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

Rajkumar Manoharan wrote:
> It is safer not to re-enable interrupts on FATAL errors rather than enabling
> it and then checking it on irq for bailing out. It would be better if you kill
> the interrupts on processing fatal interrupts.

I am not sure I understand.

The original issue was the race between reset-work and the ISR which resulted in
frequent disconnects when a BB-WATCHDOG interrupt occurred or TX hung, which was
fixed by introducing the SC_OP_HW_RESET flag. Later, the work_pending() race was
fixed. Still, this is a race that can happen and I think fixing it by bypassing
the ref-count maintenance and disabling interrupts is okay.

Sujith
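
As a rough illustration of the flag-based guard mentioned above, here is a user-space analogy using C11 atomics in place of the kernel's set_bit()/test_bit(). Everything in it is hypothetical: FAKE_OP_HW_RESET stands in for SC_OP_HW_RESET, struct fake_softc for the driver's softc, and the function names are invented for the sketch. It only shows the idea that the ISR skips normal processing while a reset is pending.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { FAKE_OP_HW_RESET = 0 };	/* hypothetical bit index */

struct fake_softc {
	atomic_ulong flags;	/* models sc->sc_flags */
	int irqs_processed;	/* interrupts that got normal processing */
};

static void fake_softc_init(struct fake_softc *sc)
{
	atomic_init(&sc->flags, 0UL);
	sc->irqs_processed = 0;
}

/* Analogue of set_bit(): mark a reset as queued. */
static void fake_queue_reset(struct fake_softc *sc)
{
	atomic_fetch_or(&sc->flags, 1UL << FAKE_OP_HW_RESET);
}

/* Analogue of clear_bit(): the reset work has finished. */
static void fake_reset_done(struct fake_softc *sc)
{
	atomic_fetch_and(&sc->flags, ~(1UL << FAKE_OP_HW_RESET));
}

/* Analogue of test_bit(). */
static bool fake_reset_pending(struct fake_softc *sc)
{
	return (atomic_load(&sc->flags) & (1UL << FAKE_OP_HW_RESET)) != 0;
}

/* Returns true if the interrupt received normal processing. */
static bool fake_isr_body(struct fake_softc *sc)
{
	if (fake_reset_pending(sc))
		return false;	/* bail out early while the reset is queued */
	sc->irqs_processed++;
	return true;
}
```

The flag alone only keeps the ISR from racing with the reset work; it does not quiet the hardware, which is why the patch additionally masks interrupts in that early-exit path.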

2012-08-08 15:19:09

by Rajkumar Manoharan

Subject: Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On Wed, Aug 08, 2012 at 05:00:39PM +0200, Felix Fietkau wrote:
> On 2012-08-08 4:43 PM, Rajkumar Manoharan wrote:
> > On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
> >> commit b74713d04effbacd3d126ce94cec18742187b6ce
> >> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
> >> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
> >> while the reset is still queued without addressing the IRQ cause.
> >> This leads to an IRQ storm that prevents the system from even getting to
> >> the reset code.
> >>
> >> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
> >>
> > It is safer not to re-enable interrupts on FATAL errors rather than enabling
> > it and then checking it on irq for bailing out. It would be better if you kill
> > the interrupts on processing fatal interrupts.
> A fatal interrupt isn't the only place where this race shows up.
> Anything that queues a reset is affected, so skipping the interrupt
> enable in the IRQ handler is not enough (aside from the fact that it
> would mess up irq disable refcounting).
>
> Also, how is it safer? It's not like the interrupt handler does any real
> processing before running into that check.
>
Agreed. I was confused by the subject of the mentioned commit. Sorry for the noise.

-Rajkumar

2012-08-08 14:42:20

by Rajkumar Manoharan

Subject: Re: [PATCH v2 3.6] ath9k: fix interrupt storms on queued hardware reset

On Wed, Aug 08, 2012 at 04:25:03PM +0200, Felix Fietkau wrote:
> commit b74713d04effbacd3d126ce94cec18742187b6ce
> "ath9k: Handle fatal interrupts properly" introduced a race condition, where
> IRQs are being left enabled, however the irq handler returns IRQ_HANDLED
> while the reset is still queued without addressing the IRQ cause.
> This leads to an IRQ storm that prevents the system from even getting to
> the reset code.
>
> Fix this by disabling IRQs in the handler without touching intr_ref_cnt.
>
It would be safer not to re-enable interrupts on FATAL errors at all than to
enable them and then check in the IRQ handler whether to bail out. It would be
better to kill the interrupts when processing fatal interrupts.

-Rajkumar