2015-01-14 13:17:40

by Felix Fietkau

[permalink] [raw]
Subject: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

To fix invalid hardware accesses, the commit
"ath9k: do not access hardware on IRQs during reset" made the irq
handler ignore interrupts emitted after queueing a hardware reset (which
disables the IRQ). This left a small time window for the IRQ to get
re-enabled by the tasklet, which caused IRQ storms.
Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
IRQ entirely for the duration of the reset.

Signed-off-by: Felix Fietkau <[email protected]>
---
drivers/net/wireless/ath/ath9k/main.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index 9a72640..62b0bf4 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)

__ath_cancel_work(sc);

+ disable_irq(sc->irq);
tasklet_disable(&sc->intr_tq);
tasklet_disable(&sc->bcon_tasklet);
spin_lock_bh(&sc->sc_pcu_lock);
@@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct ath9k_channel *hchan)
r = -EIO;

out:
+ enable_irq(sc->irq);
spin_unlock_bh(&sc->sc_pcu_lock);
tasklet_enable(&sc->bcon_tasklet);
tasklet_enable(&sc->intr_tq);
@@ -512,9 +514,6 @@ irqreturn_t ath_isr(int irq, void *dev)
if (!ah || test_bit(ATH_OP_INVALID, &common->op_flags))
return IRQ_NONE;

- if (!AR_SREV_9100(ah) && test_bit(ATH_OP_HW_RESET, &common->op_flags))
- return IRQ_NONE;
-
/* shared irq, not for us */
if (!ath9k_hw_intrpend(ah))
return IRQ_NONE;
@@ -529,7 +528,7 @@ irqreturn_t ath_isr(int irq, void *dev)
ath9k_debug_sync_cause(sc, sync_cause);
status &= ah->imask; /* discard unasked-for bits */

- if (AR_SREV_9100(ah) && test_bit(ATH_OP_HW_RESET, &common->op_flags))
+ if (test_bit(ATH_OP_HW_RESET, &common->op_flags))
return IRQ_HANDLED;

/*
--
2.1.2



2015-01-15 09:06:58

by Sujith Manoharan

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

Felix Fietkau wrote:
> To fix invalid hardware accesses, the commit
> "ath9k: do not access hardware on IRQs during reset" made the irq
> handler ignore interrupts emitted after queueing a hardware reset (which
> disables the IRQ). This left a small time window for the IRQ to get
> re-enabled by the tasklet, which caused IRQ storms.
> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
> IRQ entirely for the duration of the reset.

Doesn't this make the kill_interrupts() that was added in the earlier
commit unnecessary now ?

Sujith

2015-01-15 09:46:32

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

On 2015-01-15 10:09, Sujith Manoharan wrote:
> Felix Fietkau wrote:
>> To fix invalid hardware accesses, the commit
>> "ath9k: do not access hardware on IRQs during reset" made the irq
>> handler ignore interrupts emitted after queueing a hardware reset (which
>> disables the IRQ). This left a small time window for the IRQ to get
>> re-enabled by the tasklet, which caused IRQ storms.
>> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
>> IRQ entirely for the duration of the reset.
>
> Doesn't this make the kill_interrupts() that was added in the earlier
> commit unnecessary now ?
I think it's still a good idea to try to silence interrupts between
queueing a reset and actually performing it.

- Felix

2015-01-16 08:08:26

by Rico Derrer

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

Hi Felix

Felix Fietkau wrote:
> diff --git a/drivers/net/wireless/ath/ath9k/main.c
> b/drivers/net/wireless/ath/ath9k/main.c
> index 9a72640..62b0bf4 100644
> --- a/drivers/net/wireless/ath/ath9k/main.c
> +++ b/drivers/net/wireless/ath/ath9k/main.c
> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
> ath9k_channel *hchan)
>
> __ath_cancel_work(sc);
>
> + disable_irq(sc->irq);
> tasklet_disable(&sc->intr_tq);
> tasklet_disable(&sc->bcon_tasklet);
> spin_lock_bh(&sc->sc_pcu_lock);
> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
> ath9k_channel *hchan)
> r = -EIO;
>
> out:
> + enable_irq(sc->irq);
> spin_unlock_bh(&sc->sc_pcu_lock);
> tasklet_enable(&sc->bcon_tasklet);
> tasklet_enable(&sc->intr_tq);

This part completely blocks the system on a AR9350. Loading the module works but configuring it hangs the system until watchdog restarts it.

Regards
Rico

2015-01-19 12:37:25

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

Rico Derrer <[email protected]> writes:

>>> This part completely blocks the system on a AR9350. Loading the
>>> module works but configuring it hangs the system until watchdog
>>> restarts it.
>>
>> What kernel/software are you running on there? When I committed this
>> patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
>> (both generic and in the ath79 platform code).
>>
>> You can find the fixes that I've made here:
>> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
>> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch
>>
>> I've already submitted the first one to linux-mips.
>
> Thank you, these patches fixed the problem.

Thanks for confirming that, makes it a lot safer for me to apply the
patch.

--
Kalle Valo

2015-01-19 12:36:09

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

Felix Fietkau <[email protected]> writes:

> To fix invalid hardware accesses, the commit
> "ath9k: do not access hardware on IRQs during reset" made the irq
> handler ignore interrupts emitted after queueing a hardware reset (which
> disables the IRQ). This left a small time window for the IRQ to get
> re-enabled by the tasklet, which caused IRQ storms.
> Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable the
> IRQ entirely for the duration of the reset.
>
> Signed-off-by: Felix Fietkau <[email protected]>

Thanks, applied to wireless-drivers.git. I made a small change and added
the commit id to the commit log:

commit e3f31175a3eeb492a6ab788e4fa136c19b43aab4
Author: Felix Fietkau <[email protected]>
Date: Wed Jan 14 14:17:36 2015 +0100

ath9k: fix race condition in irq processing during hardware reset

To fix invalid hardware accesses, the commit 872b5d814f99 ("ath9k: do not
access hardware on IRQs during reset") made the irq handler ignore interrupts
emitted after queueing a hardware reset (which disables the IRQ). This left a
small time window for the IRQ to get re-enabled by the tasklet, which caused
IRQ storms. Instead of returning IRQ_NONE when ATH_OP_HW_RESET is set, disable
the IRQ entirely for the duration of the reset.

Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Kalle Valo <[email protected]>

--
Kalle Valo

2015-01-16 10:41:45

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

On 2015-01-16 09:00, Rico Derrer wrote:
> Hi Felix
>
> Felix Fietkau wrote:
>> diff --git a/drivers/net/wireless/ath/ath9k/main.c
>> b/drivers/net/wireless/ath/ath9k/main.c
>> index 9a72640..62b0bf4 100644
>> --- a/drivers/net/wireless/ath/ath9k/main.c
>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>> ath9k_channel *hchan)
>>
>> __ath_cancel_work(sc);
>>
>> + disable_irq(sc->irq);
>> tasklet_disable(&sc->intr_tq);
>> tasklet_disable(&sc->bcon_tasklet);
>> spin_lock_bh(&sc->sc_pcu_lock);
>> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>> ath9k_channel *hchan)
>> r = -EIO;
>>
>> out:
>> + enable_irq(sc->irq);
>> spin_unlock_bh(&sc->sc_pcu_lock);
>> tasklet_enable(&sc->bcon_tasklet);
>> tasklet_enable(&sc->intr_tq);
>
> This part completely blocks the system on a AR9350. Loading the
> module works but configuring it hangs the system until watchdog restarts it.
What kernel/software are you running on there? When I committed this
patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
(both generic and in the ath79 platform code).

You can find the fixes that I've made here:
http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch

I've already submitted the first one to linux-mips.

- Felix

2015-01-19 08:27:01

by Rico Derrer

[permalink] [raw]
Subject: Re: [PATCH 3.19] ath9k: fix race condition in irq processing during hardware reset

> On 2015-01-16 09:00, Rico Derrer wrote:
>> Hi Felix
>>
>> Felix Fietkau wrote:
>>> diff --git a/drivers/net/wireless/ath/ath9k/main.c
>>> b/drivers/net/wireless/ath/ath9k/main.c
>>> index 9a72640..62b0bf4 100644
>>> --- a/drivers/net/wireless/ath/ath9k/main.c
>>> +++ b/drivers/net/wireless/ath/ath9k/main.c
>>> @@ -285,6 +285,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>>> ath9k_channel *hchan)
>>>
>>> __ath_cancel_work(sc);
>>>
>>> + disable_irq(sc->irq);
>>> tasklet_disable(&sc->intr_tq);
>>> tasklet_disable(&sc->bcon_tasklet);
>>> spin_lock_bh(&sc->sc_pcu_lock);
>>> @@ -331,6 +332,7 @@ static int ath_reset_internal(struct ath_softc *sc, struct
>>> ath9k_channel *hchan)
>>> r = -EIO;
>>>
>>> out:
>>> + enable_irq(sc->irq);
>>> spin_unlock_bh(&sc->sc_pcu_lock);
>>> tasklet_enable(&sc->bcon_tasklet);
>>> tasklet_enable(&sc->intr_tq);
>>
>> This part completely blocks the system on a AR9350. Loading the
>> module works but configuring it hangs the system until watchdog restarts it.
> What kernel/software are you running on there? When I committed this
> patch to OpenWrt, it uncovered IRQ handling bugs in MIPS kernel code
> (both generic and in the ath79 platform code).
>
> You can find the fixes that I've made here:
> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/generic/patches-3.14/130-mips_cpu_irq_disable.patch
> http://git.openwrt.org/?p=openwrt.git;a=blob_plain;f=target/linux/ar71xx/patches-3.14/736-MIPS-ath79-fix-chained-irq-disable.patch
>
> I've already submitted the first one to linux-mips.
>
> - Felix

Thank you, these patches fixed the problem.

Rico