2016-11-17 08:36:24

by Sven Eckelmann

Subject: [RFC v2 1/2] ath9k: work around AR_CFG 0xdeadbeef chip hang

From: Simon Wunderlich <[email protected]>

QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a state in
which a read of AR_CFG always returns 0xdeadbeef. This should not happen
when the power_mode of the device is ATH9K_PM_AWAKE.

This problem is not yet detected by any other workaround in ath9k. No way
is known to reproduce the problem easily.

Signed-off-by: Simon Wunderlich <[email protected]>
[[email protected]: port to recent ath9k, add commit message]
Signed-off-by: Sven Eckelmann <[email protected]>
---
v2:
- reduce amount of possible goto-raptor attacks by one (thanks Kalle Valo)

This was discussed 4 years ago on the OpenWrt mailing list. The most
relevant post is
https://lists.openwrt.org/pipermail/openwrt-devel/2012-September/016708.html
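
For readers without the driver source at hand, the check described in the commit message can be sketched in plain userspace C. This is only an illustration: `enum power_mode`, `HANG_MAGIC` and `is_deadbeef_hang()` are hypothetical names, not ath9k identifiers, and the register value is passed in rather than read from hardware. It assumes, as the commit message states, that the magic value is only unexpected while the chip is awake.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value that a read of AR_CFG returns when the chip is in the hang state. */
#define HANG_MAGIC 0xdeadbeefu

/* Hypothetical stand-in for the driver's power mode; not the ath9k enum. */
enum power_mode { PM_AWAKE, PM_SLEEP };

/*
 * Report a hang only when the chip is awake and the register read still
 * returns the magic value. A read taken while the chip is not awake is
 * not treated as evidence of a hang.
 */
bool is_deadbeef_hang(enum power_mode mode, uint32_t cfg_value)
{
	return mode == PM_AWAKE && cfg_value == HANG_MAGIC;
}
```

In the patch itself the awake condition is ensured by wrapping the register read in ath9k_ps_wakeup()/ath9k_ps_restore() instead of testing the power mode explicitly.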
---
drivers/net/wireless/ath/ath9k/ath9k.h | 3 +++
drivers/net/wireless/ath/ath9k/debug.c | 1 +
drivers/net/wireless/ath/ath9k/debug.h | 1 +
drivers/net/wireless/ath/ath9k/init.c | 1 +
drivers/net/wireless/ath/ath9k/link.c | 31 +++++++++++++++++++++++++++++++
drivers/net/wireless/ath/ath9k/main.c | 4 ++++
6 files changed, 41 insertions(+)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 26fc8ec..9c6fee7 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -710,11 +710,13 @@ void ath9k_csa_update(struct ath_softc *sc);
#define ATH_ANI_MAX_SKIP_COUNT 10
#define ATH_PAPRD_TIMEOUT 100 /* msecs */
#define ATH_PLL_WORK_INTERVAL 100
+#define ATH_HANG_WORK_INTERVAL 30000

void ath_tx_complete_poll_work(struct work_struct *work);
void ath_reset_work(struct work_struct *work);
bool ath_hw_check(struct ath_softc *sc);
void ath_hw_pll_work(struct work_struct *work);
+void ath_hw_hang_work(struct work_struct *work);
void ath_paprd_calibrate(struct work_struct *work);
void ath_ani_calibrate(unsigned long data);
void ath_start_ani(struct ath_softc *sc);
@@ -1014,6 +1016,7 @@ struct ath_softc {
#endif
struct delayed_work tx_complete_work;
struct delayed_work hw_pll_work;
+ struct delayed_work hw_hang_work;
struct timer_list sleep_timer;

#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index c56e40f..608b370 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -767,6 +767,7 @@ static int read_file_reset(struct seq_file *file, void *data)
[RESET_TYPE_CALIBRATION] = "Calibration error",
[RESET_TX_DMA_ERROR] = "Tx DMA stop error",
[RESET_RX_DMA_ERROR] = "Rx DMA stop error",
+ [RESET_TYPE_DEADBEEF] = "deadbeef hang",
};
int i;

diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index cd68c5f..0d77abbf6 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -52,6 +52,7 @@ enum ath_reset_type {
RESET_TYPE_CALIBRATION,
RESET_TX_DMA_ERROR,
RESET_RX_DMA_ERROR,
+ RESET_TYPE_DEADBEEF,
__RESET_TYPE_MAX
};

diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 368d9b3..9bc7d1c 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -638,6 +638,7 @@ static int ath9k_init_softc(u16 devid, struct ath_softc *sc,
INIT_WORK(&sc->hw_reset_work, ath_reset_work);
INIT_WORK(&sc->paprd_work, ath_paprd_calibrate);
INIT_DELAYED_WORK(&sc->hw_pll_work, ath_hw_pll_work);
+ INIT_DELAYED_WORK(&sc->hw_hang_work, ath_hw_hang_work);

ath9k_init_channel_context(sc);

diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 5ad0fee..04195d5 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -138,6 +138,37 @@ void ath_hw_pll_work(struct work_struct *work)
msecs_to_jiffies(ATH_PLL_WORK_INTERVAL));
}

+static bool ath_hw_hang_deadbeef(struct ath_softc *sc)
+{
+ struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+ u32 reg;
+
+ /* check for stuck MAC */
+ ath9k_ps_wakeup(sc);
+ reg = REG_READ(sc->sc_ah, AR_CFG);
+ ath9k_ps_restore(sc);
+
+ if (reg != 0xdeadbeef)
+ return false;
+
+ ath_dbg(common, RESET,
+ "0xdeadbeef hang is detected. Schedule chip reset\n");
+ ath9k_queue_reset(sc, RESET_TYPE_DEADBEEF);
+
+ return true;
+}
+
+void ath_hw_hang_work(struct work_struct *work)
+{
+ struct ath_softc *sc = container_of(work, struct ath_softc,
+ hw_hang_work.work);
+
+ ath_hw_hang_deadbeef(sc);
+
+ ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
+ msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
+}
+
/*
* PA Pre-distortion.
*/
diff --git a/drivers/net/wireless/ath/ath9k/main.c b/drivers/net/wireless/ath/ath9k/main.c
index e9f32b5..4d3e216 100644
--- a/drivers/net/wireless/ath/ath9k/main.c
+++ b/drivers/net/wireless/ath/ath9k/main.c
@@ -183,6 +183,7 @@ static void __ath_cancel_work(struct ath_softc *sc)
cancel_work_sync(&sc->paprd_work);
cancel_delayed_work_sync(&sc->tx_complete_work);
cancel_delayed_work_sync(&sc->hw_pll_work);
+ cancel_delayed_work_sync(&sc->hw_hang_work);

#ifdef CONFIG_ATH9K_BTCOEX_SUPPORT
if (ath9k_hw_mci_is_enabled(sc->sc_ah))
@@ -204,6 +205,9 @@ void ath_restart_work(struct ath_softc *sc)
ieee80211_queue_delayed_work(sc->hw, &sc->hw_pll_work,
msecs_to_jiffies(ATH_PLL_WORK_INTERVAL));

+ ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
+ msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
+
ath_start_ani(sc);
}

--
2.10.2


2016-11-21 09:11:02

by Sven Eckelmann

Subject: Re: [ath9k-devel] [RFC v2 2/2] ath9k: Reset chip on potential deaf state

On Monday, 21 November 2016 10:07:43 CET Ferry Huberts wrote:
[...]
> > v2:
> > - reduce amount of possible goto-raptor attacks by one (thanks Kalle Valo)
> >
> > This problem was discovered in mesh setups. It was noticed that some nodes
>
>
> What kind of setup?
> Using 802.11s?

Unencrypted IBSS.

Kind regards,
Sven



2016-11-21 09:07:47

by Ferry Huberts

Subject: Re: [ath9k-devel] [RFC v2 2/2] ath9k: Reset chip on potential deaf state



On 17/11/16 09:36, Sven Eckelmann wrote:
> From: Simon Wunderlich <[email protected]>
>
> The chip seemingly randomly switches into a state which can be described
> as "deaf". No or nearly no interrupts are generated anymore for incoming
> packets. Existing links break down after a while and new links will
> not be established.
>
> The driver doesn't know if there is no other device available or if it
> ended up in a "deaf" state. Resetting the chip proactively avoids
> permanent problems in case the chip really was in its "deaf" state but
> may cause unnecessary resets in case it wasn't "deaf".
>
> Signed-off-by: Simon Wunderlich <[email protected]>
> [[email protected]: port to recent ath9k, add commit message]
> Signed-off-by: Sven Eckelmann <[email protected]>
> ---
> v2:
> - reduce amount of possible goto-raptor attacks by one (thanks Kalle Valo)
>
> This problem was discovered in mesh setups. It was noticed that some nodes


What kind of setup?
Using 802.11s?

I ask this because I have almost completed a patch for authsae that
checks rekey.

The problems there might show as the behaviour described here.


> were not able to see their neighbors (mostly after running for a while) -
> even when those neighbors received data from them via IBSS. A simple `iw
> dev wlan0 scan` fixed the problem for them. But the problem seems to
> reappear after a while(tm) in a large enough(tm) mesh.
>
> This patch is a little bit obscure because it requires CONFIG_ATH9K_DEBUGFS
> to actually work. But there still seems to be potential interest in
> Freifunk communities or Freifunk meta-projects (e.g. freifunk-gluon). It is
> currently not known if it helps them but publishing this to allow them to
> test and play around with it will not hurt :)
> ---
> drivers/net/wireless/ath/ath9k/ath9k.h | 3 +++
> drivers/net/wireless/ath/ath9k/debug.c | 1 +
> drivers/net/wireless/ath/ath9k/debug.h | 1 +
> drivers/net/wireless/ath/ath9k/link.c | 48 +++++++++++++++++++++++++++++++++-
> 4 files changed, 52 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
> index 9c6fee7..3987ad5 100644
> --- a/drivers/net/wireless/ath/ath9k/ath9k.h
> +++ b/drivers/net/wireless/ath/ath9k/ath9k.h
> @@ -996,6 +996,9 @@ struct ath_softc {
> short nbcnvifs;
> unsigned long ps_usecount;
>
> + unsigned long last_check_time;
> + u32 last_check_interrupts;
> +
> struct ath_rx rx;
> struct ath_tx tx;
> struct ath_beacon beacon;
> diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
> index 608b370..6d5c253 100644
> --- a/drivers/net/wireless/ath/ath9k/debug.c
> +++ b/drivers/net/wireless/ath/ath9k/debug.c
> @@ -768,6 +768,7 @@ static int read_file_reset(struct seq_file *file, void *data)
> [RESET_TX_DMA_ERROR] = "Tx DMA stop error",
> [RESET_RX_DMA_ERROR] = "Rx DMA stop error",
> [RESET_TYPE_DEADBEEF] = "deadbeef hang",
> + [RESET_TYPE_DEAF] = "deaf hang",
> };
> int i;
>
> diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
> index 0d77abbf6..6f186bd 100644
> --- a/drivers/net/wireless/ath/ath9k/debug.h
> +++ b/drivers/net/wireless/ath/ath9k/debug.h
> @@ -53,6 +53,7 @@ enum ath_reset_type {
> RESET_TX_DMA_ERROR,
> RESET_RX_DMA_ERROR,
> RESET_TYPE_DEADBEEF,
> + RESET_TYPE_DEAF,
> __RESET_TYPE_MAX
> };
>
> diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
> index 04195d5..ae99c02 100644
> --- a/drivers/net/wireless/ath/ath9k/link.c
> +++ b/drivers/net/wireless/ath/ath9k/link.c
> @@ -158,13 +158,59 @@ static bool ath_hw_hang_deadbeef(struct ath_softc *sc)
> return true;
> }
>
> +static bool ath_hw_hang_deaf(struct ath_softc *sc)
> +{
> +#ifndef CONFIG_ATH9K_DEBUGFS
> + return false;
> +#else
> + struct ath_common *common = ath9k_hw_common(sc->sc_ah);
> + u32 interrupts, interrupt_per_s;
> + unsigned int interval;
> +
> + /* get historic data */
> + interval = jiffies_to_msecs(jiffies - sc->last_check_time);
> + if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
> + interrupts = sc->debug.stats.istats.rxlp;
> + else
> + interrupts = sc->debug.stats.istats.rxok;
> +
> + interrupts -= sc->last_check_interrupts;
> +
> + /* save current data */
> + sc->last_check_time = jiffies;
> + if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
> + sc->last_check_interrupts = sc->debug.stats.istats.rxlp;
> + else
> + sc->last_check_interrupts = sc->debug.stats.istats.rxok;
> +
> + /* sanity check, should be 30 seconds */
> + if (interval > 40000 || interval < 20000)
> + return false;
> +
> + /* should be at least one interrupt per second */
> + interrupt_per_s = interrupts / (interval / 1000);
> + if (interrupt_per_s >= 1)
> + return false;
> +
> + ath_dbg(common, RESET,
> + "RX deaf hang is detected. Schedule chip reset\n");
> + ath9k_queue_reset(sc, RESET_TYPE_DEAF);
> +
> + return true;
> +#endif
> +}
> +
> void ath_hw_hang_work(struct work_struct *work)
> {
> struct ath_softc *sc = container_of(work, struct ath_softc,
> hw_hang_work.work);
>
> - ath_hw_hang_deadbeef(sc);
> + if (ath_hw_hang_deadbeef(sc))
> + goto requeue_worker;
> +
> + ath_hw_hang_deaf(sc);
>
> +requeue_worker:
> ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
> msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
> }
>

--
Ferry Huberts

2016-11-21 09:15:28

by Ferry Huberts

Subject: Re: [ath9k-devel] [RFC v2 2/2] ath9k: Reset chip on potential deaf state



On 21/11/16 10:10, Sven Eckelmann wrote:
> On Monday, 21 November 2016 10:07:43 CET Ferry Huberts wrote:
> [...]
>>> v2:
>>> - reduce amount of possible goto-raptor attacks by one (thanks Kalle Valo)
>>>
>>> This problem was discovered in mesh setups. It was noticed that some nodes
>>
>>
>> What kind of setup?
>> Using 802.11s?
>
> Unencrypted IBSS.
>

ok, thanks. that is different then.

I _can_ tell you that using the high priority queue (EF class traffic)
seems to somehow 'unwedge' the chip during/after rekeying. Still have to
verify this again, but that is what I saw last week.

2016-11-17 08:36:29

by Sven Eckelmann

Subject: [RFC v2 2/2] ath9k: Reset chip on potential deaf state

From: Simon Wunderlich <[email protected]>

The chip seemingly randomly switches into a state which can be described
as "deaf". No or nearly no interrupts are generated anymore for incoming
packets. Existing links break down after a while and new links will
not be established.

The driver doesn't know if there is no other device available or if it
ended up in a "deaf" state. Resetting the chip proactively avoids
permanent problems in case the chip really was in its "deaf" state but
may cause unnecessary resets in case it wasn't "deaf".

Signed-off-by: Simon Wunderlich <[email protected]>
[[email protected]: port to recent ath9k, add commit message]
Signed-off-by: Sven Eckelmann <[email protected]>
---
v2:
- reduce amount of possible goto-raptor attacks by one (thanks Kalle Valo)

This problem was discovered in mesh setups. It was noticed that some nodes
were not able to see their neighbors (mostly after running for a while) -
even when those neighbors received data from them via IBSS. A simple `iw
dev wlan0 scan` fixed the problem for them. But the problem seems to
reappear after a while(tm) in a large enough(tm) mesh.

This patch is a little bit obscure because it requires CONFIG_ATH9K_DEBUGFS
to actually work. But there still seems to be potential interest in
Freifunk communities or Freifunk meta-projects (e.g. freifunk-gluon). It is
currently not known if it helps them but publishing this to allow them to
test and play around with it will not hurt :)
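
To make the heuristic easier to discuss, the decision can be sketched as a standalone userspace function. This is a simplified illustration, not the driver code: `looks_deaf()` and its parameters are hypothetical names, and the counter and interval are passed in directly instead of being taken from the debugfs statistics and jiffies. The thresholds (a 20 to 40 second window and at least one RX interrupt per second) are the ones used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether the RX interrupt rate over the last interval is low
 * enough to suspect a "deaf" chip.
 *
 * prev_count, cur_count: RX interrupt counter at the previous and the
 *                        current check
 * interval_ms:           time between the two checks in milliseconds
 */
bool looks_deaf(uint32_t prev_count, uint32_t cur_count, uint32_t interval_ms)
{
	/* unsigned subtraction still gives the right delta after a wrap */
	uint32_t delta = cur_count - prev_count;

	/* the worker runs every 30 s; ignore samples far from that */
	if (interval_ms > 40000 || interval_ms < 20000)
		return false;

	/* integer division: anything below one interrupt per second is 0 */
	return delta / (interval_ms / 1000) < 1;
}
```

With a 30 second interval, 29 interrupts or fewer are treated as "deaf" and 30 or more are not, so a quiet but healthy node may be reset unnecessarily.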
---
drivers/net/wireless/ath/ath9k/ath9k.h | 3 +++
drivers/net/wireless/ath/ath9k/debug.c | 1 +
drivers/net/wireless/ath/ath9k/debug.h | 1 +
drivers/net/wireless/ath/ath9k/link.c | 48 +++++++++++++++++++++++++++++++++-
4 files changed, 52 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index 9c6fee7..3987ad5 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -996,6 +996,9 @@ struct ath_softc {
short nbcnvifs;
unsigned long ps_usecount;

+ unsigned long last_check_time;
+ u32 last_check_interrupts;
+
struct ath_rx rx;
struct ath_tx tx;
struct ath_beacon beacon;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 608b370..6d5c253 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -768,6 +768,7 @@ static int read_file_reset(struct seq_file *file, void *data)
[RESET_TX_DMA_ERROR] = "Tx DMA stop error",
[RESET_RX_DMA_ERROR] = "Rx DMA stop error",
[RESET_TYPE_DEADBEEF] = "deadbeef hang",
+ [RESET_TYPE_DEAF] = "deaf hang",
};
int i;

diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index 0d77abbf6..6f186bd 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -53,6 +53,7 @@ enum ath_reset_type {
RESET_TX_DMA_ERROR,
RESET_RX_DMA_ERROR,
RESET_TYPE_DEADBEEF,
+ RESET_TYPE_DEAF,
__RESET_TYPE_MAX
};

diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 04195d5..ae99c02 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -158,13 +158,59 @@ static bool ath_hw_hang_deadbeef(struct ath_softc *sc)
return true;
}

+static bool ath_hw_hang_deaf(struct ath_softc *sc)
+{
+#ifndef CONFIG_ATH9K_DEBUGFS
+ return false;
+#else
+ struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+ u32 interrupts, interrupt_per_s;
+ unsigned int interval;
+
+ /* get historic data */
+ interval = jiffies_to_msecs(jiffies - sc->last_check_time);
+ if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
+ interrupts = sc->debug.stats.istats.rxlp;
+ else
+ interrupts = sc->debug.stats.istats.rxok;
+
+ interrupts -= sc->last_check_interrupts;
+
+ /* save current data */
+ sc->last_check_time = jiffies;
+ if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
+ sc->last_check_interrupts = sc->debug.stats.istats.rxlp;
+ else
+ sc->last_check_interrupts = sc->debug.stats.istats.rxok;
+
+ /* sanity check, should be 30 seconds */
+ if (interval > 40000 || interval < 20000)
+ return false;
+
+ /* should be at least one interrupt per second */
+ interrupt_per_s = interrupts / (interval / 1000);
+ if (interrupt_per_s >= 1)
+ return false;
+
+ ath_dbg(common, RESET,
+ "RX deaf hang is detected. Schedule chip reset\n");
+ ath9k_queue_reset(sc, RESET_TYPE_DEAF);
+
+ return true;
+#endif
+}
+
void ath_hw_hang_work(struct work_struct *work)
{
struct ath_softc *sc = container_of(work, struct ath_softc,
hw_hang_work.work);

- ath_hw_hang_deadbeef(sc);
+ if (ath_hw_hang_deadbeef(sc))
+ goto requeue_worker;
+
+ ath_hw_hang_deaf(sc);

+requeue_worker:
ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
}
--
2.10.2