2014-10-28 09:46:05

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 0/3] ath10k: speed up recovery

Patch #1 fixes a bug I've found while testing
patches #2 and #3. It's safe to cherry-pick it in
case patches #2 and/or #3 aren't accepted or need
a re-spin.

I've mainly used patch #2 to test reset and
recovery. It's pretty handy for (stress-)testing.

Patch #3 should improve recovery speed in some
cases. Notably when there's a long chain of WMI
commands involved which take a painfully long time
to timeout/complete if firmware crashes mid-way.

Note: Patches #2 and #3 depend on `ath10k: pci
related fixes 2014-10-09`. Without the patchset
patches #2 and #3 must not be applied. Patch #1 is
fine though.

v3:
* skip shadow ring cleanup in #1

Michal Kazior (3):
ath10k: fix possible bmi crash
ath10k: expose hw restart via debugfs
ath10k: speed up hw recovery

drivers/net/wireless/ath/ath10k/ce.c | 7 +++++++
drivers/net/wireless/ath/ath10k/core.c | 23 +++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/core.h | 5 +++++
drivers/net/wireless/ath/ath10k/debug.c | 7 ++++++-
drivers/net/wireless/ath/ath10k/htt_tx.c | 1 -
drivers/net/wireless/ath/ath10k/mac.c | 14 ++++++++++++--
drivers/net/wireless/ath/ath10k/mac.h | 1 +
drivers/net/wireless/ath/ath10k/pci.c | 3 +++
drivers/net/wireless/ath/ath10k/txrx.c | 3 ++-
drivers/net/wireless/ath/ath10k/wmi.c | 5 ++++-
10 files changed, 63 insertions(+), 6 deletions(-)

--
1.8.5.3



2014-10-28 09:46:09

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 3/3] ath10k: speed up hw recovery

In some cases hw recovery was taking an absurdly
long time due to ath10k waiting for things that
would never really complete.

Instead of waiting for inevitable timeouts poke
all completions and wakequeues and check if it's
still worth waiting.

Reading/writing ar->state requires conf_mutex.
Since waiters might be holding it introduce a new
flag CRASH_FLUSH so it's possible to tell waiters
to abort whatever they were waiting for.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v2:
* comment use of barrier() [Kalle]
* comment CRASH_FLUSH vs ar->state in commit log [Kalle]
* fix crash on too early use of waitqueues [Janusz]

drivers/net/wireless/ath/ath10k/core.c | 23 +++++++++++++++++++++++
drivers/net/wireless/ath/ath10k/core.h | 5 +++++
drivers/net/wireless/ath/ath10k/htt_tx.c | 1 -
drivers/net/wireless/ath/ath10k/mac.c | 14 ++++++++++++--
drivers/net/wireless/ath/ath10k/mac.h | 1 +
drivers/net/wireless/ath/ath10k/txrx.c | 3 ++-
drivers/net/wireless/ath/ath10k/wmi.c | 5 ++++-
7 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 5c23d00..6f2c459 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -744,6 +744,25 @@ static void ath10k_core_restart(struct work_struct *work)
{
struct ath10k *ar = container_of(work, struct ath10k, restart_work);

+ set_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags);
+
+ /* Place a barrier to make sure the compiler doesn't reorder
+ * CRASH_FLUSH and calling other functions.
+ */
+ barrier();
+
+ ieee80211_stop_queues(ar->hw);
+ ath10k_drain_tx(ar);
+ complete_all(&ar->scan.started);
+ complete_all(&ar->scan.completed);
+ complete_all(&ar->scan.on_channel);
+ complete_all(&ar->offchan_tx_completed);
+ complete_all(&ar->install_key_done);
+ complete_all(&ar->vdev_setup_done);
+ wake_up(&ar->htt.empty_tx_wq);
+ wake_up(&ar->wmi.tx_credits_wq);
+ wake_up(&ar->peer_mapping_wq);
+
mutex_lock(&ar->conf_mutex);

switch (ar->state) {
@@ -781,6 +800,8 @@ int ath10k_core_start(struct ath10k *ar, enum ath10k_firmware_mode mode)

lockdep_assert_held(&ar->conf_mutex);

+ clear_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags);
+
ath10k_bmi_start(ar);

if (ath10k_init_configure_target(ar)) {
@@ -1185,6 +1206,8 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,

INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
+ init_waitqueue_head(&ar->htt.empty_tx_wq);
+ init_waitqueue_head(&ar->wmi.tx_credits_wq);

init_completion(&ar->offchan_tx_completed);
INIT_WORK(&ar->offchan_tx_work, ath10k_offchan_tx_work);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 1e3fd10..114bac0 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -386,6 +386,11 @@ enum ath10k_dev_flags {
/* Indicates that ath10k device is during CAC phase of DFS */
ATH10K_CAC_RUNNING,
ATH10K_FLAG_CORE_REGISTERED,
+
+ /* Device has crashed and needs to restart. This indicates any pending
+ * waiters should immediately cancel instead of waiting for a time out.
+ */
+ ATH10K_FLAG_CRASH_FLUSH,
};

enum ath10k_cal_mode {
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index b0df470..1702c97 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -92,7 +92,6 @@ int ath10k_htt_tx_alloc(struct ath10k_htt *htt)
struct ath10k *ar = htt->ar;

spin_lock_init(&htt->tx_lock);
- init_waitqueue_head(&htt->empty_tx_wq);

if (test_bit(ATH10K_FW_FEATURE_WMI_10X, htt->ar->fw_features))
htt->max_num_pending_tx = TARGET_10X_NUM_MSDU_DESC;
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index a726107..b2edfb4 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -519,6 +519,9 @@ static inline int ath10k_vdev_setup_sync(struct ath10k *ar)

lockdep_assert_held(&ar->conf_mutex);

+ if (test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags))
+ return -ESHUTDOWN;
+
ret = wait_for_completion_timeout(&ar->vdev_setup_done,
ATH10K_VDEV_SETUP_TIMEOUT_HZ);
if (ret == 0)
@@ -551,6 +554,8 @@ static int ath10k_monitor_vdev_start(struct ath10k *ar, int vdev_id)
arg.channel.max_reg_power = channel->max_reg_power * 2;
arg.channel.max_antenna_gain = channel->max_antenna_gain * 2;

+ reinit_completion(&ar->vdev_setup_done);
+
ret = ath10k_wmi_vdev_start(ar, &arg);
if (ret) {
ath10k_warn(ar, "failed to request monitor vdev %i start: %d\n",
@@ -598,6 +603,8 @@ static int ath10k_monitor_vdev_stop(struct ath10k *ar)
ath10k_warn(ar, "failed to put down monitor vdev %i: %d\n",
ar->monitor_vdev_id, ret);

+ reinit_completion(&ar->vdev_setup_done);
+
ret = ath10k_wmi_vdev_stop(ar, ar->monitor_vdev_id);
if (ret)
ath10k_warn(ar, "failed to to request monitor vdev %i stop: %d\n",
@@ -2350,7 +2357,7 @@ static void ath10k_tx(struct ieee80211_hw *hw,
}

/* Must not be called with conf_mutex held as workers can use that also. */
-static void ath10k_drain_tx(struct ath10k *ar)
+void ath10k_drain_tx(struct ath10k *ar)
{
/* make sure rcu-protected mac80211 tx path itself is drained */
synchronize_net();
@@ -3908,7 +3915,9 @@ static void ath10k_flush(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
empty = (ar->htt.num_pending_tx == 0);
spin_unlock_bh(&ar->htt.tx_lock);

- skip = (ar->state == ATH10K_STATE_WEDGED);
+ skip = (ar->state == ATH10K_STATE_WEDGED) ||
+ test_bit(ATH10K_FLAG_CRASH_FLUSH,
+ &ar->dev_flags);

(empty || skip);
}), ATH10K_FLUSH_TIMEOUT_HZ);
@@ -4005,6 +4014,7 @@ static void ath10k_restart_complete(struct ieee80211_hw *hw)
if (ar->state == ATH10K_STATE_RESTARTED) {
ath10k_info(ar, "device successfully recovered\n");
ar->state = ATH10K_STATE_ON;
+ ieee80211_wake_queues(ar->hw);
}

mutex_unlock(&ar->conf_mutex);
diff --git a/drivers/net/wireless/ath/ath10k/mac.h b/drivers/net/wireless/ath/ath10k/mac.h
index 965c511..4e3c989 100644
--- a/drivers/net/wireless/ath/ath10k/mac.h
+++ b/drivers/net/wireless/ath/ath10k/mac.h
@@ -40,6 +40,7 @@ void ath10k_mgmt_over_wmi_tx_purge(struct ath10k *ar);
void ath10k_mgmt_over_wmi_tx_work(struct work_struct *work);
void ath10k_halt(struct ath10k *ar);
void ath10k_mac_vif_beacon_free(struct ath10k_vif *arvif);
+void ath10k_drain_tx(struct ath10k *ar);

static inline struct ath10k_vif *ath10k_vif_to_arvif(struct ieee80211_vif *vif)
{
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index f9c90e3..7579de8 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -146,7 +146,8 @@ static int ath10k_wait_for_peer_common(struct ath10k *ar, int vdev_id,
mapped = !!ath10k_peer_find(ar, vdev_id, addr);
spin_unlock_bh(&ar->data_lock);

- mapped == expect_mapped;
+ (mapped == expect_mapped ||
+ test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags));
}), 3*HZ);

if (ret <= 0)
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index ae746ce..5592844 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -779,6 +779,10 @@ int ath10k_wmi_cmd_send(struct ath10k *ar, struct sk_buff *skb, u32 cmd_id)
ath10k_wmi_tx_beacons_nowait(ar);

ret = ath10k_wmi_cmd_send_nowait(ar, skb, cmd_id);
+
+ if (ret && test_bit(ATH10K_FLAG_CRASH_FLUSH, &ar->dev_flags))
+ ret = -ESHUTDOWN;
+
(ret != -EAGAIN);
}), 3*HZ);

@@ -4398,7 +4402,6 @@ int ath10k_wmi_attach(struct ath10k *ar)

init_completion(&ar->wmi.service_ready);
init_completion(&ar->wmi.unified_ready);
- init_waitqueue_head(&ar->wmi.tx_credits_wq);

return 0;
}
--
1.8.5.3


2014-10-28 09:46:07

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 2/3] ath10k: expose hw restart via debugfs

Until now it was possible to simulate soft and
hard fw crashes but it wasn't possible to trigger
an immediately hw restart itself (without the fw
crash).

This can be useful when stress testing hw
restarting stability, e.g. during heavy tx/rx
traffic.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v2:
* s/request/hw-restart/ [Kalle]

drivers/net/wireless/ath/ath10k/debug.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath10k/debug.c b/drivers/net/wireless/ath/ath10k/debug.c
index 9147dd3..a8f5a72 100644
--- a/drivers/net/wireless/ath/ath10k/debug.c
+++ b/drivers/net/wireless/ath/ath10k/debug.c
@@ -695,7 +695,8 @@ static ssize_t ath10k_read_simulate_fw_crash(struct file *file,
"To simulate firmware crash write one of the keywords to this file:\n"
"`soft` - this will send WMI_FORCE_FW_HANG_ASSERT to firmware if FW supports that command.\n"
"`hard` - this will send to firmware command with illegal parameters causing firmware crash.\n"
- "`assert` - this will send special illegal parameter to firmware to cause assert failure and crash.\n";
+ "`assert` - this will send special illegal parameter to firmware to cause assert failure and crash.\n"
+ "`hw-restart` - this will simply queue hw restart without fw/hw actually crashing.\n";

return simple_read_from_buffer(user_buf, count, ppos, buf, strlen(buf));
}
@@ -748,6 +749,10 @@ static ssize_t ath10k_write_simulate_fw_crash(struct file *file,
} else if (!strcmp(buf, "assert")) {
ath10k_info(ar, "simulating firmware assert crash\n");
ret = ath10k_debug_fw_assert(ar);
+ } else if (!strcmp(buf, "hw-restart")) {
+ ath10k_info(ar, "user requested hw restart\n");
+ queue_work(ar->workqueue, &ar->restart_work);
+ ret = 0;
} else {
ret = -EINVAL;
goto exit;
--
1.8.5.3


2014-10-28 09:46:07

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 1/3] ath10k: fix possible bmi crash

While testing other things I've found that CE
items aren't cleared properly. This could lead to
null dereferences in BMI.

To prevent that make sure CE revoking clears the
nbytes value (which is used as a buffer completion
indication) and memset the entire CE ring data
shared between host and target when
(re)initializing.

Also make sure to check BMI xfer pointer and print
a splat instead of crashing the kernel.

Signed-off-by: Michal Kazior <[email protected]>
---

Notes:
v3:
* dont reset shadow desc - its used for cleanup in hif_stop()

drivers/net/wireless/ath/ath10k/ce.c | 7 +++++++
drivers/net/wireless/ath/ath10k/pci.c | 3 +++
2 files changed, 10 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index 878e1ec..a156e6e 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -558,6 +558,7 @@ int ath10k_ce_revoke_recv_next(struct ath10k_ce_pipe *ce_state,

/* sanity */
dest_ring->per_transfer_context[sw_index] = NULL;
+ desc->nbytes = 0;

/* Update sw_index */
sw_index = CE_RING_IDX_INCR(nentries_mask, sw_index);
@@ -835,6 +836,9 @@ static int ath10k_ce_init_src_ring(struct ath10k *ar,

nentries = roundup_pow_of_two(attr->src_nentries);

+ memset(src_ring->base_addr_owner_space, 0,
+ nentries * sizeof(struct ce_desc));
+
src_ring->sw_index = ath10k_ce_src_ring_read_index_get(ar, ctrl_addr);
src_ring->sw_index &= src_ring->nentries_mask;
src_ring->hw_index = src_ring->sw_index;
@@ -869,6 +873,9 @@ static int ath10k_ce_init_dest_ring(struct ath10k *ar,

nentries = roundup_pow_of_two(attr->dest_nentries);

+ memset(dest_ring->base_addr_owner_space, 0,
+ nentries * sizeof(struct ce_desc));
+
dest_ring->sw_index = ath10k_ce_dest_ring_read_index_get(ar, ctrl_addr);
dest_ring->sw_index &= dest_ring->nentries_mask;
dest_ring->write_index =
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 63f374ed..f5e426e 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -1442,6 +1442,9 @@ static void ath10k_pci_bmi_recv_data(struct ath10k_ce_pipe *ce_state)
&nbytes, &transfer_id, &flags))
return;

+ if (WARN_ON_ONCE(!xfer))
+ return;
+
if (!xfer->wait_for_resp) {
ath10k_warn(ar, "unexpected: BMI data received; ignoring\n");
return;
--
1.8.5.3


2014-10-31 00:33:50

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v3 0/3] ath10k: speed up recovery

Michal Kazior <[email protected]> writes:

> Patch #1 fixes a bug I've found while testing
> patches #2 and #3. It's safe to cherry-pick it in
> case patches #2 and/or #3 aren't accepted or need
> a re-spin.
>
> I've mainly used patch #2 to test reset and
> recovery. It's pretty handy for (stress-)testing.
>
> Patch #3 should improve recovery speed in some
> cases. Notably when there's a long chain of WMI
> commands involved which take a painfully long time
> to timeout/complete if firmware crashes mid-way.
>
> Note: Patches #2 and #3 depend on `ath10k: pci
> related fixes 2014-10-09`. Without the patchset
> patches #2 and #3 must not be applied. Patch #1 is
> fine though.
>
> v3:
> * skip shadow ring cleanup in #1
>
> Michal Kazior (3):
> ath10k: fix possible bmi crash
> ath10k: expose hw restart via debugfs
> ath10k: speed up hw recovery

Thanks, all three patches applied.

--
Kalle Valo