2014-02-17 09:37:52

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 0/7] ath10k: performance improvements

Hi,

These patches aim at reducing host CPU load and
thus improve performance on low-end systems.

In my setup I get a relative improvement of
100mbps for both UDP Tx and Rx.

Tx ----->
laptop:eth---eth:AP135:ath10k---ath10k:laptop
<----- Rx


Michal Kazior (7):
ath10k: remove DMA mapping wrappers
ath10k: remove is_aborted from skb_cb
ath10k: replace send_head() with tx_sg()
ath10k: bypass htc for htt tx path
ath10k: batch htt tx/rx completions
ath10k: remove pci completion list
ath10k: minimize coherent dma accesses

drivers/net/wireless/ath/ath10k/ce.c | 16 +-
drivers/net/wireless/ath/ath10k/ce.h | 9 +-
drivers/net/wireless/ath/ath10k/core.h | 33 +--
drivers/net/wireless/ath/ath10k/hif.h | 25 +-
drivers/net/wireless/ath/ath10k/htc.c | 25 +-
drivers/net/wireless/ath/ath10k/htt.h | 16 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 152 +++++++-----
drivers/net/wireless/ath/ath10k/htt_tx.c | 207 ++++++++--------
drivers/net/wireless/ath/ath10k/mac.c | 4 +-
drivers/net/wireless/ath/ath10k/pci.c | 389 +++++++------------------------
drivers/net/wireless/ath/ath10k/pci.h | 28 ---
drivers/net/wireless/ath/ath10k/txrx.c | 15 +-
drivers/net/wireless/ath/ath10k/wmi.c | 17 +-
13 files changed, 382 insertions(+), 554 deletions(-)

--
1.8.5.3



2014-02-28 09:06:22

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] ath10k: bypass htc for htt tx path

Michal Kazior <[email protected]> writes:

> Going through full htc tx path for htt tx is a
> waste of resources. By skipping it it's possible
> to easily submit scatter-gather to the pci hif for
> reduced host cpu load and improved performance.
>
> The new approach uses dma pool to store the
> following metadata for each tx request:
> * msdu fragment list
> * htc header
> * htt tx command
>
> The htt tx command contains a msdu prefetch.
> Instead of copying it original mapped msdu address
> is used to submit a second scatter-gather item to
> hif to make a complete htt tx command.
>
> The htt tx command itself hands over dma mapped
> pointers to msdus and completion of the command
> itself doesn't mean the frame has been sent and
> can be unmapped/freed. This is why htc tx
> completion is skipped for htt tx as all tx related
> resources are freed upon htt tx completion
> indication event (which also implicitly means htt
> tx command itself was completed).
>
> Since now each htt tx request effectively consists
> of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
> is updated to allow maximum of
> TARGET_10X_NUM_MSDU_DESC msdus being queued. This
> keeps the tx path resource management simple.
>
> Signed-off-by: Michal Kazior <[email protected]>
> ---
> v2:
> * improve commit log
> * improve comment in code
> * fix sparse/checkpatch/buildbot warnings

[...]

> --- a/drivers/net/wireless/ath/ath10k/htc.c
> +++ b/drivers/net/wireless/ath/ath10k/htc.c
> @@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
> struct ath10k_htc *htc = &ar->htc;
> struct ath10k_htc_ep *ep = &htc->endpoint[eid];
>
> - if (!skb) {
> - ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
> + if (WARN_ON(!skb))
> return 0;
> - }

WARN_ON() is a bit dangerous here as it might cause excessive spamming.
Why did you want to change this? I think either ath10k_warn() or
WARN_ON_ONCE() would be safer, but not sure which one to use.

--
Kalle Valo

2014-02-27 07:25:03

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 2/8] ath10k: remove is_aborted from skb_cb

The flag wasn't used anymore. No need to keep it.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 1 -
drivers/net/wireless/ath/ath10k/pci.c | 20 --------------------
2 files changed, 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 082fa77..7cf022d 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_aborted;
u8 vdev_id;

struct {
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 34f0910..d973975 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -63,7 +63,6 @@ static int ath10k_pci_post_rx(struct ath10k *ar);
static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
int num);
static void ath10k_pci_rx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info);
-static void ath10k_pci_stop_ce(struct ath10k *ar);
static int ath10k_pci_cold_reset(struct ath10k *ar);
static int ath10k_pci_warm_reset(struct ath10k *ar);
static int ath10k_pci_wait_for_target_init(struct ath10k *ar);
@@ -993,22 +992,6 @@ static void ath10k_pci_kill_tasklet(struct ath10k *ar)
tasklet_kill(&ar_pci->pipe_info[i].intr);
}

-static void ath10k_pci_stop_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_compl *compl;
- struct sk_buff *skb;
-
- /* Mark pending completions as aborted, so that upper layers free up
- * their associated resources */
- spin_lock_bh(&ar_pci->compl_lock);
- list_for_each_entry(compl, &ar_pci->compl_process, list) {
- skb = compl->skb;
- ATH10K_SKB_CB(skb)->is_aborted = true;
- }
- spin_unlock_bh(&ar_pci->compl_lock);
-}
-
static void ath10k_pci_cleanup_ce(struct ath10k *ar)
{
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -1339,7 +1322,6 @@ err_stop:
ath10k_ce_disable_interrupts(ar);
ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);
ath10k_pci_process_ce(ar);
err_free_compl:
ath10k_pci_cleanup_ce(ar);
@@ -1424,7 +1406,6 @@ static void ath10k_pci_tx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info)
continue;
}

- ATH10K_SKB_CB(netbuf)->is_aborted = true;
ar_pci->msg_callbacks_current.tx_completion(ar,
netbuf,
id);
@@ -1482,7 +1463,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)

ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);

ret = ath10k_pci_request_early_irq(ar);
if (ret)
--
1.8.5.3


2014-02-28 09:00:49

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] ath10k: batch htt tx/rx completions

Michal Kazior <[email protected]> writes:

> HTT Rx endpoint processes both frame rx
> indications and frame tx completion indications.
>
> Those completions typically come in batches and
> may be mixed so it makes sense to defer processing
> hoping to get a bunch of them and take advantage
> of hot caches.
>
> Signed-off-by: Michal Kazior <[email protected]>

[...]

> +void ath10k_htt_txrx_compl_task(unsigned long ptr)
> +{
> + struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
> + struct htt_resp *resp;
> + struct sk_buff *skb;

I think this can be a static function. Is it ok if I change this patch
with the diff below?

--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1360,6 +1360,5 @@ int ath10k_htt_tx_alloc_msdu_id(struct ath10k_htt *htt);
void ath10k_htt_tx_free_msdu_id(struct ath10k_htt *htt, u16 msdu_id);
int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *);
int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *);
-void ath10k_htt_txrx_compl_task(unsigned long ptr);

#endif
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index e362358240e8..3a02e654d871 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -43,7 +43,7 @@


static int ath10k_htt_rx_get_csum_state(struct sk_buff *skb);
-
+static void ath10k_htt_txrx_compl_task(unsigned long ptr);

static int ath10k_htt_rx_ring_size(struct ath10k_htt *htt)
{
@@ -1347,7 +1347,7 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
dev_kfree_skb_any(skb);
}

-void ath10k_htt_txrx_compl_task(unsigned long ptr)
+static void ath10k_htt_txrx_compl_task(unsigned long ptr)
{
struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
struct htt_resp *resp;


--
Kalle Valo

2014-02-17 09:38:04

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 7/7] ath10k: minimize coherent dma accesses

It doesn't make much sense to calculate the ring
size fill count because it already is memoized in
a separate variable.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/htt_rx.c | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index d3051e3..6fe5ecc 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -225,12 +225,6 @@ static void ath10k_htt_rx_ring_refill_retry(unsigned long arg)
ath10k_htt_rx_msdu_buff_replenish(htt);
}

-static unsigned ath10k_htt_rx_ring_elems(struct ath10k_htt *htt)
-{
- return (__le32_to_cpu(*htt->rx_ring.alloc_idx.vaddr) -
- htt->rx_ring.sw_rd_idx.msdu_payld) & htt->rx_ring.size_mask;
-}
-
void ath10k_htt_rx_detach(struct ath10k_htt *htt)
{
int sw_rd_idx = htt->rx_ring.sw_rd_idx.msdu_payld;
@@ -276,8 +270,11 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)

lockdep_assert_held(&htt->rx_ring.lock);

- if (ath10k_htt_rx_ring_elems(htt) == 0)
- ath10k_warn("htt rx ring is empty!\n");
+ if (htt->rx_ring.fill_cnt == 0) {
+ ath10k_warn("tried to pop sk_buff from an empty rx ring\n");
+ spin_unlock_bh(&htt->rx_ring.lock);
+ return NULL;
+ }

idx = htt->rx_ring.sw_rd_idx.msdu_payld;
msdu = htt->rx_ring.netbufs_ring[idx];
@@ -312,9 +309,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,

lockdep_assert_held(&htt->rx_ring.lock);

- if (ath10k_htt_rx_ring_elems(htt) == 0)
- ath10k_warn("htt rx ring is empty!\n");
-
if (htt->rx_confused) {
ath10k_warn("htt is confused. refusing rx\n");
return 0;
--
1.8.5.3


2014-02-27 07:25:08

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 7/8] ath10k: remove pci completion list

One of the premises was to guarantee serialized
completion handling for upper layers
(HTC/WMI/HTT). Since quite some time now it is no
longer necessary.

The other premise was to batch up tx/rx
completions to take advantage of hot caches.
However frame tx/rx completion indications come in
on a single pipe already so they are already
batched up. More meaningful batching is done in
HTT itself.

This means PCI completion is no longer necessary
to keep around. It just wastes memory, cycles and
SLOC.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 275 +++-------------------------------
drivers/net/wireless/ath/ath10k/pci.h | 28 ----
2 files changed, 24 insertions(+), 279 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 2305d58..9d242d8 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -58,7 +58,6 @@ static DEFINE_PCI_DEVICE_TABLE(ath10k_pci_id_table) = {
static int ath10k_pci_diag_read_access(struct ath10k *ar, u32 address,
u32 *data);

-static void ath10k_pci_process_ce(struct ath10k *ar);
static int ath10k_pci_post_rx(struct ath10k *ar);
static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
int num);
@@ -73,7 +72,6 @@ static void ath10k_pci_free_irq(struct ath10k *ar);
static int ath10k_pci_bmi_wait(struct ath10k_ce_pipe *tx_pipe,
struct ath10k_ce_pipe *rx_pipe,
struct bmi_xfer *xfer);
-static void ath10k_pci_cleanup_ce(struct ath10k *ar);

static const struct ce_attr host_ce_config_wlan[] = {
/* CE0: host->target HTC control and raw streams */
@@ -678,34 +676,12 @@ void ath10k_do_pci_sleep(struct ath10k *ar)
}
}

-/*
- * FIXME: Handle OOM properly.
- */
-static inline
-struct ath10k_pci_compl *get_free_compl(struct ath10k_pci_pipe *pipe_info)
-{
- struct ath10k_pci_compl *compl = NULL;
-
- spin_lock_bh(&pipe_info->pipe_lock);
- if (list_empty(&pipe_info->compl_free)) {
- ath10k_warn("Completion buffers are full\n");
- goto exit;
- }
- compl = list_first_entry(&pipe_info->compl_free,
- struct ath10k_pci_compl, list);
- list_del(&compl->list);
-exit:
- spin_unlock_bh(&pipe_info->pipe_lock);
- return compl;
-}
-
/* Called by lower (CE) layer when a send to Target completes. */
static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
{
struct ath10k *ar = ce_state->ar;
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_pipe *pipe_info = &ar_pci->pipe_info[ce_state->id];
- struct ath10k_pci_compl *compl;
+ struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
void *transfer_context;
u32 ce_data;
unsigned int nbytes;
@@ -718,27 +694,8 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
if (transfer_context == NULL)
continue;

- compl = get_free_compl(pipe_info);
- if (!compl)
- break;
-
- compl->state = ATH10K_PCI_COMPL_SEND;
- compl->ce_state = ce_state;
- compl->pipe_info = pipe_info;
- compl->skb = transfer_context;
- compl->nbytes = nbytes;
- compl->transfer_id = transfer_id;
- compl->flags = 0;
-
- /*
- * Add the completion to the processing queue.
- */
- spin_lock_bh(&ar_pci->compl_lock);
- list_add_tail(&compl->list, &ar_pci->compl_process);
- spin_unlock_bh(&ar_pci->compl_lock);
+ cb->tx_completion(ar, transfer_context, transfer_id);
}
-
- ath10k_pci_process_ce(ar);
}

/* Called by lower (CE) layer when data is received from the Target. */
@@ -747,42 +704,40 @@ static void ath10k_pci_ce_recv_data(struct ath10k_ce_pipe *ce_state)
struct ath10k *ar = ce_state->ar;
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
struct ath10k_pci_pipe *pipe_info = &ar_pci->pipe_info[ce_state->id];
- struct ath10k_pci_compl *compl;
+ struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
struct sk_buff *skb;
void *transfer_context;
u32 ce_data;
- unsigned int nbytes;
+ unsigned int nbytes, max_nbytes;
unsigned int transfer_id;
unsigned int flags;
+ int err;

while (ath10k_ce_completed_recv_next(ce_state, &transfer_context,
&ce_data, &nbytes, &transfer_id,
&flags) == 0) {
- compl = get_free_compl(pipe_info);
- if (!compl)
- break;
-
- compl->state = ATH10K_PCI_COMPL_RECV;
- compl->ce_state = ce_state;
- compl->pipe_info = pipe_info;
- compl->skb = transfer_context;
- compl->nbytes = nbytes;
- compl->transfer_id = transfer_id;
- compl->flags = flags;
+ err = ath10k_pci_post_rx_pipe(pipe_info, 1);
+ if (unlikely(err)) {
+ /* FIXME: retry */
+ ath10k_warn("failed to replenish CE rx ring %d: %d\n",
+ pipe_info->pipe_num, err);
+ }

skb = transfer_context;
+ max_nbytes = skb->len + skb_tailroom(skb);
dma_unmap_single(ar->dev, ATH10K_SKB_CB(skb)->paddr,
- skb->len + skb_tailroom(skb),
- DMA_FROM_DEVICE);
- /*
- * Add the completion to the processing queue.
- */
- spin_lock_bh(&ar_pci->compl_lock);
- list_add_tail(&compl->list, &ar_pci->compl_process);
- spin_unlock_bh(&ar_pci->compl_lock);
- }
+ max_nbytes, DMA_FROM_DEVICE);

- ath10k_pci_process_ce(ar);
+ if (unlikely(max_nbytes < nbytes)) {
+ ath10k_warn("rxed more than expected (nbytes %d, max %d)",
+ nbytes, max_nbytes);
+ dev_kfree_skb_any(skb);
+ continue;
+ }
+
+ skb_put(skb, nbytes);
+ cb->rx_completion(ar, skb, pipe_info->pipe_num);
+ }
}

static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
@@ -931,52 +886,6 @@ static void ath10k_pci_hif_set_callbacks(struct ath10k *ar,
sizeof(ar_pci->msg_callbacks_current));
}

-static int ath10k_pci_alloc_compl(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- const struct ce_attr *attr;
- struct ath10k_pci_pipe *pipe_info;
- struct ath10k_pci_compl *compl;
- int i, pipe_num, completions;
-
- spin_lock_init(&ar_pci->compl_lock);
- INIT_LIST_HEAD(&ar_pci->compl_process);
-
- for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
- pipe_info = &ar_pci->pipe_info[pipe_num];
-
- spin_lock_init(&pipe_info->pipe_lock);
- INIT_LIST_HEAD(&pipe_info->compl_free);
-
- /* Handle Diagnostic CE specially */
- if (pipe_info->ce_hdl == ar_pci->ce_diag)
- continue;
-
- attr = &host_ce_config_wlan[pipe_num];
- completions = 0;
-
- if (attr->src_nentries)
- completions += attr->src_nentries;
-
- if (attr->dest_nentries)
- completions += attr->dest_nentries;
-
- for (i = 0; i < completions; i++) {
- compl = kmalloc(sizeof(*compl), GFP_KERNEL);
- if (!compl) {
- ath10k_warn("No memory for completion state\n");
- ath10k_pci_cleanup_ce(ar);
- return -ENOMEM;
- }
-
- compl->state = ATH10K_PCI_COMPL_FREE;
- list_add_tail(&compl->list, &pipe_info->compl_free);
- }
- }
-
- return 0;
-}
-
static int ath10k_pci_setup_ce_irq(struct ath10k *ar)
{
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -1021,131 +930,6 @@ static void ath10k_pci_kill_tasklet(struct ath10k *ar)
tasklet_kill(&ar_pci->pipe_info[i].intr);
}

-static void ath10k_pci_cleanup_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_compl *compl, *tmp;
- struct ath10k_pci_pipe *pipe_info;
- struct sk_buff *netbuf;
- int pipe_num;
-
- /* Free pending completions. */
- spin_lock_bh(&ar_pci->compl_lock);
- if (!list_empty(&ar_pci->compl_process))
- ath10k_warn("pending completions still present! possible memory leaks.\n");
-
- list_for_each_entry_safe(compl, tmp, &ar_pci->compl_process, list) {
- list_del(&compl->list);
- netbuf = compl->skb;
- dev_kfree_skb_any(netbuf);
- kfree(compl);
- }
- spin_unlock_bh(&ar_pci->compl_lock);
-
- /* Free unused completions for each pipe. */
- for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
- pipe_info = &ar_pci->pipe_info[pipe_num];
-
- spin_lock_bh(&pipe_info->pipe_lock);
- list_for_each_entry_safe(compl, tmp,
- &pipe_info->compl_free, list) {
- list_del(&compl->list);
- kfree(compl);
- }
- spin_unlock_bh(&pipe_info->pipe_lock);
- }
-}
-
-static void ath10k_pci_process_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ar->hif.priv;
- struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
- struct ath10k_pci_compl *compl;
- struct sk_buff *skb;
- unsigned int nbytes;
- int ret, send_done = 0;
-
- /* Upper layers aren't ready to handle tx/rx completions in parallel so
- * we must serialize all completion processing. */
-
- spin_lock_bh(&ar_pci->compl_lock);
- if (ar_pci->compl_processing) {
- spin_unlock_bh(&ar_pci->compl_lock);
- return;
- }
- ar_pci->compl_processing = true;
- spin_unlock_bh(&ar_pci->compl_lock);
-
- for (;;) {
- spin_lock_bh(&ar_pci->compl_lock);
- if (list_empty(&ar_pci->compl_process)) {
- spin_unlock_bh(&ar_pci->compl_lock);
- break;
- }
- compl = list_first_entry(&ar_pci->compl_process,
- struct ath10k_pci_compl, list);
- list_del(&compl->list);
- spin_unlock_bh(&ar_pci->compl_lock);
-
- switch (compl->state) {
- case ATH10K_PCI_COMPL_SEND:
- cb->tx_completion(ar,
- compl->skb,
- compl->transfer_id);
- send_done = 1;
- break;
- case ATH10K_PCI_COMPL_RECV:
- ret = ath10k_pci_post_rx_pipe(compl->pipe_info, 1);
- if (ret) {
- ath10k_warn("failed to post RX buffer for pipe %d: %d\n",
- compl->pipe_info->pipe_num, ret);
- break;
- }
-
- skb = compl->skb;
- nbytes = compl->nbytes;
-
- ath10k_dbg(ATH10K_DBG_PCI,
- "ath10k_pci_ce_recv_data netbuf=%p nbytes=%d\n",
- skb, nbytes);
- ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL,
- "ath10k rx: ", skb->data, nbytes);
-
- if (skb->len + skb_tailroom(skb) >= nbytes) {
- skb_trim(skb, 0);
- skb_put(skb, nbytes);
- cb->rx_completion(ar, skb,
- compl->pipe_info->pipe_num);
- } else {
- ath10k_warn("rxed more than expected (nbytes %d, max %d)",
- nbytes,
- skb->len + skb_tailroom(skb));
- }
- break;
- case ATH10K_PCI_COMPL_FREE:
- ath10k_warn("free completion cannot be processed\n");
- break;
- default:
- ath10k_warn("invalid completion state (%d)\n",
- compl->state);
- break;
- }
-
- compl->state = ATH10K_PCI_COMPL_FREE;
-
- /*
- * Add completion back to the pipe's free list.
- */
- spin_lock_bh(&compl->pipe_info->pipe_lock);
- list_add_tail(&compl->list, &compl->pipe_info->compl_free);
- spin_unlock_bh(&compl->pipe_info->pipe_lock);
- }
-
- spin_lock_bh(&ar_pci->compl_lock);
- ar_pci->compl_processing = false;
- spin_unlock_bh(&ar_pci->compl_lock);
-}
-
/* TODO - temporary mapping while we have too few CE's */
static int ath10k_pci_hif_map_service_to_pipe(struct ath10k *ar,
u16 service_id, u8 *ul_pipe,
@@ -1317,17 +1101,11 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
ath10k_pci_free_early_irq(ar);
ath10k_pci_kill_tasklet(ar);

- ret = ath10k_pci_alloc_compl(ar);
- if (ret) {
- ath10k_warn("failed to allocate CE completions: %d\n", ret);
- goto err_early_irq;
- }
-
ret = ath10k_pci_request_irq(ar);
if (ret) {
ath10k_warn("failed to post RX buffers for all pipes: %d\n",
ret);
- goto err_free_compl;
+ goto err_early_irq;
}

ret = ath10k_pci_setup_ce_irq(ar);
@@ -1351,9 +1129,6 @@ err_stop:
ath10k_ce_disable_interrupts(ar);
ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_process_ce(ar);
-err_free_compl:
- ath10k_pci_cleanup_ce(ar);
err_early_irq:
/* Though there should be no interrupts (device was reset)
* power_down() expects the early IRQ to be installed as per the
@@ -1494,8 +1269,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
* not DMA nor interrupt. We process the leftovers and then free
* everything else up. */

- ath10k_pci_process_ce(ar);
- ath10k_pci_cleanup_ce(ar);
ath10k_pci_buffer_cleanup(ar);

/* Make the sure the device won't access any structures on the host by
diff --git a/drivers/net/wireless/ath/ath10k/pci.h b/drivers/net/wireless/ath/ath10k/pci.h
index a4f3203..b43fdb4 100644
--- a/drivers/net/wireless/ath/ath10k/pci.h
+++ b/drivers/net/wireless/ath/ath10k/pci.h
@@ -43,23 +43,6 @@ struct bmi_xfer {
u32 resp_len;
};

-enum ath10k_pci_compl_state {
- ATH10K_PCI_COMPL_FREE = 0,
- ATH10K_PCI_COMPL_SEND,
- ATH10K_PCI_COMPL_RECV,
-};
-
-struct ath10k_pci_compl {
- struct list_head list;
- enum ath10k_pci_compl_state state;
- struct ath10k_ce_pipe *ce_state;
- struct ath10k_pci_pipe *pipe_info;
- struct sk_buff *skb;
- unsigned int nbytes;
- unsigned int transfer_id;
- unsigned int flags;
-};
-
/*
* PCI-specific Target state
*
@@ -175,9 +158,6 @@ struct ath10k_pci_pipe {
/* protects compl_free and num_send_allowed */
spinlock_t pipe_lock;

- /* List of free CE completion slots */
- struct list_head compl_free;
-
struct ath10k_pci *ar_pci;
struct tasklet_struct intr;
};
@@ -205,14 +185,6 @@ struct ath10k_pci {
atomic_t keep_awake_count;
bool verified_awake;

- /* List of CE completions to be processed */
- struct list_head compl_process;
-
- /* protects compl_processing and compl_process */
- spinlock_t compl_lock;
-
- bool compl_processing;
-
struct ath10k_pci_pipe pipe_info[CE_COUNT_MAX];

struct ath10k_hif_cb msg_callbacks_current;
--
1.8.5.3


2014-02-19 15:10:25

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 5/7] ath10k: batch htt tx/rx completions

Michal Kazior <[email protected]> writes:

> HTT Rx endpoint processes both frame rx
> indications and frame tx completion indications.
>
> Those completions typically come in batches and
> may be mixed so it makes sense to defer processing
> hoping to get a bunch of them and take advantage
> of hot caches.
>
> Signed-off-by: Michal Kazior <[email protected]>

[...]

> @@ -270,7 +274,7 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
> int idx;
> struct sk_buff *msdu;
>
> - spin_lock_bh(&htt->rx_ring.lock);
> + lockdep_assert_held(&htt->rx_ring.lock);

There are some locking changes which I think would be better to have in
a separate patch.


> case HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION: {
> + struct htt_resp *resp = (struct htt_resp *)skb->data;
> struct htt_tx_done tx_done = {};
> int status = __le32_to_cpu(resp->mgmt_tx_completion.status);
>
> - tx_done.msdu_id =
> - __le32_to_cpu(resp->mgmt_tx_completion.desc_id);
> + tx_done.msdu_id = __le32_to_cpu(resp->mgmt_tx_completion.desc_id);

I don't see any changes here.

--
Kalle Valo

2014-02-28 09:28:28

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] ath10k: bypass htc for htt tx path

Michal Kazior <[email protected]> writes:

> On 28 February 2014 10:06, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> --- a/drivers/net/wireless/ath/ath10k/htc.c
>>> +++ b/drivers/net/wireless/ath/ath10k/htc.c
>>> @@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
>>> struct ath10k_htc *htc = &ar->htc;
>>> struct ath10k_htc_ep *ep = &htc->endpoint[eid];
>>>
>>> - if (!skb) {
>>> - ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
>>> + if (WARN_ON(!skb))
>>> return 0;
>>> - }
>>
>> WARN_ON() is a bit dangerous here as it might cause excessive spamming.
>> Why did you want to change this? I think either ath10k_warn() or
>> WARN_ON_ONCE() would be safer, but not sure which one to use.
>
> After the scatter-gather patch no NULL skb should be ever passed to tx
> completion handler as those are ignored by hif/pci.

Sure. But that's just theory, in practise all sorts bugs can always
happen :)

> Perhaps the hunk should be moved from this patch to the scatter-gather
> one.

Nah, I don't think that's necessary.

> Perhaps WARN_ON() is a bit over the top here, but since this is now
> more of a logic issue rather than runtime issue I decided to change it
> from ath10k_warn to WARN_ON(). It's probably still a good idea to make
> it _ONCE generally, although if you actually get skbuff it's already
> too late or it should be screaming loudly because someone must've
> changed the code in an incorrect/incomplete way.

So I change it to WARN_ON_ONCE(), ok?

--
Kalle Valo

2014-02-17 09:37:59

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 4/7] ath10k: bypass htc for htt tx path

Going through full HTC tx path for HTT tx is a
waste of resources. By skipping it it's possible
to easily submit scatter-gather to the PCI HIF for
reduced host CPU load and improved performance.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/ce.c | 4 +-
drivers/net/wireless/ath/ath10k/ce.h | 2 +-
drivers/net/wireless/ath/ath10k/core.h | 5 +-
drivers/net/wireless/ath/ath10k/htc.c | 4 +-
drivers/net/wireless/ath/ath10k/htt.h | 8 ++
drivers/net/wireless/ath/ath10k/htt_tx.c | 190 ++++++++++++++++---------------
drivers/net/wireless/ath/ath10k/pci.c | 12 +-
drivers/net/wireless/ath/ath10k/txrx.c | 6 +-
8 files changed, 122 insertions(+), 109 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index a0b1a8c..a79499c 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -1067,9 +1067,9 @@ struct ath10k_ce_pipe *ath10k_ce_init(struct ath10k *ar,
*
* For the lack of a better place do the check here.
*/
- BUILD_BUG_ON(TARGET_NUM_MSDU_DESC >
+ BUILD_BUG_ON(2*TARGET_NUM_MSDU_DESC >
(CE_HTT_H2T_MSG_SRC_NENTRIES - 1));
- BUILD_BUG_ON(TARGET_10X_NUM_MSDU_DESC >
+ BUILD_BUG_ON(2*TARGET_10X_NUM_MSDU_DESC >
(CE_HTT_H2T_MSG_SRC_NENTRIES - 1));

ret = ath10k_pci_wake(ar);
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 322e929..8eb7f99 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -23,7 +23,7 @@

/* Maximum number of Copy Engine's supported */
#define CE_COUNT_MAX 8
-#define CE_HTT_H2T_MSG_SRC_NENTRIES 2048
+#define CE_HTT_H2T_MSG_SRC_NENTRIES 4096

/* Descriptor rings must be aligned to this boundary */
#define CE_DESC_RING_ALIGN 8
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 7cf022d..f34019f 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -67,9 +67,8 @@ struct ath10k_skb_cb {
struct {
u8 tid;
bool is_offchan;
-
- u8 frag_len;
- u8 pad_len;
+ struct ath10k_htt_txbuf *txbuf;
+ u32 txbuf_paddr;
} __packed htt;

struct {
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 64ab8d6..1491ce7 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
struct ath10k_htc *htc = &ar->htc;
struct ath10k_htc_ep *ep = &htc->endpoint[eid];

- if (!skb) {
- ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
+ if (WARN_ON(!skb))
return 0;
- }

ath10k_htc_notify_tx_completion(ep, skb);
/* the skb now belongs to the completion handler */
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 02c009d..782a449 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1188,6 +1188,13 @@ struct htt_rx_info {
bool mic_err;
};

+struct ath10k_htt_txbuf {
+ struct htt_data_tx_desc_frag frags[2];
+ struct ath10k_htc_hdr htc_hdr;
+ struct htt_cmd_hdr cmd_hdr;
+ struct htt_data_tx_desc cmd_tx;
+} __packed;
+
struct ath10k_htt {
struct ath10k *ar;
enum ath10k_htc_ep_id eid;
@@ -1269,6 +1276,7 @@ struct ath10k_htt {
struct sk_buff **pending_tx;
unsigned long *used_msdu_ids; /* bitmap */
wait_queue_head_t empty_tx_wq;
+ struct dma_pool *tx_pool;

/* set if host-fw communication goes haywire
* used to avoid further failures */
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index f5960c5..0e24f51 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -109,6 +109,14 @@ int ath10k_htt_tx_attach(struct ath10k_htt *htt)
return -ENOMEM;
}

+ htt->tx_pool = dma_pool_create("ath10k htt tx pool", htt->ar->dev,
+ sizeof(struct ath10k_htt_txbuf), 4, 0);
+ if (!htt->tx_pool) {
+ kfree(htt->used_msdu_ids);
+ kfree(htt->pending_tx);
+ return -ENOMEM;
+ }
+
return 0;
}

@@ -139,6 +147,7 @@ void ath10k_htt_tx_detach(struct ath10k_htt *htt)
ath10k_htt_tx_cleanup_pending(htt);
kfree(htt->pending_tx);
kfree(htt->used_msdu_ids);
+ dma_pool_destroy(htt->tx_pool);
return;
}

@@ -350,8 +359,7 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
memcpy(cmd->mgmt_tx.hdr, msdu->data,
min_t(int, msdu->len, HTT_MGMT_FRM_HDR_DOWNLOAD_LEN));

- skb_cb->htt.frag_len = 0;
- skb_cb->htt.pad_len = 0;
+ skb_cb->htt.txbuf = NULL;

res = ath10k_htc_send(&htt->ar->htc, htt->eid, txdesc);
if (res)
@@ -377,19 +385,18 @@ err:
int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
{
struct device *dev = htt->ar->dev;
- struct htt_cmd *cmd;
- struct htt_data_tx_desc_frag *tx_frags;
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)msdu->data;
struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(msdu);
- struct sk_buff *txdesc = NULL;
- bool use_frags;
- u8 vdev_id = ATH10K_SKB_CB(msdu)->vdev_id;
- u8 tid;
- int prefetch_len, desc_len;
- int msdu_id = -1;
+ struct ath10k_hif_sg_item sg_items[2];
+ u8 vdev_id = skb_cb->vdev_id;
+ u8 tid = skb_cb->htt.tid;
+ int prefetch_len;
int res;
- u8 flags0;
- u16 flags1;
+ u8 flags0 = 0;
+ u16 msdu_id, flags1 = 0;
+ dma_addr_t paddr;
+ u32 frags_paddr;
+ bool use_frags;

res = ath10k_htt_tx_inc_pending(htt);
if (res)
@@ -408,105 +415,110 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
prefetch_len = min(htt->prefetch_len, msdu->len);
prefetch_len = roundup(prefetch_len, 4);

- desc_len = sizeof(cmd->hdr) + sizeof(cmd->data_tx) + prefetch_len;
-
- txdesc = ath10k_htc_alloc_skb(desc_len);
- if (!txdesc) {
- res = -ENOMEM;
- goto err_free_msdu_id;
- }
-
/* Since HTT 3.0 there is no separate mgmt tx command. However in case
* of mgmt tx using TX_FRM there is not tx fragment list. Instead of tx
* fragment list host driver specifies directly frame pointer. */
use_frags = htt->target_version_major < 3 ||
!ieee80211_is_mgmt(hdr->frame_control);

- if (!IS_ALIGNED((unsigned long)txdesc->data, 4)) {
- ath10k_warn("htt alignment check failed. dropping packet.\n");
- res = -EIO;
- goto err_free_txdesc;
- }
-
- if (use_frags) {
- skb_cb->htt.frag_len = sizeof(*tx_frags) * 2;
- skb_cb->htt.pad_len = (unsigned long)msdu->data -
- round_down((unsigned long)msdu->data, 4);
-
- skb_push(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
- } else {
- skb_cb->htt.frag_len = 0;
- skb_cb->htt.pad_len = 0;
- }
+ skb_cb->htt.txbuf = dma_pool_alloc(htt->tx_pool, GFP_ATOMIC,
+ &paddr);
+ if (!skb_cb->htt.txbuf)
+ goto err_free_msdu_id;
+ skb_cb->htt.txbuf_paddr = paddr;

skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
DMA_TO_DEVICE);
res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
- goto err_pull_txfrag;
-
- if (use_frags) {
- dma_sync_single_for_cpu(dev, skb_cb->paddr, msdu->len,
- DMA_TO_DEVICE);
-
- /* tx fragment list must be terminated with zero-entry */
- tx_frags = (struct htt_data_tx_desc_frag *)msdu->data;
- tx_frags[0].paddr = __cpu_to_le32(skb_cb->paddr +
- skb_cb->htt.frag_len +
- skb_cb->htt.pad_len);
- tx_frags[0].len = __cpu_to_le32(msdu->len -
- skb_cb->htt.frag_len -
- skb_cb->htt.pad_len);
- tx_frags[1].paddr = __cpu_to_le32(0);
- tx_frags[1].len = __cpu_to_le32(0);
-
- dma_sync_single_for_device(dev, skb_cb->paddr, msdu->len,
- DMA_TO_DEVICE);
- }
+ goto err_free_txbuf;

- ath10k_dbg(ATH10K_DBG_HTT, "tx-msdu 0x%llx\n",
- (unsigned long long) ATH10K_SKB_CB(msdu)->paddr);
- ath10k_dbg_dump(ATH10K_DBG_HTT_DUMP, NULL, "tx-msdu: ",
- msdu->data, msdu->len);
+ if (likely(use_frags)) {
+ skb_cb->htt.txbuf->frags[0].paddr = __cpu_to_le32(skb_cb->paddr);
+ skb_cb->htt.txbuf->frags[0].len = __cpu_to_le32(msdu->len);
+ skb_cb->htt.txbuf->frags[1].paddr = 0;
+ skb_cb->htt.txbuf->frags[1].len = 0;

- skb_put(txdesc, desc_len);
- cmd = (struct htt_cmd *)txdesc->data;
+ flags0 |= SM(ATH10K_HW_TXRX_NATIVE_WIFI,
+ HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);

- tid = ATH10K_SKB_CB(msdu)->htt.tid;
+ frags_paddr = skb_cb->htt.txbuf_paddr;
+ } else {
+ flags0 |= SM(ATH10K_HW_TXRX_MGMT,
+ HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);

- ath10k_dbg(ATH10K_DBG_HTT, "htt data tx using tid %hhu\n", tid);
+ frags_paddr = skb_cb->paddr;
+ }
+
+ /* Normally all commands go through HTC which manages tx credits for
+ * each endpoint and notifies when tx is completed.
+ *
+ * HTT endpoint is creditless so there's no need to care about HTC
+ * flags. In that case it is trivial to fill the HTC header here.
+ *
+ * MSDU transmission is considered completed upon HTT event. Receiving
+ * HTT tx completion event implicitly means TX_FRM command itself has
+ * been HTC tx-completed thus it is pointless to care about the HTC tx
+ * completion for TX_FRM at all. That's why transfer_context for all
+ * partials are NULL to avoid HTC tx completion handler from being
+ * unnecessarily called.
+ *
+ * There is simply no point in pushing HTT TX_FRM through HTC tx path
+ * as it's a waste of resources. By bypassing HTC it is possible to
+ * avoid extra memory allocations, compress data structures and thus
+ * improve performance. */
+
+ skb_cb->htt.txbuf->htc_hdr.eid = htt->eid;
+ skb_cb->htt.txbuf->htc_hdr.len = __cpu_to_le16(
+ sizeof(skb_cb->htt.txbuf->cmd_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_tx) +
+ prefetch_len);
+ skb_cb->htt.txbuf->htc_hdr.flags = 0;

- flags0 = 0;
if (!ieee80211_has_protected(hdr->frame_control))
flags0 |= HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT;
- flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;

- if (use_frags)
- flags0 |= SM(ATH10K_HW_TXRX_NATIVE_WIFI,
- HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);
- else
- flags0 |= SM(ATH10K_HW_TXRX_MGMT,
- HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);
+ flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;

- flags1 = 0;
flags1 |= SM((u16)vdev_id, HTT_DATA_TX_DESC_FLAGS1_VDEV_ID);
flags1 |= SM((u16)tid, HTT_DATA_TX_DESC_FLAGS1_EXT_TID);
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L3_OFFLOAD;
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L4_OFFLOAD;

- cmd->hdr.msg_type = HTT_H2T_MSG_TYPE_TX_FRM;
- cmd->data_tx.flags0 = flags0;
- cmd->data_tx.flags1 = __cpu_to_le16(flags1);
- cmd->data_tx.len = __cpu_to_le16(msdu->len -
- skb_cb->htt.frag_len -
- skb_cb->htt.pad_len);
- cmd->data_tx.id = __cpu_to_le16(msdu_id);
- cmd->data_tx.frags_paddr = __cpu_to_le32(skb_cb->paddr);
- cmd->data_tx.peerid = __cpu_to_le32(HTT_INVALID_PEERID);
-
- memcpy(cmd->data_tx.prefetch, hdr, prefetch_len);
+ skb_cb->htt.txbuf->cmd_hdr.msg_type = HTT_H2T_MSG_TYPE_TX_FRM;
+ skb_cb->htt.txbuf->cmd_tx.flags0 = flags0;
+ skb_cb->htt.txbuf->cmd_tx.flags1 = __cpu_to_le16(flags1);
+ skb_cb->htt.txbuf->cmd_tx.len = __cpu_to_le16(msdu->len);
+ skb_cb->htt.txbuf->cmd_tx.id = __cpu_to_le16(msdu_id);
+ skb_cb->htt.txbuf->cmd_tx.frags_paddr = __cpu_to_le32(frags_paddr);
+ skb_cb->htt.txbuf->cmd_tx.peerid = __cpu_to_le32(HTT_INVALID_PEERID);
+
+ ath10k_dbg(ATH10K_DBG_HTT,
+ "htt tx flags0 %hhu flags1 %hu len %d id %hu frags_paddr %08x, msdu_paddr %08x vdev %hhu tid %hhu\n",
+ flags0, flags1, msdu->len, msdu_id, frags_paddr,
+ (u32)skb_cb->paddr, vdev_id, tid);
+ ath10k_dbg_dump(ATH10K_DBG_HTT_DUMP, NULL, "htt tx msdu: ",
+ msdu->data, msdu->len);

- res = ath10k_htc_send(&htt->ar->htc, htt->eid, txdesc);
+ sg_items[0].transfer_id = 0;
+ sg_items[0].transfer_context = NULL;
+ sg_items[0].vaddr = skb_cb->htt.txbuf +
+ sizeof(skb_cb->htt.txbuf->frags);
+ sg_items[0].paddr = skb_cb->htt.txbuf_paddr +
+ sizeof(skb_cb->htt.txbuf->frags);
+ sg_items[0].len = sizeof(skb_cb->htt.txbuf->htc_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_tx);
+
+ sg_items[1].transfer_id = 0;
+ sg_items[1].transfer_context = NULL;
+ sg_items[1].vaddr = msdu->data;
+ sg_items[1].paddr = skb_cb->paddr;
+ sg_items[1].len = prefetch_len;
+
+ res = ath10k_hif_tx_sg(htt->ar,
+ htt->ar->htc.endpoint[htt->eid].ul_pipe_id,
+ sg_items, ARRAY_SIZE(sg_items));
if (res)
goto err_unmap_msdu;

@@ -514,10 +526,10 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)

err_unmap_msdu:
dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
-err_pull_txfrag:
- skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
-err_free_txdesc:
- dev_kfree_skb_any(txdesc);
+err_free_txbuf:
+ dma_pool_free(htt->tx_pool,
+ skb_cb->htt.txbuf,
+ skb_cb->htt.txbuf_paddr);
err_free_msdu_id:
spin_lock_bh(&htt->tx_lock);
htt->pending_tx[msdu_id] = NULL;
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 713c18e..2305d58 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -714,6 +714,7 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
while (ath10k_ce_completed_send_next(ce_state, &transfer_context,
&ce_data, &nbytes,
&transfer_id) == 0) {
+ /* no need to call tx completion for NULL pointers */
if (transfer_context == NULL)
continue;

@@ -1423,16 +1424,9 @@ static void ath10k_pci_tx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info)

while (ath10k_ce_cancel_send_next(ce_hdl, (void **)&netbuf,
&ce_data, &nbytes, &id) == 0) {
- /*
- * Indicate the completion to higer layer to free
- * the buffer
- */
-
- if (!netbuf) {
- ath10k_warn("invalid sk_buff on CE %d - NULL pointer. firmware crashed?\n",
- ce_hdl->id);
+ /* no need to call tx completion for NULL pointers */
+ if (!netbuf)
continue;
- }

ar_pci->msg_callbacks_current.tx_completion(ar,
netbuf,
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 8fd0680..0158ce0 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -66,8 +66,10 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,

dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

- if (skb_cb->htt.frag_len)
- skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
+ if (skb_cb->htt.txbuf)
+ dma_pool_free(htt->tx_pool,
+ skb_cb->htt.txbuf,
+ skb_cb->htt.txbuf_paddr);

ath10k_report_offchan_tx(htt->ar, msdu);

--
1.8.5.3


2014-02-26 11:39:50

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 0/8] ath10k: performance improvements

Hi,

These patches aim at reducing host CPU load and
thus improve performance on low-end systems.

In my setup I get a relative improvement of
100mbps for both UDP Tx and Rx.

Tx ----->
laptop:eth---eth:AP135:ath10k---ath10k:laptop
<----- Rx


v2:
* improve comments/commit messages
* fix 1 unbalanced locking
* fix var naming (s/err/ret)
* remove code that didn't belong here
* split batch tx/rx into 2 patches
(+ath10k: reduce htt tx/rx spinlock overhead)

Michal Kazior (8):
ath10k: remove DMA mapping wrappers
ath10k: remove is_aborted from skb_cb
ath10k: replace send_head() with tx_sg()
ath10k: bypass htc for htt tx path
ath10k: batch htt tx/rx completions
ath10k: reduce htt tx/rx spinlock overhead
ath10k: remove pci completion list
ath10k: minimize coherent dma accesses

drivers/net/wireless/ath/ath10k/ce.c | 16 +-
drivers/net/wireless/ath/ath10k/ce.h | 9 +-
drivers/net/wireless/ath/ath10k/core.h | 33 +--
drivers/net/wireless/ath/ath10k/hif.h | 25 +-
drivers/net/wireless/ath/ath10k/htc.c | 25 +-
drivers/net/wireless/ath/ath10k/htt.h | 17 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 147 ++++++++----
drivers/net/wireless/ath/ath10k/htt_tx.c | 205 ++++++++--------
drivers/net/wireless/ath/ath10k/mac.c | 4 +-
drivers/net/wireless/ath/ath10k/pci.c | 389 +++++++------------------------
drivers/net/wireless/ath/ath10k/pci.h | 28 ---
drivers/net/wireless/ath/ath10k/txrx.c | 15 +-
drivers/net/wireless/ath/ath10k/wmi.c | 17 +-
13 files changed, 379 insertions(+), 551 deletions(-)

--
1.8.5.3


2014-02-17 09:38:02

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 6/7] ath10k: remove pci completion list

One of the premises was to guarantee serialized
completion handling for upper layers
(HTC/WMI/HTT). Since quite some time now it is no
longer necessary.

The other premise was to batch up tx/rx
completions to take advantage of hot caches.
However frame tx/rx completion indications come in
on a single pipe already so they are already
batched up. More meaningful batching is done in
HTT itself.

This means PCI completion is no longer necessary
to keep around. It just wastes memory, cycles and
SLOC.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/pci.c | 275 +++-------------------------------
drivers/net/wireless/ath/ath10k/pci.h | 28 ----
2 files changed, 24 insertions(+), 279 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 2305d58..9d242d8 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -58,7 +58,6 @@ static DEFINE_PCI_DEVICE_TABLE(ath10k_pci_id_table) = {
static int ath10k_pci_diag_read_access(struct ath10k *ar, u32 address,
u32 *data);

-static void ath10k_pci_process_ce(struct ath10k *ar);
static int ath10k_pci_post_rx(struct ath10k *ar);
static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
int num);
@@ -73,7 +72,6 @@ static void ath10k_pci_free_irq(struct ath10k *ar);
static int ath10k_pci_bmi_wait(struct ath10k_ce_pipe *tx_pipe,
struct ath10k_ce_pipe *rx_pipe,
struct bmi_xfer *xfer);
-static void ath10k_pci_cleanup_ce(struct ath10k *ar);

static const struct ce_attr host_ce_config_wlan[] = {
/* CE0: host->target HTC control and raw streams */
@@ -678,34 +676,12 @@ void ath10k_do_pci_sleep(struct ath10k *ar)
}
}

-/*
- * FIXME: Handle OOM properly.
- */
-static inline
-struct ath10k_pci_compl *get_free_compl(struct ath10k_pci_pipe *pipe_info)
-{
- struct ath10k_pci_compl *compl = NULL;
-
- spin_lock_bh(&pipe_info->pipe_lock);
- if (list_empty(&pipe_info->compl_free)) {
- ath10k_warn("Completion buffers are full\n");
- goto exit;
- }
- compl = list_first_entry(&pipe_info->compl_free,
- struct ath10k_pci_compl, list);
- list_del(&compl->list);
-exit:
- spin_unlock_bh(&pipe_info->pipe_lock);
- return compl;
-}
-
/* Called by lower (CE) layer when a send to Target completes. */
static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
{
struct ath10k *ar = ce_state->ar;
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_pipe *pipe_info = &ar_pci->pipe_info[ce_state->id];
- struct ath10k_pci_compl *compl;
+ struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
void *transfer_context;
u32 ce_data;
unsigned int nbytes;
@@ -718,27 +694,8 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
if (transfer_context == NULL)
continue;

- compl = get_free_compl(pipe_info);
- if (!compl)
- break;
-
- compl->state = ATH10K_PCI_COMPL_SEND;
- compl->ce_state = ce_state;
- compl->pipe_info = pipe_info;
- compl->skb = transfer_context;
- compl->nbytes = nbytes;
- compl->transfer_id = transfer_id;
- compl->flags = 0;
-
- /*
- * Add the completion to the processing queue.
- */
- spin_lock_bh(&ar_pci->compl_lock);
- list_add_tail(&compl->list, &ar_pci->compl_process);
- spin_unlock_bh(&ar_pci->compl_lock);
+ cb->tx_completion(ar, transfer_context, transfer_id);
}
-
- ath10k_pci_process_ce(ar);
}

/* Called by lower (CE) layer when data is received from the Target. */
@@ -747,42 +704,40 @@ static void ath10k_pci_ce_recv_data(struct ath10k_ce_pipe *ce_state)
struct ath10k *ar = ce_state->ar;
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
struct ath10k_pci_pipe *pipe_info = &ar_pci->pipe_info[ce_state->id];
- struct ath10k_pci_compl *compl;
+ struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
struct sk_buff *skb;
void *transfer_context;
u32 ce_data;
- unsigned int nbytes;
+ unsigned int nbytes, max_nbytes;
unsigned int transfer_id;
unsigned int flags;
+ int err;

while (ath10k_ce_completed_recv_next(ce_state, &transfer_context,
&ce_data, &nbytes, &transfer_id,
&flags) == 0) {
- compl = get_free_compl(pipe_info);
- if (!compl)
- break;
-
- compl->state = ATH10K_PCI_COMPL_RECV;
- compl->ce_state = ce_state;
- compl->pipe_info = pipe_info;
- compl->skb = transfer_context;
- compl->nbytes = nbytes;
- compl->transfer_id = transfer_id;
- compl->flags = flags;
+ err = ath10k_pci_post_rx_pipe(pipe_info, 1);
+ if (unlikely(err)) {
+ /* FIXME: retry */
+ ath10k_warn("failed to replenish CE rx ring %d: %d\n",
+ pipe_info->pipe_num, err);
+ }

skb = transfer_context;
+ max_nbytes = skb->len + skb_tailroom(skb);
dma_unmap_single(ar->dev, ATH10K_SKB_CB(skb)->paddr,
- skb->len + skb_tailroom(skb),
- DMA_FROM_DEVICE);
- /*
- * Add the completion to the processing queue.
- */
- spin_lock_bh(&ar_pci->compl_lock);
- list_add_tail(&compl->list, &ar_pci->compl_process);
- spin_unlock_bh(&ar_pci->compl_lock);
- }
+ max_nbytes, DMA_FROM_DEVICE);

- ath10k_pci_process_ce(ar);
+ if (unlikely(max_nbytes < nbytes)) {
+ ath10k_warn("rxed more than expected (nbytes %d, max %d)",
+ nbytes, max_nbytes);
+ dev_kfree_skb_any(skb);
+ continue;
+ }
+
+ skb_put(skb, nbytes);
+ cb->rx_completion(ar, skb, pipe_info->pipe_num);
+ }
}

static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
@@ -931,52 +886,6 @@ static void ath10k_pci_hif_set_callbacks(struct ath10k *ar,
sizeof(ar_pci->msg_callbacks_current));
}

-static int ath10k_pci_alloc_compl(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- const struct ce_attr *attr;
- struct ath10k_pci_pipe *pipe_info;
- struct ath10k_pci_compl *compl;
- int i, pipe_num, completions;
-
- spin_lock_init(&ar_pci->compl_lock);
- INIT_LIST_HEAD(&ar_pci->compl_process);
-
- for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
- pipe_info = &ar_pci->pipe_info[pipe_num];
-
- spin_lock_init(&pipe_info->pipe_lock);
- INIT_LIST_HEAD(&pipe_info->compl_free);
-
- /* Handle Diagnostic CE specially */
- if (pipe_info->ce_hdl == ar_pci->ce_diag)
- continue;
-
- attr = &host_ce_config_wlan[pipe_num];
- completions = 0;
-
- if (attr->src_nentries)
- completions += attr->src_nentries;
-
- if (attr->dest_nentries)
- completions += attr->dest_nentries;
-
- for (i = 0; i < completions; i++) {
- compl = kmalloc(sizeof(*compl), GFP_KERNEL);
- if (!compl) {
- ath10k_warn("No memory for completion state\n");
- ath10k_pci_cleanup_ce(ar);
- return -ENOMEM;
- }
-
- compl->state = ATH10K_PCI_COMPL_FREE;
- list_add_tail(&compl->list, &pipe_info->compl_free);
- }
- }
-
- return 0;
-}
-
static int ath10k_pci_setup_ce_irq(struct ath10k *ar)
{
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -1021,131 +930,6 @@ static void ath10k_pci_kill_tasklet(struct ath10k *ar)
tasklet_kill(&ar_pci->pipe_info[i].intr);
}

-static void ath10k_pci_cleanup_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_compl *compl, *tmp;
- struct ath10k_pci_pipe *pipe_info;
- struct sk_buff *netbuf;
- int pipe_num;
-
- /* Free pending completions. */
- spin_lock_bh(&ar_pci->compl_lock);
- if (!list_empty(&ar_pci->compl_process))
- ath10k_warn("pending completions still present! possible memory leaks.\n");
-
- list_for_each_entry_safe(compl, tmp, &ar_pci->compl_process, list) {
- list_del(&compl->list);
- netbuf = compl->skb;
- dev_kfree_skb_any(netbuf);
- kfree(compl);
- }
- spin_unlock_bh(&ar_pci->compl_lock);
-
- /* Free unused completions for each pipe. */
- for (pipe_num = 0; pipe_num < CE_COUNT; pipe_num++) {
- pipe_info = &ar_pci->pipe_info[pipe_num];
-
- spin_lock_bh(&pipe_info->pipe_lock);
- list_for_each_entry_safe(compl, tmp,
- &pipe_info->compl_free, list) {
- list_del(&compl->list);
- kfree(compl);
- }
- spin_unlock_bh(&pipe_info->pipe_lock);
- }
-}
-
-static void ath10k_pci_process_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ar->hif.priv;
- struct ath10k_hif_cb *cb = &ar_pci->msg_callbacks_current;
- struct ath10k_pci_compl *compl;
- struct sk_buff *skb;
- unsigned int nbytes;
- int ret, send_done = 0;
-
- /* Upper layers aren't ready to handle tx/rx completions in parallel so
- * we must serialize all completion processing. */
-
- spin_lock_bh(&ar_pci->compl_lock);
- if (ar_pci->compl_processing) {
- spin_unlock_bh(&ar_pci->compl_lock);
- return;
- }
- ar_pci->compl_processing = true;
- spin_unlock_bh(&ar_pci->compl_lock);
-
- for (;;) {
- spin_lock_bh(&ar_pci->compl_lock);
- if (list_empty(&ar_pci->compl_process)) {
- spin_unlock_bh(&ar_pci->compl_lock);
- break;
- }
- compl = list_first_entry(&ar_pci->compl_process,
- struct ath10k_pci_compl, list);
- list_del(&compl->list);
- spin_unlock_bh(&ar_pci->compl_lock);
-
- switch (compl->state) {
- case ATH10K_PCI_COMPL_SEND:
- cb->tx_completion(ar,
- compl->skb,
- compl->transfer_id);
- send_done = 1;
- break;
- case ATH10K_PCI_COMPL_RECV:
- ret = ath10k_pci_post_rx_pipe(compl->pipe_info, 1);
- if (ret) {
- ath10k_warn("failed to post RX buffer for pipe %d: %d\n",
- compl->pipe_info->pipe_num, ret);
- break;
- }
-
- skb = compl->skb;
- nbytes = compl->nbytes;
-
- ath10k_dbg(ATH10K_DBG_PCI,
- "ath10k_pci_ce_recv_data netbuf=%p nbytes=%d\n",
- skb, nbytes);
- ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL,
- "ath10k rx: ", skb->data, nbytes);
-
- if (skb->len + skb_tailroom(skb) >= nbytes) {
- skb_trim(skb, 0);
- skb_put(skb, nbytes);
- cb->rx_completion(ar, skb,
- compl->pipe_info->pipe_num);
- } else {
- ath10k_warn("rxed more than expected (nbytes %d, max %d)",
- nbytes,
- skb->len + skb_tailroom(skb));
- }
- break;
- case ATH10K_PCI_COMPL_FREE:
- ath10k_warn("free completion cannot be processed\n");
- break;
- default:
- ath10k_warn("invalid completion state (%d)\n",
- compl->state);
- break;
- }
-
- compl->state = ATH10K_PCI_COMPL_FREE;
-
- /*
- * Add completion back to the pipe's free list.
- */
- spin_lock_bh(&compl->pipe_info->pipe_lock);
- list_add_tail(&compl->list, &compl->pipe_info->compl_free);
- spin_unlock_bh(&compl->pipe_info->pipe_lock);
- }
-
- spin_lock_bh(&ar_pci->compl_lock);
- ar_pci->compl_processing = false;
- spin_unlock_bh(&ar_pci->compl_lock);
-}
-
/* TODO - temporary mapping while we have too few CE's */
static int ath10k_pci_hif_map_service_to_pipe(struct ath10k *ar,
u16 service_id, u8 *ul_pipe,
@@ -1317,17 +1101,11 @@ static int ath10k_pci_hif_start(struct ath10k *ar)
ath10k_pci_free_early_irq(ar);
ath10k_pci_kill_tasklet(ar);

- ret = ath10k_pci_alloc_compl(ar);
- if (ret) {
- ath10k_warn("failed to allocate CE completions: %d\n", ret);
- goto err_early_irq;
- }
-
ret = ath10k_pci_request_irq(ar);
if (ret) {
ath10k_warn("failed to post RX buffers for all pipes: %d\n",
ret);
- goto err_free_compl;
+ goto err_early_irq;
}

ret = ath10k_pci_setup_ce_irq(ar);
@@ -1351,9 +1129,6 @@ err_stop:
ath10k_ce_disable_interrupts(ar);
ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_process_ce(ar);
-err_free_compl:
- ath10k_pci_cleanup_ce(ar);
err_early_irq:
/* Though there should be no interrupts (device was reset)
* power_down() expects the early IRQ to be installed as per the
@@ -1494,8 +1269,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)
* not DMA nor interrupt. We process the leftovers and then free
* everything else up. */

- ath10k_pci_process_ce(ar);
- ath10k_pci_cleanup_ce(ar);
ath10k_pci_buffer_cleanup(ar);

/* Make the sure the device won't access any structures on the host by
diff --git a/drivers/net/wireless/ath/ath10k/pci.h b/drivers/net/wireless/ath/ath10k/pci.h
index a4f3203..b43fdb4 100644
--- a/drivers/net/wireless/ath/ath10k/pci.h
+++ b/drivers/net/wireless/ath/ath10k/pci.h
@@ -43,23 +43,6 @@ struct bmi_xfer {
u32 resp_len;
};

-enum ath10k_pci_compl_state {
- ATH10K_PCI_COMPL_FREE = 0,
- ATH10K_PCI_COMPL_SEND,
- ATH10K_PCI_COMPL_RECV,
-};
-
-struct ath10k_pci_compl {
- struct list_head list;
- enum ath10k_pci_compl_state state;
- struct ath10k_ce_pipe *ce_state;
- struct ath10k_pci_pipe *pipe_info;
- struct sk_buff *skb;
- unsigned int nbytes;
- unsigned int transfer_id;
- unsigned int flags;
-};
-
/*
* PCI-specific Target state
*
@@ -175,9 +158,6 @@ struct ath10k_pci_pipe {
/* protects compl_free and num_send_allowed */
spinlock_t pipe_lock;

- /* List of free CE completion slots */
- struct list_head compl_free;
-
struct ath10k_pci *ar_pci;
struct tasklet_struct intr;
};
@@ -205,14 +185,6 @@ struct ath10k_pci {
atomic_t keep_awake_count;
bool verified_awake;

- /* List of CE completions to be processed */
- struct list_head compl_process;
-
- /* protects compl_processing and compl_process */
- spinlock_t compl_lock;
-
- bool compl_processing;
-
struct ath10k_pci_pipe pipe_info[CE_COUNT_MAX];

struct ath10k_hif_cb msg_callbacks_current;
--
1.8.5.3


2014-02-19 14:28:00

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

Michal Kazior <[email protected]> writes:

> On 19 February 2014 13:48, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> +struct ath10k_hif_sg_item {
>>> + u16 transfer_id;
>>> + void *transfer_context;
>>> + void *vaddr; /* for debugging mostly */
>>> + u32 paddr;
>>> + u16 len;
>>> +};
>>
>> This is the part I don't like. Instead of adding our own structs we
>> instead should have everything in skb->cb and pass the skbs around. The
>> sad part was that last fall I was working on cleaning up that but never
>> found the time to finish it :(
>
> There's simply not enough room to keep it all in ath10k_skb_cb
> directly.

After my cleanups transfer_context was not used and when using sk_buffs
properly vaddr and len would be useless. So we would have only transfer
id and paddr left. And when I was working on this they did fit to cb.

> It doesn't really make any sense to keep it there anyway because
> sg_item is used as means to pass a complex function argument to
> sg_tx().

If we used skbs we would just give a list/queue of them and no need to
have any extra structs.

> The sg_item list is never used again.

Ok, that's better. At least then it's later easier to convert to proper
skbs.

--
Kalle Valo

2014-02-27 07:25:08

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 8/8] ath10k: minimize coherent dma accesses

It doesn't make much sense to calculate the ring
size fill count because it already is memoized in
a separate variable.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* fix unbalanced locking

drivers/net/wireless/ath/ath10k/htt_rx.c | 15 ++++-----------
1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 1cb88bf..2fe47d8 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -225,12 +225,6 @@ static void ath10k_htt_rx_ring_refill_retry(unsigned long arg)
ath10k_htt_rx_msdu_buff_replenish(htt);
}

-static unsigned ath10k_htt_rx_ring_elems(struct ath10k_htt *htt)
-{
- return (__le32_to_cpu(*htt->rx_ring.alloc_idx.vaddr) -
- htt->rx_ring.sw_rd_idx.msdu_payld) & htt->rx_ring.size_mask;
-}
-
void ath10k_htt_rx_detach(struct ath10k_htt *htt)
{
int sw_rd_idx = htt->rx_ring.sw_rd_idx.msdu_payld;
@@ -276,8 +270,10 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)

lockdep_assert_held(&htt->rx_ring.lock);

- if (ath10k_htt_rx_ring_elems(htt) == 0)
- ath10k_warn("htt rx ring is empty!\n");
+ if (htt->rx_ring.fill_cnt == 0) {
+ ath10k_warn("tried to pop sk_buff from an empty rx ring\n");
+ return NULL;
+ }

idx = htt->rx_ring.sw_rd_idx.msdu_payld;
msdu = htt->rx_ring.netbufs_ring[idx];
@@ -312,9 +308,6 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,

lockdep_assert_held(&htt->rx_ring.lock);

- if (ath10k_htt_rx_ring_elems(htt) == 0)
- ath10k_warn("htt rx ring is empty!\n");
-
if (htt->rx_confused) {
ath10k_warn("htt is confused. refusing rx\n");
return 0;
--
1.8.5.3


2014-02-28 09:07:09

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] ath10k: batch htt tx/rx completions

On 28 February 2014 10:00, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> HTT Rx endpoint processes both frame rx
>> indications and frame tx completion indications.
>>
>> Those completions typically come in batches and
>> may be mixed so it makes sense to defer processing
>> hoping to get a bunch of them and take advantage
>> of hot caches.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>
> [...]
>
>> +void ath10k_htt_txrx_compl_task(unsigned long ptr)
>> +{
>> + struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
>> + struct htt_resp *resp;
>> + struct sk_buff *skb;
>
> I think this can be a static function. Is it ok if I change this patch
> with the diff below?

I don't really mind.


Michał

2014-02-19 13:26:00

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

On 19 February 2014 13:48, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> PCI is capable of handling scatter-gather lists.
>> This can be used to avoid copying memory.
>>
>> Change the name of the callback while at to
>> reflect its purpose.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>
> [...]
>
>> +struct ath10k_hif_sg_item {
>> + u16 transfer_id;
>> + void *transfer_context;
>> + void *vaddr; /* for debugging mostly */
>> + u32 paddr;
>> + u16 len;
>> +};
>
> This is the part I don't like. Instead of adding our own structs we
> instead should have everything in skb->cb and pass the skbs around. The
> sad part was that last fall I was working on cleaning up that but never
> found the time to finish it :(

There's simply not enough room to keep it all in ath10k_skb_cb
directly. It doesn't really make any sense to keep it there anyway
because sg_item is used as means to pass a complex function argument
to sg_tx(). The sg_item list is never used again.


Michał

2014-02-27 07:25:06

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 5/8] ath10k: batch htt tx/rx completions

HTT Rx endpoint processes both frame rx
indications and frame tx completion indications.

Those completions typically come in batches and
may be mixed so it makes sense to defer processing
hoping to get a bunch of them and take advantage
of hot caches.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* split locking related changes into next patch
* remove code that didn't belong here

drivers/net/wireless/ath/ath10k/htt.h | 8 +++
drivers/net/wireless/ath/ath10k/htt_rx.c | 116 ++++++++++++++++++++-----------
2 files changed, 85 insertions(+), 39 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 2b76cb5..0743b8b 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1283,6 +1283,12 @@ struct ath10k_htt {
* used to avoid further failures */
bool rx_confused;
struct tasklet_struct rx_replenish_task;
+
+ /* This is used to group tx/rx completions separately and process them
+ * in batches to reduce cache stalls */
+ struct tasklet_struct txrx_compl_task;
+ struct sk_buff_head tx_compl_q;
+ struct sk_buff_head rx_compl_q;
};

#define RX_HTT_HDR_STATUS_LEN 64
@@ -1354,4 +1360,6 @@ int ath10k_htt_tx_alloc_msdu_id(struct ath10k_htt *htt);
void ath10k_htt_tx_free_msdu_id(struct ath10k_htt *htt, u16 msdu_id);
int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *);
int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *);
+void ath10k_htt_txrx_compl_task(unsigned long ptr);
+
#endif
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index fcd00f6..1141f6c 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -237,6 +237,10 @@ void ath10k_htt_rx_detach(struct ath10k_htt *htt)

del_timer_sync(&htt->rx_ring.refill_retry_timer);
tasklet_kill(&htt->rx_replenish_task);
+ tasklet_kill(&htt->txrx_compl_task);
+
+ skb_queue_purge(&htt->tx_compl_q);
+ skb_queue_purge(&htt->rx_compl_q);

while (sw_rd_idx != __le32_to_cpu(*(htt->rx_ring.alloc_idx.vaddr))) {
struct sk_buff *skb =
@@ -529,6 +533,12 @@ int ath10k_htt_rx_attach(struct ath10k_htt *htt)
tasklet_init(&htt->rx_replenish_task, ath10k_htt_rx_replenish_task,
(unsigned long)htt);

+ skb_queue_head_init(&htt->tx_compl_q);
+ skb_queue_head_init(&htt->rx_compl_q);
+
+ tasklet_init(&htt->txrx_compl_task, ath10k_htt_txrx_compl_task,
+ (unsigned long)htt);
+
ath10k_dbg(ATH10K_DBG_BOOT, "htt rx ring size %d fill_level %d\n",
htt->rx_ring.size, htt->rx_ring.fill_level);
return 0;
@@ -1123,6 +1133,43 @@ end:
}
}

+static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
+ struct sk_buff *skb)
+{
+ struct ath10k_htt *htt = &ar->htt;
+ struct htt_resp *resp = (struct htt_resp *)skb->data;
+ struct htt_tx_done tx_done = {};
+ int status = MS(resp->data_tx_completion.flags, HTT_DATA_TX_STATUS);
+ __le16 msdu_id;
+ int i;
+
+ switch (status) {
+ case HTT_DATA_TX_STATUS_NO_ACK:
+ tx_done.no_ack = true;
+ break;
+ case HTT_DATA_TX_STATUS_OK:
+ break;
+ case HTT_DATA_TX_STATUS_DISCARD:
+ case HTT_DATA_TX_STATUS_POSTPONE:
+ case HTT_DATA_TX_STATUS_DOWNLOAD_FAIL:
+ tx_done.discard = true;
+ break;
+ default:
+ ath10k_warn("unhandled tx completion status %d\n", status);
+ tx_done.discard = true;
+ break;
+ }
+
+ ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion num_msdus %d\n",
+ resp->data_tx_completion.num_msdus);
+
+ for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
+ msdu_id = resp->data_tx_completion.msdus[i];
+ tx_done.msdu_id = __le16_to_cpu(msdu_id);
+ ath10k_txrx_tx_unref(htt, &tx_done);
+ }
+}
+
void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
{
struct ath10k_htt *htt = &ar->htt;
@@ -1141,10 +1188,10 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
complete(&htt->target_version_received);
break;
}
- case HTT_T2H_MSG_TYPE_RX_IND: {
- ath10k_htt_rx_handler(htt, &resp->rx_ind);
- break;
- }
+ case HTT_T2H_MSG_TYPE_RX_IND:
+ skb_queue_tail(&htt->rx_compl_q, skb);
+ tasklet_schedule(&htt->txrx_compl_task);
+ return;
case HTT_T2H_MSG_TYPE_PEER_MAP: {
struct htt_peer_map_event ev = {
.vdev_id = resp->peer_map.vdev_id,
@@ -1179,44 +1226,17 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
break;
}

+ spin_lock_bh(&htt->tx_lock);
ath10k_txrx_tx_unref(htt, &tx_done);
+ spin_unlock_bh(&htt->tx_lock);
break;
}
- case HTT_T2H_MSG_TYPE_TX_COMPL_IND: {
- struct htt_tx_done tx_done = {};
- int status = MS(resp->data_tx_completion.flags,
- HTT_DATA_TX_STATUS);
- __le16 msdu_id;
- int i;
-
- switch (status) {
- case HTT_DATA_TX_STATUS_NO_ACK:
- tx_done.no_ack = true;
- break;
- case HTT_DATA_TX_STATUS_OK:
- break;
- case HTT_DATA_TX_STATUS_DISCARD:
- case HTT_DATA_TX_STATUS_POSTPONE:
- case HTT_DATA_TX_STATUS_DOWNLOAD_FAIL:
- tx_done.discard = true;
- break;
- default:
- ath10k_warn("unhandled tx completion status %d\n",
- status);
- tx_done.discard = true;
- break;
- }
-
- ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion num_msdus %d\n",
- resp->data_tx_completion.num_msdus);
-
- for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
- msdu_id = resp->data_tx_completion.msdus[i];
- tx_done.msdu_id = __le16_to_cpu(msdu_id);
- ath10k_txrx_tx_unref(htt, &tx_done);
- }
- break;
- }
+ case HTT_T2H_MSG_TYPE_TX_COMPL_IND:
+ spin_lock_bh(&htt->tx_lock);
+ __skb_queue_tail(&htt->tx_compl_q, skb);
+ spin_unlock_bh(&htt->tx_lock);
+ tasklet_schedule(&htt->txrx_compl_task);
+ return;
case HTT_T2H_MSG_TYPE_SEC_IND: {
struct ath10k *ar = htt->ar;
struct htt_security_indication *ev = &resp->security_indication;
@@ -1256,3 +1276,21 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
/* Free the indication buffer */
dev_kfree_skb_any(skb);
}
+
+void ath10k_htt_txrx_compl_task(unsigned long ptr)
+{
+ struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
+ struct htt_resp *resp;
+ struct sk_buff *skb;
+
+ while ((skb = skb_dequeue(&htt->tx_compl_q))) {
+ ath10k_htt_rx_frm_tx_compl(htt->ar, skb);
+ dev_kfree_skb_any(skb);
+ }
+
+ while ((skb = skb_dequeue(&htt->rx_compl_q))) {
+ resp = (struct htt_resp *)skb->data;
+ ath10k_htt_rx_handler(htt, &resp->rx_ind);
+ dev_kfree_skb_any(skb);
+ }
+}
--
1.8.5.3


2014-02-27 07:25:01

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 0/8] ath10k: performance improvements

Hi,

These patches aim at reducing host CPU load and
thus improve performance on low-end systems.

In my setup I get a relative improvement of
100mbps for both UDP Tx and Rx.

Tx ----->
laptop:eth---eth:AP135:ath10k---ath10k:laptop
<----- Rx


v2:
* improve comments/commit messages
* fix unbalanced locking
* fix var naming (s/err/ret)
* fix sparse/checkpatch/buildbot warnings
* remove code that didn't belong here
* split batch tx/rx into 2 patches
(+ath10k: reduce htt tx/rx spinlock overhead)

v3:
* just a resend (due to my mail-server issues the
other day not all of the v2 patches had been
sent out)

Michal Kazior (8):
ath10k: remove DMA mapping wrappers
ath10k: remove is_aborted from skb_cb
ath10k: replace send_head() with tx_sg()
ath10k: bypass htc for htt tx path
ath10k: batch htt tx/rx completions
ath10k: reduce htt tx/rx spinlock overhead
ath10k: remove pci completion list
ath10k: minimize coherent dma accesses

drivers/net/wireless/ath/ath10k/ce.c | 16 +-
drivers/net/wireless/ath/ath10k/ce.h | 9 +-
drivers/net/wireless/ath/ath10k/core.h | 33 +--
drivers/net/wireless/ath/ath10k/hif.h | 25 +-
drivers/net/wireless/ath/ath10k/htc.c | 25 +-
drivers/net/wireless/ath/ath10k/htt.h | 17 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 147 ++++++++----
drivers/net/wireless/ath/ath10k/htt_tx.c | 205 ++++++++--------
drivers/net/wireless/ath/ath10k/mac.c | 4 +-
drivers/net/wireless/ath/ath10k/pci.c | 389 +++++++------------------------
drivers/net/wireless/ath/ath10k/pci.h | 28 ---
drivers/net/wireless/ath/ath10k/txrx.c | 15 +-
drivers/net/wireless/ath/ath10k/wmi.c | 17 +-
13 files changed, 379 insertions(+), 551 deletions(-)

--
1.8.5.3


2014-02-24 11:47:21

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

Michal Kazior <[email protected]> writes:

> On 19 February 2014 15:18, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> It doesn't really make any sense to keep it there anyway because
>>> sg_item is used as means to pass a complex function argument to
>>> sg_tx().
>>
>> If we used skbs we would just give a list/queue of them and no need to
>> have any extra structs.
>
> Managing skbs would incur extra overhead. Skbs are also less flexible
> and probably unnecessary (at least at this point).

As a rule of thumb we should avoid reinventing things on our own and use
Linux infrastructure as much possible. There are a lot of benefits from
using sk_buffs.

> Think about the qos workaround removal - you have to submit htt tx
> descriptor frame prefetch in 2 parts instead of 1 (to make it appear
> to FW as there's no QoS field). With skbs you have to allocate another
> skb, copy part of MSDU into it... aaand you've just traded a single
> memmove() for dev_alloc_skb(), skb_put() and a memcpy().

I'm missing a lot of details here (and I can't check from code right
now), but are you saying that skb allocation is too slow? In that case
we should fix the skb allocation instead of working it around in ath10k.

But this is all hand waving for the future, not really something I see a
problem with this patchset.

--
Kalle Valo

2014-02-27 07:25:02

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 1/8] ath10k: remove DMA mapping wrappers

There's no real benefit from using them. DMA-API
already provides debugging. Some skbuffs are
already mapped directly with DMA-API since wrapper
arguments were insufficient and extending them
would be pointless.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* var naming fix: s/ret/err/

drivers/net/wireless/ath/ath10k/core.h | 27 ---------------------------
drivers/net/wireless/ath/ath10k/htc.c | 11 ++++++++---
drivers/net/wireless/ath/ath10k/htt_tx.c | 12 ++++++++----
drivers/net/wireless/ath/ath10k/mac.c | 4 +++-
drivers/net/wireless/ath/ath10k/txrx.c | 5 +----
drivers/net/wireless/ath/ath10k/wmi.c | 17 ++++++++++++++---
6 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 1fc26fe..082fa77 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_mapped;
bool is_aborted;
u8 vdev_id;

@@ -87,32 +86,6 @@ static inline struct ath10k_skb_cb *ATH10K_SKB_CB(struct sk_buff *skb)
return (struct ath10k_skb_cb *)&IEEE80211_SKB_CB(skb)->driver_data;
}

-static inline int ath10k_skb_map(struct device *dev, struct sk_buff *skb)
-{
- if (ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- ATH10K_SKB_CB(skb)->paddr = dma_map_single(dev, skb->data, skb->len,
- DMA_TO_DEVICE);
-
- if (unlikely(dma_mapping_error(dev, ATH10K_SKB_CB(skb)->paddr)))
- return -EIO;
-
- ATH10K_SKB_CB(skb)->is_mapped = true;
- return 0;
-}
-
-static inline int ath10k_skb_unmap(struct device *dev, struct sk_buff *skb)
-{
- if (!ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- dma_unmap_single(dev, ATH10K_SKB_CB(skb)->paddr, skb->len,
- DMA_TO_DEVICE);
- ATH10K_SKB_CB(skb)->is_mapped = false;
- return 0;
-}
-
static inline u32 host_interest_item_address(u32 item_offset)
{
return QCA988X_HOST_INTEREST_ADDRESS + item_offset;
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index edc57ab..69f1f46 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -63,7 +63,9 @@ static struct sk_buff *ath10k_htc_build_tx_ctrl_skb(void *ar)
static inline void ath10k_htc_restore_tx_skb(struct ath10k_htc *htc,
struct sk_buff *skb)
{
- ath10k_skb_unmap(htc->ar->dev, skb);
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+
+ dma_unmap_single(htc->ar->dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
skb_pull(skb, sizeof(struct ath10k_htc_hdr));
}

@@ -122,6 +124,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,
struct sk_buff *skb)
{
struct ath10k_htc_ep *ep = &htc->endpoint[eid];
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+ struct device *dev = htc->ar->dev;
int credits = 0;
int ret;

@@ -157,7 +161,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,

ath10k_htc_prepare_tx_skb(ep, skb);

- ret = ath10k_skb_map(htc->ar->dev, skb);
+ skb_cb->paddr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(dev, skb_cb->paddr);
if (ret)
goto err_credits;

@@ -169,7 +174,7 @@ int ath10k_htc_send(struct ath10k_htc *htc,
return 0;

err_unmap:
- ath10k_skb_unmap(htc->ar->dev, skb);
+ dma_unmap_single(dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
err_credits:
if (ep->tx_credit_flow_enabled) {
spin_lock_bh(&htc->tx_lock);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index acaa046..f5960c5 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -334,7 +334,9 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
goto err_free_msdu_id;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_free_txdesc;

@@ -358,7 +360,7 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_free_txdesc:
dev_kfree_skb_any(txdesc);
err_free_msdu_id:
@@ -437,7 +439,9 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
skb_cb->htt.pad_len = 0;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_pull_txfrag;

@@ -509,7 +513,7 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_pull_txfrag:
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
err_free_txdesc:
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 0321da0..9d682ed 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -855,7 +855,9 @@ static void ath10k_control_beaconing(struct ath10k_vif *arvif,

spin_lock_bh(&arvif->ar->data_lock);
if (arvif->beacon) {
- ath10k_skb_unmap(arvif->ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);

arvif->beacon = NULL;
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index dcf7efd..fe69899 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -51,7 +51,6 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
struct ieee80211_tx_info *info;
struct ath10k_skb_cb *skb_cb;
struct sk_buff *msdu;
- int ret;

ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion msdu_id %u discard %d no_ack %d\n",
tx_done->msdu_id, !!tx_done->discard, !!tx_done->no_ack);
@@ -65,9 +64,7 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
msdu = htt->pending_tx[tx_done->msdu_id];
skb_cb = ATH10K_SKB_CB(msdu);

- ret = ath10k_skb_unmap(dev, msdu);
- if (ret)
- ath10k_warn("data skb unmap failed (%d)\n", ret);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

if (skb_cb->htt.frag_len)
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 91e501b..478e7f6 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1360,7 +1360,7 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
struct wmi_bcn_info *bcn_info;
struct ath10k_vif *arvif;
struct sk_buff *bcn;
- int vdev_id = 0;
+ int ret, vdev_id = 0;

ath10k_dbg(ATH10K_DBG_MGMT, "WMI_HOST_SWBA_EVENTID\n");

@@ -1435,16 +1435,27 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
ath10k_warn("SWBA overrun on vdev %d\n",
arvif->vdev_id);

- ath10k_skb_unmap(ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);
}

- ath10k_skb_map(ar->dev, bcn);
+ ATH10K_SKB_CB(bcn)->paddr = dma_map_single(arvif->ar->dev,
+ bcn->data, bcn->len,
+ DMA_TO_DEVICE);
+ ret = dma_mapping_error(arvif->ar->dev,
+ ATH10K_SKB_CB(bcn)->paddr);
+ if (ret) {
+ ath10k_warn("failed to map beacon: %d\n", ret);
+ goto skip;
+ }

arvif->beacon = bcn;
arvif->beacon_sent = false;

ath10k_wmi_tx_beacon_nowait(arvif);
+skip:
spin_unlock_bh(&ar->data_lock);
}
}
--
1.8.5.3


2014-02-27 07:25:03

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 3/8] ath10k: replace send_head() with tx_sg()

PCI is capable of handling scatter-gather lists.
This can be used to avoid copying memory.

Change the name of the callback while at to
reflect its purpose.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* add comment for NULL transfer_context behavior

drivers/net/wireless/ath/ath10k/ce.c | 12 ++---
drivers/net/wireless/ath/ath10k/ce.h | 7 +++
drivers/net/wireless/ath/ath10k/hif.h | 25 ++++++-----
drivers/net/wireless/ath/ath10k/htc.c | 10 ++++-
drivers/net/wireless/ath/ath10k/pci.c | 84 +++++++++++++++++++++++------------
5 files changed, 92 insertions(+), 46 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d44d618..a0b1a8c 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -266,12 +266,12 @@ static inline void ath10k_ce_engine_int_status_clear(struct ath10k *ar,
* ath10k_ce_sendlist_send.
* The caller takes responsibility for any needed locking.
*/
-static int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
- void *per_transfer_context,
- u32 buffer,
- unsigned int nbytes,
- unsigned int transfer_id,
- unsigned int flags)
+int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
+ void *per_transfer_context,
+ u32 buffer,
+ unsigned int nbytes,
+ unsigned int transfer_id,
+ unsigned int flags)
{
struct ath10k *ar = ce_state->ar;
struct ath10k_ce_ring *src_ring = ce_state->src_ring;
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 67dbde6..322e929 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -152,6 +152,13 @@ int ath10k_ce_send(struct ath10k_ce_pipe *ce_state,
unsigned int transfer_id,
unsigned int flags);

+int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
+ void *per_transfer_context,
+ u32 buffer,
+ unsigned int nbytes,
+ unsigned int transfer_id,
+ unsigned int flags);
+
void ath10k_ce_send_cb_register(struct ath10k_ce_pipe *ce_state,
void (*send_cb)(struct ath10k_ce_pipe *),
int disable_interrupts);
diff --git a/drivers/net/wireless/ath/ath10k/hif.h b/drivers/net/wireless/ath/ath10k/hif.h
index dcdea68..2ac7bea 100644
--- a/drivers/net/wireless/ath/ath10k/hif.h
+++ b/drivers/net/wireless/ath/ath10k/hif.h
@@ -21,6 +21,14 @@
#include <linux/kernel.h>
#include "core.h"

+struct ath10k_hif_sg_item {
+ u16 transfer_id;
+ void *transfer_context; /* NULL = tx completion callback not called */
+ void *vaddr; /* for debugging mostly */
+ u32 paddr;
+ u16 len;
+};
+
struct ath10k_hif_cb {
int (*tx_completion)(struct ath10k *ar,
struct sk_buff *wbuf,
@@ -31,11 +39,9 @@ struct ath10k_hif_cb {
};

struct ath10k_hif_ops {
- /* Send the head of a buffer to HIF for transmission to the target. */
- int (*send_head)(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int nbytes,
- struct sk_buff *buf);
+ /* send a scatter-gather list to the target */
+ int (*tx_sg)(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items, int n_items);

/*
* API to handle HIF-specific BMI message exchanges, this API is
@@ -86,12 +92,11 @@ struct ath10k_hif_ops {
};


-static inline int ath10k_hif_send_head(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int nbytes,
- struct sk_buff *buf)
+static inline int ath10k_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items,
+ int n_items)
{
- return ar->hif.ops->send_head(ar, pipe_id, transfer_id, nbytes, buf);
+ return ar->hif.ops->tx_sg(ar, pipe_id, items, n_items);
}

static inline int ath10k_hif_exchange_bmi_msg(struct ath10k *ar,
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 69f1f46..64ab8d6 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -125,6 +125,7 @@ int ath10k_htc_send(struct ath10k_htc *htc,
{
struct ath10k_htc_ep *ep = &htc->endpoint[eid];
struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+ struct ath10k_hif_sg_item sg_item;
struct device *dev = htc->ar->dev;
int credits = 0;
int ret;
@@ -166,8 +167,13 @@ int ath10k_htc_send(struct ath10k_htc *htc,
if (ret)
goto err_credits;

- ret = ath10k_hif_send_head(htc->ar, ep->ul_pipe_id, ep->eid,
- skb->len, skb);
+ sg_item.transfer_id = ep->eid;
+ sg_item.transfer_context = skb;
+ sg_item.vaddr = skb->data;
+ sg_item.paddr = skb_cb->paddr;
+ sg_item.len = skb->len;
+
+ ret = ath10k_hif_tx_sg(htc->ar, ep->ul_pipe_id, &sg_item, 1);
if (ret)
goto err_unmap;

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index d973975..713c18e 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -714,6 +714,9 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
while (ath10k_ce_completed_send_next(ce_state, &transfer_context,
&ce_data, &nbytes,
&transfer_id) == 0) {
+ if (transfer_context == NULL)
+ continue;
+
compl = get_free_compl(pipe_info);
if (!compl)
break;
@@ -781,39 +784,64 @@ static void ath10k_pci_ce_recv_data(struct ath10k_ce_pipe *ce_state)
ath10k_pci_process_ce(ar);
}

-/* Send the first nbytes bytes of the buffer */
-static int ath10k_pci_hif_send_head(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int bytes, struct sk_buff *nbuf)
+static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items, int n_items)
{
- struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(nbuf);
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_pipe *pipe_info = &(ar_pci->pipe_info[pipe_id]);
- struct ath10k_ce_pipe *ce_hdl = pipe_info->ce_hdl;
- unsigned int len;
- u32 flags = 0;
- int ret;
+ struct ath10k_pci_pipe *pci_pipe = &ar_pci->pipe_info[pipe_id];
+ struct ath10k_ce_pipe *ce_pipe = pci_pipe->ce_hdl;
+ struct ath10k_ce_ring *src_ring = ce_pipe->src_ring;
+ unsigned int nentries_mask = src_ring->nentries_mask;
+ unsigned int sw_index = src_ring->sw_index;
+ unsigned int write_index = src_ring->write_index;
+ int err, i;

- len = min(bytes, nbuf->len);
- bytes -= len;
+ spin_lock_bh(&ar_pci->ce_lock);

- if (len & 3)
- ath10k_warn("skb not aligned to 4-byte boundary (%d)\n", len);
+ if (unlikely(CE_RING_DELTA(nentries_mask,
+ write_index, sw_index - 1) < n_items)) {
+ err = -ENOBUFS;
+ goto unlock;
+ }

- ath10k_dbg(ATH10K_DBG_PCI,
- "pci send data vaddr %p paddr 0x%llx len %d as %d bytes\n",
- nbuf->data, (unsigned long long) skb_cb->paddr,
- nbuf->len, len);
- ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL,
- "ath10k tx: data: ",
- nbuf->data, nbuf->len);
-
- ret = ath10k_ce_send(ce_hdl, nbuf, skb_cb->paddr, len, transfer_id,
- flags);
- if (ret)
- ath10k_warn("failed to send sk_buff to CE: %p\n", nbuf);
+ for (i = 0; i < n_items - 1; i++) {
+ ath10k_dbg(ATH10K_DBG_PCI,
+ "pci tx item %d paddr 0x%08x len %d n_items %d\n",
+ i, items[i].paddr, items[i].len, n_items);
+ ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL, "item data: ",
+ items[i].vaddr, items[i].len);

- return ret;
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ CE_SEND_FLAG_GATHER);
+ if (err)
+ goto unlock;
+ }
+
+ /* `i` is equal to `n_items -1` after for() */
+
+ ath10k_dbg(ATH10K_DBG_PCI,
+ "pci tx item %d paddr 0x%08x len %d n_items %d\n",
+ i, items[i].paddr, items[i].len, n_items);
+ ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL, "item data: ",
+ items[i].vaddr, items[i].len);
+
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ 0);
+ if (err)
+ goto unlock;
+
+ err = 0;
+unlock:
+ spin_unlock_bh(&ar_pci->ce_lock);
+ return err;
}

static u16 ath10k_pci_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
@@ -2249,7 +2277,7 @@ static int ath10k_pci_hif_resume(struct ath10k *ar)
#endif

static const struct ath10k_hif_ops ath10k_pci_hif_ops = {
- .send_head = ath10k_pci_hif_send_head,
+ .tx_sg = ath10k_pci_hif_tx_sg,
.exchange_bmi_msg = ath10k_pci_hif_exchange_bmi_msg,
.start = ath10k_pci_hif_start,
.stop = ath10k_pci_hif_stop,
--
1.8.5.3


2014-02-28 09:54:33

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] ath10k: bypass htc for htt tx path

On 28 February 2014 10:28, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> On 28 February 2014 10:06, Kalle Valo <[email protected]> wrote:
>>> Michal Kazior <[email protected]> writes:
>>>
>>>> --- a/drivers/net/wireless/ath/ath10k/htc.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/htc.c
>>>> @@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
>>>> struct ath10k_htc *htc = &ar->htc;
>>>> struct ath10k_htc_ep *ep = &htc->endpoint[eid];
>>>>
>>>> - if (!skb) {
>>>> - ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
>>>> + if (WARN_ON(!skb))
>>>> return 0;
>>>> - }
>>>
>>> WARN_ON() is a bit dangerous here as it might cause excessive spamming.
>>> Why did you want to change this? I think either ath10k_warn() or
>>> WARN_ON_ONCE() would be safer, but not sure which one to use.
>>

[...]

>> Perhaps WARN_ON() is a bit over the top here, but since this is now
>> more of a logic issue rather than runtime issue I decided to change it
>> from ath10k_warn to WARN_ON(). It's probably still a good idea to make
>> it _ONCE generally, although if you actually get skbuff it's already
>> too late or it should be screaming loudly because someone must've
>> changed the code in an incorrect/incomplete way.
>
> So I change it to WARN_ON_ONCE(), ok?

Sure. Thanks!


Michał

2014-02-20 06:43:03

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

On 19 February 2014 15:18, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> On 19 February 2014 13:48, Kalle Valo <[email protected]> wrote:
>>> Michal Kazior <[email protected]> writes:
>>>
>>>> +struct ath10k_hif_sg_item {
>>>> + u16 transfer_id;
>>>> + void *transfer_context;
>>>> + void *vaddr; /* for debugging mostly */
>>>> + u32 paddr;
>>>> + u16 len;
>>>> +};
>>>
>>> This is the part I don't like. Instead of adding our own structs we
>>> instead should have everything in skb->cb and pass the skbs around. The
>>> sad part was that last fall I was working on cleaning up that but never
>>> found the time to finish it :(
>>
>> There's simply not enough room to keep it all in ath10k_skb_cb
>> directly.
>
> After my cleanups transfer_context was not used and when using sk_buffs
> properly vaddr and len would be useless. So we would have only transfer
> id and paddr left. And when I was working on this they did fit to cb.

You'd need `len` so you can use part of the original MSDU as a frame
prefetch for htt tx descriptor.


>> It doesn't really make any sense to keep it there anyway because
>> sg_item is used as means to pass a complex function argument to
>> sg_tx().
>
> If we used skbs we would just give a list/queue of them and no need to
> have any extra structs.

Managing skbs would incur extra overhead. Skbs are also less flexible
and probably unnecessary (at least at this point).

Think about the qos workaround removal - you have to submit htt tx
descriptor frame prefetch in 2 parts instead of 1 (to make it appear
to FW as there's no QoS field). With skbs you have to allocate another
skb, copy part of MSDU into it... aaand you've just traded a single
memmove() for dev_alloc_skb(), skb_put() and a memcpy().


Micha?

2014-02-17 09:38:00

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 5/7] ath10k: batch htt tx/rx completions

HTT Rx endpoint processes both frame rx
indications and frame tx completion indications.

Those completions typically come in batches and
may be mixed so it makes sense to defer processing
hoping to get a bunch of them and take advantage
of hot caches.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/htt.h | 8 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 138 +++++++++++++++++++++----------
drivers/net/wireless/ath/ath10k/htt_tx.c | 5 +-
drivers/net/wireless/ath/ath10k/txrx.c | 4 +-
4 files changed, 107 insertions(+), 48 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 782a449..c36151c 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1282,6 +1282,12 @@ struct ath10k_htt {
* used to avoid further failures */
bool rx_confused;
struct tasklet_struct rx_replenish_task;
+
+ /* This is used to group tx/rx completions separately and process them
+ * in batches to reduce cache stalls */
+ struct tasklet_struct txrx_compl_task;
+ struct sk_buff_head tx_compl_q;
+ struct sk_buff_head rx_compl_q;
};

#define RX_HTT_HDR_STATUS_LEN 64
@@ -1353,4 +1359,6 @@ int ath10k_htt_tx_alloc_msdu_id(struct ath10k_htt *htt);
void ath10k_htt_tx_free_msdu_id(struct ath10k_htt *htt, u16 msdu_id);
int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *);
int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *);
+void ath10k_htt_txrx_compl_task(unsigned long ptr);
+
#endif
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 371bed6..d3051e3 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -237,6 +237,10 @@ void ath10k_htt_rx_detach(struct ath10k_htt *htt)

del_timer_sync(&htt->rx_ring.refill_retry_timer);
tasklet_kill(&htt->rx_replenish_task);
+ tasklet_kill(&htt->txrx_compl_task);
+
+ skb_queue_purge(&htt->tx_compl_q);
+ skb_queue_purge(&htt->rx_compl_q);

while (sw_rd_idx != __le32_to_cpu(*(htt->rx_ring.alloc_idx.vaddr))) {
struct sk_buff *skb =
@@ -270,7 +274,7 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
int idx;
struct sk_buff *msdu;

- spin_lock_bh(&htt->rx_ring.lock);
+ lockdep_assert_held(&htt->rx_ring.lock);

if (ath10k_htt_rx_ring_elems(htt) == 0)
ath10k_warn("htt rx ring is empty!\n");
@@ -283,7 +287,6 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
htt->rx_ring.sw_rd_idx.msdu_payld = idx;
htt->rx_ring.fill_cnt--;

- spin_unlock_bh(&htt->rx_ring.lock);
return msdu;
}

@@ -307,6 +310,8 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
struct sk_buff *msdu;
struct htt_rx_desc *rx_desc;

+ lockdep_assert_held(&htt->rx_ring.lock);
+
if (ath10k_htt_rx_ring_elems(htt) == 0)
ath10k_warn("htt rx ring is empty!\n");

@@ -529,6 +534,12 @@ int ath10k_htt_rx_attach(struct ath10k_htt *htt)
tasklet_init(&htt->rx_replenish_task, ath10k_htt_rx_replenish_task,
(unsigned long)htt);

+ skb_queue_head_init(&htt->tx_compl_q);
+ skb_queue_head_init(&htt->rx_compl_q);
+
+ tasklet_init(&htt->txrx_compl_task, ath10k_htt_txrx_compl_task,
+ (unsigned long)htt);
+
ath10k_dbg(ATH10K_DBG_BOOT, "htt rx ring size %d fill_level %d\n",
htt->rx_ring.size, htt->rx_ring.fill_level);
return 0;
@@ -888,6 +899,8 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
u8 *fw_desc;
int i, j;

+ lockdep_assert_held(&htt->rx_ring.lock);
+
memset(&info, 0, sizeof(info));

fw_desc_len = __le16_to_cpu(rx->prefix.fw_rx_desc_bytes);
@@ -1024,8 +1037,11 @@ static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,

msdu_head = NULL;
msdu_tail = NULL;
+
+ spin_lock_bh(&htt->rx_ring.lock);
msdu_chaining = ath10k_htt_rx_amsdu_pop(htt, &fw_desc, &fw_desc_len,
&msdu_head, &msdu_tail);
+ spin_unlock_bh(&htt->rx_ring.lock);

ath10k_dbg(ATH10K_DBG_HTT_DUMP, "htt rx frag ahead\n");

@@ -1117,6 +1133,45 @@ end:
}
}

+static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
+ struct sk_buff *skb)
+{
+ struct ath10k_htt *htt = &ar->htt;
+ struct htt_resp *resp = (struct htt_resp *)skb->data;
+ struct htt_tx_done tx_done = {};
+ int status = MS(resp->data_tx_completion.flags, HTT_DATA_TX_STATUS);
+ __le16 msdu_id;
+ int i;
+
+ lockdep_assert_held(&htt->tx_lock);
+
+ switch (status) {
+ case HTT_DATA_TX_STATUS_NO_ACK:
+ tx_done.no_ack = true;
+ break;
+ case HTT_DATA_TX_STATUS_OK:
+ break;
+ case HTT_DATA_TX_STATUS_DISCARD:
+ case HTT_DATA_TX_STATUS_POSTPONE:
+ case HTT_DATA_TX_STATUS_DOWNLOAD_FAIL:
+ tx_done.discard = true;
+ break;
+ default:
+ ath10k_warn("unhandled tx completion status %d\n", status);
+ tx_done.discard = true;
+ break;
+ }
+
+ ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion num_msdus %d\n",
+ resp->data_tx_completion.num_msdus);
+
+ for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
+ msdu_id = resp->data_tx_completion.msdus[i];
+ tx_done.msdu_id = __le16_to_cpu(msdu_id);
+ ath10k_txrx_tx_unref(htt, &tx_done);
+ }
+}
+
void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
{
struct ath10k_htt *htt = &ar->htt;
@@ -1135,10 +1190,12 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
complete(&htt->target_version_received);
break;
}
- case HTT_T2H_MSG_TYPE_RX_IND: {
- ath10k_htt_rx_handler(htt, &resp->rx_ind);
- break;
- }
+ case HTT_T2H_MSG_TYPE_RX_IND:
+ spin_lock_bh(&htt->rx_ring.lock);
+ __skb_queue_tail(&htt->rx_compl_q, skb);
+ spin_unlock_bh(&htt->rx_ring.lock);
+ tasklet_schedule(&htt->txrx_compl_task);
+ return;
case HTT_T2H_MSG_TYPE_PEER_MAP: {
struct htt_peer_map_event ev = {
.vdev_id = resp->peer_map.vdev_id,
@@ -1156,11 +1213,11 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
break;
}
case HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION: {
+ struct htt_resp *resp = (struct htt_resp *)skb->data;
struct htt_tx_done tx_done = {};
int status = __le32_to_cpu(resp->mgmt_tx_completion.status);

- tx_done.msdu_id =
- __le32_to_cpu(resp->mgmt_tx_completion.desc_id);
+ tx_done.msdu_id = __le32_to_cpu(resp->mgmt_tx_completion.desc_id);

switch (status) {
case HTT_MGMT_TX_STATUS_OK:
@@ -1173,44 +1230,17 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
break;
}

+ spin_lock_bh(&htt->tx_lock);
ath10k_txrx_tx_unref(htt, &tx_done);
+ spin_unlock_bh(&htt->tx_lock);
break;
}
- case HTT_T2H_MSG_TYPE_TX_COMPL_IND: {
- struct htt_tx_done tx_done = {};
- int status = MS(resp->data_tx_completion.flags,
- HTT_DATA_TX_STATUS);
- __le16 msdu_id;
- int i;
-
- switch (status) {
- case HTT_DATA_TX_STATUS_NO_ACK:
- tx_done.no_ack = true;
- break;
- case HTT_DATA_TX_STATUS_OK:
- break;
- case HTT_DATA_TX_STATUS_DISCARD:
- case HTT_DATA_TX_STATUS_POSTPONE:
- case HTT_DATA_TX_STATUS_DOWNLOAD_FAIL:
- tx_done.discard = true;
- break;
- default:
- ath10k_warn("unhandled tx completion status %d\n",
- status);
- tx_done.discard = true;
- break;
- }
-
- ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion num_msdus %d\n",
- resp->data_tx_completion.num_msdus);
-
- for (i = 0; i < resp->data_tx_completion.num_msdus; i++) {
- msdu_id = resp->data_tx_completion.msdus[i];
- tx_done.msdu_id = __le16_to_cpu(msdu_id);
- ath10k_txrx_tx_unref(htt, &tx_done);
- }
- break;
- }
+ case HTT_T2H_MSG_TYPE_TX_COMPL_IND:
+ spin_lock_bh(&htt->tx_lock);
+ __skb_queue_tail(&htt->tx_compl_q, skb);
+ spin_unlock_bh(&htt->tx_lock);
+ tasklet_schedule(&htt->txrx_compl_task);
+ return;
case HTT_T2H_MSG_TYPE_SEC_IND: {
struct ath10k *ar = htt->ar;
struct htt_security_indication *ev = &resp->security_indication;
@@ -1250,3 +1280,25 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
/* Free the indication buffer */
dev_kfree_skb_any(skb);
}
+
+void ath10k_htt_txrx_compl_task(unsigned long ptr)
+{
+ struct ath10k_htt *htt = (struct ath10k_htt *)ptr;
+ struct htt_resp *resp;
+ struct sk_buff *skb;
+
+ spin_lock_bh(&htt->tx_lock);
+ while ((skb = __skb_dequeue(&htt->tx_compl_q))) {
+ ath10k_htt_rx_frm_tx_compl(htt->ar, skb);
+ dev_kfree_skb_any(skb);
+ }
+ spin_unlock_bh(&htt->tx_lock);
+
+ spin_lock_bh(&htt->rx_ring.lock);
+ while ((skb = __skb_dequeue(&htt->rx_compl_q))) {
+ resp = (struct htt_resp *)skb->data;
+ ath10k_htt_rx_handler(htt, &resp->rx_ind);
+ dev_kfree_skb_any(skb);
+ }
+ spin_unlock_bh(&htt->rx_ring.lock);
+}
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 0e24f51..2d10ab1 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -125,9 +125,7 @@ static void ath10k_htt_tx_cleanup_pending(struct ath10k_htt *htt)
struct htt_tx_done tx_done = {0};
int msdu_id;

- /* No locks needed. Called after communication with the device has
- * been stopped. */
-
+ spin_lock_bh(&htt->tx_lock);
for (msdu_id = 0; msdu_id < htt->max_num_pending_tx; msdu_id++) {
if (!test_bit(msdu_id, htt->used_msdu_ids))
continue;
@@ -140,6 +138,7 @@ static void ath10k_htt_tx_cleanup_pending(struct ath10k_htt *htt)

ath10k_txrx_tx_unref(htt, &tx_done);
}
+ spin_unlock_bh(&htt->tx_lock);
}

void ath10k_htt_tx_detach(struct ath10k_htt *htt)
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 0158ce0..aecc46c 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -52,6 +52,8 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
struct ath10k_skb_cb *skb_cb;
struct sk_buff *msdu;

+ lockdep_assert_held(&htt->tx_lock);
+
ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion msdu_id %u discard %d no_ack %d\n",
tx_done->msdu_id, !!tx_done->discard, !!tx_done->no_ack);

@@ -91,13 +93,11 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
/* we do not own the msdu anymore */

exit:
- spin_lock_bh(&htt->tx_lock);
htt->pending_tx[tx_done->msdu_id] = NULL;
ath10k_htt_tx_free_msdu_id(htt, tx_done->msdu_id);
__ath10k_htt_tx_dec_pending(htt);
if (htt->num_pending_tx == 0)
wake_up(&htt->empty_tx_wq);
- spin_unlock_bh(&htt->tx_lock);
}

static const u8 rx_legacy_rate_idx[] = {
--
1.8.5.3


2014-02-27 07:25:04

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 4/8] ath10k: bypass htc for htt tx path

Going through full htc tx path for htt tx is a
waste of resources. By skipping it it's possible
to easily submit scatter-gather to the pci hif for
reduced host cpu load and improved performance.

The new approach uses dma pool to store the
following metadata for each tx request:
* msdu fragment list
* htc header
* htt tx command

The htt tx command contains a msdu prefetch.
Instead of copying it original mapped msdu address
is used to submit a second scatter-gather item to
hif to make a complete htt tx command.

The htt tx command itself hands over dma mapped
pointers to msdus and completion of the command
itself doesn't mean the frame has been sent and
can be unmapped/freed. This is why htc tx
completion is skipped for htt tx as all tx related
resources are freed upon htt tx completion
indication event (which also implicitly means htt
tx command itself was completed).

Since now each htt tx request effectively consists
of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
is updated to allow maximum of
TARGET_10X_NUM_MSDU_DESC msdus being queued. This
keeps the tx path resource management simple.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* improve commit log
* improve comment in code
* fix sparse/checkpatch/buildbot warnings

drivers/net/wireless/ath/ath10k/ce.c | 4 +-
drivers/net/wireless/ath/ath10k/ce.h | 2 +-
drivers/net/wireless/ath/ath10k/core.h | 5 +-
drivers/net/wireless/ath/ath10k/htc.c | 4 +-
drivers/net/wireless/ath/ath10k/htt.h | 9 ++
drivers/net/wireless/ath/ath10k/htt_tx.c | 190 ++++++++++++++++---------------
drivers/net/wireless/ath/ath10k/pci.c | 12 +-
drivers/net/wireless/ath/ath10k/txrx.c | 6 +-
8 files changed, 123 insertions(+), 109 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index a0b1a8c..a79499c 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -1067,9 +1067,9 @@ struct ath10k_ce_pipe *ath10k_ce_init(struct ath10k *ar,
*
* For the lack of a better place do the check here.
*/
- BUILD_BUG_ON(TARGET_NUM_MSDU_DESC >
+ BUILD_BUG_ON(2*TARGET_NUM_MSDU_DESC >
(CE_HTT_H2T_MSG_SRC_NENTRIES - 1));
- BUILD_BUG_ON(TARGET_10X_NUM_MSDU_DESC >
+ BUILD_BUG_ON(2*TARGET_10X_NUM_MSDU_DESC >
(CE_HTT_H2T_MSG_SRC_NENTRIES - 1));

ret = ath10k_pci_wake(ar);
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 322e929..8eb7f99 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -23,7 +23,7 @@

/* Maximum number of Copy Engine's supported */
#define CE_COUNT_MAX 8
-#define CE_HTT_H2T_MSG_SRC_NENTRIES 2048
+#define CE_HTT_H2T_MSG_SRC_NENTRIES 4096

/* Descriptor rings must be aligned to this boundary */
#define CE_DESC_RING_ALIGN 8
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 7cf022d..f34019f 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -67,9 +67,8 @@ struct ath10k_skb_cb {
struct {
u8 tid;
bool is_offchan;
-
- u8 frag_len;
- u8 pad_len;
+ struct ath10k_htt_txbuf *txbuf;
+ u32 txbuf_paddr;
} __packed htt;

struct {
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 64ab8d6..1491ce7 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
struct ath10k_htc *htc = &ar->htc;
struct ath10k_htc_ep *ep = &htc->endpoint[eid];

- if (!skb) {
- ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
+ if (WARN_ON(!skb))
return 0;
- }

ath10k_htc_notify_tx_completion(ep, skb);
/* the skb now belongs to the completion handler */
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 02c009d..2b76cb5 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -20,6 +20,7 @@

#include <linux/bug.h>
#include <linux/interrupt.h>
+#include <linux/dmapool.h>

#include "htc.h"
#include "rx_desc.h"
@@ -1188,6 +1189,13 @@ struct htt_rx_info {
bool mic_err;
};

+struct ath10k_htt_txbuf {
+ struct htt_data_tx_desc_frag frags[2];
+ struct ath10k_htc_hdr htc_hdr;
+ struct htt_cmd_hdr cmd_hdr;
+ struct htt_data_tx_desc cmd_tx;
+} __packed;
+
struct ath10k_htt {
struct ath10k *ar;
enum ath10k_htc_ep_id eid;
@@ -1269,6 +1277,7 @@ struct ath10k_htt {
struct sk_buff **pending_tx;
unsigned long *used_msdu_ids; /* bitmap */
wait_queue_head_t empty_tx_wq;
+ struct dma_pool *tx_pool;

/* set if host-fw communication goes haywire
* used to avoid further failures */
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index f5960c5..20b7a44 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -109,6 +109,14 @@ int ath10k_htt_tx_attach(struct ath10k_htt *htt)
return -ENOMEM;
}

+ htt->tx_pool = dma_pool_create("ath10k htt tx pool", htt->ar->dev,
+ sizeof(struct ath10k_htt_txbuf), 4, 0);
+ if (!htt->tx_pool) {
+ kfree(htt->used_msdu_ids);
+ kfree(htt->pending_tx);
+ return -ENOMEM;
+ }
+
return 0;
}

@@ -139,6 +147,7 @@ void ath10k_htt_tx_detach(struct ath10k_htt *htt)
ath10k_htt_tx_cleanup_pending(htt);
kfree(htt->pending_tx);
kfree(htt->used_msdu_ids);
+ dma_pool_destroy(htt->tx_pool);
return;
}

@@ -350,8 +359,7 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
memcpy(cmd->mgmt_tx.hdr, msdu->data,
min_t(int, msdu->len, HTT_MGMT_FRM_HDR_DOWNLOAD_LEN));

- skb_cb->htt.frag_len = 0;
- skb_cb->htt.pad_len = 0;
+ skb_cb->htt.txbuf = NULL;

res = ath10k_htc_send(&htt->ar->htc, htt->eid, txdesc);
if (res)
@@ -377,19 +385,19 @@ err:
int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
{
struct device *dev = htt->ar->dev;
- struct htt_cmd *cmd;
- struct htt_data_tx_desc_frag *tx_frags;
struct ieee80211_hdr *hdr = (struct ieee80211_hdr *)msdu->data;
struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(msdu);
- struct sk_buff *txdesc = NULL;
- bool use_frags;
- u8 vdev_id = ATH10K_SKB_CB(msdu)->vdev_id;
- u8 tid;
- int prefetch_len, desc_len;
- int msdu_id = -1;
+ struct ath10k_hif_sg_item sg_items[2];
+ struct htt_data_tx_desc_frag *frags;
+ u8 vdev_id = skb_cb->vdev_id;
+ u8 tid = skb_cb->htt.tid;
+ int prefetch_len;
int res;
- u8 flags0;
- u16 flags1;
+ u8 flags0 = 0;
+ u16 msdu_id, flags1 = 0;
+ dma_addr_t paddr;
+ u32 frags_paddr;
+ bool use_frags;

res = ath10k_htt_tx_inc_pending(htt);
if (res)
@@ -408,105 +416,109 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
prefetch_len = min(htt->prefetch_len, msdu->len);
prefetch_len = roundup(prefetch_len, 4);

- desc_len = sizeof(cmd->hdr) + sizeof(cmd->data_tx) + prefetch_len;
-
- txdesc = ath10k_htc_alloc_skb(desc_len);
- if (!txdesc) {
- res = -ENOMEM;
- goto err_free_msdu_id;
- }
-
/* Since HTT 3.0 there is no separate mgmt tx command. However in case
* of mgmt tx using TX_FRM there is not tx fragment list. Instead of tx
* fragment list host driver specifies directly frame pointer. */
use_frags = htt->target_version_major < 3 ||
!ieee80211_is_mgmt(hdr->frame_control);

- if (!IS_ALIGNED((unsigned long)txdesc->data, 4)) {
- ath10k_warn("htt alignment check failed. dropping packet.\n");
- res = -EIO;
- goto err_free_txdesc;
- }
-
- if (use_frags) {
- skb_cb->htt.frag_len = sizeof(*tx_frags) * 2;
- skb_cb->htt.pad_len = (unsigned long)msdu->data -
- round_down((unsigned long)msdu->data, 4);
-
- skb_push(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
- } else {
- skb_cb->htt.frag_len = 0;
- skb_cb->htt.pad_len = 0;
- }
+ skb_cb->htt.txbuf = dma_pool_alloc(htt->tx_pool, GFP_ATOMIC,
+ &paddr);
+ if (!skb_cb->htt.txbuf)
+ goto err_free_msdu_id;
+ skb_cb->htt.txbuf_paddr = paddr;

skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
DMA_TO_DEVICE);
res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
- goto err_pull_txfrag;
-
- if (use_frags) {
- dma_sync_single_for_cpu(dev, skb_cb->paddr, msdu->len,
- DMA_TO_DEVICE);
-
- /* tx fragment list must be terminated with zero-entry */
- tx_frags = (struct htt_data_tx_desc_frag *)msdu->data;
- tx_frags[0].paddr = __cpu_to_le32(skb_cb->paddr +
- skb_cb->htt.frag_len +
- skb_cb->htt.pad_len);
- tx_frags[0].len = __cpu_to_le32(msdu->len -
- skb_cb->htt.frag_len -
- skb_cb->htt.pad_len);
- tx_frags[1].paddr = __cpu_to_le32(0);
- tx_frags[1].len = __cpu_to_le32(0);
-
- dma_sync_single_for_device(dev, skb_cb->paddr, msdu->len,
- DMA_TO_DEVICE);
- }
+ goto err_free_txbuf;

- ath10k_dbg(ATH10K_DBG_HTT, "tx-msdu 0x%llx\n",
- (unsigned long long) ATH10K_SKB_CB(msdu)->paddr);
- ath10k_dbg_dump(ATH10K_DBG_HTT_DUMP, NULL, "tx-msdu: ",
- msdu->data, msdu->len);
+ if (likely(use_frags)) {
+ frags = skb_cb->htt.txbuf->frags;

- skb_put(txdesc, desc_len);
- cmd = (struct htt_cmd *)txdesc->data;
+ frags[0].paddr = __cpu_to_le32(skb_cb->paddr);
+ frags[0].len = __cpu_to_le32(msdu->len);
+ frags[1].paddr = 0;
+ frags[1].len = 0;
+
+ flags0 |= SM(ATH10K_HW_TXRX_NATIVE_WIFI,
+ HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);

- tid = ATH10K_SKB_CB(msdu)->htt.tid;
+ frags_paddr = skb_cb->htt.txbuf_paddr;
+ } else {
+ flags0 |= SM(ATH10K_HW_TXRX_MGMT,
+ HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);

- ath10k_dbg(ATH10K_DBG_HTT, "htt data tx using tid %hhu\n", tid);
+ frags_paddr = skb_cb->paddr;
+ }
+
+ /* Normally all commands go through HTC which manages tx credits for
+ * each endpoint and notifies when tx is completed.
+ *
+ * HTT endpoint is creditless so there's no need to care about HTC
+ * flags. In that case it is trivial to fill the HTC header here.
+ *
+ * MSDU transmission is considered completed upon HTT event. This
+ * implies no relevant resources can be freed until after the event is
+ * received. That's why HTC tx completion handler itself is ignored by
+ * setting NULL to transfer_context for all sg items.
+ *
+ * There is simply no point in pushing HTT TX_FRM through HTC tx path
+ * as it's a waste of resources. By bypassing HTC it is possible to
+ * avoid extra memory allocations, compress data structures and thus
+ * improve performance. */
+
+ skb_cb->htt.txbuf->htc_hdr.eid = htt->eid;
+ skb_cb->htt.txbuf->htc_hdr.len = __cpu_to_le16(
+ sizeof(skb_cb->htt.txbuf->cmd_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_tx) +
+ prefetch_len);
+ skb_cb->htt.txbuf->htc_hdr.flags = 0;

- flags0 = 0;
if (!ieee80211_has_protected(hdr->frame_control))
flags0 |= HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT;
- flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;

- if (use_frags)
- flags0 |= SM(ATH10K_HW_TXRX_NATIVE_WIFI,
- HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);
- else
- flags0 |= SM(ATH10K_HW_TXRX_MGMT,
- HTT_DATA_TX_DESC_FLAGS0_PKT_TYPE);
+ flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;

- flags1 = 0;
flags1 |= SM((u16)vdev_id, HTT_DATA_TX_DESC_FLAGS1_VDEV_ID);
flags1 |= SM((u16)tid, HTT_DATA_TX_DESC_FLAGS1_EXT_TID);
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L3_OFFLOAD;
flags1 |= HTT_DATA_TX_DESC_FLAGS1_CKSUM_L4_OFFLOAD;

- cmd->hdr.msg_type = HTT_H2T_MSG_TYPE_TX_FRM;
- cmd->data_tx.flags0 = flags0;
- cmd->data_tx.flags1 = __cpu_to_le16(flags1);
- cmd->data_tx.len = __cpu_to_le16(msdu->len -
- skb_cb->htt.frag_len -
- skb_cb->htt.pad_len);
- cmd->data_tx.id = __cpu_to_le16(msdu_id);
- cmd->data_tx.frags_paddr = __cpu_to_le32(skb_cb->paddr);
- cmd->data_tx.peerid = __cpu_to_le32(HTT_INVALID_PEERID);
-
- memcpy(cmd->data_tx.prefetch, hdr, prefetch_len);
+ skb_cb->htt.txbuf->cmd_hdr.msg_type = HTT_H2T_MSG_TYPE_TX_FRM;
+ skb_cb->htt.txbuf->cmd_tx.flags0 = flags0;
+ skb_cb->htt.txbuf->cmd_tx.flags1 = __cpu_to_le16(flags1);
+ skb_cb->htt.txbuf->cmd_tx.len = __cpu_to_le16(msdu->len);
+ skb_cb->htt.txbuf->cmd_tx.id = __cpu_to_le16(msdu_id);
+ skb_cb->htt.txbuf->cmd_tx.frags_paddr = __cpu_to_le32(frags_paddr);
+ skb_cb->htt.txbuf->cmd_tx.peerid = __cpu_to_le32(HTT_INVALID_PEERID);
+
+ ath10k_dbg(ATH10K_DBG_HTT,
+ "htt tx flags0 %hhu flags1 %hu len %d id %hu frags_paddr %08x, msdu_paddr %08x vdev %hhu tid %hhu\n",
+ flags0, flags1, msdu->len, msdu_id, frags_paddr,
+ (u32)skb_cb->paddr, vdev_id, tid);
+ ath10k_dbg_dump(ATH10K_DBG_HTT_DUMP, NULL, "htt tx msdu: ",
+ msdu->data, msdu->len);

- res = ath10k_htc_send(&htt->ar->htc, htt->eid, txdesc);
+ sg_items[0].transfer_id = 0;
+ sg_items[0].transfer_context = NULL;
+ sg_items[0].vaddr = &skb_cb->htt.txbuf->htc_hdr;
+ sg_items[0].paddr = skb_cb->htt.txbuf_paddr +
+ sizeof(skb_cb->htt.txbuf->frags);
+ sg_items[0].len = sizeof(skb_cb->htt.txbuf->htc_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_hdr) +
+ sizeof(skb_cb->htt.txbuf->cmd_tx);
+
+ sg_items[1].transfer_id = 0;
+ sg_items[1].transfer_context = NULL;
+ sg_items[1].vaddr = msdu->data;
+ sg_items[1].paddr = skb_cb->paddr;
+ sg_items[1].len = prefetch_len;
+
+ res = ath10k_hif_tx_sg(htt->ar,
+ htt->ar->htc.endpoint[htt->eid].ul_pipe_id,
+ sg_items, ARRAY_SIZE(sg_items));
if (res)
goto err_unmap_msdu;

@@ -514,10 +526,10 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)

err_unmap_msdu:
dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
-err_pull_txfrag:
- skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
-err_free_txdesc:
- dev_kfree_skb_any(txdesc);
+err_free_txbuf:
+ dma_pool_free(htt->tx_pool,
+ skb_cb->htt.txbuf,
+ skb_cb->htt.txbuf_paddr);
err_free_msdu_id:
spin_lock_bh(&htt->tx_lock);
htt->pending_tx[msdu_id] = NULL;
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 713c18e..2305d58 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -714,6 +714,7 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
while (ath10k_ce_completed_send_next(ce_state, &transfer_context,
&ce_data, &nbytes,
&transfer_id) == 0) {
+ /* no need to call tx completion for NULL pointers */
if (transfer_context == NULL)
continue;

@@ -1423,16 +1424,9 @@ static void ath10k_pci_tx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info)

while (ath10k_ce_cancel_send_next(ce_hdl, (void **)&netbuf,
&ce_data, &nbytes, &id) == 0) {
- /*
- * Indicate the completion to higer layer to free
- * the buffer
- */
-
- if (!netbuf) {
- ath10k_warn("invalid sk_buff on CE %d - NULL pointer. firmware crashed?\n",
- ce_hdl->id);
+ /* no need to call tx completion for NULL pointers */
+ if (!netbuf)
continue;
- }

ar_pci->msg_callbacks_current.tx_completion(ar,
netbuf,
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index fe69899..2993ca7 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -66,8 +66,10 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,

dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

- if (skb_cb->htt.frag_len)
- skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
+ if (skb_cb->htt.txbuf)
+ dma_pool_free(htt->tx_pool,
+ skb_cb->htt.txbuf,
+ skb_cb->htt.txbuf_paddr);

ath10k_report_offchan_tx(htt->ar, msdu);

--
1.8.5.3


2014-02-19 12:38:03

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 1/7] ath10k: remove DMA mapping wrappers

Michal Kazior <[email protected]> writes:

> There's no real benefit from using them. DMA-API
> already provides debugging. Some skbuffs are
> already mapped directly with DMA-API since wrapper
> arguments were insufficient and extending them
> would be pointless.
>
> Signed-off-by: Michal Kazior <[email protected]>

[...]

> --- a/drivers/net/wireless/ath/ath10k/wmi.c
> +++ b/drivers/net/wireless/ath/ath10k/wmi.c
> @@ -1360,7 +1360,7 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
> struct wmi_bcn_info *bcn_info;
> struct ath10k_vif *arvif;
> struct sk_buff *bcn;
> - int vdev_id = 0;
> + int err, vdev_id = 0;

We use ret, res and err as names for the status variable. We should try
to unify this and use ret everywhere.

--
Kalle Valo

2014-02-17 09:37:55

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 2/7] ath10k: remove is_aborted from skb_cb

The flag wasn't used anymore. No need to keep it.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 1 -
drivers/net/wireless/ath/ath10k/pci.c | 20 --------------------
2 files changed, 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 082fa77..7cf022d 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_aborted;
u8 vdev_id;

struct {
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 34f0910..d973975 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -63,7 +63,6 @@ static int ath10k_pci_post_rx(struct ath10k *ar);
static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
int num);
static void ath10k_pci_rx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info);
-static void ath10k_pci_stop_ce(struct ath10k *ar);
static int ath10k_pci_cold_reset(struct ath10k *ar);
static int ath10k_pci_warm_reset(struct ath10k *ar);
static int ath10k_pci_wait_for_target_init(struct ath10k *ar);
@@ -993,22 +992,6 @@ static void ath10k_pci_kill_tasklet(struct ath10k *ar)
tasklet_kill(&ar_pci->pipe_info[i].intr);
}

-static void ath10k_pci_stop_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_compl *compl;
- struct sk_buff *skb;
-
- /* Mark pending completions as aborted, so that upper layers free up
- * their associated resources */
- spin_lock_bh(&ar_pci->compl_lock);
- list_for_each_entry(compl, &ar_pci->compl_process, list) {
- skb = compl->skb;
- ATH10K_SKB_CB(skb)->is_aborted = true;
- }
- spin_unlock_bh(&ar_pci->compl_lock);
-}
-
static void ath10k_pci_cleanup_ce(struct ath10k *ar)
{
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -1339,7 +1322,6 @@ err_stop:
ath10k_ce_disable_interrupts(ar);
ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);
ath10k_pci_process_ce(ar);
err_free_compl:
ath10k_pci_cleanup_ce(ar);
@@ -1424,7 +1406,6 @@ static void ath10k_pci_tx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info)
continue;
}

- ATH10K_SKB_CB(netbuf)->is_aborted = true;
ar_pci->msg_callbacks_current.tx_completion(ar,
netbuf,
id);
@@ -1482,7 +1463,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)

ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);

ret = ath10k_pci_request_early_irq(ar);
if (ret)
--
1.8.5.3


2014-02-26 12:09:38

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH v2 0/8] ath10k: performance improvements

On 26 February 2014 12:34, Michal Kazior <[email protected]> wrote:
> Hi,
>
> These patches aim at reducing host CPU load and
> thus improve performance on low-end systems.
>
> In my setup I get a relative improvement of
> 100mbps for both UDP Tx and Rx.
>
> Tx ----->
> laptop:eth---eth:AP135:ath10k---ath10k:laptop
> <----- Rx
>
>
> v2:
> * improve comments/commit messages
> * fix 1 unbalanced locking
> * fix var naming (s/err/ret)
> * remove code that didn't belong here
> * split batch tx/rx into 2 patches
> (+ath10k: reduce htt tx/rx spinlock overhead)

My bloody mail server said it's too busy and dropped some of the
patches.. I'll try to re-send it as v3 later.


Michał

2014-02-28 10:13:56

by Kalle Valo

[permalink] [raw]
Subject: Re: [PATCH v3 0/8] ath10k: performance improvements

Michal Kazior <[email protected]> writes:

> These patches aim at reducing host CPU load and
> thus improve performance on low-end systems.
>
> In my setup I get a relative improvement of
> 100mbps for both UDP Tx and Rx.
>
> Tx ----->
> laptop:eth---eth:AP135:ath10k---ath10k:laptop
> <----- Rx
>
>
> v2:
> * improve comments/commit messages
> * fix unbalanced locking
> * fix var naming (s/err/ret)
> * fix sparse/checkpatch/buildbot warnings
> * remove code that didn't belong here
> * split batch tx/rx into 2 patches
> (+ath10k: reduce htt tx/rx spinlock overhead)
>
> v3:
> * just a resend (due to my mail-server issues the
> other day not all of the v2 patches had been
> sent out)
>
> Michal Kazior (8):
> ath10k: remove DMA mapping wrappers
> ath10k: remove is_aborted from skb_cb
> ath10k: replace send_head() with tx_sg()
> ath10k: bypass htc for htt tx path
> ath10k: batch htt tx/rx completions
> ath10k: reduce htt tx/rx spinlock overhead
> ath10k: remove pci completion list
> ath10k: minimize coherent dma accesses

Thanks, all eight patches applied.

--
Kalle Valo

2014-02-26 11:40:00

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 1/8] ath10k: remove DMA mapping wrappers

There's no real benefit from using them. DMA-API
already provides debugging. Some skbuffs are
already mapped directly with DMA-API since wrapper
arguments were insufficient and extending them
would be pointless.

Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* var naming fix: s/ret/err/

drivers/net/wireless/ath/ath10k/core.h | 27 ---------------------------
drivers/net/wireless/ath/ath10k/htc.c | 11 ++++++++---
drivers/net/wireless/ath/ath10k/htt_tx.c | 12 ++++++++----
drivers/net/wireless/ath/ath10k/mac.c | 4 +++-
drivers/net/wireless/ath/ath10k/txrx.c | 5 +----
drivers/net/wireless/ath/ath10k/wmi.c | 17 ++++++++++++++---
6 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 1fc26fe..082fa77 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_mapped;
bool is_aborted;
u8 vdev_id;

@@ -87,32 +86,6 @@ static inline struct ath10k_skb_cb *ATH10K_SKB_CB(struct sk_buff *skb)
return (struct ath10k_skb_cb *)&IEEE80211_SKB_CB(skb)->driver_data;
}

-static inline int ath10k_skb_map(struct device *dev, struct sk_buff *skb)
-{
- if (ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- ATH10K_SKB_CB(skb)->paddr = dma_map_single(dev, skb->data, skb->len,
- DMA_TO_DEVICE);
-
- if (unlikely(dma_mapping_error(dev, ATH10K_SKB_CB(skb)->paddr)))
- return -EIO;
-
- ATH10K_SKB_CB(skb)->is_mapped = true;
- return 0;
-}
-
-static inline int ath10k_skb_unmap(struct device *dev, struct sk_buff *skb)
-{
- if (!ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- dma_unmap_single(dev, ATH10K_SKB_CB(skb)->paddr, skb->len,
- DMA_TO_DEVICE);
- ATH10K_SKB_CB(skb)->is_mapped = false;
- return 0;
-}
-
static inline u32 host_interest_item_address(u32 item_offset)
{
return QCA988X_HOST_INTEREST_ADDRESS + item_offset;
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index edc57ab..69f1f46 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -63,7 +63,9 @@ static struct sk_buff *ath10k_htc_build_tx_ctrl_skb(void *ar)
static inline void ath10k_htc_restore_tx_skb(struct ath10k_htc *htc,
struct sk_buff *skb)
{
- ath10k_skb_unmap(htc->ar->dev, skb);
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+
+ dma_unmap_single(htc->ar->dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
skb_pull(skb, sizeof(struct ath10k_htc_hdr));
}

@@ -122,6 +124,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,
struct sk_buff *skb)
{
struct ath10k_htc_ep *ep = &htc->endpoint[eid];
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+ struct device *dev = htc->ar->dev;
int credits = 0;
int ret;

@@ -157,7 +161,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,

ath10k_htc_prepare_tx_skb(ep, skb);

- ret = ath10k_skb_map(htc->ar->dev, skb);
+ skb_cb->paddr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(dev, skb_cb->paddr);
if (ret)
goto err_credits;

@@ -169,7 +174,7 @@ int ath10k_htc_send(struct ath10k_htc *htc,
return 0;

err_unmap:
- ath10k_skb_unmap(htc->ar->dev, skb);
+ dma_unmap_single(dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
err_credits:
if (ep->tx_credit_flow_enabled) {
spin_lock_bh(&htc->tx_lock);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index acaa046..f5960c5 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -334,7 +334,9 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
goto err_free_msdu_id;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_free_txdesc;

@@ -358,7 +360,7 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_free_txdesc:
dev_kfree_skb_any(txdesc);
err_free_msdu_id:
@@ -437,7 +439,9 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
skb_cb->htt.pad_len = 0;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_pull_txfrag;

@@ -509,7 +513,7 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_pull_txfrag:
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
err_free_txdesc:
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 0321da0..9d682ed 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -855,7 +855,9 @@ static void ath10k_control_beaconing(struct ath10k_vif *arvif,

spin_lock_bh(&arvif->ar->data_lock);
if (arvif->beacon) {
- ath10k_skb_unmap(arvif->ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);

arvif->beacon = NULL;
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index dcf7efd..fe69899 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -51,7 +51,6 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
struct ieee80211_tx_info *info;
struct ath10k_skb_cb *skb_cb;
struct sk_buff *msdu;
- int ret;

ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion msdu_id %u discard %d no_ack %d\n",
tx_done->msdu_id, !!tx_done->discard, !!tx_done->no_ack);
@@ -65,9 +64,7 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
msdu = htt->pending_tx[tx_done->msdu_id];
skb_cb = ATH10K_SKB_CB(msdu);

- ret = ath10k_skb_unmap(dev, msdu);
- if (ret)
- ath10k_warn("data skb unmap failed (%d)\n", ret);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

if (skb_cb->htt.frag_len)
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 91e501b..478e7f6 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1360,7 +1360,7 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
struct wmi_bcn_info *bcn_info;
struct ath10k_vif *arvif;
struct sk_buff *bcn;
- int vdev_id = 0;
+ int ret, vdev_id = 0;

ath10k_dbg(ATH10K_DBG_MGMT, "WMI_HOST_SWBA_EVENTID\n");

@@ -1435,16 +1435,27 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
ath10k_warn("SWBA overrun on vdev %d\n",
arvif->vdev_id);

- ath10k_skb_unmap(ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);
}

- ath10k_skb_map(ar->dev, bcn);
+ ATH10K_SKB_CB(bcn)->paddr = dma_map_single(arvif->ar->dev,
+ bcn->data, bcn->len,
+ DMA_TO_DEVICE);
+ ret = dma_mapping_error(arvif->ar->dev,
+ ATH10K_SKB_CB(bcn)->paddr);
+ if (ret) {
+ ath10k_warn("failed to map beacon: %d\n", ret);
+ goto skip;
+ }

arvif->beacon = bcn;
arvif->beacon_sent = false;

ath10k_wmi_tx_beacon_nowait(arvif);
+skip:
spin_unlock_bh(&ar->data_lock);
}
}
--
1.8.5.3


2014-02-19 15:16:54

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 0/7] ath10k: performance improvements

Michal Kazior <[email protected]> writes:

> Hi,
>
> These patches aim at reducing host CPU load and
> thus improve performance on low-end systems.
>
> In my setup I get a relative improvement of
> 100mbps for both UDP Tx and Rx.
>
> Tx ----->
> laptop:eth---eth:AP135:ath10k---ath10k:laptop
> <----- Rx
>
>
> Michal Kazior (7):
> ath10k: remove DMA mapping wrappers
> ath10k: remove is_aborted from skb_cb
> ath10k: replace send_head() with tx_sg()
> ath10k: bypass htc for htt tx path
> ath10k: batch htt tx/rx completions
> ath10k: remove pci completion list
> ath10k: minimize coherent dma accesses

Ok, now I reviewed the patches and sent some comments. But I didn't run
any tests yet. I assume you also saw these warnings from the build bot:

drivers/net/wireless/ath/ath10k/htt_rx.c:274:17: sparse: context imbalance in 'ath10k_htt_rx_amsdu_pop' - unexpected unlock
drivers/net/wireless/ath/ath10k/htt_tx.c:505 ath10k_htt_tx() warn: potential pointer math issue ('skb_cb->htt.txbuf' is a 352 bit pointer)
drivers/net/wireless/ath/ath10k/htt_tx.c:423:2: error: implicit declaration of function 'dma_pool_alloc' [-Werror=implicit-function-declaration]
drivers/net/wireless/ath/ath10k/htt_tx.c:112:2: error: implicit declaration of function 'dma_pool_create' [-Werror=implicit-function-declaration]
drivers/net/wireless/ath/ath10k/htt_tx.c:149:2: error: implicit declaration of function 'dma_pool_destroy' [-Werror=implicit-function-declaration]
drivers/net/wireless/ath/ath10k/htt_tx.c:529:2: error: implicit declaration of function 'dma_pool_free' [-Werror=implicit-function-declaration]
drivers/net/wireless/ath/ath10k/htt_tx.c:423:20: warning: assignment makes pointer from integer without a cast [enabled by default]
drivers/net/wireless/ath/ath10k/txrx.c:72:3: error: implicit declaration of function 'dma_pool_free' [-Werror=implicit-function-declaration]

--
Kalle Valo

2014-02-19 14:56:18

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 4/7] ath10k: bypass htc for htt tx path

Michal Kazior <[email protected]> writes:

> Going through full HTC tx path for HTT tx is a
> waste of resources. By skipping it it's possible
> to easily submit scatter-gather to the PCI HIF for
> reduced host CPU load and improved performance.
>
> Signed-off-by: Michal Kazior <[email protected]>

For a change like this the commit log needs to be more descriptive.
There's no mention about dma pools, nothing about the basic principle
how this works now etc.

> @@ -1067,9 +1067,9 @@ struct ath10k_ce_pipe *ath10k_ce_init(struct ath10k *ar,
> *
> * For the lack of a better place do the check here.
> */
> - BUILD_BUG_ON(TARGET_NUM_MSDU_DESC >
> + BUILD_BUG_ON(2*TARGET_NUM_MSDU_DESC >
> (CE_HTT_H2T_MSG_SRC_NENTRIES - 1));
> - BUILD_BUG_ON(TARGET_10X_NUM_MSDU_DESC >
> + BUILD_BUG_ON(2*TARGET_10X_NUM_MSDU_DESC >
> (CE_HTT_H2T_MSG_SRC_NENTRIES - 1));
>
> ret = ath10k_pci_wake(ar);
> diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
> index 322e929..8eb7f99 100644
> --- a/drivers/net/wireless/ath/ath10k/ce.h
> +++ b/drivers/net/wireless/ath/ath10k/ce.h
> @@ -23,7 +23,7 @@
>
> /* Maximum number of Copy Engine's supported */
> #define CE_COUNT_MAX 8
> -#define CE_HTT_H2T_MSG_SRC_NENTRIES 2048
> +#define CE_HTT_H2T_MSG_SRC_NENTRIES 4096

Also document why you do these changes.

--
Kalle Valo

2014-02-19 12:48:53

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

Michal Kazior <[email protected]> writes:

> PCI is capable of handling scatter-gather lists.
> This can be used to avoid copying memory.
>
> Change the name of the callback while at to
> reflect its purpose.
>
> Signed-off-by: Michal Kazior <[email protected]>

[...]

> +struct ath10k_hif_sg_item {
> + u16 transfer_id;
> + void *transfer_context;
> + void *vaddr; /* for debugging mostly */
> + u32 paddr;
> + u16 len;
> +};

This is the part I don't like. Instead of adding our own structs we
instead should have everything in skb->cb and pass the skbs around. The
sad part was that last fall I was working on cleaning up that but never
found the time to finish it :(

--
Kalle Valo

2014-02-26 11:40:06

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v2 2/8] ath10k: remove is_aborted from skb_cb

The flag wasn't used anymore. No need to keep it.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 1 -
drivers/net/wireless/ath/ath10k/pci.c | 20 --------------------
2 files changed, 21 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 082fa77..7cf022d 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_aborted;
u8 vdev_id;

struct {
diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index 34f0910..d973975 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -63,7 +63,6 @@ static int ath10k_pci_post_rx(struct ath10k *ar);
static int ath10k_pci_post_rx_pipe(struct ath10k_pci_pipe *pipe_info,
int num);
static void ath10k_pci_rx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info);
-static void ath10k_pci_stop_ce(struct ath10k *ar);
static int ath10k_pci_cold_reset(struct ath10k *ar);
static int ath10k_pci_warm_reset(struct ath10k *ar);
static int ath10k_pci_wait_for_target_init(struct ath10k *ar);
@@ -993,22 +992,6 @@ static void ath10k_pci_kill_tasklet(struct ath10k *ar)
tasklet_kill(&ar_pci->pipe_info[i].intr);
}

-static void ath10k_pci_stop_ce(struct ath10k *ar)
-{
- struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_compl *compl;
- struct sk_buff *skb;
-
- /* Mark pending completions as aborted, so that upper layers free up
- * their associated resources */
- spin_lock_bh(&ar_pci->compl_lock);
- list_for_each_entry(compl, &ar_pci->compl_process, list) {
- skb = compl->skb;
- ATH10K_SKB_CB(skb)->is_aborted = true;
- }
- spin_unlock_bh(&ar_pci->compl_lock);
-}
-
static void ath10k_pci_cleanup_ce(struct ath10k *ar)
{
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
@@ -1339,7 +1322,6 @@ err_stop:
ath10k_ce_disable_interrupts(ar);
ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);
ath10k_pci_process_ce(ar);
err_free_compl:
ath10k_pci_cleanup_ce(ar);
@@ -1424,7 +1406,6 @@ static void ath10k_pci_tx_pipe_cleanup(struct ath10k_pci_pipe *pipe_info)
continue;
}

- ATH10K_SKB_CB(netbuf)->is_aborted = true;
ar_pci->msg_callbacks_current.tx_completion(ar,
netbuf,
id);
@@ -1482,7 +1463,6 @@ static void ath10k_pci_hif_stop(struct ath10k *ar)

ath10k_pci_free_irq(ar);
ath10k_pci_kill_tasklet(ar);
- ath10k_pci_stop_ce(ar);

ret = ath10k_pci_request_early_irq(ar);
if (ret)
--
1.8.5.3


2014-02-20 11:23:12

by Michal Kazior

[permalink] [raw]
Subject: Re: [RFC/RFT 5/7] ath10k: batch htt tx/rx completions

On 19 February 2014 16:10, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> HTT Rx endpoint processes both frame rx
>> indications and frame tx completion indications.
>>
>> Those completions typically come in batches and
>> may be mixed so it makes sense to defer processing
>> hoping to get a bunch of them and take advantage
>> of hot caches.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>
> [...]
>
>> @@ -270,7 +274,7 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
>> int idx;
>> struct sk_buff *msdu;
>>
>> - spin_lock_bh(&htt->rx_ring.lock);
>> + lockdep_assert_held(&htt->rx_ring.lock);
>
> There are some locking changes which I think would be better to have in
> a separate patch.

I don't think it makes much sense. It seems silly to move a few lines
in one patch to move some of those lines again in another one.


>> case HTT_T2H_MSG_TYPE_MGMT_TX_COMPLETION: {
>> + struct htt_resp *resp = (struct htt_resp *)skb->data;
>> struct htt_tx_done tx_done = {};
>> int status = __le32_to_cpu(resp->mgmt_tx_completion.status);
>>
>> - tx_done.msdu_id =
>> - __le32_to_cpu(resp->mgmt_tx_completion.desc_id);
>> + tx_done.msdu_id = __le32_to_cpu(resp->mgmt_tx_completion.desc_id);
>
> I don't see any changes here.

Right. I'll cut this hunk out.


Michał

2014-02-24 11:49:28

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 5/7] ath10k: batch htt tx/rx completions

Michal Kazior <[email protected]> writes:

> On 19 February 2014 16:10, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> HTT Rx endpoint processes both frame rx
>>> indications and frame tx completion indications.
>>>
>>> Those completions typically come in batches and
>>> may be mixed so it makes sense to defer processing
>>> hoping to get a bunch of them and take advantage
>>> of hot caches.
>>>
>>> Signed-off-by: Michal Kazior <[email protected]>
>>
>> [...]
>>
>>> @@ -270,7 +274,7 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
>>> int idx;
>>> struct sk_buff *msdu;
>>>
>>> - spin_lock_bh(&htt->rx_ring.lock);
>>> + lockdep_assert_held(&htt->rx_ring.lock);
>>
>> There are some locking changes which I think would be better to have in
>> a separate patch.
>
> I don't think it makes much sense. It seems silly to move a few lines
> in one patch to move some of those lines again in another one.

I'm not really getting your point here. Moving things back and forth
would not make any sense, but that's not what I was asking either.

I noticed that you were taking locks in a larger scope (and earlier)
than before. I was just thinking would it be possible to make that
locking changes in a separate patch, that way it would be easier to
bisect regressions if/when they show up.

--
Kalle Valo

2014-02-27 07:25:07

by Michal Kazior

[permalink] [raw]
Subject: [PATCH v3 6/8] ath10k: reduce htt tx/rx spinlock overhead

It is inefficient to grab irqsave spinlocks for
skb lists for each queue/dequeue action.

Using rx_ring.lock and tx_lock allows to use less
heavy bh spinlock functions and moving locking
upwards allows to toggle spinlocks less often.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/htt_rx.c | 24 +++++++++++++++++++-----
drivers/net/wireless/ath/ath10k/htt_tx.c | 5 ++---
drivers/net/wireless/ath/ath10k/txrx.c | 4 ++--
3 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 1141f6c..1cb88bf 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -274,7 +274,7 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
int idx;
struct sk_buff *msdu;

- spin_lock_bh(&htt->rx_ring.lock);
+ lockdep_assert_held(&htt->rx_ring.lock);

if (ath10k_htt_rx_ring_elems(htt) == 0)
ath10k_warn("htt rx ring is empty!\n");
@@ -287,7 +287,6 @@ static inline struct sk_buff *ath10k_htt_rx_netbuf_pop(struct ath10k_htt *htt)
htt->rx_ring.sw_rd_idx.msdu_payld = idx;
htt->rx_ring.fill_cnt--;

- spin_unlock_bh(&htt->rx_ring.lock);
return msdu;
}

@@ -311,6 +310,8 @@ static int ath10k_htt_rx_amsdu_pop(struct ath10k_htt *htt,
struct sk_buff *msdu;
struct htt_rx_desc *rx_desc;

+ lockdep_assert_held(&htt->rx_ring.lock);
+
if (ath10k_htt_rx_ring_elems(htt) == 0)
ath10k_warn("htt rx ring is empty!\n");

@@ -904,6 +905,8 @@ static void ath10k_htt_rx_handler(struct ath10k_htt *htt,
u8 *fw_desc;
int i, j;

+ lockdep_assert_held(&htt->rx_ring.lock);
+
memset(&info, 0, sizeof(info));

fw_desc_len = __le16_to_cpu(rx->prefix.fw_rx_desc_bytes);
@@ -1040,8 +1043,11 @@ static void ath10k_htt_rx_frag_handler(struct ath10k_htt *htt,

msdu_head = NULL;
msdu_tail = NULL;
+
+ spin_lock_bh(&htt->rx_ring.lock);
msdu_chaining = ath10k_htt_rx_amsdu_pop(htt, &fw_desc, &fw_desc_len,
&msdu_head, &msdu_tail);
+ spin_unlock_bh(&htt->rx_ring.lock);

ath10k_dbg(ATH10K_DBG_HTT_DUMP, "htt rx frag ahead\n");

@@ -1143,6 +1149,8 @@ static void ath10k_htt_rx_frm_tx_compl(struct ath10k *ar,
__le16 msdu_id;
int i;

+ lockdep_assert_held(&htt->tx_lock);
+
switch (status) {
case HTT_DATA_TX_STATUS_NO_ACK:
tx_done.no_ack = true;
@@ -1189,7 +1197,9 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
break;
}
case HTT_T2H_MSG_TYPE_RX_IND:
- skb_queue_tail(&htt->rx_compl_q, skb);
+ spin_lock_bh(&htt->rx_ring.lock);
+ __skb_queue_tail(&htt->rx_compl_q, skb);
+ spin_unlock_bh(&htt->rx_ring.lock);
tasklet_schedule(&htt->txrx_compl_task);
return;
case HTT_T2H_MSG_TYPE_PEER_MAP: {
@@ -1283,14 +1293,18 @@ void ath10k_htt_txrx_compl_task(unsigned long ptr)
struct htt_resp *resp;
struct sk_buff *skb;

- while ((skb = skb_dequeue(&htt->tx_compl_q))) {
+ spin_lock_bh(&htt->tx_lock);
+ while ((skb = __skb_dequeue(&htt->tx_compl_q))) {
ath10k_htt_rx_frm_tx_compl(htt->ar, skb);
dev_kfree_skb_any(skb);
}
+ spin_unlock_bh(&htt->tx_lock);

- while ((skb = skb_dequeue(&htt->rx_compl_q))) {
+ spin_lock_bh(&htt->rx_ring.lock);
+ while ((skb = __skb_dequeue(&htt->rx_compl_q))) {
resp = (struct htt_resp *)skb->data;
ath10k_htt_rx_handler(htt, &resp->rx_ind);
dev_kfree_skb_any(skb);
}
+ spin_unlock_bh(&htt->rx_ring.lock);
}
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 20b7a44..7a3e2e4 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -125,9 +125,7 @@ static void ath10k_htt_tx_cleanup_pending(struct ath10k_htt *htt)
struct htt_tx_done tx_done = {0};
int msdu_id;

- /* No locks needed. Called after communication with the device has
- * been stopped. */
-
+ spin_lock_bh(&htt->tx_lock);
for (msdu_id = 0; msdu_id < htt->max_num_pending_tx; msdu_id++) {
if (!test_bit(msdu_id, htt->used_msdu_ids))
continue;
@@ -140,6 +138,7 @@ static void ath10k_htt_tx_cleanup_pending(struct ath10k_htt *htt)

ath10k_txrx_tx_unref(htt, &tx_done);
}
+ spin_unlock_bh(&htt->tx_lock);
}

void ath10k_htt_tx_detach(struct ath10k_htt *htt)
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 2993ca7..0541dd9 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -52,6 +52,8 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
struct ath10k_skb_cb *skb_cb;
struct sk_buff *msdu;

+ lockdep_assert_held(&htt->tx_lock);
+
ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion msdu_id %u discard %d no_ack %d\n",
tx_done->msdu_id, !!tx_done->discard, !!tx_done->no_ack);

@@ -91,13 +93,11 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
/* we do not own the msdu anymore */

exit:
- spin_lock_bh(&htt->tx_lock);
htt->pending_tx[tx_done->msdu_id] = NULL;
ath10k_htt_tx_free_msdu_id(htt, tx_done->msdu_id);
__ath10k_htt_tx_dec_pending(htt);
if (htt->num_pending_tx == 0)
wake_up(&htt->empty_tx_wq);
- spin_unlock_bh(&htt->tx_lock);
}

static const u8 rx_legacy_rate_idx[] = {
--
1.8.5.3


2014-02-17 15:01:54

by Kalle Valo

[permalink] [raw]
Subject: Re: [RFC/RFT 0/7] ath10k: performance improvements

Michal Kazior <[email protected]> writes:

> These patches aim at reducing host CPU load and
> thus improve performance on low-end systems.
>
> In my setup I get a relative improvement of
> 100mbps for both UDP Tx and Rx.

Nice!

> Tx ----->
> laptop:eth---eth:AP135:ath10k---ath10k:laptop
> <----- Rx
>
>
> Michal Kazior (7):
> ath10k: remove DMA mapping wrappers
> ath10k: remove is_aborted from skb_cb
> ath10k: replace send_head() with tx_sg()
> ath10k: bypass htc for htt tx path
> ath10k: batch htt tx/rx completions
> ath10k: remove pci completion list
> ath10k: minimize coherent dma accesses

I know this is RFC still, but I did some compilation tests and saw
these:

drivers/net/wireless/ath/ath10k/htt_rx.c:274:17: warning: context imbalance in 'ath10k_htt_rx_amsdu_pop' - unexpected unlock
drivers/net/wireless/ath/ath10k/htt_tx.c:436: WARNING: line over 80 characters
drivers/net/wireless/ath/ath10k/htt_tx.c:493: WARNING: line over 80 characters
drivers/net/wireless/ath/ath10k/htt_rx.c:1214: WARNING: line over 80 characters

--
Kalle Valo

2014-02-17 09:37:58

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 3/7] ath10k: replace send_head() with tx_sg()

PCI is capable of handling scatter-gather lists.
This can be used to avoid copying memory.

Change the name of the callback while at to
reflect its purpose.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/ce.c | 12 ++---
drivers/net/wireless/ath/ath10k/ce.h | 7 +++
drivers/net/wireless/ath/ath10k/hif.h | 25 ++++++-----
drivers/net/wireless/ath/ath10k/htc.c | 10 ++++-
drivers/net/wireless/ath/ath10k/pci.c | 84 +++++++++++++++++++++++------------
5 files changed, 92 insertions(+), 46 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/ce.c b/drivers/net/wireless/ath/ath10k/ce.c
index d44d618..a0b1a8c 100644
--- a/drivers/net/wireless/ath/ath10k/ce.c
+++ b/drivers/net/wireless/ath/ath10k/ce.c
@@ -266,12 +266,12 @@ static inline void ath10k_ce_engine_int_status_clear(struct ath10k *ar,
* ath10k_ce_sendlist_send.
* The caller takes responsibility for any needed locking.
*/
-static int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
- void *per_transfer_context,
- u32 buffer,
- unsigned int nbytes,
- unsigned int transfer_id,
- unsigned int flags)
+int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
+ void *per_transfer_context,
+ u32 buffer,
+ unsigned int nbytes,
+ unsigned int transfer_id,
+ unsigned int flags)
{
struct ath10k *ar = ce_state->ar;
struct ath10k_ce_ring *src_ring = ce_state->src_ring;
diff --git a/drivers/net/wireless/ath/ath10k/ce.h b/drivers/net/wireless/ath/ath10k/ce.h
index 67dbde6..322e929 100644
--- a/drivers/net/wireless/ath/ath10k/ce.h
+++ b/drivers/net/wireless/ath/ath10k/ce.h
@@ -152,6 +152,13 @@ int ath10k_ce_send(struct ath10k_ce_pipe *ce_state,
unsigned int transfer_id,
unsigned int flags);

+int ath10k_ce_send_nolock(struct ath10k_ce_pipe *ce_state,
+ void *per_transfer_context,
+ u32 buffer,
+ unsigned int nbytes,
+ unsigned int transfer_id,
+ unsigned int flags);
+
void ath10k_ce_send_cb_register(struct ath10k_ce_pipe *ce_state,
void (*send_cb)(struct ath10k_ce_pipe *),
int disable_interrupts);
diff --git a/drivers/net/wireless/ath/ath10k/hif.h b/drivers/net/wireless/ath/ath10k/hif.h
index dcdea68..3a0ba21 100644
--- a/drivers/net/wireless/ath/ath10k/hif.h
+++ b/drivers/net/wireless/ath/ath10k/hif.h
@@ -21,6 +21,14 @@
#include <linux/kernel.h>
#include "core.h"

+struct ath10k_hif_sg_item {
+ u16 transfer_id;
+ void *transfer_context;
+ void *vaddr; /* for debugging mostly */
+ u32 paddr;
+ u16 len;
+};
+
struct ath10k_hif_cb {
int (*tx_completion)(struct ath10k *ar,
struct sk_buff *wbuf,
@@ -31,11 +39,9 @@ struct ath10k_hif_cb {
};

struct ath10k_hif_ops {
- /* Send the head of a buffer to HIF for transmission to the target. */
- int (*send_head)(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int nbytes,
- struct sk_buff *buf);
+ /* send a scatter-gather list to the target */
+ int (*tx_sg)(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items, int n_items);

/*
* API to handle HIF-specific BMI message exchanges, this API is
@@ -86,12 +92,11 @@ struct ath10k_hif_ops {
};


-static inline int ath10k_hif_send_head(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int nbytes,
- struct sk_buff *buf)
+static inline int ath10k_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items,
+ int n_items)
{
- return ar->hif.ops->send_head(ar, pipe_id, transfer_id, nbytes, buf);
+ return ar->hif.ops->tx_sg(ar, pipe_id, items, n_items);
}

static inline int ath10k_hif_exchange_bmi_msg(struct ath10k *ar,
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 69f1f46..64ab8d6 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -125,6 +125,7 @@ int ath10k_htc_send(struct ath10k_htc *htc,
{
struct ath10k_htc_ep *ep = &htc->endpoint[eid];
struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+ struct ath10k_hif_sg_item sg_item;
struct device *dev = htc->ar->dev;
int credits = 0;
int ret;
@@ -166,8 +167,13 @@ int ath10k_htc_send(struct ath10k_htc *htc,
if (ret)
goto err_credits;

- ret = ath10k_hif_send_head(htc->ar, ep->ul_pipe_id, ep->eid,
- skb->len, skb);
+ sg_item.transfer_id = ep->eid;
+ sg_item.transfer_context = skb;
+ sg_item.vaddr = skb->data;
+ sg_item.paddr = skb_cb->paddr;
+ sg_item.len = skb->len;
+
+ ret = ath10k_hif_tx_sg(htc->ar, ep->ul_pipe_id, &sg_item, 1);
if (ret)
goto err_unmap;

diff --git a/drivers/net/wireless/ath/ath10k/pci.c b/drivers/net/wireless/ath/ath10k/pci.c
index d973975..713c18e 100644
--- a/drivers/net/wireless/ath/ath10k/pci.c
+++ b/drivers/net/wireless/ath/ath10k/pci.c
@@ -714,6 +714,9 @@ static void ath10k_pci_ce_send_done(struct ath10k_ce_pipe *ce_state)
while (ath10k_ce_completed_send_next(ce_state, &transfer_context,
&ce_data, &nbytes,
&transfer_id) == 0) {
+ if (transfer_context == NULL)
+ continue;
+
compl = get_free_compl(pipe_info);
if (!compl)
break;
@@ -781,39 +784,64 @@ static void ath10k_pci_ce_recv_data(struct ath10k_ce_pipe *ce_state)
ath10k_pci_process_ce(ar);
}

-/* Send the first nbytes bytes of the buffer */
-static int ath10k_pci_hif_send_head(struct ath10k *ar, u8 pipe_id,
- unsigned int transfer_id,
- unsigned int bytes, struct sk_buff *nbuf)
+static int ath10k_pci_hif_tx_sg(struct ath10k *ar, u8 pipe_id,
+ struct ath10k_hif_sg_item *items, int n_items)
{
- struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(nbuf);
struct ath10k_pci *ar_pci = ath10k_pci_priv(ar);
- struct ath10k_pci_pipe *pipe_info = &(ar_pci->pipe_info[pipe_id]);
- struct ath10k_ce_pipe *ce_hdl = pipe_info->ce_hdl;
- unsigned int len;
- u32 flags = 0;
- int ret;
+ struct ath10k_pci_pipe *pci_pipe = &ar_pci->pipe_info[pipe_id];
+ struct ath10k_ce_pipe *ce_pipe = pci_pipe->ce_hdl;
+ struct ath10k_ce_ring *src_ring = ce_pipe->src_ring;
+ unsigned int nentries_mask = src_ring->nentries_mask;
+ unsigned int sw_index = src_ring->sw_index;
+ unsigned int write_index = src_ring->write_index;
+ int err, i;

- len = min(bytes, nbuf->len);
- bytes -= len;
+ spin_lock_bh(&ar_pci->ce_lock);

- if (len & 3)
- ath10k_warn("skb not aligned to 4-byte boundary (%d)\n", len);
+ if (unlikely(CE_RING_DELTA(nentries_mask,
+ write_index, sw_index - 1) < n_items)) {
+ err = -ENOBUFS;
+ goto unlock;
+ }

- ath10k_dbg(ATH10K_DBG_PCI,
- "pci send data vaddr %p paddr 0x%llx len %d as %d bytes\n",
- nbuf->data, (unsigned long long) skb_cb->paddr,
- nbuf->len, len);
- ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL,
- "ath10k tx: data: ",
- nbuf->data, nbuf->len);
-
- ret = ath10k_ce_send(ce_hdl, nbuf, skb_cb->paddr, len, transfer_id,
- flags);
- if (ret)
- ath10k_warn("failed to send sk_buff to CE: %p\n", nbuf);
+ for (i = 0; i < n_items - 1; i++) {
+ ath10k_dbg(ATH10K_DBG_PCI,
+ "pci tx item %d paddr 0x%08x len %d n_items %d\n",
+ i, items[i].paddr, items[i].len, n_items);
+ ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL, "item data: ",
+ items[i].vaddr, items[i].len);

- return ret;
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ CE_SEND_FLAG_GATHER);
+ if (err)
+ goto unlock;
+ }
+
+ /* `i` is equal to `n_items -1` after for() */
+
+ ath10k_dbg(ATH10K_DBG_PCI,
+ "pci tx item %d paddr 0x%08x len %d n_items %d\n",
+ i, items[i].paddr, items[i].len, n_items);
+ ath10k_dbg_dump(ATH10K_DBG_PCI_DUMP, NULL, "item data: ",
+ items[i].vaddr, items[i].len);
+
+ err = ath10k_ce_send_nolock(ce_pipe,
+ items[i].transfer_context,
+ items[i].paddr,
+ items[i].len,
+ items[i].transfer_id,
+ 0);
+ if (err)
+ goto unlock;
+
+ err = 0;
+unlock:
+ spin_unlock_bh(&ar_pci->ce_lock);
+ return err;
}

static u16 ath10k_pci_hif_get_free_queue_number(struct ath10k *ar, u8 pipe)
@@ -2249,7 +2277,7 @@ static int ath10k_pci_hif_resume(struct ath10k *ar)
#endif

static const struct ath10k_hif_ops ath10k_pci_hif_ops = {
- .send_head = ath10k_pci_hif_send_head,
+ .tx_sg = ath10k_pci_hif_tx_sg,
.exchange_bmi_msg = ath10k_pci_hif_exchange_bmi_msg,
.start = ath10k_pci_hif_start,
.stop = ath10k_pci_hif_stop,
--
1.8.5.3


2014-02-28 09:15:21

by Michal Kazior

[permalink] [raw]
Subject: Re: [PATCH v3 4/8] ath10k: bypass htc for htt tx path

On 28 February 2014 10:06, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> Going through full htc tx path for htt tx is a
>> waste of resources. By skipping it it's possible
>> to easily submit scatter-gather to the pci hif for
>> reduced host cpu load and improved performance.
>>
>> The new approach uses dma pool to store the
>> following metadata for each tx request:
>> * msdu fragment list
>> * htc header
>> * htt tx command
>>
>> The htt tx command contains a msdu prefetch.
>> Instead of copying it original mapped msdu address
>> is used to submit a second scatter-gather item to
>> hif to make a complete htt tx command.
>>
>> The htt tx command itself hands over dma mapped
>> pointers to msdus and completion of the command
>> itself doesn't mean the frame has been sent and
>> can be unmapped/freed. This is why htc tx
>> completion is skipped for htt tx as all tx related
>> resources are freed upon htt tx completion
>> indication event (which also implicitly means htt
>> tx command itself was completed).
>>
>> Since now each htt tx request effectively consists
>> of 2 copy engine items CE_HTT_H2T_MSG_SRC_NENTRIES
>> is updated to allow maximum of
>> TARGET_10X_NUM_MSDU_DESC msdus being queued. This
>> keeps the tx path resource management simple.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>> ---
>> v2:
>> * improve commit log
>> * improve comment in code
>> * fix sparse/checkpatch/buildbot warnings
>
> [...]
>
>> --- a/drivers/net/wireless/ath/ath10k/htc.c
>> +++ b/drivers/net/wireless/ath/ath10k/htc.c
>> @@ -202,10 +202,8 @@ static int ath10k_htc_tx_completion_handler(struct ath10k *ar,
>> struct ath10k_htc *htc = &ar->htc;
>> struct ath10k_htc_ep *ep = &htc->endpoint[eid];
>>
>> - if (!skb) {
>> - ath10k_warn("invalid sk_buff completion - NULL pointer. firmware crashed?\n");
>> + if (WARN_ON(!skb))
>> return 0;
>> - }
>
> WARN_ON() is a bit dangerous here as it might cause excessive spamming.
> Why did you want to change this? I think either ath10k_warn() or
> WARN_ON_ONCE() would be safer, but not sure which one to use.

After the scatter-gather patch no NULL skb should be ever passed to tx
completion handler as those are ignored by hif/pci. Perhaps the hunk
should be moved from this patch to the scatter-gather one.

Perhaps WARN_ON() is a bit over the top here, but since this is now
more of a logic issue rather than runtime issue I decided to change it
from ath10k_warn to WARN_ON(). It's probably still a good idea to make
it _ONCE generally, although if you actually get skbuff it's already
too late or it should be screaming loudly because someone must've
changed the code in an incorrect/incomplete way.


Michał

2014-02-17 09:37:56

by Michal Kazior

[permalink] [raw]
Subject: [RFC/RFT 1/7] ath10k: remove DMA mapping wrappers

There's no real benefit from using them. DMA-API
already provides debugging. Some skbuffs are
already mapped directly with DMA-API since wrapper
arguments were insufficient and extending them
would be pointless.

Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 27 ---------------------------
drivers/net/wireless/ath/ath10k/htc.c | 11 ++++++++---
drivers/net/wireless/ath/ath10k/htt_tx.c | 12 ++++++++----
drivers/net/wireless/ath/ath10k/mac.c | 4 +++-
drivers/net/wireless/ath/ath10k/txrx.c | 5 +----
drivers/net/wireless/ath/ath10k/wmi.c | 17 ++++++++++++++---
6 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 1fc26fe..082fa77 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -62,7 +62,6 @@ struct ath10k;

struct ath10k_skb_cb {
dma_addr_t paddr;
- bool is_mapped;
bool is_aborted;
u8 vdev_id;

@@ -87,32 +86,6 @@ static inline struct ath10k_skb_cb *ATH10K_SKB_CB(struct sk_buff *skb)
return (struct ath10k_skb_cb *)&IEEE80211_SKB_CB(skb)->driver_data;
}

-static inline int ath10k_skb_map(struct device *dev, struct sk_buff *skb)
-{
- if (ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- ATH10K_SKB_CB(skb)->paddr = dma_map_single(dev, skb->data, skb->len,
- DMA_TO_DEVICE);
-
- if (unlikely(dma_mapping_error(dev, ATH10K_SKB_CB(skb)->paddr)))
- return -EIO;
-
- ATH10K_SKB_CB(skb)->is_mapped = true;
- return 0;
-}
-
-static inline int ath10k_skb_unmap(struct device *dev, struct sk_buff *skb)
-{
- if (!ATH10K_SKB_CB(skb)->is_mapped)
- return -EINVAL;
-
- dma_unmap_single(dev, ATH10K_SKB_CB(skb)->paddr, skb->len,
- DMA_TO_DEVICE);
- ATH10K_SKB_CB(skb)->is_mapped = false;
- return 0;
-}
-
static inline u32 host_interest_item_address(u32 item_offset)
{
return QCA988X_HOST_INTEREST_ADDRESS + item_offset;
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index edc57ab..69f1f46 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -63,7 +63,9 @@ static struct sk_buff *ath10k_htc_build_tx_ctrl_skb(void *ar)
static inline void ath10k_htc_restore_tx_skb(struct ath10k_htc *htc,
struct sk_buff *skb)
{
- ath10k_skb_unmap(htc->ar->dev, skb);
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+
+ dma_unmap_single(htc->ar->dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
skb_pull(skb, sizeof(struct ath10k_htc_hdr));
}

@@ -122,6 +124,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,
struct sk_buff *skb)
{
struct ath10k_htc_ep *ep = &htc->endpoint[eid];
+ struct ath10k_skb_cb *skb_cb = ATH10K_SKB_CB(skb);
+ struct device *dev = htc->ar->dev;
int credits = 0;
int ret;

@@ -157,7 +161,8 @@ int ath10k_htc_send(struct ath10k_htc *htc,

ath10k_htc_prepare_tx_skb(ep, skb);

- ret = ath10k_skb_map(htc->ar->dev, skb);
+ skb_cb->paddr = dma_map_single(dev, skb->data, skb->len, DMA_TO_DEVICE);
+ ret = dma_mapping_error(dev, skb_cb->paddr);
if (ret)
goto err_credits;

@@ -169,7 +174,7 @@ int ath10k_htc_send(struct ath10k_htc *htc,
return 0;

err_unmap:
- ath10k_skb_unmap(htc->ar->dev, skb);
+ dma_unmap_single(dev, skb_cb->paddr, skb->len, DMA_TO_DEVICE);
err_credits:
if (ep->tx_credit_flow_enabled) {
spin_lock_bh(&htc->tx_lock);
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index acaa046..f5960c5 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -334,7 +334,9 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
goto err_free_msdu_id;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_free_txdesc;

@@ -358,7 +360,7 @@ int ath10k_htt_mgmt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_free_txdesc:
dev_kfree_skb_any(txdesc);
err_free_msdu_id:
@@ -437,7 +439,9 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
skb_cb->htt.pad_len = 0;
}

- res = ath10k_skb_map(dev, msdu);
+ skb_cb->paddr = dma_map_single(dev, msdu->data, msdu->len,
+ DMA_TO_DEVICE);
+ res = dma_mapping_error(dev, skb_cb->paddr);
if (res)
goto err_pull_txfrag;

@@ -509,7 +513,7 @@ int ath10k_htt_tx(struct ath10k_htt *htt, struct sk_buff *msdu)
return 0;

err_unmap_msdu:
- ath10k_skb_unmap(dev, msdu);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
err_pull_txfrag:
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
err_free_txdesc:
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 94a70a9..ed90de2 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -855,7 +855,9 @@ static void ath10k_control_beaconing(struct ath10k_vif *arvif,

spin_lock_bh(&arvif->ar->data_lock);
if (arvif->beacon) {
- ath10k_skb_unmap(arvif->ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);

arvif->beacon = NULL;
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 4713bdb..8fd0680 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -51,7 +51,6 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
struct ieee80211_tx_info *info;
struct ath10k_skb_cb *skb_cb;
struct sk_buff *msdu;
- int ret;

ath10k_dbg(ATH10K_DBG_HTT, "htt tx completion msdu_id %u discard %d no_ack %d\n",
tx_done->msdu_id, !!tx_done->discard, !!tx_done->no_ack);
@@ -65,9 +64,7 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt,
msdu = htt->pending_tx[tx_done->msdu_id];
skb_cb = ATH10K_SKB_CB(msdu);

- ret = ath10k_skb_unmap(dev, msdu);
- if (ret)
- ath10k_warn("data skb unmap failed (%d)\n", ret);
+ dma_unmap_single(dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

if (skb_cb->htt.frag_len)
skb_pull(msdu, skb_cb->htt.frag_len + skb_cb->htt.pad_len);
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 91e501b..f372957 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1360,7 +1360,7 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
struct wmi_bcn_info *bcn_info;
struct ath10k_vif *arvif;
struct sk_buff *bcn;
- int vdev_id = 0;
+ int err, vdev_id = 0;

ath10k_dbg(ATH10K_DBG_MGMT, "WMI_HOST_SWBA_EVENTID\n");

@@ -1435,16 +1435,27 @@ static void ath10k_wmi_event_host_swba(struct ath10k *ar, struct sk_buff *skb)
ath10k_warn("SWBA overrun on vdev %d\n",
arvif->vdev_id);

- ath10k_skb_unmap(ar->dev, arvif->beacon);
+ dma_unmap_single(arvif->ar->dev,
+ ATH10K_SKB_CB(arvif->beacon)->paddr,
+ arvif->beacon->len, DMA_TO_DEVICE);
dev_kfree_skb_any(arvif->beacon);
}

- ath10k_skb_map(ar->dev, bcn);
+ ATH10K_SKB_CB(bcn)->paddr = dma_map_single(arvif->ar->dev,
+ bcn->data, bcn->len,
+ DMA_TO_DEVICE);
+ err = dma_mapping_error(arvif->ar->dev,
+ ATH10K_SKB_CB(bcn)->paddr);
+ if (err) {
+ ath10k_warn("failed to map beacon: %d\n", err);
+ goto skip;
+ }

arvif->beacon = bcn;
arvif->beacon_sent = false;

ath10k_wmi_tx_beacon_nowait(arvif);
+skip:
spin_unlock_bh(&ar->data_lock);
}
}
--
1.8.5.3