2021-09-02 05:35:56

by P Praneesh

Subject: [PATCH v3 00/12] ath11k: optimizations in data path

This patchset covers optimizations in the rx (first 7 patches)
and tx (remaining 5 patches) data paths.

Running UDP DL/UL traffic on the IPQ8074 5G radio showed an average 5-10%
improvement on a 4 core platform.
---
v3:
- Changed rcu_dereference() to rcu_access_pointer() in
  [PATCH 07/12] ath11k: add branch predictors in process_rx
  [PATCH 11/12] ath11k: add branch predictors in dp_tx path
- Removed a redundant check in
  [PATCH 02/12] ath11k: allocate dst ring descriptors from
  cacheable memory
v2:
- Addressed a segfault reported by an internal developer and avoided
  looking up the same id twice by using idr_remove()
  (patch 12/12 and patch 2/12).
---
P Praneesh (12):
ath11k: disable unused CE8 interrupts for ipq8074
ath11k: allocate dst ring descriptors from cacheable memory
ath11k: modify dp_rx desc access wrapper calls inline
ath11k: avoid additional access to ath11k_hal_srng_dst_num_free
ath11k: avoid active pdev check for each msdu
ath11k: remove usage quota while processing rx packets
ath11k: add branch predictors in process_rx
ath11k: allocate HAL_WBM2SW_RELEASE ring from cacheable memory
ath11k: remove mod operator in dst ring processing
ath11k: avoid while loop in ring selection of tx completion interrupt
ath11k: add branch predictors in dp_tx path
ath11k: avoid unnecessary lock contention in tx_completion path

drivers/net/wireless/ath/ath11k/ce.c | 2 +-
drivers/net/wireless/ath/ath11k/core.c | 5 +
drivers/net/wireless/ath/ath11k/dp.c | 48 ++++++--
drivers/net/wireless/ath/ath11k/dp.h | 1 +
drivers/net/wireless/ath/ath11k/dp_rx.c | 207 ++++++++++++++++----------------
drivers/net/wireless/ath/ath11k/dp_tx.c | 86 ++++++-------
drivers/net/wireless/ath/ath11k/hal.c | 35 +++++-
drivers/net/wireless/ath/ath11k/hal.h | 1 +
drivers/net/wireless/ath/ath11k/hw.h | 1 +
drivers/net/wireless/ath/ath11k/mac.c | 2 +-
10 files changed, 220 insertions(+), 168 deletions(-)

--
2.7.4


2021-09-02 05:36:06

by P Praneesh

Subject: [PATCH v3 10/12] ath11k: avoid while loop in ring selection of tx completion interrupt

Currently a while loop is used to find the tx completion ring number.
This is not required, since the tx ring mask and the group id can be
combined to fetch the ring number directly. Hence remove the while loop
and get the ring number directly from the tx mask and group id.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01734-QCAHKSWPL_SILICONZ-1 v2

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp.c b/drivers/net/wireless/ath/ath11k/dp.c
index 0278ff6..d553692 100644
--- a/drivers/net/wireless/ath/ath11k/dp.c
+++ b/drivers/net/wireless/ath/ath11k/dp.c
@@ -770,13 +770,12 @@ int ath11k_dp_service_srng(struct ath11k_base *ab,
struct napi_struct *napi = &irq_grp->napi;
int grp_id = irq_grp->grp_id;
int work_done = 0;
- int i = 0, j;
+ int i, j;
int tot_work_done = 0;

- while (ab->hw_params.ring_mask->tx[grp_id] >> i) {
- if (ab->hw_params.ring_mask->tx[grp_id] & BIT(i))
- ath11k_dp_tx_completion_handler(ab, i);
- i++;
+ if (ab->hw_params.ring_mask->tx[grp_id]) {
+ i = __fls(ab->hw_params.ring_mask->tx[grp_id]);
+ ath11k_dp_tx_completion_handler(ab, i);
}

if (ab->hw_params.ring_mask->rx_err[grp_id]) {
--
2.7.4
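
The ring selection change above can be illustrated with a small userspace sketch. It is not driver code: sketch_fls() is a hypothetical stand-in for the kernel's __fls() built on the GCC/Clang builtin __builtin_clzl(), and sketch_loop_select() mimics the removed while loop. The two agree on the ring number only under the assumption that each group's tx mask carries a single set bit; with several bits set, the loop visits every ring while the __fls() form picks only the highest one.

```c
#include <assert.h>
#include <limits.h>

/* Userspace stand-in for the kernel's __fls(): index of the most
 * significant set bit. Like __fls(), it must not be called with 0.
 */
static unsigned int sketch_fls(unsigned long mask)
{
	assert(mask != 0);
	return (unsigned int)(sizeof(mask) * CHAR_BIT - 1) -
	       (unsigned int)__builtin_clzl(mask);
}

/* Shape of the removed loop: walk the mask bit by bit.
 * Returns how many rings were found; *last is the last ring index seen.
 */
static unsigned int sketch_loop_select(unsigned long mask, unsigned int *last)
{
	const unsigned int bits = sizeof(mask) * CHAR_BIT;
	unsigned int i = 0, found = 0;

	while (i < bits && (mask >> i)) {
		if (mask & (1UL << i)) {
			*last = i;
			found++;
		}
		i++;
	}
	return found;
}
```

For a mask such as 0x4 both helpers report ring 2. For 0x6 the loop reports two rings (1 and 2) whereas sketch_fls() reports only ring 2, which is why the single-bit assumption matters.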

2021-09-02 05:36:09

by P Praneesh

Subject: [PATCH v3 07/12] ath11k: add branch predictors in process_rx

In the datapath, add branch predictors (unlikely() hints) where required
in process_rx(). This protects the high-value rx path without adding
performance overhead. Also, while processing rx packets, the pointer
returned by rcu_dereference() is not dereferenced, so it is preferable
to use rcu_access_pointer() here.

Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1.r2-00012-QCAHKSWPL_SILICONZ-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01695-QCAHKSWPL_SILICONZ-1

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp_rx.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index e105bdc..a362615 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -2532,24 +2532,20 @@ static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,
if (skb_queue_empty(msdu_list))
return;

- rcu_read_lock();
-
- ar = ab->pdevs[mac_id].ar;
- if (!rcu_dereference(ab->pdevs_active[mac_id])) {
+ if (unlikely(!rcu_access_pointer(ab->pdevs_active[mac_id]))) {
__skb_queue_purge(msdu_list);
- rcu_read_unlock();
return;
}

- if (test_bit(ATH11K_CAC_RUNNING, &ar->dev_flags)) {
+ ar = ab->pdevs[mac_id].ar;
+ if (unlikely(test_bit(ATH11K_CAC_RUNNING, &ar->dev_flags))) {
__skb_queue_purge(msdu_list);
- rcu_read_unlock();
return;
}

while ((msdu = __skb_dequeue(msdu_list))) {
ret = ath11k_dp_rx_process_msdu(ar, msdu, msdu_list);
- if (ret) {
+ if (unlikely(ret)) {
ath11k_dbg(ab, ATH11K_DBG_DATA,
"Unable to process msdu %d", ret);
dev_kfree_skb_any(msdu);
@@ -2558,8 +2554,6 @@ static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,

ath11k_dp_rx_deliver_msdu(ar, napi, msdu);
}
-
- rcu_read_unlock();
}

int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
@@ -2604,7 +2598,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
rx_ring = &ar->dp.rx_refill_buf_ring;
spin_lock_bh(&rx_ring->idr_lock);
msdu = idr_find(&rx_ring->bufs_idr, buf_id);
- if (!msdu) {
+ if (unlikely(!msdu)) {
ath11k_warn(ab, "frame rx with invalid buf_id %d\n",
buf_id);
spin_unlock_bh(&rx_ring->idr_lock);
@@ -2623,8 +2617,8 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,

push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON,
desc->info0);
- if (push_reason !=
- HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION) {
+ if (unlikely(push_reason !=
+ HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) {
dev_kfree_skb_any(msdu);
ab->soc_stats.hal_reo_error[dp->reo_dst_ring[ring_id].ring_id]++;
continue;
@@ -2659,7 +2653,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
* head pointer so that we can reap complete MPDU in the current
* rx processing.
*/
- if (!done && ath11k_hal_srng_dst_num_free(ab, srng, true)) {
+ if (unlikely(!done && ath11k_hal_srng_dst_num_free(ab, srng, true))) {
ath11k_hal_srng_access_end(ab, srng);
goto try_again;
}
@@ -2668,7 +2662,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,

spin_unlock_bh(&srng->lock);

- if (!total_msdu_reaped)
+ if (unlikely(!total_msdu_reaped))
goto exit;

for (i = 0; i < ab->num_radios; i++) {
--
2.7.4
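
As a rough illustration of what the branch hints in this patch do, the sketch below defines likely()/unlikely() the way the kernel does, on top of the compiler builtin __builtin_expect(), and applies unlikely() to a rare error check. struct sketch_pkt and sketch_process() are hypothetical names invented for this example. The hint changes only how the compiler lays out the code, not the result of the condition.

```c
#include <stddef.h>

/* Same shape as the kernel's hint macros: tell the compiler which
 * outcome of the condition to expect.
 */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical packet type, for illustration only. */
struct sketch_pkt {
	int len;
};

/* Accumulate the packet length into *bytes.
 * Returns 0 on success, -1 for the rare invalid packet.
 */
static int sketch_process(const struct sketch_pkt *pkt, unsigned long *bytes)
{
	/* Error path is marked as cold; the fall-through path is the hot one. */
	if (unlikely(pkt == NULL || pkt->len <= 0))
		return -1;

	*bytes += (unsigned long)pkt->len;
	return 0;
}
```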

2021-09-02 05:36:18

by P Praneesh

Subject: [PATCH v3 12/12] ath11k: avoid unnecessary lock contention in tx_completion path

Avoid unnecessary idr_find() calls before idr_remove(). idr_remove()
returns the stored pointer if the id is valid and NULL otherwise, so
remove the idr_find() before idr_remove() in the tx completion path.
Also, there is no need to disable bottom halves when already running in
bottom half context, so change spin_lock_bh() to spin_lock() in the
data tx completion path.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01734-QCAHKSWPL_SILICONZ-1 v2

Co-developed-by: Karthikeyan Periyasamy <[email protected]>
Signed-off-by: Karthikeyan Periyasamy <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp_tx.c | 32 ++++++++++++++------------------
1 file changed, 14 insertions(+), 18 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_tx.c b/drivers/net/wireless/ath/ath11k/dp_tx.c
index 602184b..05bd86f 100644
--- a/drivers/net/wireless/ath/ath11k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
@@ -288,20 +288,18 @@ static void ath11k_dp_tx_free_txbuf(struct ath11k_base *ab, u8 mac_id,
struct sk_buff *msdu;
struct ath11k_skb_cb *skb_cb;

- spin_lock_bh(&tx_ring->tx_idr_lock);
- msdu = idr_find(&tx_ring->txbuf_idr, msdu_id);
- if (!msdu) {
+ spin_lock(&tx_ring->tx_idr_lock);
+ msdu = idr_remove(&tx_ring->txbuf_idr, msdu_id);
+ spin_unlock(&tx_ring->tx_idr_lock);
+
+ if (unlikely(!msdu)) {
ath11k_warn(ab, "tx completion for unknown msdu_id %d\n",
msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
return;
}

skb_cb = ATH11K_SKB_CB(msdu);

- idr_remove(&tx_ring->txbuf_idr, msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
-
dma_unmap_single(ab->dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);
dev_kfree_skb_any(msdu);

@@ -320,12 +318,13 @@ ath11k_dp_tx_htt_tx_complete_buf(struct ath11k_base *ab,
struct ath11k_skb_cb *skb_cb;
struct ath11k *ar;

- spin_lock_bh(&tx_ring->tx_idr_lock);
- msdu = idr_find(&tx_ring->txbuf_idr, ts->msdu_id);
+ spin_lock(&tx_ring->tx_idr_lock);
+ msdu = idr_remove(&tx_ring->txbuf_idr, ts->msdu_id);
+ spin_unlock(&tx_ring->tx_idr_lock);
+
if (unlikely(!msdu)) {
ath11k_warn(ab, "htt tx completion for unknown msdu_id %d\n",
ts->msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
return;
}

@@ -334,9 +333,6 @@ ath11k_dp_tx_htt_tx_complete_buf(struct ath11k_base *ab,

ar = skb_cb->ar;

- idr_remove(&tx_ring->txbuf_idr, ts->msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
-
if (atomic_dec_and_test(&ar->dp.num_tx_pending))
wake_up(&ar->dp.tx_empty_waitq);

@@ -579,16 +575,16 @@ void ath11k_dp_tx_completion_handler(struct ath11k_base *ab, int ring_id)
continue;
}

- spin_lock_bh(&tx_ring->tx_idr_lock);
- msdu = idr_find(&tx_ring->txbuf_idr, msdu_id);
+ spin_lock(&tx_ring->tx_idr_lock);
+ msdu = idr_remove(&tx_ring->txbuf_idr, msdu_id);
if (unlikely(!msdu)) {
ath11k_warn(ab, "tx completion for unknown msdu_id %d\n",
msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
+ spin_unlock(&tx_ring->tx_idr_lock);
continue;
}
- idr_remove(&tx_ring->txbuf_idr, msdu_id);
- spin_unlock_bh(&tx_ring->tx_idr_lock);
+
+ spin_unlock(&tx_ring->tx_idr_lock);

ar = ab->pdevs[mac_id].ar;

--
2.7.4
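
The find-then-remove simplification can be sketched in userspace as follows. struct sketch_idr is a hypothetical fixed-size table standing in for the kernel IDR, and sketch_find()/sketch_remove() are invented names. The only property borrowed from the real API is the one the commit message relies on: removal hands back the stored pointer, or NULL when the id is unknown, so a separate lookup beforehand is redundant.

```c
#include <stddef.h>

#define SKETCH_SLOTS 8

/* Hypothetical stand-in for an IDR: pointers indexed by a small id. */
struct sketch_idr {
	void *slot[SKETCH_SLOTS];
};

/* Lookup only: returns the stored pointer or NULL. */
static void *sketch_find(struct sketch_idr *idr, unsigned int id)
{
	return id < SKETCH_SLOTS ? idr->slot[id] : NULL;
}

/* Lookup and removal in one step: clears the entry and returns what
 * was stored there, or NULL if the id was not in use.
 */
static void *sketch_remove(struct sketch_idr *idr, unsigned int id)
{
	void *p = sketch_find(idr, id);

	if (p)
		idr->slot[id] = NULL;
	return p;
}
```

A caller can therefore test the return value of sketch_remove() directly and drop the lock right after it, which mirrors the shape of the new code in ath11k_dp_tx_free_txbuf().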

2021-09-02 05:36:27

by P Praneesh

Subject: [PATCH v3 08/12] ath11k: allocate HAL_WBM2SW_RELEASE ring from cacheable memory

Similar to the REO destination ring, also allocate the HAL_WBM2SW_RELEASE
ring from cacheable memory so that descriptors can be prefetched during
tx completion handling.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01734-QCAHKSWPL_SILICONZ-1 v2

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/ath/ath11k/dp.c b/drivers/net/wireless/ath/ath11k/dp.c
index 943d0a7..0278ff6 100644
--- a/drivers/net/wireless/ath/ath11k/dp.c
+++ b/drivers/net/wireless/ath/ath11k/dp.c
@@ -239,6 +239,7 @@ int ath11k_dp_srng_setup(struct ath11k_base *ab, struct dp_srng *ring,
/* Allocate the reo dst and tx completion rings from cacheable memory */
switch (type) {
case HAL_REO_DST:
+ case HAL_WBM2SW_RELEASE:
cached = true;
break;
default:
--
2.7.4

2021-09-02 05:36:27

by P Praneesh

Subject: [PATCH v3 06/12] ath11k: remove usage quota while processing rx packets

The use of the quota variable inside ath11k_dp_rx_process_received_packets()
is redundant. Only the maximum number of packets is queued to the list
before this function is called, so the quota can never be exceeded. Hence
remove the quota variable.

Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1.r2-00012-QCAHKSWPL_SILICONZ-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01695-QCAHKSWPL_SILICONZ-1

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp_rx.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index 1d85e10..e105bdc 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -2523,7 +2523,7 @@ static int ath11k_dp_rx_process_msdu(struct ath11k *ar,
static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,
struct napi_struct *napi,
struct sk_buff_head *msdu_list,
- int *quota, int mac_id)
+ int mac_id)
{
struct sk_buff *msdu;
struct ath11k *ar;
@@ -2557,7 +2557,6 @@ static void ath11k_dp_rx_process_received_packets(struct ath11k_base *ab,
}

ath11k_dp_rx_deliver_msdu(ar, napi, msdu);
- (*quota)--;
}

rcu_read_unlock();
@@ -2574,7 +2573,6 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
int total_msdu_reaped = 0;
struct hal_srng *srng;
struct sk_buff *msdu;
- int quota = budget;
bool done = false;
int buf_id, mac_id;
struct ath11k *ar;
@@ -2677,8 +2675,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
if (!num_buffs_reaped[i])
continue;

- ath11k_dp_rx_process_received_packets(ab, napi, &msdu_list[i],
- &quota, i);
+ ath11k_dp_rx_process_received_packets(ab, napi, &msdu_list[i], i);

ar = ab->pdevs[i].ar;
rx_ring = &ar->dp.rx_refill_buf_ring;
@@ -2687,7 +2684,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
HAL_RX_BUF_RBM_SW3_BM);
}
exit:
- return budget - quota;
+ return total_msdu_reaped;
}

static void ath11k_dp_rx_update_peer_stats(struct ath11k_sta *arsta,
--
2.7.4

2021-09-02 05:36:42

by P Praneesh

Subject: [PATCH v3 11/12] ath11k: add branch predictors in dp_tx path

Add branch prediction hints in the dp_tx code path, in the tx and tx
completion handlers. Also, in ath11k_dp_tx_complete_msdu(), the pointer
returned by rcu_dereference() is not dereferenced, so it is preferable
to use rcu_access_pointer() here.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01734-QCAHKSWPL_SILICONZ-1 v2

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/dp_tx.c | 54 +++++++++++++++------------------
drivers/net/wireless/ath/ath11k/mac.c | 2 +-
2 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/dp_tx.c b/drivers/net/wireless/ath/ath11k/dp_tx.c
index 8bba523..602184b 100644
--- a/drivers/net/wireless/ath/ath11k/dp_tx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_tx.c
@@ -95,11 +95,11 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
u8 ring_selector = 0, ring_map = 0;
bool tcl_ring_retry;

- if (test_bit(ATH11K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags))
+ if (unlikely(test_bit(ATH11K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags)))
return -ESHUTDOWN;

- if (!(info->flags & IEEE80211_TX_CTL_HW_80211_ENCAP) &&
- !ieee80211_is_data(hdr->frame_control))
+ if (unlikely(!(info->flags & IEEE80211_TX_CTL_HW_80211_ENCAP) &&
+ !ieee80211_is_data(hdr->frame_control)))
return -ENOTSUPP;

pool_id = skb_get_queue_mapping(skb) & (ATH11K_HW_MAX_QUEUES - 1);
@@ -130,7 +130,7 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
DP_TX_IDR_SIZE - 1, GFP_ATOMIC);
spin_unlock_bh(&tx_ring->tx_idr_lock);

- if (ret < 0) {
+ if (unlikely(ret < 0)) {
if (ring_map == (BIT(DP_TCL_NUM_RING_MAX) - 1)) {
atomic_inc(&ab->soc_stats.tx_err.misc_fail);
return -ENOSPC;
@@ -147,7 +147,7 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
ti.encap_type = ath11k_dp_tx_get_encap_type(arvif, skb);
ti.meta_data_flags = arvif->tcl_metadata;

- if (ti.encap_type == HAL_TCL_ENCAP_TYPE_RAW) {
+ if (unlikely(ti.encap_type == HAL_TCL_ENCAP_TYPE_RAW)) {
if (skb_cb->flags & ATH11K_SKB_CIPHER_SET) {
ti.encrypt_type =
ath11k_dp_tx_get_encrypt_type(skb_cb->cipher);
@@ -168,8 +168,8 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
ti.bss_ast_idx = arvif->ast_idx;
ti.dscp_tid_tbl_idx = 0;

- if (skb->ip_summed == CHECKSUM_PARTIAL &&
- ti.encap_type != HAL_TCL_ENCAP_TYPE_RAW) {
+ if (likely(skb->ip_summed == CHECKSUM_PARTIAL &&
+ ti.encap_type != HAL_TCL_ENCAP_TYPE_RAW)) {
ti.flags0 |= FIELD_PREP(HAL_TCL_DATA_CMD_INFO1_IP4_CKSUM_EN, 1) |
FIELD_PREP(HAL_TCL_DATA_CMD_INFO1_UDP4_CKSUM_EN, 1) |
FIELD_PREP(HAL_TCL_DATA_CMD_INFO1_UDP6_CKSUM_EN, 1) |
@@ -206,7 +206,7 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
}

ti.paddr = dma_map_single(ab->dev, skb->data, skb->len, DMA_TO_DEVICE);
- if (dma_mapping_error(ab->dev, ti.paddr)) {
+ if (unlikely(dma_mapping_error(ab->dev, ti.paddr))) {
atomic_inc(&ab->soc_stats.tx_err.misc_fail);
ath11k_warn(ab, "failed to DMA map data Tx buffer\n");
ret = -ENOMEM;
@@ -226,7 +226,7 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
ath11k_hal_srng_access_begin(ab, tcl_ring);

hal_tcl_desc = (void *)ath11k_hal_srng_src_get_next_entry(ab, tcl_ring);
- if (!hal_tcl_desc) {
+ if (unlikely(!hal_tcl_desc)) {
/* NOTE: It is highly unlikely we'll be running out of tcl_ring
* desc because the desc is directly enqueued onto hw queue.
*/
@@ -240,8 +240,8 @@ int ath11k_dp_tx(struct ath11k *ar, struct ath11k_vif *arvif,
* checking this ring earlier for each pkt tx.
* Restart ring selection if some rings are not checked yet.
*/
- if (ring_map != (BIT(DP_TCL_NUM_RING_MAX) - 1) &&
- !ar->ab->hw_params.tcl_0_only) {
+ if (unlikely(ring_map != (BIT(DP_TCL_NUM_RING_MAX) - 1) &&
+ !ar->ab->hw_params.tcl_0_only)) {
tcl_ring_retry = true;
ring_selector++;
}
@@ -322,7 +322,7 @@ ath11k_dp_tx_htt_tx_complete_buf(struct ath11k_base *ab,

spin_lock_bh(&tx_ring->tx_idr_lock);
msdu = idr_find(&tx_ring->txbuf_idr, ts->msdu_id);
- if (!msdu) {
+ if (unlikely(!msdu)) {
ath11k_warn(ab, "htt tx completion for unknown msdu_id %d\n",
ts->msdu_id);
spin_unlock_bh(&tx_ring->tx_idr_lock);
@@ -430,16 +430,14 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,

dma_unmap_single(ab->dev, skb_cb->paddr, msdu->len, DMA_TO_DEVICE);

- rcu_read_lock();
-
- if (!rcu_dereference(ab->pdevs_active[ar->pdev_idx])) {
+ if (unlikely(!rcu_access_pointer(ab->pdevs_active[ar->pdev_idx]))) {
dev_kfree_skb_any(msdu);
- goto exit;
+ return;
}

- if (!skb_cb->vif) {
+ if (unlikely(!skb_cb->vif)) {
dev_kfree_skb_any(msdu);
- goto exit;
+ return;
}

info = IEEE80211_SKB_CB(msdu);
@@ -460,7 +458,7 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
(info->flags & IEEE80211_TX_CTL_NO_ACK))
info->flags |= IEEE80211_TX_STAT_NOACK_TRANSMITTED;

- if (ath11k_debugfs_is_extd_tx_stats_enabled(ar)) {
+ if (unlikely(ath11k_debugfs_is_extd_tx_stats_enabled(ar))) {
if (ts->flags & HAL_TX_STATUS_FLAGS_FIRST_MSDU) {
if (ar->last_ppdu_id == 0) {
ar->last_ppdu_id = ts->ppdu_id;
@@ -489,9 +487,6 @@ static void ath11k_dp_tx_complete_msdu(struct ath11k *ar,
*/

ieee80211_tx_status(ar->hw, msdu);
-
-exit:
- rcu_read_unlock();
}

static inline void ath11k_dp_tx_status_parse(struct ath11k_base *ab,
@@ -500,11 +495,11 @@ static inline void ath11k_dp_tx_status_parse(struct ath11k_base *ab,
{
ts->buf_rel_source =
FIELD_GET(HAL_WBM_RELEASE_INFO0_REL_SRC_MODULE, desc->info0);
- if (ts->buf_rel_source != HAL_WBM_REL_SRC_MODULE_FW &&
- ts->buf_rel_source != HAL_WBM_REL_SRC_MODULE_TQM)
+ if (unlikely(ts->buf_rel_source != HAL_WBM_REL_SRC_MODULE_FW &&
+ ts->buf_rel_source != HAL_WBM_REL_SRC_MODULE_TQM))
return;

- if (ts->buf_rel_source == HAL_WBM_REL_SRC_MODULE_FW)
+ if (unlikely(ts->buf_rel_source == HAL_WBM_REL_SRC_MODULE_FW))
return;

ts->status = FIELD_GET(HAL_WBM_RELEASE_INFO0_TQM_RELEASE_REASON,
@@ -551,8 +546,9 @@ void ath11k_dp_tx_completion_handler(struct ath11k_base *ab, int ring_id)
ATH11K_TX_COMPL_NEXT(tx_ring->tx_status_head);
}

- if ((ath11k_hal_srng_dst_peek(ab, status_ring) != NULL) &&
- (ATH11K_TX_COMPL_NEXT(tx_ring->tx_status_head) == tx_ring->tx_status_tail)) {
+ if (unlikely((ath11k_hal_srng_dst_peek(ab, status_ring) != NULL) &&
+ (ATH11K_TX_COMPL_NEXT(tx_ring->tx_status_head) ==
+ tx_ring->tx_status_tail))) {
/* TODO: Process pending tx_status messages when kfifo_is_full() */
ath11k_warn(ab, "Unable to process some of the tx_status ring desc because status_fifo is full\n");
}
@@ -575,7 +571,7 @@ void ath11k_dp_tx_completion_handler(struct ath11k_base *ab, int ring_id)
mac_id = FIELD_GET(DP_TX_DESC_ID_MAC_ID, desc_id);
msdu_id = FIELD_GET(DP_TX_DESC_ID_MSDU_ID, desc_id);

- if (ts.buf_rel_source == HAL_WBM_REL_SRC_MODULE_FW) {
+ if (unlikely(ts.buf_rel_source == HAL_WBM_REL_SRC_MODULE_FW)) {
ath11k_dp_tx_process_htt_tx_complete(ab,
(void *)tx_status,
mac_id, msdu_id,
@@ -585,7 +581,7 @@ void ath11k_dp_tx_completion_handler(struct ath11k_base *ab, int ring_id)

spin_lock_bh(&tx_ring->tx_idr_lock);
msdu = idr_find(&tx_ring->txbuf_idr, msdu_id);
- if (!msdu) {
+ if (unlikely(!msdu)) {
ath11k_warn(ab, "tx completion for unknown msdu_id %d\n",
msdu_id);
spin_unlock_bh(&tx_ring->tx_idr_lock);
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c
index e9b3689..7c4bf51 100644
--- a/drivers/net/wireless/ath/ath11k/mac.c
+++ b/drivers/net/wireless/ath/ath11k/mac.c
@@ -4339,7 +4339,7 @@ static void ath11k_mac_op_tx(struct ieee80211_hw *hw,
}

ret = ath11k_dp_tx(ar, arvif, skb);
- if (ret) {
+ if (unlikely(ret)) {
ath11k_warn(ar->ab, "failed to transmit frame %d\n", ret);
ieee80211_free_txskb(ar->hw, skb);
}
--
2.7.4

2021-09-02 05:37:11

by P Praneesh

Subject: [PATCH v3 09/12] ath11k: remove mod operator in dst ring processing

Replace the mod operator with a manual wraparound
to avoid the additional cost of the mod operation.

Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01734-QCAHKSWPL_SILICONZ-1 v2

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/hal.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index f04edaf..7cf9e23 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -654,8 +654,11 @@ u32 *ath11k_hal_srng_dst_get_next_entry(struct ath11k_base *ab,

desc = srng->ring_base_vaddr + srng->u.dst_ring.tp;

- srng->u.dst_ring.tp = (srng->u.dst_ring.tp + srng->entry_size) %
- srng->ring_size;
+ srng->u.dst_ring.tp += srng->entry_size;
+
+ /* wrap around to start of ring*/
+ if (srng->u.dst_ring.tp == srng->ring_size)
+ srng->u.dst_ring.tp = 0;

/* Try to prefetch the next descriptor in the ring */
if (srng->flags & HAL_SRNG_FLAGS_CACHED)
--
2.7.4
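
The two ways of advancing the tail pointer can be compared in a small sketch. The function names are hypothetical and the ring geometry is passed in as plain integers. The equality check matches the mod form only under the assumption that ring_size is a whole multiple of entry_size and that tp starts on an entry boundary; otherwise tp could step past ring_size without ever being equal to it.

```c
#include <stdint.h>

/* Old form: advance and wrap with the mod operator. */
static uint32_t sketch_advance_mod(uint32_t tp, uint32_t entry_size,
				   uint32_t ring_size)
{
	return (tp + entry_size) % ring_size;
}

/* New form: advance, then wrap with a compare.
 * Assumes ring_size is a multiple of entry_size and tp is aligned.
 */
static uint32_t sketch_advance_wrap(uint32_t tp, uint32_t entry_size,
				    uint32_t ring_size)
{
	tp += entry_size;

	/* wrap around to start of ring */
	if (tp == ring_size)
		tp = 0;
	return tp;
}
```

With an entry size of 8 and a ring size of 64, both helpers return 0 when called with tp at 56 and return the same value at every other aligned position.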

2021-09-02 05:37:20

by P Praneesh

Subject: [PATCH v3 01/12] ath11k: disable unused CE8 interrupts for ipq8074

The host driver doesn't need to process CE8 interrupts (CE8 is used
by the target independently).

The volume of interrupts is huge within a short interval:
      CPU0       CPU1       CPU2       CPU3
  14022188          0          0          0   GIC  71 Edge  ce8

Hence disabling the unused CE8 interrupt will reduce CPU usage.

Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1.r2-00012-QCAHKSWPL_SILICONZ-1
Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01695-QCAHKSWPL_SILICONZ-1

Co-developed-by: Sriram R <[email protected]>
Signed-off-by: Sriram R <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: P Praneesh <[email protected]>
---
drivers/net/wireless/ath/ath11k/ce.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c
index de8b632..b6ffe03 100644
--- a/drivers/net/wireless/ath/ath11k/ce.c
+++ b/drivers/net/wireless/ath/ath11k/ce.c
@@ -77,7 +77,7 @@ const struct ce_attr ath11k_host_ce_config_ipq8074[] = {

/* CE8: target autonomous hif_memcpy */
{
- .flags = CE_ATTR_FLAGS,
+ .flags = CE_ATTR_FLAGS | CE_ATTR_DIS_INTR,
.src_nentries = 0,
.src_sz_max = 0,
.dest_nentries = 0,
--
2.7.4
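
A minimal sketch of how an attribute flag like this can gate interrupt setup is shown below. Everything in it is an assumption made for illustration: the SKETCH_ constants are placeholder values rather than the definitions from ce.h, and sketch_ce_wants_irq() is a hypothetical helper, not a function from the driver.

```c
#include <stdbool.h>

/* Placeholder values for this sketch only, not taken from ce.h. */
#define SKETCH_CE_ATTR_FLAGS    0u
#define SKETCH_CE_ATTR_DIS_INTR 8u

/* Hypothetical, trimmed-down copy engine attributes. */
struct sketch_ce_attr {
	unsigned int flags;
};

/* A copy engine needs an interrupt unless the disable bit is set. */
static bool sketch_ce_wants_irq(const struct sketch_ce_attr *attr)
{
	return !(attr->flags & SKETCH_CE_ATTR_DIS_INTR);
}
```

OR-ing the disable bit into the flags, as the one-line change to the CE8 entry does, flips the answer of such a check from true to false without touching the other attributes.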

2021-11-12 13:08:07

by Kalle Valo

Subject: Re: [PATCH v3 00/12] ath11k: optimizations in data path

P Praneesh <[email protected]> writes:

> This patchset covers optimizations in rx (first 7 patches)
> and tx (remaining 5 patches) data path.
>
> Running UDP DL/UL traffic on IPQ8074 5G radio showed an average 5-10%
> improvement on a 4 core platform

These had multiple conflicts but luckily they were relatively easy to
fix. But please do check my changes in the pending branch:

https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/ath.git/log/?h=pending

Here's where I had conflicts:

Applying: ath11k: allocate dst ring descriptors from cacheable memory
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/core.c
M drivers/net/wireless/ath/ath11k/dp.c
M drivers/net/wireless/ath/ath11k/dp.h
M drivers/net/wireless/ath/ath11k/hw.h
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/hw.h
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/hw.h
Auto-merging drivers/net/wireless/ath/ath11k/dp.h
Auto-merging drivers/net/wireless/ath/ath11k/dp.c
Auto-merging drivers/net/wireless/ath/ath11k/core.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/core.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/core.c'
Recorded preimage for 'drivers/net/wireless/ath/ath11k/hw.h'
error: Failed to merge in the changes.
Patch failed at 0002 ath11k: allocate dst ring descriptors from cacheable memory


Applying: ath11k: modify dp_rx desc access wrapper calls inline
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_rx.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/dp_rx.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/dp_rx.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/dp_rx.c'
error: Failed to merge in the changes.
Patch failed at 0003 ath11k: modify dp_rx desc access wrapper calls inline

Applying: ath11k: avoid additional access to ath11k_hal_srng_dst_num_free
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_rx.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/dp_rx.c
Applying: ath11k: avoid active pdev check for each msdu
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_rx.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/dp_rx.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/dp_rx.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/dp_rx.c'
error: Failed to merge in the changes.
Patch failed at 0005 ath11k: avoid active pdev check for each msdu

Applying: ath11k: remove usage quota while processing rx packets
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_rx.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/dp_rx.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/dp_rx.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/dp_rx.c'
error: Failed to merge in the changes.
Patch failed at 0006 ath11k: remove usage quota while processing rx packets

Applying: ath11k: add branch predictors in process_rx
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_rx.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/dp_rx.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/dp_rx.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/dp_rx.c'
error: Failed to merge in the changes.
Patch failed at 0007 ath11k: add branch predictors in process_rx

Applying: ath11k: avoid while loop in ring selection of tx completion interrupt
error: sha1 information is lacking or useless (drivers/net/wireless/ath/ath11k/dp.c).
error: could not build fake ancestor
Patch failed at 0010 ath11k: avoid while loop in ring selection of tx completion interrupt

Applying: ath11k: add branch predictors in dp_tx path
Using index info to reconstruct a base tree...
M drivers/net/wireless/ath/ath11k/dp_tx.c
M drivers/net/wireless/ath/ath11k/mac.c
Falling back to patching base and 3-way merge...
Auto-merging drivers/net/wireless/ath/ath11k/mac.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/mac.c
Auto-merging drivers/net/wireless/ath/ath11k/dp_tx.c
CONFLICT (content): Merge conflict in drivers/net/wireless/ath/ath11k/dp_tx.c
Recorded preimage for 'drivers/net/wireless/ath/ath11k/dp_tx.c'
Recorded preimage for 'drivers/net/wireless/ath/ath11k/mac.c'
error: Failed to merge in the changes.
Patch failed at 0011 ath11k: add branch predictors in dp_tx path

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2021-11-15 09:29:12

by Kalle Valo

Subject: Re: [PATCH v3 01/12] ath11k: disable unused CE8 interrupts for ipq8074

P Praneesh <[email protected]> wrote:

> Host driver doesn't need to process CE8 interrupts (used
> by target independently)
>
> The volume of interrupts is huge within short interval,
> CPU0 CPU1 CPU2 CPU3
> 14022188 0 0 0 GIC 71 Edge ce8
>
> Hence disabling unused CE8 interrupt will improve CPU usage.
>
> Tested-on: QCN9074 hw1.0 PCI WLAN.HK.2.4.0.1.r2-00012-QCAHKSWPL_SILICONZ-1
> Tested-on: IPQ8074 hw2.0 AHB WLAN.HK.2.4.0.1-01695-QCAHKSWPL_SILICONZ-1
>
> Co-developed-by: Sriram R <[email protected]>
> Signed-off-by: Sriram R <[email protected]>
> Signed-off-by: Jouni Malinen <[email protected]>
> Signed-off-by: P Praneesh <[email protected]>
> Signed-off-by: Kalle Valo <[email protected]>

12 patches applied to ath-next branch of ath.git, thanks.

2c5545bfa29d ath11k: disable unused CE8 interrupts for ipq8074
6452f0a3d565 ath11k: allocate dst ring descriptors from cacheable memory
5e76fe03dbf9 ath11k: modify dp_rx desc access wrapper calls inline
a1775e732eb9 ath11k: avoid additional access to ath11k_hal_srng_dst_num_free
c4d12cb37ea2 ath11k: avoid active pdev check for each msdu
db2ecf9f0567 ath11k: remove usage quota while processing rx packets
400588039a17 ath11k: add branch predictors in process_rx
d0e2523bfa9c ath11k: allocate HAL_WBM2SW_RELEASE ring from cacheable memory
a8508bf7ced2 ath11k: remove mod operator in dst ring processing
cbfbed495d32 ath11k: avoid while loop in ring selection of tx completion interrupt
bcef57ea400c ath11k: add branch predictors in dp_tx path
be8867cb4765 ath11k: avoid unnecessary lock contention in tx_completion path

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches