2013-03-08 08:43:05

by Raja Mani

Subject: [PATCH] ath6kl: Add rx workqueue for SDIO

In the current implementation, all RX packets (data and control)
are fetched from the chip and processed in the ISR handler context.
The ISR handler has to finish processing the fetched packets before
it can go ahead and fetch further packets from the chip. In high
throughput scenarios, doing everything (read and process) in one
context is time consuming and is not the quickest way of reading
RX packets from the chip.

This patch lets the ISR read packets (data and control) from the
chip and process only control packets (EP0 packets). All data packets
are moved to another queue which is processed separately by a worker
thread, so the ISR doesn't have to wait for the fetched data packets
to be processed before reading further packets. With this change,
significant down-link throughput improvement is seen in both the TCP
(around 10 Mbps) and UDP (around 5 Mbps) cases.

Signed-off-by: Pandiyarajan Pitchaimuthu <[email protected]>
Signed-off-by: Raja Mani <[email protected]>
---
drivers/net/wireless/ath/ath6kl/htc.h | 7 ++++
drivers/net/wireless/ath/ath6kl/htc_mbox.c | 50 ++++++++++++++++++++++++++++
2 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/drivers/net/wireless/ath/ath6kl/htc.h b/drivers/net/wireless/ath/ath6kl/htc.h
index a2c8ff8..485cd89 100644
--- a/drivers/net/wireless/ath/ath6kl/htc.h
+++ b/drivers/net/wireless/ath/ath6kl/htc.h
@@ -620,6 +620,13 @@ struct htc_target {
 	/* counts the number of Tx without bundling continously per AC */
 	u32 ac_tx_count[WMM_NUM_AC];
 
+	/* protects rx_bufq */
+	spinlock_t rx_comp_lock;
+
+	struct workqueue_struct *rx_wq;
+	struct work_struct rx_work;
+	struct list_head rx_bufq;
+
 	struct {
 		struct htc_packet *htc_packet_pool;
 		u8 ctrl_response_buf[HTC_MAX_CTRL_MSG_LEN];
diff --git a/drivers/net/wireless/ath/ath6kl/htc_mbox.c b/drivers/net/wireless/ath/ath6kl/htc_mbox.c
index fbb78df..6cd8213 100644
--- a/drivers/net/wireless/ath/ath6kl/htc_mbox.c
+++ b/drivers/net/wireless/ath/ath6kl/htc_mbox.c
@@ -1897,13 +1897,50 @@ fail_rx:
 	return status;
 }
 
+static void ath6kl_htc_rx_work(struct work_struct *work)
+{
+	struct htc_target *target;
+	struct htc_packet *packet, *tmp_pkt;
+	struct htc_endpoint *endpoint;
+
+	target = container_of(work, struct htc_target, rx_work);
+
+	spin_lock_bh(&target->rx_comp_lock);
+	list_for_each_entry_safe(packet, tmp_pkt, &target->rx_bufq, list) {
+		list_del(&packet->list);
+
+		spin_unlock_bh(&target->rx_comp_lock);
+		endpoint = &target->endpoint[packet->endpoint];
+
+		ath6kl_dbg(ATH6KL_DBG_HTC,
+			   "htc rx complete ep %d packet 0x%p\n",
+			   endpoint->eid, packet);
+
+		endpoint->ep_cb.rx(endpoint->target, packet);
+		spin_lock_bh(&target->rx_comp_lock);
+	}
+	spin_unlock_bh(&target->rx_comp_lock);
+}
+
 static void ath6kl_htc_rx_complete(struct htc_endpoint *endpoint,
 				   struct htc_packet *packet)
 {
+	struct htc_target *target;
+
+	if (endpoint->eid == ENDPOINT_0) {
 	ath6kl_dbg(ATH6KL_DBG_HTC,
 		   "htc rx complete ep %d packet 0x%p\n",
 		   endpoint->eid, packet);
 	endpoint->ep_cb.rx(endpoint->target, packet);
+	} else {
+		target = endpoint->target;
+
+		spin_lock_bh(&target->rx_comp_lock);
+		list_add_tail(&packet->list, &target->rx_bufq);
+		spin_unlock_bh(&target->rx_comp_lock);
+
+		queue_work(target->rx_wq, &target->rx_work);
+	}
 }
 
 static int ath6kl_htc_rx_bundle(struct htc_target *target,
@@ -2852,13 +2889,24 @@ static void *ath6kl_htc_mbox_create(struct ath6kl *ar)
 		goto err_htc_cleanup;
 	}
 
+	target->rx_wq = create_singlethread_workqueue("htc_rx");
+	if (!target->rx_wq) {
+		ath6kl_err("unable to create rx_wq workqueue\n");
+		status = -ENOMEM;
+		goto err_htc_cleanup;
+	}
+
 	spin_lock_init(&target->htc_lock);
 	spin_lock_init(&target->rx_lock);
 	spin_lock_init(&target->tx_lock);
+	spin_lock_init(&target->rx_comp_lock);
 
 	INIT_LIST_HEAD(&target->free_ctrl_txbuf);
 	INIT_LIST_HEAD(&target->free_ctrl_rxbuf);
 	INIT_LIST_HEAD(&target->cred_dist_list);
+	INIT_LIST_HEAD(&target->rx_bufq);
+
+	INIT_WORK(&target->rx_work, ath6kl_htc_rx_work);
 
 	target->dev->ar = ar;
 	target->dev->htc_cnxt = target;
@@ -2885,6 +2933,8 @@ static void ath6kl_htc_mbox_cleanup(struct htc_target *target)
 {
 	struct htc_packet *packet, *tmp_packet;
 
+	destroy_workqueue(target->rx_wq);
+
 	ath6kl_hif_cleanup_scatter(target->dev->ar);
 
 	list_for_each_entry_safe(packet, tmp_packet,
--
1.7.1



2013-03-09 15:33:21

by Adrian Chadd

Subject: Re: [PATCH] ath6kl: Add rx workqueue for SDIO

On 9 March 2013 01:27, Kalle Valo <[email protected]> wrote:

> I think to fix the race the queue should be consumed something like
> this (pseudo code):
>
> lock()
> while (!list_empty()) {
>         entry = list_first_entry()
>         unlock()
>
>         do_stuff()
>
>         lock()
> }

Well, sure, but you should then re-check all the state that may have
changed when you reacquire that lock. :-)

Another way of doing it is:

init temp list
lock()
copy list to temp list
unlock()
while (!temp list empty) {
        do_stuff()
}

That way you don't have that constant unlock/relock going on.
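
For concreteness, an untested sketch of that against the names in
Raja's patch, using list_splice_init() to move the whole queue onto a
private list in O(1):

static void ath6kl_htc_rx_work(struct work_struct *work)
{
        struct htc_target *target;
        struct htc_packet *packet, *tmp_pkt;
        struct htc_endpoint *endpoint;
        LIST_HEAD(tmp_q);

        target = container_of(work, struct htc_target, rx_work);

        /* grab everything queued so far, then drop the lock */
        spin_lock_bh(&target->rx_comp_lock);
        list_splice_init(&target->rx_bufq, &tmp_q);
        spin_unlock_bh(&target->rx_comp_lock);

        /* complete the packets without holding the lock */
        list_for_each_entry_safe(packet, tmp_pkt, &tmp_q, list) {
                list_del(&packet->list);
                endpoint = &target->endpoint[packet->endpoint];
                endpoint->ep_cb.rx(endpoint->target, packet);
        }
}

Anything the ISR queues while this runs just triggers another
queue_work() pass, so nothing is lost.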


Adrian

2013-03-09 09:27:57

by Kalle Valo

Subject: Re: [PATCH] ath6kl: Add rx workqueue for SDIO

Raja Mani <[email protected]> writes:

> In the current implementation, all RX packets (data and control)
> are fetched from the chip and processed in the ISR handler context.
> The ISR handler has to finish processing the fetched packets before
> it can go ahead and fetch further packets from the chip. In high
> throughput scenarios, doing everything (read and process) in one
> context is time consuming and is not the quickest way of reading
> RX packets from the chip.
>
> This patch lets the ISR read packets (data and control) from the
> chip and process only control packets (EP0 packets). All data packets
> are moved to another queue which is processed separately by a worker
> thread, so the ISR doesn't have to wait for the fetched data packets
> to be processed before reading further packets. With this change,
> significant down-link throughput improvement is seen in both the TCP
> (around 10 Mbps) and UDP (around 5 Mbps) cases.
>
> Signed-off-by: Pandiyarajan Pitchaimuthu <[email protected]>
> Signed-off-by: Raja Mani <[email protected]>

Sorry, I think I missed this in our internal review:

> +static void ath6kl_htc_rx_work(struct work_struct *work)
> +{
> +	struct htc_target *target;
> +	struct htc_packet *packet, *tmp_pkt;
> +	struct htc_endpoint *endpoint;
> +
> +	target = container_of(work, struct htc_target, rx_work);
> +
> +	spin_lock_bh(&target->rx_comp_lock);
> +	list_for_each_entry_safe(packet, tmp_pkt, &target->rx_bufq, list) {
> +		list_del(&packet->list);
> +
> +		spin_unlock_bh(&target->rx_comp_lock);
> +		endpoint = &target->endpoint[packet->endpoint];
> +
> +		ath6kl_dbg(ATH6KL_DBG_HTC,
> +			   "htc rx complete ep %d packet 0x%p\n",
> +			   endpoint->eid, packet);
> +
> +		endpoint->ep_cb.rx(endpoint->target, packet);
> +		spin_lock_bh(&target->rx_comp_lock);
> +	}
> +	spin_unlock_bh(&target->rx_comp_lock);

I think there's a (theoretical?) race here. You cannot consume rx_bufq
with a for loop and then release rx_comp_lock inside the loop: nothing
prevents rx_bufq from being modified while the lock is released. OK, in
practice it won't be a problem, as the ISR only adds entries to the end
of the list.

I think to fix the race the queue should be consumed something like
this (pseudo code):

lock()
while (!list_empty()) {
        entry = list_first_entry()
        unlock()

        do_stuff()

        lock()
}
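
Or in C, roughly (untested, reusing the names from the patch):

spin_lock_bh(&target->rx_comp_lock);
while (!list_empty(&target->rx_bufq)) {
        packet = list_first_entry(&target->rx_bufq,
                                  struct htc_packet, list);
        list_del(&packet->list);
        spin_unlock_bh(&target->rx_comp_lock);

        endpoint = &target->endpoint[packet->endpoint];
        endpoint->ep_cb.rx(endpoint->target, packet);

        spin_lock_bh(&target->rx_comp_lock);
}
spin_unlock_bh(&target->rx_comp_lock);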

Can someone else comment? Am I on the right track?

I so wish sdio.c and htc would use struct sk_buff. It would have all the
infrastructure for this :/
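
For example, with a struct sk_buff_head the locking would come for
free, since skb_queue_tail() and skb_dequeue() take the queue's own
lock (hypothetical here, as htc hands around struct htc_packet):

struct sk_buff_head rx_bufq;
struct sk_buff *skb;

skb_queue_head_init(&rx_bufq);

/* producer (ISR) side */
skb_queue_tail(&rx_bufq, skb);

/* consumer (worker) side; skb_dequeue() returns NULL when empty */
while ((skb = skb_dequeue(&rx_bufq)) != NULL)
        ep_rx_complete(skb);    /* hypothetical completion handler */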

BTW, for consistency please also rename rx_comp_lock to rx_bufq_lock.


--
Kalle Valo