Hi,
This patchset addresses some throughput issues.
Pozdrawiam / Best regards,
Michal Kazior.
Michal Kazior (3):
ath10k: make the workqueue multithreaded
ath10k: move htt rx processing to worker
ath10k: fix issues on non-preemptible systems
drivers/net/wireless/ath/ath10k/core.c | 4 ++-
drivers/net/wireless/ath/ath10k/core.h | 7 ++++
drivers/net/wireless/ath/ath10k/htc.c | 4 +++
drivers/net/wireless/ath/ath10k/htt.h | 3 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 58 ++++++++++++++++++++++++++----
drivers/net/wireless/ath/ath10k/wmi.c | 4 +++
6 files changed, 72 insertions(+), 8 deletions(-)
--
1.7.9.5
On 28 August 2013 06:02, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>> There's another solution that I had in mind. Instead of:
>>
>> for (;;) { dequeue(z); process; }
>>
>> I did:
>>
>> q = dequeue_all(z); for (;;) { dequeue(q); process; }
>>
>> I.e. move all queued stuff at the worker entry and move it out of the
>> global queue, that can, and will be, having more stuff queued while
>> the worker does its job).
>>
>> This way workers would exit/restart more often, but from what I tested
>> it didn't really solve the problem. Given enough traffic HTC worker
>> responsible for HTT TX gets overwhelmed eventually. You could try
>> limit how many frames a worker can process during one execution, but
>> how do you guess that? This starvation depends on how fast your CPU
>> is.
>
> I think we should come up with better ways to handle this. To have a
> quota (like you mention above) would be one option. Other option would
> be to have multiple queues and some sort of priorisation or fair
> queueing.
Having quota will not help here in any way. You can re-queue a worker
after each single frame and avoid WMI starvation, however you can
still starve the rest of the system (and that can lead to system reset
via watchdog). I'm also unsure about the overhead queueing a work may
have (on a uP system without preemption in might be negligible, but
what about other systems?), so you'd have to guess the quota size or
else you'd get increased latency/overhead and perhaps slower
performance.
I believe cond_resched is a solution, not a workaround. Slow systems
without preemption need this. I wonder how other drivers got around
it? Or perhaps none of the other drivers had to deal with really
insufficient number of CPU cycles versus lots of work.
We could perhaps move workers out of HTC and have a single TX worker
in core.c for both WMI and HTT that would prioritize WMI, before
trying HTT. This could help guarantee that all beacons (which go
through WMI) are sent in a timely fashion in response to SWBA event.
But that won't fix the overall system starvation.
> And most importantly of all, we should minimise the lenght of queue we
> have inside ath10k. I'm worried that we queue way too many packets
> within ath10k right now.
Felix pointed that out quite some time ago. I would agree but I'm
affraid you'll hurt performance if you decrease the queue depth. There
seems to be some kind of latency thing going on (either on host, or on
firmware/hardware, or both combined). I tried decreasing HTT TX ring
buffer from 2048 to 256. In result on AP135 UDP TX got trimmed at
~330mbps max. Stuffing more throughput even left some idle CPU cycles.
If you consider 3x3 devices that are supposed to get you 1.3gbps, then
you apparently need that 2048 depth.
Michał.
On 27 August 2013 09:06, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> Workers may not call into a sleepable function
>> (e.g. mutex_lock). Since ath10k workers can run
>> for a very long time it is necessary to explicitly
>> allow process rescheduling in case there's no
>> preemption.
>>
>> This fixes some issues with system freezes, hangs,
>> watchdogs being triggered, userspace being
>> unresponsive on slow host machines under heavy
>> load.
>
> I consider this more as a workaround as a real fix. Would NAPI be a
> proper fix for issues like this?
>
> NAPI support was removed from mac80211 six months ago, but I guess we
> could try to get it back if we have a good reason:
>
> http://marc.info/?l=linux-wireless&m=136204135907491
Hmm. From what I understand NAPI is used for RX polling. My patch
addresses mainly WMI/HTT TX starvation.
There's another solution that I had in mind. Instead of:
for (;;) { dequeue(z); process; }
I did:
q = dequeue_all(z); for (;;) { dequeue(q); process; }
I.e. move all queued stuff at the worker entry and move it out of the
global queue, that can, and will be, having more stuff queued while
the worker does its job).
This way workers would exit/restart more often, but from what I tested
it didn't really solve the problem. Given enough traffic HTC worker
responsible for HTT TX gets overwhelmed eventually. You could try
limit how many frames a worker can process during one execution, but
how do you guess that? This starvation depends on how fast your CPU
is.
Thus I opted for sole cond_resched().
>> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
>> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
>> @@ -1229,6 +1229,10 @@ static void ath10k_htt_rx_work(struct work_struct *work)
>> break;
>>
>> ath10k_htt_rx_process_skb(htt->ar, skb);
>> +
>> +#ifndef CONFIG_PREEMPT
>> + cond_resched();
>> +#endif
>
> Why the #ifndef? Why should we not call cond_resched() when PREEMPT is
> enabled? Does something negative happen then?
I think it should be okay. I saw the #ifndef thing in b43legacy and
thought it simply makes sense (although it's unsightly).
Michał.
Michal Kazior <[email protected]> writes:
> On 28 August 2013 06:23, Kalle Valo <[email protected]> wrote:
>> Michal Kazior <[email protected]> writes:
>>
>>> According to mac80211 docs tx and rx reporting
>>> routines must not run concurrently unless they are
>>> _irqsafe variants.
>>>
>>> This doesn't fix any visible bug but is apparently
>>> the right thing to do as per the documentation.
>>>
>>> Signed-off-by: Michal Kazior <[email protected]>
>>
>> I had a quick chat with Johannes and Sujith about this. The concurrency
>> requirements are for the STA PS race in AP mode (see
>> ieee80211_handle_filtered_frame()).
>>
>> I think we should drop this frame, revisit later and properly analyse
"drop this frame". Heh, of course I was trying to say "drop this
patch" :)
>> how to fix the race. But it would be good add this to the todo list:
>>
>> http://wireless.kernel.org/en/users/Drivers/ath10k#TODO
>
> I've added this to the TODO list.
Thanks!
--
Kalle Valo
Michal Kazior <[email protected]> writes:
> Workers may not call into a sleepable function
> (e.g. mutex_lock). Since ath10k workers can run
> for a very long time it is necessary to explicitly
> allow process rescheduling in case there's no
> preemption.
>
> This fixes some issues with system freezes, hangs,
> watchdogs being triggered, userspace being
> unresponsive on slow host machines under heavy
> load.
I consider this more as a workaround as a real fix. Would NAPI be a
proper fix for issues like this?
NAPI support was removed from mac80211 six months ago, but I guess we
could try to get it back if we have a good reason:
http://marc.info/?l=linux-wireless&m=136204135907491
> --- a/drivers/net/wireless/ath/ath10k/htt_rx.c
> +++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
> @@ -1229,6 +1229,10 @@ static void ath10k_htt_rx_work(struct work_struct *work)
> break;
>
> ath10k_htt_rx_process_skb(htt->ar, skb);
> +
> +#ifndef CONFIG_PREEMPT
> + cond_resched();
> +#endif
Why the #ifndef? Why should we not call cond_resched() when PREEMPT is
enabled? Does something negative happen then?
--
Kalle Valo
This improves rx performance by a significant
value (280mbps -> 350mbps UDP) on AP135.
Note: it is not safe to move HTT tx completion
handling to a worker yet as it must be serialized
against HTC tx completions.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.h | 4 ++-
drivers/net/wireless/ath/ath10k/htt.h | 3 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 54 ++++++++++++++++++++++++++----
3 files changed, 53 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index e196833..7d5b71b 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -42,8 +42,10 @@
#define ATH10K_NUM_HTC_TX_WORKERS ATH10K_HTC_EP_COUNT
#define ATH10K_NUM_WMI_RX_WORKERS 1
+#define ATH10K_NUM_HTT_RX_WORKERS 1
#define ATH10K_MAX_NUM_PARALLEL_WORKERS (ATH10K_NUM_HTC_TX_WORKERS + \
- ATH10K_NUM_WMI_RX_WORKERS)
+ ATH10K_NUM_WMI_RX_WORKERS + \
+ ATH10K_NUM_HTT_RX_WORKERS)
/* Antenna noise floor */
#define ATH10K_DEFAULT_NOISE_FLOOR -95
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 26c78a9..8abbf2c 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1268,6 +1268,9 @@ struct ath10k_htt {
/* set if host-fw communication goes haywire
* used to avoid further failures */
bool rx_confused;
+
+ struct work_struct rx_work;
+ struct sk_buff_head rx_queue;
};
#define RX_HTT_HDR_STATUS_LEN 64
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 9bb0ae89..6cf4d95 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -40,6 +40,10 @@
/* when under memory pressure rx ring refill may fail and needs a retry */
#define HTT_RX_RING_REFILL_RETRY_MS 50
+
+static void ath10k_htt_rx_work(struct work_struct *work);
+
+
static int ath10k_htt_rx_ring_size(struct ath10k_htt *htt)
{
int size;
@@ -211,6 +215,8 @@ void ath10k_htt_rx_detach(struct ath10k_htt *htt)
{
int sw_rd_idx = htt->rx_ring.sw_rd_idx.msdu_payld;
+ skb_queue_purge(&htt->rx_queue);
+ cancel_work_sync(&htt->rx_work);
del_timer_sync(&htt->rx_ring.refill_retry_timer);
while (sw_rd_idx != __le32_to_cpu(*(htt->rx_ring.alloc_idx.vaddr))) {
@@ -501,6 +507,9 @@ int ath10k_htt_rx_attach(struct ath10k_htt *htt)
if (__ath10k_htt_rx_ring_fill_n(htt, htt->rx_ring.fill_level))
goto err_fill_ring;
+ INIT_WORK(&htt->rx_work, ath10k_htt_rx_work);
+ skb_queue_head_init(&htt->rx_queue);
+
ath10k_dbg(ATH10K_DBG_HTT, "HTT RX ring size: %d, fill_level: %d\n",
htt->rx_ring.size, htt->rx_ring.fill_level);
return 0;
@@ -1083,17 +1092,11 @@ end:
}
}
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+static void ath10k_htt_rx_process_skb(struct ath10k *ar, struct sk_buff *skb)
{
struct ath10k_htt *htt = &ar->htt;
struct htt_resp *resp = (struct htt_resp *)skb->data;
- /* confirm alignment */
- if (!IS_ALIGNED((unsigned long)skb->data, 4))
- ath10k_warn("unaligned htt message, expect trouble\n");
-
- ath10k_dbg(ATH10K_DBG_HTT, "HTT RX, msg_type: 0x%0X\n",
- resp->hdr.msg_type);
switch (resp->hdr.msg_type) {
case HTT_T2H_MSG_TYPE_VERSION_CONF: {
htt->target_version_major = resp->ver_resp.major;
@@ -1214,3 +1217,40 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
/* Free the indication buffer */
dev_kfree_skb_any(skb);
}
+
+static void ath10k_htt_rx_work(struct work_struct *work)
+{
+ struct ath10k_htt *htt = container_of(work, struct ath10k_htt, rx_work);
+ struct sk_buff *skb;
+
+ for (;;) {
+ skb = skb_dequeue(&htt->rx_queue);
+ if (!skb)
+ break;
+
+ ath10k_htt_rx_process_skb(htt->ar, skb);
+ }
+}
+
+void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+{
+ struct ath10k_htt *htt = &ar->htt;
+ struct htt_resp *resp = (struct htt_resp *)skb->data;
+
+ /* confirm alignment */
+ if (!IS_ALIGNED((unsigned long)skb->data, 4))
+ ath10k_warn("unaligned htt message, expect trouble\n");
+
+ ath10k_dbg(ATH10K_DBG_HTT, "HTT RX, msg_type: 0x%0X\n",
+ resp->hdr.msg_type);
+ switch (resp->hdr.msg_type) {
+ case HTT_T2H_MSG_TYPE_RX_FRAG_IND:
+ case HTT_T2H_MSG_TYPE_RX_IND:
+ skb_queue_tail(&htt->rx_queue, skb);
+ queue_work(ar->workqueue, &htt->rx_work);
+ break;
+ default:
+ ath10k_htt_rx_process_skb(ar, skb);
+ break;
+ }
+}
--
1.7.9.5
Michal Kazior <[email protected]> writes:
> On 27 August 2013 09:06, Kalle Valo <[email protected]> wrote:
>
>> I consider this more as a workaround as a real fix. Would NAPI be a
>> proper fix for issues like this?
>>
>> NAPI support was removed from mac80211 six months ago, but I guess we
>> could try to get it back if we have a good reason:
>>
>> http://marc.info/?l=linux-wireless&m=136204135907491
>
> Hmm. From what I understand NAPI is used for RX polling. My patch
> addresses mainly WMI/HTT TX starvation.
I thought I saw something related to NAPI on TX as well, but I can't
find it anymore.
> There's another solution that I had in mind. Instead of:
>
> for (;;) { dequeue(z); process; }
>
> I did:
>
> q = dequeue_all(z); for (;;) { dequeue(q); process; }
>
> I.e. move all queued stuff at the worker entry and move it out of the
> global queue, that can, and will be, having more stuff queued while
> the worker does its job).
>
> This way workers would exit/restart more often, but from what I tested
> it didn't really solve the problem. Given enough traffic HTC worker
> responsible for HTT TX gets overwhelmed eventually. You could try
> limit how many frames a worker can process during one execution, but
> how do you guess that? This starvation depends on how fast your CPU
> is.
I think we should come up with better ways to handle this. To have a
quota (like you mention above) would be one option. Other option would
be to have multiple queues and some sort of priorisation or fair
queueing.
And most importantly of all, we should minimise the lenght of queue we
have inside ath10k. I'm worried that we queue way too many packets
within ath10k right now.
> Thus I opted for sole cond_resched().
Yeah, this is a good workaround for the problem. But it would be good to
get bottom of this as well.
>>> +#ifndef CONFIG_PREEMPT
>>> + cond_resched();
>>> +#endif
>>
>> Why the #ifndef? Why should we not call cond_resched() when PREEMPT is
>> enabled? Does something negative happen then?
>
> I think it should be okay. I saw the #ifndef thing in b43legacy and
> thought it simply makes sense (although it's unsightly).
For various reasons I would prefer to have cond_resched() without the
#ifndef, if possible.
--
Kalle Valo
Michal Kazior <[email protected]> writes:
> According to mac80211 docs tx and rx reporting
> routines must not run concurrently unless they are
> _irqsafe variants.
>
> This doesn't fix any visible bug but is apparently
> the right thing to do as per the documentation.
>
> Signed-off-by: Michal Kazior <[email protected]>
I had a quick chat with Johannes and Sujith about this. The concurrency
requirements are for the STA PS race in AP mode (see
ieee80211_handle_filtered_frame()).
I think we should drop this frame, revisit later and properly analyse
how to fix the race. But it would be good add this to the todo list:
http://wireless.kernel.org/en/users/Drivers/ath10k#TODO
--
Kalle Valo
TX work for a HTC endpoint can run for long
periods of time, especially on slower host
machines when under heavy load.
This patch prevents one endpoint starving another
one by allowing each endpoint worker to be
processed in parallel.
This is safe since none of ar->workqueue users
depend on sequential execution.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.c | 4 +++-
drivers/net/wireless/ath/ath10k/core.h | 5 +++++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 04c132e..a828584 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -520,7 +520,9 @@ struct ath10k *ath10k_core_create(void *hif_priv, struct device *dev,
setup_timer(&ar->scan.timeout, ath10k_reset_scan, (unsigned long)ar);
- ar->workqueue = create_singlethread_workqueue("ath10k_wq");
+ ar->workqueue = alloc_workqueue("ath10k_wq",
+ WQ_UNBOUND | WQ_MEM_RECLAIM,
+ ATH10K_MAX_NUM_PARALLEL_WORKERS);
if (!ar->workqueue)
goto err_wq;
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index ab05c4c..e196833 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -40,6 +40,11 @@
#define ATH10K_FLUSH_TIMEOUT_HZ (5*HZ)
#define ATH10K_NUM_CHANS 38
+#define ATH10K_NUM_HTC_TX_WORKERS ATH10K_HTC_EP_COUNT
+#define ATH10K_NUM_WMI_RX_WORKERS 1
+#define ATH10K_MAX_NUM_PARALLEL_WORKERS (ATH10K_NUM_HTC_TX_WORKERS + \
+ ATH10K_NUM_WMI_RX_WORKERS)
+
/* Antenna noise floor */
#define ATH10K_DEFAULT_NOISE_FLOOR -95
--
1.7.9.5
This improves rx performance by a significant
value (280mbps -> 350mbps UDP) on AP135.
Note: it is not safe to move HTT tx completion
handling to a worker yet as it must be serialized
against HTC tx completions.
Signed-off-by: Michal Kazior <[email protected]>
---
v2:
* use ieee80211_rx_ni()
drivers/net/wireless/ath/ath10k/core.h | 4 ++-
drivers/net/wireless/ath/ath10k/htt.h | 3 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 54 ++++++++++++++++++++++++++----
drivers/net/wireless/ath/ath10k/txrx.c | 2 +-
4 files changed, 54 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 17694ec..b5f8df0 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -42,8 +42,10 @@
#define ATH10K_NUM_HTC_TX_WORKERS ATH10K_HTC_EP_COUNT
#define ATH10K_NUM_WMI_RX_WORKERS 1
+#define ATH10K_NUM_HTT_RX_WORKERS 1
#define ATH10K_MAX_NUM_PARALLEL_WORKERS (ATH10K_NUM_HTC_TX_WORKERS + \
- ATH10K_NUM_WMI_RX_WORKERS)
+ ATH10K_NUM_WMI_RX_WORKERS + \
+ ATH10K_NUM_HTT_RX_WORKERS)
/* Antenna noise floor */
#define ATH10K_DEFAULT_NOISE_FLOOR -95
diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
index 26c78a9..8abbf2c 100644
--- a/drivers/net/wireless/ath/ath10k/htt.h
+++ b/drivers/net/wireless/ath/ath10k/htt.h
@@ -1268,6 +1268,9 @@ struct ath10k_htt {
/* set if host-fw communication goes haywire
* used to avoid further failures */
bool rx_confused;
+
+ struct work_struct rx_work;
+ struct sk_buff_head rx_queue;
};
#define RX_HTT_HDR_STATUS_LEN 64
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 9bb0ae89..6cf4d95 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -40,6 +40,10 @@
/* when under memory pressure rx ring refill may fail and needs a retry */
#define HTT_RX_RING_REFILL_RETRY_MS 50
+
+static void ath10k_htt_rx_work(struct work_struct *work);
+
+
static int ath10k_htt_rx_ring_size(struct ath10k_htt *htt)
{
int size;
@@ -211,6 +215,8 @@ void ath10k_htt_rx_detach(struct ath10k_htt *htt)
{
int sw_rd_idx = htt->rx_ring.sw_rd_idx.msdu_payld;
+ skb_queue_purge(&htt->rx_queue);
+ cancel_work_sync(&htt->rx_work);
del_timer_sync(&htt->rx_ring.refill_retry_timer);
while (sw_rd_idx != __le32_to_cpu(*(htt->rx_ring.alloc_idx.vaddr))) {
@@ -501,6 +507,9 @@ int ath10k_htt_rx_attach(struct ath10k_htt *htt)
if (__ath10k_htt_rx_ring_fill_n(htt, htt->rx_ring.fill_level))
goto err_fill_ring;
+ INIT_WORK(&htt->rx_work, ath10k_htt_rx_work);
+ skb_queue_head_init(&htt->rx_queue);
+
ath10k_dbg(ATH10K_DBG_HTT, "HTT RX ring size: %d, fill_level: %d\n",
htt->rx_ring.size, htt->rx_ring.fill_level);
return 0;
@@ -1083,17 +1092,11 @@ end:
}
}
-void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+static void ath10k_htt_rx_process_skb(struct ath10k *ar, struct sk_buff *skb)
{
struct ath10k_htt *htt = &ar->htt;
struct htt_resp *resp = (struct htt_resp *)skb->data;
- /* confirm alignment */
- if (!IS_ALIGNED((unsigned long)skb->data, 4))
- ath10k_warn("unaligned htt message, expect trouble\n");
-
- ath10k_dbg(ATH10K_DBG_HTT, "HTT RX, msg_type: 0x%0X\n",
- resp->hdr.msg_type);
switch (resp->hdr.msg_type) {
case HTT_T2H_MSG_TYPE_VERSION_CONF: {
htt->target_version_major = resp->ver_resp.major;
@@ -1214,3 +1217,40 @@ void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
/* Free the indication buffer */
dev_kfree_skb_any(skb);
}
+
+static void ath10k_htt_rx_work(struct work_struct *work)
+{
+ struct ath10k_htt *htt = container_of(work, struct ath10k_htt, rx_work);
+ struct sk_buff *skb;
+
+ for (;;) {
+ skb = skb_dequeue(&htt->rx_queue);
+ if (!skb)
+ break;
+
+ ath10k_htt_rx_process_skb(htt->ar, skb);
+ }
+}
+
+void ath10k_htt_t2h_msg_handler(struct ath10k *ar, struct sk_buff *skb)
+{
+ struct ath10k_htt *htt = &ar->htt;
+ struct htt_resp *resp = (struct htt_resp *)skb->data;
+
+ /* confirm alignment */
+ if (!IS_ALIGNED((unsigned long)skb->data, 4))
+ ath10k_warn("unaligned htt message, expect trouble\n");
+
+ ath10k_dbg(ATH10K_DBG_HTT, "HTT RX, msg_type: 0x%0X\n",
+ resp->hdr.msg_type);
+ switch (resp->hdr.msg_type) {
+ case HTT_T2H_MSG_TYPE_RX_FRAG_IND:
+ case HTT_T2H_MSG_TYPE_RX_IND:
+ skb_queue_tail(&htt->rx_queue, skb);
+ queue_work(ar->workqueue, &htt->rx_work);
+ break;
+ default:
+ ath10k_htt_rx_process_skb(ar, skb);
+ break;
+ }
+}
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index ce7e304..dd87e6f 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -297,7 +297,7 @@ void ath10k_process_rx(struct ath10k *ar, struct htt_rx_info *info)
status->band);
spin_lock_bh(&ar->txrx_lock);
- ieee80211_rx(ar->hw, info->skb);
+ ieee80211_rx_ni(ar->hw, info->skb);
spin_unlock_bh(&ar->txrx_lock);
}
--
1.7.9.5
TX work for a HTC endpoint can run for long
periods of time, especially on slower host
machines when under heavy load.
This patch prevents one endpoint starving another
one by allowing each endpoint worker to be
processed in parallel.
This is safe since none of ar->workqueue users
depend on sequential execution.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.c | 4 +++-
drivers/net/wireless/ath/ath10k/core.h | 5 +++++
2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index b43e8ad..348a0fd 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -520,7 +520,9 @@ struct ath10k *ath10k_core_create(void *hif_priv, struct device *dev,
setup_timer(&ar->scan.timeout, ath10k_reset_scan, (unsigned long)ar);
- ar->workqueue = create_singlethread_workqueue("ath10k_wq");
+ ar->workqueue = alloc_workqueue("ath10k_wq",
+ WQ_UNBOUND | WQ_MEM_RECLAIM,
+ ATH10K_MAX_NUM_PARALLEL_WORKERS);
if (!ar->workqueue)
goto err_wq;
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index 0469a2e..17694ec 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -40,6 +40,11 @@
#define ATH10K_FLUSH_TIMEOUT_HZ (5*HZ)
#define ATH10K_NUM_CHANS 38
+#define ATH10K_NUM_HTC_TX_WORKERS ATH10K_HTC_EP_COUNT
+#define ATH10K_NUM_WMI_RX_WORKERS 1
+#define ATH10K_MAX_NUM_PARALLEL_WORKERS (ATH10K_NUM_HTC_TX_WORKERS + \
+ ATH10K_NUM_WMI_RX_WORKERS)
+
/* Antenna noise floor */
#define ATH10K_DEFAULT_NOISE_FLOOR -95
--
1.7.9.5
Hi,
This patchset addresses some throughput/stability
issues.
v2:
* add new patch
* fix WARNING due to wrong ieee80211_rx() usage
Pozdrawiam / Best regards,
Michal Kazior.
Michal Kazior (4):
ath10k: synchronize tx/rx reporting to mac80211
ath10k: make the workqueue multithreaded
ath10k: move htt rx processing to worker
ath10k: fix issues on non-preemptible systems
drivers/net/wireless/ath/ath10k/core.c | 5 ++-
drivers/net/wireless/ath/ath10k/core.h | 10 ++++++
drivers/net/wireless/ath/ath10k/htc.c | 4 +++
drivers/net/wireless/ath/ath10k/htt.h | 3 ++
drivers/net/wireless/ath/ath10k/htt_rx.c | 58 ++++++++++++++++++++++++++----
drivers/net/wireless/ath/ath10k/txrx.c | 6 +++-
drivers/net/wireless/ath/ath10k/wmi.c | 4 +++
7 files changed, 81 insertions(+), 9 deletions(-)
--
1.7.9.5
"Luis R. Rodriguez" <[email protected]> writes:
> On Mon, Aug 26, 2013 at 1:53 AM, Michal Kazior <[email protected]> wrote:
>> ath10k: fix issues on non-preemptible systems
>
> This patch looks like a stable candidate fix. Please annotate as such
> if you confirm. Also, I reviewed other ath10k "fixes" and I see no
> practice of propagating any patches to stable yet. Can you please
> start doing that? If there were patches which are already merged
> upstream that should be propagated to stable then they can be
> submitted as stable candidate patches.
I disagree. The point of linux-stable is _not_ that we send all possible
fixes to stable. Instead we should send fixes only which really matter
to users and for which we have received bug reports. I haven't yet seen
any fix for ath10k which should be a candidate for stable releases.
If we start sending all ath10k fixes to stable it's just extra churn for
both Greg and people working on ath10k.
--
Kalle Valo
On Tue, Aug 27, 2013 at 09:57:22AM +0300, Kalle Valo wrote:
> "Luis R. Rodriguez" <[email protected]> writes:
>
> > On Mon, Aug 26, 2013 at 1:53 AM, Michal Kazior <[email protected]> wrote:
> >> ath10k: fix issues on non-preemptible systems
> >
> > This patch looks like a stable candidate fix. Please annotate as such
> > if you confirm. Also, I reviewed other ath10k "fixes" and I see no
> > practice of propagating any patches to stable yet. Can you please
> > start doing that? If there were patches which are already merged
> > upstream that should be propagated to stable then they can be
> > submitted as stable candidate patches.
>
> I disagree. The point of linux-stable is _not_ that we send all possible
> fixes to stable. Instead we should send fixes only which really matter
> to users and for which we have received bug reports. I haven't yet seen
> any fix for ath10k which should be a candidate for stable releases.
You don't need to wait for an issue to happen to consider it serious,
the description of the symptoms seem pretty bad to me, but its your
call in the end.
> If we start sending all ath10k fixes to stable it's just extra churn for
> both Greg and people working on ath10k.
I'm not asking for anything that has the word "fix" to be sent, I'm
asking them to be reviewed for stable consideration.
Luis
On 21 August 2013 08:34, Michal Kazior <[email protected]> wrote:
> Hi,
>
> This patchset addresses some throughput issues.
>
>
> Pozdrawiam / Best regards,
> Michal Kazior.
>
>
> Michal Kazior (3):
> ath10k: make the workqueue multithreaded
> ath10k: move htt rx processing to worker
> ath10k: fix issues on non-preemptible systems
>
> drivers/net/wireless/ath/ath10k/core.c | 4 ++-
> drivers/net/wireless/ath/ath10k/core.h | 7 ++++
> drivers/net/wireless/ath/ath10k/htc.c | 4 +++
> drivers/net/wireless/ath/ath10k/htt.h | 3 ++
> drivers/net/wireless/ath/ath10k/htt_rx.c | 58 ++++++++++++++++++++++++++----
> drivers/net/wireless/ath/ath10k/wmi.c | 4 +++
> 6 files changed, 72 insertions(+), 8 deletions(-)
Drop this patchset for now, please, There are some minor issues I'd
like to address first. Nevertheless, feel free to comment the patchset
if you see any issues.
Pozdrawiam / Best regards,
Michał Kazior.
On 28 August 2013 06:23, Kalle Valo <[email protected]> wrote:
> Michal Kazior <[email protected]> writes:
>
>> According to mac80211 docs tx and rx reporting
>> routines must not run concurrently unless they are
>> _irqsafe variants.
>>
>> This doesn't fix any visible bug but is apparently
>> the right thing to do as per the documentation.
>>
>> Signed-off-by: Michal Kazior <[email protected]>
>
> I had a quick chat with Johannes and Sujith about this. The concurrency
> requirements are for the STA PS race in AP mode (see
> ieee80211_handle_filtered_frame()).
>
> I think we should drop this frame, revisit later and properly analyse
> how to fix the race. But it would be good add this to the todo list:
>
> http://wireless.kernel.org/en/users/Drivers/ath10k#TODO
I've added this to the TODO list.
I'm worried this PS stuff is broken on ath10k anyway because ath10k
modifies tx skbuffs for QoS workaround (QoS Control must be removed
before submitting to FW).
Michał.
Workers may not call into a sleepable function
(e.g. mutex_lock). Since ath10k workers can run
for a very long time it is necessary to explicitly
allow process rescheduling in case there's no
preemption.
This fixes some issues with system freezes, hangs,
watchdogs being triggered, userspace being
unresponsive on slow host machines under heavy
load.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/htc.c | 4 ++++
drivers/net/wireless/ath/ath10k/htt_rx.c | 4 ++++
drivers/net/wireless/ath/ath10k/wmi.c | 4 ++++
3 files changed, 12 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 7d445d3..99f1dbd 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -237,6 +237,10 @@ static void ath10k_htc_send_work(struct work_struct *work)
ret = ath10k_htc_issue_skb(htc, ep, skb, credits);
if (ret == -ENOSR)
break;
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 6cf4d95..80ea398 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1229,6 +1229,10 @@ static void ath10k_htt_rx_work(struct work_struct *work)
break;
ath10k_htt_rx_process_skb(htt->ar, skb);
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 32fd5e7..f36f0be 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1155,6 +1155,10 @@ static void ath10k_wmi_event_work(struct work_struct *work)
break;
ath10k_wmi_event_process(ar, skb);
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
--
1.7.9.5
Workers may not call into a sleepable function
(e.g. mutex_lock). Since ath10k workers can run
for a very long time it is necessary to explicitly
allow process rescheduling in case there's no
preemption.
This fixes some issues with system freezes, hangs,
watchdogs being triggered, userspace being
unresponsive on slow host machines under heavy
load.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/htc.c | 4 ++++
drivers/net/wireless/ath/ath10k/htt_rx.c | 4 ++++
drivers/net/wireless/ath/ath10k/wmi.c | 4 ++++
3 files changed, 12 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/htc.c b/drivers/net/wireless/ath/ath10k/htc.c
index 7d445d3..99f1dbd 100644
--- a/drivers/net/wireless/ath/ath10k/htc.c
+++ b/drivers/net/wireless/ath/ath10k/htc.c
@@ -237,6 +237,10 @@ static void ath10k_htc_send_work(struct work_struct *work)
ret = ath10k_htc_issue_skb(htc, ep, skb, credits);
if (ret == -ENOSR)
break;
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
diff --git a/drivers/net/wireless/ath/ath10k/htt_rx.c b/drivers/net/wireless/ath/ath10k/htt_rx.c
index 6cf4d95..80ea398 100644
--- a/drivers/net/wireless/ath/ath10k/htt_rx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_rx.c
@@ -1229,6 +1229,10 @@ static void ath10k_htt_rx_work(struct work_struct *work)
break;
ath10k_htt_rx_process_skb(htt->ar, skb);
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
diff --git a/drivers/net/wireless/ath/ath10k/wmi.c b/drivers/net/wireless/ath/ath10k/wmi.c
index 32fd5e7..f36f0be 100644
--- a/drivers/net/wireless/ath/ath10k/wmi.c
+++ b/drivers/net/wireless/ath/ath10k/wmi.c
@@ -1155,6 +1155,10 @@ static void ath10k_wmi_event_work(struct work_struct *work)
break;
ath10k_wmi_event_process(ar, skb);
+
+#ifndef CONFIG_PREEMPT
+ cond_resched();
+#endif
}
}
--
1.7.9.5
According to mac80211 docs tx and rx reporting
routines must not run concurrently unless they are
_irqsafe variants.
This doesn't fix any visible bug but is apparently
the right thing to do as per the documentation.
Signed-off-by: Michal Kazior <[email protected]>
---
drivers/net/wireless/ath/ath10k/core.c | 1 +
drivers/net/wireless/ath/ath10k/core.h | 3 +++
drivers/net/wireless/ath/ath10k/txrx.c | 4 ++++
3 files changed, 8 insertions(+)
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 04c132e..b43e8ad 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -526,6 +526,7 @@ struct ath10k *ath10k_core_create(void *hif_priv, struct device *dev,
mutex_init(&ar->conf_mutex);
spin_lock_init(&ar->data_lock);
+ spin_lock_init(&ar->txrx_lock);
INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index ab05c4c..0469a2e 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -392,6 +392,9 @@ struct ath10k {
u32 survey_last_cycle_count;
struct survey_info survey[ATH10K_NUM_CHANS];
+ /* synchronized tx/rx reporting to mac80211 */
+ spinlock_t txrx_lock;
+
#ifdef CONFIG_ATH10K_DEBUGFS
struct ath10k_debug debug;
#endif
diff --git a/drivers/net/wireless/ath/ath10k/txrx.c b/drivers/net/wireless/ath/ath10k/txrx.c
index 68b6fae..ce7e304 100644
--- a/drivers/net/wireless/ath/ath10k/txrx.c
+++ b/drivers/net/wireless/ath/ath10k/txrx.c
@@ -88,7 +88,9 @@ void ath10k_txrx_tx_unref(struct ath10k_htt *htt, struct sk_buff *txdesc)
if (ATH10K_SKB_CB(txdesc)->htt.no_ack)
info->flags &= ~IEEE80211_TX_STAT_ACK;
+ spin_lock_bh(&htt->ar->txrx_lock);
ieee80211_tx_status(htt->ar->hw, msdu);
+ spin_unlock_bh(&htt->ar->txrx_lock);
/* we do not own the msdu anymore */
exit:
@@ -294,7 +296,9 @@ void ath10k_process_rx(struct ath10k *ar, struct htt_rx_info *info)
status->freq,
status->band);
+ spin_lock_bh(&ar->txrx_lock);
ieee80211_rx(ar->hw, info->skb);
+ spin_unlock_bh(&ar->txrx_lock);
}
struct ath10k_peer *ath10k_peer_find(struct ath10k *ar, int vdev_id,
--
1.7.9.5