2022-12-20 08:02:49

by Abhishek Kumar

Subject: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

NAPI poll can run in threaded context as well as in softirq
context. A threaded context can be scheduled efficiently, thus
creating less of a bottleneck during Rx processing. This patch
enables threaded NAPI in the ath10k driver for WCN3990.

Testing on WCN3990 showed that CPU0 reaches 100% utilization when
NAPI runs in softirq context, while the other CPUs remain lightly
loaded. This prevents the device from reaching its maximum
throughput potential. After enabling threaded NAPI, the CPU load is
balanced across all CPUs and the following improvements were observed:
- UDP_RX increase by ~22-25%
- TCP_RX increase by ~15%

Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
Signed-off-by: Abhishek Kumar <[email protected]>
---

drivers/net/wireless/ath/ath10k/core.c | 16 ++++++++++++++++
drivers/net/wireless/ath/ath10k/hw.h | 2 ++
drivers/net/wireless/ath/ath10k/snoc.c | 3 +++
3 files changed, 21 insertions(+)

diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 5eb131ab916fd..ee4b6ba508c81 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -100,6 +100,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA988X_HW_2_0_VERSION,
@@ -140,6 +141,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9887_HW_1_0_VERSION,
@@ -181,6 +183,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA6174_HW_3_2_VERSION,
@@ -217,6 +220,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA6174_HW_2_1_VERSION,
@@ -257,6 +261,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA6174_HW_2_1_VERSION,
@@ -297,6 +302,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA6174_HW_3_0_VERSION,
@@ -337,6 +343,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA6174_HW_3_2_VERSION,
@@ -381,6 +388,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA99X0_HW_2_0_DEV_VERSION,
@@ -427,6 +435,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9984_HW_1_0_DEV_VERSION,
@@ -480,6 +489,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9888_HW_2_0_DEV_VERSION,
@@ -530,6 +540,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9377_HW_1_0_DEV_VERSION,
@@ -570,6 +581,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9377_HW_1_1_DEV_VERSION,
@@ -612,6 +624,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA9377_HW_1_1_DEV_VERSION,
@@ -645,6 +658,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = QCA4019_HW_1_0_DEV_VERSION,
@@ -692,6 +706,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = false,
.use_fw_tx_credits = true,
.delay_unmap_buffer = false,
+ .enable_threaded_napi = false,
},
{
.id = WCN3990_HW_1_0_DEV_VERSION,
@@ -725,6 +740,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
.hw_restart_disconnect = true,
.use_fw_tx_credits = false,
.delay_unmap_buffer = true,
+ .enable_threaded_napi = true,
},
};

diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
index 9643031a4427a..adf3076b96503 100644
--- a/drivers/net/wireless/ath/ath10k/hw.h
+++ b/drivers/net/wireless/ath/ath10k/hw.h
@@ -639,6 +639,8 @@ struct ath10k_hw_params {
bool use_fw_tx_credits;

bool delay_unmap_buffer;
+
+ bool enable_threaded_napi;
};

struct htt_resp;
diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
index cfcb759a87dea..b94150fb6ef06 100644
--- a/drivers/net/wireless/ath/ath10k/snoc.c
+++ b/drivers/net/wireless/ath/ath10k/snoc.c
@@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)

bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);

+ if (ar->hw_params.enable_threaded_napi)
+ dev_set_threaded(&ar->napi_dev, true);
+
ath10k_core_napi_enable(ar);
ath10k_snoc_irq_enable(ar);
ath10k_snoc_rx_post(ar);
--
2.39.0.314.g84b9a713c41-goog


2022-12-20 15:17:50

by Dave Taht

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

I am always interested in flent.org tcp_nup, tcp_ndown, and rrul_be
tests on wifi hardware. In AP mode, especially, against a few clients
in rtt_fair on the "ending the anomaly" test suite at the bottom of
this link: https://www.cs.kau.se/tohojo/airtime-fairness/ . Of these,
the one that increasingly bothers me the most is optimizing bandwidth
more fairly and keeping latencies low when 4 or more stations are
trying to transmit (in a world with 16 or more stations online). I'm
seeing 5+ seconds on some rtt_fair-like tests nowadays.

I was also seeing huge simultaneous upload vs download disparities on
the latest kernels, on various threads over here:
https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002 and
more recently here:
https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605

I don't understand why napi with the default budget (64) is even
needed on the ath10k, as a single txop takes a minimum of ~200us, but
perhaps your patch will help. Still, measuring the TCP statistics
in-band would be nice to see. Some new tools are appearing that can do
this, Apple's goresponsiveness, crusader... that are simpler to use
than flent.
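For context on the budget mentioned above, here is a minimal userspace
sketch (not ath10k or kernel code) of how a budgeted NAPI-style poll
behaves: each call handles at most `budget` frames, and returning fewer
than `budget` signals that the ring is drained. The `rx_ring` structure
and the `model_*` names are hypothetical and exist only for illustration.

```c
/* Userspace model of a budgeted NAPI-style poll; not kernel code. */

#define NAPI_BUDGET 64 /* the default budget mentioned in the thread */

/* Hypothetical stand-in for a device RX ring. */
struct rx_ring {
	int pending; /* frames waiting to be processed */
};

/*
 * Process at most `budget` frames and return how many were handled.
 * A return value below `budget` means the ring was drained.
 */
int model_poll(struct rx_ring *ring, int budget)
{
	int done = 0;

	while (done < budget && ring->pending > 0) {
		ring->pending--; /* "deliver" one frame to the stack */
		done++;
	}
	return done;
}

/* Poll until a call returns less than the budget; return the poll count. */
int model_drain(struct rx_ring *ring, int budget)
{
	int rounds = 0;

	for (;;) {
		int done = model_poll(ring, budget);

		rounds++;
		/*
		 * Ring drained: a real driver would call
		 * napi_complete_done() and re-enable interrupts here.
		 */
		if (done < budget)
			break;
	}
	return rounds;
}
```

In this model, 150 pending frames with a budget of 64 take three polls
(64, 64, 22) before polling stops.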



--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

2022-12-29 00:02:15

by Abhishek Kumar

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

Apologies for the late reply. Thanks, Dave, for your comment. My answers are inline.

On Tue, Dec 20, 2022 at 7:10 AM Dave Taht <[email protected]> wrote:
>
> I am always interested in flent.org tcp_nup, tcp_ndown, and rrul_be
> tests on wifi hardware. In AP mode, especially, against a few clients
> in rtt_fair on the "ending the anomaly" test suite at the bottom of
> this link: https://www.cs.kau.se/tohojo/airtime-fairness/ . Of these,
> it's trying to optimize bandwidth more fairly and keep latencies low
> when 4 or more stations are trying to transmit (in a world with 16 or
> more stations online), that increasingly bothers me the most. I'm
> seeing 5+ seconds on some rtt_fair-like tests nowadays.
I tested using iperf on a conductive setup and collected the
throughput data (shown below).
>
> I was also seeing huge simultaneous upload vs download disparities on
> the latest kernels, on various threads over here:
> https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002 and
> more recently here:
> https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605
Interesting, thanks for the pointer. The Qualcomm team is probably
aware of it.
>
> I don't understand why napi with the default budget (64) is even
> needed on the ath10k, as a single txop takes a minimum of ~200us, but
> perhaps your patch will help. Still, measuring the TCP statistics
> in-band would be nice to see. Some new tools are appearing that can do
> this, Apple's goresponsiveness, crusader... that are simpler to use
> than flent.
Here is some additional raw data with and without threaded NAPI:
==================================================
udp_rx(Without threaded NAPI)
435.98+-5.16 : Channel 44
439.06+-0.66 : Channel 157

udp_rx(With threaded NAPI)
509.73+-41.03 : Channel 44
549.97+-7.62 : Channel 157
===================================================
udp_tx(Without threaded NAPI)
461.31+-0.69 : Channel 44
461.46+-0.78 : Channel 157

udp_tx(With threaded NAPI)
459.20+-0.77 : Channel 44
459.78+-1.08 : Channel 157
===================================================
tcp_rx(Without threaded NAPI)
472.63+-2.35 : Channel 44
469.29+-6.31 : Channel 157

tcp_rx(With threaded NAPI)
498.49+-2.44 : Channel 44
541.14+-40.65 : Channel 157
===================================================
tcp_tx(Without threaded NAPI)
317.34+-2.37 : Channel 44
317.01+-2.56 : Channel 157

tcp_tx(With threaded NAPI)
371.34+-2.36 : Channel 44
376.95+-9.40 : Channel 157
====================================================
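The relative change implied by the means above can be worked out with a
short helper. This is only a sketch: the `sample` structure and function
names are hypothetical, the numbers are copied from the table, and the
+- spreads are ignored.

```c
#include <stdio.h>

/* One row pair from the table above (mean values only). */
struct sample {
	const char *test;
	int channel;
	double without_napi; /* mean without threaded NAPI */
	double with_napi;    /* mean with threaded NAPI */
};

static const struct sample samples[] = {
	{ "udp_rx",  44, 435.98, 509.73 },
	{ "udp_rx", 157, 439.06, 549.97 },
	{ "udp_tx",  44, 461.31, 459.20 },
	{ "udp_tx", 157, 461.46, 459.78 },
	{ "tcp_rx",  44, 472.63, 498.49 },
	{ "tcp_rx", 157, 469.29, 541.14 },
	{ "tcp_tx",  44, 317.34, 371.34 },
	{ "tcp_tx", 157, 317.01, 376.95 },
};

/* Relative change from `before` to `after`, in percent. */
double pct_change(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Print the relative change for every row of the table. */
void print_changes(void)
{
	size_t i;

	for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("%s ch%d: %+.1f%%\n", samples[i].test,
		       samples[i].channel,
		       pct_change(samples[i].without_napi,
				  samples[i].with_napi));
}
```

On these means, udp_rx works out to roughly +17% on channel 44 and +25%
on channel 157, while udp_tx is essentially unchanged (slightly
negative).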


2022-12-29 00:06:18

by Abhishek Kumar

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

Apologies for the late reply. Please see my response inline.

On Tue, Dec 20, 2022 at 4:14 AM Manikanta Pubbisetty
<[email protected]> wrote:
>
> On 12/20/2022 1:25 PM, Abhishek Kumar wrote:
> > diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> > index cfcb759a87dea..b94150fb6ef06 100644
> > --- a/drivers/net/wireless/ath/ath10k/snoc.c
> > +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> > @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
> >
> > bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
> >
> > + if (ar->hw_params.enable_threaded_napi)
> > + dev_set_threaded(&ar->napi_dev, true);
> > +
>
> Since this is done in the API specific to WCN3990, we do not need
> hw_param for this.
Just so that I am clear: are you suggesting enabling this by default
in snoc.c, similar to what you did in
https://lore.kernel.org/all/[email protected]/ ?
If my understanding is correct and there is no objection, I can
remove the hw_param and enable it by default in snoc.c.
I used a hw_param because, as I see it, threaded NAPI can have some
adverse effects on cache utilization and power.
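The two options being discussed can be contrasted in a small userspace
model. This is only a sketch, not driver code: the `model_*` types and
functions are hypothetical stand-ins, and `model_set_threaded()` merely
records the state that the kernel's dev_set_threaded() would apply.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct model_hw_params {
	bool enable_threaded_napi;
};

struct model_napi_dev {
	bool threaded;
};

struct model_ar {
	struct model_hw_params hw_params;
	struct model_napi_dev napi_dev;
};

/* Stand-in for dev_set_threaded(); it only records the requested mode. */
void model_set_threaded(struct model_napi_dev *dev, bool threaded)
{
	dev->threaded = threaded;
}

/* Option A: as in the posted patch, gated on a per-chip hw_param. */
void model_hif_start_with_hw_param(struct model_ar *ar)
{
	if (ar->hw_params.enable_threaded_napi)
		model_set_threaded(&ar->napi_dev, true);
}

/*
 * Option B: as suggested in review, unconditional in the SNOC-specific
 * start path, since only WCN3990 goes through it.
 */
void model_hif_start_unconditional(struct model_ar *ar)
{
	model_set_threaded(&ar->napi_dev, true);
}
```

Option A keeps a per-chip switch that can be left off; Option B drops
the extra field at the cost of that switch.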

Thanks
Abhishek
>
> Thanks,
> Manikanta

2022-12-29 01:05:48

by Dave Taht

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

On Wed, Dec 28, 2022 at 3:53 PM Abhishek Kumar <[email protected]> wrote:
>
> Apologies for the late reply. Thanks, Dave, for your comment. My answers are inline.
>
> On Tue, Dec 20, 2022 at 7:10 AM Dave Taht <[email protected]> wrote:
> >
> > I am always interested in flent.org tcp_nup, tcp_ndown, and rrul_be
> > tests on wifi hardware. In AP mode, especially, against a few clients
> > in rtt_fair on the "ending the anomaly" test suite at the bottom of
> > this link: https://www.cs.kau.se/tohojo/airtime-fairness/ . Of these,
> > it's trying to optimize bandwidth more fairly and keep latencies low
> > when 4 or more stations are trying to transmit (in a world with 16 or
> > more stations online), that increasingly bothers me the most. I'm
> > seeing 5+ seconds on some rtt_fair-like tests nowadays.
> I tested using iperf on a conductive setup and collected the
> throughput data (shown below).
> >
> > I was also seeing huge simultaneous upload vs download disparities on
> > the latest kernels, on various threads over here:
> > https://forum.openwrt.org/t/aql-and-the-ath10k-is-lovely/59002 and
> > more recently here:
> > https://forum.openwrt.org/t/reducing-multiplexing-latencies-still-further-in-wifi/133605
> Interesting, thanks for the pointer; the Qualcomm team is probably
> aware of it.
> >
> > I don't understand why napi with the default budget (64) is even
> > needed on the ath10k, as a single txop takes a minimum of ~200us, but
> > perhaps your patch will help. Still, measuring the TCP statistics
> > in-band would be nice to see. Some new tools are appearing that can do
> > this, Apple's goresponsiveness, crusader... that are simpler to use
> > than flent.
> Here are some of the additional raw data with and without threaded napi:
> ==================================================
> udp_rx(Without threaded NAPI)
> 435.98+-5.16 : Channel 44
> 439.06+-0.66 : Channel 157
>
> udp_rx(With threaded NAPI)
> 509.73+-41.03 : Channel 44
> 549.97+-7.62 : Channel 157
> ===================================================
> udp_tx(Without threaded NAPI)
> 461.31+-0.69 : Channel 44
> 461.46+-0.78 : Channel 157
>
> udp_tx(With threaded NAPI)
> 459.20+-0.77 : Channel 44
> 459.78+-1.08 : Channel 157
> ===================================================
> tcp_rx(Without threaded NAPI)
> 472.63+-2.35 : Channel 44
> 469.29+-6.31 : Channel 157
>
> tcp_rx(With threaded NAPI)
> 498.49+-2.44 : Channel 44
> 541.14+-40.65 : Channel 157
> ===================================================
> tcp_tx(Without threaded NAPI)
> 317.34+-2.37 : Channel 44
> 317.01+-2.56 : Channel 157
>
> tcp_tx(With threaded NAPI)
> 371.34+-2.36 : Channel 44
> 376.95+-9.40 : Channel 157
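
The before/after means quoted above can be turned into relative changes with a few lines of C. This is only a sketch for reading the table: the struct and function names are made up for illustration, the +- spreads are ignored, and the thread does not state the units of the figures.

```c
#include <assert.h>
#include <stdio.h>

/* One before/after pair of mean throughput values copied from the thread. */
struct sample {
    const char *name;   /* test and channel label */
    double without;     /* mean without threaded NAPI */
    double with;        /* mean with threaded NAPI */
};

/* Relative change of `after` versus `before`, in percent. */
static double pct_change(double before, double after)
{
    assert(before > 0.0);   /* a zero baseline has no relative change */
    return (after - before) / before * 100.0;
}

/* Print one line per sample; returns the number of lines printed. */
static int print_changes(const struct sample *s, int n)
{
    for (int i = 0; i < n; i++)
        printf("%-14s %+6.1f%%\n", s[i].name,
               pct_change(s[i].without, s[i].with));
    return n;
}

/* Means as posted in the thread (hypothetical labels, posted values). */
static const struct sample posted[] = {
    { "udp_rx ch44",  435.98, 509.73 },
    { "udp_rx ch157", 439.06, 549.97 },
    { "udp_tx ch44",  461.31, 459.20 },
    { "udp_tx ch157", 461.46, 459.78 },
    { "tcp_rx ch44",  472.63, 498.49 },
    { "tcp_rx ch157", 469.29, 541.14 },
    { "tcp_tx ch44",  317.34, 371.34 },
    { "tcp_tx ch157", 317.01, 376.95 },
};
```

Calling print_changes(posted, 8) from a main() prints the change per test and channel. With the posted means, udp_rx comes out at roughly +17% on channel 44 and +25% on channel 157, and tcp_rx at roughly +5% and +15%; udp_tx is essentially flat.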

My concern is primarily with the induced tcp latency on this test. A
way to check that is to run wireshark on the test client driving the
test, capture the iperf traffic, and then plot the "Statistics->TCP
stream statistics" for both throughput and rtt. Would it be possible
for you to do that and put up those plots somewhere?

The worst-case test is a bidirectional tcp test, and I don't know
whether older iperfs can do that. (iperf2 has new bounceback and
bidir tests.)

Ideally, stuff going in either direction would not look as horrible
as it did back in 2016, as documented in this linuxplumbers presentation:
https://blog.linuxplumbersconf.org/2016/ocw/system/presentations/3963/original/linuxplumbers_wifi_latency-3Nov.pdf
and discussed on lwn here: https://lwn.net/Articles/705884/

I worry about folk achieving slightly better tcp throughput at the
expense of clobbering in-tcp-stream latency. Back then we were
shooting for no more than 40ms extra latency under load on this chip,
down from (unusable) seconds. Presently, on other chips, we're
getting 8ms with stuff that's not in tree for the ath10k. There is a
slight cost in single-stream throughput, but when multiple streams
are in use on multiple stations, things like web pages fly,
irrespective of load.


> ====================================================
>
> >
> > On Tue, Dec 20, 2022 at 12:17 AM Abhishek Kumar <[email protected]> wrote:
> > >
> > > NAPI poll can be done in threaded context along with soft irq
> > > context. Threaded context can be scheduled efficiently, thus
> > > creating less of bottleneck during Rx processing. This patch is
> > > to enable threaded NAPI on ath10k driver.
> > >
> > > Based on testing, it was observed that on WCN3990, the CPU0 reaches
> > > 100% utilization when napi runs in softirq context. At the same
> > > time the other CPUs are at low consumption percentage. This
> > > does not allow device to reach its maximum throughput potential.
> > > After enabling threaded napi, CPU load is balanced across all CPUs
> > > and the following improvements were observed:
> > > - UDP_RX increase by ~22-25%
> > > - TCP_RX increase by ~15%
> > >
> > > Tested-on: WCN3990 hw1.0 SNOC WLAN.HL.3.2.2-00696-QCAHLSWMTPL-1
> > > Signed-off-by: Abhishek Kumar <[email protected]>
> > > ---
> > >
> > > drivers/net/wireless/ath/ath10k/core.c | 16 ++++++++++++++++
> > > drivers/net/wireless/ath/ath10k/hw.h | 2 ++
> > > drivers/net/wireless/ath/ath10k/snoc.c | 3 +++
> > > 3 files changed, 21 insertions(+)
> > >
> > > diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
> > > index 5eb131ab916fd..ee4b6ba508c81 100644
> > > --- a/drivers/net/wireless/ath/ath10k/core.c
> > > +++ b/drivers/net/wireless/ath/ath10k/core.c
> > > @@ -100,6 +100,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA988X_HW_2_0_VERSION,
> > > @@ -140,6 +141,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9887_HW_1_0_VERSION,
> > > @@ -181,6 +183,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA6174_HW_3_2_VERSION,
> > > @@ -217,6 +220,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA6174_HW_2_1_VERSION,
> > > @@ -257,6 +261,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA6174_HW_2_1_VERSION,
> > > @@ -297,6 +302,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA6174_HW_3_0_VERSION,
> > > @@ -337,6 +343,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA6174_HW_3_2_VERSION,
> > > @@ -381,6 +388,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA99X0_HW_2_0_DEV_VERSION,
> > > @@ -427,6 +435,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9984_HW_1_0_DEV_VERSION,
> > > @@ -480,6 +489,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9888_HW_2_0_DEV_VERSION,
> > > @@ -530,6 +540,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9377_HW_1_0_DEV_VERSION,
> > > @@ -570,6 +581,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9377_HW_1_1_DEV_VERSION,
> > > @@ -612,6 +624,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA9377_HW_1_1_DEV_VERSION,
> > > @@ -645,6 +658,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = QCA4019_HW_1_0_DEV_VERSION,
> > > @@ -692,6 +706,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = false,
> > > .use_fw_tx_credits = true,
> > > .delay_unmap_buffer = false,
> > > + .enable_threaded_napi = false,
> > > },
> > > {
> > > .id = WCN3990_HW_1_0_DEV_VERSION,
> > > @@ -725,6 +740,7 @@ static const struct ath10k_hw_params ath10k_hw_params_list[] = {
> > > .hw_restart_disconnect = true,
> > > .use_fw_tx_credits = false,
> > > .delay_unmap_buffer = true,
> > > + .enable_threaded_napi = true,
> > > },
> > > };
> > >
> > > diff --git a/drivers/net/wireless/ath/ath10k/hw.h b/drivers/net/wireless/ath/ath10k/hw.h
> > > index 9643031a4427a..adf3076b96503 100644
> > > --- a/drivers/net/wireless/ath/ath10k/hw.h
> > > +++ b/drivers/net/wireless/ath/ath10k/hw.h
> > > @@ -639,6 +639,8 @@ struct ath10k_hw_params {
> > > bool use_fw_tx_credits;
> > >
> > > bool delay_unmap_buffer;
> > > +
> > > + bool enable_threaded_napi;
> > > };
> > >
> > > struct htt_resp;
> > > diff --git a/drivers/net/wireless/ath/ath10k/snoc.c b/drivers/net/wireless/ath/ath10k/snoc.c
> > > index cfcb759a87dea..b94150fb6ef06 100644
> > > --- a/drivers/net/wireless/ath/ath10k/snoc.c
> > > +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> > > @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
> > >
> > > bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
> > >
> > > + if (ar->hw_params.enable_threaded_napi)
> > > + dev_set_threaded(&ar->napi_dev, true);
> > > +
> > > ath10k_core_napi_enable(ar);
> > > ath10k_snoc_irq_enable(ar);
> > > ath10k_snoc_rx_post(ar);
> > > --
> > > 2.39.0.314.g84b9a713c41-goog
> > >
> >
> >
> > --
> > This song goes out to all the folk that thought Stadia would work:
> > https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
> > Dave Täht CEO, TekLibre, LLC



--
This song goes out to all the folk that thought Stadia would work:
https://www.linkedin.com/posts/dtaht_the-mushroom-song-activity-6981366665607352320-FXtz
Dave Täht CEO, TekLibre, LLC

2023-01-12 10:30:26

by Kalle Valo

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

Abhishek Kumar <[email protected]> writes:

>> > --- a/drivers/net/wireless/ath/ath10k/snoc.c
>> > +++ b/drivers/net/wireless/ath/ath10k/snoc.c
>> > @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
>> >
>> > bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
>> >
>> > + if (ar->hw_params.enable_threaded_napi)
>> > + dev_set_threaded(&ar->napi_dev, true);
>> > +
>>
>> Since this is done in the API specific to WCN3990, we do not need
>> hw_param for this.
>
> Just so that I am clear, are you suggesting to enable this by default
> in snoc.c, similar to what you did in
>
> https://lore.kernel.org/all/[email protected]/
>
> If my understanding is correct and there is no objection, I can remove
> hw_param and enable it by default on snoc.c . I used hw_param because,
> as I see it, threaded NAPI can have some adverse effect on the cache
> utilization and power.

WCN3990 is the only device using SNOC bus so the hw_param is not needed.
It's safe to call dev_set_threaded() in ath10k_snoc_hif_start()
unconditionally as it only affects WCN3990.
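
As a rough illustration of that suggestion, the sketch below mocks the change in userspace C. It is not the v2 patch: the struct definitions and the dev_set_threaded() body are stand-in stubs so the example compiles outside the kernel, and snoc_hif_start_sketch() is a hypothetical name covering only the relevant step of ath10k_snoc_hif_start().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct net_device {
    bool threaded;
};

struct ath10k {
    struct net_device napi_dev;
};

/* Stub mirroring the prototype of the kernel's dev_set_threaded(). */
static int dev_set_threaded(struct net_device *dev, bool threaded)
{
    assert(dev != NULL);
    dev->threaded = threaded;
    return 0;
}

/* Hypothetical excerpt: the threaded-NAPI step without a hw_params check. */
static int snoc_hif_start_sketch(struct ath10k *ar)
{
    /*
     * The SNOC bus is only used by WCN3990, so no per-chip flag is
     * consulted. The return value is ignored, as in the original patch.
     */
    dev_set_threaded(&ar->napi_dev, true);

    /* The real function goes on to enable NAPI and the CE interrupts. */
    return 0;
}
```

Compared with the posted patch, the only difference is that the `ar->hw_params.enable_threaded_napi` test, the new struct field, and the sixteen per-chip initializers in core.c are no longer needed.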

--
https://patchwork.kernel.org/project/linux-wireless/list/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches

2023-01-27 22:08:41

by Abhishek Kumar

Subject: Re: [PATCH] ath10k: snoc: enable threaded napi on WCN3990

Thanks for all the comments. I will call dev_set_threaded() directly,
without the hw_param, and roll out a v2 soon.

On Thu, Jan 12, 2023 at 2:15 AM Kalle Valo <[email protected]> wrote:
>
> Abhishek Kumar <[email protected]> writes:
>
> >> > --- a/drivers/net/wireless/ath/ath10k/snoc.c
> >> > +++ b/drivers/net/wireless/ath/ath10k/snoc.c
> >> > @@ -927,6 +927,9 @@ static int ath10k_snoc_hif_start(struct ath10k *ar)
> >> >
> >> > bitmap_clear(ar_snoc->pending_ce_irqs, 0, CE_COUNT_MAX);
> >> >
> >> > + if (ar->hw_params.enable_threaded_napi)
> >> > + dev_set_threaded(&ar->napi_dev, true);
> >> > +
> >>
> >> Since this is done in the API specific to WCN3990, we do not need
> >> hw_param for this.
> >
> > Just so that I am clear, are you suggesting to enable this by default
> > in snoc.c, similar to what you did in
> >
> > https://lore.kernel.org/all/[email protected]/
> >
> > If my understanding is correct and there is no objection, I can remove
> > hw_param and enable it by default on snoc.c . I used hw_param because,
> > as I see it, threaded NAPI can have some adverse effect on the cache
> > utilization and power.
>
> WCN3990 is the only device using SNOC bus so the hw_param is not needed.
> It's safe to call dev_set_threaded() in ath10k_snoc_hif_start()
> unconditionally as it only affects WCN3990.
>
> --
> https://patchwork.kernel.org/project/linux-wireless/list/
>
> https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches