2023-05-09 01:52:17

by Tejun Heo

Subject: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

These workqueues only host a single work item and thus don't need an explicit
concurrency limit. Let's use the default @max_active. This doesn't cost
anything and clearly expresses that @max_active doesn't matter.

Signed-off-by: Tejun Heo <[email protected]>
Cc: Amitkumar Karwar <[email protected]>
Cc: Ganapathi Bhat <[email protected]>
Cc: Sharvari Harisangam <[email protected]>
Cc: Xinming Hu <[email protected]>
Cc: Kalle Valo <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 4 ++--
drivers/net/wireless/marvell/mwifiex/main.c | 8 ++++----
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index bcd564dc3554..5337ee4b6f10 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -3127,7 +3127,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,
priv->dfs_cac_workqueue = alloc_workqueue("MWIFIEX_DFS_CAC%s",
WQ_HIGHPRI |
WQ_MEM_RECLAIM |
- WQ_UNBOUND, 1, name);
+ WQ_UNBOUND, 0, name);
if (!priv->dfs_cac_workqueue) {
mwifiex_dbg(adapter, ERROR, "cannot alloc DFS CAC queue\n");
ret = -ENOMEM;
@@ -3138,7 +3138,7 @@ struct wireless_dev *mwifiex_add_virtual_intf(struct wiphy *wiphy,

priv->dfs_chan_sw_workqueue = alloc_workqueue("MWIFIEX_DFS_CHSW%s",
WQ_HIGHPRI | WQ_UNBOUND |
- WQ_MEM_RECLAIM, 1, name);
+ WQ_MEM_RECLAIM, 0, name);
if (!priv->dfs_chan_sw_workqueue) {
mwifiex_dbg(adapter, ERROR, "cannot alloc DFS channel sw queue\n");
ret = -ENOMEM;
diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index ea22a08e6c08..1cd9d20cca16 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1547,7 +1547,7 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)

adapter->workqueue =
alloc_workqueue("MWIFIEX_WORK_QUEUE",
- WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+ WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
if (!adapter->workqueue)
goto err_kmalloc;

@@ -1557,7 +1557,7 @@ mwifiex_reinit_sw(struct mwifiex_adapter *adapter)
adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
WQ_HIGHPRI |
WQ_MEM_RECLAIM |
- WQ_UNBOUND, 1);
+ WQ_UNBOUND, 0);
if (!adapter->rx_workqueue)
goto err_kmalloc;
INIT_WORK(&adapter->rx_work, mwifiex_rx_work_queue);
@@ -1702,7 +1702,7 @@ mwifiex_add_card(void *card, struct completion *fw_done,

adapter->workqueue =
alloc_workqueue("MWIFIEX_WORK_QUEUE",
- WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+ WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
if (!adapter->workqueue)
goto err_kmalloc;

@@ -1712,7 +1712,7 @@ mwifiex_add_card(void *card, struct completion *fw_done,
adapter->rx_workqueue = alloc_workqueue("MWIFIEX_RX_WORK_QUEUE",
WQ_HIGHPRI |
WQ_MEM_RECLAIM |
- WQ_UNBOUND, 1);
+ WQ_UNBOUND, 0);
if (!adapter->rx_workqueue)
goto err_kmalloc;

--
2.40.1


2023-05-10 08:53:13

by Kalle Valo

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

Tejun Heo <[email protected]> wrote:

> These workqueues only host a single work item and thus don't need an explicit
> concurrency limit. Let's use the default @max_active. This doesn't cost
> anything and clearly expresses that @max_active doesn't matter.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Amitkumar Karwar <[email protected]>
> Cc: Ganapathi Bhat <[email protected]>
> Cc: Sharvari Harisangam <[email protected]>
> Cc: Xinming Hu <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Paolo Abeni <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

I didn't review the patch but I assume it's ok. Feel free to take it via your
tree:

Acked-by: Kalle Valo <[email protected]>

Patch set to Not Applicable.

--
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

https://wireless.wiki.kernel.org/en/developers/documentation/submittingpatches


2023-05-10 18:16:26

by Brian Norris

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

On Mon, May 08, 2023 at 03:50:21PM -1000, Tejun Heo wrote:
> These workqueues only host a single work item and thus don't need an explicit
> concurrency limit. Let's use the default @max_active. This doesn't cost
> anything and clearly expresses that @max_active doesn't matter.
>
> Signed-off-by: Tejun Heo <[email protected]>
> Cc: Amitkumar Karwar <[email protected]>
> Cc: Ganapathi Bhat <[email protected]>
> Cc: Sharvari Harisangam <[email protected]>
> Cc: Xinming Hu <[email protected]>
> Cc: Kalle Valo <[email protected]>
> Cc: "David S. Miller" <[email protected]>
> Cc: Eric Dumazet <[email protected]>
> Cc: Jakub Kicinski <[email protected]>
> Cc: Paolo Abeni <[email protected]>
> Cc: [email protected]
> Cc: [email protected]
> Cc: [email protected]

Reviewed-by: Brian Norris <[email protected]>

I'll admit, the workqueue documentation sounds a bit like "max_active ==
1 + WQ_UNBOUND" is what we want ("one work item [...] active at any
given time"), but that's more of my misunderstanding than anything --
each work item can only be active in a single context at any given time,
so that note is talking about distinct (i.e., more than 1) work items.

While I'm here: we're still debugging what's affecting WiFi performance
on some of our WiFi systems, but it's possible I'll be turning some of
these into struct kthread_worker instead. We can cross that bridge
(including potential conflicts) if/when we come to it though.

Thanks,
Brian

2023-05-10 18:16:47

by Tejun Heo

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

Hello,

On Wed, May 10, 2023 at 11:09:55AM -0700, Brian Norris wrote:
> I'll admit, the workqueue documentation sounds a bit like "max_active ==
> 1 + WQ_UNBOUND" is what we want ("one work item [...] active at any
> given time"), but that's more of my misunderstanding than anything --
> each work item can only be active in a single context at any given time,
> so that note is talking about distinct (i.e., more than 1) work items.

Yeah, a future patch is gonna change the semantics a bit and I'll update the
doc to be clearer.

> While I'm here: we're still debugging what's affecting WiFi performance
> on some of our WiFi systems, but it's possible I'll be turning some of
> these into struct kthread_worker instead. We can cross that bridge
> (including potential conflicts) if/when we come to it though.

Can you elaborate on the performance problem you're seeing? I'm working on a
major update for workqueue to improve its locality behavior, so if you're
experiencing issues on CPUs w/ multiple L3 caches, it'd be a good test case.

Thanks.

--
tejun

2023-05-10 19:03:16

by Brian Norris

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

Hi,

On Wed, May 10, 2023 at 08:16:00AM -1000, Tejun Heo wrote:
> > While I'm here: we're still debugging what's affecting WiFi performance
> > on some of our WiFi systems, but it's possible I'll be turning some of
> > these into struct kthread_worker instead. We can cross that bridge
> > (including potential conflicts) if/when we come to it though.
>
> Can you elaborate on the performance problem you're seeing? I'm working on a
> major update for workqueue to improve its locality behavior, so if you're
> experiencing issues on CPUs w/ multiple L3 caches, it'd be a good test case.

Sure!

Test case: iperf TCP RX (i.e., hits "MWIFIEX_RX_WORK_QUEUE" a lot) at
some of the higher (VHT 80 MHz) data rates.

Hardware: Mediatek MT8173 2xA53 (little) + 2xA72 (big) CPU
(I'm not familiar with its cache details)
+
Marvell SD8897 SDIO WiFi (mwifiex_sdio)

We're looking at a major regression from our 4.19 kernel to a 5.15
kernel (yeah, that's downstream reality). So far, we've found that
performance is:

(1) much better (nearly the same as 4.19) if we add WQ_SYSFS and pin the
work queue to one CPU (doesn't really matter which CPU, as long as it's
not the one loaded with IRQ(?) work)

(2) moderately better if we pin the CPU frequency (e.g., "performance"
cpufreq governor instead of "schedutil")

(3) moderately better (not quite as good as (2)) if we switch to a
kthread_worker and don't pin anything.

We tried (2) because we saw a lot more CPU migration on kernel 5.15
(work moves across all 4 CPUs throughout the run; on kernel 4.19 it
mostly switched between 2 CPUs).

We tried (3) suspecting some kind of EAS issue (instead of distributing
our workload onto 4 different kworkers, our work (and therefore our load
calculation) is mostly confined to a single kernel thread). But it still
seems like our issues are more than "just" EAS / cpufreq issues, since
(2) and (3) aren't as good as (1).
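
For reference, the pinning described in (1) can be sketched from user space.
This is a minimal sketch, not the exact method used in the tests above. It
assumes the workqueue was allocated with WQ_SYSFS, that it therefore shows up
under /sys/bus/workqueue/devices/<name>/, and that its "cpumask" attribute
accepts a hex bitmask. The helper names single_cpu_mask() and pin_workqueue()
are hypothetical.

```c
#include <stdio.h>

/*
 * Format a hex cpumask string with only `cpu` set (hypothetical helper).
 * Limited to CPUs 0..31 so the mask fits in one 32-bit hex group.
 * Returns 0 on success, -1 on a bad CPU number or a too-small buffer.
 */
static int single_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
	int n;

	if (cpu >= 32)
		return -1;

	n = snprintf(buf, len, "%x", 1U << cpu);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write a single-CPU mask to the workqueue's sysfs cpumask attribute
 * (hypothetical helper). The sysfs path is an assumption: it only exists
 * for unbound workqueues created with WQ_SYSFS. Needs root to succeed.
 */
static int pin_workqueue(const char *wq_name, unsigned int cpu)
{
	char path[256];
	char mask[16];
	FILE *f;
	int ok;

	if (single_cpu_mask(cpu, mask, sizeof(mask)))
		return -1;

	snprintf(path, sizeof(path),
		 "/sys/bus/workqueue/devices/%s/cpumask", wq_name);

	f = fopen(path, "w");
	if (!f)
		return -1;

	ok = fputs(mask, f) >= 0;
	if (fclose(f) != 0)	/* sysfs write errors can surface at close */
		ok = 0;

	return ok ? 0 : -1;
}
```

Calling pin_workqueue("MWIFIEX_RX_WORK_QUEUE", 2) would attempt to restrict
that workqueue to CPU 2. Which CPU to choose depends on where the IRQ work
lands on the system under test.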

NB: there weren't many relevant mwifiex or MTK-SDIO changes in this
range.

So we're still investigating a few other areas, but it does seem like
"locality" (in some sense of the word) is relevant. We'd probably be
open to testing any patches you have, although it's likely we'd have the
easiest time if we can port those to 5.15. We're constantly working on
getting good upstream support for Chromebook chips, but ARM SoC reality
is that it still varies a lot as to how much works upstream on any given
system.

Thanks,
Brian

2023-05-10 19:58:48

by Brian Norris

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

On Wed, May 10, 2023 at 09:19:20AM -1000, Tejun Heo wrote:
> On Wed, May 10, 2023 at 11:57:41AM -0700, Brian Norris wrote:
> > (1) much better (nearly the same as 4.19) if we add WQ_SYSFS and pin the
> > work queue to one CPU (doesn't really matter which CPU, as long as it's
> > not the one loaded with IRQ(?) work)
> >
> > (2) moderately better if we pin the CPU frequency (e.g., "performance"
> > cpufreq governor instead of "schedutil")
> >
> > (3) moderately better (not quite as good as (2)) if we switch to a
> > kthread_worker and don't pin anything.
>
> Hmm... so it's not just workqueue.

Right. And not just cpufreq either.

> > We tried (2) because we saw a lot more CPU migration on kernel 5.15
> > (work moves across all 4 CPUs throughout the run; on kernel 4.19 it
> > mostly switched between 2 CPUs).
>
> Workqueue can contribute to this but it seems more likely that scheduling
> changes are also part of the story.

Yeah, that's one theory. And in that vein, that's one reason we might
consider switching to a kthread_worker anyway, even if that doesn't
solve all the regression -- because schedutil relies on per-entity load
calculations to make decisions, and workqueues don't help the scheduler
understand that load when spread across N CPUs (workers). A dedicated
kthread would better represent our workload to the scheduler.

(Threaded NAPI -- mwifiex doesn't support NAPI -- takes a similar
approach, as it has its own thread per NAPI context.)

> > We tried (3) suspecting some kind of EAS issue (instead of distributing
> > our workload onto 4 different kworkers, our work (and therefore our load
> > calculation) is mostly confined to a single kernel thread). But it still
> > seems like our issues are more than "just" EAS / cpufreq issues, since
> > (2) and (3) aren't as good as (1).
> >
> > NB: there weren't many relevant mwifiex or MTK-SDIO changes in this
> > range.
> >
> > So we're still investigating a few other areas, but it does seem like
> > "locality" (in some sense of the word) is relevant. We'd probably be
> > open to testing any patches you have, although it's likely we'd have the
> > easiest time if we can port those to 5.15. We're constantly working on
> > getting good upstream support for Chromebook chips, but ARM SoC reality
> > is that it still varies a lot as to how much works upstream on any given
> > system.
>
> I should be able to post the patchset later today or tomorrow. It comes with
> sysfs knobs to control affinity scopes and strictness, so hopefully you
> should be able to find the configuration that works without too much
> difficulty.

Great!

Brian

2023-05-19 00:37:31

by Tejun Heo

Subject: Re: [PATCH 02/13] wifi: mwifiex: Use default @max_active for workqueues

On Mon, May 08, 2023 at 03:50:21PM -1000, Tejun Heo wrote:
> These workqueues only host a single work item and thus don't need an explicit
> concurrency limit. Let's use the default @max_active. This doesn't cost
> anything and clearly expresses that @max_active doesn't matter.

Applied to wq/for-6.5-cleanup-ordered.

Thanks.

--
tejun