2022-09-15 05:20:37

by Venkat Ch

[permalink] [raw]
Subject: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock

From: Venkat Chimata <[email protected]>

OpenWiFi's ucentral agent periodically (typically 120 seconds)
issues nl80211 call to get associated client list from the
WLAN driver. Somehow this operation was causing tx/rx delays
sometimes and the video calls on connected clients are experiencing
jitter. The associated client list was protected by
a mutex lock. I saw that ieee80211_check_fast_xmit_all uses
rcu_read_lock and rcu_read_unlock to iterate through sta_list.
I took it as a refernce and changed the lock to rcu_read lock
from mutex.
Also saw this this comment just above sta_mutex declaration.
/* Station data */
/*
* The mutex only protects the list, hash table and
* counter, reads are done with RCU.
*/
struct mutex sta_mtx;

Hence tried changing mutex_lock(/unlock) in ieee80211_dump_station
to rcu_read_lock(/unlock) and it resolved the jitter issue in the
video calls.

Tests:
We had this issue show up consistently and the patch fixed the issue.
We spent a good part of 2 weeks following up on this and with this
fix, the video calls are smooth.

Also tested if this could cause any crashes with below mentioned
process.

1. Connect 3 clients
2. A script running dumping clients in an infinite loop
3. Continuously disconnect / connect one client to see if
there is any crash. No crash was observed.

---
v2:
* Resolve warning
net/mac80211/cfg.c:858:33: warning:
unused variable 'local' [-Wunused-variable]
---

Signed-off-by: Venkat Chimata <[email protected]>
---
net/mac80211/cfg.c | 7 ++-----
net/mac80211/sta_info.c | 3 +--
2 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 6a8350d..a005364 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -790,20 +790,17 @@ static int ieee80211_dump_station(struct wiphy *wiphy, struct net_device *dev,
int idx, u8 *mac, struct station_info *sinfo)
{
struct ieee80211_sub_if_data *sdata = IEEE80211_DEV_TO_SUB_IF(dev);
- struct ieee80211_local *local = sdata->local;
struct sta_info *sta;
int ret = -ENOENT;

- mutex_lock(&local->sta_mtx);
-
+ rcu_read_lock();
sta = sta_info_get_by_idx(sdata, idx);
if (sta) {
ret = 0;
memcpy(mac, sta->sta.addr, ETH_ALEN);
sta_set_sinfo(sta, sinfo, true);
}
-
- mutex_unlock(&local->sta_mtx);
+ rcu_read_unlock();

return ret;
}
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 550a610..af6fa75 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -231,8 +231,7 @@ struct sta_info *sta_info_get_by_idx(struct ieee80211_sub_if_data *sdata,
struct sta_info *sta;
int i = 0;

- list_for_each_entry_rcu(sta, &local->sta_list, list,
- lockdep_is_held(&local->sta_mtx)) {
+ list_for_each_entry_rcu(sta, &local->sta_list, list) {
if (sdata != sta->sdata)
continue;
if (i < idx) {
--
2.34.1


2022-09-20 07:34:39

by kernel test robot

[permalink] [raw]
Subject: [wifi] 662cceb50e: BUG:sleeping_function_called_from_invalid_context_at_include/linux/sched/mm.h


Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: 662cceb50ea735492ff67b5ecdbc8d9de15212bd ("[PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock")
url: https://github.com/intel-lab-lkp/linux/commits/venkatch-gmail-com/wifi-mac80211-Fix-performance-issue-with-mutex_lock/20220915-123721
base: https://git.kernel.org/cgit/linux/kernel/git/wireless/wireless-next.git main
patch link: https://lore.kernel.org/linux-wireless/[email protected]

in testcase: hwsim
version: hwsim-x86_64-717e5d7-1_20220525
with following parameters:

test: group-03



on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz (Ivy Bridge) with 16G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <[email protected]>
| Link: https://lore.kernel.org/r/[email protected]


[ 103.595219][ T5052] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
[ 103.595404][ T5052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 5052, name: iw
[ 103.595601][ T5052] preempt_count: 1, expected: 0
[ 103.595740][ T5052] CPU: 3 PID: 5052 Comm: iw Not tainted 6.0.0-rc3-00691-g662cceb50ea7 #1
[ 103.595921][ T5052] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[ 103.596088][ T5052] Call Trace:
[ 103.596151][ T5052] <TASK>
[ 103.596209][ T5052] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 103.596297][ T5052] __might_resched.cold (kernel/sched/core.c:9893)
[ 103.596391][ T5052] ? cfg80211_sinfo_alloc_tid_stats (net/wireless/util.c:2255) cfg80211
[ 103.596557][ T5052] kmem_cache_alloc_trace (include/linux/sched/mm.h:274 mm/slab.h:700 mm/slub.c:3157 mm/slub.c:3251 mm/slub.c:3282)
[ 103.596655][ T5052] cfg80211_sinfo_alloc_tid_stats (net/wireless/util.c:2255) cfg80211
[ 103.596807][ T5052] sta_set_sinfo (net/mac80211/sta_info.c:2651 (discriminator 1)) mac80211
[ 103.596956][ T5052] ? kfree (mm/slub.c:1780 mm/slub.c:3534 mm/slub.c:4562)
[ 103.597034][ T5052] ieee80211_dump_station (net/mac80211/cfg.c:864) mac80211
[ 103.597183][ T5052] nl80211_dump_station (net/wireless/rdev-ops.h:245 net/wireless/nl80211.c:6565) cfg80211
[ 103.597330][ T5052] ? nl80211_send_station (net/wireless/nl80211.c:6539) cfg80211
[ 103.597482][ T5052] ? __alloc_skb (net/core/skbuff.c:448)
[ 103.597568][ T5052] ? kmalloc_reserve (net/core/skbuff.c:400)
[ 103.597656][ T5052] ? __mutex_lock_slowpath (kernel/locking/mutex.c:282)
[ 103.597750][ T5052] netlink_dump (net/netlink/af_netlink.c:2275)
[ 103.597832][ T5052] ? netlink_getsockopt (net/netlink/af_netlink.c:2213)
[ 103.597925][ T5052] ? mutex_unlock (arch/x86/include/asm/atomic64_64.h:190 include/linux/atomic/atomic-long.h:449 include/linux/atomic/atomic-instrumented.h:1790 kernel/locking/mutex.c:181 kernel/locking/mutex.c:540)
[ 103.598009][ T5052] ? __mutex_unlock_slowpath+0x2c0/0x2c0
[ 103.598123][ T5052] __netlink_dump_start (include/linux/instrumented.h:101 include/linux/atomic/atomic-instrumented.h:176 include/linux/refcount.h:272 include/linux/refcount.h:315 include/linux/refcount.h:333 include/net/sock.h:1975 net/netlink/af_netlink.c:2382)
[ 103.598217][ T5052] genl_family_rcv_msg_dumpit (net/netlink/genetlink.c:689)
[ 103.598318][ T5052] ? genl_rcv (net/netlink/genetlink.c:655)
[ 103.598395][ T5052] ? __stack_depot_save (lib/stackdepot.c:424)
[ 103.598490][ T5052] ? genl_family_rcv_msg_doit (net/netlink/genetlink.c:563)
[ 103.598590][ T5052] ? nl80211_send_station (net/wireless/nl80211.c:6539) cfg80211
[ 103.598739][ T5052] ? genl_family_rcv_msg_dumpit (net/netlink/genetlink.c:637)
[ 103.598842][ T5052] ? genl_op_from_small (net/netlink/genetlink.c:136)
[ 103.598933][ T5052] genl_rcv_msg (net/netlink/genetlink.c:775 net/netlink/genetlink.c:795)
[ 103.599016][ T5052] ? genl_get_cmd (net/netlink/genetlink.c:784)
[ 103.599100][ T5052] ? nl80211_dump_station (net/wireless/nl80211.c:6592) cfg80211
[ 103.599245][ T5052] ? nl80211_send_station (net/wireless/nl80211.c:6539) cfg80211
[ 103.599394][ T5052] ? kasan_save_stack (mm/kasan/common.c:40)
[ 103.599511][ T5052] ? kasan_save_stack (mm/kasan/common.c:39)
[ 103.599613][ T5052] ? __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:437 mm/kasan/common.c:470)
[ 103.599704][ T5052] ? kmem_cache_alloc_lru (mm/slab.h:727 mm/slub.c:3243 mm/slub.c:3251 mm/slub.c:3258 mm/slub.c:3275)
[ 103.599798][ T5052] netlink_rcv_skb (net/netlink/af_netlink.c:2538)
[ 103.599882][ T5052] ? genl_get_cmd (net/netlink/genetlink.c:784)
[ 103.599967][ T5052] ? netlink_ack (net/netlink/af_netlink.c:2515)
[ 103.600049][ T5052] ? netlink_recvmsg (net/netlink/af_netlink.c:514)
[ 103.600136][ T5052] ? _copy_from_iter (lib/iov_iter.c:628 (discriminator 11))
[ 103.600226][ T5052] genl_rcv (net/netlink/genetlink.c:807)
[ 103.600299][ T5052] netlink_unicast (net/netlink/af_netlink.c:1320 net/netlink/af_netlink.c:1345)
[ 103.600386][ T5052] ? netlink_attachskb (net/netlink/af_netlink.c:1330)
[ 103.600488][ T5052] ? check_heap_object (arch/x86/include/asm/bitops.h:207 arch/x86/include/asm/bitops.h:239 include/asm-generic/bitops/instrumented-non-atomic.h:142 include/linux/page-flags.h:487 mm/usercopy.c:193)
[ 103.600588][ T5052] netlink_sendmsg (net/netlink/af_netlink.c:1921)
[ 103.600675][ T5052] ? netlink_unicast (net/netlink/af_netlink.c:1841)
[ 103.600764][ T5052] ? __import_iovec (lib/iov_iter.c:1771)
[ 103.600849][ T5052] ? netlink_unicast (net/netlink/af_netlink.c:1841)
[ 103.600936][ T5052] sock_sendmsg (net/socket.c:714 net/socket.c:734)
[ 103.601016][ T5052] ____sys_sendmsg (net/socket.c:2482)
[ 103.601101][ T5052] ? kernel_sendmsg (net/socket.c:2429)
[ 103.601184][ T5052] ? __copy_msghdr (net/socket.c:2409)
[ 103.601269][ T5052] ? __kasan_record_aux_stack (mm/kasan/generic.c:348)
[ 103.601369][ T5052] ___sys_sendmsg (net/socket.c:2538)
[ 103.601461][ T5052] ? __ia32_sys_recvmmsg (net/socket.c:2525)
[ 103.601566][ T5052] ? rcu_gp_kthread (kernel/rcu/tree_nocb.h:371)
[ 103.601653][ T5052] ? fsnotify_grab_connector (fs/notify/mark.c:601)
[ 103.601751][ T5052] ? filemap_map_pmd (mm/filemap.c:3324)
[ 103.601839][ T5052] ? do_read_fault (mm/memory.c:4487 mm/memory.c:4513)
[ 103.601925][ T5052] ? do_fault (mm/memory.c:4647)
[ 103.602003][ T5052] ? __handle_mm_fault (mm/memory.c:5053)
[ 103.602093][ T5052] ? rcu_gp_kthread (kernel/rcu/tree_nocb.h:371)
[ 103.602180][ T5052] ? copy_page_range (mm/memory.c:4963)
[ 103.602267][ T5052] ? __fget_light (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 fs/file.c:1005)
[ 103.602352][ T5052] __sys_sendmsg (include/linux/file.h:31 net/socket.c:2567)
[ 103.602435][ T5052] ? __sys_sendmsg_sock (net/socket.c:2553)
[ 103.602545][ T5052] ? do_user_addr_fault (arch/x86/mm/fault.c:1426)
[ 103.602636][ T5052] ? exit_to_user_mode_loop (include/linux/sched.h:2305 include/linux/resume_user_mode.h:61 kernel/entry/common.c:169)
[ 103.602731][ T5052] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
[ 103.602812][ T5052] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
[ 103.602915][ T5052] RIP: 0033:0x7fcd499bf2c3
[ 103.602996][ T5052] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b7 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 89 54 24 1c 48
All code
========
0: 64 89 02 mov %eax,%fs:(%rdx)
3: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
a: eb b7 jmp 0xffffffffffffffc3
c: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
13: 00 00 00
16: 90 nop
17: 64 8b 04 25 18 00 00 mov %fs:0x18,%eax
1e: 00
1f: 85 c0 test %eax,%eax
21: 75 14 jne 0x37
23: b8 2e 00 00 00 mov $0x2e,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 55 ja 0x87
32: c3 retq
33: 0f 1f 40 00 nopl 0x0(%rax)
37: 48 83 ec 28 sub $0x28,%rsp
3b: 89 54 24 1c mov %edx,0x1c(%rsp)
3f: 48 rex.W

Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 55 ja 0x5d
8: c3 retq
9: 0f 1f 40 00 nopl 0x0(%rax)
d: 48 83 ec 28 sub $0x28,%rsp
11: 89 54 24 1c mov %edx,0x1c(%rsp)
15: 48 rex.W
[ 103.603312][ T5052] RSP: 002b:00007ffd473a4358 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 103.603472][ T5052] RAX: ffffffffffffffda RBX: 000055d33f5b4390 RCX: 00007fcd499bf2c3
[ 103.603625][ T5052] RDX: 0000000000000000 RSI: 00007ffd473a4390 RDI: 0000000000000003
[ 103.603758][ T5052] RBP: 000055d33f5b9780 R08: 000055d33f5b54c0 R09: 000055d33f5b44b0
[ 103.603894][ T5052] R10: 000055d33f290940 R11: 0000000000000246 R12: 000055d33f5b98c0
[ 103.604030][ T5052] R13: 00007ffd473a4390 R14: 000055d33f5b97d0 R15: 000055d33f5b98c0
[ 103.604166][ T5052] </TASK>
[ 103.605988][ T405] Station dump on dev[0]:
[ 103.605997][ T405]
[ 103.606940][ T405] Station 02:00:00:00:03:00 (on wlan0)
[ 103.606947][ T405]
[ 103.607595][ T405] inactive time: 61 ms
[ 103.607602][ T405]
[ 103.608090][ T405] rx bytes: 150775
[ 103.608096][ T405]
[ 103.608636][ T405] rx packets: 172
[ 103.608643][ T405]
[ 103.609122][ T405] tx bytes: 64715
[ 103.609128][ T405]
[ 103.609621][ T405] tx packets: 54
[ 103.609627][ T405]
[ 103.610061][ T405] tx retries: 0
[ 103.610067][ T405]
[ 103.610590][ T405] tx failed: 0
[ 103.610596][ T405]
[ 103.611031][ T405] beacon loss: 0
[ 103.611037][ T405]
[ 103.611473][ T405] beacon rx: 25
[ 103.611480][ T405]
[ 103.611979][ T405] rx drop misc: 30
[ 103.611985][ T405]
[ 103.612510][ T405] signal: -27 dBm
[ 103.612521][ T405]
[ 103.613094][ T405] signal avg: -27 dBm
[ 103.613100][ T405]
[ 103.613841][ T405] beacon signal avg: -27 dBm
[ 103.613848][ T405]
[ 103.614976][ T405] tx bitrate: 13.0 MBit/s VHT-MCS 1 VHT-NSS 1
[ 103.614983][ T405]
[ 103.615774][ T8] wlan0: deauthenticated from 02:00:00:00:03:00 (Reason: 3=DEAUTH_LEAVING)
[ 103.615801][ T60] wlan1: deauthenticated from 02:00:00:00:03:00 (Reason: 3=DEAUTH_LEAVING)
[ 103.615855][ T405] tx duration: 0 us
[ 103.615864][ T405]


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (11.13 kB)
config-6.0.0-rc3-00691-g662cceb50ea7 (169.98 kB)
job-script (6.02 kB)
dmesg.xz (154.85 kB)
hwsim (70.87 kB)
job.yaml (4.65 kB)
reproduce (4.77 kB)
Download all attachments

2022-09-20 08:39:18

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock


On 15.09.22 06:35, [email protected] wrote:
> From: Venkat Chimata <[email protected]>
>
> OpenWiFi's ucentral agent periodically (typically 120 seconds)
> issues nl80211 call to get associated client list from the
> WLAN driver. Somehow this operation was causing tx/rx delays
> sometimes and the video calls on connected clients are experiencing
> jitter. The associated client list was protected by
> a mutex lock. I saw that ieee80211_check_fast_xmit_all uses
> rcu_read_lock and rcu_read_unlock to iterate through sta_list.
> I took it as a refernce and changed the lock to rcu_read lock
> from mutex.
> Also saw this this comment just above sta_mutex declaration.
> /* Station data */
> /*
> * The mutex only protects the list, hash table and
> * counter, reads are done with RCU.
> */
> struct mutex sta_mtx;
>
> Hence tried changing mutex_lock(/unlock) in ieee80211_dump_station
> to rcu_read_lock(/unlock) and it resolved the jitter issue in the
> video calls.
>
> Tests:
> We had this issue show up consistently and the patch fixed the issue.
> We spent a good part of 2 weeks following up on this and with this
> fix, the video calls are smooth.
>
> Also tested if this could cause any crashes with below mentioned
> process.
>
> 1. Connect 3 clients
> 2. A script running dumping clients in an infinite loop
> 3. Continuously disconnect / connect one client to see if
> there is any crash. No crash was observed.
The drv_sta_statistics call (called from sta_set_sinfo) can sleep, so
you're not allowed to use RCU for iterating over the station list.
Also, please analyze and explain the actual root cause before proposing
another fix.

- Felix

2022-09-20 09:11:52

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock


On 20.09.22 11:05, Venkat Ch wrote:
> Hi Felix,
> Thanks for the comments. What part of  drv_sta_statistics could make it
> sleep? I might be missing something. Please explain.
It's just a wrapper around the driver .sta_statistics op.
The documentation in include/net/mac80211.h states that this callback is
allowed to sleep, and it does so in a few drivers.

- Felix

2022-09-20 18:31:52

by Venkat Ch

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock

Hi Felix,

Thank you. I browsed through the code. There seems to be sleep
operations wcn36xx platform.ath11k does n't seem to have the sleep
operations in sta_statistics. We are using ath11k based chipset. Will
it impact things if we apply this patch for ath11k boards only as a
platform specific patch?

Thanks & Regards
Venkat

Venkat

On Tue, 20 Sept 2022 at 14:40, Felix Fietkau <[email protected]> wrote:
>
>
> On 20.09.22 11:05, Venkat Ch wrote:
> > Hi Felix,
> > Thanks for the comments. What part of drv_sta_statistics could make it
> > sleep? I might be missing something. Please explain.
> It's just a wrapper around the driver .sta_statistics op.
> The documentation in include/net/mac80211.h states that this callback is
> allowed to sleep, and it does so in a few drivers.
>
> - Felix



--
If you rest, you rust

2022-09-20 19:12:36

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock


On 20.09.22 20:27, Venkat Ch wrote:
> Hi Felix,
>
> Thank you. I browsed through the code. There seems to be sleep
> operations wcn36xx platform.ath11k does n't seem to have the sleep
> operations in sta_statistics. We are using ath11k based chipset. Will
> it impact things if we apply this patch for ath11k boards only as a
> platform specific patch?
I think it's a bad idea to keep this hack without understanding why it
helps at all.

- Felix

2022-09-20 19:43:32

by Venkat Ch

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock

Hi Felix,

Following is the background of the problem, how I traced to
mutex_lock and why I propose rcu locks.

Issue:
On a 10Mbps upload / 50 Mbps download connection, the following issue reported.

Video periodically freezes and/or appears delayed when on Zoom or Teams.
1. Video will freeze for 10 or 15 seconds periodically when on a call,
but audio continues and the session doesn't drop.
2. The video still works but it appears delay (I move, but the video
of my movement is noticeably delay by a second or so)

Tracing to mutex_lock(sta_mutex):

When I investigated, I found that the ucentral agent in openwifi
fetches the station list periodically. Without the station list
fetch, the video quality is just fine. I investigated the station list
path and found this mutex_lock. I also see that earlier it was
rcu_lock which protected the station list in this path. In this
commit, https://github.com/torvalds/linux/commit/66572cfc30a4b764150c83ee5d842a3ce17991c9,
rcu lock was changed to mutex lock without providing any reason.

I also saw this comment just above the sta_mutex declaration.
/* Station data */
/*
* The mutex only protects the list, hash table and
* counter, reads are done with RCU.
*/
struct mutex sta_mtx;

So I reverted back the mutex_lock with rcu_lock and it just worked
fine. We tested for more than 2 weeks before concluding this analysis.

I think the usage of mutex_lock is impacting the tx / rx path
somewhere and hence the issue. It is a challenge to trace the exact
function though.

Thanks & Regards
Venkat

On Wed, 21 Sept 2022 at 00:21, Felix Fietkau <[email protected]> wrote:
>
>
> On 20.09.22 20:27, Venkat Ch wrote:
> > Hi Felix,
> >
> > Thank you. I browsed through the code. There seems to be sleep
> > operations wcn36xx platform.ath11k does n't seem to have the sleep
> > operations in sta_statistics. We are using ath11k based chipset. Will
> > it impact things if we apply this patch for ath11k boards only as a
> > platform specific patch?
> I think it's a bad idea to keep this hack without understanding why it
> helps at all.
>
> - Felix



--
If you rest, you rust

2022-09-21 08:14:26

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock


On 20.09.22 21:23, Venkat Ch wrote:
> Hi Felix,
>
> Following is the background of the problem, how I traced to
> mutex_lock and why I propose rcu locks.
>
> Issue:
> On a 10Mbps upload / 50 Mbps download connection, the following issue reported.
>
> Video periodically freezes and/or appears delayed when on Zoom or Teams.
> 1. Video will freeze for 10 or 15 seconds periodically when on a call,
> but audio continues and the session doesn't drop.
> 2. The video still works but it appears delay (I move, but the video
> of my movement is noticeably delay by a second or so)
>
> Tracing to mutex_lock(sta_mutex):
>
> When I investigated, I found that the ucentral agent in openwifi
> fetches the station list periodically. Without the station list
> fetch, the video quality is just fine. I investigated the station list
> path and found this mutex_lock. I also see that earlier it was
> rcu_lock which protected the station list in this path. In this
> commit, https://github.com/torvalds/linux/commit/66572cfc30a4b764150c83ee5d842a3ce17991c9,
> rcu lock was changed to mutex lock without providing any reason.
The reason seems clear to me, even though it was not explicitly stated
in the commit message: in sta_set_sinfo it introduces a call to a driver
op that is allowed to sleep.

> I also saw this comment just above the sta_mutex declaration.
> /* Station data */
> /*
> * The mutex only protects the list, hash table and
> * counter, reads are done with RCU.
> */
> struct mutex sta_mtx;
>
> So I reverted back the mutex_lock with rcu_lock and it just worked
> fine. We tested for more than 2 weeks before concluding this analysis.
>
> I think the usage of mutex_lock is impacting the tx / rx path
> somewhere and hence the issue. It is a challenge to trace the exact
> function though.

I don't see any critical part in the tx/rx path which depends on the
sta_mtx lock. My guess is that for some reason your change is simply
accidentally papering over the real bug, which may be somewhere else
entirely, maybe even in the driver. A freeze for 10-15 seconds
definitely does not sound like simple lock contention on the mutex,
since a single station dump will be much faster than that.

- Felix

2022-09-21 08:35:47

by Venkat Ch

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock

Hi Felix,

Thanks again for patiently discussing things. I clearly understand
what you said about locks. Now that we know the background of the
problem, do you suggest any potential solution or any potential
direction that I should start investigating? Please let me know.

Thanks & Regards
Venkat

On Wed, 21 Sept 2022 at 13:35, Felix Fietkau <[email protected]> wrote:
>
>
> On 20.09.22 21:23, Venkat Ch wrote:
> > Hi Felix,
> >
> > Following is the background of the problem, how I traced to
> > mutex_lock and why I propose rcu locks.
> >
> > Issue:
> > On a 10Mbps upload / 50 Mbps download connection, the following issue reported.
> >
> > Video periodically freezes and/or appears delayed when on Zoom or Teams.
> > 1. Video will freeze for 10 or 15 seconds periodically when on a call,
> > but audio continues and the session doesn't drop.
> > 2. The video still works but it appears delay (I move, but the video
> > of my movement is noticeably delay by a second or so)
> >
> > Tracing to mutex_lock(sta_mutex):
> >
> > When I investigated, I found that the ucentral agent in openwifi
> > fetches the station list periodically. Without the station list
> > fetch, the video quality is just fine. I investigated the station list
> > path and found this mutex_lock. I also see that earlier it was
> > rcu_lock which protected the station list in this path. In this
> > commit, https://github.com/torvalds/linux/commit/66572cfc30a4b764150c83ee5d842a3ce17991c9,
> > rcu lock was changed to mutex lock without providing any reason.
> The reason seems clear to me, even though it was not explicitly stated
> in the commit message: in sta_set_sinfo it introduces a call to a driver
> op that is allowed to sleep.
>
> > I also saw this comment just above the sta_mutex declaration.
> > /* Station data */
> > /*
> > * The mutex only protects the list, hash table and
> > * counter, reads are done with RCU.
> > */
> > struct mutex sta_mtx;
> >
> > So I reverted back the mutex_lock with rcu_lock and it just worked
> > fine. We tested for more than 2 weeks before concluding this analysis.
> >
> > I think the usage of mutex_lock is impacting the tx / rx path
> > somewhere and hence the issue. It is a challenge to trace the exact
> > function though.
>
> I don't see any critical part in the tx/rx path which depends on the
> sta_mtx lock. My guess is that for some reason your change is simply
> accidentally papering over the real bug, which may be somewhere else
> entirely, maybe even in the driver. A freeze for 10-15 seconds
> definitely does not sound like simple lock contention on the mutex,
> since a single station dump will be much faster than that.
>
> - Felix



--
If you rest, you rust

2022-09-21 10:52:19

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH v2] wifi: mac80211: Fix performance issue with mutex_lock

On 21.09.22 10:31, Venkat Ch wrote:
> Hi Felix,
>
> Thanks again for patiently discussing things. I clearly understand
> what you said about locks. Now that we know the background of the
> problem, do you suggest any potential solution or any potential
> direction that I should start investigating? Please let me know.

Maybe try enabling lockdep and see if anything shows up.

- Felix