2017-10-03 18:17:41

by Ben Greear

Subject: lockdep splat on 4.13.3 (plus hacks)

We are seeing deadlocks related to wifi in our 4.13.3+ kernels,
so I enabled lockdep and immediately saw this. Anyone know if this
is a known issue? Otherwise, I guess it could be related to some local
patch I have added...


[ 476.172823] ============================================
[ 476.176863] WARNING: possible recursive locking detected
[ 476.180895] 4.13.3+ #1 Not tainted
[ 476.183025] --------------------------------------------
[ 476.187053] kworker/u8:2/281 is trying to acquire lock:
[ 476.190993] (&sta->ampdu_mlme.mtx){+.+...}, at: [<ffffffffa09cd4e8>] __ieee80211_start_rx_ba_session+0x178/0x670 [mac80211]
[ 476.201004]
but task is already holding lock:
[ 476.204270] (&sta->ampdu_mlme.mtx){+.+...}, at: [<ffffffffa09c93a6>] ieee80211_ba_session_work+0x46/0x2b0 [mac80211]
[ 476.213645]
other info that might help us debug this:
[ 476.217587] Possible unsafe locking scenario:

[ 476.220930] CPU0
[ 476.222082] ----
[ 476.223236] lock(&sta->ampdu_mlme.mtx);
[ 476.225957] lock(&sta->ampdu_mlme.mtx);
[ 476.228673]
*** DEADLOCK ***

[ 476.230689] May be due to missing lock nesting notation

[ 476.234879] 3 locks held by kworker/u8:2/281:
[ 476.237941] #0: ("%s"wiphy_name(local->hw.wiphy)){++++.+}, at: [<ffffffff8113df8f>] process_one_work+0x14f/0x6a0
[ 476.247033] #1: ((&sta->ampdu_mlme.work)){+.+...}, at: [<ffffffff8113df8f>] process_one_work+0x14f/0x6a0
[ 476.255393] #2: (&sta->ampdu_mlme.mtx){+.+...}, at: [<ffffffffa09c93a6>] ieee80211_ba_session_work+0x46/0x2b0 [mac80211]
[ 476.265170]
stack backtrace:
[ 476.266928] CPU: 0 PID: 281 Comm: kworker/u8:2 Not tainted 4.13.3+ #1
[ 476.272073] Hardware name: _ _/ , BIOS 5.11 08/26/2016
[ 476.275927] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 476.280640] Call Trace:
[ 476.281792] dump_stack+0x85/0xc7
[ 476.283811] __lock_acquire+0x14ba/0x1520
[ 476.286526] ? __save_stack_trace+0x6e/0xd0
[ 476.289412] ? ret_from_fork+0x2a/0x40
[ 476.291867] lock_acquire+0xac/0x200
[ 476.294145] ? lock_acquire+0xac/0x200
[ 476.296610] ? __ieee80211_start_rx_ba_session+0x178/0x670 [mac80211]
[ 476.301766] ? __ieee80211_start_rx_ba_session+0x178/0x670 [mac80211]
[ 476.306910] __mutex_lock+0x69/0x930
[ 476.309194] ? __ieee80211_start_rx_ba_session+0x178/0x670 [mac80211]
[ 476.314343] ? rcu_read_lock_sched_held+0x6d/0x80
[ 476.317766] ? __sdata_dbg+0x14a/0x1a0 [mac80211]
[ 476.321181] mutex_lock_nested+0x16/0x20
[ 476.323810] ? mutex_lock_nested+0x16/0x20
[ 476.326626] __ieee80211_start_rx_ba_session+0x178/0x670 [mac80211]
[ 476.331627] ieee80211_ba_session_work+0x157/0x2b0 [mac80211]
[ 476.336078] ? process_one_work+0x14f/0x6a0
[ 476.338985] process_one_work+0x1ce/0x6a0
[ 476.341696] worker_thread+0x46/0x400
[ 476.344061] kthread+0x10f/0x150
[ 476.346001] ? process_one_work+0x6a0/0x6a0
[ 476.348886] ? kthread_create_on_node+0x40/0x40
[ 476.352122] ret_from_fork+0x2a/0x40

Thanks,
Ben

--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
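
The splat above shows ieee80211_ba_session_work already holding
&sta->ampdu_mlme.mtx when it reaches __ieee80211_start_rx_ba_session, which
tries to take the same mutex again. The following is a minimal userspace
sketch of that pattern, not the mac80211 code. Every name in it (ba_state,
start_session, start_session_locked, worker_buggy, worker_fixed) is
hypothetical, and it assumes an error-checking pthread mutex so that the
second lock attempt returns EDEADLK instead of hanging the way a kernel
mutex would.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for per-station aggregation state. */
struct ba_state {
	pthread_mutex_t mtx;
	int sessions;
};

/*
 * Initialise with an error-checking mutex: relocking by the owning
 * thread then fails with EDEADLK rather than blocking forever.
 */
static int ba_state_init(struct ba_state *s)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;
	ret = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!ret)
		ret = pthread_mutex_init(&s->mtx, &attr);
	pthread_mutexattr_destroy(&attr);
	if (!ret)
		s->sessions = 0;
	return ret;
}

/* Inner helper: the caller must already hold s->mtx. */
static void start_session_locked(struct ba_state *s)
{
	s->sessions++;
}

/* Locking wrapper for callers that do NOT hold s->mtx. */
static int start_session(struct ba_state *s)
{
	int ret = pthread_mutex_lock(&s->mtx);

	if (ret)
		return ret;
	start_session_locked(s);
	pthread_mutex_unlock(&s->mtx);
	return 0;
}

/* Same shape as the report: take mtx, then call the locking wrapper. */
static int worker_buggy(struct ba_state *s)
{
	int ret = pthread_mutex_lock(&s->mtx);

	if (ret)
		return ret;
	ret = start_session(s);		/* second lock of mtx: EDEADLK */
	pthread_mutex_unlock(&s->mtx);
	return ret;
}

/* One common remedy: call the caller-holds-lock helper instead. */
static int worker_fixed(struct ba_state *s)
{
	int ret = pthread_mutex_lock(&s->mtx);

	if (ret)
		return ret;
	start_session_locked(s);
	pthread_mutex_unlock(&s->mtx);
	return 0;
}
```

After ba_state_init(), worker_buggy() returns EDEADLK and leaves the session
count untouched, while worker_fixed() returns 0. Splitting a function into a
locking wrapper and a caller-holds-lock helper is only one common way to
resolve this kind of report; nothing in this thread says which approach the
actual fix takes.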


2017-10-03 18:50:54

by Ben Greear

Subject: Re: lockdep splat on 4.13.3 (plus hacks)

On 10/03/2017 11:17 AM, Ben Greear wrote:
> We are seeing deadlocks related to wifi in our 4.13.3+ kernels,
> so I enabled lockdep and immediately saw this. Anyone know if this
> is a known issue? Otherwise, I guess it could be related to some local
> patch I have added...

I think I found the fix in the stable queue, so I guess it will be in
4.13.5 when that is out...

Will continue testing...

Thanks,
Ben



--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com