2023-04-25 16:55:51

by Mirsad Todorovac

Subject: [PATCH v4 1/1] wifi: mac80211: fortify the spinlock against deadlock by interrupt

The function ieee80211_tx_dequeue() contains the following locking
sequence:

begin:
spin_lock(&local->queue_stop_reason_lock);
q_stopped = local->queue_stop_reasons[q];
spin_unlock(&local->queue_stop_reason_lock);

However small the chance (increased by ftracetest), an asynchronous
interrupt can arrive between spin_lock() and spin_unlock(), and the
code it triggers (in the trace below, a timer softirq run on interrupt
exit) may try to take the same &local->queue_stop_reason_lock again.

This can cause a costly reset of the CPU and the wifi device, or an
outright hang in the single-CPU, single-core scenario.

Convert the only remaining spin_lock(&local->queue_stop_reason_lock)
call that did not disable interrupts to spin_lock_irqsave(), which
should prevent such deadlocks on the same CPU/core and the same wifi
device.

The lockdep report below is the probable trace of this deadlock:

kernel: ================================
kernel: WARNING: inconsistent lock state
kernel: 6.3.0-rc6-mt-20230401-00001-gf86822a1170f #4 Tainted: G W
kernel: --------------------------------
kernel: inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
kernel: kworker/5:0/25656 [HC0[0]:SC0[0]:HE1:SE1] takes:
kernel: ffff9d6190779478 (&local->queue_stop_reason_lock){+.?.}-{2:2}, at: return_to_handler+0x0/0x40
kernel: {IN-SOFTIRQ-W} state was registered at:
kernel: lock_acquire+0xc7/0x2d0
kernel: _raw_spin_lock+0x36/0x50
kernel: ieee80211_tx_dequeue+0xb4/0x1330 [mac80211]
kernel: iwl_mvm_mac_itxq_xmit+0xae/0x210 [iwlmvm]
kernel: iwl_mvm_mac_wake_tx_queue+0x2d/0xd0 [iwlmvm]
kernel: ieee80211_queue_skb+0x450/0x730 [mac80211]
kernel: __ieee80211_xmit_fast.constprop.66+0x834/0xa50 [mac80211]
kernel: __ieee80211_subif_start_xmit+0x217/0x530 [mac80211]
kernel: ieee80211_subif_start_xmit+0x60/0x580 [mac80211]
kernel: dev_hard_start_xmit+0xb5/0x260
kernel: __dev_queue_xmit+0xdbe/0x1200
kernel: neigh_resolve_output+0x166/0x260
kernel: ip_finish_output2+0x216/0xb80
kernel: __ip_finish_output+0x2a4/0x4d0
kernel: ip_finish_output+0x2d/0xd0
kernel: ip_output+0x82/0x2b0
kernel: ip_local_out+0xec/0x110
kernel: igmpv3_sendpack+0x5c/0x90
kernel: igmp_ifc_timer_expire+0x26e/0x4e0
kernel: call_timer_fn+0xa5/0x230
kernel: run_timer_softirq+0x27f/0x550
kernel: __do_softirq+0xb4/0x3a4
kernel: irq_exit_rcu+0x9b/0xc0
kernel: sysvec_apic_timer_interrupt+0x80/0xa0
kernel: asm_sysvec_apic_timer_interrupt+0x1f/0x30
kernel: _raw_spin_unlock_irqrestore+0x3f/0x70
kernel: free_to_partial_list+0x3d6/0x590
kernel: __slab_free+0x1b7/0x310
kernel: kmem_cache_free+0x52d/0x550
kernel: putname+0x5d/0x70
kernel: do_sys_openat2+0x1d7/0x310
kernel: do_sys_open+0x51/0x80
kernel: __x64_sys_openat+0x24/0x30
kernel: do_syscall_64+0x5c/0x90
kernel: entry_SYSCALL_64_after_hwframe+0x72/0xdc
kernel: irq event stamp: 5120729
kernel: hardirqs last enabled at (5120729): [<ffffffff9d149936>] trace_graph_return+0xd6/0x120
kernel: hardirqs last disabled at (5120728): [<ffffffff9d149950>] trace_graph_return+0xf0/0x120
kernel: softirqs last enabled at (5069900): [<ffffffff9cf65b60>] return_to_handler+0x0/0x40
kernel: softirqs last disabled at (5067555): [<ffffffff9cf65b60>] return_to_handler+0x0/0x40
kernel:
other info that might help us debug this:
kernel: Possible unsafe locking scenario:
kernel: CPU0
kernel: ----
kernel: lock(&local->queue_stop_reason_lock);
kernel: <Interrupt>
kernel: lock(&local->queue_stop_reason_lock);
kernel:
*** DEADLOCK ***
kernel: 8 locks held by kworker/5:0/25656:
kernel: #0: ffff9d618009d138 ((wq_completion)events_freezable){+.+.}-{0:0}, at: process_one_work+0x1ca/0x530
kernel: #1: ffffb1ef4637fe68 ((work_completion)(&local->restart_work)){+.+.}-{0:0}, at: process_one_work+0x1ce/0x530
kernel: #2: ffffffff9f166548 (rtnl_mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40
kernel: #3: ffff9d6190778728 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: return_to_handler+0x0/0x40
kernel: #4: ffff9d619077b480 (&mvm->mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40
kernel: #5: ffff9d61907bacd8 (&trans_pcie->mutex){+.+.}-{3:3}, at: return_to_handler+0x0/0x40
kernel: #6: ffffffff9ef9cda0 (rcu_read_lock){....}-{1:2}, at: iwl_mvm_queue_state_change+0x59/0x3a0 [iwlmvm]
kernel: #7: ffffffff9ef9cda0 (rcu_read_lock){....}-{1:2}, at: iwl_mvm_mac_itxq_xmit+0x42/0x210 [iwlmvm]
kernel:
stack backtrace:
kernel: CPU: 5 PID: 25656 Comm: kworker/5:0 Tainted: G W 6.3.0-rc6-mt-20230401-00001-gf86822a1170f #4
kernel: Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN51WW 11/16/2022
kernel: Workqueue: events_freezable ieee80211_restart_work [mac80211]
kernel: Call Trace:
kernel: <TASK>
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: dump_stack_lvl+0x5f/0xa0
kernel: dump_stack+0x14/0x20
kernel: print_usage_bug.part.46+0x208/0x2a0
kernel: mark_lock.part.47+0x605/0x630
kernel: ? sched_clock+0xd/0x20
kernel: ? trace_clock_local+0x14/0x30
kernel: ? __rb_reserve_next+0x5f/0x490
kernel: ? _raw_spin_lock+0x1b/0x50
kernel: __lock_acquire+0x464/0x1990
kernel: ? mark_held_locks+0x4e/0x80
kernel: lock_acquire+0xc7/0x2d0
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: ? ftrace_return_to_handler+0x8b/0x100
kernel: ? preempt_count_add+0x4/0x70
kernel: _raw_spin_lock+0x36/0x50
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: ieee80211_tx_dequeue+0xb4/0x1330 [mac80211]
kernel: ? prepare_ftrace_return+0xc5/0x190
kernel: ? ftrace_graph_func+0x16/0x20
kernel: ? 0xffffffffc02ab0b1
kernel: ? lock_acquire+0xc7/0x2d0
kernel: ? iwl_mvm_mac_itxq_xmit+0x42/0x210 [iwlmvm]
kernel: ? ieee80211_tx_dequeue+0x9/0x1330 [mac80211]
kernel: ? __rcu_read_lock+0x4/0x40
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_mvm_mac_itxq_xmit+0xae/0x210 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_mvm_queue_state_change+0x311/0x3a0 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_mvm_wake_sw_queue+0x17/0x20 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_txq_gen2_unmap+0x1c9/0x1f0 [iwlwifi]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_txq_gen2_free+0x55/0x130 [iwlwifi]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_txq_gen2_tx_free+0x63/0x80 [iwlwifi]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: _iwl_trans_pcie_gen2_stop_device+0x3f3/0x5b0 [iwlwifi]
kernel: ? _iwl_trans_pcie_gen2_stop_device+0x9/0x5b0 [iwlwifi]
kernel: ? mutex_lock_nested+0x4/0x30
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_trans_pcie_gen2_stop_device+0x5f/0x90 [iwlwifi]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_mvm_stop_device+0x78/0xd0 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: __iwl_mvm_mac_start+0x114/0x210 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: iwl_mvm_mac_start+0x76/0x150 [iwlmvm]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: drv_start+0x79/0x180 [mac80211]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: ieee80211_reconfig+0x1523/0x1ce0 [mac80211]
kernel: ? synchronize_net+0x4/0x50
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: ieee80211_restart_work+0x108/0x170 [mac80211]
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: process_one_work+0x250/0x530
kernel: ? ftrace_regs_caller_end+0x66/0x66
kernel: worker_thread+0x48/0x3a0
kernel: ? __pfx_worker_thread+0x10/0x10
kernel: kthread+0x10f/0x140
kernel: ? __pfx_kthread+0x10/0x10
kernel: ret_from_fork+0x29/0x50
kernel: </TASK>

Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
Link: https://lore.kernel.org/all/[email protected]/
Reported-by: Mirsad Goran Todorovac <[email protected]>
Cc: Gregory Greenman <[email protected]>
Cc: Johannes Berg <[email protected]>
Link: https://lore.kernel.org/all/[email protected]/
Cc: David S. Miller <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Paolo Abeni <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Alexander Wetzel <[email protected]>
Signed-off-by: Mirsad Goran Todorovac <[email protected]>
---
v3 -> v4:
- Added the whole lockdep trace, as advised.
- Trimmed the irrelevant line prefix.
v2 -> v3:
- Fixed the Fixes: tag, as advised.
- Changed the net: prefix to wifi: to match the original patch
  that is being fixed.
v1 -> v2:
- Minor rewording and clarification.
- Cc-ed the people who replied to the original bug report
  (accidentally omitted in v1).

net/mac80211/tx.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7699fb410670..45cb8e7bcc61 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3781,6 +3781,7 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 	ieee80211_tx_result r;
 	struct ieee80211_vif *vif = txq->vif;
 	int q = vif->hw_queue[txq->ac];
+	unsigned long flags;
 	bool q_stopped;
 
 	WARN_ON_ONCE(softirq_count() == 0);
@@ -3789,9 +3790,9 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
 		return NULL;
 
 begin:
-	spin_lock(&local->queue_stop_reason_lock);
+	spin_lock_irqsave(&local->queue_stop_reason_lock, flags);
 	q_stopped = local->queue_stop_reasons[q];
-	spin_unlock(&local->queue_stop_reason_lock);
+	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 
 	if (unlikely(q_stopped)) {
 		/* mark for waking later */
--
2.30.2


2023-04-26 06:52:41

by Leon Romanovsky

Subject: Re: [PATCH v4 1/1] wifi: mac80211: fortify the spinlock against deadlock by interrupt

On Tue, Apr 25, 2023 at 06:40:08PM +0200, Mirsad Goran Todorovac wrote:
> [...]

Thanks,
Reviewed-by: Leon Romanovsky <[email protected]>

2023-04-26 14:14:35

by Mirsad Todorovac

Subject: Re: [PATCH v4 1/1] wifi: mac80211: fortify the spinlock against deadlock by interrupt

On 4/26/23 08:41, Leon Romanovsky wrote:
> On Tue, Apr 25, 2023 at 06:40:08PM +0200, Mirsad Goran Todorovac wrote:
>> [...]
>
> Thanks,
> Reviewed-by: Leon Romanovsky <[email protected]>

Not at all.

That's awesome! Just to ask, do I need to send a PATCH v5 with the
Reviewed-by: tag, or is it picked up automatically?

Thanks.

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

"What’s this thing suddenly coming towards me very fast? Very very fast.
... I wonder if it will be friends with me?"

2023-04-26 18:10:05

by Mirsad Todorovac

Subject: Re: [PATCH v4 1/1] wifi: mac80211: fortify the spinlock against deadlock by interrupt

On 26. 04. 2023. 17:05, Johannes Berg wrote:
> On Wed, 2023-04-26 at 16:02 +0200, Mirsad Todorovac wrote:
>>
>> That's awesome! Just to ask, do I need to send the PATCH v5 with the
>> Reviewed-by: tag, or it goes automatically?
>>
>
> Patchwork will pick it up automatically.
>
> johannes

That's awesome, thank you for the update.

Mirsad
