2023-04-24 17:34:37

by Johannes Berg

Subject: Re: [PATCH RFC v1 1/1] net: mac80211: fortify the spinlock against deadlock in interrupt

On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
> In the function ieee80211_tx_dequeue() there is a locking sequence:
>
> begin:
> spin_lock(&local->queue_stop_reason_lock);
> q_stopped = local->queue_stop_reasons[q];
> spin_unlock(&local->queue_stop_reason_lock);
>
> However small the chance (increased by ftracetest), an asynchronous
> interrupt can occur between spin_lock() and spin_unlock(),
> and the interrupt routine will attempt to lock the same
> &local->queue_stop_reason_lock again.
>
> This is the only remaining spin_lock() on local->queue_stop_reason_lock
> that did not disable interrupts and could have caused a deadlock
> on the same CPU (core).
>
> This will cause a costly reset of the CPU and wifi device, or a
> complete hang in the single-CPU, single-core scenario.
>
> This is the probable reproduction of the deadlock:
>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
> *** DEADLOCK ***
>
> Fixes: 4444bc2116ae

That Fixes tag is wrong; it should include the commit subject:

Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")

Otherwise seems fine to me, submit it properly?

johannes
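
For readers following the thread, here is a minimal userspace sketch in C
of why the plain spin_lock() in the quoted sequence is risky when an
interrupt handler takes the same lock, and why disabling interrupts around
the critical section avoids it. This is a toy single-CPU model, not kernel
code: struct toy_cpu and all toy_* functions are hypothetical names made up
for illustration. The patch itself is not shown in this thread, so the
switch to spin_lock_irqsave()/spin_unlock_irqrestore() is an assumption
inferred from the description above.

```c
#include <stdbool.h>

/* Toy single-CPU model. Every name here is hypothetical, not kernel API. */
struct toy_cpu {
	bool irqs_enabled; /* may an interrupt be delivered right now? */
	bool lock_held;    /* stands in for local->queue_stop_reason_lock */
	bool deadlocked;   /* set when the handler finds the lock already held */
};

/* Models an interrupt handler that takes the same lock. */
static void toy_irq_handler(struct toy_cpu *cpu)
{
	if (cpu->lock_held) {
		/*
		 * A real spinlock would spin forever here: the lock holder
		 * cannot run again until this handler returns.
		 */
		cpu->deadlocked = true;
		return;
	}
	cpu->lock_held = true;  /* lock */
	cpu->lock_held = false; /* unlock */
}

/* Deliver a pending interrupt only if interrupts are enabled. */
static void toy_maybe_interrupt(struct toy_cpu *cpu)
{
	if (cpu->irqs_enabled)
		toy_irq_handler(cpu);
}

/* Models spin_lock()/spin_unlock(): interrupts stay enabled. */
static bool toy_read_plain(struct toy_cpu *cpu)
{
	cpu->lock_held = true;
	toy_maybe_interrupt(cpu); /* interrupt lands inside the section */
	cpu->lock_held = false;
	return cpu->deadlocked;
}

/* Models spin_lock_irqsave()/spin_unlock_irqrestore() (assumed fix). */
static bool toy_read_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled; /* save interrupt state */

	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	toy_maybe_interrupt(cpu); /* held off while interrupts are disabled */
	cpu->lock_held = false;
	cpu->irqs_enabled = flags; /* restore interrupt state */
	toy_maybe_interrupt(cpu); /* pending interrupt now runs safely */
	return cpu->deadlocked;
}
```

Starting from a toy_cpu with irqs_enabled set, toy_read_plain() reports the
self-deadlock, while toy_read_irqsave() does not and leaves interrupts
enabled afterwards.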


2023-04-25 08:31:48

by Mirsad Todorovac

Subject: Re: [PATCH RFC v1 1/1] net: mac80211: fortify the spinlock against deadlock in interrupt

On 24.4.2023. 19:27, Johannes Berg wrote:
> On Sun, 2023-04-23 at 10:24 +0200, Mirsad Goran Todorovac wrote:
>> In the function ieee80211_tx_dequeue() there is a locking sequence:
>>
>> begin:
>> spin_lock(&local->queue_stop_reason_lock);
>> q_stopped = local->queue_stop_reasons[q];
>> spin_unlock(&local->queue_stop_reason_lock);
>>
>> However small the chance (increased by ftracetest), an asynchronous
>> interrupt can occur between spin_lock() and spin_unlock(),
>> and the interrupt routine will attempt to lock the same
>> &local->queue_stop_reason_lock again.
>>
>> This is the only remaining spin_lock() on local->queue_stop_reason_lock
>> that did not disable interrupts and could have caused a deadlock
>> on the same CPU (core).
>>
>> This will cause a costly reset of the CPU and wifi device, or a
>> complete hang in the single-CPU, single-core scenario.
>>
>> This is the probable reproduction of the deadlock:
>>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: Possible unsafe locking scenario:
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: CPU0
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: ----
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: <Interrupt>
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel: lock(&local->queue_stop_reason_lock);
>> Apr 10 00:58:33 marvin-IdeaPad-3-15ITL6 kernel:
>> *** DEADLOCK ***
>>
>> Fixes: 4444bc2116ae
>
> That Fixes tag is wrong; it should include the commit subject:
>
> Fixes: 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs for resumption")
>
> Otherwise seems fine to me, submit it properly?
>
> johannes

Will do, Sir. Do I have your Acked-by?

Thank you.

Mirsad

--
Mirsad Todorovac
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb
Republic of Croatia, the European Union