2023-01-10 12:52:15

by Bryan O'Donoghue

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

+ linux-wireless
On 10/01/2023 12:35, Bryan O'Donoghue wrote:
> commit a790cc3a4fad75048295571a350b95b87e022a5a
> (wake_tx_queue-broken-23-08-01)
> Author: Alexander Wetzel <[email protected]>
> Date:   Sun Oct 9 18:30:39 2022 +0200
>
>     wifi: mac80211: add wake_tx_queue callback to drivers
>
> is causing a regression with
>
> - CONF_PS = 1
> - CONF_DYNAMIC_PS = 0
> - ieee80211_handle_wake_tx_queue
>
> In this case we get stuck in a loop similar to this
>
> // IEEE80211_CONF_CHANGE_PS
> [   17.255480] wcn36xx: wcn36xx_change_ps/312 enable
> [   18.088835] ieee80211_tx_h_dynamic_ps/263 setting
> IEEE80211_QUEUE_STOP_REASON_PS
> [   18.088906] ieee80211_handle_wake_tx_queue/334 entry
> [   18.091505] ieee80211_dynamic_ps_disable_work/2250 calling
> ieee80211_hw_config()
> [   18.095370] ieee80211_handle_wake_tx_queue/338 wake_tx_push_queue
>
> // IEEE80211_CONF_CHANGE_PS
> [   18.102625] wcn36xx: wcn36xx_change_ps/312 disable
> [   18.107643] wake_tx_push_queue/303 entry
>
> // txq is stopped here reason == IEEE80211_QUEUE_STOP_REASON_PS
> [   18.107654] wake_tx_push_queue/311 q_stopped bitmask 0x00000002
> IEEE80211_QUEUE_STOP_REASON_PS true
> [   18.107661] wake_tx_push_queue/324 exit
> [   18.107667] ieee80211_handle_wake_tx_queue/342 exit
> [   18.115560] ieee80211_handle_wake_tx_queue/334 entry
> [   18.139937] ieee80211_handle_wake_tx_queue/338 wake_tx_push_queue
> [   18.145163] wake_tx_push_queue/303 entry
> [   18.150016] ieee80211_dynamic_ps_disable_work/2252 completed
> ieee80211_hw_config()
>
> // now we unset IEEE80211_QUEUE_STOP_REASON_PS but too late
> [   18.151145] wake_tx_push_queue/311 q_stopped bitmask 0x00000002
> IEEE80211_QUEUE_STOP_REASON_PS true
> [   18.155263] ieee80211_dynamic_ps_disable_work/2254 clearing
> IEEE80211_QUEUE_STOP_REASON_PS
> [   18.162531] wake_tx_push_queue/324 exit
> [   18.162548] ieee80211_handle_wake_tx_queue/342 exit
> [   18.183639] ieee80211_dynamic_ps_disable_work/2259 cleared
> IEEE80211_QUEUE_STOP_REASON_PS
>
> // IEEE80211_CONF_CHANGE_PS runs again
> [   18.215487] wcn36xx: wcn36xx_change_ps/312 enable
>
> We get stuck in that loop. Packets getting transmitted is a rare event,
> most are dropped.
>
> I tried this as a fix
>
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -2245,15 +2245,15 @@ void ieee80211_dynamic_ps_disable_work(struct
> work_struct *work)
>                 container_of(work, struct ieee80211_local,
>                              dynamic_ps_disable_work);
>
> -       if (local->hw.conf.flags & IEEE80211_CONF_PS) {
> -               local->hw.conf.flags &= ~IEEE80211_CONF_PS;
> -               ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
> -       }
> -
>         ieee80211_wake_queues_by_reason(&local->hw,
>                                         IEEE80211_MAX_QUEUE_MAP,
>                                         IEEE80211_QUEUE_STOP_REASON_PS,
>                                         false);
> +
> +       if (local->hw.conf.flags & IEEE80211_CONF_PS) {
> +               local->hw.conf.flags &= ~IEEE80211_CONF_PS;
> +               ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
> +       }
>  }
>
> but it only "slightly improves" the situation, the fundamental race
> condition is still there.
>
> Suggest reverting this change and trying again.
>
> ---
> bod


2023-01-10 14:49:56

by Bryan O'Donoghue

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

On 10/01/2023 12:44, Bryan O'Donoghue wrote:
> + linux-wireless
> On 10/01/2023 12:35, Bryan O'Donoghue wrote:
>> [..]

BTW I considered implementing a wcn36xx specific wake_tx callback -
which maybe should be done anyway.

I _don't_ see other drivers checking for q_stopped &
IEEE80211_QUEUE_STOP_REASON_PS

Should they be ?

If they should check IEEE80211_QUEUE_STOP_REASON_PS, then right now,
they don't. If they shouldn't check IEEE80211_QUEUE_STOP_REASON_PS then
neither should the generic replacement ieee80211_handle_wake_tx_queue()

---
bod

2023-01-10 15:25:41

by Alexander Wetzel

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

On 10.01.23 15:47, Bryan O'Donoghue wrote:
> On 10/01/2023 12:44, Bryan O'Donoghue wrote:
>> + linux-wireless
>> On 10/01/2023 12:35, Bryan O'Donoghue wrote:
>>> [..]
>

I'll need some time to digest that... I'll report back once I get it.

> BTW I considered implementing a wcn36xx specific wake_tx callback -
> which maybe should be done anyway.
>
> I _don't_ see other drivers checking for q_stopped &
> IEEE80211_QUEUE_STOP_REASON_PS
>
> Should they be ?
>

No, they should not.

My take is that this is a bug in mac80211. I submitted patches to
fix that; they have just been accepted:

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/

and

https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/


Can you test if these also help here?



> If they should check IEEE80211_QUEUE_STOP_REASON_PS, then right now,
> they don't. If they shouldn't check IEEE80211_QUEUE_STOP_REASON_PS then
> neither should the generic replacement ieee80211_handle_wake_tx_queue()
>
> ---
> bod

2023-01-10 15:50:28

by Alexander Wetzel

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

On 10.01.23 16:23, Alexander Wetzel wrote:
> On 10.01.23 15:47, Bryan O'Donoghue wrote:
>> On 10/01/2023 12:44, Bryan O'Donoghue wrote:
>>> + linux-wireless
>>> On 10/01/2023 12:35, Bryan O'Donoghue wrote:
>>>> [..]
>>
>
> I'll need some time to digest that... I'll report back once I get it.

Looks like the commit
https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
has a good chance to solve the issue:

1) Queues are stopped due to PS
2) Then there is a TX attempt. But due to the (PS) queue stop
wake_tx_push_queue() aborts the queue run
3) Then we hit the bug the patch fixes: The queue is not marked as
having pending packets and thus packets on it are not transmitted.

Packets only get sent when you happen to try TX while the queue is
operational. (And then you will get all the packets sitting in the queue.)

Does that make sense? And more crucially, does the patch fix that for you?

>
>> BTW I considered implementing a wcn36xx specific wake_tx callback -
>> which maybe should be done anyway.
>>
>> I _don't_ see other drivers checking for q_stopped &
>> IEEE80211_QUEUE_STOP_REASON_PS
>>
>> Should they be ?
>>
>
> No, they should not.
>
> My take is that this is a bug in mac80211. I submitted patches to
> fix that; they have just been accepted:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> and
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
>
> Can you test if these also help here?
>
>
>
>> If they should check IEEE80211_QUEUE_STOP_REASON_PS, then right now,
>> they don't. If they shouldn't check IEEE80211_QUEUE_STOP_REASON_PS
>> then neither should the generic replacement
>> ieee80211_handle_wake_tx_queue()
>>
>> ---
>> bod
>

2023-01-10 16:08:17

by Bryan O'Donoghue

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

On 10/01/2023 15:23, Alexander Wetzel wrote:
>
> No, they should not.
>
> My take is that this is a bug in mac80211. I submitted patches to
> fix that; they have just been accepted:
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
> and
>
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>
>
> Can you test if these also help here?

np

2023-01-10 19:39:43

by Bryan O'Donoghue

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

On 10/01/2023 15:43, Alexander Wetzel wrote:
>>
>
> Looks like the commit
> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
> has a good chance to solve the issue:
>
> 1) Queues are stopped due to PS
> 2) Then there is a TX attempt. But due to the (PS) queue stop
>    wake_tx_push_queue() aborts the queue run
> 3) Then we hit the bug the patch fixes: The queue is not marked as
>    having pending packets and thus packets on it are not transmitted.
>
> Packets only get sent when you happen to try TX while the queue is
> operational. (And then you will get all the packets sitting in the queue.)
>
> Does that make sense? And more crucially, does the patch fix that for you?

Ok works for me.

Good job.

---
bod

Subject: Re: ieee80211_handle_wake_tx_queue and dynamic ps regression

[TLDR: This mail is primarily relevant for Linux kernel regression
tracking. See link in footer if these mails annoy you.]

On 10.01.23 20:35, Bryan O'Donoghue wrote:
> On 10/01/2023 15:43, Alexander Wetzel wrote:
>> Looks like the commit
>> https://patchwork.kernel.org/project/linux-wireless/patch/[email protected]/
>> has a good chance to solve the issue:
> [..]
>> Does that make sense? And more crucially, does the patch fix that for you?
>
> Ok works for me.

In that case:

#regzbot fix: wifi: mac80211: Proper mark iTXQs for resumption

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.