2017-03-22 09:15:20

by Sven Eckelmann

[permalink] [raw]
Subject: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

Hi,

I had following "simple" setup with LEDE with a single ath9k phy and multiple
vif:

* encrypted AP
* encrypted 802.11s meshpoint
* encrypted IBSS

Everything was started in the order by hostapd/wpa_supplicant (but immediately
after each other).

The problem which I've experienced was that IBSS was not able to communicate
with its link partner. The reason for that problem was that the IBSS
interface's queue was stopped (QUEUE_STATE_DRV_XOFF). This problem disappeared
when either the IBSS or meshpoint interface was changed to unencrypted (which
disables wpa_supplicant in LEDE).

It looks like ieee80211_do_open didn't start the queues via
netif_start_subqueue because local->queue_stop_reasons was for all queues set
to IEEE80211_QUEUE_STOP_REASON_OFFCHANNEL. This happened because
ieee80211_offchannel_stop_vifs was called from somewhere in the scan code at
that time and ieee80211_offchannel_return was not yet called.


This behavior seems to be introduced by 2b730daacee6 ("mac80211: don't start
new netdev queues if driver stopped"). I have therefore chosen to call it for
now a regression by this change. Especially because it is rather odd that the
commit talked about not starting the queues for AP_VLAN and 2b436312f091
("mac80211: fix queue handling crash") introduced extra code to use the old
behavior again for AP_VLAN.

But I could be completely wrong about it. It would therefore be interesting
for me to know who would be responsible to start the queues when
ieee80211_do_open rejected it for IBSS.

I am now simply using this setup with a revert of
2b436312f0919c05804fed5aa4b7f255db196e7a and
2b730daacee6c318bce7b6373c19909e36a74590.

Kind regards,
Sven


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part.

2017-03-29 12:11:13

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

On Mittwoch, 29. M?rz 2017 13:53:06 CEST Johannes Berg wrote:
[...]
> > Not sure whether removing it in ieee80211_propagate_queue_wake will
> > have other odd side effects with software queuing. Maybe Michal
> > Kazior can tell us if it is safe to remove it.
>
> No, it's the other way around.
>
> Michal's patches correctly added a test for this to
> __ieee80211_stop_queue(), the only missing thing is that this test
> should also be in ieee80211_do_open() like this:
>
> diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
> index 40813dd3301c..5bb0c5012819 100644
> --- a/net/mac80211/iface.c
> +++ b/net/mac80211/iface.c
> @@ -718,7 +718,8 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool coming_up)
> ieee80211_recalc_ps(local);
>
> if (sdata->vif.type == NL80211_IFTYPE_MONITOR ||
> - sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
> + sdata->vif.type == NL80211_IFTYPE_AP_VLAN ||
> + local->ops->wake_tx_queue) {
> /* XXX: for AP_VLAN, actually track AP queues */
> netif_tx_start_all_queues(dev);
> } else if (dev) {

Yes, this also works.

Kind regards,
Sven


Attachments:
signature.asc (833.00 B)
This is a digitally signed message part.

2017-03-29 07:49:24

by Johannes Berg

[permalink] [raw]
Subject: Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

Hi Sven,


> But I could be completely wrong about it. It would therefore be
> interesting for me to know who would be responsible to start the
> queues when ieee80211_do_open rejected it for IBSS.

Well, once ieee80211_offchannel_return() is called, that should do the
needful and end up in ieee80211_propagate_queue_wake().

Can you check what the IBSS vif's queues are (vif->hw_queue[...])?

However, I also don't understand the difference between encrypted and
unencrypted here.

johannes

2017-03-29 11:53:10

by Johannes Berg

[permalink] [raw]
Subject: Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif


>     if (local->ops->wake_tx_queue)
>      return;
>
> evaluates to true. The rest rest of the function is therefore always
> skipped for ath9k.

Ahh, yes, ok.

> Removing this is enough to fix the problem. And now you will propably
> say "hey, this is not my code". And this is the reason why I have now
> CC'ed the author of 80a83cfc434b ("mac80211: skip netdev queue
> control with software queuing"). This change in
> ieee80211_propagate_queue_wake is basically breaking 
> the (delayed) startup of the ibss netdev queue [1] when the device
> was offchan during the ieee80211_do_open of the ibss interface.
>
> Not sure whether removing it in ieee80211_propagate_queue_wake will
> have other odd side effects with software queuing. Maybe Michal
> Kazior can tell us if it is safe to remove it.

No, it's the other way around.

Michal's patches correctly added a test for this to
__ieee80211_stop_queue(), the only missing thing is that this test
should also be in ieee80211_do_open() like this:

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 40813dd3301c..5bb0c5012819 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -718,7 +718,8 @@ int ieee80211_do_open(struct wireless_dev *wdev, bool coming_up)
ieee80211_recalc_ps(local);

if (sdata->vif.type == NL80211_IFTYPE_MONITOR ||
- sdata->vif.type == NL80211_IFTYPE_AP_VLAN) {
+ sdata->vif.type == NL80211_IFTYPE_AP_VLAN ||
+ local->ops->wake_tx_queue) {
/* XXX: for AP_VLAN, actually track AP queues */
netif_tx_start_all_queues(dev);
} else if (dev) {

johannes

2017-03-29 11:07:47

by Sven Eckelmann

[permalink] [raw]
Subject: Re: [REGRESSION] mac80211: IBSS vif queue stopped when started after 11s vif

On Mittwoch, 29. M?rz 2017 09:49:21 CEST Johannes Berg wrote:
> > But I could be completely wrong about it. It would therefore be
> > interesting for me to know who would be responsible to start the
> > queues when ieee80211_do_open rejected it for IBSS.
>
> Well, once ieee80211_offchannel_return() is called, that should do the
> needful and end up in ieee80211_propagate_queue_wake().
>
> Can you check what the IBSS vif's queues are (vif->hw_queue[...])?

I've just dumped the data in ieee80211_propagate_queue_wake and checked when
the function returns. The test patch (sorry, really ugly debug printk stuff)
is attached. The most interesting part is that

if (local->ops->wake_tx_queue)
return;

evaluates to true. The rest rest of the function is therefore always skipped
for ath9k.

This was noticed when looking at the debug output:

root@lede:/# dmesg|grep ieee80211_propagate_queue_wake
[ 20.865005] ieee80211_propagate_queue_wake:248 queue 00000000
[ 20.870839] ieee80211_propagate_queue_wake:248 queue 00000001
[ 20.876661] ieee80211_propagate_queue_wake:248 queue 00000002
[ 20.882487] ieee80211_propagate_queue_wake:248 queue 00000003
[ 21.794795] ieee80211_propagate_queue_wake:248 queue 00000000
[ 21.800629] ieee80211_propagate_queue_wake:248 queue 00000001
[ 21.806452] ieee80211_propagate_queue_wake:248 queue 00000002
[ 21.812278] ieee80211_propagate_queue_wake:248 queue 00000003
[ 21.830078] ieee80211_propagate_queue_wake:248 queue 00000000
[ 21.835918] ieee80211_propagate_queue_wake:248 queue 00000001
[ 21.841740] ieee80211_propagate_queue_wake:248 queue 00000002
[ 21.847566] ieee80211_propagate_queue_wake:248 queue 00000003
[ 23.320814] ieee80211_propagate_queue_wake:248 queue 00000000
[ 23.326643] ieee80211_propagate_queue_wake:248 queue 00000001
[ 23.332469] ieee80211_propagate_queue_wake:248 queue 00000002
[ 23.338294] ieee80211_propagate_queue_wake:248 queue 00000003
[ 41.930942] ieee80211_propagate_queue_wake:248 queue 00000000
[ 41.940709] ieee80211_propagate_queue_wake:248 queue 00000002
[ 46.949087] ieee80211_propagate_queue_wake:248 queue 00000000
[ 82.999021] ieee80211_propagate_queue_wake:248 queue 00000000

Removing this is enough to fix the problem. And now you will propably say
"hey, this is not my code". And this is the reason why I have now CC'ed the
author of 80a83cfc434b ("mac80211: skip netdev queue control with software
queuing"). This change in ieee80211_propagate_queue_wake is basically breaking
the (delayed) startup of the ibss netdev queue [1] when the device was offchan
during the ieee80211_do_open of the ibss interface.

Not sure whether removing it in ieee80211_propagate_queue_wake will have other
odd side effects with software queuing. Maybe Michal Kazior can tell us if it
is safe to remove it.

> However, I also don't understand the difference between encrypted and
> unencrypted here.

My best guess is timing. LEDE is not using wpa_supplicant when encryption is
disabled.

Kind regards,
Sven

[1] https://lkml.kernel.org/r/1978424.XTv2Qph05K@bentobox


Attachments:
999-test.patch (5.11 kB)
signature.asc (833.00 B)
This is a digitally signed message part.
Download all attachments