2017-10-17 09:04:27

by Jesse Sung

[permalink] [raw]
Subject: Commit 0711d638 breaks mwifiex

Hi,

While working on an issue that marvell module stops connecting to AP,
bisect reveals that the issue starts to happen from commit 0711d638,
which uses wdev->ssid_len instead of wdev->current_bss to determine
if driver's .disconnect() should be called.

It happens because mwifiex_cfg80211_connect() returns -EALREADY
when it finds wdev->current_bss is valid:

if (priv->wdev.current_bss) {
[PRINT LOG]
return -EALREADY;
}

This would make cfg80211_connect() set wdev->ssid_len to 0, and thus
mwifiex_cfg80211_disconnect() won't be called by cfg80211_disconnect().

The easiest way to overcome this is

diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index 0a49b88..104edb4 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -1142,7 +1142,7 @@ int cfg80211_disconnect(struct
cfg80211_registered_device *rdev,
err = cfg80211_sme_disconnect(wdev, reason);
else if (!rdev->ops->disconnect)
cfg80211_mlme_down(rdev, dev);
- else if (wdev->ssid_len)
+ else if (wdev->ssid_len || wdev->current_bss)
err = rdev_disconnect(rdev, dev, reason);

return err;

but I'm not sure if this is a proper fix for this issue.

Thanks,
Jesse


2017-10-17 09:51:36

by Johannes Berg

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

Hi,

> While working on an issue that marvell module stops connecting to AP,
> bisect reveals that the issue starts to happen from commit 0711d638,
> which uses wdev->ssid_len instead of wdev->current_bss to determine
> if driver's .disconnect() should be called.
>
> It happens because mwifiex_cfg80211_connect() returns -EALREADY
> when it finds wdev->current_bss is valid:
>
> if (priv->wdev.current_bss) {
> [PRINT LOG]
> return -EALREADY;
> }
>
> This would make cfg80211_connect() set wdev->ssid_len to 0, and thus
> mwifiex_cfg80211_disconnect() won't be called by
> cfg80211_disconnect().

Hmm, none of this makes much sense to me right now.

Does mwifiex treat this -EALREADY as *keeping* an old connection, or
tearing it down entirely?

Because right now clearly cfg80211 assumes, on the one hand, that no
connection is kept (resetting ssid_len), but on the other hand it got
here with current_bss set - so perhaps we should reject that before in
cfg80211, rather than in mwifiex?

I think your fix is invalid because we then reset ssid_len and still
keep an old connection (current_bss) which will lead to strange nl80211
behaviour when getting interface data etc.

johannes

2017-10-17 14:10:49

by Jesse Sung

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, Oct 17, 2017 at 10:08 PM, Jesse Sung <[email protected]> wrote:
> On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg
> <[email protected]> wrote:
>>
>> On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
>> >
>> > > Not really quite sure about it yet, but that should address the
>> > > issue?
>> >
>> > A very rough test by issuing reassociate in wpa_cli:
>> > mwifiex works after reassociate - looks good
>>
>> Ok. Discussing this with Ilan, we realized that this was buggy, and
>> came up with this patch instead:
>>
>> https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>>
>> Can you also give that a spin?
>
> Issued four reassociate, the network interface is still alive. Also with this
> patch it stays with BSSID1 and I don't see any "Association request to
> the driver failed" in the process.

Correction: I can still see "Association request to the driver failed" and
switching between BSSIDs with more reassociates but I think that shouldn't
be an issue?

2017-10-17 14:08:12

by Jesse Sung

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, Oct 17, 2017 at 9:13 PM, Johannes Berg
<[email protected]> wrote:
>
> On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
> >
> > > Not really quite sure about it yet, but that should address the
> > > issue?
> >
> > A very rough test by issuing reassociate in wpa_cli:
> > mwifiex works after reassociate - looks good
>
> Ok. Discussing this with Ilan, we realized that this was buggy, and
> came up with this patch instead:
>
> https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>
> Can you also give that a spin?

Issued four reassociate, the network interface is still alive. Also with this
patch it stays with BSSID1 and I don't see any "Association request to
the driver failed" in the process.

Thanks,
Jesse

2017-10-17 10:48:24

by Johannes Berg

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:

> > Does mwifiex treat this -EALREADY as *keeping* an old connection,
> > or tearing it down entirely?
>
> From the call trace:

Well, the call trace can't really answer that :-)
Does mwifiex firmware stay connected?


> 139.451318: nl80211_get_valid_chan <-nl80211_connect
> 139.451321: cfg80211_connect <-nl80211_connect
> 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
> 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
> 139.451337: nl80211_post_doit <-genl_family_rcv_msg
> 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
> 139.451425: nl80211_disconnect <-genl_family_rcv_msg
> 139.451426: cfg80211_disconnect <-nl80211_disconnect
> 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
>
> mwifiex_cfg80211_disconnect() would be called after
> mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
> the error returned.

I think so - it's probably wpa_supplicant trying to get back to a well-
known state (of being disconnected).

> > I think your fix is invalid because we then reset ssid_len and
> > still
> > keep an old connection (current_bss) which will lead to strange
> > nl80211
> > behaviour when getting interface data etc.
>
> Since this is how it works before commit 0711d638 (use current_bss
> instead of ssid_len), so I'm guessing this would still work. But I
> agree that this may not be a proper fix...

It would probably work, but we get data inconsistencies, and at the
very least you get no SSID data back when you query the current state.

I don't see anything in nl80211 or so that would say we should accept a
connect() while already connected, so how about this?

diff --git a/net/wireless/sme.c b/net/wireless/sme.c
index b347e63d7aaa..fe0037ad1f5e 100644
--- a/net/wireless/sme.c
+++ b/net/wireless/sme.c
@@ -1042,6 +1042,9 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev,

ASSERT_WDEV_LOCK(wdev);

+ if (wdev->current_bss)
+ return -EALREADY;
+
if (WARN_ON(wdev->connect_keys)) {
kzfree(wdev->connect_keys);
wdev->connect_keys = NULL;

Not really quite sure about it yet, but that should address the issue?

johannes

2017-10-17 13:07:53

by Jesse Sung

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, Oct 17, 2017 at 6:48 PM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:
>
>> > Does mwifiex treat this -EALREADY as *keeping* an old connection,
>> > or tearing it down entirely?
>>
>> From the call trace:
>
> Well, the call trace can't really answer that :-)
> Does mwifiex firmware stay connected?

Sorry I don't know how to tell firmware's state. @@

>> 139.451318: nl80211_get_valid_chan <-nl80211_connect
>> 139.451321: cfg80211_connect <-nl80211_connect
>> 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
>> 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
>> 139.451337: nl80211_post_doit <-genl_family_rcv_msg
>> 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
>> 139.451425: nl80211_disconnect <-genl_family_rcv_msg
>> 139.451426: cfg80211_disconnect <-nl80211_disconnect
>> 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
>>
>> mwifiex_cfg80211_disconnect() would be called after
>> mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
>> the error returned.
>
> I think so - it's probably wpa_supplicant trying to get back to a well-
> known state (of being disconnected).
>
>> > I think your fix is invalid because we then reset ssid_len and
>> > still
>> > keep an old connection (current_bss) which will lead to strange
>> > nl80211
>> > behaviour when getting interface data etc.
>>
>> Since this is how it works before commit 0711d638 (use current_bss
>> instead of ssid_len), so I'm guessing this would still work. But I
>> agree that this may not be a proper fix...
>
> It would probably work, but we get data inconsistencies, and at the
> very least you get no SSID data back when you query the current state.
>
> I don't see anything in nl80211 or so that would say we should accept a
> connect() while already connected, so how about this?
>
> diff --git a/net/wireless/sme.c b/net/wireless/sme.c
> index b347e63d7aaa..fe0037ad1f5e 100644
> --- a/net/wireless/sme.c
> +++ b/net/wireless/sme.c
> @@ -1042,6 +1042,9 @@ int cfg80211_connect(struct cfg80211_registered_device *rdev,
>
> ASSERT_WDEV_LOCK(wdev);
>
> + if (wdev->current_bss)
> + return -EALREADY;
> +
> if (WARN_ON(wdev->connect_keys)) {
> kzfree(wdev->connect_keys);
> wdev->connect_keys = NULL;
>
> Not really quite sure about it yet, but that should address the issue?

A very rough test by issuing reassociate in wpa_cli:
mwifiex works after reassociate - looks good

> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Associated with <BSSID2>
<3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=]
<3>CTRL-EVENT-SCAN-STARTED
> reassociate
OK
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Associated with <BSSID2>
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>WPA: Key negotiation completed with <BSSID2> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID2> completed [id=0 id_str=]
> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
>
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>)
<3>Associated with <BSSID3>
<3>WPA: Key negotiation completed with <BSSID3> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID3> completed [id=0 id_str=]
> reassociate
OK
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID3> (SSID=<SSID> freq=<freq3>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Associated with <BSSID1>
<3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=]
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>Trying to associate with <BSSID2> (SSID=<SSID> freq=<freq2>)
<3>Association request to the driver failed
<3>CTRL-EVENT-REGDOM-CHANGE init=CORE type=WORLD
<3>CTRL-EVENT-SCAN-STARTED
<3>CTRL-EVENT-SCAN-RESULTS
<3>WPS-AP-AVAILABLE
<3>Trying to associate with <BSSID1> (SSID=<SSID> freq=<freq1>)
<3>Associated with <BSSID1>
<3>WPA: Key negotiation completed with <BSSID1> [PTK=CCMP GTK=CCMP]
<3>CTRL-EVENT-CONNECTED - Connection to <BSSID1> completed [id=0 id_str=]


Thanks,
Jesse

2017-10-17 10:18:20

by Jesse Sung

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

Hi Johannes,

On Tue, Oct 17, 2017 at 5:51 PM, Johannes Berg
<[email protected]> wrote:
> Hi,
>
>> While working on an issue that marvell module stops connecting to AP,
>> bisect reveals that the issue starts to happen from commit 0711d638,
>> which uses wdev->ssid_len instead of wdev->current_bss to determine
>> if driver's .disconnect() should be called.
>>
>> It happens because mwifiex_cfg80211_connect() returns -EALREADY
>> when it finds wdev->current_bss is valid:
>>
>> if (priv->wdev.current_bss) {
>> [PRINT LOG]
>> return -EALREADY;
>> }
>>
>> This would make cfg80211_connect() set wdev->ssid_len to 0, and thus
>> mwifiex_cfg80211_disconnect() won't be called by
>> cfg80211_disconnect().
>
> Hmm, none of this makes much sense to me right now.
>
> Does mwifiex treat this -EALREADY as *keeping* an old connection, or
> tearing it down entirely?

>From the call trace:

139.451318: nl80211_get_valid_chan <-nl80211_connect
139.451321: cfg80211_connect <-nl80211_connect
139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
139.451337: nl80211_post_doit <-genl_family_rcv_msg
139.451423: nl80211_pre_doit <-genl_family_rcv_msg
139.451425: nl80211_disconnect <-genl_family_rcv_msg
139.451426: cfg80211_disconnect <-nl80211_disconnect
139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect

mwifiex_cfg80211_disconnect() would be called after
mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by the
error returned.

> Because right now clearly cfg80211 assumes, on the one hand, that no
> connection is kept (resetting ssid_len), but on the other hand it got
> here with current_bss set - so perhaps we should reject that before in
> cfg80211, rather than in mwifiex?

Yes the inconsistency may be a problem.

> I think your fix is invalid because we then reset ssid_len and still
> keep an old connection (current_bss) which will lead to strange nl80211
> behaviour when getting interface data etc.

Since this is how it works before commit 0711d638 (use current_bss instead
of ssid_len), so I'm guessing this would still work. But I agree that this
may not be a proper fix...

Thanks,
Jesse

2017-10-17 15:25:54

by Jesse Sung

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, Oct 17, 2017 at 11:10 PM, Johannes Berg
<[email protected]> wrote:
> Hi,
>
>> > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
>> > >
>> > > Can you also give that a spin?
>
> Thanks for testing!
>
>> > Issued four reassociate, the network interface is still alive. Also
>> > with this patch it stays with BSSID1 and I don't see any
>> > "Association request to the driver failed" in the process.
>>
>> Correction: I can still see "Association request to the driver
>> failed" and switching between BSSIDs with more reassociates but I
>> think that shouldn't be an issue?
>
> Not sure what you mean by "with more reassociates"?

I mean do more "reassociate" in the wpa_cli. :)

Thanks,
Jesse

2017-10-17 13:13:46

by Johannes Berg

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On Tue, 2017-10-17 at 21:07 +0800, Jesse Sung wrote:
>
> > Not really quite sure about it yet, but that should address the
> > issue?
>
> A very rough test by issuing reassociate in wpa_cli:
> mwifiex works after reassociate - looks good

Ok. Discussing this with Ilan, we realized that this was buggy, and
came up with this patch instead:

https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt

Can you also give that a spin?

johannes

2017-10-28 21:32:21

by Arend Van Spriel

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

On 27-10-17 22:10, Johannes Berg wrote:
> Hi,
>
>> IIUC, mwifiex hasn't told the firmware to do anything at this point --
>> the -EALREADY check is practically the first thing it does within
>> connect(). So it just quits the connect() request and tries to carry on
>> as usual. It will only do something different if the upper layers tell
>> it to do so afterward (e.g., calling disconnect()).
>
> Yeah, that makes sense.
>
>> Yes, that's definitely what's happening. And it's explicitly called out
>> in the supplicant's nl80211 driver that this is intentional:
>>
>> [...]
>
> Right.
>
>> This is the main code path for supplicant commands like "Reattach",
>> which boil down to (for non SME drivers):
>>
>> wpas_request_connection()
>> ...
>> -> wpa_supplicant_connect()
>> -> wpa_supplicant_associate()
>> -> wpas_start_assoc_cb()
>> -> wpa_drv_associate()
>> -> wpa_driver_nl80211_associate()
>> -> wpa_driver_nl80211_connect()
>>
>> Now for the part I'm not so familiar with: is this really the *expected*
>> flow for full-MAC drivers in reattach, reassociate, and roaming flows?
>> All of those seem to boil down to this same connect() (and fallback to
>> disconnect()+connect() if -EALREADY) flow.
>
> We never implemented a "ROAM" command, so there's not all that much
> choice.
>
>> But it doesn't seem like all full-MAC drivers do the same thing. Some
>> seem to just blaze ahead with a connect attempt (maybe some firmwares
>> automatically interpret this for us?) and never return -EALREADY at all.
>
> Agree, some seem to just do some form of roaming on this, which really
> just means they disconnect internally - or perhaps they even try to
> roam if it's the same network?
>
>> Sorry if this is slightly off-topic, but I'm trying to understand what
>> the general expectations are here, based on my relatively narrow
>> experience with a few drivers.
>
> I don't really know either! There aren't that many direct cfg80211
> drivers that don't use mac80211, so there's not that much experience.
>
> You're probably well positioned to say what the behaviour _should_ be
> though? :-)
>
> I tend to think we should actually do the EALREADY from cfg80211, so
> that wpa_s will always be forced to go through this roundabout code
> path, but also finally implement a ROAM command, to let drivers have
> that option?
>
> Note also that we've always sort of envisioned that drivers
> implementing CONNECT would do BSS selection themselves, but I think
> there's no automatic way of doing that with wpa_s, and it may not even
> be desirable in many cases (unless you really need the power saving
> advantage) since it gets us into a situation where we have all these
> different algorithms etc.

wpa_s can issue a CONNECT without bssid (nor bssid_hint) leaving it up
to the driver to select the BSS. I am pretty sure I hit that scenario
although not sure what exact criteria are for wpa_s. Maybe it depended
on WIPHY_FLAG_SUPPORTS_FW_ROAM.

Regarding BSS selection I added bss_select feature. An effort which
actually was triggered by a chromebook project. Guess the number of
drivers implementing that can be counted on one ...finger ;-)

Regards,
Arend

2017-10-26 21:13:18

by Brian Norris

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

Hi,

I'm not sure I've followed all the problems here, but I wanted to point
some things out:

On Tue, Oct 17, 2017 at 12:48:18PM +0200, Johannes Berg wrote:
> On Tue, 2017-10-17 at 18:18 +0800, Jesse Sung wrote:
>
> > > Does mwifiex treat this -EALREADY as *keeping* an old connection,
> > > or tearing it down entirely?
> >
> > From the call trace:
>
> Well, the call trace can't really answer that :-)
> Does mwifiex firmware stay connected?

IIUC, mwifiex hasn't told the firmware to do anything at this point --
the -EALREADY check is practically the first thing it does within
connect(). So it just quits the connect() request and tries to carry on
as usual. It will only do something different if the upper layers tell
it to do so afterward (e.g., calling disconnect()).

> > 139.451318: nl80211_get_valid_chan <-nl80211_connect
> > 139.451321: cfg80211_connect <-nl80211_connect
> > 139.451322: cfg80211_oper_and_ht_capa <-cfg80211_connect
> > 139.451323: mwifiex_cfg80211_connect <-cfg80211_connect
> > 139.451337: nl80211_post_doit <-genl_family_rcv_msg
> > 139.451423: nl80211_pre_doit <-genl_family_rcv_msg
> > 139.451425: nl80211_disconnect <-genl_family_rcv_msg
> > 139.451426: cfg80211_disconnect <-nl80211_disconnect
> > 139.451430: mwifiex_cfg80211_disconnect <-cfg80211_disconnect
> >
> > mwifiex_cfg80211_disconnect() would be called after
> > mwifiex_cfg80211_connect(), though I'm not sure if it's triggered by
> > the error returned.
>
> I think so - it's probably wpa_supplicant trying to get back to a well-
> known state (of being disconnected).

Yes, that's definitely what's happening. And it's explicitly called out
in the supplicant's nl80211 driver that this is intentional:

static int wpa_driver_nl80211_connect(
struct wpa_driver_nl80211_data *drv,
struct wpa_driver_associate_params *params)
{
...
ret = wpa_driver_nl80211_try_connect(drv, params);
if (ret == -EALREADY) {
/*
* cfg80211 does not currently accept new connections if
* we are already connected. As a workaround, force
* disconnection and try again.
*/
wpa_printf(MSG_DEBUG, "nl80211: Explicitly "
"disconnecting before reassociation "
"attempt");
if (wpa_driver_nl80211_disconnect(
drv, WLAN_REASON_PREV_AUTH_NOT_VALID))
return -1;
ret = wpa_driver_nl80211_try_connect(drv, params);
}
return ret;
}

This is the main code path for supplicant commands like "Reattach",
which boil down to (for non SME drivers):

wpas_request_connection()
...
-> wpa_supplicant_connect()
-> wpa_supplicant_associate()
-> wpas_start_assoc_cb()
-> wpa_drv_associate()
-> wpa_driver_nl80211_associate()
-> wpa_driver_nl80211_connect()

Now for the part I'm not so familiar with: is this really the *expected*
flow for full-MAC drivers in reattach, reassociate, and roaming flows?
All of those seem to boil down to this same connect() (and fallback to
disconnect()+connect() if -EALREADY) flow.

But it doesn't seem like all full-MAC drivers do the same thing. Some
seem to just blaze ahead with a connect attempt (maybe some firmwares
automatically interpret this for us?) and never return -EALREADY at all.

Sorry if this is slightly off-topic, but I'm trying to understand what
the general expectations are here, based on my relatively narrow
experience with a few drivers.

Brian

2017-10-27 20:10:42

by Johannes Berg

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

Hi,

> IIUC, mwifiex hasn't told the firmware to do anything at this point --
> the -EALREADY check is practically the first thing it does within
> connect(). So it just quits the connect() request and tries to carry on
> as usual. It will only do something different if the upper layers tell
> it to do so afterward (e.g., calling disconnect()).

Yeah, that makes sense.

> Yes, that's definitely what's happening. And it's explicitly called out
> in the supplicant's nl80211 driver that this is intentional:
>
> [...]

Right.

> This is the main code path for supplicant commands like "Reattach",
> which boil down to (for non SME drivers):
>
> wpas_request_connection()
> ...
> -> wpa_supplicant_connect()
> -> wpa_supplicant_associate()
> -> wpas_start_assoc_cb()
> -> wpa_drv_associate()
> -> wpa_driver_nl80211_associate()
> -> wpa_driver_nl80211_connect()
>
> Now for the part I'm not so familiar with: is this really the *expected*
> flow for full-MAC drivers in reattach, reassociate, and roaming flows?
> All of those seem to boil down to this same connect() (and fallback to
> disconnect()+connect() if -EALREADY) flow.

We never implemented a "ROAM" command, so there's not all that much
choice.

> But it doesn't seem like all full-MAC drivers do the same thing. Some
> seem to just blaze ahead with a connect attempt (maybe some firmwares
> automatically interpret this for us?) and never return -EALREADY at all.

Agree, some seem to just do some form of roaming on this, which really
just means they disconnect internally - or perhaps they even try to
roam if it's the same network?

> Sorry if this is slightly off-topic, but I'm trying to understand what
> the general expectations are here, based on my relatively narrow
> experience with a few drivers.

I don't really know either! There aren't that many direct cfg80211
drivers that don't use mac80211, so there's not that much experience.

You're probably well positioned to say what the behaviour _should_ be
though? :-)

I tend to think we should actually do the EALREADY from cfg80211, so
that wpa_s will always be forced to go through this roundabout code
path, but also finally implement a ROAM command, to let drivers have
that option?

Note also that we've always sort of envisioned that drivers
implementing CONNECT would do BSS selection themselves, but I think
there's no automatic way of doing that with wpa_s, and it may not even
be desirable in many cases (unless you really need the power saving
advantage) since it gets us into a situation where we have all these
different algorithms etc.

johannes

2017-10-17 15:10:51

by Johannes Berg

[permalink] [raw]
Subject: Re: Commit 0711d638 breaks mwifiex

Hi,

> > > https://p.sipsolutions.net/8f9d5917a5cbe4b3.txt
> > >
> > > Can you also give that a spin?

Thanks for testing!

> > Issued four reassociate, the network interface is still alive. Also
> > with this patch it stays with BSSID1 and I don't see any
> > "Association request to the driver failed" in the process.
>
> Correction: I can still see "Association request to the driver
> failed" and switching between BSSIDs with more reassociates but I
> think that shouldn't be an issue?

Not sure what you mean by "with more reassociates"?

In any case, we suspect that there's another underlying issue in
mwifiex, which happens to tickle this problem.

Could you record a full trace with -e cfg80211 for me (and send it
privately, please compress trace.dat e.g. with xz, it compresses really
well)

Thanks,
johannes