2019-01-05 19:44:16

by Arend van Spriel

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

On 1/3/2019 4:49 PM, Jouni Malinen wrote:
> On Thu, Jan 03, 2019 at 10:38:32AM -0500, Eric Blau wrote:
>> Since upgrading to wpa_supplicant 2.7, myself and many others have hit
>> issues with wpa_supplicant failing to connect due to invalid arguments
>> being passed to the underlying kernel driver. Reverting to version 2.6
>> makes these issues go away.
>
>> kernel: WARNING: CPU: 0 PID: 16169 at
>> drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c:5130
>> brcmf_cfg80211_set_pmk+0x50/0x70 [brcmfmac]
>
> Which is this WARN_ON in the driver:
>
> /* expect using firmware supplicant for 1X */
> ifp = netdev_priv(dev);
> if (WARN_ON(ifp->vif->profile.use_fwsup != BRCMF_PROFILE_FWSUP_1X))
> return -EINVAL;

Yes. It means the firmware is not configured to use the 1X offload, so
the driver rejects the PMK setting here.
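
As a rough illustration of that guard, here is a minimal stand-alone model in C. The type and function names (model_fwsup, model_profile, model_set_pmk) are hypothetical stand-ins, not brcmfmac code; only the "reject unless configured for 1X" comparison mirrors the quoted check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's profile state. Only the
 * comparison against the "1X" value mirrors the quoted WARN_ON check. */
enum model_fwsup { MODEL_FWSUP_NONE, MODEL_FWSUP_PSK, MODEL_FWSUP_1X };

struct model_profile {
    enum model_fwsup use_fwsup;
};

/* Returns 0 when the PMK would be accepted and -EINVAL when the
 * profile was never configured for the 1X firmware supplicant. */
int model_set_pmk(const struct model_profile *profile)
{
    assert(profile != NULL);
    if (profile->use_fwsup != MODEL_FWSUP_1X)
        return -EINVAL;
    return 0;
}
```

In this model a profile left at MODEL_FWSUP_NONE gets -EINVAL, which corresponds to the failure reported above.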

>> Notice that the oops references wpa_supplicant as the offending
>> process, although maybe the firmware or driver is at fault for
>> advertising 4-way handshake offload support.
>
> That's not an oops and wpa_supplicant is not the "offending process", it
> is just the user space process in which context the driver hits this
> issue.

Well, I beg to differ but I will get to that.

>> Any ideas what the issue could be here? If there's anything else I can
>> do to help track down the problem, please let me know.
>
> That should be reported to the maintainers of the kernel driver that has
> this issue:

The issue is that the nl80211 API requires wpa_supplicant to include an
attribute in NL80211_CMD_CONNECT to indicate that the driver/firmware
should do the 1X offload, as described in the second paragraph below:

/**
* DOC: WPA/WPA2 EAPOL handshake offload
*
* By setting @NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK flag drivers
* can indicate they support offloading EAPOL handshakes for WPA/WPA2
* preshared key authentication. In %NL80211_CMD_CONNECT the preshared
* key should be specified using %NL80211_ATTR_PMK. Drivers supporting
* this offload may reject the %NL80211_CMD_CONNECT when no preshared
* key material is provided, for example when that driver does not
* support setting the temporal keys through %CMD_NEW_KEY.
*
* Similarly @NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_1X flag can be
* set by drivers indicating offload support of the PTK/GTK EAPOL
* handshakes during 802.1X authentication. In order to use the offload
* the %NL80211_CMD_CONNECT should have %NL80211_ATTR_WANT_1X_4WAY_HS
* attribute flag. Drivers supporting this offload may reject the
* %NL80211_CMD_CONNECT when the attribute flag is not present.
*
* For 802.1X the PMK or PMK-R0 are set by providing %NL80211_ATTR_PMK
* using %NL80211_CMD_SET_PMK. For offloaded FT support also
* %NL80211_ATTR_PMKR0_NAME must be provided.
*/

For testing I had modified wpa_supplicant to add the required flag in
the CONNECT command, but the change was too hacky to submit. I will
rebase those changes and clean them up.
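
To make the requirement concrete, here is a small stand-alone model in C of the decision on both sides. All names (model_akm, model_connect_req, model_build_connect, model_driver_uses_1x_offload) are hypothetical; this is neither the wpa_supplicant change mentioned above nor real nl80211 code, just a sketch of the rule in the documentation comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: none of these names come from nl80211 or
 * wpa_supplicant. */
enum model_akm { MODEL_AKM_PSK, MODEL_AKM_8021X };

struct model_connect_req {
    enum model_akm akm;
    bool want_1x_4way_hs; /* models the NL80211_ATTR_WANT_1X_4WAY_HS flag */
};

/* Userspace side: request the offload only for 802.1X and only when
 * the driver advertised support for the 1X 4-way handshake offload. */
void model_build_connect(struct model_connect_req *req, enum model_akm akm,
                         bool drv_has_1x_offload)
{
    assert(req);
    req->akm = akm;
    req->want_1x_4way_hs = (akm == MODEL_AKM_8021X) && drv_has_1x_offload;
}

/* Driver side: the 1X offload is configured only when the flag is
 * present in the connect request. */
bool model_driver_uses_1x_offload(const struct model_connect_req *req)
{
    assert(req);
    return req->want_1x_4way_hs;
}
```

In this model, a connect request without the flag leaves the offload unconfigured, so a later PMK setting would be rejected.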

However, there is more to it. When these offloads were introduced, we
discussed whether to have a PORT_AUTHORIZED event. It was decided that
passing an attribute in the CONNECT and ROAMED events would suffice, and
that is what was implemented in brcmfmac. However, over time the need
for an explicit PORT_AUTHORIZED event arose (Denis probably knows why).
wpa_supplicant now supports that event and therefore ignores the
attribute in the CONNECT and ROAMED events. The brcmfmac driver was not
changed accordingly. Patches for this are pending in linux-wireless and
are necessary to have a working connection.
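
The mismatch can be sketched with a small userspace-side state model in C. Everything here is hypothetical (model_event, model_sta, model_handle_event); it only illustrates a consumer that waits for an explicit PORT_AUTHORIZED event and ignores a per-event attribute, and is not wpa_supplicant code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event and state names, for illustration only. */
enum model_event {
    MODEL_EV_CONNECT,
    MODEL_EV_ROAMED,
    MODEL_EV_PORT_AUTHORIZED
};

struct model_sta {
    bool associated;
    bool port_authorized;
};

/* attr_authorized models the attribute carried in CONNECT/ROAMED under
 * the original agreement. This consumer deliberately ignores it and
 * relies on the explicit PORT_AUTHORIZED event instead. */
void model_handle_event(struct model_sta *sta, enum model_event ev,
                        bool attr_authorized)
{
    assert(sta);
    (void)attr_authorized;

    switch (ev) {
    case MODEL_EV_CONNECT:
    case MODEL_EV_ROAMED:
        sta->associated = true;
        sta->port_authorized = false;
        break;
    case MODEL_EV_PORT_AUTHORIZED:
        if (sta->associated)
            sta->port_authorized = true;
        break;
    }
}
```

A driver that only sets the attribute and never sends the explicit event leaves such a consumer with port_authorized stuck at false.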

Regards,
Arend


2019-01-08 17:44:06

by Denis Kenzior

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

Hi Arend,

> However, there is more to it. When these offloads were introduced, we
> discussed about having a PORT_AUTHORIZED event or not. It was decided
> passing an attribute in CONNECT and ROAMED event would suffice and that
> is what was implemented in brcmfmac. However, it seems time passed and
> the need for an explicit PORT_AUTHORIZED was there (probably Denis
> knows), which wpa_supplicant now supports thus ignoring the attribute in
> the CONNECT and ROAMED events. The brcmfmac driver was not changed
> accordingly. For this there are patches pending in linux-wireless which
> are necessary to have a working connection.
>

Coming in a bit late to this discussion, but it does raise a few points
I wouldn't mind some clarification on:

- With commit 503c1fb98ba3, the kernel effectively changed the userspace
API. So I take it that breaking userspace APIs is OK sometimes? If so,
I have lots of suggestions to make ;)

- Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?) affected
by the driver during a roam? E.g. if we're in an 802.1X network with
userspace authentication, and the driver roamed requiring a new 802.1X
auth, then in theory the RTNL mode needs to be brought back out of UP
state...

- The new API leaves a lot to be desired in terms of race conditions.
For example, how long should userspace wait for EAPoL-EAP packets to
arrive (before triggering its own EAPoL-Start for example) if a
CMD_ROAMED event comes?

- What happens if userspace does send an EAPoL-Start in the middle of an
offloaded 4-way handshake?

Regards,
-Denis

2019-01-14 20:12:45

by Arend van Spriel

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

On 1/8/2019 6:44 PM, Denis Kenzior wrote:
> Hi Arend,
>
>> However, there is more to it. When these offloads were introduced, we
>> discussed about having a PORT_AUTHORIZED event or not. It was decided
>> passing an attribute in CONNECT and ROAMED event would suffice and
>> that is what was implemented in brcmfmac. However, it seems time
>> passed and the need for an explicit PORT_AUTHORIZED was there
>> (probably Denis knows), which wpa_supplicant now supports thus
>> ignoring the attribute in the CONNECT and ROAMED events. The brcmfmac
>> driver was not changed accordingly. For this there are patches pending
>> in linux-wireless which are necessary to have a working connection.
>>
>
> Coming in a bit late to this discussion, but it does raise a few points
> I wouldn't mind some clarification on:
>
> - With commit 503c1fb98ba3, the kernel effectively changed the userspace
> API.  So I take it that breaking userspace APIs are OK sometimes? If so,
> I have lots of suggestions to make ;)

I bet you do :-p I think the rule of thumb is that such a change is
acceptable when no drivers provide the functionality behind the
user-space API and/or no user-space applications use that API.

> - Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?) affected
> by the driver during a roam?  E.g. if we're in a 802.1X network with
> userspace authentication, and driver roamed requiring a new 802.1X auth,
> then in theory the RTNL mode needs to be brought back out of UP state...

So do you expect the driver/cfg80211 to take care of that or the
supplicant? I assumed wpa_supplicant would be doing that.

> - The new API leaves a lot to be desired in terms of race conditions.
> For example, how long should userspace wait for EAPoL-EAP packets to
> arrive (before triggering its own EAPoL-Start for example) if a
> CMD_ROAMED event comes?

I think that question applies to CMD_CONNECT as well, right? Not sure if
the specs provide any guidance for that. I can dive into that, but maybe
someone like Jouni or Johannes knows. If so, let me know ;-)

> - What happens if userspace does send an EAPoL-Start in the middle of an
> offloaded 4-way handshake?

Probably those would be dropped.

Regards,
Arend

2019-01-14 21:19:00

by Denis Kenzior

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

Hi Arend,

On 01/14/2019 02:12 PM, Arend Van Spriel wrote:
> On 1/8/2019 6:44 PM, Denis Kenzior wrote:
>> Hi Arend,
>>
>>> However, there is more to it. When these offloads were introduced, we
>>> discussed about having a PORT_AUTHORIZED event or not. It was decided
>>> passing an attribute in CONNECT and ROAMED event would suffice and
>>> that is what was implemented in brcmfmac. However, it seems time
>>> passed and the need for an explicit PORT_AUTHORIZED was there
>>> (probably Denis knows), which wpa_supplicant now supports thus
>>> ignoring the attribute in the CONNECT and ROAMED events. The brcmfmac
>>> driver was not changed accordingly. For this there are patches
>>> pending in linux-wireless which are necessary to have a working
>>> connection.
>>>
>>
>> Coming in a bit late to this discussion, but it does raise a few
>> points I wouldn't mind some clarification on:
>>
>> - With commit 503c1fb98ba3, the kernel effectively changed the
>> userspace API.  So I take it that breaking userspace APIs are OK
>> sometimes? If so, I have lots of suggestions to make ;)
>
> I bet you do :-p I think the rule of thumb is that there are no drivers
> providing the functionality behind the user-space API and/or no
> user-space applications are using that API.

Maybe this is a question for Johannes as well, but define 'user-space
applications'? If that includes wpa_s, wasn't the rule of thumb broken
with that commit?

>
>> - Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?)
>> affected by the driver during a roam?  E.g. if we're in a 802.1X
>> network with userspace authentication, and driver roamed requiring a
>> new 802.1X auth, then in theory the RTNL mode needs to be brought back
>> out of UP state...
>
> So do you expect the driver/cfg80211 to take care of that or the
> supplicant? I assumed wpa_supplicant would be doing that.
>

With regular roaming where we trigger a Disassociate/Deauthenticate
(either explicitly or implicitly) first, the interface goes into dormant
mode by virtue of the carrier going down.

With a driver-initiated roam it isn't really clear whether the same is
happening and who (kernel/userspace) should be doing what. I would
actually assume the kernel is/should be turning carrier off for the
duration of the roam operation?

>> - The new API leaves a lot to be desired in terms of race conditions.
>> For example, how long should userspace wait for EAPoL-EAP packets to
>> arrive (before triggering its own EAPoL-Start for example) if a
>> CMD_ROAMED event comes?
>
> I think that question applies to CMD_CONNECT as well, right? Not sure if
> the specs provide any guidance for that. I can dive into that, but maybe
> someone like Jouni or Johannes know. If so, let me know ;-)

With CMD_CONNECT it is a bit more clear because you're most likely not
specifying a PMKID for the first time, so you expect the authentication
to happen in all cases. If the AP doesn't respond after some small
timeout, the supplicant can send its own EAPoL-Start.

With CMD_ROAMED it is less clear.
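
The CMD_CONNECT fallback described above can be sketched as a tiny decision helper in C. The function name and the millisecond parameters are hypothetical, and no particular timeout value is implied by the specs or by this thread; it is only a sketch of "wait a bit, then send EAPoL-Start if nothing arrived".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: decide whether the supplicant should send its
 * own EAPoL-Start. elapsed_ms is the time since the connect event,
 * timeout_ms is a caller-chosen grace period (an assumption, not a
 * value taken from any spec), and eap_frame_seen says whether the AP
 * has already started the exchange. */
bool model_should_send_eapol_start(unsigned int elapsed_ms,
                                   unsigned int timeout_ms,
                                   bool eap_frame_seen)
{
    assert(timeout_ms > 0); /* a zero grace period would defeat the wait */
    if (eap_frame_seen)
        return false;
    return elapsed_ms >= timeout_ms;
}
```

The helper leaves the open question untouched: after a roam it is not known whether any EAPoL-EAP frames should be expected at all, so no choice of timeout_ms is obviously right.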

>
>> - What happens if userspace does send an EAPoL-Start in the middle of
>> an offloaded 4-way handshake?
>
> Probably those would be dropped.
>

I would love to have something more definitive than 'Probably', and it
might be worth mentioning this hint in the documentation somewhere.

Regards,
-Denis

2019-01-14 23:04:29

by Arend van Spriel

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

On 1/14/2019 10:18 PM, Denis Kenzior wrote:
> Hi Arend,
>
> On 01/14/2019 02:12 PM, Arend Van Spriel wrote:
>> On 1/8/2019 6:44 PM, Denis Kenzior wrote:
>>> Hi Arend,
>>>
>>>> However, there is more to it. When these offloads were introduced,
>>>> we discussed about having a PORT_AUTHORIZED event or not. It was
>>>> decided passing an attribute in CONNECT and ROAMED event would
>>>> suffice and that is what was implemented in brcmfmac. However, it
>>>> seems time passed and the need for an explicit PORT_AUTHORIZED was
>>>> there (probably Denis knows), which wpa_supplicant now supports thus
>>>> ignoring the attribute in the CONNECT and ROAMED events. The
>>>> brcmfmac driver was not changed accordingly. For this there are
>>>> patches pending in linux-wireless which are necessary to have a
>>>> working connection.
>>>>
>>>
>>> Coming in a bit late to this discussion, but it does raise a few
>>> points I wouldn't mind some clarification on:
>>>
>>> - With commit 503c1fb98ba3, the kernel effectively changed the
>>> userspace API.  So I take it that breaking userspace APIs are OK
>>> sometimes? If so, I have lots of suggestions to make ;)
>>
>> I bet you do :-p I think the rule of thumb is that there are no
>> drivers providing the functionality behind the user-space API and/or
>> no user-space applications are using that API.
>
> Maybe this is a question for Johannes as well, but define 'user-space
> applications'?  If that includes wpa_s, wasn't the rule of thumb broken
> with that commit?

In my previous reply I wanted to add that it would be hard to prove that
no user-space applications are using the API. Not sure exactly when
things were added in wpa_s, but I suspect it was
post-commit-503c1fb98ba3 so it did not have support for the user-space
API before the commit.

>>
>>> - Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?)
>>> affected by the driver during a roam?  E.g. if we're in a 802.1X
>>> network with userspace authentication, and driver roamed requiring a
>>> new 802.1X auth, then in theory the RTNL mode needs to be brought
>>> back out of UP state...
>>
>> So do you expect the driver/cfg80211 to take care of that or the
>> supplicant? I assumed wpa_supplicant would be doing that.
>>
>
> With regular roaming where we trigger a Disassociate/Deauthenticate
> (either explicitly or implicitly) first, the interface goes into dormant
> mode by virtue of the carrier going down.
>
> With this it isn't really clear whether the same is happening and who
> (kernel/userspace) should be doing what.  I would actually assume the
> kernel is/should be turning carrier off for the duration of the roam
> operation?

On what layer do we know 802.1X re-auth is required?

>>> - The new API leaves a lot to be desired in terms of race conditions.
>>> For example, how long should userspace wait for EAPoL-EAP packets to
>>> arrive (before triggering its own EAPoL-Start for example) if a
>>> CMD_ROAMED event comes?
>>
>> I think that question applies to CMD_CONNECT as well, right? Not sure
>> if the specs provide any guidance for that. I can dive into that, but
>> maybe someone like Jouni or Johannes know. If so, let me know ;-)
>
> With CMD_CONNECT it is a bit more clear because you're most likely not
> specifying a PMKID for the first time, so you expect the authentication
> to happen in all cases.  If the AP doesn't respond after some small
> timeout, the supplicant can send its own EAPoL-Start.
>
> With CMD_ROAMED it is less clear.
>
>>
>>> - What happens if userspace does send an EAPoL-Start in the middle of
>>> an offloaded 4-way handshake?
>>
>> Probably those would be dropped.
>>
>
> I would love to have something more definitive than 'Probably', and it
> might be worth mentioning this hint in the documentation somewhere.

I was hesitant to use that word, but decided to do so simply because I
cannot speak for every driver, and even for the brcmfmac driver that I
maintain I will need to look into the firmware to be sure. I agree that
a remark about that possibility is worth adding.

Regards,
Arend

2019-01-15 13:00:27

by Johannes Berg

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7


> > Maybe this is a question for Johannes as well, but define 'user-space
> > applications'? If that includes wpa_s, wasn't the rule of thumb broken
> > with that commit?
>
> In my previous reply I wanted to add that it would be hard to prove that
> no user-space applications are using the API. Not sure exactly when
> things were added in wpa_s, but I suspect it was
> post-commit-503c1fb98ba3 so it did not have support for the user-space
> API before the commit.

I don't know about this really.

My thought at the time likely was that if there's no driver implementing
it, no userspace could've existed? Or maybe that just wasn't true, and I
got confused?

In any case, it certainly wasn't an intentional API break.

> > > > - What happens if userspace does send an EAPoL-Start in the middle of
> > > > an offloaded 4-way handshake?
> > >
> > > Probably those would be dropped.
> > >
> >
> > I would love to have something more definitive than 'Probably', and it
> > might be worth mentioning this hint in the documentation somewhere.
>
> I was hesitant to use that word, but decided to do so simply because I
> can not speak for every driver and even for the brcmfmac driver that I
> maintain I will need to look into the firmware to be sure. I agree that
> a remark of that possibility is worth adding.

I don't know if we should really cover all possible error scenarios
like that?

johannes


2019-01-15 15:55:10

by Denis Kenzior

Subject: Re: Kernel oops / WiFi connection failure with wpa_supplicant 2.7

Hi Arend,

>>>
>>>> - Is RTNL LINK_MODE / OPER_STATE status being (supposed to be?)
>>>> affected by the driver during a roam?  E.g. if we're in a 802.1X
>>>> network with userspace authentication, and driver roamed requiring a
>>>> new 802.1X auth, then in theory the RTNL mode needs to be brought
>>>> back out of UP state...
>>>
>>> So do you expect the driver/cfg80211 to take care of that or the
>>> supplicant? I assumed wpa_supplicant would be doing that.
>>>
>>
>> With regular roaming where we trigger a Disassociate/Deauthenticate
>> (either explicitly or implicitly) first, the interface goes into
>> dormant mode by virtue of the carrier going down.
>>
>> With this it isn't really clear whether the same is happening and who
>> (kernel/userspace) should be doing what.  I would actually assume the
>> kernel is/should be turning carrier off for the duration of the roam
>> operation?
>
> On what layer do we know 802.1X re-auth is required?
>

Not sure what you mean by 'layer'? If re-auth is required, then only
the supplicant has the proper info and it will handle this via EAPoL frames.

But that is beside the point. Regardless of whether a roam needs
re-auth or not, a network interface dormant notification is needed. For
example: userspace DHCP clients need to know when to renew the address.
And yes, there are weird networks out there that expect you to
re-negotiate your DHCP address on a roam. Such clients are not
integrated in any way with a supplicant and rely on rtnl.
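
A minimal model in C of why such a client depends on the notification. The names (model_operstate, model_dhcp_client, model_on_operstate) are hypothetical and the states are simplified stand-ins for the rtnl operational states; this is a sketch of a client that renews only when it observes a transition into the up state.

```c
#include <assert.h>

/* Simplified, hypothetical stand-ins for the rtnl operational states. */
enum model_operstate { MODEL_OPER_DOWN, MODEL_OPER_DORMANT, MODEL_OPER_UP };

struct model_dhcp_client {
    enum model_operstate last; /* last state reported over rtnl */
    unsigned int renewals;     /* number of lease renewals triggered */
};

/* Called for every operstate notification. The client renews its lease
 * only when the link comes (back) up from a non-up state. */
void model_on_operstate(struct model_dhcp_client *c,
                        enum model_operstate now)
{
    assert(c);
    if (c->last != MODEL_OPER_UP && now == MODEL_OPER_UP)
        c->renewals++;
    c->last = now;
}
```

If a roam never takes the interface through the dormant state, this client sees no transition and never renews, which is the case that breaks on networks expecting a fresh DHCP exchange after a roam.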

Regards,
-Denis