2022-10-25 20:41:22

by James Prestwood

[permalink] [raw]
Subject: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

A user reported some behavior where IWD hangs expecting another event
to come and it never does. This was due to the firmware (iwlwifi)
timing out after authentication and calling __ieee80211_disconnect
which essentially does nothing if not associated. The problem here
is userspace expects some event to come after authenticating whether
it be an association, disconnect, death etc.

This my attempt at fixing the problem. Note that the real issue
here is iwlwifi. The user can reliably reproduce this problem which
points to a firmware issue since a connection loss should be quite
rare I assume. Regardless, any bad driver/firmware could cause this
behavior since __ieee80211_disconnect only handles being associated.

I personally cannot test this, so I hacked mac80211 to call
__ieee80211_disconnect immediately after sending the authenticate
event. This effectively causes the same behavior and indeed this
patch causes a disconnect which is the behavior I would expect.

I asked about this a week ago with no response so hopefully this
gets more attention and at least gets the conversation going, even
if I went about the fix wrong.

James Prestwood (1):
wifi: mac80211: handle connection loss in __ieee80211_disconnect

net/mac80211/mlme.c | 5 +++++
1 file changed, 5 insertions(+)

--
2.34.3



2023-01-18 16:14:09

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

Oh I guess I should read the cover letter too ...

On Tue, 2022-10-25 at 13:34 -0700, James Prestwood wrote:
> A user reported some behavior where IWD hangs expecting another event
> to come and it never does. This was due to the firmware (iwlwifi)
> timing out after authentication and calling __ieee80211_disconnect
> which essentially does nothing if not associated. The problem here
> is userspace expects some event to come after authenticating whether
> it be an association, disconnect, death etc.
>

Basically I don't understand why userspace expects some event. It asked
for authentication, and you got it. That's all. I don't see userspace
asking for association, or anything else, so what would it be waiting
for?

johannes

2023-01-18 18:01:58

by James Prestwood

[permalink] [raw]
Subject: Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

On Wed, 2023-01-18 at 17:06 +0100, Johannes Berg wrote:
> Oh I guess I should read the cover letter too ...
>
> On Tue, 2022-10-25 at 13:34 -0700, James Prestwood wrote:
> > A user reported some behavior where IWD hangs expecting another
> > event
> > to come and it never does. This was due to the firmware (iwlwifi)
> > timing out after authentication and calling __ieee80211_disconnect
> > which essentially does nothing if not associated. The problem here
> > is userspace expects some event to come after authenticating
> > whether
> > it be an association, disconnect, death etc.
> >
>
> Basically I don't understand why userspace expects some event. It
> asked
> for authentication, and you got it. That's all. I don't see userspace
> asking for association, or anything else, so what would it be waiting
> for?

I should have explained this better. I dug up the old thread from the
original report and IIRC this is the sequence of events:

1. Begin reassociation to new BSS via CMD_CONNECT. This results in the
kernel sending many events to remove the current BSS in favor of the
new one:

2. Receive DEL_STATION, DEAUTHENTICATE, DISCONNECT, NEW_STATION

3. Then a CMD_AUTHENTICATE, with a success status.

4. At this point the firmware decides its not gonna continue and calls
__ieee80211_disconnect which is a no-op when not associated. We assumed
either CMD_ASSOCIATE or CMD_CONNECT will come after CMD_AUTHENTICATE,
or a CMD_DEAUTH if there was a problem.

I will say, we do see a CMD_DEL_STATION event here which we never
processed in this case. But again, I would expect a CMD_DEAUTHENTICATE
since we successfully authenticated, right?

Thanks,
James


>
> johannes


2023-01-18 18:10:25

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

On Wed, 2023-01-18 at 09:54 -0800, James Prestwood wrote:
> >
> > Basically I don't understand why userspace expects some event. It
> > asked
> > for authentication, and you got it. That's all. I don't see userspace
> > asking for association, or anything else, so what would it be waiting
> > for?
>
> I should have explained this better. I dug up the old thread from the
> original report and IIRC this is the sequence of events:
>
> 1. Begin reassociation to new BSS via CMD_CONNECT.
>

Oh, CMD_CONNECT, OK.

> This results in the
> kernel sending many events to remove the current BSS in favor of the
> new one:
>
> 2. Receive DEL_STATION, DEAUTHENTICATE, DISCONNECT, NEW_STATION

So far, so good.

> 3. Then a CMD_AUTHENTICATE, with a success status.

Sure, that's what we see.

> 4. At this point the firmware decides its not gonna continue and calls
> __ieee80211_disconnect which is a no-op when not associated.
>

This is also completely irrelevant to the state. The firmware doesn't
"decide not to continue". Something else already decided it doesn't want
to continue. This is entirely driven from the top - be it the
net/wireless/sme.c for CMD_CONNECT, or userspace for CMD_AUTHENTICATE
and CMD_ASSOCIATE.

> We assumed
> either CMD_ASSOCIATE or CMD_CONNECT will come after CMD_AUTHENTICATE,
> or a CMD_DEAUTH if there was a problem.

Yeah for CMD_CONNECT I guess that's reasonable. In fact you really
shouldn't even look at the authenticate/associate even in these cases
since they're not guaranteed (e.g. if you have a fullmac device), they
only happen as a consequence of the mac80211 implementation, not by
design.

So I think you're just looking in the wrong place - the real question is
why the association sequence in net/wireless/sme.c doesn't continue (or
abort) at this point?

The whole "firmware event" thing is completely incidental and an
implementation detail. The firmware decides nothing. All that happens
here is that the firmware decided to no longer wait for the association
to happen (but that wasn't even started), and a driver bug that
basically just prints the wrong message - it says it's expecting to be
associated but it can't actually expect that since we never even
transmitted an association request.

> I will say, we do see a CMD_DEL_STATION event here which we never
> processed in this case.

And you probably shouldn't anyway, again more of an implementation
detail, not really by design.

> But again, I would expect a CMD_DEAUTHENTICATE
> since we successfully authenticated, right?

No, you can't expect that, you could be authenticated with the AP for an
indefinite amount of time, or never hear the deauth frame (if it ever
sends one).

johannes

2023-01-18 18:48:55

by James Prestwood

[permalink] [raw]
Subject: Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

On Wed, 2023-01-18 at 19:08 +0100, Johannes Berg wrote:
> On Wed, 2023-01-18 at 09:54 -0800, James Prestwood wrote:
> > >

<snip>
>

> > 4. At this point the firmware decides its not gonna continue and
> > calls
> > __ieee80211_disconnect which is a no-op when not associated.
> >
>
> This is also completely irrelevant to the state. The firmware doesn't
> "decide not to continue". Something else already decided it doesn't
> want
> to continue. This is entirely driven from the top - be it the
> net/wireless/sme.c for CMD_CONNECT, or userspace for CMD_AUTHENTICATE
> and CMD_ASSOCIATE.

Well, we see this:

iwlwifi 0000:00:14.3: Not associated and the session protection is over
already...

So _something_ happens in the driver/firmware causing this event to be
generated, thats all I'm saying. But yes, it really doesn't matter why
but rather how mac80211 handles it.

>
> > We assumed
> > either CMD_ASSOCIATE or CMD_CONNECT will come after
> > CMD_AUTHENTICATE,
> > or a CMD_DEAUTH if there was a problem.
>
> Yeah for CMD_CONNECT I guess that's reasonable. In fact you really
> shouldn't even look at the authenticate/associate even in these cases
> since they're not guaranteed (e.g. if you have a fullmac device),
> they
> only happen as a consequence of the mac80211 implementation, not by
> design.
>
> So I think you're just looking in the wrong place - the real question
> is
> why the association sequence in net/wireless/sme.c doesn't continue
> (or
> abort) at this point?

I think I narrowed down this "why" pretty well. A connection loss event
happens between a successful authentication but before association.
Since __ieee80211_disconnect does not take this state into account the
kernel haults the reassociation and never informs userspace.

Maybe my solution/fix is incorrect, but its at least a starting point.

>
> The whole "firmware event" thing is completely incidental and an
> implementation detail. The firmware decides nothing. All that happens
> here is that the firmware decided to no longer wait for the
> association
> to happen (but that wasn't even started), and a driver bug that
> basically just prints the wrong message - it says it's expecting to
> be
> associated but it can't actually expect that since we never even
> transmitted an association request.
>
> > I will say, we do see a CMD_DEL_STATION event here which we never
> > processed in this case.
>
> And you probably shouldn't anyway, again more of an implementation
> detail, not really by design.
>
> > But again, I would expect a CMD_DEAUTHENTICATE
> > since we successfully authenticated, right?
>
> No, you can't expect that, you could be authenticated with the AP for
> an
> indefinite amount of time, or never hear the deauth frame (if it ever
> sends one).

Ok, so at least a CMD_CONNECT event right?

Maybe I'm giving nl80211 too much credit, but the
CMD_AUTH/ASSOC/CONNECT APIs have always seemed symmetric in terms of
events, in that if you issue one of these commands you will get an
event in return. So I would expect CMD_CONNECT to generate a
CMD_CONNECT event.

>
> johannes


2023-01-18 19:43:51

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH 0/1] Fix __ieee80211_disconnect when not associated

On Wed, 2023-01-18 at 10:39 -0800, James Prestwood wrote:
>
> iwlwifi 0000:00:14.3: Not associated and the session protection is over
> already...
>
> So _something_ happens in the driver/firmware causing this event to be
> generated, thats all I'm saying.
>

Yes. But that's also irrelevant.

> But yes, it really doesn't matter why
> but rather how mac80211 handles it.

No, it doesn't matter how mac80211 handles it. The fact that the driver
even tells mac80211 about it by calling the disconnect API is already a
bug - just one that doesn't matter now because mac80211 ignores it.

> > So I think you're just looking in the wrong place - the real question
> > is
> > why the association sequence in net/wireless/sme.c doesn't continue
> > (or
> > abort) at this point?
>
> I think I narrowed down this "why" pretty well.

No, you haven't.

> A connection loss event
> happens between a successful authentication but before association.

But it doesn't!

There's no "connection loss" event, it's just the driver being confused.

The flow is something like this:

* authenticate
-> driver prepare for authentication
-> send auth frame
-> get auth response
* firmware waits for association but that never happens
* driver prints a message and calls API

but since there was never even an *attempt* to associate, that whole end
of the "time event" and the message is irrelevant

If there was an attempt to associate it wouldn't even matter if it was
before or after the end of the time event, because if it already ended
the driver would just create a new one.

In fact, the whole reason we don't abort the time event after successful
authentication and schedule a new one on association is just an
optimisation - it's nicer to the firmware to just have one, and normally
we finish auth + assoc within a few hundred ms.


> Since __ieee80211_disconnect does not take this state into account the
> kernel haults the reassociation and never informs userspace.

No no - there's no "the kernel halts the reassociation" in this case.
You're confusing cause and effect. The *reason* we get this message and
the pointless/buggy/... call to __ieee80211_disconnect() is that there's
never an attempt to associate. It's not *causing* the lack thereof.

> Maybe my solution/fix is incorrect, but its at least a starting point.

I don't see it that way - something is clearly broken in that there's no
association attempt, but I still don't know what. All you've done is
created a special iwlwifi fallback path to let userspace recover from
it, not actually addressed the bug.

> > No, you can't expect that, you could be authenticated with the AP for
> > an
> > indefinite amount of time, or never hear the deauth frame (if it ever
> > sends one).
>
> Ok, so at least a CMD_CONNECT event right?
>
> Maybe I'm giving nl80211 too much credit, but the
> CMD_AUTH/ASSOC/CONNECT APIs have always seemed symmetric in terms of
> events, in that if you issue one of these commands you will get an
> event in return. So I would expect CMD_CONNECT to generate a
> CMD_CONNECT event.
>

Yes, I would also expect a CMD_CONNECT event, via
nl80211_send_connect_result() / __cfg80211_connect_result().

But something is going wrong. I think we need to look into probably
cfg80211_sme_rx_auth() - why is that not continuing the state machine?
Surely wdev->conn didn't go away, so maybe for some reason in this case
we already have wdev->conn->state == CFG80211_CONN_CONNECTED? But even
in case of reassoc, wdev->conn is freshly allocated and should be
zeroed, at least initially. But maybe some of the events mac80211
generates during the disconnect messes with the state?

The other cases in cfg80211_sme_rx_auth() seem to generate an event
already one way or the other.

johannes