2019-10-28 20:09:24

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot

On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
> "session" at the remote station's request, but the head_seq_num
> (the sequence number the receiver expects to receive) isn't reset.
>
> Spotted on a pair of AR9580 in IBSS mode.
>
> Signed-off-by: Krzysztof Halasa <[email protected]>
>
> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
> index 4d1c335e06e5..67733bd61297 100644
> --- a/net/mac80211/agg-rx.c
> +++ b/net/mac80211/agg-rx.c
> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
> */
> rcu_read_lock();
> tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
> - if (tid_rx && tid_rx->timeout == timeout)
> + if (tid_rx && tid_rx->timeout == timeout) {
> + tid_rx->ssn = start_seq_num;
> + tid_rx->head_seq_num = start_seq_num;
> status = WLAN_STATUS_SUCCESS;

This is wrong, this is the case of *updating an existing session*, we
must not reset the head SN then.

I think you just got very lucky (or unlucky) to have the same dialog
token, because we start from 0 - maybe we should initialize it to a
random value to flush out such issues.

Really what I think probably happened is that one of your stations lost
the connection to the other, and didn't tell it about it in any way - so
the other kept all the status alive.

I suspect to make all this work well we need to not only have the fixes
I made recently to actually send and parse deauth frames, but also to
even send an auth and reset the state when we receive that, so if we
move out of range and even the deauth frame is lost, we can still reset
properly.

In any case, this is not the right approach - we need to handle the
"lost connection" case better I suspect, but since you don't say what
really happened I don't really know that that's what you're seeing.

johannes


2019-10-29 08:43:42

by Koen Vandeputte

[permalink] [raw]
Subject: Re: [PATCH v2] 802.11n IBSS: wlan0 stops receiving packets due to aggregation after sender reboot


On 28.10.19 13:21, Johannes Berg wrote:
> On Fri, 2019-10-25 at 12:21 +0200, Krzysztof Hałasa wrote:
>> Fix a bug where the mac80211 RX aggregation code sets a new aggregation
>> "session" at the remote station's request, but the head_seq_num
>> (the sequence number the receiver expects to receive) isn't reset.
>>
>> Spotted on a pair of AR9580 in IBSS mode.
>>
>> Signed-off-by: Krzysztof Halasa <[email protected]>
>>
>> diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
>> index 4d1c335e06e5..67733bd61297 100644
>> --- a/net/mac80211/agg-rx.c
>> +++ b/net/mac80211/agg-rx.c
>> @@ -354,10 +354,13 @@ void ___ieee80211_start_rx_ba_session(struct sta_info *sta,
>> */
>> rcu_read_lock();
>> tid_rx = rcu_dereference(sta->ampdu_mlme.tid_rx[tid]);
>> - if (tid_rx && tid_rx->timeout == timeout)
>> + if (tid_rx && tid_rx->timeout == timeout) {
>> + tid_rx->ssn = start_seq_num;
>> + tid_rx->head_seq_num = start_seq_num;
>> status = WLAN_STATUS_SUCCESS;
> This is wrong, this is the case of *updating an existing session*, we
> must not reset the head SN then.
>
> I think you just got very lucky (or unlucky) to have the same dialog
> token, because we start from 0 - maybe we should initialize it to a
> random value to flush out such issues.
>
> Really what I think probably happened is that one of your stations lost
> the connection to the other, and didn't tell it about it in any way - so
> the other kept all the status alive.
>
> I suspect to make all this work well we need to not only have the fixes
> I made recently to actually send and parse deauth frames, but also to
> even send an auth and reset the state when we receive that, so if we
> move out of range and even the deauth frame is lost, we can still reset
> properly.
>
> In any case, this is not the right approach - we need to handle the
> "lost connection" case better I suspect, but since you don't say what
> really happened I don't really know that that's what you're seeing.
>
> johannes

Hi all,

I can confirm the issue as I'm also seeing this sometimes in the field here.

Sometimes when a devices goes out of range and then re-enters,
the link refuses to "come up", as in rx looks to be "stuck" without any
reports in system log or locking issues (lockdep enabled)

I have dozens of devices installed offshore (802.11n based), both on
static and moving assets,
which cover from short (250m) up to very long distances (~35km)

So .. while there is some momentum for this issue,
I'm more than happy to provide extensive testing should fixes be posted
regarding IBSS in general.

Regards,

Koen