2013-11-10 21:52:36

by Felipe Contreras

[permalink] [raw]
Subject: [PATCH v2] mac80211: add assoc beacon timeout logic

We don't want to be waiting forever for a beacon that will never come,
just continue the association.

Signed-off-by: Felipe Contreras <[email protected]>
---

The previous version depended on some cleanup patches, plus had some unclear
rerun logic.

net/mac80211/ieee80211_i.h | 1 +
net/mac80211/mlme.c | 32 ++++++++++++++++++++++++++------
2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 611abfc..e1f858d 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -358,6 +358,7 @@ struct ieee80211_mgd_assoc_data {
const u8 *supp_rates;

unsigned long timeout;
+ unsigned long beacon_timeout;
int tries;

u16 capability;
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 86e4ad5..68f76f7 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -38,6 +38,7 @@
#define IEEE80211_ASSOC_TIMEOUT (HZ / 5)
#define IEEE80211_ASSOC_TIMEOUT_LONG (HZ / 2)
#define IEEE80211_ASSOC_TIMEOUT_SHORT (HZ / 10)
+#define IEEE80211_ASSOC_BEACON_TIMEOUT 2 * HZ
#define IEEE80211_ASSOC_MAX_TRIES 3

static int max_nullfunc_tries = 2;
@@ -3475,6 +3476,24 @@ void ieee80211_mgd_conn_tx_status(struct ieee80211_sub_if_data *sdata,
ieee80211_queue_work(&local->hw, &sdata->work);
}

+static int check_beacon(struct ieee80211_sub_if_data *sdata)
+{
+ struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
+ struct ieee80211_mgd_assoc_data *assoc_data = ifmgd->assoc_data;
+
+ if (!assoc_data->need_beacon || ifmgd->have_beacon)
+ return true;
+
+ if (time_after(jiffies, assoc_data->beacon_timeout)) {
+ sdata_info(sdata, "no beacon from %pM\n", assoc_data->bss->bssid);
+ return true;
+ }
+
+ assoc_data->timeout = TU_TO_EXP_TIME(assoc_data->bss->beacon_interval);
+ run_again(sdata, assoc_data->timeout);
+ return false;
+}
+
void ieee80211_sta_work(struct ieee80211_sub_if_data *sdata)
{
struct ieee80211_local *local = sdata->local;
@@ -3533,12 +3552,12 @@ void ieee80211_sta_work(struct ieee80211_sub_if_data *sdata)

if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started &&
time_after(jiffies, ifmgd->assoc_data->timeout)) {
- if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
- ieee80211_do_assoc(sdata)) {
- struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
-
- ieee80211_destroy_assoc_data(sdata, false);
- cfg80211_assoc_timeout(sdata->dev, bss);
+ if (check_beacon(sdata)) {
+ if (ieee80211_do_assoc(sdata)) {
+ struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
+ ieee80211_destroy_assoc_data(sdata, false);
+ cfg80211_assoc_timeout(sdata->dev, bss);
+ }
}
} else if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started)
run_again(sdata, ifmgd->assoc_data->timeout);
@@ -4335,6 +4354,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
sdata_info(sdata, "waiting for beacon from %pM\n",
ifmgd->bssid);
assoc_data->timeout = TU_TO_EXP_TIME(req->bss->beacon_interval);
+ assoc_data->beacon_timeout = jiffies + IEEE80211_ASSOC_BEACON_TIMEOUT;
assoc_data->timeout_started = true;
assoc_data->need_beacon = true;
} else if (beacon_ies) {
--
1.8.4.2+fc1



2013-11-11 16:56:58

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 10:53 AM, Felipe Contreras
<[email protected]> wrote:
> On Mon, Nov 11, 2013 at 10:41 AM, Johannes Berg
> <[email protected]> wrote:
>> On Mon, 2013-11-11 at 10:23 -0600, Felipe Contreras wrote:

>>> > This isn't really true like I said above - the kernel can only drop the
>>> > association, if userspace *insists* then it will try again and again.
>>>
>>> But it's not doing this:
>>>
>>> ieee80211_destroy_assoc_data(sdata, false);
>>> cfg80211_assoc_timeout(sdata->dev, bss);
>>>
>>> Which is what causes the association to stop for me.
>>>
>>> So where exactly in the code is the association being "dropped"?
>>
>> This does get called in my setup.
>
> Yes, because your setup is receiving beacons.
>
> Check the code:
>
> if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
> ieee80211_do_assoc(sdata)) {
> struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
>
> ieee80211_destroy_assoc_data(sdata, false);
> cfg80211_assoc_timeout(sdata->dev, bss);
> }
>
> If there's no beacon, cfg80211_assoc_timeout() is not called.
>
> I'm sure if you don't call ieee80211_rx_mgmt_beacon() at all you will
> see the same behavior I see.

My bad, actually the code that is not being called is:

cfg80211_unlink_bss(local->hw.wiphy, assoc_data->bss);

In ieee80211_do_assoc().

--
Felipe Contreras

2013-11-27 06:44:39

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Thu, Nov 14, 2013 at 4:44 AM, Felipe Contreras
<[email protected]> wrote:
> On Wed, Nov 13, 2013 at 12:30 PM, Johannes Berg
> <[email protected]> wrote:

>> In any case, I'm not really comfortable connecting to an AP that we
>> never ever receive beacons from.
>
> Well, we are *already* doing that for every driver, except the ones
> that do IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC, which are few.

Since you didn't reply I'm going to arrive to the obvious conclusion;
we *already* connect to APs even if we don't receive any beacons, my
patch simply makes it consistent for the drivers that set
IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC.

Therefore all your reasons to not apply the patch have been dismissed.

--
Felipe Contreras

2013-11-11 18:07:00

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 11:00 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2013-11-11 at 10:53 -0600, Felipe Contreras wrote:

>> > Like I said before - trying to work with an AP without beacons at all is
>> > really bad, we shouldn't be doing it.
>>
>> Why not? For all intents and purposes my system is not receiving any
>> beacons, and I don't see any problems.
>
> The not receiving part is a bug. I think you're probably receiving
> beacons once associated though?

Nope. Never.

>> What would you prefer? That nothing works at all?
>
> Yes, that'd be much safer.

How exactly?

>> > We might not properly react to
>> > radar events, and other things, for example.
>>
>> So? I don't know what that means, but it can't be worst than not being
>> able to connect to the Internet whatsoever at all.
>
> It can make you break the law.

How?

I'm reading this document[1], and if that's what you are referring to,
then for starters it only applies to the master mode, my patch changes
the behavior only on station mode.

Moreover, if continuing the association without beacons has a legal a
problem, that problem would exist for drivers that don't have the
IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC flag, wouldn't it? How exactly
would trying to associate with need_beacon break the law, but not if
!need_beacon?

[1] http://wireless.kernel.org/en/developers/DFS

--
Felipe Contreras

2013-11-11 10:59:42

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 3:08 AM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2013-11-10 at 15:45 -0600, Felipe Contreras wrote:
>> We don't want to be waiting forever for a beacon that will never come,
>> just continue the association.
>
> This makes no sense, IMO.
>
> If there really are no beacons at all, then nothing can work with the AP
> -- things like HT, regulatory/radar, powersave, multiple virtual
> interfaces and many others require beacons.

Well the AP is sending beacons, but they seem to be corrupted,
although the corruption often seems to happen in a place that is not
so important.

No other device at this house seems to have a problem with that, even
the Intel driver in Windows doesn't have a problem, it's just the
driver in Linux.

However, if I apply this patch, I don't notice any issue, it
associates and works fine. Maybe there's some subtle issues with
features I don't personally use, or perhaps there's the occasional
disconnection (although that could be due to something else), but
that's light years away from not associating at all.

I'd say between a) some features not working and b) nothing working at
all, a) is preferred.

If you think it's better that nothing works at all, then wouldn't it
make sense to time out and return an error? Currently we just keep
trying to associate *forever*.

> If the AP is sending beacons but the device isn't receiving them, then
> it's a driver bug and mac80211 shouldn't work around it.

I agree, but I can't seem to convince Intel guys of that. The firmware
is dropping the corrupt beacon frames (although not always), so
there's nothing the driver can do afterwards.

But even if there were not beacons at all (corrupt or otherwise), I
still think waiting *forever* in a loop is not ideal, a) is preferred;
not having all the features, but still somehow work (from my point of
view it's more than somewhat).

--
Felipe Contreras

2013-11-11 16:41:14

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, 2013-11-11 at 10:23 -0600, Felipe Contreras wrote:

> The driver is not receiving it at all. I already debugged this:
>
> http://article.gmane.org/gmane.linux.kernel.wireless.general/115429

Hmm, ok. I pretty much didn't read that thread since some others were
jumping in.

> However, I noticed that once in a very long time, sometimes it does
> receive the corrupted frame and the association continues, and the
> driver code detects it's a corrupted beacon frame.

So how does it treat the corruption?

> > The firmware still
> > shouldn't be filtering anything since it doesn't really look at the
> > beacon information (or maybe it filters based on the DS IE? I'm not
> > entirely sure)
>
> That's what I thought, but I don't see it at all (only in monitor
> mode, and in ad-hoc).

Yes, that part is odd - that's really the root cause.

I didn't quickly find in the threads what device and firmware you were
using, mind identifying it (again)?

> Nope, it keeps trying forever.
>
> Oct 13 14:33:15 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
> Oct 13 14:33:15 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
> Oct 13 14:33:15 nysa kernel: wlan0: authenticated
> Oct 13 14:33:15 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
> Oct 13 14:33:18 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
> Oct 13 14:33:18 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
> Oct 13 14:33:18 nysa kernel: wlan0: authenticated
> Oct 13 14:33:18 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
> Oct 13 14:33:22 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
> Oct 13 14:33:22 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
> Oct 13 14:33:22 nysa kernel: wlan0: authenticated
> Oct 13 14:33:22 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
> ...

I see the same behaviour - but it's the supplicant's doing, it is indeed
getting the event that the AP connection failed (timed out):

wlan0: Event ASSOC_TIMED_OUT (15) received


> > This isn't really true like I said above - the kernel can only drop the
> > association, if userspace *insists* then it will try again and again.
>
> But it's not doing this:
>
> ieee80211_destroy_assoc_data(sdata, false);
> cfg80211_assoc_timeout(sdata->dev, bss);
>
> Which is what causes the association to stop for me.
>
> So where exactly in the code is the association being "dropped"?

This does get called in my setup.

> I would rather fix the problem at the two levels, so even if the
> firmware passes the corrupt frames correctly, the driver would still
> somewhat work when there's no beacon frames at all.

Like I said before - trying to work with an AP without beacons at all is
really bad, we shouldn't be doing it. We might not properly react to
radar events, and other things, for example.

johannes


2013-11-11 17:00:35

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, 2013-11-11 at 10:53 -0600, Felipe Contreras wrote:

> > I see the same behaviour - but it's the supplicant's doing, it is indeed
> > getting the event that the AP connection failed (timed out):
> >
> > wlan0: Event ASSOC_TIMED_OUT (15) received
>
> Not in my setup.

Well, dunno then. Different kernel versions? This clearly happens for
me.

> >> > This isn't really true like I said above - the kernel can only drop the
> >> > association, if userspace *insists* then it will try again and again.
> >>
> >> But it's not doing this:
> >>
> >> ieee80211_destroy_assoc_data(sdata, false);
> >> cfg80211_assoc_timeout(sdata->dev, bss);
> >>
> >> Which is what causes the association to stop for me.
> >>
> >> So where exactly in the code is the association being "dropped"?
> >
> > This does get called in my setup.
>
> Yes, because your setup is receiving beacons.

No ... I tested on hwsim, making it ask for dtim-before-assoc, and
short-circuiting the beacon-TX routing. It can't have been seeing
beacons.

> Check the code:
>
> if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
> ieee80211_do_assoc(sdata)) {
> struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
>
> ieee80211_destroy_assoc_data(sdata, false);
> cfg80211_assoc_timeout(sdata->dev, bss);
> }
>
> If there's no beacon, cfg80211_assoc_timeout() is not called.

Yes it is.

"need_beacon && !have_beacon:

means - I wanted the beacon but didn't get it at the timeout.

> I'm sure if you don't call ieee80211_rx_mgmt_beacon() at all you will
> see the same behavior I see.

I'm sure I won't :)

> > Like I said before - trying to work with an AP without beacons at all is
> > really bad, we shouldn't be doing it.
>
> Why not? For all intents and purposes my system is not receiving any
> beacons, and I don't see any problems.

The not receiving part is a bug. I think you're probably receiving
beacons once associated though?

> What would you prefer? That nothing works at all?

Yes, that'd be much safer.

> > We might not properly react to
> > radar events, and other things, for example.
>
> So? I don't know what that means, but it can't be worst than not being
> able to connect to the Internet whatsoever at all.

It can make you break the law.

johannes


2013-11-11 15:43:56

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, 2013-11-11 at 04:59 -0600, Felipe Contreras wrote:

> Well the AP is sending beacons, but they seem to be corrupted,
> although the corruption often seems to happen in a place that is not
> so important.

Indeed - the beacon you sent to me in private is damaged somewhere
towards the end of the frame. Are we actually receiving it but ignoring
it because it doesn't have the data we need? The firmware still
shouldn't be filtering anything since it doesn't really look at the
beacon information (or maybe it filters based on the DS IE? I'm not
entirely sure)

> No other device at this house seems to have a problem with that, even
> the Intel driver in Windows doesn't have a problem, it's just the
> driver in Linux.

The windows driver has some slightly different behaviour - it pretends
to be associated towards the firmware before that's actually true, iirc.

> However, if I apply this patch, I don't notice any issue, it
> associates and works fine. Maybe there's some subtle issues with
> features I don't personally use, or perhaps there's the occasional
> disconnection (although that could be due to something else), but
> that's light years away from not associating at all.
>
> I'd say between a) some features not working and b) nothing working at
> all, a) is preferred.
>
> If you think it's better that nothing works at all, then wouldn't it
> make sense to time out and return an error? Currently we just keep
> trying to associate *forever*.

That's wpa_supplicant/userspace behaviour. The kernel will just drop the
connection.

> > If the AP is sending beacons but the device isn't receiving them, then
> > it's a driver bug and mac80211 shouldn't work around it.
>
> I agree, but I can't seem to convince Intel guys of that. The firmware
> is dropping the corrupt beacon frames (although not always), so
> there's nothing the driver can do afterwards.

You realize I work for the same team in Intel as well? :)

> But even if there were not beacons at all (corrupt or otherwise), I
> still think waiting *forever* in a loop is not ideal, a) is preferred;
> not having all the features, but still somehow work (from my point of
> view it's more than somewhat).

This isn't really true like I said above - the kernel can only drop the
association, if userspace *insists* then it will try again and again.

I'd much rather try to get to the bottom of this. Maybe the firmware is
dropping the beacon because the DS IE is broken? Or are we receiving it
but ignoring it because it's broken?

Unfortunately, there's only so much we can do to work around broken APs.

johannes


2013-11-14 10:44:51

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Wed, Nov 13, 2013 at 12:30 PM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2013-11-11 at 12:06 -0600, Felipe Contreras wrote:
>
>> > The not receiving part is a bug. I think you're probably receiving
>> > beacons once associated though?
>>
>> Nope. Never.
>
> That's odd, firmware bug? But it's still odd since windows works?
>
>> Moreover, if continuing the association without beacons has a legal a
>> problem, that problem would exist for drivers that don't have the
>> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC flag, wouldn't it? How exactly
>> would trying to associate with need_beacon break the law, but not if
>> !need_beacon?
>
> Well, it's actually somewhat unlikely you'd break the law, although
> possible, since eventually you'd get disconnected from the AP anyway?

I get disconnected from the AP due to other bugs, you are making the
assumption that it's because of the lack of beacons.

I saw this the last time I experienced a disconnect:

Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: fail to flush all
tx fifo queues Q 2
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Current SW read_ptr
213 write_ptr 214
Nov 13 16:29:48 nysa kernel: iwl data: 00000000: 00 00 20 00 00 00 00
00 00 00 00 00 00 00 00 00 .. .............
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(0) = 0x8000302e
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(1) = 0x801020d5
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(2) = 0x80201016
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(3) = 0x803000fb
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(4) = 0x00000000
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(5) = 0x00000000
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(6) = 0x00000000
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: FH TRBs(7) = 0x00709010
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 0 is active and
mapped to fifo 3 ra_tid 0x0000 [252,252]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 1 is active and
mapped to fifo 2 ra_tid 0x0000 [23,23]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 2 is active and
mapped to fifo 1 ra_tid 0x0000 [213,214]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 3 is active and
mapped to fifo 0 ra_tid 0x0000 [47,47]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 4 is active and
mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 5 is active and
mapped to fifo 4 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 6 is active and
mapped to fifo 2 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 7 is active and
mapped to fifo 5 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 8 is active and
mapped to fifo 4 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 9 is active and
mapped to fifo 7 ra_tid 0x0000 [17,17]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 10 is active and
mapped to fifo 5 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 11 is inactive
and mapped to fifo 1 ra_tid 0x0000 [163,163]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 12 is inactive
and mapped to fifo 3 ra_tid 0x0006 [236,236]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 13 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 14 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 15 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 16 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 17 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 18 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]
Nov 13 16:29:48 nysa kernel: iwlwifi 0000:02:00.0: Q 19 is inactive
and mapped to fifo 0 ra_tid 0x0000 [0,0]

> In any case - your argumentation above isn't true because every device
> listens for beacons, and if none are received then it should eventually
> disconnect. That should be true even with your hack here.

Should it? I don't see that happening. Where exactly in the code is that check?

> In any case, I'm not really comfortable connecting to an AP that we
> never ever receive beacons from.

Well, we are *already* doing that for every driver, except the ones
that do IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC, which are few.

Cheers.

--
Felipe Contreras

2013-11-11 16:54:00

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 10:41 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2013-11-11 at 10:23 -0600, Felipe Contreras wrote:

>> However, I noticed that once in a very long time, sometimes it does
>> receive the corrupted frame and the association continues, and the
>> driver code detects it's a corrupted beacon frame.
>
> So how does it treat the corruption?

wlan0: associating with AP with corrupt beacon

>> > The firmware still
>> > shouldn't be filtering anything since it doesn't really look at the
>> > beacon information (or maybe it filters based on the DS IE? I'm not
>> > entirely sure)
>>
>> That's what I thought, but I don't see it at all (only in monitor
>> mode, and in ad-hoc).
>
> Yes, that part is odd - that's really the root cause.
>
> I didn't quickly find in the threads what device and firmware you were
> using, mind identifying it (again)?

Intel Corporation Centrino Advanced-N 6235 (rev 24)
iwlwifi 0000:02:00.0: loaded firmware version 18.168.6.1 op_mode iwldvm

>> Nope, it keeps trying forever.
>>
>> Oct 13 14:33:15 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
>> Oct 13 14:33:15 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
>> Oct 13 14:33:15 nysa kernel: wlan0: authenticated
>> Oct 13 14:33:15 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
>> Oct 13 14:33:18 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
>> Oct 13 14:33:18 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
>> Oct 13 14:33:18 nysa kernel: wlan0: authenticated
>> Oct 13 14:33:18 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
>> Oct 13 14:33:22 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
>> Oct 13 14:33:22 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
>> Oct 13 14:33:22 nysa kernel: wlan0: authenticated
>> Oct 13 14:33:22 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
>> ...
>
> I see the same behaviour - but it's the supplicant's doing, it is indeed
> getting the event that the AP connection failed (timed out):
>
> wlan0: Event ASSOC_TIMED_OUT (15) received

Not in my setup.

>> > This isn't really true like I said above - the kernel can only drop the
>> > association, if userspace *insists* then it will try again and again.
>>
>> But it's not doing this:
>>
>> ieee80211_destroy_assoc_data(sdata, false);
>> cfg80211_assoc_timeout(sdata->dev, bss);
>>
>> Which is what causes the association to stop for me.
>>
>> So where exactly in the code is the association being "dropped"?
>
> This does get called in my setup.

Yes, because your setup is receiving beacons.

Check the code:

if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
ieee80211_do_assoc(sdata)) {
struct cfg80211_bss *bss = ifmgd->assoc_data->bss;

ieee80211_destroy_assoc_data(sdata, false);
cfg80211_assoc_timeout(sdata->dev, bss);
}

If there's no beacon, cfg80211_assoc_timeout() is not called.

I'm sure if you don't call ieee80211_rx_mgmt_beacon() at all you will
see the same behavior I see.

>> I would rather fix the problem at the two levels, so even if the
>> firmware passes the corrupt frames correctly, the driver would still
>> somewhat work when there's no beacon frames at all.
>
> Like I said before - trying to work with an AP without beacons at all is
> really bad, we shouldn't be doing it.

Why not? For all intents and purposes my system is not receiving any
beacons, and I don't see any problems.

What would you prefer? That nothing works at all?

> We might not properly react to
> radar events, and other things, for example.

So? I don't know what that means, but it can't be worst than not being
able to connect to the Internet whatsoever at all.

Cheers.

--
Felipe Contreras

2013-11-11 06:41:30

by Krishna Chaitanya

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 3:15 AM, Felipe Contreras
<[email protected]> wrote:
> We don't want to be waiting forever for a beacon that will never come,
> just continue the association.
>
> Signed-off-by: Felipe Contreras <[email protected]>
> ---
>
> The previous version depended on some cleanup patches, plus had some unclear
> rerun logic.
>
> net/mac80211/ieee80211_i.h | 1 +
> net/mac80211/mlme.c | 32 ++++++++++++++++++++++++++------
> 2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 611abfc..e1f858d 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -358,6 +358,7 @@ struct ieee80211_mgd_assoc_data {
> const u8 *supp_rates;
>
> unsigned long timeout;
> + unsigned long beacon_timeout;
> int tries;
>
> u16 capability;
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index 86e4ad5..68f76f7 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -38,6 +38,7 @@
> #define IEEE80211_ASSOC_TIMEOUT (HZ / 5)
> #define IEEE80211_ASSOC_TIMEOUT_LONG (HZ / 2)
> #define IEEE80211_ASSOC_TIMEOUT_SHORT (HZ / 10)
> +#define IEEE80211_ASSOC_BEACON_TIMEOUT 2 * HZ

Don't you think its too long? 3-4 beacon intervals should suffice.

> #define IEEE80211_ASSOC_MAX_TRIES 3
>
> static int max_nullfunc_tries = 2;
> @@ -3475,6 +3476,24 @@ void ieee80211_mgd_conn_tx_status(struct ieee80211_sub_if_data *sdata,
> ieee80211_queue_work(&local->hw, &sdata->work);
> }
>
> +static int check_beacon(struct ieee80211_sub_if_data *sdata)
> +{
> + struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
> + struct ieee80211_mgd_assoc_data *assoc_data = ifmgd->assoc_data;
> +
> + if (!assoc_data->need_beacon || ifmgd->have_beacon)
> + return true;
> +
> + if (time_after(jiffies, assoc_data->beacon_timeout)) {
> + sdata_info(sdata, "no beacon from %pM\n", assoc_data->bss->bssid);
> + return true;
> + }
> +
> + assoc_data->timeout = TU_TO_EXP_TIME(assoc_data->bss->beacon_interval);
> + run_again(sdata, assoc_data->timeout);
> + return false;
> +}
> +
> void ieee80211_sta_work(struct ieee80211_sub_if_data *sdata)
> {
> struct ieee80211_local *local = sdata->local;
> @@ -3533,12 +3552,12 @@ void ieee80211_sta_work(struct ieee80211_sub_if_data *sdata)
>
> if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started &&
> time_after(jiffies, ifmgd->assoc_data->timeout)) {
> - if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
> - ieee80211_do_assoc(sdata)) {
> - struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
> -
> - ieee80211_destroy_assoc_data(sdata, false);
> - cfg80211_assoc_timeout(sdata->dev, bss);
> + if (check_beacon(sdata)) {
> + if (ieee80211_do_assoc(sdata)) {
> + struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
> + ieee80211_destroy_assoc_data(sdata, false);
> + cfg80211_assoc_timeout(sdata->dev, bss);
> + }
> }
> } else if (ifmgd->assoc_data && ifmgd->assoc_data->timeout_started)
> run_again(sdata, ifmgd->assoc_data->timeout);
> @@ -4335,6 +4354,7 @@ int ieee80211_mgd_assoc(struct ieee80211_sub_if_data *sdata,
> sdata_info(sdata, "waiting for beacon from %pM\n",
> ifmgd->bssid);
> assoc_data->timeout = TU_TO_EXP_TIME(req->bss->beacon_interval);
> + assoc_data->beacon_timeout = jiffies + IEEE80211_ASSOC_BEACON_TIMEOUT;
> assoc_data->timeout_started = true;
> assoc_data->need_beacon = true;
> } else if (beacon_ies) {
> --
> 1.8.4.2+fc1

2013-11-11 17:01:15

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, 2013-11-11 at 10:56 -0600, Felipe Contreras wrote:
> On Mon, Nov 11, 2013 at 10:53 AM, Felipe Contreras
> <[email protected]> wrote:
> > On Mon, Nov 11, 2013 at 10:41 AM, Johannes Berg
> > <[email protected]> wrote:
> >> On Mon, 2013-11-11 at 10:23 -0600, Felipe Contreras wrote:
>
> >>> > This isn't really true like I said above - the kernel can only drop the
> >>> > association, if userspace *insists* then it will try again and again.
> >>>
> >>> But it's not doing this:
> >>>
> >>> ieee80211_destroy_assoc_data(sdata, false);
> >>> cfg80211_assoc_timeout(sdata->dev, bss);
> >>>
> >>> Which is what causes the association to stop for me.
> >>>
> >>> So where exactly in the code is the association being "dropped"?
> >>
> >> This does get called in my setup.
> >
> > Yes, because your setup is receiving beacons.
> >
> > Check the code:
> >
> > if ((ifmgd->assoc_data->need_beacon && !ifmgd->have_beacon) ||
> > ieee80211_do_assoc(sdata)) {
> > struct cfg80211_bss *bss = ifmgd->assoc_data->bss;
> >
> > ieee80211_destroy_assoc_data(sdata, false);
> > cfg80211_assoc_timeout(sdata->dev, bss);
> > }
> >
> > If there's no beacon, cfg80211_assoc_timeout() is not called.
> >
> > I'm sure if you don't call ieee80211_rx_mgmt_beacon() at all you will
> > see the same behavior I see.
>
> My bad, actually the code that is not being called is:
>
> cfg80211_unlink_bss(local->hw.wiphy, assoc_data->bss);
>
> In ieee80211_do_assoc().

That's not really interesting though, it just deletes the scan entry. If
it was deleted, then the supplicant would just scan again and probably
retry the connection.

johannes



2013-11-11 16:23:06

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 9:43 AM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2013-11-11 at 04:59 -0600, Felipe Contreras wrote:
>
>> Well the AP is sending beacons, but they seem to be corrupted,
>> although the corruption often seems to happen in a place that is not
>> so important.
>
> Indeed - the beacon you sent to me in private is damaged somewhere
> towards the end of the frame. Are we actually receiving it but ignoring
> it because it doesn't have the data we need?

The driver is not receiving it at all. I already debugged this:

http://article.gmane.org/gmane.linux.kernel.wireless.general/115429

However, I noticed that once in a very long time, sometimes it does
receive the corrupted frame and the association continues, and the
driver code detects it's a corrupted beacon frame.

> The firmware still
> shouldn't be filtering anything since it doesn't really look at the
> beacon information (or maybe it filters based on the DS IE? I'm not
> entirely sure)

That's what I thought, but I don't see it at all (only in monitor
mode, and in ad-hoc).

>> However, if I apply this patch, I don't notice any issue, it
>> associates and works fine. Maybe there's some subtle issues with
>> features I don't personally use, or perhaps there's the occasional
>> disconnection (although that could be due to something else), but
>> that's light years away from not associating at all.
>>
>> I'd say between a) some features not working and b) nothing working at
>> all, a) is preferred.
>>
>> If you think it's better that nothing works at all, then wouldn't it
>> make sense to time out and return an error? Currently we just keep
>> trying to associate *forever*.
>
> That's wpa_supplicant/userspace behaviour. The kernel will just drop the
> connection.

Nope, it keeps trying forever.

Oct 13 14:33:15 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
Oct 13 14:33:15 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
Oct 13 14:33:15 nysa kernel: wlan0: authenticated
Oct 13 14:33:15 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
Oct 13 14:33:18 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
Oct 13 14:33:18 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
Oct 13 14:33:18 nysa kernel: wlan0: authenticated
Oct 13 14:33:18 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
Oct 13 14:33:22 nysa kernel: wlan0: authenticate with e0:1d:3b:46:82:a0
Oct 13 14:33:22 nysa kernel: wlan0: send auth to e0:1d:3b:46:82:a0 (try 1/3)
Oct 13 14:33:22 nysa kernel: wlan0: authenticated
Oct 13 14:33:22 nysa kernel: wlan0: waiting for beacon from e0:1d:3b:46:82:a0
...

>> > If the AP is sending beacons but the device isn't receiving them, then
>> > it's a driver bug and mac80211 shouldn't work around it.
>>
>> I agree, but I can't seem to convince Intel guys of that. The firmware
>> is dropping the corrupt beacon frames (although not always), so
>> there's nothing the driver can do afterwards.
>
> You realize I work for the same team in Intel as well? :)

Now I do.

>> But even if there were not beacons at all (corrupt or otherwise), I
>> still think waiting *forever* in a loop is not ideal, a) is preferred;
>> not having all the features, but still somehow work (from my point of
>> view it's more than somewhat).
>
> This isn't really true like I said above - the kernel can only drop the
> association, if userspace *insists* then it will try again and again.

But it's not doing this:

ieee80211_destroy_assoc_data(sdata, false);
cfg80211_assoc_timeout(sdata->dev, bss);

Which is what causes the association to stop for me.

So where exactly in the code is the association being "dropped"?

> I'd much rather try to get to the bottom of this. Maybe the firmware is
> dropping the beacon because the DS IE is broken? Or are we receiving it
> but ignoring it because it's broken?

It's not the latter.

I would rather fix the problem at the two levels, so even if the
firmware passes the corrupt frames correctly, the driver would still
somewhat work when there's no beacon frames at all.

> Unfortunately, there's only so much we can do to work around broken APs.

Indeed, but 'so much' for this AP is really nothing, while with my
patch it's quite a lot.

--
Felipe Contreras

2013-11-11 11:01:13

by Felipe Contreras

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, Nov 11, 2013 at 12:41 AM, Krishna Chaitanya
<[email protected]> wrote:
> On Mon, Nov 11, 2013 at 3:15 AM, Felipe Contreras
> <[email protected]> wrote:
>> We don't want to be waiting forever for a beacon that will never come,
>> just continue the association.
>>
>> Signed-off-by: Felipe Contreras <[email protected]>
>> ---
>>
>> The previous version depended on some cleanup patches, plus had some unclear
>> rerun logic.
>>
>> net/mac80211/ieee80211_i.h | 1 +
>> net/mac80211/mlme.c | 32 ++++++++++++++++++++++++++------
>> 2 files changed, 27 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>> index 611abfc..e1f858d 100644
>> --- a/net/mac80211/ieee80211_i.h
>> +++ b/net/mac80211/ieee80211_i.h
>> @@ -358,6 +358,7 @@ struct ieee80211_mgd_assoc_data {
>> const u8 *supp_rates;
>>
>> unsigned long timeout;
>> + unsigned long beacon_timeout;
>> int tries;
>>
>> u16 capability;
>> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
>> index 86e4ad5..68f76f7 100644
>> --- a/net/mac80211/mlme.c
>> +++ b/net/mac80211/mlme.c
>> @@ -38,6 +38,7 @@
>> #define IEEE80211_ASSOC_TIMEOUT (HZ / 5)
>> #define IEEE80211_ASSOC_TIMEOUT_LONG (HZ / 2)
>> #define IEEE80211_ASSOC_TIMEOUT_SHORT (HZ / 10)
>> +#define IEEE80211_ASSOC_BEACON_TIMEOUT 2 * HZ
>
> Don't you think its too long? 3-4 beacon intervals should suffice.

I don't have an opinion, if you think that's better, that's fine by me.

--
Felipe Contreras

2013-11-13 18:30:30

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Mon, 2013-11-11 at 12:06 -0600, Felipe Contreras wrote:

> > The not receiving part is a bug. I think you're probably receiving
> > beacons once associated though?
>
> Nope. Never.

That's odd, firmware bug? But it's still odd since windows works?

> Moreover, if continuing the association without beacons has a legal a
> problem, that problem would exist for drivers that don't have the
> IEEE80211_HW_NEED_DTIM_BEFORE_ASSOC flag, wouldn't it? How exactly
> would trying to associate with need_beacon break the law, but not if
> !need_beacon?

Well, it's actually somewhat unlikely you'd break the law, although
possible, since eventually you'd get disconnected from the AP anyway?

In any case - your argumentation above isn't true because every device
listens for beacons, and if none are received then it should eventually
disconnect. That should be true even with your hack here.

In any case, I'm not really comfortable connecting to an AP that we
never ever receive beacons from. I can try to investigate why this
doesn't happen on windows and what we should be doing though.

johannes


2013-11-11 09:09:02

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: add assoc beacon timeout logic

On Sun, 2013-11-10 at 15:45 -0600, Felipe Contreras wrote:
> We don't want to be waiting forever for a beacon that will never come,
> just continue the association.

This makes no sense, IMO.

If there really are no beacons at all, then nothing can work with the AP
-- things like HT, regulatory/radar, powersave, multiple virtual
interfaces and many others require beacons.

If the AP is sending beacons but the device isn't receiving them, then
it's a driver bug and mac80211 shouldn't work around it.

johannes