2020-07-01 22:13:47

by James Prestwood

Subject: Lost beacon behavior changed as of 01afc6fed (hwsim)

Hi,

First off, everything described here is using mac80211_hwsim. I have
not tested whether any of this happens on physical hardware.

Commit 01afc6fed seems to have changed the kernel behavior with regard
to lost beacons. So much so that it completely breaks all roaming tests
for IWD and (if kept this way) will require severe changes to the
existing roaming logic we have used for quite a long time. Plus
supporting older kernels AND this new behavior is going to be quite
annoying to deal with.

Before, the kernel would only send a lost beacon CQM event when it
detected beacon loss. This allowed us to scan, find a suitable BSS to
roam to, and then roam.

Now it also sends Del Station, Deauthenticate, and Disconnect all
immediately after a lost beacon, with the disconnect reason being
DISASSOC_DUE_TO_INACTIVITY (4). We handle these extra events as we
would at any other time and fully disconnect, which prevents us from
being able to roam quickly (as well as breaking tests).
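
To illustrate the difference, here is a minimal sketch of the two event
sequences. The event strings, the State enum and the Station class are
hypothetical names made up for this example; this is not IWD's actual
code, only a toy model of the behavior described above.

```python
from enum import Enum, auto


class State(Enum):
    CONNECTED = auto()
    ROAM_SCANNING = auto()
    DISCONNECTED = auto()


class Station:
    """Toy model of a client reacting to kernel events (hypothetical)."""

    def __init__(self):
        self.state = State.CONNECTED

    def handle(self, event):
        if self.state is State.DISCONNECTED:
            return  # already disconnected, nothing left to roam from
        if event == "beacon_loss":
            # Old behavior: this was the only event, so a roam scan could start.
            self.state = State.ROAM_SCANNING
        elif event in ("del_station", "deauthenticate", "disconnect"):
            # Handled as at any other time: fully disconnect.
            self.state = State.DISCONNECTED


def run(events):
    """Feed a sequence of events to a fresh Station and return its final state."""
    sta = Station()
    for ev in events:
        sta.handle(ev)
    return sta.state
```

With only "beacon_loss" the model ends up scanning for a roam target;
with the three extra events appended it ends up disconnected before the
roam can happen.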

Looking at that commit, nothing in particular jumps out at me, but
obviously those added flags are causing something else to send these
extra events.

Was this change actually intended to cause these extra events? And if
so, why was it changed?

Thanks,
James



2020-07-30 14:10:20

by Johannes Berg

Subject: Re: Lost beacon behavior changed as of 01afc6fed (hwsim)

Hi James,

> First off, everything described here is using mac80211_hwsim. I have
> not tested whether any of this happens on physical hardware.
>
> Commit 01afc6fed seems to have changed the kernel behavior with regard
> to lost beacons. So much so that it completely breaks all roaming tests
> for IWD and (if kept this way) will require severe changes to the
> existing roaming logic we have used for quite a long time. Plus
> supporting older kernels AND this new behavior is going to be quite
> annoying to deal with.
>
> Before, the kernel would only send a lost beacon CQM event when it
> detected beacon loss. This allowed us to scan, find a suitable BSS to
> roam to, and then roam.
>
> Now it also sends Del Station, Deauthenticate, and Disconnect all
> immediately after a lost beacon, with the disconnect reason being
> DISASSOC_DUE_TO_INACTIVITY (4). We handle these extra events as we
> would at any other time and fully disconnect, which prevents us from
> being able to roam quickly (as well as breaking tests).
>
> Looking at that commit, nothing in particular jumps out at me, but
> obviously those added flags are causing something else to send these
> extra events.
>
> Was this change actually intended to cause these extra events? And if
> so, why was it changed?

I don't think that was intentional. That commit was really only meant
to enable support for *powersave*.

I suspect that the changes are actually caused by
adding REPORTS_TX_ACK_STATUS, which is in fact necessary here.


But I suspect that it could be that you're testing this in the wrong
way? From your description, it almost seems like you turn off the AP
interface, and roam after that? I'm not sure that's really realistic. If
you wanted to test the "a few beacons were lost" behaviour, then you'd
really have to lose a few beacons only (perhaps by adding something to
wmediumd?), and not drop the AP off the air entirely.

If the AP is in fact completely unreachable, then I'm pretty sure real
hardware will behave just like hwsim here, albeit perhaps a bit slower,
though not by much. And then you'd have the same issue there.

The fact that hwsim behaved differently would likely have been just a
timing thing - it didn't advertise REPORTS_TX_ACK_STATUS, so we'd wait a
bit longer until deciding that the AP really was truly gone. If the ACK
status is reported we just send a (few?) quick nullfunc(s) and decide
that very quickly. But that's independent of hwsim or real hardware.
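
As a rough illustration of that timing difference, here is a toy model.
The function name and all of the numbers are assumptions picked purely
for the example; they are not the values mac80211 actually uses.

```python
def time_until_ap_declared_gone(reports_tx_ack_status,
                                nullfunc_tries=2,
                                ack_wait_ms=10,
                                fallback_wait_ms=500):
    """Return illustrative milliseconds before the AP is considered gone.

    All defaults are made-up example values, not kernel constants.
    """
    if reports_tx_ack_status:
        # With ACK status, each unacknowledged nullfunc is known to have
        # failed almost immediately, so a few tries settle it quickly.
        return nullfunc_tries * ack_wait_ms
    # Without ACK status there is no per-frame feedback, so the only
    # option is to wait out a longer timer.
    return fallback_wait_ms
```

Under these made-up numbers, the ACK-reporting path gives up far sooner
than the fallback path, which would shrink the window in which userspace
could scan and roam.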


johannes

2020-08-10 17:19:19

by James Prestwood

Subject: Re: Lost beacon behavior changed as of 01afc6fed (hwsim)

Hi,

>
> But I suspect that it could be that you're testing this in the wrong
> way? From your description, it almost seems like you turn off the AP
> interface, and roam after that? I'm not sure that's really realistic.

Yes, you're right. I guess we just got away with this since the behavior
was different previously.

> If you wanted to test the "a few beacons were lost" behaviour, then
> you'd really have to lose a few beacons only (perhaps by adding
> something to wmediumd?), and not drop the AP off the air entirely.

Yeah, I think this is what we will have to do: block beacons
specifically (and just a few) rather than everything.
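
Something along these lines, as a minimal sketch. BeaconBlocker and
is_beacon are hypothetical names for this example, and how such a filter
would be hooked into hwsim or wmediumd is not shown. It only assumes the
standard 802.11 header layout (first frame control byte 0x80 for a
beacon, BSSID in address 3 at bytes 16-21).

```python
BEACON_FC0 = 0x80  # version 0, type 0 (management), subtype 8 (beacon)
MGMT_HDR_LEN = 24  # frame control, duration, three addresses, seq control


def is_beacon(frame: bytes) -> bool:
    """True if the frame has a full management header and is a beacon."""
    return len(frame) >= MGMT_HDR_LEN and frame[0] == BEACON_FC0


class BeaconBlocker:
    """Drop only the next `count` beacons from one BSSID (hypothetical)."""

    def __init__(self, bssid: bytes, count: int):
        self.bssid = bssid
        self.remaining = count

    def should_drop(self, frame: bytes) -> bool:
        # Address 3 (bytes 16..21) carries the BSSID in a beacon.
        if (self.remaining > 0 and is_beacon(frame)
                and frame[16:22] == self.bssid):
            self.remaining -= 1
            return True
        # Everything else (data, acks, other APs' beacons) passes through.
        return False
```

Data frames and nullfunc exchanges with the AP still get through, so
only the "a few beacons were lost" case is exercised.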

>
> If the AP is in fact completely unreachable, then I'm pretty sure real
> hardware will behave just like hwsim here, albeit perhaps a bit slower,
> though not by much. And then you'd have the same issue there.
>
> The fact that hwsim behaved differently would likely have been just a
> timing thing - it didn't advertise REPORTS_TX_ACK_STATUS, so we'd wait a
> bit longer until deciding that the AP really was truly gone. If the ACK
> status is reported we just send a (few?) quick nullfunc(s) and decide
> that very quickly. But that's independent of hwsim or real hardware.
>
>
> johannes
>

Thanks,
James