Hi James,
> We use mac80211_hwsim and our own 'hwsim' daemon to test conditions
> like poor signal strength or dropped frames. For quite a while we've
> noticed very poor reliability related to scanning when
> HWSIM_CMD_REGISTER is used to process frames. Scan results are just
> empty.
>
> We've put in some workarounds like only registering for tests that
> absolutely need it and repeatedly scanning until the expected network
> is found, but there are cases where this is not possible.
>
> I'm hoping for some ideas on how to actually fix this problem rather
> than continue trying to come up with workarounds. I have tried removing
> any frame processing (just an empty function) and noticed the problem
> still occurs, basically just calling HWSIM_CMD_REGISTER causes these
> problems.
>
> I will admit this seems to happen more on slower systems, like inside a
> virtual machine environment, or in tests which create more than just a
> few radios (like ~5-6+) so it does seem like mac80211_hwsim/wmediumd
> processing the frame is just taking too long for beacons.
>
That sounds like it's just scheduling? HWSIM_CMD_REGISTER makes all the
frames go through the wmediumd (or whatever other tool you use),
including the beacons - which makes sense, but there's scheduling
overhead. Also probe requests/probe responses will go through, and we
only wait for a limited time on each channel for a response/beacon.

Though I'm surprised the overhead and all is enough to make the jump out
to userspace and back take 30+ milliseconds (which is the smallest
possible dwell time if you have hwsim hw-scan enabled, otherwise it's
slightly larger).
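
To put rough numbers on this, here is a minimal Python sketch of the timing budget. It assumes a simplified model that is not taken from the hwsim code: the probe request and the probe response are each relayed through userspace once, both relays cost the same, and nothing else adds delay. The function name and the example relay costs are made up for illustration; only the 30 ms dwell time comes from the text above.

```python
# Simplified model (an assumption, not hwsim's actual code path):
# probe request and probe response each take one trip through userspace.
DWELL_MS = 30.0  # smallest dwell time mentioned above


def probe_response_seen(relay_ms, dwell_ms=DWELL_MS):
    """Return True if the probe response arrives before the dwell ends.

    relay_ms is the hypothetical cost of relaying one frame
    kernel -> userspace -> kernel.
    """
    round_trip_ms = 2 * relay_ms  # request relay + response relay
    return round_trip_ms <= dwell_ms


if __name__ == "__main__":
    for relay_ms in (1.0, 10.0, 20.0):  # illustrative values only
        print(relay_ms, probe_response_seen(relay_ms))
```

Under that model, each relay has to stay at or below 15 ms for the response to land inside a 30 ms dwell.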

Maybe, since beacons are sent periodically, and perhaps we can assume
the overhead is always similar, then you could do passive scanning?
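
As a rough illustration of why a constant overhead would not hurt passive scanning, the Python sketch below shifts every beacon by the same relay latency and checks whether one arrives during the dwell. Everything here is an assumption for illustration: the 102.4 ms beacon interval (100 TU, a common default), the steady-state model where the AP has been beaconing indefinitely, and the function and parameter names.

```python
import math

BEACON_INTERVAL_MS = 102.4  # assumed: 100 TU, 1 TU = 1.024 ms


def beacon_seen(dwell_start_ms, dwell_ms, latency_ms,
                interval_ms=BEACON_INTERVAL_MS):
    """Return True if a delayed beacon arrives within the dwell window.

    Beacons are sent at k * interval_ms for every integer k (steady
    state) and all arrive latency_ms later (constant overhead).
    """
    # Index of the first beacon arriving at or after the dwell start.
    k = math.ceil((dwell_start_ms - latency_ms) / interval_ms)
    arrival_ms = k * interval_ms + latency_ms
    return arrival_ms <= dwell_start_ms + dwell_ms


if __name__ == "__main__":
    # A dwell longer than the beacon interval catches a beacon even
    # with a large (but constant) latency.
    print(beacon_seen(0.0, 110.0, latency_ms=500.0))
    # A 30 ms dwell can miss the beacon even with no latency at all.
    print(beacon_seen(10.0, 30.0, latency_ms=0.0))
```

In this model a constant latency only shifts the beacon phase, so a passive dwell of at least one beacon interval always catches one, while a short dwell depends on where it happens to start.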

No good answers to this, I guess ...

Though if you can run tests under UML/time-travel that would get rid of
this problem ;-)

johannes

Hi Johannes,

On Wed, 2022-03-23 at 19:50 +0100, Johannes Berg wrote:
> Hi James,
>
> > We use mac80211_hwsim and our own 'hwsim' daemon to test conditions
> > like poor signal strength or dropped frames. For quite a while we've
> > noticed very poor reliability related to scanning when
> > HWSIM_CMD_REGISTER is used to process frames. Scan results are just
> > empty.
> >
> > We've put in some workarounds like only registering for tests that
> > absolutely need it and repeatedly scanning until the expected network
> > is found, but there are cases where this is not possible.
> >
> > I'm hoping for some ideas on how to actually fix this problem rather
> > than continue trying to come up with workarounds. I have tried removing
> > any frame processing (just an empty function) and noticed the problem
> > still occurs, basically just calling HWSIM_CMD_REGISTER causes these
> > problems.
> >
> > I will admit this seems to happen more on slower systems, like inside a
> > virtual machine environment, or in tests which create more than just a
> > few radios (like ~5-6+) so it does seem like mac80211_hwsim/wmediumd
> > processing the frame is just taking too long for beacons.
> >
>
> That sounds like it's just scheduling? HWSIM_CMD_REGISTER makes all the
> frames go through the wmediumd (or whatever other tool you use),
> including the beacons - which makes sense, but there's scheduling
> overhead. Also probe requests/probe responses will go through, and we
> only wait for a limited time on each channel for a response/beacon.
>
> Though I'm surprised the overhead and all is enough to make the jump out
> to userspace and back take 30+ milliseconds (which is the smallest
> possible dwell time if you have hwsim hw-scan enabled, otherwise it's
> slightly larger).

Yeah, I'm surprised as well. I haven't _proven_ this is the case, but
it's really all I can think of for why scan results are missing. I don't
think hw-scan is being used; we don't set ATTR_USE_SCANCTX or
ATTR_CHANNELS, so I guess this is the best case scenario for dwell time,
hmm.

>
> Maybe, since beacons are sent periodically, and perhaps we can assume
> the overhead is always similar, then you could do passive scanning?
>
> No good answers to this, I guess ...
>
> Though if you can run tests under UML/time-travel that would get rid of
> this problem ;-)

Yeah, this has been in the back of my mind for a while, since it could
also speed things up by not having to wait for timeouts.

But with respect to this issue, how could UML fix it? Pause time to
allow the scheduler to catch up?

>
> johannes