Hi,
We use mac80211_hwsim and our own 'hwsim' daemon to test conditions
like poor signal strength or dropped frames. For quite a while we've
noticed very poor reliability related to scanning when
HWSIM_CMD_REGISTER is used to process frames. Scan results are just
empty.
We've put in some workarounds, like registering only for tests that
absolutely need it and repeatedly scanning until the expected network
is found, but there are cases where this is not possible.
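
For reference, the rescan workaround boils down to something like the
sketch below. This is illustrative only: scan_until_found() and the fake
scan callable are hypothetical names made up for this example, not code
from our test suite or any real scanning API.

```python
def scan_until_found(scan, ssid, max_attempts=5):
    """Call scan() until ssid shows up; return the attempt number or None.

    `scan` is any callable returning a list of SSIDs (hypothetical stand-in
    for a real scan request).
    """
    for attempt in range(1, max_attempts + 1):
        results = scan()
        if ssid in results:
            return attempt
    return None


# Fake scan that comes back empty twice before the network appears,
# mimicking the empty scan results described above.
_fake_results = iter([[], [], ["TestAP"]])
attempts = scan_until_found(lambda: next(_fake_results), "TestAP")
```

It hides the symptom but makes tests slower and does nothing for tests
that need the first scan to succeed.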
I'm hoping for some ideas on how to actually fix this problem rather
than continue trying to come up with workarounds. I have tried removing
all frame processing (just an empty function) and the problem still
occurs, so simply calling HWSIM_CMD_REGISTER appears to be enough to
cause it.
I will admit this seems to happen more on slower systems, like inside a
virtual machine environment, or in tests which create more than just a
few radios (like ~5-6+) so it does seem like mac80211_hwsim/wmediumd
processing the frame is just taking too long for beacons.
Thanks,
James
> Well I should have said time-travel=inf-cpu, which is really the mode
> I'd use for testing (and we have time-travel=ext of course for use with
> multiple VMs).
>
> In this case it simulates infinite CPU speed! Thus time only passes if
> it passes *explicitly*. So a timeout of 30ms will only fire after
> something else has slept 30ms, or nothing is actually doing anything at
> all of course. The amount of time it takes the CPU to do the jump out to
> userspace/wmediumd, come back, copy the frame, etc. is all completely
> irrelevant in this case. It's just "sleep 30ms" and all the necessary
> CPU expenditure is not accounted at all.
Sounds magical :) I'll have to look into this again.
>
> johannes
On Wed, 2022-03-23 at 12:45 -0700, James Prestwood wrote:
> > Though I'm surprised the overhead and all is enough to make the jump out
> > to userspace and back take 30+ milliseconds (which is the smallest
> > possible dwell time if you have hwsim hw-scan enabled, otherwise it's
> > slightly larger).
>
> Yeah I'm surprised as well. I haven't _proven_ this is the case but it's
> really all I can think of for why scan results are missing.
Agree.
> I don't
> think hw-scan is being used, we don't set ATTR_USE_SCANCTX or
> ATTR_CHANNELS so I guess this is the best case scenario for dwell time,
> hmm.
Well in mac80211 it's HZ/33, which is about the same time.
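
To put a number on that, here is a rough sketch of what HZ/33 works out
to in milliseconds. It assumes the value is computed with plain integer
division in jiffies, and the HZ values are just the common kernel config
choices; dwell_ms() is a made-up helper for this example.

```python
def dwell_ms(hz, divisor=33):
    """Dwell time in ms for a timeout of hz // divisor jiffies."""
    jiffies = hz // divisor          # integer division, as in C
    return jiffies * 1000 / hz       # one jiffy lasts 1000 / hz ms


# Common CONFIG_HZ values (assumed here for illustration).
dwell_by_hz = {hz: dwell_ms(hz) for hz in (100, 250, 300, 1000)}
```

So it is 28-30ms depending on HZ, i.e. the same ballpark as the hw-scan
case.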
> > Though if you can run tests under UML/time-travel that would get rid
> > of this problem ;-)
>
> Yeah this has been in the back of my mind for a while since it could
> also speed stuff up by not having to wait for timeouts.
True.
> But with respect to this issue how could UML fix it? Pause time to
> allow the scheduler to catch up?
Well I should have said time-travel=inf-cpu, which is really the mode
I'd use for testing (and we have time-travel=ext of course for use with
multiple VMs).
In this case it simulates infinite CPU speed! Thus time only passes if
it passes *explicitly*. So a timeout of 30ms will only fire after
something else has slept 30ms, or nothing is actually doing anything at
all of course. The amount of time it takes the CPU to do the jump out to
userspace/wmediumd, come back, copy the frame, etc. is all completely
irrelevant in this case. It's just "sleep 30ms" and all the necessary
CPU expenditure is not accounted at all.
johannes
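
The inf-cpu behaviour described above can be pictured with a toy model
like the one below. This is only a conceptual sketch and not how UML
time-travel is implemented; VirtualClock and busy_work() are hypothetical
names invented for the example. The point is that the 30ms timer cannot
fire while CPU work is being done, only once something explicitly sleeps.

```python
import heapq


class VirtualClock:
    """Toy clock where time advances only through explicit sleep()."""

    def __init__(self):
        self.now_ms = 0
        self._timers = []   # heap of (deadline_ms, sequence, callback)
        self._seq = 0       # tie-breaker so callbacks are never compared

    def call_later(self, delay_ms, callback):
        heapq.heappush(self._timers,
                       (self.now_ms + delay_ms, self._seq, callback))
        self._seq += 1

    def sleep(self, duration_ms):
        # Advance virtual time, firing every timer that becomes due.
        target = self.now_ms + duration_ms
        while self._timers and self._timers[0][0] <= target:
            deadline, _, callback = heapq.heappop(self._timers)
            self.now_ms = deadline
            callback()
        self.now_ms = target


def busy_work():
    # Stands in for the trip to userspace and back: it burns real CPU
    # time but does not touch the virtual clock at all.
    return sum(range(100_000))


clock = VirtualClock()
fired = []
clock.call_later(30, lambda: fired.append(clock.now_ms))  # "scan dwell" timer

busy_work()
fired_before_sleep = list(fired)   # still empty: no virtual time has passed

clock.sleep(30)                    # only now can the 30ms timer expire
```

However long busy_work() takes in wall-clock terms, the timer fires at
virtual time 30ms and not before.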