2023-04-07 18:49:46

by James Prestwood

[permalink] [raw]
Subject: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi,

I'm having an issue with CMD_REMAIN_ON_CHANNEL taking a full second to
become ready versus CMD_FRAME (offchannel_ok=1) which is quite fast.
Under the hood it looks like both commands call
ieee80211_start_roc_work() so its curious why one would take so long.
I'm running this in UML so I shouldn't be hitting scheduling problems
(at least that's how I understand UML).

This happens during the DPP auth exchange, I can include a full log if
desired, but this is the sequence:

- Start ROC on 2417mhz
- Wait for ROC event indicating we are offchannel
- Receive DPP presence announcement from enrollee
- Send DPP auth request, request enrollee switches to 2412mhz
- Send Cancel ROC (and wait for confirmation)

- Start ROC on 2412mhz
- Oddly, receive the enrollees auth response before ROC event. So the
driver _did_ switch channels.
- ROC event comes about a second later, and enrollee has timed out

So the driver is in fact going offchannel to 2412 and even receiving a
frame. But for whatever reason it doesn't send the ROC event for a full
second. Any idea why the ROC event is so delayed here?

Thanks,
James


2023-04-10 22:52:26

by James Prestwood

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

On 4/7/23 11:47 AM, James Prestwood wrote:
> Hi,
>
> I'm having an issue with CMD_REMAIN_ON_CHANNEL taking a full second to
> become ready versus CMD_FRAME (offchannel_ok=1) which is quite fast.
> Under the hood it looks like both commands call
> ieee80211_start_roc_work() so its curious why one would take so long.
> I'm running this in UML so I shouldn't be hitting scheduling problems
> (at least that's how I understand UML).
>
> This happens during the DPP auth exchange, I can include a full log if
> desired, but this is the sequence:
>
> - Start ROC on 2417mhz
> - Wait for ROC event indicating we are offchannel
> - Receive DPP presence announcement from enrollee
> - Send DPP auth request, request enrollee switches to 2412mhz
> - Send Cancel ROC (and wait for confirmation)
>
> - Start ROC on 2412mhz
> - Oddly, receive the enrollees auth response before ROC event. So the
> driver _did_ switch channels.
> - ROC event comes about a second later, and enrollee has timed out
>
> So the driver is in fact going offchannel to 2412 and even receiving a
> frame. But for whatever reason it doesn't send the ROC event for a full
> second. Any idea why the ROC event is so delayed here?

After a lot of kernel prints I think I know whats going on, but still
somewhat confused.

It appears that sending a frame then immediately canceling the ROC
request is the issue. The kernel seems to be queuing the CMD_FRAME
rather than adding it to the ROC request (is this expected?). To
simplify the above sequence without DPP specifics:

ROC on 2417mhz
...
Send frame on 2417mhz (queued)
Cancel ROC request.
ROC request on 2412mhz (queued)

At this point the kernel waits for the first (canceled) ROC to complete,
then processes the frame request (which requires going offchannel
again). THEN its able to handle the second ROC request.

Anyways, maybe ROC is the wrong way to be doing this? It was convenient
because its much easier to set some ultimate timeout then fire off
frames as needed in that duration in addition to listening for presence
announcements... But maybe ROC is just very limited in what can be done?
and using CMD_FRAME + a duration is the only way the kernel can support
this nicely?

Thanks,
James


>
> Thanks,
> James

2023-04-24 17:38:48

by Johannes Berg

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

On Mon, 2023-04-10 at 15:49 -0700, James Prestwood wrote:
> On 4/7/23 11:47 AM, James Prestwood wrote:
> > Hi,
> >
> > I'm having an issue with CMD_REMAIN_ON_CHANNEL taking a full second to
> > become ready versus CMD_FRAME (offchannel_ok=1) which is quite fast.
> > Under the hood it looks like both commands call
> > ieee80211_start_roc_work() so its curious why one would take so long.
> > I'm running this in UML so I shouldn't be hitting scheduling problems
> > (at least that's how I understand UML).
> >
> > This happens during the DPP auth exchange, I can include a full log if
> > desired, but this is the sequence:
> >
> > - Start ROC on 2417mhz
> > - Wait for ROC event indicating we are offchannel
> > - Receive DPP presence announcement from enrollee
> > - Send DPP auth request, request enrollee switches to 2412mhz
> > - Send Cancel ROC (and wait for confirmation)
> >
> > - Start ROC on 2412mhz
> > - Oddly, receive the enrollees auth response before ROC event. So the
> > driver _did_ switch channels.
> > - ROC event comes about a second later, and enrollee has timed out
> >
> > So the driver is in fact going offchannel to 2412 and even receiving a
> > frame. But for whatever reason it doesn't send the ROC event for a full
> > second. Any idea why the ROC event is so delayed here?
>
> After a lot of kernel prints I think I know whats going on, but still
> somewhat confused.
>
> It appears that sending a frame then immediately canceling the ROC
> request is the issue. The kernel seems to be queuing the CMD_FRAME
> rather than adding it to the ROC request (is this expected?).
>

I can't tell with the information you gave - depends on the waiting
period for a response frame you ask for, I guess? With HW ROC (if you're
using iwlwifi) we cannot extend the previous period, see
ieee80211_coalesce_hw_started_roc().

> To simplify the above sequence without DPP specifics:
>
> ROC on 2417mhz
> ...
> Send frame on 2417mhz (queued)
> Cancel ROC request.
> ROC request on 2412mhz (queued)
>
> At this point the kernel waits for the first (canceled) ROC to complete,
> then processes the frame request (which requires going offchannel
> again). THEN its able to handle the second ROC request.

What are the durations asked for, and waiting for response or not?

> Anyways, maybe ROC is the wrong way to be doing this? It was convenient
> because its much easier to set some ultimate timeout then fire off
> frames as needed in that duration in addition to listening for presence
> announcements... But maybe ROC is just very limited in what can be done?
> and using CMD_FRAME + a duration is the only way the kernel can support
> this nicely?

Not sure I understand this part. ROC is fine mostly for the "wait for
some frame and send a response", but not so much suited for "send a
frame and wait for a response" part. So 3-way-handshakes are iffy with
it...

johannes

2023-04-25 17:58:12

by James Prestwood

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi Johannes,

On 4/24/23 10:36 AM, Johannes Berg wrote:
> On Mon, 2023-04-10 at 15:49 -0700, James Prestwood wrote:
>> On 4/7/23 11:47 AM, James Prestwood wrote:
>>> Hi,
>>>
>>> I'm having an issue with CMD_REMAIN_ON_CHANNEL taking a full second to
>>> become ready versus CMD_FRAME (offchannel_ok=1) which is quite fast.
>>> Under the hood it looks like both commands call
>>> ieee80211_start_roc_work() so its curious why one would take so long.
>>> I'm running this in UML so I shouldn't be hitting scheduling problems
>>> (at least that's how I understand UML).
>>>
>>> This happens during the DPP auth exchange, I can include a full log if
>>> desired, but this is the sequence:
>>>
>>> - Start ROC on 2417mhz
>>> - Wait for ROC event indicating we are offchannel
>>> - Receive DPP presence announcement from enrollee
>>> - Send DPP auth request, request enrollee switches to 2412mhz
>>> - Send Cancel ROC (and wait for confirmation)
>>>
>>> - Start ROC on 2412mhz
>>> - Oddly, receive the enrollees auth response before ROC event. So the
>>> driver _did_ switch channels.
>>> - ROC event comes about a second later, and enrollee has timed out
>>>
>>> So the driver is in fact going offchannel to 2412 and even receiving a
>>> frame. But for whatever reason it doesn't send the ROC event for a full
>>> second. Any idea why the ROC event is so delayed here?
>>
>> After a lot of kernel prints I think I know whats going on, but still
>> somewhat confused.
>>
>> It appears that sending a frame then immediately canceling the ROC
>> request is the issue. The kernel seems to be queuing the CMD_FRAME
>> rather than adding it to the ROC request (is this expected?).
>>
>
> I can't tell with the information you gave - depends on the waiting
> period for a response frame you ask for, I guess? With HW ROC (if you're
> using iwlwifi) we cannot extend the previous period, see
> ieee80211_coalesce_hw_started_roc().

This is purely virtual at the moment, but in my case its looking like it
cannot be extended either since the CMD_FRAME just queues a separate
request.

>
>> To simplify the above sequence without DPP specifics:
>>
>> ROC on 2417mhz
>> ...
>> Send frame on 2417mhz (queued)
>> Cancel ROC request.
>> ROC request on 2412mhz (queued)
>>
>> At this point the kernel waits for the first (canceled) ROC to complete,
>> then processes the frame request (which requires going offchannel
>> again). THEN its able to handle the second ROC request.
>
> What are the durations asked for, and waiting for response or not?
>
>> Anyways, maybe ROC is the wrong way to be doing this? It was convenient
>> because its much easier to set some ultimate timeout then fire off
>> frames as needed in that duration in addition to listening for presence
>> announcements... But maybe ROC is just very limited in what can be done?
>> and using CMD_FRAME + a duration is the only way the kernel can support
>> this nicely?
>
> Not sure I understand this part. ROC is fine mostly for the "wait for
> some frame and send a response", but not so much suited for "send a
> frame and wait for a response" part. So 3-way-handshakes are iffy with
> it...

Yeah, and it actually has worked great for the entire DPP procedure
using the same channel (presence -> auth request -> auth response ->
auth confirm) assuming both sides respond in a timely fashion.

The comes when changing the channel after the auth request. The auth
request gets queued separately, which then delays the ROC and we
can't/shouldn't send anything until ROC starts. The only strange thing
is we actually receive the auth response on the new channel before the
ROC for that new channel even starts. Its like the hardware and driver
aren't quite in sync.

But anyways I think its best to use ROC for presence (waiting for
announcements) but then use CMD_FRAME for the rest of the protocol.

Thanks,
James


>
> johannes

2023-04-25 18:25:32

by Johannes Berg

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi,

> > I can't tell with the information you gave - depends on the waiting
> > period for a response frame you ask for, I guess? With HW ROC (if you're
> > using iwlwifi) we cannot extend the previous period, see
> > ieee80211_coalesce_hw_started_roc().
>
> This is purely virtual at the moment, but in my case its looking like it
> cannot be extended either since the CMD_FRAME just queues a separate
> request.

Hm. Not sure then? I guess then you'd be falling into the max duration
or so?

if (!local->ops->remain_on_channel) {
/* If there's no hardware remain-on-channel, and
* doing so won't push us over the maximum r-o-c
* we allow, then we can just add the new one to
* the list and mark it as having started now.
* If it would push over the limit, don't try to
* combine with other started ones (that haven't
* been running as long) but potentially sort it
* with others that had the same fate.
*/
unsigned long now = jiffies;
u32 elapsed = jiffies_to_msecs(now - tmp->start_time);
struct wiphy *wiphy = local->hw.wiphy;
u32 max_roc = wiphy->max_remain_on_channel_duration;

if (elapsed + roc->duration > max_roc) {
combine_started = false;
continue;
}

Or maybe that logic here is broken somewhat ...

> > Not sure I understand this part. ROC is fine mostly for the "wait for
> > some frame and send a response", but not so much suited for "send a
> > frame and wait for a response" part. So 3-way-handshakes are iffy with
> > it...
>
> Yeah, and it actually has worked great for the entire DPP procedure
> using the same channel (presence -> auth request -> auth response ->
> auth confirm) assuming both sides respond in a timely fashion.
>
> The comes when changing the channel after the auth request. The auth
> request gets queued separately, which then delays the ROC and we
> can't/shouldn't send anything until ROC starts. The only strange thing
> is we actually receive the auth response on the new channel before the
> ROC for that new channel even starts. Its like the hardware and driver
> aren't quite in sync.

Did you say hwsim? That'd be weird. Though in hwsim I think you have an
additional quirk - it never really *leaves* the original channel it's
connected on, it kind of sticks around on *both* which isn't real but
some kind of simplification there. We might want to fix that eventually.
But not sure it's connected already in this case?

> But anyways I think its best to use ROC for presence (waiting for
> announcements) but then use CMD_FRAME for the rest of the protocol.

Right, that's pretty much the intent for this kind of thing. Similar in
P2P where we designed all this, really.

johannes

2023-04-25 19:21:17

by James Prestwood

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi Johannes,

On 4/25/23 11:10 AM, Johannes Berg wrote:
> Hi,
>
>>> I can't tell with the information you gave - depends on the waiting
>>> period for a response frame you ask for, I guess? With HW ROC (if you're
>>> using iwlwifi) we cannot extend the previous period, see
>>> ieee80211_coalesce_hw_started_roc().
>>
>> This is purely virtual at the moment, but in my case its looking like it
>> cannot be extended either since the CMD_FRAME just queues a separate
>> request.
>
> Hm. Not sure then? I guess then you'd be falling into the max duration
> or so?
>
> if (!local->ops->remain_on_channel) {
> /* If there's no hardware remain-on-channel, and
> * doing so won't push us over the maximum r-o-c
> * we allow, then we can just add the new one to
> * the list and mark it as having started now.
> * If it would push over the limit, don't try to
> * combine with other started ones (that haven't
> * been running as long) but potentially sort it
> * with others that had the same fate.
> */
> unsigned long now = jiffies;
> u32 elapsed = jiffies_to_msecs(now - tmp->start_time);
> struct wiphy *wiphy = local->hw.wiphy;
> u32 max_roc = wiphy->max_remain_on_channel_duration;
>
> if (elapsed + roc->duration > max_roc) {
> combine_started = false;
> continue;
> }
>
> Or maybe that logic here is broken somewhat ...

I guess my assumption was if ROC is currently active then a CMD_FRAME
(on the same channel) should just go out immediately, not be queued, and
not even bother checking the duration. (and in this case I'm not even
setting a duration for CMD_FRAME).

Does the OFFCHANNEL_TX_OK flag have any bearing on this? I found that I
have to set it even if ROC is ongoing, maybe thats another topic of
discussion.

>
>>> Not sure I understand this part. ROC is fine mostly for the "wait for
>>> some frame and send a response", but not so much suited for "send a
>>> frame and wait for a response" part. So 3-way-handshakes are iffy with
>>> it...
>>
>> Yeah, and it actually has worked great for the entire DPP procedure
>> using the same channel (presence -> auth request -> auth response ->
>> auth confirm) assuming both sides respond in a timely fashion.
>>
>> The comes when changing the channel after the auth request. The auth
>> request gets queued separately, which then delays the ROC and we
>> can't/shouldn't send anything until ROC starts. The only strange thing
>> is we actually receive the auth response on the new channel before the
>> ROC for that new channel even starts. Its like the hardware and driver
>> aren't quite in sync.
>
> Did you say hwsim? That'd be weird. Though in hwsim I think you have an
> additional quirk - it never really *leaves* the original channel it's
> connected on, it kind of sticks around on *both* which isn't real but
> some kind of simplification there. We might want to fix that eventually.
> But not sure it's connected already in this case?

Ah ok, thats probably whats happening then. And it is connected in this
case. We're connected to a BSS then initiate DPP to configure some
unprovisioned device. Start the protocol with the channel it used for
presence announcements, then tell the peer to transition to our
currently connected channel (by setting the frequency in the auth request).

>
>> But anyways I think its best to use ROC for presence (waiting for
>> announcements) but then use CMD_FRAME for the rest of the protocol.
>
> Right, that's pretty much the intent for this kind of thing. Similar in
> P2P where we designed all this, really.

Ok cool, thanks for clarifying

>
> johannes

2023-04-25 19:23:29

by Johannes Berg

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi,

> I guess my assumption was if ROC is currently active then a CMD_FRAME
> (on the same channel) should just go out immediately, not be queued, and
> not even bother checking the duration. (and in this case I'm not even
> setting a duration for CMD_FRAME).

Hm, yeah, I'd think so too? There's probably some logic to make sure it
can still go out, so it doesn't come in just at the end of the ROC, but
seems that should work?

I mean, that's also exactly what should happen for a response frame,
right? When you are using ROC to listen to requests, and then send a
response. That'd be without setting a duration, normally.

Even the code says:

/* This will handle all kinds of coalescing and immediate TX */
ret = ieee80211_start_roc_work(local, sdata, params->chan,
params->wait, cookie, skb,
IEEE80211_ROC_TYPE_MGMT_TX);


And then seems you should get into

...

/* otherwise handle queueing */

list_for_each_entry(tmp, &local->roc_list, list) {
if (tmp->chan != channel || tmp->sdata != sdata)
continue;

...
if (!local->ops->remain_on_channel) {
...
ieee80211_handle_roc_started(roc, now);

which does

static void ieee80211_handle_roc_started(struct ieee80211_roc_work *roc,
unsigned long start_time)
{
if (WARN_ON(roc->notified))
return;

roc->start_time = start_time;
roc->started = true;

if (roc->mgmt_tx_cookie) {
if (!WARN_ON(!roc->frame)) {
ieee80211_tx_skb_tid_band(roc->sdata, roc->frame, 7,
roc->chan->band);


Not sure why that wouldn't work?


> Does the OFFCHANNEL_TX_OK flag have any bearing on this? I found that I
> have to set it even if ROC is ongoing, maybe thats another topic of
> discussion.

I think you have to just always set that flag unless you're transmitting
on the operating channel.

johannes

2023-04-25 19:29:22

by James Prestwood

[permalink] [raw]
Subject: Re: CMD_REMAIN_ON_CHANNEL vs CMD_FRAME (offchannel)

Hi Johannes,

On 4/25/23 12:14 PM, James Prestwood wrote:
> Hi Johannes,
>
> On 4/25/23 11:10 AM, Johannes Berg wrote:
>> Hi,
>>
>>>> I can't tell with the information you gave - depends on the waiting
>>>> period for a response frame you ask for, I guess? With HW ROC (if
>>>> you're
>>>> using iwlwifi) we cannot extend the previous period, see
>>>> ieee80211_coalesce_hw_started_roc().
>>>
>>> This is purely virtual at the moment, but in my case its looking like it
>>> cannot be extended either since the CMD_FRAME just queues a separate
>>> request.
>>
>> Hm. Not sure then? I guess then you'd be falling into the max duration
>> or so?
>>
>>                  if (!local->ops->remain_on_channel) {
>>                          /* If there's no hardware remain-on-channel, and
>>                           * doing so won't push us over the maximum r-o-c
>>                           * we allow, then we can just add the new one to
>>                           * the list and mark it as having started now.
>>                           * If it would push over the limit, don't try to
>>                           * combine with other started ones (that haven't
>>                           * been running as long) but potentially sort it
>>                           * with others that had the same fate.
>>                           */
>>                          unsigned long now = jiffies;
>>                          u32 elapsed = jiffies_to_msecs(now -
>> tmp->start_time);
>>                          struct wiphy *wiphy = local->hw.wiphy;
>>                          u32 max_roc =
>> wiphy->max_remain_on_channel_duration;
>>
>>                          if (elapsed + roc->duration > max_roc) {
>>                                  combine_started = false;
>>                                  continue;
>>                          }
>>
>> Or maybe that logic here is broken somewhat ...
>
> I guess my assumption was if ROC is currently active then a CMD_FRAME
> (on the same channel) should just go out immediately, not be queued, and
> not even bother checking the duration. (and in this case I'm not even
> setting a duration for CMD_FRAME).
>
> Does the OFFCHANNEL_TX_OK flag have any bearing on this? I found that I
> have to set it even if ROC is ongoing, maybe thats another topic of
> discussion.
>
>>
>>>> Not sure I understand this part. ROC is fine mostly for the "wait for
>>>> some frame and send a response", but not so much suited for "send a
>>>> frame and wait for a response" part. So 3-way-handshakes are iffy with
>>>> it...
>>>
>>> Yeah, and it actually has worked great for the entire DPP procedure
>>> using the same channel (presence -> auth request -> auth response ->
>>> auth confirm) assuming both sides respond in a timely fashion.
>>>
>>> The comes when changing the channel after the auth request. The auth
>>> request gets queued separately, which then delays the ROC and we
>>> can't/shouldn't send anything until ROC starts. The only strange thing
>>> is we actually receive the auth response on the new channel before the
>>> ROC for that new channel even starts. Its like the hardware and driver
>>> aren't quite in sync.
>>
>> Did you say hwsim? That'd be weird. Though in hwsim I think you have an
>> additional quirk - it never really *leaves* the original channel it's
>> connected on, it kind of sticks around on *both* which isn't real but
>> some kind of simplification there. We might want to fix that eventually.
>> But not sure it's connected already in this case?
>
> Ah ok, thats probably whats happening then. And it is connected in this
> case. We're connected to a BSS then initiate DPP to configure some
> unprovisioned device. Start the protocol with the channel it used for
> presence announcements, then tell the peer to transition to our
> currently connected channel (by setting the frequency in the auth request).

So when you asked about being connected it got me thinking, and I went
to check and saw I was doing something very stupid. When we switch to
the new channel its the connected channel, so doing ROC is pointless
(surprised it actually allowed it). And this avoids the issue in
question entirely since there is no delay waiting for another ROC, just
cancel the last request and wait for a frame on the connected channel.

So thats my bad :) The behavior in question is still weird IMO (not
sending CMD_FRAME right away), but in any case it at least works now.

Thanks for the help getting my brain to work :)

>
>>
>>> But anyways I think its best to use ROC for presence (waiting for
>>> announcements) but then use CMD_FRAME for the rest of the protocol.
>>
>> Right, that's pretty much the intent for this kind of thing. Similar in
>> P2P where we designed all this, really.
>
> Ok cool, thanks for clarifying
>
>>
>> johannes