2011-02-04 12:25:31

by Vivek Natarajan

[permalink] [raw]
Subject: [PATCH v2] mac80211: Fix a race on enabling power save.

There is a race on sending a data frame before the tx completion
of nullfunc frame for enabling power save. As the data quickly
follows the nullfunc frame, the AP thinks that the station is out
of power save and continues to send the frames. Whereas in the
station, the nullfunc ack will be processed after the tx completion
of data frame and mac80211 goes to powersave. Thus the power
save state mismatch between the station and the AP causes some
data loss and some applications fail because of that. This patch
fixes this issue.

Signed-off-by: Vivek Natarajan <[email protected]>
---
net/mac80211/ieee80211_i.h | 1 +
net/mac80211/mlme.c | 8 ++++++--
net/mac80211/tx.c | 8 ++++++++
3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 533fd32..6ad97f6 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -346,6 +346,7 @@ enum ieee80211_sta_flags {
IEEE80211_STA_UAPSD_ENABLED = BIT(7),
IEEE80211_STA_NULLFUNC_ACKED = BIT(8),
IEEE80211_STA_RESET_SIGNAL_AVE = BIT(9),
+ IEEE80211_STA_PS_PENDING = BIT(10),
};

struct ieee80211_if_managed {
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index e059b3a..45f736e 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
return;

if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
- (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
+ (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
+ ifmgd->flags |= IEEE80211_STA_PS_PENDING;
ieee80211_send_nullfunc(local, sdata, 1);
+ }

if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
(local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
- (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
+ ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
+ ifmgd->flags & IEEE80211_STA_PS_PENDING)) {
ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
+ ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
local->hw.conf.flags |= IEEE80211_CONF_PS;
ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
}
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 8fbbc7a..e1c2256 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
{
struct ieee80211_local *local = tx->local;
struct ieee80211_if_managed *ifmgd;
+ struct ieee80211_hdr *hdr;

/* driver doesn't support power save */
if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
@@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
&& skb_get_queue_mapping(tx->skb) == 0)
return TX_CONTINUE;

+ hdr = (struct ieee80211_hdr *)tx->skb->data;
+
+ if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
+ ieee80211_has_pm(hdr->frame_control)) &&
+ (ifmgd->flags & IEEE80211_STA_PS_PENDING))
+ ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
+
if (local->hw.conf.flags & IEEE80211_CONF_PS) {
ieee80211_stop_queues_by_reason(&local->hw,
IEEE80211_QUEUE_STOP_REASON_PS);
--
1.6.3.3



2011-02-04 13:07:32

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, 2011-02-04 at 14:05 +0100, Johannes Berg wrote:
> On Fri, 2011-02-04 at 17:55 +0530, Vivek Natarajan wrote:
>
> > + if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
> > + ieee80211_has_pm(hdr->frame_control)) &&
> > + (ifmgd->flags & IEEE80211_STA_PS_PENDING))
> > + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> > +
>
> That's still just as broken before -- you can't assume ifmgd is valid
> here.

Err, never mind, I see you moved it.

johannes


2011-02-04 14:08:29

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, 2011-02-04 at 18:58 +0530, Vivek Natarajan wrote:
> On Fri, Feb 4, 2011 at 6:42 PM, Johannes Berg <[email protected]> wrote:
> > On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
> >> > diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> >> > index e059b3a..45f736e 100644
> >> > --- a/net/mac80211/mlme.c
> >> > +++ b/net/mac80211/mlme.c
> >> > @@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
> >> > return;
> >> >
> >> > if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
> >> > - (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
> >> > + (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
> >> > + ifmgd->flags |= IEEE80211_STA_PS_PENDING;
> >> > ieee80211_send_nullfunc(local, sdata, 1);
> >> > + }
> >> >
> >> > if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
> >> > (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
> >> > - (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
> >> > + ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
> >> > + ifmgd->flags & IEEE80211_STA_PS_PENDING)) {
> >> > ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
> >> > + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> >> > local->hw.conf.flags |= IEEE80211_CONF_PS;
> >> > ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
> >> > }
> >> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> >> > index 8fbbc7a..e1c2256 100644
> >> > --- a/net/mac80211/tx.c
> >> > +++ b/net/mac80211/tx.c
> >> > @@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> >> > {
> >> > struct ieee80211_local *local = tx->local;
> >> > struct ieee80211_if_managed *ifmgd;
> >> > + struct ieee80211_hdr *hdr;
> >> >
> >> > /* driver doesn't support power save */
> >> > if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
> >> > @@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> >> > && skb_get_queue_mapping(tx->skb) == 0)
> >> > return TX_CONTINUE;
> >> >
> >> > + hdr = (struct ieee80211_hdr *)tx->skb->data;
> >> > +
> >> > + if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
> >> > + ieee80211_has_pm(hdr->frame_control)) &&
> >> > + (ifmgd->flags & IEEE80211_STA_PS_PENDING))
> >> > + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> >> > +
> >> > if (local->hw.conf.flags & IEEE80211_CONF_PS) {
> >>
> >> I don't see how this patch helps anything. Should the last line I quoted
> >> be replaced instead by checking if PS was requested? We used to not wait
> >> for the ACK -- so waiting for the ACK broke this.
> >
> > Ok maybe I see how this helps -- but I don't think it's race-free. When
> > the PS-pending flag is cleared here, the code above that checks it might
> > already have passed and be in the driver callback or so.
>
> When it is in the driver callback, IEEE80211_CONF_PS would have been
> set and when this is set, ieee80211_tx_h_dynamic_ps will disable PS
> and there wont be any discrepancy in power save states between AP and
> the station.

Indeed, but the trace still exists between checking PS_PENDING and
setting CONF_PS.

johannes


2011-02-15 14:29:01

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, Feb 4, 2011 at 7:58 PM, Vivek Natarajan <[email protected]> wrote:
> On Fri, Feb 4, 2011 at 7:38 PM, Johannes Berg <[email protected]> wrote:
>> On Fri, 2011-02-04 at 18:58 +0530, Vivek Natarajan wrote:
>>> On Fri, Feb 4, 2011 at 6:42 PM, Johannes Berg <[email protected]> wrote:
>>> > On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
>>> > Ok maybe I see how this helps -- but I don't think it's race-free. When
>>> > the PS-pending flag is cleared here, the code above that checks it might
>>> > already have passed and be in the driver callback or so.
>>>
>>> When it is in the driver callback, IEEE80211_CONF_PS would have been
>>> set and when this is set, ieee80211_tx_h_dynamic_ps will disable PS
>>> and there wont be any discrepancy in power save states between AP and
>>> the station.
>>
>> Indeed, but the trace still exists between checking PS_PENDING and
>> setting CONF_PS.
>
> Agreed. :)
> I will try this out and test a bit:
>>
>> Maybe the subif queues should be stopped, then flush, then tx nullfunc,
>> then stop all queues to configure the HW or something like that?
>>

The subif queues are stopped on receiving ACK_STATUS and the wake up
is done after setting CONF_PS in the newer version that I had sent.
This may help in preventing the above mentioned race. And the
PS_PENDING flag will take care of the actual race in between queuing
the nullfunc frame and receiving ack_status for that.

Vivek.

2011-02-04 14:28:26

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, Feb 4, 2011 at 7:38 PM, Johannes Berg <[email protected]> wrote:
> On Fri, 2011-02-04 at 18:58 +0530, Vivek Natarajan wrote:
>> On Fri, Feb 4, 2011 at 6:42 PM, Johannes Berg <[email protected]> wrote:
>> > On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
>> > Ok maybe I see how this helps -- but I don't think it's race-free. When
>> > the PS-pending flag is cleared here, the code above that checks it might
>> > already have passed and be in the driver callback or so.
>>
>> When it is in the driver callback, IEEE80211_CONF_PS would have been
>> set and when this is set, ieee80211_tx_h_dynamic_ps will disable PS
>> and there wont be any discrepancy in power save states between AP and
>> the station.
>
> Indeed, but the trace still exists between checking PS_PENDING and
> setting CONF_PS.

Agreed. :)
I will try this out and test a bit:
>
> Maybe the subif queues should be stopped, then flush, then tx nullfunc,
> then stop all queues to configure the HW or something like that?
>
> johannes
>

Thanks
Vivek.

2011-02-16 12:28:27

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Wed, Feb 16, 2011 at 4:41 PM, Johannes Berg
<[email protected]> wrote:
> On Wed, 2011-02-16 at 15:01 +0530, Vivek Natarajan wrote:
>
>> >> And this is how I understand this:
>> >> stop queues, then send nullfunc, then do a flush to get the status and
>> >> wake queues again irrespective of acked or not(but set the ACK_STATUS
>> >> correspondingly). If tx() routine of nullfunc fails, then also wake
>> >> the queues.
>> >
>> > Yes, but since the flush is there, the latter (tx fails, ...) won't be
>> > needed.
>>
>> If queues are stopped before sending nullfunc, it is added in the
>> pending frames queue and the flush after nullfunc is ineffective. So,
>> is it okay to bypass this queue_stop check with REASON_PS if it is a
>> nullfunc frame? Any more ideas will be helpful.
>
> Ah, but I wasn't thinking of stopping the device queues, just the sdata
> queues, and the nullfunc doesn't go across those.

Thanks. It works. I will test it for some more time and send in the patch.

Vivek.

2011-02-16 09:31:38

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Tue, Feb 15, 2011 at 7:39 PM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2011-02-15 at 19:34 +0530, Vivek Natarajan wrote:
>> On Tue, Feb 15, 2011 at 6:14 PM, Johannes Berg
>> <[email protected]> wrote:
>> >
>> > I've recently been thinking about this -- I'm thinking that maybe we
>> > should change this behaviour. Right now the tx() routine basically
>> > always returns OK (except in at76) and I suppose instead it could return
>> > whether the frame was queued up successfully...
>> >
>> >> After some time interval, once again stop queues on receiving ack for
>> >> nullfunc, configure the hw and then wake up queues. So, during the
>> >> above time interval, there is a race of queuing a frame to the hw. I
>> >> have tested this and the issue is quickly reproducible.
>> >
>> > Right. The code that sends the nullfunc -- I think it can probably
>> > sleep? If so, instead of starting the queue for the TX it could just
>> > flush TX again after sending the nullfunc -- after that either it had
>> > status for the frame, or the frame was dropped, no?
>>
>> So, this requires changing all the drivers to return the status for
>> tx() routine.
>
> Not necessarily. I think that if you do the flush() based approach, then
> mac80211 can infer that the frame was dropped by not getting a TX status
> during flush, right? Assuming there's reliable TX status to start with.
>
>> And this is how I understand this:
>> stop queues, then send nullfunc, then do a flush to get the status and
>> wake queues again irrespective of acked or not(but set the ACK_STATUS
>> correspondingly). If tx() routine of nullfunc fails, then also wake
>> the queues.
>
> Yes, but since the flush is there, the latter (tx fails, ...) won't be
> needed.

If queues are stopped before sending nullfunc, it is added in the
pending frames queue and the flush after nullfunc is ineffective. So,
is it okay to bypass this queue_stop check with REASON_PS if it is a
nullfunc frame? Any more ideas will be helpful.

Vivek.

2011-02-15 14:41:19

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Tue, Feb 15, 2011 at 7:39 PM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2011-02-15 at 19:34 +0530, Vivek Natarajan wrote:
>> On Tue, Feb 15, 2011 at 6:14 PM, Johannes Berg
>> <[email protected]> wrote:
>> > On Tue, 2011-02-08 at 15:43 +0530, Vivek Natarajan wrote:
>> >
>> >> > Maybe the subif queues should be stopped, then flush, then tx nullfunc,
>> >> > then stop all queues to configure the HW or something like that?
>> >>
>> >> I tried this sequence:
>> >> the subif queues stopped, then flush, then tx nullfunc, and wake subif
>> >> queues,(we cannot have the queues stopped till we receive tx_status
>> >> because nullfunc might have failed during tx path itself and mac80211
>> >> will not receive tx_status)
>> >
>> > I've recently been thinking about this -- I'm thinking that maybe we
>> > should change this behaviour. Right now the tx() routine basically
>> > always returns OK (except in at76) and I suppose instead it could return
>> > whether the frame was queued up successfully...
>> >
>> >> After some time interval, once again stop queues on receiving ack for
>> >> nullfunc, configure the hw and then wake up queues. So, during the
>> >> above time interval, there is a race of queuing a frame to the hw. I
>> >> have tested this and the issue is quickly reproducible.
>> >
>> > Right. The code that sends the nullfunc -- I think it can probably
>> > sleep? If so, instead of starting the queue for the TX it could just
>> > flush TX again after sending the nullfunc -- after that either it had
>> > status for the frame, or the frame was dropped, no?
>>
>> So, this requires changing all the drivers to return the status for
>> tx() routine.
>
> Not necessarily. I think that if you do the flush() based approach, then
> mac80211 can infer that the frame was dropped by not getting a TX status
> during flush, right? Assuming there's reliable TX status to start with.

Thanks. I will give this a try.

Vivek.

2011-02-04 13:12:38

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
> On Fri, 2011-02-04 at 17:55 +0530, Vivek Natarajan wrote:
> > There is a race on sending a data frame before the tx completion
> > of nullfunc frame for enabling power save. As the data quickly
> > follows the nullfunc frame, the AP thinks that the station is out
> > of power save and continues to send the frames. Whereas in the
> > station, the nullfunc ack will be processed after the tx completion
> > of data frame and mac80211 goes to powersave. Thus the power
> > save state mismatch between the station and the AP causes some
> > data loss and some applications fail because of that. This patch
> > fixes this issue.
> >
> > Signed-off-by: Vivek Natarajan <[email protected]>
> > ---
> > net/mac80211/ieee80211_i.h | 1 +
> > net/mac80211/mlme.c | 8 ++++++--
> > net/mac80211/tx.c | 8 ++++++++
> > 3 files changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> > index 533fd32..6ad97f6 100644
> > --- a/net/mac80211/ieee80211_i.h
> > +++ b/net/mac80211/ieee80211_i.h
> > @@ -346,6 +346,7 @@ enum ieee80211_sta_flags {
> > IEEE80211_STA_UAPSD_ENABLED = BIT(7),
> > IEEE80211_STA_NULLFUNC_ACKED = BIT(8),
> > IEEE80211_STA_RESET_SIGNAL_AVE = BIT(9),
> > + IEEE80211_STA_PS_PENDING = BIT(10),
> > };
> >
> > struct ieee80211_if_managed {
> > diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> > index e059b3a..45f736e 100644
> > --- a/net/mac80211/mlme.c
> > +++ b/net/mac80211/mlme.c
> > @@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
> > return;
> >
> > if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
> > - (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
> > + (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
> > + ifmgd->flags |= IEEE80211_STA_PS_PENDING;
> > ieee80211_send_nullfunc(local, sdata, 1);
> > + }
> >
> > if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
> > (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
> > - (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
> > + ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
> > + ifmgd->flags & IEEE80211_STA_PS_PENDING)) {
> > ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
> > + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> > local->hw.conf.flags |= IEEE80211_CONF_PS;
> > ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
> > }
> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> > index 8fbbc7a..e1c2256 100644
> > --- a/net/mac80211/tx.c
> > +++ b/net/mac80211/tx.c
> > @@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> > {
> > struct ieee80211_local *local = tx->local;
> > struct ieee80211_if_managed *ifmgd;
> > + struct ieee80211_hdr *hdr;
> >
> > /* driver doesn't support power save */
> > if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
> > @@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> > && skb_get_queue_mapping(tx->skb) == 0)
> > return TX_CONTINUE;
> >
> > + hdr = (struct ieee80211_hdr *)tx->skb->data;
> > +
> > + if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
> > + ieee80211_has_pm(hdr->frame_control)) &&
> > + (ifmgd->flags & IEEE80211_STA_PS_PENDING))
> > + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> > +
> > if (local->hw.conf.flags & IEEE80211_CONF_PS) {
>
> I don't see how this patch helps anything. Should the last line I quoted
> be replaced instead by checking if PS was requested? We used to not wait
> for the ACK -- so waiting for the ACK broke this.

Ok maybe I see how this helps -- but I don't think it's race-free. When
the PS-pending flag is cleared here, the code above that checks it might
already have passed and be in the driver callback or so.

Maybe the subif queues should be stopped, then flush, then tx nullfunc,
then stop all queues to configure the HW or something like that?

johannes


2011-02-04 13:28:56

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, Feb 4, 2011 at 6:42 PM, Johannes Berg <[email protected]> wrote:
> On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
>> > diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
>> > index e059b3a..45f736e 100644
>> > --- a/net/mac80211/mlme.c
>> > +++ b/net/mac80211/mlme.c
>> > @@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
>> > ? ? ? ? ? ? return;
>> >
>> > ? ? if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
>> > - ? ? ? (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
>> > + ? ? ? (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
>> > + ? ? ? ? ? ifmgd->flags |= IEEE80211_STA_PS_PENDING;
>> > ? ? ? ? ? ? ieee80211_send_nullfunc(local, sdata, 1);
>> > + ? }
>> >
>> > ? ? if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
>> > ? ? ? ? ? (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
>> > - ? ? ? (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
>> > + ? ? ? ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
>> > + ? ? ? ? ifmgd->flags & IEEE80211_STA_PS_PENDING)) ?{
>> > ? ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
>> > + ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
>> > ? ? ? ? ? ? local->hw.conf.flags |= IEEE80211_CONF_PS;
>> > ? ? ? ? ? ? ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
>> > ? ? }
>> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> > index 8fbbc7a..e1c2256 100644
>> > --- a/net/mac80211/tx.c
>> > +++ b/net/mac80211/tx.c
>> > @@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
>> > ?{
>> > ? ? struct ieee80211_local *local = tx->local;
>> > ? ? struct ieee80211_if_managed *ifmgd;
>> > + ? struct ieee80211_hdr *hdr;
>> >
>> > ? ? /* driver doesn't support power save */
>> > ? ? if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
>> > @@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
>> > ? ? ? ? && skb_get_queue_mapping(tx->skb) == 0)
>> > ? ? ? ? ? ? return TX_CONTINUE;
>> >
>> > + ? hdr = (struct ieee80211_hdr *)tx->skb->data;
>> > +
>> > + ? if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
>> > + ? ? ? ?ieee80211_has_pm(hdr->frame_control)) &&
>> > + ? ? ? (ifmgd->flags & IEEE80211_STA_PS_PENDING))
>> > + ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
>> > +
>> > ? ? if (local->hw.conf.flags & IEEE80211_CONF_PS) {
>>
>> I don't see how this patch helps anything. Should the last line I quoted
>> be replaced instead by checking if PS was requested? We used to not wait
>> for the ACK -- so waiting for the ACK broke this.
>
> Ok maybe I see how this helps -- but I don't think it's race-free. When
> the PS-pending flag is cleared here, the code above that checks it might
> already have passed and be in the driver callback or so.

When it is in the driver callback, IEEE80211_CONF_PS would have been
set and when this is set, ieee80211_tx_h_dynamic_ps will disable PS
and there wont be any discrepancy in power save states between AP and
the station.

Vivek.

2011-02-04 13:05:59

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, 2011-02-04 at 17:55 +0530, Vivek Natarajan wrote:

> + if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
> + ieee80211_has_pm(hdr->frame_control)) &&
> + (ifmgd->flags & IEEE80211_STA_PS_PENDING))
> + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> +

That's still just as broken before -- you can't assume ifmgd is valid
here.

johannes


2011-02-16 11:11:34

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Wed, 2011-02-16 at 15:01 +0530, Vivek Natarajan wrote:

> >> And this is how I understand this:
> >> stop queues, then send nullfunc, then do a flush to get the status and
> >> wake queues again irrespective of acked or not(but set the ACK_STATUS
> >> correspondingly). If tx() routine of nullfunc fails, then also wake
> >> the queues.
> >
> > Yes, but since the flush is there, the latter (tx fails, ...) won't be
> > needed.
>
> If queues are stopped before sending nullfunc, it is added in the
> pending frames queue and the flush after nullfunc is ineffective. So,
> is it okay to bypass this queue_stop check with REASON_PS if it is a
> nullfunc frame? Any more ideas will be helpful.

Ah, but I wasn't thinking of stopping the device queues, just the sdata
queues, and the nullfunc doesn't go across those.

johannes


2011-02-04 13:08:42

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, 2011-02-04 at 17:55 +0530, Vivek Natarajan wrote:
> There is a race on sending a data frame before the tx completion
> of nullfunc frame for enabling power save. As the data quickly
> follows the nullfunc frame, the AP thinks that the station is out
> of power save and continues to send the frames. Whereas in the
> station, the nullfunc ack will be processed after the tx completion
> of data frame and mac80211 goes to powersave. Thus the power
> save state mismatch between the station and the AP causes some
> data loss and some applications fail because of that. This patch
> fixes this issue.
>
> Signed-off-by: Vivek Natarajan <[email protected]>
> ---
> net/mac80211/ieee80211_i.h | 1 +
> net/mac80211/mlme.c | 8 ++++++--
> net/mac80211/tx.c | 8 ++++++++
> 3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 533fd32..6ad97f6 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -346,6 +346,7 @@ enum ieee80211_sta_flags {
> IEEE80211_STA_UAPSD_ENABLED = BIT(7),
> IEEE80211_STA_NULLFUNC_ACKED = BIT(8),
> IEEE80211_STA_RESET_SIGNAL_AVE = BIT(9),
> + IEEE80211_STA_PS_PENDING = BIT(10),
> };
>
> struct ieee80211_if_managed {
> diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
> index e059b3a..45f736e 100644
> --- a/net/mac80211/mlme.c
> +++ b/net/mac80211/mlme.c
> @@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
> return;
>
> if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
> - (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
> + (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
> + ifmgd->flags |= IEEE80211_STA_PS_PENDING;
> ieee80211_send_nullfunc(local, sdata, 1);
> + }
>
> if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
> (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
> - (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
> + ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
> + ifmgd->flags & IEEE80211_STA_PS_PENDING)) {
> ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
> + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> local->hw.conf.flags |= IEEE80211_CONF_PS;
> ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
> }
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 8fbbc7a..e1c2256 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> {
> struct ieee80211_local *local = tx->local;
> struct ieee80211_if_managed *ifmgd;
> + struct ieee80211_hdr *hdr;
>
> /* driver doesn't support power save */
> if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
> @@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
> && skb_get_queue_mapping(tx->skb) == 0)
> return TX_CONTINUE;
>
> + hdr = (struct ieee80211_hdr *)tx->skb->data;
> +
> + if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
> + ieee80211_has_pm(hdr->frame_control)) &&
> + (ifmgd->flags & IEEE80211_STA_PS_PENDING))
> + ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
> +
> if (local->hw.conf.flags & IEEE80211_CONF_PS) {

I don't see how this patch helps anything. Should the last line I quoted
be replaced instead by checking if PS was requested? We used to not wait
for the ACK -- so waiting for the ACK broke this.

johannes


2011-02-15 14:04:52

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Tue, Feb 15, 2011 at 6:14 PM, Johannes Berg
<[email protected]> wrote:
> On Tue, 2011-02-08 at 15:43 +0530, Vivek Natarajan wrote:
>
>> > Maybe the subif queues should be stopped, then flush, then tx nullfunc,
>> > then stop all queues to configure the HW or something like that?
>>
>> I tried this sequence:
>> the subif queues stopped, then flush, then tx nullfunc, and wake subif
>> queues,(we cannot have the queues stopped till we receive tx_status
>> because nullfunc might have failed during tx path itself and mac80211
>> will not receive tx_status)
>
> I've recently been thinking about this -- I'm thinking that maybe we
> should change this behaviour. Right now the tx() routine basically
> always returns OK (except in at76) and I suppose instead it could return
> whether the frame was queued up successfully...
>
>> After some time interval, once again stop queues on receiving ack for
>> nullfunc, configure the hw and then wake up queues. So, during the
>> above time interval, there is a race of queuing a frame to the hw. I
>> have tested this and the issue is quickly reproducible.
>
> Right. The code that sends the nullfunc -- I think it can probably
> sleep? If so, instead of starting the queue for the TX it could just
> flush TX again after sending the nullfunc -- after that either it had
> status for the frame, or the frame was dropped, no?

So, this requires changing all the drivers to return the status for
tx() routine. And this is how I understand this:
stop queues, then send nullfunc, then do a flush to get the status and
wake queues again irrespective of acked or not(but set the ACK_STATUS
correspondingly). If tx() routine of nullfunc fails, then also wake
the queues.
I have sent another version with PS_PENDING and stop/wake queues
combined. That addresses the race that you had mentioned earlier. Do
we need that?

Thanks,
Vivek.

2011-02-15 14:09:14

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Tue, 2011-02-15 at 19:34 +0530, Vivek Natarajan wrote:
> On Tue, Feb 15, 2011 at 6:14 PM, Johannes Berg
> <[email protected]> wrote:
> > On Tue, 2011-02-08 at 15:43 +0530, Vivek Natarajan wrote:
> >
> >> > Maybe the subif queues should be stopped, then flush, then tx nullfunc,
> >> > then stop all queues to configure the HW or something like that?
> >>
> >> I tried this sequence:
> >> the subif queues stopped, then flush, then tx nullfunc, and wake subif
> >> queues,(we cannot have the queues stopped till we receive tx_status
> >> because nullfunc might have failed during tx path itself and mac80211
> >> will not receive tx_status)
> >
> > I've recently been thinking about this -- I'm thinking that maybe we
> > should change this behaviour. Right now the tx() routine basically
> > always returns OK (except in at76) and I suppose instead it could return
> > whether the frame was queued up successfully...
> >
> >> After some time interval, once again stop queues on receiving ack for
> >> nullfunc, configure the hw and then wake up queues. So, during the
> >> above time interval, there is a race of queuing a frame to the hw. I
> >> have tested this and the issue is quickly reproducible.
> >
> > Right. The code that sends the nullfunc -- I think it can probably
> > sleep? If so, instead of starting the queue for the TX it could just
> > flush TX again after sending the nullfunc -- after that either it had
> > status for the frame, or the frame was dropped, no?
>
> So, this requires changing all the drivers to return the status for
> tx() routine.

Not necessarily. I think that if you do the flush() based approach, then
mac80211 can infer that the frame was dropped by not getting a TX status
during flush, right? Assuming there's reliable TX status to start with.

> And this is how I understand this:
> stop queues, then send nullfunc, then do a flush to get the status and
> wake queues again irrespective of acked or not(but set the ACK_STATUS
> correspondingly). If tx() routine of nullfunc fails, then also wake
> the queues.

Yes, but since the flush is there, the latter (tx fails, ...) won't be
needed.

> I have sent another version with PS_PENDING and stop/wake queues
> combined. That addresses the race that you had mentioned earlier. Do
> we need that?

How does it address the race? I haven't looked at that one yet.

johannes


2011-02-08 10:13:27

by Vivek Natarajan

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Fri, Feb 4, 2011 at 6:42 PM, Johannes Berg <[email protected]> wrote:
> On Fri, 2011-02-04 at 14:08 +0100, Johannes Berg wrote:
>> On Fri, 2011-02-04 at 17:55 +0530, Vivek Natarajan wrote:
>> > There is a race on sending a data frame before the tx completion
>> > of nullfunc frame for enabling power save. As the data quickly
>> > follows the nullfunc frame, the AP thinks that the station is out
>> > of power save and continues to send the frames. Whereas in the
>> > station, the nullfunc ack will be processed after the tx completion
>> > of data frame and mac80211 goes to powersave. Thus the power
>> > save state mismatch between the station and the AP causes some
>> > data loss and some applications fail because of that. This patch
>> > fixes this issue.
>> >
>> > Signed-off-by: Vivek Natarajan <[email protected]>
>> > ---
>> > ?net/mac80211/ieee80211_i.h | ? ?1 +
>> > ?net/mac80211/mlme.c ? ? ? ?| ? ?8 ++++++--
>> > ?net/mac80211/tx.c ? ? ? ? ?| ? ?8 ++++++++
>> > ?3 files changed, 15 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
>> > index 533fd32..6ad97f6 100644
>> > --- a/net/mac80211/ieee80211_i.h
>> > +++ b/net/mac80211/ieee80211_i.h
>> > @@ -346,6 +346,7 @@ enum ieee80211_sta_flags {
>> > ? ? IEEE80211_STA_UAPSD_ENABLED ? ? = BIT(7),
>> > ? ? IEEE80211_STA_NULLFUNC_ACKED ? ?= BIT(8),
>> > ? ? IEEE80211_STA_RESET_SIGNAL_AVE ?= BIT(9),
>> > + ? IEEE80211_STA_PS_PENDING ? ? ? ?= BIT(10),
>> > ?};
>> >
>> > ?struct ieee80211_if_managed {
>> > diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
>> > index e059b3a..45f736e 100644
>> > --- a/net/mac80211/mlme.c
>> > +++ b/net/mac80211/mlme.c
>> > @@ -727,13 +727,17 @@ void ieee80211_dynamic_ps_enable_work(struct work_struct *work)
>> > ? ? ? ? ? ? return;
>> >
>> > ? ? if ((local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK) &&
>> > - ? ? ? (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)))
>> > + ? ? ? (!(ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED))) {
>> > + ? ? ? ? ? ifmgd->flags |= IEEE80211_STA_PS_PENDING;
>> > ? ? ? ? ? ? ieee80211_send_nullfunc(local, sdata, 1);
>> > + ? }
>> >
>> > ? ? if (!((local->hw.flags & IEEE80211_HW_REPORTS_TX_ACK_STATUS) &&
>> > ? ? ? ? ? (local->hw.flags & IEEE80211_HW_PS_NULLFUNC_STACK)) ||
>> > - ? ? ? (ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED)) {
>> > + ? ? ? ((ifmgd->flags & IEEE80211_STA_NULLFUNC_ACKED) &&
>> > + ? ? ? ? ifmgd->flags & IEEE80211_STA_PS_PENDING)) ?{
>> > ? ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_NULLFUNC_ACKED;
>> > + ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
>> > ? ? ? ? ? ? local->hw.conf.flags |= IEEE80211_CONF_PS;
>> > ? ? ? ? ? ? ieee80211_hw_config(local, IEEE80211_CONF_CHANGE_PS);
>> > ? ? }
>> > diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> > index 8fbbc7a..e1c2256 100644
>> > --- a/net/mac80211/tx.c
>> > +++ b/net/mac80211/tx.c
>> > @@ -185,6 +185,7 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
>> > ?{
>> > ? ? struct ieee80211_local *local = tx->local;
>> > ? ? struct ieee80211_if_managed *ifmgd;
>> > + ? struct ieee80211_hdr *hdr;
>> >
>> > ? ? /* driver doesn't support power save */
>> > ? ? if (!(local->hw.flags & IEEE80211_HW_SUPPORTS_PS))
>> > @@ -233,6 +234,13 @@ ieee80211_tx_h_dynamic_ps(struct ieee80211_tx_data *tx)
>> > ? ? ? ? && skb_get_queue_mapping(tx->skb) == 0)
>> > ? ? ? ? ? ? return TX_CONTINUE;
>> >
>> > + ? hdr = (struct ieee80211_hdr *)tx->skb->data;
>> > +
>> > + ? if (!(ieee80211_is_nullfunc(hdr->frame_control) &&
>> > + ? ? ? ?ieee80211_has_pm(hdr->frame_control)) &&
>> > + ? ? ? (ifmgd->flags & IEEE80211_STA_PS_PENDING))
>> > + ? ? ? ? ? ifmgd->flags &= ~IEEE80211_STA_PS_PENDING;
>> > +
>> > ? ? if (local->hw.conf.flags & IEEE80211_CONF_PS) {
>>
>
> Maybe the subif queues should be stopped, then flush, then tx nullfunc,
> then stop all queues to configure the HW or something like that?

I tried this sequence:
the subif queues stopped, then flush, then tx nullfunc, and wake subif
queues,(we cannot have the queues stopped till we receive tx_status
because nullfunc might have failed during tx path itself and mac80211
will not receive tx_status)
After some time interval, once again stop queues on receiving ack for
nullfunc, configure the hw and then wake up queues. So, during the
above time interval, there is a race of queuing a frame to the hw. I
have tested this and the issue is quickly reproducible.

> Indeed, but the trace still exists between checking PS_PENDING and
> setting CONF_PS.

But this race(in microseconds?) did not happen in my testing for 10 hrs.

For tx path and the mlme PS path to be atomic, a new spinlock is
needed. So, to fix this, I can think of only a lock to protect the
checking of PS_PENDING and setting CONF_PS in mlme.c and lock in tx
path where we check the PS_PENDING.

Thanks
Vivek.

2011-02-15 12:44:29

by Johannes Berg

[permalink] [raw]
Subject: Re: [PATCH v2] mac80211: Fix a race on enabling power save.

On Tue, 2011-02-08 at 15:43 +0530, Vivek Natarajan wrote:

> > Maybe the subif queues should be stopped, then flush, then tx nullfunc,
> > then stop all queues to configure the HW or something like that?
>
> I tried this sequence:
> the subif queues stopped, then flush, then tx nullfunc, and wake subif
> queues,(we cannot have the queues stopped till we receive tx_status
> because nullfunc might have failed during tx path itself and mac80211
> will not receive tx_status)

I've recently been thinking about this -- I'm thinking that maybe we
should change this behaviour. Right now the tx() routine basically
always returns OK (except in at76) and I suppose instead it could return
whether the frame was queued up successfully...

> After some time interval, once again stop queues on receiving ack for
> nullfunc, configure the hw and then wake up queues. So, during the
> above time interval, there is a race of queuing a frame to the hw. I
> have tested this and the issue is quickly reproducible.

Right. The code that sends the nullfunc -- I think it can probably
sleep? If so, instead of starting the queue for the TX it could just
flush TX again after sending the nullfunc -- after that either it had
status for the frame, or the frame was dropped, no?

johannes