2012-02-08 18:17:21

by Felix Fietkau

[permalink] [raw]
Subject: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

Most rate control implementations assume .get_rate and .tx_status are only
called once the per-station data has been fully initialized.
minstrel_ht crashes if this assumption is violated.

Signed-off-by: Felix Fietkau <[email protected]>
Tested-by: Arend van Spriel <[email protected]>
---
net/mac80211/rate.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index 5fc3135..fbb1efd 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -37,7 +37,7 @@ static inline void rate_control_tx_status(struct ieee80211_local *local,
struct ieee80211_sta *ista = &sta->sta;
void *priv_sta = sta->rate_ctrl_priv;

- if (!ref)
+ if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
return;

ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
--
1.7.3.2



2012-02-08 19:31:39

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
> Most rate control implementations assume .get_rate and .tx_status are only
> called once the per-station data has been fully initialized.
> minstrel_ht crashes if this assumption is violated.
>
> Signed-off-by: Felix Fietkau <[email protected]>
> Tested-by: Arend van Spriel <[email protected]>
> ---
> net/mac80211/rate.h | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
> index 5fc3135..fbb1efd 100644
> --- a/net/mac80211/rate.h
> +++ b/net/mac80211/rate.h
> @@ -37,7 +37,7 @@ static inline void rate_control_tx_status(struct ieee80211_local *local,
> struct ieee80211_sta *ista = &sta->sta;
> void *priv_sta = sta->rate_ctrl_priv;
>
> - if (!ref)
> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
> return;
>
> ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);

Any reason not to apply this for 3.3? Or stable?

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2012-02-08 23:02:05

by Pavel Roskin

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On Wed, 8 Feb 2012 14:44:54 -0500
"John W. Linville" <[email protected]> wrote:

> On Wed, Feb 08, 2012 at 08:38:00PM +0100, Felix Fietkau wrote:
> > On 2012-02-08 8:25 PM, John W. Linville wrote:
> > > On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
> > >> Most rate control implementations assume .get_rate
> > >> and .tx_status are only called once the per-station data has
> > >> been fully initialized. minstrel_ht crashes if this assumption
> > >> is violated.
> > >>
> > >> Signed-off-by: Felix Fietkau <[email protected]>
> > >> Tested-by: Arend van Spriel <[email protected]>
> > >> ---
> > >> net/mac80211/rate.h | 2 +-
> > >> 1 files changed, 1 insertions(+), 1 deletions(-)
> > >>
> > >> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
> > >> index 5fc3135..fbb1efd 100644
> > >> --- a/net/mac80211/rate.h
> > >> +++ b/net/mac80211/rate.h
> > >> @@ -37,7 +37,7 @@ static inline void
> > >> rate_control_tx_status(struct ieee80211_local *local, struct
> > >> ieee80211_sta *ista = &sta->sta; void *priv_sta =
> > >> sta->rate_ctrl_priv;
> > >> - if (!ref)
> > >> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
> > >> return;
> > >>
> > >> ref->ops->tx_status(ref->priv, sband, ista, priv_sta,
> > >> skb);
> > >
> > > Any reason not to apply this for 3.3? Or stable?
> > I think 3.3 doesn't have that sta flag, the issue was probably
> > introduced with the 3.4 changes.
> > I don't remember something like this appearing in earlier versions.
>
> Cool, thanks.

I believe 3.3 is affected. At least it looks like the Fedora bug 768639
(https://bugzilla.redhat.com/show_bug.cgi?id=768639) is caused by
calling .tx_status at a wrong time. Fedora kernels use
compat-wireless-3.3. I'm going to test the bleeding edge
compat-wireless with the patch by Felix to see if it fixes things.

The lack of the WLAN_STA_RATE_CONTROL flag doesn't mean that the old
behavior was correct. The flag was introduced to correct that behavior.

The oldest report is dated 2011-12-17 and it's about Linux 3.2.0-rc5
with compat-wireless-2011-12-01.

--
Regards,
Pavel Roskin

2012-02-09 20:16:38

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On Thu, Feb 09, 2012 at 01:51:51AM +0100, Felix Fietkau wrote:
> On 2012-02-09 12:01 AM, Pavel Roskin wrote:
> > On Wed, 8 Feb 2012 14:44:54 -0500
> > "John W. Linville" <[email protected]> wrote:
> >
> >> On Wed, Feb 08, 2012 at 08:38:00PM +0100, Felix Fietkau wrote:
> >> > On 2012-02-08 8:25 PM, John W. Linville wrote:
> >> > > On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
> >> > >> Most rate control implementations assume .get_rate
> >> > >> and .tx_status are only called once the per-station data has
> >> > >> been fully initialized. minstrel_ht crashes if this assumption
> >> > >> is violated.
> >> > >>
> >> > >> Signed-off-by: Felix Fietkau <[email protected]>
> >> > >> Tested-by: Arend van Spriel <[email protected]>
> >> > >> ---
> >> > >> net/mac80211/rate.h | 2 +-
> >> > >> 1 files changed, 1 insertions(+), 1 deletions(-)
> >> > >>
> >> > >> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
> >> > >> index 5fc3135..fbb1efd 100644
> >> > >> --- a/net/mac80211/rate.h
> >> > >> +++ b/net/mac80211/rate.h
> >> > >> @@ -37,7 +37,7 @@ static inline void
> >> > >> rate_control_tx_status(struct ieee80211_local *local, struct
> >> > >> ieee80211_sta *ista = &sta->sta; void *priv_sta =
> >> > >> sta->rate_ctrl_priv;
> >> > >> - if (!ref)
> >> > >> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
> >> > >> return;
> >> > >>
> >> > >> ref->ops->tx_status(ref->priv, sband, ista, priv_sta,
> >> > >> skb);
> >> > >
> >> > > Any reason not to apply this for 3.3? Or stable?
> >> > I think 3.3 doesn't have that sta flag, the issue was probably
> >> > introduced with the 3.4 changes.
> >> > I don't remember something like this appearing in earlier versions.
> >>
> >> Cool, thanks.
> >
> > I believe 3.3 is affected. At least it looks like the Fedora bug 768639
> > (https://bugzilla.redhat.com/show_bug.cgi?id=768639) is caused by
> > calling .tx_status at a wrong time. Fedora kernels use
> > compat-wireless-3.3. I'm going to test the bleeding edge
> > compat-wireless with the patch by Felix to see if it fixes things.
> >
> > The lack of the WLAN_STA_RATE_CONTROL flag doesn't mean that the old
> > behavior was correct. The flag was introduced to correct that behavior.
> >
> > The oldest report is dated 2011-12-17 and it's about Linux 3.2.0-rc5
> > with compat-wireless-2011-12-01.
> Only .get_rate and .tx_status are affected, wireless-testing commit
> e1936e9407138b483e6d1332dd944afec8131f30 adds one of the checks, and my
> commit adds the other. Maybe John could merge those two to 3.3.

At least one of them will cause some merge issues. Can someone try
the attached patches to verify that they actually fix a real problem
in 3.3?

Thanks!

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.


Attachments:
(No filename) (2.80 kB)
0001-mac80211-call-rate-control-only-after-init.patch (3.15 kB)
0002-mac80211-do-not-call-rate-control-.tx_status-before-.patch (1.11 kB)
Download all attachments

2012-02-08 19:38:04

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On 2012-02-08 8:25 PM, John W. Linville wrote:
> On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
>> Most rate control implementations assume .get_rate and .tx_status are only
>> called once the per-station data has been fully initialized.
>> minstrel_ht crashes if this assumption is violated.
>>
>> Signed-off-by: Felix Fietkau <[email protected]>
>> Tested-by: Arend van Spriel <[email protected]>
>> ---
>> net/mac80211/rate.h | 2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
>> index 5fc3135..fbb1efd 100644
>> --- a/net/mac80211/rate.h
>> +++ b/net/mac80211/rate.h
>> @@ -37,7 +37,7 @@ static inline void rate_control_tx_status(struct ieee80211_local *local,
>> struct ieee80211_sta *ista = &sta->sta;
>> void *priv_sta = sta->rate_ctrl_priv;
>>
>> - if (!ref)
>> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
>> return;
>>
>> ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
>
> Any reason not to apply this for 3.3? Or stable?
I think 3.3 doesn't have that sta flag, the issue was probably
introduced with the 3.4 changes.
I don't remember something like this appearing in earlier versions.

- Felix

2012-02-09 20:21:54

by Arend van Spriel

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On 02/09/2012 09:14 PM, John W. Linville wrote:
> On Thu, Feb 09, 2012 at 01:51:51AM +0100, Felix Fietkau wrote:
>> On 2012-02-09 12:01 AM, Pavel Roskin wrote:
>>> On Wed, 8 Feb 2012 14:44:54 -0500
>>> "John W. Linville" <[email protected]> wrote:
>>>
>>>> On Wed, Feb 08, 2012 at 08:38:00PM +0100, Felix Fietkau wrote:
>>>>> On 2012-02-08 8:25 PM, John W. Linville wrote:
>>>>>> On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
>>>>>>> Most rate control implementations assume .get_rate
>>>>>>> and .tx_status are only called once the per-station data has
>>>>>>> been fully initialized. minstrel_ht crashes if this assumption
>>>>>>> is violated.
>>>>>>>
>>>>>>> Signed-off-by: Felix Fietkau <[email protected]>
>>>>>>> Tested-by: Arend van Spriel <[email protected]>
>>>>>>> ---
>>>>>>> net/mac80211/rate.h | 2 +-
>>>>>>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>>>>>>
>>>>>>> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
>>>>>>> index 5fc3135..fbb1efd 100644
>>>>>>> --- a/net/mac80211/rate.h
>>>>>>> +++ b/net/mac80211/rate.h
>>>>>>> @@ -37,7 +37,7 @@ static inline void
>>>>>>> rate_control_tx_status(struct ieee80211_local *local, struct
>>>>>>> ieee80211_sta *ista = &sta->sta; void *priv_sta =
>>>>>>> sta->rate_ctrl_priv;
>>>>>>> - if (!ref)
>>>>>>> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
>>>>>>> return;
>>>>>>>
>>>>>>> ref->ops->tx_status(ref->priv, sband, ista, priv_sta,
>>>>>>> skb);
>>>>>>
>>>>>> Any reason not to apply this for 3.3? Or stable?
>>>>> I think 3.3 doesn't have that sta flag, the issue was probably
>>>>> introduced with the 3.4 changes.
>>>>> I don't remember something like this appearing in earlier versions.
>>>>
>>>> Cool, thanks.
>>>
>>> I believe 3.3 is affected. At least it looks like the Fedora bug 768639
>>> (https://bugzilla.redhat.com/show_bug.cgi?id=768639) is caused by
>>> calling .tx_status at a wrong time. Fedora kernels use
>>> compat-wireless-3.3. I'm going to test the bleeding edge
>>> compat-wireless with the patch by Felix to see if it fixes things.
>>>
>>> The lack of the WLAN_STA_RATE_CONTROL flag doesn't mean that the old
>>> behavior was correct. The flag was introduced to correct that behavior.
>>>
>>> The oldest report is dated 2011-12-17 and it's about Linux 3.2.0-rc5
>>> with compat-wireless-2011-12-01.
>> Only .get_rate and .tx_status are affected, wireless-testing commit
>> e1936e9407138b483e6d1332dd944afec8131f30 adds one of the checks, and my
>> commit adds the other. Maybe John could merge those two to 3.3.
>
> At least one of them will cause some merge issues. Can someone try
> the attached patches to verify that they actually fix a real problem
> in 3.3?
>
> Thanks!
>
> John

Hi John,

This patch fixes NULL deref issue I found and bisected in
wireless-testing earlier this week (see [1]). I don't think gives a
problem with 3.3 at the moment.

Gr. AvS

[1] http://www.spinics.net/lists/linux-wireless/msg84575.html





2012-02-09 00:51:58

by Felix Fietkau

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On 2012-02-09 12:01 AM, Pavel Roskin wrote:
> On Wed, 8 Feb 2012 14:44:54 -0500
> "John W. Linville" <[email protected]> wrote:
>
>> On Wed, Feb 08, 2012 at 08:38:00PM +0100, Felix Fietkau wrote:
>> > On 2012-02-08 8:25 PM, John W. Linville wrote:
>> > > On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
>> > >> Most rate control implementations assume .get_rate
>> > >> and .tx_status are only called once the per-station data has
>> > >> been fully initialized. minstrel_ht crashes if this assumption
>> > >> is violated.
>> > >>
>> > >> Signed-off-by: Felix Fietkau <[email protected]>
>> > >> Tested-by: Arend van Spriel <[email protected]>
>> > >> ---
>> > >> net/mac80211/rate.h | 2 +-
>> > >> 1 files changed, 1 insertions(+), 1 deletions(-)
>> > >>
>> > >> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
>> > >> index 5fc3135..fbb1efd 100644
>> > >> --- a/net/mac80211/rate.h
>> > >> +++ b/net/mac80211/rate.h
>> > >> @@ -37,7 +37,7 @@ static inline void
>> > >> rate_control_tx_status(struct ieee80211_local *local, struct
>> > >> ieee80211_sta *ista = &sta->sta; void *priv_sta =
>> > >> sta->rate_ctrl_priv;
>> > >> - if (!ref)
>> > >> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
>> > >> return;
>> > >>
>> > >> ref->ops->tx_status(ref->priv, sband, ista, priv_sta,
>> > >> skb);
>> > >
>> > > Any reason not to apply this for 3.3? Or stable?
>> > I think 3.3 doesn't have that sta flag, the issue was probably
>> > introduced with the 3.4 changes.
>> > I don't remember something like this appearing in earlier versions.
>>
>> Cool, thanks.
>
> I believe 3.3 is affected. At least it looks like the Fedora bug 768639
> (https://bugzilla.redhat.com/show_bug.cgi?id=768639) is caused by
> calling .tx_status at a wrong time. Fedora kernels use
> compat-wireless-3.3. I'm going to test the bleeding edge
> compat-wireless with the patch by Felix to see if it fixes things.
>
> The lack of the WLAN_STA_RATE_CONTROL flag doesn't mean that the old
> behavior was correct. The flag was introduced to correct that behavior.
>
> The oldest report is dated 2011-12-17 and it's about Linux 3.2.0-rc5
> with compat-wireless-2011-12-01.
Only .get_rate and .tx_status are affected, wireless-testing commit
e1936e9407138b483e6d1332dd944afec8131f30 adds one of the checks, and my
commit adds the other. Maybe John could merge those two to 3.3.

- Felix

2012-02-10 10:05:47

by Joerg

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

Pavel Roskin <proski@...> writes:

> I tried to find a way to reproduce the "-1 event" quickly, I even put
> the laptop in a microwave oven while running flood ping to the router.
> Too bad, nothing interesting happened.

Switch it on, the oven that is!



2012-02-09 23:11:24

by Pavel Roskin

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On Thu, 09 Feb 2012 01:51:51 +0100
Felix Fietkau <[email protected]> wrote:

> Only .get_rate and .tx_status are affected, wireless-testing commit
> e1936e9407138b483e6d1332dd944afec8131f30 adds one of the checks, and
> my commit adds the other. Maybe John could merge those two to 3.3.

I ran the "affected" laptop with ath9k rate control overnight with
compat-wireless-2012-02-06 and both patches. As strange as it is, I
still got a pair of "-1 events" less than a millisecond apart. It took
18433 seconds (about 4 hours) to reproduce.

I wish I put more instrumentation into the ath9k rate control code. I
absolutely didn't expect the issue to stay.

I tried to find a way to reproduce the "-1 event" quickly, I even put
the laptop in a microwave oven while running flood ping to the router.
Too bad, nothing interesting happened.

The full kernel log is attached. Also, I'm getting "ath: Failed to
stop TX DMA, queues=0x001!"; it wasn't there with compat-wireless 3.3.

It looks like the issue with ath9k rate control is separate and I have
no standing (as lawyers would say) with regard to the already committed
patches.

--
Regards,
Pavel Roskin


Attachments:
(No filename) (1.13 kB)
dmesg (62.15 kB)
Download all attachments

2012-02-08 19:46:35

by John W. Linville

[permalink] [raw]
Subject: Re: [PATCH] mac80211: do not call rate control .tx_status before .rate_init

On Wed, Feb 08, 2012 at 08:38:00PM +0100, Felix Fietkau wrote:
> On 2012-02-08 8:25 PM, John W. Linville wrote:
> > On Wed, Feb 08, 2012 at 07:17:11PM +0100, Felix Fietkau wrote:
> >> Most rate control implementations assume .get_rate and .tx_status are only
> >> called once the per-station data has been fully initialized.
> >> minstrel_ht crashes if this assumption is violated.
> >>
> >> Signed-off-by: Felix Fietkau <[email protected]>
> >> Tested-by: Arend van Spriel <[email protected]>
> >> ---
> >> net/mac80211/rate.h | 2 +-
> >> 1 files changed, 1 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
> >> index 5fc3135..fbb1efd 100644
> >> --- a/net/mac80211/rate.h
> >> +++ b/net/mac80211/rate.h
> >> @@ -37,7 +37,7 @@ static inline void rate_control_tx_status(struct ieee80211_local *local,
> >> struct ieee80211_sta *ista = &sta->sta;
> >> void *priv_sta = sta->rate_ctrl_priv;
> >>
> >> - if (!ref)
> >> + if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
> >> return;
> >>
> >> ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
> >
> > Any reason not to apply this for 3.3? Or stable?
> I think 3.3 doesn't have that sta flag, the issue was probably
> introduced with the 3.4 changes.
> I don't remember something like this appearing in earlier versions.

Cool, thanks.

--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.