2013-07-31 08:40:39

by Arend van Spriel

Subject: Fwd: [Bug 989269] Connecting to WLAN causes kernel panic

Hi Felix,

How are things in OpenWRT? I wanted to ask you something regarding a
defect I am looking at. Since kernel 3.9 several reports have been made
about a kernel panic in brcmsmac, i.e. a divide-by-zero error.

Debugging the issue shows we end up with a rate with MCS index 110,
which is, well, impossible. As brcmsmac gets the rate info from
minstrel_ht, I was wondering if we have an integration issue here. I saw
patches for a new API around April, which may have been in the 3.9 time
frame, and perhaps something subtly changed things for brcmsmac.

Regards,
Arend

-------- Original Message --------
Subject: [Bug 989269] Connecting to WLAN causes kernel panic
Date: Wed, 31 Jul 2013 08:11:41 +0000
From: <[email protected]>
To: <[email protected]>

https://bugzilla.redhat.com/show_bug.cgi?id=989269



--- Comment #13 from Arend van Spriel <[email protected]> ---
(In reply to Chris from comment #12)
> Created attachment 780839 [details]
> dmesg | grep brcms when connecting to WLAN after patch 2
>
> During gathering this data I connected to the internet, was sitting for a
> while and then walked through a corridor in my university, so that the
> computer was connecting to different routers. Sat down there for
> significantly longer time. At the end I reconnected and disconnected.
> It seems to work stable, without any problems, but I haven't tried to use
> the connection for something heavier.

Thanks for the data. I observed two values that are invalid. A ratespec
value of 0 is invalid, and the driver selects the 1Mbps rate to do the
calculation. The other value, 134217838, is what triggers the
divide-by-zero. The ratespec value is:
ratespec: 0x800006E
RATE 110 (rate value [unit: 500Kbps or MCS index])
MIMORATE 1 (RATE field represents MIMO MCS index)

This does not make sense, because the MCS index can only go up to 32. I
suspect this should not be a MIMO rate, but 54Mbps. I am looking further
into how we end up in this situation.
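
For readers following along, the decoding above can be reproduced with a few lines of C. This is only a sketch: the EX_* names are made up for illustration, the MIMORATE bit position is taken from the dump above (0x800006E), and the 7-bit width of the rate field is an assumption that merely happens to fit the value 110.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; bit layout inferred from the decoded dump above. */
#define EX_RSPEC_RATE_MASK 0x0000007Fu /* rate in 500 Kbps units or MCS index (width assumed) */
#define EX_RSPEC_MIMORATE  0x08000000u /* set when the rate field holds an MCS index */
#define EX_MAX_MCS         32u         /* highest MCS index, as stated above */

struct ex_rspec_fields {
	unsigned int rate; /* raw rate field */
	bool is_mcs;       /* true if the MIMORATE bit is set */
};

/* Split a ratespec word into the two fields shown in the dump. */
struct ex_rspec_fields ex_decode_rspec(uint32_t rspec)
{
	struct ex_rspec_fields f;

	f.rate = rspec & EX_RSPEC_RATE_MASK;
	f.is_mcs = (rspec & EX_RSPEC_MIMORATE) != 0;
	return f;
}

/* Reject ratespecs that claim an MCS index beyond the valid range. */
bool ex_rspec_mcs_in_range(uint32_t rspec)
{
	struct ex_rspec_fields f = ex_decode_rspec(rspec);

	return !f.is_mcs || f.rate <= EX_MAX_MCS;
}
```

Feeding 134217838 (0x800006E) to ex_decode_rspec() gives a rate field of 110 with the MIMORATE bit set, which ex_rspec_mcs_in_range() rejects.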

2013-07-31 09:45:54

by Arend van Spriel

Subject: Re: Fwd: [Bug 989269] Connecting to WLAN causes kernel panic

On 07/31/2013 11:09 AM, Felix Fietkau wrote:
> On 2013-07-31 10:39 AM, Arend van Spriel wrote:
>> Hi Felix,
>>
>> How are things in OpenWRT? I wanted to ask you something regarding a
>> defect I am looking at. Since kernel 3.9 several reports have been made
>> about a kernel panic in brcmsmac, i.e. a divide-by-zero error.
> 3.9 was the first kernel to support CCK rates in minstrel_ht as
> fallback (in case the link gets very bad). Not sure if that triggers
> anything weird in brcmsmac.

It just might, reading this in brcmsmac:

	/*
	 * Currently only support same setting for primary and
	 * fallback rates. Unify flags for each rate into a
	 * single value for the frame
	 */
	use_rts |= txrate[k]->flags & IEEE80211_TX_RC_USE_RTS_CTS
		   ? true : false;
	use_cts |= txrate[k]->flags & IEEE80211_TX_RC_USE_CTS_PROTECT
		   ? true : false;

Although this is not directly

>> Debugging the issue shows we end up with a rate with MCS index 110,
>> which is, well, impossible.
> Did you verify that it comes directly from minstrel_ht, or does it show
> up somewhere further down the chain in brcmsmac?

I am pretty sure it is not minstrel_ht. brcmsmac converts the
information from minstrel_ht into a so-called ratespec format. The
strange MCS is what I see in the ratespec leading up to the
divide-by-zero. Next thing to look at is the conversion step. As said
above, the CCK fallback might be the culprit, or rather how brcmsmac
deals with it.

>> As brcmsmac gets the rate info from
>> minstrel_ht, I was wondering if we have an integration issue here. I saw
>> patches for a new API around April, which may have been in the 3.9 time
>> frame, and perhaps something subtly changed things for brcmsmac.
> The new rate API was added in 3.10, not 3.9. It did add a bug that caused
> bogus MCS rates. I've sent a patch for this a while back (shortly
> before 3.10 was released), but it was too late to make it into the
> release. I guess we have to wait for it to be applied through stable -
> no idea why that hasn't happened yet.

Ping Greg? I will give it a try.

Thanks,
Arend




2013-07-31 09:09:14

by Felix Fietkau

Subject: Re: Fwd: [Bug 989269] Connecting to WLAN causes kernel panic

On 2013-07-31 10:39 AM, Arend van Spriel wrote:
> Hi Felix,
>
> How are things in OpenWRT? I wanted to ask you something regarding a
> defect I am looking at. Since kernel 3.9 several reports have been made
> about a kernel panic in brcmsmac, i.e. a divide-by-zero error.
3.9 was the first kernel to support CCK rates in minstrel_ht as
fallback (in case the link gets very bad). Not sure if that triggers
anything weird in brcmsmac.

> Debugging the issue shows we end up with a rate with MCS index 110,
> which is, well, impossible.
Did you verify that it comes directly from minstrel_ht, or does it show
up somewhere further down the chain in brcmsmac?

> As brcmsmac gets the rate info from
> minstrel_ht, I was wondering if we have an integration issue here. I saw
> patches for a new API around April, which may have been in the 3.9 time
> frame, and perhaps something subtly changed things for brcmsmac.
The new rate API was added in 3.10, not 3.9. It did add a bug that caused
bogus MCS rates. I've sent a patch for this a while back (shortly
before 3.10 was released), but it was too late to make it into the
release. I guess we have to wait for it to be applied through stable -
no idea why that hasn't happened yet.

Here is the fix:

commit 1cd158573951f737fbc878a35cb5eb47bf9af3d5
Author: Felix Fietkau <[email protected]>
Date: Fri Jun 28 21:04:35 2013 +0200

mac80211/minstrel_ht: fix cck rate sampling

The CCK group needs special treatment to set the right flags and rate
index. Add this missing check to prevent setting broken rates for tx
packets.

Cc: [email protected] # 3.10
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>

diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
index 5b2d301..f5aed96 100644
--- a/net/mac80211/rc80211_minstrel_ht.c
+++ b/net/mac80211/rc80211_minstrel_ht.c
@@ -804,10 +804,18 @@ minstrel_ht_get_rate(void *priv, struct ieee80211_sta *sta, void *priv_sta,

 	sample_group = &minstrel_mcs_groups[sample_idx / MCS_GROUP_RATES];
 	info->flags |= IEEE80211_TX_CTL_RATE_CTRL_PROBE;
+	rate->count = 1;
+
+	if (sample_idx / MCS_GROUP_RATES == MINSTREL_CCK_GROUP) {
+		int idx = sample_idx % ARRAY_SIZE(mp->cck_rates);
+		rate->idx = mp->cck_rates[idx];
+		rate->flags = 0;
+		return;
+	}
+
 	rate->idx = sample_idx % MCS_GROUP_RATES +
 		    (sample_group->streams - 1) * MCS_GROUP_RATES;
 	rate->flags = IEEE80211_TX_RC_MCS | sample_group->flags;
-	rate->count = 1;
 }

static void
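
To make the effect of the patch easier to see outside the kernel tree, here is a stand-alone sketch of the same branching. Everything prefixed EX_/ex_ is a hypothetical stand-in: the group size, the position of the CCK group, the flag value and the contents of the CCK index table are assumptions chosen for illustration, not the values used by minstrel_ht.

```c
/* Hypothetical stand-ins for the minstrel_ht constants and types. */
#define EX_GROUP_RATES 8    /* rates per group (assumed) */
#define EX_CCK_GROUP   2    /* position of the CCK group (assumed) */
#define EX_FLAG_MCS    0x1u /* placeholder for IEEE80211_TX_RC_MCS */

struct ex_group {
	int streams; /* spatial streams; unused for the CCK group */
};

struct ex_rate {
	int idx;
	unsigned int flags;
	int count;
};

static const struct ex_group ex_groups[] = {
	{ .streams = 1 }, { .streams = 2 }, { .streams = 0 },
};

/* Indices of the CCK rates in the driver's legacy rate table (assumed). */
static const int ex_cck_rates[] = { 0, 1, 2, 3 };

/* Fill in a sample rate, treating the CCK group specially as the patch does. */
void ex_fill_sample_rate(struct ex_rate *rate, int sample_idx)
{
	int group = sample_idx / EX_GROUP_RATES;

	rate->count = 1;

	if (group == EX_CCK_GROUP) {
		int n = (int)(sizeof(ex_cck_rates) / sizeof(ex_cck_rates[0]));

		/* Legacy rate: index into the legacy table, no MCS flag. */
		rate->idx = ex_cck_rates[sample_idx % n];
		rate->flags = 0;
		return;
	}

	/* HT rate: MCS index derived from position in group and stream count. */
	rate->idx = sample_idx % EX_GROUP_RATES +
		    (ex_groups[group].streams - 1) * EX_GROUP_RATES;
	rate->flags = EX_FLAG_MCS;
}
```

The point of the early return is that a sample index falling in the CCK group never reaches the MCS computation, so it cannot end up flagged as an MCS rate.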


2013-07-31 09:46:44

by Sedat Dilek

Subject: Re: Fwd: [Bug 989269] Connecting to WLAN causes kernel panic

On Wed, Jul 31, 2013 at 11:09 AM, Felix Fietkau <[email protected]> wrote:
> On 2013-07-31 10:39 AM, Arend van Spriel wrote:
>> Hi Felix,
>>
>> How are things in OpenWRT? I wanted to ask you something regarding a
>> defect I am looking at. Since kernel 3.9 several reports have been made
>> about a kernel panic in brcmsmac, i.e. a divide-by-zero error.
> 3.9 was the first kernel to support CCK rates in minstrel_ht as
> fallback (in case the link gets very bad). Not sure if that triggers
> anything weird in brcmsmac.
>
>> Debugging the issue shows we end up with a rate with MCS index 110,
>> which is, well, impossible.
> Did you verify that it comes directly from minstrel_ht, or does it show
> up somewhere further down the chain in brcmsmac?
>
>> As brcmsmac gets the rate info from
>> minstrel_ht, I was wondering if we have an integration issue here. I saw
>> patches for a new API around April, which may have been in the 3.9 time
>> frame, and perhaps something subtly changed things for brcmsmac.
> The new rate API was added in 3.10, not 3.9. It did add a bug that caused
> bogus MCS rates. I've sent a patch for this a while back (shortly
> before 3.10 was released), but it was too late to make it into the
> release. I guess we have to wait for it to be applied through stable -
> no idea why that hasn't happened yet.
>
> Here is the fix:
>
> commit 1cd158573951f737fbc878a35cb5eb47bf9af3d5
> Author: Felix Fietkau <[email protected]>
> Date: Fri Jun 28 21:04:35 2013 +0200
>
> mac80211/minstrel_ht: fix cck rate sampling
>

That patch is not in Linus' tree yet, so it won't get into stable.

- Sedat -

> The CCK group needs special treatment to set the right flags and rate
> index. Add this missing check to prevent setting broken rates for tx
> packets.
>
> Cc: [email protected] # 3.10
> Signed-off-by: Felix Fietkau <[email protected]>
> Signed-off-by: Johannes Berg <[email protected]>
>
> diff --git a/net/mac80211/rc80211_minstrel_ht.c b/net/mac80211/rc80211_minstrel_ht.c
> index 5b2d301..f5aed96 100644
> --- a/net/mac80211/rc80211_minstrel_ht.c
> +++ b/net/mac80211/rc80211_minstrel_ht.c
> @@ -804,10 +804,18 @@ minstrel_ht_get_rate(void *priv, struct ieee80211_sta *sta, void *priv_sta,
>
>  	sample_group = &minstrel_mcs_groups[sample_idx / MCS_GROUP_RATES];
>  	info->flags |= IEEE80211_TX_CTL_RATE_CTRL_PROBE;
> +	rate->count = 1;
> +
> +	if (sample_idx / MCS_GROUP_RATES == MINSTREL_CCK_GROUP) {
> +		int idx = sample_idx % ARRAY_SIZE(mp->cck_rates);
> +		rate->idx = mp->cck_rates[idx];
> +		rate->flags = 0;
> +		return;
> +	}
> +
>  	rate->idx = sample_idx % MCS_GROUP_RATES +
>  		    (sample_group->streams - 1) * MCS_GROUP_RATES;
>  	rate->flags = IEEE80211_TX_RC_MCS | sample_group->flags;
> -	rate->count = 1;
>  }
>
> static void
>

2013-08-16 22:20:03

by Arend van Spriel

Subject: Re: Fwd: [Bug 989269] Connecting to WLAN causes kernel panic

On 07/31/2013 11:09 AM, Felix Fietkau wrote:
> On 2013-07-31 10:39 AM, Arend van Spriel wrote:
>> Hi Felix,
>>
>> How are things in OpenWRT? I wanted to ask you something regarding a
>> defect I am looking at. Since kernel 3.9 several reports have been made
>> about a kernel panic in brcmsmac, i.e. a divide-by-zero error.
> 3.9 was the first kernel to support CCK rates in minstrel_ht as
> fallback (in case the link gets very bad). Not sure if that triggers
> anything weird in brcmsmac.
>
>> Debugging the issue shows we end up with a rate with MCS index 110,
>> which is, well, impossible.
> Did you verify that it comes directly from minstrel_ht, or does it show
> up somewhere further down the chain in brcmsmac?
>
>> As brcmsmac gets the rate info from
>> minstrel_ht, I was wondering if we have an integration issue here. I saw
>> patches for a new API around April, which may have been in the 3.9 time
>> frame, and perhaps something subtly changed things for brcmsmac.
> The new rate API was added in 3.10, not 3.9. It did add a bug that caused
> bogus MCS rates. I've sent a patch for this a while back (shortly
> before 3.10 was released), but it was too late to make it into the
> release. I guess we have to wait for it to be applied through stable -
> no idea why that hasn't happened yet.

Reportedly the problem still exists in 3.10.6 and 3.11-rc4, so I started
digging some more. Can you have a look at the rate table below, which
we set up in the wiphy structure:

static struct ieee80211_rate legacy_ratetable[] = {
	RATE(10, 0),
	RATE(20, IEEE80211_RATE_SHORT_PREAMBLE),
	RATE(55, IEEE80211_RATE_SHORT_PREAMBLE),
	RATE(110, IEEE80211_RATE_SHORT_PREAMBLE),
	RATE(60, 0),
	RATE(90, 0),
	RATE(120, 0),
	RATE(180, 0),
	RATE(240, 0),
	RATE(360, 0),
	RATE(480, 0),
	RATE(540, 0),
};

where RATE() is defined as:

#define RATE(rate100m, _flags) { \
	.bitrate = (rate100m), \
	.flags = (_flags), \
	.hw_value = (rate100m / 5), \
}

Do you see anything obviously wrong here from minstrel_ht perspective?
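
As a reading aid for the RATE() table quoted above, the following stand-alone sketch expands the same macro against a local struct so the resulting values can be inspected in user space. The struct, the flag value and the ex_/EX_ names are hypothetical stand-ins for struct ieee80211_rate and IEEE80211_RATE_SHORT_PREAMBLE; only the table entries and the divide-by-5 come from the mail.

```c
/* Hypothetical stand-in for the struct ieee80211_rate fields used here. */
struct ex_rate_entry {
	unsigned short bitrate;  /* units of 100 Kbps */
	unsigned int flags;
	unsigned short hw_value; /* bitrate / 5, i.e. units of 500 Kbps */
};

#define EX_SHORT_PREAMBLE 0x1u /* placeholder flag value */

#define EX_RATE(rate100k, _flags) { \
	.bitrate = (rate100k), \
	.flags = (_flags), \
	.hw_value = (rate100k) / 5, \
}

const struct ex_rate_entry ex_table[] = {
	EX_RATE(10, 0),
	EX_RATE(20, EX_SHORT_PREAMBLE),
	EX_RATE(55, EX_SHORT_PREAMBLE),
	EX_RATE(110, EX_SHORT_PREAMBLE),
	EX_RATE(60, 0),
	EX_RATE(90, 0),
	EX_RATE(120, 0),
	EX_RATE(180, 0),
	EX_RATE(240, 0),
	EX_RATE(360, 0),
	EX_RATE(480, 0),
	EX_RATE(540, 0),
};

/* Convert an entry's hw_value back to Kbps. */
unsigned int ex_hw_value_kbps(const struct ex_rate_entry *r)
{
	return r->hw_value * 500u;
}
```

With this expansion the first entry (1 Mbps) has hw_value 2, the fourth entry (11 Mbps) has bitrate 110 and hw_value 22, and the last entry (54 Mbps) has hw_value 108.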

Regards,
Arend
