From: Larry Finger <[email protected]>
Add a sanity check in rate_control_pid_adjust_rate(). Thanks to Larry Finger
for suggesting this and reporting a related bug.
Signed-off-by: Stefano Brivio <[email protected]>
NOT-Signed-off-by: Larry Finger <[email protected]>
---
Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
===================================================================
--- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
+++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
@@ -128,6 +128,11 @@ static void rate_control_pid_adjust_rate
}
newidx += back;
+
+ if (newidx < 0 || newidx >= sband->n_bitrates) {
+ WARN_ON(1);
+ break;
+ }
}
#ifdef CONFIG_MAC80211_DEBUGFS
--
Ciao
Stefano
Stefano Brivio wrote:
> From: Larry Finger <[email protected]>
>
> Add a sanity check in rate_control_pid_adjust_rate(). Thanks to Larry Finger
> for suggesting this and reporting a related bug.
>
> Signed-off-by: Stefano Brivio <[email protected]>
> NOT-Signed-off-by: Larry Finger <[email protected]>
Signed-off-by: Larry Finger <[email protected]>
On Sat, 26 Jan 2008 19:40:14 +0000 (UTC)
Lars <[email protected]> wrote:
> Hi,
>
> I have spent some time debugging this problem from a different angle.
>
> If you use a 'b/g' STA connecting to a 'b' AP you will run into this loop in a
> few seconds. What also happens is that the STA will try using 'g' speeds. That
> was the way I located the problem.
Thank you. I couldn't really figure out how to reproduce the bug.
> In other words, I do not think a WARN_ON is good solution.
Please could you try my 1/2 patch? In the meanwhile, I'll try to use your
information for more debugging.
--
Ciao
Stefano
So I tried to switch my AP to 802.11b only mode while associated:
Jan 27 14:00:43 morte [155681.801606] wlan0: RX deauthentication from 00:14:c1:35:8d:eb (reason=7)
Jan 27 14:00:43 morte [155681.801617] wlan0: deauthenticated
Jan 27 14:00:43 morte [155681.802802] wlan0: RX deauthentication from 00:14:c1:35:8d:eb (reason=7)
Jan 27 14:00:43 morte [155681.804121] wlan0: RX deauthentication from 00:14:c1:35:8d:eb (reason=7)
Jan 27 14:00:44 morte [155682.802888] wlan0: authenticate with AP 00:14:c1:35:8d:eb
Jan 27 14:00:44 morte [155682.805754] wlan0: RX authentication from 00:14:c1:35:8d:eb (alg=0 transaction=2 status=0)
Jan 27 14:00:44 morte [155682.805765] wlan0: authenticated
Jan 27 14:00:44 morte [155682.805773] wlan0: associate with AP 00:14:c1:35:8d:eb
Jan 27 14:00:44 morte [155682.807605] wlan0: RX ReassocResp from 00:14:c1:35:8d:eb (capab=0x1 status=0 aid=2)
Jan 27 14:00:44 morte [155682.807616] wlan0: associated
Jan 27 14:00:44 morte [155682.807629] wlan0: CTS protection enabled (BSSID=00:14:c1:35:8d:eb)
Jan 27 14:00:44 morte [155682.807638] wlan0: switched to long barker preamble (BSSID=00:14:c1:35:8d:eb)
Jan 27 14:00:44 morte [155682.836860] phy1: HW CONFIG: freq=2412
Jan 27 14:00:44 morte [155682.905834] phy1: HW CONFIG: freq=2417
Jan 27 14:00:45 morte [155682.974769] phy1: HW CONFIG: freq=2422
Jan 27 14:00:45 morte [155683.044555] phy1: HW CONFIG: freq=2427
Jan 27 14:00:45 morte [155683.112670] phy1: HW CONFIG: freq=2432
Jan 27 14:00:45 morte [155683.185615] phy1: HW CONFIG: freq=2437
Jan 27 14:00:45 morte [155683.254569] phy1: HW CONFIG: freq=2442
Jan 27 14:00:45 morte [155683.323524] phy1: HW CONFIG: freq=2447
Jan 27 14:00:45 morte [155683.393472] phy1: HW CONFIG: freq=2452
Jan 27 14:00:45 morte [155683.463473] phy1: HW CONFIG: freq=2457
Jan 27 14:00:45 morte [155683.532372] phy1: HW CONFIG: freq=2462
Jan 27 14:00:45 morte [155683.601635] phy1: HW CONFIG: freq=2462
Jan 27 14:01:42 morte [155740.006762] phy1: HW CONFIG: freq=2462
Jan 27 14:01:42 morte [155740.006803] wlan0: Initial auth_alg=0
Jan 27 14:01:42 morte [155740.006817] wlan0: authenticate with AP 00:14:c1:35:8d:eb
Jan 27 14:01:42 morte [155740.009325] wlan0: RX authentication from 00:14:c1:35:8d:eb (alg=0 transaction=2 status=0)
Jan 27 14:01:42 morte [155740.009336] wlan0: authenticated
Jan 27 14:01:42 morte [155740.009343] wlan0: associate with AP 00:14:c1:35:8d:eb
Jan 27 14:01:42 morte [155740.020796] wlan0: RX ReassocResp from 00:14:c1:35:8d:eb (capab=0x1 status=0 aid=2)
Jan 27 14:01:42 morte [155740.020806] wlan0: associated
Everything fine so far. Then, I wanted to unload mac80211, so I unloaded b43:
Jan 27 14:01:52 morte [155750.360286] b43-phy1 debug: Removing Interface type 2
Jan 27 14:01:52 morte [155750.412133] b43-phy1 debug: Wireless interface stopped
Jan 27 14:01:52 morte [155750.412331] b43-phy1 debug: DMA-32 0x0200 (RX) max used slots: 2/64
Jan 27 14:01:52 morte [155750.412462] b43-phy1 debug: DMA-32 0x02A0 (TX) max used slots: 0/128
Jan 27 14:01:52 morte [155750.414389] b43-phy1 debug: DMA-32 0x0280 (TX) max used slots: 0/128
Jan 27 14:01:52 morte [155750.416248] b43-phy1 debug: DMA-32 0x0260 (TX) max used slots: 0/128
Jan 27 14:01:52 morte [155750.418243] b43-phy1 debug: DMA-32 0x0240 (TX) max used slots: 0/128
Jan 27 14:01:52 morte [155750.420245] b43-phy1 debug: DMA-32 0x0220 (TX) max used slots: 128/128
Jan 27 14:01:52 morte [155750.422242] b43-phy1 debug: DMA-32 0x0200 (TX) max used slots: 0/128
Still fine. Then, I unloaded mac80211, but:
Jan 27 14:01:52 morte [155750.497798] phy1: Removed STA 00:14:c1:35:8d:eb
Jan 27 14:02:02 morte [155760.685067] unregister_netdevice: waiting for wlan0 to become free. Usage count = 1
Jan 27 14:02:12 morte [155770.782896] unregister_netdevice: waiting for wlan0 to become free. Usage count = 1
Jan 27 14:02:22 morte [155780.872092] unregister_netdevice: waiting for wlan0 to become free. Usage count = 1
Jan 27 14:02:33 morte [155790.960683] unregister_netdevice: waiting for wlan0 to become free. Usage count = 1
And even rmmod -f couldn't do that. Johannes, any clue?
--
Ciao
Stefano
On Saturday 26 January 2008 04:05:34 Stefano Brivio wrote:
> From: Larry Finger <[email protected]>
>
> Add a sanity check in rate_control_pid_adjust_rate(). Thanks to Larry Finger
> for suggesting this and reporting a related bug.
>
> Signed-off-by: Stefano Brivio <[email protected]>
> NOT-Signed-off-by: Larry Finger <[email protected]>
> ---
> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
> ===================================================================
> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
> @@ -128,6 +128,11 @@ static void rate_control_pid_adjust_rate
> }
>
> newidx += back;
> +
> + if (newidx < 0 || newidx >= sband->n_bitrates) {
> + WARN_ON(1);
> + break;
> + }
> }
Just you know, you can also do this kind of check this way:
if (WARN_ON(newidx < 0 || newidx >= sband->n_bitrates))
break;
This way you have an implicite unlikely() and less lines. :)
But you don't need to respin the patch. I'm also OK with the above.
This is just informational. ;)
--
Greetings Michael.
Stefano Brivio wrote:
> From: Larry Finger <[email protected]>
>
> Add a sanity check in rate_control_pid_adjust_rate(). Thanks to Larry Finger
> for suggesting this and reporting a related bug.
>
> Signed-off-by: Stefano Brivio <[email protected]>
> NOT-Signed-off-by: Larry Finger <[email protected]>
> ---
> Index: wireless-2.6/net/mac80211/rc80211_pid_algo.c
> ===================================================================
> --- wireless-2.6.orig/net/mac80211/rc80211_pid_algo.c
> +++ wireless-2.6/net/mac80211/rc80211_pid_algo.c
> @@ -128,6 +128,11 @@ static void rate_control_pid_adjust_rate
> }
>
> newidx += back;
> +
> + if (newidx < 0 || newidx >= sband->n_bitrates) {
> + WARN_ON(1);
> + break;
> + }
> }
>
> #ifdef CONFIG_MAC80211_DEBUGFS
The bug is triggered when the following conditions are true: (1) an 802.11g device is being used on
an 802.11b network, (2) "newidx" from rate_control_pid_shift_adjust is higher than the index of
rates used by the AP, and (3) the value of "back" is 1. The loop keeps increasing "newidx" but
sta->supp_rates & BIT(newidx) can never be true.
Although the real cause of the bug is due to selecting a trial value for the new rate that can never
be satisfied, I think that this sanity check is a reasonable way to detect and stop the runaway. As
this condition is likely to affect a lot of systems, I think the WARN_ON(1) should be removed from
the final version of the patch. With it in, a lot of logs will get spammed.
Larry
On Sun, 27 Jan 2008 09:56:00 -0700
Larry Finger <[email protected]> wrote:
> Although the real cause of the bug is due to selecting a trial value for the new rate that can never
> be satisfied, I think that this sanity check is a reasonable way to detect and stop the runaway. As
> this condition is likely to affect a lot of systems, I think the WARN_ON(1) should be removed from
> the final version of the patch. With it in, a lot of logs will get spammed.
Agreed. I'm going to post a quite different patch which aims to solve the
root cause. John, please disregard this one.
--
Ciao
Stefano
Sorry, just disregard this report, Michael and others already reported that.
--
Ciao
Stefano
Hi,
I have spent some time debugging this problem from a different angle.
If you use a 'b/g' STA connecting to a 'b' AP you will run into this loop in a
few seconds. What also happens is that the STA will try using 'g' speeds. That
was the way I located the problem.
In other words, I do not think a WARN_ON is good solution.
Regards
Lars
Hi Stefano,
What happens is that the while (newidx != sta->txrate) will continue
until the newidx reaches 33. Reason is that BIT(33) is the same as BIT(1)
In the rate_supported() function.
New sta->txrate will be 33 which is far beyond the valid size (12 in my
case).
In my system that happens to be a 11g speed :o).
I have a trace if you are intresting. Its about 260K data but it show
How it happens. I think you need trafik (ping) in order for the relulator
to operate.
Regards
Lars
-----Original Message-----
From: Stefano Brivio [mailto:[email protected]]
Sent: den 27 januari 2008 13:43
To: Lars
Cc: [email protected]
Subject: Re: [PATCH 2/2] rc80211-pid: add sanity check
On Sat, 26 Jan 2008 19:40:14 +0000 (UTC) Lars <[email protected]>
wrote:
> Hi,
>
> I have spent some time debugging this problem from a different angle.
>
> If you use a 'b/g' STA connecting to a 'b' AP you will run into this
> loop in a few seconds. What also happens is that the STA will try
> using 'g' speeds. That was the way I located the problem.
Thank you. I couldn't really figure out how to reproduce the bug.
> In other words, I do not think a WARN_ON is good solution.
Please could you try my 1/2 patch? In the meanwhile, I'll try to use your
information for more debugging.
--
Ciao
Stefano