Hi,
Firstly, please cc me on any replies - I'm not subscribed.
In 2.6.25 kernels, my wireless LAN dies after even the smallest amount
of network activity. The following screen cut shows what I typically
see:
[chris:~]$ uname -a
Linux laptop 2.6.25-rc2 #10 PREEMPT Sat Feb 16 09:53:04 UTC 2008 i686
GNU/Linux
[chris:~]$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) from 192.168.1.30 : 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=0 ttl=255 time=9.837 msec
64 bytes from 192.168.1.1: icmp_seq=1 ttl=255 time=3.148 msec
64 bytes from 192.168.1.1: icmp_seq=2 ttl=255 time=2.205 msec
^C
--- 192.168.1.1 ping statistics ---
9 packets transmitted, 3 packets received, 66% packet loss
round-trip min/avg/max/mdev = 2.205/5.063/9.837/3.397 ms
[chris:~]$ dmesg | tail
NET: Unregistered protocol family 23
parport_pc 00:0f: disabled
ACPI: PCI interrupt for device 0000:02:00.0 disabled
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 11 (level, low) ->
IRQ 11
[drm] Initialized radeon 1.28.0 20060524 on minor 0
agpgart: Found an AGP 2.0 compliant device at 0000:00:00.0.
agpgart: Putting AGP V2 device at 0000:00:00.0 into 2x mode
agpgart: Putting AGP V2 device at 0000:01:00.0 into 2x mode
[drm] Setting GART location based on old memory map
[drm] writeback test succeeded in 1 usecs
[chris:~]$
As you can see, after a few packets, the ping application hangs and
after that point, all network accesses fail. There are no error
messages written to the logs when the network dies. I can restart the
network by simply unloading and reloading the driver. This hang does
not occur with a different wireless card that uses the rtl8180 driver.
My config and the output from dmesg (after reloading the driver) are
attached. I have Ralink debugfs enabled so can provide any additional
diagnostics that may be helpful from that source.
(I should perhaps add that I am sure that this is not the same problem
I reported through bugzilla (bug 9860) a few weeks ago, which it
seems was a problem with my AP/router. Since resetting my AP/Router, I
can reboot to 2.6.24.2 and the wireless network works reliably with
the same hardware.)
The hardware is a Belkin F5D7010 802.11g Notebook card communicating
with a Draytek Vigor 2600 802.11b combined AP/Router/Broadband Modem.
I will be more than happy to provide additional diagnostics, but
please bear in mind that I am not a git user, so cannot do bisects. I
am, however, perfectly capable of applying or reverting patches,
rebuilding and re-testing, so I am quite happy to do that.
Thanks in advance.
Chris
Hi,
> In 2.6.25 kernels, my wireless LAN dies after even the smallest amount
> of network activity. The following screen cut shows what I typically
> see:
How complete is this failure? Just TX or also RX?
Could you use the tools found here:
http://www-user.rhrk.uni-kl.de/~nissler/rt2x00/index.html
and capture all TX/RX frames going through the hardware?
Note that after the failure, this dumping facility should still report
any ping request you might send to the interface.
> [chris:~]$ uname -a
> Linux laptop 2.6.25-rc2 #10 PREEMPT Sat Feb 16 09:53:04 UTC 2008 i686
> GNU/Linux
> [chris:~]$ ping 192.168.1.1
> PING 192.168.1.1 (192.168.1.1) from 192.168.1.30 : 56(84) bytes of data.
> 64 bytes from 192.168.1.1: icmp_seq=0 ttl=255 time=9.837 msec
> 64 bytes from 192.168.1.1: icmp_seq=1 ttl=255 time=3.148 msec
> 64 bytes from 192.168.1.1: icmp_seq=2 ttl=255 time=2.205 msec
I have a series of tests I would like to request from you,
you mentioned you already enabled debugfs, and that is just what we need. ;)
Please use attached script to create dumps of the hardware register contents.
There are specific moments that should be dumped:
- kernel 2.6.24 (last known working version for you).
- kernel 2.6.25-rc2 (after ifup, before TX dies)
- kernel 2.6.25-rc2 (after ifup, after TX dies)
> I will be more than happy to provide additional diagnostics, but
> please bear in mind that I am not a git user, so cannot do bisects. I
> am, however, perfectly capable of applying or reverting patches,
> rebuilding and re-testing, so I am quite happy to do that.
Above traces should be enough, but to determine approximately where rt2x00 broke
down I need a few test results at specific moments.
Could you test the kernel with the following versions:
rt2x00 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
rt2x00 2.0.12 a3c7aa58df7df80aa05f166fe3e42482247164cf
rt2x00 2.0.13 5a6012e105ae1664cd2841c33bf59fbdd8d4dbcc
Checking those out is simply a matter of:
git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
git checkout 2.0.11
No further bisecting is needed, but with above tests I can at least
narrow it down to find the cause of this issue.
Thanks.
Ivo
Almost forgot:
> > In 2.6.25 kernels, my wireless LAN dies after even the smallest amount
> > of network activity. The following screen cut shows what I typically
> > see:
>
> How complete is this failure? Just TX or also RX?
>
> Could you use the tools found here:
> http://www-user.rhrk.uni-kl.de/~nissler/rt2x00/index.html
>
> and capture all TX/RX frames going through the hardware?
> Note that after the failure, this dumping facility should still report
> any ping request you might send to the interface.
Note that this feature is only present in 2.6.25.
> I have a series of tests I would like to request from you,
> you mentioned you already enabled debugfs, and that is just what we need. ;)
> Please use attached script to create dumps of the hardware register contents.
The debugfs register files were moved into a separate folder somewhere between
2.6.24 and 2.6.25. This means you might have to edit the file slightly to make it
point to the correct location of the chipset file.
Ivo
Hi Ivo,
On Monday 18 February 2008, Ivo van Doorn wrote:
> Hi,
>
> > In 2.6.25 kernels, my wireless LAN dies after even the smallest amount
> > of network activity. The following screen cut shows what I typically
> > see:
>
> How complete is this failure? Just TX or also RX?
>
> Could you use the tools found here:
> http://www-user.rhrk.uni-kl.de/~nissler/rt2x00/index.html
>
> and capture all TX/RX frames going through the hardware?
> Note that after the failure, this dumping facility should still report
> any ping request you might send to the interface.
>
It will be tomorrow before I can provide this because I'm struggling to get
wireshark to build against the old (2.4.x) kernel headers that match glibc on
my laptop. I'll build it on my desktop, which has more recent headers, and
decode the frame dump there. But I need sleep now :)
> > [chris:~]$ uname -a
> > Linux laptop 2.6.25-rc2 #10 PREEMPT Sat Feb 16 09:53:04 UTC 2008 i686
> > GNU/Linux
> > [chris:~]$ ping 192.168.1.1
> > PING 192.168.1.1 (192.168.1.1) from 192.168.1.30 : 56(84) bytes of data.
> > 64 bytes from 192.168.1.1: icmp_seq=0 ttl=255 time=9.837 msec
> > 64 bytes from 192.168.1.1: icmp_seq=1 ttl=255 time=3.148 msec
> > 64 bytes from 192.168.1.1: icmp_seq=2 ttl=255 time=2.205 msec
>
> I have a series of tests I would like to request from you,
> you mentioned you already enabled debugfs, and that is just what we need. ;)
> Please use attached script to create dumps of the hardware register contents.
>
> There are specific moments that should be dumped:
> - kernel 2.6.24 (last known working version for you).
> - kernel 2.6.25-rc2 (after ifup, before TX dies)
> - kernel 2.6.25-rc2 (after ifup, after TX dies)
>
These diagnostics are attached, with obvious filenames.
> > I will be more than happy to provide additional diagnostics, but
> > please bear in mind that I am not a git user, so cannot do bisects. I
> > am, however, perfectly capable of applying or reverting patches,
> > rebuilding and re-testing, so I am quite happy to do that.
>
> Above traces should be enough, but to determine approximately where rt2x00 broke
> down I need a few test results at specific moments.
> Could you test the kernel with the following versions:
>
> rt2x00 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> rt2x00 2.0.12 a3c7aa58df7df80aa05f166fe3e42482247164cf
> rt2x00 2.0.13 5a6012e105ae1664cd2841c33bf59fbdd8d4dbcc
>
> Checking those out is simply a matter of:
> git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> git checkout 2.0.11
>
> No further bisecting is needed, but with above tests I can at least
> narrow it down to find the cause of this issue.
>
> Thanks.
>
> Ivo
>
--
Beauty is in the eye of the beerholder.
Hi,
> > I have a series of tests I would like to request from you,
> > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > Please use attached script to create dumps of the hardware register contents.
> >
> > There are specific moments that should be dumped:
> > - kernel 2.6.24 (last known working version for you).
> > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> >
>
> These diagnostics are attached, with obvious filenames.
Thanks. I think I found something, please test below patch:
---
diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 015738a..8df1991 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -249,10 +249,10 @@ static void rt2x00lib_evaluate_antenna(struct rt2x00_dev *rt2x00dev)
rt2x00dev->link.ant.flags &= ~ANTENNA_TX_DIVERSITY;
if (rt2x00dev->hw->conf.antenna_sel_rx == 0 &&
- rt2x00dev->default_ant.rx != ANTENNA_SW_DIVERSITY)
+ rt2x00dev->default_ant.rx == ANTENNA_SW_DIVERSITY)
rt2x00dev->link.ant.flags |= ANTENNA_RX_DIVERSITY;
if (rt2x00dev->hw->conf.antenna_sel_tx == 0 &&
- rt2x00dev->default_ant.tx != ANTENNA_SW_DIVERSITY)
+ rt2x00dev->default_ant.tx == ANTENNA_SW_DIVERSITY)
rt2x00dev->link.ant.flags |= ANTENNA_TX_DIVERSITY;
if (!(rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) &&
Hi,
On Tuesday 19 February 2008, Ivo van Doorn wrote:
> Hi,
>
> > > I have a series of tests I would like to request from you,
> > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > Please use attached script to create dumps of the hardware register contents.
> > >
> > > There are specific moments that should be dumped:
> > > - kernel 2.6.24 (last known working version for you).
> > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > >
> >
> > These diagnostics are attached, with obvious filenames.
>
> Thanks. I think I found something, please test below patch:
>
I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
The frame dump diagnostics you asked for are attached. This is a fresh dump taken
tonight running the driver with your patch applied.
Chris
Hi,
[added rt2400-devel (rt2x00 development mailinglist) to the CC list.]
> > > > I have a series of tests I would like to request from you,
> > > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > > Please use attached script to create dumps of the hardware register contents.
> > > >
> > > > There are specific moments that should be dumped:
> > > > - kernel 2.6.24 (last known working version for you).
> > > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > > >
> > >
> > > These diagnostics are attached, with obvious filenames.
> >
> > Thanks. I think I found something, please test below patch:
> >
>
> I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
Could you use below patch instead, and make a new dump of the register?
I'm still convinced the breakage occurs in the antenna diversity (or rather, I believe
it attempts a software diversity for your card while in fact it shouldn't).
> The frame dump diagnostics you asked for are attached. This is a fresh dump taken
> tonight running the driver with your patch applied.
Thanks, I think some information is missing from that dump,
but that is okay for now.
Ivo
---
diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 015738a..65a512f 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -223,7 +223,7 @@ static void rt2x00lib_evaluate_antenna_eval(struct rt2x00_dev *rt2x00dev)
* sample the rssi from the other antenna to make a valid
* comparison between the 2 antennas.
*/
- if ((rssi_curr - rssi_old) > -5 || (rssi_curr - rssi_old) < 5)
+ if (abs(rssi_curr - rssi_old) < 5)
return;
rt2x00dev->link.ant.flags |= ANTENNA_MODE_SAMPLE;
@@ -249,10 +249,10 @@ static void rt2x00lib_evaluate_antenna(struct rt2x00_dev *rt2x00dev)
rt2x00dev->link.ant.flags &= ~ANTENNA_TX_DIVERSITY;
if (rt2x00dev->hw->conf.antenna_sel_rx == 0 &&
- rt2x00dev->default_ant.rx != ANTENNA_SW_DIVERSITY)
+ rt2x00dev->default_ant.rx == ANTENNA_SW_DIVERSITY)
rt2x00dev->link.ant.flags |= ANTENNA_RX_DIVERSITY;
if (rt2x00dev->hw->conf.antenna_sel_tx == 0 &&
- rt2x00dev->default_ant.tx != ANTENNA_SW_DIVERSITY)
+ rt2x00dev->default_ant.tx == ANTENNA_SW_DIVERSITY)
rt2x00dev->link.ant.flags |= ANTENNA_TX_DIVERSITY;
if (!(rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) &&
Hi,
On Tuesday 19 February 2008, Ivo van Doorn wrote:
> Hi,
>
[...]
> >
> > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
>
> Could you use below patch instead, and make a new dump of the register?
> I'm still convinced the breakage occurs in the antenna diversity (or rather, I believe
> it attempts a software diversity for your card while in fact it shouldn't).
>
Sorry, I've applied that patch and the LAN still dies after a few pings. BTW,
this and the earlier patch both apply without error, but give warnings of 70
line offsets. Were you expecting them to apply completely cleanly? I'm just
wondering if there might be some code that you are expecting to be running (or
not running) that is (or is not) present in the driver at 2.6.25-rc2.
The register dumps before and after are attached.
Thanks,
Chris
> > The frame dump diagnostics you asked for are attached. This is a fresh dump taken
> > tonight running the driver with your patch applied.
>
> Thanks, I think some information is missing from that dump,
> but that is okay for now.
>
> Ivo
Hi,
> > > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
> >
> > Could you use below patch instead, and make a new dump of the register?
> > I'm still convinced the breakage occurs in the antenna diversity (or rather, I believe
> > it attempts a software diversity for your card while in fact it shouldn't).
> >
>
> Sorry, I've applied that patch and the LAN still dies after a few pings. BTW,
> this and the earlier patch both apply without error, but give warnings of 70
> line offsets. Were you expecting them to apply completely cleanly? I'm just
> wondering if there might be some code that you are expecting to be running (or
> not running) that is (or is not) present in the driver at 2.6.25-rc2.
Well, to be honest, I based the patch on rt2x00.git and not 2.6.25-rc2.
I knew the patch would apply safely because the functions that were changed
in that patch haven't changed between them. But some other functions were
moved, so that offset is expected. ;)
> The register dumps before and after are attached.
Thanks. I hope to have a new patch ready soon.
Ivo
On Tue, 2008-02-19 at 20:46 +0100, Ivo van Doorn wrote:
> Hi,
>
> [added rt2400-devel (rt2x00 development mailinglist) to the CC list.]
>
> > > > > I have a series of tests I would like to request from you,
> > > > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > > > Please use attached script to create dumps of the hardware register contents.
> > > > >
> > > > > There are specific moments that should be dumped:
> > > > > - kernel 2.6.24 (last known working version for you).
> > > > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > > > >
> > > >
> > > > These diagnostics are attached, with obvious filenames.
> > >
> > > Thanks. I think I found something, please test below patch:
> > >
> >
> > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
rt2x00 2.0.14 is broken with my rt73 stick in the vanilla 2.6.25-rc2
kernel (not wireless-2.6/rt2x00 git). The modules load when I plug the
stick in but I then get a complete kernel lock up with two flashing
leds. Nothing is recorded to system logs. The last logged messages are
that usbcore has registered new interface driver rt73usb, and that the
rate control algorithm has been selected on phy0. This happens whether
the simple or pid mac80211 rate control algorithms have been chosen.
This is a shame because 2.0.14 was working really well for me until the
mac80211 changes 2 or 3 weeks ago broke it. (Shortly followed by the
release of 2.1.*).
Chris
On Tue, 2008-02-19 at 23:04 +0000, Chris Vine wrote:
> On Tue, 2008-02-19 at 20:46 +0100, Ivo van Doorn wrote:
> > Hi,
> >
> > [added rt2400-devel (rt2x00 development mailinglist) to the CC list.]
> >
> > > > > > I have a series of tests I would like to request from you,
> > > > > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > > > > Please use attached script to create dumps of the hardware register contents.
> > > > > >
> > > > > > There are specific moments that should be dumped:
> > > > > > - kernel 2.6.24 (last known working version for you).
> > > > > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > > > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > > > > >
> > > > >
> > > > > These diagnostics are attached, with obvious filenames.
> > > >
> > > > Thanks. I think I found something, please test below patch:
> > > >
> > >
> > > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
>
> rt2x00 2.0.14 is broken with my rt73 stick in the vanilla 2.6.25-rc2
> kernel (not wireless-2.6/rt2x00 git). The modules load when I plug the
> stick in but I then get a complete kernel lock up with two flashing
> leds. Nothing is recorded to system logs. The last logged messages are
> that usbcore has registered new interface driver rt73usb, and that the
> rate control algorithm has been selected on phy0. This happens whether
> the simple or pid mac80211 rate control algorithms have been chosen.
>
> This is a shame because 2.0.14 was working really well for me until the
> mac80211 changes 2 or 3 weeks ago broke it. (Shortly followed by the
> release of 2.1.*).
Switch to a VT with Ctrl+Alt+F1, then plug the stick in, and take a
picture of the panic if one shows up. _Something_ should show up on the
VT.
Dan
On Wed, 2008-02-20 at 11:05 -0500, Dan Williams wrote:
> On Tue, 2008-02-19 at 23:04 +0000, Chris Vine wrote:
> > On Tue, 2008-02-19 at 20:46 +0100, Ivo van Doorn wrote:
> > > Hi,
> > >
> > > [added rt2400-devel (rt2x00 development mailinglist) to the CC list.]
> > >
> > > > > > > I have a series of tests I would like to request from you,
> > > > > > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > > > > > Please use attached script to create dumps of the hardware register contents.
> > > > > > >
> > > > > > > There are specific moments that should be dumped:
> > > > > > > - kernel 2.6.24 (last known working version for you).
> > > > > > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > > > > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > > > > > >
> > > > > >
> > > > > > These diagnostics are attached, with obvious filenames.
> > > > >
> > > > > Thanks. I think I found something, please test below patch:
> > > > >
> > > >
> > > > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
> >
> > rt2x00 2.0.14 is broken with my rt73 stick in the vanilla 2.6.25-rc2
> > kernel (not wireless-2.6/rt2x00 git). The modules load when I plug the
> > stick in but I then get a complete kernel lock up with two flashing
> > leds. Nothing is recorded to system logs. The last logged messages are
> > that usbcore has registered new interface driver rt73usb, and that the
> > rate control algorithm has been selected on phy0. This happens whether
> > the simple or pid mac80211 rate control algorithms have been chosen.
> >
> > This is a shame because 2.0.14 was working really well for me until the
> > mac80211 changes 2 or 3 weeks ago broke it. (Shortly followed by the
> > release of 2.1.*).
>
> Switch to a VT with Ctrl+Alt+F1, then plug the stick in, and take a
> picture of the panic if one shows up. _Something_ should show up on the
> VT.
I did that yesterday and it just reported a kernel panic on the terminal
with the message:
Kernel panic - not syncing: Aiee, killing interrupt handler!
There is a complete lock up. Even the two leds don't send a dump in
morse code (if that is still a feature of the 2.6 kernels). They just
flash together at 1 second intervals.
However, I do not have debugging enabled on 2.6.25-rc2 (I was just
interested to see how it worked). If it is thought to be useful I can
recompile the kernel with debugging enabled, but this should be
reproducible by anyone with a rt73 stick.
By way of a further data point, I can scan OK using 2.6.25-rc2 and it
will report all the available access points in my area. But as soon as
association is attempted, it blows up.
Chris
On Wednesday 20 February 2008, Chris Vine wrote:
>
> On Wed, 2008-02-20 at 11:05 -0500, Dan Williams wrote:
> > On Tue, 2008-02-19 at 23:04 +0000, Chris Vine wrote:
> > > On Tue, 2008-02-19 at 20:46 +0100, Ivo van Doorn wrote:
> > > > Hi,
> > > >
> > > > [added rt2400-devel (rt2x00 development mailinglist) to the CC list.]
> > > >
> > > > > > > > I have a series of tests I would like to request from you,
> > > > > > > > you mentioned you already enabled debugfs, and that is just what we need. ;)
> > > > > > > > Please use attached script to create dumps of the hardware register contents.
> > > > > > > >
> > > > > > > > There are specific moments that should be dumped:
> > > > > > > > - kernel 2.6.24 (last known working version for you).
> > > > > > > > - kernel 2.6.25-rc2 (after ifup, before TX dies)
> > > > > > > > - kernel 2.6.25-rc2 (after ifup, after TX dies)
> > > > > > > >
> > > > > > >
> > > > > > > These diagnostics are attached, with obvious filenames.
> > > > > >
> > > > > > Thanks. I think I found something, please test below patch:
> > > > > >
> > > > >
> > > > > I've tried the patch but, unfortunately, my wireless LAN still dies after a few pings.
> > >
> > > rt2x00 2.0.14 is broken with my rt73 stick in the vanilla 2.6.25-rc2
> > > kernel (not wireless-2.6/rt2x00 git). The modules load when I plug the
> > > stick in but I then get a complete kernel lock up with two flashing
> > > leds. Nothing is recorded to system logs. The last logged messages are
> > > that usbcore has registered new interface driver rt73usb, and that the
> > > rate control algorithm has been selected on phy0. This happens whether
> > > the simple or pid mac80211 rate control algorithms have been chosen.
> > >
> > > This is a shame because 2.0.14 was working really well for me until the
> > > mac80211 changes 2 or 3 weeks ago broke it. (Shortly followed by the
> > > release of 2.1.*).
> >
> > Switch to a VT with Ctrl+Alt+F1, then plug the stick in, and take a
> > picture of the panic if one shows up. _Something_ should show up on the
> > VT.
>
> I did that yesterday and it just reported a kernel panic on the terminal
> with the message:
>
> Kernel panic - not syncing: Aiee, killing interrupt handler!
I have an idea, could you try below patch?
Note that while applying it will mention something about a line offset, but that can be ignored.
This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
quite sure about that.
---
diff --git a/drivers/net/wireless/rt2x00/rt2400pci.c b/drivers/net/wireless/rt2x00/rt2400pci.c
index b63bc66..460ef2f 100644
--- a/drivers/net/wireless/rt2x00/rt2400pci.c
+++ b/drivers/net/wireless/rt2x00/rt2400pci.c
@@ -953,8 +953,12 @@ static int rt2400pci_set_device_state(struct rt2x00_dev *rt2x00dev,
rt2400pci_disable_radio(rt2x00dev);
break;
case STATE_RADIO_RX_ON:
+ case STATE_RADIO_RX_ON_LINK:
+ rt2400pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ break;
case STATE_RADIO_RX_OFF:
- rt2400pci_toggle_rx(rt2x00dev, state);
+ case STATE_RADIO_RX_OFF_LINK:
+ rt2400pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
break;
case STATE_DEEP_SLEEP:
case STATE_SLEEP:
diff --git a/drivers/net/wireless/rt2x00/rt2500pci.c b/drivers/net/wireless/rt2x00/rt2500pci.c
index add8aff..ffcd996 100644
--- a/drivers/net/wireless/rt2x00/rt2500pci.c
+++ b/drivers/net/wireless/rt2x00/rt2500pci.c
@@ -1106,8 +1106,12 @@ static int rt2500pci_set_device_state(struct rt2x00_dev *rt2x00dev,
rt2500pci_disable_radio(rt2x00dev);
break;
case STATE_RADIO_RX_ON:
+ case STATE_RADIO_RX_ON_LINK:
+ rt2500pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ break;
case STATE_RADIO_RX_OFF:
- rt2500pci_toggle_rx(rt2x00dev, state);
+ case STATE_RADIO_RX_OFF_LINK:
+ rt2500pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
break;
case STATE_DEEP_SLEEP:
case STATE_SLEEP:
diff --git a/drivers/net/wireless/rt2x00/rt2500usb.c b/drivers/net/wireless/rt2x00/rt2500usb.c
index d9643c5..9f59db9 100644
--- a/drivers/net/wireless/rt2x00/rt2500usb.c
+++ b/drivers/net/wireless/rt2x00/rt2500usb.c
@@ -996,8 +996,12 @@ static int rt2500usb_set_device_state(struct rt2x00_dev *rt2x00dev,
rt2500usb_disable_radio(rt2x00dev);
break;
case STATE_RADIO_RX_ON:
+ case STATE_RADIO_RX_ON_LINK:
+ rt2500usb_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ break;
case STATE_RADIO_RX_OFF:
- rt2500usb_toggle_rx(rt2x00dev, state);
+ case STATE_RADIO_RX_OFF_LINK:
+ rt2500usb_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
break;
case STATE_DEEP_SLEEP:
case STATE_SLEEP:
diff --git a/drivers/net/wireless/rt2x00/rt2x00config.c b/drivers/net/wireless/rt2x00/rt2x00config.c
index 46888f9..a1d8e33 100644
--- a/drivers/net/wireless/rt2x00/rt2x00config.c
+++ b/drivers/net/wireless/rt2x00/rt2x00config.c
@@ -127,7 +127,7 @@ void rt2x00lib_config_antenna(struct rt2x00_dev *rt2x00dev,
* else the changes will be ignored by the device.
*/
if (test_bit(DEVICE_ENABLED_RADIO, &rt2x00dev->flags))
- rt2x00lib_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
+ rt2x00lib_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF_LINK);
/*
* Write new antenna setup to device and reset the link tuner.
@@ -141,7 +141,7 @@ void rt2x00lib_config_antenna(struct rt2x00_dev *rt2x00dev,
rt2x00dev->link.ant.active.tx = libconf.ant.tx;
if (test_bit(DEVICE_ENABLED_RADIO, &rt2x00dev->flags))
- rt2x00lib_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ rt2x00lib_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON_LINK);
}
void rt2x00lib_config(struct rt2x00_dev *rt2x00dev,
diff --git a/drivers/net/wireless/rt2x00/rt2x00reg.h b/drivers/net/wireless/rt2x00/rt2x00reg.h
index add1f09..0325bed 100644
--- a/drivers/net/wireless/rt2x00/rt2x00reg.h
+++ b/drivers/net/wireless/rt2x00/rt2x00reg.h
@@ -85,6 +85,8 @@ enum dev_state {
STATE_RADIO_OFF,
STATE_RADIO_RX_ON,
STATE_RADIO_RX_OFF,
+ STATE_RADIO_RX_ON_LINK,
+ STATE_RADIO_RX_OFF_LINK,
STATE_RADIO_IRQ_ON,
STATE_RADIO_IRQ_OFF,
};
diff --git a/drivers/net/wireless/rt2x00/rt61pci.c b/drivers/net/wireless/rt2x00/rt61pci.c
index ca83d94..091fe39 100644
--- a/drivers/net/wireless/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/rt2x00/rt61pci.c
@@ -1458,8 +1458,12 @@ static int rt61pci_set_device_state(struct rt2x00_dev *rt2x00dev,
rt61pci_disable_radio(rt2x00dev);
break;
case STATE_RADIO_RX_ON:
+ case STATE_RADIO_RX_ON_LINK:
+ rt61pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ break;
case STATE_RADIO_RX_OFF:
- rt61pci_toggle_rx(rt2x00dev, state);
+ case STATE_RADIO_RX_OFF_LINK:
+ rt61pci_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
break;
case STATE_DEEP_SLEEP:
case STATE_SLEEP:
diff --git a/drivers/net/wireless/rt2x00/rt73usb.c b/drivers/net/wireless/rt2x00/rt73usb.c
index 7d6ee97..6546b0d 100644
--- a/drivers/net/wireless/rt2x00/rt73usb.c
+++ b/drivers/net/wireless/rt2x00/rt73usb.c
@@ -1196,8 +1196,12 @@ static int rt73usb_set_device_state(struct rt2x00_dev *rt2x00dev,
rt73usb_disable_radio(rt2x00dev);
break;
case STATE_RADIO_RX_ON:
+ case STATE_RADIO_RX_ON_LINK:
+ rt73usb_toggle_rx(rt2x00dev, STATE_RADIO_RX_ON);
+ break;
case STATE_RADIO_RX_OFF:
- rt73usb_toggle_rx(rt2x00dev, state);
+ case STATE_RADIO_RX_OFF_LINK:
+ rt73usb_toggle_rx(rt2x00dev, STATE_RADIO_RX_OFF);
break;
case STATE_DEEP_SLEEP:
case STATE_SLEEP:
On Wed, 2008-02-20 at 21:50 +0100, Ivo van Doorn wrote:
[snip]
> On Wednesday 20 February 2008, Chris Vine wrote:
> > I did that yesterday and it just reported a kernel panic on the terminal
> > with the message:
> >
> > Kernel panic - not syncing: Aiee, killing interrupt handler!
>
> I have an idea, could you try below patch?
> Note that while applying it will mention something about a line offset, but that can be ignored.
>
> This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
> quite sure about that.
The patch applied OK (with some offsets as you say) but it doesn't help.
The kernel panic still occurs when association is attempted.
Chris
Hi Ivo,
[...]
>
> I have an idea, could you try below patch?
> Note that while applying it will mention something about a line offset, but that can be ignored.
>
> This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
> quite sure about that.
>
Sorry, but again a few pings and the network fails. I've attached the before and
after register dumps. This is with your patch applied against 2.6.25-rc2-git4.
Chris
On Wed, 2008-02-20 at 21:16 +0000, Chris Vine wrote:
> On Wed, 2008-02-20 at 21:50 +0100, Ivo van Doorn wrote:
> [snip]
> > On Wednesday 20 February 2008, Chris Vine wrote:
> > > I did that yesterday and it just reported a kernel panic on the terminal
> > > with the message:
> > >
> > > Kernel panic - not syncing: Aiee, killing interrupt handler!
> >
> > I have an idea, could you try below patch?
> > Note that while applying it will mention something about a line offset, but that can be ignored.
> >
> > This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
> > quite sure about that.
>
> The patch applied OK (with some offsets as you say) but it doesn't help.
> The kernel panic still occurs when association is attempted.
Here's some further information.
I have a fully functioning version of rt2x00-2.0.14 and mac80211 from
wireless-2.6/compat-wireless-2.6 of mid January which works fine on
kernel 2.6.24. On doing a comparison with the rt2x00 in vanilla kernel
2.6.25-rc2, there are no material differences. (There was a slight
change in the declaration of a variable in rt2x00usb.c but it is
immaterial.)
I compiled up the working mid-January version of rt2x00 and mac80211
under kernel 2.6.25-rc2 and I get exactly the same result as I reported
earlier, namely I get a kernel panic as soon as I try to associate. It
looks therefore as if something has changed within the remainder of the
kernel which has caused rt2x00 (and possibly mac80211?) to break.
This probably explains the problem another user reported with rt61.
Chris
On Thursday 21 February 2008, Chris Vine wrote:
> On Wed, 2008-02-20 at 21:16 +0000, Chris Vine wrote:
> > On Wed, 2008-02-20 at 21:50 +0100, Ivo van Doorn wrote:
> > [snip]
> > > On Wednesday 20 February 2008, Chris Vine wrote:
> > > > I did that yesterday and it just reported a kernel panic on the terminal
> > > > with the message:
> > > >
> > > > Kernel panic - not syncing: Aiee, killing interrupt handler!
> > >
> > > I have an idea, could you try below patch?
> > > Note that while applying it will mention something about a line offset, but that can be ignored.
> > >
> > > This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
> > > quite sure about that.
> >
> > The patch applied OK (with some offsets as you say) but it doesn't help.
> > The kernel panic still occurs when association is attempted.
>
> Here's some further information.
>
> I have a fully functioning version of rt2x00-2.0.14 and mac80211 from
> wireless-2.6/compat-wireless-2.6 of mid January which works fine on
> kernel 2.6.24. On doing a comparison with the rt2x00 in vanilla kernel
> 2.6.25-rc2, there are no material differences. (There was a slight
> change in the declaration a variable in rt2x00usb.c but it is
> immaterial.)
>
> I compiled up the working mid-January version of rt2x00 and mac80211
> under kernel 2.6.25-rc2 and I get exactly the same result as I reported
> earlier, namely I get a kernel panic as soon as I try to associate. It
> looks therefore as if something has changed within the remainder of the
> kernel which has caused rt2x00 (and possibly mac80211?) to break.
>
> This probably explains the problem another user reported with rt61.
Perhaps it is something similar to:
http://bugzilla.kernel.org/show_bug.cgi?id=10058
In there, a reference is made to the following patch:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
Does applying that help?
Ivo
On Thu, 2008-02-21 at 22:51 +0100, Ivo van Doorn wrote:
> On Thursday 21 February 2008, Chris Vine wrote:
[snip]
> > Here's some further information.
> >
> > I have a fully functioning version of rt2x00-2.0.14 and mac80211 from
> > wireless-2.6/compat-wireless-2.6 of mid January which works fine on
> > kernel 2.6.24. On doing a comparison with the rt2x00 in vanilla kernel
> > 2.6.25-rc2, there are no material differences. (There was a slight
> > change in the declaration a variable in rt2x00usb.c but it is
> > immaterial.)
> >
> > I compiled up the working mid-January version of rt2x00 and mac80211
> > under kernel 2.6.25-rc2 and I get exactly the same result as I reported
> > earlier, namely I get a kernel panic as soon as I try to associate. It
> > looks therefore as if something has changed within the remainder of the
> > kernel which has caused rt2x00 (and possibly mac80211?) to break.
> >
> > This probably explains the problem another user reported with rt61.
>
> Perhaps something similar like:
> http://bugzilla.kernel.org/show_bug.cgi?id=10058
> in there a reference is made to the following patch:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
>
> Does applying that help?
Yes, well done.
I have spent 20 minutes testing it and it seems to work fine (at least
as well as 2.0.14 does under kernel 2.6.24). The rate control algorithm
seems to work better as well, but that is probably a mac80211 thing.
Chris
On Thursday 21 February 2008, Chris Vine wrote:
> On Thu, 2008-02-21 at 22:51 +0100, Ivo van Doorn wrote:
> > On Thursday 21 February 2008, Chris Vine wrote:
> [snip]
> > > Here's some further information.
> > >
> > > I have a fully functioning version of rt2x00-2.0.14 and mac80211 from
> > > wireless-2.6/compat-wireless-2.6 of mid January which works fine on
> > > kernel 2.6.24. On doing a comparison with the rt2x00 in vanilla kernel
> > > 2.6.25-rc2, there are no material differences. (There was a slight
> > > change in the declaration a variable in rt2x00usb.c but it is
> > > immaterial.)
> > >
> > > I compiled up the working mid-January version of rt2x00 and mac80211
> > > under kernel 2.6.25-rc2 and I get exactly the same result as I reported
> > > earlier, namely I get a kernel panic as soon as I try to associate. It
> > > looks therefore as if something has changed within the remainder of the
> > > kernel which has caused rt2x00 (and possibly mac80211?) to break.
> > >
> > > This probably explains the problem another user reported with rt61.
> >
> > Perhaps something similar like:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10058
> > in there a reference is made to the following patch:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
> >
> > Does applying that help?
>
> Yes, well done.
>
> I have spent 20 minutes testing it and it seems to work fine (at least
> as well as 2.0.14 does under kernel 2.6.24). The rate control algorithm
> seems to work better as well, but that is probably a mac80211 thing.
Excellent, I am currently updating rt2x00.git from wireless-testing to get
the above mentioned patch into the repository.
Ivo
On Thursday 21 February 2008, Ivo van Doorn wrote:
> On Thursday 21 February 2008, Chris Vine wrote:
> > On Wed, 2008-02-20 at 21:16 +0000, Chris Vine wrote:
> > > On Wed, 2008-02-20 at 21:50 +0100, Ivo van Doorn wrote:
> > > [snip]
> > > > On Wednesday 20 February 2008, Chris Vine wrote:
> > > > > I did that yesterday and it just reported a kernel panic on the terminal
> > > > > with the message:
> > > > >
> > > > > Kernel panic - not syncing: Aiee, killing interrupt handler!
> > > >
> > > > I have an idea, could you try below patch?
> > > > Note that while applying it will mention something about a line offset, but that can be ignored.
> > > >
> > > > This could perhaps also fix the TX/RX issue mentioned earlier in the thread, but I am not
> > > > quite sure about that.
> > >
> > > The patch applied OK (with some offsets as you say) but it doesn't help.
> > > The kernel panic still occurs when association is attempted.
> >
> > Here's some further information.
> >
> > I have a fully functioning version of rt2x00-2.0.14 and mac80211 from
> > wireless-2.6/compat-wireless-2.6 of mid January which works fine on
> > kernel 2.6.24. On doing a comparison with the rt2x00 in vanilla kernel
> > 2.6.25-rc2, there are no material differences. (There was a slight
> > change in the declaration a variable in rt2x00usb.c but it is
> > immaterial.)
> >
> > I compiled up the working mid-January version of rt2x00 and mac80211
> > under kernel 2.6.25-rc2 and I get exactly the same result as I reported
> > earlier, namely I get a kernel panic as soon as I try to associate. It
> > looks therefore as if something has changed within the remainder of the
> > kernel which has caused rt2x00 (and possibly mac80211?) to break.
> >
> > This probably explains the problem another user reported with rt61.
>
> Perhaps something similar like:
> http://bugzilla.kernel.org/show_bug.cgi?id=10058
> in there a reference is made to the following patch:
> ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
>
> Does applying that help?
I'm afraid not, Ivo. The test I ran last night was against 2.6.25-rc2-git4 and
that already has this patch applied. Furthermore, I have another card that uses
the rtl8180 driver and that works reliably. I therefore suspect that my problem
lies within the rt61pci driver or the rt2x00 infrastructure.
Chris
>
> Ivo
>
--
Beauty is in the eye of the beerholder.
On Thu, 2008-02-21 at 23:04 +0000, Chris Clayton wrote:
> On Thursday 21 February 2008, Ivo van Doorn wrote:
> > On Thursday 21 February 2008, Chris Vine wrote:
[snip]
> > > This probably explains the problem another user reported with rt61.
> >
> > Perhaps something similar like:
> > http://bugzilla.kernel.org/show_bug.cgi?id=10058
> > in there a reference is made to the following patch:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
> >
> > Does applying that help?
>
> I'm afraid not, Ivo. The test I ran last night was against 2.6.25.-rc2-git4 and
> that already has this patch applied. Furthermore, I have another card that uses
> the rtl8180 driver and that works reliably. I, therefore, suspect that my problem
> lies within the rt61pci driver or the rt2x00 infrastructure.
Does the same happen with 2.0.14 under kernel 2.6.24?
Chris
On Thursday 21 February 2008, Chris Vine wrote:
> On Thu, 2008-02-21 at 23:04 +0000, Chris Clayton wrote:
> > On Thursday 21 February 2008, Ivo van Doorn wrote:
> > > On Thursday 21 February 2008, Chris Vine wrote:
> [snip]
> > > > This probably explains the problem another user reported with rt61.
> > >
> > > Perhaps something similar like:
> > > http://bugzilla.kernel.org/show_bug.cgi?id=10058
> > > in there a reference is made to the following patch:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
> > >
> > > Does applying that help?
> >
> > I'm afraid not, Ivo. The test I ran last night was against 2.6.25.-rc2-git4 and
> > that already has this patch applied. Furthermore, I have another card that uses
> > the rtl8180 driver and that works reliably. I, therefore, suspect that my problem
> > lies within the rt61pci driver or the rt2x00 infrastructure.
>
> Does the same happen with 2.0.14 under kernel 2.6.24?
Unfortunately, a 2.6.24.2 tree with the drivers/net/wireless/rt2x00 directory replaced with that from 2.6.25-rc2-git4 doesn't build:
In file included from drivers/net/wireless/rt2x00/rt2x00dev.c:29:
drivers/net/wireless/rt2x00/rt2x00.h:942: warning: `struct ieee80211_bss_conf' declared inside parameter list
drivers/net/wireless/rt2x00/rt2x00.h:942: warning: its scope is only this definition or declaration, which is probably not what you want
[...]
drivers/net/wireless/rt2x00/rt2x00dev.c: In function `rt2x00lib_configuration_scheduled':
drivers/net/wireless/rt2x00/rt2x00dev.c:484: error: storage size of `bss_conf' isn't known
drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: `BSS_CHANGED_ERP_PREAMBLE' undeclared (first use in this function)
drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: (Each undeclared identifier is reported only once
drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: for each function it appears in.)
drivers/net/wireless/rt2x00/rt2x00dev.c:484: warning: unused variable `bss_conf'
drivers/net/wireless/rt2x00/rt2x00dev.c: In function `rt2x00lib_beacondone_scheduled':
drivers/net/wireless/rt2x00/rt2x00dev.c:511: warning: passing arg 2 of `ieee80211_beacon_get' makes integer from pointer without a cast
>
> Chris
>
>
>
>
--
Beauty is in the eye of the beerholder.
Hi Ivo,
On 18/02/2008, Ivo van Doorn <[email protected]> wrote:
> Hi,
>
[...]
> Above traces should be enough, but to determine where rt2x00 broke
> down approximatly I need to have a few test result on specific moments.
> Could you test the kernel with the following versions:
>
> rt2x00 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> rt2x00 2.0.12 a3c7aa58df7df80aa05f166fe3e42482247164cf
> rt2x00 2.0.13 5a6012e105ae1664cd2841c33bf59fbdd8d4dbcc
>
> Checking those out is simply a matter of:
> git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> git checkout 2.0.11
>
OK, we seem to be struggling a little, so I've built and installed git
and cloned Linus' 2.6 tree. My wireless network dies after a few pings
with rt2x00 2.0.11.
> No further bisecting is needed, but with above tests I can at least
> narrow it down to find the cause of this issue.
>
If you need me to bisect, just shout. Please be patient though, I'm
exploring new territory here :-)
Thanks
Chris
> Thanks.
>
>
> Ivo
>
>
--
Beauty is in the eye of the beerholder.
On Friday 22 February 2008, Chris Clayton wrote:
> On Thursday 21 February 2008, Chris Vine wrote:
> > On Thu, 2008-02-21 at 23:04 +0000, Chris Clayton wrote:
> > > On Thursday 21 February 2008, Ivo van Doorn wrote:
> > > > On Thursday 21 February 2008, Chris Vine wrote:
> > [snip]
> > > > > This probably explains the problem another user reported with rt61.
> > > >
> > > > Perhaps something similar like:
> > > > http://bugzilla.kernel.org/show_bug.cgi?id=10058
> > > > in there a reference is made to the following patch:
> > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc2/2.6.25-rc2-mm1/broken-out/revert-send-a-single-notification-on-device-state-changes.patch
> > > >
> > > > Does applying that help?
> > >
> > > I'm afraid not, Ivo. The test I ran last night was against 2.6.25.-rc2-git4 and
> > > that already has this patch applied. Furthermore, I have another card that uses
> > > the rtl8180 driver and that works reliably. I, therefore, suspect that my problem
> > > lies within the rt61pci driver or the rt2x00 infrastructure.
> >
> > Does the same happen with 2.0.14 under kernel 2.6.24?
>
> Unfortunately, a 2.6.24.2 tree with the drivers/net/wireless/rt2x00 directory replaced with that from 2.6.25-rc2-git4 doesn't build:
You need the mac80211 compat module from Intel. That allows new mac80211 versions to run
on older kernels. When you use that you need to grab the rt2x00 cvs tarball from:
http://rt2x00.serialmonkey.com/wiki/index.php/Downloads
which allows rt2x00 to be compiled outside of the kernel.
> In file included from drivers/net/wireless/rt2x00/rt2x00dev.c:29:
> drivers/net/wireless/rt2x00/rt2x00.h:942: warning: `struct ieee80211_bss_conf' declared inside parameter list
> drivers/net/wireless/rt2x00/rt2x00.h:942: warning: its scope is only this definition or declaration, which is probably not what you want
>
> [...]
>
> drivers/net/wireless/rt2x00/rt2x00dev.c: In function `rt2x00lib_configuration_scheduled':
> drivers/net/wireless/rt2x00/rt2x00dev.c:484: error: storage size of `bss_conf' isn't known
> drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: `BSS_CHANGED_ERP_PREAMBLE' undeclared (first use in this function)
> drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: (Each undeclared identifier is reported only once
> drivers/net/wireless/rt2x00/rt2x00dev.c:494: error: for each function it appears in.)
> drivers/net/wireless/rt2x00/rt2x00dev.c:484: warning: unused variable `bss_conf'
> drivers/net/wireless/rt2x00/rt2x00dev.c: In function `rt2x00lib_beacondone_scheduled':
> drivers/net/wireless/rt2x00/rt2x00dev.c:511: warning: passing arg 2 of `ieee80211_beacon_get' makes integer from pointer without a cast
Ivo
On Friday 22 February 2008, Chris Clayton wrote:
> Hi Ivo,
>
> On 18/02/2008, Ivo van Doorn <[email protected]> wrote:
> > Hi,
> >
>
> [...]
>
> > Above traces should be enough, but to determine where rt2x00 broke
> > down approximatly I need to have a few test result on specific moments.
> > Could you test the kernel with the following versions:
> >
> > rt2x00 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> > rt2x00 2.0.12 a3c7aa58df7df80aa05f166fe3e42482247164cf
> > rt2x00 2.0.13 5a6012e105ae1664cd2841c33bf59fbdd8d4dbcc
> >
> > Checking those out is simply a matter of:
> > git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> > git checkout 2.0.11
> >
>
> OK, we seem to be struggling a little, so I've built an installed git
> and cloned Linus' 2.6 tree. My wireless network dies after a few pings
> with rt2x00 2.0.11.
>
> > No further bisecting is needed, but with above tests I can at least
> > narrow it down to find the cause of this issue.
> >
>
> If you need me to bisect, just shout. Please be patient though, I'm
> exploring new territory here :-)
I don't think bisecting this will help a lot: the rt2x00 2.0.11 release
introduced software diversity, and that is already something I suspect
of being broken.
Unfortunately software diversity was a bugfix and fix in one;
the previous setup was broken for some hardware since the
lack of software diversity caused problems.
Could you check if below patch helps in any way?
Ivo
---
diff --git a/drivers/net/wireless/rt2x00/rt2x00config.c b/drivers/net/wireless/rt2x00/rt2x00config.c
index a1d8e33..6995912 100644
--- a/drivers/net/wireless/rt2x00/rt2x00config.c
+++ b/drivers/net/wireless/rt2x00/rt2x00config.c
@@ -122,6 +122,10 @@ void rt2x00lib_config_antenna(struct rt2x00_dev *rt2x00dev,
 	libconf.ant.rx = rx;
 	libconf.ant.tx = tx;

+	if (rx == rt2x00dev->link.ant.active.rx &&
+	    tx == rt2x00dev->link.ant.active.tx)
+		return;
+
 	/*
 	 * Antenna setup changes require the RX to be disabled,
 	 * else the changes will be ignored by the device.
diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c b/drivers/net/wireless/rt2x00/rt2x00dev.c
index 65a512f..4325c08 100644
--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -191,16 +191,16 @@ static void rt2x00lib_evaluate_antenna_sample(struct rt2x00_dev *rt2x00dev)
 		return;

 	if (rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) {
-		if (sample_a > sample_b && rx == ANTENNA_B)
+		if (sample_a > sample_b)
 			rx = ANTENNA_A;
-		else if (rx == ANTENNA_A)
+		else
 			rx = ANTENNA_B;
 	}

 	if (rt2x00dev->link.ant.flags & ANTENNA_TX_DIVERSITY) {
-		if (sample_a > sample_b && tx == ANTENNA_B)
+		if (sample_a > sample_b)
 			tx = ANTENNA_A;
-		else if (tx == ANTENNA_A)
+		else
 			tx = ANTENNA_B;
 	}

@@ -257,7 +257,7 @@ static void rt2x00lib_evaluate_antenna(struct rt2x00_dev *rt2x00dev)

 	if (!(rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) &&
 	    !(rt2x00dev->link.ant.flags & ANTENNA_TX_DIVERSITY)) {
-		rt2x00dev->link.ant.flags &= ~ANTENNA_MODE_SAMPLE;
+		rt2x00dev->link.ant.flags = 0;
 		return;
 	}
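
For anyone reading the second hunk, here is a small stand-alone model of the
selection logic before and after the change. It is only a sketch in Python,
not driver code: the function names are made up for illustration, plain
strings stand in for ANTENNA_A and ANTENNA_B, and it assumes that a higher
sample value means the better antenna. As far as I can tell from the removed
lines, the old chain moves away from antenna A whenever A is the current
antenna, even if A has the higher sample.

```python
# Illustrative model only: plain strings stand in for the driver's
# ANTENNA_A / ANTENNA_B constants, and the function names are hypothetical.
ANTENNA_A = "A"
ANTENNA_B = "B"


def select_before_patch(sample_a, sample_b, current):
    """Mirror the removed if / else-if chain for one direction (RX or TX)."""
    if sample_a > sample_b and current == ANTENNA_B:
        return ANTENNA_A
    elif current == ANTENNA_A:
        return ANTENNA_B
    # Neither branch matched: the current antenna is left unchanged.
    return current


def select_after_patch(sample_a, sample_b):
    """Mirror the added lines: choose A only when its sample is strictly higher."""
    if sample_a > sample_b:
        return ANTENNA_A
    return ANTENNA_B


if __name__ == "__main__":
    # Print both decisions for every combination of current antenna and samples.
    for current in (ANTENNA_A, ANTENNA_B):
        for sample_a, sample_b in ((60, 40), (40, 60)):
            print(current, sample_a, sample_b,
                  select_before_patch(sample_a, sample_b, current),
                  select_after_patch(sample_a, sample_b))
```

The new version no longer depends on the currently selected antenna at all,
which is why the extra early return in rt2x00lib_config_antenna is useful: it
avoids reconfiguring the device when the choice has not changed.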
On Fri, 2008-02-22 at 07:39 +0000, Chris Clayton wrote:
> On Thursday 21 February 2008, Chris Vine wrote:
[snip]
> >
> > Does the same happen with 2.0.14 under kernel 2.6.24?
>
> Unfortunately, a 2.6.24.2 tree with the drivers/net/wireless/rt2x00 directory replaced with that from 2.6.25-rc2-git4 doesn't build:
You have to have a version of mac80211 which is current with the version
of rt2x00 2.0.14. If you don't want to go back into wireless-2.6 git to
do that I can send you my known working copy (for rt73 and I hope for
rt61) of compat-wireless-2.6 which will compile under 2.6.24. However
it is just over 1MB in size so I won't send it to you unless you would
like me to do that instead of you pulling git from wireless-2.6 for mid
January (actually the most recent working version would be immediately
before the patch which raised the version of rt2x00 to 2.1.0).
Chris
Hi,
Firstly, apologies for trimming linux-kernel and linux-wireless from my reply
to Ivo yesterday. Basically, I replied saying that the patch below didn't fix
the problem. But please do read on...
On Friday 22 February 2008, Ivo van Doorn wrote:
> On Friday 22 February 2008, Chris Clayton wrote:
> > Hi Ivo,
> >
> > On 18/02/2008, Ivo van Doorn <[email protected]> wrote:
> > > Hi,
> >
> > [...]
> >
> > > Above traces should be enough, but to determine where rt2x00 broke
> > > down approximatly I need to have a few test result on specific
> > > moments. Could you test the kernel with the following versions:
> > >
> > > rt2x00 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> > > rt2x00 2.0.12 a3c7aa58df7df80aa05f166fe3e42482247164cf
> > > rt2x00 2.0.13 5a6012e105ae1664cd2841c33bf59fbdd8d4dbcc
> > >
> > > Checking those out is simply a matter of:
> > > git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> > > git checkout 2.0.11
> >
> > OK, we seem to be struggling a little, so I've built an installed git
> > and cloned Linus' 2.6 tree. My wireless network dies after a few pings
> > with rt2x00 2.0.11.
> >
> > > No further bisecting is needed, but with above tests I can at least
> > > narrow it down to find the cause of this issue.
> >
> > If you need me to bisect, just shout. Please be patient though, I'm
> > exploring new territory here :-)
>
> I don't think bisecting this will help a lot, the rt2x00 2.0.11 release
> introduced software diversity. And that is already something I suspect
> of being broken.
>
I've bisected anyway and, although the results are not absolutely conclusive,
as I neared the end of the process I was amongst a bunch of mac80211
patches. This set me on a path that resulted in me discovering that, with the
rt61pci driver, I can freeze my wireless network connection almost at will if
I set mac80211's ieee80211_default_rc_algo parameter to 'pid'. If the
parameter is set to 'simple', the network seems to be reliable. I've just let
the ping application run on and ping another box on my network almost 1500
times whilst repeatedly transferring a kernel source tarball by ftp from
another box, and the network connection was maintained. That's with the
parameter set to 'simple'; if I set it to 'pid' the connection rarely
survives more than 40 pings even without the ftp activity.
If I replace my wireless card with one that uses the rtl8180 driver, the
network connection seems to be reliable regardless of how I set the
parameter, although I admit that I have not tested this extensively yet. I'll
do that now and report later.
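
In case anyone wants to repeat the comparison, below is a rough Python sketch
of how the setting could be checked and switched. It rests on assumptions I
have not verified on every kernel: that the parameter is visible under
/sys/module/mac80211/parameters/, and that mac80211 is built as a module so
the value can be passed on the modprobe command line. The helper names are
made up for illustration.

```python
from pathlib import Path

# Assumption: the parameter is exported through sysfs at this location.
# Verify that the file exists on your kernel before relying on it.
DEFAULT_PARAM_PATH = Path(
    "/sys/module/mac80211/parameters/ieee80211_default_rc_algo"
)


def read_rc_algo(param_path=DEFAULT_PARAM_PATH):
    """Return the configured default algorithm name, or None if unreadable."""
    try:
        return Path(param_path).read_text().strip()
    except OSError:
        return None


def modprobe_command(algo):
    """Build the argument list for loading mac80211 with a given algorithm."""
    if algo not in ("pid", "simple"):
        raise ValueError("expected 'pid' or 'simple', got %r" % (algo,))
    return ["modprobe", "mac80211", "ieee80211_default_rc_algo=" + algo]


if __name__ == "__main__":
    print("current default:", read_rc_algo())
    print("to switch:", " ".join(modprobe_command("simple")))
```

The script only prints the modprobe command rather than running it, since
mac80211 and the driver have to be unloaded first and that needs root.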
Hope this helps.
Chris
> Unfortunately software diversity was a bugfix and fix in one,
> the previous setup was broken for some hardware since the
> lack of software diversity caused problems.
>
> Could you check if below patch helps in any way?
>
> Ivo
>
> ---
>
> diff --git a/drivers/net/wireless/rt2x00/rt2x00config.c
> b/drivers/net/wireless/rt2x00/rt2x00config.c index a1d8e33..6995912 100644
> --- a/drivers/net/wireless/rt2x00/rt2x00config.c
> +++ b/drivers/net/wireless/rt2x00/rt2x00config.c
> @@ -122,6 +122,10 @@ void rt2x00lib_config_antenna(struct rt2x00_dev
> *rt2x00dev, libconf.ant.rx = rx;
> libconf.ant.tx = tx;
>
> + if (rx == rt2x00dev->link.ant.active.rx &&
> + tx == rt2x00dev->link.ant.active.tx)
> + return;
> +
> /*
> * Antenna setup changes require the RX to be disabled,
> * else the changes will be ignored by the device.
> diff --git a/drivers/net/wireless/rt2x00/rt2x00dev.c
> b/drivers/net/wireless/rt2x00/rt2x00dev.c index 65a512f..4325c08 100644
> --- a/drivers/net/wireless/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
> @@ -191,16 +191,16 @@ static void rt2x00lib_evaluate_antenna_sample(struct
> rt2x00_dev *rt2x00dev) return;
>
> if (rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) {
> - if (sample_a > sample_b && rx == ANTENNA_B)
> + if (sample_a > sample_b)
> rx = ANTENNA_A;
> - else if (rx == ANTENNA_A)
> + else
> rx = ANTENNA_B;
> }
>
> if (rt2x00dev->link.ant.flags & ANTENNA_TX_DIVERSITY) {
> - if (sample_a > sample_b && tx == ANTENNA_B)
> + if (sample_a > sample_b)
> tx = ANTENNA_A;
> - else if (tx == ANTENNA_A)
> + else
> tx = ANTENNA_B;
> }
>
> @@ -257,7 +257,7 @@ static void rt2x00lib_evaluate_antenna(struct
> rt2x00_dev *rt2x00dev)
>
> if (!(rt2x00dev->link.ant.flags & ANTENNA_RX_DIVERSITY) &&
> !(rt2x00dev->link.ant.flags & ANTENNA_TX_DIVERSITY)) {
> - rt2x00dev->link.ant.flags &= ~ANTENNA_MODE_SAMPLE;
> + rt2x00dev->link.ant.flags = 0;
> return;
> }
--
Beauty is in the eye of the beerholder.
Hi,
> > > > Checking those out is simply a matter of:
> > > > git branch 2.0.11 2d68de3efa62655d551092f5c787505735d561ad
> > > > git checkout 2.0.11
> > >
> > > OK, we seem to be struggling a little, so I've built an installed git
> > > and cloned Linus' 2.6 tree. My wireless network dies after a few pings
> > > with rt2x00 2.0.11.
> > >
> > > > No further bisecting is needed, but with above tests I can at least
> > > > narrow it down to find the cause of this issue.
> > >
> > > If you need me to bisect, just shout. Please be patient though, I'm
> > > exploring new territory here :-)
> >
> > I don't think bisecting this will help a lot, the rt2x00 2.0.11 release
> > introduced software diversity. And that is already something I suspect
> > of being broken.
> >
> I've bisected anyway and although the results are not absolutely conclusive,
> as I neared the end of the process, I was amongst a bunch of mac80211
> patches. This set me on a path that resulted in me discovering that with the
> rt61pci driver, I can freeze my wireless network connection almost at will if
> I set mac82011's ieee80211_default_rc_algo parameter to 'pid'. if the
> parametre is set to 'simple', the network seems to be reliable. I've just let
> the ping application run on and ping another box on my network almost 1500
> times whilst repeatedly transferring a kernel source tarball by ftp from
> another box and the network connection was mantained That's with the
> parameter set to 'simple', if \I set it to 'pid' the connection rarely
> survives more than 40 pings even without the ftp activity.
>
> If I replace my wireless card with one that uses the rtl8180 driver, the
> network connection seems to be reliable regardless of how I set the
> parameter, although I admit that i have not tested this extensively yet. I'll
> do that now and report later.
I'm about to send 4 patches for rt2x00 to this (linux-wireless) list.
You have already tested most of them individually, and several people
reported success with them.
Hopefully they will work for you as well. :)
Ivo
Hi,
> > I've bisected anyway and although the results are not absolutely
> > conclusive, as I neared the end of the process, I was amongst a bunch of
> > mac80211 patches. This set me on a path that resulted in me discovering
> > that with the rt61pci driver, I can freeze my wireless network connection
> > almost at will if I set mac82011's ieee80211_default_rc_algo parameter to
> > 'pid'. if the parametre is set to 'simple', the network seems to be
> > reliable. I've just let the ping application run on and ping another box
> > on my network almost 1500 times whilst repeatedly transferring a kernel
> > source tarball by ftp from another box and the network connection was
> > mantained That's with the parameter set to 'simple', if \I set it to
> > 'pid' the connection rarely survives more than 40 pings even without the
> > ftp activity.
> >
> > If I replace my wireless card with one that uses the rtl8180 driver, the
> > network connection seems to be reliable regardless of how I set the
> > parameter, although I admit that i have not tested this extensively yet.
> > I'll do that now and report later.
>
I've rerun my tests with the rtl8180 driver and found the network to be
reliable with the mac80211 module's ieee80211_default_rc_algo parameter set
to either 'simple' or 'pid'.
> I'm about to send 4 patches to this (linux-wireless) list with patches
> for rt2x00,
> most of them you already tested individually, but several people reported
> success after those patches.
>
> Hopefully it will be working for you as well. :)
>
Sorry, but that's not the case. I find the same results as without the
patches. With the parameter set to 'pid', the network connection fails very
quickly, but with it set to 'simple' I can ping and ftp files to and from my
laptop as much as I like and the connection stays up. In fact, if anything
the patches seem to have made the network even more fragile, in that it fails
almost instantly once I start some network activity ( < 10 pings).
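
To put numbers on runs like these, a small Python sketch along the following
lines could summarise a saved ping transcript. It assumes the output format
shown in the first message of this thread ("N packets transmitted, M packets
received" plus icmp_seq= fields); other ping versions word the summary
differently, and the function name is only illustrative.

```python
import re

# Matches both "3 packets received" and the shorter "3 received" wording.
SUMMARY_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received"
)
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def summarize_ping(output):
    """Return (transmitted, received, last_seq_answered) from ping output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no ping summary line found")
    transmitted, received = int(match.group(1)), int(match.group(2))
    # The highest answered sequence number shows where the link stalled.
    seqs = [int(s) for s in SEQ_RE.findall(output)]
    last_seq = max(seqs) if seqs else None
    return transmitted, received, last_seq


if __name__ == "__main__":
    sample = (
        "64 bytes from 192.168.1.1: icmp_seq=0 ttl=255 time=9.837 msec\n"
        "64 bytes from 192.168.1.1: icmp_seq=1 ttl=255 time=3.148 msec\n"
        "64 bytes from 192.168.1.1: icmp_seq=2 ttl=255 time=2.205 msec\n"
        "--- 192.168.1.1 ping statistics ---\n"
        "9 packets transmitted, 3 packets received, 66% packet loss\n"
    )
    print(summarize_ping(sample))
```

Running it over one transcript per setting ('pid' and 'simple') gives a
like-for-like count of how many pings were answered before the link stalled.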
I'm sure this is not the hardware - it works perfectly with Windows XP, with
2.6.23.14 plus the out-of-tree rt61 driver from serialmonkey, with the
in-tree driver from 2.6.24.x and with 2.6.25-rc3 with mac80211's
ieee80211_default_rc_algo parameter set to 'simple'.
Like I say above, sorry!
Chris
> Ivo
--
Beauty is in the eye of the beerholder.
On Tue, Feb 26, 2008 at 07:11:39PM +0000, Chris Clayton wrote:
> Sorry, but that's not the case. I find the same results as without the
> patches. With the parameter set to 'pid', the network connection fails very
> quickly, but with it set to 'simple' I can ping and ftp files to and from my
> laptop as much as I like and the connection stays up. In fact, if anything
> the patches seem to have made the network even more fragile, in that it fails
> almost instantly once I start some network activity ( < 10 pings).
>
> I'm sure this is not the hardware - it works perfectly with Windows XP, with
> 2.6.23.14 plus the out-of-tree rt61 driver from serialmonkey, with the
> in-tree driver from 2.6.24.x and with 2.6.25-rc3 with the mac82011's
> ieee80211_default_rc_algo parameter set to 'simple'.
At last! Vindication for insisting that we keep 'simple' around!
Bwahahaha! :-)
So, am I to understand that 'pid' works fine for you with rtl8180?
If so, then I wonder if Stefano and Ivo can help us figure out
what kind of problem is sensitive to both driver _and_ rate control
algorithm?
Thanks,
John
--
John W. Linville
[email protected]
> > Sorry, but that's not the case. I find the same results as without the
> > patches. With the parameter set to 'pid', the network connection fails very
> > quickly, but with it set to 'simple' I can ping and ftp files to and from my
> > laptop as much as I like and the connection stays up. In fact, if anything
> > the patches seem to have made the network even more fragile, in that it fails
> > almost instantly once I start some network activity ( < 10 pings).
> >
> > I'm sure this is not the hardware - it works perfectly with Windows XP, with
> > 2.6.23.14 plus the out-of-tree rt61 driver from serialmonkey, with the
> > in-tree driver from 2.6.24.x and with 2.6.25-rc3 with the mac82011's
> > ieee80211_default_rc_algo parameter set to 'simple'.
>
> At last! Vindication for insisting that we keep 'simple' around!
> Bwahahaha! :-)
>
> So, am I to understand that 'pid' works find for you with rtl8180?
> If so, then I wonder if Stefano and Ivo can help us figure-out
> what kind of problem is sensitive to both driver _and_ rate control
> algorithm?
rt2x00 is known to be less sensitive than the legacy drivers: scanning
produces fewer and less consistent results (not all APs are reported,
even when an AP has a high RSSI), and the reported RSSI is often
much lower than expected given the distance to the AP.
I have compared many register dumps, but have never managed to
find a real register setting that might cause this. So what might be
the problem is that rt2x00 is not reporting the RSSI correctly to mac80211.
With the big difference between how mac80211 handles TX rates and
how the legacy drivers handle them, it is hard to make a comparison
where exactly things are going wrong. But in the end, I think it all
comes down to rt2x00 reporting invalid RSSI values to mac80211,
and/or the rate control mechanism being too dependent on some
statistics which are not provided by the driver.
I have to admit that I haven't looked into the 'pid' algorithm closely,
but could it be that some fields in the tx status report upon txdone
are being treated as "very important" while the driver doesn't report them
(for example, ack signal strength)?
Other than that, I have to say that rt2x00 has never reached a
state where link quality issues could be traced back to mac80211 or
the rate control mechanism. They were usually caused by a bug in the driver
itself. (rt2x00 cannot be considered stable yet for a very good reason. ;) )
Ivo
On 26/02/2008, John W. Linville <[email protected]> wrote:
> On Tue, Feb 26, 2008 at 07:11:39PM +0000, Chris Clayton wrote:
>
> > Sorry, but that's not the case. I find the same results as without the
> > patches. With the parameter set to 'pid', the network connection fails very
> > quickly, but with it set to 'simple' I can ping and ftp files to and from my
> > laptop as much as I like and the connection stays up. In fact, if anything
> > the patches seem to have made the network even more fragile, in that it fails
> > almost instantly once I start some network activity ( < 10 pings).
> >
> > I'm sure this is not the hardware - it works perfectly with Windows XP, with
> > 2.6.23.14 plus the out-of-tree rt61 driver from serialmonkey, with the
> > in-tree driver from 2.6.24.x and with 2.6.25-rc3 with the mac80211's
> > ieee80211_default_rc_algo parameter set to 'simple'.
>
>
> At last! Vindication for insisting that we keep 'simple' around!
> Bwahahaha! :-)
>
It's nice to be able to make someone happy :-)
> So, am I to understand that 'pid' works fine for you with rtl8180?
Yes John, using the rtl8180 driver I get reliable network performance
with either 'pid' or 'simple'. With the rt61pci driver, I find that
'simple' provides a reliable network, but 'pid' simply does not work.
So keeping 'simple' around gets my vote too.
> If so, then I wonder if Stefano and Ivo can help us figure-out
> what kind of problem is sensitive to both driver _and_ rate control
> algorithm?
>
And I will help in any way I can, providing diagnostics and trying patches.
> Thanks,
>
> John
>
> --
> John W. Linville
> [email protected]
>
--
Beauty is in the eye of the beerholder.
On Tue, 26 Feb 2008 21:13:48 +0000
"Chris Clayton" <[email protected]> wrote:
> And I will help in any way I can, providing diagnostics and trying patches.
Please, could you mount debugfs and provide me with a dump of this file:
/debug/ieee80211/phy*/stations/*/rc_pid_events
Thank you.
--
Ciao
Stefano
On Tue, 26 Feb 2008 21:30:38 +0100
"Ivo Van Doorn" <[email protected]> wrote:
> rt2x00 is known to be less sensitive than the legacy drivers, scanning
> produces fewer and more inconsistent results (not all APs are reported,
> even when an AP has a high RSSI), and the reported RSSI is often
> much lower than expected given the distance to the AP.
> I have compared many register dumps, but have never managed to
> find a real register setting that might cause this. So what might be
> the problem is that rt2x00 is not reporting the RSSI correctly to mac80211.
No, we don't care at all about RSSI in rc80211-pid.
> I have to admit that I haven't looked into the 'pid' algorithm closely,
> but could it be that some fields in the tx status report upon txdone
> are being treated as "very important" while the driver doesn't report it
> (For example ack signal strength)?
The only important things drivers should report back to mac80211 are ACKed
frames. In rc80211-pid (and it's just the same in rc80211-simple) the only
inputs from mac80211 are successfully (re)transmitted frames and failed
frames.
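
To make that concrete, here is a minimal sketch of how a rate control
algorithm could turn those two inputs into a single failed-frames
percentage. It is not the rc80211-pid code: the struct and function names
are hypothetical, and the real algorithm feeds the value into a PID
controller rather than using it directly.

```c
/* Hypothetical per-station counters, updated from TX status reports. */
struct tx_counters {
	unsigned int tx_num_xmit;   /* frames handed to the hardware */
	unsigned int tx_num_failed; /* frames that were never ACKed */
};

/* Percentage of failed frames in one sampling period, 0..100.
 * Returns 0 when nothing was transmitted in the period. */
static unsigned int failed_frames_percent(const struct tx_counters *c)
{
	if (c->tx_num_xmit == 0)
		return 0;
	return (c->tx_num_failed * 100) / c->tx_num_xmit;
}
```

If a driver misreports ACKs, this percentage is the only thing that goes
wrong, which is why RSSI reporting should not matter here.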
--
Ciao
Stefano
On 26/02/2008, Stefano Brivio <[email protected]> wrote:
> On Tue, 26 Feb 2008 21:13:48 +0000
> "Chris Clayton" <[email protected]> wrote:
>
> > And I will help in any way I can, providing diagnostics and trying patches.
>
>
> Please, could you mount debugfs and provide me with a dump of this file:
> /debug/ieee80211/phy*/stations/*/rc_pid_events
>
Here's a dump that I started, then began pinging my gateway in another
terminal until the network failed and then stopped the dump with ^C a
few seconds later. Hope it helps.
[chris:~]$ cat /debug/ieee80211/phy0/stations/00\:60\:b3\:77\:73\:1a/rc_pid_events
3 131904 tx_status 0 0
4 131904 pf_sample 0 3584 840 0
5 134212 tx_rate 0 10
6 134212 tx_status 0 0
7 134212 pf_sample 0 3584 1183 0
8 134212 tx_rate 0 10
9 134213 tx_status 0 1
10 134462 tx_rate 0 10
11 134462 tx_status 0 0
12 134462 pf_sample 8448 -4864 427 8448
13 134713 tx_rate 0 10
14 134713 tx_status 0 0
15 134713 pf_sample 0 3584 821 -8448
16 134713 rate_change 0 10
17 134964 tx_rate 0 10
18 134964 tx_status 0 0
19 134964 pf_sample 0 3584 1167 0
20 135215 tx_rate 0 10
21 135215 tx_status 0 0
22 135215 pf_sample 0 3584 1469 0
23 135215 rate_change 1 20
24 135466 tx_rate 1 20
25 135466 tx_status 0 0
26 135466 pf_sample 0 3584 1733 0
27 135466 rate_change 2 55
28 135717 tx_rate 2 55
29 135717 tx_status 0 0
30 135717 pf_sample 0 3584 1965 0
31 135717 rate_change 129 -541505508
32 135968 tx_rate 11 540
33 136219 tx_rate 11 540
34 136470 tx_rate 11 540
35 136721 tx_rate 11 540
36 136972 tx_rate 11 540
37 137223 tx_rate 11 540
38 137474 tx_rate 11 540
39 137725 tx_rate 11 540
40 137976 tx_rate 11 540
41 138227 tx_rate 11 540
^C
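
For anyone reading a dump like the one above, the following is a small
sketch of a checker that flags suspicious rate_change events. It assumes the
line layout seen in this dump (event id, jiffies, event type, then the
arguments) and assumes 12 bitrates for an 802.11b/g device, which matches
index 11 mapping to 540 here. The helper name and the N_BITRATES constant
are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Assumed number of bitrates for an 802.11b/g device (indices 0..11). */
#define N_BITRATES 12

/* Returns 1 if 'line' is a rate_change event whose rate index is out of
 * range or whose rate is not positive, and 0 for every other line. */
static int is_bogus_rate_change(const char *line)
{
	unsigned long id, jiffies;
	char type[32];
	int index, rate;

	if (sscanf(line, "%lu %lu %31s %d %d",
		   &id, &jiffies, type, &index, &rate) != 5)
		return 0;
	if (strcmp(type, "rate_change") != 0)
		return 0;
	return index < 0 || index >= N_BITRATES || rate <= 0;
}
```

Applied to the dump, only event 31 (rate_change 129 -541505508) is flagged;
the earlier rate_change events to indices 0, 1 and 2 pass.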
Chris
> Thank you.
>
>
> --
> Ciao
>
> Stefano
>
--
Beauty is in the eye of the beerholder.
On Tue, 26 Feb 2008 22:36:19 +0000
"Chris Clayton" <[email protected]> wrote:
> 27 135466 rate_change 2 55
> 28 135717 tx_rate 2 55
> 29 135717 tx_status 0 0
> 30 135717 pf_sample 0 3584 1965 0
> 31 135717 rate_change 129 -541505508
> 32 135968 tx_rate 11 540
Known and fixed. The fix isn't in 2.6.25-rc3 yet, though.
Fix:
commit 32720eae675d08990e97bffbf71a31382599cc8a
Author: Stefano Brivio <[email protected]>
Date: Tue Jan 29 20:29:16 2008 +0100
rc80211-pid: fix rate adjustment
--
Ciao
Stefano
On Wed, Feb 27, 2008 at 08:26:42AM +0100, Stefano Brivio wrote:
> On Tue, 26 Feb 2008 22:36:19 +0000
> "Chris Clayton" <[email protected]> wrote:
>
> > 27 135466 rate_change 2 55
> > 28 135717 tx_rate 2 55
> > 29 135717 tx_status 0 0
> > 30 135717 pf_sample 0 3584 1965 0
> > 31 135717 rate_change 129 -541505508
> > 32 135968 tx_rate 11 540
>
> Known and fixed. The fix isn't in 2.6.25-rc3 yet, though.
>
> Fix:
> commit 32720eae675d08990e97bffbf71a31382599cc8a
> Author: Stefano Brivio <[email protected]>
> Date: Tue Jan 29 20:29:16 2008 +0100
>
> rc80211-pid: fix rate adjustment
And it currently isn't queued for 2.6.25 at all.
Chris, can you cherry-pick this from the wireless-2.6.26 tree and
give it a test on your 2.6.25-rc3 tree? If it resolves a problem
then I'll queue it to Dave for 2.6.25 (which will probably provoke
a net-2.6.26 and wireless-2.6.26 rebase).
Let me know...
John
--
John W. Linville
[email protected]
On Wed, Feb 27, 2008 at 10:51:11AM -0500, John W. Linville wrote:
> On Wed, Feb 27, 2008 at 08:26:42AM +0100, Stefano Brivio wrote:
> > On Tue, 26 Feb 2008 22:36:19 +0000
> > "Chris Clayton" <[email protected]> wrote:
> >
> > > 27 135466 rate_change 2 55
> > > 28 135717 tx_rate 2 55
> > > 29 135717 tx_status 0 0
> > > 30 135717 pf_sample 0 3584 1965 0
> > > 31 135717 rate_change 129 -541505508
> > > 32 135968 tx_rate 11 540
> >
> > Known and fixed. The fix isn't in 2.6.25-rc3 yet, though.
> >
> > Fix:
> > commit 32720eae675d08990e97bffbf71a31382599cc8a
> > Author: Stefano Brivio <[email protected]>
> > Date: Tue Jan 29 20:29:16 2008 +0100
> >
> > rc80211-pid: fix rate adjustment
>
> And it currently isn't queued for 2.6.25 at all.
>
> Chris, can you cherry-pick this from the wireless-2.6.26 tree and
> give it a test on your 2.6.25-rc3 tree? If it resolves a problem
> then I'll queue it to Dave for 2.6.25 (which will probably provoke
> a net-2.6.26 and wireless-2.6.26 rebase).
That might be easier said than done. It looks like that patch depends
on the big cfg80211 API change queued for 2.6.26.
Stefano offered to rebase that on 2.6.25. Stefano, could you post
that as part of this thread?
Thanks,
John
--
John W. Linville
[email protected]
On 27/02/2008, John W. Linville <[email protected]> wrote:
> On Wed, Feb 27, 2008 at 10:51:11AM -0500, John W. Linville wrote:
> > On Wed, Feb 27, 2008 at 08:26:42AM +0100, Stefano Brivio wrote:
> > > On Tue, 26 Feb 2008 22:36:19 +0000
[...]
> > >
> > > Known and fixed. The fix isn't in 2.6.25-rc3 yet, though.
> > >
> > > Fix:
> > > commit 32720eae675d08990e97bffbf71a31382599cc8a
> > > Author: Stefano Brivio <[email protected]>
> > > Date: Tue Jan 29 20:29:16 2008 +0100
> > >
> > > rc80211-pid: fix rate adjustment
> >
> > And it currently isn't queued for 2.6.25 at all.
> >
> > Chris, can you cherry-pick this from the wireless-2.6.26 tree and
> > give it a test on your 2.6.25-rc3 tree? If it resolves a problem
> > then I'll queue it to Dave for 2.6.25 (which will probably provoke
> > a net-2.6.26 and wireless-2.6.26 rebase).
>
>
> That might be easier said than done. It looks like that patch depends
> on the big cfg80211 API change queued for 2.6.26.
>
Yes, that's correct. The patch doesn't apply cleanly to -rc3 and,
having inspected it, it looks too complicated for me to hack in.
> Stefano offered to rebase that on 2.6.25. Stefano, could you post
> that as part of this thread?
>
That would be very helpful.
Thanks
> Thanks,
>
>
> John
> --
> John W. Linville
> [email protected]
>
--
Beauty is in the eye of the beerholder.
On Wed, 27 Feb 2008 12:25:46 -0500
"John W. Linville" <[email protected]> wrote:
> That might be easier said than done. It looks like that patch depends
> on the big cfg80211 API change queued for 2.6.26.
>
> Stefano offered to rebase that on 2.6.25. Stefano, could you post
> that as part of this thread?
Sorry for the delay. This is based on 2.6.25-rc3. Please test.
---
Merge rate_control_pid_shift_adjust() to rate_control_pid_adjust_rate()
in order to make the learning algorithm aware of constraints on rates. Also
add some comments and rename variables.
This fixes a bug which prevented 802.11b/g non-AP STAs from working with
802.11b only AP STAs.
Signed-off-by: Stefano Brivio <[email protected]>
---
Index: linux-2.6.24/net/mac80211/rc80211_pid_algo.c
===================================================================
--- linux-2.6.24.orig/net/mac80211/rc80211_pid_algo.c
+++ linux-2.6.24/net/mac80211/rc80211_pid_algo.c
@@ -2,7 +2,7 @@
* Copyright 2002-2005, Instant802 Networks, Inc.
* Copyright 2005, Devicescape Software, Inc.
* Copyright 2007, Mattias Nissler <[email protected]>
- * Copyright 2007, Stefano Brivio <[email protected]>
+ * Copyright 2007-2008, Stefano Brivio <[email protected]>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
@@ -63,72 +63,66 @@
* RC_PID_ARITH_SHIFT.
*/
-
-/* Shift the adjustment so that we won't switch to a lower rate if it exhibited
- * a worse failed frames behaviour and we'll choose the highest rate whose
- * failed frames behaviour is not worse than the one of the original rate
- * target. While at it, check that the adjustment is within the ranges. Then,
- * provide the new rate index. */
-static int rate_control_pid_shift_adjust(struct rc_pid_rateinfo *r,
- int adj, int cur, int l)
-{
- int i, j, k, tmp;
-
- j = r[cur].rev_index;
- i = j + adj;
-
- if (i < 0)
- return r[0].index;
- if (i >= l - 1)
- return r[l - 1].index;
-
- tmp = i;
-
- if (adj < 0) {
- for (k = j; k >= i; k--)
- if (r[k].diff <= r[j].diff)
- tmp = k;
- } else {
- for (k = i + 1; k + i < l; k++)
- if (r[k].diff <= r[i].diff)
- tmp = k;
- }
-
- return r[tmp].index;
-}
-
+/* Adjust the rate while ensuring that we won't switch to a lower rate if it
+ * exhibited a worse failed frames behaviour and we'll choose the highest rate
+ * whose failed frames behaviour is not worse than the one of the original rate
+ * target. While at it, check that the new rate is valid. */
static void rate_control_pid_adjust_rate(struct ieee80211_local *local,
struct sta_info *sta, int adj,
struct rc_pid_rateinfo *rinfo)
{
struct ieee80211_sub_if_data *sdata;
struct ieee80211_hw_mode *mode;
- int newidx;
- int maxrate;
- int back = (adj > 0) ? 1 : -1;
+ int cur_sorted, new_sorted, probe, tmp, n_bitrates;
+ int cur = sta->txrate;
sdata = IEEE80211_DEV_TO_SUB_IF(sta->dev);
mode = local->oper_hw_mode;
- maxrate = sdata->bss ? sdata->bss->max_ratectrl_rateidx : -1;
+ n_bitrates = mode->num_rates;
- newidx = rate_control_pid_shift_adjust(rinfo, adj, sta->txrate,
- mode->num_rates);
+ /* Map passed arguments to sorted values. */
+ cur_sorted = rinfo[cur].rev_index;
+ new_sorted = cur_sorted + adj;
+
+ /* Check limits. */
+ if (new_sorted < 0)
+ new_sorted = rinfo[0].rev_index;
+ else if (new_sorted >= n_bitrates)
+ new_sorted = rinfo[n_bitrates - 1].rev_index;
- while (newidx != sta->txrate) {
- if (rate_supported(sta, mode, newidx) &&
- (maxrate < 0 || newidx <= maxrate)) {
- sta->txrate = newidx;
- break;
- }
+ tmp = new_sorted;
- newidx += back;
+ if (adj < 0) {
+ /* Ensure that the rate decrease isn't disadvantageous. */
+ for (probe = cur_sorted; probe >= new_sorted; probe--)
+ if (rinfo[probe].diff <= rinfo[cur_sorted].diff &&
+ rate_supported(sta, mode, rinfo[probe].index))
+ tmp = probe;
+ } else {
+ /* Look for rate increase with zero (or below) cost. */
+ for (probe = new_sorted + 1; probe < n_bitrates; probe++)
+ if (rinfo[probe].diff <= rinfo[new_sorted].diff &&
+ rate_supported(sta, mode, rinfo[probe].index))
+ tmp = probe;
}
+ /* Fit the rate found to the nearest supported rate. */
+ do {
+ if (rate_supported(sta, mode, rinfo[tmp].index)) {
+ sta->txrate = rinfo[tmp].index;
+ break;
+ }
+ if (adj < 0)
+ tmp--;
+ else
+ tmp++;
+ } while (tmp < n_bitrates && tmp >= 0);
+
#ifdef CONFIG_MAC80211_DEBUGFS
rate_control_pid_event_rate_change(
&((struct rc_pid_sta_info *)sta->rate_ctrl_priv)->events,
- newidx, mode->rates[newidx].rate);
+ cur, mode->rates[cur].rate);
#endif
}
--
Ciao
Stefano
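
As a reading aid for the patch above, here is a simplified, standalone
sketch of its last step only: clamping the adjusted index and then fitting
it to the nearest supported rate in the direction of the adjustment. It
deliberately leaves out the diff-based probing, and the struct and function
names are hypothetical stand-ins, not mac80211 types.

```c
/* Hypothetical rate table entry; the table is sorted by bitrate. */
struct rate_entry {
	int bitrate;   /* in units of 100 kbit/s */
	int supported; /* non-zero if the peer accepts this rate */
};

/* Pick a new index from cur + adj, clamped to [0, n - 1], then walk in
 * the direction of the adjustment until a supported rate is found.
 * Returns cur unchanged if no supported rate lies in that direction. */
static int adjust_rate(const struct rate_entry *r, int n, int cur, int adj)
{
	int idx = cur + adj;
	int step = (adj < 0) ? -1 : 1;

	if (idx < 0)
		idx = 0;
	else if (idx >= n)
		idx = n - 1;

	for (; idx >= 0 && idx < n; idx += step)
		if (r[idx].supported)
			return idx;

	return cur;
}
```

With a b/g rate table in which only the 802.11b rates are marked as
supported, an upward adjustment from 5.5 Mbit/s skips the unsupported OFDM
rates and lands on 11 Mbit/s, and an upward adjustment from 11 Mbit/s leaves
the rate where it is instead of selecting an invalid index.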
On 02/03/2008, Stefano Brivio <[email protected]> wrote:
> On Wed, 27 Feb 2008 12:25:46 -0500
> "John W. Linville" <[email protected]> wrote:
>
> > That might be easier said than done. It looks like that patch depends
> > on the big cfg80211 API change queued for 2.6.26.
> >
> > Stefano offered to rebase that on 2.6.25. Stefano, could you post
> > that as part of this thread?
>
>
> Sorry for the delay. This is based on 2.6.25-rc3. Please test.
>
I've tested this with the pid algorithm selected and my wireless
network connection is reliable. In a loop, I repeatedly ftp'd a kernel
source tarball from another box on my network for 40 minutes with no
failure, whilst at the same time, pinging my gateway. Without the
patch, that activity would have led to network failure in seconds. The
tests were with the patch applied to 2.6.25-rc3-git3, so that Ivo's
rt2x00 patches for 2.6.25 are also applied.
Thanks to everyone who has helped fix this.
> ---
>
> Merge rate_control_pid_shift_adjust() to rate_control_pid_adjust_rate()
> in order to make the learning algorithm aware of constraints on rates. Also
> add some comments and rename variables.
>
> This fixes a bug which prevented 802.11b/g non-AP STAs from working with
> 802.11b only AP STAs.
>
> Signed-off-by: Stefano Brivio <[email protected]>
Tested-by: Chris Clayton <[email protected]>
> [...]
--
Beauty is in the eye of the beerholder.