2009-12-04 23:31:59

by Andi Kleen

[permalink] [raw]
Subject: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

[Rafael, something for your regression list]

I upgraded a system which was running 2.6.30 to 2.6.32.
It has a

06:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI

wireless PCI card. Now regularly even under moderate traffic
I see messages like

phy0 -> rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2.
Please file bug report to http://rt2x00.serialmonkey.com.

and loss of connectivity, often until the wireless connection
is restarted. This wasn't the case in 2.6.30, there the driver
ran stable without any problems. The problem currently
happens every few minutes.

-Andi
--
[email protected] -- Speaking for myself only.


2009-12-08 22:40:43

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Tue, 2009-12-08 at 23:39 +0100, Stefan Lippers-Hollmann wrote:

> Unfortunately this fails to compile for me:

well remove the spurious opening brace? :)

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-12-06 17:32:28

by Andi Kleen

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Sun, Dec 06, 2009 at 06:20:57PM +0100, Johannes Berg wrote:
> On Sun, 2009-12-06 at 18:04 +0100, Andi Kleen wrote:
> > > Both of you have reported this problem on 2.6.32.
> >
> > For me it seems like the problem went away since I disabled
> > CONFIG_CFG80211_DEFAULT_PS (that would indicate it's related
> > to power saving?)
>
> FWIW, 'iwconfig wlan0 power on/off' is an easier way to change that than
> recompiling your kernel.

I wasn't sure if they are equivalent. This stuff is all quite under
documented and I gave up tracing the code paths at some point.

-Andi

--
[email protected] -- Speaking for myself only.

2009-12-07 09:32:28

by Tim Blechmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/06/2009 02:40 PM, Gertjan van Wingerde wrote:
> On 12/05/09 00:32, Andi Kleen wrote:
>> [Rafael, something for your regression list]
>>
>> I upgraded a system which was running 2.6.30 to 2.6.32.
>> It has a
>>
>> 06:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
>>
>> wireless PCI card. Now regularly even under moderate traffic
>> I see messages like
>>
>> phy0 -> rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2.
>> Please file bug report to http://rt2x00.serialmonkey.com.
>>
>> and loss of connectivity, often until the wireless connection
>> is restarted. This wasn't the case in 2.6.30, there the driver
>> ran stable without any problems. The problem currently
>> happens every few minutes.
>>
>
> Andi, Tim,
>
> Both of you have reported this problem on 2.6.32.
>
> In the past this always has occurred due to queue locking problems. This led me
> to audit the queue locking code, and that certainly looked suspicious to me.
>
> Would you be able to test whether the attached test patch fixes the problem for
> you. The patch basically applies proper queue locking to the code, although in
> a very course manner. The patch is relative to 2.6.32.
>
> Note: I am not 100% sure that this is where the problem is, but at least the test
> patch should confirm whether I am searching in the right direction.

the patch didn't cleanly apply to v2.6.32 and the patched kernel didn't
boot ... which tree should it be applied on?

tim

--
[email protected]
http://tim.klingt.org

Im übrigen ist es gescheiter, sich warm zuzudecken als sich zu
betrinken.
Werner Schwab


Attachments:
signature.asc (197.00 B)
OpenPGP digital signature

2009-12-07 23:06:23

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

Hi

On Monday 07 December 2009, Gertjan van Wingerde wrote:
> On 12/05/09 00:32, Andi Kleen wrote:
> > [Rafael, something for your regression list]
> >
> > I upgraded a system which was running 2.6.30 to 2.6.32.
> > It has a
> >
> > 06:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
> >
> > wireless PCI card. Now regularly even under moderate traffic
> > I see messages like
> >
> > phy0 -> rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2.
> > Please file bug report to http://rt2x00.serialmonkey.com.
> >
> > and loss of connectivity, often until the wireless connection
> > is restarted. This wasn't the case in 2.6.30, there the driver
> > ran stable without any problems. The problem currently
> > happens every few minutes.

I can reproduce the problem with 2.6.32 and rt61pci.

rt73usb and rt2500usb seem to be fine.

> Andi, Tim,
>
> Both of you have reported this problem on 2.6.32.
>
> In the past this always has occurred due to queue locking problems. This led me
> to audit the queue locking code, and that certainly looked suspicious to me.
>
> Would you be able to test whether the attached test patch fixes the problem for
> you. The patch basically applies proper queue locking to the code, although in
> a very course manner. The patch is relative to 2.6.32.
>
> Note: I am not 100% sure that this is where the problem is, but at least the test
> patch should confirm whether I am searching in the right direction.

This patch doesn't seem to change this particular issue, disabling
powersaving however makes rt62pci work again.

Regards
Stefan Lippers-Hollmann

2009-12-08 22:39:13

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

Hi

On Tuesday 08 December 2009, Gertjan van Wingerde wrote:
> On 12/08/09 11:19, Gertjan van Wingerde wrote:
> > On Tue, Dec 8, 2009 at 10:57 AM, Andi Kleen <[email protected]> wrote:
> >>> This patch doesn't seem to change this particular issue, disabling
> >>> powersaving however makes rt62pci work again.
> >>
> >> Yes confirmed here. Haven't seen it in a day or so with disabled power
> >> saving.
> >>
> >> However I haven't tried the patch with power saving enabled.
> >
> > OK.
> > I think Stefan said that the patch didn't help for this issue.
> >
> > However, I've got an other hunch (based on the Ralink driver). Maybe
> > we shouldn't go to sleep when still TX entries are present. I'll cook
> > up a patch tonight.
> >
>
> OK. Please try again with the attached patch. Basically it disables going
> to sleep when there are still unacknowledged frames in the TX queues.
> This mirrors the behavior of the Ralink provided drivers.

Unfortunately this fails to compile for me:

drivers/net/wireless/rt2x00/rt61pci.c: In function 'rt61pci_config_ps':
drivers/net/wireless/rt2x00/rt61pci.c:999: error: invalid storage class for function 'rt61pci_config'
drivers/net/wireless/rt2x00/rt61pci.c:996: warning: ISO C90 forbids mixed declarations and code
drivers/net/wireless/rt2x00/rt61pci.c:1020: error: invalid storage class for function 'rt61pci_link_stats'
drivers/net/wireless/rt2x00/rt61pci.c:1038: error: invalid storage class for function 'rt61pci_set_vgc'
drivers/net/wireless/rt2x00/rt61pci.c:1048: error: invalid storage class for function 'rt61pci_reset_tuner'
drivers/net/wireless/rt2x00/rt61pci.c:1054: error: invalid storage class for function 'rt61pci_link_tuner'
drivers/net/wireless/rt2x00/rt61pci.c:1145: error: invalid storage class for function 'rt61pci_get_firmware_name'
drivers/net/wireless/rt2x00/rt61pci.c:1168: error: invalid storage class for function 'rt61pci_check_firmware'
drivers/net/wireless/rt2x00/rt61pci.c:1197: error: invalid storage class for function 'rt61pci_load_firmware'
drivers/net/wireless/rt2x00/rt61pci.c:1284: error: invalid storage class for function 'rt61pci_get_entry_state'
drivers/net/wireless/rt2x00/rt61pci.c:1301: error: invalid storage class for function 'rt61pci_clear_entry'
drivers/net/wireless/rt2x00/rt61pci.c:1324: error: invalid storage class for function 'rt61pci_init_queues'
drivers/net/wireless/rt2x00/rt61pci.c:1406: error: invalid storage class for function 'rt61pci_init_registers'
drivers/net/wireless/rt2x00/rt61pci.c:1550: error: invalid storage class for function 'rt61pci_wait_bbp_ready'
drivers/net/wireless/rt2x00/rt61pci.c:1566: error: invalid storage class for function 'rt61pci_init_bbp'
drivers/net/wireless/rt2x00/rt61pci.c:1618: error: invalid storage class for function 'rt61pci_toggle_rx'
drivers/net/wireless/rt2x00/rt61pci.c:1630: error: invalid storage class for function 'rt61pci_toggle_irq'
drivers/net/wireless/rt2x00/rt61pci.c:1670: error: invalid storage class for function 'rt61pci_enable_radio'
drivers/net/wireless/rt2x00/rt61pci.c:1692: error: invalid storage class for function 'rt61pci_disable_radio'
drivers/net/wireless/rt2x00/rt61pci.c:1700: error: invalid storage class for function 'rt61pci_set_state'
drivers/net/wireless/rt2x00/rt61pci.c:1730: error: invalid storage class for function 'rt61pci_set_device_state'
drivers/net/wireless/rt2x00/rt61pci.c:1774: error: invalid storage class for function 'rt61pci_write_tx_desc'
drivers/net/wireless/rt2x00/rt61pci.c:1855: error: invalid storage class for function 'rt61pci_write_beacon'
drivers/net/wireless/rt2x00/rt61pci.c:1889: error: invalid storage class for function 'rt61pci_kick_tx_queue'
drivers/net/wireless/rt2x00/rt61pci.c:1919: error: invalid storage class for function 'rt61pci_kill_tx_queue'
drivers/net/wireless/rt2x00/rt61pci.c:1939: error: invalid storage class for function 'rt61pci_agc_to_rssi'
drivers/net/wireless/rt2x00/rt61pci.c:1968: error: invalid storage class for function 'rt61pci_fill_rxdone'
drivers/net/wireless/rt2x00/rt61pci.c:2037: error: invalid storage class for function 'rt61pci_txdone'
drivers/net/wireless/rt2x00/rt61pci.c:2131: error: invalid storage class for function 'rt61pci_interrupt'
drivers/net/wireless/rt2x00/rt61pci.c:2184: error: invalid storage class for function 'rt61pci_validate_eeprom'
drivers/net/wireless/rt2x00/rt61pci.c:2295: error: invalid storage class for function 'rt61pci_init_eeprom'
drivers/net/wireless/rt2x00/rt61pci.c:2542: error: invalid storage class for function 'rt61pci_probe_hw_mode'
drivers/net/wireless/rt2x00/rt61pci.c:2606: error: invalid storage class for function 'rt61pci_probe_hw'
drivers/net/wireless/rt2x00/rt61pci.c:2659: error: invalid storage class for function 'rt61pci_conf_tx'
drivers/net/wireless/rt2x00/rt61pci.c:2715: error: invalid storage class for function 'rt61pci_get_tsf'
drivers/net/wireless/rt2x00/rt61pci.c:2740: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2740: error: (near initialization for 'rt61pci_mac80211_ops.conf_tx')
drivers/net/wireless/rt2x00/rt61pci.c:2742: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2742: error: (near initialization for 'rt61pci_mac80211_ops.get_tsf')
drivers/net/wireless/rt2x00/rt61pci.c:2747: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2747: error: (near initialization for 'rt61pci_rt2x00_ops.irq_handler')
drivers/net/wireless/rt2x00/rt61pci.c:2748: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2748: error: (near initialization for 'rt61pci_rt2x00_ops.probe_hw')
drivers/net/wireless/rt2x00/rt61pci.c:2749: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2749: error: (near initialization for 'rt61pci_rt2x00_ops.get_firmware_name')
drivers/net/wireless/rt2x00/rt61pci.c:2750: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2750: error: (near initialization for 'rt61pci_rt2x00_ops.check_firmware')
drivers/net/wireless/rt2x00/rt61pci.c:2751: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2751: error: (near initialization for 'rt61pci_rt2x00_ops.load_firmware')
drivers/net/wireless/rt2x00/rt61pci.c:2754: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2754: error: (near initialization for 'rt61pci_rt2x00_ops.get_entry_state')
drivers/net/wireless/rt2x00/rt61pci.c:2755: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2755: error: (near initialization for 'rt61pci_rt2x00_ops.clear_entry')
drivers/net/wireless/rt2x00/rt61pci.c:2756: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2756: error: (near initialization for 'rt61pci_rt2x00_ops.set_device_state')
drivers/net/wireless/rt2x00/rt61pci.c:2758: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2758: error: (near initialization for 'rt61pci_rt2x00_ops.link_stats')
drivers/net/wireless/rt2x00/rt61pci.c:2759: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2759: error: (near initialization for 'rt61pci_rt2x00_ops.reset_tuner')
drivers/net/wireless/rt2x00/rt61pci.c:2760: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2760: error: (near initialization for 'rt61pci_rt2x00_ops.link_tuner')
drivers/net/wireless/rt2x00/rt61pci.c:2761: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2761: error: (near initialization for 'rt61pci_rt2x00_ops.write_tx_desc')
drivers/net/wireless/rt2x00/rt61pci.c:2763: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2763: error: (near initialization for 'rt61pci_rt2x00_ops.write_beacon')
drivers/net/wireless/rt2x00/rt61pci.c:2764: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2764: error: (near initialization for 'rt61pci_rt2x00_ops.kick_tx_queue')
drivers/net/wireless/rt2x00/rt61pci.c:2765: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2765: error: (near initialization for 'rt61pci_rt2x00_ops.kill_tx_queue')
drivers/net/wireless/rt2x00/rt61pci.c:2766: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2766: error: (near initialization for 'rt61pci_rt2x00_ops.fill_rxdone')
drivers/net/wireless/rt2x00/rt61pci.c:2773: error: initializer element is not constant
drivers/net/wireless/rt2x00/rt61pci.c:2773: error: (near initialization for 'rt61pci_rt2x00_ops.config')
drivers/net/wireless/rt2x00/rt61pci.c:2832: warning: 'alias' attribute ignored
drivers/net/wireless/rt2x00/rt61pci.c:2832: warning: ISO C90 forbids mixed declarations and code
drivers/net/wireless/rt2x00/rt61pci.c:2848: error: invalid storage class for function 'rt61pci_init'
drivers/net/wireless/rt2x00/rt61pci.c:2853: error: invalid storage class for function 'rt61pci_exit'
drivers/net/wireless/rt2x00/rt61pci.c:2857: error: invalid storage class for function '__inittest'
drivers/net/wireless/rt2x00/rt61pci.c:2857: warning: 'alias' attribute ignored
drivers/net/wireless/rt2x00/rt61pci.c:2858: error: invalid storage class for function '__exittest'
drivers/net/wireless/rt2x00/rt61pci.c:2858: warning: ISO C90 forbids mixed declarations and code
drivers/net/wireless/rt2x00/rt61pci.c:2858: warning: 'alias' attribute ignored
drivers/net/wireless/rt2x00/rt61pci.c:2858: error: expected declaration or statement at end of input
make[7]: *** [drivers/net/wireless/rt2x00/rt61pci.o] Error 1
make[7]: *** Waiting for unfinished jobs....

$ gcc --version
gcc (Debian 4.4.2-3) 4.4.2
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Thanks
Stefan Lippers-Hollmann

2009-12-08 22:49:25

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/08/09 23:00, Johannes Berg wrote:
> On Tue, 2009-12-08 at 22:56 +0100, Gertjan van Wingerde wrote:
>
>> Well, this is a test patch, to validate my theory that this is where the problem is.
>> If that is confirmed, then I can start working out a proper solution.
>
> Well but this patch won't help you test that theory, because it'll just
> disable PS completely if there were packets pending when mac80211 asked
> you to enable PS, afaict.

I don't think so, but that can be because of a big misunderstanding on my side on PS.
We just do not go to sleep when being asked to by mac80211, and thus are saving less
power. But awake would still work.

When quickly testing the patch I did see multiple requests for enabling PS come in
which were denied because of this patch. So I don't think that anything is disabled
with respect to PS.

>
>> How does mac80211 look at these things anyway. How does it expect the driver to handle
>> the cases where mac80211 asks the driver to go to sleep because of power saving, and
>> the driver can't because its queues haven't drained yet?
>
> mac80211 does expect you to be able to transmit while asleep. Same
> happens when PS-polling, where it actually expects you to wait for the
> response frame too. I think. I haven't looked at this in a while.

OK. If that is the case then it may very well be that the PS implementation of rt2x00
is completely broken, as I cannot find any code to wake up if any TX frames come in.

Maybe the safest thing to do would be to disable powersaving from the driver until
we have figured out what the proper implementation of it is for rt2x00.

---
Gertjan.


2009-12-08 21:44:57

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Tue, 2009-12-08 at 22:42 +0100, Gertjan van Wingerde wrote:

> OK. Please try again with the attached patch. Basically it disables going
> to sleep when there are still unacknowledged frames in the TX queues.
> This mirrors the behavior of the Ralink provided drivers.

Kinda looks like you forgot to actually make it go to sleep once the
queues are drained though?

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-12-08 22:00:38

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Tue, 2009-12-08 at 22:56 +0100, Gertjan van Wingerde wrote:

> Well, this is a test patch, to validate my theory that this is where the problem is.
> If that is confirmed, then I can start working out a proper solution.

Well but this patch won't help you test that theory, because it'll just
disable PS completely if there were packets pending when mac80211 asked
you to enable PS, afaict.

> How does mac80211 look at these things anyway. How does it expect the driver to handle
> the cases where mac80211 asks the driver to go to sleep because of power saving, and
> the driver can't because its queues haven't drained yet?

mac80211 does expect you to be able to transmit while asleep. Same
happens when PS-polling, where it actually expects you to wait for the
response frame too. I think. I haven't looked at this in a while.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-12-06 17:03:57

by Andi Kleen

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

> Both of you have reported this problem on 2.6.32.

For me it seems like the problem went away since I disabled
CONFIG_CFG80211_DEFAULT_PS (that would indicate it's related
to power saving?)

That is I'm watching it further, but previously the problem
happened every few minutes and now it already works without hickup
for a few hours.

I'll try a positive test (reenabling it) later and also
see if your patch makes a difference. Thanks.

-Andi

2009-12-07 21:58:04

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/07/09 10:31, Tim Blechmann wrote:
> On 12/06/2009 02:40 PM, Gertjan van Wingerde wrote:
>> On 12/05/09 00:32, Andi Kleen wrote:
>>> [Rafael, something for your regression list]
>>>
>>> I upgraded a system which was running 2.6.30 to 2.6.32.
>>> It has a
>>>
>>> 06:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
>>>
>>> wireless PCI card. Now regularly even under moderate traffic
>>> I see messages like
>>>
>>> phy0 -> rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2.
>>> Please file bug report to http://rt2x00.serialmonkey.com.
>>>
>>> and loss of connectivity, often until the wireless connection
>>> is restarted. This wasn't the case in 2.6.30, there the driver
>>> ran stable without any problems. The problem currently
>>> happens every few minutes.
>>>
>>
>> Andi, Tim,
>>
>> Both of you have reported this problem on 2.6.32.
>>
>> In the past this always has occurred due to queue locking problems. This led me
>> to audit the queue locking code, and that certainly looked suspicious to me.
>>
>> Would you be able to test whether the attached test patch fixes the problem for
>> you. The patch basically applies proper queue locking to the code, although in
>> a very course manner. The patch is relative to 2.6.32.
>>
>> Note: I am not 100% sure that this is where the problem is, but at least the test
>> patch should confirm whether I am searching in the right direction.
>
> the patch didn't cleanly apply to v2.6.32 and the patched kernel didn't
> boot ... which tree should it be applied on?
>

Yeah, I knew it didn't apply cleanly, as it was prepared on latest wireless-testing and
then stripped of unnecessary things. Find an updated patch attached. This one is completely
the same, only line numbers of where to apply changed.

I can't reproduce any boot failures with the patch applied, although for me the rt2x00
drivers are compiled as modules. Can you provide some more information on the boot failure
that you see?

---
Gertjan.


Attachments:
2.6.32-tx_entry_not_free.diff2 (7.41 kB)

2009-12-08 21:42:24

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/08/09 11:19, Gertjan van Wingerde wrote:
> On Tue, Dec 8, 2009 at 10:57 AM, Andi Kleen <[email protected]> wrote:
>>> This patch doesn't seem to change this particular issue, disabling
>>> powersaving however makes rt62pci work again.
>>
>> Yes confirmed here. Haven't seen it in a day or so with disabled power
>> saving.
>>
>> However I haven't tried the patch with power saving enabled.
>
> OK.
> I think Stefan said that the patch didn't help for this issue.
>
> However, I've got an other hunch (based on the Ralink driver). Maybe
> we shouldn't go to sleep when still TX entries are present. I'll cook
> up a patch tonight.
>

OK. Please try again with the attached patch. Basically it disables going
to sleep when there are still unacknowledged frames in the TX queues.
This mirrors the behavior of the Ralink provided drivers.

---
Gertjan.


Attachments:
2.6.32-powersave-fix.diff (777.00 B)

2009-12-08 21:56:15

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/08/09 22:44, Johannes Berg wrote:
> On Tue, 2009-12-08 at 22:42 +0100, Gertjan van Wingerde wrote:
>
>> OK. Please try again with the attached patch. Basically it disables going
>> to sleep when there are still unacknowledged frames in the TX queues.
>> This mirrors the behavior of the Ralink provided drivers.
>
> Kinda looks like you forgot to actually make it go to sleep once the
> queues are drained though?
>

Well, this is a test patch, to validate my theory that this is where the problem is.
If that is confirmed, then I can start working out a proper solution.

How does mac80211 look at these things anyway. How does it expect the driver to handle
the cases where mac80211 asks the driver to go to sleep because of power saving, and
the driver can't because its queues haven't drained yet?

---
Gertjan.


2009-12-09 23:49:50

by Tim Blechmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/10/2009 12:23 AM, Gertjan van Wingerde wrote:
> On 12/09/09 23:59, Tim Blechmann wrote:
>>> I can't reproduce any boot failures with the patch applied, although for me the rt2x00
>>> drivers are compiled as modules. Can you provide some more information on the boot failure
>>> that you see?
>>
>> i am pretty sure, that i am experiencing the same issue as described
>> here [1] with stack trace ...
>>
>> [1] http://lkml.org/lkml/2009/12/8/207
>>
>
> OK. Can you then test with powersaving disabled?
>
> Just do iwconfig wlanX power off on the interface.

difficult, if i cannot boot a kernel ... i suspect, there are some other
issues, unrelated to the rt61pci driver ... i haven't been able to boot
-rc5 (a version i booted before), -rc8 and -final, one backtrace similar
to the other mail (which appears quite unrelated), another one from
within the yenta socket module :/ ... and sometimes the screen was
flooded by backtraces ... i am not really fluent with the sysrq keys, so
i cannot provide details at the moment ...

disabling the powersaving at compile time didn't help, but i maybe it
would, if i could boot a kernel :/

tim

--
[email protected]
http://tim.klingt.org

art is short - life is long
Jack Kerouac


Attachments:
signature.asc (197.00 B)
OpenPGP digital signature

2009-12-08 22:44:03

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/08/09 23:40, Johannes Berg wrote:
> On Tue, 2009-12-08 at 23:39 +0100, Stefan Lippers-Hollmann wrote:
>
>> Unfortunately this fails to compile for me:
>
> well remove the spurious opening brace? :)
>

Correct. That will teach me to make last minute edits without recompiling :-(

Corrected patch attached.

---
Gertjan.


Attachments:
2.6.32-powersave-fix.diff (775.00 B)

2009-12-08 23:53:21

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

Hi

On Tuesday 08 December 2009, Johannes Berg wrote:
> On Tue, 2009-12-08 at 23:39 +0100, Stefan Lippers-Hollmann wrote:
>
> > Unfortunately this fails to compile for me:
>
> well remove the spurious opening brace? :)

yes, of course - but I have to admit that I didn't look too well at the
patch, after your other subthread - sorry about the noise.


with the patch applied (powersaving enabled):

- Downloading a larger file and therefore keeping the interface busy works
now pretty well, while I did hit connection time outs before with
powersaving enabled:
Fetched 89.7MB in 30s (2926kB/s)

- pinging rt61pci results in rather unsteady response times (packet loss
can happen, but rarely), which might be a side effect of the card trying
to sleep.
# ping rt61
PING rt61 (192.168.0.71): 56 data bytes
64 bytes from 192.168.0.71: seq=0 ttl=64 time=1016.131 ms
64 bytes from 192.168.0.71: seq=1 ttl=64 time=17.574 ms
64 bytes from 192.168.0.71: seq=2 ttl=64 time=906.219 ms
64 bytes from 192.168.0.71: seq=3 ttl=64 time=1.499 ms
64 bytes from 192.168.0.71: seq=4 ttl=64 time=801.945 ms
64 bytes from 192.168.0.71: seq=5 ttl=64 time=1.498 ms
64 bytes from 192.168.0.71: seq=6 ttl=64 time=644.036 ms
^C
--- rt61 ping statistics ---
7 packets transmitted, 7 packets received, 0% packet loss
round-trip min/avg/max = 1.498/484.128/1016.131 ms

- connecting to the rt61pci through ssh now works again, but typing exposes
pretty high latencies whenever the interface is otherwise idle - I'm not
sure if this is to be expected with powersaving enabled, though.
$ while true; do time ssh rt61 exit; sleep 5; done

real 0m1.867s
user 0m0.040s
sys 0m0.003s

real 0m0.997s
user 0m0.042s
sys 0m0.005s

real 0m0.214s
user 0m0.044s
sys 0m0.003s
^C
while the latency is much better when the interface is busy (downloading
a large file) and therefore can't power down:
$ while true; do time ssh rt61 exit; sleep 5; done

real 0m0.395s
user 0m0.041s
sys 0m0.003s

real 0m0.491s
user 0m0.043s
sys 0m0.001s

real 0m0.518s
user 0m0.044s
sys 0m0.000s
^C



without the patch applied (powersaving enabled):

- Downloading a larger file in a loop times out pretty often and doesn't
reach expected throughputs (very unsteady), if the interface is kept busy
(ping in parallel), the situation improves significantly.

- pinging rt61pci exposes large latencies and exposes significant packet loss
# ping rt61
PING rt61 (192.168.0.71): 56 data bytes
64 bytes from 192.168.0.71: seq=3 ttl=64 time=6.415 ms
64 bytes from 192.168.0.71: seq=6 ttl=64 time=999.531 ms
64 bytes from 192.168.0.71: seq=7 ttl=64 time=2.695 ms
64 bytes from 192.168.0.71: seq=8 ttl=64 time=1.179 ms
64 bytes from 192.168.0.71: seq=9 ttl=64 time=0.974 ms
64 bytes from 192.168.0.71: seq=13 ttl=64 time=8650.425 ms
64 bytes from 192.168.0.71: seq=14 ttl=64 time=7653.903 ms
64 bytes from 192.168.0.71: seq=15 ttl=64 time=6653.707 ms
64 bytes from 192.168.0.71: seq=16 ttl=64 time=5653.463 ms
64 bytes from 192.168.0.71: seq=17 ttl=64 time=4653.132 ms
64 bytes from 192.168.0.71: seq=18 ttl=64 time=3652.887 ms
64 bytes from 192.168.0.71: seq=19 ttl=64 time=2652.628 ms
64 bytes from 192.168.0.71: seq=20 ttl=64 time=1652.377 ms
64 bytes from 192.168.0.71: seq=21 ttl=64 time=652.126 ms
^C
--- rt61 ping statistics ---
23 packets transmitted, 14 packets received, 39% packet loss
round-trip min/avg/max = 0.974/3063.245/8650.425 ms

- ssh connections are affected accordingly and basically unusable.
$ while true; do time ssh rt61 exit; sleep 5; done

real 0m5.242s
user 0m0.044s
sys 0m0.002s

real 0m1.682s
user 0m0.043s
sys 0m0.001s

real 0m5.797s
user 0m0.044s
sys 0m0.002s
^C



without the patch applied (powersaving disabled, see patch below which
unfortunately disables powersaving for all rt2x00 devices, even though
rt2500usb and rt73usb don't seem to be affected):

--- a/drivers/net/wireless/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
@@ -837,6 +837,15 @@ int rt2x00lib_probe_dev(struct rt2x00_de
BIT(NL80211_IFTYPE_WDS);

/*
+ * Disable power saving for the time being, it breaks at least:
+ * - rt61pci
+ * while it isn't necessary for:
+ * - rt2500usb
+ * - rt73usb
+ */
+ rt2x00dev->hw->wiphy->ps_default = false;
+
+ /*
* Let the driver probe the device to detect the capabilities.
*/
retval = rt2x00dev->ops->lib->probe_hw(rt2x00dev);

- Downloading a larger file in a loop works as expected, with steady
results - independent of teh interface being idle or busy:
Fetched 89.7MB in 30s (2903kB/s)

- ping times are good, no packet loss:
# ping rt61
PING rt61 (192.168.0.71): 56 data bytes
64 bytes from 192.168.0.71: seq=0 ttl=64 time=1.267 ms
64 bytes from 192.168.0.71: seq=1 ttl=64 time=1.208 ms
64 bytes from 192.168.0.71: seq=2 ttl=64 time=1.197 ms
64 bytes from 192.168.0.71: seq=3 ttl=64 time=1.106 ms
64 bytes from 192.168.0.71: seq=4 ttl=64 time=1.123 ms
^C
--- rt61 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 1.106/1.180/1.267 ms

- ssh connections are snappy as expected:
$ while true; do time ssh rt61 exit; sleep 5; done

real 0m0.188s
user 0m0.042s
sys 0m0.003s

real 0m0.145s
user 0m0.041s
sys 0m0.004s

real 0m0.146s
user 0m0.041s
sys 0m0.003s
^C

Regards
Stefan Lippers-Hollmann

2009-12-09 22:59:30

by Tim Blechmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

> I can't reproduce any boot failures with the patch applied, although for me the rt2x00
> drivers are compiled as modules. Can you provide some more information on the boot failure
> that you see?

i am pretty sure, that i am experiencing the same issue as described
here [1] with stack trace ...

[1] http://lkml.org/lkml/2009/12/8/207

--
[email protected]
http://tim.klingt.org

art is short - life is long
Jack Kerouac


Attachments:
signature.asc (197.00 B)
OpenPGP digital signature

2009-12-06 17:21:21

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Sun, 2009-12-06 at 18:04 +0100, Andi Kleen wrote:
> > Both of you have reported this problem on 2.6.32.
>
> For me it seems like the problem went away since I disabled
> CONFIG_CFG80211_DEFAULT_PS (that would indicate it's related
> to power saving?)

FWIW, 'iwconfig wlan0 power on/off' is an easier way to change that than
recompiling your kernel.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-12-09 08:24:54

by Johannes Berg

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Wed, 2009-12-09 at 07:58 +0100, Gertjan van Wingerde wrote:

> > - pinging rt61pci results in rather unsteady response times (packet loss
> > can happen, but rarely), which might be a side effect of the card trying
> > to sleep.
> > # ping rt61
> > PING rt61 (192.168.0.71): 56 data bytes
> > 64 bytes from 192.168.0.71: seq=0 ttl=64 time=1016.131 ms
> > 64 bytes from 192.168.0.71: seq=1 ttl=64 time=17.574 ms
> > 64 bytes from 192.168.0.71: seq=2 ttl=64 time=906.219 ms
> > 64 bytes from 192.168.0.71: seq=3 ttl=64 time=1.499 ms

> My guess is that this is due to powersaving being enabled, where mac80211 and
> the device go in powersaving mode and then it takes a while before packets
> reach the device. I don't think this signifies a bug of some kind.
>
> Johannes, is this correct expected behavior of powersaving?

Well, it shouldn't be quite that high, unless your beacon interval is
1000 TU.

johannes


Attachments:
signature.asc (801.00 B)
This is a digitally signed message part

2009-12-09 06:58:12

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/09/09 00:53, Stefan Lippers-Hollmann wrote:
> Hi
>
> On Tuesday 08 December 2009, Johannes Berg wrote:
>> On Tue, 2009-12-08 at 23:39 +0100, Stefan Lippers-Hollmann wrote:
>>
>>> Unfortunately this fails to compile for me:
>>
>> well remove the spurious opening brace? :)
>
> yes, of course - but I have to admit that I didn't look too well at the
> patch, after your other subthread - sorry about the noise.
>

Yeah, sorry about the compile error in the patch.

>
> with the patch applied (powersaving enabled):
>
> - Downloading a larger file and therefore keeping the interface busy works
> now pretty well, while I did hit connection time outs before with
> powersaving enabled:
> Fetched 89.7MB in 30s (2926kB/s)
>

Does this mean that the "Arrived at non-free entry" messages are gone with
power-saving enabled?

> - pinging rt61pci results in rather unsteady response times (packet loss
> can happen, but rarely), which might be a side effect of the card trying
> to sleep.
> # ping rt61
> PING rt61 (192.168.0.71): 56 data bytes
> 64 bytes from 192.168.0.71: seq=0 ttl=64 time=1016.131 ms
> 64 bytes from 192.168.0.71: seq=1 ttl=64 time=17.574 ms
> 64 bytes from 192.168.0.71: seq=2 ttl=64 time=906.219 ms
> 64 bytes from 192.168.0.71: seq=3 ttl=64 time=1.499 ms
> 64 bytes from 192.168.0.71: seq=4 ttl=64 time=801.945 ms
> 64 bytes from 192.168.0.71: seq=5 ttl=64 time=1.498 ms
> 64 bytes from 192.168.0.71: seq=6 ttl=64 time=644.036 ms
> ^C
> --- rt61 ping statistics ---
> 7 packets transmitted, 7 packets received, 0% packet loss
> round-trip min/avg/max = 1.498/484.128/1016.131 ms
>

My guess is that this is due to powersaving being enabled, where mac80211 and
the device go in powersaving mode and then it takes a while before packets
reach the device. I don't think this signifies a bug of some kind.

Johannes, is this correct expected behavior of powersaving?

<snip>

>
> without the patch applied (powersaving disabled, see patch below which
> unfortunately disables powersaving for all rt2x00 devices, even though
> rt2500usb and rt73usb don't seem to be affected):
>
> --- a/drivers/net/wireless/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/rt2x00/rt2x00dev.c
> @@ -837,6 +837,15 @@ int rt2x00lib_probe_dev(struct rt2x00_de
> BIT(NL80211_IFTYPE_WDS);
>
> /*
> + * Disable power saving for the time being, it breaks at least:
> + * - rt61pci
> + * while it isn't necessary for:
> + * - rt2500usb
> + * - rt73usb
> + */
> + rt2x00dev->hw->wiphy->ps_default = false;
> +
> + /*
> * Let the driver probe the device to detect the capabilities.
> */IEEE80211_HW_PS_NULLFUNC_STACK
> retval = rt2x00dev->ops->lib->probe_hw(rt2x00dev);
>

This can be done smarter by removing the IEEE80211_HW_SUPPORTS_PS and
IEEE80211_HW_PS_NULLFUNC_STACK flags from the driver initialization.
This can be done on a per driver basis.

BTW, it is expected that rt2500usb and rt73usb are not affected by the
originally reported problem, as their queue handling is completely different
from the PCI drivers.
Do you know if they expose the same unsteady ping response time behavior?

---
Gertjan.

2009-12-10 00:46:07

by Stefan Lippers-Hollmann

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

Hi

On Thursday 10 December 2009, Gertjan van Wingerde wrote:
> On 12/09/09 00:53, Stefan Lippers-Hollmann wrote:
> > Hi
> >
> > On Tuesday 08 December 2009, Johannes Berg wrote:
> >> On Tue, 2009-12-08 at 23:39 +0100, Stefan Lippers-Hollmann wrote:
> >>
> >>> Unfortunately this fails to compile for me:
> >>
> >> well remove the spurious opening brace? :)
> >
> > yes, of course - but I have to admit that I didn't look too well at the
> > patch, after your other subthread - sorry about the noise.
> >
>
> Yeah, sorry about the compile error in the patch.
>
> >
> > with the patch applied (powersaving enabled):
> >
> > - Downloading a larger file and therefore keeping the interface busy works
> > now pretty well, while I did hit connection time outs before with
> > powersaving enabled:
> > Fetched 89.7MB in 30s (2926kB/s)
> >
>
> Does this mean that the "Arrived at non-free entry" messages are gone with
> power-saving enabled?
[...]

I've never seen that message, neither does the system crash or freeze -
rt61pci is just almost unusable due to the packet loss and the latencies with
powersaving enabled (I don't see this problem in the same environment with
rt2500usb/ rt73usb or b43 (bcm4306, bcm4318), rtl8187b, ath5k, ar9170usb,
zd1211rw).

Regards
Stefan Lippers-Hollmann

2009-12-09 23:23:38

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/09/09 23:59, Tim Blechmann wrote:
>> I can't reproduce any boot failures with the patch applied, although for me the rt2x00
>> drivers are compiled as modules. Can you provide some more information on the boot failure
>> that you see?
>
> i am pretty sure, that i am experiencing the same issue as described
> here [1] with stack trace ...
>
> [1] http://lkml.org/lkml/2009/12/8/207
>

OK. Can you then test with powersaving disabled?

Just do iwconfig wlanX power off on the interface.

---
Gertjan.

2009-12-08 10:19:01

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On Tue, Dec 8, 2009 at 10:57 AM, Andi Kleen <[email protected]> wrote:
>> This patch doesn't seem to change this particular issue, disabling
>> powersaving however makes rt62pci work again.
>
> Yes confirmed here. Haven't seen it in a day or so with disabled power
> saving.
>
> However I haven't tried the patch with power saving enabled.

OK.
I think Stefan said that the patch didn't help for this issue.

However, I've got an other hunch (based on the Ralink driver). Maybe
we shouldn't go to sleep when still TX entries are present. I'll cook
up a patch tonight.

---
Gertjan.

2009-12-06 13:40:25

by Gertjan van Wingerde

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

On 12/05/09 00:32, Andi Kleen wrote:
> [Rafael, something for your regression list]
>
> I upgraded a system which was running 2.6.30 to 2.6.32.
> It has a
>
> 06:02.0 Network controller: RaLink RT2561/RT61 802.11g PCI
>
> wireless PCI card. Now regularly even under moderate traffic
> I see messages like
>
> phy0 -> rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2.
> Please file bug report to http://rt2x00.serialmonkey.com.
>
> and loss of connectivity, often until the wireless connection
> is restarted. This wasn't the case in 2.6.30, there the driver
> ran stable without any problems. The problem currently
> happens every few minutes.
>

Andi, Tim,

Both of you have reported this problem on 2.6.32.

In the past this always has occurred due to queue locking problems. This led me
to audit the queue locking code, and that certainly looked suspicious to me.

Would you be able to test whether the attached test patch fixes the problem for
you. The patch basically applies proper queue locking to the code, although in
a very course manner. The patch is relative to 2.6.32.

Note: I am not 100% sure that this is where the problem is, but at least the test
patch should confirm whether I am searching in the right direction.

---
Gertjan.


Attachments:
2.6.32-tx_entry_not_free.diff (7.37 kB)

2009-12-08 09:57:49

by Andi Kleen

[permalink] [raw]
Subject: Re: regression: rt2561 frequent "Arrived at non-free entry" errors in 2.6.32

> This patch doesn't seem to change this particular issue, disabling
> powersaving however makes rt62pci work again.

Yes confirmed here. Haven't seen it in a day or so with disabled power
saving.

However I haven't tried the patch with power saving enabled.

-andi

--
[email protected] -- Speaking for myself only.