2009-09-04 08:05:34

by Jon Fairbairn

Subject: AP: ath5k + hostapd occasionally sulks


I'm running hostapd 0.6.9 and currently compat-wireless-2009-08-22.

This seems to work most of the time, but occasionally (usually days
between occurrences) stops accepting connexions. Usually there's no
indication of this in the logs, but when clients try to associate they
eventually time out.

However, the most recent time this happened was when I was downloading a
large amount of data to one of the clients. I got a lot of

ath5k phy0: no further txbuf available, dropping packet

in the log and then the connexion died and wouldn't come back until I
restarted hostapd (I reloaded ath5k for good measure; I should probably
have tried without doing that...)

I have two questions:

(1) Is that buffer message something that might have been addressed
since 2009-08-22, and

(2) What can I do to track down something that happens this
infrequently? I'm running hostapd with -d, but that doesn't log anything
different when it happens.



--
Jón Fairbairn [email protected]
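
For a failure that shows up only every few days, one option is to leave a small watcher running on the kernel log so that state is captured the moment the txbuf message appears. What follows is a minimal sketch, not part of hostapd or ath5k: the names `watch` and `on_hit` are hypothetical, and the second demo line is made up.

```python
import re
from typing import Callable, Iterable

# Pattern taken from the log line quoted above; the phy number may differ.
TXBUF_RE = re.compile(r"ath5k phy\d+: no further txbuf available")


def watch(lines: Iterable[str], on_hit: Callable[[str], None]) -> int:
    """Call on_hit(line) for every line that carries the txbuf message.

    Returns the number of matching lines seen.
    """
    hits = 0
    for line in lines:
        if TXBUF_RE.search(line):
            hits += 1
            on_hit(line.rstrip("\n"))
    return hits


# Small demonstration on canned input (the second line is invented).
captured = []
demo_hits = watch(
    [
        "ath5k phy0: no further txbuf available, dropping packet\n",
        "some unrelated kernel message\n",
    ],
    captured.append,
)
```

In practice `lines` could be `sys.stdin` fed from whichever file the kernel log is written to, and `on_hit` could record a timestamp and whatever diagnostics are useful at that moment.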




2009-09-10 18:56:16

by Philip Prindeville

Subject: Re: AP: ath5k + hostapd occasionally sulks

Philip Prindeville wrote:
> Bob Copeland wrote:
>
>> On Tue, Sep 8, 2009 at 1:53 PM, Philip Prindeville
>> <[email protected]> wrote:
>>
>>
>>> I pulled compat-wireless from GIT last night (or about 1:30am mountain,
>>> really) and rebuilt a 2.6.27.29 kernel.
>>>
>>> I'm seeing a lot of:
>>>
>>> Sep 8 11:44:09 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet
>>>
>>>
>>> one every 10 seconds, in fact. This is with an Engenius EMP-8062+ card:
>>>
>>>
>> Ok, the timing information is useful. This is probably something (beacon
>> sending?) racing with the periodic calibration, which temporarily stops
>> all of the tx traffic and frees the tx buffers, then starts it all up
>> again. In short, apart from the logging this shouldn't cause any
>> problems, but we should probably disable the beacon tasklet during this
>> time.
>>
>>
>
> Alas it is causing problems. I have a Windows 7 client with an Atheros
> card (I forget which... it's the mini-PCIe card that comes with Zotac
> ION mini-itx motherboards).
>
> I either can't associate, or associate but don't get a DHCP address or
> don't pass traffic... I forget which.
>
> I can do more testing tomorrow...
>
>> If this only appeared all of a sudden in recent compat snapshots, it
>> would be useful to know the last one that worked normally.
>>
>>
>
> I could walk it backwards, I suppose... 2009-08-23 was definitely
> working with an 9K board.
>
> I've not tried it with a 5K board (I'm not at this location very often).
>

FYI: The Windows 7 box associates and runs just fine with 9K driver
with 2009-09-07. So it seems to be an issue with the 5K driver.

I'll set up a second WAP with a 5K card...

-Philip

>
>
>>> I'll probably have to reboot regularly, since this is on an embedded box
>>> with limited CF filesystem, and I can't overflow my /var partition...
>>>
>>>
>> Ouch. For now, just take it out or demote it to debug.
>>
>> As for the original problem, I don't know offhand why a large download
>> would trigger a cascade of these errors. The best way to track it down
>> is to try to come up with a case that reproduces it and sprinkle printks
>> throughout the driver, especially when we free and allocate the tx
>> buffers.
>>
>>
>>


2009-09-09 16:45:00

by Philip Prindeville

Subject: Re: AP: ath5k + hostapd occasionally sulks

Bob Copeland wrote:
> On Tue, Sep 8, 2009 at 1:53 PM, Philip Prindeville
> <[email protected]> wrote:
>
>> I pulled compat-wireless from GIT last night (or about 1:30am mountain,
>> really) and rebuilt a 2.6.27.29 kernel.
>>
>> I'm seeing a lot of:
>>
>> Sep 8 11:44:09 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet
>>
>>
>> one every 10 seconds, in fact. This is with an Engenius EMP-8062+ card:
>>
>
> Ok, the timing information is useful. This is probably something (beacon
> sending?) racing with the periodic calibration, which temporarily stops
> all of the tx traffic and frees the tx buffers, then starts it all up
> again. In short, apart from the logging this shouldn't cause any
> problems, but we should probably disable the beacon tasklet during this
> time.
>

Alas it is causing problems. I have a Windows 7 client with an Atheros
card (I forget which... it's the mini-PCIe card that comes with Zotac
ION mini-itx motherboards).

I either can't associate, or associate but don't get a DHCP address or
don't pass traffic... I forget which.

I can do more testing tomorrow...
> If this only appeared all of a sudden in recent compat snapshots, it
> would be useful to know the last one that worked normally.
>

I could walk it backwards, I suppose... 2009-08-23 was definitely
working with an 9K board.

I've not tried it with a 5K board (I'm not at this location very often).


>> I'll probably have to reboot regularly, since this is on an embedded box
>> with limited CF filesystem, and I can't overflow my /var partition...
>>
>
> Ouch. For now, just take it out or demote it to debug.
>
> As for the original problem, I don't know offhand why a large download
> would trigger a cascade of these errors. The best way to track it down
> is to try to come up with a case that reproduces it and sprinkle printks
> throughout the driver, especially when we free and allocate the tx
> buffers.
>
>


2009-09-08 17:53:18

by Philip Prindeville

Subject: Re: AP: ath5k + hostapd occasionally sulks

I pulled compat-wireless from GIT last night (or about 1:30am mountain,
really) and rebuilt a 2.6.27.29 kernel.

I'm seeing a lot of:

Sep 8 11:44:09 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet


one every 10 seconds, in fact. This is with an EnGenius EMP-8602+ card:

00:11.0 Ethernet controller: Atheros Communications Inc. AR5413 802.11abg NIC (rev 01)
Subsystem: Atheros Communications Inc. EnGenius EMP-8602 (400mw) or Compex WLM54AG
Flags: bus master, medium devsel, latency 168, IRQ 15
Memory at a0010000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [44] Power Management version 2
Kernel driver in use: ath5k
Kernel modules: ath5k



Is this a known issue? Jon posted the same question, but I didn't see an
answer to his question...

And like him, I'm also using hostapd (but I pulled a GIT snapshot at the
same time last night).

I'll probably have to reboot regularly, since this is on an embedded box
with limited CF filesystem, and I can't overflow my /var partition...

Thanks,

-Philip
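
The "one every 10 seconds" observation can be checked from the log itself by computing the gaps between successive txbuf messages. This is a rough sketch under stated assumptions: syslog timestamps carry no year, so `year` is supplied by the caller. The function name is hypothetical, and only the first sample line comes from the report above. The rest are invented for illustration.

```python
import re
from datetime import datetime

# Matches a classic syslog timestamp followed, later in the line, by the message.
LINE_RE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) .*"
    r"ath5k phy\d+: no further txbuf available"
)


def txbuf_intervals(lines, year=2009):
    """Return the gaps in seconds between consecutive txbuf messages."""
    times = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            # The year is an assumption; syslog does not record it.
            stamp = f"{year} {m.group('stamp')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]


sample = [
    "Sep 8 11:44:09 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet",
    "Sep 8 11:44:19 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet",
    "Sep 8 11:44:20 pbx user.info kernel: unrelated message",
    "Sep 8 11:44:29 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet",
]
intervals = txbuf_intervals(sample)
```

A steady gap would be consistent with a periodic trigger, while irregular gaps would point more towards traffic load.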




2009-09-10 21:42:24

by Bob Copeland

Subject: Re: AP: ath5k + hostapd occasionally sulks

On Thu, Sep 10, 2009 at 11:55:38AM -0700, Philip A. Prindeville wrote:
> FYI: The Windows 7 box associates and runs just fine with 9K driver
> with 2009-09-07. So it seems to be an issue with the 5K driver.

Okay, probably so. Multicast buffering may be broken again.

--
Bob Copeland %% http://www.bobcopeland.com


2009-09-09 13:00:32

by Bob Copeland

Subject: Re: AP: ath5k + hostapd occasionally sulks

On Tue, Sep 8, 2009 at 1:53 PM, Philip Prindeville
<[email protected]> wrote:
> I pulled compat-wireless from GIT last night (or about 1:30am mountain,
> really) and rebuilt a 2.6.27.29 kernel.
>
> I'm seeing a lot of:
>
> Sep 8 11:44:09 pbx user.err kernel: ath5k phy0: no further txbuf available, dropping packet
>
>
> one every 10 seconds, in fact. This is with an Engenius EMP-8062+ card:

Ok, the timing information is useful. This is probably something (beacon
sending?) racing with the periodic calibration, which temporarily stops
all of the tx traffic and frees the tx buffers, then starts it all up
again. In short, apart from the logging this shouldn't cause any
problems, but we should probably disable the beacon tasklet during this
time.

If this only appeared all of a sudden in recent compat snapshots, it
would be useful to know the last one that worked normally.

> I'll probably have to reboot regularly, since this is on an embedded box
> with limited CF filesystem, and I can't overflow my /var partition...

Ouch. For now, just take it out or demote it to debug.

As for the original problem, I don't know offhand why a large download
would trigger a cascade of these errors. The best way to track it down
is to try to come up with a case that reproduces it and sprinkle printks
throughout the driver, especially when we free and allocate the tx
buffers.

--
Bob Copeland %% http://www.bobcopeland.com
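
The suggestion to trace every point where tx buffers are allocated and freed can be illustrated in miniature. The following is a userspace toy, not ath5k code: `TracedPool`, its methods, and the "data"/"beacon" labels are all hypothetical, and the real driver's buffer handling is not modelled. It only shows how an alloc/free trace makes it visible who held the buffers when a request failed.

```python
class TracedPool:
    """Toy fixed-size buffer pool that records every alloc and free."""

    def __init__(self, size):
        self.free = list(range(size))
        self.trace = []  # (event, buffer id or None, free buffers left)

    def alloc(self, who):
        if not self.free:
            # Analogue of the "no further txbuf available" case.
            self.trace.append((f"{who}: alloc FAILED", None, 0))
            return None
        buf = self.free.pop()
        self.trace.append((f"{who}: alloc", buf, len(self.free)))
        return buf

    def release(self, who, buf):
        self.free.append(buf)
        self.trace.append((f"{who}: free", buf, len(self.free)))


pool = TracedPool(2)
a = pool.alloc("data")
b = pool.alloc("data")
c = pool.alloc("beacon")  # pool is empty here, so this request fails
pool.release("data", a)

for event in pool.trace:
    print(event)
```

Reading the trace backwards from the failed request shows which caller took the last buffers and whether they were ever returned.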

2009-10-14 20:35:28

by Marin Glibic

Subject: Re: AP: ath5k + hostapd occasionally sulks

I've also been hit by this bug, which was mentioned last month, though
the reporter had no additional info. This is with compat-wireless
2009-10-09 and Linux 2.6.31.3, all in master mode, using a recent
hostapd from git. These might also be two separate bugs: the first being
the module warning below, and the other the "ath5k phy0: no further
txbuf available, dropping packet" message.


Oct 14 18:18:28 machinename kernel: ------------[ cut here ]------------
Oct 14 18:18:28 machinename kernel: WARNING: at /source/compat/compat-wireless-2009-10-09/net/mac80211/rc80211_minstrel.c:69 minstrel_tx_status+0xdf/0x100 [mac80211]()
Oct 14 18:18:28 machinename kernel: Hardware name: KT600-8237
Oct 14 18:18:28 machinename kernel: Modules linked in: lp fuse ppdev parport_pc rtc_cmos parport rtc_core rtc_lib fan ath5k mac80211 ath processor uhci_hcd thermal thermal_sys cfg80211 hwmon button i2c_viapro rfkill i2c_core 3c59x led_class ehci_hcd shpchp via_agp evdev mii agpgart sg
Oct 14 18:18:28 machinename kernel: Pid: 0, comm: swapper Not tainted 2.6.31.3-smp #2
Oct 14 18:18:28 machinename kernel: Call Trace:
Oct 14 18:18:28 machinename kernel: [<c13cd9c0>] ? printk+0x18/0x20
Oct 14 18:18:28 machinename kernel: [<e0a8ce2f>] ? minstrel_tx_status+0xdf/0x100 [mac80211]
Oct 14 18:18:28 machinename kernel: [<c1032afc>] warn_slowpath_common+0x6c/0xc0
Oct 14 18:18:28 machinename kernel: [<e0a8ce2f>] ? minstrel_tx_status+0xdf/0x100 [mac80211]
Oct 14 18:18:28 machinename kernel: [<c1032b65>] warn_slowpath_null+0x15/0x20
Oct 14 18:18:28 machinename kernel: [<e0a8ce2f>] minstrel_tx_status+0xdf/0x100 [mac80211]
Oct 14 18:18:28 machinename kernel: [<e0a6fc8c>] ieee80211_tx_status+0x47c/0x4d0 [mac80211]
Oct 14 18:18:28 machinename kernel: [<e0ad1293>] ath5k_tasklet_tx+0x203/0x3b0 [ath5k]
Oct 14 18:18:28 machinename kernel: [<e0ac5883>] ? ath5k_hw_get_isr+0x223/0x3a0 [ath5k]
Oct 14 18:18:28 machinename kernel: [<c1037430>] tasklet_action+0x50/0xb0
Oct 14 18:18:28 machinename kernel: [<c10382aa>] __do_softirq+0xba/0x180
Oct 14 18:18:28 machinename kernel: [<c10660c8>] ? handle_IRQ_event+0x58/0x140
Oct 14 18:18:28 machinename kernel: [<c1018dce>] ? ack_apic_level+0x7e/0x270
Oct 14 18:18:28 machinename kernel: [<c103839d>] do_softirq+0x2d/0x40
Oct 14 18:18:28 machinename kernel: [<c10384f5>] irq_exit+0x65/0x90
Oct 14 18:18:28 machinename kernel: [<c1004d3f>] do_IRQ+0x4f/0xc0
Oct 14 18:18:28 machinename kernel: [<c104c7a9>] ? ktime_get+0x19/0x40
Oct 14 18:18:28 machinename kernel: [<c10034e9>] common_interrupt+0x29/0x30
Oct 14 18:18:28 machinename kernel: [<e0a1f381>] ? acpi_idle_enter_simple+0x132/0x15d [processor]
Oct 14 18:18:28 machinename kernel: [<c1324c7f>] cpuidle_idle_call+0x6f/0xc0
Oct 14 18:18:28 machinename kernel: [<c1001efd>] cpu_idle+0x4d/0x80
Oct 14 18:18:28 machinename kernel: [<c13bd835>] rest_init+0x55/0x60
Oct 14 18:18:28 machinename kernel: [<c15498a5>] start_kernel+0x2d5/0x338
Oct 14 18:18:28 machinename kernel: [<c1549386>] ? unknown_bootoption+0x0/0x1f9
Oct 14 18:18:28 machinename kernel: [<c1549079>] i386_start_kernel+0x79/0x81
Oct 14 18:18:28 machinename kernel: ---[ end trace 86949b8386bc65bb ]---

and then a bit later:

Oct 14 20:22:02 machinename kernel: ath5k phy0: no further txbuf available, dropping packet
Oct 14 20:22:32 machinename last message repeated 4 times
Oct 14 20:23:33 machinename last message repeated 8 times
Oct 14 20:24:35 machinename last message repeated 8 times
Oct 14 20:25:33 machinename last message repeated 7 times
Oct 14 20:26:33 machinename last message repeated 8 times
Oct 14 20:27:33 machinename last message repeated 8 times
Oct 14 20:28:33 machinename last message repeated 8 times
and a lot more of these. Network traffic stops, and even when it works
it is very slow (<50 kB/s).
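
Because syslog collapses identical lines into "last message repeated N times", the true drop rate is easier to see once those lines are expanded into counts. This is a minimal sketch, assuming unwrapped log lines in the format shown above. The function name is hypothetical, and repeats are attributed to the minute in which syslog reported them, which is only approximate.

```python
import re
from collections import Counter

TXBUF = "no further txbuf available"
REPEAT_RE = re.compile(r"last message repeated (\d+) times")
MINUTE_RE = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}):\d{2} ")


def drops_per_minute(lines):
    """Count txbuf drops per minute, expanding syslog repeat lines."""
    counts = Counter()
    last_was_txbuf = False
    for line in lines:
        m = MINUTE_RE.match(line)
        if not m:
            continue
        minute = m.group(1)
        rep = REPEAT_RE.search(line)
        if rep:
            # A repeat line refers to the previous real message.
            if last_was_txbuf:
                counts[minute] += int(rep.group(1))
            continue
        last_was_txbuf = TXBUF in line
        if last_was_txbuf:
            counts[minute] += 1
    return counts


# The first three entries of the excerpt above, with the wrapped line rejoined.
sample = [
    "Oct 14 20:22:02 machinename kernel: ath5k phy0: no further txbuf available, dropping packet",
    "Oct 14 20:22:32 machinename last message repeated 4 times",
    "Oct 14 20:23:33 machinename last message repeated 8 times",
]
counts = drops_per_minute(sample)
```

On the excerpt above this gives roughly eight drops a minute once the burst is under way.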



some hw info:

2.6.31.3-smp #2 SMP Sat Oct 10 22:25:14 CEST 2009 i686 AMD Athlon(tm) XP 2000+ AuthenticAMD GNU/Linux

00:0a.0 Ethernet controller [0200]: Atheros Communications Inc. Atheros AR5001X+ Wireless Network Adapter [168c:0013] (rev 01)
Subsystem: Wistron NeWeb Corp. CM9 Wireless a/b/g MiniPCI Adapter [185f:1012]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 168 (2500ns min, 7000ns max), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 17
Region 0: Memory at dd000000 (32-bit, non-prefetchable) [size=64K]
Capabilities: [44] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=2 PME-
Kernel driver in use: ath5k
Kernel modules: ath5k

Hopefully somebody will catch this. Thanks.