2009-05-27 23:30:16

by Andrey Yurovsky

Subject: ath9k locks up when bringing up a mesh point interface

With today's wireless-testing, bringing up an ath9k mesh point locks up
my test laptop (single CPU) but I don't get any messages (/proc/sysrq
set to 9, unfortunately I can't use a serial console right now).

-Andrey


2009-05-28 08:55:59

by Jouni Malinen

Subject: Re: ath9k locks up when bringing up a mesh point interface

On Wed, May 27, 2009 at 04:30:17PM -0700, Andrey Yurovsky wrote:
> With today's wireless-testing, bringing up an ath9k mesh point locks up
> my test laptop (single CPU) but I don't get any messages (/proc/sysrq
> set to 9, unfortunately I can't use a serial console right now).

I don't see a full lockup with two cores, but the "ifconfig mesh0 up"
process ends up in some kind of unkillable state eating all CPU. I
would assume this could be the same issue that just happens to show up
more severely with a single CPU.

dmesg is not showing anything else apart from "mesh: running mesh
housekeeping" popping up every 60 seconds. Pretty much all networking
commands hang (rtnl lock held?) after that, though. After a while,
hung task debugging does indeed start popping up messages that show
rtnl_mutex held, so that's why the system is getting quite unusable.

Anyway, I don't know what exactly is killing the mp setup (or well, not
exactly killing, but more like causing a busy loop somewhere) or whether
it has anything to do with the driver, but at least this one seems to be
trivial to reproduce:

modprobe ath9k
iw phy phy0 interface add mesh0 type mp mesh_id foo
ifconfig mesh0 up

--
Jouni Malinen PGP id EFC895FA

2009-05-28 14:40:34

by Andrey Yurovsky

Subject: Re: ath9k locks up when bringing up a mesh point interface

On Thu, May 28, 2009 at 1:55 AM, Jouni Malinen <[email protected]> wrote:
> On Wed, May 27, 2009 at 04:30:17PM -0700, Andrey Yurovsky wrote:
>> With today's wireless-testing, bringing up an ath9k mesh point locks up
>> my test laptop (single CPU) but I don't get any messages (/proc/sysrq
>> set to 9, unfortunately I can't use a serial console right now).
>
> I don't see a full lockup with two cores, but the "ifconfig mesh0 up"
> process ends up in some kind of unkillable state eating all CPU. I
> would assume this could be the same issue that just happens to show up
> more severely with a single CPU.
>
> dmesg is not showing anything else apart from "mesh: running mesh
> housekeeping" popping up every 60 seconds. Pretty much all networking
> commands hang (rtnl lock held?) after that, though. After a while,
> hung task debugging does indeed start popping up messages that show
> rtnl_mutex held, so that's why the system is getting quite unusable.
>
> Anyway, I don't know what exactly is killing the mp setup (or well, not
> exactly killing, but more like causing a busy loop somewhere) or whether
> it has anything to do with the driver, but at least this one seems to be
> trivial to reproduce:
>
> modprobe ath9k
> iw phy phy0 interface add mesh0 type mp mesh_id foo
> ifconfig mesh0 up
>
> --
> Jouni Malinen PGP id EFC895FA
>

Thanks. For what it's worth, substituting rt2x00 for ath9k works (i.e.,
one can bring up a mesh) on the same kernel, so it likely has
something to do with the driver.

-Andrey

2009-05-28 16:05:06

by Jouni Malinen

Subject: Re: ath9k locks up when bringing up a mesh point interface

On Thu, May 28, 2009 at 07:40:36AM -0700, Andrey Yurovsky wrote:

> Thanks. For what it's worth, substituting rt2x00 for ath9k works (i.e.,
> one can bring up a mesh) on the same kernel, so it likely has
> something to do with the driver.

OK, I traced it to the driver. It looks like mac80211 is asking the
driver to set up beaconing with a beacon interval of zero (struct
ieee80211_bss_conf::beacon_int == 0), which does not sound correct to
me. The adhoc mode beacon setup (which is also shared for mesh) in
ath9k does not exactly like this and ends up in an infinite loop trying
to figure out how many beacon frames have been transferred before the
current TSF.

We can obviously add a sanity check to the driver to avoid the busy
loop, but I would assume that something in the mac80211 mesh code could
also be changed to provide a more reasonable beacon interval to the driver.
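
To illustrate the kind of sanity check meant here, the following is a
minimal standalone sketch, not the actual ath9k code. It assumes the
loop has the shape described above: step forward by one beacon interval
until the current TSF (in TU) has been passed. The function name
next_tbtt(), the FALLBACK_BEACON_INT_TU value of 100, and the use of
plain 64-bit integers are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fallback, used only when the caller passes an interval of 0. */
#define FALLBACK_BEACON_INT_TU 100u

/*
 * Advance nexttbtt in steps of intval until it is no longer before
 * tsf_tu, and return the result. Without the guard below, intval == 0
 * means nexttbtt never advances and the loop never terminates whenever
 * nexttbtt < tsf_tu.
 */
static uint64_t next_tbtt(uint64_t nexttbtt, uint64_t tsf_tu, uint64_t intval)
{
    /* Sanity check: refuse to loop with a zero beacon interval. */
    if (intval == 0)
        intval = FALLBACK_BEACON_INT_TU;

    assert(intval != 0);

    do {
        nexttbtt += intval;
    } while (nexttbtt < tsf_tu);

    return nexttbtt;
}
```

With the guard in place, a call with a zero interval behaves like a call
with the fallback interval instead of spinning. Whether a driver should
substitute a default or reject the configuration outright is a separate
design choice that this sketch does not settle.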

--
Jouni Malinen PGP id EFC895FA