2010-05-26 21:47:00

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

Just FYI, you want to use linux-wireless for reporting such things,
not ath5k-devel. Also I have been real busy in March (and stil am but
just trying to reply) so if I do not reply maybe others can. In
linux-wireless you'll likely get much more review/help than ath5k-devel
for this.

More below.

On Thu, Mar 25, 2010 at 10:01:54AM -0700, Lukáš Turek wrote:
> Hi,
>
> I've been testing current vanilla 2.6.34 from git today on a computer with
> three Atheros PCI cards and noticed a strange behavior: although all the
> cards have the same regulatory domain (0x36), only the last one (phy2) had
> the correct channels enabled, the other two (phy0 and phy1) stayed in US
> regulatory domain with channels 52-140 disabled.
>
> It seems the reply from CRDA for the first request gets into the kernel after
> the last card is detected and all the replies get mixed up some way:
>
> ath5k 0000:02:01.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> ath5k 0000:02:01.0: registered as 'phy0'
> ath: EEPROM regdomain: 0x36
> ath: EEPROM indicates we should expect a direct regpair map
> ath: Country alpha2 being used: CZ
> ath: Regpair used: 0x36
> udev: renamed network interface eth0 to eth2
> udev: renamed network interface eth1_rename to eth0
> phy0: Selected rate control algorithm 'minstrel'
> ath5k phy0: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> ath5k 0000:02:02.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> ath5k 0000:02:02.0: registered as 'phy1'
> ath: EEPROM regdomain: 0x36
> ath: EEPROM indicates we should expect a direct regpair map
> ath: Country alpha2 being used: CZ
> ath: Regpair used: 0x36
> cfg80211: Calling CRDA for country: CZ
> phy1: Selected rate control algorithm 'minstrel'
> ath5k phy1: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> ath5k 0000:02:03.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
> ath5k 0000:02:03.0: registered as 'phy2'
> ath: EEPROM regdomain: 0x36
> ath: EEPROM indicates we should expect a direct regpair map
> ath: Country alpha2 being used: CZ
> ath: Regpair used: 0x36
> cfg80211: Calling CRDA for country: CZ
> udev: renamed network interface wlan1 to wlan3
> phy2: Selected rate control algorithm 'minstrel'
> ath5k phy2: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> cfg80211: Calling CRDA for country: CZ
> udev: renamed network interface wlan1 to wlan2
> cfg80211: Current regulatory domain intersected:
> (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> cfg80211: Current regulatory domain intersected:
> (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> cfg80211: Current regulatory domain intersected:
> (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
>
> Another reason might be udev messing up with device names. I will try to debug
> this, but any hints would be greatly appreciated, because there's not much
> info about the CRDA <-> kernel interaction.

You forgot to supply the kernel log of when cfg80211 loads, that
would have been helpful. After cfg80211 loads then leave a window
open with 'iw event -t' running. Then load ath5k, and if you want to
debug udev you can use udevadm, for example to debug only
kernel messages:

udevadm monitor --environment kernel

In short cfg80211 will do an intersection if the core had a regulatory
domain set for some reason and then a driver suggested to use another
country. If cfg80211 was world roaming and a driver then just used
a request for CZ then it shuld stick to CZ. If a second device comes
up with CZ then -EALREADY should be expected IIRC, if this is not happening
it smells like a bug.

Luis


2010-06-17 10:04:18

by Lukáš Turek

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

On 17.6.2010 06:47 Bruno Randolf wrote:
> are there any news on that? i think i hit the same bug, now...

I have an ugly workaround that works for me, unfortunately I didn't have
enough time to really fix it. It may involve rethinking the communication
with CRDA, I think the global variable last_request will have to go away.
I reported it in linux-wireless, because it's a generic mac80211 problem,
but I also didn't receive any useful reply there:
https://patchwork.kernel.org/patch/102727/

Lukas Turek


Attachments:
(No filename) (510.00 B)
signature.asc (836.00 B)
This is a digitally signed message part.
Download all attachments

2010-06-17 04:48:25

by Bruno Randolf

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

hi!

are there any news on that? i think i hit the same bug, now...

thanks,
bruno

On Thu May 27 2010 06:46:58 Luis R. Rodriguez wrote:
> Just FYI, you want to use linux-wireless for reporting such things,
> not ath5k-devel. Also I have been real busy in March (and stil am but
> just trying to reply) so if I do not reply maybe others can. In
> linux-wireless you'll likely get much more review/help than ath5k-devel
> for this.
>
> More below.
>
> On Thu, Mar 25, 2010 at 10:01:54AM -0700, Lukáš Turek wrote:
> > Hi,
> >
> > I've been testing current vanilla 2.6.34 from git today on a computer
> > with three Atheros PCI cards and noticed a strange behavior: although
> > all the cards have the same regulatory domain (0x36), only the last one
> > (phy2) had the correct channels enabled, the other two (phy0 and phy1)
> > stayed in US regulatory domain with channels 52-140 disabled.
> >
> > It seems the reply from CRDA for the first request gets into the kernel
> > after the last card is detected and all the replies get mixed up some
> > way:
> >
> > ath5k 0000:02:01.0: PCI INT A -> GSI 22 (level, low) -> IRQ 22
> > ath5k 0000:02:01.0: registered as 'phy0'
> > ath: EEPROM regdomain: 0x36
> > ath: EEPROM indicates we should expect a direct regpair map
> > ath: Country alpha2 being used: CZ
> > ath: Regpair used: 0x36
> > udev: renamed network interface eth0 to eth2
> > udev: renamed network interface eth1_rename to eth0
> > phy0: Selected rate control algorithm 'minstrel'
> > ath5k phy0: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> > ath5k 0000:02:02.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
> > ath5k 0000:02:02.0: registered as 'phy1'
> > ath: EEPROM regdomain: 0x36
> > ath: EEPROM indicates we should expect a direct regpair map
> > ath: Country alpha2 being used: CZ
> > ath: Regpair used: 0x36
> > cfg80211: Calling CRDA for country: CZ
> > phy1: Selected rate control algorithm 'minstrel'
> > ath5k phy1: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> > ath5k 0000:02:03.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
> > ath5k 0000:02:03.0: registered as 'phy2'
> > ath: EEPROM regdomain: 0x36
> > ath: EEPROM indicates we should expect a direct regpair map
> > ath: Country alpha2 being used: CZ
> > ath: Regpair used: 0x36
> > cfg80211: Calling CRDA for country: CZ
> > udev: renamed network interface wlan1 to wlan3
> > phy2: Selected rate control algorithm 'minstrel'
> > ath5k phy2: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
> > cfg80211: Calling CRDA for country: CZ
> > udev: renamed network interface wlan1 to wlan2
> > cfg80211: Current regulatory domain intersected:
> > (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> > (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> > (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> > (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> > (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> > cfg80211: Current regulatory domain intersected:
> > (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> > (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> > (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> > (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> > (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> > cfg80211: Current regulatory domain intersected:
> > (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
> > (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> > (2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
> > (2474000 KHz - 2483500 KHz @ 9500 KHz), (N/A, 2000 mBm)
> > (5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
> >
> > Another reason might be udev messing up with device names. I will try to
> > debug this, but any hints would be greatly appreciated, because there's
> > not much info about the CRDA <-> kernel interaction.
>
> You forgot to supply the kernel log of when cfg80211 loads, that
> would have been helpful. After cfg80211 loads then leave a window
> open with 'iw event -t' running. Then load ath5k, and if you want to
> debug udev you can use udevadm, for example to debug only
> kernel messages:
>
> udevadm monitor --environment kernel
>
> In short cfg80211 will do an intersection if the core had a regulatory
> domain set for some reason and then a driver suggested to use another
> country. If cfg80211 was world roaming and a driver then just used
> a request for CZ then it shuld stick to CZ. If a second device comes
> up with CZ then -EALREADY should be expected IIRC, if this is not happening
> it smells like a bug.
>
> Luis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless"
> in the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2010-06-17 05:05:54

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

On Wed, Jun 16, 2010 at 9:47 PM, Bruno Randolf <[email protected]> wrote:
> hi!
>
> are there any news on that? i think i hit the same bug, now...

I provided verbose instructions what can be done to further debug this.

Luis

2010-10-11 18:20:28

by Lukáš Turek

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

On Friday 08 October 2010 09:57:17 you wrote:
> root@RMR1:~# 1284335259.046129: phy #1: regulatory domain change:
> intersection used due to a request made by a driver on phy1
> 1284335263.403515: phy #1: regulatory domain change: intersection used due
> to a request made by a driver on phy1
>
> interesting that phy #1 appears twice...

When I hit this bug my idea was that the problem lies in the CRDA API: the
wiphy identifier is not sent to CRDA, instead a global variable last_request
is used. However, if a new request is started before the previous one is
finished, the old last_request value is replaced, both replies use the same
last_request and strange think happen...

I hoped someone who knows the kernel-userspace interaction better will confirm
or disprove it, so I didn't investigate it further, and then I just forgot
about it...

Lukas Turek


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part.

2010-10-08 07:57:19

by Bruno Randolf

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

hello!

sorry to revive an old thread, but i finally found the time to continue on
this. i can reproduce the regdomain intersection problem fairly regularly now.
it happens on slow boards (soekris net4826), with more than one ath5k card,
especially under high load. unfortunately this seems to be the case when
OpenWRT loads, but i can also trigger it by bombarding it with traffic (iperf
UDP) on the ethernet device, so the IRQ and SIRQ loads together are around
50%.

On Thu June 17 2010 14:05:32 Luis R. Rodriguez wrote:
> I provided verbose instructions what can be done to further debug this.

the following output mixes kernel log output on the serial and iw event. it's
a little messed up (lines cut) but i think you can see what's going on:

root@RMR1:/# config.sh reload kernel; iw event -t &

(the config.sh schript only reloads the mac80211, cfg80211 and ath module)

cfg80211: Calling CRDA to update world regulatory domain
root@RMR1:/# cfg80211: World regulatory domain updated:
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
1284335098.343222: phy #0: regulatory domain change: set to world roaming by
the wireless core upon initiat

root@RMR1:/# insmod /lib/modules/2.6.32.10/ath5k.ko

ath5k 0000:00:0e.0: registered as 'phy0'
ath5k phy0: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
ath5k 0000:00:0f.0: registered as 'phy1'
cfg80211: Calling CRDA for country: JP
ath5k phy1: Atheros AR5414 chip found (MAC: 0xa5, PHY: 0x61)
cfg80211: Calling CRDA for country: JP
root@RMR1:/# cfg80211: Current regulatory domain intersected:
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
1284335110.273907: phy #1: regulatory domain change: intersection used due to
a request made by a driver o1
cfg80211: Current regulatory domain intersected:
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2472000 KHz @ 15000 KHz), (N/A, 2000 mBm)
(2457000 KHz - 2482000 KHz @ 20000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2482000 KHz @ 8000 KHz), (N/A, 2000 mBm)
(2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (N/A, 2000 mBm)
1284335114.847761: phy #1: regulatory domain change: intersection used due to
a request made by a driver o

here again, the output of iw event on a proper console:

root@RMR1:~# 1284335259.046129: phy #1: regulatory domain change: intersection
used due to a request made by a driver on phy1
1284335263.403515: phy #1: regulatory domain change: intersection used due to
a request made by a driver on phy1

interesting that phy #1 appears twice...

if there is no intersection problem it looks like this:

1284334386.775218: phy #0: regulatory domain change: set to JP by a driver
request on phy0
1284334387.020971: phy #1: regulatory domain change: set to JP by a driver
request on phy1

btw: udev is not in use.

any ideas how to further debug this?

bruno

2010-10-11 18:50:36

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [ath5k-devel] Race condition in CRDA calls?

On Mon, Oct 11, 2010 at 11:13:03AM -0700, Lukáš Turek wrote:
> On Friday 08 October 2010 09:57:17 you wrote:
> > root@RMR1:~# 1284335259.046129: phy #1: regulatory domain change:
> > intersection used due to a request made by a driver on phy1
> > 1284335263.403515: phy #1: regulatory domain change: intersection used due
> > to a request made by a driver on phy1
> >
> > interesting that phy #1 appears twice...
>
> When I hit this bug my idea was that the problem lies in the CRDA API: the
> wiphy identifier is not sent to CRDA, instead a global variable last_request
> is used. However, if a new request is started before the previous one is
> finished, the old last_request value is replaced, both replies use the same
> last_request and strange think happen...
>
> I hoped someone who knows the kernel-userspace interaction better will confirm
> or disprove it, so I didn't investigate it further, and then I just forgot
> about it...

Sorry about the delay on a response on this, I've just been caught up
with other higher priority issues, but as soon as I have some cycles I will
give it a shot with two cards.

Luis