2010-11-12 02:27:29

by Luis R. Rodriguez

[permalink] [raw]
Subject: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

When two cards are connected with the same regulatory domain
if CRDA had a delayed response then cfg80211's own set regulatory
domain would still be the world regulatory domain. There was a bug
on cfg80211's ignore_request() when analyzing incoming driver
regulatory hints as it was only checking against the currently
set cfg80211 regulatory domain and not for any other new
pending requests. This is easily fixed by also checking against
the pending request.

Without this fix the second card would end up with an intersected
regulatory domain and now allow it to use the channels it really
is designed for.

This was reproduced and tested against mac80211_hwsim using this
CRDA delayer:

#!/bin/bash
echo $COUNTRY >> /tmp/log
sleep 2
/sbin/crda.orig

And this regulatory test:

modprobe mac80211_hwsim regtest=2

This patch has fixes for cfg80211 all the way back.

Reported-by: Mark Mentovai <[email protected]>
Cc: [email protected]
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
net/wireless/reg.c | 6 +++++-
1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 3be18d9..ecac993 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1281,9 +1281,13 @@ static int ignore_request(struct wiphy *wiphy,
* This would happen if you unplug and plug your card
* back in or if you add a new device for which the previously
* loaded card also agrees on the regulatory domain.
+ * This can also happen if you have two cards both with the
+ * same regulatory domain and CRDA hasn't yet replied back
+ * with the last request's regulatory domain.
*/
if (last_request->initiator == NL80211_REGDOM_SET_BY_DRIVER &&
- !regdom_changes(pending_request->alpha2))
+ (!regdom_changes(pending_request->alpha2) ||
+ alpha2_equal(last_request->alpha2, pending_request->alpha2)))
return -EALREADY;

return REG_INTERSECT;
--
1.7.1



2010-11-12 06:05:22

by Mark Mentovai

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

Luis R. Rodriguez wrote:
> When two cards are connected with the same regulatory domain
> if CRDA had a delayed response then cfg80211's own set regulatory
> domain would still be the world regulatory domain. There was a bug
> on cfg80211's ignore_request() when analyzing incoming driver
> regulatory hints as it was only checking against the currently
> set cfg80211 regulatory domain and not for any other new
> pending requests. This is easily fixed by also checking against
> the pending request.

Luis, thanks for taking a look at this bug.

Capsule summary: you?ve overloaded -EALREADY, which until now seemed
to mean ?that?s the regdomain I?m already using,? to now also mean
?I?m going to be using that regdomain soon but I?m waiting on CRDA.?
The two cases need to be handled differently.

In my case, this patch makes things a little bit worse. I tested it
with compat_wireless 20101110. I?ve included what I see after boot
below the explanation.

Your patch changes things so that, according to the steps in my
original message, steps 3 and 4 become:

3. The second card?s driver calls regulatory_hint to provide US as a
driver hint. ignore_request sees that the last request came from a
driver and that the current and last request sought to set the same
hint, so it returns -EALREADY. This triggers the ?if the regulatory
domain being requested by the driver? block, which calls reg_copy_regd
on the apparent assumption that cfg80211_regdomain contains the
regdomain that the wiphy actually wants, although it doesn?t, and it?s
very wrong to do this copy. cfg80211_regdomain is still on 00, because
CRDA hasn?t responded to the first card?s request yet.

4. When CRDA finally responds to the first card?s request from #2, it
gets plumbed to set_regdom, which calls __set_regdom. __set_regdom
sees that it?s not intersecting, and enters the block where it would
normally copy reg_copy_regd to set the wiphy?s regd, but the wiphy it
uses is the one saved from last_request (in step 3), and it already
had something copied to it (also in step 3). Since __set_regdom checks
for this and bails out with -EALREADY in the ?userspace could have
sent two replies with only one kernel request? case. Because
__set_regdom fails, set_regdom returns early and never makes it to the
update_all_wiphy_regulatory or print_regdomain steps. The regdomain
remains unchanged. I?m still stuck at 00.

What about using something other than -EALREADY to signal ?that
request is already pending?? On its face, this works, but I think
there?s a deeper flaw that also needs to be addressed. I?m concerned
that the wiphys that fall into this state won?t see a reg_copy_regd
call at all. The idea is that any such wiphy shouldn?t really be
ignored, but it should be joined to a group of wiphys waiting on the
pending request, and when the request is satisfied, the regd field
should be populated for each of them.

With your patch:

# iw reg get
country 00:
(2402 - 2472 @ 40), (3, 20)
(2457 - 2482 @ 20), (3, 20), PASSIVE-SCAN, NO-IBSS
(2474 - 2494 @ 20), (3, 20), NO-OFDM, PASSIVE-SCAN, NO-IBSS
(5170 - 5250 @ 40), (3, 20), PASSIVE-SCAN, NO-IBSS
(5735 - 5835 @ 40), (3, 20), PASSIVE-SCAN, NO-IBSS
# iw list (relevant bits)
Wiphy phy1
Band 1:
Frequencies:
* 5180 MHz [36] (17.0 dBm) (passive scanning, no IBSS)
* 5200 MHz [40] (17.0 dBm) (passive scanning, no IBSS)
* 5220 MHz [44] (17.0 dBm) (passive scanning, no IBSS)
* 5240 MHz [48] (17.0 dBm) (passive scanning, no IBSS)
* 5260 MHz [52] (disabled)
* 5280 MHz [56] (disabled)
* 5300 MHz [60] (disabled)
* 5320 MHz [64] (disabled)
* 5500 MHz [100] (disabled)
* 5520 MHz [104] (disabled)
* 5540 MHz [108] (disabled)
* 5560 MHz [112] (disabled)
* 5580 MHz [116] (disabled)
* 5600 MHz [120] (disabled)
* 5620 MHz [124] (disabled)
* 5640 MHz [128] (disabled)
* 5660 MHz [132] (disabled)
* 5680 MHz [136] (disabled)
* 5700 MHz [140] (disabled)
* 5745 MHz [149] (20.0 dBm) (passive scanning, no IBSS)
* 5765 MHz [153] (20.0 dBm) (passive scanning, no IBSS)
* 5785 MHz [157] (20.0 dBm) (passive scanning, no IBSS)
* 5805 MHz [161] (20.0 dBm) (passive scanning, no IBSS)
* 5825 MHz [165] (20.0 dBm) (passive scanning, no IBSS)
Wiphy phy0
Band 1:
Frequencies:
* 2412 MHz [1] (20.0 dBm)
* 2417 MHz [2] (20.0 dBm)
* 2422 MHz [3] (20.0 dBm)
* 2427 MHz [4] (20.0 dBm)
* 2432 MHz [5] (20.0 dBm)
* 2437 MHz [6] (20.0 dBm)
* 2442 MHz [7] (20.0 dBm)
* 2447 MHz [8] (20.0 dBm)
* 2452 MHz [9] (20.0 dBm)
* 2457 MHz [10] (20.0 dBm)
* 2462 MHz [11] (20.0 dBm)
* 2467 MHz [12] (20.0 dBm) (passive scanning, no IBSS)
* 2472 MHz [13] (20.0 dBm) (passive scanning, no IBSS)
* 2484 MHz [14] (20.0 dBm) (passive scanning, no IBSS)
# dmesg (relevant bits)
cfg80211: World regulatory domain updated:
(start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
(2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm)
(5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
(5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm)
PCI: Enabling device 0000:00:11.0 (0000 -> 0002)
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a
ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
Registered led device: ath9k-phy0::radio
Registered led device: ath9k-phy0::assoc
Registered led device: ath9k-phy0::tx
Registered led device: ath9k-phy0::rx
ieee80211 phy0: Atheros AR9280 Rev:2 mem=0xb0000000, irq=48
PCI: Enabling device 0000:00:12.0 (0000 -> 0002)
ath: EEPROM regdomain: 0x0
ath: EEPROM indicates default country code should be used
ath: doing EEPROM country->regdmn map search
ath: country maps to regdmn code: 0x3a
ath: Country alpha2 being used: US
ath: Regpair used: 0x3a
cfg80211: Calling CRDA for country: US
ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
Registered led device: ath9k-phy1::radio
Registered led device: ath9k-phy1::assoc
Registered led device: ath9k-phy1::tx
Registered led device: ath9k-phy1::rx
ieee80211 phy1: Atheros AR9280 Rev:2 mem=0xb0010000, irq=49
(Nothing else from cfg80211, although I was hoping for ?cfg80211:
Regulatory domain changed to country: US? and the new frequency
table.)

Incidentally, the workaround I?ve been using for the past few days
targeted the very same line your patch touches, although I used 0
instead of a special error flag. This resulted in an unnecessary extra
call out to CRDA, but more importantly, it had the ?doesn?t set the
wiphy?s regd? problem, which is why I didn?t mention it.

2010-11-12 02:45:31

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Thu, Nov 11, 2010 at 6:27 PM, Luis R. Rodriguez
<[email protected]> wrote:
> When two cards are connected with the same regulatory domain
> if CRDA had a delayed response then cfg80211's own set regulatory
> domain would still be the world regulatory domain. There was a bug
> on cfg80211's ignore_request() when analyzing incoming driver
> regulatory hints as it was only checking against the currently
> set cfg80211 regulatory domain and not for any other new
> pending requests. This is easily fixed by also checking against
> the pending request.
>
> Without this fix the second card would end up with an intersected
> regulatory domain and now allow it to use the channels it really
> is designed for.
>
> This was reproduced and tested against mac80211_hwsim using this
> CRDA delayer:
>
>        #!/bin/bash
>        echo $COUNTRY >> /tmp/log
>        sleep 2
>        /sbin/crda.orig
>
> And this regulatory test:
>
>        modprobe mac80211_hwsim regtest=2
>
> This patch has fixes for cfg80211 all the way back.
>
> Reported-by: Mark Mentovai <[email protected]>
> Cc: [email protected]
> Signed-off-by: Luis R. Rodriguez <[email protected]>

Hrm,

modprobe mac80211_hwsim regtest=3

Does the similar test but using two different regulatory domains,
however this ends up slapping an intersection over to both. Going to
think this through a bit more.

Luis

2010-11-12 02:53:24

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Thu, Nov 11, 2010 at 6:45 PM, Luis R. Rodriguez
<[email protected]> wrote:
> On Thu, Nov 11, 2010 at 6:27 PM, Luis R. Rodriguez
> <[email protected]> wrote:
>> When two cards are connected with the same regulatory domain
>> if CRDA had a delayed response then cfg80211's own set regulatory
>> domain would still be the world regulatory domain. There was a bug
>> on cfg80211's ignore_request() when analyzing incoming driver
>> regulatory hints as it was only checking against the currently
>> set cfg80211 regulatory domain and not for any other new
>> pending requests. This is easily fixed by also checking against
>> the pending request.
>>
>> Without this fix the second card would end up with an intersected
>> regulatory domain and now allow it to use the channels it really
>> is designed for.
>>
>> This was reproduced and tested against mac80211_hwsim using this
>> CRDA delayer:
>>
>>        #!/bin/bash
>>        echo $COUNTRY >> /tmp/log
>>        sleep 2
>>        /sbin/crda.orig
>>
>> And this regulatory test:
>>
>>        modprobe mac80211_hwsim regtest=2
>>
>> This patch has fixes for cfg80211 all the way back.
>>
>> Reported-by: Mark Mentovai <[email protected]>
>> Cc: [email protected]
>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>
> Hrm,
>
> modprobe mac80211_hwsim regtest=3
>
> Does the similar test but using two different regulatory domains,
> however this ends up slapping an intersection over to both. Going to
> think this through a bit more.

Actually this highlights an issue in design that I originally
addressed in my code by setting up a queue. Without a queue for all
pending requests we have to drop all pending requests that differ. The
extra race here happens because of the new delay. Without the delay
things work as expected, each card would get two separate regulatory
domains and with a core intersected for cards that have no hints. With
the delay we end up getting an intersection but since the cfg80211's
regulatory domain hasn't yet been set and the previous request hasn't
yet been processed by differ we tell cfg80211 to intersect.

I'd like to keep this simple too though, but without a queue not sure
how other than rejecting the new requests under the assumption a
timeout was reached and no CRDA was present. This is of course
incorrect logic given that you can just install CRDA later.

Anyway, that's the issue, the patch is still correct as without the
delay it does not introduce a regression. There is still a lingering
bug for when two cards issue separate regulatory domains and the
pending request is not yet processed. What do do there.

Luis

2010-11-12 20:27:36

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Thu, Nov 11, 2010 at 10:05:21PM -0800, Mark Mentovai wrote:
> Luis R. Rodriguez wrote:
> > When two cards are connected with the same regulatory domain
> > if CRDA had a delayed response then cfg80211's own set regulatory
> > domain would still be the world regulatory domain. There was a bug
> > on cfg80211's ignore_request() when analyzing incoming driver
> > regulatory hints as it was only checking against the currently
> > set cfg80211 regulatory domain and not for any other new
> > pending requests. This is easily fixed by also checking against
> > the pending request.
>
> Luis, thanks for taking a look at this bug.
>
> Capsule summary: you’ve overloaded -EALREADY, which until now seemed
> to mean “that’s the regdomain I’m already using,” to now also mean
> “I’m going to be using that regdomain soon but I’m waiting on CRDA.”
> The two cases need to be handled differently.
>
> In my case, this patch makes things a little bit worse. I tested it
> with compat_wireless 20101110. I’ve included what I see after boot
> below the explanation.
>
> Your patch changes things so that, according to the steps in my
> original message, steps 3 and 4 become:
>
> 3. The second card’s driver calls regulatory_hint to provide US as a
> driver hint. ignore_request sees that the last request came from a
> driver and that the current and last request sought to set the same
> hint, so it returns -EALREADY. This triggers the “if the regulatory
> domain being requested by the driver” block, which calls reg_copy_regd
> on the apparent assumption that cfg80211_regdomain contains the
> regdomain that the wiphy actually wants, although it doesn’t, and it’s
> very wrong to do this copy. cfg80211_regdomain is still on 00, because
> CRDA hasn’t responded to the first card’s request yet.
>
> 4. When CRDA finally responds to the first card’s request from #2, it
> gets plumbed to set_regdom, which calls __set_regdom. __set_regdom
> sees that it’s not intersecting, and enters the block where it would
> normally copy reg_copy_regd to set the wiphy’s regd, but the wiphy it
> uses is the one saved from last_request (in step 3), and it already
> had something copied to it (also in step 3). Since __set_regdom checks
> for this and bails out with -EALREADY in the “userspace could have
> sent two replies with only one kernel request” case. Because
> __set_regdom fails, set_regdom returns early and never makes it to the
> update_all_wiphy_regulatory or print_regdomain steps. The regdomain
> remains unchanged. I’m still stuck at 00.
>
> What about using something other than -EALREADY to signal “that
> request is already pending?” On its face, this works, but I think
> there’s a deeper flaw that also needs to be addressed. I’m concerned
> that the wiphys that fall into this state won’t see a reg_copy_regd
> call at all. The idea is that any such wiphy shouldn’t really be
> ignored, but it should be joined to a group of wiphys waiting on the
> pending request, and when the request is satisfied, the regd field
> should be populated for each of them.

Thanks for testing and your review.

We need to address a queue of requests with the associated wiphys,
as I originally had developed, I'll go ahead and add this and
treat these, it should take care of even multiple cards with
different regulatory hints. I expect the amount of code will be
a bit to large for stable though so likely we may only be able
to fix this for new kernels.

I'll take a stab at this now.

Luis

2010-11-16 21:34:33

by Mark Mentovai

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

Luis R. Rodriguez wrote:
> On Mon, Nov 15, 2010 at 07:34:57PM -0800, Mark Mentovai wrote:
>> - I don?t think queue_regulatory_request needs to call schedule_work
>> if there was already something in reg_requests_list.
>
> You mean because we'd be iterating over the list already?

Yeah, but as I pointed out later, it doesn?t matter as long as you get
rid of the delayed work. It?s fine to make multiple calls to
schedule_work even if work is already pending since there?s no longer
any worry about overscheduling the timer.

>> and it might be better to
>> just separate the beacon stuff and the regulatory request stuff into
>> their own work_structs.
>
> Don't see the point. I'd rather just clean up the possible theoretical
> races you've pointed out.

OK. I was thinking that it would be cleaner to only have the work
queue run tasks that you know are ready to run, since you?ve got
pretty good signals for when to try running each. I was thinking that
this would avoid doing the unnecessary atomic operations in case there
was a storm of beacon work to do but nothing that involved CRDA
responses. Again, as long as the delayed thing is gone, this isn?t
strictly necessary. I?m just trying to keep things off the CPU as much
as possible.

>> I agree that the most serious concern is the one you pointed out in
>> your e-mail about the ?missing or sick CRDA? problem. I think that
>> this is particularly scary because CRDA is needed pretty much right
>> from the first time the module initializes.
>
> Not if you have CONFIG_CFG80211_INTERNAL_REGDB and are OK with
> a static update of the regulatory database or if you are just
> using world roaming cards.

Sure. I was kind of assuming that we weren?t worrying about that here,
because the way things work won?t change at all if CRDA is expected to
be missing.

>> even if CRDA shows up later.
>
> If CRDA shows up later then yes, the only way to get a regulatory
> domain applied to the card if it was not world roaming would be
> to reload the module right now. This is a separate issue though
> and I think we should treat it as such.

Yup, I agree.

>> I think that the only reasonable thing to
>> do is to just ensure that there?s only one request outstanding at a
>> time. I?m very opposed to reissuing CRDA requests as long as there?s
>> no way to synchronize by tying a CRDA response to a specific kernel
>> request.
>
> Sure, well so I believe udev is supposed to queue these things,
> not sure, either way I am pretty sure if a udev event was issued
> and CRDA was not present we would not get anything going. One idea
> here is to implement a CRDA wrapper that uses inotify for /sbin/crda
> and queues up requests until CRDA pops in. Can't say I have strong
> motivation for this right now though, chances of you not having
> CRDA are pretty slim ;) but its a way to resolve this IMHO.

I agree, this isn?t terribly important right now.

I think that the bigger problem isn?t ?CRDA missing??but ?CRDA is sick
because user land is broken.? But that?s not really cfg80211?s
problem.

>> Still, it might not be a bad idea to log something if CRDA
>> doesn?t respond for, say, five seconds, and then to log something
>> again if CRDA finally shows up.
>
> I'd rather leave this to userspace and just decide ourselves what
> to do in the kernel if we never get a response.

For now, I think I?m fine just letting things get wedged. If CRDA is
missing from the get-go, there?s no change in behavior from how things
work now. If the user uninstalls CRDA, they deserve what they
get?similar to how there?s no great recovery option when you remove
ld.so. Practically, if user space gets so messed up that CRDA can?t
run properly, probably enough else is broken that the only reasonable
recovery path is to reboot. It?s probably not the worst thing in the
world for cfg80211?s internal regulatory state to just get stuck when
this happens. It?s a better behavior than skipping a CRDA update and
then resuming processing, or applying rules without intersection when
they were intended for intersection.

I?d be open to other ways of handling this in the kernel, too, but I
don?t think things should get too complex here.

> I think that with the implementation I used but with your suggestion
> with avoiding a delayed work struct might be best. If no CRDA response
> goes through then we simply world roam unless you had
> CONFIG_CFG80211_INTERNAL_REGDB.
>
> Whatyda think?

That works for me, and it should work for anyone else with multiple
cards too. Go team!

2010-11-16 00:06:07

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Fri, Nov 12, 2010 at 12:27:32PM -0800, Luis Rodriguez wrote:
> On Thu, Nov 11, 2010 at 10:05:21PM -0800, Mark Mentovai wrote:
> > Luis R. Rodriguez wrote:
> > > When two cards are connected with the same regulatory domain
> > > if CRDA had a delayed response then cfg80211's own set regulatory
> > > domain would still be the world regulatory domain. There was a bug
> > > on cfg80211's ignore_request() when analyzing incoming driver
> > > regulatory hints as it was only checking against the currently
> > > set cfg80211 regulatory domain and not for any other new
> > > pending requests. This is easily fixed by also checking against
> > > the pending request.
> >
> > Luis, thanks for taking a look at this bug.
> >
> > Capsule summary: you’ve overloaded -EALREADY, which until now seemed
> > to mean “that’s the regdomain I’m already using,” to now also mean
> > “I’m going to be using that regdomain soon but I’m waiting on CRDA.”
> > The two cases need to be handled differently.
> >
> > In my case, this patch makes things a little bit worse. I tested it
> > with compat_wireless 20101110. I’ve included what I see after boot
> > below the explanation.
> >
> > Your patch changes things so that, according to the steps in my
> > original message, steps 3 and 4 become:
> >
> > 3. The second card’s driver calls regulatory_hint to provide US as a
> > driver hint. ignore_request sees that the last request came from a
> > driver and that the current and last request sought to set the same
> > hint, so it returns -EALREADY. This triggers the “if the regulatory
> > domain being requested by the driver” block, which calls reg_copy_regd
> > on the apparent assumption that cfg80211_regdomain contains the
> > regdomain that the wiphy actually wants, although it doesn’t, and it’s
> > very wrong to do this copy. cfg80211_regdomain is still on 00, because
> > CRDA hasn’t responded to the first card’s request yet.
> >
> > 4. When CRDA finally responds to the first card’s request from #2, it
> > gets plumbed to set_regdom, which calls __set_regdom. __set_regdom
> > sees that it’s not intersecting, and enters the block where it would
> > normally copy reg_copy_regd to set the wiphy’s regd, but the wiphy it
> > uses is the one saved from last_request (in step 3), and it already
> > had something copied to it (also in step 3). Since __set_regdom checks
> > for this and bails out with -EALREADY in the “userspace could have
> > sent two replies with only one kernel request” case. Because
> > __set_regdom fails, set_regdom returns early and never makes it to the
> > update_all_wiphy_regulatory or print_regdomain steps. The regdomain
> > remains unchanged. I’m still stuck at 00.
> >
> > What about using something other than -EALREADY to signal “that
> > request is already pending?” On its face, this works, but I think
> > there’s a deeper flaw that also needs to be addressed. I’m concerned
> > that the wiphys that fall into this state won’t see a reg_copy_regd
> > call at all. The idea is that any such wiphy shouldn’t really be
> > ignored, but it should be joined to a group of wiphys waiting on the
> > pending request, and when the request is satisfied, the regd field
> > should be populated for each of them.
>
> Thanks for testing and your review.
>
> We need to address a queue of requests with the associated wiphys,
> as I originally had developed, I'll go ahead and add this and
> treat these, it should take care of even multiple cards with
> different regulatory hints. I expect the amount of code will be
> a bit to large for stable though so likely we may only be able
> to fix this for new kernels.
>
> I'll take a stab at this now.

OK we did have the queuing mechanism in place but we were not processing
the requests atomically. Please give the patch below a shot.

diff --git a/include/net/regulatory.h b/include/net/regulatory.h
index 9e103a4..356d6e3 100644
--- a/include/net/regulatory.h
+++ b/include/net/regulatory.h
@@ -43,6 +43,12 @@ enum environment_cap {
* @intersect: indicates whether the wireless core should intersect
* the requested regulatory domain with the presently set regulatory
* domain.
+ * @processed: indicates whether or not this requests has already been
+ * processed. When the last request is processed it means that the
+ * currently regulatory domain set on cfg80211 is updated from
+ * CRDA and can be used by other regulatory requests. When a
+ * the last request is not yet processed we must yield until it
+ * is processed before processing any new requests.
* @country_ie_checksum: checksum of the last processed and accepted
* country IE
* @country_ie_env: lets us know if the AP is telling us we are outdoor,
@@ -54,6 +60,7 @@ struct regulatory_request {
enum nl80211_reg_initiator initiator;
char alpha2[2];
bool intersect;
+ bool processed;
enum environment_cap country_ie_env;
struct list_head list;
};
diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 3be18d9..570df1e 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1392,8 +1392,10 @@ new_request:
* have applied the requested regulatory domain before we just
* inform userspace we have processed the request
*/
- if (r == -EALREADY)
+ if (r == -EALREADY) {
nl80211_send_reg_change_event(last_request);
+ last_request->processed = true;
+ }
return r;
}

@@ -1409,16 +1411,13 @@ static void reg_process_hint(struct regulatory_request *reg_request)

BUG_ON(!reg_request->alpha2);

- mutex_lock(&cfg80211_mutex);
- mutex_lock(&reg_mutex);
-
if (wiphy_idx_valid(reg_request->wiphy_idx))
wiphy = wiphy_idx_to_wiphy(reg_request->wiphy_idx);

if (reg_request->initiator == NL80211_REGDOM_SET_BY_DRIVER &&
!wiphy) {
kfree(reg_request);
- goto out;
+ return;
}

r = __regulatory_hint(wiphy, reg_request);
@@ -1426,28 +1425,52 @@ static void reg_process_hint(struct regulatory_request *reg_request)
if (r == -EALREADY && wiphy &&
wiphy->flags & WIPHY_FLAG_STRICT_REGULATORY)
wiphy_update_regulatory(wiphy, initiator);
-out:
- mutex_unlock(&reg_mutex);
- mutex_unlock(&cfg80211_mutex);
}

-/* Processes regulatory hints, this is all the NL80211_REGDOM_SET_BY_* */
+static void reg_delayed_todo(struct work_struct *work);
+static DECLARE_DELAYED_WORK(reg_delayed_work, reg_delayed_todo);
+
+/*
+ * Processes regulatory hints, this is all the NL80211_REGDOM_SET_BY_*
+ * Regulatory hints come on a first come first serve basis and we
+ * must process each one atomically.
+ */
static void reg_process_pending_hints(void)
{
struct regulatory_request *reg_request;

+ mutex_lock(&cfg80211_mutex);
+ mutex_lock(&reg_mutex);
+
+ if (!last_request->processed) {
+ REG_DBG_PRINT("Pending regulatory request, waiting "
+ "for it to be processed...");
+ schedule_delayed_work(&reg_delayed_work, HZ / 20);
+ goto unlock;
+ }
+
spin_lock(&reg_requests_lock);
- while (!list_empty(&reg_requests_list)) {
- reg_request = list_first_entry(&reg_requests_list,
- struct regulatory_request,
- list);
- list_del_init(&reg_request->list);

+ if (list_empty(&reg_requests_list)) {
spin_unlock(&reg_requests_lock);
- reg_process_hint(reg_request);
- spin_lock(&reg_requests_lock);
+ goto unlock;
}
+
+ reg_request = list_first_entry(&reg_requests_list,
+ struct regulatory_request,
+ list);
+ list_del_init(&reg_request->list);
+
+ reg_process_hint(reg_request);
+
+ if (!list_empty(&reg_requests_list))
+ schedule_delayed_work(&reg_delayed_work, HZ / 20);
+
spin_unlock(&reg_requests_lock);
+
+unlock:
+ mutex_unlock(&reg_mutex);
+ mutex_unlock(&cfg80211_mutex);
}

/* Processes beacon hints -- this has nothing to do with country IEs */
@@ -1494,6 +1517,12 @@ static void reg_todo(struct work_struct *work)
reg_process_pending_beacon_hints();
}

+static void reg_delayed_todo(struct work_struct *work)
+{
+ reg_process_pending_hints();
+ reg_process_pending_beacon_hints();
+}
+
static DECLARE_WORK(reg_work, reg_todo);

static void queue_regulatory_request(struct regulatory_request *request)
@@ -1746,11 +1775,11 @@ static void restore_regulatory_settings(bool reset_user)
/* First restore to the basic regulatory settings */
cfg80211_regdomain = cfg80211_world_regdom;

+ regulatory_hint_core(cfg80211_regdomain->alpha2);
+
mutex_unlock(&reg_mutex);
mutex_unlock(&cfg80211_mutex);

- regulatory_hint_core(cfg80211_regdomain->alpha2);
-
/*
* This restores the ieee80211_regdom module parameter
* preference or the last user requested regulatory
@@ -2061,6 +2090,8 @@ int set_regdom(const struct ieee80211_regdomain *rd)

nl80211_send_reg_change_event(last_request);

+ last_request->processed = true;
+
mutex_unlock(&reg_mutex);

return r;
@@ -2105,10 +2136,15 @@ int __init regulatory_init(void)
user_alpha2[0] = '9';
user_alpha2[1] = '7';

+ mutex_lock(&cfg80211_mutex);
+ mutex_lock(&reg_mutex);
+
/* We always try to get an update for the static regdomain */
err = regulatory_hint_core(cfg80211_regdomain->alpha2);
if (err) {
if (err == -ENOMEM)
+ mutex_unlock(&cfg80211_mutex);
+ mutex_unlock(&reg_mutex);
return err;
/*
* N.B. kobject_uevent_env() can fail mainly for when we're out
@@ -2125,6 +2161,9 @@ int __init regulatory_init(void)
#endif
}

+ mutex_unlock(&reg_mutex);
+ mutex_unlock(&cfg80211_mutex);
+
/*
* Finally, if the user set the module parameter treat it
* as a user hint.
@@ -2141,6 +2180,7 @@ void /* __init_or_exit */ regulatory_exit(void)
struct reg_beacon *reg_beacon, *btmp;

cancel_work_sync(&reg_work);
+ cancel_delayed_work_sync(&reg_delayed_work);

mutex_lock(&cfg80211_mutex);
mutex_lock(&reg_mutex);

2010-11-16 20:27:33

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Mon, Nov 15, 2010 at 07:34:57PM -0800, Mark Mentovai wrote:
> Luis-
>
> Excellent, this does the trick!
>
> I’ve got some review comments. This looks longish, but many of these
> suggestions are minor and will probably result in a simpler and better
> implementation.
>
> - The reg_process_hint call in regulatory_core bypasses the queue. I
> don’t think this is a great idea. regulatory_core call should probably
> go through queue_regulatory_request instead. Since the kernel
> regulatory code doesn’t maintain any sort of synchronization with user
> space, I think it’s real important to avoid having more than one
> request pending at any time. It would be bad if there were a pending
> request out, CRDA was slow to respond, and then a regulatory_hint_core
> call came in from restore_regulatory_settings expecting to set things
> back to a clean slate. That would put two outstanding requests out at
> once. When CRDA responds, the kernel wouldn’t be able to distinguish
> between the responses, it would possibly process them out of order,
> and with only a single last_request to refer to, it would process them
> incorrectly even if handled in the correct order.

Good point, I skipped the queue in the original work as I ran into
issues using it during the cfg80211's init process. Its odd, I had
complaints about using a mutex. Anyway, its using a mutex now and
it should work I think so I'll try it.

> - With the above change to regulatory_core, last_request becomes
> extraneous: it’s a rough synonym for the head of reg_requests_list.
> You may be able to get rid of last_request altogether. This is a more
> invasive change and should almost certainly not be a part of this
> patch, but it’s something to keep in mind.

Eh, not really, we'd have to keep around the request on the queue, and
we do not. So last_request really is the last process/being processed
request.

> - This will spin on reg_delayed_work/reg_process_pending_hints if user
> space never provides regulatory data. (You point this out in your
> subsequent e-mail.) The way things are implemented in your patch, I
> don’t see any point to spinning in reg_delayed_work. You could just
> call schedule_work from where you set last_request->processed to true
> if reg_requests_list is non-empty.

I also thought of doing it this way but for some reason which I forget
I ended up doing it using the delayed work.

> - Also looking at delayed work, it looks like there’s a way to call
> schedule_delayed_work when delayed work is already scheduled, and I
> think that the workqueue implementation considers this a bug. Assume
> that reg_process_pending_hints calls schedule_delayed_work. Then,
> before the delay expires, regulatory_hint calls
> queue_regulatory_request, which calls schedule_work. In that case,
> reg_process_pending_hints can be called again before the delay expires
> from the earlier pass through, and you’ll hit this case (kernel bug.)

Sure, thought of this as well, but left it as a after-thought excercise
for optimization.

> - I don’t think queue_regulatory_request needs to call schedule_work
> if there was already something in reg_requests_list.

You mean because we'd be iterating over the list already?

> I guess that
> making the call conditional on an empty list would work around the
> above problem too, although you’d still have to worry about the
> schedule_work call in regulatory_hint_found_beacon—more on that in a
> sec. Still, this doesn’t feel as clean or simple to me as just getting
> rid of the delayed work. Then you could keep this schedule_work call
> and just let the workqueue’s one-shot nature sort it all out for you.

I'll review this a bit more, thanks for the pointers.

> - I haven’t looked at reg_process_pending_beacon_hints or
> regulatory_hint_found_beacon much, but I suspect that they don’t have
> anything to do with ordering CRDA calls,

Right, they are used to send hints to cfg80211 when a beacon is found.
When a beacon from an AP is found we tell cfg80211, it in turn will
lift passive-scan and no-ibss flags from the channel if the card is
world roaming. This is done for all non-DFS and non-channel-12-13-14.

> and it might be better to
> just separate the beacon stuff and the regulatory request stuff into
> their own work_structs.

Don't see the point. I'd rather just clean up the possible theoretical
races you've pointed out.

> Be especially careful if you decide to keep
> them on the same work_struct AND you keep the delayed work—see above.

Sure.

> - Your comment describing @processed in regulatory.h could be
> reworded: “the currently regulatory domain set” and “when a the last
> request”.

Thanks.

> I agree that the most serious concern is the one you pointed out in
> your e-mail about the “missing or sick CRDA” problem. I think that
> this is particularly scary because CRDA is needed pretty much right
> from the first time the module initializes.

Not if you have CONFIG_CFG80211_INTERNAL_REGDB and are OK with
a static update of the regulatory database or if you are just
using world roaming cards.

> If at any point CRDA is
> missing, there’s no way to ensure that regulatory data is correct,

Yes there is, this was by design, you'll just world roam.

> even if CRDA shows up later.

If CRDA shows up later then yes, the only way to get a regulatory
domain applied to the card if it was not world roaming would be
to reload the module right now. This is a separate issue though
and I think we should treat it as such.

> I think that the only reasonable thing to
> do is to just ensure that there’s only one request outstanding at a
> time. I’m very opposed to reissuing CRDA requests as long as there’s
> no way to synchronize by tying a CRDA response to a specific kernel
> request.

Sure, well so I believe udev is supposed to queue these things,
not sure, either way I am pretty sure if a udev event was issued
and CRDA was not present we would not get anything going. One idea
here is to implement a CRDA wrapper that uses inotify for /sbin/crda
and queues up requests until CRDA pops in. Can't say I have strong
motivation for this right now though, chances of you not having
CRDA are pretty slim ;) but its a way to resolve this IMHO.

> Still, it might not be a bad idea to log something if CRDA
> doesn’t respond for, say, five seconds, and then to log something
> again if CRDA finally shows up.

I'd rather leave this to userspace and just decide ourselves what
to do in the kernel if we never get a response.

I think that with the implementation I used but with your suggestion
with avoiding a delayed work struct might be best. If no CRDA response
goes through then we simply world roam unless you had
CONFIG_CFG80211_INTERNAL_REGDB.

Whatyda think?

> Getting rid of the delayed work spin
> in the kernel means that at least the rest of the system won’t suffer
> if CRDA goes to lunch.

Agreed wholeheartedly.

> It sucks that user space isn’t as reliable as kernel space and that
> it’s possible for a transient problem like “no more processes” or “no
> more FDs” in user space to wedge the regulatory code like this, but I
> guess that was already considered and dismissed when the
> regulatory-CRDA interface was designed.

That's userspace's problem :D but yea we can implement the inotify
thingy, that would help. Patches welcomed.

> Thanks for working on this.

Thanks for reporting this and working with me on testing it, and of course
all the suggestions.

Luis

2010-11-16 03:34:59

by Mark Mentovai

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

Luis-

Excellent, this does the trick!

I?ve got some review comments. This looks longish, but many of these
suggestions are minor and will probably result in a simpler and better
implementation.

- The reg_process_hint call in regulatory_core bypasses the queue. I
don?t think this is a great idea. regulatory_core call should probably
go through queue_regulatory_request instead. Since the kernel
regulatory code doesn?t maintain any sort of synchronization with user
space, I think it?s real important to avoid having more than one
request pending at any time. It would be bad if there were a pending
request out, CRDA was slow to respond, and then a regulatory_hint_core
call came in from restore_regulatory_settings expecting to set things
back to a clean slate. That would put two outstanding requests out at
once. When CRDA responds, the kernel wouldn?t be able to distinguish
between the responses, it would possibly process them out of order,
and with only a single last_request to refer to, it would process them
incorrectly even if handled in the correct order.

- With the above change to regulatory_core, last_request becomes
extraneous: it?s a rough synonym for the head of reg_requests_list.
You may be able to get rid of last_request altogether. This is a more
invasive change and should almost certainly not be a part of this
patch, but it?s something to keep in mind.

- This will spin on reg_delayed_work/reg_process_pending_hints if user
space never provides regulatory data. (You point this out in your
subsequent e-mail.) The way things are implemented in your patch, I
don?t see any point to spinning in reg_delayed_work. You could just
call schedule_work from where you set last_request->processed to true
if reg_requests_list is non-empty.

- Also looking at delayed work, it looks like there?s a way to call
schedule_delayed_work when delayed work is already scheduled, and I
think that the workqueue implementation considers this a bug. Assume
that reg_process_pending_hints calls schedule_delayed_work. Then,
before the delay expires, regulatory_hint calls
queue_regulatory_request, which calls schedule_work. In that case,
reg_process_pending_hints can be called again before the delay expires
from the earlier pass through, and you?ll hit this case (kernel bug.)

- I don?t think queue_regulatory_request needs to call schedule_work
if there was already something in reg_requests_list. I guess that
making the call conditional on an empty list would work around the
above problem too, although you?d still have to worry about the
schedule_work call in regulatory_hint_found_beacon?more on that in a
sec. Still, this doesn?t feel as clean or simple to me as just getting
rid of the delayed work. Then you could keep this schedule_work call
and just let the workqueue?s one-shot nature sort it all out for you.

- I haven?t looked at reg_process_pending_beacon_hints or
regulatory_hint_found_beacon much, but I suspect that they don?t have
anything to do with ordering CRDA calls, and it might be better to
just separate the beacon stuff and the regulatory request stuff into
their own work_structs. Be especially careful if you decide to keep
them on the same work_struct AND you keep the delayed work?see above.

- Your comment describing @processed in regulatory.h could be
reworded: ?the currently regulatory domain set? and ?when a the last
request?.

I agree that the most serious concern is the one you pointed out in
your e-mail about the ?missing or sick CRDA? problem. I think that
this is particularly scary because CRDA is needed pretty much right
from the first time the module initializes. If at any point CRDA is
missing, there?s no way to ensure that regulatory data is correct,
even if CRDA shows up later. I think that the only reasonable thing to
do is to just ensure that there?s only one request outstanding at a
time. I?m very opposed to reissuing CRDA requests as long as there?s
no way to synchronize by tying a CRDA response to a specific kernel
request. Still, it might not be a bad idea to log something if CRDA
doesn?t respond for, say, five seconds, and then to log something
again if CRDA finally shows up. Getting rid of the delayed work spin
in the kernel means that at least the rest of the system won?t suffer
if CRDA goes to lunch.

It sucks that user space isn?t as reliable as kernel space and that
it?s possible for a transient problem like ?no more processes? or ?no
more FDs? in user space to wedge the regulatory code like this, but I
guess that was already considered and dismissed when the
regulatory-CRDA interface was designed.

Thanks for working on this.

Mark

2010-11-16 00:33:25

by Luis R. Rodriguez

[permalink] [raw]
Subject: Re: [PATCH] cfg80211: Fix regulatory bug with multiple cards with a delayed CRDA response

On Mon, Nov 15, 2010 at 4:06 PM, Luis R. Rodriguez
<[email protected]> wrote:
> On Fri, Nov 12, 2010 at 12:27:32PM -0800, Luis Rodriguez wrote:
>> On Thu, Nov 11, 2010 at 10:05:21PM -0800, Mark Mentovai wrote:
>> > Luis R. Rodriguez wrote:
>> > > When two cards are connected with the same regulatory domain
>> > > if CRDA had a delayed response then cfg80211's own set regulatory
>> > > domain would still be the world regulatory domain. There was a bug
>> > > on cfg80211's ignore_request() when analyzing incoming driver
>> > > regulatory hints as it was only checking against the currently
>> > > set cfg80211 regulatory domain and not for any other new
>> > > pending requests. This is easily fixed by also checking against
>> > > the pending request.
>> >
>> > Luis, thanks for taking a look at this bug.
>> >
>> > Capsule summary: you’ve overloaded -EALREADY, which until now seemed
>> > to mean “that’s the regdomain I’m already using,” to now also mean
>> > “I’m going to be using that regdomain soon but I’m waiting on CRDA.”
>> > The two cases need to be handled differently.
>> >
>> > In my case, this patch makes things a little bit worse. I tested it
>> > with compat_wireless 20101110. I’ve included what I see after boot
>> > below the explanation.
>> >
>> > Your patch changes things so that, according to the steps in my
>> > original message, steps 3 and 4 become:
>> >
>> > 3. The second card’s driver calls regulatory_hint to provide US as a
>> > driver hint. ignore_request sees that the last request came from a
>> > driver and that the current and last request sought to set the same
>> > hint, so it returns -EALREADY. This triggers the “if the regulatory
>> > domain being requested by the driver” block, which calls reg_copy_regd
>> > on the apparent assumption that cfg80211_regdomain contains the
>> > regdomain that the wiphy actually wants, although it doesn’t, and it’s
>> > very wrong to do this copy. cfg80211_regdomain is still on 00, because
>> > CRDA hasn’t responded to the first card’s request yet.
>> >
>> > 4. When CRDA finally responds to the first card’s request from #2, it
>> > gets plumbed to set_regdom, which calls __set_regdom. __set_regdom
>> > sees that it’s not intersecting, and enters the block where it would
>> > normally copy reg_copy_regd to set the wiphy’s regd, but the wiphy it
>> > uses is the one saved from last_request (in step 3), and it already
>> > had something copied to it (also in step 3). Since __set_regdom checks
>> > for this and bails out with -EALREADY in the “userspace could have
>> > sent two replies with only one kernel request” case. Because
>> > __set_regdom fails, set_regdom returns early and never makes it to the
>> > update_all_wiphy_regulatory or print_regdomain steps. The regdomain
>> > remains unchanged. I’m still stuck at 00.
>> >
>> > What about using something other than -EALREADY to signal “that
>> > request is already pending?” On its face, this works, but I think
>> > there’s a deeper flaw that also needs to be addressed. I’m concerned
>> > that the wiphys that fall into this state won’t see a reg_copy_regd
>> > call at all. The idea is that any such wiphy shouldn’t really be
>> > ignored, but it should be joined to a group of wiphys waiting on the
>> > pending request, and when the request is satisfied, the regd field
>> > should be populated for each of them.
>>
>> Thanks for testing and your review.
>>
>> We need to address a queue of requests with the associated wiphys,
>> as I originally had developed, I'll go ahead and add this and
>> treat these, it should take care of even multiple cards with
>> different regulatory hints. I expect the amount of code will be
>> a bit to large for stable though so likely we may only be able
>> to fix this for new kernels.
>>
>> I'll take a stab at this now.
>
> OK we did have the queuing mechanism in place but we were not processing
> the requests atomically. Please give the patch below a shot.

Now the next thing we need to do is decide what we want to do in the
case CRDA is not present and we keep checking and the last request is
still not processed. How many times do we want to check for this, and
what do we do if we give up?

I'm thinking we check every 1/2 second and in the case we don't get a
response in 3 seconds we give up completely on CRDA. This would entail
dropping all pending regulatory requests. Not sure if we should bother
even allowing for 11d hints if CRDA was not found.. Then again users
can install CRDA after... so do we just require a reboot then?

Luis