2008-11-15 08:15:53

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

On Thursday 06 November 2008, Dave wrote:
> Dave wrote:
> > Andrey Borzenkov wrote:
> >> To remind - sometimes after resume it takes *very* long time to associate,
> >> so everything else times out and I get no connection. Running orinoco from
> >> 2.6.28 in WPA mode and wpa_supplicant 0.6.4; in logs can be seen that
> >> wpa_supplicant endlessly tries to associate.
> >
> > The orinoco driver maintains its settings across an 'if down' - 'if up'.
> > This means the card will attempt to associate as soon as you 'if up'.
> >
> > I'm looking at a patch which will clear all driver settings in
> > orinoco_open, which should fix things for distributions which bring the
> > interface down when suspending.
>
> Can you try the patch below? Apologies if it gets whitespace damage.
>

It is hard to be sure when the problem is sporadic and no known way to
trigger it, but I have been running with this patch for a week without
issues; this so far exceeds usual score.

Now comments :)

- we should not be doing it in ->open. It is technically legal to set
wireless parameters before "icfonfig up" and we lose all of them. I will
try next week with similar patch in orinoco_stop().

- I am still not even sure we should do it at all. What is sematic of
ifconfig up/down w.r.t. wireless parameters? I.e. is "ifconfig down"
expected to clean all device state and start from scratch?

- finally I still think we are facing more wpa_supplicant bug, and this is
just a workaround for it. I have seen at least one case of losing access
point and wpa_supplicant was not able to cope with it after AP was found
again.

> Dave.
>
> ---
> diff --git a/drivers/net/wireless/orinoco.c b/drivers/net/wireless/orinoco.c
> index 12059f4..8ca9d51 100644
> --- a/drivers/net/wireless/orinoco.c
> +++ b/drivers/net/wireless/orinoco.c
> @@ -721,6 +721,18 @@ static int orinoco_open(struct net_device *dev)
> if (orinoco_lock(priv, &flags) != 0)
> return -EBUSY;
>
> + /* Clear driver settings. Copied from orinoco_init */
> + priv->iw_mode = IW_MODE_INFRA;
> + priv->prefer_port3 = priv->has_port3 && (! priv->has_ibss);
> + set_port_type(priv);
> + priv->channel = 0;
> + priv->promiscuous = 0;
> + priv->encode_alg = IW_ENCODE_ALG_NONE;
> + priv->tx_key = 0;
> + priv->wpa_enabled = 0;
> + priv->tkip_cm_active = 0;
> + priv->key_mgmt = 0;
> +
> err = __orinoco_up(dev);
>
> if (! err)
>
>
>



Attachments:
(No filename) (2.43 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-17 15:52:30

by Dan Williams

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

On Sun, 2008-11-16 at 10:01 +0000, Dave wrote:
> John W. Linville wrote:
> > On Sat, Nov 15, 2008 at 11:15:47AM +0300, Andrey Borzenkov wrote:
> >
> >> - we should not be doing it in ->open. It is technically legal to set
> >> wireless parameters before "icfonfig up" and we lose all of them. I will
> >> try next week with similar patch in orinoco_stop().
> >
> > That seems wrong...
> >
> >> - I am still not even sure we should do it at all. What is sematic of
> >> ifconfig up/down w.r.t. wireless parameters? I.e. is "ifconfig down"
> >> expected to clean all device state and start from scratch?
> >
> > No. Unfortunately, it is mostly a matter of opinion as to what
> > wireless extensions expects.
>
> Agreed with all the above. I'll discard the driver patch.
>
> There are two other things I can think of:
>
> 1. make sure wpa_supplicant is shut down before ifconfig ethX down, and
> restart it on resume.

Drivers shouldn't really care about what userspace is driving them; they
need to either return an error for invalid requests, or handle the
request. Userspace (wpa_supplicant) then needs to be smart enough to
know about device events, which it already does.

> From the data you've provided it looks like your distribution brings the
> device down, but may leave wpa_supplicant running. I've noticed that

That's still a valid case that both the driver and supplicant should
handle.

> every time wpa_supplicant shuts down it removes most configuration
> settings. Or has that changed?

Hasn't changed, on shutdown the supplicant will clear keys and reset
countermeasures and whatnot.

> 2. Does the driver need to send a dissociation event (or something) to
> userspace on ifconfig down?

If the association with the AP is no longer valid, then yes. Most of
the other drivers do this already, I think.

Dan



2008-11-15 15:01:10

by John W. Linville

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

On Sat, Nov 15, 2008 at 11:15:47AM +0300, Andrey Borzenkov wrote:

> - we should not be doing it in ->open. It is technically legal to set
> wireless parameters before "icfonfig up" and we lose all of them. I will
> try next week with similar patch in orinoco_stop().

That seems wrong...

> - I am still not even sure we should do it at all. What is sematic of
> ifconfig up/down w.r.t. wireless parameters? I.e. is "ifconfig down"
> expected to clean all device state and start from scratch?

No. Unfortunately, it is mostly a matter of opinion as to what
wireless extensions expects.

FWIW, some drivers (particularly mac80211-based ones) require the
device to be 'down' in order to change the mode (i.e. "managed",
"ad-hoc", etc). Further, some drivers (particularly mac80211-based
ones) use the setting of the SSID as a trigger for an association.
So for example, bringing-up a mac80211-based device needs to look
something like this:

ifconfig wlan0 down
iwconfig wlan0 mode Managed
ifconfig wlan0 up
iwconfig wlan0 essid SSID

Beyond that, taking the device down would probably cause a loss of
association. So later marking it up would be insufficient without
also (re-)setting the SSID. But, e.g. channel settings should remain
in effect.

Hopefully I haven't confused the issue -- I'll stop now. :-)

John
--
John W. Linville Linux should be at the core
[email protected] of your literate lifestyle.

2008-11-17 19:29:47

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

On Sunday 16 November 2008, Dave wrote:
> John W. Linville wrote:
> > On Sat, Nov 15, 2008 at 11:15:47AM +0300, Andrey Borzenkov wrote:
> >
> >> - we should not be doing it in ->open. It is technically legal to set
> >> wireless parameters before "icfonfig up" and we lose all of them. I will
> >> try next week with similar patch in orinoco_stop().
> >
> > That seems wrong...
> >
> >> - I am still not even sure we should do it at all. What is sematic of
> >> ifconfig up/down w.r.t. wireless parameters? I.e. is "ifconfig down"
> >> expected to clean all device state and start from scratch?
> >
> > No. Unfortunately, it is mostly a matter of opinion as to what
> > wireless extensions expects.
>
> Agreed with all the above. I'll discard the driver patch.
>
> There are two other things I can think of:
>
> 1. make sure wpa_supplicant is shut down before ifconfig ethX down, and
> restart it on resume.
>

That is exactly what happens currently.

> From the data you've provided it looks like your distribution brings the
> device down, but may leave wpa_supplicant running.

No, wpa_supplicant is started as part of ifup and stopped as part of ifdown.
Which are called on network start/stop.

> I've noticed that
> every time wpa_supplicant shuts down it removes most configuration
> settings. Or has that changed?
>

As far as I can tell - it should. I am not sure how to check for it; I do
not see any difference in output of either iwconfig or iwlist scan either
before or after "network stop".

We really need some more debugging code in orinoco.

Returning to the original problem. The only difference visible currently is
when association happens.

Good case:

[16992.355418] eth1: Lucent/Agere firmware doesn't support manual roaming
[16992.636475] eth1: New link status: Connected (0001)

As already mentioned, "error" message comes from SIOCSIWAP which is not
supported by Agere firmware. wpa_supplicant calls it as part of association
resuest in wpa_driver_wext_associate().

Bad case is always:

[ 7669.871273] eth1: New link status: Connected (0001)
[ 7670.046171] eth1: Lucent/Agere firmware doesn't support manual roaming
[ 7670.495779] eth1: Lucent/Agere firmware doesn't support manual roaming
[ 7675.774280] eth1: Lucent/Agere firmware doesn't support manual roaming
...

i.e. it /looks/ like first card is connected to AP and then wpa_supplicant
request association. BTW, I just realized that I am not sure what "connected"
means exactly. Does it mean card is associated?

That is what let me believe that old parameters somehow are retained.

Looking at wpa_supplicant log intersting part is

Wireless event: cmd=0x8c07 len=32
AssocReq IE wireless event - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
RTM_NEWLINK: operstate=0 ifi_flags=0x11003 ([UP][LOWER_UP])
RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
Wireless event: cmd=0x8b15 len=20
Wireless event: new AP: 00:60:b3:ca:91:b0
Association info event
req_ies - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
WPA: set own WPA/RSN IE - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
State: ASSOCIATING -> ASSOCIATED
wpa_driver_wext_set_operstate: operstate 0->0 (DORMANT)
WEXT: Operstate: linkmode=-1, operstate=5
Associated to a new BSS: BSSID=44:44:44:44:44:44

Which corresponds to this code:

static void wpa_supplicant_event_assoc(struct wpa_supplicant *wpa_s,
union wpa_event_data *data)
{
u8 bssid[ETH_ALEN];
int ft_completed = wpa_ft_is_completed(wpa_s->wpa);

if (data)
wpa_supplicant_event_associnfo(wpa_s, data);

wpa_supplicant_set_state(wpa_s, WPA_ASSOCIATED);
if (wpa_s->use_client_mlme)
os_memcpy(bssid, wpa_s->bssid, ETH_ALEN);
if (wpa_s->use_client_mlme ||
(wpa_drv_get_bssid(wpa_s, bssid) >= 0 &&
os_memcmp(bssid, wpa_s->bssid, ETH_ALEN) != 0)) {
wpa_msg(wpa_s, MSG_DEBUG, "Associated to a new BSS: BSSID="
MACSTR, MAC2STR(bssid));

in other words, wpa_drv_get_bssid() returns 44:... BSSID. I could not find
any place in kernel or wpa_supplicant that would set it, so I can only assume
that firmware returns it for watever reason.

We really need more debugging in driver ...


Attachments:
(No filename) (4.30 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-16 10:01:53

by Dave Kilroy

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

John W. Linville wrote:
> On Sat, Nov 15, 2008 at 11:15:47AM +0300, Andrey Borzenkov wrote:
>
>> - we should not be doing it in ->open. It is technically legal to set
>> wireless parameters before "icfonfig up" and we lose all of them. I will
>> try next week with similar patch in orinoco_stop().
>
> That seems wrong...
>
>> - I am still not even sure we should do it at all. What is sematic of
>> ifconfig up/down w.r.t. wireless parameters? I.e. is "ifconfig down"
>> expected to clean all device state and start from scratch?
>
> No. Unfortunately, it is mostly a matter of opinion as to what
> wireless extensions expects.

Agreed with all the above. I'll discard the driver patch.

There are two other things I can think of:

1. make sure wpa_supplicant is shut down before ifconfig ethX down, and
restart it on resume.

>From the data you've provided it looks like your distribution brings the
device down, but may leave wpa_supplicant running. I've noticed that
every time wpa_supplicant shuts down it removes most configuration
settings. Or has that changed?

2. Does the driver need to send a dissociation event (or something) to
userspace on ifconfig down?



Dave.

2008-11-17 19:37:37

by Dave Kilroy

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

Dan Williams wrote:
> On Sun, 2008-11-16 at 10:01 +0000, Dave wrote:
>> 2. Does the driver need to send a dissociation event (or something) to
>> userspace on ifconfig down?
>
> If the association with the AP is no longer valid, then yes. Most of
> the other drivers do this already, I think.

I couldn't find in in mac80211, but a few (but not all) of the fullmac
drivers seem to do it.

Andrey: the attached patch may help, but given what you've written in
your other email (which I'll respond to separately), I don't think it
will fix things. Note that it's against wireless testing (+ private
changes). Should apply cleanly to an older tree after you account for
the file move.


Regards,

Dave.
---
diff --git a/drivers/net/wireless/orinoco/orinoco.c
b/drivers/net/wireless/orinoco/orinoco.c
index f8db026..007d5b8 100644
--- a/drivers/net/wireless/orinoco/orinoco.c
+++ b/drivers/net/wireless/orinoco/orinoco.c
@@ -2319,7 +2319,21 @@ int __orinoco_down(struct net_device *dev)
hermes_set_irqmask(hw, 0);
hermes_write_regn(hw, EVACK, 0xffff);
}
-
+
+ /* If associated, tell userspace we're dropping it */
+ if ((priv->last_linkstatus == HERMES_LINKSTATUS_CONNECTED) ||
+ (priv->last_linkstatus == HERMES_LINKSTATUS_AP_CHANGE) ||
+ (priv->last_linkstatus == HERMES_LINKSTATUS_AP_IN_RANGE))
+ {
+ union iwreq_data wrqu;
+
+ wrqu.data.length = 0;
+ wrqu.data.flags = 0;
+ wrqu.ap_addr.sa_family = ARPHRD_ETHER;
+ memset(wrqu.ap_addr.sa_data, 0, ETH_ALEN);
+ wireless_send_event(dev, SIOCGIWAP, &wrqu, NULL);
+ }
+
/* firmware will have to reassociate */
netif_carrier_off(dev);
priv->last_linkstatus = 0xffff;


2008-11-22 07:56:26

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

On Monday 17 November 2008, Dave wrote:

> > Looking at wpa_supplicant log intersting part is
> >
> > Wireless event: cmd=0x8c07 len=32
> > AssocReq IE wireless event - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> > RTM_NEWLINK: operstate=0 ifi_flags=0x11003 ([UP][LOWER_UP])
> > RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
> > Wireless event: cmd=0x8b15 len=20
> > Wireless event: new AP: 00:60:b3:ca:91:b0
> > Association info event
> > req_ies - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> > WPA: set own WPA/RSN IE - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> > State: ASSOCIATING -> ASSOCIATED
> > wpa_driver_wext_set_operstate: operstate 0->0 (DORMANT)
> > WEXT: Operstate: linkmode=-1, operstate=5
> > Associated to a new BSS: BSSID=44:44:44:44:44:44
>
> > in other words, wpa_drv_get_bssid() returns 44:... BSSID. I could not find
> > any place in kernel or wpa_supplicant that would set it, so I can only assume
> > that firmware returns it for watever reason.
>
> It is a firmware value, but I don't know what the it means. I know
> hostap checks for it in hostap_main:prism2_sta_deauth.
>

In some place wpa_supplicant is using this to mark invalid BSSID; I
do not think it leaks it into the driver now though.

And it is value that is use by Hermes to indicate invalid BSSID. It is even
documents in iwconfig.c :)

/*
* Display an Wireless Access Point Socket Address in readable format.
* Note : 0x44 is an accident of history, that's what the Orinoco/PrismII
* chipset report, and the driver doesn't filter it.
*/

Anyway I now have Yet Another Theory that actually fits quite well into
what I have seen in wpa_supplicant logs and explains 44:... BSSID as well.

Mandriva is using approximately the following sequence to start wireless:

ifconfig eth1 up
iwconfig eth1 whatever-is-set-for-it
wpa_supplicant eth1

Now I had these parameters for eth1:

### WIRELESS_MODE=Managed
### WIRELESS_ESSID="Home, sweet home"
WIRELESS_WPA_DRIVER=wext
### WIRELESS_POWER=on
### WIRELESS_SENS=3

Notice WIRELESS_ESSID. So what happened was apparently this

iwcofnig eth1 ESSID
driver starts associating
wpa_supplicant starts
driver associated and send IWEVASSOCREQIE
wpa_supplicant calls SIOCSIWESSID |
+-------------------------+
|
| driver *again* starts associating
/
wpa_supplicant receives SIOCSIWESSID
wpa_supplicant calls SIOCGIWAP
at this point driver has not yet
completed *second* association
so it returns invalid BSSID
(44:44:44:44:44:44)


So it looks like user space issue after all. There is little driver can
do (except returning "officially blessed" indication for invalid BSSID).
As it is now, wpa_supplicant does not seem to check for invalid BSSID
from SIOCGIWAP in wpa_supplicant_event_assoc() at all. It enters quite a
long retry loop using invalid value until it finally decides to trigger
association again.

It also explains why nobody was able to reproduce it - it is really
quite unique timing issue. Any change in scheduling could have "fixed" it.


Attachments:
(No filename) (3.24 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-17 20:37:48

by Dave Kilroy

[permalink] [raw]
Subject: Re: [Orinoco-devel] Agere PCMCIA sometimes takes very long time to associate with 9.48 FW

Andrey Borzenkov wrote:
> On Sunday 16 November 2008, Dave wrote:
>> There are two other things I can think of:
>>
>> 1. make sure wpa_supplicant is shut down before ifconfig ethX down, and
>> restart it on resume.
>
> That is exactly what happens currently.
>
>> From the data you've provided it looks like your distribution brings the
>> device down, but may leave wpa_supplicant running.
>
> No, wpa_supplicant is started as part of ifup and stopped as part of ifdown.
> Which are called on network start/stop.

OK. If wpa_supplicant is restarted after resume, the patch I posted
won't do anything interesting.

>> I've noticed that
>> every time wpa_supplicant shuts down it removes most configuration
>> settings. Or has that changed?
>
> As far as I can tell - it should. I am not sure how to check for it; I do
> not see any difference in output of either iwconfig or iwlist scan either
> before or after "network stop".

The wpa_supplicant logfile notes when it is clearing keys and other
settings.

> Returning to the original problem. The only difference visible currently is
> when association happens.
>
> Bad case is always:
>
> [ 7669.871273] eth1: New link status: Connected (0001)
> [ 7670.046171] eth1: Lucent/Agere firmware doesn't support manual roaming
> [ 7670.495779] eth1: Lucent/Agere firmware doesn't support manual roaming
> [ 7675.774280] eth1: Lucent/Agere firmware doesn't support manual roaming
> ...
>
> i.e. it /looks/ like first card is connected to AP and then wpa_supplicant
> request association. BTW, I just realized that I am not sure what "connected"
> means exactly. Does it mean card is associated?

When setup for WPA, I do not believe 'Connected' means 'associated'. The
card connects to the BSSID first. Then wpa_supplicant can do the WPA
association thing with that BSSID. If you are using WEP or no
authentication, 'Connected' is as good as 'associated'.

The repeated 'doesn't support manual roaming' just tells me that
wpa_supplicant is failing to associate, and just keeps retrying.

> That is what let me believe that old parameters somehow are retained.

CONNECTED will pop up shortly after the key management mode (PSK) is
set, wpa is enabled, and SSID is set. This is the same point as when
wireless event 0x8b1a should appear in your log.

> Looking at wpa_supplicant log intersting part is
>
> Wireless event: cmd=0x8c07 len=32
> AssocReq IE wireless event - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> RTM_NEWLINK: operstate=0 ifi_flags=0x11003 ([UP][LOWER_UP])
> RTM_NEWLINK, IFLA_IFNAME: Interface 'eth1' added
> Wireless event: cmd=0x8b15 len=20
> Wireless event: new AP: 00:60:b3:ca:91:b0
> Association info event
> req_ies - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> WPA: set own WPA/RSN IE - hexdump(len=24): dd 16 00 50 f2 01 01 00 00 50 f2 02 01 00 00 50 f2 02 01 00 00 50 f2 02
> State: ASSOCIATING -> ASSOCIATED
> wpa_driver_wext_set_operstate: operstate 0->0 (DORMANT)
> WEXT: Operstate: linkmode=-1, operstate=5
> Associated to a new BSS: BSSID=44:44:44:44:44:44

> in other words, wpa_drv_get_bssid() returns 44:... BSSID. I could not find
> any place in kernel or wpa_supplicant that would set it, so I can only assume
> that firmware returns it for watever reason.

It is a firmware value, but I don't know what the it means. I know
hostap checks for it in hostap_main:prism2_sta_deauth.


Regards,

Dave.