2009-04-15 23:21:18

by Marcel Holtmann

[permalink] [raw]
Subject: Problem with IEEE80211_MONITORING_INTERVAL

Hi guys,

so I have an Lenovo X200 with an Intel 5350 cards. This card contains
WiFi and WiMAX radios (not shared). The association with an AP keeps
breaking all the time and it is pretty hard to establish a durable
connection.

After a long debugging session with Reinette we finally tracked down
what is going on here. A full scan with this card takes 7-9 seconds and
that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
which then deauthenticates us with the AP (reason=6). Problem then is
that the attempt authenticate/associate with the AP fails with all the
time (reason=2). Bringing the interface down/up again doesn't help. It
seems that some internal mac80211 states are messed up. Some times the
whole process stalls completely and nothing happens anymore. Only
unloading and reloading the driver or rebooting helps.

Increasing the IEEE80211_MONITORING_INTERVAL to 10 seconds doesn't
trigger the deauthentication with the AP anymore (or less likely).

During debugging I verified that only the scan actually triggers the
deauthentication. If I don't scan (NM disabled) then everything works
smoothly.

I am running the latest wireless-testing tree (based on 2.6.30-rc2) from
today and the wpa_supplicant from GIT with nl80211 driver.

Regards

Marcel




2009-04-16 09:13:22

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 09:14 +0300, Kalle Valo wrote:

> Basically the problem is that if you scan in an area where there are no
> or very few APs and scan takes more than two seconds (like abg band
> scans usually take), sta->last_rx won't get updated because mac80211 is
> not receiving any frames. This is why the beacon loss check will trigger
> in ieee80211_associated().
>
> The problem won't happen if there are lot of APs in the neighbourhood
> distributed throughout the channels because then sta->last_rx is updated
> often enough and the beacon loss check won't trigger.

That doesn't make sense, why would our AP sta->last_rx be updated for
other APs? I think you mean sdata->u.mgd.last_beacon, which is, probably
wrongly, updated for all beacons.

> I have been thinking two ways to fix this, either disabling the timer
> for the duration of the scan or add a check for scan scan in
> ieee80211_associated(). I started implementing the former but haven't
> finished it yet. It would be great if someone else can fix it.

But that doesn't make sense to me now. ieee80211_associated() is only
run from the station work (ieee80211_sta_work) which doesn't do anything
when we're scanning (and gets restarted on scan end). So even if last_rx
or last_beacon isn't updated we should only run the evaluation of that
after the scan finishes.

I'll try to reproduce this.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 09:12:28

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > one other weird thing happens now (except wireless-testing tree and
> > suspend/resume is broken). The first connection attempt fails horribly:
> >
> > CTRL-EVENT-SCAN-RESULTS
> > Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > Associated with 00:23:ab:xx:xx:xx
> > CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > l2_packet_receive - recvfrom: Network is down
> > Failed to initiate AP scan.
> > Failed to initiate AP scan.
> > ^CCTRL-EVENT-TERMINATING - signal 2 received
> >
> > So starting wpa_supplicant does its job fine, then running dhclient I
> > end up in the disconnect and network down and after that scanning fails.
>
> ???
>
> And the below kernel message is all yo uget?

Yes, I am as puzzled as you are and can't make any sense out of it. It
seems that something is going horribly wrong here somewhere. I just
don't have any clue where.

> > Killing wpa_supplicant and dhclient and then trying again works fine. So
> > far I have seen that only after a fresh boot.
> >
> > Found this one in dmesg:
> >
> > ------------[ cut here ]------------
> > WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988
> > [cfg80211]()
> > Hardware name: 7454CTO
> > Modules linked in: fuse rfcomm sco bnep l2cap binfmt_misc
> > snd_hda_codec_conexant iwlagn iwlcore snd_hda_intel i2400m_usb
> > snd_hda_codec i2400m mac80211 uvcvideo snd_pcm btusb snd_timer snd wimax
> > cfg80211 soundcore bluetooth snd_page_alloc uhci_hcd ehci_hcd [last
> > unloaded: microcode]
> > Pid: 3274, comm: wpa_supplicant Tainted: G W 2.6.30-rc2-wl #1
> > Call Trace:
> > [<ffffffff80236604>] warn_slowpath+0xb1/0xe5
> > [<ffffffff8039a44a>] ? extract_buf+0x84/0xe4
> > [<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]
>
> What tree are you using? This warning totally confuses me -- there's no
> code to produce it! It was a spurious warning Luis added some time ago,
> we removed right away again and now it's haunting us.

This is the iwlwifi tree based on wireless-testing. It should be
re-based, but maybe an old patch is still stuck in it. Remember that I
am using the nl80211 driver of wpa_supplicant.

Regards

Marcel



2009-04-22 07:32:25

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > I also updated wpa_supplicant and iw to the latest code and assume that
> > no fix in there silently fixed it for me.
>
> I wouldn't suspect any significant changes there -- for sure not iw :)
>
> > So we are suspecting that this issue was because of me having 3GB of RAM
> > and running on a 64-bit kernel. Does this makes sense and could be the
> > problem why it kept failing until I unload the iwlagn driver?
>
> I couldn't say. Nothing here makes much sense to me, unfortunately.

one other weird thing happens now (except wireless-testing tree and
suspend/resume is broken). The first connection attempt fails horribly:

CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
Associated with 00:23:ab:xx:xx:xx
CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
l2_packet_receive - recvfrom: Network is down
Failed to initiate AP scan.
Failed to initiate AP scan.
^CCTRL-EVENT-TERMINATING - signal 2 received

So starting wpa_supplicant does its job fine, then running dhclient I
end up in the disconnect and network down and after that scanning fails.

Killing wpa_supplicant and dhclient and then trying again works fine. So
far I have seen that only after a fresh boot.

Found this one in dmesg:

------------[ cut here ]------------
WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988
[cfg80211]()
Hardware name: 7454CTO
Modules linked in: fuse rfcomm sco bnep l2cap binfmt_misc
snd_hda_codec_conexant iwlagn iwlcore snd_hda_intel i2400m_usb
snd_hda_codec i2400m mac80211 uvcvideo snd_pcm btusb snd_timer snd wimax
cfg80211 soundcore bluetooth snd_page_alloc uhci_hcd ehci_hcd [last
unloaded: microcode]
Pid: 3274, comm: wpa_supplicant Tainted: G W 2.6.30-rc2-wl #1
Call Trace:
[<ffffffff80236604>] warn_slowpath+0xb1/0xe5
[<ffffffff8039a44a>] ? extract_buf+0x84/0xe4
[<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]
[<ffffffff8039a193>] ? mix_pool_bytes_extract+0x147/0x156
[<ffffffffa004d9f3>] ? __cfg80211_drv_from_info+0x6d/0xe5 [cfg80211]
[<ffffffffa00569cf>] nl80211_get_wiphy+0x56/0xa2 [cfg80211]
[<ffffffff804b8582>] genl_rcv_msg+0x198/0x1b9
[<ffffffff804b83ea>] ? genl_rcv_msg+0x0/0x1b9
[<ffffffff804b7cad>] netlink_rcv_skb+0x3e/0x91
[<ffffffff804b83da>] genl_rcv+0x27/0x37
[<ffffffff804b7759>] netlink_unicast+0x1fe/0x274
[<ffffffff804b7a66>] netlink_sendmsg+0x297/0x2aa
[<ffffffff80498c7c>] sock_sendmsg+0xfd/0x120
[<ffffffff8022da05>] ? enqueue_task_fair+0x145/0x1a2
[<ffffffff8051cfb5>] ? _spin_unlock_irqrestore+0x2e/0x30
[<ffffffff8024972a>] ? autoremove_wake_function+0x0/0x38
[<ffffffff8051b16b>] ? schedule+0x18/0x3b
[<ffffffff8051b2f7>] ? preempt_schedule+0x35/0x5c
[<ffffffff8051d045>] ? _spin_unlock+0x2c/0x2e
[<ffffffff8051b77f>] ? __mutex_unlock_slowpath+0x3a/0x3f
[<ffffffff8051b714>] ? mutex_unlock+0x16/0x18
[<ffffffff804b03e3>] ? __rtnl_unlock+0x10/0x12
[<ffffffff80499888>] ? move_addr_to_kernel+0x40/0x49
[<ffffffff804a16ae>] ? verify_iovec+0x4f/0x8d
[<ffffffff80498e71>] sys_sendmsg+0x1d2/0x23c
[<ffffffff804e46ae>] ? inet_ioctl+0x92/0xaa
[<ffffffff804981de>] ? sock_ioctl+0x1f1/0x21b
[<ffffffff802a73a0>] ? vfs_ioctl+0x2a/0x78
[<ffffffff802a782a>] ? do_vfs_ioctl+0x43c/0x478
[<ffffffff8026d4a1>] ? audit_syscall_entry+0x149/0x175
[<ffffffff8020af6b>] system_call_fastpath+0x16/0x1b
---[ end trace 414e84dc46400d98 ]---

Regards

Marcel



2009-04-16 01:08:44

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > After a long debugging session with Reinette we finally tracked down
> > what is going on here. A full scan with this card takes 7-9 seconds and
> > that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> > which then deauthenticates us with the AP (reason=6).
>
> Yeah, this is a bug. We analysed this problem a couple of days ago and I
> thought Kalle was going to fix it -- maybe he forgot. I forgot what
> exactly it was and how to fix it, hmm. Right now I'm confused and can't
> seem to reproduce my analysis. I thought it had something to do with
> using a different timer but we do seem to use the regular mgd.timer, so
> I'm not sure.
>
> > Problem then is
> > that the attempt authenticate/associate with the AP fails with all the
> > time (reason=2). Bringing the interface down/up again doesn't help. It
> > seems that some internal mac80211 states are messed up. Some times the
> > whole process stalls completely and nothing happens anymore. Only
> > unloading and reloading the driver or rebooting helps.
>
> Hmm. Reason 2 is "Previous authentication no longer valid" which would
> seem we get confused wrt. being authenticated or not (ie. we think we
> are, but are not). Jouni might know? Can you try wpa_supplicant from git
> with -Dnl80211?

as I wrote, that is exactly what I am using. Trying to get away from
this WEXT crap ;)

Regards

Marcel



2009-04-20 17:14:01

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > After a long debugging session with Reinette we finally tracked down
> > what is going on here. A full scan with this card takes 7-9 seconds and
> > that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> > which then deauthenticates us with the AP (reason=6).
>
> Note to everybody: _Please_ provide the exact output. This text above
> makes it sound like we deauth from the AP, when in reality the AP has
> deauth'ed us.
>
> Maybe your AP is just buggy and can't handle you going away for more
> than a couple of seconds? It _looks_ like it sends you a deauth frame
> while you're scanning (please verify using wireshark or similar), we
> miss that deauth because we're off the channel, and then when we get
> back and start sending data again the AP complains that we're no longer
> authenticated.
>
> Sorry, but there's not enough information here to tell what really is
> going wrong. The known bug shouldn't be able to cause this behaviour.

so I had time to log the details now. I still have the problem that the
association only works once. If it ever breaks it never works again. I
hope this is not something stupid in the iwlagn driver for the 5350
hardware.

This is from wpa_supplicant (latest version from GIT of course):

CTRL-EVENT-SCAN-RESULTS
CTRL-EVENT-SCAN-RESULTS
WPS-AP-AVAILABLE
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
WPA: Key negotiation completed with 00:1f:3f:19:d8:b7 [PTK=CCMP GTK=CCMP]
CTRL-EVENT-CONNECTED - Connection to 00:1f:3f:19:d8:b7 completed (auth) [id=0 id_str=]
CTRL-EVENT-SCAN-RESULTS
CTRL-EVENT-SCAN-RESULTS
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
WPA: 4-Way Handshake failed - pre-shared key may be incorrect
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
Associated with 00:1f:3f:19:d8:b7
^CCTRL-EVENT-TERMINATING - signal 2 received

As you see it keeps scanning, authenticating, associating and then it
never completes.

And this is the debug output from the kernel side (latest
wireless-testing with all debug options enabled):

wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX AssocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
cfg80211: Calling CRDA for country: DE
wlan0: beacon loss from AP 00:1f:3f:19:d8:b7 - sending probe request
wlan0: beacon loss from AP 00:1f:3f:19:d8:b7 - sending probe request
wlan0: no probe response from AP 00:1f:3f:19:d8:b7 - disassociating
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: direct probe to AP 00:1f:3f:19:d8:b7 try 1
wlan0 direct probe responded
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: direct probe to AP 00:1f:3f:19:d8:b7 try 1
wlan0: direct probe to AP 00:1f:3f:19:d8:b7 try 2
wlan0: direct probe to AP 00:1f:3f:19:d8:b7 try 3
wlan0 direct probe responded
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticated (Reason: 2)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticate with AP 00:1f:3f:19:d8:b7
wlan0: authenticated
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: associate with AP 00:1f:3f:19:d8:b7
wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
wlan0: associated
phy0: Allocated STA 00:1f:3f:19:d8:b7
phy0: Inserted STA 00:1f:3f:19:d8:b7
wmaster0: WMM queue=2 aci=0 acm=0 aifs=3 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=3 aci=1 acm=0 aifs=7 cWmin=15 cWmax=1023 txop=0
wmaster0: WMM queue=1 aci=2 acm=0 aifs=2 cWmin=7 cWmax=15 txop=94
wmaster0: WMM queue=0 aci=3 acm=0 aifs=2 cWmin=3 cWmax=7 txop=47
wlan0: switched to short barker preamble (BSSID=00:1f:3f:19:d8:b7)
wlan0: switched to short slot time (BSSID=00:1f:3f:19:d8:b7)
wlan0: deauthenticating by local choice (reason=3)
phy0: Removed STA 00:1f:3f:19:d8:b7
phy0: Destroyed STA 00:1f:3f:19:d8:b7

Do you need anything else? I have really no idea what happens here and I
like to get it fixed.

Regards

Marcel



2009-04-22 09:04:22

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi,

> one other weird thing happens now (except wireless-testing tree and
> suspend/resume is broken). The first connection attempt fails horribly:
>
> CTRL-EVENT-SCAN-RESULTS
> Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> Associated with 00:23:ab:xx:xx:xx
> CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> l2_packet_receive - recvfrom: Network is down
> Failed to initiate AP scan.
> Failed to initiate AP scan.
> ^CCTRL-EVENT-TERMINATING - signal 2 received
>
> So starting wpa_supplicant does its job fine, then running dhclient I
> end up in the disconnect and network down and after that scanning fails.

???

And the below kernel message is all yo uget?

> Killing wpa_supplicant and dhclient and then trying again works fine. So
> far I have seen that only after a fresh boot.
>
> Found this one in dmesg:
>
> ------------[ cut here ]------------
> WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988
> [cfg80211]()
> Hardware name: 7454CTO
> Modules linked in: fuse rfcomm sco bnep l2cap binfmt_misc
> snd_hda_codec_conexant iwlagn iwlcore snd_hda_intel i2400m_usb
> snd_hda_codec i2400m mac80211 uvcvideo snd_pcm btusb snd_timer snd wimax
> cfg80211 soundcore bluetooth snd_page_alloc uhci_hcd ehci_hcd [last
> unloaded: microcode]
> Pid: 3274, comm: wpa_supplicant Tainted: G W 2.6.30-rc2-wl #1
> Call Trace:
> [<ffffffff80236604>] warn_slowpath+0xb1/0xe5
> [<ffffffff8039a44a>] ? extract_buf+0x84/0xe4
> [<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]

What tree are you using? This warning totally confuses me -- there's no
code to produce it! It was a spurious warning Luis added some time ago,
we removed right away again and now it's haunting us.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 12:56:22

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > > > I just don't see anything in the kernel that would ever set the
> > > > interface down, as indicated there.
> >
> > so I have installed the latest wireless-testing kernel now.
> >
> > CTRL-EVENT-SCAN-RESULTS
> > Trying to authenticate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> > Trying to associate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> > Could not read SSID from driver.
> > WPA: No SSID info found (msg 1 of 4).
> > Associated with 00:1f:f3:fa:58:59
> > WPA: Key negotiation completed with 00:1f:f3:fa:58:59 [PTK=CCMP GTK=CCMP]
> > CTRL-EVENT-CONNECTED - Connection to 00:1f:f3:fa:58:59 completed (auth) [id=1 id_str=]
> >
> > <starting dhclient wlan0 now>
> >
> > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > l2_packet_receive - recvfrom: Network is down
>
> I still don't understand this. Is the AP disconnecting you? Are you
> running NM on the side?

I am not running NM. Just plain wpa_supplicant and dhclient. However I
do have a suspicion that this is faulty firmware on my card and that
this messes things up. I had been doing testing with WiMAX and I run
into some issues and was speculating about interoperability. I started
testing with unreleased firmware. That would indeed explain why I don't
see this kind of issues with any other dongle I attached to the laptop.
I will reboot into a kernel that uses the official firmware.

Regards

Marcel



2009-04-16 00:22:07

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Marcel,

> After a long debugging session with Reinette we finally tracked down
> what is going on here. A full scan with this card takes 7-9 seconds and
> that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> which then deauthenticates us with the AP (reason=6).

Yeah, this is a bug. We analysed this problem a couple of days ago and I
thought Kalle was going to fix it -- maybe he forgot. I forgot what
exactly it was and how to fix it, hmm. Right now I'm confused and can't
seem to reproduce my analysis. I thought it had something to do with
using a different timer but we do seem to use the regular mgd.timer, so
I'm not sure.

> Problem then is
> that the attempt authenticate/associate with the AP fails with all the
> time (reason=2). Bringing the interface down/up again doesn't help. It
> seems that some internal mac80211 states are messed up. Some times the
> whole process stalls completely and nothing happens anymore. Only
> unloading and reloading the driver or rebooting helps.

Hmm. Reason 2 is "Previous authentication no longer valid" which would
seem we get confused wrt. being authenticated or not (ie. we think we
are, but are not). Jouni might know? Can you try wpa_supplicant from git
with -Dnl80211?

> Increasing the IEEE80211_MONITORING_INTERVAL to 10 seconds doesn't
> trigger the deauthentication with the AP anymore (or less likely).
>
> During debugging I verified that only the scan actually triggers the
> deauthentication. If I don't scan (NM disabled) then everything works
> smoothly.

Right.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-16 09:46:40

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 01:21 +0200, Marcel Holtmann wrote:

> After a long debugging session with Reinette we finally tracked down
> what is going on here. A full scan with this card takes 7-9 seconds and
> that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> which then deauthenticates us with the AP (reason=6).

Note to everybody: _Please_ provide the exact output. This text above
makes it sound like we deauth from the AP, when in reality the AP has
deauth'ed us.

Maybe your AP is just buggy and can't handle you going away for more
than a couple of seconds? It _looks_ like it sends you a deauth frame
while you're scanning (please verify using wireshark or similar), we
miss that deauth because we're off the channel, and then when we get
back and start sending data again the AP complains that we're no longer
authenticated.

Sorry, but there's not enough information here to tell what really is
going wrong. The known bug shouldn't be able to cause this behaviour.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-16 09:43:37

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 11:12 +0200, Johannes Berg wrote:

> > I have been thinking two ways to fix this, either disabling the timer
> > for the duration of the scan or add a check for scan scan in
> > ieee80211_associated(). I started implementing the former but haven't
> > finished it yet. It would be great if someone else can fix it.
>
> But that doesn't make sense to me now. ieee80211_associated() is only
> run from the station work (ieee80211_sta_work) which doesn't do anything
> when we're scanning (and gets restarted on scan end). So even if last_rx
> or last_beacon isn't updated we should only run the evaluation of that
> after the scan finishes.
>
> I'll try to reproduce this.

Can't. Scanning here, with 4965, takes almost 7 seconds, but all I see
is, _after_ the scan, a probe:

[ 1534.592164] wlan0: beacon loss from AP 00:1d:7e:4a:a1:ab - sending probe request

In a noisy environment I guess that probe or the reply could get lost,
but that seems unlikely to happen _every_ time.

Marcel, you will have to provide more information -- like kernel logs
for example, and maybe packet captures.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 11:56:09

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > > > > CTRL-EVENT-SCAN-RESULTS
> > > > > Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > > Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > > Associated with 00:23:ab:xx:xx:xx
> > > > > CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> > > > > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > > > > l2_packet_receive - recvfrom: Network is down
> > > > > Failed to initiate AP scan.
> > > > > Failed to initiate AP scan.
> > > > > ^CCTRL-EVENT-TERMINATING - signal 2 received
> > > > >
> > > > > So starting wpa_supplicant does its job fine, then running dhclient I
> > > > > end up in the disconnect and network down and after that scanning fails.
> > > >
> > > > ???
> > > >
> > > > And the below kernel message is all yo uget?
> > >
> > > Yes, I am as puzzled as you are and can't make any sense out of it. It
> > > seems that something is going horribly wrong here somewhere. I just
> > > don't have any clue where.
> >
> > I just don't see anything in the kernel that would ever set the
> > interface down, as indicated there.

so I have installed the latest wireless-testing kernel now.

CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
Trying to associate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
Could not read SSID from driver.
WPA: No SSID info found (msg 1 of 4).
Associated with 00:1f:f3:fa:58:59
WPA: Key negotiation completed with 00:1f:f3:fa:58:59 [PTK=CCMP GTK=CCMP]
CTRL-EVENT-CONNECTED - Connection to 00:1f:f3:fa:58:59 completed (auth) [id=1 id_str=]

<starting dhclient wlan0 now>

CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
l2_packet_receive - recvfrom: Network is down
CTRL-EVENT-SCAN-RESULTS
Trying to authenticate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
Trying to associate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
Associated with 00:1f:f3:fa:58:59
WPA: Key negotiation completed with 00:1f:f3:fa:58:59 [PTK=CCMP GTK=CCMP]
CTRL-EVENT-CONNECTED - Connection to 00:1f:f3:fa:58:59 completed (reauth) [id=1 id_str=]

For some reason the dhclient triggers the disconnect and I see the
network down. With this kernel it reconnects properly and then dhclient
succeeds. This is against an N-capable AirPort Express.

Regards

Marcel



2009-04-22 07:27:23

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Kalle,

> > so I have built the latest wireless-testing tree with a new patches sent
> > by Reinette fixing the DMA issue. So far I have no problems and I am
> > unable to re-produce it. However this is a different AP and I disabled
> > the power control feature for now.
> >
> > I also updated wpa_supplicant and iw to the latest code and assume that
> > no fix in there silently fixed it for me.
> >
> > So we are suspecting that this issue was because of me having 3GB of RAM
> > and running on a 64-bit kernel. Does this makes sense and could be the
> > problem why it kept failing until I unload the iwlagn driver?
>
> I would say that don't make any assumptions until you manage to test
> with the original AP. APs can be broken so many ways.

I would agree with that in general, but I haven been using that AP for
over a year now successfully with the 4965 card in my X61. No problems
ever.

Regards

Marcel



2009-04-16 10:19:09

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 13:08 +0300, Kalle Valo wrote:
> Johannes Berg <[email protected]> writes:
>
> >> ifmgd->timer is run every two seconds, even then scanning and it calls
> >> like this:
> >>
> >> ieee80211_sta_timer() -> ieee80211_sta_work() -> ieee80211_associated()
> >>
> >> And to my knowledge the timer is run even when scanning. Or am I missing
> >> something?
> >
> > But ieee80211_sta_work() checks if we're scanning and aborts if we are.
>
> Oh, I missed that one. Yes, you are right.

Maybe we should just set last_beacon = jiffies on scan finished as an
easy workaround :)

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-16 10:04:47

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 12:59 +0300, Kalle Valo wrote:

> >> I have been thinking two ways to fix this, either disabling the timer
> >> for the duration of the scan or add a check for scan scan in
> >> ieee80211_associated(). I started implementing the former but haven't
> >> finished it yet. It would be great if someone else can fix it.
> >
> > But that doesn't make sense to me now. ieee80211_associated() is only
> > run from the station work (ieee80211_sta_work) which doesn't do anything
> > when we're scanning (and gets restarted on scan end). So even if last_rx
> > or last_beacon isn't updated we should only run the evaluation of that
> > after the scan finishes.
>
> ifmgd->timer is run every two seconds, even then scanning and it calls
> like this:
>
> ieee80211_sta_timer() -> ieee80211_sta_work() -> ieee80211_associated()
>
> And to my knowledge the timer is run even when scanning. Or am I missing
> something?

But ieee80211_sta_work() checks if we're scanning and aborts if we are.

> Like you have sometime ago said, this MLME code we would need a rewrite.
> It's getting messier all the time.

Heh, yeah.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-21 21:20:26

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Marcel,

> so I have built the latest wireless-testing tree with a new patches sent
> by Reinette fixing the DMA issue. So far I have no problems and I am
> unable to re-produce it. However this is a different AP and I disabled
> the power control feature for now.

Ok.

> I also updated wpa_supplicant and iw to the latest code and assume that
> no fix in there silently fixed it for me.

I wouldn't suspect any significant changes there -- for sure not iw :)

> So we are suspecting that this issue was because of me having 3GB of RAM
> and running on a 64-bit kernel. Does this makes sense and could be the
> problem why it kept failing until I unload the iwlagn driver?

I couldn't say. Nothing here makes much sense to me, unfortunately.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 17:29:18

by Reinette Chatre

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi,

On Wed, 2009-04-22 at 02:45 -0700, Johannes Berg wrote:
> Hi,
>
> > > > CTRL-EVENT-SCAN-RESULTS
> > > > Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > Associated with 00:23:ab:xx:xx:xx
> > > > CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> > > > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > > > l2_packet_receive - recvfrom: Network is down
> > > > Failed to initiate AP scan.
> > > > Failed to initiate AP scan.
> > > > ^CCTRL-EVENT-TERMINATING - signal 2 received
> > > >
> > > > So starting wpa_supplicant does its job fine, then running dhclient I
> > > > end up in the disconnect and network down and after that scanning fails.
> > >
> > > ???
> > >
> > > And the below kernel message is all yo uget?
> >
> > Yes, I am as puzzled as you are and can't make any sense out of it. It
> > seems that something is going horribly wrong here somewhere. I just
> > don't have any clue where.
>
> I just don't see anything in the kernel that would ever set the
> interface down, as indicated there.
>
> > > > WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988
>
> > > > [<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]
> > >
> > > What tree are you using? This warning totally confuses me -- there's no
> > > code to produce it! It was a spurious warning Luis added some time ago,
> > > we removed right away again and now it's haunting us.
> >
> > This is the iwlwifi tree based on wireless-testing. It should be
> > re-based, but maybe an old patch is still stuck in it. Remember that I
> > am using the nl80211 driver of wpa_supplicant.
>
> Indeed:
>
> $ git diff iwlwifi-2.6/master..wireless-testing/master net/wireless
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index ebda7c5..97bb5c8 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -202,8 +202,6 @@ static int nl80211_send_wiphy(struct sk_buff *msg, u32 pid, u32 seq, int flags,
> int i;
> u16 ifmodes = dev->wiphy.interface_modes;
>
> - assert_cfg80211_lock();
> -
> hdr = nl80211hdr_put(msg, pid, seq, flags, NL80211_CMD_NEW_WIPHY);
> if (!hdr)
> return -1;
>

I have no idea how this was missed during a merge ... it is fixed now.

Reinette



2009-04-22 05:04:22

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Marcel Holtmann <[email protected]> writes:

> so I have built the latest wireless-testing tree with a new patches sent
> by Reinette fixing the DMA issue. So far I have no problems and I am
> unable to re-produce it. However this is a different AP and I disabled
> the power control feature for now.
>
> I also updated wpa_supplicant and iw to the latest code and assume that
> no fix in there silently fixed it for me.
>
> So we are suspecting that this issue was because of me having 3GB of RAM
> and running on a 64-bit kernel. Does this makes sense and could be the
> problem why it kept failing until I unload the iwlagn driver?

I would say that don't make any assumptions until you manage to test
with the original AP. APs can be broken so many ways.

--
Kalle Valo

2009-04-22 13:30:56

by John W. Linville

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Wed, Apr 22, 2009 at 12:53:25PM +0100, Marcel Holtmann wrote:

> <starting dhclient wlan0 now>
>
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> l2_packet_receive - recvfrom: Network is down
> CTRL-EVENT-SCAN-RESULTS
> Trying to authenticate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> Trying to associate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> Associated with 00:1f:f3:fa:58:59
> WPA: Key negotiation completed with 00:1f:f3:fa:58:59 [PTK=CCMP GTK=CCMP]
> CTRL-EVENT-CONNECTED - Connection to 00:1f:f3:fa:58:59 completed (reauth) [id=1 id_str=]
>
> For some reason the dhclient triggers the disconnect and I see the
> network down. With this kernel it reconnects properly and then dhclient
> succeeds. This is against an N-capable AirPort Express.

IIRC, dhclient is prone to 'bouncing' the interface (i.e. down/up).
I have seen it do that, but I can't remember what the circumstances
were. Anyway, maybe it was worth mentioning...

John
--
John W. Linville Someday the world will need a hero, and you
[email protected] might be all we have. Be ready.

2009-04-16 06:15:00

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Johannes Berg <[email protected]> writes:

> Hi Marcel,
>
>> After a long debugging session with Reinette we finally tracked down
>> what is going on here. A full scan with this card takes 7-9 seconds and
>> that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
>> which then deauthenticates us with the AP (reason=6).
>
> Yeah, this is a bug. We analysed this problem a couple of days ago and I
> thought Kalle was going to fix it -- maybe he forgot.

I was supposed to fix this during my easter vacation but was too lazy :)

> I forgot what exactly it was and how to fix it, hmm. Right now I'm
> confused and can't seem to reproduce my analysis. I thought it had
> something to do with using a different timer but we do seem to use the
> regular mgd.timer, so I'm not sure.

The discussion is here:

http://www.spinics.net/lists/kernel/msg866487.html

Basically the problem is that if you scan in an area where there are no
or very few APs and scan takes more than two seconds (like abg band
scans usually take), sta->last_rx won't get updated because mac80211 is
not receiving any frames. This is why the beacon loss check will trigger
in ieee80211_associated().

The problem won't happen if there are lot of APs in the neighbourhood
distributed throughout the channels because then sta->last_rx is updated
often enough and the beacon loss check won't trigger.

I have been thinking two ways to fix this, either disabling the timer
for the duration of the scan or add a check for scan scan in
ieee80211_associated(). I started implementing the former but haven't
finished it yet. It would be great if someone else can fix it.

--
Kalle Valo

2009-04-16 09:58:52

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 12:49 +0300, Kalle Valo wrote:

> >> But that doesn't make sense to me now. ieee80211_associated() is only
> >> run from the station work (ieee80211_sta_work) which doesn't do anything
> >> when we're scanning (and gets restarted on scan end). So even if last_rx
> >> or last_beacon isn't updated we should only run the evaluation of that
> >> after the scan finishes.
> >>
> >> I'll try to reproduce this.
> >
> > Can't. Scanning here, with 4965, takes almost 7 seconds, but all I see
> > is, _after_ the scan, a probe:
> >
> > [ 1534.592164] wlan0: beacon loss from AP 00:1d:7e:4a:a1:ab - sending probe request
> >
>
> How many APs can your 4965 see? You need to have at least 4 second
> silence period to see the disassication (2 seconds for beacon loss
> detection, other two seconds for waiting probe response).
>
> Try artificially dropping received frames during scan or something like
> that.

But that doesn't make sense. We don't run ieee80211_associated() during
scanning, so how would the state machine ever proceed?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-16 10:08:16

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Johannes Berg <[email protected]> writes:

>> ifmgd->timer is run every two seconds, even then scanning and it calls
>> like this:
>>
>> ieee80211_sta_timer() -> ieee80211_sta_work() -> ieee80211_associated()
>>
>> And to my knowledge the timer is run even when scanning. Or am I missing
>> something?
>
> But ieee80211_sta_work() checks if we're scanning and aborts if we are.

Oh, I missed that one. Yes, you are right.

--
Kalle Valo

2009-04-16 10:23:20

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Johannes Berg <[email protected]> writes:

> On Thu, 2009-04-16 at 13:08 +0300, Kalle Valo wrote:
>> Johannes Berg <[email protected]> writes:
>>
>> > But ieee80211_sta_work() checks if we're scanning and aborts if we are.
>>
>> Oh, I missed that one. Yes, you are right.
>
> Maybe we should just set last_beacon = jiffies on scan finished as an
> easy workaround :)

That sounds quite good actually.

I wish I had noticed the check in ieee80211_sta_work() already when I
tried to fix this :)

--
Kalle Valo

2009-04-20 17:33:34

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi,

> > > wlan0: associated
> >
> > > wlan0: deauthenticated (Reason: 2)
> >
> > > wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> > > wlan0: associated
> >
> > > wlan0: deauthenticated (Reason: 2)
> >
> > > wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> > > wlan0: associated
> >
> > > wlan0: deauthenticated (Reason: 2)
> >
> > > Do you need anything else? I have really no idea what happens here and I
> > > like to get it fixed.
> >
> > Can you do the kernel log with timing please? It seems strange to get
> > "previous authentication is no longer valid" from the AP as a reason
> > code while associated.
>
> I can try but I am not home anymore. So no promises. However it keeps
> happening in a pretty constant rate. Like every 2-5 seconds interval.

Ok. The thing I don't understand is why the AP would deauth you. And
with that reason code to boot.

> Could be the iwlagn driver for 5350 hardware doing something wrong here.
> And I also forced it to use sw_crypto.

I suppose I could try 5350. But I see no reason it would influence the
auth/assoc handshake.

> Do you have any ideas so far or want me to put some extra debug code in
> there. Not sure if I get that kernel up before tomorrow morning.

Sorry, not really. A parallel monitor capture might help:
iw dev wlan0 interface add moni0 type monitor flags none
ip link set moni0 up
tcpdump -s 10000 -w /tmp/dump.pkt -i moni0 -p
# now leave that running and do whatever as usual

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 09:45:58

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi,

> > > CTRL-EVENT-SCAN-RESULTS
> > > Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > Associated with 00:23:ab:xx:xx:xx
> > > CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> > > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > > l2_packet_receive - recvfrom: Network is down
> > > Failed to initiate AP scan.
> > > Failed to initiate AP scan.
> > > ^CCTRL-EVENT-TERMINATING - signal 2 received
> > >
> > > So starting wpa_supplicant does its job fine, then running dhclient I
> > > end up in the disconnect and network down and after that scanning fails.
> >
> > ???
> >
> > And the below kernel message is all yo uget?
>
> Yes, I am as puzzled as you are and can't make any sense out of it. It
> seems that something is going horribly wrong here somewhere. I just
> don't have any clue where.

I just don't see anything in the kernel that would ever set the
interface down, as indicated there.

> > > WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988

> > > [<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]
> >
> > What tree are you using? This warning totally confuses me -- there's no
> > code to produce it! It was a spurious warning Luis added some time ago,
> > we removed right away again and now it's haunting us.
>
> This is the iwlwifi tree based on wireless-testing. It should be
> re-based, but maybe an old patch is still stuck in it. Remember that I
> am using the nl80211 driver of wpa_supplicant.

Indeed:

$ git diff iwlwifi-2.6/master..wireless-testing/master net/wireless
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ebda7c5..97bb5c8 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -202,8 +202,6 @@ static int nl80211_send_wiphy(struct sk_buff *msg, u32 pid, u32 seq, int flags,
int i;
u16 ifmodes = dev->wiphy.interface_modes;

- assert_cfg80211_lock();
-
hdr = nl80211hdr_put(msg, pid, seq, flags, NL80211_CMD_NEW_WIPHY);
if (!hdr)
return -1;


johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 12:15:30

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi,

> > > I just don't see anything in the kernel that would ever set the
> > > interface down, as indicated there.
>
> so I have installed the latest wireless-testing kernel now.
>
> CTRL-EVENT-SCAN-RESULTS
> Trying to authenticate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> Trying to associate with 00:1f:f3:fa:58:59 (SSID='HON Circle' freq=2452 MHz)
> Could not read SSID from driver.
> WPA: No SSID info found (msg 1 of 4).
> Associated with 00:1f:f3:fa:58:59
> WPA: Key negotiation completed with 00:1f:f3:fa:58:59 [PTK=CCMP GTK=CCMP]
> CTRL-EVENT-CONNECTED - Connection to 00:1f:f3:fa:58:59 completed (auth) [id=1 id_str=]
>
> <starting dhclient wlan0 now>
>
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> l2_packet_receive - recvfrom: Network is down

I still don't understand this. Is the AP disconnecting you? Are you
running NM on the side?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-20 17:20:45

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Marcel,

> Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> Associated with 00:1f:3f:19:d8:b7
> WPA: Key negotiation completed with 00:1f:3f:19:d8:b7 [PTK=CCMP GTK=CCMP]
> CTRL-EVENT-CONNECTED - Connection to 00:1f:3f:19:d8:b7 completed (auth) [id=0 id_str=]

> Associated with 00:1f:3f:19:d8:b7
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

> Associated with 00:1f:3f:19:d8:b7
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

> Associated with 00:1f:3f:19:d8:b7
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

> Associated with 00:1f:3f:19:d8:b7
> WPA: 4-Way Handshake failed - pre-shared key may be incorrect

> Associated with 00:1f:3f:19:d8:b7
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

> As you see it keeps scanning, authenticating, associating and then it
> never completes.

> And this is the debug output from the kernel side (latest
> wireless-testing with all debug options enabled):

> wlan0: associated

> wlan0: beacon loss from AP 00:1f:3f:19:d8:b7 - sending probe request
> wlan0: beacon loss from AP 00:1f:3f:19:d8:b7 - sending probe request
> wlan0: no probe response from AP 00:1f:3f:19:d8:b7 - disassociating
> phy0: Removed STA 00:1f:3f:19:d8:b7
> phy0: Destroyed STA 00:1f:3f:19:d8:b7
> wlan0: authenticate with AP 00:1f:3f:19:d8:b7
> wlan0: authenticated
> wlan0: associate with AP 00:1f:3f:19:d8:b7
> wlan0: associate with AP 00:1f:3f:19:d8:b7
> wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> wlan0: associated
> phy0: Allocated STA 00:1f:3f:19:d8:b7
> phy0: Inserted STA 00:1f:3f:19:d8:b7

So far so good

> wlan0: deauthenticated (Reason: 2)

This is confusing. And it keeps happening:

> wlan0: associated

> wlan0: deauthenticated (Reason: 2)

> wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> wlan0: associated

> wlan0: deauthenticated (Reason: 2)

> wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> wlan0: associated

> wlan0: deauthenticated (Reason: 2)

> Do you need anything else? I have really no idea what happens here and I
> like to get it fixed.

Can you do the kernel log with timing please? It seems strange to get
"previous authentication is no longer valid" from the AP as a reason
code while associated.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-16 09:59:54

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Johannes Berg <[email protected]> writes:

> On Thu, 2009-04-16 at 09:14 +0300, Kalle Valo wrote:
>
>> Basically the problem is that if you scan in an area where there are no
>> or very few APs and scan takes more than two seconds (like abg band
>> scans usually take), sta->last_rx won't get updated because mac80211 is
>> not receiving any frames. This is why the beacon loss check will trigger
>> in ieee80211_associated().
>>
>> The problem won't happen if there are lot of APs in the neighbourhood
>> distributed throughout the channels because then sta->last_rx is updated
>> often enough and the beacon loss check won't trigger.
>
> That doesn't make sense, why would our AP sta->last_rx be updated for
> other APs? I think you mean sdata->u.mgd.last_beacon, which is, probably
> wrongly, updated for all beacons.

Yes, I meant sta->last_beacon. But the principle is the same anyway.

>> I have been thinking two ways to fix this, either disabling the timer
>> for the duration of the scan or add a check for scan scan in
>> ieee80211_associated(). I started implementing the former but haven't
>> finished it yet. It would be great if someone else can fix it.
>
> But that doesn't make sense to me now. ieee80211_associated() is only
> run from the station work (ieee80211_sta_work) which doesn't do anything
> when we're scanning (and gets restarted on scan end). So even if last_rx
> or last_beacon isn't updated we should only run the evaluation of that
> after the scan finishes.

ifmgd->timer is run every two seconds, even then scanning and it calls
like this:

ieee80211_sta_timer() -> ieee80211_sta_work() -> ieee80211_associated()

And to my knowledge the timer is run even when scanning. Or am I missing
something?

Like you have sometime ago said, this MLME code we would need a rewrite.
It's getting messier all the time.

--
Kalle Valo

2009-04-16 01:12:12

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > > After a long debugging session with Reinette we finally tracked down
> > > what is going on here. A full scan with this card takes 7-9 seconds and
> > > that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> > > which then deauthenticates us with the AP (reason=6).
> >
> > Yeah, this is a bug. We analysed this problem a couple of days ago and I
> > thought Kalle was going to fix it -- maybe he forgot. I forgot what
> > exactly it was and how to fix it, hmm. Right now I'm confused and can't
> > seem to reproduce my analysis. I thought it had something to do with
> > using a different timer but we do seem to use the regular mgd.timer, so
> > I'm not sure.
>
> Hmm. Can't seem to find what's going on -- can you enable
> CONFIG_MAC80211_VERBOSE_DEBUG and check for the beacon loss messages?

I did that already and yes, I see the beacon loss message. Increasing
IEEE80211_MONITORING_INTERVAL to 10 seconds fixes this, but that can't
be the real solution for it.

Also even when it deauth, it should then be able to authenticate and
associate again. If I end up in this situation basically only a reboot
fixes it.

Regards

Marcel



2009-04-16 01:18:15

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Thu, 2009-04-16 at 03:08 +0200, Marcel Holtmann wrote:

> > Hmm. Reason 2 is "Previous authentication no longer valid" which would
> > seem we get confused wrt. being authenticated or not (ie. we think we
> > are, but are not). Jouni might know? Can you try wpa_supplicant from git
> > with -Dnl80211?
>
> as I wrote, that is exactly what I am using. Trying to get away from
> this WEXT crap ;)

Oops, sorry. Could you try wext anyway? I think I have the same problem
with -Dnl80211 actually, but I continually rmmod anyway so didn't notice
it that much.

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-22 11:17:13

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > > > CTRL-EVENT-SCAN-RESULTS
> > > > Trying to authenticate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > Trying to associate with 00:23:ab:xx:xx:xx (SSID='Guest' freq=2412 MHz)
> > > > Associated with 00:23:ab:xx:xx:xx
> > > > CTRL-EVENT-CONNECTED - Connection to 00:23:ab:xx:xx:xx completed (auth) [id=2 id_str=]
> > > > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
> > > > l2_packet_receive - recvfrom: Network is down
> > > > Failed to initiate AP scan.
> > > > Failed to initiate AP scan.
> > > > ^CCTRL-EVENT-TERMINATING - signal 2 received
> > > >
> > > > So starting wpa_supplicant does its job fine, then running dhclient I
> > > > end up in the disconnect and network down and after that scanning fails.
> > >
> > > ???
> > >
> > > And the below kernel message is all yo uget?
> >
> > Yes, I am as puzzled as you are and can't make any sense out of it. It
> > seems that something is going horribly wrong here somewhere. I just
> > don't have any clue where.
>
> I just don't see anything in the kernel that would ever set the
> interface down, as indicated there.

I will try to install the latest wireless-testing from today and try
again since John merged all the patches. Maybe this is an issue with
wpa_supplicant and the nl80211 driver, but I am always running the
latest stuff here. As soon as Jouni pushes fixes, I run them.

> > > > WARNING: at net/wireless/nl80211.c:205 nl80211_send_wiphy+0x4b/0x988
>
> > > > [<ffffffffa0055e04>] nl80211_send_wiphy+0x4b/0x988 [cfg80211]
> > >
> > > What tree are you using? This warning totally confuses me -- there's no
> > > code to produce it! It was a spurious warning Luis added some time ago,
> > > we removed right away again and now it's haunting us.
> >
> > This is the iwlwifi tree based on wireless-testing. It should be
> > re-based, but maybe an old patch is still stuck in it. Remember that I
> > am using the nl80211 driver of wpa_supplicant.
>
> Indeed:
>
> $ git diff iwlwifi-2.6/master..wireless-testing/master net/wireless
> diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
> index ebda7c5..97bb5c8 100644
> --- a/net/wireless/nl80211.c
> +++ b/net/wireless/nl80211.c
> @@ -202,8 +202,6 @@ static int nl80211_send_wiphy(struct sk_buff *msg, u32 pid, u32 seq, int flags,
> int i;
> u16 ifmodes = dev->wiphy.interface_modes;
>
> - assert_cfg80211_lock();
> -
> hdr = nl80211hdr_put(msg, pid, seq, flags, NL80211_CMD_NEW_WIPHY);
> if (!hdr)
> return -1;

I think that Reinette will fix that once she arrives at the office :)

Sorry about that confusion. I do still have an issue though.

Regards

Marcel



2009-04-16 09:49:58

by Kalle Valo

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Johannes Berg <[email protected]> writes:

> On Thu, 2009-04-16 at 11:12 +0200, Johannes Berg wrote:
>
>> > I have been thinking two ways to fix this, either disabling the timer
>> > for the duration of the scan or add a check for scan scan in
>> > ieee80211_associated(). I started implementing the former but haven't
>> > finished it yet. It would be great if someone else can fix it.
>>
>> But that doesn't make sense to me now. ieee80211_associated() is only
>> run from the station work (ieee80211_sta_work) which doesn't do anything
>> when we're scanning (and gets restarted on scan end). So even if last_rx
>> or last_beacon isn't updated we should only run the evaluation of that
>> after the scan finishes.
>>
>> I'll try to reproduce this.
>
> Can't. Scanning here, with 4965, takes almost 7 seconds, but all I see
> is, _after_ the scan, a probe:
>
> [ 1534.592164] wlan0: beacon loss from AP 00:1d:7e:4a:a1:ab - sending probe request
>

How many APs can your 4965 see? You need to have at least 4 second
silence period to see the disassication (2 seconds for beacon loss
detection, other two seconds for waiting probe response).

Try artificially dropping received frames during scan or something like
that.

--
Kalle Valo

2009-04-16 00:44:49

by Johannes Berg

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL


> > After a long debugging session with Reinette we finally tracked down
> > what is going on here. A full scan with this card takes 7-9 seconds and
> > that seems to be too long and trigger IEEE80211_MONITORING_INTERVAL
> > which then deauthenticates us with the AP (reason=6).
>
> Yeah, this is a bug. We analysed this problem a couple of days ago and I
> thought Kalle was going to fix it -- maybe he forgot. I forgot what
> exactly it was and how to fix it, hmm. Right now I'm confused and can't
> seem to reproduce my analysis. I thought it had something to do with
> using a different timer but we do seem to use the regular mgd.timer, so
> I'm not sure.

Hmm. Can't seem to find what's going on -- can you enable
CONFIG_MAC80211_VERBOSE_DEBUG and check for the beacon loss messages?

johannes


Attachments:
signature.asc (836.00 B)
This is a digitally signed message part

2009-04-20 18:29:51

by Jouni Malinen

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

On Mon, Apr 20, 2009 at 06:13:56PM +0100, Marcel Holtmann wrote:

> so I had time to log the details now. I still have the problem that the
> association only works once. If it ever breaks it never works again. I
> hope this is not something stupid in the iwlagn driver for the 5350
> hardware.

Is the problem you described in the first message on this thread
something different or is this still expected to be the same issue? I'm
trying to understand the part about this being somehow tied to scanning
(NM enabled vs. disabled). Does this happen if you have NM disabled
(i.e., nothing triggering background scans)?


> This is from wpa_supplicant (latest version from GIT of course):
>
> CTRL-EVENT-SCAN-RESULTS

..

When you have a chance to run this again, I would be interested in
seeing more details debug log from wpa_supplicant (-ddt on command
line). From this excerpt, it seems like the issue would not be in
association, but on something after that. This kind of log could be
produced, e.g., when EAPOL frames are not delivered at all for some
reason, but the log does not have enough details to confirm whether that
was the case.

> Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> Associated with 00:1f:3f:19:d8:b7
> CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys

Would you happen to have any idea on how timing for the last two
entries, i.e., how long time it took from association to disconnection?


> wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> wlan0: associated

> wlan0: deauthenticated (Reason: 2)

This would indicate that the AP accepted association, but for some
reason, decided to kick the STA out. Which AP is it?

If the deauthentication did not happen immediately after association,
this could be indication of failed 4-way handshake; wpa_supplicant debug
log would hopefully help in figuring out whether that is the case. If
wpa_supplicant log does not show any EAPOL frames, it would be
interesting to see a wireless capture log showing all the frames the AP
is sending out during the authentication process.

--
Jouni Malinen PGP id EFC895FA

2009-04-20 17:26:29

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> So far so good
>
> > wlan0: deauthenticated (Reason: 2)
>
> This is confusing. And it keeps happening:
>
> > wlan0: associated
>
> > wlan0: deauthenticated (Reason: 2)
>
> > wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> > wlan0: associated
>
> > wlan0: deauthenticated (Reason: 2)
>
> > wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> > wlan0: associated
>
> > wlan0: deauthenticated (Reason: 2)
>
> > Do you need anything else? I have really no idea what happens here and I
> > like to get it fixed.
>
> Can you do the kernel log with timing please? It seems strange to get
> "previous authentication is no longer valid" from the AP as a reason
> code while associated.

I can try but I am not home anymore. So no promises. However it keeps
happening in a pretty constant rate. Like every 2-5 seconds interval.

Could be the iwlagn driver for 5350 hardware doing something wrong here.
And I also forced it to use sw_crypto.

Do you have any ideas so far or want me to put some extra debug code in
there. Not sure if I get that kernel up before tomorrow morning.

Regards

Marcel



2009-04-21 19:56:19

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Johannes,

> > I can try but I am not home anymore. So no promises. However it keeps
> > happening in a pretty constant rate. Like every 2-5 seconds interval.
>
> Ok. The thing I don't understand is why the AP would deauth you. And
> with that reason code to boot.
>
> > Could be the iwlagn driver for 5350 hardware doing something wrong here.
> > And I also forced it to use sw_crypto.
>
> I suppose I could try 5350. But I see no reason it would influence the
> auth/assoc handshake.

so I have built the latest wireless-testing tree with a new patches sent
by Reinette fixing the DMA issue. So far I have no problems and I am
unable to re-produce it. However this is a different AP and I disabled
the power control feature for now.

I also updated wpa_supplicant and iw to the latest code and assume that
no fix in there silently fixed it for me.

So we are suspecting that this issue was because of me having 3GB of RAM
and running on a 64-bit kernel. Does this makes sense and could be the
problem why it kept failing until I unload the iwlagn driver?

Regards

Marcel



2009-04-21 10:47:33

by Marcel Holtmann

[permalink] [raw]
Subject: Re: Problem with IEEE80211_MONITORING_INTERVAL

Hi Jouni,

> > so I had time to log the details now. I still have the problem that the
> > association only works once. If it ever breaks it never works again. I
> > hope this is not something stupid in the iwlagn driver for the 5350
> > hardware.
>
> Is the problem you described in the first message on this thread
> something different or is this still expected to be the same issue? I'm
> trying to understand the part about this being somehow tied to scanning
> (NM enabled vs. disabled). Does this happen if you have NM disabled
> (i.e., nothing triggering background scans)?

the scanning comes into play since it triggers the initial
disassociation. You can do that manually by killing wpa_supplicant and
get the same effect.

> > This is from wpa_supplicant (latest version from GIT of course):
> >
> > CTRL-EVENT-SCAN-RESULTS
>
> ..
>
> When you have a chance to run this again, I would be interested in
> seeing more details debug log from wpa_supplicant (-ddt on command
> line). From this excerpt, it seems like the issue would not be in
> association, but on something after that. This kind of log could be
> produced, e.g., when EAPOL frames are not delivered at all for some
> reason, but the log does not have enough details to confirm whether that
> was the case.

I will try to get you the detailed logs. I am not at home right now, but
hopefully my travel access points lets me reproduce it. However I do
have seen the same thing with an AP not using any security at all.

> > Trying to authenticate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> > Trying to associate with 00:1f:3f:19:d8:b7 (SSID='Holtmann' freq=2462 MHz)
> > Associated with 00:1f:3f:19:d8:b7
> > CTRL-EVENT-DISCONNECTED - Disconnect event - remove keys
>
> Would you happen to have any idea on how timing for the last two
> entries, i.e., how long time it took from association to disconnection?

Like 2-5 seconds. Not more.

> > wlan0: RX ReassocResp from 00:1f:3f:19:d8:b7 (capab=0x431 status=0 aid=2)
> > wlan0: associated
>
> > wlan0: deauthenticated (Reason: 2)
>
> This would indicate that the AP accepted association, but for some
> reason, decided to kick the STA out. Which AP is it?

It is an AVM Fritz!Box 7270 with their latest firmware. Could the
enabled WPS mess things up here?

> If the deauthentication did not happen immediately after association,
> this could be indication of failed 4-way handshake; wpa_supplicant debug
> log would hopefully help in figuring out whether that is the case. If
> wpa_supplicant log does not show any EAPOL frames, it would be
> interesting to see a wireless capture log showing all the frames the AP
> is sending out during the authentication process.

Will try to get you all the details needed. Funny part here is that the
problems stated with my 5350 card. The old 4965 in my X61 used to work
perfectly fine with the exact same access point.

Regards

Marcel