Return-path: Received: from mail-wg0-f48.google.com ([74.125.82.48]:58120 "EHLO mail-wg0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752453AbbBWKze (ORCPT ); Mon, 23 Feb 2015 05:55:34 -0500 Received: by mail-wg0-f48.google.com with SMTP id l18so24806604wgh.7 for ; Mon, 23 Feb 2015 02:55:33 -0800 (PST) Message-ID: <54EB0722.2030509@gmail.com> (sfid-20150223_115540_796888_C470A191) Date: Mon, 23 Feb 2015 11:55:30 +0100 From: wim torfs MIME-Version: 1.0 To: Jouni Malinen CC: Linus Torvalds , Adrian Chadd , "Luis R. Rodriguez" , Kalle Valo , QCA ath9k Development , "ath9k-devel@lists.ath9k.org" , Linux Wireless List Subject: Re: AR9462 problems connecting again.. References: <20150223103741.GA3594@w1.fi> In-Reply-To: <20150223103741.GA3594@w1.fi> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Sender: linux-wireless-owner@vger.kernel.org List-ID: If it is the case that the 4-way handshake fails, then I have seen this issue before. The problem is that messages 1 to 4 are sent with the previous key. However, right after sending message 4/4, does wpa_supplicant set the new key. In some cases, such as in a high throughput environment, this could result that the key is set even before the last packet is sent out. As a result, the AP receives a packet which is encrypted with the wrong key, since it still listens with the old key. I did not notice this issue while authenticating for the first time, only after the first rekey operation. A solution would be to adjust wpa_supplicant to wait for the transmission ack before it sets its key, however, this causes issues with the key reception and transmission count, which could be circumvented by copying the old counter values upon setting the new key, but this is a dirty hack. Another solution would require some more invasive operations, that is, the new key should somehow be linked to the message 4/4 and should only be set by the driver upon transmission of the message 4/4. Best regards, Wim. On 02/23/2015 11:37 AM, Jouni Malinen wrote: > On Sun, Feb 22, 2015 at 05:41:25PM -0800, Linus Torvalds wrote: >> Ok. Attached is what seems to be the relevant part of the >> wpa_supplicant.log file. >> >> The datestamp has been changed so that it can be matched up with the >> dmesg, and I added empty lines for pauses when I was trying to figure >> out what the heck it was doing, but other than that it's the raw log. > >> 14:07:16.059389: nl80211: Authenticate (ifindex=3) >> 14:07:16.390659: nl80211: MLME event 37; timeout with 20:9f:db:e7:80:80 > > I'm assuming this is unrelated to the issue of getting disconnected > after 4-way handshake, but anyway, something here prevent the simple > Authentication frame exchange from completing (i.e., either the AP did > not reply to these these frames or the response was lost for some > reason). The following attempt did you go through without issues. > >> 14:07:19.478850: nl80211: Authenticate (ifindex=3) >> 14:07:19.493911: nl80211: Authenticate event > >> 14:07:19.494223: nl80211: Associate (ifindex=3) >> 14:07:19.503237: nl80211: Associate event > >> 14:07:19.504682: wlp1s0: WPA: RX message 1 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2) >> 14:07:19.505850: wlp1s0: WPA: Sending EAPOL-Key 2/4 >> 14:07:19.510108: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2) >> 14:07:19.510233: wlp1s0: WPA: Sending EAPOL-Key 4/4 > > So it looks like both the AP and the station are able to send and > receive EAPOL frames. EAPOL-Key message 1/4..3/4 worked fine. However, > I'm assuming msg 4/4 did not make it through for some reason. > >> 14:07:19.510338: wlp1s0: WPA: Installing PTK to the driver >> 14:07:19.510405: wpa_driver_nl80211_set_key: ifindex=3 alg=3 addr=0x7f0ab41cad68 key_idx=0 set_tx=1 seq_len=6 key_len=16 >> 14:07:19.510435: addr=20:9f:db:e7:80:80 > > And this is where IEEE 802.11 RSN gets a bit messy.. The station > installs a key for encrypting all unicast frames to the AP now that it > believes 4-way handshake has been completed successfully. > >> 14:07:19.511009: wlp1s0: WPA: Installing GTK to the driver (keyidx=1 tx=0 len=16) > > And same for a key to decrypt received broadcast/multicast frames. > >> 14:07:19.511205: wlp1s0: WPA: Key negotiation completed with 20:9f:db:e7:80:80 [PTK=CCMP GTK=CCMP] >> 14:07:19.511223: wlp1s0: Cancelling authentication timeout >> 14:07:19.511238: wlp1s0: State: GROUP_HANDSHAKE -> COMPLETED >> 14:07:19.511253: wlp1s0: CTRL-EVENT-CONNECTED - Connection to 20:9f:db:e7:80:80 completed [id=0 id_str=] > > As far as the station is concerned, everything looked fine until here > and connection was established successfully. > >> 14:07:20.512476: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2) > > However, the AP disagrees.. It looks like this specific AP uses a one > second timeout on EAPOL-Key timeouts. This EAPOL-Key frame was likely > unencrypted since the AP did not receive (or accept, but more likely not > receive) msg 4/4. We are currently allowing such unencrypted EAPOL > frames (but not other ethertypes) to get processed even when the > pairwise key has been configured. > >> 14:07:20.512674: wlp1s0: WPA: Sending EAPOL-Key 4/4 > > And here's an attempt to reply.. Alas, this will likely go out encrypted > since we have the pairwise key configured and the AP will be dropping it > most likely since it would configure the pairwise key for this station > only after having receive valid 4/4. > >> 14:07:20.513170: wpa_driver_nl80211_set_key: ifindex=3 alg=3 addr=0x7f0ab41cad68 key_idx=0 set_tx=1 seq_len=6 key_len=16 > > And station reconfigured the key.. > >> 14:07:21.513331: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2) >> 14:07:22.514320: wlp1s0: WPA: RX message 3 of 4-Way Handshake from 20:9f:db:e7:80:80 (ver=2) > > These are the following two retries from AP, one second apart. Trying > EAPOL-Key frames four times in normal behavior, so based on timing here, > the issue is in AP not receiving (or accepting) msg 4/4. > >> 14:07:23.542903: nl80211: Deauthenticate event > > And this is the expected deauthentication by the AP after the last > EAPOL-Key frame retransmission failed to get a valid response. > > > So yes, this is a race condition of sorts. It is interesting if verbose > debugging in wpa_supplicant is enough to make this less likely to fail. > That will give some more time between the EAPOL-Key msg 4/4 transmission > (packet socket send() on the netdev) and pairwise key configuration > (netlink send). In theory, it could be possible to get even the first > EAPOL-Key msg 4/4 encrypted if the kernel operations for Data frame > transmission and nl80211 key configuration get reordered. Though, if > that were to be happening commonly, we would see significantly more > issues with this, so I'd assume this is not really the main issue here. > > I'm not sure why the AP would not accept EAPOL-Key msg 4/4, so it would > be interesting to see a wireless capture log to confirm that the first > attempt of sending that out did indeed result in the frame going out and > doing so without encryption. > > > As far as making this issue less likely to cause connectivity issues, I > guess we could make mac80211 do a similar hack that some drivers use > here and force the EAPOL-Key msg 4/4 retransmissions to go out without > encryption even though the pairwise key was configured. I think it would > be nicer to control this from user space (i.e., wpa_supplicant could > indicate with the send command that the frame needs to go out without > encryption), but that is unfortunately not something that the current > mechanism used for sending out EAPOL frames in station mode supports. > mac80211 should be able to track this state if it were to look at which > EAPOL-Key frames has been exchanged and start encrypting them only after > the AP has transmitted an encrypted frame with the new pairwise key. > This is somewhat ugly, but doable. (Though, this would be yet another > justification for finally trying to come up with an alternative > mechanism for handling EAPOL frame TX/RX that is really a special case > for IEEE 802.1X.) >