Someone opened a bug [1] about Intel devices. He reports that we drop
packets after rekeying. I am not an expert about all the security
stuff, but the submitter says that the bug occurs on any mac80211
device. The AP is running openWRT.
I don't really have the time to learn all the security stuff, so I
thought I'd let everybody know about this bug since after all, it is
affecting all mac80211 devices. Not sure at all the bug is in
mac80211, it might very well be in the AP.
The submitter invested lots of time in root causing the bug including
patching wireshark to have it decrypt packets after rekeying.
I'd be glad if someone could take a look. If not, I'll have someone
from our team to look at it, but I don't know how long it will take...
Thanks.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=92451
Emmanuel Grumbach
[email protected]
On 18 May 2015 at 08:14, Peer, Ilan <[email protected]> wrote:
>
>
>> -----Original Message-----
>> From: [email protected] [mailto:linux-wireless-
>> [email protected]] On Behalf Of Johannes Berg
>> Sent: Sunday, May 17, 2015 23:22
>> To: Emmanuel Grumbach
>> Cc: [email protected]; Jouni Malinen; linux-wireless
>> Subject: Re: mac80211 drops packet with old IV after rekeying
>>
>> On Sun, 2015-05-17 at 23:13 +0300, Emmanuel Grumbach wrote:
>>
>> > >> Yeah - ok. But how come we *already* set the pointer to the new key
>> > >> while the HW is still successfully decrypting with the old key.
>> > >> This is the point I can' figure out. I'd expect the transmitting
>> > >> side to stop using the old key prior to sending the EAPOL (which
>
> There is probably no synchronization between the 4way HS and other data traffic on the transmitter side, as these are different processes. So it is possible that after receiving message 3 and before setting the keys, the HW would be able to decrypt additional frames with the old key.
>
In ath10k hw we have peer flag WMI_PEER_NEED_PTK_4_WAY.
This will lock tx (discard data) until PTK_M4_SENT and install key
after 4way HS.
But I didn't check ptk_rekey and I am not sure this will help with all races.
>> > >> #triggers the set key pointer line). So those 2 lines don't make sense to
>> me:
>> > >>
>> > >> > # set key pointer to new key
>> > >> > * data RX in HW, decrypt w/ old key, PN=999
>> > >>
>> > >> After all, the Rx path is serialized all the way through from the
>> > >> air to mac80211. The only thing I can think about is that the
>> > >> sending side is still using the old key *after* it already sent its EAPOL
>> frames.
>> > >
>> > > Not sure, isn't the key only installed on EAPOL acknowledgement or so?
>> > > With PTK rekeying, I'm not really sure what the timing is, and
>> > > there's not really any way it can be correct (without extended key
>> > > ID support.)
>> >
>> > Whatever the timing is, since the Rx path is serialized, there
>> > shouldn't be any timing issues. Or at least, I can't figure out how
>> > these lines above could be in the order you put them.
>>
>> I agree that it'll depend on how the key is installed on the sender, however, I
>> have no idea how that's done and how much potential delay there is between
>> sending the EAPOL frame and installing the key there.
>> If you're looking at RX path synchronisation only then you're assuming a
>> perfect sender, but that clearly cannot be the case.
>>
>
> AFAIK, the PTK is installed immediately after the 4th message is sent without waiting to ACK or any other delay. As the AP (should) installs the keys only after processing the 4th message, so a delay is expected.
>
> Regards,
>
> Ilan.
On Wed, 2015-05-20 at 22:55 +0200, Alexander Wetzel wrote:
> I've verified that turning off hardware encryption on the AP and the STA
> is indeed preventing the issue.
> As soon as one of them is using the hardware encryption I can trigger
> the problem. (In my setup it seems to be mostly caused by the AP, since
> I needed sometimes as much as three rekeys to get the freeze when the AP
> was using Software and the STA hardware encryption.)
Right, I did identify cases where both sides can have issues. I'm not
surprised that the AP-side issue is more likely.
> So confident that we finally found the root of the evil I tried to write
> some code catching the races, see the attachment.
>
> It's probably not the best fix, but the only one I could think of and
> deploy myself with the knowledge I gathered here and the last days.
Your patch breaks the security properties of this code, so we cannot use
it :-)
> What was really surprising me here is, that this is such a generic issue
> and I'm finding that in my home environment. For my understanding that
> should break many (all?) EAP Wlan's. (I'm using EAP-TLS and that did
> make the WLAN basically unusable and any sane person would have switched
> back to PSK...)
Well, I think it's a matter of probabilities.
First of all, the AP bug seems to be more likely to cause an issue, so
anyone who deployed EAP-TLS with non-broken APs is far better off than
you are. Secondly, you really can only run into this while you do
rekeying in heavy traffic, so in production environments with large
rekey intervals it doesn't matter as much again. And then I guess the
windows driver reconnects on PTK rekey request, so there you wouldn't
see it either ... as a consequence the number of affected people must be
pretty low :)
johannes
On 05/18/2015 01:03 AM, Janusz Dziedzic wrote:
> On 18 May 2015 at 08:14, Peer, Ilan <[email protected]> wrote:
>>
>>
>>> -----Original Message-----
>>> From: [email protected] [mailto:linux-wireless-
>>> [email protected]] On Behalf Of Johannes Berg
>>> Sent: Sunday, May 17, 2015 23:22
>>> To: Emmanuel Grumbach
>>> Cc: [email protected]; Jouni Malinen; linux-wireless
>>> Subject: Re: mac80211 drops packet with old IV after rekeying
>>>
>>> On Sun, 2015-05-17 at 23:13 +0300, Emmanuel Grumbach wrote:
>>>
>>>>>> Yeah - ok. But how come we *already* set the pointer to the new key
>>>>>> while the HW is still successfully decrypting with the old key.
>>>>>> This is the point I can' figure out. I'd expect the transmitting
>>>>>> side to stop using the old key prior to sending the EAPOL (which
>>
>> There is probably no synchronization between the 4way HS and other data traffic on the transmitter side, as these are different processes. So it is possible that after receiving message 3 and before setting the keys, the HW would be able to decrypt additional frames with the old key.
>>
> In ath10k hw we have peer flag WMI_PEER_NEED_PTK_4_WAY.
> This will lock tx (discard data) until PTK_M4_SENT and install key
> after 4way HS.
> But I didn't check ptk_rekey and I am not sure this will help with all races.
I think at least the 10.1 firmware has bugs that keep this from actually working
just right. Maybe later firmware works better.
Thanks,
Ben
--
Ben Greear <[email protected]>
Candela Technologies Inc http://www.candelatech.com
DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogbGludXgtd2lyZWxlc3Mt
b3duZXJAdmdlci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtd2lyZWxlc3MtDQo+IG93bmVyQHZn
ZXIua2VybmVsLm9yZ10gT24gQmVoYWxmIE9mIEpvaGFubmVzIEJlcmcNCj4gU2VudDogU3VuZGF5
LCBNYXkgMTcsIDIwMTUgMjM6MjINCj4gVG86IEVtbWFudWVsIEdydW1iYWNoDQo+IENjOiBhbGV4
YW5kZXIud2V0emVsQHdlYi5kZTsgSm91bmkgTWFsaW5lbjsgbGludXgtd2lyZWxlc3MNCj4gU3Vi
amVjdDogUmU6IG1hYzgwMjExIGRyb3BzIHBhY2tldCB3aXRoIG9sZCBJViBhZnRlciByZWtleWlu
Zw0KPiANCj4gT24gU3VuLCAyMDE1LTA1LTE3IGF0IDIzOjEzICswMzAwLCBFbW1hbnVlbCBHcnVt
YmFjaCB3cm90ZToNCj4gDQo+ID4gPj4gWWVhaCAtIG9rLiBCdXQgaG93IGNvbWUgd2UgKmFscmVh
ZHkqIHNldCB0aGUgcG9pbnRlciB0byB0aGUgbmV3IGtleQ0KPiA+ID4+IHdoaWxlIHRoZSBIVyBp
cyBzdGlsbCBzdWNjZXNzZnVsbHkgZGVjcnlwdGluZyB3aXRoIHRoZSBvbGQga2V5Lg0KPiA+ID4+
IFRoaXMgaXMgdGhlIHBvaW50IEkgY2FuJyBmaWd1cmUgb3V0LiBJJ2QgZXhwZWN0IHRoZSB0cmFu
c21pdHRpbmcNCj4gPiA+PiBzaWRlIHRvIHN0b3AgdXNpbmcgdGhlIG9sZCBrZXkgcHJpb3IgdG8g
c2VuZGluZyB0aGUgRUFQT0wgKHdoaWNoDQoNClRoZXJlIGlzIHByb2JhYmx5IG5vIHN5bmNocm9u
aXphdGlvbiBiZXR3ZWVuIHRoZSA0d2F5IEhTIGFuZCBvdGhlciBkYXRhIHRyYWZmaWMgb24gdGhl
IHRyYW5zbWl0dGVyIHNpZGUsIGFzIHRoZXNlIGFyZSBkaWZmZXJlbnQgcHJvY2Vzc2VzLiBTbyBp
dCBpcyBwb3NzaWJsZSB0aGF0IGFmdGVyIHJlY2VpdmluZyBtZXNzYWdlIDMgYW5kIGJlZm9yZSBz
ZXR0aW5nIHRoZSBrZXlzLCB0aGUgSFcgd291bGQgYmUgYWJsZSB0byBkZWNyeXB0IGFkZGl0aW9u
YWwgZnJhbWVzIHdpdGggdGhlIG9sZCBrZXkuDQoNCj4gPiA+PiAjdHJpZ2dlcnMgdGhlIHNldCBr
ZXkgcG9pbnRlciBsaW5lKS4gU28gdGhvc2UgMiBsaW5lcyBkb24ndCBtYWtlIHNlbnNlIHRvDQo+
IG1lOg0KPiA+ID4+DQo+ID4gPj4gPiAgIyBzZXQga2V5IHBvaW50ZXIgdG8gbmV3IGtleQ0KPiA+
ID4+ID4gICogZGF0YSBSWCBpbiBIVywgZGVjcnlwdCB3LyBvbGQga2V5LCBQTj05OTkNCj4gPiA+
Pg0KPiA+ID4+IEFmdGVyIGFsbCwgdGhlIFJ4IHBhdGggaXMgc2VyaWFsaXplZCBhbGwgdGhlIHdh
eSB0aHJvdWdoIGZyb20gdGhlDQo+ID4gPj4gYWlyIHRvIG1hYzgwMjExLiBUaGUgb25seSB0aGlu
ZyBJIGNhbiB0aGluayBhYm91dCBpcyB0aGF0IHRoZQ0KPiA+ID4+IHNlbmRpbmcgc2lkZSBpcyBz
dGlsbCB1c2luZyB0aGUgb2xkIGtleSAqYWZ0ZXIqIGl0IGFscmVhZHkgc2VudCBpdHMgRUFQT0wN
Cj4gZnJhbWVzLg0KPiA+ID4NCj4gPiA+IE5vdCBzdXJlLCBpc24ndCB0aGUga2V5IG9ubHkgaW5z
dGFsbGVkIG9uIEVBUE9MIGFja25vd2xlZGdlbWVudCBvciBzbz8NCj4gPiA+IFdpdGggUFRLIHJl
a2V5aW5nLCBJJ20gbm90IHJlYWxseSBzdXJlIHdoYXQgdGhlIHRpbWluZyBpcywgYW5kDQo+ID4g
PiB0aGVyZSdzIG5vdCByZWFsbHkgYW55IHdheSBpdCBjYW4gYmUgY29ycmVjdCAod2l0aG91dCBl
eHRlbmRlZCBrZXkNCj4gPiA+IElEIHN1cHBvcnQuKQ0KPiA+DQo+ID4gV2hhdGV2ZXIgdGhlIHRp
bWluZyBpcywgc2luY2UgdGhlIFJ4IHBhdGggaXMgc2VyaWFsaXplZCwgdGhlcmUNCj4gPiBzaG91
bGRuJ3QgYmUgYW55IHRpbWluZyBpc3N1ZXMuIE9yIGF0IGxlYXN0LCBJIGNhbid0IGZpZ3VyZSBv
dXQgaG93DQo+ID4gdGhlc2UgbGluZXMgYWJvdmUgY291bGQgYmUgaW4gdGhlIG9yZGVyIHlvdSBw
dXQgdGhlbS4NCj4gDQo+IEkgYWdyZWUgdGhhdCBpdCdsbCBkZXBlbmQgb24gaG93IHRoZSBrZXkg
aXMgaW5zdGFsbGVkIG9uIHRoZSBzZW5kZXIsIGhvd2V2ZXIsIEkNCj4gaGF2ZSBubyBpZGVhIGhv
dyB0aGF0J3MgZG9uZSBhbmQgaG93IG11Y2ggcG90ZW50aWFsIGRlbGF5IHRoZXJlIGlzIGJldHdl
ZW4NCj4gc2VuZGluZyB0aGUgRUFQT0wgZnJhbWUgYW5kIGluc3RhbGxpbmcgdGhlIGtleSB0aGVy
ZS4NCj4gSWYgeW91J3JlIGxvb2tpbmcgYXQgUlggcGF0aCBzeW5jaHJvbmlzYXRpb24gb25seSB0
aGVuIHlvdSdyZSBhc3N1bWluZyBhDQo+IHBlcmZlY3Qgc2VuZGVyLCBidXQgdGhhdCBjbGVhcmx5
IGNhbm5vdCBiZSB0aGUgY2FzZS4NCj4gDQoNCkFGQUlLLCB0aGUgUFRLIGlzIGluc3RhbGxlZCBp
bW1lZGlhdGVseSBhZnRlciB0aGUgNHRoIG1lc3NhZ2UgaXMgc2VudCB3aXRob3V0IHdhaXRpbmcg
dG8gQUNLIG9yIGFueSBvdGhlciBkZWxheS4gQXMgdGhlIEFQIChzaG91bGQpIGluc3RhbGxzIHRo
ZSBrZXlzIG9ubHkgYWZ0ZXIgcHJvY2Vzc2luZyB0aGUgNHRoIG1lc3NhZ2UsIHNvIGEgZGVsYXkg
aXMgZXhwZWN0ZWQuDQoNClJlZ2FyZHMsDQoNCklsYW4uDQo=
On Sun, May 17, 2015 at 7:05 PM, Jouni Malinen <[email protected]> wrote:
> On Sat, May 16, 2015 at 09:57:09PM +0200, Johannes Berg wrote:
>> The key index is used for GTK rekeying. The spec makes no provision for
>> seamless PTK rekeying, it's simply not supported.
>>
>> There was/is work in progress to actually change that, but I haven't
>> seen anything definitive. Jouni might know more.
>
> It was added in IEEE Std 802.11-2012. Search for Extended Key ID for
> Individually Addressed Frames subfield of the RSN Capabilities field (a
> field within RSN element).
>
> I'm not sure whether anyone has implemented this, but anyway, we could
> relatively easily add support for this with mac80211 + hostapd +
> wpa_supplicant combination. Though, probably that would end up depending
> on a new driver capability flag, so only with some drivers.
>
>> As I said, I believe at this point the only way to fix this bug is to
>> try to drop *old* key packets immediately, but it's difficult to ensure
>> this. Effectively, it would require synchronising RX vs. key
>> installation.
>
> I did not look at the details of the reported issue. Was this an issue
> in a received frame with old key being processed by mac80211 after key
> change and while doing that, ending up configuring incorrect (way too
> large) RX PN for the new key?
Yes this is what Johannes is saying (I think). The user seems to say
that he sees a frame encrypted with the new key and that has an old IV
which is odd. Note that I started this thread by the big disclaimer of
"I don't know much about security and key handling". So I might talk
nonsense.
One thing that I still haven't understood here is how can we get to a
situation in which we already parsed the EAPOL of the GTK re-keying,
but not a data frame that must have been sent before the EAPOL?
The Rx path is serialized after all. I'd expect maybe to loose frames
because we are still using the old key while the new key has been sent
and not to get to the situation where:
data: old Key PN = 997
data: old Key PN = 998
data: old Key PN = 999
set new key PN = 0
data: old Key PN = 1000
(new key's PN gets set to 1000 ** BUG **)
data: new Key PN = 1 <DROPPED>
>
> Dropping the frames with the old key would be one option, but not really
> ideal. A somewhat nicer option would be to add a concept of generation
> to the key (i.e., the 1st, 2nd, ... key using key index N) and with the
> help of drivers (that can do this), indicate which generation of the key
> was used for RX decryption. This would allow proper replay protection
> for both keys if we were to store copies of the RX counters for both the
> previous and current key in mac80211.
>
Ah ok - but if the driver were ever to try to decrypt with the wrong
key, it'd get garbage and would drop the frame anyway, which would not
update the PN anyway. I know I am wrong, I just don't know where :)
On Sun, 2015-05-17 at 21:23 +0300, Emmanuel Grumbach wrote:
> One thing that I still haven't understood here is how can we get to a
> situation in which we already parsed the EAPOL of the GTK re-keying,
> but not a data frame that must have been sent before the EAPOL?
> The Rx path is serialized after all. I'd expect maybe to loose frames
> because we are still using the old key while the new key has been sent
> and not to get to the situation where:
>
> data: old Key PN = 997
> data: old Key PN = 998
> data: old Key PN = 999
> set new key PN = 0
> data: old Key PN = 1000
> (new key's PN gets set to 1000 ** BUG **)
> data: new Key PN = 1 <DROPPED>
Yeah, this is the situation. How it happens is like this:
(* - data, # - control)
* data RX in HW, decrypt w/ old key, PN=998
* data RX in mac80211 - PN <= 998 [old key]
# set key pointer to new key
* data RX in HW, decrypt w/ old key, PN=999
* data RX in mac80211 - PN <= 999 [new key!! - PROBLEM FOR NEXT FRAME]
# delete old key from HW crypto
# add new key to HW crypto
Perhaps then the easier way to solve it would be to simply delete HW
crypto *before* changing the key pointer. Somewhat similar, but perhaps
simpler, than my previous patch. This would end up with the scenario
like this:
* data RX in HW, decrypt w/ old key, PN=998
* data RX in mac80211 - PN <= 998 [old key]
# delete old key from HW crypto
# set key pointer to new key
* data RX in HW, decryption possible
* data RX in mac80211, decrypt fails [old key was used, new key is
valid]
# add new key to HW crypto
The problem with that approach is how to handle drivers that *must* use
HW crypto, like ath10k, and cannot fall back to software crypto. For
those, having a state where a key is valid in software but not uploaded
to hardware is basically an invalid state.
Then again, we can probably accept that for the transition period, as
the result really would only be to indicate to mac80211 that unencrypted
frames must not be accepted (key pointer exists.)
That'd look something like this:
http://p.sipsolutions.net/941c1a69a9c54a73.txt
johannes
I've verified that turning off hardware encryption on the AP and the STA
is indeed preventing the issue.
As soon as one of them is using the hardware encryption I can trigger
the problem. (In my setup it seems to be mostly caused by the AP, since
I needed sometimes as much as three rekeys to get the freeze when the AP
was using Software and the STA hardware encryption.)
So confident that we finally found the root of the evil I tried to write
some code catching the races, see the attachment.
It's probably not the best fix, but the only one I could think of and
deploy myself with the knowledge I gathered here and the last days.
It's only for CCMP for now and I've not checked on assumptions what some
parts of the code are for.
This is just "works for me". (It survived now 30 rekeys under my test
load, when previously nearly every rekey did freeze it.)
I think there is no need any longer to generate captures, but if you
want them anyway I can of course still work on that.
The decision to check if the pn is <= 10 as indicator to switch over to
the "correct" pn counter is at best questionable. (Using 1 does not
work. I have problems with the inital key here and the code was not
switching over to the correct rx_pn counter and I expect that we also
may lose some frames here.) I'm not even sure that it's safe to assume
that pn is always starting at one, since wpa_supplicant is dumping some
sequence number on rekey I would have assumed to be a more or less
random start value. But in the debug prints the real value is always
starting at one. So I'm using that for now.
I'll now try that in my environment, keeping the insane low rekey
interval at two minutes for at least some weeks.
What was really surprising me here is, that this is such a generic issue
and I'm finding that in my home environment. For my understanding that
should break many (all?) EAP Wlan's. (I'm using EAP-TLS and that did
make the WLAN basically unusable and any sane person would have switched
back to PSK...)
For completeness, here is the original openwrt bug report I opened:
https://dev.openwrt.org/ticket/18966
Alexander
On Mon, 2015-05-18 at 21:47 +0200, Alexander Wetzel wrote:
> For my understanding that has already be done. And at least for me it
> looks like we have hard evidence for that fact.
[...]
> The Key information used to decrypt the packets is added in the same
> section as the key index, if you have problems finding it.
[building a new wireshark was awkward ... between their git being really
slow and the build needing to be completely deleted first ...]
I agree with you - what you can see in the capture, assuming the TK/PMK
display is correct, is that
packet 11: PN 0x11F2B, old key
packet 15: PN 0x11F40, old key
packet 19: PN 0x11F2C, new key
Note how packet 15, since it's VO priority, goes out far before packet
19, although packet 19 got the sequence number immediately after packet
11.
So... I guess we can, for now, go back to my earlier email and look at
the transmitter problem after all. I still think the receiver has a
similar issue though.
To be honest though, I'm not sure how to really solve this. Without
multi index capability, the spec doesn't really support PTK rekeying
well. With it, this is clearly no problem, but that would depend on more
code and driver support etc. and perhaps can't even be done with all
drivers/devices.
The first idea here would be to stop using HW crypto for TX while
changing the key, but I think at least ath10k wouldn't support that, not
sure what would happen though? Either way, it'd need a TX path flush, so
I guess it doesn't really make a difference.
So really, I guess what we need to do - and this will suck for
performance - is to stop queues and flush the TX path while the old key
is still programmed into the device, reinstall the key, and only then
restart transmission...
johannes
On Sun, May 17, 2015 at 10:25 PM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2015-05-17 at 21:23 +0300, Emmanuel Grumbach wrote:
>
>> One thing that I still haven't understood here is how can we get to a
>> situation in which we already parsed the EAPOL of the GTK re-keying,
>> but not a data frame that must have been sent before the EAPOL?
>> The Rx path is serialized after all. I'd expect maybe to loose frames
>> because we are still using the old key while the new key has been sent
>> and not to get to the situation where:
>>
>> data: old Key PN = 997
>> data: old Key PN = 998
>> data: old Key PN = 999
>> set new key PN = 0
>> data: old Key PN = 1000
>> (new key's PN gets set to 1000 ** BUG **)
>> data: new Key PN = 1 <DROPPED>
>
> Yeah, this is the situation. How it happens is like this:
> (* - data, # - control)
>
>
> * data RX in HW, decrypt w/ old key, PN=998
> * data RX in mac80211 - PN <= 998 [old key]
> # set key pointer to new key
> * data RX in HW, decrypt w/ old key, PN=999
> * data RX in mac80211 - PN <= 999 [new key!! - PROBLEM FOR NEXT FRAME]
> # delete old key from HW crypto
> # add new key to HW crypto
Yeah - ok. But how come we *already* set the pointer to the new key
while the HW is still successfully decrypting with the old key. This
is the point I can' figure out. I'd expect the transmitting side to
stop using the old key prior to sending the EAPOL (which #triggers the
set key pointer line). So those 2 lines don't make sense to me:
> # set key pointer to new key
> * data RX in HW, decrypt w/ old key, PN=999
After all, the Rx path is serialized all the way through from the air
to mac80211. The only thing I can think about is that the sending side
is still using the old key *after* it already sent its EAPOL frames.
Then, by pure change, we can still decrypt them in HW because the HW
hasn't been updated yet (these frames are successfully decrypted
because of race basically) and then, these frames come up to mac80211
*after* the EAPOL but with the old key.
This is what the submitter says:
"
The encryption key indeed changes immediately after the last packet of
the handshake, but the Initialization Vector is still counting up
against the old value.
"
So maybe that's the real issue?
>
> Perhaps then the easier way to solve it would be to simply delete HW
> crypto *before* changing the key pointer. Somewhat similar, but perhaps
> simpler, than my previous patch. This would end up with the scenario
> like this:
>
> * data RX in HW, decrypt w/ old key, PN=998
> * data RX in mac80211 - PN <= 998 [old key]
> # delete old key from HW crypto
> # set key pointer to new key
> * data RX in HW, decryption possible
> * data RX in mac80211, decrypt fails [old key was used, new key is
> valid]
> # add new key to HW crypto
>
> The problem with that approach is how to handle drivers that *must* use
> HW crypto, like ath10k, and cannot fall back to software crypto. For
> those, having a state where a key is valid in software but not uploaded
> to hardware is basically an invalid state.
>
> Then again, we can probably accept that for the transition period, as
> the result really would only be to indicate to mac80211 that unencrypted
> frames must not be accepted (key pointer exists.)
>
> That'd look something like this:
> http://p.sipsolutions.net/941c1a69a9c54a73.txt
>
> johannes
>
On Sun, 2015-05-17 at 23:13 +0300, Emmanuel Grumbach wrote:
> >> Yeah - ok. But how come we *already* set the pointer to the new key
> >> while the HW is still successfully decrypting with the old key. This
> >> is the point I can' figure out. I'd expect the transmitting side to
> >> stop using the old key prior to sending the EAPOL (which #triggers the
> >> set key pointer line). So those 2 lines don't make sense to me:
> >>
> >> > # set key pointer to new key
> >> > * data RX in HW, decrypt w/ old key, PN=999
> >>
> >> After all, the Rx path is serialized all the way through from the air
> >> to mac80211. The only thing I can think about is that the sending side
> >> is still using the old key *after* it already sent its EAPOL frames.
> >
> > Not sure, isn't the key only installed on EAPOL acknowledgement or so?
> > With PTK rekeying, I'm not really sure what the timing is, and there's
> > not really any way it can be correct (without extended key ID support.)
>
> Whatever the timing is, since the Rx path is serialized, there
> shouldn't be any timing issues. Or at least, I can't figure out how
> these lines above could be in the order you put them.
I agree that it'll depend on how the key is installed on the sender,
however, I have no idea how that's done and how much potential delay
there is between sending the EAPOL frame and installing the key there.
If you're looking at RX path synchronisation only then you're assuming a
perfect sender, but that clearly cannot be the case.
> > Actually, yes, this *could* happen, in exactly the same way as on the
> > receiver, with hardware crypto that does PN generation in software. Not
> > on iwlwifi, because we copy the key material into the TX command, but on
> > any hardware that uses hw_key_idx to identify the key, and could replace
> > the key:
> >
> > * mac80211: encrypt TX frame - assign PN, ((tx_info
> > *)skb->cb)->hw_key_idx = 7
> > * mac80211: replace key with new key
> > * driver: remove old key from hw accel (hw_key_idx 7)
> > * driver: insert new key to hw accel (reusing hw_key_idx 7)
> >
>
> This seems to be exactly what the submitter is mentioning: new key old PN.
> I don't really know how we can determine what key was used though.
> Maybe by just trying to decrypt in wireshark with the new key and
> check.
Then it'd actually be more of a sender problem, not sure how to solve it
there.
johannes
On Sun, 2015-05-17 at 22:49 +0300, Emmanuel Grumbach wrote:
> Yeah - ok. But how come we *already* set the pointer to the new key
> while the HW is still successfully decrypting with the old key. This
> is the point I can' figure out. I'd expect the transmitting side to
> stop using the old key prior to sending the EAPOL (which #triggers the
> set key pointer line). So those 2 lines don't make sense to me:
>
> > # set key pointer to new key
> > * data RX in HW, decrypt w/ old key, PN=999
>
> After all, the Rx path is serialized all the way through from the air
> to mac80211. The only thing I can think about is that the sending side
> is still using the old key *after* it already sent its EAPOL frames.
Not sure, isn't the key only installed on EAPOL acknowledgement or so?
With PTK rekeying, I'm not really sure what the timing is, and there's
not really any way it can be correct (without extended key ID support.)
> Then, by pure change, we can still decrypt them in HW because the HW
> hasn't been updated yet (these frames are successfully decrypted
> because of race basically) and then, these frames come up to mac80211
> *after* the EAPOL but with the old key.
> This is what the submitter says:
>
> "
> The encryption key indeed changes immediately after the last packet of
> the handshake, but the Initialization Vector is still counting up
> against the old value.
> "
>
> So maybe that's the real issue?
You're thinking there's a transmitter issue instead? I can't see how
that could happen, especially that way around ...
Actually, yes, this *could* happen, in exactly the same way as on the
receiver, with hardware crypto that does PN generation in software. Not
on iwlwifi, because we copy the key material into the TX command, but on
any hardware that uses hw_key_idx to identify the key, and could replace
the key:
* mac80211: encrypt TX frame - assign PN, ((tx_info
*)skb->cb)->hw_key_idx = 7
* mac80211: replace key with new key
* driver: remove old key from hw accel (hw_key_idx 7)
* driver: insert new key to hw accel (reusing hw_key_idx 7)
johannes
On Sun, May 17, 2015 at 11:05 PM, Johannes Berg
<[email protected]> wrote:
> On Sun, 2015-05-17 at 22:49 +0300, Emmanuel Grumbach wrote:
>
>> Yeah - ok. But how come we *already* set the pointer to the new key
>> while the HW is still successfully decrypting with the old key. This
>> is the point I can' figure out. I'd expect the transmitting side to
>> stop using the old key prior to sending the EAPOL (which #triggers the
>> set key pointer line). So those 2 lines don't make sense to me:
>>
>> > # set key pointer to new key
>> > * data RX in HW, decrypt w/ old key, PN=999
>>
>> After all, the Rx path is serialized all the way through from the air
>> to mac80211. The only thing I can think about is that the sending side
>> is still using the old key *after* it already sent its EAPOL frames.
>
> Not sure, isn't the key only installed on EAPOL acknowledgement or so?
> With PTK rekeying, I'm not really sure what the timing is, and there's
> not really any way it can be correct (without extended key ID support.)
Whatever the timing is, since the Rx path is serialized, there
shouldn't be any timing issues. Or at least, I can't figure out how
these lines above could be in the order you put them.
>
>> Then, by pure change, we can still decrypt them in HW because the HW
>> hasn't been updated yet (these frames are successfully decrypted
>> because of race basically) and then, these frames come up to mac80211
>> *after* the EAPOL but with the old key.
>> This is what the submitter says:
>>
>> "
>> The encryption key indeed changes immediately after the last packet of
>> the handshake, but the Initialization Vector is still counting up
>> against the old value.
>> "
>>
>> So maybe that's the real issue?
>
> You're thinking there's a transmitter issue instead? I can't see how
> that could happen, especially that way around ...
>
> Actually, yes, this *could* happen, in exactly the same way as on the
> receiver, with hardware crypto that does PN generation in software. Not
> on iwlwifi, because we copy the key material into the TX command, but on
> any hardware that uses hw_key_idx to identify the key, and could replace
> the key:
>
> * mac80211: encrypt TX frame - assign PN, ((tx_info
> *)skb->cb)->hw_key_idx = 7
> * mac80211: replace key with new key
> * driver: remove old key from hw accel (hw_key_idx 7)
> * driver: insert new key to hw accel (reusing hw_key_idx 7)
>
This seems to be exactly what the submitter is mentioning: new key old PN.
I don't really know how we can determine what key was used though.
Maybe by just trying to decrypt in wireshark with the new key and
check.
> johannes
>
On Sun, 2015-05-17 at 19:05 +0300, Jouni Malinen wrote:
> On Sat, May 16, 2015 at 09:57:09PM +0200, Johannes Berg wrote:
> > The key index is used for GTK rekeying. The spec makes no provision for
> > seamless PTK rekeying, it's simply not supported.
> >
> > There was/is work in progress to actually change that, but I haven't
> > seen anything definitive. Jouni might know more.
>
> It was added in IEEE Std 802.11-2012. Search for Extended Key ID for
> Individually Addressed Frames subfield of the RSN Capabilities field (a
> field within RSN element).
Ah, I had searched for (very briefly) for key IV things but hadn't found
it. Good to know.
> I'm not sure whether anyone has implemented this, but anyway, we could
> relatively easily add support for this with mac80211 + hostapd +
> wpa_supplicant combination. Though, probably that would end up depending
> on a new driver capability flag, so only with some drivers.
Indeed, although for most drivers it could be implemented using SW
crypto in the transition period. Then again, the old key would probably
stick around for quite a while (practically forever, until the next
rekey) so that's not a good idea.
I guess we should consider that. I believe our device would support HW
crypto in this scenario as well, although I'm not completely sure and
would obviously have to test it.
> I did not look at the details of the reported issue. Was this an issue
> in a received frame with old key being processed by mac80211 after key
> change and while doing that, ending up configuring incorrect (way too
> large) RX PN for the new key?
Yes.
> Dropping the frames with the old key would be one option, but not really
> ideal. A somewhat nicer option would be to add a concept of generation
> to the key (i.e., the 1st, 2nd, ... key using key index N) and with the
> help of drivers (that can do this), indicate which generation of the key
> was used for RX decryption. This would allow proper replay protection
> for both keys if we were to store copies of the RX counters for both the
> previous and current key in mac80211.
That would be good, but I can't think of any driver that'd be able to
implement it. In software crypto we could (theoretically) try both keys,
i.e. structure the transition as having both keys valid in software, and
switch over once decrypting with the old one fails and succeeds with the
new one. However, that seems like a lot of effort for very little gain.
johannes
On Sat, May 16, 2015 at 09:57:09PM +0200, Johannes Berg wrote:
> The key index is used for GTK rekeying. The spec makes no provision for
> seamless PTK rekeying, it's simply not supported.
>
> There was/is work in progress to actually change that, but I haven't
> seen anything definitive. Jouni might know more.
It was added in IEEE Std 802.11-2012. Search for Extended Key ID for
Individually Addressed Frames subfield of the RSN Capabilities field (a
field within RSN element).
I'm not sure whether anyone has implemented this, but anyway, we could
relatively easily add support for this with mac80211 + hostapd +
wpa_supplicant combination. Though, probably that would end up depending
on a new driver capability flag, so only with some drivers.
> As I said, I believe at this point the only way to fix this bug is to
> try to drop *old* key packets immediately, but it's difficult to ensure
> this. Effectively, it would require synchronising RX vs. key
> installation.
I did not look at the details of the reported issue. Was this an issue
in a received frame with old key being processed by mac80211 after key
change and while doing that, ending up configuring incorrect (way too
large) RX PN for the new key?
Dropping the frames with the old key would be one option, but not really
ideal. A somewhat nicer option would be to add a concept of generation
to the key (i.e., the 1st, 2nd, ... key using key index N) and with the
help of drivers (that can do this), indicate which generation of the key
was used for RX decryption. This would allow proper replay protection
for both keys if we were to store copies of the RX counters for both the
previous and current key in mac80211.
--
Jouni Malinen PGP id EFC895FA
On Sat, 2015-05-16 at 21:18 +0300, Emmanuel Grumbach wrote:
> I don't get it. I might have misunderstood your previous mail, but I
> thought that you were saying the key index was meant to solve this:
> the peer could know what key was used based on the key index written
> in the frame (I guess it is there somehow) so that the Rx handling
> code can know that the jump in the PN is due to a rekeying and not use
> any heuristic. I though you meant that the Txing side had a poor
> implementation because it reused the same key index before and after
> the rekeying which can obviously lead to problems.
> Now you seem to say that changing the key index upon rekeying is not
> allowed by spec?
> What is it good for then?
The key index is used for GTK rekeying. The spec makes no provision for
seamless PTK rekeying, it's simply not supported.
There was/is work in progress to actually change that, but I haven't
seen anything definitive. Jouni might know more.
> Again - I am just trying to close this bug, not to learn this subject.
> I can learn by reading spec / reading code and less by wasting your
> time :)
As I said, I believe at this point the only way to fix this bug is to
try to drop *old* key packets immediately, but it's difficult to ensure
this. Effectively, it would require synchronising RX vs. key
installation.
Using SW crypto will avoid this problem, because as I described in my
first email, the only reason (I can think of anyway) is that there's a
race between HW decrypt and SW key install. In the case of SW crypto,
there's a definitive link between the actual key and the PN because the
frame would fail to decrypt (and thus be dropped) if the wrong key was
used. The problem only happens in HW crypto with software PN because the
two are linked by the key index only.
This (entirely untested) patch might also help:
http://p.sipsolutions.net/3f082fae8a023bbc.txt
johannes
On Fri, 2015-05-15 at 10:52 +0300, Emmanuel Grumbach wrote:
> >> I'd be glad if someone could take a look. If not, I'll have someone
> >> from our team to look at it, but I don't know how long it will take...
> >
> > Without looking too much - it seems to me that this is a fundamental
> > problem with PTK rekeying, in that it re-uses the key index that is
> > intended to avoid this.
>
> In this case, the AP is openWRT. I guess the Key idx is chosen in
> software, so maybe the proper fix would be to have openWRT increment
> the key index when it rekeys?
Neither the spec nor the code allow that.
johannes
On Mon, 2015-05-18 at 06:14 +0000, Peer, Ilan wrote:
> There is probably no synchronization between the 4way HS and other
> data traffic on the transmitter side, as these are different
> processes. So it is possible that after receiving message 3 and before
> setting the keys, the HW would be able to decrypt additional frames
> with the old key.
Right. I think the "new key with old PN" part is probably not really
happening, although it seems possible. I'd think we should look at the
receiver first and only then move on to the transmitter if issues
persist. Having a sniffer capture of the problem with known keys (!)
would be useful though.
> AFAIK, the PTK is installed immediately after the 4th message is sent
> without waiting to ACK or any other delay. As the AP (should) installs
> the keys only after processing the 4th message, so a delay is
> expected.
Makes sense.
johannes
>> I'd be glad if someone could take a look. If not, I'll have someone
>> from our team to look at it, but I don't know how long it will take...
>
> Without looking too much - it seems to me that this is a fundamental
> problem with PTK rekeying, in that it re-uses the key index that is
> intended to avoid this.
In this case, the AP is openWRT. I guess the Key idx is chosen in
software, so maybe the proper fix would be to have openWRT increment
the key index when it rekeys?
>
> The issue at hand here is likely a hardware decryption vs. key replacing
> race, in that hardware decryption happens while the key replacing also
> happens, and then the PN checking after key replace hits the wrong key
> (due to the above-mentioned issue with not being able to change the key
> index in PTK rekeying.)
>
> I don't see how we can fix this, except perhaps heuristically by
> dropping packets that were encrypted with the *old* key and therefore
> not updating the replay counter for them. However, detecting that they
> were encrypted with the old key would only be possible by detecting a
> large jump in PN, which could theoretically also happen in real
> usage ...
>
> johannes
>
On Mon, May 18, 2015 at 6:02 PM, Johannes Berg
<[email protected]> wrote:
> On Mon, 2015-05-18 at 06:14 +0000, Peer, Ilan wrote:
>
>> There is probably no synchronization between the 4way HS and other
>> data traffic on the transmitter side, as these are different
>> processes. So it is possible that after receiving message 3 and before
>> setting the keys, the HW would be able to decrypt additional frames
>> with the old key.
>
> Right. I think the "new key with old PN" part is probably not really
> happening, although it seems possible. I'd think we should look at the
> receiver first and only then move on to the transmitter if issues
> persist. Having a sniffer capture of the problem with known keys (!)
> would be useful though.
>
The submitter seems to say he sees the new key with old PN in the air.
He attached sniffer captures with known keys. I guess that Wireshark
is able to try different keys and check which one is the one that
leads to a successful decryption.
https://bugzilla.kernel.org/show_bug.cgi?id=92451#c14
I can indeed see that the PN is going back to one way after the EAPOL
frame exchange is finished.
Note that Windows doesn't seem to be suffering from this problem
(again - based on what the user said).
So basically, the sender is not behaving nice. It bumps the PN of the
new key. If the sender would have kept using the old key for a while
and change the IV and the key atomically, we would have been fine. We
would have dropped the packets until the sender finally uses the new
key, and this would not have bumped the PN of the new key. The problem
(IMHO) is that the sender use the *new* key which means that there is
no decryption failure with the old PN. And only later the PN goes back
to 1. I don't know how we can possibly work around this. And just like
the user is saying, I wonder how Windows doesn't suffer from the same
problem as mac80211...
>
>> AFAIK, the PTK is installed immediately after the 4th message is sent
>> without waiting to ACK or any other delay. As the AP (should) installs
>> the keys only after processing the 4th message, so a delay is
>> expected.
>
> Makes sense.
>
> johannes
>
On Fri, 2015-05-15 at 09:48 +0300, Emmanuel Grumbach wrote:
> I'd be glad if someone could take a look. If not, I'll have someone
> from our team to look at it, but I don't know how long it will take...
Without looking too much - it seems to me that this is a fundamental
problem with PTK rekeying, in that it re-uses the key index that is
intended to avoid this.
The issue at hand here is likely a hardware decryption vs. key replacing
race, in that hardware decryption happens while the key replacing also
happens, and then the PN checking after key replace hits the wrong key
(due to the above-mentioned issue with not being able to change the key
index in PTK rekeying.)
I don't see how we can fix this, except perhaps heuristically by
dropping packets that were encrypted with the *old* key and therefore
not updating the replay counter for them. However, detecting that they
were encrypted with the old key would only be possible by detecting a
large jump in PN, which could theoretically also happen in real
usage ...
johannes
On Fri, May 15, 2015 at 9:35 PM, Johannes Berg
<[email protected]> wrote:
> On Fri, 2015-05-15 at 10:52 +0300, Emmanuel Grumbach wrote:
>> >> I'd be glad if someone could take a look. If not, I'll have someone
>> >> from our team to look at it, but I don't know how long it will take...
>> >
>> > Without looking too much - it seems to me that this is a fundamental
>> > problem with PTK rekeying, in that it re-uses the key index that is
>> > intended to avoid this.
>>
>> In this case, the AP is openWRT. I guess the Key idx is chosen in
>> software, so maybe the proper fix would be to have openWRT increment
>> the key index when it rekeys?
>
> Neither the spec nor the code allow that.
>
I don't get it. I might have misunderstood your previous mail, but I
thought that you were saying the key index was meant to solve this:
the peer could know what key was used based on the key index written
in the frame (I guess it is there somehow) so that the Rx handling
code can know that the jump in the PN is due to a rekeying and not use
any heuristic. I though you meant that the Txing side had a poor
implementation because it reused the same key index before and after
the rekeying which can obviously lead to problems.
Now you seem to say that changing the key index upon rekeying is not
allowed by spec?
What is it good for then?
Again - I am just trying to close this bug, not to learn this subject.
I can learn by reading spec / reading code and less by wasting your
time :)
Hello,
I'm the one banging my head against this issue for quite some time, so
if I can do anything to help here contact me. I'll check the mailing
list from time to time but if you have something I should reply please
add/keep me on CC.
I can now reproduce the issue reliable within minutes on demand and can
also patch the kernels at both ends. (Just started looking at openwrt
and got my first openwrt kernel patch crashing the wlan driver:-)
But now to the topic:
> Right. I think the "new key with old PN" part is probably not really
> happening, although it seems possible. I'd think we should look at the
> receiver first and only then move on to the transmitter if issues
> persist. Having a sniffer capture of the problem with known keys (!)
> would be useful though.
For my understanding that has already be done. And at least for me it
looks like we have hard evidence for that fact.
In the linux bug report you can find an capture extract to verify that -
taken with a remote monitor station - and the PSK's needed to decrypt
the traffic:
https://bugzilla.kernel.org/show_bug.cgi?id=92451
You are probably interested in comment 14.
Maybe a short warning here:
The wireshark patch needed to make sense of these captures was written
by me and on my still sketchy understanding how this all works. But I
see no way how it could mix up keys and all it all only a minor
modification was needed. (It's taking a short cut to decide if it will
add the PSK to the packet by only looking at the key index and not at
the appropriate flags, but hardly relevant here.)
The Key information used to decrypt the packets is added in the same
section as the key index, if you have problems finding it.
This is an older capture and I'll verify that with a new one soon.
I have quite many retransmissions in it and the monitoring station also
missed quite some frames for some incomprehensible reason.
If you wanted the full capture and not the sipped down one from the
kernel bugzilla you can download it here:
https://cal.a.wdyn.eu:65443/index.php/s/UdMpcULG16Lz1Ah
I'll hope I can provide a better one at the weekend.
I have also still some output of my poking around with printk's in the
kernel and attached you an sample from them, see debug-out.txt.gz.
I have not preserved the exact kernel patch for those printk messages,
but attached a version close of it.
(The additional MIC check was a failed experiment of many to get it
working without removing the replay detection mechanism.)
The XXX debug lines are not in the patch, these are just some printk's
at the start of the function with the name printed out to see when the
key was installed.