2012-02-15 08:51:27

by Mathias Krause

Subject: AES-NI data corruption issues

Hi Linus,

in commit 5b1cbac3.. ("i387: make irq_fpu_usable() tests more robust")
you wrote on a side note:

    So this explicitly verifies that we will not touch the TS_USEDFPU bit,
    and adds a few related sanity-checks. Because it seems that somehow
    AES-NI is corrupting user FP state. The cause is not clear, and this
    patch doesn't fix it, but while debugging it I really wanted the code to
    be more obviously correct and robust.

Can you please elaborate a little more on the AES-NI issues you're
seeing as I cannot find any information about them on
LKML/bugzilla/linux-crypto? Are they limited to the 3.3-rc kernels or
are they happening on released kernels as well? Are they happening on
32 bit, 64 bit or both?

I'm using aesni-intel.ko and fear my data may get corrupted, although I
haven't observed any data corruption so far.


Regards,
Mathias


2012-02-15 16:21:06

by Linus Torvalds

Subject: Re: AES-NI data corruption issues

On Wed, Feb 15, 2012 at 12:51 AM, Mathias Krause <[email protected]> wrote:
>
>    So this explicitly verifies that we will not touch the TS_USEDFPU bit,
>    and adds a few related sanity-checks. Because it seems that somehow
>    AES-NI is corrupting user FP state. The cause is not clear, and this
>    patch doesn't fix it, but while debugging it I really wanted the code to
>    be more obviously correct and robust.
>
> Can you please elaborate a little more on the AES-NI issues you're
> seeing as I cannot find any information about them on
> LKML/bugzilla/linux-crypto? Are they limited to the 3.3-rc kernels or
> are they happening on released kernels as well? Are they happening on
> 32 bit, 64 bit or both?

So far we have reports from just one person, and it seems limited to
32-bit and using the AES instructions from interrupts - by the WiFi
layer.

We have not figured out what's wrong yet, but it doesn't look like
it's AES-NI itself: it seems to be some FP state mixup (right now it
looks like the TS_USEDFPU bit we use to track it gets confused). It is
probably just triggered by the very unusual case of the mac80211 code
wanting to use FP state from interrupts.

There's a few other reports that *may* be the same thing, but they
also seem to be about wireless, and using WPA with AES. In fact, we
have no real reason to even consider them related to AES-NI at all,
other than that commonality.

Anyway, AES-NI itself seems to be fine, everything we have so far
points to the FPU/MMX state handling being very subtly broken.

Linus
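
To illustrate the pattern discussed above (code that wants the FPU/SSE
registers from interrupt context has to ask first and fall back to a
non-FPU path otherwise), here is a minimal user-space sketch. It is a
toy model, not kernel code: every name prefixed with model_ is
hypothetical, the usability rule is deliberately simplified compared to
the kernel's irq_fpu_usable(), and no real AES work is done.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the state the kernel consults (hypothetical). */
struct model_cpu {
    bool in_interrupt;       /* running in interrupt context? */
    bool fpu_owned_by_task;  /* interrupted task has live FP registers? */
    bool kernel_fpu_active;  /* a kernel FPU section is already open? */
    int  saves;              /* times the task's FP state was saved */
};

enum model_path { MODEL_PATH_AESNI, MODEL_PATH_SOFTWARE };

/*
 * Simplified assumption: process context is always fine; interrupt
 * context is fine only if it did not interrupt an open kernel FPU
 * section. The real kernel check looks at more state than this.
 */
static bool model_irq_fpu_usable(const struct model_cpu *cpu)
{
    return !cpu->in_interrupt || !cpu->kernel_fpu_active;
}

/* Save the task's FP registers (if live) before the kernel clobbers them. */
static void model_kernel_fpu_begin(struct model_cpu *cpu)
{
    assert(!cpu->kernel_fpu_active); /* nesting is not allowed */
    if (cpu->fpu_owned_by_task) {
        cpu->saves++;
        cpu->fpu_owned_by_task = false;
    }
    cpu->kernel_fpu_active = true;
}

static void model_kernel_fpu_end(struct model_cpu *cpu)
{
    cpu->kernel_fpu_active = false;
}

/* Pick the hardware path only when the FP registers may be touched. */
static enum model_path model_encrypt_block(struct model_cpu *cpu)
{
    if (!model_irq_fpu_usable(cpu))
        return MODEL_PATH_SOFTWARE;  /* plain integer implementation */

    model_kernel_fpu_begin(cpu);
    /* AES-NI instructions would run here, using the SSE registers. */
    model_kernel_fpu_end(cpu);
    return MODEL_PATH_AESNI;
}
```

In this model, a bookkeeping flag such as fpu_owned_by_task plays the
role TS_USEDFPU plays in the discussion: if it ever disagrees with what
is actually in the registers, the save is skipped or done at the wrong
time and the task's FP state is lost, even though the AES code itself
did nothing wrong.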

2012-02-16 09:45:11

by Mathias Krause

Subject: Re: AES-NI data corruption issues

On Wed, Feb 15, 2012 at 5:20 PM, Linus Torvalds
<[email protected]> wrote:
> On Wed, Feb 15, 2012 at 12:51 AM, Mathias Krause <[email protected]> wrote:
>>
>> Can you please elaborate a little more on the AES-NI issues you're
>> seeing as I cannot find any information about them on
>> LKML/bugzilla/linux-crypto? Are they limited to the 3.3-rc kernels or
>> are they happening on released kernels as well? Are they happening on
>> 32 bit, 64 bit or both?
>
> So far we have reports from just one person, and it seems limited to
> 32-bit and using the AES instructions from interrupts - by the WiFi
> layer.
>
> We have not figured out what's wrong yet, but it doesn't look like
> it's AES-NI itself: it seems to be some FP state mixup (right now it
> looks like the TS_USEDFPU bit we use to track it gets confused). It is
> probably just triggered by the very unusual case of the mac80211 code
> wanting to use FP state from interrupts.

Maybe it was a bad idea to port that code to 32 bit. Honestly, while
doing the port I didn't check whether the kernel can save and restore
the FP state (especially the MMX/SSE state) correctly. But as my tests
with dm-crypt worked flawlessly, I assumed it can.

> There's a few other reports that *may* be the same thing, but they
> also seem to be about wireless, and using WPA with AES. In fact, we
> have no real reason to even consider them related to AES-NI at all,
> other than that commonality.

I've been actively using AES-NI with WPA as well for several months now
without any problems. I'm running a 64 bit kernel, though. So this
problem may be 32 bit only.

> Anyway, AES-NI itself seems to be fine, everything we have so far
> points to the FPU/MMX state handling being very subtly broken.

Let's hope they get fixed soon.

Thanks,
Mathias