On 06/04/2014 03:15 PM, tip-bot for Borislav Petkov wrote:
> Commit-ID: baa916f39b50ad91661534652110df40396acda0
> Gitweb: http://git.kernel.org/tip/baa916f39b50ad91661534652110df40396acda0
> Author: Borislav Petkov <[email protected]>
> AuthorDate: Fri, 25 Apr 2014 13:40:10 +0200
> Committer: Matt Fleming <[email protected]>
> CommitDate: Sat, 3 May 2014 06:39:25 +0100
>
> x86/efi: Check for unsafe dealing with FPU state in irq ctxt
>
> efi_call can happen in an irq context (pstore) and there we really need
> to make sure we're not scribbling over FPU state while we've interrupted
> a thread or kernel mode with a live FPU state. Therefore, use the
> kernel_fpu_begin/end() variants which do that check.
>
> Signed-off-by: Borislav Petkov <[email protected]>
> Cc: Ricardo Neri <[email protected]>
> Signed-off-by: Matt Fleming <[email protected]>
> ---
> arch/x86/include/asm/efi.h | 6 ++----
> 1 file changed, 2 insertions(+), 4 deletions(-)
>
I seem to have lost track of this... does this actually solve anything,
or does it just mean we'll explode hard?
-hpa
On Wed, Jun 04, 2014 at 03:17:30PM -0700, H. Peter Anvin wrote:
> I seem to have lost track of this... does this actually solve
> anything, or does it just mean we'll explode hard?
Not that hard - it'll warn once only.
AFAIR, the discussion stalled but we were going in the direction of not
calling into efi from pstore in irq context.
I'd say we drop this until we have a proper solution.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 06/04/2014 03:49 PM, Borislav Petkov wrote:
> On Wed, Jun 04, 2014 at 03:17:30PM -0700, H. Peter Anvin wrote:
>> I seem to have lost track of this... does this actually solve
>> anything, or does it just mean we'll explode hard?
>
> Not that hard - it'll warn once only.
>
> AFAIR, the discussion stalled but we were going in the direction of not
> calling into efi from pstore in irq context.
>
> I'd say we drop this until we have a proper solution.
>
Works for me.
-hpa
On 06/04/2014 03:49 PM, Borislav Petkov wrote:
> On Wed, Jun 04, 2014 at 03:17:30PM -0700, H. Peter Anvin wrote:
>> I seem to have lost track of this... does this actually solve
>> anything, or does it just mean we'll explode hard?
>
> Not that hard - it'll warn once only.
>
> AFAIR, the discussion stalled but we were going in the direction of not
> calling into efi from pstore in irq context.
The kernel_fpu_begin thing has annoyed me in the past. How bad would it
be to allocate some percpu space and just do a full save/restore when
kernel_fpu_begin happens in a context where it currently doesn't work?
I don't know how large the state is these days, but there must be some
limit to how deeply interrupts and exceptions can nest. For each IST
entry, there is a hard limit to how deeply they can nest (once for all
but debug and four times for debug IIRC), plus one NMI (nested ones
don't touch FPU). The most non-IST entries we can have must be bounded,
too.
Let's say there are at most 16 levels of nesting. 16 * state size *
cpus isn't that much.
Of course, code in interrupts that nests kernel_fpu_begin itself could
have a problem. But this can be solved with a little bit of trickery in
the entry code or something.
If we did this, then I think the x86 crypto code could get rid of all of
its ridiculous async code.
--Andy
On Wed, Jun 04, 2014 at 05:19:01PM -0700, Andy Lutomirski wrote:
> On 06/04/2014 03:49 PM, Borislav Petkov wrote:
> > On Wed, Jun 04, 2014 at 03:17:30PM -0700, H. Peter Anvin wrote:
> >> I seem to have lost track of this... does this actually solve
> >> anything, or does it just mean we'll explode hard?
> >
> > Not that hard - it'll warn once only.
> >
> > AFAIR, the discussion stalled but we were going in the direction of not
> > calling into efi from pstore in irq context.
>
> The kernel_fpu_begin thing has annoyed me in the past. How bad would it
> be to allocate some percpu space and just do a full save/restore when
> kernel_fpu_begin happens in a context where it currently doesn't work?
>
> I don't know how large the state is these days, but there must be some
> limit to how deeply interrupts and exceptions can nest. For each IST
> entry, there is a hard limit to how deeply they can nest (once for all
> but debug and four times for debug IIRC), plus one NMI (nested ones
> don't touch FPU). The most non-IST entries we can have must be bounded,
> too.
>
> Let's say there are at most 16 levels of nesting. 16 * state size *
> cpus isn't that much.
>
> Of course, code in interrupts that nests kernel_fpu_begin itself could
> have a problem. But this can be solved with a little bit of trickery in
> the entry code or something.
>
> If we did this, then I think the x86 crypto code could get rid of all of
> its ridiculous async code.
How are you going to detect when to save/restore state? Do it
unconditionally would probably be a no-no. Even with all that optimized
XSAVE* fun.
On demand would mean you allow FPU exceptions which probably gravitates
towards a no-no too.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 5 June 2014 08:18, Borislav Petkov <[email protected]> wrote:.
>
> How are you going to detect when to save/restore state? Do it
> unconditionally would probably be a no-no. Even with all that optimized
> XSAVE* fun.
(I'm not talking about the crypto async code because I'm not familiar with it)
For the EFI pstore case we'd only be using this newly allocated
context space if we can't do the usual FPU xsave dance. e.g. we'd be
adding a new feature specifically for the !irq_fpu_usable() case. Only
then would we do an unconditional save. It would be useful to get some
numbers for this but I don't think it would be too bad, especially
given that it's in a fatal crash handler state anyway.
I don't think it's worth going to the trouble solely for the EFI
pstore code, but if it can also be used for the crypto code it might
be worth a look.
On Thu, Jun 05, 2014 at 09:49:08AM +0100, Matt Fleming wrote:
> On 5 June 2014 08:18, Borislav Petkov <[email protected]> wrote:.
> >
> > How are you going to detect when to save/restore state? Do it
> > unconditionally would probably be a no-no. Even with all that optimized
> > XSAVE* fun.
>
> (I'm not talking about the crypto async code because I'm not familiar with it)
>
> For the EFI pstore case we'd only be using this newly allocated
> context space if we can't do the usual FPU xsave dance. e.g. we'd be
> adding a new feature specifically for the !irq_fpu_usable() case. Only
> then would we do an unconditional save. It would be useful to get some
> numbers for this but I don't think it would be too bad, especially
> given that it's in a fatal crash handler state anyway.
>
> I don't think it's worth going to the trouble solely for the EFI
> pstore code, but if it can also be used for the crypto code it might
> be worth a look.
Right, if we do this only for special, slowpath cases, then we're
probably fine with unconditional. It would be simpler too.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Thu, Jun 5, 2014 at 2:02 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Jun 05, 2014 at 09:49:08AM +0100, Matt Fleming wrote:
>> On 5 June 2014 08:18, Borislav Petkov <[email protected]> wrote:.
>> >
>> > How are you going to detect when to save/restore state? Do it
>> > unconditionally would probably be a no-no. Even with all that optimized
>> > XSAVE* fun.
>>
>> (I'm not talking about the crypto async code because I'm not familiar with it)
>>
>> For the EFI pstore case we'd only be using this newly allocated
>> context space if we can't do the usual FPU xsave dance. e.g. we'd be
>> adding a new feature specifically for the !irq_fpu_usable() case. Only
>> then would we do an unconditional save. It would be useful to get some
>> numbers for this but I don't think it would be too bad, especially
>> given that it's in a fatal crash handler state anyway.
>>
>> I don't think it's worth going to the trouble solely for the EFI
>> pstore code, but if it can also be used for the crypto code it might
>> be worth a look.
>
> Right, if we do this only for special, slowpath cases, then we're
> probably fine with unconditional. It would be simpler too.
Are there weird contexts from which EFI calls can happen? It looks
like the current code isn't necessarily safe in things that aren't
normal process context but aren't interrupts either (e.g. debug traps,
#GP, etc).
I wonder if it would make sense at some point to maintain an explicit
stack of kernel entries. There doesn't seem to be a reliable way to
answer the question of "what context am I in" from C code right now.
--Andy
On Thu, Jun 05, 2014 at 08:44:20AM -0700, Andy Lutomirski wrote:
> Are there weird contexts from which EFI calls can happen? It looks
> like the current code isn't necessarily safe in things that aren't
> normal process context but aren't interrupts either (e.g. debug traps,
> #GP, etc).
The efi-pstore thing registers as a kmsg dumper which can be run in NMI
context and efi can be called there.
> I wonder if it would make sense at some point to maintain an explicit
> stack of kernel entries. There doesn't seem to be a reliable way to
> answer the question of "what context am I in" from C code right now.
So that you can ask int ctxt = what_context_Im_in() and then that
context can go and change right underneath you. :-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Thu, Jun 5, 2014 at 8:53 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Jun 05, 2014 at 08:44:20AM -0700, Andy Lutomirski wrote:
>> Are there weird contexts from which EFI calls can happen? It looks
>> like the current code isn't necessarily safe in things that aren't
>> normal process context but aren't interrupts either (e.g. debug traps,
>> #GP, etc).
>
> The efi-pstore thing registers as a kmsg dumper which can be run in NMI
> context and efi can be called there.
NMI might be okay. I haven't checked.
>
>> I wonder if it would make sense at some point to maintain an explicit
>> stack of kernel entries. There doesn't seem to be a reliable way to
>> answer the question of "what context am I in" from C code right now.
>
> So that you can ask int ctxt = what_context_Im_in() and then that
> context can go and change right underneath you. :-)
>
It has to change back, though. Completely unrealistic and useless example:
int ctxt = what_context_im_in();
set_up_the_fpu(ctxt);
// kprobe fires and changes the context
// kprobe does something
// kprobe changes the context back
use the FPU. Life is good.
put_back_the_fpu(ctxt);
--Andy
On Thu, Jun 05, 2014 at 09:01:02AM -0700, Andy Lutomirski wrote:
> NMI might be okay. I haven't checked.
Well, if efi decides to do FPU math and it happens in NMI, we will have
to provide for proper contexts handling.
> It has to change back, though. Completely unrealistic and useless example:
>
> int ctxt = what_context_im_in();
>
> set_up_the_fpu(ctxt);
>
> // kprobe fires and changes the context
> // kprobe does something
And since we're being completely unrealistic, kprobe decides to use the
fpu too and uses it...
> // kprobe changes the context back
>
> use the FPU. Life is good.
>
> put_back_the_fpu(ctxt);
So you probably need some way of mapping preallocated, per-cpu FPU
contexts to their users which can get and put them.
It's a whole different question whether that makes sense though, at all
or we simply remain conservative and don't do any efi in NMI context...
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On Thu, Jun 5, 2014 at 9:08 AM, Borislav Petkov <[email protected]> wrote:
> On Thu, Jun 05, 2014 at 09:01:02AM -0700, Andy Lutomirski wrote:
>> NMI might be okay. I haven't checked.
>
> Well, if efi decides to do FPU math and it happens in NMI, we will have
> to provide for proper contexts handling.
>
>> It has to change back, though. Completely unrealistic and useless example:
>>
>> int ctxt = what_context_im_in();
>>
>> set_up_the_fpu(ctxt);
>>
>> // kprobe fires and changes the context
>> // kprobe does something
>
> And since we're being completely unrealistic, kprobe decides to use the
> fpu too and uses it...
Sure. So it either needs to save and restore the state or to save it
and mark it for restore when the kprobe entry context is torn down.
>
>> // kprobe changes the context back
>>
>> use the FPU. Life is good.
>>
>> put_back_the_fpu(ctxt);
>
> So you probably need some way of mapping preallocated, per-cpu FPU
> contexts to their users which can get and put them.
>
> It's a whole different question whether that makes sense though, at all
> or we simply remain conservative and don't do any efi in NMI context...
Is this NMI pstore thing done from any context that's supposed to be
recoverable? It's always safe to use the FPU and then panic :)
Anyway, if we're just talking about EFI, there's an easier solution:
just preallocate a per-cpu FPU context for EFI and make the whole mess
be local to the EFI code. For crypto, that's not so good.
--Andy
On Thu, Jun 05, 2014 at 09:14:51AM -0700, Andy Lutomirski wrote:
> Is this NMI pstore thing done from any context that's supposed to be
> recoverable? It's always safe to use the FPU and then panic :)
Right :)
> Anyway, if we're just talking about EFI, there's an easier solution:
> just preallocate a per-cpu FPU context for EFI and make the whole mess
> be local to the EFI code. For crypto, that's not so good.
This is probably something for Matt to decide but it sounds doable. If
I'd have to guess, sooner or later we will need to do proper FPU context
handling for EFI as I don't see anything stopping it from using FPU
insns. At least we won't. :-)
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 06/05/2014 09:18 AM, Borislav Petkov wrote:
> On Thu, Jun 05, 2014 at 09:14:51AM -0700, Andy Lutomirski wrote:
>> Is this NMI pstore thing done from any context that's supposed to be
>> recoverable? It's always safe to use the FPU and then panic :)
>
> Right :)
>
>> Anyway, if we're just talking about EFI, there's an easier solution:
>> just preallocate a per-cpu FPU context for EFI and make the whole mess
>> be local to the EFI code. For crypto, that's not so good.
>
> This is probably something for Matt to decide but it sounds doable. If
> I'd have to guess, sooner or later we will need to do proper FPU context
> handling for EFI as I don't see anything stopping it from using FPU
> insns. At least we won't. :-)
>
The bottom line is that we can't call EFI from a context where we can't
use the FPU. Or specifically, we can't then resume execution. If all
we're doing is stashing away some data before dying, well, then, by all
means - but we need to make sure that is what actually happens.
As far adding additional xstate save areas, the current size of the
xstate is about ~2.5K for AVX-512 enabled processors, and we need one
per thread. If we make that two copies, then
kernel_fpu_begin()..._end() would no longer have to disable preemption,
but it wouldn't resolve the conflict about using the FPU from IRQ
context when inside kernel_fpu_begin().._end().
To support the FPU in IRQ context we end up having to create a percpu
FPU state stack, and it becomes then a matter of how deep that stack
would have to be.
-hpa
On Thu, Jun 5, 2014 at 9:31 AM, H. Peter Anvin <[email protected]> wrote:
> On 06/05/2014 09:18 AM, Borislav Petkov wrote:
>> On Thu, Jun 05, 2014 at 09:14:51AM -0700, Andy Lutomirski wrote:
>>> Is this NMI pstore thing done from any context that's supposed to be
>>> recoverable? It's always safe to use the FPU and then panic :)
>>
>> Right :)
>>
>>> Anyway, if we're just talking about EFI, there's an easier solution:
>>> just preallocate a per-cpu FPU context for EFI and make the whole mess
>>> be local to the EFI code. For crypto, that's not so good.
>>
>> This is probably something for Matt to decide but it sounds doable. If
>> I'd have to guess, sooner or later we will need to do proper FPU context
>> handling for EFI as I don't see anything stopping it from using FPU
>> insns. At least we won't. :-)
>>
>
> The bottom line is that we can't call EFI from a context where we can't
> use the FPU. Or specifically, we can't then resume execution. If all
> we're doing is stashing away some data before dying, well, then, by all
> means - but we need to make sure that is what actually happens.
I bet we have to be extra careful about EFI: I imagine that, if we
take an NMI or MCE while in EFI code, another call into EFI is a
terrible idea, which might be a good reason to have the EFI code track
its own context and prevent nesting.
>
> As far adding additional xstate save areas, the current size of the
> xstate is about ~2.5K for AVX-512 enabled processors, and we need one
> per thread. If we make that two copies, then
> kernel_fpu_begin()..._end() would no longer have to disable preemption,
> but it wouldn't resolve the conflict about using the FPU from IRQ
> context when inside kernel_fpu_begin().._end().
>
> To support the FPU in IRQ context we end up having to create a percpu
> FPU state stack, and it becomes then a matter of how deep that stack
> would have to be.
I suppose it's a question of how valuable making a change like this
would be (two per-thread xstate areas plus a bunch per cpu, say). I
doubt that the memory usage matters much, but writing and maintaining
it will be a mild PITA. Maybe no worse than the current stuff.
What are the major users? If it worked really well, we could enable
SSE for all kernel code, but at least all the crypto stuff would
benefit a lot.
--Andy
On Thu, Jun 05, 2014 at 09:31:12AM -0700, H. Peter Anvin wrote:
> The bottom line is that we can't call EFI from a context where we can't
> use the FPU. Or specifically, we can't then resume execution.
Can't we allocate a save-state area, stash the state there and let EFI
scribble over it? When EFI returns, we scribble over it back assuming it
has done the saving/restoring on its own.
> If all we're doing is stashing away some data before dying, well,
> then, by all means - but we need to make sure that is what actually
> happens.
Yeah, who knows, we might return. I'm thinking of a #MC here which is
serious enough to real exception, we do some handling and issue the
error info into pstore and continue execution. Purely hypothetical
though.
> As far adding additional xstate save areas, the current size of the
> xstate is about ~2.5K for AVX-512 enabled processors, and we need one
> per thread. If we make that two copies, then
> kernel_fpu_begin()..._end() would no longer have to disable preemption,
> but it wouldn't resolve the conflict about using the FPU from IRQ
> context when inside kernel_fpu_begin().._end().
>
> To support the FPU in IRQ context we end up having to create a percpu
> FPU state stack, and it becomes then a matter of how deep that stack
> would have to be.
... if it all makes sense at all, of course.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--
On 06/05/2014 09:35 AM, Andy Lutomirski wrote:
>>
>> The bottom line is that we can't call EFI from a context where we can't
>> use the FPU. Or specifically, we can't then resume execution. If all
>> we're doing is stashing away some data before dying, well, then, by all
>> means - but we need to make sure that is what actually happens.
>
> I bet we have to be extra careful about EFI: I imagine that, if we
> take an NMI or MCE while in EFI code, another call into EFI is a
> terrible idea, which might be a good reason to have the EFI code track
> its own context and prevent nesting.
>
I strongly suspect we also want to make sure that we don't ever let more
than one CPU into EFI context. This would also make it possible to have
a dedicated XSAVE buffer for EFI.
Now, the tricky part: what do we do if another CPU is already in EFI and
we want to do something.
>
> I suppose it's a question of how valuable making a change like this
> would be (two per-thread xstate areas plus a bunch per cpu, say). I
> doubt that the memory usage matters much, but writing and maintaining
> it will be a mild PITA. Maybe no worse than the current stuff.
>
> What are the major users? If it worked really well, we could enable
> SSE for all kernel code, but at least all the crypto stuff would
> benefit a lot.
>
No, enabling SSE for all kernel code would force us to context-switch on
any kernel entry or exit, which is way too expensive for what it gains.
However, perhaps the realtime people will be happier if we don't have
to stop preemption.
-hpa
On 06/05/2014 09:37 AM, Borislav Petkov wrote:
> On Thu, Jun 05, 2014 at 09:31:12AM -0700, H. Peter Anvin wrote:
>> The bottom line is that we can't call EFI from a context where we can't
>> use the FPU. Or specifically, we can't then resume execution.
>
> Can't we allocate a save-state area, stash the state there and let EFI
> scribble over it? When EFI returns, we scribble over it back assuming it
> has done the saving/restoring on its own.
I'm sorry I don't follow.
-hpa
On Thu, Jun 05, 2014 at 09:43:50AM -0700, H. Peter Anvin wrote:
> I'm sorry I don't follow.
Exactly what you said in the other mail to Andy: a dedicated XSAVE
buffer for EFI.
--
Regards/Gruss,
Boris.
Sent from a fat crate under my desk. Formatting is fine.
--