On Thu, 2018-01-04 at 14:51 +0000, Andrew Cooper wrote:
>
> > * never turn off indirect branch prediction, but use a branch prediction
> > barrier on every mode switch (needed for current AMD microcode)
>
> Where have you got this idea from? Using IBPB on every mode switch
> would be an insane overhead to take, and isn't necessary.

AMD *only* has IBPB and not IBRS, but IIRC you don't need to do it on
every context switch into the kernel; only when switching between
VMs/processes?

> Also, remember that PTI and these mitigations are for orthogonal issues.
>
> Perhaps it is easiest to refer directly to the Xen SP2 mitigations and
> my commentary of what is going on:
> http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blob;f=xen/arch/x86/spec_ctrl.c;h=79aedf774a390293dfd564ce978500085344e305;hb=refs/heads/sp2-mitigations-v6.5#l192
>
> With the GCC -mindirect-branch=thunk-external support, and microcode,
> Xen will make a boot-time choice between using Retpoline, Lfence (which
> is the better AMD option, and more performant than retpoline), or IBRS
> on Skylake and newer processors where it is strictly necessary, as well
> as using IBPB whenever available.

I need to pull in the AMD lfence alternative for retpoline, giving us a
3-way choice of the existing retpoline thunk, "lfence; jmp *%\reg", and
a bare "jmp *%\reg".

Then the IBRS bits can be added on top.
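
For illustration only, the 3-way choice could be modelled in plain C as
below. This is a sketch, not code from any patch in this thread: the
enum, the struct, and both function names are hypothetical, and the
priority order (bare jmp when IBRS is in use, lfence on AMD parts where
it is the preferred option, retpoline otherwise) is an assumption pieced
together from the discussion above.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not identifiers from Linux or Xen. */
enum thunk_kind {
    THUNK_RETPOLINE,   /* existing retpoline thunk */
    THUNK_LFENCE_JMP,  /* "lfence; jmp *%reg" */
    THUNK_BARE_JMP     /* plain "jmp *%reg" */
};

struct cpu_caps {
    bool ibrs_in_use;      /* assumed: IBRS already covers indirect branches */
    bool lfence_preferred; /* assumed: AMD part where lfence is the better option */
};

/* Pick one of the three variants once, e.g. at boot time. */
static enum thunk_kind choose_thunk(const struct cpu_caps *caps)
{
    if (caps->ibrs_in_use)
        return THUNK_BARE_JMP;
    if (caps->lfence_preferred)
        return THUNK_LFENCE_JMP;
    return THUNK_RETPOLINE;
}

/* Human-readable form of the instruction sequence each variant stands for. */
static const char *thunk_asm(enum thunk_kind kind)
{
    switch (kind) {
    case THUNK_LFENCE_JMP: return "lfence; jmp *%reg";
    case THUNK_BARE_JMP:   return "jmp *%reg";
    default:               return "jmp __x86.indirect_thunk.reg";
    }
}
```

In the kernel the selection would presumably be patched in through the
alternatives mechanism rather than decided by a function call at run
time; the function above only makes the decision table explicit.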

> It also supports virtualising IBRS for guest usage when the kernel has
> chosen not to use it; a configuration I haven't seen in any of the Linux
> patch series thus far.

Adding that for KVM is in the Linux IBRS patch set that I've seen.
Didn't we already have a conversation about how the Linux patch set
does it as an atomically-switched MSR while you've done it manually in
Xen because it's faster?

2018-01-04 15:32:08

by Paolo Bonzini

Subject: Re: Avoid speculative indirect calls in kernel

On 04/01/2018 15:51, Andrew Cooper wrote:
> Where have you got this idea from? Using IBPB on every mode switch
> would be an insane overhead to take, and isn't necessary.

IIRC it started as a paranoia mode for AMD, but then we found out it was
actually faster than IBRS on some Intel processors where IBRS performance
was horrible. But I don't remember the details of the performance
testing, sorry.

Paolo

> Also, remember that PTI and these mitigations are for orthogonal issues.
>
> Perhaps it is easiest to refer directly to the Xen SP2 mitigations and
> my commentary of what is going on:
> http://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=blob;f=xen/arch/x86/spec_ctrl.c;h=79aedf774a390293dfd564ce978500085344e305;hb=refs/heads/sp2-mitigations-v6.5#l192
>
> With the GCC -mindirect-branch=thunk-external support, and microcode,
> Xen will make a boot-time choice between using Retpoline, Lfence (which
> is the better AMD option, and more performant than retpoline), or IBRS
> on Skylake and newer processors where it is strictly necessary, as well
> as using IBPB whenever available.
>
> It also supports virtualising IBRS for guest usage when the kernel has
> chosen not to use it; a configuration I haven't seen in any of the Linux
> patch series thus far.
>
> ~Andrew
>

2018-01-04 15:32:52

by Paolo Bonzini

Subject: Re: Avoid speculative indirect calls in kernel

On 04/01/2018 16:29, Woodhouse, David wrote:
> Adding that for KVM is in the Linux IBRS patch set that I've seen.
> Didn't we already have a conversation about how the Linux patch set
> does it as an atomically-switched MSR while you've done it manually in
> Xen because it's faster?

I'm also doing it manually in the RHEL versions of the KVM patches, for
what it's worth.

Paolo

2018-01-04 15:53:10

by Andi Kleen

Subject: Re: Avoid speculative indirect calls in kernel

> +.macro JMP_THUNK reg:req
> +#ifdef RETPOLINE
> +	ALTERNATIVE __stringify(jmp __x86.indirect_thunk.\reg), __stringify(jmp *%\reg), X86_FEATURE_IBRS_ATT
> +#else
> +	jmp *\reg
> +#endif
> +.endm

I removed that because what you're testing for doesn't exist in the tree yet.

Yes, it can be added later.

Right now we just want a basic static version to work reliably.

-Andi

2018-01-10 16:02:51

by Thomas Gleixner

Subject: Re: Avoid speculative indirect calls in kernel

On Tue, 9 Jan 2018, Dave Hansen wrote:
> On 01/09/2018 04:45 PM, Thomas Gleixner wrote:
> > On Mon, 8 Jan 2018, Andrea Arcangeli wrote:
> >> On Mon, Jan 08, 2018 at 09:53:02PM +0100, Thomas Gleixner wrote:
> >> Did my best to do the cleanest patch for tip, but I now figured Dave's
> >> original comment was spot on: a _PAGE_NX clear then becomes necessary
> >> also after pud_alloc not only after p4d_alloc.
> >>
> >> pmd_alloc would run into the same with x86 32bit non-PAE too.
>
> non-PAE doesn't have an NX bit. :)
>
> But we #define _PAGE_NX down to 0 there so it's harmless.
>
> >> So there are two choices, either going back to one single _PAGE_NX
> >> clear from Dave's original patch as below, or to add
> >> multiple clear after each level which was my objective and is more
> >> robust, but it may be overkill in this case. As long as it was one
> >> line it looked a clear improvement.
> >>
> >> Considering the caller in both cases is going to abort I guess we can
> >> use the one liner approach as Dave and Jiri did originally.
> >
> > Dave ?
>
> I agree with Andrea. The patch in -tip potentially misses the pgd
> clearing if pud_alloc() sets a PGD. It would also be nice to have that
> comment back.
>
> Note that the -tip commit probably works in *practice* because for two
> adjacent calls to map_tboot_page() that share a PGD entry, the first
> will clear NX, *then* allocate and set the PGD (without NX clear). The
> second call will *not* allocate but will clear the NX bit.
>
> The patch I think we want is attached.

Color me confused. I have queued the one below in tip. It lacks the comment
and does the !NX at a different place.

Thanks,

tglx

8<-----------------

commit 262b6b30087246abf09d6275eb0c0dc421bcbe38
Author: Dave Hansen <[email protected]>
Date: Sat Jan 6 18:41:14 2018 +0100

x86/tboot: Unbreak tboot with PTI enabled

This is another case similar to what EFI does: create a new set of
page tables, map some code at a low address, and jump to it. PTI
mistakes this low address for userspace and mistakenly marks it
non-executable in an effort to make it unusable for userspace.

Undo the poison to allow execution.

Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig")
Signed-off-by: Dave Hansen <[email protected]>
Signed-off-by: Andrea Arcangeli <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Alan Cox <[email protected]>
Cc: Tim Chen <[email protected]>
Cc: Jon Masters <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Jeff Law <[email protected]>
Cc: Paolo Bonzini <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: David" <[email protected]>
Cc: Nick Clifton <[email protected]>
Cc: [email protected]
Link: https://lkml.kernel.org/r/[email protected]

diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index a4eb27918ceb..75869a4b6c41 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -127,6 +127,7 @@ static int map_tboot_page(unsigned long vaddr, unsigned long pfn,
 	p4d = p4d_alloc(&tboot_mm, pgd, vaddr);
 	if (!p4d)
 		return -1;
+	pgd->pgd &= ~_PAGE_NX;
 	pud = pud_alloc(&tboot_mm, p4d, vaddr);
 	if (!pud)
 		return -1;
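
Dave's point about two adjacent map_tboot_page() calls can be checked
with a toy model. The sketch below is not kernel code: every toy_* name
and both TOY_* bit values are made up for illustration, a single 64-bit
word stands in for one PGD entry, and it assumes the folded-p4d
(4-level) case where pud_alloc() is what populates the PGD entry, with
PTI poisoning the fresh entry with NX. toy_map_late_clear() shows one
clear after all allocations, which is how the "one liner approach"
above reads; the attached patch itself is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy bit layout; the values are illustrative, not the kernel's definitions. */
#define TOY_PRESENT 0x1ULL
#define TOY_NX      (1ULL << 63)

/*
 * Stand-in for pud_alloc() with a folded p4d: populate an empty PGD
 * entry, which PTI is assumed to poison with NX for a low address.
 */
static void toy_pud_alloc(uint64_t *pgd)
{
    if (!(*pgd & TOY_PRESENT))
        *pgd = TOY_PRESENT | TOY_NX;
}

/* Placement as in the queued -tip commit: clear NX before pud_alloc(). */
static void toy_map_tip(uint64_t *pgd)
{
    /* p4d_alloc() would run here; it does nothing when p4d is folded. */
    *pgd &= ~TOY_NX;
    toy_pud_alloc(pgd);
}

/* Alternative placement: clear NX once, after every allocation is done. */
static void toy_map_late_clear(uint64_t *pgd)
{
    toy_pud_alloc(pgd);
    *pgd &= ~TOY_NX;
}

/* An entry is usable for execution if it is present and NX is clear. */
static bool toy_executable(uint64_t pgd)
{
    return (pgd & TOY_PRESENT) && !(pgd & TOY_NX);
}
```

Starting from an empty entry, one toy_map_tip() call leaves NX set and
only a second call on the same entry clears it, which matches the
"works in *practice*" observation. toy_map_late_clear() leaves the
entry executable after the first call.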