2018-01-10 22:51:57

by Woodhouse, David

Subject: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.

Signed-off-by: David Woodhouse <[email protected]>
---
Untested in this form although it's a variant on what we've had already.
I have an army of machines willing to do my bidding but nested virt
is non-trivial and I figure I might as well post it as someone else
can probably test it in less than the time it takes me to work out how.

This implements the most pressing of the RSB stuffing documented
by dhansen (based on our discussions) in https://goo.gl/pXbvBE

 arch/x86/include/asm/nospec-branch.h | 67 ++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm.c                   |  4 +++
 arch/x86/kvm/vmx.c                   |  4 +++
 3 files changed, 75 insertions(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 7d70ea9..e6a7ea0 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -7,6 +7,50 @@
#include <asm/alternative-asm.h>
#include <asm/cpufeatures.h>

+/*
+ * Use 32-N: 32 is the max return buffer size, but there should have been
+ * at a minimum two controlled calls already: one into the kernel from
+ * entry*.S and another into the function containing this macro. So N=2,
+ * thus 30.
+ */
+#define NUM_BRANCHES_TO_FILL 30
+
+/*
+ * Fill the CPU return stack buffer.
+ *
+ * Each entry in the RSB, if used for a speculative 'ret', contains an
+ * infinite 'pause; jmp' loop to capture speculative execution.
+ *
+ * This is required in various cases for retpoline and IBRS-based
+ * mitigations for the Spectre variant 2 vulnerability. Sometimes to
+ * eliminate potentially bogus entries from the RSB, and sometimes
+ * purely to ensure that it doesn't get empty, which on some CPUs would
+ * allow predictions from other (unwanted!) sources to be used.
+ *
+ * We define a CPP macro such that it can be used from both .S files and
+ * inline assembly. It's possible to do a .macro and then include that
+ * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
+ */
+#define __FILL_RETURN_BUFFER(reg, sp, uniq)                     \
+        mov     $(NUM_BRANCHES_TO_FILL/2), reg;                 \
+        .align 16;                                              \
+.Ldo_call1_ ## uniq:                                            \
+        call    .Ldo_call2_ ## uniq;                            \
+.Ltrap1_ ## uniq:                                               \
+        pause;                                                  \
+        jmp     .Ltrap1_ ## uniq;                               \
+        .align 16;                                              \
+.Ldo_call2_ ## uniq:                                            \
+        call    .Ldo_loop_ ## uniq;                             \
+.Ltrap2_ ## uniq:                                               \
+        pause;                                                  \
+        jmp     .Ltrap2_ ## uniq;                               \
+        .align 16;                                              \
+.Ldo_loop_ ## uniq:                                             \
+        dec     reg;                                            \
+        jnz     .Ldo_call1_ ## uniq;                            \
+        add     $(BITS_PER_LONG/8)*NUM_BRANCHES_TO_FILL, sp;
+
#ifdef __ASSEMBLY__

/*
@@ -61,6 +105,14 @@
#endif
.endm

+ /*
+ * A simpler FILL_RETURN_BUFFER macro. Don't make people use the CPP
+ * monstrosity above manually.
+ */
+.macro FILL_RETURN_BUFFER reg:req
+        __FILL_RETURN_BUFFER(\reg,%_ASM_SP,\@)
+.endm
+
#else /* __ASSEMBLY__ */

#if defined(CONFIG_X86_64) && defined(RETPOLINE)
@@ -115,5 +167,20 @@ enum spectre_v2_mitigation {
SPECTRE_V2_IBRS,
};

+/*
+ * On VMEXIT we must ensure that no RSB predictions learned in the guest
+ * can be followed in the host, by overwriting the RSB completely. Both
+ * retpoline and IBRS mitigations for Spectre v2 need this; only on future
+ * CPUs with IBRS_ATT *might* it be avoided.
+ */
+static inline void vmexit_fill_RSB(void)
+{
+        unsigned long dummy;
+
+        asm volatile (ALTERNATIVE("",
+                                  __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
+                                  X86_FEATURE_RETPOLINE)
+                      : "=r" (dummy), ASM_CALL_CONSTRAINT : : "memory" );
+}
#endif /* __ASSEMBLY__ */
#endif /* __NOSPEC_BRANCH_H__ */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 0e68f0b..2744b973 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -45,6 +45,7 @@
#include <asm/debugreg.h>
#include <asm/kvm_para.h>
#include <asm/irq_remapping.h>
+#include <asm/nospec-branch.h>

#include <asm/virtext.h>
#include "trace.h"
@@ -4985,6 +4986,9 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+        /* Eliminate branch target predictions from guest mode */
+        vmexit_fill_RSB();
+
#ifdef CONFIG_X86_64
wrmsrl(MSR_GS_BASE, svm->host.gs_base);
#else
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 62ee436..d1e25db 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -50,6 +50,7 @@
#include <asm/apic.h>
#include <asm/irq_remapping.h>
#include <asm/mmu_context.h>
+#include <asm/nospec-branch.h>

#include "trace.h"
#include "pmu.h"
@@ -9403,6 +9404,9 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
#endif
);

+        /* Eliminate branch target predictions from guest mode */
+        vmexit_fill_RSB();
+
/* MSR_IA32_DEBUGCTLMSR is zeroed on vmexit. Restore it if needed */
if (debugctlmsr)
update_debugctlmsr(debugctlmsr);
--
2.7.4


2018-01-10 23:23:55

by David Lang

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

I somewhat hate to ask this, but for those of us following at home, what does
this add to the overhead?

I am remembering an estimate from mid last week that put retpoline at replacing
a 3 clock 'ret' with 30 clocks of eye-bleed code

2018-01-10 23:47:38

by Tim Chen

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On 01/10/2018 02:51 PM, David Woodhouse wrote:

> + */
> +#define __FILL_RETURN_BUFFER(reg, sp, uniq) \
> + mov $(NUM_BRANCHES_TO_FILL/2), reg; \
> + .align 16; \
> +.Ldo_call1_ ## uniq: \
> + call .Ldo_call2_ ## uniq; \
> +.Ltrap1_ ## uniq: \
> + pause; \
> + jmp .Ltrap1_ ## uniq; \
> + .align 16; \
> +.Ldo_call2_ ## uniq: \
> + call .Ldo_loop_ ## uniq; \
> +.Ltrap2_ ## uniq: \
> + pause; \
> + jmp .Ltrap2_ ## uniq; \
> + .align 16; \
> +.Ldo_loop_ ## uniq: \
> + dec reg; \
> + jnz .Ldo_call1_ ## uniq; \
> + add $(BITS_PER_LONG/8)*NUM_BRANCHES_TO_FILL, sp;
> +
> #ifdef __ASSEMBLY__
>

> +
> + asm volatile (ALTERNATIVE("",
> + __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
> + X86_FEATURE_RETPOLINE)

We'll be patching in a fairly long set of instructions here. Maybe put
the ALTERNATIVE in the assembly and use a jmp skip_\@ for the ALTERNATIVE.

Thanks.

Tim

2018-01-10 23:47:57

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Wed, 2018-01-10 at 22:51 +0000, David Woodhouse wrote:
> In accordance with the Intel and AMD documentation, we need to overwrite
> all entries in the RSB on exiting a guest, to prevent malicious branch
> target predictions from affecting the host kernel. This is needed both
> for retpoline and for IBRS.
>
> Signed-off-by: David Woodhouse <[email protected]>
> ---
> Untested in this form although it's a variant on what we've had already.
> I have an army of machines willing to do my bidding but nested virt
> is non-trivial and I figure I might as well post it as someone else
> can probably test it in less than the time it takes me to work out how.

Now smoke tested with Intel VT-x, but not yet on AMD. Tom, would you be
able to do that?


> This implements the most pressing of the RSB stuffing documented
> by dhansen (based on our discussions) in https://goo.gl/pXbvBE


Attachments:
smime.p7s (5.09 kB)

2018-01-11 00:05:57

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Wed, 2018-01-10 at 15:22 -0800, David Lang wrote:
> I somewhat hate to ask this, but for those of us following at home, what does 
> this add to the overhead?
>
> I am remembering an estimate from mid last week that put retpoline at replacing 
> a 3 clock 'ret' with 30 clocks of eye-bleed code

Retpoline doesn't replace 'ret'.

It replaces indirect branches (jmp *%rax) of which there aren't quite
as many in the kernel.

The eye-bleed retpoline thunk does actually stop speculation and cause
a pipeline stall. For the RSB stuffing that's not the case; there are
no barriers here.
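
For reference, the thunk in question turns an indirect 'jmp *%rax' into
roughly this (the generally published retpoline sequence, shown here only
for illustration; it isn't part of this patch):

	call	.Lset_target
.Lcapture:			/* any speculative 'ret' lands here... */
	pause
	lfence
	jmp	.Lcapture	/* ...and spins until speculation dies */
.Lset_target:
	mov	%rax, (%rsp)	/* overwrite return address with real target */
	ret			/* architecturally ends up at *%rax */

The speculative path gets trapped in the pause/lfence loop, hence the
stall; the RSB-stuffing loop in this patch reuses the same trap pattern
but doesn't need any barrier.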

The actual performance numbers depend on the precise CPU being used,
and I'm not sure anyone has done the microbenchmarks of each *specific*
part of the mitigations separately. For this *particular* patch...
well, we strive to avoid vmexits anyway, and Intel has spent the last
decade adding more and more tricks to the CPU to help us *avoid*
vmexits. So a little extra overhead on the vmexit is something we can
probably tolerate.

FWIW the IBRS microcode also requires the RSB-stuffing, so it's kind of
orthogonal to the "retpoline is much faster than IBRS" observation.


Attachments:
smime.p7s (5.09 kB)

2018-01-11 00:06:09

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Wed, 2018-01-10 at 15:47 -0800, Tim Chen wrote:
>
> > +
> > +     asm volatile (ALTERNATIVE("",
> > +                               __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
> > +                               X86_FEATURE_RETPOLINE)
>
> We'll be patching in a fairly long set of instructions here.  Maybe put
> the ALTERNATIVE in the assembly and use a jmp skip_\@ for the ALTERNATIVE.

Perhaps the alternatives.h header could give me a clean way of doing this:

--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -177,7 +178,7 @@ static inline void vmexit_fill_RSB(void)
 {
        unsigned long dummy;
 
-       asm volatile (ALTERNATIVE("",
+       asm volatile (ALTERNATIVE("jmp " alt_end_marker "f",
                                  __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
                                  X86_FEATURE_RETPOLINE)
                      : "=r" (dummy), ASM_CALL_CONSTRAINT : : "memory" );


Attachments:
smime.p7s (5.09 kB)

2018-01-11 00:14:13

by Tom Lendacky

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On 1/10/2018 5:47 PM, David Woodhouse wrote:
> On Wed, 2018-01-10 at 22:51 +0000, David Woodhouse wrote:
>> In accordance with the Intel and AMD documentation, we need to overwrite
>> all entries in the RSB on exiting a guest, to prevent malicious branch
>> target predictions from affecting the host kernel. This is needed both
>> for retpoline and for IBRS.
>>
>> Signed-off-by: David Woodhouse <[email protected]>
>> ---
>> Untested in this form although it's a variant on what we've had already.
>> I have an army of machines willing to do my bidding but nested virt
>> is non-trivial and I figure I might as well post it as someone else
>> can probably test it in less than the time it takes me to work out how.
>
> Now smoke tested with Intel VT-x, but not yet on AMD. Tom, would you be
> able to do that?

Yes, I'll try to get to it as soon as I can, but it might be tomorrow
(morning).

Thanks,
Tom

>
>
>> This implements the most pressing of the RSB stuffing documented
>> by dhansen (based on our discussions) in https://goo.gl/pXbvBE

2018-01-11 01:04:57

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Wed, 2018-01-10 at 18:14 -0600, Tom Lendacky wrote:
> On 1/10/2018 5:47 PM, David Woodhouse wrote:
> > On Wed, 2018-01-10 at 22:51 +0000, David Woodhouse wrote:
> >> In accordance with the Intel and AMD documentation, we need to overwrite
> >> all entries in the RSB on exiting a guest, to prevent malicious branch
> >> target predictions from affecting the host kernel. This is needed both
> >> for retpoline and for IBRS.
> >>
> >> Signed-off-by: David Woodhouse <[email protected]>
> >> ---
> >> Untested in this form although it's a variant on what we've had already.
> >> I have an army of machines willing to do my bidding but nested virt
> >> is non-trivial and I figure I might as well post it as someone else
> >> can probably test it in less than the time it takes me to work out how.
> > 
> > Now smoke tested with Intel VT-x, but not yet on AMD. Tom, would you be
> > able to do that?
>
> Yes, I'll try to get to it as soon as I can, but it might be tomorrow
> (morning).

Thanks. I've pushed an updated version to
http://git.infradead.org/users/dwmw2/linux-retpoline.git/


Attachments:
smime.p7s (5.09 kB)

2018-01-11 01:13:07

by David Woodhouse

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, 2018-01-11 at 01:04 +0000, David Woodhouse wrote:
> On Wed, 2018-01-10 at 18:14 -0600, Tom Lendacky wrote:
> > On 1/10/2018 5:47 PM, David Woodhouse wrote:
> > > Now smoke tested with Intel VT-x, but not yet on AMD. Tom, would you be
> > > able to do that?
> > Yes, I'll try to get to it as soon as I can, but it might be tomorrow
> > (morning).
> Thanks. I've pushed an updated version to
> http://git.infradead.org/users/dwmw2/linux-retpoline.git/

Oh, and the RSB-stuffing on kernel entry from userspace turns out now
to be an AMD-only thing, because it's for !SMEP && !PTI.

So we'll want to make up an appropriate feature bit and then do
'FILL_RETURN_BUFFER %a_reg X86_FEATURE_STUFF_RSB_K2U' in the
appropriate places in entry*.S. I think some of Tim's patch set already
highlighted the places it was needed?
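
Roughly this shape, perhaps (untested sketch; the feature bit is the
made-up one above, and FILL_RETURN_BUFFER grows a feature argument so
the fill is jumped over unless the bit is set):

.macro FILL_RETURN_BUFFER reg:req ftr:req
        ALTERNATIVE "jmp .Lskip_rsb_\@", "", \ftr
        __FILL_RETURN_BUFFER(\reg, %_ASM_SP, \@)
.Lskip_rsb_\@:
.endm

        /* ... and in entry_64.S, once a scratch register has been saved: */
        FILL_RETURN_BUFFER %rax X86_FEATURE_STUFF_RSB_K2U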

With that, I think we have the final details for retpoline worked out
for everything except Skylake. And seriously, screw Skylake at least
for now. It can use IBRS, or take its chances with the additional
problems it might have.


Attachments:
smime.p7s (5.09 kB)

2018-01-11 08:42:54

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, Jan 11, 2018 at 12:04:35AM +0000, Woodhouse, David wrote:
> On Wed, 2018-01-10 at 15:47 -0800, Tim Chen wrote:
> >
> > > +
> > > +     asm volatile (ALTERNATIVE("",
> > > +                               __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
> > > +                               X86_FEATURE_RETPOLINE)
> >
> > We'll be patching in a fairly long set of instructions here.  Maybe put
> > the ALTERNATIVE in the assembly and use a jmp skip_\@ for the ALTERNATIVE.
>
> Perhaps the alternatives.h header could give me a clean way of doing this:
>
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -177,7 +178,7 @@ static inline void vmexit_fill_RSB(void)
>  {
>         unsigned long dummy;
>
> -       asm volatile (ALTERNATIVE("",
> +       asm volatile (ALTERNATIVE("jmp " alt_end_marker "f",
>                                   __stringify(__FILL_RETURN_BUFFER(%0, %1, _%=)),
>                                   X86_FEATURE_RETPOLINE)
>                       : "=r" (dummy), ASM_CALL_CONSTRAINT : : "memory" );


Or we teach the alternative thing to patch in a jmp to end instead of
NOP padding the entire thing as soon as the jmp (3 bytes) fits ?

2018-01-11 08:49:36

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Wed, Jan 10, 2018 at 10:51:22PM +0000, David Woodhouse wrote:
> This implements the most pressing of the RSB stuffing documented
> by dhansen (based on our discussions) in https://goo.gl/pXbvBE

Only took me 3 readings to find interrupts/traps were in fact
enumerated. Could we pretty please separate them from the deep call
case? Also might make sense to explicitly mention SMI, as that will be
the most difficult to fix :-(

2018-01-11 08:51:06

by Borislav Petkov

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On January 11, 2018 9:42:38 AM GMT+01:00, Peter Zijlstra <[email protected]> wrote:
>Or we teach the alternative thing to patch in a jmp to end instead of
>NOP padding the entire thing as soon as the jmp (3 bytes) fits ?

Or, even better: use alternative_call() to call functions instead of patching gazillion bytes.

--
Sent from a small device: formatting sux and brevity is inevitable.

2018-01-11 09:07:38

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, 2018-01-11 at 09:49 +0100, Boris Petkov wrote:
> On January 11, 2018 9:42:38 AM GMT+01:00, Peter Zijlstra wrote:
> >Or we teach the alternative thing to patch in a jmp to end instead of
> >NOP padding the entire thing as soon as the jmp (3 bytes) fits ?
>
> Or, even better: use alternative_call() to call functions instead of patching gazillion bytes.

For this one I kind of wanted to keep it as a macro so we can select
which register it uses. I've taken the bulk of it out of the
ALTERNATIVE, and just switch between the first 'mov' instruction and a
jmp over the whole lot.

Looks like this now...

From 302622182f56825b7cf2c39ce88ea8c462d587fe Mon Sep 17 00:00:00 2001
From: David Woodhouse <[email protected]>
Date: Wed, 10 Jan 2018 22:32:24 +0000
Subject: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

In accordance with the Intel and AMD documentation, we need to overwrite
all entries in the RSB on exiting a guest, to prevent malicious branch
target predictions from affecting the host kernel. This is needed both
for retpoline and for IBRS.

Signed-off-by: David Woodhouse <[email protected]>
---
 arch/x86/include/asm/nospec-branch.h | 72 ++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm.c                   |  4 ++
 arch/x86/kvm/vmx.c                   |  4 ++
 3 files changed, 80 insertions(+)

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index 7d70ea9..8fbc8b9 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -7,6 +7,50 @@
 #include <asm/alternative-asm.h>
 #include <asm/cpufeatures.h>
 
+/*
+ * Fill the CPU return stack buffer.
+ *
+ * Each entry in the RSB, if used for a speculative 'ret', contains an
+ * infinite 'pause; jmp' loop to capture speculative execution.
+ *
+ * This is required in various cases for retpoline and IBRS-based
+ * mitigations for the Spectre variant 2 vulnerability. Sometimes to
+ * eliminate potentially bogus entries from the RSB, and sometimes
+ * purely to ensure that it doesn't get empty, which on some CPUs would
+ * allow predictions from other (unwanted!) sources to be used.
+ *
+ * We define a CPP macro such that it can be used from both .S files and
+ * inline assembly. It's possible to do a .macro and then include that
+ * from C via asm(".include <asm/nospec-branch.h>") but let's not go there

2018-01-11 09:32:55

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, Jan 11, 2018 at 09:07:09AM +0000, Woodhouse, David wrote:
> On Thu, 2018-01-11 at 09:49 +0100, Boris Petkov wrote:
> > On January 11, 2018 9:42:38 AM GMT+01:00, Peter Zijlstra wrote:
> > >Or we teach the alternative thing to patch in a jmp to end instead of
> > >NOP padding the entire thing as soon as the jmp (3 bytes) fits ?
> >
> > Or, even better: use alternative_call() to call functions instead of patching gazillion bytes.
>
> For this one I kind of wanted to keep it as a macro so we can select
> which register it uses.

can't you do lovely things like:

volatile asm ("call __fill_rsb_thunk_%1" : : "r" (dummy))

which would still let gcc select the register ?

2018-01-11 09:48:14

by Borislav Petkov

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, Jan 11, 2018 at 10:32:31AM +0100, Peter Zijlstra wrote:
> can't you do lovely things like:
>
> volatile asm ("call __fill_rsb_thunk_%1" : : "r" (dummy))
>
> which would still let gcc select the register ?

Calling a function from asm is nasty because you need to pay attention
to clobbered registers as gcc doesn't see the function.

What one can do, I *think*, is do a non-inlined wrapping function and do
all the alternative_call() fun inside. There you can do all the fun and
have callee-clobbered regs which you can use.
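
Roughly like this, perhaps (helper names invented here just to show the
shape; they'd be small asm functions, not real C):

extern void __fill_rsb_nop(void);		/* does nothing */
extern void __fill_rsb_clobber_ax(void);	/* the stuffing loop, clobbers only %rax */

void vmexit_fill_RSB(void)
{
        unsigned long dummy;

        alternative_call(__fill_rsb_nop, __fill_rsb_clobber_ax,
                         X86_FEATURE_RETPOLINE,
                         "=a" (dummy),
                         ASM_NO_INPUT_CLOBBER("memory"));
}

Then only the call target gets patched, and gcc only needs to know about
the one clobbered register plus memory.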

--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

2018-01-11 09:58:45

by Peter Zijlstra

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, Jan 11, 2018 at 10:47:59AM +0100, Borislav Petkov wrote:
> On Thu, Jan 11, 2018 at 10:32:31AM +0100, Peter Zijlstra wrote:
> > can't you do lovely things like:
> >
> > volatile asm ("call __fill_rsb_thunk_%1" : : "r" (dummy))
> >
> > which would still let gcc select the register ?
>
> Calling a function from asm is nasty because you need to pay attention
> to clobbered registers as gcc doesn't see the function.

The point is that this is an asm function, much like the retpoline
thunks, replicated for each register.

And by stuffing the register in the function name and using a dummy
argument you let GCC pick which register to clobber.

Heck we could even pass in the actual stuff-count instead of treating it
as a pure dummy.
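
i.e. something like this (thunk names invented; '%V' prints the bare
register name, so gcc's choice of register picks the matching thunk, and
the count rides along in that register instead of being a pure dummy):

        unsigned long stuff_count = NUM_BRANCHES_TO_FILL;

        asm volatile (ALTERNATIVE("",
                                  "call __fill_rsb_thunk_%V0",
                                  X86_FEATURE_RETPOLINE)
                      : "+r" (stuff_count), ASM_CALL_CONSTRAINT : : "memory");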

2018-01-11 10:00:21

by Woodhouse, David

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On Thu, 2018-01-11 at 10:47 +0100, Borislav Petkov wrote:
> On Thu, Jan 11, 2018 at 10:32:31AM +0100, Peter Zijlstra wrote:
> >
> > can't you do lovely things like:
> >
> > volatile asm ("call __fill_rsb_thunk_%1" : : "r" (dummy))
> >
> > which would still let gcc select the register ?

I've had to do that for the __x86_indirect_thunk_\reg thunks and
provide all variants thereof, and export them to modules.

I'd much rather have this one inline. If I couldn't do that (which I
can, and have), then my next choice would probably have been to just
hard-code it to use %eax/%r12 and emit a simple call to that from the
call sites.

> Calling a function from asm is nasty because you need to pay attention
> to clobbered registers as gcc doesn't see the function.
>
> What one can do, I *think*, is do a non-inlined wrapping function and do
> all the alternative_call() fun inside. There you can do all the fun and
> have callee-clobbered regs which you can use.

That's OK; it's not really a C function and it only clobbers the *one*
register which is easy enough to tell __asm__() about.


Attachments:
smime.p7s (5.09 kB)

2018-01-12 00:34:43

by Tom Lendacky

Subject: Re: [PATCH] x86/retpoline: Fill return stack buffer on vmexit

On 1/10/2018 6:14 PM, Tom Lendacky wrote:
> On 1/10/2018 5:47 PM, David Woodhouse wrote:
>> On Wed, 2018-01-10 at 22:51 +0000, David Woodhouse wrote:
>>> In accordance with the Intel and AMD documentation, we need to overwrite
>>> all entries in the RSB on exiting a guest, to prevent malicious branch
>>> target predictions from affecting the host kernel. This is needed both
>>> for retpoline and for IBRS.
>>>
>>> Signed-off-by: David Woodhouse <[email protected]>
>>> ---
>>> Untested in this form although it's a variant on what we've had already.
>>> I have an army of machines willing to do my bidding but nested virt
>>> is non-trivial and I figure I might as well post it as someone else
>>> can probably test it in less than the time it takes me to work out how.
>>
>> Now smoke tested with Intel VT-x, but not yet on AMD. Tom, would you be
>> able to do that?
>
> Yes, I'll try to get to it as soon as I can, but it might be tomorrow
> (morning).

Sorry for the delay. This is working on my system for normal virt, but is
failing with nested virt. However, it does not appear that this patch is
the problem. I've gone back to commit 8d56eff266f3 in the x86/pti tree
(before retpoline, etc.) and nested virt is still broke. Not sure how
much time I want to spend digging through this since nested virt on
4.15-rc7 works.

Thanks,
Tom

>
> Thanks,
> Tom
>
>>
>>
>>> This implements the most pressing of the RSB stuffing documented
>>> by dhansen (based on our discussions) in https://goo.gl/pXbvBE