Remove barrier_nospec(), which translates to LFENCE, in
vmx_spec_ctrl_restore_host(), as RSB barriers (as defined by [1])
already exist prior to this point.
This LFENCE was added by commit fc02735b14ff ("KVM: VMX: Prevent guest
RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
directly in vmx_vmexit, prior to the CALL to vmx_spec_ctrl_restore_host.
For posterity, vmx_spec_ctrl_restore_host will also execute a WRMSR to
IA32_SPEC_CTRL when X86_FEATURE_KERNEL_IBRS is enabled or when the
guest/host MSR values do not match, which serves as an additional RSB
barrier.
[1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
Signed-off-by: Jon Kohler <[email protected]>
CC: Andrew Cooper <[email protected]>
CC: Daniel Sneddon <[email protected]>
CC: Josh Poimboeuf <[email protected]>
CC: Pawan Gupta <[email protected]>
CC: Peter Zijlstra (Intel) <[email protected]>
---
arch/x86/kvm/vmx/vmx.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..98ca8b79aab1 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7130,8 +7130,6 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
vmx->spec_ctrl != hostval)
native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
- barrier_nospec();
}
static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
--
2.30.1 (Apple Git-130)
On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
> Remove barrier_nospec(), which translates to LFENCE, in
> vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
> already exist prior to this point.
>
> This LFENCE was added on commit fc02735b14ff ("KVM: VMX: Prevent guest
> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> directly in vmx_vmexit, prior to CALL vmx_spec_ctrl_restore_host.
>
> For posterity, vmx_spec_ctrl_restore_host also will execute WRMSR to
> IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when guest/host MSR value
> does not match, which serves as an additional RSB-barrier.
>
> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
>
> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
Sorry, I knew I should have put a comment there.
The goal of this barrier_nospec() is to prevent speculative execution
from bypassing the SPEC_CTRL write (due to misprediction of the
conditional branch, Spectre v1 style). Otherwise the next indirect
branch or unbalanced RET could be an attack target.
So any previous LFENCEs before that conditional branch won't help here.
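For reference, the tail of vmx_spec_ctrl_restore_host() boils down to roughly
this (a sketch of the generated code, not the exact compiler output):
	cmp	%edi, %esi		/* guest vs. host SPEC_CTRL value */
	je	1f			/* if mispredicted, speculation skips the WRMSR */
	mov	$MSR_IA32_SPEC_CTRL, %ecx
	xor	%edx, %edx
	mov	%esi, %eax
	wrmsr				/* restore host SPEC_CTRL (IBRS/eIBRS) */
1:	lfence				/* barrier_nospec(): force the branch (and thus
					 * the WRMSR) to resolve before any younger
					 * indirect branch or RET can execute */
Without that trailing LFENCE, a mispredicted 'je' lets younger instructions run
speculatively with the guest's SPEC_CTRL value still in effect.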
--
Josh
> On May 31, 2023, at 7:18 PM, Josh Poimboeuf <[email protected]> wrote:
>
> On Wed, May 31, 2023 at 11:01:12AM -0400, Jon Kohler wrote:
>> Remove barrier_nospec(), which translates to LFENCE, in
>> vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
>> already exist prior to this point.
>>
>> This LFENCE was added on commit fc02735b14ff ("KVM: VMX: Prevent guest
>> RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
>> commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
>> the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
>> directly in vmx_vmexit, prior to CALL vmx_spec_ctrl_restore_host.
>>
>> For posterity, vmx_spec_ctrl_restore_host also will execute WRMSR to
>> IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when guest/host MSR value
>> does not match, which serves as an additional RSB-barrier.
>>
>> [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
>>
>> Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS")
>> Fixes: 2b1299322016 ("x86/speculation: Add RSB VM Exit protections")
>
> Sorry, I knew I should have put a comment there.
>
> The goal of this barrier_nospec() is to prevent speculative execution
> from bypassing the SPEC_CTRL write (due to misprediction of the
> conditional branch, Spectre v1 style). Otherwise the next indirect
> branch or unbalanced RET could be an attack target.
>
> So any previous LFENCEs before that conditional branch won't help here.
>
> --
> Josh
Ah interesting. Ok, to be clear, that's a guest -> host attack, correct? And such
an attack would not at all be thwarted by the first CALL retire + LFENCE that
was added by commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
protections")? Sorry to be long winded, just wanting to triple check, because
the aforementioned commit was added slightly after the original one, and I
want to make extra sure that they aren't solving the same thing.
If that is indeed the case, does that commit need to be revisited at all?
Or are we saying that this Intel vulnerability needs *two* LFENCEs to keep
the host secure?
If that's the case, that's fine, and I'm happy to transform this commit into
some comments in this area to capture the issue for future onlookers.
On Wed, May 31, 2023 at 11:58:12PM +0000, Jon Kohler wrote:
> > The goal of this barrier_nospec() is to prevent speculative execution
> > from bypassing the SPEC_CTRL write (due to misprediction of the
> > conditional branch, Spectre v1 style). Otherwise the next indirect
> > branch or unbalanced RET could be an attack target.
> >
> > So any previous LFENCEs before that conditional branch won't help here.
>
> Ah interesting. Ok, to be clear, thats a guest -> host attack, correct? And such
> an attack would not at all be thwarted by the first CALL retire + LFENCE that
> was added on commit 2b1299322016 ("x86/speculation: Add RSB VM Exit
> protections”)?
Right.
> Sorry to be long winded, just wanting to triple check because
> the aforementioned commit was added slightly after the original one, and I
> want to make extra sure that they aren’t solving the same thing.
>
> If that is indeed the case, does that commit need to be revisited at all?
>
> Or are we saying that this Intel vulnerability needs *two* LFENCE’s to keep
> the host secure?
The first LFENCE (FILL_RETURN_BUFFER) forces the CALL to retire so the
RSB stuff is guaranteed to take effect before the next unbalanced RET
can be speculatively executed.
The second LFENCE (vmx_spec_ctrl_restore_host) forces the conditional
branch to retire so the SPEC_CTRL write (potential IBRS/eIBRS
enablement) is guaranteed to take effect before the next indirect branch
and/or unbalanced RET can be speculatively executed.
So each LFENCE has a distinct purpose. That said, there are no indirect
branches or unbalanced RETs between them. So it should be fine to
combine them into a single LFENCE after both.
You could for example just remove the first LFENCE. But only for that
usage site, i.e. not for other users of FILL_RETURN_BUFFER.
Or, remove them both and add an LFENCE in vmx_vmexit() right after the
call to vmx_spec_ctrl_restore_host(). That might be clearer. Then it
could have a comment describing its dual purposes.
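I.e. something like this in vmx_vmexit, assuming the LFENCE is dropped from
both FILL_RETURN_BUFFER (for this usage site only) and from
vmx_spec_ctrl_restore_host() (rough sketch, untested):
	/* RSB stuffing, minus its trailing LFENCE */
	FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
			   X86_FEATURE_RSB_VMEXIT_LITE
	pop %_ASM_ARG2			/* @flags */
	pop %_ASM_ARG1			/* @vmx */
	call vmx_spec_ctrl_restore_host	/* conditional SPEC_CTRL write, minus barrier_nospec() */
	/*
	 * Single LFENCE serving both purposes: guarantee the RSB stuffing and
	 * the SPEC_CTRL write take effect before the next indirect branch or
	 * unbalanced RET can be speculatively executed.
	 */
	lfence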
--
Josh
On Thu, Jun 01, 2023 at 01:29:12AM +0100, Andrew Cooper wrote:
> On 31/05/2023 4:01 pm, Jon Kohler wrote:
> > Remove barrier_nospec(), which translates to LFENCE, in
> > vmx_spec_ctrl_restore_host() as RSB-barriers (as defined by [1])
> > already exist prior to this point.
> >
> > This LFENCE was added on commit fc02735b14ff ("KVM: VMX: Prevent guest
> > RSB poisoning attacks with eIBRS") in the 5.19 timeframe; however,
> > commit 2b1299322016 ("x86/speculation: Add RSB VM Exit protections") in
> > the 6.0 timeframe added an LFENCE for X86_FEATURE_RSB_VMEXIT_LITE
> > directly in vmx_vmexit, prior to CALL vmx_spec_ctrl_restore_host.
> >
> > For posterity, vmx_spec_ctrl_restore_host also will execute WRMSR to
> > IA32_SPEC_CTRL for X86_FEATURE_KERNEL_IBRS or when guest/host MSR value
> > does not match, which serves as an additional RSB-barrier.
> >
> > [1] https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/advisory-guidance/post-barrier-return-stack-buffer-predictions.html
>
> Yeah, unfortunately PBRSB is insidious.
>
> From memory (please correct me if I'm wrong), the required safety
> property is this: After a VMExit (if IBRS was set prior to exit) or the
> write to MSR_SPEC_CTRL setting IBRS (if IBRS was not set prior to exit),
> one single CALL instruction must architecturally retire before any RET
> instructions execute (speculatively).
>
> There are several ways to arrange this, but they all basically boil down
> to having some serialising instruction between the first CALL and first
> RET on any reachable path from VMExit.
The document says the problem is *unbalanced* RET, i.e. RSB underflow.
So the mitigation needs a single RSB stuff (i.e., unbalanced CALL) and
then an LFENCE anytime before the next unbalanced RET.
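So conceptually the required sequence on the vmexit path is (a sketch of the
shape, not the actual FILL_RETURN_BUFFER macro):
	call	1f		/* unbalanced CALL: push one benign RSB entry */
	int3			/* speculation trap if that entry is consumed */
1:	add	$8, %rsp	/* drop the return address; the CALL stays unbalanced */
	/* no unbalanced RET may be reachable in this window ... */
	lfence			/* ... until the LFENCE forces the CALL to retire */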
--
Josh
On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose. That said, there are no indirect
> > branches or unbalanced RETs between them.
>
> How lucky are you feeling?
>
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
But it's only unbalanced RETs we're concerned about, per Intel.
--
Josh
On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > So each LFENCE has a distinct purpose. That said, there are no indirect
> > branches or unbalanced RETs between them.
>
> How lucky are you feeling?
>
> You're in C at this point, which means the compiler could have emitted a
> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
Moving the second LFENCE to the else side of the WRMSR conditional should be
possible? That way the serialization can be achieved either by the WRMSR or by
the LFENCE. This saves an LFENCE when the host and guest values of
MSR_SPEC_CTRL differ.
---
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index d2d6e1b6c788..d32e6d172dd6 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7157,8 +7157,8 @@ void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
vmx->spec_ctrl != hostval)
native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
- barrier_nospec();
+ else
+ barrier_nospec();
}
static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> > On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> > > So each LFENCE has a distinct purpose. That said, there are no indirect
> > > branches or unbalanced RETs between them.
> >
> > How lucky are you feeling?
> >
> > You're in C at this point, which means the compiler could have emitted a
> > call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>
> Moving the second LFENCE to the else part of WRMSR should be possible?
> So that the serialization can be achived either by WRMSR or LFENCE. This
> saves an LFENCE when host and guest value of MSR_SPEC_CTRL differ.
Yes. Though in practice it might not make much of a difference. With
wrmsr+lfence, the lfence has nothing to do so it might be almost
instantaneous anyway.
--
Josh
> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <[email protected]> wrote:
>
> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
>
>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
>>>> So each LFENCE has a distinct purpose. That said, there are no indirect
>>>> branches or unbalanced RETs between them.
>>>
>>> How lucky are you feeling?
>>>
>>> You're in C at this point, which means the compiler could have emitted a
>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>>
>> Moving the second LFENCE to the else part of WRMSR should be possible?
>> So that the serialization can be achived either by WRMSR or LFENCE. This
>> saves an LFENCE when host and guest value of MSR_SPEC_CTRL differ.
>
> Yes. Though in practice it might not make much of a difference. With
> wrmsr+lfence, the lfence has nothing to do so it might be almost
> instantaneous anyway.
>
> --
> Josh
Coming back to this, what if we hoisted the CALL to vmx_spec_ctrl_restore_host
above FILL_RETURN_BUFFER and dropped this LFENCE as I did here?
That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
and that would act as the “final line of defense” LFENCE.
Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
before any sort of calls no matter what?
Thanks,
Jon
> On Jun 5, 2023, at 12:35 PM, Josh Poimboeuf <[email protected]> wrote:
>
> On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>>
>>
>>> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <[email protected]> wrote:
>>>
>>> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
>>>
>>>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
>>>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
>>>>>> So each LFENCE has a distinct purpose. That said, there are no indirect
>>>>>> branches or unbalanced RETs between them.
>>>>>
>>>>> How lucky are you feeling?
>>>>>
>>>>> You're in C at this point, which means the compiler could have emitted a
>>>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
>>>>
>>>> Moving the second LFENCE to the else part of WRMSR should be possible?
>>>> So that the serialization can be achived either by WRMSR or LFENCE. This
>>>> saves an LFENCE when host and guest value of MSR_SPEC_CTRL differ.
>>>
>>> Yes. Though in practice it might not make much of a difference. With
>>> wrmsr+lfence, the lfence has nothing to do so it might be almost
>>> instantaneous anyway.
>>>
>>> --
>>> Josh
>>
>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>
>> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
>> and that would act as the “final line of defense” LFENCE.
>>
>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>> before any sort of calls no matter what?
>
> If we go by Intel's statement that only unbalanced RETs are a concern,
> that *might* be ok as long as there's a nice comment above the
> FILL_RETURN_BUFFER usage site describing the two purposes for the
> LFENCE.
>
> However, based on Andy's concerns, which I've discussed with him
> privately (but I'm not qualified to agree or disagree with), we may want
> to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> sorry. My original implementation of that function was actually asm. I
> can try to dig up that code.
>
> --
> Josh
RE ASM - that'd be nice; given that the guest SPEC_CTRL restore pre-vmentry is
already in ASM, this would match nicely. Then all the code would be in one spot,
in one file, in one language.
If you could dig that up so we didn't have to start from scratch, that'd be great (imho).
Jon
On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
>
>
> > On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <[email protected]> wrote:
> >
> > On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> >
> >> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> >>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> >>>> So each LFENCE has a distinct purpose. That said, there are no indirect
> >>>> branches or unbalanced RETs between them.
> >>>
> >>> How lucky are you feeling?
> >>>
> >>> You're in C at this point, which means the compiler could have emitted a
> >>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
> >>
> >> Moving the second LFENCE to the else part of WRMSR should be possible?
> >> So that the serialization can be achived either by WRMSR or LFENCE. This
> >> saves an LFENCE when host and guest value of MSR_SPEC_CTRL differ.
> >
> > Yes. Though in practice it might not make much of a difference. With
> > wrmsr+lfence, the lfence has nothing to do so it might be almost
> > instantaneous anyway.
> >
> > --
> > Josh
>
> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>
> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> and that would act as the “final line of defense” LFENCE.
>
> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> before any sort of calls no matter what?
If we go by Intel's statement that only unbalanced RETs are a concern,
that *might* be ok as long as there's a nice comment above the
FILL_RETURN_BUFFER usage site describing the two purposes for the
LFENCE.
However, based on Andy's concerns, which I've discussed with him
privately (but I'm not qualified to agree or disagree with), we may want
to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
sorry. My original implementation of that function was actually asm. I
can try to dig up that code.
--
Josh
On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
>
>
> > On Jun 5, 2023, at 12:35 PM, Josh Poimboeuf <[email protected]> wrote:
> >
> > On Mon, Jun 05, 2023 at 02:29:02PM +0000, Jon Kohler wrote:
> >>
> >>
> >>> On Jun 1, 2023, at 12:23 AM, Josh Poimboeuf <[email protected]> wrote:
> >>>
> >>> On Wed, May 31, 2023 at 06:24:29PM -0700, Pawan Gupta wrote:
> >>>
> >>>> On Thu, Jun 01, 2023 at 01:50:48AM +0100, Andrew Cooper wrote:
> >>>>> On 01/06/2023 1:42 am, Josh Poimboeuf wrote:
> >>>>>> So each LFENCE has a distinct purpose. That said, there are no indirect
> >>>>>> branches or unbalanced RETs between them.
> >>>>>
> >>>>> How lucky are you feeling?
> >>>>>
> >>>>> You're in C at this point, which means the compiler could have emitted a
> >>>>> call to mem{cpy,cmp}() in place of a simple assignment/comparison.
> >>>>
> >>>> Moving the second LFENCE to the else part of WRMSR should be possible?
> >>>> So that the serialization can be achived either by WRMSR or LFENCE. This
> >>>> saves an LFENCE when host and guest value of MSR_SPEC_CTRL differ.
> >>>
> >>> Yes. Though in practice it might not make much of a difference. With
> >>> wrmsr+lfence, the lfence has nothing to do so it might be almost
> >>> instantaneous anyway.
> >>>
> >>> --
> >>> Josh
> >>
> >> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> >> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> >>
> >> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> >> and that would act as the “final line of defense” LFENCE.
> >>
> >> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> >> before any sort of calls no matter what?
> >
> > If we go by Intel's statement that only unbalanced RETs are a concern,
> > that *might* be ok as long as there's a nice comment above the
> > FILL_RETURN_BUFFER usage site describing the two purposes for the
> > LFENCE.
We would then need FILL_RETURN_BUFFER to unconditionally execute LFENCE
to account for wrmsr branch misprediction. Currently LFENCE is not
executed for !X86_BUG_EIBRS_PBRSB.
> > However, based on Andy's concerns, which I've discussed with him
> > privately (but I'm not qualified to agree or disagree with), we may want
> > to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> > sorry. My original implementation of that function was actually asm. I
> > can try to dig up that code.
Note:
VMexit
CALL
RET
RET <---- This is also a problem if the first call hasn't retired yet.
LFENCE
Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
of this.
On Mon, Jun 05, 2023, Pawan Gupta wrote:
> On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
> > >>> Yes. Though in practice it might not make much of a difference. With
> > >>> wrmsr+lfence, the lfence has nothing to do so it might be almost
> > >>> instantaneous anyway.
> > >>>
> > >>> --
> > >>> Josh
> > >>
> > >> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
> > >> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
> > >>
> > >> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
> > >> and that would act as the “final line of defense” LFENCE.
> > >>
> > >> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
> > >> before any sort of calls no matter what?
> > >
> > > If we go by Intel's statement that only unbalanced RETs are a concern,
> > > that *might* be ok as long as there's a nice comment above the
> > > FILL_RETURN_BUFFER usage site describing the two purposes for the
> > > LFENCE.
>
> We would then need FILL_RETURN_BUFFER to unconditionally execute LFENCE
> to account for wrmsr branch misprediction. Currently LFENCE is not
> executed for !X86_BUG_EIBRS_PBRSB.
>
> > > However, based on Andy's concerns, which I've discussed with him
> > > privately (but I'm not qualified to agree or disagree with), we may want
> > > to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
> > > sorry. My original implementation of that function was actually asm. I
> > > can try to dig up that code.
>
> Note:
>
> VMexit
> CALL
> RET
> RET <---- This is also a problem if the first call hasn't retired yet.
> LFENCE
>
> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
> of this.
Is there an actual bug here, or are we just micro-optimizing something that may or
may not need additional optimization? Unless there's a bug to be fixed, moving
code into ASM and increasing complexity doesn't seem worthwhile.
On Mon, Jun 05, 2023 at 11:31:34AM -0700, Sean Christopherson wrote:
> Is there an actual bug here, or are we just micro-optimizing something that may or
> may not need additional optimization? Unless there's a bug to be fixed, moving
> code into ASM and increasing complexity doesn't seem worthwhile.
I can't really argue with that. FWIW, here's the (completely untested)
patch.
---8<---
From: Josh Poimboeuf <[email protected]>
Subject: [PATCH] KVM: VMX: Convert vmx_spec_ctrl_restore_host() to assembly
Convert vmx_spec_ctrl_restore_host() to assembly. This allows the
removal of a redundant LFENCE. It also "feels" safer as it doesn't
allow the compiler to insert any surprises. And it's more symmetrical
with the pre-vmentry SPEC_CTRL handling, which is also done in assembly.
Signed-off-by: Josh Poimboeuf <[email protected]>
---
arch/x86/kvm/vmx/vmenter.S | 71 ++++++++++++++++++++++++++++++++------
arch/x86/kvm/vmx/vmx.c | 25 --------------
arch/x86/kvm/vmx/vmx.h | 1 -
3 files changed, 61 insertions(+), 36 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 631fd7da2bc3..977f3487469c 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -108,7 +108,7 @@ SYM_FUNC_START(__vmx_vcpu_run)
lea (%_ASM_SP), %_ASM_ARG2
call vmx_update_host_rsp
- ALTERNATIVE "jmp .Lspec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+ ALTERNATIVE "jmp .Lguest_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
/*
* SPEC_CTRL handling: if the guest's SPEC_CTRL value differs from the
@@ -122,13 +122,13 @@ SYM_FUNC_START(__vmx_vcpu_run)
movl VMX_spec_ctrl(%_ASM_DI), %edi
movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
cmp %edi, %esi
- je .Lspec_ctrl_done
+ je .Lguest_spec_ctrl_done
mov $MSR_IA32_SPEC_CTRL, %ecx
xor %edx, %edx
mov %edi, %eax
wrmsr
-.Lspec_ctrl_done:
+.Lguest_spec_ctrl_done:
/*
* Since vmentry is serializing on affected CPUs, there's no need for
@@ -253,9 +253,65 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
#endif
/*
- * IMPORTANT: RSB filling and SPEC_CTRL handling must be done before
- * the first unbalanced RET after vmexit!
+ * IMPORTANT: The below SPEC_CTRL handling and RSB filling must be done
+ * before the first RET after vmexit!
+ */
+
+ ALTERNATIVE "jmp .Lhost_spec_ctrl_done", "", X86_FEATURE_MSR_SPEC_CTRL
+
+ pop %_ASM_SI /* @flags */
+ pop %_ASM_DI /* @vmx */
+
+ /*
+ * if (flags & VMX_RUN_SAVE_SPEC_CTRL)
+ * vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
+ */
+ test $VMX_RUN_SAVE_SPEC_CTRL, %_ASM_SI
+ jz .Lhost_spec_ctrl_check
+
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+ rdmsr
+ mov %eax, VMX_spec_ctrl(%_ASM_DI)
+
+.Lhost_spec_ctrl_check:
+ /*
+ * If the guest/host SPEC_CTRL values differ, restore the host value.
*
+ * For legacy IBRS, the IBRS bit always needs to be written after
+ * transitioning from a less privileged predictor mode, regardless of
+ * whether the guest/host values differ.
+ *
+ * if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
+ * vmx->spec_ctrl != this_cpu_read(x86_spec_ctrl_current))
+ * native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
+ */
+ ALTERNATIVE "", "jmp .Lhost_spec_ctrl_write", X86_FEATURE_KERNEL_IBRS
+ movl VMX_spec_ctrl(%_ASM_DI), %edi
+ movl PER_CPU_VAR(x86_spec_ctrl_current), %esi
+ cmp %edi, %esi
+ je .Lhost_spec_ctrl_done_lfence
+
+.Lhost_spec_ctrl_write:
+ mov $MSR_IA32_SPEC_CTRL, %ecx
+ xor %edx, %edx
+ mov %esi, %eax
+ wrmsr
+
+.Lhost_spec_ctrl_done_lfence:
+ /*
+ * This ensures that speculative execution doesn't incorrectly bypass
+ * the above SPEC_CTRL wrmsr by mispredicting the 'je'.
+ *
+ * It's only needed if the below FILL_RETURN_BUFFER doesn't do an
+ * LFENCE. Thus the X86_FEATURE_RSB_VMEXIT{_LITE} alternatives.
+ */
+ ALTERNATIVE_2 "lfence", \
+ "", X86_FEATURE_RSB_VMEXIT, \
+ "", X86_FEATURE_RSB_VMEXIT_LITE
+
+.Lhost_spec_ctrl_done:
+
+ /*
* For retpoline or IBRS, RSB filling is needed to prevent poisoned RSB
* entries and (in some cases) RSB underflow.
*
@@ -267,11 +323,6 @@ SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
FILL_RETURN_BUFFER %_ASM_CX, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT,\
X86_FEATURE_RSB_VMEXIT_LITE
- pop %_ASM_ARG2 /* @flags */
- pop %_ASM_ARG1 /* @vmx */
-
- call vmx_spec_ctrl_restore_host
-
/* Put return value in AX */
mov %_ASM_BX, %_ASM_AX
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 44fb619803b8..d353b0e23918 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7109,31 +7109,6 @@ void noinstr vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp)
}
}
-void noinstr vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx,
- unsigned int flags)
-{
- u64 hostval = this_cpu_read(x86_spec_ctrl_current);
-
- if (!cpu_feature_enabled(X86_FEATURE_MSR_SPEC_CTRL))
- return;
-
- if (flags & VMX_RUN_SAVE_SPEC_CTRL)
- vmx->spec_ctrl = __rdmsr(MSR_IA32_SPEC_CTRL);
-
- /*
- * If the guest/host SPEC_CTRL values differ, restore the host value.
- *
- * For legacy IBRS, the IBRS bit always needs to be written after
- * transitioning from a less privileged predictor mode, regardless of
- * whether the guest/host values differ.
- */
- if (cpu_feature_enabled(X86_FEATURE_KERNEL_IBRS) ||
- vmx->spec_ctrl != hostval)
- native_wrmsrl(MSR_IA32_SPEC_CTRL, hostval);
-
- barrier_nospec();
-}
-
static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
{
switch (to_vmx(vcpu)->exit_reason.basic) {
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 9e66531861cf..f9fab33f4d76 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -420,7 +420,6 @@ void vmx_set_virtual_apic_mode(struct kvm_vcpu *vcpu);
struct vmx_uret_msr *vmx_find_uret_msr(struct vcpu_vmx *vmx, u32 msr);
void pt_update_intercept_for_msr(struct kvm_vcpu *vcpu);
void vmx_update_host_rsp(struct vcpu_vmx *vmx, unsigned long host_rsp);
-void vmx_spec_ctrl_restore_host(struct vcpu_vmx *vmx, unsigned int flags);
unsigned int __vmx_vcpu_run_flags(struct vcpu_vmx *vmx);
bool __vmx_vcpu_run(struct vcpu_vmx *vmx, unsigned long *regs,
unsigned int flags);
--
2.40.1
> On Jun 5, 2023, at 2:31 PM, Sean Christopherson <[email protected]> wrote:
>
> On Mon, Jun 05, 2023, Pawan Gupta wrote:
>> On Mon, Jun 05, 2023 at 04:39:02PM +0000, Jon Kohler wrote:
>>>>>> Yes. Though in practice it might not make much of a difference. With
>>>>>> wrmsr+lfence, the lfence has nothing to do so it might be almost
>>>>>> instantaneous anyway.
>>>>>>
>>>>>> --
>>>>>> Josh
>>>>>
>>>>> Coming back to this, what if we hoisted call vmx_spec_ctrl_restore_host above
>>>>> FILL_RETURN_BUFFER, and dropped this LFENCE as I did here?
>>>>>
>>>>> That way, we wouldn’t have to mess with the internal LFENCE in nospec-branch.h,
>>>>> and that would act as the “final line of defense” LFENCE.
>>>>>
>>>>> Would that be acceptable? Or does FILL_RETURN_BUFFER *need* to occur
>>>>> before any sort of calls no matter what?
>>>>
>>>> If we go by Intel's statement that only unbalanced RETs are a concern,
>>>> that *might* be ok as long as there's a nice comment above the
>>>> FILL_RETURN_BUFFER usage site describing the two purposes for the
>>>> LFENCE.
>>
>> We would then need FILL_RETURN_BUFFER to unconditionally execute LFENCE
>> to account for wrmsr branch misprediction. Currently LFENCE is not
>> executed for !X86_BUG_EIBRS_PBRSB.
>>
>>>> However, based on Andy's concerns, which I've discussed with him
>>>> privately (but I'm not qualified to agree or disagree with), we may want
>>>> to just convert vmx_spec_ctrl_restore_host() to asm. Better safe than
>>>> sorry. My original implementation of that function was actually asm. I
>>>> can try to dig up that code.
>>
>> Note:
>>
>> VMexit
>> CALL
>> RET
>> RET <---- This is also a problem if the first call hasn't retired yet.
>> LFENCE
>>
>> Converting vmx_spec_ctrl_restore_host() to ASM should be able to take care
>> of this.
>
> Is there an actual bug here, or are we just micro-optimizing something that may or
> may not need additional optimization? Unless there's a bug to be fixed, moving
> code into ASM and increasing complexity doesn't seem worthwhile.
The (slight) bug here is that on systems where the host SPEC_CTRL value differs
from the guest's, they get hit with LFENCE + WRMSR to SPEC_CTRL + LFENCE, when
in reality they should just get LFENCE + WRMSR to SPEC_CTRL and that's it.
That would be satisfied with Pawan’s suggestion to move the LFENCE into the else
condition in the last branch of vmx_spec_ctrl_restore_host().
The optimization on top of that would be to see if we could whack that 2x LFENCE
down to 1x LFENCE. It just feels wrong to have 2x LFENCEs in the critical path,
even if the second one ends up being fairly snappy because of the first one (and/or
the WRMSR).
So to recap, fixing "the bug" does not require moving to ASM. Optimizing the
2x LFENCE into 1x LFENCE (probably) requires going to ASM, from the sounds of it?
On Tue, Jun 06, 2023 at 01:20:52AM +0100, Andrew Cooper wrote:
<clip very useful summary which belongs in git somewhere>
> But, the safety of vmx_spec_ctrl_restore_host() in the first place
> depends on the early return never ever becoming a conditional,
Good point. And that would be easier to overlook in C.
> and the compiler never emitting a call to memcpy()/memset()/whatever
> behind your back - something which is not prohibited by noinstr.
Au contraire, objtool has a check for that:
if (state->noinstr && state->instr <= 0 &&
!noinstr_call_dest(file, insn, insn_call_dest(insn))) {
WARN_INSN(insn, "call to %s() leaves .noinstr.text section", call_dest_name(insn));
return 1;
}
Regardless, despite being the person who wrote this thing in C to begin
with, I believe asm really is a better fit due to the delicate and
precise nature of the mitigations.
--
Josh