2024-01-23 00:52:59

by Wilczynski, Michal

Subject: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

Syzkaller found a warning triggered in nested_vmx_vmexit().
vmx->nested.nested_run_pending is non-zero, even though we're in
nested_vmx_vmexit(). Generally, trying to cancel a pending entry is
considered a bug. However, in this particular scenario, the kernel
behavior seems correct.

Syzkaller scenario:
1) Set up the VCPUs
2) Run some code with KVM_RUN in L2 as a nested guest
3) Return from KVM_RUN
4) Inject KVM_SMI into the VCPU
5) Change the EFER register with KVM_SET_SREGS to value 0x2501
6) Run some code on the VCPU using KVM_RUN
7) Observe the following behavior:

kvm_smm_transition: vcpu 0: entering SMM, smbase 0x30000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8000
kvm_entry: vcpu 0, rip 0x8002
kvm_smm_transition: vcpu 0: leaving SMM, smbase 0x30000
kvm_nested_vmenter: rip: 0x0000000000008002 vmcs: 0x0000000000007000
nested_rip: 0x0000000000000000 int_ctl: 0x00000000
event_inj: 0x00000000 nested_ept=n guest
cr3: 0x0000000000002000
kvm_nested_vmexit_inject: reason: TRIPLE_FAULT ext_inf1: 0x0000000000000000
ext_inf2: 0x0000000000000000 ext_int: 0x00000000
ext_int_err: 0x00000000
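
(Roughly, steps 4)-6) correspond to the userspace ioctl sequence sketched
below; "vcpu_fd" and the rest of the VM setup are assumed, error handling
is omitted, and the snippet is only illustrative, not the actual syzkaller
reproducer:)

	struct kvm_sregs sregs;

	/* 4) Queue an SMI for the vCPU. */
	ioctl(vcpu_fd, KVM_SMI, 0);

	/* 5) Set EFER to a value the later RSM state restore will reject. */
	ioctl(vcpu_fd, KVM_GET_SREGS, &sregs);
	sregs.efer = 0x2501;
	ioctl(vcpu_fd, KVM_SET_SREGS, &sregs);

	/* 6) Run the vCPU: the SMI handler executes, then RSM fails. */
	ioctl(vcpu_fd, KVM_RUN, 0);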

What happened here is that an SMI was injected immediately and the handler
was called at address 0x8000; all is good. Later, an RSM instruction is
executed in the emulator to return to the nested VM. em_rsm() is called,
which leads to emulator_leave_smm(). Part of this function calls the
VMX/SVM callback, in this case vmx_leave_smm(). It attempts to set up a
pending reentry to the guest VM by calling nested_vmx_enter_non_root_mode()
and sets vmx->nested.nested_run_pending to one. Unfortunately, later in
emulator_leave_smm(), rsm_load_state_64() fails while writing the invalid
EFER value to the MSR. This results in em_rsm() calling the triple_fault
callback. At this point it's clear that KVM should perform a vmexit, but
vmx->nested.nested_run_pending is left set to 1.
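
For orientation, the failing path through the emulator looks roughly like
this (simplified from arch/x86/kvm/emulate.c, not the verbatim source):

static int em_rsm(struct x86_emulate_ctxt *ctxt)
{
	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_MASK) == 0)
		return emulate_ud(ctxt);

	/*
	 * By the time the SMRAM state restore fails inside leave_smm(), the
	 * vendor callback has already called nested_vmx_enter_non_root_mode()
	 * and set nested_run_pending.
	 */
	if (ctxt->ops->leave_smm(ctxt))
		ctxt->ops->triple_fault(ctxt);

	return emulator_recalc_and_set_mode(ctxt);
}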

A similar flow occurs for SVM; the bug also reproduces on AMD platforms.

To address this issue, reset the nested_run_pending flag in the
triple_fault handler. However, it's crucial to note that
nested_run_pending cannot be cleared in all cases. It should only be
cleared for specific instructions that require a hardware VM-Enter to
complete the emulation, such as RSM. Previously, there were instances
where KVM prematurely synthesized a triple fault on nested VM-Enter. In
those cases, it is not appropriate to zero nested_run_pending.

To resolve this, introduce a new emulator flag indicating the need for
HW VM-Enter to complete emulating RSM. Based on this flag, a decision can
be made in vendor-specific triple fault handlers about whether
nested_run_pending needs to be cleared.

Fixes: 759cbd59674a ("KVM: x86: nSVM/nVMX: set nested_run_pending on VM entry which is a result of RSM")
Reported-by: Zheyu Ma <[email protected]>
Closes: https://lore.kernel.org/all/CAMhUBjmXMYsEoVYw_M8hSZjBMHh24i88QYm-RY6HDta5YZ7Wgw@mail.gmail.com
Signed-off-by: Michal Wilczynski <[email protected]>
---
v2:
- added a new emulator flag indicating whether an instruction needs a
VM-Enter to complete emulation (Sean)
- fix in SVM nested triple_fault handler (Sean)
- only clear nested_run_pending on RSM instruction (Sean)

arch/x86/kvm/emulate.c | 4 +++-
arch/x86/kvm/kvm_emulate.h | 2 ++
arch/x86/kvm/svm/nested.c | 9 +++++++++
arch/x86/kvm/vmx/nested.c | 12 ++++++++++++
4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index e223043ef5b2..889460432eac 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -178,6 +178,7 @@
#define IncSP ((u64)1 << 54) /* SP is incremented before ModRM calc */
#define TwoMemOp ((u64)1 << 55) /* Instruction has two memory operand */
#define IsBranch ((u64)1 << 56) /* Instruction is considered a branch. */
+#define NeedVMEnter ((u64)1 << 57) /* Instruction needs HW VM-Enter to complete */

#define DstXacc (DstAccLo | SrcAccHi | SrcWrite)

@@ -4462,7 +4463,7 @@ static const struct opcode twobyte_table[256] = {
F(DstMem | SrcReg | Src2CL | ModRM, em_shld), N, N,
/* 0xA8 - 0xAF */
I(Stack | Src2GS, em_push_sreg), I(Stack | Src2GS, em_pop_sreg),
- II(EmulateOnUD | ImplicitOps, em_rsm, rsm),
+ II(EmulateOnUD | ImplicitOps | NeedVMEnter, em_rsm, rsm),
F(DstMem | SrcReg | ModRM | BitOp | Lock | PageTable, em_bts),
F(DstMem | SrcReg | Src2ImmByte | ModRM, em_shrd),
F(DstMem | SrcReg | Src2CL | ModRM, em_shrd),
@@ -4966,6 +4967,7 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len, int
}

ctxt->is_branch = opcode.flags & IsBranch;
+ ctxt->need_vm_enter = opcode.flags & NeedVMEnter;

/* Unrecognised? */
if (ctxt->d == 0)
diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
index e6d149825169..1e1366afa51d 100644
--- a/arch/x86/kvm/kvm_emulate.h
+++ b/arch/x86/kvm/kvm_emulate.h
@@ -372,6 +372,8 @@ struct x86_emulate_ctxt {
struct read_cache io_read;
struct read_cache mem_read;
bool is_branch;
+ /* instruction needs a HW VM-Enter to complete correctly */
+ bool need_vm_enter;
};

#define KVM_EMULATOR_BUG_ON(cond, ctxt) \
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dee62362a360..8c19ad5e18d4 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1165,11 +1165,20 @@ int nested_svm_vmexit(struct vcpu_svm *svm)

static void nested_svm_triple_fault(struct kvm_vcpu *vcpu)
{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
struct vcpu_svm *svm = to_svm(vcpu);

if (!vmcb12_is_intercept(&svm->nested.ctl, INTERCEPT_SHUTDOWN))
return;

+ /*
+ * In case of a triple fault, cancel the nested reentry. This may occur
+ * when the RSM instruction fails while attempting to restore the state
+ * from SMRAM.
+ */
+ if (ctxt->need_vm_enter)
+ svm->nested.nested_run_pending = 0;
+
kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
nested_svm_simple_vmexit(to_svm(vcpu), SVM_EXIT_SHUTDOWN);
}
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 6329a306856b..9228699b4c1e 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4950,7 +4950,19 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,

static void nested_vmx_triple_fault(struct kvm_vcpu *vcpu)
{
+ struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+
kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+
+ /*
+ * In case of a triple fault, cancel the nested reentry. This may occur
+ * when the RSM instruction fails while attempting to restore the state
+ * from SMRAM.
+ */
+ if (ctxt->need_vm_enter)
+ vmx->nested.nested_run_pending = 0;
+
nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
}

--
2.41.0



2024-01-25 06:17:04

by Yunhong Jiang

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

On Tue, Jan 23, 2024 at 02:15:55AM +0200, Michal Wilczynski wrote:
> Syzkaller found a warning triggered in nested_vmx_vmexit().
> vmx->nested.nested_run_pending is non-zero, even though we're in
> nested_vmx_vmexit(). Generally, trying to cancel a pending entry is
> considered a bug. However, in this particular scenario, the kernel
> behavior seems correct.
>
> [...]
>
> To resolve this, introduce a new emulator flag indicating the need for
> HW VM-Enter to complete emulating RSM. Based on this flag, a decision can
> be made in vendor-specific triple fault handlers about whether
> nested_run_pending needs to be cleared.
Would it be OK to move the following emulator_leave_smm() code into
vmx_leave_smm(), before setting the nested_run_pending bit? That avoids
changing the generic emulator code:

#ifdef CONFIG_X86_64
	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
		return rsm_load_state_64(ctxt, &smram.smram64);
	else
#endif
		return rsm_load_state_32(ctxt, &smram.smram32);
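
Roughly, as an untested sketch (one complication being that the
rsm_load_state_*() helpers take the emulator ctxt, which would need to be
plumbed into the callback):

static int vmx_leave_smm(struct kvm_vcpu *vcpu, const union kvm_smram *smram)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	int ret;
	...
	/* Restore state from SMRAM *before* arming the nested entry. */
#ifdef CONFIG_X86_64
	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
		ret = rsm_load_state_64(ctxt, &smram->smram64);
	else
#endif
		ret = rsm_load_state_32(ctxt, &smram->smram32);
	if (ret != X86EMUL_CONTINUE)
		return 1;

	if (nested_vmx_enter_non_root_mode(vcpu, false))
		return 1;

	vmx->nested.nested_run_pending = 1;
	return 0;
}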

--jyh


2024-01-30 20:08:04

by Wilczynski, Michal

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction



On 1/25/2024 1:57 AM, Yunhong Jiang wrote:
> On Tue, Jan 23, 2024 at 02:15:55AM +0200, Michal Wilczynski wrote:
>> To resolve this, introduce a new emulator flag indicating the need for
>> HW VM-Enter to complete emulating RSM. Based on this flag, a decision can
>> be made in vendor-specific triple fault handlers about whether
>> nested_run_pending needs to be cleared.
> Would it be OK to move the following emulator_leave_smm() code into
> vmx_leave_smm(), before setting the nested_run_pending bit? That avoids
> changing the generic emulator code:
>
> #ifdef CONFIG_X86_64
> 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> 		return rsm_load_state_64(ctxt, &smram.smram64);
> 	else
> #endif
> 		return rsm_load_state_32(ctxt, &smram.smram32);
>
> --jyh
>

Moving rsm_load_state_*() into the vendor callbacks would be
architecturally incorrect: vendor callbacks should only do vendor-specific
work, and recovering state from SMRAM is VMX/SVM-independent, so it should
be kept that way.

nested_run_pending is unfortunately buried in the vendor-specific structs,
so its zeroing has to be done in vendor-specific callbacks.
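
(For context, this is roughly where the flag lives, abbreviated:)

/* arch/x86/kvm/vmx/vmx.h */
struct nested_vmx {
	...
	bool nested_run_pending;
	...
};

/* arch/x86/kvm/svm/svm.h */
struct svm_nested_state {
	...
	bool nested_run_pending;
	...
};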

The way I structured this fix is hopefully in line with the discussion
under v1 of this patch, where Sean gave some background on the code and
proposed ways to fix it.

Thanks for your input!
Michał


2024-02-08 09:05:16

by Wilczynski, Michal

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction



On 1/23/2024 1:15 AM, Michal Wilczynski wrote:
> Syzkaller found a warning triggered in nested_vmx_vmexit().
> vmx->nested.nested_run_pending is non-zero, even though we're in
> nested_vmx_vmexit(). Generally, trying to cancel a pending entry is
> considered a bug. However, in this particular scenario, the kernel
> behavior seems correct.
>
> [...]

Hi Sean,
Friendly ping,

Regards,
Michał

2024-02-08 10:32:09

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

On Thu, Jan 25, 2024 at 1:59 AM Yunhong Jiang
<[email protected]> wrote:
> Would it be OK to move the following emulator_leave_smm() code into
> vmx_leave_smm(), before setting the nested_run_pending bit? That avoids
> changing the generic emulator code:
>
> #ifdef CONFIG_X86_64
> 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
> 		return rsm_load_state_64(ctxt, &smram.smram64);
> 	else
> #endif
> 		return rsm_load_state_32(ctxt, &smram.smram32);

I agree with Michal that vendor code should not be in charge of
calling rsm_load_state_*.

However, I don't understand the problem or the fix.

There are two possible causes, and in neither case is the fix to clear
nested_run_pending:

1) if the problem is that setting nested_run_pending was premature, the
correct fix in my opinion is to split the leave_smm vendor callback into
"prepare" and "commit" parts, not to clear the flag later. See the
attached, untested patch.
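
In outline, using the leave_smm_prepare/leave_smm_commit names from the
attached patch (this is only a hypothetical sketch; the attachment is
authoritative), emulator_leave_smm() would become:

int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)
{
	...
	/* Vendor work that is safe to do before loading state from SMRAM. */
	if (static_call(kvm_x86_leave_smm_prepare)(vcpu, &smram))
		return X86EMUL_UNHANDLEABLE;

#ifdef CONFIG_X86_64
	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
		ret = rsm_load_state_64(ctxt, &smram.smram64);
	else
#endif
		ret = rsm_load_state_32(ctxt, &smram.smram32);
	if (ret != X86EMUL_CONTINUE)
		return ret;

	/* State restore succeeded: only now arm nested_run_pending. */
	if (static_call(kvm_x86_leave_smm_commit)(vcpu, &smram))
		return X86EMUL_UNHANDLEABLE;

	return X86EMUL_CONTINUE;
}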

2) otherwise, if the problem is that we have not gone through the
vmenter yet, then KVM needs to do that and _then_ inject the triple
fault. The fix is to merge the .triple_fault and .check_nested_events
callbacks, with something like the second attached patch - which
probably has so many problems that I haven't even tried to compile it.
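
A very rough, hypothetical shape of this option (the real attempt is in
nested2.patch, which is not reproduced here):

static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	...
	/* Treat a pending triple fault as another vmexit-causing event. */
	if (kvm_test_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
		/* Let the pending VM-Enter complete first, then inject. */
		if (vmx->nested.nested_run_pending)
			return -EBUSY;
		kvm_clear_request(KVM_REQ_TRIPLE_FAULT, vcpu);
		nested_vmx_vmexit(vcpu, EXIT_REASON_TRIPLE_FAULT, 0, 0);
		return 0;
	}
	...
}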

The first patch should be equivalent to yours, and I guess it is
okay-ish with a few more comments that explain what's going on.

Paolo


Attachments:
nested.patch (8.14 kB)
nested2.patch (2.63 kB)

2024-02-08 13:23:20

by Wilczynski, Michal

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction



On 2/8/2024 11:31 AM, Paolo Bonzini wrote:
> On Thu, Jan 25, 2024 at 1:59 AM Yunhong Jiang
> <[email protected]> wrote:
>> Would it be OK to move the following emulator_leave_smm() code into
>> vmx_leave_smm(), before setting the nested_run_pending bit? That avoids
>> changing the generic emulator code:
>>
>> #ifdef CONFIG_X86_64
>> 	if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
>> 		return rsm_load_state_64(ctxt, &smram.smram64);
>> 	else
>> #endif
>> 		return rsm_load_state_32(ctxt, &smram.smram32);
>
> I agree with Michal that vendor code should not be in charge of
> calling rsm_load_state_*.
>
> However, I don't understand the problem or the fix.
>
> There are two possible causes, and in neither case is the fix to clear
> nested_run_pending:
>
> 1) if the problem is that setting nested_run_pending was premature, the
> correct fix in my opinion is to split the leave_smm vendor callback into
> "prepare" and "commit" parts, not to clear the flag later. See the
> attached, untested patch.

Hi, I've tested the patch and it seems to work, both on Intel and AMD.
There was a problem with applying this chunk though:

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ac8b7614e79d..3d18fa7db353 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -119,7 +119,8 @@ KVM_X86_OP(setup_mce)
#ifdef CONFIG_KVM_SMM
KVM_X86_OP(smi_allowed)
KVM_X86_OP() // <- This shouldn't be there, I guess?
-KVM_X86_OP(leave_smm)
+KVM_X86_OP(leave_smm_prepare)
+KVM_X86_OP(leave_smm_commit)
KVM_X86_OP(enable_smi_window)
#endif
KVM_X86_OP_OPTIONAL(dev_get_attr)

Anyway, I was a bit averse to this approach, as I noticed in the git log
that callbacks like e.g. post_leave_smm() used to exist but were later
removed, so I thought the maintainers didn't like introducing extra
callbacks.

>
> 2) otherwise, if the problem is that we have not gone through the
> vmenter yet, then KVM needs to do that and _then_ inject the triple
> fault. The fix is to merge the .triple_fault and .check_nested_events
> callbacks, with something like the second attached patch - which
> probably has so many problems that I haven't even tried to compile it.

Well, in this case, if we know that RSM will fail, it doesn't seem to make
sense to run a vmenter just to kill the VM anyway; this would be more
confusing.

I've made the fix this way based on our discussion with Sean under v1: I
tried to mark the RSM instruction with a flag, as one that needs an actual
HW VM-Enter to complete successfully, and to manipulate nested_run_pending
based on that information.

>
> The first patch should be equivalent to yours, and I guess it is
> okay-ish with a few more comments that explain what's going on.

Sure, I'm fine with this. Thanks!

>
> Paolo

2024-02-08 17:47:04

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

On Thu, Feb 8, 2024 at 2:18 PM Wilczynski, Michal
<[email protected]> wrote:
> Hi, I've tested the patch and it seems to work, both on Intel and AMD.
> There was a problem with applying this chunk though:
>
> diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> index ac8b7614e79d..3d18fa7db353 100644
> --- a/arch/x86/include/asm/kvm-x86-ops.h
> +++ b/arch/x86/include/asm/kvm-x86-ops.h
> @@ -119,7 +119,8 @@ KVM_X86_OP(setup_mce)
> #ifdef CONFIG_KVM_SMM
> KVM_X86_OP(smi_allowed)
> KVM_X86_OP() // <- This shouldn't be there, I guess?
> -KVM_X86_OP(leave_smm)
> +KVM_X86_OP(leave_smm_prepare)
> +KVM_X86_OP(leave_smm_commit)
> KVM_X86_OP(enable_smi_window)
> #endif
> KVM_X86_OP_OPTIONAL(dev_get_attr)
>
> Anyway, I was a bit averse to this approach, as I noticed in the git log
> that callbacks like e.g. post_leave_smm() used to exist but were later
> removed, so I thought the maintainers didn't like introducing extra
> callbacks.

If they are needed, it's fine. In my opinion a new callback is easier
to handle and understand than new state.

> > 2) otherwise, if the problem is that we have not gone through the
> > vmenter yet, then KVM needs to do that and _then_ inject the triple
> > fault. The fix is to merge the .triple_fault and .check_nested_events
> > callbacks, with something like the second attached patch - which
> > probably has so many problems that I haven't even tried to compile it.
>
> Well, in this case, if we know that RSM will fail, it doesn't seem to make
> sense to run a vmenter just to kill the VM anyway; this would be more
> confusing.

Note that the triple fault must not kill the VM; it just causes a
nested vmexit from L2 to L1. KVM's algorithm to inject a
vmexit-causing event is always to first ensure that the VMCS02 (VMCB02
for AMD) is consistent, and only then trigger the vmexit. So if patch
2 or something like it works, that would be even better.

> I've made the fix this way based on our discussion with Sean under v1: I
> tried to mark the RSM instruction with a flag, as one that needs an actual
> HW VM-Enter to complete successfully, and to manipulate nested_run_pending
> based on that information.

I understand, apologies for not noticing v1. Let's wait for Sean's opinion.

Paolo


2024-02-09 15:12:55

by Sean Christopherson

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

On Thu, Feb 08, 2024, Paolo Bonzini wrote:
> On Thu, Feb 8, 2024 at 2:18 PM Wilczynski, Michal
> <[email protected]> wrote:
> > Hi, I've tested the patch and it seems to work, both on Intel and AMD.
> > There was a problem with applying this chunk though:
> >
> > diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
> > index ac8b7614e79d..3d18fa7db353 100644
> > --- a/arch/x86/include/asm/kvm-x86-ops.h
> > +++ b/arch/x86/include/asm/kvm-x86-ops.h
> > @@ -119,7 +119,8 @@ KVM_X86_OP(setup_mce)
> > #ifdef CONFIG_KVM_SMM
> > KVM_X86_OP(smi_allowed)
> > KVM_X86_OP() // <- This shouldn't be there, I guess?
> > -KVM_X86_OP(leave_smm)
> > +KVM_X86_OP(leave_smm_prepare)
> > +KVM_X86_OP(leave_smm_commit)
> > KVM_X86_OP(enable_smi_window)
> > #endif
> > KVM_X86_OP_OPTIONAL(dev_get_attr)
> >
> > Anyway, I was a bit averse to this approach, as I noticed in the git log
> > that callbacks like e.g. post_leave_smm() used to exist but were later
> > removed, so I thought the maintainers didn't like introducing extra
> > callbacks.
>
> If they are needed, it's fine. In my opinion a new callback is easier
> to handle and understand than new state.

Yeah, we ripped out post_leave_smm() because its sole usage at the time was buggy,
and having a callback without a purpose would just be dead code.

> > > 2) otherwise, if the problem is that we have not gone through the
> > > vmenter yet, then KVM needs to do that and _then_ inject the triple
> > > fault. The fix is to merge the .triple_fault and .check_nested_events
> > > callbacks, with something like the second attached patch - which
> > > probably has so many problems that I haven't even tried to compile it.
> >
> > Well, in this case, if we know that RSM will fail, it doesn't seem to
> > make sense to run a vmenter just to kill the VM anyway; this would be
> > more confusing.
>
> Note that the triple fault must not kill the VM, it's just causing a
> nested vmexit from L2 to L1. KVM's algorithm to inject a
> vmexit-causing event is always to first ensure that the VMCS02 (VMCB02
> for AMD) is consistent, and only then trigger the vmexit. So if patch
> 2 or something like it works, that would be even better.
>
> > I've made the fix this way based on our discussion with Sean under v1: I
> > tried to mark the RSM instruction with a flag, as one that needs an
> > actual HW VM-Enter to complete successfully, and to manipulate
> > nested_run_pending based on that information.

Heh, you misunderstood my suggestion.

: But due to nested_run_pending being (unnecessarily) buried in vendor structs, it
: might actually be easier to do a cleaner fix. E.g. add yet another flag to track
: that a hardware VM-Enter needs to be completed in order to complete instruction
: emulation.

I didn't mean add a flag to the emulator to muck with nested_run_pending;
I meant add a flag to kvm_vcpu_arch to be a superset of nested_run_pending.
E.g. as a first step, something like the below. And then as a follow-up,
see if it's doable to propagate nested_run_pending =>
insn_emulation_needs_vmenter so that the nested_run_pending checks in
{svm,vmx}_{interrupt,nmi,smi}_allowed() can be dropped.

---
arch/x86/include/asm/kvm_host.h | 8 ++++++
arch/x86/kvm/smm.c | 10 ++++++--
arch/x86/kvm/x86.c | 44 +++++++++++++++++++++++++--------
3 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d271ba20a0b2..bb4250551619 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -769,6 +769,14 @@ struct kvm_vcpu_arch {
u64 ia32_misc_enable_msr;
u64 smbase;
u64 smi_count;
+
+ /*
+ * Tracks if a successful VM-Enter is needed to complete emulation of
+ * an instruction, e.g. to ensure emulation of RSM or nested VM-Enter,
+ * which can directly inject events, completes before KVM attempts to
+ * inject new events.
+ */
+ bool insn_emulation_needs_vmenter;
bool at_instruction_boundary;
bool tpr_access_reporting;
bool xfd_no_write_intercept;
diff --git a/arch/x86/kvm/smm.c b/arch/x86/kvm/smm.c
index dc3d95fdca7d..c6e597b8c794 100644
--- a/arch/x86/kvm/smm.c
+++ b/arch/x86/kvm/smm.c
@@ -640,8 +640,14 @@ int emulator_leave_smm(struct x86_emulate_ctxt *ctxt)

#ifdef CONFIG_X86_64
if (guest_cpuid_has(vcpu, X86_FEATURE_LM))
- return rsm_load_state_64(ctxt, &smram.smram64);
+ ret = rsm_load_state_64(ctxt, &smram.smram64);
else
#endif
- return rsm_load_state_32(ctxt, &smram.smram32);
+ ret = rsm_load_state_32(ctxt, &smram.smram32);
+
+ if (ret != X86EMUL_CONTINUE)
+ return ret;
+
+ vcpu->arch.insn_emulation_needs_vmenter = true;
+ return X86EMUL_CONTINUE;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bf10a9073a09..21a7183bbf69 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10195,6 +10195,30 @@ int kvm_check_nested_events(struct kvm_vcpu *vcpu)
return kvm_x86_ops.nested_ops->check_events(vcpu);
}

+static int kvm_interrupt_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ if (vcpu->arch.insn_emulation_needs_vmenter)
+ return -EBUSY;
+
+ return static_call(kvm_x86_interrupt_allowed)(vcpu, for_injection);
+}
+
+static int kvm_smi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ if (vcpu->arch.insn_emulation_needs_vmenter)
+ return -EBUSY;
+
+ return static_call(kvm_x86_smi_allowed)(vcpu, for_injection);
+}
+
+static int kvm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)
+{
+ if (vcpu->arch.insn_emulation_needs_vmenter)
+ return -EBUSY;
+
+ return static_call(kvm_x86_nmi_allowed)(vcpu, for_injection);
+}
+
static void kvm_inject_exception(struct kvm_vcpu *vcpu)
{
/*
@@ -10384,7 +10408,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
*/
#ifdef CONFIG_KVM_SMM
if (vcpu->arch.smi_pending) {
- r = can_inject ? static_call(kvm_x86_smi_allowed)(vcpu, true) : -EBUSY;
+ r = can_inject ? kvm_smi_allowed(vcpu, true) : -EBUSY;
if (r < 0)
goto out;
if (r) {
@@ -10398,7 +10422,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
#endif

if (vcpu->arch.nmi_pending) {
- r = can_inject ? static_call(kvm_x86_nmi_allowed)(vcpu, true) : -EBUSY;
+ r = can_inject ? kvm_nmi_allowed(vcpu, true) : -EBUSY;
if (r < 0)
goto out;
if (r) {
@@ -10406,14 +10430,14 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
vcpu->arch.nmi_injected = true;
static_call(kvm_x86_inject_nmi)(vcpu);
can_inject = false;
- WARN_ON(static_call(kvm_x86_nmi_allowed)(vcpu, true) < 0);
+ WARN_ON_ONCE(kvm_nmi_allowed(vcpu, true) < 0);
}
if (vcpu->arch.nmi_pending)
static_call(kvm_x86_enable_nmi_window)(vcpu);
}

if (kvm_cpu_has_injectable_intr(vcpu)) {
- r = can_inject ? static_call(kvm_x86_interrupt_allowed)(vcpu, true) : -EBUSY;
+ r = can_inject ? kvm_interrupt_allowed(vcpu, true) : -EBUSY;
if (r < 0)
goto out;
if (r) {
@@ -10422,7 +10446,7 @@ static int kvm_check_and_inject_events(struct kvm_vcpu *vcpu,
if (!WARN_ON_ONCE(irq == -1)) {
kvm_queue_interrupt(vcpu, irq, false);
static_call(kvm_x86_inject_irq)(vcpu, false);
- WARN_ON(static_call(kvm_x86_interrupt_allowed)(vcpu, true) < 0);
+ WARN_ON(kvm_interrupt_allowed(vcpu, true) < 0);
}
}
if (kvm_cpu_has_injectable_intr(vcpu))
@@ -10969,6 +10993,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
WARN_ON_ONCE((kvm_vcpu_apicv_activated(vcpu) != kvm_vcpu_apicv_active(vcpu)) &&
(kvm_get_apic_mode(vcpu) != LAPIC_MODE_DISABLED));

+ vcpu->arch.insn_emulation_needs_vmenter = false;
+
exit_fastpath = static_call(kvm_x86_vcpu_run)(vcpu);
if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
break;
@@ -13051,14 +13077,12 @@ static inline bool kvm_vcpu_has_events(struct kvm_vcpu *vcpu)
return true;

if (kvm_test_request(KVM_REQ_NMI, vcpu) ||
- (vcpu->arch.nmi_pending &&
- static_call(kvm_x86_nmi_allowed)(vcpu, false)))
+ (vcpu->arch.nmi_pending && kvm_nmi_allowed(vcpu, false)))
return true;

#ifdef CONFIG_KVM_SMM
if (kvm_test_request(KVM_REQ_SMI, vcpu) ||
- (vcpu->arch.smi_pending &&
- static_call(kvm_x86_smi_allowed)(vcpu, false)))
+ (vcpu->arch.smi_pending && kvm_smi_allowed(vcpu, false)))
return true;
#endif

@@ -13136,7 +13160,7 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)

int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu)
{
- return static_call(kvm_x86_interrupt_allowed)(vcpu, false);
+ return kvm_interrupt_allowed(vcpu, false);
}

unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)

base-commit: f8fe663bc413d2a14ab9a452638a99b975011a9d
--


2024-02-09 22:04:21

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: x86: nSVM/nVMX: Fix handling triple fault on RSM instruction

On Fri, Feb 9, 2024 at 4:05 PM Sean Christopherson <[email protected]> wrote:
> > If they are needed, it's fine. In my opinion a new callback is easier
> > to handle and understand than new state.
>
> Yeah, we ripped out post_leave_smm() because its sole usage at the time was buggy,
> and having a callback without a purpose would just be dead code.

[...]

> : But due to nested_run_pending being (unnecessarily) buried in vendor structs, it
> : might actually be easier to do a cleaner fix. E.g. add yet another flag to track
> : that a hardware VM-Enter needs to be completed in order to complete instruction
> : emulation.
>
> I didn't mean add a flag to the emulator to muck with nested_run_pending;
> I meant add a flag to kvm_vcpu_arch to be a superset of nested_run_pending.
> E.g. as a first step, something like the below. And then as a follow-up,
> see if it's doable to propagate nested_run_pending =>
> insn_emulation_needs_vmenter so that the nested_run_pending checks in
> {svm,vmx}_{interrupt,nmi,smi}_allowed() can be dropped.

That seems a lot more complicated... What do you think of the patches I
posted (the one that works, and the wish-it-could-be-like-that one that
folds triple faults into check_nested_events)?

Paolo