2021-01-12 13:06:22

by Vitaly Kuznetsov

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

Wei Huang <[email protected]> writes:

> From: Bandan Das <[email protected]>
>
> While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
> CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
> before checking VMCB's instruction intercept. If EAX falls into such
> memory areas, #GP is triggered before VMEXIT. This causes problems under
> nested virtualization. To solve this problem, KVM needs to trap #GP and
> check the instructions triggering #GP. For VM execution instructions,
> KVM emulates these instructions; otherwise it re-injects #GP back to
> guest VMs.
>
> Signed-off-by: Bandan Das <[email protected]>
> Co-developed-by: Wei Huang <[email protected]>
> Signed-off-by: Wei Huang <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 8 +-
> arch/x86/kvm/mmu.h | 1 +
> arch/x86/kvm/mmu/mmu.c | 7 ++
> arch/x86/kvm/svm/svm.c | 157 +++++++++++++++++++-------------
> arch/x86/kvm/svm/svm.h | 8 ++
> arch/x86/kvm/vmx/vmx.c | 2 +-
> arch/x86/kvm/x86.c | 37 +++++++-
> 7 files changed, 146 insertions(+), 74 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 3d6616f6f6ef..0ddc309f5a14 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
> * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
> * Used to test the full emulator from userspace.
> *
> - * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
> + * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
> * backdoor emulation, which is opt in via module param.
> * VMware backdoor emulation handles select instructions
> - * and reinjects the #GP for all other cases.
> + * and reinjects #GP for all other cases. This also
> + * handles other cases where a #GP condition needs to be
> + * handled and emulated appropriately.
> *
> * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
> * case the CR2/GPA value pass on the stack is valid.
> @@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
> #define EMULTYPE_SKIP (1 << 2)
> #define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
> #define EMULTYPE_TRAP_UD_FORCED (1 << 4)
> -#define EMULTYPE_VMWARE_GP (1 << 5)
> +#define EMULTYPE_PARAVIRT_GP (1 << 5)
> #define EMULTYPE_PF (1 << 6)
>
> int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 581925e476d6..1a2fff4e7140 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>
> int kvm_mmu_post_init_vm(struct kvm *kvm);
> void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
> +bool kvm_is_host_reserved_region(u64 gpa);

Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?

>
> #endif
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 6d16481aa29d..c5c4aaf01a1a 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -50,6 +50,7 @@
> #include <asm/io.h>
> #include <asm/vmx.h>
> #include <asm/kvm_page_track.h>
> +#include <asm/e820/api.h>
> #include "trace.h"
>
> extern bool itlb_multihit_kvm_mitigation;
> @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
> }
> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>
> +bool kvm_is_host_reserved_region(u64 gpa)
> +{
> + return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
> +}

While _e820__mapped_any()'s doc says '.. checks if any part of the
range <start,end> is mapped ..' it seems to me that the real check is
[start, end) so we should use 'gpa' instead of 'gpa-1', no?
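
For illustration, a minimal sketch of the helper under that half-open [start, end) reading -- this just spells out the suggestion above, not a confirmed fix from the patch author:

	/* Hypothetical adjustment: check only the byte at 'gpa', assuming
	 * e820__mapped_raw_any() treats the range as half-open [start, end). */
	bool kvm_is_host_reserved_region(u64 gpa)
	{
		return e820__mapped_raw_any(gpa, gpa + 1, E820_TYPE_RESERVED);
	}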

> +EXPORT_SYMBOL_GPL(kvm_is_host_reserved_region);
> +
> void kvm_mmu_zap_all(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 7ef171790d02..74620d32aa82 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -288,6 +288,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
> if (!(efer & EFER_SVME)) {
> svm_leave_nested(svm);
> svm_set_gif(svm, true);
> + clr_exception_intercept(svm, GP_VECTOR);
>
> /*
> * Free the nested guest state, unless we are in SMM.
> @@ -309,6 +310,10 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>
> svm->vmcb->save.efer = efer | EFER_SVME;
> vmcb_mark_dirty(svm->vmcb, VMCB_CR);
> + /* Enable GP interception for SVM instructions if needed */
> + if (efer & EFER_SVME)
> + set_exception_intercept(svm, GP_VECTOR);
> +
> return 0;
> }
>
> @@ -1957,22 +1962,104 @@ static int ac_interception(struct vcpu_svm *svm)
> return 1;
> }
>
> +static int vmload_interception(struct vcpu_svm *svm)
> +{
> + struct vmcb *nested_vmcb;
> + struct kvm_host_map map;
> + int ret;
> +
> + if (nested_svm_check_permissions(svm))
> + return 1;
> +
> + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
> + if (ret) {
> + if (ret == -EINVAL)
> + kvm_inject_gp(&svm->vcpu, 0);
> + return 1;
> + }
> +
> + nested_vmcb = map.hva;
> +
> + ret = kvm_skip_emulated_instruction(&svm->vcpu);
> +
> + nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
> + kvm_vcpu_unmap(&svm->vcpu, &map, true);
> +
> + return ret;
> +}
> +
> +static int vmsave_interception(struct vcpu_svm *svm)
> +{
> + struct vmcb *nested_vmcb;
> + struct kvm_host_map map;
> + int ret;
> +
> + if (nested_svm_check_permissions(svm))
> + return 1;
> +
> + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
> + if (ret) {
> + if (ret == -EINVAL)
> + kvm_inject_gp(&svm->vcpu, 0);
> + return 1;
> + }
> +
> + nested_vmcb = map.hva;
> +
> + ret = kvm_skip_emulated_instruction(&svm->vcpu);
> +
> + nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
> + kvm_vcpu_unmap(&svm->vcpu, &map, true);
> +
> + return ret;
> +}
> +
> +static int vmrun_interception(struct vcpu_svm *svm)
> +{
> + if (nested_svm_check_permissions(svm))
> + return 1;
> +
> + return nested_svm_vmrun(svm);
> +}
> +
> +/* Emulate SVM VM execution instructions */
> +static int svm_emulate_vm_instr(struct kvm_vcpu *vcpu, u8 modrm)
> +{
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + switch (modrm) {
> + case 0xd8: /* VMRUN */
> + return vmrun_interception(svm);
> + case 0xda: /* VMLOAD */
> + return vmload_interception(svm);
> + case 0xdb: /* VMSAVE */
> + return vmsave_interception(svm);
> + default:
> + /* inject a #GP for all other cases */
> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> + return 1;
> + }
> +}
> +
> static int gp_interception(struct vcpu_svm *svm)
> {
> struct kvm_vcpu *vcpu = &svm->vcpu;
> u32 error_code = svm->vmcb->control.exit_info_1;
> -
> - WARN_ON_ONCE(!enable_vmware_backdoor);
> + int rc;
>
> /*
> - * VMware backdoor emulation on #GP interception only handles IN{S},
> - * OUT{S}, and RDPMC, none of which generate a non-zero error code.
> + * Only VMware backdoor and SVM VME errata are handled. Neither of
> + * them has non-zero error codes.
> */
> if (error_code) {
> kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
> return 1;
> }
> - return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
> +
> + rc = kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
> + if (rc > 1)
> + rc = svm_emulate_vm_instr(vcpu, rc);
> + return rc;
> }
>
> static bool is_erratum_383(void)
> @@ -2113,66 +2200,6 @@ static int vmmcall_interception(struct vcpu_svm *svm)
> return kvm_emulate_hypercall(&svm->vcpu);
> }
>
> -static int vmload_interception(struct vcpu_svm *svm)
> -{
> - struct vmcb *nested_vmcb;
> - struct kvm_host_map map;
> - int ret;
> -
> - if (nested_svm_check_permissions(svm))
> - return 1;
> -
> - ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
> - if (ret) {
> - if (ret == -EINVAL)
> - kvm_inject_gp(&svm->vcpu, 0);
> - return 1;
> - }
> -
> - nested_vmcb = map.hva;
> -
> - ret = kvm_skip_emulated_instruction(&svm->vcpu);
> -
> - nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
> - kvm_vcpu_unmap(&svm->vcpu, &map, true);
> -
> - return ret;
> -}
> -
> -static int vmsave_interception(struct vcpu_svm *svm)
> -{
> - struct vmcb *nested_vmcb;
> - struct kvm_host_map map;
> - int ret;
> -
> - if (nested_svm_check_permissions(svm))
> - return 1;
> -
> - ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
> - if (ret) {
> - if (ret == -EINVAL)
> - kvm_inject_gp(&svm->vcpu, 0);
> - return 1;
> - }
> -
> - nested_vmcb = map.hva;
> -
> - ret = kvm_skip_emulated_instruction(&svm->vcpu);
> -
> - nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
> - kvm_vcpu_unmap(&svm->vcpu, &map, true);
> -
> - return ret;
> -}
> -
> -static int vmrun_interception(struct vcpu_svm *svm)
> -{
> - if (nested_svm_check_permissions(svm))
> - return 1;
> -
> - return nested_svm_vmrun(svm);
> -}
> -

Maybe if you'd do it the other way around and put gp_interception()
after vm{load,save,run}_interception(), the diff (and code churn)
would've been smaller?

> void svm_set_gif(struct vcpu_svm *svm, bool value)
> {
> if (value) {
> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
> index 0fe874ae5498..d5dffcf59afa 100644
> --- a/arch/x86/kvm/svm/svm.h
> +++ b/arch/x86/kvm/svm/svm.h
> @@ -350,6 +350,14 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
> recalc_intercepts(svm);
> }
>
> +static inline bool is_exception_intercept(struct vcpu_svm *svm, u32 bit)
> +{
> + struct vmcb *vmcb = get_host_vmcb(svm);
> +
> + WARN_ON_ONCE(bit >= 32);
> + return vmcb_is_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
> +}
> +
> static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
> {
> struct vmcb *vmcb = get_host_vmcb(svm);
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 2af05d3b0590..5fac2f7cba24 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -4774,7 +4774,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
> kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
> return 1;
> }
> - return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
> + return kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
> }
>
> /*
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 9a8969a6dd06..c3662fc3b1bc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7014,7 +7014,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
> ++vcpu->stat.insn_emulation_fail;
> trace_kvm_emulate_insn_failed(vcpu);
>
> - if (emulation_type & EMULTYPE_VMWARE_GP) {
> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> return 1;
> }
> @@ -7267,6 +7267,28 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
> return false;
> }
>
> +static int is_vm_instr_opcode(struct x86_emulate_ctxt *ctxt)

Nit: it seems we either return '0' or 'ctxt->modrm' which is 'u8', so
'u8' instead of 'int' maybe?
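
A quick sketch of that change, assuming everything else stays as in the patch below (the 'int vminstr' local in x86_emulate_instruction() would become u8 as well):

	static u8 is_vm_instr_opcode(struct x86_emulate_ctxt *ctxt)
	{
		unsigned long rax;

		if (ctxt->b != 0x1)
			return 0;

		switch (ctxt->modrm) {
		case 0xd8: /* VMRUN */
		case 0xda: /* VMLOAD */
		case 0xdb: /* VMSAVE */
			rax = kvm_register_read(emul_to_vcpu(ctxt), VCPU_REGS_RAX);
			if (!kvm_is_host_reserved_region(rax))
				return 0;
			break;
		default:
			return 0;
		}

		/* Either 0 (not an SVM instruction of interest) or the modrm byte. */
		return ctxt->modrm;
	}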

> +{
> + unsigned long rax;
> +
> + if (ctxt->b != 0x1)
> + return 0;
> +
> + switch (ctxt->modrm) {
> + case 0xd8: /* VMRUN */
> + case 0xda: /* VMLOAD */
> + case 0xdb: /* VMSAVE */
> + rax = kvm_register_read(emul_to_vcpu(ctxt), VCPU_REGS_RAX);
> + if (!kvm_is_host_reserved_region(rax))
> + return 0;
> + break;
> + default:
> + return 0;
> + }
> +
> + return ctxt->modrm;
> +}
> +
> static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
> {
> switch (ctxt->opcode_len) {
> @@ -7305,6 +7327,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
> bool writeback = true;
> bool write_fault_to_spt;
> + int vminstr;
>
> if (unlikely(!kvm_x86_ops.can_emulate_instruction(vcpu, insn, insn_len)))
> return 1;
> @@ -7367,10 +7390,14 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
> }
> }
>
> - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
> - !is_vmware_backdoor_opcode(ctxt)) {
> - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> - return 1;
> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
> + vminstr = is_vm_instr_opcode(ctxt);
> + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
> + return 1;
> + }
> + if (vminstr)
> + return vminstr;
> }
>
> /*

--
Vitaly


2021-01-12 15:17:15

by Andy Lutomirski

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions


> On Jan 12, 2021, at 4:15 AM, Vitaly Kuznetsov <[email protected]> wrote:
>
> Wei Huang <[email protected]> writes:
>
>> From: Bandan Das <[email protected]>
>>
>> While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
>> CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
>> before checking VMCB's instruction intercept. If EAX falls into such
>> memory areas, #GP is triggered before VMEXIT. This causes problem under
>> nested virtualization. To solve this problem, KVM needs to trap #GP and
>> check the instructions triggering #GP. For VM execution instructions,
>> KVM emulates these instructions; otherwise it re-injects #GP back to
>> guest VMs.
>>
>> Signed-off-by: Bandan Das <[email protected]>
>> Co-developed-by: Wei Huang <[email protected]>
>> Signed-off-by: Wei Huang <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 8 +-
>> arch/x86/kvm/mmu.h | 1 +
>> arch/x86/kvm/mmu/mmu.c | 7 ++
>> arch/x86/kvm/svm/svm.c | 157 +++++++++++++++++++-------------
>> arch/x86/kvm/svm/svm.h | 8 ++
>> arch/x86/kvm/vmx/vmx.c | 2 +-
>> arch/x86/kvm/x86.c | 37 +++++++-
>> 7 files changed, 146 insertions(+), 74 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 3d6616f6f6ef..0ddc309f5a14 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
>> * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
>> * Used to test the full emulator from userspace.
>> *
>> - * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
>> + * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
>> * backdoor emulation, which is opt in via module param.
>> * VMware backdoor emulation handles select instructions
>> - * and reinjects the #GP for all other cases.
>> + * and reinjects #GP for all other cases. This also
>> + * handles other cases where #GP condition needs to be
>> + * handled and emulated appropriately
>> *
>> * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
>> * case the CR2/GPA value pass on the stack is valid.
>> @@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
>> #define EMULTYPE_SKIP (1 << 2)
>> #define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
>> #define EMULTYPE_TRAP_UD_FORCED (1 << 4)
>> -#define EMULTYPE_VMWARE_GP (1 << 5)
>> +#define EMULTYPE_PARAVIRT_GP (1 << 5)
>> #define EMULTYPE_PF (1 << 6)
>>
>> int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 581925e476d6..1a2fff4e7140 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>
>> int kvm_mmu_post_init_vm(struct kvm *kvm);
>> void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
>> +bool kvm_is_host_reserved_region(u64 gpa);
>
> Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?
>
>>
>> #endif
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 6d16481aa29d..c5c4aaf01a1a 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -50,6 +50,7 @@
>> #include <asm/io.h>
>> #include <asm/vmx.h>
>> #include <asm/kvm_page_track.h>
>> +#include <asm/e820/api.h>
>> #include "trace.h"
>>
>> extern bool itlb_multihit_kvm_mitigation;
>> @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
>> }
>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>
>> +bool kvm_is_host_reserved_region(u64 gpa)
>> +{
>> + return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
>> +}
>
> While _e820__mapped_any()'s doc says '.. checks if any part of the
> range <start,end> is mapped ..' it seems to me that the real check is
> [start, end) so we should use 'gpa' instead of 'gpa-1', no?

Why do you need to check GPA at all?


2021-01-12 15:22:05

by Maxim Levitsky

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

On Tue, 2021-01-12 at 07:11 -0800, Andy Lutomirski wrote:
> > On Jan 12, 2021, at 4:15 AM, Vitaly Kuznetsov <[email protected]> wrote:
> >
> > Wei Huang <[email protected]> writes:
> >
> > > From: Bandan Das <[email protected]>
> > >
> > > While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
> > > CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
> > > before checking VMCB's instruction intercept. If EAX falls into such
> > > memory areas, #GP is triggered before VMEXIT. This causes problem under
> > > nested virtualization. To solve this problem, KVM needs to trap #GP and
> > > check the instructions triggering #GP. For VM execution instructions,
> > > KVM emulates these instructions; otherwise it re-injects #GP back to
> > > guest VMs.
> > >
> > > Signed-off-by: Bandan Das <[email protected]>
> > > Co-developed-by: Wei Huang <[email protected]>
> > > Signed-off-by: Wei Huang <[email protected]>
> > > ---
> > > arch/x86/include/asm/kvm_host.h | 8 +-
> > > arch/x86/kvm/mmu.h | 1 +
> > > arch/x86/kvm/mmu/mmu.c | 7 ++
> > > arch/x86/kvm/svm/svm.c | 157 +++++++++++++++++++-------------
> > > arch/x86/kvm/svm/svm.h | 8 ++
> > > arch/x86/kvm/vmx/vmx.c | 2 +-
> > > arch/x86/kvm/x86.c | 37 +++++++-
> > > 7 files changed, 146 insertions(+), 74 deletions(-)
> > >
> > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > > index 3d6616f6f6ef..0ddc309f5a14 100644
> > > --- a/arch/x86/include/asm/kvm_host.h
> > > +++ b/arch/x86/include/asm/kvm_host.h
> > > @@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
> > > * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
> > > * Used to test the full emulator from userspace.
> > > *
> > > - * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
> > > + * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
> > > * backdoor emulation, which is opt in via module param.
> > > * VMware backdoor emulation handles select instructions
> > > - * and reinjects the #GP for all other cases.
> > > + * and reinjects #GP for all other cases. This also
> > > + * handles other cases where #GP condition needs to be
> > > + * handled and emulated appropriately
> > > *
> > > * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
> > > * case the CR2/GPA value pass on the stack is valid.
> > > @@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
> > > #define EMULTYPE_SKIP (1 << 2)
> > > #define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
> > > #define EMULTYPE_TRAP_UD_FORCED (1 << 4)
> > > -#define EMULTYPE_VMWARE_GP (1 << 5)
> > > +#define EMULTYPE_PARAVIRT_GP (1 << 5)
> > > #define EMULTYPE_PF (1 << 6)
> > >
> > > int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
> > > diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> > > index 581925e476d6..1a2fff4e7140 100644
> > > --- a/arch/x86/kvm/mmu.h
> > > +++ b/arch/x86/kvm/mmu.h
> > > @@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
> > >
> > > int kvm_mmu_post_init_vm(struct kvm *kvm);
> > > void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
> > > +bool kvm_is_host_reserved_region(u64 gpa);
> >
> > Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?
> >
> > > #endif
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index 6d16481aa29d..c5c4aaf01a1a 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -50,6 +50,7 @@
> > > #include <asm/io.h>
> > > #include <asm/vmx.h>
> > > #include <asm/kvm_page_track.h>
> > > +#include <asm/e820/api.h>
> > > #include "trace.h"
> > >
> > > extern bool itlb_multihit_kvm_mitigation;
> > > @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
> > > }
> > > EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
> > >
> > > +bool kvm_is_host_reserved_region(u64 gpa)
> > > +{
> > > + return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
> > > +}
> >
> > While _e820__mapped_any()'s doc says '.. checks if any part of the
> > range <start,end> is mapped ..' it seems to me that the real check is
> > [start, end) so we should use 'gpa' instead of 'gpa-1', no?
>
> Why do you need to check GPA at all?
>
To reduce the scope of the workaround.

The errata only happens when you use one of SVM instructions
in the guest with EAX that happens to be inside one
of the host reserved memory regions (for example SMM).

So it is not expected for an SVM instruction with EAX that is a valid host
physical address to get a #GP due to this errata.

Best regards,
Maxim Levitsky

>


2021-01-12 15:25:31

by Andy Lutomirski

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions



> On Jan 12, 2021, at 7:17 AM, Maxim Levitsky <[email protected]> wrote:
>
> On Tue, 2021-01-12 at 07:11 -0800, Andy Lutomirski wrote:
>>>> On Jan 12, 2021, at 4:15 AM, Vitaly Kuznetsov <[email protected]> wrote:
>>>
>>> Wei Huang <[email protected]> writes:
>>>
>>>> From: Bandan Das <[email protected]>
>>>>
>>>> While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
>>>> CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
>>>> before checking VMCB's instruction intercept. If EAX falls into such
>>>> memory areas, #GP is triggered before VMEXIT. This causes problem under
>>>> nested virtualization. To solve this problem, KVM needs to trap #GP and
>>>> check the instructions triggering #GP. For VM execution instructions,
>>>> KVM emulates these instructions; otherwise it re-injects #GP back to
>>>> guest VMs.
>>>>
>>>> Signed-off-by: Bandan Das <[email protected]>
>>>> Co-developed-by: Wei Huang <[email protected]>
>>>> Signed-off-by: Wei Huang <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/kvm_host.h | 8 +-
>>>> arch/x86/kvm/mmu.h | 1 +
>>>> arch/x86/kvm/mmu/mmu.c | 7 ++
>>>> arch/x86/kvm/svm/svm.c | 157 +++++++++++++++++++-------------
>>>> arch/x86/kvm/svm/svm.h | 8 ++
>>>> arch/x86/kvm/vmx/vmx.c | 2 +-
>>>> arch/x86/kvm/x86.c | 37 +++++++-
>>>> 7 files changed, 146 insertions(+), 74 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index 3d6616f6f6ef..0ddc309f5a14 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
>>>> * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
>>>> * Used to test the full emulator from userspace.
>>>> *
>>>> - * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
>>>> + * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
>>>> * backdoor emulation, which is opt in via module param.
>>>> * VMware backdoor emulation handles select instructions
>>>> - * and reinjects the #GP for all other cases.
>>>> + * and reinjects #GP for all other cases. This also
>>>> + * handles other cases where #GP condition needs to be
>>>> + * handled and emulated appropriately
>>>> *
>>>> * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
>>>> * case the CR2/GPA value pass on the stack is valid.
>>>> @@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
>>>> #define EMULTYPE_SKIP (1 << 2)
>>>> #define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
>>>> #define EMULTYPE_TRAP_UD_FORCED (1 << 4)
>>>> -#define EMULTYPE_VMWARE_GP (1 << 5)
>>>> +#define EMULTYPE_PARAVIRT_GP (1 << 5)
>>>> #define EMULTYPE_PF (1 << 6)
>>>>
>>>> int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
>>>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>>>> index 581925e476d6..1a2fff4e7140 100644
>>>> --- a/arch/x86/kvm/mmu.h
>>>> +++ b/arch/x86/kvm/mmu.h
>>>> @@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>>>
>>>> int kvm_mmu_post_init_vm(struct kvm *kvm);
>>>> void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
>>>> +bool kvm_is_host_reserved_region(u64 gpa);
>>>
>>> Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?
>>>
>>>> #endif
>>>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>>>> index 6d16481aa29d..c5c4aaf01a1a 100644
>>>> --- a/arch/x86/kvm/mmu/mmu.c
>>>> +++ b/arch/x86/kvm/mmu/mmu.c
>>>> @@ -50,6 +50,7 @@
>>>> #include <asm/io.h>
>>>> #include <asm/vmx.h>
>>>> #include <asm/kvm_page_track.h>
>>>> +#include <asm/e820/api.h>
>>>> #include "trace.h"
>>>>
>>>> extern bool itlb_multihit_kvm_mitigation;
>>>> @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
>>>> }
>>>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>>>
>>>> +bool kvm_is_host_reserved_region(u64 gpa)
>>>> +{
>>>> + return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
>>>> +}
>>>
>>> While _e820__mapped_any()'s doc says '.. checks if any part of the
>>> range <start,end> is mapped ..' it seems to me that the real check is
>>> [start, end) so we should use 'gpa' instead of 'gpa-1', no?
>>
>> Why do you need to check GPA at all?
>>
> To reduce the scope of the workaround.
>
> The errata only happens when you use one of SVM instructions
> in the guest with EAX that happens to be inside one
> of the host reserved memory regions (for example SMM).

This code reduces the scope of the workaround at the cost of increasing the complexity of the workaround and adding a nonsensical coupling between KVM and host details and adding an export that really doesn’t deserve to be exported.

Is there an actual concrete benefit to this check?

2021-01-12 15:52:19

by Bandan Das

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

Andy Lutomirski <[email protected]> writes:
...
>>>>> #endif diff --git a/arch/x86/kvm/mmu/mmu.c
>>>>> b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..c5c4aaf01a1a 100644
>>>>> --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@
>>>>> -50,6 +50,7 @@ #include <asm/io.h> #include <asm/vmx.h> #include
>>>>> <asm/kvm_page_track.h> +#include <asm/e820/api.h> #include
>>>>> "trace.h"
>>>>>
>>>>> extern bool itlb_multihit_kvm_mitigation; @@ -5675,6 +5676,12 @@
>>>>> void kvm_mmu_slot_set_dirty(struct kvm *kvm, }
>>>>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>>>>
>>>>> +bool kvm_is_host_reserved_region(u64 gpa) +{ + return
>>>>> e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED); +}
>>>> While _e820__mapped_any()'s doc says '.. checks if any part of
>>>> the range <start,end> is mapped ..' it seems to me that the real
>>>> check is [start, end) so we should use 'gpa' instead of 'gpa-1',
>>>> no?
>>> Why do you need to check GPA at all?
>>>
>> To reduce the scope of the workaround.
>>
>> The errata only happens when you use one of SVM instructions in the
>> guest with EAX that happens to be inside one of the host reserved
>> memory regions (for example SMM).
>
> This code reduces the scope of the workaround at the cost of
> increasing the complexity of the workaround and adding a nonsensical
> coupling between KVM and host details and adding an export that really
> doesn’t deserve to be exported.
>
> Is there an actual concrete benefit to this check?

Besides reducing the scope, my intention for the check was that we should
know if such exceptions occur for any other undiscovered reasons with other
memory types rather than hiding them under this workaround.

Bandan



2021-01-12 15:54:33

by Andy Lutomirski

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions


> On Jan 12, 2021, at 7:46 AM, Bandan Das <[email protected]> wrote:
>
> Andy Lutomirski <[email protected]> writes:
> ...
>>>>>> #endif diff --git a/arch/x86/kvm/mmu/mmu.c
>>>>>> b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..c5c4aaf01a1a 100644
>>>>>> --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@
>>>>>> -50,6 +50,7 @@ #include <asm/io.h> #include <asm/vmx.h> #include
>>>>>> <asm/kvm_page_track.h> +#include <asm/e820/api.h> #include
>>>>>> "trace.h"
>>>>>>
>>>>>> extern bool itlb_multihit_kvm_mitigation; @@ -5675,6 +5676,12 @@
>>>>>> void kvm_mmu_slot_set_dirty(struct kvm *kvm, }
>>>>>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>>>>>
>>>>>> +bool kvm_is_host_reserved_region(u64 gpa) +{ + return
>>>>>> e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED); +}
>>>>> While _e820__mapped_any()'s doc says '.. checks if any part of
>>>>> the range <start,end> is mapped ..' it seems to me that the real
>>>>> check is [start, end) so we should use 'gpa' instead of 'gpa-1',
>>>>> no?
>>>> Why do you need to check GPA at all?
>>>>
>>> To reduce the scope of the workaround.
>>>
>>> The errata only happens when you use one of SVM instructions in the
>>> guest with EAX that happens to be inside one of the host reserved
>>> memory regions (for example SMM).
>>
>> This code reduces the scope of the workaround at the cost of
>> increasing the complexity of the workaround and adding a nonsensical
>> coupling between KVM and host details and adding an export that really
>> doesn’t deserve to be exported.
>>
>> Is there an actual concrete benefit to this check?
>
> Besides reducing the scope, my intention for the check was that we should
> know if such exceptions occur for any other undiscovered reasons with other
> memory types rather than hiding them under this workaround.

Ask AMD?

I would also believe that someone somewhere has a firmware that simply omits the problematic region instead of listing it as reserved.

>
> Bandan
>
>
>

2021-01-12 17:59:41

by Sean Christopherson

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions

On Tue, Jan 12, 2021, Andy Lutomirski wrote:
>
> > On Jan 12, 2021, at 7:46 AM, Bandan Das <[email protected]> wrote:
> >
> > Andy Lutomirski <[email protected]> writes:
> > ...
> >>>>>> #endif diff --git a/arch/x86/kvm/mmu/mmu.c
> >>>>>> b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..c5c4aaf01a1a 100644
> >>>>>> --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@
> >>>>>> -50,6 +50,7 @@ #include <asm/io.h> #include <asm/vmx.h> #include
> >>>>>> <asm/kvm_page_track.h> +#include <asm/e820/api.h> #include
> >>>>>> "trace.h"
> >>>>>>
> >>>>>> extern bool itlb_multihit_kvm_mitigation; @@ -5675,6 +5676,12 @@
> >>>>>> void kvm_mmu_slot_set_dirty(struct kvm *kvm, }
> >>>>>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
> >>>>>>
> >>>>>> +bool kvm_is_host_reserved_region(u64 gpa) +{ + return
> >>>>>> e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED); +}
> >>>>> While _e820__mapped_any()'s doc says '.. checks if any part of
> >>>>> the range <start,end> is mapped ..' it seems to me that the real
> >>>>> check is [start, end) so we should use 'gpa' instead of 'gpa-1',
> >>>>> no?
> >>>> Why do you need to check GPA at all?
> >>>>
> >>> To reduce the scope of the workaround.
> >>>
> >>> The errata only happens when you use one of SVM instructions in the
> >>> guest with EAX that happens to be inside one of the host reserved
> >>> memory regions (for example SMM).
> >>
> >> This code reduces the scope of the workaround at the cost of
> >> increasing the complexity of the workaround and adding a nonsensical
> >> coupling between KVM and host details and adding an export that really
> >> doesn’t deserve to be exported.
> >>
> >> Is there an actual concrete benefit to this check?
> >
> > Besides reducing the scope, my intention for the check was that we should
> > know if such exceptions occur for any other undiscovered reasons with other
> > memory types rather than hiding them under this workaround.
>
> Ask AMD?
>
> I would also believe that someone somewhere has a firmware that simply omits
> the problematic region instead of listing it as reserved.

I agree with Andy, odds are very good that attempting to be precise will lead to
pain due to false negatives.

And, KVM's SVM instruction emulation needs to be rock solid regardless of
this behavior since KVM unconditionally intercepts the instruction, i.e. there's
basically zero risk to KVM.

2021-01-13 02:51:47

by Wei Huang

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions



On 1/12/21 6:15 AM, Vitaly Kuznetsov wrote:
> Wei Huang <[email protected]> writes:
>
>> From: Bandan Das <[email protected]>
>>
>> While running VM related instructions (VMRUN/VMSAVE/VMLOAD), some AMD
>> CPUs check EAX against reserved memory regions (e.g. SMM memory on host)
>> before checking VMCB's instruction intercept. If EAX falls into such
>> memory areas, #GP is triggered before VMEXIT. This causes problem under
>> nested virtualization. To solve this problem, KVM needs to trap #GP and
>> check the instructions triggering #GP. For VM execution instructions,
>> KVM emulates these instructions; otherwise it re-injects #GP back to
>> guest VMs.
>>
>> Signed-off-by: Bandan Das <[email protected]>
>> Co-developed-by: Wei Huang <[email protected]>
>> Signed-off-by: Wei Huang <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 8 +-
>> arch/x86/kvm/mmu.h | 1 +
>> arch/x86/kvm/mmu/mmu.c | 7 ++
>> arch/x86/kvm/svm/svm.c | 157 +++++++++++++++++++-------------
>> arch/x86/kvm/svm/svm.h | 8 ++
>> arch/x86/kvm/vmx/vmx.c | 2 +-
>> arch/x86/kvm/x86.c | 37 +++++++-
>> 7 files changed, 146 insertions(+), 74 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 3d6616f6f6ef..0ddc309f5a14 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1450,10 +1450,12 @@ extern u64 kvm_mce_cap_supported;
>> * due to an intercepted #UD (see EMULTYPE_TRAP_UD).
>> * Used to test the full emulator from userspace.
>> *
>> - * EMULTYPE_VMWARE_GP - Set when emulating an intercepted #GP for VMware
>> + * EMULTYPE_PARAVIRT_GP - Set when emulating an intercepted #GP for VMware
>> * backdoor emulation, which is opt in via module param.
>> * VMware backdoor emulation handles select instructions
>> - * and reinjects the #GP for all other cases.
>> + * and reinjects #GP for all other cases. This also
>> + * handles other cases where #GP condition needs to be
>> + * handled and emulated appropriately
>> *
>> * EMULTYPE_PF - Set when emulating MMIO by way of an intercepted #PF, in which
>> * case the CR2/GPA value pass on the stack is valid.
>> @@ -1463,7 +1465,7 @@ extern u64 kvm_mce_cap_supported;
>> #define EMULTYPE_SKIP (1 << 2)
>> #define EMULTYPE_ALLOW_RETRY_PF (1 << 3)
>> #define EMULTYPE_TRAP_UD_FORCED (1 << 4)
>> -#define EMULTYPE_VMWARE_GP (1 << 5)
>> +#define EMULTYPE_PARAVIRT_GP (1 << 5)
>> #define EMULTYPE_PF (1 << 6)
>>
>> int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 581925e476d6..1a2fff4e7140 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -219,5 +219,6 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
>>
>> int kvm_mmu_post_init_vm(struct kvm *kvm);
>> void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
>> +bool kvm_is_host_reserved_region(u64 gpa);
>
> Just a suggestion: "kvm_gpa_in_host_reserved()" maybe?

Will do in v2.

>
>>
>> #endif
>> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
>> index 6d16481aa29d..c5c4aaf01a1a 100644
>> --- a/arch/x86/kvm/mmu/mmu.c
>> +++ b/arch/x86/kvm/mmu/mmu.c
>> @@ -50,6 +50,7 @@
>> #include <asm/io.h>
>> #include <asm/vmx.h>
>> #include <asm/kvm_page_track.h>
>> +#include <asm/e820/api.h>
>> #include "trace.h"
>>
>> extern bool itlb_multihit_kvm_mitigation;
>> @@ -5675,6 +5676,12 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,
>> }
>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>
>> +bool kvm_is_host_reserved_region(u64 gpa)
>> +{
>> + return e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED);
>> +}
>
> While _e820__mapped_any()'s doc says '.. checks if any part of the
> range <start,end> is mapped ..' it seems to me that the real check is
> [start, end) so we should use 'gpa' instead of 'gpa-1', no?

I think you are right. The check "entry->addr >= end ||
entry->addr + entry->size <= start" shows that the comparison is
against the range [start, end).
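
For reference, a rough paraphrase of that overlap test as a standalone helper (my restatement for clarity, not the exact e820 code):

	/* An e820 entry [addr, addr + size) overlaps the queried range
	 * [start, end) unless it ends at or before 'start' or begins at
	 * or after 'end' -- both intervals are treated as half-open. */
	static bool range_overlaps(u64 addr, u64 size, u64 start, u64 end)
	{
		return !(addr >= end || addr + size <= start);
	}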

>
>> +EXPORT_SYMBOL_GPL(kvm_is_host_reserved_region);
>> +
>> void kvm_mmu_zap_all(struct kvm *kvm)
>> {
>> struct kvm_mmu_page *sp, *node;
>> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
>> index 7ef171790d02..74620d32aa82 100644
>> --- a/arch/x86/kvm/svm/svm.c
>> +++ b/arch/x86/kvm/svm/svm.c
>> @@ -288,6 +288,7 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>> if (!(efer & EFER_SVME)) {
>> svm_leave_nested(svm);
>> svm_set_gif(svm, true);
>> + clr_exception_intercept(svm, GP_VECTOR);
>>
>> /*
>> * Free the nested guest state, unless we are in SMM.
>> @@ -309,6 +310,10 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer)
>>
>> svm->vmcb->save.efer = efer | EFER_SVME;
>> vmcb_mark_dirty(svm->vmcb, VMCB_CR);
>> + /* Enable GP interception for SVM instructions if needed */
>> + if (efer & EFER_SVME)
>> + set_exception_intercept(svm, GP_VECTOR);
>> +
>> return 0;
>> }
>>
>> @@ -1957,22 +1962,104 @@ static int ac_interception(struct vcpu_svm *svm)
>> return 1;
>> }
>>
>> +static int vmload_interception(struct vcpu_svm *svm)
>> +{
>> + struct vmcb *nested_vmcb;
>> + struct kvm_host_map map;
>> + int ret;
>> +
>> + if (nested_svm_check_permissions(svm))
>> + return 1;
>> +
>> + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
>> + if (ret) {
>> + if (ret == -EINVAL)
>> + kvm_inject_gp(&svm->vcpu, 0);
>> + return 1;
>> + }
>> +
>> + nested_vmcb = map.hva;
>> +
>> + ret = kvm_skip_emulated_instruction(&svm->vcpu);
>> +
>> + nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
>> + kvm_vcpu_unmap(&svm->vcpu, &map, true);
>> +
>> + return ret;
>> +}
>> +
>> +static int vmsave_interception(struct vcpu_svm *svm)
>> +{
>> + struct vmcb *nested_vmcb;
>> + struct kvm_host_map map;
>> + int ret;
>> +
>> + if (nested_svm_check_permissions(svm))
>> + return 1;
>> +
>> + ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
>> + if (ret) {
>> + if (ret == -EINVAL)
>> + kvm_inject_gp(&svm->vcpu, 0);
>> + return 1;
>> + }
>> +
>> + nested_vmcb = map.hva;
>> +
>> + ret = kvm_skip_emulated_instruction(&svm->vcpu);
>> +
>> + nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
>> + kvm_vcpu_unmap(&svm->vcpu, &map, true);
>> +
>> + return ret;
>> +}
>> +
>> +static int vmrun_interception(struct vcpu_svm *svm)
>> +{
>> + if (nested_svm_check_permissions(svm))
>> + return 1;
>> +
>> + return nested_svm_vmrun(svm);
>> +}
>> +
>> +/* Emulate SVM VM execution instructions */
>> +static int svm_emulate_vm_instr(struct kvm_vcpu *vcpu, u8 modrm)
>> +{
>> + struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> + switch (modrm) {
>> + case 0xd8: /* VMRUN */
>> + return vmrun_interception(svm);
>> + case 0xda: /* VMLOAD */
>> + return vmload_interception(svm);
>> + case 0xdb: /* VMSAVE */
>> + return vmsave_interception(svm);
>> + default:
>> + /* inject a #GP for all other cases */
>> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> + return 1;
>> + }
>> +}
>> +
>> static int gp_interception(struct vcpu_svm *svm)
>> {
>> struct kvm_vcpu *vcpu = &svm->vcpu;
>> u32 error_code = svm->vmcb->control.exit_info_1;
>> -
>> - WARN_ON_ONCE(!enable_vmware_backdoor);
>> + int rc;
>>
>> /*
>> - * VMware backdoor emulation on #GP interception only handles IN{S},
>> - * OUT{S}, and RDPMC, none of which generate a non-zero error code.
>> + * Only VMware backdoor and SVM VME errata are handled. Neither of
>> + * them has non-zero error codes.
>> */
>> if (error_code) {
>> kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
>> return 1;
>> }
>> - return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
>> +
>> + rc = kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
>> + if (rc > 1)
>> + rc = svm_emulate_vm_instr(vcpu, rc);
>> + return rc;
>> }
>>
>> static bool is_erratum_383(void)
>> @@ -2113,66 +2200,6 @@ static int vmmcall_interception(struct vcpu_svm *svm)
>> return kvm_emulate_hypercall(&svm->vcpu);
>> }
>>
>> -static int vmload_interception(struct vcpu_svm *svm)
>> -{
>> - struct vmcb *nested_vmcb;
>> - struct kvm_host_map map;
>> - int ret;
>> -
>> - if (nested_svm_check_permissions(svm))
>> - return 1;
>> -
>> - ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
>> - if (ret) {
>> - if (ret == -EINVAL)
>> - kvm_inject_gp(&svm->vcpu, 0);
>> - return 1;
>> - }
>> -
>> - nested_vmcb = map.hva;
>> -
>> - ret = kvm_skip_emulated_instruction(&svm->vcpu);
>> -
>> - nested_svm_vmloadsave(nested_vmcb, svm->vmcb);
>> - kvm_vcpu_unmap(&svm->vcpu, &map, true);
>> -
>> - return ret;
>> -}
>> -
>> -static int vmsave_interception(struct vcpu_svm *svm)
>> -{
>> - struct vmcb *nested_vmcb;
>> - struct kvm_host_map map;
>> - int ret;
>> -
>> - if (nested_svm_check_permissions(svm))
>> - return 1;
>> -
>> - ret = kvm_vcpu_map(&svm->vcpu, gpa_to_gfn(svm->vmcb->save.rax), &map);
>> - if (ret) {
>> - if (ret == -EINVAL)
>> - kvm_inject_gp(&svm->vcpu, 0);
>> - return 1;
>> - }
>> -
>> - nested_vmcb = map.hva;
>> -
>> - ret = kvm_skip_emulated_instruction(&svm->vcpu);
>> -
>> - nested_svm_vmloadsave(svm->vmcb, nested_vmcb);
>> - kvm_vcpu_unmap(&svm->vcpu, &map, true);
>> -
>> - return ret;
>> -}
>> -
>> -static int vmrun_interception(struct vcpu_svm *svm)
>> -{
>> - if (nested_svm_check_permissions(svm))
>> - return 1;
>> -
>> - return nested_svm_vmrun(svm);
>> -}
>> -
>
> Maybe if you'd do it the other way around and put gp_interception()
> after vm{load,save,run}_interception(), the diff (and code churn)
> would've been smaller?

Agreed.

>
>> void svm_set_gif(struct vcpu_svm *svm, bool value)
>> {
>> if (value) {
>> diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
>> index 0fe874ae5498..d5dffcf59afa 100644
>> --- a/arch/x86/kvm/svm/svm.h
>> +++ b/arch/x86/kvm/svm/svm.h
>> @@ -350,6 +350,14 @@ static inline void clr_exception_intercept(struct vcpu_svm *svm, u32 bit)
>> recalc_intercepts(svm);
>> }
>>
>> +static inline bool is_exception_intercept(struct vcpu_svm *svm, u32 bit)
>> +{
>> + struct vmcb *vmcb = get_host_vmcb(svm);
>> +
>> + WARN_ON_ONCE(bit >= 32);
>> + return vmcb_is_intercept(&vmcb->control, INTERCEPT_EXCEPTION_OFFSET + bit);
>> +}
>> +
>> static inline void svm_set_intercept(struct vcpu_svm *svm, int bit)
>> {
>> struct vmcb *vmcb = get_host_vmcb(svm);
>> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
>> index 2af05d3b0590..5fac2f7cba24 100644
>> --- a/arch/x86/kvm/vmx/vmx.c
>> +++ b/arch/x86/kvm/vmx/vmx.c
>> @@ -4774,7 +4774,7 @@ static int handle_exception_nmi(struct kvm_vcpu *vcpu)
>> kvm_queue_exception_e(vcpu, GP_VECTOR, error_code);
>> return 1;
>> }
>> - return kvm_emulate_instruction(vcpu, EMULTYPE_VMWARE_GP);
>> + return kvm_emulate_instruction(vcpu, EMULTYPE_PARAVIRT_GP);
>> }
>>
>> /*
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 9a8969a6dd06..c3662fc3b1bc 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -7014,7 +7014,7 @@ static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
>> ++vcpu->stat.insn_emulation_fail;
>> trace_kvm_emulate_insn_failed(vcpu);
>>
>> - if (emulation_type & EMULTYPE_VMWARE_GP) {
>> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
>> kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> return 1;
>> }
>> @@ -7267,6 +7267,28 @@ static bool kvm_vcpu_check_breakpoint(struct kvm_vcpu *vcpu, int *r)
>> return false;
>> }
>>
>> +static int is_vm_instr_opcode(struct x86_emulate_ctxt *ctxt)
>
> Nit: it seems we either return '0' or 'ctxt->modrm' which is 'u8', so
> 'u8' instead of 'int' maybe?

Agreed. Paolo also has some comments around this area; we will take
them into consideration in v2.

>
>> +{
>> + unsigned long rax;
>> +
>> + if (ctxt->b != 0x1)
>> + return 0;
>> +
>> + switch (ctxt->modrm) {
>> + case 0xd8: /* VMRUN */
>> + case 0xda: /* VMLOAD */
>> + case 0xdb: /* VMSAVE */
>> + rax = kvm_register_read(emul_to_vcpu(ctxt), VCPU_REGS_RAX);
>> + if (!kvm_is_host_reserved_region(rax))
>> + return 0;
>> + break;
>> + default:
>> + return 0;
>> + }
>> +
>> + return ctxt->modrm;
>> +}
>> +
>> static bool is_vmware_backdoor_opcode(struct x86_emulate_ctxt *ctxt)
>> {
>> switch (ctxt->opcode_len) {
>> @@ -7305,6 +7327,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>> struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt;
>> bool writeback = true;
>> bool write_fault_to_spt;
>> + int vminstr;
>>
>> if (unlikely(!kvm_x86_ops.can_emulate_instruction(vcpu, insn, insn_len)))
>> return 1;
>> @@ -7367,10 +7390,14 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
>> }
>> }
>>
>> - if ((emulation_type & EMULTYPE_VMWARE_GP) &&
>> - !is_vmware_backdoor_opcode(ctxt)) {
>> - kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> - return 1;
>> + if (emulation_type & EMULTYPE_PARAVIRT_GP) {
>> + vminstr = is_vm_instr_opcode(ctxt);
>> + if (!vminstr && !is_vmware_backdoor_opcode(ctxt)) {
>> + kvm_queue_exception_e(vcpu, GP_VECTOR, 0);
>> + return 1;
>> + }
>> + if (vminstr)
>> + return vminstr;
>> }
>>
>> /*
>

2021-01-13 04:58:26

by Wei Huang

Subject: Re: [PATCH 1/2] KVM: x86: Add emulation support for #GP triggered by VM instructions



On 1/12/21 11:56 AM, Sean Christopherson wrote:
> On Tue, Jan 12, 2021, Andy Lutomirski wrote:
>>
>>> On Jan 12, 2021, at 7:46 AM, Bandan Das <[email protected]> wrote:
>>>
>>> Andy Lutomirski <[email protected]> writes:
>>> ...
>>>>>>>> #endif diff --git a/arch/x86/kvm/mmu/mmu.c
>>>>>>>> b/arch/x86/kvm/mmu/mmu.c index 6d16481aa29d..c5c4aaf01a1a 100644
>>>>>>>> --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@
>>>>>>>> -50,6 +50,7 @@ #include <asm/io.h> #include <asm/vmx.h> #include
>>>>>>>> <asm/kvm_page_track.h> +#include <asm/e820/api.h> #include
>>>>>>>> "trace.h"
>>>>>>>>
>>>>>>>> extern bool itlb_multihit_kvm_mitigation; @@ -5675,6 +5676,12 @@
>>>>>>>> void kvm_mmu_slot_set_dirty(struct kvm *kvm, }
>>>>>>>> EXPORT_SYMBOL_GPL(kvm_mmu_slot_set_dirty);
>>>>>>>>
>>>>>>>> +bool kvm_is_host_reserved_region(u64 gpa) +{ + return
>>>>>>>> e820__mapped_raw_any(gpa-1, gpa+1, E820_TYPE_RESERVED); +}
>>>>>>> While _e820__mapped_any()'s doc says '.. checks if any part of
>>>>>>> the range <start,end> is mapped ..' it seems to me that the real
>>>>>>> check is [start, end) so we should use 'gpa' instead of 'gpa-1',
>>>>>>> no?
>>>>>> Why do you need to check GPA at all?
>>>>>>
>>>>> To reduce the scope of the workaround.
>>>>>
>>>>> The errata only happens when you use one of SVM instructions in the
>>>>> guest with EAX that happens to be inside one of the host reserved
>>>>> memory regions (for example SMM).
>>>>
>>>> This code reduces the scope of the workaround at the cost of
>>>> increasing the complexity of the workaround and adding a nonsensical
>>>> coupling between KVM and host details and adding an export that really
>>>> doesn’t deserve to be exported.
>>>>
>>>> Is there an actual concrete benefit to this check?
>>>
>>> Besides reducing the scope, my intention for the check was that we should
>>> know if such exceptions occur for any other undiscovered reasons with other
>>> memory types rather than hiding them under this workaround.
>>
>> Ask AMD?

There are several checks before VMRUN launch. The function
e820__mapped_raw_any() was definitely one of the easiest ways to figure
out the problematic regions we had.

>>
>> I would also believe that someone somewhere has a firmware that simply omits
>> the problematic region instead of listing it as reserved.
>
> I agree with Andy, odds are very good that attempting to be precise will lead to
> pain due to false negatives.
>
> And, KVM's SVM instruction emulation needs to be rock solid regardless of
> this behavior since KVM unconditionally intercepts the instruction, i.e. there's
> basically zero risk to KVM.
>

Are you saying that the instruction decode before
kvm_is_host_reserved_region() already guarantees that the instructions
hitting #GP are SVM execution instructions (see below)? If so, I think
this argument is fair.

+ switch (ctxt->modrm) {
+ case 0xd8: /* VMRUN */
+ case 0xda: /* VMLOAD */
+ case 0xdb: /* VMSAVE */

Bandan: What are your thoughts about removing kvm_is_host_reserved_region()?