2020-04-28 17:36:09

by Sean Christopherson

Subject: [PATCH v2] KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure

Use an enum for passing around the failure code for a failed VM-Enter
that results in VM-Exit to provide a level of indirection from the final
resting place of the failure code, vmcs.EXIT_QUALIFICATION. The exit
qualification field is an unsigned long, e.g. passing around
'u32 exit_qual' throws up red flags as it suggests KVM may be dropping
bits when reporting errors to L1. This is a red herring because the
only defined failure codes are 0, 2, 3, and 4, i.e. they don't come remotely
close to overflowing a u32.

Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated
by the MSR load list, which reports the (1-based) index of the entry that
failed.  Because the number of MSRs to load is a 32-bit VMCS field, at
first blush it would appear that overflowing a u32 is possible, but the
number of MSRs that can be loaded is hard-capped at 4096 (limited by
MSR_IA32_VMX_MISC).
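
For reference, the 4096 cap follows from the architectural definition of
MSR_IA32_VMX_MISC: bits 27:25 encode the recommended maximum number of
MSRs in a load/store list as 512 * (N + 1), with N at most 7.  A
standalone sketch of that derivation (the helper name is illustrative,
not KVM's actual code):

  /* IA32_VMX_MISC bits 27:25: max MSR list size is 512 * (N + 1). */
  static u32 max_msr_list_size(u64 vmx_misc)
  {
          u32 n = (vmx_misc >> 25) & 0x7;

          return 512 * (n + 1);   /* at most 512 * 8 = 4096 */
  }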

In other words, there are two completely disparate types of data that
eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is
an 'unsigned long' in nature. This was presumably the reasoning for
switching to 'u32' when the related code was refactored in commit
ca0bde28f2ed6 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").

Using an enum for the failure code addresses the technically-possible-
but-will-never-happen scenario where Intel defines a failure code that
doesn't fit in a 32-bit integer. The enum variables and values will
either be automatically sized (gcc 5.4 behavior) or be subject to
truncation. The former case will simply work, while the
latter will trigger a compile-time warning unless the compiler is being
particularly unhelpful.
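
To illustrate the hypothetical (the wide value below is made up, Intel
does not define it):

  enum vm_entry_failure_code {
          ENTRY_FAIL_DEFAULT              = 0,
          /* Hypothetical failure code that doesn't fit in 32 bits. */
          ENTRY_FAIL_HYPOTHETICAL_WIDE    = 0x100000002ULL,
  };

  /*
   * gcc widens the enum's underlying type to 64 bits and the value
   * survives intact; a compiler that insists on a 32-bit enum has to
   * truncate the constant and should warn when doing so.
   */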

Separating the failure code from the failed MSR entry allows for
disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the
conundrum where KVM has to choose between 'u32 exit_qual' and tracking
values as 'unsigned long' that have no business being tracked as such.
To cement the split, set vmcs12->exit_qualification directly from the
entry error code or failed MSR index instead of bouncing through a local
variable.

Opportunistically rename the variables in load_vmcs12_host_state() and
vmx_set_nested_state() to call out that they're ignored, set exit_reason
on demand on nested VM-Enter failure, and add a comment in
nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.

No functional change intended.

Reported-by: Vitaly Kuznetsov <[email protected]>
Cc: Jim Mattson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---

v2:
- Set vmcs12->exit_qualification directly to avoid writing the failed
MSR index (a u32) to the entry_failure_code enum. [Jim]
- Set exit_reason on demand since the "goto vm_exit" paths need to set
vmcs12->exit_qualification anyways, i.e. already have curly braces.

arch/x86/include/asm/vmx.h | 10 +++++----
arch/x86/kvm/vmx/nested.c | 44 ++++++++++++++++++++++----------------
2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 5e090d1f03f8..cd7de4b401fe 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -527,10 +527,12 @@ struct vmx_msr_entry {
/*
* Exit Qualifications for entry failure during or after loading guest state
*/
-#define ENTRY_FAIL_DEFAULT 0
-#define ENTRY_FAIL_PDPTE 2
-#define ENTRY_FAIL_NMI 3
-#define ENTRY_FAIL_VMCS_LINK_PTR 4
+enum vm_entry_failure_code {
+ ENTRY_FAIL_DEFAULT = 0,
+ ENTRY_FAIL_PDPTE = 2,
+ ENTRY_FAIL_NMI = 3,
+ ENTRY_FAIL_VMCS_LINK_PTR = 4,
+};

/*
* Exit Qualifications for EPT Violations
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 2c36f3f53108..dc00d1742480 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -922,6 +922,7 @@ static u32 nested_vmx_load_msr(struct kvm_vcpu *vcpu, u64 gpa, u32 count)
}
return 0;
fail:
+ /* Note, max_msr_list_size is at most 4096, i.e. this can't wrap. */
return i + 1;
}

@@ -1117,7 +1118,7 @@ static bool nested_vmx_transition_mmu_sync(struct kvm_vcpu *vcpu)
* @entry_failure_code.
*/
static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, bool nested_ept,
- u32 *entry_failure_code)
+ enum vm_entry_failure_code *entry_failure_code)
{
if (cr3 != kvm_read_cr3(vcpu) || (!nested_ept && pdptrs_changed(vcpu))) {
if (CC(!nested_cr3_valid(vcpu, cr3))) {
@@ -2470,7 +2471,7 @@ static void prepare_vmcs02_rare(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
* is assigned to entry_failure_code on failure.
*/
static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
- u32 *entry_failure_code)
+ enum vm_entry_failure_code *entry_failure_code)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct hv_enlightened_vmcs *hv_evmcs = vmx->nested.hv_evmcs;
@@ -2930,11 +2931,11 @@ static int nested_check_guest_non_reg_state(struct vmcs12 *vmcs12)

static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12,
- u32 *exit_qual)
+ enum vm_entry_failure_code *entry_failure_code)
{
bool ia32e;

- *exit_qual = ENTRY_FAIL_DEFAULT;
+ *entry_failure_code = ENTRY_FAIL_DEFAULT;

if (CC(!nested_guest_cr0_valid(vcpu, vmcs12->guest_cr0)) ||
CC(!nested_guest_cr4_valid(vcpu, vmcs12->guest_cr4)))
@@ -2949,7 +2950,7 @@ static int nested_vmx_check_guest_state(struct kvm_vcpu *vcpu,
return -EINVAL;

if (nested_vmx_check_vmcs_link_ptr(vcpu, vmcs12)) {
- *exit_qual = ENTRY_FAIL_VMCS_LINK_PTR;
+ *entry_failure_code = ENTRY_FAIL_VMCS_LINK_PTR;
return -EINVAL;
}

@@ -3241,9 +3242,9 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ enum vm_entry_failure_code entry_failure_code;
bool evaluate_pending_interrupts;
- u32 exit_reason = EXIT_REASON_INVALID_STATE;
- u32 exit_qual;
+ u32 exit_reason, failed_index;

if (kvm_check_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu))
kvm_vcpu_flush_tlb_current(vcpu);
@@ -3291,24 +3292,30 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,
return NVMX_VMENTRY_VMFAIL;
}

- if (nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
+ if (nested_vmx_check_guest_state(vcpu, vmcs12,
+ &entry_failure_code)) {
+ exit_reason = EXIT_REASON_INVALID_STATE;
+ vmcs12->exit_qualification = entry_failure_code;
goto vmentry_fail_vmexit;
+ }
}

enter_guest_mode(vcpu);
if (vmcs12->cpu_based_vm_exec_control & CPU_BASED_USE_TSC_OFFSETTING)
vcpu->arch.tsc_offset += vmcs12->tsc_offset;

- if (prepare_vmcs02(vcpu, vmcs12, &exit_qual))
+ if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code))
goto vmentry_fail_vmexit_guest_mode;

if (from_vmentry) {
- exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
- exit_qual = nested_vmx_load_msr(vcpu,
- vmcs12->vm_entry_msr_load_addr,
- vmcs12->vm_entry_msr_load_count);
- if (exit_qual)
+ failed_index = nested_vmx_load_msr(vcpu,
+ vmcs12->vm_entry_msr_load_addr,
+ vmcs12->vm_entry_msr_load_count);
+ if (failed_index) {
+ exit_reason = EXIT_REASON_MSR_LOAD_FAIL;
+ vmcs12->exit_qualification = failed_index;
goto vmentry_fail_vmexit_guest_mode;
+ }
} else {
/*
* The MMU is not initialized to point at the right entities yet and
@@ -3372,7 +3379,6 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu,

load_vmcs12_host_state(vcpu, vmcs12);
vmcs12->vm_exit_reason = exit_reason | VMX_EXIT_REASONS_FAILED_VMENTRY;
- vmcs12->exit_qualification = exit_qual;
if (enable_shadow_vmcs || vmx->nested.hv_evmcs)
vmx->nested.need_vmcs12_to_shadow_sync = true;
return NVMX_VMENTRY_VMEXIT;
@@ -4066,8 +4072,8 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
struct vmcs12 *vmcs12)
{
+ enum vm_entry_failure_code ignored;
struct kvm_segment seg;
- u32 entry_failure_code;

if (vmcs12->vm_exit_controls & VM_EXIT_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->host_ia32_efer;
@@ -4102,7 +4108,7 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu,
* Only PDPTE load can fail as the value of cr3 was checked on entry and
* couldn't have changed.
*/
- if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &entry_failure_code))
+ if (nested_vmx_load_cr3(vcpu, vmcs12->host_cr3, false, &ignored))
nested_vmx_abort(vcpu, VMX_ABORT_LOAD_HOST_PDPTE_FAIL);

if (!enable_ept)
@@ -6002,7 +6008,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
struct vmcs12 *vmcs12;
- u32 exit_qual;
+ enum vm_entry_failure_code ignored;
struct kvm_vmx_nested_state_data __user *user_vmx_nested_state =
&user_kvm_nested_state->data.vmx[0];
int ret;
@@ -6143,7 +6149,7 @@ static int vmx_set_nested_state(struct kvm_vcpu *vcpu,

if (nested_vmx_check_controls(vcpu, vmcs12) ||
nested_vmx_check_host_state(vcpu, vmcs12) ||
- nested_vmx_check_guest_state(vcpu, vmcs12, &exit_qual))
+ nested_vmx_check_guest_state(vcpu, vmcs12, &ignored))
goto error_guest_mode;

vmx->nested.dirty_vmcs12 = true;
--
2.26.0


2020-04-28 18:49:06

by Jim Mattson

Subject: Re: [PATCH v2] KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure

On Tue, Apr 28, 2020 at 10:32 AM Sean Christopherson
<[email protected]> wrote:
> [...]
>
> Reported-by: Vitaly Kuznetsov <[email protected]>
> Cc: Jim Mattson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2020-04-29 13:14:33

by Vitaly Kuznetsov

Subject: Re: [PATCH v2] KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure

Sean Christopherson <[email protected]> writes:

> [...]

Reviewed-by: Vitaly Kuznetsov <[email protected]>

Thanks!

--
Vitaly

2020-05-04 16:48:15

by Paolo Bonzini

Subject: Re: [PATCH v2] KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure

On 28/04/20 19:32, Sean Christopherson wrote:
> [...]

Queued this one, actually.

Paolo

2020-05-11 01:18:56

by Chen, Rong A

Subject: [KVM] 6b6a864bd7: kernel-selftests.kvm.vmx_tsc_adjust_test.fail

Greetings,

FYI, we noticed the following commit (built with gcc-7):

commit: 6b6a864bd7d7eb12a82ec6da1c2dd97a43f79449 ("[PATCH v2] KVM: nVMX: Tweak handling of failure code for nested VM-Enter failure")
url: https://github.com/0day-ci/linux/commits/Sean-Christopherson/KVM-nVMX-Tweak-handling-of-failure-code-for-nested-VM-Enter-failure/20200429-052911
base: https://git.kernel.org/cgit/virt/kvm/kvm.git linux-next

in testcase: kernel-selftests
with following parameters:

group: kselftests-kvm
ucode: 0x500002c

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>



# selftests: kvm: vmx_tsc_adjust_test
# ==== Test Assertion Failure ====
# x86_64/vmx_tsc_adjust_test.c:153: false
# pid=12157 tid=12157 - Interrupted system call
# 1 0x000000000040116a: main at vmx_tsc_adjust_test.c:153
# 2 0x00007fafd54bce0a: ?? ??:0
# 3 0x00000000004011e9: _start at ??:?
# Failed guest assert: (vmreadz(VM_EXIT_REASON) == (0x80000000 | 33))
# IA32_TSC_ADJUST is -4294972240 (-1 * TSC_ADJUST_VALUE + -4944).
not ok 14 selftests: kvm: vmx_tsc_adjust_test # exit=254


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp install job.yaml # job file is attached in this email
bin/lkp run job.yaml



Thanks,
Rong Chen


Attachments:
(No filename) (1.80 kB)
config-5.6.0-08409-g6b6a864bd7d7eb (207.04 kB)
job-script (6.31 kB)
kmsg.xz (38.88 kB)
kernel-selftests (66.08 kB)
job.yaml (5.43 kB)
reproduce (169.00 B)

2020-05-11 21:39:41

by Sean Christopherson

Subject: Re: [KVM] 6b6a864bd7: kernel-selftests.kvm.vmx_tsc_adjust_test.fail

On Mon, May 11, 2020 at 09:09:26AM +0800, kernel test robot wrote:
> [...]
>
> # selftests: kvm: vmx_tsc_adjust_test
> # ==== Test Assertion Failure ====
> # x86_64/vmx_tsc_adjust_test.c:153: false
> # pid=12157 tid=12157 - Interrupted system call
> # 1 0x000000000040116a: main at vmx_tsc_adjust_test.c:153
> # 2 0x00007fafd54bce0a: ?? ??:0
> # 3 0x00000000004011e9: _start at ??:?
> # Failed guest assert: (vmreadz(VM_EXIT_REASON) == (0x80000000 | 33))
> # IA32_TSC_ADJUST is -4294972240 (-1 * TSC_ADJUST_VALUE + -4944).
> not ok 14 selftests: kvm: vmx_tsc_adjust_test # exit=254

Ugh, missed the prepare_vmcs02() path. v3 incoming.
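
For context: in this v2, the prepare_vmcs02() failure path jumps to
vmentry_fail_vmexit_guest_mode without setting exit_reason (now left
uninitialized) or vmcs12->exit_qualification, which is presumably what
the selftest tripped over.  A sketch of the likely shape of the fix, not
necessarily the exact v3 code:

	if (prepare_vmcs02(vcpu, vmcs12, &entry_failure_code)) {
		exit_reason = EXIT_REASON_INVALID_STATE;
		vmcs12->exit_qualification = entry_failure_code;
		goto vmentry_fail_vmexit_guest_mode;
	}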