2022-06-14 20:49:21

by Sean Christopherson

Subject: [PATCH v2 01/21] KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"

Drop pending exceptions and events queued for re-injection when leaving
nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
by host userspace. Failure to purge events could result in an event
belonging to L2 being injected into L1.

This _should_ never happen for VM-Fail as all events should be blocked by
nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is
the source of VM-Fail when running vmcs02.

SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
to SMM is blocked by pending exceptions and re-injected events.

Forced exit is definitely buggy, but has likely gone unnoticed because
userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
some other ioctl() that purges the queue).

Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
Cc: [email protected]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 7d8cd0ebcc75..ee6f27dffdba 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4263,14 +4263,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 			nested_vmx_abort(vcpu,
 					 VMX_ABORT_SAVE_GUEST_MSR_FAIL);
 	}
-
-	/*
-	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
-	 * preserved above and would only end up incorrectly in L1.
-	 */
-	vcpu->arch.nmi_injected = false;
-	kvm_clear_exception_queue(vcpu);
-	kvm_clear_interrupt_queue(vcpu);
 }

/*
@@ -4609,6 +4601,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		WARN_ON_ONCE(nested_early_check);
 	}

+	/*
+	 * Drop events/exceptions that were queued for re-injection to L2
+	 * (picked up via vmx_complete_interrupts()), as well as exceptions
+	 * that were pending for L2. Note, this must NOT be hoisted above
+	 * prepare_vmcs12(), events/exceptions queued for re-injection need to
+	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
+	 */
+	vcpu->arch.nmi_injected = false;
+	kvm_clear_exception_queue(vcpu);
+	kvm_clear_interrupt_queue(vcpu);
+
 	vmx_switch_vmcs(vcpu, &vmx->vmcs01);

 	/* Update any VMCS fields that might have changed while L2 ran */
--
2.36.1.476.g0c4daa206d-goog
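
The ordering constraint in the new comment (capture in vmcs12 first, then purge, and purge on every kind of "exit") can be sketched outside the kernel as a small toy model. This is only an illustration under stated assumptions: every toy_* name, both structs, and the sync_vmcs12 flag are hypothetical stand-ins invented for this sketch and are not KVM's types or functions. The flag stands for "this exit goes through prepare_vmcs12()", which, as the diff suggests, is not the case for every path that leaves nested guest mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the kernel's types. */
struct toy_vmcs12 {
	bool idt_vectoring_valid;	/* models a "valid" bit for a saved event */
	unsigned int idt_vector;
};

struct toy_vcpu {
	bool nmi_injected;
	bool exception_injected;
	unsigned int exception_vector;
	bool interrupt_injected;
	unsigned int interrupt_vector;
};

#define TOY_NMI_VECTOR 2u

/* Rough analogue of vmcs12_save_pending_event(): hand L2's in-flight event to L1. */
static void toy_save_pending_event(const struct toy_vcpu *vcpu,
				   struct toy_vmcs12 *vmcs12)
{
	vmcs12->idt_vectoring_valid = false;
	vmcs12->idt_vector = 0;

	if (vcpu->exception_injected) {
		vmcs12->idt_vectoring_valid = true;
		vmcs12->idt_vector = vcpu->exception_vector;
	} else if (vcpu->nmi_injected) {
		vmcs12->idt_vectoring_valid = true;
		vmcs12->idt_vector = TOY_NMI_VECTOR;
	} else if (vcpu->interrupt_injected) {
		vmcs12->idt_vectoring_valid = true;
		vmcs12->idt_vector = vcpu->interrupt_vector;
	}
}

/* Rough analogue of the three purge statements moved by the patch. */
static void toy_purge_events(struct toy_vcpu *vcpu)
{
	vcpu->nmi_injected = false;
	vcpu->exception_injected = false;
	vcpu->interrupt_injected = false;
}

/*
 * Toy nested "exit": the save step is conditional (hypothetical
 * sync_vmcs12 flag), the purge is unconditional and always runs after
 * the save so that nothing belonging to L2 can leak into L1.
 */
void toy_nested_exit(struct toy_vcpu *vcpu, struct toy_vmcs12 *vmcs12,
		     bool sync_vmcs12)
{
	assert(vcpu != NULL && vmcs12 != NULL);

	if (sync_vmcs12)
		toy_save_pending_event(vcpu, vmcs12);

	toy_purge_events(vcpu);
}
```

In this model, calling toy_nested_exit() with sync_vmcs12 set records the in-flight event in the toy vmcs12 before the queue is cleared, while calling it with the flag clear (roughly the VM-Fail or forced-exit case) still leaves the toy vCPU with nothing queued. Swapping the two steps would lose the event before it could be recorded, which is the reason the comment gives for not hoisting the purge above prepare_vmcs12().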


2022-06-16 23:55:05

by Jim Mattson

Subject: Re: [PATCH v2 01/21] KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"

On Tue, Jun 14, 2022 at 1:47 PM Sean Christopherson <[email protected]> wrote:
>
> Drop pending exceptions and events queued for re-injection when leaving
> nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
> by host userspace. Failure to purge events could result in an event
> belonging to L2 being injected into L1.
>
> This _should_ never happen for VM-Fail as all events should be blocked by
> nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is
> the source of VM-Fail when running vmcs02.
>
> SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
> to SMM is blocked by pending exceptions and re-injected events.
>
> Forced exit is definitely buggy, but has likely gone unnoticed because
> userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
> some other ioctl() that purges the queue).
>
> Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2022-07-06 12:34:58

by Maxim Levitsky

Subject: Re: [PATCH v2 01/21] KVM: nVMX: Unconditionally purge queued/injected events on nested "exit"

On Tue, 2022-06-14 at 20:47 +0000, Sean Christopherson wrote:
> Drop pending exceptions and events queued for re-injection when leaving
> nested guest mode, even if the "exit" is due to VM-Fail, SMI, or forced
> by host userspace. Failure to purge events could result in an event
> belonging to L2 being injected into L1.
>
> This _should_ never happen for VM-Fail as all events should be blocked by
> nested_run_pending, but it's possible if KVM, not the L1 hypervisor, is
> the source of VM-Fail when running vmcs02.
>
> SMI is a nop (barring unknown bugs) as recognition of SMI and thus entry
> to SMM is blocked by pending exceptions and re-injected events.
>
> Forced exit is definitely buggy, but has likely gone unnoticed because
> userspace probably follows the forced exit with KVM_SET_VCPU_EVENTS (or
> some other ioctl() that purges the queue).
>
> Fixes: 4f350c6dbcb9 ("kvm: nVMX: Handle deferred early VMLAUNCH/VMRESUME failure properly")
> Cc: [email protected]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/vmx/nested.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 7d8cd0ebcc75..ee6f27dffdba 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -4263,14 +4263,6 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>  			nested_vmx_abort(vcpu,
>  					 VMX_ABORT_SAVE_GUEST_MSR_FAIL);
>  	}
> -
> -	/*
> -	 * Drop what we picked up for L2 via vmx_complete_interrupts. It is
> -	 * preserved above and would only end up incorrectly in L1.
> -	 */
> -	vcpu->arch.nmi_injected = false;
> -	kvm_clear_exception_queue(vcpu);
> -	kvm_clear_interrupt_queue(vcpu);
>  }
>
> /*
> @@ -4609,6 +4601,17 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
>  		WARN_ON_ONCE(nested_early_check);
>  	}
>
> +	/*
> +	 * Drop events/exceptions that were queued for re-injection to L2
> +	 * (picked up via vmx_complete_interrupts()), as well as exceptions
> +	 * that were pending for L2. Note, this must NOT be hoisted above
> +	 * prepare_vmcs12(), events/exceptions queued for re-injection need to
> +	 * be captured in vmcs12 (see vmcs12_save_pending_event()).
> +	 */
> +	vcpu->arch.nmi_injected = false;
> +	kvm_clear_exception_queue(vcpu);
> +	kvm_clear_interrupt_queue(vcpu);
> +
>  	vmx_switch_vmcs(vcpu, &vmx->vmcs01);
>
>  	/* Update any VMCS fields that might have changed while L2 ran */

Makes sense.

Reviewed-by: Maxim Levitsky <[email protected]>

Best regards,
Maxim Levitsky