2020-04-23 02:29:44

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 00/13] KVM: x86: Event fixes and cleanup

Most of this series only really affects nVMX, but there are a few x86
changes sprinkled in.

Patches 1 and 2 are alternative fixes[1][2] for bugs where a #DB destined
for L2 is dropped because a lower priority event, e.g. VMX preemption
timer, is serviced and triggers VM-Exit, and where correctly handling the
#DB can result in the preemption timer being dropped.

Patch 3 fixes a semi-theoretical bug. I've been intermittently observing
failures when running the preemption timer unit test in L1, but have never
been able to consistently reproduce the bug. I suspect the issue is
KVM_REQ_EVENT being lost, but can't really confirm this is the case due to
lack of a reproducer.

Patches 4-7 are cleanup/refactoring to fix non-exiting NMI/INTR priority
bugs (similar to above) in patch 8. Although patch 8 is technically a bug
fix, I don't think it's stable material (no sane L1 will notice), which is
why I prioritized (da-dum ching) a clean implementation over an easily
backported patch (a single patch would have been ugly).

Patch 9 fixes a similar issue with SMI priority, and again is probably not
stable material.

Patch 10 addresses a gap in WARN coverage that's effectively introduced
by the bug fix in patch 1.

Patches 11 and 12 replace the extra call to check_nested_events() with a
more precise hack-a-fix. This is a very small step towards a pipe dream
of processing each event class exactly once per run loop (more below).

Patch 13 is a random optimization that caught my eye when starting at this
code over and over.


I really, really dislike KVM's event handling flow. In the (distant)
future I'd love to rework the event injection to process each event
exactly once per loop, as opposed to the current behavior where
check_nested_events() can be called at least twice, if not more depending
on blocking behavior. That would make it much cleaner to correctly handle
event prioritization and likely to maintain the code, but getting there is
a significant rework with a fair number of scary changes.

[1] https://lkml.kernel.org/r/[email protected]
[2] https://lkml.kernel.org/r/[email protected]

Sean Christopherson (13):
KVM: nVMX: Preserve exception priority irrespective of exiting
behavior
KVM: nVMX: Open a window for pending nested VMX preemption timer
KVM: x86: Set KVM_REQ_EVENT if run is canceled with req_immediate_exit
set
KVM: x86: Make return for {interrupt_nmi}_allowed() a bool instead of
int
KVM: nVMX: Move nested_exit_on_nmi() to nested.h
KVM: nVMX: Report NMIs as allowed when in L2 and Exit-on-NMI is set
KVM: VMX: Split out architectural interrupt/NMI blocking checks
KVM: nVMX: Preserve IRQ/NMI priority irrespective of exiting behavior
KVM: nVMX: Prioritize SMI over nested IRQ/NMI
KVM: x86: WARN on injected+pending exception even in nested case
KVM: VMX: Use vmx_interrupt_blocked() directly from vmx_handle_exit()
KVM: x86: Replace late check_nested_events() hack with more precise
fix
KVM: VMX: Use vmx_get_rflags() to query RFLAGS in
vmx_interrupt_blocked()

arch/x86/include/asm/kvm_host.h | 6 ++-
arch/x86/kvm/svm/svm.c | 10 ++--
arch/x86/kvm/vmx/nested.c | 42 +++++++++++------
arch/x86/kvm/vmx/nested.h | 5 ++
arch/x86/kvm/vmx/vmx.c | 84 +++++++++++++++++++++------------
arch/x86/kvm/vmx/vmx.h | 2 +
arch/x86/kvm/x86.c | 32 +++++--------
7 files changed, 113 insertions(+), 68 deletions(-)

--
2.26.0


2020-04-23 02:30:06

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 08/13] KVM: nVMX: Preserve IRQ/NMI priority irrespective of exiting behavior

Short circuit vmx_check_nested_events() if an unblocked IRQ/NMI is
pending and needs to be injected into L2, priority between coincident
events is not dependent on exiting behavior.

Fixes: b6b8a1451fc4 ("KVM: nVMX: Rework interception of IRQs and NMIs")
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/vmx/nested.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 40b2427f35b7..1fdaca5fd93d 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3750,9 +3750,12 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
return 0;
}

- if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
+ if (vcpu->arch.nmi_pending && !vmx_nmi_blocked(vcpu)) {
if (block_nested_events)
return -EBUSY;
+ if (!nested_exit_on_nmi(vcpu))
+ goto no_vmexit;
+
nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
NMI_VECTOR | INTR_TYPE_NMI_INTR |
INTR_INFO_VALID_MASK, 0);
@@ -3765,9 +3768,11 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
return 0;
}

- if (kvm_cpu_has_interrupt(vcpu) && nested_exit_on_intr(vcpu)) {
+ if (kvm_cpu_has_interrupt(vcpu) && !vmx_interrupt_blocked(vcpu)) {
if (block_nested_events)
return -EBUSY;
+ if (!nested_exit_on_intr(vcpu))
+ goto no_vmexit;
nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0);
return 0;
}
--
2.26.0

2020-04-23 02:31:29

by Sean Christopherson

[permalink] [raw]
Subject: [PATCH 02/13] KVM: nVMX: Open a window for pending nested VMX preemption timer

Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
use it to effectively open a window for servicing the expired timer.
Like pending SMIs on VMX, opening a window simply means requesting an
immediate exit.

This fixes a bug where an expired VMX preemption timer (for L2) will be
delayed and/or lost if a pending exception is injected into L2. The
pending exception is rightly prioritized by vmx_check_nested_events()
and injected into L2, with the preemption timer left pending. Because
no window opened, L2 is free to run uninterrupted.

Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
Reported-by: Jim Mattson <[email protected]>
Cc: Oliver Upton <[email protected]>
Cc: Peter Shier <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/vmx/nested.c | 10 ++++++++--
arch/x86/kvm/x86.c | 4 ++++
3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f26df2cb0591..65dc2c88d8b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1179,6 +1179,7 @@ struct kvm_x86_ops {
void (*handle_exit_irqoff)(struct kvm_vcpu *vcpu);

int (*check_nested_events)(struct kvm_vcpu *vcpu);
+ bool (*nested_hv_timer_pending)(struct kvm_vcpu *vcpu);
void (*request_immediate_exit)(struct kvm_vcpu *vcpu);

void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index dc7315b31fee..63cf339a13ac 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3687,6 +3687,12 @@ static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu)
vcpu->arch.exception.payload);
}

+static bool nested_vmx_preemption_timer_pending(struct kvm_vcpu *vcpu)
+{
+ return nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
+ to_vmx(vcpu)->nested.preemption_timer_expired;
+}
+
static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
@@ -3742,8 +3748,7 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu)
return 0;
}

- if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
- vmx->nested.preemption_timer_expired) {
+ if (nested_vmx_preemption_timer_pending(vcpu)) {
if (block_nested_events)
return -EBUSY;
nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
@@ -6443,6 +6448,7 @@ __init int nested_vmx_hardware_setup(struct kvm_x86_ops *ops,
exit_handlers[EXIT_REASON_VMFUNC] = handle_vmfunc;

ops->check_nested_events = vmx_check_nested_events;
+ ops->nested_hv_timer_pending = nested_vmx_preemption_timer_pending;
ops->get_nested_state = vmx_get_nested_state;
ops->set_nested_state = vmx_set_nested_state;
ops->get_vmcs12_pages = nested_get_vmcs12_pages;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59958ce2b681..ecd612807546 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -8324,6 +8324,10 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_x86_ops.enable_nmi_window(vcpu);
if (kvm_cpu_has_injectable_intr(vcpu) || req_int_win)
kvm_x86_ops.enable_irq_window(vcpu);
+ if (is_guest_mode(vcpu) &&
+ kvm_x86_ops.nested_hv_timer_pending &&
+ kvm_x86_ops.nested_hv_timer_pending(vcpu))
+ req_immediate_exit = true;
WARN_ON(vcpu->arch.exception.pending);
}

--
2.26.0

2020-04-28 21:42:04

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 02/13] KVM: nVMX: Open a window for pending nested VMX preemption timer

On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
<[email protected]> wrote:
>
> Add a kvm_x86_ops hook to detect a nested pending "hypervisor timer" and
> use it to effectively open a window for servicing the expired timer.
> Like pending SMIs on VMX, opening a window simply means requesting an
> immediate exit.
>
> This fixes a bug where an expired VMX preemption timer (for L2) will be
> delayed and/or lost if a pending exception is injected into L2. The
> pending exception is rightly prioritized by vmx_check_nested_events()
> and injected into L2, with the preemption timer left pending. Because
> no window opened, L2 is free to run uninterrupted.
>
> Fixes: f4124500c2c13 ("KVM: nVMX: Fully emulate preemption timer")
> Reported-by: Jim Mattson <[email protected]>
> Cc: Oliver Upton <[email protected]>
> Cc: Peter Shier <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>

2020-04-28 22:00:44

by Jim Mattson

[permalink] [raw]
Subject: Re: [PATCH 08/13] KVM: nVMX: Preserve IRQ/NMI priority irrespective of exiting behavior

On Wed, Apr 22, 2020 at 7:26 PM Sean Christopherson
<[email protected]> wrote:
>
> Short circuit vmx_check_nested_events() if an unblocked IRQ/NMI is
> pending and needs to be injected into L2, priority between coincident
> events is not dependent on exiting behavior.
>
> Fixes: b6b8a1451fc4 ("KVM: nVMX: Rework interception of IRQs and NMIs")
> Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Jim Mattson <[email protected]>