2023-09-25 22:51:09

by Mingwei Zhang

Subject: [PATCH 0/2] Fix the duplicate PMI injections in vPMU

While stress testing the KVM vPMU with Intel VTune, we see the following
kernel warning in the guest VM:

[ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
[ 1437.487330] Dazed and confused, but trying to continue

The Problem
===========

The above warning indicates that more NMIs are injected than the guest
can account for. After a month of investigation, we found that the bug
is caused by minor glitches in two separate parts of KVM: 1) the KVM
vPMU mistakenly fires a PMI for an emulated counter overflow even though
the overflow has already been reported by the PMI handler on the
host [1]. 2) the KVM APIC allows multiple PMI injections at one VM entry,
which violates the Intel SDM. Both glitches contribute to extra PMI
injections, which confuse the PMI handler in the guest VM and cause the
above warning messages.
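
To illustrate why an extra PMI ends up as an "unknown reason" NMI, here is
a minimal toy model of a guest PMI handler. It is a sketch under stated
assumptions, not Linux or KVM code: every toy_* name is hypothetical, and
the real guest handler is more involved (for example, it has logic for
back-to-back NMIs). The only point is that one overflow plus N injected
PMIs leaves N - 1 NMIs that nobody claims.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy guest-side PMU state (hypothetical, for illustration only). */
struct toy_guest_pmu {
	uint64_t overflow_status;	/* one pending-overflow bit per counter */
	unsigned int unknown_nmis;	/* NMIs that found nothing to handle */
};

/*
 * Toy PMI handler: claim the NMI only if an overflow bit is pending,
 * and clear all pending bits in a single pass.
 */
static bool toy_guest_handle_nmi(struct toy_guest_pmu *pmu)
{
	if (!pmu->overflow_status) {
		pmu->unknown_nmis++;	/* "NMI received for unknown reason" */
		return false;
	}
	pmu->overflow_status = 0;
	return true;
}

/*
 * Report a single overflow on counter 0 with nr_pmis NMIs and return
 * how many of those NMIs went unclaimed.
 */
static unsigned int toy_unclaimed_nmis(unsigned int nr_pmis)
{
	struct toy_guest_pmu pmu = { .overflow_status = 1ull << 0 };
	unsigned int i;

	for (i = 0; i < nr_pmis; i++)
		toy_guest_handle_nmi(&pmu);
	return pmu.unknown_nmis;
}

/* Usage example: one PMI per overflow is clean, a duplicate is not. */
static void toy_unclaimed_nmis_example(void)
{
	assert(toy_unclaimed_nmis(1) == 0);
	assert(toy_unclaimed_nmis(2) == 1);
}
```

Calling toy_unclaimed_nmis_example() from a main() exercises both cases.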

The Fixes
=========

The patches fundamentally disallow multi-PMI injection at the APIC
level. In addition, they simplify the PMI injection process by
removing irq_work and using only KVM_REQ_PMI.
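
As a rough illustration of why funnelling everything through a single
request bit bounds the number of PMIs, here is a toy model. It is not the
code from patch 1/2 and not KVM's request machinery: TOY_REQ_PMI and the
toy_* helpers are hypothetical stand-ins. The assumption being shown is
only that setting a request bit is idempotent, so any number of overflows
recorded before the next guest entry collapse into one delivered PMI.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_REQ_PMI	(1u << 0)	/* hypothetical stand-in for KVM_REQ_PMI */

struct toy_vcpu {
	uint32_t requests;		/* pending request bits */
	unsigned int pmis_delivered;	/* PMIs handed to the toy APIC */
};

/* Record a counter overflow; setting an already-set bit changes nothing. */
static void toy_counter_overflow(struct toy_vcpu *vcpu)
{
	vcpu->requests |= TOY_REQ_PMI;
}

/* Service pending requests before entering the guest. */
static void toy_enter_guest(struct toy_vcpu *vcpu)
{
	if (vcpu->requests & TOY_REQ_PMI) {
		vcpu->requests &= ~TOY_REQ_PMI;
		vcpu->pmis_delivered++;
	}
}

/*
 * Record nr_overflows overflows, then enter the guest twice. The second
 * entry has nothing pending, so it must not deliver another PMI.
 */
static unsigned int toy_pmis_for_overflows(unsigned int nr_overflows)
{
	struct toy_vcpu vcpu = { 0 };
	unsigned int i;

	for (i = 0; i < nr_overflows; i++)
		toy_counter_overflow(&vcpu);
	toy_enter_guest(&vcpu);
	toy_enter_guest(&vcpu);
	return vcpu.pmis_delivered;
}

/* Usage example: five overflows before one entry still yield one PMI. */
static void toy_pmis_for_overflows_example(void)
{
	assert(toy_pmis_for_overflows(0) == 0);
	assert(toy_pmis_for_overflows(5) == 1);
}
```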

The Testing
===========

With the series applied, we no longer see the above warning messages
when stress testing a VM with Intel VTune. In addition, with some extra
kernel printing added, every emulated counter overflow happens when the
hardware counter value is 0 and the emulated counter value is 1
(prev_counter is -1). We never observed the unexpected prev_counter
values we saw in [2].

Note that this series does break the upstream kvm-unit-tests/pmu with the
following errors:

FAIL: Intel: emulated instruction: instruction counter overflow
FAIL: Intel: full-width writes: emulated instruction: instruction counter overflow

This is a test bug; applying the following diff should fix the issue:

diff --git a/x86/pmu.c b/x86/pmu.c
index 0def2869..667e6233 100644
--- a/x86/pmu.c
+++ b/x86/pmu.c
@@ -68,6 +68,7 @@ volatile uint64_t irq_received;
 static void cnt_overflow(isr_regs_t *regs)
 {
 	irq_received++;
+	apic_write(APIC_LVTPC, apic_read(APIC_LVTPC) & ~APIC_LVT_MASKED);
 	apic_write(APIC_EOI, 0);
 }

We will post the above change soon.
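
For readers wondering why the test handler needs to touch LVTPC at all,
here is a small toy model of the masking behavior. It is a sketch, not
KVM or kvm-unit-tests code: the toy_* names are hypothetical, and the
only hardware fact assumed is that the mask flag is bit 16 of an LVT
entry. Once delivery sets the mask, every later PMI is dropped until the
handler clears it again, which is why a handler that never unmasks sees
only the first overflow interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LVT_MASKED	(1u << 16)	/* mask flag of an LVT entry */

struct toy_apic {
	uint32_t lvtpc;			/* toy LVT performance counter register */
	unsigned int irq_received;	/* PMIs seen by the toy handler */
};

/* Toy handler, optionally re-enabling LVTPC as the test fix does. */
static void toy_cnt_overflow(struct toy_apic *apic, bool unmask)
{
	apic->irq_received++;
	if (unmask)
		apic->lvtpc &= ~TOY_LVT_MASKED;
}

/* Deliver a PMI unless LVTPC is masked; delivery sets the mask flag. */
static bool toy_deliver_pmi(struct toy_apic *apic, bool handler_unmasks)
{
	if (apic->lvtpc & TOY_LVT_MASKED)
		return false;
	apic->lvtpc |= TOY_LVT_MASKED;
	toy_cnt_overflow(apic, handler_unmasks);
	return true;
}

/* Raise nr_overflows PMIs and count how many reach the handler. */
static unsigned int toy_count_pmis(unsigned int nr_overflows,
				   bool handler_unmasks)
{
	struct toy_apic apic = { 0 };
	unsigned int i;

	for (i = 0; i < nr_overflows; i++)
		toy_deliver_pmi(&apic, handler_unmasks);
	return apic.irq_received;
}

/* Usage example: without unmasking, only the first PMI gets through. */
static void toy_count_pmis_example(void)
{
	assert(toy_count_pmis(3, false) == 1);
	assert(toy_count_pmis(3, true) == 3);
}
```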

[1] commit 9cd803d496e7 ("KVM: x86: Update vPMCs when retiring instructions")
[2] https://lore.kernel.org/all/CAL715WL9T8Ucnj_1AygwMgDjOJrttNZHRP9o-KUNfpx1aYZnog@mail.gmail.com/

Versioning
==========

This series is v1. Changes since v0:
- drop Dapeng's Reviewed-by, since the code changed.
- apply the fixup in kvm_apic_local_deliver(). [seanjc]
- remove pmc->prev_counter. [seanjc]

The previous version (v0) is available at:
- [APIC patches v0]: https://lore.kernel.org/all/[email protected]/
- [vPMU patch v0]: https://lore.kernel.org/all/[email protected]/

Jim Mattson (2):
  KVM: x86: Synthesize at most one PMI per VM-exit
  KVM: x86: Mask LVTPC when handling a PMI

 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/lapic.c            |  8 ++++++--
 arch/x86/kvm/pmu.c              | 27 +--------------------------
 arch/x86/kvm/x86.c              |  3 +++
 4 files changed, 10 insertions(+), 29 deletions(-)


base-commit: 6de2ccc169683bf81feba163834dae7cdebdd826
--
2.42.0.515.g380fc7ccd1-goog


2023-09-26 03:44:13

by Mingwei Zhang

Subject: [PATCH 2/2] KVM: x86: Mask LVTPC when handling a PMI

From: Jim Mattson <[email protected]>

Per the SDM, "When the local APIC handles a performance-monitoring
counters interrupt, it automatically sets the mask flag in the LVT
performance counter register."

Add this behavior to KVM's local APIC emulation, to reduce the
incidence of "dazed and confused" spurious NMI warnings in Linux
guests (at least, those that use a PMI handler with "late_ack").

Fixes: 23930f9521c9 ("KVM: x86: Enable NMI Watchdog via in-kernel PIT source")
Signed-off-by: Jim Mattson <[email protected]>
Tested-by: Mingwei Zhang <[email protected]>
---
arch/x86/kvm/lapic.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 113ca9661ab2..1f3d56a1f45f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2729,13 +2729,17 @@ int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type)
 {
 	u32 reg = kvm_lapic_get_reg(apic, lvt_type);
 	int vector, mode, trig_mode;
+	int r;
 
 	if (kvm_apic_hw_enabled(apic) && !(reg & APIC_LVT_MASKED)) {
 		vector = reg & APIC_VECTOR_MASK;
 		mode = reg & APIC_MODE_MASK;
 		trig_mode = reg & APIC_LVT_LEVEL_TRIGGER;
-		return __apic_accept_irq(apic, mode, vector, 1, trig_mode,
-					NULL);
+
+		r = __apic_accept_irq(apic, mode, vector, 1, trig_mode, NULL);
+		if (r && lvt_type == APIC_LVTPC)
+			kvm_lapic_set_reg(apic, lvt_type, reg | APIC_LVT_MASKED);
+		return r;
 	}
 	return 0;
 }
--
2.42.0.515.g380fc7ccd1-goog

2023-09-29 01:20:45

by Sean Christopherson

Subject: Re: [PATCH 0/2] Fix the duplicate PMI injections in vPMU

On Mon, 25 Sep 2023 17:34:45 +0000, Mingwei Zhang wrote:
> When we do stress test on KVM vPMU using Intel vtune, we find the following
> warning kernel message in the guest VM:
>
> [ 1437.487320] Uhhuh. NMI received for unknown reason 20 on CPU 3.
> [ 1437.487330] Dazed and confused, but trying to continue
>
> The Problem
> ===========
>
> [...]

Applied to kvm-x86 pmu, with the order swapped and a bit of changelog massaging.
Thanks!

[1/2] KVM: x86: Mask LVTPC when handling a PMI
https://github.com/kvm-x86/linux/commit/a16eb25b09c0
[2/2] KVM: x86/pmu: Synthesize at most one PMI per VM-exit
https://github.com/kvm-x86/linux/commit/73554b29bd70

--
https://github.com/kvm-x86/linux/tree/next