2024-03-08 22:37:31

by Sean Christopherson

Subject: [GIT PULL] KVM: x86 pull requests for 6.9

Main set of pull requests for 6.9, in addition to the previous two for-6.9 pull
requests (SVM[1] and a guest-side async #PF ABI cleanup[2]).

As mentioned in the PMU request, I'm expecting to send another pull request for
PMU fixes before the merge window closes (hopefully next week). I am also
planning on sending a pull request (again, next week) for Vitaly's fix+test
for the PV unhalt CPUID bug; I just want to give it a few more days to soak in
linux-next.

[1] https://lore.kernel.org/all/[email protected]
[2] https://lore.kernel.org/all/[email protected]


2024-03-08 22:37:55

by Sean Christopherson

Subject: [GIT PULL] KVM: Async #PF changes for 6.9

Fix a long-standing bug in the async #PF code where KVM code could be left
running in a workqueue even after all *external* references to KVM-the-module
have been put, and a few minor cleanups on top.

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-asyncpf-6.9

for you to fetch changes up to c2744ed2230a92636f04cde48f2f7d8d3486e194:

KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed (2024-02-06 11:04:58 -0800)

----------------------------------------------------------------
KVM async page fault changes for 6.9:

- Always flush the async page fault workqueue when a work item is being
removed, especially during vCPU destruction, to ensure that there are no
workers running in KVM code when all references to KVM-the-module are gone,
i.e. to prevent a use-after-free if kvm.ko is unloaded.

- Grab a reference to the VM's mm_struct in the async #PF worker itself instead
of gifting the worker a reference, e.g. so that there's no need to remember
to *conditionally* clean up after the worker.

----------------------------------------------------------------
Sean Christopherson (4):
KVM: Always flush async #PF workqueue when vCPU is being destroyed
KVM: Put mm immediately after async #PF worker completes remote gup()
KVM: Get reference to VM's address space in the async #PF worker
KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed

include/linux/kvm_host.h | 1 -
virt/kvm/async_pf.c | 73 ++++++++++++++++++++++++++++++++----------------
2 files changed, 49 insertions(+), 25 deletions(-)

2024-03-08 22:38:08

by Sean Christopherson

Subject: [GIT PULL] KVM: Common MMU changes for 6.9

Two small cleanups in what is effectively common MMU code.

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-generic-6.9

for you to fetch changes up to ea3689d9df50c283cb5d647a74aa45e2cc3f8064:

KVM: fix kvm_mmu_memory_cache allocation warning (2024-02-22 17:02:26 -0800)

----------------------------------------------------------------
KVM common MMU changes for 6.9:

- Harden KVM against underflowing the active mmu_notifier invalidation
count, so that "bad" invalidations (usually due to bugs elsewhere in the
kernel) are detected earlier and are less likely to hang the kernel.

- Fix a benign bug in __kvm_mmu_topup_memory_cache() where the object size
and number of objects parameters to kvmalloc_array() were swapped.
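
To illustrate why swapping the two arguments was benign, here is a rough
userspace sketch. array_bytes() and alloc_array() are hypothetical stand-ins,
not kernel APIs; the only thing assumed from the summary above is that the
allocator takes the number of objects first and the object size second, as
calloc() does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Bytes needed for n elements of the given size, or 0 if n * size overflows. */
static size_t array_bytes(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return 0;
	return n * size;
}

/*
 * Hypothetical stand-in for an array allocator: element count first,
 * element size second.  Returns NULL on overflow or allocation failure.
 */
static void *alloc_array(size_t n, size_t size)
{
	size_t bytes = array_bytes(n, size);

	/* Callers in this sketch never ask for an empty array. */
	assert(n != 0 && size != 0);

	if (!bytes)
		return NULL;
	return malloc(bytes);
}
```

Calling alloc_array(sizeof(void *), 40) instead of alloc_array(40, sizeof(void *))
requests the same number of bytes, because the product is commutative and the
overflow check is symmetric. That is why a swap of this kind has no functional
effect and only the argument order needs to be restored.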

----------------------------------------------------------------
Arnd Bergmann (1):
KVM: fix kvm_mmu_memory_cache allocation warning

Sean Christopherson (1):
KVM: Harden against unpaired kvm_mmu_notifier_invalidate_range_end() calls

virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
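
As a rough illustration of the hardening against underflowing the invalidation
count, the sketch below shows one way to make an unpaired "end" call detectable
instead of letting the counter go negative. It is a userspace toy under stated
assumptions: struct invalidation_tracker and the three helpers are hypothetical
names, and this is not the code in virt/kvm/kvm_main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the per-VM state; not KVM's actual struct. */
struct invalidation_tracker {
	long in_progress;	/* number of active invalidations */
	unsigned long warnings;	/* number of unpaired "end" calls seen */
};

static void invalidate_begin(struct invalidation_tracker *t)
{
	t->in_progress++;
}

/*
 * Returns true if the "end" was paired with a "begin".  An unpaired "end"
 * is reported and ignored instead of driving the count negative.
 */
static bool invalidate_end(struct invalidation_tracker *t)
{
	if (t->in_progress <= 0) {
		t->warnings++;
		fprintf(stderr, "unpaired invalidate_end()\n");
		return false;
	}
	t->in_progress--;
	return true;
}

/* Anything waiting on invalidations only needs to know if the count is zero. */
static bool invalidation_in_progress(const struct invalidation_tracker *t)
{
	assert(t->in_progress >= 0);	/* invariant kept by invalidate_end() */
	return t->in_progress != 0;
}
```

Without the check in invalidate_end(), a single stray call would leave the
count at -1, and anything waiting for the count to reach zero could wait
forever. Detecting the bad call at the point where it happens is what makes
the failure both earlier and less likely to hang.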

2024-03-08 22:38:26

by Sean Christopherson

Subject: [GIT PULL] KVM: x86: Misc changes for 6.9

A variety of one-off cleanups and fixes, along with two medium sized series to
(1) improve the "force immediate exit" code and (2) clean up the "vCPU preempted
in-kernel" checks used for directed yield.

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-misc-6.9

for you to fetch changes up to 78ccfce774435a08d9c69ce434099166cc7952c8:

KVM: SVM: Rename vmplX_ssp -> plX_ssp (2024-02-27 12:22:43 -0800)

----------------------------------------------------------------
KVM x86 misc changes for 6.9:

- Explicitly initialize a variety of on-stack variables in the emulator that
triggered KMSAN false positives (though in fairness to KMSAN, it's comically
difficult to see that the uninitialized memory is never truly consumed).

- Fix the debugregs ABI for 32-bit KVM, and clean up code related to reading
DR6 and DR7.

- Rework the "force immediate exit" code so that vendor code ultimately
decides how and when to force the exit. This allows VMX to further optimize
handling preemption timer exits, and allows SVM to avoid sending a duplicate
IPI (SVM also has a need to force an exit).

- Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
vCPU creation ultimately failed, and add a WARN to guard against similar bugs.

- Provide a dedicated arch hook for checking if a different vCPU was in-kernel
(for directed yield), and simplify the logic for checking if the currently
loaded vCPU is in-kernel.

- Misc cleanups and fixes.

----------------------------------------------------------------
John Allen (1):
KVM: SVM: Rename vmplX_ssp -> plX_ssp

Julian Stecklina (2):
KVM: x86: Clean up partially uninitialized integer in emulate_pop()
KVM: x86: rename push to emulate_push for consistency

Mathias Krause (1):
KVM: x86: Fix broken debugregs ABI for 32 bit kernels

Nikolay Borisov (1):
KVM: x86: Use mutex guards to eliminate __kvm_x86_vendor_init()

Sean Christopherson (14):
KVM: x86: Make kvm_get_dr() return a value, not use an out parameter
KVM: x86: Open code all direct reads to guest DR6 and DR7
KVM: x86: Plumb "force_immediate_exit" into kvm_entry() tracepoint
KVM: VMX: Re-enter guest in fastpath for "spurious" preemption timer exits
KVM: VMX: Handle forced exit due to preemption timer in fastpath
KVM: x86: Move handling of is_guest_mode() into fastpath exit handlers
KVM: VMX: Handle KVM-induced preemption timer exits in fastpath for L2
KVM: x86: Fully defer to vendor code to decide how to force immediate exit
KVM: x86: Move "KVM no-APIC vCPU" key management into local APIC code
KVM: x86: Sanity check that kvm_has_noapic_vcpu is zero at module_exit()
KVM: Add dedicated arch hook for querying if vCPU was preempted in-kernel
KVM: x86: Rely solely on preempted_in_kernel flag for directed yield
KVM: x86: Clean up directed yield API for "has pending interrupt"
KVM: Add a comment explaining the directed yield pending interrupt logic

Thomas Prescher (1):
KVM: x86/emulator: emulate movbe with operand-size prefix

arch/x86/include/asm/kvm-x86-ops.h | 1 -
arch/x86/include/asm/kvm_host.h | 8 +--
arch/x86/include/asm/svm.h | 8 +--
arch/x86/kvm/emulate.c | 45 +++++++--------
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/lapic.c | 27 ++++++++-
arch/x86/kvm/smm.c | 15 ++---
arch/x86/kvm/svm/svm.c | 25 ++++-----
arch/x86/kvm/trace.h | 9 ++-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 85 +++++++++++++++++-----------
arch/x86/kvm/vmx/vmx.h | 2 -
arch/x86/kvm/x86.c | 110 ++++++++++++-------------------------
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 21 ++++++-
15 files changed, 184 insertions(+), 177 deletions(-)

2024-03-08 22:38:42

by Sean Christopherson

Subject: [GIT PULL] KVM: x86: MMU changes for 6.9

The bulk of the changes are TDP MMU improvements related to memslot deletion
(ChromeOS has a use case that "requires" frequent deletion of a GPU buffer).
The other highlight is allocating the write-tracking metadata on-demand, e.g.
so that distro kernels pay the memory cost of the arrays if and only if KVM
or KVMGT actually needs to shadow guest page tables.

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-mmu-6.9

for you to fetch changes up to a364c014a2c1ad6e011bc5fdb8afb9d4ba316956:

kvm/x86: allocate the write-tracking metadata on-demand (2024-02-27 11:49:54 -0800)

----------------------------------------------------------------
KVM x86 MMU changes for 6.9:

- Clean up code related to unprotecting shadow pages when retrying a guest
instruction after failed #PF-induced emulation.

- Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if
a reschedule is needed, e.g. if a high priority task needs to run. Because
KVM doesn't support yielding in the middle of processing a zapped non-leaf
SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when
attempting to schedule in a high priority task.

- Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.

- Allocate write-tracking metadata on-demand to avoid the memory overhead when
running kernels built with KVMGT support (external write-tracking enabled),
but for workloads that don't use nested virtualization (shadow paging) or
KVMGT.

----------------------------------------------------------------
Andrei Vagin (1):
kvm/x86: allocate the write-tracking metadata on-demand

Kunwu Chan (1):
KVM: x86/mmu: Use KMEM_CACHE instead of kmem_cache_create()

Mingwei Zhang (1):
KVM: x86/mmu: Don't acquire mmu_lock when using indirect_shadow_pages as a heuristic

Sean Christopherson (10):
KVM: x86: Drop dedicated logic for direct MMUs in reexecute_instruction()
KVM: x86: Drop superfluous check on direct MMU vs. WRITE_PF_TO_SP flag
KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
KVM: x86/mmu: Don't do TLB flush when zappings SPTEs in invalid roots
KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators
KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range
KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs
KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read
KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read
KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read

arch/x86/include/asm/kvm_host.h | 9 +++
arch/x86/kvm/mmu/mmu.c | 37 +++++++-----
arch/x86/kvm/mmu/page_track.c | 68 +++++++++++++++++++++-
arch/x86/kvm/mmu/tdp_mmu.c | 124 ++++++++++++++++++++++++++++------------
arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
arch/x86/kvm/x86.c | 35 +++++-------
6 files changed, 201 insertions(+), 74 deletions(-)
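
For readers unfamiliar with the on-demand allocation pattern mentioned for the
write-tracking metadata, here is a minimal userspace sketch. Everything in it
is hypothetical (struct vm, struct slot, enable_write_tracking(), MAX_SLOTS);
it only illustrates "allocate the per-slot arrays the first time tracking is
needed" and is not the code in arch/x86/kvm/mmu/page_track.c. It assumes every
slot has a non-zero page count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_SLOTS 4

struct slot {
	size_t npages;			/* assumed to be non-zero */
	unsigned short *write_track;	/* per-page counts, NULL until needed */
};

struct vm {
	bool write_tracking_enabled;
	int nr_slots;
	struct slot slots[MAX_SLOTS];
};

/*
 * Allocate the tracking arrays for all slots on first use.  Subsequent calls
 * are no-ops.  Returns 0 on success, -1 if an allocation fails, in which case
 * any arrays allocated by this call are freed again.
 */
static int enable_write_tracking(struct vm *vm)
{
	int i;

	assert(vm->nr_slots <= MAX_SLOTS);

	if (vm->write_tracking_enabled)
		return 0;

	for (i = 0; i < vm->nr_slots; i++) {
		struct slot *s = &vm->slots[i];

		s->write_track = calloc(s->npages, sizeof(*s->write_track));
		if (!s->write_track)
			goto err;
	}
	vm->write_tracking_enabled = true;
	return 0;

err:
	/* Undo the allocations made for the slots before the failing one. */
	while (--i >= 0) {
		free(vm->slots[i].write_track);
		vm->slots[i].write_track = NULL;
	}
	return -1;
}
```

A VM that never calls enable_write_tracking() never pays for the arrays, which
is the point of the change: the memory cost is incurred only by users that
actually need write tracking.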

2024-03-08 22:39:10

by Sean Christopherson

Subject: [GIT PULL] KVM: x86: PMU changes for 6.9

Lots of PMU fixes and cleanups, along with related selftests. The most notable
fix is to *not* disallow the use of fixed counters and event encodings just
because the CPU doesn't report support for the matching architectural event
encoding.

Note, the selftests changes have several annoying conflicts with "the" selftests
pull request that you'll also receive from me. I recommend merging that one
first, as I found it slightly easier to resolve the conflicts in that order.

P.S. I expect to send another PMU related pull request of 3-4 fixes at some
point during the merge window. But they're all small and urgent (if we had a
few more weeks for 6.8, I'd have tried to squeeze them into 6.8).

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-pmu-6.9

for you to fetch changes up to 812d432373f629eb8d6cb696ea6804fca1534efa:

KVM: x86/pmu: Explicitly check NMI from guest to reducee false positives (2024-02-26 15:57:22 -0800)

----------------------------------------------------------------
KVM x86 PMU changes for 6.9:

- Fix several bugs where KVM speciously prevents the guest from utilizing
fixed counters and architectural event encodings based on whether or not
guest CPUID reports support for the _architectural_ encoding.

- Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads,
priority of VMX interception vs #GP, PMC types in architectural PMUs, etc.

- Add a selftest to verify KVM correctly emulates RDPMC, counter availability,
and a variety of other PMC-related behaviors that depend on guest CPUID,
i.e. are difficult to validate via KVM-Unit-Tests.

- Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting
cycles, e.g. when checking if a PMC event needs to be synthesized when
skipping an instruction.

- Optimize triggering of emulated events, e.g. for "count instructions" events
when skipping an instruction, which yields a ~10% performance improvement in
VM-Exit microbenchmarks when a vPMU is exposed to the guest.

- Tighten the check for "PMI in guest" to reduce false positives if an NMI
arrives in the host while KVM is handling an IRQ VM-Exit.

----------------------------------------------------------------
Dapeng Mi (1):
KVM: selftests: Test top-down slots event in x86's pmu_counters_test

Jinrong Liang (7):
KVM: selftests: Add vcpu_set_cpuid_property() to set properties
KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets
KVM: selftests: Test Intel PMU architectural events on gp counters
KVM: selftests: Test Intel PMU architectural events on fixed counters
KVM: selftests: Test consistency of CPUID with num of gp counters
KVM: selftests: Test consistency of CPUID with num of fixed counters
KVM: selftests: Add functional test for Intel's fixed PMU counters

Like Xu (1):
KVM: x86/pmu: Explicitly check NMI from guest to reducee false positives

Sean Christopherson (32):
KVM: x86/pmu: Always treat Fixed counters as available when supported
KVM: x86/pmu: Allow programming events that match unsupported arch events
KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural encodings
KVM: x86/pmu: Setup fixed counters' eventsel during PMU initialization
KVM: x86/pmu: Get eventsel for fixed counters from perf
KVM: x86/pmu: Don't ignore bits 31:30 for RDPMC index on AMD
KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index
KVM: x86/pmu: Apply "fast" RDPMC only to Intel PMUs
KVM: x86/pmu: Disallow "fast" RDPMC for architectural Intel PMUs
KVM: x86/pmu: Treat "fixed" PMU type in RDPMC as index as a value, not flag
KVM: x86/pmu: Explicitly check for RDPMC of unsupported Intel PMC types
KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()
KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters
KVM: selftests: Expand PMU counters test to verify LLC events
KVM: selftests: Add a helper to query if the PMU module param is enabled
KVM: selftests: Add helpers to read integer module params
KVM: selftests: Query module param to detect FEP in MSR filtering test
KVM: selftests: Move KVM_FEP macro into common library header
KVM: selftests: Test PMC virtualization with forced emulation
KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()
KVM: selftests: Add helpers for safe and safe+forced RDMSR, RDPMC, and XGETBV
KVM: selftests: Extend PMU counters test to validate RDPMC after WRMSR
KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled
KVM: x86/pmu: Add common define to capture fixed counters offset
KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code
KVM: x86/pmu: Snapshot and clear reprogramming bitmap before reprogramming
KVM: x86/pmu: Add macros to iterate over all PMCs given a bitmap
KVM: x86/pmu: Process only enabled PMCs when emulating events in software
KVM: x86/pmu: Snapshot event selectors that KVM emulates in software
KVM: x86/pmu: Expand the comment about what bits are check emulating events
KVM: x86/pmu: Check eventsel first when emulating (branch) insns retired
KVM: x86/pmu: Avoid CPL lookup if PMC enabline for USER and KERNEL is the same

arch/x86/include/asm/kvm-x86-pmu-ops.h | 4 +-
arch/x86/include/asm/kvm_host.h | 11 +-
arch/x86/kvm/emulate.c | 2 +-
arch/x86/kvm/kvm_emulate.h | 2 +-
arch/x86/kvm/pmu.c | 163 ++++--
arch/x86/kvm/pmu.h | 57 +-
arch/x86/kvm/svm/pmu.c | 22 +-
arch/x86/kvm/vmx/nested.c | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 222 +++-----
arch/x86/kvm/x86.c | 15 +-
arch/x86/kvm/x86.h | 6 -
tools/testing/selftests/kvm/Makefile | 2 +
.../testing/selftests/kvm/include/kvm_util_base.h | 4 +
tools/testing/selftests/kvm/include/x86_64/pmu.h | 97 ++++
.../selftests/kvm/include/x86_64/processor.h | 148 +++--
tools/testing/selftests/kvm/lib/kvm_util.c | 62 ++-
tools/testing/selftests/kvm/lib/x86_64/pmu.c | 31 ++
tools/testing/selftests/kvm/lib/x86_64/processor.c | 15 +-
.../selftests/kvm/x86_64/pmu_counters_test.c | 620 +++++++++++++++++++++
.../selftests/kvm/x86_64/pmu_event_filter_test.c | 143 ++---
.../kvm/x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
.../selftests/kvm/x86_64/userspace_msr_exit_test.c | 29 +-
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
23 files changed, 1262 insertions(+), 399 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/x86_64/pmu.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/pmu.c
create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c

2024-03-08 22:39:16

by Sean Christopherson

Subject: [GIT PULL] KVM: x86: Selftests changes for 6.9

Add SEV(-ES) smoke tests, and start building out infrastructure to utilize the
"core" selftests harness and TAP. In addition to providing TAP output, using the
infrastructure reduces boilerplate code and allows running all testcases in a
test, even if a previous testcase fails (compared with today, where a testcase
failure is terminal for the entire test).
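
To make the "keep running after a failure" behavior concrete, here is a small
standalone sketch of a TAP-emitting runner. It is not the selftests harness
and does not use its macros; struct testcase, run_testcases(), and the two toy
testcases are hypothetical names for illustration. The technique shown is to
run each testcase in a forked child so that a failure only terminates that
child.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct testcase {
	const char *name;
	void (*fn)(void);
};

/*
 * Run each testcase in its own child process and report results in TAP
 * format.  A failing or crashing testcase only takes down its child, so the
 * remaining testcases still run.  Returns the number of failed testcases.
 */
static int run_testcases(const struct testcase *tests, int nr_tests)
{
	int i, failed = 0;

	assert(nr_tests > 0);

	printf("TAP version 13\n1..%d\n", nr_tests);
	for (i = 0; i < nr_tests; i++) {
		int status;
		pid_t pid;

		fflush(stdout);	/* avoid duplicating buffered output in the child */
		pid = fork();
		if (pid < 0) {
			perror("fork");
			exit(1);
		}
		if (pid == 0) {
			tests[i].fn();
			_exit(0);	/* testcase returned normally: pass */
		}
		if (waitpid(pid, &status, 0) < 0) {
			perror("waitpid");
			exit(1);
		}
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
			printf("ok %d %s\n", i + 1, tests[i].name);
		} else {
			printf("not ok %d %s\n", i + 1, tests[i].name);
			failed++;
		}
	}
	return failed;
}

/* Two toy testcases: the first fails by exiting non-zero, the second passes. */
static void test_fails(void)
{
	exit(1);
}

static void test_passes(void)
{
}
```

Given an array holding test_fails followed by test_passes, run_testcases()
reports the first as "not ok" and still goes on to run and report the second,
whereas a test that simply asserts in the main process would stop at the first
failure.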

As noted in the PMU pull request, the "Use TAP interface" changes have a few
conflicts. 3 of 4 are relatively straightforward, but the one in
userspace_msr_exit_test.c's test_msr_filter_allow() is a pain. At least, I
thought so as I botched it at least twice. (LOL, make that three times, as I
just botched my test merge resolution).

The code should end up looking like this:

---
KVM_ONE_VCPU_TEST_SUITE(user_msr);

KVM_ONE_VCPU_TEST(user_msr, msr_filter_allow, guest_code_filter_allow)
{
	struct kvm_vm *vm = vcpu->vm;
	uint64_t cmd;
	int rc;

	sync_global_to_guest(vm, fep_available);

	rc = kvm_check_cap(KVM_CAP_X86_USER_SPACE_MSR);
---

The resolutions I've been using can be found in kvm-x86/next.


The following changes since commit db7d6fbc10447090bab8691a907a7c383ec66f58:

KVM: remove unnecessary #ifdef (2024-02-08 08:41:06 -0500)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-selftests-6.9

for you to fetch changes up to e9da6f08edb0bd4c621165496778d77a222e1174:

KVM: selftests: Explicitly close guest_memfd files in some gmem tests (2024-03-05 13:31:20 -0800)

----------------------------------------------------------------
KVM selftests changes for 6.9:

- Add macros to reduce the amount of boilerplate code needed to write "simple"
selftests, and to utilize selftest TAP infrastructure, which is especially
beneficial for KVM selftests with multiple testcases.

- Add basic smoke tests for SEV and SEV-ES, along with a pile of library
support for handling private/encrypted/protected memory.

- Fix benign bugs where tests neglect to close() guest_memfd files.

----------------------------------------------------------------
Ackerley Tng (1):
KVM: selftests: Add a macro to iterate over a sparsebit range

Dongli Zhang (1):
KVM: selftests: Explicitly close guest_memfd files in some gmem tests

Michael Roth (2):
KVM: selftests: Make sparsebit structs const where appropriate
KVM: selftests: Add support for protected vm_vaddr_* allocations

Peter Gonda (5):
KVM: selftests: Add support for allocating/managing protected guest memory
KVM: selftests: Explicitly ucall pool from shared memory
KVM: selftests: Allow tagging protected memory in guest page tables
KVM: selftests: Add library for creating and interacting with SEV guests
KVM: selftests: Add a basic SEV smoke test

Sean Christopherson (4):
KVM: selftests: Move setting a vCPU's entry point to a dedicated API
KVM: selftests: Extend VM creation's @shape to allow control of VM subtype
KVM: selftests: Use the SEV library APIs in the intra-host migration test
KVM: selftests: Add a basic SEV-ES smoke test

Thomas Huth (7):
KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate
KVM: selftests: x86: sync_regs_test: Get regs structure before modifying it
KVM: selftests: Add a macro to define a test with one vcpu
KVM: selftests: x86: Use TAP interface in the sync_regs test
KVM: selftests: x86: Use TAP interface in the fix_hypercall test
KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test
KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test

tools/testing/selftests/kvm/Makefile | 2 +
tools/testing/selftests/kvm/guest_memfd_test.c | 3 +
.../selftests/kvm/include/aarch64/kvm_util_arch.h | 7 ++
.../selftests/kvm/include/kvm_test_harness.h | 36 ++++++
.../testing/selftests/kvm/include/kvm_util_base.h | 61 +++++++++--
.../selftests/kvm/include/riscv/kvm_util_arch.h | 7 ++
.../selftests/kvm/include/s390x/kvm_util_arch.h | 7 ++
tools/testing/selftests/kvm/include/sparsebit.h | 56 +++++++---
.../selftests/kvm/include/x86_64/kvm_util_arch.h | 23 ++++
.../selftests/kvm/include/x86_64/processor.h | 8 ++
tools/testing/selftests/kvm/include/x86_64/sev.h | 107 ++++++++++++++++++
.../testing/selftests/kvm/lib/aarch64/processor.c | 24 +++-
tools/testing/selftests/kvm/lib/kvm_util.c | 67 ++++++++++--
tools/testing/selftests/kvm/lib/riscv/processor.c | 9 +-
tools/testing/selftests/kvm/lib/s390x/processor.c | 13 ++-
tools/testing/selftests/kvm/lib/sparsebit.c | 48 ++++----
tools/testing/selftests/kvm/lib/ucall_common.c | 3 +-
tools/testing/selftests/kvm/lib/x86_64/processor.c | 45 +++++++-
tools/testing/selftests/kvm/lib/x86_64/sev.c | 114 +++++++++++++++++++
.../selftests/kvm/x86_64/fix_hypercall_test.c | 27 +++--
.../kvm/x86_64/private_mem_conversions_test.c | 2 +
.../selftests/kvm/x86_64/sev_migrate_tests.c | 60 +++-------
.../testing/selftests/kvm/x86_64/sev_smoke_test.c | 88 +++++++++++++++
.../testing/selftests/kvm/x86_64/sync_regs_test.c | 121 +++++++++++++++------
.../selftests/kvm/x86_64/userspace_msr_exit_test.c | 52 +++------
.../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 52 ++-------
26 files changed, 802 insertions(+), 240 deletions(-)
create mode 100644 tools/testing/selftests/kvm/include/aarch64/kvm_util_arch.h
create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h
create mode 100644 tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
create mode 100644 tools/testing/selftests/kvm/include/s390x/kvm_util_arch.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h
create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev.h
create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev.c
create mode 100644 tools/testing/selftests/kvm/x86_64/sev_smoke_test.c

2024-03-08 22:39:34

by Sean Christopherson

Subject: [GIT PULL] KVM: x86: VMX changes for 6.9

A small series from Dongli to clean up the passthrough MSR bitmap code, and a
handful of one-off changes.

The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:

Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.9

for you to fetch changes up to 259720c37d51aae21f70060ef96e1f1b08df0652:

KVM: VMX: Combine "check" and "get" APIs for passthrough MSR lookups (2024-02-27 12:29:46 -0800)

----------------------------------------------------------------
KVM VMX changes for 6.9:

- Fix a bug where KVM would report stale/bogus exit qualification information
when exiting to userspace due to an unexpected VM-Exit while the CPU was
vectoring an exception.

- Add a VMX flag in /proc/cpuinfo to report 5-level EPT support.

- Clean up the logic for massaging the passthrough MSR bitmaps when userspace
changes its MSR filter.

----------------------------------------------------------------
Chao Gao (1):
KVM: VMX: Report up-to-date exit qualification to userspace

Dongli Zhang (2):
KVM: VMX: fix comment to add LBR to passthrough MSRs
KVM: VMX: return early if msr_bitmap is not supported

Sean Christopherson (2):
x86/cpu: Add a VMX flag to enumerate 5-level EPT support to userspace
KVM: VMX: Combine "check" and "get" APIs for passthrough MSR lookups

arch/x86/include/asm/vmxfeatures.h | 1 +
arch/x86/kernel/cpu/feat_ctl.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 72 ++++++++++++++++----------------------
3 files changed, 34 insertions(+), 41 deletions(-)

2024-03-08 22:40:04

by Sean Christopherson

Subject: [GIT PULL] KVM: Xen and gfn_to_pfn_cache changes for 6.9

Aaaand seeing my one commit in the shortlog made me realize I completely forgot
to get acks from s390 on the kvm_is_error_gpa() => kvm_is_gpa_in_memslot()
refactor. Fudge.

s390 folks, my apologies for not reaching out earlier. Please take a look at
commit 9e7325acb3dc ("KVM: s390: Refactor kvm_is_error_gpa() into
kvm_is_gpa_in_memslot()"). It *should* be a straight refactor, and I don't
expect the rename to be contentious, but I didn't intend to send this pull request
before getting an explicit ack.

As for the actual pull request, the bulk of the changes are to add support
for using gfn_to_pfn caches without a gfn, e.g. to optimize handling of overlay
pages, and then use that functionality for Xen's shared_info page.

Note, the commits towards the end are a variety of fixes from David that have
been on the list for a while, but only got applied this week due to issues with
the patches being corrupted (thanks to Evolution doing weird things).

The following changes since commit db7d6fbc10447090bab8691a907a7c383ec66f58:

KVM: remove unnecessary #ifdef (2024-02-08 08:41:06 -0500)

are available in the Git repository at:

https://github.com/kvm-x86/linux.git tags/kvm-x86-xen-6.9

for you to fetch changes up to 7a36d680658ba5a0d350f2ad275b97156b8d4333:

KVM: x86/xen: fix recursive deadlock in timer injection (2024-03-04 16:22:39 -0800)

----------------------------------------------------------------
KVM Xen and pfncache changes for 6.9:

- Rip out the half-baked support for using gfn_to_pfn caches to manage pages
that are "mapped" into guests via physical addresses.

- Add support for using gfn_to_pfn caches with only a host virtual address,
i.e. to bypass the "gfn" stage of the cache. The primary use case is
overlay pages, where the guest may change the gfn used to reference the
overlay page, but the backing hva+pfn remains the same.

- Add an ioctl() to allow mapping Xen's shared_info page using an hva instead
of a gpa, so that userspace doesn't need to reconfigure and invalidate the
cache/mapping if the guest changes the gpa (but userspace keeps the resolved
hva the same).

- When possible, use a single host TSC value when computing the deadline for
Xen timers in order to improve the accuracy of the timer emulation.

- Inject pending upcall events when the vCPU software-enables its APIC to fix
a bug where an upcall can be lost (and to follow Xen's behavior).

- Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
events fails, e.g. if the guest has aliased xAPIC IDs.

- Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to
refresh), and drop a now-redundant acquisition of xen_lock (that was
protecting the shared_info cache) to fix a deadlock due to recursively
acquiring xen_lock.

----------------------------------------------------------------
David Woodhouse (5):
KVM: x86/xen: improve accuracy of Xen timers
KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
KVM: pfncache: simplify locking and make more self-contained
KVM: x86/xen: fix recursive deadlock in timer injection

Paul Durrant (17):
KVM: pfncache: Add a map helper function
KVM: pfncache: remove unnecessary exports
KVM: x86/xen: mark guest pages dirty with the pfncache lock held
KVM: pfncache: add a mark-dirty helper
KVM: pfncache: remove KVM_GUEST_USES_PFN usage
KVM: pfncache: stop open-coding offset_in_page()
KVM: pfncache: include page offset in uhva and use it consistently
KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
KVM: x86/xen: separate initialization of shared_info cache and content
KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set
KVM: x86/xen: allow shared_info to be mapped by fixed HVA
KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
KVM: selftests: map Xen's shared_info page using HVA rather than GFN
KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
KVM: pfncache: check the need for invalidation under read lock first
KVM: x86/xen: allow vcpu_info content to be 'safely' copied

Sean Christopherson (1):
KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()

Documentation/virt/kvm/api.rst | 51 +++-
arch/s390/kvm/diag.c | 2 +-
arch/s390/kvm/gaccess.c | 14 +-
arch/s390/kvm/kvm-s390.c | 4 +-
arch/s390/kvm/priv.c | 4 +-
arch/s390/kvm/sigp.c | 2 +-
arch/x86/include/uapi/asm/kvm.h | 9 +-
arch/x86/kvm/lapic.c | 5 +-
arch/x86/kvm/x86.c | 68 ++++-
arch/x86/kvm/x86.h | 1 +
arch/x86/kvm/xen.c | 325 ++++++++++++++-------
arch/x86/kvm/xen.h | 18 ++
include/linux/kvm_host.h | 56 +++-
include/linux/kvm_types.h | 8 -
.../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 59 +++-
virt/kvm/pfncache.c | 245 +++++++++-------
16 files changed, 602 insertions(+), 269 deletions(-)

2024-03-11 14:04:44

by Janosch Frank

Subject: Re: [GIT PULL] KVM: Xen and gfn_to_pfn_cache changes for 6.9

On 3/8/24 23:37, Sean Christopherson wrote:
> Aaaand seeing my one commit in the shortlog made me realize I completely forgot
> to get acks from s390 on the kvm_is_error_gpa() => kvm_is_gpa_in_memslot()
> refactor. Fudge.
>
> s390 folks, my apologies for not reaching out earlier. Please take a look at
> commit 9e7325acb3dc ("KVM: s390: Refactor kvm_is_error_gpa() into
> kvm_is_gpa_in_memslot()"). It *should* be a straight refactor, and I don't
> expect the rename to be contentious, but I didn't intend to send this pull request
> before getting an explicit ack.


kvm_is_gpa_in_memslot() is not my preferred name for this function but
it's way better than kvm_is_error_gpa() so I'm fine with it.

Acked-by: Janosch Frank <[email protected]>

2024-03-11 14:29:02

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: Selftests changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> Add SEV(-ES) smoke tests, and start building out infrastructure to utilize the
> "core" selftests harness and TAP. In addition to provide TAP output, using the
> infrastructure reduces boilerplate code and allows running all testscases in a
> test, even if a previous testcase fails (compared with today, where a testcase
> failure is terminal for the entire test).
>
> As noted in the PMU pull request, the "Use TAP interface" changes have a few
> conflicts. 3 of 4 are relatively straightforward, but the one in
> userspace_msr_exit_test.c's test_msr_filter_allow() is a pain. At least, I
> thought so as I botched it at least twice. (LOL, make that three times, as I
> just botched my test merge resolution).
>
> The code should end up looking like this:
>
> ---
> KVM_ONE_VCPU_TEST_SUITE(user_msr);
>
> KVM_ONE_VCPU_TEST(user_msr, msr_filter_allow, guest_code_filter_allow)
> {
> struct kvm_vm *vm = vcpu->vm;
> uint64_t cmd;
> int rc;
>
> sync_global_to_guest(vm, fep_available);
>
> rc = kvm_check_cap(KVM_CAP_X86_USER_SPACE_MSR);
> ---
>
> The resolutions I've been using can be found in kvm-x86/next.
>
>
> The following changes since commit db7d6fbc10447090bab8691a907a7c383ec66f58:
>
> KVM: remove unnecessary #ifdef (2024-02-08 08:41:06 -0500)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-selftests-6.9
>
> for you to fetch changes up to e9da6f08edb0bd4c621165496778d77a222e1174:
>
> KVM: selftests: Explicitly close guest_memfd files in some gmem tests (2024-03-05 13:31:20 -0800)
>
> ----------------------------------------------------------------
> KVM selftests changes for 6.9:
>
> - Add macros to reduce the amount of boilerplate code needed to write "simple"
> selftests, and to utilize selftest TAP infrastructure, which is especially
> beneficial for KVM selftests with multiple testcases.
>
> - Add basic smoke tests for SEV and SEV-ES, along with a pile of library
> support for handling private/encrypted/protected memory.
>
> - Fix benign bugs where tests neglect to close() guest_memfd files.
>
> ----------------------------------------------------------------

Pulled, thanks.

Paolo

> Ackerley Tng (1):
> KVM: selftests: Add a macro to iterate over a sparsebit range
>
> Dongli Zhang (1):
> KVM: selftests: Explicitly close guest_memfd files in some gmem tests
>
> Michael Roth (2):
> KVM: selftests: Make sparsebit structs const where appropriate
> KVM: selftests: Add support for protected vm_vaddr_* allocations
>
> Peter Gonda (5):
> KVM: selftests: Add support for allocating/managing protected guest memory
> KVM: selftests: Explicitly ucall pool from shared memory
> KVM: selftests: Allow tagging protected memory in guest page tables
> KVM: selftests: Add library for creating and interacting with SEV guests
> KVM: selftests: Add a basic SEV smoke test
>
> Sean Christopherson (4):
> KVM: selftests: Move setting a vCPU's entry point to a dedicated API
> KVM: selftests: Extend VM creation's @shape to allow control of VM subtype
> KVM: selftests: Use the SEV library APIs in the intra-host migration test
> KVM: selftests: Add a basic SEV-ES smoke test
>
> Thomas Huth (7):
> KVM: selftests: x86: sync_regs_test: Use vcpu_run() where appropriate
> KVM: selftests: x86: sync_regs_test: Get regs structure before modifying it
> KVM: selftests: Add a macro to define a test with one vcpu
> KVM: selftests: x86: Use TAP interface in the sync_regs test
> KVM: selftests: x86: Use TAP interface in the fix_hypercall test
> KVM: selftests: x86: Use TAP interface in the vmx_pmu_caps test
> KVM: selftests: x86: Use TAP interface in the userspace_msr_exit test
>
> tools/testing/selftests/kvm/Makefile | 2 +
> tools/testing/selftests/kvm/guest_memfd_test.c | 3 +
> .../selftests/kvm/include/aarch64/kvm_util_arch.h | 7 ++
> .../selftests/kvm/include/kvm_test_harness.h | 36 ++++++
> .../testing/selftests/kvm/include/kvm_util_base.h | 61 +++++++++--
> .../selftests/kvm/include/riscv/kvm_util_arch.h | 7 ++
> .../selftests/kvm/include/s390x/kvm_util_arch.h | 7 ++
> tools/testing/selftests/kvm/include/sparsebit.h | 56 +++++++---
> .../selftests/kvm/include/x86_64/kvm_util_arch.h | 23 ++++
> .../selftests/kvm/include/x86_64/processor.h | 8 ++
> tools/testing/selftests/kvm/include/x86_64/sev.h | 107 ++++++++++++++++++
> .../testing/selftests/kvm/lib/aarch64/processor.c | 24 +++-
> tools/testing/selftests/kvm/lib/kvm_util.c | 67 ++++++++++--
> tools/testing/selftests/kvm/lib/riscv/processor.c | 9 +-
> tools/testing/selftests/kvm/lib/s390x/processor.c | 13 ++-
> tools/testing/selftests/kvm/lib/sparsebit.c | 48 ++++----
> tools/testing/selftests/kvm/lib/ucall_common.c | 3 +-
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 45 +++++++-
> tools/testing/selftests/kvm/lib/x86_64/sev.c | 114 +++++++++++++++++++
> .../selftests/kvm/x86_64/fix_hypercall_test.c | 27 +++--
> .../kvm/x86_64/private_mem_conversions_test.c | 2 +
> .../selftests/kvm/x86_64/sev_migrate_tests.c | 60 +++-------
> .../testing/selftests/kvm/x86_64/sev_smoke_test.c | 88 +++++++++++++++
> .../testing/selftests/kvm/x86_64/sync_regs_test.c | 121 +++++++++++++++------
> .../selftests/kvm/x86_64/userspace_msr_exit_test.c | 52 +++------
> .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 52 ++-------
> 26 files changed, 802 insertions(+), 240 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/aarch64/kvm_util_arch.h
> create mode 100644 tools/testing/selftests/kvm/include/kvm_test_harness.h
> create mode 100644 tools/testing/selftests/kvm/include/riscv/kvm_util_arch.h
> create mode 100644 tools/testing/selftests/kvm/include/s390x/kvm_util_arch.h
> create mode 100644 tools/testing/selftests/kvm/include/x86_64/kvm_util_arch.h
> create mode 100644 tools/testing/selftests/kvm/include/x86_64/sev.h
> create mode 100644 tools/testing/selftests/kvm/lib/x86_64/sev.c
> create mode 100644 tools/testing/selftests/kvm/x86_64/sev_smoke_test.c
>
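
As a rough illustration of the "run all testcases, even if a previous testcase
fails" behavior described above, here is a minimal sketch of a TAP-style runner.
It is not the kselftest harness or the KVM_ONE_VCPU_TEST() macros; struct
testcase, run_all(), tc_pass() and tc_fail() are hypothetical names made up for
this example. It only shows the shape of the output (a plan line followed by
one "ok"/"not ok" line per case) and that a failure is counted rather than
treated as terminal.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical testcase descriptor; not part of the KVM selftests API. */
struct testcase {
	const char *name;
	bool (*fn)(void);	/* returns true on pass */
};

/* Trivial example testcases, one passing and one failing. */
static bool tc_pass(void) { return true; }
static bool tc_fail(void) { return false; }

/*
 * Run every testcase and print one TAP result line per case.
 * Returns the number of failures; a failure does not stop the run.
 */
static int run_all(const struct testcase *tests, int nr_tests)
{
	int i, failures = 0;

	printf("1..%d\n", nr_tests);	/* TAP plan line */
	for (i = 0; i < nr_tests; i++) {
		bool ok = tests[i].fn();

		if (!ok)
			failures++;
		printf("%s %d - %s\n", ok ? "ok" : "not ok", i + 1,
		       tests[i].name);
	}
	return failures;
}
```

Calling run_all() on an array holding tc_pass, tc_fail, tc_pass would still run
the third case after the second one fails, and report a single failure.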


2024-03-11 14:29:17

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: Misc changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> A variety of one-off cleanups and fixes, along with two medium sized series to
> (1) improve the "force immediate exit" code and (2) clean up the "vCPU preempted
> in-kernel" checks used for directed yield.
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-misc-6.9
>
> for you to fetch changes up to 78ccfce774435a08d9c69ce434099166cc7952c8:
>
> KVM: SVM: Rename vmplX_ssp -> plX_ssp (2024-02-27 12:22:43 -0800)

Queued, thanks.

Paolo

> ----------------------------------------------------------------
> KVM x86 misc changes for 6.9:
>
> - Explicitly initialize a variety of on-stack variables in the emulator that
> triggered KMSAN false positives (though in fairness to KMSAN, it's comically
> difficult to see that the uninitialized memory is never truly consumed).
>
> - Fix the debugregs ABI for 32-bit KVM, and clean up code related to reading
> DR6 and DR7.
>
> - Rework the "force immediate exit" code so that vendor code ultimately
> decides how and when to force the exit. This allows VMX to further optimize
> handling preemption timer exits, and allows SVM to avoid sending a duplicate
> IPI (SVM also has a need to force an exit).
>
> - Fix a long-standing bug where kvm_has_noapic_vcpu could be left elevated if
> vCPU creation ultimately failed, and add WARN to guard against similar bugs.
>
> - Provide a dedicated arch hook for checking if a different vCPU was in-kernel
> (for directed yield), and simplify the logic for checking if the currently
> loaded vCPU is in-kernel.
>
> - Misc cleanups and fixes.
>
> ----------------------------------------------------------------
> John Allen (1):
> KVM: SVM: Rename vmplX_ssp -> plX_ssp
>
> Julian Stecklina (2):
> KVM: x86: Clean up partially uninitialized integer in emulate_pop()
> KVM: x86: rename push to emulate_push for consistency
>
> Mathias Krause (1):
> KVM: x86: Fix broken debugregs ABI for 32 bit kernels
>
> Nikolay Borisov (1):
> KVM: x86: Use mutex guards to eliminate __kvm_x86_vendor_init()
>
> Sean Christopherson (14):
> KVM: x86: Make kvm_get_dr() return a value, not use an out parameter
> KVM: x86: Open code all direct reads to guest DR6 and DR7
> KVM: x86: Plumb "force_immediate_exit" into kvm_entry() tracepoint
> KVM: VMX: Re-enter guest in fastpath for "spurious" preemption timer exits
> KVM: VMX: Handle forced exit due to preemption timer in fastpath
> KVM: x86: Move handling of is_guest_mode() into fastpath exit handlers
> KVM: VMX: Handle KVM-induced preemption timer exits in fastpath for L2
> KVM: x86: Fully defer to vendor code to decide how to force immediate exit
> KVM: x86: Move "KVM no-APIC vCPU" key management into local APIC code
> KVM: x86: Sanity check that kvm_has_noapic_vcpu is zero at module_exit()
> KVM: Add dedicated arch hook for querying if vCPU was preempted in-kernel
> KVM: x86: Rely solely on preempted_in_kernel flag for directed yield
> KVM: x86: Clean up directed yield API for "has pending interrupt"
> KVM: Add a comment explaining the directed yield pending interrupt logic
>
> Thomas Prescher (1):
> KVM: x86/emulator: emulate movbe with operand-size prefix
>
> arch/x86/include/asm/kvm-x86-ops.h | 1 -
> arch/x86/include/asm/kvm_host.h | 8 +--
> arch/x86/include/asm/svm.h | 8 +--
> arch/x86/kvm/emulate.c | 45 +++++++--------
> arch/x86/kvm/kvm_emulate.h | 2 +-
> arch/x86/kvm/lapic.c | 27 ++++++++-
> arch/x86/kvm/smm.c | 15 ++---
> arch/x86/kvm/svm/svm.c | 25 ++++-----
> arch/x86/kvm/trace.h | 9 ++-
> arch/x86/kvm/vmx/nested.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 85 +++++++++++++++++-----------
> arch/x86/kvm/vmx/vmx.h | 2 -
> arch/x86/kvm/x86.c | 110 ++++++++++++-------------------------
> include/linux/kvm_host.h | 1 +
> virt/kvm/kvm_main.c | 21 ++++++-
> 15 files changed, 184 insertions(+), 177 deletions(-)
>


2024-03-11 14:32:19

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: MMU changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> The bulk of the changes are TDP MMU improvements related to memslot deletion
> (ChromeOS has a use case that "requires" frequent deletion of a GPU buffer).
> The other highlight is allocating the write-tracking metadata on-demand, e.g.
> so that distro kernels pay the memory cost of the arrays if and only if KVM
> or KVMGT actually needs to shadow guest page tables.
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-mmu-6.9
>
> for you to fetch changes up to a364c014a2c1ad6e011bc5fdb8afb9d4ba316956:
>
> kvm/x86: allocate the write-tracking metadata on-demand (2024-02-27 11:49:54 -0800)

Pulled, thanks.

Paolo

> ----------------------------------------------------------------
> KVM x86 MMU changes for 6.9:
>
> - Clean up code related to unprotecting shadow pages when retrying a guest
> instruction after failed #PF-induced emulation.
>
> - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if
> a reschedule is needed, e.g. if a high priority task needs to run. Because
> KVM doesn't support yielding in the middle of processing a zapped non-leaf
> SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when
> attempting to schedule in a high priority task.
>
> - Rework TDP MMU root unload, free, and alloc to run with mmu_lock held for
> read, e.g. to avoid serializing vCPUs when userspace deletes a memslot.
>
> - Allocate write-tracking metadata on-demand to avoid the memory overhead when
> running kernels built with KVMGT support (external write-tracking enabled),
> but for workloads that don't use nested virtualization (shadow paging) or
> KVMGT.
>
> ----------------------------------------------------------------
> Andrei Vagin (1):
> kvm/x86: allocate the write-tracking metadata on-demand
>
> Kunwu Chan (1):
> KVM: x86/mmu: Use KMEM_CACHE instead of kmem_cache_create()
>
> Mingwei Zhang (1):
> KVM: x86/mmu: Don't acquire mmu_lock when using indirect_shadow_pages as a heuristic
>
> Sean Christopherson (10):
> KVM: x86: Drop dedicated logic for direct MMUs in reexecute_instruction()
> KVM: x86: Drop superfluous check on direct MMU vs. WRITE_PF_TO_SP flag
> KVM: x86/mmu: Zap invalidated TDP MMU roots at 4KiB granularity
> KVM: x86/mmu: Don't do TLB flush when zappings SPTEs in invalid roots
> KVM: x86/mmu: Allow passing '-1' for "all" as_id for TDP MMU iterators
> KVM: x86/mmu: Skip invalid roots when zapping leaf SPTEs for GFN range
> KVM: x86/mmu: Skip invalid TDP MMU roots when write-protecting SPTEs
> KVM: x86/mmu: Check for usable TDP MMU root while holding mmu_lock for read
> KVM: x86/mmu: Alloc TDP MMU roots while holding mmu_lock for read
> KVM: x86/mmu: Free TDP MMU roots while holding mmy_lock for read
>
> arch/x86/include/asm/kvm_host.h | 9 +++
> arch/x86/kvm/mmu/mmu.c | 37 +++++++-----
> arch/x86/kvm/mmu/page_track.c | 68 +++++++++++++++++++++-
> arch/x86/kvm/mmu/tdp_mmu.c | 124 ++++++++++++++++++++++++++++------------
> arch/x86/kvm/mmu/tdp_mmu.h | 2 +-
> arch/x86/kvm/x86.c | 35 +++++-------
> 6 files changed, 201 insertions(+), 74 deletions(-)
>


2024-03-11 14:34:20

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: Async #PF changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> Fix a long-standing bug in the async #PF code where KVM code could be left
> running in a workqueue even after all *external* references to KVM-the-module
> have been put, and a few minor cleanups on top.
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-asyncpf-6.9
>
> for you to fetch changes up to c2744ed2230a92636f04cde48f2f7d8d3486e194:
>
> KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed (2024-02-06 11:04:58 -0800)
>
> ----------------------------------------------------------------
> KVM async page fault changes for 6.9:
>
> - Always flush the async page fault workqueue when a work item is being
> removed, especially during vCPU destruction, to ensure that there are no
> workers running in KVM code when all references to KVM-the-module are gone,
> i.e. to prevent a use-after-free if kvm.ko is unloaded.
>
> - Grab a reference to the VM's mm_struct in the async #PF worker itself instead
> of gifting the worker a reference, e.g. so that there's no need to remember
> to *conditionally* clean up after the worker.
>
> ----------------------------------------------------------------

Pulled, thanks.

Paolo

> Sean Christopherson (4):
> KVM: Always flush async #PF workqueue when vCPU is being destroyed
> KVM: Put mm immediately after async #PF worker completes remote gup()
> KVM: Get reference to VM's address space in the async #PF worker
> KVM: Nullify async #PF worker's "apf" pointer as soon as it might be freed
>
> include/linux/kvm_host.h | 1 -
> virt/kvm/async_pf.c | 73 ++++++++++++++++++++++++++++++++----------------
> 2 files changed, 49 insertions(+), 25 deletions(-)
>


2024-03-11 14:35:44

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: Common MMU changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> Two small cleanups in what is effectively common MMU code.
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-generic-6.9
>
> for you to fetch changes up to ea3689d9df50c283cb5d647a74aa45e2cc3f8064:
>
> KVM: fix kvm_mmu_memory_cache allocation warning (2024-02-22 17:02:26 -0800)

Pulled, thanks.

Paolo

> ----------------------------------------------------------------
> KVM common MMU changes for 6.9:
>
> - Harden KVM against underflowing the active mmu_notifier invalidation
> count, so that "bad" invalidations (usually due to bugs elsewhere in the
> kernel) are detected earlier and are less likely to hang the kernel.
>
> - Fix a benign bug in __kvm_mmu_topup_memory_cache() where the object size
> and number of objects parameters to kvmalloc_array() were swapped.
>
> ----------------------------------------------------------------
> Arnd Bergmann (1):
> KVM: fix kvm_mmu_memory_cache allocation warning
>
> Sean Christopherson (1):
> KVM: Harden against unpaired kvm_mmu_notifier_invalidate_range_end() calls
>
> virt/kvm/kvm_main.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
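
To see why swapping the "object size" and "number of objects" arguments is
benign at runtime, consider the sketch below. It does not use kvmalloc_array()
itself; alloc_array() and array_bytes() are hypothetical userspace stand-ins
that assume the same (count, size) argument order. Because the byte count is a
plain product, both orders request the same amount of memory. What changes is
readability, and whether a compiler diagnostic that inspects argument order
fires (the commit title refers to a warning, but which one is not stated here).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an array allocator that takes (count, size),
 * in that order.  Returns NULL if count * size would overflow.
 */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

/* Total bytes requested; identical regardless of argument order. */
static size_t array_bytes(size_t n, size_t size)
{
	return n * size;
}
```

For example, array_bytes(40, sizeof(void *)) and array_bytes(sizeof(void *), 40)
are equal, so a swapped call allocates exactly what the correct call would.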


2024-03-11 14:36:25

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: Selftests changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> Add SEV(-ES) smoke tests, and start building out infrastructure to utilize the
> "core" selftests harness and TAP. In addition to providing TAP output, using the
> infrastructure reduces boilerplate code and allows running all testcases in a
> test, even if a previous testcase fails (compared with today, where a testcase
> failure is terminal for the entire test).

Hmm, now I remember why I would have liked to include the AMD SEV
changes in 6.9 --- because they get rid of the "subtype" case in selftests.

It's not a huge deal, it's just a nicer API, and anyway I'm not going to
ask you to rebase on top of my changes; and you couldn't have known that
when we talked about it last Wednesday, since the patches are for the
moment closely guarded on my hard drive.

But it may still be a good reason to sneak those as well in the second
week of the 6.9 merge window, though I'm not going to make a fuss if you
disagree.

Paolo


2024-03-11 14:41:58

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: PMU changes for 6.9

On 3/8/24 23:36, Sean Christopherson wrote:
> Lots of PMU fixes and cleanups, along with related selftests. The most notable
> fix is to *not* disallow the use of fixed counters and event encodings just
> because the CPU doesn't report support for the matching architectural event
> encoding.
>
> Note, the selftests changes have several annoying conflicts with "the" selftests
> pull request that you'll also receive from me. I recommend merging that one
> first, as I found it slightly easier to resolve the conflicts in that order.
>
> P.S. I expect to send another PMU related pull request of 3-4 fixes at some
> point during the merge window. But they're all small and urgent (if we had a
> few more weeks for 6.8, I'd have tried to squeeze them into 6.8).
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-pmu-6.9
>
> for you to fetch changes up to 812d432373f629eb8d6cb696ea6804fca1534efa:
>
> KVM: x86/pmu: Explicitly check NMI from guest to reducee false positives (2024-02-26 15:57:22 -0800)

Pulled, thanks.

Paolo

> ----------------------------------------------------------------
> KVM x86 PMU changes for 6.9:
>
> - Fix several bugs where KVM speciously prevents the guest from utilizing
> fixed counters and architectural event encodings based on whether or not
> guest CPUID reports support for the _architectural_ encoding.
>
> - Fix a variety of bugs in KVM's emulation of RDPMC, e.g. for "fast" reads,
> priority of VMX interception vs #GP, PMC types in architectural PMUs, etc.
>
> - Add a selftest to verify KVM correctly emulates RDPMC, counter availability,
> and a variety of other PMC-related behaviors that depend on guest CPUID,
> i.e. are difficult to validate via KVM-Unit-Tests.
>
> - Zero out PMU metadata on AMD if the virtual PMU is disabled to avoid wasting
> cycles, e.g. when checking if a PMC event needs to be synthesized when
> skipping an instruction.
>
> - Optimize triggering of emulated events, e.g. for "count instructions" events
> when skipping an instruction, which yields a ~10% performance improvement in
> VM-Exit microbenchmarks when a vPMU is exposed to the guest.
>
> - Tighten the check for "PMI in guest" to reduce false positives if an NMI
> arrives in the host while KVM is handling an IRQ VM-Exit.
>
> ----------------------------------------------------------------
> Dapeng Mi (1):
> KVM: selftests: Test top-down slots event in x86's pmu_counters_test
>
> Jinrong Liang (7):
> KVM: selftests: Add vcpu_set_cpuid_property() to set properties
> KVM: selftests: Add pmu.h and lib/pmu.c for common PMU assets
> KVM: selftests: Test Intel PMU architectural events on gp counters
> KVM: selftests: Test Intel PMU architectural events on fixed counters
> KVM: selftests: Test consistency of CPUID with num of gp counters
> KVM: selftests: Test consistency of CPUID with num of fixed counters
> KVM: selftests: Add functional test for Intel's fixed PMU counters
>
> Like Xu (1):
> KVM: x86/pmu: Explicitly check NMI from guest to reducee false positives
>
> Sean Christopherson (32):
> KVM: x86/pmu: Always treat Fixed counters as available when supported
> KVM: x86/pmu: Allow programming events that match unsupported arch events
> KVM: x86/pmu: Remove KVM's enumeration of Intel's architectural encodings
> KVM: x86/pmu: Setup fixed counters' eventsel during PMU initialization
> KVM: x86/pmu: Get eventsel for fixed counters from perf
> KVM: x86/pmu: Don't ignore bits 31:30 for RDPMC index on AMD
> KVM: x86/pmu: Prioritize VMX interception over #GP on RDPMC due to bad index
> KVM: x86/pmu: Apply "fast" RDPMC only to Intel PMUs
> KVM: x86/pmu: Disallow "fast" RDPMC for architectural Intel PMUs
> KVM: x86/pmu: Treat "fixed" PMU type in RDPMC as index as a value, not flag
> KVM: x86/pmu: Explicitly check for RDPMC of unsupported Intel PMC types
> KVM: selftests: Drop the "name" param from KVM_X86_PMU_FEATURE()
> KVM: selftests: Extend {kvm,this}_pmu_has() to support fixed counters
> KVM: selftests: Expand PMU counters test to verify LLC events
> KVM: selftests: Add a helper to query if the PMU module param is enabled
> KVM: selftests: Add helpers to read integer module params
> KVM: selftests: Query module param to detect FEP in MSR filtering test
> KVM: selftests: Move KVM_FEP macro into common library header
> KVM: selftests: Test PMC virtualization with forced emulation
> KVM: selftests: Add a forced emulation variation of KVM_ASM_SAFE()
> KVM: selftests: Add helpers for safe and safe+forced RDMSR, RDPMC, and XGETBV
> KVM: selftests: Extend PMU counters test to validate RDPMC after WRMSR
> KVM: x86/pmu: Zero out PMU metadata on AMD if PMU is disabled
> KVM: x86/pmu: Add common define to capture fixed counters offset
> KVM: x86/pmu: Move pmc_idx => pmc translation helper to common code
> KVM: x86/pmu: Snapshot and clear reprogramming bitmap before reprogramming
> KVM: x86/pmu: Add macros to iterate over all PMCs given a bitmap
> KVM: x86/pmu: Process only enabled PMCs when emulating events in software
> KVM: x86/pmu: Snapshot event selectors that KVM emulates in software
> KVM: x86/pmu: Expand the comment about what bits are check emulating events
> KVM: x86/pmu: Check eventsel first when emulating (branch) insns retired
> KVM: x86/pmu: Avoid CPL lookup if PMC enabline for USER and KERNEL is the same
>
> arch/x86/include/asm/kvm-x86-pmu-ops.h | 4 +-
> arch/x86/include/asm/kvm_host.h | 11 +-
> arch/x86/kvm/emulate.c | 2 +-
> arch/x86/kvm/kvm_emulate.h | 2 +-
> arch/x86/kvm/pmu.c | 163 ++++--
> arch/x86/kvm/pmu.h | 57 +-
> arch/x86/kvm/svm/pmu.c | 22 +-
> arch/x86/kvm/vmx/nested.c | 2 +-
> arch/x86/kvm/vmx/pmu_intel.c | 222 +++-----
> arch/x86/kvm/x86.c | 15 +-
> arch/x86/kvm/x86.h | 6 -
> tools/testing/selftests/kvm/Makefile | 2 +
> .../testing/selftests/kvm/include/kvm_util_base.h | 4 +
> tools/testing/selftests/kvm/include/x86_64/pmu.h | 97 ++++
> .../selftests/kvm/include/x86_64/processor.h | 148 +++--
> tools/testing/selftests/kvm/lib/kvm_util.c | 62 ++-
> tools/testing/selftests/kvm/lib/x86_64/pmu.c | 31 ++
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 15 +-
> .../selftests/kvm/x86_64/pmu_counters_test.c | 620 +++++++++++++++++++++
> .../selftests/kvm/x86_64/pmu_event_filter_test.c | 143 ++---
> .../kvm/x86_64/smaller_maxphyaddr_emulation_test.c | 2 +-
> .../selftests/kvm/x86_64/userspace_msr_exit_test.c | 29 +-
> .../selftests/kvm/x86_64/vmx_pmu_caps_test.c | 2 +-
> 23 files changed, 1262 insertions(+), 399 deletions(-)
> create mode 100644 tools/testing/selftests/kvm/include/x86_64/pmu.h
> create mode 100644 tools/testing/selftests/kvm/lib/x86_64/pmu.c
> create mode 100644 tools/testing/selftests/kvm/x86_64/pmu_counters_test.c
>


2024-03-11 14:44:41

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: Xen and gfn_to_pfn_cache changes for 6.9

On 3/8/24 23:37, Sean Christopherson wrote:
> Aaaand seeing my one commit in the shortlog made me realize I completely forgot
> to get acks from s390 on the kvm_is_error_gpa() => kvm_is_gpa_in_memslot()
> refactor. Fudge.
>
> s390 folks, my apologies for not reaching out earlier. Please take a look at
> commit 9e7325acb3dc ("KVM: s390: Refactor kvm_is_error_gpa() into
> kvm_is_gpa_in_memslot()"). It *should* be a straight refactor, and I don't
> expect the rename to be contentious, but I didn't intend to send this pull request
> before getting an explicit ack.
>
> As for the actual pull request, the bulk of the changes are to add support
> for using gfn_to_pfn caches without a gfn, e.g. to optimize handling of overlay
> pages, and then use that functionality for Xen's shared_info page.
>
> Note, the commits towards the end are a variety of fixes from David that have
> been on the list for a while, but only got applied this week due to issues with
> the patches being corrupted (thanks to Evolution doing weird things).

Evolution?!? :)

> The following changes since commit db7d6fbc10447090bab8691a907a7c383ec66f58:
>
> KVM: remove unnecessary #ifdef (2024-02-08 08:41:06 -0500)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-xen-6.9
>
> for you to fetch changes up to 7a36d680658ba5a0d350f2ad275b97156b8d4333:
>
> KVM: x86/xen: fix recursive deadlock in timer injection (2024-03-04 16:22:39 -0800)

Pulled, thanks.

Paolo

> ----------------------------------------------------------------
> KVM Xen and pfncache changes for 6.9:
>
> - Rip out the half-baked support for using gfn_to_pfn caches to manage pages
> that are "mapped" into guests via physical addresses.
>
> - Add support for using gfn_to_pfn caches with only a host virtual address,
> i.e. to bypass the "gfn" stage of the cache. The primary use case is
> overlay pages, where the guest may change the gfn used to reference the
> overlay page, but the backing hva+pfn remains the same.
>
> - Add an ioctl() to allow mapping Xen's shared_info page using an hva instead
> of a gpa, so that userspace doesn't need to reconfigure and invalidate the
> cache/mapping if the guest changes the gpa (but userspace keeps the resolved
> hva the same).
>
> - When possible, use a single host TSC value when computing the deadline for
> Xen timers in order to improve the accuracy of the timer emulation.
>
> - Inject pending upcall events when the vCPU software-enables its APIC to fix
> a bug where an upcall can be lost (and to follow Xen's behavior).
>
> - Fall back to the slow path instead of warning if "fast" IRQ delivery of Xen
> events fails, e.g. if the guest has aliased xAPIC IDs.
>
> - Extend gfn_to_pfn_cache's mutex to cover (de)activation (in addition to
> refresh), and drop a now-redundant acquisition of xen_lock (that was
> protecting the shared_info cache) to fix a deadlock due to recursively
> acquiring xen_lock.
>
> ----------------------------------------------------------------
> David Woodhouse (5):
> KVM: x86/xen: improve accuracy of Xen timers
> KVM: x86/xen: inject vCPU upcall vector when local APIC is enabled
> KVM: x86/xen: remove WARN_ON_ONCE() with false positives in evtchn delivery
> KVM: pfncache: simplify locking and make more self-contained
> KVM: x86/xen: fix recursive deadlock in timer injection
>
> Paul Durrant (17):
> KVM: pfncache: Add a map helper function
> KVM: pfncache: remove unnecessary exports
> KVM: x86/xen: mark guest pages dirty with the pfncache lock held
> KVM: pfncache: add a mark-dirty helper
> KVM: pfncache: remove KVM_GUEST_USES_PFN usage
> KVM: pfncache: stop open-coding offset_in_page()
> KVM: pfncache: include page offset in uhva and use it consistently
> KVM: pfncache: allow a cache to be activated with a fixed (userspace) HVA
> KVM: x86/xen: separate initialization of shared_info cache and content
> KVM: x86/xen: re-initialize shared_info if guest (32/64-bit) mode is set
> KVM: x86/xen: allow shared_info to be mapped by fixed HVA
> KVM: x86/xen: allow vcpu_info to be mapped by fixed HVA
> KVM: selftests: map Xen's shared_info page using HVA rather than GFN
> KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA
> KVM: x86/xen: advertize the KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA capability
> KVM: pfncache: check the need for invalidation under read lock first
> KVM: x86/xen: allow vcpu_info content to be 'safely' copied
>
> Sean Christopherson (1):
> KVM: s390: Refactor kvm_is_error_gpa() into kvm_is_gpa_in_memslot()
>
> Documentation/virt/kvm/api.rst | 51 +++-
> arch/s390/kvm/diag.c | 2 +-
> arch/s390/kvm/gaccess.c | 14 +-
> arch/s390/kvm/kvm-s390.c | 4 +-
> arch/s390/kvm/priv.c | 4 +-
> arch/s390/kvm/sigp.c | 2 +-
> arch/x86/include/uapi/asm/kvm.h | 9 +-
> arch/x86/kvm/lapic.c | 5 +-
> arch/x86/kvm/x86.c | 68 ++++-
> arch/x86/kvm/x86.h | 1 +
> arch/x86/kvm/xen.c | 325 ++++++++++++++-------
> arch/x86/kvm/xen.h | 18 ++
> include/linux/kvm_host.h | 56 +++-
> include/linux/kvm_types.h | 8 -
> .../testing/selftests/kvm/x86_64/xen_shinfo_test.c | 59 +++-
> virt/kvm/pfncache.c | 245 +++++++++-------
> 16 files changed, 602 insertions(+), 269 deletions(-)
>


2024-03-11 14:46:40

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: VMX changes for 6.9

On 3/8/24 23:37, Sean Christopherson wrote:
> A small series from Dongli to clean up the passthrough MSR bitmap code, and a
> handful of one-off changes.
>
> The following changes since commit 41bccc98fb7931d63d03f326a746ac4d429c1dd3:
>
> Linux 6.8-rc2 (2024-01-28 17:01:12 -0800)
>
> are available in the Git repository at:
>
> https://github.com/kvm-x86/linux.git tags/kvm-x86-vmx-6.9
>
> for you to fetch changes up to 259720c37d51aae21f70060ef96e1f1b08df0652:
>
> KVM: VMX: Combine "check" and "get" APIs for passthrough MSR lookups (2024-02-27 12:29:46 -0800)

Pulled, thanks.

Paolo


2024-03-12 23:01:03

by Sean Christopherson

Subject: Re: [GIT PULL] KVM: x86: Selftests changes for 6.9

On Mon, Mar 11, 2024, Paolo Bonzini wrote:
> On 3/8/24 23:36, Sean Christopherson wrote:
> > Add SEV(-ES) smoke tests, and start building out infrastructure to utilize the
> > "core" selftests harness and TAP. In addition to providing TAP output, using the
> > infrastructure reduces boilerplate code and allows running all testcases in a
> > test, even if a previous testcase fails (compared with today, where a testcase
> > failure is terminal for the entire test).
>
> Hmm, now I remember why I would have liked to include the AMD SEV changes in
> 6.9 --- because they get rid of the "subtype" case in selftests.
>
> It's not a huge deal, it's just a nicer API, and anyway I'm not going to ask
> you to rebase on top of my changes; and you couldn't have known that when we
> talked about it last Wednesday, since the patches are for the moment closely
> guarded on my hard drive.

Heh, though it is obvious in hindsight.

> But it may still be a good reason to sneak those as well in the second week
> of the 6.9 merge window, though I'm not going to make a fuss if you disagree.

My preference is still to wait. I would be very surprised if the subtype code
gains any users in the next few weeks, i.e. I doubt it'll be any harder to rip
out the subtype code in 6.9 versus 6.10.

On the other hand, waiting until 6.10 for the SEV changes will give us a bit more
time to see how they interact with the SNP and TDX series, e.g. in the off chance
there's something in the uAPI that could be done better for SNP and/or TDX.

2024-03-14 18:32:04

by Paolo Bonzini

Subject: Re: [GIT PULL] KVM: x86: MMU changes for 6.9

On Fri, Mar 8, 2024 at 11:37 PM Sean Christopherson <[email protected]> wrote:
>
> - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if
> a reschedule is needed, e.g. if a high priority task needs to run. Because
> KVM doesn't support yielding in the middle of processing a zapped non-leaf
> SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when
> attempting to schedule in a high priority task.
>

Would 2 MiB provide a nice middle ground?

Paolo


2024-03-14 18:38:26

by Sean Christopherson

[permalink] [raw]
Subject: Re: [GIT PULL] KVM: x86: MMU changes for 6.9

On Thu, Mar 14, 2024, Paolo Bonzini wrote:
> On Fri, Mar 8, 2024 at 11:37 PM Sean Christopherson <[email protected]> wrote:
> >
> > - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if
> > a reschedule is needed, e.g. if a high priority task needs to run. Because
> > KVM doesn't support yielding in the middle of processing a zapped non-leaf
> > SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when
> > attempting to schedule in a high priority task.
> >
>
> Would 2 MiB provide a nice middle ground?

Not really?

Zapping at 2MiB definitely fixes the worst of the tail latencies, but there is
still a measurable difference between 2MiB and 4KiB. And on the other side of the
coin, I was unable to observe a meaningful difference in total runtime by zapping
at 2MiB, or even 1GiB, versus 4KiB.

In other words, AFAICT, there's no need to shoot for a middle ground because trying
to zap at larger granularities doesn't buy us anything.

2024-03-14 18:41:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [GIT PULL] KVM: x86: Selftests changes for 6.9

On Tue, Mar 12, 2024, Sean Christopherson wrote:
> On Mon, Mar 11, 2024, Paolo Bonzini wrote:
> > On 3/8/24 23:36, Sean Christopherson wrote:
> > > Add SEV(-ES) smoke tests, and start building out infrastructure to utilize the
> > > > "core" selftests harness and TAP. In addition to providing TAP output, using the
> > > > infrastructure reduces boilerplate code and allows running all testcases in a
> > > test, even if a previous testcase fails (compared with today, where a testcase
> > > failure is terminal for the entire test).
> >
> > Hmm, now I remember why I would have liked to include the AMD SEV changes in
> > 6.9 --- because they get rid of the "subtype" case in selftests.
> >
> > It's not a huge deal, it's just a nicer API, and anyway I'm not going to ask
> > you to rebase on top of my changes; and you couldn't have known that when we
> > talked about it last Wednesday, since the patches are for the moment closely
> > guarded on my hard drive.
>
> Heh, though it is obvious in hindsight.
>
> > But it may still be a good reason to sneak those as well in the second week
> > of the 6.9 merge window, though I'm not going to make a fuss if you disagree.
>
> My preference is still to wait. I would be very surprised if the subtype code
> gains any users in the next few weeks, i.e. I doubt it'll be any harder to rip
> out the subtype code in 6.9 versus 6.10.
>
> On the other hand, waiting until 6.10 for the SEV changes will give us a bit more
> time to see how they interact with the SNP and TDX series, e.g. in the off chance
> there's something in the uAPI that could be done better for SNP and/or TDX.

Though I'll add the belated disclaimer that performance testing is not my strong
suit...

2024-03-14 18:43:55

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [GIT PULL] KVM: x86: MMU changes for 6.9

On Thu, Mar 14, 2024 at 7:38 PM Sean Christopherson <[email protected]> wrote:
>
> On Thu, Mar 14, 2024, Paolo Bonzini wrote:
> > On Fri, Mar 8, 2024 at 11:37 PM Sean Christopherson <[email protected]> wrote:
> > >
> > > - Zap TDP MMU roots at 4KiB granularity to minimize the delay in yielding if
> > > a reschedule is needed, e.g. if a high priority task needs to run. Because
> > > KVM doesn't support yielding in the middle of processing a zapped non-leaf
> > > SPTE, zapping at 1GiB granularity can result in multi-millisecond lag when
> > > attempting to schedule in a high priority task.
> > >
> >
> > Would 2 MiB provide a nice middle ground?
>
> Not really?
>
> Zapping at 2MiB definitely fixes the worst of the tail latencies, but there is
> still a measurable difference between 2MiB and 4KiB.

Yeah, but you said multi-millisecond, so I guessed 5 ms / 512 is roughly a 10
microsecond latency, which should be pretty acceptable (for PREEMPT_RT
tests at Red Hat we shoot at 10-15 worst case, so for CONFIG_PREEMPT
it would be more than enough).
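
The back-of-the-envelope estimate above can be sketched as follows. This is
only an illustration, not a measurement: it assumes x86-64 paging (512 entries
per page-table level), takes "multi-millisecond" to mean a hypothetical 5 ms
for a 1GiB zap, and assumes zap time scales linearly with the size of the
region. The function name is made up for the example.

```python
# Assumption: x86-64 paging, where each page-table level holds 512 entries,
# so one 1GiB entry covers 512 2MiB entries.
ENTRIES_PER_TABLE = 512


def per_child_latency_us(parent_latency_ms: float) -> float:
    """Scale a latency for one parent entry down to one child entry.

    Assumes the cost is spread evenly across the 512 children, which is a
    simplification made for this estimate.
    """
    return parent_latency_ms * 1000.0 / ENTRIES_PER_TABLE


# Hypothetical input: 5 ms stands in for "multi-millisecond" at 1GiB.
estimate = per_child_latency_us(5.0)
print(f"{estimate:.2f} us per 2MiB region")  # prints "9.77 us per 2MiB region"
```

Under those assumptions a 2MiB zap lands just under 10 microseconds, which is
where the "10 microsecond" figure comes from.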

> And on the other side of the
> coin, I was unable to observe a meaningful difference in total runtime by zapping
> at 2MiB, or even 1GiB, versus 4KiB.

Ok, that's the answer.

Paolo

> In other words, AFAICT, there's no need to shoot for a middle ground because trying
> to zap at larger granularities doesn't buy us anything.
>