2022-05-27 06:13:12

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 00/11] KVM: x86: Add a cap to disable NX hugepages on a VM

Given the high cost of NX hugepages in terms of TLB performance, it may
be desirable to disable the mitigation on a per-VM basis. In the case of public
cloud providers with many VMs on a single host, some VMs may be more trusted
than others. In order to maximize performance on critical VMs, while still
providing some protection to the host from iTLB Multihit, allow the mitigation
to be selectively disabled.

Disabling NX hugepages on a VM is relatively straightforward, but I took this
as an opportunity to add some NX hugepages test coverage and clean up selftests
infrastructure a bit.

This series was tested with the new selftest and the rest of the KVM selftests
on an Intel Haswell machine.

Changelog:
v1->v2:
Dropped the complicated memslot refactor in favor of Ricardo Koller's
patch with a similar effect.
Incorporated David Dunn's feedback and reviewed by tag: shortened waits
to speed up test.
v2->v3:
Incorporated a suggestion from David on how to build the NX huge pages
test.
Fixed a build breakage identified by David.
Dropped the per-vm nx_huge_pages field in favor of simply checking the
global + per-VM disable override.
Documented the new capability
Separated out the commit to test disabling NX huge pages
Removed permission check when checking if the disable NX capability is
supported.
Added test coverage for the permission check.
v3->v4:
Collected RB's from Jing and David
Modified stat collection to reduce a memory allocation [David]
Incorporated various improvments to the NX test [David]
Changed the NX disable test to run by default [David]
Removed some now unnecessary commits
Dropped the code to dump KVM stats from the binary stats test, and
factor out parts of the existing test to library functions instead.
[David, Jing, Sean]
Dropped the improvement to a debugging log message as it's no longer
relevant to this series.
v4->v5:
Incorporated cleanup suggestions from David and Sean
Added a patch with style fixes for the binary stats test from Sean
Added a restriction that NX huge pages can only be disabled before
vCPUs are created [Sean]

v5->v6:
Scooped up David's RBs
Added a magic token to skip nx_huge_pages_test when not run via
wrapper script [Sean]
Made the call to nx_huge_pages_test in the wrapper script more
robust [Sean]
Incorportated various nits and comment / documentation suggestions from
Sean.
Improved negative testing of NX disable without reboot permissions. [Sean]

v6->v7:
Collected Peter Xu's Reviewed-by tags
Added stats metadata caching to kvm_util
Misc NX test fixups

v7->v8:
Spell out descriptors in library function names [Sean]
Reorganize stat descriptor size calculation [Sean]
Addded a get_stats_descriptor helper [Sean]
Remove misleading comment about error reporting in read_stat_data() [Sean]
Use unsigned size_t for input to pread [Sean]
Clean up read_stat_data() [Sean]
Add nx_huge_pages_test to .gitignore [Sean]
Fix organization of get_vm_stat() functions. [Sean]
Clean up #defines in NX huge pages test [Sean]
Add flag parsing and reclaim period flag to NX test [Sean]
Don't reduce hugepage allocation for NX test [Sean]
Fix error check when disabling NX huge pages [Sean]
Don't leave reboot permissions on test binary when executing as root [Sean]

Ben Gardon (10):
KVM: selftests: Remove dynamic memory allocation for stats header
KVM: selftests: Read binary stats header in lib
KVM: selftests: Read binary stats desc in lib
KVM: selftests: Read binary stat data in lib
KVM: selftests: Add NX huge pages test
KVM: x86: Fix errant brace in KVM capability handling
KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
KVM: selftests: Factor out calculation of pages needed for a VM
KVM: selftests: Test disabling NX hugepages on a VM
KVM: selftests: Cache binary stats metadata for duration of test

Sean Christopherson (1):
KVM: selftests: Clean up coding style in binary stats test

Documentation/virt/kvm/api.rst | 17 ++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu/mmu_internal.h | 7 +-
arch/x86/kvm/mmu/spte.c | 7 +-
arch/x86/kvm/mmu/spte.h | 3 +-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/x86.c | 32 ++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/.gitignore | 1 +
tools/testing/selftests/kvm/Makefile | 10 +
.../selftests/kvm/include/kvm_util_base.h | 46 +++
.../selftests/kvm/kvm_binary_stats_test.c | 138 +++++----
tools/testing/selftests/kvm/lib/kvm_util.c | 223 ++++++++++++--
.../selftests/kvm/lib/kvm_util_internal.h | 5 +
.../selftests/kvm/x86_64/nx_huge_pages_test.c | 271 ++++++++++++++++++
.../kvm/x86_64/nx_huge_pages_test.sh | 52 ++++
16 files changed, 720 insertions(+), 97 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh

--
2.36.1.124.g0e6072fb45-goog



2022-05-27 06:13:57

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 09/11] KVM: selftests: Factor out calculation of pages needed for a VM

Factor out the calculation of the number of pages needed for a VM to
make it easier to separate creating the VM and adding vCPUs.

Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 4 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++++++++++++++-----
2 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 035168cec451..3c9898c59ea1 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -312,6 +312,10 @@ vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
vm_paddr_t paddr_min, uint32_t memslot);
vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm);

+uint64_t vm_pages_needed(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages);
+
/*
* Create a VM with reasonable defaults
*
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 200d3bb803e0..385f249c2dc5 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -377,7 +377,7 @@ struct kvm_vm *vm_create_without_vcpus(enum vm_guest_mode mode, uint64_t pages)
}

/*
- * VM Create with customized parameters
+ * Get the number of pages needed for a VM
*
* Input Args:
* mode - VM Mode (e.g. VM_MODE_P52V48_4K)
@@ -385,27 +385,17 @@ struct kvm_vm *vm_create_without_vcpus(enum vm_guest_mode mode, uint64_t pages)
* slot0_mem_pages - Slot0 physical memory size
* extra_mem_pages - Non-slot0 physical memory total size
* num_percpu_pages - Per-cpu physical memory pages
- * guest_code - Guest entry point
- * vcpuids - VCPU IDs
*
* Output Args: None
*
* Return:
- * Pointer to opaque structure that describes the created VM.
- *
- * Creates a VM with the mode specified by mode (e.g. VM_MODE_P52V48_4K),
- * with customized slot0 memory size, at least 512 pages currently.
- * extra_mem_pages is only used to calculate the maximum page table size,
- * no real memory allocation for non-slot0 memory in this function.
+ * The number of pages needed for the VM.
*/
-struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
- uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
- uint32_t num_percpu_pages, void *guest_code,
- uint32_t vcpuids[])
+uint64_t vm_pages_needed(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages)
{
uint64_t vcpu_pages, extra_pg_pages, pages;
- struct kvm_vm *vm;
- int i;

/* Force slot0 memory size not small than DEFAULT_GUEST_PHY_PAGES */
if (slot0_mem_pages < DEFAULT_GUEST_PHY_PAGES)
@@ -421,11 +411,48 @@ struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
extra_pg_pages = (slot0_mem_pages + extra_mem_pages + vcpu_pages) / PTES_PER_MIN_PAGE * 2;
pages = slot0_mem_pages + vcpu_pages + extra_pg_pages;

+ pages = vm_adjust_num_guest_pages(mode, pages);
+
+ return pages;
+}
+
+/*
+ * VM Create with customized parameters
+ *
+ * Input Args:
+ * mode - VM Mode (e.g. VM_MODE_P52V48_4K)
+ * nr_vcpus - VCPU count
+ * slot0_mem_pages - Slot0 physical memory size
+ * extra_mem_pages - Non-slot0 physical memory total size
+ * num_percpu_pages - Per-cpu physical memory pages
+ * guest_code - Guest entry point
+ * vcpuids - VCPU IDs
+ *
+ * Output Args: None
+ *
+ * Return:
+ * Pointer to opaque structure that describes the created VM.
+ *
+ * Creates a VM with the mode specified by mode (e.g. VM_MODE_P52V48_4K),
+ * with customized slot0 memory size, at least 512 pages currently.
+ * extra_mem_pages is only used to calculate the maximum page table size,
+ * no real memory allocation for non-slot0 memory in this function.
+ */
+struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages, void *guest_code,
+ uint32_t vcpuids[])
+{
+ uint64_t pages;
+ struct kvm_vm *vm;
+ int i;
+
TEST_ASSERT(nr_vcpus <= kvm_check_cap(KVM_CAP_MAX_VCPUS),
"nr_vcpus = %d too large for host, max-vcpus = %d",
nr_vcpus, kvm_check_cap(KVM_CAP_MAX_VCPUS));

- pages = vm_adjust_num_guest_pages(mode, pages);
+ pages = vm_pages_needed(mode, nr_vcpus, slot0_mem_pages,
+ extra_mem_pages, num_percpu_pages);

vm = vm_create_without_vcpus(mode, pages);

--
2.36.1.124.g0e6072fb45-goog


2022-05-27 06:52:31

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 11/11] KVM: selftests: Cache binary stats metadata for duration of test

In order to improve performance across multiple reads of VM stats, cache
the stats metadata in the VM struct.

Signed-off-by: Ben Gardon <[email protected]>
---
tools/testing/selftests/kvm/lib/kvm_util.c | 32 ++++++++++---------
.../selftests/kvm/lib/kvm_util_internal.h | 5 +++
2 files changed, 22 insertions(+), 15 deletions(-)

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 33d4d64c1391..a32b7b4352b9 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -736,6 +736,12 @@ void kvm_vm_free(struct kvm_vm *vmp)
if (vmp == NULL)
return;

+ /* Free cached stats metadata and close FD */
+ if (vmp->stats_fd) {
+ free(vmp->stats_desc);
+ close(vmp->stats_fd);
+ }
+
/* Free userspace_mem_regions. */
hash_for_each_safe(vmp->regions.slot_hash, ctr, node, region, slot_node)
__vm_mem_region_delete(vmp, region, false);
@@ -2694,34 +2700,30 @@ void read_stat_data(int stats_fd, struct kvm_stats_header *header,
void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
size_t max_elements)
{
- struct kvm_stats_desc *stats_desc;
- struct kvm_stats_header header;
struct kvm_stats_desc *desc;
size_t size_desc;
- int stats_fd;
int i;

- stats_fd = vm_get_stats_fd(vm);
-
- read_stats_header(stats_fd, &header);
-
- stats_desc = read_stats_descriptors(stats_fd, &header);
+ if (!vm->stats_fd) {
+ vm->stats_fd = vm_get_stats_fd(vm);
+ read_stats_header(vm->stats_fd, &vm->stats_header);
+ vm->stats_desc = read_stats_descriptors(vm->stats_fd,
+ &vm->stats_header);
+ }

- size_desc = get_stats_descriptor_size(&header);
+ size_desc = get_stats_descriptor_size(&vm->stats_header);

- for (i = 0; i < header.num_desc; ++i) {
- desc = (void *)stats_desc + (i * size_desc);
+ for (i = 0; i < vm->stats_header.num_desc; ++i) {
+ desc = (void *)vm->stats_desc + (i * size_desc);

if (strcmp(desc->name, stat_name))
continue;

- read_stat_data(stats_fd, &header, desc, data, max_elements);
+ read_stat_data(vm->stats_fd, &vm->stats_header, desc,
+ data, max_elements);

break;
}
-
- free(stats_desc);
- close(stats_fd);
}

/* VM disable NX huge pages
diff --git a/tools/testing/selftests/kvm/lib/kvm_util_internal.h b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
index a03febc24ba6..e753edd7b8d3 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util_internal.h
+++ b/tools/testing/selftests/kvm/lib/kvm_util_internal.h
@@ -67,6 +67,11 @@ struct kvm_vm {
vm_vaddr_t idt;
vm_vaddr_t handlers;
uint32_t dirty_ring_size;
+
+ /* Cache of information for binary stats interface */
+ int stats_fd;
+ struct kvm_stats_header stats_header;
+ struct kvm_stats_desc *stats_desc;
};

struct vcpu *vcpu_find(struct kvm_vm *vm, uint32_t vcpuid);
--
2.36.1.124.g0e6072fb45-goog


2022-05-27 08:07:46

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 08/11] KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis

In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.

In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.

Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions. But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.

Suggested-by: Jim Mattson <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
Documentation/virt/kvm/api.rst | 17 +++++++++++++++++
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu/mmu_internal.h | 7 ++++---
arch/x86/kvm/mmu/spte.c | 7 ++++---
arch/x86/kvm/mmu/spte.h | 3 ++-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/x86.c | 30 ++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 1 +
8 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index a2e17baf8a5e..d1aec56a9ab7 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -7975,6 +7975,23 @@ should adjust CPUID leaf 0xA to reflect that the PMU is disabled.
When enabled, KVM will exit to userspace with KVM_EXIT_SYSTEM_EVENT of
type KVM_SYSTEM_EVENT_SUSPEND to process the guest suspend request.

+8.37 KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
+---------------------------
+
+:Capability KVM_CAP_VM_DISABLE_NX_HUGE_PAGES
+:Architectures: x86
+:Type: vm
+:Parameters: arg[0] must be 0.
+:Returns 0 on success, -EPERM if the userspace process does not
+ have CAP_SYS_BOOT, -EINVAL if args[0] is not 0 or any vCPUs have been
+ created.
+
+This capability disables the NX huge pages mitigation for iTLB MULTIHIT.
+
+The capability has no effect if the nx_huge_pages module parameter is not set.
+
+This capability may only be set before any vCPUs are created.
+
9. Known KVM API problems
=========================

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad28b0d1d5ea..de7e910741f2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1268,6 +1268,8 @@ struct kvm_arch {
* the global KVM_MAX_VCPU_IDS may lead to significant memory waste.
*/
u32 max_vcpu_ids;
+
+ bool disable_nx_huge_pages;
};

struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index bd2a26897b97..d7e915f3a013 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -141,9 +141,9 @@ void kvm_flush_remote_tlbs_with_address(struct kvm *kvm,
unsigned int pte_list_count(struct kvm_rmap_head *rmap_head);

extern int nx_huge_pages;
-static inline bool is_nx_huge_page_enabled(void)
+static inline bool is_nx_huge_page_enabled(struct kvm *kvm)
{
- return READ_ONCE(nx_huge_pages);
+ return READ_ONCE(nx_huge_pages) && !kvm->arch.disable_nx_huge_pages;
}

struct kvm_page_fault {
@@ -242,7 +242,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
.user = err & PFERR_USER_MASK,
.prefetch = prefetch,
.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
- .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
+ .nx_huge_page_workaround_enabled =
+ is_nx_huge_page_enabled(vcpu->kvm),

.max_level = KVM_MAX_HUGEPAGE_LEVEL,
.req_level = PG_LEVEL_4K,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index cda1851ec155..f5d0977590f6 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -147,7 +147,7 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
spte |= spte_shadow_accessed_mask(spte);

if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
- is_nx_huge_page_enabled()) {
+ is_nx_huge_page_enabled(vcpu->kvm)) {
pte_access &= ~ACC_EXEC_MASK;
}

@@ -246,7 +246,8 @@ static u64 make_spte_executable(u64 spte)
* This is used during huge page splitting to build the SPTEs that make up the
* new page table.
*/
-u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index)
+u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, int huge_level,
+ int index)
{
u64 child_spte;
int child_level;
@@ -274,7 +275,7 @@ u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index)
* When splitting to a 4K page, mark the page executable as the
* NX hugepage mitigation no longer applies.
*/
- if (is_nx_huge_page_enabled())
+ if (is_nx_huge_page_enabled(kvm))
child_spte = make_spte_executable(child_spte);
}

diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h
index 0127bb6e3c7d..529b76ab8f46 100644
--- a/arch/x86/kvm/mmu/spte.h
+++ b/arch/x86/kvm/mmu/spte.h
@@ -425,7 +425,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
u64 old_spte, bool prefetch, bool can_unsync,
bool host_writable, u64 *new_spte);
-u64 make_huge_page_split_spte(u64 huge_spte, int huge_level, int index);
+u64 make_huge_page_split_spte(struct kvm *kvm, u64 huge_spte, int huge_level,
+ int index);
u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
u64 mark_spte_for_access_track(u64 spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 841feaa48be5..e3aef187dc50 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1488,7 +1488,7 @@ static int tdp_mmu_split_huge_page(struct kvm *kvm, struct tdp_iter *iter,
* not been linked in yet and thus is not reachable from any other CPU.
*/
for (i = 0; i < PT64_ENT_PER_PAGE; i++)
- sp->spt[i] = make_huge_page_split_spte(huge_spte, level, i);
+ sp->spt[i] = make_huge_page_split_spte(kvm, huge_spte, level, i);

/*
* Replace the huge spte with a pointer to the populated lower level
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 33653a008b28..cfa7d87a92ee 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4318,6 +4318,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SYS_ATTRIBUTES:
case KVM_CAP_VAPIC:
case KVM_CAP_ENABLE_CAP:
+ case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
r = 1;
break;
case KVM_CAP_EXIT_HYPERCALL:
@@ -6125,6 +6126,35 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
}
mutex_unlock(&kvm->lock);
break;
+ case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
+ r = -EINVAL;
+
+ /*
+ * Since the risk of disabling NX hugepages is a guest crashing
+ * the system, ensure the userspace process has permission to
+ * reboot the system.
+ *
+ * Note that unlike the reboot() syscall, the process must have
+ * this capability in the root namespace because exposing
+ * /dev/kvm into a container does not limit the scope of the
+ * iTLB multihit bug to that container. In other words,
+ * this must use capable(), not ns_capable().
+ */
+ if (!capable(CAP_SYS_BOOT)) {
+ r = -EPERM;
+ break;
+ }
+
+ if (cap->args[0])
+ break;
+
+ mutex_lock(&kvm->lock);
+ if (!kvm->created_vcpus) {
+ kvm->arch.disable_nx_huge_pages = true;
+ r = 0;
+ }
+ mutex_unlock(&kvm->lock);
+ break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 5088bd9f1922..8e55ed67b867 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1157,6 +1157,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_VM_TSC_CONTROL 214
#define KVM_CAP_SYSTEM_EVENT_DATA 215
#define KVM_CAP_ARM_SYSTEM_SUSPEND 216
+#define KVM_CAP_VM_DISABLE_NX_HUGE_PAGES 217

#ifdef KVM_CAP_IRQ_ROUTING

--
2.36.1.124.g0e6072fb45-goog


2022-05-27 15:57:49

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 10/11] KVM: selftests: Test disabling NX hugepages on a VM

Add an argument to the NX huge pages test to test disabling the feature
on a VM using the new capability.

Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 2 +
tools/testing/selftests/kvm/lib/kvm_util.c | 27 +++-
.../selftests/kvm/x86_64/nx_huge_pages_test.c | 140 ++++++++++++------
.../kvm/x86_64/nx_huge_pages_test.sh | 14 +-
4 files changed, 133 insertions(+), 50 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 3c9898c59ea1..6aa06a312250 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -447,4 +447,6 @@ static inline uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)

uint32_t guest_get_vcpuid(void);

+int __vm_disable_nx_huge_pages(struct kvm_vm *vm);
+
#endif /* SELFTEST_KVM_UTIL_BASE_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 385f249c2dc5..33d4d64c1391 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -112,6 +112,11 @@ int vm_check_cap(struct kvm_vm *vm, long cap)
return ret;
}

+static int __vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap)
+{
+ return ioctl(vm->fd, KVM_ENABLE_CAP, cap);
+}
+
/* VM Enable Capability
*
* Input Args:
@@ -128,7 +133,7 @@ int vm_enable_cap(struct kvm_vm *vm, struct kvm_enable_cap *cap)
{
int ret;

- ret = ioctl(vm->fd, KVM_ENABLE_CAP, cap);
+ ret = __vm_enable_cap(vm, cap);
TEST_ASSERT(ret == 0, "KVM_ENABLE_CAP IOCTL failed,\n"
" rc: %i errno: %i", ret, errno);

@@ -2718,3 +2723,23 @@ void __vm_get_stat(struct kvm_vm *vm, const char *stat_name, uint64_t *data,
free(stats_desc);
close(stats_fd);
}
+
+/* VM disable NX huge pages
+ *
+ * Input Args:
+ * vm - Virtual Machine
+ *
+ * Output Args: None
+ *
+ * Return: On success, 0. -ERRNO on failure.
+ *
+ * Disables NX huge pages for the VM.
+ */
+int __vm_disable_nx_huge_pages(struct kvm_vm *vm)
+{
+ struct kvm_enable_cap cap = { 0 };
+
+ cap.cap = KVM_CAP_VM_DISABLE_NX_HUGE_PAGES;
+ cap.args[0] = 0;
+ return __vm_enable_cap(vm, &cap);
+}
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
index 09e05cda3dcd..9ff554a572c0 100644
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -107,52 +107,37 @@ static void wait_for_reclaim(int reclaim_period_ms)
nanosleep(&ts, NULL);
}

-static void help(char *name)
-{
- puts("");
- printf("usage: %s [-h] [-p period_ms] [-t token]\n", name);
- puts("");
- printf(" -p: The NX reclaim period in miliseconds.\n");
- printf(" -t: The magic token to indicate environment setup is done.\n");
- puts("");
- exit(0);
-}
-
-int main(int argc, char **argv)
+void run_test(int reclaim_period_ms, bool disable_nx_huge_pages,
+ bool reboot_permissions)
{
- int reclaim_period_ms = 0, token = 0, opt;
struct kvm_vm *vm;
+ uint64_t pages;
void *hva;
-
- while ((opt = getopt(argc, argv, "hp:t:")) != -1) {
- switch (opt) {
- case 'p':
- reclaim_period_ms = atoi(optarg);
- break;
- case 't':
- token = atoi(optarg);
- break;
- case 'h':
- default:
- help(argv[0]);
- break;
+ int r;
+
+ pages = vm_pages_needed(VM_MODE_DEFAULT, 1, DEFAULT_GUEST_PHY_PAGES,
+ 0, 0);
+ vm = vm_create_without_vcpus(VM_MODE_DEFAULT, pages);
+
+ if (disable_nx_huge_pages) {
+ /*
+ * Cannot run the test without NX huge pages if the kernel
+ * does not support it.
+ */
+ if (!kvm_check_cap(KVM_CAP_VM_DISABLE_NX_HUGE_PAGES))
+ return;
+
+ r = __vm_disable_nx_huge_pages(vm);
+ if (reboot_permissions) {
+ TEST_ASSERT(!r, "Disabling NX huge pages should succeed if process has reboot permissions");
+ } else {
+ TEST_ASSERT(r == -1 && errno == EPERM,
+ "This process should not have permission to disable NX huge pages");
+ return;
}
}

- if (token != MAGIC_TOKEN) {
- print_skip("This test must be run with the magic token %d.\n"
- "This is done by nx_huge_pages_test.sh, which\n"
- "also handles environment setup for the test.",
- MAGIC_TOKEN);
- exit(KSFT_SKIP);
- }
-
- if (!reclaim_period_ms) {
- print_skip("The NX reclaim period must be specified and non-zero");
- exit(KSFT_SKIP);
- }
-
- vm = vm_create_default(0, 0, guest_code);
+ vm_vcpu_add_default(vm, 0, guest_code);

vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB,
HPAGE_GPA, HPAGE_SLOT,
@@ -185,31 +170,38 @@ int main(int argc, char **argv)
/*
* Next, the guest will execute from the first huge page, causing it
* to be remapped at 4k.
+ *
+ * If NX huge pages are disabled, this should have no effect.
*/
vcpu_run(vm, 0);
- check_2m_page_count(vm, 1);
- check_split_count(vm, 1);
+ check_2m_page_count(vm, disable_nx_huge_pages ? 2 : 1);
+ check_split_count(vm, disable_nx_huge_pages ? 0 : 1);

/*
* Executing from the third huge page (previously unaccessed) will
* cause part to be mapped at 4k.
+ *
+ * If NX huge pages are disabled, it should be mapped at 2M.
*/
vcpu_run(vm, 0);
- check_2m_page_count(vm, 1);
- check_split_count(vm, 2);
+ check_2m_page_count(vm, disable_nx_huge_pages ? 3 : 1);
+ check_split_count(vm, disable_nx_huge_pages ? 0 : 2);

/* Reading from the first huge page again should have no effect. */
vcpu_run(vm, 0);
- check_2m_page_count(vm, 1);
- check_split_count(vm, 2);
+ check_2m_page_count(vm, disable_nx_huge_pages ? 3 : 1);
+ check_split_count(vm, disable_nx_huge_pages ? 0 : 2);

/* Give recovery thread time to run. */
wait_for_reclaim(reclaim_period_ms);

/*
* Now that the reclaimer has run, all the split pages should be gone.
+ *
+ * If NX huge pages are disabled, the relaimer will not run, so
+ * nothing should change from here on.
*/
- check_2m_page_count(vm, 1);
+ check_2m_page_count(vm, disable_nx_huge_pages ? 3 : 1);
check_split_count(vm, 0);

/*
@@ -217,10 +209,62 @@ int main(int argc, char **argv)
* reading from it causes a huge page mapping to be installed.
*/
vcpu_run(vm, 0);
- check_2m_page_count(vm, 2);
+ check_2m_page_count(vm, disable_nx_huge_pages ? 3 : 2);
check_split_count(vm, 0);

kvm_vm_free(vm);
+}
+
+static void help(char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-p period_ms] [-t token]\n", name);
+ puts("");
+ printf(" -p: The NX reclaim period in miliseconds.\n");
+ printf(" -t: The magic token to indicate environment setup is done.\n");
+ printf(" -r: The test has reboot permissions and can disable NX huge pages.\n");
+ puts("");
+ exit(0);
+}
+
+int main(int argc, char **argv)
+{
+ int reclaim_period_ms = 0, token = 0, opt;
+ bool reboot_permissions = false;
+
+ while ((opt = getopt(argc, argv, "hp:t:r")) != -1) {
+ switch (opt) {
+ case 'p':
+ reclaim_period_ms = atoi(optarg);
+ break;
+ case 't':
+ token = atoi(optarg);
+ break;
+ case 'r':
+ reboot_permissions = true;
+ break;
+ case 'h':
+ default:
+ help(argv[0]);
+ break;
+ }
+ }
+
+ if (token != MAGIC_TOKEN) {
+ print_skip("This test must be run with the magic token %d.\n"
+ "This is done by nx_huge_pages_test.sh, which\n"
+ "also handles environment setup for the test.",
+ MAGIC_TOKEN);
+ exit(KSFT_SKIP);
+ }
+
+ if (!reclaim_period_ms) {
+ print_skip("The NX reclaim period must be specified and non-zero");
+ exit(KSFT_SKIP);
+ }
+
+ run_test(reclaim_period_ms, false, reboot_permissions);
+ run_test(reclaim_period_ms, true, reboot_permissions);

return 0;
}
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
index 4e090a84f5f3..6bd8e026ee61 100755
--- a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
@@ -20,6 +20,8 @@ function sudo_echo () {
echo "$1" | sudo tee -a "$2" > /dev/null
}

+NXECUTABLE="$(dirname $0)/nx_huge_pages_test"
+
(
set -e

@@ -28,7 +30,17 @@ function sudo_echo () {
sudo_echo 100 /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
sudo_echo "$(( $HUGE_PAGES + 3 ))" /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

- "$(dirname $0)"/nx_huge_pages_test -t 887563923 -p 100
+ # Test with reboot permissions
+ if [ $(whoami) != "root" ] ; then
+ sudo setcap cap_sys_boot+ep $NXECUTABLE
+ fi
+ $NXECUTABLE -t 887563923 -p 100 -r
+
+ # Test without reboot permissions
+ if [ $(whoami) != "root" ] ; then
+ sudo setcap cap_sys_boot-ep $NXECUTABLE
+ $NXECUTABLE -t 887563923 -p 100
+ fi
)
RET=$?

--
2.36.1.124.g0e6072fb45-goog


2022-05-28 06:30:26

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 07/11] KVM: x86: Fix errant brace in KVM capability handling

The braces around the KVM_CAP_XSAVE2 block also surround the
KVM_CAP_PMU_CAPABILITY block, likely the result of a merge issue. Simply
move the curly brace back to where it belongs.

Fixes: ba7bb663f5547 ("KVM: x86: Provide per VM capability for disabling PMU virtualization")

Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7460b9a77d9a..33653a008b28 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4414,10 +4414,10 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
if (r < sizeof(struct kvm_xsave))
r = sizeof(struct kvm_xsave);
break;
+ }
case KVM_CAP_PMU_CAPABILITY:
r = enable_pmu ? KVM_CAP_PMU_VALID_MASK : 0;
break;
- }
case KVM_CAP_DISABLE_QUIRKS2:
r = KVM_X86_VALID_QUIRKS;
break;
--
2.36.1.124.g0e6072fb45-goog


2022-05-28 09:38:17

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 01/11] KVM: selftests: Remove dynamic memory allocation for stats header

There's no need to allocate dynamic memory for the stats header since
its size is known at compile time.

Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/kvm_binary_stats_test.c | 58 +++++++++----------
1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index 17f65d514915..dad34d8a41fe 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -26,56 +26,53 @@ static void stats_test(int stats_fd)
int i;
size_t size_desc;
size_t size_data = 0;
- struct kvm_stats_header *header;
+ struct kvm_stats_header header;
char *id;
struct kvm_stats_desc *stats_desc;
u64 *stats_data;
struct kvm_stats_desc *pdesc;

/* Read kvm stats header */
- header = malloc(sizeof(*header));
- TEST_ASSERT(header, "Allocate memory for stats header");
-
- ret = read(stats_fd, header, sizeof(*header));
- TEST_ASSERT(ret == sizeof(*header), "Read stats header");
- size_desc = sizeof(*stats_desc) + header->name_size;
+ ret = read(stats_fd, &header, sizeof(header));
+ TEST_ASSERT(ret == sizeof(header), "Read stats header");
+ size_desc = sizeof(*stats_desc) + header.name_size;

/* Read kvm stats id string */
- id = malloc(header->name_size);
+ id = malloc(header.name_size);
TEST_ASSERT(id, "Allocate memory for id string");
- ret = read(stats_fd, id, header->name_size);
- TEST_ASSERT(ret == header->name_size, "Read id string");
+ ret = read(stats_fd, id, header.name_size);
+ TEST_ASSERT(ret == header.name_size, "Read id string");

/* Check id string, that should start with "kvm" */
- TEST_ASSERT(!strncmp(id, "kvm", 3) && strlen(id) < header->name_size,
+ TEST_ASSERT(!strncmp(id, "kvm", 3) && strlen(id) < header.name_size,
"Invalid KVM stats type, id: %s", id);

/* Sanity check for other fields in header */
- if (header->num_desc == 0) {
+ if (header.num_desc == 0) {
printf("No KVM stats defined!");
return;
}
/* Check overlap */
- TEST_ASSERT(header->desc_offset > 0 && header->data_offset > 0
- && header->desc_offset >= sizeof(*header)
- && header->data_offset >= sizeof(*header),
+ TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
+ && header.desc_offset >= sizeof(header)
+ && header.data_offset >= sizeof(header),
"Invalid offset fields in header");
- TEST_ASSERT(header->desc_offset > header->data_offset ||
- (header->desc_offset + size_desc * header->num_desc <=
- header->data_offset),
+ TEST_ASSERT(header.desc_offset > header.data_offset ||
+ (header.desc_offset + size_desc * header.num_desc <=
+ header.data_offset),
"Descriptor block is overlapped with data block");

/* Allocate memory for stats descriptors */
- stats_desc = calloc(header->num_desc, size_desc);
+ stats_desc = calloc(header.num_desc, size_desc);
TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
/* Read kvm stats descriptors */
ret = pread(stats_fd, stats_desc,
- size_desc * header->num_desc, header->desc_offset);
- TEST_ASSERT(ret == size_desc * header->num_desc,
+ size_desc * header.num_desc, header.desc_offset);
+ TEST_ASSERT(ret == size_desc * header.num_desc,
"Read KVM stats descriptors");

/* Sanity check for fields in descriptors */
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
/* Check type,unit,base boundaries */
TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
@@ -104,7 +101,7 @@ static void stats_test(int stats_fd)
break;
}
/* Check name string */
- TEST_ASSERT(strlen(pdesc->name) < header->name_size,
+ TEST_ASSERT(strlen(pdesc->name) < header.name_size,
"KVM stats name(%s) too long", pdesc->name);
/* Check size field, which should not be zero */
TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
@@ -124,14 +121,14 @@ static void stats_test(int stats_fd)
size_data += pdesc->size * sizeof(*stats_data);
}
/* Check overlap */
- TEST_ASSERT(header->data_offset >= header->desc_offset
- || header->data_offset + size_data <= header->desc_offset,
+ TEST_ASSERT(header.data_offset >= header.desc_offset
+ || header.data_offset + size_data <= header.desc_offset,
"Data block is overlapped with Descriptor block");
/* Check validity of all stats data size */
- TEST_ASSERT(size_data >= header->num_desc * sizeof(*stats_data),
+ TEST_ASSERT(size_data >= header.num_desc * sizeof(*stats_data),
"Data size is not correct");
/* Check stats offset */
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
TEST_ASSERT(pdesc->offset < size_data,
"Invalid offset (%u) for stats: %s",
@@ -142,15 +139,15 @@ static void stats_test(int stats_fd)
stats_data = malloc(size_data);
TEST_ASSERT(stats_data, "Allocate memory for stats data");
/* Read kvm stats data as a bulk */
- ret = pread(stats_fd, stats_data, size_data, header->data_offset);
+ ret = pread(stats_fd, stats_data, size_data, header.data_offset);
TEST_ASSERT(ret == size_data, "Read KVM stats data");
/* Read kvm stats data one by one */
size_data = 0;
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
ret = pread(stats_fd, stats_data,
pdesc->size * sizeof(*stats_data),
- header->data_offset + size_data);
+ header.data_offset + size_data);
TEST_ASSERT(ret == pdesc->size * sizeof(*stats_data),
"Read data of KVM stats: %s", pdesc->name);
size_data += pdesc->size * sizeof(*stats_data);
@@ -159,7 +156,6 @@ static void stats_test(int stats_fd)
free(stats_data);
free(stats_desc);
free(id);
- free(header);
}


--
2.36.1.124.g0e6072fb45-goog


2022-05-28 20:18:46

by Ben Gardon

[permalink] [raw]
Subject: [PATCH v8 02/11] KVM: selftests: Read binary stats header in lib

Move the code to read the binary stats header to the KVM selftests
library. It will be re-used by other tests to check KVM behavior.

No functional change intended.

Reviewed-by: David Matlack <[email protected]>
Reviewed-by: Peter Xu <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 1 +
.../selftests/kvm/kvm_binary_stats_test.c | 4 ++--
tools/testing/selftests/kvm/lib/kvm_util.c | 21 +++++++++++++++++++
3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 92cef0ffb19e..749cded9b157 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -400,6 +400,7 @@ void assert_on_unhandled_exception(struct kvm_vm *vm, uint32_t vcpuid);

int vm_get_stats_fd(struct kvm_vm *vm);
int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid);
+void read_stats_header(int stats_fd, struct kvm_stats_header *header);

uint32_t guest_get_vcpuid(void);

diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index dad34d8a41fe..fb511b42a03e 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -33,8 +33,8 @@ static void stats_test(int stats_fd)
struct kvm_stats_desc *pdesc;

/* Read kvm stats header */
- ret = read(stats_fd, &header, sizeof(header));
- TEST_ASSERT(ret == sizeof(header), "Read stats header");
+ read_stats_header(stats_fd, &header);
+
size_desc = sizeof(*stats_desc) + header.name_size;

/* Read kvm stats id string */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 1665a220abcb..1d75d41f92dc 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2556,3 +2556,24 @@ int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid)

return ioctl(vcpu->fd, KVM_GET_STATS_FD, NULL);
}
+
+/*
+ * Read binary stats header
+ *
+ * Input Args:
+ * stats_fd - the file descriptor for the binary stats file from which to read
+ *
+ * Output Args:
+ * header - a binary stats metadata header to be filled with data
+ *
+ * Return: void
+ *
+ * Read a header for the binary stats interface.
+ */
+void read_stats_header(int stats_fd, struct kvm_stats_header *header)
+{
+ ssize_t ret;
+
+ ret = read(stats_fd, header, sizeof(*header));
+ TEST_ASSERT(ret == sizeof(*header), "Read stats header");
+}
--
2.36.1.124.g0e6072fb45-goog