LinuxLists.cc - [PATCH v6 00/10] KVM: x86: Add a cap to disable NX hugepages on a VM

2022-04-22 18:28:21

Subject: [PATCH v6 00/10] KVM: x86: Add a cap to disable NX hugepages on a VM

Given the high cost of NX hugepages in terms of TLB performance, it may
be desirable to disable the mitigation on a per-VM basis. In the case of public
cloud providers with many VMs on a single host, some VMs may be more trusted
than others. In order to maximize performance on critical VMs, while still
providing some protection to the host from iTLB Multihit, allow the mitigation
to be selectively disabled.

Disabling NX hugepages on a VM is relatively straightforward, but I took this
as an opportunity to add some NX hugepages test coverage and clean up selftests
infrastructure a bit.

This series was tested with the new selftest and the rest of the KVM selftests
on an Intel Haswell machine.

The following tests failed, but I do not believe that has anything to do with
this series:
userspace_io_test
vmx_nested_tsc_scaling_test
vmx_preemption_timer_test

Changelog:
v1->v2:
Dropped the complicated memslot refactor in favor of Ricardo Koller's
patch with a similar effect.
Incorporated David Dunn's feedback and reviewed by tag: shortened waits
to speed up test.
v2->v3:
Incorporated a suggestion from David on how to build the NX huge pages
test.
Fixed a build breakage identified by David.
Dropped the per-vm nx_huge_pages field in favor of simply checking the
global + per-VM disable override.
Documented the new capability
Separated out the commit to test disabling NX huge pages
Removed permission check when checking if the disable NX capability is
supported.
Added test coverage for the permission check.
v3->v4:
Collected RB's from Jing and David
Modified stat collection to reduce a memory allocation [David]
Incorporated various improvments to the NX test [David]
Changed the NX disable test to run by default [David]
Removed some now unnecessary commits
Dropped the code to dump KVM stats from the binary stats test, and
factor out parts of the existing test to library functions instead.
[David, Jing, Sean]
Dropped the improvement to a debugging log message as it's no longer
relevant to this series.
v4->v5:
Incorporated cleanup suggestions from David and Sean
Added a patch with style fixes for the binary stats test from Sean
Added a restriction that NX huge pages can only be disabled before
vCPUs are created [Sean]

v5->v6:
Scooped up David's RBs
Added a magic token to skip nx_huge_pages_test when not run via
wrapper script [Sean]
Made the call to nx_huge_pages_test in the wrapper script more
robust [Sean]
Incorportated various nits and comment / documentation suggestions from
Sean.
Improved negative testing of NX disable without reboot permissions. [Sean]

Ben Gardon (9):
KVM: selftests: Remove dynamic memory allocation for stats header
KVM: selftests: Read binary stats header in lib
KVM: selftests: Read binary stats desc in lib
KVM: selftests: Read binary stat data in lib
KVM: selftests: Add NX huge pages test
KVM: x86: Fix errant brace in KVM capability handling
KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
KVM: selftests: Factor out calculation of pages needed for a VM
KVM: selftests: Test disabling NX hugepages on a VM

Sean Christopherson (1):
KVM: selftests: Clean up coding style in binary stats test

Documentation/virt/kvm/api.rst | 17 ++
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu.h | 8 +-
arch/x86/kvm/mmu/spte.c | 7 +-
arch/x86/kvm/mmu/spte.h | 3 +-
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
arch/x86/kvm/x86.c | 31 ++-
include/uapi/linux/kvm.h | 1 +
tools/testing/selftests/kvm/Makefile | 10 +
.../selftests/kvm/include/kvm_util_base.h | 13 +
.../selftests/kvm/kvm_binary_stats_test.c | 142 +++++-----
tools/testing/selftests/kvm/lib/kvm_util.c | 248 ++++++++++++++++--
.../selftests/kvm/x86_64/nx_huge_pages_test.c | 224 ++++++++++++++++
.../kvm/x86_64/nx_huge_pages_test.sh | 38 +++
14 files changed, 652 insertions(+), 94 deletions(-)
create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh

--
2.36.0.rc0.470.gd361397f0d-goog

2022-04-22 18:37:28

by Ben Gardon

[permalink] [raw]

Subject: [PATCH v6 09/10] KVM: selftests: Factor out calculation of pages needed for a VM

Factor out the calculation of the number of pages needed for a VM to
make it easier to separate creating the VM and adding vCPUs.

Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 4 ++
tools/testing/selftests/kvm/lib/kvm_util.c | 59 ++++++++++++++-----
2 files changed, 47 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 001b55ae25f8..1dac3c6607f1 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -312,6 +312,10 @@ vm_paddr_t vm_phy_pages_alloc(struct kvm_vm *vm, size_t num,
vm_paddr_t paddr_min, uint32_t memslot);
vm_paddr_t vm_alloc_page_table(struct kvm_vm *vm);

+uint64_t vm_pages_needed(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages);
+
/*
* Create a VM with reasonable defaults
*
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 9896cc49eb54..392abd3c323d 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -377,7 +377,7 @@ struct kvm_vm *vm_create_without_vcpus(enum vm_guest_mode mode, uint64_t pages)
}

/*
- * VM Create with customized parameters
+ * Get the number of pages needed for a VM
*
* Input Args:
* mode - VM Mode (e.g. VM_MODE_P52V48_4K)
@@ -385,27 +385,17 @@ struct kvm_vm *vm_create_without_vcpus(enum vm_guest_mode mode, uint64_t pages)
* slot0_mem_pages - Slot0 physical memory size
* extra_mem_pages - Non-slot0 physical memory total size
* num_percpu_pages - Per-cpu physical memory pages
- * guest_code - Guest entry point
- * vcpuids - VCPU IDs
*
* Output Args: None
*
* Return:
- * Pointer to opaque structure that describes the created VM.
- *
- * Creates a VM with the mode specified by mode (e.g. VM_MODE_P52V48_4K),
- * with customized slot0 memory size, at least 512 pages currently.
- * extra_mem_pages is only used to calculate the maximum page table size,
- * no real memory allocation for non-slot0 memory in this function.
+ * The number of pages needed for the VM.
*/
-struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
- uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
- uint32_t num_percpu_pages, void *guest_code,
- uint32_t vcpuids[])
+uint64_t vm_pages_needed(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages)
{
uint64_t vcpu_pages, extra_pg_pages, pages;
- struct kvm_vm *vm;
- int i;

/* Force slot0 memory size not small than DEFAULT_GUEST_PHY_PAGES */
if (slot0_mem_pages < DEFAULT_GUEST_PHY_PAGES)
@@ -421,11 +411,48 @@ struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
extra_pg_pages = (slot0_mem_pages + extra_mem_pages + vcpu_pages) / PTES_PER_MIN_PAGE * 2;
pages = slot0_mem_pages + vcpu_pages + extra_pg_pages;

+ pages = vm_adjust_num_guest_pages(mode, pages);
+
+ return pages;
+}
+
+/*
+ * VM Create with customized parameters
+ *
+ * Input Args:
+ * mode - VM Mode (e.g. VM_MODE_P52V48_4K)
+ * nr_vcpus - VCPU count
+ * slot0_mem_pages - Slot0 physical memory size
+ * extra_mem_pages - Non-slot0 physical memory total size
+ * num_percpu_pages - Per-cpu physical memory pages
+ * guest_code - Guest entry point
+ * vcpuids - VCPU IDs
+ *
+ * Output Args: None
+ *
+ * Return:
+ * Pointer to opaque structure that describes the created VM.
+ *
+ * Creates a VM with the mode specified by mode (e.g. VM_MODE_P52V48_4K),
+ * with customized slot0 memory size, at least 512 pages currently.
+ * extra_mem_pages is only used to calculate the maximum page table size,
+ * no real memory allocation for non-slot0 memory in this function.
+ */
+struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
+ uint64_t slot0_mem_pages, uint64_t extra_mem_pages,
+ uint32_t num_percpu_pages, void *guest_code,
+ uint32_t vcpuids[])
+{
+ uint64_t pages;
+ struct kvm_vm *vm;
+ int i;
+
TEST_ASSERT(nr_vcpus <= kvm_check_cap(KVM_CAP_MAX_VCPUS),
"nr_vcpus = %d too large for host, max-vcpus = %d",
nr_vcpus, kvm_check_cap(KVM_CAP_MAX_VCPUS));

- pages = vm_adjust_num_guest_pages(mode, pages);
+ pages = vm_pages_needed(mode, nr_vcpus, slot0_mem_pages,
+ extra_mem_pages, num_percpu_pages);

vm = vm_create_without_vcpus(mode, pages);

--
2.36.0.rc0.470.gd361397f0d-goog

2022-04-22 19:46:00

by Ben Gardon

[permalink] [raw]

Subject: [PATCH v6 06/10] KVM: selftests: Add NX huge pages test

There's currently no test coverage of NX hugepages in KVM selftests, so
add a basic test to ensure that the feature works as intended.

Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 10 +
.../selftests/kvm/include/kvm_util_base.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 78 ++++++++
.../selftests/kvm/x86_64/nx_huge_pages_test.c | 177 ++++++++++++++++++
.../kvm/x86_64/nx_huge_pages_test.sh | 25 +++
5 files changed, 291 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh

diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index af582d168621..9bb9bce4df37 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -43,6 +43,10 @@ LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handler
LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c

+# Non-compiled test targets
+TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
+
+# Compiled test targets
TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
@@ -104,6 +108,9 @@ TEST_GEN_PROGS_x86_64 += steal_time
TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
TEST_GEN_PROGS_x86_64 += system_counter_offset_test

+# Compiled outputs used by test targets
+TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
+
TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
@@ -142,7 +149,9 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
TEST_GEN_PROGS_riscv += set_memory_region_test
TEST_GEN_PROGS_riscv += kvm_binary_stats_test

+TEST_PROGS += $(TEST_PROGS_$(UNAME_M))
TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
+TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(UNAME_M))
LIBKVM += $(LIBKVM_$(UNAME_M))

INSTALL_HDR_PATH = $(top_srcdir)/usr
@@ -193,6 +202,7 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
all: $(STATIC_LIBS)
$(TEST_GEN_PROGS): $(STATIC_LIBS)
+$(TEST_GEN_PROGS_EXTENDED): $(STATIC_LIBS)

cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
cscope:
diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index 2a3a4d9ed8e3..001b55ae25f8 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -406,6 +406,7 @@ struct kvm_stats_desc *read_stats_desc(int stats_fd,
int read_stat_data(int stats_fd, struct kvm_stats_header *header,
struct kvm_stats_desc *desc, uint64_t *data,
ssize_t max_elements);
+uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name);

uint32_t guest_get_vcpuid(void);

diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index ea4ab64e5997..9896cc49eb54 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -2651,3 +2651,81 @@ int read_stat_data(int stats_fd, struct kvm_stats_header *header,

return ret;
}
+
+/*
+ * Read the data of the named stat
+ *
+ * Input Args:
+ * vm - the VM for which the stat should be read
+ * stat_name - the name of the stat to read
+ * max_elements - the maximum number of 8-byte values to read into data
+ *
+ * Output Args:
+ * data - the buffer into which stat data should be read
+ *
+ * Return:
+ * The number of data elements read into data or -ERRNO on error.
+ *
+ * Read the data values of a specified stat from the binary stats interface.
+ */
+static int __vm_get_stat(struct kvm_vm *vm, const char *stat_name,
+ uint64_t *data, ssize_t max_elements)
+{
+ struct kvm_stats_desc *stats_desc;
+ struct kvm_stats_header header;
+ struct kvm_stats_desc *desc;
+ size_t size_desc;
+ int stats_fd;
+ int ret = -EINVAL;
+ int i;
+
+ stats_fd = vm_get_stats_fd(vm);
+
+ read_stats_header(stats_fd, &header);
+
+ stats_desc = read_stats_desc(stats_fd, &header);
+
+ size_desc = sizeof(struct kvm_stats_desc) + header.name_size;
+
+ /* Read kvm stats data one by one */
+ for (i = 0; i < header.num_desc; ++i) {
+ desc = (void *)stats_desc + (i * size_desc);
+
+ if (strcmp(desc->name, stat_name))
+ continue;
+
+ ret = read_stat_data(stats_fd, &header, desc, data,
+ max_elements);
+ }
+
+ free(stats_desc);
+ close(stats_fd);
+ return ret;
+}
+
+/*
+ * Read the value of the named stat
+ *
+ * Input Args:
+ * vm - the VM for which the stat should be read
+ * stat_name - the name of the stat to read
+ *
+ * Output Args: None
+ *
+ * Return:
+ * The value of the stat
+ *
+ * Reads the value of the named stat through the binary stat interface. If
+ * the named stat has multiple data elements, only the first will be returned.
+ */
+uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
+{
+ uint64_t data;
+ int ret;
+
+ ret = __vm_get_stat(vm, stat_name, &data, 1);
+ TEST_ASSERT(ret == 1,
+ "Stat %s expected to have 1 element, but %d returned",
+ stat_name, ret);
+ return data;
+}
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
new file mode 100644
index 000000000000..1c14368500b7
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * tools/testing/selftests/kvm/nx_huge_page_test.c
+ *
+ * Usage: to be run via nx_huge_page_test.sh, which does the necessary
+ * environment setup and teardown
+ *
+ * Copyright (C) 2022, Google LLC.
+ */
+
+#define _GNU_SOURCE
+
+#include <fcntl.h>
+#include <stdint.h>
+#include <time.h>
+
+#include <test_util.h>
+#include "kvm_util.h"
+
+#define HPAGE_SLOT 10
+#define HPAGE_GVA (23*1024*1024)
+#define HPAGE_GPA (10*1024*1024)
+#define HPAGE_SLOT_NPAGES (512 * 3)
+#define PAGE_SIZE 4096
+
+/*
+ * Passed by nx_huge_pages_test.sh to provide an easy warning if this test is
+ * being run without it.
+ */
+#define MAGIC_TOKEN 887563923
+
+/*
+ * x86 opcode for the return instruction. Used to call into, and then
+ * immediately return from, memory backed with hugepages.
+ */
+#define RETURN_OPCODE 0xC3
+
+/*
+ * Exit the VM after each memory access so that the userspace component of the
+ * test can make assertions about the pages backing the VM.
+ */
+void guest_code(void)
+{
+ uint64_t hpage_1 = HPAGE_GVA;
+ uint64_t hpage_2 = hpage_1 + (PAGE_SIZE * 512);
+ uint64_t hpage_3 = hpage_2 + (PAGE_SIZE * 512);
+
+ READ_ONCE(*(uint64_t *)hpage_1);
+ GUEST_SYNC(1);
+
+ READ_ONCE(*(uint64_t *)hpage_2);
+ GUEST_SYNC(2);
+
+ ((void (*)(void)) hpage_1)();
+ GUEST_SYNC(3);
+
+ ((void (*)(void)) hpage_3)();
+ GUEST_SYNC(4);
+
+ READ_ONCE(*(uint64_t *)hpage_1);
+ GUEST_SYNC(5);
+
+ READ_ONCE(*(uint64_t *)hpage_3);
+ GUEST_SYNC(6);
+}
+
+static void check_2m_page_count(struct kvm_vm *vm, int expected_pages_2m)
+{
+ int actual_pages_2m;
+
+ actual_pages_2m = vm_get_stat(vm, "pages_2m");
+
+ TEST_ASSERT(actual_pages_2m == expected_pages_2m,
+ "Unexpected 2m page count. Expected %d, got %d",
+ expected_pages_2m, actual_pages_2m);
+}
+
+static void check_split_count(struct kvm_vm *vm, int expected_splits)
+{
+ int actual_splits;
+
+ actual_splits = vm_get_stat(vm, "nx_lpage_splits");
+
+ TEST_ASSERT(actual_splits == expected_splits,
+ "Unexpected nx lpage split count. Expected %d, got %d",
+ expected_splits, actual_splits);
+}
+
+int main(int argc, char **argv)
+{
+ struct kvm_vm *vm;
+ struct timespec ts;
+ void *hva;
+
+ if (argc != 2 || strtol(argv[1], NULL, 0) != MAGIC_TOKEN) {
+ printf("This test must be run through nx_huge_pages_test.sh");
+ return KSFT_SKIP;
+ }
+
+ vm = vm_create_default(0, 0, guest_code);
+
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB,
+ HPAGE_GPA, HPAGE_SLOT,
+ HPAGE_SLOT_NPAGES, 0);
+
+ virt_map(vm, HPAGE_GVA, HPAGE_GPA, HPAGE_SLOT_NPAGES);
+
+ hva = addr_gpa2hva(vm, HPAGE_GPA);
+ memset(hva, RETURN_OPCODE, HPAGE_SLOT_NPAGES * PAGE_SIZE);
+
+ check_2m_page_count(vm, 0);
+ check_split_count(vm, 0);
+
+ /*
+ * The guest code will first read from the first hugepage, resulting
+ * in a huge page mapping being created.
+ */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 1);
+ check_split_count(vm, 0);
+
+ /*
+ * Then the guest code will read from the second hugepage, resulting
+ * in another huge page mapping being created.
+ */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 2);
+ check_split_count(vm, 0);
+
+ /*
+ * Next, the guest will execute from the first huge page, causing it
+ * to be remapped at 4k.
+ */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 1);
+ check_split_count(vm, 1);
+
+ /*
+ * Executing from the third huge page (previously unaccessed) will
+ * cause part to be mapped at 4k.
+ */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 1);
+ check_split_count(vm, 2);
+
+ /* Reading from the first huge page again should have no effect. */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 1);
+ check_split_count(vm, 2);
+
+ /*
+ * Give recovery thread time to run. The wrapper script sets
+ * recovery_period_ms to 100, so wait 5x that.
+ */
+ ts.tv_sec = 0;
+ ts.tv_nsec = 500000000;
+ nanosleep(&ts, NULL);
+
+ /*
+ * Now that the reclaimer has run, all the split pages should be gone.
+ */
+ check_2m_page_count(vm, 1);
+ check_split_count(vm, 0);
+
+ /*
+ * The 4k mapping on hpage 3 should have been removed, so check that
+ * reading from it causes a huge page mapping to be installed.
+ */
+ vcpu_run(vm, 0);
+ check_2m_page_count(vm, 2);
+ check_split_count(vm, 0);
+
+ kvm_vm_free(vm);
+
+ return 0;
+}
+
diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
new file mode 100755
index 000000000000..c2429ad8066a
--- /dev/null
+++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
@@ -0,0 +1,25 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-only */
+
+# tools/testing/selftests/kvm/nx_huge_page_test.sh
+# Copyright (C) 2022, Google LLC.
+
+NX_HUGE_PAGES=$(cat /sys/module/kvm/parameters/nx_huge_pages)
+NX_HUGE_PAGES_RECOVERY_RATIO=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio)
+NX_HUGE_PAGES_RECOVERY_PERIOD=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms)
+HUGE_PAGES=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages)
+
+echo 1 > /sys/module/kvm/parameters/nx_huge_pages
+echo 1 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
+echo 100 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
+echo 200 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+"$(dirname $0)"/nx_huge_pages_test 887563923
+RET=$?
+
+echo $NX_HUGE_PAGES > /sys/module/kvm/parameters/nx_huge_pages
+echo $NX_HUGE_PAGES_RECOVERY_RATIO > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
+echo $NX_HUGE_PAGES_RECOVERY_PERIOD > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
+echo $HUGE_PAGES > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
+
+exit $RET
--
2.36.0.rc0.470.gd361397f0d-goog

2022-04-22 20:47:48

by Ben Gardon

[permalink] [raw]

Subject: [PATCH v6 04/10] KVM: selftests: Clean up coding style in binary stats test

From: Sean Christopherson <[email protected]>

Fix a variety of code style violations and/or inconsistencies in the
binary stats test. The 80 char limit is a soft limit and can and should
be ignored/violated if doing so improves the overall code readability.

Specifically, provide consistent indentation and don't split expressions
at arbitrary points just to honor the 80 char limit.

Opportunistically expand/add comments to call out the more subtle aspects
of the code.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/kvm_binary_stats_test.c | 91 ++++++++++++-------
1 file changed, 56 insertions(+), 35 deletions(-)

diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index b49fae45db1e..8b31f8fc7e08 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -35,47 +35,64 @@ static void stats_test(int stats_fd)
/* Read kvm stats header */
read_stats_header(stats_fd, &header);

+ /*
+ * The base size of the descriptor is defined by KVM's ABI, but the
+ * size of the name field is variable as far as KVM's ABI is concerned.
+ * But, the size of name is constant for a given instance of KVM and
+ * is provided by KVM in the overall stats header.
+ */
size_desc = sizeof(*stats_desc) + header.name_size;

/* Read kvm stats id string */
id = malloc(header.name_size);
TEST_ASSERT(id, "Allocate memory for id string");
+
ret = read(stats_fd, id, header.name_size);
TEST_ASSERT(ret == header.name_size, "Read id string");

/* Check id string, that should start with "kvm" */
TEST_ASSERT(!strncmp(id, "kvm", 3) && strlen(id) < header.name_size,
- "Invalid KVM stats type, id: %s", id);
+ "Invalid KVM stats type, id: %s", id);

/* Sanity check for other fields in header */
if (header.num_desc == 0) {
printf("No KVM stats defined!");
return;
}
- /* Check overlap */
- TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
- && header.desc_offset >= sizeof(header)
- && header.data_offset >= sizeof(header),
- "Invalid offset fields in header");
+ /*
+ * The descriptor and data offsets must be valid, they must not overlap
+ * the header, and the descriptor and data blocks must not overlap each
+ * other. Note, the data block is rechecked after its size is known.
+ */
+ TEST_ASSERT(header.desc_offset && header.desc_offset >= sizeof(header) &&
+ header.data_offset && header.data_offset >= sizeof(header),
+ "Invalid offset fields in header");
+
TEST_ASSERT(header.desc_offset > header.data_offset ||
- (header.desc_offset + size_desc * header.num_desc <=
- header.data_offset),
- "Descriptor block is overlapped with data block");
+ (header.desc_offset + size_desc * header.num_desc <= header.data_offset),
+ "Descriptor block is overlapped with data block");

/* Read kvm stats descriptors */
stats_desc = read_stats_desc(stats_fd, &header);

/* Sanity check for fields in descriptors */
for (i = 0; i < header.num_desc; ++i) {
+ /*
+ * Note, size_desc includes the of the name field, which is
+ * variable, i.e. this is NOT equivalent to &stats_desc[i].
+ */
pdesc = (void *)stats_desc + i * size_desc;
- /* Check type,unit,base boundaries */
- TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
- <= KVM_STATS_TYPE_MAX, "Unknown KVM stats type");
- TEST_ASSERT((pdesc->flags & KVM_STATS_UNIT_MASK)
- <= KVM_STATS_UNIT_MAX, "Unknown KVM stats unit");
- TEST_ASSERT((pdesc->flags & KVM_STATS_BASE_MASK)
- <= KVM_STATS_BASE_MAX, "Unknown KVM stats base");
- /* Check exponent for stats unit
+
+ /* Check type, unit, and base boundaries */
+ TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK) <= KVM_STATS_TYPE_MAX,
+ "Unknown KVM stats type");
+ TEST_ASSERT((pdesc->flags & KVM_STATS_UNIT_MASK) <= KVM_STATS_UNIT_MAX,
+ "Unknown KVM stats unit");
+ TEST_ASSERT((pdesc->flags & KVM_STATS_BASE_MASK) <= KVM_STATS_BASE_MAX,
+ "Unknown KVM stats base");
+
+ /*
+ * Check exponent for stats unit
* Exponent for counter should be greater than or equal to 0
* Exponent for unit bytes should be greater than or equal to 0
* Exponent for unit seconds should be less than or equal to 0
@@ -86,47 +103,51 @@ static void stats_test(int stats_fd)
case KVM_STATS_UNIT_NONE:
case KVM_STATS_UNIT_BYTES:
case KVM_STATS_UNIT_CYCLES:
- TEST_ASSERT(pdesc->exponent >= 0,
- "Unsupported KVM stats unit");
+ TEST_ASSERT(pdesc->exponent >= 0, "Unsupported KVM stats unit");
break;
case KVM_STATS_UNIT_SECONDS:
- TEST_ASSERT(pdesc->exponent <= 0,
- "Unsupported KVM stats unit");
+ TEST_ASSERT(pdesc->exponent <= 0, "Unsupported KVM stats unit");
break;
}
/* Check name string */
TEST_ASSERT(strlen(pdesc->name) < header.name_size,
- "KVM stats name(%s) too long", pdesc->name);
+ "KVM stats name(%s) too long", pdesc->name);
/* Check size field, which should not be zero */
- TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
- pdesc->name);
+ TEST_ASSERT(pdesc->size,
+ "KVM descriptor(%s) with size of 0", pdesc->name);
/* Check bucket_size field */
switch (pdesc->flags & KVM_STATS_TYPE_MASK) {
case KVM_STATS_TYPE_LINEAR_HIST:
TEST_ASSERT(pdesc->bucket_size,
- "Bucket size of Linear Histogram stats (%s) is zero",
- pdesc->name);
+ "Bucket size of Linear Histogram stats (%s) is zero",
+ pdesc->name);
break;
default:
TEST_ASSERT(!pdesc->bucket_size,
- "Bucket size of stats (%s) is not zero",
- pdesc->name);
+ "Bucket size of stats (%s) is not zero",
+ pdesc->name);
}
size_data += pdesc->size * sizeof(*stats_data);
}
- /* Check overlap */
- TEST_ASSERT(header.data_offset >= header.desc_offset
- || header.data_offset + size_data <= header.desc_offset,
- "Data block is overlapped with Descriptor block");
+
+ /*
+ * Now that the size of the data block is known, verify the data block
+ * doesn't overlap the descriptor block.
+ */
+ TEST_ASSERT(header.data_offset >= header.desc_offset ||
+ header.data_offset + size_data <= header.desc_offset,
+ "Data block is overlapped with Descriptor block");
+
/* Check validity of all stats data size */
TEST_ASSERT(size_data >= header.num_desc * sizeof(*stats_data),
- "Data size is not correct");
+ "Data size is not correct");
+
/* Check stats offset */
for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
TEST_ASSERT(pdesc->offset < size_data,
- "Invalid offset (%u) for stats: %s",
- pdesc->offset, pdesc->name);
+ "Invalid offset (%u) for stats: %s",
+ pdesc->offset, pdesc->name);
}

/* Allocate memory for stats data */
--
2.36.0.rc0.470.gd361397f0d-goog

2022-04-22 21:59:18

by Peter Xu

[permalink] [raw]

Subject: Re: [PATCH v6 09/10] KVM: selftests: Factor out calculation of pages needed for a VM

On Wed, Apr 20, 2022 at 10:35:12AM -0700, Ben Gardon wrote:
> Factor out the calculation of the number of pages needed for a VM to
> make it easier to separate creating the VM and adding vCPUs.
>
> Reviewed-by: David Matlack <[email protected]>
> Signed-off-by: Ben Gardon <[email protected]>

Reviewed-by: Peter Xu <[email protected]>

--
Peter Xu

2022-04-22 22:02:49

by Ben Gardon

[permalink] [raw]

Subject: [PATCH v6 01/10] KVM: selftests: Remove dynamic memory allocation for stats header

There's no need to allocate dynamic memory for the stats header since
its size is known at compile time.

Reviewed-by: David Matlack <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
.../selftests/kvm/kvm_binary_stats_test.c | 58 +++++++++----------
1 file changed, 27 insertions(+), 31 deletions(-)

diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index 17f65d514915..dad34d8a41fe 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -26,56 +26,53 @@ static void stats_test(int stats_fd)
int i;
size_t size_desc;
size_t size_data = 0;
- struct kvm_stats_header *header;
+ struct kvm_stats_header header;
char *id;
struct kvm_stats_desc *stats_desc;
u64 *stats_data;
struct kvm_stats_desc *pdesc;

/* Read kvm stats header */
- header = malloc(sizeof(*header));
- TEST_ASSERT(header, "Allocate memory for stats header");
-
- ret = read(stats_fd, header, sizeof(*header));
- TEST_ASSERT(ret == sizeof(*header), "Read stats header");
- size_desc = sizeof(*stats_desc) + header->name_size;
+ ret = read(stats_fd, &header, sizeof(header));
+ TEST_ASSERT(ret == sizeof(header), "Read stats header");
+ size_desc = sizeof(*stats_desc) + header.name_size;

/* Read kvm stats id string */
- id = malloc(header->name_size);
+ id = malloc(header.name_size);
TEST_ASSERT(id, "Allocate memory for id string");
- ret = read(stats_fd, id, header->name_size);
- TEST_ASSERT(ret == header->name_size, "Read id string");
+ ret = read(stats_fd, id, header.name_size);
+ TEST_ASSERT(ret == header.name_size, "Read id string");

/* Check id string, that should start with "kvm" */
- TEST_ASSERT(!strncmp(id, "kvm", 3) && strlen(id) < header->name_size,
+ TEST_ASSERT(!strncmp(id, "kvm", 3) && strlen(id) < header.name_size,
"Invalid KVM stats type, id: %s", id);

/* Sanity check for other fields in header */
- if (header->num_desc == 0) {
+ if (header.num_desc == 0) {
printf("No KVM stats defined!");
return;
}
/* Check overlap */
- TEST_ASSERT(header->desc_offset > 0 && header->data_offset > 0
- && header->desc_offset >= sizeof(*header)
- && header->data_offset >= sizeof(*header),
+ TEST_ASSERT(header.desc_offset > 0 && header.data_offset > 0
+ && header.desc_offset >= sizeof(header)
+ && header.data_offset >= sizeof(header),
"Invalid offset fields in header");
- TEST_ASSERT(header->desc_offset > header->data_offset ||
- (header->desc_offset + size_desc * header->num_desc <=
- header->data_offset),
+ TEST_ASSERT(header.desc_offset > header.data_offset ||
+ (header.desc_offset + size_desc * header.num_desc <=
+ header.data_offset),
"Descriptor block is overlapped with data block");

/* Allocate memory for stats descriptors */
- stats_desc = calloc(header->num_desc, size_desc);
+ stats_desc = calloc(header.num_desc, size_desc);
TEST_ASSERT(stats_desc, "Allocate memory for stats descriptors");
/* Read kvm stats descriptors */
ret = pread(stats_fd, stats_desc,
- size_desc * header->num_desc, header->desc_offset);
- TEST_ASSERT(ret == size_desc * header->num_desc,
+ size_desc * header.num_desc, header.desc_offset);
+ TEST_ASSERT(ret == size_desc * header.num_desc,
"Read KVM stats descriptors");

/* Sanity check for fields in descriptors */
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
/* Check type,unit,base boundaries */
TEST_ASSERT((pdesc->flags & KVM_STATS_TYPE_MASK)
@@ -104,7 +101,7 @@ static void stats_test(int stats_fd)
break;
}
/* Check name string */
- TEST_ASSERT(strlen(pdesc->name) < header->name_size,
+ TEST_ASSERT(strlen(pdesc->name) < header.name_size,
"KVM stats name(%s) too long", pdesc->name);
/* Check size field, which should not be zero */
TEST_ASSERT(pdesc->size, "KVM descriptor(%s) with size of 0",
@@ -124,14 +121,14 @@ static void stats_test(int stats_fd)
size_data += pdesc->size * sizeof(*stats_data);
}
/* Check overlap */
- TEST_ASSERT(header->data_offset >= header->desc_offset
- || header->data_offset + size_data <= header->desc_offset,
+ TEST_ASSERT(header.data_offset >= header.desc_offset
+ || header.data_offset + size_data <= header.desc_offset,
"Data block is overlapped with Descriptor block");
/* Check validity of all stats data size */
- TEST_ASSERT(size_data >= header->num_desc * sizeof(*stats_data),
+ TEST_ASSERT(size_data >= header.num_desc * sizeof(*stats_data),
"Data size is not correct");
/* Check stats offset */
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
TEST_ASSERT(pdesc->offset < size_data,
"Invalid offset (%u) for stats: %s",
@@ -142,15 +139,15 @@ static void stats_test(int stats_fd)
stats_data = malloc(size_data);
TEST_ASSERT(stats_data, "Allocate memory for stats data");
/* Read kvm stats data as a bulk */
- ret = pread(stats_fd, stats_data, size_data, header->data_offset);
+ ret = pread(stats_fd, stats_data, size_data, header.data_offset);
TEST_ASSERT(ret == size_data, "Read KVM stats data");
/* Read kvm stats data one by one */
size_data = 0;
- for (i = 0; i < header->num_desc; ++i) {
+ for (i = 0; i < header.num_desc; ++i) {
pdesc = (void *)stats_desc + i * size_desc;
ret = pread(stats_fd, stats_data,
pdesc->size * sizeof(*stats_data),
- header->data_offset + size_data);
+ header.data_offset + size_data);
TEST_ASSERT(ret == pdesc->size * sizeof(*stats_data),
"Read data of KVM stats: %s", pdesc->name);
size_data += pdesc->size * sizeof(*stats_data);
@@ -159,7 +156,6 @@ static void stats_test(int stats_fd)
free(stats_data);
free(stats_desc);
free(id);
- free(header);
}

--
2.36.0.rc0.470.gd361397f0d-goog

2022-04-22 22:36:45

by Peter Xu

[permalink] [raw]

Subject: Re: [PATCH v6 06/10] KVM: selftests: Add NX huge pages test

On Wed, Apr 20, 2022 at 10:35:09AM -0700, Ben Gardon wrote:
> There's currently no test coverage of NX hugepages in KVM selftests, so
> add a basic test to ensure that the feature works as intended.
>
> Reviewed-by: David Matlack <[email protected]>
> Signed-off-by: Ben Gardon <[email protected]>
> ---
> tools/testing/selftests/kvm/Makefile | 10 +
> .../selftests/kvm/include/kvm_util_base.h | 1 +
> tools/testing/selftests/kvm/lib/kvm_util.c | 78 ++++++++
> .../selftests/kvm/x86_64/nx_huge_pages_test.c | 177 ++++++++++++++++++
> .../kvm/x86_64/nx_huge_pages_test.sh | 25 +++
> 5 files changed, 291 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index af582d168621..9bb9bce4df37 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -43,6 +43,10 @@ LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handler
> LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
> LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
>
> +# Non-compiled test targets
> +TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
> +
> +# Compiled test targets
> TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
> TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
> TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
> @@ -104,6 +108,9 @@ TEST_GEN_PROGS_x86_64 += steal_time
> TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> TEST_GEN_PROGS_x86_64 += system_counter_offset_test
>
> +# Compiled outputs used by test targets
> +TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
> +
> TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
> TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
> TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> @@ -142,7 +149,9 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
> TEST_GEN_PROGS_riscv += set_memory_region_test
> TEST_GEN_PROGS_riscv += kvm_binary_stats_test
>
> +TEST_PROGS += $(TEST_PROGS_$(UNAME_M))
> TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> +TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(UNAME_M))
> LIBKVM += $(LIBKVM_$(UNAME_M))
>
> INSTALL_HDR_PATH = $(top_srcdir)/usr
> @@ -193,6 +202,7 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
> x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> all: $(STATIC_LIBS)
> $(TEST_GEN_PROGS): $(STATIC_LIBS)
> +$(TEST_GEN_PROGS_EXTENDED): $(STATIC_LIBS)
>
> cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
> cscope:
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index 2a3a4d9ed8e3..001b55ae25f8 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -406,6 +406,7 @@ struct kvm_stats_desc *read_stats_desc(int stats_fd,
> int read_stat_data(int stats_fd, struct kvm_stats_header *header,
> struct kvm_stats_desc *desc, uint64_t *data,
> ssize_t max_elements);
> +uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name);
>
> uint32_t guest_get_vcpuid(void);
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index ea4ab64e5997..9896cc49eb54 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -2651,3 +2651,81 @@ int read_stat_data(int stats_fd, struct kvm_stats_header *header,
>
> return ret;
> }
> +
> +/*
> + * Read the data of the named stat
> + *
> + * Input Args:
> + * vm - the VM for which the stat should be read
> + * stat_name - the name of the stat to read
> + * max_elements - the maximum number of 8-byte values to read into data
> + *
> + * Output Args:
> + * data - the buffer into which stat data should be read
> + *
> + * Return:
> + * The number of data elements read into data or -ERRNO on error.
> + *
> + * Read the data values of a specified stat from the binary stats interface.
> + */
> +static int __vm_get_stat(struct kvm_vm *vm, const char *stat_name,
> + uint64_t *data, ssize_t max_elements)
> +{
> + struct kvm_stats_desc *stats_desc;
> + struct kvm_stats_header header;
> + struct kvm_stats_desc *desc;
> + size_t size_desc;
> + int stats_fd;
> + int ret = -EINVAL;
> + int i;
> +
> + stats_fd = vm_get_stats_fd(vm);
> +
> + read_stats_header(stats_fd, &header);
> +
> + stats_desc = read_stats_desc(stats_fd, &header);
> +
> + size_desc = sizeof(struct kvm_stats_desc) + header.name_size;
> +
> + /* Read kvm stats data one by one */
> + for (i = 0; i < header.num_desc; ++i) {
> + desc = (void *)stats_desc + (i * size_desc);
> +
> + if (strcmp(desc->name, stat_name))
> + continue;
> +
> + ret = read_stat_data(stats_fd, &header, desc, data,
> + max_elements);

Do we want to stop the loop when found?

> + }
> +
> + free(stats_desc);
> + close(stats_fd);

IIUC we call __vm_get_stat() a lot of times in a single test. It's kind of
against how kvm stats should be used according to the documents provided in
binary_stats.c:

* The file content of a vm/vcpu file descriptor is now defined as below:
* +-------------+
* | Header |
* +-------------+
* | id string |
* +-------------+
* | Descriptors |
* +-------------+
* | Stats Data |
* +-------------+
* Although this function allows userspace to read any amount of data (as long
* as in the limit) from any position, the typical usage would follow below
* steps:
* 1. Read header from offset 0. Get the offset of descriptors and stats data
* and some other necessary information. This is a one-time work for the
* lifecycle of the corresponding vm/vcpu stats fd.
* 2. Read id string from its offset. This is a one-time work for the lifecycle
* of the corresponding vm/vcpu stats fd.
* 3. Read descriptors from its offset and discover all the stats by parsing
* descriptors. This is a one-time work for the lifecycle of the
* corresponding vm/vcpu stats fd.
* 4. Periodically read stats data from its offset using pread.

IMHO it's nicer if we cache steps 1-3 in struct kvm_vm, but no strong
opinion (especially when I only jumped in at v6 :).

I think it'll also start to make a difference when we want a new test case
that'll need to frequently read some stat vars, then hopefully the code
introduced here can be directly reused (and I'm afraid the current approach
could drag things down when the period can be small).

> + return ret;
> +}
> +
> +/*
> + * Read the value of the named stat
> + *
> + * Input Args:
> + * vm - the VM for which the stat should be read
> + * stat_name - the name of the stat to read
> + *
> + * Output Args: None
> + *
> + * Return:
> + * The value of the stat
> + *
> + * Reads the value of the named stat through the binary stat interface. If
> + * the named stat has multiple data elements, only the first will be returned.
> + */
> +uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
> +{
> + uint64_t data;
> + int ret;
> +
> + ret = __vm_get_stat(vm, stat_name, &data, 1);
> + TEST_ASSERT(ret == 1,
> + "Stat %s expected to have 1 element, but %d returned",
> + stat_name, ret);
> + return data;
> +}
> diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> new file mode 100644
> index 000000000000..1c14368500b7
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> @@ -0,0 +1,177 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * tools/testing/selftests/kvm/nx_huge_page_test.c
> + *
> + * Usage: to be run via nx_huge_page_test.sh, which does the necessary
> + * environment setup and teardown
> + *
> + * Copyright (C) 2022, Google LLC.
> + */
> +
> +#define _GNU_SOURCE
> +
> +#include <fcntl.h>
> +#include <stdint.h>
> +#include <time.h>
> +
> +#include <test_util.h>
> +#include "kvm_util.h"
> +
> +#define HPAGE_SLOT 10
> +#define HPAGE_GVA (23*1024*1024)
> +#define HPAGE_GPA (10*1024*1024)
> +#define HPAGE_SLOT_NPAGES (512 * 3)
> +#define PAGE_SIZE 4096
> +
> +/*
> + * Passed by nx_huge_pages_test.sh to provide an easy warning if this test is
> + * being run without it.
> + */
> +#define MAGIC_TOKEN 887563923
> +
> +/*
> + * x86 opcode for the return instruction. Used to call into, and then
> + * immediately return from, memory backed with hugepages.
> + */
> +#define RETURN_OPCODE 0xC3
> +
> +/*
> + * Exit the VM after each memory access so that the userspace component of the
> + * test can make assertions about the pages backing the VM.
> + */
> +void guest_code(void)
> +{
> + uint64_t hpage_1 = HPAGE_GVA;
> + uint64_t hpage_2 = hpage_1 + (PAGE_SIZE * 512);
> + uint64_t hpage_3 = hpage_2 + (PAGE_SIZE * 512);
> +
> + READ_ONCE(*(uint64_t *)hpage_1);
> + GUEST_SYNC(1);
> +
> + READ_ONCE(*(uint64_t *)hpage_2);
> + GUEST_SYNC(2);
> +
> + ((void (*)(void)) hpage_1)();
> + GUEST_SYNC(3);
> +
> + ((void (*)(void)) hpage_3)();
> + GUEST_SYNC(4);
> +
> + READ_ONCE(*(uint64_t *)hpage_1);
> + GUEST_SYNC(5);
> +
> + READ_ONCE(*(uint64_t *)hpage_3);
> + GUEST_SYNC(6);
> +}
> +
> +static void check_2m_page_count(struct kvm_vm *vm, int expected_pages_2m)
> +{
> + int actual_pages_2m;
> +
> + actual_pages_2m = vm_get_stat(vm, "pages_2m");
> +
> + TEST_ASSERT(actual_pages_2m == expected_pages_2m,
> + "Unexpected 2m page count. Expected %d, got %d",
> + expected_pages_2m, actual_pages_2m);
> +}
> +
> +static void check_split_count(struct kvm_vm *vm, int expected_splits)
> +{
> + int actual_splits;
> +
> + actual_splits = vm_get_stat(vm, "nx_lpage_splits");
> +
> + TEST_ASSERT(actual_splits == expected_splits,
> + "Unexpected nx lpage split count. Expected %d, got %d",
> + expected_splits, actual_splits);
> +}
> +
> +int main(int argc, char **argv)
> +{
> + struct kvm_vm *vm;
> + struct timespec ts;
> + void *hva;
> +
> + if (argc != 2 || strtol(argv[1], NULL, 0) != MAGIC_TOKEN) {
> + printf("This test must be run through nx_huge_pages_test.sh");
> + return KSFT_SKIP;
> + }
> +
> + vm = vm_create_default(0, 0, guest_code);
> +
> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB,
> + HPAGE_GPA, HPAGE_SLOT,
> + HPAGE_SLOT_NPAGES, 0);
> +
> + virt_map(vm, HPAGE_GVA, HPAGE_GPA, HPAGE_SLOT_NPAGES);
> +
> + hva = addr_gpa2hva(vm, HPAGE_GPA);
> + memset(hva, RETURN_OPCODE, HPAGE_SLOT_NPAGES * PAGE_SIZE);
> +
> + check_2m_page_count(vm, 0);
> + check_split_count(vm, 0);
> +
> + /*
> + * The guest code will first read from the first hugepage, resulting
> + * in a huge page mapping being created.
> + */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 1);
> + check_split_count(vm, 0);
> +
> + /*
> + * Then the guest code will read from the second hugepage, resulting
> + * in another huge page mapping being created.
> + */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 2);
> + check_split_count(vm, 0);
> +
> + /*
> + * Next, the guest will execute from the first huge page, causing it
> + * to be remapped at 4k.
> + */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 1);
> + check_split_count(vm, 1);
> +
> + /*
> + * Executing from the third huge page (previously unaccessed) will
> + * cause part to be mapped at 4k.
> + */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 1);
> + check_split_count(vm, 2);
> +
> + /* Reading from the first huge page again should have no effect. */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 1);
> + check_split_count(vm, 2);
> +
> + /*
> + * Give recovery thread time to run. The wrapper script sets
> + * recovery_period_ms to 100, so wait 5x that.
> + */
> + ts.tv_sec = 0;
> + ts.tv_nsec = 500000000;
> + nanosleep(&ts, NULL);

I understand that we don't have a good way to get notified on the kernel
thread doing the recycling, but I'd suggest instead of constantly sleeping
500ms, I'd just do a periodically checking below 2m=1 and split=0 for each
100ms and set the timeout higher (e.g., 10 seconds), so (1) on normal
systems it ends even faster, and (2) on hot systems it's even less likely
to fail.

> +
> + /*
> + * Now that the reclaimer has run, all the split pages should be gone.
> + */
> + check_2m_page_count(vm, 1);
> + check_split_count(vm, 0);
> +
> + /*
> + * The 4k mapping on hpage 3 should have been removed, so check that
> + * reading from it causes a huge page mapping to be installed.
> + */
> + vcpu_run(vm, 0);
> + check_2m_page_count(vm, 2);
> + check_split_count(vm, 0);
> +
> + kvm_vm_free(vm);
> +
> + return 0;
> +}
> +
> diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> new file mode 100755
> index 000000000000..c2429ad8066a
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> @@ -0,0 +1,25 @@
> +#!/bin/bash
> +# SPDX-License-Identifier: GPL-2.0-only */
> +
> +# tools/testing/selftests/kvm/nx_huge_page_test.sh
> +# Copyright (C) 2022, Google LLC.
> +
> +NX_HUGE_PAGES=$(cat /sys/module/kvm/parameters/nx_huge_pages)
> +NX_HUGE_PAGES_RECOVERY_RATIO=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio)
> +NX_HUGE_PAGES_RECOVERY_PERIOD=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms)
> +HUGE_PAGES=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages)
> +
> +echo 1 > /sys/module/kvm/parameters/nx_huge_pages
> +echo 1 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
> +echo 100 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
> +echo 200 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

We only need 3 pages, right?

Shall we also better check this "echo" didn't fail?

Thanks,

> +
> +"$(dirname $0)"/nx_huge_pages_test 887563923
> +RET=$?
> +
> +echo $NX_HUGE_PAGES > /sys/module/kvm/parameters/nx_huge_pages
> +echo $NX_HUGE_PAGES_RECOVERY_RATIO > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
> +echo $NX_HUGE_PAGES_RECOVERY_PERIOD > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
> +echo $HUGE_PAGES > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> +
> +exit $RET
> --
> 2.36.0.rc0.470.gd361397f0d-goog
>

--
Peter Xu

2022-04-27 10:45:13

by Ben Gardon

[permalink] [raw]

Subject: Re: [PATCH v6 06/10] KVM: selftests: Add NX huge pages test

On Thu, Apr 21, 2022 at 11:34 AM Peter Xu <[email protected]> wrote:
>
> On Wed, Apr 20, 2022 at 10:35:09AM -0700, Ben Gardon wrote:
> > There's currently no test coverage of NX hugepages in KVM selftests, so
> > add a basic test to ensure that the feature works as intended.
> >
> > Reviewed-by: David Matlack <[email protected]>
> > Signed-off-by: Ben Gardon <[email protected]>
> > ---
> > tools/testing/selftests/kvm/Makefile | 10 +
> > .../selftests/kvm/include/kvm_util_base.h | 1 +
> > tools/testing/selftests/kvm/lib/kvm_util.c | 78 ++++++++
> > .../selftests/kvm/x86_64/nx_huge_pages_test.c | 177 ++++++++++++++++++
> > .../kvm/x86_64/nx_huge_pages_test.sh | 25 +++
> > 5 files changed, 291 insertions(+)
> > create mode 100644 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> > create mode 100755 tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> >
> > diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> > index af582d168621..9bb9bce4df37 100644
> > --- a/tools/testing/selftests/kvm/Makefile
> > +++ b/tools/testing/selftests/kvm/Makefile
> > @@ -43,6 +43,10 @@ LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handler
> > LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c
> > LIBKVM_riscv = lib/riscv/processor.c lib/riscv/ucall.c
> >
> > +# Non-compiled test targets
> > +TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
> > +
> > +# Compiled test targets
> > TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
> > TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
> > TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
> > @@ -104,6 +108,9 @@ TEST_GEN_PROGS_x86_64 += steal_time
> > TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> > TEST_GEN_PROGS_x86_64 += system_counter_offset_test
> >
> > +# Compiled outputs used by test targets
> > +TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
> > +
> > TEST_GEN_PROGS_aarch64 += aarch64/arch_timer
> > TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions
> > TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list
> > @@ -142,7 +149,9 @@ TEST_GEN_PROGS_riscv += kvm_page_table_test
> > TEST_GEN_PROGS_riscv += set_memory_region_test
> > TEST_GEN_PROGS_riscv += kvm_binary_stats_test
> >
> > +TEST_PROGS += $(TEST_PROGS_$(UNAME_M))
> > TEST_GEN_PROGS += $(TEST_GEN_PROGS_$(UNAME_M))
> > +TEST_GEN_PROGS_EXTENDED += $(TEST_GEN_PROGS_EXTENDED_$(UNAME_M))
> > LIBKVM += $(LIBKVM_$(UNAME_M))
> >
> > INSTALL_HDR_PATH = $(top_srcdir)/usr
> > @@ -193,6 +202,7 @@ $(OUTPUT)/libkvm.a: $(LIBKVM_OBJS)
> > x := $(shell mkdir -p $(sort $(dir $(TEST_GEN_PROGS))))
> > all: $(STATIC_LIBS)
> > $(TEST_GEN_PROGS): $(STATIC_LIBS)
> > +$(TEST_GEN_PROGS_EXTENDED): $(STATIC_LIBS)
> >
> > cscope: include_paths = $(LINUX_TOOL_INCLUDE) $(LINUX_HDR_PATH) include lib ..
> > cscope:
> > diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> > index 2a3a4d9ed8e3..001b55ae25f8 100644
> > --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> > +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> > @@ -406,6 +406,7 @@ struct kvm_stats_desc *read_stats_desc(int stats_fd,
> > int read_stat_data(int stats_fd, struct kvm_stats_header *header,
> > struct kvm_stats_desc *desc, uint64_t *data,
> > ssize_t max_elements);
> > +uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name);
> >
> > uint32_t guest_get_vcpuid(void);
> >
> > diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> > index ea4ab64e5997..9896cc49eb54 100644
> > --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> > +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> > @@ -2651,3 +2651,81 @@ int read_stat_data(int stats_fd, struct kvm_stats_header *header,
> >
> > return ret;
> > }
> > +
> > +/*
> > + * Read the data of the named stat
> > + *
> > + * Input Args:
> > + * vm - the VM for which the stat should be read
> > + * stat_name - the name of the stat to read
> > + * max_elements - the maximum number of 8-byte values to read into data
> > + *
> > + * Output Args:
> > + * data - the buffer into which stat data should be read
> > + *
> > + * Return:
> > + * The number of data elements read into data or -ERRNO on error.
> > + *
> > + * Read the data values of a specified stat from the binary stats interface.
> > + */
> > +static int __vm_get_stat(struct kvm_vm *vm, const char *stat_name,
> > + uint64_t *data, ssize_t max_elements)
> > +{
> > + struct kvm_stats_desc *stats_desc;
> > + struct kvm_stats_header header;
> > + struct kvm_stats_desc *desc;
> > + size_t size_desc;
> > + int stats_fd;
> > + int ret = -EINVAL;
> > + int i;
> > +
> > + stats_fd = vm_get_stats_fd(vm);
> > +
> > + read_stats_header(stats_fd, &header);
> > +
> > + stats_desc = read_stats_desc(stats_fd, &header);
> > +
> > + size_desc = sizeof(struct kvm_stats_desc) + header.name_size;
> > +
> > + /* Read kvm stats data one by one */
> > + for (i = 0; i < header.num_desc; ++i) {
> > + desc = (void *)stats_desc + (i * size_desc);
> > +
> > + if (strcmp(desc->name, stat_name))
> > + continue;
> > +
> > + ret = read_stat_data(stats_fd, &header, desc, data,
> > + max_elements);
>
> Do we want to stop the loop when found?

Woops, great point.

>
> > + }
> > +
> > + free(stats_desc);
> > + close(stats_fd);
>
> IIUC we call __vm_get_stat() a lot of times in a single test. It's kind of
> against how kvm stats should be used according to the documents provided in
> binary_stats.c:
>
> * The file content of a vm/vcpu file descriptor is now defined as below:
> * +-------------+
> * | Header |
> * +-------------+
> * | id string |
> * +-------------+
> * | Descriptors |
> * +-------------+
> * | Stats Data |
> * +-------------+
> * Although this function allows userspace to read any amount of data (as long
> * as in the limit) from any position, the typical usage would follow below
> * steps:
> * 1. Read header from offset 0. Get the offset of descriptors and stats data
> * and some other necessary information. This is a one-time work for the
> * lifecycle of the corresponding vm/vcpu stats fd.
> * 2. Read id string from its offset. This is a one-time work for the lifecycle
> * of the corresponding vm/vcpu stats fd.
> * 3. Read descriptors from its offset and discover all the stats by parsing
> * descriptors. This is a one-time work for the lifecycle of the
> * corresponding vm/vcpu stats fd.
> * 4. Periodically read stats data from its offset using pread.
>
> IMHO it's nicer if we cache steps 1-3 in struct kvm_vm, but no strong
> opinion (especially when I only jumped in at v6 :).
>
> I think it'll also start to make a difference when we want a new test case
> that'll need to frequently read some stat vars, then hopefully the code
> introduced here can be directly reused (and I'm afraid the current approach
> could drag things down when the period can be small).

David Matlack had the same feedback and I was inclined to put the
caching off until later, but since I've heard this multiple times,
I'll tack a patch on to the end of this series to do that.

>
> > + return ret;
> > +}
> > +
> > +/*
> > + * Read the value of the named stat
> > + *
> > + * Input Args:
> > + * vm - the VM for which the stat should be read
> > + * stat_name - the name of the stat to read
> > + *
> > + * Output Args: None
> > + *
> > + * Return:
> > + * The value of the stat
> > + *
> > + * Reads the value of the named stat through the binary stat interface. If
> > + * the named stat has multiple data elements, only the first will be returned.
> > + */
> > +uint64_t vm_get_stat(struct kvm_vm *vm, const char *stat_name)
> > +{
> > + uint64_t data;
> > + int ret;
> > +
> > + ret = __vm_get_stat(vm, stat_name, &data, 1);
> > + TEST_ASSERT(ret == 1,
> > + "Stat %s expected to have 1 element, but %d returned",
> > + stat_name, ret);
> > + return data;
> > +}
> > diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> > new file mode 100644
> > index 000000000000..1c14368500b7
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.c
> > @@ -0,0 +1,177 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * tools/testing/selftests/kvm/nx_huge_page_test.c
> > + *
> > + * Usage: to be run via nx_huge_page_test.sh, which does the necessary
> > + * environment setup and teardown
> > + *
> > + * Copyright (C) 2022, Google LLC.
> > + */
> > +
> > +#define _GNU_SOURCE
> > +
> > +#include <fcntl.h>
> > +#include <stdint.h>
> > +#include <time.h>
> > +
> > +#include <test_util.h>
> > +#include "kvm_util.h"
> > +
> > +#define HPAGE_SLOT 10
> > +#define HPAGE_GVA (23*1024*1024)
> > +#define HPAGE_GPA (10*1024*1024)
> > +#define HPAGE_SLOT_NPAGES (512 * 3)
> > +#define PAGE_SIZE 4096
> > +
> > +/*
> > + * Passed by nx_huge_pages_test.sh to provide an easy warning if this test is
> > + * being run without it.
> > + */
> > +#define MAGIC_TOKEN 887563923
> > +
> > +/*
> > + * x86 opcode for the return instruction. Used to call into, and then
> > + * immediately return from, memory backed with hugepages.
> > + */
> > +#define RETURN_OPCODE 0xC3
> > +
> > +/*
> > + * Exit the VM after each memory access so that the userspace component of the
> > + * test can make assertions about the pages backing the VM.
> > + */
> > +void guest_code(void)
> > +{
> > + uint64_t hpage_1 = HPAGE_GVA;
> > + uint64_t hpage_2 = hpage_1 + (PAGE_SIZE * 512);
> > + uint64_t hpage_3 = hpage_2 + (PAGE_SIZE * 512);
> > +
> > + READ_ONCE(*(uint64_t *)hpage_1);
> > + GUEST_SYNC(1);
> > +
> > + READ_ONCE(*(uint64_t *)hpage_2);
> > + GUEST_SYNC(2);
> > +
> > + ((void (*)(void)) hpage_1)();
> > + GUEST_SYNC(3);
> > +
> > + ((void (*)(void)) hpage_3)();
> > + GUEST_SYNC(4);
> > +
> > + READ_ONCE(*(uint64_t *)hpage_1);
> > + GUEST_SYNC(5);
> > +
> > + READ_ONCE(*(uint64_t *)hpage_3);
> > + GUEST_SYNC(6);
> > +}
> > +
> > +static void check_2m_page_count(struct kvm_vm *vm, int expected_pages_2m)
> > +{
> > + int actual_pages_2m;
> > +
> > + actual_pages_2m = vm_get_stat(vm, "pages_2m");
> > +
> > + TEST_ASSERT(actual_pages_2m == expected_pages_2m,
> > + "Unexpected 2m page count. Expected %d, got %d",
> > + expected_pages_2m, actual_pages_2m);
> > +}
> > +
> > +static void check_split_count(struct kvm_vm *vm, int expected_splits)
> > +{
> > + int actual_splits;
> > +
> > + actual_splits = vm_get_stat(vm, "nx_lpage_splits");
> > +
> > + TEST_ASSERT(actual_splits == expected_splits,
> > + "Unexpected nx lpage split count. Expected %d, got %d",
> > + expected_splits, actual_splits);
> > +}
> > +
> > +int main(int argc, char **argv)
> > +{
> > + struct kvm_vm *vm;
> > + struct timespec ts;
> > + void *hva;
> > +
> > + if (argc != 2 || strtol(argv[1], NULL, 0) != MAGIC_TOKEN) {
> > + printf("This test must be run through nx_huge_pages_test.sh");
> > + return KSFT_SKIP;
> > + }
> > +
> > + vm = vm_create_default(0, 0, guest_code);
> > +
> > + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS_HUGETLB,
> > + HPAGE_GPA, HPAGE_SLOT,
> > + HPAGE_SLOT_NPAGES, 0);
> > +
> > + virt_map(vm, HPAGE_GVA, HPAGE_GPA, HPAGE_SLOT_NPAGES);
> > +
> > + hva = addr_gpa2hva(vm, HPAGE_GPA);
> > + memset(hva, RETURN_OPCODE, HPAGE_SLOT_NPAGES * PAGE_SIZE);
> > +
> > + check_2m_page_count(vm, 0);
> > + check_split_count(vm, 0);
> > +
> > + /*
> > + * The guest code will first read from the first hugepage, resulting
> > + * in a huge page mapping being created.
> > + */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 1);
> > + check_split_count(vm, 0);
> > +
> > + /*
> > + * Then the guest code will read from the second hugepage, resulting
> > + * in another huge page mapping being created.
> > + */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 2);
> > + check_split_count(vm, 0);
> > +
> > + /*
> > + * Next, the guest will execute from the first huge page, causing it
> > + * to be remapped at 4k.
> > + */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 1);
> > + check_split_count(vm, 1);
> > +
> > + /*
> > + * Executing from the third huge page (previously unaccessed) will
> > + * cause part to be mapped at 4k.
> > + */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 1);
> > + check_split_count(vm, 2);
> > +
> > + /* Reading from the first huge page again should have no effect. */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 1);
> > + check_split_count(vm, 2);
> > +
> > + /*
> > + * Give recovery thread time to run. The wrapper script sets
> > + * recovery_period_ms to 100, so wait 5x that.
> > + */
> > + ts.tv_sec = 0;
> > + ts.tv_nsec = 500000000;
> > + nanosleep(&ts, NULL);
>
> I understand that we don't have a good way to get notified on the kernel
> thread doing the recycling, but I'd suggest instead of constantly sleeping
> 500ms, I'd just do a periodically checking below 2m=1 and split=0 for each
> 100ms and set the timeout higher (e.g., 10 seconds), so (1) on normal
> systems it ends even faster, and (2) on hot systems it's even less likely
> to fail.

If the reclaimer can't get to these pages in 5x the reclaim period,
something is wrong regardless of how hot the system is. I think we
should fail in that case.
For the sake of simplicity I'm inclined to keep this as-is, but I
agree checking more frequently for completion would be a good way to
make the test run faster.

>
> > +
> > + /*
> > + * Now that the reclaimer has run, all the split pages should be gone.
> > + */
> > + check_2m_page_count(vm, 1);
> > + check_split_count(vm, 0);
> > +
> > + /*
> > + * The 4k mapping on hpage 3 should have been removed, so check that
> > + * reading from it causes a huge page mapping to be installed.
> > + */
> > + vcpu_run(vm, 0);
> > + check_2m_page_count(vm, 2);
> > + check_split_count(vm, 0);
> > +
> > + kvm_vm_free(vm);
> > +
> > + return 0;
> > +}
> > +
> > diff --git a/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> > new file mode 100755
> > index 000000000000..c2429ad8066a
> > --- /dev/null
> > +++ b/tools/testing/selftests/kvm/x86_64/nx_huge_pages_test.sh
> > @@ -0,0 +1,25 @@
> > +#!/bin/bash
> > +# SPDX-License-Identifier: GPL-2.0-only */
> > +
> > +# tools/testing/selftests/kvm/nx_huge_page_test.sh
> > +# Copyright (C) 2022, Google LLC.
> > +
> > +NX_HUGE_PAGES=$(cat /sys/module/kvm/parameters/nx_huge_pages)
> > +NX_HUGE_PAGES_RECOVERY_RATIO=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio)
> > +NX_HUGE_PAGES_RECOVERY_PERIOD=$(cat /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms)
> > +HUGE_PAGES=$(cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages)
> > +
> > +echo 1 > /sys/module/kvm/parameters/nx_huge_pages
> > +echo 1 > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
> > +echo 100 > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
> > +echo 200 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
>
> We only need 3 pages, right?

Ah right, now that we're only mapping the data region with hugepages
this can go down. Before we needed enough memory to store the entire
statically built binary.

>
> Shall we also better check this "echo" didn't fail?

Yes, I will add set -e to check that.

>
> Thanks,
>
> > +
> > +"$(dirname $0)"/nx_huge_pages_test 887563923
> > +RET=$?
> > +
> > +echo $NX_HUGE_PAGES > /sys/module/kvm/parameters/nx_huge_pages
> > +echo $NX_HUGE_PAGES_RECOVERY_RATIO > /sys/module/kvm/parameters/nx_huge_pages_recovery_ratio
> > +echo $NX_HUGE_PAGES_RECOVERY_PERIOD > /sys/module/kvm/parameters/nx_huge_pages_recovery_period_ms
> > +echo $HUGE_PAGES > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
> > +
> > +exit $RET
> > --
> > 2.36.0.rc0.470.gd361397f0d-goog
> >
>
> --
> Peter Xu
>