2024-04-19 09:00:05

by Paolo Bonzini

Subject: [PATCH v4 0/6] KVM: Guest Memory Pre-Population API

Pre-population has been requested several times to mitigate KVM page faults
during guest boot or after live migration. It is also required by TDX
before filling in the initial guest memory with measured contents; while
I am not yet sure that userspace will use this ioctl for that purpose,
if it does not, the code will be reused by a TDX-specific ioctl to
pre-populate the SEPT before invoking TDH.MEM.PAGE.ADD or TDH.MR.EXTEND.

This patch series depends on the other pieces that have been applied
to the kvm-coco-queue branch (and is present on the branch).

Paolo

v3->v4:
- renamed everything to KVM_PRE_FAULT_MEMORY, KVM_CAP_PRE_FAULT_MEMORY,
struct kvm_pre_fault_memory.
- renamed base_address field to gpa
- merged introduction of kvm_tdp_map_page() and kvm_arch_vcpu_map_memory()
in a single patch, moving the latter to mmu.c; did *not* merge them
in a single function though
- removed EINVAL return code for RET_PF_RETRY, do it in KVM and exit
on signal_pending()
- return ENOENT for RET_PF_EMULATE
- do not document the possibility that different VMs can have different
results for KVM_CHECK_EXTENSION(KVM_CAP_PRE_FAULT_MEMORY)
- return long from kvm_arch_vcpu_map_memory(), update size and gpa in
kvm_vcpu_map_memory()
- cover remaining range.size more thoroughly in the selftest

v2->v3:
- no vendor-specific hooks
- just fail if pre-population is invoked while nested virt is active
- just populate page tables for the SMM address space if invoked while
SMM is active
- struct name changed to `kvm_map_memory`
- common code supports KVM_CHECK_EXTENSION(KVM_MAP_MEMORY) on the VM
file descriptor, which allows making this ioctl available only on a
subset of VM types
- if EINTR or EAGAIN happens on the first page, it is returned. Otherwise,
the ioctl *succeeds* but mapping->size is left nonzero. While this
drops the detail as to why the system call was interrupted, it is
consistent with other interruptible system calls such as read().
- the test is no longer x86-specific (though for now it is only compiled
on x86 because no other architecture supports the feature)
- instead of using __weak symbols, the code is conditional on a new
Kconfig CONFIG_KVM_GENERIC_MAP_MEMORY.
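The interrupt/partial-success semantics described in the list above can be
modeled in isolation. The sketch below is plain userspace C, not kernel code;
the mapper and interruption hooks are hypothetical stand-ins for the
arch-specific mapping function and signal_pending():

```c
#include <errno.h>
#include <stdint.h>

struct pre_fault_range {
	uint64_t gpa;
	uint64_t size;
};

/*
 * Model of the generic loop: map chunks until done or interrupted.
 * Returns -EINTR only if nothing was mapped yet; otherwise the call
 * "succeeds" and the caller sees the leftover in range->size,
 * analogous to a short read().
 */
static long pre_fault_loop(struct pre_fault_range *range,
			   long (*map)(struct pre_fault_range *),
			   int (*interrupted)(void))
{
	uint64_t full_size = range->size;
	long r = 0;

	while (range->size) {
		if (interrupted()) {
			r = -EINTR;
			break;
		}
		r = map(range);		/* bytes mapped, or negative errno */
		if (r < 0)
			break;
		range->size -= (uint64_t)r;
		range->gpa += (uint64_t)r;
	}
	return full_size == range->size ? r : 0;
}
```

With this convention, a caller can simply retry while range.size is nonzero
and errno is EINTR, without losing track of progress.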


Isaku Yamahata (6):
KVM: Document KVM_PRE_FAULT_MEMORY ioctl
KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory
KVM: x86/mmu: Extract __kvm_mmu_do_page_fault()
KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level
KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory()
KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY

Documentation/virt/kvm/api.rst | 50 ++++++
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/mmu/mmu.c | 72 +++++++++
arch/x86/kvm/mmu/mmu_internal.h | 42 +++--
arch/x86/kvm/x86.c | 3 +
include/linux/kvm_host.h | 5 +
include/uapi/linux/kvm.h | 10 ++
tools/include/uapi/linux/kvm.h | 8 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 146 ++++++++++++++++++
virt/kvm/Kconfig | 3 +
virt/kvm/kvm_main.c | 63 ++++++++
12 files changed, 390 insertions(+), 14 deletions(-)
create mode 100644 tools/testing/selftests/kvm/pre_fault_memory_test.c

--
2.43.0



2024-04-19 09:00:17

by Paolo Bonzini

Subject: [PATCH 4/6] KVM: x86/mmu: Make __kvm_mmu_do_page_fault() return mapped level

From: Isaku Yamahata <[email protected]>

The guest memory population logic will need to know what page size or level
(4K, 2M, ...) is mapped.

Signed-off-by: Isaku Yamahata <[email protected]>
Message-ID: <eabc3f3e5eb03b370cadf6e1901ea34d7a020adc.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <[email protected]>
---
arch/x86/kvm/mmu/mmu_internal.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 9baae6c223ee..b0a10f5a40dd 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -288,7 +288,8 @@ static inline void kvm_mmu_prepare_memory_fault_exit(struct kvm_vcpu *vcpu,
}

static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
- u64 err, bool prefetch, int *emulation_type)
+ u64 err, bool prefetch,
+ int *emulation_type, u8 *level)
{
struct kvm_page_fault fault = {
.addr = cr2_or_gpa,
@@ -330,6 +331,8 @@ static inline int __kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gp

if (fault.write_fault_to_shadow_pgtable && emulation_type)
*emulation_type |= EMULTYPE_WRITE_PF_TO_SP;
+ if (level)
+ *level = fault.goal_level;

return r;
}
@@ -347,7 +350,8 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
if (!prefetch)
vcpu->stat.pf_taken++;

- r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch, emulation_type);
+ r = __kvm_mmu_do_page_fault(vcpu, cr2_or_gpa, err, prefetch,
+ emulation_type, NULL);

/*
* Similar to above, prefetch faults aren't truly spurious, and the
--
2.43.0



2024-04-19 09:00:41

by Paolo Bonzini

Subject: [PATCH 6/6] KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY

From: Isaku Yamahata <[email protected]>

Add a test case that exercises KVM_PRE_FAULT_MEMORY and runs the guest to
access the pre-populated area. It tests the KVM_PRE_FAULT_MEMORY ioctl for
KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.

Signed-off-by: Isaku Yamahata <[email protected]>
Message-ID: <32427791ef42e5efaafb05d2ac37fa4372715f47.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <[email protected]>
---
tools/include/uapi/linux/kvm.h | 8 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/pre_fault_memory_test.c | 146 ++++++++++++++++++
3 files changed, 155 insertions(+)
create mode 100644 tools/testing/selftests/kvm/pre_fault_memory_test.c

diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
index c3308536482b..4d66d8afdcd1 100644
--- a/tools/include/uapi/linux/kvm.h
+++ b/tools/include/uapi/linux/kvm.h
@@ -2227,4 +2227,12 @@ struct kvm_create_guest_memfd {
__u64 reserved[6];
};

+#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
+
+struct kvm_pre_fault_memory {
+ __u64 gpa;
+ __u64 size;
+ __u64 flags;
+};
+
#endif /* __LINUX_KVM_H */
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 871e2de3eb05..61d581a4bab4 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -144,6 +144,7 @@ TEST_GEN_PROGS_x86_64 += set_memory_region_test
TEST_GEN_PROGS_x86_64 += steal_time
TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
TEST_GEN_PROGS_x86_64 += system_counter_offset_test
+TEST_GEN_PROGS_x86_64 += pre_fault_memory_test

# Compiled outputs used by test targets
TEST_GEN_PROGS_EXTENDED_x86_64 += x86_64/nx_huge_pages_test
diff --git a/tools/testing/selftests/kvm/pre_fault_memory_test.c b/tools/testing/selftests/kvm/pre_fault_memory_test.c
new file mode 100644
index 000000000000..e56eed2c1f05
--- /dev/null
+++ b/tools/testing/selftests/kvm/pre_fault_memory_test.c
@@ -0,0 +1,146 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2024, Intel, Inc
+ *
+ * Author:
+ * Isaku Yamahata <isaku.yamahata at gmail.com>
+ */
+#include <linux/sizes.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+/* Arbitrarily chosen values */
+#define TEST_SIZE (SZ_2M + PAGE_SIZE)
+#define TEST_NPAGES (TEST_SIZE / PAGE_SIZE)
+#define TEST_SLOT 10
+
+static void guest_code(uint64_t base_gpa)
+{
+ volatile uint64_t val __used;
+ int i;
+
+ for (i = 0; i < TEST_NPAGES; i++) {
+ uint64_t *src = (uint64_t *)(base_gpa + i * PAGE_SIZE);
+
+ val = *src;
+ }
+
+ GUEST_DONE();
+}
+
+static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
+ u64 left)
+{
+ struct kvm_pre_fault_memory range = {
+ .gpa = gpa,
+ .size = size,
+ .flags = 0,
+ };
+ u64 prev;
+ int ret, save_errno;
+
+ do {
+ prev = range.size;
+ ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
+ save_errno = errno;
+ TEST_ASSERT((range.size < prev) ^ (ret < 0),
+ "%sexpecting range.size to change on %s",
+ ret < 0 ? "not " : "",
+ ret < 0 ? "failure" : "success");
+ } while (ret >= 0 ? range.size : save_errno == EINTR);
+
+ TEST_ASSERT(range.size == left,
+ "Completed with %lld bytes left, expected %" PRId64,
+ range.size, left);
+
+ if (left == 0)
+ __TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+ else
+ /* No memory slot causes RET_PF_EMULATE. It results in -ENOENT. */
+ __TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+ "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+}
+
+static void __test_pre_fault_memory(unsigned long vm_type, bool private)
+{
+ const struct vm_shape shape = {
+ .mode = VM_MODE_DEFAULT,
+ .type = vm_type,
+ };
+ struct kvm_vcpu *vcpu;
+ struct kvm_run *run;
+ struct kvm_vm *vm;
+ struct ucall uc;
+
+ uint64_t guest_test_phys_mem;
+ uint64_t guest_test_virt_mem;
+ uint64_t alignment, guest_page_size;
+
+ vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
+
+ alignment = guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
+ guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
+#ifdef __s390x__
+ alignment = max(0x100000UL, guest_page_size);
+#else
+ alignment = SZ_2M;
+#endif
+ guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
+ guest_test_virt_mem = guest_test_phys_mem;
+
+ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
+ guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
+ private ? KVM_MEM_GUEST_MEMFD : 0);
+ virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);
+
+ if (private)
+ vm_mem_set_private(vm, guest_test_phys_mem, TEST_SIZE);
+ pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, 0);
+ pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
+ pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
+
+ vcpu_args_set(vcpu, 1, guest_test_virt_mem);
+ vcpu_run(vcpu);
+
+ run = vcpu->run;
+ TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
+ "Wanted KVM_EXIT_IO, got exit reason: %u (%s)",
+ run->exit_reason, exit_reason_str(run->exit_reason));
+
+ switch (get_ucall(vcpu, &uc)) {
+ case UCALL_ABORT:
+ REPORT_GUEST_ASSERT(uc);
+ break;
+ case UCALL_DONE:
+ break;
+ default:
+ TEST_FAIL("Unknown ucall 0x%lx.", uc.cmd);
+ break;
+ }
+
+ kvm_vm_free(vm);
+}
+
+static void test_pre_fault_memory(unsigned long vm_type, bool private)
+{
+ if (vm_type && !(kvm_check_cap(KVM_CAP_VM_TYPES) & BIT(vm_type))) {
+ pr_info("Skipping tests for vm_type 0x%lx\n", vm_type);
+ return;
+ }
+
+ __test_pre_fault_memory(vm_type, private);
+}
+
+int main(int argc, char *argv[])
+{
+ TEST_REQUIRE(kvm_check_cap(KVM_CAP_PRE_FAULT_MEMORY));
+
+ test_pre_fault_memory(0, false);
+#ifdef __x86_64__
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, false);
+ test_pre_fault_memory(KVM_X86_SW_PROTECTED_VM, true);
+#endif
+ return 0;
+}
--
2.43.0


2024-04-19 09:01:05

by Paolo Bonzini

Subject: [PATCH 2/6] KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory

From: Isaku Yamahata <[email protected]>

Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
memory range and calls the arch-specific function. Add stub arch function
as a weak symbol.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Isaku Yamahata <[email protected]>
Reviewed-by: Rick Edgecombe <[email protected]>
Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
Signed-off-by: Paolo Bonzini <[email protected]>
---
include/linux/kvm_host.h | 5 ++++
include/uapi/linux/kvm.h | 10 +++++++
virt/kvm/Kconfig | 3 ++
virt/kvm/kvm_main.c | 63 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 81 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8dea11701ab2..9e9943e5e37c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2478,4 +2478,9 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages
void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
#endif

+#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
+long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+ struct kvm_pre_fault_memory *range);
+#endif
+
#endif
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..917d2964947d 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@ struct kvm_enable_cap {
#define KVM_CAP_MEMORY_ATTRIBUTES 233
#define KVM_CAP_GUEST_MEMFD 234
#define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_PRE_FAULT_MEMORY 236

struct kvm_irq_routing_irqchip {
__u32 irqchip;
@@ -1548,4 +1549,13 @@ struct kvm_create_guest_memfd {
__u64 reserved[6];
};

+#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
+
+struct kvm_pre_fault_memory {
+ __u64 gpa;
+ __u64 size;
+ __u64 flags;
+ __u64 padding[5];
+};
+
#endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 754c6c923427..b14e14cdbfb9 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -67,6 +67,9 @@ config HAVE_KVM_INVALID_WAKEUPS
config KVM_GENERIC_DIRTYLOG_READ_PROTECT
bool

+config KVM_GENERIC_PRE_FAULT_MEMORY
+ bool
+
config KVM_COMPAT
def_bool y
depends on KVM && COMPAT && !(S390 || ARM64 || RISCV)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 38b498669ef9..51d8dbe7e93b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4379,6 +4379,55 @@ static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
return fd;
}

+#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
+static int kvm_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
+ struct kvm_pre_fault_memory *range)
+{
+ int idx;
+ long r;
+ u64 full_size;
+
+ if (range->flags)
+ return -EINVAL;
+
+ if (!PAGE_ALIGNED(range->gpa) ||
+ !PAGE_ALIGNED(range->size) ||
+ range->gpa + range->size <= range->gpa)
+ return -EINVAL;
+
+ if (!range->size)
+ return 0;
+
+ vcpu_load(vcpu);
+ idx = srcu_read_lock(&vcpu->kvm->srcu);
+
+ full_size = range->size;
+ do {
+ if (signal_pending(current)) {
+ r = -EINTR;
+ break;
+ }
+
+ r = kvm_arch_vcpu_pre_fault_memory(vcpu, range);
+ if (r < 0)
+ break;
+
+ if (WARN_ON_ONCE(r == 0))
+ break;
+
+ range->size -= r;
+ range->gpa += r;
+ cond_resched();
+ } while (range->size);
+
+ srcu_read_unlock(&vcpu->kvm->srcu, idx);
+ vcpu_put(vcpu);
+
+ /* Return success if at least one page was mapped successfully. */
+ return full_size == range->size ? r : 0;
+}
+#endif
+
static long kvm_vcpu_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg)
{
@@ -4580,6 +4629,20 @@ static long kvm_vcpu_ioctl(struct file *filp,
r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
break;
}
+#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
+ case KVM_PRE_FAULT_MEMORY: {
+ struct kvm_pre_fault_memory range;
+
+ r = -EFAULT;
+ if (copy_from_user(&range, argp, sizeof(range)))
+ break;
+ r = kvm_vcpu_pre_fault_memory(vcpu, &range);
+ /* Pass back leftover range. */
+ if (copy_to_user(argp, &range, sizeof(range)))
+ r = -EFAULT;
+ break;
+ }
+#endif
default:
r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
}
--
2.43.0



2024-04-22 06:56:48

by Binbin Wu

Subject: Re: [PATCH 2/6] KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory



On 4/19/2024 4:59 PM, Paolo Bonzini wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
> memory range and calls the arch-specific function. Add stub arch function
> as a weak symbol.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Rick Edgecombe <[email protected]>
> Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> include/linux/kvm_host.h | 5 ++++
> include/uapi/linux/kvm.h | 10 +++++++
> virt/kvm/Kconfig | 3 ++
> virt/kvm/kvm_main.c | 63 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 81 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8dea11701ab2..9e9943e5e37c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2478,4 +2478,9 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages
> void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> #endif
>
> +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range);
> +#endif
> +
> #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..917d2964947d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_MEMORY_ATTRIBUTES 233
> #define KVM_CAP_GUEST_MEMFD 234
> #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_PRE_FAULT_MEMORY 236
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> @@ -1548,4 +1549,13 @@ struct kvm_create_guest_memfd {
> __u64 reserved[6];
> };
>
> +#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
> +
> +struct kvm_pre_fault_memory {
> + __u64 gpa;
> + __u64 size;
> + __u64 flags;
> + __u64 padding[5];
> +};
> +
> #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 754c6c923427..b14e14cdbfb9 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -67,6 +67,9 @@ config HAVE_KVM_INVALID_WAKEUPS
> config KVM_GENERIC_DIRTYLOG_READ_PROTECT
> bool
>
> +config KVM_GENERIC_PRE_FAULT_MEMORY
> + bool
> +
> config KVM_COMPAT
> def_bool y
> depends on KVM && COMPAT && !(S390 || ARM64 || RISCV)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 38b498669ef9..51d8dbe7e93b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4379,6 +4379,55 @@ static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> return fd;
> }
>
> +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> +static int kvm_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range)
> +{
> + int idx;
> + long r;
> + u64 full_size;
> +
> + if (range->flags)
> + return -EINVAL;
> +
> + if (!PAGE_ALIGNED(range->gpa) ||
> + !PAGE_ALIGNED(range->size) ||
> + range->gpa + range->size <= range->gpa)
> + return -EINVAL;
> +
> + if (!range->size)
> + return 0;

The range->size == 0 case is already covered by "range->gpa + range->size <=
range->gpa".

If we want to return success when size is 0 (though I am not sure it's
needed), we need to use "range->gpa + range->size < range->gpa" instead.
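The unsigned-wraparound check under discussion can be exercised in isolation.
A minimal standalone model (not the kernel code itself) showing that the `<=`
form rejects both a zero size and a wrapping gpa + size:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same unsigned-wraparound reasoning as the ioctl's sanity check. */
static bool range_rejected(uint64_t gpa, uint64_t size)
{
	/* true for size == 0 and for gpa + size overflowing 64 bits */
	return gpa + size <= gpa;
}
```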


> +
> + vcpu_load(vcpu);
> + idx = srcu_read_lock(&vcpu->kvm->srcu);
> +
> + full_size = range->size;
> + do {
> + if (signal_pending(current)) {
> + r = -EINTR;
> + break;
> + }
> +
> + r = kvm_arch_vcpu_pre_fault_memory(vcpu, range);
> + if (r < 0)
> + break;
> +
> + if (WARN_ON_ONCE(r == 0))
> + break;
> +
> + range->size -= r;
> + range->gpa += r;
> + cond_resched();
> + } while (range->size);
> +
> + srcu_read_unlock(&vcpu->kvm->srcu, idx);
> + vcpu_put(vcpu);
> +
> + /* Return success if at least one page was mapped successfully. */
> + return full_size == range->size ? r : 0;
> +}
> +#endif
> +
> static long kvm_vcpu_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg)
> {
> @@ -4580,6 +4629,20 @@ static long kvm_vcpu_ioctl(struct file *filp,
> r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> break;
> }
> +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> + case KVM_PRE_FAULT_MEMORY: {
> + struct kvm_pre_fault_memory range;
> +
> + r = -EFAULT;
> + if (copy_from_user(&range, argp, sizeof(range)))
> + break;
> + r = kvm_vcpu_pre_fault_memory(vcpu, &range);
> + /* Pass back leftover range. */
> + if (copy_to_user(argp, &range, sizeof(range)))
> + r = -EFAULT;
> + break;
> + }
> +#endif
> default:
> r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
> }


2024-04-22 09:51:56

by Binbin Wu

Subject: Re: [PATCH 2/6] KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory



On 4/19/2024 4:59 PM, Paolo Bonzini wrote:
> From: Isaku Yamahata <[email protected]>
>
> Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
> memory range and calls the arch-specific function. Add stub arch function
> as a weak symbol.

The description is stale; the weak-symbol stub was removed in v3.


2024-04-22 17:53:54

by Isaku Yamahata

Subject: Re: [PATCH 6/6] KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY

On Fri, Apr 19, 2024 at 04:59:27AM -0400,
Paolo Bonzini <[email protected]> wrote:

> From: Isaku Yamahata <[email protected]>
>
> Add a test case that exercises KVM_PRE_FAULT_MEMORY and runs the guest to
> access the pre-populated area. It tests the KVM_PRE_FAULT_MEMORY ioctl for
> KVM_X86_DEFAULT_VM and KVM_X86_SW_PROTECTED_VM.
>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Message-ID: <32427791ef42e5efaafb05d2ac37fa4372715f47.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> tools/include/uapi/linux/kvm.h | 8 +
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/pre_fault_memory_test.c | 146 ++++++++++++++++++
> 3 files changed, 155 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/pre_fault_memory_test.c
>
> diff --git a/tools/include/uapi/linux/kvm.h b/tools/include/uapi/linux/kvm.h
> index c3308536482b..4d66d8afdcd1 100644
> --- a/tools/include/uapi/linux/kvm.h
> +++ b/tools/include/uapi/linux/kvm.h
> @@ -2227,4 +2227,12 @@ struct kvm_create_guest_memfd {
> __u64 reserved[6];
> };
>
> +#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
> +
> +struct kvm_pre_fault_memory {
> + __u64 gpa;
> + __u64 size;
> + __u64 flags;

nitpick: catch up with the struct update in the kernel header:
+ __u64 padding[5];

> +};
> +
> #endif /* __LINUX_KVM_H */
--
Isaku Yamahata <[email protected]>

2024-04-22 18:00:54

by Isaku Yamahata

Subject: Re: [PATCH 2/6] KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory

On Fri, Apr 19, 2024 at 04:59:23AM -0400,
Paolo Bonzini <[email protected]> wrote:

> From: Isaku Yamahata <[email protected]>
>
> Add a new ioctl KVM_PRE_FAULT_MEMORY in the KVM common code. It iterates on the
> memory range and calls the arch-specific function. Add stub arch function
> as a weak symbol.
>
> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Isaku Yamahata <[email protected]>
> Reviewed-by: Rick Edgecombe <[email protected]>
> Message-ID: <819322b8f25971f2b9933bfa4506e618508ad782.1712785629.git.isaku.yamahata@intel.com>
> Signed-off-by: Paolo Bonzini <[email protected]>
> ---
> include/linux/kvm_host.h | 5 ++++
> include/uapi/linux/kvm.h | 10 +++++++
> virt/kvm/Kconfig | 3 ++
> virt/kvm/kvm_main.c | 63 ++++++++++++++++++++++++++++++++++++++++
> 4 files changed, 81 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 8dea11701ab2..9e9943e5e37c 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -2478,4 +2478,9 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages
> void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
> #endif
>
> +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> +long kvm_arch_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range);
> +#endif
> +
> #endif
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 2190adbe3002..917d2964947d 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -917,6 +917,7 @@ struct kvm_enable_cap {
> #define KVM_CAP_MEMORY_ATTRIBUTES 233
> #define KVM_CAP_GUEST_MEMFD 234
> #define KVM_CAP_VM_TYPES 235
> +#define KVM_CAP_PRE_FAULT_MEMORY 236
>
> struct kvm_irq_routing_irqchip {
> __u32 irqchip;
> @@ -1548,4 +1549,13 @@ struct kvm_create_guest_memfd {
> __u64 reserved[6];
> };
>
> +#define KVM_PRE_FAULT_MEMORY _IOWR(KVMIO, 0xd5, struct kvm_pre_fault_memory)
> +
> +struct kvm_pre_fault_memory {
> + __u64 gpa;
> + __u64 size;
> + __u64 flags;
> + __u64 padding[5];
> +};
> +
> #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 754c6c923427..b14e14cdbfb9 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -67,6 +67,9 @@ config HAVE_KVM_INVALID_WAKEUPS
> config KVM_GENERIC_DIRTYLOG_READ_PROTECT
> bool
>
> +config KVM_GENERIC_PRE_FAULT_MEMORY
> + bool
> +
> config KVM_COMPAT
> def_bool y
> depends on KVM && COMPAT && !(S390 || ARM64 || RISCV)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 38b498669ef9..51d8dbe7e93b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4379,6 +4379,55 @@ static int kvm_vcpu_ioctl_get_stats_fd(struct kvm_vcpu *vcpu)
> return fd;
> }
>
> +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> +static int kvm_vcpu_pre_fault_memory(struct kvm_vcpu *vcpu,
> + struct kvm_pre_fault_memory *range)
> +{
> + int idx;
> + long r;
> + u64 full_size;
> +
> + if (range->flags)
> + return -EINVAL;

To keep future extensibility, check that the padding is zero.
Or will we rely on flags?

if (memchr_inv(range->padding, 0, sizeof(range->padding)))
return -EINVAL;
--
Isaku Yamahata <[email protected]>

2024-04-23 15:20:20

by Xiaoyao Li

Subject: Re: [PATCH 6/6] KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY

On 4/19/2024 4:59 PM, Paolo Bonzini wrote:

..

> +static void __test_pre_fault_memory(unsigned long vm_type, bool private)
> +{
> + const struct vm_shape shape = {
> + .mode = VM_MODE_DEFAULT,
> + .type = vm_type,
> + };
> + struct kvm_vcpu *vcpu;
> + struct kvm_run *run;
> + struct kvm_vm *vm;
> + struct ucall uc;
> +
> + uint64_t guest_test_phys_mem;
> + uint64_t guest_test_virt_mem;
> + uint64_t alignment, guest_page_size;
> +
> + vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
> +
> + alignment = guest_page_size = vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
> + guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
> +#ifdef __s390x__
> + alignment = max(0x100000UL, guest_page_size);
> +#else
> + alignment = SZ_2M;
> +#endif
> + guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
> + guest_test_virt_mem = guest_test_phys_mem;

guest_test_virt_mem cannot simply be assigned guest_test_phys_mem; doing
so makes the following virt_map() fail with

==== Test Assertion Failure ====
lib/x86_64/processor.c:197: sparsebit_is_set(vm->vpages_valid, (vaddr
>> vm->page_shift))
pid=4773 tid=4773 errno=0 - Success
1 0x000000000040f55c: __virt_pg_map at processor.c:197
2 0x000000000040605e: virt_pg_map at kvm_util_base.h:1065
3 (inlined by) virt_map at kvm_util.c:1571
4 0x0000000000402b75: __test_pre_fault_memory at
pre_fault_memory_test.c:96
5 0x000000000040246e: test_pre_fault_memory at
pre_fault_memory_test.c:133 (discriminator 3)
6 (inlined by) main at pre_fault_memory_test.c:140 (discriminator 3)
7 0x00007fcb68429d8f: ?? ??:0
8 0x00007fcb68429e3f: ?? ??:0
9 0x00000000004024e4: _start at ??:?
Invalid virtual address, vaddr: 0xfffffffc00000

> +
> + vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
> + guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
> + private ? KVM_MEM_GUEST_MEMFD : 0);
> + virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);





2024-04-24 02:00:07

by Xiaoyao Li

Subject: Re: [PATCH 6/6] KVM: selftests: x86: Add test for KVM_PRE_FAULT_MEMORY

On 4/23/2024 11:18 PM, Xiaoyao Li wrote:
> On 4/19/2024 4:59 PM, Paolo Bonzini wrote:
>
> ...
>
>> +static void __test_pre_fault_memory(unsigned long vm_type, bool private)
>> +{
>> +    const struct vm_shape shape = {
>> +        .mode = VM_MODE_DEFAULT,
>> +        .type = vm_type,
>> +    };
>> +    struct kvm_vcpu *vcpu;
>> +    struct kvm_run *run;
>> +    struct kvm_vm *vm;
>> +    struct ucall uc;
>> +
>> +    uint64_t guest_test_phys_mem;
>> +    uint64_t guest_test_virt_mem;
>> +    uint64_t alignment, guest_page_size;
>> +
>> +    vm = vm_create_shape_with_one_vcpu(shape, &vcpu, guest_code);
>> +
>> +    alignment = guest_page_size =
>> vm_guest_mode_params[VM_MODE_DEFAULT].page_size;
>> +    guest_test_phys_mem = (vm->max_gfn - TEST_NPAGES) * guest_page_size;
>> +#ifdef __s390x__
>> +    alignment = max(0x100000UL, guest_page_size);
>> +#else
>> +    alignment = SZ_2M;
>> +#endif
>> +    guest_test_phys_mem = align_down(guest_test_phys_mem, alignment);
>> +    guest_test_virt_mem = guest_test_phys_mem;
>
> guest_test_virt_mem cannot be assigned as guest_test_phys_mem, which
> leads to following virt_map() fails with

The root cause is that vm->pa_bits is 52 while vm->va_bits is 48, so
vm->max_gfn is beyond what the virtual address space can cover.
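The failing address can be reproduced with the same arithmetic as the
selftest; the constants below mirror it (513 pages of 4KiB, 2MiB alignment,
pa_bits = 52 as diagnosed):

```c
#include <stdint.h>

#define PAGE_SHIFT   12
#define TEST_NPAGES  513		/* (SZ_2M + PAGE_SIZE) / PAGE_SIZE */
#define SZ_2M        0x200000ULL

static uint64_t align_down(uint64_t v, uint64_t align)
{
	return v & ~(align - 1);
}

/*
 * Reproduce the selftest's guest_test_phys_mem computation for
 * pa_bits = 52 (with 4KiB pages, max_gfn = 2^40 - 1).
 */
static uint64_t failing_gpa(void)
{
	uint64_t max_gfn = (1ULL << (52 - PAGE_SHIFT)) - 1;
	uint64_t gpa = (max_gfn - TEST_NPAGES) << PAGE_SHIFT;

	return align_down(gpa, SZ_2M);	/* 0xfffffffc00000 */
}
```

The result matches the vaddr in the assertion message and exceeds the
48-bit virtual address space, confirming the diagnosis.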

> ==== Test Assertion Failure ====
>   lib/x86_64/processor.c:197: sparsebit_is_set(vm->vpages_valid, (vaddr
> >> vm->page_shift))
>   pid=4773 tid=4773 errno=0 - Success
>      1    0x000000000040f55c: __virt_pg_map at processor.c:197
>      2    0x000000000040605e: virt_pg_map at kvm_util_base.h:1065
>      3     (inlined by) virt_map at kvm_util.c:1571
>      4    0x0000000000402b75: __test_pre_fault_memory at
> pre_fault_memory_test.c:96
>      5    0x000000000040246e: test_pre_fault_memory at
> pre_fault_memory_test.c:133 (discriminator 3)
>      6     (inlined by) main at pre_fault_memory_test.c:140
> (discriminator 3)
>      7    0x00007fcb68429d8f: ?? ??:0
>      8    0x00007fcb68429e3f: ?? ??:0
>      9    0x00000000004024e4: _start at ??:?
>   Invalid virtual address, vaddr: 0xfffffffc00000
>
>> +
>> +    vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS,
>> +                    guest_test_phys_mem, TEST_SLOT, TEST_NPAGES,
>> +                    private ? KVM_MEM_GUEST_MEMFD : 0);
>> +    virt_map(vm, guest_test_virt_mem, guest_test_phys_mem, TEST_NPAGES);


2024-04-24 16:06:27

by Paolo Bonzini

Subject: Re: [PATCH 2/6] KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory

On Mon, Apr 22, 2024 at 7:39 AM Binbin Wu <[email protected]> wrote:
> The range->size == 0 case is already covered by "range->gpa + range->size <=
> range->gpa".
>
> If we want to return success when size is 0 (though I am not sure it's
> needed), we need to use "range->gpa + range->size < range->gpa" instead.

I think it's not needed because it could cause an infinite loop in
(buggy) userspace. Better return -EINVAL.

Paolo

>
> > +
> > + vcpu_load(vcpu);
> > + idx = srcu_read_lock(&vcpu->kvm->srcu);
> > +
> > + full_size = range->size;
> > + do {
> > + if (signal_pending(current)) {
> > + r = -EINTR;
> > + break;
> > + }
> > +
> > + r = kvm_arch_vcpu_pre_fault_memory(vcpu, range);
> > + if (r < 0)
> > + break;
> > +
> > + if (WARN_ON_ONCE(r == 0))
> > + break;
> > +
> > + range->size -= r;
> > + range->gpa += r;
> > + cond_resched();
> > + } while (range->size);
> > +
> > + srcu_read_unlock(&vcpu->kvm->srcu, idx);
> > + vcpu_put(vcpu);
> > +
> > + /* Return success if at least one page was mapped successfully. */
> > + return full_size == range->size ? r : 0;
> > +}
> > +#endif
> > +
> > static long kvm_vcpu_ioctl(struct file *filp,
> > unsigned int ioctl, unsigned long arg)
> > {
> > @@ -4580,6 +4629,20 @@ static long kvm_vcpu_ioctl(struct file *filp,
> > r = kvm_vcpu_ioctl_get_stats_fd(vcpu);
> > break;
> > }
> > +#ifdef CONFIG_KVM_GENERIC_PRE_FAULT_MEMORY
> > + case KVM_PRE_FAULT_MEMORY: {
> > + struct kvm_pre_fault_memory range;
> > +
> > + r = -EFAULT;
> > + if (copy_from_user(&range, argp, sizeof(range)))
> > + break;
> > + r = kvm_vcpu_pre_fault_memory(vcpu, &range);
> > + /* Pass back leftover range. */
> > + if (copy_to_user(argp, &range, sizeof(range)))
> > + r = -EFAULT;
> > + break;
> > + }
> > +#endif
> > default:
> > r = kvm_arch_vcpu_ioctl(filp, ioctl, arg);
> > }
>