This is a v2 for previous series [1] to allow mapping for compound tail
pages for IO or PFNMAP mapping.
Compared to v1, this version provides selftest to check functionality in
KVM to map memslots for MMIO BARs (VMAs with flag VM_IO | VM_PFNMAP), as
requested by Sean in [1].
The selftest can also be used to test series "allow mapping non-refcounted
pages" [2].
Tag RFC is added because a test driver is introduced in patch 2, which is
new to KVM selftest, and test "set_memory_region_io" in patch 3 depends on
that the test driver is compiled and loaded in kernel.
Besides, patch 3 calls vm_set_user_memory_region() directly without
modifying vm_mem_add().
So, this series is sent to ensure the main direction is right.
Thanks
Yan
[1] https://lore.kernel.org/all/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/
v2:
added patch 2 and 3 to do selftest for patch 1 (Sean).
Yan Zhao (3):
KVM: allow mapping of compound tail pages for IO or PFNMAP mapping
KVM: selftests: add selftest driver for KVM to test memory slots for
MMIO BARs
KVM: selftests: Add set_memory_region_io to test memslots for MMIO
BARs
lib/Kconfig.debug | 14 +
lib/Makefile | 1 +
lib/test_kvm_mock_device.c | 281 ++++++++++++++++++
lib/test_kvm_mock_device_uapi.h | 16 +
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/set_memory_region_io.c | 188 ++++++++++++
virt/kvm/kvm_main.c | 2 +-
7 files changed, 502 insertions(+), 1 deletion(-)
create mode 100644 lib/test_kvm_mock_device.c
create mode 100644 lib/test_kvm_mock_device_uapi.h
create mode 100644 tools/testing/selftests/kvm/set_memory_region_io.c
base-commit: 8ed26ab8d59111c2f7b86d200d1eb97d2a458fd1
--
2.17.1
Allow mapping of tail pages of compound pages for IO or PFNMAP mapping
by trying and getting ref count of its head page.
For IO or PFNMAP mapping, sometimes it's backed by compound pages.
KVM will just return error on mapping of tail pages of the compound pages,
as ref count of the tail pages are always 0.
So, rather than check and add ref count of a tail page, check and add ref
count of its folio (head page) to allow mapping of the compound tail pages.
This will not break the origial intention to disallow mapping of tail pages
of non-compound higher order allocations as the folio of a non-compound
tail page is the same as the page itself.
On the other side, put_page() has already converted page to folio before
putting page ref.
Signed-off-by: Yan Zhao <[email protected]>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index acd67fb40183..f53b58446ac7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
if (!page)
return 1;
- return get_page_unless_zero(page);
+ return folio_try_get(page_folio(page));
}
static int hva_to_pfn_remapped(struct vm_area_struct *vma,
--
2.17.1
This driver is for testing KVM memory slots for device MMIO BARs that are
mapped to pages serving as device resources.
This driver implements a mock device whose device resources are pages
array that can be mmaped into user space. It provides ioctl interface to
users to configure whether the pages are allocated as a compound huge page
or not.
KVM selftest code can then map the mock device resource to KVM memslots
to check if any error encountered. After VM shutdown, mock device
resource's page reference counters are checked to ensure KVM does not hold
extra reference count during memslot add/removal.
Signed-off-by: Yan Zhao <[email protected]>
---
lib/Kconfig.debug | 14 ++
lib/Makefile | 1 +
lib/test_kvm_mock_device.c | 281 ++++++++++++++++++++++++++++++++
lib/test_kvm_mock_device_uapi.h | 16 ++
4 files changed, 312 insertions(+)
create mode 100644 lib/test_kvm_mock_device.c
create mode 100644 lib/test_kvm_mock_device_uapi.h
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index cc7d53d9dc01..c0fd4b53db89 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2922,6 +2922,20 @@ config TEST_HMM
If unsure, say N.
+config TEST_KVM_MOCK_DEVICE
+ tristate "Test page-backended BAR to KVM mock device"
+ help
+ This is a mock KVM assigned device whose MMIO BAR is backended by
+ struct page.
+ Say M here if you want to build the "test_kvm_mock_device" module.
+ Doing so will allow you to run KVM selftest
+ tools/testing/selftest/kvm/set_memory_region_io, which tests
+ functionality of adding page-backended MMIO memslots in KVM and
+ ensures that reference count of the backend pages are correctly
+ handled.
+
+ If unsure, say N.
+
config TEST_FREE_PAGES
tristate "Test freeing pages"
help
diff --git a/lib/Makefile b/lib/Makefile
index 6b09731d8e61..894a185bbabd 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -83,6 +83,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
obj-$(CONFIG_TEST_DYNAMIC_DEBUG) += test_dynamic_debug.o
obj-$(CONFIG_TEST_PRINTF) += test_printf.o
obj-$(CONFIG_TEST_SCANF) += test_scanf.o
+obj-$(CONFIG_TEST_KVM_MOCK_DEVICE) += test_kvm_mock_device.o
obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_KASAN),yy)
diff --git a/lib/test_kvm_mock_device.c b/lib/test_kvm_mock_device.c
new file mode 100644
index 000000000000..4e7527c230cd
--- /dev/null
+++ b/lib/test_kvm_mock_device.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * This is a module to test KVM DEVICE MMIO PASSTHROUGH.
+ */
+#include <linux/init.h>
+#include <linux/fs.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/cdev.h>
+#include <linux/device.h>
+#include <linux/mm.h>
+
+#include "test_kvm_mock_device_uapi.h"
+
+/* kvm mock device */
+struct kvm_mock_dev {
+ dev_t devt;
+ struct device device;
+ struct cdev cdev;
+};
+static struct kvm_mock_dev kvm_mock_dev;
+
+struct kvm_mock_device {
+ bool compound;
+ struct page *resource;
+ u64 bar_size;
+ int order;
+ int *ref_array;
+ struct mutex lock;
+ bool prepared;
+};
+
+static bool opened;
+
+#define BAR_SIZE 0x200000UL
+#define DEFAULT_COMPOUND true
+
+static vm_fault_t kvm_mock_device_mmap_fault(struct vm_fault *vmf)
+{
+ struct vm_area_struct *vma = vmf->vma;
+ struct kvm_mock_device *kmdev = vma->vm_private_data;
+ struct page *p = kmdev->resource;
+ vm_fault_t ret = VM_FAULT_NOPAGE;
+ unsigned long addr;
+ int i;
+
+ for (addr = vma->vm_start, i = vma->vm_pgoff; addr < vma->vm_end;
+ addr += PAGE_SIZE, i++) {
+
+ ret = vmf_insert_pfn(vma, addr, page_to_pfn(p + i));
+ if (ret == VM_FAULT_NOPAGE)
+ continue;
+
+ zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
+ return ret;
+
+ }
+ return ret;
+}
+
+static const struct vm_operations_struct kvm_mock_device_mmap_ops = {
+ .fault = kvm_mock_device_mmap_fault,
+};
+
+static int kvm_mock_device_fops_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ struct kvm_mock_device *kmdev = file->private_data;
+ u64 offset, req_len;
+ int ret = 0;
+
+ mutex_lock(&kmdev->lock);
+ if (!kmdev->prepared) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ offset = vma->vm_pgoff << PAGE_SHIFT;
+ req_len = vma->vm_end - vma->vm_start;
+ if (offset + req_len > BAR_SIZE) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
+ vma->vm_ops = &kvm_mock_device_mmap_ops;
+ vma->vm_private_data = kmdev;
+out:
+ mutex_unlock(&kmdev->lock);
+ return ret;
+}
+
+static int kvm_mock_device_prepare_resource(struct kvm_mock_device *kmdev)
+{
+ gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO;
+ unsigned int order = kmdev->order;
+ unsigned long count = 1 << order;
+ unsigned long i;
+ struct page *p;
+ int ret;
+
+ mutex_lock(&kmdev->lock);
+ if (kmdev->prepared) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ if (kmdev->compound)
+ gfp_flags |= __GFP_COMP;
+
+ p = alloc_pages_node(0, gfp_flags, order);
+ if (!p) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ kmdev->ref_array = kmalloc_array(count, sizeof(kmdev->ref_array),
+ GFP_KERNEL_ACCOUNT);
+ if (!kmdev->ref_array) {
+ __free_pages(p, order);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ for (i = 0; i < count; i++)
+ kmdev->ref_array[i] = page_ref_count(p + i);
+
+ kmdev->resource = p;
+ kmdev->prepared = true;
+out:
+ mutex_unlock(&kmdev->lock);
+ return ret;
+}
+
+static int kvm_mock_device_check_resource_ref(struct kvm_mock_device *kmdev)
+{
+ u32 i, count = 1 << kmdev->order;
+ struct page *p = kmdev->resource;
+ int inequal = 0;
+
+ mutex_lock(&kmdev->lock);
+ if (!kmdev->prepared) {
+ mutex_unlock(&kmdev->lock);
+ return -ENODEV;
+ }
+
+ for (i = 0; i < count; i++) {
+ if (kmdev->ref_array[i] == page_ref_count(p + i))
+ continue;
+
+ pr_err("kvm test device check resource page %d old ref=%d new ref=%d\n",
+ i, kmdev->ref_array[i], page_ref_count(p + i));
+ inequal++;
+ }
+ mutex_unlock(&kmdev->lock);
+
+ return inequal;
+}
+
+static int kvm_mock_device_fops_open(struct inode *inode, struct file *filp)
+{
+ struct kvm_mock_device *kmdev;
+
+ if (opened)
+ return -EBUSY;
+
+ kmdev = kzalloc(sizeof(*kmdev), GFP_KERNEL_ACCOUNT);
+ if (!kmdev)
+ return -ENOMEM;
+
+ kmdev->compound = DEFAULT_COMPOUND;
+ kmdev->bar_size = BAR_SIZE;
+ kmdev->order = get_order(kmdev->bar_size);
+ mutex_init(&kmdev->lock);
+ filp->private_data = kmdev;
+
+ opened = true;
+ return 0;
+}
+
+static int kvm_mock_device_fops_release(struct inode *inode, struct file *filp)
+{
+ struct kvm_mock_device *kmdev = filp->private_data;
+
+ if (kmdev->prepared)
+ __free_pages(kmdev->resource, kmdev->order);
+ mutex_destroy(&kmdev->lock);
+ kfree(kmdev->ref_array);
+ kfree(kmdev);
+ opened = false;
+ return 0;
+}
+
+static long kvm_mock_device_fops_unlocked_ioctl(struct file *filp,
+ unsigned int command,
+ unsigned long arg)
+{
+ struct kvm_mock_device *kmdev = filp->private_data;
+ int r;
+
+ switch (command) {
+ case KVM_MOCK_DEVICE_GET_BAR_SIZE: {
+ u64 bar_size;
+
+ bar_size = kmdev->bar_size;
+ r = put_user(bar_size, (u64 __user *)arg);
+ break;
+ }
+ case KVM_MOCK_DEVICE_PREPARE_RESOURCE: {
+ u32 compound;
+
+ r = get_user(compound, (u32 __user *)arg);
+ if (r)
+ return r;
+
+ kmdev->compound = compound;
+ r = kvm_mock_device_prepare_resource(kmdev);
+ break;
+
+ }
+ case KVM_MOCK_DEVICE_CHECK_BACKEND_REF: {
+ int inequal;
+
+ inequal = kvm_mock_device_check_resource_ref(kmdev);
+
+ if (inequal < 0)
+ return inequal;
+
+ r = put_user(inequal, (u32 __user *)arg);
+ break;
+ }
+ default:
+ r = -EOPNOTSUPP;
+ }
+
+ return r;
+}
+
+
+static const struct file_operations kvm_mock_device_fops = {
+ .open = kvm_mock_device_fops_open,
+ .release = kvm_mock_device_fops_release,
+ .mmap = kvm_mock_device_fops_mmap,
+ .unlocked_ioctl = kvm_mock_device_fops_unlocked_ioctl,
+ .llseek = default_llseek,
+ .owner = THIS_MODULE,
+};
+
+
+static int __init kvm_mock_device_test_init(void)
+{
+ int ret;
+
+ ret = alloc_chrdev_region(&kvm_mock_dev.devt, 0, 1, "KVM-MOCK-DEVICE");
+ if (ret)
+ goto out;
+
+ cdev_init(&kvm_mock_dev.cdev, &kvm_mock_device_fops);
+ kvm_mock_dev.cdev.owner = THIS_MODULE;
+ device_initialize(&kvm_mock_dev.device);
+ kvm_mock_dev.device.devt = MKDEV(MAJOR(kvm_mock_dev.devt), 0);
+ ret = dev_set_name(&kvm_mock_dev.device, "kvm_mock_device");
+ if (ret)
+ goto out;
+
+ ret = cdev_device_add(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
+ if (ret)
+ goto out;
+
+out:
+ return ret;
+}
+
+static void __exit kvm_mock_device_test_exit(void)
+{
+ cdev_device_del(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
+ unregister_chrdev_region(kvm_mock_dev.devt, 1);
+}
+
+module_init(kvm_mock_device_test_init);
+module_exit(kvm_mock_device_test_exit);
+MODULE_LICENSE("GPL");
diff --git a/lib/test_kvm_mock_device_uapi.h b/lib/test_kvm_mock_device_uapi.h
new file mode 100644
index 000000000000..227d0bf1d430
--- /dev/null
+++ b/lib/test_kvm_mock_device_uapi.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * This is a module to help test KVM guest access of KVM mock device's BAR,
+ * whose backend is mapped to pages.
+ */
+#ifndef _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
+#define _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define KVM_MOCK_DEVICE_GET_BAR_SIZE _IOR('M', 0x00, u64)
+#define KVM_MOCK_DEVICE_PREPARE_RESOURCE _IOWR('M', 0x01, u32)
+#define KVM_MOCK_DEVICE_CHECK_BACKEND_REF _IOWR('M', 0x02, u32)
+
+#endif /* _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H */
--
2.17.1
Added a selftest set_memory_region_io to test memslots for MMIO BARs.
The MMIO BARs' backends are compound/non-compound huge pages serving as
device resources allocated by a mock device driver.
This selftest will assert and report "errno=14 - Bad address" in vcpu_run()
if any failure is met to add such MMIO BAR memslots.
After MMIO memslots removal, page reference counts of the device resources
are also checked.
As this selftest will interacts with a mock device "/dev/kvm_mock_device",
it depends on test driver test_kvm_mock_device.ko in the kernel.
CONFIG_TEST_KVM_MOCK_DEVICE=m must be enabled in the kernel.
Currently, this selftest is only compiled for __x86_64__.
Signed-off-by: Yan Zhao <[email protected]>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/set_memory_region_io.c | 188 ++++++++++++++++++
2 files changed, 189 insertions(+)
create mode 100644 tools/testing/selftests/kvm/set_memory_region_io.c
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index 4412b42d95de..9d39514b6403 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -144,6 +144,7 @@ TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
TEST_GEN_PROGS_x86_64 += memslot_perf_test
TEST_GEN_PROGS_x86_64 += rseq_test
TEST_GEN_PROGS_x86_64 += set_memory_region_test
+TEST_GEN_PROGS_x86_64 += set_memory_region_io
TEST_GEN_PROGS_x86_64 += steal_time
TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
TEST_GEN_PROGS_x86_64 += system_counter_offset_test
diff --git a/tools/testing/selftests/kvm/set_memory_region_io.c b/tools/testing/selftests/kvm/set_memory_region_io.c
new file mode 100644
index 000000000000..e221103091f4
--- /dev/null
+++ b/tools/testing/selftests/kvm/set_memory_region_io.c
@@ -0,0 +1,188 @@
+// SPDX-License-Identifier: GPL-2.0
+#define _GNU_SOURCE /* for program_invocation_short_name */
+#include <fcntl.h>
+#include <pthread.h>
+#include <sched.h>
+#include <semaphore.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <sys/ioctl.h>
+#include <sys/mman.h>
+
+#include <linux/compiler.h>
+
+#include <test_util.h>
+#include <kvm_util.h>
+#include <processor.h>
+
+#include <../../../../lib/test_kvm_mock_device_uapi.h>
+
+/*
+ * Somewhat arbitrary location and slot, intended to not overlap anything.
+ */
+#define MEM_REGION_GPA_BASE 0xc0000000
+#define RANDOM_OFFSET 0x1000
+#define MEM_REGION_GPA_RANDOM (MEM_REGION_GPA_BASE + RANDOM_OFFSET)
+#define MEM_REGION_SLOT_ID 10
+
+static const bool non_compound_supported;
+
+static const uint64_t BASE_VAL = 0x1111;
+static const uint64_t RANDOM_VAL = 0x2222;
+
+static unsigned long bar_size;
+
+static sem_t vcpu_ready;
+
+static void guest_code_read_bar(void)
+{
+ uint64_t val;
+
+ GUEST_SYNC(0);
+
+ val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA_RANDOM));
+ GUEST_ASSERT_EQ(val, RANDOM_VAL);
+
+ val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA_BASE));
+ GUEST_ASSERT_EQ(val, BASE_VAL);
+
+ GUEST_DONE();
+}
+
+static void *vcpu_worker(void *data)
+{
+ struct kvm_vcpu *vcpu = data;
+ struct kvm_run *run = vcpu->run;
+ struct ucall uc;
+ uint64_t cmd;
+
+ /*
+ * Loop until the guest is done. Re-enter the guest on all MMIO exits,
+ * which will occur if the guest attempts to access a memslot after it
+ * has been deleted or while it is being moved .
+ */
+ while (1) {
+ vcpu_run(vcpu);
+
+ if (run->exit_reason == KVM_EXIT_IO) {
+ cmd = get_ucall(vcpu, &uc);
+ if (cmd != UCALL_SYNC)
+ break;
+
+ sem_post(&vcpu_ready);
+ continue;
+ }
+
+ if (run->exit_reason != KVM_EXIT_MMIO)
+ break;
+
+ TEST_ASSERT(!run->mmio.is_write, "Unexpected exit mmio write");
+ TEST_ASSERT(run->mmio.len == 8,
+ "Unexpected exit mmio size = %u", run->mmio.len);
+
+ TEST_ASSERT(run->mmio.phys_addr < MEM_REGION_GPA_BASE ||
+ run->mmio.phys_addr >= MEM_REGION_GPA_BASE + bar_size,
+ "Unexpected exit mmio address = 0x%llx",
+ run->mmio.phys_addr);
+ }
+
+ if (run->exit_reason == KVM_EXIT_IO && cmd == UCALL_ABORT)
+ REPORT_GUEST_ASSERT(uc);
+
+ return NULL;
+}
+
+static void wait_for_vcpu(void)
+{
+ struct timespec ts;
+
+ TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
+ "clock_gettime() failed: %d\n", errno);
+
+ ts.tv_sec += 2;
+ TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
+ "sem_timedwait() failed: %d\n", errno);
+
+ /* Wait for the vCPU thread to reenter the guest. */
+ usleep(100000);
+}
+
+static void test_kvm_mock_device_bar(bool compound)
+{
+ struct kvm_vm *vm;
+ void *mem;
+ struct kvm_vcpu *vcpu;
+ pthread_t vcpu_thread;
+ int fd, ret;
+ u32 param_compound = compound;
+ u32 inequal = 0;
+
+ fd = open("/dev/kvm_mock_device", O_RDWR);
+ if (fd < 0) {
+ pr_info("Please ensure \"CONFIG_TEST_KVM_MOCK_DEVICE=m\" is enabled in the kernel");
+ pr_info(", and execute\n\"modprobe test_kvm_mock_device\n");
+ }
+ TEST_ASSERT(fd >= 0, "Failed to open kvm mock device.");
+
+ ret = ioctl(fd, KVM_MOCK_DEVICE_GET_BAR_SIZE, &bar_size);
+ TEST_ASSERT(ret == 0, "Failed to get bar size of kvm mock device");
+
+ ret = ioctl(fd, KVM_MOCK_DEVICE_PREPARE_RESOURCE, ¶m_compound);
+ TEST_ASSERT(ret == 0, "Failed to prepare resource of kvm mock device");
+
+ mem = mmap(NULL, (size_t)bar_size, PROT_READ | PROT_WRITE, MAP_SHARED,
+ fd, 0);
+ TEST_ASSERT(mem != MAP_FAILED, "Failed to mmap() kvm mock device bar");
+
+ *(u64 *)mem = BASE_VAL;
+ *(u64 *)(mem + RANDOM_OFFSET) = RANDOM_VAL;
+
+ vm = vm_create_with_one_vcpu(&vcpu, guest_code_read_bar);
+
+ vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, MEM_REGION_GPA_BASE,
+ bar_size, mem);
+
+ virt_map(vm, MEM_REGION_GPA_BASE, MEM_REGION_GPA_BASE,
+ (RANDOM_OFFSET / getpagesize()) + 1);
+
+ pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
+
+ /* Ensure the guest thread is spun up. */
+ wait_for_vcpu();
+
+ pthread_join(vcpu_thread, NULL);
+
+ vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, 0, 0, 0);
+ kvm_vm_free(vm);
+
+ ret = ioctl(fd, KVM_MOCK_DEVICE_CHECK_BACKEND_REF, &inequal);
+ TEST_ASSERT(ret == 0 && inequal == 0, "Incorrect resource ref of KVM device");
+
+ munmap(mem, bar_size);
+ close(fd);
+}
+
+static void test_non_compound_backend(void)
+{
+ pr_info("Testing non-compound huge page backend for mem slot\n");
+ test_kvm_mock_device_bar(false);
+}
+
+static void test_compound_backend(void)
+{
+ pr_info("Testing compound huge page backend for mem slot\n");
+ test_kvm_mock_device_bar(true);
+}
+
+int main(int argc, char *argv[])
+{
+#ifdef __x86_64__
+ test_compound_backend();
+ if (non_compound_supported)
+ test_non_compound_backend();
+#endif
+
+ return 0;
+}
--
2.17.1
On Wed, Jan 03, 2024 at 04:44:57PM +0800, Yan Zhao wrote:
> This driver is for testing KVM memory slots for device MMIO BARs that are
> mapped to pages serving as device resources.
>
> This driver implements a mock device whose device resources are pages
> array that can be mmaped into user space. It provides ioctl interface to
> users to configure whether the pages are allocated as a compound huge page
> or not.
I just think that it can be used in other scenarios, not only KVM.
>
> KVM selftest code can then map the mock device resource to KVM memslots
> to check if any error encountered. After VM shutdown, mock device
> resource's page reference counters are checked to ensure KVM does not hold
> extra reference count during memslot add/removal.
>
> Signed-off-by: Yan Zhao <[email protected]>
> ---
> lib/Kconfig.debug | 14 ++
> lib/Makefile | 1 +
> lib/test_kvm_mock_device.c | 281 ++++++++++++++++++++++++++++++++
> lib/test_kvm_mock_device_uapi.h | 16 ++
> 4 files changed, 312 insertions(+)
> create mode 100644 lib/test_kvm_mock_device.c
> create mode 100644 lib/test_kvm_mock_device_uapi.h
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index cc7d53d9dc01..c0fd4b53db89 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -2922,6 +2922,20 @@ config TEST_HMM
>
> If unsure, say N.
>
> +config TEST_KVM_MOCK_DEVICE
> + tristate "Test page-backended BAR to KVM mock device"
> + help
> + This is a mock KVM assigned device whose MMIO BAR is backended by
> + struct page.
> + Say M here if you want to build the "test_kvm_mock_device" module.
> + Doing so will allow you to run KVM selftest
> + tools/testing/selftest/kvm/set_memory_region_io, which tests
> + functionality of adding page-backended MMIO memslots in KVM and
> + ensures that reference count of the backend pages are correctly
> + handled.
> +
> + If unsure, say N.
> +
> config TEST_FREE_PAGES
> tristate "Test freeing pages"
> help
> diff --git a/lib/Makefile b/lib/Makefile
> index 6b09731d8e61..894a185bbabd 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -83,6 +83,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
> obj-$(CONFIG_TEST_DYNAMIC_DEBUG) += test_dynamic_debug.o
> obj-$(CONFIG_TEST_PRINTF) += test_printf.o
> obj-$(CONFIG_TEST_SCANF) += test_scanf.o
> +obj-$(CONFIG_TEST_KVM_MOCK_DEVICE) += test_kvm_mock_device.o
>
> obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
> ifeq ($(CONFIG_CC_IS_CLANG)$(CONFIG_KASAN),yy)
> diff --git a/lib/test_kvm_mock_device.c b/lib/test_kvm_mock_device.c
> new file mode 100644
> index 000000000000..4e7527c230cd
> --- /dev/null
> +++ b/lib/test_kvm_mock_device.c
> @@ -0,0 +1,281 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * This is a module to test KVM DEVICE MMIO PASSTHROUGH.
> + */
> +#include <linux/init.h>
> +#include <linux/fs.h>
> +#include <linux/module.h>
> +#include <linux/kernel.h>
> +#include <linux/cdev.h>
> +#include <linux/device.h>
> +#include <linux/mm.h>
> +
> +#include "test_kvm_mock_device_uapi.h"
> +
> +/* kvm mock device */
> +struct kvm_mock_dev {
> + dev_t devt;
> + struct device device;
> + struct cdev cdev;
> +};
> +static struct kvm_mock_dev kvm_mock_dev;
> +
> +struct kvm_mock_device {
> + bool compound;
> + struct page *resource;
> + u64 bar_size;
> + int order;
Do you have plan to allow user to change the bar_size via IOCTL ?
If no "order" and "bar_size" can be removed.
> + int *ref_array;
> + struct mutex lock;
> + bool prepared;
> +};
> +
> +static bool opened;
> +
> +#define BAR_SIZE 0x200000UL
> +#define DEFAULT_COMPOUND true
"kmdev->compound = true;" is more easy to understand,
but "kmdev->compound = DEFAULT_COMPOUND;" not.
> +
> +static vm_fault_t kvm_mock_device_mmap_fault(struct vm_fault *vmf)
> +{
> + struct vm_area_struct *vma = vmf->vma;
> + struct kvm_mock_device *kmdev = vma->vm_private_data;
> + struct page *p = kmdev->resource;
> + vm_fault_t ret = VM_FAULT_NOPAGE;
> + unsigned long addr;
> + int i;
> +
> + for (addr = vma->vm_start, i = vma->vm_pgoff; addr < vma->vm_end;
> + addr += PAGE_SIZE, i++) {
Just question:
Will it be enough if only map the accessed page for the testing purpose ?
> +
> + ret = vmf_insert_pfn(vma, addr, page_to_pfn(p + i));
> + if (ret == VM_FAULT_NOPAGE)
> + continue;
> +
> + zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
> + return ret;
> +
> + }
> + return ret;
> +}
> +
> +static const struct vm_operations_struct kvm_mock_device_mmap_ops = {
> + .fault = kvm_mock_device_mmap_fault,
> +};
> +
> +static int kvm_mock_device_fops_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> + struct kvm_mock_device *kmdev = file->private_data;
> + u64 offset, req_len;
> + int ret = 0;
> +
> + mutex_lock(&kmdev->lock);
> + if (!kmdev->prepared) {
> + ret = -ENODEV;
> + goto out;
> + }
> +
> + offset = vma->vm_pgoff << PAGE_SHIFT;
> + req_len = vma->vm_end - vma->vm_start;
> + if (offset + req_len > BAR_SIZE) {
> + ret = -EINVAL;
> + goto out;
> + }
> +
> + vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> + vma->vm_ops = &kvm_mock_device_mmap_ops;
> + vma->vm_private_data = kmdev;
> +out:
> + mutex_unlock(&kmdev->lock);
> + return ret;
> +}
> +
> +static int kvm_mock_device_prepare_resource(struct kvm_mock_device *kmdev)
> +{
> + gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO;
> + unsigned int order = kmdev->order;
> + unsigned long count = 1 << order;
> + unsigned long i;
> + struct page *p;
> + int ret;
> +
> + mutex_lock(&kmdev->lock);
> + if (kmdev->prepared) {
> + ret = -EBUSY;
> + goto out;
> + }
> +
> + if (kmdev->compound)
> + gfp_flags |= __GFP_COMP;
> +
> + p = alloc_pages_node(0, gfp_flags, order);
Please alloc_pages() to honor the memory policy of current task.
Hardcode to node 0 just works, but not hard to do this better.
> + if (!p) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + kmdev->ref_array = kmalloc_array(count, sizeof(kmdev->ref_array),
> + GFP_KERNEL_ACCOUNT);
> + if (!kmdev->ref_array) {
> + __free_pages(p, order);
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + for (i = 0; i < count; i++)
> + kmdev->ref_array[i] = page_ref_count(p + i);
> +
> + kmdev->resource = p;
> + kmdev->prepared = true;
> +out:
> + mutex_unlock(&kmdev->lock);
> + return ret;
> +}
> +
> +static int kvm_mock_device_check_resource_ref(struct kvm_mock_device *kmdev)
> +{
> + u32 i, count = 1 << kmdev->order;
> + struct page *p = kmdev->resource;
> + int inequal = 0;
> +
> + mutex_lock(&kmdev->lock);
> + if (!kmdev->prepared) {
> + mutex_unlock(&kmdev->lock);
> + return -ENODEV;
> + }
> +
> + for (i = 0; i < count; i++) {
> + if (kmdev->ref_array[i] == page_ref_count(p + i))
> + continue;
> +
> + pr_err("kvm test device check resource page %d old ref=%d new ref=%d\n",
> + i, kmdev->ref_array[i], page_ref_count(p + i));
How about just return a bitmap to userspace for each page ineuqal ?
Or if inequal number itself is enough then just remove this output, in worst case
it prints 512 times for 2MB bar case, which looks just useless.
> + inequal++;
> + }
> + mutex_unlock(&kmdev->lock);
> +
> + return inequal;
> +}
> +
> +static int kvm_mock_device_fops_open(struct inode *inode, struct file *filp)
> +{
> + struct kvm_mock_device *kmdev;
> +
> + if (opened)
> + return -EBUSY;
It can't work in case of 2 who open the device file at *real* same time, at least
you need atomic helpers for that purpose.
BTW I saw "kvm_mock_devie" instance is per file level, so maybe not hard
to remove this limitation ?
> +
> + kmdev = kzalloc(sizeof(*kmdev), GFP_KERNEL_ACCOUNT);
> + if (!kmdev)
> + return -ENOMEM;
> +
> + kmdev->compound = DEFAULT_COMPOUND;
> + kmdev->bar_size = BAR_SIZE;
> + kmdev->order = get_order(kmdev->bar_size);
> + mutex_init(&kmdev->lock);
> + filp->private_data = kmdev;
> +
> + opened = true;
> + return 0;
> +}
> +
> +static int kvm_mock_device_fops_release(struct inode *inode, struct file *filp)
> +{
> + struct kvm_mock_device *kmdev = filp->private_data;
> +
> + if (kmdev->prepared)
> + __free_pages(kmdev->resource, kmdev->order);
> + mutex_destroy(&kmdev->lock);
> + kfree(kmdev->ref_array);
> + kfree(kmdev);
> + opened = false;
> + return 0;
> +}
> +
> +static long kvm_mock_device_fops_unlocked_ioctl(struct file *filp,
> + unsigned int command,
> + unsigned long arg)
> +{
> + struct kvm_mock_device *kmdev = filp->private_data;
> + int r;
> +
> + switch (command) {
> + case KVM_MOCK_DEVICE_GET_BAR_SIZE: {
> + u64 bar_size;
> +
> + bar_size = kmdev->bar_size;
> + r = put_user(bar_size, (u64 __user *)arg);
> + break;
> + }
> + case KVM_MOCK_DEVICE_PREPARE_RESOURCE: {
> + u32 compound;
> +
> + r = get_user(compound, (u32 __user *)arg);
> + if (r)
> + return r;
> +
> + kmdev->compound = compound;
> + r = kvm_mock_device_prepare_resource(kmdev);
> + break;
> +
> + }
> + case KVM_MOCK_DEVICE_CHECK_BACKEND_REF: {
> + int inequal;
> +
> + inequal = kvm_mock_device_check_resource_ref(kmdev);
> +
> + if (inequal < 0)
> + return inequal;
> +
> + r = put_user(inequal, (u32 __user *)arg);
> + break;
> + }
> + default:
> + r = -EOPNOTSUPP;
> + }
> +
> + return r;
> +}
> +
> +
> +static const struct file_operations kvm_mock_device_fops = {
> + .open = kvm_mock_device_fops_open,
> + .release = kvm_mock_device_fops_release,
> + .mmap = kvm_mock_device_fops_mmap,
> + .unlocked_ioctl = kvm_mock_device_fops_unlocked_ioctl,
> + .llseek = default_llseek,
> + .owner = THIS_MODULE,
> +};
> +
> +
> +static int __init kvm_mock_device_test_init(void)
> +{
> + int ret;
> +
> + ret = alloc_chrdev_region(&kvm_mock_dev.devt, 0, 1, "KVM-MOCK-DEVICE");
How about misc_register() ? Like how KVM create /dev/kvm.
I think that will be more simpler.
> + if (ret)
> + goto out;
> +
> + cdev_init(&kvm_mock_dev.cdev, &kvm_mock_device_fops);
> + kvm_mock_dev.cdev.owner = THIS_MODULE;
> + device_initialize(&kvm_mock_dev.device);
> + kvm_mock_dev.device.devt = MKDEV(MAJOR(kvm_mock_dev.devt), 0);
> + ret = dev_set_name(&kvm_mock_dev.device, "kvm_mock_device");
> + if (ret)
> + goto out;
> +
> + ret = cdev_device_add(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> + if (ret)
> + goto out;
> +
> +out:
> + return ret;
> +}
> +
> +static void __exit kvm_mock_device_test_exit(void)
> +{
> + cdev_device_del(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> + unregister_chrdev_region(kvm_mock_dev.devt, 1);
> +}
> +
> +module_init(kvm_mock_device_test_init);
> +module_exit(kvm_mock_device_test_exit);
> +MODULE_LICENSE("GPL");
> diff --git a/lib/test_kvm_mock_device_uapi.h b/lib/test_kvm_mock_device_uapi.h
> new file mode 100644
> index 000000000000..227d0bf1d430
> --- /dev/null
> +++ b/lib/test_kvm_mock_device_uapi.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/*
> + * This is a module to help test KVM guest access of KVM mock device's BAR,
> + * whose backend is mapped to pages.
> + */
> +#ifndef _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> +#define _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define KVM_MOCK_DEVICE_GET_BAR_SIZE _IOR('M', 0x00, u64)
> +#define KVM_MOCK_DEVICE_PREPARE_RESOURCE _IOWR('M', 0x01, u32)
> +#define KVM_MOCK_DEVICE_CHECK_BACKEND_REF _IOWR('M', 0x02, u32)
> +
> +#endif /* _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H */
> --
> 2.17.1
>
>
On Wed, Jan 03, 2024 at 04:45:35PM +0800, Yan Zhao wrote:
> Added a selftest set_memory_region_io to test memslots for MMIO BARs.
Emm.. "set_memory_region_io" doesn't represent the real testing purpose,
but not sure if things like "memory_region_page_refcount_test" become
better...
> The MMIO BARs' backends are compound/non-compound huge pages serving as
> device resources allocated by a mock device driver.
>
> This selftest will assert and report "errno=14 - Bad address" in vcpu_run()
> if any failure is met to add such MMIO BAR memslots.
> After MMIO memslots removal, page reference counts of the device resources
> are also checked.
>
> As this selftest will interacts with a mock device "/dev/kvm_mock_device",
> it depends on test driver test_kvm_mock_device.ko in the kernel.
> CONFIG_TEST_KVM_MOCK_DEVICE=m must be enabled in the kernel.
>
> Currently, this selftest is only compiled for __x86_64__.
>
> Signed-off-by: Yan Zhao <[email protected]>
> ---
> tools/testing/selftests/kvm/Makefile | 1 +
> .../selftests/kvm/set_memory_region_io.c | 188 ++++++++++++++++++
> 2 files changed, 189 insertions(+)
> create mode 100644 tools/testing/selftests/kvm/set_memory_region_io.c
>
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index 4412b42d95de..9d39514b6403 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -144,6 +144,7 @@ TEST_GEN_PROGS_x86_64 += memslot_modification_stress_test
> TEST_GEN_PROGS_x86_64 += memslot_perf_test
> TEST_GEN_PROGS_x86_64 += rseq_test
> TEST_GEN_PROGS_x86_64 += set_memory_region_test
> +TEST_GEN_PROGS_x86_64 += set_memory_region_io
> TEST_GEN_PROGS_x86_64 += steal_time
> TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test
> TEST_GEN_PROGS_x86_64 += system_counter_offset_test
> diff --git a/tools/testing/selftests/kvm/set_memory_region_io.c b/tools/testing/selftests/kvm/set_memory_region_io.c
> new file mode 100644
> index 000000000000..e221103091f4
> --- /dev/null
> +++ b/tools/testing/selftests/kvm/set_memory_region_io.c
> @@ -0,0 +1,188 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define _GNU_SOURCE /* for program_invocation_short_name */
> +#include <fcntl.h>
> +#include <pthread.h>
> +#include <sched.h>
> +#include <semaphore.h>
> +#include <signal.h>
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <string.h>
> +#include <sys/ioctl.h>
> +#include <sys/mman.h>
> +
> +#include <linux/compiler.h>
> +
> +#include <test_util.h>
> +#include <kvm_util.h>
> +#include <processor.h>
> +
> +#include <../../../../lib/test_kvm_mock_device_uapi.h>
> +
> +/*
> + * Somewhat arbitrary location and slot, intended to not overlap anything.
> + */
> +#define MEM_REGION_GPA_BASE 0xc0000000
> +#define RANDOM_OFFSET 0x1000
> +#define MEM_REGION_GPA_RANDOM (MEM_REGION_GPA_BASE + RANDOM_OFFSET)
> +#define MEM_REGION_SLOT_ID 10
> +
> +static const bool non_compound_supported;
> +
> +static const uint64_t BASE_VAL = 0x1111;
> +static const uint64_t RANDOM_VAL = 0x2222;
> +
> +static unsigned long bar_size;
> +
> +static sem_t vcpu_ready;
> +
> +static void guest_code_read_bar(void)
> +{
> + uint64_t val;
> +
> + GUEST_SYNC(0);
> +
> + val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA_RANDOM));
> + GUEST_ASSERT_EQ(val, RANDOM_VAL);
> +
> + val = READ_ONCE(*((uint64_t *)MEM_REGION_GPA_BASE));
> + GUEST_ASSERT_EQ(val, BASE_VAL);
> +
> + GUEST_DONE();
> +}
> +
> +static void *vcpu_worker(void *data)
> +{
> + struct kvm_vcpu *vcpu = data;
> + struct kvm_run *run = vcpu->run;
> + struct ucall uc;
> + uint64_t cmd;
> +
> + /*
> + * Loop until the guest is done. Re-enter the guest on all MMIO exits,
> + * which will occur if the guest attempts to access a memslot after it
> + * has been deleted or while it is being moved .
> + */
> + while (1) {
> + vcpu_run(vcpu);
> +
> + if (run->exit_reason == KVM_EXIT_IO) {
> + cmd = get_ucall(vcpu, &uc);
> + if (cmd != UCALL_SYNC)
> + break;
> +
> + sem_post(&vcpu_ready);
> + continue;
> + }
> +
> + if (run->exit_reason != KVM_EXIT_MMIO)
> + break;
Can the KVM_EXIT_MMIO happen on x86 ? IIUC the accessed GVAs
in guest code have 1:1 mapping to MEM_REGION_GPA_BASE, which
is covered by the memslot, and the memory slot is there
until the guest code path done.
> +
> + TEST_ASSERT(!run->mmio.is_write, "Unexpected exit mmio write");
> + TEST_ASSERT(run->mmio.len == 8,
> + "Unexpected exit mmio size = %u", run->mmio.len);
> +
> + TEST_ASSERT(run->mmio.phys_addr < MEM_REGION_GPA_BASE ||
> + run->mmio.phys_addr >= MEM_REGION_GPA_BASE + bar_size,
> + "Unexpected exit mmio address = 0x%llx",
> + run->mmio.phys_addr);
Ditto, I just think you don't need this part in this testing.
> + }
> +
> + if (run->exit_reason == KVM_EXIT_IO && cmd == UCALL_ABORT)
> + REPORT_GUEST_ASSERT(uc);
> +
> + return NULL;
> +}
> +
> +static void wait_for_vcpu(void)
> +{
> + struct timespec ts;
> +
> + TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
> + "clock_gettime() failed: %d\n", errno);
> +
> + ts.tv_sec += 2;
> + TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
> + "sem_timedwait() failed: %d\n", errno);
> +
> + /* Wait for the vCPU thread to reenter the guest. */
> + usleep(100000);
In this testing it's not needed.
Because you only check guest state after guest code path done,
so pthread_join() is enough there.
> +}
> +
> +static void test_kvm_mock_device_bar(bool compound)
> +{
> + struct kvm_vm *vm;
> + void *mem;
> + struct kvm_vcpu *vcpu;
> + pthread_t vcpu_thread;
> + int fd, ret;
> + u32 param_compound = compound;
> + u32 inequal = 0;
> +
> + fd = open("/dev/kvm_mock_device", O_RDWR);
> + if (fd < 0) {
> + pr_info("Please ensure \"CONFIG_TEST_KVM_MOCK_DEVICE=m\" is enabled in the kernel");
> + pr_info(", and execute\n\"modprobe test_kvm_mock_device\n");
> + }
> + TEST_ASSERT(fd >= 0, "Failed to open kvm mock device.");
Minor:
May better to move this part into main(), highlight it's a
must have dependency at beginning.
> +
> + ret = ioctl(fd, KVM_MOCK_DEVICE_GET_BAR_SIZE, &bar_size);
> + TEST_ASSERT(ret == 0, "Failed to get bar size of kvm mock device");
> +
> + ret = ioctl(fd, KVM_MOCK_DEVICE_PREPARE_RESOURCE, ¶m_compound);
> + TEST_ASSERT(ret == 0, "Failed to prepare resource of kvm mock device");
> +
> + mem = mmap(NULL, (size_t)bar_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> + fd, 0);
> + TEST_ASSERT(mem != MAP_FAILED, "Failed to mmap() kvm mock device bar");
> +
> + *(u64 *)mem = BASE_VAL;
> + *(u64 *)(mem + RANDOM_OFFSET) = RANDOM_VAL;
> +
> + vm = vm_create_with_one_vcpu(&vcpu, guest_code_read_bar);
> +
> + vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, MEM_REGION_GPA_BASE,
> + bar_size, mem);
> +
> + virt_map(vm, MEM_REGION_GPA_BASE, MEM_REGION_GPA_BASE,
> + (RANDOM_OFFSET / getpagesize()) + 1);
> +
> + pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
> +
> + /* Ensure the guest thread is spun up. */
> + wait_for_vcpu();
> +
> + pthread_join(vcpu_thread, NULL);
> +
> + vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, 0, 0, 0);
> + kvm_vm_free(vm);
> +
> + ret = ioctl(fd, KVM_MOCK_DEVICE_CHECK_BACKEND_REF, &inequal);
> + TEST_ASSERT(ret == 0 && inequal == 0, "Incorrect resource ref of KVM device");
> +
> + munmap(mem, bar_size);
> + close(fd);
> +}
> +
> +static void test_non_compound_backend(void)
> +{
> + pr_info("Testing non-compound huge page backend for mem slot\n");
> + test_kvm_mock_device_bar(false);
> +}
> +
> +static void test_compound_backend(void)
> +{
> + pr_info("Testing compound huge page backend for mem slot\n");
> + test_kvm_mock_device_bar(true);
> +}
> +
> +int main(int argc, char *argv[])
> +{
> +#ifdef __x86_64__
> + test_compound_backend();
> + if (non_compound_supported)
Nobody set this, but the mock device looks already supported
it, so how about just run the 2 testings directly here ?
> + test_non_compound_backend();
> +#endif
> +
> + return 0;
> +}
> --
> 2.17.1
>
>
On Thu, Jan 04, 2024 at 04:16:04PM +0800, Yuan Yao wrote:
> On Wed, Jan 03, 2024 at 04:44:57PM +0800, Yan Zhao wrote:
> > This driver is for testing KVM memory slots for device MMIO BARs that are
> > mapped to pages serving as device resources.
> >
> > This driver implements a mock device whose device resources are pages
> > array that can be mmaped into user space. It provides ioctl interface to
> > users to configure whether the pages are allocated as a compound huge page
> > or not.
>
> I just think that it can be used in other scenarios, not only KVM.
>
Right. But I just want to make it to serve only KVM specific tests :)
> >
> > KVM selftest code can then map the mock device resource to KVM memslots
> > to check if any error encountered. After VM shutdown, mock device
> > resource's page reference counters are checked to ensure KVM does not hold
> > extra reference count during memslot add/removal.
> >
> > Signed-off-by: Yan Zhao <[email protected]>
...
> > +struct kvm_mock_device {
> > + bool compound;
> > + struct page *resource;
> > + u64 bar_size;
> > + int order;
>
> Do you have plan to allow user to change the bar_size via IOCTL ?
> If no "order" and "bar_size" can be removed.
>
Currently no. But this structure is private to the test driver.
What the benefit to remove the two?
> > + int *ref_array;
> > + struct mutex lock;
> > + bool prepared;
> > +};
> > +
> > +static bool opened;
> > +
> > +#define BAR_SIZE 0x200000UL
> > +#define DEFAULT_COMPOUND true
>
> "kmdev->compound = true;" is more easy to understand,
> but "kmdev->compound = DEFAULT_COMPOUND;" not.
>
Ok. But I want to make the state that "default compound state is true"
more explicit and configurable by a macro.
> > +
> > +static vm_fault_t kvm_mock_device_mmap_fault(struct vm_fault *vmf)
> > +{
> > + struct vm_area_struct *vma = vmf->vma;
> > + struct kvm_mock_device *kmdev = vma->vm_private_data;
> > + struct page *p = kmdev->resource;
> > + vm_fault_t ret = VM_FAULT_NOPAGE;
> > + unsigned long addr;
> > + int i;
> > +
> > + for (addr = vma->vm_start, i = vma->vm_pgoff; addr < vma->vm_end;
> > + addr += PAGE_SIZE, i++) {
>
> Just question:
> Will it be enough if only map the accessed page for the testing purpose ?
>
It should be enough.
But as VFIO usually maps the whole BAR in a single fault, I want to
keep align with it :)
> > +
> > + ret = vmf_insert_pfn(vma, addr, page_to_pfn(p + i));
> > + if (ret == VM_FAULT_NOPAGE)
> > + continue;
> > +
> > + zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
> > + return ret;
> > +
> > + }
> > + return ret;
> > +}
> > +
> > +static const struct vm_operations_struct kvm_mock_device_mmap_ops = {
> > + .fault = kvm_mock_device_mmap_fault,
> > +};
> > +
> > +static int kvm_mock_device_fops_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > + struct kvm_mock_device *kmdev = file->private_data;
> > + u64 offset, req_len;
> > + int ret = 0;
> > +
> > + mutex_lock(&kmdev->lock);
> > + if (!kmdev->prepared) {
> > + ret = -ENODEV;
> > + goto out;
> > + }
> > +
> > + offset = vma->vm_pgoff << PAGE_SHIFT;
> > + req_len = vma->vm_end - vma->vm_start;
> > + if (offset + req_len > BAR_SIZE) {
> > + ret = -EINVAL;
> > + goto out;
> > + }
> > +
> > + vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP);
> > + vma->vm_ops = &kvm_mock_device_mmap_ops;
> > + vma->vm_private_data = kmdev;
> > +out:
> > + mutex_unlock(&kmdev->lock);
> > + return ret;
> > +}
> > +
> > +static int kvm_mock_device_prepare_resource(struct kvm_mock_device *kmdev)
> > +{
> > + gfp_t gfp_flags = GFP_KERNEL | __GFP_ZERO;
> > + unsigned int order = kmdev->order;
> > + unsigned long count = 1 << order;
> > + unsigned long i;
> > + struct page *p;
> > + int ret;
> > +
> > + mutex_lock(&kmdev->lock);
> > + if (kmdev->prepared) {
> > + ret = -EBUSY;
> > + goto out;
> > + }
> > +
> > + if (kmdev->compound)
> > + gfp_flags |= __GFP_COMP;
> > +
> > + p = alloc_pages_node(0, gfp_flags, order);
>
> Please alloc_pages() to honor the memory policy of current task.
> Hardcode to node 0 just works, but not hard to do this better.
>
Ok, will do it. thanks!
> > + if (!p) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + kmdev->ref_array = kmalloc_array(count, sizeof(kmdev->ref_array),
> > + GFP_KERNEL_ACCOUNT);
> > + if (!kmdev->ref_array) {
> > + __free_pages(p, order);
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > +
> > + for (i = 0; i < count; i++)
> > + kmdev->ref_array[i] = page_ref_count(p + i);
> > +
> > + kmdev->resource = p;
> > + kmdev->prepared = true;
> > +out:
> > + mutex_unlock(&kmdev->lock);
> > + return ret;
> > +}
> > +
> > +static int kvm_mock_device_check_resource_ref(struct kvm_mock_device *kmdev)
> > +{
> > + u32 i, count = 1 << kmdev->order;
> > + struct page *p = kmdev->resource;
> > + int inequal = 0;
> > +
> > + mutex_lock(&kmdev->lock);
> > + if (!kmdev->prepared) {
> > + mutex_unlock(&kmdev->lock);
> > + return -ENODEV;
> > + }
> > +
> > + for (i = 0; i < count; i++) {
> > + if (kmdev->ref_array[i] == page_ref_count(p + i))
> > + continue;
> > +
> > + pr_err("kvm test device check resource page %d old ref=%d new ref=%d\n",
> > + i, kmdev->ref_array[i], page_ref_count(p + i));
>
> How about just return a bitmap to userspace for each page ineuqal ?
> Or if inequal number itself is enough then just remove this output, in worst case
> it prints 512 times for 2MB bar case, which looks just useless.
>
Right, print for 512 times is too much though it will only appear in the
worst failure case.
But I do think the info of "old ref" and "new ref" are useful for debugging.
So, instead of printing bitmap, what about only printing the error message
for once for the first error page?
> > + inequal++;
> > + }
> > + mutex_unlock(&kmdev->lock);
> > +
> > + return inequal;
> > +}
> > +
> > +static int kvm_mock_device_fops_open(struct inode *inode, struct file *filp)
> > +{
> > + struct kvm_mock_device *kmdev;
> > +
> > + if (opened)
> > + return -EBUSY;
>
> It can't work in case of 2 who open the device file at *real* same time, at least
> you need atomic helpers for that purpose.
>
Ah, right. Will turn it to atomic.
> BTW I saw "kvm_mock_devie" instance is per file level, so maybe not hard
> to remove this limitation ?
Yes, but as it's a test driver, I don't see any needs to complicate the code.
> > +
> > + kmdev = kzalloc(sizeof(*kmdev), GFP_KERNEL_ACCOUNT);
> > + if (!kmdev)
> > + return -ENOMEM;
> > +
> > + kmdev->compound = DEFAULT_COMPOUND;
> > + kmdev->bar_size = BAR_SIZE;
> > + kmdev->order = get_order(kmdev->bar_size);
> > + mutex_init(&kmdev->lock);
> > + filp->private_data = kmdev;
> > +
> > + opened = true;
> > + return 0;
> > +}
> > +
> > +static int kvm_mock_device_fops_release(struct inode *inode, struct file *filp)
> > +{
> > + struct kvm_mock_device *kmdev = filp->private_data;
> > +
> > + if (kmdev->prepared)
> > + __free_pages(kmdev->resource, kmdev->order);
> > + mutex_destroy(&kmdev->lock);
> > + kfree(kmdev->ref_array);
> > + kfree(kmdev);
> > + opened = false;
> > + return 0;
> > +}
> > +
> > +static long kvm_mock_device_fops_unlocked_ioctl(struct file *filp,
> > + unsigned int command,
> > + unsigned long arg)
> > +{
> > + struct kvm_mock_device *kmdev = filp->private_data;
> > + int r;
> > +
> > + switch (command) {
> > + case KVM_MOCK_DEVICE_GET_BAR_SIZE: {
> > + u64 bar_size;
> > +
> > + bar_size = kmdev->bar_size;
> > + r = put_user(bar_size, (u64 __user *)arg);
> > + break;
> > + }
> > + case KVM_MOCK_DEVICE_PREPARE_RESOURCE: {
> > + u32 compound;
> > +
> > + r = get_user(compound, (u32 __user *)arg);
> > + if (r)
> > + return r;
> > +
> > + kmdev->compound = compound;
> > + r = kvm_mock_device_prepare_resource(kmdev);
> > + break;
> > +
> > + }
> > + case KVM_MOCK_DEVICE_CHECK_BACKEND_REF: {
> > + int inequal;
> > +
> > + inequal = kvm_mock_device_check_resource_ref(kmdev);
> > +
> > + if (inequal < 0)
> > + return inequal;
> > +
> > + r = put_user(inequal, (u32 __user *)arg);
> > + break;
> > + }
> > + default:
> > + r = -EOPNOTSUPP;
> > + }
> > +
> > + return r;
> > +}
> > +
> > +
> > +static const struct file_operations kvm_mock_device_fops = {
> > + .open = kvm_mock_device_fops_open,
> > + .release = kvm_mock_device_fops_release,
> > + .mmap = kvm_mock_device_fops_mmap,
> > + .unlocked_ioctl = kvm_mock_device_fops_unlocked_ioctl,
> > + .llseek = default_llseek,
> > + .owner = THIS_MODULE,
> > +};
> > +
> > +
> > +static int __init kvm_mock_device_test_init(void)
> > +{
> > + int ret;
> > +
> > + ret = alloc_chrdev_region(&kvm_mock_dev.devt, 0, 1, "KVM-MOCK-DEVICE");
>
> How about misc_register() ? Like how KVM create /dev/kvm.
> I think that will be more simpler.
Ah, right. Will try to use it in next version.
Thanks!
> > + if (ret)
> > + goto out;
> > +
> > + cdev_init(&kvm_mock_dev.cdev, &kvm_mock_device_fops);
> > + kvm_mock_dev.cdev.owner = THIS_MODULE;
> > + device_initialize(&kvm_mock_dev.device);
> > + kvm_mock_dev.device.devt = MKDEV(MAJOR(kvm_mock_dev.devt), 0);
> > + ret = dev_set_name(&kvm_mock_dev.device, "kvm_mock_device");
> > + if (ret)
> > + goto out;
> > +
> > + ret = cdev_device_add(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> > + if (ret)
> > + goto out;
> > +
> > +out:
> > + return ret;
> > +}
> > +
> > +static void __exit kvm_mock_device_test_exit(void)
> > +{
> > + cdev_device_del(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> > + unregister_chrdev_region(kvm_mock_dev.devt, 1);
> > +}
> > +
> > +module_init(kvm_mock_device_test_init);
> > +module_exit(kvm_mock_device_test_exit);
> > +MODULE_LICENSE("GPL");
> > diff --git a/lib/test_kvm_mock_device_uapi.h b/lib/test_kvm_mock_device_uapi.h
> > new file mode 100644
> > index 000000000000..227d0bf1d430
> > --- /dev/null
> > +++ b/lib/test_kvm_mock_device_uapi.h
> > @@ -0,0 +1,16 @@
> > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > +/*
> > + * This is a module to help test KVM guest access of KVM mock device's BAR,
> > + * whose backend is mapped to pages.
> > + */
> > +#ifndef _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> > +#define _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> > +
> > +#include <linux/types.h>
> > +#include <linux/ioctl.h>
> > +
> > +#define KVM_MOCK_DEVICE_GET_BAR_SIZE _IOR('M', 0x00, u64)
> > +#define KVM_MOCK_DEVICE_PREPARE_RESOURCE _IOWR('M', 0x01, u32)
> > +#define KVM_MOCK_DEVICE_CHECK_BACKEND_REF _IOWR('M', 0x02, u32)
> > +
> > +#endif /* _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H */
> > --
> > 2.17.1
> >
> >
On Fri, Jan 05, 2024 at 02:25:26PM +0800, Yuan Yao wrote:
> On Wed, Jan 03, 2024 at 04:45:35PM +0800, Yan Zhao wrote:
> > Added a selftest set_memory_region_io to test memslots for MMIO BARs.
>
> Emm.. "set_memory_region_io" doesn't represent the real testing purpose,
> but not sure if things like "memory_region_page_refcount_test" become
> better...
Hmm, memory_region_io_page_test?
Not just ref count is tested.
Without patch 1, mapping of pages will even fail.
>
> > The MMIO BARs' backends are compound/non-compound huge pages serving as
> > device resources allocated by a mock device driver.
> >
> > This selftest will assert and report "errno=14 - Bad address" in vcpu_run()
> > if any failure is met to add such MMIO BAR memslots.
> > After MMIO memslots removal, page reference counts of the device resources
> > are also checked.
> >
> > As this selftest will interacts with a mock device "/dev/kvm_mock_device",
> > it depends on test driver test_kvm_mock_device.ko in the kernel.
> > CONFIG_TEST_KVM_MOCK_DEVICE=m must be enabled in the kernel.
> >
> > Currently, this selftest is only compiled for __x86_64__.
> >
> > Signed-off-by: Yan Zhao <[email protected]>
...
> > +static void *vcpu_worker(void *data)
> > +{
> > + struct kvm_vcpu *vcpu = data;
> > + struct kvm_run *run = vcpu->run;
> > + struct ucall uc;
> > + uint64_t cmd;
> > +
> > + /*
> > + * Loop until the guest is done. Re-enter the guest on all MMIO exits,
> > + * which will occur if the guest attempts to access a memslot after it
> > + * has been deleted or while it is being moved .
> > + */
> > + while (1) {
> > + vcpu_run(vcpu);
> > +
> > + if (run->exit_reason == KVM_EXIT_IO) {
> > + cmd = get_ucall(vcpu, &uc);
> > + if (cmd != UCALL_SYNC)
> > + break;
> > +
> > + sem_post(&vcpu_ready);
> > + continue;
> > + }
> > +
> > + if (run->exit_reason != KVM_EXIT_MMIO)
> > + break;
>
> Can the KVM_EXIT_MMIO happen on x86 ? IIUC the accessed GVAs
> in guest code have 1:1 mapping to MEM_REGION_GPA_BASE, which
> is covered by the memslot, and the memory slot is there
> until the guest code path done.
>
It can, if the GPAs accessed by guest code are not even mapped as a
memslot.
This check is to ensure GPAs are read from the testing memslot for mock IO.
> > +
> > + TEST_ASSERT(!run->mmio.is_write, "Unexpected exit mmio write");
> > + TEST_ASSERT(run->mmio.len == 8,
> > + "Unexpected exit mmio size = %u", run->mmio.len);
> > +
> > + TEST_ASSERT(run->mmio.phys_addr < MEM_REGION_GPA_BASE ||
> > + run->mmio.phys_addr >= MEM_REGION_GPA_BASE + bar_size,
> > + "Unexpected exit mmio address = 0x%llx",
> > + run->mmio.phys_addr);
>
> Ditto, I just think you don't need this part in this testing.
>
> > + }
> > +
> > + if (run->exit_reason == KVM_EXIT_IO && cmd == UCALL_ABORT)
> > + REPORT_GUEST_ASSERT(uc);
> > +
> > + return NULL;
> > +}
> > +
> > +static void wait_for_vcpu(void)
> > +{
> > + struct timespec ts;
> > +
> > + TEST_ASSERT(!clock_gettime(CLOCK_REALTIME, &ts),
> > + "clock_gettime() failed: %d\n", errno);
> > +
> > + ts.tv_sec += 2;
> > + TEST_ASSERT(!sem_timedwait(&vcpu_ready, &ts),
> > + "sem_timedwait() failed: %d\n", errno);
> > +
> > + /* Wait for the vCPU thread to reenter the guest. */
> > + usleep(100000);
>
> In this testing it's not needed.
> Because you only check guest state after guest code path done,
> so pthread_join() is enough there.
Right. Just keep to the convention :)
>
> > +}
> > +
> > +static void test_kvm_mock_device_bar(bool compound)
> > +{
> > + struct kvm_vm *vm;
> > + void *mem;
> > + struct kvm_vcpu *vcpu;
> > + pthread_t vcpu_thread;
> > + int fd, ret;
> > + u32 param_compound = compound;
> > + u32 inequal = 0;
> > +
> > + fd = open("/dev/kvm_mock_device", O_RDWR);
> > + if (fd < 0) {
> > + pr_info("Please ensure \"CONFIG_TEST_KVM_MOCK_DEVICE=m\" is enabled in the kernel");
> > + pr_info(", and execute\n\"modprobe test_kvm_mock_device\n");
> > + }
> > + TEST_ASSERT(fd >= 0, "Failed to open kvm mock device.");
>
> Minor:
> May better to move this part into main(), highlight it's a
> must have dependency at beginning.
I don't think so.
main() can do other tests that are not relying on the mock device.
Actually I'm not even sure if this test needs to be put in
set_memory_region_test.c.
>
> > +
> > + ret = ioctl(fd, KVM_MOCK_DEVICE_GET_BAR_SIZE, &bar_size);
> > + TEST_ASSERT(ret == 0, "Failed to get bar size of kvm mock device");
> > +
> > + ret = ioctl(fd, KVM_MOCK_DEVICE_PREPARE_RESOURCE, ¶m_compound);
> > + TEST_ASSERT(ret == 0, "Failed to prepare resource of kvm mock device");
> > +
> > + mem = mmap(NULL, (size_t)bar_size, PROT_READ | PROT_WRITE, MAP_SHARED,
> > + fd, 0);
> > + TEST_ASSERT(mem != MAP_FAILED, "Failed to mmap() kvm mock device bar");
> > +
> > + *(u64 *)mem = BASE_VAL;
> > + *(u64 *)(mem + RANDOM_OFFSET) = RANDOM_VAL;
> > +
> > + vm = vm_create_with_one_vcpu(&vcpu, guest_code_read_bar);
> > +
> > + vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, MEM_REGION_GPA_BASE,
> > + bar_size, mem);
> > +
> > + virt_map(vm, MEM_REGION_GPA_BASE, MEM_REGION_GPA_BASE,
> > + (RANDOM_OFFSET / getpagesize()) + 1);
> > +
> > + pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
> > +
> > + /* Ensure the guest thread is spun up. */
> > + wait_for_vcpu();
> > +
> > + pthread_join(vcpu_thread, NULL);
> > +
> > + vm_set_user_memory_region(vm, MEM_REGION_SLOT_ID, 0, 0, 0, 0);
> > + kvm_vm_free(vm);
> > +
> > + ret = ioctl(fd, KVM_MOCK_DEVICE_CHECK_BACKEND_REF, &inequal);
> > + TEST_ASSERT(ret == 0 && inequal == 0, "Incorrect resource ref of KVM device");
> > +
> > + munmap(mem, bar_size);
> > + close(fd);
> > +}
> > +
> > +static void test_non_compound_backend(void)
> > +{
> > + pr_info("Testing non-compound huge page backend for mem slot\n");
> > + test_kvm_mock_device_bar(false);
> > +}
> > +
> > +static void test_compound_backend(void)
> > +{
> > + pr_info("Testing compound huge page backend for mem slot\n");
> > + test_kvm_mock_device_bar(true);
> > +}
> > +
> > +int main(int argc, char *argv[])
> > +{
> > +#ifdef __x86_64__
> > + test_compound_backend();
> > + if (non_compound_supported)
>
> Nobody set this, but the mock device looks already supported
> it, so how about just run the 2 testings directly here ?
Without the series "allow mapping non-refcounted pages" [1], the test
test_non_compound_backend() will simply fail.
I added this test case is to show it non-compound pages as backend can
also be tested with this selftest. And actually, I did tested that
series with this selftets.
So, can remove the "if non_compound_supported" after [1] is merged.
[1] https://lore.kernel.org/all/[email protected]/,
Thanks
>
> > + test_non_compound_backend();
> > +#endif
> > +
> > + return 0;
> > +}
> > --
> > 2.17.1
> >
>
On Fri, Jan 05, 2024 at 05:46:07PM +0800, Yan Zhao wrote:
> On Thu, Jan 04, 2024 at 04:16:04PM +0800, Yuan Yao wrote:
> > On Wed, Jan 03, 2024 at 04:44:57PM +0800, Yan Zhao wrote:
> > > This driver is for testing KVM memory slots for device MMIO BARs that are
> > > mapped to pages serving as device resources.
> > >
> > > This driver implements a mock device whose device resources are pages
> > > array that can be mmaped into user space. It provides ioctl interface to
> > > users to configure whether the pages are allocated as a compound huge page
> > > or not.
> >
> > I just think that it can be used in other scenarios, not only KVM.
> >
> Right. But I just want to make it to serve only KVM specific tests :)
>
> > >
> > > KVM selftest code can then map the mock device resource to KVM memslots
> > > to check if any error encountered. After VM shutdown, mock device
> > > resource's page reference counters are checked to ensure KVM does not hold
> > > extra reference count during memslot add/removal.
> > >
> > > Signed-off-by: Yan Zhao <[email protected]>
> ...
>
> > > +struct kvm_mock_device {
> > > + bool compound;
> > > + struct page *resource;
> > > + u64 bar_size;
> > > + int order;
> >
> > Do you have plan to allow user to change the bar_size via IOCTL ?
> > If no "order" and "bar_size" can be removed.
> >
> Currently no. But this structure is private to the test driver.
> What the benefit to remove the two?
It's useless so remove them makes code more easier to understand.
>
> > > + int *ref_array;
> > > + struct mutex lock;
> > > + bool prepared;
> > > +};
> > > +
> > > +static bool opened;
> > > +
> > > +#define BAR_SIZE 0x200000UL
> > > +#define DEFAULT_COMPOUND true
> >
> > "kmdev->compound = true;" is more easy to understand,
> > but "kmdev->compound = DEFAULT_COMPOUND;" not.
> >
> Ok. But I want to make the state that "default compound state is true"
> more explicit and configurable by a macro.
>
>
> > > +
> > > +static vm_fault_t kvm_mock_device_mmap_fault(struct vm_fault *vmf)
> > > +{
> > > + struct vm_area_struct *vma = vmf->vma;
> > > + struct kvm_mock_device *kmdev = vma->vm_private_data;
> > > + struct page *p = kmdev->resource;
> > > + vm_fault_t ret = VM_FAULT_NOPAGE;
> > > + unsigned long addr;
> > > + int i;
> > > +
> > > + for (addr = vma->vm_start, i = vma->vm_pgoff; addr < vma->vm_end;
> > > + addr += PAGE_SIZE, i++) {
> >
> > Just question:
> > Will it be enough if only map the accessed page for the testing purpose ?
> >
> It should be enough.
> But as VFIO usually maps the whole BAR in a single fault, I want to
> keep align with it :)
ah I see, thanks for your answer!
>
> > > +
> > > + ret = vmf_insert_pfn(vma, addr, page_to_pfn(p + i));
> > > + if (ret == VM_FAULT_NOPAGE)
> > > + continue;
> > > +
> > > + zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
> > > + return ret;
> > > +
> > > + }
> > > + return ret;
> > > +}
..
> > > +static int kvm_mock_device_check_resource_ref(struct kvm_mock_device *kmdev)
> > > +{
> > > + u32 i, count = 1 << kmdev->order;
> > > + struct page *p = kmdev->resource;
> > > + int inequal = 0;
> > > +
> > > + mutex_lock(&kmdev->lock);
> > > + if (!kmdev->prepared) {
> > > + mutex_unlock(&kmdev->lock);
> > > + return -ENODEV;
> > > + }
> > > +
> > > + for (i = 0; i < count; i++) {
> > > + if (kmdev->ref_array[i] == page_ref_count(p + i))
> > > + continue;
> > > +
> > > + pr_err("kvm test device check resource page %d old ref=%d new ref=%d\n",
> > > + i, kmdev->ref_array[i], page_ref_count(p + i));
> >
> > How about just return a bitmap to userspace for each page ineuqal ?
> > Or if inequal number itself is enough then just remove this output, in worst case
> > it prints 512 times for 2MB bar case, which looks just useless.
> >
> Right, print for 512 times is too much though it will only appear in the
> worst failure case.
> But I do think the info of "old ref" and "new ref" are useful for debugging.
> So, instead of printing bitmap, what about only printing the error message
> for once for the first error page?
For you reference:
The driver is designed for testing purpose
so I think just return the inequal should be enough, any one
who want to debug with this can easily change the source
code to see what's wrong there.
>
> > > + inequal++;
> > > + }
> > > + mutex_unlock(&kmdev->lock);
> > > +
> > > + return inequal;
> > > +}
> > > +
> > > +static int kvm_mock_device_fops_open(struct inode *inode, struct file *filp)
> > > +{
> > > + struct kvm_mock_device *kmdev;
> > > +
> > > + if (opened)
> > > + return -EBUSY;
> >
> > It can't work in case of 2 who open the device file at *real* same time, at least
> > you need atomic helpers for that purpose.
> >
> Ah, right. Will turn it to atomic.
>
> > BTW I saw "kvm_mock_devie" instance is per file level, so maybe not hard
> > to remove this limitation ?
> Yes, but as it's a test driver, I don't see any needs to complicate the code.
>
> > > +
> > > + kmdev = kzalloc(sizeof(*kmdev), GFP_KERNEL_ACCOUNT);
> > > + if (!kmdev)
> > > + return -ENOMEM;
> > > +
> > > + kmdev->compound = DEFAULT_COMPOUND;
> > > + kmdev->bar_size = BAR_SIZE;
> > > + kmdev->order = get_order(kmdev->bar_size);
> > > + mutex_init(&kmdev->lock);
> > > + filp->private_data = kmdev;
> > > +
> > > + opened = true;
> > > + return 0;
> > > +}
> > > +
> > > +static int kvm_mock_device_fops_release(struct inode *inode, struct file *filp)
> > > +{
> > > + struct kvm_mock_device *kmdev = filp->private_data;
> > > +
> > > + if (kmdev->prepared)
> > > + __free_pages(kmdev->resource, kmdev->order);
> > > + mutex_destroy(&kmdev->lock);
> > > + kfree(kmdev->ref_array);
> > > + kfree(kmdev);
> > > + opened = false;
> > > + return 0;
> > > +}
> > > +
> > > +static long kvm_mock_device_fops_unlocked_ioctl(struct file *filp,
> > > + unsigned int command,
> > > + unsigned long arg)
> > > +{
> > > + struct kvm_mock_device *kmdev = filp->private_data;
> > > + int r;
> > > +
> > > + switch (command) {
> > > + case KVM_MOCK_DEVICE_GET_BAR_SIZE: {
> > > + u64 bar_size;
> > > +
> > > + bar_size = kmdev->bar_size;
> > > + r = put_user(bar_size, (u64 __user *)arg);
> > > + break;
> > > + }
> > > + case KVM_MOCK_DEVICE_PREPARE_RESOURCE: {
> > > + u32 compound;
> > > +
> > > + r = get_user(compound, (u32 __user *)arg);
> > > + if (r)
> > > + return r;
> > > +
> > > + kmdev->compound = compound;
> > > + r = kvm_mock_device_prepare_resource(kmdev);
> > > + break;
> > > +
> > > + }
> > > + case KVM_MOCK_DEVICE_CHECK_BACKEND_REF: {
> > > + int inequal;
> > > +
> > > + inequal = kvm_mock_device_check_resource_ref(kmdev);
> > > +
> > > + if (inequal < 0)
> > > + return inequal;
> > > +
> > > + r = put_user(inequal, (u32 __user *)arg);
> > > + break;
> > > + }
> > > + default:
> > > + r = -EOPNOTSUPP;
> > > + }
> > > +
> > > + return r;
> > > +}
> > > +
> > > +
> > > +static const struct file_operations kvm_mock_device_fops = {
> > > + .open = kvm_mock_device_fops_open,
> > > + .release = kvm_mock_device_fops_release,
> > > + .mmap = kvm_mock_device_fops_mmap,
> > > + .unlocked_ioctl = kvm_mock_device_fops_unlocked_ioctl,
> > > + .llseek = default_llseek,
> > > + .owner = THIS_MODULE,
> > > +};
> > > +
> > > +
> > > +static int __init kvm_mock_device_test_init(void)
> > > +{
> > > + int ret;
> > > +
> > > + ret = alloc_chrdev_region(&kvm_mock_dev.devt, 0, 1, "KVM-MOCK-DEVICE");
> >
> > How about misc_register() ? Like how KVM create /dev/kvm.
> > I think that will be more simpler.
> Ah, right. Will try to use it in next version.
>
> Thanks!
>
> > > + if (ret)
> > > + goto out;
> > > +
> > > + cdev_init(&kvm_mock_dev.cdev, &kvm_mock_device_fops);
> > > + kvm_mock_dev.cdev.owner = THIS_MODULE;
> > > + device_initialize(&kvm_mock_dev.device);
> > > + kvm_mock_dev.device.devt = MKDEV(MAJOR(kvm_mock_dev.devt), 0);
> > > + ret = dev_set_name(&kvm_mock_dev.device, "kvm_mock_device");
> > > + if (ret)
> > > + goto out;
> > > +
> > > + ret = cdev_device_add(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> > > + if (ret)
> > > + goto out;
> > > +
> > > +out:
> > > + return ret;
> > > +}
> > > +
> > > +static void __exit kvm_mock_device_test_exit(void)
> > > +{
> > > + cdev_device_del(&kvm_mock_dev.cdev, &kvm_mock_dev.device);
> > > + unregister_chrdev_region(kvm_mock_dev.devt, 1);
> > > +}
> > > +
> > > +module_init(kvm_mock_device_test_init);
> > > +module_exit(kvm_mock_device_test_exit);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/lib/test_kvm_mock_device_uapi.h b/lib/test_kvm_mock_device_uapi.h
> > > new file mode 100644
> > > index 000000000000..227d0bf1d430
> > > --- /dev/null
> > > +++ b/lib/test_kvm_mock_device_uapi.h
> > > @@ -0,0 +1,16 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> > > +/*
> > > + * This is a module to help test KVM guest access of KVM mock device's BAR,
> > > + * whose backend is mapped to pages.
> > > + */
> > > +#ifndef _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> > > +#define _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H
> > > +
> > > +#include <linux/types.h>
> > > +#include <linux/ioctl.h>
> > > +
> > > +#define KVM_MOCK_DEVICE_GET_BAR_SIZE _IOR('M', 0x00, u64)
> > > +#define KVM_MOCK_DEVICE_PREPARE_RESOURCE _IOWR('M', 0x01, u32)
> > > +#define KVM_MOCK_DEVICE_CHECK_BACKEND_REF _IOWR('M', 0x02, u32)
> > > +
> > > +#endif /* _LIB_TEST_KVM_MOCK_DEVICE_UAPI_H */
> > > --
> > > 2.17.1
> > >
> > >
On Wed, Jan 10, 2024 at 02:27:08PM +0800, Yuan Yao wrote:
> > > Do you have plan to allow user to change the bar_size via IOCTL ?
> > > If no "order" and "bar_size" can be removed.
> > >
> > Currently no. But this structure is private to the test driver.
> > What the benefit to remove the two?
>
> It's useless so remove them makes code more easier to understand.
Just my two cents:
Keeping bar_size & order in a device structure is better than spreading
macro BAR_SIZE everywhere and the code is more scalable.
On Fri, Jan 12, 2024 at 08:21:29AM +0800, Yan Zhao wrote:
> On Wed, Jan 10, 2024 at 02:27:08PM +0800, Yuan Yao wrote:
> > > > Do you have plan to allow user to change the bar_size via IOCTL ?
> > > > If no "order" and "bar_size" can be removed.
> > > >
> > > Currently no. But this structure is private to the test driver.
> > > What the benefit to remove the two?
> >
> > It's useless so remove them makes code more easier to understand.
> Just my two cents:
> Keeping bar_size & order in a device structure is better than spreading
> macro BAR_SIZE everywhere and the code is more scalable.
yeah, that depends on the perspective, no big deal to me.
You can wait other's input.
On Wed, Jan 03, 2024, Yan Zhao wrote:
> This is a v2 for previous series [1] to allow mapping for compound tail
> pages for IO or PFNMAP mapping.
>
> Compared to v1, this version provides selftest to check functionality in
> KVM to map memslots for MMIO BARs (VMAs with flag VM_IO | VM_PFNMAP), as
> requested by Sean in [1].
Doh. So I didn't intend for you to have to create a mock device just to be able
to run a selftest. I assumed it would be easy-ish to utilize an existing generic
device. I take it that's not the case?
On Wed, Jan 03, 2024, Yan Zhao wrote:
> Allow mapping of tail pages of compound pages for IO or PFNMAP mapping
> by trying and getting ref count of its head page.
>
> For IO or PFNMAP mapping, sometimes it's backed by compound pages.
> KVM will just return error on mapping of tail pages of the compound pages,
> as ref count of the tail pages are always 0.
>
> So, rather than check and add ref count of a tail page, check and add ref
> count of its folio (head page) to allow mapping of the compound tail pages.
Can you add a blurb to call out that this is effectively what gup() does in
try_get_folio()? That knowledge give me a _lot_ more confidence that this is
correct (I didn't think too deeply about what this patch was doing when I looked
at v1).
> This will not break the origial intention to disallow mapping of tail pages
> of non-compound higher order allocations as the folio of a non-compound
> tail page is the same as the page itself.
>
> On the other side, put_page() has already converted page to folio before
> putting page ref.
>
> Signed-off-by: Yan Zhao <[email protected]>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index acd67fb40183..f53b58446ac7 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
> if (!page)
> return 1;
>
> - return get_page_unless_zero(page);
> + return folio_try_get(page_folio(page));
This seems like it needs retry logic, a la try_get_folio(), to guard against a
race with the folio being split. From page_folio():
If the caller* does not hold a reference, this call may race with a folio split,
so it should re-check the folio still contains this page after gaining a
reference on the folio.
I assume that splitting one of these folios is extremely unlikely, but I don't
see any harm in being paranoid (unless this really truly cannot race).
On Mon, Feb 12, 2024 at 07:07:25PM -0800, Sean Christopherson wrote:
> On Wed, Jan 03, 2024, Yan Zhao wrote:
> > This is a v2 for previous series [1] to allow mapping for compound tail
> > pages for IO or PFNMAP mapping.
> >
> > Compared to v1, this version provides selftest to check functionality in
> > KVM to map memslots for MMIO BARs (VMAs with flag VM_IO | VM_PFNMAP), as
> > requested by Sean in [1].
>
> Doh. So I didn't intend for you to have to create a mock device just to be able
> to run a selftest. I assumed it would be easy-ish to utilize an existing generic
> device. I take it that's not the case?
The selftest requires a vma with flag (VM_IO | VM_PFNMAP) with non-reserved
pages as backend.
Without a mock device, I don't find a easy way to let the selftest take
effect.
So, I borrowed the way in "tools/testing/selftests/mm/hmm-tests.c" which
uses a mock driver in "lib/test_hmm.c".
On Mon, Feb 12, 2024 at 07:17:21PM -0800, Sean Christopherson wrote:
> On Wed, Jan 03, 2024, Yan Zhao wrote:
> > Allow mapping of tail pages of compound pages for IO or PFNMAP mapping
> > by trying and getting ref count of its head page.
> >
> > For IO or PFNMAP mapping, sometimes it's backed by compound pages.
> > KVM will just return error on mapping of tail pages of the compound pages,
> > as ref count of the tail pages are always 0.
> >
> > So, rather than check and add ref count of a tail page, check and add ref
> > count of its folio (head page) to allow mapping of the compound tail pages.
>
> Can you add a blurb to call out that this is effectively what gup() does in
> try_get_folio()? That knowledge give me a _lot_ more confidence that this is
> correct (I didn't think too deeply about what this patch was doing when I looked
> at v1).
Sure.
>
> > This will not break the origial intention to disallow mapping of tail pages
> > of non-compound higher order allocations as the folio of a non-compound
> > tail page is the same as the page itself.
> >
> > On the other side, put_page() has already converted page to folio before
> > putting page ref.
> >
> > Signed-off-by: Yan Zhao <[email protected]>
> > ---
> > virt/kvm/kvm_main.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index acd67fb40183..f53b58446ac7 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -2892,7 +2892,7 @@ static int kvm_try_get_pfn(kvm_pfn_t pfn)
> > if (!page)
> > return 1;
> >
> > - return get_page_unless_zero(page);
> > + return folio_try_get(page_folio(page));
>
> This seems like it needs retry logic, a la try_get_folio(), to guard against a
> race with the folio being split. From page_folio():
>
> If the caller* does not hold a reference, this call may race with a folio split,
> so it should re-check the folio still contains this page after gaining a
> reference on the folio.
>
> I assume that splitting one of these folios is extremely unlikely, but I don't
> see any harm in being paranoid (unless this really truly cannot race).
Yes, you are right!
Will do the retry. Thanks!