2023-11-02 16:33:18

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 0/8] KVM: seftests: Support guest user mode execution and running

This patch series give a proposal to support guest VM running
in user mode and in canonical linear address organization as
well.

First design to parition the 64-bit canonical linear address space
into two half parts belonging to user-mode and supervisor-mode
respectively, similar as the organization of linear addresses used
in linux OS. Currently the linear addresses use 48-bit canonical
format in which bits 63:47 of the address are identical.

Secondly setup page table mapping the same guest physical address
of test code and data segment onto both user-mode and supervisor-mode
address space. It allows guest in different runtime mode, i.e.
user or supervisor, can run one code base in the corresponding
linear address space.

Also provide the runtime environment setup API for switching to
user mode execution.


Zeng Guang (8):
KVM: selftests: x86: Fix bug in addr_arch_gva2gpa()
KVM: selftests: x86: Support guest running on canonical linear-address
organization
KVM: selftests: Add virt_arch_ucall_prealloc() arch specific
implementation
KVM : selftests : Adapt selftest cases to kernel canonical linear
address
KVM: selftests: x86: Prepare setup for user mode support
KVM: selftests: x86: Allow user to access user-mode address and I/O
address space
KVM: selftests: x86: Support vcpu run in user mode
KVM: selftests: x86: Add KVM forced emulation prefix capability

.../selftests/kvm/include/kvm_util_base.h | 20 ++-
.../selftests/kvm/include/x86_64/processor.h | 48 ++++++-
.../selftests/kvm/lib/aarch64/processor.c | 5 +
tools/testing/selftests/kvm/lib/kvm_util.c | 6 +-
.../selftests/kvm/lib/riscv/processor.c | 5 +
.../selftests/kvm/lib/s390x/processor.c | 5 +
.../testing/selftests/kvm/lib/ucall_common.c | 2 +
.../selftests/kvm/lib/x86_64/processor.c | 117 ++++++++++++++----
.../selftests/kvm/set_memory_region_test.c | 13 +-
.../testing/selftests/kvm/x86_64/debug_regs.c | 2 +-
.../kvm/x86_64/userspace_msr_exit_test.c | 9 +-
11 files changed, 195 insertions(+), 37 deletions(-)

--
2.21.3


2023-11-02 16:33:26

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 2/8] KVM: selftests: x86: Support guest running on canonical linear-address organization

Setup execution environment running on 64-bit linear addresses for
user and supervisor mode.

Define the linear address based on 48-bit canonical format in which
bits 63:47 of the address are identical. All addresses to system data
structure are shifted to supervisor-mode address space.

Extend page table mapping for supervisor mode to same guest physical
address. This allows guest in supervisor mode can run in the
corresponding canonical linear address space.

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 6 ++++
tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++--
.../selftests/kvm/lib/x86_64/processor.c | 28 ++++++++++++-------
3 files changed, 28 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 25bc61dac5fb..00f7337a520a 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1256,4 +1256,10 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
#define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
#define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)

+/*
+ * X86 kernel linear address defines
+ */
+#define KERNEL_LNA_OFFSET 0xffff800000000000
+#define KERNEL_ADDR(x) ((void *)(x) + KERNEL_LNA_OFFSET)
+
#endif /* SELFTEST_KVM_PROCESSOR_H */
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 7a8af1821f5d..584f111620f3 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -337,9 +337,11 @@ static uint64_t vm_nr_pages_required(enum vm_guest_mode mode,
* smallest page size is used. Considering each page contains x page
* table descriptors, the total extra size for page tables (for extra
* N pages) will be: N/x+N/x^2+N/x^3+... which is definitely smaller
- * than N/x*2.
+ * than N/x*2. To support mapping one set of physical addresses both
+ * to user-mode addresses and supervisor-mode addresses, it's proper
+ * to extend the page size to N/x*4.
*/
- nr_pages += (nr_pages + extra_mem_pages) / PTES_PER_MIN_PAGE * 2;
+ nr_pages += (nr_pages + extra_mem_pages) / PTES_PER_MIN_PAGE * 4;

/* Account for the number of pages needed by ucall. */
nr_pages += ucall_nr_pages_required(page_size);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 9f4b8c47edce..6f4295a13d00 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -227,6 +227,13 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
{
__virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
+
+ /*
+ * Map same paddr to kernel linear address space. Make execution
+ * environment supporting running both in user and kernel mode.
+ */
+ if (!(vaddr & BIT_ULL(63)))
+ __virt_pg_map(vm, (uint64_t)KERNEL_ADDR(vaddr), paddr, PG_LEVEL_4K);
}

void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
@@ -505,7 +512,7 @@ static void kvm_setup_gdt(struct kvm_vm *vm, struct kvm_dtable *dt)
if (!vm->gdt)
vm->gdt = __vm_vaddr_alloc_page(vm, MEM_REGION_DATA);

- dt->base = vm->gdt;
+ dt->base = (unsigned long)KERNEL_ADDR(vm->gdt);
dt->limit = getpagesize();
}

@@ -516,7 +523,7 @@ static void kvm_setup_tss_64bit(struct kvm_vm *vm, struct kvm_segment *segp,
vm->tss = __vm_vaddr_alloc_page(vm, MEM_REGION_DATA);

memset(segp, 0, sizeof(*segp));
- segp->base = vm->tss;
+ segp->base = (unsigned long)KERNEL_ADDR(vm->tss);
segp->limit = 0x67;
segp->selector = selector;
segp->type = 0xb;
@@ -597,8 +604,8 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, &regs);
regs.rflags = regs.rflags | 0x2;
- regs.rsp = stack_vaddr;
- regs.rip = (unsigned long) guest_code;
+ regs.rsp = (unsigned long)KERNEL_ADDR(stack_vaddr);
+ regs.rip = (unsigned long)KERNEL_ADDR(guest_code);
vcpu_regs_set(vcpu, &regs);

/* Setup the MP state */
@@ -1103,8 +1110,9 @@ void vm_init_descriptor_tables(struct kvm_vm *vm)
vm->handlers = __vm_vaddr_alloc_page(vm, MEM_REGION_DATA);
/* Handlers have the same address in both address spaces.*/
for (i = 0; i < NUM_INTERRUPTS; i++)
- set_idt_entry(vm, i, (unsigned long)(&idt_handlers)[i], 0,
- DEFAULT_CODE_SELECTOR);
+ set_idt_entry(vm, i,
+ (unsigned long)KERNEL_ADDR((unsigned long)(&idt_handlers)[i]),
+ 0, DEFAULT_CODE_SELECTOR);
}

void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu)
@@ -1113,13 +1121,13 @@ void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu)
struct kvm_sregs sregs;

vcpu_sregs_get(vcpu, &sregs);
- sregs.idt.base = vm->idt;
+ sregs.idt.base = (unsigned long)KERNEL_ADDR(vm->idt);
sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1;
- sregs.gdt.base = vm->gdt;
+ sregs.gdt.base = (unsigned long)KERNEL_ADDR(vm->gdt);
sregs.gdt.limit = getpagesize() - 1;
kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
vcpu_sregs_set(vcpu, &sregs);
- *(vm_vaddr_t *)addr_gva2hva(vm, (vm_vaddr_t)(&exception_handlers)) = vm->handlers;
+ *(vm_vaddr_t *)addr_gva2hva(vm, (vm_vaddr_t)(&exception_handlers)) = (vm_vaddr_t)KERNEL_ADDR(vm->handlers);
}

void vm_install_exception_handler(struct kvm_vm *vm, int vector,
@@ -1127,7 +1135,7 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
{
vm_vaddr_t *handlers = (vm_vaddr_t *)addr_gva2hva(vm, vm->handlers);

- handlers[vector] = (vm_vaddr_t)handler;
+ handlers[vector] = handler ? (vm_vaddr_t)KERNEL_ADDR(handler) : (vm_vaddr_t)NULL;
}

void assert_on_unhandled_exception(struct kvm_vcpu *vcpu)
--
2.21.3

2023-11-02 16:33:31

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 3/8] KVM: selftests: Add virt_arch_ucall_prealloc() arch specific implementation

Add virt_arch_ucall_prealloc() which allows to preprocess the memory
allocated to ucall_pool as per arch specific requirement.

For X86 platform, it needs to adjust the address to corresponding
address space based on the operation mode, i.e. user or supervisor
mode, at runtime.

There is no change for other platforms(aarch64/riscv/s390x).

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 17 +++++++++++++++++
.../selftests/kvm/lib/aarch64/processor.c | 5 +++++
.../testing/selftests/kvm/lib/riscv/processor.c | 5 +++++
.../testing/selftests/kvm/lib/s390x/processor.c | 5 +++++
tools/testing/selftests/kvm/lib/ucall_common.c | 2 ++
.../selftests/kvm/lib/x86_64/processor.c | 12 ++++++++++++
6 files changed, 46 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index a18db6a7b3cf..dbaa2cf83c1c 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -917,6 +917,23 @@ static inline void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
virt_arch_dump(stream, vm, indent);
}

+/*
+ * Virtual UCALL memory pre-processing
+ *
+ * Input Args:
+ * ucall_gva - Guest virtual address point to memory of ucall pool
+ *
+ * Output Args: None
+ *
+ * Return:
+ * Processed guest virtual address point to memory of ucall pool
+ */
+void *virt_arch_ucall_prealloc(uint64_t ucall_gva);
+
+static inline void *virt_ucall_prealloc(uint64_t ucall_gva)
+{
+ return virt_arch_ucall_prealloc(ucall_gva);
+}

static inline int __vm_disable_nx_huge_pages(struct kvm_vm *vm)
{
diff --git a/tools/testing/selftests/kvm/lib/aarch64/processor.c b/tools/testing/selftests/kvm/lib/aarch64/processor.c
index 3a0259e25335..3a1827cce615 100644
--- a/tools/testing/selftests/kvm/lib/aarch64/processor.c
+++ b/tools/testing/selftests/kvm/lib/aarch64/processor.c
@@ -238,6 +238,11 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}

+void *virt_arch_ucall_prealloc(uint64_t ucall_gva)
+{
+ return (void *)ucall_gva;
+}
+
void aarch64_vcpu_setup(struct kvm_vcpu *vcpu, struct kvm_vcpu_init *init)
{
struct kvm_vcpu_init default_init = { .target = -1, };
diff --git a/tools/testing/selftests/kvm/lib/riscv/processor.c b/tools/testing/selftests/kvm/lib/riscv/processor.c
index d146ca71e0c0..d3f7eed84195 100644
--- a/tools/testing/selftests/kvm/lib/riscv/processor.c
+++ b/tools/testing/selftests/kvm/lib/riscv/processor.c
@@ -180,6 +180,11 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}

+void *virt_arch_ucall_prealloc(uint64_t ucall_gva)
+{
+ return (void *)ucall_gva;
+}
+
void riscv_vcpu_mmu_setup(struct kvm_vcpu *vcpu)
{
struct kvm_vm *vm = vcpu->vm;
diff --git a/tools/testing/selftests/kvm/lib/s390x/processor.c b/tools/testing/selftests/kvm/lib/s390x/processor.c
index 15945121daf1..b7c86649807d 100644
--- a/tools/testing/selftests/kvm/lib/s390x/processor.c
+++ b/tools/testing/selftests/kvm/lib/s390x/processor.c
@@ -155,6 +155,11 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
virt_dump_region(stream, vm, indent, vm->pgd);
}

+void *virt_arch_ucall_prealloc(uint64_t ucall_gva)
+{
+ return (void *)ucall_gva;
+}
+
struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
void *guest_code)
{
diff --git a/tools/testing/selftests/kvm/lib/ucall_common.c b/tools/testing/selftests/kvm/lib/ucall_common.c
index 816a3fa109bf..5afa32d77427 100644
--- a/tools/testing/selftests/kvm/lib/ucall_common.c
+++ b/tools/testing/selftests/kvm/lib/ucall_common.c
@@ -51,6 +51,8 @@ static struct ucall *ucall_alloc(void)
if (!ucall_pool)
goto ucall_failed;

+ ucall_pool = (struct ucall_header *)virt_ucall_prealloc((uint64_t)ucall_pool);
+
for (i = 0; i < KVM_MAX_VCPUS; ++i) {
if (!test_and_set_bit(i, ucall_pool->in_use)) {
uc = &ucall_pool->ucalls[i];
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 6f4295a13d00..525b714ee13c 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -388,6 +388,18 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
}
}

+void *virt_arch_ucall_prealloc(uint64_t ucall_gva)
+{
+ unsigned short desc_cs;
+
+ asm volatile ("mov %%cs,%0" : "=r" (desc_cs));
+
+ if (desc_cs & 0x3)
+ return (void *)(ucall_gva & ~KERNEL_LNA_OFFSET);
+ else
+ return (void *)(ucall_gva | KERNEL_LNA_OFFSET);
+}
+
/*
* Set Unusable Segment
*
--
2.21.3

2023-11-02 16:33:36

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 4/8] KVM : selftests : Adapt selftest cases to kernel canonical linear address

Adapt RIP to kernel canonical linear address in test cases
set_memory_region_test/debug_regs/userspace_msr_exit_test.

No functional change intended.

Signed-off-by: Zeng Guang <[email protected]>
---
.../testing/selftests/kvm/set_memory_region_test.c | 13 ++++++++++---
tools/testing/selftests/kvm/x86_64/debug_regs.c | 2 +-
.../selftests/kvm/x86_64/userspace_msr_exit_test.c | 9 +++++----
3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/kvm/set_memory_region_test.c b/tools/testing/selftests/kvm/set_memory_region_test.c
index b32960189f5f..8ab897bae3e0 100644
--- a/tools/testing/selftests/kvm/set_memory_region_test.c
+++ b/tools/testing/selftests/kvm/set_memory_region_test.c
@@ -31,6 +31,12 @@
#define MEM_REGION_GPA 0xc0000000
#define MEM_REGION_SLOT 10

+/*
+ * Offset to execute code at kernel address space
+ */
+#define KERNEL_LNA_OFFSET 0xffff800000000000
+#define CAST_TO_KERN(x) (x | KERNEL_LNA_OFFSET)
+
static const uint64_t MMIO_VAL = 0xbeefull;

extern const uint64_t final_rip_start;
@@ -300,10 +306,11 @@ static void test_delete_memory_region(void)
* so the instruction pointer would point to the reset vector.
*/
if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR)
- TEST_ASSERT(regs.rip >= final_rip_start &&
- regs.rip < final_rip_end,
+ TEST_ASSERT(regs.rip >= CAST_TO_KERN(final_rip_start) &&
+ regs.rip < CAST_TO_KERN(final_rip_end),
"Bad rip, expected 0x%lx - 0x%lx, got 0x%llx\n",
- final_rip_start, final_rip_end, regs.rip);
+ CAST_TO_KERN(final_rip_start), CAST_TO_KERN(final_rip_end),
+ regs.rip);

kvm_vm_free(vm);
}
diff --git a/tools/testing/selftests/kvm/x86_64/debug_regs.c b/tools/testing/selftests/kvm/x86_64/debug_regs.c
index f6b295e0b2d2..73ce373e3299 100644
--- a/tools/testing/selftests/kvm/x86_64/debug_regs.c
+++ b/tools/testing/selftests/kvm/x86_64/debug_regs.c
@@ -64,7 +64,7 @@ static void guest_code(void)
GUEST_DONE();
}

-#define CAST_TO_RIP(v) ((unsigned long long)&(v))
+#define CAST_TO_RIP(v) ((unsigned long long)&(v) | KERNEL_LNA_OFFSET)

static void vcpu_skip_insn(struct kvm_vcpu *vcpu, int insn_len)
{
diff --git a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
index 3533dc2fbfee..ab6b3f88352f 100644
--- a/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
+++ b/tools/testing/selftests/kvm/x86_64/userspace_msr_exit_test.c
@@ -18,6 +18,7 @@
static int fep_available = 1;

#define MSR_NON_EXISTENT 0x474f4f00
+#define CAST_TO_KERN(x) (x | KERNEL_LNA_OFFSET)

static u64 deny_bits = 0;
struct kvm_msr_filter filter_allow = {
@@ -363,12 +364,12 @@ static void __guest_gp_handler(struct ex_regs *regs,
char *r_start, char *r_end,
char *w_start, char *w_end)
{
- if (regs->rip == (uintptr_t)r_start) {
- regs->rip = (uintptr_t)r_end;
+ if (regs->rip == CAST_TO_KERN((uintptr_t)r_start)) {
+ regs->rip = CAST_TO_KERN((uintptr_t)r_end);
regs->rax = 0;
regs->rdx = 0;
- } else if (regs->rip == (uintptr_t)w_start) {
- regs->rip = (uintptr_t)w_end;
+ } else if (regs->rip == CAST_TO_KERN((uintptr_t)w_start)) {
+ regs->rip = CAST_TO_KERN((uintptr_t)w_end);
} else {
GUEST_ASSERT(!"RIP is at an unknown location!");
}
--
2.21.3

2023-11-02 16:33:43

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 5/8] KVM: selftests: x86: Prepare setup for user mode support

Extend the page size of stack memory that can be shared for user mode.
And configure the canonical linear address of the stack point(RSP0) for
privilege level 0 in TSS segment which processor will use to switch task,
e.g. from user mode back to supervisor mode triggered by interrupt.

Refactor KVM segment set API to support user mode setup.

No functional change intended.

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/kvm_util_base.h | 3 +-
.../selftests/kvm/include/x86_64/processor.h | 18 +++++++++
.../selftests/kvm/lib/x86_64/processor.c | 37 +++++++++++++------
3 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
index dbaa2cf83c1c..6f580bc519f4 100644
--- a/tools/testing/selftests/kvm/include/kvm_util_base.h
+++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
@@ -71,6 +71,7 @@ struct kvm_vcpu {
struct kvm_dirty_gfn *dirty_gfns;
uint32_t fetch_index;
uint32_t dirty_gfns_count;
+ uint64_t stack_vaddr;
};

struct userspace_mem_regions {
@@ -167,7 +168,7 @@ static inline struct userspace_mem_region *vm_get_mem_region(struct kvm_vm *vm,
#define KVM_GUEST_PAGE_TABLE_MIN_PADDR 0x180000

#define DEFAULT_GUEST_STACK_VADDR_MIN 0xab6000
-#define DEFAULT_STACK_PGS 5
+#define DEFAULT_STACK_PGS 10

enum vm_guest_mode {
VM_MODE_P52V48_4K,
diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 00f7337a520a..4b167e3e0370 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1072,6 +1072,24 @@ struct ex_regs {
uint64_t rflags;
};

+struct tss64_t {
+ uint32_t res1;
+ uint64_t rsp0;
+ uint64_t rsp1;
+ uint64_t rsp2;
+ uint64_t res2;
+ uint64_t ist1;
+ uint64_t ist2;
+ uint64_t ist3;
+ uint64_t ist4;
+ uint64_t ist5;
+ uint64_t ist6;
+ uint64_t ist7;
+ uint64_t res3;
+ uint16_t res4;
+ uint16_t iomap_base;
+} __attribute__((packed));
+
struct idt_entry {
uint16_t offset0;
uint16_t selector;
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 525b714ee13c..487e1f829031 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -16,6 +16,9 @@

#define DEFAULT_CODE_SELECTOR 0x8
#define DEFAULT_DATA_SELECTOR 0x10
+#define DEFAULT_TSS_SELECTOR 0x18
+#define USER_CODE_SELECTOR 0x23
+#define USER_DATA_SELECTOR 0x2B

#define MAX_NR_CPUID_ENTRIES 100

@@ -442,7 +445,7 @@ static void kvm_seg_fill_gdt_64bit(struct kvm_vm *vm, struct kvm_segment *segp)


/*
- * Set Long Mode Flat Kernel Code Segment
+ * Set Long Mode Flat Code Segment
*
* Input Args:
* vm - VM whose GDT is being filled, or NULL to only write segp
@@ -454,14 +457,16 @@ static void kvm_seg_fill_gdt_64bit(struct kvm_vm *vm, struct kvm_segment *segp)
* Return: None
*
* Sets up the KVM segment pointed to by @segp, to be a code segment
- * with the selector value given by @selector.
+ * with the selector value given by @selector. The @selector.dpl
+ * decides the descriptor privilege level, user or kernel.
*/
-static void kvm_seg_set_kernel_code_64bit(struct kvm_vm *vm, uint16_t selector,
+static void kvm_seg_set_code_64bit(struct kvm_vm *vm, uint16_t selector,
struct kvm_segment *segp)
{
memset(segp, 0, sizeof(*segp));
segp->selector = selector;
segp->limit = 0xFFFFFFFFu;
+ segp->dpl = selector & 0x3;
segp->s = 0x1; /* kTypeCodeData */
segp->type = 0x08 | 0x01 | 0x02; /* kFlagCode | kFlagCodeAccessed
* | kFlagCodeReadable
@@ -474,7 +479,7 @@ static void kvm_seg_set_kernel_code_64bit(struct kvm_vm *vm, uint16_t selector,
}

/*
- * Set Long Mode Flat Kernel Data Segment
+ * Set Long Mode Flat Data Segment
*
* Input Args:
* vm - VM whose GDT is being filled, or NULL to only write segp
@@ -486,14 +491,16 @@ static void kvm_seg_set_kernel_code_64bit(struct kvm_vm *vm, uint16_t selector,
* Return: None
*
* Sets up the KVM segment pointed to by @segp, to be a data segment
- * with the selector value given by @selector.
+ * with the selector value given by @selector. The @selector.dpl
+ * decides the descriptor privilege level, user or kernel.
*/
-static void kvm_seg_set_kernel_data_64bit(struct kvm_vm *vm, uint16_t selector,
+static void kvm_seg_set_data_64bit(struct kvm_vm *vm, uint16_t selector,
struct kvm_segment *segp)
{
memset(segp, 0, sizeof(*segp));
segp->selector = selector;
segp->limit = 0xFFFFFFFFu;
+ segp->dpl = selector & 0x3;
segp->s = 0x1; /* kTypeCodeData */
segp->type = 0x00 | 0x01 | 0x02; /* kFlagData | kFlagDataAccessed
* | kFlagDataWritable
@@ -561,10 +568,10 @@ static void vcpu_setup(struct kvm_vm *vm, struct kvm_vcpu *vcpu)
sregs.efer |= (EFER_LME | EFER_LMA | EFER_NX);

kvm_seg_set_unusable(&sregs.ldt);
- kvm_seg_set_kernel_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs);
- kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds);
- kvm_seg_set_kernel_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es);
- kvm_setup_tss_64bit(vm, &sregs.tr, 0x18);
+ kvm_seg_set_code_64bit(vm, DEFAULT_CODE_SELECTOR, &sregs.cs);
+ kvm_seg_set_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.ds);
+ kvm_seg_set_data_64bit(vm, DEFAULT_DATA_SELECTOR, &sregs.es);
+ kvm_setup_tss_64bit(vm, &sregs.tr, DEFAULT_TSS_SELECTOR);
break;

default:
@@ -589,6 +596,7 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
struct kvm_regs regs;
vm_vaddr_t stack_vaddr;
struct kvm_vcpu *vcpu;
+ struct tss64_t *tss_hva;

stack_vaddr = __vm_vaddr_alloc(vm, DEFAULT_STACK_PGS * getpagesize(),
DEFAULT_GUEST_STACK_VADDR_MIN,
@@ -613,6 +621,13 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,
vcpu_init_cpuid(vcpu, kvm_get_supported_cpuid());
vcpu_setup(vm, vcpu);

+ /* Save address of stack pool used for vCPU */
+ vcpu->stack_vaddr = stack_vaddr;
+
+ /* Setup canonical linear address form of the RSP0 for task switch */
+ tss_hva = (struct tss64_t *)addr_gva2hva(vm, vm->tss);
+ tss_hva->rsp0 = (uint64_t)KERNEL_ADDR(stack_vaddr);
+
/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, &regs);
regs.rflags = regs.rflags | 0x2;
@@ -1137,7 +1152,7 @@ void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu)
sregs.idt.limit = NUM_INTERRUPTS * sizeof(struct idt_entry) - 1;
sregs.gdt.base = (unsigned long)KERNEL_ADDR(vm->gdt);
sregs.gdt.limit = getpagesize() - 1;
- kvm_seg_set_kernel_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
+ kvm_seg_set_data_64bit(NULL, DEFAULT_DATA_SELECTOR, &sregs.gs);
vcpu_sregs_set(vcpu, &sregs);
*(vm_vaddr_t *)addr_gva2hva(vm, (vm_vaddr_t)(&exception_handlers)) = (vm_vaddr_t)KERNEL_ADDR(vm->handlers);
}
--
2.21.3

2023-11-02 16:33:43

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 1/8] KVM: selftests: x86: Fix bug in addr_arch_gva2gpa()

Fix the approach to get page map from gva to gpa.

If gva maps a 4-KByte page, current implementation of addr_arch_gva2gpa()
will obtain wrong page size and cannot derive correct offset from the guest
virtual address.

Meanwhile using HUGEPAGE_MASK(x) to calculate the offset within page
(1G/2M/4K) mistakenly incorporates the upper part of 64-bit canonical
linear address. That will work out improper guest physical address if
translating guest virtual address in supervisor-mode address space.

Signed-off-by: Zeng Guang <[email protected]>
---
tools/testing/selftests/kvm/lib/x86_64/processor.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index d8288374078e..9f4b8c47edce 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -293,6 +293,7 @@ uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
if (vm_is_target_pte(pde, level, PG_LEVEL_2M))
return pde;

+ *level = PG_LEVEL_4K;
return virt_get_pte(vm, pde, vaddr, PG_LEVEL_4K);
}

@@ -496,7 +497,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
* No need for a hugepage mask on the PTE, x86-64 requires the "unused"
* address bits to be zero.
*/
- return PTE_GET_PA(*pte) | (gva & ~HUGEPAGE_MASK(level));
+ return PTE_GET_PA(*pte) | (gva & (HUGEPAGE_SIZE(level) - 1));
}

static void kvm_setup_gdt(struct kvm_vm *vm, struct kvm_dtable *dt)
--
2.21.3

2023-11-02 16:34:32

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 7/8] KVM: selftests: x86: Support vcpu run in user mode

Introduce vcpu_setup_user_mode() to support vcpu run in user mode.

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 1 +
.../selftests/kvm/lib/x86_64/processor.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 9c8224c80664..2534bdf8aa71 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -800,6 +800,7 @@ static inline void cpu_relax(void)
struct kvm_x86_state *vcpu_save_state(struct kvm_vcpu *vcpu);
void vcpu_load_state(struct kvm_vcpu *vcpu, struct kvm_x86_state *state);
void kvm_x86_state_cleanup(struct kvm_x86_state *state);
+void vcpu_setup_user_mode(struct kvm_vcpu *vcpu, void *guest_code);

const struct kvm_msr_list *kvm_get_msr_index_list(void);
const struct kvm_msr_list *kvm_get_feature_msr_index_list(void);
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 7647c3755ca2..c84292b35f2d 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -1071,6 +1071,25 @@ void vcpu_load_state(struct kvm_vcpu *vcpu, struct kvm_x86_state *state)
vcpu_nested_state_set(vcpu, &state->nested);
}

+void vcpu_setup_user_mode(struct kvm_vcpu *vcpu, void *guest_code)
+{
+ struct kvm_sregs sregs;
+ struct kvm_regs regs;
+ struct kvm_vm *vm = vcpu->vm;
+
+ vcpu_sregs_get(vcpu, &sregs);
+ kvm_seg_set_code_64bit(vm, USER_CODE_SELECTOR, &sregs.cs);
+ kvm_seg_set_data_64bit(vm, USER_DATA_SELECTOR, &sregs.ds);
+ kvm_seg_set_data_64bit(vm, USER_DATA_SELECTOR, &sregs.es);
+ kvm_seg_set_data_64bit(vm, USER_DATA_SELECTOR, &sregs.ss);
+ vcpu_sregs_set(vcpu, &sregs);
+
+ vcpu_regs_get(vcpu, &regs);
+ regs.rsp = vcpu->stack_vaddr - (DEFAULT_STACK_PGS >> 1) * getpagesize();
+ regs.rip = (unsigned long) guest_code;
+ vcpu_regs_set(vcpu, &regs);
+}
+
void kvm_x86_state_cleanup(struct kvm_x86_state *state)
{
free(state->xsave);
--
2.21.3

2023-11-02 16:34:41

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 8/8] KVM: selftests: x86: Add KVM forced emulation prefix capability

Introduce KVM selftest exception fixup using forced emulation prefix to
emulate instruction unconditionally when kvm.force_emulation_prefix is
enabled.

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 2534bdf8aa71..a1645508affc 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -1110,6 +1110,10 @@ void vcpu_init_descriptor_tables(struct kvm_vcpu *vcpu);
void vm_install_exception_handler(struct kvm_vm *vm, int vector,
void (*handler)(struct ex_regs *));

+/* Forced emulation prefix for KVM emulating instruction unconditionally */
+#define KVM_FEP "ud2; .byte 'k', 'v', 'm';"
+#define KVM_FEP_LENGTH 5
+
/* If a toddler were to say "abracadabra". */
#define KVM_EXCEPTION_MAGIC 0xabacadabaULL

@@ -1149,6 +1153,22 @@ void vm_install_exception_handler(struct kvm_vm *vm, int vector,
"mov %%r9b, %[vector]\n\t" \
"mov %%r10, %[error_code]\n\t"

+/*
+ * KVM selftest exception fixup using forced emulation prefix enforces KVM
+ * on emulating instruction unconditionally when kvm.force_emulation_prefix
+ * is enabled.
+ */
+#define KVM_FEP_ASM_SAFE(insn) \
+ "mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t" \
+ "lea 1f(%%rip), %%r10\n\t" \
+ "lea 2f(%%rip), %%r11\n\t" \
+ KVM_FEP \
+ "1: " insn "\n\t" \
+ "xor %%r9, %%r9\n\t" \
+ "2:\n\t" \
+ "mov %%r9b, %[vector]\n\t" \
+ "mov %%r10, %[error_code]\n\t"
+
#define KVM_ASM_SAFE_OUTPUTS(v, ec) [vector] "=qm"(v), [error_code] "=rm"(ec)
#define KVM_ASM_SAFE_CLOBBERS "r9", "r10", "r11"

--
2.21.3

2023-11-02 16:34:47

by Zeng Guang

[permalink] [raw]
Subject: [RFC PATCH v1 6/8] KVM: selftests: x86: Allow user to access user-mode address and I/O address space

Configure the U/S bit in paging-structure entries according to operation
mode and delimit user has user-mode access only to user-mode address
space.

Similarly set I/O privilege level as ring 3 in EFLAGS register to allow
user to access the I/O address space.

Signed-off-by: Zeng Guang <[email protected]>
---
.../selftests/kvm/include/x86_64/processor.h | 3 ++-
.../selftests/kvm/lib/x86_64/processor.c | 18 +++++++++++++++---
2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 4b167e3e0370..9c8224c80664 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -24,7 +24,8 @@ extern bool host_cpu_is_amd;

#define NMI_VECTOR 0x02

-#define X86_EFLAGS_FIXED (1u << 1)
+#define X86_EFLAGS_FIXED (1u << 1)
+#define X86_EFLAGS_IOPL (3u << 12)

#define X86_CR4_VME (1ul << 0)
#define X86_CR4_PVI (1ul << 1)
diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
index 487e1f829031..7647c3755ca2 100644
--- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
+++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
@@ -117,6 +117,14 @@ static void sregs_dump(FILE *stream, struct kvm_sregs *sregs, uint8_t indent)
}
}

+static bool gva_is_kernel_addr(uint64_t gva)
+{
+ if (gva & BIT_ULL(63))
+ return true;
+
+ return false;
+}
+
bool kvm_is_tdp_enabled(void)
{
if (host_cpu_is_intel)
@@ -161,7 +169,8 @@ static uint64_t *virt_create_upper_pte(struct kvm_vm *vm,
uint64_t *pte = virt_get_pte(vm, parent_pte, vaddr, current_level);

if (!(*pte & PTE_PRESENT_MASK)) {
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK;
+ *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK |
+ (gva_is_kernel_addr(vaddr) ? 0 : PTE_USER_MASK);
if (current_level == target_level)
*pte |= PTE_LARGE_MASK | (paddr & PHYSICAL_PAGE_MASK);
else
@@ -224,7 +233,8 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
pte = virt_get_pte(vm, pde, vaddr, PG_LEVEL_4K);
TEST_ASSERT(!(*pte & PTE_PRESENT_MASK),
"PTE already present for 4k page at vaddr: 0x%lx\n", vaddr);
- *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK);
+ *pte = PTE_PRESENT_MASK | PTE_WRITABLE_MASK | (paddr & PHYSICAL_PAGE_MASK) |
+ (gva_is_kernel_addr(vaddr) ? 0 : PTE_USER_MASK);
}

void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
@@ -630,7 +640,9 @@ struct kvm_vcpu *vm_arch_vcpu_add(struct kvm_vm *vm, uint32_t vcpu_id,

/* Setup guest general purpose registers */
vcpu_regs_get(vcpu, &regs);
- regs.rflags = regs.rflags | 0x2;
+
+ /* Allow user privilege to access the I/O address space */
+ regs.rflags = regs.rflags | X86_EFLAGS_FIXED | X86_EFLAGS_IOPL;
regs.rsp = (unsigned long)KERNEL_ADDR(stack_vaddr);
regs.rip = (unsigned long)KERNEL_ADDR(guest_code);
vcpu_regs_set(vcpu, &regs);
--
2.21.3

2024-01-31 22:46:04

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v1 1/8] KVM: selftests: x86: Fix bug in addr_arch_gva2gpa()

On Thu, Nov 02, 2023, Zeng Guang wrote:
> Fix the approach to get page map from gva to gpa.
>
> If gva maps a 4-KByte page, current implementation of addr_arch_gva2gpa()
> will obtain wrong page size and cannot derive correct offset from the guest
> virtual address.
>
> Meanwhile using HUGEPAGE_MASK(x) to calculate the offset within page
> (1G/2M/4K) mistakenly incorporates the upper part of 64-bit canonical
> linear address. That will work out improper guest physical address if
> translating guest virtual address in supervisor-mode address space.

The "Meanwhile ..." is a huge clue that this should be two separate patches.

> Signed-off-by: Zeng Guang <[email protected]>
> ---
> tools/testing/selftests/kvm/lib/x86_64/processor.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index d8288374078e..9f4b8c47edce 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -293,6 +293,7 @@ uint64_t *__vm_get_page_table_entry(struct kvm_vm *vm, uint64_t vaddr,
> if (vm_is_target_pte(pde, level, PG_LEVEL_2M))
> return pde;
>
> + *level = PG_LEVEL_4K;
> return virt_get_pte(vm, pde, vaddr, PG_LEVEL_4K);
> }
>
> @@ -496,7 +497,7 @@ vm_paddr_t addr_arch_gva2gpa(struct kvm_vm *vm, vm_vaddr_t gva)
> * No need for a hugepage mask on the PTE, x86-64 requires the "unused"
> * address bits to be zero.
> */
> - return PTE_GET_PA(*pte) | (gva & ~HUGEPAGE_MASK(level));
> + return PTE_GET_PA(*pte) | (gva & (HUGEPAGE_SIZE(level) - 1));

I think I would prefer to "fix" HUGEPAGE_MASK() and drop its incorporation of
PHYSICAL_PAGE_MASK. Regardless of anyone's personal views on whether or not
PAGE_MASK and HUGEPAGE_MASK should only cover physical address bits, (a) the
_one_ usage of HUGEPAGE_MASK is broken and (b) diverging from the kernel for
something like is a terrible idea, and the kernel does:

#define PAGE_MASK (~(PAGE_SIZE-1))
#define HPAGE_MASK (~(HPAGE_SIZE - 1))
#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))

Luckily, there are barely any users in x86, so I think the entirety of the
conversion is this?

diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
index 0f4792083d01..ef895038c87f 100644
--- a/tools/testing/selftests/kvm/include/x86_64/processor.h
+++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
@@ -352,11 +352,12 @@ static inline unsigned int x86_model(unsigned int eax)

#define PAGE_SHIFT 12
#define PAGE_SIZE (1ULL << PAGE_SHIFT)
-#define PAGE_MASK (~(PAGE_SIZE-1) & PHYSICAL_PAGE_MASK)
+#define PAGE_MASK (~(PAGE_SIZE-1))
+kvm_static_assert((PHYSICAL_PAGE_MASK & PAGE_MASK) == PHYSICAL_PAGE_MASK);

#define HUGEPAGE_SHIFT(x) (PAGE_SHIFT + (((x) - 1) * 9))
#define HUGEPAGE_SIZE(x) (1UL << HUGEPAGE_SHIFT(x))
-#define HUGEPAGE_MASK(x) (~(HUGEPAGE_SIZE(x) - 1) & PHYSICAL_PAGE_MASK)
+#define HUGEPAGE_MASK(x) (~(HUGEPAGE_SIZE(x) - 1))

#define PTE_GET_PA(pte) ((pte) & PHYSICAL_PAGE_MASK)
#define PTE_GET_PFN(pte) (PTE_GET_PA(pte) >> PAGE_SHIFT)
diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
index 05b56095cf76..cc5730322072 100644
--- a/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
+++ b/tools/testing/selftests/kvm/x86_64/hyperv_tlb_flush.c
@@ -623,7 +623,7 @@ int main(int argc, char *argv[])
for (i = 0; i < NTEST_PAGES; i++) {
pte = vm_get_page_table_entry(vm, data->test_pages + i * PAGE_SIZE);
gpa = addr_hva2gpa(vm, pte);
- __virt_pg_map(vm, gva + PAGE_SIZE * i, gpa & PAGE_MASK, PG_LEVEL_4K);
+ __virt_pg_map(vm, gva + PAGE_SIZE * i, gpa & PHYSICAL_PAGE_MASK, PG_LEVEL_4K);
data->test_pages_pte[i] = gva + (gpa & ~PAGE_MASK);
}



2024-01-31 23:13:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v1 2/8] KVM: selftests: x86: Support guest running on canonical linear-address organization

On Thu, Nov 02, 2023, Zeng Guang wrote:
> Setup execution environment running on 64-bit linear addresses for
> user and supervisor mode.
>
> Define the linear address based on 48-bit canonical format in which
> bits 63:47 of the address are identical. All addresses to system data
> structure are shifted to supervisor-mode address space.
>
> Extend page table mapping for supervisor mode to same guest physical
> address. This allows guest in supervisor mode can run in the
> corresponding canonical linear address space.
>
> Signed-off-by: Zeng Guang <[email protected]>
> ---
> .../selftests/kvm/include/x86_64/processor.h | 6 ++++
> tools/testing/selftests/kvm/lib/kvm_util.c | 6 ++--
> .../selftests/kvm/lib/x86_64/processor.c | 28 ++++++++++++-------
> 3 files changed, 28 insertions(+), 12 deletions(-)
>
> diff --git a/tools/testing/selftests/kvm/include/x86_64/processor.h b/tools/testing/selftests/kvm/include/x86_64/processor.h
> index 25bc61dac5fb..00f7337a520a 100644
> --- a/tools/testing/selftests/kvm/include/x86_64/processor.h
> +++ b/tools/testing/selftests/kvm/include/x86_64/processor.h
> @@ -1256,4 +1256,10 @@ void virt_map_level(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr,
> #define PFERR_GUEST_PAGE_MASK BIT_ULL(PFERR_GUEST_PAGE_BIT)
> #define PFERR_IMPLICIT_ACCESS BIT_ULL(PFERR_IMPLICIT_ACCESS_BIT)
>
> +/*
> + * X86 kernel linear address defines
> + */
> +#define KERNEL_LNA_OFFSET 0xffff800000000000

Please don't make up acronyms, I can more or less glean what LNA is from the
context _here_, but in other usage I would truly have no idea.

> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 9f4b8c47edce..6f4295a13d00 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -227,6 +227,13 @@ void __virt_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr, int level)
> void virt_arch_pg_map(struct kvm_vm *vm, uint64_t vaddr, uint64_t paddr)
> {
> __virt_pg_map(vm, vaddr, paddr, PG_LEVEL_4K);
> +
> + /*
> + * Map same paddr to kernel linear address space. Make execution
> + * environment supporting running both in user and kernel mode.
> + */
> + if (!(vaddr & BIT_ULL(63)))
> + __virt_pg_map(vm, (uint64_t)KERNEL_ADDR(vaddr), paddr, PG_LEVEL_4K);

I really don't like the idea of piling hacks on top of selftests' misguided
infrastructure. Letting tests control virtual addresses is all kinds of stupid.
Except for ARM's ucall_arch_init(), I don't think there's a single user of
virt_map() that _needs_ a specific address, e.g. most tests just identity map
the GPA.

So rather than fudge things by stuffing two mappings, which is wasteful for 99%
of mappings and will likely be a maintenance nightmare, I think we should go
straight to getting x86's kernel mappings setup correctly from time zero.

From KUT experience, using USER mappings for kernel accesses is explosions waiting
to happen due to SMAP and SMEP. And expecting developers to remember to sprinkle
KERNEL_ADDR() everywhere is not remotely maintainable.

In other words, give virt_arch_pg_map() (or ideally, the common virt_map()) over
picking the virtual address, and then plumb in information as to whether the
allocation is USER vs. SUPERVISOR.

2024-01-31 23:56:29

by Sean Christopherson

[permalink] [raw]
Subject: Re: [RFC PATCH v1 3/8] KVM: selftests: Add virt_arch_ucall_prealloc() arch specific implementation

On Thu, Nov 02, 2023, Zeng Guang wrote:
> diff --git a/tools/testing/selftests/kvm/include/kvm_util_base.h b/tools/testing/selftests/kvm/include/kvm_util_base.h
> index a18db6a7b3cf..dbaa2cf83c1c 100644
> --- a/tools/testing/selftests/kvm/include/kvm_util_base.h
> +++ b/tools/testing/selftests/kvm/include/kvm_util_base.h
> @@ -917,6 +917,23 @@ static inline void virt_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> virt_arch_dump(stream, vm, indent);
> }
>
> +/*
> + * Virtual UCALL memory pre-processing
> + *
> + * Input Args:
> + * ucall_gva - Guest virtual address point to memory of ucall pool
> + *
> + * Output Args: None
> + *
> + * Return:
> + * Processed guest virtual address point to memory of ucall pool
> + */

Please omit the massive comments, they are yet another misguided remnant in
selftests that we are purging.

> diff --git a/tools/testing/selftests/kvm/lib/x86_64/processor.c b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> index 6f4295a13d00..525b714ee13c 100644
> --- a/tools/testing/selftests/kvm/lib/x86_64/processor.c
> +++ b/tools/testing/selftests/kvm/lib/x86_64/processor.c
> @@ -388,6 +388,18 @@ void virt_arch_dump(FILE *stream, struct kvm_vm *vm, uint8_t indent)
> }
> }
>
> +void *virt_arch_ucall_prealloc(uint64_t ucall_gva)
> +{
> + unsigned short desc_cs;
> +
> + asm volatile ("mov %%cs,%0" : "=r" (desc_cs));

Strictly speaking, CS.DPL is not the source of truth for CPL, SS.DPL is. But
that's probably a moot point, because I again think this is a hack that shows the
overall approach isn't maintainable.

Can you post the actual usage of userspace selftests, i.e. the "full" series?
It's really hard to build a mental model of how this all fits together without
seeing the actual usage.