2023-09-19 21:50:47

by Yi-De Wu

Subject: [PATCH v6 00/15] GenieZone hypervisor drivers

This series is based on linux-next, tag: next-20230919.

GenieZone hypervisor (gzvm) is a type-1 hypervisor that supports various virtual
machine types and provides security features such as TEE-like scenarios and
secure boot. It can create guest VMs for security use cases and has
virtualization capabilities for both platform and interrupt. Although the
hypervisor can be booted independently, it requires the assistance of the
GenieZone hypervisor kernel driver (gzvm-ko) to leverage the Linux kernel's
capabilities for vCPU scheduling, memory management, inter-VM communication
and virtio backend support.
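
To give a feel for the driver/hypervisor split, here is a simplified sketch of
how gzvm-ko forwards VM operations to the hypervisor through ARM SMCCC
hypercalls. It is modeled on the gzvm_hypcall_wrapper() calls used throughout
the patches below; the error mapping shown here is an assumption for
illustration only.

#include <linux/arm-smccc.h>
#include <linux/errno.h>

/* Sketch: forward one VM operation to the GenieZone hypervisor. */
static int gzvm_hypcall_wrapper(unsigned long fid,
				unsigned long a1, unsigned long a2,
				unsigned long a3, unsigned long a4,
				unsigned long a5, unsigned long a6,
				unsigned long a7, struct arm_smccc_res *res)
{
	arm_smccc_hvc(fid, a1, a2, a3, a4, a5, a6, a7, res);
	/* assumption: a0 == 0 means success; the real driver maps
	 * hypervisor-specific error codes */
	return res->a0 ? -EINVAL : 0;
}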

Changes in v6:
- Rebase based on kernel 6.6-rc1
- Keep dt solution and leave the reasons in the commit message
- Remove arch/arm64/include/uapi/asm/gzvm_arch.h for simplicity
- Remove resampler in drivers/virt/geniezone/gzvm_irqfd.c since the feature is
dropped for now
- Remove PPI in arch/arm64/geniezone/vgic.c
- Refactor vm-related components into 3 smaller patches, namely adding vm
support, setting user memory region and checking vm capability
- Refactor vcpu and vm components to remove unnecessary ARM prefixes
- Add demand paging to fix a crash when destroying memory pages, accelerate
booting and support balloon deflation
- Add memory pin/unpin mechanism to support protected VMs
- Add block-based demand paging for performance reasons
- Respond to reviewers and fix coding style accordingly

Changes in v5:
https://lore.kernel.org/all/[email protected]/
- Add dt solution back for device initialization
- Add GZVM_EXIT_GZ reason for gzvm_vcpu_run()
- Add patch for guest page fault handler
- Add patch for supporting pin/unpin memory
- Remove unused enum members, namely GZVM_FUNC_GET_REGS and GZVM_FUNC_SET_REGS
- Use dev_debug() for debugging when platform device is available, and use
pr_debug() otherwise
- Respond to reviewers and fix bugs accordingly

Changes in v4:
https://lore.kernel.org/lkml/[email protected]/
- Add macro to set VM as protected without triggering pvmfw in AVF.
- Add support to pass dtb config to hypervisor.
- Add support for virtual timer.
- Add UAPI to pass memory region metadata to hypervisor.
- Define our own macros for ARM's interrupt numbers.
- Elaborate more on GenieZone hypervisor in documentation.
- Fix coding style.
- Implement our own module for converting ipa to pa.
- Modify the way of initializing device from dt to a more discoverable way.
- Move refactoring changes into independent patches.

Changes in v3:
https://lore.kernel.org/all/[email protected]/
- Refactor: separate arch/arm64/geniezone/gzvm_arch.c into vm.c/vcpu.c/vgic.c
- Remove redundant functions
- Fix reviewer's comments

Changes in v2:
https://lore.kernel.org/all/[email protected]/
- Refactor: move to drivers/virt/geniezone
- Refactor: decouple arch-dependent and arch-independent
- Check pending signal before entering guest context
- Fix reviewer's comments

Initial Commit in v1:
https://lore.kernel.org/all/[email protected]/

Yi-De Wu (15):
docs: geniezone: Introduce GenieZone hypervisor
dt-bindings: hypervisor: Add MediaTek GenieZone hypervisor
virt: geniezone: Add GenieZone hypervisor driver
virt: geniezone: Add vm support
virt: geniezone: Add set_user_memory_region for vm
virt: geniezone: Add vm capability check
virt: geniezone: Add vcpu support
virt: geniezone: Add irqchip support for virtual interrupt injection
virt: geniezone: Add irqfd support
virt: geniezone: Add ioeventfd support
virt: geniezone: Add memory region support
virt: geniezone: Add dtb config support
virt: geniezone: Add demand paging support
virt: geniezone: Add memory pin/unpin support
virt: geniezone: Add block-based demand paging support

.../hypervisor/mediatek,geniezone-hyp.yaml | 31 ++
Documentation/virt/geniezone/introduction.rst | 86 ++++
Documentation/virt/index.rst | 1 +
MAINTAINERS | 11 +
arch/arm64/Kbuild | 1 +
arch/arm64/geniezone/Makefile | 9 +
arch/arm64/geniezone/gzvm_arch_common.h | 114 +++++
arch/arm64/geniezone/vcpu.c | 80 +++
arch/arm64/geniezone/vgic.c | 50 ++
arch/arm64/geniezone/vm.c | 380 ++++++++++++++
drivers/virt/Kconfig | 2 +
drivers/virt/geniezone/Kconfig | 16 +
drivers/virt/geniezone/Makefile | 12 +
drivers/virt/geniezone/gzvm_common.h | 12 +
drivers/virt/geniezone/gzvm_exception.c | 67 +++
drivers/virt/geniezone/gzvm_ioeventfd.c | 276 +++++++++++
drivers/virt/geniezone/gzvm_irqfd.c | 382 ++++++++++++++
drivers/virt/geniezone/gzvm_main.c | 147 ++++++
drivers/virt/geniezone/gzvm_mmu.c | 277 +++++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 281 +++++++++++
drivers/virt/geniezone/gzvm_vm.c | 468 ++++++++++++++++++
include/linux/gzvm_drv.h | 204 ++++++++
include/uapi/linux/gzvm.h | 395 +++++++++++++++
23 files changed, 3302 insertions(+)
create mode 100644 Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml
create mode 100644 Documentation/virt/geniezone/introduction.rst
create mode 100644 arch/arm64/geniezone/Makefile
create mode 100644 arch/arm64/geniezone/gzvm_arch_common.h
create mode 100644 arch/arm64/geniezone/vcpu.c
create mode 100644 arch/arm64/geniezone/vgic.c
create mode 100644 arch/arm64/geniezone/vm.c
create mode 100644 drivers/virt/geniezone/Kconfig
create mode 100644 drivers/virt/geniezone/Makefile
create mode 100644 drivers/virt/geniezone/gzvm_common.h
create mode 100644 drivers/virt/geniezone/gzvm_exception.c
create mode 100644 drivers/virt/geniezone/gzvm_ioeventfd.c
create mode 100644 drivers/virt/geniezone/gzvm_irqfd.c
create mode 100644 drivers/virt/geniezone/gzvm_main.c
create mode 100644 drivers/virt/geniezone/gzvm_mmu.c
create mode 100644 drivers/virt/geniezone/gzvm_vcpu.c
create mode 100644 drivers/virt/geniezone/gzvm_vm.c
create mode 100644 include/linux/gzvm_drv.h
create mode 100644 include/uapi/linux/gzvm.h

--
2.18.0


2023-09-19 22:32:27

by Yi-De Wu

Subject: [PATCH v6 11/15] virt: geniezone: Add memory region support

From: "Jerry Wang" <[email protected]>

The hypervisor might need to know the precise purpose of each memory
region so that it can provide specific memory protection. We add a new
uapi to pass the address and size of a memory region along with its
purpose.
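
For illustration, a minimal userspace sketch of how a VMM might tag a region
with its purpose. The struct fields match those used in this patch; the ioctl
name follows the set_user_memory_region patch earlier in this series, the
purpose flag value is purely hypothetical, and vm_fd/backing_buf are assumed
to be set up beforehand.

	struct gzvm_userspace_memory_region region = {
		.slot = 0,
		/* hypothetical purpose flag carried in .flags */
		.flags = GZVM_MEM_PURPOSE_GUEST_RAM,
		.guest_phys_addr = 0x80000000UL,
		.memory_size = 0x10000000UL, /* 256 MiB of guest RAM */
		.userspace_addr = (__u64)backing_buf, /* mmap()ed host memory */
	};

	/* the driver forwards .flags to the hypervisor via
	 * gzvm_arch_memregion_purpose() before registering the memslot */
	ioctl(vm_fd, GZVM_SET_USER_MEMORY_REGION, &region);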

Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Liju-clr Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/gzvm_arch_common.h | 2 ++
arch/arm64/geniezone/vm.c | 10 ++++++++++
drivers/virt/geniezone/gzvm_vm.c | 7 +++++++
include/linux/gzvm_drv.h | 3 +++
4 files changed, 22 insertions(+)

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 051d8f49a1df..321f5dbcd616 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -22,6 +22,7 @@ enum {
GZVM_FUNC_PROBE = 12,
GZVM_FUNC_ENABLE_CAP = 13,
GZVM_FUNC_INFORM_EXIT = 14,
+ GZVM_FUNC_MEMREGION_PURPOSE = 15,
NR_GZVM_FUNC,
};

@@ -44,6 +45,7 @@ enum {
#define MT_HVC_GZVM_PROBE GZVM_HCALL_ID(GZVM_FUNC_PROBE)
#define MT_HVC_GZVM_ENABLE_CAP GZVM_HCALL_ID(GZVM_FUNC_ENABLE_CAP)
#define MT_HVC_GZVM_INFORM_EXIT GZVM_HCALL_ID(GZVM_FUNC_INFORM_EXIT)
+#define MT_HVC_GZVM_MEMREGION_PURPOSE GZVM_HCALL_ID(GZVM_FUNC_MEMREGION_PURPOSE)

#define GIC_V3_NR_LRS 16

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index 6db5e7f0cf2f..1198f32f4785 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -104,6 +104,16 @@ int gzvm_arch_destroy_vm(u16 vm_id)
0, 0, &res);
}

+int gzvm_arch_memregion_purpose(struct gzvm *gzvm,
+ struct gzvm_userspace_memory_region *mem)
+{
+ struct arm_smccc_res res;
+
+ return gzvm_hypcall_wrapper(MT_HVC_GZVM_MEMREGION_PURPOSE, gzvm->vm_id,
+ mem->guest_phys_addr, mem->memory_size,
+ mem->flags, 0, 0, 0, &res);
+}
+
static int gzvm_vm_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
struct arm_smccc_res *res)
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 7a935b1cf509..dfa03e850e8a 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -144,6 +144,7 @@ static int
gzvm_vm_ioctl_set_memory_region(struct gzvm *gzvm,
struct gzvm_userspace_memory_region *mem)
{
+ int ret;
struct vm_area_struct *vma;
struct gzvm_memslot *memslot;
unsigned long size;
@@ -168,6 +169,12 @@ gzvm_vm_ioctl_set_memory_region(struct gzvm *gzvm,
memslot->vma = vma;
memslot->flags = mem->flags;
memslot->slot_id = mem->slot;
+
+ ret = gzvm_arch_memregion_purpose(gzvm, mem);
+ if (ret) {
+ pr_err("Failed to config memory region for the specified purpose\n");
+ return -EFAULT;
+ }
return register_memslot_addr_range(gzvm, memslot);
}

diff --git a/include/linux/gzvm_drv.h b/include/linux/gzvm_drv.h
index 406bc9f821b2..d35ca179b688 100644
--- a/include/linux/gzvm_drv.h
+++ b/include/linux/gzvm_drv.h
@@ -152,6 +152,9 @@ void gzvm_drv_irqfd_exit(void);
int gzvm_vm_irqfd_init(struct gzvm *gzvm);
void gzvm_vm_irqfd_release(struct gzvm *gzvm);

+int gzvm_arch_memregion_purpose(struct gzvm *gzvm,
+ struct gzvm_userspace_memory_region *mem);
+
int gzvm_init_ioeventfd(struct gzvm *gzvm);
int gzvm_ioeventfd(struct gzvm *gzvm, struct gzvm_ioeventfd *args);
bool gzvm_ioevent_write(struct gzvm_vcpu *vcpu, __u64 addr, int len,
--
2.18.0

2023-09-19 22:52:05

by Yi-De Wu

Subject: [PATCH v6 14/15] virt: geniezone: Add memory pin/unpin support

From: "Jerry Wang" <[email protected]>

A protected VM's memory cannot be swapped out, because its memory pages are
protected from host access.

Once the host accesses those protected pages, a hardware exception is
triggered and may crash the host. So we have to make those protected
pages ineligible for swapping or merging by the host kernel to avoid
host access. To do so, we pin a page when it is assigned (donated) to
the VM and unpin it when the VM relinquishes the page or is destroyed.
Besides, a protected VM's memory requires the hypervisor to clear its
contents before returning it to the host, but the VMM may free that
memory before it is cleared, which would let those pages be reclaimed
and reused before they are fully cleared. Pinning/unpinning the pages
also avoids this problem.

The implementation is described as follows.
- Use an rb-tree to store pinned memory pages.
- Pin a page when handling a guest page fault.
- Unpin the pages when the VM relinquishes them or is destroyed.
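
As a rough sketch of the pin/unpin pairing this patch relies on (the real code
below also records each pinned page in the rb-tree; error handling is elided,
and hva is assumed to be the host virtual address backing the faulting guest
page):

	struct page *page = NULL;

	/* FOLL_LONGTERM marks the pin as long-lived, so the page is migrated
	 * out of movable/CMA regions and becomes ineligible for reclaim;
	 * FOLL_WRITE because the guest will write to it. */
	mmap_read_lock(current->mm);
	pin_user_pages(hva, 1, FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE,
		       &page);
	mmap_read_unlock(current->mm);

	/* later, on relinquish or VM destroy: drop the pin and mark the page
	 * dirty so any content written by the guest is not silently lost */
	unpin_user_pages_dirty_lock(&page, 1, true);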

Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
drivers/virt/geniezone/gzvm_exception.c | 28 ++++++++
drivers/virt/geniezone/gzvm_mmu.c | 89 +++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 7 +-
drivers/virt/geniezone/gzvm_vm.c | 18 +++++
include/linux/gzvm_drv.h | 13 ++++
include/uapi/linux/gzvm.h | 5 ++
6 files changed, 157 insertions(+), 3 deletions(-)

diff --git a/drivers/virt/geniezone/gzvm_exception.c b/drivers/virt/geniezone/gzvm_exception.c
index 31fdb4ae8db4..58d1ce305cf7 100644
--- a/drivers/virt/geniezone/gzvm_exception.c
+++ b/drivers/virt/geniezone/gzvm_exception.c
@@ -37,3 +37,31 @@ bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu)
else
return false;
}
+
+/**
+ * gzvm_handle_guest_hvc() - Handle guest hvc
+ * @vcpu: Pointer to struct gzvm_vcpu
+ * Return:
+ * * true - This hvc has been processed; no need to return to the VMM.
+ * * false - This hvc has not been processed and requires handling in userspace.
+ */
+bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu)
+{
+ unsigned long ipa;
+ int ret;
+
+ switch (vcpu->run->hypercall.args[0]) {
+ case GZVM_HVC_MEM_RELINQUISH:
+ ipa = vcpu->run->hypercall.args[1];
+ ret = gzvm_handle_relinquish(vcpu, ipa);
+ break;
+ default:
+ /* unknown hypercall: hand it to userspace */
+ return false;
+ }
+
+ return (ret == 0);
+}
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
index 4fdbdd8e809d..542599f104db 100644
--- a/drivers/virt/geniezone/gzvm_mmu.c
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -107,6 +107,78 @@ int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn,
return 0;
}

+static int cmp_ppages(struct rb_node *node, const struct rb_node *parent)
+{
+ struct gzvm_pinned_page *a = container_of(node,
+ struct gzvm_pinned_page,
+ node);
+ struct gzvm_pinned_page *b = container_of(parent,
+ struct gzvm_pinned_page,
+ node);
+
+ if (a->ipa < b->ipa)
+ return -1;
+ if (a->ipa > b->ipa)
+ return 1;
+ return 0;
+}
+
+static int rb_ppage_cmp(const void *key, const struct rb_node *node)
+{
+ struct gzvm_pinned_page *p = container_of(node,
+ struct gzvm_pinned_page,
+ node);
+ phys_addr_t ipa = (phys_addr_t)key;
+
+ return (ipa < p->ipa) ? -1 : (ipa > p->ipa);
+}
+
+static int gzvm_insert_ppage(struct gzvm *vm, struct gzvm_pinned_page *ppage)
+{
+ if (rb_find_add(&ppage->node, &vm->pinned_pages, cmp_ppages))
+ return -EEXIST;
+ return 0;
+}
+
+static int pin_one_page(unsigned long hva, struct page **page)
+{
+ struct mm_struct *mm = current->mm;
+ unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+
+ mmap_read_lock(mm);
+ pin_user_pages(hva, 1, flags, page);
+ mmap_read_unlock(mm);
+
+ return 0;
+}
+
+/**
+ * gzvm_handle_relinquish() - Handle memory relinquish request from hypervisor
+ *
+ * @vcpu: Pointer to struct gzvm_vcpu
+ * @ipa: Start address (gpa) of a reclaimed page
+ *
+ * Return: Always returns 0 because there is no failure case
+ */
+int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa)
+{
+ struct gzvm_pinned_page *ppage;
+ struct rb_node *node;
+ struct gzvm *vm = vcpu->gzvm;
+
+ node = rb_find((void *)ipa, &vm->pinned_pages, rb_ppage_cmp);
+ if (!node)
+ return 0;
+
+ rb_erase(node, &vm->pinned_pages);
+
+ ppage = container_of(node, struct gzvm_pinned_page, node);
+ unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+ kfree(ppage);
+ return 0;
+}
+
/**
* gzvm_handle_page_fault() - Handle guest page fault, find corresponding page
* for the faulting gpa
@@ -118,7 +190,10 @@ int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn,
*/
int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu)
{
+ struct gzvm_pinned_page *ppage = NULL;
struct gzvm *vm = vcpu->gzvm;
+ struct page *page = NULL;
+ unsigned long hva;
u64 pfn, gfn;
int memslot_id;
int ret;
@@ -136,5 +211,19 @@ int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu)
if (unlikely(ret))
return -EFAULT;

+ hva = gzvm_gfn_to_hva_memslot(&vm->memslot[memslot_id], gfn);
+ pin_one_page(hva, &page);
+
+ if (!page)
+ return -EFAULT;
+
+ ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
+ if (!ppage)
+ return -ENOMEM;
+
+ ppage->page = page;
+ ppage->ipa = vcpu->run->exception.fault_gpa;
+ gzvm_insert_ppage(vm, ppage);
+
return 0;
}
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index 0c62fe7f2c37..c3d84644c3eb 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -36,8 +36,7 @@ static long gzvm_vcpu_update_one_reg(struct gzvm_vcpu *vcpu,
return -EINVAL;

if (is_write) {
- if (vcpu->vcpuid > 0)
- return -EPERM;
+ /* the GZ hypervisor will filter out invalid vcpu register accesses */
if (copy_from_user(&data, reg_addr, reg_size))
return -EFAULT;
} else {
@@ -119,7 +118,9 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void * __user argp)
need_userspace = true;
break;
case GZVM_EXIT_HYPERCALL:
- fallthrough;
+ if (!gzvm_handle_guest_hvc(vcpu))
+ need_userspace = true;
+ break;
case GZVM_EXIT_DEBUG:
fallthrough;
case GZVM_EXIT_FAIL_ENTRY:
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 9f7e44521de5..747e5cf523bf 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -292,6 +292,21 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
return ret;
}

+static void gzvm_destroy_ppage(struct gzvm *gzvm)
+{
+ struct gzvm_pinned_page *ppage;
+ struct rb_node *node;
+
+ node = rb_first(&gzvm->pinned_pages);
+ while (node) {
+ ppage = rb_entry(node, struct gzvm_pinned_page, node);
+ unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+ node = rb_next(node);
+ rb_erase(&ppage->node, &gzvm->pinned_pages);
+ kfree(ppage);
+ }
+}
+
static void gzvm_destroy_vm(struct gzvm *gzvm)
{
pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);
@@ -308,6 +323,8 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)

mutex_unlock(&gzvm->lock);

+ gzvm_destroy_ppage(gzvm);
+
kfree(gzvm);
}

@@ -343,6 +360,7 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
gzvm->vm_id = ret;
gzvm->mm = current->mm;
mutex_init(&gzvm->lock);
+ gzvm->pinned_pages = RB_ROOT;

ret = gzvm_vm_irqfd_init(gzvm);
if (ret) {
diff --git a/include/linux/gzvm_drv.h b/include/linux/gzvm_drv.h
index b9e60fe5dcde..77c839c02b17 100644
--- a/include/linux/gzvm_drv.h
+++ b/include/linux/gzvm_drv.h
@@ -12,6 +12,8 @@
#include <linux/mutex.h>
#include <linux/gzvm.h>
#include <linux/srcu.h>
+#include <linux/rbtree.h>
+#include <linux/mm.h>

/*
* For the normal physical address, the highest 12 bits should be zero, so we
@@ -80,6 +82,12 @@ struct gzvm_vcpu {
struct gzvm_vcpu_hwstate *hwstate;
};

+struct gzvm_pinned_page {
+ struct rb_node node;
+ struct page *page;
+ u64 ipa;
+};
+
struct gzvm {
struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
/* userspace tied to this vm */
@@ -106,6 +114,9 @@ struct gzvm {
struct srcu_struct irq_srcu;
/* lock for irq injection */
struct mutex irq_lock;
+
+ /* Use an rb-tree to record pinned pages */
+ struct rb_root pinned_pages;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -147,6 +158,8 @@ int gzvm_arch_inform_exit(u16 vm_id);
int gzvm_find_memslot(struct gzvm *vm, u64 gpa);
int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu);
bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu);
+bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu);
+int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa);

int gzvm_arch_create_device(u16 vm_id, struct gzvm_create_device *gzvm_dev);
int gzvm_arch_inject_irq(struct gzvm *gzvm, unsigned int vcpu_idx,
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index 1f134c55ac2a..edac8e3ae790 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -191,6 +191,11 @@ enum {
GZVM_EXCEPTION_PAGE_FAULT = 0x1,
};

+/* hypercall definitions of GZVM_EXIT_HYPERCALL */
+enum {
+ GZVM_HVC_MEM_RELINQUISH = 0xc6000009,
+};
+
/**
* struct gzvm_vcpu_run: Same purpose as kvm_run, this struct is
* shared between userspace, kernel and
--
2.18.0

2023-09-20 00:47:53

by Yi-De Wu

Subject: [PATCH v6 15/15] virt: geniezone: Add block-based demand paging support

From: "Yingshiuan Pan" <[email protected]>

To balance memory usage and performance, GenieZone supports demand paging at a
larger granularity, called block-based demand paging. The gzvm driver uses
enable_cap to query whether the hypervisor supports block-based demand paging
at the given granularity. Meanwhile, the gzvm driver allocates a shared buffer
that will later hold the physical page numbers handed to the hypervisor.

If the hypervisor supports it, every time the gzvm driver handles a guest page
fault, it allocates more memory in advance (default: 2MB) for demand paging,
fills those physical pages into the allocated shared memory, and then calls
the hypervisor to map them into the guest's memory.

The physical pages allocated for block-based demand paging do not need to be
contiguous, because in many cases the 2MB block granularity cannot be honored.
First, memory may be allocated because of the VMM's page fault (the VMM loads
the kernel image into guest memory before running); in this case, the pages
are allocated by the host kernel at PAGE_SIZE granularity. Second, the guest
may return memory to the host via ballooning, which also operates at 4KB (or
PAGE_SIZE) granularity. Therefore, we do not have to allocate physically
contiguous 2MB pages.
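
A quick sizing note for the shared buffer used below, assuming a 4KB
PAGE_SIZE (the constant comes from this patch; the arithmetic is only for
illustration):

	/* 2MB block / 4KB page = 512 pfn entries filled per guest fault */
	u32 nr_entries = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE / PAGE_SIZE;
	/* 512 entries * sizeof(u64) = 4096 bytes, so the pfn "mailbox"
	 * shared with the hypervisor fits in exactly one host page */
	u32 buf_size = nr_entries * sizeof(u64);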

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/gzvm_arch_common.h | 2 +
arch/arm64/geniezone/vm.c | 19 +++-
drivers/virt/geniezone/gzvm_mmu.c | 110 +++++++++++++++++-------
drivers/virt/geniezone/gzvm_vm.c | 49 +++++++++++
include/linux/gzvm_drv.h | 16 ++++
include/uapi/linux/gzvm.h | 2 +
6 files changed, 164 insertions(+), 34 deletions(-)

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index c21eb8ca8d4b..f57f799bbbc9 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -25,6 +25,7 @@ enum {
GZVM_FUNC_MEMREGION_PURPOSE = 15,
GZVM_FUNC_SET_DTB_CONFIG = 16,
GZVM_FUNC_MAP_GUEST = 17,
+ GZVM_FUNC_MAP_GUEST_BLOCK = 18,
NR_GZVM_FUNC,
};

@@ -50,6 +51,7 @@ enum {
#define MT_HVC_GZVM_MEMREGION_PURPOSE GZVM_HCALL_ID(GZVM_FUNC_MEMREGION_PURPOSE)
#define MT_HVC_GZVM_SET_DTB_CONFIG GZVM_HCALL_ID(GZVM_FUNC_SET_DTB_CONFIG)
#define MT_HVC_GZVM_MAP_GUEST GZVM_HCALL_ID(GZVM_FUNC_MAP_GUEST)
+#define MT_HVC_GZVM_MAP_GUEST_BLOCK GZVM_HCALL_ID(GZVM_FUNC_MAP_GUEST_BLOCK)

#define GIC_V3_NR_LRS 16

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index d236b6cf84b3..d1948045ed72 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -299,10 +299,11 @@ static int gzvm_vm_ioctl_cap_pvm(struct gzvm *gzvm,
fallthrough;
case GZVM_CAP_PVM_SET_PROTECTED_VM:
/*
- * To improve performance for protected VM, we have to populate VM's memory
- * before VM booting
+ * If the hypervisor doesn't support block-based demand paging, we
+ * populate memory in advance to improve performance for protected VM.
*/
- populate_mem_region(gzvm);
+ if (gzvm->demand_page_gran == PAGE_SIZE)
+ populate_mem_region(gzvm);
ret = gzvm_vm_arch_enable_cap(gzvm, cap, &res);
return ret;
case GZVM_CAP_PVM_GET_PVMFW_SIZE:
@@ -319,12 +320,16 @@ int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp)
{
+ struct arm_smccc_res res = {0};
int ret;

switch (cap->cap) {
case GZVM_CAP_PROTECTED_VM:
ret = gzvm_vm_ioctl_cap_pvm(gzvm, cap, argp);
return ret;
+ case GZVM_CAP_BLOCK_BASED_DEMAND_PAGING:
+ ret = gzvm_vm_arch_enable_cap(gzvm, cap, &res);
+ return ret;
default:
break;
}
@@ -365,3 +370,11 @@ int gzvm_arch_map_guest(u16 vm_id, int memslot_id, u64 pfn, u64 gfn,
return gzvm_hypcall_wrapper(MT_HVC_GZVM_MAP_GUEST, vm_id, memslot_id,
pfn, gfn, nr_pages, 0, 0, &res);
}
+
+int gzvm_arch_map_guest_block(u16 vm_id, int memslot_id, u64 gfn)
+{
+ struct arm_smccc_res res;
+
+ return gzvm_hypcall_wrapper(MT_HVC_GZVM_MAP_GUEST_BLOCK, vm_id,
+ memslot_id, gfn, 0, 0, 0, 0, &res);
+}
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
index 542599f104db..02b9d89d482d 100644
--- a/drivers/virt/geniezone/gzvm_mmu.c
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -140,15 +140,30 @@ static int gzvm_insert_ppage(struct gzvm *vm, struct gzvm_pinned_page *ppage)
return 0;
}

-static int pin_one_page(unsigned long hva, struct page **page)
+static int pin_one_page(struct gzvm *vm, unsigned long hva, u64 gpa)
{
- struct mm_struct *mm = current->mm;
unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+ struct gzvm_pinned_page *ppage = NULL;
+ struct mm_struct *mm = current->mm;
+ struct page *page = NULL;
+
+ ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
+ if (!ppage)
+ return -ENOMEM;

mmap_read_lock(mm);
- pin_user_pages(hva, 1, flags, page);
+ pin_user_pages(hva, 1, flags, &page);
mmap_read_unlock(mm);

+ if (!page) {
+ kfree(ppage);
+ return -EFAULT;
+ }
+
+ ppage->page = page;
+ ppage->ipa = gpa;
+ gzvm_insert_ppage(vm, ppage);
+
return 0;
}

@@ -179,6 +194,62 @@ int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa)
return 0;
}

+static int handle_block_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)
+{
+ u64 pfn, __gfn, start_gfn;
+ u32 nr_entries;
+ unsigned long hva;
+ int ret, i;
+
+ nr_entries = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE / PAGE_SIZE;
+ start_gfn = ALIGN_DOWN(gfn, nr_entries);
+
+ mutex_lock(&vm->demand_paging_lock);
+ for (i = 0, __gfn = start_gfn; i < nr_entries; i++, __gfn++) {
+ ret = gzvm_gfn_to_pfn_memslot(&vm->memslot[memslot_id], __gfn,
+ &pfn);
+ if (unlikely(ret)) {
+ ret = -EFAULT;
+ goto err_unlock;
+ }
+ vm->demand_page_buffer[i] = pfn;
+
+ hva = gzvm_gfn_to_hva_memslot(&vm->memslot[memslot_id], __gfn);
+ ret = pin_one_page(vm, hva, PFN_PHYS(__gfn));
+ if (ret)
+ goto err_unlock;
+ }
+
+ ret = gzvm_arch_map_guest_block(vm->vm_id, memslot_id, start_gfn);
+ if (unlikely(ret)) {
+ ret = -EFAULT;
+ goto err_unlock;
+ }
+
+err_unlock:
+ mutex_unlock(&vm->demand_paging_lock);
+
+ return ret;
+}
+
+static int handle_single_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)
+{
+ unsigned long hva;
+ int ret;
+ u64 pfn;
+
+ ret = gzvm_gfn_to_pfn_memslot(&vm->memslot[memslot_id], gfn, &pfn);
+ if (unlikely(ret))
+ return -EFAULT;
+
+ ret = gzvm_arch_map_guest(vm->vm_id, memslot_id, pfn, gfn, 1);
+ if (unlikely(ret))
+ return -EFAULT;
+
+ hva = gzvm_gfn_to_hva_memslot(&vm->memslot[memslot_id], gfn);
+ return pin_one_page(vm, hva, PFN_PHYS(gfn));
+}
+
/**
* gzvm_handle_page_fault() - Handle guest page fault, find corresponding page
* for the faulting gpa
@@ -190,40 +261,17 @@ int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa)
*/
int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu)
{
- struct gzvm_pinned_page *ppage = NULL;
struct gzvm *vm = vcpu->gzvm;
- struct page *page = NULL;
- unsigned long hva;
- u64 pfn, gfn;
int memslot_id;
- int ret;
+ u64 gfn;

gfn = PHYS_PFN(vcpu->run->exception.fault_gpa);
memslot_id = gzvm_find_memslot(vm, gfn);
if (unlikely(memslot_id < 0))
return -EFAULT;

- ret = gzvm_gfn_to_pfn_memslot(&vm->memslot[memslot_id], gfn, &pfn);
- if (unlikely(ret))
- return -EFAULT;
-
- ret = gzvm_arch_map_guest(vm->vm_id, memslot_id, pfn, gfn, 1);
- if (unlikely(ret))
- return -EFAULT;
-
- hva = gzvm_gfn_to_hva_memslot(&vm->memslot[memslot_id], gfn);
- pin_one_page(hva, &page);
-
- if (!page)
- return -EFAULT;
-
- ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
- if (!ppage)
- return -ENOMEM;
-
- ppage->page = page;
- ppage->ipa = vcpu->run->exception.fault_gpa;
- gzvm_insert_ppage(vm, ppage);
-
- return 0;
+ if (vm->demand_page_gran == PAGE_SIZE)
+ return handle_single_demand_page(vm, memslot_id, gfn);
+ else
+ return handle_block_demand_page(vm, memslot_id, gfn);
}
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 747e5cf523bf..a7d43bedfad0 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -309,6 +309,8 @@ static void gzvm_destroy_ppage(struct gzvm *gzvm)

static void gzvm_destroy_vm(struct gzvm *gzvm)
{
+ size_t allocated_size;
+
pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);

mutex_lock(&gzvm->lock);
@@ -321,6 +323,11 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)
list_del(&gzvm->vm_list);
mutex_unlock(&gzvm_list_lock);

+ if (gzvm->demand_page_buffer) {
+ allocated_size = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE / PAGE_SIZE * sizeof(u64);
+ free_pages_exact(gzvm->demand_page_buffer, allocated_size);
+ }
+
mutex_unlock(&gzvm->lock);

gzvm_destroy_ppage(gzvm);
@@ -342,6 +349,46 @@ static const struct file_operations gzvm_vm_fops = {
.llseek = noop_llseek,
};

+/**
+ * setup_vm_demand_paging() - Query the hypervisor for the supported demand
+ * paging size and configure it
+ * @vm: gzvm instance for setting up demand paging
+ *
+ * Return: void
+ */
+static void setup_vm_demand_paging(struct gzvm *vm)
+{
+ u32 buf_size = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE / PAGE_SIZE * sizeof(u64);
+ struct gzvm_enable_cap cap = {0};
+ void *buffer;
+ int ret;
+
+ mutex_init(&vm->demand_paging_lock);
+ buffer = alloc_pages_exact(buf_size, GFP_KERNEL);
+ if (!buffer) {
+ /* Fall back to the default page size for demand paging */
+ vm->demand_page_gran = PAGE_SIZE;
+ vm->demand_page_buffer = NULL;
+ return;
+ }
+
+ cap.cap = GZVM_CAP_BLOCK_BASED_DEMAND_PAGING;
+ cap.args[0] = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE;
+ cap.args[1] = (__u64)virt_to_phys(buffer);
+
+ ret = gzvm_vm_ioctl_enable_cap(vm, &cap, NULL);
+ if (ret == 0) {
+ vm->demand_page_gran = GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE;
+ /* freed when the VM is destroyed */
+ vm->demand_page_buffer = buffer;
+ } else {
+ vm->demand_page_gran = PAGE_SIZE;
+ vm->demand_page_buffer = NULL;
+ free_pages_exact(buffer, buf_size);
+ }
+}
+
static struct gzvm *gzvm_create_vm(unsigned long vm_type)
{
int ret;
@@ -376,6 +423,8 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
return ERR_PTR(ret);
}

+ setup_vm_demand_paging(gzvm);
+
mutex_lock(&gzvm_list_lock);
list_add(&gzvm->vm_list, &gzvm_list);
mutex_unlock(&gzvm_list_lock);
diff --git a/include/linux/gzvm_drv.h b/include/linux/gzvm_drv.h
index 77c839c02b17..a6994a1c391a 100644
--- a/include/linux/gzvm_drv.h
+++ b/include/linux/gzvm_drv.h
@@ -46,6 +46,8 @@

#define GZVM_VCPU_RUN_MAP_SIZE (PAGE_SIZE * 2)

+#define GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE (2 * 1024 * 1024) /* 2MB */
+
/* struct mem_region_addr_range - Identical to ffa memory constituent */
struct mem_region_addr_range {
/* the base IPA of the constituent memory region, aligned to 4 kiB */
@@ -117,6 +119,19 @@ struct gzvm {

/* Use rb-tree to record pin/unpin page */
struct rb_root pinned_pages;
+
+ /*
+ * demand page granularity: how much memory we allocate for the VM in a
+ * single page fault
+ */
+ u32 demand_page_gran;
+ /* the mailbox for transferring a large batch of pages */
+ u64 *demand_page_buffer;
+ /*
+ * lock to prevent multiple CPUs from using the same demand page
+ * mailbox at the same time
+ */
+ struct mutex demand_paging_lock;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -137,6 +152,7 @@ int gzvm_arch_create_vm(unsigned long vm_type);
int gzvm_arch_destroy_vm(u16 vm_id);
int gzvm_arch_map_guest(u16 vm_id, int memslot_id, u64 pfn, u64 gfn,
u64 nr_pages);
+int gzvm_arch_map_guest_block(u16 vm_id, int memslot_id, u64 gfn);
int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp);
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index edac8e3ae790..a98787a18c36 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -18,6 +18,8 @@

#define GZVM_CAP_VM_GPA_SIZE 0xa5
#define GZVM_CAP_PROTECTED_VM 0xffbadab1
+/* query whether the hypervisor supports block-based demand paging */
+#define GZVM_CAP_BLOCK_BASED_DEMAND_PAGING 0x9201

/* sub-commands put in args[0] for GZVM_CAP_PROTECTED_VM */
#define GZVM_CAP_PVM_SET_PVMFW_GPA 0
--
2.18.0