2024-04-12 07:00:41

by Yi-De Wu

Subject: [PATCH v10 00/21] GenieZone hypervisor drivers

This series is based on linux-next, tag: next-20240411.

GenieZone hypervisor (gzvm) is a type-1 hypervisor that supports various virtual
machine types and provides security features such as TEE-like scenarios and
secure boot. It can create guest VMs for security use cases and has
virtualization capabilities for both platform and interrupt. Although the
hypervisor can be booted independently, it requires the assistance of the
GenieZone hypervisor kernel driver (gzvm-ko) to leverage the Linux kernel for
vCPU scheduling, memory management, inter-VM communication and virtio backend
support.

Changes in v10:
- Optimize memory allocation: query hypervisor demand paging capability before
VM memory population.
- Fix goto usage in `gzvm_vcpu.c` as requested by the ACK reviewer.
- Fix coding style per reviewer suggestions and checking tools.

Changes in v9:
https://lore.kernel.org/all/[email protected]/
- Add gzvm_vm_allocate_guest_page function for demand paging support and
protected VM memory performance optimization.
- Fix coding style per reviewer suggestions and checking tools.

Changes in v8:
https://lore.kernel.org/all/[email protected]/
- Add reasons for using dt solution in dt-bindings.
- Add locks for memory pin/unpin and relinquish operations.
- Add VM memory stats in debugfs.
- Add tracing support for hypercall and vcpu exit reasons.
- Enable PTP for timing synchronization between host and guests.
- Optimize memory performance for protected VMs.
- Refactor wording and titles in documentation.

Changes in v7:
https://lore.kernel.org/all/[email protected]/
- Rebase these patches to the Linux 6.7-rc1 release.
- Refactor patches 1 to 15 to improve coding style while preserving the
majority of the changes made in v6.
- Provide individual VM memory statistics within debugfs in patch 16.
- Add tracing support for hypercall and vcpu exit_reason.

Changes in v6:
https://lore.kernel.org/all/[email protected]/
- Rebase based on kernel 6.6-rc1
- Keep dt solution and leave the reasons in the commit message
- Remove arch/arm64/include/uapi/asm/gzvm_arch.h for simplicity
- Remove resampler in drivers/virt/geniezone/gzvm_irqfd.c, as the feature is
dropped for now
- Remove PPI in arch/arm64/geniezone/vgic.c
- Refactor vm related components into 3 smaller patches, namely adding vm
support, setting user memory region and checking vm capability
- Refactor vcpu and vm component to remove unnecessary ARM prefix
- Add demand paging to fix crash on destroying memory page, accelerate booting
and support ballooning deflate
- Add memory pin/unpin mechanism to support protected VM
- Add block-based demand paging for performance concern
- Respond to reviewers and fix coding style accordingly


Changes in v5:
https://lore.kernel.org/all/[email protected]/
- Add dt solution back for device initialization
- Add GZVM_EXIT_GZ reason for gzvm_vcpu_run()
- Add patch for guest page fault handler
- Add patch for supporting pin/unpin memory
- Remove unused enum members, namely GZVM_FUNC_GET_REGS and GZVM_FUNC_SET_REGS
- Use dev_debug() for debugging when platform device is available, and use
pr_debug() otherwise
- Respond to reviewers and fix bugs accordingly


Changes in v4:
https://lore.kernel.org/all/[email protected]/
- Add macro to set VM as protected without triggering pvmfw in AVF.
- Add support to pass dtb config to hypervisor.
- Add support for virtual timer.
- Add UAPI to pass memory region metadata to hypervisor.
- Define our own macros for ARM's interrupt number
- Elaborate more on GenieZone hypervisor in documentation
- Fix coding style.
- Implement our own module for converting ipa to pa
- Modify the way of initializing device from dt to a more discoverable way
- Move refactoring changes into independent patches.

Changes in v3:
https://lore.kernel.org/all/[email protected]/
- Refactor: separate arch/arm64/geniezone/gzvm_arch.c into vm.c/vcpu.c/vgic.c
- Remove redundant functions
- Fix reviewer's comments

Changes in v2:
https://lore.kernel.org/all/[email protected]/
- Refactor: move to drivers/virt/geniezone
- Refactor: decouple arch-dependent and arch-independent
- Check pending signal before entering guest context
- Fix reviewer's comments

Initial Commit in v1:
https://lore.kernel.org/all/[email protected]/


Yi-De Wu (21):
virt: geniezone: enable gzvm-ko in defconfig
docs: geniezone: Introduce GenieZone hypervisor
dt-bindings: hypervisor: Add MediaTek GenieZone hypervisor
virt: geniezone: Add GenieZone hypervisor driver
virt: geniezone: Add vm support
virt: geniezone: Add set_user_memory_region for vm
virt: geniezone: Add vm capability check
virt: geniezone: Optimize performance of protected VM memory
virt: geniezone: Add vcpu support
virt: geniezone: Add irqchip support for virtual interrupt injection
virt: geniezone: Add irqfd support
virt: geniezone: Add ioeventfd support
virt: geniezone: Add memory region support
virt: geniezone: Add dtb config support
virt: geniezone: Add demand paging support
virt: geniezone: Add block-based demand paging support
virt: geniezone: Add memory pin/unpin support
virt: geniezone: Add memory relinquish support
virt: geniezone: Provide individual VM memory statistics within
debugfs
virt: geniezone: Add tracing support for hyp call and vcpu exit_reason
virt: geniezone: Enable PTP for synchronizing time between host and
guest VMs

.../hypervisor/mediatek,geniezone-hyp.yaml | 31 +
Documentation/virt/geniezone/introduction.rst | 87 +++
Documentation/virt/index.rst | 1 +
MAINTAINERS | 11 +
arch/arm64/Kbuild | 1 +
arch/arm64/configs/defconfig | 2 +
arch/arm64/geniezone/Makefile | 9 +
arch/arm64/geniezone/gzvm_arch_common.h | 105 +++
arch/arm64/geniezone/hvc.c | 73 ++
arch/arm64/geniezone/vcpu.c | 80 +++
arch/arm64/geniezone/vgic.c | 50 ++
arch/arm64/geniezone/vm.c | 450 +++++++++++++
drivers/virt/Kconfig | 2 +
drivers/virt/geniezone/Kconfig | 16 +
drivers/virt/geniezone/Makefile | 12 +
drivers/virt/geniezone/gzvm_common.h | 12 +
drivers/virt/geniezone/gzvm_exception.c | 61 ++
drivers/virt/geniezone/gzvm_ioeventfd.c | 276 ++++++++
drivers/virt/geniezone/gzvm_irqfd.c | 382 +++++++++++
drivers/virt/geniezone/gzvm_main.c | 149 +++++
drivers/virt/geniezone/gzvm_mmu.c | 330 +++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 282 ++++++++
drivers/virt/geniezone/gzvm_vm.c | 633 ++++++++++++++++++
include/linux/soc/mediatek/gzvm_drv.h | 252 +++++++
include/trace/events/geniezone.h | 84 +++
include/uapi/linux/gzvm.h | 402 +++++++++++
26 files changed, 3793 insertions(+)
create mode 100644 Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml
create mode 100644 Documentation/virt/geniezone/introduction.rst
create mode 100644 arch/arm64/geniezone/Makefile
create mode 100644 arch/arm64/geniezone/gzvm_arch_common.h
create mode 100644 arch/arm64/geniezone/hvc.c
create mode 100644 arch/arm64/geniezone/vcpu.c
create mode 100644 arch/arm64/geniezone/vgic.c
create mode 100644 arch/arm64/geniezone/vm.c
create mode 100644 drivers/virt/geniezone/Kconfig
create mode 100644 drivers/virt/geniezone/Makefile
create mode 100644 drivers/virt/geniezone/gzvm_common.h
create mode 100644 drivers/virt/geniezone/gzvm_exception.c
create mode 100644 drivers/virt/geniezone/gzvm_ioeventfd.c
create mode 100644 drivers/virt/geniezone/gzvm_irqfd.c
create mode 100644 drivers/virt/geniezone/gzvm_main.c
create mode 100644 drivers/virt/geniezone/gzvm_mmu.c
create mode 100644 drivers/virt/geniezone/gzvm_vcpu.c
create mode 100644 drivers/virt/geniezone/gzvm_vm.c
create mode 100644 include/linux/soc/mediatek/gzvm_drv.h
create mode 100644 include/trace/events/geniezone.h
create mode 100644 include/uapi/linux/gzvm.h

--
2.18.0



2024-04-12 07:01:12

by Yi-De Wu

Subject: [PATCH v10 03/21] dt-bindings: hypervisor: Add MediaTek GenieZone hypervisor

From: "Yingshiuan Pan" <[email protected]>

Add documentation for the GenieZone (gzvm) node. This node tells the gzvm
driver to start probing whether the GenieZone hypervisor is available and
able to perform virtual machine operations.

[Reason to use dt solution]
- The GenieZone hypervisor serves as a vendor model for facilitating
platform virtualization, with an implementation that is independent
from Linuxism.
- In contrast to the dt solution, our previous approach involved probing
via hypercall to determine the existence of our hypervisor. However, this
method raised concerns about potentially impacting all systems, including
those without the GenieZone hypervisor embedded[ref].

Link: https://lore.kernel.org/all/[email protected]/

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
.../hypervisor/mediatek,geniezone-hyp.yaml | 31 +++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 32 insertions(+)
create mode 100644 Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml

diff --git a/Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml b/Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml
new file mode 100644
index 000000000000..ab89a4c310cb
--- /dev/null
+++ b/Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml
@@ -0,0 +1,31 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/hypervisor/mediatek,geniezone-hyp.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MediaTek GenieZone hypervisor
+
+maintainers:
+ - Yingshiuan Pan <[email protected]>
+
+description:
+ This interface is designed for integrating GenieZone hypervisor into Android
+ Virtualization Framework(AVF) along with Crosvm as a VMM.
+ It acts like a wrapper for every hypercall to GenieZone hypervisor in
+ order to control guest VM lifecycles and virtual interrupt injections.
+
+properties:
+ compatible:
+ const: mediatek,geniezone-hyp
+
+required:
+ - compatible
+
+additionalProperties: false
+
+examples:
+ - |
+ hypervisor {
+ compatible = "mediatek,geniezone-hyp";
+ };
diff --git a/MAINTAINERS b/MAINTAINERS
index 0cda103140b4..0d1e5d127929 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9180,6 +9180,7 @@ GENIEZONE HYPERVISOR DRIVER
M: Yingshiuan Pan <[email protected]>
M: Ze-Yu Wang <[email protected]>
M: Yi-De Wu <[email protected]>
+F: Documentation/devicetree/bindings/hypervisor/mediatek,geniezone-hyp.yaml
F: Documentation/virt/geniezone/

GENWQE (IBM Generic Workqueue Card)
--
2.18.0


2024-04-12 07:01:42

by Yi-De Wu

Subject: [PATCH v10 05/21] virt: geniezone: Add vm support

From: "Yingshiuan Pan" <[email protected]>

The VM component is responsible for setting up the capability and memory
management for the protected VMs. The capability is mainly about the
lifecycle control and boot context initialization.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
MAINTAINERS | 1 +
arch/arm64/geniezone/gzvm_arch_common.h | 4 +
arch/arm64/geniezone/vm.c | 27 ++++++
drivers/virt/geniezone/Makefile | 2 +-
drivers/virt/geniezone/gzvm_main.c | 16 ++++
drivers/virt/geniezone/gzvm_vm.c | 107 ++++++++++++++++++++++++
include/linux/soc/mediatek/gzvm_drv.h | 27 ++++++
include/uapi/linux/gzvm.h | 25 ++++++
8 files changed, 208 insertions(+), 1 deletion(-)
create mode 100644 drivers/virt/geniezone/gzvm_vm.c
create mode 100644 include/uapi/linux/gzvm.h
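
Note: a minimal userspace sketch of how a VMM might exercise the new ioctl.
It assumes the misc device appears as /dev/gzvm (as the UAPI comment below
suggests) and redefines the ioctl number locally so it builds without the new
header. The demo_create_vm() helper and the DEMO_ names are hypothetical and
not part of this patch; valid vm_type values are not defined here.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copies of the values this patch adds to include/uapi/linux/gzvm.h. */
#define DEMO_GZVM_IOC_MAGIC	0x92
#define DEMO_GZVM_CREATE_VM	_IO(DEMO_GZVM_IOC_MAGIC, 0x01)

/*
 * Hypothetical helper: open the gzvm device node and request a VM fd.
 * Returns the VM fd on success, or a negative errno value on failure.
 */
static int demo_create_vm(const char *dev_path, unsigned long vm_type)
{
	int dev_fd, vm_fd, err;

	dev_fd = open(dev_path, O_RDWR);
	if (dev_fd < 0)
		return -errno;

	/* vm_type is passed by value as the ioctl argument. */
	vm_fd = ioctl(dev_fd, DEMO_GZVM_CREATE_VM, vm_type);
	err = errno;

	/* Closed here only to keep the sketch short. */
	close(dev_fd);

	return vm_fd < 0 ? -err : vm_fd;
}
```

A caller would use it as demo_create_vm("/dev/gzvm", vm_type) and close the
returned fd to trigger gzvm_vm_release() and destroy the VM.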

diff --git a/MAINTAINERS b/MAINTAINERS
index 709ecfbbd691..e2a6f3afc6fa 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9185,6 +9185,7 @@ F: Documentation/virt/geniezone/
F: arch/arm64/geniezone/
F: drivers/virt/geniezone/
F: include/linux/soc/mediatek/gzvm_drv.h
+F: include/uapi/linux/gzvm.h

GENWQE (IBM Generic Workqueue Card)
M: Frank Haverkamp <[email protected]>
diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 660c7cf3fc18..60ee5ed2b39f 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -9,6 +9,8 @@
#include <linux/arm-smccc.h>

enum {
+ GZVM_FUNC_CREATE_VM = 0,
+ GZVM_FUNC_DESTROY_VM = 1,
GZVM_FUNC_PROBE = 12,
NR_GZVM_FUNC,
};
@@ -19,6 +21,8 @@ enum {
ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_64, \
SMC_ENTITY_MTK, (GZVM_FUNCID_START + (func)))

+#define MT_HVC_GZVM_CREATE_VM GZVM_HCALL_ID(GZVM_FUNC_CREATE_VM)
+#define MT_HVC_GZVM_DESTROY_VM GZVM_HCALL_ID(GZVM_FUNC_DESTROY_VM)
#define MT_HVC_GZVM_PROBE GZVM_HCALL_ID(GZVM_FUNC_PROBE)

/**
diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index dce933f0c122..8ee5490d604a 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -7,6 +7,7 @@
#include <linux/err.h>
#include <linux/uaccess.h>

+#include <linux/gzvm.h>
#include <linux/soc/mediatek/gzvm_drv.h>
#include "gzvm_arch_common.h"

@@ -61,3 +62,29 @@ int gzvm_arch_probe(void)

return 0;
}
+
+/**
+ * gzvm_arch_create_vm() - create vm
+ * @vm_type: VM type. Only supports Linux VM now.
+ *
+ * Return:
+ * * positive value - VM ID
+ * * -ENOMEM - Memory not enough for storing VM data
+ */
+int gzvm_arch_create_vm(unsigned long vm_type)
+{
+ struct arm_smccc_res res;
+ int ret;
+
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_CREATE_VM, vm_type, 0, 0, 0, 0,
+ 0, 0, &res);
+ return ret ? ret : res.a1;
+}
+
+int gzvm_arch_destroy_vm(u16 vm_id)
+{
+ struct arm_smccc_res res;
+
+ return gzvm_hypcall_wrapper(MT_HVC_GZVM_DESTROY_VM, vm_id, 0, 0, 0, 0,
+ 0, 0, &res);
+}
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index 3a82e5fddf90..25614ea3dea2 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -6,4 +6,4 @@

GZVM_DIR ?= ../../../drivers/virt/geniezone

-gzvm-y := $(GZVM_DIR)/gzvm_main.o
+gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o
diff --git a/drivers/virt/geniezone/gzvm_main.c b/drivers/virt/geniezone/gzvm_main.c
index 12efc3db516a..4b4e5a222a6e 100644
--- a/drivers/virt/geniezone/gzvm_main.c
+++ b/drivers/virt/geniezone/gzvm_main.c
@@ -4,6 +4,7 @@
*/

#include <linux/device.h>
+#include <linux/file.h>
#include <linux/kdev_t.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
@@ -40,7 +41,21 @@ int gzvm_err_to_errno(unsigned long err)
return -EINVAL;
}

+static long gzvm_dev_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long user_args)
+{
+ switch (cmd) {
+ case GZVM_CREATE_VM:
+ return gzvm_dev_ioctl_create_vm(user_args);
+ default:
+ break;
+ }
+
+ return -ENOTTY;
+}
+
static const struct file_operations gzvm_chardev_ops = {
+ .unlocked_ioctl = gzvm_dev_ioctl,
.llseek = noop_llseek,
};

@@ -62,6 +77,7 @@ static int gzvm_drv_probe(struct platform_device *pdev)

static int gzvm_drv_remove(struct platform_device *pdev)
{
+ gzvm_destroy_all_vms();
misc_deregister(&gzvm_dev);
return 0;
}
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
new file mode 100644
index 000000000000..76722dba6b1f
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/kdev_t.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+
+static DEFINE_MUTEX(gzvm_list_lock);
+static LIST_HEAD(gzvm_list);
+
+static void gzvm_destroy_vm(struct gzvm *gzvm)
+{
+ pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);
+
+ mutex_lock(&gzvm->lock);
+
+ gzvm_arch_destroy_vm(gzvm->vm_id);
+
+ mutex_lock(&gzvm_list_lock);
+ list_del(&gzvm->vm_list);
+ mutex_unlock(&gzvm_list_lock);
+
+ mutex_unlock(&gzvm->lock);
+
+ kfree(gzvm);
+}
+
+static int gzvm_vm_release(struct inode *inode, struct file *filp)
+{
+ struct gzvm *gzvm = filp->private_data;
+
+ gzvm_destroy_vm(gzvm);
+ return 0;
+}
+
+static const struct file_operations gzvm_vm_fops = {
+ .release = gzvm_vm_release,
+ .llseek = noop_llseek,
+};
+
+static struct gzvm *gzvm_create_vm(unsigned long vm_type)
+{
+ int ret;
+ struct gzvm *gzvm;
+
+ gzvm = kzalloc(sizeof(*gzvm), GFP_KERNEL);
+ if (!gzvm)
+ return ERR_PTR(-ENOMEM);
+
+ ret = gzvm_arch_create_vm(vm_type);
+ if (ret < 0) {
+ kfree(gzvm);
+ return ERR_PTR(ret);
+ }
+
+ gzvm->vm_id = ret;
+ gzvm->mm = current->mm;
+ mutex_init(&gzvm->lock);
+
+ mutex_lock(&gzvm_list_lock);
+ list_add(&gzvm->vm_list, &gzvm_list);
+ mutex_unlock(&gzvm_list_lock);
+
+ pr_debug("VM-%u is created\n", gzvm->vm_id);
+
+ return gzvm;
+}
+
+/**
+ * gzvm_dev_ioctl_create_vm - Create vm fd
+ * @vm_type: VM type. Only supports Linux VM now.
+ *
+ * Return: fd of vm, negative if error
+ */
+int gzvm_dev_ioctl_create_vm(unsigned long vm_type)
+{
+ struct gzvm *gzvm;
+
+ gzvm = gzvm_create_vm(vm_type);
+ if (IS_ERR(gzvm))
+ return PTR_ERR(gzvm);
+
+ return anon_inode_getfd("gzvm-vm", &gzvm_vm_fops, gzvm,
+ O_RDWR | O_CLOEXEC);
+}
+
+void gzvm_destroy_all_vms(void)
+{
+ struct gzvm *gzvm, *tmp;
+
+ mutex_lock(&gzvm_list_lock);
+ if (list_empty(&gzvm_list))
+ goto out;
+
+ list_for_each_entry_safe(gzvm, tmp, &gzvm_list, vm_list)
+ gzvm_destroy_vm(gzvm);
+
+out:
+ mutex_unlock(&gzvm_list_lock);
+}
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 907f2f984de9..e7c29c826a7c 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -6,6 +6,12 @@
#ifndef __GZVM_DRV_H__
#define __GZVM_DRV_H__

+#include <linux/list.h>
+#include <linux/mutex.h>
+#include <linux/gzvm.h>
+
+#define INVALID_VM_ID 0xffff
+
/*
* These are the definitions of APIs between GenieZone hypervisor and driver,
* there's no need to be visible to uapi. Furthermore, we need GenieZone
@@ -17,9 +23,30 @@
#define ERR_NOT_IMPLEMENTED (-27)
#define ERR_FAULT (-40)

+/**
+ * struct gzvm: the following data structures are for data transferring between
+ * driver and hypervisor, and they're aligned with hypervisor definitions.
+ * @mm: userspace tied to this vm
+ * @lock: lock for list_add
+ * @vm_list: list head for vm list
+ * @vm_id: vm id
+ */
+struct gzvm {
+ struct mm_struct *mm;
+ struct mutex lock;
+ struct list_head vm_list;
+ u16 vm_id;
+};
+
+int gzvm_dev_ioctl_create_vm(unsigned long vm_type);
+
int gzvm_err_to_errno(unsigned long err);

+void gzvm_destroy_all_vms(void);
+
/* arch-dependant functions */
int gzvm_arch_probe(void);
+int gzvm_arch_create_vm(unsigned long vm_type);
+int gzvm_arch_destroy_vm(u16 vm_id);

#endif /* __GZVM_DRV_H__ */
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
new file mode 100644
index 000000000000..c26c7720fab7
--- /dev/null
+++ b/include/uapi/linux/gzvm.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+/**
+ * DOC: UAPI of GenieZone Hypervisor
+ *
+ * This file declares common data structure shared among user space,
+ * kernel space, and GenieZone hypervisor.
+ */
+#ifndef __GZVM_H__
+#define __GZVM_H__
+
+#include <linux/const.h>
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+/* GZVM ioctls */
+#define GZVM_IOC_MAGIC 0x92 /* gz */
+
+/* ioctls for /dev/gzvm fds */
+#define GZVM_CREATE_VM _IO(GZVM_IOC_MAGIC, 0x01) /* Returns a Geniezone VM fd */
+
+#endif /* __GZVM_H__ */
--
2.18.0


2024-04-12 07:01:48

by Yi-De Wu

Subject: [PATCH v10 12/21] virt: geniezone: Add ioeventfd support

From: "Yingshiuan Pan" <[email protected]>

Ioeventfd leverages eventfd to provide an asynchronous notification
mechanism for the VMM. The VMM can register an MMIO address and bind it to
an eventfd. Once an MMIO write trap occurs on this registered region, the
corresponding eventfd is signaled.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
drivers/virt/geniezone/Makefile | 3 +-
drivers/virt/geniezone/gzvm_ioeventfd.c | 276 ++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 27 ++-
drivers/virt/geniezone/gzvm_vm.c | 17 ++
include/linux/soc/mediatek/gzvm_drv.h | 13 ++
include/uapi/linux/gzvm.h | 25 +++
6 files changed, 359 insertions(+), 2 deletions(-)
create mode 100644 drivers/virt/geniezone/gzvm_ioeventfd.c
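
Note: a minimal userspace sketch of building a GZVM_IOEVENTFD request and
pre-checking it with the same rules as gzvm_ioeventfd_check_valid() below.
The struct and flag values are local mirrors of the UAPI added by this patch,
so the sketch builds without the new header. All demo_/DEMO_ names are
hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Local mirrors of the flag bits added to include/uapi/linux/gzvm.h. */
#define DEMO_IOEVENTFD_FLAG_DATAMATCH	(1u << 0)
#define DEMO_IOEVENTFD_FLAG_PIO		(1u << 1)
#define DEMO_IOEVENTFD_FLAG_DEASSIGN	(1u << 2)
#define DEMO_IOEVENTFD_VALID_FLAG_MASK	((1u << 3) - 1)

/* Local mirror of struct gzvm_ioeventfd (64 bytes including padding). */
struct demo_ioeventfd {
	uint64_t datamatch;
	uint64_t addr;
	uint32_t len;
	int32_t fd;
	uint32_t flags;
	uint8_t pad[36];
};

/*
 * Hypothetical helper: build an assign request. Pass datamatch == NULL
 * for a wildcard registration that fires on any written value.
 */
static struct demo_ioeventfd demo_ioeventfd_req(uint64_t addr, uint32_t len,
						int fd,
						const uint64_t *datamatch)
{
	struct demo_ioeventfd req;

	memset(&req, 0, sizeof(req));
	req.addr = addr;
	req.len = len;
	req.fd = fd;
	if (datamatch) {
		req.datamatch = *datamatch;
		req.flags |= DEMO_IOEVENTFD_FLAG_DATAMATCH;
	}
	return req;
}

/* Same checks, in the same order, as gzvm_ioeventfd_check_valid(). */
static bool demo_ioeventfd_valid(const struct demo_ioeventfd *req)
{
	switch (req->len) {
	case 0: case 1: case 2: case 4: case 8:
		break;
	default:
		return false;
	}
	if (req->addr + req->len < req->addr)	/* range overflow */
		return false;
	if (req->flags & ~DEMO_IOEVENTFD_VALID_FLAG_MASK)
		return false;
	if (!req->len && (req->flags & DEMO_IOEVENTFD_FLAG_DATAMATCH))
		return false;
	if (req->flags & DEMO_IOEVENTFD_FLAG_PIO)	/* PIO unsupported */
		return false;
	return true;
}
```

With the real header, the VMM would pass an equivalent struct gzvm_ioeventfd
to ioctl(vm_fd, GZVM_IOEVENTFD, &req), where fd comes from eventfd(2).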

diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index cebe5ad53f41..9956f4891df2 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -8,4 +8,5 @@ GZVM_DIR ?= ../../../drivers/virt/geniezone

gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
$(GZVM_DIR)/gzvm_mmu.o $(GZVM_DIR)/gzvm_vcpu.o \
- $(GZVM_DIR)/gzvm_irqfd.o
+ $(GZVM_DIR)/gzvm_irqfd.o $(GZVM_DIR)/gzvm_ioeventfd.o
+
diff --git a/drivers/virt/geniezone/gzvm_ioeventfd.c b/drivers/virt/geniezone/gzvm_ioeventfd.c
new file mode 100644
index 000000000000..f751b3fa6171
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_ioeventfd.c
@@ -0,0 +1,276 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/syscalls.h>
+#include <linux/gzvm.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+#include <linux/wait.h>
+#include <linux/poll.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+struct gzvm_ioevent {
+ struct list_head list;
+ __u64 addr;
+ __u32 len;
+ struct eventfd_ctx *evt_ctx;
+ __u64 datamatch;
+ bool wildcard;
+};
+
+/**
+ * ioeventfd_check_collision() - Check collision, assumes gzvm->slots_lock held.
+ * @gzvm: Pointer to gzvm.
+ * @p: Pointer to gzvm_ioevent.
+ *
+ * Return:
+ * * true - collision found
+ * * false - no collision
+ */
+static bool ioeventfd_check_collision(struct gzvm *gzvm, struct gzvm_ioevent *p)
+{
+ struct gzvm_ioevent *_p;
+
+ list_for_each_entry(_p, &gzvm->ioevents, list) {
+ if (_p->addr == p->addr &&
+ (!_p->len || !p->len ||
+ (_p->len == p->len &&
+ (_p->wildcard || p->wildcard ||
+ _p->datamatch == p->datamatch))))
+ return true;
+ if (p->addr >= _p->addr && p->addr < _p->addr + _p->len)
+ return true;
+ }
+
+ return false;
+}
+
+static void gzvm_ioevent_release(struct gzvm_ioevent *p)
+{
+ eventfd_ctx_put(p->evt_ctx);
+ list_del(&p->list);
+ kfree(p);
+}
+
+static bool gzvm_ioevent_in_range(struct gzvm_ioevent *p, __u64 addr, int len,
+ const void *val)
+{
+ u64 _val;
+
+ if (addr != p->addr)
+ /* address must be precise for a hit */
+ return false;
+
+ if (!p->len)
+ /* length = 0 means only look at the address, so always a hit */
+ return true;
+
+ if (len != p->len)
+ /* address-range must be precise for a hit */
+ return false;
+
+ if (p->wildcard)
+ /* all else equal, wildcard is always a hit */
+ return true;
+
+ /* otherwise, we have to actually compare the data */
+
+ WARN_ON_ONCE(!IS_ALIGNED((unsigned long)val, len));
+
+ switch (len) {
+ case 1:
+ _val = *(u8 *)val;
+ break;
+ case 2:
+ _val = *(u16 *)val;
+ break;
+ case 4:
+ _val = *(u32 *)val;
+ break;
+ case 8:
+ _val = *(u64 *)val;
+ break;
+ default:
+ return false;
+ }
+
+ return _val == p->datamatch;
+}
+
+static int gzvm_deassign_ioeventfd(struct gzvm *gzvm,
+ struct gzvm_ioeventfd *args)
+{
+ struct gzvm_ioevent *p, *tmp;
+ struct eventfd_ctx *evt_ctx;
+ int ret = -ENOENT;
+ bool wildcard;
+
+ evt_ctx = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(evt_ctx))
+ return PTR_ERR(evt_ctx);
+
+ wildcard = !(args->flags & GZVM_IOEVENTFD_FLAG_DATAMATCH);
+
+ mutex_lock(&gzvm->lock);
+
+ list_for_each_entry_safe(p, tmp, &gzvm->ioevents, list) {
+ if (p->evt_ctx != evt_ctx ||
+ p->addr != args->addr ||
+ p->len != args->len ||
+ p->wildcard != wildcard)
+ continue;
+
+ if (!p->wildcard && p->datamatch != args->datamatch)
+ continue;
+
+ gzvm_ioevent_release(p);
+ ret = 0;
+ break;
+ }
+
+ mutex_unlock(&gzvm->lock);
+
+ /* got in the front of this function */
+ eventfd_ctx_put(evt_ctx);
+
+ return ret;
+}
+
+static int gzvm_assign_ioeventfd(struct gzvm *gzvm, struct gzvm_ioeventfd *args)
+{
+ struct eventfd_ctx *evt_ctx;
+ struct gzvm_ioevent *evt;
+ int ret;
+
+ evt_ctx = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(evt_ctx))
+ return PTR_ERR(evt_ctx);
+
+ evt = kmalloc(sizeof(*evt), GFP_KERNEL);
+ if (!evt)
+ return -ENOMEM;
+ *evt = (struct gzvm_ioevent) {
+ .addr = args->addr,
+ .len = args->len,
+ .evt_ctx = evt_ctx,
+ };
+ if (args->flags & GZVM_IOEVENTFD_FLAG_DATAMATCH) {
+ evt->datamatch = args->datamatch;
+ evt->wildcard = false;
+ } else {
+ evt->wildcard = true;
+ }
+
+ if (ioeventfd_check_collision(gzvm, evt)) {
+ ret = -EEXIST;
+ goto err_free;
+ }
+
+ mutex_lock(&gzvm->lock);
+ list_add_tail(&evt->list, &gzvm->ioevents);
+ mutex_unlock(&gzvm->lock);
+
+ return 0;
+
+err_free:
+ kfree(evt);
+ eventfd_ctx_put(evt_ctx);
+ return ret;
+}
+
+/**
+ * gzvm_ioeventfd_check_valid() - Check whether user arguments are valid.
+ * @args: Pointer to gzvm_ioeventfd.
+ *
+ * Return:
+ * * true if user arguments are valid.
+ * * false if user arguments are invalid.
+ */
+static bool gzvm_ioeventfd_check_valid(struct gzvm_ioeventfd *args)
+{
+ /* must be natural-word sized, or 0 to ignore length */
+ switch (args->len) {
+ case 0:
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ break;
+ default:
+ return false;
+ }
+
+ /* check for range overflow */
+ if (args->addr + args->len < args->addr)
+ return false;
+
+ /* check for extra flags that we don't understand */
+ if (args->flags & ~GZVM_IOEVENTFD_VALID_FLAG_MASK)
+ return false;
+
+ /* ioeventfd with no length can't be combined with DATAMATCH */
+ if (!args->len && (args->flags & GZVM_IOEVENTFD_FLAG_DATAMATCH))
+ return false;
+
+ /* gzvm does not support pio bus ioeventfd */
+ if (args->flags & GZVM_IOEVENTFD_FLAG_PIO)
+ return false;
+
+ return true;
+}
+
+/**
+ * gzvm_ioeventfd() - Register ioevent to ioevent list.
+ * @gzvm: Pointer to gzvm.
+ * @args: Pointer to gzvm_ioeventfd.
+ *
+ * Return:
+ * * 0 - Success.
+ * * Negative - Failure.
+ */
+int gzvm_ioeventfd(struct gzvm *gzvm, struct gzvm_ioeventfd *args)
+{
+ if (gzvm_ioeventfd_check_valid(args) == false)
+ return -EINVAL;
+
+ if (args->flags & GZVM_IOEVENTFD_FLAG_DEASSIGN)
+ return gzvm_deassign_ioeventfd(gzvm, args);
+ return gzvm_assign_ioeventfd(gzvm, args);
+}
+
+/**
+ * gzvm_ioevent_write() - Traverse this vm's registered ioeventfds to see if
+ * any of them needs to be notified.
+ * @vcpu: Pointer to vcpu.
+ * @addr: mmio address.
+ * @len: mmio size.
+ * @val: Pointer to void.
+ *
+ * Return:
+ * * true if this io is already sent to ioeventfd's listener.
+ * * false if we cannot find any ioeventfd registering this mmio write.
+ */
+bool gzvm_ioevent_write(struct gzvm_vcpu *vcpu, __u64 addr, int len,
+ const void *val)
+{
+ struct gzvm_ioevent *e;
+
+ list_for_each_entry(e, &vcpu->gzvm->ioevents, list) {
+ if (gzvm_ioevent_in_range(e, addr, len, val)) {
+ eventfd_signal(e->evt_ctx);
+ return true;
+ }
+ }
+ return false;
+}
+
+int gzvm_init_ioeventfd(struct gzvm *gzvm)
+{
+ INIT_LIST_HEAD(&gzvm->ioevents);
+
+ return 0;
+}
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index 1ac09bf5f2d8..388d25e1183b 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -51,6 +51,30 @@ static long gzvm_vcpu_update_one_reg(struct gzvm_vcpu *vcpu,
return 0;
}

+/**
+ * gzvm_vcpu_handle_mmio() - Handle mmio in kernel space.
+ * @vcpu: Pointer to vcpu.
+ *
+ * Return:
+ * * true - This mmio exit has been processed.
+ * * false - This mmio exit has not been processed, require userspace.
+ */
+static bool gzvm_vcpu_handle_mmio(struct gzvm_vcpu *vcpu)
+{
+ __u64 addr;
+ __u32 len;
+ const void *val_ptr;
+
+ /* So far, we don't have in-kernel mmio read handler */
+ if (!vcpu->run->mmio.is_write)
+ return false;
+ addr = vcpu->run->mmio.phys_addr;
+ len = vcpu->run->mmio.size;
+ val_ptr = &vcpu->run->mmio.data;
+
+ return gzvm_ioevent_write(vcpu, addr, len, val_ptr);
+}
+
/**
* gzvm_vcpu_run() - Handle vcpu run ioctl, entry point to guest and exit
* point from guest
@@ -82,7 +106,8 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)

switch (exit_reason) {
case GZVM_EXIT_MMIO:
- need_userspace = true;
+ if (!gzvm_vcpu_handle_mmio(vcpu))
+ need_userspace = true;
break;
/**
* it's geniezone's responsibility to fill corresponding data
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 77be1a22d767..dbd83e2358c9 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -227,6 +227,16 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
ret = gzvm_irqfd(gzvm, &data);
break;
}
+ case GZVM_IOEVENTFD: {
+ struct gzvm_ioeventfd data;
+
+ if (copy_from_user(&data, argp, sizeof(data))) {
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = gzvm_ioeventfd(gzvm, &data);
+ break;
+ }
case GZVM_ENABLE_CAP: {
struct gzvm_enable_cap cap;

@@ -303,6 +313,13 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
return ERR_PTR(ret);
}

+ ret = gzvm_init_ioeventfd(gzvm);
+ if (ret) {
+ pr_err("Failed to initialize ioeventfd\n");
+ kfree(gzvm);
+ return ERR_PTR(ret);
+ }
+
mutex_lock(&gzvm_list_lock);
list_add(&gzvm->vm_list, &gzvm_list);
mutex_unlock(&gzvm_list_lock);
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 0b02b5daa817..e459dfa681a4 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -6,6 +6,7 @@
#ifndef __GZVM_DRV_H__
#define __GZVM_DRV_H__

+#include <linux/eventfd.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mutex.h>
@@ -98,6 +99,7 @@ struct gzvm_vcpu {
* @memslot: VM's memory slot descriptor
* @lock: lock for list_add
* @irqfds: the data structure is used to keep irqfds's information
+ * @ioevents: list head for ioevents
* @vm_list: list head for vm list
* @vm_id: vm id
* @irq_ack_notifier_list: list head for irq ack notifier
@@ -117,6 +119,8 @@ struct gzvm {
struct mutex resampler_lock;
} irqfds;

+ struct list_head ioevents;
+
struct list_head vm_list;
u16 vm_id;

@@ -173,4 +177,13 @@ void gzvm_drv_irqfd_exit(void);
int gzvm_vm_irqfd_init(struct gzvm *gzvm);
void gzvm_vm_irqfd_release(struct gzvm *gzvm);

+int gzvm_init_ioeventfd(struct gzvm *gzvm);
+int gzvm_ioeventfd(struct gzvm *gzvm, struct gzvm_ioeventfd *args);
+bool gzvm_ioevent_write(struct gzvm_vcpu *vcpu, __u64 addr, int len,
+ const void *val);
+void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt);
+struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr);
+void add_wait_queue_priority(struct wait_queue_head *wq_head,
+ struct wait_queue_entry *wq_entry);
+
#endif /* __GZVM_DRV_H__ */
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index aa61ece00cac..6e102cbfec98 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -339,4 +339,29 @@ struct gzvm_irqfd {

#define GZVM_IRQFD _IOW(GZVM_IOC_MAGIC, 0x76, struct gzvm_irqfd)

+enum {
+ gzvm_ioeventfd_flag_nr_datamatch = 0,
+ gzvm_ioeventfd_flag_nr_pio = 1,
+ gzvm_ioeventfd_flag_nr_deassign = 2,
+ gzvm_ioeventfd_flag_nr_max,
+};
+
+#define GZVM_IOEVENTFD_FLAG_DATAMATCH (1 << gzvm_ioeventfd_flag_nr_datamatch)
+#define GZVM_IOEVENTFD_FLAG_PIO (1 << gzvm_ioeventfd_flag_nr_pio)
+#define GZVM_IOEVENTFD_FLAG_DEASSIGN (1 << gzvm_ioeventfd_flag_nr_deassign)
+#define GZVM_IOEVENTFD_VALID_FLAG_MASK ((1 << gzvm_ioeventfd_flag_nr_max) - 1)
+
+struct gzvm_ioeventfd {
+ __u64 datamatch;
+ /* private: legal pio/mmio address */
+ __u64 addr;
+ /* private: 1, 2, 4, or 8 bytes; or 0 to ignore length */
+ __u32 len;
+ __s32 fd;
+ __u32 flags;
+ __u8 pad[36];
+};
+
+#define GZVM_IOEVENTFD _IOW(GZVM_IOC_MAGIC, 0x79, struct gzvm_ioeventfd)
+
#endif /* __GZVM_H__ */
--
2.18.0


2024-04-12 07:03:48

by Yi-De Wu

Subject: [PATCH v10 20/21] virt: geniezone: Add tracing support for hyp call and vcpu exit_reason

Add tracepoints for hypervisor calls and vCPU exit reasons to the GenieZone
driver. They aid performance debugging by exposing more information
about hypervisor operations and vCPU behavior.

Command Usage:
echo geniezone:* >> /sys/kernel/tracing/set_event
echo 1 > /sys/kernel/tracing/tracing_on
echo 0 > /sys/kernel/tracing/tracing_on
cat /sys/kernel/tracing/trace

For example:
crosvm_vcpu0-4874 [007] ..... 94.757349: mtk_hypcall_enter: id=0xfb001005
crosvm_vcpu0-4874 [007] ..... 94.760902: mtk_hypcall_leave: id=0xfb001005 invalid=0
crosvm_vcpu0-4874 [007] ..... 94.760902: mtk_vcpu_exit: vcpu exit_reason=IRQ(0x92920003)

This example tracks a hypervisor call with ID `0xfb001005` from entry to
return; the call is supported (invalid=0). A vCPU exit is then triggered
by an interrupt request (exit reason: IRQ, 0x92920003).

/* VM exit reason */
enum {
GZVM_EXIT_UNKNOWN = 0x92920000,
GZVM_EXIT_MMIO = 0x92920001,
GZVM_EXIT_HYPERCALL = 0x92920002,
GZVM_EXIT_IRQ = 0x92920003,
GZVM_EXIT_EXCEPTION = 0x92920004,
GZVM_EXIT_DEBUG = 0x92920005,
GZVM_EXIT_FAIL_ENTRY = 0x92920006,
GZVM_EXIT_INTERNAL_ERROR = 0x92920007,
GZVM_EXIT_SYSTEM_EVENT = 0x92920008,
GZVM_EXIT_SHUTDOWN = 0x92920009,
GZVM_EXIT_GZ = 0x9292000a,
};
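
As a rough illustration of how the symbolic name in the trace output relates
to the values above, the userspace sketch below maps an exit reason to its
name. The helper gzvm_exit_reason_name() and the table are hypothetical and
are not part of this patch; in the kernel the mapping is done by
__print_symbolic() in the tracepoint definition.

```c
#include <stddef.h>

/* Exit reason values copied from the enum above. */
struct exit_reason_name {
	unsigned long value;
	const char *name;
};

static const struct exit_reason_name exit_reason_names[] = {
	{ 0x92920000UL, "UNKNOWN" },
	{ 0x92920001UL, "MMIO" },
	{ 0x92920002UL, "HYPERCALL" },
	{ 0x92920003UL, "IRQ" },
	{ 0x92920004UL, "EXCEPTION" },
	{ 0x92920005UL, "DEBUG" },
	{ 0x92920006UL, "FAIL_ENTRY" },
	{ 0x92920007UL, "INTERNAL_ERROR" },
	{ 0x92920008UL, "SYSTEM_EVENT" },
	{ 0x92920009UL, "SHUTDOWN" },
	{ 0x9292000aUL, "GZ" },
};

/* Hypothetical helper: return the symbolic name, or NULL if unknown. */
static const char *gzvm_exit_reason_name(unsigned long reason)
{
	size_t n = sizeof(exit_reason_names) / sizeof(exit_reason_names[0]);
	size_t i;

	for (i = 0; i < n; i++) {
		if (exit_reason_names[i].value == reason)
			return exit_reason_names[i].name;
	}
	return NULL;
}
```

With this table, the value 0x92920003 from the example trace resolves to
"IRQ", matching the exit_reason=IRQ(0x92920003) text in the trace line.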

Signed-off-by: Liju-clr Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/vm.c | 4 ++
drivers/virt/geniezone/gzvm_vcpu.c | 5 +-
include/trace/events/geniezone.h | 84 ++++++++++++++++++++++++++++++
3 files changed, 91 insertions(+), 2 deletions(-)
create mode 100644 include/trace/events/geniezone.h

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index a477546c5a1a..c00142f2e942 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -7,6 +7,8 @@
#include <linux/err.h>
#include <linux/uaccess.h>

+#define CREATE_TRACE_POINTS
+#include <trace/events/geniezone.h>
#include <linux/gzvm.h>
#include <linux/soc/mediatek/gzvm_drv.h>
#include "gzvm_arch_common.h"
@@ -44,11 +46,13 @@ int gzvm_hypcall_wrapper(unsigned long a0, unsigned long a1,
.a6 = a6,
.a7 = a7,
};
+ trace_mtk_hypcall_enter(a0);
arm_smccc_1_2_hvc(&args, &res_1_2);
res->a0 = res_1_2.a0;
res->a1 = res_1_2.a1;
res->a2 = res_1_2.a2;
res->a3 = res_1_2.a3;
+ trace_mtk_hypcall_leave(a0, (res->a0 != ERR_NOT_SUPPORTED) ? 0 : 1);

return gzvm_err_to_errno(res->a0);
}
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index e135d9388090..28bd690e4b7c 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -10,6 +10,8 @@
#include <linux/mm.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
+
+#include <trace/events/geniezone.h>
#include <linux/soc/mediatek/gzvm_drv.h>

/* maximum size needed for holding an integer */
@@ -103,6 +105,7 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)

while (!need_userspace && !signal_pending(current)) {
gzvm_arch_vcpu_run(vcpu, &exit_reason);
+ trace_mtk_vcpu_exit(exit_reason);

switch (exit_reason) {
case GZVM_EXIT_MMIO:
@@ -141,11 +144,9 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)
default:
pr_err("vcpu unknown exit\n");
need_userspace = true;
- goto out;
}
}

-out:
if (copy_to_user(argp, vcpu->run, sizeof(struct gzvm_vcpu_run)))
return -EFAULT;
if (signal_pending(current)) {
diff --git a/include/trace/events/geniezone.h b/include/trace/events/geniezone.h
new file mode 100644
index 000000000000..4fffd826ba67
--- /dev/null
+++ b/include/trace/events/geniezone.h
@@ -0,0 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM geniezone
+
+#define _TRACE_GENIEZONE_H
+
+#include <linux/gzvm.h>
+#include <linux/tracepoint.h>
+
+#define GZVM_EXIT_REASONS \
+EM(UNKNOWN)\
+EM(MMIO)\
+EM(HYPERCALL)\
+EM(IRQ)\
+EM(EXCEPTION)\
+EM(DEBUG)\
+EM(FAIL_ENTRY)\
+EM(INTERNAL_ERROR)\
+EM(SYSTEM_EVENT)\
+EM(SHUTDOWN)\
+EMe(GZ)
+
+#undef EM
+#undef EMe
+#define EM(a) TRACE_DEFINE_ENUM(GZVM_EXIT_##a);
+#define EMe(a) TRACE_DEFINE_ENUM(GZVM_EXIT_##a);
+
+GZVM_EXIT_REASONS
+
+#undef EM
+#undef EMe
+
+#define EM(a) { GZVM_EXIT_##a, #a },
+#define EMe(a) { GZVM_EXIT_##a, #a }
+
+TRACE_EVENT(mtk_hypcall_enter,
+ TP_PROTO(unsigned long id),
+
+ TP_ARGS(id),
+
+ TP_STRUCT__entry(__field(unsigned long, id)),
+
+ TP_fast_assign(__entry->id = id;),
+
+ TP_printk("id=0x%lx", __entry->id)
+);
+
+TRACE_EVENT(mtk_hypcall_leave,
+ TP_PROTO(unsigned long id, unsigned long invalid),
+
+ TP_ARGS(id, invalid),
+
+ TP_STRUCT__entry(__field(unsigned long, id)
+ __field(unsigned long, invalid)
+ ),
+
+ TP_fast_assign(__entry->id = id;
+ __entry->invalid = invalid;
+ ),
+
+ TP_printk("id=0x%lx invalid=%lu", __entry->id, __entry->invalid)
+);
+
+TRACE_EVENT(mtk_vcpu_exit,
+ TP_PROTO(unsigned long exit_reason),
+
+ TP_ARGS(exit_reason),
+
+ TP_STRUCT__entry(__field(unsigned long, exit_reason)),
+
+ TP_fast_assign(__entry->exit_reason = exit_reason;),
+
+ TP_printk("vcpu exit_reason=%s(0x%lx)",
+ __print_symbolic(__entry->exit_reason, GZVM_EXIT_REASONS),
+ __entry->exit_reason)
+
+);
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
--
2.18.0


2024-04-12 07:03:49

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 08/21] virt: geniezone: Optimize performance of protected VM memory

From: "Yingshiuan Pan" <[email protected]>

The memory protection mechanism performs better with batch operations on
memory pages. To leverage this, we pre-allocate memory for VMs that are
set to protected mode. The memory protection mechanism can then protect
the pre-allocated memory ahead of time through batch operations, which
improves performance during VM booting.
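
The batching idea can be sketched in userspace as follows: physically
contiguous page frames are coalesced into (address, page count) runs so
that one request can describe many pages. This is only a simplified model
of what fill_constituents() in this patch aims to do; struct sketch_range,
sketch_coalesce() and the 4 KiB page size are assumptions made for the
illustration, not driver code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct mem_region_addr_range. */
struct sketch_range {
	uint64_t address;	/* physical address of the first page in the run */
	uint64_t pg_cnt;	/* number of physically contiguous pages */
};

/*
 * Coalesce pfns[0..n) into runs of contiguous frames, stopping once out[]
 * is full. Returns the number of pages consumed and stores the number of
 * runs written in *out_cnt.
 */
static size_t sketch_coalesce(const uint64_t *pfns, size_t n,
			      struct sketch_range *out, size_t max_out,
			      size_t *out_cnt)
{
	size_t used = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		/* Extend the current run if this frame follows the previous one. */
		if (used > 0 && pfns[i] == pfns[i - 1] + 1) {
			out[used - 1].pg_cnt++;
			continue;
		}
		/* A new run is needed; stop if the buffer has no room left. */
		if (used == max_out)
			break;
		out[used].address = pfns[i] << SKETCH_PAGE_SHIFT;
		out[used].pg_cnt = 1;
		used++;
	}
	*out_cnt = used;
	return i;
}
```

A caller would loop, as gzvm_vm_populate_mem_region() does, handing each
filled buffer to the hypervisor and resuming from the first page that was
not consumed.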

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/vm.c | 154 ++++++++++++++++++++++++++
drivers/virt/geniezone/Makefile | 3 +-
drivers/virt/geniezone/gzvm_mmu.c | 117 +++++++++++++++++++
include/linux/soc/mediatek/gzvm_drv.h | 6 +
4 files changed, 279 insertions(+), 1 deletion(-)
create mode 100644 drivers/virt/geniezone/gzvm_mmu.c

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index 0030e57bf77b..642efa596112 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -11,6 +11,8 @@
#include <linux/soc/mediatek/gzvm_drv.h>
#include "gzvm_arch_common.h"

+#define PAR_PA47_MASK GENMASK_ULL(47, 12)
+
/**
* gzvm_hypcall_wrapper() - the wrapper for hvc calls
* @a0: arguments passed in registers 0
@@ -170,6 +172,128 @@ static int gzvm_vm_ioctl_get_pvmfw_size(struct gzvm *gzvm,
return 0;
}

+/**
+ * fill_constituents() - Populate pa to buffer until full
+ * @consti: Pointer to struct mem_region_addr_range.
+ * @consti_cnt: Constituent count.
+ * @max_nr_consti: Maximum number of constituent count.
+ * @gfn: Guest frame number.
+ * @total_pages: Total page numbers.
+ * @slot: Pointer to struct gzvm_memslot.
+ *
+ * Return: how many pages we've filled in, negative if error
+ */
+static int fill_constituents(struct mem_region_addr_range *consti,
+ int *consti_cnt, int max_nr_consti, u64 gfn,
+ u32 total_pages, struct gzvm_memslot *slot)
+{
+ u64 pfn = 0, prev_pfn = 0, gfn_end = 0;
+ int nr_pages = 0;
+ int i = -1;
+
+ if (unlikely(total_pages == 0))
+ return -EINVAL;
+ gfn_end = gfn + total_pages;
+
+ while (i < max_nr_consti && gfn < gfn_end) {
+ if (gzvm_vm_allocate_guest_page(slot, gfn, &pfn) != 0)
+ return -EFAULT;
+ if (pfn == (prev_pfn + 1)) {
+ consti[i].pg_cnt++;
+ } else {
+ i++;
+ if (i >= max_nr_consti)
+ break;
+ consti[i].address = PFN_PHYS(pfn);
+ consti[i].pg_cnt = 1;
+ }
+ prev_pfn = pfn;
+ gfn++;
+ nr_pages++;
+ }
+ if (i != max_nr_consti)
+ i++;
+ *consti_cnt = i;
+
+ return nr_pages;
+}
+
+/**
+ * gzvm_vm_populate_mem_region() - Iterate all mem slot and populate pa to
+ * buffer until it's full
+ * @gzvm: Pointer to struct gzvm.
+ * @slot_id: Memory slot id to be populated.
+ *
+ * Return: 0 if it is successful, negative if error
+ */
+int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id)
+{
+ struct gzvm_memslot *memslot = &gzvm->memslot[slot_id];
+ struct gzvm_memory_region_ranges *region;
+ int max_nr_consti, remain_pages;
+ u64 gfn, gfn_end;
+ u32 buf_size;
+
+ buf_size = PAGE_SIZE * 2;
+ region = alloc_pages_exact(buf_size, GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ max_nr_consti = (buf_size - sizeof(*region)) /
+ sizeof(struct mem_region_addr_range);
+
+ region->slot = memslot->slot_id;
+ remain_pages = memslot->npages;
+ gfn = memslot->base_gfn;
+ gfn_end = gfn + remain_pages;
+
+ while (gfn < gfn_end) {
+ int nr_pages;
+
+ nr_pages = fill_constituents(region->constituents,
+ &region->constituent_cnt,
+ max_nr_consti, gfn,
+ remain_pages, memslot);
+
+ if (nr_pages < 0) {
+ pr_err("Failed to fill constituents\n");
+ free_pages_exact(region, buf_size);
+ return -EFAULT;
+ }
+
+ region->gpa = PFN_PHYS(gfn);
+ region->total_pages = nr_pages;
+ remain_pages -= nr_pages;
+ gfn += nr_pages;
+
+ if (gzvm_arch_set_memregion(gzvm->vm_id, buf_size,
+ virt_to_phys(region))) {
+ pr_err("Failed to register memregion to hypervisor\n");
+ free_pages_exact(region, buf_size);
+ return -EFAULT;
+ }
+ }
+ free_pages_exact(region, buf_size);
+
+ return 0;
+}
+
+static int populate_all_mem_regions(struct gzvm *gzvm)
+{
+ int ret, i;
+
+ for (i = 0; i < GZVM_MAX_MEM_REGION; i++) {
+ if (gzvm->memslot[i].npages == 0)
+ continue;
+
+ ret = gzvm_vm_populate_mem_region(gzvm, i);
+ if (ret != 0)
+ return ret;
+ }
+
+ return 0;
+}
+
/**
* gzvm_vm_ioctl_cap_pvm() - Proceed GZVM_CAP_PROTECTED_VM's subcommands
* @gzvm: Pointer to struct gzvm.
@@ -191,6 +315,11 @@ static int gzvm_vm_ioctl_cap_pvm(struct gzvm *gzvm,
case GZVM_CAP_PVM_SET_PVMFW_GPA:
fallthrough;
case GZVM_CAP_PVM_SET_PROTECTED_VM:
+ /*
+ * To improve performance for protected VM, we have to populate VM's memory
+ * before VM booting
+ */
+ populate_all_mem_regions(gzvm);
ret = gzvm_vm_arch_enable_cap(gzvm, cap, &res);
return ret;
case GZVM_CAP_PVM_GET_PVMFW_SIZE:
@@ -219,3 +348,28 @@ int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,

return -EINVAL;
}
+
+/**
+ * gzvm_hva_to_pa_arch() - converts hva to pa with arch-specific way
+ * @hva: Host virtual address.
+ *
+ * Return: GZVM_PA_ERR_BAD for translation error
+ */
+u64 gzvm_hva_to_pa_arch(u64 hva)
+{
+ unsigned long flags;
+ u64 par;
+
+ local_irq_save(flags);
+ asm volatile("at s1e1r, %0" :: "r" (hva));
+ isb();
+ par = read_sysreg_par();
+ local_irq_restore(flags);
+
+ if (par & SYS_PAR_EL1_F)
+ return GZVM_PA_ERR_BAD;
+ par = par & PAR_PA47_MASK;
+ if (!par)
+ return GZVM_PA_ERR_BAD;
+ return par;
+}
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index 25614ea3dea2..59fc4510a843 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -6,4 +6,5 @@

GZVM_DIR ?= ../../../drivers/virt/geniezone

-gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o
+gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
+ $(GZVM_DIR)/gzvm_mmu.o
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
new file mode 100644
index 000000000000..3f1272f0e22d
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/soc/mediatek/gzvm_drv.h>
+
+/**
+ * hva_to_pa_fast() - converts hva to pa in generic fast way
+ * @hva: Host virtual address.
+ *
+ * Return: GZVM_PA_ERR_BAD for translation error
+ */
+u64 hva_to_pa_fast(u64 hva)
+{
+ struct page *page[1];
+ u64 pfn;
+
+ if (get_user_page_fast_only(hva, 0, page)) {
+ pfn = page_to_phys(page[0]);
+ put_page(page[0]);
+ return pfn;
+ }
+ return GZVM_PA_ERR_BAD;
+}
+
+/**
+ * hva_to_pa_slow() - converts hva to pa in a slow way
+ * @hva: Host virtual address
+ *
+ * This function converts HVA to PA in a slow way because the target hva is not
+ * yet allocated and mapped in the host stage1 page table, we cannot find it
+ * directly from current page table.
+ * Thus, we have to allocate it and this operation is much slower than directly
+ * find via current page table.
+ *
+ * Context: This function may sleep
+ * Return: PA or GZVM_PA_ERR_BAD for translation error
+ */
+u64 hva_to_pa_slow(u64 hva)
+{
+ struct page *page = NULL;
+ u64 pfn = 0;
+ int npages;
+
+ npages = get_user_pages_unlocked(hva, 1, &page, 0);
+ if (npages != 1)
+ return GZVM_PA_ERR_BAD;
+
+ if (page) {
+ pfn = page_to_phys(page);
+ put_page(page);
+ return pfn;
+ }
+
+ return GZVM_PA_ERR_BAD;
+}
+
+static u64 __gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn)
+{
+ u64 hva, pa;
+
+ if (gzvm_gfn_to_hva_memslot(memslot, gfn, &hva) != 0)
+ return GZVM_PA_ERR_BAD;
+
+ pa = gzvm_hva_to_pa_arch(hva);
+ if (pa != GZVM_PA_ERR_BAD)
+ return PHYS_PFN(pa);
+
+ pa = hva_to_pa_fast(hva);
+ if (pa != GZVM_PA_ERR_BAD)
+ return PHYS_PFN(pa);
+
+ pa = hva_to_pa_slow(hva);
+ if (pa != GZVM_PA_ERR_BAD)
+ return PHYS_PFN(pa);
+
+ return GZVM_PA_ERR_BAD;
+}
+
+/**
+ * gzvm_gfn_to_pfn_memslot() - Translate gfn (guest ipa) to pfn (host pa),
+ * result is in @pfn
+ * @memslot: Pointer to struct gzvm_memslot.
+ * @gfn: Guest frame number.
+ * @pfn: Host page frame number.
+ *
+ * Return:
+ * * 0 - Succeed
+ * * -EFAULT - Failed to convert
+ */
+int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn,
+ u64 *pfn)
+{
+ u64 __pfn;
+
+ if (!memslot)
+ return -EFAULT;
+
+ __pfn = __gzvm_gfn_to_pfn_memslot(memslot, gfn);
+ if (__pfn == GZVM_PA_ERR_BAD) {
+ *pfn = 0;
+ return -EFAULT;
+ }
+
+ *pfn = __pfn;
+
+ return 0;
+}
+
+int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn)
+{
+ if (gzvm_gfn_to_pfn_memslot(slot, gfn, pfn) != 0)
+ return -EFAULT;
+ return 0;
+}
+
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 16283ad75df9..18a3e19347ce 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -110,7 +110,13 @@ int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp);

+u64 gzvm_hva_to_pa_arch(u64 hva);
+u64 hva_to_pa_fast(u64 hva);
+u64 hva_to_pa_slow(u64 hva);
+int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn, u64 *pfn);
int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
u64 *hva_memslot);
+int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id);
+int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn);

#endif /* __GZVM_DRV_H__ */
--
2.18.0


2024-04-12 07:03:51

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 02/21] docs: geniezone: Introduce GenieZone hypervisor

From: "Yingshiuan Pan" <[email protected]>

GenieZone is MediaTek's proprietary hypervisor solution. It runs standalone
in EL2 as a type-1 hypervisor. It is a pure EL2 implementation, which
implies that it does not rely on any specific host VM, and this behavior
improves GenieZone's security as it limits its interface.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
Documentation/virt/geniezone/introduction.rst | 87 +++++++++++++++++++
Documentation/virt/index.rst | 1 +
MAINTAINERS | 6 ++
3 files changed, 94 insertions(+)
create mode 100644 Documentation/virt/geniezone/introduction.rst

diff --git a/Documentation/virt/geniezone/introduction.rst b/Documentation/virt/geniezone/introduction.rst
new file mode 100644
index 000000000000..f280476228b3
--- /dev/null
+++ b/Documentation/virt/geniezone/introduction.rst
@@ -0,0 +1,87 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+GenieZone Introduction
+======================
+
+Overview
+========
+GenieZone hypervisor (gzvm) is a type-1 hypervisor that supports various virtual
+machine types and provides security features such as TEE-like scenarios and
+secure boot. It can create guest VMs for security use cases and has
+virtualization capabilities for both platform and interrupt. Although the
+hypervisor can be booted independently, it requires the assistance of GenieZone
+hypervisor kernel driver(also named gzvm) to leverage the ability of Linux
+kernel for vCPU scheduling, memory management, inter-VM communication and virtio
+backend support.
+
+Supported Architecture
+======================
+GenieZone currently supports only MediaTek ARM64 SoCs.
+
+Features
+========
+
+- vCPU Management
+
+ VM manager aims to provide vCPUs on the basis of time sharing on physical
+ CPUs. It requires Linux kernel in host VM for vCPU scheduling and VM power
+ management.
+
+- Memory Management
+
+ Direct use of physical memory from VMs is forbidden; for security reasons,
+ access is dictated by the privilege models managed by GenieZone hypervisor.
+ With the help of the gzvm module, the hypervisor is able to manipulate
+ memory as objects.
+
+- Virtual Platform
+
+ We manage to emulate a virtual mobile platform for guest OS running on guest
+ VM. The platform supports various architecture-defined devices, such as
+ virtual arch timer, GIC, MMIO, PSCI, and exception watching.
+
+- Inter-VM Communication
+
+ Communication among guest VMs is provided mainly through RPC. More
+ communication mechanisms based on VirtIO-vsock are planned for the future.
+
+- Device Virtualization
+
+ The solution is provided using the well-known VirtIO. The gzvm module would
+ redirect MMIO traps back to VMM where the virtual devices are mostly emulated.
+ Ioeventfd is implemented using eventfd for signaling host VM that some IO
+ events in guest VMs need to be processed.
+
+- Interrupt virtualization
+
+ While guest VMs are running, all interrupts, both virtual and physical, are
+ handled by GenieZone hypervisor with the help of the gzvm module. When no
+ guest VM is running, physical interrupts are handled by the host VM
+ directly for performance reasons. Irqfd is also implemented using eventfd
+ for accepting vIRQ requests in the gzvm module.
+
+Platform architecture component
+===============================
+
+- vm
+
+ The vm component is responsible for setting up the capability and memory
+ management for the protected VMs. The capability is mainly about the lifecycle
+ control and boot context initialization. And the memory management is highly
+ integrated with ARM 2-stage translation tables to convert VA to IPA to PA
+ under proper security measures required by protected VMs.
+
+- vcpu
+
+ The vcpu component is the core of virtualizing an aarch64 physical CPU,
+ and it controls the vCPU lifecycle including creating, running and destroying.
+ With a self-defined exit handler, the vm component is able to act
+ accordingly before termination.
+
+- vgic
+
+ The vgic component exposes control interfaces to Linux kernel via irqchip, and
+ we intend to support all SPI, PPI, and SGI. When it comes to virtual
+ interrupts, the GenieZone hypervisor would write to list registers and trigger
+ vIRQ injection in guest VMs via GIC.
diff --git a/Documentation/virt/index.rst b/Documentation/virt/index.rst
index 7fb55ae08598..cf12444db336 100644
--- a/Documentation/virt/index.rst
+++ b/Documentation/virt/index.rst
@@ -16,6 +16,7 @@ Virtualization Support
coco/sev-guest
coco/tdx-guest
hyperv/index
+ geniezone/introduction

.. only:: html and subproject

diff --git a/MAINTAINERS b/MAINTAINERS
index 88981d9f3958..0cda103140b4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9176,6 +9176,12 @@ F: include/vdso/
F: kernel/time/vsyscall.c
F: lib/vdso/

+GENIEZONE HYPERVISOR DRIVER
+M: Yingshiuan Pan <[email protected]>
+M: Ze-Yu Wang <[email protected]>
+M: Yi-De Wu <[email protected]>
+F: Documentation/virt/geniezone/
+
GENWQE (IBM Generic Workqueue Card)
M: Frank Haverkamp <[email protected]>
S: Supported
--
2.18.0


2024-04-12 07:03:52

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 21/21] virt: geniezone: Enable PTP for synchronizing time between host and guest VMs

From: "Kevenny Hsieh" <[email protected]>

Enable Precision Time Protocol (PTP) support to improve time
synchronization between the host and guest VMs, which benefits
operations that need precise clock sync in a virtual environment.
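
For orientation, the userspace sketch below models how the handler added by
this patch packs its reply: the wall-clock time and the cycle count are each
split into upper and lower 32-bit halves, and the virtual counter is the
host cycle count minus the per-VM timer offset. The function
sketch_ptp_fill() and its parameters are hypothetical names used only for
this illustration; the real handler reads its inputs from
ktime_get_snapshot() and vcpu->hwstate.

```c
#include <stdint.h>

#define SKETCH_PTP_VIRT_COUNTER 0
#define SKETCH_PTP_PHYS_COUNTER 1

/*
 * Fill args[0..3] with the upper/lower halves of the wall-clock time and
 * of the selected cycle counter. Unknown counter types report 0 cycles,
 * mirroring the default case in the patch.
 */
static void sketch_ptp_fill(uint64_t wall_clock, uint64_t host_cycles,
			    uint64_t vtimer_offset, int counter,
			    uint64_t args[4])
{
	uint64_t cycles = 0;

	switch (counter) {
	case SKETCH_PTP_VIRT_COUNTER:
		/* Guest virtual counter lags the host by the timer offset. */
		cycles = host_cycles - vtimer_offset;
		break;
	case SKETCH_PTP_PHYS_COUNTER:
		cycles = host_cycles;
		break;
	default:
		break;
	}

	args[0] = wall_clock >> 32;
	args[1] = wall_clock & 0xffffffffULL;
	args[2] = cycles >> 32;
	args[3] = cycles & 0xffffffffULL;
}
```

The guest side is expected to recombine each pair of 32-bit halves into a
64-bit value before using it.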

Signed-off-by: Kevenny Hsieh <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/Makefile | 2 +-
arch/arm64/geniezone/gzvm_arch_common.h | 3 +
arch/arm64/geniezone/hvc.c | 73 +++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_exception.c | 3 +-
include/linux/soc/mediatek/gzvm_drv.h | 1 +
include/uapi/linux/gzvm.h | 1 +
6 files changed, 80 insertions(+), 3 deletions(-)
create mode 100644 arch/arm64/geniezone/hvc.c

diff --git a/arch/arm64/geniezone/Makefile b/arch/arm64/geniezone/Makefile
index 0e4f1087f9de..553a64a926dc 100644
--- a/arch/arm64/geniezone/Makefile
+++ b/arch/arm64/geniezone/Makefile
@@ -4,6 +4,6 @@
#
include $(srctree)/drivers/virt/geniezone/Makefile

-gzvm-y += vm.o vcpu.o vgic.o
+gzvm-y += vm.o vcpu.o vgic.o hvc.o

obj-$(CONFIG_MTK_GZVM) += gzvm.o
diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 192d023722e5..8f5d8528ab96 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -83,6 +83,8 @@ int gzvm_hypcall_wrapper(unsigned long a0, unsigned long a1,
* @__pad: add an explicit '__u32 __pad;' in the middle to make it clear
* what the actual layout is.
* @lr: The array of LRs(list registers).
+ * @vtimer_offset: The offset maintained by hypervisor that is host cycle count
+ * when guest VM startup.
*
* - Keep the same layout of hypervisor data struct.
* - Sync list registers back for acking virtual device interrupt status.
@@ -91,6 +93,7 @@ struct gzvm_vcpu_hwstate {
__le32 nr_lrs;
__le32 __pad;
__le64 lr[GIC_V3_NR_LRS];
+ __le64 vtimer_offset;
};

static inline unsigned int
diff --git a/arch/arm64/geniezone/hvc.c b/arch/arm64/geniezone/hvc.c
new file mode 100644
index 000000000000..3d7f71f20dce
--- /dev/null
+++ b/arch/arm64/geniezone/hvc.c
@@ -0,0 +1,73 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+#include <linux/clocksource.h>
+#include <linux/kernel.h>
+#include <linux/timekeeping.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+#include "gzvm_arch_common.h"
+
+#define GZVM_PTP_VIRT_COUNTER 0
+#define GZVM_PTP_PHYS_COUNTER 1
+/**
+ * gzvm_handle_ptp_time() - Sync time between host and guest VM
+ * @vcpu: Pointer to struct gzvm_vcpu_run in userspace
+ * @counter: Counter type from guest VM
+ * Return: Always return 0 because there are no cases of failure
+ *
+ * The following register values will be passed to the guest VM
+ * for time synchronization:
+ * regs->x0 (upper 32 bits) wall clock time
+ * regs->x1 (lower 32 bits) wall clock time
+ * regs->x2 (upper 32 bits) cycles
+ * regs->x3 (lower 32 bits) cycles
+ */
+static int gzvm_handle_ptp_time(struct gzvm_vcpu *vcpu, int counter)
+{
+ struct system_time_snapshot snapshot;
+ u64 cycles = 0;
+
+ ktime_get_snapshot(&snapshot);
+
+ switch (counter) {
+ case GZVM_PTP_VIRT_COUNTER:
+ cycles = snapshot.cycles -
+ le64_to_cpu(vcpu->hwstate->vtimer_offset);
+ break;
+ case GZVM_PTP_PHYS_COUNTER:
+ cycles = snapshot.cycles;
+ break;
+ default:
+ break;
+ }
+
+ vcpu->run->hypercall.args[0] = upper_32_bits(snapshot.real);
+ vcpu->run->hypercall.args[1] = lower_32_bits(snapshot.real);
+ vcpu->run->hypercall.args[2] = upper_32_bits(cycles);
+ vcpu->run->hypercall.args[3] = lower_32_bits(cycles);
+
+ return 0;
+}
+
+/**
+ * gzvm_arch_handle_guest_hvc() - Handle architecture-related guest hvc
+ * @vcpu: Pointer to struct gzvm_vcpu_run in userspace
+ * Return:
+ * * true - This hvc has been processed, no need to back to VMM.
+ * * false - This hvc has not been processed, require userspace.
+ */
+bool gzvm_arch_handle_guest_hvc(struct gzvm_vcpu *vcpu)
+{
+ int ret, counter;
+
+ switch (vcpu->run->hypercall.args[0]) {
+ case GZVM_HVC_PTP:
+ counter = vcpu->run->hypercall.args[1];
+ ret = gzvm_handle_ptp_time(vcpu, counter);
+ return (ret == 0) ? true : false;
+ default:
+ break;
+ }
+ return false;
+}
diff --git a/drivers/virt/geniezone/gzvm_exception.c b/drivers/virt/geniezone/gzvm_exception.c
index 07871ec74651..d824211f49a6 100644
--- a/drivers/virt/geniezone/gzvm_exception.c
+++ b/drivers/virt/geniezone/gzvm_exception.c
@@ -56,7 +56,6 @@ bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu)
ret = gzvm_handle_relinquish(vcpu, ipa);
return (ret == 0) ? true : false;
default:
- break;
+ return gzvm_arch_handle_guest_hvc(vcpu);
}
- return false;
}
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index e123787cd70d..f6b7acca37b8 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -223,6 +223,7 @@ int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu);
bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu);
int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa);
bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu);
+bool gzvm_arch_handle_guest_hvc(struct gzvm_vcpu *vcpu);

int gzvm_arch_create_device(u16 vm_id, struct gzvm_create_device *gzvm_dev);
int gzvm_arch_inject_irq(struct gzvm *gzvm, unsigned int vcpu_idx,
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index 5411357ec05e..1cf89213a383 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -197,6 +197,7 @@ enum {

/* hypercall definitions of GZVM_EXIT_HYPERCALL */
enum {
+ GZVM_HVC_PTP = 0x86000001,
GZVM_HVC_MEM_RELINQUISH = 0xc6000009,
};

--
2.18.0


2024-04-12 07:03:57

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 01/21] virt: geniezone: enable gzvm-ko in defconfig

Add config options to the arm64 defconfig to enable the gzvm driver by default.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/configs/defconfig | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 9957e126e32d..6ca6bb580096 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1688,3 +1688,5 @@ CONFIG_CORESIGHT_STM=m
CONFIG_CORESIGHT_CPU_DEBUG=m
CONFIG_CORESIGHT_CTI=m
CONFIG_MEMTEST=y
+CONFIG_VIRT_DRIVERS=y
+CONFIG_MTK_GZVM=m
--
2.18.0


2024-04-12 07:03:57

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 15/21] virt: geniezone: Add demand paging support

From: "Yingshiuan Pan" <[email protected]>

This page fault handler helps the GenieZone hypervisor do demand paging.
On a lower-level translation fault, the hypervisor first checks that the
faulting GPA (guest physical address, or IPA in ARM terms) is valid,
e.g. within a registered memory region, and then sets up
vcpu_run->exit_reason with the information needed for returning to the
gzvm driver.

With the fault information, the gzvm driver looks up the physical
address and calls MT_HVC_GZVM_MAP_GUEST to request that the hypervisor
map the found PA to the faulting GPA (IPA).

There is one exception: for protected VMs, the full VM memory region is
populated in advance in order to improve performance.
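
The driver-side decision described above can be modelled in userspace
roughly as follows. All sketch_* names, the fixed slot array and the 4 KiB
page size are assumptions for illustration only; the real code is
gzvm_handle_page_fault(), which additionally allocates the host page and
issues the MT_HVC_GZVM_MAP_GUEST hypercall.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

enum sketch_alloc_mode {
	SKETCH_FULLY_POPULATED,	/* e.g. protected VM, populated up front */
	SKETCH_DEMAND_PAGING,
};

struct sketch_memslot {
	uint64_t base_gfn;
	uint64_t npages;
};

struct sketch_vm {
	enum sketch_alloc_mode mode;
	struct sketch_memslot slots[4];
	size_t nr_slots;
};

/* Return the index of the slot containing gfn, or -1 if none does. */
static int sketch_find_memslot(const struct sketch_vm *vm, uint64_t gfn)
{
	size_t i;

	for (i = 0; i < vm->nr_slots; i++) {
		const struct sketch_memslot *s = &vm->slots[i];

		if (gfn >= s->base_gfn && gfn - s->base_gfn < s->npages)
			return (int)i;
	}
	return -1;
}

/*
 * Decide whether a fault at fault_gpa should be resolved by mapping one
 * page. Returns 0 and reports the slot and gfn to map, or -1 if the
 * fault cannot be handled here.
 */
static int sketch_handle_page_fault(const struct sketch_vm *vm,
				    uint64_t fault_gpa,
				    int *slot_out, uint64_t *gfn_out)
{
	uint64_t gfn = fault_gpa >> SKETCH_PAGE_SHIFT;
	int slot = sketch_find_memslot(vm, gfn);

	if (slot < 0)
		return -1;
	/* Fully populated VMs should never fault on a missing mapping. */
	if (vm->mode == SKETCH_FULLY_POPULATED)
		return -1;

	*slot_out = slot;
	*gfn_out = gfn;
	return 0;
}
```

In this model a fault inside a registered slot of a demand-paged VM yields
the (slot, gfn) pair that would be mapped; any other fault is reported as a
failure so that it can be passed on to userspace.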

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: kevenny hsieh <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/gzvm_arch_common.h | 2 ++
arch/arm64/geniezone/vm.c | 13 +++++++
drivers/virt/geniezone/Makefile | 4 +--
drivers/virt/geniezone/gzvm_exception.c | 39 ++++++++++++++++++++
drivers/virt/geniezone/gzvm_main.c | 2 ++
drivers/virt/geniezone/gzvm_mmu.c | 41 +++++++++++++++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 6 ++--
drivers/virt/geniezone/gzvm_vm.c | 48 ++++++++++++++++++++++++-
include/linux/soc/mediatek/gzvm_drv.h | 14 ++++++++
include/uapi/linux/gzvm.h | 13 +++++++
10 files changed, 177 insertions(+), 5 deletions(-)
create mode 100644 drivers/virt/geniezone/gzvm_exception.c

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 4366618cdc0a..928191e3cdb2 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -24,6 +24,7 @@ enum {
GZVM_FUNC_INFORM_EXIT = 14,
GZVM_FUNC_MEMREGION_PURPOSE = 15,
GZVM_FUNC_SET_DTB_CONFIG = 16,
+ GZVM_FUNC_MAP_GUEST = 17,
NR_GZVM_FUNC,
};

@@ -48,6 +49,7 @@ enum {
#define MT_HVC_GZVM_INFORM_EXIT GZVM_HCALL_ID(GZVM_FUNC_INFORM_EXIT)
#define MT_HVC_GZVM_MEMREGION_PURPOSE GZVM_HCALL_ID(GZVM_FUNC_MEMREGION_PURPOSE)
#define MT_HVC_GZVM_SET_DTB_CONFIG GZVM_HCALL_ID(GZVM_FUNC_SET_DTB_CONFIG)
+#define MT_HVC_GZVM_MAP_GUEST GZVM_HCALL_ID(GZVM_FUNC_MAP_GUEST)

#define GIC_V3_NR_LRS 16

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index cbebae3ff663..3cd24408f880 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -367,12 +367,16 @@ int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp)
{
+ struct arm_smccc_res res = {0};
int ret;

switch (cap->cap) {
case GZVM_CAP_PROTECTED_VM:
ret = gzvm_vm_ioctl_cap_pvm(gzvm, cap, argp);
return ret;
+ case GZVM_CAP_ENABLE_DEMAND_PAGING:
+ ret = gzvm_vm_arch_enable_cap(gzvm, cap, &res);
+ return ret;
default:
break;
}
@@ -404,3 +408,12 @@ u64 gzvm_hva_to_pa_arch(u64 hva)
return GZVM_PA_ERR_BAD;
return par;
}
+
+int gzvm_arch_map_guest(u16 vm_id, int memslot_id, u64 pfn, u64 gfn,
+ u64 nr_pages)
+{
+ struct arm_smccc_res res;
+
+ return gzvm_hypcall_wrapper(MT_HVC_GZVM_MAP_GUEST, vm_id, memslot_id,
+ pfn, gfn, nr_pages, 0, 0, &res);
+}
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index 9956f4891df2..2e12870637d5 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -8,5 +8,5 @@ GZVM_DIR ?= ../../../drivers/virt/geniezone

gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
$(GZVM_DIR)/gzvm_mmu.o $(GZVM_DIR)/gzvm_vcpu.o \
- $(GZVM_DIR)/gzvm_irqfd.o $(GZVM_DIR)/gzvm_ioeventfd.o
-
+ $(GZVM_DIR)/gzvm_irqfd.o $(GZVM_DIR)/gzvm_ioeventfd.o \
+ $(GZVM_DIR)/gzvm_exception.o
diff --git a/drivers/virt/geniezone/gzvm_exception.c b/drivers/virt/geniezone/gzvm_exception.c
new file mode 100644
index 000000000000..475bc15b0689
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_exception.c
@@ -0,0 +1,39 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/device.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+
+/**
+ * gzvm_handle_guest_exception() - Handle guest exception
+ * @vcpu: Pointer to struct gzvm_vcpu_run in userspace
+ * Return:
+ * * true - This exception has been processed, no need to back to VMM.
+ * * false - This exception has not been processed, require userspace.
+ */
+bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu)
+{
+ int ret;
+
+ for (int i = 0; i < ARRAY_SIZE(vcpu->run->exception.reserved); i++) {
+ if (vcpu->run->exception.reserved[i])
+ return false;
+ }
+
+ switch (vcpu->run->exception.exception) {
+ case GZVM_EXCEPTION_PAGE_FAULT:
+ ret = gzvm_handle_page_fault(vcpu);
+ break;
+ case GZVM_EXCEPTION_UNKNOWN:
+ fallthrough;
+ default:
+ ret = -EFAULT;
+ }
+
+ if (!ret)
+ return true;
+ else
+ return false;
+}
diff --git a/drivers/virt/geniezone/gzvm_main.c b/drivers/virt/geniezone/gzvm_main.c
index 75f643222b91..8f11a27f2723 100644
--- a/drivers/virt/geniezone/gzvm_main.c
+++ b/drivers/virt/geniezone/gzvm_main.c
@@ -28,6 +28,8 @@ int gzvm_err_to_errno(unsigned long err)
return 0;
case ERR_NO_MEMORY:
return -ENOMEM;
+ case ERR_INVALID_ARGS:
+ return -EINVAL;
case ERR_NOT_SUPPORTED:
fallthrough;
case ERR_NOT_IMPLEMENTED:
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
index 3f1272f0e22d..3f7657544c30 100644
--- a/drivers/virt/geniezone/gzvm_mmu.c
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -115,3 +115,44 @@ int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn)
return 0;
}

+static int handle_single_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)
+{
+ int ret;
+ u64 pfn;
+
+ ret = gzvm_vm_allocate_guest_page(&vm->memslot[memslot_id], gfn, &pfn);
+ if (unlikely(ret))
+ return -EFAULT;
+
+ ret = gzvm_arch_map_guest(vm->vm_id, memslot_id, pfn, gfn, 1);
+ if (unlikely(ret))
+ return -EFAULT;
+
+ return 0;
+}
+
+/**
+ * gzvm_handle_page_fault() - Handle guest page fault, find corresponding page
+ * for the faulting gpa
+ * @vcpu: Pointer to struct gzvm_vcpu of the faulting vcpu
+ *
+ * Return:
+ * * 0 - Guest page fault handled successfully
+ * * -EFAULT - Failed to map phys addr to guest's GPA
+ */
+int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu)
+{
+ struct gzvm *vm = vcpu->gzvm;
+ int memslot_id;
+ u64 gfn;
+
+ gfn = PHYS_PFN(vcpu->run->exception.fault_gpa);
+ memslot_id = gzvm_find_memslot(vm, gfn);
+ if (unlikely(memslot_id < 0))
+ return -EFAULT;
+
+ if (unlikely(vm->mem_alloc_mode == GZVM_FULLY_POPULATED))
+ return -EFAULT;
+
+ return handle_single_demand_page(vm, memslot_id, gfn);
+}
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index 388d25e1183b..e8d6f32f325c 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -113,9 +113,11 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)
* it's geniezone's responsibility to fill corresponding data
* structure
*/
- case GZVM_EXIT_HYPERCALL:
- fallthrough;
case GZVM_EXIT_EXCEPTION:
+ if (!gzvm_handle_guest_exception(vcpu))
+ need_userspace = true;
+ break;
+ case GZVM_EXIT_HYPERCALL:
fallthrough;
case GZVM_EXIT_DEBUG:
fallthrough;
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 1fc915b790b8..fc6e58008b92 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -29,6 +29,31 @@ int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
return 0;
}

+/**
+ * gzvm_find_memslot() - Find memslot containing this @gfn
+ * @vm: Pointer to struct gzvm
+ * @gfn: Guest frame number
+ *
+ * Return:
+ * * >=0 - Index of memslot
+ * * -EFAULT - Not found
+ */
+int gzvm_find_memslot(struct gzvm *vm, u64 gfn)
+{
+ int i;
+
+ for (i = 0; i < GZVM_MAX_MEM_REGION; i++) {
+ if (vm->memslot[i].npages == 0)
+ continue;
+
+ if (gfn >= vm->memslot[i].base_gfn &&
+ gfn < vm->memslot[i].base_gfn + vm->memslot[i].npages)
+ return i;
+ }
+
+ return -EFAULT;
+}
+
/**
* register_memslot_addr_range() - Register memory region to GenieZone
* @gzvm: Pointer to struct gzvm
@@ -60,7 +85,10 @@ register_memslot_addr_range(struct gzvm *gzvm, struct gzvm_memslot *memslot)
}

free_pages_exact(region, buf_size);
- return 0;
+
+ if (gzvm->mem_alloc_mode == GZVM_DEMAND_PAGING)
+ return 0;
+ return gzvm_vm_populate_mem_region(gzvm, memslot->slot_id);
}

/**
@@ -304,6 +332,22 @@ static const struct file_operations gzvm_vm_fops = {
.llseek = noop_llseek,
};

+static int setup_mem_alloc_mode(struct gzvm *vm)
+{
+ int ret;
+ struct gzvm_enable_cap cap = {0};
+
+ cap.cap = GZVM_CAP_ENABLE_DEMAND_PAGING;
+
+ ret = gzvm_vm_ioctl_enable_cap(vm, &cap, NULL);
+ if (!ret)
+ vm->mem_alloc_mode = GZVM_DEMAND_PAGING;
+ else
+ vm->mem_alloc_mode = GZVM_FULLY_POPULATED;
+
+ return 0;
+}
+
static struct gzvm *gzvm_create_vm(unsigned long vm_type)
{
int ret;
@@ -337,6 +381,8 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
return ERR_PTR(ret);
}

+ setup_mem_alloc_mode(gzvm);
+
mutex_lock(&gzvm_list_lock);
list_add(&gzvm->vm_list, &gzvm_list);
mutex_unlock(&gzvm_list_lock);
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 798880468991..7ca4ae0de482 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -29,6 +29,7 @@
*/
#define NO_ERROR (0)
#define ERR_NO_MEMORY (-5)
+#define ERR_INVALID_ARGS (-8)
#define ERR_NOT_SUPPORTED (-24)
#define ERR_NOT_IMPLEMENTED (-27)
#define ERR_FAULT (-40)
@@ -43,6 +44,11 @@

#define GZVM_VCPU_RUN_MAP_SIZE (PAGE_SIZE * 2)

+enum gzvm_demand_paging_mode {
+ GZVM_FULLY_POPULATED = 0,
+ GZVM_DEMAND_PAGING = 1,
+};
+
/**
* struct mem_region_addr_range: identical to ffa memory constituent
* @address: the base IPA of the constituent memory region, aligned to 4 kiB
@@ -105,6 +111,7 @@ struct gzvm_vcpu {
* @irq_ack_notifier_list: list head for irq ack notifier
* @irq_srcu: structure data for SRCU(sleepable rcu)
* @irq_lock: lock for irq injection
+ * @mem_alloc_mode: memory allocation mode - fully allocated or demand paging
*/
struct gzvm {
struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
@@ -127,6 +134,7 @@ struct gzvm {
struct hlist_head irq_ack_notifier_list;
struct srcu_struct irq_srcu;
struct mutex irq_lock;
+ u32 mem_alloc_mode;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -145,6 +153,8 @@ int gzvm_arch_set_memregion(u16 vm_id, size_t buf_size,
int gzvm_arch_check_extension(struct gzvm *gzvm, __u64 cap, void __user *argp);
int gzvm_arch_create_vm(unsigned long vm_type);
int gzvm_arch_destroy_vm(u16 vm_id);
+int gzvm_arch_map_guest(u16 vm_id, int memslot_id, u64 pfn, u64 gfn,
+ u64 nr_pages);
int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp);
@@ -166,6 +176,10 @@ int gzvm_arch_vcpu_run(struct gzvm_vcpu *vcpu, __u64 *exit_reason);
int gzvm_arch_destroy_vcpu(u16 vm_id, int vcpuid);
int gzvm_arch_inform_exit(u16 vm_id);

+int gzvm_find_memslot(struct gzvm *vm, u64 gpa);
+int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu);
+bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu);
+
int gzvm_arch_create_device(u16 vm_id, struct gzvm_create_device *gzvm_dev);
int gzvm_arch_inject_irq(struct gzvm *gzvm, unsigned int vcpu_idx,
u32 irq, bool level);
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index 7aec4adf2206..61a7a87b3d23 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -18,6 +18,7 @@

#define GZVM_CAP_VM_GPA_SIZE 0xa5
#define GZVM_CAP_PROTECTED_VM 0xffbadab1
+#define GZVM_CAP_ENABLE_DEMAND_PAGING 0x9202

/* sub-commands put in args[0] for GZVM_CAP_PROTECTED_VM */
#define GZVM_CAP_PVM_SET_PVMFW_GPA 0
@@ -186,6 +187,12 @@ enum {
GZVM_EXIT_GZ = 0x9292000a,
};

+/* exception definitions of GZVM_EXIT_EXCEPTION */
+enum {
+ GZVM_EXCEPTION_UNKNOWN = 0x0,
+ GZVM_EXCEPTION_PAGE_FAULT = 0x1,
+};
+
/**
* struct gzvm_vcpu_run: Same purpose as kvm_run, this struct is
* shared between userspace, kernel and
@@ -250,6 +257,12 @@ struct gzvm_vcpu_run {
__u32 exception;
/* Exception error codes */
__u32 error_code;
+ /* Fault GPA (guest physical address or IPA in ARM) */
+ __u64 fault_gpa;
+ /* Future-proof reservation and reset to zero in hypervisor.
+ * Fill up to the union size, 256 bytes.
+ */
+ __u64 reserved[30];
} exception;
/* GZVM_EXIT_HYPERCALL */
struct {
--
2.18.0
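
Note on the lookup used by the page-fault path above: gzvm_handle_page_fault()
converts the faulting GPA to a guest frame number and gzvm_find_memslot() then
searches the slots for a half-open range [base_gfn, base_gfn + npages). The
sketch below restates that logic as a standalone userspace program so the
boundary cases are easy to check. It is illustrative only: the demo_* names,
the 4 KiB page size, and the subtraction-based range test (which sidesteps
overflow of base_gfn + npages) are assumptions, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the fields of struct gzvm_memslot used here. */
struct demo_memslot {
	uint64_t base_gfn;	/* first guest frame number of the slot */
	uint64_t npages;	/* 0 means the slot is unused */
};

/* Same idea as PHYS_PFN(): drop the in-page offset bits. */
static uint64_t demo_gpa_to_gfn(uint64_t gpa)
{
	return gpa >> DEMO_PAGE_SHIFT;
}

/* Return the index of the slot containing gfn, or -1 if no slot does. */
static int demo_find_memslot(const struct demo_memslot *slots, size_t n,
			     uint64_t gfn)
{
	assert(slots != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (slots[i].npages == 0)
			continue;
		/* gfn - base < npages is the overflow-safe form of
		 * gfn < base + npages once gfn >= base is known.
		 */
		if (gfn >= slots[i].base_gfn &&
		    gfn - slots[i].base_gfn < slots[i].npages)
			return (int)i;
	}
	return -1;
}
```

With slots {0x100, 16}, {0, 0} and {0x200, 1}, a fault at GPA 0x100000 lands
in slot 0, while gfn 0x110 (one past the end of slot 0) matches no slot and
would take the -EFAULT path in the driver.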


2024-04-12 07:04:00

by Yi-De Wu

Subject: [PATCH v10 19/21] virt: geniezone: Provide individual VM memory statistics within debugfs

From: "Jerry Wang" <[email protected]>

Create a dedicated per-VM debugfs directory under gzvm so that
user-level programs can read per-VM memory statistics for debugging
and profiling. This lets users analyze and optimize the memory usage
of individual virtual machines.

Two types of information can be obtained:

`cat /sys/kernel/debug/gzvm/<pid>-<vmid>/protected_hyp_mem` shows memory
used by the hypervisor and the size of the stage 2 table in bytes.

`cat /sys/kernel/debug/gzvm/<pid>-<vmid>/protected_shared_mem` gives
memory used by the shared resources of the guest and host in bytes.

For example:
console:/ # cat /sys/kernel/debug/gzvm/3417-15/protected_hyp_mem
180328
console:/ # cat /sys/kernel/debug/gzvm/3417-15/protected_shared_mem
262144
console:/ #

More stats will be added in the future.

Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Liju-Clr Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/gzvm_arch_common.h | 2 +
arch/arm64/geniezone/vm.c | 13 +++
drivers/virt/geniezone/gzvm_main.c | 6 ++
drivers/virt/geniezone/gzvm_vm.c | 137 ++++++++++++++++++++++++
include/linux/soc/mediatek/gzvm_drv.h | 17 +++
5 files changed, 175 insertions(+)
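
Note on the read handlers: as posted, hyp_mem_read() and shared_mem_read()
copy sizeof(tmp_buffer) bytes to userspace and return that size, independent
of @len and of how many characters snprintf() produced. In addition, a
20-byte buffer cannot hold the largest u64 value plus the newline and the
terminating NUL (22 bytes). The userspace sketch below shows the bounded-copy
behaviour a reader would normally expect: only the formatted characters are
returned, @len is respected, and a second read reports EOF. The in-kernel
helper simple_read_from_buffer() provides comparable semantics. All demo_*
names and the 24-byte buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format value as "<decimal>\n" and copy at most len bytes of it into buf,
 * starting at *offset. Returns the number of bytes copied, 0 at EOF, or
 * -1 if *offset is negative.
 */
static long demo_read_value(unsigned long long value, char *buf, size_t len,
			    long long *offset)
{
	char tmp[24];	/* 20 digits + '\n' + NUL fits with room to spare */
	int n = snprintf(tmp, sizeof(tmp), "%llu\n", value);
	size_t avail;

	assert(n > 0 && (size_t)n < sizeof(tmp));

	if (*offset < 0)
		return -1;
	if ((unsigned long long)*offset >= (unsigned long long)n)
		return 0;	/* everything was already delivered */

	avail = (size_t)n - (size_t)*offset;
	if (len > avail)
		len = avail;
	memcpy(buf, tmp + *offset, len);
	*offset += (long long)len;
	return (long)len;
}
```

Reading 180328 into an 8-byte buffer yields the 7 bytes "180328\n" and
advances the offset to 7; the next call returns 0. A reader that asks for only
3 bytes gets "180" first and the remainder on the following call.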

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 8a082ba808a4..192d023722e5 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -26,6 +26,7 @@ enum {
GZVM_FUNC_SET_DTB_CONFIG = 16,
GZVM_FUNC_MAP_GUEST = 17,
GZVM_FUNC_MAP_GUEST_BLOCK = 18,
+ GZVM_FUNC_GET_STATISTICS = 19,
NR_GZVM_FUNC,
};

@@ -52,6 +53,7 @@ enum {
#define MT_HVC_GZVM_SET_DTB_CONFIG GZVM_HCALL_ID(GZVM_FUNC_SET_DTB_CONFIG)
#define MT_HVC_GZVM_MAP_GUEST GZVM_HCALL_ID(GZVM_FUNC_MAP_GUEST)
#define MT_HVC_GZVM_MAP_GUEST_BLOCK GZVM_HCALL_ID(GZVM_FUNC_MAP_GUEST_BLOCK)
+#define MT_HVC_GZVM_GET_STATISTICS GZVM_HCALL_ID(GZVM_FUNC_GET_STATISTICS)

#define GIC_V3_NR_LRS 16

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index eb28c3850b5d..a477546c5a1a 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -431,3 +431,16 @@ int gzvm_arch_map_guest_block(u16 vm_id, int memslot_id, u64 gfn, u64 nr_pages)
return gzvm_hypcall_wrapper(MT_HVC_GZVM_MAP_GUEST_BLOCK, vm_id,
memslot_id, gfn, nr_pages, 0, 0, 0, &res);
}
+
+int gzvm_arch_get_statistics(struct gzvm *gzvm)
+{
+ struct arm_smccc_res res;
+ int ret;
+
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_GET_STATISTICS, gzvm->vm_id,
+ 0, 0, 0, 0, 0, 0, &res);
+
+ gzvm->stat.protected_hyp_mem = ((ret == 0) ? res.a1 : 0);
+ gzvm->stat.protected_shared_mem = ((ret == 0) ? res.a2 : 0);
+ return ret;
+}
diff --git a/drivers/virt/geniezone/gzvm_main.c b/drivers/virt/geniezone/gzvm_main.c
index 8f11a27f2723..d17505cf9755 100644
--- a/drivers/virt/geniezone/gzvm_main.c
+++ b/drivers/virt/geniezone/gzvm_main.c
@@ -109,6 +109,11 @@ static int gzvm_drv_probe(struct platform_device *pdev)
ret = gzvm_drv_irqfd_init();
if (ret)
return ret;
+
+ ret = gzvm_drv_debug_init();
+ if (ret)
+ return ret;
+
return 0;
}

@@ -117,6 +122,7 @@ static int gzvm_drv_remove(struct platform_device *pdev)
gzvm_drv_irqfd_exit();
gzvm_destroy_all_vms();
misc_deregister(&gzvm_dev);
+ gzvm_drv_debug_exit();
return 0;
}

diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 04af59b77189..e5751b07e425 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -11,11 +11,14 @@
#include <linux/platform_device.h>
#include <linux/slab.h>
#include <linux/soc/mediatek/gzvm_drv.h>
+#include <linux/debugfs.h>
#include "gzvm_common.h"

static DEFINE_MUTEX(gzvm_list_lock);
static LIST_HEAD(gzvm_list);

+static struct dentry *gzvm_debugfs_dir;
+
int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
u64 *hva_memslot)
{
@@ -315,6 +318,12 @@ static void gzvm_destroy_all_ppage(struct gzvm *gzvm)
}
}

+static int gzvm_destroy_vm_debugfs(struct gzvm *vm)
+{
+ debugfs_remove_recursive(vm->debug_dir);
+ return 0;
+}
+
static void gzvm_destroy_vm(struct gzvm *gzvm)
{
size_t allocated_size;
@@ -341,6 +350,8 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)
/* No need to lock here becauese it's single-threaded execution */
gzvm_destroy_all_ppage(gzvm);

+ gzvm_destroy_vm_debugfs(gzvm);
+
kfree(gzvm);
}

@@ -398,6 +409,113 @@ static void setup_vm_demand_paging(struct gzvm *vm)
}
}

+static int debugfs_open(struct inode *inode, struct file *file)
+{
+ file->private_data = inode->i_private;
+ return 0;
+}
+
+/**
+ * hyp_mem_read() - Get size of hypervisor-allocated memory and stage 2 table
+ * @file: Pointer to struct file
+ * @buf: User space buffer for storing the return value
+ * @len: Size of @buf, in bytes
+ * @offset: Pointer to loff_t
+ *
+ * Return: Size of hypervisor-allocated memory and stage 2 table, in bytes
+ */
+static ssize_t hyp_mem_read(struct file *file, char __user *buf, size_t len,
+ loff_t *offset)
+{
+ char tmp_buffer[GZVM_MAX_DEBUGFS_VALUE_SIZE] = {0};
+ struct gzvm *vm = file->private_data;
+ int ret;
+
+ if (*offset == 0) {
+ ret = gzvm_arch_get_statistics(vm);
+ if (ret)
+ return ret;
+ snprintf(tmp_buffer, sizeof(tmp_buffer), "%llu\n",
+ vm->stat.protected_hyp_mem);
+ if (copy_to_user(buf, tmp_buffer, sizeof(tmp_buffer)))
+ return -EFAULT;
+ *offset += sizeof(tmp_buffer);
+ return sizeof(tmp_buffer);
+ }
+ return 0;
+}
+
+/**
+ * shared_mem_read() - Get size of memory shared between host and guest
+ * @file: Pointer to struct file
+ * @buf: User space buffer for storing the return value
+ * @len: Size of @buf, in bytes
+ * @offset: Pointer to loff_t
+ *
+ * Return: Size of memory shared between host and guest, in bytes
+ */
+static ssize_t shared_mem_read(struct file *file, char __user *buf, size_t len,
+ loff_t *offset)
+{
+ char tmp_buffer[GZVM_MAX_DEBUGFS_VALUE_SIZE] = {0};
+ struct gzvm *vm = file->private_data;
+ int ret;
+
+ if (*offset == 0) {
+ ret = gzvm_arch_get_statistics(vm);
+ if (ret)
+ return ret;
+ snprintf(tmp_buffer, sizeof(tmp_buffer), "%llu\n",
+ vm->stat.protected_shared_mem);
+ if (copy_to_user(buf, tmp_buffer, sizeof(tmp_buffer)))
+ return -EFAULT;
+ *offset += sizeof(tmp_buffer);
+ return sizeof(tmp_buffer);
+ }
+ return 0;
+}
+
+static const struct file_operations hyp_mem_fops = {
+ .owner = THIS_MODULE,
+ .open = debugfs_open,
+ .read = hyp_mem_read,
+ .llseek = no_llseek,
+};
+
+static const struct file_operations shared_mem_fops = {
+ .owner = THIS_MODULE,
+ .open = debugfs_open,
+ .read = shared_mem_read,
+ .llseek = no_llseek,
+};
+
+static int gzvm_create_vm_debugfs(struct gzvm *vm)
+{
+ struct dentry *dent;
+ char dir_name[GZVM_MAX_DEBUGFS_DIR_NAME_SIZE];
+
+ if (vm->debug_dir) {
+ pr_warn("VM debugfs directory is duplicated\n");
+ return 0;
+ }
+
+ snprintf(dir_name, sizeof(dir_name), "%d-%d", task_pid_nr(current), vm->vm_id);
+
+ dent = debugfs_lookup(dir_name, gzvm_debugfs_dir);
+ if (dent) {
+ pr_warn("Debugfs directory is duplicated\n");
+ dput(dent);
+ return 0;
+ }
+ dent = debugfs_create_dir(dir_name, gzvm_debugfs_dir);
+ vm->debug_dir = dent;
+
+ debugfs_create_file("protected_shared_mem", 0444, dent, vm, &shared_mem_fops);
+ debugfs_create_file("protected_hyp_mem", 0444, dent, vm, &hyp_mem_fops);
+
+ return 0;
+}
+
static int setup_mem_alloc_mode(struct gzvm *vm)
{
int ret;
@@ -457,6 +575,8 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
list_add(&gzvm->vm_list, &gzvm_list);
mutex_unlock(&gzvm_list_lock);

+ gzvm_create_vm_debugfs(gzvm);
+
pr_debug("VM-%u is created\n", gzvm->vm_id);

return gzvm;
@@ -494,3 +614,20 @@ void gzvm_destroy_all_vms(void)
out:
mutex_unlock(&gzvm_list_lock);
}
+
+int gzvm_drv_debug_init(void)
+{
+ if (!debugfs_initialized())
+ return 0;
+
+ if (!gzvm_debugfs_dir && !debugfs_lookup("gzvm", gzvm_debugfs_dir))
+ gzvm_debugfs_dir = debugfs_create_dir("gzvm", NULL);
+
+ return 0;
+}
+
+void gzvm_drv_debug_exit(void)
+{
+ if (gzvm_debugfs_dir && debugfs_lookup("gzvm", gzvm_debugfs_dir))
+ debugfs_remove_recursive(gzvm_debugfs_dir);
+}
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 2e5e9c67cfa5..e123787cd70d 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -47,6 +47,9 @@

#define GZVM_BLOCK_BASED_DEMAND_PAGE_SIZE (2 * 1024 * 1024) /* 2MB */

+#define GZVM_MAX_DEBUGFS_DIR_NAME_SIZE 20
+#define GZVM_MAX_DEBUGFS_VALUE_SIZE 20
+
enum gzvm_demand_paging_mode {
GZVM_FULLY_POPULATED = 0,
GZVM_DEMAND_PAGING = 1,
@@ -106,6 +109,11 @@ struct gzvm_pinned_page {
u64 ipa;
};

+struct gzvm_vm_stat {
+ u64 protected_hyp_mem;
+ u64 protected_shared_mem;
+};
+
/**
* struct gzvm: the following data structures are for data transferring between
* driver and hypervisor, and they're aligned with hypervisor definitions.
@@ -128,6 +136,8 @@ struct gzvm_pinned_page {
* page mailbox at the same time
* @pinned_pages: use rb-tree to record pin/unpin page
* @mem_lock: lock for memory operations
+ * @stat: information for VM memory statistics
+ * @debug_dir: debugfs directory node for VM memory statistics
*/
struct gzvm {
struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
@@ -158,6 +168,9 @@ struct gzvm {

struct rb_root pinned_pages;
struct mutex mem_lock;
+
+ struct gzvm_vm_stat stat;
+ struct dentry *debug_dir;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -179,6 +192,7 @@ int gzvm_arch_destroy_vm(u16 vm_id);
int gzvm_arch_map_guest(u16 vm_id, int memslot_id, u64 pfn, u64 gfn,
u64 nr_pages);
int gzvm_arch_map_guest_block(u16 vm_id, int memslot_id, u64 gfn, u64 nr_pages);
+int gzvm_arch_get_statistics(struct gzvm *gzvm);
int gzvm_vm_ioctl_arch_enable_cap(struct gzvm *gzvm,
struct gzvm_enable_cap *cap,
void __user *argp);
@@ -201,6 +215,9 @@ int gzvm_arch_vcpu_run(struct gzvm_vcpu *vcpu, __u64 *exit_reason);
int gzvm_arch_destroy_vcpu(u16 vm_id, int vcpuid);
int gzvm_arch_inform_exit(u16 vm_id);

+int gzvm_drv_debug_init(void);
+void gzvm_drv_debug_exit(void);
+
int gzvm_find_memslot(struct gzvm *vm, u64 gpa);
int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu);
bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu);
--
2.18.0


2024-04-12 07:04:40

by Yi-De Wu

Subject: [PATCH v10 09/21] virt: geniezone: Add vcpu support

From: "Yingshiuan Pan" <[email protected]>

The VMM uses this interface to create a vcpu instance, which is
represented by an fd. This fd is used for all vcpu operations, such as
setting vcpu registers, and accepts the most important ioctl,
GZVM_VCPU_RUN, which requests the GenieZone hypervisor to context-switch
into the VM's vcpu context.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: kevenny hsieh <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
arch/arm64/geniezone/Makefile | 2 +-
arch/arm64/geniezone/gzvm_arch_common.h | 18 ++
arch/arm64/geniezone/vcpu.c | 80 ++++++++
arch/arm64/geniezone/vm.c | 12 ++
drivers/virt/geniezone/Makefile | 2 +-
drivers/virt/geniezone/gzvm_vcpu.c | 251 ++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_vm.c | 5 +
include/linux/soc/mediatek/gzvm_drv.h | 24 +++
include/uapi/linux/gzvm.h | 163 +++++++++++++++
9 files changed, 555 insertions(+), 2 deletions(-)
create mode 100644 arch/arm64/geniezone/vcpu.c
create mode 100644 drivers/virt/geniezone/gzvm_vcpu.c
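
Note on two encodings introduced by this patch: the register size is carried
in bits 55:52 of the register id as a log2 value, and the hypercalls address a
vcpu through a 32-bit tuple with the VM id in the upper 16 bits. The
standalone sketch below works both through with plain shifts. The DEMO_* and
demo_* names are hypothetical mirrors of GZVM_REG_SIZE_* and
assemble_vm_vcpu_tuple(), written without FIELD_PREP()/GENMASK_ULL() so that
it builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_REG_SIZE_SHIFT	52
#define DEMO_REG_SIZE_MASK	(UINT64_C(0x00f0) << 48)
#define DEMO_REG_SIZE_U8	(UINT64_C(0x0000) << 48)
#define DEMO_REG_SIZE_U64	(UINT64_C(0x0030) << 48)
#define DEMO_REG_SIZE_U128	(UINT64_C(0x0040) << 48)

/* The size field is log2 of the byte count: U64 encodes 3, so BIT(3) = 8. */
static_assert((DEMO_REG_SIZE_U64 >> DEMO_REG_SIZE_SHIFT) == 3,
	      "U64 must encode log2(8)");

/* Register size in bytes, as computed in gzvm_vcpu_update_one_reg(). */
static uint64_t demo_reg_size_bytes(uint64_t reg_id)
{
	return UINT64_C(1) << ((reg_id & DEMO_REG_SIZE_MASK) >>
			       DEMO_REG_SIZE_SHIFT);
}

/* The driver only accepts 1-, 2-, 4- and 8-byte registers. */
static int demo_reg_size_ok(uint64_t reg_id)
{
	uint64_t size = demo_reg_size_bytes(reg_id);

	return size == 1 || size == 2 || size == 4 || size == 8;
}

/* VM id in bits 31:16, vcpu id in bits 15:0. */
static uint32_t demo_vm_vcpu_tuple(uint16_t vmid, uint16_t vcpuid)
{
	return ((uint32_t)vmid << 16) | vcpuid;
}
```

For instance, an id built with the U64 size decodes to 8 bytes and passes the
check, an id built with the U128 size decodes to 16 bytes and would be
rejected with -EINVAL, and VM 15 with vcpu 2 gives the tuple 0x000f0002.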

diff --git a/arch/arm64/geniezone/Makefile b/arch/arm64/geniezone/Makefile
index 2957898cdd05..69b0a4abeab0 100644
--- a/arch/arm64/geniezone/Makefile
+++ b/arch/arm64/geniezone/Makefile
@@ -4,6 +4,6 @@
#
include $(srctree)/drivers/virt/geniezone/Makefile

-gzvm-y += vm.o
+gzvm-y += vm.o vcpu.o

obj-$(CONFIG_MTK_GZVM) += gzvm.o
diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index e500dbe7f943..3ec7bea5651f 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -11,9 +11,15 @@
enum {
GZVM_FUNC_CREATE_VM = 0,
GZVM_FUNC_DESTROY_VM = 1,
+ GZVM_FUNC_CREATE_VCPU = 2,
+ GZVM_FUNC_DESTROY_VCPU = 3,
GZVM_FUNC_SET_MEMREGION = 4,
+ GZVM_FUNC_RUN = 5,
+ GZVM_FUNC_GET_ONE_REG = 8,
+ GZVM_FUNC_SET_ONE_REG = 9,
GZVM_FUNC_PROBE = 12,
GZVM_FUNC_ENABLE_CAP = 13,
+ GZVM_FUNC_INFORM_EXIT = 14,
NR_GZVM_FUNC,
};

@@ -25,9 +31,15 @@ enum {

#define MT_HVC_GZVM_CREATE_VM GZVM_HCALL_ID(GZVM_FUNC_CREATE_VM)
#define MT_HVC_GZVM_DESTROY_VM GZVM_HCALL_ID(GZVM_FUNC_DESTROY_VM)
+#define MT_HVC_GZVM_CREATE_VCPU GZVM_HCALL_ID(GZVM_FUNC_CREATE_VCPU)
+#define MT_HVC_GZVM_DESTROY_VCPU GZVM_HCALL_ID(GZVM_FUNC_DESTROY_VCPU)
#define MT_HVC_GZVM_SET_MEMREGION GZVM_HCALL_ID(GZVM_FUNC_SET_MEMREGION)
+#define MT_HVC_GZVM_RUN GZVM_HCALL_ID(GZVM_FUNC_RUN)
+#define MT_HVC_GZVM_GET_ONE_REG GZVM_HCALL_ID(GZVM_FUNC_GET_ONE_REG)
+#define MT_HVC_GZVM_SET_ONE_REG GZVM_HCALL_ID(GZVM_FUNC_SET_ONE_REG)
#define MT_HVC_GZVM_PROBE GZVM_HCALL_ID(GZVM_FUNC_PROBE)
#define MT_HVC_GZVM_ENABLE_CAP GZVM_HCALL_ID(GZVM_FUNC_ENABLE_CAP)
+#define MT_HVC_GZVM_INFORM_EXIT GZVM_HCALL_ID(GZVM_FUNC_INFORM_EXIT)

/**
* gzvm_hypcall_wrapper() - the wrapper for hvc calls
@@ -49,4 +61,10 @@ int gzvm_hypcall_wrapper(unsigned long a0, unsigned long a1,
unsigned long a6, unsigned long a7,
struct arm_smccc_res *res);

+static inline unsigned int
+assemble_vm_vcpu_tuple(u16 vmid, u16 vcpuid)
+{
+ return ((unsigned int)vmid << 16 | vcpuid);
+}
+
#endif /* __GZVM_ARCH_COMMON_H__ */
diff --git a/arch/arm64/geniezone/vcpu.c b/arch/arm64/geniezone/vcpu.c
new file mode 100644
index 000000000000..e12ea9cb4941
--- /dev/null
+++ b/arch/arm64/geniezone/vcpu.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/err.h>
+#include <linux/uaccess.h>
+
+#include <linux/gzvm.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+#include "gzvm_arch_common.h"
+
+int gzvm_arch_vcpu_update_one_reg(struct gzvm_vcpu *vcpu, __u64 reg_id,
+ bool is_write, __u64 *data)
+{
+ struct arm_smccc_res res;
+ unsigned long a1;
+ int ret;
+
+ a1 = assemble_vm_vcpu_tuple(vcpu->gzvm->vm_id, vcpu->vcpuid);
+ if (!is_write) {
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_GET_ONE_REG,
+ a1, reg_id, 0, 0, 0, 0, 0, &res);
+ if (ret == 0)
+ *data = res.a1;
+ } else {
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_SET_ONE_REG,
+ a1, reg_id, *data, 0, 0, 0, 0, &res);
+ }
+
+ return ret;
+}
+
+int gzvm_arch_vcpu_run(struct gzvm_vcpu *vcpu, __u64 *exit_reason)
+{
+ struct arm_smccc_res res;
+ unsigned long a1;
+ int ret;
+
+ a1 = assemble_vm_vcpu_tuple(vcpu->gzvm->vm_id, vcpu->vcpuid);
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_RUN, a1, 0, 0, 0, 0, 0,
+ 0, &res);
+ *exit_reason = res.a1;
+ return ret;
+}
+
+int gzvm_arch_destroy_vcpu(u16 vm_id, int vcpuid)
+{
+ struct arm_smccc_res res;
+ unsigned long a1;
+
+ a1 = assemble_vm_vcpu_tuple(vm_id, vcpuid);
+ gzvm_hypcall_wrapper(MT_HVC_GZVM_DESTROY_VCPU, a1, 0, 0, 0, 0, 0, 0,
+ &res);
+
+ return 0;
+}
+
+/**
+ * gzvm_arch_create_vcpu() - Call smc to gz hypervisor to create vcpu
+ * @vm_id: vm id
+ * @vcpuid: vcpu id
+ * @run: Virtual address of vcpu->run
+ *
+ * Return: The wrapper helps caller to convert geniezone errno to Linux errno.
+ */
+int gzvm_arch_create_vcpu(u16 vm_id, int vcpuid, void *run)
+{
+ struct arm_smccc_res res;
+ unsigned long a1, a2;
+ int ret;
+
+ a1 = assemble_vm_vcpu_tuple(vm_id, vcpuid);
+ a2 = (__u64)virt_to_phys(run);
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_CREATE_VCPU, a1, a2, 0, 0, 0, 0,
+ 0, &res);
+
+ return ret;
+}
diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index 642efa596112..84d763032f60 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -53,6 +53,18 @@ int gzvm_hypcall_wrapper(unsigned long a0, unsigned long a1,
return gzvm_err_to_errno(res->a0);
}

+int gzvm_arch_inform_exit(u16 vm_id)
+{
+ struct arm_smccc_res res;
+ int ret;
+
+ ret = gzvm_hypcall_wrapper(MT_HVC_GZVM_INFORM_EXIT, vm_id, 0, 0, 0, 0, 0, 0, &res);
+ if (ret)
+ return -ENXIO;
+
+ return 0;
+}
+
int gzvm_arch_probe(void)
{
struct arm_smccc_res res;
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index 59fc4510a843..a630b919cda5 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -7,4 +7,4 @@
GZVM_DIR ?= ../../../drivers/virt/geniezone

gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
- $(GZVM_DIR)/gzvm_mmu.o
+ $(GZVM_DIR)/gzvm_mmu.o $(GZVM_DIR)/gzvm_vcpu.o
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
new file mode 100644
index 000000000000..55668341d455
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -0,0 +1,251 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <asm/sysreg.h>
+#include <linux/anon_inodes.h>
+#include <linux/device.h>
+#include <linux/file.h>
+#include <linux/mm.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+
+/* maximum size needed for holding an integer */
+#define ITOA_MAX_LEN 12
+
+static long gzvm_vcpu_update_one_reg(struct gzvm_vcpu *vcpu,
+ void __user *argp,
+ bool is_write)
+{
+ struct gzvm_one_reg reg;
+ void __user *reg_addr;
+ u64 data = 0;
+ u64 reg_size;
+ long ret;
+
+ if (copy_from_user(&reg, argp, sizeof(reg)))
+ return -EFAULT;
+
+ reg_addr = (void __user *)reg.addr;
+ reg_size = (reg.id & GZVM_REG_SIZE_MASK) >> GZVM_REG_SIZE_SHIFT;
+ reg_size = BIT(reg_size);
+
+ if (reg_size != 1 && reg_size != 2 && reg_size != 4 && reg_size != 8)
+ return -EINVAL;
+
+ if (is_write) {
+ /* GZ hypervisor would filter out invalid vcpu register access */
+ if (copy_from_user(&data, reg_addr, reg_size))
+ return -EFAULT;
+ } else {
+ return -EOPNOTSUPP;
+ }
+
+ ret = gzvm_arch_vcpu_update_one_reg(vcpu, reg.id, is_write, &data);
+
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
+/**
+ * gzvm_vcpu_run() - Handle vcpu run ioctl, entry point to guest and exit
+ * point from guest
+ * @vcpu: Pointer to struct gzvm_vcpu
+ * @argp: Pointer to struct gzvm_vcpu_run in userspace
+ *
+ * Return:
+ * * 0 - Success.
+ * * Negative - Failure.
+ */
+static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)
+{
+ bool need_userspace = false;
+ u64 exit_reason = 0;
+
+ if (copy_from_user(vcpu->run, argp, sizeof(struct gzvm_vcpu_run)))
+ return -EFAULT;
+
+ for (int i = 0; i < ARRAY_SIZE(vcpu->run->padding1); i++) {
+ if (vcpu->run->padding1[i])
+ return -EINVAL;
+ }
+
+ if (vcpu->run->immediate_exit == 1)
+ return -EINTR;
+
+ while (!need_userspace && !signal_pending(current)) {
+ gzvm_arch_vcpu_run(vcpu, &exit_reason);
+
+ switch (exit_reason) {
+ case GZVM_EXIT_MMIO:
+ need_userspace = true;
+ break;
+ /**
+ * it's geniezone's responsibility to fill corresponding data
+ * structure
+ */
+ case GZVM_EXIT_HYPERCALL:
+ fallthrough;
+ case GZVM_EXIT_EXCEPTION:
+ fallthrough;
+ case GZVM_EXIT_DEBUG:
+ fallthrough;
+ case GZVM_EXIT_FAIL_ENTRY:
+ fallthrough;
+ case GZVM_EXIT_INTERNAL_ERROR:
+ fallthrough;
+ case GZVM_EXIT_SYSTEM_EVENT:
+ fallthrough;
+ case GZVM_EXIT_SHUTDOWN:
+ need_userspace = true;
+ break;
+ case GZVM_EXIT_IRQ:
+ fallthrough;
+ case GZVM_EXIT_GZ:
+ break;
+ case GZVM_EXIT_UNKNOWN:
+ fallthrough;
+ default:
+ pr_err("vcpu unknown exit\n");
+ need_userspace = true;
+ goto out;
+ }
+ }
+
+out:
+ if (copy_to_user(argp, vcpu->run, sizeof(struct gzvm_vcpu_run)))
+ return -EFAULT;
+ if (signal_pending(current)) {
+ // invoke hvc to inform gz to map memory
+ gzvm_arch_inform_exit(vcpu->gzvm->vm_id);
+ return -ERESTARTSYS;
+ }
+ return 0;
+}
+
+static long gzvm_vcpu_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ int ret = -ENOTTY;
+ void __user *argp = (void __user *)arg;
+ struct gzvm_vcpu *vcpu = filp->private_data;
+
+ switch (ioctl) {
+ case GZVM_RUN:
+ ret = gzvm_vcpu_run(vcpu, argp);
+ break;
+ case GZVM_GET_ONE_REG:
+ /* !is_write */
+ ret = -EOPNOTSUPP;
+ break;
+ case GZVM_SET_ONE_REG:
+ /* is_write */
+ ret = gzvm_vcpu_update_one_reg(vcpu, argp, true);
+ break;
+ default:
+ break;
+ }
+
+ return ret;
+}
+
+static const struct file_operations gzvm_vcpu_fops = {
+ .unlocked_ioctl = gzvm_vcpu_ioctl,
+ .llseek = noop_llseek,
+};
+
+/* caller must hold the vm lock */
+static void gzvm_destroy_vcpu(struct gzvm_vcpu *vcpu)
+{
+ if (!vcpu)
+ return;
+
+ gzvm_arch_destroy_vcpu(vcpu->gzvm->vm_id, vcpu->vcpuid);
+ /* clean guest's data */
+ memset(vcpu->run, 0, GZVM_VCPU_RUN_MAP_SIZE);
+ free_pages_exact(vcpu->run, GZVM_VCPU_RUN_MAP_SIZE);
+ kfree(vcpu);
+}
+
+/**
+ * gzvm_destroy_vcpus() - Destroy all vcpus, caller has to hold the vm lock
+ *
+ * @gzvm: vm struct that owns the vcpus
+ */
+void gzvm_destroy_vcpus(struct gzvm *gzvm)
+{
+ int i;
+
+ for (i = 0; i < GZVM_MAX_VCPUS; i++) {
+ gzvm_destroy_vcpu(gzvm->vcpus[i]);
+ gzvm->vcpus[i] = NULL;
+ }
+}
+
+/* create_vcpu_fd() - Allocates an inode for the vcpu. */
+static int create_vcpu_fd(struct gzvm_vcpu *vcpu)
+{
+ /* sizeof("gzvm-vcpu:") + max(strlen(itoa(vcpuid))) + null */
+ char name[10 + ITOA_MAX_LEN + 1];
+
+ snprintf(name, sizeof(name), "gzvm-vcpu:%d", vcpu->vcpuid);
+ return anon_inode_getfd(name, &gzvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
+}
+
+/**
+ * gzvm_vm_ioctl_create_vcpu() - for GZVM_CREATE_VCPU
+ * @gzvm: Pointer to struct gzvm
+ * @cpuid: equals arg
+ *
+ * Return: Fd of vcpu, negative errno if error occurs
+ */
+int gzvm_vm_ioctl_create_vcpu(struct gzvm *gzvm, u32 cpuid)
+{
+ struct gzvm_vcpu *vcpu;
+ int ret;
+
+ if (cpuid >= GZVM_MAX_VCPUS)
+ return -EINVAL;
+
+ vcpu = kzalloc(sizeof(*vcpu), GFP_KERNEL);
+ if (!vcpu)
+ return -ENOMEM;
+
+ /**
+ * Allocate 2 pages for data sharing between driver and gz hypervisor
+ *
+ * |- page 0 -|- page 1 -|
+ * |gzvm_vcpu_run|......|hwstate|.......|
+ *
+ */
+ vcpu->run = alloc_pages_exact(GZVM_VCPU_RUN_MAP_SIZE,
+ GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+ if (!vcpu->run) {
+ ret = -ENOMEM;
+ goto free_vcpu;
+ }
+ vcpu->vcpuid = cpuid;
+ vcpu->gzvm = gzvm;
+ mutex_init(&vcpu->lock);
+
+ ret = gzvm_arch_create_vcpu(gzvm->vm_id, vcpu->vcpuid, vcpu->run);
+ if (ret < 0)
+ goto free_vcpu_run;
+
+ ret = create_vcpu_fd(vcpu);
+ if (ret < 0)
+ goto free_vcpu_run;
+ gzvm->vcpus[cpuid] = vcpu;
+
+ return ret;
+
+free_vcpu_run:
+ free_pages_exact(vcpu->run, GZVM_VCPU_RUN_MAP_SIZE);
+free_vcpu:
+ kfree(vcpu);
+ return ret;
+}
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 1b02f1676d7b..b29273b9c057 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -123,6 +123,10 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
ret = gzvm_dev_ioctl_check_extension(gzvm, arg);
break;
}
+ case GZVM_CREATE_VCPU: {
+ ret = gzvm_vm_ioctl_create_vcpu(gzvm, arg);
+ break;
+ }
case GZVM_SET_USER_MEMORY_REGION: {
struct gzvm_userspace_memory_region userspace_mem;

@@ -155,6 +159,7 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)

mutex_lock(&gzvm->lock);

+ gzvm_destroy_vcpus(gzvm);
gzvm_arch_destroy_vm(gzvm->vm_id);

mutex_lock(&gzvm_list_lock);
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 18a3e19347ce..853e99c54ae5 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -17,6 +17,7 @@
*/
#define GZVM_PA_ERR_BAD (0x7ffULL << 52)

+#define GZVM_VCPU_MMAP_SIZE PAGE_SIZE
#define INVALID_VM_ID 0xffff

/*
@@ -34,8 +35,11 @@
* The following data structures are for data transferring between driver and
* hypervisor, and they're aligned with hypervisor definitions
*/
+#define GZVM_MAX_VCPUS 8
#define GZVM_MAX_MEM_REGION 10

+#define GZVM_VCPU_RUN_MAP_SIZE (PAGE_SIZE * 2)
+
/**
* struct mem_region_addr_range: identical to ffa memory constituent
* @address: the base IPA of the constituent memory region, aligned to 4 kiB
@@ -75,9 +79,18 @@ struct gzvm_memslot {
u32 slot_id;
};

+struct gzvm_vcpu {
+ struct gzvm *gzvm;
+ int vcpuid;
+ /* lock of vcpu*/
+ struct mutex lock;
+ struct gzvm_vcpu_run *run;
+};
+
/**
* struct gzvm: the following data structures are for data transferring between
* driver and hypervisor, and they're aligned with hypervisor definitions.
+ * @vcpus: VM's cpu descriptors
* @mm: userspace tied to this vm
* @memslot: VM's memory slot descriptor
* @lock: lock for list_add
@@ -85,6 +98,7 @@ struct gzvm_memslot {
* @vm_id: vm id
*/
struct gzvm {
+ struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
struct mm_struct *mm;
struct gzvm_memslot memslot[GZVM_MAX_MEM_REGION];
struct mutex lock;
@@ -99,6 +113,8 @@ int gzvm_err_to_errno(unsigned long err);

void gzvm_destroy_all_vms(void);

+void gzvm_destroy_vcpus(struct gzvm *gzvm);
+
/* arch-dependant functions */
int gzvm_arch_probe(void);
int gzvm_arch_set_memregion(u16 vm_id, size_t buf_size,
@@ -119,4 +135,12 @@ int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id);
int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn);

+int gzvm_vm_ioctl_create_vcpu(struct gzvm *gzvm, u32 cpuid);
+int gzvm_arch_vcpu_update_one_reg(struct gzvm_vcpu *vcpu, __u64 reg_id,
+ bool is_write, __u64 *data);
+int gzvm_arch_create_vcpu(u16 vm_id, int vcpuid, void *run);
+int gzvm_arch_vcpu_run(struct gzvm_vcpu *vcpu, __u64 *exit_reason);
+int gzvm_arch_destroy_vcpu(u16 vm_id, int vcpuid);
+int gzvm_arch_inform_exit(u16 vm_id);
+
#endif /* __GZVM_DRV_H__ */
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index a79e787c9181..1146467487ca 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -25,6 +25,30 @@
/* GZVM_CAP_PVM_SET_PROTECTED_VM only sets protected but not load pvmfw */
#define GZVM_CAP_PVM_SET_PROTECTED_VM 2

+/*
+ * Architecture specific registers are to be defined and ORed with
+ * the arch identifier.
+ */
+#define GZVM_REG_ARCH_ARM64 FIELD_PREP(GENMASK_ULL(63, 56), 0x60)
+#define GZVM_REG_ARCH_MASK FIELD_PREP(GENMASK_ULL(63, 56), 0xff)
+/*
+ * Reg size = BIT((reg.id & GZVM_REG_SIZE_MASK) >> GZVM_REG_SIZE_SHIFT) bytes
+ */
+#define GZVM_REG_SIZE_SHIFT 52
+#define GZVM_REG_SIZE_MASK FIELD_PREP(GENMASK_ULL(63, 48), 0x00f0)
+
+#define GZVM_REG_SIZE_U8 FIELD_PREP(GENMASK_ULL(63, 48), 0x0000)
+#define GZVM_REG_SIZE_U16 FIELD_PREP(GENMASK_ULL(63, 48), 0x0010)
+#define GZVM_REG_SIZE_U32 FIELD_PREP(GENMASK_ULL(63, 48), 0x0020)
+#define GZVM_REG_SIZE_U64 FIELD_PREP(GENMASK_ULL(63, 48), 0x0030)
+#define GZVM_REG_SIZE_U128 FIELD_PREP(GENMASK_ULL(63, 48), 0x0040)
+#define GZVM_REG_SIZE_U256 FIELD_PREP(GENMASK_ULL(63, 48), 0x0050)
+#define GZVM_REG_SIZE_U512 FIELD_PREP(GENMASK_ULL(63, 48), 0x0060)
+#define GZVM_REG_SIZE_U1024 FIELD_PREP(GENMASK_ULL(63, 48), 0x0070)
+#define GZVM_REG_SIZE_U2048 FIELD_PREP(GENMASK_ULL(63, 48), 0x0080)
+
+#define GZVM_REG_TYPE_GENERAL2 FIELD_PREP(GENMASK(23, 16), 0x10)
+
/* GZVM ioctls */
#define GZVM_IOC_MAGIC 0x92 /* gz */

@@ -51,6 +75,11 @@ struct gzvm_memory_region {

#define GZVM_SET_MEMORY_REGION _IOW(GZVM_IOC_MAGIC, 0x40, \
struct gzvm_memory_region)
+/*
+ * GZVM_CREATE_VCPU receives as a parameter the vcpu slot,
+ * and returns a vcpu fd.
+ */
+#define GZVM_CREATE_VCPU _IO(GZVM_IOC_MAGIC, 0x41)

/**
* struct gzvm_userspace_memory_region: gzvm userspace memory region descriptor
@@ -71,6 +100,127 @@ struct gzvm_userspace_memory_region {
#define GZVM_SET_USER_MEMORY_REGION _IOW(GZVM_IOC_MAGIC, 0x46, \
struct gzvm_userspace_memory_region)

+/*
+ * ioctls for vcpu fds
+ */
+#define GZVM_RUN _IO(GZVM_IOC_MAGIC, 0x80)
+
+/* VM exit reason */
+enum {
+ GZVM_EXIT_UNKNOWN = 0x92920000,
+ GZVM_EXIT_MMIO = 0x92920001,
+ GZVM_EXIT_HYPERCALL = 0x92920002,
+ GZVM_EXIT_IRQ = 0x92920003,
+ GZVM_EXIT_EXCEPTION = 0x92920004,
+ GZVM_EXIT_DEBUG = 0x92920005,
+ GZVM_EXIT_FAIL_ENTRY = 0x92920006,
+ GZVM_EXIT_INTERNAL_ERROR = 0x92920007,
+ GZVM_EXIT_SYSTEM_EVENT = 0x92920008,
+ GZVM_EXIT_SHUTDOWN = 0x92920009,
+ GZVM_EXIT_GZ = 0x9292000a,
+};
+
+/**
+ * struct gzvm_vcpu_run: Same purpose as kvm_run, this struct is
+ * shared between userspace, kernel and
+ * GenieZone hypervisor
+ * @exit_reason: The reason why gzvm_vcpu_run has stopped running the vCPU
+ * @immediate_exit: Polled when the vcpu is scheduled.
+ * If set, immediately returns -EINTR
+ * @padding1: Reserved for future use; must be zero-filled
+ * @mmio: The nested struct in anonymous union. Handle mmio in host side
+ * @fail_entry: The nested struct in anonymous union.
+ * Handle invalid entry address at the first run
+ * @exception: The nested struct in anonymous union.
+ * Handle exception occurred in VM
+ * @hypercall: The nested struct in anonymous union.
+ * Some hypercalls issued from VM must be handled
+ * @internal: The nested struct in anonymous union. The errors from hypervisor
+ * @system_event: The nested struct in anonymous union.
+ * VM's PSCI must be handled by host
+ * @padding: Fixed to a reasonable size so that the struct size stays the same
+ * when new members are added to the union
+ *
+ * Keep identical layout between the 3 modules
+ */
+struct gzvm_vcpu_run {
+ /* to userspace */
+ __u32 exit_reason;
+ __u8 immediate_exit;
+ __u8 padding1[3];
+ /* union structure of collection of guest exit reason */
+ union {
+ /* GZVM_EXIT_MMIO */
+ struct {
+ /* From FAR_EL2 */
+ /* The address guest tries to access */
+ __u64 phys_addr;
+ /* The value to be written (is_write is 1) or
+ * be filled by user for reads (is_write is 0)
+ */
+ __u8 data[8];
+ /* From ESR_EL2 */
+ /* The size of written data.
+ * Only the first `size` bytes of `data` are handled
+ */
+ __u64 size;
+ /* From ESR_EL2 */
+ /* The register number where the data is stored */
+ __u32 reg_nr;
+ /* From ESR_EL2 */
+ /* 1 for VM to perform a write or 0 for VM to perform a read */
+ __u8 is_write;
+ } mmio;
+ /* GZVM_EXIT_FAIL_ENTRY */
+ struct {
+ /* The reason codes about hardware entry failure */
+ __u64 hardware_entry_failure_reason;
+ /* The current processor number via smp_processor_id() */
+ __u32 cpu;
+ } fail_entry;
+ /* GZVM_EXIT_EXCEPTION */
+ struct {
+ /* Which exception vector */
+ __u32 exception;
+ /* Exception error codes */
+ __u32 error_code;
+ } exception;
+ /* GZVM_EXIT_HYPERCALL */
+ struct {
+ /* The hypercall's arguments */
+ __u64 args[8]; /* in-out */
+ } hypercall;
+ /* GZVM_EXIT_INTERNAL_ERROR */
+ struct {
+ /* The error codes for GZVM_EXIT_INTERNAL_ERROR */
+ __u32 suberror;
+ /* The number of elements used in data[] */
+ __u32 ndata;
+ /* Keep the detailed information about GZVM_EXIT_INTERNAL_ERROR */
+ __u64 data[16];
+ } internal;
+ /* GZVM_EXIT_SYSTEM_EVENT */
+ struct {
+#define GZVM_SYSTEM_EVENT_SHUTDOWN 1
+#define GZVM_SYSTEM_EVENT_RESET 2
+#define GZVM_SYSTEM_EVENT_CRASH 3
+#define GZVM_SYSTEM_EVENT_WAKEUP 4
+#define GZVM_SYSTEM_EVENT_SUSPEND 5
+#define GZVM_SYSTEM_EVENT_SEV_TERM 6
+#define GZVM_SYSTEM_EVENT_S2IDLE 7
+ /* System event type.
+ * Ex. GZVM_SYSTEM_EVENT_SHUTDOWN or GZVM_SYSTEM_EVENT_RESET...etc.
+ */
+ __u32 type;
+ /* The number of elements used in data[] */
+ __u32 ndata;
+ /* Keep the detailed information about GZVM_EXIT_SYSTEM_EVENT */
+ __u64 data[16];
+ } system_event;
+ char padding[256];
+ };
+};
+
/**
* struct gzvm_enable_cap: The `capability support` on GenieZone hypervisor
* @cap: `GZVM_CAP_ARM_PROTECTED_VM` or `GZVM_CAP_ARM_VM_IPA_SIZE`
@@ -84,4 +234,17 @@ struct gzvm_enable_cap {
#define GZVM_ENABLE_CAP _IOW(GZVM_IOC_MAGIC, 0xa3, \
struct gzvm_enable_cap)

+/* for GZVM_GET/SET_ONE_REG */
+struct gzvm_one_reg {
+ __u64 id;
+ __u64 addr;
+};
+
+#define GZVM_GET_ONE_REG _IOW(GZVM_IOC_MAGIC, 0xab, \
+ struct gzvm_one_reg)
+#define GZVM_SET_ONE_REG _IOW(GZVM_IOC_MAGIC, 0xac, \
+ struct gzvm_one_reg)
+
+#define GZVM_REG_GENERIC 0x0000000000000000ULL
+
#endif /* __GZVM_H__ */
--
2.18.0
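
Note (illustration only, not part of the patch): the register size encoding
documented in include/uapi/linux/gzvm.h above can be sanity-checked in
userspace. This is a minimal sketch that spells the masks out as plain
constants, assuming FIELD_PREP(GENMASK_ULL(63, 48), v) expands to v << 48 as
it does in the kernel. The names REG_SIZE_* and reg_size_bytes() are
hypothetical and do not exist in the series.

```c
#include <stdint.h>

/* Plain-constant equivalents of the FIELD_PREP(GENMASK_ULL(63, 48), ...) macros */
#define REG_SIZE_SHIFT	52
#define REG_SIZE_MASK	(0x00f0ULL << 48)
#define REG_SIZE_U32	(0x0020ULL << 48)
#define REG_SIZE_U64	(0x0030ULL << 48)
#define REG_SIZE_U2048	(0x0080ULL << 48)

/* Hypothetical helper: size in bytes = BIT(size field of the register id) */
static inline uint64_t reg_size_bytes(uint64_t reg_id)
{
	return 1ULL << ((reg_id & REG_SIZE_MASK) >> REG_SIZE_SHIFT);
}
```

With these constants, a U64 register id decodes to 8 bytes and a U2048 id to
256 bytes; bits outside the size field do not affect the result.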


2024-04-12 07:14:38

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 06/21] virt: geniezone: Add set_user_memory_region for vm

From: "Yingshiuan Pan" <[email protected]>

VMs are forbidden from using physical memory directly; for security
reasons, memory access is governed by the privilege models that the
GenieZone hypervisor manages. With the help of gzvm-ko, the hypervisor
can manipulate memory as objects. Memory management is tightly
integrated with the ARM 2-stage translation tables, which convert VA to
IPA to PA under the security measures required by protected VMs.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
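Note (illustration only, not part of the patch): the gfn-to-hva translation
that a memslot provides can be sketched in userspace as below. The names
memslot_sketch, gfn_to_hva() and SKETCH_PAGE_SHIFT are hypothetical, and a
4 KiB page size is assumed. Unlike gzvm_gfn_to_hva_memslot() in this patch,
the sketch also rejects a gfn beyond the end of the slot; that upper-bound
check is an addition for illustration, not something the patch does.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical, trimmed-down stand-in for struct gzvm_memslot */
struct memslot_sketch {
	uint64_t base_gfn;		/* first guest frame number of the slot */
	uint64_t npages;		/* number of pages the slot covers */
	uint64_t userspace_addr;	/* VMM virtual address backing the slot */
};

/* Translate a guest frame number to the host virtual address backing it */
static int gfn_to_hva(const struct memslot_sketch *slot, uint64_t gfn,
		      uint64_t *hva)
{
	if (gfn < slot->base_gfn || gfn - slot->base_gfn >= slot->npages)
		return -EINVAL;

	*hva = slot->userspace_addr +
	       ((gfn - slot->base_gfn) << SKETCH_PAGE_SHIFT);
	return 0;
}
```

For a slot based at gfn 0x80000, gfn 0x80002 maps to userspace_addr + 0x2000.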
arch/arm64/geniezone/gzvm_arch_common.h | 2 +
arch/arm64/geniezone/vm.c | 9 ++
drivers/virt/geniezone/gzvm_vm.c | 114 ++++++++++++++++++++++++
include/linux/soc/mediatek/gzvm_drv.h | 59 ++++++++++++
include/uapi/linux/gzvm.h | 31 +++++++
5 files changed, 215 insertions(+)

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index 60ee5ed2b39f..4250c0f567e7 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -11,6 +11,7 @@
enum {
GZVM_FUNC_CREATE_VM = 0,
GZVM_FUNC_DESTROY_VM = 1,
+ GZVM_FUNC_SET_MEMREGION = 4,
GZVM_FUNC_PROBE = 12,
NR_GZVM_FUNC,
};
@@ -23,6 +24,7 @@ enum {

#define MT_HVC_GZVM_CREATE_VM GZVM_HCALL_ID(GZVM_FUNC_CREATE_VM)
#define MT_HVC_GZVM_DESTROY_VM GZVM_HCALL_ID(GZVM_FUNC_DESTROY_VM)
+#define MT_HVC_GZVM_SET_MEMREGION GZVM_HCALL_ID(GZVM_FUNC_SET_MEMREGION)
#define MT_HVC_GZVM_PROBE GZVM_HCALL_ID(GZVM_FUNC_PROBE)

/**
diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index 8ee5490d604a..d4f0aa81d224 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -63,6 +63,15 @@ int gzvm_arch_probe(void)
return 0;
}

+int gzvm_arch_set_memregion(u16 vm_id, size_t buf_size,
+ phys_addr_t region)
+{
+ struct arm_smccc_res res;
+
+ return gzvm_hypcall_wrapper(MT_HVC_GZVM_SET_MEMREGION, vm_id,
+ buf_size, region, 0, 0, 0, 0, &res);
+}
+
/**
* gzvm_arch_create_vm() - create vm
* @vm_type: VM type. Only supports Linux VM now.
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 76722dba6b1f..fed426e7d375 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -15,6 +15,119 @@
static DEFINE_MUTEX(gzvm_list_lock);
static LIST_HEAD(gzvm_list);

+int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
+ u64 *hva_memslot)
+{
+ u64 offset;
+
+ if (gfn < memslot->base_gfn)
+ return -EINVAL;
+
+ offset = gfn - memslot->base_gfn;
+ *hva_memslot = memslot->userspace_addr + offset * PAGE_SIZE;
+ return 0;
+}
+
+/**
+ * register_memslot_addr_range() - Register memory region to GenieZone
+ * @gzvm: Pointer to struct gzvm
+ * @memslot: Pointer to struct gzvm_memslot
+ *
+ * Return: 0 for success, negative number for error
+ */
+static int
+register_memslot_addr_range(struct gzvm *gzvm, struct gzvm_memslot *memslot)
+{
+ struct gzvm_memory_region_ranges *region;
+ u32 buf_size = PAGE_SIZE * 2;
+ u64 gfn;
+
+ region = alloc_pages_exact(buf_size, GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->slot = memslot->slot_id;
+ region->total_pages = memslot->npages;
+ gfn = memslot->base_gfn;
+ region->gpa = PFN_PHYS(gfn);
+
+ if (gzvm_arch_set_memregion(gzvm->vm_id, buf_size,
+ virt_to_phys(region))) {
+ pr_err("Failed to register memregion to hypervisor\n");
+ free_pages_exact(region, buf_size);
+ return -EFAULT;
+ }
+
+ free_pages_exact(region, buf_size);
+ return 0;
+}
+
+/**
+ * gzvm_vm_ioctl_set_memory_region() - Set memory region of guest
+ * @gzvm: Pointer to struct gzvm.
+ * @mem: Input memory region from user.
+ *
+ * Return: 0 for success, negative number for error
+ *
+ * -ENXIO - The memslot is out-of-range
+ * -EFAULT - Cannot find corresponding vma
+ * -EINVAL - Region size and VMA size mismatch
+ */
+static int
+gzvm_vm_ioctl_set_memory_region(struct gzvm *gzvm,
+ struct gzvm_userspace_memory_region *mem)
+{
+ struct vm_area_struct *vma;
+ struct gzvm_memslot *memslot;
+ unsigned long size;
+
+ if (mem->slot >= GZVM_MAX_MEM_REGION)
+ return -ENXIO;
+
+ memslot = &gzvm->memslot[mem->slot];
+
+ vma = vma_lookup(gzvm->mm, mem->userspace_addr);
+ if (!vma)
+ return -EFAULT;
+
+ size = vma->vm_end - vma->vm_start;
+ if (size != mem->memory_size)
+ return -EINVAL;
+
+ memslot->base_gfn = __phys_to_pfn(mem->guest_phys_addr);
+ memslot->npages = size >> PAGE_SHIFT;
+ memslot->userspace_addr = mem->userspace_addr;
+ memslot->vma = vma;
+ memslot->flags = mem->flags;
+ memslot->slot_id = mem->slot;
+ return register_memslot_addr_range(gzvm, memslot);
+}
+
+/* gzvm_vm_ioctl() - Ioctl handler of VM FD */
+static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
+ unsigned long arg)
+{
+ long ret;
+ void __user *argp = (void __user *)arg;
+ struct gzvm *gzvm = filp->private_data;
+
+ switch (ioctl) {
+ case GZVM_SET_USER_MEMORY_REGION: {
+ struct gzvm_userspace_memory_region userspace_mem;
+
+ if (copy_from_user(&userspace_mem, argp, sizeof(userspace_mem)))
+ return -EFAULT;
+
+ ret = gzvm_vm_ioctl_set_memory_region(gzvm, &userspace_mem);
+ break;
+ }
+ default:
+ ret = -ENOTTY;
+ }
+out:
+ return ret;
+}
+
static void gzvm_destroy_vm(struct gzvm *gzvm)
{
pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);
@@ -42,6 +155,7 @@ static int gzvm_vm_release(struct inode *inode, struct file *filp)

static const struct file_operations gzvm_vm_fops = {
.release = gzvm_vm_release,
+ .unlocked_ioctl = gzvm_vm_ioctl,
.llseek = noop_llseek,
};

diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index e7c29c826a7c..e8dded3419d6 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -7,9 +7,16 @@
#define __GZVM_DRV_H__

#include <linux/list.h>
+#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/gzvm.h>

+/*
+ * For the normal physical address, the highest 12 bits should be zero, so we
+ * can mask bit 62 ~ bit 52 to indicate the error physical address
+ */
+#define GZVM_PA_ERR_BAD (0x7ffULL << 52)
+
#define INVALID_VM_ID 0xffff

/*
@@ -23,16 +30,63 @@
#define ERR_NOT_IMPLEMENTED (-27)
#define ERR_FAULT (-40)

+/*
+ * The following data structures are for data transferring between driver and
+ * hypervisor, and they're aligned with hypervisor definitions
+ */
+#define GZVM_MAX_MEM_REGION 10
+
+/**
+ * struct mem_region_addr_range: identical to ffa memory constituent
+ * @address: the base IPA of the constituent memory region, aligned to 4 kiB
+ * @pg_cnt: the number of 4 kiB pages in the constituent memory region
+ * @reserved: reserved for 64bit alignment
+ */
+struct mem_region_addr_range {
+ __u64 address;
+ __u32 pg_cnt;
+ __u32 reserved;
+};
+
+struct gzvm_memory_region_ranges {
+ __u32 slot;
+ __u32 constituent_cnt;
+ __u64 total_pages;
+ __u64 gpa;
+ struct mem_region_addr_range constituents[];
+};
+
+/**
+ * struct gzvm_memslot: VM's memory slot descriptor
+ * @base_gfn: begin of guest page frame
+ * @npages: number of pages this slot covers
+ * @userspace_addr: corresponding userspace va
+ * @vma: vma related to this userspace addr
+ * @flags: define the usage of memory region. Ex. guest memory or
+ * firmware protection
+ * @slot_id: the id is used to identify the memory slot
+ */
+struct gzvm_memslot {
+ u64 base_gfn;
+ unsigned long npages;
+ unsigned long userspace_addr;
+ struct vm_area_struct *vma;
+ u32 flags;
+ u32 slot_id;
+};
+
/**
* struct gzvm: the following data structures are for data transferring between
* driver and hypervisor, and they're aligned with hypervisor definitions.
* @mm: userspace tied to this vm
+ * @memslot: VM's memory slot descriptor
* @lock: lock for list_add
* @vm_list: list head for vm list
* @vm_id: vm id
*/
struct gzvm {
struct mm_struct *mm;
+ struct gzvm_memslot memslot[GZVM_MAX_MEM_REGION];
struct mutex lock;
struct list_head vm_list;
u16 vm_id;
@@ -46,7 +100,12 @@ void gzvm_destroy_all_vms(void);

/* arch-dependant functions */
int gzvm_arch_probe(void);
+int gzvm_arch_set_memregion(u16 vm_id, size_t buf_size,
+ phys_addr_t region);
int gzvm_arch_create_vm(unsigned long vm_type);
int gzvm_arch_destroy_vm(u16 vm_id);

+int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
+ u64 *hva_memslot);
+
#endif /* __GZVM_DRV_H__ */
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index c26c7720fab7..59c0f790b2e6 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -22,4 +22,35 @@
/* ioctls for /dev/gzvm fds */
#define GZVM_CREATE_VM _IO(GZVM_IOC_MAGIC, 0x01) /* Returns a Geniezone VM fd */

+/* ioctls for VM fds */
+/* for GZVM_SET_MEMORY_REGION */
+struct gzvm_memory_region {
+ __u32 slot;
+ __u32 flags;
+ __u64 guest_phys_addr;
+ __u64 memory_size; /* bytes */
+};
+
+#define GZVM_SET_MEMORY_REGION _IOW(GZVM_IOC_MAGIC, 0x40, \
+ struct gzvm_memory_region)
+
+/**
+ * struct gzvm_userspace_memory_region: gzvm userspace memory region descriptor
+ * @slot: memory slot
+ * @flags: describe the usage of userspace memory region
+ * @guest_phys_addr: guest vm's physical address
+ * @memory_size: memory size in bytes
+ * @userspace_addr: start of the userspace allocated memory
+ */
+struct gzvm_userspace_memory_region {
+ __u32 slot;
+ __u32 flags;
+ __u64 guest_phys_addr;
+ __u64 memory_size;
+ __u64 userspace_addr;
+};
+
+#define GZVM_SET_USER_MEMORY_REGION _IOW(GZVM_IOC_MAGIC, 0x46, \
+ struct gzvm_userspace_memory_region)
+
#endif /* __GZVM_H__ */
--
2.18.0


2024-04-12 07:14:48

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 17/21] virt: geniezone: Add memory pin/unpin support

From: "Jerry Wang" <[email protected]>

Protected VM's memory cannot be swapped out because the memory pages are
protected from host access.

If the host accesses those protected pages, a hardware exception is
triggered and may crash the host. So, we have to make those protected
pages ineligible for swapping or merging by the host kernel to avoid
host access. To do so, we pin a page when it is assigned (donated) to
the VM and unpin it when the VM relinquishes the page or is destroyed.
Besides, the hypervisor must clear a protected VM's memory before
returning it to the host, but the VMM may free that memory before it is
cleared, so the pages could be reclaimed and reused before they are
fully cleared. Pinning also avoids this problem.

The implementation is described as follows.
- Use rb_tree to store pinned memory pages.
- Pin the page when handling page fault.
- Unpin the pages when the VM relinquishes them or is destroyed.

Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
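Note (illustration only, not part of the patch): the bookkeeping described
above, including the case where two vCPUs fault on the same page and the
second insert returns -EEXIST, can be sketched in userspace as below. A plain
unbalanced binary search tree is swapped in for the kernel rb-tree, and an
integer counter stands in for pin_user_pages()/unpin_user_pages(). All names
(pinned_sketch, insert_pinned(), track_pin()) are hypothetical.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct gzvm_pinned_page, keyed by IPA */
struct pinned_sketch {
	uint64_t ipa;
	struct pinned_sketch *left, *right;
};

/* Insert keyed by IPA; returns -EEXIST if the IPA is already tracked */
static int insert_pinned(struct pinned_sketch **root, struct pinned_sketch *n)
{
	while (*root) {
		if (n->ipa == (*root)->ipa)
			return -EEXIST;
		root = n->ipa < (*root)->ipa ? &(*root)->left : &(*root)->right;
	}
	*root = n;
	return 0;
}

/* Track one pin; *pin_refs stands in for the number of pins held */
static int track_pin(struct pinned_sketch **root, uint64_t ipa, int *pin_refs)
{
	struct pinned_sketch *n = calloc(1, sizeof(*n));
	int ret;

	if (!n)
		return -ENOMEM;
	n->ipa = ipa;
	(*pin_refs)++;		/* stands in for pin_user_pages() */

	ret = insert_pinned(root, n);
	if (ret == -EEXIST) {
		/* Another caller already pinned this IPA: drop the extra pin */
		free(n);
		(*pin_refs)--;	/* stands in for unpin_user_pages() */
		ret = 0;
	}
	return ret;
}
```

Calling track_pin() twice for the same IPA succeeds both times and leaves a
single pin outstanding, which is the behavior the -EEXIST comment in
pin_one_page() describes.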
arch/arm64/geniezone/vm.c | 8 ++-
drivers/virt/geniezone/gzvm_mmu.c | 88 +++++++++++++++++++++++++--
drivers/virt/geniezone/gzvm_vm.c | 21 +++++++
include/linux/soc/mediatek/gzvm_drv.h | 15 ++++-
4 files changed, 122 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/geniezone/vm.c b/arch/arm64/geniezone/vm.c
index 4691a3ada678..eb28c3850b5d 100644
--- a/arch/arm64/geniezone/vm.c
+++ b/arch/arm64/geniezone/vm.c
@@ -211,12 +211,14 @@ static int gzvm_vm_ioctl_get_pvmfw_size(struct gzvm *gzvm,
* @gfn: Guest frame number.
* @total_pages: Total page numbers.
* @slot: Pointer to struct gzvm_memslot.
+ * @gzvm: Pointer to struct gzvm.
*
* Return: how many pages we've fill in, negative if error
*/
static int fill_constituents(struct mem_region_addr_range *consti,
int *consti_cnt, int max_nr_consti, u64 gfn,
- u32 total_pages, struct gzvm_memslot *slot)
+ u32 total_pages, struct gzvm_memslot *slot,
+ struct gzvm *gzvm)
{
u64 pfn = 0, prev_pfn = 0, gfn_end = 0;
int nr_pages = 0;
@@ -227,7 +229,7 @@ static int fill_constituents(struct mem_region_addr_range *consti,
gfn_end = gfn + total_pages;

while (i < max_nr_consti && gfn < gfn_end) {
- if (gzvm_vm_allocate_guest_page(slot, gfn, &pfn) != 0)
+ if (gzvm_vm_allocate_guest_page(gzvm, slot, gfn, &pfn) != 0)
return -EFAULT;
if (pfn == (prev_pfn + 1)) {
consti[i].pg_cnt++;
@@ -284,7 +286,7 @@ int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id)
nr_pages = fill_constituents(region->constituents,
&region->constituent_cnt,
max_nr_consti, gfn,
- remain_pages, memslot);
+ remain_pages, memslot, gzvm);

if (nr_pages < 0) {
pr_err("Failed to fill constituents\n");
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
index eff02a5e1b17..7bc96cba1ecb 100644
--- a/drivers/virt/geniezone/gzvm_mmu.c
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -108,11 +108,88 @@ int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn,
return 0;
}

-int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn)
+static int cmp_ppages(struct rb_node *node, const struct rb_node *parent)
{
+ struct gzvm_pinned_page *a = container_of(node,
+ struct gzvm_pinned_page,
+ node);
+ struct gzvm_pinned_page *b = container_of(parent,
+ struct gzvm_pinned_page,
+ node);
+
+ if (a->ipa < b->ipa)
+ return -1;
+ if (a->ipa > b->ipa)
+ return 1;
+ return 0;
+}
+
+/* Invoker of this function is responsible for locking */
+static int gzvm_insert_ppage(struct gzvm *vm, struct gzvm_pinned_page *ppage)
+{
+ if (rb_find_add(&ppage->node, &vm->pinned_pages, cmp_ppages))
+ return -EEXIST;
+ return 0;
+}
+
+static int pin_one_page(struct gzvm *vm, unsigned long hva, u64 gpa)
+{
+ unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
+ struct gzvm_pinned_page *ppage = NULL;
+ struct mm_struct *mm = current->mm;
+ struct page *page = NULL;
+ int ret;
+
+ ppage = kmalloc(sizeof(*ppage), GFP_KERNEL_ACCOUNT);
+ if (!ppage)
+ return -ENOMEM;
+
+ mmap_read_lock(mm);
+ pin_user_pages(hva, 1, flags, &page);
+ mmap_read_unlock(mm);
+
+ if (!page) {
+ kfree(ppage);
+ return -EFAULT;
+ }
+
+ ppage->page = page;
+ ppage->ipa = gpa;
+
+ mutex_lock(&vm->mem_lock);
+ ret = gzvm_insert_ppage(vm, ppage);
+
+ /**
+ * The return of -EEXIST from gzvm_insert_ppage is considered an
+ * expected behavior in this context.
+ * This situation arises when two or more VCPUs are concurrently
+ * engaged in demand paging handling. The initial VCPU has already
+ * allocated and pinned a page, while the subsequent VCPU attempts
+ * to pin the same page again. As a result, we prompt the unpinning
+ * and release of the allocated structure, followed by a return 0.
+ */
+ if (ret == -EEXIST) {
+ kfree(ppage);
+ unpin_user_pages(&page, 1);
+ ret = 0;
+ }
+ mutex_unlock(&vm->mem_lock);
+
+ return ret;
+}
+
+int gzvm_vm_allocate_guest_page(struct gzvm *vm, struct gzvm_memslot *slot,
+ u64 gfn, u64 *pfn)
+{
+ unsigned long hva;
+
if (gzvm_gfn_to_pfn_memslot(slot, gfn, pfn) != 0)
return -EFAULT;
- return 0;
+
+ if (gzvm_gfn_to_hva_memslot(slot, gfn, (u64 *)&hva) != 0)
+ return -EINVAL;
+
+ return pin_one_page(vm, hva, PFN_PHYS(gfn));
}

static int handle_block_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)
@@ -138,7 +215,7 @@ static int handle_block_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)

mutex_lock(&vm->demand_paging_lock);
for (i = 0, __gfn = start_gfn; i < nr_entries; i++, __gfn++) {
- ret = gzvm_vm_allocate_guest_page(memslot, __gfn, &pfn);
+ ret = gzvm_vm_allocate_guest_page(vm, memslot, __gfn, &pfn);
if (unlikely(ret)) {
ret = -ERR_FAULT;
goto err_unlock;
@@ -164,15 +241,14 @@ static int handle_single_demand_page(struct gzvm *vm, int memslot_id, u64 gfn)
int ret;
u64 pfn;

- ret = gzvm_vm_allocate_guest_page(&vm->memslot[memslot_id], gfn, &pfn);
+ ret = gzvm_vm_allocate_guest_page(vm, &vm->memslot[memslot_id], gfn, &pfn);
if (unlikely(ret))
return -EFAULT;

ret = gzvm_arch_map_guest(vm->vm_id, memslot_id, pfn, gfn, 1);
if (unlikely(ret))
return -EFAULT;
-
- return 0;
+ return ret;
}

/**
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index d698e4e86b0e..04af59b77189 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -299,6 +299,22 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
return ret;
}

+/* Invoker of this function is responsible for locking */
+static void gzvm_destroy_all_ppage(struct gzvm *gzvm)
+{
+ struct gzvm_pinned_page *ppage;
+ struct rb_node *node;
+
+ node = rb_first(&gzvm->pinned_pages);
+ while (node) {
+ ppage = rb_entry(node, struct gzvm_pinned_page, node);
+ unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+ node = rb_next(node);
+ rb_erase(&ppage->node, &gzvm->pinned_pages);
+ kfree(ppage);
+ }
+}
+
static void gzvm_destroy_vm(struct gzvm *gzvm)
{
size_t allocated_size;
@@ -322,6 +338,9 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)

mutex_unlock(&gzvm->lock);

+ /* No need to lock here because it's single-threaded execution */
+ gzvm_destroy_all_ppage(gzvm);
+
kfree(gzvm);
}

@@ -415,6 +434,8 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
gzvm->vm_id = ret;
gzvm->mm = current->mm;
mutex_init(&gzvm->lock);
+ mutex_init(&gzvm->mem_lock);
+ gzvm->pinned_pages = RB_ROOT;

ret = gzvm_vm_irqfd_init(gzvm);
if (ret) {
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index 1c16960a1728..bf5f1abf8dbe 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -12,6 +12,7 @@
#include <linux/mutex.h>
#include <linux/gzvm.h>
#include <linux/srcu.h>
+#include <linux/rbtree.h>

/*
* For the normal physical address, the highest 12 bits should be zero, so we
@@ -99,6 +100,12 @@ struct gzvm_vcpu {
struct gzvm_vcpu_hwstate *hwstate;
};

+struct gzvm_pinned_page {
+ struct rb_node node;
+ struct page *page;
+ u64 ipa;
+};
+
/**
* struct gzvm: the following data structures are for data transferring between
* driver and hypervisor, and they're aligned with hypervisor definitions.
@@ -119,6 +126,8 @@ struct gzvm_vcpu {
* @demand_page_buffer: the mailbox for transferring large portion pages
* @demand_paging_lock: lock for preventing multiple cpu using the same demand
* page mailbox at the same time
+ * @pinned_pages: use rb-tree to record pin/unpin page
+ * @mem_lock: lock for memory operations
*/
struct gzvm {
struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
@@ -146,6 +155,9 @@ struct gzvm {
u32 demand_page_gran;
u64 *demand_page_buffer;
struct mutex demand_paging_lock;
+
+ struct rb_root pinned_pages;
+ struct mutex mem_lock;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -178,7 +190,8 @@ int gzvm_gfn_to_pfn_memslot(struct gzvm_memslot *memslot, u64 gfn, u64 *pfn);
int gzvm_gfn_to_hva_memslot(struct gzvm_memslot *memslot, u64 gfn,
u64 *hva_memslot);
int gzvm_vm_populate_mem_region(struct gzvm *gzvm, int slot_id);
-int gzvm_vm_allocate_guest_page(struct gzvm_memslot *slot, u64 gfn, u64 *pfn);
+int gzvm_vm_allocate_guest_page(struct gzvm *gzvm, struct gzvm_memslot *slot,
+ u64 gfn, u64 *pfn);

int gzvm_vm_ioctl_create_vcpu(struct gzvm *gzvm, u32 cpuid);
int gzvm_arch_vcpu_update_one_reg(struct gzvm_vcpu *vcpu, __u64 reg_id,
--
2.18.0


2024-04-12 07:21:02

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 18/21] virt: geniezone: Add memory relinquish support

From: "Jerry Wang" <[email protected]>

Unpin the pages when the VM relinquishes them or is destroyed.

Signed-off-by: Jerry Wang <[email protected]>
Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: Liju-Clr Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
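Note (illustration only, not part of the patch): the dispatch added here,
where a relinquish hypercall is consumed in the kernel and every other
hypercall is passed on to userspace, can be sketched as below. The fixed-size
vm_sketch table is a hypothetical stand-in for the pinned-page rb-tree, and
all names except the hypercall number are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same value as GZVM_HVC_MEM_RELINQUISH in this patch */
#define HVC_MEM_RELINQUISH_SKETCH 0xc6000009ULL

/* Hypothetical stand-in for the per-VM pinned-page bookkeeping */
struct vm_sketch {
	uint64_t pinned_ipa[4];
	bool pinned[4];
};

/* Drop the pin for @ipa if one is tracked; an unknown IPA is not an error */
static void relinquish(struct vm_sketch *vm, uint64_t ipa)
{
	for (int i = 0; i < 4; i++)
		if (vm->pinned[i] && vm->pinned_ipa[i] == ipa)
			vm->pinned[i] = false;	/* stands in for unpin + erase */
}

/* Returns true if handled here, false if the VMM must handle the call */
static bool handle_guest_hvc(struct vm_sketch *vm, const uint64_t args[8])
{
	switch (args[0]) {
	case HVC_MEM_RELINQUISH_SKETCH:
		relinquish(vm, args[1]);
		return true;
	default:
		return false;
	}
}
```

As in gzvm_handle_relinquish(), relinquishing an IPA that was never pinned
still counts as handled, so the vCPU does not exit to userspace for it.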
drivers/virt/geniezone/gzvm_exception.c | 23 ++++++++++++
drivers/virt/geniezone/gzvm_mmu.c | 49 +++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_vcpu.c | 6 ++-
include/linux/soc/mediatek/gzvm_drv.h | 2 +
include/uapi/linux/gzvm.h | 5 +++
5 files changed, 83 insertions(+), 2 deletions(-)

diff --git a/drivers/virt/geniezone/gzvm_exception.c b/drivers/virt/geniezone/gzvm_exception.c
index 475bc15b0689..07871ec74651 100644
--- a/drivers/virt/geniezone/gzvm_exception.c
+++ b/drivers/virt/geniezone/gzvm_exception.c
@@ -37,3 +37,26 @@ bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu)
else
return false;
}
+
+/**
+ * gzvm_handle_guest_hvc() - Handle guest hvc
+ * @vcpu: Pointer to struct gzvm_vcpu struct
+ * Return:
+ * * true - This hvc has been processed, no need to go back to the VMM.
+ * * false - This hvc has not been processed, userspace is required.
+ */
+bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu)
+{
+ unsigned long ipa;
+ int ret;
+
+ switch (vcpu->run->hypercall.args[0]) {
+ case GZVM_HVC_MEM_RELINQUISH:
+ ipa = vcpu->run->hypercall.args[1];
+ ret = gzvm_handle_relinquish(vcpu, ipa);
+ return (ret == 0) ? true : false;
+ default:
+ break;
+ }
+ return false;
+}
diff --git a/drivers/virt/geniezone/gzvm_mmu.c b/drivers/virt/geniezone/gzvm_mmu.c
index 7bc96cba1ecb..4ce3ec49adba 100644
--- a/drivers/virt/geniezone/gzvm_mmu.c
+++ b/drivers/virt/geniezone/gzvm_mmu.c
@@ -132,6 +132,36 @@ static int gzvm_insert_ppage(struct gzvm *vm, struct gzvm_pinned_page *ppage)
return 0;
}

+static int rb_ppage_cmp(const void *key, const struct rb_node *node)
+{
+ struct gzvm_pinned_page *p = container_of(node,
+ struct gzvm_pinned_page,
+ node);
+ phys_addr_t ipa = (phys_addr_t)key;
+
+ return (ipa < p->ipa) ? -1 : (ipa > p->ipa);
+}
+
+/* Invoker of this function is responsible for locking */
+static int gzvm_remove_ppage(struct gzvm *vm, phys_addr_t ipa)
+{
+ struct gzvm_pinned_page *ppage;
+ struct rb_node *node;
+
+ node = rb_find((void *)ipa, &vm->pinned_pages, rb_ppage_cmp);
+
+ if (node)
+ rb_erase(node, &vm->pinned_pages);
+ else
+ return 0;
+
+ ppage = container_of(node, struct gzvm_pinned_page, node);
+ unpin_user_pages_dirty_lock(&ppage->page, 1, true);
+ kfree(ppage);
+
+ return 0;
+}
+
static int pin_one_page(struct gzvm *vm, unsigned long hva, u64 gpa)
{
unsigned int flags = FOLL_HWPOISON | FOLL_LONGTERM | FOLL_WRITE;
@@ -178,6 +208,25 @@ static int pin_one_page(struct gzvm *vm, unsigned long hva, u64 gpa)
return ret;
}

+/**
+ * gzvm_handle_relinquish() - Handle memory relinquish request from hypervisor
+ *
+ * @vcpu: Pointer to struct gzvm_vcpu
+ * @ipa: Start address(gpa) of a reclaimed page
+ *
+ * Return: Always return 0 because there are no cases of failure
+ */
+int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa)
+{
+ struct gzvm *vm = vcpu->gzvm;
+
+ mutex_lock(&vm->mem_lock);
+ gzvm_remove_ppage(vm, ipa);
+ mutex_unlock(&vm->mem_lock);
+
+ return 0;
+}
+
int gzvm_vm_allocate_guest_page(struct gzvm *vm, struct gzvm_memslot *slot,
u64 gfn, u64 *pfn)
{
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index e8d6f32f325c..e135d9388090 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -113,12 +113,14 @@ static long gzvm_vcpu_run(struct gzvm_vcpu *vcpu, void __user *argp)
* it's geniezone's responsibility to fill corresponding data
* structure
*/
+ case GZVM_EXIT_HYPERCALL:
+ if (!gzvm_handle_guest_hvc(vcpu))
+ need_userspace = true;
+ break;
case GZVM_EXIT_EXCEPTION:
if (!gzvm_handle_guest_exception(vcpu))
need_userspace = true;
break;
- case GZVM_EXIT_HYPERCALL:
- fallthrough;
case GZVM_EXIT_DEBUG:
fallthrough;
case GZVM_EXIT_FAIL_ENTRY:
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index bf5f1abf8dbe..2e5e9c67cfa5 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -204,6 +204,8 @@ int gzvm_arch_inform_exit(u16 vm_id);
int gzvm_find_memslot(struct gzvm *vm, u64 gpa);
int gzvm_handle_page_fault(struct gzvm_vcpu *vcpu);
bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu);
+int gzvm_handle_relinquish(struct gzvm_vcpu *vcpu, phys_addr_t ipa);
+bool gzvm_handle_guest_hvc(struct gzvm_vcpu *vcpu);

int gzvm_arch_create_device(u16 vm_id, struct gzvm_create_device *gzvm_dev);
int gzvm_arch_inject_irq(struct gzvm *gzvm, unsigned int vcpu_idx,
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index 0d38a0963cb7..5411357ec05e 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -195,6 +195,11 @@ enum {
GZVM_EXCEPTION_PAGE_FAULT = 0x1,
};

+/* hypercall definitions of GZVM_EXIT_HYPERCALL */
+enum {
+ GZVM_HVC_MEM_RELINQUISH = 0xc6000009,
+};
+
/**
* struct gzvm_vcpu_run: Same purpose as kvm_run, this struct is
* shared between userspace, kernel and
--
2.18.0


2024-04-12 07:21:42

by Yi-De Wu

[permalink] [raw]
Subject: [PATCH v10 11/21] virt: geniezone: Add irqfd support

From: "Yingshiuan Pan" <[email protected]>

irqfd enables threads other than vcpu threads to inject virtual
interrupts asynchronously through an irqfd rather than through the
ioctl interface. This interface is necessary for a VMM that creates a
separate thread for IO handling or uses vhost devices.

Signed-off-by: Yingshiuan Pan <[email protected]>
Signed-off-by: kevenny hsieh <[email protected]>
Signed-off-by: Liju Chen <[email protected]>
Signed-off-by: Yi-De Wu <[email protected]>
---
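Note (illustration only, not part of the patch): irqfd builds on the Linux
eventfd(2) primitive. The sketch below shows only the generic eventfd
signalling pattern in userspace; it does not use any gzvm ioctl. In the real
driver the kernel side consumes the eventfd and injects the interrupt, which
drain_irq() merely imitates. The helper names signal_irq() and drain_irq()
are hypothetical.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal side: what an I/O thread would do to request an interrupt */
static int signal_irq(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Consumer side: returns the number of signals accumulated since the last
 * read, or 0 if nothing is pending (expects a non-blocking eventfd).
 */
static uint64_t drain_irq(int efd)
{
	uint64_t cnt = 0;

	if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}
```

A caller would create the descriptor with eventfd(0, EFD_NONBLOCK). Signals
written before the consumer runs are accumulated in the eventfd counter, so
the signalling thread never has to wait for the vcpu thread.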
arch/arm64/geniezone/gzvm_arch_common.h | 18 ++
drivers/virt/geniezone/Makefile | 3 +-
drivers/virt/geniezone/gzvm_irqfd.c | 382 ++++++++++++++++++++++++
drivers/virt/geniezone/gzvm_main.c | 12 +-
drivers/virt/geniezone/gzvm_vcpu.c | 1 +
drivers/virt/geniezone/gzvm_vm.c | 18 ++
include/linux/soc/mediatek/gzvm_drv.h | 26 ++
include/uapi/linux/gzvm.h | 26 ++
8 files changed, 484 insertions(+), 2 deletions(-)
create mode 100644 drivers/virt/geniezone/gzvm_irqfd.c

diff --git a/arch/arm64/geniezone/gzvm_arch_common.h b/arch/arm64/geniezone/gzvm_arch_common.h
index eb7a0b7ded8c..d4b49a4b283a 100644
--- a/arch/arm64/geniezone/gzvm_arch_common.h
+++ b/arch/arm64/geniezone/gzvm_arch_common.h
@@ -45,6 +45,8 @@ enum {
#define MT_HVC_GZVM_ENABLE_CAP GZVM_HCALL_ID(GZVM_FUNC_ENABLE_CAP)
#define MT_HVC_GZVM_INFORM_EXIT GZVM_HCALL_ID(GZVM_FUNC_INFORM_EXIT)

+#define GIC_V3_NR_LRS 16
+
/**
* gzvm_hypcall_wrapper() - the wrapper for hvc calls
* @a0: argument passed in registers 0
@@ -65,6 +67,22 @@ int gzvm_hypcall_wrapper(unsigned long a0, unsigned long a1,
unsigned long a6, unsigned long a7,
struct arm_smccc_res *res);

+/**
+ * struct gzvm_vcpu_hwstate: Sync architecture state back to host for handling
+ * @nr_lrs: The number of available LRs (list registers) in the SoC.
+ * @__pad: add an explicit '__u32 __pad;' in the middle to make it clear
+ * what the actual layout is.
+ * @lr: The array of LRs(list registers).
+ *
+ * - Keep the same layout of hypervisor data struct.
+ * - Sync list registers back for acking virtual device interrupt status.
+ */
+struct gzvm_vcpu_hwstate {
+ __le32 nr_lrs;
+ __le32 __pad;
+ __le64 lr[GIC_V3_NR_LRS];
+};
+
static inline unsigned int
assemble_vm_vcpu_tuple(u16 vmid, u16 vcpuid)
{
diff --git a/drivers/virt/geniezone/Makefile b/drivers/virt/geniezone/Makefile
index a630b919cda5..cebe5ad53f41 100644
--- a/drivers/virt/geniezone/Makefile
+++ b/drivers/virt/geniezone/Makefile
@@ -7,4 +7,5 @@
GZVM_DIR ?= ../../../drivers/virt/geniezone

gzvm-y := $(GZVM_DIR)/gzvm_main.o $(GZVM_DIR)/gzvm_vm.o \
- $(GZVM_DIR)/gzvm_mmu.o $(GZVM_DIR)/gzvm_vcpu.o
+ $(GZVM_DIR)/gzvm_mmu.o $(GZVM_DIR)/gzvm_vcpu.o \
+ $(GZVM_DIR)/gzvm_irqfd.o
diff --git a/drivers/virt/geniezone/gzvm_irqfd.c b/drivers/virt/geniezone/gzvm_irqfd.c
new file mode 100644
index 000000000000..8095a5a68fd8
--- /dev/null
+++ b/drivers/virt/geniezone/gzvm_irqfd.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2023 MediaTek Inc.
+ */
+
+#include <linux/eventfd.h>
+#include <linux/syscalls.h>
+#include <linux/soc/mediatek/gzvm_drv.h>
+#include "gzvm_common.h"
+
+struct gzvm_irq_ack_notifier {
+ struct hlist_node link;
+ unsigned int gsi;
+ void (*irq_acked)(struct gzvm_irq_ack_notifier *ian);
+};
+
+/**
+ * struct gzvm_kernel_irqfd: gzvm kernel irqfd descriptor.
+ * @gzvm: Pointer to struct gzvm.
+ * @wait: Wait queue entry.
+ * @gsi: Used for level IRQ fast-path.
+ * @eventfd: Used for setup/shutdown.
+ * @list: struct list_head.
+ * @pt: struct poll_table_struct.
+ * @shutdown: struct work_struct.
+ */
+struct gzvm_kernel_irqfd {
+ struct gzvm *gzvm;
+ wait_queue_entry_t wait;
+
+ int gsi;
+
+ struct eventfd_ctx *eventfd;
+ struct list_head list;
+ poll_table pt;
+ struct work_struct shutdown;
+};
+
+static struct workqueue_struct *irqfd_cleanup_wq;
+
+/**
+ * irqfd_set_irq(): irqfd to inject virtual interrupt.
+ * @gzvm: Pointer to gzvm.
+ * @irq: This is spi interrupt number (starts from 0 instead of 32).
+ * @level: irq triggered level.
+ */
+static void irqfd_set_irq(struct gzvm *gzvm, u32 irq, int level)
+{
+ if (level)
+ gzvm_irqchip_inject_irq(gzvm, 0, irq, level);
+}
+
+/**
+ * irqfd_shutdown() - Race-free decouple logic (ordering is critical).
+ * @work: Pointer to work_struct.
+ */
+static void irqfd_shutdown(struct work_struct *work)
+{
+ struct gzvm_kernel_irqfd *irqfd =
+ container_of(work, struct gzvm_kernel_irqfd, shutdown);
+ struct gzvm *gzvm = irqfd->gzvm;
+ u64 cnt;
+
+ /* Make sure irqfd has been initialized in assign path. */
+ synchronize_srcu(&gzvm->irq_srcu);
+
+ /*
+ * Synchronize with the wait-queue and unhook ourselves to prevent
+ * further events.
+ */
+ eventfd_ctx_remove_wait_queue(irqfd->eventfd, &irqfd->wait, &cnt);
+
+ /*
+ * It is now safe to release the object's resources
+ */
+ eventfd_ctx_put(irqfd->eventfd);
+ kfree(irqfd);
+}
+
+/**
+ * irqfd_is_active() - Assumes gzvm->irqfds.lock is held.
+ * @irqfd: Pointer to gzvm_kernel_irqfd.
+ *
+ * Return:
+ * * true - irqfd is active.
+ */
+static bool irqfd_is_active(struct gzvm_kernel_irqfd *irqfd)
+{
+ return !list_empty(&irqfd->list);
+}
+
+/**
+ * irqfd_deactivate() - Mark the irqfd as inactive and schedule it for removal.
+ * assumes gzvm->irqfds.lock is held.
+ * @irqfd: Pointer to gzvm_kernel_irqfd.
+ */
+static void irqfd_deactivate(struct gzvm_kernel_irqfd *irqfd)
+{
+ if (!irqfd_is_active(irqfd))
+ return;
+
+ list_del_init(&irqfd->list);
+
+ queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
+}
+
+/**
+ * irqfd_wakeup() - Callback of irqfd wait queue, would be woken by writing to
+ * irqfd to do virtual interrupt injection.
+ * @wait: Pointer to wait_queue_entry_t.
+ * @mode: Unused.
+ * @sync: Unused.
+ * @key: Get flags about Epoll events.
+ *
+ * Return:
+ * * 0 - Success
+ */
+static int irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync,
+ void *key)
+{
+ struct gzvm_kernel_irqfd *irqfd =
+ container_of(wait, struct gzvm_kernel_irqfd, wait);
+ __poll_t flags = key_to_poll(key);
+ struct gzvm *gzvm = irqfd->gzvm;
+
+ if (flags & EPOLLIN) {
+ u64 cnt;
+
+ eventfd_ctx_do_read(irqfd->eventfd, &cnt);
+ /* gzvm's irq injection does not block, so no workqueue is needed */
+ irqfd_set_irq(gzvm, irqfd->gsi, 1);
+ }
+
+ if (flags & EPOLLHUP) {
+ /* The eventfd is closing, detach from GZVM */
+ unsigned long iflags;
+
+ spin_lock_irqsave(&gzvm->irqfds.lock, iflags);
+
+ /*
+ * Check again in case someone deactivated the irqfd before
+ * we could acquire the irqfds.lock.
+ */
+ if (irqfd_is_active(irqfd))
+ irqfd_deactivate(irqfd);
+
+ spin_unlock_irqrestore(&gzvm->irqfds.lock, iflags);
+ }
+
+ return 0;
+}
+
+static void irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh,
+ poll_table *pt)
+{
+ struct gzvm_kernel_irqfd *irqfd =
+ container_of(pt, struct gzvm_kernel_irqfd, pt);
+ add_wait_queue_priority(wqh, &irqfd->wait);
+}
+
+static int gzvm_irqfd_assign(struct gzvm *gzvm, struct gzvm_irqfd *args)
+{
+ struct gzvm_kernel_irqfd *irqfd, *tmp;
+ struct fd f;
+ struct eventfd_ctx *eventfd = NULL;
+ int ret;
+ int idx;
+
+ irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL_ACCOUNT);
+ if (!irqfd)
+ return -ENOMEM;
+
+ irqfd->gzvm = gzvm;
+ irqfd->gsi = args->gsi;
+
+ INIT_LIST_HEAD(&irqfd->list);
+ INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
+
+ f = fdget(args->fd);
+ if (!f.file) {
+ ret = -EBADF;
+ goto out;
+ }
+
+ eventfd = eventfd_ctx_fileget(f.file);
+ if (IS_ERR(eventfd)) {
+ ret = PTR_ERR(eventfd);
+ goto fail;
+ }
+
+ irqfd->eventfd = eventfd;
+
+ /*
+ * Install our own custom wake-up handling so we are notified via
+ * a callback whenever someone signals the underlying eventfd
+ */
+ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
+ init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
+
+ spin_lock_irq(&gzvm->irqfds.lock);
+
+ ret = 0;
+ list_for_each_entry(tmp, &gzvm->irqfds.items, list) {
+ if (irqfd->eventfd != tmp->eventfd)
+ continue;
+ /* This fd is used for another irq already. */
+ pr_err("already used: gsi=%d fd=%d\n", args->gsi, args->fd);
+ ret = -EBUSY;
+ spin_unlock_irq(&gzvm->irqfds.lock);
+ goto fail;
+ }
+
+ idx = srcu_read_lock(&gzvm->irq_srcu);
+
+ list_add_tail(&irqfd->list, &gzvm->irqfds.items);
+
+ spin_unlock_irq(&gzvm->irqfds.lock);
+
+ vfs_poll(f.file, &irqfd->pt);
+
+ srcu_read_unlock(&gzvm->irq_srcu, idx);
+
+ /*
+ * do not drop the file until the irqfd is fully initialized, otherwise
+ * we might race against the EPOLLHUP
+ */
+ fdput(f);
+ return 0;
+
+fail:
+ if (eventfd && !IS_ERR(eventfd))
+ eventfd_ctx_put(eventfd);
+
+ fdput(f);
+
+out:
+ kfree(irqfd);
+ return ret;
+}
+
+static void gzvm_notify_acked_gsi(struct gzvm *gzvm, int gsi)
+{
+ struct gzvm_irq_ack_notifier *gian;
+
+ hlist_for_each_entry_srcu(gian, &gzvm->irq_ack_notifier_list,
+ link, srcu_read_lock_held(&gzvm->irq_srcu))
+ if (gian->gsi == gsi)
+ gian->irq_acked(gian);
+}
+
+void gzvm_notify_acked_irq(struct gzvm *gzvm, unsigned int gsi)
+{
+ int idx;
+
+ idx = srcu_read_lock(&gzvm->irq_srcu);
+ gzvm_notify_acked_gsi(gzvm, gsi);
+ srcu_read_unlock(&gzvm->irq_srcu, idx);
+}
+
+/**
+ * gzvm_irqfd_deassign() - Shutdown any irqfd's that match fd+gsi.
+ * @gzvm: Pointer to gzvm.
+ * @args: Pointer to gzvm_irqfd.
+ *
+ * Return:
+ * * 0 - Success.
+ * * Negative value - Failure.
+ */
+static int gzvm_irqfd_deassign(struct gzvm *gzvm, struct gzvm_irqfd *args)
+{
+ struct gzvm_kernel_irqfd *irqfd, *tmp;
+ struct eventfd_ctx *eventfd;
+
+ eventfd = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(eventfd))
+ return PTR_ERR(eventfd);
+
+ spin_lock_irq(&gzvm->irqfds.lock);
+
+ list_for_each_entry_safe(irqfd, tmp, &gzvm->irqfds.items, list) {
+ if (irqfd->eventfd == eventfd && irqfd->gsi == args->gsi)
+ irqfd_deactivate(irqfd);
+ }
+
+ spin_unlock_irq(&gzvm->irqfds.lock);
+ eventfd_ctx_put(eventfd);
+
+ /*
+ * Block until we know all outstanding shutdown jobs have completed
+ * so that we guarantee there will not be any more interrupts on this
+ * gsi once this deassign function returns.
+ */
+ flush_workqueue(irqfd_cleanup_wq);
+
+ return 0;
+}
+
+int gzvm_irqfd(struct gzvm *gzvm, struct gzvm_irqfd *args)
+{
+ for (int i = 0; i < ARRAY_SIZE(args->pad); i++) {
+ if (args->pad[i])
+ return -EINVAL;
+ }
+
+ if (args->flags &
+ ~(GZVM_IRQFD_FLAG_DEASSIGN | GZVM_IRQFD_FLAG_RESAMPLE))
+ return -EINVAL;
+
+ if (args->flags & GZVM_IRQFD_FLAG_DEASSIGN)
+ return gzvm_irqfd_deassign(gzvm, args);
+
+ return gzvm_irqfd_assign(gzvm, args);
+}
+
+/**
+ * gzvm_vm_irqfd_init() - Initialize irqfd data structure per VM
+ *
+ * @gzvm: Pointer to struct gzvm.
+ *
+ * Return:
+ * * 0 - Success.
+ * * Negative - Failure.
+ */
+int gzvm_vm_irqfd_init(struct gzvm *gzvm)
+{
+ mutex_init(&gzvm->irq_lock);
+
+ spin_lock_init(&gzvm->irqfds.lock);
+ INIT_LIST_HEAD(&gzvm->irqfds.items);
+ if (init_srcu_struct(&gzvm->irq_srcu))
+ return -EINVAL;
+ INIT_HLIST_HEAD(&gzvm->irq_ack_notifier_list);
+
+ return 0;
+}
+
+/**
+ * gzvm_vm_irqfd_release() - This function is called as the gzvm VM fd is being
+ * released. Shutdown all irqfds that still remain open.
+ * @gzvm: Pointer to gzvm.
+ */
+void gzvm_vm_irqfd_release(struct gzvm *gzvm)
+{
+ struct gzvm_kernel_irqfd *irqfd, *tmp;
+
+ spin_lock_irq(&gzvm->irqfds.lock);
+
+ list_for_each_entry_safe(irqfd, tmp, &gzvm->irqfds.items, list)
+ irqfd_deactivate(irqfd);
+
+ spin_unlock_irq(&gzvm->irqfds.lock);
+
+ /*
+ * Block until we know all outstanding shutdown jobs have completed.
+ */
+ flush_workqueue(irqfd_cleanup_wq);
+}
+
+/**
+ * gzvm_drv_irqfd_init() - Create the workqueue for deferred irqfd shutdown.
+ *
+ * Return:
+ * * 0 - Success.
+ * * Negative - Failure.
+ *
+ * Create a host-wide workqueue for issuing deferred shutdown requests
+ * aggregated from all vm* instances. We need our own isolated
+ * queue to ease flushing work items when a VM exits.
+ */
+int gzvm_drv_irqfd_init(void)
+{
+ irqfd_cleanup_wq = alloc_workqueue("gzvm-irqfd-cleanup", 0, 0);
+ if (!irqfd_cleanup_wq)
+ return -ENOMEM;
+
+ return 0;
+}
+
+void gzvm_drv_irqfd_exit(void)
+{
+ destroy_workqueue(irqfd_cleanup_wq);
+}
diff --git a/drivers/virt/geniezone/gzvm_main.c b/drivers/virt/geniezone/gzvm_main.c
index 565bd1fe8ece..75f643222b91 100644
--- a/drivers/virt/geniezone/gzvm_main.c
+++ b/drivers/virt/geniezone/gzvm_main.c
@@ -93,16 +93,26 @@ static struct miscdevice gzvm_dev = {

static int gzvm_drv_probe(struct platform_device *pdev)
{
+ int ret;
+
if (gzvm_arch_probe() != 0) {
dev_err(&pdev->dev, "Not found available conduit\n");
return -ENODEV;
}

- return misc_register(&gzvm_dev);
+ ret = misc_register(&gzvm_dev);
+ if (ret)
+ return ret;
+
+ ret = gzvm_drv_irqfd_init();
+ if (ret)
+ misc_deregister(&gzvm_dev);
+ return ret;
}

static int gzvm_drv_remove(struct platform_device *pdev)
{
gzvm_destroy_all_vms();
+ gzvm_drv_irqfd_exit();
misc_deregister(&gzvm_dev);
return 0;
diff --git a/drivers/virt/geniezone/gzvm_vcpu.c b/drivers/virt/geniezone/gzvm_vcpu.c
index 55668341d455..1ac09bf5f2d8 100644
--- a/drivers/virt/geniezone/gzvm_vcpu.c
+++ b/drivers/virt/geniezone/gzvm_vcpu.c
@@ -228,6 +228,7 @@ int gzvm_vm_ioctl_create_vcpu(struct gzvm *gzvm, u32 cpuid)
ret = -ENOMEM;
goto free_vcpu;
}
+ vcpu->hwstate = (void *)vcpu->run + PAGE_SIZE;
vcpu->vcpuid = cpuid;
vcpu->gzvm = gzvm;
mutex_init(&vcpu->lock);
diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c
index 85c670a99ae5..77be1a22d767 100644
--- a/drivers/virt/geniezone/gzvm_vm.c
+++ b/drivers/virt/geniezone/gzvm_vm.c
@@ -217,6 +217,16 @@ static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
ret = gzvm_vm_ioctl_create_device(gzvm, argp);
break;
}
+ case GZVM_IRQFD: {
+ struct gzvm_irqfd data;
+
+ if (copy_from_user(&data, argp, sizeof(data))) {
+ ret = -EFAULT;
+ goto out;
+ }
+ ret = gzvm_irqfd(gzvm, &data);
+ break;
+ }
case GZVM_ENABLE_CAP: {
struct gzvm_enable_cap cap;

@@ -240,6 +250,7 @@ static void gzvm_destroy_vm(struct gzvm *gzvm)

mutex_lock(&gzvm->lock);

+ gzvm_vm_irqfd_release(gzvm);
gzvm_destroy_vcpus(gzvm);
gzvm_arch_destroy_vm(gzvm->vm_id);

@@ -285,6 +296,13 @@ static struct gzvm *gzvm_create_vm(unsigned long vm_type)
gzvm->mm = current->mm;
mutex_init(&gzvm->lock);

+ ret = gzvm_vm_irqfd_init(gzvm);
+ if (ret) {
+ pr_err("Failed to initialize irqfd\n");
+ kfree(gzvm);
+ return ERR_PTR(ret);
+ }
+
mutex_lock(&gzvm_list_lock);
list_add(&gzvm->vm_list, &gzvm_list);
mutex_unlock(&gzvm_list_lock);
diff --git a/include/linux/soc/mediatek/gzvm_drv.h b/include/linux/soc/mediatek/gzvm_drv.h
index a510df71b62e..0b02b5daa817 100644
--- a/include/linux/soc/mediatek/gzvm_drv.h
+++ b/include/linux/soc/mediatek/gzvm_drv.h
@@ -10,6 +10,7 @@
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/gzvm.h>
+#include <linux/srcu.h>

/*
* For the normal physical address, the highest 12 bits should be zero, so we
@@ -30,6 +31,7 @@
#define ERR_NOT_SUPPORTED (-24)
#define ERR_NOT_IMPLEMENTED (-27)
#define ERR_FAULT (-40)
+#define GZVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1

/*
* The following data structures are for data transferring between driver and
@@ -85,6 +87,7 @@ struct gzvm_vcpu {
/* lock of vcpu*/
struct mutex lock;
struct gzvm_vcpu_run *run;
+ struct gzvm_vcpu_hwstate *hwstate;
};

/**
@@ -94,16 +97,32 @@ struct gzvm_vcpu {
* @mm: userspace tied to this vm
* @memslot: VM's memory slot descriptor
* @lock: lock for list_add
+ * @irqfds: the data structure is used to keep irqfds's information
* @vm_list: list head for vm list
* @vm_id: vm id
+ * @irq_ack_notifier_list: list head for irq ack notifier
+ * @irq_srcu: structure data for SRCU(sleepable rcu)
+ * @irq_lock: lock for irq injection
*/
struct gzvm {
struct gzvm_vcpu *vcpus[GZVM_MAX_VCPUS];
struct mm_struct *mm;
struct gzvm_memslot memslot[GZVM_MAX_MEM_REGION];
struct mutex lock;
+
+ struct {
+ spinlock_t lock;
+ struct list_head items;
+ struct list_head resampler_list;
+ struct mutex resampler_lock;
+ } irqfds;
+
struct list_head vm_list;
u16 vm_id;
+
+ struct hlist_head irq_ack_notifier_list;
+ struct srcu_struct irq_srcu;
+ struct mutex irq_lock;
};

long gzvm_dev_ioctl_check_extension(struct gzvm *gzvm, unsigned long args);
@@ -147,4 +166,11 @@ int gzvm_arch_create_device(u16 vm_id, struct gzvm_create_device *gzvm_dev);
int gzvm_arch_inject_irq(struct gzvm *gzvm, unsigned int vcpu_idx,
u32 irq, bool level);

+void gzvm_notify_acked_irq(struct gzvm *gzvm, unsigned int gsi);
+int gzvm_irqfd(struct gzvm *gzvm, struct gzvm_irqfd *args);
+int gzvm_drv_irqfd_init(void);
+void gzvm_drv_irqfd_exit(void);
+int gzvm_vm_irqfd_init(struct gzvm *gzvm);
+void gzvm_vm_irqfd_release(struct gzvm *gzvm);
+
#endif /* __GZVM_DRV_H__ */
diff --git a/include/uapi/linux/gzvm.h b/include/uapi/linux/gzvm.h
index 03fd0735fb80..aa61ece00cac 100644
--- a/include/uapi/linux/gzvm.h
+++ b/include/uapi/linux/gzvm.h
@@ -313,4 +313,30 @@ struct gzvm_one_reg {

#define GZVM_REG_GENERIC 0x0000000000000000ULL

+#define GZVM_IRQFD_FLAG_DEASSIGN BIT(0)
+/*
+ * GZVM_IRQFD_FLAG_RESAMPLE indicates resamplefd is valid and specifies
+ * the irqfd to operate in resampling mode for level triggered interrupt
+ * emulation.
+ */
+#define GZVM_IRQFD_FLAG_RESAMPLE BIT(1)
+
+/**
+ * struct gzvm_irqfd: gzvm irqfd descriptor
+ * @fd: File descriptor.
+ * @gsi: Used for level IRQ fast-path.
+ * @flags: FLAG_DEASSIGN or FLAG_RESAMPLE.
+ * @resamplefd: The file descriptor of the resampler.
+ * @pad: Reserved for future use; must be zero.
+ */
+struct gzvm_irqfd {
+ __u32 fd;
+ __u32 gsi;
+ __u32 flags;
+ __u32 resamplefd;
+ __u8 pad[16];
+};
+
+#define GZVM_IRQFD _IOW(GZVM_IOC_MAGIC, 0x76, struct gzvm_irqfd)
+
#endif /* __GZVM_H__ */
--
2.18.0
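
For readers following the irqfd flow in the patch above, here is a minimal userspace sketch of the eventfd side of it, assuming a Linux host. It only models what a VMM I/O thread would do: build the request, signal the eventfd, and observe the pending count that the kernel side consumes in irqfd_wakeup(). The names gzvm_irqfd_args, make_irqfd_request(), signal_irq() and drain_irq() are hypothetical helpers for illustration; the struct is a local mirror of struct gzvm_irqfd because <linux/gzvm.h> is not available in released kernel headers. The GZVM_IRQFD ioctl itself is left out since it needs a real VM fd.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/eventfd.h>

/*
 * Assumption: local mirror of struct gzvm_irqfd from the patch's uapi
 * header, redeclared here only so the sketch builds without it.
 */
struct gzvm_irqfd_args {
	uint32_t fd;
	uint32_t gsi;
	uint32_t flags;
	uint32_t resamplefd;
	uint8_t pad[16];
};

/*
 * Build an assign/deassign request. The whole struct is zeroed first
 * because gzvm_irqfd() in the patch rejects a nonzero pad with -EINVAL.
 */
struct gzvm_irqfd_args make_irqfd_request(int efd, uint32_t gsi, uint32_t flags)
{
	struct gzvm_irqfd_args req;

	assert(efd >= 0);
	memset(&req, 0, sizeof(req));
	req.fd = (uint32_t)efd;
	req.gsi = gsi;
	req.flags = flags;
	return req;
}

/* Signal the eventfd once, as an I/O thread would to raise the interrupt. */
int signal_irq(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Read and reset the pending count. This is a userspace stand-in for the
 * kernel draining the counter; it returns 0 if nothing is pending on a
 * non-blocking eventfd.
 */
uint64_t drain_irq(int efd)
{
	uint64_t cnt = 0;

	if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
		return 0;
	return cnt;
}
```

In a real VMM the request returned by make_irqfd_request() would be passed to ioctl(vm_fd, GZVM_IRQFD, &req), after which the kernel drains the eventfd itself and userspace would not call drain_irq().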


2024-04-15 14:53:09

by Simon Horman

Subject: Re: [PATCH v10 15/21] virt: geniezone: Add demand paging support

On Fri, Apr 12, 2024 at 02:57:12PM +0800, Yi-De Wu wrote:
> From: "Yingshiuan Pan" <[email protected]>
>
> This page fault handler helps the GenieZone hypervisor do demand paging.
> On a lower level translation fault, the GenieZone hypervisor first
> checks that the faulting GPA (guest physical address, or IPA in ARM
> terms) is valid, e.g. within a registered memory region, then sets up
> vcpu_run->exit_reason with the information needed for returning to the
> gzvm driver.
>
> With the fault information, the gzvm driver looks up the physical
> address and calls MT_HVC_GZVM_MAP_GUEST to request that the hypervisor
> map the found PA to the faulting GPA (IPA).
>
> There is one exception: for a protected VM, the VM's full memory
> region is populated in advance in order to improve performance.
>
> Signed-off-by: Yingshiuan Pan <[email protected]>
> Signed-off-by: Jerry Wang <[email protected]>
> Signed-off-by: kevenny hsieh <[email protected]>
> Signed-off-by: Liju Chen <[email protected]>
> Signed-off-by: Yi-De Wu <[email protected]>

..

> diff --git a/drivers/virt/geniezone/gzvm_exception.c b/drivers/virt/geniezone/gzvm_exception.c
> new file mode 100644
> index 000000000000..475bc15b0689
> --- /dev/null
> +++ b/drivers/virt/geniezone/gzvm_exception.c
> @@ -0,0 +1,39 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2023 MediaTek Inc.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/soc/mediatek/gzvm_drv.h>
> +
> +/**
> + * gzvm_handle_guest_exception() - Handle guest exception
> + * @vcpu: Pointer to struct gzvm_vcpu_run in userspace
> + * Return:
> + * * true - This exception has been processed, no need to back to VMM.
> + * * false - This exception has not been processed, require userspace.
> + */
> +bool gzvm_handle_guest_exception(struct gzvm_vcpu *vcpu)

Hi Yi-De Wu,

The return type is bool, however the function actually
returns either a bool or signed int.

I think that either:

1. The return type should be changed to int,
and returning true and false should be updated.

2. The function should always return true or false.

Flagged by Smatch.

> +{
> + int ret;
> +
> + for (int i = 0; i < ARRAY_SIZE(vcpu->run->exception.reserved); i++) {
> + if (vcpu->run->exception.reserved[i])
> + return -EINVAL;
> + }
> +
> + switch (vcpu->run->exception.exception) {
> + case GZVM_EXCEPTION_PAGE_FAULT:
> + ret = gzvm_handle_page_fault(vcpu);
> + break;
> + case GZVM_EXCEPTION_UNKNOWN:
> + fallthrough;
> + default:
> + ret = -EFAULT;
> + }
> +
> + if (!ret)
> + return true;
> + else
> + return false;
> +}

..
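
A minimal userspace sketch of option 2 follows, assuming the function is meant to report only "handled" or "not handled". The types and names here (exception_info, handle_guest_exception(), the EXC_* values, and the pf_result parameter standing in for the result of gzvm_handle_page_fault()) are hypothetical stand-ins, not the driver's real definitions. The point it illustrates is that in a bool function, `return -EINVAL` converts to true, which would tell the caller the exception was processed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: illustrative values; only PAGE_FAULT = 0x1 appears in the series. */
enum { EXC_UNKNOWN = 0x0, EXC_PAGE_FAULT = 0x1 };

/* Hypothetical stand-in for the exception part of struct gzvm_vcpu_run. */
struct exception_info {
	uint32_t exception;
	uint16_t reserved[3];
};

/*
 * Always returns true or false.
 * pf_result models the return value of the page fault handler:
 * 0 on success, a negative errno on failure.
 */
bool handle_guest_exception(const struct exception_info *e, int pf_result)
{
	size_t i;

	for (i = 0; i < sizeof(e->reserved) / sizeof(e->reserved[0]); i++) {
		/* Returning -EINVAL here would convert to true in a bool function. */
		if (e->reserved[i])
			return false;
	}

	switch (e->exception) {
	case EXC_PAGE_FAULT:
		return pf_result == 0;
	case EXC_UNKNOWN:
	default:
		return false;
	}
}
```

Option 1 would instead change the return type to int and propagate the errno, which requires updating the caller to test for zero rather than true.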

2024-04-15 14:57:50

by Simon Horman

Subject: Re: [PATCH v10 06/21] virt: geniezone: Add set_user_memory_region for vm

On Fri, Apr 12, 2024 at 02:57:03PM +0800, Yi-De Wu wrote:
> From: "Yingshiuan Pan" <[email protected]>
>
> Direct use of physical memory by VMs is forbidden; for security
> reasons, access is dictated by the privilege models managed by the
> GenieZone hypervisor. With the help of gzvm-ko, the hypervisor is able
> to manipulate memory as objects. Memory management is tightly
> integrated with the ARM 2-stage translation tables, which convert VA to
> IPA to PA under the security measures required by protected VMs.
>
> Signed-off-by: Yingshiuan Pan <[email protected]>
> Signed-off-by: Jerry Wang <[email protected]>
> Signed-off-by: Liju Chen <[email protected]>
> Signed-off-by: Yi-De Wu <[email protected]>

..

> diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c

..

> +/* gzvm_vm_ioctl() - Ioctl handler of VM FD */
> +static long gzvm_vm_ioctl(struct file *filp, unsigned int ioctl,
> + unsigned long arg)
> +{
> + long ret;
> + void __user *argp = (void __user *)arg;
> + struct gzvm *gzvm = filp->private_data;
> +
> + switch (ioctl) {
> + case GZVM_SET_USER_MEMORY_REGION: {
> + struct gzvm_userspace_memory_region userspace_mem;
> +
> + if (copy_from_user(&userspace_mem, argp, sizeof(userspace_mem)))
> + return -EFAULT;
> +
> + ret = gzvm_vm_ioctl_set_memory_region(gzvm, &userspace_mem);
> + break;
> + }
> + default:
> + ret = -ENOTTY;
> + }
> +out:

nit: the out label is added here, but it does not seem to be used
(until [PATCH v10 11/21] virt: geniezone: Add irqfd support).

Although it probably isn't hurting anything - other than automated
testing - it would be best to add as part of a patch that uses it.

Flagged by gcc-13 and clang-18 W=1 builds.

> + return ret;
> +}
> +
> static void gzvm_destroy_vm(struct gzvm *gzvm)
> {
> pr_debug("VM-%u is going to be destroyed\n", gzvm->vm_id);

..

2024-04-15 16:32:34

by Simon Horman

Subject: Re: [PATCH v10 19/21] virt: geniezone: Provide individual VM memory statistics within debugfs

On Fri, Apr 12, 2024 at 02:57:16PM +0800, Yi-De Wu wrote:
> From: "Jerry Wang" <[email protected]>
>
> Created a dedicated per-VM debugfs folder under gzvm, providing
> user-level programs with easy access to per-VM memory statistics for
> debugging and profiling purposes. This enables users to effectively
> analyze and optimize the memory usage of individual virtual machines.
>
> Two types of information can be obtained:
>
> `cat /sys/kernel/debug/gzvm/<pid>-<vmid>/protected_hyp_mem` shows memory
> used by the hypervisor and the size of the stage 2 table in bytes.
>
> `cat /sys/kernel/debug/gzvm/<pid>-<vmid>/protected_shared_mem` gives
> memory used by the shared resources of the guest and host in bytes.
>
> For example:
> console:/ # cat /sys/kernel/debug/gzvm/3417-15/protected_hyp_mem
> 180328
> console:/ # cat /sys/kernel/debug/gzvm/3417-15/protected_shared_mem
> 262144
> console:/ #
>
> More stats will be added in the future.
>
> Signed-off-by: Jerry Wang <[email protected]>
> Signed-off-by: Liju-Clr Chen <[email protected]>
> Signed-off-by: Yi-De Wu <[email protected]>

..

> diff --git a/drivers/virt/geniezone/gzvm_vm.c b/drivers/virt/geniezone/gzvm_vm.c

..

> @@ -398,6 +409,113 @@ static void setup_vm_demand_paging(struct gzvm *vm)
> }
> }
>
> +static int debugfs_open(struct inode *inode, struct file *file)
> +{
> + file->private_data = inode->i_private;
> + return 0;
> +}

nit: Coccinelle suggests that simple_open() can be used in place
of the debugfs_open() implementation above.

..

> +static const struct file_operations hyp_mem_fops = {
> + .owner = THIS_MODULE,
> + .open = debugfs_open,
> + .read = hyp_mem_read,
> + .llseek = no_llseek,
> +};
> +
> +static const struct file_operations shared_mem_fops = {
> + .owner = THIS_MODULE,
> + .open = debugfs_open,
> + .read = shared_mem_read,
> + .llseek = no_llseek,
> +};

..

2024-04-15 17:07:00

by Conor Dooley

Subject: Re: [PATCH v10 03/21] dt-bindings: hypervisor: Add MediaTek GenieZone hypervisor

On Fri, Apr 12, 2024 at 02:57:00PM +0800, Yi-De Wu wrote:
> From: "Yingshiuan Pan" <[email protected]>
>
> Add documentation for GenieZone(gzvm) node. This node informs gzvm
> driver to start probing if geniezone hypervisor is available and
> able to do virtual machine operations.
>
> [Reason to use dt solution]
> - The GenieZone hypervisor serves as a vendor model for facilitating
> platform virtualization, with an implementation that is independent
> from Linuxism.
> - In contrast to the dt solution, our previous approach involved probing
> via hypercall to determine the existence of our hypervisor. However, this
> method raised concerns about potentially impacting all systems, including
> those without the GenieZone hypervisor embedded[ref].
>
> Link: https://lore.kernel.org/all/[email protected]/

> +properties:
> + compatible:
> + const: mediatek,geniezone-hyp

Been avoiding this binding every time it shows up because Rob had
already told you no and hasn't revisited it since, but I feel this
should be s/-hyp// since that's redundant information.

> +description:
> + This interface is designed for integrating GenieZone hypervisor into Android
> + Virtualization Framework(AVF) along with Crosvm as a VMM.
> + It acts like a wrapper for every hypercall to GenieZone hypervisor in
> + order to control guest VM lifecycles and virtual interrupt injections.

The description however doesn't really make sense. The binding claims to
be for geniezone but the description talks about something else entirely
that "acts like a wrapper" between the OS and geniezone. What is the
binding actually for?

