Gunyah is a Type-1 hypervisor: it is independent of any high-level OS
kernel and runs at a higher CPU privilege level. It does not depend on
any lower-privileged OS kernel or code for its core functionality. This
improves security and allows a much smaller trusted computing base than
a Type-2 hypervisor.
Gunyah is an open source hypervisor. The source repo is available at
https://github.com/quic/gunyah-hypervisor.
The diagram below shows the architecture.
::

         VM A                    VM B
     +-----+ +-----+  | +-----+ +-----+ +-----+
     |     | |     |  | |     | |     | |     |
 EL0 | APP | | APP |  | | APP | | APP | | APP |
     |     | |     |  | |     | |     | |     |
     +-----+ +-----+  | +-----+ +-----+ +-----+
 ---------------------|-------------------------
     +--------------+ | +----------------------+
     |              | | |                      |
 EL1 | Linux Kernel | | |Linux kernel/Other OS |   ...
     |              | | |                      |
     +--------------+ | +----------------------+
 --------hvc/smc------|------hvc/smc------------
     +----------------------------------------+
     |                                        |
 EL2 |            Gunyah Hypervisor           |
     |                                        |
     +----------------------------------------+

Gunyah provides the following features:
- Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
physical CPUs and enables time-sharing of the CPUs.
- Memory Management: Gunyah tracks memory ownership and use of all memory
under its control. Memory partitioning between VMs is a fundamental
security feature.
- Interrupt Virtualization: All interrupts are handled in the hypervisor
and routed to the assigned VM.
- Inter-VM Communication: There are several different mechanisms provided
for communicating between VMs.
- Device Virtualization: Para-virtualization of devices is supported using
inter-VM communication. Low level system features and devices such as
interrupt controllers are supported with emulation where required.
This series adds the basic framework for detecting that Linux is running
under Gunyah as a virtual machine, for communicating with the Gunyah Resource
Manager, and a sample virtual machine manager capable of launching virtual machines.
The series relies on two other patches posted separately:
- https://lore.kernel.org/all/[email protected]/
- https://lore.kernel.org/all/[email protected]/
Changes in v13:
- Tweaks to message queue driver to address race condition between IRQ and mailbox registration
- Allow removal of VM functions by function-specific comparison -- specifically to allow
removing irqfd by label only and not requiring original FD to be provided.
Changes in v12: https://lore.kernel.org/all/[email protected]/
- Stylistic/cosmetic tweaks suggested by Alex
- Remove patch "virt: gunyah: Identify hypervisor version" and squash the
check that we're running under a reasonable Gunyah hypervisor into RM driver
- Refactor platform hooks into a separate module per suggestion from Srini
- GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
- enum-ify related constants
Changes in v11: https://lore.kernel.org/all/[email protected]/
- Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow checks for this
- More docstrings throughout
- Make resp_buf and resp_buf_size optional
- Replace deprecated idr with xarray
- Refcounting on misc device instead of RM's platform device
- Renaming variables, structs, etc. from gunyah_ -> gh_
- Drop removal of user mem regions
- Drop mem_lend functionality; to converge with restricted_memfd later
Changes in v10: https://lore.kernel.org/all/[email protected]/
- Fix bisectability (end result of series is same, --fixups applied to wrong commits)
- Convert GH_ERROR_* and GH_RM_ERROR_* to enums
- Correct race condition between allocating/freeing user memory
- Replace offsetof with struct_size
- Series-wide renaming of functions to be more consistent
- VM shutdown & restart support added in vCPU and VM Manager patches
- Convert VM function name (string) to type (number)
- Convert VM function argument to value (which could be a pointer) to remove memory wastage for arguments
- Remove defensive checks of hypervisor correctness
- Clean ups to ioeventfd as suggested by Srivatsa
Changes in v9: https://lore.kernel.org/all/[email protected]/
- Refactor Gunyah API flags to be exposed as feature flags at kernel level
- Move mbox client cleanup into gunyah_msgq_remove()
- Simplify gh_rm_call return value and response payload
- Missing clean-up/error handling/little endian fixes as suggested by Srivatsa and Alex in v8 series
Changes in v8: https://lore.kernel.org/all/[email protected]/
- Treat VM manager as a library of RM
- Add patches 21-28 as RFC to support proxy-scheduled vCPUs and necessary bits to support virtio
from Gunyah userspace
Changes in v7: https://lore.kernel.org/all/[email protected]/
- Refactor to remove gunyah RM bus
- Refactor to allow multiple RM device instances
- Bump UAPI to start at 0x0
- Refactor QCOM SCM's platform hooks to allow CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations
Changes in v6: https://lore.kernel.org/all/[email protected]/
- *Replace gunyah-console with gunyah VM Manager*
- Move include/asm-generic/gunyah.h into include/linux/gunyah.h
- s/gunyah_msgq/gh_msgq/
- Minor tweaks and documentation tidying based on comments from Jiri, Greg, Arnd, Dmitry, and Bagas.
Changes in v5: https://lore.kernel.org/all/[email protected]/
- Dropped sysfs nodes
- Switch from aux bus to Gunyah RM bus for the subdevices
- Cleaning up RM console
Changes in v4: https://lore.kernel.org/all/[email protected]/
- Tidied up documentation throughout based on questions/feedback received
- Switched message queue implementation to use mailboxes
- Renamed "gunyah_device" as "gunyah_resource"
Changes in v3: https://lore.kernel.org/all/[email protected]/
- /Maintained/Supported/ in MAINTAINERS
- Tidied up documentation throughout based on questions/feedback received
- Moved hypercalls into arch/arm64/gunyah/; following hyper-v's implementation
- Drop opaque typedefs
- Move sysfs nodes under /sys/hypervisor/gunyah/
- Moved Gunyah console driver to drivers/tty/
- Reworked gh_device design to drop the Gunyah bus.
Changes in v2: https://lore.kernel.org/all/[email protected]/
- DT bindings clean up
- Switch hypercalls to follow SMCCC
v1: https://lore.kernel.org/all/[email protected]/
Elliot Berman (24):
dt-bindings: Add binding for gunyah hypervisor
gunyah: Common types and error codes for Gunyah hypercalls
virt: gunyah: Add hypercalls to identify Gunyah
virt: gunyah: msgq: Add hypercalls to send and receive messages
mailbox: Add Gunyah message queue mailbox
gunyah: rsc_mgr: Add resource manager RPC core
gunyah: rsc_mgr: Add VM lifecycle RPC
gunyah: vm_mgr: Introduce basic VM Manager
gunyah: rsc_mgr: Add RPC for sharing memory
gunyah: vm_mgr: Add/remove user memory regions
gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
samples: Add sample userspace Gunyah VM Manager
gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
virt: gunyah: Add Qualcomm Gunyah platform ops
docs: gunyah: Document Gunyah VM Manager
virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
gunyah: vm_mgr: Add framework for VM Functions
virt: gunyah: Add resource tickets
virt: gunyah: Add IO handlers
virt: gunyah: Add proxy-scheduled vCPUs
virt: gunyah: Add hypercalls for sending doorbell
virt: gunyah: Add irqfd interface
virt: gunyah: Add ioeventfd
MAINTAINERS: Add Gunyah hypervisor drivers section
.../bindings/firmware/gunyah-hypervisor.yaml | 82 ++
.../userspace-api/ioctl/ioctl-number.rst | 1 +
Documentation/virt/gunyah/index.rst | 1 +
Documentation/virt/gunyah/message-queue.rst | 8 +
Documentation/virt/gunyah/vm-manager.rst | 142 +++
MAINTAINERS | 13 +
arch/arm64/Kbuild | 1 +
arch/arm64/gunyah/Makefile | 3 +
arch/arm64/gunyah/gunyah_hypercall.c | 140 +++
arch/arm64/include/asm/gunyah.h | 24 +
drivers/mailbox/Makefile | 2 +
drivers/mailbox/gunyah-msgq.c | 212 ++++
drivers/virt/Kconfig | 2 +
drivers/virt/Makefile | 1 +
drivers/virt/gunyah/Kconfig | 59 ++
drivers/virt/gunyah/Makefile | 11 +
drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++
drivers/virt/gunyah/gunyah_irqfd.c | 180 ++++
drivers/virt/gunyah/gunyah_platform_hooks.c | 80 ++
drivers/virt/gunyah/gunyah_qcom.c | 147 +++
drivers/virt/gunyah/gunyah_vcpu.c | 468 +++++++++
drivers/virt/gunyah/rsc_mgr.c | 910 ++++++++++++++++++
drivers/virt/gunyah/rsc_mgr.h | 19 +
drivers/virt/gunyah/rsc_mgr_rpc.c | 500 ++++++++++
drivers/virt/gunyah/vm_mgr.c | 794 +++++++++++++++
drivers/virt/gunyah/vm_mgr.h | 70 ++
drivers/virt/gunyah/vm_mgr_mm.c | 256 +++++
include/linux/gunyah.h | 207 ++++
include/linux/gunyah_rsc_mgr.h | 162 ++++
include/linux/gunyah_vm_mgr.h | 126 +++
include/uapi/linux/gunyah.h | 293 ++++++
samples/Kconfig | 10 +
samples/Makefile | 1 +
samples/gunyah/.gitignore | 2 +
samples/gunyah/Makefile | 6 +
samples/gunyah/gunyah_vmm.c | 270 ++++++
samples/gunyah/sample_vm.dts | 68 ++
37 files changed, 5401 insertions(+)
create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
create mode 100644 Documentation/virt/gunyah/vm-manager.rst
create mode 100644 arch/arm64/gunyah/Makefile
create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
create mode 100644 arch/arm64/include/asm/gunyah.h
create mode 100644 drivers/mailbox/gunyah-msgq.c
create mode 100644 drivers/virt/gunyah/Kconfig
create mode 100644 drivers/virt/gunyah/Makefile
create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
create mode 100644 drivers/virt/gunyah/rsc_mgr.c
create mode 100644 drivers/virt/gunyah/rsc_mgr.h
create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
create mode 100644 drivers/virt/gunyah/vm_mgr.c
create mode 100644 drivers/virt/gunyah/vm_mgr.h
create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
create mode 100644 include/linux/gunyah.h
create mode 100644 include/linux/gunyah_rsc_mgr.h
create mode 100644 include/linux/gunyah_vm_mgr.h
create mode 100644 include/uapi/linux/gunyah.h
create mode 100644 samples/gunyah/.gitignore
create mode 100644 samples/gunyah/Makefile
create mode 100644 samples/gunyah/gunyah_vmm.c
create mode 100644 samples/gunyah/sample_vm.dts
base-commit: c8c655c34e33544aec9d64b660872ab33c29b5f1
prerequisite-patch-id: b48c45acdec06adf37e09fe35e6a9412c5784800
prerequisite-patch-id: bc27499c7652385c584424529edbc5781c074d68
--
2.40.0
Add hypercalls to identify when Linux is running as a virtual machine
under Gunyah.
There are two calls to help identify Gunyah:
1. arch_is_gh_guest() issues the SMCCC vendor hypervisor UID call and
reports whether the returned UID matches Gunyah's.
2. gh_hypercall_hyp_identify() returns build information and a set of
feature flags that are supported by Gunyah.
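As an illustration of how the identify response could be consumed, the
user-space sketch below decodes an api_info word using the bit layout
defined in include/linux/gunyah.h in this patch (API version in bits 13:0,
big-endian flag in bit 14, 64-bit flag in bit 15, variant in bits 63:56).
struct gh_api_info, gh_decode_api_info() and the API_* macro names are
hypothetical and exist only for this example; kernel code would use
FIELD_GET() with the GH_API_INFO_* masks instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout mirrors the GH_API_INFO_* masks added by this patch. */
#define API_VERSION_MASK	0x3fffULL	/* bits 13:0 */
#define API_BIG_ENDIAN		(1ULL << 14)	/* bit 14 */
#define API_IS_64BIT		(1ULL << 15)	/* bit 15 */
#define API_VARIANT_SHIFT	56		/* bits 63:56 */

/* Hypothetical decoded view of api_info; not part of the series. */
struct gh_api_info {
	uint16_t version;
	bool big_endian;
	bool is_64bit;
	uint8_t variant;
};

/* Split a raw api_info word (res.a0 of the identify call) into fields. */
static struct gh_api_info gh_decode_api_info(uint64_t api_info)
{
	struct gh_api_info info = {
		.version = (uint16_t)(api_info & API_VERSION_MASK),
		.big_endian = (api_info & API_BIG_ENDIAN) != 0,
		.is_64bit = (api_info & API_IS_64BIT) != 0,
		.variant = (uint8_t)(api_info >> API_VARIANT_SHIFT),
	};

	return info;
}
```

A caller could, for example, refuse to bind when the decoded version is
not GH_API_V1 (1).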
Reviewed-by: Srinivas Kandagatla <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
arch/arm64/Kbuild | 1 +
arch/arm64/gunyah/Makefile | 3 ++
arch/arm64/gunyah/gunyah_hypercall.c | 56 ++++++++++++++++++++++++++++
drivers/virt/Kconfig | 2 +
drivers/virt/gunyah/Kconfig | 13 +++++++
include/linux/gunyah.h | 31 +++++++++++++++
6 files changed, 106 insertions(+)
create mode 100644 arch/arm64/gunyah/Makefile
create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
create mode 100644 drivers/virt/gunyah/Kconfig
diff --git a/arch/arm64/Kbuild b/arch/arm64/Kbuild
index 5bfbf7d79c99..e4847ba0e3c9 100644
--- a/arch/arm64/Kbuild
+++ b/arch/arm64/Kbuild
@@ -3,6 +3,7 @@ obj-y += kernel/ mm/ net/
obj-$(CONFIG_KVM) += kvm/
obj-$(CONFIG_XEN) += xen/
obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/
+obj-$(CONFIG_GUNYAH) += gunyah/
obj-$(CONFIG_CRYPTO) += crypto/
# for cleaning
diff --git a/arch/arm64/gunyah/Makefile b/arch/arm64/gunyah/Makefile
new file mode 100644
index 000000000000..84f1e38cafb1
--- /dev/null
+++ b/arch/arm64/gunyah/Makefile
@@ -0,0 +1,3 @@
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_GUNYAH) += gunyah_hypercall.o
diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
new file mode 100644
index 000000000000..2166d5dab869
--- /dev/null
+++ b/arch/arm64/gunyah/gunyah_hypercall.c
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/module.h>
+#include <linux/gunyah.h>
+#include <linux/uuid.h>
+
+/* {c1d58fcd-a453-5fdb-9265-ce36673d5f14} */
+static const uuid_t GUNYAH_UUID =
+ UUID_INIT(0xc1d58fcd, 0xa453, 0x5fdb, 0x92, 0x65, 0xce, 0x36, 0x67, 0x3d, 0x5f, 0x14);
+
+bool arch_is_gh_guest(void)
+{
+ struct arm_smccc_res res;
+ uuid_t uuid;
+
+ arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
+
+ ((u32 *)&uuid.b[0])[0] = lower_32_bits(res.a0);
+ ((u32 *)&uuid.b[0])[1] = lower_32_bits(res.a1);
+ ((u32 *)&uuid.b[0])[2] = lower_32_bits(res.a2);
+ ((u32 *)&uuid.b[0])[3] = lower_32_bits(res.a3);
+
+ return uuid_equal(&uuid, &GUNYAH_UUID);
+}
+EXPORT_SYMBOL_GPL(arch_is_gh_guest);
+
+#define GH_HYPERCALL(fn) ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_64, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, \
+ fn)
+
+#define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
+
+/**
+ * gh_hypercall_hyp_identify() - Returns build information and feature flags
+ * supported by Gunyah.
+ * @hyp_identity: filled by the hypercall with the API info and feature flags.
+ */
+void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_hvc(GH_HYPERCALL_HYP_IDENTIFY, &res);
+
+ hyp_identity->api_info = res.a0;
+ hyp_identity->flags[0] = res.a1;
+ hyp_identity->flags[1] = res.a2;
+ hyp_identity->flags[2] = res.a3;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
diff --git a/drivers/virt/Kconfig b/drivers/virt/Kconfig
index f79ab13a5c28..85bd6626ffc9 100644
--- a/drivers/virt/Kconfig
+++ b/drivers/virt/Kconfig
@@ -54,4 +54,6 @@ source "drivers/virt/coco/sev-guest/Kconfig"
source "drivers/virt/coco/tdx-guest/Kconfig"
+source "drivers/virt/gunyah/Kconfig"
+
endif
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
new file mode 100644
index 000000000000..1a737694c333
--- /dev/null
+++ b/drivers/virt/gunyah/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+config GUNYAH
+ tristate "Gunyah Virtualization drivers"
+ depends on ARM64
+ depends on MAILBOX
+ help
+ The Gunyah drivers are the helper interfaces that run in a guest VM
+ such as basic inter-VM IPC and signaling mechanisms, and higher level
+ services such as memory/device sharing, IRQ sharing, and so on.
+
+ Say Y/M here to enable the drivers needed to interact in a Gunyah
+ virtual environment.
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index a4e8ec91961d..6b36cf4787ef 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -6,8 +6,10 @@
#ifndef _LINUX_GUNYAH_H
#define _LINUX_GUNYAH_H
+#include <linux/bitfield.h>
#include <linux/errno.h>
#include <linux/limits.h>
+#include <linux/types.h>
/******************************************************************************/
/* Common arch-independent definitions for Gunyah hypercalls */
@@ -80,4 +82,33 @@ static inline int gh_error_remap(enum gh_error gh_error)
}
}
+enum gh_api_feature {
+ GH_FEATURE_DOORBELL = 1,
+ GH_FEATURE_MSGQUEUE = 2,
+ GH_FEATURE_VCPU = 5,
+ GH_FEATURE_MEMEXTENT = 6,
+};
+
+bool arch_is_gh_guest(void);
+
+#define GH_API_V1 1
+
+/* Other bits reserved for future use and will be zero */
+#define GH_API_INFO_API_VERSION_MASK GENMASK_ULL(13, 0)
+#define GH_API_INFO_BIG_ENDIAN BIT_ULL(14)
+#define GH_API_INFO_IS_64BIT BIT_ULL(15)
+#define GH_API_INFO_VARIANT_MASK GENMASK_ULL(63, 56)
+
+struct gh_hypercall_hyp_identify_resp {
+ u64 api_info;
+ u64 flags[3];
+};
+
+static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh_api)
+{
+ return FIELD_GET(GH_API_INFO_API_VERSION_MASK, gh_api->api_info);
+}
+
+void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
+
#endif
--
2.40.0
Add architecture-independent standard error codes, types, and macros for
Gunyah hypercalls.
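A caller would typically translate the raw hypercall status into an errno
before handing it to the rest of the kernel. The sketch below mirrors a
subset of the gh_error values and of the gh_error_remap() mapping from this
patch in plain C so that it compiles in user space. fake_msgq_send() and
msgq_send() are hypothetical stand-ins for a real hypercall wrapper and its
caller.

```c
#include <errno.h>

/* Subset of enum gh_error from this patch. */
enum gh_error {
	GH_ERROR_OK		= 0,
	GH_ERROR_ARG_INVAL	= 1,
	GH_ERROR_NOMEM		= 10,
	GH_ERROR_DENIED		= 30,
	GH_ERROR_BUSY		= 31,
	GH_ERROR_MSGQUEUE_FULL	= 61,
};

/* Same mapping as gh_error_remap(), restricted to the subset above. */
static int gh_error_remap(enum gh_error gh_error)
{
	switch (gh_error) {
	case GH_ERROR_OK:
		return 0;
	case GH_ERROR_NOMEM:
		return -ENOMEM;
	case GH_ERROR_DENIED:
		return -EACCES;
	case GH_ERROR_BUSY:
		return -EBUSY;
	case GH_ERROR_MSGQUEUE_FULL:
		return -EIO;
	default:
		return -EINVAL;
	}
}

/* Hypothetical hypercall stand-in: fails once the queue is full. */
static enum gh_error fake_msgq_send(int queued, int depth)
{
	return queued >= depth ? GH_ERROR_MSGQUEUE_FULL : GH_ERROR_OK;
}

/* Caller pattern: issue the hypercall, then translate its status. */
static int msgq_send(int queued, int depth)
{
	return gh_error_remap(fake_msgq_send(queued, depth));
}
```

Values without an explicit case, such as GH_ERROR_ARG_INVAL, fall through
to -EINVAL, as they do in the patch.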
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
include/linux/gunyah.h | 83 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 83 insertions(+)
create mode 100644 include/linux/gunyah.h
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
new file mode 100644
index 000000000000..a4e8ec91961d
--- /dev/null
+++ b/include/linux/gunyah.h
@@ -0,0 +1,83 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _LINUX_GUNYAH_H
+#define _LINUX_GUNYAH_H
+
+#include <linux/errno.h>
+#include <linux/limits.h>
+
+/******************************************************************************/
+/* Common arch-independent definitions for Gunyah hypercalls */
+#define GH_CAPID_INVAL U64_MAX
+#define GH_VMID_ROOT_VM 0xff
+
+enum gh_error {
+ GH_ERROR_OK = 0,
+ GH_ERROR_UNIMPLEMENTED = -1,
+ GH_ERROR_RETRY = -2,
+
+ GH_ERROR_ARG_INVAL = 1,
+ GH_ERROR_ARG_SIZE = 2,
+ GH_ERROR_ARG_ALIGN = 3,
+
+ GH_ERROR_NOMEM = 10,
+
+ GH_ERROR_ADDR_OVFL = 20,
+ GH_ERROR_ADDR_UNFL = 21,
+ GH_ERROR_ADDR_INVAL = 22,
+
+ GH_ERROR_DENIED = 30,
+ GH_ERROR_BUSY = 31,
+ GH_ERROR_IDLE = 32,
+
+ GH_ERROR_IRQ_BOUND = 40,
+ GH_ERROR_IRQ_UNBOUND = 41,
+
+ GH_ERROR_CSPACE_CAP_NULL = 50,
+ GH_ERROR_CSPACE_CAP_REVOKED = 51,
+ GH_ERROR_CSPACE_WRONG_OBJ_TYPE = 52,
+ GH_ERROR_CSPACE_INSUF_RIGHTS = 53,
+ GH_ERROR_CSPACE_FULL = 54,
+
+ GH_ERROR_MSGQUEUE_EMPTY = 60,
+ GH_ERROR_MSGQUEUE_FULL = 61,
+};
+
+/**
+ * gh_error_remap() - Remap Gunyah hypervisor errors into a Linux error code
+ * @gh_error: Gunyah hypercall return value
+ */
+static inline int gh_error_remap(enum gh_error gh_error)
+{
+ switch (gh_error) {
+ case GH_ERROR_OK:
+ return 0;
+ case GH_ERROR_NOMEM:
+ return -ENOMEM;
+ case GH_ERROR_DENIED:
+ case GH_ERROR_CSPACE_CAP_NULL:
+ case GH_ERROR_CSPACE_CAP_REVOKED:
+ case GH_ERROR_CSPACE_WRONG_OBJ_TYPE:
+ case GH_ERROR_CSPACE_INSUF_RIGHTS:
+ case GH_ERROR_CSPACE_FULL:
+ return -EACCES;
+ case GH_ERROR_BUSY:
+ case GH_ERROR_IDLE:
+ return -EBUSY;
+ case GH_ERROR_IRQ_BOUND:
+ case GH_ERROR_IRQ_UNBOUND:
+ case GH_ERROR_MSGQUEUE_FULL:
+ case GH_ERROR_MSGQUEUE_EMPTY:
+ return -EIO;
+ case GH_ERROR_UNIMPLEMENTED:
+ case GH_ERROR_RETRY:
+ return -EOPNOTSUPP;
+ default:
+ return -EINVAL;
+ }
+}
+
+#endif
--
2.40.0
Gunyah VM manager is a kernel module that exposes an interface to
Gunyah userspace to load, run, and interact with other Gunyah virtual
machines. The interface is a character device at /dev/gunyah.
Add a basic VM manager driver. Upcoming patches will add more ioctls
into this driver.
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
.../userspace-api/ioctl/ioctl-number.rst | 1 +
drivers/virt/gunyah/Makefile | 2 +-
drivers/virt/gunyah/rsc_mgr.c | 50 +++++++++-
drivers/virt/gunyah/vm_mgr.c | 93 +++++++++++++++++++
drivers/virt/gunyah/vm_mgr.h | 20 ++++
include/uapi/linux/gunyah.h | 23 +++++
6 files changed, 187 insertions(+), 2 deletions(-)
create mode 100644 drivers/virt/gunyah/vm_mgr.c
create mode 100644 drivers/virt/gunyah/vm_mgr.h
create mode 100644 include/uapi/linux/gunyah.h
diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 176e8fc3f31b..396212e88f7d 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -137,6 +137,7 @@ Code Seq# Include File Comments
'F' DD video/sstfb.h conflict!
'G' 00-3F drivers/misc/sgi-gru/grulib.h conflict!
'G' 00-0F xen/gntalloc.h, xen/gntdev.h conflict!
+'G' 00-0f linux/gunyah.h conflict!
'H' 00-7F linux/hiddev.h conflict!
'H' 00-0F linux/hidraw.h conflict!
'H' 01 linux/mei.h conflict!
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index 241bab357b86..e47e25895299 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-gunyah-y += rsc_mgr.o rsc_mgr_rpc.o
+gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o
obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
index 88b5beb1ea51..4f6f96bdcf3d 100644
--- a/drivers/virt/gunyah/rsc_mgr.c
+++ b/drivers/virt/gunyah/rsc_mgr.c
@@ -15,8 +15,10 @@
#include <linux/completion.h>
#include <linux/gunyah_rsc_mgr.h>
#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
#include "rsc_mgr.h"
+#include "vm_mgr.h"
#define RM_RPC_API_VERSION_MASK GENMASK(3, 0)
#define RM_RPC_HEADER_WORDS_MASK GENMASK(7, 4)
@@ -130,6 +132,7 @@ struct gh_rm_connection {
* @cache: cache for allocating Tx messages
* @send_lock: synchronization to allow only one request to be sent at a time
* @nh: notifier chain for clients interested in RM notification messages
+ * @miscdev: /dev/gunyah
*/
struct gh_rm {
struct device *dev;
@@ -146,6 +149,8 @@ struct gh_rm {
struct kmem_cache *cache;
struct mutex send_lock;
struct blocking_notifier_head nh;
+
+ struct miscdevice miscdev;
};
/**
@@ -581,6 +586,33 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
}
EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
+struct device *gh_rm_get(struct gh_rm *rm)
+{
+ return get_device(rm->miscdev.this_device);
+}
+EXPORT_SYMBOL_GPL(gh_rm_get);
+
+void gh_rm_put(struct gh_rm *rm)
+{
+ put_device(rm->miscdev.this_device);
+}
+EXPORT_SYMBOL_GPL(gh_rm_put);
+
+static long gh_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ struct miscdevice *miscdev = filp->private_data;
+ struct gh_rm *rm = container_of(miscdev, struct gh_rm, miscdev);
+
+ return gh_dev_vm_mgr_ioctl(rm, cmd, arg);
+}
+
+static const struct file_operations gh_dev_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = gh_dev_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
+ .llseek = noop_llseek,
+};
+
static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
struct gh_resource *ghrsc)
{
@@ -665,7 +697,22 @@ static int gh_rm_drv_probe(struct platform_device *pdev)
rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
- return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
+ ret = gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
+ if (ret)
+ goto err_cache;
+
+ rm->miscdev.name = "gunyah";
+ rm->miscdev.minor = MISC_DYNAMIC_MINOR;
+ rm->miscdev.fops = &gh_dev_fops;
+
+ ret = misc_register(&rm->miscdev);
+ if (ret)
+ goto err_msgq;
+
+ return 0;
+err_msgq:
+ mbox_free_channel(gh_msgq_chan(&rm->msgq));
+ gh_msgq_remove(&rm->msgq);
err_cache:
kmem_cache_destroy(rm->cache);
return ret;
@@ -675,6 +722,7 @@ static int gh_rm_drv_remove(struct platform_device *pdev)
{
struct gh_rm *rm = platform_get_drvdata(pdev);
+ misc_deregister(&rm->miscdev);
mbox_free_channel(gh_msgq_chan(&rm->msgq));
gh_msgq_remove(&rm->msgq);
kmem_cache_destroy(rm->cache);
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
new file mode 100644
index 000000000000..a43401cb34f7
--- /dev/null
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -0,0 +1,93 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "gh_vm_mgr: " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+
+#include <uapi/linux/gunyah.h>
+
+#include "vm_mgr.h"
+
+static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
+{
+ struct gh_vm *ghvm;
+
+ ghvm = kzalloc(sizeof(*ghvm), GFP_KERNEL);
+ if (!ghvm)
+ return ERR_PTR(-ENOMEM);
+
+ ghvm->parent = gh_rm_get(rm);
+ ghvm->rm = rm;
+
+ return ghvm;
+}
+
+static int gh_vm_release(struct inode *inode, struct file *filp)
+{
+ struct gh_vm *ghvm = filp->private_data;
+
+ gh_rm_put(ghvm->rm);
+ kfree(ghvm);
+ return 0;
+}
+
+static const struct file_operations gh_vm_fops = {
+ .owner = THIS_MODULE,
+ .release = gh_vm_release,
+ .llseek = noop_llseek,
+};
+
+static long gh_dev_ioctl_create_vm(struct gh_rm *rm, unsigned long arg)
+{
+ struct gh_vm *ghvm;
+ struct file *file;
+ int fd, err;
+
+ /* arg reserved for future use. */
+ if (arg)
+ return -EINVAL;
+
+ ghvm = gh_vm_alloc(rm);
+ if (IS_ERR(ghvm))
+ return PTR_ERR(ghvm);
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ err = fd;
+ goto err_destroy_vm;
+ }
+
+ file = anon_inode_getfile("gunyah-vm", &gh_vm_fops, ghvm, O_RDWR);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto err_put_fd;
+ }
+
+ fd_install(fd, file);
+
+ return fd;
+
+err_put_fd:
+ put_unused_fd(fd);
+err_destroy_vm:
+ gh_rm_put(ghvm->rm);
+ kfree(ghvm);
+ return err;
+}
+
+long gh_dev_vm_mgr_ioctl(struct gh_rm *rm, unsigned int cmd, unsigned long arg)
+{
+ switch (cmd) {
+ case GH_CREATE_VM:
+ return gh_dev_ioctl_create_vm(rm, arg);
+ default:
+ return -ENOTTY;
+ }
+}
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
new file mode 100644
index 000000000000..1e94b58d7d34
--- /dev/null
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _GH_VM_MGR_H
+#define _GH_VM_MGR_H
+
+#include <linux/gunyah_rsc_mgr.h>
+
+#include <uapi/linux/gunyah.h>
+
+long gh_dev_vm_mgr_ioctl(struct gh_rm *rm, unsigned int cmd, unsigned long arg);
+
+struct gh_vm {
+ struct gh_rm *rm;
+ struct device *parent;
+};
+
+#endif
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
new file mode 100644
index 000000000000..86b9cb60118d
--- /dev/null
+++ b/include/uapi/linux/gunyah.h
@@ -0,0 +1,23 @@
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _UAPI_LINUX_GUNYAH_H
+#define _UAPI_LINUX_GUNYAH_H
+
+/*
+ * Userspace interface for /dev/gunyah - gunyah based virtual machine
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define GH_IOCTL_TYPE 'G'
+
+/*
+ * ioctls for /dev/gunyah fds:
+ */
+#define GH_CREATE_VM _IO(GH_IOCTL_TYPE, 0x0) /* Returns a Gunyah VM fd */
+
+#endif
--
2.40.0
Add remaining ioctls to support non-proxy VM boot:
- Gunyah Resource Manager uses the VM's devicetree to configure the
virtual machine. The location of the devicetree in the guest's
virtual memory can be declared via the SET_DTB_CONFIG ioctl.
- Trigger start of the virtual machine with VM_START ioctl.
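The way gh_vm_start() turns the SET_DTB_CONFIG values into an offset can be
sketched in user space as follows. The offset is computed as in the patch
(DTB guest physical address minus the containing region's guest physical
address). The requirement that the DTB lie entirely inside one region is an
assumption made for this example, since the matching rules of
gh_vm_mem_find_by_addr() are not part of this patch. struct region,
struct dtb_config and dtb_offset_in_region() are hypothetical names, and
__builtin_add_overflow() is a GCC/Clang builtin.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a shared memory mapping and the DTB config. */
struct region {
	uint64_t guest_phys_addr;
	uint64_t size;
};

struct dtb_config {
	uint64_t guest_phys_addr;
	uint64_t size;
};

/*
 * Return true and store the DTB offset within @r if the DTB range is fully
 * contained in the region; return false on overflow or if it lies outside.
 */
static bool dtb_offset_in_region(const struct region *r,
				 const struct dtb_config *d,
				 uint64_t *offset)
{
	uint64_t dtb_end, region_end;

	/* Reject ranges whose end wraps around the 64-bit address space. */
	if (__builtin_add_overflow(d->guest_phys_addr, d->size, &dtb_end))
		return false;
	if (__builtin_add_overflow(r->guest_phys_addr, r->size, &region_end))
		return false;

	if (d->guest_phys_addr < r->guest_phys_addr || dtb_end > region_end)
		return false;

	*offset = d->guest_phys_addr - r->guest_phys_addr;
	return true;
}
```

The resulting offset corresponds to the dtb_offset argument that the patch
passes to gh_rm_vm_configure().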
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/vm_mgr.c | 215 ++++++++++++++++++++++++++++++++
drivers/virt/gunyah/vm_mgr.h | 11 ++
drivers/virt/gunyah/vm_mgr_mm.c | 20 +++
include/uapi/linux/gunyah.h | 15 +++
4 files changed, 261 insertions(+)
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index 297427952b8c..a800061f56bf 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -17,6 +17,68 @@
static void gh_vm_free(struct work_struct *work);
+static int gh_vm_rm_notification_status(struct gh_vm *ghvm, void *data)
+{
+ struct gh_rm_vm_status_payload *payload = data;
+
+ if (le16_to_cpu(payload->vmid) != ghvm->vmid)
+ return NOTIFY_OK;
+
+ /* All other state transitions are synchronous to a corresponding RM call */
+ if (payload->vm_status == GH_RM_VM_STATUS_RESET) {
+ down_write(&ghvm->status_lock);
+ ghvm->vm_status = payload->vm_status;
+ up_write(&ghvm->status_lock);
+ wake_up(&ghvm->vm_status_wait);
+ }
+
+ return NOTIFY_DONE;
+}
+
+static int gh_vm_rm_notification_exited(struct gh_vm *ghvm, void *data)
+{
+ struct gh_rm_vm_exited_payload *payload = data;
+
+ if (le16_to_cpu(payload->vmid) != ghvm->vmid)
+ return NOTIFY_OK;
+
+ down_write(&ghvm->status_lock);
+ ghvm->vm_status = GH_RM_VM_STATUS_EXITED;
+ up_write(&ghvm->status_lock);
+ wake_up(&ghvm->vm_status_wait);
+
+ return NOTIFY_DONE;
+}
+
+static int gh_vm_rm_notification(struct notifier_block *nb, unsigned long action, void *data)
+{
+ struct gh_vm *ghvm = container_of(nb, struct gh_vm, nb);
+
+ switch (action) {
+ case GH_RM_NOTIFICATION_VM_STATUS:
+ return gh_vm_rm_notification_status(ghvm, data);
+ case GH_RM_NOTIFICATION_VM_EXITED:
+ return gh_vm_rm_notification_exited(ghvm, data);
+ default:
+ return NOTIFY_OK;
+ }
+}
+
+static void gh_vm_stop(struct gh_vm *ghvm)
+{
+ int ret;
+
+ down_write(&ghvm->status_lock);
+ if (ghvm->vm_status == GH_RM_VM_STATUS_RUNNING) {
+ ret = gh_rm_vm_stop(ghvm->rm, ghvm->vmid);
+ if (ret)
+ dev_warn(ghvm->parent, "Failed to stop VM: %d\n", ret);
+ }
+ up_write(&ghvm->status_lock);
+
+ wait_event(ghvm->vm_status_wait, ghvm->vm_status == GH_RM_VM_STATUS_EXITED);
+}
+
static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
{
struct gh_vm *ghvm;
@@ -26,17 +88,130 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
return ERR_PTR(-ENOMEM);
ghvm->parent = gh_rm_get(rm);
+ ghvm->vmid = GH_VMID_INVAL;
ghvm->rm = rm;
mmgrab(current->mm);
ghvm->mm = current->mm;
mutex_init(&ghvm->mm_lock);
INIT_LIST_HEAD(&ghvm->memory_mappings);
+ init_rwsem(&ghvm->status_lock);
+ init_waitqueue_head(&ghvm->vm_status_wait);
INIT_WORK(&ghvm->free_work, gh_vm_free);
+ ghvm->vm_status = GH_RM_VM_STATUS_NO_STATE;
return ghvm;
}
+static int gh_vm_start(struct gh_vm *ghvm)
+{
+ struct gh_vm_mem *mapping;
+ u64 dtb_offset;
+ u32 mem_handle;
+ int ret;
+
+ down_write(&ghvm->status_lock);
+ if (ghvm->vm_status != GH_RM_VM_STATUS_NO_STATE) {
+ up_write(&ghvm->status_lock);
+ return 0;
+ }
+
+ ghvm->nb.notifier_call = gh_vm_rm_notification;
+ ret = gh_rm_notifier_register(ghvm->rm, &ghvm->nb);
+ if (ret)
+ goto err;
+
+ ret = gh_rm_alloc_vmid(ghvm->rm, 0);
+ if (ret < 0) {
+ gh_rm_notifier_unregister(ghvm->rm, &ghvm->nb);
+ goto err;
+ }
+ ghvm->vmid = ret;
+ ghvm->vm_status = GH_RM_VM_STATUS_LOAD;
+
+ mutex_lock(&ghvm->mm_lock);
+ list_for_each_entry(mapping, &ghvm->memory_mappings, list) {
+ mapping->parcel.acl_entries[0].vmid = cpu_to_le16(ghvm->vmid);
+ ret = gh_rm_mem_share(ghvm->rm, &mapping->parcel);
+ if (ret) {
+ dev_warn(ghvm->parent, "Failed to share parcel %d: %d\n",
+ mapping->parcel.label, ret);
+ mutex_unlock(&ghvm->mm_lock);
+ goto err;
+ }
+ }
+ mutex_unlock(&ghvm->mm_lock);
+
+ mapping = gh_vm_mem_find_by_addr(ghvm, ghvm->dtb_config.guest_phys_addr,
+ ghvm->dtb_config.size);
+ if (!mapping) {
+ dev_warn(ghvm->parent, "Failed to find the memory_handle for DTB\n");
+ ret = -EINVAL;
+ goto err;
+ }
+
+ mem_handle = mapping->parcel.mem_handle;
+ dtb_offset = ghvm->dtb_config.guest_phys_addr - mapping->guest_phys_addr;
+
+ ret = gh_rm_vm_configure(ghvm->rm, ghvm->vmid, ghvm->auth, mem_handle,
+ 0, 0, dtb_offset, ghvm->dtb_config.size);
+ if (ret) {
+ dev_warn(ghvm->parent, "Failed to configure VM: %d\n", ret);
+ goto err;
+ }
+
+ ret = gh_rm_vm_init(ghvm->rm, ghvm->vmid);
+ if (ret) {
+ ghvm->vm_status = GH_RM_VM_STATUS_INIT_FAILED;
+ dev_warn(ghvm->parent, "Failed to initialize VM: %d\n", ret);
+ goto err;
+ }
+ ghvm->vm_status = GH_RM_VM_STATUS_READY;
+
+ ret = gh_rm_vm_start(ghvm->rm, ghvm->vmid);
+ if (ret) {
+ dev_warn(ghvm->parent, "Failed to start VM: %d\n", ret);
+ goto err;
+ }
+
+ ghvm->vm_status = GH_RM_VM_STATUS_RUNNING;
+ up_write(&ghvm->status_lock);
+ return ret;
+err:
+ /* gh_vm_free will handle releasing resources and reclaiming memory */
+ up_write(&ghvm->status_lock);
+ return ret;
+}
+
+static int gh_vm_ensure_started(struct gh_vm *ghvm)
+{
+ int ret;
+
+ ret = down_read_interruptible(&ghvm->status_lock);
+ if (ret)
+ return ret;
+
+ /* Unlikely because VM is typically started */
+ if (unlikely(ghvm->vm_status == GH_RM_VM_STATUS_NO_STATE)) {
+ up_read(&ghvm->status_lock);
+ ret = gh_vm_start(ghvm);
+ if (ret)
+ return ret;
+ /** gh_vm_start() is guaranteed to bring status out of
+ * GH_RM_VM_STATUS_LOAD, thus infinitely recursive call is not
+ * possible
+ */
+ return gh_vm_ensure_started(ghvm);
+ }
+
+ /* Unlikely because VM is typically running */
+ if (unlikely(ghvm->vm_status != GH_RM_VM_STATUS_RUNNING))
+ ret = -ENODEV;
+
+ up_read(&ghvm->status_lock);
+ return ret;
+}
+
static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct gh_vm *ghvm = filp->private_data;
@@ -61,6 +236,24 @@ static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
r = gh_vm_mem_alloc(ghvm, &region);
break;
}
+ case GH_VM_SET_DTB_CONFIG: {
+ struct gh_vm_dtb_config dtb_config;
+
+ if (copy_from_user(&dtb_config, argp, sizeof(dtb_config)))
+ return -EFAULT;
+
+ if (overflows_type(dtb_config.guest_phys_addr + dtb_config.size, u64))
+ return -EOVERFLOW;
+
+ ghvm->dtb_config = dtb_config;
+
+ r = 0;
+ break;
+ }
+ case GH_VM_START: {
+ r = gh_vm_ensure_started(ghvm);
+ break;
+ }
default:
r = -ENOTTY;
break;
@@ -72,8 +265,30 @@ static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
static void gh_vm_free(struct work_struct *work)
{
struct gh_vm *ghvm = container_of(work, struct gh_vm, free_work);
+ int ret;
+
+ if (ghvm->vm_status == GH_RM_VM_STATUS_RUNNING)
+ gh_vm_stop(ghvm);
+
+ if (ghvm->vm_status != GH_RM_VM_STATUS_NO_STATE &&
+ ghvm->vm_status != GH_RM_VM_STATUS_LOAD &&
+ ghvm->vm_status != GH_RM_VM_STATUS_RESET) {
+ ret = gh_rm_vm_reset(ghvm->rm, ghvm->vmid);
+ if (ret)
+ dev_err(ghvm->parent, "Failed to reset the vm: %d\n", ret);
+ wait_event(ghvm->vm_status_wait, ghvm->vm_status == GH_RM_VM_STATUS_RESET);
+ }
gh_vm_mem_reclaim(ghvm);
+
+ if (ghvm->vm_status > GH_RM_VM_STATUS_NO_STATE) {
+ gh_rm_notifier_unregister(ghvm->rm, &ghvm->nb);
+
+ ret = gh_rm_dealloc_vmid(ghvm->rm, ghvm->vmid);
+ if (ret)
+ dev_warn(ghvm->parent, "Failed to deallocate vmid: %d\n", ret);
+ }
+
gh_rm_put(ghvm->rm);
mmdrop(ghvm->mm);
kfree(ghvm);
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index 434ef9f662a7..4173bd51f83f 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -10,6 +10,8 @@
#include <linux/list.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
+#include <linux/rwsem.h>
+#include <linux/wait.h>
#include <uapi/linux/gunyah.h>
@@ -31,8 +33,16 @@ struct gh_vm_mem {
};
struct gh_vm {
+ u16 vmid;
struct gh_rm *rm;
struct device *parent;
+ enum gh_rm_vm_auth_mechanism auth;
+ struct gh_vm_dtb_config dtb_config;
+
+ struct notifier_block nb;
+ enum gh_rm_vm_status vm_status;
+ wait_queue_head_t vm_status_wait;
+ struct rw_semaphore status_lock;
struct work_struct free_work;
struct mm_struct *mm; /* userspace tied to this vm */
@@ -42,5 +52,6 @@ struct gh_vm {
int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region);
void gh_vm_mem_reclaim(struct gh_vm *ghvm);
+struct gh_vm_mem *gh_vm_mem_find_by_addr(struct gh_vm *ghvm, u64 guest_phys_addr, u32 size);
#endif
diff --git a/drivers/virt/gunyah/vm_mgr_mm.c b/drivers/virt/gunyah/vm_mgr_mm.c
index 91109bbf36b3..44cb887268a0 100644
--- a/drivers/virt/gunyah/vm_mgr_mm.c
+++ b/drivers/virt/gunyah/vm_mgr_mm.c
@@ -79,6 +79,26 @@ void gh_vm_mem_reclaim(struct gh_vm *ghvm)
mutex_unlock(&ghvm->mm_lock);
}
+struct gh_vm_mem *gh_vm_mem_find_by_addr(struct gh_vm *ghvm, u64 guest_phys_addr, u32 size)
+{
+ struct gh_vm_mem *mapping;
+
+ if (overflows_type(guest_phys_addr + size, u64))
+ return NULL;
+
+ mutex_lock(&ghvm->mm_lock);
+
+ list_for_each_entry(mapping, &ghvm->memory_mappings, list) {
+ if (gh_vm_mem_overlap(mapping, guest_phys_addr, size))
+ goto unlock;
+ }
+
+ mapping = NULL;
+unlock:
+ mutex_unlock(&ghvm->mm_lock);
+ return mapping;
+}
+
int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region)
{
struct gh_vm_mem *mapping, *tmp_mapping;
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index 91d6dd26fcc8..4b63d0b9b8ba 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -57,4 +57,19 @@ struct gh_userspace_memory_region {
#define GH_VM_SET_USER_MEM_REGION _IOW(GH_IOCTL_TYPE, 0x1, \
struct gh_userspace_memory_region)
+/**
+ * struct gh_vm_dtb_config - Set the location of the VM's devicetree blob
+ * @guest_phys_addr: Address of the VM's devicetree in guest memory.
+ * @size: Maximum size of the devicetree including space for overlays.
+ * Resource manager applies an overlay to the DTB and @size should
+ * include room for the overlay. A page of memory is typically plenty.
+ */
+struct gh_vm_dtb_config {
+ __u64 guest_phys_addr;
+ __u64 size;
+};
+#define GH_VM_SET_DTB_CONFIG _IOW(GH_IOCTL_TYPE, 0x2, struct gh_vm_dtb_config)
+
+#define GH_VM_START _IO(GH_IOCTL_TYPE, 0x3)
+
#endif
--
2.40.0
When launching a virtual machine, Gunyah userspace allocates memory for
the guest and informs Gunyah about these memory regions through the
GH_VM_SET_USER_MEM_REGION ioctl.
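As a rough illustration of how the pinned pages of a region end up as memory
parcel entries, the sketch below merges runs of physically contiguous pages
into single entries, mirroring the loop in gh_vm_mem_alloc() in the diff
below. It is a userspace model only: the sketch_* names, the fixed 4 KiB page
size, and the use of plain page frame numbers instead of struct page are
assumptions made for the example, and the zone-device pgmap check is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical stand-in for one memory parcel entry. */
struct sketch_mem_entry {
	uint64_t phys_addr;
	uint64_t size;
};

/*
 * Merge runs of consecutive page frame numbers into single entries.
 * "out" must have room for npages entries (the worst case, when no two
 * pages are adjacent). Returns the number of entries written.
 */
static size_t sketch_coalesce(const uint64_t *pfns, size_t npages,
			      struct sketch_mem_entry *out)
{
	size_t i, j = 0;

	assert(pfns && out);
	if (npages == 0)
		return 0;

	out[0].phys_addr = pfns[0] * SKETCH_PAGE_SIZE;
	out[0].size = SKETCH_PAGE_SIZE;
	for (i = 1; i < npages; i++) {
		if (pfns[i - 1] + 1 == pfns[i]) {
			/* Adjacent to the previous page: grow the current entry. */
			out[j].size += SKETCH_PAGE_SIZE;
		} else {
			/* Gap in physical memory: start a new entry. */
			j++;
			out[j].phys_addr = pfns[i] * SKETCH_PAGE_SIZE;
			out[j].size = SKETCH_PAGE_SIZE;
		}
	}
	return j + 1;
}
```

Fewer entries means a smaller message to the resource manager, which is
presumably why the driver bothers to merge them.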
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/Makefile | 2 +-
drivers/virt/gunyah/vm_mgr.c | 59 +++++++-
drivers/virt/gunyah/vm_mgr.h | 26 ++++
drivers/virt/gunyah/vm_mgr_mm.c | 236 ++++++++++++++++++++++++++++++++
include/uapi/linux/gunyah.h | 37 +++++
5 files changed, 356 insertions(+), 4 deletions(-)
create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index e47e25895299..bacf78b8fa33 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o
+gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index a43401cb34f7..297427952b8c 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -15,6 +15,8 @@
#include "vm_mgr.h"
+static void gh_vm_free(struct work_struct *work);
+
static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
{
struct gh_vm *ghvm;
@@ -26,20 +28,72 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
ghvm->parent = gh_rm_get(rm);
ghvm->rm = rm;
+ mmgrab(current->mm);
+ ghvm->mm = current->mm;
+ mutex_init(&ghvm->mm_lock);
+ INIT_LIST_HEAD(&ghvm->memory_mappings);
+ INIT_WORK(&ghvm->free_work, gh_vm_free);
+
return ghvm;
}
-static int gh_vm_release(struct inode *inode, struct file *filp)
+static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
struct gh_vm *ghvm = filp->private_data;
+ void __user *argp = (void __user *)arg;
+ long r;
+
+ switch (cmd) {
+ case GH_VM_SET_USER_MEM_REGION: {
+ struct gh_userspace_memory_region region;
+
+ /* only allow owner task to add memory */
+ if (ghvm->mm != current->mm)
+ return -EPERM;
+
+ if (copy_from_user(&region, argp, sizeof(region)))
+ return -EFAULT;
+
+ /* All other flag bits are reserved for future use */
+ if (region.flags & ~(GH_MEM_ALLOW_READ | GH_MEM_ALLOW_WRITE | GH_MEM_ALLOW_EXEC))
+ return -EINVAL;
+
+ r = gh_vm_mem_alloc(ghvm, &region);
+ break;
+ }
+ default:
+ r = -ENOTTY;
+ break;
+ }
+ return r;
+}
+
+static void gh_vm_free(struct work_struct *work)
+{
+ struct gh_vm *ghvm = container_of(work, struct gh_vm, free_work);
+
+ gh_vm_mem_reclaim(ghvm);
gh_rm_put(ghvm->rm);
+ mmdrop(ghvm->mm);
kfree(ghvm);
+}
+
+static int gh_vm_release(struct inode *inode, struct file *filp)
+{
+ struct gh_vm *ghvm = filp->private_data;
+
+ /* VM will be reset and make RM calls which can sleep interruptibly.
+ * Defer to a work item so this thread can receive signals.
+ */
+ schedule_work(&ghvm->free_work);
return 0;
}
static const struct file_operations gh_vm_fops = {
.owner = THIS_MODULE,
+ .unlocked_ioctl = gh_vm_ioctl,
+ .compat_ioctl = compat_ptr_ioctl,
.release = gh_vm_release,
.llseek = noop_llseek,
};
@@ -77,8 +131,7 @@ static long gh_dev_ioctl_create_vm(struct gh_rm *rm, unsigned long arg)
err_put_fd:
put_unused_fd(fd);
err_destroy_vm:
- gh_rm_put(ghvm->rm);
- kfree(ghvm);
+ gh_vm_free(&ghvm->free_work);
return err;
}
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index 1e94b58d7d34..434ef9f662a7 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -7,14 +7,40 @@
#define _GH_VM_MGR_H
#include <linux/gunyah_rsc_mgr.h>
+#include <linux/list.h>
+#include <linux/miscdevice.h>
+#include <linux/mutex.h>
#include <uapi/linux/gunyah.h>
long gh_dev_vm_mgr_ioctl(struct gh_rm *rm, unsigned int cmd, unsigned long arg);
+enum gh_vm_mem_share_type {
+ VM_MEM_SHARE,
+ VM_MEM_LEND,
+};
+
+struct gh_vm_mem {
+ struct list_head list;
+ enum gh_vm_mem_share_type share_type;
+ struct gh_rm_mem_parcel parcel;
+
+ __u64 guest_phys_addr;
+ struct page **pages;
+ unsigned long npages;
+};
+
struct gh_vm {
struct gh_rm *rm;
struct device *parent;
+
+ struct work_struct free_work;
+ struct mm_struct *mm; /* userspace tied to this vm */
+ struct mutex mm_lock;
+ struct list_head memory_mappings;
};
+int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region);
+void gh_vm_mem_reclaim(struct gh_vm *ghvm);
+
#endif
diff --git a/drivers/virt/gunyah/vm_mgr_mm.c b/drivers/virt/gunyah/vm_mgr_mm.c
new file mode 100644
index 000000000000..91109bbf36b3
--- /dev/null
+++ b/drivers/virt/gunyah/vm_mgr_mm.c
@@ -0,0 +1,236 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "gh_vm_mgr: " fmt
+
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/mm.h>
+
+#include <uapi/linux/gunyah.h>
+
+#include "vm_mgr.h"
+
+static bool pages_are_mergeable(struct page *a, struct page *b)
+{
+ if (page_to_pfn(a) + 1 != page_to_pfn(b))
+ return false;
+ if (!zone_device_pages_have_same_pgmap(a, b))
+ return false;
+ return true;
+}
+
+static bool gh_vm_mem_overlap(struct gh_vm_mem *a, u64 addr, u64 size)
+{
+ u64 a_end = a->guest_phys_addr + (a->npages << PAGE_SHIFT);
+ u64 end = addr + size;
+
+ return a->guest_phys_addr < end && addr < a_end;
+}
+
+static struct gh_vm_mem *__gh_vm_mem_find_by_label(struct gh_vm *ghvm, u32 label)
+ __must_hold(&ghvm->mm_lock)
+{
+ struct gh_vm_mem *mapping;
+
+ list_for_each_entry(mapping, &ghvm->memory_mappings, list)
+ if (mapping->parcel.label == label)
+ return mapping;
+
+ return NULL;
+}
+
+static void gh_vm_mem_reclaim_mapping(struct gh_vm *ghvm, struct gh_vm_mem *mapping)
+ __must_hold(&ghvm->mm_lock)
+{
+ int ret = 0;
+
+ if (mapping->parcel.mem_handle != GH_MEM_HANDLE_INVAL) {
+ ret = gh_rm_mem_reclaim(ghvm->rm, &mapping->parcel);
+ if (ret)
+ pr_warn("Failed to reclaim memory parcel for label %d: %d\n",
+ mapping->parcel.label, ret);
+ }
+
+ if (!ret) {
+ unpin_user_pages(mapping->pages, mapping->npages);
+ account_locked_vm(ghvm->mm, mapping->npages, false);
+ }
+
+ kfree(mapping->pages);
+ kfree(mapping->parcel.acl_entries);
+ kfree(mapping->parcel.mem_entries);
+
+ list_del(&mapping->list);
+}
+
+void gh_vm_mem_reclaim(struct gh_vm *ghvm)
+{
+ struct gh_vm_mem *mapping, *tmp;
+
+ mutex_lock(&ghvm->mm_lock);
+
+ list_for_each_entry_safe(mapping, tmp, &ghvm->memory_mappings, list) {
+ gh_vm_mem_reclaim_mapping(ghvm, mapping);
+ kfree(mapping);
+ }
+
+ mutex_unlock(&ghvm->mm_lock);
+}
+
+int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region)
+{
+ struct gh_vm_mem *mapping, *tmp_mapping;
+ struct page *curr_page, *prev_page;
+ struct gh_rm_mem_parcel *parcel;
+ int i, j, pinned, ret = 0;
+ unsigned int gup_flags;
+ size_t entry_size;
+ u16 vmid;
+
+ if (!region->memory_size || !PAGE_ALIGNED(region->memory_size) ||
+ !PAGE_ALIGNED(region->userspace_addr) ||
+ !PAGE_ALIGNED(region->guest_phys_addr))
+ return -EINVAL;
+
+ if (overflows_type(region->guest_phys_addr + region->memory_size, u64))
+ return -EOVERFLOW;
+
+ ret = mutex_lock_interruptible(&ghvm->mm_lock);
+ if (ret)
+ return ret;
+
+ mapping = __gh_vm_mem_find_by_label(ghvm, region->label);
+ if (mapping) {
+ ret = -EEXIST;
+ goto unlock;
+ }
+
+ list_for_each_entry(tmp_mapping, &ghvm->memory_mappings, list) {
+ if (gh_vm_mem_overlap(tmp_mapping, region->guest_phys_addr,
+ region->memory_size)) {
+ ret = -EEXIST;
+ goto unlock;
+ }
+ }
+
+ mapping = kzalloc(sizeof(*mapping), GFP_KERNEL_ACCOUNT);
+ if (!mapping) {
+ ret = -ENOMEM;
+ goto unlock;
+ }
+
+ mapping->guest_phys_addr = region->guest_phys_addr;
+ mapping->npages = region->memory_size >> PAGE_SHIFT;
+ parcel = &mapping->parcel;
+ parcel->label = region->label;
+ parcel->mem_handle = GH_MEM_HANDLE_INVAL; /* to be filled later by mem_share/mem_lend */
+ parcel->mem_type = GH_RM_MEM_TYPE_NORMAL;
+
+ ret = account_locked_vm(ghvm->mm, mapping->npages, true);
+ if (ret)
+ goto free_mapping;
+
+ mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
+ if (!mapping->pages) {
+ ret = -ENOMEM;
+ mapping->npages = 0; /* update npages for reclaim */
+ goto unlock_pages;
+ }
+
+ gup_flags = FOLL_LONGTERM;
+ if (region->flags & GH_MEM_ALLOW_WRITE)
+ gup_flags |= FOLL_WRITE;
+
+ pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
+ gup_flags, mapping->pages);
+ if (pinned < 0) {
+ ret = pinned;
+ goto free_pages;
+ } else if (pinned != mapping->npages) {
+ ret = -EFAULT;
+ mapping->npages = pinned; /* update npages for reclaim */
+ goto unpin_pages;
+ }
+
+ parcel->n_acl_entries = 2;
+ mapping->share_type = VM_MEM_SHARE;
+ parcel->acl_entries = kcalloc(parcel->n_acl_entries, sizeof(*parcel->acl_entries),
+ GFP_KERNEL);
+ if (!parcel->acl_entries) {
+ ret = -ENOMEM;
+ goto unpin_pages;
+ }
+
+ /* acl_entries[0].vmid will be this VM's vmid. We'll fill it when the
+ * VM is starting and we know the VM's vmid.
+ */
+ if (region->flags & GH_MEM_ALLOW_READ)
+ parcel->acl_entries[0].perms |= GH_RM_ACL_R;
+ if (region->flags & GH_MEM_ALLOW_WRITE)
+ parcel->acl_entries[0].perms |= GH_RM_ACL_W;
+ if (region->flags & GH_MEM_ALLOW_EXEC)
+ parcel->acl_entries[0].perms |= GH_RM_ACL_X;
+
+ ret = gh_rm_get_vmid(ghvm->rm, &vmid);
+ if (ret)
+ goto free_acl;
+
+ parcel->acl_entries[1].vmid = cpu_to_le16(vmid);
+ /* Host assumed to have all these permissions. Gunyah will not
+ * grant new permissions if host actually had less than RWX
+ */
+ parcel->acl_entries[1].perms = GH_RM_ACL_R | GH_RM_ACL_W | GH_RM_ACL_X;
+
+ parcel->n_mem_entries = 1;
+ for (i = 1; i < mapping->npages; i++) {
+ if (!pages_are_mergeable(mapping->pages[i - 1], mapping->pages[i]))
+ parcel->n_mem_entries++;
+ }
+
+ parcel->mem_entries = kcalloc(parcel->n_mem_entries,
+ sizeof(parcel->mem_entries[0]),
+ GFP_KERNEL_ACCOUNT);
+ if (!parcel->mem_entries) {
+ ret = -ENOMEM;
+ goto free_acl;
+ }
+
+ /* reduce number of entries by combining contiguous pages into single memory entry */
+ prev_page = mapping->pages[0];
+ parcel->mem_entries[0].phys_addr = cpu_to_le64(page_to_phys(prev_page));
+ entry_size = PAGE_SIZE;
+ for (i = 1, j = 0; i < mapping->npages; i++) {
+ curr_page = mapping->pages[i];
+ if (pages_are_mergeable(prev_page, curr_page)) {
+ entry_size += PAGE_SIZE;
+ } else {
+ parcel->mem_entries[j].size = cpu_to_le64(entry_size);
+ j++;
+ parcel->mem_entries[j].phys_addr =
+ cpu_to_le64(page_to_phys(curr_page));
+ entry_size = PAGE_SIZE;
+ }
+
+ prev_page = curr_page;
+ }
+ parcel->mem_entries[j].size = cpu_to_le64(entry_size);
+
+ list_add(&mapping->list, &ghvm->memory_mappings);
+ mutex_unlock(&ghvm->mm_lock);
+ return 0;
+free_acl:
+ kfree(parcel->acl_entries);
+unpin_pages:
+ unpin_user_pages(mapping->pages, pinned);
+free_pages:
+ kfree(mapping->pages);
+unlock_pages:
+ account_locked_vm(ghvm->mm, mapping->npages, false);
+free_mapping:
+ kfree(mapping);
+unlock:
+ mutex_unlock(&ghvm->mm_lock);
+ return ret;
+}
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index 86b9cb60118d..91d6dd26fcc8 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -20,4 +20,41 @@
*/
#define GH_CREATE_VM _IO(GH_IOCTL_TYPE, 0x0) /* Returns a Gunyah VM fd */
+/*
+ * ioctls for VM fds
+ */
+
+/**
+ * enum gh_mem_flags - Possible flags on &struct gh_userspace_memory_region
+ * @GH_MEM_ALLOW_READ: Allow guest to read the memory
+ * @GH_MEM_ALLOW_WRITE: Allow guest to write to the memory
+ * @GH_MEM_ALLOW_EXEC: Allow guest to execute instructions in the memory
+ */
+enum gh_mem_flags {
+ GH_MEM_ALLOW_READ = 1UL << 0,
+ GH_MEM_ALLOW_WRITE = 1UL << 1,
+ GH_MEM_ALLOW_EXEC = 1UL << 2,
+};
+
+/**
+ * struct gh_userspace_memory_region - Userspace memory description for GH_VM_SET_USER_MEM_REGION
+ * @label: Identifier of the region which is unique to the VM.
+ * @flags: Flags for memory parcel behavior. See &enum gh_mem_flags.
+ * @guest_phys_addr: Location of the memory region in guest's memory space (page-aligned)
+ * @memory_size: Size of the region (page-aligned)
+ * @userspace_addr: Location of the memory region in caller (userspace)'s memory
+ *
+ * See Documentation/virt/gunyah/vm-manager.rst for further details.
+ */
+struct gh_userspace_memory_region {
+ __u32 label;
+ __u32 flags;
+ __u64 guest_phys_addr;
+ __u64 memory_size;
+ __u64 userspace_addr;
+};
+
+#define GH_VM_SET_USER_MEM_REGION _IOW(GH_IOCTL_TYPE, 0x1, \
+ struct gh_userspace_memory_region)
+
#endif
--
2.40.0
Gunyah doorbells allow two virtual machines to signal each other using
interrupts. Add the hypercalls needed to assert the interrupt.
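To make the roles of new_flags, old_flags, enable_mask and ack_mask easier to
follow, here is a small userspace model of a doorbell. It is a sketch based on
my reading of the Gunyah API and not the hypervisor's implementation: the
sketch_* names are hypothetical, and clearing the ack_mask flags at the moment
the interrupt is raised is a simplification of what happens on delivery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a doorbell object; not the hypervisor's layout. */
struct sketch_bell {
	uint64_t flags;		/* flags currently pending */
	uint64_t enable_mask;	/* flags that raise the interrupt */
	uint64_t ack_mask;	/* flags cleared automatically on delivery */
};

/* Models the receiver configuring which flags it cares about. */
static void sketch_bell_set_mask(struct sketch_bell *b, uint64_t enable_mask,
				 uint64_t ack_mask)
{
	assert(b);
	b->enable_mask = enable_mask;
	b->ack_mask = ack_mask;
}

/*
 * Models the sender ringing the bell: report the previous flags, OR in the
 * new ones, and return true if the receiver's interrupt would be asserted.
 */
static bool sketch_bell_send(struct sketch_bell *b, uint64_t new_flags,
			     uint64_t *old_flags)
{
	assert(b);
	if (old_flags)
		*old_flags = b->flags;

	b->flags |= new_flags;
	if (!(b->flags & b->enable_mask))
		return false;

	/* Simplification: auto-acknowledged flags are cleared right away. */
	b->flags &= ~b->ack_mask;
	return true;
}
```

In this model a sender can set flags the receiver has not enabled; they stay
pending and are visible as old_flags on the next send.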
Signed-off-by: Elliot Berman <[email protected]>
---
arch/arm64/gunyah/gunyah_hypercall.c | 25 +++++++++++++++++++++++++
include/linux/gunyah.h | 3 +++
2 files changed, 28 insertions(+)
diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
index 5f33f53e05a9..3d48c8650851 100644
--- a/arch/arm64/gunyah/gunyah_hypercall.c
+++ b/arch/arm64/gunyah/gunyah_hypercall.c
@@ -33,6 +33,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
fn)
#define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
+#define GH_HYPERCALL_BELL_SEND GH_HYPERCALL(0x8012)
+#define GH_HYPERCALL_BELL_SET_MASK GH_HYPERCALL(0x8015)
#define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
#define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
#define GH_HYPERCALL_VCPU_RUN GH_HYPERCALL(0x8065)
@@ -55,6 +57,29 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
}
EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
+enum gh_error gh_hypercall_bell_send(u64 capid, u64 new_flags, u64 *old_flags)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_hvc(GH_HYPERCALL_BELL_SEND, capid, new_flags, 0, &res);
+
+ if (res.a0 == GH_ERROR_OK && old_flags)
+ *old_flags = res.a1;
+
+ return res.a0;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_bell_send);
+
+enum gh_error gh_hypercall_bell_set_mask(u64 capid, u64 enable_mask, u64 ack_mask)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_hvc(GH_HYPERCALL_BELL_SET_MASK, capid, enable_mask, ack_mask, 0, &res);
+
+ return res.a0;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_bell_set_mask);
+
enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)
{
struct arm_smccc_res res;
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index cd5704a82c6a..1f1685518bf3 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -171,6 +171,9 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
+enum gh_error gh_hypercall_bell_send(u64 capid, u64 new_flags, u64 *old_flags);
+enum gh_error gh_hypercall_bell_set_mask(u64 capid, u64 enable_mask, u64 ack_mask);
+
#define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH BIT(0)
enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
--
2.40.0
When Linux is booted as a guest under the Gunyah hypervisor, the Gunyah
Resource Manager applies a devicetree overlay describing the virtual
platform configuration of the guest VM, such as the message queue
capability IDs for communicating with the Resource Manager. This
information is not otherwise discoverable by a VM: the Gunyah hypervisor
core does not provide a direct interface to discover capability IDs nor
a way to communicate with RM without already knowing the
corresponding message queue capability ID. Add the DT bindings that
Gunyah adheres to for the hypervisor node and message queues.
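For illustration, the sketch below shows how a 64-bit capability ID could be
assembled from the two 32-bit cells of a "reg" entry when #address-cells is 2
and #size-cells is 0, as the binding requires. Devicetree property values are
stored big-endian. The sketch_* helpers are hypothetical; a real driver would
use the kernel's OF helpers rather than parse raw bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one big-endian 32-bit devicetree cell to host byte order. */
static uint32_t sketch_be32_to_cpu(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Read the idx-th capability ID from a raw "reg" property, assuming
 * #address-cells = <2> and #size-cells = <0>, so each entry is 8 bytes.
 */
static uint64_t sketch_reg_to_capid(const uint8_t *reg, size_t len, size_t idx)
{
	const uint8_t *p;

	assert(reg && (idx + 1) * 8 <= len);
	p = reg + idx * 8;

	/* The first cell holds the upper 32 bits, the second the lower 32. */
	return ((uint64_t)sketch_be32_to_cpu(p) << 32) |
	       sketch_be32_to_cpu(p + 4);
}
```

With the example node from the binding, index 0 would yield the TX message
queue capability ID and index 1 the RX one.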
Reviewed-by: Rob Herring <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
.../bindings/firmware/gunyah-hypervisor.yaml | 82 +++++++++++++++++++
1 file changed, 82 insertions(+)
create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
diff --git a/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml b/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
new file mode 100644
index 000000000000..3fc0b043ac3c
--- /dev/null
+++ b/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
@@ -0,0 +1,82 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/firmware/gunyah-hypervisor.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Gunyah Hypervisor
+
+maintainers:
+ - Prakruthi Deepak Heragu <[email protected]>
+ - Elliot Berman <[email protected]>
+
+description: |+
+ Gunyah virtual machines use this information to determine the capability IDs
+ of the message queues used to communicate with the Gunyah Resource Manager.
+ See also: https://github.com/quic/gunyah-resource-manager/blob/develop/src/vm_creation/dto_construct.c
+
+properties:
+ compatible:
+ const: gunyah-hypervisor
+
+ "#address-cells":
+ description: Number of cells needed to represent 64-bit capability IDs.
+ const: 2
+
+ "#size-cells":
+ description: must be 0, because capability IDs are not memory address
+ ranges and do not have a size.
+ const: 0
+
+patternProperties:
+ "^gunyah-resource-mgr(@.*)?":
+ type: object
+ description:
+ Resource Manager node which is required to communicate with the
+ Resource Manager VM using Gunyah Message Queues.
+
+ properties:
+ compatible:
+ const: gunyah-resource-manager
+
+ reg:
+ items:
+ - description: Gunyah capability ID of the TX message queue
+ - description: Gunyah capability ID of the RX message queue
+
+ interrupts:
+ items:
+ - description: Interrupt for the TX message queue
+ - description: Interrupt for the RX message queue
+
+ additionalProperties: false
+
+ required:
+ - compatible
+ - reg
+ - interrupts
+
+additionalProperties: false
+
+required:
+ - compatible
+ - "#address-cells"
+ - "#size-cells"
+
+examples:
+ - |
+ #include <dt-bindings/interrupt-controller/arm-gic.h>
+
+ hypervisor {
+ #address-cells = <2>;
+ #size-cells = <0>;
+ compatible = "gunyah-hypervisor";
+
+ gunyah-resource-mgr@0 {
+ compatible = "gunyah-resource-manager";
+ interrupts = <GIC_SPI 3 IRQ_TYPE_EDGE_RISING>, /* TX full IRQ */
+ <GIC_SPI 4 IRQ_TYPE_EDGE_RISING>; /* RX empty IRQ */
+ reg = <0x00000000 0x00000000>, <0x00000000 0x00000001>;
+ /* TX, RX cap ids */
+ };
+ };
--
2.40.0
On Qualcomm platforms, there is a firmware entity which controls access
to physical pages. In order to share memory with another VM, this entity
needs to be informed that the guest VM should have access to the memory.
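The hook flow added by this patch can be pictured with the userspace model
below: one registration slot, a pre-share hook that runs before the resource
manager call, and a post-reclaim hook that undoes it if the call fails. This
is a sketch only. Every sketch_* name is hypothetical, the rw_semaphore that
protects the real ops pointer is omitted, and the resource manager call is
replaced by a function pointer so that both outcomes can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_parcel { int label; };

struct sketch_platform_ops {
	int (*pre_mem_share)(struct sketch_parcel *p);
	int (*post_mem_reclaim)(struct sketch_parcel *p);
};

/* Single registration slot; the real code guards it with a rw_semaphore. */
static struct sketch_platform_ops *sketch_ops;

static int sketch_register_ops(struct sketch_platform_ops *ops)
{
	if (sketch_ops)
		return -EEXIST;	/* only one platform may register */
	sketch_ops = ops;
	return 0;
}

static void sketch_unregister_ops(struct sketch_platform_ops *ops)
{
	if (sketch_ops == ops)
		sketch_ops = NULL;
}

static int sketch_pre_mem_share(struct sketch_parcel *p)
{
	if (sketch_ops && sketch_ops->pre_mem_share)
		return sketch_ops->pre_mem_share(p);
	return 0;	/* no platform hook: nothing to do */
}

static int sketch_post_mem_reclaim(struct sketch_parcel *p)
{
	if (sketch_ops && sketch_ops->post_mem_reclaim)
		return sketch_ops->post_mem_reclaim(p);
	return 0;
}

/* Share: run the pre hook, make the call, undo the hook if the call fails. */
static int sketch_mem_share(struct sketch_parcel *p,
			    int (*rm_call)(struct sketch_parcel *p))
{
	int ret;

	assert(p && rm_call);
	ret = sketch_pre_mem_share(p);
	if (ret)
		return ret;

	ret = rm_call(p);
	if (ret)
		sketch_post_mem_reclaim(p);
	return ret;
}

/* Demo hooks: count how many parcels the "firmware" currently grants. */
static int sketch_granted;
static int sketch_demo_grant(struct sketch_parcel *p) { (void)p; sketch_granted++; return 0; }
static int sketch_demo_revoke(struct sketch_parcel *p) { (void)p; sketch_granted--; return 0; }

/* Demo stand-ins for a succeeding and a failing resource manager call. */
static int sketch_rm_ok(struct sketch_parcel *p) { (void)p; return 0; }
static int sketch_rm_fail(struct sketch_parcel *p) { (void)p; return -EIO; }
```

A failed share leaves the grant count unchanged in this model, which matches
the intent of calling the post-reclaim hook on the error path in
gh_rm_mem_lend_common().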
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/Kconfig | 4 ++
drivers/virt/gunyah/Makefile | 2 +
drivers/virt/gunyah/gunyah_platform_hooks.c | 80 +++++++++++++++++++++
drivers/virt/gunyah/rsc_mgr.h | 3 +
drivers/virt/gunyah/rsc_mgr_rpc.c | 18 ++++-
include/linux/gunyah_rsc_mgr.h | 17 +++++
6 files changed, 122 insertions(+), 2 deletions(-)
create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index 1a737694c333..de815189dab6 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -4,6 +4,7 @@ config GUNYAH
tristate "Gunyah Virtualization drivers"
depends on ARM64
depends on MAILBOX
+ select GUNYAH_PLATFORM_HOOKS
help
The Gunyah drivers are the helper interfaces that run in a guest VM
such as basic inter-VM IPC and signaling mechanisms, and higher level
@@ -11,3 +12,6 @@ config GUNYAH
Say Y/M here to enable the drivers needed to interact in a Gunyah
virtual environment.
+
+config GUNYAH_PLATFORM_HOOKS
+ tristate
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index bacf78b8fa33..4fbeee521d60 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -1,4 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
+obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
+
gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/gunyah_platform_hooks.c b/drivers/virt/gunyah/gunyah_platform_hooks.c
new file mode 100644
index 000000000000..60da0e154e98
--- /dev/null
+++ b/drivers/virt/gunyah/gunyah_platform_hooks.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/module.h>
+#include <linux/rwsem.h>
+#include <linux/gunyah_rsc_mgr.h>
+
+#include "rsc_mgr.h"
+
+static struct gh_rm_platform_ops *rm_platform_ops;
+static DECLARE_RWSEM(rm_platform_ops_lock);
+
+int gh_rm_platform_pre_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
+{
+ int ret = 0;
+
+ down_read(&rm_platform_ops_lock);
+ if (rm_platform_ops && rm_platform_ops->pre_mem_share)
+ ret = rm_platform_ops->pre_mem_share(rm, mem_parcel);
+ up_read(&rm_platform_ops_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_rm_platform_pre_mem_share);
+
+int gh_rm_platform_post_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
+{
+ int ret = 0;
+
+ down_read(&rm_platform_ops_lock);
+ if (rm_platform_ops && rm_platform_ops->post_mem_reclaim)
+ ret = rm_platform_ops->post_mem_reclaim(rm, mem_parcel);
+ up_read(&rm_platform_ops_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_rm_platform_post_mem_reclaim);
+
+int gh_rm_register_platform_ops(struct gh_rm_platform_ops *platform_ops)
+{
+ int ret = 0;
+
+ down_write(&rm_platform_ops_lock);
+ if (!rm_platform_ops)
+ rm_platform_ops = platform_ops;
+ else
+ ret = -EEXIST;
+ up_write(&rm_platform_ops_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_rm_register_platform_ops);
+
+void gh_rm_unregister_platform_ops(struct gh_rm_platform_ops *platform_ops)
+{
+ down_write(&rm_platform_ops_lock);
+ if (rm_platform_ops == platform_ops)
+ rm_platform_ops = NULL;
+ up_write(&rm_platform_ops_lock);
+}
+EXPORT_SYMBOL_GPL(gh_rm_unregister_platform_ops);
+
+static void _devm_gh_rm_unregister_platform_ops(void *data)
+{
+ gh_rm_unregister_platform_ops(data);
+}
+
+int devm_gh_rm_register_platform_ops(struct device *dev, struct gh_rm_platform_ops *ops)
+{
+ int ret;
+
+ ret = gh_rm_register_platform_ops(ops);
+ if (ret)
+ return ret;
+
+ return devm_add_action(dev, _devm_gh_rm_unregister_platform_ops, ops);
+}
+EXPORT_SYMBOL_GPL(devm_gh_rm_register_platform_ops);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Gunyah Platform Hooks");
diff --git a/drivers/virt/gunyah/rsc_mgr.h b/drivers/virt/gunyah/rsc_mgr.h
index 8309b7bf4668..3d43bb79ff44 100644
--- a/drivers/virt/gunyah/rsc_mgr.h
+++ b/drivers/virt/gunyah/rsc_mgr.h
@@ -13,4 +13,7 @@ struct gh_rm;
int gh_rm_call(struct gh_rm *rsc_mgr, u32 message_id, const void *req_buf, size_t req_buf_size,
void **resp_buf, size_t *resp_buf_size);
+int gh_rm_platform_pre_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
+int gh_rm_platform_post_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
+
#endif
diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
index 4f25f07400b3..835a851ae71b 100644
--- a/drivers/virt/gunyah/rsc_mgr_rpc.c
+++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
@@ -207,6 +207,12 @@ static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_
if (!msg)
return -ENOMEM;
+ ret = gh_rm_platform_pre_mem_share(rm, p);
+ if (ret) {
+ kfree(msg);
+ return ret;
+ }
+
req_header = msg;
acl_section = (void *)req_header + sizeof(*req_header);
mem_section = (void *)acl_section + acl_section_size;
@@ -231,8 +237,10 @@ static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_
ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
kfree(msg);
- if (ret)
+ if (ret) {
+ gh_rm_platform_post_mem_reclaim(rm, p);
return ret;
+ }
p->mem_handle = le32_to_cpu(*resp);
kfree(resp);
@@ -287,8 +295,14 @@ int gh_rm_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
struct gh_rm_mem_release_req req = {
.mem_handle = cpu_to_le32(parcel->mem_handle),
};
+ int ret;
+
+ ret = gh_rm_call(rm, GH_RM_RPC_MEM_RECLAIM, &req, sizeof(req), NULL, NULL);
+ /* Only call the platform mem reclaim hooks if we reclaimed the memory */
+ if (ret)
+ return ret;
- return gh_rm_call(rm, GH_RM_RPC_MEM_RECLAIM, &req, sizeof(req), NULL, NULL);
+ return gh_rm_platform_post_mem_reclaim(rm, parcel);
}
/**
diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
index dfac088420bd..7c599654ea30 100644
--- a/include/linux/gunyah_rsc_mgr.h
+++ b/include/linux/gunyah_rsc_mgr.h
@@ -139,4 +139,21 @@ int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
struct gh_rm_hyp_resources **resources);
int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid);
+struct gh_rm_platform_ops {
+ int (*pre_mem_share)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
+ int (*post_mem_reclaim)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
+};
+
+#if IS_ENABLED(CONFIG_GUNYAH_PLATFORM_HOOKS)
+int gh_rm_register_platform_ops(struct gh_rm_platform_ops *platform_ops);
+void gh_rm_unregister_platform_ops(struct gh_rm_platform_ops *platform_ops);
+int devm_gh_rm_register_platform_ops(struct device *dev, struct gh_rm_platform_ops *ops);
+#else
+static inline int gh_rm_register_platform_ops(struct gh_rm_platform_ops *platform_ops)
+ { return 0; }
+static inline void gh_rm_unregister_platform_ops(struct gh_rm_platform_ops *platform_ops) { }
+static inline int devm_gh_rm_register_platform_ops(struct device *dev,
+ struct gh_rm_platform_ops *ops) { return 0; }
+#endif
+
#endif
--
2.40.0
Some VM functions need to acquire Gunyah resources. For instance, Gunyah
vCPUs are exposed to the host as a resource. The Gunyah vCPU function
will register a resource ticket and be able to interact with the
hypervisor once the resource ticket is filled.
Resource tickets are the mechanism for functions to acquire ownership of
Gunyah resources. Gunyah functions can be created before the VM's
resources are created and made available to Linux. A resource ticket
identifies a type of resource and a label of a resource which the ticket
holder is interested in.
Resources are created by Gunyah as configured in the VM's devicetree
configuration. Gunyah doesn't process the label and that makes it
possible for userspace to create multiple resources with the same label.
Resource ticket owners need to be prepared for populate to be called
multiple times if userspace created multiple resources with the same
label.
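The matching rule can be sketched in plain C as follows. This is a simplified
model and not the driver code: the sketch_* types are hypothetical, an array
and a "claimed" flag stand in for the kernel lists, and the example populate
callback is only one possible policy (accept the first matching resource and
decline the rest).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_resource {
	int type;
	uint32_t label;
	bool claimed;	/* stands in for moving to the ticket's list */
};

struct sketch_ticket {
	int type;
	uint32_t label;
	/* Returns true if the ticket owner accepts the resource. */
	bool (*populate)(struct sketch_ticket *t, struct sketch_resource *r);
	size_t calls;		/* how often populate was invoked */
	size_t accepted;	/* how many resources the owner took */
};

/*
 * Offer every unclaimed resource with a matching type and label to the
 * ticket. populate may run more than once when labels are duplicated.
 * Returns the number of resources the ticket claimed.
 */
static size_t sketch_fill_ticket(struct sketch_ticket *t,
				 struct sketch_resource *rsc, size_t n)
{
	size_t i, claimed = 0;

	assert(t && t->populate && rsc);
	for (i = 0; i < n; i++) {
		if (rsc[i].claimed || rsc[i].type != t->type ||
		    rsc[i].label != t->label)
			continue;
		if (t->populate(t, &rsc[i])) {
			rsc[i].claimed = true;
			claimed++;
		}
	}
	return claimed;
}

/* Example policy: take the first matching resource, decline any others. */
static bool sketch_populate_first_only(struct sketch_ticket *t,
				       struct sketch_resource *r)
{
	(void)r;
	t->calls++;
	if (t->accepted)
		return false;
	t->accepted++;
	return true;
}
```

A ticket owner written this way tolerates duplicate labels: the second
populate call simply declines, and the resource stays unclaimed.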
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/vm_mgr.c | 117 +++++++++++++++++++++++++++++++++-
drivers/virt/gunyah/vm_mgr.h | 4 ++
include/linux/gunyah_vm_mgr.h | 14 ++++
3 files changed, 134 insertions(+), 1 deletion(-)
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index 56464451b262..6228090aceb6 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -186,6 +186,99 @@ void gh_vm_function_unregister(struct gh_vm_function *fn)
}
EXPORT_SYMBOL_GPL(gh_vm_function_unregister);
+int gh_vm_add_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket)
+{
+ struct gh_vm_resource_ticket *iter;
+ struct gh_resource *ghrsc, *rsc_iter;
+ int ret = 0;
+
+ mutex_lock(&ghvm->resources_lock);
+ list_for_each_entry(iter, &ghvm->resource_tickets, vm_list) {
+ if (iter->resource_type == ticket->resource_type && iter->label == ticket->label) {
+ ret = -EEXIST;
+ goto out;
+ }
+ }
+
+ if (!try_module_get(ticket->owner)) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ list_add(&ticket->vm_list, &ghvm->resource_tickets);
+ INIT_LIST_HEAD(&ticket->resources);
+
+ list_for_each_entry_safe(ghrsc, rsc_iter, &ghvm->resources, list) {
+ if (ghrsc->type == ticket->resource_type && ghrsc->rm_label == ticket->label) {
+ if (ticket->populate(ticket, ghrsc))
+ list_move(&ghrsc->list, &ticket->resources);
+ }
+ }
+out:
+ mutex_unlock(&ghvm->resources_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_vm_add_resource_ticket);
+
+void gh_vm_remove_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket)
+{
+ struct gh_resource *ghrsc, *iter;
+
+ mutex_lock(&ghvm->resources_lock);
+ list_for_each_entry_safe(ghrsc, iter, &ticket->resources, list) {
+ ticket->unpopulate(ticket, ghrsc);
+ list_move(&ghrsc->list, &ghvm->resources);
+ }
+
+ module_put(ticket->owner);
+ list_del(&ticket->vm_list);
+ mutex_unlock(&ghvm->resources_lock);
+}
+EXPORT_SYMBOL_GPL(gh_vm_remove_resource_ticket);
+
+static void gh_vm_add_resource(struct gh_vm *ghvm, struct gh_resource *ghrsc)
+{
+ struct gh_vm_resource_ticket *ticket;
+
+ mutex_lock(&ghvm->resources_lock);
+ list_for_each_entry(ticket, &ghvm->resource_tickets, vm_list) {
+ if (ghrsc->type == ticket->resource_type && ghrsc->rm_label == ticket->label) {
+ if (ticket->populate(ticket, ghrsc))
+ list_add(&ghrsc->list, &ticket->resources);
+ else
+ list_add(&ghrsc->list, &ghvm->resources);
+ /* unconditional -- we prevent multiple identical
+ * resource tickets so there will not be some other
+ * ticket elsewhere in the list if populate() failed.
+ */
+ goto found;
+ }
+ }
+ list_add(&ghrsc->list, &ghvm->resources);
+found:
+ mutex_unlock(&ghvm->resources_lock);
+}
+
+static void gh_vm_clean_resources(struct gh_vm *ghvm)
+{
+ struct gh_vm_resource_ticket *ticket, *titer;
+ struct gh_resource *ghrsc, *riter;
+
+ mutex_lock(&ghvm->resources_lock);
+ if (!list_empty(&ghvm->resource_tickets)) {
+ dev_warn(ghvm->parent, "Dangling resource tickets:\n");
+ list_for_each_entry_safe(ticket, titer, &ghvm->resource_tickets, vm_list) {
+ dev_warn(ghvm->parent, " %pS\n", ticket->populate);
+ gh_vm_remove_resource_ticket(ghvm, ticket);
+ }
+ }
+
+ list_for_each_entry_safe(ghrsc, riter, &ghvm->resources, list) {
+ gh_rm_free_resource(ghrsc);
+ }
+ mutex_unlock(&ghvm->resources_lock);
+}
+
static int gh_vm_rm_notification_status(struct gh_vm *ghvm, void *data)
{
struct gh_rm_vm_status_payload *payload = data;
@@ -268,6 +361,9 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
init_waitqueue_head(&ghvm->vm_status_wait);
INIT_WORK(&ghvm->free_work, gh_vm_free);
kref_init(&ghvm->kref);
+ mutex_init(&ghvm->resources_lock);
+ INIT_LIST_HEAD(&ghvm->resources);
+ INIT_LIST_HEAD(&ghvm->resource_tickets);
INIT_LIST_HEAD(&ghvm->functions);
ghvm->vm_status = GH_RM_VM_STATUS_NO_STATE;
@@ -277,9 +373,11 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
static int gh_vm_start(struct gh_vm *ghvm)
{
struct gh_vm_mem *mapping;
+ struct gh_rm_hyp_resources *resources;
+ struct gh_resource *ghrsc;
u64 dtb_offset;
u32 mem_handle;
- int ret;
+ int ret, i, n;
down_write(&ghvm->status_lock);
if (ghvm->vm_status != GH_RM_VM_STATUS_NO_STATE) {
@@ -339,6 +437,22 @@ static int gh_vm_start(struct gh_vm *ghvm)
}
ghvm->vm_status = GH_RM_VM_STATUS_READY;
+ ret = gh_rm_get_hyp_resources(ghvm->rm, ghvm->vmid, &resources);
+ if (ret) {
+ dev_warn(ghvm->parent, "Failed to get hypervisor resources for VM: %d\n", ret);
+ goto err;
+ }
+
+ for (i = 0, n = le32_to_cpu(resources->n_entries); i < n; i++) {
+ ghrsc = gh_rm_alloc_resource(ghvm->rm, &resources->entries[i]);
+ if (!ghrsc) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ gh_vm_add_resource(ghvm, ghrsc);
+ }
+
ret = gh_rm_vm_start(ghvm->rm, ghvm->vmid);
if (ret) {
dev_warn(ghvm->parent, "Failed to start VM: %d\n", ret);
@@ -460,6 +574,7 @@ static void gh_vm_free(struct work_struct *work)
gh_vm_stop(ghvm);
gh_vm_remove_functions(ghvm);
+ gh_vm_clean_resources(ghvm);
if (ghvm->vm_status != GH_RM_VM_STATUS_NO_STATE &&
ghvm->vm_status != GH_RM_VM_STATUS_LOAD &&
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index c4bec1469ae8..e5e0c92d4cb1 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -7,6 +7,7 @@
#define _GH_VM_MGR_H
#include <linux/gunyah_rsc_mgr.h>
+#include <linux/gunyah_vm_mgr.h>
#include <linux/list.h>
#include <linux/kref.h>
#include <linux/miscdevice.h>
@@ -52,6 +53,9 @@ struct gh_vm {
struct list_head memory_mappings;
struct mutex fn_lock;
struct list_head functions;
+ struct mutex resources_lock;
+ struct list_head resources;
+ struct list_head resource_tickets;
};
int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region);
diff --git a/include/linux/gunyah_vm_mgr.h b/include/linux/gunyah_vm_mgr.h
index 1f0dc43ade50..e3a6666d7529 100644
--- a/include/linux/gunyah_vm_mgr.h
+++ b/include/linux/gunyah_vm_mgr.h
@@ -84,4 +84,18 @@ void gh_vm_function_unregister(struct gh_vm_function *f);
module_gh_vm_function(_name); \
MODULE_ALIAS_GH_VM_FUNCTION(_type, _idx)
+struct gh_vm_resource_ticket {
+ struct list_head vm_list; /* for gh_vm's resource tickets list */
+ struct list_head resources; /* resources associated with this ticket */
+ enum gh_resource_type resource_type;
+ u32 label;
+
+ struct module *owner;
+ bool (*populate)(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc);
+ void (*unpopulate)(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc);
+};
+
+int gh_vm_add_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket);
+void gh_vm_remove_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket);
+
#endif
--
2.40.0
Add a sample Gunyah VMM capable of launching a non-proxy scheduled VM.
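The sample places an image, a DTB and an optional ramdisk at offsets
inside a single guest memory region, so each blob has to fit inside that
region. A minimal sketch of such a check follows. The helpers blob_fits()
and blobs_overlap() are hypothetical and are not part of gunyah_vmm.c;
the comparison is written so that offset + size cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does [offset, offset + size) lie inside the guest? */
static bool blob_fits(uint64_t offset, uint64_t size, uint64_t guest_size)
{
	/* Two comparisons instead of offset + size, which could overflow. */
	return offset <= guest_size && size <= guest_size - offset;
}

/*
 * Hypothetical helper: two blobs overlap when each one starts before the
 * other ends. Assumes both blobs already passed blob_fits().
 */
static bool blobs_overlap(uint64_t a_off, uint64_t a_size,
			  uint64_t b_off, uint64_t b_size)
{
	return a_off < b_off + b_size && b_off < a_off + a_size;
}
```

With the defaults in the sample (100 MB guest, DTB at 0x45f0000, ramdisk
at 0x4600000), a DTB of up to 0x10000 bytes sits directly below the
ramdisk without overlapping it.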
Signed-off-by: Elliot Berman <[email protected]>
---
samples/Kconfig | 10 ++
samples/Makefile | 1 +
samples/gunyah/.gitignore | 2 +
samples/gunyah/Makefile | 6 +
samples/gunyah/gunyah_vmm.c | 270 +++++++++++++++++++++++++++++++++++
samples/gunyah/sample_vm.dts | 68 +++++++++
6 files changed, 357 insertions(+)
create mode 100644 samples/gunyah/.gitignore
create mode 100644 samples/gunyah/Makefile
create mode 100644 samples/gunyah/gunyah_vmm.c
create mode 100644 samples/gunyah/sample_vm.dts
diff --git a/samples/Kconfig b/samples/Kconfig
index b2db430bd3ff..567c7a706c01 100644
--- a/samples/Kconfig
+++ b/samples/Kconfig
@@ -280,6 +280,16 @@ config SAMPLE_KMEMLEAK
Build a sample program which have explicitly leaks memory to test
kmemleak
+config SAMPLE_GUNYAH
+ bool "Build example Gunyah Virtual Machine Manager"
+ depends on CC_CAN_LINK && HEADERS_INSTALL
+ depends on GUNYAH
+ help
+ Build an example Gunyah VMM userspace program capable of launching
+ a basic virtual machine under the Gunyah hypervisor.
+ This demonstrates how to create a virtual machine under the Gunyah
+ hypervisor.
+
source "samples/rust/Kconfig"
endif # SAMPLES
diff --git a/samples/Makefile b/samples/Makefile
index 7727f1a0d6d1..e1b92dec169f 100644
--- a/samples/Makefile
+++ b/samples/Makefile
@@ -37,3 +37,4 @@ obj-$(CONFIG_SAMPLE_KMEMLEAK) += kmemleak/
obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG) += coresight/
obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
obj-$(CONFIG_SAMPLES_RUST) += rust/
+obj-$(CONFIG_SAMPLE_GUNYAH) += gunyah/
diff --git a/samples/gunyah/.gitignore b/samples/gunyah/.gitignore
new file mode 100644
index 000000000000..adc7d1589fde
--- /dev/null
+++ b/samples/gunyah/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0
+/gunyah_vmm
diff --git a/samples/gunyah/Makefile b/samples/gunyah/Makefile
new file mode 100644
index 000000000000..faf14f9bb337
--- /dev/null
+++ b/samples/gunyah/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+userprogs-always-y += gunyah_vmm
+dtb-y += sample_vm.dtb
+
+userccflags += -I usr/include
diff --git a/samples/gunyah/gunyah_vmm.c b/samples/gunyah/gunyah_vmm.c
new file mode 100644
index 000000000000..d0eb49e86372
--- /dev/null
+++ b/samples/gunyah/gunyah_vmm.c
@@ -0,0 +1,270 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <sys/ioctl.h>
+#include <getopt.h>
+#include <limits.h>
+#include <stdint.h>
+#include <fcntl.h>
+#include <string.h>
+#include <sys/sysmacros.h>
+#define __USE_GNU
+#include <sys/mman.h>
+
+#include <linux/gunyah.h>
+
+struct vm_config {
+ int image_fd;
+ int dtb_fd;
+ int ramdisk_fd;
+
+ uint64_t guest_base;
+ uint64_t guest_size;
+
+ uint64_t image_offset;
+ off_t image_size;
+ uint64_t dtb_offset;
+ off_t dtb_size;
+ uint64_t ramdisk_offset;
+ off_t ramdisk_size;
+};
+
+static struct option options[] = {
+ { "help", no_argument, NULL, 'h' },
+ { "image", required_argument, NULL, 'i' },
+ { "dtb", required_argument, NULL, 'd' },
+ { "ramdisk", required_argument, NULL, 'r' },
+ { "base", required_argument, NULL, 'B' },
+ { "size", required_argument, NULL, 'S' },
+ { "image_offset", required_argument, NULL, 'I' },
+ { "dtb_offset", required_argument, NULL, 'D' },
+ { "ramdisk_offset", required_argument, NULL, 'R' },
+ { }
+};
+
+static void print_help(char *cmd)
+{
+ printf("gunyah_vmm, a sample tool to launch Gunyah VMs\n"
+ "Usage: %s <options>\n"
+ " --help, -h this menu\n"
+ " --image, -i <image> VM image file to load (e.g. a kernel Image) [Required]\n"
+ " --dtb, -d <dtb> Devicetree file to load [Required]\n"
+ " --ramdisk, -r <ramdisk> Ramdisk file to load\n"
+ " --base, -B <address> Set the base address of guest's memory [Default: 0x80000000]\n"
+ " --size, -S <number> Size of the guest's memory in bytes [Default: 0x6400000 (100 MB)]\n"
+ " --image_offset, -I <number> Offset into guest memory to load the VM image file [Default: 0]\n"
+ " --dtb_offset, -D <number> Offset into guest memory to load the DTB [Default: 0x45f0000]\n"
+ " --ramdisk_offset, -R <number> Offset into guest memory to load a ramdisk [Default: 0x4600000]\n"
+ , cmd);
+}
+
+int main(int argc, char **argv)
+{
+ int gunyah_fd, vm_fd, guest_fd;
+ struct gh_userspace_memory_region guest_mem_desc = { 0 };
+ struct gh_vm_dtb_config dtb_config = { 0 };
+ char *guest_mem;
+ struct vm_config config = {
+ /* Defaults good enough to boot static kernel and a basic ramdisk */
+ .ramdisk_fd = -1,
+ .guest_base = 0x80000000,
+ .guest_size = 0x6400000, /* 100 MB */
+ .image_offset = 0,
+ .dtb_offset = 0x45f0000,
+ .ramdisk_offset = 0x4600000, /* put at +70MB (30MB for ramdisk) */
+ };
+ struct stat st;
+ int opt, optidx, ret = 0;
+ long l;
+
+ while ((opt = getopt_long(argc, argv, "hi:d:r:B:S:I:D:R:", options, &optidx)) != -1) {
+ switch (opt) {
+ case 'i':
+ config.image_fd = open(optarg, O_RDONLY | O_CLOEXEC);
+ if (config.image_fd < 0) {
+ perror("Failed to open image");
+ return -1;
+ }
+ if (stat(optarg, &st) < 0) {
+ perror("Failed to stat image");
+ return -1;
+ }
+ config.image_size = st.st_size;
+ break;
+ case 'd':
+ config.dtb_fd = open(optarg, O_RDONLY | O_CLOEXEC);
+ if (config.dtb_fd < 0) {
+ perror("Failed to open dtb");
+ return -1;
+ }
+ if (stat(optarg, &st) < 0) {
+ perror("Failed to stat dtb");
+ return -1;
+ }
+ config.dtb_size = st.st_size;
+ break;
+ case 'r':
+ config.ramdisk_fd = open(optarg, O_RDONLY | O_CLOEXEC);
+ if (config.ramdisk_fd < 0) {
+ perror("Failed to open ramdisk");
+ return -1;
+ }
+ if (stat(optarg, &st) < 0) {
+ perror("Failed to stat ramdisk");
+ return -1;
+ }
+ config.ramdisk_size = st.st_size;
+ break;
+ case 'B':
+ l = strtol(optarg, NULL, 0);
+ if (l == LONG_MIN) {
+ perror("Failed to parse base address");
+ return -1;
+ }
+ config.guest_base = l;
+ break;
+ case 'S':
+ l = strtol(optarg, NULL, 0);
+ if (l == LONG_MIN) {
+ perror("Failed to parse memory size");
+ return -1;
+ }
+ config.guest_size = l;
+ break;
+ case 'I':
+ l = strtol(optarg, NULL, 0);
+ if (l == LONG_MIN) {
+ perror("Failed to parse image offset");
+ return -1;
+ }
+ config.image_offset = l;
+ break;
+ case 'D':
+ l = strtol(optarg, NULL, 0);
+ if (l == LONG_MIN) {
+ perror("Failed to parse dtb offset");
+ return -1;
+ }
+ config.dtb_offset = l;
+ break;
+ case 'R':
+ l = strtol(optarg, NULL, 0);
+ if (l == LONG_MIN) {
+ perror("Failed to parse ramdisk offset");
+ return -1;
+ }
+ config.ramdisk_offset = l;
+ break;
+ case 'h':
+ print_help(argv[0]);
+ return 0;
+ default:
+ print_help(argv[0]);
+ return -1;
+ }
+ }
+
+ if (!config.image_fd || !config.dtb_fd) {
+ print_help(argv[0]);
+ return -1;
+ }
+
+ if (config.image_offset + config.image_size > config.guest_size) {
+ fprintf(stderr, "Image offset and size put it outside guest memory. Make image smaller or increase guest memory size.\n");
+ return -1;
+ }
+
+ if (config.dtb_offset + config.dtb_size > config.guest_size) {
+ fprintf(stderr, "DTB offset and size put it outside guest memory. Make dtb smaller or increase guest memory size.\n");
+ return -1;
+ }
+
+ if (config.ramdisk_fd != -1 &&
+ config.ramdisk_offset + config.ramdisk_size > config.guest_size) {
+ fprintf(stderr, "Ramdisk offset and size put it outside guest memory. Make ramdisk smaller or increase guest memory size.\n");
+ return -1;
+ }
+
+ gunyah_fd = open("/dev/gunyah", O_RDWR | O_CLOEXEC);
+ if (gunyah_fd < 0) {
+ perror("Failed to open /dev/gunyah");
+ return -1;
+ }
+
+ vm_fd = ioctl(gunyah_fd, GH_CREATE_VM, 0);
+ if (vm_fd < 0) {
+ perror("Failed to create vm");
+ return -1;
+ }
+
+ guest_fd = memfd_create("guest_memory", MFD_CLOEXEC);
+ if (guest_fd < 0) {
+ perror("Failed to create guest memfd");
+ return -1;
+ }
+
+ if (ftruncate(guest_fd, config.guest_size) < 0) {
+ perror("Failed to grow guest memory");
+ return -1;
+ }
+
+ guest_mem = mmap(NULL, config.guest_size, PROT_READ | PROT_WRITE, MAP_SHARED, guest_fd, 0);
+ if (guest_mem == MAP_FAILED) {
+ perror("Not enough memory");
+ return -1;
+ }
+
+ if (read(config.image_fd, guest_mem + config.image_offset, config.image_size) < 0) {
+ perror("Failed to read image into guest memory");
+ return -1;
+ }
+
+ if (read(config.dtb_fd, guest_mem + config.dtb_offset, config.dtb_size) < 0) {
+ perror("Failed to read dtb into guest memory");
+ return -1;
+ }
+
+ if (config.ramdisk_fd > 0 &&
+ read(config.ramdisk_fd, guest_mem + config.ramdisk_offset,
+ config.ramdisk_size) < 0) {
+ perror("Failed to read ramdisk into guest memory");
+ return -1;
+ }
+
+ guest_mem_desc.label = 0;
+ guest_mem_desc.flags = GH_MEM_ALLOW_READ | GH_MEM_ALLOW_WRITE | GH_MEM_ALLOW_EXEC;
+ guest_mem_desc.guest_phys_addr = config.guest_base;
+ guest_mem_desc.memory_size = config.guest_size;
+ guest_mem_desc.userspace_addr = (__u64)guest_mem;
+
+ if (ioctl(vm_fd, GH_VM_SET_USER_MEM_REGION, &guest_mem_desc) < 0) {
+ perror("Failed to register guest memory with VM");
+ return -1;
+ }
+
+ dtb_config.guest_phys_addr = config.guest_base + config.dtb_offset;
+ dtb_config.size = config.dtb_size;
+ if (ioctl(vm_fd, GH_VM_SET_DTB_CONFIG, &dtb_config) < 0) {
+ perror("Failed to set DTB configuration for VM");
+ return -1;
+ }
+
+ ret = ioctl(vm_fd, GH_VM_START);
+ if (ret) {
+ perror("GH_VM_START failed");
+ return -1;
+ }
+
+ while (1)
+ sleep(10);
+
+ return 0;
+}
diff --git a/samples/gunyah/sample_vm.dts b/samples/gunyah/sample_vm.dts
new file mode 100644
index 000000000000..293bbc0469c8
--- /dev/null
+++ b/samples/gunyah/sample_vm.dts
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: BSD-3-Clause
+/*
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+/dts-v1/;
+
+/ {
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-parent = <&intc>;
+
+ chosen {
+ bootargs = "nokaslr";
+ };
+
+ cpus {
+ #address-cells = <0x2>;
+ #size-cells = <0>;
+
+ cpu@0 {
+ device_type = "cpu";
+ compatible = "arm,armv8";
+ reg = <0 0>;
+ };
+ };
+
+ intc: interrupt-controller@3FFF0000 {
+ compatible = "arm,gic-v3";
+ #interrupt-cells = <3>;
+ #address-cells = <2>;
+ #size-cells = <2>;
+ interrupt-controller;
+ reg = <0 0x3FFF0000 0 0x10000>,
+ <0 0x3FFD0000 0 0x20000>;
+ };
+
+ timer {
+ compatible = "arm,armv8-timer";
+ always-on;
+ interrupts = <1 13 0x108>,
+ <1 14 0x108>,
+ <1 11 0x108>,
+ <1 10 0x108>;
+ clock-frequency = <19200000>;
+ };
+
+ gunyah-vm-config {
+ image-name = "linux_vm_0";
+
+ memory {
+ #address-cells = <2>;
+ #size-cells = <2>;
+
+ base-address = <0 0x80000000>;
+ };
+
+ interrupts {
+ config = <&intc>;
+ };
+
+ vcpus {
+ affinity-map = < 0 >;
+ sched-priority = < (-1) >;
+ sched-timeslice = < 2000 >;
+ };
+ };
+};
--
2.40.0
Enable support for creating irqfds which can raise an interrupt on a
Gunyah virtual machine. irqfds are exposed to userspace as a Gunyah VM
function with the name "irqfd". If the VM devicetree is not configured
to create a doorbell with the corresponding label, userspace will still
be able to assert the eventfd but no interrupt will be raised on the
guest.
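As a rough illustration of the userspace side, the argument block for an
irqfd could be filled in as sketched below. The sketch_* names are
hypothetical local mirrors of struct gh_fn_irqfd_arg and
GH_IRQFD_FLAGS_LEVEL from this patch, declared here only so the example
builds without the patched uapi header. A real VMM would include
<linux/gunyah.h> instead.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of GH_IRQFD_FLAGS_LEVEL (bit 0). */
#define SKETCH_IRQFD_FLAGS_LEVEL (1u << 0)

/* Hypothetical mirror of struct gh_fn_irqfd_arg: four 32-bit fields. */
struct sketch_irqfd_arg {
	uint32_t fd;
	uint32_t label;
	uint32_t flags;
	uint32_t padding;
};

/* Build the argument for one irqfd; level != 0 requests level semantics. */
static struct sketch_irqfd_arg sketch_make_irqfd_arg(int eventfd, uint32_t label,
						     int level)
{
	struct sketch_irqfd_arg arg;

	/* Zero everything so padding and reserved flag bits stay clear. */
	memset(&arg, 0, sizeof(arg));
	arg.fd = (uint32_t)eventfd;
	arg.label = label;
	if (level)
		arg.flags |= SKETCH_IRQFD_FLAGS_LEVEL;
	return arg;
}
```

The file descriptor would typically come from eventfd(2), and the filled
structure would be handed to the VM through GH_VM_ADD_FUNCTION with type
GH_FN_IRQFD. Keeping unused flag bits at zero matters because the bind
path in this patch returns -EINVAL when any bit other than
GH_IRQFD_FLAGS_LEVEL is set.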
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
Documentation/virt/gunyah/vm-manager.rst | 2 +-
drivers/virt/gunyah/Kconfig | 9 ++
drivers/virt/gunyah/Makefile | 1 +
drivers/virt/gunyah/gunyah_irqfd.c | 180 +++++++++++++++++++++++
include/uapi/linux/gunyah.h | 35 +++++
5 files changed, 226 insertions(+), 1 deletion(-)
create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
index 6789d13fed14..c4960948c779 100644
--- a/Documentation/virt/gunyah/vm-manager.rst
+++ b/Documentation/virt/gunyah/vm-manager.rst
@@ -115,7 +115,7 @@ the VM *before* the VM starts.
The argument types are documented below:
.. kernel-doc:: include/uapi/linux/gunyah.h
- :identifiers: gh_fn_vcpu_arg
+ :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
Gunyah VCPU API Descriptions
----------------------------
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index 0a58395f7d2c..bc2c46d9df94 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -39,3 +39,12 @@ config GUNYAH_VCPU
VMMs can also handle stage 2 faults of the vCPUs.
Say Y/M here if unsure and you want to support Gunyah VMMs.
+
+config GUNYAH_IRQFD
+ tristate "Gunyah irqfd interface"
+ depends on GUNYAH
+ help
+ Enable kernel support for creating irqfds which can raise an interrupt
+ on a Gunyah virtual machine.
+
+ Say Y/M here if unsure and you want to support Gunyah VMMs.
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index cc16b6c19db9..ad212a1cf967 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -7,3 +7,4 @@ gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
obj-$(CONFIG_GUNYAH) += gunyah.o
obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
+obj-$(CONFIG_GUNYAH_IRQFD) += gunyah_irqfd.o
diff --git a/drivers/virt/gunyah/gunyah_irqfd.c b/drivers/virt/gunyah/gunyah_irqfd.c
new file mode 100644
index 000000000000..3e954ebd2029
--- /dev/null
+++ b/drivers/virt/gunyah/gunyah_irqfd.c
@@ -0,0 +1,180 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/gunyah.h>
+#include <linux/gunyah_vm_mgr.h>
+#include <linux/module.h>
+#include <linux/poll.h>
+#include <linux/printk.h>
+
+#include <uapi/linux/gunyah.h>
+
+struct gh_irqfd {
+ struct gh_resource *ghrsc;
+ struct gh_vm_resource_ticket ticket;
+ struct gh_vm_function_instance *f;
+
+ bool level;
+
+ struct eventfd_ctx *ctx;
+ wait_queue_entry_t wait;
+ poll_table pt;
+};
+
+static int irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
+{
+ struct gh_irqfd *irqfd = container_of(wait, struct gh_irqfd, wait);
+ __poll_t flags = key_to_poll(key);
+ int ret = 0;
+
+ if (flags & EPOLLIN) {
+ if (irqfd->ghrsc) {
+ ret = gh_hypercall_bell_send(irqfd->ghrsc->capid, 1, NULL);
+ if (ret)
+ pr_err_ratelimited("Failed to inject interrupt %d: %d\n",
+ irqfd->ticket.label, ret);
+ } else
+ pr_err_ratelimited("Premature injection of interrupt\n");
+ }
+
+ return 0;
+}
+
+static void irqfd_ptable_queue_proc(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
+{
+ struct gh_irqfd *irq_ctx = container_of(pt, struct gh_irqfd, pt);
+
+ add_wait_queue(wqh, &irq_ctx->wait);
+}
+
+static bool gh_irqfd_populate(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc)
+{
+ struct gh_irqfd *irqfd = container_of(ticket, struct gh_irqfd, ticket);
+ int ret;
+
+ if (irqfd->ghrsc) {
+ pr_warn("irqfd%d already got a Gunyah resource. Check if multiple resources with same label were configured.\n",
+ irqfd->ticket.label);
+ return false;
+ }
+
+ irqfd->ghrsc = ghrsc;
+ if (irqfd->level) {
+ /* Configure the bell to trigger when bit 0 is asserted (see
+ * irqfd_wakeup) and to automatically clear bit 0 once received
+ * by the VM (ack_mask). Bit 0 must be cleared right away,
+ * otherwise the line will never be deasserted. Emulating an edge
+ * triggered interrupt does not need either mask because the
+ * irq is raised only once per gh_hypercall_bell_send.
+ */
+ ret = gh_hypercall_bell_set_mask(irqfd->ghrsc->capid, 1, 1);
+ if (ret)
+ pr_warn("irq %d couldn't be set as level triggered. Might cause IRQ storm if asserted\n",
+ irqfd->ticket.label);
+ }
+
+ return true;
+}
+
+static void gh_irqfd_unpopulate(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc)
+{
+ struct gh_irqfd *irqfd = container_of(ticket, struct gh_irqfd, ticket);
+ u64 cnt;
+
+ eventfd_ctx_remove_wait_queue(irqfd->ctx, &irqfd->wait, &cnt);
+}
+
+static long gh_irqfd_bind(struct gh_vm_function_instance *f)
+{
+ struct gh_fn_irqfd_arg *args = f->argp;
+ struct gh_irqfd *irqfd;
+ __poll_t events;
+ struct fd fd;
+ long r;
+
+ if (f->arg_size != sizeof(*args))
+ return -EINVAL;
+
+ /* All other flag bits are reserved for future use */
+ if (args->flags & ~GH_IRQFD_FLAGS_LEVEL)
+ return -EINVAL;
+
+ irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL);
+ if (!irqfd)
+ return -ENOMEM;
+
+ irqfd->f = f;
+ f->data = irqfd;
+
+ fd = fdget(args->fd);
+ if (!fd.file) {
+ kfree(irqfd);
+ return -EBADF;
+ }
+
+ irqfd->ctx = eventfd_ctx_fileget(fd.file);
+ if (IS_ERR(irqfd->ctx)) {
+ r = PTR_ERR(irqfd->ctx);
+ goto err_fdput;
+ }
+
+ if (args->flags & GH_IRQFD_FLAGS_LEVEL)
+ irqfd->level = true;
+
+ init_waitqueue_func_entry(&irqfd->wait, irqfd_wakeup);
+ init_poll_funcptr(&irqfd->pt, irqfd_ptable_queue_proc);
+
+ irqfd->ticket.resource_type = GH_RESOURCE_TYPE_BELL_TX;
+ irqfd->ticket.label = args->label;
+ irqfd->ticket.owner = THIS_MODULE;
+ irqfd->ticket.populate = gh_irqfd_populate;
+ irqfd->ticket.unpopulate = gh_irqfd_unpopulate;
+
+ r = gh_vm_add_resource_ticket(f->ghvm, &irqfd->ticket);
+ if (r)
+ goto err_ctx;
+
+ events = vfs_poll(fd.file, &irqfd->pt);
+ if (events & EPOLLIN)
+ pr_warn("Premature injection of interrupt\n");
+ fdput(fd);
+
+ return 0;
+err_ctx:
+ eventfd_ctx_put(irqfd->ctx);
+err_fdput:
+ fdput(fd);
+ kfree(irqfd);
+ return r;
+}
+
+static void gh_irqfd_unbind(struct gh_vm_function_instance *f)
+{
+ struct gh_irqfd *irqfd = f->data;
+
+ gh_vm_remove_resource_ticket(irqfd->f->ghvm, &irqfd->ticket);
+ eventfd_ctx_put(irqfd->ctx);
+ kfree(irqfd);
+}
+
+static bool gh_irqfd_compare(const struct gh_vm_function_instance *f,
+ const void *arg, size_t size)
+{
+ const struct gh_fn_irqfd_arg *instance = f->argp,
+ *other = arg;
+
+ if (sizeof(*other) != size)
+ return false;
+
+ return instance->label == other->label;
+}
+
+DECLARE_GH_VM_FUNCTION_INIT(irqfd, GH_FN_IRQFD, 2, gh_irqfd_bind, gh_irqfd_unbind,
+ gh_irqfd_compare);
+MODULE_DESCRIPTION("Gunyah irqfd VM Function");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index 434ffa8ffc78..0c480c622686 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -77,9 +77,12 @@ struct gh_vm_dtb_config {
* @GH_FN_VCPU: create a vCPU instance to control a vCPU
* &struct gh_fn_desc.arg is a pointer to &struct gh_fn_vcpu_arg
* Return: file descriptor to manipulate the vcpu.
+ * @GH_FN_IRQFD: register eventfd to assert a Gunyah doorbell
+ * &struct gh_fn_desc.arg is a pointer to &struct gh_fn_irqfd_arg
*/
enum gh_fn_type {
GH_FN_VCPU = 1,
+ GH_FN_IRQFD,
};
#define GH_FN_MAX_ARG_SIZE 256
@@ -99,6 +102,38 @@ struct gh_fn_vcpu_arg {
__u32 id;
};
+/**
+ * enum gh_irqfd_flags - flags for use in gh_fn_irqfd_arg
+ * @GH_IRQFD_FLAGS_LEVEL: make the interrupt operate like a level triggered
+ * interrupt on guest side. Triggering IRQFD before
+ * guest handles the interrupt causes interrupt to
+ * stay asserted.
+ */
+enum gh_irqfd_flags {
+ GH_IRQFD_FLAGS_LEVEL = 1UL << 0,
+};
+
+/**
+ * struct gh_fn_irqfd_arg - Arguments to create an irqfd function.
+ *
+ * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_IRQFD.
+ *
+ * Allows setting an eventfd to directly trigger a guest interrupt.
+ * irqfd.fd specifies the file descriptor to use as the eventfd.
+ * irqfd.label corresponds to the doorbell label used in the guest VM's devicetree.
+ *
+ * @fd: an eventfd which when written to will raise a doorbell
+ * @label: Label of the doorbell created on the guest VM
+ * @flags: see &enum gh_irqfd_flags
+ * @padding: padding bytes
+ */
+struct gh_fn_irqfd_arg {
+ __u32 fd;
+ __u32 label;
+ __u32 flags;
+ __u32 padding;
+};
+
/**
* struct gh_fn_desc - Arguments to create a VM function
* @type: Type of the function. See &enum gh_fn_type.
--
2.40.0
The resource manager is a special virtual machine which is always
running on a Gunyah system. It provides APIs for creating and destroying
VMs, secure memory management, sharing/lending of memory between VMs,
and setup of inter-VM communication. Calls to the resource manager are
made via message queues.
This patch implements the basic probing and RPC mechanism to make those
API calls. Request/response calls can be made with gh_rm_call.
Drivers can also register for notifications pushed by RM via
gh_rm_register_notifier.
Specific API calls that resource manager supports will be implemented in
subsequent patches.
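The first two bytes of each RPC message header pack several fields, using
the bit masks defined in rsc_mgr.c below. The following sketch shows that
packing in plain C. The sketch_* names are hypothetical, and reading the
fragment count as "fragments that follow the first message" is an
assumption based on how the receive buffer is sized in this patch.

```c
#include <stdint.h>

#define SKETCH_API_VERSION	1u	/* bits 3:0 of the api byte */
#define SKETCH_HDR_WORDS	2u	/* 8-byte header in 32-bit words, bits 7:4 */

enum sketch_rpc_type {
	SKETCH_TYPE_CONTINUATION = 0x0,
	SKETCH_TYPE_REQUEST = 0x1,
	SKETCH_TYPE_REPLY = 0x2,
	SKETCH_TYPE_NOTIF = 0x3,
};

/* api byte: version in the low nibble, header length in the high nibble. */
static uint8_t sketch_api_byte(void)
{
	return (uint8_t)((SKETCH_API_VERSION & 0xfu) |
			 ((SKETCH_HDR_WORDS & 0xfu) << 4));
}

/* type byte: message type in bits 1:0, fragment count in bits 7:2. */
static uint8_t sketch_type_byte(enum sketch_rpc_type type, unsigned int fragments)
{
	return (uint8_t)(((unsigned int)type & 0x3u) | ((fragments & 0x3fu) << 2));
}

/* Extract the fragment count from a type byte. */
static unsigned int sketch_fragments(uint8_t type_byte)
{
	return (type_byte >> 2) & 0x3fu;
}

/* Extract the message type from a type byte. */
static enum sketch_rpc_type sketch_type(uint8_t type_byte)
{
	return (enum sketch_rpc_type)(type_byte & 0x3u);
}
```

A request announcing three fragments would carry a type byte of 0x0d, and
every header would start with an api byte of 0x21 under these
definitions.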
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/Makefile | 1 +
drivers/virt/gunyah/Makefile | 4 +
drivers/virt/gunyah/rsc_mgr.c | 702 +++++++++++++++++++++++++++++++++
drivers/virt/gunyah/rsc_mgr.h | 16 +
include/linux/gunyah_rsc_mgr.h | 21 +
5 files changed, 744 insertions(+)
create mode 100644 drivers/virt/gunyah/Makefile
create mode 100644 drivers/virt/gunyah/rsc_mgr.c
create mode 100644 drivers/virt/gunyah/rsc_mgr.h
create mode 100644 include/linux/gunyah_rsc_mgr.h
diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
index e9aa6fc96fab..a5817e2d7d71 100644
--- a/drivers/virt/Makefile
+++ b/drivers/virt/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_ACRN_HSM) += acrn/
obj-$(CONFIG_EFI_SECRET) += coco/efi_secret/
obj-$(CONFIG_SEV_GUEST) += coco/sev-guest/
obj-$(CONFIG_INTEL_TDX_GUEST) += coco/tdx-guest/
+obj-y += gunyah/
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
new file mode 100644
index 000000000000..0f5aec834698
--- /dev/null
+++ b/drivers/virt/gunyah/Makefile
@@ -0,0 +1,4 @@
+# SPDX-License-Identifier: GPL-2.0
+
+gunyah-y += rsc_mgr.o
+obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
new file mode 100644
index 000000000000..88b5beb1ea51
--- /dev/null
+++ b/drivers/virt/gunyah/rsc_mgr.c
@@ -0,0 +1,702 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/of.h>
+#include <linux/slab.h>
+#include <linux/mutex.h>
+#include <linux/sched.h>
+#include <linux/gunyah.h>
+#include <linux/module.h>
+#include <linux/of_irq.h>
+#include <linux/notifier.h>
+#include <linux/workqueue.h>
+#include <linux/completion.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/platform_device.h>
+
+#include "rsc_mgr.h"
+
+#define RM_RPC_API_VERSION_MASK GENMASK(3, 0)
+#define RM_RPC_HEADER_WORDS_MASK GENMASK(7, 4)
+#define RM_RPC_API_VERSION FIELD_PREP(RM_RPC_API_VERSION_MASK, 1)
+#define RM_RPC_HEADER_WORDS FIELD_PREP(RM_RPC_HEADER_WORDS_MASK, \
+ (sizeof(struct gh_rm_rpc_hdr) / sizeof(u32)))
+#define RM_RPC_API (RM_RPC_API_VERSION | RM_RPC_HEADER_WORDS)
+
+#define RM_RPC_TYPE_CONTINUATION 0x0
+#define RM_RPC_TYPE_REQUEST 0x1
+#define RM_RPC_TYPE_REPLY 0x2
+#define RM_RPC_TYPE_NOTIF 0x3
+#define RM_RPC_TYPE_MASK GENMASK(1, 0)
+
+#define GH_RM_MAX_NUM_FRAGMENTS 62
+#define RM_RPC_FRAGMENTS_MASK GENMASK(7, 2)
+
+struct gh_rm_rpc_hdr {
+ u8 api;
+ u8 type;
+ __le16 seq;
+ __le32 msg_id;
+} __packed;
+
+struct gh_rm_rpc_reply_hdr {
+ struct gh_rm_rpc_hdr hdr;
+ __le32 err_code; /* GH_RM_ERROR_* */
+} __packed;
+
+#define GH_RM_MAX_MSG_SIZE (GH_MSGQ_MAX_MSG_SIZE - sizeof(struct gh_rm_rpc_hdr))
+
+/* RM Error codes */
+enum gh_rm_error {
+ GH_RM_ERROR_OK = 0x0,
+ GH_RM_ERROR_UNIMPLEMENTED = 0xFFFFFFFF,
+ GH_RM_ERROR_NOMEM = 0x1,
+ GH_RM_ERROR_NORESOURCE = 0x2,
+ GH_RM_ERROR_DENIED = 0x3,
+ GH_RM_ERROR_INVALID = 0x4,
+ GH_RM_ERROR_BUSY = 0x5,
+ GH_RM_ERROR_ARGUMENT_INVALID = 0x6,
+ GH_RM_ERROR_HANDLE_INVALID = 0x7,
+ GH_RM_ERROR_VALIDATE_FAILED = 0x8,
+ GH_RM_ERROR_MAP_FAILED = 0x9,
+ GH_RM_ERROR_MEM_INVALID = 0xA,
+ GH_RM_ERROR_MEM_INUSE = 0xB,
+ GH_RM_ERROR_MEM_RELEASED = 0xC,
+ GH_RM_ERROR_VMID_INVALID = 0xD,
+ GH_RM_ERROR_LOOKUP_FAILED = 0xE,
+ GH_RM_ERROR_IRQ_INVALID = 0xF,
+ GH_RM_ERROR_IRQ_INUSE = 0x10,
+ GH_RM_ERROR_IRQ_RELEASED = 0x11,
+};
+
+/**
+ * struct gh_rm_connection - Represents a complete message from resource manager
+ * @payload: Combined payload of all the fragments (msg headers stripped off).
+ * @size: Size of the payload received so far.
+ * @msg_id: Message ID from the header.
+ * @type: RM_RPC_TYPE_REPLY or RM_RPC_TYPE_NOTIF.
+ * @num_fragments: total number of fragments expected to be received.
+ * @fragments_received: fragments received so far.
+ * @reply: Fields used for request/reply sequences
+ * @notification: Fields used for notifications
+ */
+struct gh_rm_connection {
+ void *payload;
+ size_t size;
+ __le32 msg_id;
+ u8 type;
+
+ u8 num_fragments;
+ u8 fragments_received;
+
+ union {
+ /**
+ * @ret: Linux return code, set if there was an error processing the connection
+ * @seq: Sequence ID for the main message.
+ * @rm_error: For request/reply sequences with standard replies
+ * @seq_done: Signals caller that the RM reply has been received
+ */
+ struct {
+ int ret;
+ u16 seq;
+ enum gh_rm_error rm_error;
+ struct completion seq_done;
+ } reply;
+
+ /**
+ * @rm: Pointer to the RM that launched the connection
+ * @work: Triggered when all fragments of a notification received
+ */
+ struct {
+ struct gh_rm *rm;
+ struct work_struct work;
+ } notification;
+ };
+};
+
+/**
+ * struct gh_rm - private data for communicating w/Gunyah resource manager
+ * @dev: pointer to device
+ * @tx_ghrsc: message queue resource to TX to RM
+ * @rx_ghrsc: message queue resource to RX from RM
+ * @msgq: mailbox instance of TX/RX resources above
+ * @msgq_client: mailbox client of above msgq
+ * @active_rx_connection: ongoing gh_rm_connection for which we're receiving fragments
+ * @last_tx_ret: return value of last mailbox tx
+ * @call_xarray: xarray to allocate & lookup sequence IDs for Request/Response flows
+ * @next_seq: next ID to allocate (for xa_alloc_cyclic)
+ * @cache: cache for allocating Tx messages
+ * @send_lock: synchronization to allow only one request to be sent at a time
+ * @nh: notifier chain for clients interested in RM notification messages
+ */
+struct gh_rm {
+ struct device *dev;
+ struct gh_resource tx_ghrsc;
+ struct gh_resource rx_ghrsc;
+ struct gh_msgq msgq;
+ struct mbox_client msgq_client;
+ struct gh_rm_connection *active_rx_connection;
+ int last_tx_ret;
+
+ struct xarray call_xarray;
+ u32 next_seq;
+
+ struct kmem_cache *cache;
+ struct mutex send_lock;
+ struct blocking_notifier_head nh;
+};
+
+/**
+ * gh_rm_remap_error() - Remap Gunyah resource manager errors into a Linux error code
+ * @rm_error: "Standard" return value from Gunyah resource manager
+ */
+static inline int gh_rm_remap_error(enum gh_rm_error rm_error)
+{
+ switch (rm_error) {
+ case GH_RM_ERROR_OK:
+ return 0;
+ case GH_RM_ERROR_UNIMPLEMENTED:
+ return -EOPNOTSUPP;
+ case GH_RM_ERROR_NOMEM:
+ return -ENOMEM;
+ case GH_RM_ERROR_NORESOURCE:
+ return -ENODEV;
+ case GH_RM_ERROR_DENIED:
+ return -EPERM;
+ case GH_RM_ERROR_BUSY:
+ return -EBUSY;
+ case GH_RM_ERROR_INVALID:
+ case GH_RM_ERROR_ARGUMENT_INVALID:
+ case GH_RM_ERROR_HANDLE_INVALID:
+ case GH_RM_ERROR_VALIDATE_FAILED:
+ case GH_RM_ERROR_MAP_FAILED:
+ case GH_RM_ERROR_MEM_INVALID:
+ case GH_RM_ERROR_MEM_INUSE:
+ case GH_RM_ERROR_MEM_RELEASED:
+ case GH_RM_ERROR_VMID_INVALID:
+ case GH_RM_ERROR_LOOKUP_FAILED:
+ case GH_RM_ERROR_IRQ_INVALID:
+ case GH_RM_ERROR_IRQ_INUSE:
+ case GH_RM_ERROR_IRQ_RELEASED:
+ return -EINVAL;
+ default:
+ return -EBADMSG;
+ }
+}
+
+static int gh_rm_init_connection_payload(struct gh_rm_connection *connection, void *msg,
+ size_t hdr_size, size_t msg_size)
+{
+ size_t max_buf_size, payload_size;
+ struct gh_rm_rpc_hdr *hdr = msg;
+
+ if (msg_size < hdr_size)
+ return -EINVAL;
+
+ payload_size = msg_size - hdr_size;
+
+ connection->num_fragments = FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type);
+ connection->fragments_received = 0;
+
+ /* There's not going to be any payload, no need to allocate buffer. */
+ if (!payload_size && !connection->num_fragments)
+ return 0;
+
+ if (connection->num_fragments > GH_RM_MAX_NUM_FRAGMENTS)
+ return -EINVAL;
+
+ max_buf_size = payload_size + (connection->num_fragments * GH_RM_MAX_MSG_SIZE);
+
+ connection->payload = kzalloc(max_buf_size, GFP_KERNEL);
+ if (!connection->payload)
+ return -ENOMEM;
+
+ memcpy(connection->payload, msg + hdr_size, payload_size);
+ connection->size = payload_size;
+ return 0;
+}
+
+static void gh_rm_abort_connection(struct gh_rm *rm)
+{
+ switch (rm->active_rx_connection->type) {
+ case RM_RPC_TYPE_REPLY:
+ rm->active_rx_connection->reply.ret = -EIO;
+ complete(&rm->active_rx_connection->reply.seq_done);
+ break;
+ case RM_RPC_TYPE_NOTIF:
+ fallthrough;
+ default:
+ kfree(rm->active_rx_connection->payload);
+ kfree(rm->active_rx_connection);
+ }
+
+ rm->active_rx_connection = NULL;
+}
+
+static void gh_rm_notif_work(struct work_struct *work)
+{
+ struct gh_rm_connection *connection = container_of(work, struct gh_rm_connection,
+ notification.work);
+ struct gh_rm *rm = connection->notification.rm;
+
+ blocking_notifier_call_chain(&rm->nh, le32_to_cpu(connection->msg_id), connection->payload);
+
+ put_device(rm->dev);
+ kfree(connection->payload);
+ kfree(connection);
+}
+
+static void gh_rm_process_notif(struct gh_rm *rm, void *msg, size_t msg_size)
+{
+ struct gh_rm_connection *connection;
+ struct gh_rm_rpc_hdr *hdr = msg;
+ int ret;
+
+ if (rm->active_rx_connection)
+ gh_rm_abort_connection(rm);
+
+ connection = kzalloc(sizeof(*connection), GFP_KERNEL);
+ if (!connection)
+ return;
+
+ connection->type = RM_RPC_TYPE_NOTIF;
+ connection->msg_id = hdr->msg_id;
+
+ get_device(rm->dev);
+ connection->notification.rm = rm;
+ INIT_WORK(&connection->notification.work, gh_rm_notif_work);
+
+ ret = gh_rm_init_connection_payload(connection, msg, sizeof(*hdr), msg_size);
+ if (ret) {
+ dev_err(rm->dev, "Failed to initialize connection for notification: %d\n", ret);
+ put_device(rm->dev);
+ kfree(connection);
+ return;
+ }
+
+ rm->active_rx_connection = connection;
+}
+
+static void gh_rm_process_rply(struct gh_rm *rm, void *msg, size_t msg_size)
+{
+ struct gh_rm_rpc_reply_hdr *reply_hdr = msg;
+ struct gh_rm_connection *connection;
+ u16 seq_id;
+
+ seq_id = le16_to_cpu(reply_hdr->hdr.seq);
+ connection = xa_load(&rm->call_xarray, seq_id);
+
+ if (!connection || connection->msg_id != reply_hdr->hdr.msg_id)
+ return;
+
+ if (rm->active_rx_connection)
+ gh_rm_abort_connection(rm);
+
+ if (gh_rm_init_connection_payload(connection, msg, sizeof(*reply_hdr), msg_size)) {
+ dev_err(rm->dev, "Failed to alloc connection buffer for sequence %d\n", seq_id);
+ /* Send connection complete and error the client. */
+ connection->reply.ret = -ENOMEM;
+ complete(&connection->reply.seq_done);
+ return;
+ }
+
+ connection->reply.rm_error = le32_to_cpu(reply_hdr->err_code);
+ rm->active_rx_connection = connection;
+}
+
+static void gh_rm_process_cont(struct gh_rm *rm, struct gh_rm_connection *connection,
+ void *msg, size_t msg_size)
+{
+ struct gh_rm_rpc_hdr *hdr = msg;
+ size_t payload_size = msg_size - sizeof(*hdr);
+
+ if (!rm->active_rx_connection)
+ return;
+
+ /*
+ * hdr->fragments and hdr->msg_id preserve the values from the first reply
+ * or notif message. To detect mishandling, check that they are still intact.
+ */
+ if (connection->msg_id != hdr->msg_id ||
+ connection->num_fragments != FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type)) {
+ gh_rm_abort_connection(rm);
+ return;
+ }
+
+ memcpy(connection->payload + connection->size, msg + sizeof(*hdr), payload_size);
+ connection->size += payload_size;
+ connection->fragments_received++;
+}
+
+static void gh_rm_try_complete_connection(struct gh_rm *rm)
+{
+ struct gh_rm_connection *connection = rm->active_rx_connection;
+
+ if (!connection || connection->fragments_received != connection->num_fragments)
+ return;
+
+ switch (connection->type) {
+ case RM_RPC_TYPE_REPLY:
+ complete(&connection->reply.seq_done);
+ break;
+ case RM_RPC_TYPE_NOTIF:
+ schedule_work(&connection->notification.work);
+ break;
+ default:
+ dev_err_ratelimited(rm->dev, "Invalid message type (%u) received\n",
+ connection->type);
+ gh_rm_abort_connection(rm);
+ break;
+ }
+
+ rm->active_rx_connection = NULL;
+}
+
+static void gh_rm_msgq_rx_data(struct mbox_client *cl, void *mssg)
+{
+ struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
+ struct gh_msgq_rx_data *rx_data = mssg;
+ size_t msg_size = rx_data->length;
+ void *msg = rx_data->data;
+ struct gh_rm_rpc_hdr *hdr;
+
+ if (msg_size < sizeof(*hdr) || msg_size > GH_MSGQ_MAX_MSG_SIZE)
+ return;
+
+ hdr = msg;
+ if (hdr->api != RM_RPC_API) {
+ dev_err(rm->dev, "Unknown RM RPC API version: %x\n", hdr->api);
+ return;
+ }
+
+ switch (FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)) {
+ case RM_RPC_TYPE_NOTIF:
+ gh_rm_process_notif(rm, msg, msg_size);
+ break;
+ case RM_RPC_TYPE_REPLY:
+ gh_rm_process_rply(rm, msg, msg_size);
+ break;
+ case RM_RPC_TYPE_CONTINUATION:
+ gh_rm_process_cont(rm, rm->active_rx_connection, msg, msg_size);
+ break;
+ default:
+ dev_err(rm->dev, "Invalid message type (%lu) received\n",
+ FIELD_GET(RM_RPC_TYPE_MASK, hdr->type));
+ return;
+ }
+
+ gh_rm_try_complete_connection(rm);
+}
+
+static void gh_rm_msgq_tx_done(struct mbox_client *cl, void *mssg, int r)
+{
+ struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
+
+ kmem_cache_free(rm->cache, mssg);
+ rm->last_tx_ret = r;
+}
+
+static int gh_rm_send_request(struct gh_rm *rm, u32 message_id,
+ const void *req_buf, size_t req_buf_size,
+ struct gh_rm_connection *connection)
+{
+ size_t buf_size_remaining = req_buf_size;
+ const void *req_buf_curr = req_buf;
+ struct gh_msgq_tx_data *msg;
+ struct gh_rm_rpc_hdr *hdr, hdr_template;
+ u32 cont_fragments = 0;
+ size_t payload_size;
+ void *payload;
+ int ret;
+
+ if (req_buf_size > GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE) {
+ dev_warn(rm->dev, "Limit (%lu bytes) exceeded for the maximum message size: %lu\n",
+ GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE, req_buf_size);
+ dump_stack();
+ return -E2BIG;
+ }
+
+ if (req_buf_size)
+ cont_fragments = (req_buf_size - 1) / GH_RM_MAX_MSG_SIZE;
+
+ hdr_template.api = RM_RPC_API;
+ hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_REQUEST) |
+ FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
+ hdr_template.seq = cpu_to_le16(connection->reply.seq);
+ hdr_template.msg_id = cpu_to_le32(message_id);
+
+ ret = mutex_lock_interruptible(&rm->send_lock);
+ if (ret)
+ return ret;
+
+ do {
+ msg = kmem_cache_zalloc(rm->cache, GFP_KERNEL);
+ if (!msg) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ /* Fill header */
+ hdr = (struct gh_rm_rpc_hdr *)&msg->data[0];
+ *hdr = hdr_template;
+
+ /* Copy payload */
+ payload = &msg->data[0] + sizeof(*hdr);
+ payload_size = min(buf_size_remaining, GH_RM_MAX_MSG_SIZE);
+ memcpy(payload, req_buf_curr, payload_size);
+ req_buf_curr += payload_size;
+ buf_size_remaining -= payload_size;
+
+ /* Force the last fragment to immediately alert the receiver */
+ msg->push = !buf_size_remaining;
+ msg->length = sizeof(*hdr) + payload_size;
+
+ ret = mbox_send_message(gh_msgq_chan(&rm->msgq), msg);
+ if (ret < 0) {
+ kmem_cache_free(rm->cache, msg);
+ break;
+ }
+
+ if (rm->last_tx_ret) {
+ ret = rm->last_tx_ret;
+ break;
+ }
+
+ hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_CONTINUATION) |
+ FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
+ } while (buf_size_remaining);
+
+out:
+ mutex_unlock(&rm->send_lock);
+ return ret < 0 ? ret : 0;
+}
+
+/**
+ * gh_rm_call() - Achieve request-response type communication with RPC
+ * @rm: Pointer to Gunyah resource manager internal data
+ * @message_id: The RM RPC message-id
+ * @req_buf: Request buffer that contains the payload
+ * @req_buf_size: Total size of the payload
+ * @resp_buf: Pointer to a response buffer
+ * @resp_buf_size: Size of the response buffer
+ *
+ * Make a request to the Resource Manager and wait for the reply. On a successful
+ * response, the payload is returned in resp_buf and its size is set in
+ * resp_buf_size. The caller must free resp_buf when 0 is returned and
+ * resp_buf_size != 0.
+ *
+ * req_buf must not be NULL when req_buf_size > 0. If req_buf_size == 0,
+ * req_buf *can* be NULL and no additional payload is sent.
+ *
+ * Context: Process context. Will sleep waiting for reply.
+ * Return: 0 on success. <0 if error.
+ */
+int gh_rm_call(struct gh_rm *rm, u32 message_id, const void *req_buf, size_t req_buf_size,
+ void **resp_buf, size_t *resp_buf_size)
+{
+ struct gh_rm_connection *connection;
+ u32 seq_id;
+ int ret;
+
+ /* message_id 0 is reserved. req_buf_size implies req_buf is not NULL */
+ if (!rm || !message_id || (!req_buf && req_buf_size))
+ return -EINVAL;
+
+
+ connection = kzalloc(sizeof(*connection), GFP_KERNEL);
+ if (!connection)
+ return -ENOMEM;
+
+ connection->type = RM_RPC_TYPE_REPLY;
+ connection->msg_id = cpu_to_le32(message_id);
+
+ init_completion(&connection->reply.seq_done);
+
+ /* Allocate a new seq number for this connection */
+ ret = xa_alloc_cyclic(&rm->call_xarray, &seq_id, connection, xa_limit_16b, &rm->next_seq,
+ GFP_KERNEL);
+ if (ret < 0)
+ goto free;
+ connection->reply.seq = lower_16_bits(seq_id);
+
+ /* Send the request to the Resource Manager */
+ ret = gh_rm_send_request(rm, message_id, req_buf, req_buf_size, connection);
+ if (ret < 0)
+ goto out;
+
+ /* Wait for response */
+ ret = wait_for_completion_interruptible(&connection->reply.seq_done);
+ if (ret)
+ goto out;
+
+ /* Check for internal (kernel) error waiting for the response */
+ if (connection->reply.ret) {
+ ret = connection->reply.ret;
+ if (ret != -ENOMEM)
+ kfree(connection->payload);
+ goto out;
+ }
+
+ /* Got a response, did resource manager give us an error? */
+ if (connection->reply.rm_error != GH_RM_ERROR_OK) {
+ dev_warn(rm->dev, "RM rejected message %08x. Error: %d\n", message_id,
+ connection->reply.rm_error);
+ dump_stack();
+ ret = gh_rm_remap_error(connection->reply.rm_error);
+ kfree(connection->payload);
+ goto out;
+ }
+
+ /* Everything looks good, return the payload */
+ if (resp_buf_size)
+ *resp_buf_size = connection->size;
+ if (connection->size && resp_buf)
+ *resp_buf = connection->payload;
+ else {
+ /* kfree in case RM sent us multiple fragments but never any data in
+ * those fragments. We would've allocated memory for it, but connection->size == 0
+ */
+ kfree(connection->payload);
+ }
+
+out:
+ xa_erase(&rm->call_xarray, connection->reply.seq);
+free:
+ kfree(connection);
+ return ret;
+}
+
+
+int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&rm->nh, nb);
+}
+EXPORT_SYMBOL_GPL(gh_rm_notifier_register);
+
+int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&rm->nh, nb);
+}
+EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
+
+static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
+ struct gh_resource *ghrsc)
+{
+ struct device_node *node = pdev->dev.of_node;
+ int ret;
+ int idx = tx ? 0 : 1;
+
+ ghrsc->type = tx ? GH_RESOURCE_TYPE_MSGQ_TX : GH_RESOURCE_TYPE_MSGQ_RX;
+
+ ghrsc->irq = platform_get_irq(pdev, idx);
+ if (ghrsc->irq < 0) {
+ dev_err(&pdev->dev, "Failed to get irq%d: %d\n", idx, ghrsc->irq);
+ return ghrsc->irq;
+ }
+
+ ret = of_property_read_u64_index(node, "reg", idx, &ghrsc->capid);
+ if (ret) {
+ dev_err(&pdev->dev, "Failed to get capid%d: %d\n", idx, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+static int gh_identify(void)
+{
+ struct gh_hypercall_hyp_identify_resp gh_api;
+
+ if (!arch_is_gh_guest())
+ return -ENODEV;
+
+ gh_hypercall_hyp_identify(&gh_api);
+
+ pr_info("Running under Gunyah hypervisor %llx/v%u\n",
+ FIELD_GET(GH_API_INFO_VARIANT_MASK, gh_api.api_info),
+ gh_api_version(&gh_api));
+
+ /* We might move this out to individual drivers if there's ever an API version bump */
+ if (gh_api_version(&gh_api) != GH_API_V1) {
+ pr_info("Unsupported Gunyah version: %u\n", gh_api_version(&gh_api));
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
+static int gh_rm_drv_probe(struct platform_device *pdev)
+{
+ struct gh_msgq_tx_data *msg;
+ struct gh_rm *rm;
+ int ret;
+
+ ret = gh_identify();
+ if (ret)
+ return ret;
+
+ rm = devm_kzalloc(&pdev->dev, sizeof(*rm), GFP_KERNEL);
+ if (!rm)
+ return -ENOMEM;
+
+ platform_set_drvdata(pdev, rm);
+ rm->dev = &pdev->dev;
+
+ mutex_init(&rm->send_lock);
+ BLOCKING_INIT_NOTIFIER_HEAD(&rm->nh);
+ xa_init_flags(&rm->call_xarray, XA_FLAGS_ALLOC);
+ rm->cache = kmem_cache_create("gh_rm", struct_size(msg, data, GH_MSGQ_MAX_MSG_SIZE), 0,
+ SLAB_HWCACHE_ALIGN, NULL);
+ if (!rm->cache)
+ return -ENOMEM;
+
+ ret = gh_msgq_platform_probe_direction(pdev, true, &rm->tx_ghrsc);
+ if (ret)
+ goto err_cache;
+
+ ret = gh_msgq_platform_probe_direction(pdev, false, &rm->rx_ghrsc);
+ if (ret)
+ goto err_cache;
+
+ rm->msgq_client.dev = &pdev->dev;
+ rm->msgq_client.tx_block = true;
+ rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
+ rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
+
+ return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
+err_cache:
+ kmem_cache_destroy(rm->cache);
+ return ret;
+}
+
+static int gh_rm_drv_remove(struct platform_device *pdev)
+{
+ struct gh_rm *rm = platform_get_drvdata(pdev);
+
+ mbox_free_channel(gh_msgq_chan(&rm->msgq));
+ gh_msgq_remove(&rm->msgq);
+ kmem_cache_destroy(rm->cache);
+
+ return 0;
+}
+
+static const struct of_device_id gh_rm_of_match[] = {
+ { .compatible = "gunyah-resource-manager" },
+ {}
+};
+MODULE_DEVICE_TABLE(of, gh_rm_of_match);
+
+static struct platform_driver gh_rm_driver = {
+ .probe = gh_rm_drv_probe,
+ .remove = gh_rm_drv_remove,
+ .driver = {
+ .name = "gh_rsc_mgr",
+ .of_match_table = gh_rm_of_match,
+ },
+};
+module_platform_driver(gh_rm_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Gunyah Resource Manager Driver");
diff --git a/drivers/virt/gunyah/rsc_mgr.h b/drivers/virt/gunyah/rsc_mgr.h
new file mode 100644
index 000000000000..8309b7bf4668
--- /dev/null
+++ b/drivers/virt/gunyah/rsc_mgr.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#ifndef __GH_RSC_MGR_PRIV_H
+#define __GH_RSC_MGR_PRIV_H
+
+#include <linux/gunyah.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/types.h>
+
+struct gh_rm;
+int gh_rm_call(struct gh_rm *rsc_mgr, u32 message_id, const void *req_buf, size_t req_buf_size,
+ void **resp_buf, size_t *resp_buf_size);
+
+#endif
diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
new file mode 100644
index 000000000000..f2a312e80af5
--- /dev/null
+++ b/include/linux/gunyah_rsc_mgr.h
@@ -0,0 +1,21 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _GUNYAH_RSC_MGR_H
+#define _GUNYAH_RSC_MGR_H
+
+#include <linux/list.h>
+#include <linux/notifier.h>
+#include <linux/gunyah.h>
+
+#define GH_VMID_INVAL U16_MAX
+
+struct gh_rm;
+int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb);
+int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb);
+struct device *gh_rm_get(struct gh_rm *rm);
+void gh_rm_put(struct gh_rm *rm);
+
+#endif
--
2.40.0
Add hypercalls to send and receive messages on a Gunyah message queue.
Signed-off-by: Elliot Berman <[email protected]>
---
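As a reading aid for the receive hypercall's `ready` result, here is a minimal
user-space sketch of the drain loop a consumer would run. It is not the kernel
code: `model_msgq`, `model_msgq_push()`, `model_msgq_recv()` and
`model_drain()` are hypothetical stand-ins, and the in-memory queue only
approximates the hypervisor's behaviour (a receive returns one message, its
size, and whether more messages are pending).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_MSG_SIZE 240	/* mirrors GH_MSGQ_MAX_MSG_SIZE in this series */
#define MODEL_QUEUE_DEPTH 4	/* arbitrary depth chosen for the model */

enum model_error { MODEL_OK = 0, MODEL_MSGQUEUE_EMPTY = 1 };

struct model_msg {
	size_t length;
	char data[MODEL_MAX_MSG_SIZE];
};

struct model_msgq {
	struct model_msg slots[MODEL_QUEUE_DEPTH];
	size_t head;	/* next slot to read */
	size_t count;	/* messages currently queued */
};

/* Hypothetical producer side, used only to fill the model queue. */
static bool model_msgq_push(struct model_msgq *q, const void *buf, size_t len)
{
	struct model_msg *m;

	if (q->count == MODEL_QUEUE_DEPTH || len > MODEL_MAX_MSG_SIZE)
		return false;

	m = &q->slots[(q->head + q->count) % MODEL_QUEUE_DEPTH];
	memcpy(m->data, buf, len);
	m->length = len;
	q->count++;
	return true;
}

/* Stand-in for the receive hypercall: one message out, plus a "more pending" flag. */
static enum model_error model_msgq_recv(struct model_msgq *q, void *buff, size_t size,
					size_t *recv_size, bool *ready)
{
	struct model_msg *m;

	if (!q->count)
		return MODEL_MSGQUEUE_EMPTY;

	m = &q->slots[q->head];
	*recv_size = m->length < size ? m->length : size;
	memcpy(buff, m->data, *recv_size);

	q->head = (q->head + 1) % MODEL_QUEUE_DEPTH;
	q->count--;
	*ready = q->count > 0;
	return MODEL_OK;
}

/* Keep receiving while the previous call reported more messages pending. */
static size_t model_drain(struct model_msgq *q)
{
	struct model_msg rx;
	size_t delivered = 0;
	bool ready = true;

	while (ready) {
		if (model_msgq_recv(q, rx.data, sizeof(rx.data), &rx.length, &ready) != MODEL_OK)
			break;
		delivered++;
	}

	return delivered;
}
```

The loop stops either when a receive reports that nothing else is pending or
when the queue turns out to be empty, so a spurious wake-up costs one call.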
arch/arm64/gunyah/gunyah_hypercall.c | 31 ++++++++++++++++++++++++++++
include/linux/gunyah.h | 6 ++++++
2 files changed, 37 insertions(+)
diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
index 2166d5dab869..2b2a63e9b9e5 100644
--- a/arch/arm64/gunyah/gunyah_hypercall.c
+++ b/arch/arm64/gunyah/gunyah_hypercall.c
@@ -33,6 +33,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
fn)
#define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
+#define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
+#define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
/**
* gh_hypercall_hyp_identify() - Returns build information and feature flags
@@ -52,5 +54,34 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
}
EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
+enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_SEND, capid, size, (uintptr_t)buff, tx_flags, 0, &res);
+
+ if (res.a0 == GH_ERROR_OK)
+ *ready = !!res.a1;
+
+ return res.a0;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_msgq_send);
+
+enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
+ bool *ready)
+{
+ struct arm_smccc_res res;
+
+ arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_RECV, capid, (uintptr_t)buff, size, 0, &res);
+
+ if (res.a0 == GH_ERROR_OK) {
+ *recv_size = res.a1;
+ *ready = !!res.a2;
+ }
+
+ return res.a0;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_msgq_recv);
+
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index 6b36cf4787ef..01a6f202d037 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -111,4 +111,10 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
+#define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH BIT(0)
+
+enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
+enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
+ bool *ready);
+
#endif
--
2.40.0
Gunyah message queues are a unidirectional inter-VM pipe for messages up
to 1024 bytes. This driver supports pairing a receiver message queue and
a transmitter message queue to expose a single mailbox channel.
Signed-off-by: Elliot Berman <[email protected]>
---
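The TX-done rule used by this driver (report completion only once another
message can be accepted) can be sketched in user space as below. This is a
simplified model under stated assumptions, not the driver itself:
`model_txq`, `model_send()` and `model_receiver_drain_one()` are hypothetical
names, the queue depth is arbitrary, and the mailbox framework's own
serialization of sends is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_txq {
	size_t depth;			/* capacity of the modelled queue */
	size_t used;			/* messages not yet drained by the receiver */
	bool txdone_pending;		/* last send filled the queue; completion is owed */
	unsigned int txdone_count;	/* completions reported to the client so far */
};

/* Returns 0 when the message was queued, -1 when the queue was already full. */
static int model_send(struct model_txq *q)
{
	if (q->used == q->depth)
		return -1;	/* the driver reports -EAGAIN in this case */

	q->used++;

	if (q->used < q->depth)
		q->txdone_count++;		/* space remains: complete right away */
	else
		q->txdone_pending = true;	/* now full: wait for the TX interrupt */

	return 0;
}

/* Receiver consumes one message; a full-to-not-full transition models the TX IRQ. */
static void model_receiver_drain_one(struct model_txq *q)
{
	bool was_full = q->used == q->depth;

	if (!q->used)
		return;

	q->used--;

	if (was_full && q->txdone_pending) {
		q->txdone_pending = false;
		q->txdone_count++;
	}
}
```

In the model, a send that leaves room completes immediately (the driver defers
this to a tasklet), while a send that fills the queue completes only after the
receiver frees a slot.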
Documentation/virt/gunyah/message-queue.rst | 8 +
drivers/mailbox/Makefile | 2 +
drivers/mailbox/gunyah-msgq.c | 212 ++++++++++++++++++++
include/linux/gunyah.h | 57 ++++++
4 files changed, 279 insertions(+)
create mode 100644 drivers/mailbox/gunyah-msgq.c
diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
index b352918ae54b..70d82a4ef32d 100644
--- a/Documentation/virt/gunyah/message-queue.rst
+++ b/Documentation/virt/gunyah/message-queue.rst
@@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
| | | | | |
| | | | | |
+---------------+ +-----------------+ +---------------+
+
+Gunyah message queues are exposed as mailboxes. To create the mailbox, create
+an mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
+all messages in the RX message queue are read and pushed via the `rx_callback`
+of the registered mbox_client.
+
+.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
+ :identifiers: gh_msgq_init
diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
index fc9376117111..5f929bb55e9a 100644
--- a/drivers/mailbox/Makefile
+++ b/drivers/mailbox/Makefile
@@ -55,6 +55,8 @@ obj-$(CONFIG_MTK_CMDQ_MBOX) += mtk-cmdq-mailbox.o
obj-$(CONFIG_ZYNQMP_IPI_MBOX) += zynqmp-ipi-mailbox.o
+obj-$(CONFIG_GUNYAH) += gunyah-msgq.o
+
obj-$(CONFIG_SUN6I_MSGBOX) += sun6i-msgbox.o
obj-$(CONFIG_SPRD_MBOX) += sprd-mailbox.o
diff --git a/drivers/mailbox/gunyah-msgq.c b/drivers/mailbox/gunyah-msgq.c
new file mode 100644
index 000000000000..b7a54f233680
--- /dev/null
+++ b/drivers/mailbox/gunyah-msgq.c
@@ -0,0 +1,212 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/mailbox_controller.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/gunyah.h>
+#include <linux/printk.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/wait.h>
+
+#define mbox_chan_to_msgq(chan) (container_of(chan->mbox, struct gh_msgq, mbox))
+
+static irqreturn_t gh_msgq_rx_irq_handler(int irq, void *data)
+{
+ struct gh_msgq *msgq = data;
+ struct gh_msgq_rx_data rx_data;
+ enum gh_error gh_error;
+ bool ready = true;
+
+ while (ready) {
+ gh_error = gh_hypercall_msgq_recv(msgq->rx_ghrsc->capid,
+ &rx_data.data, sizeof(rx_data.data),
+ &rx_data.length, &ready);
+ if (gh_error != GH_ERROR_OK) {
+ if (gh_error != GH_ERROR_MSGQUEUE_EMPTY)
+ dev_warn(msgq->mbox.dev, "Failed to receive data: %d\n", gh_error);
+ break;
+ }
+ if (likely(gh_msgq_chan(msgq)->cl))
+ mbox_chan_received_data(gh_msgq_chan(msgq), &rx_data);
+ }
+
+ return IRQ_HANDLED;
+}
+
+/* Fired when message queue transitions from "full" to "space available" to send messages */
+static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
+{
+ struct gh_msgq *msgq = data;
+
+ mbox_chan_txdone(gh_msgq_chan(msgq), 0);
+
+ return IRQ_HANDLED;
+}
+
+/* Fired after sending message and hypercall told us there was more space available. */
+static void gh_msgq_txdone_tasklet(struct tasklet_struct *tasklet)
+{
+ struct gh_msgq *msgq = container_of(tasklet, struct gh_msgq, txdone_tasklet);
+
+ mbox_chan_txdone(gh_msgq_chan(msgq), msgq->last_ret);
+}
+
+static int gh_msgq_send_data(struct mbox_chan *chan, void *data)
+{
+ struct gh_msgq *msgq = mbox_chan_to_msgq(chan);
+ struct gh_msgq_tx_data *msgq_data = data;
+ u64 tx_flags = 0;
+ enum gh_error gh_error;
+ bool ready;
+
+ if (!msgq->tx_ghrsc)
+ return -EOPNOTSUPP;
+
+ if (msgq_data->push)
+ tx_flags |= GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH;
+
+ gh_error = gh_hypercall_msgq_send(msgq->tx_ghrsc->capid, msgq_data->length, msgq_data->data,
+ tx_flags, &ready);
+
+ /*
+ * unlikely because Linux tracks state of msgq and should not try to
+ * send message when msgq is full.
+ */
+ if (unlikely(gh_error == GH_ERROR_MSGQUEUE_FULL))
+ return -EAGAIN;
+
+ /*
+ * Propagate all other errors to client. If we return error to mailbox
+ * framework, then no other messages can be sent and nobody will know
+ * to retry this message.
+ */
+ msgq->last_ret = gh_error_remap(gh_error);
+
+ /*
+ * This message was successfully sent, but message queue isn't ready to
+ * accept more messages because it's now full. Mailbox framework
+ * requires that we only report that message was transmitted when
+ * we're ready to transmit another message. We'll get that in the form
+ * of tx IRQ once the other side starts to drain the msgq.
+ */
+ if (gh_error == GH_ERROR_OK) {
+ if (!ready)
+ return 0;
+ } else
+ dev_err(msgq->mbox.dev, "Failed to send data: %d (%d)\n", gh_error, msgq->last_ret);
+
+ /*
+ * We can send more messages. Mailbox framework requires that tx done
+ * happens asynchronously to sending the message. Gunyah message queues
+ * tell us right away on the hypercall return whether we can send more
+ * messages. To work around this, defer the txdone to a tasklet.
+ */
+ tasklet_schedule(&msgq->txdone_tasklet);
+
+ return 0;
+}
+
+static struct mbox_chan_ops gh_msgq_ops = {
+ .send_data = gh_msgq_send_data,
+};
+
+/**
+ * gh_msgq_init() - Initialize a Gunyah message queue with an mbox_client
+ * @parent: device parent used for the mailbox controller
+ * @msgq: Pointer to the gh_msgq to initialize
+ * @cl: A mailbox client to bind to the mailbox channel that the message queue creates
+ * @tx_ghrsc: optional, the transmission side of the message queue
+ * @rx_ghrsc: optional, the receiving side of the message queue
+ *
+ * At least one of tx_ghrsc and rx_ghrsc must not be NULL. Most message queue use cases come with
+ * a pair of message queues to facilitate bidirectional communication. When tx_ghrsc is set,
+ * the client can send messages with mbox_send_message(gh_msgq_chan(msgq), msg). When rx_ghrsc
+ * is set, the mbox_client must register an .rx_callback() and the message queue driver will
+ * deliver all available messages upon receiving the RX ready interrupt. The messages should be
+ * consumed or copied by the client right away as the gh_msgq_rx_data will be replaced/destroyed
+ * after the callback.
+ *
+ * Return: 0 on success, negative otherwise
+ */
+int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
+ struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc)
+{
+ int ret;
+
+ /* Must have at least a tx_ghrsc or rx_ghrsc, and each must be the right resource type */
+ if ((!tx_ghrsc && !rx_ghrsc) ||
+ (tx_ghrsc && tx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_TX) ||
+ (rx_ghrsc && rx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_RX))
+ return -EINVAL;
+
+ msgq->mbox.dev = parent;
+ msgq->mbox.ops = &gh_msgq_ops;
+ msgq->mbox.num_chans = 1;
+ msgq->mbox.txdone_irq = true;
+ msgq->mbox.chans = &msgq->mbox_chan;
+
+ ret = mbox_controller_register(&msgq->mbox);
+ if (ret)
+ return ret;
+
+ ret = mbox_bind_client(gh_msgq_chan(msgq), cl);
+ if (ret)
+ goto err_mbox;
+
+ if (tx_ghrsc) {
+ msgq->tx_ghrsc = tx_ghrsc;
+
+ ret = request_irq(msgq->tx_ghrsc->irq, gh_msgq_tx_irq_handler, 0, "gh_msgq_tx",
+ msgq);
+ if (ret)
+ goto err_tx_ghrsc;
+
+ tasklet_setup(&msgq->txdone_tasklet, gh_msgq_txdone_tasklet);
+ }
+
+ if (rx_ghrsc) {
+ msgq->rx_ghrsc = rx_ghrsc;
+
+ ret = request_threaded_irq(msgq->rx_ghrsc->irq, NULL, gh_msgq_rx_irq_handler,
+ IRQF_ONESHOT, "gh_msgq_rx", msgq);
+ if (ret)
+ goto err_tx_irq;
+ }
+
+ return 0;
+err_tx_irq:
+ if (msgq->tx_ghrsc)
+ free_irq(msgq->tx_ghrsc->irq, msgq);
+
+ msgq->rx_ghrsc = NULL;
+err_tx_ghrsc:
+ msgq->tx_ghrsc = NULL;
+err_mbox:
+ mbox_controller_unregister(&msgq->mbox);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_msgq_init);
+
+void gh_msgq_remove(struct gh_msgq *msgq)
+{
+ if (msgq->rx_ghrsc)
+ free_irq(msgq->rx_ghrsc->irq, msgq);
+
+ if (msgq->tx_ghrsc) {
+ tasklet_kill(&msgq->txdone_tasklet);
+ free_irq(msgq->tx_ghrsc->irq, msgq);
+ }
+
+ mbox_controller_unregister(&msgq->mbox);
+
+ msgq->rx_ghrsc = NULL;
+ msgq->tx_ghrsc = NULL;
+}
+EXPORT_SYMBOL_GPL(gh_msgq_remove);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Gunyah Message Queue Driver");
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index 01a6f202d037..982e27d10d57 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -8,11 +8,68 @@
#include <linux/bitfield.h>
#include <linux/errno.h>
+#include <linux/interrupt.h>
#include <linux/limits.h>
+#include <linux/mailbox_controller.h>
+#include <linux/mailbox_client.h>
#include <linux/types.h>
+/* Matches resource manager's resource types for VM_GET_HYP_RESOURCES RPC */
+enum gh_resource_type {
+ GH_RESOURCE_TYPE_BELL_TX = 0,
+ GH_RESOURCE_TYPE_BELL_RX = 1,
+ GH_RESOURCE_TYPE_MSGQ_TX = 2,
+ GH_RESOURCE_TYPE_MSGQ_RX = 3,
+ GH_RESOURCE_TYPE_VCPU = 4,
+};
+
+struct gh_resource {
+ enum gh_resource_type type;
+ u64 capid;
+ unsigned int irq;
+};
+
+/*
+ * Gunyah Message Queues
+ */
+
+#define GH_MSGQ_MAX_MSG_SIZE 240
+
+struct gh_msgq_tx_data {
+ size_t length;
+ bool push;
+ char data[];
+};
+
+struct gh_msgq_rx_data {
+ size_t length;
+ char data[GH_MSGQ_MAX_MSG_SIZE];
+};
+
+struct gh_msgq {
+ struct gh_resource *tx_ghrsc;
+ struct gh_resource *rx_ghrsc;
+
+ /* msgq private */
+ int last_ret; /* Linux error, not GH_STATUS_* */
+ struct mbox_chan mbox_chan;
+ struct mbox_controller mbox;
+ struct tasklet_struct txdone_tasklet;
+};
+
+
+int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
+ struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc);
+void gh_msgq_remove(struct gh_msgq *msgq);
+
+static inline struct mbox_chan *gh_msgq_chan(struct gh_msgq *msgq)
+{
+ return &msgq->mbox.chans[0];
+}
+
/******************************************************************************/
/* Common arch-independent definitions for Gunyah hypercalls */
+
#define GH_CAPID_INVAL U64_MAX
#define GH_VMID_ROOT_VM 0xff
--
2.40.0
Add framework for VM functions to handle stage-2 write faults from Gunyah
guest virtual machines. IO handlers have a range of addresses which they
apply to. Optionally, they may apply only when the value written
matches the IO handler's value.
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
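To illustrate the matching rule described above, here is a simplified
user-space sketch. It replaces the rb-tree lookup in the patch with a linear
scan, and `model_io_handler`, `model_io_handler_matches()` and
`model_find_io_handler()` are hypothetical names. It assumes, as the comparison
function in this patch suggests, that a handler with `len == 0` accepts any
access length and that a handler without datamatch accepts any written value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_io_handler {
	uint64_t addr;
	uint64_t len;		/* 0 acts as "any length" in this model */
	bool datamatch;		/* when set, the written value must equal data */
	uint64_t data;
};

/* Does this handler apply to a write of len bytes of data at addr? */
static bool model_io_handler_matches(const struct model_io_handler *h,
				     uint64_t addr, uint64_t len, uint64_t data)
{
	if (h->addr != addr)
		return false;
	if (h->len && h->len != len)
		return false;
	if (h->datamatch && h->data != data)
		return false;
	return true;
}

/* Linear scan standing in for the tree lookup; returns NULL when nothing matches. */
static const struct model_io_handler *
model_find_io_handler(const struct model_io_handler *handlers, size_t count,
		      uint64_t addr, uint64_t len, uint64_t data)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (model_io_handler_matches(&handlers[i], addr, len, data))
			return &handlers[i];
	}

	return NULL;
}
```

A write that finds no handler corresponds to the -ENODEV path in
gh_vm_mmio_write().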
drivers/virt/gunyah/vm_mgr.c | 104 ++++++++++++++++++++++++++++++++++
drivers/virt/gunyah/vm_mgr.h | 4 ++
include/linux/gunyah_vm_mgr.h | 25 ++++++++
3 files changed, 133 insertions(+)
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index 6228090aceb6..81b42ab675f6 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -279,6 +279,108 @@ static void gh_vm_clean_resources(struct gh_vm *ghvm)
mutex_unlock(&ghvm->resources_lock);
}
+static int _gh_vm_io_handler_compare(const struct rb_node *node, const struct rb_node *parent)
+{
+ struct gh_vm_io_handler *n = container_of(node, struct gh_vm_io_handler, node);
+ struct gh_vm_io_handler *p = container_of(parent, struct gh_vm_io_handler, node);
+
+ if (n->addr < p->addr)
+ return -1;
+ if (n->addr > p->addr)
+ return 1;
+ if ((n->len && !p->len) || (!n->len && p->len))
+ return 0;
+ if (n->len < p->len)
+ return -1;
+ if (n->len > p->len)
+ return 1;
+ /* one of the io handlers doesn't have datamatch and the other does.
+ * For purposes of comparison, that makes them identical since the
+ * one that doesn't have datamatch will cover the same handler that
+ * does.
+ */
+ if (n->datamatch != p->datamatch)
+ return 0;
+ if (n->data < p->data)
+ return -1;
+ if (n->data > p->data)
+ return 1;
+ return 0;
+}
+
+static int gh_vm_io_handler_compare(struct rb_node *node, const struct rb_node *parent)
+{
+ return _gh_vm_io_handler_compare(node, parent);
+}
+
+static int gh_vm_io_handler_find(const void *key, const struct rb_node *node)
+{
+ const struct gh_vm_io_handler *k = key;
+
+ return _gh_vm_io_handler_compare(&k->node, node);
+}
+
+static struct gh_vm_io_handler *gh_vm_mgr_find_io_hdlr(struct gh_vm *ghvm, u64 addr,
+ u64 len, u64 data)
+{
+ struct gh_vm_io_handler key = {
+ .addr = addr,
+ .len = len,
+ .datamatch = true,
+ .data = data,
+ };
+ struct rb_node *node;
+
+ node = rb_find(&key, &ghvm->mmio_handler_root, gh_vm_io_handler_find);
+ if (!node)
+ return NULL;
+
+ return container_of(node, struct gh_vm_io_handler, node);
+}
+
+int gh_vm_mmio_write(struct gh_vm *ghvm, u64 addr, u32 len, u64 data)
+{
+ struct gh_vm_io_handler *io_hdlr = NULL;
+ int ret;
+
+ down_read(&ghvm->mmio_handler_lock);
+ io_hdlr = gh_vm_mgr_find_io_hdlr(ghvm, addr, len, data);
+ if (!io_hdlr || !io_hdlr->ops || !io_hdlr->ops->write) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ ret = io_hdlr->ops->write(io_hdlr, addr, len, data);
+
+out:
+ up_read(&ghvm->mmio_handler_lock);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_vm_mmio_write);
+
+int gh_vm_add_io_handler(struct gh_vm *ghvm, struct gh_vm_io_handler *io_hdlr)
+{
+ struct rb_node *found;
+
+ if (io_hdlr->datamatch && (!io_hdlr->len || io_hdlr->len > sizeof(io_hdlr->data)))
+ return -EINVAL;
+
+ down_write(&ghvm->mmio_handler_lock);
+ found = rb_find_add(&io_hdlr->node, &ghvm->mmio_handler_root, gh_vm_io_handler_compare);
+ up_write(&ghvm->mmio_handler_lock);
+
+ return found ? -EEXIST : 0;
+}
+EXPORT_SYMBOL_GPL(gh_vm_add_io_handler);
+
+void gh_vm_remove_io_handler(struct gh_vm *ghvm, struct gh_vm_io_handler *io_hdlr)
+{
+ down_write(&ghvm->mmio_handler_lock);
+ rb_erase(&io_hdlr->node, &ghvm->mmio_handler_root);
+ up_write(&ghvm->mmio_handler_lock);
+}
+EXPORT_SYMBOL_GPL(gh_vm_remove_io_handler);
+
static int gh_vm_rm_notification_status(struct gh_vm *ghvm, void *data)
{
struct gh_rm_vm_status_payload *payload = data;
@@ -364,6 +466,8 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
mutex_init(&ghvm->resources_lock);
INIT_LIST_HEAD(&ghvm->resources);
INIT_LIST_HEAD(&ghvm->resource_tickets);
+ init_rwsem(&ghvm->mmio_handler_lock);
+ ghvm->mmio_handler_root = RB_ROOT;
INIT_LIST_HEAD(&ghvm->functions);
ghvm->vm_status = GH_RM_VM_STATUS_NO_STATE;
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index e5e0c92d4cb1..3fc0f91dfd1a 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -56,10 +56,14 @@ struct gh_vm {
struct mutex resources_lock;
struct list_head resources;
struct list_head resource_tickets;
+ struct rb_root mmio_handler_root;
+ struct rw_semaphore mmio_handler_lock;
};
int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region);
void gh_vm_mem_reclaim(struct gh_vm *ghvm);
struct gh_vm_mem *gh_vm_mem_find_by_addr(struct gh_vm *ghvm, u64 guest_phys_addr, u32 size);
+int gh_vm_mmio_write(struct gh_vm *ghvm, u64 addr, u32 len, u64 data);
+
#endif
diff --git a/include/linux/gunyah_vm_mgr.h b/include/linux/gunyah_vm_mgr.h
index e3a6666d7529..0fa3cf6bcaca 100644
--- a/include/linux/gunyah_vm_mgr.h
+++ b/include/linux/gunyah_vm_mgr.h
@@ -98,4 +98,29 @@ struct gh_vm_resource_ticket {
int gh_vm_add_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket);
void gh_vm_remove_resource_ticket(struct gh_vm *ghvm, struct gh_vm_resource_ticket *ticket);
+/*
+ * gh_vm_io_handler contains the info about an io device and its associated
+ * addr and the ops associated with the io device.
+ */
+struct gh_vm_io_handler {
+ struct rb_node node;
+ u64 addr;
+
+ bool datamatch;
+ u8 len;
+ u64 data;
+ struct gh_vm_io_handler_ops *ops;
+};
+
+/*
+ * gh_vm_io_handler_ops contains function pointers associated with an io device.
+ */
+struct gh_vm_io_handler_ops {
+ int (*read)(struct gh_vm_io_handler *io_dev, u64 addr, u32 len, u64 data);
+ int (*write)(struct gh_vm_io_handler *io_dev, u64 addr, u32 len, u64 data);
+};
+
+int gh_vm_add_io_handler(struct gh_vm *ghvm, struct gh_vm_io_handler *io_dev);
+void gh_vm_remove_io_handler(struct gh_vm *ghvm, struct gh_vm_io_handler *io_dev);
+
#endif
--
2.40.0
Gunyah allows host virtual machines to schedule guest virtual machines
and handle their MMIO accesses. vCPUs are presented to the host as a
Gunyah resource and represented to userspace as a Gunyah VM function.
Creating the vcpu VM function will create a file descriptor that:
- accepts the GH_VCPU_RUN ioctl, which schedules the guest vCPU until
the next interrupt occurs on the host or until the guest vCPU can no
longer be run.
- can be mmap'd to share a gh_vcpu_run structure, through which
userspace can look up the reason why GH_VCPU_RUN returned and provide
return values for MMIO accesses.
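
As an illustration only, the MMIO half of that contract can be sketched in plain C without the ioctl itself. `struct demo_vcpu_run` mirrors the MMIO fields of the gh_vcpu_run layout added in this patch, while `struct demo_device` and `demo_handle_exit()` are hypothetical names for a one-register device model. A real VMM would obtain the structure by mmap()ing the vCPU fd (size from GH_VCPU_MMAP_SIZE) and call the handler each time GH_VCPU_RUN returns 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of enum gh_vcpu_exit. */
enum demo_vcpu_exit { DEMO_EXIT_UNKNOWN, DEMO_EXIT_MMIO, DEMO_EXIT_STATUS };

/* Hypothetical mirror of the MMIO part of struct gh_vcpu_run. */
struct demo_vcpu_run {
	uint8_t immediate_exit;
	uint8_t padding[7];
	uint32_t exit_reason;
	struct {
		uint64_t phys_addr;
		uint8_t data[8];
		uint32_t len;
		uint8_t is_write;
	} mmio;
};

/* Hypothetical device model: a single register at guest address 'base'. */
struct demo_device {
	uint64_t base;
	uint64_t reg;
};

/* Returns 1 if the vCPU can be run again, 0 if the VMM should stop. */
static int demo_handle_exit(struct demo_vcpu_run *run, struct demo_device *dev)
{
	if (run->exit_reason != DEMO_EXIT_MMIO)
		return 0;
	if (run->mmio.phys_addr != dev->base ||
	    run->mmio.len > sizeof(run->mmio.data))
		return 0;

	if (run->mmio.is_write) {
		/* Guest wrote: only the first 'len' bytes of data are valid. */
		dev->reg = 0;
		memcpy(&dev->reg, run->mmio.data, run->mmio.len);
	} else {
		/* Guest read: fill data; it is consumed on the next run. */
		memset(run->mmio.data, 0, sizeof(run->mmio.data));
		memcpy(run->mmio.data, &dev->reg, run->mmio.len);
	}
	return 1;
}
```

For a read, the value placed in `mmio.data` is what the driver hands back to the guest on the next GH_VCPU_RUN call.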
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
Documentation/virt/gunyah/vm-manager.rst | 46 ++-
arch/arm64/gunyah/gunyah_hypercall.c | 28 ++
drivers/virt/gunyah/Kconfig | 11 +
drivers/virt/gunyah/Makefile | 2 +
drivers/virt/gunyah/gunyah_vcpu.c | 468 +++++++++++++++++++++++
drivers/virt/gunyah/vm_mgr.c | 4 +
drivers/virt/gunyah/vm_mgr.h | 1 +
include/linux/gunyah.h | 24 ++
include/uapi/linux/gunyah.h | 128 +++++++
9 files changed, 710 insertions(+), 2 deletions(-)
create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
index 3b51bab9d793..6789d13fed14 100644
--- a/Documentation/virt/gunyah/vm-manager.rst
+++ b/Documentation/virt/gunyah/vm-manager.rst
@@ -5,8 +5,7 @@ Virtual Machine Manager
=======================
The Gunyah Virtual Machine Manager is a Linux driver to support launching
-virtual machines using Gunyah. It presently supports launching non-proxy
-scheduled Linux-like virtual machines.
+virtual machines using Gunyah.
Except for some basic information about the location of initial binaries,
most of the configuration about a Gunyah virtual machine is described in the
@@ -98,3 +97,46 @@ GH_VM_START
~~~~~~~~~~~
This ioctl starts the VM.
+
+GH_VM_ADD_FUNCTION
+~~~~~~~~~~~~~~~~~~
+
+This ioctl registers a Gunyah VM function with the VM manager. The VM function
+is described with a &struct gh_fn_desc.type and some arguments for that type.
+Typically, the function is added before the VM starts, but the function doesn't
+"operate" until the VM starts with `GH_VM_START`_. For example, vCPU ioctls will
+all return an error until the VM starts because the vCPUs don't exist until the
+VM is started. This allows the VMM to set up all the kernel functions needed for
+the VM *before* the VM starts.
+
+.. kernel-doc:: include/uapi/linux/gunyah.h
+ :identifiers: gh_fn_desc gh_fn_type
+
+The argument types are documented below:
+
+.. kernel-doc:: include/uapi/linux/gunyah.h
+ :identifiers: gh_fn_vcpu_arg
+
+Gunyah VCPU API Descriptions
+----------------------------
+
+A vCPU file descriptor is created after calling `GH_VM_ADD_FUNCTION` with the type `GH_FN_VCPU`.
+
+GH_VCPU_RUN
+~~~~~~~~~~~
+
+This ioctl is used to run a guest virtual cpu. While there are no
+explicit parameters, there is an implicit parameter block that can be
+obtained by mmap()ing the vcpu fd at offset 0, with the size given by
+`GH_VCPU_MMAP_SIZE`_. The parameter block is formatted as a 'struct
+gh_vcpu_run' (see below).
+
+GH_VCPU_MMAP_SIZE
+~~~~~~~~~~~~~~~~~
+
+The `GH_VCPU_RUN`_ ioctl communicates with userspace via a shared
+memory region. This ioctl returns the size of that region. See the
+`GH_VCPU_RUN`_ documentation for details.
+
+.. kernel-doc:: include/uapi/linux/gunyah.h
+ :identifiers: gh_vcpu_exit gh_vcpu_run gh_vm_status gh_vm_exit_info
diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
index 2b2a63e9b9e5..5f33f53e05a9 100644
--- a/arch/arm64/gunyah/gunyah_hypercall.c
+++ b/arch/arm64/gunyah/gunyah_hypercall.c
@@ -35,6 +35,7 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
#define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
#define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
#define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
+#define GH_HYPERCALL_VCPU_RUN GH_HYPERCALL(0x8065)
/**
* gh_hypercall_hyp_identify() - Returns build information and feature flags
@@ -83,5 +84,32 @@ enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t
}
EXPORT_SYMBOL_GPL(gh_hypercall_msgq_recv);
+enum gh_error gh_hypercall_vcpu_run(u64 capid, u64 *resume_data,
+ struct gh_hypercall_vcpu_run_resp *resp)
+{
+ struct arm_smccc_1_2_regs args = {
+ .a0 = GH_HYPERCALL_VCPU_RUN,
+ .a1 = capid,
+ .a2 = resume_data[0],
+ .a3 = resume_data[1],
+ .a4 = resume_data[2],
+ /* C language says this will be implicitly zero. Gunyah requires 0, so be explicit */
+ .a5 = 0,
+ };
+ struct arm_smccc_1_2_regs res;
+
+ arm_smccc_1_2_hvc(&args, &res);
+
+ if (res.a0 == GH_ERROR_OK) {
+ resp->sized_state = res.a1;
+ resp->state_data[0] = res.a2;
+ resp->state_data[1] = res.a3;
+ resp->state_data[2] = res.a4;
+ }
+
+ return res.a0;
+}
+EXPORT_SYMBOL_GPL(gh_hypercall_vcpu_run);
+
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index 0421b751aad4..0a58395f7d2c 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -28,3 +28,14 @@ config GUNYAH_QCOM_PLATFORM
extra platform-specific support.
Say Y/M here to use Gunyah on Qualcomm platforms.
+
+config GUNYAH_VCPU
+ tristate "Runnable Gunyah vCPUs"
+ depends on GUNYAH
+ help
+ Enable kernel support for host-scheduled vCPUs running under Gunyah.
+ When selecting this option, userspace virtual machine managers (VMM)
+ can schedule the guest VM's vCPUs instead of using Gunyah's scheduler.
+ VMMs can also handle stage 2 faults of the vCPUs.
+
+ Say Y/M here if unsure and you want to support Gunyah VMMs.
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index 2aa9ff038ed0..cc16b6c19db9 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -5,3 +5,5 @@ obj-$(CONFIG_GUNYAH_QCOM_PLATFORM) += gunyah_qcom.o
gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
obj-$(CONFIG_GUNYAH) += gunyah.o
+
+obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
diff --git a/drivers/virt/gunyah/gunyah_vcpu.c b/drivers/virt/gunyah/gunyah_vcpu.c
new file mode 100644
index 000000000000..4f0bbd58a205
--- /dev/null
+++ b/drivers/virt/gunyah/gunyah_vcpu.c
@@ -0,0 +1,468 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/gunyah.h>
+#include <linux/gunyah_vm_mgr.h>
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/wait.h>
+
+#include "vm_mgr.h"
+
+#include <uapi/linux/gunyah.h>
+
+#define MAX_VCPU_NAME 20 /* gh-vcpu:u32_max+NUL */
+
+struct gh_vcpu {
+ struct gh_vm_function_instance *f;
+ struct gh_resource *rsc;
+ struct mutex run_lock;
+ /* Track why vcpu_run left last time around. */
+ enum {
+ GH_VCPU_UNKNOWN = 0,
+ GH_VCPU_READY,
+ GH_VCPU_MMIO_READ,
+ GH_VCPU_SYSTEM_DOWN,
+ } state;
+ u8 mmio_read_len;
+ struct gh_vcpu_run *vcpu_run;
+ struct completion ready;
+ struct gh_vm *ghvm;
+
+ struct notifier_block nb;
+ struct gh_vm_resource_ticket ticket;
+ struct kref kref;
+};
+
+static void vcpu_release(struct kref *kref)
+{
+ struct gh_vcpu *vcpu = container_of(kref, struct gh_vcpu, kref);
+
+ free_page((unsigned long)vcpu->vcpu_run);
+ kfree(vcpu);
+}
+
+/*
+ * When hypervisor allows us to schedule vCPU again, it gives us an interrupt
+ */
+static irqreturn_t gh_vcpu_irq_handler(int irq, void *data)
+{
+ struct gh_vcpu *vcpu = data;
+
+ complete(&vcpu->ready);
+ return IRQ_HANDLED;
+}
+
+static bool gh_handle_mmio(struct gh_vcpu *vcpu,
+ struct gh_hypercall_vcpu_run_resp *vcpu_run_resp)
+{
+ int ret = 0;
+ u64 addr = vcpu_run_resp->state_data[0],
+ len = vcpu_run_resp->state_data[1],
+ data = vcpu_run_resp->state_data[2];
+
+ if (WARN_ON(len > sizeof(u64)))
+ len = sizeof(u64);
+
+ if (vcpu_run_resp->state == GH_VCPU_ADDRSPACE_VMMIO_READ) {
+ vcpu->vcpu_run->mmio.is_write = 0;
+ /* Record that we need to give vCPU user's supplied value next gh_vcpu_run() */
+ vcpu->state = GH_VCPU_MMIO_READ;
+ vcpu->mmio_read_len = len;
+ } else { /* GH_VCPU_ADDRSPACE_VMMIO_WRITE */
+ /* Try internal handlers first */
+ ret = gh_vm_mmio_write(vcpu->f->ghvm, addr, len, data);
+ if (!ret)
+ return true;
+
+ /* Give userspace the info */
+ vcpu->vcpu_run->mmio.is_write = 1;
+ memcpy(vcpu->vcpu_run->mmio.data, &data, len);
+ }
+
+ vcpu->vcpu_run->mmio.phys_addr = addr;
+ vcpu->vcpu_run->mmio.len = len;
+ vcpu->vcpu_run->exit_reason = GH_VCPU_EXIT_MMIO;
+
+ return false;
+}
+
+static int gh_vcpu_rm_notification(struct notifier_block *nb, unsigned long action, void *data)
+{
+ struct gh_vcpu *vcpu = container_of(nb, struct gh_vcpu, nb);
+ struct gh_rm_vm_exited_payload *exit_payload = data;
+
+ if (action == GH_RM_NOTIFICATION_VM_EXITED &&
+ le16_to_cpu(exit_payload->vmid) == vcpu->ghvm->vmid)
+ complete(&vcpu->ready);
+
+ return NOTIFY_OK;
+}
+
+static inline enum gh_vm_status remap_vm_status(enum gh_rm_vm_status rm_status)
+{
+ switch (rm_status) {
+ case GH_RM_VM_STATUS_INIT_FAILED:
+ return GH_VM_STATUS_LOAD_FAILED;
+ case GH_RM_VM_STATUS_EXITED:
+ return GH_VM_STATUS_EXITED;
+ default:
+ return GH_VM_STATUS_CRASHED;
+ }
+}
+
+/**
+ * gh_vcpu_check_system() - Check whether VM as a whole is running
+ * @vcpu: Pointer to gh_vcpu
+ *
+ * Returns true if the VM is alive.
+ * Returns false if the VM is not alive (can only be that the VM is shutting down).
+ */
+static bool gh_vcpu_check_system(struct gh_vcpu *vcpu)
+ __must_hold(&vcpu->run_lock)
+{
+ bool ret = true;
+
+ down_read(&vcpu->ghvm->status_lock);
+ if (likely(vcpu->ghvm->vm_status == GH_RM_VM_STATUS_RUNNING))
+ goto out;
+
+ vcpu->vcpu_run->status.status = remap_vm_status(vcpu->ghvm->vm_status);
+ vcpu->vcpu_run->status.exit_info = vcpu->ghvm->exit_info;
+ vcpu->vcpu_run->exit_reason = GH_VCPU_EXIT_STATUS;
+ vcpu->state = GH_VCPU_SYSTEM_DOWN;
+ ret = false;
+out:
+ up_read(&vcpu->ghvm->status_lock);
+ return ret;
+}
+
+/**
+ * gh_vcpu_run() - Request Gunyah to begin scheduling this vCPU.
+ * @vcpu: The client descriptor that was obtained via gh_vcpu_alloc()
+ */
+static int gh_vcpu_run(struct gh_vcpu *vcpu)
+{
+ struct gh_hypercall_vcpu_run_resp vcpu_run_resp;
+ u64 state_data[3] = { 0 };
+ enum gh_error gh_error;
+ int ret = 0;
+
+ if (!vcpu->f)
+ return -ENODEV;
+
+ if (mutex_lock_interruptible(&vcpu->run_lock))
+ return -ERESTARTSYS;
+
+ if (!vcpu->rsc) {
+ ret = -ENODEV;
+ goto out;
+ }
+
+ switch (vcpu->state) {
+ case GH_VCPU_UNKNOWN:
+ if (vcpu->ghvm->vm_status != GH_RM_VM_STATUS_RUNNING) {
+ /* Check if VM is up. If VM is starting, will block until VM is fully up
+ * since that thread does down_write.
+ */
+ if (!gh_vcpu_check_system(vcpu))
+ goto out;
+ }
+ vcpu->state = GH_VCPU_READY;
+ break;
+ case GH_VCPU_MMIO_READ:
+ if (unlikely(vcpu->mmio_read_len > sizeof(state_data[0])))
+ vcpu->mmio_read_len = sizeof(state_data[0]);
+ memcpy(&state_data[0], vcpu->vcpu_run->mmio.data, vcpu->mmio_read_len);
+ vcpu->state = GH_VCPU_READY;
+ break;
+ case GH_VCPU_SYSTEM_DOWN:
+ goto out;
+ default:
+ break;
+ }
+
+ while (!ret && !signal_pending(current)) {
+ if (vcpu->vcpu_run->immediate_exit) {
+ ret = -EINTR;
+ goto out;
+ }
+
+ gh_error = gh_hypercall_vcpu_run(vcpu->rsc->capid, state_data, &vcpu_run_resp);
+ if (gh_error == GH_ERROR_OK) {
+ switch (vcpu_run_resp.state) {
+ case GH_VCPU_STATE_READY:
+ if (need_resched())
+ schedule();
+ break;
+ case GH_VCPU_STATE_POWERED_OFF:
+ /* vcpu might be off because the VM is shut down.
+ * If so, it won't ever run again: exit back to user
+ */
+ if (!gh_vcpu_check_system(vcpu))
+ goto out;
+ /* Otherwise, another vcpu will turn it on (e.g. by PSCI)
+ * and hyp sends an interrupt to wake Linux up.
+ */
+ fallthrough;
+ case GH_VCPU_STATE_EXPECTS_WAKEUP:
+ ret = wait_for_completion_interruptible(&vcpu->ready);
+ /* reinitialize completion before next hypercall. If we reinitialize
+ * after the hypercall, interrupt may have already come before
+ * re-initializing the completion and then end up waiting for
+ * event that already happened.
+ */
+ reinit_completion(&vcpu->ready);
+ /* Check system status again. Completion might've
+ * come from gh_vcpu_rm_notification
+ */
+ if (!ret && !gh_vcpu_check_system(vcpu))
+ goto out;
+ break;
+ case GH_VCPU_STATE_BLOCKED:
+ schedule();
+ break;
+ case GH_VCPU_ADDRSPACE_VMMIO_READ:
+ case GH_VCPU_ADDRSPACE_VMMIO_WRITE:
+ if (!gh_handle_mmio(vcpu, &vcpu_run_resp))
+ goto out;
+ break;
+ default:
+ pr_warn_ratelimited("Unknown vCPU state: %llx\n",
+ vcpu_run_resp.sized_state);
+ schedule();
+ break;
+ }
+ } else if (gh_error == GH_ERROR_RETRY) {
+ schedule();
+ } else {
+ ret = gh_error_remap(gh_error);
+ }
+ }
+
+out:
+ mutex_unlock(&vcpu->run_lock);
+
+ if (signal_pending(current))
+ return -ERESTARTSYS;
+
+ return ret;
+}
+
+static long gh_vcpu_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ struct gh_vcpu *vcpu = filp->private_data;
+ long ret = -EINVAL;
+
+ switch (cmd) {
+ case GH_VCPU_RUN:
+ ret = gh_vcpu_run(vcpu);
+ break;
+ case GH_VCPU_MMAP_SIZE:
+ ret = PAGE_SIZE;
+ break;
+ default:
+ break;
+ }
+ return ret;
+}
+
+static int gh_vcpu_release(struct inode *inode, struct file *filp)
+{
+ struct gh_vcpu *vcpu = filp->private_data;
+
+ gh_vm_put(vcpu->ghvm);
+ kref_put(&vcpu->kref, vcpu_release);
+ return 0;
+}
+
+static vm_fault_t gh_vcpu_fault(struct vm_fault *vmf)
+{
+ struct gh_vcpu *vcpu = vmf->vma->vm_file->private_data;
+ struct page *page;
+
+ if (vmf->pgoff)
+ return VM_FAULT_SIGBUS;
+ page = virt_to_page(vcpu->vcpu_run);
+ get_page(page);
+ vmf->page = page;
+ return 0;
+}
+
+static const struct vm_operations_struct gh_vcpu_ops = {
+ .fault = gh_vcpu_fault,
+};
+
+static int gh_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ vma->vm_ops = &gh_vcpu_ops;
+ return 0;
+}
+
+static const struct file_operations gh_vcpu_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = gh_vcpu_ioctl,
+ .release = gh_vcpu_release,
+ .llseek = noop_llseek,
+ .mmap = gh_vcpu_mmap,
+};
+
+static bool gh_vcpu_populate(struct gh_vm_resource_ticket *ticket, struct gh_resource *ghrsc)
+{
+ struct gh_vcpu *vcpu = container_of(ticket, struct gh_vcpu, ticket);
+ int ret;
+
+ mutex_lock(&vcpu->run_lock);
+ if (vcpu->rsc) {
+ pr_warn("vcpu%d already got a Gunyah resource. Check if multiple resources with same label were configured.\n",
+ vcpu->ticket.label);
+ ret = -EEXIST;
+ goto out;
+ }
+
+ vcpu->rsc = ghrsc;
+ init_completion(&vcpu->ready);
+
+ ret = request_irq(vcpu->rsc->irq, gh_vcpu_irq_handler, IRQF_TRIGGER_RISING, "gh_vcpu",
+ vcpu);
+ if (ret)
+ pr_warn("Failed to request vcpu irq %d: %d\n", vcpu->rsc->irq, ret);
+
+out:
+ mutex_unlock(&vcpu->run_lock);
+ return !ret;
+}
+
+static void gh_vcpu_unpopulate(struct gh_vm_resource_ticket *ticket,
+ struct gh_resource *ghrsc)
+{
+ struct gh_vcpu *vcpu = container_of(ticket, struct gh_vcpu, ticket);
+
+ vcpu->vcpu_run->immediate_exit = true;
+ complete_all(&vcpu->ready);
+ mutex_lock(&vcpu->run_lock);
+ free_irq(vcpu->rsc->irq, vcpu);
+ vcpu->rsc = NULL;
+ mutex_unlock(&vcpu->run_lock);
+}
+
+static long gh_vcpu_bind(struct gh_vm_function_instance *f)
+{
+ struct gh_fn_vcpu_arg *arg = f->argp;
+ struct gh_vcpu *vcpu;
+ char name[MAX_VCPU_NAME];
+ struct file *file;
+ struct page *page;
+ int fd;
+ long r;
+
+ if (f->arg_size != sizeof(*arg))
+ return -EINVAL;
+
+ vcpu = kzalloc(sizeof(*vcpu), GFP_KERNEL);
+ if (!vcpu)
+ return -ENOMEM;
+
+ vcpu->f = f;
+ f->data = vcpu;
+ mutex_init(&vcpu->run_lock);
+ kref_init(&vcpu->kref);
+
+ page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+ if (!page) {
+ r = -ENOMEM;
+ goto err_destroy_vcpu;
+ }
+ vcpu->vcpu_run = page_address(page);
+
+ vcpu->ticket.resource_type = GH_RESOURCE_TYPE_VCPU;
+ vcpu->ticket.label = arg->id;
+ vcpu->ticket.owner = THIS_MODULE;
+ vcpu->ticket.populate = gh_vcpu_populate;
+ vcpu->ticket.unpopulate = gh_vcpu_unpopulate;
+
+ r = gh_vm_add_resource_ticket(f->ghvm, &vcpu->ticket);
+ if (r)
+ goto err_destroy_page;
+
+ if (!gh_vm_get(f->ghvm)) {
+ r = -ENODEV;
+ goto err_remove_resource_ticket;
+ }
+ vcpu->ghvm = f->ghvm;
+
+ vcpu->nb.notifier_call = gh_vcpu_rm_notification;
+ /* Ensure we run after the vm_mgr handles the notification and does
+ * any necessary state changes. We wake up to check the new state.
+ */
+ vcpu->nb.priority = -1;
+ r = gh_rm_notifier_register(f->rm, &vcpu->nb);
+ if (r)
+ goto err_put_gh_vm;
+
+ kref_get(&vcpu->kref);
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ r = fd;
+ goto err_notifier;
+ }
+
+ snprintf(name, sizeof(name), "gh-vcpu:%u", vcpu->ticket.label);
+ file = anon_inode_getfile(name, &gh_vcpu_fops, vcpu, O_RDWR);
+ if (IS_ERR(file)) {
+ r = PTR_ERR(file);
+ goto err_put_fd;
+ }
+
+ fd_install(fd, file);
+
+ return fd;
+err_put_fd:
+ put_unused_fd(fd);
+err_notifier:
+ gh_rm_notifier_unregister(f->rm, &vcpu->nb);
+err_put_gh_vm:
+ gh_vm_put(vcpu->ghvm);
+err_remove_resource_ticket:
+ gh_vm_remove_resource_ticket(f->ghvm, &vcpu->ticket);
+err_destroy_page:
+ free_page((unsigned long)vcpu->vcpu_run);
+err_destroy_vcpu:
+ kfree(vcpu);
+ return r;
+}
+
+static void gh_vcpu_unbind(struct gh_vm_function_instance *f)
+{
+ struct gh_vcpu *vcpu = f->data;
+
+ gh_rm_notifier_unregister(f->rm, &vcpu->nb);
+ gh_vm_remove_resource_ticket(vcpu->f->ghvm, &vcpu->ticket);
+ vcpu->f = NULL;
+
+ kref_put(&vcpu->kref, vcpu_release);
+}
+
+static bool gh_vcpu_compare(const struct gh_vm_function_instance *f,
+ const void *arg, size_t size)
+{
+ const struct gh_fn_vcpu_arg *instance = f->argp,
+ *other = arg;
+
+ if (sizeof(*other) != size)
+ return false;
+
+ return instance->id == other->id;
+}
+
+DECLARE_GH_VM_FUNCTION_INIT(vcpu, GH_FN_VCPU, 1, gh_vcpu_bind, gh_vcpu_unbind, gh_vcpu_compare);
+MODULE_DESCRIPTION("Gunyah vCPU Function");
+MODULE_LICENSE("GPL");
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index 81b42ab675f6..e7844204c151 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -408,6 +408,10 @@ static int gh_vm_rm_notification_exited(struct gh_vm *ghvm, void *data)
down_write(&ghvm->status_lock);
ghvm->vm_status = GH_RM_VM_STATUS_EXITED;
+ ghvm->exit_info.type = le16_to_cpu(payload->exit_type);
+ ghvm->exit_info.reason_size = le32_to_cpu(payload->exit_reason_size);
+ memcpy(&ghvm->exit_info.reason, payload->exit_reason,
+ min(GH_VM_MAX_EXIT_REASON_SIZE, ghvm->exit_info.reason_size));
up_write(&ghvm->status_lock);
wake_up(&ghvm->vm_status_wait);
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index 3fc0f91dfd1a..fa2a61e10b57 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -45,6 +45,7 @@ struct gh_vm {
enum gh_rm_vm_status vm_status;
wait_queue_head_t vm_status_wait;
struct rw_semaphore status_lock;
+ struct gh_vm_exit_info exit_info;
struct work_struct free_work;
struct kref kref;
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index 4b398b59c2c5..cd5704a82c6a 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -177,4 +177,28 @@ enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_
enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
bool *ready);
+struct gh_hypercall_vcpu_run_resp {
+ union {
+ enum {
+ /* VCPU is ready to run */
+ GH_VCPU_STATE_READY = 0,
+ /* VCPU is sleeping until an interrupt arrives */
+ GH_VCPU_STATE_EXPECTS_WAKEUP = 1,
+ /* VCPU is powered off */
+ GH_VCPU_STATE_POWERED_OFF = 2,
+ /* VCPU is blocked in EL2 for unspecified reason */
+ GH_VCPU_STATE_BLOCKED = 3,
+ /* VCPU has returned for MMIO READ */
+ GH_VCPU_ADDRSPACE_VMMIO_READ = 4,
+ /* VCPU has returned for MMIO WRITE */
+ GH_VCPU_ADDRSPACE_VMMIO_WRITE = 5,
+ } state;
+ u64 sized_state;
+ };
+ u64 state_data[3];
+};
+
+enum gh_error gh_hypercall_vcpu_run(u64 capid, u64 *resume_data,
+ struct gh_hypercall_vcpu_run_resp *resp);
+
#endif
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index bb07118a351f..434ffa8ffc78 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -72,8 +72,33 @@ struct gh_vm_dtb_config {
#define GH_VM_START _IO(GH_IOCTL_TYPE, 0x3)
+/**
+ * enum gh_fn_type - Valid types of Gunyah VM functions
+ * @GH_FN_VCPU: create a vCPU instance to control a vCPU
+ * &struct gh_fn_desc.arg is a pointer to &struct gh_fn_vcpu_arg
+ * Return: file descriptor to manipulate the vcpu.
+ */
+enum gh_fn_type {
+ GH_FN_VCPU = 1,
+};
+
#define GH_FN_MAX_ARG_SIZE 256
+/**
+ * struct gh_fn_vcpu_arg - Arguments to create a vCPU.
+ * @id: vcpu id
+ *
+ * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_VCPU.
+ *
+ * The vcpu type will register with the VM Manager to expect to control
+ * vCPU number `id`. It returns a file descriptor allowing interaction with
+ * the vCPU. See the Gunyah vCPU API description sections for interacting with
+ * the Gunyah vCPU file descriptors.
+ */
+struct gh_fn_vcpu_arg {
+ __u32 id;
+};
+
/**
* struct gh_fn_desc - Arguments to create a VM function
* @type: Type of the function. See &enum gh_fn_type.
@@ -90,4 +115,107 @@ struct gh_fn_desc {
#define GH_VM_ADD_FUNCTION _IOW(GH_IOCTL_TYPE, 0x4, struct gh_fn_desc)
#define GH_VM_REMOVE_FUNCTION _IOW(GH_IOCTL_TYPE, 0x7, struct gh_fn_desc)
+/*
+ * ioctls for vCPU fds
+ */
+
+/**
+ * enum gh_vm_status - Stores status reason why VM is not runnable (exited).
+ * @GH_VM_STATUS_LOAD_FAILED: VM didn't start because it couldn't be loaded.
+ * @GH_VM_STATUS_EXITED: VM requested shutdown/reboot.
+ * Use &struct gh_vm_exit_info.reason for further details.
+ * @GH_VM_STATUS_CRASHED: VM state is unknown and has crashed.
+ */
+enum gh_vm_status {
+ GH_VM_STATUS_LOAD_FAILED = 1,
+ GH_VM_STATUS_EXITED = 2,
+ GH_VM_STATUS_CRASHED = 3,
+};
+
+/*
+ * Gunyah presently sends max 4 bytes of exit_reason.
+ * If that changes, this macro can be safely increased without breaking
+ * userspace so long as struct gh_vcpu_run < PAGE_SIZE.
+ */
+#define GH_VM_MAX_EXIT_REASON_SIZE 8u
+
+/**
+ * struct gh_vm_exit_info - Reason for VM exit as reported by Gunyah
+ * See Gunyah documentation for values.
+ * @type: Describes how VM exited
+ * @padding: padding bytes
+ * @reason_size: Number of bytes valid for `reason`
+ * @reason: See Gunyah documentation for interpretation. Note: these values are
+ * not interpreted by Linux and need to be converted from little-endian
+ * as applicable.
+ */
+struct gh_vm_exit_info {
+ __u16 type;
+ __u16 padding;
+ __u32 reason_size;
+ __u8 reason[GH_VM_MAX_EXIT_REASON_SIZE];
+};
+
+/**
+ * enum gh_vcpu_exit - Stores reason why &GH_VCPU_RUN ioctl recently exited with status 0
+ * @GH_VCPU_EXIT_UNKNOWN: Not used, status != 0
+ * @GH_VCPU_EXIT_MMIO: vCPU performed a read or write that could not be handled
+ * by hypervisor or Linux. Use &struct gh_vcpu_run.mmio for
+ * details of the read/write.
+ * @GH_VCPU_EXIT_STATUS: vCPU not able to run because the VM has exited.
+ * Use &struct gh_vcpu_run.status for why VM has exited.
+ */
+enum gh_vcpu_exit {
+ GH_VCPU_EXIT_UNKNOWN,
+ GH_VCPU_EXIT_MMIO,
+ GH_VCPU_EXIT_STATUS,
+};
+
+/**
+ * struct gh_vcpu_run - Application code obtains a pointer to the gh_vcpu_run
+ * structure by mmap()ing a vcpu fd.
+ * @immediate_exit: polled when scheduling the vcpu. If set, immediately returns -EINTR.
+ * @padding: padding bytes
+ * @exit_reason: Set when GH_VCPU_RUN returns successfully and gives reason why
+ * GH_VCPU_RUN has stopped running the vCPU. See &enum gh_vcpu_exit.
+ * @mmio: Used when exit_reason == GH_VCPU_EXIT_MMIO
+ * The guest has faulted on a memory-mapped I/O instruction that
+ * couldn't be satisfied by Gunyah.
+ * @mmio.phys_addr: Address guest tried to access
+ * @mmio.data: the value that was written if `is_write == 1`. Filled by
+ * user for reads (`is_write == 0`).
+ * @mmio.len: Length of write. Only the first `len` bytes of `data`
+ * are considered by Gunyah.
+ * @mmio.is_write: 1 if VM tried to perform a write, 0 for a read
+ * @status: Used when exit_reason == GH_VCPU_EXIT_STATUS.
+ * The guest VM is no longer runnable. This struct informs why.
+ * @status.status: See &enum gh_vm_status for possible values
+ * @status.exit_info: Used when status == GH_VM_STATUS_EXITED
+ */
+struct gh_vcpu_run {
+ /* in */
+ __u8 immediate_exit;
+ __u8 padding[7];
+
+ /* out */
+ __u32 exit_reason;
+
+ union {
+ struct {
+ __u64 phys_addr;
+ __u8 data[8];
+ __u32 len;
+ __u8 is_write;
+ } mmio;
+
+ struct {
+ enum gh_vm_status status;
+ struct gh_vm_exit_info exit_info;
+ } status;
+ };
+};
+
+#define GH_VCPU_RUN _IO(GH_IOCTL_TYPE, 0x5)
+#define GH_VCPU_MMAP_SIZE _IO(GH_IOCTL_TYPE, 0x6)
+
#endif
--
2.40.0
When booting a Gunyah virtual machine, the host VM may gain capabilities
to interact with resources for the guest virtual machine. Examples of
such resources are vCPUs or message queues. To use those resources, we
need to translate the RM response into a gh_resource structure, which
is useful to Linux drivers. Presently, Linux drivers need only to know
the type of resource, the capability ID, and an interrupt.
On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt
ID number and is always an SPI.
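
A minimal userspace sketch of the resulting translation follows, assuming the usual three-cell GIC specifier (interrupt class, SPI index, trigger type). `struct demo_fwspec` and `demo_fill_spi_fwspec()` are hypothetical stand-ins for `struct irq_fwspec` and the arch helper in this patch, and the two constants are local copies of the kernel values.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_GIC_SPI			0	/* local copy of GIC_SPI */
#define DEMO_IRQ_TYPE_EDGE_RISING	1	/* local copy of IRQ_TYPE_EDGE_RISING */

/* Hypothetical stand-in for the fields of struct irq_fwspec used here. */
struct demo_fwspec {
	int param_count;
	uint32_t param[3];
};

/*
 * Translate a GIC interrupt ID into a 3-cell SPI specifier.
 * SPIs occupy interrupt IDs 32..1019, and the specifier counts from 0.
 */
static int demo_fill_spi_fwspec(uint32_t intid, struct demo_fwspec *fwspec)
{
	if (intid < 32 || intid > 1019)
		return -1;

	fwspec->param_count = 3;
	fwspec->param[0] = DEMO_GIC_SPI;
	fwspec->param[1] = intid - 32;
	fwspec->param[2] = DEMO_IRQ_TYPE_EDGE_RISING;
	return 0;
}
```

Anything outside the SPI range is rejected rather than translated, which matches the defensive check in the arch helper.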
Signed-off-by: Elliot Berman <[email protected]>
---
arch/arm64/include/asm/gunyah.h | 24 +++++
drivers/virt/gunyah/rsc_mgr.c | 162 +++++++++++++++++++++++++++++++-
include/linux/gunyah.h | 3 +
include/linux/gunyah_rsc_mgr.h | 3 +
4 files changed, 191 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/include/asm/gunyah.h
diff --git a/arch/arm64/include/asm/gunyah.h b/arch/arm64/include/asm/gunyah.h
new file mode 100644
index 000000000000..c83d983b0f4e
--- /dev/null
+++ b/arch/arm64/include/asm/gunyah.h
@@ -0,0 +1,24 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+#ifndef _ASM_GUNYAH_H
+#define _ASM_GUNYAH_H
+
+#include <linux/irq.h>
+#include <dt-bindings/interrupt-controller/arm-gic.h>
+
+static inline int arch_gh_fill_irq_fwspec_params(u32 virq, struct irq_fwspec *fwspec)
+{
+ /* Assume that Gunyah gave us an SPI; defensively check it */
+ if (WARN_ON(virq < 32 || virq > 1019))
+ return -EINVAL;
+
+ fwspec->param_count = 3;
+ fwspec->param[0] = GIC_SPI;
+ fwspec->param[1] = virq - 32;
+ fwspec->param[2] = IRQ_TYPE_EDGE_RISING;
+ return 0;
+}
+
+#endif
diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
index 4f6f96bdcf3d..43ea010ea47a 100644
--- a/drivers/virt/gunyah/rsc_mgr.c
+++ b/drivers/virt/gunyah/rsc_mgr.c
@@ -17,6 +17,8 @@
#include <linux/platform_device.h>
#include <linux/miscdevice.h>
+#include <asm/gunyah.h>
+
#include "rsc_mgr.h"
#include "vm_mgr.h"
@@ -133,6 +135,7 @@ struct gh_rm_connection {
* @send_lock: synchronization to allow only one request to be sent at a time
* @nh: notifier chain for clients interested in RM notification messages
* @miscdev: /dev/gunyah
+ * @irq_domain: Domain to translate Gunyah hwirqs to Linux irqs
*/
struct gh_rm {
struct device *dev;
@@ -151,6 +154,7 @@ struct gh_rm {
struct blocking_notifier_head nh;
struct miscdevice miscdev;
+ struct irq_domain *irq_domain;
};
/**
@@ -191,6 +195,133 @@ static inline int gh_rm_remap_error(enum gh_rm_error rm_error)
}
}
+struct gh_irq_chip_data {
+ u32 gh_virq;
+};
+
+static struct irq_chip gh_rm_irq_chip = {
+ .name = "Gunyah",
+ .irq_enable = irq_chip_enable_parent,
+ .irq_disable = irq_chip_disable_parent,
+ .irq_ack = irq_chip_ack_parent,
+ .irq_mask = irq_chip_mask_parent,
+ .irq_mask_ack = irq_chip_mask_ack_parent,
+ .irq_unmask = irq_chip_unmask_parent,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_set_affinity = irq_chip_set_affinity_parent,
+ .irq_set_type = irq_chip_set_type_parent,
+ .irq_set_wake = irq_chip_set_wake_parent,
+ .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
+ .irq_retrigger = irq_chip_retrigger_hierarchy,
+ .irq_get_irqchip_state = irq_chip_get_parent_state,
+ .irq_set_irqchip_state = irq_chip_set_parent_state,
+ .flags = IRQCHIP_SET_TYPE_MASKED |
+ IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static int gh_rm_irq_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ struct gh_irq_chip_data *chip_data, *spec = arg;
+ struct irq_fwspec parent_fwspec;
+ struct gh_rm *rm = d->host_data;
+ u32 gh_virq = spec->gh_virq;
+ int ret;
+
+ if (nr_irqs != 1)
+ return -EINVAL;
+
+ chip_data = kzalloc(sizeof(*chip_data), GFP_KERNEL);
+ if (!chip_data)
+ return -ENOMEM;
+
+ chip_data->gh_virq = gh_virq;
+
+ ret = irq_domain_set_hwirq_and_chip(d, virq, chip_data->gh_virq, &gh_rm_irq_chip,
+ chip_data);
+ if (ret)
+ goto err_free_irq_data;
+
+ parent_fwspec.fwnode = d->parent->fwnode;
+ ret = arch_gh_fill_irq_fwspec_params(chip_data->gh_virq, &parent_fwspec);
+ if (ret) {
+ dev_err(rm->dev, "virq translation failed %u: %d\n", chip_data->gh_virq, ret);
+ goto err_free_irq_data;
+ }
+
+ ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, &parent_fwspec);
+ if (ret)
+ goto err_free_irq_data;
+
+ return ret;
+err_free_irq_data:
+ kfree(chip_data);
+ return ret;
+}
+
+static void gh_rm_irq_domain_free_single(struct irq_domain *d, unsigned int virq)
+{
+ struct irq_data *irq_data;
+
+ irq_data = irq_domain_get_irq_data(d, virq);
+ if (!irq_data)
+ return;
+
+ kfree(irq_data->chip_data);
+ irq_data->chip_data = NULL;
+}
+
+static void gh_rm_irq_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
+{
+ unsigned int i;
+
+ for (i = 0; i < nr_irqs; i++)
+ gh_rm_irq_domain_free_single(d, virq + i);
+}
+
+static const struct irq_domain_ops gh_rm_irq_domain_ops = {
+ .alloc = gh_rm_irq_domain_alloc,
+ .free = gh_rm_irq_domain_free,
+};
+
+struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct gh_rm_hyp_resource *hyp_resource)
+{
+ struct gh_resource *ghrsc;
+ int ret;
+
+ ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL);
+ if (!ghrsc)
+ return NULL;
+
+ ghrsc->type = hyp_resource->type;
+ ghrsc->capid = le64_to_cpu(hyp_resource->cap_id);
+ ghrsc->irq = IRQ_NOTCONNECTED;
+ ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label);
+ if (hyp_resource->virq) {
+ struct gh_irq_chip_data irq_data = {
+ .gh_virq = le32_to_cpu(hyp_resource->virq),
+ };
+
+ ret = irq_domain_alloc_irqs(rm->irq_domain, 1, NUMA_NO_NODE, &irq_data);
+ if (ret < 0) {
+ dev_err(rm->dev,
+ "Failed to allocate interrupt for resource %d label: %d: %d\n",
+ ghrsc->type, ghrsc->rm_label, ret);
+ } else {
+ ghrsc->irq = ret;
+ }
+ }
+
+ return ghrsc;
+}
+
+void gh_rm_free_resource(struct gh_resource *ghrsc)
+{
+ irq_dispose_mapping(ghrsc->irq);
+ kfree(ghrsc);
+}
+
static int gh_rm_init_connection_payload(struct gh_rm_connection *connection, void *msg,
size_t hdr_size, size_t msg_size)
{
@@ -661,6 +792,8 @@ static int gh_identify(void)
static int gh_rm_drv_probe(struct platform_device *pdev)
{
+ struct irq_domain *parent_irq_domain;
+ struct device_node *parent_irq_node;
struct gh_msgq_tx_data *msg;
struct gh_rm *rm;
int ret;
@@ -701,15 +834,41 @@ static int gh_rm_drv_probe(struct platform_device *pdev)
if (ret)
goto err_cache;
+ parent_irq_node = of_irq_find_parent(pdev->dev.of_node);
+ if (!parent_irq_node) {
+ dev_err(&pdev->dev, "Failed to find interrupt parent of resource manager\n");
+ ret = -ENODEV;
+ goto err_msgq;
+ }
+
+ parent_irq_domain = irq_find_host(parent_irq_node);
+ if (!parent_irq_domain) {
+ dev_err(&pdev->dev, "Failed to find interrupt parent domain of resource manager\n");
+ ret = -ENODEV;
+ goto err_msgq;
+ }
+
+ rm->irq_domain = irq_domain_add_hierarchy(parent_irq_domain, 0, 0, pdev->dev.of_node,
+ &gh_rm_irq_domain_ops, NULL);
+ if (!rm->irq_domain) {
+ dev_err(&pdev->dev, "Failed to add irq domain\n");
+ ret = -ENODEV;
+ goto err_msgq;
+ }
+ rm->irq_domain->host_data = rm;
+
+ rm->miscdev.parent = &pdev->dev;
rm->miscdev.name = "gunyah";
rm->miscdev.minor = MISC_DYNAMIC_MINOR;
rm->miscdev.fops = &gh_dev_fops;
ret = misc_register(&rm->miscdev);
if (ret)
- goto err_msgq;
+ goto err_irq_domain;
return 0;
+err_irq_domain:
+ irq_domain_remove(rm->irq_domain);
err_msgq:
mbox_free_channel(gh_msgq_chan(&rm->msgq));
gh_msgq_remove(&rm->msgq);
@@ -723,6 +882,7 @@ static int gh_rm_drv_remove(struct platform_device *pdev)
struct gh_rm *rm = platform_get_drvdata(pdev);
misc_deregister(&rm->miscdev);
+ irq_domain_remove(rm->irq_domain);
mbox_free_channel(gh_msgq_chan(&rm->msgq));
gh_msgq_remove(&rm->msgq);
kmem_cache_destroy(rm->cache);
diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
index 982e27d10d57..4b398b59c2c5 100644
--- a/include/linux/gunyah.h
+++ b/include/linux/gunyah.h
@@ -27,6 +27,9 @@ struct gh_resource {
enum gh_resource_type type;
u64 capid;
unsigned int irq;
+
+ struct list_head list;
+ u32 rm_label;
};
/**
diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
index 7c599654ea30..e74e867583f5 100644
--- a/include/linux/gunyah_rsc_mgr.h
+++ b/include/linux/gunyah_rsc_mgr.h
@@ -139,6 +139,9 @@ int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
struct gh_rm_hyp_resources **resources);
int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid);
+struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct gh_rm_hyp_resource *hyp_resource);
+void gh_rm_free_resource(struct gh_resource *ghrsc);
+
struct gh_rm_platform_ops {
int (*pre_mem_share)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
int (*post_mem_reclaim)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
--
2.40.0
Add myself and Prakruthi as maintainers of Gunyah hypervisor drivers.
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
MAINTAINERS | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index c754befb94e7..323391320cf1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8970,6 +8970,19 @@ L: [email protected]
S: Maintained
F: block/partitions/efi.*
+GUNYAH HYPERVISOR DRIVER
+M: Elliot Berman <[email protected]>
+M: Prakruthi Deepak Heragu <[email protected]>
+L: [email protected]
+S: Supported
+F: Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
+F: Documentation/virt/gunyah/
+F: arch/arm64/gunyah/
+F: drivers/mailbox/gunyah-msgq.c
+F: drivers/virt/gunyah/
+F: include/linux/gunyah*.h
+F: samples/gunyah/
+
HABANALABS PCI DRIVER
M: Oded Gabbay <[email protected]>
L: [email protected]
--
2.40.0
Introduce a framework for Gunyah userspace to install VM functions. VM
functions are optional interfaces to the virtual machine. vCPUs,
ioeventfds, and irqfds are examples of such VM functions and are
implemented in subsequent patches.
A generic framework is implemented instead of individual ioctls to
create vCPUs, irqfds, etc., in order to simplify the VM manager core
implementation and allow dynamic loading of VM function modules.
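
As a rough illustration of the userspace side, the sketch below fills a
descriptor the way a VMM might before calling GH_VM_ADD_FUNCTION. The
struct gh_fn_desc layout and GH_FN_MAX_ARG_SIZE mirror the uapi header
added by this patch; example_fn_arg and example_fill_fn_desc() are
hypothetical names used only for illustration, and the argument layout
of a real function type is defined by that function's own patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Mirrors GH_FN_MAX_ARG_SIZE and struct gh_fn_desc from this patch. */
#define GH_FN_MAX_ARG_SIZE 256

struct gh_fn_desc {
	uint32_t type;
	uint32_t arg_size;
	uint64_t arg;
};

/* Hypothetical argument layout, for illustration only. */
struct example_fn_arg {
	uint32_t id;
	uint32_t flags;
};

/*
 * Fill a descriptor for GH_VM_ADD_FUNCTION. Oversized arguments are
 * rejected up front, matching the -EINVAL check that
 * gh_vm_add_function_instance() performs in the kernel.
 */
int example_fill_fn_desc(struct gh_fn_desc *desc, uint32_t type,
			 const void *arg, size_t arg_size)
{
	assert(desc != NULL);

	if (arg_size > GH_FN_MAX_ARG_SIZE)
		return -EINVAL;

	memset(desc, 0, sizeof(*desc));
	desc->type = type;
	desc->arg_size = (uint32_t)arg_size;
	/* The kernel converts this back with u64_to_user_ptr(). */
	desc->arg = (uint64_t)(uintptr_t)arg;
	return 0;
}
```

The filled descriptor would then be passed to the VM file descriptor,
along the lines of ioctl(vm_fd, GH_VM_ADD_FUNCTION, &desc), where vm_fd
is assumed to come from the VM creation ioctl on /dev/gunyah.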
Signed-off-by: Elliot Berman <[email protected]>
---
Documentation/virt/gunyah/vm-manager.rst | 18 ++
drivers/virt/gunyah/vm_mgr.c | 216 ++++++++++++++++++++++-
drivers/virt/gunyah/vm_mgr.h | 4 +
include/linux/gunyah_vm_mgr.h | 87 +++++++++
include/uapi/linux/gunyah.h | 18 ++
5 files changed, 340 insertions(+), 3 deletions(-)
create mode 100644 include/linux/gunyah_vm_mgr.h
diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
index 50d8ae7fabcd..3b51bab9d793 100644
--- a/Documentation/virt/gunyah/vm-manager.rst
+++ b/Documentation/virt/gunyah/vm-manager.rst
@@ -17,6 +17,24 @@ sharing userspace memory with a VM is done via the `GH_VM_SET_USER_MEM_REGION`_
ioctl. The VM itself is configured to use the memory region via the
devicetree.
+Gunyah Functions
+================
+
+Components of a Gunyah VM's configuration that need kernel configuration are
+called "functions" and are built on top of a framework. Functions are identified
+by a string and have some argument(s) to configure them. They are typically
+created by the `GH_VM_ADD_FUNCTION`_ ioctl.
+
+Functions typically do at least one of these operations:
+
+1. Create resource ticket(s). Resource tickets allow a function to register
+ itself as the client for a Gunyah resource (e.g. doorbell or vCPU) and
+ the function is given the pointer to the &struct gh_resource when the
+ VM is starting.
+
+2. Register IO handler(s). IO handlers allow a function to handle stage-2 faults
+ from the virtual machine.
+
Sample Userspace VMM
====================
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
index a800061f56bf..56464451b262 100644
--- a/drivers/virt/gunyah/vm_mgr.c
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -6,10 +6,13 @@
#define pr_fmt(fmt) "gh_vm_mgr: " fmt
#include <linux/anon_inodes.h>
+#include <linux/compat.h>
#include <linux/file.h>
#include <linux/gunyah_rsc_mgr.h>
+#include <linux/gunyah_vm_mgr.h>
#include <linux/miscdevice.h>
#include <linux/module.h>
+#include <linux/xarray.h>
#include <uapi/linux/gunyah.h>
@@ -17,6 +20,172 @@
static void gh_vm_free(struct work_struct *work);
+static DEFINE_XARRAY(gh_vm_functions);
+
+static void gh_vm_put_function(struct gh_vm_function *fn)
+{
+ module_put(fn->mod);
+}
+
+static struct gh_vm_function *gh_vm_get_function(u32 type)
+{
+ struct gh_vm_function *fn;
+ int r;
+
+ fn = xa_load(&gh_vm_functions, type);
+ if (!fn) {
+ r = request_module("ghfunc:%u", type);
+ if (r)
+ return ERR_PTR(r > 0 ? -r : r);
+
+ fn = xa_load(&gh_vm_functions, type);
+ }
+
+ if (!fn || !try_module_get(fn->mod))
+ fn = ERR_PTR(-ENOENT);
+
+ return fn;
+}
+
+static void gh_vm_remove_function_instance(struct gh_vm_function_instance *inst)
+ __must_hold(&inst->ghvm->fn_lock)
+{
+ inst->fn->unbind(inst);
+ list_del(&inst->vm_list);
+ gh_vm_put_function(inst->fn);
+ kfree(inst->argp);
+ kfree(inst);
+}
+
+static void gh_vm_remove_functions(struct gh_vm *ghvm)
+{
+ struct gh_vm_function_instance *inst, *iiter;
+
+ mutex_lock(&ghvm->fn_lock);
+ list_for_each_entry_safe(inst, iiter, &ghvm->functions, vm_list) {
+ gh_vm_remove_function_instance(inst);
+ }
+ mutex_unlock(&ghvm->fn_lock);
+}
+
+static long gh_vm_add_function_instance(struct gh_vm *ghvm, struct gh_fn_desc *f)
+{
+ struct gh_vm_function_instance *inst;
+ void __user *argp;
+ long r = 0;
+
+ if (f->arg_size > GH_FN_MAX_ARG_SIZE) {
+ dev_err_ratelimited(ghvm->parent, "%s: arg_size > %d\n",
+ __func__, GH_FN_MAX_ARG_SIZE);
+ return -EINVAL;
+ }
+
+ inst = kzalloc(sizeof(*inst), GFP_KERNEL);
+ if (!inst)
+ return -ENOMEM;
+
+ inst->arg_size = f->arg_size;
+ if (inst->arg_size) {
+ inst->argp = kzalloc(inst->arg_size, GFP_KERNEL);
+ if (!inst->argp) {
+ r = -ENOMEM;
+ goto free;
+ }
+
+ argp = u64_to_user_ptr(f->arg);
+ if (copy_from_user(inst->argp, argp, f->arg_size)) {
+ r = -EFAULT;
+ goto free_arg;
+ }
+ }
+
+ inst->fn = gh_vm_get_function(f->type);
+ if (IS_ERR(inst->fn)) {
+ r = PTR_ERR(inst->fn);
+ goto free_arg;
+ }
+
+ inst->ghvm = ghvm;
+ inst->rm = ghvm->rm;
+
+ mutex_lock(&ghvm->fn_lock);
+ r = inst->fn->bind(inst);
+ if (r < 0) {
+ mutex_unlock(&ghvm->fn_lock);
+ gh_vm_put_function(inst->fn);
+ goto free_arg;
+ }
+
+ list_add(&inst->vm_list, &ghvm->functions);
+ mutex_unlock(&ghvm->fn_lock);
+
+ return r;
+free_arg:
+ kfree(inst->argp);
+free:
+ kfree(inst);
+ return r;
+}
+
+static long gh_vm_rm_function_instance(struct gh_vm *ghvm, struct gh_fn_desc *f)
+{
+ struct gh_vm_function_instance *inst, *iter;
+ void __user *user_argp;
+ void *argp;
+ long r = 0;
+
+ r = mutex_lock_interruptible(&ghvm->fn_lock);
+ if (r)
+ return r;
+
+ if (f->arg_size) {
+ argp = kzalloc(f->arg_size, GFP_KERNEL);
+ if (!argp) {
+ r = -ENOMEM;
+ goto out;
+ }
+
+ user_argp = u64_to_user_ptr(f->arg);
+ if (copy_from_user(argp, user_argp, f->arg_size)) {
+ r = -EFAULT;
+ kfree(argp);
+ goto out;
+ }
+
+ r = -ENOENT;
+ list_for_each_entry_safe(inst, iter, &ghvm->functions, vm_list) {
+ if (inst->fn->type == f->type &&
+ inst->fn->compare(inst, argp, f->arg_size)) {
+ gh_vm_remove_function_instance(inst);
+ r = 0;
+ }
+ }
+
+ kfree(argp);
+ }
+
+out:
+ mutex_unlock(&ghvm->fn_lock);
+ return r;
+}
+
+int gh_vm_function_register(struct gh_vm_function *fn)
+{
+ if (!fn->bind || !fn->unbind)
+ return -EINVAL;
+
+ return xa_err(xa_store(&gh_vm_functions, fn->type, fn, GFP_KERNEL));
+}
+EXPORT_SYMBOL_GPL(gh_vm_function_register);
+
+void gh_vm_function_unregister(struct gh_vm_function *fn)
+{
+ /* Expecting unregister to only come when unloading a module */
+ WARN_ON(fn->mod && module_refcount(fn->mod));
+ xa_erase(&gh_vm_functions, fn->type);
+}
+EXPORT_SYMBOL_GPL(gh_vm_function_unregister);
+
static int gh_vm_rm_notification_status(struct gh_vm *ghvm, void *data)
{
struct gh_rm_vm_status_payload *payload = data;
@@ -98,6 +267,8 @@ static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
init_rwsem(&ghvm->status_lock);
init_waitqueue_head(&ghvm->vm_status_wait);
INIT_WORK(&ghvm->free_work, gh_vm_free);
+ kref_init(&ghvm->kref);
+ INIT_LIST_HEAD(&ghvm->functions);
ghvm->vm_status = GH_RM_VM_STATUS_NO_STATE;
return ghvm;
@@ -254,6 +425,24 @@ static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
r = gh_vm_ensure_started(ghvm);
break;
}
+ case GH_VM_ADD_FUNCTION: {
+ struct gh_fn_desc f;
+
+ if (copy_from_user(&f, argp, sizeof(f)))
+ return -EFAULT;
+
+ r = gh_vm_add_function_instance(ghvm, &f);
+ break;
+ }
+ case GH_VM_REMOVE_FUNCTION: {
+ struct gh_fn_desc f;
+
+ if (copy_from_user(&f, argp, sizeof(f)))
+ return -EFAULT;
+
+ r = gh_vm_rm_function_instance(ghvm, &f);
+ break;
+ }
default:
r = -ENOTTY;
break;
@@ -270,6 +459,8 @@ static void gh_vm_free(struct work_struct *work)
if (ghvm->vm_status == GH_RM_VM_STATUS_RUNNING)
gh_vm_stop(ghvm);
+ gh_vm_remove_functions(ghvm);
+
if (ghvm->vm_status != GH_RM_VM_STATUS_NO_STATE &&
ghvm->vm_status != GH_RM_VM_STATUS_LOAD &&
ghvm->vm_status != GH_RM_VM_STATUS_RESET) {
@@ -294,14 +485,33 @@ static void gh_vm_free(struct work_struct *work)
kfree(ghvm);
}
-static int gh_vm_release(struct inode *inode, struct file *filp)
+int __must_check gh_vm_get(struct gh_vm *ghvm)
{
- struct gh_vm *ghvm = filp->private_data;
+ return kref_get_unless_zero(&ghvm->kref);
+}
+EXPORT_SYMBOL_GPL(gh_vm_get);
+
+static void _gh_vm_put(struct kref *kref)
+{
+ struct gh_vm *ghvm = container_of(kref, struct gh_vm, kref);
/* VM will be reset and make RM calls which can interruptible sleep.
* Defer to a work so this thread can receive signal.
*/
schedule_work(&ghvm->free_work);
+}
+
+void gh_vm_put(struct gh_vm *ghvm)
+{
+ kref_put(&ghvm->kref, _gh_vm_put);
+}
+EXPORT_SYMBOL_GPL(gh_vm_put);
+
+static int gh_vm_release(struct inode *inode, struct file *filp)
+{
+ struct gh_vm *ghvm = filp->private_data;
+
+ gh_vm_put(ghvm);
return 0;
}
@@ -346,7 +556,7 @@ static long gh_dev_ioctl_create_vm(struct gh_rm *rm, unsigned long arg)
err_put_fd:
put_unused_fd(fd);
err_destroy_vm:
- gh_vm_free(&ghvm->free_work);
+ gh_vm_put(ghvm);
return err;
}
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
index 4173bd51f83f..c4bec1469ae8 100644
--- a/drivers/virt/gunyah/vm_mgr.h
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -8,6 +8,7 @@
#include <linux/gunyah_rsc_mgr.h>
#include <linux/list.h>
+#include <linux/kref.h>
#include <linux/miscdevice.h>
#include <linux/mutex.h>
#include <linux/rwsem.h>
@@ -45,9 +46,12 @@ struct gh_vm {
struct rw_semaphore status_lock;
struct work_struct free_work;
+ struct kref kref;
struct mm_struct *mm; /* userspace tied to this vm */
struct mutex mm_lock;
struct list_head memory_mappings;
+ struct mutex fn_lock;
+ struct list_head functions;
};
int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region);
diff --git a/include/linux/gunyah_vm_mgr.h b/include/linux/gunyah_vm_mgr.h
new file mode 100644
index 000000000000..1f0dc43ade50
--- /dev/null
+++ b/include/linux/gunyah_vm_mgr.h
@@ -0,0 +1,87 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _GUNYAH_VM_MGR_H
+#define _GUNYAH_VM_MGR_H
+
+#include <linux/compiler_types.h>
+#include <linux/gunyah.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/list.h>
+#include <linux/mod_devicetable.h>
+#include <linux/notifier.h>
+
+#include <uapi/linux/gunyah.h>
+
+struct gh_vm;
+
+int __must_check gh_vm_get(struct gh_vm *ghvm);
+void gh_vm_put(struct gh_vm *ghvm);
+
+struct gh_vm_function_instance;
+struct gh_vm_function {
+ u32 type;
+ const char *name;
+ struct module *mod;
+ long (*bind)(struct gh_vm_function_instance *f);
+ void (*unbind)(struct gh_vm_function_instance *f);
+ bool (*compare)(const struct gh_vm_function_instance *f, const void *arg, size_t size);
+};
+
+/**
+ * struct gh_vm_function_instance - Represents one function instance
+ * @arg_size: size of user argument
+ * @argp: pointer to user argument
+ * @ghvm: Pointer to VM instance
+ * @rm: Pointer to resource manager for the VM instance
+ * @fn: The ops for the function
+ * @data: Private data for function
+ * @vm_list: for gh_vm's functions list
+ * @fn_list: for gh_vm_function's instances list
+ */
+struct gh_vm_function_instance {
+ size_t arg_size;
+ void *argp;
+ struct gh_vm *ghvm;
+ struct gh_rm *rm;
+ struct gh_vm_function *fn;
+ void *data;
+ struct list_head vm_list;
+};
+
+int gh_vm_function_register(struct gh_vm_function *f);
+void gh_vm_function_unregister(struct gh_vm_function *f);
+
+/* Since the function identifiers were set up in a uapi header as an
+ * enum and we do not want to change that, the user must supply the expanded
+ * constant as well and the compiler checks they are the same.
+ * See also MODULE_ALIAS_RDMA_NETLINK.
+ */
+#define MODULE_ALIAS_GH_VM_FUNCTION(_type, _idx) \
+ static inline void __maybe_unused __chk##_idx(void) \
+ { \
+ BUILD_BUG_ON(_type != _idx); \
+ } \
+ MODULE_ALIAS("ghfunc:" __stringify(_idx))
+
+#define DECLARE_GH_VM_FUNCTION(_name, _type, _bind, _unbind, _compare) \
+ static struct gh_vm_function _name = { \
+ .type = _type, \
+ .name = __stringify(_name), \
+ .mod = THIS_MODULE, \
+ .bind = _bind, \
+ .unbind = _unbind, \
+ .compare = _compare, \
+ }
+
+#define module_gh_vm_function(__gf) \
+ module_driver(__gf, gh_vm_function_register, gh_vm_function_unregister)
+
+#define DECLARE_GH_VM_FUNCTION_INIT(_name, _type, _idx, _bind, _unbind, _compare) \
+ DECLARE_GH_VM_FUNCTION(_name, _type, _bind, _unbind, _compare); \
+ module_gh_vm_function(_name); \
+ MODULE_ALIAS_GH_VM_FUNCTION(_type, _idx)
+
+#endif
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index 4b63d0b9b8ba..bb07118a351f 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -72,4 +72,22 @@ struct gh_vm_dtb_config {
#define GH_VM_START _IO(GH_IOCTL_TYPE, 0x3)
+#define GH_FN_MAX_ARG_SIZE 256
+
+/**
+ * struct gh_fn_desc - Arguments to create a VM function
+ * @type: Type of the function. See &enum gh_fn_type.
+ * @arg_size: Size of argument to pass to the function. arg_size <= GH_FN_MAX_ARG_SIZE
+ * @arg: Pointer to argument given to the function. See &enum gh_fn_type for expected
+ * arguments for a function type.
+ */
+struct gh_fn_desc {
+ __u32 type;
+ __u32 arg_size;
+ __u64 arg;
+};
+
+#define GH_VM_ADD_FUNCTION _IOW(GH_IOCTL_TYPE, 0x4, struct gh_fn_desc)
+#define GH_VM_REMOVE_FUNCTION _IOW(GH_IOCTL_TYPE, 0x7, struct gh_fn_desc)
+
#endif
--
2.40.0
Add Gunyah Resource Manager RPC to launch an unauthenticated VM.
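
For reviewers, the size check that gh_rm_get_hyp_resources() applies to
the RM response can be modelled in userspace roughly as follows. This is
only a sketch: the example_* names are hypothetical, the entry layout
copies the packed struct gh_rm_hyp_resource added by this patch, and the
count is decoded byte by byte so the model does not depend on host
endianness.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Same field order and packing as struct gh_rm_hyp_resource. */
struct example_hyp_resource {
	uint8_t type;
	uint8_t reserved;
	uint16_t partner_vmid;
	uint32_t resource_handle;
	uint32_t resource_label;
	uint64_t cap_id;
	uint32_t virq_handle;
	uint32_t virq;
	uint64_t base;
	uint64_t size;
} __attribute__((packed));

/* The response starts with a little-endian 32-bit entry count. */
#define EXAMPLE_HDR_SIZE sizeof(uint32_t)

/* Decode a little-endian 32-bit value regardless of host byte order. */
uint32_t example_get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Accept the response only if it holds the header plus exactly
 * n_entries entries; anything else is treated as a malformed message.
 */
int example_check_hyp_resources(const uint8_t *resp, size_t resp_size)
{
	uint64_t expected;

	assert(resp != NULL);

	if (resp_size < EXAMPLE_HDR_SIZE)
		return -EBADMSG;

	expected = EXAMPLE_HDR_SIZE +
		   (uint64_t)example_get_le32(resp) *
		   sizeof(struct example_hyp_resource);
	if (resp_size != expected)
		return -EBADMSG;

	return 0;
}
```

Doing the multiplication in 64 bits keeps the expected size from
wrapping even for the largest possible entry count.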
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/Makefile | 2 +-
drivers/virt/gunyah/rsc_mgr_rpc.c | 259 ++++++++++++++++++++++++++++++
include/linux/gunyah_rsc_mgr.h | 73 +++++++++
3 files changed, 333 insertions(+), 1 deletion(-)
create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index 0f5aec834698..241bab357b86 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0
-gunyah-y += rsc_mgr.o
+gunyah-y += rsc_mgr.o rsc_mgr_rpc.o
obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
new file mode 100644
index 000000000000..a4a9f0ba4e1f
--- /dev/null
+++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
@@ -0,0 +1,259 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/gunyah_rsc_mgr.h>
+#include "rsc_mgr.h"
+
+/* Message IDs: VM Management */
+#define GH_RM_RPC_VM_ALLOC_VMID 0x56000001
+#define GH_RM_RPC_VM_DEALLOC_VMID 0x56000002
+#define GH_RM_RPC_VM_START 0x56000004
+#define GH_RM_RPC_VM_STOP 0x56000005
+#define GH_RM_RPC_VM_RESET 0x56000006
+#define GH_RM_RPC_VM_CONFIG_IMAGE 0x56000009
+#define GH_RM_RPC_VM_INIT 0x5600000B
+#define GH_RM_RPC_VM_GET_HYP_RESOURCES 0x56000020
+#define GH_RM_RPC_VM_GET_VMID 0x56000024
+
+struct gh_rm_vm_common_vmid_req {
+ __le16 vmid;
+ __le16 _padding;
+} __packed;
+
+/* Call: VM_ALLOC */
+struct gh_rm_vm_alloc_vmid_resp {
+ __le16 vmid;
+ __le16 _padding;
+} __packed;
+
+/* Call: VM_STOP */
+#define GH_RM_VM_STOP_FLAG_FORCE_STOP BIT(0)
+
+#define GH_RM_VM_STOP_REASON_FORCE_STOP 3
+
+struct gh_rm_vm_stop_req {
+ __le16 vmid;
+ u8 flags;
+ u8 _padding;
+ __le32 stop_reason;
+} __packed;
+
+/* Call: VM_CONFIG_IMAGE */
+struct gh_rm_vm_config_image_req {
+ __le16 vmid;
+ __le16 auth_mech;
+ __le32 mem_handle;
+ __le64 image_offset;
+ __le64 image_size;
+ __le64 dtb_offset;
+ __le64 dtb_size;
+} __packed;
+
+/*
+ * Several RM calls take only a VMID as a parameter and return only a standard
+ * response. Deduplicate boilerplate code by using this common call.
+ */
+static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
+{
+ struct gh_rm_vm_common_vmid_req req_payload = {
+ .vmid = cpu_to_le16(vmid),
+ };
+
+ return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
+}
+
+/**
+ * gh_rm_alloc_vmid() - Allocate a new VM in Gunyah. Returns the VM identifier.
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: Use 0 to dynamically allocate a VM. A reserved VMID can be supplied
+ * to request allocation of a platform-defined VM.
+ *
+ * Return: the allocated VMID (0 if a reserved @vmid was requested), or a negative value on error
+ */
+int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid)
+{
+ struct gh_rm_vm_common_vmid_req req_payload = {
+ .vmid = cpu_to_le16(vmid),
+ };
+ struct gh_rm_vm_alloc_vmid_resp *resp_payload;
+ size_t resp_size;
+ void *resp;
+ int ret;
+
+ ret = gh_rm_call(rm, GH_RM_RPC_VM_ALLOC_VMID, &req_payload, sizeof(req_payload), &resp,
+ &resp_size);
+ if (ret)
+ return ret;
+
+ if (!vmid) {
+ resp_payload = resp;
+ ret = le16_to_cpu(resp_payload->vmid);
+ kfree(resp);
+ }
+
+ return ret;
+}
+
+/**
+ * gh_rm_dealloc_vmid() - Dispose of a VMID
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier allocated with gh_rm_alloc_vmid
+ */
+int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid)
+{
+ return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_DEALLOC_VMID, vmid);
+}
+
+/**
+ * gh_rm_vm_reset() - Reset a VM's resources
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier allocated with gh_rm_alloc_vmid
+ *
+ * As part of tearing down the VM, request RM to clean up all the VM resources
+ * associated with the VM. Only after this, Linux can clean up all the
+ * references it maintains to resources.
+ */
+int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid)
+{
+ return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_RESET, vmid);
+}
+
+/**
+ * gh_rm_vm_start() - Move a VM into "ready to run" state
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier allocated with gh_rm_alloc_vmid
+ *
+ * On VMs which use proxy scheduling, vcpu_run is needed to actually run the VM.
+ * On VMs which use Gunyah's scheduling, the vCPUs start executing in accordance with Gunyah
+ * scheduling policies.
+ */
+int gh_rm_vm_start(struct gh_rm *rm, u16 vmid)
+{
+ return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_START, vmid);
+}
+
+/**
+ * gh_rm_vm_stop() - Send a request to Resource Manager VM to forcibly stop a VM.
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier allocated with gh_rm_alloc_vmid
+ */
+int gh_rm_vm_stop(struct gh_rm *rm, u16 vmid)
+{
+ struct gh_rm_vm_stop_req req_payload = {
+ .vmid = cpu_to_le16(vmid),
+ .flags = GH_RM_VM_STOP_FLAG_FORCE_STOP,
+ .stop_reason = cpu_to_le32(GH_RM_VM_STOP_REASON_FORCE_STOP),
+ };
+
+ return gh_rm_call(rm, GH_RM_RPC_VM_STOP, &req_payload, sizeof(req_payload), NULL, NULL);
+}
+
+/**
+ * gh_rm_vm_configure() - Prepare a VM to start and provide the common
+ * configuration needed by RM to configure a VM
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier allocated with gh_rm_alloc_vmid
+ * @auth_mechanism: Authentication mechanism used by resource manager to verify
+ * the virtual machine
+ * @mem_handle: Handle to a previously shared memparcel that contains all parts
+ * of the VM image subject to authentication.
+ * @image_offset: Start address of VM image, relative to the start of memparcel
+ * @image_size: Size of the VM image
+ * @dtb_offset: Start address of the devicetree binary with VM configuration,
+ * relative to start of memparcel.
+ * @dtb_size: Maximum size of devicetree binary.
+ */
+int gh_rm_vm_configure(struct gh_rm *rm, u16 vmid, enum gh_rm_vm_auth_mechanism auth_mechanism,
+ u32 mem_handle, u64 image_offset, u64 image_size, u64 dtb_offset, u64 dtb_size)
+{
+ struct gh_rm_vm_config_image_req req_payload = {
+ .vmid = cpu_to_le16(vmid),
+ .auth_mech = cpu_to_le16(auth_mechanism),
+ .mem_handle = cpu_to_le32(mem_handle),
+ .image_offset = cpu_to_le64(image_offset),
+ .image_size = cpu_to_le64(image_size),
+ .dtb_offset = cpu_to_le64(dtb_offset),
+ .dtb_size = cpu_to_le64(dtb_size),
+ };
+
+ return gh_rm_call(rm, GH_RM_RPC_VM_CONFIG_IMAGE, &req_payload, sizeof(req_payload),
+ NULL, NULL);
+}
+
+/**
+ * gh_rm_vm_init() - Move the VM to initialized state.
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VM identifier
+ *
+ * RM will allocate needed resources for the VM.
+ */
+int gh_rm_vm_init(struct gh_rm *rm, u16 vmid)
+{
+ return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_INIT, vmid);
+}
+
+/**
+ * gh_rm_get_hyp_resources() - Retrieve hypervisor resources (capabilities) associated with a VM
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: VMID of the other VM to get the resources of
+ * @resources: Set by gh_rm_get_hyp_resources and contains the returned hypervisor resources.
+ * Caller must free the resources pointer if successful.
+ */
+int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
+ struct gh_rm_hyp_resources **resources)
+{
+ struct gh_rm_vm_common_vmid_req req_payload = {
+ .vmid = cpu_to_le16(vmid),
+ };
+ struct gh_rm_hyp_resources *resp;
+ size_t resp_size;
+ int ret;
+
+ ret = gh_rm_call(rm, GH_RM_RPC_VM_GET_HYP_RESOURCES,
+ &req_payload, sizeof(req_payload),
+ (void **)&resp, &resp_size);
+ if (ret)
+ return ret;
+
+ if (!resp_size)
+ return -EBADMSG;
+
+ if (resp_size < struct_size(resp, entries, 0) ||
+ resp_size != struct_size(resp, entries, le32_to_cpu(resp->n_entries))) {
+ kfree(resp);
+ return -EBADMSG;
+ }
+
+ *resources = resp;
+ return 0;
+}
+
+/**
+ * gh_rm_get_vmid() - Retrieve VMID of this virtual machine
+ * @rm: Handle to a Gunyah resource manager
+ * @vmid: Filled with the VMID of this VM
+ */
+int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid)
+{
+ static u16 cached_vmid = GH_VMID_INVAL;
+ size_t resp_size;
+ __le32 *resp;
+ int ret;
+
+ if (cached_vmid != GH_VMID_INVAL) {
+ *vmid = cached_vmid;
+ return 0;
+ }
+
+ ret = gh_rm_call(rm, GH_RM_RPC_VM_GET_VMID, NULL, 0, (void **)&resp, &resp_size);
+ if (ret)
+ return ret;
+
+ *vmid = cached_vmid = lower_16_bits(le32_to_cpu(*resp));
+ kfree(resp);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(gh_rm_get_vmid);
diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
index f2a312e80af5..1ac66d9004d2 100644
--- a/include/linux/gunyah_rsc_mgr.h
+++ b/include/linux/gunyah_rsc_mgr.h
@@ -18,4 +18,77 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb);
struct device *gh_rm_get(struct gh_rm *rm);
void gh_rm_put(struct gh_rm *rm);
+struct gh_rm_vm_exited_payload {
+ __le16 vmid;
+ __le16 exit_type;
+ __le32 exit_reason_size;
+ u8 exit_reason[];
+} __packed;
+
+#define GH_RM_NOTIFICATION_VM_EXITED 0x56100001
+
+enum gh_rm_vm_status {
+ GH_RM_VM_STATUS_NO_STATE = 0,
+ GH_RM_VM_STATUS_INIT = 1,
+ GH_RM_VM_STATUS_READY = 2,
+ GH_RM_VM_STATUS_RUNNING = 3,
+ GH_RM_VM_STATUS_PAUSED = 4,
+ GH_RM_VM_STATUS_LOAD = 5,
+ GH_RM_VM_STATUS_AUTH = 6,
+ GH_RM_VM_STATUS_INIT_FAILED = 8,
+ GH_RM_VM_STATUS_EXITED = 9,
+ GH_RM_VM_STATUS_RESETTING = 10,
+ GH_RM_VM_STATUS_RESET = 11,
+};
+
+struct gh_rm_vm_status_payload {
+ __le16 vmid;
+ u16 reserved;
+ u8 vm_status;
+ u8 os_status;
+ __le16 app_status;
+} __packed;
+
+#define GH_RM_NOTIFICATION_VM_STATUS 0x56100008
+
+/* RPC Calls */
+int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid);
+int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid);
+int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid);
+int gh_rm_vm_start(struct gh_rm *rm, u16 vmid);
+int gh_rm_vm_stop(struct gh_rm *rm, u16 vmid);
+
+enum gh_rm_vm_auth_mechanism {
+ GH_RM_VM_AUTH_NONE = 0,
+ GH_RM_VM_AUTH_QCOM_PIL_ELF = 1,
+ GH_RM_VM_AUTH_QCOM_ANDROID_PVM = 2,
+};
+
+int gh_rm_vm_configure(struct gh_rm *rm, u16 vmid, enum gh_rm_vm_auth_mechanism auth_mechanism,
+ u32 mem_handle, u64 image_offset, u64 image_size,
+ u64 dtb_offset, u64 dtb_size);
+int gh_rm_vm_init(struct gh_rm *rm, u16 vmid);
+
+struct gh_rm_hyp_resource {
+ u8 type;
+ u8 reserved;
+ __le16 partner_vmid;
+ __le32 resource_handle;
+ __le32 resource_label;
+ __le64 cap_id;
+ __le32 virq_handle;
+ __le32 virq;
+ __le64 base;
+ __le64 size;
+} __packed;
+
+struct gh_rm_hyp_resources {
+ __le32 n_entries;
+ struct gh_rm_hyp_resource entries[];
+} __packed;
+
+int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
+ struct gh_rm_hyp_resources **resources);
+int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid);
+
#endif
--
2.40.0
Qualcomm platforms have a firmware entity which performs access control
for physical pages. Dynamically started Gunyah virtual machines use the
QCOM_SCM_RM_MANAGED_VMID for access. Linux thus needs to assign access
to the memory used by guest VMs. Gunyah doesn't perform this operation
for us because it is the current VM (typically VMID_HLOS) that delegates
the access, not Gunyah itself. Use the Gunyah platform ops to achieve
this so that only Qualcomm platforms attempt to make the needed SCM calls.
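
The VMID and permission translation done before each SCM call can be
sketched in plain C as below. The two QCOM_SCM_*_VMID values are the ones
defined in this patch; the EX_* ACL and permission bit values are
assumptions made for illustration (the real definitions live in the
Gunyah and qcom_scm headers), and the example_* helpers are hypothetical
names, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Defined in gunyah_qcom.c by this patch. */
#define QCOM_SCM_RM_MANAGED_VMID	0x3A
#define QCOM_SCM_MAX_MANAGED_VMID	0x3F

/* Assumed bit values, for illustration only. */
#define EX_GH_RM_ACL_X		(1u << 0)
#define EX_GH_RM_ACL_W		(1u << 1)
#define EX_GH_RM_ACL_R		(1u << 2)
#define EX_SCM_PERM_EXEC	0x1u
#define EX_SCM_PERM_WRITE	0x2u
#define EX_SCM_PERM_READ	0x4u

/* VMIDs the firmware knows pass through; all others share one VMID. */
uint16_t example_scm_vmid(uint16_t gh_vmid)
{
	return gh_vmid <= QCOM_SCM_MAX_MANAGED_VMID ?
		gh_vmid : QCOM_SCM_RM_MANAGED_VMID;
}

/* Translate Gunyah ACL bits into firmware permission bits. */
uint32_t example_scm_perms(uint8_t acl)
{
	uint32_t perm = 0;

	if (acl & EX_GH_RM_ACL_X)
		perm |= EX_SCM_PERM_EXEC;
	if (acl & EX_GH_RM_ACL_W)
		perm |= EX_SCM_PERM_WRITE;
	if (acl & EX_GH_RM_ACL_R)
		perm |= EX_SCM_PERM_READ;

	return perm;
}

/*
 * Build the bitmask of current owners that the rollback path passes as
 * the source when handing already-assigned memory back.
 */
uint64_t example_owner_mask(const uint16_t *vmids, int n)
{
	uint64_t mask = 0;
	int i;

	assert(n >= 0);

	for (i = 0; i < n; i++)
		mask |= 1ull << example_scm_vmid(vmids[i]);

	return mask;
}
```

Because every translated VMID is at most 0x3F, the shift in
example_owner_mask() always stays within a 64-bit mask.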
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/Kconfig | 13 +++
drivers/virt/gunyah/Makefile | 1 +
drivers/virt/gunyah/gunyah_qcom.c | 147 ++++++++++++++++++++++++++++++
3 files changed, 161 insertions(+)
create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index de815189dab6..0421b751aad4 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -5,6 +5,7 @@ config GUNYAH
depends on ARM64
depends on MAILBOX
select GUNYAH_PLATFORM_HOOKS
+ imply GUNYAH_QCOM_PLATFORM if ARCH_QCOM
help
The Gunyah drivers are the helper interfaces that run in a guest VM
such as basic inter-VM IPC and signaling mechanisms, and higher level
@@ -15,3 +16,15 @@ config GUNYAH
config GUNYAH_PLATFORM_HOOKS
tristate
+
+config GUNYAH_QCOM_PLATFORM
+ tristate "Support for Gunyah on Qualcomm platforms"
+ depends on GUNYAH
+ select GUNYAH_PLATFORM_HOOKS
+ select QCOM_SCM
+ help
+ Enable support for interacting with Gunyah on Qualcomm
+ platforms. Interaction with Qualcomm firmware requires
+ extra platform-specific support.
+
+ Say Y/M here to use Gunyah on Qualcomm platforms.
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index 4fbeee521d60..2aa9ff038ed0 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -1,6 +1,7 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
+obj-$(CONFIG_GUNYAH_QCOM_PLATFORM) += gunyah_qcom.o
gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
obj-$(CONFIG_GUNYAH) += gunyah.o
diff --git a/drivers/virt/gunyah/gunyah_qcom.c b/drivers/virt/gunyah/gunyah_qcom.c
new file mode 100644
index 000000000000..18acbda8fcbd
--- /dev/null
+++ b/drivers/virt/gunyah/gunyah_qcom.c
@@ -0,0 +1,147 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/arm-smccc.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/module.h>
+#include <linux/firmware/qcom/qcom_scm.h>
+#include <linux/types.h>
+#include <linux/uuid.h>
+
+#define QCOM_SCM_RM_MANAGED_VMID 0x3A
+#define QCOM_SCM_MAX_MANAGED_VMID 0x3F
+
+static int qcom_scm_gh_rm_pre_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
+{
+ struct qcom_scm_vmperm *new_perms;
+ u64 src, src_cpy;
+ int ret = 0, i, n;
+ u16 vmid;
+
+ new_perms = kcalloc(mem_parcel->n_acl_entries, sizeof(*new_perms), GFP_KERNEL);
+ if (!new_perms)
+ return -ENOMEM;
+
+ for (n = 0; n < mem_parcel->n_acl_entries; n++) {
+ vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
+ if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
+ new_perms[n].vmid = vmid;
+ else
+ new_perms[n].vmid = QCOM_SCM_RM_MANAGED_VMID;
+ if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_X)
+ new_perms[n].perm |= QCOM_SCM_PERM_EXEC;
+ if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_W)
+ new_perms[n].perm |= QCOM_SCM_PERM_WRITE;
+ if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_R)
+ new_perms[n].perm |= QCOM_SCM_PERM_READ;
+ }
+
+ src = (1ull << QCOM_SCM_VMID_HLOS);
+
+ for (i = 0; i < mem_parcel->n_mem_entries; i++) {
+ src_cpy = src;
+ ret = qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
+ le64_to_cpu(mem_parcel->mem_entries[i].size),
+ &src_cpy, new_perms, mem_parcel->n_acl_entries);
+ if (ret) {
+ src = 0;
+ for (n = 0; n < mem_parcel->n_acl_entries; n++) {
+ vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
+ if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
+ src |= (1ull << vmid);
+ else
+ src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
+ }
+
+ new_perms[0].vmid = QCOM_SCM_VMID_HLOS;
+
+ for (i--; i >= 0; i--) {
+ src_cpy = src;
+ WARN_ON_ONCE(qcom_scm_assign_mem(
+ le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
+ le64_to_cpu(mem_parcel->mem_entries[i].size),
+ &src_cpy, new_perms, 1));
+ }
+ break;
+ }
+ }
+
+ kfree(new_perms);
+ return ret;
+}
+
+static int qcom_scm_gh_rm_post_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
+{
+ struct qcom_scm_vmperm new_perms;
+ u64 src = 0, src_cpy;
+ int ret = 0, i, n;
+ u16 vmid;
+
+ new_perms.vmid = QCOM_SCM_VMID_HLOS;
+ new_perms.perm = QCOM_SCM_PERM_EXEC | QCOM_SCM_PERM_WRITE | QCOM_SCM_PERM_READ;
+
+ for (n = 0; n < mem_parcel->n_acl_entries; n++) {
+ vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
+ if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
+ src |= (1ull << vmid);
+ else
+ src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
+ }
+
+ for (i = 0; i < mem_parcel->n_mem_entries; i++) {
+ src_cpy = src;
+ ret = qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
+ le64_to_cpu(mem_parcel->mem_entries[i].size),
+ &src_cpy, &new_perms, 1);
+ WARN_ON_ONCE(ret);
+ }
+
+ return ret;
+}
+
+static struct gh_rm_platform_ops qcom_scm_gh_rm_platform_ops = {
+ .pre_mem_share = qcom_scm_gh_rm_pre_mem_share,
+ .post_mem_reclaim = qcom_scm_gh_rm_post_mem_reclaim,
+};
+
+/* {19bd54bd-0b37-571b-946f-609b54539de6} */
+static const uuid_t QCOM_EXT_UUID =
+ UUID_INIT(0x19bd54bd, 0x0b37, 0x571b, 0x94, 0x6f, 0x60, 0x9b, 0x54, 0x53, 0x9d, 0xe6);
+
+#define GH_QCOM_EXT_CALL_UUID_ID ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \
+ ARM_SMCCC_OWNER_VENDOR_HYP, 0x3f01)
+
+static bool gh_has_qcom_extensions(void)
+{
+ struct arm_smccc_res res;
+ uuid_t uuid;
+
+ arm_smccc_1_1_smc(GH_QCOM_EXT_CALL_UUID_ID, &res);
+
+ ((u32 *)&uuid.b[0])[0] = lower_32_bits(res.a0);
+ ((u32 *)&uuid.b[0])[1] = lower_32_bits(res.a1);
+ ((u32 *)&uuid.b[0])[2] = lower_32_bits(res.a2);
+ ((u32 *)&uuid.b[0])[3] = lower_32_bits(res.a3);
+
+ return uuid_equal(&uuid, &QCOM_EXT_UUID);
+}
+
+static int __init qcom_gh_platform_hooks_register(void)
+{
+ if (!gh_has_qcom_extensions())
+ return -ENODEV;
+
+ return gh_rm_register_platform_ops(&qcom_scm_gh_rm_platform_ops);
+}
+
+static void __exit qcom_gh_platform_hooks_unregister(void)
+{
+ gh_rm_unregister_platform_ops(&qcom_scm_gh_rm_platform_ops);
+}
+
+module_init(qcom_gh_platform_hooks_register);
+module_exit(qcom_gh_platform_hooks_unregister);
+MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Platform Hooks for Gunyah");
+MODULE_LICENSE("GPL");
--
2.40.0
The Gunyah resource manager provides an API to manipulate stage 2 page tables.
Manipulations are represented as a memory parcel. Memory parcels
describe a list of memory regions (intermediate physical address and
size), a list of new permissions for VMs, and the memory type (DDR or
MMIO). Memory parcels are uniquely identified by a handle allocated by
Gunyah. There are a few types of memory parcel sharing which Gunyah
supports:
- Sharing: the guest and host VM both have access
- Lending: only the guest has access; host VM loses access
- Donating: Permanently lent (not reclaimed even if guest shuts down)
Memory parcels that have been shared or lent can be reclaimed by the
host via an additional call. The reclaim operation restores the original
access the host VM had to the memory parcel and removes the other VM's
access.
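The access rules for the three sharing types and for reclaim can be summarized in a small model. This is only a sketch of the semantics described above, not RM code; the enum, struct and function names are hypothetical, and the row for donation assumes that "permanently lent" means the host loses access just as with lending.

```c
#include <stdbool.h>

/* Hypothetical states a memory parcel can be in, from the host's view. */
enum parcel_state { PARCEL_OWNED, PARCEL_SHARED, PARCEL_LENT, PARCEL_DONATED };

struct parcel_access {
	bool host;        /* host VM can access the memory */
	bool guest;       /* guest VM can access the memory */
	bool reclaimable; /* host may take the parcel back */
};

static struct parcel_access parcel_access_for(enum parcel_state s)
{
	switch (s) {
	case PARCEL_SHARED:
		return (struct parcel_access){ .host = true, .guest = true, .reclaimable = true };
	case PARCEL_LENT:
		return (struct parcel_access){ .host = false, .guest = true, .reclaimable = true };
	case PARCEL_DONATED:
		return (struct parcel_access){ .host = false, .guest = true, .reclaimable = false };
	case PARCEL_OWNED:
	default:
		return (struct parcel_access){ .host = true, .guest = false, .reclaimable = false };
	}
}

/* Reclaim returns shared or lent parcels to the host; other states are unchanged. */
static enum parcel_state parcel_reclaim(enum parcel_state s)
{
	return parcel_access_for(s).reclaimable ? PARCEL_OWNED : s;
}
```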
Note that memory parcels don't describe where in the guest VM the memory
parcel should reside. The guest VM must accept the memory parcel either
explicitly via a "gh_rm_mem_accept" call (not introduced here) or be
configured to accept it automatically at boot. When the guest VM accepts
the memory parcel, it also specifies the IPA at which the memory parcel
should be placed.
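A parcel's list of memory regions can be long. In the patch below, the initial MEM_LEND or MEM_SHARE message carries at most GH_RM_MAX_MEM_ENTRIES (512) entries and the remainder follows in MEM_APPEND calls of up to 512 entries each. The following sketch counts the resulting RM messages; parcel_msg_count() is a hypothetical helper written for illustration and is not part of the series.

```c
#include <assert.h>
#include <stddef.h>

#define GH_RM_MAX_MEM_ENTRIES 512 /* same limit as in the patch below */

/*
 * Hypothetical helper: number of RM messages needed to describe
 * n_mem_entries regions (one LEND/SHARE plus zero or more APPENDs).
 */
static size_t parcel_msg_count(size_t n_mem_entries)
{
	size_t remaining;

	assert(n_mem_entries > 0);
	if (n_mem_entries <= GH_RM_MAX_MEM_ENTRIES)
		return 1;

	remaining = n_mem_entries - GH_RM_MAX_MEM_ENTRIES;
	/* Round up: a partially filled final APPEND still costs one message. */
	return 1 + (remaining + GH_RM_MAX_MEM_ENTRIES - 1) / GH_RM_MAX_MEM_ENTRIES;
}
```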
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
drivers/virt/gunyah/rsc_mgr_rpc.c | 227 ++++++++++++++++++++++++++++++
include/linux/gunyah_rsc_mgr.h | 48 +++++++
2 files changed, 275 insertions(+)
diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
index a4a9f0ba4e1f..4f25f07400b3 100644
--- a/drivers/virt/gunyah/rsc_mgr_rpc.c
+++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
@@ -6,6 +6,12 @@
#include <linux/gunyah_rsc_mgr.h>
#include "rsc_mgr.h"
+/* Message IDs: Memory Management */
+#define GH_RM_RPC_MEM_LEND 0x51000012
+#define GH_RM_RPC_MEM_SHARE 0x51000013
+#define GH_RM_RPC_MEM_RECLAIM 0x51000015
+#define GH_RM_RPC_MEM_APPEND 0x51000018
+
/* Message IDs: VM Management */
#define GH_RM_RPC_VM_ALLOC_VMID 0x56000001
#define GH_RM_RPC_VM_DEALLOC_VMID 0x56000002
@@ -22,6 +28,46 @@ struct gh_rm_vm_common_vmid_req {
__le16 _padding;
} __packed;
+/* Call: MEM_LEND, MEM_SHARE */
+#define GH_MEM_SHARE_REQ_FLAGS_APPEND BIT(1)
+
+struct gh_rm_mem_share_req_header {
+ u8 mem_type;
+ u8 _padding0;
+ u8 flags;
+ u8 _padding1;
+ __le32 label;
+} __packed;
+
+struct gh_rm_mem_share_req_acl_section {
+ __le32 n_entries;
+ struct gh_rm_mem_acl_entry entries[];
+};
+
+struct gh_rm_mem_share_req_mem_section {
+ __le16 n_entries;
+ __le16 _padding;
+ struct gh_rm_mem_entry entries[];
+};
+
+/* Call: MEM_RELEASE */
+struct gh_rm_mem_release_req {
+ __le32 mem_handle;
+ u8 flags; /* currently not used */
+ u8 _padding0;
+ __le16 _padding1;
+} __packed;
+
+/* Call: MEM_APPEND */
+#define GH_MEM_APPEND_REQ_FLAGS_END BIT(0)
+
+struct gh_rm_mem_append_req_header {
+ __le32 mem_handle;
+ u8 flags;
+ u8 _padding0;
+ __le16 _padding1;
+} __packed;
+
/* Call: VM_ALLOC */
struct gh_rm_vm_alloc_vmid_resp {
__le16 vmid;
@@ -51,6 +97,8 @@ struct gh_rm_vm_config_image_req {
__le64 dtb_size;
} __packed;
+#define GH_RM_MAX_MEM_ENTRIES 512
+
/*
* Several RM calls take only a VMID as a parameter and give only standard
* response back. Deduplicate boilerplate code by using this common call.
@@ -64,6 +112,185 @@ static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
}
+static int _gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle, bool end_append,
+ struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
+{
+ struct gh_rm_mem_share_req_mem_section *mem_section;
+ struct gh_rm_mem_append_req_header *req_header;
+ size_t msg_size = 0;
+ void *msg;
+ int ret;
+
+ msg_size += sizeof(struct gh_rm_mem_append_req_header);
+ msg_size += struct_size(mem_section, entries, n_mem_entries);
+
+ msg = kzalloc(msg_size, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ req_header = msg;
+ mem_section = (void *)req_header + sizeof(struct gh_rm_mem_append_req_header);
+
+ req_header->mem_handle = cpu_to_le32(mem_handle);
+ if (end_append)
+ req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
+
+ mem_section->n_entries = cpu_to_le16(n_mem_entries);
+ memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) * n_mem_entries);
+
+ ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL, NULL);
+ kfree(msg);
+
+ return ret;
+}
+
+static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
+ struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
+{
+ bool end_append;
+ int ret = 0;
+ size_t n;
+
+ while (n_mem_entries) {
+ if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
+ end_append = false;
+ n = GH_RM_MAX_MEM_ENTRIES;
+ } else {
+ end_append = true;
+ n = n_mem_entries;
+ }
+
+ ret = _gh_rm_mem_append(rm, mem_handle, end_append, mem_entries, n);
+ if (ret)
+ break;
+
+ mem_entries += n;
+ n_mem_entries -= n;
+ }
+
+ return ret;
+}
+
+static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_mem_parcel *p)
+{
+ size_t msg_size = 0, initial_mem_entries = p->n_mem_entries, resp_size;
+ size_t acl_section_size, mem_section_size;
+ struct gh_rm_mem_share_req_acl_section *acl_section;
+ struct gh_rm_mem_share_req_mem_section *mem_section;
+ struct gh_rm_mem_share_req_header *req_header;
+ u32 *attr_section;
+ __le32 *resp;
+ void *msg;
+ int ret;
+
+ if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries || !p->n_mem_entries ||
+ p->n_acl_entries > U8_MAX || p->mem_handle != GH_MEM_HANDLE_INVAL)
+ return -EINVAL;
+
+ if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
+ initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
+
+ acl_section_size = struct_size(acl_section, entries, p->n_acl_entries);
+ mem_section_size = struct_size(mem_section, entries, initial_mem_entries);
+ /* The format of the message goes:
+ * request header
+ * ACL entries (which VMs get what kind of access to this memory parcel)
+ * Memory entries (list of memory regions to share)
+ * Memory attributes (currently unused, we'll hard-code the size to 0)
+ */
+ msg_size += sizeof(struct gh_rm_mem_share_req_header);
+ msg_size += acl_section_size;
+ msg_size += mem_section_size;
+ msg_size += sizeof(u32); /* for memory attributes, currently unused */
+
+ msg = kzalloc(msg_size, GFP_KERNEL);
+ if (!msg)
+ return -ENOMEM;
+
+ req_header = msg;
+ acl_section = (void *)req_header + sizeof(*req_header);
+ mem_section = (void *)acl_section + acl_section_size;
+ attr_section = (void *)mem_section + mem_section_size;
+
+ req_header->mem_type = p->mem_type;
+ if (initial_mem_entries != p->n_mem_entries)
+ req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
+ req_header->label = cpu_to_le32(p->label);
+
+ acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
+ memcpy(acl_section->entries, p->acl_entries,
+ flex_array_size(acl_section, entries, p->n_acl_entries));
+
+ mem_section->n_entries = cpu_to_le16(initial_mem_entries);
+ memcpy(mem_section->entries, p->mem_entries,
+ flex_array_size(mem_section, entries, initial_mem_entries));
+
+ /* Set n_entries for memory attribute section to 0 */
+ *attr_section = 0;
+
+ ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
+ kfree(msg);
+
+ if (ret)
+ return ret;
+
+ p->mem_handle = le32_to_cpu(*resp);
+ kfree(resp);
+
+ if (initial_mem_entries != p->n_mem_entries) {
+ ret = gh_rm_mem_append(rm, p->mem_handle,
+ &p->mem_entries[initial_mem_entries],
+ p->n_mem_entries - initial_mem_entries);
+ if (ret) {
+ gh_rm_mem_reclaim(rm, p);
+ p->mem_handle = GH_MEM_HANDLE_INVAL;
+ }
+ }
+
+ return ret;
+}
+
+/**
+ * gh_rm_mem_lend() - Lend memory to other virtual machines.
+ * @rm: Handle to a Gunyah resource manager
+ * @parcel: Information about the memory to be lent.
+ *
+ * Lending removes Linux's access to the memory while the memory parcel is lent.
+ */
+int gh_rm_mem_lend(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
+{
+ return gh_rm_mem_lend_common(rm, GH_RM_RPC_MEM_LEND, parcel);
+}
+
+
+/**
+ * gh_rm_mem_share() - Share memory with other virtual machines.
+ * @rm: Handle to a Gunyah resource manager
+ * @parcel: Information about the memory to be shared.
+ *
+ * Sharing keeps Linux's access to the memory while the memory parcel is shared.
+ */
+int gh_rm_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
+{
+ return gh_rm_mem_lend_common(rm, GH_RM_RPC_MEM_SHARE, parcel);
+}
+
+/**
+ * gh_rm_mem_reclaim() - Reclaim a memory parcel
+ * @rm: Handle to a Gunyah resource manager
+ * @parcel: Information about the memory to be reclaimed.
+ *
+ * RM maps the associated memory back into the stage-2 page tables of the owner VM.
+ */
+int gh_rm_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
+{
+ struct gh_rm_mem_release_req req = {
+ .mem_handle = cpu_to_le32(parcel->mem_handle),
+ };
+
+ return gh_rm_call(rm, GH_RM_RPC_MEM_RECLAIM, &req, sizeof(req), NULL, NULL);
+}
+
/**
* gh_rm_alloc_vmid() - Allocate a new VM in Gunyah. Returns the VM identifier.
* @rm: Handle to a Gunyah resource manager
diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
index 1ac66d9004d2..dfac088420bd 100644
--- a/include/linux/gunyah_rsc_mgr.h
+++ b/include/linux/gunyah_rsc_mgr.h
@@ -11,6 +11,7 @@
#include <linux/gunyah.h>
#define GH_VMID_INVAL U16_MAX
+#define GH_MEM_HANDLE_INVAL U32_MAX
struct gh_rm;
int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb);
@@ -51,7 +52,54 @@ struct gh_rm_vm_status_payload {
#define GH_RM_NOTIFICATION_VM_STATUS 0x56100008
+#define GH_RM_ACL_X BIT(0)
+#define GH_RM_ACL_W BIT(1)
+#define GH_RM_ACL_R BIT(2)
+
+struct gh_rm_mem_acl_entry {
+ __le16 vmid;
+ u8 perms;
+ u8 reserved;
+} __packed;
+
+struct gh_rm_mem_entry {
+ __le64 phys_addr;
+ __le64 size;
+} __packed;
+
+enum gh_rm_mem_type {
+ GH_RM_MEM_TYPE_NORMAL = 0,
+ GH_RM_MEM_TYPE_IO = 1,
+};
+
+/*
+ * struct gh_rm_mem_parcel - Info about memory to be lent/shared/donated/reclaimed
+ * @mem_type: The type of memory: normal (DDR) or IO
+ * @label: A client-specified identifier which can be used by the other VMs to identify the purpose
+ * of the memory parcel.
+ * @n_acl_entries: Count of the number of entries in the @acl_entries array.
+ * @acl_entries: An array of access control entries. Each entry specifies a VM and what access
+ * is allowed for the memory parcel.
+ * @n_mem_entries: Count of the number of entries in the @mem_entries array.
+ * @mem_entries: An array of regions to be associated with the memory parcel. Addresses should be
+ * (intermediate) physical addresses from Linux's perspective.
+ * @mem_handle: On success, filled with memory handle that RM allocates for this memory parcel
+ */
+struct gh_rm_mem_parcel {
+ enum gh_rm_mem_type mem_type;
+ u32 label;
+ size_t n_acl_entries;
+ struct gh_rm_mem_acl_entry *acl_entries;
+ size_t n_mem_entries;
+ struct gh_rm_mem_entry *mem_entries;
+ u32 mem_handle;
+};
+
/* RPC Calls */
+int gh_rm_mem_lend(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
+int gh_rm_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
+int gh_rm_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
+
int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid);
int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid);
int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid);
--
2.40.0
Allow userspace to attach an ioeventfd to an mmio address within the guest.
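The argument checks performed at bind time can be sketched in isolation as follows. This is a userspace model of the validation rules in the patch below, not the driver code: ioeventfd_args_valid() is a hypothetical name, and the explicit wraparound comparison is a stand-in chosen for this sketch rather than the helper the patch uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag value as defined in the uapi header added by this patch. */
#define GH_IOEVENTFD_FLAGS_DATAMATCH (1U << 0)

/* Hypothetical helper mirroring the checks applied to gh_fn_ioeventfd_arg. */
static bool ioeventfd_args_valid(uint64_t addr, uint32_t len, uint32_t flags)
{
	/* All flag bits other than DATAMATCH are reserved. */
	if (flags & ~GH_IOEVENTFD_FLAGS_DATAMATCH)
		return false;

	/* Length must be a natural word size, or 0 to match any length. */
	switch (len) {
	case 0:
	case 1:
	case 2:
	case 4:
	case 8:
		break;
	default:
		return false;
	}

	/* Reject ranges that wrap around the end of the address space. */
	if (addr + len < addr)
		return false;

	/* DATAMATCH needs a concrete access length to compare against. */
	if (!len && (flags & GH_IOEVENTFD_FLAGS_DATAMATCH))
		return false;

	return true;
}
```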
Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
Signed-off-by: Elliot Berman <[email protected]>
---
Documentation/virt/gunyah/vm-manager.rst | 2 +-
drivers/virt/gunyah/Kconfig | 9 ++
drivers/virt/gunyah/Makefile | 1 +
drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++++++++++++++++++++++
include/uapi/linux/gunyah.h | 37 +++++++
5 files changed, 178 insertions(+), 1 deletion(-)
create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
index c4960948c779..87838c5b5945 100644
--- a/Documentation/virt/gunyah/vm-manager.rst
+++ b/Documentation/virt/gunyah/vm-manager.rst
@@ -115,7 +115,7 @@ the VM *before* the VM starts.
The argument types are documented below:
.. kernel-doc:: include/uapi/linux/gunyah.h
- :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
+ :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags gh_fn_ioeventfd_arg gh_ioeventfd_flags
Gunyah VCPU API Descriptions
----------------------------
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index bc2c46d9df94..63bebc5b9f82 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -48,3 +48,12 @@ config GUNYAH_IRQFD
on Gunyah virtual machine.
Say Y/M here if unsure and you want to support Gunyah VMMs.
+
+config GUNYAH_IOEVENTFD
+ tristate "Gunyah ioeventfd interface"
+ depends on GUNYAH
+ help
+ Enable kernel support for creating ioeventfds which can alert userspace
+ when a Gunyah virtual machine accesses a memory address.
+
+ Say Y/M here if unsure and you want to support Gunyah VMMs.
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index ad212a1cf967..63ca11e74796 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -8,3 +8,4 @@ obj-$(CONFIG_GUNYAH) += gunyah.o
obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
obj-$(CONFIG_GUNYAH_IRQFD) += gunyah_irqfd.o
+obj-$(CONFIG_GUNYAH_IOEVENTFD) += gunyah_ioeventfd.o
diff --git a/drivers/virt/gunyah/gunyah_ioeventfd.c b/drivers/virt/gunyah/gunyah_ioeventfd.c
new file mode 100644
index 000000000000..5b1b9fd9ac3a
--- /dev/null
+++ b/drivers/virt/gunyah/gunyah_ioeventfd.c
@@ -0,0 +1,130 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/gunyah.h>
+#include <linux/gunyah_vm_mgr.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+
+#include <uapi/linux/gunyah.h>
+
+struct gh_ioeventfd {
+ struct gh_vm_function_instance *f;
+ struct gh_vm_io_handler io_handler;
+
+ struct eventfd_ctx *ctx;
+};
+
+static int gh_write_ioeventfd(struct gh_vm_io_handler *io_dev, u64 addr, u32 len, u64 data)
+{
+ struct gh_ioeventfd *iofd = container_of(io_dev, struct gh_ioeventfd, io_handler);
+
+ eventfd_signal(iofd->ctx, 1);
+ return 0;
+}
+
+static struct gh_vm_io_handler_ops io_ops = {
+ .write = gh_write_ioeventfd,
+};
+
+static long gh_ioeventfd_bind(struct gh_vm_function_instance *f)
+{
+ const struct gh_fn_ioeventfd_arg *args = f->argp;
+ struct gh_ioeventfd *iofd;
+ struct eventfd_ctx *ctx;
+ int ret;
+
+ if (f->arg_size != sizeof(*args))
+ return -EINVAL;
+
+ /* All other flag bits are reserved for future use */
+ if (args->flags & ~GH_IOEVENTFD_FLAGS_DATAMATCH)
+ return -EINVAL;
+
+ /* must be natural-word sized, or 0 to ignore length */
+ switch (args->len) {
+ case 0:
+ case 1:
+ case 2:
+ case 4:
+ case 8:
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* check for range overflow */
+ if (args->addr + args->len < args->addr)
+ return -EINVAL;
+
+ /* ioeventfd with no length can't be combined with DATAMATCH */
+ if (!args->len && (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH))
+ return -EINVAL;
+
+ ctx = eventfd_ctx_fdget(args->fd);
+ if (IS_ERR(ctx))
+ return PTR_ERR(ctx);
+
+ iofd = kzalloc(sizeof(*iofd), GFP_KERNEL);
+ if (!iofd) {
+ ret = -ENOMEM;
+ goto err_eventfd;
+ }
+
+ f->data = iofd;
+ iofd->f = f;
+
+ iofd->ctx = ctx;
+
+ if (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH) {
+ iofd->io_handler.datamatch = true;
+ iofd->io_handler.len = args->len;
+ iofd->io_handler.data = args->datamatch;
+ }
+ iofd->io_handler.addr = args->addr;
+ iofd->io_handler.ops = &io_ops;
+
+ ret = gh_vm_add_io_handler(f->ghvm, &iofd->io_handler);
+ if (ret)
+ goto err_io_dev_add;
+
+ return 0;
+
+err_io_dev_add:
+ kfree(iofd);
+err_eventfd:
+ eventfd_ctx_put(ctx);
+ return ret;
+}
+
+static void gh_ioevent_unbind(struct gh_vm_function_instance *f)
+{
+ struct gh_ioeventfd *iofd = f->data;
+
+ gh_vm_remove_io_handler(iofd->f->ghvm, &iofd->io_handler);
+ eventfd_ctx_put(iofd->ctx);
+ kfree(iofd);
+}
+
+static bool gh_ioevent_compare(const struct gh_vm_function_instance *f,
+ const void *arg, size_t size)
+{
+ const struct gh_fn_ioeventfd_arg *instance = f->argp,
+ *other = arg;
+
+ if (sizeof(*other) != size)
+ return false;
+
+ return instance->addr == other->addr;
+}
+
+DECLARE_GH_VM_FUNCTION_INIT(ioeventfd, GH_FN_IOEVENTFD, 3,
+ gh_ioeventfd_bind, gh_ioevent_unbind,
+ gh_ioevent_compare);
+MODULE_DESCRIPTION("Gunyah ioeventfd VM Function");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
index 0c480c622686..fa1cae7419d2 100644
--- a/include/uapi/linux/gunyah.h
+++ b/include/uapi/linux/gunyah.h
@@ -79,10 +79,13 @@ struct gh_vm_dtb_config {
* Return: file descriptor to manipulate the vcpu.
* @GH_FN_IRQFD: register eventfd to assert a Gunyah doorbell
* &struct gh_fn_desc.arg is a pointer to &struct gh_fn_irqfd_arg
+ * @GH_FN_IOEVENTFD: register ioeventfd to trigger when VM faults on parameter
+ * &struct gh_fn_desc.arg is a pointer to &struct gh_fn_ioeventfd_arg
*/
enum gh_fn_type {
GH_FN_VCPU = 1,
GH_FN_IRQFD,
+ GH_FN_IOEVENTFD,
};
#define GH_FN_MAX_ARG_SIZE 256
@@ -134,6 +137,40 @@ struct gh_fn_irqfd_arg {
__u32 padding;
};
+/**
+ * enum gh_ioeventfd_flags - flags for use in gh_fn_ioeventfd_arg
+ * @GH_IOEVENTFD_FLAGS_DATAMATCH: the event will be signaled only if the
+ * written value to the registered address is
+ * equal to &struct gh_fn_ioeventfd_arg.datamatch
+ */
+enum gh_ioeventfd_flags {
+ GH_IOEVENTFD_FLAGS_DATAMATCH = 1UL << 0,
+};
+
+/**
+ * struct gh_fn_ioeventfd_arg - Arguments to create an ioeventfd function
+ * @datamatch: data used when GH_IOEVENTFD_FLAGS_DATAMATCH is set
+ * @addr: Address in guest memory
+ * @len: Length of access
+ * @fd: When ioeventfd is matched, this eventfd is written
+ * @flags: See &enum gh_ioeventfd_flags
+ * @padding: padding bytes
+ *
+ * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_IOEVENTFD.
+ *
+ * Attaches an ioeventfd to a legal mmio address within the guest. A guest write
+ * in the registered address will signal the provided event instead of triggering
+ * an exit on the GH_VCPU_RUN ioctl.
+ */
+struct gh_fn_ioeventfd_arg {
+ __u64 datamatch;
+ __u64 addr; /* legal mmio address */
+ __u32 len; /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
+ __s32 fd;
+ __u32 flags;
+ __u32 padding;
+};
+
/**
* struct gh_fn_desc - Arguments to create a VM function
* @type: Type of the function. See &enum gh_fn_type.
--
2.40.0
Document the ioctls and usage of the Gunyah VM Manager driver.
Signed-off-by: Elliot Berman <[email protected]>
---
Documentation/virt/gunyah/index.rst | 1 +
Documentation/virt/gunyah/vm-manager.rst | 82 ++++++++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100644 Documentation/virt/gunyah/vm-manager.rst
diff --git a/Documentation/virt/gunyah/index.rst b/Documentation/virt/gunyah/index.rst
index 74aa345e0a14..7058249825b1 100644
--- a/Documentation/virt/gunyah/index.rst
+++ b/Documentation/virt/gunyah/index.rst
@@ -7,6 +7,7 @@ Gunyah Hypervisor
.. toctree::
:maxdepth: 1
+ vm-manager
message-queue
Gunyah is a Type-1 hypervisor which is independent of any OS kernel, and runs in
diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
new file mode 100644
index 000000000000..50d8ae7fabcd
--- /dev/null
+++ b/Documentation/virt/gunyah/vm-manager.rst
@@ -0,0 +1,82 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+Virtual Machine Manager
+=======================
+
+The Gunyah Virtual Machine Manager is a Linux driver to support launching
+virtual machines using Gunyah. It presently supports launching non-proxy
+scheduled Linux-like virtual machines.
+
+Except for some basic information about the location of initial binaries,
+most of the configuration about a Gunyah virtual machine is described in the
+VM's devicetree. The devicetree is generated by userspace. Interacting with the
+virtual machine is still done via the kernel and VM configuration requires some
+of the corresponding functionality to be set up in the kernel. For instance,
+sharing userspace memory with a VM is done via the `GH_VM_SET_USER_MEM_REGION`_
+ioctl. The VM itself is configured to use the memory region via the
+devicetree.
+
+Sample Userspace VMM
+====================
+
+A sample userspace VMM is included in samples/gunyah/ along with a minimal
+devicetree that can be used to launch a VM. To build this sample, enable
+CONFIG_SAMPLE_GUNYAH.
+
+IOCTLs and userspace VMM flows
+==============================
+
+The kernel exposes a char device interface at /dev/gunyah.
+
+To create a VM, use the `GH_CREATE_VM`_ ioctl. A successful call will return a
+"Gunyah VM" file descriptor.
+
+/dev/gunyah API Descriptions
+----------------------------
+
+GH_CREATE_VM
+~~~~~~~~~~~~
+
+Creates a Gunyah VM. The argument is reserved for future use and must be 0.
+
+Gunyah VM API Descriptions
+--------------------------
+
+GH_VM_SET_USER_MEM_REGION
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+This ioctl allows the user to create or delete a memory parcel for a guest
+virtual machine. Each memory region is uniquely identified by a label;
+attempting to create two regions with the same label is not allowed. Labels are
+unique per virtual machine.
+
+While the VMM is guest-agnostic and allows runtime addition of memory regions,
+Linux guest virtual machines do not support accepting memory regions at runtime.
+Thus, memory regions should be provided before starting the VM and the VM must
+be configured to accept these at boot-up.
+
+The guest physical address is used by the Linux kernel to check that the requested
+user regions do not overlap and to help find the corresponding memory region
+for calls like `GH_VM_SET_DTB_CONFIG`_. It must be page aligned.
+
+To add a memory region, call `GH_VM_SET_USER_MEM_REGION`_ with fields set as
+described above.
+
+.. kernel-doc:: include/uapi/linux/gunyah.h
+ :identifiers: gh_userspace_memory_region gh_mem_flags
+
+GH_VM_SET_DTB_CONFIG
+~~~~~~~~~~~~~~~~~~~~
+
+This ioctl sets the location of the VM's devicetree blob, which the Gunyah
+Resource Manager uses to allocate resources. The guest physical memory should be
+part of the primary memory parcel provided to the VM prior to GH_VM_START.
+
+.. kernel-doc:: include/uapi/linux/gunyah.h
+ :identifiers: gh_vm_dtb_config
+
+GH_VM_START
+~~~~~~~~~~~
+
+This ioctl starts the VM.
--
2.40.0
Hi Elliot,
On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
> When launching a virtual machine, Gunyah userspace allocates memory for
> the guest and informs Gunyah about these memory regions through
> SET_USER_MEMORY_REGION ioctl.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/vm_mgr.c | 59 +++++++-
> drivers/virt/gunyah/vm_mgr.h | 26 ++++
> drivers/virt/gunyah/vm_mgr_mm.c | 236 ++++++++++++++++++++++++++++++++
> include/uapi/linux/gunyah.h | 37 +++++
> 5 files changed, 356 insertions(+), 4 deletions(-)
> create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
[...]
> +int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region)
> +{
> + struct gh_vm_mem *mapping, *tmp_mapping;
> + struct page *curr_page, *prev_page;
> + struct gh_rm_mem_parcel *parcel;
> + int i, j, pinned, ret = 0;
> + unsigned int gup_flags;
> + size_t entry_size;
> + u16 vmid;
> +
> + if (!region->memory_size || !PAGE_ALIGNED(region->memory_size) ||
> + !PAGE_ALIGNED(region->userspace_addr) ||
> + !PAGE_ALIGNED(region->guest_phys_addr))
> + return -EINVAL;
> +
> + if (overflows_type(region->guest_phys_addr + region->memory_size, u64))
> + return -EOVERFLOW;
> +
> + ret = mutex_lock_interruptible(&ghvm->mm_lock);
> + if (ret)
> + return ret;
> +
> + mapping = __gh_vm_mem_find_by_label(ghvm, region->label);
> + if (mapping) {
> + ret = -EEXIST;
> + goto unlock;
> + }
> +
> + list_for_each_entry(tmp_mapping, &ghvm->memory_mappings, list) {
> + if (gh_vm_mem_overlap(tmp_mapping, region->guest_phys_addr,
> + region->memory_size)) {
> + ret = -EEXIST;
> + goto unlock;
> + }
> + }
> +
> + mapping = kzalloc(sizeof(*mapping), GFP_KERNEL_ACCOUNT);
> + if (!mapping) {
> + ret = -ENOMEM;
> + goto unlock;
> + }
> +
> + mapping->guest_phys_addr = region->guest_phys_addr;
> + mapping->npages = region->memory_size >> PAGE_SHIFT;
> + parcel = &mapping->parcel;
> + parcel->label = region->label;
> + parcel->mem_handle = GH_MEM_HANDLE_INVAL; /* to be filled later by mem_share/mem_lend */
> + parcel->mem_type = GH_RM_MEM_TYPE_NORMAL;
> +
> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
> + if (ret)
> + goto free_mapping;
> +
> + mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
> + if (!mapping->pages) {
> + ret = -ENOMEM;
> + mapping->npages = 0; /* update npages for reclaim */
> + goto unlock_pages;
> + }
> +
> + gup_flags = FOLL_LONGTERM;
> + if (region->flags & GH_MEM_ALLOW_WRITE)
> + gup_flags |= FOLL_WRITE;
> +
> + pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
> + gup_flags, mapping->pages);
> + if (pinned < 0) {
> + ret = pinned;
> + goto free_pages;
> + } else if (pinned != mapping->npages) {
> + ret = -EFAULT;
> + mapping->npages = pinned; /* update npages for reclaim */
> + goto unpin_pages;
> + }
Sorry if I missed it, but I still don't see where you reject file mappings
here.
This is also the wrong interface for upstream. Please get involved with
the fd-based guest memory discussions [1] and port your series to that.
This patch cannot be merged in its current form.
Will
[1] https://lore.kernel.org/kvm/[email protected]/
On 5/19/2023 4:59 AM, Will Deacon wrote:
> Hi Elliot,
>
> On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
>> When launching a virtual machine, Gunyah userspace allocates memory for
>> the guest and informs Gunyah about these memory regions through
>> SET_USER_MEMORY_REGION ioctl.
>>
>> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Elliot Berman <[email protected]>
>> ---
>> drivers/virt/gunyah/Makefile | 2 +-
>> drivers/virt/gunyah/vm_mgr.c | 59 +++++++-
>> drivers/virt/gunyah/vm_mgr.h | 26 ++++
>> drivers/virt/gunyah/vm_mgr_mm.c | 236 ++++++++++++++++++++++++++++++++
>> include/uapi/linux/gunyah.h | 37 +++++
>> 5 files changed, 356 insertions(+), 4 deletions(-)
>> create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
>
> [...]
>
>> +int gh_vm_mem_alloc(struct gh_vm *ghvm, struct gh_userspace_memory_region *region)
>> +{
>> + struct gh_vm_mem *mapping, *tmp_mapping;
>> + struct page *curr_page, *prev_page;
>> + struct gh_rm_mem_parcel *parcel;
>> + int i, j, pinned, ret = 0;
>> + unsigned int gup_flags;
>> + size_t entry_size;
>> + u16 vmid;
>> +
>> + if (!region->memory_size || !PAGE_ALIGNED(region->memory_size) ||
>> + !PAGE_ALIGNED(region->userspace_addr) ||
>> + !PAGE_ALIGNED(region->guest_phys_addr))
>> + return -EINVAL;
>> +
>> + if (overflows_type(region->guest_phys_addr + region->memory_size, u64))
>> + return -EOVERFLOW;
>> +
>> + ret = mutex_lock_interruptible(&ghvm->mm_lock);
>> + if (ret)
>> + return ret;
>> +
>> + mapping = __gh_vm_mem_find_by_label(ghvm, region->label);
>> + if (mapping) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> +
>> + list_for_each_entry(tmp_mapping, &ghvm->memory_mappings, list) {
>> + if (gh_vm_mem_overlap(tmp_mapping, region->guest_phys_addr,
>> + region->memory_size)) {
>> + ret = -EEXIST;
>> + goto unlock;
>> + }
>> + }
>> +
>> + mapping = kzalloc(sizeof(*mapping), GFP_KERNEL_ACCOUNT);
>> + if (!mapping) {
>> + ret = -ENOMEM;
>> + goto unlock;
>> + }
>> +
>> + mapping->guest_phys_addr = region->guest_phys_addr;
>> + mapping->npages = region->memory_size >> PAGE_SHIFT;
>> + parcel = &mapping->parcel;
>> + parcel->label = region->label;
>> + parcel->mem_handle = GH_MEM_HANDLE_INVAL; /* to be filled later by mem_share/mem_lend */
>> + parcel->mem_type = GH_RM_MEM_TYPE_NORMAL;
>> +
>> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
>> + if (ret)
>> + goto free_mapping;
>> +
>> + mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
>> + if (!mapping->pages) {
>> + ret = -ENOMEM;
>> + mapping->npages = 0; /* update npages for reclaim */
>> + goto unlock_pages;
>> + }
>> +
>> + gup_flags = FOLL_LONGTERM;
>> + if (region->flags & GH_MEM_ALLOW_WRITE)
>> + gup_flags |= FOLL_WRITE;
>> +
>> + pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
>> + gup_flags, mapping->pages);
>> + if (pinned < 0) {
>> + ret = pinned;
>> + goto free_pages;
>> + } else if (pinned != mapping->npages) {
>> + ret = -EFAULT;
>> + mapping->npages = pinned; /* update npages for reclaim */
>> + goto unpin_pages;
>> + }
>
> Sorry if I missed it, but I still don't see where you reject file mappings
> here.
>
Sure, I can reject file mappings. I didn't catch that was the ask
previously and thought it was only a comment about behavior of file
mappings.
> This is also the wrong interface for upstream. Please get involved with
> the fd-based guest memory discussions [1] and port your series to that.
>
The user interface design for *shared* memory aligns with
KVM_SET_USER_MEMORY_REGION.
I understood we want to use restricted memfd for giving guest-private
memory (Gunyah calls this "lending memory"). When I went through the
changes, I gathered KVM is using restricted memfd only for guest-private
memory and not for shared memory. Thus, I dropped support for lending
memory to the guest VM and only retained the shared memory support in
this series. I'd like to merge what we can today and introduce the
guest-private memory support in tandem with the restricted memfd; I
don't see much reason to delay the series.
I briefly evaluated and picked the arm64/pKVM support that Fuad shared
[2] and found it should be fine for Gunyah. I only build-tested it at the
time. I don't have any comments on the base restricted_memfd support and
Fuad has not posted [2] on mailing lists yet as far as I can tell.
> This patch cannot be merged in its current form.
>
I am a little confused about why the implementation to share memory with
the VM is being rejected. Besides rejecting file mappings, are any other
changes needed for it to be accepted?
- Elliot
> Will
>
> [1] https://lore.kernel.org/kvm/[email protected]/
[2]: https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/fdmem-v10-core
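For the range validation in the quoted gh_vm_mem_alloc(), note that unsigned addition in C wraps silently, so a check applied to an already-computed sum cannot observe the wrap. A minimal userspace sketch of a wrap-aware end-address computation follows. It is an illustration only, not code from the series: model_region_end() is a hypothetical helper, and it relies on the GCC/Clang builtin __builtin_add_overflow(), the primitive that the kernel's check_add_overflow() wraps.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): compute the exclusive end
 * of a guest-physical range. Returns true and stores the end address when
 * guest_phys_addr + memory_size fits in 64 bits, false when it would wrap.
 */
static bool model_region_end(uint64_t guest_phys_addr, uint64_t memory_size,
			     uint64_t *end)
{
	/* The builtin returns true on overflow, so invert it. */
	return !__builtin_add_overflow(guest_phys_addr, memory_size, end);
}
```

In a kernel context the same shape would presumably use check_add_overflow() and return -EOVERFLOW on failure.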
On 09/05/2023 23:47, Elliot Berman wrote:
> Gunyah message queues are a unidirectional inter-VM pipe for messages up
> to 1024 bytes. This driver supports pairing a receiver message queue and
> a transmitter message queue to expose a single mailbox channel.
>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
> Documentation/virt/gunyah/message-queue.rst | 8 +
> drivers/mailbox/Makefile | 2 +
> drivers/mailbox/gunyah-msgq.c | 212 ++++++++++++++++++++
> include/linux/gunyah.h | 57 ++++++
> 4 files changed, 279 insertions(+)
> create mode 100644 drivers/mailbox/gunyah-msgq.c
>
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> index b352918ae54b..70d82a4ef32d 100644
> --- a/Documentation/virt/gunyah/message-queue.rst
> +++ b/Documentation/virt/gunyah/message-queue.rst
Note, this file is not a part of the patchset. Trying to apply the
patchset results in rejects. Probably you missed the documentation patch
when sending this series.
> @@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
> | | | | | |
> | | | | | |
> +---------------+ +-----------------+ +---------------+
> +
> +Gunyah message queues are exposed as mailboxes. To create the mailbox, create
> +a mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
> +all messages in the RX message queue are read and pushed via the `rx_callback`
> +of the registered mbox_client.
> +
> +.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
> + :identifiers: gh_msgq_init--
With best wishes
Dmitry
On 09/05/2023 23:47, Elliot Berman wrote:
> Gunyah is a Type-1 hypervisor independent of any
> high-level OS kernel, and runs in a higher CPU privilege level. It does
> not depend on any lower-privileged OS kernel/code for its core
> functionality. This increases its security and can support a much smaller
> trusted computing base than a Type-2 hypervisor.
>
> Gunyah is an open source hypervisor. The source repo is available at
> https://github.com/quic/gunyah-hypervisor.
>
> The diagram below shows the architecture.
>
> ::
>
> VM A VM B
> +-----+ +-----+ | +-----+ +-----+ +-----+
> | | | | | | | | | | |
> EL0 | APP | | APP | | | APP | | APP | | APP |
> | | | | | | | | | | |
> +-----+ +-----+ | +-----+ +-----+ +-----+
> ---------------------|-------------------------
> +--------------+ | +----------------------+
> | | | | |
> EL1 | Linux Kernel | | |Linux kernel/Other OS | ...
> | | | | |
> +--------------+ | +----------------------+
> --------hvc/smc------|------hvc/smc------------
> +----------------------------------------+
> | |
> EL2 | Gunyah Hypervisor |
> | |
> +----------------------------------------+
>
> Gunyah provides the following features.
>
> - Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
> physical CPUs and enables time-sharing of the CPUs.
> - Memory Management: Gunyah tracks memory ownership and use of all memory
> under its control. Memory partitioning between VMs is a fundamental
> security feature.
> - Interrupt Virtualization: All interrupts are handled in the hypervisor
> and routed to the assigned VM.
> - Inter-VM Communication: There are several different mechanisms provided
> for communicating between VMs.
> - Device Virtualization: Para-virtualization of devices is supported using
> inter-VM communication. Low level system features and devices such as
> interrupt controllers are supported with emulation where required.
>
> This series adds the basic framework for detecting that Linux is running
> under Gunyah as a virtual machine, communication with the Gunyah Resource
> Manager, and a sample virtual machine manager capable of launching virtual machines.
>
> The series relies on two other patches posted separately:
> - https://lore.kernel.org/all/[email protected]/
> - https://lore.kernel.org/all/[email protected]/
The second link returns "message ID not found" page.
>
> Changes in v13:
> - Tweaks to message queue driver to address race condition between IRQ and mailbox registration
> - Allow removal of VM functions by function-specific comparison -- specifically to allow
> removing irqfd by label only and not requiring original FD to be provided.
>
> Changes in v12: https://lore.kernel.org/all/[email protected]/
> - Stylistic/cosmetic tweaks suggested by Alex
> - Remove patch "virt: gunyah: Identify hypervisor version" and squash the
> check that we're running under a reasonable Gunyah hypervisor into RM driver
> - Refactor platform hooks into a separate module per suggestion from Srini
> - GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
> - enum-ify related constants
>
> Changes in v11: https://lore.kernel.org/all/[email protected]/
> - Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow checks for this
> - More docstrings throughout
> - Make resp_buf and resp_buf_size optional
> - Replace deprecated idr with xarray
> - Refcounting on misc device instead of RM's platform device
> - Renaming variables, structs, etc. from gunyah_ -> gh_
> - Drop removal of user mem regions
> - Drop mem_lend functionality; to converge with restricted_memfd later
>
> Changes in v10: https://lore.kernel.org/all/[email protected]/
> - Fix bisectability (end result of series is same, --fixups applied to wrong commits)
> - Convert GH_ERROR_* and GH_RM_ERROR_* to enums
> - Correct race condition between allocating/freeing user memory
> - Replace offsetof with struct_size
> - Series-wide renaming of functions to be more consistent
> - VM shutdown & restart support added in vCPU and VM Manager patches
> - Convert VM function name (string) to type (number)
> - Convert VM function argument to value (which could be a pointer) to remove memory wastage for arguments
> - Remove defensive checks of hypervisor correctness
> - Clean ups to ioeventfd as suggested by Srivatsa
>
> Changes in v9: https://lore.kernel.org/all/[email protected]/
> - Refactor Gunyah API flags to be exposed as feature flags at kernel level
> - Move mbox client cleanup into gunyah_msgq_remove()
> - Simplify gh_rm_call return value and response payload
> - Missing clean-up/error handling/little endian fixes as suggested by Srivatsa and Alex in v8 series
>
> Changes in v8: https://lore.kernel.org/all/[email protected]/
> - Treat VM manager as a library of RM
> - Add patches 21-28 as RFC to support proxy-scheduled vCPUs and necessary bits to support virtio
> from Gunyah userspace
>
> Changes in v7: https://lore.kernel.org/all/[email protected]/
> - Refactor to remove gunyah RM bus
> - Refactor to allow multiple RM device instances
> - Bump UAPI to start at 0x0
> - Refactor QCOM SCM's platform hooks to allow CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations
>
> Changes in v6: https://lore.kernel.org/all/[email protected]/
> - *Replace gunyah-console with gunyah VM Manager*
> - Move include/asm-generic/gunyah.h into include/linux/gunyah.h
> - s/gunyah_msgq/gh_msgq/
> - Minor tweaks and documentation tidying based on comments from Jiri, Greg, Arnd, Dmitry, and Bagas.
>
> Changes in v5: https://lore.kernel.org/all/[email protected]/
> - Dropped sysfs nodes
> - Switch from aux bus to Gunyah RM bus for the subdevices
> - Cleaning up RM console
>
> Changes in v4: https://lore.kernel.org/all/[email protected]/
> - Tidied up documentation throughout based on questions/feedback received
> - Switched message queue implementation to use mailboxes
> - Renamed "gunyah_device" as "gunyah_resource"
>
> Changes in v3: https://lore.kernel.org/all/[email protected]/
> - /Maintained/Supported/ in MAINTAINERS
> - Tidied up documentation throughout based on questions/feedback received
> - Moved hypercalls into arch/arm64/gunyah/; following hyper-v's implementation
> - Drop opaque typedefs
> - Move sysfs nodes under /sys/hypervisor/gunyah/
> - Moved Gunyah console driver to drivers/tty/
> - Reworked gh_device design to drop the Gunyah bus.
>
> Changes in v2: https://lore.kernel.org/all/[email protected]/
> - DT bindings clean up
> - Switch hypercalls to follow SMCCC
>
> v1: https://lore.kernel.org/all/[email protected]/
>
> Elliot Berman (24):
> dt-bindings: Add binding for gunyah hypervisor
> gunyah: Common types and error codes for Gunyah hypercalls
> virt: gunyah: Add hypercalls to identify Gunyah
> virt: gunyah: msgq: Add hypercalls to send and receive messages
> mailbox: Add Gunyah message queue mailbox
> gunyah: rsc_mgr: Add resource manager RPC core
> gunyah: rsc_mgr: Add VM lifecycle RPC
> gunyah: vm_mgr: Introduce basic VM Manager
> gunyah: rsc_mgr: Add RPC for sharing memory
> gunyah: vm_mgr: Add/remove user memory regions
> gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
> samples: Add sample userspace Gunyah VM Manager
> gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
> virt: gunyah: Add Qualcomm Gunyah platform ops
> docs: gunyah: Document Gunyah VM Manager
> virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
> gunyah: vm_mgr: Add framework for VM Functions
> virt: gunyah: Add resource tickets
> virt: gunyah: Add IO handlers
> virt: gunyah: Add proxy-scheduled vCPUs
> virt: gunyah: Add hypercalls for sending doorbell
> virt: gunyah: Add irqfd interface
> virt: gunyah: Add ioeventfd
> MAINTAINERS: Add Gunyah hypervisor drivers section
>
> .../bindings/firmware/gunyah-hypervisor.yaml | 82 ++
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> Documentation/virt/gunyah/index.rst | 1 +
> Documentation/virt/gunyah/message-queue.rst | 8 +
> Documentation/virt/gunyah/vm-manager.rst | 142 +++
> MAINTAINERS | 13 +
> arch/arm64/Kbuild | 1 +
> arch/arm64/gunyah/Makefile | 3 +
> arch/arm64/gunyah/gunyah_hypercall.c | 140 +++
> arch/arm64/include/asm/gunyah.h | 24 +
> drivers/mailbox/Makefile | 2 +
> drivers/mailbox/gunyah-msgq.c | 212 ++++
> drivers/virt/Kconfig | 2 +
> drivers/virt/Makefile | 1 +
> drivers/virt/gunyah/Kconfig | 59 ++
> drivers/virt/gunyah/Makefile | 11 +
> drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++
> drivers/virt/gunyah/gunyah_irqfd.c | 180 ++++
> drivers/virt/gunyah/gunyah_platform_hooks.c | 80 ++
> drivers/virt/gunyah/gunyah_qcom.c | 147 +++
> drivers/virt/gunyah/gunyah_vcpu.c | 468 +++++++++
> drivers/virt/gunyah/rsc_mgr.c | 910 ++++++++++++++++++
> drivers/virt/gunyah/rsc_mgr.h | 19 +
> drivers/virt/gunyah/rsc_mgr_rpc.c | 500 ++++++++++
> drivers/virt/gunyah/vm_mgr.c | 794 +++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 70 ++
> drivers/virt/gunyah/vm_mgr_mm.c | 256 +++++
> include/linux/gunyah.h | 207 ++++
> include/linux/gunyah_rsc_mgr.h | 162 ++++
> include/linux/gunyah_vm_mgr.h | 126 +++
> include/uapi/linux/gunyah.h | 293 ++++++
> samples/Kconfig | 10 +
> samples/Makefile | 1 +
> samples/gunyah/.gitignore | 2 +
> samples/gunyah/Makefile | 6 +
> samples/gunyah/gunyah_vmm.c | 270 ++++++
> samples/gunyah/sample_vm.dts | 68 ++
> 37 files changed, 5401 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
> create mode 100644 Documentation/virt/gunyah/vm-manager.rst
> create mode 100644 arch/arm64/gunyah/Makefile
> create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
> create mode 100644 arch/arm64/include/asm/gunyah.h
> create mode 100644 drivers/mailbox/gunyah-msgq.c
> create mode 100644 drivers/virt/gunyah/Kconfig
> create mode 100644 drivers/virt/gunyah/Makefile
> create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
> create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
> create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
> create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
> create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.h
> create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.h
> create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
> create mode 100644 include/linux/gunyah.h
> create mode 100644 include/linux/gunyah_rsc_mgr.h
> create mode 100644 include/linux/gunyah_vm_mgr.h
> create mode 100644 include/uapi/linux/gunyah.h
> create mode 100644 samples/gunyah/.gitignore
> create mode 100644 samples/gunyah/Makefile
> create mode 100644 samples/gunyah/gunyah_vmm.c
> create mode 100644 samples/gunyah/sample_vm.dts
>
>
> base-commit: c8c655c34e33544aec9d64b660872ab33c29b5f1
> prerequisite-patch-id: b48c45acdec06adf37e09fe35e6a9412c5784800
> prerequisite-patch-id: bc27499c7652385c584424529edbc5781c074d68
--
With best wishes
Dmitry
Elliot Berman <[email protected]> writes:
> Gunyah is a Type-1 hypervisor independent of any
> high-level OS kernel, and runs in a higher CPU privilege level. It does
> not depend on any lower-privileged OS kernel/code for its core
> functionality. This increases its security and can support a much smaller
> trusted computing base than a Type-2 hypervisor.
>
<snip>
>
> The series relies on two other patches posted separately:
> - https://lore.kernel.org/all/[email protected]/
> - https://lore.kernel.org/all/[email protected]/
I couldn't find this one, but is this what it should have been:
b4 am -S -t [email protected]
Grabbing thread from lore.kernel.org/all/20230213232537.2040976-1-quic_eberman%40quicinc.com/t.mbox.gz
Analyzing 9 messages in the thread
Checking attestation on all messages, may take a moment...
---
✓ [PATCH 1/3] mailbox: Allow direct registration to a channel
+ Tested-by: Sudeep Holla <[email protected]>
✓ [PATCH 2/3] mailbox: omap: Use mbox_bind_client
+ Tested-by: Sudeep Holla <[email protected]>
✓ [PATCH 3/3] mailbox: pcc: Use mbox_bind_client
+ Tested-by: Sudeep Holla <[email protected]>
---
✓ Signed: DKIM/quicinc.com
---
Total patches: 3
---
Cover: ./20230213_quic_eberman_mailbox_allow_direct_registration_to_a_channel.cover
Link: https://lore.kernel.org/r/[email protected]
Base: base-commit 09e41676e35ab06e4bce8870ea3bf1f191c3cb90 not known, ignoring
Base: applies clean to current tree
git checkout -b 20230213_quic_eberman_quicinc_com HEAD
git am ./20230213_quic_eberman_mailbox_allow_direct_registration_to_a_channel.mbx
18:10:45 alex@zen:linux.git on review/gunyah-v12 [$?]
➜ git am 20230213_quic_eberman_mailbox_allow_direct_registration_to_a_channel.mbx
Applying: mailbox: Allow direct registration to a channel
Applying: mailbox: omap: Use mbox_bind_client
Applying: mailbox: pcc: Use mbox_bind_client
<snip>
>
> Elliot Berman (24):
<snip>
> mailbox: Add Gunyah message queue mailbox
This patch touches a file that isn't in mainline which makes me wonder
if I've missed another pre-requisite patch?
<snip>
> Documentation/virt/gunyah/message-queue.rst | 8 +
<snip>
--
Alex Bennée
Virtualisation Tech Lead @ Linaro
Hi Elliot,
[+Quentin since he's looked at the MMU notifiers]
Sorry for the slow response, I got buried in email during a week away.
On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
> On 5/19/2023 4:59 AM, Will Deacon wrote:
> > On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
> > > + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
> > > + if (ret)
> > > + goto free_mapping;
> > > +
> > > + mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
> > > + if (!mapping->pages) {
> > > + ret = -ENOMEM;
> > > + mapping->npages = 0; /* update npages for reclaim */
> > > + goto unlock_pages;
> > > + }
> > > +
> > > + gup_flags = FOLL_LONGTERM;
> > > + if (region->flags & GH_MEM_ALLOW_WRITE)
> > > + gup_flags |= FOLL_WRITE;
> > > +
> > > + pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
> > > + gup_flags, mapping->pages);
> > > + if (pinned < 0) {
> > > + ret = pinned;
> > > + goto free_pages;
> > > + } else if (pinned != mapping->npages) {
> > > + ret = -EFAULT;
> > > + mapping->npages = pinned; /* update npages for reclaim */
> > > + goto unpin_pages;
> > > + }
> >
> > Sorry if I missed it, but I still don't see where you reject file mappings
> > here.
> >
>
> Sure, I can reject file mappings. I didn't catch that was the ask previously
> and thought it was only a comment about behavior of file mappings.
I thought the mention of filesystem corruption was clear enough! It's
definitely something we shouldn't allow.
> > This is also the wrong interface for upstream. Please get involved with
> > the fd-based guest memory discussions [1] and port your series to that.
> >
>
> The user interface design for *shared* memory aligns with
> KVM_SET_USER_MEMORY_REGION.
I don't think it does. For example, file mappings don't work (as above),
you're placing additional rlimit requirements on the caller, read-only
memslots are not functional, the memory cannot be swapped or migrated,
dirty logging doesn't work etc. pKVM is in the same boat, but that's why
we're not upstreaming this part in its current form.
> I understood we want to use restricted memfd for giving guest-private memory
> (Gunyah calls this "lending memory"). When I went through the changes, I
> gathered KVM is using restricted memfd only for guest-private memory and not
> for shared memory. Thus, I dropped support for lending memory to the guest
> VM and only retained the shared memory support in this series. I'd like to
> merge what we can today and introduce the guest-private memory support in
> tandem with the restricted memfd; I don't see much reason to delay the
> series.
Right, protected guests will use the new restricted memfd ("guest mem"
now, I think?), but non-protected guests should implement the existing
interface *without* the need for the GUP pin on guest memory pages. Yes,
that means full support for MMU notifiers so that these pages can be
managed properly by the host kernel. We're working on that for pKVM, but
it requires a more flexible form of memory sharing over what we currently
have so that e.g. the zero page can be shared between multiple entities.
Will
On 5/9/23 3:47 PM, Elliot Berman wrote:
> When launching a virtual machine, Gunyah userspace allocates memory for
> the guest and informs Gunyah about these memory regions through
> SET_USER_MEMORY_REGION ioctl.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Two minor comments below. In any case:
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/vm_mgr.c | 59 +++++++-
> drivers/virt/gunyah/vm_mgr.h | 26 ++++
> drivers/virt/gunyah/vm_mgr_mm.c | 236 ++++++++++++++++++++++++++++++++
> include/uapi/linux/gunyah.h | 37 +++++
> 5 files changed, 356 insertions(+), 4 deletions(-)
> create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
>
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index e47e25895299..bacf78b8fa33 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o
> +gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
> index a43401cb34f7..297427952b8c 100644
> --- a/drivers/virt/gunyah/vm_mgr.c
> +++ b/drivers/virt/gunyah/vm_mgr.c
> @@ -15,6 +15,8 @@
>
> #include "vm_mgr.h"
>
> +static void gh_vm_free(struct work_struct *work);
> +
You could just define gh_vm_free() here rather than declaring
and defining it later.
> static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
> {
> struct gh_vm *ghvm;
. . .
> diff --git a/drivers/virt/gunyah/vm_mgr_mm.c b/drivers/virt/gunyah/vm_mgr_mm.c
> new file mode 100644
> index 000000000000..91109bbf36b3
> --- /dev/null
> +++ b/drivers/virt/gunyah/vm_mgr_mm.c
> @@ -0,0 +1,236 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#define pr_fmt(fmt) "gh_vm_mgr: " fmt
> +
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/mm.h>
> +
> +#include <uapi/linux/gunyah.h>
> +
> +#include "vm_mgr.h"
> +
> +static bool pages_are_mergeable(struct page *a, struct page *b)
> +{
> + if (page_to_pfn(a) + 1 != page_to_pfn(b))
> + return false;
> + if (!zone_device_pages_have_same_pgmap(a, b))
> + return false;
> + return true;
Maybe just:
return zone_device_pages_have_same_pgmap(a, b);
> +}
> +
> +static bool gh_vm_mem_overlap(struct gh_vm_mem *a, u64 addr, u64 size)
> +{
> + u64 a_end = a->guest_phys_addr + (a->npages << PAGE_SHIFT);
> + u64 end = addr + size;
> +
> + return a->guest_phys_addr < end && addr < a_end;
> +}
> +
. . .
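The overlap test in the quoted gh_vm_mem_overlap() treats both regions as half-open ranges, so two regions that merely touch are not considered overlapping. A minimal userspace sketch of the same predicate follows. The struct model_vm_mem type, the model_mem_overlap() name, and the 4 KiB page size are assumptions made for illustration; only the comparison itself is taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two struct gh_vm_mem fields the check uses. */
struct model_vm_mem {
	uint64_t guest_phys_addr;
	uint64_t npages;
};

/*
 * Half-open ranges [a_start, a_end) and [addr, end) overlap exactly when
 * each one starts before the other one ends.
 */
static bool model_mem_overlap(const struct model_vm_mem *a, uint64_t addr,
			      uint64_t size)
{
	uint64_t a_end = a->guest_phys_addr + (a->npages << MODEL_PAGE_SHIFT);
	uint64_t end = addr + size;

	return a->guest_phys_addr < end && addr < a_end;
}
```

With an existing two-page mapping at 0x80000000, a new region starting at 0x80002000 is accepted because the ranges are adjacent, while one starting at 0x80001000 is rejected.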
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add architecture-independent standard error codes, types, and macros for
> Gunyah hypercalls.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Looks OK to me.
Reviewed-by: Alex Elder <[email protected]>
> ---
> include/linux/gunyah.h | 83 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 83 insertions(+)
> create mode 100644 include/linux/gunyah.h
>
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> new file mode 100644
> index 000000000000..a4e8ec91961d
> --- /dev/null
> +++ b/include/linux/gunyah.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _LINUX_GUNYAH_H
> +#define _LINUX_GUNYAH_H
> +
> +#include <linux/errno.h>
> +#include <linux/limits.h>
> +
> +/******************************************************************************/
> +/* Common arch-independent definitions for Gunyah hypercalls */
> +#define GH_CAPID_INVAL U64_MAX
> +#define GH_VMID_ROOT_VM 0xff
> +
> +enum gh_error {
> + GH_ERROR_OK = 0,
> + GH_ERROR_UNIMPLEMENTED = -1,
> + GH_ERROR_RETRY = -2,
I know you explained it "should be OK" to use a negative
value (with unspecified bit width) here. I continue to
feel it's not well-enough specified for an external API,
but I'm going to try to just let it go.
> + GH_ERROR_ARG_INVAL = 1,
> + GH_ERROR_ARG_SIZE = 2,
> + GH_ERROR_ARG_ALIGN = 3,
> +
> + GH_ERROR_NOMEM = 10,
> +
> + GH_ERROR_ADDR_OVFL = 20,
> + GH_ERROR_ADDR_UNFL = 21,
> + GH_ERROR_ADDR_INVAL = 22,
> +
> + GH_ERROR_DENIED = 30,
> + GH_ERROR_BUSY = 31,
> + GH_ERROR_IDLE = 32,
> +
> + GH_ERROR_IRQ_BOUND = 40,
> + GH_ERROR_IRQ_UNBOUND = 41,
> +
> + GH_ERROR_CSPACE_CAP_NULL = 50,
> + GH_ERROR_CSPACE_CAP_REVOKED = 51,
> + GH_ERROR_CSPACE_WRONG_OBJ_TYPE = 52,
> + GH_ERROR_CSPACE_INSUF_RIGHTS = 53,
> + GH_ERROR_CSPACE_FULL = 54,
> +
> + GH_ERROR_MSGQUEUE_EMPTY = 60,
> + GH_ERROR_MSGQUEUE_FULL = 61,
> +};
> +
> +/**
> + * gh_error_remap() - Remap Gunyah hypervisor errors into a Linux error code
> + * @gh_error: Gunyah hypercall return value
> + */
> +static inline int gh_error_remap(enum gh_error gh_error)
> +{
> + switch (gh_error) {
> + case GH_ERROR_OK:
> + return 0;
> + case GH_ERROR_NOMEM:
> + return -ENOMEM;
> + case GH_ERROR_DENIED:
> + case GH_ERROR_CSPACE_CAP_NULL:
> + case GH_ERROR_CSPACE_CAP_REVOKED:
> + case GH_ERROR_CSPACE_WRONG_OBJ_TYPE:
> + case GH_ERROR_CSPACE_INSUF_RIGHTS:
> + case GH_ERROR_CSPACE_FULL:
> + return -EACCES;
> + case GH_ERROR_BUSY:
> + case GH_ERROR_IDLE:
> + return -EBUSY;
> + case GH_ERROR_IRQ_BOUND:
> + case GH_ERROR_IRQ_UNBOUND:
> + case GH_ERROR_MSGQUEUE_FULL:
> + case GH_ERROR_MSGQUEUE_EMPTY:
> + return -EIO;
> + case GH_ERROR_UNIMPLEMENTED:
> + case GH_ERROR_RETRY:
> + return -EOPNOTSUPP;
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +#endif
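To make the mapping in the quoted gh_error_remap() concrete, here is a trimmed userspace sketch that can be compiled outside the kernel. It copies only a subset of the enum values and switch cases from the header above; anything missing from the subset falls through to the default case, as the GH_ERROR_ARG_* codes do in the original.

```c
#include <errno.h>

/* Subset of enum gh_error; values copied from the quoted header. */
enum gh_error {
	GH_ERROR_OK		= 0,
	GH_ERROR_UNIMPLEMENTED	= -1,
	GH_ERROR_RETRY		= -2,
	GH_ERROR_NOMEM		= 10,
	GH_ERROR_DENIED		= 30,
	GH_ERROR_BUSY		= 31,
	GH_ERROR_MSGQUEUE_EMPTY	= 60,
	GH_ERROR_MSGQUEUE_FULL	= 61,
};

/* Trimmed copy of the remap: hypervisor status in, negative errno out. */
static int gh_error_remap(enum gh_error gh_error)
{
	switch (gh_error) {
	case GH_ERROR_OK:
		return 0;
	case GH_ERROR_NOMEM:
		return -ENOMEM;
	case GH_ERROR_DENIED:
		return -EACCES;
	case GH_ERROR_BUSY:
		return -EBUSY;
	case GH_ERROR_MSGQUEUE_FULL:
	case GH_ERROR_MSGQUEUE_EMPTY:
		return -EIO;
	case GH_ERROR_UNIMPLEMENTED:
	case GH_ERROR_RETRY:
		return -EOPNOTSUPP;
	default:
		/* Unlisted codes, e.g. the argument errors, end up here. */
		return -EINVAL;
	}
}
```

One consequence of the mapping is that several distinct hypervisor conditions collapse into a single errno, so a caller that needs to tell a full queue from an empty one has to inspect the Gunyah status before remapping it.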
On 5/9/23 3:47 PM, Elliot Berman wrote:
> On Qualcomm platforms, there is a firmware entity which controls access
> to physical pages. In order to share memory with another VM, this entity
> needs to be informed that the guest VM should have access to the memory.
You might be able to avoid the lock by using rcu_assign_pointer() and
rcu_dereference(), but I'm not recommending it (because I'm not sure).
I have one more suggestion below.
Reviewed-by: Alex Elder <[email protected]>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
> drivers/virt/gunyah/Kconfig | 4 ++
> drivers/virt/gunyah/Makefile | 2 +
> drivers/virt/gunyah/gunyah_platform_hooks.c | 80 +++++++++++++++++++++
> drivers/virt/gunyah/rsc_mgr.h | 3 +
> drivers/virt/gunyah/rsc_mgr_rpc.c | 18 ++++-
> include/linux/gunyah_rsc_mgr.h | 17 +++++
> 6 files changed, 122 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
>
> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
> index 1a737694c333..de815189dab6 100644
> --- a/drivers/virt/gunyah/Kconfig
> +++ b/drivers/virt/gunyah/Kconfig
> @@ -4,6 +4,7 @@ config GUNYAH
> tristate "Gunyah Virtualization drivers"
> depends on ARM64
> depends on MAILBOX
> + select GUNYAH_PLATFORM_HOOKS
> help
> The Gunyah drivers are the helper interfaces that run in a guest VM
> such as basic inter-VM IPC and signaling mechanisms, and higher level
> @@ -11,3 +12,6 @@ config GUNYAH
>
> Say Y/M here to enable the drivers needed to interact in a Gunyah
> virtual environment.
> +
> +config GUNYAH_PLATFORM_HOOKS
> + tristate
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index bacf78b8fa33..4fbeee521d60 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,6 @@
> # SPDX-License-Identifier: GPL-2.0
>
> +obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
> +
> gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/gunyah_platform_hooks.c b/drivers/virt/gunyah/gunyah_platform_hooks.c
> new file mode 100644
> index 000000000000..60da0e154e98
> --- /dev/null
> +++ b/drivers/virt/gunyah/gunyah_platform_hooks.c
> @@ -0,0 +1,80 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/rwsem.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +
> +#include "rsc_mgr.h"
> +
> +static struct gh_rm_platform_ops *rm_platform_ops;
> +static DECLARE_RWSEM(rm_platform_ops_lock);
> +
> +int gh_rm_platform_pre_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
> +{
> + int ret = 0;
> +
> + down_read(&rm_platform_ops_lock);
> + if (rm_platform_ops && rm_platform_ops->pre_mem_share)
> + ret = rm_platform_ops->pre_mem_share(rm, mem_parcel);
> + up_read(&rm_platform_ops_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_platform_pre_mem_share);
> +
> +int gh_rm_platform_post_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
> +{
> + int ret = 0;
> +
> + down_read(&rm_platform_ops_lock);
> + if (rm_platform_ops && rm_platform_ops->post_mem_reclaim)
> + ret = rm_platform_ops->post_mem_reclaim(rm, mem_parcel);
> + up_read(&rm_platform_ops_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_platform_post_mem_reclaim);
> +
> +int gh_rm_register_platform_ops(struct gh_rm_platform_ops *platform_ops)
Can (should) platform_ops be declared as const? (I think it can,
that would be better as long as you don't expect operation function
pointers to be added after registration.) If you do that, all such
arguments will probably need to be updated to pointer-to-const.
> +{
> + int ret = 0;
> +
> + down_write(&rm_platform_ops_lock);
> + if (!rm_platform_ops)
> + rm_platform_ops = platform_ops;
> + else
> + ret = -EEXIST;
> + up_write(&rm_platform_ops_lock);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_register_platform_ops);
. . .
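The two suggestions above (a pointer-to-const ops table, and possibly dropping the lock) can be sketched in userspace as follows. This is an analogy under stated assumptions, not the kernel implementation: struct model_platform_ops and every model_* and example_* name are hypothetical, the callbacks take a plain int instead of the rm and parcel pointers, and a C11 compare-and-swap stands in for the rwsem. It also does not cover unregistration, which would need a way to wait for in-flight readers (the role RCU would play in the kernel).

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; callbacks are optional. */
struct model_platform_ops {
	int (*pre_mem_share)(int parcel_id);
	int (*post_mem_reclaim)(int parcel_id);
};

/* Registered table; pointer-to-const so registrants can keep it read-only. */
static _Atomic(const struct model_platform_ops *) model_ops;

/* Install the ops table once; a second registration gets -EEXIST. */
int model_register_platform_ops(const struct model_platform_ops *ops)
{
	const struct model_platform_ops *expected = NULL;

	if (!atomic_compare_exchange_strong(&model_ops, &expected, ops))
		return -EEXIST;
	return 0;
}

/* Call the hook if one is registered; no hook means nothing to do. */
int model_pre_mem_share(int parcel_id)
{
	const struct model_platform_ops *ops = atomic_load(&model_ops);

	if (ops && ops->pre_mem_share)
		return ops->pre_mem_share(parcel_id);
	return 0;
}

/* Example registrant: rejects parcel 0, accepts everything else. */
static int example_pre_mem_share(int parcel_id)
{
	return parcel_id == 0 ? -EINVAL : 0;
}

static const struct model_platform_ops example_ops = {
	.pre_mem_share = example_pre_mem_share,
};
```

Declaring example_ops as static const lets the compiler place the table in read-only data, which is the main benefit of the pointer-to-const signature.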
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add Gunyah Resource Manager RPC to launch an unauthenticated VM.
>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good to me.
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/rsc_mgr_rpc.c | 259 ++++++++++++++++++++++++++++++
> include/linux/gunyah_rsc_mgr.h | 73 +++++++++
> 3 files changed, 333 insertions(+), 1 deletion(-)
> create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Introduce a framework for Gunyah userspace to install VM functions. VM
> functions are optional interfaces to the virtual machine. vCPUs,
> ioeventfs, and irqfds are examples of such VM functions and are
s/ioeventfs/ioeventfds/
Also, these aren't just examples of VM functions, they *are* the
VM functions implemented.
> implemented in subsequent patches.
>
> A generic framework is implemented instead of individual ioctls to
> create vCPUs, irqfds, etc., in order to simplify the VM manager core
> implementation and allow dynamic loading of VM function modules.
This also allows the set of VM functions to be extended without
updating the API (like it or not).
>
> Signed-off-by: Elliot Berman <[email protected]>
I have a few more comments, but this looks pretty good.
Reviewed-by: Alex Elder <[email protected]>
> ---
> Documentation/virt/gunyah/vm-manager.rst | 18 ++
> drivers/virt/gunyah/vm_mgr.c | 216 ++++++++++++++++++++++-
> drivers/virt/gunyah/vm_mgr.h | 4 +
> include/linux/gunyah_vm_mgr.h | 87 +++++++++
> include/uapi/linux/gunyah.h | 18 ++
> 5 files changed, 340 insertions(+), 3 deletions(-)
> create mode 100644 include/linux/gunyah_vm_mgr.h
>
> diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
> index 50d8ae7fabcd..3b51bab9d793 100644
> --- a/Documentation/virt/gunyah/vm-manager.rst
> +++ b/Documentation/virt/gunyah/vm-manager.rst
> @@ -17,6 +17,24 @@ sharing userspace memory with a VM is done via the `GH_VM_SET_USER_MEM_REGION`_
> ioctl. The VM itself is configured to use the memory region via the
> devicetree.
>
> +Gunyah Functions
> +================
> +
> +Components of a Gunyah VM's configuration that need kernel configuration are
> +called "functions" and are built on top of a framework. Functions are identified
> +by a string and have some argument(s) to configure them. They are typically
> +created by the `GH_VM_ADD_FUNCTION`_ ioctl.
Is a function *type* (e.g., VCPU or ioeventfd) identified by a string?
Or a function *instance* (e.g. four VCPUs)? Or both?
> +
> +Functions typically will always do at least one of these operations:
Typically, or always?
> +
> +1. Create resource ticket(s). Resource tickets allow a function to register
> + itself as the client for a Gunyah resource (e.g. doorbell or vCPU) and
> + the function is given the pointer to the &struct gh_resource when the
> + VM is starting.
> +
What I think this means is that tickets are used to allow functions
to be defined *before* the VM is actually started. So once it starts,
the functions get added. (I might have this slightly wrong, but in
any case I'm not sure the above sentence is very clear.)
> +2. Register IO handler(s). IO handlers allow a function to handle stage-2 faults
> + from the virtual machine.
> +
> Sample Userspace VMM
> ====================
>
> diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
> index a800061f56bf..56464451b262 100644
> --- a/drivers/virt/gunyah/vm_mgr.c
> +++ b/drivers/virt/gunyah/vm_mgr.c
> @@ -6,10 +6,13 @@
> #define pr_fmt(fmt) "gh_vm_mgr: " fmt
>
> #include <linux/anon_inodes.h>
> +#include <linux/compat.h>
> #include <linux/file.h>
> #include <linux/gunyah_rsc_mgr.h>
> +#include <linux/gunyah_vm_mgr.h>
> #include <linux/miscdevice.h>
> #include <linux/module.h>
> +#include <linux/xarray.h>
>
> #include <uapi/linux/gunyah.h>
>
> @@ -17,6 +20,172 @@
>
> static void gh_vm_free(struct work_struct *work);
>
> +static DEFINE_XARRAY(gh_vm_functions);
> +
> +static void gh_vm_put_function(struct gh_vm_function *fn)
> +{
> + module_put(fn->mod);
> +}
> +
> +static struct gh_vm_function *gh_vm_get_function(u32 type)
> +{
> + struct gh_vm_function *fn;
> + int r;
> +
> + fn = xa_load(&gh_vm_functions, type);
> + if (!fn) {
> + r = request_module("ghfunc:%d", type);
> + if (r)
> + return ERR_PTR(r > 0 ? -r : r);
Almost all callers of request_module() simply ignore the
return value. What positive values are you expecting to
see here (and are you sure they're positive errno values)?
> +
> + fn = xa_load(&gh_vm_functions, type);
> + }
> +
> + if (!fn || !try_module_get(fn->mod))
> + fn = ERR_PTR(-ENOENT);
> +
> + return fn;
> +}
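For reference, my understanding of the request_module() kernel-doc is that a positive return is the exit code of the modprobe helper rather than an errno, but please verify that. Under that assumption, one option is to collapse any positive value into a fixed error instead of negating it. The helper name below is hypothetical, and this is a userspace sketch rather than a proposed patch:

```c
#include <errno.h>

/*
 * Hypothetical helper: collapse a request_module()-style result into
 * 0 or a negative errno. A positive value is assumed to be a helper
 * exit status, so it is mapped to -ENOENT instead of being negated.
 */
int demo_normalize_modload_result(int r)
{
	if (r > 0)
		return -ENOENT;
	return r;
}
```

Simply ignoring the return value and relying on the second xa_load() would have the same effect for the caller, since a missing function already yields -ENOENT.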
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah is a Type-1 hypervisor independent of any
> high-level OS kernel, and runs in a higher CPU privilege level. It does
> not depend on any lower-privileged OS kernel/code for its core
> functionality. This increases its security and can support a much smaller
> trusted computing base than a Type-2 hypervisor.
>
> Gunyah is an open source hypervisor. The source repo is available at
> https://github.com/quic/gunyah-hypervisor.
Does the kernel code in this patch series function with the
hypervisor code at the link above? It looks like it has not
been updated in 2 years. If not, where can one find the open
source hypervisor code that this kernel driver *does* function
with? If it is not available now, *when* will it be published?
It's OK for the hypervisor to be closed source, but if that's
the case, the statement about it being open source should not
be made.
In addition, the Gunyah resource manager is a fundamental
component of Gunyah. Its code appears to be here:
https://github.com/quic/gunyah-resource-manager/
I haven't looked further on in the kernel documentation
yet but if there is a permanent place where the open source
hypervisor and resource manager code will reside, you should
link to both repositories (and anything else that might be
required) there.
Previously I reviewed the net result of all applied patches,
and did a pretty detailed review of the code. I'm comfortable
I've previously pointed out things I thought were significant,
so this time around I'm doing less detailed review, looking at
each individual patch. For the most part, it looks fine to me
(and in most cases I've provided a Reviewed-by tag).
I'll state once more that my review is oriented toward correct
code and good practices. For "virtualization issues" you should
rely on others (like Will) to provide informed feedback.
-Alex
> The diagram below shows the architecture.
>
> ::
>
> VM A VM B
> +-----+ +-----+ | +-----+ +-----+ +-----+
> | | | | | | | | | | |
> EL0 | APP | | APP | | | APP | | APP | | APP |
> | | | | | | | | | | |
> +-----+ +-----+ | +-----+ +-----+ +-----+
> ---------------------|-------------------------
> +--------------+ | +----------------------+
> | | | | |
> EL1 | Linux Kernel | | |Linux kernel/Other OS | ...
> | | | | |
> +--------------+ | +----------------------+
> --------hvc/smc------|------hvc/smc------------
> +----------------------------------------+
> | |
> EL2 | Gunyah Hypervisor |
> | |
> +----------------------------------------+
>
> Gunyah provides these following features.
>
> - Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
> physical CPUs and enables time-sharing of the CPUs.
> - Memory Management: Gunyah tracks memory ownership and use of all memory
> under its control. Memory partitioning between VMs is a fundamental
> security feature.
> - Interrupt Virtualization: All interrupts are handled in the hypervisor
> and routed to the assigned VM.
> - Inter-VM Communication: There are several different mechanisms provided
> for communicating between VMs.
> - Device Virtualization: Para-virtualization of devices is supported using
> inter-VM communication. Low level system features and devices such as
> interrupt controllers are supported with emulation where required.
>
> This series adds the basic framework for detecting that Linux is running
> under Gunyah as a virtual machine, communication with the Gunyah Resource
> Manager, and a sample virtual machine manager capable of launching virtual machines.
>
> The series relies on two other patches posted separately:
> - https://lore.kernel.org/all/[email protected]/
The above patch has been applied in the Qualcomm tree.
> - https://lore.kernel.org/all/[email protected]/
And this link doesn't lead to a patch. Has it too been
applied? (Otherwise, it should be corrected.)
-Alex
> Changes in v13:
> - Tweaks to message queue driver to address race condition between IRQ and mailbox registration
> - Allow removal of VM functions by function-specific comparison -- specifically to allow
> removing irqfd by label only and not requiring original FD to be provided.
>
> Changes in v12: https://lore.kernel.org/all/[email protected]/
> - Stylistic/cosmetic tweaks suggested by Alex
> - Remove patch "virt: gunyah: Identify hypervisor version" and squash the
> check that we're running under a reasonable Gunyah hypervisor into RM driver
> - Refactor platform hooks into a separate module per suggestion from Srini
> - GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
> - enum-ify related constants
>
> Changes in v11: https://lore.kernel.org/all/[email protected]/
> - Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow checks for this
> - More docstrings throughout
> - Make resp_buf and resp_buf_size optional
> - Replace deprecated idr with xarray
> - Refcounting on misc device instead of RM's platform device
> - Renaming variables, structs, etc. from gunyah_ -> gh_
> - Drop removal of user mem regions
> - Drop mem_lend functionality; to converge with restricted_memfd later
>
> Changes in v10: https://lore.kernel.org/all/[email protected]/
> - Fix bisectability (end result of series is same, --fixups applied to wrong commits)
> - Convert GH_ERROR_* and GH_RM_ERROR_* to enums
> - Correct race condition between allocating/freeing user memory
> - Replace offsetof with struct_size
> - Series-wide renaming of functions to be more consistent
> - VM shutdown & restart support added in vCPU and VM Manager patches
> - Convert VM function name (string) to type (number)
> - Convert VM function argument to value (which could be a pointer) to remove memory wastage for arguments
> - Remove defensive checks of hypervisor correctness
> - Clean ups to ioeventfd as suggested by Srivatsa
>
> Changes in v9: https://lore.kernel.org/all/[email protected]/
> - Refactor Gunyah API flags to be exposed as feature flags at kernel level
> - Move mbox client cleanup into gunyah_msgq_remove()
> - Simplify gh_rm_call return value and response payload
> - Missing clean-up/error handling/little endian fixes as suggested by Srivatsa and Alex in v8 series
>
> Changes in v8: https://lore.kernel.org/all/[email protected]/
> - Treat VM manager as a library of RM
> - Add patches 21-28 as RFC to support proxy-scheduled vCPUs and necessary bits to support virtio
> from Gunyah userspace
>
> Changes in v7: https://lore.kernel.org/all/[email protected]/
> - Refactor to remove gunyah RM bus
> - Refactor to allow multiple RM device instances
> - Bump UAPI to start at 0x0
> - Refactor QCOM SCM's platform hooks to allow CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations
>
> Changes in v6: https://lore.kernel.org/all/[email protected]/
> - *Replace gunyah-console with gunyah VM Manager*
> - Move include/asm-generic/gunyah.h into include/linux/gunyah.h
> - s/gunyah_msgq/gh_msgq/
> - Minor tweaks and documentation tidying based on comments from Jiri, Greg, Arnd, Dmitry, and Bagas.
>
> Changes in v5: https://lore.kernel.org/all/[email protected]/
> - Dropped sysfs nodes
> - Switch from aux bus to Gunyah RM bus for the subdevices
> - Cleaning up RM console
>
> Changes in v4: https://lore.kernel.org/all/[email protected]/
> - Tidied up documentation throughout based on questions/feedback received
> - Switched message queue implementation to use mailboxes
> - Renamed "gunyah_device" as "gunyah_resource"
>
> Changes in v3: https://lore.kernel.org/all/[email protected]/
> - /Maintained/Supported/ in MAINTAINERS
> - Tidied up documentation throughout based on questions/feedback received
> - Moved hypercalls into arch/arm64/gunyah/; following hyper-v's implementation
> - Drop opaque typedefs
> - Move sysfs nodes under /sys/hypervisor/gunyah/
> - Moved Gunyah console driver to drivers/tty/
> - Reworked gh_device design to drop the Gunyah bus.
>
> Changes in v2: https://lore.kernel.org/all/[email protected]/
> - DT bindings clean up
> - Switch hypercalls to follow SMCCC
>
> v1: https://lore.kernel.org/all/[email protected]/
>
> Elliot Berman (24):
> dt-bindings: Add binding for gunyah hypervisor
> gunyah: Common types and error codes for Gunyah hypercalls
> virt: gunyah: Add hypercalls to identify Gunyah
> virt: gunyah: msgq: Add hypercalls to send and receive messages
> mailbox: Add Gunyah message queue mailbox
> gunyah: rsc_mgr: Add resource manager RPC core
> gunyah: rsc_mgr: Add VM lifecycle RPC
> gunyah: vm_mgr: Introduce basic VM Manager
> gunyah: rsc_mgr: Add RPC for sharing memory
> gunyah: vm_mgr: Add/remove user memory regions
> gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
> samples: Add sample userspace Gunyah VM Manager
> gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
> virt: gunyah: Add Qualcomm Gunyah platform ops
> docs: gunyah: Document Gunyah VM Manager
> virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
> gunyah: vm_mgr: Add framework for VM Functions
> virt: gunyah: Add resource tickets
> virt: gunyah: Add IO handlers
> virt: gunyah: Add proxy-scheduled vCPUs
> virt: gunyah: Add hypercalls for sending doorbell
> virt: gunyah: Add irqfd interface
> virt: gunyah: Add ioeventfd
> MAINTAINERS: Add Gunyah hypervisor drivers section
>
> .../bindings/firmware/gunyah-hypervisor.yaml | 82 ++
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> Documentation/virt/gunyah/index.rst | 1 +
> Documentation/virt/gunyah/message-queue.rst | 8 +
> Documentation/virt/gunyah/vm-manager.rst | 142 +++
> MAINTAINERS | 13 +
> arch/arm64/Kbuild | 1 +
> arch/arm64/gunyah/Makefile | 3 +
> arch/arm64/gunyah/gunyah_hypercall.c | 140 +++
> arch/arm64/include/asm/gunyah.h | 24 +
> drivers/mailbox/Makefile | 2 +
> drivers/mailbox/gunyah-msgq.c | 212 ++++
> drivers/virt/Kconfig | 2 +
> drivers/virt/Makefile | 1 +
> drivers/virt/gunyah/Kconfig | 59 ++
> drivers/virt/gunyah/Makefile | 11 +
> drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++
> drivers/virt/gunyah/gunyah_irqfd.c | 180 ++++
> drivers/virt/gunyah/gunyah_platform_hooks.c | 80 ++
> drivers/virt/gunyah/gunyah_qcom.c | 147 +++
> drivers/virt/gunyah/gunyah_vcpu.c | 468 +++++++++
> drivers/virt/gunyah/rsc_mgr.c | 910 ++++++++++++++++++
> drivers/virt/gunyah/rsc_mgr.h | 19 +
> drivers/virt/gunyah/rsc_mgr_rpc.c | 500 ++++++++++
> drivers/virt/gunyah/vm_mgr.c | 794 +++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 70 ++
> drivers/virt/gunyah/vm_mgr_mm.c | 256 +++++
> include/linux/gunyah.h | 207 ++++
> include/linux/gunyah_rsc_mgr.h | 162 ++++
> include/linux/gunyah_vm_mgr.h | 126 +++
> include/uapi/linux/gunyah.h | 293 ++++++
> samples/Kconfig | 10 +
> samples/Makefile | 1 +
> samples/gunyah/.gitignore | 2 +
> samples/gunyah/Makefile | 6 +
> samples/gunyah/gunyah_vmm.c | 270 ++++++
> samples/gunyah/sample_vm.dts | 68 ++
> 37 files changed, 5401 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
> create mode 100644 Documentation/virt/gunyah/vm-manager.rst
> create mode 100644 arch/arm64/gunyah/Makefile
> create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
> create mode 100644 arch/arm64/include/asm/gunyah.h
> create mode 100644 drivers/mailbox/gunyah-msgq.c
> create mode 100644 drivers/virt/gunyah/Kconfig
> create mode 100644 drivers/virt/gunyah/Makefile
> create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
> create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
> create mode 100644 drivers/virt/gunyah/gunyah_platform_hooks.c
> create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
> create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.h
> create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.h
> create mode 100644 drivers/virt/gunyah/vm_mgr_mm.c
> create mode 100644 include/linux/gunyah.h
> create mode 100644 include/linux/gunyah_rsc_mgr.h
> create mode 100644 include/linux/gunyah_vm_mgr.h
> create mode 100644 include/uapi/linux/gunyah.h
> create mode 100644 samples/gunyah/.gitignore
> create mode 100644 samples/gunyah/Makefile
> create mode 100644 samples/gunyah/gunyah_vmm.c
> create mode 100644 samples/gunyah/sample_vm.dts
>
>
> base-commit: c8c655c34e33544aec9d64b660872ab33c29b5f1
> prerequisite-patch-id: b48c45acdec06adf37e09fe35e6a9412c5784800
> prerequisite-patch-id: bc27499c7652385c584424529edbc5781c074d68
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add framework for VM functions to handle stage-2 write faults from Gunyah
> guest virtual machines. IO handlers have a range of addresses which they
> apply to. Optionally, they may apply only when the value written
> matches the IO handler's value.
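As I read the description above, matching a guest write against a handler comes down to a range check plus an optional value check. A minimal userspace sketch of that idea follows. The struct layout and all demo_ names are hypothetical and are not taken from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical handler description; not the struct from this patch. */
struct demo_io_handler {
	uint64_t addr;		/* first guest address covered */
	uint64_t len;		/* number of bytes covered */
	bool datamatch;		/* if true, only match when the value matches */
	uint64_t data;		/* value compared against when datamatch is set */
};

/* Return true if a write of write_len bytes at write_addr hits handler h. */
bool demo_io_handler_matches(const struct demo_io_handler *h,
			     uint64_t write_addr, uint64_t write_len,
			     uint64_t value)
{
	/* The write must start at or after the handler's base address. */
	if (write_addr < h->addr)
		return false;
	/* The write must end inside the range; written to avoid overflow. */
	if (write_len > h->len || write_addr - h->addr > h->len - write_len)
		return false;
	/* Optional value filter. */
	if (h->datamatch && value != h->data)
		return false;
	return true;
}
```

Whether a write that only partially overlaps a handler should match is a policy choice; this sketch requires the write to fall entirely inside the range.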
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good.
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/vm_mgr.c | 104 ++++++++++++++++++++++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 4 ++
> include/linux/gunyah_vm_mgr.h | 25 ++++++++
> 3 files changed, 133 insertions(+)
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah allows host virtual machines to schedule guest virtual machines
> and handle their MMIO accesses. vCPUs are presented to the host as a
> Gunyah resource and represented to userspace as a Gunyah VM function.
>
> Creating the vcpu VM function will create a file descriptor that:
> - can run an ioctl: GH_VCPU_RUN to schedule the guest vCPU until the
> next interrupt occurs on the host or when the guest vCPU can no
> longer be run.
> - can be mmap'd to share a gh_vcpu_run structure which can look up the
> reason why GH_VCPU_RUN returned and provide return values for MMIO
> access.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
To be honest I have spent less time immersed in the VCPU stuff
than I would like.
I have looked through this patch today, though, and with the
exception of a typo I point out, it looks generally good to me.
For now I'm going to give you this; I might take a closer look
at a future date when you have updated your code.
Acked-by: Alex Elder <[email protected]>
> ---
> Documentation/virt/gunyah/vm-manager.rst | 46 ++-
> arch/arm64/gunyah/gunyah_hypercall.c | 28 ++
> drivers/virt/gunyah/Kconfig | 11 +
> drivers/virt/gunyah/Makefile | 2 +
> drivers/virt/gunyah/gunyah_vcpu.c | 468 +++++++++++++++++++++++
> drivers/virt/gunyah/vm_mgr.c | 4 +
> drivers/virt/gunyah/vm_mgr.h | 1 +
> include/linux/gunyah.h | 24 ++
> include/uapi/linux/gunyah.h | 128 +++++++
> 9 files changed, 710 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virt/gunyah/gunyah_vcpu.c
>
> diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
> index 3b51bab9d793..6789d13fed14 100644
> --- a/Documentation/virt/gunyah/vm-manager.rst
> +++ b/Documentation/virt/gunyah/vm-manager.rst
> @@ -5,8 +5,7 @@ Virtual Machine Manager
> =======================
>
> The Gunyah Virtual Machine Manager is a Linux driver to support launching
> -virtual machines using Gunyah. It presently supports launching non-proxy
> -scheduled Linux-like virtual machines.
> +virtual machines using Gunyah.
>
> Except for some basic information about the location of initial binaries,
> most of the configuration about a Gunyah virtual machine is described in the
> @@ -98,3 +97,46 @@ GH_VM_START
> ~~~~~~~~~~~
>
> This ioctl starts the VM.
> +
> +GH_VM_ADD_FUNCTION
> +~~~~~~~~~~~~~~~~~~
> +
> +This ioctl registers a Gunyah VM function with the VM manager. The VM function
> +is described with a &struct gh_fn_desc.type and some arguments for that type.
> +Typically, the function is added before the VM starts, but the function doesn't
> +"operate" until the VM starts with `GH_VM_START`_. For example, vCPU ioclts will
s/ioclts/ioctls/
> +all return an error until the VM starts because the vCPUs don't exist until the
> +VM is started. This allows the VMM to set up all the kernel functions needed for
> +the VM *before* the VM starts.
> +
> +.. kernel-doc:: include/uapi/linux/gunyah.h
> + :identifiers: gh_fn_desc gh_fn_type
> +
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah message queues are a unidirectional inter-VM pipe for messages up
> to 1024 bytes. This driver supports pairing a receiver message queue and
> a transmitter message queue to expose a single mailbox channel.
>
> Signed-off-by: Elliot Berman <[email protected]>
This patch does not apply properly, because it updates
"message-queue.rst", which does not currently exist.
I'm going to ignore that for now, though.
I'm generally OK with giving you my "Reviewed-by" on this
but I'll wait until you send v14 with the full content
of "message-queue.rst".
Also, I suggest below that you update all your SPDX tags
to no longer use the deprecated "GPL-2.0" identifier.
-Alex
> ---
> Documentation/virt/gunyah/message-queue.rst | 8 +
> drivers/mailbox/Makefile | 2 +
> drivers/mailbox/gunyah-msgq.c | 212 ++++++++++++++++++++
> include/linux/gunyah.h | 57 ++++++
> 4 files changed, 279 insertions(+)
> create mode 100644 drivers/mailbox/gunyah-msgq.c
>
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> index b352918ae54b..70d82a4ef32d 100644
> --- a/Documentation/virt/gunyah/message-queue.rst
> +++ b/Documentation/virt/gunyah/message-queue.rst
> @@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
> | | | | | |
> | | | | | |
> +---------------+ +-----------------+ +---------------+
> +
> +Gunyah message queues are exposed as mailboxes. To create the mailbox, create
> +a mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
> +all messages in the RX message queue are read and pushed via the `rx_callback`
> +of the registered mbox_client.
> +
> +.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
> + :identifiers: gh_msgq_init
> diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
> index fc9376117111..5f929bb55e9a 100644
> --- a/drivers/mailbox/Makefile
> +++ b/drivers/mailbox/Makefile
> @@ -55,6 +55,8 @@ obj-$(CONFIG_MTK_CMDQ_MBOX) += mtk-cmdq-mailbox.o
>
> obj-$(CONFIG_ZYNQMP_IPI_MBOX) += zynqmp-ipi-mailbox.o
>
> +obj-$(CONFIG_GUNYAH) += gunyah-msgq.o
> +
> obj-$(CONFIG_SUN6I_MSGBOX) += sun6i-msgbox.o
>
> obj-$(CONFIG_SPRD_MBOX) += sprd-mailbox.o
> diff --git a/drivers/mailbox/gunyah-msgq.c b/drivers/mailbox/gunyah-msgq.c
> new file mode 100644
> index 000000000000..b7a54f233680
> --- /dev/null
> +++ b/drivers/mailbox/gunyah-msgq.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-only
After seeing this tag I looked into it, and see that SPDX
has deprecated "GPL-2.0" in favor of "GPL-2.0-only".
Please update all SPDX licenses that use "GPL-2.0" to use
this new tag instead.
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/mailbox_controller.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +#include <linux/gunyah.h>
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +
> +#define mbox_chan_to_msgq(chan) (container_of(chan->mbox, struct gh_msgq, mbox))
Parentheses are not needed around a container_of() call.
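As a small userspace illustration of why the outer parentheses are redundant: the expansion of a container_of()-style macro is already a single parenthesized expression. The demo_ names below are hypothetical, and demo_container_of() is a simplified stand-in without the kernel macro's type checking:

```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_mbox {
	int id;
};

struct demo_msgq {
	int flags;
	struct demo_mbox mbox;	/* embedded member we convert back from */
};

/* No extra outer parentheses needed around the wrapped macro call. */
#define demo_mbox_to_msgq(m) demo_container_of(m, struct demo_msgq, mbox)
```

The wrapper can be followed directly by "->" or used in a comparison without surprises, which is the only thing the extra parentheses would have protected.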
> +
> +static irqreturn_t gh_msgq_rx_irq_handler(int irq, void *data)
> +{
> + struct gh_msgq *msgq = data;
> + struct gh_msgq_rx_data rx_data;
> + enum gh_error gh_error;
> + bool ready = true;
> +
> + while (ready) {
> + gh_error = gh_hypercall_msgq_recv(msgq->rx_ghrsc->capid,
> + &rx_data.data, sizeof(rx_data.data),
> + &rx_data.length, &ready);
> + if (gh_error != GH_ERROR_OK) {
> + if (gh_error != GH_ERROR_MSGQUEUE_EMPTY)
> + dev_warn(msgq->mbox.dev, "Failed to receive data: %d\n", gh_error);
> + break;
> + }
> + if (likely(gh_msgq_chan(msgq)->cl))
> + mbox_chan_received_data(gh_msgq_chan(msgq), &rx_data);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +/* Fired when message queue transitions from "full" to "space available" to send messages */
> +static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
> +{
> + struct gh_msgq *msgq = data;
> +
> + mbox_chan_txdone(gh_msgq_chan(msgq), 0);
> +
> + return IRQ_HANDLED;
> +}
> +
> +/* Fired after sending message and hypercall told us there was more space available. */
> +static void gh_msgq_txdone_tasklet(struct tasklet_struct *tasklet)
> +{
> + struct gh_msgq *msgq = container_of(tasklet, struct gh_msgq, txdone_tasklet);
> +
> + mbox_chan_txdone(gh_msgq_chan(msgq), msgq->last_ret);
> +}
> +
> +static int gh_msgq_send_data(struct mbox_chan *chan, void *data)
> +{
> + struct gh_msgq *msgq = mbox_chan_to_msgq(chan);
> + struct gh_msgq_tx_data *msgq_data = data;
> + u64 tx_flags = 0;
> + enum gh_error gh_error;
> + bool ready;
> +
> + if (!msgq->tx_ghrsc)
> + return -EOPNOTSUPP;
> +
> + if (msgq_data->push)
> + tx_flags |= GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH;
> +
> + gh_error = gh_hypercall_msgq_send(msgq->tx_ghrsc->capid, msgq_data->length, msgq_data->data,
> + tx_flags, &ready);
> +
> + /**
> + * unlikely because Linux tracks state of msgq and should not try to
> + * send message when msgq is full.
> + */
> + if (unlikely(gh_error == GH_ERROR_MSGQUEUE_FULL))
> + return -EAGAIN;
> +
> + /**
> + * Propagate all other errors to client. If we return error to mailbox
> + * framework, then no other messages can be sent and nobody will know
> + * to retry this message.
> + */
> + msgq->last_ret = gh_error_remap(gh_error);
> +
> + /**
> + * This message was successfully sent, but message queue isn't ready to
> + * accept more messages because it's now full. Mailbox framework
> + * requires that we only report that message was transmitted when
> + * we're ready to transmit another message. We'll get that in the form
> + * of tx IRQ once the other side starts to drain the msgq.
> + */
> + if (gh_error == GH_ERROR_OK) {
> + if (!ready)
> + return 0;
> + } else
Standard style would add curly braces to the else block.
} else {
> + dev_err(msgq->mbox.dev, "Failed to send data: %d (%d)\n", gh_error, msgq->last_ret);
> +
> + /**
> + * We can send more messages. Mailbox framework requires that tx done
> + * happens asynchronously to sending the message. Gunyah message queues
> + * tell us right away on the hypercall return whether we can send more
> + * messages. To work around this, defer the txdone to a tasklet.
> + */
> + tasklet_schedule(&msgq->txdone_tasklet);
> +
> + return 0;
> +}
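To check my reading of the function above, its three outcomes can be summarized as a small decision function. This is a userspace sketch with hypothetical demo_ names standing in for the enum gh_error values and the driver's actions:

```c
/* Hypothetical result codes standing in for enum gh_error values. */
enum demo_hyp_result {
	DEMO_HYP_OK,
	DEMO_HYP_QUEUE_FULL,
	DEMO_HYP_OTHER_ERROR,
};

/* What the send path does after the hypercall returns. */
enum demo_send_action {
	DEMO_RETURN_EAGAIN,	/* queue was full: nothing sent, caller retries */
	DEMO_WAIT_FOR_TX_IRQ,	/* sent, queue now full: tx IRQ signals txdone */
	DEMO_SCHEDULE_TXDONE,	/* sent with room left, or failed: use tasklet */
};

/* Mirror the branch structure of the quoted send function. */
enum demo_send_action demo_classify_send(enum demo_hyp_result res, int ready)
{
	if (res == DEMO_HYP_QUEUE_FULL)
		return DEMO_RETURN_EAGAIN;
	if (res == DEMO_HYP_OK && !ready)
		return DEMO_WAIT_FOR_TX_IRQ;
	return DEMO_SCHEDULE_TXDONE;
}
```

Note that every error other than "queue full" takes the tasklet path, so the client sees the failure through the txdone result rather than through the return value.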
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Document the ioctls and usage of Gunyah VM Manager driver.
>
> Signed-off-by: Elliot Berman <[email protected]>
This patch does not apply, because at this point in the
series, "Documentation/virt/gunyah/index.rst" does not
exist. I'm going to ignore that.
I have some suggestions, but this generally looks good.
I'll wait to see v14 with the full "index.rst" before I
give my Reviewed-by.
-Alex
> ---
> Documentation/virt/gunyah/index.rst | 1 +
> Documentation/virt/gunyah/vm-manager.rst | 82 ++++++++++++++++++++++++
> 2 files changed, 83 insertions(+)
> create mode 100644 Documentation/virt/gunyah/vm-manager.rst
>
> diff --git a/Documentation/virt/gunyah/index.rst b/Documentation/virt/gunyah/index.rst
> index 74aa345e0a14..7058249825b1 100644
> --- a/Documentation/virt/gunyah/index.rst
> +++ b/Documentation/virt/gunyah/index.rst
> @@ -7,6 +7,7 @@ Gunyah Hypervisor
> .. toctree::
> :maxdepth: 1
>
> + vm-manager
> message-queue
>
> Gunyah is a Type-1 hypervisor which is independent of any OS kernel, and runs in
> diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
> new file mode 100644
> index 000000000000..50d8ae7fabcd
> --- /dev/null
> +++ b/Documentation/virt/gunyah/vm-manager.rst
> @@ -0,0 +1,82 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=======================
> +Virtual Machine Manager
> +=======================
> +
> +The Gunyah Virtual Machine Manager is a Linux driver to support launching
> +virtual machines using Gunyah. It presently supports launching non-proxy
> +scheduled Linux-like virtual machines.
Does everyone know what "non-proxy-scheduled virtual machines" are?
> +Except for some basic information about the location of initial binaries,
> +most of the configuration about a Gunyah virtual machine is described in the
> +VM's devicetree. The devicetree is generated by userspace. Interacting with the
> +virtual machine is still done via the kernel and VM configuration requires some
> +of the corresponding functionality to be set up in the kernel. For instance,
> +sharing userspace memory with a VM is done via the `GH_VM_SET_USER_MEM_REGION`_
> +ioctl. The VM itself is configured to use the memory region via the
> +devicetree.
Without looking at the code, I'm a little unsure what that last
sentence really means.
> +
> +Sample Userspace VMM
> +====================
> +
> +A sample userspace VMM is included in samples/gunyah/ along with a minimal
> +devicetree that can be used to launch a VM. To build this sample, enable
> +CONFIG_SAMPLE_GUNYAH.
> +
> +IOCTLs and userspace VMM flows
> +==============================
> +
> +The kernel exposes a char device interface at /dev/gunyah.
> +
> +To create a VM, use the `GH_CREATE_VM`_ ioctl. A successful call will return a
> +"Gunyah VM" file descriptor.
> +
> +/dev/gunyah API Descriptions
> +----------------------------
> +
> +GH_CREATE_VM
> +~~~~~~~~~~~~
> +
> +Creates a Gunyah VM. The argument is reserved for future use and must be 0.
Maybe mention it returns a file descriptor representing the created VM?
> +
> +Gunyah VM API Descriptions
> +--------------------------
> +
> +GH_VM_SET_USER_MEM_REGION
> +~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +This ioctl allows the user to create or delete a memory parcel for a guest
> +virtual machine. Each memory region is uniquely identified by a label;
> +attempting to create two regions with the same label is not allowed. Labels are
> +unique per virtual machine.
> +
> +While VMM is guest-agnostic and allows runtime addition of memory regions,
> +Linux guest virtual machines do not support accepting memory regions at runtime.
> +Thus, memory regions should be provided before starting the VM and the VM must
Thus, for Linux guests, memory regions must be provided...
> +be configured to accept these at boot-up.
> +
> +The guest physical address is used by Linux kernel to check that the requested
> +user regions do not overlap and to help find the corresponding memory region
> +for calls like `GH_VM_SET_DTB_CONFIG`_. It must be page aligned.
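The two checks described in this paragraph (no overlap, page alignment) could be sketched as below. This is a userspace illustration only; the demo_ names are hypothetical, a 4 KiB page size is assumed, and regions are assumed not to wrap around the end of the address space:

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096ULL	/* assumed page size for this sketch */

/* Hypothetical region description: guest physical address and size. */
struct demo_region {
	uint64_t gpa;
	uint64_t size;
};

/* True if the guest physical address sits on a page boundary. */
bool demo_page_aligned(uint64_t gpa)
{
	return (gpa & (DEMO_PAGE_SIZE - 1)) == 0;
}

/*
 * True if the half-open ranges [gpa, gpa + size) intersect.
 * Assumes gpa + size does not overflow for either region.
 */
bool demo_regions_overlap(const struct demo_region *a,
			  const struct demo_region *b)
{
	return a->gpa < b->gpa + b->size && b->gpa < a->gpa + a->size;
}
```

Regions that merely touch (one ends where the next begins) do not count as overlapping under this definition.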
> +
> +To add a memory region, call `GH_VM_SET_USER_MEM_REGION`_ with fields set as
> +described above.
> +
> +.. kernel-doc:: include/uapi/linux/gunyah.h
> + :identifiers: gh_userspace_memory_region gh_mem_flags
> +
> +GH_VM_SET_DTB_CONFIG
> +~~~~~~~~~~~~~~~~~~~~
> +
> +This ioctl sets the location of the VM's devicetree blob and is used by Gunyah
> +Resource Manager to allocate resources. The guest physical memory should be part
s/should/must/ /* ? */
> +of the primary memory parcel provided to the VM prior to GH_VM_START.
Is it possible to provide multiple memory parcels? If so, is the
"primary" memory parcel the first?
> +
> +.. kernel-doc:: include/uapi/linux/gunyah.h
> + :identifiers: gh_vm_dtb_config
> +
> +GH_VM_START
> +~~~~~~~~~~~
> +
> +This ioctl starts the VM.
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah resource manager provides API to manipulate stage 2 page tables.
> Manipulations are represented as a memory parcel. Memory parcels
Not a huge deal, but maybe:
The Gunyah resource manager provides an API for manipulating stage 2
page tables. The API uses "memory parcels" to represent regions of
memory affected by API calls.
> describe a list of memory regions (intermediate physical address and
...(intermediate physical address--IPA--and size)...
> size), a list of new permissions for VMs, and the memory type (DDR or
> MMIO). Memory parcels are uniquely identified by a handle allocated by
s/Memory parcels are/Each memory parcel is/
Also, as I recall, a memory parcel is contiguous memory in
the guest address space. If that's true, it might be worth
mentioning (here and/or in the code).
> Gunyah. There are a few types of memory parcel sharing which Gunyah
> supports:
>
> - Sharing: the guest and host VM both have access
> - Lending: only the guest has access; host VM loses access
> - Donating: Permanently lent (not reclaimed even if guest shuts down)
>
> Memory parcels that have been shared or lent can be reclaimed by the
> host via an additional call. The reclaim operation restores the original
> access the host VM had to the memory parcel and removes the access to
> other VM.
>
> One point to note is that memory parcels don't describe where in the guest
> VM the memory parcel should reside. The guest VM must accept the memory
> parcel either explicitly via a "gh_rm_mem_accept" call (not introduced
> here) or be configured to accept it automatically at boot. When the guest
> VM accepts the memory parcel, it also specifies the IPA at which it wants
> to place the memory parcel.
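
To make the three transfer kinds in the description above concrete, here is a small standalone model. It encodes only what the commit message states (who keeps access, and which kinds can be reclaimed); the enum and function names are invented for this sketch and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the three transfer kinds named in the commit message. */
enum parcel_kind_sketch {
	PARCEL_SHARE,  /* guest and host VM both have access */
	PARCEL_LEND,   /* only the guest has access; host VM loses access */
	PARCEL_DONATE, /* permanently lent, not reclaimed even if the guest shuts down */
};

/* Does the host VM still have access after the transfer? */
static bool host_keeps_access(enum parcel_kind_sketch kind)
{
	return kind == PARCEL_SHARE;
}

/* Per the description, only shared or lent parcels can be reclaimed by the host. */
static bool host_can_reclaim(enum parcel_kind_sketch kind)
{
	return kind == PARCEL_SHARE || kind == PARCEL_LEND;
}
```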
I have quite a few small comments and questions. Some of the
questions arise because I haven't done a very deep review this
time, so I might just be missing or forgetting bits of the
bigger picture.
My feedback is down to nits though, for the most part. Consider
what I say, but even if you ignore much of it:
Reviewed-by: Alex Elder <[email protected]>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
> drivers/virt/gunyah/rsc_mgr_rpc.c | 227 ++++++++++++++++++++++++++++++
> include/linux/gunyah_rsc_mgr.h | 48 +++++++
> 2 files changed, 275 insertions(+)
>
> diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
> index a4a9f0ba4e1f..4f25f07400b3 100644
> --- a/drivers/virt/gunyah/rsc_mgr_rpc.c
> +++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
> @@ -6,6 +6,12 @@
> #include <linux/gunyah_rsc_mgr.h>
> #include "rsc_mgr.h"
>
> +/* Message IDs: Memory Management */
> +#define GH_RM_RPC_MEM_LEND 0x51000012
> +#define GH_RM_RPC_MEM_SHARE 0x51000013
> +#define GH_RM_RPC_MEM_RECLAIM 0x51000015
> +#define GH_RM_RPC_MEM_APPEND 0x51000018
These definitions seem to be permanent, unchanging, definitional
values for the Gunyah RM API. It seems like they could reside
in a file that reinforces that--like "gh_rm_api.h" or something.
That said, nobody else will be using these, so I guess defining
them here makes sense.
> +
> /* Message IDs: VM Management */
> #define GH_RM_RPC_VM_ALLOC_VMID 0x56000001
> #define GH_RM_RPC_VM_DEALLOC_VMID 0x56000002
> @@ -22,6 +28,46 @@ struct gh_rm_vm_common_vmid_req {
> __le16 _padding;
> } __packed;
>
> +/* Call: MEM_LEND, MEM_SHARE */
> +#define GH_MEM_SHARE_REQ_FLAGS_APPEND BIT(1)
> +
> +struct gh_rm_mem_share_req_header {
> + u8 mem_type;
> + u8 _padding0;
> + u8 flags;
> + u8 _padding1;
> + __le32 label;
> +} __packed;
> +
> +struct gh_rm_mem_share_req_acl_section {
> + __le32 n_entries;
> + struct gh_rm_mem_acl_entry entries[];
> +};
> +
> +struct gh_rm_mem_share_req_mem_section {
> + __le16 n_entries;
> + __le16 _padding;
> + struct gh_rm_mem_entry entries[];
> +};
> +
> +/* Call: MEM_RELEASE */
> +struct gh_rm_mem_release_req {
> + __le32 mem_handle;
> + u8 flags; /* currently not used */
> + u8 _padding0;
> + __le16 _padding1;
> +} __packed;
> +
> +/* Call: MEM_APPEND */
> +#define GH_MEM_APPEND_REQ_FLAGS_END BIT(0)
> +
> +struct gh_rm_mem_append_req_header {
> + __le32 mem_handle;
> + u8 flags;
> + u8 _padding0;
> + __le16 _padding1;
> +} __packed;
> +
> /* Call: VM_ALLOC */
> struct gh_rm_vm_alloc_vmid_resp {
> __le16 vmid;
> @@ -51,6 +97,8 @@ struct gh_rm_vm_config_image_req {
> __le64 dtb_size;
> } __packed;
>
> +#define GH_RM_MAX_MEM_ENTRIES 512
> +
> /*
> * Several RM calls take only a VMID as a parameter and give only standard
> * response back. Deduplicate boilerplate code by using this common call.
> @@ -64,6 +112,185 @@ static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
> return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
> }
>
> +static int _gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle, bool end_append,
> + struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> + struct gh_rm_mem_share_req_mem_section *mem_section;
> + struct gh_rm_mem_append_req_header *req_header;
> + size_t msg_size = 0;
> + void *msg;
> + int ret;
> +
> + msg_size += sizeof(struct gh_rm_mem_append_req_header);
> + msg_size += struct_size(mem_section, entries, n_mem_entries);
> +
> + msg = kzalloc(msg_size, GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + req_header = msg;
> + mem_section = (void *)req_header + sizeof(struct gh_rm_mem_append_req_header);
You could use req_header + 1. Even if not, use sizeof(*req_header).
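
A standalone sketch of why the two spellings land on the same address: arithmetic on a typed pointer advances in units of sizeof(*ptr). The struct below only mirrors the field widths of the quoted append header and is named differently on purpose; it is an illustration, not the kernel definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical header with the same field widths as the quoted append header. */
struct append_hdr_sketch {
	uint32_t mem_handle;
	uint8_t flags;
	uint8_t padding0;
	uint16_t padding1;
};

/* Pointer arithmetic on a typed pointer advances by sizeof(*hdr) bytes. */
static void *payload_after_header(struct append_hdr_sketch *hdr)
{
	return hdr + 1;
}

/* The same address written with an explicit byte offset. */
static void *payload_by_offset(struct append_hdr_sketch *hdr)
{
	return (unsigned char *)hdr + sizeof(*hdr);
}
```

Either form avoids repeating the struct name, so the expression stays correct if the header type is ever renamed.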
> +
> + req_header->mem_handle = cpu_to_le32(mem_handle);
> + if (end_append)
> + req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
> +
> + mem_section->n_entries = cpu_to_le16(n_mem_entries);
> + memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) * n_mem_entries);
> +
> + ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL, NULL);
> + kfree(msg);
> +
> + return ret;
> +}
> +
> +static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
> + struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> + bool end_append;
> + int ret = 0;
> + size_t n;
> +
> + while (n_mem_entries) {
> + if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
> + end_append = false;
> + n = GH_RM_MAX_MEM_ENTRIES;
> + } else {
> + end_append = true;
> + n = n_mem_entries;
> + }
> +
> + ret = _gh_rm_mem_append(rm, mem_handle, end_append, mem_entries, n);
> + if (ret)
> + break;
> +
> + mem_entries += n;
> + n_mem_entries -= n;
> + }
> +
> + return ret;
> +}
> +
> +static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_mem_parcel *p)
> +{
> + size_t msg_size = 0, initial_mem_entries = p->n_mem_entries, resp_size;
> + size_t acl_section_size, mem_section_size;
> + struct gh_rm_mem_share_req_acl_section *acl_section;
> + struct gh_rm_mem_share_req_mem_section *mem_section;
> + struct gh_rm_mem_share_req_header *req_header;
> + u32 *attr_section;
> + __le32 *resp;
> + void *msg;
> + int ret;
> +
> + if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries || !p->n_mem_entries ||
> + p->n_acl_entries > U8_MAX || p->mem_handle != GH_MEM_HANDLE_INVAL)
> + return -EINVAL;
> +
> + if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
> + initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
Is it OK to truncate the number of entries silently?
> +
> + acl_section_size = struct_size(acl_section, entries, p->n_acl_entries);
Is there a limit on the number of ACL entries (as there is for
the number of mem entries)?
> + mem_section_size = struct_size(mem_section, entries, initial_mem_entries);
> + /* The format of the message goes:
> + * request header
> + * ACL entries (which VMs get what kind of access to this memory parcel)
> + * Memory entries (list of memory regions to share)
> + * Memory attributes (currently unused, we'll hard-code the size to 0)
> + */
> + msg_size += sizeof(struct gh_rm_mem_share_req_header);
> + msg_size += acl_section_size;
> + msg_size += mem_section_size;
> + msg_size += sizeof(u32); /* for memory attributes, currently unused */
> +
> + msg = kzalloc(msg_size, GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + req_header = msg;
> + acl_section = (void *)req_header + sizeof(*req_header);
> + mem_section = (void *)acl_section + acl_section_size;
> + attr_section = (void *)mem_section + mem_section_size;
> +
> + req_header->mem_type = p->mem_type;
> + if (initial_mem_entries != p->n_mem_entries)
> + req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
> + req_header->label = cpu_to_le32(p->label);
> +
> + acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
> + memcpy(acl_section->entries, p->acl_entries,
> + flex_array_size(acl_section, entries, p->n_acl_entries));
> +
> + mem_section->n_entries = cpu_to_le16(initial_mem_entries);
> + memcpy(mem_section->entries, p->mem_entries,
> + flex_array_size(mem_section, entries, initial_mem_entries));
> +
> + /* Set n_entries for memory attribute section to 0 */
> + *attr_section = 0;
> +
> + ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
> + kfree(msg);
> +
> + if (ret)
> + return ret;
> +
> + p->mem_handle = le32_to_cpu(*resp);
> + kfree(resp);
> +
> + if (initial_mem_entries != p->n_mem_entries) {
> + ret = gh_rm_mem_append(rm, p->mem_handle,
> + &p->mem_entries[initial_mem_entries],
> + p->n_mem_entries - initial_mem_entries);
Will there always be at most one gh_rm_mem_append() call?
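
Going only by the code quoted in this patch, the entries beyond the initial cap are handed to gh_rm_mem_append(), whose loop sends them in messages of at most GH_RM_MAX_MEM_ENTRIES each. The standalone sketch below models that split so the counts can be checked by hand; the helper names are invented and the constant simply mirrors the quoted define.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ENTRIES_SKETCH 512 /* mirrors GH_RM_MAX_MEM_ENTRIES in the quoted patch */

/* Entries carried by the initial lend/share request. */
static size_t initial_entries(size_t n_mem_entries)
{
	return n_mem_entries > MAX_ENTRIES_SKETCH ? MAX_ENTRIES_SKETCH : n_mem_entries;
}

/* Number of MEM_APPEND messages the quoted loop would send for the remainder. */
static size_t append_messages(size_t n_mem_entries)
{
	size_t remaining = n_mem_entries - initial_entries(n_mem_entries);
	size_t calls = 0;

	while (remaining) {
		size_t n = remaining > MAX_ENTRIES_SKETCH ? MAX_ENTRIES_SKETCH : remaining;

		remaining -= n;
		calls++;
	}

	return calls;
}
```

Under this reading there is a single gh_rm_mem_append() call from the lend path, but it may issue several MEM_APPEND messages, for example two when 1025 entries are passed in.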
> + if (ret) {
> + gh_rm_mem_reclaim(rm, p);
> + p->mem_handle = GH_MEM_HANDLE_INVAL;
> + }
> + }
> +
> + return ret;
> +}
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> When booting a Gunyah virtual machine, the host VM may gain capabilities
> to interact with resources for the guest virtual machine. Examples of
> such resources are vCPUs or message queues. To use those resources, we
> need to translate the RM response into a gunyah_resource structure which
> is useful to Linux drivers. Presently, Linux drivers need only to know
> the type of resource, the capability ID, and an interrupt.
>
> On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt
> ID number and always a SPI.
>
> Signed-off-by: Elliot Berman <[email protected]>
Please zero-initialize the automatic variable where I suggest it below.
I have two other comments/questions. Otherwise, this looks good.
Reviewed-by: Alex Elder <[email protected]>
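
For readers following along, here is a standalone sketch of the translation that arch_gh_fill_irq_fwspec_params() performs in the patch below: an interrupt ID in the accepted range 32..1019 becomes a zero-based SPI number by subtracting 32. The function and macro names are invented for this sketch, and it returns -1 rather than a kernel error code.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_SPI_FIRST_INTID 32u   /* lowest interrupt ID accepted by the quoted check */
#define GIC_SPI_LAST_INTID  1019u /* highest interrupt ID accepted by the quoted check */

/*
 * Returns 0 and stores the zero-based SPI number on success, or -1 if
 * the interrupt ID is outside the accepted SPI range.
 */
static int intid_to_spi(uint32_t intid, uint32_t *spi)
{
	if (intid < GIC_SPI_FIRST_INTID || intid > GIC_SPI_LAST_INTID)
		return -1;

	*spi = intid - GIC_SPI_FIRST_INTID;
	return 0;
}
```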
> ---
> arch/arm64/include/asm/gunyah.h | 24 +++++
> drivers/virt/gunyah/rsc_mgr.c | 162 +++++++++++++++++++++++++++++++-
> include/linux/gunyah.h | 3 +
> include/linux/gunyah_rsc_mgr.h | 3 +
> 4 files changed, 191 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/include/asm/gunyah.h
>
> diff --git a/arch/arm64/include/asm/gunyah.h b/arch/arm64/include/asm/gunyah.h
> new file mode 100644
> index 000000000000..c83d983b0f4e
> --- /dev/null
> +++ b/arch/arm64/include/asm/gunyah.h
> @@ -0,0 +1,24 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +#ifndef _ASM_GUNYAH_H
> +#define _ASM_GUNYAH_H
> +
> +#include <linux/irq.h>
> +#include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> +static inline int arch_gh_fill_irq_fwspec_params(u32 virq, struct irq_fwspec *fwspec)
> +{
> + /* Assume that Gunyah gave us an SPI; defensively check it */
> + if (WARN_ON(virq < 32 || virq > 1019))
> + return -EINVAL;
> +
> + fwspec->param_count = 3;
> + fwspec->param[0] = GIC_SPI;
> + fwspec->param[1] = virq - 32;
> + fwspec->param[2] = IRQ_TYPE_EDGE_RISING;
> + return 0;
> +}
> +
> +#endif
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> index 4f6f96bdcf3d..43ea010ea47a 100644
> --- a/drivers/virt/gunyah/rsc_mgr.c
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -17,6 +17,8 @@
> #include <linux/platform_device.h>
> #include <linux/miscdevice.h>
>
> +#include <asm/gunyah.h>
> +
> #include "rsc_mgr.h"
> #include "vm_mgr.h"
>
> @@ -133,6 +135,7 @@ struct gh_rm_connection {
> * @send_lock: synchronization to allow only one request to be sent at a time
> * @nh: notifier chain for clients interested in RM notification messages
> * @miscdev: /dev/gunyah
> + * @irq_domain: Domain to translate Gunyah hwirqs to Linux irqs
> */
> struct gh_rm {
> struct device *dev;
> @@ -151,6 +154,7 @@ struct gh_rm {
> struct blocking_notifier_head nh;
>
> struct miscdevice miscdev;
> + struct irq_domain *irq_domain;
> };
>
> /**
> @@ -191,6 +195,133 @@ static inline int gh_rm_remap_error(enum gh_rm_error rm_error)
> }
> }
>
> +struct gh_irq_chip_data {
> + u32 gh_virq;
> +};
> +
> +static struct irq_chip gh_rm_irq_chip = {
> + .name = "Gunyah",
> + .irq_enable = irq_chip_enable_parent,
> + .irq_disable = irq_chip_disable_parent,
> + .irq_ack = irq_chip_ack_parent,
> + .irq_mask = irq_chip_mask_parent,
> + .irq_mask_ack = irq_chip_mask_ack_parent,
> + .irq_unmask = irq_chip_unmask_parent,
> + .irq_eoi = irq_chip_eoi_parent,
> + .irq_set_affinity = irq_chip_set_affinity_parent,
> + .irq_set_type = irq_chip_set_type_parent,
> + .irq_set_wake = irq_chip_set_wake_parent,
> + .irq_set_vcpu_affinity = irq_chip_set_vcpu_affinity_parent,
> + .irq_retrigger = irq_chip_retrigger_hierarchy,
> + .irq_get_irqchip_state = irq_chip_get_parent_state,
> + .irq_set_irqchip_state = irq_chip_set_parent_state,
> + .flags = IRQCHIP_SET_TYPE_MASKED |
> + IRQCHIP_SKIP_SET_WAKE |
> + IRQCHIP_MASK_ON_SUSPEND,
> +};
> +
> +static int gh_rm_irq_domain_alloc(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs,
> + void *arg)
> +{
> + struct gh_irq_chip_data *chip_data, *spec = arg;
> + struct irq_fwspec parent_fwspec;
You are passing the above structure as an argument
to arch_gh_fill_irq_fwspec_params(). It might not
matter currently, but since it's not clear whether
you're assigning all fields before that call, you
should initially zero it.
struct irq_fwspec parent_fwspec = { };
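
A standalone sketch of what the initializer buys: every member not explicitly assigned afterwards is guaranteed to be zero. The struct here is a made-up stand-in, not struct irq_fwspec, and the sketch spells the initializer "{ 0 }" because empty braces are a GNU extension before C23.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FWSPEC_PARAMS_SKETCH 16 /* assumption: array size chosen for illustration */

/* Hypothetical stand-in for an IRQ firmware spec; layout is illustrative only. */
struct fwspec_sketch {
	void *fwnode;
	int param_count;
	uint32_t param[FWSPEC_PARAMS_SKETCH];
};

/* Sets two fields and leaves every other member at its zero-initialized value. */
static struct fwspec_sketch make_fwspec(uint32_t spi)
{
	struct fwspec_sketch spec = { 0 }; /* kernel code would typically write "= { }" */

	spec.param_count = 3;
	spec.param[1] = spi;

	return spec;
}
```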
> + struct gh_rm *rm = d->host_data;
> + u32 gh_virq = spec->gh_virq;
> + int ret;
> +
> + if (nr_irqs != 1)
> + return -EINVAL;
> +
> + chip_data = kzalloc(sizeof(*chip_data), GFP_KERNEL);
> + if (!chip_data)
> + return -ENOMEM;
> +
> + chip_data->gh_virq = gh_virq;
> +
> + ret = irq_domain_set_hwirq_and_chip(d, virq, chip_data->gh_virq, &gh_rm_irq_chip,
> + chip_data);
> + if (ret)
> + goto err_free_irq_data;
> +
> + parent_fwspec.fwnode = d->parent->fwnode;
> + ret = arch_gh_fill_irq_fwspec_params(chip_data->gh_virq, &parent_fwspec);
> + if (ret) {
> + dev_err(rm->dev, "virq translation failed %u: %d\n", chip_data->gh_virq, ret);
> + goto err_free_irq_data;
> + }
> +
> + ret = irq_domain_alloc_irqs_parent(d, virq, nr_irqs, &parent_fwspec);
> + if (ret)
> + goto err_free_irq_data;
> +
> + return ret;
> +err_free_irq_data:
> + kfree(chip_data);
> + return ret;
> +}
> +
> +static void gh_rm_irq_domain_free_single(struct irq_domain *d, unsigned int virq)
> +{
> + struct irq_data *irq_data;
> +
> + irq_data = irq_domain_get_irq_data(d, virq);
> + if (!irq_data)
> + return;
> +
> + kfree(irq_data->chip_data);
> + irq_data->chip_data = NULL;
> +}
> +
> +static void gh_rm_irq_domain_free(struct irq_domain *d, unsigned int virq, unsigned int nr_irqs)
> +{
> + unsigned int i;
> +
> + for (i = 0; i < nr_irqs; i++)
> + gh_rm_irq_domain_free_single(d, virq);
> +}
> +
> +static const struct irq_domain_ops gh_rm_irq_domain_ops = {
> + .alloc = gh_rm_irq_domain_alloc,
> + .free = gh_rm_irq_domain_free,
> +};
> +
> +struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct gh_rm_hyp_resource *hyp_resource)
> +{
> + struct gh_resource *ghrsc;
> + int ret;
> +
> + ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL);
> + if (!ghrsc)
> + return NULL;
> +
> + ghrsc->type = hyp_resource->type;
> + ghrsc->capid = le64_to_cpu(hyp_resource->cap_id);
> + ghrsc->irq = IRQ_NOTCONNECTED;
> + ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label);
> + if (hyp_resource->virq) {
> + struct gh_irq_chip_data irq_data = {
> + .gh_virq = le32_to_cpu(hyp_resource->virq),
> + };
> +
> + ret = irq_domain_alloc_irqs(rm->irq_domain, 1, NUMA_NO_NODE, &irq_data);
> + if (ret < 0) {
> + dev_err(rm->dev,
> + "Failed to allocate interrupt for resource %d label: %d: %d\n",
> + ghrsc->type, ghrsc->rm_label, ghrsc->irq);
Is it reasonable to return in this case without indicating to the
caller that something is wrong?
> + } else {
> + ghrsc->irq = ret;
> + }
> + }
> +
> + return ghrsc;
> +}
> +
> +void gh_rm_free_resource(struct gh_resource *ghrsc)
> +{
> + irq_dispose_mapping(ghrsc->irq);
> + kfree(ghrsc);
> +}
> +
> static int gh_rm_init_connection_payload(struct gh_rm_connection *connection, void *msg,
> size_t hdr_size, size_t msg_size)
> {
> @@ -661,6 +792,8 @@ static int gh_identify(void)
>
> static int gh_rm_drv_probe(struct platform_device *pdev)
> {
> + struct irq_domain *parent_irq_domain;
> + struct device_node *parent_irq_node;
> struct gh_msgq_tx_data *msg;
> struct gh_rm *rm;
> int ret;
> @@ -701,15 +834,41 @@ static int gh_rm_drv_probe(struct platform_device *pdev)
> if (ret)
> goto err_cache;
>
> + parent_irq_node = of_irq_find_parent(pdev->dev.of_node);
> + if (!parent_irq_node) {
> + dev_err(&pdev->dev, "Failed to find interrupt parent of resource manager\n");
> + ret = -ENODEV;
> + goto err_msgq;
> + }
> +
> + parent_irq_domain = irq_find_host(parent_irq_node);
> + if (!parent_irq_domain) {
> + dev_err(&pdev->dev, "Failed to find interrupt parent domain of resource manager\n");
> + ret = -ENODEV;
> + goto err_msgq;
> + }
> +
> + rm->irq_domain = irq_domain_add_hierarchy(parent_irq_domain, 0, 0, pdev->dev.of_node,
> + &gh_rm_irq_domain_ops, NULL);
> + if (!rm->irq_domain) {
> + dev_err(&pdev->dev, "Failed to add irq domain\n");
> + ret = -ENODEV;
> + goto err_msgq;
> + }
> + rm->irq_domain->host_data = rm;
> +
> + rm->miscdev.parent = &pdev->dev;
> rm->miscdev.name = "gunyah";
> rm->miscdev.minor = MISC_DYNAMIC_MINOR;
> rm->miscdev.fops = &gh_dev_fops;
>
> ret = misc_register(&rm->miscdev);
> if (ret)
> - goto err_msgq;
> + goto err_irq_domain;
>
> return 0;
> +err_irq_domain:
> + irq_domain_remove(rm->irq_domain);
> err_msgq:
> mbox_free_channel(gh_msgq_chan(&rm->msgq));
> gh_msgq_remove(&rm->msgq);
> @@ -723,6 +882,7 @@ static int gh_rm_drv_remove(struct platform_device *pdev)
> struct gh_rm *rm = platform_get_drvdata(pdev);
>
> misc_deregister(&rm->miscdev);
> + irq_domain_remove(rm->irq_domain);
> mbox_free_channel(gh_msgq_chan(&rm->msgq));
> gh_msgq_remove(&rm->msgq);
> kmem_cache_destroy(rm->cache);
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 982e27d10d57..4b398b59c2c5 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -27,6 +27,9 @@ struct gh_resource {
> enum gh_resource_type type;
> u64 capid;
> unsigned int irq;
> +
> + struct list_head list;
This list isn't used (yet). It seems to be the links used
to hold the resource either on a VM's list of resources, or
on a resource ticket's list of resources.
Not a big deal, but it might be better to introduce the
field in the patch that first uses it.
> + u32 rm_label;
> };
>
> /**
> diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
> index 7c599654ea30..e74e867583f5 100644
> --- a/include/linux/gunyah_rsc_mgr.h
> +++ b/include/linux/gunyah_rsc_mgr.h
> @@ -139,6 +139,9 @@ int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
> struct gh_rm_hyp_resources **resources);
> int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid);
>
> +struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct gh_rm_hyp_resource *hyp_resource);
> +void gh_rm_free_resource(struct gh_resource *ghrsc);
> +
> struct gh_rm_platform_ops {
> int (*pre_mem_share)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
> int (*post_mem_reclaim)(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel);
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add hypercalls to identify when Linux is running as a virtual machine under
> Gunyah.
>
> There are two calls to help identify Gunyah:
>
> 1. gh_hypercall_get_uid() returns a UID when running under a Gunyah
> hypervisor.
> 2. gh_hypercall_hyp_identify() returns build information and a set of
> feature flags that are supported by Gunyah.
>
> Reviewed-by: Srinivas Kandagatla <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
I have a suggestion below. But whether or not you choose to
incorporate it:
Reviewed-by: Alex Elder <[email protected]>
> ---
> arch/arm64/Kbuild | 1 +
> arch/arm64/gunyah/Makefile | 3 ++
> arch/arm64/gunyah/gunyah_hypercall.c | 56 ++++++++++++++++++++++++++++
> drivers/virt/Kconfig | 2 +
> drivers/virt/gunyah/Kconfig | 13 +++++++
> include/linux/gunyah.h | 31 +++++++++++++++
> 6 files changed, 106 insertions(+)
> create mode 100644 arch/arm64/gunyah/Makefile
> create mode 100644 arch/arm64/gunyah/gunyah_hypercall.c
> create mode 100644 drivers/virt/gunyah/Kconfig
>
> diff --git a/arch/arm64/Kbuild b/arch/arm64/Kbuild
> index 5bfbf7d79c99..e4847ba0e3c9 100644
> --- a/arch/arm64/Kbuild
> +++ b/arch/arm64/Kbuild
> @@ -3,6 +3,7 @@ obj-y += kernel/ mm/ net/
> obj-$(CONFIG_KVM) += kvm/
> obj-$(CONFIG_XEN) += xen/
> obj-$(subst m,y,$(CONFIG_HYPERV)) += hyperv/
> +obj-$(CONFIG_GUNYAH) += gunyah/
> obj-$(CONFIG_CRYPTO) += crypto/
>
> # for cleaning
> diff --git a/arch/arm64/gunyah/Makefile b/arch/arm64/gunyah/Makefile
> new file mode 100644
> index 000000000000..84f1e38cafb1
> --- /dev/null
> +++ b/arch/arm64/gunyah/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_GUNYAH) += gunyah_hypercall.o
> diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
> new file mode 100644
> index 000000000000..2166d5dab869
> --- /dev/null
> +++ b/arch/arm64/gunyah/gunyah_hypercall.c
> @@ -0,0 +1,56 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/module.h>
> +#include <linux/gunyah.h>
> +#include <linux/uuid.h>
> +
> +/* {c1d58fcd-a453-5fdb-9265-ce36673d5f14} */
> +static const uuid_t GUNYAH_UUID =
> + UUID_INIT(0xc1d58fcd, 0xa453, 0x5fdb, 0x92, 0x65, 0xce, 0x36, 0x67, 0x3d, 0x5f, 0x14);
> +
> +bool arch_is_gh_guest(void)
> +{
> + struct arm_smccc_res res;
> + uuid_t uuid;
> +
> + arm_smccc_1_1_hvc(ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID, &res);
> +
> + ((u32 *)&uuid.b[0])[0] = lower_32_bits(res.a0);
> + ((u32 *)&uuid.b[0])[1] = lower_32_bits(res.a1);
> + ((u32 *)&uuid.b[0])[2] = lower_32_bits(res.a2);
> + ((u32 *)&uuid.b[0])[3] = lower_32_bits(res.a3);
I think I'd rather see this more like:
u32 *up = (u32 *)&uuid.b;
/* The lower 32 bits of each of the four result registers encode the UUID */
*up++ = lower_32_bits(res.a0);
*up++ = lower_32_bits(res.a1);
*up++ = lower_32_bits(res.a2);
*up = lower_32_bits(res.a3);
Basically I think casting the assigned-to value makes things
harder to read. So doing that cast just once seems simpler.
But it's not a big deal.
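
Here is a userspace sketch of the same idea that can be compiled and checked outside the kernel. It is not the code I'm proposing for the patch: it uses memcpy() instead of a pointer cast to sidestep alignment and aliasing questions in portable C, and the struct and function names are invented stand-ins for uuid_t and the SMCCC result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's uuid_t: 16 raw bytes. */
struct uuid_sketch {
	uint8_t b[16];
};

/*
 * Copies the low 32 bits of each of four result registers into the
 * UUID bytes, in register order and in native byte order.
 */
static void uuid_from_regs(struct uuid_sketch *uuid, const uint64_t regs[4])
{
	uint8_t *p = uuid->b;
	int i;

	for (i = 0; i < 4; i++) {
		uint32_t low = (uint32_t)regs[i]; /* same value lower_32_bits() yields */

		memcpy(p, &low, sizeof(low));
		p += sizeof(low);
	}
}
```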
> +
> + return uuid_equal(&uuid, &GUNYAH_UUID);
> +}
> +EXPORT_SYMBOL_GPL(arch_is_gh_guest);
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Enable support for creating irqfds which can raise an interrupt on a
> Gunyah virtual machine. irqfds are exposed to userspace as a Gunyah VM
> function with the name "irqfd". If the VM devicetree is not configured
> to create a doorbell with the corresponding label, userspace will still
> be able to assert the eventfd but no interrupt will be raised on the
> guest.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
I have a minor suggestion. I think I'd like to look at this
again, so:
Acked-by: Alex Elder <[email protected]>
> ---
> Documentation/virt/gunyah/vm-manager.rst | 2 +-
> drivers/virt/gunyah/Kconfig | 9 ++
> drivers/virt/gunyah/Makefile | 1 +
> drivers/virt/gunyah/gunyah_irqfd.c | 180 +++++++++++++++++++++++
> include/uapi/linux/gunyah.h | 35 +++++
> 5 files changed, 226 insertions(+), 1 deletion(-)
> create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
>
. . .
> @@ -99,6 +102,38 @@ struct gh_fn_vcpu_arg {
> __u32 id;
> };
>
> +/**
> + * enum gh_irqfd_flags - flags for use in gh_fn_irqfd_arg
> + * @GH_IRQFD_FLAGS_LEVEL: make the interrupt operate like a level triggered
> + * interrupt on guest side. Triggering IRQFD before
> + * guest handles the interrupt causes interrupt to
> + * stay asserted.
> + */
> +enum gh_irqfd_flags {
> + GH_IRQFD_FLAGS_LEVEL = 1UL << 0,
BIT(0), /* ? */
> +};
> +
> +/**
> + * struct gh_fn_irqfd_arg - Arguments to create an irqfd function.
> + *
> + * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_IRQFD.
> + *
> + * Allows setting an eventfd to directly trigger a guest interrupt.
> + * irqfd.fd specifies the file descriptor to use as the eventfd.
> + * irqfd.label corresponds to the doorbell label used in the guest VM's devicetree.
> + *
> + * @fd: an eventfd which when written to will raise a doorbell
> + * @label: Label of the doorbell created on the guest VM
> + * @flags: see &enum gh_irqfd_flags
> + * @padding: padding bytes
> + */
> +struct gh_fn_irqfd_arg {
> + __u32 fd;
> + __u32 label;
> + __u32 flags;
> + __u32 padding;
> +};
> +
> /**
> * struct gh_fn_desc - Arguments to create a VM function
> * @type: Type of the function. See &enum gh_fn_type.
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Qualcomm platforms have a firmware entity which performs access control
> to physical pages. Dynamically started Gunyah virtual machines use the
> QCOM_SCM_RM_MANAGED_VMID for access. Linux thus needs to assign access
> to the memory used by guest VMs. Gunyah doesn't do this operation for us
> since it is the current VM (typically VMID_HLOS) delegating the access
> and not Gunyah itself. Use the Gunyah platform ops to achieve this so
> that only Qualcomm platforms attempt to make the needed SCM calls.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Minor suggestions below. Please consider them, but either way:
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/Kconfig | 13 +++
> drivers/virt/gunyah/Makefile | 1 +
> drivers/virt/gunyah/gunyah_qcom.c | 147 ++++++++++++++++++++++++++++++
> 3 files changed, 161 insertions(+)
> create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
>
> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
> index de815189dab6..0421b751aad4 100644
> --- a/drivers/virt/gunyah/Kconfig
> +++ b/drivers/virt/gunyah/Kconfig
> @@ -5,6 +5,7 @@ config GUNYAH
> depends on ARM64
> depends on MAILBOX
> select GUNYAH_PLATFORM_HOOKS
> + imply GUNYAH_QCOM_PLATFORM if ARCH_QCOM
> help
> The Gunyah drivers are the helper interfaces that run in a guest VM
> such as basic inter-VM IPC and signaling mechanisms, and higher level
> @@ -15,3 +16,15 @@ config GUNYAH
>
> config GUNYAH_PLATFORM_HOOKS
> tristate
> +
> +config GUNYAH_QCOM_PLATFORM
> + tristate "Support for Gunyah on Qualcomm platforms"
> + depends on GUNYAH
> + select GUNYAH_PLATFORM_HOOKS
> + select QCOM_SCM
> + help
> + Enable support for interacting with Gunyah on Qualcomm
> + platforms. Interaction with Qualcomm firmware requires
> + extra platform-specific support.
> +
> + Say Y/M here to use Gunyah on Qualcomm platforms.
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index 4fbeee521d60..2aa9ff038ed0 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,6 +1,7 @@
> # SPDX-License-Identifier: GPL-2.0
>
> obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
> +obj-$(CONFIG_GUNYAH_QCOM_PLATFORM) += gunyah_qcom.o
>
> gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/gunyah_qcom.c b/drivers/virt/gunyah/gunyah_qcom.c
> new file mode 100644
> index 000000000000..18acbda8fcbd
> --- /dev/null
> +++ b/drivers/virt/gunyah/gunyah_qcom.c
> @@ -0,0 +1,147 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/arm-smccc.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/module.h>
> +#include <linux/firmware/qcom/qcom_scm.h>
> +#include <linux/types.h>
> +#include <linux/uuid.h>
> +
> +#define QCOM_SCM_RM_MANAGED_VMID 0x3A
> +#define QCOM_SCM_MAX_MANAGED_VMID 0x3F
Is this limited to 63 because at most 64 VMIDs (0 through 63)
can be represented as bits in a 64-bit unsigned bitmask?
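
If that is the reason, the constraint would look like the standalone sketch below: each VMID selects one bit position in a 64-bit mask, and anything above the maximum is folded onto the RM-managed VMID the way the quoted code does. The names are invented for this sketch and the constants only mirror the two defines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RM_MANAGED_VMID_SKETCH  0x3Au /* mirrors QCOM_SCM_RM_MANAGED_VMID */
#define MAX_MANAGED_VMID_SKETCH 0x3Fu /* mirrors QCOM_SCM_MAX_MANAGED_VMID */

/* VMIDs above the maximum are folded onto the RM-managed VMID. */
static uint16_t effective_vmid(uint16_t vmid)
{
	return vmid <= MAX_MANAGED_VMID_SKETCH ? vmid : RM_MANAGED_VMID_SKETCH;
}

/* Builds a 64-bit mask with one bit per effective VMID (bit positions 0..63). */
static uint64_t vmid_mask(const uint16_t *vmids, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= UINT64_C(1) << effective_vmid(vmids[i]);

	return mask;
}
```

Because the effective VMID never exceeds 63, the shift count always stays within the width of the mask.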
> +
> +static int qcom_scm_gh_rm_pre_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
> +{
> + struct qcom_scm_vmperm *new_perms;
> + u64 src, src_cpy;
> + int ret = 0, i, n;
> + u16 vmid;
> +
> + new_perms = kcalloc(mem_parcel->n_acl_entries, sizeof(*new_perms), GFP_KERNEL);
> + if (!new_perms)
> + return -ENOMEM;
> +
> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
> + new_perms[n].vmid = vmid;
> + else
> + new_perms[n].vmid = QCOM_SCM_RM_MANAGED_VMID;
So any out-of-range VM ID will cause the hunk of memory to
be assigned to the resource manager. Is it expected that
this can occur (and not be an error)?
> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_X)
> + new_perms[n].perm |= QCOM_SCM_PERM_EXEC;
> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_W)
> + new_perms[n].perm |= QCOM_SCM_PERM_WRITE;
> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_R)
> + new_perms[n].perm |= QCOM_SCM_PERM_READ;
> + }
> +
> + src = (1ull << QCOM_SCM_VMID_HLOS);
src = BIT_ULL(QCOM_SCM_VMID_HLOS);
> +
> + for (i = 0; i < mem_parcel->n_mem_entries; i++) {
> + src_cpy = src;
> + ret = qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
> + le64_to_cpu(mem_parcel->mem_entries[i].size),
> + &src_cpy, new_perms, mem_parcel->n_acl_entries);
Loops like this can look simpler if you jump to error handling
at the end that does this unwind activity, rather than incorporating
it inside the loop itself. Or even just breaking if ret != 0, e.g.:
if (ret)
break;
}
if (!ret)
return 0;
/* And do the following block here, "outdented" twice */
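
To show the shape I mean in something that compiles on its own, here is a generic sketch: break out of the loop on the first failure, then unwind the entries already handled after the loop. The callback type, the helper and the small fake backend are all invented for illustration; none of this is the SCM API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-entry operation; a real caller would supply its own. */
typedef int (*entry_op_fn)(size_t index, void *ctx);

/*
 * Applies assign() to entries 0..n-1. On the first failure, stops and
 * calls undo() on the entries already assigned, newest first.
 */
static int assign_all_or_unwind(size_t n, entry_op_fn assign, entry_op_fn undo, void *ctx)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = assign(i, ctx);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Entry i failed, so entries i-1 down to 0 need to be reverted. */
	while (i--)
		(void)undo(i, ctx);

	return ret;
}

/* Tiny fake backend used to exercise the helper. */
struct fake_state {
	size_t fail_at;  /* index whose assign() fails */
	int assigned[8]; /* 1 while an entry is assigned */
};

static int fake_assign(size_t index, void *ctx)
{
	struct fake_state *s = ctx;

	if (index == s->fail_at)
		return -1;
	s->assigned[index] = 1;
	return 0;
}

static int fake_undo(size_t index, void *ctx)
{
	struct fake_state *s = ctx;

	s->assigned[index] = 0;
	return 0;
}
```

The error path sits at one indentation level after the loop, which is the readability gain over unwinding inside the loop body.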
> + if (ret) {
> + src = 0;
> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
> + src |= (1ull << vmid);
src |= BIT_ULL(vmid);
> + else
> + src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
src |= BIT_ULL(QCOM_SCM_RM_MANAGED_VMID);
> + }
> +
> + new_perms[0].vmid = QCOM_SCM_VMID_HLOS;
> +
> + for (i--; i >= 0; i--) {
> + src_cpy = src;
> + WARN_ON_ONCE(qcom_scm_assign_mem(
> + le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
> + le64_to_cpu(mem_parcel->mem_entries[i].size),
> + &src_cpy, new_perms, 1));
> + }
> + break;
> + }
> + }
> +
> + kfree(new_perms);
> + return ret;
> +}
> +
> +static int qcom_scm_gh_rm_post_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *mem_parcel)
> +{
> + struct qcom_scm_vmperm new_perms;
> + u64 src = 0, src_cpy;
> + int ret = 0, i, n;
> + u16 vmid;
> +
> + new_perms.vmid = QCOM_SCM_VMID_HLOS;
> + new_perms.perm = QCOM_SCM_PERM_EXEC | QCOM_SCM_PERM_WRITE | QCOM_SCM_PERM_READ;
> +
> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
> + src |= (1ull << vmid);
> + else
> + src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
> + }
> +
> + for (i = 0; i < mem_parcel->n_mem_entries; i++) {
> + src_cpy = src;
> + ret = qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
> + le64_to_cpu(mem_parcel->mem_entries[i].size),
> + &src_cpy, &new_perms, 1);
> + WARN_ON_ONCE(ret);
> + }
> +
> + return ret;
> +}
> +
> +static struct gh_rm_platform_ops qcom_scm_gh_rm_platform_ops = {
> + .pre_mem_share = qcom_scm_gh_rm_pre_mem_share,
> + .post_mem_reclaim = qcom_scm_gh_rm_post_mem_reclaim,
> +};
> +
> +/* {19bd54bd-0b37-571b-946f-609b54539de6} */
> +static const uuid_t QCOM_EXT_UUID =
> + UUID_INIT(0x19bd54bd, 0x0b37, 0x571b, 0x94, 0x6f, 0x60, 0x9b, 0x54, 0x53, 0x9d, 0xe6);
> +
> +#define GH_QCOM_EXT_CALL_UUID_ID ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \
> + ARM_SMCCC_OWNER_VENDOR_HYP, 0x3f01)
> +
> +static bool gh_has_qcom_extensions(void)
> +{
> + struct arm_smccc_res res;
> + uuid_t uuid;
> +
> + arm_smccc_1_1_smc(GH_QCOM_EXT_CALL_UUID_ID, &res);
> +
> + ((u32 *)&uuid.b[0])[0] = lower_32_bits(res.a0);
> + ((u32 *)&uuid.b[0])[1] = lower_32_bits(res.a1);
> + ((u32 *)&uuid.b[0])[2] = lower_32_bits(res.a2);
> + ((u32 *)&uuid.b[0])[3] = lower_32_bits(res.a3);
I said this elsewhere. I'd rather see:
u32 *u = (u32 *)&uuid; /* Or &uuid.b? */
*u++ = lower_32_bits(res.a0);
. . .
> +
> + return uuid_equal(&uuid, &QCOM_EXT_UUID);
> +}
> +
> +static int __init qcom_gh_platform_hooks_register(void)
> +{
> + if (!gh_has_qcom_extensions())
> + return -ENODEV;
> +
> + return gh_rm_register_platform_ops(&qcom_scm_gh_rm_platform_ops);
> +}
> +
> +static void __exit qcom_gh_platform_hooks_unregister(void)
> +{
> + gh_rm_unregister_platform_ops(&qcom_scm_gh_rm_platform_ops);
> +}
> +
> +module_init(qcom_gh_platform_hooks_register);
> +module_exit(qcom_gh_platform_hooks_unregister);
> +MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Platform Hooks for Gunyah");
> +MODULE_LICENSE("GPL");
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah doorbells allow two virtual machines to signal each other using
> interrupts. Add the hypercalls needed to assert the interrupt.
>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good.
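For anyone following along, here is a toy model of what the two calls appear to manipulate. It is a sketch built on assumptions, not Gunyah's implementation: I assume a send ORs new_flags into the pending flags and hands back the previous value (which matches the old_flags output), and that the receiver's interrupt is asserted while any pending flag is in the enable mask. The ack mask is left out, and all toy_* names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy doorbell state; not Gunyah's implementation. */
struct toy_doorbell {
	uint64_t flags;		/* pending flag bits */
	uint64_t enable_mask;	/* flag bits allowed to raise the interrupt */
};

/* Assumption: sending ORs in the new flags and reports the previous value. */
static uint64_t toy_bell_send(struct toy_doorbell *db, uint64_t new_flags)
{
	uint64_t old_flags = db->flags;

	db->flags |= new_flags;
	return old_flags;
}

/* Assumption: the interrupt is asserted while an enabled flag is pending. */
static bool toy_bell_irq_asserted(const struct toy_doorbell *db)
{
	return (db->flags & db->enable_mask) != 0;
}
```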
Reviewed-by: Alex Elder <[email protected]>
> ---
> arch/arm64/gunyah/gunyah_hypercall.c | 25 +++++++++++++++++++++++++
> include/linux/gunyah.h | 3 +++
> 2 files changed, 28 insertions(+)
>
> diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
> index 5f33f53e05a9..3d48c8650851 100644
> --- a/arch/arm64/gunyah/gunyah_hypercall.c
> +++ b/arch/arm64/gunyah/gunyah_hypercall.c
> @@ -33,6 +33,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
> fn)
>
> #define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
> +#define GH_HYPERCALL_BELL_SEND GH_HYPERCALL(0x8012)
> +#define GH_HYPERCALL_BELL_SET_MASK GH_HYPERCALL(0x8015)
> #define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
> #define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
> #define GH_HYPERCALL_VCPU_RUN GH_HYPERCALL(0x8065)
> @@ -55,6 +57,29 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
> }
> EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
>
> +enum gh_error gh_hypercall_bell_send(u64 capid, u64 new_flags, u64 *old_flags)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_BELL_SEND, capid, new_flags, 0, &res);
> +
> + if (res.a0 == GH_ERROR_OK && old_flags)
> + *old_flags = res.a1;
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_bell_send);
> +
> +enum gh_error gh_hypercall_bell_set_mask(u64 capid, u64 enable_mask, u64 ack_mask)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_BELL_SET_MASK, capid, enable_mask, ack_mask, 0, &res);
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_bell_set_mask);
> +
> enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)
> {
> struct arm_smccc_res res;
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index cd5704a82c6a..1f1685518bf3 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -171,6 +171,9 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
>
> void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
>
> +enum gh_error gh_hypercall_bell_send(u64 capid, u64 new_flags, u64 *old_flags);
> +enum gh_error gh_hypercall_bell_set_mask(u64 capid, u64 enable_mask, u64 ack_mask);
> +
> #define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH BIT(0)
>
> enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
On 5/9/23 3:48 PM, Elliot Berman wrote:
> Allow userspace to attach an ioeventfd to an mmio address within the guest.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good. One question below.
Reviewed-by: Alex Elder <[email protected]>
> ---
> Documentation/virt/gunyah/vm-manager.rst | 2 +-
> drivers/virt/gunyah/Kconfig | 9 ++
> drivers/virt/gunyah/Makefile | 1 +
> drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++++++++++++++++++++++
> include/uapi/linux/gunyah.h | 37 +++++++
> 5 files changed, 178 insertions(+), 1 deletion(-)
> create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
>
> diff --git a/Documentation/virt/gunyah/vm-manager.rst b/Documentation/virt/gunyah/vm-manager.rst
> index c4960948c779..87838c5b5945 100644
> --- a/Documentation/virt/gunyah/vm-manager.rst
> +++ b/Documentation/virt/gunyah/vm-manager.rst
> @@ -115,7 +115,7 @@ the VM *before* the VM starts.
> The argument types are documented below:
>
> .. kernel-doc:: include/uapi/linux/gunyah.h
> - :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
> + :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags gh_fn_ioeventfd_arg gh_ioeventfd_flags
>
> Gunyah VCPU API Descriptions
> ----------------------------
> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
> index bc2c46d9df94..63bebc5b9f82 100644
> --- a/drivers/virt/gunyah/Kconfig
> +++ b/drivers/virt/gunyah/Kconfig
> @@ -48,3 +48,12 @@ config GUNYAH_IRQFD
> on Gunyah virtual machine.
>
> Say Y/M here if unsure and you want to support Gunyah VMMs.
> +
> +config GUNYAH_IOEVENTFD
> + tristate "Gunyah ioeventfd interface"
> + depends on GUNYAH
> + help
> + Enable kernel support for creating ioeventfds which can alert userspace
> + when a Gunyah virtual machine accesses a memory address.
> +
> + Say Y/M here if unsure and you want to support Gunyah VMMs.
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index ad212a1cf967..63ca11e74796 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -8,3 +8,4 @@ obj-$(CONFIG_GUNYAH) += gunyah.o
>
> obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
> obj-$(CONFIG_GUNYAH_IRQFD) += gunyah_irqfd.o
> +obj-$(CONFIG_GUNYAH_IOEVENTFD) += gunyah_ioeventfd.o
> diff --git a/drivers/virt/gunyah/gunyah_ioeventfd.c b/drivers/virt/gunyah/gunyah_ioeventfd.c
> new file mode 100644
> index 000000000000..5b1b9fd9ac3a
> --- /dev/null
> +++ b/drivers/virt/gunyah/gunyah_ioeventfd.c
> @@ -0,0 +1,130 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/eventfd.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/gunyah.h>
> +#include <linux/gunyah_vm_mgr.h>
> +#include <linux/module.h>
> +#include <linux/printk.h>
> +
> +#include <uapi/linux/gunyah.h>
> +
> +struct gh_ioeventfd {
> + struct gh_vm_function_instance *f;
> + struct gh_vm_io_handler io_handler;
> +
> + struct eventfd_ctx *ctx;
> +};
> +
> +static int gh_write_ioeventfd(struct gh_vm_io_handler *io_dev, u64 addr, u32 len, u64 data)
> +{
> + struct gh_ioeventfd *iofd = container_of(io_dev, struct gh_ioeventfd, io_handler);
Does a write of 0 bytes still signal an event?
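To make the question concrete: whether a zero-length write can reach this callback depends on how the VM manager matches faults to handlers, which isn't shown in this patch. A hypothetical predicate, based only on the uapi comments (len of 0 ignores the length, DATAMATCH compares the written value), might look like the sketch below. The toy_* names are mine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical handler description, modelled on gh_fn_ioeventfd_arg. */
struct toy_io_handler {
	uint64_t addr;
	uint32_t len;		/* 1, 2, 4, 8, or 0 to ignore length */
	bool datamatch;
	uint64_t data;
};

/* Returns true when a guest write should signal this handler's eventfd. */
static bool toy_io_handler_matches(const struct toy_io_handler *h,
				   uint64_t addr, uint32_t len, uint64_t data)
{
	if (h->addr != addr)
		return false;
	if (h->len && h->len != len)	/* len == 0 acts as a wildcard */
		return false;
	if (h->datamatch && h->data != data)
		return false;
	return true;
}
```

Under that assumption a handler registered with len 0 would fire for a write of any size, including zero.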
> +
> + eventfd_signal(iofd->ctx, 1);
> + return 0;
> +}
> +
> +static struct gh_vm_io_handler_ops io_ops = {
> + .write = gh_write_ioeventfd,
> +};
> +
> +static long gh_ioeventfd_bind(struct gh_vm_function_instance *f)
> +{
> + const struct gh_fn_ioeventfd_arg *args = f->argp;
> + struct gh_ioeventfd *iofd;
> + struct eventfd_ctx *ctx;
> + int ret;
> +
> + if (f->arg_size != sizeof(*args))
> + return -EINVAL;
> +
> + /* All other flag bits are reserved for future use */
> + if (args->flags & ~GH_IOEVENTFD_FLAGS_DATAMATCH)
> + return -EINVAL;
> +
> + /* must be natural-word sized, or 0 to ignore length */
> + switch (args->len) {
> + case 0:
> + case 1:
> + case 2:
> + case 4:
> + case 8:
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + /* check for range overflow */
> + if (overflows_type(args->addr + args->len, u64))
> + return -EINVAL;
> +
> + /* ioeventfd with no length can't be combined with DATAMATCH */
> + if (!args->len && (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH))
> + return -EINVAL;
> +
> + ctx = eventfd_ctx_fdget(args->fd);
> + if (IS_ERR(ctx))
> + return PTR_ERR(ctx);
> +
> + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL);
> + if (!iofd) {
> + ret = -ENOMEM;
> + goto err_eventfd;
> + }
> +
> + f->data = iofd;
> + iofd->f = f;
> +
> + iofd->ctx = ctx;
> +
> + if (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH) {
> + iofd->io_handler.datamatch = true;
> + iofd->io_handler.len = args->len;
> + iofd->io_handler.data = args->datamatch;
> + }
> + iofd->io_handler.addr = args->addr;
> + iofd->io_handler.ops = &io_ops;
> +
> + ret = gh_vm_add_io_handler(f->ghvm, &iofd->io_handler);
> + if (ret)
> + goto err_io_dev_add;
> +
> + return 0;
> +
> +err_io_dev_add:
> + kfree(iofd);
> +err_eventfd:
> + eventfd_ctx_put(ctx);
> + return ret;
> +}
> +
> +static void gh_ioevent_unbind(struct gh_vm_function_instance *f)
> +{
> + struct gh_ioeventfd *iofd = f->data;
> +
> + eventfd_ctx_put(iofd->ctx);
> + gh_vm_remove_io_handler(iofd->f->ghvm, &iofd->io_handler);
> + kfree(iofd);
> +}
> +
> +static bool gh_ioevent_compare(const struct gh_vm_function_instance *f,
> + const void *arg, size_t size)
> +{
> + const struct gh_fn_ioeventfd_arg *instance = f->argp,
> + *other = arg;
> +
> + if (sizeof(*other) != size)
> + return false;
> +
> + return instance->addr == other->addr;
> +}
> +
> +DECLARE_GH_VM_FUNCTION_INIT(ioeventfd, GH_FN_IOEVENTFD, 3,
> + gh_ioeventfd_bind, gh_ioevent_unbind,
> + gh_ioevent_compare);
> +MODULE_DESCRIPTION("Gunyah ioeventfd VM Function");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
> index 0c480c622686..fa1cae7419d2 100644
> --- a/include/uapi/linux/gunyah.h
> +++ b/include/uapi/linux/gunyah.h
> @@ -79,10 +79,13 @@ struct gh_vm_dtb_config {
> * Return: file descriptor to manipulate the vcpu.
> * @GH_FN_IRQFD: register eventfd to assert a Gunyah doorbell
> * &struct gh_fn_desc.arg is a pointer to &struct gh_fn_irqfd_arg
> + * @GH_FN_IOEVENTFD: register ioeventfd to trigger when VM faults on parameter
> + * &struct gh_fn_desc.arg is a pointer to &struct gh_fn_ioeventfd_arg
> */
> enum gh_fn_type {
> GH_FN_VCPU = 1,
> GH_FN_IRQFD,
> + GH_FN_IOEVENTFD,
> };
>
> #define GH_FN_MAX_ARG_SIZE 256
> @@ -134,6 +137,40 @@ struct gh_fn_irqfd_arg {
> __u32 padding;
> };
>
> +/**
> + * enum gh_ioeventfd_flags - flags for use in gh_fn_ioeventfd_arg
> + * @GH_IOEVENTFD_FLAGS_DATAMATCH: the event will be signaled only if the
> + * written value to the registered address is
> + * equal to &struct gh_fn_ioeventfd_arg.datamatch
> + */
> +enum gh_ioeventfd_flags {
> + GH_IOEVENTFD_FLAGS_DATAMATCH = 1UL << 0,
> +};
> +
> +/**
> + * struct gh_fn_ioeventfd_arg - Arguments to create an ioeventfd function
> + * @datamatch: data used when GH_IOEVENTFD_FLAGS_DATAMATCH is set
> + * @addr: Address in guest memory
> + * @len: Length of access
> + * @fd: When ioeventfd is matched, this eventfd is written
> + * @flags: See &enum gh_ioeventfd_flags
> + * @padding: padding bytes
> + *
> + * Create this function with &GH_VM_ADD_FUNCTION using type &GH_FN_IOEVENTFD.
> + *
> + * Attaches an ioeventfd to a legal mmio address within the guest. A guest write
> + * in the registered address will signal the provided event instead of triggering
> + * an exit on the GH_VCPU_RUN ioctl.
> + */
> +struct gh_fn_ioeventfd_arg {
> + __u64 datamatch;
> + __u64 addr; /* legal mmio address */
> + __u32 len; /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
> + __s32 fd;
> + __u32 flags;
> + __u32 padding;
> +};
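As a sanity check on the layout and the argument rules, here is a small userspace sketch. It mirrors the structure with <stdint.h> types and restates the checks from gh_ioeventfd_bind(). The toy_* names are mine, and the wrap-around test is written as a plain comparison rather than with overflows_type(); treat it as one possible way to express the range check, not as what the driver does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IOEVENTFD_FLAGS_DATAMATCH (1u << 0)

/* Local mirror of struct gh_fn_ioeventfd_arg using <stdint.h> types. */
struct toy_ioeventfd_arg {
	uint64_t datamatch;
	uint64_t addr;
	uint32_t len;
	int32_t fd;
	uint32_t flags;
	uint32_t padding;
};

/* Two 64-bit and four 32-bit fields: 32 bytes, no implicit padding. */
_Static_assert(sizeof(struct toy_ioeventfd_arg) == 32, "unexpected size");
_Static_assert(offsetof(struct toy_ioeventfd_arg, flags) == 24,
	       "unexpected offset");

static bool toy_ioeventfd_arg_valid(const struct toy_ioeventfd_arg *a)
{
	/* all other flag bits are reserved */
	if (a->flags & ~TOY_IOEVENTFD_FLAGS_DATAMATCH)
		return false;

	switch (a->len) {
	case 0: case 1: case 2: case 4: case 8:
		break;
	default:
		return false;
	}

	/* addr + len must not wrap past the end of the 64-bit space */
	if (a->addr > UINT64_MAX - a->len)
		return false;

	/* a length of 0 cannot be combined with DATAMATCH */
	if (!a->len && (a->flags & TOY_IOEVENTFD_FLAGS_DATAMATCH))
		return false;

	return true;
}
```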
> +
> /**
> * struct gh_fn_desc - Arguments to create a VM function
> * @type: Type of the function. See &enum gh_fn_type.
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add hypercalls to send and receive messages on a Gunyah message queue.
>
> Signed-off-by: Elliot Berman <[email protected]>
I continue to dislike the long lines, but aside from that this
looks fine.
Reviewed-by: Alex Elder <[email protected]>
> ---
> arch/arm64/gunyah/gunyah_hypercall.c | 31 ++++++++++++++++++++++++++++
> include/linux/gunyah.h | 6 ++++++
> 2 files changed, 37 insertions(+)
>
> diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
> index 2166d5dab869..2b2a63e9b9e5 100644
> --- a/arch/arm64/gunyah/gunyah_hypercall.c
> +++ b/arch/arm64/gunyah/gunyah_hypercall.c
> @@ -33,6 +33,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
> fn)
>
> #define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
> +#define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
> +#define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
>
> /**
> * gh_hypercall_hyp_identify() - Returns build information and feature flags
> @@ -52,5 +54,34 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
> }
> EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
>
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_SEND, capid, size, (uintptr_t)buff, tx_flags, 0, &res);
> +
> + if (res.a0 == GH_ERROR_OK)
> + *ready = !!res.a1;
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_send);
> +
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> + bool *ready)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_RECV, capid, (uintptr_t)buff, size, 0, &res);
> +
> + if (res.a0 == GH_ERROR_OK) {
> + *recv_size = res.a1;
> + *ready = !!res.a2;
> + }
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_recv);
> +
> MODULE_LICENSE("GPL");
> MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 6b36cf4787ef..01a6f202d037 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -111,4 +111,10 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
>
> void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
>
> +#define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH BIT(0)
> +
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> + bool *ready);
> +
> #endif
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Some VM functions need to acquire Gunyah resources. For instance, Gunyah
> vCPUs are exposed to the host as a resource. The Gunyah vCPU function
> will register a resource ticket and be able to interact with the
> hypervisor once the resource ticket is filled.
>
> Resource tickets are the mechanism for functions to acquire ownership of
> Gunyah resources. Gunyah functions can be created before the VM's
> resources are created and made available to Linux. A resource ticket
> identifies a type of resource and a label of a resource which the ticket
> holder is interested in.
>
> Resources are created by Gunyah as configured in the VM's devicetree
> configuration. Gunyah doesn't process the label and that makes it
> possible for userspace to create multiple resources with the same label.
> Resource ticket owners need to be prepared for populate to be called
> multiple times if userspace created multiple resources with the same
> label.
>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good to me.
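The "populate may be called multiple times" point is worth keeping in mind when writing ticket owners. A minimal sketch of the idea, with simplified made-up types (the real ones are in gunyah_vm_mgr.h and look different):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types for illustration only. */
struct toy_resource {
	uint32_t type;
	uint32_t label;
};

struct toy_ticket {
	uint32_t type;
	uint32_t label;
	int populated;	/* how many times populate ran */
};

/* Owner callback: must tolerate being called more than once. */
static void toy_ticket_populate(struct toy_ticket *t,
				const struct toy_resource *r)
{
	(void)r;
	t->populated++;
}

/* Offer every resource to the ticket; each type+label match fills it. */
static void toy_fill_ticket(struct toy_ticket *t,
			    const struct toy_resource *rsc, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (rsc[i].type == t->type && rsc[i].label == t->label)
			toy_ticket_populate(t, &rsc[i]);
}
```

Two resources that share a type and label both reach the same ticket, so an owner that assumes a single populate call would mishandle the second one.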
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/vm_mgr.c | 117 +++++++++++++++++++++++++++++++++-
> drivers/virt/gunyah/vm_mgr.h | 4 ++
> include/linux/gunyah_vm_mgr.h | 14 ++++
> 3 files changed, 134 insertions(+), 1 deletion(-)
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> The resource manager is a special virtual machine which is always
> running on a Gunyah system. It provides APIs for creating and destroying
> VMs, secure memory management, sharing/lending of memory between VMs,
> and setup of inter-VM communication. Calls to the resource manager are
> made via message queues.
>
> This patch implements the basic probing and RPC mechanism to make those
> API calls. Request/response calls can be made with gh_rm_call.
> Drivers can also register for notifications pushed by RM via
> gh_rm_register_notifier.
>
> Specific API calls that resource manager supports will be implemented in
> subsequent patches.
>
> Signed-off-by: Elliot Berman <[email protected]>
I have some comments below, but none is critical, so whether or not you
address them:
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/Makefile | 1 +
> drivers/virt/gunyah/Makefile | 4 +
> drivers/virt/gunyah/rsc_mgr.c | 702 +++++++++++++++++++++++++++++++++
> drivers/virt/gunyah/rsc_mgr.h | 16 +
> include/linux/gunyah_rsc_mgr.h | 21 +
> 5 files changed, 744 insertions(+)
> create mode 100644 drivers/virt/gunyah/Makefile
> create mode 100644 drivers/virt/gunyah/rsc_mgr.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.h
> create mode 100644 include/linux/gunyah_rsc_mgr.h
>
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index e9aa6fc96fab..a5817e2d7d71 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_ACRN_HSM) += acrn/
> obj-$(CONFIG_EFI_SECRET) += coco/efi_secret/
> obj-$(CONFIG_SEV_GUEST) += coco/sev-guest/
> obj-$(CONFIG_INTEL_TDX_GUEST) += coco/tdx-guest/
> +obj-y += gunyah/
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> new file mode 100644
> index 000000000000..0f5aec834698
> --- /dev/null
> +++ b/drivers/virt/gunyah/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +gunyah-y += rsc_mgr.o
> +obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> new file mode 100644
> index 000000000000..88b5beb1ea51
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -0,0 +1,702 @@
. . .
> +/**
> + * struct gh_rm - private data for communicating w/Gunyah resource manager
> + * @dev: pointer to device
This points to the device structure for the RM platform device.
(Maybe that's clear...)
> + * @tx_ghrsc: message queue resource to TX to RM
> + * @rx_ghrsc: message queue resource to RX from RM
> + * @msgq: mailbox instance of TX/RX resources above
> + * @msgq_client: mailbox client of above msgq
> + * @active_rx_connection: ongoing gh_rm_connection for which we're receiving fragments
> + * @last_tx_ret: return value of last mailbox tx
> + * @call_xarray: xarray to allocate & lookup sequence IDs for Request/Response flows
> + * @next_seq: next ID to allocate (for xa_alloc_cyclic)
> + * @cache: cache for allocating Tx messages
> + * @send_lock: synchronization to allow only one request to be sent at a time
> + * @nh: notifier chain for clients interested in RM notification messages
> + */
> +struct gh_rm {
> + struct device *dev;
> + struct gh_resource tx_ghrsc;
> + struct gh_resource rx_ghrsc;
> + struct gh_msgq msgq;
> + struct mbox_client msgq_client;
> + struct gh_rm_connection *active_rx_connection;
> + int last_tx_ret;
> +
> + struct xarray call_xarray;
> + u32 next_seq;
> +
> + struct kmem_cache *cache;
> + struct mutex send_lock;
> + struct blocking_notifier_head nh;
> +};
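For readers unfamiliar with the call_xarray/next_seq pairing: the kernel-doc says sequence IDs are allocated cyclically (xa_alloc_cyclic). A tiny userspace model of that allocation policy follows. It is only a sketch with made-up names and a deliberately small ID space; it does not use the xarray API.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SEQ 8	/* tiny ID space so wrap-around is easy to see */

struct toy_seq_table {
	bool in_use[TOY_MAX_SEQ];
	uint32_t next_seq;	/* where the next search starts */
};

/* Returns the allocated ID, or -1 when every ID is busy. */
static int toy_seq_alloc(struct toy_seq_table *t)
{
	uint32_t i;

	for (i = 0; i < TOY_MAX_SEQ; i++) {
		uint32_t id = (t->next_seq + i) % TOY_MAX_SEQ;

		if (!t->in_use[id]) {
			t->in_use[id] = true;
			t->next_seq = (id + 1) % TOY_MAX_SEQ;
			return (int)id;
		}
	}
	return -1;
}

static void toy_seq_free(struct toy_seq_table *t, uint32_t id)
{
	t->in_use[id] = false;
}
```

The point of the cyclic search is that a just-freed ID is not handed out again right away, which makes a stale reply less likely to match a new request.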
> +
> +/**
> + * gh_rm_remap_error() - Remap Gunyah resource manager errors into a Linux error code
> + * @rm_error: "Standard" return value from Gunyah resource manager
> + */
> +static inline int gh_rm_remap_error(enum gh_rm_error rm_error)
I suggested something similar last time. You are operating
on an rm_error value, so I would call this gh_rm_error_remap().
> +{
> + switch (rm_error) {
> + case GH_RM_ERROR_OK:
> + return 0;
> + case GH_RM_ERROR_UNIMPLEMENTED:
> + return -EOPNOTSUPP;
> + case GH_RM_ERROR_NOMEM:
> + return -ENOMEM;
> + case GH_RM_ERROR_NORESOURCE:
> + return -ENODEV;
> + case GH_RM_ERROR_DENIED:
> + return -EPERM;
> + case GH_RM_ERROR_BUSY:
> + return -EBUSY;
> + case GH_RM_ERROR_INVALID:
> + case GH_RM_ERROR_ARGUMENT_INVALID:
> + case GH_RM_ERROR_HANDLE_INVALID:
> + case GH_RM_ERROR_VALIDATE_FAILED:
> + case GH_RM_ERROR_MAP_FAILED:
> + case GH_RM_ERROR_MEM_INVALID:
> + case GH_RM_ERROR_MEM_INUSE:
> + case GH_RM_ERROR_MEM_RELEASED:
> + case GH_RM_ERROR_VMID_INVALID:
> + case GH_RM_ERROR_LOOKUP_FAILED:
> + case GH_RM_ERROR_IRQ_INVALID:
> + case GH_RM_ERROR_IRQ_INUSE:
> + case GH_RM_ERROR_IRQ_RELEASED:
> + return -EINVAL;
> + default:
> + return -EBADMSG;
> + }
> +}
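To illustrate the naming I suggested above, here is a cut-down, self-contained version. The enum is a hypothetical subset with placeholder values (not the RM ABI); only the function name and the switch shape are the point.

```c
#include <errno.h>

/* Hypothetical subset; numeric values are placeholders, not the RM ABI. */
enum toy_rm_error {
	TOY_RM_ERROR_OK,
	TOY_RM_ERROR_NOMEM,
	TOY_RM_ERROR_DENIED,
	TOY_RM_ERROR_BUSY,
	TOY_RM_ERROR_ARGUMENT_INVALID,
};

/* Named object-then-verb: the function operates on an rm_error value. */
static inline int toy_rm_error_remap(enum toy_rm_error rm_error)
{
	switch (rm_error) {
	case TOY_RM_ERROR_OK:
		return 0;
	case TOY_RM_ERROR_NOMEM:
		return -ENOMEM;
	case TOY_RM_ERROR_DENIED:
		return -EPERM;
	case TOY_RM_ERROR_BUSY:
		return -EBUSY;
	case TOY_RM_ERROR_ARGUMENT_INVALID:
		return -EINVAL;
	default:
		/* anything unknown is treated as a malformed message */
		return -EBADMSG;
	}
}
```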
. . .
> +static void gh_rm_process_rply(struct gh_rm *rm, void *msg, size_t msg_size)
> +{
> + struct gh_rm_rpc_reply_hdr *reply_hdr = msg;
> + struct gh_rm_connection *connection;
> + u16 seq_id;
> +
> + seq_id = le16_to_cpu(reply_hdr->hdr.seq);
> + connection = xa_load(&rm->call_xarray, seq_id);
> +
> + if (!connection || connection->msg_id != reply_hdr->hdr.msg_id)
> + return;
Does either of the above conditions warrant a warning when
it occurs? Or are both expected to be possible, and harmless
when handled this way?
> +
> + if (rm->active_rx_connection)
> + gh_rm_abort_connection(rm);
> +
> + if (gh_rm_init_connection_payload(connection, msg, sizeof(*reply_hdr), msg_size)) {
> + dev_err(rm->dev, "Failed to alloc connection buffer for sequence %d\n", seq_id);
> + /* Send connection complete and error the client. */
> + connection->reply.ret = -ENOMEM;
> + complete(&connection->reply.seq_done);
> + return;
> + }
> +
> + connection->reply.rm_error = le32_to_cpu(reply_hdr->err_code);
> + rm->active_rx_connection = connection;
> +}
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add remaining ioctls to support non-proxy VM boot:
>
> - Gunyah Resource Manager uses the VM's devicetree to configure the
> virtual machine. The location of the devicetree in the guest's
> virtual memory can be declared via the SET_DTB_CONFIG ioctl.
> - Trigger start of the virtual machine with VM_START ioctl.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
I point out a spelling error, but otherwise this looks OK to me.
Reviewed-by: Alex Elder <[email protected]>
> ---
> drivers/virt/gunyah/vm_mgr.c | 215 ++++++++++++++++++++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 11 ++
> drivers/virt/gunyah/vm_mgr_mm.c | 20 +++
> include/uapi/linux/gunyah.h | 15 +++
> 4 files changed, 261 insertions(+)
>
. . .
> +static int gh_vm_ensure_started(struct gh_vm *ghvm)
> +{
> + int ret;
> +
> + ret = down_read_interruptible(&ghvm->status_lock);
> + if (ret)
> + return ret;
> +
> + /* Unlikely because VM is typically started */
> + if (unlikely(ghvm->vm_status == GH_RM_VM_STATUS_NO_STATE)) {
> + up_read(&ghvm->status_lock);
> + ret = gh_vm_start(ghvm);
> + if (ret)
> + return ret;
> + /** gh_vm_start() is guaranteed to bring status out of
> + * GH_RM_VM_STATUS_LOAD, thus inifitely recursive call is not
s/inifitely/infinitely/
> + * possible
> + */
> + return gh_vm_ensure_started(ghvm);
> + }
> +
> + /* Unlikely because VM is typically running */
> + if (unlikely(ghvm->vm_status != GH_RM_VM_STATUS_RUNNING))
> + ret = -ENODEV;
> +
> + up_read(&ghvm->status_lock);
> + return ret;
> +}
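The termination argument in that comment can be checked with a small model. This is a userspace sketch with made-up names and no locking; it assumes, as the comment states, that the start routine always moves the status away from the "no state" value, whether or not it succeeds.

```c
#include <errno.h>

enum toy_vm_status { TOY_VM_NO_STATE, TOY_VM_RUNNING, TOY_VM_INIT_FAILED };

struct toy_vm {
	enum toy_vm_status status;
	int start_result;	/* what the simulated start should return */
	int depth;		/* deepest recursion level observed */
};

/* Always moves the VM out of NO_STATE, whether or not the start works. */
static int toy_vm_start(struct toy_vm *vm)
{
	vm->status = vm->start_result ? TOY_VM_INIT_FAILED : TOY_VM_RUNNING;
	return vm->start_result;
}

static int toy_vm_ensure_started(struct toy_vm *vm, int depth)
{
	int ret;

	if (depth > vm->depth)
		vm->depth = depth;

	if (vm->status == TOY_VM_NO_STATE) {
		ret = toy_vm_start(vm);
		if (ret)
			return ret;
		/* status has left NO_STATE, so this recurses at most once */
		return toy_vm_ensure_started(vm, depth + 1);
	}

	return vm->status == TOY_VM_RUNNING ? 0 : -ENODEV;
}
```

In this model the recursion never goes deeper than one level, which is the property the comment relies on.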
> +
> static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> {
> struct gh_vm *ghvm = filp->private_data;
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Gunyah VM manager is a kernel module which exposes an interface to
> Gunyah userspace to load, run, and interact with other Gunyah virtual
> machines. The interface is a character device at /dev/gunyah.
>
> Add a basic VM manager driver. Upcoming patches will add more ioctls
> into this driver.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
I have a couple of comments, but regardless of how you respond
to them:
Reviewed-by: Alex Elder <[email protected]>
> ---
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/rsc_mgr.c | 50 +++++++++-
> drivers/virt/gunyah/vm_mgr.c | 93 +++++++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 20 ++++
> include/uapi/linux/gunyah.h | 23 +++++
> 6 files changed, 187 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virt/gunyah/vm_mgr.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.h
> create mode 100644 include/uapi/linux/gunyah.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 176e8fc3f31b..396212e88f7d 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -137,6 +137,7 @@ Code Seq# Include File Comments
> 'F' DD video/sstfb.h conflict!
> 'G' 00-3F drivers/misc/sgi-gru/grulib.h conflict!
> 'G' 00-0F xen/gntalloc.h, xen/gntdev.h conflict!
> +'G' 00-0f linux/gunyah.h conflict!
The existing pattern throughout this file is to use capital A-F,
so I would follow that here.
Sort of related: I prefer lower-case a-f in hexadecimal
numbers in code, and you use capitals (at least some of the
time).
> 'H' 00-7F linux/hiddev.h conflict!
> 'H' 00-0F linux/hidraw.h conflict!
> 'H' 01 linux/mei.h conflict!
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index 241bab357b86..e47e25895299 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -gunyah-y += rsc_mgr.o rsc_mgr_rpc.o
> +gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> index 88b5beb1ea51..4f6f96bdcf3d 100644
> --- a/drivers/virt/gunyah/rsc_mgr.c
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -15,8 +15,10 @@
> #include <linux/completion.h>
> #include <linux/gunyah_rsc_mgr.h>
> #include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
>
> #include "rsc_mgr.h"
> +#include "vm_mgr.h"
>
> #define RM_RPC_API_VERSION_MASK GENMASK(3, 0)
> #define RM_RPC_HEADER_WORDS_MASK GENMASK(7, 4)
> @@ -130,6 +132,7 @@ struct gh_rm_connection {
> * @cache: cache for allocating Tx messages
> * @send_lock: synchronization to allow only one request to be sent at a time
> * @nh: notifier chain for clients interested in RM notification messages
> + * @miscdev: /dev/gunyah
> */
> struct gh_rm {
> struct device *dev;
> @@ -146,6 +149,8 @@ struct gh_rm {
> struct kmem_cache *cache;
> struct mutex send_lock;
> struct blocking_notifier_head nh;
> +
> + struct miscdevice miscdev;
> };
>
> /**
> @@ -581,6 +586,33 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
> }
> EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
>
> +struct device *gh_rm_get(struct gh_rm *rm)
> +{
> + return get_device(rm->miscdev.this_device);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_get);
> +
> +void gh_rm_put(struct gh_rm *rm)
> +{
> + put_device(rm->miscdev.this_device);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_put);
> +
> +static long gh_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> +{
> + struct miscdevice *miscdev = filp->private_data;
> + struct gh_rm *rm = container_of(miscdev, struct gh_rm, miscdev);
> +
> + return gh_dev_vm_mgr_ioctl(rm, cmd, arg);
> +}
> +
> +static const struct file_operations gh_dev_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = gh_dev_ioctl,
> + .compat_ioctl = compat_ptr_ioctl,
> + .llseek = noop_llseek,
> +};
> +
> static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
> struct gh_resource *ghrsc)
> {
> @@ -665,7 +697,22 @@ static int gh_rm_drv_probe(struct platform_device *pdev)
> rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
> rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
>
> - return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> + ret = gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> + if (ret)
> + goto err_cache;
> +
> + rm->miscdev.name = "gunyah";
> + rm->miscdev.minor = MISC_DYNAMIC_MINOR;
> + rm->miscdev.fops = &gh_dev_fops;
> +
> + ret = misc_register(&rm->miscdev);
> + if (ret)
> + goto err_msgq;
> +
> + return 0;
> +err_msgq:
> + mbox_free_channel(gh_msgq_chan(&rm->msgq));
I'm sure I've said this before. I find it strange that you need
to call mbox_free_channel() here, when it's not obvious where the
client got bound to any mbox channel. It seems like freeing the
channel should happen inside gh_msgq_remove(). But... perhaps
you previously explained to me why it's done this way.
> + gh_msgq_remove(&rm->msgq);
> err_cache:
> kmem_cache_destroy(rm->cache);
> return ret;
> @@ -675,6 +722,7 @@ static int gh_rm_drv_remove(struct platform_device *pdev)
> {
> struct gh_rm *rm = platform_get_drvdata(pdev);
>
> + misc_deregister(&rm->miscdev);
> mbox_free_channel(gh_msgq_chan(&rm->msgq));
> gh_msgq_remove(&rm->msgq);
> kmem_cache_destroy(rm->cache);
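To show why I keep coming back to the channel free: if removing the message queue also released its channel, each error label would undo exactly one setup step. A userspace sketch of that unwind shape follows. Everything in it is hypothetical (the step letters, fail_at, the toy_* names); it only records the order of setup and teardown.

```c
#include <errno.h>
#include <string.h>

/* Records setup ('C', 'Q', 'M') and teardown ('q', 'c') steps in order. */
struct toy_probe_log {
	char steps[16];
};

static void toy_log(struct toy_probe_log *log, char step)
{
	size_t n = strlen(log->steps);

	if (n + 1 < sizeof(log->steps))
		log->steps[n] = step;
}

/* fail_at: 0 = succeed, 1 = cache, 2 = message queue, 3 = misc device */
static int toy_probe(struct toy_probe_log *log, int fail_at)
{
	int ret;

	memset(log, 0, sizeof(*log));

	if (fail_at == 1)
		return -ENOMEM;
	toy_log(log, 'C');		/* cache created */

	if (fail_at == 2) {
		ret = -EIO;
		goto err_cache;
	}
	toy_log(log, 'Q');		/* message queue initialised */

	if (fail_at == 3) {
		ret = -EBUSY;
		goto err_msgq;
	}
	toy_log(log, 'M');		/* misc device registered */

	return 0;

err_msgq:
	toy_log(log, 'q');		/* one call releases channel and queue */
err_cache:
	toy_log(log, 'c');		/* cache destroyed */
	return ret;
}
```

A failure at the misc device step logs "CQqc": teardown mirrors setup in reverse, one label per step.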
. . .
On 5/9/23 3:47 PM, Elliot Berman wrote:
> When Linux is booted as a guest under the Gunyah hypervisor, the Gunyah
> Resource Manager applies a devicetree overlay describing the virtual
> platform configuration of the guest VM, such as the message queue
> capability IDs for communicating with the Resource Manager. This
> information is not otherwise discoverable by a VM: the Gunyah hypervisor
> core does not provide a direct interface to discover capability IDs nor
> a way to communicate with RM without having already known the
> corresponding message queue capability ID. Add the DT bindings that
> Gunyah adheres to for the hypervisor node and message queues.
>
> Reviewed-by: Rob Herring <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Rob has reviewed this so I presume it's fine, but I wonder
why there's no "qcom," prefix on the compatible strings.
It also seems like there might be more bindings to define
for Gunyah. See a few more comments below.
-Alex
> ---
> .../bindings/firmware/gunyah-hypervisor.yaml | 82 +++++++++++++++++++
> 1 file changed, 82 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
>
> diff --git a/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml b/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
> new file mode 100644
> index 000000000000..3fc0b043ac3c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
> @@ -0,0 +1,82 @@
> +# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/firmware/gunyah-hypervisor.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Gunyah Hypervisor
> +
> +maintainers:
> + - Prakruthi Deepak Heragu <[email protected]>
> + - Elliot Berman <[email protected]>
> +
> +description: |+
> + Gunyah virtual machines use this information to determine the capability IDs
> + of the message queues used to communicate with the Gunyah Resource Manager.
> + See also: https://github.com/quic/gunyah-resource-manager/blob/develop/src/vm_creation/dto_construct.c
Looking at dto_create_msg_queue() at the above link, it seems
that Gunyah message queues and capabilities get represented
in DTS, but I don't see those things documented in bindings.
I might be misinterpreting that code, but if not, should
these other things be documented as well?
> +
> +properties:
> + compatible:
> + const: gunyah-hypervisor
Should this be qcom,gunyah-hypervisor?
> +
> + "#address-cells":
> + description: Number of cells needed to represent 64-bit capability IDs.
> + const: 2
> +
> + "#size-cells":
> + description: must be 0, because capability IDs are not memory address
> + ranges and do not have a size.
> + const: 0
> +
> +patternProperties:
> + "^gunyah-resource-mgr(@.*)?":
> + type: object
> + description:
> + Resource Manager node which is required to communicate to Resource
> + Manager VM using Gunyah Message Queues.
> +
> + properties:
> + compatible:
> + const: gunyah-resource-manager
Here too, should this be qcom,gunyah-resource-manager?
> +
> + reg:
> + items:
> + - description: Gunyah capability ID of the TX message queue
> + - description: Gunyah capability ID of the RX message queue
> +
> + interrupts:
> + items:
> + - description: Interrupt for the TX message queue
> + - description: Interrupt for the RX message queue
> +
> + additionalProperties: false
> +
> + required:
> + - compatible
> + - reg
> + - interrupts
> +
> +additionalProperties: false
> +
> +required:
> + - compatible
> + - "#address-cells"
> + - "#size-cells"
> +
> +examples:
> + - |
> + #include <dt-bindings/interrupt-controller/arm-gic.h>
> +
> + hypervisor {
> + #address-cells = <2>;
> + #size-cells = <0>;
> + compatible = "gunyah-hypervisor";
> +
> + gunyah-resource-mgr@0 {
> + compatible = "gunyah-resource-manager";
> + interrupts = <GIC_SPI 3 IRQ_TYPE_EDGE_RISING>, /* TX full IRQ */
> + <GIC_SPI 4 IRQ_TYPE_EDGE_RISING>; /* RX empty IRQ */
> + reg = <0x00000000 0x00000000>, <0x00000000 0x00000001>;
> + /* TX, RX cap ids */
> + };
> + };
On 5/9/23 3:48 PM, Elliot Berman wrote:
> Add myself and Prakruthi as maintainers of Gunyah hypervisor drivers.
>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
Looks good.
Reviewed-by: Alex Elder <[email protected]>
> ---
> MAINTAINERS | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index c754befb94e7..323391320cf1 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -8970,6 +8970,19 @@ L: [email protected]
> S: Maintained
> F: block/partitions/efi.*
>
> +GUNYAH HYPERVISOR DRIVER
> +M: Elliot Berman <[email protected]>
> +M: Prakruthi Deepak Heragu <[email protected]>
> +L: [email protected]
> +S: Supported
> +F: Documentation/devicetree/bindings/firmware/gunyah-hypervisor.yaml
> +F: Documentation/virt/gunyah/
> +F: arch/arm64/gunyah/
> +F: drivers/mailbox/gunyah-msgq.c
> +F: drivers/virt/gunyah/
> +F: include/linux/gunyah*.h
> +F: samples/gunyah/
> +
> HABANALABS PCI DRIVER
> M: Oded Gabbay <[email protected]>
> L: [email protected]
On 5/9/23 3:47 PM, Elliot Berman wrote:
> Add a sample Gunyah VMM capable of launching a non-proxy scheduled VM.
>
> Signed-off-by: Elliot Berman <[email protected]>
I haven't tested this, but I trust it works.
I have some trivial comments, but otherwise:
Reviewed-by: Alex Elder <[email protected]>
> ---
> samples/Kconfig | 10 ++
> samples/Makefile | 1 +
> samples/gunyah/.gitignore | 2 +
> samples/gunyah/Makefile | 6 +
> samples/gunyah/gunyah_vmm.c | 270 +++++++++++++++++++++++++++++++++++
> samples/gunyah/sample_vm.dts | 68 +++++++++
> 6 files changed, 357 insertions(+)
> create mode 100644 samples/gunyah/.gitignore
> create mode 100644 samples/gunyah/Makefile
> create mode 100644 samples/gunyah/gunyah_vmm.c
> create mode 100644 samples/gunyah/sample_vm.dts
>
> diff --git a/samples/Kconfig b/samples/Kconfig
> index b2db430bd3ff..567c7a706c01 100644
> --- a/samples/Kconfig
> +++ b/samples/Kconfig
> @@ -280,6 +280,16 @@ config SAMPLE_KMEMLEAK
> Build a sample program which have explicitly leaks memory to test
> kmemleak
>
> +config SAMPLE_GUNYAH
> + bool "Build example Gunyah Virtual Machine Manager"
> + depends on CC_CAN_LINK && HEADERS_INSTALL
> + depends on GUNYAH
> + help
> + Build an example Gunyah VMM userspace program capable of launching
> + a basic virtual machine under the Gunyah hypervisor.
> + This demonstrates how to create a virtual machine under the Gunyah
> + hypervisor.
I think you can drop the second sentence above. Perhaps adjust the
first a bit if you think the second adds anything important.
> +
> source "samples/rust/Kconfig"
>
> endif # SAMPLES
> diff --git a/samples/Makefile b/samples/Makefile
> index 7727f1a0d6d1..e1b92dec169f 100644
> --- a/samples/Makefile
> +++ b/samples/Makefile
> @@ -37,3 +37,4 @@ obj-$(CONFIG_SAMPLE_KMEMLEAK) += kmemleak/
> obj-$(CONFIG_SAMPLE_CORESIGHT_SYSCFG) += coresight/
> obj-$(CONFIG_SAMPLE_FPROBE) += fprobe/
> obj-$(CONFIG_SAMPLES_RUST) += rust/
> +obj-$(CONFIG_SAMPLE_GUNYAH) += gunyah/
> diff --git a/samples/gunyah/.gitignore b/samples/gunyah/.gitignore
> new file mode 100644
> index 000000000000..adc7d1589fde
> --- /dev/null
> +++ b/samples/gunyah/.gitignore
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +/gunyah_vmm
> diff --git a/samples/gunyah/Makefile b/samples/gunyah/Makefile
> new file mode 100644
> index 000000000000..faf14f9bb337
> --- /dev/null
> +++ b/samples/gunyah/Makefile
> @@ -0,0 +1,6 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +
> +userprogs-always-y += gunyah_vmm
> +dtb-y += sample_vm.dtb
> +
> +userccflags += -I usr/include
> diff --git a/samples/gunyah/gunyah_vmm.c b/samples/gunyah/gunyah_vmm.c
> new file mode 100644
> index 000000000000..d0eb49e86372
> --- /dev/null
> +++ b/samples/gunyah/gunyah_vmm.c
> @@ -0,0 +1,270 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
Update the copyright.
> + */
> +
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <fcntl.h>
> +#include <sys/ioctl.h>
> +#include <getopt.h>
> +#include <limits.h>
> +#include <stdint.h>
> +#include <fcntl.h>
> +#include <string.h>
> +#include <sys/sysmacros.h>
> +#define __USE_GNU
> +#include <sys/mman.h>
> +
> +#include <linux/gunyah.h>
> +
> +struct vm_config {
> + int image_fd;
> + int dtb_fd;
> + int ramdisk_fd;
> +
> + uint64_t guest_base;
> + uint64_t guest_size;
> +
> + uint64_t image_offset;
> + off_t image_size;
> + uint64_t dtb_offset;
> + off_t dtb_size;
> + uint64_t ramdisk_offset;
> + off_t ramdisk_size;
> +};
> +
> +static struct option options[] = {
> + { "help", no_argument, NULL, 'h' },
> + { "image", required_argument, NULL, 'i' },
> + { "dtb", required_argument, NULL, 'd' },
> + { "ramdisk", optional_argument, NULL, 'r' },
> + { "base", optional_argument, NULL, 'B' },
> + { "size", optional_argument, NULL, 'S' },
> + { "image_offset", optional_argument, NULL, 'I' },
> + { "dtb_offset", optional_argument, NULL, 'D' },
> + { "ramdisk_offset", optional_argument, NULL, 'R' },
> + { }
> +};
> +
> +static void print_help(char *cmd)
> +{
> + printf("gunyah_vmm, a sample tool to launch Gunyah VMs\n"
> + "Usage: %s <options>\n"
> + " --help, -h this menu\n"
> + " --image, -i <image> VM image file to load (e.g. a kernel Image) [Required]\n"
> + " --dtb, -d <dtb> Devicetree file to load [Required]\n"
> + " --ramdisk, -r <ramdisk> Ramdisk file to load\n"
> + " --base, -B <address> Set the base address of guest's memory [Default: 0x80000000]\n"
> + " --size, -S <number> The number of bytes large to make the guest's memory [Default: 0x6400000 (100 MB)]\n"
> + " --image_offset, -I <number> Offset into guest memory to load the VM image file [Default: 0x10000]\n"
> + " --dtb_offset, -D <number> Offset into guest memory to load the DTB [Default: 0]\n"
> + " --ramdisk_offset, -R <number> Offset into guest memory to load a ramdisk [Default: 0x4600000]\n"
> + , cmd);
You could define the default values above with symbolic constants,
and print them with 0x%08x in the messages above (or something
similar).
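Something along these lines is what I have in mind. The macro and
helper names are made up for illustration, and the values are the
ones from the initializer in main(). Sharing one set of constants
would also keep the help text and the initializer from disagreeing
(as posted, they list different defaults for the image and DTB
offsets).

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names; values copied from the initializer in main(). */
#define GUNYAH_VMM_DEFAULT_BASE           UINT64_C(0x80000000)
#define GUNYAH_VMM_DEFAULT_SIZE           UINT64_C(0x6400000)
#define GUNYAH_VMM_DEFAULT_RAMDISK_OFFSET UINT64_C(0x4600000)

/*
 * Format one help line for an option and its default value.
 * Returns whatever snprintf() returns.
 */
static int format_default_line(char *buf, size_t len, const char *opt,
			       uint64_t val)
{
	return snprintf(buf, len, "  %s [Default: 0x%08" PRIx64 "]", opt, val);
}
```

print_help() could then build each line from the same constant that
main() uses to fill in struct vm_config.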
> +}
> +
> +int main(int argc, char **argv)
> +{
> + int gunyah_fd, vm_fd, guest_fd;
> + struct gh_userspace_memory_region guest_mem_desc = { 0 };
> + struct gh_vm_dtb_config dtb_config = { 0 };
> + char *guest_mem;
> + struct vm_config config = {
> + /* Defaults good enough to boot static kernel and a basic ramdisk */
> + .ramdisk_fd = -1,
> + .guest_base = 0x80000000,
> + .guest_size = 0x6400000, /* 100 MB */
> + .image_offset = 0,
> + .dtb_offset = 0x45f0000,
> + .ramdisk_offset = 0x4600000, /* put at +70MB (30MB for ramdisk) */
> + };
> + struct stat st;
> + int opt, optidx, ret = 0;
> + long l;
> +
> + while ((opt = getopt_long(argc, argv, "hi:d:r:B:S:I:D:R:c:", options, &optidx)) != -1) {
> + switch (opt) {
> + case 'i':
> + config.image_fd = open(optarg, O_RDONLY | O_CLOEXEC);
> + if (config.image_fd < 0) {
> + perror("Failed to open image");
> + return -1;
> + }
> + if (stat(optarg, &st) < 0) {
> + perror("Failed to stat image");
> + return -1;
> + }
> + config.image_size = st.st_size;
> + break;
> + case 'd':
> + config.dtb_fd = open(optarg, O_RDONLY | O_CLOEXEC);
> + if (config.dtb_fd < 0) {
> + perror("Failed to open dtb");
> + return -1;
> + }
> + if (stat(optarg, &st) < 0) {
> + perror("Failed to stat dtb");
> + return -1;
> + }
> + config.dtb_size = st.st_size;
> + break;
> + case 'r':
> + config.ramdisk_fd = open(optarg, O_RDONLY | O_CLOEXEC);
> + if (config.ramdisk_fd < 0) {
> + perror("Failed to open ramdisk");
> + return -1;
> + }
> + if (stat(optarg, &st) < 0) {
> + perror("Failed to stat ramdisk");
> + return -1;
> + }
> + config.ramdisk_size = st.st_size;
> + break;
> + case 'B':
> + l = strtol(optarg, NULL, 0);
> + if (l == LONG_MIN) {
> + perror("Failed to parse base address");
> + return -1;
> + }
> + config.guest_base = l;
> + break;
> + case 'S':
> + l = strtol(optarg, NULL, 0);
> + if (l == LONG_MIN) {
> + perror("Failed to parse memory size");
> + return -1;
> + }
> + config.guest_size = l;
> + break;
> + case 'I':
> + l = strtol(optarg, NULL, 0);
> + if (l == LONG_MIN) {
> + perror("Failed to parse image offset");
> + return -1;
> + }
> + config.image_offset = l;
> + break;
> + case 'D':
> + l = strtol(optarg, NULL, 0);
> + if (l == LONG_MIN) {
> + perror("Failed to parse dtb offset");
> + return -1;
> + }
> + config.dtb_offset = l;
> + break;
> + case 'R':
> + l = strtol(optarg, NULL, 0);
> + if (l == LONG_MIN) {
> + perror("Failed to parse ramdisk offset");
> + return -1;
> + }
> + config.ramdisk_offset = l;
> + break;
> + case 'h':
> + print_help(argv[0]);
> + return 0;
> + default:
> + print_help(argv[0]);
> + return -1;
> + }
> + }
> +
> + if (!config.image_fd || !config.dtb_fd) {
I *think* open() can return 0 for config.image_fd if stdin is
closed when this is run. I might be wrong though, it's been quite
a while... In any case, to guarantee this works correctly,
these should be initialized to -1 (as you do for ramdisk_fd).
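A rough sketch of what I mean, with a hypothetical struct that holds
only the descriptors (in the patch they live in struct vm_config).
Treating any negative value as "not provided" means descriptor 0
is accepted like any other.

```c
#include <stdbool.h>

/* Hypothetical subset of struct vm_config: just the descriptors. */
struct vm_fds {
	int image_fd;
	int dtb_fd;
	int ramdisk_fd;
};

/* Mark every descriptor as "not provided"; 0 is a valid descriptor. */
static void vm_fds_init(struct vm_fds *fds)
{
	fds->image_fd = -1;
	fds->dtb_fd = -1;
	fds->ramdisk_fd = -1;
}

/* The image and DTB are required; the ramdisk is optional. */
static bool vm_fds_have_required(const struct vm_fds *fds)
{
	return fds->image_fd >= 0 && fds->dtb_fd >= 0;
}
```

The same ">= 0" test would then apply wherever the ramdisk
descriptor is checked later on.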
> + print_help(argv[0]);
> + return -1;
> + }
> +
> + if (config.image_offset + config.image_size > config.guest_size) {
> + fprintf(stderr, "Image offset and size puts it outside guest memory. Make image smaller or increase guest memory size.\n");
> + return -1;
> + }
> +
> + if (config.dtb_offset + config.dtb_size > config.guest_size) {
> + fprintf(stderr, "DTB offset and size puts it outside guest memory. Make dtb smaller or increase guest memory size.\n");
> + return -1;
> + }
> +
> + if (config.ramdisk_fd == -1 &&
> + config.ramdisk_offset + config.ramdisk_size > config.guest_size) {
> + fprintf(stderr, "Ramdisk offset and size puts it outside guest memory. Make ramdisk smaller or increase guest memory size.\n");
> + return -1;
> + }
> +
> + gunyah_fd = open("/dev/gunyah", O_RDWR | O_CLOEXEC);
> + if (gunyah_fd < 0) {
> + perror("Failed to open /dev/gunyah");
> + return -1;
> + }
> +
> + vm_fd = ioctl(gunyah_fd, GH_CREATE_VM, 0);
> + if (vm_fd < 0) {
> + perror("Failed to create vm");
> + return -1;
> + }
> +
> + guest_fd = memfd_create("guest_memory", MFD_CLOEXEC);
> + if (guest_fd < 0) {
> + perror("Failed to create guest memfd");
> + return -1;
> + }
> +
> + if (ftruncate(guest_fd, config.guest_size) < 0) {
> + perror("Failed to grow guest memory");
> + return -1;
> + }
> +
> + guest_mem = mmap(NULL, config.guest_size, PROT_READ | PROT_WRITE, MAP_SHARED, guest_fd, 0);
> + if (guest_mem == MAP_FAILED) {
> + perror("Not enough memory");
> + return -1;
> + }
> +
> + if (read(config.image_fd, guest_mem + config.image_offset, config.image_size) < 0) {
> + perror("Failed to read image into guest memory");
> + return -1;
> + }
> +
> + if (read(config.dtb_fd, guest_mem + config.dtb_offset, config.dtb_size) < 0) {
> + perror("Failed to read dtb into guest memory");
> + return -1;
> + }
> +
> + if (config.ramdisk_fd > 0 &&
> + read(config.ramdisk_fd, guest_mem + config.ramdisk_offset,
> + config.ramdisk_size) < 0) {
> + perror("Failed to read ramdisk into guest memory");
> + return -1;
> + }
> +
> + guest_mem_desc.label = 0;
> + guest_mem_desc.flags = GH_MEM_ALLOW_READ | GH_MEM_ALLOW_WRITE | GH_MEM_ALLOW_EXEC;
> + guest_mem_desc.guest_phys_addr = config.guest_base;
> + guest_mem_desc.memory_size = config.guest_size;
> + guest_mem_desc.userspace_addr = (__u64)guest_mem;
> +
> + if (ioctl(vm_fd, GH_VM_SET_USER_MEM_REGION, &guest_mem_desc) < 0) {
> + perror("Failed to register guest memory with VM");
> + return -1;
> + }
> +
> + dtb_config.guest_phys_addr = config.guest_base + config.dtb_offset;
> + dtb_config.size = config.dtb_size;
> + if (ioctl(vm_fd, GH_VM_SET_DTB_CONFIG, &dtb_config) < 0) {
> + perror("Failed to set DTB configuration for VM");
> + return -1;
> + }
> +
> + ret = ioctl(vm_fd, GH_VM_START);
> + if (ret) {
> + perror("GH_VM_START failed");
> + return -1;
> + }
> +
> + while (1)
> + sleep(10);
Maybe call pause() instead of sleep?
> +
> + return 0;
> +}
> diff --git a/samples/gunyah/sample_vm.dts b/samples/gunyah/sample_vm.dts
> new file mode 100644
> index 000000000000..293bbc0469c8
> --- /dev/null
> +++ b/samples/gunyah/sample_vm.dts
> @@ -0,0 +1,68 @@
> +// SPDX-License-Identifier: BSD-3-Clause
> +/*
> + * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +/dts-v1/;
> +
> +/ {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + interrupt-parent = <&intc>;
> +
> + chosen {
> + bootargs = "nokaslr";
> + };
> +
> + cpus {
> + #address-cells = <0x2>;
> + #size-cells = <0>;
> +
> + cpu@0 {
> + device_type = "cpu";
> + compatible = "arm,armv8";
> + reg = <0 0>;
> + };
> + };
> +
> + intc: interrupt-controller@3FFF0000 {
> + compatible = "arm,gic-v3";
> + #interrupt-cells = <3>;
> + #address-cells = <2>;
> + #size-cells = <2>;
> + interrupt-controller;
> + reg = <0 0x3FFF0000 0 0x10000>,
> + <0 0x3FFD0000 0 0x20000>;
> + };
> +
> + timer {
> + compatible = "arm,armv8-timer";
> + always-on;
> + interrupts = <1 13 0x108>,
> + <1 14 0x108>,
> + <1 11 0x108>,
> + <1 10 0x108>;
> + clock-frequency = <19200000>;
> + };
> +
> + gunyah-vm-config {
> + image-name = "linux_vm_0";
> +
> + memory {
> + #address-cells = <2>;
> + #size-cells = <2>;
> +
> + base-address = <0 0x80000000>;
> + };
> +
> + interrupts {
> + config = <&intc>;
> + };
> +
> + vcpus {
> + affinity-map = < 0 >;
> + sched-priority = < (-1) >;
> + sched-timeslice = < 2000 >;
> + };
> + };
> +};
On 09/05/2023 21:47, Elliot Berman wrote:
> Gunyah message queues are a unidirectional inter-VM pipe for messages up
> to 1024 bytes. This driver supports pairing a receiver message queue and
> a transmitter message queue to expose a single mailbox channel.
>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> Documentation/virt/gunyah/message-queue.rst | 8 +
> drivers/mailbox/Makefile | 2 +
> drivers/mailbox/gunyah-msgq.c | 212 ++++++++++++++++++++
> include/linux/gunyah.h | 57 ++++++
> 4 files changed, 279 insertions(+)
> create mode 100644 drivers/mailbox/gunyah-msgq.c
>
> diff --git a/Documentation/virt/gunyah/message-queue.rst b/Documentation/virt/gunyah/message-queue.rst
> index b352918ae54b..70d82a4ef32d 100644
> --- a/Documentation/virt/gunyah/message-queue.rst
> +++ b/Documentation/virt/gunyah/message-queue.rst
> @@ -61,3 +61,11 @@ vIRQ: two TX message queues will have two vIRQs (and two capability IDs).
> | | | | | |
> | | | | | |
> +---------------+ +-----------------+ +---------------+
> +
> +Gunyah message queues are exposed as mailboxes. To create the mailbox, create
> +a mbox_client and call `gh_msgq_init()`. On receipt of the RX_READY interrupt,
> +all messages in the RX message queue are read and pushed via the `rx_callback`
> +of the registered mbox_client.
> +
> +.. kernel-doc:: drivers/mailbox/gunyah-msgq.c
> + :identifiers: gh_msgq_init
> diff --git a/drivers/mailbox/Makefile b/drivers/mailbox/Makefile
> index fc9376117111..5f929bb55e9a 100644
> --- a/drivers/mailbox/Makefile
> +++ b/drivers/mailbox/Makefile
> @@ -55,6 +55,8 @@ obj-$(CONFIG_MTK_CMDQ_MBOX) += mtk-cmdq-mailbox.o
>
> obj-$(CONFIG_ZYNQMP_IPI_MBOX) += zynqmp-ipi-mailbox.o
>
> +obj-$(CONFIG_GUNYAH) += gunyah-msgq.o
> +
> obj-$(CONFIG_SUN6I_MSGBOX) += sun6i-msgbox.o
>
> obj-$(CONFIG_SPRD_MBOX) += sprd-mailbox.o
> diff --git a/drivers/mailbox/gunyah-msgq.c b/drivers/mailbox/gunyah-msgq.c
> new file mode 100644
> index 000000000000..b7a54f233680
> --- /dev/null
> +++ b/drivers/mailbox/gunyah-msgq.c
> @@ -0,0 +1,212 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/mailbox_controller.h>
> +#include <linux/module.h>
> +#include <linux/interrupt.h>
> +#include <linux/gunyah.h>
> +#include <linux/printk.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <linux/wait.h>
> +
> +#define mbox_chan_to_msgq(chan) (container_of(chan->mbox, struct gh_msgq, mbox))
> +
> +static irqreturn_t gh_msgq_rx_irq_handler(int irq, void *data)
> +{
> + struct gh_msgq *msgq = data;
> + struct gh_msgq_rx_data rx_data;
> + enum gh_error gh_error;
> + bool ready = true;
> +
> + while (ready) {
> + gh_error = gh_hypercall_msgq_recv(msgq->rx_ghrsc->capid,
> + &rx_data.data, sizeof(rx_data.data),
> + &rx_data.length, &ready);
> + if (gh_error != GH_ERROR_OK) {
> + if (gh_error != GH_ERROR_MSGQUEUE_EMPTY)
> + dev_warn(msgq->mbox.dev, "Failed to receive data: %d\n", gh_error);
> + break;
> + }
> + if (likely(gh_msgq_chan(msgq)->cl))
> + mbox_chan_received_data(gh_msgq_chan(msgq), &rx_data);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +/* Fired when message queue transitions from "full" to "space available" to send messages */
> +static irqreturn_t gh_msgq_tx_irq_handler(int irq, void *data)
> +{
> + struct gh_msgq *msgq = data;
> +
> + mbox_chan_txdone(gh_msgq_chan(msgq), 0);
> +
> + return IRQ_HANDLED;
> +}
> +
> +/* Fired after sending message and hypercall told us there was more space available. */
> +static void gh_msgq_txdone_tasklet(struct tasklet_struct *tasklet)
> +{
> + struct gh_msgq *msgq = container_of(tasklet, struct gh_msgq, txdone_tasklet);
> +
> + mbox_chan_txdone(gh_msgq_chan(msgq), msgq->last_ret);
> +}
> +
> +static int gh_msgq_send_data(struct mbox_chan *chan, void *data)
> +{
> + struct gh_msgq *msgq = mbox_chan_to_msgq(chan);
> + struct gh_msgq_tx_data *msgq_data = data;
> + u64 tx_flags = 0;
> + enum gh_error gh_error;
> + bool ready;
> +
> + if (!msgq->tx_ghrsc)
> + return -EOPNOTSUPP;
> +
> + if (msgq_data->push)
> + tx_flags |= GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH;
> +
> + gh_error = gh_hypercall_msgq_send(msgq->tx_ghrsc->capid, msgq_data->length, msgq_data->data,
> + tx_flags, &ready);
> +
> + /**
> + * unlikely because Linux tracks state of msgq and should not try to
> + * send message when msgq is full.
> + */
> + if (unlikely(gh_error == GH_ERROR_MSGQUEUE_FULL))
> + return -EAGAIN;
> +
> + /**
> + * Propagate all other errors to client. If we return error to mailbox
> + * framework, then no other messages can be sent and nobody will know
> + * to retry this message.
> + */
> + msgq->last_ret = gh_error_remap(gh_error);
> +
> + /**
> + * This message was successfully sent, but message queue isn't ready to
> + * accept more messages because it's now full. Mailbox framework
> + * requires that we only report that message was transmitted when
> + * we're ready to transmit another message. We'll get that in the form
> + * of tx IRQ once the other side starts to drain the msgq.
> + */
> + if (gh_error == GH_ERROR_OK) {
> + if (!ready)
> + return 0;
> + } else
> + dev_err(msgq->mbox.dev, "Failed to send data: %d (%d)\n", gh_error, msgq->last_ret);
> +
> + /**
> + * We can send more messages. Mailbox framework requires that tx done
> + * happens asynchronously to sending the message. Gunyah message queues
> + * tell us right away on the hypercall return whether we can send more
> + * messages. To work around this, defer the txdone to a tasklet.
> + */
> + tasklet_schedule(&msgq->txdone_tasklet);
> +
> + return 0;
> +}
> +
> +static struct mbox_chan_ops gh_msgq_ops = {
> + .send_data = gh_msgq_send_data,
> +};
> +
> +/**
> + * gh_msgq_init() - Initialize a Gunyah message queue with an mbox_client
> + * @parent: device parent used for the mailbox controller
> + * @msgq: Pointer to the gh_msgq to initialize
> + * @cl: A mailbox client to bind to the mailbox channel that the message queue creates
> + * @tx_ghrsc: optional, the transmission side of the message queue
> + * @rx_ghrsc: optional, the receiving side of the message queue
> + *
> + * At least one of tx_ghrsc and rx_ghrsc must be not NULL. Most message queue use cases come with
> + * a pair of message queues to facilitate bidirectional communication. When tx_ghrsc is set,
> + * the client can send messages with mbox_send_message(gh_msgq_chan(msgq), msg). When rx_ghrsc
> + * is set, the mbox_client must register an .rx_callback() and the message queue driver will
> + * deliver all available messages upon receiving the RX ready interrupt. The messages should be
> + * consumed or copied by the client right away as the gh_msgq_rx_data will be replaced/destroyed
> + * after the callback.
> + *
> + * Returns - 0 on success, negative otherwise
> + */
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> + struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc)
> +{
> + int ret;
> +
> + /* Must have at least a tx_ghrsc or rx_ghrsc and that they are the right device types */
> + if ((!tx_ghrsc && !rx_ghrsc) ||
> + (tx_ghrsc && tx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_TX) ||
> + (rx_ghrsc && rx_ghrsc->type != GH_RESOURCE_TYPE_MSGQ_RX))
> + return -EINVAL;
> +
> + msgq->mbox.dev = parent;
> + msgq->mbox.ops = &gh_msgq_ops;
> + msgq->mbox.num_chans = 1;
> + msgq->mbox.txdone_irq = true;
> + msgq->mbox.chans = &msgq->mbox_chan;
> +
> + ret = mbox_controller_register(&msgq->mbox);
> + if (ret)
> + return ret;
> +
> + ret = mbox_bind_client(gh_msgq_chan(msgq), cl);
> + if (ret)
> + goto err_mbox;
> +
> + if (tx_ghrsc) {
> + msgq->tx_ghrsc = tx_ghrsc;
> +
> + ret = request_irq(msgq->tx_ghrsc->irq, gh_msgq_tx_irq_handler, 0, "gh_msgq_tx",
> + msgq);
> + if (ret)
> + goto err_tx_ghrsc;
> +
> + tasklet_setup(&msgq->txdone_tasklet, gh_msgq_txdone_tasklet);
> + }
> +
> + if (rx_ghrsc) {
> + msgq->rx_ghrsc = rx_ghrsc;
> +
> + ret = request_threaded_irq(msgq->rx_ghrsc->irq, NULL, gh_msgq_rx_irq_handler,
> + IRQF_ONESHOT, "gh_msgq_rx", msgq);
> + if (ret)
> + goto err_tx_irq;
> + }
> +
> + return 0;
> +err_tx_irq:
> + if (msgq->tx_ghrsc)
> + free_irq(msgq->tx_ghrsc->irq, msgq);
> +
> + msgq->rx_ghrsc = NULL;
> +err_tx_ghrsc:
> + msgq->tx_ghrsc = NULL;
> +err_mbox:
> + mbox_controller_unregister(&msgq->mbox);
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_init);
> +
> +void gh_msgq_remove(struct gh_msgq *msgq)
> +{
> + if (msgq->rx_ghrsc)
> + free_irq(msgq->rx_ghrsc->irq, msgq);
> +
> + if (msgq->tx_ghrsc) {
> + tasklet_kill(&msgq->txdone_tasklet);
> + free_irq(msgq->tx_ghrsc->irq, msgq);
> + }
> +
> + mbox_controller_unregister(&msgq->mbox);
> +
> + msgq->rx_ghrsc = NULL;
> + msgq->tx_ghrsc = NULL;
> +}
> +EXPORT_SYMBOL_GPL(gh_msgq_remove);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Gunyah Message Queue Driver");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 01a6f202d037..982e27d10d57 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -8,11 +8,68 @@
>
> #include <linux/bitfield.h>
> #include <linux/errno.h>
> +#include <linux/interrupt.h>
> #include <linux/limits.h>
> +#include <linux/mailbox_controller.h>
> +#include <linux/mailbox_client.h>
> #include <linux/types.h>
>
> +/* Matches resource manager's resource types for VM_GET_HYP_RESOURCES RPC */
> +enum gh_resource_type {
> + GH_RESOURCE_TYPE_BELL_TX = 0,
> + GH_RESOURCE_TYPE_BELL_RX = 1,
> + GH_RESOURCE_TYPE_MSGQ_TX = 2,
> + GH_RESOURCE_TYPE_MSGQ_RX = 3,
> + GH_RESOURCE_TYPE_VCPU = 4,
> +};
> +
> +struct gh_resource {
> + enum gh_resource_type type;
> + u64 capid;
> + unsigned int irq;
> +};
> +
> +/**
> + * Gunyah Message Queues
> + */
> +
> +#define GH_MSGQ_MAX_MSG_SIZE 240
> +
> +struct gh_msgq_tx_data {
> + size_t length;
> + bool push;
> + char data[];
> +};
> +
> +struct gh_msgq_rx_data {
> + size_t length;
> + char data[GH_MSGQ_MAX_MSG_SIZE];
> +};
> +
> +struct gh_msgq {
> + struct gh_resource *tx_ghrsc;
> + struct gh_resource *rx_ghrsc;
> +
> + /* msgq private */
> + int last_ret; /* Linux error, not GH_STATUS_* */
> + struct mbox_chan mbox_chan;
> + struct mbox_controller mbox;
> + struct tasklet_struct txdone_tasklet;
> +};
> +
> +
> +int gh_msgq_init(struct device *parent, struct gh_msgq *msgq, struct mbox_client *cl,
> + struct gh_resource *tx_ghrsc, struct gh_resource *rx_ghrsc);
> +void gh_msgq_remove(struct gh_msgq *msgq);
> +
> +static inline struct mbox_chan *gh_msgq_chan(struct gh_msgq *msgq)
> +{
> + return &msgq->mbox.chans[0];
> +}
> +
> /******************************************************************************/
> /* Common arch-independent definitions for Gunyah hypercalls */
> +
> #define GH_CAPID_INVAL U64_MAX
> #define GH_VMID_ROOT_VM 0xff
>
On 09/05/2023 21:47, Elliot Berman wrote:
> Add architecture-independent standard error codes, types, and macros for
> Gunyah hypercalls.
>
> Reviewed-by: Dmitry Baryshkov <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
lgtm,
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> include/linux/gunyah.h | 83 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 83 insertions(+)
> create mode 100644 include/linux/gunyah.h
>
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> new file mode 100644
> index 000000000000..a4e8ec91961d
> --- /dev/null
> +++ b/include/linux/gunyah.h
> @@ -0,0 +1,83 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _LINUX_GUNYAH_H
> +#define _LINUX_GUNYAH_H
> +
> +#include <linux/errno.h>
> +#include <linux/limits.h>
> +
> +/******************************************************************************/
> +/* Common arch-independent definitions for Gunyah hypercalls */
> +#define GH_CAPID_INVAL U64_MAX
> +#define GH_VMID_ROOT_VM 0xff
> +
> +enum gh_error {
> + GH_ERROR_OK = 0,
> + GH_ERROR_UNIMPLEMENTED = -1,
> + GH_ERROR_RETRY = -2,
> +
> + GH_ERROR_ARG_INVAL = 1,
> + GH_ERROR_ARG_SIZE = 2,
> + GH_ERROR_ARG_ALIGN = 3,
> +
> + GH_ERROR_NOMEM = 10,
> +
> + GH_ERROR_ADDR_OVFL = 20,
> + GH_ERROR_ADDR_UNFL = 21,
> + GH_ERROR_ADDR_INVAL = 22,
> +
> + GH_ERROR_DENIED = 30,
> + GH_ERROR_BUSY = 31,
> + GH_ERROR_IDLE = 32,
> +
> + GH_ERROR_IRQ_BOUND = 40,
> + GH_ERROR_IRQ_UNBOUND = 41,
> +
> + GH_ERROR_CSPACE_CAP_NULL = 50,
> + GH_ERROR_CSPACE_CAP_REVOKED = 51,
> + GH_ERROR_CSPACE_WRONG_OBJ_TYPE = 52,
> + GH_ERROR_CSPACE_INSUF_RIGHTS = 53,
> + GH_ERROR_CSPACE_FULL = 54,
> +
> + GH_ERROR_MSGQUEUE_EMPTY = 60,
> + GH_ERROR_MSGQUEUE_FULL = 61,
> +};
> +
> +/**
> + * gh_error_remap() - Remap Gunyah hypervisor errors into a Linux error code
> + * @gh_error: Gunyah hypercall return value
> + */
> +static inline int gh_error_remap(enum gh_error gh_error)
> +{
> + switch (gh_error) {
> + case GH_ERROR_OK:
> + return 0;
> + case GH_ERROR_NOMEM:
> + return -ENOMEM;
> + case GH_ERROR_DENIED:
> + case GH_ERROR_CSPACE_CAP_NULL:
> + case GH_ERROR_CSPACE_CAP_REVOKED:
> + case GH_ERROR_CSPACE_WRONG_OBJ_TYPE:
> + case GH_ERROR_CSPACE_INSUF_RIGHTS:
> + case GH_ERROR_CSPACE_FULL:
> + return -EACCES;
> + case GH_ERROR_BUSY:
> + case GH_ERROR_IDLE:
> + return -EBUSY;
> + case GH_ERROR_IRQ_BOUND:
> + case GH_ERROR_IRQ_UNBOUND:
> + case GH_ERROR_MSGQUEUE_FULL:
> + case GH_ERROR_MSGQUEUE_EMPTY:
> + return -EIO;
> + case GH_ERROR_UNIMPLEMENTED:
> + case GH_ERROR_RETRY:
> + return -EOPNOTSUPP;
> + default:
> + return -EINVAL;
> + }
> +}
> +
> +#endif
On 09/05/2023 21:47, Elliot Berman wrote:
> Add hypercalls to send and receive messages on a Gunyah message queue.
>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> arch/arm64/gunyah/gunyah_hypercall.c | 31 ++++++++++++++++++++++++++++
> include/linux/gunyah.h | 6 ++++++
> 2 files changed, 37 insertions(+)
>
> diff --git a/arch/arm64/gunyah/gunyah_hypercall.c b/arch/arm64/gunyah/gunyah_hypercall.c
> index 2166d5dab869..2b2a63e9b9e5 100644
> --- a/arch/arm64/gunyah/gunyah_hypercall.c
> +++ b/arch/arm64/gunyah/gunyah_hypercall.c
> @@ -33,6 +33,8 @@ EXPORT_SYMBOL_GPL(arch_is_gh_guest);
> fn)
>
> #define GH_HYPERCALL_HYP_IDENTIFY GH_HYPERCALL(0x8000)
> +#define GH_HYPERCALL_MSGQ_SEND GH_HYPERCALL(0x801B)
> +#define GH_HYPERCALL_MSGQ_RECV GH_HYPERCALL(0x801C)
>
> /**
> * gh_hypercall_hyp_identify() - Returns build information and feature flags
> @@ -52,5 +54,34 @@ void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identi
> }
> EXPORT_SYMBOL_GPL(gh_hypercall_hyp_identify);
>
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_SEND, capid, size, (uintptr_t)buff, tx_flags, 0, &res);
> +
> + if (res.a0 == GH_ERROR_OK)
> + *ready = !!res.a1;
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_send);
> +
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> + bool *ready)
> +{
> + struct arm_smccc_res res;
> +
> + arm_smccc_1_1_hvc(GH_HYPERCALL_MSGQ_RECV, capid, (uintptr_t)buff, size, 0, &res);
> +
> + if (res.a0 == GH_ERROR_OK) {
> + *recv_size = res.a1;
> + *ready = !!res.a2;
> + }
> +
> + return res.a0;
> +}
> +EXPORT_SYMBOL_GPL(gh_hypercall_msgq_recv);
> +
> MODULE_LICENSE("GPL");
> MODULE_DESCRIPTION("Gunyah Hypervisor Hypercalls");
> diff --git a/include/linux/gunyah.h b/include/linux/gunyah.h
> index 6b36cf4787ef..01a6f202d037 100644
> --- a/include/linux/gunyah.h
> +++ b/include/linux/gunyah.h
> @@ -111,4 +111,10 @@ static inline u16 gh_api_version(const struct gh_hypercall_hyp_identify_resp *gh
>
> void gh_hypercall_hyp_identify(struct gh_hypercall_hyp_identify_resp *hyp_identity);
>
> +#define GH_HYPERCALL_MSGQ_TX_FLAGS_PUSH BIT(0)
> +
> +enum gh_error gh_hypercall_msgq_send(u64 capid, size_t size, void *buff, u64 tx_flags, bool *ready);
> +enum gh_error gh_hypercall_msgq_recv(u64 capid, void *buff, size_t size, size_t *recv_size,
> + bool *ready);
> +
> #endif
On 09/05/2023 21:47, Elliot Berman wrote:
> The resource manager is a special virtual machine which is always
> running on a Gunyah system. It provides APIs for creating and destroying
> VMs, secure memory management, sharing/lending of memory between VMs,
> and setup of inter-VM communication. Calls to the resource manager are
> made via message queues.
>
> This patch implements the basic probing and RPC mechanism to make those
> API calls. Request/response calls can be made with gh_rm_call.
> Drivers can also register for notifications pushed by RM via
> gh_rm_register_notifier.
>
> Specific API calls that resource manager supports will be implemented in
> subsequent patches.
>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> drivers/virt/Makefile | 1 +
> drivers/virt/gunyah/Makefile | 4 +
> drivers/virt/gunyah/rsc_mgr.c | 702 +++++++++++++++++++++++++++++++++
> drivers/virt/gunyah/rsc_mgr.h | 16 +
> include/linux/gunyah_rsc_mgr.h | 21 +
> 5 files changed, 744 insertions(+)
> create mode 100644 drivers/virt/gunyah/Makefile
> create mode 100644 drivers/virt/gunyah/rsc_mgr.c
> create mode 100644 drivers/virt/gunyah/rsc_mgr.h
> create mode 100644 include/linux/gunyah_rsc_mgr.h
>
> diff --git a/drivers/virt/Makefile b/drivers/virt/Makefile
> index e9aa6fc96fab..a5817e2d7d71 100644
> --- a/drivers/virt/Makefile
> +++ b/drivers/virt/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_ACRN_HSM) += acrn/
> obj-$(CONFIG_EFI_SECRET) += coco/efi_secret/
> obj-$(CONFIG_SEV_GUEST) += coco/sev-guest/
> obj-$(CONFIG_INTEL_TDX_GUEST) += coco/tdx-guest/
> +obj-y += gunyah/
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> new file mode 100644
> index 000000000000..0f5aec834698
> --- /dev/null
> +++ b/drivers/virt/gunyah/Makefile
> @@ -0,0 +1,4 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +gunyah-y += rsc_mgr.o
> +obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> new file mode 100644
> index 000000000000..88b5beb1ea51
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -0,0 +1,702 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/of.h>
> +#include <linux/slab.h>
> +#include <linux/mutex.h>
> +#include <linux/sched.h>
> +#include <linux/gunyah.h>
> +#include <linux/module.h>
> +#include <linux/of_irq.h>
> +#include <linux/notifier.h>
> +#include <linux/workqueue.h>
> +#include <linux/completion.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/platform_device.h>
> +
> +#include "rsc_mgr.h"
> +
> +#define RM_RPC_API_VERSION_MASK GENMASK(3, 0)
> +#define RM_RPC_HEADER_WORDS_MASK GENMASK(7, 4)
> +#define RM_RPC_API_VERSION FIELD_PREP(RM_RPC_API_VERSION_MASK, 1)
> +#define RM_RPC_HEADER_WORDS FIELD_PREP(RM_RPC_HEADER_WORDS_MASK, \
> + (sizeof(struct gh_rm_rpc_hdr) / sizeof(u32)))
> +#define RM_RPC_API (RM_RPC_API_VERSION | RM_RPC_HEADER_WORDS)
> +
> +#define RM_RPC_TYPE_CONTINUATION 0x0
> +#define RM_RPC_TYPE_REQUEST 0x1
> +#define RM_RPC_TYPE_REPLY 0x2
> +#define RM_RPC_TYPE_NOTIF 0x3
> +#define RM_RPC_TYPE_MASK GENMASK(1, 0)
> +
> +#define GH_RM_MAX_NUM_FRAGMENTS 62
> +#define RM_RPC_FRAGMENTS_MASK GENMASK(7, 2)
> +
> +struct gh_rm_rpc_hdr {
> + u8 api;
> + u8 type;
> + __le16 seq;
> + __le32 msg_id;
> +} __packed;
> +
> +struct gh_rm_rpc_reply_hdr {
> + struct gh_rm_rpc_hdr hdr;
> + __le32 err_code; /* GH_RM_ERROR_* */
> +} __packed;
> +
> +#define GH_RM_MAX_MSG_SIZE (GH_MSGQ_MAX_MSG_SIZE - sizeof(struct gh_rm_rpc_hdr))
> +
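To make the header bit layout concrete, here is a minimal userspace sketch of how the api and type bytes are packed. It is not part of the patch: the helper names (rm_api_byte, rm_type_byte, rm_type_of, rm_fragments_of) are hypothetical, and the field positions are taken from the GENMASK() defines quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Message types, copied from the RM_RPC_TYPE_* defines quoted above. */
#define TYPE_CONTINUATION 0x0u
#define TYPE_REQUEST      0x1u
#define TYPE_REPLY        0x2u
#define TYPE_NOTIF        0x3u

#define MAX_NUM_FRAGMENTS 62u /* GH_RM_MAX_NUM_FRAGMENTS */
#define RPC_HDR_SIZE      8u  /* u8 api + u8 type + __le16 seq + __le32 msg_id */

/* api byte: version (1) in bits 3:0, header length in 32-bit words in bits 7:4. */
static uint8_t rm_api_byte(void)
{
	return (uint8_t)(1u | ((RPC_HDR_SIZE / 4u) << 4));
}

/* type byte: message type in bits 1:0, fragment count in bits 7:2. */
static uint8_t rm_type_byte(unsigned int type, unsigned int fragments)
{
	assert(type <= TYPE_NOTIF);
	assert(fragments <= MAX_NUM_FRAGMENTS);
	return (uint8_t)(type | (fragments << 2));
}

/* Inverse helpers, mirroring FIELD_GET() on the two masks. */
static unsigned int rm_type_of(uint8_t type_byte)
{
	return type_byte & 0x3u;
}

static unsigned int rm_fragments_of(uint8_t type_byte)
{
	return type_byte >> 2;
}
```

With these definitions, a request followed by three continuation fragments carries api = 0x21 and type = 0x0d in its first message.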
> +/* RM Error codes */
> +enum gh_rm_error {
> + GH_RM_ERROR_OK = 0x0,
> + GH_RM_ERROR_UNIMPLEMENTED = 0xFFFFFFFF,
> + GH_RM_ERROR_NOMEM = 0x1,
> + GH_RM_ERROR_NORESOURCE = 0x2,
> + GH_RM_ERROR_DENIED = 0x3,
> + GH_RM_ERROR_INVALID = 0x4,
> + GH_RM_ERROR_BUSY = 0x5,
> + GH_RM_ERROR_ARGUMENT_INVALID = 0x6,
> + GH_RM_ERROR_HANDLE_INVALID = 0x7,
> + GH_RM_ERROR_VALIDATE_FAILED = 0x8,
> + GH_RM_ERROR_MAP_FAILED = 0x9,
> + GH_RM_ERROR_MEM_INVALID = 0xA,
> + GH_RM_ERROR_MEM_INUSE = 0xB,
> + GH_RM_ERROR_MEM_RELEASED = 0xC,
> + GH_RM_ERROR_VMID_INVALID = 0xD,
> + GH_RM_ERROR_LOOKUP_FAILED = 0xE,
> + GH_RM_ERROR_IRQ_INVALID = 0xF,
> + GH_RM_ERROR_IRQ_INUSE = 0x10,
> + GH_RM_ERROR_IRQ_RELEASED = 0x11,
> +};
> +
> +/**
> + * struct gh_rm_connection - Represents a complete message from resource manager
> + * @payload: Combined payload of all the fragments (msg headers stripped off).
> + * @size: Size of the payload received so far.
> + * @msg_id: Message ID from the header.
> + * @type: RM_RPC_TYPE_REPLY or RM_RPC_TYPE_NOTIF.
> + * @num_fragments: total number of fragments expected to be received.
> + * @fragments_received: fragments received so far.
> + * @reply: Fields used for request/reply sequences
> + * @notification: Fields used for notifications
> + */
> +struct gh_rm_connection {
> + void *payload;
> + size_t size;
> + __le32 msg_id;
> + u8 type;
> +
> + u8 num_fragments;
> + u8 fragments_received;
> +
> + union {
> + /**
> + * @ret: Linux return code; nonzero if there was an error processing the connection
> + * @seq: Sequence ID for the main message.
> + * @rm_error: For request/reply sequences with standard replies
> + * @seq_done: Signals caller that the RM reply has been received
> + */
> + struct {
> + int ret;
> + u16 seq;
> + enum gh_rm_error rm_error;
> + struct completion seq_done;
> + } reply;
> +
> + /**
> + * @rm: Pointer to the RM that launched the connection
> + * @work: Triggered when all fragments of a notification received
> + */
> + struct {
> + struct gh_rm *rm;
> + struct work_struct work;
> + } notification;
> + };
> +};
> +
> +/**
> + * struct gh_rm - private data for communicating w/Gunyah resource manager
> + * @dev: pointer to device
> + * @tx_ghrsc: message queue resource to TX to RM
> + * @rx_ghrsc: message queue resource to RX from RM
> + * @msgq: mailbox instance of TX/RX resources above
> + * @msgq_client: mailbox client of above msgq
> + * @active_rx_connection: ongoing gh_rm_connection for which we're receiving fragments
> + * @last_tx_ret: return value of last mailbox tx
> + * @call_xarray: xarray to allocate & lookup sequence IDs for Request/Response flows
> + * @next_seq: next ID to allocate (for xa_alloc_cyclic)
> + * @cache: cache for allocating Tx messages
> + * @send_lock: synchronization to allow only one request to be sent at a time
> + * @nh: notifier chain for clients interested in RM notification messages
> + */
> +struct gh_rm {
> + struct device *dev;
> + struct gh_resource tx_ghrsc;
> + struct gh_resource rx_ghrsc;
> + struct gh_msgq msgq;
> + struct mbox_client msgq_client;
> + struct gh_rm_connection *active_rx_connection;
> + int last_tx_ret;
> +
> + struct xarray call_xarray;
> + u32 next_seq;
> +
> + struct kmem_cache *cache;
> + struct mutex send_lock;
> + struct blocking_notifier_head nh;
> +};
> +
> +/**
> + * gh_rm_remap_error() - Remap Gunyah resource manager errors into a Linux error code
> + * @rm_error: "Standard" return value from Gunyah resource manager
> + */
> +static inline int gh_rm_remap_error(enum gh_rm_error rm_error)
> +{
> + switch (rm_error) {
> + case GH_RM_ERROR_OK:
> + return 0;
> + case GH_RM_ERROR_UNIMPLEMENTED:
> + return -EOPNOTSUPP;
> + case GH_RM_ERROR_NOMEM:
> + return -ENOMEM;
> + case GH_RM_ERROR_NORESOURCE:
> + return -ENODEV;
> + case GH_RM_ERROR_DENIED:
> + return -EPERM;
> + case GH_RM_ERROR_BUSY:
> + return -EBUSY;
> + case GH_RM_ERROR_INVALID:
> + case GH_RM_ERROR_ARGUMENT_INVALID:
> + case GH_RM_ERROR_HANDLE_INVALID:
> + case GH_RM_ERROR_VALIDATE_FAILED:
> + case GH_RM_ERROR_MAP_FAILED:
> + case GH_RM_ERROR_MEM_INVALID:
> + case GH_RM_ERROR_MEM_INUSE:
> + case GH_RM_ERROR_MEM_RELEASED:
> + case GH_RM_ERROR_VMID_INVALID:
> + case GH_RM_ERROR_LOOKUP_FAILED:
> + case GH_RM_ERROR_IRQ_INVALID:
> + case GH_RM_ERROR_IRQ_INUSE:
> + case GH_RM_ERROR_IRQ_RELEASED:
> + return -EINVAL;
> + default:
> + return -EBADMSG;
> + }
> +}
> +
> +static int gh_rm_init_connection_payload(struct gh_rm_connection *connection, void *msg,
> + size_t hdr_size, size_t msg_size)
> +{
> + size_t max_buf_size, payload_size;
> + struct gh_rm_rpc_hdr *hdr = msg;
> +
> + if (msg_size < hdr_size)
> + return -EINVAL;
> +
> + payload_size = msg_size - hdr_size;
> +
> + connection->num_fragments = FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type);
> + connection->fragments_received = 0;
> +
> + /* There's not going to be any payload, no need to allocate buffer. */
> + if (!payload_size && !connection->num_fragments)
> + return 0;
> +
> + if (connection->num_fragments > GH_RM_MAX_NUM_FRAGMENTS)
> + return -EINVAL;
> +
> + max_buf_size = payload_size + (connection->num_fragments * GH_RM_MAX_MSG_SIZE);
> +
> + connection->payload = kzalloc(max_buf_size, GFP_KERNEL);
> + if (!connection->payload)
> + return -ENOMEM;
> +
> + memcpy(connection->payload, msg + hdr_size, payload_size);
> + connection->size = payload_size;
> + return 0;
> +}
> +
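The receive buffer is sized for the worst case up front: the payload of the first message plus one maximum-size payload per announced continuation fragment. A minimal userspace sketch of that arithmetic follows. It is not part of the patch: rx_buf_size is a hypothetical name, -1 stands in for -EINVAL, and the value 240 for GH_MSGQ_MAX_MSG_SIZE is an assumption, since that define is not visible in this hunk.

```c
#include <assert.h>
#include <stddef.h>

#define MSGQ_MAX_MSG_SIZE    240u /* assumption: stands in for GH_MSGQ_MAX_MSG_SIZE */
#define RPC_HDR_SIZE         8u   /* sizeof(struct gh_rm_rpc_hdr) */
#define RM_MAX_MSG_SIZE      (MSGQ_MAX_MSG_SIZE - RPC_HDR_SIZE)
#define RM_MAX_NUM_FRAGMENTS 62u

/*
 * Returns the number of bytes to allocate for the reassembled payload,
 * 0 if no buffer is needed, or -1 if the first message is malformed.
 */
static long rx_buf_size(size_t msg_size, size_t hdr_size, unsigned int num_fragments)
{
	size_t payload_size;

	if (msg_size < hdr_size)
		return -1;
	payload_size = msg_size - hdr_size;

	/* No payload now and none announced: nothing to allocate. */
	if (!payload_size && !num_fragments)
		return 0;
	if (num_fragments > RM_MAX_NUM_FRAGMENTS)
		return -1;

	return (long)(payload_size + (size_t)num_fragments * RM_MAX_MSG_SIZE);
}
```

Under these assumptions, a full-size reply (12-byte reply header) announcing two continuation fragments needs 228 + 2 * 232 = 692 bytes.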
> +static void gh_rm_abort_connection(struct gh_rm *rm)
> +{
> + switch (rm->active_rx_connection->type) {
> + case RM_RPC_TYPE_REPLY:
> + rm->active_rx_connection->reply.ret = -EIO;
> + complete(&rm->active_rx_connection->reply.seq_done);
> + break;
> + case RM_RPC_TYPE_NOTIF:
> + fallthrough;
> + default:
> + kfree(rm->active_rx_connection->payload);
> + kfree(rm->active_rx_connection);
> + }
> +
> + rm->active_rx_connection = NULL;
> +}
> +
> +static void gh_rm_notif_work(struct work_struct *work)
> +{
> + struct gh_rm_connection *connection = container_of(work, struct gh_rm_connection,
> + notification.work);
> + struct gh_rm *rm = connection->notification.rm;
> +
> + blocking_notifier_call_chain(&rm->nh, le32_to_cpu(connection->msg_id), connection->payload);
> +
> + put_device(rm->dev);
> + kfree(connection->payload);
> + kfree(connection);
> +}
> +
> +static void gh_rm_process_notif(struct gh_rm *rm, void *msg, size_t msg_size)
> +{
> + struct gh_rm_connection *connection;
> + struct gh_rm_rpc_hdr *hdr = msg;
> + int ret;
> +
> + if (rm->active_rx_connection)
> + gh_rm_abort_connection(rm);
> +
> + connection = kzalloc(sizeof(*connection), GFP_KERNEL);
> + if (!connection)
> + return;
> +
> + connection->type = RM_RPC_TYPE_NOTIF;
> + connection->msg_id = hdr->msg_id;
> +
> + get_device(rm->dev);
> + connection->notification.rm = rm;
> + INIT_WORK(&connection->notification.work, gh_rm_notif_work);
> +
> + ret = gh_rm_init_connection_payload(connection, msg, sizeof(*hdr), msg_size);
> + if (ret) {
> + dev_err(rm->dev, "Failed to initialize connection for notification: %d\n", ret);
> + put_device(rm->dev);
> + kfree(connection);
> + return;
> + }
> +
> + rm->active_rx_connection = connection;
> +}
> +
> +static void gh_rm_process_rply(struct gh_rm *rm, void *msg, size_t msg_size)
> +{
> + struct gh_rm_rpc_reply_hdr *reply_hdr = msg;
> + struct gh_rm_connection *connection;
> + u16 seq_id;
> +
> + seq_id = le16_to_cpu(reply_hdr->hdr.seq);
> + connection = xa_load(&rm->call_xarray, seq_id);
> +
> + if (!connection || connection->msg_id != reply_hdr->hdr.msg_id)
> + return;
> +
> + if (rm->active_rx_connection)
> + gh_rm_abort_connection(rm);
> +
> + if (gh_rm_init_connection_payload(connection, msg, sizeof(*reply_hdr), msg_size)) {
> + dev_err(rm->dev, "Failed to alloc connection buffer for sequence %d\n", seq_id);
> + /* Send connection complete and error the client. */
> + connection->reply.ret = -ENOMEM;
> + complete(&connection->reply.seq_done);
> + return;
> + }
> +
> + connection->reply.rm_error = le32_to_cpu(reply_hdr->err_code);
> + rm->active_rx_connection = connection;
> +}
> +
> +static void gh_rm_process_cont(struct gh_rm *rm, struct gh_rm_connection *connection,
> + void *msg, size_t msg_size)
> +{
> + struct gh_rm_rpc_hdr *hdr = msg;
> + size_t payload_size = msg_size - sizeof(*hdr);
> +
> + if (!rm->active_rx_connection)
> + return;
> +
> + /*
> + * The fragment count (in hdr->type) and hdr->msg_id preserve the values from
> + * the first reply or notif message. To detect mishandling, check they're intact.
> + */
> + if (connection->msg_id != hdr->msg_id ||
> + connection->num_fragments != FIELD_GET(RM_RPC_FRAGMENTS_MASK, hdr->type)) {
> + gh_rm_abort_connection(rm);
> + return;
> + }
> +
> + memcpy(connection->payload + connection->size, msg + sizeof(*hdr), payload_size);
> + connection->size += payload_size;
> + connection->fragments_received++;
> +}
> +
> +static void gh_rm_try_complete_connection(struct gh_rm *rm)
> +{
> + struct gh_rm_connection *connection = rm->active_rx_connection;
> +
> + if (!connection || connection->fragments_received != connection->num_fragments)
> + return;
> +
> + switch (connection->type) {
> + case RM_RPC_TYPE_REPLY:
> + complete(&connection->reply.seq_done);
> + break;
> + case RM_RPC_TYPE_NOTIF:
> + schedule_work(&connection->notification.work);
> + break;
> + default:
> + dev_err_ratelimited(rm->dev, "Invalid message type (%u) received\n",
> + connection->type);
> + gh_rm_abort_connection(rm);
> + break;
> + }
> +
> + rm->active_rx_connection = NULL;
> +}
> +
> +static void gh_rm_msgq_rx_data(struct mbox_client *cl, void *mssg)
> +{
> + struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
> + struct gh_msgq_rx_data *rx_data = mssg;
> + size_t msg_size = rx_data->length;
> + void *msg = rx_data->data;
> + struct gh_rm_rpc_hdr *hdr;
> +
> + if (msg_size < sizeof(*hdr) || msg_size > GH_MSGQ_MAX_MSG_SIZE)
> + return;
> +
> + hdr = msg;
> + if (hdr->api != RM_RPC_API) {
> + dev_err(rm->dev, "Unknown RM RPC API version: %x\n", hdr->api);
> + return;
> + }
> +
> + switch (FIELD_GET(RM_RPC_TYPE_MASK, hdr->type)) {
> + case RM_RPC_TYPE_NOTIF:
> + gh_rm_process_notif(rm, msg, msg_size);
> + break;
> + case RM_RPC_TYPE_REPLY:
> + gh_rm_process_rply(rm, msg, msg_size);
> + break;
> + case RM_RPC_TYPE_CONTINUATION:
> + gh_rm_process_cont(rm, rm->active_rx_connection, msg, msg_size);
> + break;
> + default:
> + dev_err(rm->dev, "Invalid message type (%lu) received\n",
> + FIELD_GET(RM_RPC_TYPE_MASK, hdr->type));
> + return;
> + }
> +
> + gh_rm_try_complete_connection(rm);
> +}
> +
> +static void gh_rm_msgq_tx_done(struct mbox_client *cl, void *mssg, int r)
> +{
> + struct gh_rm *rm = container_of(cl, struct gh_rm, msgq_client);
> +
> + kmem_cache_free(rm->cache, mssg);
> + rm->last_tx_ret = r;
> +}
> +
> +static int gh_rm_send_request(struct gh_rm *rm, u32 message_id,
> + const void *req_buf, size_t req_buf_size,
> + struct gh_rm_connection *connection)
> +{
> + size_t buf_size_remaining = req_buf_size;
> + const void *req_buf_curr = req_buf;
> + struct gh_msgq_tx_data *msg;
> + struct gh_rm_rpc_hdr *hdr, hdr_template;
> + u32 cont_fragments = 0;
> + size_t payload_size;
> + void *payload;
> + int ret;
> +
> + if (req_buf_size > GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE) {
> + dev_warn(rm->dev, "Limit (%lu bytes) exceeded for the maximum message size: %lu\n",
> + GH_RM_MAX_NUM_FRAGMENTS * GH_RM_MAX_MSG_SIZE, req_buf_size);
> + dump_stack();
> + return -E2BIG;
> + }
> +
> + if (req_buf_size)
> + cont_fragments = (req_buf_size - 1) / GH_RM_MAX_MSG_SIZE;
> +
> + hdr_template.api = RM_RPC_API;
> + hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_REQUEST) |
> + FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
> + hdr_template.seq = cpu_to_le16(connection->reply.seq);
> + hdr_template.msg_id = cpu_to_le32(message_id);
> +
> + ret = mutex_lock_interruptible(&rm->send_lock);
> + if (ret)
> + return ret;
> +
> + do {
> + msg = kmem_cache_zalloc(rm->cache, GFP_KERNEL);
> + if (!msg) {
> + ret = -ENOMEM;
> + goto out;
> + }
> +
> + /* Fill header */
> + hdr = (struct gh_rm_rpc_hdr *)&msg->data[0];
> + *hdr = hdr_template;
> +
> + /* Copy payload */
> + payload = &msg->data[0] + sizeof(*hdr);
> + payload_size = min(buf_size_remaining, GH_RM_MAX_MSG_SIZE);
> + memcpy(payload, req_buf_curr, payload_size);
> + req_buf_curr += payload_size;
> + buf_size_remaining -= payload_size;
> +
> + /* Force the last fragment to immediately alert the receiver */
> + msg->push = !buf_size_remaining;
> + msg->length = sizeof(*hdr) + payload_size;
> +
> + ret = mbox_send_message(gh_msgq_chan(&rm->msgq), msg);
> + if (ret < 0) {
> + kmem_cache_free(rm->cache, msg);
> + break;
> + }
> +
> + if (rm->last_tx_ret) {
> + ret = rm->last_tx_ret;
> + break;
> + }
> +
> + hdr_template.type = FIELD_PREP(RM_RPC_TYPE_MASK, RM_RPC_TYPE_CONTINUATION) |
> + FIELD_PREP(RM_RPC_FRAGMENTS_MASK, cont_fragments);
> + } while (buf_size_remaining);
> +
> +out:
> + mutex_unlock(&rm->send_lock);
> + return ret < 0 ? ret : 0;
> +}
> +
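On the send side, the request is split into one first message plus (req_buf_size - 1) / GH_RM_MAX_MSG_SIZE continuation fragments. A minimal userspace sketch of that split follows. It is not part of the patch: tx_cont_fragments and tx_fragment_len are hypothetical names, and the value 240 for GH_MSGQ_MAX_MSG_SIZE is an assumption.

```c
#include <assert.h>
#include <stddef.h>

#define MSGQ_MAX_MSG_SIZE    240u /* assumption: stands in for GH_MSGQ_MAX_MSG_SIZE */
#define RPC_HDR_SIZE         8u   /* sizeof(struct gh_rm_rpc_hdr) */
#define RM_MAX_MSG_SIZE      (MSGQ_MAX_MSG_SIZE - RPC_HDR_SIZE)
#define RM_MAX_NUM_FRAGMENTS 62u

/* Number of continuation messages that follow the first request message. */
static unsigned int tx_cont_fragments(size_t req_size)
{
	assert(req_size <= (size_t)RM_MAX_NUM_FRAGMENTS * RM_MAX_MSG_SIZE);
	return req_size ? (unsigned int)((req_size - 1) / RM_MAX_MSG_SIZE) : 0;
}

/* Payload bytes carried by message number idx (0 is the first request message). */
static size_t tx_fragment_len(size_t req_size, unsigned int idx)
{
	size_t offset = (size_t)idx * RM_MAX_MSG_SIZE;
	size_t remaining;

	if (offset >= req_size)
		return 0;
	remaining = req_size - offset;
	return remaining < RM_MAX_MSG_SIZE ? remaining : RM_MAX_MSG_SIZE;
}
```

Under these assumptions a 500-byte request goes out as 232 + 232 + 36 bytes, so the header announces two continuation fragments. A zero-length request still sends one header-only message, because the send loop is a do/while.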
> +/**
> + * gh_rm_call() - Achieve request-response type communication with RPC
> + * @rm: Pointer to Gunyah resource manager internal data
> + * @message_id: The RM RPC message-id
> + * @req_buf: Request buffer that contains the payload
> + * @req_buf_size: Total size of the payload
> + * @resp_buf: Pointer to a response buffer
> + * @resp_buf_size: Size of the response buffer
> + *
> + * Make a request to the Resource Manager and wait for the reply. On a successful
> + * response, the payload is returned through resp_buf and its size is set in
> + * resp_buf_size. The resp_buf must be freed by the caller when 0 is returned
> + * and resp_buf_size != 0.
> + *
> + * req_buf must not be NULL if req_buf_size > 0. If req_buf_size == 0,
> + * req_buf *can* be NULL and no additional payload is sent.
> + *
> + * Context: Process context. Will sleep waiting for reply.
> + * Return: 0 on success. <0 if error.
> + */
> +int gh_rm_call(struct gh_rm *rm, u32 message_id, const void *req_buf, size_t req_buf_size,
> + void **resp_buf, size_t *resp_buf_size)
> +{
> + struct gh_rm_connection *connection;
> + u32 seq_id;
> + int ret;
> +
> + /* message_id 0 is reserved. req_buf_size implies req_buf is not NULL */
> + if (!rm || !message_id || (!req_buf && req_buf_size))
> + return -EINVAL;
> +
> +
> + connection = kzalloc(sizeof(*connection), GFP_KERNEL);
> + if (!connection)
> + return -ENOMEM;
> +
> + connection->type = RM_RPC_TYPE_REPLY;
> + connection->msg_id = cpu_to_le32(message_id);
> +
> + init_completion(&connection->reply.seq_done);
> +
> + /* Allocate a new seq number for this connection */
> + ret = xa_alloc_cyclic(&rm->call_xarray, &seq_id, connection, xa_limit_16b, &rm->next_seq,
> + GFP_KERNEL);
> + if (ret < 0)
> + goto free;
> + connection->reply.seq = lower_16_bits(seq_id);
> +
> + /* Send the request to the Resource Manager */
> + ret = gh_rm_send_request(rm, message_id, req_buf, req_buf_size, connection);
> + if (ret < 0)
> + goto out;
> +
> + /* Wait for response */
> + ret = wait_for_completion_interruptible(&connection->reply.seq_done);
> + if (ret)
> + goto out;
> +
> + /* Check for internal (kernel) error waiting for the response */
> + if (connection->reply.ret) {
> + ret = connection->reply.ret;
> + if (ret != -ENOMEM)
> + kfree(connection->payload);
> + goto out;
> + }
> +
> + /* Got a response, did resource manager give us an error? */
> + if (connection->reply.rm_error != GH_RM_ERROR_OK) {
> + dev_warn(rm->dev, "RM rejected message %08x. Error: %d\n", message_id,
> + connection->reply.rm_error);
> + dump_stack();
> + ret = gh_rm_remap_error(connection->reply.rm_error);
> + kfree(connection->payload);
> + goto out;
> + }
> +
> + /* Everything looks good, return the payload */
> + if (resp_buf_size)
> + *resp_buf_size = connection->size;
> + if (connection->size && resp_buf)
> + *resp_buf = connection->payload;
> + else {
> + /* kfree in case RM sent us multiple fragments but never any data in
> + * those fragments. We would've allocated memory for it, but connection->size == 0
> + */
> + kfree(connection->payload);
> + }
> +
> +out:
> + xa_erase(&rm->call_xarray, connection->reply.seq);
> +free:
> + kfree(connection);
> + return ret;
> +}
> +
> +
> +int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb)
> +{
> + return blocking_notifier_chain_register(&rm->nh, nb);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_notifier_register);
> +
> +int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
> +{
> + return blocking_notifier_chain_unregister(&rm->nh, nb);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
> +
> +static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
> + struct gh_resource *ghrsc)
> +{
> + struct device_node *node = pdev->dev.of_node;
> + int ret;
> + int idx = tx ? 0 : 1;
> +
> + ghrsc->type = tx ? GH_RESOURCE_TYPE_MSGQ_TX : GH_RESOURCE_TYPE_MSGQ_RX;
> +
> + ghrsc->irq = platform_get_irq(pdev, idx);
> + if (ghrsc->irq < 0) {
> + dev_err(&pdev->dev, "Failed to get irq%d: %d\n", idx, ghrsc->irq);
> + return ghrsc->irq;
> + }
> +
> + ret = of_property_read_u64_index(node, "reg", idx, &ghrsc->capid);
> + if (ret) {
> + dev_err(&pdev->dev, "Failed to get capid%d: %d\n", idx, ret);
> + return ret;
> + }
> +
> + return 0;
> +}
> +
> +static int gh_identify(void)
> +{
> + struct gh_hypercall_hyp_identify_resp gh_api;
> +
> + if (!arch_is_gh_guest())
> + return -ENODEV;
> +
> + gh_hypercall_hyp_identify(&gh_api);
> +
> + pr_info("Running under Gunyah hypervisor %llx/v%u\n",
> + FIELD_GET(GH_API_INFO_VARIANT_MASK, gh_api.api_info),
> + gh_api_version(&gh_api));
> +
> + /* We might move this out to individual drivers if there's ever an API version bump */
> + if (gh_api_version(&gh_api) != GH_API_V1) {
> + pr_info("Unsupported Gunyah version: %u\n", gh_api_version(&gh_api));
> + return -ENODEV;
> + }
> +
> + return 0;
> +}
> +
> +static int gh_rm_drv_probe(struct platform_device *pdev)
> +{
> + struct gh_msgq_tx_data *msg;
> + struct gh_rm *rm;
> + int ret;
> +
> + ret = gh_identify();
> + if (ret)
> + return ret;
> +
> + rm = devm_kzalloc(&pdev->dev, sizeof(*rm), GFP_KERNEL);
> + if (!rm)
> + return -ENOMEM;
> +
> + platform_set_drvdata(pdev, rm);
> + rm->dev = &pdev->dev;
> +
> + mutex_init(&rm->send_lock);
> + BLOCKING_INIT_NOTIFIER_HEAD(&rm->nh);
> + xa_init_flags(&rm->call_xarray, XA_FLAGS_ALLOC);
> + rm->cache = kmem_cache_create("gh_rm", struct_size(msg, data, GH_MSGQ_MAX_MSG_SIZE), 0,
> + SLAB_HWCACHE_ALIGN, NULL);
> + if (!rm->cache)
> + return -ENOMEM;
> +
> + ret = gh_msgq_platform_probe_direction(pdev, true, &rm->tx_ghrsc);
> + if (ret)
> + goto err_cache;
> +
> + ret = gh_msgq_platform_probe_direction(pdev, false, &rm->rx_ghrsc);
> + if (ret)
> + goto err_cache;
> +
> + rm->msgq_client.dev = &pdev->dev;
> + rm->msgq_client.tx_block = true;
> + rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
> + rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
> +
> + return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> +err_cache:
> + kmem_cache_destroy(rm->cache);
> + return ret;
> +}
> +
> +static int gh_rm_drv_remove(struct platform_device *pdev)
> +{
> + struct gh_rm *rm = platform_get_drvdata(pdev);
> +
> + mbox_free_channel(gh_msgq_chan(&rm->msgq));
> + gh_msgq_remove(&rm->msgq);
> + kmem_cache_destroy(rm->cache);
> +
> + return 0;
> +}
> +
> +static const struct of_device_id gh_rm_of_match[] = {
> + { .compatible = "gunyah-resource-manager" },
> + {}
> +};
> +MODULE_DEVICE_TABLE(of, gh_rm_of_match);
> +
> +static struct platform_driver gh_rm_driver = {
> + .probe = gh_rm_drv_probe,
> + .remove = gh_rm_drv_remove,
> + .driver = {
> + .name = "gh_rsc_mgr",
> + .of_match_table = gh_rm_of_match,
> + },
> +};
> +module_platform_driver(gh_rm_driver);
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("Gunyah Resource Manager Driver");
> diff --git a/drivers/virt/gunyah/rsc_mgr.h b/drivers/virt/gunyah/rsc_mgr.h
> new file mode 100644
> index 000000000000..8309b7bf4668
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +#ifndef __GH_RSC_MGR_PRIV_H
> +#define __GH_RSC_MGR_PRIV_H
> +
> +#include <linux/gunyah.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/types.h>
> +
> +struct gh_rm;
> +int gh_rm_call(struct gh_rm *rsc_mgr, u32 message_id, const void *req_buf, size_t req_buf_size,
> + void **resp_buf, size_t *resp_buf_size);
> +
> +#endif
> diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
> new file mode 100644
> index 000000000000..f2a312e80af5
> --- /dev/null
> +++ b/include/linux/gunyah_rsc_mgr.h
> @@ -0,0 +1,21 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _GUNYAH_RSC_MGR_H
> +#define _GUNYAH_RSC_MGR_H
> +
> +#include <linux/list.h>
> +#include <linux/notifier.h>
> +#include <linux/gunyah.h>
> +
> +#define GH_VMID_INVAL U16_MAX
> +
> +struct gh_rm;
> +int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb);
> +int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb);
> +struct device *gh_rm_get(struct gh_rm *rm);
> +void gh_rm_put(struct gh_rm *rm);
> +
> +#endif
On 09/05/2023 21:47, Elliot Berman wrote:
> Add Gunyah Resource Manager RPC to launch an unauthenticated VM.
>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/rsc_mgr_rpc.c | 259 ++++++++++++++++++++++++++++++
> include/linux/gunyah_rsc_mgr.h | 73 +++++++++
> 3 files changed, 333 insertions(+), 1 deletion(-)
> create mode 100644 drivers/virt/gunyah/rsc_mgr_rpc.c
>
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index 0f5aec834698..241bab357b86 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -gunyah-y += rsc_mgr.o
> +gunyah-y += rsc_mgr.o rsc_mgr_rpc.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
> new file mode 100644
> index 000000000000..a4a9f0ba4e1f
> --- /dev/null
> +++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
> @@ -0,0 +1,259 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#include <linux/gunyah_rsc_mgr.h>
> +#include "rsc_mgr.h"
> +
> +/* Message IDs: VM Management */
> +#define GH_RM_RPC_VM_ALLOC_VMID 0x56000001
> +#define GH_RM_RPC_VM_DEALLOC_VMID 0x56000002
> +#define GH_RM_RPC_VM_START 0x56000004
> +#define GH_RM_RPC_VM_STOP 0x56000005
> +#define GH_RM_RPC_VM_RESET 0x56000006
> +#define GH_RM_RPC_VM_CONFIG_IMAGE 0x56000009
> +#define GH_RM_RPC_VM_INIT 0x5600000B
> +#define GH_RM_RPC_VM_GET_HYP_RESOURCES 0x56000020
> +#define GH_RM_RPC_VM_GET_VMID 0x56000024
> +
> +struct gh_rm_vm_common_vmid_req {
> + __le16 vmid;
> + __le16 _padding;
> +} __packed;
> +
> +/* Call: VM_ALLOC */
> +struct gh_rm_vm_alloc_vmid_resp {
> + __le16 vmid;
> + __le16 _padding;
> +} __packed;
> +
> +/* Call: VM_STOP */
> +#define GH_RM_VM_STOP_FLAG_FORCE_STOP BIT(0)
> +
> +#define GH_RM_VM_STOP_REASON_FORCE_STOP 3
> +
> +struct gh_rm_vm_stop_req {
> + __le16 vmid;
> + u8 flags;
> + u8 _padding;
> + __le32 stop_reason;
> +} __packed;
> +
> +/* Call: VM_CONFIG_IMAGE */
> +struct gh_rm_vm_config_image_req {
> + __le16 vmid;
> + __le16 auth_mech;
> + __le32 mem_handle;
> + __le64 image_offset;
> + __le64 image_size;
> + __le64 dtb_offset;
> + __le64 dtb_size;
> +} __packed;
> +
> +/*
> + * Several RM calls take only a VMID as a parameter and return only the standard
> + * response. Deduplicate boilerplate code by using this common call.
> + */
> +static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
> +{
> + struct gh_rm_vm_common_vmid_req req_payload = {
> + .vmid = cpu_to_le16(vmid),
> + };
> +
> + return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
> +}
> +
> +/**
> + * gh_rm_alloc_vmid() - Allocate a new VM in Gunyah. Returns the VM identifier.
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: Use 0 to dynamically allocate a VM. A reserved VMID can be supplied
> + * to request allocation of a platform-defined VM.
> + *
> + * Return: the allocated VMID or a negative value on error
> + */
> +int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid)
> +{
> + struct gh_rm_vm_common_vmid_req req_payload = {
> + .vmid = cpu_to_le16(vmid),
> + };
> + struct gh_rm_vm_alloc_vmid_resp *resp_payload;
> + size_t resp_size;
> + void *resp;
> + int ret;
> +
> + ret = gh_rm_call(rm, GH_RM_RPC_VM_ALLOC_VMID, &req_payload, sizeof(req_payload), &resp,
> + &resp_size);
> + if (ret)
> + return ret;
> +
> + if (!vmid) {
> + resp_payload = resp;
> + ret = le16_to_cpu(resp_payload->vmid);
> + kfree(resp);
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * gh_rm_dealloc_vmid() - Dispose of a VMID
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier allocated with gh_rm_alloc_vmid
> + */
> +int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid)
> +{
> + return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_DEALLOC_VMID, vmid);
> +}
> +
> +/**
> + * gh_rm_vm_reset() - Reset a VM's resources
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier allocated with gh_rm_alloc_vmid
> + *
> + * As part of tearing down the VM, request RM to clean up all the VM resources
> + * associated with the VM. Only after this, Linux can clean up all the
> + * references it maintains to resources.
> + */
> +int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid)
> +{
> + return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_RESET, vmid);
> +}
> +
> +/**
> + * gh_rm_vm_start() - Move a VM into "ready to run" state
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier allocated with gh_rm_alloc_vmid
> + *
> + * On VMs which use proxy scheduling, vcpu_run is needed to actually run the VM.
> + * On VMs which use Gunyah's scheduling, the vCPUs start executing in accordance with Gunyah
> + * scheduling policies.
> + */
> +int gh_rm_vm_start(struct gh_rm *rm, u16 vmid)
> +{
> + return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_START, vmid);
> +}
> +
> +/**
> + * gh_rm_vm_stop() - Send a request to Resource Manager VM to forcibly stop a VM.
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier allocated with gh_rm_alloc_vmid
> + */
> +int gh_rm_vm_stop(struct gh_rm *rm, u16 vmid)
> +{
> + struct gh_rm_vm_stop_req req_payload = {
> + .vmid = cpu_to_le16(vmid),
> + .flags = GH_RM_VM_STOP_FLAG_FORCE_STOP,
> + .stop_reason = cpu_to_le32(GH_RM_VM_STOP_REASON_FORCE_STOP),
> + };
> +
> + return gh_rm_call(rm, GH_RM_RPC_VM_STOP, &req_payload, sizeof(req_payload), NULL, NULL);
> +}
> +
> +/**
> + * gh_rm_vm_configure() - Prepare a VM to start and provide the common
> + * configuration needed by RM to configure a VM
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier allocated with gh_rm_alloc_vmid
> + * @auth_mechanism: Authentication mechanism used by resource manager to verify
> + * the virtual machine
> + * @mem_handle: Handle to a previously shared memparcel that contains all parts
> + * of the VM image subject to authentication.
> + * @image_offset: Start address of VM image, relative to the start of memparcel
> + * @image_size: Size of the VM image
> + * @dtb_offset: Start address of the devicetree binary with VM configuration,
> + * relative to start of memparcel.
> + * @dtb_size: Maximum size of devicetree binary.
> + */
> +int gh_rm_vm_configure(struct gh_rm *rm, u16 vmid, enum gh_rm_vm_auth_mechanism auth_mechanism,
> + u32 mem_handle, u64 image_offset, u64 image_size, u64 dtb_offset, u64 dtb_size)
> +{
> + struct gh_rm_vm_config_image_req req_payload = {
> + .vmid = cpu_to_le16(vmid),
> + .auth_mech = cpu_to_le16(auth_mechanism),
> + .mem_handle = cpu_to_le32(mem_handle),
> + .image_offset = cpu_to_le64(image_offset),
> + .image_size = cpu_to_le64(image_size),
> + .dtb_offset = cpu_to_le64(dtb_offset),
> + .dtb_size = cpu_to_le64(dtb_size),
> + };
> +
> + return gh_rm_call(rm, GH_RM_RPC_VM_CONFIG_IMAGE, &req_payload, sizeof(req_payload),
> + NULL, NULL);
> +}
> +
> +/**
> + * gh_rm_vm_init() - Move the VM to initialized state.
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VM identifier
> + *
> + * RM will allocate needed resources for the VM.
> + */
> +int gh_rm_vm_init(struct gh_rm *rm, u16 vmid)
> +{
> + return gh_rm_common_vmid_call(rm, GH_RM_RPC_VM_INIT, vmid);
> +}
> +
> +/**
> + * gh_rm_get_hyp_resources() - Retrieve hypervisor resources (capabilities) associated with a VM
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: VMID of the other VM to get the resources of
> + * @resources: Set by gh_rm_get_hyp_resources and contains the returned hypervisor resources.
> + * Caller must free the resources pointer if successful.
> + */
> +int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
> + struct gh_rm_hyp_resources **resources)
> +{
> + struct gh_rm_vm_common_vmid_req req_payload = {
> + .vmid = cpu_to_le16(vmid),
> + };
> + struct gh_rm_hyp_resources *resp;
> + size_t resp_size;
> + int ret;
> +
> + ret = gh_rm_call(rm, GH_RM_RPC_VM_GET_HYP_RESOURCES,
> + &req_payload, sizeof(req_payload),
> + (void **)&resp, &resp_size);
> + if (ret)
> + return ret;
> +
> + if (!resp_size)
> + return -EBADMSG;
> +
> + if (resp_size < struct_size(resp, entries, 0) ||
> + resp_size != struct_size(resp, entries, le32_to_cpu(resp->n_entries))) {
> + kfree(resp);
> + return -EBADMSG;
> + }
> +
> + *resources = resp;
> + return 0;
> +}
> +
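The length validation in gh_rm_get_hyp_resources() can be illustrated with a small userspace sketch. It is not part of the patch, and the layout is an assumption: struct gh_rm_hyp_resources is not shown in this hunk, so the sketch assumes a little-endian 32-bit entry count followed by fixed-size entries, with a hypothetical entry size of 16 bytes. The function name resources_size_ok is also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COUNT_SIZE 4u  /* little-endian 32-bit n_entries at the start of the response */
#define ENTRY_SIZE 16u /* hypothetical size of one resource entry */

/* Returns 1 if resp_size matches exactly the entry count stored in the response. */
static int resources_size_ok(const uint8_t *resp, size_t resp_size)
{
	uint32_t n_entries;
	size_t body;

	/* Too short to even hold the count: reject before reading it. */
	if (resp_size < COUNT_SIZE)
		return 0;

	n_entries = (uint32_t)resp[0] | ((uint32_t)resp[1] << 8) |
		    ((uint32_t)resp[2] << 16) | ((uint32_t)resp[3] << 24);

	/* Divide instead of multiplying so a huge count cannot overflow. */
	body = resp_size - COUNT_SIZE;
	return body % ENTRY_SIZE == 0 && body / ENTRY_SIZE == n_entries;
}
```

The point of checking the minimum size first is the same as in the quoted code: the count field must not be read until the buffer is known to contain it.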
> +/**
> + * gh_rm_get_vmid() - Retrieve VMID of this virtual machine
> + * @rm: Handle to a Gunyah resource manager
> + * @vmid: Filled with the VMID of this VM
> + */
> +int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid)
> +{
> + static u16 cached_vmid = GH_VMID_INVAL;
> + size_t resp_size;
> + __le32 *resp;
> + int ret;
> +
> + if (cached_vmid != GH_VMID_INVAL) {
> + *vmid = cached_vmid;
> + return 0;
> + }
> +
> + ret = gh_rm_call(rm, GH_RM_RPC_VM_GET_VMID, NULL, 0, (void **)&resp, &resp_size);
> + if (ret)
> + return ret;
> +
> + *vmid = cached_vmid = lower_16_bits(le32_to_cpu(*resp));
> + kfree(resp);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_get_vmid);
> diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
> index f2a312e80af5..1ac66d9004d2 100644
> --- a/include/linux/gunyah_rsc_mgr.h
> +++ b/include/linux/gunyah_rsc_mgr.h
> @@ -18,4 +18,77 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb);
> struct device *gh_rm_get(struct gh_rm *rm);
> void gh_rm_put(struct gh_rm *rm);
>
> +struct gh_rm_vm_exited_payload {
> + __le16 vmid;
> + __le16 exit_type;
> + __le32 exit_reason_size;
> + u8 exit_reason[];
> +} __packed;
> +
> +#define GH_RM_NOTIFICATION_VM_EXITED 0x56100001
> +
> +enum gh_rm_vm_status {
> + GH_RM_VM_STATUS_NO_STATE = 0,
> + GH_RM_VM_STATUS_INIT = 1,
> + GH_RM_VM_STATUS_READY = 2,
> + GH_RM_VM_STATUS_RUNNING = 3,
> + GH_RM_VM_STATUS_PAUSED = 4,
> + GH_RM_VM_STATUS_LOAD = 5,
> + GH_RM_VM_STATUS_AUTH = 6,
> + GH_RM_VM_STATUS_INIT_FAILED = 8,
> + GH_RM_VM_STATUS_EXITED = 9,
> + GH_RM_VM_STATUS_RESETTING = 10,
> + GH_RM_VM_STATUS_RESET = 11,
> +};
> +
> +struct gh_rm_vm_status_payload {
> + __le16 vmid;
> + u16 reserved;
> + u8 vm_status;
> + u8 os_status;
> + __le16 app_status;
> +} __packed;
> +
> +#define GH_RM_NOTIFICATION_VM_STATUS 0x56100008
> +
> +/* RPC Calls */
> +int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid);
> +int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid);
> +int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid);
> +int gh_rm_vm_start(struct gh_rm *rm, u16 vmid);
> +int gh_rm_vm_stop(struct gh_rm *rm, u16 vmid);
> +
> +enum gh_rm_vm_auth_mechanism {
> + GH_RM_VM_AUTH_NONE = 0,
> + GH_RM_VM_AUTH_QCOM_PIL_ELF = 1,
> + GH_RM_VM_AUTH_QCOM_ANDROID_PVM = 2,
> +};
> +
> +int gh_rm_vm_configure(struct gh_rm *rm, u16 vmid, enum gh_rm_vm_auth_mechanism auth_mechanism,
> + u32 mem_handle, u64 image_offset, u64 image_size,
> + u64 dtb_offset, u64 dtb_size);
> +int gh_rm_vm_init(struct gh_rm *rm, u16 vmid);
> +
> +struct gh_rm_hyp_resource {
> + u8 type;
> + u8 reserved;
> + __le16 partner_vmid;
> + __le32 resource_handle;
> + __le32 resource_label;
> + __le64 cap_id;
> + __le32 virq_handle;
> + __le32 virq;
> + __le64 base;
> + __le64 size;
> +} __packed;
> +
> +struct gh_rm_hyp_resources {
> + __le32 n_entries;
> + struct gh_rm_hyp_resource entries[];
> +} __packed;
> +
> +int gh_rm_get_hyp_resources(struct gh_rm *rm, u16 vmid,
> + struct gh_rm_hyp_resources **resources);
> +int gh_rm_get_vmid(struct gh_rm *rm, u16 *vmid);
> +
> #endif
On 09/05/2023 21:47, Elliot Berman wrote:
> Gunyah VM manager is a kernel module which exposes an interface to
> Gunyah userspace to load, run, and interact with other Gunyah virtual
> machines. The interface is a character device at /dev/gunyah.
>
> Add a basic VM manager driver. Upcoming patches will add more ioctls
> into this driver.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> .../userspace-api/ioctl/ioctl-number.rst | 1 +
> drivers/virt/gunyah/Makefile | 2 +-
> drivers/virt/gunyah/rsc_mgr.c | 50 +++++++++-
> drivers/virt/gunyah/vm_mgr.c | 93 +++++++++++++++++++
> drivers/virt/gunyah/vm_mgr.h | 20 ++++
> include/uapi/linux/gunyah.h | 23 +++++
> 6 files changed, 187 insertions(+), 2 deletions(-)
> create mode 100644 drivers/virt/gunyah/vm_mgr.c
> create mode 100644 drivers/virt/gunyah/vm_mgr.h
> create mode 100644 include/uapi/linux/gunyah.h
>
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 176e8fc3f31b..396212e88f7d 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -137,6 +137,7 @@ Code Seq# Include File Comments
> 'F' DD video/sstfb.h conflict!
> 'G' 00-3F drivers/misc/sgi-gru/grulib.h conflict!
> 'G' 00-0F xen/gntalloc.h, xen/gntdev.h conflict!
> +'G' 00-0f linux/gunyah.h conflict!
> 'H' 00-7F linux/hiddev.h conflict!
> 'H' 00-0F linux/hidraw.h conflict!
> 'H' 01 linux/mei.h conflict!
> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
> index 241bab357b86..e47e25895299 100644
> --- a/drivers/virt/gunyah/Makefile
> +++ b/drivers/virt/gunyah/Makefile
> @@ -1,4 +1,4 @@
> # SPDX-License-Identifier: GPL-2.0
>
> -gunyah-y += rsc_mgr.o rsc_mgr_rpc.o
> +gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o
> obj-$(CONFIG_GUNYAH) += gunyah.o
> diff --git a/drivers/virt/gunyah/rsc_mgr.c b/drivers/virt/gunyah/rsc_mgr.c
> index 88b5beb1ea51..4f6f96bdcf3d 100644
> --- a/drivers/virt/gunyah/rsc_mgr.c
> +++ b/drivers/virt/gunyah/rsc_mgr.c
> @@ -15,8 +15,10 @@
> #include <linux/completion.h>
> #include <linux/gunyah_rsc_mgr.h>
> #include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
>
> #include "rsc_mgr.h"
> +#include "vm_mgr.h"
>
> #define RM_RPC_API_VERSION_MASK GENMASK(3, 0)
> #define RM_RPC_HEADER_WORDS_MASK GENMASK(7, 4)
> @@ -130,6 +132,7 @@ struct gh_rm_connection {
> * @cache: cache for allocating Tx messages
> * @send_lock: synchronization to allow only one request to be sent at a time
> * @nh: notifier chain for clients interested in RM notification messages
> + * @miscdev: /dev/gunyah
> */
> struct gh_rm {
> struct device *dev;
> @@ -146,6 +149,8 @@ struct gh_rm {
> struct kmem_cache *cache;
> struct mutex send_lock;
> struct blocking_notifier_head nh;
> +
> + struct miscdevice miscdev;
> };
>
> /**
> @@ -581,6 +586,33 @@ int gh_rm_notifier_unregister(struct gh_rm *rm, struct notifier_block *nb)
> }
> EXPORT_SYMBOL_GPL(gh_rm_notifier_unregister);
>
> +struct device *gh_rm_get(struct gh_rm *rm)
> +{
> + return get_device(rm->miscdev.this_device);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_get);
> +
> +void gh_rm_put(struct gh_rm *rm)
> +{
> + put_device(rm->miscdev.this_device);
> +}
> +EXPORT_SYMBOL_GPL(gh_rm_put);
> +
> +static long gh_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> +{
> + struct miscdevice *miscdev = filp->private_data;
> + struct gh_rm *rm = container_of(miscdev, struct gh_rm, miscdev);
> +
> + return gh_dev_vm_mgr_ioctl(rm, cmd, arg);
> +}
> +
> +static const struct file_operations gh_dev_fops = {
> + .owner = THIS_MODULE,
> + .unlocked_ioctl = gh_dev_ioctl,
> + .compat_ioctl = compat_ptr_ioctl,
> + .llseek = noop_llseek,
> +};
> +
> static int gh_msgq_platform_probe_direction(struct platform_device *pdev, bool tx,
> struct gh_resource *ghrsc)
> {
> @@ -665,7 +697,22 @@ static int gh_rm_drv_probe(struct platform_device *pdev)
> rm->msgq_client.rx_callback = gh_rm_msgq_rx_data;
> rm->msgq_client.tx_done = gh_rm_msgq_tx_done;
>
> - return gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> + ret = gh_msgq_init(&pdev->dev, &rm->msgq, &rm->msgq_client, &rm->tx_ghrsc, &rm->rx_ghrsc);
> + if (ret)
> + goto err_cache;
> +
> + rm->miscdev.name = "gunyah";
> + rm->miscdev.minor = MISC_DYNAMIC_MINOR;
> + rm->miscdev.fops = &gh_dev_fops;
> +
> + ret = misc_register(&rm->miscdev);
> + if (ret)
> + goto err_msgq;
> +
> + return 0;
> +err_msgq:
> + mbox_free_channel(gh_msgq_chan(&rm->msgq));
> + gh_msgq_remove(&rm->msgq);
> err_cache:
> kmem_cache_destroy(rm->cache);
> return ret;
> @@ -675,6 +722,7 @@ static int gh_rm_drv_remove(struct platform_device *pdev)
> {
> struct gh_rm *rm = platform_get_drvdata(pdev);
>
> + misc_deregister(&rm->miscdev);
> mbox_free_channel(gh_msgq_chan(&rm->msgq));
> gh_msgq_remove(&rm->msgq);
> kmem_cache_destroy(rm->cache);
> diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
> new file mode 100644
> index 000000000000..a43401cb34f7
> --- /dev/null
> +++ b/drivers/virt/gunyah/vm_mgr.c
> @@ -0,0 +1,93 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#define pr_fmt(fmt) "gh_vm_mgr: " fmt
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/file.h>
> +#include <linux/gunyah_rsc_mgr.h>
> +#include <linux/miscdevice.h>
> +#include <linux/module.h>
> +
> +#include <uapi/linux/gunyah.h>
> +
> +#include "vm_mgr.h"
> +
> +static __must_check struct gh_vm *gh_vm_alloc(struct gh_rm *rm)
> +{
> + struct gh_vm *ghvm;
> +
> + ghvm = kzalloc(sizeof(*ghvm), GFP_KERNEL);
> + if (!ghvm)
> + return ERR_PTR(-ENOMEM);
> +
> + ghvm->parent = gh_rm_get(rm);
> + ghvm->rm = rm;
> +
> + return ghvm;
> +}
> +
> +static int gh_vm_release(struct inode *inode, struct file *filp)
> +{
> + struct gh_vm *ghvm = filp->private_data;
> +
> + gh_rm_put(ghvm->rm);
> + kfree(ghvm);
> + return 0;
> +}
> +
> +static const struct file_operations gh_vm_fops = {
> + .owner = THIS_MODULE,
> + .release = gh_vm_release,
> + .llseek = noop_llseek,
> +};
> +
> +static long gh_dev_ioctl_create_vm(struct gh_rm *rm, unsigned long arg)
> +{
> + struct gh_vm *ghvm;
> + struct file *file;
> + int fd, err;
> +
> + /* arg reserved for future use. */
> + if (arg)
> + return -EINVAL;
> +
> + ghvm = gh_vm_alloc(rm);
> + if (IS_ERR(ghvm))
> + return PTR_ERR(ghvm);
> +
> + fd = get_unused_fd_flags(O_CLOEXEC);
> + if (fd < 0) {
> + err = fd;
> + goto err_destroy_vm;
> + }
> +
> + file = anon_inode_getfile("gunyah-vm", &gh_vm_fops, ghvm, O_RDWR);
> + if (IS_ERR(file)) {
> + err = PTR_ERR(file);
> + goto err_put_fd;
> + }
> +
> + fd_install(fd, file);
> +
> + return fd;
> +
> +err_put_fd:
> + put_unused_fd(fd);
> +err_destroy_vm:
> + gh_rm_put(ghvm->rm);
> + kfree(ghvm);
> + return err;
> +}
> +
> +long gh_dev_vm_mgr_ioctl(struct gh_rm *rm, unsigned int cmd, unsigned long arg)
> +{
> + switch (cmd) {
> + case GH_CREATE_VM:
> + return gh_dev_ioctl_create_vm(rm, arg);
> + default:
> + return -ENOTTY;
> + }
> +}
> diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
> new file mode 100644
> index 000000000000..1e94b58d7d34
> --- /dev/null
> +++ b/drivers/virt/gunyah/vm_mgr.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _GH_VM_MGR_H
> +#define _GH_VM_MGR_H
> +
> +#include <linux/gunyah_rsc_mgr.h>
> +
> +#include <uapi/linux/gunyah.h>
> +
> +long gh_dev_vm_mgr_ioctl(struct gh_rm *rm, unsigned int cmd, unsigned long arg);
> +
> +struct gh_vm {
> + struct gh_rm *rm;
> + struct device *parent;
> +};
> +
> +#endif
> diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
> new file mode 100644
> index 000000000000..86b9cb60118d
> --- /dev/null
> +++ b/include/uapi/linux/gunyah.h
> @@ -0,0 +1,23 @@
> +/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
> +/*
> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All rights reserved.
> + */
> +
> +#ifndef _UAPI_LINUX_GUNYAH_H
> +#define _UAPI_LINUX_GUNYAH_H
> +
> +/*
> + * Userspace interface for /dev/gunyah - gunyah based virtual machine
> + */
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define GH_IOCTL_TYPE 'G'
> +
> +/*
> + * ioctls for /dev/gunyah fds:
> + */
> +#define GH_CREATE_VM _IO(GH_IOCTL_TYPE, 0x0) /* Returns a Gunyah VM fd */
> +
> +#endif
On 09/05/2023 21:47, Elliot Berman wrote:
> Gunyah resource manager provides an API to manipulate stage 2 page tables.
> Manipulations are represented as a memory parcel. Memory parcels
> describe a list of memory regions (intermediate physical address and
> size), a list of new permissions for VMs, and the memory type (DDR or
> MMIO). Memory parcels are uniquely identified by a handle allocated by
> Gunyah. There are a few types of memory parcel sharing which Gunyah
> supports:
>
> - Sharing: the guest and host VM both have access
> - Lending: only the guest has access; host VM loses access
> - Donating: Permanently lent (not reclaimed even if guest shuts down)
>
> Memory parcels that have been shared or lent can be reclaimed by the
> host via an additional call. The reclaim operation restores the original
> access the host VM had to the memory parcel and removes the other VM's
> access.
>
> One point to note is that memory parcels don't describe where in the
> guest VM the memory parcel should reside. The guest VM must accept the
> memory parcel either explicitly via a "gh_rm_mem_accept" call (not
> introduced here) or be configured to accept it automatically at boot.
> When the guest VM accepts the memory parcel, it also specifies the IPA
> at which it wants to place the memory parcel.
>
> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
> Signed-off-by: Elliot Berman <[email protected]>
> ---
Reviewed-by: Srinivas Kandagatla <[email protected]>
--srini
> drivers/virt/gunyah/rsc_mgr_rpc.c | 227 ++++++++++++++++++++++++++++++
> include/linux/gunyah_rsc_mgr.h | 48 +++++++
> 2 files changed, 275 insertions(+)
>
> diff --git a/drivers/virt/gunyah/rsc_mgr_rpc.c b/drivers/virt/gunyah/rsc_mgr_rpc.c
> index a4a9f0ba4e1f..4f25f07400b3 100644
> --- a/drivers/virt/gunyah/rsc_mgr_rpc.c
> +++ b/drivers/virt/gunyah/rsc_mgr_rpc.c
> @@ -6,6 +6,12 @@
> #include <linux/gunyah_rsc_mgr.h>
> #include "rsc_mgr.h"
>
> +/* Message IDs: Memory Management */
> +#define GH_RM_RPC_MEM_LEND 0x51000012
> +#define GH_RM_RPC_MEM_SHARE 0x51000013
> +#define GH_RM_RPC_MEM_RECLAIM 0x51000015
> +#define GH_RM_RPC_MEM_APPEND 0x51000018
> +
> /* Message IDs: VM Management */
> #define GH_RM_RPC_VM_ALLOC_VMID 0x56000001
> #define GH_RM_RPC_VM_DEALLOC_VMID 0x56000002
> @@ -22,6 +28,46 @@ struct gh_rm_vm_common_vmid_req {
> __le16 _padding;
> } __packed;
>
> +/* Call: MEM_LEND, MEM_SHARE */
> +#define GH_MEM_SHARE_REQ_FLAGS_APPEND BIT(1)
> +
> +struct gh_rm_mem_share_req_header {
> + u8 mem_type;
> + u8 _padding0;
> + u8 flags;
> + u8 _padding1;
> + __le32 label;
> +} __packed;
> +
> +struct gh_rm_mem_share_req_acl_section {
> + __le32 n_entries;
> + struct gh_rm_mem_acl_entry entries[];
> +};
> +
> +struct gh_rm_mem_share_req_mem_section {
> + __le16 n_entries;
> + __le16 _padding;
> + struct gh_rm_mem_entry entries[];
> +};
> +
> +/* Call: MEM_RELEASE */
> +struct gh_rm_mem_release_req {
> + __le32 mem_handle;
> + u8 flags; /* currently not used */
> + u8 _padding0;
> + __le16 _padding1;
> +} __packed;
> +
> +/* Call: MEM_APPEND */
> +#define GH_MEM_APPEND_REQ_FLAGS_END BIT(0)
> +
> +struct gh_rm_mem_append_req_header {
> + __le32 mem_handle;
> + u8 flags;
> + u8 _padding0;
> + __le16 _padding1;
> +} __packed;
> +
> /* Call: VM_ALLOC */
> struct gh_rm_vm_alloc_vmid_resp {
> __le16 vmid;
> @@ -51,6 +97,8 @@ struct gh_rm_vm_config_image_req {
> __le64 dtb_size;
> } __packed;
>
> +#define GH_RM_MAX_MEM_ENTRIES 512
> +
> /*
> * Several RM calls take only a VMID as a parameter and give only standard
> * response back. Deduplicate boilerplate code by using this common call.
> @@ -64,6 +112,185 @@ static int gh_rm_common_vmid_call(struct gh_rm *rm, u32 message_id, u16 vmid)
> return gh_rm_call(rm, message_id, &req_payload, sizeof(req_payload), NULL, NULL);
> }
>
> +static int _gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle, bool end_append,
> + struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> + struct gh_rm_mem_share_req_mem_section *mem_section;
> + struct gh_rm_mem_append_req_header *req_header;
> + size_t msg_size = 0;
> + void *msg;
> + int ret;
> +
> + msg_size += sizeof(struct gh_rm_mem_append_req_header);
> + msg_size += struct_size(mem_section, entries, n_mem_entries);
> +
> + msg = kzalloc(msg_size, GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + req_header = msg;
> + mem_section = (void *)req_header + sizeof(struct gh_rm_mem_append_req_header);
> +
> + req_header->mem_handle = cpu_to_le32(mem_handle);
> + if (end_append)
> + req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
> +
> + mem_section->n_entries = cpu_to_le16(n_mem_entries);
> + memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) * n_mem_entries);
> +
> + ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL, NULL);
> + kfree(msg);
> +
> + return ret;
> +}
> +
> +static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
> + struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
> +{
> + bool end_append;
> + int ret = 0;
> + size_t n;
> +
> + while (n_mem_entries) {
> + if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
> + end_append = false;
> + n = GH_RM_MAX_MEM_ENTRIES;
> + } else {
> + end_append = true;
> + n = n_mem_entries;
> + }
> +
> + ret = _gh_rm_mem_append(rm, mem_handle, end_append, mem_entries, n);
> + if (ret)
> + break;
> +
> + mem_entries += n;
> + n_mem_entries -= n;
> + }
> +
> + return ret;
> +}
> +
> +static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id, struct gh_rm_mem_parcel *p)
> +{
> + size_t msg_size = 0, initial_mem_entries = p->n_mem_entries, resp_size;
> + size_t acl_section_size, mem_section_size;
> + struct gh_rm_mem_share_req_acl_section *acl_section;
> + struct gh_rm_mem_share_req_mem_section *mem_section;
> + struct gh_rm_mem_share_req_header *req_header;
> + u32 *attr_section;
> + __le32 *resp;
> + void *msg;
> + int ret;
> +
> + if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries || !p->n_mem_entries ||
> + p->n_acl_entries > U8_MAX || p->mem_handle != GH_MEM_HANDLE_INVAL)
> + return -EINVAL;
> +
> + if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
> + initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
> +
> + acl_section_size = struct_size(acl_section, entries, p->n_acl_entries);
> + mem_section_size = struct_size(mem_section, entries, initial_mem_entries);
> + /* The format of the message goes:
> + * request header
> + * ACL entries (which VMs get what kind of access to this memory parcel)
> + * Memory entries (list of memory regions to share)
> + * Memory attributes (currently unused, we'll hard-code the size to 0)
> + */
> + msg_size += sizeof(struct gh_rm_mem_share_req_header);
> + msg_size += acl_section_size;
> + msg_size += mem_section_size;
> + msg_size += sizeof(u32); /* for memory attributes, currently unused */
> +
> + msg = kzalloc(msg_size, GFP_KERNEL);
> + if (!msg)
> + return -ENOMEM;
> +
> + req_header = msg;
> + acl_section = (void *)req_header + sizeof(*req_header);
> + mem_section = (void *)acl_section + acl_section_size;
> + attr_section = (void *)mem_section + mem_section_size;
> +
> + req_header->mem_type = p->mem_type;
> + if (initial_mem_entries != p->n_mem_entries)
> + req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
> + req_header->label = cpu_to_le32(p->label);
> +
> + acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
> + memcpy(acl_section->entries, p->acl_entries,
> + flex_array_size(acl_section, entries, p->n_acl_entries));
> +
> + mem_section->n_entries = cpu_to_le16(initial_mem_entries);
> + memcpy(mem_section->entries, p->mem_entries,
> + flex_array_size(mem_section, entries, initial_mem_entries));
> +
> + /* Set n_entries for memory attribute section to 0 */
> + *attr_section = 0;
> +
> + ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp, &resp_size);
> + kfree(msg);
> +
> + if (ret)
> + return ret;
> +
> + p->mem_handle = le32_to_cpu(*resp);
> + kfree(resp);
> +
> + if (initial_mem_entries != p->n_mem_entries) {
> + ret = gh_rm_mem_append(rm, p->mem_handle,
> + &p->mem_entries[initial_mem_entries],
> + p->n_mem_entries - initial_mem_entries);
> + if (ret) {
> + gh_rm_mem_reclaim(rm, p);
> + p->mem_handle = GH_MEM_HANDLE_INVAL;
> + }
> + }
> +
> + return ret;
> +}
> +
> +/**
> + * gh_rm_mem_lend() - Lend memory to other virtual machines.
> + * @rm: Handle to a Gunyah resource manager
> + * @parcel: Information about the memory to be lent.
> + *
> + * Lending removes Linux's access to the memory while the memory parcel is lent.
> + */
> +int gh_rm_mem_lend(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
> +{
> + return gh_rm_mem_lend_common(rm, GH_RM_RPC_MEM_LEND, parcel);
> +}
> +
> +
> +/**
> + * gh_rm_mem_share() - Share memory with other virtual machines.
> + * @rm: Handle to a Gunyah resource manager
> + * @parcel: Information about the memory to be shared.
> + *
> + * Sharing keeps Linux's access to the memory while the memory parcel is shared.
> + */
> +int gh_rm_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
> +{
> + return gh_rm_mem_lend_common(rm, GH_RM_RPC_MEM_SHARE, parcel);
> +}
> +
> +/**
> + * gh_rm_mem_reclaim() - Reclaim a memory parcel
> + * @rm: Handle to a Gunyah resource manager
> + * @parcel: Information about the memory to be reclaimed.
> + *
> + * RM maps the associated memory back into the stage-2 page tables of the owner VM.
> + */
> +int gh_rm_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel)
> +{
> + struct gh_rm_mem_release_req req = {
> + .mem_handle = cpu_to_le32(parcel->mem_handle),
> + };
> +
> + return gh_rm_call(rm, GH_RM_RPC_MEM_RECLAIM, &req, sizeof(req), NULL, NULL);
> +}
> +
> /**
> * gh_rm_alloc_vmid() - Allocate a new VM in Gunyah. Returns the VM identifier.
> * @rm: Handle to a Gunyah resource manager
> diff --git a/include/linux/gunyah_rsc_mgr.h b/include/linux/gunyah_rsc_mgr.h
> index 1ac66d9004d2..dfac088420bd 100644
> --- a/include/linux/gunyah_rsc_mgr.h
> +++ b/include/linux/gunyah_rsc_mgr.h
> @@ -11,6 +11,7 @@
> #include <linux/gunyah.h>
>
> #define GH_VMID_INVAL U16_MAX
> +#define GH_MEM_HANDLE_INVAL U32_MAX
>
> struct gh_rm;
> int gh_rm_notifier_register(struct gh_rm *rm, struct notifier_block *nb);
> @@ -51,7 +52,54 @@ struct gh_rm_vm_status_payload {
>
> #define GH_RM_NOTIFICATION_VM_STATUS 0x56100008
>
> +#define GH_RM_ACL_X BIT(0)
> +#define GH_RM_ACL_W BIT(1)
> +#define GH_RM_ACL_R BIT(2)
> +
> +struct gh_rm_mem_acl_entry {
> + __le16 vmid;
> + u8 perms;
> + u8 reserved;
> +} __packed;
> +
> +struct gh_rm_mem_entry {
> + __le64 phys_addr;
> + __le64 size;
> +} __packed;
> +
> +enum gh_rm_mem_type {
> + GH_RM_MEM_TYPE_NORMAL = 0,
> + GH_RM_MEM_TYPE_IO = 1,
> +};
> +
> +/*
> + * struct gh_rm_mem_parcel - Info about memory to be lent/shared/donated/reclaimed
> + * @mem_type: The type of memory: normal (DDR) or IO
> + * @label: A client-specified identifier which can be used by the other VMs to identify the purpose
> + * of the memory parcel.
> + * @n_acl_entries: Count of the number of entries in the @acl_entries array.
> + * @acl_entries: An array of access control entries. Each entry specifies a VM and what access
> + * is allowed for the memory parcel.
> + * @n_mem_entries: Count of the number of entries in the @mem_entries array.
> + * @mem_entries: An array of regions to be associated with the memory parcel. Addresses should be
> + * (intermediate) physical addresses from Linux's perspective.
> + * @mem_handle: On success, filled with memory handle that RM allocates for this memory parcel
> + */
> +struct gh_rm_mem_parcel {
> + enum gh_rm_mem_type mem_type;
> + u32 label;
> + size_t n_acl_entries;
> + struct gh_rm_mem_acl_entry *acl_entries;
> + size_t n_mem_entries;
> + struct gh_rm_mem_entry *mem_entries;
> + u32 mem_handle;
> +};
> +
> /* RPC Calls */
> +int gh_rm_mem_lend(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
> +int gh_rm_mem_share(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
> +int gh_rm_mem_reclaim(struct gh_rm *rm, struct gh_rm_mem_parcel *parcel);
> +
> int gh_rm_alloc_vmid(struct gh_rm *rm, u16 vmid);
> int gh_rm_dealloc_vmid(struct gh_rm *rm, u16 vmid);
> int gh_rm_vm_reset(struct gh_rm *rm, u16 vmid);
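To make the batching in gh_rm_mem_lend_common() and gh_rm_mem_append() easier to follow, here is a small standalone sketch of how a parcel's memory entries get split across messages. It is plain userspace C, not kernel code; struct gh_chunk and gh_plan_chunks() are hypothetical names used only for illustration, and the 512-entry limit is taken from GH_RM_MAX_MEM_ENTRIES in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define GH_RM_MAX_MEM_ENTRIES 512 /* per-message limit used by the patch */

/* Hypothetical description of one message in the lend/share sequence. */
struct gh_chunk {
	size_t first;     /* index of the first memory entry carried */
	size_t count;     /* number of memory entries carried */
	bool more_follow; /* true when further MEM_APPEND calls are expected */
};

/*
 * Split n_mem_entries into per-message chunks. Chunk 0 models the initial
 * MEM_LEND/MEM_SHARE request; any later chunk models one MEM_APPEND call.
 * Returns the number of chunks needed and fills in at most max_chunks.
 */
static size_t gh_plan_chunks(size_t n_mem_entries, struct gh_chunk *chunks,
			     size_t max_chunks)
{
	size_t first = 0, n_chunks = 0;

	while (first < n_mem_entries) {
		size_t left = n_mem_entries - first;
		size_t count = left > GH_RM_MAX_MEM_ENTRIES ?
			       GH_RM_MAX_MEM_ENTRIES : left;

		if (n_chunks < max_chunks) {
			chunks[n_chunks].first = first;
			chunks[n_chunks].count = count;
			chunks[n_chunks].more_follow = left > count;
		}
		n_chunks++;
		first += count;
	}

	return n_chunks;
}
```

In terms of the patch, more_follow on chunk 0 corresponds to setting GH_MEM_SHARE_REQ_FLAGS_APPEND, and a false more_follow on a later chunk corresponds to setting GH_MEM_APPEND_REQ_FLAGS_END. The real code rejects an empty parcel with -EINVAL, whereas this sketch simply returns zero chunks.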
On 6/5/2023 7:18 AM, Will Deacon wrote:
> Hi Elliot,
>
> [+Quentin since he's looked at the MMU notifiers]
>
> Sorry for the slow response, I got buried in email during a week away.
>
> On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
>> On 5/19/2023 4:59 AM, Will Deacon wrote:
>>> On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
>>>> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
>>>> + if (ret)
>>>> + goto free_mapping;
>>>> +
>>>> + mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
>>>> + if (!mapping->pages) {
>>>> + ret = -ENOMEM;
>>>> + mapping->npages = 0; /* update npages for reclaim */
>>>> + goto unlock_pages;
>>>> + }
>>>> +
>>>> + gup_flags = FOLL_LONGTERM;
>>>> + if (region->flags & GH_MEM_ALLOW_WRITE)
>>>> + gup_flags |= FOLL_WRITE;
>>>> +
>>>> + pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
>>>> + gup_flags, mapping->pages);
>>>> + if (pinned < 0) {
>>>> + ret = pinned;
>>>> + goto free_pages;
>>>> + } else if (pinned != mapping->npages) {
>>>> + ret = -EFAULT;
>>>> + mapping->npages = pinned; /* update npages for reclaim */
>>>> + goto unpin_pages;
>>>> + }
>>>
>>> Sorry if I missed it, but I still don't see where you reject file mappings
>>> here.
>>>
>>
>> Sure, I can reject file mappings. I didn't catch that was the ask previously
>> and thought it was only a comment about behavior of file mappings.
>
> I thought the mention of filesystem corruption was clear enough! It's
> definitely something we shouldn't allow.
>
>>> This is also the wrong interface for upstream. Please get involved with
>>> the fd-based guest memory discussions [1] and port your series to that.
>>>
>>
>> The user interface design for *shared* memory aligns with
>> KVM_SET_USER_MEMORY_REGION.
>
> I don't think it does. For example, file mappings don't work (as above),
> you're placing additional rlimit requirements on the caller, read-only
> memslots are not functional, the memory cannot be swapped or migrated,
> dirty logging doesn't work etc. pKVM is in the same boat, but that's why
> we're not upstreaming this part in its current form.
>
I thought pKVM was only holding off on upstreaming changes related to
guest-private memory?
>> I understood we want to use restricted memfd for giving guest-private memory
>> (Gunyah calls this "lending memory"). When I went through the changes, I
>> gathered KVM is using restricted memfd only for guest-private memory and not
>> for shared memory. Thus, I dropped support for lending memory to the guest
>> VM and only retained the shared memory support in this series. I'd like to
>> merge what we can today and introduce the guest-private memory support in
>> tandem with the restricted memfd; I don't see much reason to delay the
>> series.
>
> Right, protected guests will use the new restricted memfd ("guest mem"
> now, I think?), but non-protected guests should implement the existing
> interface *without* the need for the GUP pin on guest memory pages. Yes,
> that means full support for MMU notifiers so that these pages can be
> managed properly by the host kernel. We're working on that for pKVM, but
> it requires a more flexible form of memory sharing over what we currently
> have so that e.g. the zero page can be shared between multiple entities.
Gunyah doesn't support swapping pages out while the guest is running, and
Gunyah isn't designed to give the host kernel full control over the S2
page tables of its guests. As best I can tell from reading the
respective drivers, ACRN and Nitro Enclaves both GUP pin guest memory
pages prior to giving them to the guest, so I don't think this
requirement from Gunyah is particularly unusual.
On 6/5/2023 12:48 PM, Alex Elder wrote:
> On 5/9/23 3:47 PM, Elliot Berman wrote:
>> Qualcomm platforms have a firmware entity which performs access control
>> to physical pages. Dynamically started Gunyah virtual machines use the
>> QCOM_SCM_RM_MANAGED_VMID for access. Linux thus needs to assign access
>> to the memory used by guest VMs. Gunyah doesn't do this operation for us
>> since it is the current VM (typically VMID_HLOS) delegating the access
>> and not Gunyah itself. Use the Gunyah platform ops to achieve this so
>> that only Qualcomm platforms attempt to make the needed SCM calls.
>>
>> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Elliot Berman <[email protected]>
>
> Minor suggestions below. Please consider them, but either way:
>
> Reviewed-by: Alex Elder <[email protected]>
>
>> ---
>> drivers/virt/gunyah/Kconfig | 13 +++
>> drivers/virt/gunyah/Makefile | 1 +
>> drivers/virt/gunyah/gunyah_qcom.c | 147 ++++++++++++++++++++++++++++++
>> 3 files changed, 161 insertions(+)
>> create mode 100644 drivers/virt/gunyah/gunyah_qcom.c
>>
>> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
>> index de815189dab6..0421b751aad4 100644
>> --- a/drivers/virt/gunyah/Kconfig
>> +++ b/drivers/virt/gunyah/Kconfig
>> @@ -5,6 +5,7 @@ config GUNYAH
>> depends on ARM64
>> depends on MAILBOX
>> select GUNYAH_PLATFORM_HOOKS
>> + imply GUNYAH_QCOM_PLATFORM if ARCH_QCOM
>> help
>> The Gunyah drivers are the helper interfaces that run in a
>> guest VM
>> such as basic inter-VM IPC and signaling mechanisms, and
>> higher level
>> @@ -15,3 +16,15 @@ config GUNYAH
>> config GUNYAH_PLATFORM_HOOKS
>> tristate
>> +
>> +config GUNYAH_QCOM_PLATFORM
>> + tristate "Support for Gunyah on Qualcomm platforms"
>> + depends on GUNYAH
>> + select GUNYAH_PLATFORM_HOOKS
>> + select QCOM_SCM
>> + help
>> + Enable support for interacting with Gunyah on Qualcomm
>> + platforms. Interaction with Qualcomm firmware requires
>> + extra platform-specific support.
>> +
>> + Say Y/M here to use Gunyah on Qualcomm platforms.
>> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
>> index 4fbeee521d60..2aa9ff038ed0 100644
>> --- a/drivers/virt/gunyah/Makefile
>> +++ b/drivers/virt/gunyah/Makefile
>> @@ -1,6 +1,7 @@
>> # SPDX-License-Identifier: GPL-2.0
>> obj-$(CONFIG_GUNYAH_PLATFORM_HOOKS) += gunyah_platform_hooks.o
>> +obj-$(CONFIG_GUNYAH_QCOM_PLATFORM) += gunyah_qcom.o
>> gunyah-y += rsc_mgr.o rsc_mgr_rpc.o vm_mgr.o vm_mgr_mm.o
>> obj-$(CONFIG_GUNYAH) += gunyah.o
>> diff --git a/drivers/virt/gunyah/gunyah_qcom.c
>> b/drivers/virt/gunyah/gunyah_qcom.c
>> new file mode 100644
>> index 000000000000..18acbda8fcbd
>> --- /dev/null
>> +++ b/drivers/virt/gunyah/gunyah_qcom.c
>> @@ -0,0 +1,147 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2023 Qualcomm Innovation Center, Inc. All rights
>> reserved.
>> + */
>> +
>> +#include <linux/arm-smccc.h>
>> +#include <linux/gunyah_rsc_mgr.h>
>> +#include <linux/module.h>
>> +#include <linux/firmware/qcom/qcom_scm.h>
>> +#include <linux/types.h>
>> +#include <linux/uuid.h>
>> +
>> +#define QCOM_SCM_RM_MANAGED_VMID 0x3A
>> +#define QCOM_SCM_MAX_MANAGED_VMID 0x3F
>
> Is this limited to 63 because there are at most 64 VMIDs
> that can be represented in a 64-bit unsigned?
>
It's a limitation imposed by QC firmware, but I speculate that's why 63
is selected.
>> +
>> +static int qcom_scm_gh_rm_pre_mem_share(struct gh_rm *rm, struct
>> gh_rm_mem_parcel *mem_parcel)
>> +{
>> + struct qcom_scm_vmperm *new_perms;
>> + u64 src, src_cpy;
>> + int ret = 0, i, n;
>> + u16 vmid;
>> +
>> + new_perms = kcalloc(mem_parcel->n_acl_entries,
>> sizeof(*new_perms), GFP_KERNEL);
>> + if (!new_perms)
>> + return -ENOMEM;
>> +
>> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
>> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
>> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
>> + new_perms[n].vmid = vmid;
>> + else
>> + new_perms[n].vmid = QCOM_SCM_RM_MANAGED_VMID;
>
> So any out-of-range VM ID will cause the hunk of memory to
> be assigned to the resource manager. Is it expected that
> this can occur (and not be an error)?
>
Yes, that's the expectation for these virtual machines. This is for the
access control implemented in QC firmware, so it's not that we're
assigning memory to the resource manager, but rather assigning memory to
a resource manager-managed VM. This is done to de-escalate the access
level of guest VMs.
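To spell out the mapping, here is a rough standalone sketch (userspace C, not the driver code). The helper names gh_fw_vmid() and gh_fw_vmid_mask() are made up for illustration, and the VMIDs are assumed to be already converted to host endianness:

```c
#include <stddef.h>
#include <stdint.h>

#define QCOM_SCM_RM_MANAGED_VMID  0x3A
#define QCOM_SCM_MAX_MANAGED_VMID 0x3F

/* Map a Gunyah VMID to the VMID that the firmware access control sees. */
static uint16_t gh_fw_vmid(uint16_t vmid)
{
	return vmid <= QCOM_SCM_MAX_MANAGED_VMID ? vmid : QCOM_SCM_RM_MANAGED_VMID;
}

/* Build the 64-bit owner bitmask, one bit per firmware VMID. */
static uint64_t gh_fw_vmid_mask(const uint16_t *vmids, size_t n)
{
	uint64_t mask = 0;
	size_t i;

	for (i = 0; i < n; i++)
		mask |= UINT64_C(1) << gh_fw_vmid(vmids[i]);

	return mask;
}
```

Every dynamically allocated VMID above 0x3F collapses onto the single 0x3A bit, which is what the error-unwind path in the patch relies on when it rebuilds src.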
>> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_X)
>> + new_perms[n].perm |= QCOM_SCM_PERM_EXEC;
>> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_W)
>> + new_perms[n].perm |= QCOM_SCM_PERM_WRITE;
>> + if (mem_parcel->acl_entries[n].perms & GH_RM_ACL_R)
>> + new_perms[n].perm |= QCOM_SCM_PERM_READ;
>> + }
>> +
>> + src = (1ull << QCOM_SCM_VMID_HLOS);
>
> src = BIT_ULL(QCOM_SCM_VMID_HLOS);
>
>> +
>> + for (i = 0; i < mem_parcel->n_mem_entries; i++) {
>> + src_cpy = src;
>> + ret =
>> qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
>> + le64_to_cpu(mem_parcel->mem_entries[i].size),
>> + &src_cpy, new_perms, mem_parcel->n_acl_entries);
>
> Loops like this can look simpler if you jump to error handling
> at the end that does this unwind activity, rather than incorporating
> it inside the loop itself. Or even just breaking if ret != 0, e.g.:
>
> if (ret)
> break;
> }
>
> if (!ret)
> return 0;
>
> /* And do the following block here, "outdented" twice */
>
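A standalone sketch of that shape, for illustration only. The fake_fw structure and the fake_assign()/fake_unassign() helpers are hypothetical stand-ins for the firmware calls, not anything from the patch.

```c
#include <stddef.h>

#define MAX_ENTRIES 8

/* Hypothetical stand-in for firmware state. */
struct fake_fw {
	int assigned[MAX_ENTRIES];
	size_t fail_at;		/* index whose assignment fails */
	int unassign_calls;	/* how many entries were handed back */
};

int fake_assign(struct fake_fw *fw, size_t i)
{
	if (i == fw->fail_at)
		return -1;
	fw->assigned[i] = 1;
	return 0;
}

void fake_unassign(struct fake_fw *fw, size_t i)
{
	fw->assigned[i] = 0;
	fw->unassign_calls++;
}

int assign_all(struct fake_fw *fw, size_t n)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = fake_assign(fw, i);
		if (ret)
			break;
	}

	if (!ret)
		return 0;

	/* Entry i failed; hand entries i-1 .. 0 back in reverse order. */
	while (i-- > 0)
		fake_unassign(fw, i);

	return ret;
}
```

The unwind code sits once at the end of the function, two indentation levels shallower than it would inside the loop.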
>> + if (ret) {
>> + src = 0;
>> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
>> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
>> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
>> + src |= (1ull << vmid);
>
> src |= BIT_ULL(vmid);
>
>> + else
>> + src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
>
> src |= BIT_ULL(QCOM_SCM_RM_MANAGED_VMID);
>
>> + }
>> +
>> + new_perms[0].vmid = QCOM_SCM_VMID_HLOS;
>> +
>> + for (i--; i >= 0; i--) {
>> + src_cpy = src;
>> + WARN_ON_ONCE(qcom_scm_assign_mem(
>> +
>> le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
>> + le64_to_cpu(mem_parcel->mem_entries[i].size),
>> + &src_cpy, new_perms, 1));
>> + }
>> + break;
>> + }
>> + }
>> +
>> + kfree(new_perms);
>> + return ret;
>> +}
>> +
>> +static int qcom_scm_gh_rm_post_mem_reclaim(struct gh_rm *rm, struct
>> gh_rm_mem_parcel *mem_parcel)
>> +{
>> + struct qcom_scm_vmperm new_perms;
>> + u64 src = 0, src_cpy;
>> + int ret = 0, i, n;
>> + u16 vmid;
>> +
>> + new_perms.vmid = QCOM_SCM_VMID_HLOS;
>> + new_perms.perm = QCOM_SCM_PERM_EXEC | QCOM_SCM_PERM_WRITE |
>> QCOM_SCM_PERM_READ;
>> +
>> + for (n = 0; n < mem_parcel->n_acl_entries; n++) {
>> + vmid = le16_to_cpu(mem_parcel->acl_entries[n].vmid);
>> + if (vmid <= QCOM_SCM_MAX_MANAGED_VMID)
>> + src |= (1ull << vmid);
>> + else
>> + src |= (1ull << QCOM_SCM_RM_MANAGED_VMID);
>> + }
>> +
>> + for (i = 0; i < mem_parcel->n_mem_entries; i++) {
>> + src_cpy = src;
>> + ret =
>> qcom_scm_assign_mem(le64_to_cpu(mem_parcel->mem_entries[i].phys_addr),
>> + le64_to_cpu(mem_parcel->mem_entries[i].size),
>> + &src_cpy, &new_perms, 1);
>> + WARN_ON_ONCE(ret);
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static struct gh_rm_platform_ops qcom_scm_gh_rm_platform_ops = {
>> + .pre_mem_share = qcom_scm_gh_rm_pre_mem_share,
>> + .post_mem_reclaim = qcom_scm_gh_rm_post_mem_reclaim,
>> +};
>> +
>> +/* {19bd54bd-0b37-571b-946f-609b54539de6} */
>> +static const uuid_t QCOM_EXT_UUID =
>> + UUID_INIT(0x19bd54bd, 0x0b37, 0x571b, 0x94, 0x6f, 0x60, 0x9b,
>> 0x54, 0x53, 0x9d, 0xe6);
>> +
>> +#define GH_QCOM_EXT_CALL_UUID_ID
>> ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, ARM_SMCCC_SMC_32, \
>> + ARM_SMCCC_OWNER_VENDOR_HYP, 0x3f01)
>> +
>> +static bool gh_has_qcom_extensions(void)
>> +{
>> + struct arm_smccc_res res;
>> + uuid_t uuid;
>> +
>> + arm_smccc_1_1_smc(GH_QCOM_EXT_CALL_UUID_ID, &res);
>> +
>> + ((u32 *)&uuid.b[0])[0] = lower_32_bits(res.a0);
>> + ((u32 *)&uuid.b[0])[1] = lower_32_bits(res.a1);
>> + ((u32 *)&uuid.b[0])[2] = lower_32_bits(res.a2);
>> + ((u32 *)&uuid.b[0])[3] = lower_32_bits(res.a3);
>
> I said this elsewhere. I'd rather see:
>
> u32 *u = (u32 *)&uuid; /* Or &uuid.b? */
>
> *u++ = lower_32_bits(res.a0);
> . . .
>
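Another option that avoids the pointer cast entirely is to store the bytes explicitly. This is only a sketch: struct uuid16 is a hypothetical stand-in for uuid_t, and the little-endian word layout is an assumption that matches what the cast produces on a little-endian arm64 kernel.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's uuid_t (16 raw bytes). */
struct uuid16 {
	uint8_t b[16];
};

/*
 * Store the low 32 bits of each register as four consecutive
 * little-endian words (assumed layout, see the note above).
 */
void uuid_from_regs(struct uuid16 *u, const uint64_t regs[4])
{
	int i;

	for (i = 0; i < 4; i++) {
		uint32_t w = (uint32_t)regs[i];

		u->b[4 * i + 0] = (uint8_t)(w >> 0);
		u->b[4 * i + 1] = (uint8_t)(w >> 8);
		u->b[4 * i + 2] = (uint8_t)(w >> 16);
		u->b[4 * i + 3] = (uint8_t)(w >> 24);
	}
}
```

The upper 32 bits of each register are discarded, as lower_32_bits() does in the patch.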
>> +
>> + return uuid_equal(&uuid, &QCOM_EXT_UUID);
>> +}
>> +
>> +static int __init qcom_gh_platform_hooks_register(void)
>> +{
>> + if (!gh_has_qcom_extensions())
>> + return -ENODEV;
>> +
>> + return gh_rm_register_platform_ops(&qcom_scm_gh_rm_platform_ops);
>> +}
>> +
>> +static void __exit qcom_gh_platform_hooks_unregister(void)
>> +{
>> + gh_rm_unregister_platform_ops(&qcom_scm_gh_rm_platform_ops);
>> +}
>> +
>> +module_init(qcom_gh_platform_hooks_register);
>> +module_exit(qcom_gh_platform_hooks_unregister);
>> +MODULE_DESCRIPTION("Qualcomm Technologies, Inc. Platform Hooks for
>> Gunyah");
>> +MODULE_LICENSE("GPL");
>
On 6/5/2023 12:50 PM, Alex Elder wrote:
> On 5/9/23 3:48 PM, Elliot Berman wrote:
>> Allow userspace to attach an ioeventfd to an mmio address within the
>> guest.
>>
>> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Elliot Berman <[email protected]>
>
> Looks good. One question below.
>
> Reviewed-by: Alex Elder <[email protected]>
>
Thanks!
>> ---
>> Documentation/virt/gunyah/vm-manager.rst | 2 +-
>> drivers/virt/gunyah/Kconfig | 9 ++
>> drivers/virt/gunyah/Makefile | 1 +
>> drivers/virt/gunyah/gunyah_ioeventfd.c | 130 +++++++++++++++++++++++
>> include/uapi/linux/gunyah.h | 37 +++++++
>> 5 files changed, 178 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/virt/gunyah/gunyah_ioeventfd.c
>>
>> diff --git a/Documentation/virt/gunyah/vm-manager.rst
>> b/Documentation/virt/gunyah/vm-manager.rst
>> index c4960948c779..87838c5b5945 100644
>> --- a/Documentation/virt/gunyah/vm-manager.rst
>> +++ b/Documentation/virt/gunyah/vm-manager.rst
>> @@ -115,7 +115,7 @@ the VM *before* the VM starts.
>> The argument types are documented below:
>> .. kernel-doc:: include/uapi/linux/gunyah.h
>> - :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
>> + :identifiers: gh_fn_vcpu_arg gh_fn_irqfd_arg gh_irqfd_flags
>> gh_fn_ioeventfd_arg gh_ioeventfd_flags
>> Gunyah VCPU API Descriptions
>> ----------------------------
>> diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
>> index bc2c46d9df94..63bebc5b9f82 100644
>> --- a/drivers/virt/gunyah/Kconfig
>> +++ b/drivers/virt/gunyah/Kconfig
>> @@ -48,3 +48,12 @@ config GUNYAH_IRQFD
>> on Gunyah virtual machine.
>> Say Y/M here if unsure and you want to support Gunyah VMMs.
>> +
>> +config GUNYAH_IOEVENTFD
>> + tristate "Gunyah ioeventfd interface"
>> + depends on GUNYAH
>> + help
>> + Enable kernel support for creating ioeventfds which can alert
>> userspace
>> + when a Gunyah virtual machine accesses a memory address.
>> +
>> + Say Y/M here if unsure and you want to support Gunyah VMMs.
>> diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
>> index ad212a1cf967..63ca11e74796 100644
>> --- a/drivers/virt/gunyah/Makefile
>> +++ b/drivers/virt/gunyah/Makefile
>> @@ -8,3 +8,4 @@ obj-$(CONFIG_GUNYAH) += gunyah.o
>> obj-$(CONFIG_GUNYAH_VCPU) += gunyah_vcpu.o
>> obj-$(CONFIG_GUNYAH_IRQFD) += gunyah_irqfd.o
>> +obj-$(CONFIG_GUNYAH_IOEVENTFD) += gunyah_ioeventfd.o
>> diff --git a/drivers/virt/gunyah/gunyah_ioeventfd.c
>> b/drivers/virt/gunyah/gunyah_ioeventfd.c
>> new file mode 100644
>> index 000000000000..5b1b9fd9ac3a
>> --- /dev/null
>> +++ b/drivers/virt/gunyah/gunyah_ioeventfd.c
>> @@ -0,0 +1,130 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * Copyright (c) 2022-2023 Qualcomm Innovation Center, Inc. All
>> rights reserved.
>> + */
>> +
>> +#include <linux/eventfd.h>
>> +#include <linux/file.h>
>> +#include <linux/fs.h>
>> +#include <linux/gunyah.h>
>> +#include <linux/gunyah_vm_mgr.h>
>> +#include <linux/module.h>
>> +#include <linux/printk.h>
>> +
>> +#include <uapi/linux/gunyah.h>
>> +
>> +struct gh_ioeventfd {
>> + struct gh_vm_function_instance *f;
>> + struct gh_vm_io_handler io_handler;
>> +
>> + struct eventfd_ctx *ctx;
>> +};
>> +
>> +static int gh_write_ioeventfd(struct gh_vm_io_handler *io_dev, u64
>> addr, u32 len, u64 data)
>> +{
>> + struct gh_ioeventfd *iofd = container_of(io_dev, struct
>> gh_ioeventfd, io_handler);
>
> Does a write of 0 bytes still signal an event?
>
From gunyah_ioeventfd's perspective, yes. I don't think a write of 0
bytes is possible, but maybe you are thinking of a scenario I'm not?
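For context, this is roughly how I picture the filtering that happens before the write callback runs. It is a hypothetical userspace model, not the code in vm_mgr.c: the struct only mirrors the fields that the bind path fills in, and io_handler_matches() is an invented name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the fields the bind path fills in. */
struct io_handler {
	uint64_t addr;
	uint32_t len;		/* 0 means "ignore the access length" */
	bool datamatch;
	uint64_t data;
};

/* Returns true if a guest write should reach the handler's write callback. */
bool io_handler_matches(const struct io_handler *h, uint64_t addr,
			uint32_t len, uint64_t data)
{
	if (h->addr != addr)
		return false;
	if (h->len && h->len != len)
		return false;
	if (h->datamatch && h->data != data)
		return false;

	return true;
}
```

In this model the write callback itself never looks at len, which is why it can signal the eventfd unconditionally.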
>> +
>> + eventfd_signal(iofd->ctx, 1);
>> + return 0;
>> +}
>> +
>> +static struct gh_vm_io_handler_ops io_ops = {
>> + .write = gh_write_ioeventfd,
>> +};
>> +
>> +static long gh_ioeventfd_bind(struct gh_vm_function_instance *f)
>> +{
>> + const struct gh_fn_ioeventfd_arg *args = f->argp;
>> + struct gh_ioeventfd *iofd;
>> + struct eventfd_ctx *ctx;
>> + int ret;
>> +
>> + if (f->arg_size != sizeof(*args))
>> + return -EINVAL;
>> +
>> + /* All other flag bits are reserved for future use */
>> + if (args->flags & ~GH_IOEVENTFD_FLAGS_DATAMATCH)
>> + return -EINVAL;
>> +
>> + /* must be natural-word sized, or 0 to ignore length */
>> + switch (args->len) {
>> + case 0:
>> + case 1:
>> + case 2:
>> + case 4:
>> + case 8:
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> +
>> + /* check for range overflow */
>> + if (overflows_type(args->addr + args->len, u64))
>> + return -EINVAL;
>> +
>> + /* ioeventfd with no length can't be combined with DATAMATCH */
>> + if (!args->len && (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH))
>> + return -EINVAL;
>> +
>> + ctx = eventfd_ctx_fdget(args->fd);
>> + if (IS_ERR(ctx))
>> + return PTR_ERR(ctx);
>> +
>> + iofd = kzalloc(sizeof(*iofd), GFP_KERNEL);
>> + if (!iofd) {
>> + ret = -ENOMEM;
>> + goto err_eventfd;
>> + }
>> +
>> + f->data = iofd;
>> + iofd->f = f;
>> +
>> + iofd->ctx = ctx;
>> +
>> + if (args->flags & GH_IOEVENTFD_FLAGS_DATAMATCH) {
>> + iofd->io_handler.datamatch = true;
>> + iofd->io_handler.len = args->len;
>> + iofd->io_handler.data = args->datamatch;
>> + }
>> + iofd->io_handler.addr = args->addr;
>> + iofd->io_handler.ops = &io_ops;
>> +
>> + ret = gh_vm_add_io_handler(f->ghvm, &iofd->io_handler);
>> + if (ret)
>> + goto err_io_dev_add;
>> +
>> + return 0;
>> +
>> +err_io_dev_add:
>> + kfree(iofd);
>> +err_eventfd:
>> + eventfd_ctx_put(ctx);
>> + return ret;
>> +}
>> +
>> +static void gh_ioevent_unbind(struct gh_vm_function_instance *f)
>> +{
>> + struct gh_ioeventfd *iofd = f->data;
>> +
>> + eventfd_ctx_put(iofd->ctx);
>> + gh_vm_remove_io_handler(iofd->f->ghvm, &iofd->io_handler);
>> + kfree(iofd);
>> +}
>> +
>> +static bool gh_ioevent_compare(const struct gh_vm_function_instance *f,
>> + const void *arg, size_t size)
>> +{
>> + const struct gh_fn_ioeventfd_arg *instance = f->argp,
>> + *other = arg;
>> +
>> + if (sizeof(*other) != size)
>> + return false;
>> +
>> + return instance->addr == other->addr;
>> +}
>> +
>> +DECLARE_GH_VM_FUNCTION_INIT(ioeventfd, GH_FN_IOEVENTFD, 3,
>> + gh_ioeventfd_bind, gh_ioevent_unbind,
>> + gh_ioevent_compare);
>> +MODULE_DESCRIPTION("Gunyah ioeventfd VM Function");
>> +MODULE_LICENSE("GPL");
>> diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
>> index 0c480c622686..fa1cae7419d2 100644
>> --- a/include/uapi/linux/gunyah.h
>> +++ b/include/uapi/linux/gunyah.h
>> @@ -79,10 +79,13 @@ struct gh_vm_dtb_config {
>> * Return: file descriptor to manipulate the vcpu.
>> * @GH_FN_IRQFD: register eventfd to assert a Gunyah doorbell
>> * &struct gh_fn_desc.arg is a pointer to &struct
>> gh_fn_irqfd_arg
>> + * @GH_FN_IOEVENTFD: register ioeventfd to trigger when VM faults on
>> parameter
>> + * &struct gh_fn_desc.arg is a pointer to &struct
>> gh_fn_ioeventfd_arg
>> */
>> enum gh_fn_type {
>> GH_FN_VCPU = 1,
>> GH_FN_IRQFD,
>> + GH_FN_IOEVENTFD,
>> };
>> #define GH_FN_MAX_ARG_SIZE 256
>> @@ -134,6 +137,40 @@ struct gh_fn_irqfd_arg {
>> __u32 padding;
>> };
>> +/**
>> + * enum gh_ioeventfd_flags - flags for use in gh_fn_ioeventfd_arg
>> + * @GH_IOEVENTFD_FLAGS_DATAMATCH: the event will be signaled only if the
>> + * written value to the registered
>> address is
>> + * equal to &struct
>> gh_fn_ioeventfd_arg.datamatch
>> + */
>> +enum gh_ioeventfd_flags {
>> + GH_IOEVENTFD_FLAGS_DATAMATCH = 1UL << 0,
>> +};
>> +
>> +/**
>> + * struct gh_fn_ioeventfd_arg - Arguments to create an ioeventfd
>> function
>> + * @datamatch: data used when GH_IOEVENTFD_DATAMATCH is set
>> + * @addr: Address in guest memory
>> + * @len: Length of access
>> + * @fd: When ioeventfd is matched, this eventfd is written
>> + * @flags: See &enum gh_ioeventfd_flags
>> + * @padding: padding bytes
>> + *
>> + * Create this function with &GH_VM_ADD_FUNCTION using type
>> &GH_FN_IOEVENTFD.
>> + *
>> + * Attaches an ioeventfd to a legal mmio address within the guest. A
>> guest write
>> + * in the registered address will signal the provided event instead
>> of triggering
>> + * an exit on the GH_VCPU_RUN ioctl.
>> + */
>> +struct gh_fn_ioeventfd_arg {
>> + __u64 datamatch;
>> + __u64 addr; /* legal mmio address */
>> + __u32 len; /* 1, 2, 4, or 8 bytes; or 0 to ignore length */
>> + __s32 fd;
>> + __u32 flags;
>> + __u32 padding;
>> +};
>> +
>> /**
>> * struct gh_fn_desc - Arguments to create a VM function
>> * @type: Type of the function. See &enum gh_fn_type.
>
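The argument checks in gh_ioeventfd_bind() can be modelled in userspace as below. This is a sketch under assumptions: struct ioeventfd_arg is a local copy of the quoted layout, ioeventfd_arg_check() is a hypothetical name, and the wraparound test is written as a plain unsigned comparison rather than with the overflows_type() helper used in the patch.

```c
#include <errno.h>
#include <stdint.h>

#define GH_IOEVENTFD_FLAGS_DATAMATCH	(1UL << 0)

/* Local copy of the quoted struct gh_fn_ioeventfd_arg layout. */
struct ioeventfd_arg {
	uint64_t datamatch;
	uint64_t addr;
	uint32_t len;
	int32_t fd;
	uint32_t flags;
	uint32_t padding;
};

int ioeventfd_arg_check(const struct ioeventfd_arg *a)
{
	/* All other flag bits are reserved. */
	if (a->flags & ~GH_IOEVENTFD_FLAGS_DATAMATCH)
		return -EINVAL;

	/* Natural-word sized, or 0 to ignore the length. */
	switch (a->len) {
	case 0: case 1: case 2: case 4: case 8:
		break;
	default:
		return -EINVAL;
	}

	/* Reject a range that wraps past the end of the address space. */
	if (a->addr + a->len < a->addr)
		return -EINVAL;

	/* DATAMATCH needs a length to compare against. */
	if (!a->len && (a->flags & GH_IOEVENTFD_FLAGS_DATAMATCH))
		return -EINVAL;

	return 0;
}
```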
On 6/9/23 12:33 PM, Elliot Berman wrote:
>>> +static int gh_write_ioeventfd(struct gh_vm_io_handler *io_dev, u64
>>> addr, u32 len, u64 data)
>>> +{
>>> + struct gh_ioeventfd *iofd = container_of(io_dev, struct
>>> gh_ioeventfd, io_handler);
>>
>> Does a write of 0 bytes still signal an event?
>>
>
> From gunyah_ioeventfd perspective, yes. I don't think a write of 0
> bytes is possible, but maybe you are thinking of scenario I'm not?
>
>>> +
>>> + eventfd_signal(iofd->ctx, 1);
>>> + return 0;
>>> +}
No, I was just observing that eventfd_signal() is called regardless
of the value of len.
-Alex
On 6/5/2023 12:50 PM, Alex Elder wrote:
> On 5/9/23 3:47 PM, Elliot Berman wrote:
>> Enable support for creating irqfds which can raise an interrupt on a
>> Gunyah virtual machine. irqfds are exposed to userspace as a Gunyah VM
>> function with the name "irqfd". If the VM devicetree is not configured
>> to create a doorbell with the corresponding label, userspace will still
>> be able to assert the eventfd but no interrupt will be raised on the
>> guest.
>>
>> Co-developed-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Prakruthi Deepak Heragu <[email protected]>
>> Signed-off-by: Elliot Berman <[email protected]>
>
> I have a minor suggestion. I think I'd like to look at this
> again, so:
>
> Acked-by: Alex Elder <[email protected]>
>
>> ---
>> Documentation/virt/gunyah/vm-manager.rst | 2 +-
>> drivers/virt/gunyah/Kconfig | 9 ++
>> drivers/virt/gunyah/Makefile | 1 +
>> drivers/virt/gunyah/gunyah_irqfd.c | 180 +++++++++++++++++++++++
>> include/uapi/linux/gunyah.h | 35 +++++
>> 5 files changed, 226 insertions(+), 1 deletion(-)
>> create mode 100644 drivers/virt/gunyah/gunyah_irqfd.c
>>
>
> . . .
>
>> @@ -99,6 +102,38 @@ struct gh_fn_vcpu_arg {
>> __u32 id;
>> };
>> +/**
>> + * enum gh_irqfd_flags - flags for use in gh_fn_irqfd_arg
>> + * @GH_IRQFD_FLAGS_LEVEL: make the interrupt operate like a level
>> triggered
>> + * interrupt on guest side. Triggering IRQFD
>> before
>> + * guest handles the interrupt causes
>> interrupt to
>> + * stay asserted.
>> + */
>> +enum gh_irqfd_flags {
>> + GH_IRQFD_FLAGS_LEVEL = 1UL << 0,
>
> BIT(0), /* ? */
>
The BIT macro isn't a standard C macro and isn't provided by the UAPI
headers, so it causes compile errors, at least for me, when I use it in
userspace.
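A small userspace illustration of why the open-coded shift is the portable spelling. The enum is copied from the patch; irqfd_flags_valid() is a hypothetical helper added only for this example.

```c
#include <stdint.h>

/* Spelled as a shift so the header needs no kernel-only BIT() macro. */
enum gh_irqfd_flags {
	GH_IRQFD_FLAGS_LEVEL = 1UL << 0,
};

/* Hypothetical helper: reject any flag bit that is not defined above. */
int irqfd_flags_valid(uint32_t flags)
{
	return !(flags & ~(uint32_t)GH_IRQFD_FLAGS_LEVEL);
}
```

This compiles with nothing but <stdint.h>, which is the situation a VMM including the header is in.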
>> +};
>> +
>> +/**
>> + * struct gh_fn_irqfd_arg - Arguments to create an irqfd function.
>> + *
>> + * Create this function with &GH_VM_ADD_FUNCTION using type
>> &GH_FN_IRQFD.
>> + *
>> + * Allows setting an eventfd to directly trigger a guest interrupt.
>> + * irqfd.fd specifies the file descriptor to use as the eventfd.
>> + * irqfd.label corresponds to the doorbell label used in the guest
>> VM's devicetree.
>> + *
>> + * @fd: an eventfd which when written to will raise a doorbell
>> + * @label: Label of the doorbell created on the guest VM
>> + * @flags: see &enum gh_irqfd_flags
>> + * @padding: padding bytes
>> + */
>> +struct gh_fn_irqfd_arg {
>> + __u32 fd;
>> + __u32 label;
>> + __u32 flags;
>> + __u32 padding;
>> +};
>> +
>> /**
>> * struct gh_fn_desc - Arguments to create a VM function
>> * @type: Type of the function. See &enum gh_fn_type.
>
On 6/9/23 1:22 PM, Elliot Berman wrote:
>>>
>>> +enum gh_irqfd_flags {
>>> + GH_IRQFD_FLAGS_LEVEL = 1UL << 0,
>>
>> BIT(0), /* ? */
>>
>
> The BIT macro isn't a standard C macro and isn't defined by Linux, so it
> causes compile errors at least for me when I use it in userspace.
OK that makes sense. I hadn't thought about this
being a user space header when I made the comment.
-Alex
On 6/5/2023 12:49 PM, Alex Elder wrote:
> On 5/9/23 3:47 PM, Elliot Berman wrote:
>> Introduce a framework for Gunyah userspace to install VM functions. VM
>> functions are optional interfaces to the virtual machine. vCPUs,
>> ioeventfs, and irqfds are examples of such VM functions and are
>
> s/ioeventfs/ioeventfds/
>
> Also, these aren't just examples of VM functions, they *are* the
> VM functions implemented.
>
>> implemented in subsequent patches.
>>
>> A generic framework is implemented instead of individual ioctls to
>> create vCPUs, irqfds, etc., in order to simplify the VM manager core
>> implementation and allow dynamic loading of VM function modules.
>
> This also allows the set of VM functions to be extended without
> updating the API (like it or not).
>
>>
>> Signed-off-by: Elliot Berman <[email protected]>
>
> I have a few more comments, but this looks pretty good.
>
> Reviewed-by: Alex Elder <[email protected]>
>
>> ---
>> Documentation/virt/gunyah/vm-manager.rst | 18 ++
>> drivers/virt/gunyah/vm_mgr.c | 216 ++++++++++++++++++++++-
>> drivers/virt/gunyah/vm_mgr.h | 4 +
>> include/linux/gunyah_vm_mgr.h | 87 +++++++++
>> include/uapi/linux/gunyah.h | 18 ++
>> 5 files changed, 340 insertions(+), 3 deletions(-)
>> create mode 100644 include/linux/gunyah_vm_mgr.h
>>
>> diff --git a/Documentation/virt/gunyah/vm-manager.rst
>> b/Documentation/virt/gunyah/vm-manager.rst
>> index 50d8ae7fabcd..3b51bab9d793 100644
>> --- a/Documentation/virt/gunyah/vm-manager.rst
>> +++ b/Documentation/virt/gunyah/vm-manager.rst
>> @@ -17,6 +17,24 @@ sharing userspace memory with a VM is done via the
>> `GH_VM_SET_USER_MEM_REGION`_
>> ioctl. The VM itself is configured to use the memory region via the
>> devicetree.
>> +Gunyah Functions
>> +================
>> +
>> +Components of a Gunyah VM's configuration that need kernel
>> configuration are
>> +called "functions" and are built on top of a framework. Functions are
>> identified
>> +by a string and have some argument(s) to configure them. They are
>> typically
>> +created by the `GH_VM_ADD_FUNCTION`_ ioctl.
>
> Is a function *type* (e.g., VCPU or ioeventfd) identified by a string?
> Or a function *instance* (e.g. four VCPUs)? Or both?
>
Ah, this should be:
Function types are identified by an enum and have some argument(s)...
>> +
>> +Functions typically will always do at least one of these operations:
>
> Typically, or always?
>
Hmm, I didn't want to use a more absolute term like "always" since it
implies to me that the framework forces this somehow. A VM function
wouldn't do anything interesting if it weren't interacting with the VM,
and resource tickets/IO handlers are the ways for functions to interact
with VMs.
I'll tweak the wording here.
>> +
>> +1. Create resource ticket(s). Resource tickets allow a function to
>> register
>> + itself as the client for a Gunyah resource (e.g. doorbell or vCPU)
>> and
>> + the function is given the pointer to the &struct gh_resource when the
>> + VM is starting.
>> +
>
> What I think this means is that tickets are used to allow functions
> to be defined *before* the VM is actually started. So once it starts,
> the functions get added. (I might have this slightly wrong, but in
> any case I'm not sure the above sentence is very clear.)
>
I'm going to remove the "and the function is given the pointer to..."
since I agree it is a bit confusing. I think it'll be clearer for me to
put it in the resource ticket kerneldoc where there's context of the
populate() callback in the resource ticket. I'll mention there that the
populate() callback may not be made until the VM is started which could
be a while.
>> +2. Register IO handler(s). IO handlers allow a function to handle
>> stage-2 faults
>> + from the virtual machine.
>> +
>> Sample Userspace VMM
>> ====================
>> diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
>> index a800061f56bf..56464451b262 100644
>> --- a/drivers/virt/gunyah/vm_mgr.c
>> +++ b/drivers/virt/gunyah/vm_mgr.c
>> @@ -6,10 +6,13 @@
>> #define pr_fmt(fmt) "gh_vm_mgr: " fmt
>> #include <linux/anon_inodes.h>
>> +#include <linux/compat.h>
>> #include <linux/file.h>
>> #include <linux/gunyah_rsc_mgr.h>
>> +#include <linux/gunyah_vm_mgr.h>
>> #include <linux/miscdevice.h>
>> #include <linux/module.h>
>> +#include <linux/xarray.h>
>> #include <uapi/linux/gunyah.h>
>> @@ -17,6 +20,172 @@
>> static void gh_vm_free(struct work_struct *work);
>> +static DEFINE_XARRAY(gh_vm_functions);
>> +
>> +static void gh_vm_put_function(struct gh_vm_function *fn)
>> +{
>> + module_put(fn->mod);
>> +}
>> +
>> +static struct gh_vm_function *gh_vm_get_function(u32 type)
>> +{
>> + struct gh_vm_function *fn;
>> + int r;
>> +
>> + fn = xa_load(&gh_vm_functions, type);
>> + if (!fn) {
>> + r = request_module("ghfunc:%d", type);
>> + if (r)
>> + return ERR_PTR(r > 0 ? -r : r);
>
> Almost all callers of request_module() simply ignore the
> return value. What positive values are you expecting to
> see here (and are you sure they're positive errno values)?
>
I can ignore the return value here, too, to follow the convention.
I had observed request_module can return modprobe's exit code.
>> +
>> + fn = xa_load(&gh_vm_functions, type);
>> + }
>> +
>> + if (!fn || !try_module_get(fn->mod))
>> + fn = ERR_PTR(-ENOENT);
>> +
>> + return fn;
>> +}
>
> . . .
On 6/5/2023 12:49 PM, Alex Elder wrote:
> On 5/9/23 3:47 PM, Elliot Berman wrote:
>> When booting a Gunyah virtual machine, the host VM may gain capabilities
>> to interact with resources for the guest virtual machine. Examples of
>> such resources are vCPUs or message queues. To use those resources, we
>> need to translate the RM response into a gunyah_resource structure which
>> are useful to Linux drivers. Presently, Linux drivers need only to know
>> the type of resource, the capability ID, and an interrupt.
>>
>> On ARM64 systems, the interrupt reported by Gunyah is the GIC interrupt
>> ID number and always a SPI.
>>
>> Signed-off-by: Elliot Berman <[email protected]>
>
> Please zero the automatic variable in the place I suggest it.
> I have two other comments/questions. Otherwise, this looks good.
>
> Reviewed-by: Alex Elder <[email protected]>
>
>> ---
...
>> +struct gh_resource *gh_rm_alloc_resource(struct gh_rm *rm, struct
>> gh_rm_hyp_resource *hyp_resource)
>> +{
>> + struct gh_resource *ghrsc;
>> + int ret;
>> +
>> + ghrsc = kzalloc(sizeof(*ghrsc), GFP_KERNEL);
>> + if (!ghrsc)
>> + return NULL;
>> +
>> + ghrsc->type = hyp_resource->type;
>> + ghrsc->capid = le64_to_cpu(hyp_resource->cap_id);
>> + ghrsc->irq = IRQ_NOTCONNECTED;
>> + ghrsc->rm_label = le32_to_cpu(hyp_resource->resource_label);
>> + if (hyp_resource->virq) {
>> + struct gh_irq_chip_data irq_data = {
>> + .gh_virq = le32_to_cpu(hyp_resource->virq),
>> + };
>> +
>> + ret = irq_domain_alloc_irqs(rm->irq_domain, 1, NUMA_NO_NODE,
>> &irq_data);
>> + if (ret < 0) {
>> + dev_err(rm->dev,
>> + "Failed to allocate interrupt for resource %d label:
>> %d: %d\n",
>> + ghrsc->type, ghrsc->rm_label, ghrsc->irq);
>
> Is it reasonable to return in this case without indicating to the
> caller that something is wrong?
>
I wasn't sure what to do here since this is an unexpected edge case. Not
returning would cause a client's "request_irq" to fail down the line if
the client was interested in the irq. I chose not to return an error
since this failure doesn't put us in an unrecoverable state. No one
currently wants to try to recover from that error, so I'm really just
deferring the real error handling until later.
I can return ret here.
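A userspace sketch of what returning the error could look like. Everything here is hypothetical: struct resource, resource_alloc() and the two fake allocators are illustration only, and the kernel version would more likely return an ERR_PTR() than use an out-parameter.

```c
#include <errno.h>
#include <stdlib.h>

#define IRQ_NOT_CONNECTED (-1)

struct resource {
	int type;
	int irq;
};

/* Hypothetical stand-ins for irq_domain_alloc_irqs(). */
int fake_irq_alloc_ok(unsigned int virq)
{
	return (int)virq + 32;
}

int fake_irq_alloc_fail(unsigned int virq)
{
	(void)virq;
	return -ENOSPC;
}

struct resource *resource_alloc(int type, unsigned int virq,
				int (*alloc_irq)(unsigned int), int *err)
{
	struct resource *r = calloc(1, sizeof(*r));
	int ret;

	*err = 0;
	if (!r) {
		*err = -ENOMEM;
		return NULL;
	}

	r->type = type;
	r->irq = IRQ_NOT_CONNECTED;
	if (virq) {
		ret = alloc_irq(virq);
		if (ret < 0) {
			/* Report the failure instead of handing back a half-usable resource. */
			free(r);
			*err = ret;
			return NULL;
		}
		r->irq = ret;
	}

	return r;
}
```

A resource without a virq still succeeds and keeps the "not connected" marker, so only a real allocation failure is reported.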
>> + } else {
>> + ghrsc->irq = ret;
>> + }
>> + }
>> +
>> + return ghrsc;
...
On 6/5/2023 12:48 PM, Alex Elder wrote:
> On 5/9/23 3:47 PM, Elliot Berman wrote:
>> +
>> + req_header->mem_handle = cpu_to_le32(mem_handle);
>> + if (end_append)
>> + req_header->flags |= GH_MEM_APPEND_REQ_FLAGS_END;
>> +
>> + mem_section->n_entries = cpu_to_le16(n_mem_entries);
>> + memcpy(mem_section->entries, mem_entries, sizeof(*mem_entries) *
>> n_mem_entries);
>> +
>> + ret = gh_rm_call(rm, GH_RM_RPC_MEM_APPEND, msg, msg_size, NULL,
>> NULL);
>> + kfree(msg);
>> +
>> + return ret;
>> +}
>> +
>> +static int gh_rm_mem_append(struct gh_rm *rm, u32 mem_handle,
>> + struct gh_rm_mem_entry *mem_entries, size_t n_mem_entries)
>> +{
>> + bool end_append;
>> + int ret = 0;
>> + size_t n;
>> +
>> + while (n_mem_entries) {
>> + if (n_mem_entries > GH_RM_MAX_MEM_ENTRIES) {
>> + end_append = false;
>> + n = GH_RM_MAX_MEM_ENTRIES;
>> + } else {
>> + end_append = true;
>> + n = n_mem_entries;
>> + }
>> +
>> + ret = _gh_rm_mem_append(rm, mem_handle, end_append,
>> mem_entries, n);
>> + if (ret)
>> + break;
>> +
>> + mem_entries += n;
>> + n_mem_entries -= n;
>> + }
>> +
>> + return ret;
>> +}
>> +
>> +static int gh_rm_mem_lend_common(struct gh_rm *rm, u32 message_id,
>> struct gh_rm_mem_parcel *p)
>> +{
>> + size_t msg_size = 0, initial_mem_entries = p->n_mem_entries,
>> resp_size;
>> + size_t acl_section_size, mem_section_size;
>> + struct gh_rm_mem_share_req_acl_section *acl_section;
>> + struct gh_rm_mem_share_req_mem_section *mem_section;
>> + struct gh_rm_mem_share_req_header *req_header;
>> + u32 *attr_section;
>> + __le32 *resp;
>> + void *msg;
>> + int ret;
>> +
>> + if (!p->acl_entries || !p->n_acl_entries || !p->mem_entries ||
>> !p->n_mem_entries ||
>> + p->n_acl_entries > U8_MAX || p->mem_handle !=
>> GH_MEM_HANDLE_INVAL)
>> + return -EINVAL;
>> +
>> + if (initial_mem_entries > GH_RM_MAX_MEM_ENTRIES)
>> + initial_mem_entries = GH_RM_MAX_MEM_ENTRIES;
>
> Is it OK to truncate the number of entries silently?
>
The initial share/lend accepts up to GH_RM_MAX_MEM_ENTRIES entries. I
append the rest of the mem entries later.
>> +
>> + acl_section_size = struct_size(acl_section, entries,
>> p->n_acl_entries);
>
> Is there a limit on the number of ACL entries (as there is for
> the number of mem entries).
>
There is a limit at the transport level: messages sent to the resource
manager can only be so long. The max # of ACL entries is dynamic and
depends on the size of the rest of the message, such as how many mem
entries there are. We could try to compute the limit and even lower the
max number of mem_entries, but in practice the # of ACL entries will be
in the single digits, so it seemed like premature optimization to be
"smarter" about the limit rather than let the RPC core do the
checking/complaining.
>> + mem_section_size = struct_size(mem_section, entries,
>> initial_mem_entries);
>> + /* The format of the message goes:
>> + * request header
>> + * ACL entries (which VMs get what kind of access to this memory
>> parcel)
>> + * Memory entries (list of memory regions to share)
>> + * Memory attributes (currently unused, we'll hard-code the size
>> to 0)
>> + */
>> + msg_size += sizeof(struct gh_rm_mem_share_req_header);
>> + msg_size += acl_section_size;
>> + msg_size += mem_section_size;
>> + msg_size += sizeof(u32); /* for memory attributes, currently
>> unused */
>> +
>> + msg = kzalloc(msg_size, GFP_KERNEL);
>> + if (!msg)
>> + return -ENOMEM;
>> +
>> + req_header = msg;
>> + acl_section = (void *)req_header + sizeof(*req_header);
>> + mem_section = (void *)acl_section + acl_section_size;
>> + attr_section = (void *)mem_section + mem_section_size;
>> +
>> + req_header->mem_type = p->mem_type;
>> + if (initial_mem_entries != p->n_mem_entries)
>> + req_header->flags |= GH_MEM_SHARE_REQ_FLAGS_APPEND;
>> + req_header->label = cpu_to_le32(p->label);
>> +
>> + acl_section->n_entries = cpu_to_le32(p->n_acl_entries);
>> + memcpy(acl_section->entries, p->acl_entries,
>> + flex_array_size(acl_section, entries, p->n_acl_entries));
>> +
>> + mem_section->n_entries = cpu_to_le16(initial_mem_entries);
>> + memcpy(mem_section->entries, p->mem_entries,
>> + flex_array_size(mem_section, entries, initial_mem_entries));
>> +
>> + /* Set n_entries for memory attribute section to 0 */
>> + *attr_section = 0;
>> +
>> + ret = gh_rm_call(rm, message_id, msg, msg_size, (void **)&resp,
>> &resp_size);
>> + kfree(msg);
>> +
>> + if (ret)
>> + return ret;
>> +
>> + p->mem_handle = le32_to_cpu(*resp);
>> + kfree(resp);
>> +
>> + if (initial_mem_entries != p->n_mem_entries) {
>> + ret = gh_rm_mem_append(rm, p->mem_handle,
>> + &p->mem_entries[initial_mem_entries],
>> + p->n_mem_entries - initial_mem_entries);
>
> Will there always be at most one gh_rm_mem_append() call?
>
Yes, gh_rm_mem_append makes multiple RPC calls as necessary for all the
remaining entries.
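To make the split concrete, here is a small model of how the entries get distributed. MAX_ENTRIES_PER_MSG is a made-up small value standing in for GH_RM_MAX_MEM_ENTRIES, and struct send_plan and plan_share() are hypothetical names used only for this sketch.

```c
#include <stddef.h>

/* Hypothetical value; stands in for GH_RM_MAX_MEM_ENTRIES. */
#define MAX_ENTRIES_PER_MSG 4

struct send_plan {
	size_t initial;		/* entries carried by the share/lend request */
	size_t append_calls;	/* follow-up append RPCs */
	size_t last_append;	/* entries in the final append (the one flagged END) */
};

struct send_plan plan_share(size_t n_entries)
{
	struct send_plan p = { 0, 0, 0 };
	size_t remaining;

	p.initial = n_entries > MAX_ENTRIES_PER_MSG ? MAX_ENTRIES_PER_MSG : n_entries;
	remaining = n_entries - p.initial;

	/* Same chunking as the while loop in gh_rm_mem_append(). */
	while (remaining) {
		size_t n = remaining > MAX_ENTRIES_PER_MSG ? MAX_ENTRIES_PER_MSG : remaining;

		p.append_calls++;
		p.last_append = n;
		remaining -= n;
	}

	return p;
}
```

With a limit of 4, ten entries become one share request carrying 4 entries, then two append RPCs carrying 4 and 2.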
>> + if (ret) {
>> + gh_rm_mem_reclaim(rm, p);
>> + p->mem_handle = GH_MEM_HANDLE_INVAL;
>> + }
>> + }
>> +
>> + return ret;
>> +}
>
> . . .
>
On 6/9/23 2:49 PM, Elliot Berman wrote:
>>> +static struct gh_vm_function *gh_vm_get_function(u32 type)
>>> +{
>>> + struct gh_vm_function *fn;
>>> + int r;
>>> +
>>> + fn = xa_load(&gh_vm_functions, type);
>>> + if (!fn) {
>>> + r = request_module("ghfunc:%d", type);
>>> + if (r)
>>> + return ERR_PTR(r > 0 ? -r : r);
>>
>> Almost all callers of request_module() simply ignore the
>> return value. What positive values are you expecting to
>> see here (and are you sure they're positive errno values)?
>>
>
> I can ignore the return value here, too, to follow the convention.
>
> I had observed request_module can return modprobe's exit code.
I actually like checking the return value, but if a positive one comes
back it's not clear at all that it should be interpreted as an errno.
Given that almost everybody ignores the return value, maybe the
called function should change, but blk_request_module() and
cpufreq_parse_governor() (for two examples) actually use the
return value to affect behavior.
If you check its return, and it's positive, I would return a
known negative errno rather than just negating it--and perhaps
issue a warning. But it's OK with me if you just ignore it
like most other callers.
-Alex
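For illustration, the "known negative errno" option could look like the sketch below. normalize_modprobe_ret() is a hypothetical helper and the choice of -ENOENT is an assumption, picked only because it matches the error gh_vm_get_function() already returns when the function type is not found.

```c
#include <errno.h>

/*
 * Hypothetical helper: fold a request_module()-style result into
 * 0 or a negative errno.
 */
int normalize_modprobe_ret(int r)
{
	/* A positive value is treated as an exit status, not as an errno. */
	if (r > 0)
		return -ENOENT;

	return r;
}
```

Negative values pass through unchanged, so a real errno from the module loader is not lost.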
On 6/5/2023 7:18 AM, Will Deacon wrote:
> Hi Elliot,
>
> [+Quentin since he's looked at the MMU notifiers]
>
> Sorry for the slow response, I got buried in email during a week away.
>
> On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
>> On 5/19/2023 4:59 AM, Will Deacon wrote:
>>> On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
>>>> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
>>>> + if (ret)
>>>> + goto free_mapping;
>>>> +
>>>> + mapping->pages = kcalloc(mapping->npages, sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
>>>> + if (!mapping->pages) {
>>>> + ret = -ENOMEM;
>>>> + mapping->npages = 0; /* update npages for reclaim */
>>>> + goto unlock_pages;
>>>> + }
>>>> +
>>>> + gup_flags = FOLL_LONGTERM;
>>>> + if (region->flags & GH_MEM_ALLOW_WRITE)
>>>> + gup_flags |= FOLL_WRITE;
>>>> +
>>>> + pinned = pin_user_pages_fast(region->userspace_addr, mapping->npages,
>>>> + gup_flags, mapping->pages);
>>>> + if (pinned < 0) {
>>>> + ret = pinned;
>>>> + goto free_pages;
>>>> + } else if (pinned != mapping->npages) {
>>>> + ret = -EFAULT;
>>>> + mapping->npages = pinned; /* update npages for reclaim */
>>>> + goto unpin_pages;
>>>> + }
>>>
>>> Sorry if I missed it, but I still don't see where you reject file mappings
>>> here.
>>>
>>
>> Sure, I can reject file mappings. I didn't catch that was the ask previously
>> and thought it was only a comment about behavior of file mappings.
>
> I thought the mention of filesystem corruption was clear enough! It's
> definitely something we shouldn't allow.
>
I tried preventing file mappings, but this breaks the memfd used by
crosvm. I didn't understand the vector you were tracking for filesystem
corruption. I ran a few basic experiments with real filesystem-backed
memory mappings and didn't observe corruption, but maybe my experiments
weren't right.
[snip; response to other comments in
https://lore.kernel.org/all/[email protected]/]
On 5/24/2023 10:13 AM, Alex Bennée wrote:
>
> Elliot Berman <[email protected]> writes:
>
snip
> Applying: mailbox: pcc: Use mbox_bind_client
>
>
> <snip>
>>
>> Elliot Berman (24):
> <snip>
>
>> mailbox: Add Gunyah message queue mailbox
>
> This patch touches a file that isn't in mainline which makes me wonder
> if I've missed another pre-requisite patch?
>
The v13 series was missing this prerequisite patch:
https://lore.kernel.org/all/[email protected]/
(it was present in every other recent series). Apologies about that!
The v14 series applies cleanly on v6.4-rc6 and should also apply on
other recent tags:
b4 am https://lore.kernel.org/all/[email protected]/
> <snip>
>> Documentation/virt/gunyah/message-queue.rst | 8 +
> <snip>
>
On 6/7/2023 8:54 AM, Elliot Berman wrote:
>
>
> On 6/5/2023 7:18 AM, Will Deacon wrote:
>> Hi Elliot,
>>
>> [+Quentin since he's looked at the MMU notifiers]
>>
>> Sorry for the slow response, I got buried in email during a week away.
>>
>> On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
>>> On 5/19/2023 4:59 AM, Will Deacon wrote:
>>>> On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
>>>>> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
>>>>> + if (ret)
>>>>> + goto free_mapping;
>>>>> +
>>>>> + mapping->pages = kcalloc(mapping->npages,
>>>>> sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
>>>>> + if (!mapping->pages) {
>>>>> + ret = -ENOMEM;
>>>>> + mapping->npages = 0; /* update npages for reclaim */
>>>>> + goto unlock_pages;
>>>>> + }
>>>>> +
>>>>> + gup_flags = FOLL_LONGTERM;
>>>>> + if (region->flags & GH_MEM_ALLOW_WRITE)
>>>>> + gup_flags |= FOLL_WRITE;
>>>>> +
>>>>> + pinned = pin_user_pages_fast(region->userspace_addr,
>>>>> mapping->npages,
>>>>> + gup_flags, mapping->pages);
>>>>> + if (pinned < 0) {
>>>>> + ret = pinned;
>>>>> + goto free_pages;
>>>>> + } else if (pinned != mapping->npages) {
>>>>> + ret = -EFAULT;
>>>>> + mapping->npages = pinned; /* update npages for reclaim */
>>>>> + goto unpin_pages;
>>>>> + }
>>>>
>>>> Sorry if I missed it, but I still don't see where you reject file
>>>> mappings
>>>> here.
>>>>
>>>
>>> Sure, I can reject file mappings. I didn't catch that was the ask
>>> previously
>>> and thought it was only a comment about behavior of file mappings.
>>
>> I thought the mention of filesystem corruption was clear enough! It's
>> definitely something we shouldn't allow.
>>
>>>> This is also the wrong interface for upstream. Please get involved with
>>>> the fd-based guest memory discussions [1] and port your series to that.
>>>>
>>>
>>> The user interface design for *shared* memory aligns with
>>> KVM_SET_USER_MEMORY_REGION.
>>
>> I don't think it does. For example, file mappings don't work (as above),
>> you're placing additional rlimit requirements on the caller, read-only
>> memslots are not functional, the memory cannot be swapped or migrated,
>> dirty logging doesn't work etc. pKVM is in the same boat, but that's why
>> we're not upstreaming this part in its current form.
>>
>
> I thought pKVM was only holding off on upstreaming changes related to
> guest-private memory?
>
>>> I understood we want to use restricted memfd for giving guest-private
>>> memory
>>> (Gunyah calls this "lending memory"). When I went through the changes, I
>>> gathered KVM is using restricted memfd only for guest-private memory
>>> and not
>>> for shared memory. Thus, I dropped support for lending memory to the
>>> guest
>>> VM and only retained the shared memory support in this series. I'd
>>> like to
>>> merge what we can today and introduce the guest-private memory
>>> support in
>>> tandem with the restricted memfd; I don't see much reason to delay the
>>> series.
>>
>> Right, protected guests will use the new restricted memfd ("guest mem"
>> now, I think?), but non-protected guests should implement the existing
>> interface *without* the need for the GUP pin on guest memory pages. Yes,
>> that means full support for MMU notifiers so that these pages can be
>> managed properly by the host kernel. We're working on that for pKVM, but
>> it requires a more flexible form of memory sharing over what we currently
>> have so that e.g. the zero page can be shared between multiple entities.
>
> Gunyah doesn't support swapping pages out while the guest is running and
> the design of Gunyah isn't made to give host kernel full control over
> the S2 page table for its guests. As best I can tell from reading the
> respective drivers, ACRN and Nitro Enclaves both GUP pin guest memory
> pages prior to giving them to the guest, so I don't think this
> requirement from Gunyah is particularly unusual.
>
I dug into MMU notifiers some more and I don't think they match
Gunyah's features today. We don't allow the host to freely manage a VM's
pages because that would require the guest VM to place a level of trust
in the host. Once a page is given to the guest, it stays there for the
lifetime of the VM. Allowing the host to replace pages in the guest
memory map isn't part of the security model of any VM that we run on
Gunyah. Given that requirement, longterm pinning looks like the correct
approach to me.
Thanks,
Elliot
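
For readers following the quoted hunk, the pin-everything-or-unwind error handling can be modelled in userspace roughly as below. This is a sketch under stated assumptions, not the driver code: `struct fake_page`, `struct fake_mapping`, `mock_pin_pages()`, `mock_unpin_pages()` and `mapping_pin_all()` are all hypothetical stand-ins, and the mock simply pins at most `avail` pages to simulate a partial pin. The locked-memory accounting step from the hunk is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page; not the kernel's struct page. */
struct fake_page {
    int pinned;
};

struct fake_mapping {
    struct fake_page **pages;
    long npages;
};

/* Mock pin: pins at most 'avail' of the 'want' pages, returns count or -errno. */
static long mock_pin_pages(struct fake_page *pool, long avail, long want,
                           struct fake_page **out)
{
    long n = want < avail ? want : avail;

    if (avail <= 0)
        return -EFAULT;
    for (long i = 0; i < n; i++) {
        pool[i].pinned = 1;
        out[i] = &pool[i];
    }
    return n;
}

static void mock_unpin_pages(struct fake_page **pages, long n)
{
    for (long i = 0; i < n; i++)
        pages[i]->pinned = 0;
}

/* Returns 0 with all pages pinned, or a negative errno with none left pinned. */
static int mapping_pin_all(struct fake_mapping *m, struct fake_page *pool,
                           long avail, long want)
{
    long pinned;
    int ret;

    assert(m != NULL && pool != NULL && want > 0);

    m->npages = want;
    m->pages = calloc((size_t)want, sizeof(*m->pages));
    if (!m->pages) {
        m->npages = 0;
        return -ENOMEM;
    }

    pinned = mock_pin_pages(pool, avail, want, m->pages);
    if (pinned < 0) {
        ret = (int)pinned;
        goto free_pages;
    } else if (pinned != want) {
        ret = -EFAULT;
        m->npages = pinned; /* unwind only what was actually pinned */
        goto unpin_pages;
    }
    return 0;

unpin_pages:
    mock_unpin_pages(m->pages, m->npages);
free_pages:
    free(m->pages);
    m->pages = NULL;
    m->npages = 0;
    return ret;
}
```

The point of recording the partial count before jumping to the unwind label is that the cleanup path then releases exactly the pages that were pinned, no more and no fewer.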
Hi Will,
On 6/22/2023 4:56 PM, Elliot Berman wrote:
>
>
> On 6/7/2023 8:54 AM, Elliot Berman wrote:
>>
>>
>> On 6/5/2023 7:18 AM, Will Deacon wrote:
>>> Hi Elliot,
>>>
>>> [+Quentin since he's looked at the MMU notifiers]
>>>
>>> Sorry for the slow response, I got buried in email during a week away.
>>>
>>> On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
>>>> On 5/19/2023 4:59 AM, Will Deacon wrote:
>>>>> On Tue, May 09, 2023 at 01:47:47PM -0700, Elliot Berman wrote:
>>>>>> + ret = account_locked_vm(ghvm->mm, mapping->npages, true);
>>>>>> + if (ret)
>>>>>> + goto free_mapping;
>>>>>> +
>>>>>> + mapping->pages = kcalloc(mapping->npages,
>>>>>> sizeof(*mapping->pages), GFP_KERNEL_ACCOUNT);
>>>>>> + if (!mapping->pages) {
>>>>>> + ret = -ENOMEM;
>>>>>> + mapping->npages = 0; /* update npages for reclaim */
>>>>>> + goto unlock_pages;
>>>>>> + }
>>>>>> +
>>>>>> + gup_flags = FOLL_LONGTERM;
>>>>>> + if (region->flags & GH_MEM_ALLOW_WRITE)
>>>>>> + gup_flags |= FOLL_WRITE;
>>>>>> +
>>>>>> + pinned = pin_user_pages_fast(region->userspace_addr,
>>>>>> mapping->npages,
>>>>>> + gup_flags, mapping->pages);
>>>>>> + if (pinned < 0) {
>>>>>> + ret = pinned;
>>>>>> + goto free_pages;
>>>>>> + } else if (pinned != mapping->npages) {
>>>>>> + ret = -EFAULT;
>>>>>> + mapping->npages = pinned; /* update npages for reclaim */
>>>>>> + goto unpin_pages;
>>>>>> + }
>>>>>
>>>>> Sorry if I missed it, but I still don't see where you reject file
>>>>> mappings
>>>>> here.
>>>>>
>>>>
>>>> Sure, I can reject file mappings. I didn't catch that was the ask
>>>> previously
>>>> and thought it was only a comment about behavior of file mappings.
>>>
>>> I thought the mention of filesystem corruption was clear enough! It's
>>> definitely something we shouldn't allow.
>>>
>>>>> This is also the wrong interface for upstream. Please get involved
>>>>> with
>>>>> the fd-based guest memory discussions [1] and port your series to
>>>>> that.
>>>>>
>>>>
>>>> The user interface design for *shared* memory aligns with
>>>> KVM_SET_USER_MEMORY_REGION.
>>>
>>> I don't think it does. For example, file mappings don't work (as above),
>>> you're placing additional rlimit requirements on the caller, read-only
>>> memslots are not functional, the memory cannot be swapped or migrated,
>>> dirty logging doesn't work etc. pKVM is in the same boat, but that's why
>>> we're not upstreaming this part in its current form.
>>>
>>
>> I thought pKVM was only holding off on upstreaming changes related to
>> guest-private memory?
>>
>>>> I understood we want to use restricted memfd for giving
>>>> guest-private memory
>>>> (Gunyah calls this "lending memory"). When I went through the
>>>> changes, I
>>>> gathered KVM is using restricted memfd only for guest-private memory
>>>> and not
>>>> for shared memory. Thus, I dropped support for lending memory to the
>>>> guest
>>>> VM and only retained the shared memory support in this series. I'd
>>>> like to
>>>> merge what we can today and introduce the guest-private memory
>>>> support in
>>>> tandem with the restricted memfd; I don't see much reason to delay the
>>>> series.
>>>
>>> Right, protected guests will use the new restricted memfd ("guest mem"
>>> now, I think?), but non-protected guests should implement the existing
>>> interface *without* the need for the GUP pin on guest memory pages. Yes,
>>> that means full support for MMU notifiers so that these pages can be
>>> managed properly by the host kernel. We're working on that for pKVM, but
>>> it requires a more flexible form of memory sharing over what we
>>> currently
>>> have so that e.g. the zero page can be shared between multiple entities.
>>
>> Gunyah doesn't support swapping pages out while the guest is running
>> and the design of Gunyah isn't made to give host kernel full control
>> over the S2 page table for its guests. As best I can tell from reading
>> the respective drivers, ACRN and Nitro Enclaves both GUP pin guest
>> memory pages prior to giving them to the guest, so I don't think this
>> requirement from Gunyah is particularly unusual.
>>
>
> I read/dug into mmu notifiers more and I don't think it matches with
> Gunyah's features today. We don't allow the host to freely manage VM's
> pages because it requires the guest VM to have a level of trust on the
> host. Once a page is given to the guest, it's done for the lifetime of
> the VM. Allowing the host to replace pages in the guest memory map isn't
> part of any VM's security model that we run in Gunyah. With that
> requirement, longterm pinning looks like the correct approach to me.
Is my approach of longterm pinning correct, given that Gunyah doesn't
allow the host to freely swap pages?
On Thu, Jul 13, 2023 at 01:28:34PM -0700, Elliot Berman wrote:
> On 6/22/2023 4:56 PM, Elliot Berman wrote:
> > On 6/7/2023 8:54 AM, Elliot Berman wrote:
> > > On 6/5/2023 7:18 AM, Will Deacon wrote:
> > > > On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
> > > > > The user interface design for *shared* memory aligns with
> > > > > KVM_SET_USER_MEMORY_REGION.
> > > >
> > > > I don't think it does. For example, file mappings don't work (as above),
> > > > you're placing additional rlimit requirements on the caller, read-only
> > > > memslots are not functional, the memory cannot be swapped or migrated,
> > > > dirty logging doesn't work etc. pKVM is in the same boat, but that's why
> > > > we're not upstreaming this part in its current form.
> > > >
> > >
> > > I thought pKVM was only holding off on upstreaming changes related
> > > to guest-private memory?
> > >
> > > > > I understood we want to use restricted memfd for giving
> > > > > guest-private memory
> > > > > (Gunyah calls this "lending memory"). When I went through
> > > > > the changes, I
> > > > > gathered KVM is using restricted memfd only for
> > > > > guest-private memory and not
> > > > > for shared memory. Thus, I dropped support for lending
> > > > > memory to the guest
> > > > > VM and only retained the shared memory support in this
> > > > > series. I'd like to
> > > > > merge what we can today and introduce the guest-private
> > > > > memory support in
> > > > > tandem with the restricted memfd; I don't see much reason to delay the
> > > > > series.
> > > >
> > > > Right, protected guests will use the new restricted memfd ("guest mem"
> > > > now, I think?), but non-protected guests should implement the existing
> > > > interface *without* the need for the GUP pin on guest memory pages. Yes,
> > > > that means full support for MMU notifiers so that these pages can be
> > > > managed properly by the host kernel. We're working on that for pKVM, but
> > > > it requires a more flexible form of memory sharing over what we
> > > > currently
> > > > have so that e.g. the zero page can be shared between multiple entities.
> > >
> > > Gunyah doesn't support swapping pages out while the guest is running
> > > and the design of Gunyah isn't made to give host kernel full control
> > > over the S2 page table for its guests. As best I can tell from
> > > reading the respective drivers, ACRN and Nitro Enclaves both GUP pin
> > > guest memory pages prior to giving them to the guest, so I don't
> > > think this requirement from Gunyah is particularly unusual.
> > >
> >
> > I read/dug into mmu notifiers more and I don't think it matches with
> > Gunyah's features today. We don't allow the host to freely manage VM's
> > pages because it requires the guest VM to have a level of trust on the
> > host. Once a page is given to the guest, it's done for the lifetime of
> > the VM. Allowing the host to replace pages in the guest memory map isn't
> > part of any VM's security model that we run in Gunyah. With that
> > requirement, longterm pinning looks like the correct approach to me.
>
> Is my approach of longterm pinning correct given that Gunyah doesn't allow
> host to freely swap pages?
No, I really don't think a longterm GUP pin is the right approach for this.
GUP pins in general are horrible for the mm layer, but required for cases
such as DMA where I/O faults are unrecoverable. Gunyah is not a good
justification for such a hack, and I don't think you get to choose which
parts of the Linux mm you want and which bits you don't.
In other words, either carve out your memory and pin it that way, or
implement the proper hooks for the mm to do its job.
Will
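
As a rough illustration of what "the proper hooks for the mm" imply for a hypervisor driver, the sketch below models an invalidate-range callback in userspace: before the host reclaims or migrates a range, every guest mapping overlapping that range has to be torn down. `struct guest_map`, `struct guest` and `guest_invalidate_range()` are hypothetical names, not the kernel's MMU notifier API, and dropping a whole mapping on any overlap is a simplification chosen for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 8

/* Hypothetical record of one host address range mapped into a guest. */
struct guest_map {
    unsigned long start; /* inclusive */
    unsigned long end;   /* exclusive */
    bool mapped;
};

struct guest {
    struct guest_map slots[MAX_SLOTS];
};

/*
 * Model of an invalidate hook: unmap every guest mapping that overlaps
 * [start, end). Returns the number of mappings dropped.
 */
static int guest_invalidate_range(struct guest *g, unsigned long start,
                                  unsigned long end)
{
    int dropped = 0;

    assert(g != NULL && start < end);

    for (size_t i = 0; i < MAX_SLOTS; i++) {
        struct guest_map *s = &g->slots[i];

        /* Two half-open ranges overlap when each starts before the other ends. */
        if (s->mapped && s->start < end && start < s->end) {
            s->mapped = false;
            dropped++;
        }
    }
    return dropped;
}
```

With hooks of this kind the guest would fault the range back in later, which is why this approach depends on the hypervisor being able to remove and re-add individual pages while the VM runs.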
Hi Will,
On 7/14/2023 5:13 AM, Will Deacon wrote:
> On Thu, Jul 13, 2023 at 01:28:34PM -0700, Elliot Berman wrote:
>> On 6/22/2023 4:56 PM, Elliot Berman wrote:
>>> On 6/7/2023 8:54 AM, Elliot Berman wrote:
>>>> On 6/5/2023 7:18 AM, Will Deacon wrote:
>>>>> On Fri, May 19, 2023 at 10:02:29AM -0700, Elliot Berman wrote:
>>>>>> The user interface design for *shared* memory aligns with
>>>>>> KVM_SET_USER_MEMORY_REGION.
>>>>>
>>>>> I don't think it does. For example, file mappings don't work (as above),
>>>>> you're placing additional rlimit requirements on the caller, read-only
>>>>> memslots are not functional, the memory cannot be swapped or migrated,
>>>>> dirty logging doesn't work etc. pKVM is in the same boat, but that's why
>>>>> we're not upstreaming this part in its current form.
>>>>>
>>>>
>>>> I thought pKVM was only holding off on upstreaming changes related
>>>> to guest-private memory?
>>>>
>>>>>> I understood we want to use restricted memfd for giving
>>>>>> guest-private memory
>>>>>> (Gunyah calls this "lending memory"). When I went through
>>>>>> the changes, I
>>>>>> gathered KVM is using restricted memfd only for
>>>>>> guest-private memory and not
>>>>>> for shared memory. Thus, I dropped support for lending
>>>>>> memory to the guest
>>>>>> VM and only retained the shared memory support in this
>>>>>> series. I'd like to
>>>>>> merge what we can today and introduce the guest-private
>>>>>> memory support in
>>>>>> tandem with the restricted memfd; I don't see much reason to delay the
>>>>>> series.
>>>>>
>>>>> Right, protected guests will use the new restricted memfd ("guest mem"
>>>>> now, I think?), but non-protected guests should implement the existing
>>>>> interface *without* the need for the GUP pin on guest memory pages. Yes,
>>>>> that means full support for MMU notifiers so that these pages can be
>>>>> managed properly by the host kernel. We're working on that for pKVM, but
>>>>> it requires a more flexible form of memory sharing over what we
>>>>> currently
>>>>> have so that e.g. the zero page can be shared between multiple entities.
>>>>
>>>> Gunyah doesn't support swapping pages out while the guest is running
>>>> and the design of Gunyah isn't made to give host kernel full control
>>>> over the S2 page table for its guests. As best I can tell from
>>>> reading the respective drivers, ACRN and Nitro Enclaves both GUP pin
>>>> guest memory pages prior to giving them to the guest, so I don't
>>>> think this requirement from Gunyah is particularly unusual.
>>>>
>>>
>>> I read/dug into mmu notifiers more and I don't think it matches with
>>> Gunyah's features today. We don't allow the host to freely manage VM's
>>> pages because it requires the guest VM to have a level of trust on the
>>> host. Once a page is given to the guest, it's done for the lifetime of
>>> the VM. Allowing the host to replace pages in the guest memory map isn't
>>> part of any VM's security model that we run in Gunyah. With that
>>> requirement, longterm pinning looks like the correct approach to me.
>>
>> Is my approach of longterm pinning correct given that Gunyah doesn't allow
>> host to freely swap pages?
>
> No, I really don't think a longterm GUP pin is the right approach for this.
> GUP pins in general are horrible for the mm layer, but required for cases
> such as DMA where I/O faults are unrecoverable. Gunyah is not a good
> justification for such a hack, and I don't think you get to choose which
> parts of the Linux mm you want and which bits you don't.
>
> In other words, either carve out your memory and pin it that way, or
> implement the proper hooks for the mm to do its job.
I talked to the team about whether we can extend the Gunyah support for
this. We have plans to support sharing/lending individual pages when the
guest faults on them. The support also allows (unprotected) pages to be
removed from the VM. We'll need to temporarily pin the pages of the VM
configuration device tree blob while the VM is being created and those
pages can be unpinned once the VM starts. I'll work on this.
Thanks for the feedback!
- Elliot
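
The plan described above can be pictured as a small per-page state machine. The sketch below is only a model under assumptions: the state names, `vm_page_fault_in()` and `vm_page_reclaim()` are hypothetical, and it assumes that "lent" pages are the guest-private (protected) ones while "shared" pages are the unprotected ones the host may take back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ownership states for one page of guest memory. */
enum page_state {
    PAGE_HOST,   /* owned by the host, not visible to the guest */
    PAGE_SHARED, /* visible to both host and guest (unprotected) */
    PAGE_LENT,   /* visible to the guest only (guest-private) */
};

struct vm_page {
    enum page_state state;
};

/* Guest faulted on the page: lend it to a protected VM, share it otherwise. */
static int vm_page_fault_in(struct vm_page *p, bool protected_vm)
{
    assert(p != NULL);

    if (p->state != PAGE_HOST)
        return -EEXIST; /* already given to the guest */
    p->state = protected_vm ? PAGE_LENT : PAGE_SHARED;
    return 0;
}

/* Host wants the page back while the VM runs: assumed legal only if shared. */
static int vm_page_reclaim(struct vm_page *p)
{
    assert(p != NULL);

    if (p->state == PAGE_LENT)
        return -EPERM;
    p->state = PAGE_HOST;
    return 0;
}
```

In this model no page needs a long-term pin: a shared page can go back to the host and be faulted in again later, and only lent pages stay with the guest.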
On Tue, Jul 18, 2023 at 07:28:49PM -0700, Elliot Berman wrote:
> On 7/14/2023 5:13 AM, Will Deacon wrote:
> > On Thu, Jul 13, 2023 at 01:28:34PM -0700, Elliot Berman wrote:
> > > On 6/22/2023 4:56 PM, Elliot Berman wrote:
> > > > On 6/7/2023 8:54 AM, Elliot Berman wrote:
> > > > > On 6/5/2023 7:18 AM, Will Deacon wrote:
> > > > > > Right, protected guests will use the new restricted memfd ("guest mem"
> > > > > > now, I think?), but non-protected guests should implement the existing
> > > > > > interface *without* the need for the GUP pin on guest memory pages. Yes,
> > > > > > that means full support for MMU notifiers so that these pages can be
> > > > > > managed properly by the host kernel. We're working on that for pKVM, but
> > > > > > it requires a more flexible form of memory sharing over what we
> > > > > > currently
> > > > > > have so that e.g. the zero page can be shared between multiple entities.
> > > > >
> > > > > Gunyah doesn't support swapping pages out while the guest is running
> > > > > and the design of Gunyah isn't made to give host kernel full control
> > > > > over the S2 page table for its guests. As best I can tell from
> > > > > reading the respective drivers, ACRN and Nitro Enclaves both GUP pin
> > > > > guest memory pages prior to giving them to the guest, so I don't
> > > > > think this requirement from Gunyah is particularly unusual.
> > > > >
> > > >
> > > > I read/dug into mmu notifiers more and I don't think it matches with
> > > > Gunyah's features today. We don't allow the host to freely manage VM's
> > > > pages because it requires the guest VM to have a level of trust on the
> > > > host. Once a page is given to the guest, it's done for the lifetime of
> > > > the VM. Allowing the host to replace pages in the guest memory map isn't
> > > > part of any VM's security model that we run in Gunyah. With that
> > > > requirement, longterm pinning looks like the correct approach to me.
> > >
> > > Is my approach of longterm pinning correct given that Gunyah doesn't allow
> > > host to freely swap pages?
> >
> > No, I really don't think a longterm GUP pin is the right approach for this.
> > GUP pins in general are horrible for the mm layer, but required for cases
> > such as DMA where I/O faults are unrecoverable. Gunyah is not a good
> > justification for such a hack, and I don't think you get to choose which
> > parts of the Linux mm you want and which bits you don't.
> >
> > In other words, either carve out your memory and pin it that way, or
> > implement the proper hooks for the mm to do its job.
>
> I talked to the team about whether we can extend the Gunyah support for
> this. We have plans to support sharing/lending individual pages when the
> guest faults on them. The support also allows (unprotected) pages to be
> removed from the VM. We'll need to temporarily pin the pages of the VM
> configuration device tree blob while the VM is being created and those pages
> can be unpinned once the VM starts. I'll work on this.
That's pleasantly unexpected, thanks for pursuing this!
Will