2020-06-24 08:52:10

by Yi Liu

Subject: [PATCH v3 00/14] vfio: expose virtual Shared Virtual Addressing to VMs

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on
Intel platforms, allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
guest application address spaces with passthrough devices. This is called
vSVA in this series. Full vSVA enablement requires QEMU, VFIO and IOMMU
changes; the IOMMU and QEMU changes are in separate series (see the
related series listed below).

The high-level architecture for SVA virtualization is shown below. The key
design of vSVA support is to utilize the dual-stage IOMMU translation (also
known as IOMMU nesting translation) capability of the host IOMMU.


    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush --
    '-------------'                       |
    |             |                       V
    |             |                   CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables
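As a rough illustration of the nesting in the diagram, the user-space sketch below (not kernel code) resolves an address in two steps: the guest-owned first level maps GVA to GPA, and the host-owned second level maps GPA to HPA. The single-level 16-entry tables, the 4 KiB page size and the names `fl_table`, `sl_table`, `walk()` and `nested_translate()` are assumptions made for brevity. Real VT-d tables are multi-level, and the hardware also translates the GPAs of the first-level table pages themselves through the second level, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model only: one-level tables of 16 entries, 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_OFFSET_MASK ((UINT64_C(1) << PAGE_SHIFT) - 1)
#define NR_ENTRIES 16

struct pte {
	bool present;
	uint64_t pfn; /* output frame number */
};

/* Guest-owned first level (GVA -> GPA), bound to the host in this model. */
static struct pte fl_table[NR_ENTRIES];
/* Host-owned second level (GPA -> HPA), i.e. the VM's default domain. */
static struct pte sl_table[NR_ENTRIES];

/* Translate one address through a single-level table; false on fault. */
static bool walk(const struct pte *table, uint64_t in, uint64_t *out)
{
	uint64_t idx = in >> PAGE_SHIFT;

	if (idx >= NR_ENTRIES || !table[idx].present)
		return false;
	*out = (table[idx].pfn << PAGE_SHIFT) | (in & PAGE_OFFSET_MASK);
	return true;
}

/* Nested translation: the first-level result (a GPA) feeds the SL walk. */
static bool nested_translate(uint64_t gva, uint64_t *hpa)
{
	uint64_t gpa;

	if (!walk(fl_table, gva, &gpa))
		return false;             /* fault in the guest-owned table */
	return walk(sl_table, gpa, hpa);  /* false: fault in the host table */
}
```

With `fl_table[1] = (struct pte){ true, 3 }` and `sl_table[3] = (struct pte){ true, 9 }`, `nested_translate(0x1234, &hpa)` yields GPA 0x3234 in the first step and HPA 0x9234 in the second.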

Patch Overview:
1. a refactor to vfio_iommu_type1 ioctl (patch 0001)
2. reports IOMMU nesting info to userspace (patch 0002, 0003 and 0014)
3. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0006)
4. vfio support for binding guest page table to host (patch 0007, 0008, 0009)
5. vfio support for IOMMU cache invalidation from VMs (patch 0010)
6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0011)
7. expose PASID capability to VM (patch 0012)
8. add doc for VFIO dual stage control (patch 0013)

The complete vSVA kernel upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. IOMMU-backed Mediated Device assignment
3. Page Request Services (PRS) support

This patchset targets phase 1 and phase 2, and is based on the following
series from Jacob:
[PATCH v3 0/5] IOMMU user API enhancement - wip
https://lore.kernel.org/linux-iommu/[email protected]/

[PATCH 00/10] IOASID extensions for guest SVA - wip
https://lkml.org/lkml/2020/3/25/874

The latest IOASID code adds the following new interface for iterating over
all PASIDs of an ioasid_set. The implementation has not been sent out yet as
Jacob needs to do some cleanup; it can be found in branch vsva-linux-5.8-rc1-v3:
int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data);
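The iterator takes a callback plus an opaque `data` pointer that is passed through unchanged to every call. The user-space sketch below only illustrates that calling convention; the array-backed `struct model_set` and the `model_set_for_each_ioasid()` stand-in are hypothetical and say nothing about the unposted IOASID implementation, which looks the set up by its `sid`.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int ioasid_t;

/* Hypothetical stand-in for an ioasid_set: a fixed array of PASIDs. */
struct model_set {
	ioasid_t ids[8];
	size_t nr;
};

/* Same callback shape as ioasid_set_for_each_ioasid(): fn(id, data). */
static void model_set_for_each_ioasid(struct model_set *set,
				      void (*fn)(ioasid_t id, void *data),
				      void *data)
{
	for (size_t i = 0; i < set->nr; i++)
		fn(set->ids[i], data);
}

/* Example caller state threaded through the opaque data pointer. */
struct pasid_stats {
	unsigned int count;
	ioasid_t max;
};

static void collect_stats(ioasid_t id, void *data)
{
	struct pasid_stats *st = data;

	st->count++;
	if (id > st->max)
		st->max = id;
}
```

A caller can thread its own state through `data` in this way, for instance to count the PASIDs in a set or to tear each one down.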

The complete set for the current vSVA can be found in the branch below.
This branch also includes some extra modifications to the IOASID core code
and vt-d iommu driver cleanup patches.
https://github.com/luxis1999/linux-vsva.git:vsva-linux-5.8-rc1-v3

The corresponding QEMU patch series is included in the branch below:
https://github.com/luxis1999/qemu.git:vsva_5.8_rc1_qemu_rfcv6


Regards,
Yi Liu

Changelog:
- Patch v2 -> Patch v3:
a) Rebase on top of Jacob's v3 iommu uapi patchset
b) Address comments from Kevin and Stefan Hajnoczi
c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
https://lore.kernel.org/linux-iommu/[email protected]/#r

- Patch v1 -> Patch v2:
a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
Hellwig.
b) Re-sequence the patch series for better bisect support.
c) Report IOMMU nesting capability info in detail instead of the format
used in v1.
d) Enforce one group per nesting type container for vfio iommu type1
driver.
e) Build the vfio_mm related code from vfio.c to be a separate
vfio_pasid.ko.
f) Add PASID ownership check in IOMMU driver.
g) Adapted to the latest IOMMU UAPI design. Removed IOMMU UAPI version
check. Added iommu_gpasid_unbind_data for unbind requests from
userspace.
h) Define a single ioctl, VFIO_IOMMU_NESTING_OP, for bind/unbind_gtbl
and cache_invld.
i) Document dual stage control in vfio.rst.
Patch v1: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v3 -> Patch v1:
a) Address comments to the PASID request(alloc/free) path
b) Report PASID alloc/free availability to user-space
c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
d) Adjusted to the latest ioasid code implementation, e.g. removed the
code for tracking the allocated PASIDs, as the latest ioasid code
tracks them; VFIO can use ioasid_free_set() to free all
PASIDs.
RFC v3: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v2 -> v3:
a) Refine the whole patchset to roughly fit the parts in this series
b) Add a complete vfio PASID management framework, e.g. PASID alloc,
free, reclaim on VM crash/shutdown, and a per-VM PASID quota to prevent
PASID abuse.
c) Add IOMMU uAPI version check and page table format check to ensure
version compatibility and hardware compatibility.
d) Add vSVA vfio support for IOMMU-backed mdevs.
RFC v2: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v1 -> v2:
Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
RFC v1: https://lore.kernel.org/linux-iommu/[email protected]/

---
Eric Auger (1):
vfio: Document dual stage control

Liu Yi L (12):
vfio/type1: Refactor vfio_iommu_type1_ioctl()
iommu: Report domain nesting info
vfio/type1: Report iommu nesting info to userspace
vfio: Add PASID allocation/free support
iommu/vt-d: Support setting ioasid set to domain
vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
iommu/vt-d: Check ownership for PASIDs from user-space
vfio/type1: Support binding guest page tables to PASID
vfio/type1: Allow invalidating first-level/stage IOMMU cache
vfio/type1: Add vSVA support for IOMMU-backed mdevs
vfio/pci: Expose PCIe PASID capability to guest
iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
iommu: Pass domain to sva_unbind_gpasid()

Documentation/driver-api/vfio.rst | 67 ++++
drivers/iommu/arm-smmu-v3.c | 29 +-
drivers/iommu/arm-smmu.c | 29 +-
drivers/iommu/intel/iommu.c | 105 ++++-
drivers/iommu/intel/svm.c | 10 +-
drivers/iommu/iommu.c | 2 +-
drivers/vfio/Kconfig | 6 +
drivers/vfio/Makefile | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 800 +++++++++++++++++++++++++++++--------
drivers/vfio/vfio_pasid.c | 191 +++++++++
include/linux/intel-iommu.h | 23 +-
include/linux/iommu.h | 4 +-
include/linux/vfio.h | 54 +++
include/uapi/linux/iommu.h | 59 +++
include/uapi/linux/vfio.h | 78 ++++
16 files changed, 1273 insertions(+), 187 deletions(-)
create mode 100644 drivers/vfio/vfio_pasid.c

--
2.7.4


2020-06-24 08:52:22

by Yi Liu

Subject: [PATCH v3 04/14] vfio: Add PASID allocation/free support

Shared Virtual Addressing (a.k.a. Shared Virtual Memory) allows sharing
multiple process virtual address spaces with the device for a simplified
programming model. PASID is used to tag a virtual address space in DMA
requests and to identify the related translation structure in the IOMMU.
When a PASID-capable device is assigned to a VM, we want the same capability
of using PASID to tag guest process virtual address spaces to achieve
virtual SVA (vSVA).

PASID management for guests is vendor specific. Some vendors (e.g. Intel
VT-d) require system-wide managed PASIDs across all devices, regardless
of whether a device is used by the host or assigned to a guest. Other vendors
(e.g. ARM SMMU) may allow PASIDs to be managed per device, so they could be
fully delegated to the guest for assigned devices.

For system-wide managed PASIDs, this patch introduces a vfio module to
handle explicit PASID alloc/free requests from the guest. Allocated PASIDs
are associated with a process (or mm_struct) in the IOASID core. A vfio_mm
object is introduced to track the mm_struct. Multiple VFIO containers within
a process share the same vfio_mm object.

A quota mechanism is provided to prevent a malicious user from exhausting
available PASIDs. Currently the quota is a global parameter applied to
all VFIO devices. In the future, per-device quotas might be supported too.
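A minimal user-space model of the scheme described above, assuming one PASID set per process token with a single quota shared by every container of that process. The names `model_mm`, `model_mm_get()` and `model_pasid_alloc()`, the fixed table size and the default quota of 2 are illustrative only; the patch itself delegates set tracking and quota enforcement to the IOASID core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_MMS 4

/* One entry per process (keyed by its mm pointer), shared by its containers. */
struct model_mm {
	const void *token;  /* the process's mm_struct pointer */
	unsigned int refs;  /* one reference per container; 0 means free */
	unsigned int used;  /* PASIDs currently charged to this process */
	unsigned int quota; /* copied from the global default at creation */
};

static struct model_mm mms[MODEL_MAX_MMS];
static unsigned int default_quota = 2; /* stands in for pasid_quota */

/* Find the entry for @token, or create it; NULL when the table is full. */
static struct model_mm *model_mm_get(const void *token)
{
	struct model_mm *free_slot = NULL;

	for (size_t i = 0; i < MODEL_MAX_MMS; i++) {
		if (mms[i].refs && mms[i].token == token) {
			mms[i].refs++;
			return &mms[i];
		}
		if (!mms[i].refs && !free_slot)
			free_slot = &mms[i];
	}
	if (free_slot)
		*free_slot = (struct model_mm){ token, 1, 0, default_quota };
	return free_slot;
}

/* Charge one PASID against the per-process quota. */
static int model_pasid_alloc(struct model_mm *vmm)
{
	if (vmm->used >= vmm->quota)
		return -ENOSPC;
	vmm->used++; /* the real code would return the allocated PASID */
	return 0;
}
```

Two containers opened by the same process receive the same entry, so the quota limits the process as a whole rather than each container.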

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Suggested-by: Alex Williamson <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
v1 -> v2:
*) added in v2, split from the pasid alloc/free support of v1
---
drivers/vfio/Kconfig | 5 ++
drivers/vfio/Makefile | 1 +
drivers/vfio/vfio_pasid.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++
include/linux/vfio.h | 28 +++++++++
4 files changed, 185 insertions(+)
create mode 100644 drivers/vfio/vfio_pasid.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index fd17db9..3d8a108 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -19,6 +19,11 @@ config VFIO_VIRQFD
depends on VFIO && EVENTFD
default n

+config VFIO_PASID
+ tristate
+ depends on IOASID && VFIO
+ default n
+
menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
depends on IOMMU_API
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index de67c47..bb836a3 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o

obj-$(CONFIG_VFIO) += vfio.o
obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
+obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
new file mode 100644
index 0000000..dd5b6d1
--- /dev/null
+++ b/drivers/vfio/vfio_pasid.c
@@ -0,0 +1,151 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 Intel Corporation.
+ * Author: Liu Yi L <[email protected]>
+ *
+ */
+
+#include <linux/vfio.h>
+#include <linux/eventfd.h>
+#include <linux/file.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+
+#define DRIVER_VERSION "0.1"
+#define DRIVER_AUTHOR "Liu Yi L <[email protected]>"
+#define DRIVER_DESC "PASID management for VFIO bus drivers"
+
+#define VFIO_DEFAULT_PASID_QUOTA 1000
+static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
+module_param_named(pasid_quota, pasid_quota, uint, 0444);
+MODULE_PARM_DESC(pasid_quota,
+ " Set the quota for max number of PASIDs that an application is allowed to request (default 1000)");
+
+struct vfio_mm_token {
+ unsigned long long val;
+};
+
+struct vfio_mm {
+ struct kref kref;
+ struct vfio_mm_token token;
+ int ioasid_sid;
+ int pasid_quota;
+ struct list_head next;
+};
+
+static struct vfio_pasid {
+ struct mutex vfio_mm_lock;
+ struct list_head vfio_mm_list;
+} vfio_pasid;
+
+/* called with vfio.vfio_mm_lock held */
+static void vfio_mm_release(struct kref *kref)
+{
+ struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
+
+ list_del(&vmm->next);
+ mutex_unlock(&vfio_pasid.vfio_mm_lock);
+ ioasid_free_set(vmm->ioasid_sid, true);
+ kfree(vmm);
+}
+
+void vfio_mm_put(struct vfio_mm *vmm)
+{
+ kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
+}
+
+static void vfio_mm_get(struct vfio_mm *vmm)
+{
+ kref_get(&vmm->kref);
+}
+
+struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+ struct mm_struct *mm = get_task_mm(task);
+ struct vfio_mm *vmm;
+ unsigned long long val = (unsigned long long) mm;
+ int ret;
+
+ mutex_lock(&vfio_pasid.vfio_mm_lock);
+ /* Search existing vfio_mm with current mm pointer */
+ list_for_each_entry(vmm, &vfio_pasid.vfio_mm_list, next) {
+ if (vmm->token.val == val) {
+ vfio_mm_get(vmm);
+ goto out;
+ }
+ }
+
+ vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
+ if (!vmm)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * IOASID core provides a 'IOASID set' concept to track all
+ * PASIDs associated with a token. Here we use mm_struct as
+ * the token and create a IOASID set per mm_struct. All the
+ * containers of the process share the same IOASID set.
+ */
+ ret = ioasid_alloc_set((struct ioasid_set *) mm, pasid_quota,
+ &vmm->ioasid_sid);
+ if (ret) {
+ kfree(vmm);
+ return ERR_PTR(ret);
+ }
+
+ kref_init(&vmm->kref);
+ vmm->token.val = (unsigned long long) mm;
+ vmm->pasid_quota = pasid_quota;
+
+ list_add(&vmm->next, &vfio_pasid.vfio_mm_list);
+out:
+ mutex_unlock(&vfio_pasid.vfio_mm_lock);
+ mmput(mm);
+ return vmm;
+}
+
+int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+ ioasid_t pasid;
+
+ pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
+
+ return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
+}
+
+void vfio_pasid_free_range(struct vfio_mm *vmm,
+ ioasid_t min, ioasid_t max)
+{
+ ioasid_t pasid = min;
+
+ if (min > max)
+ return;
+
+ /*
+ * IOASID core will notify PASID users (e.g. IOMMU driver) to
+ * teardown necessary structures depending on the to-be-freed
+ * PASID.
+ */
+ for (; pasid <= max; pasid++)
+ ioasid_free(pasid);
+}
+
+static int __init vfio_pasid_init(void)
+{
+ mutex_init(&vfio_pasid.vfio_mm_lock);
+ INIT_LIST_HEAD(&vfio_pasid.vfio_mm_list);
+ return 0;
+}
+
+static void __exit vfio_pasid_exit(void)
+{
+ WARN_ON(!list_empty(&vfio_pasid.vfio_mm_list));
+}
+
+module_init(vfio_pasid_init);
+module_exit(vfio_pasid_exit);
+
+MODULE_VERSION(DRIVER_VERSION);
+MODULE_LICENSE("GPL v2");
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 38d3c6a..74e077d 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
extern void vfio_unregister_iommu_driver(
const struct vfio_iommu_driver_ops *ops);

+struct vfio_mm;
+#if IS_ENABLED(CONFIG_VFIO_PASID)
+extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
+extern void vfio_mm_put(struct vfio_mm *vmm);
+extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
+extern void vfio_pasid_free_range(struct vfio_mm *vmm,
+ ioasid_t min, ioasid_t max);
+#else
+static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
+{
+ return NULL;
+}
+
+static inline void vfio_mm_put(struct vfio_mm *vmm)
+{
+}
+
+static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
+{
+ return -ENOTTY;
+}
+
+static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
+ ioasid_t min, ioasid_t max)
+{
+}
+#endif /* CONFIG_VFIO_PASID */
+
/*
* External user API
*/
--
2.7.4
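Note that with CONFIG_VFIO_PASID disabled, the stub vfio_mm_get_from_task() above returns NULL, whereas the real implementation returns either a valid pointer or an ERR_PTR() value. A caller therefore has to handle both cases. The sketch below re-creates simplified user-space versions of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() helpers (the kernel versions live in <linux/err.h>) together with a hypothetical caller, setup_pasid_support(); mapping NULL to -ENOTTY mirrors the stub vfio_pasid_alloc() and is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user-space copies of the <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct vfio_mm; /* opaque, as in include/linux/vfio.h */

/*
 * Hypothetical caller: @vmm is what vfio_mm_get_from_task() returned.
 * Returns 0 when PASID support is usable, a negative errno otherwise.
 */
static int setup_pasid_support(struct vfio_mm *vmm)
{
	if (!vmm)
		return -ENOTTY;            /* CONFIG_VFIO_PASID=n stub */
	if (IS_ERR(vmm))
		return (int)PTR_ERR(vmm);  /* e.g. -ENOMEM from the real code */
	return 0;
}
```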

2020-06-24 08:52:26

by Yi Liu

Subject: [PATCH v3 13/14] vfio: Document dual stage control

From: Eric Auger <[email protected]>

The VFIO API was enhanced to support nested stage control: a bunch of
new ioctls and usage guidelines.

Let's document the process to follow to set up nested mode.

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Signed-off-by: Eric Auger <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
v2 -> v3:
*) address comments from Stefan Hajnoczi

v1 -> v2:
*) new in v2; compared with Eric's original version, PASID table bind
and fault reporting are removed as this series doesn't cover them.
Original version from Eric.
https://lkml.org/lkml/2020/3/20/700

Documentation/driver-api/vfio.rst | 67 +++++++++++++++++++++++++++++++++++++++
1 file changed, 67 insertions(+)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index f1a4d3c..639890f 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -239,6 +239,73 @@ group and can access them as follows::
/* Gratuitous device reset and go... */
ioctl(device, VFIO_DEVICE_RESET);

+IOMMU Dual Stage Control
+------------------------
+
+Some IOMMUs support 2 stages/levels of translation. Stage corresponds to
+the ARM terminology while level corresponds to Intel's VT-d terminology.
+In the following text we use either without distinction.
+
+This is useful when a virtual IOMMU is exposed to the guest and some
+devices are assigned to the guest through VFIO. The guest OS can then use
+stage 1 (GIOVA -> GPA or GVA -> GPA), while the hypervisor uses stage 2 for
+VM isolation (GPA -> HPA).
+
+Under dual stage translation, the guest gets ownership of the stage 1 page
+tables and also owns stage 1 configuration structures. The hypervisor owns
+the root configuration structure (for security reasons), including stage 2
+configuration. This works as long as configuration structures and page table
+formats are compatible between the virtual IOMMU and the physical IOMMU.
+
+Assuming the HW supports it, this nested mode is selected by choosing the
+VFIO_TYPE1_NESTING_IOMMU type through:
+
+ ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU);
+
+This forces the hypervisor to use stage 2, leaving stage 1 available
+for guest usage. The guest stage 1 format depends on the IOMMU vendor,
+as does the nesting configuration method. User space should check the
+format and configuration method after setting the nesting type by
+using:
+
+ ioctl(container->fd, VFIO_IOMMU_GET_INFO, &nesting_info);
+
+Details can be found in Documentation/userspace-api/iommu.rst. For Intel
+VT-d, each stage 1 page table is bound to host by:
+
+ nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
+ memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
+ ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
+GVA->GPA page tables are available when PASID (Process Address Space ID)
+is exposed to guest. e.g. guest with PASID-capable devices assigned. For
+such page table binding, the bind_data should include PASID info, which
+is allocated by guest itself or by host. This depends on hardware vendor
+e.g. Intel VT-d requires to allocate PASID from host. This requirement is
+defined by the Virtual Command Support in VT-d 3.0 spec, guest software
+running on VT-d should allocate PASID from host kernel. To allocate PASID
+from host, user space should +check the IOMMU_NESTING_FEAT_SYSWIDE_PASID
+bit of the nesting info reported from host kernel. VFIO reports the nesting
+info by VFIO_IOMMU_GET_INFO. User space could allocate PASID from host by:
+
+ req.flags = VFIO_IOMMU_ALLOC_PASID;
+ ioctl(container, VFIO_IOMMU_PASID_REQUEST, &req);
+
+With the first stage/level page table bound to the host, the guest stage 1
+translation can be combined with the hypervisor stage 2 translation to
+get the final address.
+
+When the guest invalidates stage 1 related caches, the invalidations must be
+forwarded to the host through:
+
+ nesting_op->flags = VFIO_IOMMU_NESTING_OP_CACHE_INVLD;
+ memcpy(&nesting_op->data, &inv_data, sizeof(inv_data));
+ ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
+
+These invalidations can happen at various granularity levels: page, context,
+...
+
VFIO User API
-------------------------------------------------------------------------------

--
2.7.4
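The ordering these documentation steps imply (select nesting, read the nesting info, obtain a PASID from the host when the system-wide PASID feature bit is reported, then bind) can be captured in a small user-space model. Everything here is hypothetical: `struct model_container`, the bit value of `MODEL_FEAT_SYSWIDE_PASID` (the real IOMMU_NESTING_FEAT_SYSWIDE_PASID value is not shown in this thread), starting allocation at PASID 1 and the chosen error codes are illustrative assumptions, not the VFIO or IOMMU uAPI.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value, for illustration only. */
#define MODEL_FEAT_SYSWIDE_PASID (1U << 0)
#define MODEL_MAX_PASID 64

/* User-space model of the container state built up by the steps above. */
struct model_container {
	bool nesting;                      /* VFIO_TYPE1_NESTING_IOMMU selected */
	uint32_t features;                 /* nesting info from VFIO_IOMMU_GET_INFO */
	bool host_owned[MODEL_MAX_PASID];  /* PASIDs allocated from the host */
};

/* Model of a PASID allocation request served by the host. */
static int model_alloc_pasid(struct model_container *c)
{
	for (int p = 1; p < MODEL_MAX_PASID; p++) { /* PASID 0 kept reserved */
		if (!c->host_owned[p]) {
			c->host_owned[p] = true;
			return p;
		}
	}
	return -ENOSPC;
}

/* Model of the checks a first-level page table bind would be subject to. */
static int model_bind(const struct model_container *c, int pasid)
{
	if (!c->nesting)
		return -EINVAL; /* stage 1 is not available to user space */
	if (pasid <= 0 || pasid >= MODEL_MAX_PASID)
		return -EINVAL;
	if ((c->features & MODEL_FEAT_SYSWIDE_PASID) && !c->host_owned[pasid])
		return -EPERM;  /* system-wide PASIDs must come from the host */
	return 0;
}
```

In this model a guest on a system-wide-PASID platform cannot bind a PASID it picked itself; it has to ask the host first, while on other platforms the guest-chosen PASID is accepted as-is.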

2020-06-24 08:53:28

by Yi Liu

Subject: [PATCH v3 11/14] vfio/type1: Add vSVA support for IOMMU-backed mdevs

In recent years, the mediated device pass-through framework (e.g. vfio-mdev)
has been used to achieve flexible device sharing across domains (e.g. VMs).
There are also hardware-assisted mediated pass-through solutions from
platform vendors, e.g. Intel VT-d scalable mode, which supports Intel
Scalable I/O Virtualization technology. Such mdevs are called IOMMU-backed
mdevs, as their DMA isolation is enforced by the IOMMU. In the kernel,
IOMMU-backed mdevs are exposed to the IOMMU layer through the aux-domain
concept, which means an mdev is protected by an iommu domain that is
auxiliary to the domain the kernel driver primarily uses for the DMA
API. Details can be found in the KVM presentation below:

https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Hardware-Assisted-Mediated-Pass-Through-with-VFIO-Kevin-Tian-Intel.pdf

This patch extends the NESTING_IOMMU ops to IOMMU-backed mdev devices. The
main requirement is to use the auxiliary domain associated with the mdev.

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
CC: Jun Tian <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
v1 -> v2:
*) check the iommu_device to ensure the mdev being handled is IOMMU-backed
---
drivers/vfio/vfio_iommu_type1.c | 40 ++++++++++++++++++++++++++++++++++++----
1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4c21300..e1a794c 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2378,20 +2378,41 @@ static int vfio_iommu_resv_refresh(struct vfio_iommu *iommu,
return ret;
}

+static struct device *vfio_get_iommu_device(struct vfio_group *group,
+ struct device *dev)
+{
+ if (group->mdev_group)
+ return vfio_mdev_get_iommu_device(dev);
+ else
+ return dev;
+}
+
static int vfio_dev_bind_gpasid_fn(struct device *dev, void *data)
{
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *) dc->data;
+ struct device *iommu_device;
+
+ iommu_device = vfio_get_iommu_device(dc->group, dev);
+ if (!iommu_device)
+ return -EINVAL;

- return iommu_sva_bind_gpasid(dc->domain, dev, (void __user *) arg);
+ return iommu_sva_bind_gpasid(dc->domain, iommu_device,
+ (void __user *) arg);
}

static int vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
{
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *) dc->data;
+ struct device *iommu_device;

- iommu_sva_unbind_gpasid(dc->domain, dev, (void __user *) arg);
+ iommu_device = vfio_get_iommu_device(dc->group, dev);
+ if (!iommu_device)
+ return -EINVAL;
+
+ iommu_sva_unbind_gpasid(dc->domain, iommu_device,
+ (void __user *) arg);
return 0;
}

@@ -2400,8 +2421,13 @@ static int __vfio_dev_unbind_gpasid_fn(struct device *dev, void *data)
struct domain_capsule *dc = (struct domain_capsule *)data;
struct iommu_gpasid_bind_data *unbind_data =
(struct iommu_gpasid_bind_data *) dc->data;
+ struct device *iommu_device;
+
+ iommu_device = vfio_get_iommu_device(dc->group, dev);
+ if (!iommu_device)
+ return -EINVAL;

- __iommu_sva_unbind_gpasid(dc->domain, dev, unbind_data);
+ __iommu_sva_unbind_gpasid(dc->domain, iommu_device, unbind_data);
return 0;
}

@@ -3084,8 +3110,14 @@ static int vfio_dev_cache_invalidate_fn(struct device *dev, void *data)
{
struct domain_capsule *dc = (struct domain_capsule *)data;
unsigned long arg = *(unsigned long *) dc->data;
+ struct device *iommu_device;
+
+ iommu_device = vfio_get_iommu_device(dc->group, dev);
+ if (!iommu_device)
+ return -EINVAL;

- iommu_cache_invalidate(dc->domain, dev, (void __user *) arg);
+ iommu_cache_invalidate(dc->domain, iommu_device,
+ (void __user *) arg);
return 0;
}

--
2.7.4
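The pattern added to each callback above (resolve the device that actually sits behind the IOMMU, and bail out with -EINVAL if there is none) can be modelled in user space as follows. `struct model_dev`, its `parent_iommu_dev` field and `model_get_iommu_device()` are hypothetical stand-ins for `struct device` and vfio_mdev_get_iommu_device(); an mdev without IOMMU backing resolves to NULL, which is the case the new -EINVAL checks guard against.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct model_dev {
	const char *name;
	/* Set only for IOMMU-backed mdevs: the parent device the IOMMU sees. */
	struct model_dev *parent_iommu_dev;
};

/* Mirrors vfio_get_iommu_device(): mdev groups resolve to the parent. */
static struct model_dev *model_get_iommu_device(bool mdev_group,
						 struct model_dev *dev)
{
	return mdev_group ? dev->parent_iommu_dev : dev;
}

/* Mirrors the shape of the per-device callbacks: resolve, then act. */
static int model_bind_fn(bool mdev_group, struct model_dev *dev,
			 const char **bound_to)
{
	struct model_dev *iommu_dev = model_get_iommu_device(mdev_group, dev);

	if (!iommu_dev)
		return -EINVAL;      /* mdev without IOMMU backing */
	*bound_to = iommu_dev->name; /* stand-in for iommu_sva_bind_gpasid() */
	return 0;
}
```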

2020-06-29 19:01:57

by Yi Liu

Subject: RE: [PATCH v3 13/14] vfio: Document dual stage control

> From: Stefan Hajnoczi <[email protected]>
> Sent: Monday, June 29, 2020 5:22 PM
>
> On Wed, Jun 24, 2020 at 01:55:26AM -0700, Liu Yi L wrote:
> > +Details can be found in Documentation/userspace-api/iommu.rst. For
> > +Intel VT-d, each stage 1 page table is bound to host by:
> > +
> > + nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> > + memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> > + ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> > +
> > +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> > +GVA->GPA page tables are available when PASID (Process Address Space
> > +ID)
> > +is exposed to guest. e.g. guest with PASID-capable devices assigned.
> > +For such page table binding, the bind_data should include PASID info,
> > +which is allocated by guest itself or by host. This depends on
> > +hardware vendor e.g. Intel VT-d requires to allocate PASID from host.
> > +This requirement is defined by the Virtual Command Support in VT-d
> > +3.0 spec, guest software running on VT-d should allocate PASID from
> > +host kernel. To allocate PASID from host, user space should +check
> > +the IOMMU_NESTING_FEAT_SYSWIDE_PASID
>
> s/+check/check/g

got it.

> Reviewed-by: Stefan Hajnoczi <[email protected]>

thanks :-)

Regards,
Yi Liu

2020-06-29 20:42:01

by Stefan Hajnoczi

Subject: Re: [PATCH v3 13/14] vfio: Document dual stage control

On Wed, Jun 24, 2020 at 01:55:26AM -0700, Liu Yi L wrote:
> +Details can be found in Documentation/userspace-api/iommu.rst. For Intel
> +VT-d, each stage 1 page table is bound to host by:
> +
> + nesting_op->flags = VFIO_IOMMU_NESTING_OP_BIND_PGTBL;
> + memcpy(&nesting_op->data, &bind_data, sizeof(bind_data));
> + ioctl(container->fd, VFIO_IOMMU_NESTING_OP, nesting_op);
> +
> +As mentioned above, guest OS may use stage 1 for GIOVA->GPA or GVA->GPA.
> +GVA->GPA page tables are available when PASID (Process Address Space ID)
> +is exposed to guest. e.g. guest with PASID-capable devices assigned. For
> +such page table binding, the bind_data should include PASID info, which
> +is allocated by guest itself or by host. This depends on hardware vendor
> +e.g. Intel VT-d requires to allocate PASID from host. This requirement is
> +defined by the Virtual Command Support in VT-d 3.0 spec, guest software
> +running on VT-d should allocate PASID from host kernel. To allocate PASID
> +from host, user space should +check the IOMMU_NESTING_FEAT_SYSWIDE_PASID

s/+check/check/g

Reviewed-by: Stefan Hajnoczi <[email protected]>



2020-07-02 21:18:18

by Alex Williamson

Subject: Re: [PATCH v3 04/14] vfio: Add PASID allocation/free support

On Wed, 24 Jun 2020 01:55:17 -0700
Liu Yi L <[email protected]> wrote:

> +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> +{
> + struct mm_struct *mm = get_task_mm(task);
> + struct vfio_mm *vmm;
> + unsigned long long val = (unsigned long long) mm;
> + int ret;
> +
> + mutex_lock(&vfio_pasid.vfio_mm_lock);
> + /* Search existing vfio_mm with current mm pointer */
> + list_for_each_entry(vmm, &vfio_pasid.vfio_mm_list, next) {
> + if (vmm->token.val == val) {
> + vfio_mm_get(vmm);
> + goto out;
> + }
> + }
> +
> + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
> + if (!vmm)
> + return ERR_PTR(-ENOMEM);

lock leaked, mm leaked.

> +
> + /*
> + * IOASID core provides a 'IOASID set' concept to track all
> + * PASIDs associated with a token. Here we use mm_struct as
> + * the token and create a IOASID set per mm_struct. All the
> + * containers of the process share the same IOASID set.
> + */
> + ret = ioasid_alloc_set((struct ioasid_set *) mm, pasid_quota,
> + &vmm->ioasid_sid);
> + if (ret) {
> + kfree(vmm);
> + return ERR_PTR(ret);

lock leaked, mm leaked.
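Both leaks come from the early `return ERR_PTR(...)` statements, which skip the `mutex_unlock()` and `mmput()` at `out:`. The usual kernel idiom is to send every exit through the cleanup label. The block below is a minimal userspace model of that control flow, not kernel code. The counters `lock_depth` and `mm_refs` stand in for `vfio_mm_lock` and the reference taken by `get_task_mm()`, and NULL plus an error code stands in for `ERR_PTR()`. Every name in it (`model_get_from_token()`, the `fail_*` fault-injection flags) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Userspace model of vfio_mm_get_from_task()'s control flow.
 * Hypothetical stand-ins: lock_depth models vfio_mm_lock, mm_refs models
 * the get_task_mm()/mmput() reference, NULL + *err models ERR_PTR().
 */
static int lock_depth;
static int mm_refs;
static int fail_alloc, fail_set;	/* fault injection for the two error paths */

struct model_mm {
	unsigned long long token;
	struct model_mm *next;
};
static struct model_mm *model_list;

static struct model_mm *model_get_from_token(unsigned long long token, int *err)
{
	struct model_mm *vmm;

	mm_refs++;			/* get_task_mm() */
	*err = 0;
	lock_depth++;			/* mutex_lock() */

	for (vmm = model_list; vmm; vmm = vmm->next)
		if (vmm->token == token)
			goto out;	/* refcounting omitted in this model */

	vmm = fail_alloc ? NULL : calloc(1, sizeof(*vmm));
	if (!vmm) {
		*err = -ENOMEM;
		goto out;		/* was: return ERR_PTR(-ENOMEM) */
	}

	if (fail_set) {			/* models ioasid_alloc_set() failing */
		free(vmm);
		vmm = NULL;
		*err = -EINVAL;		/* arbitrary error code for the model */
		goto out;
	}

	vmm->token = token;
	vmm->next = model_list;
	model_list = vmm;
out:
	/* Single exit: the lock and the mm reference are dropped on every path. */
	lock_depth--;			/* mutex_unlock() */
	mm_refs--;			/* mmput() */
	return vmm;
}
```

In the kernel function the same shape means assigning `vmm = ERR_PTR(...)` before `goto out`, with callers checking the result with `IS_ERR()`.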

> + }
> +
> + kref_init(&vmm->kref);
> + vmm->token.val = (unsigned long long) mm;

We already have it in @val.

> + vmm->pasid_quota = pasid_quota;

This field on the structure and this assignment seem to serve no
purpose. Thanks,

Alex

> +
> + list_add(&vmm->next, &vfio_pasid.vfio_mm_list);
> +out:
> + mutex_unlock(&vfio_pasid.vfio_mm_lock);
> + mmput(mm);
> + return vmm;
> +}
> +
> +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> +{
> + ioasid_t pasid;
> +
> + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> +
> + return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
> +}
> +
> +void vfio_pasid_free_range(struct vfio_mm *vmm,
> + ioasid_t min, ioasid_t max)
> +{
> + ioasid_t pasid = min;
> +
> + if (min > max)
> + return;
> +
> + /*
> + * IOASID core will notify PASID users (e.g. IOMMU driver) to
> + * teardown necessary structures depending on the to-be-freed
> + * PASID.
> + */
> + for (; pasid <= max; pasid++)
> + ioasid_free(pasid);
> +}
> +
> +static int __init vfio_pasid_init(void)
> +{
> + mutex_init(&vfio_pasid.vfio_mm_lock);
> + INIT_LIST_HEAD(&vfio_pasid.vfio_mm_list);
> + return 0;
> +}
> +
> +static void __exit vfio_pasid_exit(void)
> +{
> + WARN_ON(!list_empty(&vfio_pasid.vfio_mm_list));
> +}
> +
> +module_init(vfio_pasid_init);
> +module_exit(vfio_pasid_exit);
> +
> +MODULE_VERSION(DRIVER_VERSION);
> +MODULE_LICENSE("GPL v2");
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 38d3c6a..74e077d 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> extern void vfio_unregister_iommu_driver(
> const struct vfio_iommu_driver_ops *ops);
>
> +struct vfio_mm;
> +#if IS_ENABLED(CONFIG_VFIO_PASID)
> +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
> +extern void vfio_mm_put(struct vfio_mm *vmm);
> +extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> +extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> + ioasid_t min, ioasid_t max);
> +#else
> +static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> +{
> + return NULL;
> +}
> +
> +static inline void vfio_mm_put(struct vfio_mm *vmm)
> +{
> +}
> +
> +static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> +{
> + return -ENOTTY;
> +}
> +
> +static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
> + ioasid_t min, ioasid_t max)
> +{
> +}
> +#endif /* CONFIG_VFIO_PASID */
> +
> /*
> * External user API
> */
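`vfio_mm_put()` relies on `kref_put_mutex()`. This takes `vfio_mm_lock` only when it may be dropping the last reference, and then calls `vfio_mm_release()` with the lock still held. That is why the release callback unlinks the entry first and only then unlocks before freeing the IOASID set. Lookups in `vfio_mm_get_from_task()` run under the same lock, so they can never take a reference to an entry whose count has already reached zero. A minimal single-threaded userspace model of that contract follows. `locked` stands in for the mutex, and `model_get()`, `model_put()` and `model_release()` are hypothetical names, not VFIO or kref API.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Single-threaded model of kref_put_mutex() semantics. "locked" stands in
 * for vfio_mm_lock; all names are hypothetical.
 */
struct model_vmm {
	unsigned long long token;
	int refs;
	struct model_vmm *next;
};

static struct model_vmm *registry;
static int locked;

/* Lookup-or-create under the lock, as vfio_mm_get_from_task() does. */
static struct model_vmm *model_get(unsigned long long token)
{
	struct model_vmm *v;

	locked = 1;
	for (v = registry; v; v = v->next)
		if (v->token == token) {
			v->refs++;		/* kref_get() */
			goto out;
		}
	v = calloc(1, sizeof(*v));
	if (v) {
		v->token = token;
		v->refs = 1;		/* kref_init() */
		v->next = registry;
		registry = v;
	}
out:
	locked = 0;
	return v;
}

/* Entered with the lock held; must drop it, like vfio_mm_release(). */
static void model_release(struct model_vmm *v)
{
	struct model_vmm **pp;

	assert(locked);
	for (pp = &registry; *pp != v; pp = &(*pp)->next)
		;
	*pp = v->next;			/* list_del() */
	locked = 0;			/* mutex_unlock() before the slow teardown */
	free(v);			/* ioasid_free_set() + kfree() */
}

static void model_put(struct model_vmm *v)
{
	assert(v->refs > 0);
	if (v->refs > 1) {		/* fast path: not the last reference */
		v->refs--;
		return;
	}
	locked = 1;			/* possibly the last reference: lock first */
	if (--v->refs == 0)
		model_release(v);	/* returns with the lock dropped */
	else
		locked = 0;		/* only reachable if another thread raced in */
}
```

Two containers opened by the same process therefore share one entry, and the IOASID set is freed only when the last of them drops its reference.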

2020-07-03 06:10:48

by Yi Liu

Subject: RE: [PATCH v3 04/14] vfio: Add PASID allocation/free support

Hi Alex,

> From: Alex Williamson <[email protected]>
> Sent: Friday, July 3, 2020 5:17 AM
>
> On Wed, 24 Jun 2020 01:55:17 -0700
> Liu Yi L <[email protected]> wrote:
>
> > Shared Virtual Addressing (a.k.a Shared Virtual Memory) allows sharing
> > multiple process virtual address spaces with the device for a simplified
> > programming model. PASID is used to tag a virtual address space in
> > DMA requests and to identify the related translation structure in
> > IOMMU. When a PASID-capable device is assigned to a VM, we want the
> > same capability of using PASID to tag guest process virtual address
> > spaces to achieve virtual SVA (vSVA).
> >
> > PASID management for guest is vendor specific. Some vendors (e.g.
> > Intel VT-d) require system-wide managed PASIDs across all devices,
> > regardless of whether a device is used by host or assigned to guest.
> > Other vendors (e.g. ARM SMMU) may allow PASIDs to be managed per device,
> > and thus could fully delegate them to the guest for assigned devices.
> >
> > For system-wide managed PASIDs, this patch introduces a vfio module to
> > handle explicit PASID alloc/free requests from the guest. Allocated PASIDs
> > are associated with a process (or mm_struct) in the IOASID core. A vfio_mm
> > object is introduced to track mm_struct. Multiple VFIO containers
> > within a process share the same vfio_mm object.
> >
> > A quota mechanism is provided to prevent a malicious user from
> > exhausting available PASIDs. Currently the quota is a global parameter
> > applied to all VFIO devices. In the future, per-device quota might be
> > supported too.
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Lu Baolu <[email protected]>
> > Suggested-by: Alex Williamson <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > v1 -> v2:
> > *) added in v2, split from the pasid alloc/free support of v1
> > ---
> > drivers/vfio/Kconfig | 5 ++
> > drivers/vfio/Makefile | 1 +
> > drivers/vfio/vfio_pasid.c | 151 ++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/vfio.h | 28 +++++++++
> > 4 files changed, 185 insertions(+)
> > create mode 100644 drivers/vfio/vfio_pasid.c
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index fd17db9..3d8a108 100644
> > --- a/drivers/vfio/Kconfig
> > +++ b/drivers/vfio/Kconfig
> > @@ -19,6 +19,11 @@ config VFIO_VIRQFD
> > depends on VFIO && EVENTFD
> > default n
> >
> > +config VFIO_PASID
> > + tristate
> > + depends on IOASID && VFIO
> > + default n
> > +
> > menuconfig VFIO
> > tristate "VFIO Non-Privileged userspace driver framework"
> > depends on IOMMU_API
> > diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
> > index de67c47..bb836a3 100644
> > --- a/drivers/vfio/Makefile
> > +++ b/drivers/vfio/Makefile
> > @@ -3,6 +3,7 @@ vfio_virqfd-y := virqfd.o
> >
> > obj-$(CONFIG_VFIO) += vfio.o
> > obj-$(CONFIG_VFIO_VIRQFD) += vfio_virqfd.o
> > +obj-$(CONFIG_VFIO_PASID) += vfio_pasid.o
> > obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
> > obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
> > obj-$(CONFIG_VFIO_SPAPR_EEH) += vfio_spapr_eeh.o
> > diff --git a/drivers/vfio/vfio_pasid.c b/drivers/vfio/vfio_pasid.c
> > new file mode 100644
> > index 0000000..dd5b6d1
> > --- /dev/null
> > +++ b/drivers/vfio/vfio_pasid.c
> > @@ -0,0 +1,151 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2020 Intel Corporation.
> > + * Author: Liu Yi L <[email protected]>
> > + *
> > + */
> > +
> > +#include <linux/vfio.h>
> > +#include <linux/eventfd.h>
> > +#include <linux/file.h>
> > +#include <linux/module.h>
> > +#include <linux/slab.h>
> > +#include <linux/sched/mm.h>
> > +
> > +#define DRIVER_VERSION "0.1"
> > +#define DRIVER_AUTHOR "Liu Yi L <[email protected]>"
> > +#define DRIVER_DESC "PASID management for VFIO bus drivers"
> > +
> > +#define VFIO_DEFAULT_PASID_QUOTA 1000
> > +static int pasid_quota = VFIO_DEFAULT_PASID_QUOTA;
> > +module_param_named(pasid_quota, pasid_quota, uint, 0444);
> > +MODULE_PARM_DESC(pasid_quota,
> > + " Set the quota for max number of PASIDs that an application is allowed to request (default 1000)");
> > +
> > +struct vfio_mm_token {
> > + unsigned long long val;
> > +};
> > +
> > +struct vfio_mm {
> > + struct kref kref;
> > + struct vfio_mm_token token;
> > + int ioasid_sid;
> > + int pasid_quota;
> > + struct list_head next;
> > +};
> > +
> > +static struct vfio_pasid {
> > + struct mutex vfio_mm_lock;
> > + struct list_head vfio_mm_list;
> > +} vfio_pasid;
> > +
> > +/* called with vfio.vfio_mm_lock held */
> > +static void vfio_mm_release(struct kref *kref)
> > +{
> > + struct vfio_mm *vmm = container_of(kref, struct vfio_mm, kref);
> > +
> > + list_del(&vmm->next);
> > + mutex_unlock(&vfio_pasid.vfio_mm_lock);
> > + ioasid_free_set(vmm->ioasid_sid, true);
> > + kfree(vmm);
> > +}
> > +
> > +void vfio_mm_put(struct vfio_mm *vmm)
> > +{
> > + kref_put_mutex(&vmm->kref, vfio_mm_release, &vfio_pasid.vfio_mm_lock);
> > +}
> > +
> > +static void vfio_mm_get(struct vfio_mm *vmm)
> > +{
> > + kref_get(&vmm->kref);
> > +}
> > +
> > +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> > +{
> > + struct mm_struct *mm = get_task_mm(task);
> > + struct vfio_mm *vmm;
> > + unsigned long long val = (unsigned long long) mm;
> > + int ret;
> > +
> > + mutex_lock(&vfio_pasid.vfio_mm_lock);
> > + /* Search existing vfio_mm with current mm pointer */
> > + list_for_each_entry(vmm, &vfio_pasid.vfio_mm_list, next) {
> > + if (vmm->token.val == val) {
> > + vfio_mm_get(vmm);
> > + goto out;
> > + }
> > + }
> > +
> > + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL);
> > + if (!vmm)
> > + return ERR_PTR(-ENOMEM);
>
> lock leaked, mm leaked.

oh, yes. silly mistake.

> > +
> > + /*
> > + * IOASID core provides a 'IOASID set' concept to track all
> > + * PASIDs associated with a token. Here we use mm_struct as
> > + * the token and create a IOASID set per mm_struct. All the
> > + * containers of the process share the same IOASID set.
> > + */
> > + ret = ioasid_alloc_set((struct ioasid_set *) mm, pasid_quota,
> > + &vmm->ioasid_sid);
> > + if (ret) {
> > + kfree(vmm);
> > + return ERR_PTR(ret);
>
> lock leaked, mm leaked.

got it.

> > + }
> > +
> > + kref_init(&vmm->kref);
> > + vmm->token.val = (unsigned long long) mm;
>
> We already have it in @val.

yep, let me use val directly.

> > + vmm->pasid_quota = pasid_quota;
>
> This field on the structure and this assignment seem to serve no purpose.

yeah, it was used in a prior version. Let me drop it. If we still want it, we
may add it back later.
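The quota is already passed to `ioasid_alloc_set()`. Assuming, as that call suggests, that the IOASID core enforces it per set, a copy kept in `struct vfio_mm` adds nothing. The model below sketches per-set quota enforcement over one shared, system-wide PASID space, as on VT-d. The table size, `model_set_alloc()` and `model_set_free()` are hypothetical. `-ENOSPC` mirrors what `vfio_pasid_alloc()` returns when allocation fails.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_PASID_SPACE 16		/* hypothetical, tiny for illustration */

/* One set per mm_struct; the quota caps how many PASIDs it may hold. */
struct model_set {
	int quota;
	int used;
};

/* System-wide table: which set owns each PASID (NULL = free). */
static struct model_set *owner[MODEL_PASID_SPACE];

static int model_set_alloc(struct model_set *s, int min, int max)
{
	int p;

	if (s->used >= s->quota)
		return -ENOSPC;			/* quota exhausted for this set */
	for (p = min < 0 ? 0 : min; p <= max && p < MODEL_PASID_SPACE; p++) {
		if (!owner[p]) {
			owner[p] = s;
			s->used++;
			return p;
		}
	}
	return -ENOSPC;				/* range exhausted system-wide */
}

static void model_set_free(struct model_set *s, int pasid)
{
	/* Only the owning set may release a PASID. */
	if (pasid < 0 || pasid >= MODEL_PASID_SPACE || owner[pasid] != s)
		return;
	owner[pasid] = NULL;
	s->used--;
}
```

Because the PASID space is shared, the per-set quota is what keeps one process from exhausting PASIDs for every other user.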

> Thanks,
>
> Alex
>
> > +
> > + list_add(&vmm->next, &vfio_pasid.vfio_mm_list);
> > +out:
> > + mutex_unlock(&vfio_pasid.vfio_mm_lock);
> > + mmput(mm);
> > + return vmm;
> > +}
> > +
> > +int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> > +{
> > + ioasid_t pasid;
> > +
> > + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL);
> > +
> > + return (pasid == INVALID_IOASID) ? -ENOSPC : pasid;
> > +}
> > +
> > +void vfio_pasid_free_range(struct vfio_mm *vmm,
> > + ioasid_t min, ioasid_t max)
> > +{
> > + ioasid_t pasid = min;
> > +
> > + if (min > max)
> > + return;
> > +
> > + /*
> > + * IOASID core will notify PASID users (e.g. IOMMU driver) to
> > + * teardown necessary structures depending on the to-be-freed
> > + * PASID.
> > + */
> > + for (; pasid <= max; pasid++)
> > + ioasid_free(pasid);
> > +}
> > +
> > +static int __init vfio_pasid_init(void)
> > +{
> > + mutex_init(&vfio_pasid.vfio_mm_lock);
> > + INIT_LIST_HEAD(&vfio_pasid.vfio_mm_list);
> > + return 0;
> > +}
> > +
> > +static void __exit vfio_pasid_exit(void)
> > +{
> > + WARN_ON(!list_empty(&vfio_pasid.vfio_mm_list));
> > +}
> > +
> > +module_init(vfio_pasid_init);
> > +module_exit(vfio_pasid_exit);
> > +
> > +MODULE_VERSION(DRIVER_VERSION);
> > +MODULE_LICENSE("GPL v2");
> > +MODULE_AUTHOR(DRIVER_AUTHOR);
> > +MODULE_DESCRIPTION(DRIVER_DESC);
> > diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> > index 38d3c6a..74e077d 100644
> > --- a/include/linux/vfio.h
> > +++ b/include/linux/vfio.h
> > @@ -97,6 +97,34 @@ extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
> > extern void vfio_unregister_iommu_driver(
> > const struct vfio_iommu_driver_ops *ops);
> >
> > +struct vfio_mm;
> > +#if IS_ENABLED(CONFIG_VFIO_PASID)
> > +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task);
> > +extern void vfio_mm_put(struct vfio_mm *vmm);
> > +extern int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max);
> > +extern void vfio_pasid_free_range(struct vfio_mm *vmm,
> > + ioasid_t min, ioasid_t max);
> > +#else
> > +static inline struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task)
> > +{
> > + return NULL;
> > +}
> > +
> > +static inline void vfio_mm_put(struct vfio_mm *vmm)
> > +{
> > +}
> > +
> > +static inline int vfio_pasid_alloc(struct vfio_mm *vmm, int min, int max)
> > +{
> > + return -ENOTTY;
> > +}
> > +
> > +static inline void vfio_pasid_free_range(struct vfio_mm *vmm,
> > + ioasid_t min, ioasid_t max)
> > +{
> > +}
> > +#endif /* CONFIG_VFIO_PASID */
> > +
> > /*
> > * External user API
> > */
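One detail of `vfio_pasid_free_range()` deserves a second look. `ioasid_t` is an unsigned 32-bit type (assumed here to be `unsigned int`, as declared in `include/linux/ioasid.h`). As a result, `for (; pasid <= max; pasid++)` never terminates if a caller passes the largest `ioasid_t` value as `max`, because `pasid++` wraps back to 0. A minimal userspace sketch of a wrap-safe loop follows. `model_ioasid_t` and the counting `model_ioasid_free()` are hypothetical stand-ins for `ioasid_t` and `ioasid_free()`.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned int model_ioasid_t;	/* assumed to match ioasid_t */

static unsigned long long nr_freed;	/* counts model_ioasid_free() calls */

static void model_ioasid_free(model_ioasid_t pasid)
{
	(void)pasid;
	nr_freed++;
}

/* Frees [min, max] inclusive without relying on pasid <= max after pasid++. */
static void model_free_range(model_ioasid_t min, model_ioasid_t max)
{
	model_ioasid_t pasid = min;

	if (min > max)
		return;
	for (;;) {
		model_ioasid_free(pasid);
		if (pasid == max)	/* test before incrementing: no wraparound */
			break;
		pasid++;
	}
}
```

An equivalent alternative is to keep the original loop shape but iterate with a wider counter type, such as `unsigned long long`.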