2020-07-28 06:25:37

by Yi Liu

Subject: [PATCH v6 00/15] vfio: expose virtual Shared Virtual Addressing to VMs

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM) on
Intel platforms, allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
guest application address spaces with passthrough devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes; the IOMMU and QEMU changes are posted in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is shown below. The key
design of vSVA support is to utilize the dual-stage IOMMU translation
(also known as IOMMU nested translation) capability of the host IOMMU.


     .-------------.  .---------------------------.
     |   vIOMMU    |  | Guest process CR3, FL only|
     |             |  '---------------------------'
     .----------------/
     | PASID Entry |--- PASID cache flush -
     '-------------'                       |
     |             |                       V
     |             |                CR3 in GPA
     '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
     .-------------.  .----------------------.
     |   pIOMMU    |  | Bind FL for GVA-GPA  |
     |             |  '----------------------'
     .----------------/  |
     | PASID Entry |     V (Nested xlate)
     '----------------\.------------------------------.
     |             |   |SL for GPA-HPA, default domain|
     |             |   '------------------------------'
     '-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables

Patch Overview:
1. a refactor of the vfio_iommu_type1 ioctl (patch 0001)
2. report IOMMU nesting info to userspace (patch 0002, 0003, 0004 and 0015)
3. vfio support for PASID allocation and free for VMs (patch 0005, 0006, 0007)
4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
7. expose PASID capability to VM (patch 0013)
8. add doc for VFIO dual stage control (patch 0014)

The complete vSVA kernel upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. IOMMU-backed Mediated Device assignment
3. Page Request Services (PRS) support

This patchset aims at phase 1 and phase 2, and is based on Jacob's
series below.
*) [PATCH v6 0/6] IOMMU user API enhancement - wip
https://lore.kernel.org/linux-iommu/[email protected]/

*) [PATCH 00/10] IOASID extensions for guest SVA - wip
https://lore.kernel.org/linux-iommu/[email protected]/

The latest IOASID code added the new interface below for iterating over all
PASIDs of an ioasid_set. The implementation has not been sent out yet as Jacob
needs to do some cleanup; it can be found in the branch vsva-linux-5.8-rc6-v6
on github (mentioned below):
int ioasid_set_for_each_ioasid(int sid, void (*fn)(ioasid_t id, void *data), void *data);
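
Since the iterator is not posted yet, below is only a rough sketch (an
assumption, not code from the series) of how a consumer such as the VFIO
PASID module might use it to release every PASID of a set on VM teardown;
the function names in the sketch are hypothetical:

static void vfio_pasid_free_fn(ioasid_t pasid, void *data)
{
	/* data could carry per-VM context; unused in this sketch */
	ioasid_free(pasid);
}

static void vfio_release_all_pasids(int ioasid_sid)
{
	/* walk all PASIDs tracked in the set identified by ioasid_sid */
	ioasid_set_for_each_ioasid(ioasid_sid, vfio_pasid_free_fn, NULL);
}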

The complete set of current vSVA patches can be found in the branch below.
https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.8-rc6-v6

The corresponding QEMU patch series is included in the branch below:
https://github.com/luxis1999/qemu.git: vsva_5.8_rc6_qemu_rfcv9


Regards,
Yi Liu

Changelog:
- Patch v5 -> Patch v6:
a) Address comments against v5 from Eric.
b) rebase on Jacob's v6 IOMMU uapi enhancement
Patch v5: https://lore.kernel.org/kvm/[email protected]/

- Patch v4 -> Patch v5:
a) Address comments against v4
Patch v4: https://lore.kernel.org/kvm/[email protected]/

- Patch v3 -> Patch v4:
a) Address comments against v3
b) Add rb from Stefan on patch 14/15
Patch v3: https://lore.kernel.org/linux-iommu/[email protected]/

- Patch v2 -> Patch v3:
a) Rebase on top of Jacob's v3 iommu uapi patchset
b) Address comments from Kevin and Stefan Hajnoczi
c) Reuse DOMAIN_ATTR_NESTING to get iommu nesting info
d) Drop [PATCH v2 07/15] iommu/uapi: Add iommu_gpasid_unbind_data
Patch v2: https://lore.kernel.org/linux-iommu/[email protected]/#r

- Patch v1 -> Patch v2:
a) Refactor vfio_iommu_type1_ioctl() per suggestion from Christoph
Hellwig.
b) Re-sequence the patch series for better bisect support.
c) Report IOMMU nesting cap info in detail instead of the format used in
v1.
d) Enforce one group per nesting type container for vfio iommu type1
driver.
e) Build the vfio_mm related code, split out of vfio.c, as a separate
vfio_pasid.ko.
f) Add PASID ownership check in IOMMU driver.
g) Adapted to the latest IOMMU UAPI design. Removed IOMMU UAPI version
check. Added iommu_gpasid_unbind_data for unbind requests from
userspace.
h) Define a single ioctl: VFIO_IOMMU_NESTING_OP for bind/unbind_gtbl
and cache_invld.
i) Document dual stage control in vfio.rst.
Patch v1: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v3 -> Patch v1:
a) Address comments on the PASID request (alloc/free) path
b) Report PASID alloc/free availability to user-space
c) Add a vfio_iommu_type1 parameter to support pasid quota tuning
d) Adjusted to the latest ioasid code implementation, e.g. removed the
code for tracking allocated PASIDs as the latest ioasid code will
track them; VFIO can use ioasid_free_set() to free all
PASIDs.
RFC v3: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v2 -> v3:
a) Refine the whole patchset to roughly fit the parts in this series
b) Adds a complete vfio PASID management framework, e.g. pasid alloc,
free, reclaim on VM crash/shutdown and a per-VM PASID quota to prevent
PASID abuse.
c) Adds IOMMU uAPI version check and page table format check to ensure
version compatibility and hardware compatibility.
d) Adds vSVA vfio support for IOMMU-backed mdevs.
RFC v2: https://lore.kernel.org/linux-iommu/[email protected]/

- RFC v1 -> v2:
Dropped vfio: VFIO_IOMMU_ATTACH/DETACH_PASID_TABLE.
RFC v1: https://lore.kernel.org/linux-iommu/[email protected]/

---
Eric Auger (1):
vfio: Document dual stage control

Liu Yi L (13):
vfio/type1: Refactor vfio_iommu_type1_ioctl()
iommu: Report domain nesting info
iommu/smmu: Report empty domain nesting info
vfio/type1: Report iommu nesting info to userspace
vfio: Add PASID allocation/free support
iommu/vt-d: Support setting ioasid set to domain
vfio/type1: Add VFIO_IOMMU_PASID_REQUEST (alloc/free)
iommu/vt-d: Check ownership for PASIDs from user-space
vfio/type1: Support binding guest page tables to PASID
vfio/type1: Allow invalidating first-level/stage IOMMU cache
vfio/type1: Add vSVA support for IOMMU-backed mdevs
vfio/pci: Expose PCIe PASID capability to guest
iommu/vt-d: Support reporting nesting capability info

Yi Sun (1):
iommu: Pass domain to sva_unbind_gpasid()

Documentation/driver-api/vfio.rst | 75 ++++
drivers/iommu/arm-smmu-v3.c | 29 +-
drivers/iommu/arm-smmu.c | 29 +-
drivers/iommu/intel/iommu.c | 114 +++++-
drivers/iommu/intel/svm.c | 10 +-
drivers/iommu/iommu.c | 2 +-
drivers/vfio/Kconfig | 6 +
drivers/vfio/Makefile | 1 +
drivers/vfio/pci/vfio_pci_config.c | 2 +-
drivers/vfio/vfio_iommu_type1.c | 796 ++++++++++++++++++++++++++++---------
drivers/vfio/vfio_pasid.c | 284 +++++++++++++
include/linux/intel-iommu.h | 23 +-
include/linux/iommu.h | 4 +-
include/linux/vfio.h | 54 +++
include/uapi/linux/iommu.h | 74 ++++
include/uapi/linux/vfio.h | 90 +++++
16 files changed, 1391 insertions(+), 202 deletions(-)
create mode 100644 drivers/vfio/vfio_pasid.c

--
2.7.4


2020-07-28 06:25:39

by Yi Liu

Subject: [PATCH v6 03/15] iommu/smmu: Report empty domain nesting info

This patch is added because, instead of returning a boolean for
DOMAIN_ATTR_NESTING, iommu_domain_get_attr() should return an
iommu_nesting_info handle. For now, return an empty nesting info struct,
as true nesting is not yet supported by the SMMUs.
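
For reference, a caller is expected to size the buffer first and then fetch
the info, along the lines of the hedged sketch below (this mirrors what the
VFIO type1 driver does later in this series; error handling is simplified):

	struct iommu_nesting_info tmp = { .argsz = 0, };
	struct iommu_nesting_info *info;
	int ret;

	/* first call: the driver reports the required size in tmp.argsz */
	ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, &tmp);
	if (ret)
		return ret;

	info = kzalloc(tmp.argsz, GFP_KERNEL);
	if (!info)
		return -ENOMEM;

	/* second call: the buffer is large enough, the driver fills it */
	info->argsz = tmp.argsz;
	ret = iommu_domain_get_attr(domain, DOMAIN_ATTR_NESTING, info);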

Cc: Will Deacon <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
v5 -> v6:
*) add Reviewed-by from Eric Auger.

v4 -> v5:
*) address comments from Eric Auger.
---
drivers/iommu/arm-smmu-v3.c | 29 +++++++++++++++++++++++++++--
drivers/iommu/arm-smmu.c | 29 +++++++++++++++++++++++++++--
2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f578677..c702ef9 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -3019,6 +3019,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}

+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+ void *data)
+{
+ struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+ unsigned int size;
+
+ if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+ return -ENODEV;
+
+ size = sizeof(struct iommu_nesting_info);
+
+ /*
+ * if provided buffer size is smaller than expected, should
+ * return 0 and also the expected buffer size to caller.
+ */
+ if (info->argsz < size) {
+ info->argsz = size;
+ return 0;
+ }
+
+ /* report an empty iommu_nesting_info for now */
+ memset(info, 0x0, size);
+ info->argsz = size;
+ return 0;
+}
+
static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
{
@@ -3028,8 +3054,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
- *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
- return 0;
+ return arm_smmu_domain_nesting_info(smmu_domain, data);
default:
return -ENODEV;
}
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4c..2bd58f4 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1506,6 +1506,32 @@ static struct iommu_group *arm_smmu_device_group(struct device *dev)
return group;
}

+static int arm_smmu_domain_nesting_info(struct arm_smmu_domain *smmu_domain,
+ void *data)
+{
+ struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
+ unsigned int size;
+
+ if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
+ return -ENODEV;
+
+ size = sizeof(struct iommu_nesting_info);
+
+ /*
+ * if provided buffer size is smaller than expected, should
+ * return 0 and also the expected buffer size to caller.
+ */
+ if (info->argsz < size) {
+ info->argsz = size;
+ return 0;
+ }
+
+ /* report an empty iommu_nesting_info for now */
+ memset(info, 0x0, size);
+ info->argsz = size;
+ return 0;
+}
+
static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
enum iommu_attr attr, void *data)
{
@@ -1515,8 +1541,7 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_UNMANAGED:
switch (attr) {
case DOMAIN_ATTR_NESTING:
- *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
- return 0;
+ return arm_smmu_domain_nesting_info(smmu_domain, data);
default:
return -ENODEV;
}
--
2.7.4

2020-07-28 06:25:40

by Yi Liu

Subject: [PATCH v6 06/15] iommu/vt-d: Support setting ioasid set to domain

From the IOMMU p.o.v., PASIDs allocated and managed by external components
(e.g. VFIO) will be passed in for gpasid_bind/unbind operations. The IOMMU
needs some knowledge to check PASID ownership, hence add an interface for
those components to tell the IOMMU the PASID owner.

In the latest kernel design, PASID ownership is managed by the IOASID set
from which the PASID is allocated. This patch adds support for setting the
ioasid set ID on the domains used for nesting/vSVA. Subsequent SVA operations
will check the PASID against its IOASID set for proper ownership.
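
As a usage sketch (the actual VFIO caller is added later in this series;
the wrapper name below is an assumption for illustration), an external
component would hand the set ID it obtained from ioasid_alloc_set() to the
nesting domain via the new attribute:

static int vfio_domain_set_ioasid_set(struct iommu_domain *domain, int sid)
{
	/* sid is the set ID previously returned by ioasid_alloc_set() */
	return iommu_domain_set_attr(domain, DOMAIN_ATTR_IOASID_SID, &sid);
}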

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
v5 -> v6:
*) address comments against v5 from Eric Auger.

v4 -> v5:
*) address comments from Eric Auger.
---
drivers/iommu/intel/iommu.c | 23 +++++++++++++++++++++++
include/linux/intel-iommu.h | 4 ++++
include/linux/iommu.h | 1 +
3 files changed, 28 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ed4b71c..b2fe54e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
if (first_level_by_default())
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
+ domain->ioasid_sid = INVALID_IOASID_SET;
INIT_LIST_HEAD(&domain->devices);

return domain;
@@ -6040,6 +6041,28 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
}
spin_unlock_irqrestore(&device_domain_lock, flags);
break;
+ case DOMAIN_ATTR_IOASID_SID:
+ {
+ int sid = *(int *)data;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
+ ret = -ENODEV;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ break;
+ }
+ if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
+ dmar_domain->ioasid_sid != sid) {
+ pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
+ dmar_domain->ioasid_sid, sid);
+ ret = -EBUSY;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ break;
+ }
+ dmar_domain->ioasid_sid = sid;
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ break;
+ }
default:
ret = -EINVAL;
break;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3f23c26..0d0ab32 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -549,6 +549,10 @@ struct dmar_domain {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr; /* maximum mapped address */

+ int ioasid_sid; /*
+ * the ioasid set which tracks all
+ * PASIDs used by the domain.
+ */
int default_pasid; /*
* The default pasid used for non-SVM
* traffic on mediated devices.
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 4a02c9e..b1ff702 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING, /* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+ DOMAIN_ATTR_IOASID_SID,
DOMAIN_ATTR_MAX,
};

--
2.7.4

2020-07-28 06:25:43

by Yi Liu

Subject: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

This patch exports iommu nesting capability info to user space through
VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
PASID alloc/free, bind page table, and cache invalidation) and for the
vendor-specific format information of the first level/stage page table
that will be bound.

The nesting info is available only after the container is set to the
NESTED type. The current implementation imposes one limitation - one
nesting container should include at most one iommu group. The philosophy
of a vfio container is that all groups/devices within the container share
the same IOMMU context. When vSVA is enabled, one IOMMU context could
include one 2nd-level address space and multiple 1st-level address spaces.
While the 2nd-level address space is reasonably sharable by multiple
groups, blindly sharing 1st-level address spaces across all groups within
the container might instead break the guest expectation. In the future a
sub/super container concept might be introduced to allow partial address
space sharing within an IOMMU context. But for now let's go with this
restriction by requiring a singleton container for using nesting iommu
features. The link below has the related discussion about this decision.

https://lore.kernel.org/kvm/[email protected]/

This patch also changes the NESTING type container behaviour. Something
that would have succeeded before will now fail: before this series, if a
user asked for VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
if the SMMU didn't support stage-2, as the driver would have silently
fallen back to stage-1 mappings (which work exactly the same as stage-2
only, since there was no nesting support). After the series, we do check
for DOMAIN_ATTR_NESTING, so if a user asks for VFIO_IOMMU_TYPE1_NESTING
and the SMMU doesn't support stage-2, the ioctl fails. But it should be a
good fix and is completely harmless. Details can be found in the link
below as well.

https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
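
For clarity, a rough userspace sketch (not part of this patch; it follows
the existing VFIO_IOMMU_GET_INFO conventions, container_fd is assumed, and
error handling is omitted) of how a VMM could locate this capability after
selecting the NESTING IOMMU type:

	struct vfio_iommu_type1_info probe = { .argsz = sizeof(probe) };
	struct vfio_iommu_type1_info *info;
	__u32 off;

	/* first call reports the full size including the capability chain */
	ioctl(container_fd, VFIO_IOMMU_GET_INFO, &probe);

	info = calloc(1, probe.argsz);
	info->argsz = probe.argsz;
	ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
		return;		/* no capability chain reported */

	/* walk the capability chain looking for the nesting capability */
	for (off = info->cap_offset; off; ) {
		struct vfio_info_cap_header *hdr = (void *)info + off;

		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_NESTING) {
			struct vfio_iommu_type1_info_cap_nesting *nest =
				(void *)hdr;

			/* vendor page table format reported by the IOMMU */
			printf("nesting format: %u\n", nest->info.format);
			break;
		}
		off = hdr->next;
	}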

Cc: Kevin Tian <[email protected]>
CC: Jacob Pan <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Eric Auger <[email protected]>
Cc: Jean-Philippe Brucker <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Lu Baolu <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
---
v5 -> v6:
*) address comments against v5 from Eric Auger.
*) don't report nesting cap to userspace if the nesting_info->format is
invalid.

v4 -> v5:
*) address comments from Eric Auger.
*) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, as
the cap is quite "cheap"; if it needs extension in future, just define another cap.
https://lore.kernel.org/kvm/[email protected]/

v3 -> v4:
*) address comments against v3.

v1 -> v2:
*) added in v2
---
drivers/vfio/vfio_iommu_type1.c | 106 +++++++++++++++++++++++++++++++++++-----
include/uapi/linux/vfio.h | 19 +++++++
2 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 3bd70ff..18ff0c3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
"Maximum number of user DMA mappings per container (65535).");

struct vfio_iommu {
- struct list_head domain_list;
- struct list_head iova_list;
- struct vfio_domain *external_domain; /* domain for external user */
- struct mutex lock;
- struct rb_root dma_list;
- struct blocking_notifier_head notifier;
- unsigned int dma_avail;
- uint64_t pgsize_bitmap;
- bool v2;
- bool nesting;
- bool dirty_page_tracking;
- bool pinned_page_dirty_scope;
+ struct list_head domain_list;
+ struct list_head iova_list;
+ /* domain for external user */
+ struct vfio_domain *external_domain;
+ struct mutex lock;
+ struct rb_root dma_list;
+ struct blocking_notifier_head notifier;
+ unsigned int dma_avail;
+ uint64_t pgsize_bitmap;
+ bool v2;
+ bool nesting;
+ bool dirty_page_tracking;
+ bool pinned_page_dirty_scope;
+ struct iommu_nesting_info *nesting_info;
};

struct vfio_domain {
@@ -130,6 +132,9 @@ struct vfio_regions {
#define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
(!list_empty(&iommu->domain_list))

+#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \
+ (!list_empty(&(iommu)->domain_list)))
+
#define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)

/*
@@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,

list_splice_tail(iova_copy, iova);
}
+
+static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
+{
+ kfree(iommu->nesting_info);
+ iommu->nesting_info = NULL;
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
}
}

+ /* Nesting type container can include only one group */
+ if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
+ mutex_unlock(&iommu->lock);
+ return -EINVAL;
+ }
+
group = kzalloc(sizeof(*group), GFP_KERNEL);
domain = kzalloc(sizeof(*domain), GFP_KERNEL);
if (!group || !domain) {
@@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_domain;

+ /* Nesting cap info is available only after attaching */
+ if (iommu->nesting) {
+ struct iommu_nesting_info tmp = { .argsz = 0, };
+
+ /* First get the size of vendor specific nesting info */
+ ret = iommu_domain_get_attr(domain->domain,
+ DOMAIN_ATTR_NESTING,
+ &tmp);
+ if (ret)
+ goto out_detach;
+
+ iommu->nesting_info = kzalloc(tmp.argsz, GFP_KERNEL);
+ if (!iommu->nesting_info) {
+ ret = -ENOMEM;
+ goto out_detach;
+ }
+
+ /* Now get the nesting info */
+ iommu->nesting_info->argsz = tmp.argsz;
+ ret = iommu_domain_get_attr(domain->domain,
+ DOMAIN_ATTR_NESTING,
+ iommu->nesting_info);
+ if (ret)
+ goto out_detach;
+ }
+
/* Get aperture info */
iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);

@@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
return 0;

out_detach:
+ vfio_iommu_release_nesting_info(iommu);
vfio_iommu_detach_group(domain, group);
out_domain:
iommu_domain_free(domain->domain);
@@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
vfio_iommu_unmap_unpin_all(iommu);
else
vfio_iommu_unmap_unpin_reaccount(iommu);
+
+ vfio_iommu_release_nesting_info(iommu);
}
iommu_domain_free(domain->domain);
list_del(&domain->next);
@@ -2546,6 +2593,39 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
}

+static int vfio_iommu_add_nesting_cap(struct vfio_iommu *iommu,
+ struct vfio_info_cap *caps)
+{
+ struct vfio_info_cap_header *header;
+ struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
+ size_t size;
+
+ /* when nesting_info is null, no need go further */
+ if (!iommu->nesting_info)
+ return 0;
+
+ /* when @format of nesting_info is 0, fail the call */
+ if (iommu->nesting_info->format == 0)
+ return -ENOENT;
+
+ size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
+ iommu->nesting_info->argsz;
+
+ header = vfio_info_cap_add(caps, size,
+ VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ nesting_cap = container_of(header,
+ struct vfio_iommu_type1_info_cap_nesting,
+ header);
+
+ memcpy(&nesting_cap->info, iommu->nesting_info,
+ iommu->nesting_info->argsz);
+
+ return 0;
+}
+
static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
unsigned long arg)
{
@@ -2581,6 +2661,8 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
if (!ret)
ret = vfio_iommu_iova_build_caps(iommu, &caps);

+ ret = vfio_iommu_add_nesting_cap(iommu, &caps);
+
mutex_unlock(&iommu->lock);

if (ret)
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9204705..0cf3d6d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -14,6 +14,7 @@

#include <linux/types.h>
#include <linux/ioctl.h>
+#include <linux/iommu.h>

#define VFIO_API_VERSION 0

@@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
__u64 max_dirty_bitmap_size; /* in bytes */
};

+/*
+ * The nesting capability allows to report the related capability
+ * and info for nesting iommu type.
+ *
+ * The structures below define version 1 of this capability.
+ *
+ * Userspace selected VFIO_TYPE1_NESTING_IOMMU type should check
+ * this capability to get supported features.
+ *
+ * @info: the nesting info provided by IOMMU driver.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 3
+
+struct vfio_iommu_type1_info_cap_nesting {
+ struct vfio_info_cap_header header;
+ struct iommu_nesting_info info;
+};
+
#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)

/**
--
2.7.4

2020-08-13 13:24:44

by Eric Auger

Subject: Re: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

Hi Yi,
On 7/28/20 8:27 AM, Liu Yi L wrote:
> This patch exports iommu nesting capability info to user space through
> VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
>
> The nesting info is available only after container set to be NESTED type.
> Current implementation imposes one limitation - one nesting container
> should include at most one iommu group. The philosophy of vfio container
> is having all groups/devices within the container share the same IOMMU
> context. When vSVA is enabled, one IOMMU context could include one 2nd-
> level address space and multiple 1st-level address spaces. While the
> 2nd-level address space is reasonably sharable by multiple groups, blindly
> sharing 1st-level address spaces across all groups within the container
> might instead break the guest expectation. In the future sub/super container
> concept might be introduced to allow partial address space sharing within
> an IOMMU context. But for now let's go with this restriction by requiring
> singleton container for using nesting iommu features. Below link has the
> related discussion about this decision.
>
> https://lore.kernel.org/kvm/[email protected]/
>
> This patch also changes the NESTING type container behaviour. Something
> that would have succeeded before will now fail: Before this series, if
> user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even
> if the SMMU didn't support stage-2, as the driver would have silently
> fallen back on stage-1 mappings (which work exactly the same as stage-2
> only since there was no nesting supported). After the series, we do check
> for DOMAIN_ATTR_NESTING so if user asks for VFIO_IOMMU_TYPE1_NESTING and
> the SMMU doesn't support stage-2, the ioctl fails. But it should be a good
> fix and completely harmless. Detail can be found in below link as well.
>
> https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> v5 -> v6:
> *) address comments against v5 from Eric Auger.
> *) don't report nesting cap to userspace if the nesting_info->format is
> invalid.
>
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> cap is much "cheap", if needs extension in future, just define another cap.
> https://lore.kernel.org/kvm/[email protected]/
>
> v3 -> v4:
> *) address comments against v3.
>
> v1 -> v2:
> *) added in v2
> ---
> drivers/vfio/vfio_iommu_type1.c | 106 +++++++++++++++++++++++++++++++++++-----
> include/uapi/linux/vfio.h | 19 +++++++
> 2 files changed, 113 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3bd70ff..18ff0c3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> "Maximum number of user DMA mappings per container (65535).");
>
> struct vfio_iommu {
> - struct list_head domain_list;
> - struct list_head iova_list;
> - struct vfio_domain *external_domain; /* domain for external user */
> - struct mutex lock;
> - struct rb_root dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned int dma_avail;
> - uint64_t pgsize_bitmap;
> - bool v2;
> - bool nesting;
> - bool dirty_page_tracking;
> - bool pinned_page_dirty_scope;
> + struct list_head domain_list;
> + struct list_head iova_list;
> + /* domain for external user */
> + struct vfio_domain *external_domain;
> + struct mutex lock;
> + struct rb_root dma_list;
> + struct blocking_notifier_head notifier;
> + unsigned int dma_avail;
> + uint64_t pgsize_bitmap;
> + bool v2;
> + bool nesting;
> + bool dirty_page_tracking;
> + bool pinned_page_dirty_scope;
> + struct iommu_nesting_info *nesting_info;
> };
>
> struct vfio_domain {
> @@ -130,6 +132,9 @@ struct vfio_regions {
> #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> (!list_empty(&iommu->domain_list))
>
> +#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \
> + (!list_empty(&(iommu)->domain_list)))
> +
> #define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
>
> /*
> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
>
> list_splice_tail(iova_copy, iova);
> }
> +
> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> +{
> + kfree(iommu->nesting_info);
> + iommu->nesting_info = NULL;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> }
> }
>
> + /* Nesting type container can include only one group */
> + if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> + mutex_unlock(&iommu->lock);
> + return -EINVAL;
> + }
> +
> group = kzalloc(sizeof(*group), GFP_KERNEL);
> domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> if (!group || !domain) {
> @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (ret)
> goto out_domain;
>
> + /* Nesting cap info is available only after attaching */
> + if (iommu->nesting) {
> + struct iommu_nesting_info tmp = { .argsz = 0, };
> +
> + /* First get the size of vendor specific nesting info */
> + ret = iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_NESTING,
> + &tmp);
> + if (ret)
> + goto out_detach;
> +
> + iommu->nesting_info = kzalloc(tmp.argsz, GFP_KERNEL);
> + if (!iommu->nesting_info) {
> + ret = -ENOMEM;
> + goto out_detach;
> + }
> +
> + /* Now get the nesting info */
> + iommu->nesting_info->argsz = tmp.argsz;
> + ret = iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_NESTING,
> + iommu->nesting_info);
> + if (ret)
> + goto out_detach;
> + }
> +
> /* Get aperture info */
> iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
>
> @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
>
> out_detach:
> + vfio_iommu_release_nesting_info(iommu);
> vfio_iommu_detach_group(domain, group);
> out_domain:
> iommu_domain_free(domain->domain);
> @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> vfio_iommu_unmap_unpin_all(iommu);
> else
> vfio_iommu_unmap_unpin_reaccount(iommu);
> +
> + vfio_iommu_release_nesting_info(iommu);
> }
> iommu_domain_free(domain->domain);
> list_del(&domain->next);
> @@ -2546,6 +2593,39 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
> return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
> }
>
> +static int vfio_iommu_add_nesting_cap(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_info_cap_header *header;
> + struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> + size_t size;
> +
> + /* when nesting_info is null, no need go further */
no need to go
> + if (!iommu->nesting_info)
> + return 0;
> +
> + /* when @format of nesting_info is 0, fail the call */
> + if (iommu->nesting_info->format == 0)
> + return -ENOENT;
> +
> + size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> + iommu->nesting_info->argsz;
> +
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + nesting_cap = container_of(header,
> + struct vfio_iommu_type1_info_cap_nesting,
> + header);
> +
> + memcpy(&nesting_cap->info, iommu->nesting_info,
> + iommu->nesting_info->argsz);
can't you use vfio_info_add_capability() directly?
> +
> + return 0;
> +}
> +
> static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> unsigned long arg)
> {
> @@ -2581,6 +2661,8 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> if (!ret)
> ret = vfio_iommu_iova_build_caps(iommu, &caps);
>
> + ret = vfio_iommu_add_nesting_cap(iommu, &caps);
> +
> mutex_unlock(&iommu->lock);
>
> if (ret)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..0cf3d6d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>
> #include <linux/types.h>
> #include <linux/ioctl.h>
> +#include <linux/iommu.h>
>
> #define VFIO_API_VERSION 0
>
> @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
> __u64 max_dirty_bitmap_size; /* in bytes */
> };
>
> +/*
> + * The nesting capability allows to report the related capability
> + * and info for nesting iommu type.
> + *
> + * The structures below define version 1 of this capability.
> + *
> + * Userspace selected VFIO_TYPE1_NESTING_IOMMU type should check
> + * this capability to get supported features.
nested capabilities should be checked by the userspace after setting
VFIO_TYPE1_NESTING_IOMMU?
> + *
> + * @info: the nesting info provided by IOMMU driver.
> + */
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 3
> +
> +struct vfio_iommu_type1_info_cap_nesting {
> + struct vfio_info_cap_header header;
> + struct iommu_nesting_info info;
> +};
> +
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>
> /**
>
Thanks

Eric

2020-08-13 15:08:19

by Eric Auger

Subject: Re: [PATCH v6 06/15] iommu/vt-d: Support setting ioasid set to domain

Hi Yi,

On 7/28/20 8:27 AM, Liu Yi L wrote:
> From IOMMU p.o.v., PASIDs allocated and managed by external components
> (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
> needs some knowledge to check the PASID ownership, hence add an interface
> for those components to tell the PASID owner.
>
> In latest kernel design, PASID ownership is managed by IOASID set where
> the PASID is allocated from. This patch adds support for setting ioasid
> set ID to the domains used for nesting/vSVA. Subsequent SVA operations
> will check the PASID against its IOASID set for proper ownership.
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> v5 -> v6:
> *) address comments against v5 from Eric Auger.
>
> v4 -> v5:
> *) address comments from Eric Auger.
> ---
> drivers/iommu/intel/iommu.c | 23 +++++++++++++++++++++++
> include/linux/intel-iommu.h | 4 ++++
> include/linux/iommu.h | 1 +
> 3 files changed, 28 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index ed4b71c..b2fe54e 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
> if (first_level_by_default())
> domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
> domain->has_iotlb_device = false;
> + domain->ioasid_sid = INVALID_IOASID_SET;
> INIT_LIST_HEAD(&domain->devices);
>
> return domain;
> @@ -6040,6 +6041,28 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
> }
> spin_unlock_irqrestore(&device_domain_lock, flags);
> break;
> + case DOMAIN_ATTR_IOASID_SID:
> + {
> + int sid = *(int *)data;

> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
> + ret = -ENODEV;
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + break;
> + }
> + if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
> + dmar_domain->ioasid_sid != sid) {
> + pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
> + dmar_domain->ioasid_sid, sid);
> + ret = -EBUSY;
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + break;
> + }
> + dmar_domain->ioasid_sid = sid;
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> + break;
nit: Adding a small helper
int __set_ioasid_sid(struct dmar_domain *dmar_domain, int sid_id)

may simplify the lock handling


> + }
> default:
> ret = -EINVAL;
> break;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3f23c26..0d0ab32 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -549,6 +549,10 @@ struct dmar_domain {
> 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> u64 max_addr; /* maximum mapped address */
>
> + int ioasid_sid; /*
> + * the ioasid set which tracks all
id of the ioasid set?
> + * PASIDs used by the domain.
> + */
> int default_pasid; /*
> * The default pasid used for non-SVM
> * traffic on mediated devices.
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 4a02c9e..b1ff702 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -124,6 +124,7 @@ enum iommu_attr {
> DOMAIN_ATTR_FSL_PAMUV1,
> DOMAIN_ATTR_NESTING, /* two stages of translation */
> DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> + DOMAIN_ATTR_IOASID_SID,
> DOMAIN_ATTR_MAX,
> };
>
>
Besides
Reviewed-by: Eric Auger <[email protected]>


Eric
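
The helper suggested above could look roughly like the sketch below (an
assumption, not part of the posted patch), folding the lock handling and
the ownership checks of the DOMAIN_ATTR_IOASID_SID case into one place:

static int __set_ioasid_sid(struct dmar_domain *dmar_domain, int sid)
{
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&device_domain_lock, flags);
	if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE))
		ret = -ENODEV;
	else if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
		 dmar_domain->ioasid_sid != sid)
		ret = -EBUSY;	/* a different set is already bound */
	else
		dmar_domain->ioasid_sid = sid;
	spin_unlock_irqrestore(&device_domain_lock, flags);

	return ret;
}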

2020-08-14 07:59:10

by Yi Liu

Subject: RE: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

Hi Eric,

> From: Auger Eric <[email protected]>
> Sent: Thursday, August 13, 2020 9:20 PM
>
> Hi Yi,
> On 7/28/20 8:27 AM, Liu Yi L wrote:
> > This patch exports iommu nesting capability info to user space through
> > VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> > PASID alloc/free, bind page table, and cache invalidation) and the
> > vendor specific format information for first level/stage page table
> > that will be bound to.
> >
> > The nesting info is available only after container set to be NESTED type.
> > Current implementation imposes one limitation - one nesting container
> > should include at most one iommu group. The philosophy of vfio
> > container is having all groups/devices within the container share the
> > same IOMMU context. When vSVA is enabled, one IOMMU context could
> > include one 2nd- level address space and multiple 1st-level address
> > spaces. While the 2nd-level address space is reasonably sharable by
> > multiple groups, blindly sharing 1st-level address spaces across all
> > groups within the container might instead break the guest expectation.
> > In the future sub/super container concept might be introduced to allow
> > partial address space sharing within an IOMMU context. But for now
> > let's go with this restriction by requiring singleton container for
> > using nesting iommu features. Below link has the related discussion about this
> decision.
> >
> > https://lore.kernel.org/kvm/[email protected]/
> >
> > This patch also changes the NESTING type container behaviour.
> > Something that would have succeeded before will now fail: Before this
> > series, if user asked for a VFIO_IOMMU_TYPE1_NESTING, it would have
> > succeeded even if the SMMU didn't support stage-2, as the driver would
> > have silently fallen back on stage-1 mappings (which work exactly the
> > same as stage-2 only since there was no nesting supported). After the
> > series, we do check for DOMAIN_ATTR_NESTING so if user asks for
> > VFIO_IOMMU_TYPE1_NESTING and the SMMU doesn't support stage-2, the
> > ioctl fails. But it should be a good fix and completely harmless. Detail can be found
> in below link as well.
> >
> > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Lu Baolu <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > v5 -> v6:
> > *) address comments against v5 from Eric Auger.
> > *) don't report nesting cap to userspace if the nesting_info->format is
> > invalid.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) return struct iommu_nesting_info for
> VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> > cap is much "cheap", if needs extension in future, just define another cap.
> > https://lore.kernel.org/kvm/[email protected]/
> >
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 106
> +++++++++++++++++++++++++++++++++++-----
> > include/uapi/linux/vfio.h | 19 +++++++
> > 2 files changed, 113 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c
> > b/drivers/vfio/vfio_iommu_type1.c index 3bd70ff..18ff0c3 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> > "Maximum number of user DMA mappings per container (65535).");
> >
> > struct vfio_iommu {
> > - struct list_head domain_list;
> > - struct list_head iova_list;
> > - struct vfio_domain *external_domain; /* domain for external user */
> > - struct mutex lock;
> > - struct rb_root dma_list;
> > - struct blocking_notifier_head notifier;
> > - unsigned int dma_avail;
> > - uint64_t pgsize_bitmap;
> > - bool v2;
> > - bool nesting;
> > - bool dirty_page_tracking;
> > - bool pinned_page_dirty_scope;
> > + struct list_head domain_list;
> > + struct list_head iova_list;
> > + /* domain for external user */
> > + struct vfio_domain *external_domain;
> > + struct mutex lock;
> > + struct rb_root dma_list;
> > + struct blocking_notifier_head notifier;
> > + unsigned int dma_avail;
> > + uint64_t pgsize_bitmap;
> > + bool v2;
> > + bool nesting;
> > + bool dirty_page_tracking;
> > + bool pinned_page_dirty_scope;
> > + struct iommu_nesting_info *nesting_info;
> > };
> >
> > struct vfio_domain {
> > @@ -130,6 +132,9 @@ struct vfio_regions {
> > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> > (!list_empty(&iommu->domain_list))
> >
> > +#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \
> > + (!list_empty(&(iommu)->domain_list)))
> > +
> > #define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) /
> BITS_PER_BYTE)
> >
> > /*
> > @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct
> > vfio_iommu *iommu,
> >
> > list_splice_tail(iova_copy, iova);
> > }
> > +
> > +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > +{
> > + kfree(iommu->nesting_info);
> > + iommu->nesting_info = NULL;
> > +}
> > +
> > static int vfio_iommu_type1_attach_group(void *iommu_data,
> > struct iommu_group *iommu_group)
> { @@ -1959,6 +1971,12 @@
> > static int vfio_iommu_type1_attach_group(void *iommu_data,
> > }
> > }
> >
> > + /* Nesting type container can include only one group */
> > + if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > group = kzalloc(sizeof(*group), GFP_KERNEL);
> > domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> > if (!group || !domain) {
> > @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> > if (ret)
> > goto out_domain;
> >
> > + /* Nesting cap info is available only after attaching */
> > + if (iommu->nesting) {
> > + struct iommu_nesting_info tmp = { .argsz = 0, };
> > +
> > + /* First get the size of vendor specific nesting info */
> > + ret = iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_NESTING,
> > + &tmp);
> > + if (ret)
> > + goto out_detach;
> > +
> > + iommu->nesting_info = kzalloc(tmp.argsz, GFP_KERNEL);
> > + if (!iommu->nesting_info) {
> > + ret = -ENOMEM;
> > + goto out_detach;
> > + }
> > +
> > + /* Now get the nesting info */
> > + iommu->nesting_info->argsz = tmp.argsz;
> > + ret = iommu_domain_get_attr(domain->domain,
> > + DOMAIN_ATTR_NESTING,
> > + iommu->nesting_info);
> > + if (ret)
> > + goto out_detach;
> > + }
> > +
> > /* Get aperture info */
> > iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY,
> &geo);
> >
> > @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void
> *iommu_data,
> > return 0;
> >
> > out_detach:
> > + vfio_iommu_release_nesting_info(iommu);
> > vfio_iommu_detach_group(domain, group);
> > out_domain:
> > iommu_domain_free(domain->domain);
> > @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void
> *iommu_data,
> > vfio_iommu_unmap_unpin_all(iommu);
> > else
> >
> vfio_iommu_unmap_unpin_reaccount(iommu);
> > +
> > + vfio_iommu_release_nesting_info(iommu);
> > }
> > iommu_domain_free(domain->domain);
> > list_del(&domain->next);
> > @@ -2546,6 +2593,39 @@ static int vfio_iommu_migration_build_caps(struct
> vfio_iommu *iommu,
> > return vfio_info_add_capability(caps, &cap_mig.header,
> > sizeof(cap_mig)); }
> >
> > +static int vfio_iommu_add_nesting_cap(struct vfio_iommu *iommu,
> > + struct vfio_info_cap *caps) {
> > + struct vfio_info_cap_header *header;
> > + struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> > + size_t size;
> > +
> > + /* when nesting_info is null, no need go further */
> no need to go

nice catch.

> > + if (!iommu->nesting_info)
> > + return 0;
> > +
> > + /* when @format of nesting_info is 0, fail the call */
> > + if (iommu->nesting_info->format == 0)
> > + return -ENOENT;
> > +
> > + size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> > + iommu->nesting_info->argsz;
> > +
> > + header = vfio_info_cap_add(caps, size,
> > + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> > + if (IS_ERR(header))
> > + return PTR_ERR(header);
> > +
> > + nesting_cap = container_of(header,
> > + struct vfio_iommu_type1_info_cap_nesting,
> > + header);
> > +
> > + memcpy(&nesting_cap->info, iommu->nesting_info,
> > + iommu->nesting_info->argsz);
> can't you use vfio_info_add_capability() directly?

yes, the below lines will be covered by vfio_info_add_capability().

+ header = vfio_info_cap_add(caps, size,
+ VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ nesting_cap = container_of(header,
+ struct vfio_iommu_type1_info_cap_nesting,
+ header);

> > +
> > + return 0;
> > +}
> > +
> > static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> > unsigned long arg)
> > {
> > @@ -2581,6 +2661,8 @@ static int vfio_iommu_type1_get_info(struct
> vfio_iommu *iommu,
> > if (!ret)
> > ret = vfio_iommu_iova_build_caps(iommu, &caps);
> >
> > + ret = vfio_iommu_add_nesting_cap(iommu, &caps);
> > +
> > mutex_unlock(&iommu->lock);
> >
> > if (ret)
> > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> > index 9204705..0cf3d6d 100644
> > --- a/include/uapi/linux/vfio.h
> > +++ b/include/uapi/linux/vfio.h
> > @@ -14,6 +14,7 @@
> >
> > #include <linux/types.h>
> > #include <linux/ioctl.h>
> > +#include <linux/iommu.h>
> >
> > #define VFIO_API_VERSION 0
> >
> > @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
> > __u64 max_dirty_bitmap_size; /* in bytes */
> > };
> >
> > +/*
> > + * The nesting capability allows to report the related capability
> > + * and info for nesting iommu type.
> > + *
> > + * The structures below define version 1 of this capability.
> > + *
> > + * Userspace selected VFIO_TYPE1_NESTING_IOMMU type should check
> > + * this capability to get supported features.
> nested capabilities should be checked by the userspace after setting
> VFIO_TYPE1_NESTING_IOMMU?

yes, will modify it.

Regards,
Yi Liu

> > + *
> > + * @info: the nesting info provided by IOMMU driver.
> > + */
> > +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 3
> > +
> > +struct vfio_iommu_type1_info_cap_nesting {
> > + struct vfio_info_cap_header header;
> > + struct iommu_nesting_info info;
> > +};
> > +
> > #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
> >
> > /**
> >
> Thanks
>
> Eric

2020-08-14 08:16:24

by Yi Liu

Subject: RE: [PATCH v6 06/15] iommu/vt-d: Support setting ioasid set to domain

Hi Eric,

> From: Auger Eric <[email protected]>
> Sent: Thursday, August 13, 2020 11:07 PM
>
> Hi Yi,
>
> On 7/28/20 8:27 AM, Liu Yi L wrote:
> > From IOMMU p.o.v., PASIDs allocated and managed by external components
> > (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
> > needs some knowledge to check the PASID ownership, hence add an
> > interface for those components to tell the PASID owner.
> >
> > In latest kernel design, PASID ownership is managed by IOASID set
> > where the PASID is allocated from. This patch adds support for setting
> > ioasid set ID to the domains used for nesting/vSVA. Subsequent SVA
> > operations will check the PASID against its IOASID set for proper ownership.
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Lu Baolu <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > v5 -> v6:
> > *) address comments against v5 from Eric Auger.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > ---
> > drivers/iommu/intel/iommu.c | 23 +++++++++++++++++++++++
> > include/linux/intel-iommu.h | 4 ++++
> > include/linux/iommu.h | 1 +
> > 3 files changed, 28 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index ed4b71c..b2fe54e 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
> > if (first_level_by_default())
> > domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
> > domain->has_iotlb_device = false;
> > + domain->ioasid_sid = INVALID_IOASID_SET;
> > INIT_LIST_HEAD(&domain->devices);
> >
> > return domain;
> > @@ -6040,6 +6041,28 @@ intel_iommu_domain_set_attr(struct iommu_domain
> *domain,
> > }
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > break;
> > + case DOMAIN_ATTR_IOASID_SID:
> > + {
> > + int sid = *(int *)data;
>
> > +
> > + spin_lock_irqsave(&device_domain_lock, flags);
> > + if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
> > + ret = -ENODEV;
> > + spin_unlock_irqrestore(&device_domain_lock, flags);
> > + break;
> > + }
> > + if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
> > + dmar_domain->ioasid_sid != sid) {
> > + pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
> > + dmar_domain->ioasid_sid, sid);
> > + ret = -EBUSY;
> > + spin_unlock_irqrestore(&device_domain_lock, flags);
> > + break;
> > + }
> > + dmar_domain->ioasid_sid = sid;
> > + spin_unlock_irqrestore(&device_domain_lock, flags);
> > + break;
> nit: Adding a small helper
> int__set_ioasid_sid(struct dmar_domain *dmar_domain, int sid_id)
>
> may simplify the lock handling

ok. will do.

>
> > + }
> > default:
> > ret = -EINVAL;
> > break;
> > diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> > index 3f23c26..0d0ab32 100644
> > --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -549,6 +549,10 @@ struct dmar_domain {
> > 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
> > u64 max_addr; /* maximum mapped address */
> >
> > + int ioasid_sid; /*
> > + * the ioasid set which tracks all
> id of the ioasid set?

it should be ioasid_set. However, ioasid_alloc_set() returns an sid in Jacob's
series. But I heard from Jacob that he will remove ioasid_sid and return an
ioasid_set, so I will modify it once his patch is sent out.

https://lore.kernel.org/linux-iommu/[email protected]/

> > + * PASIDs used by the domain.
> > + */
> > int default_pasid; /*
> > * The default pasid used for non-SVM
> > * traffic on mediated devices.
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
> > 4a02c9e..b1ff702 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -124,6 +124,7 @@ enum iommu_attr {
> > DOMAIN_ATTR_FSL_PAMUV1,
> > DOMAIN_ATTR_NESTING, /* two stages of translation */
> > DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
> > + DOMAIN_ATTR_IOASID_SID,
> > DOMAIN_ATTR_MAX,
> > };
> >
> >
> Besides
> Reviewed-by: Eric Auger <[email protected]>

thanks :-)

Regards,
Yi Liu

>
> Eric

2020-08-16 13:01:55

by Eric Auger

Subject: Re: [PATCH v6 06/15] iommu/vt-d: Support setting ioasid set to domain



On 8/14/20 10:04 AM, Liu, Yi L wrote:
> Hi Eric,
>
>> From: Auger Eric <[email protected]>
>> Sent: Thursday, August 13, 2020 11:07 PM
>>
>> Hi Yi,
>>
>> On 7/28/20 8:27 AM, Liu Yi L wrote:
>>> From IOMMU p.o.v., PASIDs allocated and managed by external components
>>> (e.g. VFIO) will be passed in for gpasid_bind/unbind operation. IOMMU
>>> needs some knowledge to check the PASID ownership, hence add an
>>> interface for those components to tell the PASID owner.
>>>
>>> In latest kernel design, PASID ownership is managed by IOASID set
>>> where the PASID is allocated from. This patch adds support for setting
>>> ioasid set ID to the domains used for nesting/vSVA. Subsequent SVA
>>> operations will check the PASID against its IOASID set for proper ownership.
>>>
>>> Cc: Kevin Tian <[email protected]>
>>> CC: Jacob Pan <[email protected]>
>>> Cc: Alex Williamson <[email protected]>
>>> Cc: Eric Auger <[email protected]>
>>> Cc: Jean-Philippe Brucker <[email protected]>
>>> Cc: Joerg Roedel <[email protected]>
>>> Cc: Lu Baolu <[email protected]>
>>> Signed-off-by: Liu Yi L <[email protected]>
>>> Signed-off-by: Jacob Pan <[email protected]>
>>> ---
>>> v5 -> v6:
>>> *) address comments against v5 from Eric Auger.
>>>
>>> v4 -> v5:
>>> *) address comments from Eric Auger.
>>> ---
>>> drivers/iommu/intel/iommu.c | 23 +++++++++++++++++++++++
>>> include/linux/intel-iommu.h | 4 ++++
>>> include/linux/iommu.h | 1 +
>>> 3 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index ed4b71c..b2fe54e 100644
>>> --- a/drivers/iommu/intel/iommu.c
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -1793,6 +1793,7 @@ static struct dmar_domain *alloc_domain(int flags)
>>> if (first_level_by_default())
>>> domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
>>> domain->has_iotlb_device = false;
>>> + domain->ioasid_sid = INVALID_IOASID_SET;
>>> INIT_LIST_HEAD(&domain->devices);
>>>
>>> return domain;
>>> @@ -6040,6 +6041,28 @@ intel_iommu_domain_set_attr(struct iommu_domain
>> *domain,
>>> }
>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>> break;
>>> + case DOMAIN_ATTR_IOASID_SID:
>>> + {
>>> + int sid = *(int *)data;
>>
>>> +
>>> + spin_lock_irqsave(&device_domain_lock, flags);
>>> + if (!(dmar_domain->flags & DOMAIN_FLAG_NESTING_MODE)) {
>>> + ret = -ENODEV;
>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>> + break;
>>> + }
>>> + if (dmar_domain->ioasid_sid != INVALID_IOASID_SET &&
>>> + dmar_domain->ioasid_sid != sid) {
>>> + pr_warn_ratelimited("multi ioasid_set (%d:%d) setting",
>>> + dmar_domain->ioasid_sid, sid);
>>> + ret = -EBUSY;
>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>> + break;
>>> + }
>>> + dmar_domain->ioasid_sid = sid;
>>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>>> + break;
>> nit: Adding a small helper
>> int__set_ioasid_sid(struct dmar_domain *dmar_domain, int sid_id)
>>
>> may simplify the lock handling
>
> ok. will do.
>
>>
>>> + }
>>> default:
>>> ret = -EINVAL;
>>> break;
>>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>>> index 3f23c26..0d0ab32 100644
>>> --- a/include/linux/intel-iommu.h
>>> +++ b/include/linux/intel-iommu.h
>>> @@ -549,6 +549,10 @@ struct dmar_domain {
>>> 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
>>> u64 max_addr; /* maximum mapped address */
>>>
>>> + int ioasid_sid; /*
>>> + * the ioasid set which tracks all
>> id of the ioasid set?
>
> should be ioasid_set. however, ioasid_alloc_set() returns sid in Jacob's
> series. but, I heard from Jacob, he will remove ioasid_sid, and return
> ioasid_set. so I will modify it once his patch is sent out.
>
> https://lore.kernel.org/linux-iommu/[email protected]/

OK

Thanks

Eric
>
>>> + * PASIDs used by the domain.
>>> + */
>>> int default_pasid; /*
>>> * The default pasid used for non-SVM
>>> * traffic on mediated devices.
>>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h index
>>> 4a02c9e..b1ff702 100644
>>> --- a/include/linux/iommu.h
>>> +++ b/include/linux/iommu.h
>>> @@ -124,6 +124,7 @@ enum iommu_attr {
>>> DOMAIN_ATTR_FSL_PAMUV1,
>>> DOMAIN_ATTR_NESTING, /* two stages of translation */
>>> DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
>>> + DOMAIN_ATTR_IOASID_SID,
>>> DOMAIN_ATTR_MAX,
>>> };
>>>
>>>
>> Besides
>> Reviewed-by: Eric Auger <[email protected]>
>
> thanks :-)
>
> Regards,
> Yi Liu
>
>>
>> Eric
>

2020-08-20 21:48:28

by Alex Williamson

Subject: Re: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

On Mon, 27 Jul 2020 23:27:33 -0700
Liu Yi L <[email protected]> wrote:

> This patch exports iommu nesting capability info to user space through
> VFIO. Userspace is expected to check this info for supported uAPIs (e.g.
> PASID alloc/free, bind page table, and cache invalidation) and the vendor
> specific format information for first level/stage page table that will be
> bound to.
>
> The nesting info is available only after the container is set to the NESTED
> type. The current implementation imposes one limitation: a nesting container
> should include at most one iommu group. The philosophy of a vfio container
> is that all groups/devices within the container share the same IOMMU
> context. When vSVA is enabled, one IOMMU context could include one 2nd-
> level address space and multiple 1st-level address spaces. While the
> 2nd-level address space is reasonably sharable by multiple groups, blindly
> sharing 1st-level address spaces across all groups within the container
> might break the guest's expectations. In the future a sub/super container
> concept might be introduced to allow partial address space sharing within
> an IOMMU context. For now, let's go with this restriction by requiring a
> singleton container for using nesting iommu features. The link below has
> the related discussion about this decision.
>
> https://lore.kernel.org/kvm/[email protected]/
>
> This patch also changes the NESTING type container behaviour: something
> that would have succeeded before will now fail. Before this series, if the
> user asked for VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even if
> the SMMU didn't support stage-2, as the driver would have silently fallen
> back on stage-1 mappings (which work exactly the same as stage-2-only
> since no nesting was supported). After the series, we do check for
> DOMAIN_ATTR_NESTING, so if the user asks for VFIO_IOMMU_TYPE1_NESTING and
> the SMMU doesn't support stage-2, the ioctl fails. But this should be a
> good fix and completely harmless. Details can be found in the link below
> as well.
>
> https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
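
To make that behaviour change concrete, a rough, hypothetical userspace
sketch (error handling and the group-attachment ordering are omitted) of
the expected pattern is to try the nesting type first and fall back when
the kernel rejects it:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: prefer the nesting IOMMU type, fall back to plain type1v2. */
static int set_container_iommu_type(int container)
{
        /* With this series, this fails when the IOMMU cannot do nesting. */
        if (!ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_NESTING_IOMMU))
                return VFIO_TYPE1_NESTING_IOMMU;

        if (!ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU))
                return VFIO_TYPE1v2_IOMMU;

        return -1;
}
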
>
> Cc: Kevin Tian <[email protected]>
> CC: Jacob Pan <[email protected]>
> Cc: Alex Williamson <[email protected]>
> Cc: Eric Auger <[email protected]>
> Cc: Jean-Philippe Brucker <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Lu Baolu <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> ---
> v5 -> v6:
> *) address comments against v5 from Eric Auger.
> *) don't report nesting cap to userspace if the nesting_info->format is
> invalid.
>
> v4 -> v5:
> *) address comments from Eric Auger.
> *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> cap is much "cheap", if needs extension in future, just define another cap.
> https://lore.kernel.org/kvm/[email protected]/
>
> v3 -> v4:
> *) address comments against v3.
>
> v1 -> v2:
> *) added in v2
> ---
> drivers/vfio/vfio_iommu_type1.c | 106 +++++++++++++++++++++++++++++++++++-----
> include/uapi/linux/vfio.h | 19 +++++++
> 2 files changed, 113 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 3bd70ff..18ff0c3 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> "Maximum number of user DMA mappings per container (65535).");
>
> struct vfio_iommu {
> - struct list_head domain_list;
> - struct list_head iova_list;
> - struct vfio_domain *external_domain; /* domain for external user */
> - struct mutex lock;
> - struct rb_root dma_list;
> - struct blocking_notifier_head notifier;
> - unsigned int dma_avail;
> - uint64_t pgsize_bitmap;
> - bool v2;
> - bool nesting;
> - bool dirty_page_tracking;
> - bool pinned_page_dirty_scope;
> + struct list_head domain_list;
> + struct list_head iova_list;
> + /* domain for external user */
> + struct vfio_domain *external_domain;
> + struct mutex lock;
> + struct rb_root dma_list;
> + struct blocking_notifier_head notifier;
> + unsigned int dma_avail;
> + uint64_t pgsize_bitmap;
> + bool v2;
> + bool nesting;
> + bool dirty_page_tracking;
> + bool pinned_page_dirty_scope;
> + struct iommu_nesting_info *nesting_info;
> };
>
> struct vfio_domain {
> @@ -130,6 +132,9 @@ struct vfio_regions {
> #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> (!list_empty(&iommu->domain_list))
>
> +#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \
> + (!list_empty(&(iommu)->domain_list)))
> +
> #define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
>
> /*
> @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
>
> list_splice_tail(iova_copy, iova);
> }
> +
> +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> +{
> + kfree(iommu->nesting_info);
> + iommu->nesting_info = NULL;
> +}
> +
> static int vfio_iommu_type1_attach_group(void *iommu_data,
> struct iommu_group *iommu_group)
> {
> @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> }
> }
>
> + /* Nesting type container can include only one group */
> + if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> + mutex_unlock(&iommu->lock);
> + return -EINVAL;
> + }
> +
> group = kzalloc(sizeof(*group), GFP_KERNEL);
> domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> if (!group || !domain) {
> @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> if (ret)
> goto out_domain;


I think that currently a user can configure a VFIO_TYPE1_NESTING_IOMMU
IOMMU type with a non-IOMMU backed mdev. Does that make any sort of
sense or is this a bug? Thanks,

Alex

>
> + /* Nesting cap info is available only after attaching */
> + if (iommu->nesting) {
> + struct iommu_nesting_info tmp = { .argsz = 0, };
> +
> + /* First get the size of vendor specific nesting info */
> + ret = iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_NESTING,
> + &tmp);
> + if (ret)
> + goto out_detach;
> +
> + iommu->nesting_info = kzalloc(tmp.argsz, GFP_KERNEL);
> + if (!iommu->nesting_info) {
> + ret = -ENOMEM;
> + goto out_detach;
> + }
> +
> + /* Now get the nesting info */
> + iommu->nesting_info->argsz = tmp.argsz;
> + ret = iommu_domain_get_attr(domain->domain,
> + DOMAIN_ATTR_NESTING,
> + iommu->nesting_info);
> + if (ret)
> + goto out_detach;
> + }
> +
> /* Get aperture info */
> iommu_domain_get_attr(domain->domain, DOMAIN_ATTR_GEOMETRY, &geo);
>
> @@ -2138,6 +2182,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> return 0;
>
> out_detach:
> + vfio_iommu_release_nesting_info(iommu);
> vfio_iommu_detach_group(domain, group);
> out_domain:
> iommu_domain_free(domain->domain);
> @@ -2338,6 +2383,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> vfio_iommu_unmap_unpin_all(iommu);
> else
> vfio_iommu_unmap_unpin_reaccount(iommu);
> +
> + vfio_iommu_release_nesting_info(iommu);
> }
> iommu_domain_free(domain->domain);
> list_del(&domain->next);
> @@ -2546,6 +2593,39 @@ static int vfio_iommu_migration_build_caps(struct vfio_iommu *iommu,
> return vfio_info_add_capability(caps, &cap_mig.header, sizeof(cap_mig));
> }
>
> +static int vfio_iommu_add_nesting_cap(struct vfio_iommu *iommu,
> + struct vfio_info_cap *caps)
> +{
> + struct vfio_info_cap_header *header;
> + struct vfio_iommu_type1_info_cap_nesting *nesting_cap;
> + size_t size;
> +
> + /* when nesting_info is null, no need to go further */
> + if (!iommu->nesting_info)
> + return 0;
> +
> + /* when @format of nesting_info is 0, fail the call */
> + if (iommu->nesting_info->format == 0)
> + return -ENOENT;
> +
> + size = offsetof(struct vfio_iommu_type1_info_cap_nesting, info) +
> + iommu->nesting_info->argsz;
> +
> + header = vfio_info_cap_add(caps, size,
> + VFIO_IOMMU_TYPE1_INFO_CAP_NESTING, 1);
> + if (IS_ERR(header))
> + return PTR_ERR(header);
> +
> + nesting_cap = container_of(header,
> + struct vfio_iommu_type1_info_cap_nesting,
> + header);
> +
> + memcpy(&nesting_cap->info, iommu->nesting_info,
> + iommu->nesting_info->argsz);
> +
> + return 0;
> +}
> +
> static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> unsigned long arg)
> {
> @@ -2581,6 +2661,8 @@ static int vfio_iommu_type1_get_info(struct vfio_iommu *iommu,
> if (!ret)
> ret = vfio_iommu_iova_build_caps(iommu, &caps);
>
> + ret = vfio_iommu_add_nesting_cap(iommu, &caps);
> +
> mutex_unlock(&iommu->lock);
>
> if (ret)
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 9204705..0cf3d6d 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -14,6 +14,7 @@
>
> #include <linux/types.h>
> #include <linux/ioctl.h>
> +#include <linux/iommu.h>
>
> #define VFIO_API_VERSION 0
>
> @@ -1039,6 +1040,24 @@ struct vfio_iommu_type1_info_cap_migration {
> __u64 max_dirty_bitmap_size; /* in bytes */
> };
>
> +/*
> + * The nesting capability allows reporting of the related capability
> + * and info for the nesting iommu type.
> + *
> + * The structures below define version 1 of this capability.
> + *
> + * Userspace that selected the VFIO_TYPE1_NESTING_IOMMU type should
> + * check this capability to get the supported features.
> + *
> + * @info: the nesting info provided by the IOMMU driver.
> + */
> +#define VFIO_IOMMU_TYPE1_INFO_CAP_NESTING 3
> +
> +struct vfio_iommu_type1_info_cap_nesting {
> + struct vfio_info_cap_header header;
> + struct iommu_nesting_info info;
> +};
> +
> #define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>
> /**
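
To round out the userspace side the commit message describes, a rough
sketch (error handling and the container/group setup are omitted; this
simply follows the usual VFIO capability-chain convention) of how the new
capability would be discovered:

#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Sketch: locate VFIO_IOMMU_TYPE1_INFO_CAP_NESTING in the info cap chain. */
static struct vfio_iommu_type1_info_cap_nesting *
find_nesting_cap(int container, struct vfio_iommu_type1_info **infop)
{
        struct vfio_iommu_type1_info *info;
        struct vfio_info_cap_header *hdr;
        __u32 offset;

        /* First call learns the required size, second call fetches the caps. */
        info = calloc(1, sizeof(*info));
        info->argsz = sizeof(*info);
        ioctl(container, VFIO_IOMMU_GET_INFO, info);

        info = realloc(info, info->argsz);
        ioctl(container, VFIO_IOMMU_GET_INFO, info);
        *infop = info;

        if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
                return NULL;

        for (offset = info->cap_offset; offset; offset = hdr->next) {
                hdr = (struct vfio_info_cap_header *)((char *)info + offset);
                if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_NESTING)
                        return (struct vfio_iommu_type1_info_cap_nesting *)hdr;
        }
        return NULL;
}

If the capability is found, its info field carries the vendor-specific
struct iommu_nesting_info that userspace is expected to check before using
the PASID alloc/free, bind page table, and cache invalidation uAPIs.
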

2020-08-21 00:54:02

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v6 04/15] vfio/type1: Report iommu nesting info to userspace

Hi Alex,

> From: Alex Williamson <[email protected]>
> Sent: Friday, August 21, 2020 3:52 AM
>
> On Mon, 27 Jul 2020 23:27:33 -0700
> Liu Yi L <[email protected]> wrote:
>
> > This patch exports iommu nesting capability info to user space through
> > VFIO. Userspace is expected to check this info for the supported uAPIs (e.g.
> > PASID alloc/free, bind page table, and cache invalidation) and for the
> > vendor-specific format information of the first level/stage page table that
> > will be bound.
> >
> > The nesting info is available only after the container is set to the NESTED
> > type. The current implementation imposes one limitation: a nesting container
> > should include at most one iommu group. The philosophy of a vfio container
> > is that all groups/devices within the container share the same IOMMU
> > context. When vSVA is enabled, one IOMMU context could include one 2nd-
> > level address space and multiple 1st-level address spaces. While the
> > 2nd-level address space is reasonably sharable by multiple groups, blindly
> > sharing 1st-level address spaces across all groups within the container
> > might break the guest's expectations. In the future a sub/super container
> > concept might be introduced to allow partial address space sharing within
> > an IOMMU context. For now, let's go with this restriction by requiring a
> > singleton container for using nesting iommu features. The link below has
> > the related discussion about this decision.
> >
> > https://lore.kernel.org/kvm/[email protected]/
> >
> > This patch also changes the NESTING type container behaviour: something
> > that would have succeeded before will now fail. Before this series, if the
> > user asked for VFIO_IOMMU_TYPE1_NESTING, it would have succeeded even if
> > the SMMU didn't support stage-2, as the driver would have silently fallen
> > back on stage-1 mappings (which work exactly the same as stage-2-only
> > since no nesting was supported). After the series, we do check for
> > DOMAIN_ATTR_NESTING, so if the user asks for VFIO_IOMMU_TYPE1_NESTING and
> > the SMMU doesn't support stage-2, the ioctl fails. But this should be a
> > good fix and completely harmless. Details can be found in the link below
> > as well.
> >
> > https://lore.kernel.org/kvm/20200717090900.GC4850@myrica/
> >
> > Cc: Kevin Tian <[email protected]>
> > CC: Jacob Pan <[email protected]>
> > Cc: Alex Williamson <[email protected]>
> > Cc: Eric Auger <[email protected]>
> > Cc: Jean-Philippe Brucker <[email protected]>
> > Cc: Joerg Roedel <[email protected]>
> > Cc: Lu Baolu <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > ---
> > v5 -> v6:
> > *) address comments against v5 from Eric Auger.
> > *) don't report nesting cap to userspace if the nesting_info->format is
> > invalid.
> >
> > v4 -> v5:
> > *) address comments from Eric Auger.
> > *) return struct iommu_nesting_info for VFIO_IOMMU_TYPE1_INFO_CAP_NESTING as
> > cap is much "cheap", if needs extension in future, just define another cap.
> > https://lore.kernel.org/kvm/[email protected]/
> >
> > v3 -> v4:
> > *) address comments against v3.
> >
> > v1 -> v2:
> > *) added in v2
> > ---
> > drivers/vfio/vfio_iommu_type1.c | 106 +++++++++++++++++++++++++++++++++++-----
> > include/uapi/linux/vfio.h | 19 +++++++
> > 2 files changed, 113 insertions(+), 12 deletions(-)
> >
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index 3bd70ff..18ff0c3 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -62,18 +62,20 @@ MODULE_PARM_DESC(dma_entry_limit,
> > "Maximum number of user DMA mappings per container (65535).");
> >
> > struct vfio_iommu {
> > - struct list_head domain_list;
> > - struct list_head iova_list;
> > - struct vfio_domain *external_domain; /* domain for external user */
> > - struct mutex lock;
> > - struct rb_root dma_list;
> > - struct blocking_notifier_head notifier;
> > - unsigned int dma_avail;
> > - uint64_t pgsize_bitmap;
> > - bool v2;
> > - bool nesting;
> > - bool dirty_page_tracking;
> > - bool pinned_page_dirty_scope;
> > + struct list_head domain_list;
> > + struct list_head iova_list;
> > + /* domain for external user */
> > + struct vfio_domain *external_domain;
> > + struct mutex lock;
> > + struct rb_root dma_list;
> > + struct blocking_notifier_head notifier;
> > + unsigned int dma_avail;
> > + uint64_t pgsize_bitmap;
> > + bool v2;
> > + bool nesting;
> > + bool dirty_page_tracking;
> > + bool pinned_page_dirty_scope;
> > + struct iommu_nesting_info *nesting_info;
> > };
> >
> > struct vfio_domain {
> > @@ -130,6 +132,9 @@ struct vfio_regions {
> > #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu) \
> > (!list_empty(&iommu->domain_list))
> >
> > +#define CONTAINER_HAS_DOMAIN(iommu) (((iommu)->external_domain) || \
> > + (!list_empty(&(iommu)->domain_list)))
> > +
> > #define DIRTY_BITMAP_BYTES(n) (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
> >
> > /*
> > @@ -1929,6 +1934,13 @@ static void vfio_iommu_iova_insert_copy(struct vfio_iommu *iommu,
> >
> > list_splice_tail(iova_copy, iova);
> > }
> > +
> > +static void vfio_iommu_release_nesting_info(struct vfio_iommu *iommu)
> > +{
> > + kfree(iommu->nesting_info);
> > + iommu->nesting_info = NULL;
> > +}
> > +
> > static int vfio_iommu_type1_attach_group(void *iommu_data,
> > struct iommu_group *iommu_group)
> > {
> > @@ -1959,6 +1971,12 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > }
> > }
> >
> > + /* Nesting type container can include only one group */
> > + if (iommu->nesting && CONTAINER_HAS_DOMAIN(iommu)) {
> > + mutex_unlock(&iommu->lock);
> > + return -EINVAL;
> > + }
> > +
> > group = kzalloc(sizeof(*group), GFP_KERNEL);
> > domain = kzalloc(sizeof(*domain), GFP_KERNEL);
> > if (!group || !domain) {
> > @@ -2029,6 +2047,32 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> > if (ret)
> > goto out_domain;
>
>
> I think that currently a user can configure a VFIO_TYPE1_NESTING_IOMMU
> IOMMU type with a non-IOMMU backed mdev. Does that make any sort of
> sense or is this a bug? Thanks,

I guess we don't support such a case, as a non-IOMMU-backed mdev is attached
to the external domain. It won't reach the part that allocates a domain and
sets nesting for the domain.

Regards,
Yi Liu
