2023-03-09 08:25:09

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 0/5] Add Intel VT-d nested translation

This is to add Intel VT-d nested translation based on IOMMUFD nesting
infrastructure. As the iommufd nesting infrastructure series[1], iommu
core supports new ops to report iommu hardware information, allocate
domains with user data and sync stage-1 IOTLB. The data required in
the three paths are vendor-specific, so

1) IOMMU_HW_INFO_TYPE_INTEL_VTD and struct iommu_device_info_vtd are
defined to report iommu hardware information for Intel VT-d .
2) IOMMU_HWPT_DATA_VTD_S1 is defined for the Intel VT-d stage-1 page
table, it will be used in the stage-1 domain allocation and IOTLB
syncing path. struct iommu_hwpt_intel_vtd is defined to pass user_data
for the Intel VT-d stage-1 domain allocation.
struct iommu_hwpt_invalidate_intel_vtd is defined to pass the data for
the Intel VT-d stage-1 IOTLB invalidation.

With above IOMMUFD extensions, the intel iommu driver implements the three
callbacks to support nested translation.

Complete code can be found in [2], QEMU could can be found in [3].

base-commit: f01f1c95684dd18c15dd0e51b4fd6e796a0a2c0e

[1] https://lore.kernel.org/linux-iommu/[email protected]/
[2] https://github.com/yiliu1765/iommufd/tree/iommufd_nesting
[3] https://github.com/yiliu1765/qemu/tree/wip/iommufd_rfcv3%2Bnesting

Change log:
v2:
- The iommufd infrastructure is split to be separate series.

v1: https://lore.kernel.org/linux-iommu/[email protected]/

Regards,
Yi Liu

Lu Baolu (4):
iommu/vt-d: Implement hw_info for iommu capability query
iommu/vt-d: Extend dmar_domain to support nested domain
iommu/vt-d: Add helper to setup pasid nested translation
iommu/vt-d: Add nested domain support

Yi Liu (1):
iommufd: Add nesting related data structures for Intel VT-d

drivers/iommu/intel/Makefile | 2 +-
drivers/iommu/intel/iommu.c | 57 ++++++++---
drivers/iommu/intel/iommu.h | 51 ++++++++--
drivers/iommu/intel/nested.c | 143 +++++++++++++++++++++++++++
drivers/iommu/intel/pasid.c | 142 ++++++++++++++++++++++++++
drivers/iommu/intel/pasid.h | 2 +
drivers/iommu/iommufd/hw_pagetable.c | 7 +-
drivers/iommu/iommufd/main.c | 5 +
include/uapi/linux/iommufd.h | 136 +++++++++++++++++++++++++
9 files changed, 522 insertions(+), 23 deletions(-)
create mode 100644 drivers/iommu/intel/nested.c

--
2.34.1



2023-03-09 08:25:09

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain

From: Lu Baolu <[email protected]>

The nested domain fields are exclusive to those that used for a DMA
remapping domain. Use union to avoid memory waste.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
1 file changed, 29 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 9871de2170eb..9446e17dd202 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -599,15 +599,38 @@ struct dmar_domain {
spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */

- struct dma_pte *pgd; /* virtual address */
- int gaw; /* max guest address width */
-
- /* adjusted guest address width, 0 is level 2 30-bit */
- int agaw;
int iommu_superpage;/* Level of superpages supported:
0 == 4KiB (no superpages), 1 == 2MiB,
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
- u64 max_addr; /* maximum mapped address */
+ union {
+ /* DMA remapping domain */
+ struct {
+ /* virtual address */
+ struct dma_pte *pgd;
+ /* max guest address width */
+ int gaw;
+ /*
+ * adjusted guest address width:
+ * 0: level 2 30-bit
+ * 1: level 3 39-bit
+ * 2: level 4 48-bit
+ * 3: level 5 57-bit
+ */
+ int agaw;
+ /* maximum mapped address */
+ u64 max_addr;
+ };
+
+ /* Nested user domain */
+ struct {
+ /* 2-level page table the user domain nested */
+ struct dmar_domain *s2_domain;
+ /* user page table pointer (in GPA) */
+ unsigned long s1_pgtbl;
+ /* page table attributes */
+ struct iommu_hwpt_intel_vtd s1_cfg;
+ };
+ };

struct iommu_domain domain; /* generic domain data structure for
iommu core */
--
2.34.1


2023-03-09 08:25:09

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d

Add the following data structures for corresponding ioctls:
iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE

Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
header and corresponding type/size arrays.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/iommufd/hw_pagetable.c | 7 +-
drivers/iommu/iommufd/main.c | 5 +
include/uapi/linux/iommufd.h | 136 +++++++++++++++++++++++++++
3 files changed, 147 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommufd/hw_pagetable.c b/drivers/iommu/iommufd/hw_pagetable.c
index 76ff228dfc1f..36e79db8a17d 100644
--- a/drivers/iommu/iommufd/hw_pagetable.c
+++ b/drivers/iommu/iommufd/hw_pagetable.c
@@ -172,6 +172,7 @@ iommufd_hw_pagetable_alloc(struct iommufd_ctx *ictx, struct iommufd_ioas *ioas,
*/
static const size_t iommufd_hwpt_alloc_data_size[] = {
[IOMMU_HWPT_TYPE_DEFAULT] = 0,
+ [IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_intel_vtd),
};

/*
@@ -180,6 +181,8 @@ static const size_t iommufd_hwpt_alloc_data_size[] = {
*/
const u64 iommufd_hwpt_type_bitmaps[] = {
[IOMMU_HW_INFO_TYPE_DEFAULT] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT),
+ [IOMMU_HW_INFO_TYPE_INTEL_VTD] = BIT_ULL(IOMMU_HWPT_TYPE_DEFAULT) |
+ BIT_ULL(IOMMU_HWPT_TYPE_VTD_S1),
};

/* Return true if type is supported, otherwise false */
@@ -324,7 +327,9 @@ int iommufd_hwpt_alloc(struct iommufd_ucmd *ucmd)
* size of page table type specific invalidate_info, indexed by
* enum iommu_hwpt_type.
*/
-static const size_t iommufd_hwpt_invalidate_info_size[] = {};
+static const size_t iommufd_hwpt_invalidate_info_size[] = {
+ [IOMMU_HWPT_TYPE_VTD_S1] = sizeof(struct iommu_hwpt_invalidate_intel_vtd),
+};

int iommufd_hwpt_invalidate(struct iommufd_ucmd *ucmd)
{
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 7ec3ceac01b3..514db4c26927 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -275,6 +275,11 @@ union ucmd_buffer {
#ifdef CONFIG_IOMMUFD_TEST
struct iommu_test_cmd test;
#endif
+ /*
+ * data_type specific structure used in the cache invalidation
+ * path.
+ */
+ struct iommu_hwpt_invalidate_intel_vtd vtd;
};

struct iommufd_ioctl_op {
diff --git a/include/uapi/linux/iommufd.h b/include/uapi/linux/iommufd.h
index e2eff9c56ab3..2a6c326391b2 100644
--- a/include/uapi/linux/iommufd.h
+++ b/include/uapi/linux/iommufd.h
@@ -351,9 +351,64 @@ struct iommu_vfio_ioas {
/**
* enum iommu_hwpt_type - IOMMU HWPT Type
* @IOMMU_HWPT_TYPE_DEFAULT: default
+ * @IOMMU_HWPT_TYPE_VTD_S1: Intel VT-d stage-1 page table
*/
enum iommu_hwpt_type {
IOMMU_HWPT_TYPE_DEFAULT,
+ IOMMU_HWPT_TYPE_VTD_S1,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_flags - Intel VT-d stage-1 page
+ * table entry attributes
+ * @IOMMU_VTD_PGTBL_SRE: Supervisor request
+ * @IOMMU_VTD_PGTBL_EAFE: Extended access enable
+ * @IOMMU_VTD_PGTBL_PCD: Page-level cache disable
+ * @IOMMU_VTD_PGTBL_PWT: Page-level write through
+ * @IOMMU_VTD_PGTBL_EMTE: Extended mem type enable
+ * @IOMMU_VTD_PGTBL_CD: PASID-level cache disable
+ * @IOMMU_VTD_PGTBL_WPE: Write protect enable
+ */
+enum iommu_hwpt_intel_vtd_flags {
+ IOMMU_VTD_PGTBL_SRE = 1 << 0,
+ IOMMU_VTD_PGTBL_EAFE = 1 << 1,
+ IOMMU_VTD_PGTBL_PCD = 1 << 2,
+ IOMMU_VTD_PGTBL_PWT = 1 << 3,
+ IOMMU_VTD_PGTBL_EMTE = 1 << 4,
+ IOMMU_VTD_PGTBL_CD = 1 << 5,
+ IOMMU_VTD_PGTBL_WPE = 1 << 6,
+ IOMMU_VTD_PGTBL_LAST = 1 << 7,
+};
+
+/**
+ * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
+ * stage-1 page table info
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
+ * @pgtbl_addr: The base address of the user-managed stage-1 page table.
+ * @pat: Page attribute table data to compute effective memory type
+ * @emt: Extended memory type
+ * @addr_width: The address width of the untranslated addresses that are
+ * subjected to the user-managed stage-1 page table.
+ * @__reserved: Must be 0
+ *
+ * The Intel VT-d specific data for creating hw_pagetable to represent
+ * the user-managed stage-1 page table that is used in nested translation.
+ *
+ * In nested translation, the stage-1 page table locates in the address
+ * space that defined by the corresponding stage-2 page table. Hence the
+ * stage-1 page table base address value should not be higher than the
+ * maximum untranslated address of stage-2 page table.
+ *
+ * The paging level of the stage-1 page table should be compataible with
+ * the hardware iommu. Otherwise, the allocation would be failed.
+ */
+struct iommu_hwpt_intel_vtd {
+ __u64 flags;
+ __u64 pgtbl_addr;
+ __u32 pat;
+ __u32 emt;
+ __u32 addr_width;
+ __u32 __reserved;
};

/**
@@ -389,6 +444,8 @@ enum iommu_hwpt_type {
* +------------------------------+-------------------------------------+-----------+
* | IOMMU_HWPT_TYPE_DEFAULT | N/A | IOAS |
* +------------------------------+-------------------------------------+-----------+
+ * | IOMMU_HWPT_TYPE_VTD_S1 | struct iommu_hwpt_intel_vtd | HWPT |
+ * +------------------------------+-------------------------------------+-----------+
*/
struct iommu_hwpt_alloc {
__u32 size;
@@ -405,9 +462,32 @@ struct iommu_hwpt_alloc {

/**
* enum iommu_hw_info_type - IOMMU Hardware Info Types
+ * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
*/
enum iommu_hw_info_type {
IOMMU_HW_INFO_TYPE_DEFAULT,
+ IOMMU_HW_INFO_TYPE_INTEL_VTD,
+};
+
+/**
+ * struct iommu_hw_info_vtd - Intel VT-d hardware information
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ *
+ * @cap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ * section 11.4.2 Capability Register.
+ * @ecap_reg: Value of Intel VT-d capability register defined in VT-d spec
+ * section 11.4.3 Extended Capability Register.
+ *
+ * User needs to understand the Intel VT-d specification to decode the
+ * register value.
+ */
+struct iommu_hw_info_vtd {
+ __u32 flags;
+ __u32 __reserved;
+ __aligned_u64 cap_reg;
+ __aligned_u64 ecap_reg;
};

/**
@@ -457,6 +537,60 @@ struct iommu_hw_info {
};
#define IOMMU_DEVICE_GET_HW_INFO _IO(IOMMUFD_TYPE, IOMMUFD_CMD_DEVICE_GET_HW_INFO)

+/**
+ * enum iommu_vtd_qi_granularity - Intel VT-d specific granularity of
+ * queued invalidation
+ * @IOMMU_VTD_QI_GRAN_DOMAIN: domain-selective invalidation
+ * @IOMMU_VTD_QI_GRAN_ADDR: page-selective invalidation
+ */
+enum iommu_vtd_qi_granularity {
+ IOMMU_VTD_QI_GRAN_DOMAIN,
+ IOMMU_VTD_QI_GRAN_ADDR,
+};
+
+/**
+ * enum iommu_hwpt_intel_vtd_invalidate_flags - Flags for Intel VT-d
+ * stage-1 page table cache
+ * invalidation
+ * @IOMMU_VTD_QI_FLAGS_LEAF: The LEAF flag indicates whether only the
+ * leaf PTE caching needs to be invalidated
+ * and other paging structure caches can be
+ * preserved.
+ */
+enum iommu_hwpt_intel_vtd_invalidate_flags {
+ IOMMU_VTD_QI_FLAGS_LEAF = 1 << 0,
+};
+
+/**
+ * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
+ * @granularity: One of enum iommu_vtd_qi_granularity.
+ * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
+ * @__reserved: Must be 0
+ * @addr: The start address of the addresses to be invalidated.
+ * @granule_size: Page/block size of the mapping in bytes. It is used to
+ * compute the invalidation range togehter with @nb_granules.
+ * @nb_granules: Number of contiguous granules to be invalidated.
+ *
+ * The Intel VT-d specific invalidation data for user-managed stage-1 cache
+ * invalidation under nested translation. Userspace uses this structure to
+ * tell host about the impacted caches after modifying the stage-1 page table.
+ *
+ * @addr, @granule_size and @nb_granules are meaningful when
+ * @granularity==IOMMU_VTD_QI_GRAN_ADDR. Intel VT-d currently only supports
+ * 4kB page size, so @granule_size should be 4KB. @addr should be aligned
+ * with @granule_size * @nb_granules, otherwise invalidation won't take
+ * effect.
+ */
+struct iommu_hwpt_invalidate_intel_vtd {
+ __u8 granularity;
+ __u8 padding[7];
+ __u32 flags;
+ __u32 __reserved;
+ __u64 addr;
+ __u64 granule_size;
+ __u64 nb_granules;
+};
+
/**
* struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
* @size: sizeof(struct iommu_hwpt_invalidate)
@@ -473,6 +607,8 @@ struct iommu_hw_info {
* +==============================+========================================+
* | @data_type | Data structure in @data_uptr |
* +------------------------------+----------------------------------------+
+ * | IOMMU_HWPT_TYPE_VTD_S1 | struct iommu_hwpt_invalidate_intel_vtd |
+ * +------------------------------+----------------------------------------+
*/
struct iommu_hwpt_invalidate {
__u32 size;
--
2.34.1


2023-03-09 08:25:09

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 4/5] iommu/vt-d: Add helper to setup pasid nested translation

From: Lu Baolu <[email protected]>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. The EINVAL error will be
returned if any of configurations are not compatible with the hardware
capabilities. The caller can retry with another compatible user domain.
The encoding of fields of each pasid entry is defined in section 9.6 of
the VT-d spec.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/pasid.c | 142 ++++++++++++++++++++++++++++++++++++
drivers/iommu/intel/pasid.h | 2 +
2 files changed, 144 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 633e0a4a01e7..2e8a912b1499 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -21,6 +21,11 @@
#include "iommu.h"
#include "pasid.h"

+#define IOMMU_VTD_PGTBL_MTS_MASK (IOMMU_VTD_PGTBL_CD | \
+ IOMMU_VTD_PGTBL_EMTE | \
+ IOMMU_VTD_PGTBL_PCD | \
+ IOMMU_VTD_PGTBL_PWT)
+
/*
* Intel IOMMU system wide PASID name space:
*/
@@ -411,6 +416,15 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
}

+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
static void
pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
u16 did, u32 pasid)
@@ -756,3 +770,131 @@ void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
if (!cap_caching_mode(iommu->cap))
devtlb_invalidation_with_pasid(iommu, dev, pasid);
}
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for guest shared virtual address. In this case, the
+ * first level page tables are used for GVA-GPA translation in the guest,
+ * second level page tables are used for GPA-HPA translation.
+ *
+ * @iommu: IOMMU which the device belong to
+ * @dev: Device to be set up for translation
+ * @pasid: PASID to be programmed in the device PASID table
+ * @domain: User domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+ u32 pasid, struct dmar_domain *domain)
+{
+ struct iommu_hwpt_intel_vtd *s1_cfg = &domain->s1_cfg;
+ pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+ struct dmar_domain *s2_domain = domain->s2_domain;
+ u16 did = domain_id_iommu(domain, iommu);
+ struct dma_pte *pgd = s2_domain->pgd;
+ struct pasid_entry *pte;
+ int agaw;
+
+ if (!ecap_nest(iommu->ecap)) {
+ pr_err_ratelimited("%s: No nested translation support\n",
+ iommu->name);
+ return -ENODEV;
+ }
+
+ /*
+ * Sanity checking performed by caller to make sure address width
+ * matching in two dimensions: CPU vs. IOMMU, guest vs. host.
+ */
+ switch (s1_cfg->addr_width) {
+ case ADDR_WIDTH_4LEVEL:
+ break;
+#ifdef CONFIG_X86
+ case ADDR_WIDTH_5LEVEL:
+ if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+ !cap_fl5lp_support(iommu->cap)) {
+ dev_err_ratelimited(dev,
+ "5-level paging not supported\n");
+ return -EINVAL;
+ }
+ break;
+#endif
+ default:
+ dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+ s1_cfg->addr_width);
+ return -EINVAL;
+ }
+
+ if ((s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) && !ecap_srs(iommu->ecap)) {
+ pr_err_ratelimited("No supervisor request support on %s\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ if ((s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE) && !ecap_eafs(iommu->ecap)) {
+ pr_err_ratelimited("No extended access flag support on %s\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ /*
+ * Memory type is only applicable to devices inside processor coherent
+ * domain. Will add MTS support once coherent devices are available.
+ */
+ if (s1_cfg->flags & IOMMU_VTD_PGTBL_MTS_MASK) {
+ pr_warn_ratelimited("No memory type support %s\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ agaw = iommu_skip_agaw(s2_domain, iommu, &pgd);
+ if (agaw < 0) {
+ dev_err_ratelimited(dev, "Invalid domain page table\n");
+ return -EINVAL;
+ }
+
+ /* First level PGD (in GPA) must be supported by the second level. */
+ if ((uintptr_t)s1_gpgd > (1ULL << s2_domain->gaw)) {
+ dev_err_ratelimited(dev,
+ "Guest PGD %lx not supported, max %llx\n",
+ (uintptr_t)s1_gpgd, s2_domain->max_addr);
+ return -EINVAL;
+ }
+
+ spin_lock(&iommu->lock);
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (!pte) {
+ spin_unlock(&iommu->lock);
+ return -ENODEV;
+ }
+ if (pasid_pte_is_present(pte)) {
+ spin_unlock(&iommu->lock);
+ return -EBUSY;
+ }
+
+ pasid_clear_entry(pte);
+
+ if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+ pasid_set_flpm(pte, 1);
+
+ pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+ if (s1_cfg->flags & IOMMU_VTD_PGTBL_SRE) {
+ pasid_set_sre(pte);
+ if (s1_cfg->flags & IOMMU_VTD_PGTBL_WPE)
+ pasid_set_wpe(pte);
+ }
+
+ if (s1_cfg->flags & IOMMU_VTD_PGTBL_EAFE)
+ pasid_set_eafe(pte);
+
+ pasid_set_slptr(pte, virt_to_phys(pgd));
+ pasid_set_fault_enable(pte);
+ pasid_set_domain_id(pte, did);
+ pasid_set_address_width(pte, agaw);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+ pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+ pasid_set_present(pte);
+ spin_unlock(&iommu->lock);
+
+ pasid_flush_caches(iommu, pte, pasid, did);
+
+ return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 20c54e50f533..2a72bbc79532 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -118,6 +118,8 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+ u32 pasid, struct dmar_domain *domain);
void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
struct device *dev, u32 pasid,
bool fault_ignore);
--
2.34.1


2023-03-09 08:25:40

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 2/5] iommu/vt-d: Implement hw_info for iommu capability query

From: Lu Baolu <[email protected]>

Add intel_iommu_hw_info() to report cap_reg and ecap_reg information.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Sun <[email protected]>
---
drivers/iommu/intel/iommu.c | 19 +++++++++++++++++++
drivers/iommu/intel/iommu.h | 1 +
2 files changed, 20 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7c2f4bd33582..004406209793 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4780,8 +4780,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
intel_pasid_tear_down_entry(iommu, dev, pasid, false);
}

+static void *intel_iommu_hw_info(struct device *dev, u32 *length)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct iommu_hw_info_vtd *vtd;
+
+ vtd = kzalloc(sizeof(*vtd), GFP_KERNEL);
+ if (!vtd)
+ return ERR_PTR(-ENOMEM);
+
+ vtd->cap_reg = iommu->cap;
+ vtd->ecap_reg = iommu->ecap;
+ *length = sizeof(*vtd);
+
+ return vtd;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
+ .hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
.probe_device = intel_iommu_probe_device,
.probe_finalize = intel_iommu_probe_finalize,
@@ -4794,6 +4812,7 @@ const struct iommu_ops intel_iommu_ops = {
.def_domain_type = device_def_domain_type,
.remove_dev_pasid = intel_iommu_remove_dev_pasid,
.pgsize_bitmap = SZ_4K,
+ .driver_type = IOMMU_HW_INFO_TYPE_INTEL_VTD,
#ifdef CONFIG_INTEL_IOMMU_SVM
.page_response = intel_svm_page_response,
#endif
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index d6df3b865812..9871de2170eb 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -23,6 +23,7 @@
#include <linux/bitfield.h>
#include <linux/xarray.h>
#include <linux/perf_event.h>
+#include <uapi/linux/iommufd.h>

#include <asm/cacheflush.h>
#include <asm/iommu.h>
--
2.34.1


2023-03-09 08:25:40

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 5/5] iommu/vt-d: Add nested domain support

From: Lu Baolu <[email protected]>

This adds nested domain support in the Intel IOMMU driver. It allows to
allocate and free a nested domain, set the nested domain to a device,
and synchronize the caches when the userspace managed page tables are
updated.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/Makefile | 2 +-
drivers/iommu/intel/iommu.c | 38 ++++++----
drivers/iommu/intel/iommu.h | 15 ++++
drivers/iommu/intel/nested.c | 143 +++++++++++++++++++++++++++++++++++
4 files changed, 182 insertions(+), 16 deletions(-)
create mode 100644 drivers/iommu/intel/nested.c

diff --git a/drivers/iommu/intel/Makefile b/drivers/iommu/intel/Makefile
index 7af3b8a4f2a0..5dabf081a779 100644
--- a/drivers/iommu/intel/Makefile
+++ b/drivers/iommu/intel/Makefile
@@ -1,6 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
obj-$(CONFIG_DMAR_TABLE) += dmar.o
-obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o
+obj-$(CONFIG_INTEL_IOMMU) += iommu.o pasid.o nested.o
obj-$(CONFIG_DMAR_TABLE) += trace.o cap_audit.o
obj-$(CONFIG_DMAR_PERF) += perf.o
obj-$(CONFIG_INTEL_IOMMU_DEBUGFS) += debugfs.o
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 004406209793..89cc7095afed 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -277,7 +277,6 @@ static LIST_HEAD(dmar_satc_units);
#define for_each_rmrr_units(rmrr) \
list_for_each_entry(rmrr, &dmar_rmrr_units, list)

-static void device_block_translation(struct device *dev);
static void intel_iommu_domain_free(struct iommu_domain *domain);

int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -555,7 +554,7 @@ static unsigned long domain_super_pgsize_bitmap(struct dmar_domain *domain)
}

/* Some capabilities may be different across iommus */
-static void domain_update_iommu_cap(struct dmar_domain *domain)
+void domain_update_iommu_cap(struct dmar_domain *domain)
{
domain_update_iommu_coherency(domain);
domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
@@ -1498,10 +1497,10 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
spin_unlock_irqrestore(&domain->lock, flags);
}

-static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
- struct dmar_domain *domain,
- unsigned long pfn, unsigned int pages,
- int ih, int map)
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ unsigned long pfn, unsigned int pages,
+ int ih, int map)
{
unsigned int aligned_pages = __roundup_pow_of_two(pages);
unsigned int mask = ilog2(aligned_pages);
@@ -1573,7 +1572,7 @@ static inline void __mapping_notify_one(struct intel_iommu *iommu,
iommu_flush_write_buffer(iommu);
}

-static void intel_flush_iotlb_all(struct iommu_domain *domain)
+void intel_flush_iotlb_all(struct iommu_domain *domain)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct iommu_domain_info *info;
@@ -1765,8 +1764,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
return domain;
}

-static int domain_attach_iommu(struct dmar_domain *domain,
- struct intel_iommu *iommu)
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
{
struct iommu_domain_info *info, *curr;
unsigned long ndomains;
@@ -1815,8 +1813,7 @@ static int domain_attach_iommu(struct dmar_domain *domain,
return ret;
}

-static void domain_detach_iommu(struct dmar_domain *domain,
- struct intel_iommu *iommu)
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu)
{
struct iommu_domain_info *info;

@@ -4103,7 +4100,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
* all DMA requests without PASID from the device are blocked. If the page
* table has been set, clean up the data structures.
*/
-static void device_block_translation(struct device *dev)
+void device_block_translation(struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
struct intel_iommu *iommu = info->iommu;
@@ -4204,14 +4201,24 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return NULL;
}

+static struct iommu_domain *
+intel_iommu_domain_alloc_user(struct device *dev, struct iommu_domain *parent,
+ const void *user_data)
+{
+ if (parent)
+ return intel_nested_domain_alloc(parent, user_data);
+ else
+ return iommu_domain_alloc(dev->bus);
+}
+
static void intel_iommu_domain_free(struct iommu_domain *domain)
{
if (domain != &si_domain->domain && domain != &blocking_domain)
domain_exit(to_dmar_domain(domain));
}

-static int prepare_domain_attach_device(struct iommu_domain *domain,
- struct device *dev)
+int prepare_domain_attach_device(struct iommu_domain *domain,
+ struct device *dev)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
struct intel_iommu *iommu;
@@ -4450,7 +4457,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)
PASID_RID2PASID);
}

-static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
{
struct dmar_domain *dmar_domain = to_dmar_domain(domain);
unsigned long flags;
@@ -4801,6 +4808,7 @@ const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.hw_info = intel_iommu_hw_info,
.domain_alloc = intel_iommu_domain_alloc,
+ .domain_alloc_user = intel_iommu_domain_alloc_user,
.probe_device = intel_iommu_probe_device,
.probe_finalize = intel_iommu_probe_finalize,
.release_device = intel_iommu_release_device,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 9446e17dd202..661fb5050e2d 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -848,6 +848,19 @@ void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,

int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
unsigned int count, unsigned long options);
+int domain_attach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void domain_detach_iommu(struct dmar_domain *domain, struct intel_iommu *iommu);
+void device_block_translation(struct device *dev);
+int prepare_domain_attach_device(struct iommu_domain *domain,
+ struct device *dev);
+bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain);
+void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ unsigned long pfn, unsigned int pages,
+ int ih, int map);
+void intel_flush_iotlb_all(struct iommu_domain *domain);
+void domain_update_iommu_cap(struct dmar_domain *domain);
+
/*
* Options used in qi_submit_sync:
* QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8.
@@ -860,6 +873,8 @@ void *alloc_pgtable_page(int node, gfp_t gfp);
void free_pgtable_page(void *vaddr);
void iommu_flush_write_buffer(struct intel_iommu *iommu);
struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+ const void *data);

#ifdef CONFIG_INTEL_IOMMU_SVM
extern void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/nested.c b/drivers/iommu/intel/nested.c
new file mode 100644
index 000000000000..64441d11a84d
--- /dev/null
+++ b/drivers/iommu/intel/nested.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * nested.c - nested mode translation support
+ *
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * Author: Lu Baolu <[email protected]>
+ * Jacob Pan <[email protected]>
+ */
+
+#define pr_fmt(fmt) "DMAR: " fmt
+
+#include <linux/iommu.h>
+#include <linux/pci.h>
+#include <linux/pci-ats.h>
+
+#include "iommu.h"
+#include "pasid.h"
+
+static int intel_nested_attach_dev(struct iommu_domain *domain,
+ struct device *dev)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;
+ int ret = 0;
+
+ if (info->domain)
+ device_block_translation(dev);
+
+ /* Is s2_domain compatible with this IOMMU? */
+ ret = prepare_domain_attach_device(&dmar_domain->s2_domain->domain, dev);
+ if (ret) {
+ dev_err_ratelimited(dev, "s2 domain is not compatible\n");
+ return ret;
+ }
+
+ ret = domain_attach_iommu(dmar_domain, iommu);
+ if (ret) {
+ dev_err_ratelimited(dev, "Failed to attach domain to iommu\n");
+ return ret;
+ }
+
+ ret = intel_pasid_setup_nested(iommu, dev,
+ PASID_RID2PASID, dmar_domain);
+ if (ret) {
+ domain_detach_iommu(dmar_domain, iommu);
+ dev_err_ratelimited(dev, "Failed to setup pasid entry\n");
+ return ret;
+ }
+
+ info->domain = dmar_domain;
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_add(&info->link, &dmar_domain->devices);
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ domain_update_iommu_cap(dmar_domain);
+
+ return 0;
+}
+
+static void intel_nested_domain_free(struct iommu_domain *domain)
+{
+ kfree(to_dmar_domain(domain));
+}
+
+static void intel_nested_invalidate(struct device *dev,
+ struct dmar_domain *domain,
+ void *user_data)
+{
+ struct iommu_hwpt_invalidate_intel_vtd *inv_info = user_data;
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+
+ if (WARN_ON(!user_data))
+ return;
+
+ switch (inv_info->granularity) {
+ case IOMMU_VTD_QI_GRAN_ADDR:
+ if (inv_info->granule_size != VTD_PAGE_SIZE ||
+ !IS_ALIGNED(inv_info->addr, VTD_PAGE_SIZE)) {
+ dev_err_ratelimited(dev, "Invalid invalidation address 0x%llx\n",
+ inv_info->addr);
+ return;
+ }
+
+ iommu_flush_iotlb_psi(iommu, domain,
+ inv_info->addr >> VTD_PAGE_SHIFT,
+ inv_info->nb_granules, 1, 0);
+ break;
+ case IOMMU_VTD_QI_GRAN_DOMAIN:
+ intel_flush_iotlb_all(&domain->domain);
+ break;
+ default:
+ dev_err_ratelimited(dev, "Unsupported IOMMU invalidation type %d\n",
+ inv_info->granularity);
+ break;
+ }
+}
+
+static void intel_nested_cache_invalidate_user(struct iommu_domain *domain,
+ void *user_data)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_domain_info *info;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_for_each_entry(info, &dmar_domain->devices, link)
+ intel_nested_invalidate(info->dev, dmar_domain,
+ user_data);
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+}
+
+static const struct iommu_domain_ops intel_nested_domain_ops = {
+ .attach_dev = intel_nested_attach_dev,
+ .cache_invalidate_user = intel_nested_cache_invalidate_user,
+ .free = intel_nested_domain_free,
+ .enforce_cache_coherency = intel_iommu_enforce_cache_coherency,
+};
+
+struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
+ const void *user_data)
+{
+ const struct iommu_hwpt_intel_vtd *vtd = user_data;
+ struct dmar_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
+ if (!domain)
+ return NULL;
+
+ domain->use_first_level = true;
+ domain->s2_domain = to_dmar_domain(s2_domain);
+ domain->s1_pgtbl = vtd->pgtbl_addr;
+ domain->s1_cfg = *vtd;
+ domain->domain.ops = &intel_nested_domain_ops;
+ domain->domain.type = IOMMU_DOMAIN_NESTED;
+ INIT_LIST_HEAD(&domain->devices);
+ spin_lock_init(&domain->lock);
+ xa_init(&domain->iommu_array);
+
+ return &domain->domain;
+}
--
2.34.1


2023-03-13 13:51:55

by Jingqi Liu

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] iommu/vt-d: Add nested domain support


On 3/9/2023 4:22 PM, Yi Liu wrote:
> From: Lu Baolu <[email protected]>
>
> This adds nested domain support in the Intel IOMMU driver. It allows to
> allocate and free a nested domain, set the nested domain to a device,
> and synchronize the caches when the userspace managed page tables are
> updated.
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Nicolin Chen <[email protected]>
> Signed-off-by: Yi Liu <[email protected]>
> ---
> drivers/iommu/intel/Makefile | 2 +-
> drivers/iommu/intel/iommu.c | 38 ++++++----
> drivers/iommu/intel/iommu.h | 15 ++++
> drivers/iommu/intel/nested.c | 143 +++++++++++++++++++++++++++++++++++
> 4 files changed, 182 insertions(+), 16 deletions(-)
> create mode 100644 drivers/iommu/intel/nested.c
[...]

> +
> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
> + const void *user_data)
> +{
> + const struct iommu_hwpt_intel_vtd *vtd = user_data;
> + struct dmar_domain *domain;
> +

Would it be better to add the following checkĀ ?

if (WARN_ON(!user_data))
return NULL;

> + domain = kzalloc(sizeof(*domain), GFP_KERNEL_ACCOUNT);
> + if (!domain)
> + return NULL;
> +
> + domain->use_first_level = true;
> + domain->s2_domain = to_dmar_domain(s2_domain);
> + domain->s1_pgtbl = vtd->pgtbl_addr;
> + domain->s1_cfg = *vtd;
> + domain->domain.ops = &intel_nested_domain_ops;
> + domain->domain.type = IOMMU_DOMAIN_NESTED;
> + INIT_LIST_HEAD(&domain->devices);
> + spin_lock_init(&domain->lock);
> + xa_init(&domain->iommu_array);
> +
> + return &domain->domain;
> +}

Thanks,

Jingqi



2023-03-14 01:24:56

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] iommu/vt-d: Add nested domain support

On 3/13/23 5:11 PM, Liu, Jingqi wrote:
>> +struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *s2_domain,
>> + const void *user_data)
>> +{
>> + const struct iommu_hwpt_intel_vtd *vtd = user_data;
>> + struct dmar_domain *domain;
>> +
>
> Would it be better to add the following checkĀ ?
>
> if (WARN_ON(!user_data))
> return NULL;

The iommufd has already done the sanity check. Considering that this
callback is only for iommufd purpose, the individual driver has no need
to check it.

Best regards,
baolu

2023-03-15 13:51:09

by Jingqi Liu

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d


On 3/9/2023 4:22 PM, Yi Liu wrote:
> Add the following data structures for corresponding ioctls:
> iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
> iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
> iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
>
> Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and IOMMU_HWPT_TYPE_VTD_S1 to the
> header and corresponding type/size arrays.
>
> Signed-off-by: Yi Liu <[email protected]>
> ---
> drivers/iommu/iommufd/hw_pagetable.c | 7 +-
> drivers/iommu/iommufd/main.c | 5 +
> include/uapi/linux/iommufd.h | 136 +++++++++++++++++++++++++++
> 3 files changed, 147 insertions(+), 1 deletion(-)
>
[...]

> +
> +/**
> + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> + * stage-1 page table info
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> + * @pgtbl_addr: The base address of the user-managed stage-1 page table.
> + * @pat: Page attribute table data to compute effective memory type
> + * @emt: Extended memory type
> + * @addr_width: The address width of the untranslated addresses that are
> + * subjected to the user-managed stage-1 page table.
> + * @__reserved: Must be 0
> + *
> + * The Intel VT-d specific data for creating hw_pagetable to represent
> + * the user-managed stage-1 page table that is used in nested translation.
> + *
> + * In nested translation, the stage-1 page table locates in the address
> + * space that defined by the corresponding stage-2 page table. Hence the
> + * stage-1 page table base address value should not be higher than the
> + * maximum untranslated address of stage-2 page table.
> + *
> + * The paging level of the stage-1 page table should be compataible with

s/compataible/compatible

> + * the hardware iommu. Otherwise, the allocation would be failed.
> + */
> +struct iommu_hwpt_intel_vtd {
> + __u64 flags;
> + __u64 pgtbl_addr;
> + __u32 pat;
> + __u32 emt;
> + __u32 addr_width;
> + __u32 __reserved;
> };
>

[...]

> +
> +/**
> + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache invalidation info
> + * @granularity: One of enum iommu_vtd_qi_granularity.
> + * @flags: Combination of enum iommu_hwpt_intel_vtd_invalidate_flags
> + * @__reserved: Must be 0
> + * @addr: The start address of the addresses to be invalidated.
> + * @granule_size: Page/block size of the mapping in bytes. It is used to
> + * compute the invalidation range togehter with @nb_granules.

s/togehter/together

Thanks,
Jingqi


2023-03-20 13:50:17

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d

On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:

> +struct iommu_hwpt_invalidate_intel_vtd {
> + __u8 granularity;
> + __u8 padding[7];
> + __u32 flags;
> + __u32 __reserved;
> + __u64 addr;
> + __u64 granule_size;
> + __u64 nb_granules;
> +};

Is there a reason this has such a weird layout? Put the granularity in
the __reserved slot?

Consider the discussion on ARM if you prefer to use the native HW
command structure instead?

Jason

2023-03-20 13:54:31

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain

On Thu, Mar 09, 2023 at 12:22:05AM -0800, Yi Liu wrote:
> From: Lu Baolu <[email protected]>
>
> The nested domain fields are exclusive to those that used for a DMA
> remapping domain. Use union to avoid memory waste.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Yi Liu <[email protected]>
> ---
> drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
> 1 file changed, 29 insertions(+), 6 deletions(-)

Using unions like this often devolves into a mess.

You'd be better to have more structures

struct intel_iommu_domain {
struct iommu_domain domain;
[general fields about attachment]
};

struct intel_iopte_domain {
struct intel_iommu_domain domain;
[stuff describing the io page table data, pgd, format, etc]
};

strut intel_s1_domain {
struct intel_iommu_domain domain;
struct dmar_domain *s2_domain;
/* user page table pointer (in GPA) */
unsigned long s1_pgtbl;
/* page table attributes */
struct iommu_hwpt_intel_vtd s1_cfg;
};
static_assert(offset_of(struct intel_s1_domain, domain.domain) ==
offset_of(struct intel_iommu_domain, domain));

The per-domain ops allow to make this work sensibly

Jason

2023-03-21 01:59:46

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v2 3/5] iommu/vt-d: Extend dmar_domain to support nested domain

On 3/20/23 9:54 PM, Jason Gunthorpe wrote:
> On Thu, Mar 09, 2023 at 12:22:05AM -0800, Yi Liu wrote:
>> From: Lu Baolu <[email protected]>
>>
>> The nested domain fields are exclusive to those that used for a DMA
>> remapping domain. Use union to avoid memory waste.
>>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Signed-off-by: Yi Liu <[email protected]>
>> ---
>> drivers/iommu/intel/iommu.h | 35 +++++++++++++++++++++++++++++------
>> 1 file changed, 29 insertions(+), 6 deletions(-)
>
> Using unions like this often devolves into a mess.
>
> You'd be better to have more structures
>
> struct intel_iommu_domain {
> struct iommu_domain domain;
> [general fields about attachment]
> };
>
> struct intel_iopte_domain {
> struct intel_iommu_domain domain;
> [stuff describing the io page table data, pgd, format, etc]
> };
>
> strut intel_s1_domain {
> struct intel_iommu_domain domain;
> struct dmar_domain *s2_domain;
> /* user page table pointer (in GPA) */
> unsigned long s1_pgtbl;
> /* page table attributes */
> struct iommu_hwpt_intel_vtd s1_cfg;
> };
> static_assert(offset_of(struct intel_s1_domain, domain.domain) ==
> offset_of(struct intel_iommu_domain, domain));
>
> The per-domain ops allow to make this work sensibly

Yes. This will make the data structures clearer.

However, this will lead to significant code changes. I think it would be
more appropriate to put it in a separate refactoring series later.

Best regards,
baolu

2023-03-21 06:07:37

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d

> From: Jason Gunthorpe <[email protected]>
> Sent: Monday, March 20, 2023 9:50 PM
>
> On Thu, Mar 09, 2023 at 12:22:03AM -0800, Yi Liu wrote:
>
> > +struct iommu_hwpt_invalidate_intel_vtd {
> > + __u8 granularity;
> > + __u8 padding[7];
> > + __u32 flags;
> > + __u32 __reserved;
> > + __u64 addr;
> > + __u64 granule_size;
> > + __u64 nb_granules;
> > +};
>
> Is there a reason this has such a weird layout? Put the granularity in
> the __reserved slot?

No special reason. This layout was from the previous merged version.
Will modify it as you suggested.

> Consider the discussion on ARM if you prefer to use the native HW
> command structure instead?

Yes, will think about it. at least granule_size and nb_granules are not
necessary. They was added in the previous abstracted invalidation
uapi structure.

Regards,
Yi Liu

2023-03-21 06:31:58

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 1/5] iommufd: Add nesting related data structures for Intel VT-d

> From: Liu, Jingqi <[email protected]>
> Sent: Wednesday, March 15, 2023 9:51 PM
>
>
> On 3/9/2023 4:22 PM, Yi Liu wrote:
> > Add the following data structures for corresponding ioctls:
> > iommu_hwpt_intel_vtd => IOMMU_HWPT_ALLOC
> > iommu_hw_info_vtd => IOMMU_DEVICE_GET_HW_INFO
> > iommu_hwpt_invalidate_intel_vtd => IOMMU_HWPT_INVALIDATE
> >
> > Also, add IOMMU_HW_INFO_TYPE_INTEL_VTD and
> IOMMU_HWPT_TYPE_VTD_S1 to the
> > header and corresponding type/size arrays.
> >
> > Signed-off-by: Yi Liu <[email protected]>
> > ---
> > drivers/iommu/iommufd/hw_pagetable.c | 7 +-
> > drivers/iommu/iommufd/main.c | 5 +
> > include/uapi/linux/iommufd.h | 136
> +++++++++++++++++++++++++++
> > 3 files changed, 147 insertions(+), 1 deletion(-)
> >
> [...]
>
> > +
> > +/**
> > + * struct iommu_hwpt_intel_vtd - Intel VT-d specific user-managed
> > + * stage-1 page table info
> > + * @flags: Combination of enum iommu_hwpt_intel_vtd_flags
> > + * @pgtbl_addr: The base address of the user-managed stage-1 page
> table.
> > + * @pat: Page attribute table data to compute effective memory type
> > + * @emt: Extended memory type
> > + * @addr_width: The address width of the untranslated addresses that
> are
> > + * subjected to the user-managed stage-1 page table.
> > + * @__reserved: Must be 0
> > + *
> > + * The Intel VT-d specific data for creating hw_pagetable to represent
> > + * the user-managed stage-1 page table that is used in nested translation.
> > + *
> > + * In nested translation, the stage-1 page table locates in the address
> > + * space that defined by the corresponding stage-2 page table. Hence
> the
> > + * stage-1 page table base address value should not be higher than the
> > + * maximum untranslated address of stage-2 page table.
> > + *
> > + * The paging level of the stage-1 page table should be compataible with
>
> s/compataible/compatible
>
> > + * the hardware iommu. Otherwise, the allocation would be failed.
> > + */
> > +struct iommu_hwpt_intel_vtd {
> > + __u64 flags;
> > + __u64 pgtbl_addr;
> > + __u32 pat;
> > + __u32 emt;
> > + __u32 addr_width;
> > + __u32 __reserved;
> > };
> >
>
> [...]
>
> > +
> > +/**
> > + * struct iommu_hwpt_invalidate_intel_vtd - Intel VT-d cache
> invalidation info
> > + * @granularity: One of enum iommu_vtd_qi_granularity.
> > + * @flags: Combination of enum
> iommu_hwpt_intel_vtd_invalidate_flags
> > + * @__reserved: Must be 0
> > + * @addr: The start address of the addresses to be invalidated.
> > + * @granule_size: Page/block size of the mapping in bytes. It is used to
> > + * compute the invalidation range togehter with @nb_granules.
>
> s/togehter/together
>

All above received. Thanks.

Regards,
Yi Liu