2023-03-27 23:18:47

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 0/8] Re-enable IDXD kernel workqueue under DMA API

Hi all,

IDXD kernel work queues were disabled due to the flawed use of kernel VA
and SVA API.
Link: https://lore.kernel.org/linux-iommu/[email protected]/

The solution is to enable it under DMA API where IDXD shared workqueue users
can use ENQCMDS to submit work on buffers mapped by DMA API.

This patchset adds support for attaching PASID to the device's default
domain and the ability to reserve global PASIDs from SVA APIs. We can then
re-enable the kernel work queues and use them under DMA API.

This depends on the IOASID removal series.
https://lore.kernel.org/linux-iommu/[email protected]/T/#t

Thanks,

---
Changelog:
v2:
- refactored device PASID attach domain ops based on Baolu's early patch
- addressed TLB flush gap
- explicitly reserve RID_PASID from SVA PASID number space
- get dma domain directly, avoid checking domain types

Jacob

Jacob Pan (8):
iommu/vt-d: Use non-privileged mode for all PASIDs
iommu/vt-d: Remove PASID supervisor request support
iommu/sva: Support reservation of global SVA PASIDs
iommu/vt-d: Reserve RID_PASID from global SVA PASID space
iommu/vt-d: Make device pasid attachment explicit
iommu/vt-d: Implement set_dev_pasid domain op
iommu: Export iommu_get_dma_domain
dmaengine/idxd: Re-enable kernel workqueue under DMA API

drivers/dma/idxd/device.c | 30 +-----
drivers/dma/idxd/init.c | 51 +++++++++-
drivers/dma/idxd/sysfs.c | 7 --
drivers/iommu/intel/iommu.c | 188 ++++++++++++++++++++++++++++--------
drivers/iommu/intel/iommu.h | 8 ++
drivers/iommu/intel/pasid.c | 43 ---------
drivers/iommu/intel/pasid.h | 7 --
drivers/iommu/iommu-sva.c | 33 +++++++
drivers/iommu/iommu.c | 1 +
include/linux/iommu.h | 19 ++++
10 files changed, 261 insertions(+), 126 deletions(-)

--
2.25.1


2023-03-27 23:18:57

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 2/8] iommu/vt-d: Remove PASID supervisor request support

There's no more usage, remove PASID supervisor support.

Suggested-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/pasid.c | 43 -------------------------------------
drivers/iommu/intel/pasid.h | 7 ------
2 files changed, 50 deletions(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 633e0a4a01e7..c5d479770e12 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -335,15 +335,6 @@ static inline void pasid_set_fault_enable(struct pasid_entry *pe)
pasid_set_bits(&pe->val[0], 1 << 1, 0);
}

-/*
- * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
- * scalable mode PASID entry.
- */
-static inline void pasid_set_sre(struct pasid_entry *pe)
-{
- pasid_set_bits(&pe->val[2], 1 << 0, 1);
-}
-
/*
* Setup the WPE(Write Protect Enable) field (Bit 132) of a
* scalable mode PASID entry.
@@ -521,23 +512,6 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EINVAL;
}

- if (flags & PASID_FLAG_SUPERVISOR_MODE) {
-#ifdef CONFIG_X86
- unsigned long cr0 = read_cr0();
-
- /* CR0.WP is normally set but just to be sure */
- if (unlikely(!(cr0 & X86_CR0_WP))) {
- pr_err("No CPU write protect!\n");
- return -EINVAL;
- }
-#endif
- if (!ecap_srs(iommu->ecap)) {
- pr_err("No supervisor request support on %s\n",
- iommu->name);
- return -EINVAL;
- }
- }
-
if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu->cap)) {
pr_err("No 5-level paging support for first-level on %s\n",
iommu->name);
@@ -560,10 +534,6 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,

/* Setup the first level page table pointer: */
pasid_set_flptr(pte, (u64)__pa(pgd));
- if (flags & PASID_FLAG_SUPERVISOR_MODE) {
- pasid_set_sre(pte);
- pasid_set_wpe(pte);
- }

if (flags & PASID_FLAG_FL5LP)
pasid_set_flpm(pte, 1);
@@ -658,12 +628,6 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));

- /*
- * Since it is a second level only translation setup, we should
- * set SRE bit as well (addresses are expected to be GPAs).
- */
- if (pasid != PASID_RID2PASID && ecap_srs(iommu->ecap))
- pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);

@@ -700,13 +664,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
pasid_set_translation_type(pte, PASID_ENTRY_PGTT_PT);
pasid_set_fault_enable(pte);
pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
-
- /*
- * We should set SRE bit as well since the addresses are expected
- * to be GPAs.
- */
- if (ecap_srs(iommu->ecap))
- pasid_set_sre(pte);
pasid_set_present(pte);
spin_unlock(&iommu->lock);

diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index 20c54e50f533..d6b7d21244b1 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -41,13 +41,6 @@
#define FLPT_DEFAULT_DID 1
#define NUM_RESERVED_DID 2

-/*
- * The SUPERVISOR_MODE flag indicates a first level translation which
- * can be used for access to kernel addresses. It is valid only for
- * access to the kernel's static 1:1 mapping of physical memory — not
- * to vmalloc or even module mappings.
- */
-#define PASID_FLAG_SUPERVISOR_MODE BIT(0)
#define PASID_FLAG_NESTED BIT(1)
#define PASID_FLAG_PAGE_SNOOP BIT(2)

--
2.25.1

2023-03-27 23:19:15

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

Devices that use Intel ENQCMD to submit work must use global PASIDs in
that the PASID are stored in a per CPU MSR. When such device need to
submit work for in-kernel DMA with PASID, it must allocate PASIDs from
the same global number space to avoid conflict.

This patch introduces IOMMU SVA APIs to reserve and release global PASIDs.
It is expected that device drivers will use the allocated PASIDs to attach
to appropriate IOMMU domains for use.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/iommu-sva.c | 33 +++++++++++++++++++++++++++++++++
include/linux/iommu.h | 14 ++++++++++++++
2 files changed, 47 insertions(+)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index c434b95dc8eb..84b9de84b3e0 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -148,6 +148,39 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
}
EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);

+/**
+ * @brief
+ * Reserve a PASID from the SVA global number space.
+ *
+ * @param min starting range, inclusive
+ * @param max ending range, inclusive
+ * @return The reserved PASID on success or IOMMU_PASID_INVALID on failure.
+ */
+ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
+{
+ int ret;
+
+ if (!pasid_valid(min) || !pasid_valid(max) ||
+ min == 0 || max < min)
+ return IOMMU_PASID_INVALID;
+
+ ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
+ if (ret < 0)
+ return IOMMU_PASID_INVALID;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_reserve_pasid);
+
+void iommu_sva_release_pasid(ioasid_t pasid)
+{
+ if (!pasid_valid(pasid))
+ return;
+
+ ida_free(&iommu_global_pasid_ida, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_release_pasid);
+
/*
* I/O page fault handler for SVA
*/
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 54f535ff9868..0471089dc1d0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1187,6 +1187,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max);
+void iommu_sva_release_pasid(ioasid_t pasid);
+
#else
static inline struct iommu_sva *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
@@ -1202,6 +1205,17 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
{
return IOMMU_PASID_INVALID;
}
+
+static inline ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
+{
+ return IOMMU_PASID_INVALID;
+}
+
+static inline void iommu_sva_release_pasid(ioasid_t pasid)
+{
+
+}
+
static inline void mm_pasid_init(struct mm_struct *mm) {}
static inline void mm_pasid_drop(struct mm_struct *mm) {}
#endif /* CONFIG_IOMMU_SVA */
--
2.25.1

2023-03-27 23:19:15

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 5/8] iommu/vt-d: Make device pasid attachment explicit

Currently, when a device is attached to its DMA domain, RID_PASID is
implicitly attached if VT-d is in scalable mode. To prepare for generic
PASID-device domain attachment, this patch parameterizes PASID such that
all PASIDs are attached explicitly.

It will allow code reuse for DMA API with PASID usage and makes no
assumptions of the ordering in which PASIDs and device are attached.
The same change applies to IOTLB invalidation and removing PASIDs.

Extracted common code based on Baolu's patch:

Link:https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 161 +++++++++++++++++++++++++++---------
drivers/iommu/intel/iommu.h | 8 ++
2 files changed, 131 insertions(+), 38 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 61c06f7ad8f7..0a195136bac7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -277,7 +277,7 @@ static LIST_HEAD(dmar_satc_units);
#define for_each_rmrr_units(rmrr) \
list_for_each_entry(rmrr, &dmar_rmrr_units, list)

-static void device_block_translation(struct device *dev);
+static void device_block_translation(struct device *dev, struct device_pasid_info *dev_pasid);
static void intel_iommu_domain_free(struct iommu_domain *domain);

int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -1365,6 +1365,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,

static void domain_update_iotlb(struct dmar_domain *domain)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
bool has_iotlb_device = false;
unsigned long flags;
@@ -1376,6 +1377,14 @@ static void domain_update_iotlb(struct dmar_domain *domain)
break;
}
}
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ if (info->ats_enabled) {
+ has_iotlb_device = true;
+ break;
+ }
+ }
domain->has_iotlb_device = has_iotlb_device;
spin_unlock_irqrestore(&domain->lock, flags);
}
@@ -1486,6 +1495,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
u64 addr, unsigned mask)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
unsigned long flags;

@@ -1495,6 +1505,39 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+ /* device TLB is not aware of the use of RID PASID is for DMA w/o PASID */
+ if (dev_pasid->pasid == PASID_RID2PASID)
+ continue;
+
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ qi_flush_dev_iotlb_pasid(info->iommu,
+ PCI_DEVID(info->bus, info->devfn),
+ info->pfsid, dev_pasid->pasid,
+ info->ats_qdep, addr,
+ mask);
+ }
+ spin_unlock_irqrestore(&domain->lock, flags);
+}
+
+/*
+ * The VT-d spec requires to use PASID-based-IOTLB Invalidation to
+ * invalidate IOTLB and the paging-structure-caches for a first-stage
+ * page table.
+ */
+static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
+ struct dmar_domain *domain, u64 addr,
+ unsigned long npages, bool ih)
+{
+ u16 did = domain_id_iommu(domain, iommu);
+ struct device_pasid_info *dev_pasid;
+ unsigned long flags;
+
+ spin_lock_irqsave(&domain->lock, flags);
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain)
+ qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih);
+
spin_unlock_irqrestore(&domain->lock, flags);
}

@@ -1514,7 +1557,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
ih = 1 << 6;

if (domain->use_first_level) {
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+ domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;

@@ -1584,7 +1627,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
u16 did = domain_id_iommu(dmar_domain, iommu);

if (dmar_domain->use_first_level)
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
+ domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
@@ -1756,6 +1799,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->use_first_level = true;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
+ INIT_LIST_HEAD(&domain->dev_pasids);
spin_lock_init(&domain->lock);
xa_init(&domain->iommu_array);

@@ -2429,10 +2473,11 @@ static int __init si_domain_init(int hw)
return 0;
}

-static int dmar_domain_attach_device(struct dmar_domain *domain,
- struct device *dev)
+static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct device_pasid_info *dev_pasid;
struct intel_iommu *iommu;
unsigned long flags;
u8 bus, devfn;
@@ -2442,43 +2487,56 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
if (!iommu)
return -ENODEV;

+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+ if (!dev_pasid)
+ return -ENOMEM;
+
ret = domain_attach_iommu(domain, iommu);
if (ret)
- return ret;
+ goto exit_free;
+
info->domain = domain;
+ dev_pasid->pasid = pasid;
+ dev_pasid->dev = dev;
spin_lock_irqsave(&domain->lock, flags);
- list_add(&info->link, &domain->devices);
+ if (!info->dev_attached)
+ list_add(&info->link, &domain->devices);
+ list_add(&dev_pasid->link_domain, &domain->dev_pasids);
spin_unlock_irqrestore(&domain->lock, flags);

/* PASID table is mandatory for a PCI device in scalable mode. */
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
/* Setup the PASID entry for requests without PASID: */
if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, domain,
- dev, PASID_RID2PASID);
+ ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- PASID_RID2PASID);
+ ret = domain_setup_first_level(iommu, domain, dev, pasid);
else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, PASID_RID2PASID);
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- return ret;
+ dev_err(dev, "Setup PASID %d failed\n", pasid);
+ device_block_translation(dev, dev_pasid);
+ goto exit_free;
}
}
+ /* device context already activated, we are done */
+ if (info->dev_attached)
+ goto exit;

ret = domain_context_mapping(domain, dev);
if (ret) {
dev_err(dev, "Domain context map failed\n");
- device_block_translation(dev);
- return ret;
+ device_block_translation(dev, dev_pasid);
+ goto exit_free;
}

iommu_enable_pci_caps(info);
-
+ info->dev_attached = 1;
+exit:
return 0;
+exit_free:
+ kfree(dev_pasid);
+ return ret;
}

static bool device_has_rmrr(struct device *dev)
@@ -4020,17 +4078,21 @@ static void dmar_remove_one_dev_info(struct device *dev)
* all DMA requests without PASID from the device are blocked. If the page
* table has been set, clean up the data structures.
*/
-static void device_block_translation(struct device *dev)
+static void device_block_translation(struct device *dev, struct device_pasid_info *dev_pasid)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
struct intel_iommu *iommu = info->iommu;
+ ioasid_t pasid = PASID_RID2PASID;
unsigned long flags;

+ if (dev_pasid)
+ pasid = dev_pasid->pasid;
+
iommu_disable_pci_caps(info);
if (!dev_is_real_dma_subdevice(dev)) {
if (sm_supported(iommu))
intel_pasid_tear_down_entry(iommu, dev,
- PASID_RID2PASID, false);
+ pasid, false);
else
domain_context_clear(info);
}
@@ -4040,6 +4102,8 @@ static void device_block_translation(struct device *dev)

spin_lock_irqsave(&info->domain->lock, flags);
list_del(&info->link);
+ if (dev_pasid)
+ list_del(&dev_pasid->link_domain);
spin_unlock_irqrestore(&info->domain->lock, flags);

domain_detach_iommu(info->domain, iommu);
@@ -4070,7 +4134,7 @@ static int md_domain_init(struct dmar_domain *domain, int guest_width)
static int blocking_domain_attach_dev(struct iommu_domain *domain,
struct device *dev)
{
- device_block_translation(dev);
+ device_block_translation(dev, NULL);
return 0;
}

@@ -4180,13 +4244,13 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
}

if (info->domain)
- device_block_translation(dev);
+ device_block_translation(dev, NULL);

ret = prepare_domain_attach_device(domain, dev);
if (ret)
return ret;

- return dmar_domain_attach_device(to_dmar_domain(domain), dev);
+ return dmar_domain_attach_device_pasid(to_dmar_domain(domain), dev, PASID_RID2PASID);
}

static int intel_iommu_map(struct iommu_domain *domain,
@@ -4675,26 +4739,47 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
__mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
}

-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
- struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- struct iommu_domain *domain;
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_pasid_info *i, *dev_pasid = NULL;
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;

- /* Domain type specific cleanup: */
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
- if (domain) {
- switch (domain->type) {
- case IOMMU_DOMAIN_SVA:
- intel_svm_remove_dev_pasid(dev, pasid);
- break;
- default:
- /* should never reach here */
- WARN_ON(1);
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_for_each_entry(i, &dmar_domain->dev_pasids, link_domain) {
+ if (i->dev == dev && i->pasid == pasid) {
+ list_del(&i->link_domain);
+ dev_pasid = i;
break;
}
}
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ if (WARN_ON(!dev_pasid))
+ return;
+
+ /* PASID entry already cleared during SVA unbind */
+ if (domain->type != IOMMU_DOMAIN_SVA)
+ intel_pasid_tear_down_entry(iommu, dev, pasid, false);

- intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+ domain_detach_iommu(dmar_domain, iommu);
+ kfree(dev_pasid);
+}
+
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
+ /*
+ * SVA Domain type specific cleanup: Not ideal but not until we have
+ * IOPF capable domain specific ops, we need this special case.
+ */
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ return intel_svm_remove_dev_pasid(dev, pasid);
+ intel_iommu_detach_device_pasid(domain, dev, pasid);
}

const struct iommu_ops intel_iommu_ops = {
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 65b15be72878..b6c26f25d1ba 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -595,6 +595,7 @@ struct dmar_domain {

spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */
+ struct list_head dev_pasids; /* all attached pasids */

struct dma_pte *pgd; /* virtual address */
int gaw; /* max guest address width */
@@ -708,6 +709,7 @@ struct device_domain_info {
u8 ats_supported:1;
u8 ats_enabled:1;
u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
+ u8 dev_attached:1; /* Device context activated */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
@@ -715,6 +717,12 @@ struct device_domain_info {
struct pasid_table *pasid_table; /* pasid table */
};

+struct device_pasid_info {
+ struct list_head link_domain; /* link to domain siblings */
+ struct device *dev; /* physical device derived from */
+ ioasid_t pasid; /* PASID on physical device */
+};
+
static inline void __iommu_flush_cache(
struct intel_iommu *iommu, void *addr, int size)
{
--
2.25.1

2023-03-27 23:19:18

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 6/8] iommu/vt-d: Implement set_dev_pasid domain op

Devices that use ENQCMDS to submit work on buffers mapped by DMA API
must attach a PASID to the default domain of the device. In preparation
for this use case, this patch implements set_dev_pasid() for the
default_domain_ops. Besides PASID attachment, device will also be
attached to the domain as the result of this call if the device has not
been attached before.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0a195136bac7..652ee0ca72da 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4782,6 +4782,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
intel_iommu_detach_device_pasid(domain, dev, pasid);
}

+static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu = info->iommu;
+ int ret;
+
+ if (!pasid_supported(iommu))
+ return -ENODEV;
+
+ ret = prepare_domain_attach_device(domain, dev);
+ if (ret)
+ return ret;
+
+ return dmar_domain_attach_device_pasid(dmar_domain, dev, pasid);
+}
+
+
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -4801,6 +4821,7 @@ const struct iommu_ops intel_iommu_ops = {
#endif
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = intel_iommu_attach_device,
+ .set_dev_pasid = intel_iommu_attach_device_pasid,
.map_pages = intel_iommu_map_pages,
.unmap_pages = intel_iommu_unmap_pages,
.iotlb_sync_map = intel_iommu_iotlb_sync_map,
--
2.25.1

2023-03-27 23:19:19

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

On VT-d platforms, RID_PASID is used for DMA request without PASID. We
should not treat RID_PASID special instead let it be allocated from the
global SVA PASID number space.

Consequently, for devices also do DMA with PASID, drivers will not worry
about conflicts when it comes to allocating PASIDs for in-kernel DMA.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f737ef55463..61c06f7ad8f7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)

intel_iommu_enabled = 1;

+#ifdef CONFIG_INTEL_IOMMU_SVM
+ /* Reserved RID_PASID from the global namespace for legacy DMA */
+ iommu_sva_reserve_pasid(PASID_RID2PASID, PASID_RID2PASID);
+#endif
return 0;

out_free_dmar:
--
2.25.1

2023-03-27 23:19:21

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

Devices that use ENQCMDS to submit work needs to retrieve its DMA
domain. It can then attach PASID to the DMA domain for shared mapping
(with RID) established by DMA API.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/iommu.c | 1 +
include/linux/iommu.h | 5 +++++
2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10db680acaed..c51d343a75d2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2118,6 +2118,7 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
{
return dev->iommu_group->default_domain;
}
+EXPORT_SYMBOL_GPL(iommu_get_dma_domain);

/*
* IOMMU groups are really the natural working unit of the IOMMU, but
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0471089dc1d0..1ef9a1109534 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -1089,6 +1089,11 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
{
return NULL;
}
+
+static inline struct iommu_domain *iommu_get_dma_domain(struct device *dev)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_API */

/**
--
2.25.1

2023-03-27 23:19:31

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 1/8] iommu/vt-d: Use non-privileged mode for all PASIDs

Supervisor Request Enable (SRE) bit in a PASID entry is for permission
checking on DMA requests. When SRE = 0, DMA with supervisor privilege
will be blocked. However, for in-kernel DMA this is not necessary in that
we are targeting kernel memory anyway. There's no need to differentiate
user and kernel for in-kernel DMA.

Let's use non-privileged (user) permission for all PASIDs used in kernel,
it will be consistent with DMA without PASID (RID_PASID) as well.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0768dcae90fd..9f737ef55463 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2338,8 +2338,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
if (level != 4 && level != 5)
return -EINVAL;

- if (pasid != PASID_RID2PASID)
- flags |= PASID_FLAG_SUPERVISOR_MODE;
if (level == 5)
flags |= PASID_FLAG_FL5LP;

--
2.25.1

2023-03-27 23:19:52

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v2 8/8] dmaengine/idxd: Re-enable kernel workqueue under DMA API

Kernel workqueues were disabled due to flawed use of kernel VA and SVA
API. Now That we have the support for attaching PASID to the device's
default domain and the ability to reserve global PASIDs from SVA APIs,
we can re-enable the kernel work queues and use them under DMA API.

We also use non-privileged access for in-kernel DMA to be consistent
with the IOMMU settings. Consequently, interrupt for user privilege is
enabled for work completion IRQs.

Link:https://lore.kernel.org/linux-iommu/[email protected]/
Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/device.c | 30 ++++-------------------
drivers/dma/idxd/init.c | 51 ++++++++++++++++++++++++++++++++++++---
drivers/dma/idxd/sysfs.c | 7 ------
3 files changed, 52 insertions(+), 36 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 6fca8fa8d3a8..f6b133d61a04 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd)
}
}

-static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
-{
- struct idxd_device *idxd = wq->idxd;
- union wqcfg wqcfg;
- unsigned int offset;
-
- offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
- spin_lock(&idxd->dev_lock);
- wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base + offset);
- wqcfg.priv = priv;
- wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
- iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base + offset);
- spin_unlock(&idxd->dev_lock);
-}
-
static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
{
struct idxd_device *idxd = wq->idxd;
@@ -1324,15 +1309,14 @@ int drv_enable_wq(struct idxd_wq *wq)
}

/*
- * In the event that the WQ is configurable for pasid and priv bits.
- * For kernel wq, the driver should setup the pasid, pasid_en, and priv bit.
- * However, for non-kernel wq, the driver should only set the pasid_en bit for
- * shared wq. A dedicated wq that is not 'kernel' type will configure pasid and
+ * In the event that the WQ is configurable for pasid, the driver
+ * should setup the pasid, pasid_en bit. This is true for both kernel
+ * and user shared workqueues. There is no need to setup priv bit in
+ * that in-kernel DMA will also do user privileged requests.
+ * A dedicated wq that is not 'kernel' type will configure pasid and
* pasid_en later on so there is no need to setup.
*/
if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
- int priv = 0;
-
if (wq_pasid_enabled(wq)) {
if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
u32 pasid = wq_dedicated(wq) ? idxd->pasid : 0;
@@ -1340,10 +1324,6 @@ int drv_enable_wq(struct idxd_wq *wq)
__idxd_wq_set_pasid_locked(wq, pasid);
}
}
-
- if (is_idxd_wq_kernel(wq))
- priv = 1;
- __idxd_wq_set_priv_locked(wq, priv);
}

rc = 0;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e6ee267da0ff..a3396e1b38f1 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -506,14 +506,56 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d

static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- return -EOPNOTSUPP;
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+ union gencfg_reg gencfg;
+ ioasid_t pasid;
+ int ret;
+
+ /*
+ * Attach a global PASID to the DMA domain so that we can use ENQCMDS
+ * to submit work on buffers mapped by DMA API.
+ */
+ domain = iommu_get_dma_domain(dev);
+ if (!domain)
+ return -EPERM;
+
+ pasid = iommu_sva_reserve_pasid(1, dev->iommu->max_pasids);
+ if (pasid == IOMMU_PASID_INVALID)
+ return -ENOSPC;
+
+ ret = iommu_attach_device_pasid(domain, dev, pasid);
+ if (ret) {
+ dev_err(dev, "failed to attach device pasid %d, domain type %d",
+ pasid, domain->type);
+ iommu_sva_release_pasid(pasid);
+ return ret;
+ }
+
+ /* Since we set user privilege for kernel DMA, enable completion IRQ */
+ gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+ gencfg.user_int_en = 1;
+ iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+ idxd->pasid = pasid;
+
+ return ret;
}

static void idxd_disable_system_pasid(struct idxd_device *idxd)
{
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev(dev);
+ if (!domain || domain->type == IOMMU_DOMAIN_BLOCKED)
+ return;

- iommu_sva_unbind_device(idxd->sva);
+ iommu_detach_device_pasid(domain, dev, idxd->pasid);
+ iommu_sva_release_pasid(idxd->pasid);
idxd->sva = NULL;
+ idxd->pasid = IOMMU_PASID_INVALID;
}

static int idxd_probe(struct idxd_device *idxd)
@@ -535,8 +577,9 @@ static int idxd_probe(struct idxd_device *idxd)
} else {
set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);

- if (idxd_enable_system_pasid(idxd))
- dev_warn(dev, "No in-kernel DMA with PASID.\n");
+ rc = idxd_enable_system_pasid(idxd);
+ if (rc)
+ dev_warn(dev, "No in-kernel DMA with PASID. %d\n", rc);
else
set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 18cd8151dee0..c5561c00a503 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -944,13 +944,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;

- /*
- * This is temporarily placed here until we have SVM support for
- * dmaengine.
- */
- if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
- return -EOPNOTSUPP;
-
input = kstrndup(buf, count, GFP_KERNEL);
if (!input)
return -ENOMEM;
--
2.25.1

2023-03-28 04:57:11

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 1/8] iommu/vt-d: Use non-privileged mode for all PASIDs

On 3/28/23 7:21 AM, Jacob Pan wrote:
> Supervisor Request Enable (SRE) bit in a PASID entry is for permission
> checking on DMA requests. When SRE = 0, DMA with supervisor privilege
> will be blocked. However, for in-kernel DMA this is not necessary in that
> we are targeting kernel memory anyway. There's no need to differentiate
> user and kernel for in-kernel DMA.
>
> Let's use non-privileged (user) permission for all PASIDs used in kernel,
> it will be consistent with DMA without PASID (RID_PASID) as well.
>
> Signed-off-by: Jacob Pan<[email protected]>

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

2023-03-28 04:59:46

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 2/8] iommu/vt-d: Remove PASID supervisor request support

On 3/28/23 7:21 AM, Jacob Pan wrote:
> There's no more usage, remove PASID supervisor support.
>
> Suggested-by: Lu Baolu<[email protected]>
> Signed-off-by: Jacob Pan<[email protected]>

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

2023-03-28 05:16:26

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

On 3/28/23 7:21 AM, Jacob Pan wrote:
> Devices that use Intel ENQCMD to submit work must use global PASIDs in
> that the PASID are stored in a per CPU MSR. When such device need to
> submit work for in-kernel DMA with PASID, it must allocate PASIDs from
> the same global number space to avoid conflict.
>
> This patch introduces IOMMU SVA APIs to reserve and release global PASIDs.
> It is expected that device drivers will use the allocated PASIDs to attach
> to appropriate IOMMU domains for use.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/iommu-sva.c | 33 +++++++++++++++++++++++++++++++++
> include/linux/iommu.h | 14 ++++++++++++++
> 2 files changed, 47 insertions(+)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index c434b95dc8eb..84b9de84b3e0 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -148,6 +148,39 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> }
> EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
>
> +/**
> + * @brief
> + * Reserve a PASID from the SVA global number space.
> + *
> + * @param min starting range, inclusive
> + * @param max ending range, inclusive
> + * @return The reserved PASID on success or IOMMU_PASID_INVALID on failure.
> + */
> +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
> +{
> + int ret;
> +
> + if (!pasid_valid(min) || !pasid_valid(max) ||
> + min == 0 || max < min)

I still think we should make "min == 0" a valid case. The ARM/AMD/Intel
drivers should reserve PASID 0 for special usage with this interface.

Probably we should also make "min == max" a valid case. Both @min and
@max are inclusive.

> + return IOMMU_PASID_INVALID;
> +
> + ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
> + if (ret < 0)
> + return IOMMU_PASID_INVALID;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_reserve_pasid);
> +
> +void iommu_sva_release_pasid(ioasid_t pasid)
> +{
> + if (!pasid_valid(pasid))

The caller should never release an invalid pasid. So perhaps,

if (WARN_ON(!pasid_valid(pasid)))
return;

to discover bugs during development.

> + return;
> +
> + ida_free(&iommu_global_pasid_ida, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_release_pasid);
> +
> /*
> * I/O page fault handler for SVA
> */
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 54f535ff9868..0471089dc1d0 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -1187,6 +1187,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> struct mm_struct *mm);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max);
> +void iommu_sva_release_pasid(ioasid_t pasid);
> +
> #else
> static inline struct iommu_sva *
> iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> @@ -1202,6 +1205,17 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> {
> return IOMMU_PASID_INVALID;
> }
> +
> +static inline ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
> +{
> + return IOMMU_PASID_INVALID;
> +}
> +
> +static inline void iommu_sva_release_pasid(ioasid_t pasid)
> +{
> +
> +}
> +
> static inline void mm_pasid_init(struct mm_struct *mm) {}
> static inline void mm_pasid_drop(struct mm_struct *mm) {}
> #endif /* CONFIG_IOMMU_SVA */

Best regards,
baolu

2023-03-28 05:20:27

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

On 3/28/23 7:21 AM, Jacob Pan wrote:
> On VT-d platforms, RID_PASID is used for DMA request without PASID. We
> should not treat RID_PASID special instead let it be allocated from the
> global SVA PASID number space.

It's same to AMD and ARM SMMUv3, right? They also need an explicit
reservation of PASID 0.

>
> Consequently, for devices also do DMA with PASID, drivers will not worry
> about conflicts when it comes to allocating PASIDs for in-kernel DMA.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel/iommu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9f737ef55463..61c06f7ad8f7 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)
>
> intel_iommu_enabled = 1;
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM

Do we really need this #ifdef? IOMMU_SVA is selected by INTEL_IOMMU_SVM,
right? So if CONFIG_INTEL_IOMMU_SVM is not set,
iommu_sva_reserve_pasid() is just a dumb.

> + /* Reserved RID_PASID from the global namespace for legacy DMA */
> + iommu_sva_reserve_pasid(PASID_RID2PASID, PASID_RID2PASID);
> +#endif
> return 0;
>
> out_free_dmar:

Best regards,
baolu

2023-03-28 05:49:55

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 5/8] iommu/vt-d: Make device pasid attachment explicit

On 3/28/23 7:21 AM, Jacob Pan wrote:
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 65b15be72878..b6c26f25d1ba 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -595,6 +595,7 @@ struct dmar_domain {
>
> spinlock_t lock; /* Protect device tracking lists */
> struct list_head devices; /* all devices' list */
> + struct list_head dev_pasids; /* all attached pasids */
>
> struct dma_pte *pgd; /* virtual address */
> int gaw; /* max guest address width */
> @@ -708,6 +709,7 @@ struct device_domain_info {
> u8 ats_supported:1;
> u8 ats_enabled:1;
> u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
> + u8 dev_attached:1; /* Device context activated */
> u8 ats_qdep;
> struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> struct intel_iommu *iommu; /* IOMMU used by this device */
> @@ -715,6 +717,12 @@ struct device_domain_info {
> struct pasid_table *pasid_table; /* pasid table */
> };
>
> +struct device_pasid_info {
> + struct list_head link_domain; /* link to domain siblings */
> + struct device *dev; /* physical device derived from */
> + ioasid_t pasid; /* PASID on physical device */
> +};

The dev_pasids list seems to be duplicate with iommu_group::pasid_array.

The pasid_array is de facto per-device as the PCI subsystem requires ACS
to be enabled on the upstream path to the root port.

pci_enable_pasid():
385 if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
386 return -EINVAL;

For such PCI topology, pci_device_group() always assigns an exclusive
iommu group (a.k.a. singleton group).

So, how about moving the pasid_array from struct iommu_group to struct
dev_iommu? With this refactoring, the individual iommu driver has no
need to create their own pasid array or list.

Instead of using iommu_group::mutex, perhaps the pasid_array needs its
own lock in struct dev_iommu after moving.

Best regards,
baolu

2023-03-28 06:12:42

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

On 3/28/23 7:21 AM, Jacob Pan wrote:
> Devices that use ENQCMDS to submit work needs to retrieve its DMA
> domain. It can then attach PASID to the DMA domain for shared mapping
> (with RID) established by DMA API.
>
> Signed-off-by: Jacob Pan<[email protected]>
> ---
> drivers/iommu/iommu.c | 1 +
> include/linux/iommu.h | 5 +++++
> 2 files changed, 6 insertions(+)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 10db680acaed..c51d343a75d2 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2118,6 +2118,7 @@ struct iommu_domain *iommu_get_dma_domain(struct device *dev)
> {
> return dev->iommu_group->default_domain;
> }
> +EXPORT_SYMBOL_GPL(iommu_get_dma_domain);

Directly exporting this function for external use seems unsafe. If the
caller is the kernel driver for this device, it's fine because default
domain remains unchanged during the life cycle of the driver. Otherwise,
using this function may cause UAF. Keep in mind that group's default
domain could be changed through sysfs.

However, iommu_get_domain_for_dev() has already done so and has been
exported. Maybe I'm worried too much. :-)

Best regards,
baolu

2023-03-28 07:38:00

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 28, 2023 7:22 AM
>
> +/**
> + * @brief
> + * Reserve a PASID from the SVA global number space.
> + *
> + * @param min starting range, inclusive
> + * @param max ending range, inclusive
> + * @return The reserved PASID on success or IOMMU_PASID_INVALID on
> failure.
> + */
> +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
> +{
> + int ret;
> +
> + if (!pasid_valid(min) || !pasid_valid(max) ||
> + min == 0 || max < min)
> + return IOMMU_PASID_INVALID;
> +
> + ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> GFP_KERNEL);
> + if (ret < 0)
> + return IOMMU_PASID_INVALID;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_sva_reserve_pasid);
> +

Look at this function. There is no single word about sva except
in the function name.

sva is just one user of global pasids.

when a driver supports sva it has to always use global pasids even
for non-sva usages like dma pasid.

but this doesn't mean that we should build the API around sva.

it's really about global pasids.

let's just call it clearly as iommu_alloc_global_pasid(min, max).

Then we can define a wrapper iommu_reserve_global_pasid(pasid)
as iommu_alloc_global_pasid(pasid, pasid).

for PASID#0 driver calls iommu_reserve_global_pasid(0).

for dma pasid driver calls iommu_alloc_global_pasid() to get a random
one instead of reserving pasid#1.

this would be future proof when the same driver starts to allocate
more pasids for other usages e..g siov.

2023-03-28 07:46:42

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 5/8] iommu/vt-d: Make device pasid attachment explicit

> From: Baolu Lu <[email protected]>
> Sent: Tuesday, March 28, 2023 1:49 PM
>
> On 3/28/23 7:21 AM, Jacob Pan wrote:
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index 65b15be72878..b6c26f25d1ba 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -595,6 +595,7 @@ struct dmar_domain {
> >
> > spinlock_t lock; /* Protect device tracking lists */
> > struct list_head devices; /* all devices' list */
> > + struct list_head dev_pasids; /* all attached pasids */
> >
> > struct dma_pte *pgd; /* virtual address */
> > int gaw; /* max guest address width */
> > @@ -708,6 +709,7 @@ struct device_domain_info {
> > u8 ats_supported:1;
> > u8 ats_enabled:1;
> > u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
> > + u8 dev_attached:1; /* Device context activated */
> > u8 ats_qdep;
> > struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> > struct intel_iommu *iommu; /* IOMMU used by this device */
> > @@ -715,6 +717,12 @@ struct device_domain_info {
> > struct pasid_table *pasid_table; /* pasid table */
> > };
> >
> > +struct device_pasid_info {
> > + struct list_head link_domain; /* link to domain siblings */
> > + struct device *dev; /* physical device derived from */
> > + ioasid_t pasid; /* PASID on physical device */
> > +};
>
> The dev_pasids list seems to be duplicate with iommu_group::pasid_array.
>
> The pasid_array is de facto per-device as the PCI subsystem requires ACS
> to be enabled on the upstream path to the root port.
>
> pci_enable_pasid():
> 385 if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
> 386 return -EINVAL;
>
> For such PCI topology, pci_device_group() always assigns an exclusive
> iommu group (a.k.a. singleton group).
>
> So, how about moving the pasid_array from struct iommu_group to struct
> dev_iommu? With this refactoring, the individual iommu driver has no
> need to create their own pasid array or list.
>
> Instead of using iommu_group::mutex, perhaps the pasid_array needs its
> own lock in struct dev_iommu after moving.
>

What you suggested is a right thing and more friendly to pasid attach
in iommufd [1].

but dev_pasids list here is a different thing. It tracks which [device, pasid]
is attached to the domain. w/o this information you'll have to walk the
pasid_array of every attached device under the domain and search for
every pasid entry pointing to the said domain. It's very inefficient.

of course if this can be done more generally it'd be nice.????

[1] https://lore.kernel.org/linux-iommu/[email protected]/

2023-03-28 07:48:57

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 6/8] iommu/vt-d: Implement set_dev_pasid domain op

> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 28, 2023 7:22 AM
>
> Devices that use ENQCMDS to submit work on buffers mapped by DMA API
> must attach a PASID to the default domain of the device. In preparation
> for this use case, this patch implements set_dev_pasid() for the
> default_domain_ops. Besides PASID attachment, device will also be
> attached to the domain as the result of this call if the device has not
> been attached before.
>

I didn't get the last point. PASID attach should only have the scope
for the pasid. RID of the device might be attached to another domain
which shouldn't be changed by this call.

2023-03-28 08:04:01

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

> From: Baolu Lu <[email protected]>
> Sent: Tuesday, March 28, 2023 2:04 PM
>
> On 3/28/23 7:21 AM, Jacob Pan wrote:
> > Devices that use ENQCMDS to submit work needs to retrieve its DMA
> > domain. It can then attach PASID to the DMA domain for shared mapping
> > (with RID) established by DMA API.
> >
> > Signed-off-by: Jacob Pan<[email protected]>
> > ---
> > drivers/iommu/iommu.c | 1 +
> > include/linux/iommu.h | 5 +++++
> > 2 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 10db680acaed..c51d343a75d2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2118,6 +2118,7 @@ struct iommu_domain
> *iommu_get_dma_domain(struct device *dev)
> > {
> > return dev->iommu_group->default_domain;
> > }
> > +EXPORT_SYMBOL_GPL(iommu_get_dma_domain);
>
> Directly exporting this function for external use seems unsafe. If the
> caller is the kernel driver for this device, it's fine because default
> domain remains unchanged during the life cycle of the driver. Otherwise,
> using this function may cause UAF. Keep in mind that group's default
> domain could be changed through sysfs.
>
> However, iommu_get_domain_for_dev() has already done so and has been
> exported. Maybe I'm worried too much. :-)
>

Agree. The kernel driver managing the device wants to get the current
domain of the device then iommu_get_domain_for_dev() is the right
interface. It knows the domain is the dma domain.

2023-03-28 15:29:04

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

Hi Baolu,

On Tue, 28 Mar 2023 13:11:12 +0800, Baolu Lu <[email protected]>
wrote:

> On 3/28/23 7:21 AM, Jacob Pan wrote:
> > Devices that use Intel ENQCMD to submit work must use global PASIDs in
> > that the PASID are stored in a per CPU MSR. When such device need to
> > submit work for in-kernel DMA with PASID, it must allocate PASIDs from
> > the same global number space to avoid conflict.
> >
> > This patch introduces IOMMU SVA APIs to reserve and release global
> > PASIDs. It is expected that device drivers will use the allocated
> > PASIDs to attach to appropriate IOMMU domains for use.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/iommu-sva.c | 33 +++++++++++++++++++++++++++++++++
> > include/linux/iommu.h | 14 ++++++++++++++
> > 2 files changed, 47 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> > index c434b95dc8eb..84b9de84b3e0 100644
> > --- a/drivers/iommu/iommu-sva.c
> > +++ b/drivers/iommu/iommu-sva.c
> > @@ -148,6 +148,39 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> > }
> > EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> >
> > +/**
> > + * @brief
> > + * Reserve a PASID from the SVA global number space.
> > + *
> > + * @param min starting range, inclusive
> > + * @param max ending range, inclusive
> > + * @return The reserved PASID on success or IOMMU_PASID_INVALID on
> > failure.
> > + */
> > +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
> > +{
> > + int ret;
> > +
> > + if (!pasid_valid(min) || !pasid_valid(max) ||
> > + min == 0 || max < min)
>
> I still think we should make "min == 0" a valid case. The ARM/AMD/Intel
> drivers should reserve PASID 0 for special usage with this interface.
>
> Probably we should also make "min == max" a valid case. Both @min and
> @max are inclusive.
I agreed, 0 should not be special.

>
> > + return IOMMU_PASID_INVALID;
> > +
> > + ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> > GFP_KERNEL);
> > + if (ret < 0)
> > + return IOMMU_PASID_INVALID;
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_reserve_pasid);
> > +
> > +void iommu_sva_release_pasid(ioasid_t pasid)
> > +{
> > + if (!pasid_valid(pasid))
>
> The caller should never release an invalid pasid. So perhaps,
>
> if (WARN_ON(!pasid_valid(pasid)))
> return;
>
good point!

> to discover bugs during development.
>
> > + return;
> > +
> > + ida_free(&iommu_global_pasid_ida, pasid);
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_release_pasid);
> > +
> > /*
> > * I/O page fault handler for SVA
> > */
> > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index 54f535ff9868..0471089dc1d0 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -1187,6 +1187,9 @@ struct iommu_sva *iommu_sva_bind_device(struct
> > device *dev, struct mm_struct *mm);
> > void iommu_sva_unbind_device(struct iommu_sva *handle);
> > u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> > +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max);
> > +void iommu_sva_release_pasid(ioasid_t pasid);
> > +
> > #else
> > static inline struct iommu_sva *
> > iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> > @@ -1202,6 +1205,17 @@ static inline u32 iommu_sva_get_pasid(struct
> > iommu_sva *handle) {
> > return IOMMU_PASID_INVALID;
> > }
> > +
> > +static inline ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t
> > max) +{
> > + return IOMMU_PASID_INVALID;
> > +}
> > +
> > +static inline void iommu_sva_release_pasid(ioasid_t pasid)
> > +{
> > +
> > +}
> > +
> > static inline void mm_pasid_init(struct mm_struct *mm) {}
> > static inline void mm_pasid_drop(struct mm_struct *mm) {}
> > #endif /* CONFIG_IOMMU_SVA */
>
> Best regards,
> baolu


Thanks,

Jacob

2023-03-28 15:32:22

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

Hi Kevin,

On Tue, 28 Mar 2023 07:35:43 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Tuesday, March 28, 2023 7:22 AM
> >
> > +/**
> > + * @brief
> > + * Reserve a PASID from the SVA global number space.
> > + *
> > + * @param min starting range, inclusive
> > + * @param max ending range, inclusive
> > + * @return The reserved PASID on success or IOMMU_PASID_INVALID on
> > failure.
> > + */
> > +ioasid_t iommu_sva_reserve_pasid(ioasid_t min, ioasid_t max)
> > +{
> > + int ret;
> > +
> > + if (!pasid_valid(min) || !pasid_valid(max) ||
> > + min == 0 || max < min)
> > + return IOMMU_PASID_INVALID;
> > +
> > + ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> > GFP_KERNEL);
> > + if (ret < 0)
> > + return IOMMU_PASID_INVALID;
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(iommu_sva_reserve_pasid);
> > +
>
> Look at this function. There is no single word about sva except
> in the function name.
>
> sva is just one user of global pasids.
>
> when a driver supports sva it has to always use global pasids even
> for non-sva usages like dma pasid.
>
> but this doesn't mean that we should build the API around sva.
>
> it's really about global pasids.
>
> let's just call it clearly as iommu_alloc_global_pasid(min, max).
>
> Then we can define a wrapper iommu_reserve_global_pasid(pasid)
> as iommu_alloc_global_pasid(pasid, pasid).
>
> for PASID#0 driver calls iommu_reserve_global_pasid(0).
>
> for dma pasid driver calls iommu_alloc_global_pasid() to get a random
> one instead of reserving pasid#1.
>
> this would be future proof when the same driver starts to allocate
> more pasids for other usages e..g siov.
I don't have strong preference here. Jason and others?

For the DMA vs. SVA use cases, these APIs are used to carve out PASIDs from
the SVA space. Let it be the entire global space or a subset, we don't care.
We just don't want conflicts with SVA. e.g. if the SVA space shrank in the
future, this still works.

Thanks,

Jacob

2023-03-28 15:42:20

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 6/8] iommu/vt-d: Implement set_dev_pasid domain op

Hi Kevin,

On Tue, 28 Mar 2023 07:47:45 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Tuesday, March 28, 2023 7:22 AM
> >
> > Devices that use ENQCMDS to submit work on buffers mapped by DMA API
> > must attach a PASID to the default domain of the device. In preparation
> > for this use case, this patch implements set_dev_pasid() for the
> > default_domain_ops. Besides PASID attachment, device will also be
> > attached to the domain as the result of this call if the device has not
> > been attached before.
> >
>
> I didn't get the last point. PASID attach should only have the scope
> for the pasid. RID of the device might be attached to another domain
> which shouldn't be changed by this call.
I meant if the RID context has not been set up before attaching this PASID,
this call will also set up the context, PASID dir etc. In the end, we
eliminated ordering requirement of attaching device, RID_PASID first, then
other PASIDs.
How about:
"If the device context has not been set up prior to this call, this will
set up the device context in addition to PASID attachment."

Thanks,

Jacob

2023-03-28 15:47:07

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

Hi Kevin,

On Tue, 28 Mar 2023 07:52:23 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Baolu Lu <[email protected]>
> > Sent: Tuesday, March 28, 2023 2:04 PM
> >
> > On 3/28/23 7:21 AM, Jacob Pan wrote:
> > > Devices that use ENQCMDS to submit work needs to retrieve its DMA
> > > domain. It can then attach PASID to the DMA domain for shared mapping
> > > (with RID) established by DMA API.
> > >
> > > Signed-off-by: Jacob Pan<[email protected]>
> > > ---
> > > drivers/iommu/iommu.c | 1 +
> > > include/linux/iommu.h | 5 +++++
> > > 2 files changed, 6 insertions(+)
> > >
> > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > > index 10db680acaed..c51d343a75d2 100644
> > > --- a/drivers/iommu/iommu.c
> > > +++ b/drivers/iommu/iommu.c
> > > @@ -2118,6 +2118,7 @@ struct iommu_domain
> > *iommu_get_dma_domain(struct device *dev)
> > > {
> > > return dev->iommu_group->default_domain;
> > > }
> > > +EXPORT_SYMBOL_GPL(iommu_get_dma_domain);
> >
> > Directly exporting this function for external use seems unsafe. If the
> > caller is the kernel driver for this device, it's fine because default
> > domain remains unchanged during the life cycle of the driver. Otherwise,
> > using this function may cause UAF. Keep in mind that group's default
> > domain could be changed through sysfs.
> >
> > However, iommu_get_domain_for_dev() has already done so and has been
> > exported. Maybe I'm worried too much. :-)
> >
>
> Agree. The kernel driver managing the device wants to get the current
> domain of the device then iommu_get_domain_for_dev() is the right
> interface. It knows the domain is the dma domain.
This goes back to v1 then :)

I thought the comments from v1 is that we don't want to check the domain
type is DMA domain returned by iommu_get_domain_for_dev()

If the driver "knows" the domain is dma domain, why can't it use
iommu_get_dma_domain()? seems paradoxical.

Thanks,

Jacob

2023-03-28 15:48:16

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

Hi Baolu,

On Tue, 28 Mar 2023 14:04:15 +0800, Baolu Lu <[email protected]>
wrote:

> On 3/28/23 7:21 AM, Jacob Pan wrote:
> > Devices that use ENQCMDS to submit work needs to retrieve its DMA
> > domain. It can then attach PASID to the DMA domain for shared mapping
> > (with RID) established by DMA API.
> >
> > Signed-off-by: Jacob Pan<[email protected]>
> > ---
> > drivers/iommu/iommu.c | 1 +
> > include/linux/iommu.h | 5 +++++
> > 2 files changed, 6 insertions(+)
> >
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 10db680acaed..c51d343a75d2 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -2118,6 +2118,7 @@ struct iommu_domain *iommu_get_dma_domain(struct
> > device *dev) {
> > return dev->iommu_group->default_domain;
> > }
> > +EXPORT_SYMBOL_GPL(iommu_get_dma_domain);
>
> Directly exporting this function for external use seems unsafe. If the
> caller is the kernel driver for this device, it's fine because default
> domain remains unchanged during the life cycle of the driver. Otherwise,
> using this function may cause UAF. Keep in mind that group's default
> domain could be changed through sysfs.
don't you have to unload the driver?

> However, iommu_get_domain_for_dev() has already done so and has been
> exported. Maybe I'm worried too much. :-)
>
> Best regards,
> baolu


Thanks,

Jacob

2023-03-28 16:05:37

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

On Tue, Mar 28, 2023 at 08:31:10AM -0700, Jacob Pan wrote:

> > this would be future proof when the same driver starts to allocate
> > more pasids for other usages e..g siov.
> I don't have strong preference here. Jason and others?

Oh I'm all for getting the word SVA out of the PASID infrastructure.

SVA has nothing to do with PASID other than it requires it to work.

Jason

2023-03-28 16:32:04

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

On Tue, Mar 28, 2023 at 08:48:22AM -0700, Jacob Pan wrote:

> > Agree. The kernel driver managing the device wants to get the current
> > domain of the device then iommu_get_domain_for_dev() is the right
> > interface. It knows the domain is the dma domain.
> This goes back to v1 then :)
>
> I thought the comments from v1 is that we don't want to check the domain
> type is DMA domain returned by iommu_get_domain_for_dev()
>
> If the driver "knows" the domain is dma domain, why can't it use
> iommu_get_dma_domain()? seems paradoxical.

Huh?

The DMA API operates on the current domain of the device, be it
IDENTITY or UNMANAGED.

You have to copy whatever the current domain is to the PASID and you
should definately not check if it is DMA or something.

Jason

2023-03-28 16:37:52

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

Hi Baolu,

On Tue, 28 Mar 2023 13:20:19 +0800, Baolu Lu <[email protected]>
wrote:

> On 3/28/23 7:21 AM, Jacob Pan wrote:
> > On VT-d platforms, RID_PASID is used for DMA request without PASID. We
> > should not treat RID_PASID special instead let it be allocated from the
> > global SVA PASID number space.
>
> It's same to AMD and ARM SMMUv3, right? They also need an explicit
> reservation of PASID 0.
>
yes, all IOMMU drivers need to do that. I will give it a try but might need
help to place the call.

> >
> > Consequently, for devices also do DMA with PASID, drivers will not worry
> > about conflicts when it comes to allocating PASIDs for in-kernel DMA.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel/iommu.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 9f737ef55463..61c06f7ad8f7 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)
> >
> > intel_iommu_enabled = 1;
> >
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
>
> Do we really need this #ifdef? IOMMU_SVA is selected by INTEL_IOMMU_SVM,
> right? So if CONFIG_INTEL_IOMMU_SVM is not set,
> iommu_sva_reserve_pasid() is just a dumb.
>
good catch! will remove

> > + /* Reserved RID_PASID from the global namespace for legacy DMA
> > */
> > + iommu_sva_reserve_pasid(PASID_RID2PASID, PASID_RID2PASID);
> > +#endif
> > return 0;
> >
> > out_free_dmar:
>
> Best regards,
> baolu


Thanks,

Jacob

2023-03-28 16:38:02

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 3/8] iommu/sva: Support reservation of global SVA PASIDs

Hi Jason,

On Tue, 28 Mar 2023 12:55:40 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Tue, Mar 28, 2023 at 08:31:10AM -0700, Jacob Pan wrote:
>
> > > this would be future proof when the same driver starts to allocate
> > > more pasids for other usages e..g siov.
> > I don't have strong preference here. Jason and others?
>
> Oh I'm all for getting the word SVA out of the PASID infrastructure.
>
> SVA has nothing to do with PASID other than it requires it to work.
>
I will move pasid allocation code to iommu.c then.

Thanks,

Jacob

2023-03-28 17:32:44

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

Hi Jason,

On Tue, 28 Mar 2023 13:19:11 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Tue, Mar 28, 2023 at 08:48:22AM -0700, Jacob Pan wrote:
>
> > > Agree. The kernel driver managing the device wants to get the current
> > > domain of the device then iommu_get_domain_for_dev() is the right
> > > interface. It knows the domain is the dma domain.
> > This goes back to v1 then :)
> >
> > I thought the comments from v1 is that we don't want to check the domain
> > type is DMA domain returned by iommu_get_domain_for_dev()
> >
> > If the driver "knows" the domain is dma domain, why can't it use
> > iommu_get_dma_domain()? seems paradoxical.
>
> Huh?
>
> The DMA API operates on the current domain of the device, be it
> IDENTITY or UNMANAGED.
>
> You have to copy whatever the current domain is to the PASID and you
> should definately not check if it is DMA or something.
>
right, the PASID works for IOVA, passthrough. I misunderstood the v1
review comments. I will go back to use iommu_get_domain_for_dev() but w/o
checking domain types.

Thanks all,

Jacob

2023-03-28 18:38:29

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] dmaengine/idxd: Re-enable kernel workqueue under DMA API

Hi, Jacob,

On 3/27/23 16:21, Jacob Pan wrote:
> Kernel workqueues were disabled due to flawed use of kernel VA and SVA
> API. Now That we have the support for attaching PASID to the device's

s/That/that/

> default domain and the ability to reserve global PASIDs from SVA APIs,
> we can re-enable the kernel work queues and use them under DMA API.
>
> We also use non-privileged access for in-kernel DMA to be consistent
> with the IOMMU settings. Consequently, interrupt for user privilege is
> enabled for work completion IRQs.
>
> Link:https://lore.kernel.org/linux-iommu/[email protected]/
> Reviewed-by: Dave Jiang <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/dma/idxd/device.c | 30 ++++-------------------
> drivers/dma/idxd/init.c | 51 ++++++++++++++++++++++++++++++++++++---
> drivers/dma/idxd/sysfs.c | 7 ------
> 3 files changed, 52 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> index 6fca8fa8d3a8..f6b133d61a04 100644
> --- a/drivers/dma/idxd/device.c
> +++ b/drivers/dma/idxd/device.c
> @@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd)
> }
> }
>
> -static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
> -{
> - struct idxd_device *idxd = wq->idxd;
> - union wqcfg wqcfg;
> - unsigned int offset;
> -
> - offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
> - spin_lock(&idxd->dev_lock);
> - wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base + offset);
> - wqcfg.priv = priv;
> - wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
> - iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base + offset);
> - spin_unlock(&idxd->dev_lock);
> -}
> -
> static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
> {
> struct idxd_device *idxd = wq->idxd;
> @@ -1324,15 +1309,14 @@ int drv_enable_wq(struct idxd_wq *wq)
> }
>
> /*
> - * In the event that the WQ is configurable for pasid and priv bits.
> - * For kernel wq, the driver should setup the pasid, pasid_en, and priv bit.
> - * However, for non-kernel wq, the driver should only set the pasid_en bit for
> - * shared wq. A dedicated wq that is not 'kernel' type will configure pasid and
> + * In the event that the WQ is configurable for pasid, the driver
> + * should setup the pasid, pasid_en bit. This is true for both kernel
> + * and user shared workqueues. There is no need to setup priv bit in
> + * that in-kernel DMA will also do user privileged requests.
> + * A dedicated wq that is not 'kernel' type will configure pasid and
> * pasid_en later on so there is no need to setup.
> */
> if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
> - int priv = 0;
> -
> if (wq_pasid_enabled(wq)) {
> if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
> u32 pasid = wq_dedicated(wq) ? idxd->pasid : 0;
> @@ -1340,10 +1324,6 @@ int drv_enable_wq(struct idxd_wq *wq)
> __idxd_wq_set_pasid_locked(wq, pasid);
> }
> }
> -
> - if (is_idxd_wq_kernel(wq))
> - priv = 1;
> - __idxd_wq_set_priv_locked(wq, priv);
> }
>
> rc = 0;
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index e6ee267da0ff..a3396e1b38f1 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -506,14 +506,56 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>
> static int idxd_enable_system_pasid(struct idxd_device *idxd)
> {
> - return -EOPNOTSUPP;
> + struct pci_dev *pdev = idxd->pdev;
> + struct device *dev = &pdev->dev;
> + struct iommu_domain *domain;
> + union gencfg_reg gencfg;
> + ioasid_t pasid;
> + int ret;
> +
> + /*
> + * Attach a global PASID to the DMA domain so that we can use ENQCMDS
> + * to submit work on buffers mapped by DMA API.
> + */
> + domain = iommu_get_dma_domain(dev);
> + if (!domain)
> + return -EPERM;
> +
> + pasid = iommu_sva_reserve_pasid(1, dev->iommu->max_pasids);
> + if (pasid == IOMMU_PASID_INVALID)
> + return -ENOSPC;
> +
> + ret = iommu_attach_device_pasid(domain, dev, pasid);
> + if (ret) {
> + dev_err(dev, "failed to attach device pasid %d, domain type %d",
> + pasid, domain->type);
> + iommu_sva_release_pasid(pasid);
> + return ret;
> + }
> +
> + /* Since we set user privilege for kernel DMA, enable completion IRQ */
> + gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
> + gencfg.user_int_en = 1;
> + iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
> + idxd->pasid = pasid;
> +
> + return ret;
> }
>
> static void idxd_disable_system_pasid(struct idxd_device *idxd)
> {
> + struct pci_dev *pdev = idxd->pdev;
> + struct device *dev = &pdev->dev;
> + struct iommu_domain *domain;
> +
> + domain = iommu_get_domain_for_dev(dev);
> + if (!domain || domain->type == IOMMU_DOMAIN_BLOCKED)
> + return;
>
> - iommu_sva_unbind_device(idxd->sva);
> + iommu_detach_device_pasid(domain, dev, idxd->pasid);
> + iommu_sva_release_pasid(idxd->pasid);

May need gencfg.user_int_en = 0 here.

> idxd->sva = NULL;
> + idxd->pasid = IOMMU_PASID_INVALID;
> }
>
> static int idxd_probe(struct idxd_device *idxd)
> @@ -535,8 +577,9 @@ static int idxd_probe(struct idxd_device *idxd)
> } else {
> set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
>
> - if (idxd_enable_system_pasid(idxd))
> - dev_warn(dev, "No in-kernel DMA with PASID.\n");
> + rc = idxd_enable_system_pasid(idxd);
> + if (rc)
> + dev_warn(dev, "No in-kernel DMA with PASID. %d\n", rc);
> else
> set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> }
> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index 18cd8151dee0..c5561c00a503 100644
> --- a/drivers/dma/idxd/sysfs.c
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -944,13 +944,6 @@ static ssize_t wq_name_store(struct device *dev,
> if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
> return -EINVAL;
>
> - /*
> - * This is temporarily placed here until we have SVM support for
> - * dmaengine.
> - */
> - if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
> - return -EOPNOTSUPP;
> -
> input = kstrndup(buf, count, GFP_KERNEL);
> if (!input)
> return -ENOMEM;

Thanks.

-Fenghua

2023-03-28 20:22:38

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 8/8] dmaengine/idxd: Re-enable kernel workqueue under DMA API

Hi Fenghua,

On Tue, 28 Mar 2023 11:16:31 -0700, Fenghua Yu <[email protected]> wrote:

> Hi, Jacob,
>
> On 3/27/23 16:21, Jacob Pan wrote:
> > Kernel workqueues were disabled due to flawed use of kernel VA and SVA
> > API. Now That we have the support for attaching PASID to the device's
>
> s/That/that/
will fix

> > default domain and the ability to reserve global PASIDs from SVA APIs,
> > we can re-enable the kernel work queues and use them under DMA API.
> >
> > We also use non-privileged access for in-kernel DMA to be consistent
> > with the IOMMU settings. Consequently, interrupt for user privilege is
> > enabled for work completion IRQs.
> >
> > Link:https://lore.kernel.org/linux-iommu/[email protected]/
> > Reviewed-by: Dave Jiang <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/dma/idxd/device.c | 30 ++++-------------------
> > drivers/dma/idxd/init.c | 51 ++++++++++++++++++++++++++++++++++++---
> > drivers/dma/idxd/sysfs.c | 7 ------
> > 3 files changed, 52 insertions(+), 36 deletions(-)
> >
> > diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
> > index 6fca8fa8d3a8..f6b133d61a04 100644
> > --- a/drivers/dma/idxd/device.c
> > +++ b/drivers/dma/idxd/device.c
> > @@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device
> > *idxd) }
> > }
> >
> > -static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
> > -{
> > - struct idxd_device *idxd = wq->idxd;
> > - union wqcfg wqcfg;
> > - unsigned int offset;
> > -
> > - offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
> > - spin_lock(&idxd->dev_lock);
> > - wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base +
> > offset);
> > - wqcfg.priv = priv;
> > - wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
> > - iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base +
> > offset);
> > - spin_unlock(&idxd->dev_lock);
> > -}
> > -
> > static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
> > {
> > struct idxd_device *idxd = wq->idxd;
> > @@ -1324,15 +1309,14 @@ int drv_enable_wq(struct idxd_wq *wq)
> > }
> >
> > /*
> > - * In the event that the WQ is configurable for pasid and priv
> > bits.
> > - * For kernel wq, the driver should setup the pasid, pasid_en,
> > and priv bit.
> > - * However, for non-kernel wq, the driver should only set the
> > pasid_en bit for
> > - * shared wq. A dedicated wq that is not 'kernel' type will
> > configure pasid and
> > + * In the event that the WQ is configurable for pasid, the
> > driver
> > + * should setup the pasid, pasid_en bit. This is true for both
> > kernel
> > + * and user shared workqueues. There is no need to setup priv
> > bit in
> > + * that in-kernel DMA will also do user privileged requests.
> > + * A dedicated wq that is not 'kernel' type will configure
> > pasid and
> > * pasid_en later on so there is no need to setup.
> > */
> > if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
> > - int priv = 0;
> > -
> > if (wq_pasid_enabled(wq)) {
> > if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
> > u32 pasid = wq_dedicated(wq) ?
> > idxd->pasid : 0; @@ -1340,10 +1324,6 @@ int drv_enable_wq(struct
> > idxd_wq *wq) __idxd_wq_set_pasid_locked(wq, pasid);
> > }
> > }
> > -
> > - if (is_idxd_wq_kernel(wq))
> > - priv = 1;
> > - __idxd_wq_set_priv_locked(wq, priv);
> > }
> >
> > rc = 0;
> > diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> > index e6ee267da0ff..a3396e1b38f1 100644
> > --- a/drivers/dma/idxd/init.c
> > +++ b/drivers/dma/idxd/init.c
> > @@ -506,14 +506,56 @@ static struct idxd_device *idxd_alloc(struct
> > pci_dev *pdev, struct idxd_driver_d
> > static int idxd_enable_system_pasid(struct idxd_device *idxd)
> > {
> > - return -EOPNOTSUPP;
> > + struct pci_dev *pdev = idxd->pdev;
> > + struct device *dev = &pdev->dev;
> > + struct iommu_domain *domain;
> > + union gencfg_reg gencfg;
> > + ioasid_t pasid;
> > + int ret;
> > +
> > + /*
> > + * Attach a global PASID to the DMA domain so that we can use
> > ENQCMDS
> > + * to submit work on buffers mapped by DMA API.
> > + */
> > + domain = iommu_get_dma_domain(dev);
> > + if (!domain)
> > + return -EPERM;
> > +
> > + pasid = iommu_sva_reserve_pasid(1, dev->iommu->max_pasids);
> > + if (pasid == IOMMU_PASID_INVALID)
> > + return -ENOSPC;
> > +
> > + ret = iommu_attach_device_pasid(domain, dev, pasid);
> > + if (ret) {
> > + dev_err(dev, "failed to attach device pasid %d, domain
> > type %d",
> > + pasid, domain->type);
> > + iommu_sva_release_pasid(pasid);
> > + return ret;
> > + }
> > +
> > + /* Since we set user privilege for kernel DMA, enable
> > completion IRQ */
> > + gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
> > + gencfg.user_int_en = 1;
> > + iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
> > + idxd->pasid = pasid;
> > +
> > + return ret;
> > }
> >
> > static void idxd_disable_system_pasid(struct idxd_device *idxd)
> > {
> > + struct pci_dev *pdev = idxd->pdev;
> > + struct device *dev = &pdev->dev;
> > + struct iommu_domain *domain;
> > +
> > + domain = iommu_get_domain_for_dev(dev);
> > + if (!domain || domain->type == IOMMU_DOMAIN_BLOCKED)
> > + return;
> >
> > - iommu_sva_unbind_device(idxd->sva);
> > + iommu_detach_device_pasid(domain, dev, idxd->pasid);
> > + iommu_sva_release_pasid(idxd->pasid);
>
> May need gencfg.user_int_en = 0 here.
Yes, good catch!


Thanks,

Jacob

2023-03-28 20:41:06

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 5/8] iommu/vt-d: Make device pasid attachment explicit

Hi Kevin,

On Tue, 28 Mar 2023 07:44:52 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Baolu Lu <[email protected]>
> > Sent: Tuesday, March 28, 2023 1:49 PM
> >
> > On 3/28/23 7:21 AM, Jacob Pan wrote:
> > > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > > index 65b15be72878..b6c26f25d1ba 100644
> > > --- a/drivers/iommu/intel/iommu.h
> > > +++ b/drivers/iommu/intel/iommu.h
> > > @@ -595,6 +595,7 @@ struct dmar_domain {
> > >
> > > spinlock_t lock; /* Protect device tracking
> > > lists */ struct list_head devices; /* all devices' list */
> > > + struct list_head dev_pasids; /* all attached pasids */
> > >
> > > struct dma_pte *pgd; /* virtual
> > > address */ int gaw; /* max guest
> > > address width */ @@ -708,6 +709,7 @@ struct device_domain_info {
> > > u8 ats_supported:1;
> > > u8 ats_enabled:1;
> > > u8 dtlb_extra_inval:1; /* Quirk for devices need
> > > extra flush */
> > > + u8 dev_attached:1; /* Device context activated */
> > > u8 ats_qdep;
> > > struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> > > struct intel_iommu *iommu; /* IOMMU used by this device */
> > > @@ -715,6 +717,12 @@ struct device_domain_info {
> > > struct pasid_table *pasid_table; /* pasid table */
> > > };
> > >
> > > +struct device_pasid_info {
> > > + struct list_head link_domain; /* link to domain
> > > siblings */
> > > + struct device *dev; /* physical device
> > > derived from */
> > > + ioasid_t pasid; /* PASID on physical
> > > device */ +};
> >
> > The dev_pasids list seems to be duplicate with iommu_group::pasid_array.
> >
> > The pasid_array is de facto per-device as the PCI subsystem requires ACS
> > to be enabled on the upstream path to the root port.
> >
> > pci_enable_pasid():
> > 385 if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR |
> > PCI_ACS_UF)) 386 return -EINVAL;
> >
> > For such PCI topology, pci_device_group() always assigns an exclusive
> > iommu group (a.k.a. singleton group).
> >
> > So, how about moving the pasid_array from struct iommu_group to struct
> > dev_iommu? With this refactoring, the individual iommu driver has no
> > need to create their own pasid array or list.
> >
> > Instead of using iommu_group::mutex, perhaps the pasid_array needs its
> > own lock in struct dev_iommu after moving.
> >
>
> What you suggested is a right thing and more friendly to pasid attach
> in iommufd [1].
>
> but dev_pasids list here is a different thing. It tracks which [device,
> pasid] is attached to the domain. w/o this information you'll have to
> walk the pasid_array of every attached device under the domain and search
> for every pasid entry pointing to the said domain. It's very inefficient.
>
> of course if this can be done more generally it'd be nice.????
>
> [1] https://lore.kernel.org/linux-iommu/[email protected]/
Yes, it would be nice as the next step. But so far only ENQCMDS usages may
not justify.

Thanks,

Jacob

2023-03-28 20:58:02

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

Hi Baolu,

On Tue, 28 Mar 2023 09:29:19 -0700, Jacob Pan
<[email protected]> wrote:

> > > On VT-d platforms, RID_PASID is used for DMA request without PASID. We
> > > should not treat RID_PASID special instead let it be allocated from
> > > the global SVA PASID number space.
> >
> > It's same to AMD and ARM SMMUv3, right? They also need an explicit
> > reservation of PASID 0.
> >
> yes, all IOMMU drivers need to do that. I will give it a try but might
> need help to place the call.
It might be simpler to just let SVA code allocate from 1 up instead of 0
(as is in the current code). Global PASID allocator would still allow the
full range from 0 to max. Then there is no change to architectures that
don't support non-zero RID_PASID. For VT-d, it would still work in the
future when we have nonzero RID_PASID. is that reasonable?

Thanks,

Jacob

2023-03-29 03:34:30

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 6/8] iommu/vt-d: Implement set_dev_pasid domain op

> From: Jacob Pan <[email protected]>
> Sent: Tuesday, March 28, 2023 11:40 PM
>
> Hi Kevin,
>
> On Tue, 28 Mar 2023 07:47:45 +0000, "Tian, Kevin" <[email protected]>
> wrote:
>
> > > From: Jacob Pan <[email protected]>
> > > Sent: Tuesday, March 28, 2023 7:22 AM
> > >
> > > Devices that use ENQCMDS to submit work on buffers mapped by DMA
> API
> > > must attach a PASID to the default domain of the device. In preparation
> > > for this use case, this patch implements set_dev_pasid() for the
> > > default_domain_ops. Besides PASID attachment, device will also be
> > > attached to the domain as the result of this call if the device has not
> > > been attached before.
> > >
> >
> > I didn't get the last point. PASID attach should only have the scope
> > for the pasid. RID of the device might be attached to another domain
> > which shouldn't be changed by this call.
> I meant if the RID context has not been set up before attaching this PASID,
> this call will also set up the context, PASID dir etc. In the end, we
> eliminated ordering requirement of attaching device, RID_PASID first, then
> other PASIDs.
> How about:
> "If the device context has not been set up prior to this call, this will
> set up the device context in addition to PASID attachment."
>

this is clearer.

2023-03-29 06:26:15

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 5/8] iommu/vt-d: Make device pasid attachment explicit

On 3/28/23 3:44 PM, Tian, Kevin wrote:
>> From: Baolu Lu <[email protected]>
>> Sent: Tuesday, March 28, 2023 1:49 PM
>>
>> On 3/28/23 7:21 AM, Jacob Pan wrote:
>>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
>>> index 65b15be72878..b6c26f25d1ba 100644
>>> --- a/drivers/iommu/intel/iommu.h
>>> +++ b/drivers/iommu/intel/iommu.h
>>> @@ -595,6 +595,7 @@ struct dmar_domain {
>>>
>>> spinlock_t lock; /* Protect device tracking lists */
>>> struct list_head devices; /* all devices' list */
>>> + struct list_head dev_pasids; /* all attached pasids */
>>>
>>> struct dma_pte *pgd; /* virtual address */
>>> int gaw; /* max guest address width */
>>> @@ -708,6 +709,7 @@ struct device_domain_info {
>>> u8 ats_supported:1;
>>> u8 ats_enabled:1;
>>> u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
>>> + u8 dev_attached:1; /* Device context activated */
>>> u8 ats_qdep;
>>> struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
>>> struct intel_iommu *iommu; /* IOMMU used by this device */
>>> @@ -715,6 +717,12 @@ struct device_domain_info {
>>> struct pasid_table *pasid_table; /* pasid table */
>>> };
>>>
>>> +struct device_pasid_info {
>>> + struct list_head link_domain; /* link to domain siblings */
>>> + struct device *dev; /* physical device derived from */
>>> + ioasid_t pasid; /* PASID on physical device */
>>> +};
>>
>> The dev_pasids list seems to be duplicate with iommu_group::pasid_array.
>>
>> The pasid_array is de facto per-device as the PCI subsystem requires ACS
>> to be enabled on the upstream path to the root port.
>>
>> pci_enable_pasid():
>> 385 if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
>> 386 return -EINVAL;
>>
>> For such PCI topology, pci_device_group() always assigns an exclusive
>> iommu group (a.k.a. singleton group).
>>
>> So, how about moving the pasid_array from struct iommu_group to struct
>> dev_iommu? With this refactoring, the individual iommu driver has no
>> need to create their own pasid array or list.
>>
>> Instead of using iommu_group::mutex, perhaps the pasid_array needs its
>> own lock in struct dev_iommu after moving.
>>
>
> What you suggested is a right thing and more friendly to pasid attach
> in iommufd [1].
>
> but dev_pasids list here is a different thing. It tracks which [device, pasid]
> is attached to the domain. w/o this information you'll have to walk the
> pasid_array of every attached device under the domain and search for
> every pasid entry pointing to the said domain. It's very inefficient.
>
> of course if this can be done more generally it'd be nice.????
>
> [1] https://lore.kernel.org/linux-iommu/[email protected]/

Ah, yes. You are right. I was confused.

Best regards,
baolu

2023-03-29 06:29:18

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 6/8] iommu/vt-d: Implement set_dev_pasid domain op

On 3/28/23 11:40 PM, Jacob Pan wrote:
> Hi Kevin,
>
> On Tue, 28 Mar 2023 07:47:45 +0000, "Tian, Kevin" <[email protected]>
> wrote:
>
>>> From: Jacob Pan <[email protected]>
>>> Sent: Tuesday, March 28, 2023 7:22 AM
>>>
>>> Devices that use ENQCMDS to submit work on buffers mapped by DMA API
>>> must attach a PASID to the default domain of the device. In preparation
>>> for this use case, this patch implements set_dev_pasid() for the
>>> default_domain_ops. Besides PASID attachment, device will also be
>>> attached to the domain as the result of this call if the device has not
>>> been attached before.
>>>
>>
>> I didn't get the last point. PASID attach should only have the scope
>> for the pasid. RID of the device might be attached to another domain
>> which shouldn't be changed by this call.
> I meant if the RID context has not been set up before attaching this PASID,
> this call will also set up the context, PASID dir etc. In the end, we
> eliminated ordering requirement of attaching device, RID_PASID first, then
> other PASIDs.

This occurs in default domain deferred attaching case.

> How about:
> "If the device context has not been set up prior to this call, this will
> set up the device context in addition to PASID attachment."

Best regards,
baolu

2023-03-29 06:39:39

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

On 3/29/23 4:52 AM, Jacob Pan wrote:
> On Tue, 28 Mar 2023 09:29:19 -0700, Jacob Pan
> <[email protected]> wrote:
>
>>>> On VT-d platforms, RID_PASID is used for DMA request without PASID. We
>>>> should not treat RID_PASID special instead let it be allocated from
>>>> the global SVA PASID number space.
>>> It's same to AMD and ARM SMMUv3, right? They also need an explicit
>>> reservation of PASID 0.
>>>
>> yes, all IOMMU drivers need to do that. I will give it a try but might
>> need help to place the call.
> It might be simpler to just let SVA code allocate from 1 up instead of 0
> (as is in the current code). Global PASID allocator would still allow the
> full range from 0 to max. Then there is no change to architectures that
> don't support non-zero RID_PASID. For VT-d, it would still work in the
> future when we have nonzero RID_PASID. is that reasonable?

Yes. It's reasonable from the status quo.

Best regards,
baolu

2023-03-29 06:41:11

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 7/8] iommu: Export iommu_get_dma_domain

On 3/28/23 11:48 PM, Jacob Pan wrote:
> On Tue, 28 Mar 2023 14:04:15 +0800, Baolu Lu<[email protected]>
> wrote:
>
>> On 3/28/23 7:21 AM, Jacob Pan wrote:
>>> Devices that use ENQCMDS to submit work needs to retrieve its DMA
>>> domain. It can then attach PASID to the DMA domain for shared mapping
>>> (with RID) established by DMA API.
>>>
>>> Signed-off-by: Jacob Pan<[email protected]>
>>> ---
>>> drivers/iommu/iommu.c | 1 +
>>> include/linux/iommu.h | 5 +++++
>>> 2 files changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index 10db680acaed..c51d343a75d2 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -2118,6 +2118,7 @@ struct iommu_domain *iommu_get_dma_domain(struct
>>> device *dev) {
>>> return dev->iommu_group->default_domain;
>>> }
>>> +EXPORT_SYMBOL_GPL(iommu_get_dma_domain);
>> Directly exporting this function for external use seems unsafe. If the
>> caller is the kernel driver for this device, it's fine because default
>> domain remains unchanged during the life cycle of the driver. Otherwise,
>> using this function may cause UAF. Keep in mind that group's default
>> domain could be changed through sysfs.
> don't you have to unload the driver?

Yes, of cause.

Hence, the getting domain interfaces are only safe to be used in the
driver of the device.

Best regards,
baolu

2023-03-29 08:23:08

by Vasant Hegde

[permalink] [raw]
Subject: Re: [PATCH v2 4/8] iommu/vt-d: Reserve RID_PASID from global SVA PASID space

On 3/28/2023 10:50 AM, Baolu Lu wrote:
> On 3/28/23 7:21 AM, Jacob Pan wrote:
>> On VT-d platforms, RID_PASID is used for DMA request without PASID. We
>> should not treat RID_PASID special instead let it be allocated from the
>> global SVA PASID number space.
>
> It's same to AMD and ARM SMMUv3, right? They also need an explicit
> reservation of PASID 0.

Yes for AMD driver. (Requests from the I/O device without a PASID are treated as
if they have PASID of 0).


-Vasant