2023-04-27 17:48:04

by Jacob Pan

Subject: [PATCH v5 0/7] Re-enable IDXD kernel workqueue under DMA API

Hi all,

IDXD kernel work queues were disabled due to the flawed use of kernel
virtual addresses (VA) with the SVA API.
Link: https://lore.kernel.org/linux-iommu/[email protected]/

The solution is to enable them under the DMA API, where IDXD shared
workqueue users can use ENQCMDS to submit work on buffers mapped by the
DMA API.

This patchset adds support for attaching a PASID to the device's default
domain and the ability to allocate global PASIDs from the IOMMU APIs. We
can then re-enable the kernel work queues and use them under the DMA API.
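
For reference, a minimal sketch of the intended flow for a kernel work
submitter, using the APIs added/used by this series (struct my_desc, the
portal argument, and the error handling are illustrative only):

	static int example_submit(struct device *dev, void __iomem *portal,
				  struct my_desc *desc, void *buf, size_t len)
	{
		struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
		dma_addr_t dma_addr;
		ioasid_t pasid;
		int ret;

		if (!domain)
			return -EPERM;

		pasid = iommu_alloc_global_pasid_dev(dev);	/* added in patch 3 */
		if (!pasid_valid(pasid))
			return -ENOSPC;

		/* attach the PASID to the default DMA domain (patches 4-6) */
		ret = iommu_attach_device_pasid(domain, dev, pasid);
		if (ret) {
			iommu_free_global_pasid(pasid);
			return ret;
		}

		/* buffers come from the regular DMA API ... */
		dma_addr = dma_map_single(dev, buf, len, DMA_BIDIRECTIONAL);

		/* ... and the work descriptor is tagged with the PASID for ENQCMDS */
		desc->pasid = pasid;
		desc->addr = dma_addr;

		return enqcmds(portal, desc);	/* retry handling omitted */
	}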

This depends on the IOASID removal series (now merged):
https://lore.kernel.org/all/[email protected]/


Thanks,

Jacob

---
Changelog:
v5:
- exclude two patches related to supervisor mode, taken by VT-d
maintainer Baolu.
- move PASID range check into allocation API so that device drivers
only need to pass in struct device*. (Kevin)
- factor out helper functions in device-domain attach (Baolu)
- make explicit use of RID_PASID across architectures
v4:
- move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
- dropped domain type check while disabling idxd system PASID (Baolu)

v3:
- moved global PASID allocation API from SVA to IOMMU (Kevin)
- remove #ifdef around global PASID reservation during boot (Baolu)
- remove restriction on PASID 0 allocation (Baolu)
- fix a bug in sysfs domain change when attaching devices
- clear idxd user interrupt enable bit after disabling device (Fenghua)
v2:
- refactored device PASID attach domain ops based on Baolu's early patch
- addressed TLB flush gap
- explicitly reserve RID_PASID from SVA PASID number space
- get dma domain directly, avoid checking domain types


Jacob Pan (6):
iommu: Generalize default PCIe requester ID PASID
iommu/sva: Explicitly exclude RID_PASID from SVA
iommu: Move global PASID allocation from SVA to core
iommu/vt-d: Prepare PASID attachment beyond RID_PASID
iommu/vt-d: Implement set_dev_pasid domain op
dmaengine/idxd: Re-enable kernel workqueue under DMA API

Lu Baolu (1):
iommu/vt-d: Factoring out PASID set up helper function

drivers/dma/idxd/device.c | 30 +----
drivers/dma/idxd/init.c | 60 ++++++++-
drivers/dma/idxd/sysfs.c | 7 --
drivers/iommu/intel/iommu.c | 240 +++++++++++++++++++++++++++++-------
drivers/iommu/intel/iommu.h | 8 ++
drivers/iommu/intel/pasid.c | 2 +-
drivers/iommu/intel/pasid.h | 1 -
drivers/iommu/iommu-sva.c | 25 +---
drivers/iommu/iommu.c | 24 ++++
include/linux/iommu.h | 10 ++
10 files changed, 305 insertions(+), 102 deletions(-)

--
2.25.1


2023-04-27 17:48:15

by Jacob Pan

Subject: [PATCH v5 2/7] iommu/sva: Explicitly exclude RID_PASID from SVA

SVA PASID allocation is hardcoded to start from 1 because 0 is used for
RID_PASID. Let's make this explicit to avoid potential conflicts.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/iommu-sva.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index c434b95dc8eb..ac7c93bacb5c 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -66,7 +66,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
return ERR_PTR(-EOPNOTSUPP);

/* Allocate mm->pasid if necessary. */
- ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+ ret = iommu_sva_alloc_pasid(mm, IOMMU_DEF_RID_PASID + 1, max_pasids - 1);
if (ret)
return ERR_PTR(ret);

--
2.25.1

2023-04-27 17:48:23

by Jacob Pan

Subject: [PATCH v5 1/7] iommu: Generalize default PCIe requester ID PASID

PCIe Process address space ID (PASID) is used to tag DMA traffic; it
provides finer-grained isolation than the requester ID (RID).

For each RID, 0 is reserved as a special PASID for legacy DMA (without
PASID), hence the name RID_PASID. This is universal across all
architectures and therefore warrants a declaration in the common header.
Note that VT-d could support a non-zero RID_PASID, but this is currently
not used.

By having a common RID_PASID, we can avoid conflicts between different
use cases in the generic code, e.g. SVA and DMA API with PASIDs.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 24 ++++++++++++------------
drivers/iommu/intel/pasid.c | 2 +-
drivers/iommu/intel/pasid.h | 1 -
include/linux/iommu.h | 1 +
4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f737ef55463..9ec45e0497cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -877,7 +877,7 @@ void dmar_fault_dump_ptes(struct intel_iommu *iommu, u16 source_id,
}
/* For request-without-pasid, get the pasid from context entry */
if (intel_iommu_sm && pasid == IOMMU_PASID_INVALID)
- pasid = PASID_RID2PASID;
+ pasid = IOMMU_DEF_RID_PASID;

dir_index = pasid >> PASID_PDE_SHIFT;
pde = &dir[dir_index];
@@ -1480,7 +1480,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
qdep = info->ats_qdep;
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, addr, mask);
- quirk_extra_dev_tlb_flush(info, addr, mask, PASID_RID2PASID, qdep);
+ quirk_extra_dev_tlb_flush(info, addr, mask, IOMMU_DEF_RID_PASID, qdep);
}

static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
@@ -1514,7 +1514,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
ih = 1 << 6;

if (domain->use_first_level) {
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+ qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, addr, pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;

@@ -1584,7 +1584,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
u16 did = domain_id_iommu(dmar_domain, iommu);

if (dmar_domain->use_first_level)
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
+ qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
@@ -1976,7 +1976,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
context_pdts(pds);

/* Setup the RID_PASID field: */
- context_set_sm_rid2pasid(context, PASID_RID2PASID);
+ context_set_sm_rid2pasid(context, IOMMU_DEF_RID_PASID);

/*
* Setup the Device-TLB enable bit and Page request
@@ -2455,13 +2455,13 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
/* Setup the PASID entry for requests without PASID: */
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain,
- dev, PASID_RID2PASID);
+ dev, IOMMU_DEF_RID_PASID);
else if (domain->use_first_level)
ret = domain_setup_first_level(iommu, domain, dev,
- PASID_RID2PASID);
+ IOMMU_DEF_RID_PASID);
else
ret = intel_pasid_setup_second_level(iommu, domain,
- dev, PASID_RID2PASID);
+ dev, IOMMU_DEF_RID_PASID);
if (ret) {
dev_err(dev, "Setup RID2PASID failed\n");
device_block_translation(dev);
@@ -3997,7 +3997,7 @@ static void dmar_remove_one_dev_info(struct device *dev)
if (!dev_is_real_dma_subdevice(info->dev)) {
if (dev_is_pci(info->dev) && sm_supported(iommu))
intel_pasid_tear_down_entry(iommu, info->dev,
- PASID_RID2PASID, false);
+ IOMMU_DEF_RID_PASID, false);

iommu_disable_pci_caps(info);
domain_context_clear(info);
@@ -4026,7 +4026,7 @@ static void device_block_translation(struct device *dev)
if (!dev_is_real_dma_subdevice(dev)) {
if (sm_supported(iommu))
intel_pasid_tear_down_entry(iommu, dev,
- PASID_RID2PASID, false);
+ IOMMU_DEF_RID_PASID, false);
else
domain_context_clear(info);
}
@@ -4360,7 +4360,7 @@ static void domain_set_force_snooping(struct dmar_domain *domain)

list_for_each_entry(info, &domain->devices, link)
intel_pasid_setup_page_snoop_control(info->iommu, info->dev,
- PASID_RID2PASID);
+ IOMMU_DEF_RID_PASID);
}

static bool intel_iommu_enforce_cache_coherency(struct iommu_domain *domain)
@@ -4948,7 +4948,7 @@ void quirk_extra_dev_tlb_flush(struct device_domain_info *info,
return;

sid = PCI_DEVID(info->bus, info->devfn);
- if (pasid == PASID_RID2PASID) {
+ if (pasid == IOMMU_DEF_RID_PASID) {
qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
qdep, address, mask);
} else {
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..e8c60af8591b 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -438,7 +438,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
* SVA usage, device could do DMA with multiple PASIDs. It is more
* efficient to flush devTLB specific to the PASID.
*/
- if (pasid == PASID_RID2PASID)
+ if (pasid == IOMMU_DEF_RID_PASID)
qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
else
qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..027d30afaba6 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -10,7 +10,6 @@
#ifndef __INTEL_PASID_H
#define __INTEL_PASID_H

-#define PASID_RID2PASID 0x0
#define PASID_MIN 0x1
#define PASID_MAX 0x100000
#define PASID_PTE_MASK 0x3F
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 54f535ff9868..15aa4a1f7b1a 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -190,6 +190,7 @@ enum iommu_dev_features {
IOMMU_DEV_FEAT_IOPF,
};

+#define IOMMU_DEF_RID_PASID (0U) /* Reserved for DMA w/o PASID */
#define IOMMU_PASID_INVALID (-1U)
typedef unsigned int ioasid_t;

--
2.25.1

2023-04-27 17:48:24

by Jacob Pan

Subject: [PATCH v5 4/7] iommu/vt-d: Factoring out PASID set up helper function

From: Lu Baolu <[email protected]>

To prepare for non-RID_PASID attachment, factor out common code so that
it can be reused. PASID entry setup is common to all DMA API PASIDs.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 42 ++++++++++++++++++++++---------------
1 file changed, 25 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9ec45e0497cc..cb586849a1ee 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2429,6 +2429,26 @@ static int __init si_domain_init(int hw)
return 0;
}

+static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ struct device *dev, ioasid_t pasid)
+{
+ int ret;
+
+ /* PASID table is mandatory for a PCI device in scalable mode. */
+ if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
+ return -EOPNOTSUPP;
+
+ if (hw_pass_through && domain_type_is_si(domain))
+ ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
+ else if (domain->use_first_level)
+ ret = domain_setup_first_level(iommu, domain, dev, pasid);
+ else
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
+
+ return 0;
+}
+
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -2450,23 +2470,11 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
list_add(&info->link, &domain->devices);
spin_unlock_irqrestore(&domain->lock, flags);

- /* PASID table is mandatory for a PCI device in scalable mode. */
- if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
- /* Setup the PASID entry for requests without PASID: */
- if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, domain,
- dev, IOMMU_DEF_RID_PASID);
- else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- IOMMU_DEF_RID_PASID);
- else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, IOMMU_DEF_RID_PASID);
- if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- return ret;
- }
+ ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
+ IOMMU_DEF_RID_PASID);
+ if (ret) {
+ dev_err(dev, "Setup RID2PASID failed\n");
+ device_block_translation(dev);
}

ret = domain_context_mapping(domain, dev);
--
2.25.1

2023-04-27 17:48:40

by Jacob Pan

Subject: [PATCH v5 3/7] iommu: Move global PASID allocation from SVA to core

Devices that use Intel ENQCMD to submit work must use global PASIDs in
that the PASID are stored in a per CPU MSR. When such device need to
submit work for in-kernel DMA with PASID, it must allocate PASIDs from
the same global number space to avoid conflict.

This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
It is expected that device drivers will use the allocated PASIDs to
attach to appropriate IOMMU domains for use.
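
For example, a driver is expected to pair the two new calls roughly as
below (a sketch only, mirroring what the idxd patch later in this series
does; the error code is illustrative):

	pasid = iommu_alloc_global_pasid_dev(dev);
	if (!pasid_valid(pasid))
		return -ENOSPC;

	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		iommu_free_global_pasid(pasid);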

Signed-off-by: Jacob Pan <[email protected]>
---
v5: move PASID range check inside API so that device drivers only pass
in struct device* (Kevin)
v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
---
drivers/iommu/iommu-sva.c | 25 ++++++-------------------
drivers/iommu/iommu.c | 24 ++++++++++++++++++++++++
include/linux/iommu.h | 9 +++++++++
3 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index ac7c93bacb5c..f07997b37cea 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -9,27 +9,19 @@
#include "iommu-sva.h"

static DEFINE_MUTEX(iommu_sva_lock);
-static DEFINE_IDA(iommu_global_pasid_ida);

/* Allocate a PASID for the mm within range (inclusive) */
-static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
{
int ret = 0;

- if (!pasid_valid(min) || !pasid_valid(max) ||
- min == 0 || max < min)
- return -EINVAL;
-
mutex_lock(&iommu_sva_lock);
/* Is a PASID already associated with this mm? */
- if (pasid_valid(mm->pasid)) {
- if (mm->pasid < min || mm->pasid > max)
- ret = -EOVERFLOW;
+ if (pasid_valid(mm->pasid))
goto out;
- }

- ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
- if (ret < min)
+ ret = iommu_alloc_global_pasid_dev(dev);
+ if (!pasid_valid(ret))
goto out;
mm->pasid = ret;
ret = 0;
@@ -58,15 +50,10 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
{
struct iommu_domain *domain;
struct iommu_sva *handle;
- ioasid_t max_pasids;
int ret;

- max_pasids = dev->iommu->max_pasids;
- if (!max_pasids)
- return ERR_PTR(-EOPNOTSUPP);
-
/* Allocate mm->pasid if necessary. */
- ret = iommu_sva_alloc_pasid(mm, IOMMU_DEF_RID_PASID + 1, max_pasids - 1);
+ ret = iommu_sva_alloc_pasid(mm, dev);
if (ret)
return ERR_PTR(ret);

@@ -211,5 +198,5 @@ void mm_pasid_drop(struct mm_struct *mm)
if (likely(!pasid_valid(mm->pasid)))
return;

- ida_free(&iommu_global_pasid_ida, mm->pasid);
+ iommu_free_global_pasid(mm->pasid);
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10db680acaed..6314513c00e6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,6 +38,7 @@

static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
+static DEFINE_IDA(iommu_global_pasid_ida);

static unsigned int iommu_def_domain_type __read_mostly;
static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
@@ -3450,3 +3451,26 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,

return domain;
}
+
+ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
+{
+ int ret;
+ ioasid_t max;
+
+ max = dev_iommu_get_max_pasids(dev);
+ ret = ida_alloc_range(&iommu_global_pasid_ida, IOMMU_DEF_RID_PASID + 1, max, GFP_KERNEL);
+ if (ret < 0)
+ return IOMMU_PASID_INVALID;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid_dev);
+
+void iommu_free_global_pasid(ioasid_t pasid)
+{
+ if (WARN_ON(!pasid_valid(pasid)))
+ return;
+
+ ida_free(&iommu_global_pasid_ida, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 15aa4a1f7b1a..07654969414c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -724,6 +724,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
struct iommu_domain *
iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
unsigned int type);
+ioasid_t iommu_alloc_global_pasid_dev(struct device *dev);
+void iommu_free_global_pasid(ioasid_t pasid);
#else /* CONFIG_IOMMU_API */

struct iommu_ops {};
@@ -1090,6 +1092,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
{
return NULL;
}
+
+static inline ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
+{
+ return IOMMU_PASID_INVALID;
+}
+
+static inline void iommu_free_global_pasid(ioasid_t pasid) {}
#endif /* CONFIG_IOMMU_API */

/**
--
2.25.1

2023-04-27 17:49:42

by Jacob Pan

Subject: [PATCH v5 7/7] dmaengine/idxd: Re-enable kernel workqueue under DMA API

Kernel workqueues were disabled due to the flawed use of kernel VAs with
the SVA API. Now that we have support for attaching a PASID to the
device's default domain and the ability to allocate global PASIDs from
the IOMMU core, we can re-enable the kernel work queues and use them
under the DMA API.

We also use non-privileged access for in-kernel DMA to be consistent
with the IOMMU settings. Consequently, the user interrupt enable bit is
set so that work completion IRQs can still be delivered.

Link: https://lore.kernel.org/linux-iommu/[email protected]/
Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/device.c | 30 ++++----------------
drivers/dma/idxd/init.c | 60 ++++++++++++++++++++++++++++++++++++---
drivers/dma/idxd/sysfs.c | 7 -----
3 files changed, 61 insertions(+), 36 deletions(-)

diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 6fca8fa8d3a8..f6b133d61a04 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd)
}
}

-static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
-{
- struct idxd_device *idxd = wq->idxd;
- union wqcfg wqcfg;
- unsigned int offset;
-
- offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
- spin_lock(&idxd->dev_lock);
- wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base + offset);
- wqcfg.priv = priv;
- wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
- iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base + offset);
- spin_unlock(&idxd->dev_lock);
-}
-
static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
{
struct idxd_device *idxd = wq->idxd;
@@ -1324,15 +1309,14 @@ int drv_enable_wq(struct idxd_wq *wq)
}

/*
- * In the event that the WQ is configurable for pasid and priv bits.
- * For kernel wq, the driver should setup the pasid, pasid_en, and priv bit.
- * However, for non-kernel wq, the driver should only set the pasid_en bit for
- * shared wq. A dedicated wq that is not 'kernel' type will configure pasid and
+ * In the event that the WQ is configurable for pasid, the driver
+ * should set up the pasid and pasid_en bits. This is true for both
+ * kernel and user shared workqueues. There is no need to set up the
+ * priv bit since in-kernel DMA also issues user-privileged requests.
+ * A dedicated wq that is not 'kernel' type will configure pasid and
* pasid_en later on so there is no need to setup.
*/
if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
- int priv = 0;
-
if (wq_pasid_enabled(wq)) {
if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
u32 pasid = wq_dedicated(wq) ? idxd->pasid : 0;
@@ -1340,10 +1324,6 @@ int drv_enable_wq(struct idxd_wq *wq)
__idxd_wq_set_pasid_locked(wq, pasid);
}
}
-
- if (is_idxd_wq_kernel(wq))
- priv = 1;
- __idxd_wq_set_priv_locked(wq, priv);
}

rc = 0;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e6ee267da0ff..fdfc7f76186f 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -506,14 +506,65 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d

static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- return -EOPNOTSUPP;
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+ union gencfg_reg gencfg;
+ ioasid_t pasid;
+ int ret;
+
+ /*
+ * Attach a global PASID to the DMA domain so that we can use ENQCMDS
+ * to submit work on buffers mapped by DMA API.
+ */
+ domain = iommu_get_domain_for_dev(dev);
+ if (!domain)
+ return -EPERM;
+
+ pasid = iommu_alloc_global_pasid_dev(dev);
+ if (!pasid_valid(pasid))
+ return -ENOSPC;
+
+ /*
+ * The DMA domain is owned by the driver; it should support all valid
+ * types such as DMA-FQ, identity, etc.
+ */
+ ret = iommu_attach_device_pasid(domain, dev, pasid);
+ if (ret) {
+ dev_err(dev, "failed to attach device pasid %d, domain type %d",
+ pasid, domain->type);
+ iommu_free_global_pasid(pasid);
+ return ret;
+ }
+
+ /* Since we set user privilege for kernel DMA, enable completion IRQ */
+ gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+ gencfg.user_int_en = 1;
+ iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+ idxd->pasid = pasid;
+
+ return ret;
}

static void idxd_disable_system_pasid(struct idxd_device *idxd)
{
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+ union gencfg_reg gencfg;
+
+ domain = iommu_get_domain_for_dev(dev);
+ if (!domain)
+ return;
+
+ iommu_detach_device_pasid(domain, dev, idxd->pasid);
+ iommu_free_global_pasid(idxd->pasid);

- iommu_sva_unbind_device(idxd->sva);
+ gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+ gencfg.user_int_en = 0;
+ iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
idxd->sva = NULL;
+ idxd->pasid = IOMMU_PASID_INVALID;
}

static int idxd_probe(struct idxd_device *idxd)
@@ -535,8 +586,9 @@ static int idxd_probe(struct idxd_device *idxd)
} else {
set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);

- if (idxd_enable_system_pasid(idxd))
- dev_warn(dev, "No in-kernel DMA with PASID.\n");
+ rc = idxd_enable_system_pasid(idxd);
+ if (rc)
+ dev_warn(dev, "No in-kernel DMA with PASID. %d\n", rc);
else
set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 18cd8151dee0..c5561c00a503 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -944,13 +944,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;

- /*
- * This is temporarily placed here until we have SVM support for
- * dmaengine.
- */
- if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
- return -EOPNOTSUPP;
-
input = kstrndup(buf, count, GFP_KERNEL);
if (!input)
return -ENOMEM;
--
2.25.1

2023-04-27 17:49:50

by Jacob Pan

Subject: [PATCH v5 5/7] iommu/vt-d: Prepare PASID attachment beyond RID_PASID

Currently, the device and default domain attach process includes
RID_PASID setup whenever PASID is supported.
To prepare for non-RID_PASID usage such as ENQCMDS, we can factor out
helper functions such that they can be reused in any order between PASID
and device attachment, i.e. a non-RID_PASID attachment via
set_dev_pasid() can happen prior to attach_dev().
It was agreed that upper-level APIs should not make assumptions about
the ordering.
Link: https://lore.kernel.org/lkml/[email protected]/

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 106 ++++++++++++++++++++++++++++++++----
drivers/iommu/intel/iommu.h | 8 +++
2 files changed, 102 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cb586849a1ee..388453a7415e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1365,6 +1365,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,

static void domain_update_iotlb(struct dmar_domain *domain)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
bool has_iotlb_device = false;
unsigned long flags;
@@ -1376,6 +1377,14 @@ static void domain_update_iotlb(struct dmar_domain *domain)
break;
}
}
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ if (info->ats_enabled) {
+ has_iotlb_device = true;
+ break;
+ }
+ }
domain->has_iotlb_device = has_iotlb_device;
spin_unlock_irqrestore(&domain->lock, flags);
}
@@ -1486,6 +1495,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
u64 addr, unsigned mask)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
unsigned long flags;

@@ -1495,6 +1505,39 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+ /* device TLB is not aware that the RID PASID is used for DMA w/o PASID */
+ if (dev_pasid->pasid == IOMMU_DEF_RID_PASID)
+ continue;
+
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ qi_flush_dev_iotlb_pasid(info->iommu,
+ PCI_DEVID(info->bus, info->devfn),
+ info->pfsid, dev_pasid->pasid,
+ info->ats_qdep, addr,
+ mask);
+ }
+ spin_unlock_irqrestore(&domain->lock, flags);
+}
+
+/*
+ * The VT-d spec requires the use of PASID-based IOTLB invalidation to
+ * invalidate IOTLB and the paging-structure-caches for a first-stage
+ * page table.
+ */
+static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
+ struct dmar_domain *domain, u64 addr,
+ unsigned long npages, bool ih)
+{
+ u16 did = domain_id_iommu(domain, iommu);
+ struct device_pasid_info *dev_pasid;
+ unsigned long flags;
+
+ spin_lock_irqsave(&domain->lock, flags);
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain)
+ qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih);
+
spin_unlock_irqrestore(&domain->lock, flags);
}

@@ -1514,7 +1557,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
ih = 1 << 6;

if (domain->use_first_level) {
- qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, addr, pages, ih);
+ domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;

@@ -1584,7 +1627,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
u16 did = domain_id_iommu(dmar_domain, iommu);

if (dmar_domain->use_first_level)
- qi_flush_piotlb(iommu, did, IOMMU_DEF_RID_PASID, 0, -1, 0);
+ domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
@@ -1756,6 +1799,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->use_first_level = true;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
+ INIT_LIST_HEAD(&domain->dev_pasids);
spin_lock_init(&domain->lock);
xa_init(&domain->iommu_array);

@@ -2433,12 +2477,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
struct intel_iommu *iommu,
struct device *dev, ioasid_t pasid)
{
+ struct device_pasid_info *dev_pasid;
+ unsigned long flags;
int ret;

- /* PASID table is mandatory for a PCI device in scalable mode. */
if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
return -EOPNOTSUPP;

+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+ if (!dev_pasid)
+ return -ENOMEM;
+
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
else if (domain->use_first_level)
@@ -2446,6 +2495,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
else
ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);

+ if (ret) {
+ kfree(dev_pasid);
+ return ret;
+ }
+
+ dev_pasid->pasid = pasid;
+ dev_pasid->dev = dev;
+ spin_lock_irqsave(&domain->lock, flags);
+ list_add(&dev_pasid->link_domain, &domain->dev_pasids);
+ spin_unlock_irqrestore(&domain->lock, flags);
+
return 0;
}

@@ -2467,16 +2527,13 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
return ret;
info->domain = domain;
spin_lock_irqsave(&domain->lock, flags);
+ if (info->dev_attached) {
+ spin_unlock_irqrestore(&domain->lock, flags);
+ return 0;
+ }
list_add(&info->link, &domain->devices);
spin_unlock_irqrestore(&domain->lock, flags);

- ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
- IOMMU_DEF_RID_PASID);
- if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
- device_block_translation(dev);
- }
-
ret = domain_context_mapping(domain, dev);
if (ret) {
dev_err(dev, "Domain context map failed\n");
@@ -2485,8 +2542,9 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
}

iommu_enable_pci_caps(info);
+ info->dev_attached = 1;

- return 0;
+ return ret;
}

static bool device_has_rmrr(struct device *dev)
@@ -4044,6 +4102,7 @@ static void device_block_translation(struct device *dev)

spin_lock_irqsave(&info->domain->lock, flags);
list_del(&info->link);
+ info->dev_attached = 0;
spin_unlock_irqrestore(&info->domain->lock, flags);

domain_detach_iommu(info->domain, iommu);
@@ -4175,8 +4234,15 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
struct device *dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu;
+ u8 bus, devfn;
int ret;

+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+
if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
device_is_rmrr_locked(dev)) {
dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
@@ -4190,7 +4256,23 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
if (ret)
return ret;

- return dmar_domain_attach_device(to_dmar_domain(domain), dev);
+ ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);
+ if (ret) {
+ dev_err(dev, "Attach device failed\n");
+ return ret;
+ }
+
+ /* PASID table is mandatory for a PCI device in scalable mode. */
+ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
+ /* Setup the PASID entry for requests without PASID: */
+ ret = dmar_domain_attach_device_pasid(dmar_domain, iommu, dev,
+ IOMMU_DEF_RID_PASID);
+ if (ret) {
+ dev_err(dev, "Setup RID2PASID failed\n");
+ device_block_translation(dev);
+ }
+ }
+ return ret;
}

static int intel_iommu_map(struct iommu_domain *domain,
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 65b15be72878..b6c26f25d1ba 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -595,6 +595,7 @@ struct dmar_domain {

spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */
+ struct list_head dev_pasids; /* all attached pasids */

struct dma_pte *pgd; /* virtual address */
int gaw; /* max guest address width */
@@ -708,6 +709,7 @@ struct device_domain_info {
u8 ats_supported:1;
u8 ats_enabled:1;
u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
+ u8 dev_attached:1; /* Device context activated */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
@@ -715,6 +717,12 @@ struct device_domain_info {
struct pasid_table *pasid_table; /* pasid table */
};

+struct device_pasid_info {
+ struct list_head link_domain; /* link to domain siblings */
+ struct device *dev; /* physical device derived from */
+ ioasid_t pasid; /* PASID on physical device */
+};
+
static inline void __iommu_flush_cache(
struct intel_iommu *iommu, void *addr, int size)
{
--
2.25.1

2023-04-27 17:50:00

by Jacob Pan

Subject: [PATCH v5 6/7] iommu/vt-d: Implement set_dev_pasid domain op

Devices that use ENQCMDS to submit work on buffers mapped by DMA API
must attach a PASID to the default domain of the device. In preparation
for this use case, this patch implements set_dev_pasid() for the
default_domain_ops.

If the device context has not been set up prior to this call, this will
set up the device context in addition to PASID attachment.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 92 ++++++++++++++++++++++++++++++-------
1 file changed, 76 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 388453a7415e..f9d6c31cdc8e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -278,6 +278,8 @@ static LIST_HEAD(dmar_satc_units);
list_for_each_entry(rmrr, &dmar_rmrr_units, list)

static void device_block_translation(struct device *dev);
+static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
static void intel_iommu_domain_free(struct iommu_domain *domain);

int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -4091,8 +4093,7 @@ static void device_block_translation(struct device *dev)
iommu_disable_pci_caps(info);
if (!dev_is_real_dma_subdevice(dev)) {
if (sm_supported(iommu))
- intel_pasid_tear_down_entry(iommu, dev,
- IOMMU_DEF_RID_PASID, false);
+ intel_iommu_detach_device_pasid(&info->domain->domain, dev, IOMMU_DEF_RID_PASID);
else
domain_context_clear(info);
}
@@ -4761,28 +4762,86 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
__mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
}

-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
- struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- struct iommu_domain *domain;
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_pasid_info *i, *dev_pasid = NULL;
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;

- /* Domain type specific cleanup: */
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
- if (domain) {
- switch (domain->type) {
- case IOMMU_DOMAIN_SVA:
- intel_svm_remove_dev_pasid(dev, pasid);
- break;
- default:
- /* should never reach here */
- WARN_ON(1);
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_for_each_entry(i, &dmar_domain->dev_pasids, link_domain) {
+ if (i->dev == dev && i->pasid == pasid) {
+ list_del(&i->link_domain);
+ dev_pasid = i;
break;
}
}
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ if (WARN_ON(!dev_pasid))
+ return;
+
+ /* PASID entry already cleared during SVA unbind */
+ if (domain->type != IOMMU_DOMAIN_SVA)
+ intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+ kfree(dev_pasid);
+}
+
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain;
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
+ dmar_domain = to_dmar_domain(domain);
+
+ /*
+ * SVA domain type specific cleanup: not ideal, but until we have
+ * IOPF-capable domain-specific ops, we need this special case.
+ */
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ return intel_svm_remove_dev_pasid(dev, pasid);

- intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+ intel_iommu_detach_device_pasid(domain, dev, pasid);
+ domain_detach_iommu(dmar_domain, info->iommu);
}

+static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu = info->iommu;
+ int ret;
+
+ if (!pasid_supported(iommu))
+ return -ENODEV;
+
+ ret = prepare_domain_attach_device(domain, dev);
+ if (ret)
+ return ret;
+
+ /*
+ * Most likely the device context has already been set up; this will
+ * only take a domain ID reference. Otherwise, the device context will
+ * be set up here.
+ * The upper layer APIs make no assumptions about the ordering between
+ * device attachment and the PASID attachment.
+ */
+ ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);
+ if (ret) {
+ dev_err(dev, "Attach device failed\n");
+ return ret;
+ }
+ return dmar_domain_attach_device_pasid(dmar_domain, iommu, dev, pasid);
+}
+
+
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -4802,6 +4861,7 @@ const struct iommu_ops intel_iommu_ops = {
#endif
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = intel_iommu_attach_device,
+ .set_dev_pasid = intel_iommu_attach_device_pasid,
.map_pages = intel_iommu_map_pages,
.unmap_pages = intel_iommu_unmap_pages,
.iotlb_sync_map = intel_iommu_iotlb_sync_map,
--
2.25.1

2023-04-28 09:46:56

by Tian, Kevin

Subject: RE: [PATCH v5 1/7] iommu: Generalize default PCIe requester ID PASID

> From: Jacob Pan <[email protected]>
> Sent: Friday, April 28, 2023 1:50 AM
>
> PCIe Process address space ID (PASID) is used to tag DMA traffic; it
> provides finer-grained isolation than the requester ID (RID).
>
> For each RID, 0 is reserved as a special PASID for legacy DMA (without
> PASID), hence the name RID_PASID. This is universal across all
> architectures and therefore warrants a declaration in the common header.
> Note that VT-d could support a non-zero RID_PASID, but this is currently
> not used.
>
> By having a common RID_PASID, we can avoid conflicts between different
> use cases in the generic code, e.g. SVA and DMA API with PASIDs.

You intend it to be generic but in the end only vt-d driver is changed
to use it in this series...

> @@ -190,6 +190,7 @@ enum iommu_dev_features {
> IOMMU_DEV_FEAT_IOPF,
> };
>
> +#define IOMMU_DEF_RID_PASID (0U) /* Reserved for DMA w/o PASID */

Is RID a general term on other platforms?

Would IOMMU_DEF_NO_PASID serve the purpose better?

2023-04-28 09:47:30

by Tian, Kevin

Subject: RE: [PATCH v5 2/7] iommu/sva: Explicitly exclude RID_PASID from SVA

> From: Jacob Pan <[email protected]>
> Sent: Friday, April 28, 2023 1:50 AM
>
> SVA PASID allocation is hardcoded to start from 1 because 0 is used for
> RID_PASID. Let's make this explicit to avoid potential conflicts.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/iommu-sva.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index c434b95dc8eb..ac7c93bacb5c 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -66,7 +66,7 @@ struct iommu_sva *iommu_sva_bind_device(struct
> device *dev, struct mm_struct *mm
> return ERR_PTR(-EOPNOTSUPP);
>
> /* Allocate mm->pasid if necessary. */
> - ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> + ret = iommu_sva_alloc_pasid(mm, IOMMU_DEF_RID_PASID + 1,
> max_pasids - 1);

To be future-proof, it's probably cleaner to define an
IOMMU_MAX_RSVD_PASID in case there may be more reserved PASIDs in
future usages?
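
e.g. something like (name and value purely illustrative):

	#define IOMMU_MAX_RSVD_PASID	IOMMU_DEF_RID_PASID

	ret = iommu_sva_alloc_pasid(mm, IOMMU_MAX_RSVD_PASID + 1,
				    max_pasids - 1);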

2023-04-28 09:48:56

by Tian, Kevin

Subject: RE: [PATCH v5 3/7] iommu: Move global PASID allocation from SVA to core

> From: Jacob Pan <[email protected]>
> Sent: Friday, April 28, 2023 1:50 AM
>
> Devices that use Intel ENQCMD to submit work must use global PASIDs in
> that the PASID is stored in a per-CPU MSR. When such a device needs to
> submit work for in-kernel DMA with PASID, it must allocate PASIDs from
> the same global number space to avoid conflicts.

Well, the device itself cannot submit work to itself. It's software
that submits work to the device.

> /* Allocate a PASID for the mm within range (inclusive) */
> -static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min,
> ioasid_t max)
> +static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
> {
> int ret = 0;
>
> - if (!pasid_valid(min) || !pasid_valid(max) ||
> - min == 0 || max < min)
> - return -EINVAL;
> -
> mutex_lock(&iommu_sva_lock);
> /* Is a PASID already associated with this mm? */
> - if (pasid_valid(mm->pasid)) {
> - if (mm->pasid < min || mm->pasid > max)
> - ret = -EOVERFLOW;
> + if (pasid_valid(mm->pasid))
> goto out;

Emmm, here you still want to check whether mm->pasid exceeds the max
PASID width of the bound device.
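
e.g. keep something like (just a sketch):

	if (pasid_valid(mm->pasid)) {
		if (mm->pasid >= dev->iommu->max_pasids)
			ret = -EOVERFLOW;
		goto out;
	}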

2023-04-28 09:51:23

by Tian, Kevin

Subject: RE: [PATCH v5 4/7] iommu/vt-d: Factoring out PASID set up helper function

> From: Jacob Pan <[email protected]>
> Sent: Friday, April 28, 2023 1:50 AM
>
>
> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> + struct intel_iommu *iommu,
> + struct device *dev, ioasid_t pasid)
> +{
> + int ret;
> +
> + /* PASID table is mandatory for a PCI device in scalable mode. */
> + if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
> + return -EOPNOTSUPP;

"&&" should be "||"

2023-04-28 15:56:33

by Jacob Pan

Subject: Re: [PATCH v5 1/7] iommu: Generalize default PCIe requester ID PASID

Hi Kevin,

On Fri, 28 Apr 2023 09:38:32 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Friday, April 28, 2023 1:50 AM
> >
> > PCIe Process address space ID (PASID) is used to tag DMA traffic; it
> > provides finer-grained isolation than the requester ID (RID).
> >
> > For each RID, 0 is reserved as a special PASID for legacy DMA (without
> > PASID), hence the name RID_PASID. This is universal across all
> > architectures and therefore warrants a declaration in the common
> > header. Note that VT-d could support a non-zero RID_PASID, but this is
> > currently not used.
> >
> > By having a common RID_PASID, we can avoid conflicts between different
> > use cases in the generic code, e.g. SVA and DMA API with PASIDs.
>
> You intend it to be generic but in the end only vt-d driver is changed
> to use it in this series...
The change for SVA is in this patch.

> > @@ -190,6 +190,7 @@ enum iommu_dev_features {
> > IOMMU_DEV_FEAT_IOPF,
> > };
> >
> > +#define IOMMU_DEF_RID_PASID (0U) /* Reserved for DMA w/o PASID */
>
> Is RID a general team on other platform?
RID, aka requester ID, is a PCI term, so I think it is common; though on
SMMU it might be better called a stream ID PASID?

> Would IOMMU_DEF_NO_PASID serve the purpose better?
Not sure, it is still a PASID. For completeness, it might be called
IOMMU_DEF_PASID_FOR_DMA_NO_PASID :)

other suggestions?

Thanks,

Jacob

2023-04-28 16:03:44

by Jacob Pan

Subject: Re: [PATCH v5 2/7] iommu/sva: Explicitly exclude RID_PASID from SVA

Hi Kevin,

On Fri, 28 Apr 2023 09:40:12 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Friday, April 28, 2023 1:50 AM
> >
> > SVA PASID allocation is hardcoded to start from 1 because 0 is used for
> > RID_PASID. Let's make this explicit to avoid potential conflicts.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/iommu-sva.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> > index c434b95dc8eb..ac7c93bacb5c 100644
> > --- a/drivers/iommu/iommu-sva.c
> > +++ b/drivers/iommu/iommu-sva.c
> > @@ -66,7 +66,7 @@ struct iommu_sva *iommu_sva_bind_device(struct
> > device *dev, struct mm_struct *mm
> > return ERR_PTR(-EOPNOTSUPP);
> >
> > /* Allocate mm->pasid if necessary. */
> > - ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> > + ret = iommu_sva_alloc_pasid(mm, IOMMU_DEF_RID_PASID + 1,
> > max_pasids - 1);
>
> To be future-proof, it's probably cleaner to define an
> IOMMU_MAX_RSVD_PASID in case there may be more reserved PASIDs in
> future usages?
much better, will do.

Thanks,

Jacob

2023-04-28 16:42:24

by Jacob Pan

Subject: Re: [PATCH v5 3/7] iommu: Move global PASID allocation from SVA to core

Hi Kevin,

On Fri, 28 Apr 2023 09:46:25 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Friday, April 28, 2023 1:50 AM
> >
> > Devices that use Intel ENQCMD to submit work must use global PASIDs in
> > that the PASID is stored in a per-CPU MSR. When such a device needs to
> > submit work for in-kernel DMA with PASID, it must allocate PASIDs from
> > the same global number space to avoid conflicts.
>
> Well, the device itself cannot submit work to itself. It's software
> that submits work to the device.
will rephrase, how about:
Intel ENQCMD work submission requires the use of global PASIDs in ...
> > that the PASID is stored in a per-CPU MSR
>
> > /* Allocate a PASID for the mm within range (inclusive) */
> > -static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min,
> > ioasid_t max)
> > +static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device
> > *dev) {
> > int ret = 0;
> >
> > - if (!pasid_valid(min) || !pasid_valid(max) ||
> > - min == 0 || max < min)
> > - return -EINVAL;
> > -
> > mutex_lock(&iommu_sva_lock);
> > /* Is a PASID already associated with this mm? */
> > - if (pasid_valid(mm->pasid)) {
> > - if (mm->pasid < min || mm->pasid > max)
> > - ret = -EOVERFLOW;
> > + if (pasid_valid(mm->pasid))
> > goto out;
>
> emmm here you still want to check whether mm->pasid exceeds
> the max pasid width of the bound device.
Good point, the existing mm->pasid could be from another device that has
a larger PASID range.


Thanks,

Jacob

2023-05-03 06:35:18

by Lu Baolu

Subject: Re: [PATCH v5 3/7] iommu: Move global PASID allocation from SVA to core

On 4/28/23 1:49 AM, Jacob Pan wrote:
> +ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
> +{
> + int ret;
> + ioasid_t max;
> +
> + max = dev_iommu_get_max_pasids(dev);

Perhaps you can use dev->iommu->max_pasids. It's static, so no need to
recalculate it.

> + ret = ida_alloc_range(&iommu_global_pasid_ida, IOMMU_DEF_RID_PASID + 1, max, GFP_KERNEL);
> + if (ret < 0)
> + return IOMMU_PASID_INVALID;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid_dev);

Best regards,
baolu

2023-05-03 06:38:29

by Lu Baolu

Subject: Re: [PATCH v5 4/7] iommu/vt-d: Factoring out PASID set up helper function

On 4/28/23 5:47 PM, Tian, Kevin wrote:
>> From: Jacob Pan <[email protected]>
>> Sent: Friday, April 28, 2023 1:50 AM
>>
>>
>> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
>> + struct intel_iommu *iommu,
>> + struct device *dev, ioasid_t pasid)
>> +{
>> + int ret;
>> +
>> + /* PASID table is mandatory for a PCI device in scalable mode. */
>> + if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
>> + return -EOPNOTSUPP;
>
> "&&" should be "||"
>

Also, it should return success instead if this is the RID case. Perhaps:

if (!sm_supported(iommu) || dev_is_real_dma_subdevice(dev))
return pasid == RID2PASID ? 0 : -EOPNOTSUPP;

Best regards,
baolu

2023-05-03 06:57:47

by Lu Baolu

Subject: Re: [PATCH v5 5/7] iommu/vt-d: Prepare PASID attachment beyond RID_PASID

On 4/28/23 1:49 AM, Jacob Pan wrote:
> @@ -2433,12 +2477,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> struct intel_iommu *iommu,
> struct device *dev, ioasid_t pasid)
> {
> + struct device_pasid_info *dev_pasid;
> + unsigned long flags;
> int ret;
>
> - /* PASID table is mandatory for a PCI device in scalable mode. */
> if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
> return -EOPNOTSUPP;
>
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> if (hw_pass_through && domain_type_is_si(domain))
> ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
> else if (domain->use_first_level)
> @@ -2446,6 +2495,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> else
> ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
>
> + if (ret) {
> + kfree(dev_pasid);
> + return ret;
> + }
> +
> + dev_pasid->pasid = pasid;
> + dev_pasid->dev = dev;
> + spin_lock_irqsave(&domain->lock, flags);
> + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> return 0;
> }
>
> @@ -2467,16 +2527,13 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
> return ret;
> info->domain = domain;
> spin_lock_irqsave(&domain->lock, flags);
> + if (info->dev_attached) {
> + spin_unlock_irqrestore(&domain->lock, flags);
> + return 0;
> + }
> list_add(&info->link, &domain->devices);
> spin_unlock_irqrestore(&domain->lock, flags);
>
> - ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
> - IOMMU_DEF_RID_PASID);
> - if (ret) {
> - dev_err(dev, "Setup RID2PASID failed\n");
> - device_block_translation(dev);
> - }
> -
> ret = domain_context_mapping(domain, dev);
> if (ret) {
> dev_err(dev, "Domain context map failed\n");
> @@ -2485,8 +2542,9 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
> }
>
> iommu_enable_pci_caps(info);
> + info->dev_attached = 1;
>
> - return 0;
> + return ret;
> }
>
> static bool device_has_rmrr(struct device *dev)
> @@ -4044,6 +4102,7 @@ static void device_block_translation(struct device *dev)
>
> spin_lock_irqsave(&info->domain->lock, flags);
> list_del(&info->link);
> + info->dev_attached = 0;
> spin_unlock_irqrestore(&info->domain->lock, flags);
>
> domain_detach_iommu(info->domain, iommu);
> @@ -4175,8 +4234,15 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> struct device *dev)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu;
> + u8 bus, devfn;
> int ret;
>
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
> device_is_rmrr_locked(dev)) {
> dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.\n");
> @@ -4190,7 +4256,23 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> if (ret)
> return ret;
>
> - return dmar_domain_attach_device(to_dmar_domain(domain), dev);
> + ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);
> + if (ret) {
> + dev_err(dev, "Attach device failed\n");
> + return ret;
> + }
> +
> + /* PASID table is mandatory for a PCI device in scalable mode. */
> + if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> + /* Setup the PASID entry for requests without PASID: */
> + ret = dmar_domain_attach_device_pasid(dmar_domain, iommu, dev,
> + IOMMU_DEF_RID_PASID);
> + if (ret) {
> + dev_err(dev, "Setup RID2PASID failed\n");
> + device_block_translation(dev);
> + }
> + }
> + return ret;
> }
>
> static int intel_iommu_map(struct iommu_domain *domain,

I am not following why you need to change the attach_device path in
this patch. Perhaps you want to make sure that the context entry for the
device is configured before attach_device_pasid?

Best regards,
baolu

2023-05-03 07:30:52

by Lu Baolu

Subject: Re: [PATCH v5 6/7] iommu/vt-d: Implement set_dev_pasid domain op

On 4/28/23 1:49 AM, Jacob Pan wrote:
> Devices that use ENQCMDS to submit work on buffers mapped by DMA API
> must attach a PASID to the default domain of the device. In preparation
> for this use case, this patch implements set_dev_pasid() for the
> default_domain_ops.
>
> If the device context has not been set up prior to this call, this will
> set up the device context in addition to PASID attachment.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel/iommu.c | 92 ++++++++++++++++++++++++++++++-------
> 1 file changed, 76 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 388453a7415e..f9d6c31cdc8e 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -278,6 +278,8 @@ static LIST_HEAD(dmar_satc_units);
> list_for_each_entry(rmrr, &dmar_rmrr_units, list)
>
> static void device_block_translation(struct device *dev);
> +static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid);
> static void intel_iommu_domain_free(struct iommu_domain *domain);
>
> int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
> @@ -4091,8 +4093,7 @@ static void device_block_translation(struct device *dev)
> iommu_disable_pci_caps(info);
> if (!dev_is_real_dma_subdevice(dev)) {
> if (sm_supported(iommu))
> - intel_pasid_tear_down_entry(iommu, dev,
> - IOMMU_DEF_RID_PASID, false);
> + intel_iommu_detach_device_pasid(&info->domain->domain, dev, IOMMU_DEF_RID_PASID);

device_block_translation() is called when switching the RID's domain or
releasing the device. I assume that we don't need to touch this path
when we add attach_dev_pasid support.

Blocking DMA translation through RID/PASID should be done in the
remove_dev_pasid path.

Or, I overlooked anything?

[...]

>
> +static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct intel_iommu *iommu = info->iommu;
> + int ret;
> +
> + if (!pasid_supported(iommu))
> + return -ENODEV;
> +
> + ret = prepare_domain_attach_device(domain, dev);
> + if (ret)
> + return ret;
> +
> + /*
> + * Most likely the device context has already been set up; this will
> + * only take a domain ID reference. Otherwise, the device context will
> + * be set up here.

The "otherwise" case is only default domain deferred attaching case,
right?

When the device driver starts to call the attach_dev_pasid API, it means
that the bus and device DMA configuration has been done. We could do
the deferred default domain attach now. So, perhaps we should add the
code below in the core:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f1dcfa3f1a1b..633b5ca53606 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3296,6 +3296,12 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
if (!group)
return -ENODEV;

+ ret = iommu_deferred_attach(dev, group->default_domain);
+ if (ret) {
+ iommu_group_put(group);
+ return ret;
+ }
+
mutex_lock(&group->mutex);
curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
GFP_KERNEL);
if (curr) {

Perhaps we need to call iommu_deferred_attach() inside the group->mutex
critical region?

> + * The upper layer APIs make no assumptions about the ordering between
> + * device attachment and the PASID attachment.
> + */
> + ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);

Calling attach_device on the attach_dev_pasid path is not right.

> + if (ret) {
> + dev_err(dev, "Attach device failed\n");
> + return ret;
> + }
> + return dmar_domain_attach_device_pasid(dmar_domain, iommu, dev, pasid);
> +}
> +
> +
> +
> const struct iommu_ops intel_iommu_ops = {
> .capable = intel_iommu_capable,
> .domain_alloc = intel_iommu_domain_alloc,
> @@ -4802,6 +4861,7 @@ const struct iommu_ops intel_iommu_ops = {
> #endif
> .default_domain_ops = &(const struct iommu_domain_ops) {
> .attach_dev = intel_iommu_attach_device,
> + .set_dev_pasid = intel_iommu_attach_device_pasid,
> .map_pages = intel_iommu_map_pages,
> .unmap_pages = intel_iommu_unmap_pages,
> .iotlb_sync_map = intel_iommu_iotlb_sync_map,

Best regards,
baolu

2023-05-04 21:29:30

by Jacob Pan

Subject: Re: [PATCH v5 3/7] iommu: Move global PASID allocation from SVA to core

Hi Baolu,

On Wed, 3 May 2023 14:32:15 +0800, Baolu Lu <[email protected]>
wrote:

> On 4/28/23 1:49 AM, Jacob Pan wrote:
> > +ioasid_t iommu_alloc_global_pasid_dev(struct device *dev)
> > +{
> > + int ret;
> > + ioasid_t max;
> > +
> > + max = dev_iommu_get_max_pasids(dev);
>
> Perhaps you can use dev->iommu->max_pasids. It's static, so no need to
> recalculate it.
>
sounds good!


Thanks,

Jacob

2023-05-04 21:59:42

by Jacob Pan

Subject: Re: [PATCH v5 5/7] iommu/vt-d: Prepare PASID attachment beyond RID_PASID

Hi Baolu,

On Wed, 3 May 2023 14:49:36 +0800, Baolu Lu <[email protected]>
wrote:

> On 4/28/23 1:49 AM, Jacob Pan wrote:
> > @@ -2433,12 +2477,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> >  					    struct intel_iommu *iommu,
> >  					    struct device *dev, ioasid_t pasid)
> >  {
> > +	struct device_pasid_info *dev_pasid;
> > +	unsigned long flags;
> >  	int ret;
> >
> > -	/* PASID table is mandatory for a PCI device in scalable mode. */
> >  	if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
> >  		return -EOPNOTSUPP;
> >
> > +	dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> > +	if (!dev_pasid)
> > +		return -ENOMEM;
> > +
> >  	if (hw_pass_through && domain_type_is_si(domain))
> >  		ret = intel_pasid_setup_pass_through(iommu, domain,
> >  						     dev, pasid);
> >  	else if (domain->use_first_level)
> > @@ -2446,6 +2495,17 @@ static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> >  	else
> >  		ret = intel_pasid_setup_second_level(iommu, domain,
> >  						     dev, pasid);
> > +	if (ret) {
> > +		kfree(dev_pasid);
> > +		return ret;
> > +	}
> > +
> > +	dev_pasid->pasid = pasid;
> > +	dev_pasid->dev = dev;
> > +	spin_lock_irqsave(&domain->lock, flags);
> > +	list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> > +	spin_unlock_irqrestore(&domain->lock, flags);
> > +
> >  	return 0;
> >  }
> >
> > @@ -2467,16 +2527,13 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
> >  		return ret;
> >  	info->domain = domain;
> >  	spin_lock_irqsave(&domain->lock, flags);
> > +	if (info->dev_attached) {
> > +		spin_unlock_irqrestore(&domain->lock, flags);
> > +		return 0;
> > +	}
> >  	list_add(&info->link, &domain->devices);
> >  	spin_unlock_irqrestore(&domain->lock, flags);
> >
> > -	ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
> > -					      IOMMU_DEF_RID_PASID);
> > -	if (ret) {
> > -		dev_err(dev, "Setup RID2PASID failed\n");
> > -		device_block_translation(dev);
> > -	}
> > -
> >  	ret = domain_context_mapping(domain, dev);
> >  	if (ret) {
> >  		dev_err(dev, "Domain context map failed\n");
> > @@ -2485,8 +2542,9 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
> >  	}
> >
> >  	iommu_enable_pci_caps(info);
> > +	info->dev_attached = 1;
> >
> > -	return 0;
> > +	return ret;
> >  }
> >
> >  static bool device_has_rmrr(struct device *dev)
> > @@ -4044,6 +4102,7 @@ static void device_block_translation(struct device *dev)
> >  	spin_lock_irqsave(&info->domain->lock, flags);
> >  	list_del(&info->link);
> > +	info->dev_attached = 0;
> >  	spin_unlock_irqrestore(&info->domain->lock, flags);
> >
> >  	domain_detach_iommu(info->domain, iommu);
> > @@ -4175,8 +4234,15 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> >  				     struct device *dev)
> >  {
> >  	struct device_domain_info *info = dev_iommu_priv_get(dev);
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	struct intel_iommu *iommu;
> > +	u8 bus, devfn;
> >  	int ret;
> >
> > +	iommu = device_to_iommu(dev, &bus, &devfn);
> > +	if (!iommu)
> > +		return -ENODEV;
> > +
> >  	if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
> >  	    device_is_rmrr_locked(dev)) {
> >  		dev_warn(dev, "Device is ineligible for IOMMU domain attach due to platform RMRR requirement.  Contact your platform vendor.\n");
> > @@ -4190,7 +4256,23 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> >  	if (ret)
> >  		return ret;
> > -	return dmar_domain_attach_device(to_dmar_domain(domain), dev);
> > +	ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);
> > +	if (ret) {
> > +		dev_err(dev, "Attach device failed\n");
> > +		return ret;
> > +	}
> > +
> > +	/* PASID table is mandatory for a PCI device in scalable mode. */
> > +	if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> > +		/* Setup the PASID entry for requests without PASID: */
> > +		ret = dmar_domain_attach_device_pasid(dmar_domain, iommu, dev,
> > +						      IOMMU_DEF_RID_PASID);
> > +		if (ret) {
> > +			dev_err(dev, "Setup RID2PASID failed\n");
> > +			device_block_translation(dev);
> > +		}
> > +	}
> > +	return ret;
> >  }
> >
> >  static int intel_iommu_map(struct iommu_domain *domain,
>
> I am not following why you need to change the attach_device path in
> this patch. Perhaps you want to make sure that the context entry for the
> device is configured before attach_device_pasid?
This is just refactoring; with this patch, attach_device is broken down into:
1. prepare_domain_attach_device()
2. dmar_domain_attach_device()
3. dmar_domain_attach_device_pasid()
The change is due to factoring out dmar_domain_attach_device_pasid(). The
device context is set up in #2, so it is already ensured before the PASID
attachment.

Perhaps I missed your point?


Thanks,

Jacob

2023-05-04 22:15:12

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] iommu/vt-d: Factoring out PASID set up helper function

Hi Baolu,

On Wed, 3 May 2023 14:37:16 +0800, Baolu Lu <[email protected]>
wrote:

> On 4/28/23 5:47 PM, Tian, Kevin wrote:
> >> From: Jacob Pan <[email protected]>
> >> Sent: Friday, April 28, 2023 1:50 AM
> >>
> >>
> >> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> >> +					    struct intel_iommu *iommu,
> >> +					    struct device *dev, ioasid_t pasid)
> >> +{
> >> +	int ret;
> >> +
> >> +	/* PASID table is mandatory for a PCI device in scalable mode. */
> >> +	if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
> >> +		return -EOPNOTSUPP;
> >
> > "&&" should be "||"
> >
>
> It should also return success if this is the RID case. Perhaps:
>
> 	if (!sm_supported(iommu) || dev_is_real_dma_subdevice(dev))
> 		return pasid == RID2PASID ? 0 : -EOPNOTSUPP;
>
Yeah, I think this is better, will do.

I was hoping not to treat RID_PASID as special here. The caller of this
function already checks whether the PASID is RID_PASID, but that
duplicates the code.

Thanks,

Jacob

2023-05-04 22:16:26

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v5 4/7] iommu/vt-d: Factoring out PASID set up helper function

Hi Kevin,

On Fri, 28 Apr 2023 09:47:45 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > From: Jacob Pan <[email protected]>
> > Sent: Friday, April 28, 2023 1:50 AM
> >
> >
> > +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> > +					    struct intel_iommu *iommu,
> > +					    struct device *dev, ioasid_t pasid)
> > +{
> > +	int ret;
> > +
> > +	/* PASID table is mandatory for a PCI device in scalable mode. */
> > +	if (!sm_supported(iommu) && dev_is_real_dma_subdevice(dev))
> > +		return -EOPNOTSUPP;
>
> "&&" should be "||"
>
good catch,

Thanks,

Jacob

2023-05-04 23:05:11

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] iommu/vt-d: Implement set_dev_pasid domain op

Hi Baolu,

On Wed, 3 May 2023 15:26:00 +0800, Baolu Lu <[email protected]>
wrote:

> On 4/28/23 1:49 AM, Jacob Pan wrote:
> > Devices that use ENQCMDS to submit work on buffers mapped by DMA API
> > must attach a PASID to the default domain of the device. In preparation
> > for this use case, this patch implements set_dev_pasid() for the
> > default_domain_ops.
> >
> > If the device context has not been set up prior to this call, this will
> > set up the device context in addition to PASID attachment.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel/iommu.c | 92 ++++++++++++++++++++++++++++++-------
> > 1 file changed, 76 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 388453a7415e..f9d6c31cdc8e 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -278,6 +278,8 @@ static LIST_HEAD(dmar_satc_units);
> >  	list_for_each_entry(rmrr, &dmar_rmrr_units, list)
> >
> >  static void device_block_translation(struct device *dev);
> > +static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
> > +					    struct device *dev, ioasid_t pasid);
> >  static void intel_iommu_domain_free(struct iommu_domain *domain);
> >
> >  int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
> > @@ -4091,8 +4093,7 @@ static void device_block_translation(struct device *dev)
> >  	iommu_disable_pci_caps(info);
> >  	if (!dev_is_real_dma_subdevice(dev)) {
> >  		if (sm_supported(iommu))
> > -			intel_pasid_tear_down_entry(iommu, dev,
> > -						    IOMMU_DEF_RID_PASID, false);
> > +			intel_iommu_detach_device_pasid(&info->domain->domain, dev, IOMMU_DEF_RID_PASID);
>
> device_block_translation() is called when switching the RID's domain or
> releasing the device. I assume that we don't need to touch this path
> when we add the attach_dev_pasid support.
>
> Blocking DMA translation through RID/PASID should be done in the
> remove_dev_pasid path.
>
> Or have I overlooked anything?
>
> [...]
>
> >
> > +static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
> > +					   struct device *dev, ioasid_t pasid)
> > +{
> > +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> > +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > +	struct intel_iommu *iommu = info->iommu;
> > +	int ret;
> > +
> > +	if (!pasid_supported(iommu))
> > +		return -ENODEV;
> > +
> > +	ret = prepare_domain_attach_device(domain, dev);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/*
> > +	 * Most likely the device context has already been set up, will only
> > +	 * take a domain ID reference. Otherwise, device context will be set
> > +	 * up here.
>
> The "otherwise" case is only default domain deferred attaching case,
> right?
it might be the only case so far, but my intention is to be general, i.e.
to impose no ordering requirements. I believe it is more future-proof in
case device_attach_pasid is called before device_attach.

> When the device driver starts to call the attach_dev_pasid API, it means
> that the bus and device DMA configuration have been done, so we could do
> the deferred default domain attaching at that point. Perhaps we should
> add the code below in the core:
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index f1dcfa3f1a1b..633b5ca53606 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3296,6 +3296,12 @@ int iommu_attach_device_pasid(struct iommu_domain *domain,
>  	if (!group)
>  		return -ENODEV;
>
> +	ret = iommu_deferred_attach(dev, group->default_domain);
> +	if (ret) {
> +		iommu_group_put(group);
> +		return ret;
> +	}
It will cover the device attach, but it adds a special case.

>  	mutex_lock(&group->mutex);
>  	curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain,
>  			  GFP_KERNEL);
>  	if (curr) {
>
> Perhaps we need to call iommu_deferred_attach() inside the group->mutex
> critical region?
I agree; the RID_PASID attachment should also be tracked in the group's
pasid_array.

> > +	 * The upper layer APIs make no assumption about the ordering between
> > +	 * device attachment and the PASID attachment.
> > +	 */
> > +	ret = dmar_domain_attach_device(to_dmar_domain(domain), dev);
>
> Calling attach_device on the attach_dev_pasid path is not right.
I think it comes down to a philosophical difference over who is
responsible for ensuring the device context is set up prior to the
device PASID attachment:
1. the vt-d driver
2. the upper layer API

> > +	if (ret) {
> > +		dev_err(dev, "Attach device failed\n");
> > +		return ret;
> > +	}
> > +	return dmar_domain_attach_device_pasid(dmar_domain, iommu, dev, pasid);
> > +}
> > +
> > +
> > +
> >  const struct iommu_ops intel_iommu_ops = {
> >  	.capable		= intel_iommu_capable,
> >  	.domain_alloc		= intel_iommu_domain_alloc,
> > @@ -4802,6 +4861,7 @@ const struct iommu_ops intel_iommu_ops = {
> >  #endif
> >  	.default_domain_ops = &(const struct iommu_domain_ops) {
> >  		.attach_dev		= intel_iommu_attach_device,
> > +		.set_dev_pasid		= intel_iommu_attach_device_pasid,
> >  		.map_pages		= intel_iommu_map_pages,
> >  		.unmap_pages		= intel_iommu_unmap_pages,
> >  		.iotlb_sync_map		= intel_iommu_iotlb_sync_map,
>
> Best regards,
> baolu


Thanks,

Jacob

2023-05-05 03:04:54

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] iommu/vt-d: Implement set_dev_pasid domain op

On 5/5/23 7:03 AM, Jacob Pan wrote:
>>> +static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
>>> +					   struct device *dev, ioasid_t pasid)
>>> +{
>>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
>>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>>> +	struct intel_iommu *iommu = info->iommu;
>>> +	int ret;
>>> +
>>> +	if (!pasid_supported(iommu))
>>> +		return -ENODEV;
>>> +
>>> +	ret = prepare_domain_attach_device(domain, dev);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/*
>>> +	 * Most likely the device context has already been set up, will only
>>> +	 * take a domain ID reference. Otherwise, device context will be set
>>> +	 * up here.
>> The "otherwise" case is only default domain deferred attaching case,
>> right?
> it might be the only case so far, but my intention is to be general. i.e.
> no ordering requirements. I believe it is more future proof in case
> device_attach_pasid called before device_attach.

Let's put aside deferred attach and talk about it later.

The device's context entry is configured when the default domain is
being attached to the device, and the default domain attaching is done
in the iommu probe_device path. It always happens before set_device_pasid,
which is designed to be called by the device driver. So the real
situation is that when the device driver calls set_device_pasid, the
context entry has already been configured.

Then let's pick up the deferred attach case. It is a workaround for
kdump (Documentation/admin-guide/kdump/kdump.rst). I don't think the
PASID feature is functionally required by any kdump capture kernel, as
its main purpose is to dump the memory of a panicked kernel.

In summary, it seems to be reasonable for the vt-d driver to return
-EBUSY when a device's context was copied. The idxd driver should
continue to work without kernel DMA with PASID support.

	if (context_copied(iommu, bus, devfn))
		return -EBUSY;

Making things general is always good, but that doesn't mean we need to
make the code complex to support cases that do not exist or are not
used. Thoughts?
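
For example, a rough sketch of the set_dev_pasid path with the check in
place (untested; the bus/devfn lookup is only there to feed
context_copied()):

static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
					   struct device *dev, ioasid_t pasid)
{
	struct device_domain_info *info = dev_iommu_priv_get(dev);
	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
	struct intel_iommu *iommu = info->iommu;
	u8 bus, devfn;
	int ret;

	if (!pasid_supported(iommu))
		return -ENODEV;

	if (!device_to_iommu(dev, &bus, &devfn))
		return -ENODEV;

	/* The kdump capture kernel copied the context; don't touch it. */
	if (context_copied(iommu, bus, devfn))
		return -EBUSY;

	ret = prepare_domain_attach_device(domain, dev);
	if (ret)
		return ret;

	/* The context entry was already configured by default domain attach. */
	return dmar_domain_attach_device_pasid(dmar_domain, iommu, dev, pasid);
}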

Best regards,
baolu

2023-05-05 08:32:26

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v5 1/7] iommu: Generalize default PCIe requester ID PASID

> From: Jacob Pan <[email protected]>
> Sent: Friday, April 28, 2023 11:56 PM
>
> Hi Kevin,
>
> On Fri, 28 Apr 2023 09:38:32 +0000, "Tian, Kevin" <[email protected]>
> wrote:
>
> > > From: Jacob Pan <[email protected]>
> > > Sent: Friday, April 28, 2023 1:50 AM
> > >
> > > PCIe Process Address Space ID (PASID) is used to tag DMA traffic; it
> > > provides finer-grained isolation than the requester ID (RID).
> > >
> > > For each RID, 0 is a special PASID for legacy DMA (without PASID),
> > > thus RID_PASID. This is universal across all architectures and
> > > therefore warranted to be declared in the common header.
> > > Note that VT-d could support a non-zero RID_PASID, but this is
> > > currently not used.
> > >
> > > By having a common RID_PASID, we can avoid conflicts between different
> > > use cases in the generic code. e.g. SVA and DMA API with PASIDs.
> >
> > You intend it to be generic but in the end only vt-d driver is changed
> > to use it in this series...
> change for SVA is in the patch.

My point was that since it is common, why is there no change in the
arm-smmu driver to use the common macro?

>
> > > @@ -190,6 +190,7 @@ enum iommu_dev_features {
> > >  	IOMMU_DEV_FEAT_IOPF,
> > >  };
> > >
> > > +#define IOMMU_DEF_RID_PASID	(0U) /* Reserved for DMA w/o PASID */
> >
> > Is RID a general term on other platforms?
> RID, aka requester ID, is a PCI term, so I think it is common, though on
> SMMU it might be better called stream ID PASID?
>
> > Would IOMMU_DEF_NO_PASID serve the purpose better?
> Not sure; it is still a PASID. For completeness, it might be called
> IOMMU_DEF_PASID_FOR_DMA_NO_PASID :)
>
> other suggestions?
>

That is even worse. Let's keep the original one and see whether others
have a better suggestion.

2023-05-05 20:42:05

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v5 6/7] iommu/vt-d: Implement set_dev_pasid domain op

Hi Baolu,

On Fri, 5 May 2023 10:58:38 +0800, Baolu Lu <[email protected]>
wrote:

> On 5/5/23 7:03 AM, Jacob Pan wrote:
> >>> +static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
> >>> +					   struct device *dev, ioasid_t pasid)
> >>> +{
> >>> +	struct device_domain_info *info = dev_iommu_priv_get(dev);
> >>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> >>> +	struct intel_iommu *iommu = info->iommu;
> >>> +	int ret;
> >>> +
> >>> +	if (!pasid_supported(iommu))
> >>> +		return -ENODEV;
> >>> +
> >>> +	ret = prepare_domain_attach_device(domain, dev);
> >>> +	if (ret)
> >>> +		return ret;
> >>> +
> >>> +	/*
> >>> +	 * Most likely the device context has already been set up, will only
> >>> +	 * take a domain ID reference. Otherwise, device context will be set
> >>> +	 * up here.
> >> The "otherwise" case is only default domain deferred attaching case,
> >> right?
> > it might be the only case so far, but my intention is to be general.
> > i.e. no ordering requirements. I believe it is more future proof in case
> > device_attach_pasid called before device_attach.
>
> Let's put aside deferred attach and talk about it later.
>
> The device's context entry is configured when the default domain is
> being attached to the device, and the default domain attaching is done
> in the iommu probe_device path. It always happens before set_device_pasid,
> which is designed to be called by the device driver. So the real
> situation is that when the device driver calls set_device_pasid, the
> context entry has already been configured.
>
> Then let's pick up the deferred attach case. It is a workaround for
> kdump (Documentation/admin-guide/kdump/kdump.rst). I don't think the
> PASID feature is functionally required by any kdump capture kernel, as
> its main purpose is to dump the memory of a panicked kernel.
>
> In summary, it seems to be reasonable for the vt-d driver to return
> -EBUSY when a device's context was copied. The idxd driver should
> continue to work without kernel DMA with PASID support.
>
> 	if (context_copied(iommu, bus, devfn))
> 		return -EBUSY;
>
> Making things general is always good, but that doesn't mean we need to
> make the code complex to support cases that do not exist or are not
> used. Thoughts?
>
Good point, it is better not to put dead code in. Let me also document
this behavior for any future change that may affect the ordering
requirement.
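
e.g. a comment block next to set_dev_pasid along these lines (wording to
be refined):

/*
 * Ordering requirement: the default domain must have been attached to
 * the device (i.e. the context entry configured in the probe_device
 * path) before set_dev_pasid() is called. In a kdump capture kernel
 * the context entry may have been copied from the previous kernel; in
 * that case we return -EBUSY and the caller has to live without kernel
 * DMA with PASID.
 */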

Thanks,

Jacob

2023-05-09 20:49:37

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v5 1/7] iommu: Generalize default PCIe requester ID PASID

Hi Kevin,

On Fri, 5 May 2023 08:28:13 +0000, "Tian, Kevin" <[email protected]>
wrote:

> > > >
> > > > By having a common RID_PASID, we can avoid conflicts between
> > > > different use cases in the generic code. e.g. SVA and DMA API with
> > > > PASIDs.
> > >
> > > You intend it to be generic but in the end only vt-d driver is changed
> > > to use it in this series...
> > change for SVA is in the patch.
>
> My point was that since it is common, why is there no change in the
> arm-smmu driver to use the common macro?
Got it; I will include the changes for SSID 0 in the smmu code.
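
i.e. replacing the literal SSID 0 with the common macro, roughly along
these lines (illustrative only; the exact call sites in the smmu driver
still need to be confirmed):

-	ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &cfg->cd);
+	ret = arm_smmu_write_ctx_desc(smmu_domain, IOMMU_DEF_RID_PASID, &cfg->cd);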

Thanks,

Jacob