Modern accelerators such as Intel's Data Streaming Accelerator (DSA) can
perform DMA requests with PASID, which is a finer granularity than the
device's requester ID (RID). In fact, work submissions on DSA shared work
queues require PASID.
The DMA mapping API is the de facto standard for in-kernel DMA. However,
it operates on a per-device/RID basis, which is not PASID-aware.
This patch introduces the following driver-facing API that enables DMA API
PASID usage: ioasid_t iommu_enable_pasid_dma(struct device *dev);
A PASID field is added to struct device for the purposes of storing the
kernel DMA PASID and flushing device IOTLBs. A separate use case in
interrupt message store (IMS) also hinted at adding a PASID field to
struct device:
https://lore.kernel.org/all/[email protected]/
IMS virtualization and the DMA API do not overlap.
Once enabled, device drivers can continue to use the DMA APIs as-is. There
is no difference in the dma_handle mapping between DMA without PASID and
DMA with PASID. The DMA mapping performed by the IOMMU is identical for
requests with and without PASID (legacy), be it IOVA or PA in the
pass-through case.
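For illustration, a device driver would use the API roughly as follows
(a minimal sketch with error handling trimmed; programming the PASID into
the device is a device-specific step):

	ioasid_t pasid;
	dma_addr_t dma;

	pasid = iommu_enable_pasid_dma(dev);
	if (pasid == INVALID_IOASID)
		return -ENODEV;

	/* device-specific: program 'pasid' into the work submission path */

	/* DMA mapping is unchanged; the handle works with or without PASID */
	dma = dma_map_single(dev, buf, size, DMA_TO_DEVICE);
	...
	dma_unmap_single(dev, dma, size, DMA_TO_DEVICE);
	iommu_disable_pasid_dma(dev);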
In addition, this set converts the current support for in-kernel PASID DMA
from the SVA lib to the DMA API. There have been security and functional
issues with the kernel SVA approach:
(https://lore.kernel.org/linux-iommu/[email protected]/)
The highlights are as follows:
- The lack of IOTLB synchronization upon kernel page table updates.
(vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.)
- Other than slightly more protection, using kernel virtual addresses
(KVA) has little advantage over physical addresses.
There are also no use cases yet where DMA engines need kernel virtual
addresses for in-kernel DMA.
Once this set is accepted, more cleanup patches will follow. The plan is
to remove the usage of sva_bind_device() for in-kernel usages, and to
remove page requests and other special cases around kernel SVA in the
VT-d driver.
Jacob Pan (4):
ioasid: Reserve a global PASID for in-kernel DMA
iommu: Add PASID support for DMA mapping API users
iommu/vt-d: Support PASID DMA for in-kernel usage
dmaengine: idxd: Use DMA API for in-kernel DMA with PASID
.../admin-guide/kernel-parameters.txt | 6 -
drivers/dma/Kconfig | 10 --
drivers/dma/idxd/idxd.h | 1 -
drivers/dma/idxd/init.c | 59 +++-------
drivers/dma/idxd/sysfs.c | 7 --
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/dma-iommu.c | 71 ++++++++++++
drivers/iommu/intel/iommu.c | 109 +++++++++++++++++-
drivers/iommu/intel/pasid.c | 7 ++
drivers/iommu/intel/pasid.h | 3 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/ioasid.c | 2 +
include/linux/device.h | 1 +
include/linux/dma-iommu.h | 7 ++
include/linux/intel-iommu.h | 3 +-
include/linux/ioasid.h | 4 +
include/linux/iommu.h | 4 +
17 files changed, 226 insertions(+), 72 deletions(-)
--
2.25.1
In-kernel DMA is managed by the DMA mapping APIs, which support a
per-device addressing mode for legacy DMA requests. With the introduction
of the Process Address Space ID (PASID), device DMA can now target a
finer granularity per PASID + requester ID (RID).
However, for in-kernel DMA there is no need to differentiate between
legacy DMA and DMA with PASID in terms of mapping. The DMA address
mapping for RID+PASID can be made identical to that of the RID. The
benefit for drivers is that the DMA mapping APIs continue to work
without change.
This patch reserves a special IOASID for devices that perform in-kernel
DMA requests with PASID. This global IOASID is excluded from the
IOASID allocator. The analogous case is PASID #0, a special PASID
reserved for DMA requests without PASID (legacy). We could have different
kernel PASIDs for individual devices, but for simplicity a single
globally reserved one fits the bill.
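With the new constants below, the resulting PASID namespace layout is:

	PASID 0  (IOASID_DMA_NO_PASID): legacy DMA without PASID (RID2PASID)
	PASID 1  (IOASID_DMA_PASID):    reserved for in-kernel DMA with PASID
	PASID 2+ (IOASID_ALLOC_BASE):   available to the IOASID allocator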
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/intel/iommu.c | 4 ++--
drivers/iommu/intel/pasid.h | 3 +--
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/ioasid.c | 2 ++
include/linux/ioasid.h | 4 ++++
6 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index ee66d1f4cb81..ac79a37ffe06 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(-ENOMEM);
/* Allocate a PASID for this mm if necessary */
- ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+ ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U << master->ssid_bits) - 1);
if (ret)
goto err_free_bond;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6afb4d4e09ef..60253bc436bb 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3253,7 +3253,7 @@ static ioasid_t intel_vcmd_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
* PASID range. Host can partition guest PASID range based on
* policies but it is out of guest's control.
*/
- if (min < PASID_MIN || max > intel_pasid_max_id)
+ if (min < IOASID_ALLOC_BASE || max > intel_pasid_max_id)
return INVALID_IOASID;
if (vcmd_alloc_pasid(iommu, &ioasid))
@@ -4824,7 +4824,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
u32 pasid;
/* No private data needed for the default pasid */
- pasid = ioasid_alloc(NULL, PASID_MIN,
+ pasid = ioasid_alloc(NULL, IOASID_ALLOC_BASE,
pci_max_pasids(to_pci_dev(dev)) - 1,
NULL);
if (pasid == INVALID_IOASID) {
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d5552e2c160d..c3a714535c03 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -10,8 +10,7 @@
#ifndef __INTEL_PASID_H
#define __INTEL_PASID_H
-#define PASID_RID2PASID 0x0
-#define PASID_MIN 0x1
+#define PASID_RID2PASID IOASID_DMA_NO_PASID
#define PASID_MAX 0x100000
#define PASID_PTE_MASK 0x3F
#define PASID_PTE_PRESENT 1
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5b5d69b04fcc..95dcaf78c22c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -511,7 +511,7 @@ static int intel_svm_alloc_pasid(struct device *dev, struct mm_struct *mm,
ioasid_t max_pasid = dev_is_pci(dev) ?
pci_max_pasids(to_pci_dev(dev)) : intel_pasid_max_id;
- return iommu_sva_alloc_pasid(mm, PASID_MIN, max_pasid - 1);
+ return iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, max_pasid - 1);
}
static void intel_svm_free_pasid(struct mm_struct *mm)
diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
index 50ee27bbd04e..89c6132bf1ec 100644
--- a/drivers/iommu/ioasid.c
+++ b/drivers/iommu/ioasid.c
@@ -317,6 +317,8 @@ ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
data->private = private;
refcount_set(&data->refs, 1);
+ if (min < IOASID_ALLOC_BASE)
+ min = IOASID_ALLOC_BASE;
/*
* Custom allocator needs allocator data to perform platform specific
* operations.
diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h
index e9dacd4b9f6b..4d435cbd48b8 100644
--- a/include/linux/ioasid.h
+++ b/include/linux/ioasid.h
@@ -6,6 +6,10 @@
#include <linux/errno.h>
#define INVALID_IOASID ((ioasid_t)-1)
+#define IOASID_DMA_NO_PASID 0 /* For DMA request w/o PASID */
+#define IOASID_DMA_PASID 1 /* For in-kernel DMA w/ PASID */
+#define IOASID_ALLOC_BASE 2 /* Start of the allocation */
+
typedef unsigned int ioasid_t;
typedef ioasid_t (*ioasid_alloc_fn_t)(ioasid_t min, ioasid_t max, void *data);
typedef void (*ioasid_free_fn_t)(ioasid_t ioasid, void *data);
--
2.25.1
The DMA mapping API is the de facto standard for in-kernel DMA. It
operates on a per-device/RID basis, which is not PASID-aware.
For some modern devices such as the Intel Data Streaming Accelerator,
PASID is required for certain work submissions. To allow such devices to
use the DMA mapping API, we need the following functionality:
1. Provide the device a way to retrieve a kernel PASID for work submission
2. Enable the kernel PASID on the IOMMU
3. Establish an address space for the kernel PASID that matches the
default domain, be it IOVA or physical address in the pass-through case.
This patch introduces a driver-facing API that enables DMA API PASID
usage. Once enabled, device drivers can continue to use the DMA APIs
as-is. There is no difference in dma_handle between DMA without PASID and
DMA with PASID.
To manage device IOTLB flushes at the PASID level, this patch also
introduces a .pasid field in struct device. The field also serves as a
flag indicating whether a PASID is being used for the device to perform
in-kernel DMA.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/dma-iommu.c | 71 +++++++++++++++++++++++++++++++++++++++
include/linux/device.h | 1 +
include/linux/dma-iommu.h | 7 ++++
include/linux/iommu.h | 4 +++
4 files changed, 83 insertions(+)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index b42e38a0dbe2..8855d5e99d8e 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -167,6 +167,77 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
domain->iova_cookie = NULL;
}
+/**
+ * iommu_enable_pasid_dma - Enable in-kernel DMA requests with PASID
+ * @dev: Device to be enabled
+ *
+ * DMA requests with PASID will be mapped the same way as legacy DMA.
+ * If the device is in pass-through, the PASID will also be in
+ * pass-through. If the device is in IOVA mode, the kernel PASID will
+ * point to the same IOVA page table.
+ *
+ * Return: the kernel PASID to be used for DMA or INVALID_IOASID on failure
+ */
+ioasid_t iommu_enable_pasid_dma(struct device *dev)
+{
+ struct iommu_domain *dom;
+
+ if (dev->pasid) {
+ dev_err(dev, "PASID DMA already enabled\n");
+ return IOASID_DMA_PASID;
+ }
+ dom = iommu_get_domain_for_dev(dev);
+
+ if (!dom) {
+ dev_err(dev, "No IOMMU domain\n");
+ return INVALID_IOASID;
+ }
+
+ /*
+ * Use the reserved kernel PASID for all devices. For now,
+ * there is no need to have different PASIDs for in-kernel use.
+ */
+ if (!dom->ops->enable_pasid_dma || dom->ops->enable_pasid_dma(dev, IOASID_DMA_PASID))
+ return INVALID_IOASID;
+ /* Used for device IOTLB flush */
+ dev->pasid = IOASID_DMA_PASID;
+
+ return IOASID_DMA_PASID;
+}
+EXPORT_SYMBOL(iommu_enable_pasid_dma);
+
+/**
+ * iommu_disable_pasid_dma - Disable in-kernel DMA requests with PASID
+ * @dev: Device whose PASID DMA is to be disabled
+ *
+ * It is the device driver's responsibility to ensure that no more DMA
+ * requests with the kernel PASID arrive before calling this function. The
+ * IOMMU driver ensures that the PASID cache and IOTLBs related to the
+ * kernel PASID are cleared and drained.
+ *
+ * Return: 0 on success or an error code on failure
+ */
+int iommu_disable_pasid_dma(struct device *dev)
+{
+ struct iommu_domain *dom;
+ int ret = 0;
+
+ if (!dev->pasid) {
+ dev_err(dev, "PASID DMA not enabled\n");
+ return -ENODEV;
+ }
+ dom = iommu_get_domain_for_dev(dev);
+ if (!dom->ops->disable_pasid_dma)
+ return -ENOTSUPP;
+
+ ret = dom->ops->disable_pasid_dma(dev);
+ if (!ret)
+ dev->pasid = 0;
+
+ return ret;
+}
+EXPORT_SYMBOL(iommu_disable_pasid_dma);
+
/**
* iommu_dma_get_resv_regions - Reserved region driver helper
* @dev: Device from iommu_get_resv_regions()
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..8afa033b8b0b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -559,6 +559,7 @@ struct device {
void (*release)(struct device *dev);
struct iommu_group *iommu_group;
struct dev_iommu *iommu;
+ u32 pasid; /* For in-kernel DMA w/ PASID */
enum device_removable removable;
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 24607dc3c2ac..298b31e3a007 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -18,6 +18,13 @@ int iommu_get_dma_cookie(struct iommu_domain *domain);
int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);
+/*
+ * For devices that can do DMA request with PASID, setup a system PASID.
+ * Address modes (IOVA, PA) are selected by the platform code.
+ */
+ioasid_t iommu_enable_pasid_dma(struct device *dev);
+int iommu_disable_pasid_dma(struct device *dev);
+
/* Setup call for arch DMA mapping code */
void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
int iommu_dma_init_fq(struct iommu_domain *domain);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d2f3435e7d17..45d281849c93 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -236,6 +236,8 @@ struct iommu_iotlb_gather {
* - IOMMU_DOMAIN_IDENTITY: must use an identity domain
* - IOMMU_DOMAIN_DMA: must use a dma domain
* - 0: use the default setting
+ * @enable_pasid_dma: Set up PASID for in-kernel DMA
+ * @disable_pasid_dma: Disable in-kernel DMA with PASID on the device
* @pgsize_bitmap: bitmap of all possible supported page sizes
* @owner: Driver module providing these ops
*/
@@ -310,6 +312,8 @@ struct iommu_ops {
int (*def_domain_type)(struct device *dev);
+ int (*enable_pasid_dma)(struct device *dev, u32 pasid);
+ int (*disable_pasid_dma)(struct device *dev);
unsigned long pgsize_bitmap;
struct module *owner;
};
--
2.25.1
DMA mapping APIs are used indiscriminately on a device for DMA requests
with and without PASID (legacy). Therefore, we should always match the
addressing mode of legacy DMA when enabling the kernel PASID.
This patch adds support in the VT-d driver, where the kernel PASID is
programmed to match RID2PASID. I.e., if the device is in pass-through, the
kernel PASID is also in pass-through; if the device is in IOVA mode, the
kernel PASID uses the same IOVA space.
There is additional handling for IOTLB and device TLB flushes w.r.t. the
kernel PASID. On VT-d, PASID-selective IOTLB flush is on a per-domain
basis, whereas device TLB flush is per device. Note that IOTLBs are used
even when devices are in pass-through mode. ATS is enabled device-wide,
but device drivers can choose to manage ATS at the per-PASID level
wherever that control is available.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 105 +++++++++++++++++++++++++++++++++++-
drivers/iommu/intel/pasid.c | 7 +++
include/linux/intel-iommu.h | 3 +-
3 files changed, 113 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 60253bc436bb..a2ef6b9e4bfc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
if (domain->default_pasid)
qi_flush_piotlb(iommu, did, domain->default_pasid,
addr, npages, ih);
-
+ if (domain->kernel_pasid && !domain_type_is_si(domain)) {
+ /*
+ * REVISIT: we only do PASID IOTLB invalidation for FL; we could have
+ * SL for PASID in the future, e.g. vIOMMU PT. This path isn't hit then.
+ */
+ qi_flush_piotlb(iommu, did, domain->kernel_pasid,
+ addr, npages, ih);
+ }
if (!list_empty(&domain->devices))
qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
}
@@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
}
}
+static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
+{
+ struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+ struct device_domain_info *info;
+ unsigned long flags;
+ int ret = 0;
+
+ info = get_domain_info(dev);
+ if (!info)
+ return -ENODEV;
+
+ if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+ return -EINVAL;
+
+ if (intel_iommu_enable_pasid(info->iommu, dev))
+ return -ENODEV;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ spin_lock(&iommu->lock);
+ /*
+ * Store PASID for IOTLB flush, but only needed for non-passthrough
+ * unmap case. For passthrough, we only need to do IOTLB flush during
+ * PASID teardown. Flush covers all devices in the same domain as the
+ * domain ID is the same for the same SL.
+ */
+ info->domain->kernel_pasid = pasid;
+
+ /*
+ * Tracks how many attached devices are using the kernel PASID. Clear
+ * the domain kernel PASID when all users called disable_pasid_dma().
+ */
+ atomic_inc(&info->domain->kernel_pasid_user);
+
+ /*
+ * The addressing mode (IOVA vs. PA) is a per-device choice made by
+ * the platform code. We must treat legacy DMA (requests w/o PASID) and
+ * DMA w/ PASID identically in terms of mapping. Here we just set up
+ * the kernel PASID to match the mapping of RID2PASID/PASID0.
+ */
+ if (hw_pass_through && domain_type_is_si(info->domain)) {
+ ret = intel_pasid_setup_pass_through(info->iommu, info->domain,
+ dev, pasid);
+ if (ret)
+ dev_err(dev, "Failed kernel PASID %d in BYPASS", pasid);
+
+ } else if (domain_use_first_level(info->domain)) {
+ /* We are using FL for IOVA, this is the default option */
+ ret = domain_setup_first_level(info->iommu, info->domain, dev,
+ pasid);
+ if (ret)
+ dev_err(dev, "Failed kernel PASID %d IOVA FL", pasid);
+ } else {
+ ret = intel_pasid_setup_second_level(info->iommu, info->domain,
+ dev, pasid);
+ if (ret)
+ dev_err(dev, "Failed kernel PASID %d IOVA SL", pasid);
+ }
+
+ spin_unlock(&iommu->lock);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return ret;
+}
+
+static int intel_disable_pasid_dma(struct device *dev)
+{
+ struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
+ struct device_domain_info *info;
+ unsigned long flags;
+ int ret = 0;
+
+ info = get_domain_info(dev);
+ if (!info)
+ return -ENODEV;
+
+ if (!dev_is_pci(dev) || !sm_supported(info->iommu))
+ return -EINVAL;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ spin_lock(&iommu->lock);
+
+ /* Tear down kernel PASID for this device */
+ intel_pasid_tear_down_entry(info->iommu, info->dev,
+ info->domain->kernel_pasid,
+ false);
+ /* Clear the domain kernel PASID when there are no users left */
+ if (atomic_dec_and_test(&info->domain->kernel_pasid_user))
+ info->domain->kernel_pasid = 0;
+
+ spin_unlock(&iommu->lock);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ return ret;
+}
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -5732,6 +5833,8 @@ const struct iommu_ops intel_iommu_ops = {
.sva_get_pasid = intel_svm_get_pasid,
.page_response = intel_svm_page_response,
#endif
+ .enable_pasid_dma = intel_enable_pasid_dma,
+ .disable_pasid_dma = intel_disable_pasid_dma,
};
static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 07c390aed1fe..24e889612357 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -505,6 +505,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
else
qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+ /*
+ * Flush the kernel PASID if it is used by the device. This is the case
+ * where a device driver uses IOVA via the DMA map APIs for requests with PASID.
+ */
+ if (dev->pasid)
+ qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, dev->pasid, qdep, 0,
+ 64 - VTD_PAGE_SHIFT);
}
void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index fe9fd417d611..6725a0ddfc6a 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -567,7 +567,8 @@ struct dmar_domain {
* The default pasid used for non-SVM
* traffic on mediated devices.
*/
-
+ u32 kernel_pasid; /* for in-kernel DMA w/ PASID */
+ atomic_t kernel_pasid_user; /* count of kernel_pasid users */
struct iommu_domain domain; /* generic domain data structure for
iommu core */
};
--
2.25.1
In-kernel DMA should be managed by the DMA mapping API. The existing
kernel PASID support is based on the SVA machinery in the SVA lib, which
is intended for user-process SVA. The binding between a kernel PASID and
a kernel mapping has many flaws. See the discussion in the link below.
This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
requests with PASID under the same mapping managed by the DMA mapping API.
In addition, the SVA-related bits for kernel DMA are removed. As a result,
DSA users shall use the DMA mapping API to obtain DMA handles instead of
using kernel virtual addresses.
Link: https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Jacob Pan <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 6 --
drivers/dma/Kconfig | 10 ----
drivers/dma/idxd/idxd.h | 1 -
drivers/dma/idxd/init.c | 59 ++++++-------------
drivers/dma/idxd/sysfs.c | 7 ---
5 files changed, 19 insertions(+), 64 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9725c546a0d4..fe73d02c62f3 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1751,12 +1751,6 @@
In such case C2/C3 won't be used again.
idle=nomwait: Disable mwait for CPU C-states
- idxd.sva= [HW]
- Format: <bool>
- Allow force disabling of Shared Virtual Memory (SVA)
- support for the idxd driver. By default it is set to
- true (1).
-
idxd.tc_override= [HW]
Format: <bool>
Allow override of default traffic class configuration
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 6bcdb4e6a0d1..3b28bd720e7d 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -313,16 +313,6 @@ config INTEL_IDXD_COMPAT
If unsure, say N.
-# Config symbol that collects all the dependencies that's necessary to
-# support shared virtual memory for the devices supported by idxd.
-config INTEL_IDXD_SVM
- bool "Accelerator Shared Virtual Memory Support"
- depends on INTEL_IDXD
- depends on INTEL_IOMMU_SVM
- depends on PCI_PRI
- depends on PCI_PASID
- depends on PCI_IOV
-
config INTEL_IDXD_PERFMON
bool "Intel Data Accelerators performance monitor support"
depends on INTEL_IDXD
diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
index 0cf8d3145870..3155e3a2d3ae 100644
--- a/drivers/dma/idxd/idxd.h
+++ b/drivers/dma/idxd/idxd.h
@@ -262,7 +262,6 @@ struct idxd_device {
struct idxd_wq **wqs;
struct idxd_engine **engines;
- struct iommu_sva *sva;
unsigned int pasid;
int num_groups;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 7bf03f371ce1..44633f8113e2 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -16,6 +16,7 @@
#include <linux/idr.h>
#include <linux/intel-svm.h>
#include <linux/iommu.h>
+#include <linux/dma-iommu.h>
#include <uapi/linux/idxd.h>
#include <linux/dmaengine.h>
#include "../dmaengine.h"
@@ -28,10 +29,6 @@ MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Intel Corporation");
MODULE_IMPORT_NS(IDXD);
-static bool sva = true;
-module_param(sva, bool, 0644);
-MODULE_PARM_DESC(sva, "Toggle SVA support on/off");
-
bool tc_override;
module_param(tc_override, bool, 0644);
MODULE_PARM_DESC(tc_override, "Override traffic class defaults");
@@ -530,36 +527,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- int flags;
- unsigned int pasid;
- struct iommu_sva *sva;
-
- flags = SVM_FLAG_SUPERVISOR_MODE;
-
- sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
- if (IS_ERR(sva)) {
- dev_warn(&idxd->pdev->dev,
- "iommu sva bind failed: %ld\n", PTR_ERR(sva));
- return PTR_ERR(sva);
- }
+ u32 pasid;
- pasid = iommu_sva_get_pasid(sva);
- if (pasid == IOMMU_PASID_INVALID) {
- iommu_sva_unbind_device(sva);
+ pasid = iommu_enable_pasid_dma(&idxd->pdev->dev);
+ if (pasid == INVALID_IOASID) {
+ dev_err(&idxd->pdev->dev, "No kernel DMA PASID\n");
return -ENODEV;
}
-
- idxd->sva = sva;
idxd->pasid = pasid;
- dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
+
return 0;
}
static void idxd_disable_system_pasid(struct idxd_device *idxd)
{
-
- iommu_sva_unbind_device(idxd->sva);
- idxd->sva = NULL;
+ iommu_disable_pasid_dma(&idxd->pdev->dev);
+ idxd->pasid = 0;
}
static int idxd_probe(struct idxd_device *idxd)
@@ -575,21 +558,17 @@ static int idxd_probe(struct idxd_device *idxd)
dev_dbg(dev, "IDXD reset complete\n");
- if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
- rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
- if (rc == 0) {
- rc = idxd_enable_system_pasid(idxd);
- if (rc < 0) {
- iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
- dev_warn(dev, "Failed to enable PASID. No SVA support: %d\n", rc);
- } else {
- set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
- }
- } else {
- dev_warn(dev, "Unable to turn on SVA feature.\n");
- }
- } else if (!sva) {
- dev_warn(dev, "User forced SVA off via module param.\n");
+ /*
+ * Try to enable both in-kernel and user DMA requests with PASID.
+ * PASID is supported only if both user and kernel PASIDs are
+ * supported. Do not fail the probe here since idxd can still be
+ * used w/o PASID or an IOMMU.
+ */
+ if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
+ idxd_enable_system_pasid(idxd)) {
+ dev_warn(dev, "Failed to enable PASID\n");
+ } else {
+ set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
idxd_read_caps(idxd);
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index a9025be940db..35737299c355 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -776,13 +776,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
- /*
- * This is temporarily placed here until we have SVM support for
- * dmaengine.
- */
- if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
- return -EOPNOTSUPP;
-
memset(wq->name, 0, WQ_NAME_SIZE + 1);
strncpy(wq->name, buf, WQ_NAME_SIZE);
strreplace(wq->name, '\n', '\0');
--
2.25.1
On 12/7/2021 6:47 AM, Jacob Pan wrote:
> In-kernel DMA should be managed by DMA mapping API. The existing kernel
> PASID support is based on the SVA machinery in SVA lib that is intended
> for user process SVA. The binding between a kernel PASID and kernel
> mapping has many flaws. See discussions in the link below.
>
> This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
> requests with PASID under the same mapping managed by DMA mapping API.
> In addition, SVA-related bits for kernel DMA are removed. As a result,
> DSA users shall use DMA mapping API to obtain DMA handles instead of
> using kernel virtual addresses.
>
> Link: https://lore.kernel.org/linux-iommu/[email protected]/
> Signed-off-by: Jacob Pan <[email protected]>
Acked-by: Dave Jiang <[email protected]>
Also cc Vinod and dmaengine@vger
> ---
> .../admin-guide/kernel-parameters.txt | 6 --
> drivers/dma/Kconfig | 10 ----
> drivers/dma/idxd/idxd.h | 1 -
> drivers/dma/idxd/init.c | 59 ++++++-------------
> drivers/dma/idxd/sysfs.c | 7 ---
> 5 files changed, 19 insertions(+), 64 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 9725c546a0d4..fe73d02c62f3 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1751,12 +1751,6 @@
> In such case C2/C3 won't be used again.
> idle=nomwait: Disable mwait for CPU C-states
>
> - idxd.sva= [HW]
> - Format: <bool>
> - Allow force disabling of Shared Virtual Memory (SVA)
> - support for the idxd driver. By default it is set to
> - true (1).
> -
> idxd.tc_override= [HW]
> Format: <bool>
> Allow override of default traffic class configuration
> diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
> index 6bcdb4e6a0d1..3b28bd720e7d 100644
> --- a/drivers/dma/Kconfig
> +++ b/drivers/dma/Kconfig
> @@ -313,16 +313,6 @@ config INTEL_IDXD_COMPAT
>
> If unsure, say N.
>
> -# Config symbol that collects all the dependencies that's necessary to
> -# support shared virtual memory for the devices supported by idxd.
> -config INTEL_IDXD_SVM
> - bool "Accelerator Shared Virtual Memory Support"
> - depends on INTEL_IDXD
> - depends on INTEL_IOMMU_SVM
> - depends on PCI_PRI
> - depends on PCI_PASID
> - depends on PCI_IOV
> -
> config INTEL_IDXD_PERFMON
> bool "Intel Data Accelerators performance monitor support"
> depends on INTEL_IDXD
> diff --git a/drivers/dma/idxd/idxd.h b/drivers/dma/idxd/idxd.h
> index 0cf8d3145870..3155e3a2d3ae 100644
> --- a/drivers/dma/idxd/idxd.h
> +++ b/drivers/dma/idxd/idxd.h
> @@ -262,7 +262,6 @@ struct idxd_device {
> struct idxd_wq **wqs;
> struct idxd_engine **engines;
>
> - struct iommu_sva *sva;
> unsigned int pasid;
>
> int num_groups;
> diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
> index 7bf03f371ce1..44633f8113e2 100644
> --- a/drivers/dma/idxd/init.c
> +++ b/drivers/dma/idxd/init.c
> @@ -16,6 +16,7 @@
> #include <linux/idr.h>
> #include <linux/intel-svm.h>
> #include <linux/iommu.h>
> +#include <linux/dma-iommu.h>
> #include <uapi/linux/idxd.h>
> #include <linux/dmaengine.h>
> #include "../dmaengine.h"
> @@ -28,10 +29,6 @@ MODULE_LICENSE("GPL v2");
> MODULE_AUTHOR("Intel Corporation");
> MODULE_IMPORT_NS(IDXD);
>
> -static bool sva = true;
> -module_param(sva, bool, 0644);
> -MODULE_PARM_DESC(sva, "Toggle SVA support on/off");
> -
> bool tc_override;
> module_param(tc_override, bool, 0644);
> MODULE_PARM_DESC(tc_override, "Override traffic class defaults");
> @@ -530,36 +527,22 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
>
> static int idxd_enable_system_pasid(struct idxd_device *idxd)
> {
> - int flags;
> - unsigned int pasid;
> - struct iommu_sva *sva;
> -
> - flags = SVM_FLAG_SUPERVISOR_MODE;
> -
> - sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
> - if (IS_ERR(sva)) {
> - dev_warn(&idxd->pdev->dev,
> - "iommu sva bind failed: %ld\n", PTR_ERR(sva));
> - return PTR_ERR(sva);
> - }
> + u32 pasid;
>
> - pasid = iommu_sva_get_pasid(sva);
> - if (pasid == IOMMU_PASID_INVALID) {
> - iommu_sva_unbind_device(sva);
> + pasid = iommu_enable_pasid_dma(&idxd->pdev->dev);
> + if (pasid == INVALID_IOASID) {
> + dev_err(&idxd->pdev->dev, "No kernel DMA PASID\n");
> return -ENODEV;
> }
> -
> - idxd->sva = sva;
> idxd->pasid = pasid;
> - dev_dbg(&idxd->pdev->dev, "system pasid: %u\n", pasid);
> +
> return 0;
> }
>
> static void idxd_disable_system_pasid(struct idxd_device *idxd)
> {
> -
> - iommu_sva_unbind_device(idxd->sva);
> - idxd->sva = NULL;
> + iommu_disable_pasid_dma(&idxd->pdev->dev);
> + idxd->pasid = 0;
> }
>
> static int idxd_probe(struct idxd_device *idxd)
> @@ -575,21 +558,17 @@ static int idxd_probe(struct idxd_device *idxd)
>
> dev_dbg(dev, "IDXD reset complete\n");
>
> - if (IS_ENABLED(CONFIG_INTEL_IDXD_SVM) && sva) {
> - rc = iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA);
> - if (rc == 0) {
> - rc = idxd_enable_system_pasid(idxd);
> - if (rc < 0) {
> - iommu_dev_disable_feature(dev, IOMMU_DEV_FEAT_SVA);
> - dev_warn(dev, "Failed to enable PASID. No SVA support: %d\n", rc);
> - } else {
> - set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> - }
> - } else {
> - dev_warn(dev, "Unable to turn on SVA feature.\n");
> - }
> - } else if (!sva) {
> - dev_warn(dev, "User forced SVA off via module param.\n");
> + /*
> + * Try to enable both in-kernel and user DMA request with PASID.
> + * PASID is supported unless both user and kernel PASID are
> + * supported. Do not fail probe here in that idxd can still be
> + * used w/o PASID or IOMMU.
> + */
> + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> + idxd_enable_system_pasid(idxd)) {
> + dev_warn(dev, "Failed to enable PASID\n");
> + } else {
> + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> }
>
> idxd_read_caps(idxd);
> diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
> index a9025be940db..35737299c355 100644
> --- a/drivers/dma/idxd/sysfs.c
> +++ b/drivers/dma/idxd/sysfs.c
> @@ -776,13 +776,6 @@ static ssize_t wq_name_store(struct device *dev,
> if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
> return -EINVAL;
>
> - /*
> - * This is temporarily placed here until we have SVM support for
> - * dmaengine.
> - */
> - if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
> - return -EOPNOTSUPP;
> -
> memset(wq->name, 0, WQ_NAME_SIZE + 1);
> strncpy(wq->name, buf, WQ_NAME_SIZE);
> strreplace(wq->name, '\n', '\0');
Hi Jacob,
On 12/7/21 9:47 PM, Jacob Pan wrote:
> DMA mapping API is the de facto standard for in-kernel DMA. It operates
> on a per device/RID basis which is not PASID-aware.
>
> Some modern devices such as Intel Data Streaming Accelerator, PASID is
> required for certain work submissions. To allow such devices use DMA
> mapping API, we need the following functionalities:
> 1. Provide device a way to retrieve a kernel PASID for work submission
> 2. Enable the kernel PASID on the IOMMU
> 3. Establish address space for the kernel PASID that matches the default
> domain. Let it be IOVA or physical address in case of pass-through.
>
> This patch introduces a driver facing API that enables DMA API
> PASID usage. Once enabled, device drivers can continue to use DMA APIs as
> is. There is no difference in dma_handle between without PASID and with
> PASID.
Can a device issue DMA requests with PASID even when there's no system
IOMMU or the system IOMMU is disabled?
Best regards,
baolu
On 07-12-21, 16:27, Dave Jiang wrote:
>
> On 12/7/2021 6:47 AM, Jacob Pan wrote:
> > In-kernel DMA should be managed by DMA mapping API. The existing kernel
> > PASID support is based on the SVA machinery in SVA lib that is intended
> > for user process SVA. The binding between a kernel PASID and kernel
> > mapping has many flaws. See discussions in the link below.
> >
> > This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
> > requests with PASID under the same mapping managed by DMA mapping API.
> > In addition, SVA-related bits for kernel DMA are removed. As a result,
> > DSA users shall use DMA mapping API to obtain DMA handles instead of
> > using kernel virtual addresses.
> >
> > Link: https://lore.kernel.org/linux-iommu/[email protected]/
> > Signed-off-by: Jacob Pan <[email protected]>
>
> Acked-by: Dave Jiang <[email protected]>
>
> Also cc Vinod and dmaengine@vger
Pls resend collecting acks. I don't have this in my queue
--
~Vinod
On Tue, Dec 07, 2021 at 05:47:10AM -0800, Jacob Pan wrote:
> Modern accelerators such as Intel's Data Streaming Accelerator (DSA) can
> perform DMA requests with PASID, which is a finer granularity than the
> device's requester ID(RID). In fact, work submissions on DSA shared work
> queues require PASID.
Let's use plain language please:
DSA HW cannot do DMA from its RID, so always requires a PASID, even
for kernel controlled DMA.
To allow it to use the DMA API we must associate a PASID with the
iommu_domain that the DMA API is already using for the device's RID.
This way DMA tagged with the PASID will be treated exactly the same as
DMA originating from the RID.
> DMA mapping API is the de facto standard for in-kernel DMA. However, it
> operates on a per device/RID basis which is not PASID-aware.
>
> This patch introduces the following driver facing API that enables DMA API
> PASID usage: ioasid_t iommu_enable_pasid_dma(struct device *dev);
This is the wrong API, IMHO
It should be more like
int iommu_get_dma_api_pasid(struct device *dev, ioasid_t *pasid);
void iommu_destroy_dma_api_pasid(struct device *dev);
> A PASID field is added to struct device for the purposes of storing kernel
> DMA PASID and flushing device IOTLBs. A separate use case in interrupt
And this really should not be touching the struct device at all.
At worst the PASID should be stored in the iommu_group.
> message store (IMS) also hinted adding a PASID field to struct device.
> https://lore.kernel.org/all/[email protected]/
> IMS virtualization and DMA API does not overlap.
This is under debate, I'm skeptical it will happen considering the new
direction for this work.
> Once enabled, device drivers can continue to use DMA APIs as-is. There is
> no difference in terms of mapping in dma_handle between without PASID and
> with PASID. The DMA mapping performed by IOMMU will be identical for both
> requests with and without PASID (legacy), let it be IOVA or PA in case of
> pass-through.
In other words all this does is connect the PASID to the normal
DMA-API owned iommu_domain.
Jason
On Tue, Dec 07, 2021 at 05:47:14AM -0800, Jacob Pan wrote:
> In-kernel DMA should be managed by DMA mapping API. The existing kernel
> PASID support is based on the SVA machinery in SVA lib that is intended
> for user process SVA. The binding between a kernel PASID and kernel
> mapping has many flaws. See discussions in the link below.
>
> This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
> requests with PASID under the same mapping managed by DMA mapping API.
> In addition, SVA-related bits for kernel DMA are removed. As a result,
> DSA users shall use DMA mapping API to obtain DMA handles instead of
> using kernel virtual addresses.
Er, shouldn't this be adding dma_map/etc type calls?
You can't really say a driver is using the DMA API without actually
calling the DMA API..
> + /*
> + * Try to enable both in-kernel and user DMA request with PASID.
> + * PASID is supported unless both user and kernel PASID are
> + * supported. Do not fail probe here in that idxd can still be
> + * used w/o PASID or IOMMU.
> + */
> + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> + idxd_enable_system_pasid(idxd)) {
> + dev_warn(dev, "Failed to enable PASID\n");
> + } else {
> + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> }
Huh? How can the driver keep going if PASID isn't supported? I thought
the whole point of this was because the device cannot do DMA without
PASID at all?
Jason
On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
> Between DMA requests with and without PASID (legacy), DMA mapping APIs
> are used indiscriminately on a device. Therefore, we should always match
> the addressing mode of the legacy DMA when enabling kernel PASID.
>
> This patch adds support for VT-d driver where the kernel PASID is
> programmed to match RIDPASID. i.e. if the device is in pass-through, the
> kernel PASID is also in pass-through; if the device is in IOVA mode, the
> kernel PASID will also be using the same IOVA space.
>
> There is additional handling for IOTLB and device TLB flush w.r.t. the
> kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> per-domain basis; whereas device TLB flush is per device. Note that
> IOTLBs are used even when devices are in pass-through mode. ATS is
> enabled device-wide, but the device drivers can choose to manage ATS at
> per PASID level whenever control is available.
>
> Signed-off-by: Jacob Pan <[email protected]>
> drivers/iommu/intel/iommu.c | 105 +++++++++++++++++++++++++++++++++++-
> drivers/iommu/intel/pasid.c | 7 +++
> include/linux/intel-iommu.h | 3 +-
> 3 files changed, 113 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 60253bc436bb..a2ef6b9e4bfc 100644
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct intel_iommu *iommu,
> if (domain->default_pasid)
> qi_flush_piotlb(iommu, did, domain->default_pasid,
> addr, npages, ih);
> -
> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> + /*
> + * REVISIT: we only do PASID IOTLB inval for FL, we could have SL
> + * for PASID in the future such as vIOMMU PT. this doesn't get hit.
> + */
> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> + addr, npages, ih);
> + }
> if (!list_empty(&domain->devices))
> qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
> }
> @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
> }
> }
>
> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> +{
This seems like completely the wrong kind of op.
At the level of the iommu driver things should be iommu_domain centric
The op should be
int attach_dev_pasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid)
Where 'dev' purpose is to provide the RID
The iommu_domain passed in should be the 'default domain' ie the table
used for on-demand mapping, or the passthrough page table.
> + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> + struct device_domain_info *info;
I don't even want to know why an iommu driver is tracking its own
per-device state. That seems like completely wrong layering.
Jason
On 12/8/2021 6:13 AM, Jason Gunthorpe wrote:
> On Tue, Dec 07, 2021 at 05:47:14AM -0800, Jacob Pan wrote:
>> In-kernel DMA should be managed by DMA mapping API. The existing kernel
>> PASID support is based on the SVA machinery in SVA lib that is intended
>> for user process SVA. The binding between a kernel PASID and kernel
>> mapping has many flaws. See discussions in the link below.
>>
>> This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
>> requests with PASID under the same mapping managed by DMA mapping API.
>> In addition, SVA-related bits for kernel DMA are removed. As a result,
>> DSA users shall use DMA mapping API to obtain DMA handles instead of
>> using kernel virtual addresses.
> Er, shouldn't this be adding dma_map/etc type calls?
>
> You can't really say a driver is using the DMA API without actually
> calling the DMA API..
>
>> + /*
>> + * Try to enable both in-kernel and user DMA request with PASID.
>> + * PASID is supported unless both user and kernel PASID are
>> + * supported. Do not fail probe here in that idxd can still be
>> + * used w/o PASID or IOMMU.
>> + */
>> + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
>> + idxd_enable_system_pasid(idxd)) {
>> + dev_warn(dev, "Failed to enable PASID\n");
>> + } else {
>> + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
>> }
> Huh? How can the driver keep going if PASID isn't supported? I thought
> the whole point of this was because the device cannot do DMA without
> PASID at all?
There are two types of WQs supported on DSA devices: a dedicated WQ type
and a shared WQ type. The dedicated WQ type can support DMA with and
without PASID. The shared WQ type must have a PASID to operate. The
driver can support dedicated WQs only, without PASID usage, when there is
no PASID support.
>
> Jason
Hi Vinod,
On Wed, 8 Dec 2021 10:26:22 +0530, Vinod Koul <[email protected]> wrote:
> Pls resend collecting acks. I dont have this in my queue
Will do. Sorry I missed the dmaengine list.
Thanks,
Jacob
On Wed, Dec 08, 2021 at 08:35:49AM -0700, Dave Jiang wrote:
>
> On 12/8/2021 6:13 AM, Jason Gunthorpe wrote:
> > On Tue, Dec 07, 2021 at 05:47:14AM -0800, Jacob Pan wrote:
> > > In-kernel DMA should be managed by DMA mapping API. The existing kernel
> > > PASID support is based on the SVA machinery in SVA lib that is intended
> > > for user process SVA. The binding between a kernel PASID and kernel
> > > mapping has many flaws. See discussions in the link below.
> > >
> > > This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform DMA
> > > requests with PASID under the same mapping managed by DMA mapping API.
> > > In addition, SVA-related bits for kernel DMA are removed. As a result,
> > > DSA users shall use DMA mapping API to obtain DMA handles instead of
> > > using kernel virtual addresses.
> > Er, shouldn't this be adding dma_map/etc type calls?
> >
> > You can't really say a driver is using the DMA API without actually
> > calling the DMA API..
> >
> > > + /*
> > > + * Try to enable both in-kernel and user DMA request with PASID.
> > > + * PASID is supported unless both user and kernel PASID are
> > > + * supported. Do not fail probe here in that idxd can still be
> > > + * used w/o PASID or IOMMU.
> > > + */
> > > + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> > > + idxd_enable_system_pasid(idxd)) {
> > > + dev_warn(dev, "Failed to enable PASID\n");
> > > + } else {
> > > + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> > > }
> > Huh? How can the driver keep going if PASID isn't supported? I thought
> > the whole point of this was because the device cannot do DMA without
> > PASID at all?
>
> There are 2 types of WQ supported with the DSA devices. A dedicated WQ type
> and a shared WQ type. The dedicated WQ type can support DMA with and without
> PASID. The shared wq type must have a PASID to operate. The driver can
> support dedicated WQ only without PASID usage when there is no PASID
> support.
Can you add to the cover letter why the kernel needs to use the shared
WQ?
Jason
Hi Jason,
Thanks for the quick review.
On Wed, 8 Dec 2021 09:10:38 -0400, Jason Gunthorpe <[email protected]> wrote:
> On Tue, Dec 07, 2021 at 05:47:10AM -0800, Jacob Pan wrote:
> > Modern accelerators such as Intel's Data Streaming Accelerator (DSA) can
> > perform DMA requests with PASID, which is a finer granularity than the
> > device's requester ID(RID). In fact, work submissions on DSA shared work
> > queues require PASID.
>
> Lets use plain langauge please:
>
> DSA HW cannot do DMA from its RID, so always requires a PASID, even
> for kernel controlled DMA.
>
> To allow it to use the DMA API we must associate a PASID with the
> iommu_domain that the DMA API is already using for the device's RID.
>
> This way DMA tagged with the PASID will be treated exactly the same as
> DMA originating from the RID.
>
Exactly, will incorporate in the next version.
> > DMA mapping API is the de facto standard for in-kernel DMA. However, it
> > operates on a per device/RID basis which is not PASID-aware.
> >
> > This patch introduces the following driver facing API that enables DMA
> > API PASID usage: ioasid_t iommu_enable_pasid_dma(struct device *dev);
>
> This is the wrong API, IMHO
>
> It should be more like
>
> int iommu_get_dma_api_pasid(struct device *dev, ioasid_t *pasid);
This works. I had ioasid_t *pasid in my previous version, but I was
thinking we could simplify the interface since we have the reserved
INVALID_IOASID for the return status.
But it seems to me that _get_ does not convey that this API actually
enables/attaches the kernel DMA PASID. Perhaps call it
iommu_attach_dma_api_pasid(), in line with the ops function you suggested?
> void iommu_destroy_dma_api_pasid(struct device *dev);
>
Sounds good
> > A PASID field is added to struct device for the purposes of storing
> > kernel DMA PASID and flushing device IOTLBs. A separate use case in
> > interrupt
>
> And this really should not be touching the struct device at all.
>
I was thinking the RID is per device and this PASID maps 1:1 to the RID.
We could put it in pci_dev, but there are non-PCI devices that use
SSID/PASID.
> At worst the PASID should be stored in the iommu_group.
>
This also makes sense; the default domain is stored per group. To support
multiple devices per group, we would still need a per-device flag for the
devTLB flush, right?
I.e., while doing an IOVA unmap, IOTLBs are flushed for all devices, but
we only need to flush the device TLBs of devices that have the kernel DMA
PASID.
> > message store (IMS) also hinted adding a PASID field to struct device.
> > https://lore.kernel.org/all/[email protected]/
> > IMS virtualization and DMA API does not overlap.
>
> This is under debate, I'm skeptical it will happen considering the new
> direction for this work.
>
Good to know, thanks.
> > Once enabled, device drivers can continue to use DMA APIs as-is. There
> > is no difference in terms of mapping in dma_handle between without
> > PASID and with PASID. The DMA mapping performed by IOMMU will be
> > identical for both requests with and without PASID (legacy), let it be
> > IOVA or PA in case of pass-through.
>
> In other words all this does is connect the PASID to the normal
> DMA-API owned iommu_domain.
>
Exactly! Will incorporate.
Thanks,
Jacob
Hi Jacob,
I love your patch! Yet something to improve:
[auto build test ERROR on joro-iommu/next]
[also build test ERROR on vkoul-dmaengine/next linux/master linus/master v5.16-rc4 next-20211208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Jacob-Pan/Enable-PASID-for-DMA-API-users/20211208-063637
base: https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git next
config: x86_64-randconfig-a013-20211208 (https://download.01.org/0day-ci/archive/20211209/[email protected]/config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/018108bcd08fc145526791870b4b58edeecfca1e
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Jacob-Pan/Enable-PASID-for-DMA-API-users/20211208-063637
git checkout 018108bcd08fc145526791870b4b58edeecfca1e
# save the config file to linux build tree
mkdir build_dir
make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/dma/idxd/
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
drivers/dma/idxd/init.c: In function 'idxd_enable_system_pasid':
>> drivers/dma/idxd/init.c:532:10: error: implicit declaration of function 'iommu_enable_pasid_dma'; did you mean 'iommu_enable_nesting'? [-Werror=implicit-function-declaration]
532 | pasid = iommu_enable_pasid_dma(&idxd->pdev->dev);
| ^~~~~~~~~~~~~~~~~~~~~~
| iommu_enable_nesting
drivers/dma/idxd/init.c: In function 'idxd_disable_system_pasid':
>> drivers/dma/idxd/init.c:544:2: error: implicit declaration of function 'iommu_disable_pasid_dma' [-Werror=implicit-function-declaration]
544 | iommu_disable_pasid_dma(&idxd->pdev->dev);
| ^~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +532 drivers/dma/idxd/init.c
527
528 static int idxd_enable_system_pasid(struct idxd_device *idxd)
529 {
530 u32 pasid;
531
> 532 pasid = iommu_enable_pasid_dma(&idxd->pdev->dev);
533 if (pasid == INVALID_IOASID) {
534 dev_err(&idxd->pdev->dev, "No kernel DMA PASID\n");
535 return -ENODEV;
536 }
537 idxd->pasid = pasid;
538
539 return 0;
540 }
541
542 static void idxd_disable_system_pasid(struct idxd_device *idxd)
543 {
> 544 iommu_disable_pasid_dma(&idxd->pdev->dev);
545 idxd->pasid = 0;
546 }
547
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
Hi Lu,
On Wed, 8 Dec 2021 10:31:36 +0800, Lu Baolu <[email protected]>
wrote:
> Hi Jacob,
>
> On 12/7/21 9:47 PM, Jacob Pan wrote:
> > DMA mapping API is the de facto standard for in-kernel DMA. It operates
> > on a per device/RID basis which is not PASID-aware.
> >
> > Some modern devices such as Intel Data Streaming Accelerator, PASID is
> > required for certain work submissions. To allow such devices use DMA
> > mapping API, we need the following functionalities:
> > 1. Provide device a way to retrieve a kernel PASID for work submission
> > 2. Enable the kernel PASID on the IOMMU
> > 3. Establish address space for the kernel PASID that matches the default
> > domain. Let it be IOVA or physical address in case of pass-through.
> >
> > This patch introduces a driver facing API that enables DMA API
> > PASID usage. Once enabled, device drivers can continue to use DMA APIs
> > as is. There is no difference in dma_handle between without PASID and
> > with PASID.
>
> Can a device issue DMA requests with PASID even there's no system IOMMU
> or the system IOMMU is disabled?
>
Good point.
If the IOMMU is not enabled, the device cannot issue DMA requests with
PASID, and this API will not be available. I forgot to add dummy functions
to the header.
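Something along these lines in dma-iommu.h (a sketch of what the stubs
could look like):

	#else /* !CONFIG_IOMMU_DMA */

	static inline ioasid_t iommu_enable_pasid_dma(struct device *dev)
	{
		return INVALID_IOASID;
	}

	static inline int iommu_disable_pasid_dma(struct device *dev)
	{
		return -ENODEV;
	}

	#endif /* CONFIG_IOMMU_DMA */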
> Best regards,
> baolu
Thanks,
Jacob
Hi Jason,
On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe <[email protected]> wrote:
> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
> > Between DMA requests with and without PASID (legacy), DMA mapping APIs
> > are used indiscriminately on a device. Therefore, we should always match
> > the addressing mode of the legacy DMA when enabling kernel PASID.
> >
> > This patch adds support for VT-d driver where the kernel PASID is
> > programmed to match RIDPASID. i.e. if the device is in pass-through, the
> > kernel PASID is also in pass-through; if the device is in IOVA mode, the
> > kernel PASID will also be using the same IOVA space.
> >
> > There is additional handling for IOTLB and device TLB flush w.r.t. the
> > kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> > per-domain basis; whereas device TLB flush is per device. Note that
> > IOTLBs are used even when devices are in pass-through mode. ATS is
> > enabled device-wide, but the device drivers can choose to manage ATS at
> > per PASID level whenever control is available.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > drivers/iommu/intel/iommu.c | 105 +++++++++++++++++++++++++++++++++++-
> > drivers/iommu/intel/pasid.c | 7 +++
> > include/linux/intel-iommu.h | 3 +-
> > 3 files changed, 113 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 60253bc436bb..a2ef6b9e4bfc 100644
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> > intel_iommu *iommu, if (domain->default_pasid)
> > qi_flush_piotlb(iommu, did, domain->default_pasid,
> > addr, npages, ih);
> > -
> > + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> > + /*
> > + * REVISIT: we only do PASID IOTLB inval for FL, we
> > could have SL
> > + * for PASID in the future such as vIOMMU PT. this
> > doesn't get hit.
> > + */
> > + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> > + addr, npages, ih);
> > + }
> > if (!list_empty(&domain->devices))
> > qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
> > npages, ih); }
> > @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
> > iommu_domain *domain, }
> > }
> >
> > +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> > +{
>
> This seems like completely the wrong kind of op.
>
> At the level of the iommu driver things should be iommu_domain centric
>
> The op should be
>
> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> ioasid_t pasid)
>
> Where 'dev' purpose is to provide the RID
>
> The iommu_domain passed in should be the 'default domain' ie the table
> used for on-demand mapping, or the passthrough page table.
>
Makes sense. The DMA API is device-centric and the IOMMU API is
domain-centric. It should be the common IOMMU code that gets the default
domain and then passes it to the vendor drivers. Then we can enforce the
default domain behavior across all vendor drivers,
i.e.
dom = iommu_get_dma_domain(dev);
attach_dev_pasid(dom, dev, pasid);
> > + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > + struct device_domain_info *info;
>
> I don't even want to know why an iommu driver is tracking its own
> per-device state. That seems like completely wrong layering.
>
This is for the IOTLB and devTLB flush. The IOTLB is flushed at the
per-domain level; the devTLB is per device.
For multi-device groups, there is a need to track how many devices are
using the kernel DMA PASID.
Are you suggesting we add the tracking info in the generic layer, i.e.
iommu_group?
We could also have a generic device domain info to replace what is in the
VT-d and FSL IOMMU drivers, etc.
Thanks,
Jacob
Hi Jason,
On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe <[email protected]> wrote:
> > This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform
> > DMA requests with PASID under the same mapping managed by DMA mapping
> > API. In addition, SVA-related bits for kernel DMA are removed. As a
> > result, DSA users shall use DMA mapping API to obtain DMA handles
> > instead of using kernel virtual addresses.
>
> Er, shouldn't this be adding dma_map/etc type calls?
>
> You can't really say a driver is using the DMA API without actually
> calling the DMA API..
The IDXD driver is not aware of the addressing mode; it is up to the user
of the dmaengine API to prepare the buffer mappings. Here we only set up
the PASID such that it can be picked up during DMA work submission. I
tested with drivers/dma/dmatest.c, which does dma_map_page(),
dma_map_single(), etc., and also tested with other pieces under
development.
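For reference, a typical dmaengine client does something like the
following before submission (a minimal sketch; the channel name and error
handling are illustrative):

	struct dma_async_tx_descriptor *tx;
	struct dma_chan *chan;
	dma_addr_t src, dst;

	chan = dma_request_chan(dev, "memcpy");
	src = dma_map_single(dev, src_buf, len, DMA_TO_DEVICE);
	dst = dma_map_single(dev, dst_buf, len, DMA_FROM_DEVICE);

	tx = dmaengine_prep_dma_memcpy(chan, dst, src, len, DMA_PREP_INTERRUPT);
	dmaengine_submit(tx);
	dma_async_issue_pending(chan);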
Thanks,
Jacob
On Wed, Dec 08, 2021 at 11:55:16AM -0800, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe <[email protected]> wrote:
>
> > > This patch utilizes iommu_enable_pasid_dma() to enable DSA to perform
> > > DMA requests with PASID under the same mapping managed by DMA mapping
> > > API. In addition, SVA-related bits for kernel DMA are removed. As a
> > > result, DSA users shall use DMA mapping API to obtain DMA handles
> > > instead of using kernel virtual addresses.
> >
> > Er, shouldn't this be adding dma_map/etc type calls?
> >
> > You can't really say a driver is using the DMA API without actually
> > calling the DMA API..
> The IDXD driver is not aware of addressing mode, it is up to the user of
> dmaengine API to prepare the buffer mappings. Here we only set up the PASID
> such that it can be picked up during DMA work submission. I tested with
> /drivers/dma/dmatest.c which does dma_map_page(), map_single etc. also
> tested with other pieces under development.
Ignoring the work, doesn't IDXD prepare the DMA queues itself, don't
those need the DMA API?
I'm still very confused how this can radically change from using kSVA
to DMA API and NOT introduce some more changes than this. They are not
the same thing, they do not use the same IOVAs. Did you test this
with bypass mode off?
Jason
Hi Jason,
On Wed, 8 Dec 2021 16:30:22 -0400, Jason Gunthorpe <[email protected]> wrote:
> On Wed, Dec 08, 2021 at 11:55:16AM -0800, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe <[email protected]>
> > wrote:
> > > > This patch utilizes iommu_enable_pasid_dma() to enable DSA to
> > > > perform DMA requests with PASID under the same mapping managed by
> > > > DMA mapping API. In addition, SVA-related bits for kernel DMA are
> > > > removed. As a result, DSA users shall use DMA mapping API to obtain
> > > > DMA handles instead of using kernel virtual addresses.
> > >
> > > Er, shouldn't this be adding dma_map/etc type calls?
> > >
> > > You can't really say a driver is using the DMA API without actually
> > > calling the DMA API..
> > The IDXD driver is not aware of addressing mode, it is up to the user of
> > dmaengine API to prepare the buffer mappings. Here we only set up the
> > PASID such that it can be picked up during DMA work submission. I
> > tested with /drivers/dma/dmatest.c which does dma_map_page(),
> > map_single etc. also tested with other pieces under development.
>
> Ignoring the work, doesn't IDXD prepare the DMA queues itself, don't
> those need the DMA API?
>
Do you mean the wq completion record address? It is already using the DMA API:
	wq->compls = dma_alloc_coherent(dev, wq->compls_size,
					&wq->compls_addr, GFP_KERNEL);
	desc->compl_dma = wq->compls_addr + idxd->data->compl_size * i;
> I'm still very confused how this can radically change from using kSVA
> to DMA API and NOT introduce some more changes than this. They are not
I am guessing the confusion comes from the fact that the user of kSVA was
never merged. We were in the process of upstreaming it but then abandoned it.
Perhaps that is why you don't see kSVA code being removed?
> the same thing, they do not use the same IOVA's. Did you test this
> with bypass mode off?
Yes, with dmatest. IOVA is the default; I separated out the SATC patch which
will put internal accelerators in bypass mode. It can also be verified by an
iommu debugfs dump showing that the DMA PASID (2) and PASID 0 (RIDPASID) point
to the same default domain. e.g.
PASID PASID_table_entry
0 0x0000000119ed7004:0x0000000000800002:0x000000000000004d
1 0x0000000000000001:0x0000000000800001:0x000000000000010d
2 0x0000000119ed7004:0x0000000000800002:0x000000000000004d
>
> Jason
Thanks,
Jacob
On Wed, Dec 08, 2021 at 01:59:45PM -0800, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 8 Dec 2021 16:30:22 -0400, Jason Gunthorpe <[email protected]> wrote:
>
> > On Wed, Dec 08, 2021 at 11:55:16AM -0800, Jacob Pan wrote:
> > > Hi Jason,
> > >
> > > On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe <[email protected]>
> > > wrote:
> > > > > This patch utilizes iommu_enable_pasid_dma() to enable DSA to
> > > > > perform DMA requests with PASID under the same mapping managed by
> > > > > DMA mapping API. In addition, SVA-related bits for kernel DMA are
> > > > > removed. As a result, DSA users shall use DMA mapping API to obtain
> > > > > DMA handles instead of using kernel virtual addresses.
> > > >
> > > > Er, shouldn't this be adding dma_map/etc type calls?
> > > >
> > > > You can't really say a driver is using the DMA API without actually
> > > > calling the DMA API..
> > > The IDXD driver is not aware of addressing mode, it is up to the user of
> > > dmaengine API to prepare the buffer mappings. Here we only set up the
> > > PASID such that it can be picked up during DMA work submission. I
> > > tested with /drivers/dma/dmatest.c which does dma_map_page(),
> > > map_single etc. also tested with other pieces under development.
> >
> > Ignoring the work, doesn't IDXD prepare the DMA queues itself, don't
> > those need the DMA API?
> >
> Do you mean wq completion record address? It is already using DMA API.
> wq->compls = dma_alloc_coherent(dev, wq->compls_size,
> &wq->compls_addr, GFP_KERNEL);
> desc->compl_dma = wq->compls_addr + idxd->data->compl_size * i;
I would have expected something on the queue submission side too?
> > the same thing, they do not use the same IOVA's. Did you test this
> > with bypass mode off?
> Yes with dmatest. IOVA is the default, I separated out the SATC patch which
> will put internal accelerators in bypass mode. It can also be verified by
> iommu debugfs dump of DMA PASID (2) and PASID 0 (RIDPASID) are pointing to
> the same default domain. e.g.
Well, OK then..
Jason
On 12/8/2021 4:39 PM, Jason Gunthorpe wrote:
> On Wed, Dec 08, 2021 at 01:59:45PM -0800, Jacob Pan wrote:
>> Hi Jason,
>>
>> On Wed, 8 Dec 2021 16:30:22 -0400, Jason Gunthorpe <[email protected]> wrote:
>>
>>> On Wed, Dec 08, 2021 at 11:55:16AM -0800, Jacob Pan wrote:
>>>> Hi Jason,
>>>>
>>>> On Wed, 8 Dec 2021 09:13:58 -0400, Jason Gunthorpe <[email protected]>
>>>> wrote:
>>>>>> This patch utilizes iommu_enable_pasid_dma() to enable DSA to
>>>>>> perform DMA requests with PASID under the same mapping managed by
>>>>>> DMA mapping API. In addition, SVA-related bits for kernel DMA are
>>>>>> removed. As a result, DSA users shall use DMA mapping API to obtain
>>>>>> DMA handles instead of using kernel virtual addresses.
>>>>> Er, shouldn't this be adding dma_map/etc type calls?
>>>>>
>>>>> You can't really say a driver is using the DMA API without actually
>>>>> calling the DMA API..
>>>> The IDXD driver is not aware of addressing mode, it is up to the user of
>>>> dmaengine API to prepare the buffer mappings. Here we only set up the
>>>> PASID such that it can be picked up during DMA work submission. I
>>>> tested with /drivers/dma/dmatest.c which does dma_map_page(),
>>>> map_single etc. also tested with other pieces under development.
>>> Ignoring the work, doesn't IDXD prepare the DMA queues itself, don't
>>> those need the DMA API?
>>>
>> Do you mean wq completion record address? It is already using DMA API.
>> wq->compls = dma_alloc_coherent(dev, wq->compls_size,
>> &wq->compls_addr, GFP_KERNEL);
>> desc->compl_dma = wq->compls_addr + idxd->data->compl_size * i;
> I would have expected something on the queue submission side too?
DSA is different from typical DMA devices of the past. Instead of a
software descriptor ring where the device DMAs to fetch the descriptors
after software rings a doorbell or writes a head index, the
descriptors are submitted directly to the device via a CPU instruction
(i.e. MOVDIR64B or ENQCMD(S)). The CPU takes the KVA of the 64B
descriptor and writes it to the device atomically. No DMA mapping is
necessary in this case.
>
>>> the same thing, they do not use the same IOVA's. Did you test this
>>> with bypass mode off?
>> Yes with dmatest. IOVA is the default, I separated out the SATC patch which
>> will put internal accelerators in bypass mode. It can also be verified by
>> iommu debugfs dump of DMA PASID (2) and PASID 0 (RIDPASID) are pointing to
>> the same default domain. e.g.
> Well, OK then..
>
> Jason
> From: Jason Gunthorpe <[email protected]>
> Sent: Thursday, December 9, 2021 1:51 AM
>
> > > > + /*
> > > > + * Try to enable both in-kernel and user DMA request with PASID.
> > > > + * PASID is supported only when both user and kernel PASID are
> > > > + * supported. Do not fail probe here since idxd can still be
> > > > + * used w/o PASID or IOMMU.
> > > > + */
> > > > + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> > > > + idxd_enable_system_pasid(idxd)) {
> > > > + dev_warn(dev, "Failed to enable PASID\n");
> > > > + } else {
> > > > + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> > > > }
> > > Huh? How can the driver keep going if PASID isn't supported? I thought
> > > the whole point of this was because the device cannot do DMA without
> > > PASID at all?
> >
> > There are 2 types of WQ supported with the DSA devices: a dedicated WQ
> > type and a shared WQ type. The dedicated WQ type can support DMA with
> > and without PASID. The shared WQ type must have a PASID to operate.
> > The driver can support dedicated WQs only, without PASID usage, when
> > there is no PASID support.
>
> Can you add to the cover letter why does the kernel require to use the
> shared WQ?
>
> Jason
Two reasons:
On native, the shared WQ is useful when the kernel wants to offload
some memory operations (e.g. page-zeroing) to DSA. When there are more
CPUs than WQs, this allows per-cpu lock-less submissions using the
ENQCMD(PASID, payload) instruction.
In a guest, the virtual DSA HW may only contain a WQ in shared mode
(unchangeable by the guest) when the host admin wants to share
the limited WQ resource among many VMs. Then there is no choice in the
guest, regardless of whether it's for user- or kernel-controlled DMA.
Thanks
Kevin
> From: Jacob Pan <[email protected]>
> Sent: Thursday, December 9, 2021 2:50 AM
>
> > Can a device issue DMA requests with PASID even there's no system
> IOMMU
> > or the system IOMMU is disabled?
> >
> Good point.
> If the IOMMU is not enabled, the device cannot issue DMA requests with PASID.
> This API will not be available. I forgot to add dummy functions to the header.
>
PASID is a PCI thing, not defined by the IOMMU.
I think the key question is: physically, if the IOMMU is disabled, how will
the root complex handle a PCI memory request that includes a PASID TLP prefix?
Does it block such a request because there is no IOMMU to consume the PASID,
or does it simply ignore the PASID and continue routing the request to the
memory controller?
If it blocks, then having an iommu interface makes sense.
If it ignores, a DMA API call possibly makes more sense instead, implying that
this extension can be used even when the iommu is disabled.
I think that is what Baolu wants to point out.
Thanks
Kevin
> From: Jiang, Dave <[email protected]>
> Sent: Thursday, December 9, 2021 8:12 AM
> >>>
> >> Do you mean wq completion record address? It is already using DMA API.
> >> wq->compls = dma_alloc_coherent(dev, wq->compls_size,
> >> &wq->compls_addr, GFP_KERNEL);
> >> desc->compl_dma = wq->compls_addr + idxd->data->compl_size * i;
> > I would have expected something on the queue submission side too?
>
> DSA is different than typical DMA devices in the past. Instead of a
> software descriptor ring where the device DMA to fetch the descriptors
> after the software ringing a doorbell or writing a head index, the
> descriptors are submitted directly to the device via a CPU instruction
> (i.e. MOVDIR64B or ENQCMD(S)). The CPU takes the KVA of the 64B
> descriptor and writes to the device atomically. No DMA mapping is
> necessary in this case.
>
To be accurate, the CPU reads the 64B descriptor from KVA and
then writes the descriptor to the device atomically.
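A sketch of that submission step, using the kernel's movdir64b()/enqcmds()
helpers on x86 (the portal pointer and descriptor handling are simplified
assumptions):

static int submit_desc(void __iomem *portal, const void *desc /* 64B KVA */,
		       bool shared_wq)
{
	/* The descriptor itself is copied by the CPU, so it needs no DMA
	 * mapping; only the buffer addresses inside it are DMA handles.
	 */
	if (shared_wq)
		return enqcmds(portal, desc);	/* PASID rides in the descriptor */

	movdir64b(portal, desc);		/* dedicated WQ: posted 64B write */
	return 0;
}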
On 12/9/21 9:56 AM, Tian, Kevin wrote:
>> From: Jacob Pan <[email protected]>
>> Sent: Thursday, December 9, 2021 2:50 AM
>>
>>> Can a device issue DMA requests with PASID even there's no system
>> IOMMU
>>> or the system IOMMU is disabled?
>>>
>> Good point.
>> If IOMMU is not enabled, device cannot issue DMA requests with PASID. This
>> API will not be available. Forgot to add dummy functions to the header.
>>
>
> PASID is a PCI thing, not defined by IOMMU.
>
> I think the key is physically if IOMMU is disabled, how will root complex
> handle a PCI memory request including a PASID TLP prefix? Does it block
> such request due to no IOMMU to consume PASID or simply ignore PASID
> and continue routing the request to the memory controller?
>
> If block, then having an iommu interface makes sense.
>
> If ignore, possibly a DMA API call makes more sense instead, implying that
> this extension can be used even when iommu is disabled.
>
> I think that is what Baolu wants to point out.
Yes, exactly. Imagine the VM guest environment: do we require a
vIOMMU for this functionality? A vIOMMU is not performance friendly if we
put aside the security considerations.
Best regards,
baolu
On 12/9/21 3:16 AM, Jacob Pan wrote:
> Hi Jason,
>
> On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe <[email protected]> wrote:
>
>> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
>>> Between DMA requests with and without PASID (legacy), DMA mapping APIs
>>> are used indiscriminately on a device. Therefore, we should always match
>>> the addressing mode of the legacy DMA when enabling kernel PASID.
>>>
>>> This patch adds support for VT-d driver where the kernel PASID is
>>> programmed to match RIDPASID. i.e. if the device is in pass-through, the
>>> kernel PASID is also in pass-through; if the device is in IOVA mode, the
>>> kernel PASID will also be using the same IOVA space.
>>>
>>> There is additional handling for IOTLB and device TLB flush w.r.t. the
>>> kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
>>> per-domain basis; whereas device TLB flush is per device. Note that
>>> IOTLBs are used even when devices are in pass-through mode. ATS is
>>> enabled device-wide, but the device drivers can choose to manage ATS at
>>> per PASID level whenever control is available.
>>>
>>> Signed-off-by: Jacob Pan <[email protected]>
>>> drivers/iommu/intel/iommu.c | 105 +++++++++++++++++++++++++++++++++++-
>>> drivers/iommu/intel/pasid.c | 7 +++
>>> include/linux/intel-iommu.h | 3 +-
>>> 3 files changed, 113 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>> index 60253bc436bb..a2ef6b9e4bfc 100644
>>> +++ b/drivers/iommu/intel/iommu.c
>>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
>>> intel_iommu *iommu, if (domain->default_pasid)
>>> qi_flush_piotlb(iommu, did, domain->default_pasid,
>>> addr, npages, ih);
>>> -
>>> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
>>> + /*
>>> + * REVISIT: we only do PASID IOTLB inval for FL, we
>>> could have SL
>>> + * for PASID in the future such as vIOMMU PT. this
>>> doesn't get hit.
>>> + */
>>> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
>>> + addr, npages, ih);
>>> + }
>>> if (!list_empty(&domain->devices))
>>> qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
>>> npages, ih); }
>>> @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
>>> iommu_domain *domain, }
>>> }
>>>
>>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
>>> +{
>>
>> This seems like completely the wrong kind of op.
>>
>> At the level of the iommu driver things should be iommu_domain centric
>>
>> The op should be
>>
>> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
>> ioasid_t pasid)
>>
>> Where 'dev' purpose is to provide the RID
>>
>> The iommu_domain passed in should be the 'default domain' ie the table
>> used for on-demand mapping, or the passthrough page table.
>>
> Makes sense. DMA API is device centric, iommu API is domain centric. It
> should be the common IOMMU code to get the default domain then pass to
> vendor drivers. Then we can enforce default domain behavior across all
> vendor drivers.
> i.e.
> dom = iommu_get_dma_domain(dev);
> attach_dev_pasid(dom, dev, pasid);
>
>>> + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>>> + struct device_domain_info *info;
>>
>> I don't even want to know why an iommu driver is tracking its own
>> per-device state. That seems like completely wrong layering.
>>
> This is for IOTLB and deTLB flush. IOTLB is flushed at per domain level,
> devTLB is per device.
>
> For multi-device groups, this is a need to track how many devices are using
> the kernel DMA PASID.
>
> Are you suggesting we add the tracking info in the generic layer? i.e.
> iommu_group.
>
> We could also have a generic device domain info to replace what is in VT-d
> and FSL IOMMU driver, etc.
The store place of per-device iommu driver private data has already been
standardized. The iommu core provides below interfaces for this purpose:
void dev_iommu_priv_set(struct device *dev, void *priv);
void *dev_iommu_priv_get(struct device *dev);
If we have anything generic among different vendor iommu drivers,
perhaps we could move them into dev->iommu.
Best regards,
baolu
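For illustration, a vendor driver could hang its per-device PASID bookkeeping
off those helpers roughly like this (the info structure and its fields are a
made-up example, not existing code):

struct vendor_dev_info {
	u32	kernel_pasid;		/* PASID used for in-kernel DMA */
	bool	pasid_dma_enabled;	/* devTLB flush needs this hint */
};

static int vendor_probe_device(struct device *dev)
{
	struct vendor_dev_info *info = kzalloc(sizeof(*info), GFP_KERNEL);

	if (!info)
		return -ENOMEM;
	dev_iommu_priv_set(dev, info);
	return 0;
}

static bool vendor_needs_pasid_devtlb_flush(struct device *dev)
{
	struct vendor_dev_info *info = dev_iommu_priv_get(dev);

	return info && info->pasid_dma_enabled;
}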
Hi Jacob,
On Tue, Dec 07, 2021 at 05:47:11AM -0800, Jacob Pan wrote:
> In-kernel DMA is managed by DMA mapping APIs, which supports per device
> addressing mode for legacy DMA requests. With the introduction of
> Process Address Space ID (PASID), device DMA can now target at a finer
> granularity per PASID + Requester ID (RID).
>
> However, for in-kernel DMA there is no need to differentiate between
> legacy DMA and DMA with PASID in terms of mapping. DMA address mapping
> for RID+PASID can be made identical to the RID. The benefit for the
> drivers is the continuation of DMA mapping APIs without change.
>
> This patch reserves a special IOASID for devices that perform in-kernel
> DMA requests with PASID. This global IOASID is excluded from the
> IOASID allocator. The analogous case is PASID #0, a special PASID
> reserved for DMA requests without PASID (legacy). We could have different
> kernel PASIDs for individual devices, but for simplicity reasons, a
> globally reserved one will fit the bill.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
> drivers/iommu/intel/iommu.c | 4 ++--
> drivers/iommu/intel/pasid.h | 3 +--
> drivers/iommu/intel/svm.c | 2 +-
> drivers/iommu/ioasid.c | 2 ++
> include/linux/ioasid.h | 4 ++++
> 6 files changed, 11 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index ee66d1f4cb81..ac79a37ffe06 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -329,7 +329,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
> return ERR_PTR(-ENOMEM);
>
> /* Allocate a PASID for this mm if necessary */
> - ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
> + ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U << master->ssid_bits) - 1);
I'd rather keep hardware limits as parameters here. PASID#0 is reserved by
the SMMUv3 hardware so we have to pass at least 1 here, but VT-d could
change RID_PASID and pass 0. On the other hand, IOASID_DMA_PASID depends on
device drivers' needs and is not needed on all systems, so I think it could
stay within the ioasid allocator. Could VT-d do an ioasid_alloc()/ioasid_get()
to reserve this global PASID, storing it under the device_domain_lock?
This looks like we're just one step away from device drivers needing
multiple PASIDs for kernel DMA so I'm trying to figure out how to evolve
the API towards that. It's probably as simple as keeping a kernel IOASID
set at first, but then we'll probably want to optimize by having multiple
overlapping sets for each device driver (all separate from the SVA set).
Thanks,
Jean
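Roughly, the reservation Jean suggests might look like this on the VT-d side
(a sketch only: PASID_MAX stands for the driver's upper bound, and the
device_domain_lock handling is elided):

static DECLARE_IOASID_SET(kernel_dma_set);
static ioasid_t kernel_dma_pasid = INVALID_IOASID;

static ioasid_t intel_get_kernel_dma_pasid(void)
{
	/* The ioasid allocator is global, so a value allocated here can
	 * never collide with a PASID handed out for user SVA.
	 */
	if (kernel_dma_pasid == INVALID_IOASID)
		kernel_dma_pasid = ioasid_alloc(&kernel_dma_set, 1,
						PASID_MAX - 1, NULL);
	else
		ioasid_get(kernel_dma_pasid);

	return kernel_dma_pasid;
}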
Hi Lu,
On Thu, 9 Dec 2021 10:21:38 +0800, Lu Baolu <[email protected]>
wrote:
> On 12/9/21 9:56 AM, Tian, Kevin wrote:
> >> From: Jacob Pan <[email protected]>
> >> Sent: Thursday, December 9, 2021 2:50 AM
> >>
> >>> Can a device issue DMA requests with PASID even there's no system
> >> IOMMU
> >>> or the system IOMMU is disabled?
> >>>
> >> Good point.
> >> If IOMMU is not enabled, device cannot issue DMA requests with PASID.
> >> This API will not be available. Forgot to add dummy functions to the
> >> header.
> >
> > PASID is a PCI thing, not defined by IOMMU.
> >
> > I think the key is physically if IOMMU is disabled, how will root
> > complex handle a PCI memory request including a PASID TLP prefix? Does
> > it block such request due to no IOMMU to consume PASID or simply ignore
> > PASID and continue routing the request to the memory controller?
> >
> > If block, then having an iommu interface makes sense.
> >
> > If ignore, possibly a DMA API call makes more sense instead, implying
> > that this extension can be used even when iommu is disabled.
> >
> > I think that is what Baolu wants to point out.
>
Thanks for clarifying, very good point.
Looking at the PCIe spec, I don't see specific rules for an RC to ignore or
block PASID TLPs when not enabled.
"- A Root Complex that supports PASID TLP Prefixes must have a device
specific mechanism for enabling them. By default usage of PASID TLP
Prefixes is disabled
- Root Complexes may optionally support TLPs with PASID TLP Prefixes. The
mechanism used to detect whether a Root Complex supports the PASID TLP
Prefix is implementation specific
"
For all practical purposes, why would someone set up PASID for DMA just to
have it ignored? An IOMMU interface makes sense to me.
> Yes, exactly. Imagining in the VM guest environment, do we require a
> vIOMMU for this functionality? vIOMMU is not performance friendly if we
> put aside the security considerations.
>
The primary use case for accelerators to use in-kernel DMA will be in
pass-through mode. vIOMMU should be able to do PT with good performance,
right? No nesting, no I/O page faults.
> Best regards,
> baolu
Thanks,
Jacob
On Thu, Dec 09, 2021 at 08:32:49AM -0800, Jacob Pan wrote:
> Hi Lu,
>
> On Thu, 9 Dec 2021 10:21:38 +0800, Lu Baolu <[email protected]>
> wrote:
>
> > On 12/9/21 9:56 AM, Tian, Kevin wrote:
> > >> From: Jacob Pan <[email protected]>
> > >> Sent: Thursday, December 9, 2021 2:50 AM
> > >>
> > >>> Can a device issue DMA requests with PASID even there's no system
> > >> IOMMU
> > >>> or the system IOMMU is disabled?
> > >>>
> > >> Good point.
> > >> If IOMMU is not enabled, device cannot issue DMA requests with PASID.
> > >> This API will not be available. Forgot to add dummy functions to the
> > >> header.
> > >
> > > PASID is a PCI thing, not defined by IOMMU.
True, but the RP is just a forwarding agent for these end-to-end TLP
prefixes. I'm not sure how RPs will behave if they receive a TLP they don't
understand, and I suspect they might pull the system down via some UR-type
response when the IOMMU isn't enabled.
> > >
> > > I think the key is physically if IOMMU is disabled, how will root
> > > complex handle a PCI memory request including a PASID TLP prefix? Does
> > > it block such request due to no IOMMU to consume PASID or simply ignore
> > > PASID and continue routing the request to the memory controller?
> > >
> > > If block, then having an iommu interface makes sense.
> > >
> > > If ignore, possibly a DMA API call makes more sense instead, implying
> > > that this extension can be used even when iommu is disabled.
> > >
> > > I think that is what Baolu wants to point out.
> >
> Thanks for clarifying, very good point.
> Looking at the PCIe spec. I don't see specific rules for RC to ignore or
> block PASID TLP if not enabled.
> "- A Root Complex that supports PASID TLP Prefixes must have a device
> specific mechanism for enabling them. By default usage of PASID TLP
> Prefixes is disabled
> - Root Complexes may optionally support TLPs with PASID TLP Prefixes. The
> mechanism used to detect whether a Root Complex supports the PASID TLP
> Prefix is implementation specific
Isn't the implementation-specific mechanism the IOMMU?
> "
> For all practical purposes, why would someone set up PASID for DMA just to
> be ignored? An IOMMU interface makes sense to me.
>
> > Yes, exactly. Imagining in the VM guest environment, do we require a
> > vIOMMU for this functionality? vIOMMU is not performance friendly if we
> > put aside the security considerations.
> >
> The primary use case for accelerators to use in-kernel DMA will be in
> pass-through mode. vIOMMU should be able to do PT with good performance,
> right? no nesting, IO page faults.
But from an enabling perspective, when PASID is in use we have to mandate
either the presence of an IOMMU, or some hypercall that will do the
required plumbing for PASID, isn't it?
Hi Ashok,
On Thu, 9 Dec 2021 08:57:15 -0800, "Raj, Ashok" <[email protected]> wrote:
> > Prefixes is disabled
> > - Root Complexes may optionally support TLPs with PASID TLP Prefixes.
> > The mechanism used to detect whether a Root Complex supports the PASID
> > TLP Prefix is implementation specific
>
> Isn't the implementation-specific mechanism the IOMMU?
>
I agree. In the case of VT-d it would be the ecap.pasid bit.
> > "
> > For all practical purposes, why would someone set up PASID for DMA
> > just to be ignored? An IOMMU interface makes sense to me.
> >
> > > Yes, exactly. Imagining in the VM guest environment, do we require a
> > > vIOMMU for this functionality? vIOMMU is not performance friendly if
> > > we put aside the security considerations.
> > >
> > The primary use case for accelerators to use in-kernel DMA will be in
> > pass-through mode. vIOMMU should be able to do PT with good performance,
> > right? no nesting, IO page faults.
>
> But from an enabling perspective when PASID is in use we have to mandate
> either the presence of an IOMMU, or some hypercall that will do the
> required plumbing for PASID isn't it?
So the point is that we need either a vIOMMU or a virtio IOMMU to use PASID?
For the purpose of this discussion, deciding whether the iommu API or the DMA
API should be used, I am still convinced it should be the iommu API.
Unlike IOMMU on/off for the DMA API (which is transparent to the driver),
using PASID is not transparent: besides enabling the PASID, the driver has to
program the PASID explicitly. There is no point in doing this dance knowing
the PASID might be ignored.
Thanks,
Jacob
Hi Jean-Philippe,
On Thu, 9 Dec 2021 11:03:23 +0000, Jean-Philippe Brucker
<[email protected]> wrote:
> Hi Jacob,
>
> On Tue, Dec 07, 2021 at 05:47:11AM -0800, Jacob Pan wrote:
> > In-kernel DMA is managed by DMA mapping APIs, which supports per device
> > addressing mode for legacy DMA requests. With the introduction of
> > Process Address Space ID (PASID), device DMA can now target at a finer
> > granularity per PASID + Requester ID (RID).
> >
> > However, for in-kernel DMA there is no need to differentiate between
> > legacy DMA and DMA with PASID in terms of mapping. DMA address mapping
> > for RID+PASID can be made identical to the RID. The benefit for the
> > drivers is the continuation of DMA mapping APIs without change.
> >
> > This patch reserves a special IOASID for devices that perform in-kernel
> > DMA requests with PASID. This global IOASID is excluded from the
> > IOASID allocator. The analogous case is PASID #0, a special PASID
> > reserved for DMA requests without PASID (legacy). We could have
> > different kernel PASIDs for individual devices, but for simplicity
> > reasons, a globally reserved one will fit the bill.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
> > drivers/iommu/intel/iommu.c | 4 ++--
> > drivers/iommu/intel/pasid.h | 3 +--
> > drivers/iommu/intel/svm.c | 2 +-
> > drivers/iommu/ioasid.c | 2 ++
> > include/linux/ioasid.h | 4 ++++
> > 6 files changed, 11 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c index
> > ee66d1f4cb81..ac79a37ffe06 100644 ---
> > a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c +++
> > b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c @@ -329,7 +329,7 @@
> > __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm) return
> > ERR_PTR(-ENOMEM);
> > /* Allocate a PASID for this mm if necessary */
> > - ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) -
> > 1);
> > + ret = iommu_sva_alloc_pasid(mm, IOASID_ALLOC_BASE, (1U <<
> > master->ssid_bits) - 1);
>
> I'd rather keep hardware limits as parameters here. PASID#0 is reserved by
> the SMMUv3 hardware so we have to pass at least 1 here, but VT-d could
> change RID_PASID and pass 0. On the other hand IOASID_DMA_PASID depends on
> device drivers needs and is not needed on all systems, so I think could
> stay within the ioasid allocator. Could VT-d do an
> ioasid_alloc()/ioasid_get() to reserve this global PASID, storing it
> under the device_domain_lock?
>
Yes, this works. We can delegate DMA PASID allocation to vendor drivers. My
proposal here is driven by simplicity.
> This looks like we're just one step away from device drivers needing
> multiple PASIDs for kernel DMA so I'm trying to figure out how to evolve
> the API towards that. It's probably as simple as keeping a kernel IOASID
> set at first, but then we'll probably want to optimize by having multiple
> overlapping sets for each device driver (all separate from the SVA set).
Sounds reasonable to start with a kernel set for in-kernel DMA once we need
multiple ones. But I am not sure what *overlapping* sets means here; could
you explain?
>
Thanks,
Jacob
Hi Kevin,
On Thu, 9 Dec 2021 01:48:09 +0000, "Tian, Kevin" <[email protected]>
wrote:
> > From: Jason Gunthorpe <[email protected]>
> > Sent: Thursday, December 9, 2021 1:51 AM
> >
> > > > > + /*
> > > > > + * Try to enable both in-kernel and user DMA request
> > > > > with PASID.
> > > > > + * PASID is supported only when both user and kernel PASID are
> > > > > + * supported. Do not fail probe here since idxd can still be
> > > > > + * used w/o PASID or IOMMU.
> > > > > + */
> > > > > + if (iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) ||
> > > > > + idxd_enable_system_pasid(idxd)) {
> > > > > + dev_warn(dev, "Failed to enable PASID\n");
> > > > > + } else {
> > > > > + set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
> > > > > }
> > > > Huh? How can the driver keep going if PASID isn't supported? I
> > > > thought the whole point of this was because the device cannot do
> > > > DMA without PASID at all?
> > >
> > > There are 2 types of WQ supported with the DSA devices. A dedicated
> > > WQ
> > type
> > > and a shared WQ type. The dedicated WQ type can support DMA with and
> > without
> > > PASID. The shared wq type must have a PASID to operate. The driver can
> > > support dedicated WQ only without PASID usage when there is no PASID
> > > support.
> >
> > Can you add to the cover letter why does the kernel require to use the
> > shared WQ?
> >
> > Jason
>
> Two reasons:
>
> On native the shared WQ is useful when the kernel wants to offload
> some memory operations (e.g. page-zeroing) to DSA. When #CPUs are
> more than #WQs, this allows per-cpu lock-less submissions using
> ENQCMD(PASID, payload) instruction.
>
> In guest the virtual DSA HW may only contain a WQ in shared mode
> (unchangeable by the guest) when the host admin wants to share
> the limited WQ resource among many VMs. Then there is no choice
> in guest regardless whether it's for user or kernel controlled DMA.
I will add these to the next cover letter.
Thanks,
Jacob
Hi Lu,
On Thu, 9 Dec 2021 10:32:43 +0800, Lu Baolu <[email protected]>
wrote:
> On 12/9/21 3:16 AM, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe <[email protected]>
> > wrote:
> >> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
> >>> Between DMA requests with and without PASID (legacy), DMA mapping APIs
> >>> are used indiscriminately on a device. Therefore, we should always
> >>> match the addressing mode of the legacy DMA when enabling kernel
> >>> PASID.
> >>>
> >>> This patch adds support for VT-d driver where the kernel PASID is
> >>> programmed to match RIDPASID. i.e. if the device is in pass-through,
> >>> the kernel PASID is also in pass-through; if the device is in IOVA
> >>> mode, the kernel PASID will also be using the same IOVA space.
> >>>
> >>> There is additional handling for IOTLB and device TLB flush w.r.t. the
> >>> kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> >>> per-domain basis; whereas device TLB flush is per device. Note that
> >>> IOTLBs are used even when devices are in pass-through mode. ATS is
> >>> enabled device-wide, but the device drivers can choose to manage ATS
> >>> at per PASID level whenever control is available.
> >>>
> >>> Signed-off-by: Jacob Pan <[email protected]>
> >>> drivers/iommu/intel/iommu.c | 105
> >>> +++++++++++++++++++++++++++++++++++- drivers/iommu/intel/pasid.c |
> >>> 7 +++ include/linux/intel-iommu.h | 3 +-
> >>> 3 files changed, 113 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >>> index 60253bc436bb..a2ef6b9e4bfc 100644
> >>> +++ b/drivers/iommu/intel/iommu.c
> >>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> >>> intel_iommu *iommu, if (domain->default_pasid)
> >>> qi_flush_piotlb(iommu, did, domain->default_pasid,
> >>> addr, npages, ih);
> >>> -
> >>> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> >>> + /*
> >>> + * REVISIT: we only do PASID IOTLB inval for FL, we
> >>> could have SL
> >>> + * for PASID in the future such as vIOMMU PT. this
> >>> doesn't get hit.
> >>> + */
> >>> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> >>> + addr, npages, ih);
> >>> + }
> >>> if (!list_empty(&domain->devices))
> >>> qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
> >>> npages, ih); }
> >>> @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
> >>> iommu_domain *domain, }
> >>> }
> >>>
> >>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> >>> +{
> >>
> >> This seems like completely the wrong kind of op.
> >>
> >> At the level of the iommu driver things should be iommu_domain centric
> >>
> >> The op should be
> >>
> >> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> >> ioasid_t pasid)
> >>
> >> Where 'dev' purpose is to provide the RID
> >>
> >> The iommu_domain passed in should be the 'default domain' ie the table
> >> used for on-demand mapping, or the passthrough page table.
> >>
> > Makes sense. DMA API is device centric, iommu API is domain centric. It
> > should be the common IOMMU code to get the default domain then pass to
> > vendor drivers. Then we can enforce default domain behavior across all
> > vendor drivers.
> > i.e.
> > dom = iommu_get_dma_domain(dev);
> > attach_dev_pasid(dom, dev, pasid);
> >
> >>> + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> >>> + struct device_domain_info *info;
> >>
> >> I don't even want to know why an iommu driver is tracking its own
> >> per-device state. That seems like completely wrong layering.
> >>
> > This is for IOTLB and deTLB flush. IOTLB is flushed at per domain level,
> > devTLB is per device.
> >
> > For multi-device groups, this is a need to track how many devices are
> > using the kernel DMA PASID.
> >
> > Are you suggesting we add the tracking info in the generic layer? i.e.
> > iommu_group.
> >
> > We could also have a generic device domain info to replace what is in
> > VT-d and FSL IOMMU driver, etc.
>
> The store place of per-device iommu driver private data has already been
> standardized. The iommu core provides below interfaces for this purpose:
>
> void dev_iommu_priv_set(struct device *dev, void *priv);
> void *dev_iommu_priv_get(struct device *dev);
>
> If we have anything generic among different vendor iommu drivers,
> perhaps we could move them into dev->iommu.
>
Yes, good suggestion. DMA PASID should be a generic feature, not suitable
for the opaque private data. Can we agree on adding the following flag for
devTLB invalidation?
@@ -379,6 +379,7 @@ struct dev_iommu {
 	struct iommu_fwspec *fwspec;
 	struct iommu_device *iommu_dev;
 	void *priv;
+	u32 pasid_dma_enabled : 1;
 };
For DMA PASID storage, can we store it in the iommu_domain instead of
iommu_group? In the end, this PASID is only used for the default domain. It
will be easier to refcount how many attached devices are using the PASID.
Destroy the PASID when no devices in the group are using PASID DMA. IOTLB
flush is per domain also.
Jason, do you have guidance here?
> Best regards,
> baolu
Thanks,
Jacob
On Thu, Dec 09, 2021 at 03:21:13PM -0800, Jacob Pan wrote:
> For DMA PASID storage, can we store it in the iommu_domain instead of
> iommu_group?
It doesn't make sense to put it in the domain; the domain should be only
the page table and should not have any relation to how things are matched
to it.
Jason
On 2021/12/10 7:21, Jacob Pan wrote:
> On Thu, 9 Dec 2021 10:32:43 +0800, Lu Baolu<[email protected]>
> wrote:
>
>> On 12/9/21 3:16 AM, Jacob Pan wrote:
>>> Hi Jason,
>>>
>>> On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe<[email protected]>
>>> wrote:
>>>> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
>>>>> Between DMA requests with and without PASID (legacy), DMA mapping APIs
>>>>> are used indiscriminately on a device. Therefore, we should always
>>>>> match the addressing mode of the legacy DMA when enabling kernel
>>>>> PASID.
>>>>>
>>>>> This patch adds support for VT-d driver where the kernel PASID is
>>>>> programmed to match RIDPASID. i.e. if the device is in pass-through,
>>>>> the kernel PASID is also in pass-through; if the device is in IOVA
>>>>> mode, the kernel PASID will also be using the same IOVA space.
>>>>>
>>>>> There is additional handling for IOTLB and device TLB flush w.r.t. the
>>>>> kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
>>>>> per-domain basis; whereas device TLB flush is per device. Note that
>>>>> IOTLBs are used even when devices are in pass-through mode. ATS is
>>>>> enabled device-wide, but the device drivers can choose to manage ATS
>>>>> at per PASID level whenever control is available.
>>>>>
>>>>> Signed-off-by: Jacob Pan<[email protected]>
>>>>> drivers/iommu/intel/iommu.c | 105
>>>>> +++++++++++++++++++++++++++++++++++- drivers/iommu/intel/pasid.c |
>>>>> 7 +++ include/linux/intel-iommu.h | 3 +-
>>>>> 3 files changed, 113 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>>>>> index 60253bc436bb..a2ef6b9e4bfc 100644
>>>>> +++ b/drivers/iommu/intel/iommu.c
>>>>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
>>>>> intel_iommu *iommu, if (domain->default_pasid)
>>>>> qi_flush_piotlb(iommu, did, domain->default_pasid,
>>>>> addr, npages, ih);
>>>>> -
>>>>> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
>>>>> + /*
>>>>> + * REVISIT: we only do PASID IOTLB inval for FL, we
>>>>> could have SL
>>>>> + * for PASID in the future such as vIOMMU PT. this
>>>>> doesn't get hit.
>>>>> + */
>>>>> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
>>>>> + addr, npages, ih);
>>>>> + }
>>>>> if (!list_empty(&domain->devices))
>>>>> qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr,
>>>>> npages, ih); }
>>>>> @@ -5695,6 +5702,100 @@ static void intel_iommu_iotlb_sync_map(struct
>>>>> iommu_domain *domain, }
>>>>> }
>>>>>
>>>>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
>>>>> +{
>>>> This seems like completely the wrong kind of op.
>>>>
>>>> At the level of the iommu driver things should be iommu_domain centric
>>>>
>>>> The op should be
>>>>
>>>> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
>>>> ioasid_t pasid)
>>>>
>>>> Where 'dev' purpose is to provide the RID
>>>>
>>>> The iommu_domain passed in should be the 'default domain' ie the table
>>>> used for on-demand mapping, or the passthrough page table.
>>>>
>>> Makes sense. DMA API is device centric, iommu API is domain centric. It
>>> should be the common IOMMU code to get the default domain then pass to
>>> vendor drivers. Then we can enforce default domain behavior across all
>>> vendor drivers.
>>> i.e.
>>> dom = iommu_get_dma_domain(dev);
>>> attach_dev_pasid(dom, dev, pasid);
>>>
>>>>> + struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
>>>>> + struct device_domain_info *info;
>>>> I don't even want to know why an iommu driver is tracking its own
>>>> per-device state. That seems like completely wrong layering.
>>>>
>>> This is for IOTLB and deTLB flush. IOTLB is flushed at per domain level,
>>> devTLB is per device.
>>>
>>> For multi-device groups, this is a need to track how many devices are
>>> using the kernel DMA PASID.
>>>
>>> Are you suggesting we add the tracking info in the generic layer? i.e.
>>> iommu_group.
>>>
>>> We could also have a generic device domain info to replace what is in
>>> VT-d and FSL IOMMU driver, etc.
>> The store place of per-device iommu driver private data has already been
>> standardized. The iommu core provides below interfaces for this purpose:
>>
>> void dev_iommu_priv_set(struct device *dev, void *priv);
>> void *dev_iommu_priv_get(struct device *dev);
>>
>> If we have anything generic among different vendor iommu drivers,
>> perhaps we could move them into dev->iommu.
>>
> Yes, good suggestion. DMA PASID should be a generic feature, not suitable
> for the opaque private data. Can we agree on adding the following flag for
> devTLB invalidation?
>
> @@ -379,6 +379,7 @@ struct dev_iommu {
> struct iommu_fwspec *fwspec;
> struct iommu_device *iommu_dev;
> void *priv;
> + u32 pasid_dma_enabled : 1;
> };
>
> For DMA PASID storage, can we store it in the iommu_domain instead of
> iommu_group? In the end, this PASID is only used for the default domain. It
> will be easier to refcount how many attached devices are using the PASID.
> Destroy the PASID when no devices in the group are using PASID DMA. IOTLB
> flush is per domain also.
Tying pasid to an iommu_domain is not a good idea. An iommu_domain
represents an I/O address translation table. It could be attached to a
device or a PASID on the device.
Perhaps the dev_iommu is a reasonable place for this.
@@ -390,6 +390,8 @@ struct dev_iommu {
 	struct iommu_fwspec *fwspec;
 	struct iommu_device *iommu_dev;
 	void *priv;
+	unsigned int pasid_bits;
+	u32 kernel_dma_pasid;
 };
@pasid_bits is a static attribute of a device that supports the PASID
feature. It holds the PASID bit width that the device can support.
The vendor iommu driver could set this when the PASID feature is about
to be enabled. Normally, it's the MIN of the device and iommu capabilities.
@kernel_dma_pasid is the PASID value used for kernel DMA if it's
enabled. It reads INVALID_IOASID if kernel DMA with PASID is not
enabled.
Best regards,
baolu
On Thu, Dec 09, 2021 at 10:14:04AM -0800, Jacob Pan wrote:
> > This looks like we're just one step away from device drivers needing
> > multiple PASIDs for kernel DMA so I'm trying to figure out how to evolve
> > the API towards that. It's probably as simple as keeping a kernel IOASID
> > set at first, but then we'll probably want to optimize by having multiple
> > overlapping sets for each device driver (all separate from the SVA set).
> Sounds reasonable to start with a kernel set for in-kernel DMA once we need
> multiple ones. But I am not sure what *overlapping* sets mean here, could
> you explain?
Given that each device uses a separate PASID table, we could allocate the
same set of PASID values for different device drivers. We just need to
make sure that those values are different from PASIDs allocated for user
SVA.
Thanks,
Jean
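As a rough sketch of where that could go, with a private set per driver whose
values may repeat across drivers but stay out of the SVA range (the
KERNEL_PASID_MIN/MAX bounds are hypothetical, and today's allocator is still
globally unique):

static DECLARE_IOASID_SET(idxd_kernel_set);

static ioasid_t idxd_alloc_kernel_pasid(void)
{
	/* Each device has its own PASID table, so another driver's set
	 * could hand out the same values without conflict, as long as
	 * both ranges stay disjoint from the user-SVA range.
	 */
	return ioasid_alloc(&idxd_kernel_set, KERNEL_PASID_MIN,
			    KERNEL_PASID_MAX, NULL);
}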
On Fri, Dec 10, 2021 at 09:06:24AM +0000, Jean-Philippe Brucker wrote:
> On Thu, Dec 09, 2021 at 10:14:04AM -0800, Jacob Pan wrote:
> > > This looks like we're just one step away from device drivers needing
> > > multiple PASIDs for kernel DMA so I'm trying to figure out how to evolve
> > > the API towards that. It's probably as simple as keeping a kernel IOASID
> > > set at first, but then we'll probably want to optimize by having multiple
> > > overlapping sets for each device driver (all separate from the SVA set).
> > Sounds reasonable to start with a kernel set for in-kernel DMA once we need
> > multiple ones. But I am not sure what *overlapping* sets mean here, could
> > you explain?
>
> Given that each device uses a separate PASID table, we could allocate the
> same set of PASID values for different device drivers. We just need to
> make sure that those values are different from PASIDs allocated for user
> SVA.
Why does user SVA need global values anyhow?
Jason
Hi Lu,
On Fri, 10 Dec 2021 14:46:32 +0800, Lu Baolu <[email protected]>
wrote:
> On 2021/12/10 7:21, Jacob Pan wrote:
> > On Thu, 9 Dec 2021 10:32:43 +0800, Lu Baolu<[email protected]>
> > wrote:
> >
> >> On 12/9/21 3:16 AM, Jacob Pan wrote:
> >>> Hi Jason,
> >>>
> >>> On Wed, 8 Dec 2021 09:22:55 -0400, Jason Gunthorpe<[email protected]>
> >>> wrote:
> >>>> On Tue, Dec 07, 2021 at 05:47:13AM -0800, Jacob Pan wrote:
> >>>>> Between DMA requests with and without PASID (legacy), DMA mapping
> >>>>> APIs are used indiscriminately on a device. Therefore, we should
> >>>>> always match the addressing mode of the legacy DMA when enabling
> >>>>> kernel PASID.
> >>>>>
> >>>>> This patch adds support for VT-d driver where the kernel PASID is
> >>>>> programmed to match RIDPASID. i.e. if the device is in pass-through,
> >>>>> the kernel PASID is also in pass-through; if the device is in IOVA
> >>>>> mode, the kernel PASID will also be using the same IOVA space.
> >>>>>
> >>>>> There is additional handling for IOTLB and device TLB flush w.r.t.
> >>>>> the kernel PASID. On VT-d, PASID-selective IOTLB flush is also on a
> >>>>> per-domain basis; whereas device TLB flush is per device. Note that
> >>>>> IOTLBs are used even when devices are in pass-through mode. ATS is
> >>>>> enabled device-wide, but the device drivers can choose to manage ATS
> >>>>> at per PASID level whenever control is available.
> >>>>>
> >>>>> Signed-off-by: Jacob Pan<[email protected]>
> >>>>> drivers/iommu/intel/iommu.c | 105
> >>>>> +++++++++++++++++++++++++++++++++++- drivers/iommu/intel/pasid.c |
> >>>>> 7 +++ include/linux/intel-iommu.h | 3 +-
> >>>>> 3 files changed, 113 insertions(+), 2 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/iommu/intel/iommu.c
> >>>>> b/drivers/iommu/intel/iommu.c index 60253bc436bb..a2ef6b9e4bfc
> >>>>> 100644 +++ b/drivers/iommu/intel/iommu.c
> >>>>> @@ -1743,7 +1743,14 @@ static void domain_flush_piotlb(struct
> >>>>> intel_iommu *iommu, if (domain->default_pasid)
> >>>>> qi_flush_piotlb(iommu, did,
> >>>>> domain->default_pasid, addr, npages, ih);
> >>>>> -
> >>>>> + if (domain->kernel_pasid && !domain_type_is_si(domain)) {
> >>>>> + /*
> >>>>> + * REVISIT: we only do PASID IOTLB inval for FL, we
> >>>>> could have SL
> >>>>> + * for PASID in the future such as vIOMMU PT. this
> >>>>> doesn't get hit.
> >>>>> + */
> >>>>> + qi_flush_piotlb(iommu, did, domain->kernel_pasid,
> >>>>> + addr, npages, ih);
> >>>>> + }
> >>>>> if (!list_empty(&domain->devices))
> >>>>> qi_flush_piotlb(iommu, did, PASID_RID2PASID,
> >>>>> addr, npages, ih); }
> >>>>> @@ -5695,6 +5702,100 @@ static void
> >>>>> intel_iommu_iotlb_sync_map(struct iommu_domain *domain, }
> >>>>> }
> >>>>>
> >>>>> +static int intel_enable_pasid_dma(struct device *dev, u32 pasid)
> >>>>> +{
> >>>> This seems like completely the wrong kind of op.
> >>>>
> >>>> At the level of the iommu driver things should be iommu_domain
> >>>> centric
> >>>>
> >>>> The op should be
> >>>>
> >>>> int attach_dev_pasid(struct iommu_domain *domain, struct device *dev,
> >>>> ioasid_t pasid)
> >>>>
> >>>> Where 'dev' purpose is to provide the RID
> >>>>
> >>>> The iommu_domain passed in should be the 'default domain' ie the
> >>>> table used for on-demand mapping, or the passthrough page table.
> >>>>
> >>> Makes sense. DMA API is device centric, iommu API is domain centric.
> >>> It should be the common IOMMU code to get the default domain then
> >>> pass to vendor drivers. Then we can enforce default domain behavior
> >>> across all vendor drivers.
> >>> i.e.
> >>> dom = iommu_get_dma_domain(dev);
> >>> attach_dev_pasid(dom, dev, pasid);
> >>>
> >>>>> + struct intel_iommu *iommu = device_to_iommu(dev, NULL,
> >>>>> NULL);
> >>>>> + struct device_domain_info *info;
> >>>> I don't even want to know why an iommu driver is tracking its own
> >>>> per-device state. That seems like completely wrong layering.
> >>>>
> >>> This is for IOTLB and deTLB flush. IOTLB is flushed at per domain
> >>> level, devTLB is per device.
> >>>
> >>> For multi-device groups, this is a need to track how many devices are
> >>> using the kernel DMA PASID.
> >>>
> >>> Are you suggesting we add the tracking info in the generic layer? i.e.
> >>> iommu_group.
> >>>
> >>> We could also have a generic device domain info to replace what is in
> >>> VT-d and FSL IOMMU driver, etc.
> >> The store place of per-device iommu driver private data has already
> >> been standardized. The iommu core provides below interfaces for this
> >> purpose:
> >>
> >> void dev_iommu_priv_set(struct device *dev, void *priv);
> >> void *dev_iommu_priv_get(struct device *dev);
> >>
> >> If we have anything generic among different vendor iommu drivers,
> >> perhaps we could move them into dev->iommu.
> >>
> > Yes, good suggestion. DMA PASID should be a generic feature, not
> > suitable for the opaque private data. Can we agree on adding the
> > following flag for devTLB invalidation?
> >
> > @@ -379,6 +379,7 @@ struct dev_iommu {
> > struct iommu_fwspec *fwspec;
> > struct iommu_device *iommu_dev;
> > void *priv;
> > + u32 pasid_dma_enabled : 1;
> > };
> >
> > For DMA PASID storage, can we store it in the iommu_domain instead of
> > iommu_group? In the end, this PASID is only used for the default
> > domain. It will be easier to refcount how many attached devices are
> > using the PASID. Destroy the PASID when no devices in the group are
> > using PASID DMA. IOTLB flush is per domain also.
>
> Tying pasid to an iommu_domain is not a good idea. An iommu_domain
> represents an I/O address translation table. It could be attached to a
> device or a PASID on the device.
>
I don't think we can avoid storing the PASID at the domain level or in the
group's default domain. IOTLB flush is per domain. The default domain of DMA
type is already tied to PASID 0, right?
> Perhaps the dev_iommu is a reasonable place for this.
>
> @@ -390,6 +390,8 @@ struct dev_iommu {
> struct iommu_fwspec *fwspec;
> struct iommu_device *iommu_dev;
> void *priv;
> + unsigned int pasid_bits;
> + u32 kernel_dma_pasid;
> };
>
> @pasid_bits is a static attribute of a device which supports PASID
> feature. It reads the PASID bitwidth that the device could support.
> The vendor iommu driver could set this when the PASID feature is about
> to be enabled. Normally, it's the MIN of device and iommu capabilities.
>
> @kernel_dma_pasid is the PASID value used for kernel DMA if it's
> enabled. It reads INVALID_IOASID if kernel DMA with PASID is not
> enabled.
>
This essentially goes back to the same layering as struct device.pasid,
just embedded under device.iommu.pasid. That is fine, but we still need
per-domain PASID info. I see the following functionalities:
1. per device PASID info for devTLB flush
2. per domain PASID info for IOTLB flush for all attached devices
The PASID info includes PASID value, user/device count, enabled status.
Though both DMA API PASID and RIDPASID (0) are mapped identically, RIDPASID
TLB flush is implied for all devices attached to a domain. That is why we
don't need to track it.
For DMA API PASID, it is opt-in by device drivers, therefore we must track
on a per-domain basis to see how many attached devices are using this PASID.
This will avoid blindly flushing IOTLBs for DMA API PASID. dma_unmap() does
not tell you which PASIDs to flush.
In this patchset, I store the per-domain DMA API PASID info in the VT-d-only
dmar_domain (VT-d is the only user so far). If we were to store it in the
generic layer, this could be simpler.
> Best regards,
> baolu
Thanks,
Jacob
On Fri, Dec 10, 2021 at 09:50:25AM -0800, Jacob Pan wrote:
> > Tying pasid to an iommu_domain is not a good idea. An iommu_domain
> > represents an I/O address translation table. It could be attached to a
> > device or a PASID on the device.
>
> I don't think we can avoid storing the PASID at the domain level or in the
> group's default domain. IOTLB flush is per domain. The default domain of DMA
> type is already tied to PASID 0, right?
No, it is just wrong.
If the HW requires a list of everything that is connected to the
iommu_domain then its private iommu_domain should have that list.
But it is a *list* not a single PASID.
If one device has 10 PASID's pointing to this domain you must flush
them all if that is what the HW requires.
Jason
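Put as code, the per-domain state would be a list of attachments, and the
flush walks all of them (a sketch; the attachment struct and the
pasid_attachments field are invented here, not part of the patchset):

struct dmar_pasid_attach {
	struct list_head link;
	struct device	 *dev;		/* provides the RID */
	ioasid_t	 pasid;
};

static void domain_flush_all_piotlb(struct intel_iommu *iommu, u16 did,
				    struct dmar_domain *domain, u64 addr,
				    unsigned long npages, bool ih)
{
	struct dmar_pasid_attach *att;

	/* Every PASID pointing at this page table gets flushed, whether
	 * one device attached ten PASIDs or ten devices attached one.
	 */
	list_for_each_entry(att, &domain->pasid_attachments, link)
		qi_flush_piotlb(iommu, did, att->pasid, addr, npages, ih);
}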
Hi Jason,
On Fri, 10 Dec 2021 08:31:09 -0400, Jason Gunthorpe <[email protected]> wrote:
> On Fri, Dec 10, 2021 at 09:06:24AM +0000, Jean-Philippe Brucker wrote:
> > On Thu, Dec 09, 2021 at 10:14:04AM -0800, Jacob Pan wrote:
> > > > This looks like we're just one step away from device drivers needing
> > > > multiple PASIDs for kernel DMA so I'm trying to figure out how to
> > > > evolve the API towards that. It's probably as simple as keeping a
> > > > kernel IOASID set at first, but then we'll probably want to
> > > > optimize by having multiple overlapping sets for each device driver
> > > > (all separate from the SVA set).
> > > Sounds reasonable to start with a kernel set for in-kernel DMA once
> > > we need multiple ones. But I am not sure what *overlapping* sets mean
> > > here, could you explain?
> >
> > Given that each device uses a separate PASID table, we could allocate
> > the same set of PASID values for different device drivers. We just need
> > to make sure that those values are different from PASIDs allocated for
> > user SVA.
>
> Why does user SVA need global values anyhow?
>
Currently, we have mm.pasid for user SVA. mm is global. We could have
per-device PASIDs for dedicated devices (not shared across mm's), but that
would make things a lot more complex. I am thinking multiple PASIDs per mm
would be needed, right?
For VT-d, the shared workqueue (SWQ) requires global PASIDs in that we
cannot have two processes use the same PASID to submit work on a workqueue
shared by the two processes. Each process's PASID must be unique to the
SWQ's PASID table.
> Jason
Thanks,
Jacob
Hi Jason,
On Fri, 10 Dec 2021 13:48:48 -0400, Jason Gunthorpe <[email protected]> wrote:
> On Fri, Dec 10, 2021 at 09:50:25AM -0800, Jacob Pan wrote:
>
> > > Tying pasid to an iommu_domain is not a good idea. An iommu_domain
> > > represents an I/O address translation table. It could be attached to a
> > > device or a PASID on the device.
> >
> > I don't think we can avoid storing the PASID at the domain level or in the
> > group's default domain. IOTLB flush is per domain. The default domain of
> > DMA type is already tied to PASID 0, right?
>
> No, it is just wrong.
>
> If the HW requires a list of everything that is connected to the
> iommu_domain then it's private iommu_domain should have that list.
>
What I have in this patchset is in the private dmar_domain:

struct dmar_domain {
	...
	u32 kernel_pasid;		/* for in-kernel DMA w/ PASID */
	atomic_t kernel_pasid_user;	/* count of kernel_pasid users */
	struct iommu_domain domain;	/* generic domain data structure
					   for iommu core */
};

Perhaps I am missing the point. "private domain" is still "domain level" as
I stated. Confused :(
> But it is a *list* not a single PASID.
>
We could add a list when a real use case comes along.
> If one device has 10 PASID's pointing to this domain you must flush
> them all if that is what the HW requires.
>
Yes. My point is that, other than PASID 0 which is a given, we must track the
10 PASIDs to avoid wasted flushes. It also depends on how TLBs are tagged and
what flush granularity is available. But at the API level, should we support
all the cases?
> Jason
Thanks,
Jacob
On Fri, Dec 10, 2021 at 10:18:20AM -0800, Jacob Pan wrote:
> > If one device has 10 PASID's pointing to this domain you must flush
> > them all if that is what the HW requires.
> >
> Yes. My point is that other than PASID 0 is a given, we must track the 10
> PASIDs to avoid wasted flush. It also depend on how TLBs are tagged and
> flush granularity available. But at the API level, should we support all the
> cases?
Yes, iommufd will expose all the cases to userspace and anything is
possible.
A scheme that can only attach a domain to one PASID is functionally
useless except for this IDXD problem, so don't make something so
broken.
Jason
> From: Jacob Pan <[email protected]>
> Sent: Saturday, December 11, 2021 2:06 AM
>
> Hi Jason,
>
> On Fri, 10 Dec 2021 08:31:09 -0400, Jason Gunthorpe <[email protected]>
> wrote:
>
> > On Fri, Dec 10, 2021 at 09:06:24AM +0000, Jean-Philippe Brucker wrote:
> > > On Thu, Dec 09, 2021 at 10:14:04AM -0800, Jacob Pan wrote:
> > > > > This looks like we're just one step away from device drivers needing
> > > > > multiple PASIDs for kernel DMA so I'm trying to figure out how to
> > > > > evolve the API towards that. It's probably as simple as keeping a
> > > > > kernel IOASID set at first, but then we'll probably want to
> > > > > optimize by having multiple overlapping sets for each device driver
> > > > > (all separate from the SVA set).
> > > > Sounds reasonable to start with a kernel set for in-kernel DMA once
> > > > we need multiple ones. But I am not sure what *overlapping* sets
> mean
> > > > here, could you explain?
> > >
> > > Given that each device uses a separate PASID table, we could allocate
> > > the same set of PASID values for different device drivers. We just need
> > > to make sure that those values are different from PASIDs allocated for
> > > user SVA.
> >
> > Why does user SVA need global values anyhow?
> >
> Currently, we have mm.pasid for user SVA. mm is global. We could have per
> device PASID for dedicated devices (not shared across mm's), but that would
> make things a lot more complex. I am thinking multiple PASIDs per mm is
> needed, right?
>
> For VT-d, the shared workqueue (SWQ) requires global PASIDs in that we
> cannot have two processes use the same PASID to submit work on a
> workqueue
> shared by the two processes. Each process's PASID must be unique to the
> SWQ's PASID table.
>
Uniqueness is not the main argument for using global PASIDs for
SWQ, since it can be defined either in per-RID or in global PASID
space. No SVA architecture can allow two processes to use the
same PASID to submit work unless they share an mm!
IMO the real reason is that a SWQ for user SVA must be accessed
via the ENQCMD instruction, which fetches the PASID from a CPU MSR
(managed as part of XSAVE state). This assumes a single PASID
representing the address space for the user task regardless of
how many devices are attached to this task. Otherwise the kernel
wouldn't know when to set which PASID if there were multiple PASIDs
per mm. This single PASID then has to be global across all attached
devices to ensure uniqueness.
Of course SWQ is only one configuration for user SVA. In theory
multiple per-device PASIDs are still feasible for non-SWQ cases.
But given that mm is global and the SWQ restriction, it's not worthwhile
adding that complexity...
Thanks
Kevin
On Sat, Dec 11, 2021 at 08:39:12AM +0000, Tian, Kevin wrote:
> Uniqueness is not the main argument of using global PASIDs for
> SWQ, since it can be defined either in per-RID or in global PASID
> space. No SVA architecture can allow two processes to use the
> same PASID to submit work unless they share an mm!
>
> IMO the real reason is that SWQ for user SVA must be accessed
> via ENQCMD instruction which fetches the PASID from a CPU MSR
This really should have been inside a comment in the struct mm
"pasid is the value used by x86 ENQCMD"
(and if we phrase it that way, I wonder why it is in struct mm and not
some process- or task-related struct, since it has nothing to do with
page tables)
And, IMHO, the IOMMU part of the code should avoid using this
field. The IOMMU should be able to create arbitrarily many "SVA"
iommu_domains for use by PASID even if they can't be used with
ENQCMD. Such is proper layering.
Jason