Hi all,
IDXD kernel work queues were disabled due to the flawed use of kernel VA
and SVA API.
Link: https://lore.kernel.org/linux-iommu/[email protected]/
The solution is to use the DMA API instead: IDXD shared workqueue users
can use ENQCMDS to submit work on buffers mapped by the DMA API.
This patchset adds support for attaching PASID to the device's default
domain and the ability to reserve global PASIDs from SVA APIs. We can then
re-enable the kernel work queues and use them under DMA API.
This depends on the IOASID removal series.
https://lore.kernel.org/all/[email protected]/
Thanks,
Jacob
---
Changelog:
v4:
- move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
- dropped domain type check while disabling idxd system PASID (Baolu)
v3:
- moved global PASID allocation API from SVA to IOMMU (Kevin)
- remove #ifdef around global PASID reservation during boot (Baolu)
- remove restriction on PASID 0 allocation (Baolu)
- fix a bug in sysfs domain change when attaching devices
- clear idxd user interrupt enable bit after disabling device (Fenghua)
v2:
- refactored device PASID attach domain ops based on Baolu's early patch
- addressed TLB flush gap
- explicitly reserve RID_PASID from SVA PASID number space
- get dma domain directly, avoid checking domain types
Jacob Pan (7):
iommu/vt-d: Use non-privileged mode for all PASIDs
iommu/vt-d: Remove PASID supervisor request support
iommu: Support allocation of global PASIDs outside SVA
iommu/vt-d: Reserve RID_PASID from global PASID space
iommu/vt-d: Make device pasid attachment explicit
iommu/vt-d: Implement set_dev_pasid domain op
dmaengine/idxd: Re-enable kernel workqueue under DMA API
drivers/dma/idxd/device.c | 30 +-----
drivers/dma/idxd/init.c | 60 +++++++++++-
drivers/dma/idxd/sysfs.c | 7 --
drivers/iommu/intel/iommu.c | 180 +++++++++++++++++++++++++++++-------
drivers/iommu/intel/iommu.h | 8 ++
drivers/iommu/intel/pasid.c | 43 ---------
drivers/iommu/intel/pasid.h | 7 --
drivers/iommu/iommu-sva.c | 10 +-
drivers/iommu/iommu.c | 33 +++++++
include/linux/iommu.h | 11 +++
10 files changed, 262 insertions(+), 127 deletions(-)
--
2.25.1
Devices that use Intel ENQCMD to submit work must use global PASIDs,
because the PASID is stored in a per-CPU MSR. When such a device needs to
submit work for in-kernel DMA with PASID, it must allocate PASIDs from
the same global number space to avoid conflicts.
Move the global PASID allocation APIs from SVA into the common IOMMU
code. Device drivers are expected to use the allocated PASIDs to attach
to the appropriate IOMMU domains.
Signed-off-by: Jacob Pan <[email protected]>
---
v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
---
drivers/iommu/iommu-sva.c | 10 ++++------
drivers/iommu/iommu.c | 33 +++++++++++++++++++++++++++++++++
include/linux/iommu.h | 11 +++++++++++
3 files changed, 48 insertions(+), 6 deletions(-)
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index c434b95dc8eb..222544587582 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -9,15 +9,13 @@
#include "iommu-sva.h"
static DEFINE_MUTEX(iommu_sva_lock);
-static DEFINE_IDA(iommu_global_pasid_ida);
/* Allocate a PASID for the mm within range (inclusive) */
static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
{
int ret = 0;
- if (!pasid_valid(min) || !pasid_valid(max) ||
- min == 0 || max < min)
+ if (!pasid_valid(min) || !pasid_valid(max) || max < min)
return -EINVAL;
mutex_lock(&iommu_sva_lock);
@@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t ma
goto out;
}
- ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
- if (ret < min)
+ ret = iommu_alloc_global_pasid(min, max);
+ if (!pasid_valid(ret))
goto out;
mm->pasid = ret;
ret = 0;
@@ -211,5 +209,5 @@ void mm_pasid_drop(struct mm_struct *mm)
if (likely(!pasid_valid(mm->pasid)))
return;
- ida_free(&iommu_global_pasid_ida, mm->pasid);
+ iommu_free_global_pasid(mm->pasid);
}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 10db680acaed..2a132ff7e3de 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -38,6 +38,7 @@
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
+static DEFINE_IDA(iommu_global_pasid_ida);
static unsigned int iommu_def_domain_type __read_mostly;
static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
@@ -3450,3 +3451,35 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
return domain;
}
+
+/**
+ * iommu_alloc_global_pasid - Allocate a PASID from the global number space
+ *
+ * @min: starting range, inclusive
+ * @max: ending range, inclusive
+ *
+ * Return: the allocated PASID on success or IOMMU_PASID_INVALID on failure.
+ */
+ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max)
+{
+ int ret;
+
+ if (!pasid_valid(min) || !pasid_valid(max) || max < min)
+ return IOMMU_PASID_INVALID;
+
+ ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
+ if (ret < 0)
+ return IOMMU_PASID_INVALID;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid);
+
+void iommu_free_global_pasid(ioasid_t pasid)
+{
+ if (WARN_ON(!pasid_valid(pasid)))
+ return;
+
+ ida_free(&iommu_global_pasid_ida, pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 54f535ff9868..c9720ddc81d2 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -723,6 +723,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
struct iommu_domain *
iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
unsigned int type);
+ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max);
+void iommu_free_global_pasid(ioasid_t pasid);
#else /* CONFIG_IOMMU_API */
struct iommu_ops {};
@@ -1089,6 +1091,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
{
return NULL;
}
+
+static inline ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max)
+{
+ return IOMMU_PASID_INVALID;
+}
+
+static inline void iommu_free_global_pasid(ioasid_t pasid) {}
#endif /* CONFIG_IOMMU_API */
/**
@@ -1187,6 +1196,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+
#else
static inline struct iommu_sva *
iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
@@ -1202,6 +1212,7 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
{
return IOMMU_PASID_INVALID;
}
+
static inline void mm_pasid_init(struct mm_struct *mm) {}
static inline void mm_pasid_drop(struct mm_struct *mm) {}
#endif /* CONFIG_IOMMU_SVA */
--
2.25.1
The Supervisor Request Enable (SRE) bit in a PASID entry is used for
permission checking on DMA requests. When SRE = 0, DMA requests with
supervisor privilege are blocked. However, this is unnecessary for
in-kernel DMA since it targets kernel memory anyway; there is no need to
differentiate user and kernel permission for in-kernel DMA.
Use non-privileged (user) permission for all PASIDs used by the kernel;
this is also consistent with DMA without PASID (RID_PASID).
Reviewed-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0768dcae90fd..9f737ef55463 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2338,8 +2338,6 @@ static int domain_setup_first_level(struct intel_iommu *iommu,
if (level != 4 && level != 5)
return -EINVAL;
- if (pasid != PASID_RID2PASID)
- flags |= PASID_FLAG_SUPERVISOR_MODE;
if (level == 5)
flags |= PASID_FLAG_FL5LP;
--
2.25.1
Devices that use ENQCMDS to submit work on buffers mapped by the DMA API
must attach a PASID to the default domain of the device. In preparation
for this use case, implement set_dev_pasid() for the default_domain_ops.
If the device context has not been set up prior to this call,
set_dev_pasid() will set up the device context in addition to attaching
the PASID.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 52b9d0d3a02c..1ad9c5a4bd8f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4784,6 +4784,26 @@ static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
domain_detach_iommu(dmar_domain, info->iommu);
}
+static int intel_iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct intel_iommu *iommu = info->iommu;
+ int ret;
+
+ if (!pasid_supported(iommu))
+ return -ENODEV;
+
+ ret = prepare_domain_attach_device(domain, dev);
+ if (ret)
+ return ret;
+
+ return dmar_domain_attach_device_pasid(dmar_domain, dev, pasid);
+}
+
+
+
const struct iommu_ops intel_iommu_ops = {
.capable = intel_iommu_capable,
.domain_alloc = intel_iommu_domain_alloc,
@@ -4803,6 +4823,7 @@ const struct iommu_ops intel_iommu_ops = {
#endif
.default_domain_ops = &(const struct iommu_domain_ops) {
.attach_dev = intel_iommu_attach_device,
+ .set_dev_pasid = intel_iommu_attach_device_pasid,
.map_pages = intel_iommu_map_pages,
.unmap_pages = intel_iommu_unmap_pages,
.iotlb_sync_map = intel_iommu_iotlb_sync_map,
--
2.25.1
Currently, when a device is attached to its DMA domain, RID_PASID is
implicitly attached if VT-d is in scalable mode. To prepare for generic
PASID-device domain attachment, parameterize the PASID such that all
PASIDs are attached explicitly.
This allows code reuse for DMA API with PASID usage and makes no
assumption about the order in which PASIDs and the device are attached.
The same change applies to IOTLB invalidation and PASID removal.
Extracted common code based on Baolu's patch:
Link:https://lore.kernel.org/linux-iommu/[email protected]/
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 153 ++++++++++++++++++++++++++++--------
drivers/iommu/intel/iommu.h | 8 ++
2 files changed, 128 insertions(+), 33 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cbb2670f88ca..52b9d0d3a02c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -278,6 +278,8 @@ static LIST_HEAD(dmar_satc_units);
list_for_each_entry(rmrr, &dmar_rmrr_units, list)
static void device_block_translation(struct device *dev);
+static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
static void intel_iommu_domain_free(struct iommu_domain *domain);
int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
@@ -1365,6 +1367,7 @@ domain_lookup_dev_info(struct dmar_domain *domain,
static void domain_update_iotlb(struct dmar_domain *domain)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
bool has_iotlb_device = false;
unsigned long flags;
@@ -1376,6 +1379,14 @@ static void domain_update_iotlb(struct dmar_domain *domain)
break;
}
}
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ if (info->ats_enabled) {
+ has_iotlb_device = true;
+ break;
+ }
+ }
domain->has_iotlb_device = has_iotlb_device;
spin_unlock_irqrestore(&domain->lock, flags);
}
@@ -1486,6 +1497,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
u64 addr, unsigned mask)
{
+ struct device_pasid_info *dev_pasid;
struct device_domain_info *info;
unsigned long flags;
@@ -1495,6 +1507,39 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
spin_lock_irqsave(&domain->lock, flags);
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);
+
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain) {
+		/* device TLB is unaware that RID_PASID is used for DMA w/o PASID */
+ if (dev_pasid->pasid == PASID_RID2PASID)
+ continue;
+
+ info = dev_iommu_priv_get(dev_pasid->dev);
+ qi_flush_dev_iotlb_pasid(info->iommu,
+ PCI_DEVID(info->bus, info->devfn),
+ info->pfsid, dev_pasid->pasid,
+ info->ats_qdep, addr,
+ mask);
+ }
+ spin_unlock_irqrestore(&domain->lock, flags);
+}
+
+/*
+ * The VT-d spec requires to use PASID-based-IOTLB Invalidation to
+ * invalidate IOTLB and the paging-structure-caches for a first-stage
+ * page table.
+ */
+static void domain_flush_pasid_iotlb(struct intel_iommu *iommu,
+ struct dmar_domain *domain, u64 addr,
+ unsigned long npages, bool ih)
+{
+ u16 did = domain_id_iommu(domain, iommu);
+ struct device_pasid_info *dev_pasid;
+ unsigned long flags;
+
+ spin_lock_irqsave(&domain->lock, flags);
+ list_for_each_entry(dev_pasid, &domain->dev_pasids, link_domain)
+ qi_flush_piotlb(iommu, did, dev_pasid->pasid, addr, npages, ih);
+
spin_unlock_irqrestore(&domain->lock, flags);
}
@@ -1514,7 +1559,7 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
ih = 1 << 6;
if (domain->use_first_level) {
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, pages, ih);
+ domain_flush_pasid_iotlb(iommu, domain, addr, pages, ih);
} else {
unsigned long bitmask = aligned_pages - 1;
@@ -1584,7 +1629,7 @@ static void intel_flush_iotlb_all(struct iommu_domain *domain)
u16 did = domain_id_iommu(dmar_domain, iommu);
if (dmar_domain->use_first_level)
- qi_flush_piotlb(iommu, did, PASID_RID2PASID, 0, -1, 0);
+ domain_flush_pasid_iotlb(iommu, dmar_domain, 0, -1, 0);
else
iommu->flush.flush_iotlb(iommu, did, 0, 0,
DMA_TLB_DSI_FLUSH);
@@ -1756,6 +1801,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->use_first_level = true;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
+ INIT_LIST_HEAD(&domain->dev_pasids);
spin_lock_init(&domain->lock);
xa_init(&domain->iommu_array);
@@ -2429,10 +2475,11 @@ static int __init si_domain_init(int hw)
return 0;
}
-static int dmar_domain_attach_device(struct dmar_domain *domain,
- struct device *dev)
+static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct device_pasid_info *dev_pasid;
struct intel_iommu *iommu;
unsigned long flags;
u8 bus, devfn;
@@ -2442,43 +2489,57 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
if (!iommu)
return -ENODEV;
+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+ if (!dev_pasid)
+ return -ENOMEM;
+
ret = domain_attach_iommu(domain, iommu);
if (ret)
- return ret;
+ goto exit_free;
+
info->domain = domain;
+ dev_pasid->pasid = pasid;
+ dev_pasid->dev = dev;
spin_lock_irqsave(&domain->lock, flags);
- list_add(&info->link, &domain->devices);
+ if (!info->dev_attached)
+ list_add(&info->link, &domain->devices);
+
+ list_add(&dev_pasid->link_domain, &domain->dev_pasids);
spin_unlock_irqrestore(&domain->lock, flags);
/* PASID table is mandatory for a PCI device in scalable mode. */
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
/* Setup the PASID entry for requests without PASID: */
if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, domain,
- dev, PASID_RID2PASID);
+ ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- PASID_RID2PASID);
+ ret = domain_setup_first_level(iommu, domain, dev, pasid);
else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, PASID_RID2PASID);
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
if (ret) {
- dev_err(dev, "Setup RID2PASID failed\n");
+ dev_err(dev, "Setup PASID %d failed\n", pasid);
device_block_translation(dev);
- return ret;
+ goto exit_free;
}
}
+ /* device context already activated, we are done */
+ if (info->dev_attached)
+ goto exit;
ret = domain_context_mapping(domain, dev);
if (ret) {
dev_err(dev, "Domain context map failed\n");
device_block_translation(dev);
- return ret;
+ goto exit_free;
}
iommu_enable_pci_caps(info);
-
+ info->dev_attached = 1;
+exit:
return 0;
+exit_free:
+ kfree(dev_pasid);
+ return ret;
}
static bool device_has_rmrr(struct device *dev)
@@ -4029,8 +4090,7 @@ static void device_block_translation(struct device *dev)
iommu_disable_pci_caps(info);
if (!dev_is_real_dma_subdevice(dev)) {
if (sm_supported(iommu))
- intel_pasid_tear_down_entry(iommu, dev,
- PASID_RID2PASID, false);
+ intel_iommu_detach_device_pasid(&info->domain->domain, dev, PASID_RID2PASID);
else
domain_context_clear(info);
}
@@ -4040,6 +4100,7 @@ static void device_block_translation(struct device *dev)
spin_lock_irqsave(&info->domain->lock, flags);
list_del(&info->link);
+ info->dev_attached = 0;
spin_unlock_irqrestore(&info->domain->lock, flags);
domain_detach_iommu(info->domain, iommu);
@@ -4186,7 +4247,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
if (ret)
return ret;
- return dmar_domain_attach_device(to_dmar_domain(domain), dev);
+ return dmar_domain_attach_device_pasid(to_dmar_domain(domain), dev, PASID_RID2PASID);
}
static int intel_iommu_map(struct iommu_domain *domain,
@@ -4675,26 +4736,52 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
__mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
}
-static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
{
- struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
- struct iommu_domain *domain;
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_pasid_info *i, *dev_pasid = NULL;
+ struct intel_iommu *iommu = info->iommu;
+ unsigned long flags;
- /* Domain type specific cleanup: */
- domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
- if (domain) {
- switch (domain->type) {
- case IOMMU_DOMAIN_SVA:
- intel_svm_remove_dev_pasid(dev, pasid);
- break;
- default:
- /* should never reach here */
- WARN_ON(1);
+ spin_lock_irqsave(&dmar_domain->lock, flags);
+ list_for_each_entry(i, &dmar_domain->dev_pasids, link_domain) {
+ if (i->dev == dev && i->pasid == pasid) {
+ list_del(&i->link_domain);
+ dev_pasid = i;
break;
}
}
+ spin_unlock_irqrestore(&dmar_domain->lock, flags);
+ if (WARN_ON(!dev_pasid))
+ return;
+
+ /* PASID entry already cleared during SVA unbind */
+ if (domain->type != IOMMU_DOMAIN_SVA)
+ intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+
+ kfree(dev_pasid);
+}
+
+static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct dmar_domain *dmar_domain;
+ struct iommu_domain *domain;
+
+ domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
+ dmar_domain = to_dmar_domain(domain);
+
+ /*
+ * SVA Domain type specific cleanup: Not ideal but not until we have
+ * IOPF capable domain specific ops, we need this special case.
+ */
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ return intel_svm_remove_dev_pasid(dev, pasid);
- intel_pasid_tear_down_entry(iommu, dev, pasid, false);
+ intel_iommu_detach_device_pasid(domain, dev, pasid);
+ domain_detach_iommu(dmar_domain, info->iommu);
}
const struct iommu_ops intel_iommu_ops = {
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 65b15be72878..b6c26f25d1ba 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -595,6 +595,7 @@ struct dmar_domain {
spinlock_t lock; /* Protect device tracking lists */
struct list_head devices; /* all devices' list */
+ struct list_head dev_pasids; /* all attached pasids */
struct dma_pte *pgd; /* virtual address */
int gaw; /* max guest address width */
@@ -708,6 +709,7 @@ struct device_domain_info {
u8 ats_supported:1;
u8 ats_enabled:1;
u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
+ u8 dev_attached:1; /* Device context activated */
u8 ats_qdep;
struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
struct intel_iommu *iommu; /* IOMMU used by this device */
@@ -715,6 +717,12 @@ struct device_domain_info {
struct pasid_table *pasid_table; /* pasid table */
};
+struct device_pasid_info {
+ struct list_head link_domain; /* link to domain siblings */
+ struct device *dev; /* physical device derived from */
+ ioasid_t pasid; /* PASID on physical device */
+};
+
static inline void __iommu_flush_cache(
struct intel_iommu *iommu, void *addr, int size)
{
--
2.25.1
Kernel workqueues were disabled due to flawed use of kernel VA and SVA
API. Now that we have support for attaching a PASID to the device's
default domain and for reserving global PASIDs from the SVA APIs, we can
re-enable the kernel workqueues and use them under the DMA API.
We also use non-privileged access for in-kernel DMA to be consistent
with the IOMMU settings. Consequently, since descriptors are now
submitted with user privilege, the user interrupt enable bit is set for
work completion IRQs.
Link:https://lore.kernel.org/linux-iommu/[email protected]/
Reviewed-by: Dave Jiang <[email protected]>
Reviewed-by: Fenghua Yu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/device.c | 30 ++++----------------
drivers/dma/idxd/init.c | 60 ++++++++++++++++++++++++++++++++++++---
drivers/dma/idxd/sysfs.c | 7 -----
3 files changed, 61 insertions(+), 36 deletions(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c
index 6fca8fa8d3a8..f6b133d61a04 100644
--- a/drivers/dma/idxd/device.c
+++ b/drivers/dma/idxd/device.c
@@ -299,21 +299,6 @@ void idxd_wqs_unmap_portal(struct idxd_device *idxd)
}
}
-static void __idxd_wq_set_priv_locked(struct idxd_wq *wq, int priv)
-{
- struct idxd_device *idxd = wq->idxd;
- union wqcfg wqcfg;
- unsigned int offset;
-
- offset = WQCFG_OFFSET(idxd, wq->id, WQCFG_PRIVL_IDX);
- spin_lock(&idxd->dev_lock);
- wqcfg.bits[WQCFG_PRIVL_IDX] = ioread32(idxd->reg_base + offset);
- wqcfg.priv = priv;
- wq->wqcfg->bits[WQCFG_PRIVL_IDX] = wqcfg.bits[WQCFG_PRIVL_IDX];
- iowrite32(wqcfg.bits[WQCFG_PRIVL_IDX], idxd->reg_base + offset);
- spin_unlock(&idxd->dev_lock);
-}
-
static void __idxd_wq_set_pasid_locked(struct idxd_wq *wq, int pasid)
{
struct idxd_device *idxd = wq->idxd;
@@ -1324,15 +1309,14 @@ int drv_enable_wq(struct idxd_wq *wq)
}
/*
- * In the event that the WQ is configurable for pasid and priv bits.
- * For kernel wq, the driver should setup the pasid, pasid_en, and priv bit.
- * However, for non-kernel wq, the driver should only set the pasid_en bit for
- * shared wq. A dedicated wq that is not 'kernel' type will configure pasid and
+ * In the event that the WQ is configurable for pasid, the driver
+ * should setup the pasid, pasid_en bit. This is true for both kernel
+ * and user shared workqueues. There is no need to setup priv bit in
+ * that in-kernel DMA will also do user privileged requests.
+ * A dedicated wq that is not 'kernel' type will configure pasid and
* pasid_en later on so there is no need to setup.
*/
if (test_bit(IDXD_FLAG_CONFIGURABLE, &idxd->flags)) {
- int priv = 0;
-
if (wq_pasid_enabled(wq)) {
if (is_idxd_wq_kernel(wq) || wq_shared(wq)) {
u32 pasid = wq_dedicated(wq) ? idxd->pasid : 0;
@@ -1340,10 +1324,6 @@ int drv_enable_wq(struct idxd_wq *wq)
__idxd_wq_set_pasid_locked(wq, pasid);
}
}
-
- if (is_idxd_wq_kernel(wq))
- priv = 1;
- __idxd_wq_set_priv_locked(wq, priv);
}
rc = 0;
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index e6ee267da0ff..fd4560c91296 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -506,14 +506,65 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d
static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- return -EOPNOTSUPP;
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+ union gencfg_reg gencfg;
+ ioasid_t pasid;
+ int ret;
+
+ /*
+ * Attach a global PASID to the DMA domain so that we can use ENQCMDS
+ * to submit work on buffers mapped by DMA API.
+ */
+ domain = iommu_get_domain_for_dev(dev);
+ if (!domain)
+ return -EPERM;
+
+ pasid = iommu_alloc_global_pasid(0, dev->iommu->max_pasids);
+ if (!pasid_valid(pasid))
+ return -ENOSPC;
+
+ /*
+ * DMA domain is owned by the driver, it should support all valid
+ * types such as DMA-FQ, identity, etc.
+ */
+ ret = iommu_attach_device_pasid(domain, dev, pasid);
+ if (ret) {
+ dev_err(dev, "failed to attach device pasid %d, domain type %d",
+ pasid, domain->type);
+ iommu_free_global_pasid(pasid);
+ return ret;
+ }
+
+ /* Since we set user privilege for kernel DMA, enable completion IRQ */
+ gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+ gencfg.user_int_en = 1;
+ iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
+ idxd->pasid = pasid;
+
+ return ret;
}
static void idxd_disable_system_pasid(struct idxd_device *idxd)
{
+ struct pci_dev *pdev = idxd->pdev;
+ struct device *dev = &pdev->dev;
+ struct iommu_domain *domain;
+ union gencfg_reg gencfg;
+
+ domain = iommu_get_domain_for_dev(dev);
+ if (!domain)
+ return;
+
+ iommu_detach_device_pasid(domain, dev, idxd->pasid);
+ iommu_free_global_pasid(idxd->pasid);
- iommu_sva_unbind_device(idxd->sva);
+ gencfg.bits = ioread32(idxd->reg_base + IDXD_GENCFG_OFFSET);
+ gencfg.user_int_en = 0;
+ iowrite32(gencfg.bits, idxd->reg_base + IDXD_GENCFG_OFFSET);
idxd->sva = NULL;
+ idxd->pasid = IOMMU_PASID_INVALID;
}
static int idxd_probe(struct idxd_device *idxd)
@@ -535,8 +586,9 @@ static int idxd_probe(struct idxd_device *idxd)
} else {
set_bit(IDXD_FLAG_USER_PASID_ENABLED, &idxd->flags);
- if (idxd_enable_system_pasid(idxd))
- dev_warn(dev, "No in-kernel DMA with PASID.\n");
+ rc = idxd_enable_system_pasid(idxd);
+ if (rc)
+ dev_warn(dev, "No in-kernel DMA with PASID. %d\n", rc);
else
set_bit(IDXD_FLAG_PASID_ENABLED, &idxd->flags);
}
diff --git a/drivers/dma/idxd/sysfs.c b/drivers/dma/idxd/sysfs.c
index 18cd8151dee0..c5561c00a503 100644
--- a/drivers/dma/idxd/sysfs.c
+++ b/drivers/dma/idxd/sysfs.c
@@ -944,13 +944,6 @@ static ssize_t wq_name_store(struct device *dev,
if (strlen(buf) > WQ_NAME_SIZE || strlen(buf) == 0)
return -EINVAL;
- /*
- * This is temporarily placed here until we have SVM support for
- * dmaengine.
- */
- if (wq->type == IDXD_WQT_KERNEL && device_pasid_enabled(wq->idxd))
- return -EOPNOTSUPP;
-
input = kstrndup(buf, count, GFP_KERNEL);
if (!input)
return -ENOMEM;
--
2.25.1
On VT-d platforms, RID_PASID is used for DMA requests without PASID. We
should not treat RID_PASID specially; instead, let it be allocated from
the global PASID number space, since Intel VT-d allows a non-zero
RID_PASID value.
For ARM, AMD and others that _always_ use 0 as RID_PASID, there is no
impact since the SVA PASID allocation base is 1.
With this change, devices that do both DMA with PASID and SVA need not
worry about conflicts when allocating PASIDs for in-kernel DMA.
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel/iommu.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f737ef55463..cbb2670f88ca 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)
intel_iommu_enabled = 1;
+	/* Reserve RID_PASID from the global namespace for legacy DMA */
+ WARN_ON(iommu_alloc_global_pasid(PASID_RID2PASID, PASID_RID2PASID) !=
+ PASID_RID2PASID);
+
return 0;
out_free_dmar:
--
2.25.1
On 4/8/23 2:05 AM, Jacob Pan wrote:
> Devices that use Intel ENQCMD to submit work must use global PASIDs in
> that the PASID are stored in a per CPU MSR. When such device need to
> submit work for in-kernel DMA with PASID, it must allocate PASIDs from
> the same global number space to avoid conflict.
>
> This patch moves global PASID allocation APIs from SVA to IOMMU APIs.
> It is expected that device drivers will use the allocated PASIDs to attach
> to appropriate IOMMU domains for use.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> v4: move dummy functions outside ifdef CONFIG_IOMMU_SVA (Baolu)
> ---
> drivers/iommu/iommu-sva.c | 10 ++++------
> drivers/iommu/iommu.c | 33 +++++++++++++++++++++++++++++++++
> include/linux/iommu.h | 11 +++++++++++
> 3 files changed, 48 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
> index c434b95dc8eb..222544587582 100644
> --- a/drivers/iommu/iommu-sva.c
> +++ b/drivers/iommu/iommu-sva.c
> @@ -9,15 +9,13 @@
> #include "iommu-sva.h"
>
> static DEFINE_MUTEX(iommu_sva_lock);
> -static DEFINE_IDA(iommu_global_pasid_ida);
>
> /* Allocate a PASID for the mm within range (inclusive) */
> static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
> {
> int ret = 0;
>
> - if (!pasid_valid(min) || !pasid_valid(max) ||
> - min == 0 || max < min)
> + if (!pasid_valid(min) || !pasid_valid(max) || max < min)
> return -EINVAL;
>
> mutex_lock(&iommu_sva_lock);
> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t ma
> goto out;
> }
>
> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
> - if (ret < min)
> + ret = iommu_alloc_global_pasid(min, max);
> + if (!pasid_valid(ret))
> goto out;
> mm->pasid = ret;
> ret = 0;
> @@ -211,5 +209,5 @@ void mm_pasid_drop(struct mm_struct *mm)
> if (likely(!pasid_valid(mm->pasid)))
> return;
>
> - ida_free(&iommu_global_pasid_ida, mm->pasid);
> + iommu_free_global_pasid(mm->pasid);
> }
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 10db680acaed..2a132ff7e3de 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -38,6 +38,7 @@
>
> static struct kset *iommu_group_kset;
> static DEFINE_IDA(iommu_group_ida);
> +static DEFINE_IDA(iommu_global_pasid_ida);
>
> static unsigned int iommu_def_domain_type __read_mostly;
> static bool iommu_dma_strict __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_DMA_STRICT);
> @@ -3450,3 +3451,35 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
>
> return domain;
> }
> +
> +/**
> + * iommu_alloc_global_pasid - Allocate a PASID from the global number space
> + *
> + * @min: starting range, inclusive
> + * @max: ending range, inclusive
> + *
> + * Return: the allocated PASID on success or IOMMU_PASID_INVALID on failure.
> + */
> +ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max)
> +{
> + int ret;
> +
> + if (!pasid_valid(min) || !pasid_valid(max) || max < min)
> + return IOMMU_PASID_INVALID;
> +
> + ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
> + if (ret < 0)
> + return IOMMU_PASID_INVALID;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(iommu_alloc_global_pasid);
> +
> +void iommu_free_global_pasid(ioasid_t pasid)
> +{
> + if (WARN_ON(!pasid_valid(pasid)))
> + return;
> +
> + ida_free(&iommu_global_pasid_ida, pasid);
> +}
> +EXPORT_SYMBOL_GPL(iommu_free_global_pasid);
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 54f535ff9868..c9720ddc81d2 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -723,6 +723,8 @@ void iommu_detach_device_pasid(struct iommu_domain *domain,
> struct iommu_domain *
> iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
> unsigned int type);
> +ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max);
> +void iommu_free_global_pasid(ioasid_t pasid);
> #else /* CONFIG_IOMMU_API */
>
> struct iommu_ops {};
> @@ -1089,6 +1091,13 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid,
> {
> return NULL;
> }
> +
> +static inline ioasid_t iommu_alloc_global_pasid(ioasid_t min, ioasid_t max)
> +{
> + return IOMMU_PASID_INVALID;
> +}
> +
> +static inline void iommu_free_global_pasid(ioasid_t pasid) {}
> #endif /* CONFIG_IOMMU_API */
>
> /**
> @@ -1187,6 +1196,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
> struct mm_struct *mm);
> void iommu_sva_unbind_device(struct iommu_sva *handle);
> u32 iommu_sva_get_pasid(struct iommu_sva *handle);
> +
Nit: irrelevant blank line
> #else
> static inline struct iommu_sva *
> iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> @@ -1202,6 +1212,7 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
> {
> return IOMMU_PASID_INVALID;
> }
> +
Ditto
> static inline void mm_pasid_init(struct mm_struct *mm) {}
> static inline void mm_pasid_drop(struct mm_struct *mm) {}
> #endif /* CONFIG_IOMMU_SVA */
Others look good to me.
Reviewed-by: Lu Baolu <[email protected]>
Best regards,
baolu
On 4/8/23 2:05 AM, Jacob Pan wrote:
> On VT-d platforms, RID_PASID is used for DMA requests without PASID. We
> should not treat RID_PASID as special; instead, let it be allocated from
> the global PASID number space. A non-zero value can then be used as
> RID_PASID on Intel VT-d.
>
> For ARM, AMD and others that _always_ use 0 as RID_PASID, there is no
> impact since the SVA PASID allocation base is 1.
>
> With this change, devices that do both DMA with PASID and SVA need not
> worry about conflicts when allocating PASIDs for in-kernel DMA.
>
> Signed-off-by: Jacob Pan<[email protected]>
> ---
> drivers/iommu/intel/iommu.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 9f737ef55463..cbb2670f88ca 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)
>
> intel_iommu_enabled = 1;
>
> + /* Reserve RID_PASID from the global namespace for legacy DMA */
> + WARN_ON(iommu_alloc_global_pasid(PASID_RID2PASID, PASID_RID2PASID) !=
> + PASID_RID2PASID);
How about moving the above line up a bit? For example, at least before
iommu_device_register(). That is the point from which device drivers
may want global PASIDs.
Best regards,
baolu
On 4/8/23 2:05 AM, Jacob Pan wrote:
> @@ -2429,10 +2475,11 @@ static int __init si_domain_init(int hw)
> return 0;
> }
>
> -static int dmar_domain_attach_device(struct dmar_domain *domain,
> - struct device *dev)
> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> + struct device *dev, ioasid_t pasid)
> {
> struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct device_pasid_info *dev_pasid;
> struct intel_iommu *iommu;
> unsigned long flags;
> u8 bus, devfn;
> @@ -2442,43 +2489,57 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
> if (!iommu)
> return -ENODEV;
>
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> ret = domain_attach_iommu(domain, iommu);
> if (ret)
> - return ret;
> + goto exit_free;
> +
> info->domain = domain;
> + dev_pasid->pasid = pasid;
> + dev_pasid->dev = dev;
> spin_lock_irqsave(&domain->lock, flags);
> - list_add(&info->link, &domain->devices);
> + if (!info->dev_attached)
> + list_add(&info->link, &domain->devices);
> +
> + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> spin_unlock_irqrestore(&domain->lock, flags);
>
> /* PASID table is mandatory for a PCI device in scalable mode. */
> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> /* Setup the PASID entry for requests without PASID: */
> if (hw_pass_through && domain_type_is_si(domain))
> - ret = intel_pasid_setup_pass_through(iommu, domain,
> - dev, PASID_RID2PASID);
> + ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
> else if (domain->use_first_level)
> - ret = domain_setup_first_level(iommu, domain, dev,
> - PASID_RID2PASID);
> + ret = domain_setup_first_level(iommu, domain, dev, pasid);
> else
> - ret = intel_pasid_setup_second_level(iommu, domain,
> - dev, PASID_RID2PASID);
> + ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> if (ret) {
> - dev_err(dev, "Setup RID2PASID failed\n");
> + dev_err(dev, "Setup PASID %u failed\n", pasid);
> device_block_translation(dev);
> - return ret;
> + goto exit_free;
> }
> }
> + /* device context already activated, we are done */
> + if (info->dev_attached)
> + goto exit;
>
> ret = domain_context_mapping(domain, dev);
> if (ret) {
> dev_err(dev, "Domain context map failed\n");
> device_block_translation(dev);
> - return ret;
> + goto exit_free;
> }
>
> iommu_enable_pci_caps(info);
> -
> + info->dev_attached = 1;
> +exit:
> return 0;
> +exit_free:
> + kfree(dev_pasid);
> + return ret;
> }
>
> static bool device_has_rmrr(struct device *dev)
> @@ -4029,8 +4090,7 @@ static void device_block_translation(struct device *dev)
> iommu_disable_pci_caps(info);
> if (!dev_is_real_dma_subdevice(dev)) {
> if (sm_supported(iommu))
> - intel_pasid_tear_down_entry(iommu, dev,
> - PASID_RID2PASID, false);
> + intel_iommu_detach_device_pasid(&info->domain->domain, dev, PASID_RID2PASID);
> else
> domain_context_clear(info);
> }
> @@ -4040,6 +4100,7 @@ static void device_block_translation(struct device *dev)
>
> spin_lock_irqsave(&info->domain->lock, flags);
> list_del(&info->link);
> + info->dev_attached = 0;
> spin_unlock_irqrestore(&info->domain->lock, flags);
>
> domain_detach_iommu(info->domain, iommu);
> @@ -4186,7 +4247,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
> if (ret)
> return ret;
>
> - return dmar_domain_attach_device(to_dmar_domain(domain), dev);
> + return dmar_domain_attach_device_pasid(to_dmar_domain(domain), dev, PASID_RID2PASID);
> }
For VT-d driver, attach_dev and attach_dev_pasid have different
meanings. Merging them into one helper may lead to confusion. What do
you think of the following code? The dmar_domain_attach_device_pasid()
helper could be reused for attach_dev_pasid path.
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7c2f4bd33582..09ae62bc3724 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2434,6 +2434,40 @@ static int __init si_domain_init(int hw)
return 0;
}
+
+static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
+ struct intel_iommu *iommu,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_pasid_info *dev_pasid;
+ unsigned long flags;
+ int ret;
+
+ dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
+ if (!dev_pasid)
+ return -ENOMEM;
+
+ if (hw_pass_through && domain_type_is_si(domain))
+ ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
+ else if (domain->use_first_level)
+ ret = domain_setup_first_level(iommu, domain, dev, pasid);
+ else
+ ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
+
+ if (ret) {
+ kfree(dev_pasid);
+ return ret;
+ }
+
+ dev_pasid->pasid = pasid;
+ dev_pasid->dev = dev;
+ spin_lock_irqsave(&domain->lock, flags);
+ list_add(&dev_pasid->link_domain, &domain->dev_pasids);
+ spin_unlock_irqrestore(&domain->lock, flags);
+
+ return 0;
+}
+
static int dmar_domain_attach_device(struct dmar_domain *domain,
struct device *dev)
{
@@ -2458,15 +2492,8 @@ static int dmar_domain_attach_device(struct
dmar_domain *domain,
/* PASID table is mandatory for a PCI device in scalable mode. */
if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
/* Setup the PASID entry for requests without PASID: */
- if (hw_pass_through && domain_type_is_si(domain))
- ret = intel_pasid_setup_pass_through(iommu, domain,
- dev, PASID_RID2PASID);
- else if (domain->use_first_level)
- ret = domain_setup_first_level(iommu, domain, dev,
- PASID_RID2PASID);
- else
- ret = intel_pasid_setup_second_level(iommu, domain,
- dev, PASID_RID2PASID);
+ ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
+ PASID_RID2PASID);
if (ret) {
dev_err(dev, "Setup RID2PASID failed\n");
device_block_translation(dev);
>
> static int intel_iommu_map(struct iommu_domain *domain,
> @@ -4675,26 +4736,52 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
> __mapping_notify_one(info->iommu, dmar_domain, pfn, pages);
> }
>
> -static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> +static void intel_iommu_detach_device_pasid(struct iommu_domain *domain,
> + struct device *dev, ioasid_t pasid)
> {
> - struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> - struct iommu_domain *domain;
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_pasid_info *i, *dev_pasid = NULL;
> + struct intel_iommu *iommu = info->iommu;
> + unsigned long flags;
>
> - /* Domain type specific cleanup: */
> - domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> - if (domain) {
> - switch (domain->type) {
> - case IOMMU_DOMAIN_SVA:
> - intel_svm_remove_dev_pasid(dev, pasid);
> - break;
> - default:
> - /* should never reach here */
> - WARN_ON(1);
> + spin_lock_irqsave(&dmar_domain->lock, flags);
> + list_for_each_entry(i, &dmar_domain->dev_pasids, link_domain) {
> + if (i->dev == dev && i->pasid == pasid) {
> + list_del(&i->link_domain);
> + dev_pasid = i;
> break;
> }
> }
> + spin_unlock_irqrestore(&dmar_domain->lock, flags);
> + if (WARN_ON(!dev_pasid))
> + return;
> +
> + /* PASID entry already cleared during SVA unbind */
> + if (domain->type != IOMMU_DOMAIN_SVA)
> + intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> +
> + kfree(dev_pasid);
> +}
> +
> +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> + struct device_domain_info *info = dev_iommu_priv_get(dev);
> + struct dmar_domain *dmar_domain;
> + struct iommu_domain *domain;
> +
> + domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> + dmar_domain = to_dmar_domain(domain);
> +
> + /*
> + * SVA domain type specific cleanup: not ideal, but until we have
> + * IOPF-capable domain-specific ops, we need this special case.
> + */
> + if (domain->type == IOMMU_DOMAIN_SVA)
> + return intel_svm_remove_dev_pasid(dev, pasid);
>
> - intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> + intel_iommu_detach_device_pasid(domain, dev, pasid);
> + domain_detach_iommu(dmar_domain, info->iommu);
> }
The remove_dev_pasid path needs to change only after the attach_dev_pasid
op is added, right? If so, we should move this change into the next patch.
>
> const struct iommu_ops intel_iommu_ops = {
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 65b15be72878..b6c26f25d1ba 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -595,6 +595,7 @@ struct dmar_domain {
>
> spinlock_t lock; /* Protect device tracking lists */
> struct list_head devices; /* all devices' list */
> + struct list_head dev_pasids; /* all attached pasids */
>
> struct dma_pte *pgd; /* virtual address */
> int gaw; /* max guest address width */
> @@ -708,6 +709,7 @@ struct device_domain_info {
> u8 ats_supported:1;
> u8 ats_enabled:1;
> u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
> + u8 dev_attached:1; /* Device context activated */
> u8 ats_qdep;
> struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> struct intel_iommu *iommu; /* IOMMU used by this device */
> @@ -715,6 +717,12 @@ struct device_domain_info {
> struct pasid_table *pasid_table; /* pasid table */
> };
>
> +struct device_pasid_info {
> + struct list_head link_domain; /* link to domain siblings */
> + struct device *dev; /* physical device derived from */
> + ioasid_t pasid; /* PASID on physical device */
> +};
> +
> static inline void __iommu_flush_cache(
> struct intel_iommu *iommu, void *addr, int size)
> {
Best regards,
baolu
On 4/10/23 10:46 AM, Baolu Lu wrote:
>> @@ -4040,6 +4100,7 @@ static void device_block_translation(struct
>> device *dev)
>> spin_lock_irqsave(&info->domain->lock, flags);
>> list_del(&info->link);
>> + info->dev_attached = 0;
>> spin_unlock_irqrestore(&info->domain->lock, flags);
>> domain_detach_iommu(info->domain, iommu);
>> @@ -4186,7 +4247,7 @@ static int intel_iommu_attach_device(struct
>> iommu_domain *domain,
>> if (ret)
>> return ret;
>> - return dmar_domain_attach_device(to_dmar_domain(domain), dev);
>> + return dmar_domain_attach_device_pasid(to_dmar_domain(domain),
>> dev, PASID_RID2PASID);
>> }
>
> For VT-d driver, attach_dev and attach_dev_pasid have different
> meanings. Merging them into one helper may lead to confusion. What do
> you think of the following code? The dmar_domain_attach_device_pasid()
> helper could be reused for attach_dev_pasid path.
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 7c2f4bd33582..09ae62bc3724 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2434,6 +2434,40 @@ static int __init si_domain_init(int hw)
> return 0;
> }
>
> +
> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> + struct intel_iommu *iommu,
> + struct device *dev, ioasid_t pasid)
> +{
> + struct device_pasid_info *dev_pasid;
> + unsigned long flags;
> + int ret;
> +
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> + if (hw_pass_through && domain_type_is_si(domain))
> + ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
> + else if (domain->use_first_level)
> + ret = domain_setup_first_level(iommu, domain, dev, pasid);
> + else
> + ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
> +
> + if (ret) {
> + kfree(dev_pasid);
> + return ret;
> + }
> +
> + dev_pasid->pasid = pasid;
> + dev_pasid->dev = dev;
> + spin_lock_irqsave(&domain->lock, flags);
> + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> + return 0;
> +}
> +
> static int dmar_domain_attach_device(struct dmar_domain *domain,
> struct device *dev)
> {
> @@ -2458,15 +2492,8 @@ static int dmar_domain_attach_device(struct
> dmar_domain *domain,
> /* PASID table is mandatory for a PCI device in scalable mode. */
> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> /* Setup the PASID entry for requests without PASID: */
> - if (hw_pass_through && domain_type_is_si(domain))
> - ret = intel_pasid_setup_pass_through(iommu, domain,
> - dev, PASID_RID2PASID);
> - else if (domain->use_first_level)
> - ret = domain_setup_first_level(iommu, domain, dev,
> - PASID_RID2PASID);
> - else
> - ret = intel_pasid_setup_second_level(iommu, domain,
> - dev, PASID_RID2PASID);
> + ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
> + PASID_RID2PASID);
> if (ret) {
> dev_err(dev, "Setup RID2PASID failed\n");
> device_block_translation(dev);
Sorry! I forgot one thing. The dev_pasid data allocated in the attach_dev
path should be freed in device_block_translation(). Perhaps we need to
add the change below?
@@ -4107,6 +4134,7 @@ static void device_block_translation(struct device
*dev)
{
struct device_domain_info *info = dev_iommu_priv_get(dev);
struct intel_iommu *iommu = info->iommu;
+ struct device_pasid_info *dev_pasid;
unsigned long flags;
iommu_disable_pci_caps(info);
@@ -4118,6 +4146,16 @@ static void device_block_translation(struct
device *dev)
domain_context_clear(info);
}
+ spin_lock_irqsave(&info->domain->lock, flags);
+ list_for_each_entry(dev_pasid, &info->domain->dev_pasids, link_domain) {
+ if (dev_pasid->dev != dev || dev_pasid->pasid != PASID_RID2PASID)
+ continue;
+
+ list_del(&dev_pasid->link_domain);
+ kfree(dev_pasid);
+ break;
+ }
+ spin_unlock_irqrestore(&info->domain->lock, flags);
+
if (!info->domain)
return;
Best regards,
baolu
> From: Jacob Pan <[email protected]>
> Sent: Saturday, April 8, 2023 2:06 AM
> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct
> *mm, ioasid_t min, ioasid_t ma
> goto out;
> }
>
> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> GFP_KERNEL);
> - if (ret < min)
> + ret = iommu_alloc_global_pasid(min, max);
I wonder whether this can take a device pointer so dev->iommu->max_pasids
is enforced inside the alloc function.
And do we even need the min/max parameters? With the special PASIDs
reserved, what a driver needs is just to get a free PASID from the global
space within the dev->iommu->max_pasids constraint...
iommu_sva_alloc_pasid() can be reworked to avoid min/max by taking a
device pointer too.
On 4/11/23 4:02 PM, Tian, Kevin wrote:
>> From: Jacob Pan <[email protected]>
>> Sent: Saturday, April 8, 2023 2:06 AM
>> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct
>> *mm, ioasid_t min, ioasid_t ma
>> goto out;
>> }
>>
>> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
>> GFP_KERNEL);
>> - if (ret < min)
>> + ret = iommu_alloc_global_pasid(min, max);
>
> I wonder whether this can take a device pointer so dev->iommu->max_pasids
> is enforced inside the alloc function.
Agreed. Instead of open-coding it, it looks better to have a helper
like dev_iommu_max_pasids().
>
> and do we even need the min/max parameters? With special pasids reserved
> then what driver needs is just to get a free pasid from the global space within
> dev->iommu->max_pasids constraint...
>
> iommu_sva_alloc_pasid() can be reworked to avoid min/max by taking a
> device pointer too.
Best regards,
baolu
Hi Kevin,
On Tue, 11 Apr 2023 08:02:55 +0000, "Tian, Kevin" <[email protected]>
wrote:
> > From: Jacob Pan <[email protected]>
> > Sent: Saturday, April 8, 2023 2:06 AM
> > @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct
> > *mm, ioasid_t min, ioasid_t ma
> > goto out;
> > }
> >
> > - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> > GFP_KERNEL);
> > - if (ret < min)
> > + ret = iommu_alloc_global_pasid(min, max);
>
> I wonder whether this can take a device pointer so dev->iommu->max_pasids
> is enforced inside the alloc function.
>
> and do we even need the min/max parameters? With special pasids reserved
> then what driver needs is just to get a free pasid from the global space
> within dev->iommu->max_pasids constraint...
>
> iommu_sva_alloc_pasid() can be reworked to avoid min/max by taking a
> device pointer too.
I think that will work too, albeit a philosophical change. It probably
should be called iommu_alloc_dev_global_pasid(dev).
But I feel the current approach is more flexible in that device drivers
can control the range if, for some reason, they do not want to go up to
max_pasids.
Thanks,
Jacob
Hi Baolu,
On Mon, 10 Apr 2023 09:59:45 +0800, Baolu Lu <[email protected]>
wrote:
> On 4/8/23 2:05 AM, Jacob Pan wrote:
> > On VT-d platforms, RID_PASID is used for DMA requests without PASID. We
> > should not treat RID_PASID as special; instead, let it be allocated from
> > the global PASID number space. A non-zero value can then be used as
> > RID_PASID on Intel VT-d.
> >
> > For ARM, AMD and others that _always_ use 0 as RID_PASID, there is no
> > impact since the SVA PASID allocation base is 1.
> >
> > With this change, devices that do both DMA with PASID and SVA need not
> > worry about conflicts when allocating PASIDs for in-kernel DMA.
> >
> > Signed-off-by: Jacob Pan<[email protected]>
> > ---
> > drivers/iommu/intel/iommu.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 9f737ef55463..cbb2670f88ca 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -3956,6 +3956,10 @@ int __init intel_iommu_init(void)
> >
> > intel_iommu_enabled = 1;
> >
> > + /* Reserve RID_PASID from the global namespace for legacy DMA
> > */
> > + WARN_ON(iommu_alloc_global_pasid(PASID_RID2PASID,
> > PASID_RID2PASID) !=
> > + PASID_RID2PASID);
>
> How about moving above line up a bit? For example, at least before
> iommu_device_register(). This is the starting point where device drivers
> may want global PASIDs.
>
makes sense will do.
Thanks,
Jacob
Hi Baolu,
On Wed, 12 Apr 2023 09:37:48 +0800, Baolu Lu <[email protected]>
wrote:
> On 4/11/23 4:02 PM, Tian, Kevin wrote:
> >> From: Jacob Pan <[email protected]>
> >> Sent: Saturday, April 8, 2023 2:06 AM
> >> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct
> >> *mm, ioasid_t min, ioasid_t ma
> >> goto out;
> >> }
> >>
> >> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
> >> GFP_KERNEL);
> >> - if (ret < min)
> >> + ret = iommu_alloc_global_pasid(min, max);
> >
> > I wonder whether this can take a device pointer so
> > dev->iommu->max_pasids is enforced inside the alloc function.
>
> Agreed. Instead of using the open code, it looks better to have a helper
> like dev_iommu_max_pasids().
Yes, probably export dev_iommu_get_max_pasids(dev)?
But if I understood Kevin correctly, he's also suggesting that the
interface should be changed to iommu_alloc_global_pasid(dev); my concern is
how we would use this function to reserve RID_PASID, which is not specific
to a device.
>
> >
> > and do we even need the min/max parameters? With special pasids reserved
> > then what driver needs is just to get a free pasid from the global
> > space within dev->iommu->max_pasids constraint...
> >
> > iommu_sva_alloc_pasid() can be reworked to avoid min/max by taking a
> > device pointer too.
>
> Best regards,
> baolu
Thanks,
Jacob
On 4/18/23 12:46 AM, Jacob Pan wrote:
> On Wed, 12 Apr 2023 09:37:48 +0800, Baolu Lu<[email protected]>
> wrote:
>
>> On 4/11/23 4:02 PM, Tian, Kevin wrote:
>>>> From: Jacob Pan<[email protected]>
>>>> Sent: Saturday, April 8, 2023 2:06 AM
>>>> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct
>>>> *mm, ioasid_t min, ioasid_t ma
>>>> goto out;
>>>> }
>>>>
>>>> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max,
>>>> GFP_KERNEL);
>>>> - if (ret < min)
>>>> + ret = iommu_alloc_global_pasid(min, max);
>>> I wonder whether this can take a device pointer so
>>> dev->iommu->max_pasids is enforced inside the alloc function.
>> Agreed. Instead of using the open code, it looks better to have a helper
>> like dev_iommu_max_pasids().
> yes, probably export dev_iommu_get_max_pasids(dev)?
>
> But if I understood Kevin correctly, he's also suggesting that the
> interface should be changed to iommu_alloc_global_pasid(dev), my concern is
> that how do we use this function to reserve RID_PASID which is not specific
> to a device?
Probably we can introduce a counterpart dev->iommu->min_pasids, so that
there's no need to reserve the RID_PASID. At present, we can set it to 1
in the core as ARM/AMD/Intel all treat PASID 0 as a special pasid.
In the future, if VT-d supports using arbitrary number as RID_PASID for
any specific device, we can call iommu_alloc_global_pasid() for that
device.
The device drivers don't know, and don't need to know, the range of viable
PASIDs, so the @min, @max parameters seem unnecessary.
Best regards,
baolu
Hi Baolu,
On Mon, 10 Apr 2023 10:46:02 +0800, Baolu Lu <[email protected]>
wrote:
> On 4/8/23 2:05 AM, Jacob Pan wrote:
> > @@ -2429,10 +2475,11 @@ static int __init si_domain_init(int hw)
> > return 0;
> > }
> >
> > -static int dmar_domain_attach_device(struct dmar_domain *domain,
> > - struct device *dev)
> > +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> > + struct device *dev, ioasid_t
> > pasid) {
> > struct device_domain_info *info = dev_iommu_priv_get(dev);
> > + struct device_pasid_info *dev_pasid;
> > struct intel_iommu *iommu;
> > unsigned long flags;
> > u8 bus, devfn;
> > @@ -2442,43 +2489,57 @@ static int dmar_domain_attach_device(struct
> > dmar_domain *domain, if (!iommu)
> > return -ENODEV;
> >
> > + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> > + if (!dev_pasid)
> > + return -ENOMEM;
> > +
> > ret = domain_attach_iommu(domain, iommu);
> > if (ret)
> > - return ret;
> > + goto exit_free;
> > +
> > info->domain = domain;
> > + dev_pasid->pasid = pasid;
> > + dev_pasid->dev = dev;
> > spin_lock_irqsave(&domain->lock, flags);
> > - list_add(&info->link, &domain->devices);
> > + if (!info->dev_attached)
> > + list_add(&info->link, &domain->devices);
> > +
> > + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> > spin_unlock_irqrestore(&domain->lock, flags);
> >
> > /* PASID table is mandatory for a PCI device in scalable
> > mode. */ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> > /* Setup the PASID entry for requests without PASID:
> > */ if (hw_pass_through && domain_type_is_si(domain))
> > - ret = intel_pasid_setup_pass_through(iommu,
> > domain,
> > - dev, PASID_RID2PASID);
> > + ret = intel_pasid_setup_pass_through(iommu,
> > domain, dev, pasid); else if (domain->use_first_level)
> > - ret = domain_setup_first_level(iommu, domain,
> > dev,
> > - PASID_RID2PASID);
> > + ret = domain_setup_first_level(iommu, domain,
> > dev, pasid); else
> > - ret = intel_pasid_setup_second_level(iommu,
> > domain,
> > - dev, PASID_RID2PASID);
> > + ret = intel_pasid_setup_second_level(iommu,
> > domain, dev, pasid); if (ret) {
> > - dev_err(dev, "Setup RID2PASID failed\n");
> > + dev_err(dev, "Setup PASID %u failed\n", pasid);
> > device_block_translation(dev);
> > - return ret;
> > + goto exit_free;
> > }
> > }
> > + /* device context already activated, we are done */
> > + if (info->dev_attached)
> > + goto exit;
> >
> > ret = domain_context_mapping(domain, dev);
> > if (ret) {
> > dev_err(dev, "Domain context map failed\n");
> > device_block_translation(dev);
> > - return ret;
> > + goto exit_free;
> > }
> >
> > iommu_enable_pci_caps(info);
> > -
> > + info->dev_attached = 1;
> > +exit:
> > return 0;
> > +exit_free:
> > + kfree(dev_pasid);
> > + return ret;
> > }
> >
> > static bool device_has_rmrr(struct device *dev)
> > @@ -4029,8 +4090,7 @@ static void device_block_translation(struct
> > device *dev) iommu_disable_pci_caps(info);
> > if (!dev_is_real_dma_subdevice(dev)) {
> > if (sm_supported(iommu))
> > - intel_pasid_tear_down_entry(iommu, dev,
> > - PASID_RID2PASID,
> > false);
> > +
> > intel_iommu_detach_device_pasid(&info->domain->domain, dev,
> > PASID_RID2PASID); else domain_context_clear(info);
> > }
> > @@ -4040,6 +4100,7 @@ static void device_block_translation(struct
> > device *dev)
> > spin_lock_irqsave(&info->domain->lock, flags);
> > list_del(&info->link);
> > + info->dev_attached = 0;
> > spin_unlock_irqrestore(&info->domain->lock, flags);
> >
> > domain_detach_iommu(info->domain, iommu);
> > @@ -4186,7 +4247,7 @@ static int intel_iommu_attach_device(struct
> > iommu_domain *domain, if (ret)
> > return ret;
> >
> > - return dmar_domain_attach_device(to_dmar_domain(domain), dev);
> > + return dmar_domain_attach_device_pasid(to_dmar_domain(domain),
> > dev, PASID_RID2PASID); }
>
> For VT-d driver, attach_dev and attach_dev_pasid have different
> meanings. Merging them into one helper may lead to confusion. What do
> you think of the following code? The dmar_domain_attach_device_pasid()
> helper could be reused for attach_dev_pasid path.
Per our previous discussion
https://lore.kernel.org/lkml/[email protected]/
we wanted to remove the ordering dependency between attaching a device and
a device PASID, i.e. making the two equal at the IOMMU API level.
So from that perspective, attach_dev_pasid will include attach_dev if the
device has not been attached. That is:
  attach_dev sets up the device context and RID_PASID;
  attach_dev_pasid also sets up the device context, plus another PASID.
There is no ordering requirement.
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 7c2f4bd33582..09ae62bc3724 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -2434,6 +2434,40 @@ static int __init si_domain_init(int hw)
> return 0;
> }
>
> +
> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
> + struct intel_iommu *iommu,
> + struct device *dev, ioasid_t
> pasid) +{
> + struct device_pasid_info *dev_pasid;
> + unsigned long flags;
> + int ret;
> +
> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
> + if (!dev_pasid)
> + return -ENOMEM;
> +
> + if (hw_pass_through && domain_type_is_si(domain))
> + ret = intel_pasid_setup_pass_through(iommu, domain, dev,
> pasid);
> + else if (domain->use_first_level)
> + ret = domain_setup_first_level(iommu, domain, dev,
> pasid);
> + else
> + ret = intel_pasid_setup_second_level(iommu, domain, dev,
> pasid); +
> + if (ret) {
> + kfree(dev_pasid);
> + return ret;
> + }
> +
> + dev_pasid->pasid = pasid;
> + dev_pasid->dev = dev;
> + spin_lock_irqsave(&domain->lock, flags);
> + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
> + spin_unlock_irqrestore(&domain->lock, flags);
> +
> + return 0;
> +}
> +
> static int dmar_domain_attach_device(struct dmar_domain *domain,
> struct device *dev)
> {
> @@ -2458,15 +2492,8 @@ static int dmar_domain_attach_device(struct
> dmar_domain *domain,
> /* PASID table is mandatory for a PCI device in scalable mode.
> */ if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
> /* Setup the PASID entry for requests without PASID: */
> - if (hw_pass_through && domain_type_is_si(domain))
> - ret = intel_pasid_setup_pass_through(iommu,
> domain,
> - dev, PASID_RID2PASID);
> - else if (domain->use_first_level)
> - ret = domain_setup_first_level(iommu, domain,
> dev,
> - PASID_RID2PASID);
> - else
> - ret = intel_pasid_setup_second_level(iommu,
> domain,
> - dev, PASID_RID2PASID);
> + ret = dmar_domain_attach_device_pasid(domain, iommu, dev,
> + PASID_RID2PASID);
> if (ret) {
> dev_err(dev, "Setup RID2PASID failed\n");
> device_block_translation(dev);
>
> >
> > static int intel_iommu_map(struct iommu_domain *domain,
> > @@ -4675,26 +4736,52 @@ static void intel_iommu_iotlb_sync_map(struct
> > iommu_domain *domain, __mapping_notify_one(info->iommu, dmar_domain,
> > pfn, pages); }
> >
> > -static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t
> > pasid) +static void intel_iommu_detach_device_pasid(struct iommu_domain
> > *domain,
> > + struct device *dev,
> > ioasid_t pasid) {
> > - struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
> > - struct iommu_domain *domain;
> > + struct device_domain_info *info = dev_iommu_priv_get(dev);
> > + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > + struct device_pasid_info *i, *dev_pasid = NULL;
> > + struct intel_iommu *iommu = info->iommu;
> > + unsigned long flags;
> >
> > - /* Domain type specific cleanup: */
> > - domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> > - if (domain) {
> > - switch (domain->type) {
> > - case IOMMU_DOMAIN_SVA:
> > - intel_svm_remove_dev_pasid(dev, pasid);
> > - break;
> > - default:
> > - /* should never reach here */
> > - WARN_ON(1);
> > + spin_lock_irqsave(&dmar_domain->lock, flags);
> > + list_for_each_entry(i, &dmar_domain->dev_pasids, link_domain) {
> > + if (i->dev == dev && i->pasid == pasid) {
> > + list_del(&i->link_domain);
> > + dev_pasid = i;
> > break;
> > }
> > }
> > + spin_unlock_irqrestore(&dmar_domain->lock, flags);
> > + if (WARN_ON(!dev_pasid))
> > + return;
> > +
> > + /* PASID entry already cleared during SVA unbind */
> > + if (domain->type != IOMMU_DOMAIN_SVA)
> > + intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> > +
> > + kfree(dev_pasid);
> > +}
> > +
> > +static void intel_iommu_remove_dev_pasid(struct device *dev, ioasid_t
> > pasid) +{
> > + struct device_domain_info *info = dev_iommu_priv_get(dev);
> > + struct dmar_domain *dmar_domain;
> > + struct iommu_domain *domain;
> > +
> > + domain = iommu_get_domain_for_dev_pasid(dev, pasid, 0);
> > + dmar_domain = to_dmar_domain(domain);
> > +
> > + /*
> > + * SVA Domain type specific cleanup: Not ideal but not until
> > we have
> > + * IOPF capable domain specific ops, we need this special case.
> > + */
> > + if (domain->type == IOMMU_DOMAIN_SVA)
> > + return intel_svm_remove_dev_pasid(dev, pasid);
> >
> > - intel_pasid_tear_down_entry(iommu, dev, pasid, false);
> > + intel_iommu_detach_device_pasid(domain, dev, pasid);
> > + domain_detach_iommu(dmar_domain, info->iommu);
> > }
>
> The remove_dev_pasid path need to change only after attach_dev_pasid op
> is added, right? If so, we should move such change into the next patch.
yes, you are right, will do.
> >
> > const struct iommu_ops intel_iommu_ops = {
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index 65b15be72878..b6c26f25d1ba 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -595,6 +595,7 @@ struct dmar_domain {
> >
> > spinlock_t lock; /* Protect device tracking lists */
> > struct list_head devices; /* all devices' list */
> > + struct list_head dev_pasids; /* all attached pasids */
> >
> > struct dma_pte *pgd; /* virtual address */
> > int gaw; /* max guest address width */
> > @@ -708,6 +709,7 @@ struct device_domain_info {
> > u8 ats_supported:1;
> > u8 ats_enabled:1;
> > u8 dtlb_extra_inval:1; /* Quirk for devices need extra flush */
> > + u8 dev_attached:1; /* Device context activated */
> > u8 ats_qdep;
> > struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> > struct intel_iommu *iommu; /* IOMMU used by this device */
> > @@ -715,6 +717,12 @@ struct device_domain_info {
> > struct pasid_table *pasid_table; /* pasid table */
> > };
> >
> > +struct device_pasid_info {
> > + struct list_head link_domain; /* link to domain siblings */
> > + struct device *dev; /* physical device derived from */
> > + ioasid_t pasid; /* PASID on physical device */
> > +};
> > +
> > static inline void __iommu_flush_cache(
> > struct intel_iommu *iommu, void *addr, int size)
> > {
>
> Best regards,
> baolu
Thanks,
Jacob
Hi Baolu,
On Tue, 18 Apr 2023 10:06:12 +0800, Baolu Lu <[email protected]>
wrote:
> On 4/18/23 12:46 AM, Jacob Pan wrote:
> > On Wed, 12 Apr 2023 09:37:48 +0800, Baolu Lu<[email protected]>
> > wrote:
> >
> >> On 4/11/23 4:02 PM, Tian, Kevin wrote:
> >>>> From: Jacob Pan<[email protected]>
> >>>> Sent: Saturday, April 8, 2023 2:06 AM
> >>>> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t ma
> >>>> goto out;
> >>>> }
> >>>>
> >>>> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
> >>>> - if (ret < min)
> >>>> + ret = iommu_alloc_global_pasid(min, max);
> >>> I wonder whether this can take a device pointer so
> >>> dev->iommu->max_pasids is enforced inside the alloc function.
> >> Agreed. Instead of using the open code, it looks better to have a
> >> helper like dev_iommu_max_pasids().
> > yes, probably export dev_iommu_get_max_pasids(dev)?
> >
> > But if I understood Kevin correctly, he's also suggesting that the
> > interface should be changed to iommu_alloc_global_pasid(dev), my
> > concern is that how do we use this function to reserve RID_PASID which
> > is not specific to a device?
>
> Probably we can introduce a counterpart dev->iommu->min_pasids, so that
> there's no need to reserve the RID_PASID. At present, we can set it to 1
> in the core as ARM/AMD/Intel all treat PASID 0 as a special pasid.
>
> In the future, if VT-d supports using arbitrary number as RID_PASID for
> any specific device, we can call iommu_alloc_global_pasid() for that
> device.
>
> The device drivers don't know and don't need to know the range of viable
> PASIDs, so the @min, @max parameters seem to be unreasonable.
Sure, that is reasonable. Another question is whether global PASID
allocation is always for a single device; if not, I'd prefer to keep the
current iommu_alloc_global_pasid() and add a wrapper
iommu_alloc_global_pasid_dev(dev) to extract the @min, @max. OK?
Thanks,
Jacob
On 4/19/23 5:32 AM, Jacob Pan wrote:
> On Mon, 10 Apr 2023 10:46:02 +0800, Baolu Lu<[email protected]>
> wrote:
>
>> On 4/8/23 2:05 AM, Jacob Pan wrote:
>>> @@ -2429,10 +2475,11 @@ static int __init si_domain_init(int hw)
>>> return 0;
>>> }
>>>
>>> -static int dmar_domain_attach_device(struct dmar_domain *domain,
>>> - struct device *dev)
>>> +static int dmar_domain_attach_device_pasid(struct dmar_domain *domain,
>>> + struct device *dev, ioasid_t pasid)
>>> {
>>> struct device_domain_info *info = dev_iommu_priv_get(dev);
>>> + struct device_pasid_info *dev_pasid;
>>> struct intel_iommu *iommu;
>>> unsigned long flags;
>>> u8 bus, devfn;
>>> @@ -2442,43 +2489,57 @@ static int dmar_domain_attach_device(struct dmar_domain *domain,
>>> if (!iommu)
>>> return -ENODEV;
>>>
>>> + dev_pasid = kzalloc(sizeof(*dev_pasid), GFP_KERNEL);
>>> + if (!dev_pasid)
>>> + return -ENOMEM;
>>> +
>>> ret = domain_attach_iommu(domain, iommu);
>>> if (ret)
>>> - return ret;
>>> + goto exit_free;
>>> +
>>> info->domain = domain;
>>> + dev_pasid->pasid = pasid;
>>> + dev_pasid->dev = dev;
>>> spin_lock_irqsave(&domain->lock, flags);
>>> - list_add(&info->link, &domain->devices);
>>> + if (!info->dev_attached)
>>> + list_add(&info->link, &domain->devices);
>>> +
>>> + list_add(&dev_pasid->link_domain, &domain->dev_pasids);
>>> spin_unlock_irqrestore(&domain->lock, flags);
>>>
>>> /* PASID table is mandatory for a PCI device in scalable mode. */
>>> if (sm_supported(iommu) && !dev_is_real_dma_subdevice(dev)) {
>>> /* Setup the PASID entry for requests without PASID: */
>>> if (hw_pass_through && domain_type_is_si(domain))
>>> - ret = intel_pasid_setup_pass_through(iommu, domain,
>>> - dev, PASID_RID2PASID);
>>> + ret = intel_pasid_setup_pass_through(iommu, domain, dev, pasid);
>>> else if (domain->use_first_level)
>>> - ret = domain_setup_first_level(iommu, domain, dev,
>>> - PASID_RID2PASID);
>>> + ret = domain_setup_first_level(iommu, domain, dev, pasid);
>>> else
>>> - ret = intel_pasid_setup_second_level(iommu, domain,
>>> - dev, PASID_RID2PASID);
>>> + ret = intel_pasid_setup_second_level(iommu, domain, dev, pasid);
>>> if (ret) {
>>> - dev_err(dev, "Setup RID2PASID failed\n");
>>> + dev_err(dev, "Setup PASID %d failed\n", pasid);
>>> device_block_translation(dev);
>>> - return ret;
>>> + goto exit_free;
>>> }
>>> }
>>> + /* device context already activated, we are done */
>>> + if (info->dev_attached)
>>> + goto exit;
>>>
>>> ret = domain_context_mapping(domain, dev);
>>> if (ret) {
>>> dev_err(dev, "Domain context map failed\n");
>>> device_block_translation(dev);
>>> - return ret;
>>> + goto exit_free;
>>> }
>>>
>>> iommu_enable_pci_caps(info);
>>> -
>>> + info->dev_attached = 1;
>>> +exit:
>>> return 0;
>>> +exit_free:
>>> + kfree(dev_pasid);
>>> + return ret;
>>> }
>>>
>>> static bool device_has_rmrr(struct device *dev)
>>> @@ -4029,8 +4090,7 @@ static void device_block_translation(struct device *dev)
>>> iommu_disable_pci_caps(info);
>>> if (!dev_is_real_dma_subdevice(dev)) {
>>> if (sm_supported(iommu))
>>> - intel_pasid_tear_down_entry(iommu, dev,
>>> - PASID_RID2PASID, false);
>>> + intel_iommu_detach_device_pasid(&info->domain->domain, dev, PASID_RID2PASID);
>>> else
>>> domain_context_clear(info);
>>> }
>>> @@ -4040,6 +4100,7 @@ static void device_block_translation(struct
>>> device *dev)
>>> spin_lock_irqsave(&info->domain->lock, flags);
>>> list_del(&info->link);
>>> + info->dev_attached = 0;
>>> spin_unlock_irqrestore(&info->domain->lock, flags);
>>>
>>> domain_detach_iommu(info->domain, iommu);
>>> @@ -4186,7 +4247,7 @@ static int intel_iommu_attach_device(struct iommu_domain *domain,
>>> if (ret)
>>> return ret;
>>>
>>> - return dmar_domain_attach_device(to_dmar_domain(domain), dev);
>>> + return dmar_domain_attach_device_pasid(to_dmar_domain(domain), dev, PASID_RID2PASID);
>>> }
>> For VT-d driver, attach_dev and attach_dev_pasid have different
>> meanings. Merging them into one helper may lead to confusion. What do
>> you think of the following code? The dmar_domain_attach_device_pasid()
>> helper could be reused for attach_dev_pasid path.
> Per our previous discussion
> https://lore.kernel.org/lkml/[email protected]/
> We wanted to remove the ordering dependency between attaching device and
> device_pasid. i.e. making the two equal at IOMMU API level.
Yes. That still holds.
>
> So from that perspective, attach_dev_pasid will include attach_dev if the
> device has not been attached. i.e.
I don't follow here. attach_dev and attach_dev_pasid are independent of
each other. So in any case, attach_dev_pasid shouldn't include
attach_dev.
> attach_dev includes set up device context and RID_PASID
> attach_dev_pasid also include set up device context and another PASID.
I guess you are worrying about the case where the context entry and
pasid table are not yet set up in the attach_dev_pasid path? In theory
yes, but it doesn't exist in reality. The best case is that we set up
the context entry in the probe_device path, but at present, perhaps we
can simply check and return failure in this case.
Anyway, I'd suggest not mixing the two ops in a single function.
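For illustration, here is a tiny standalone model of that check-and-fail
behavior (all names are made up for the sketch, not actual VT-d code):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative stand-in for device_domain_info; only the flag matters. */
struct model_info {
	bool dev_attached;	/* set by the attach_dev path */
};

/*
 * Sketch of the suggested guard: attach_dev_pasid fails fast when
 * attach_dev has not activated the device context yet, instead of
 * quietly doing the attach_dev work itself.
 */
static int model_attach_dev_pasid(struct model_info *info, unsigned int pasid)
{
	(void)pasid;
	if (!info->dev_attached)
		return -ENODEV;	/* context/PASID table not set up yet */
	/* real code would install @pasid in the PASID table here */
	return 0;
}
```

So the two ops stay independent: attach_dev_pasid only ever adds a PASID
to an already-activated device context.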
>
> No ordering requirement.
>
Best regards,
baolu
On 4/19/23 7:04 AM, Jacob Pan wrote:
> On Tue, 18 Apr 2023 10:06:12 +0800, Baolu Lu<[email protected]>
> wrote:
>
>> On 4/18/23 12:46 AM, Jacob Pan wrote:
>>> On Wed, 12 Apr 2023 09:37:48 +0800, Baolu Lu<[email protected]>
>>> wrote:
>>>
>>>> On 4/11/23 4:02 PM, Tian, Kevin wrote:
>>>>>> From: Jacob Pan<[email protected]>
>>>>>> Sent: Saturday, April 8, 2023 2:06 AM
>>>>>> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t ma
>>>>>> goto out;
>>>>>> }
>>>>>>
>>>>>> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
>>>>>> - if (ret < min)
>>>>>> + ret = iommu_alloc_global_pasid(min, max);
>>>>> I wonder whether this can take a device pointer so
>>>>> dev->iommu->max_pasids is enforced inside the alloc function.
>>>> Agreed. Instead of using the open code, it looks better to have a
>>>> helper like dev_iommu_max_pasids().
>>> yes, probably export dev_iommu_get_max_pasids(dev)?
>>>
>>> But if I understood Kevin correctly, he's also suggesting that the
>>> interface should be changed to iommu_alloc_global_pasid(dev), my
>>> concern is that how do we use this function to reserve RID_PASID which
>>> is not specific to a device?
>> Probably we can introduce a counterpart dev->iommu->min_pasids, so that
>> there's no need to reserve the RID_PASID. At present, we can set it to 1
>> in the core as ARM/AMD/Intel all treat PASID 0 as a special pasid.
>>
>> In the future, if VT-d supports using arbitrary number as RID_PASID for
>> any specific device, we can call iommu_alloc_global_pasid() for that
>> device.
>>
>> The device drivers don't know and don't need to know the range of viable
>> PASIDs, so the @min, @max parameters seem to be unreasonable.
> Sure, that is reasonable. Another question is whether global PASID
> allocation is always for a single device, if not I prefer to keep the
> current iommu_alloc_global_pasid() and add a wrapper
> iommu_alloc_global_pasid_dev(dev) to extract the @min, @max. OK?
No problem from the code perspective. But we only need one API.
We can now add the kAPI that we really need. In this series, the idxd
driver wants to allocate a global PASID for its kernel DMA-with-PASID
use case. So, iommu_alloc_global_pasid_dev() seems to be sufficient.
If, in the future, we need to provide global PASID allocation to callers
other than device drivers, we can easily add variants.
Best regards,
baolu
Hi Baolu,
On Wed, 19 Apr 2023 10:40:46 +0800, Baolu Lu <[email protected]>
wrote:
> On 4/19/23 7:04 AM, Jacob Pan wrote:
> > On Tue, 18 Apr 2023 10:06:12 +0800, Baolu Lu<[email protected]>
> > wrote:
> >
> >> On 4/18/23 12:46 AM, Jacob Pan wrote:
> >>> On Wed, 12 Apr 2023 09:37:48 +0800, Baolu Lu<[email protected]>
> >>> wrote:
> >>>
> >>>> On 4/11/23 4:02 PM, Tian, Kevin wrote:
> >>>>>> From: Jacob Pan<[email protected]>
> >>>>>> Sent: Saturday, April 8, 2023 2:06 AM
> >>>>>> @@ -28,8 +26,8 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t ma
> >>>>>> goto out;
> >>>>>> }
> >>>>>>
> >>>>>> - ret = ida_alloc_range(&iommu_global_pasid_ida, min, max, GFP_KERNEL);
> >>>>>> - if (ret < min)
> >>>>>> + ret = iommu_alloc_global_pasid(min, max);
> >>>>> I wonder whether this can take a device pointer so
> >>>>> dev->iommu->max_pasids is enforced inside the alloc function.
> >>>> Agreed. Instead of using the open code, it looks better to have a
> >>>> helper like dev_iommu_max_pasids().
> >>> yes, probably export dev_iommu_get_max_pasids(dev)?
> >>>
> >>> But if I understood Kevin correctly, he's also suggesting that the
> >>> interface should be changed to iommu_alloc_global_pasid(dev), my
> >>> concern is that how do we use this function to reserve RID_PASID which
> >>> is not specific to a device?
> >> Probably we can introduce a counterpart dev->iommu->min_pasids, so that
> >> there's no need to reserve the RID_PASID. At present, we can set it to
> >> 1 in the core as ARM/AMD/Intel all treat PASID 0 as a special pasid.
> >>
> >> In the future, if VT-d supports using arbitrary number as RID_PASID for
> >> any specific device, we can call iommu_alloc_global_pasid() for that
> >> device.
> >>
> >> The device drivers don't know and don't need to know the range of
> >> viable PASIDs, so the @min, @max parameters seem to be unreasonable.
> > Sure, that is reasonable. Another question is whether global PASID
> > allocation is always for a single device, if not I prefer to keep the
> > current iommu_alloc_global_pasid() and add a wrapper
> > iommu_alloc_global_pasid_dev(dev) to extract the @min, @max. OK?
>
> No problem from the code perspective. But we only need one API.
>
> We can now add the kAPI that we really need. In this series, the idxd
> driver wants to allocate a global PASID for its kernel dma with pasid
> purpose. So, iommu_alloc_global_pasid_dev() seems to be sufficient.
>
> If, in the future, we will have a need to provide global pasid
> allocation other than device drivers, we can easily add the variants.
>
Sounds good. I will only add iommu_alloc_global_pasid_dev(dev) and let
the core code set @min, @max for devices.
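To make the intent concrete, a standalone model of that shape (names here
are hypothetical stand-ins for illustration; the real implementation would
sit on ida_alloc_range() and dev->iommu->max_pasids):

```c
#include <assert.h>
#include <stdbool.h>

#define IOMMU_FIRST_GLOBAL_PASID 1   /* PASID 0 stays reserved as the RID_PASID */
#define GLOBAL_PASID_POOL_SIZE   64  /* tiny pool for the model */

static bool pasid_used[GLOBAL_PASID_POOL_SIZE];

struct model_dev {
	unsigned int max_pasids;     /* stand-in for dev->iommu->max_pasids */
};

/* Range-based allocator the wrapper builds on (models ida_alloc_range()). */
static int model_alloc_range(unsigned int min, unsigned int max)
{
	for (unsigned int p = min; p <= max && p < GLOBAL_PASID_POOL_SIZE; p++) {
		if (!pasid_used[p]) {
			pasid_used[p] = true;
			return (int)p;
		}
	}
	return -1; /* pool exhausted */
}

/* The device-aware wrapper: callers pass only the device, never a range. */
static int model_alloc_global_pasid_dev(struct model_dev *dev)
{
	if (!dev->max_pasids)
		return -1;
	return model_alloc_range(IOMMU_FIRST_GLOBAL_PASID, dev->max_pasids - 1);
}
```

The point is just that @min/@max are derived in the core from per-device
limits, so drivers like idxd never see the viable PASID range.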
Thanks,
Jacob