2022-08-17 01:39:31

by Baolu Lu

Subject: [PATCH v11 00/13] iommu: SVA and IOPF refactoring

Hi folks,

The first part of this series introduces the IOMMU interfaces to attach
an iommu domain to, or detach it from, a PASID of a device. It refactors
the existing IOMMU SVA implementation by representing a shared virtual
address space with an SVA type of iommu domain and replacing the
sva_bind/unbind iommu ops with a set_dev_pasid domain op.

The second part generalizes the existing I/O page fault handling
framework, which previously served only SVA. Any driver or component can
now handle the I/O page faults for its domain in its own way by
installing an I/O page fault handler.
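
For quick reference, these are the key interfaces the series ends up
with, copied from the patches below (the iopf_handler assignment is only
sketched here; see patches 11 and 12 for the full wiring):

	/* Attach/detach a domain to/from a PASID of a device (patch 05). */
	int iommu_attach_device_pasid(struct iommu_domain *domain,
				      struct device *dev, ioasid_t pasid);
	void iommu_detach_device_pasid(struct iommu_domain *domain,
				       struct device *dev, ioasid_t pasid);

	/* Allocate an SVA domain for an mm (patch 06). */
	struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
						    struct mm_struct *mm);

	/* Per-domain fault routing (patches 11 and 12). */
	domain->iopf_handler = ...;	/* called for each fault on the domain */
	domain->fault_data = ...;	/* passed back to the handler */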

This series has been functionally tested on an x86 machine and compile
tested for all architectures.

This series is also available on github:
https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v11

Please review and suggest.

Best regards,
baolu

Change log:
v11:
- [PATCH 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path
- new patch
- [PATCH 05/13] iommu: Add attach/detach_dev_pasid iommu interface
- Remove block_dev_pasid domain ops and use setting group blocking
domain instead.
- Remove iommu_group_immutable_singleton(). Move the PCI/ACS
requirement into the pci_enable_pasid(). All devices in an iommu
group share a same iommu domain for each pasid.
- [PATCH 06/13] iommu: Add IOMMU SVA domain support
- Add a refcount for SVA multiple bindings.
- [PATCH 07/13] iommu/vt-d: Add SVA domain support
- Use set_dev_pasid for both domain attaching and detaching.
- [PATCH 08/13] arm-smmu-v3/sva: Add SVA domain support
- Use set_dev_pasid for both domain attaching and detaching.
- [PATCH 09/13] iommu/sva: Refactoring iommu_sva_bind/unbind_device()
- Remove the refcount of iommu_sva::users.
- Add iommu_sva::domain.
- [PATCH 11/13] iommu: Prepare IOMMU domain for IOPF
- Remove unnecessary check of IS_ERR_OR_NULL(mm).
- [Overall]
- Rebase to v6.0-rc1.
- Remove previous Tested-by's as some APIs have changed.
- Polish various code and comments.

v10:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Rebase on next branch of iommu tree.
- Split attach/detach_device_pasid interfaces and SVA domain extensions
to different patches.
- Handle the return error of xa_cmpxchg() gracefully.
- Directly pass mm in as the SVA fault data.
- Rename iopf_handle_group() to iopf_handler().
- Some commit message and code comment refinement.
- Add Tested-by's from Zhangfei and Tony.

v9:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Some minor changes on comments and function names.
- Simplify dev_iommu_get_max_pasids().

v8:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add support for calculating the max pasids that a device could
consume.
- Replace container_of_safe() with container_of.
- Remove iommu_ops->sva_domain_ops and make sva support through the
generic domain_alloc/free() interfaces.
- [Robin] It would be logical to pass IOMMU_DOMAIN_SVA to the normal
domain_alloc call, so that driver-internal stuff like context
descriptors can be still be hung off the domain as usual (rather than
all drivers having to implement some extra internal lookup mechanism
to handle all the SVA domain ops).
- [Robin] I'd just stick the mm pointer in struct iommu_domain, in a
union with the fault handler stuff those are mutually exclusive with
SVA.
- https://lore.kernel.org/linux-iommu/[email protected]/

v7:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Remove duplicate array for sva domain.
- Rename detach_dev_pasid to block_dev_pasid.
- Add raw device driver interfaces for iommufd.
- Other misc refinements and patch reorganization.
- Drop "dmaengine: idxd: Separate user and kernel pasid enabling" which
has been picked for dmaengine tree.

v6:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Refine the SVA basic data structures.
Link: https://lore.kernel.org/linux-iommu/YnFv0ps0Ad8v+7uH@myrica/
- Refine arm smmuv3 sva domain allocation.
- Fix a possible lock issue.
Link: https://lore.kernel.org/linux-iommu/YnFydE8j8l7Q4m+b@myrica/

v5:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Address review comments from Jean-Philippe Brucker. Much appreciated!
- Remove redundant pci aliases check in
device_group_immutable_singleton().
- Treat all buses except PCI as static in immutable singleton check.
- As the sva_bind/unbind() have already guaranteed sva domain free only
after iopf_queue_flush_dev(), remove the unnecessary domain refcount.
- Move domain get() out of the list iteration in iopf_handle_group().

v4:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Solve the overlap with another series and make this series
self-contained.
- There was no objection to the data structure abstraction during the
v3 review, hence the RFC subject prefix is removed.
- Refine the immutable singleton group code according to Kevin's
comments.

v3:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Rework iommu_group_singleton_lockdown() by adding a flag to the group
that positively indicates the group can never have more than one
member, even after hot plug.
- Abstract the data structs used for iommu sva in separate patches to
make them easier to review.
- I still keep the RFC prefix in this series as the above two significant
changes need at least another round of review to be finalized.
- Several misc refinements.

v2:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Add sva domain life cycle management to avoid race between unbind and
page fault handling.
- Use a single domain for each mm.
- Return a single sva handle for the same binding.
- Add a new helper to meet singleton group requirement.
- Rework the SVA domain allocation for arm smmu v3 driver and move the
pasid_bit initialization to device probe.
- Drop the patch "iommu: Handle IO page faults directly".
- Add mmget_not_zero(mm) in SVA page fault handler.

v1:
- https://lore.kernel.org/linux-iommu/[email protected]/
- Initial post.

Lu Baolu (13):
iommu: Add max_pasids field in struct iommu_device
iommu: Add max_pasids field in struct dev_iommu
iommu: Remove SVM_FLAG_SUPERVISOR_MODE support
PCI: Allow PASID only when ACS enforced on upstreaming path
iommu: Add attach/detach_dev_pasid iommu interface
iommu: Add IOMMU SVA domain support
iommu/vt-d: Add SVA domain support
arm-smmu-v3/sva: Add SVA domain support
iommu/sva: Refactoring iommu_sva_bind/unbind_device()
iommu: Remove SVA related callbacks from iommu ops
iommu: Prepare IOMMU domain for IOPF
iommu: Per-domain I/O page fault handling
iommu: Rename iommu-sva-lib.{c,h}

include/linux/intel-svm.h | 13 -
include/linux/iommu.h | 110 +++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 19 +-
drivers/iommu/intel/iommu.h | 13 +-
.../iommu/{iommu-sva-lib.h => iommu-sva.h} | 14 +-
drivers/dma/idxd/cdev.c | 3 +-
drivers/dma/idxd/init.c | 25 +-
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 119 +++++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 9 +-
drivers/iommu/intel/dmar.c | 7 +
drivers/iommu/intel/iommu.c | 7 +-
drivers/iommu/intel/svm.c | 150 +++++-----
drivers/iommu/io-pgfault.c | 77 ++----
drivers/iommu/iommu-sva-lib.c | 71 -----
drivers/iommu/iommu-sva.c | 233 ++++++++++++++++
drivers/iommu/iommu.c | 259 +++++++++++-------
drivers/misc/uacce/uacce.c | 2 +-
drivers/pci/ats.c | 5 +
drivers/iommu/Makefile | 2 +-
19 files changed, 680 insertions(+), 458 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (83%)
delete mode 100644 drivers/iommu/iommu-sva-lib.c
create mode 100644 drivers/iommu/iommu-sva.c

--
2.25.1


2022-08-17 01:40:47

by Baolu Lu

Subject: [PATCH v11 07/13] iommu/vt-d: Add SVA domain support

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
---
drivers/iommu/intel/iommu.h | 5 ++++
drivers/iommu/intel/iommu.c | 2 ++
drivers/iommu/intel/svm.c | 50 +++++++++++++++++++++++++++++++++++++
3 files changed, 57 insertions(+)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index a9b8367c9361..4875c9974abd 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -747,6 +747,7 @@ void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
struct iommu_page_response *msg);
+struct iommu_domain *intel_svm_domain_alloc(void);

struct intel_svm_dev {
struct list_head list;
@@ -772,6 +773,10 @@ struct intel_svm {
};
#else
static inline void intel_svm_check(struct intel_iommu *iommu) {}
+static inline struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ return NULL;
+}
#endif

#ifdef CONFIG_INTEL_IOMMU_DEBUGFS
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 7cca030a508e..27c9fd6139a8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4149,6 +4149,8 @@ static struct iommu_domain *intel_iommu_domain_alloc(unsigned type)
return domain;
case IOMMU_DOMAIN_IDENTITY:
return &si_domain->domain;
+ case IOMMU_DOMAIN_SVA:
+ return intel_svm_domain_alloc();
default:
return NULL;
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 2420fa5c2360..16a4d413fce4 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -928,3 +928,53 @@ int intel_svm_page_response(struct device *dev,
mutex_unlock(&pasid_mutex);
return ret;
}
+
+static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct device_domain_info *info = dev_iommu_priv_get(dev);
+ struct intel_iommu *iommu = info->iommu;
+ struct iommu_sva *sva;
+ int ret = 0;
+
+ mutex_lock(&pasid_mutex);
+ /*
+ * Detach the domain if a blocking domain is set. Check the
+ * right domain type once the IOMMU driver supports a real
+ * blocking domain.
+ */
+ if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
+ intel_svm_unbind_mm(dev, pasid);
+ } else {
+ struct mm_struct *mm = domain->mm;
+
+ sva = intel_svm_bind_mm(iommu, dev, mm);
+ if (IS_ERR(sva))
+ ret = PTR_ERR(sva);
+ }
+ mutex_unlock(&pasid_mutex);
+
+ return ret;
+}
+
+static void intel_svm_domain_free(struct iommu_domain *domain)
+{
+ kfree(to_dmar_domain(domain));
+}
+
+static const struct iommu_domain_ops intel_svm_domain_ops = {
+ .set_dev_pasid = intel_svm_set_dev_pasid,
+ .free = intel_svm_domain_free,
+};
+
+struct iommu_domain *intel_svm_domain_alloc(void)
+{
+ struct dmar_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->domain.ops = &intel_svm_domain_ops;
+
+ return &domain->domain;
+}
--
2.25.1

2022-08-17 01:44:06

by Baolu Lu

Subject: [PATCH v11 08/13] arm-smmu-v3/sva: Add SVA domain support

Add support for SVA domain allocation and provide an SVA-specific
iommu_domain_ops. This implementation is based on the existing SVA
code. Possible cleanup and refactoring are left for incremental
changes later.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 6 ++
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 76 +++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3 +
3 files changed, 85 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d2ba86470c42..96399dd3a67a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -758,6 +758,7 @@ struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
+struct iommu_domain *arm_smmu_sva_domain_alloc(void);
#else /* CONFIG_ARM_SMMU_V3_SVA */
static inline bool arm_smmu_sva_supported(struct arm_smmu_device *smmu)
{
@@ -803,5 +804,10 @@ static inline u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle)
}

static inline void arm_smmu_sva_notifier_synchronize(void) {}
+
+static inline struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ return NULL;
+}
#endif /* CONFIG_ARM_SMMU_V3_SVA */
#endif /* _ARM_SMMU_V3_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index f155d406c5d5..43564b61c726 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -549,3 +549,79 @@ void arm_smmu_sva_notifier_synchronize(void)
*/
mmu_notifier_synchronize();
}
+
+static void arm_smmu_sva_block_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ struct mm_struct *mm = domain->mm;
+ struct arm_smmu_bond *bond = NULL, *t;
+ struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+ mutex_lock(&sva_lock);
+ list_for_each_entry(t, &master->bonds, list) {
+ if (t->mm == mm) {
+ bond = t;
+ break;
+ }
+ }
+
+ if (!WARN_ON(!bond) && refcount_dec_and_test(&bond->refs)) {
+ list_del(&bond->list);
+ arm_smmu_mmu_notifier_put(bond->smmu_mn);
+ kfree(bond);
+ }
+ mutex_unlock(&sva_lock);
+}
+
+static int arm_smmu_sva_set_dev_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t id)
+{
+ int ret = 0;
+ struct mm_struct *mm;
+ struct iommu_sva *handle;
+
+ /*
+ * Detach the domain if a blocking domain is set. Check the
+ * right domain type once the IOMMU driver supports a real
+ * blocking domain.
+ */
+ if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
+ domain = iommu_get_domain_for_dev_pasid(dev, id);
+ if (!domain)
+ return -EINVAL;
+ arm_smmu_sva_block_dev_pasid(domain, dev, id);
+
+ return 0;
+ }
+
+ mm = domain->mm;
+ mutex_lock(&sva_lock);
+ handle = __arm_smmu_sva_bind(dev, mm);
+ if (IS_ERR(handle))
+ ret = PTR_ERR(handle);
+ mutex_unlock(&sva_lock);
+
+ return ret;
+}
+
+static void arm_smmu_sva_domain_free(struct iommu_domain *domain)
+{
+ kfree(domain);
+}
+
+static const struct iommu_domain_ops arm_smmu_sva_domain_ops = {
+ .set_dev_pasid = arm_smmu_sva_set_dev_pasid,
+ .free = arm_smmu_sva_domain_free,
+};
+
+struct iommu_domain *arm_smmu_sva_domain_alloc(void)
+{
+ struct iommu_domain *domain;
+
+ domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+ if (!domain)
+ return NULL;
+ domain->ops = &arm_smmu_sva_domain_ops;
+
+ return domain;
+}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f88541be8213..057f7c8824d7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2008,6 +2008,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
{
struct arm_smmu_domain *smmu_domain;

+ if (type == IOMMU_DOMAIN_SVA)
+ return arm_smmu_sva_domain_alloc();
+
if (type != IOMMU_DOMAIN_UNMANAGED &&
type != IOMMU_DOMAIN_DMA &&
type != IOMMU_DOMAIN_DMA_FQ &&
--
2.25.1

2022-08-17 01:44:18

by Baolu Lu

Subject: [PATCH v11 12/13] iommu: Per-domain I/O page fault handling

Tweak the I/O page fault handling framework to route the page faults to
the domain and call the page fault handler retrieved from the domain.
This makes the I/O page fault handling framework possible to serve more
usage scenarios as long as they have an IOMMU domain and install a page
fault handler in it. Some unused functions are also removed to avoid
dead code.

The page fault handling framework uses iommu_get_domain_for_dev_pasid()
to retrieve the attached domain for the {device, PASID} pair reported by
the iommu driver. We have a guarantee that the SVA domain doesn't go
away during IOPF handling, because unbind() won't free the domain until
all the pending page requests have been flushed from the pipeline. The
drivers either call iopf_queue_flush_dev() explicitly, or in the stall
case, the device driver is required to flush all DMAs, including stalled
transactions, before calling unbind().

This also renames iopf_handle_group() to iopf_handler() to avoid
confusion.
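
For reference, a sketch of the resulting fault path; all names are from
this series and the existing iopf framework, the arrows only illustrate
the call flow:

	device page request
	  -> iommu driver -> iommu_report_device_fault()
	    -> iommu_queue_iopf()		/* groups partial faults */
	      -> iopf_handler()			/* runs from a workqueue */
	        -> iommu_get_domain_for_dev_pasid(dev, pasid)
	        -> domain->iopf_handler(&iopf->fault, domain->fault_data)
	        -> iopf_complete_group()	/* sends the page response */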

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
drivers/iommu/io-pgfault.c | 68 +++++---------------------------------
1 file changed, 9 insertions(+), 59 deletions(-)

diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index aee9e033012f..d1c522f4ab34 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -69,69 +69,18 @@ static int iopf_complete_group(struct device *dev, struct iopf_fault *iopf,
return iommu_page_response(dev, &resp);
}

-static enum iommu_page_response_code
-iopf_handle_single(struct iopf_fault *iopf)
-{
- vm_fault_t ret;
- struct mm_struct *mm;
- struct vm_area_struct *vma;
- unsigned int access_flags = 0;
- unsigned int fault_flags = FAULT_FLAG_REMOTE;
- struct iommu_fault_page_request *prm = &iopf->fault.prm;
- enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
-
- if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
- return status;
-
- mm = iommu_sva_find(prm->pasid);
- if (IS_ERR_OR_NULL(mm))
- return status;
-
- mmap_read_lock(mm);
-
- vma = find_extend_vma(mm, prm->addr);
- if (!vma)
- /* Unmapped area */
- goto out_put_mm;
-
- if (prm->perm & IOMMU_FAULT_PERM_READ)
- access_flags |= VM_READ;
-
- if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
- access_flags |= VM_WRITE;
- fault_flags |= FAULT_FLAG_WRITE;
- }
-
- if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
- access_flags |= VM_EXEC;
- fault_flags |= FAULT_FLAG_INSTRUCTION;
- }
-
- if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
- fault_flags |= FAULT_FLAG_USER;
-
- if (access_flags & ~vma->vm_flags)
- /* Access fault */
- goto out_put_mm;
-
- ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
- status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
- IOMMU_PAGE_RESP_SUCCESS;
-
-out_put_mm:
- mmap_read_unlock(mm);
- mmput(mm);
-
- return status;
-}
-
-static void iopf_handle_group(struct work_struct *work)
+static void iopf_handler(struct work_struct *work)
{
struct iopf_group *group;
+ struct iommu_domain *domain;
struct iopf_fault *iopf, *next;
enum iommu_page_response_code status = IOMMU_PAGE_RESP_SUCCESS;

group = container_of(work, struct iopf_group, work);
+ domain = iommu_get_domain_for_dev_pasid(group->dev,
+ group->last_fault.fault.prm.pasid);
+ if (!domain || !domain->iopf_handler)
+ status = IOMMU_PAGE_RESP_INVALID;

list_for_each_entry_safe(iopf, next, &group->faults, list) {
/*
@@ -139,7 +88,8 @@ static void iopf_handle_group(struct work_struct *work)
* faults in the group if there is an error.
*/
if (status == IOMMU_PAGE_RESP_SUCCESS)
- status = iopf_handle_single(iopf);
+ status = domain->iopf_handler(&iopf->fault,
+ domain->fault_data);

if (!(iopf->fault.prm.flags &
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE))
@@ -242,7 +192,7 @@ int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
group->last_fault.fault = *fault;
INIT_LIST_HEAD(&group->faults);
list_add(&group->last_fault.list, &group->faults);
- INIT_WORK(&group->work, iopf_handle_group);
+ INIT_WORK(&group->work, iopf_handler);

/* See if we have partial faults for this group */
list_for_each_entry_safe(iopf, next, &iopf_param->partial, list) {
--
2.25.1

2022-08-17 01:44:38

by Baolu Lu

Subject: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

Some configurations of the PCI fabric will route device originated TLP
packets based on the memory addresses. These configurations are
incompatible with PASID as the PASID packets form a distinct address
space. For instance, any configuration where switches are present
without ACS enabled is incompatible.

This enhances the pci_enable_pasid() interface by requiring the ACS to
support Source Validation, Request Redirection, Completer Redirection,
and Upstream Forwarding. This effectively means that devices cannot
spoof their requester ID, requests and completions cannot be redirected,
and all transactions are forwarded upstream, even as it passes through a
bridge where the target device is downstream.

Suggested-by: Jason Gunthorpe <[email protected]>
Suggested-by: Kevin Tian <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/pci/ats.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index c967ad6e2626..0715e48e7973 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -382,6 +382,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
if (!pasid)
return -EINVAL;

+ if (!pci_acs_path_enabled(pdev, NULL,
+ PCI_ACS_SV | PCI_ACS_RR |
+ PCI_ACS_CR | PCI_ACS_UF))
+ return -EINVAL;
+
pci_read_config_word(pdev, pasid + PCI_PASID_CAP, &supported);
supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;

--
2.25.1

2022-08-17 01:45:58

by Baolu Lu

Subject: [PATCH v11 05/13] iommu: Add attach/detach_dev_pasid iommu interface

Attaching an IOMMU domain to a PASID of a device is a generic operation
for modern IOMMU drivers which support PASID-granular DMA address
translation. Currently visible usage scenarios include (but are not
limited to):

- SVA (Shared Virtual Address)
- kernel DMA with PASID
- hardware-assist mediated device

This adds a set_dev_pasid domain op for this purpose and also adds some
interfaces for device drivers to attach/detach/retrieve a domain for a
PASID of a device.

If multiple devices share a single group, that is fine as long as the
fabric always routes every TLP marked with a PASID to the host bridge
and only the host bridge. For example, ACS achieves this universally and
is checked when pci_enable_pasid() is called. As we can't reliably tell
the sources apart within a group, all the devices in a group have to be
considered as the same source and mapped to the same PASID table.
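
A minimal usage sketch of the new interfaces, assuming the caller has
already prepared "domain" and "pasid", with cleanup paths elided:

	int ret;

	ret = iommu_attach_device_pasid(domain, dev, pasid);
	if (ret)
		return ret;

	/* DMA requests tagged with this pasid now go through the domain. */

	iommu_detach_device_pasid(domain, dev, pasid);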

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
---
include/linux/iommu.h | 26 +++++++++
drivers/iommu/iommu.c | 123 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 149 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 2f237c3cd680..f1e8953b1e2e 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -266,6 +266,7 @@ struct iommu_ops {
* struct iommu_domain_ops - domain specific operations
* @attach_dev: attach an iommu domain to a device
* @detach_dev: detach an iommu domain from a device
+ * @set_dev_pasid: set an iommu domain to a pasid of device
* @map: map a physically contiguous memory region to an iommu domain
* @map_pages: map a physically contiguous set of pages of the same size to
* an iommu domain.
@@ -286,6 +287,8 @@ struct iommu_ops {
struct iommu_domain_ops {
int (*attach_dev)(struct iommu_domain *domain, struct device *dev);
void (*detach_dev)(struct iommu_domain *domain, struct device *dev);
+ int (*set_dev_pasid)(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid);

int (*map)(struct iommu_domain *domain, unsigned long iova,
phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
@@ -680,6 +683,12 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);

+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid);
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid);
#else /* CONFIG_IOMMU_API */

struct iommu_ops {};
@@ -1047,6 +1056,23 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
{
return false;
}
+
+static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ return -ENODEV;
+}
+
+static inline void iommu_detach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+}
+
+static inline struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ return NULL;
+}
#endif /* CONFIG_IOMMU_API */

/**
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1d28a74a0511..6f2cbccc0570 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -39,6 +39,7 @@ struct iommu_group {
struct kobject kobj;
struct kobject *devices_kobj;
struct list_head devices;
+ struct xarray pasid_array;
struct mutex mutex;
void *iommu_data;
void (*iommu_data_release)(void *iommu_data);
@@ -663,6 +664,7 @@ struct iommu_group *iommu_group_alloc(void)
mutex_init(&group->mutex);
INIT_LIST_HEAD(&group->devices);
INIT_LIST_HEAD(&group->entry);
+ xa_init(&group->pasid_array);

ret = ida_alloc(&iommu_group_ida, GFP_KERNEL);
if (ret < 0) {
@@ -3258,3 +3260,124 @@ bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return user;
}
EXPORT_SYMBOL_GPL(iommu_group_dma_owner_claimed);
+
+static int __iommu_set_group_pasid(struct iommu_domain *domain,
+ struct iommu_group *group, ioasid_t pasid)
+{
+ struct iommu_domain *ops_domain;
+ struct group_device *device;
+ int ret = 0;
+
+ if (domain == group->blocking_domain)
+ ops_domain = xa_load(&group->pasid_array, pasid);
+ else
+ ops_domain = domain;
+
+ list_for_each_entry(device, &group->devices, list) {
+ ret = ops_domain->ops->set_dev_pasid(domain, device->dev, pasid);
+ if (ret)
+ break;
+ }
+
+ return ret;
+}
+
+/*
+ * iommu_attach_device_pasid() - Attach a domain to pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * Return: 0 on success, or an error.
+ */
+int iommu_attach_device_pasid(struct iommu_domain *domain,
+ struct device *dev, ioasid_t pasid)
+{
+ struct iommu_group *group;
+ void *curr;
+ int ret;
+
+ if (!domain->ops->set_dev_pasid)
+ return -EOPNOTSUPP;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return -ENODEV;
+
+ mutex_lock(&group->mutex);
+ curr = xa_cmpxchg(&group->pasid_array, pasid, NULL, domain, GFP_KERNEL);
+ if (curr) {
+ ret = xa_err(curr) ? : -EBUSY;
+ goto out_unlock;
+ }
+
+ ret = __iommu_set_group_pasid(domain, group, pasid);
+ if (ret) {
+ __iommu_set_group_pasid(group->blocking_domain, group, pasid);
+ xa_erase(&group->pasid_array, pasid);
+ }
+out_unlock:
+ mutex_unlock(&group->mutex);
+ iommu_group_put(group);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_attach_device_pasid);
+
+/*
+ * iommu_detach_device_pasid() - Detach the domain from pasid of device
+ * @domain: the iommu domain.
+ * @dev: the attached device.
+ * @pasid: the pasid of the device.
+ *
+ * The @domain must have been attached to @pasid of the @dev with
+ * iommu_attach_device_pasid().
+ */
+void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
+ ioasid_t pasid)
+{
+ struct iommu_group *group = iommu_group_get(dev);
+
+ mutex_lock(&group->mutex);
+ __iommu_set_group_pasid(group->blocking_domain, group, pasid);
+ WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);
+ mutex_unlock(&group->mutex);
+
+ iommu_group_put(group);
+}
+EXPORT_SYMBOL_GPL(iommu_detach_device_pasid);
+
+/*
+ * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
+ * @dev: the queried device
+ * @pasid: the pasid of the device
+ *
+ * This is a variant of iommu_get_domain_for_dev(). It returns the existing
+ * domain attached to pasid of a device. It's only for internal use of the
+ * IOMMU subsystem. The caller must take care to avoid any possible
+ * use-after-free case.
+ *
+ * Return: attached domain on success, NULL otherwise.
+ */
+struct iommu_domain *
+iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
+{
+ struct iommu_domain *domain;
+ struct iommu_group *group;
+
+ if (!pasid_valid(pasid))
+ return NULL;
+
+ group = iommu_group_get(dev);
+ if (!group)
+ return NULL;
+ /*
+ * The xarray protects its internal state with RCU. Hence the domain
+ * obtained is either NULL or fully formed.
+ */
+ domain = xa_load(&group->pasid_array, pasid);
+ iommu_group_put(group);
+
+ return domain;
+}
+EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
--
2.25.1

2022-08-17 02:03:44

by Baolu Lu

Subject: [PATCH v11 06/13] iommu: Add IOMMU SVA domain support

The sva iommu_domain represents a hardware pagetable that the IOMMU
hardware could use for SVA translation. This adds some infrastructure
to support SVA domain in the iommu common layer. It includes:

- Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain
type. The IOMMU drivers that support allocation of the SVA domain
should provide their own SVA-domain-specific iommu_domain_ops.
- Add a helper to allocate an SVA domain. The iommu_domain_free()
is still used to free an SVA domain.

report_iommu_fault() should be replaced by the new
iommu_report_device_fault(). The existing fault handler is left for its
existing users; the newly added SVA union members exclude it.
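
A hedged sketch of the intended allocation/free flow; "dev" and "mm"
are assumed to be valid, and iommu_domain_free() pairs with the mm
reference taken at allocation time:

	struct iommu_domain *domain;

	domain = iommu_sva_domain_alloc(dev, mm);	/* mmgrab()s the mm */
	if (!domain)
		return -ENOMEM;

	/* ... attach the domain to a PASID of the device and use it ... */

	iommu_domain_free(domain);			/* mmdrop()s the mm */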

Suggested-by: Jean-Philippe Brucker <[email protected]>
Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
---
include/linux/iommu.h | 25 +++++++++++++++++++++++--
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
2 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index f1e8953b1e2e..d0b32a289835 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -64,6 +64,8 @@ struct iommu_domain_geometry {
#define __IOMMU_DOMAIN_PT (1U << 2) /* Domain is identity mapped */
#define __IOMMU_DOMAIN_DMA_FQ (1U << 3) /* DMA-API uses flush queue */

+#define __IOMMU_DOMAIN_SVA (1U << 4) /* Shared process address space */
+
/*
* This are the possible domain-types
*
@@ -77,6 +79,8 @@ struct iommu_domain_geometry {
* certain optimizations for these domains
* IOMMU_DOMAIN_DMA_FQ - As above, but definitely using batched TLB
* invalidation.
+ * IOMMU_DOMAIN_SVA - DMA addresses are shared process addresses
+ * represented by mm_struct's.
*/
#define IOMMU_DOMAIN_BLOCKED (0U)
#define IOMMU_DOMAIN_IDENTITY (__IOMMU_DOMAIN_PT)
@@ -86,15 +90,24 @@ struct iommu_domain_geometry {
#define IOMMU_DOMAIN_DMA_FQ (__IOMMU_DOMAIN_PAGING | \
__IOMMU_DOMAIN_DMA_API | \
__IOMMU_DOMAIN_DMA_FQ)
+#define IOMMU_DOMAIN_SVA (__IOMMU_DOMAIN_SVA)

struct iommu_domain {
unsigned type;
const struct iommu_domain_ops *ops;
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
- iommu_fault_handler_t handler;
- void *handler_token;
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ union {
+ struct {
+ iommu_fault_handler_t handler;
+ void *handler_token;
+ };
+ struct { /* IOMMU_DOMAIN_SVA */
+ struct mm_struct *mm;
+ refcount_t users;
+ };
+ };
};

static inline bool iommu_is_dma_domain(struct iommu_domain *domain)
@@ -683,6 +696,8 @@ int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner);
void iommu_group_release_dma_owner(struct iommu_group *group);
bool iommu_group_dma_owner_claimed(struct iommu_group *group);

+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm);
int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid);
void iommu_detach_device_pasid(struct iommu_domain *domain,
@@ -1057,6 +1072,12 @@ static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group)
return false;
}

+static inline struct iommu_domain *
+iommu_sva_domain_alloc(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
static inline int iommu_attach_device_pasid(struct iommu_domain *domain,
struct device *dev, ioasid_t pasid)
{
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6f2cbccc0570..ac5a1f51a9a1 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -27,6 +27,7 @@
#include <linux/module.h>
#include <linux/cc_platform.h>
#include <trace/events/iommu.h>
+#include <linux/sched/mm.h>

static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
@@ -1960,6 +1961,8 @@ EXPORT_SYMBOL_GPL(iommu_domain_alloc);

void iommu_domain_free(struct iommu_domain *domain)
{
+ if (domain->type == IOMMU_DOMAIN_SVA)
+ mmdrop(domain->mm);
iommu_put_dma_cookie(domain);
domain->ops->free(domain);
}
@@ -3381,3 +3384,20 @@ iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
return domain;
}
EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_pasid);
+
+struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
+ struct mm_struct *mm)
+{
+ const struct iommu_ops *ops = dev_iommu_ops(dev);
+ struct iommu_domain *domain;
+
+ domain = ops->domain_alloc(IOMMU_DOMAIN_SVA);
+ if (!domain)
+ return NULL;
+
+ domain->type = IOMMU_DOMAIN_SVA;
+ mmgrab(mm);
+ domain->mm = mm;
+
+ return domain;
+}
--
2.25.1

2022-08-17 02:04:51

by Baolu Lu

Subject: [PATCH v11 09/13] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

The existing iommu SVA interfaces are implemented by calling the SVA
specific iommu ops provided by the IOMMU drivers. There's no need for
any SVA specific ops in the iommu_ops vector anymore as we can achieve
this through the generic iommu_attach/detach_device_pasid interfaces and
the set_dev_pasid domain op.

This refactors the IOMMU SVA interfaces implementation by using the
iommu_attach/detach_device_pasid interfaces and aligns them with the
concept of the SVA iommu domain. The new SVA code is put in the SVA
related file in order to make it self-contained.
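
The driver-facing flow is unchanged; a hedged sketch of a consumer
(the caller must hold a reference to mm_users, per the kdoc below):

	struct iommu_sva *handle;
	u32 pasid;

	handle = iommu_sva_bind_device(dev, mm);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	pasid = iommu_sva_get_pasid(handle);
	/* ... program the device to issue DMA tagged with this pasid ... */

	iommu_sva_unbind_device(handle);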

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
include/linux/iommu.h | 43 +++++++-------
drivers/iommu/iommu-sva-lib.c | 104 ++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 91 -----------------------------
3 files changed, 127 insertions(+), 111 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index d0b32a289835..86b6870f9697 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -643,6 +643,7 @@ struct iommu_fwspec {
*/
struct iommu_sva {
struct device *dev;
+ struct iommu_domain *domain;
};

int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode,
@@ -684,11 +685,6 @@ void iommu_release_device(struct device *dev);
int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f);
int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f);

-struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm);
-void iommu_sva_unbind_device(struct iommu_sva *handle);
-u32 iommu_sva_get_pasid(struct iommu_sva *handle);
-
int iommu_device_use_default_domain(struct device *dev);
void iommu_device_unuse_default_domain(struct device *dev);

@@ -1028,21 +1024,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
return -ENODEV;
}

-static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- return NULL;
-}
-
-static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
-}
-
-static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- return IOMMU_PASID_INVALID;
-}
-
static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
{
return NULL;
@@ -1119,4 +1100,26 @@ void iommu_debugfs_setup(void);
static inline void iommu_debugfs_setup(void) {}
#endif

+#ifdef CONFIG_IOMMU_SVA
+struct iommu_sva *iommu_sva_bind_device(struct device *dev,
+ struct mm_struct *mm);
+void iommu_sva_unbind_device(struct iommu_sva *handle);
+u32 iommu_sva_get_pasid(struct iommu_sva *handle);
+#else
+static inline struct iommu_sva *
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ return NULL;
+}
+
+static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+}
+
+static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ return IOMMU_PASID_INVALID;
+}
+#endif /* CONFIG_IOMMU_SVA */
+
#endif /* __LINUX_IOMMU_H */
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index 106506143896..f5a9adde7491 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -4,6 +4,7 @@
*/
#include <linux/mutex.h>
#include <linux/sched/mm.h>
+#include <linux/iommu.h>

#include "iommu-sva-lib.h"

@@ -69,3 +70,106 @@ struct mm_struct *iommu_sva_find(ioasid_t pasid)
return ioasid_find(&iommu_sva_pasid, pasid, __mmget_not_zero);
}
EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+/**
+ * iommu_sva_bind_device() - Bind a process address space to a device
+ * @dev: the device
+ * @mm: the mm to bind, caller must hold a reference to mm_users
+ *
+ * Create a bond between device and address space, allowing the device to access
+ * the mm using the returned PASID. If a bond already exists between @device and
+ * @mm, it is returned and an additional reference is taken. Caller must call
+ * iommu_sva_unbind_device() to release each reference.
+ *
+ * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
+ * initialize the required SVA features.
+ *
+ * On error, returns an ERR_PTR value.
+ */
+struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
+{
+ struct iommu_domain *domain;
+ struct iommu_sva *bond;
+ ioasid_t max_pasids;
+ int ret;
+
+ max_pasids = dev->iommu->max_pasids;
+ if (!max_pasids)
+ return ERR_PTR(-EOPNOTSUPP);
+
+ /* Allocate mm->pasid if necessary. */
+ ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
+ if (ret)
+ return ERR_PTR(ret);
+
+ bond = kzalloc(sizeof(*bond), GFP_KERNEL);
+ if (!bond)
+ return ERR_PTR(-ENOMEM);
+
+ mutex_lock(&iommu_sva_lock);
+ /* Search for an existing domain. */
+ domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
+ if (domain) {
+ refcount_inc(&domain->users);
+ goto out;
+ }
+
+ /* Allocate a new domain and set it on device pasid. */
+ domain = iommu_sva_domain_alloc(dev, mm);
+ if (!domain) {
+ ret = -ENOMEM;
+ goto out_unlock;
+ }
+
+ ret = iommu_attach_device_pasid(domain, dev, mm->pasid);
+ if (ret)
+ goto out_free_domain;
+ refcount_set(&domain->users, 1);
+out:
+ mutex_unlock(&iommu_sva_lock);
+ bond->dev = dev;
+ bond->domain = domain;
+
+ return bond;
+
+out_free_domain:
+ iommu_domain_free(domain);
+out_unlock:
+ mutex_unlock(&iommu_sva_lock);
+ kfree(bond);
+
+ return ERR_PTR(ret);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
+
+/**
+ * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
+ * @handle: the handle returned by iommu_sva_bind_device()
+ *
+ * Put reference to a bond between device and address space. The device should
+ * not be issuing any more transaction for this PASID. All outstanding page
+ * requests for this PASID must have been flushed to the IOMMU.
+ */
+void iommu_sva_unbind_device(struct iommu_sva *handle)
+{
+ struct iommu_domain *domain = handle->domain;
+ ioasid_t pasid = domain->mm->pasid;
+ struct device *dev = handle->dev;
+
+ mutex_lock(&iommu_sva_lock);
+ if (refcount_dec_and_test(&domain->users)) {
+ iommu_detach_device_pasid(domain, dev, pasid);
+ iommu_domain_free(domain);
+ }
+ mutex_unlock(&iommu_sva_lock);
+ kfree(handle);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
+
+u32 iommu_sva_get_pasid(struct iommu_sva *handle)
+{
+ struct iommu_domain *domain = handle->domain;
+
+ return domain->mm->pasid;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ac5a1f51a9a1..c77860695de3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2776,97 +2776,6 @@ int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat)
}
EXPORT_SYMBOL_GPL(iommu_dev_disable_feature);

-/**
- * iommu_sva_bind_device() - Bind a process address space to a device
- * @dev: the device
- * @mm: the mm to bind, caller must hold a reference to it
- *
- * Create a bond between device and address space, allowing the device to access
- * the mm using the returned PASID. If a bond already exists between @device and
- * @mm, it is returned and an additional reference is taken. Caller must call
- * iommu_sva_unbind_device() to release each reference.
- *
- * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
- * initialize the required SVA features.
- *
- * On error, returns an ERR_PTR value.
- */
-struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
-{
- struct iommu_group *group;
- struct iommu_sva *handle = ERR_PTR(-EINVAL);
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_bind)
- return ERR_PTR(-ENODEV);
-
- group = iommu_group_get(dev);
- if (!group)
- return ERR_PTR(-ENODEV);
-
- /* Ensure device count and domain don't change while we're binding */
- mutex_lock(&group->mutex);
-
- /*
- * To keep things simple, SVA currently doesn't support IOMMU groups
- * with more than one device. Existing SVA-capable systems are not
- * affected by the problems that required IOMMU groups (lack of ACS
- * isolation, device ID aliasing and other hardware issues).
- */
- if (iommu_group_device_count(group) != 1)
- goto out_unlock;
-
- handle = ops->sva_bind(dev, mm);
-
-out_unlock:
- mutex_unlock(&group->mutex);
- iommu_group_put(group);
-
- return handle;
-}
-EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
-
-/**
- * iommu_sva_unbind_device() - Remove a bond created with iommu_sva_bind_device
- * @handle: the handle returned by iommu_sva_bind_device()
- *
- * Put reference to a bond between device and address space. The device should
- * not be issuing any more transaction for this PASID. All outstanding page
- * requests for this PASID must have been flushed to the IOMMU.
- */
-void iommu_sva_unbind_device(struct iommu_sva *handle)
-{
- struct iommu_group *group;
- struct device *dev = handle->dev;
- const struct iommu_ops *ops = dev_iommu_ops(dev);
-
- if (!ops->sva_unbind)
- return;
-
- group = iommu_group_get(dev);
- if (!group)
- return;
-
- mutex_lock(&group->mutex);
- ops->sva_unbind(handle);
- mutex_unlock(&group->mutex);
-
- iommu_group_put(group);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);
-
-u32 iommu_sva_get_pasid(struct iommu_sva *handle)
-{
- const struct iommu_ops *ops = dev_iommu_ops(handle->dev);
-
- if (!ops->sva_get_pasid)
- return IOMMU_PASID_INVALID;
-
- return ops->sva_get_pasid(handle);
-}
-EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
-
/*
* Changes the default domain of an iommu group that has *only* one device
*
--
2.25.1

2022-08-17 02:05:22

by Baolu Lu

Subject: [PATCH v11 02/13] iommu: Add max_pasids field in struct dev_iommu

Use this field to save the number of PASIDs that a device is able to
consume. It is a generic attribute of a device and lifting it into the
per-device dev_iommu struct helps to avoid boilerplate code in various
IOMMU drivers.
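
With the value cached, a consumer only needs a check like the sketch
below (patch 09 in this series performs exactly this check in
iommu_sva_bind_device(); 0 means the device cannot consume any PASID):

	u32 max_pasids = dev->iommu->max_pasids;

	if (!max_pasids)
		return -EOPNOTSUPP;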

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Yi Liu <[email protected]>
---
include/linux/iommu.h | 2 ++
drivers/iommu/iommu.c | 20 ++++++++++++++++++++
2 files changed, 22 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ed172cbdabf2..0d9ce209a501 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -368,6 +368,7 @@ struct iommu_fault_param {
* @fwspec: IOMMU fwspec data
* @iommu_dev: IOMMU device this device is linked to
* @priv: IOMMU Driver private data
+ * @max_pasids: number of PASIDs this device can consume
*
* TODO: migrate other per device data pointers under iommu_dev_data, e.g.
* struct iommu_group *iommu_group;
@@ -379,6 +380,7 @@ struct dev_iommu {
struct iommu_fwspec *fwspec;
struct iommu_device *iommu_dev;
void *priv;
+ u32 max_pasids;
};

int iommu_device_register(struct iommu_device *iommu,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 780fb7071577..e9f6a8d33b58 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -20,6 +20,7 @@
#include <linux/idr.h>
#include <linux/err.h>
#include <linux/pci.h>
+#include <linux/pci-ats.h>
#include <linux/bitops.h>
#include <linux/property.h>
#include <linux/fsl/mc.h>
@@ -218,6 +219,24 @@ static void dev_iommu_free(struct device *dev)
kfree(param);
}

+static u32 dev_iommu_get_max_pasids(struct device *dev)
+{
+ u32 max_pasids = 0, bits = 0;
+ int ret;
+
+ if (dev_is_pci(dev)) {
+ ret = pci_max_pasids(to_pci_dev(dev));
+ if (ret > 0)
+ max_pasids = ret;
+ } else {
+ ret = device_property_read_u32(dev, "pasid-num-bits", &bits);
+ if (!ret)
+ max_pasids = 1UL << bits;
+ }
+
+ return min_t(u32, max_pasids, dev->iommu->iommu_dev->max_pasids);
+}
+
static int __iommu_probe_device(struct device *dev, struct list_head *group_list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;
@@ -243,6 +262,7 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
}

dev->iommu->iommu_dev = iommu_dev;
+ dev->iommu->max_pasids = dev_iommu_get_max_pasids(dev);

group = iommu_group_get_for_dev(dev);
if (IS_ERR(group)) {
--
2.25.1

2022-08-17 02:05:29

by Baolu Lu

Subject: [PATCH v11 11/13] iommu: Prepare IOMMU domain for IOPF

This adds some mechanisms around the iommu_domain so that the I/O page
fault handling framework can route a page fault to the domain and call
the fault handler from it.

Add pointers to the page fault handler and its private data in struct
iommu_domain. The fault handler will be called with the private data as
a parameter once a page fault is routed to the domain. Any kernel
component which owns an iommu domain can install a handler and its
private data so that the page fault can be further routed and handled.
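
As a hedged example, a kernel component that owns a domain could wire
up its own handler as below; my_iopf_handler and my_data are
hypothetical names, only the two struct fields come from this patch:

	static enum iommu_page_response_code
	my_iopf_handler(struct iommu_fault *fault, void *data)
	{
		/* Resolve fault->prm.addr for fault->prm.pasid here. */
		return IOMMU_PAGE_RESP_SUCCESS;
	}

	...
	domain->iopf_handler = my_iopf_handler;
	domain->fault_data = my_data;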

This also prepares the SVA implementation to be the first consumer of
the per-domain page fault handling model. The I/O page fault handler
for SVA is copied to the SVA file with mmget_not_zero() added before
mmap_read_lock().

Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
---
include/linux/iommu.h | 3 ++
drivers/iommu/iommu-sva-lib.h | 8 +++++
drivers/iommu/io-pgfault.c | 7 +++++
drivers/iommu/iommu-sva-lib.c | 58 +++++++++++++++++++++++++++++++++++
drivers/iommu/iommu.c | 4 +++
5 files changed, 80 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 790bdc007f54..bd6331a4dbae 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -98,6 +98,9 @@ struct iommu_domain {
unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */
struct iommu_domain_geometry geometry;
struct iommu_dma_cookie *iova_cookie;
+ enum iommu_page_response_code (*iopf_handler)(struct iommu_fault *fault,
+ void *data);
+ void *fault_data;
union {
struct {
iommu_fault_handler_t handler;
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 8909ea1094e3..1b3ace4b5863 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -26,6 +26,8 @@ int iopf_queue_flush_dev(struct device *dev);
struct iopf_queue *iopf_queue_alloc(const char *name);
void iopf_queue_free(struct iopf_queue *queue);
int iopf_queue_discard_partial(struct iopf_queue *queue);
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data);

#else /* CONFIG_IOMMU_SVA */
static inline int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
@@ -63,5 +65,11 @@ static inline int iopf_queue_discard_partial(struct iopf_queue *queue)
{
return -ENODEV;
}
+
+static inline enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ return IOMMU_PAGE_RESP_INVALID;
+}
#endif /* CONFIG_IOMMU_SVA */
#endif /* _IOMMU_SVA_LIB_H */
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index 1df8c1dcae77..aee9e033012f 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -181,6 +181,13 @@ static void iopf_handle_group(struct work_struct *work)
* request completes, outstanding faults will have been dealt with by the time
* the PASID is freed.
*
+ * Any valid page fault will be eventually routed to an iommu domain and the
+ * page fault handler installed there will get called. The users of this
+ * handling framework should guarantee that the iommu domain could only be
+ * freed after the device has stopped generating page faults (or the iommu
+ * hardware has been set to block the page faults) and the pending page faults
+ * have been flushed.
+ *
* Return: 0 on success and <0 on error.
*/
int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index f5a9adde7491..79a9d43bdfe1 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -173,3 +173,61 @@ u32 iommu_sva_get_pasid(struct iommu_sva *handle)
return domain->mm->pasid;
}
EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+/*
+ * I/O page fault handler for SVA
+ */
+enum iommu_page_response_code
+iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
+{
+ vm_fault_t ret;
+ struct vm_area_struct *vma;
+ struct mm_struct *mm = data;
+ unsigned int access_flags = 0;
+ unsigned int fault_flags = FAULT_FLAG_REMOTE;
+ struct iommu_fault_page_request *prm = &fault->prm;
+ enum iommu_page_response_code status = IOMMU_PAGE_RESP_INVALID;
+
+ if (!(prm->flags & IOMMU_FAULT_PAGE_REQUEST_PASID_VALID))
+ return status;
+
+ if (!mmget_not_zero(mm))
+ return status;
+
+ mmap_read_lock(mm);
+
+ vma = find_extend_vma(mm, prm->addr);
+ if (!vma)
+ /* Unmapped area */
+ goto out_put_mm;
+
+ if (prm->perm & IOMMU_FAULT_PERM_READ)
+ access_flags |= VM_READ;
+
+ if (prm->perm & IOMMU_FAULT_PERM_WRITE) {
+ access_flags |= VM_WRITE;
+ fault_flags |= FAULT_FLAG_WRITE;
+ }
+
+ if (prm->perm & IOMMU_FAULT_PERM_EXEC) {
+ access_flags |= VM_EXEC;
+ fault_flags |= FAULT_FLAG_INSTRUCTION;
+ }
+
+ if (!(prm->perm & IOMMU_FAULT_PERM_PRIV))
+ fault_flags |= FAULT_FLAG_USER;
+
+ if (access_flags & ~vma->vm_flags)
+ /* Access fault */
+ goto out_put_mm;
+
+ ret = handle_mm_fault(vma, prm->addr, fault_flags, NULL);
+ status = ret & VM_FAULT_ERROR ? IOMMU_PAGE_RESP_INVALID :
+ IOMMU_PAGE_RESP_SUCCESS;
+
+out_put_mm:
+ mmap_read_unlock(mm);
+ mmput(mm);
+
+ return status;
+}
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c77860695de3..3ad32aa7523a 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,6 +29,8 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>

+#include "iommu-sva-lib.h"
+
static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);

@@ -3307,6 +3309,8 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev,
domain->type = IOMMU_DOMAIN_SVA;
mmgrab(mm);
domain->mm = mm;
+ domain->iopf_handler = iommu_sva_handle_iopf;
+ domain->fault_data = mm;

return domain;
}
--
2.25.1

2022-08-17 02:19:37

by Baolu Lu

Subject: [PATCH v11 13/13] iommu: Rename iommu-sva-lib.{c,h}

Rename iommu-sva-lib.c[h] to iommu-sva.c[h] as they contain all the
code for the SVA implementation in the iommu core.

Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Jean-Philippe Brucker <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Jason Gunthorpe <[email protected]>
---
drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} | 6 +++---
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 2 +-
drivers/iommu/intel/iommu.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/iommu/io-pgfault.c | 2 +-
drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} | 2 +-
drivers/iommu/iommu.c | 2 +-
drivers/iommu/Makefile | 2 +-
9 files changed, 11 insertions(+), 11 deletions(-)
rename drivers/iommu/{iommu-sva-lib.h => iommu-sva.h} (95%)
rename drivers/iommu/{iommu-sva-lib.c => iommu-sva.c} (99%)

diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva.h
similarity index 95%
rename from drivers/iommu/iommu-sva-lib.h
rename to drivers/iommu/iommu-sva.h
index 1b3ace4b5863..7215a761b962 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva.h
@@ -2,8 +2,8 @@
/*
* SVA library for IOMMU drivers
*/
-#ifndef _IOMMU_SVA_LIB_H
-#define _IOMMU_SVA_LIB_H
+#ifndef _IOMMU_SVA_H
+#define _IOMMU_SVA_H

#include <linux/ioasid.h>
#include <linux/mm_types.h>
@@ -72,4 +72,4 @@ iommu_sva_handle_iopf(struct iommu_fault *fault, void *data)
return IOMMU_PAGE_RESP_INVALID;
}
#endif /* CONFIG_IOMMU_SVA */
-#endif /* _IOMMU_SVA_LIB_H */
+#endif /* _IOMMU_SVA_H */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 27a56ae6fff8..553cd44773e1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -10,7 +10,7 @@
#include <linux/slab.h>

#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"
#include "../../io-pgtable-arm.h"

struct arm_smmu_mmu_notifier {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0dd31466401d..6dd57b6f9050 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -31,7 +31,7 @@
#include <linux/amba/bus.h>

#include "arm-smmu-v3.h"
-#include "../../iommu-sva-lib.h"
+#include "../../iommu-sva.h"

static bool disable_bypass = true;
module_param(disable_bypass, bool, 0444);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 804bfae54d89..17aa07e9799c 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -27,7 +27,7 @@

#include "iommu.h"
#include "../irq_remapping.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
#include "pasid.h"
#include "cap_audit.h"

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index af4dbcdbe380..d07b16918e4c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -24,7 +24,7 @@
#include "iommu.h"
#include "pasid.h"
#include "perf.h"
-#include "../iommu-sva-lib.h"
+#include "../iommu-sva.h"
#include "trace.h"

static irqreturn_t prq_event_thread(int irq, void *d);
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c
index d1c522f4ab34..7a60b123e6b9 100644
--- a/drivers/iommu/io-pgfault.c
+++ b/drivers/iommu/io-pgfault.c
@@ -11,7 +11,7 @@
#include <linux/slab.h>
#include <linux/workqueue.h>

-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"

/**
* struct iopf_queue - IO Page Fault queue
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva.c
similarity index 99%
rename from drivers/iommu/iommu-sva-lib.c
rename to drivers/iommu/iommu-sva.c
index 79a9d43bdfe1..d9a3a6336e51 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva.c
@@ -6,7 +6,7 @@
#include <linux/sched/mm.h>
#include <linux/iommu.h>

-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"

static DEFINE_MUTEX(iommu_sva_lock);
static DECLARE_IOASID_SET(iommu_sva_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3ad32aa7523a..be4c652871d8 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,7 @@
#include <trace/events/iommu.h>
#include <linux/sched/mm.h>

-#include "iommu-sva-lib.h"
+#include "iommu-sva.h"

static struct kset *iommu_group_kset;
static DEFINE_IDA(iommu_group_ida);
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 44475a9b3eea..c1763476162b 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -27,6 +27,6 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
-obj-$(CONFIG_IOMMU_SVA) += iommu-sva-lib.o io-pgfault.o
+obj-$(CONFIG_IOMMU_SVA) += iommu-sva.o io-pgfault.o
obj-$(CONFIG_SPRD_IOMMU) += sprd-iommu.o
obj-$(CONFIG_APPLE_DART) += apple-dart.o
--
2.25.1

2022-08-17 21:27:30

by Bjorn Helgaas

Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
> Some configurations of the PCI fabric will route device originated TLP
> packets based on the memory addresses.

This makes it sound like a few unusual configurations will route TLPs
based on memory addresses, but address routing is the default for all
PCIe Memory Requests, and ACS provides a way to override that default.

> These configurations are incompatible with PASID as the PASID
> packets form a distinct address space.

I would say "the Requester ID/PASID combination forms a distinct
address space."

> For instance, any configuration where switches are present
> without ACS enabled is incompatible.
>
> This enhances the pci_enable_pasid() interface by requiring the ACS to
> support Source Validation, Request Redirection, Completer Redirection,
> and Upstream Forwarding. This effectively means that devices cannot
> spoof their requester ID, requests and completions cannot be redirected,
> and all transactions are forwarded upstream, even as it passes through a
> bridge where the target device is downstream.

I think your patch actually requires all those features to be not just
"supported" but actually *enabled* for the entire path leading to the
device.

To use the terms from the spec:

"P2P Request Redirect"
"P2P Completion Redirect"
"Requester ID, Requests, and Completions"

and maybe something like:

... even if the TLP looks like a P2P Request because its memory
address (ignoring the PASID) would fall in a bridge window and would
normally be routed downstream.

Does the PCIe spec really allow TLPs with PASID to be routed anywhere
except upstream? It seems nonsensical to route them downstream, and
hardware should be able to check that easily. But I took a quick look
through the spec and didn't see anything about PASID by itself
influencing routing.

> Suggested-by: Jason Gunthorpe <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/pci/ats.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index c967ad6e2626..0715e48e7973 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -382,6 +382,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>         if (!pasid)
>                 return -EINVAL;
>
> +       if (!pci_acs_path_enabled(pdev, NULL,
> +                                 PCI_ACS_SV | PCI_ACS_RR |
> +                                 PCI_ACS_CR | PCI_ACS_UF))
> +               return -EINVAL;
> +
>        pci_read_config_word(pdev, pasid + PCI_PASID_CAP, &supported);
>        supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV;
>
> --
> 2.25.1
>

2022-08-17 22:54:09

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On Wed, Aug 17, 2022 at 04:17:43PM -0500, Bjorn Helgaas wrote:

> Does the PCIe spec really allow TLPs with PASID to be routed anywhere
> except upstream?

I think yes:

2.2.10.2 End-End TLP Prefix Processing:

The presence of an End-End TLP Prefix does not alter the routing of a
TLP. TLPs are routed based on the routing rules covered in Section
2.2.4 .

Which I read as saying that routing is done after stripping all the
prefixes. PASID is a prefix.

Lu, you may want to quote the spec in the commit message to make it
clear.

A strange choice in my opinion, but there it is..

Jason

2022-08-18 12:24:54

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

Hi Bjorn,

On 2022/8/18 05:17, Bjorn Helgaas wrote:
> On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
>> Some configurations of the PCI fabric will route device originated TLP
>> packets based on the memory addresses.
> This makes it sound like a few unusual configurations will route TLPs
> based on memory addresses, but address routing is the default for all
> PCIe Memory Requests, and ACS provides a way to override that default.
>
>> These configurations are incompatible with PASID as the PASID
>> packets form a distinct address space.
> I would say "the Requester ID/PASID combination forms a distinct
> address space."
>
>> For instance, any configuration where switches are present
>> without ACS enabled is incompatible.
>>
>> This enhances the pci_enable_pasid() interface by requiring the ACS to
>> support Source Validation, Request Redirection, Completer Redirection,
>> and Upstream Forwarding. This effectively means that devices cannot
>> spoof their requester ID, requests and completions cannot be redirected,
>> and all transactions are forwarded upstream, even as it passes through a
>> bridge where the target device is downstream.
> I think your patch actually requires all those features to be not just
> "supported" but actually*enabled* for the entire path leading to the
> device.
>
> To use the terms from the spec:
>
> "P2P Request Redirect"
> "P2P Completion Redirect"
> "Requester ID, Requests, and Completions"
>
> and maybe something like:
>
> ... even if the TLP looks like a P2P Request because its memory
> address (ignoring the PASID) would fall in a bridge window and would
> normally be routed downstream.

Thank you for the suggestions. I will rephrase the commit message
accordingly like this:


PCI: Allow PASID only when ACS enforced on upstreaming path

The PCIe fabric routes TLPs based on memory addresses for all PCIe Memory
Requests regardless of whether TLPs have PASID prefixes. This is stated in
section "2.2.10.2 End-End TLP Prefix Processing" of the specification:

The presence of an End-End TLP Prefix does not alter the routing of a
TLP. TLPs are routed based on the routing rules covered in Section
2.2.4 .

As the Requester ID/PASID combination forms a distinct address space,
memory address based routing is no longer compatible with PASID TLPs.
Therefore we have to rely on ACS to override that default.

This enhances the pci_enable_pasid() interface by requiring the ACS
features to be enabled for the entire path leading to the device, so that
a TLP is forwarded upstream even if it looks like a P2P Request because
its memory address (ignoring the PASID) would fall in a bridge window and
would normally be routed downstream.

Best regards,
baolu

2022-08-18 12:42:47

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On 2022/8/18 06:48, Jason Gunthorpe wrote:
> On Wed, Aug 17, 2022 at 04:17:43PM -0500, Bjorn Helgaas wrote:
>
>> Does the PCIe spec really allow TLPs with PASID to be routed anywhere
>> except upstream?
> I think yes:
>
> 2.2.10.2 End-End TLP Prefix Processing:
>
> The presence of an End-End TLP Prefix does not alter the routing of a
> TLP. TLPs are routed based on the routing rules covered in Section
> 2.2.4 .
>
> Which I read as saying that routing is done after stripping all the
> prefixes. PASID is a prefix.
>
> Lu, you may want to quote the spec in the commit message to make it
> clear.

Yes. Sure thing. Thank you!

Best regards,
baolu

2022-08-18 13:28:06

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
> Some configurations of the PCI fabric will route device originated TLP
> packets based on the memory addresses. These configurations are
> incompatible with PASID as the PASID packets form a distinct address
> space. For instance, any configuration where switches are present
> without ACS enabled is incompatible.
>
> This enhances the pci_enable_pasid() interface by requiring the ACS to
> support Source Validation, Request Redirection, Completer Redirection,
> and Upstream Forwarding. This effectively means that devices cannot
> spoof their requester ID, requests and completions cannot be redirected,
> and all transactions are forwarded upstream, even as it passes through a
> bridge where the target device is downstream.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Suggested-by: Kevin Tian <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/pci/ats.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
> index c967ad6e2626..0715e48e7973 100644
> --- a/drivers/pci/ats.c
> +++ b/drivers/pci/ats.c
> @@ -382,6 +382,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>        if (!pasid)
>                return -EINVAL;
>
> +      if (!pci_acs_path_enabled(pdev, NULL,
> +                                PCI_ACS_SV | PCI_ACS_RR |
> +                                PCI_ACS_CR | PCI_ACS_UF))

I think we only need RR and UF here?

Source Validation causes the switch to validate the requestor RID in
each TLP which has nothing to do with address based routing

Completion Redirect changes how RID routing works, and has nothing to
do with address based routing.

Yes, both of those are usually set for virtualization scenarios but we
shouldn't check them here as a basic requirement to enable PASID.

Jason

2022-08-18 13:41:54

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v11 07/13] iommu/vt-d: Add SVA domain support

On Wed, Aug 17, 2022 at 09:20:18AM +0800, Lu Baolu wrote:

> +static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
> +                                   struct device *dev, ioasid_t pasid)
> +{
> +        struct device_domain_info *info = dev_iommu_priv_get(dev);
> +        struct intel_iommu *iommu = info->iommu;
> +        struct iommu_sva *sva;
> +        int ret = 0;
> +
> +        mutex_lock(&pasid_mutex);
> +        /*
> +         * Detach the domain if a blocking domain is set. Check the
> +         * right domain type once the IOMMU driver supports a real
> +         * blocking domain.
> +         */
> +        if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
> +                intel_svm_unbind_mm(dev, pasid);

See, I think this is exactly the wrong way to use the ops

The blocking domain ops should have its own function that just
unconditionally calls intel_svm_unbind_mm()

> +        } else {
> +                struct mm_struct *mm = domain->mm;
> +
> +                sva = intel_svm_bind_mm(iommu, dev, mm);
> +                if (IS_ERR(sva))
> +                        ret = PTR_ERR(sva);

And similarly the SVA domain should have its own op that does this SVM
call.

Muxing the ops with tests on the domain is an anti-pattern. In fact I
would say any time you see an op testing the domain->type it is very
suspicious.

Jason

2022-08-18 13:52:11

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v11 05/13] iommu: Add attach/detach_dev_pasid iommu interface

On Wed, Aug 17, 2022 at 09:20:16AM +0800, Lu Baolu wrote:

> +static int __iommu_set_group_pasid(struct iommu_domain *domain,
> +                                   struct iommu_group *group, ioasid_t pasid)
> +{
> +        struct iommu_domain *ops_domain;
> +        struct group_device *device;
> +        int ret = 0;
> +
> +        if (domain == group->blocking_domain)
> +                ops_domain = xa_load(&group->pasid_array, pasid);
> +        else
> +                ops_domain = domain;

This seems weird, why isn't this just always

domain->ops->set_dev_pasid()?

> +        if (curr) {
> +                ret = xa_err(curr) ? : -EBUSY;
> +                goto out_unlock;
> +        }
> +
> +        ret = __iommu_set_group_pasid(domain, group, pasid);
> +        if (ret) {
> +                __iommu_set_group_pasid(group->blocking_domain, group, pasid);
> +                xa_erase(&group->pasid_array, pasid);

I was looking at this trying to figure out why we are having
attach/detach semantics vs set and this error handling seems to be the
reason

Lets add a comment because it is subtle thing:

Setting a PASID to a blocking domain cannot fail, so we can always
safely error unwind a failure to attach a domain back to the original
group configuration of the PASID being unused.

> +/*
> + * iommu_detach_device_pasid() - Detach the domain from pasid of device
> + * @domain: the iommu domain.
> + * @dev: the attached device.
> + * @pasid: the pasid of the device.
> + *
> + * The @domain must have been attached to @pasid of the @dev with
> + * iommu_attach_device_pasid().
> + */
> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
> +                               ioasid_t pasid)

Don't pass domain here?

> +/*
> + * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
> + * @dev: the queried device
> + * @pasid: the pasid of the device
> + *
> + * This is a variant of iommu_get_domain_for_dev(). It returns the existing
> + * domain attached to pasid of a device. It's only for internal use of the
> + * IOMMU subsystem. The caller must take care to avoid any possible
> + * use-after-free case.

How exactly does the caller manage that?

> + *
> + * Return: attached domain on success, NULL otherwise.
> + */
> +struct iommu_domain *
> +iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
> +{
> +        struct iommu_domain *domain;
> +        struct iommu_group *group;
> +
> +        if (!pasid_valid(pasid))
> +                return NULL;

Why bother? If the pasid is not valid then it definitely won't be in the xarray.

But otherwise this overall thing seems fine to me

Jason

2022-08-18 13:56:26

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v11 09/13] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

On Wed, Aug 17, 2022 at 09:20:20AM +0800, Lu Baolu wrote:
> +
> +/**
> + * iommu_sva_bind_device() - Bind a process address space to a device
> + * @dev: the device
> + * @mm: the mm to bind, caller must hold a reference to mm_users
> + *
> + * Create a bond between device and address space, allowing the device to access
> + * the mm using the returned PASID. If a bond already exists between @device and
> + * @mm, it is returned and an additional reference is taken. Caller must call
> + * iommu_sva_unbind_device() to release each reference.
> + *
> + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
> + * initialize the required SVA features.
> + *
> + * On error, returns an ERR_PTR value.
> + */
> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
> +{
> +        struct iommu_domain *domain;
> +        struct iommu_sva *bond;

This is called handle below, pick one name please

> +        ioasid_t max_pasids;
> +        int ret;
> +
> +        max_pasids = dev->iommu->max_pasids;
> +        if (!max_pasids)
> +                return ERR_PTR(-EOPNOTSUPP);
> +
> +        /* Allocate mm->pasid if necessary. */
> +        ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
> +        if (ret)
> +                return ERR_PTR(ret);
> +
> +        bond = kzalloc(sizeof(*bond), GFP_KERNEL);
> +        if (!bond)
> +                return ERR_PTR(-ENOMEM);
> +
> +        mutex_lock(&iommu_sva_lock);
> +        /* Search for an existing domain. */
> +        domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
> +        if (domain) {

This isn't safe, or sane. A driver could have attached something to
this PASID that is not a SVA domain and thus not protected by the
iommu_sva_lock.

At a minimum you should add a type match to
iommu_get_domain_for_dev_pasid(), eg to confirm it is a SVA domain and
do that check under the xa_lock of the pasid xarray.

And then the general idea is that SVA domain attach/detach must hold
this janky global lock.

> +        refcount_inc(&domain->users);

This atomic is always processed under the iommu_sva_lock, so it
doesn't need to be an atomic anymore.

Otherwise this design looks OK to me too

Jason

2022-08-18 23:03:33

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On Thu, Aug 18, 2022 at 07:53:15PM +0800, Baolu Lu wrote:
> On 2022/8/18 05:17, Bjorn Helgaas wrote:
> > On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
> > > Some configurations of the PCI fabric will route device originated TLP
> > > packets based on the memory addresses.
> > This makes it sound like a few unusual configurations will route TLPs
> > based on memory addresses, but address routing is the default for all
> > PCIe Memory Requests, and ACS provides a way to override that default.
> >
> > > These configurations are incompatible with PASID as the PASID
> > > packets form a distinct address space.
> > I would say "the Requester ID/PASID combination forms a distinct
> > address space."
> >
> > > For instance, any configuration where switches are present
> > > without ACS enabled is incompatible.
> > >
> > > This enhances the pci_enable_pasid() interface by requiring the ACS to
> > > support Source Validation, Request Redirection, Completer Redirection,
> > > and Upstream Forwarding. This effectively means that devices cannot
> > > spoof their requester ID, requests and completions cannot be redirected,
> > > and all transactions are forwarded upstream, even as it passes through a
> > > bridge where the target device is downstream.
> >
> > I think your patch actually requires all those features to be not just
> > "supported" but actually*enabled* for the entire path leading to the
> > device.
> >
> > To use the terms from the spec:
> >
> > "P2P Request Redirect"
> > "P2P Completion Redirect"
> > "Requester ID, Requests, and Completions"
> >
> > and maybe something like:
> >
> > ... even if the TLP looks like a P2P Request because its memory
> > address (ignoring the PASID) would fall in a bridge window and would
> > normally be routed downstream.
>
> Thank you for the suggestions. I will rephrase the commit message
> accordingly like this:
>
>
> PCI: Allow PASID only when ACS enforced on upstreaming path

PCI: Enable PASID only when ACS RR & UF enabled on upstream path

The Requester ID/Process Address Space ID (PASID) combination
identifies an address space distinct from the PCI bus address space,
e.g., an address space defined by an IOMMU.

But the PCIe fabric routes Memory Requests based on the TLP address,
ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
*should* go upstream to the IOMMU may instead be routed as a P2P
Request if its address falls in a bridge window.

To ensure that all Memory Requests with PASID are routed upstream,
only enable PASID if ACS P2P Request Redirect and Upstream Forwarding
are enabled for the path leading to the device.

> The PCIe fabric routes TLPs based on memory addresses for all PCIe Memory
> Requests regardless of whether TLPs have PASID prefixes. This is stated in
> section "2.2.10.2 End-End TLP Prefix Processing" of the specification:
>
> The presence of an End-End TLP Prefix does not alter the routing of a
> TLP. TLPs are routed based on the routing rules covered in Section
> 2.2.4 .
>
> As the Requester ID/PASID combination forms a distinct address space,
> memory address based routing is no longer compatible with PASID TLPs.
> Therefore we have to rely on ACS to override that default.
>
> This enhances the pci_enable_pasid() interface by requiring the ACS
> features to be enabled for the entire path leading to the device, so that
> a TLP is forwarded upstream even if it looks like a P2P Request because
> its memory address (ignoring the PASID) would fall in a bridge window and
> would normally be routed downstream.
>
> Best regards,
> baolu

2022-08-22 05:10:57

by zhangfei

[permalink] [raw]
Subject: Re: [PATCH v11 00/13] iommu: SVA and IOPF refactoring



On 2022/8/17 9:20 AM, Lu Baolu wrote:
> Hi folks,
>
> The former part of this series introduces the IOMMU interfaces to attach
> or detach an iommu domain to/from a pasid of a device, and refactors the
> existing IOMMU SVA implementation by assigning an SVA type of iommu
> domain to a shared virtual address and replacing sva_bind/unbind iommu
> ops with a set_dev_pasid domain ops.
>
> The latter part changes the existing I/O page fault handling framework
> from only serving SVA to a generic one. Any driver or component could
> handle the I/O page faults for its domain in its own way by installing
> an I/O page fault handler.
>
> This series has been functionally tested on an x86 machine and compile
> tested for all architectures.
>
> This series is also available on github:
> [2] https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v11
>
> Please review and suggest.
Tested-by: Zhangfei Gao <[email protected]>
On arm64 (Kunpeng920) with uacce.

Thanks

2022-08-22 07:45:44

by Ethan Zhao

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path


On 2022/8/19 7:00, Bjorn Helgaas wrote:
> On Thu, Aug 18, 2022 at 07:53:15PM +0800, Baolu Lu wrote:
>> On 2022/8/18 05:17, Bjorn Helgaas wrote:
>>> On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
>>>> Some configurations of the PCI fabric will route device originated TLP
>>>> packets based on the memory addresses.
>>> This makes it sound like a few unusual configurations will route TLPs
>>> based on memory addresses, but address routing is the default for all
>>> PCIe Memory Requests, and ACS provides a way to override that default.
>>>
>>>> These configurations are incompatible with PASID as the PASID
>>>> packets form a distinct address space.
>>> I would say "the Requester ID/PASID combination forms a distinct
>>> address space."
>>>
>>>> For instance, any configuration where switches are present
>>>> without ACS enabled is incompatible.
>>>>
>>>> This enhances the pci_enable_pasid() interface by requiring the ACS to
>>>> support Source Validation, Request Redirection, Completer Redirection,
>>>> and Upstream Forwarding. This effectively means that devices cannot
>>>> spoof their requester ID, requests and completions cannot be redirected,
>>>> and all transactions are forwarded upstream, even as it passes through a
>>>> bridge where the target device is downstream.
>>> I think your patch actually requires all those features to be not just
>>> "supported" but actually*enabled* for the entire path leading to the
>>> device.
>>>
>>> To use the terms from the spec:
>>>
>>> "P2P Request Redirect"
>>> "P2P Completion Redirect"
>>> "Requester ID, Requests, and Completions"
>>>
>>> and maybe something like:
>>>
>>> ... even if the TLP looks like a P2P Request because its memory
>>> address (ignoring the PASID) would fall in a bridge window and would
>>> normally be routed downstream.
>> Thank you for the suggestions. I will rephrase the commit message
>> accordingly like this:
>>
>>
>> PCI: Allow PASID only when ACS enforced on upstreaming path
> PCI: Enable PASID only when ACS RR & UF enabled on upstream path
>
> The Requester ID/Process Address Space ID (PASID) combination
> identifies an address space distinct from the PCI bus address space,
> e.g., an address space defined by an IOMMU.
>
> But the PCIe fabric routes Memory Requests based on the TLP address,
> ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
> *should* go upstream to the IOMMU may instead be routed as a P2P
> Request if its address falls in a bridge window.
>
> To ensure that all Memory Requests with PASID are routed upstream,
> only enable PASID if ACS P2P Request Redirect and Upstream Forwarding
> are enabled for the path leading to the device.

Seeing these comments, my questions are gone.

Thanks, Bjorn!

>> The PCIe fabric routes TLPs based on memory addresses for all PCIe Memory
>> Requests regardless of whether TLPs have PASID prefixes. This is stated in
>> section "2.2.10.2 End-End TLP Prefix Processing" of the specification:
>>
>> The presence of an End-End TLP Prefix does not alter the routing of a
>> TLP. TLPs are routed based on the routing rules covered in Section
>> 2.2.4 .
>>
>> As the Requester ID/PASID combination forms a distinct address space,
>> memory address based routing is no longer compatible with PASID TLPs.
>> Therefore we have to rely on ACS to override that default.
>>
>> This enhances the pci_enable_pasid() interface by requiring the ACS
>> features to be enabled for the entire path leading to the device, so that
>> a TLP is forwarded upstream even if it looks like a P2P Request because
>> its memory address (ignoring the PASID) would fall in a bridge window and
>> would normally be routed downstream.
>>
>> Best regards,
>> baolu

--
"firm, enduring, strong, and long-lived"

2022-08-23 07:19:13

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On 2022/8/18 21:04, Jason Gunthorpe wrote:
> On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
>> Some configurations of the PCI fabric will route device originated TLP
>> packets based on the memory addresses. These configurations are
>> incompatible with PASID as the PASID packets form a distinct address
>> space. For instance, any configuration where switches are present
>> without ACS enabled is incompatible.
>>
>> This enhances the pci_enable_pasid() interface by requiring the ACS to
>> support Source Validation, Request Redirection, Completer Redirection,
>> and Upstream Forwarding. This effectively means that devices cannot
>> spoof their requester ID, requests and completions cannot be redirected,
>> and all transactions are forwarded upstream, even as it passes through a
>> bridge where the target device is downstream.
>>
>> Suggested-by: Jason Gunthorpe <[email protected]>
>> Suggested-by: Kevin Tian <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>> drivers/pci/ats.c | 5 +++++
>> 1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
>> index c967ad6e2626..0715e48e7973 100644
>> --- a/drivers/pci/ats.c
>> +++ b/drivers/pci/ats.c
>> @@ -382,6 +382,11 @@ int pci_enable_pasid(struct pci_dev *pdev, int features)
>>        if (!pasid)
>>                return -EINVAL;
>>
>> +      if (!pci_acs_path_enabled(pdev, NULL,
>> +                                PCI_ACS_SV | PCI_ACS_RR |
>> +                                PCI_ACS_CR | PCI_ACS_UF))
> I think we only need RR and UF here?
>
> Source Validation causes the switch to validate the requestor RID in
> each TLP which has nothing to do with address based routing
>
> Completion Redirect changes how RID routing works, and has nothing to
> do with address based routing.
>
> Yes, both of those are usually set for virtualization scenarios but we
> shouldn't check them here as a basic requirement to enable PASID.

Yes. Only RR and UF are required here.
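
For reference, the reduced check would look something like this (a
sketch of the direction only; the final hunk may differ):

        /* Only the address-routing overrides matter here: RR and UF. */
        if (!pci_acs_path_enabled(pdev, NULL, PCI_ACS_RR | PCI_ACS_UF))
                return -EINVAL;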

Best regards,
baolu

2022-08-23 07:58:12

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On 2022/8/19 07:00, Bjorn Helgaas wrote:
> On Thu, Aug 18, 2022 at 07:53:15PM +0800, Baolu Lu wrote:
>> On 2022/8/18 05:17, Bjorn Helgaas wrote:
>>> On Wed, Aug 17, 2022 at 09:20:15AM +0800, Lu Baolu wrote:
>>>> Some configurations of the PCI fabric will route device originated TLP
>>>> packets based on the memory addresses.
>>> This makes it sound like a few unusual configurations will route TLPs
>>> based on memory addresses, but address routing is the default for all
>>> PCIe Memory Requests, and ACS provides a way to override that default.
>>>
>>>> These configurations are incompatible with PASID as the PASID
>>>> packets form a distinct address space.
>>> I would say "the Requester ID/PASID combination forms a distinct
>>> address space."
>>>
>>>> For instance, any configuration where switches are present
>>>> without ACS enabled is incompatible.
>>>>
>>>> This enhances the pci_enable_pasid() interface by requiring the ACS to
>>>> support Source Validation, Request Redirection, Completer Redirection,
>>>> and Upstream Forwarding. This effectively means that devices cannot
>>>> spoof their requester ID, requests and completions cannot be redirected,
>>>> and all transactions are forwarded upstream, even as it passes through a
>>>> bridge where the target device is downstream.
>>> I think your patch actually requires all those features to be not just
>>> "supported" but actually*enabled* for the entire path leading to the
>>> device.
>>>
>>> To use the terms from the spec:
>>>
>>> "P2P Request Redirect"
>>> "P2P Completion Redirect"
>>> "Requester ID, Requests, and Completions"
>>>
>>> and maybe something like:
>>>
>>> ... even if the TLP looks like a P2P Request because its memory
>>> address (ignoring the PASID) would fall in a bridge window and would
>>> normally be routed downstream.
>> Thank you for the suggestions. I will rephrase the commit message
>> accordingly like this:
>>
>>
>> PCI: Allow PASID only when ACS enforced on upstreaming path
> PCI: Enable PASID only when ACS RR & UF enabled on upstream path
>
> The Requester ID/Process Address Space ID (PASID) combination
> identifies an address space distinct from the PCI bus address space,
> e.g., an address space defined by an IOMMU.
>
> But the PCIe fabric routes Memory Requests based on the TLP address,
> ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
> *should* go upstream to the IOMMU may instead be routed as a P2P
> Request if its address falls in a bridge window.
>
> To ensure that all Memory Requests with PASID are routed upstream,
> only enable PASID if ACS P2P Request Redirect and Upstream Forwarding
> are enabled for the path leading to the device.

Yours is clear and straightforward. I will update the patch with the above.
Thank you, much appreciated!

Best regards,
baolu

2022-08-23 08:00:09

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 00/13] iommu: SVA and IOPF refactoring

On 2022/8/22 12:49, Zhangfei Gao wrote:
> On 2022/8/17 9:20 AM, Lu Baolu wrote:
>> Hi folks,
>>
>> The former part of this series introduces the IOMMU interfaces to attach
>> or detach an iommu domain to/from a pasid of a device, and refactors the
>> existing IOMMU SVA implementation by assigning an SVA type of iommu
>> domain to a shared virtual address and replacing sva_bind/unbind iommu
>> ops with a set_dev_pasid domain ops.
>>
>> The latter part changes the existing I/O page fault handling framework
>> from only serving SVA to a generic one. Any driver or component could
>> handle the I/O page faults for its domain in its own way by installing
>> an I/O page fault handler.
>>
>> This series has been functionally tested on an x86 machine and compile
>> tested for all architectures.
>>
>> This series is also available on github:
>> [2]
>> https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v11
>>
>> Please review and suggest.
> Tested-by: Zhangfei Gao <[email protected]>
> On arm64 (Kunpeng920) with uacce.

Thank you very much!

Best regards,
baolu

2022-08-23 08:08:05

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 07/13] iommu/vt-d: Add SVA domain support

On 2022/8/18 21:36, Jason Gunthorpe wrote:
> On Wed, Aug 17, 2022 at 09:20:18AM +0800, Lu Baolu wrote:
>
>> +static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
>> +                                   struct device *dev, ioasid_t pasid)
>> +{
>> +        struct device_domain_info *info = dev_iommu_priv_get(dev);
>> +        struct intel_iommu *iommu = info->iommu;
>> +        struct iommu_sva *sva;
>> +        int ret = 0;
>> +
>> +        mutex_lock(&pasid_mutex);
>> +        /*
>> +         * Detach the domain if a blocking domain is set. Check the
>> +         * right domain type once the IOMMU driver supports a real
>> +         * blocking domain.
>> +         */
>> +        if (!domain || domain->type == IOMMU_DOMAIN_UNMANAGED) {
>> +                intel_svm_unbind_mm(dev, pasid);
>
> See, I think this is exactly the wrong way to use the ops
>
> The blocking domain ops should have its own function that just
> unconditionally calls intel_svm_unbind_mm()
>
>> +        } else {
>> +                struct mm_struct *mm = domain->mm;
>> +
>> +                sva = intel_svm_bind_mm(iommu, dev, mm);
>> +                if (IS_ERR(sva))
>> +                        ret = PTR_ERR(sva);
>
> And similarly the SVA domain should have its own op that does this SVM
> call.
>
> Muxing the ops with tests on the domain is an anti-pattern. In fact I
> would say any time you see an op testing the domain->type it is very
> suspicious.

Both agreed. Will fix them in the next version.
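
Roughly like this, i.e. one set_dev_pasid op per domain type (a sketch
only; the blocking-side naming is illustrative until a real blocking
domain exists):

static int intel_svm_set_dev_pasid(struct iommu_domain *domain,
                                   struct device *dev, ioasid_t pasid)
{
        struct device_domain_info *info = dev_iommu_priv_get(dev);
        struct iommu_sva *sva;

        /* SVA domain op: bind unconditionally, no domain type tests. */
        sva = intel_svm_bind_mm(info->iommu, dev, domain->mm);

        return IS_ERR(sva) ? PTR_ERR(sva) : 0;
}

static int intel_blocking_set_dev_pasid(struct iommu_domain *domain,
                                        struct device *dev, ioasid_t pasid)
{
        /* Blocking domain op: unconditionally undo the binding. */
        intel_svm_unbind_mm(dev, pasid);

        return 0;
}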

Best regards,
baolu

2022-08-23 09:02:25

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 05/13] iommu: Add attach/detach_dev_pasid iommu interface

On 2022/8/18 21:33, Jason Gunthorpe wrote:
> On Wed, Aug 17, 2022 at 09:20:16AM +0800, Lu Baolu wrote:
>
>> +static int __iommu_set_group_pasid(struct iommu_domain *domain,
>> +                                   struct iommu_group *group, ioasid_t pasid)
>> +{
>> +        struct iommu_domain *ops_domain;
>> +        struct group_device *device;
>> +        int ret = 0;
>> +
>> +        if (domain == group->blocking_domain)
>> +                ops_domain = xa_load(&group->pasid_array, pasid);
>> +        else
>> +                ops_domain = domain;
>
> This seems weird, why isn't this just always
>
> domain->ops->set_dev_pasid()?

Sure. I will fix this in the next version.

>
>> +        if (curr) {
>> +                ret = xa_err(curr) ? : -EBUSY;
>> +                goto out_unlock;
>> +        }
>> +
>> +        ret = __iommu_set_group_pasid(domain, group, pasid);
>> +        if (ret) {
>> +                __iommu_set_group_pasid(group->blocking_domain, group, pasid);
>> +                xa_erase(&group->pasid_array, pasid);
>
> I was looking at this trying to figure out why we are having
> attach/detach semantics vs set and this error handling seems to be the
> reason
>
> Lets add a comment because it is subtle thing:
>
> Setting a PASID to a blocking domain cannot fail, so we can always
> safely error unwind a failure to attach a domain back to the original
> group configuration of the PASID being unused.

Updated.
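
With both changes folded in, the shape becomes roughly the following
(a sketch only, abbreviated from the real code):

static int __iommu_set_group_pasid(struct iommu_domain *domain,
                                   struct iommu_group *group, ioasid_t pasid)
{
        struct group_device *device;
        int ret = 0;

        list_for_each_entry(device, &group->devices, list) {
                /* Always use the ops of the domain being set. */
                ret = domain->ops->set_dev_pasid(domain, device->dev, pasid);
                if (ret)
                        break;
        }

        return ret;
}

and in the attach path:

        ret = __iommu_set_group_pasid(domain, group, pasid);
        if (ret) {
                /*
                 * Setting a PASID to the blocking domain cannot fail,
                 * so we can always safely unwind a failed attach back
                 * to the original configuration of the PASID being
                 * unused.
                 */
                __iommu_set_group_pasid(group->blocking_domain, group, pasid);
                xa_erase(&group->pasid_array, pasid);
        }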

>
>> +/*
>> + * iommu_detach_device_pasid() - Detach the domain from pasid of device
>> + * @domain: the iommu domain.
>> + * @dev: the attached device.
>> + * @pasid: the pasid of the device.
>> + *
>> + * The @domain must have been attached to @pasid of the @dev with
>> + * iommu_attach_device_pasid().
>> + */
>> +void iommu_detach_device_pasid(struct iommu_domain *domain, struct device *dev,
>> +                               ioasid_t pasid)
>
> Don't pass domain here?

It is checked in the function to make sure that the detached domain is
the same one as the previously attached one.
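
For example (illustrative):

        /* The erased entry must be the domain the caller passed in. */
        WARN_ON(xa_erase(&group->pasid_array, pasid) != domain);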

>
>> +/*
>> + * iommu_get_domain_for_dev_pasid() - Retrieve domain for @pasid of @dev
>> + * @dev: the queried device
>> + * @pasid: the pasid of the device
>> + *
>> + * This is a variant of iommu_get_domain_for_dev(). It returns the existing
>> + * domain attached to pasid of a device. It's only for internal use of the
>> + * IOMMU subsystem. The caller must take care to avoid any possible
>> + * use-after-free case.
>
> How exactly does the caller manage that?

"... the returned domain pointer could only be used before detaching
from the device PASID."

>
>> + *
>> + * Return: attached domain on success, NULL otherwise.
>> + */
>> +struct iommu_domain *
>> +iommu_get_domain_for_dev_pasid(struct device *dev, ioasid_t pasid)
>> +{
>> +        struct iommu_domain *domain;
>> +        struct iommu_group *group;
>> +
>> +        if (!pasid_valid(pasid))
>> +                return NULL;
>
> Why bother? If the pasid is not valid then it definitely won't be in the xarray.

Removed.

> But otherwise this overall thing seems fine to me

Thank you!

Best regards,
baolu

2022-08-23 13:23:59

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 09/13] iommu/sva: Refactoring iommu_sva_bind/unbind_device()

On 2022/8/18 21:41, Jason Gunthorpe wrote:
> On Wed, Aug 17, 2022 at 09:20:20AM +0800, Lu Baolu wrote:
>> +
>> +/**
>> + * iommu_sva_bind_device() - Bind a process address space to a device
>> + * @dev: the device
>> + * @mm: the mm to bind, caller must hold a reference to mm_users
>> + *
>> + * Create a bond between device and address space, allowing the device to access
>> + * the mm using the returned PASID. If a bond already exists between @device and
>> + * @mm, it is returned and an additional reference is taken. Caller must call
>> + * iommu_sva_unbind_device() to release each reference.
>> + *
>> + * iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
>> + * initialize the required SVA features.
>> + *
>> + * On error, returns an ERR_PTR value.
>> + */
>> +struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
>> +{
>> +        struct iommu_domain *domain;
>> +        struct iommu_sva *bond;
>
> This is called handle below, pick one name please

Updated.

>
>> +        ioasid_t max_pasids;
>> +        int ret;
>> +
>> +        max_pasids = dev->iommu->max_pasids;
>> +        if (!max_pasids)
>> +                return ERR_PTR(-EOPNOTSUPP);
>> +
>> +        /* Allocate mm->pasid if necessary. */
>> +        ret = iommu_sva_alloc_pasid(mm, 1, max_pasids - 1);
>> +        if (ret)
>> +                return ERR_PTR(ret);
>> +
>> +        bond = kzalloc(sizeof(*bond), GFP_KERNEL);
>> +        if (!bond)
>> +                return ERR_PTR(-ENOMEM);
>> +
>> +        mutex_lock(&iommu_sva_lock);
>> +        /* Search for an existing domain. */
>> +        domain = iommu_get_domain_for_dev_pasid(dev, mm->pasid);
>> +        if (domain) {
>
> This isn't safe, or sane. A driver could have attached something to
> this PASID that is not a SVA domain and thus not protected by the
> iommu_sva_lock.
>
> At a minimum you should add a type match to
> iommu_get_domain_for_dev_pasid(), eg to confirm it is a SVA domain and
> do that check under the xa_lock of the pasid xarray.
>
> And then the general idea is that SVA domain attach/detach must hold
> this janky global lock.

Makes sense. I will add this logic.
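
A sketch of what I have in mind (the final form may differ):

struct iommu_domain *iommu_get_domain_for_dev_pasid(struct device *dev,
                                                    ioasid_t pasid,
                                                    unsigned int type)
{
        struct iommu_group *group = iommu_group_get(dev);
        struct iommu_domain *domain;

        if (!group)
                return NULL;

        /* Look up and type-check the domain under the xa_lock. */
        xa_lock(&group->pasid_array);
        domain = xa_load(&group->pasid_array, pasid);
        if (type && domain && domain->type != type)
                domain = NULL;
        xa_unlock(&group->pasid_array);
        iommu_group_put(group);

        return domain;
}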

>
>> +        refcount_inc(&domain->users);
>
> This atomic is always processed under the iommu_sva_lock, so it
> doesn't need to be an atomic anymore.

Will change it to an integer.

>
> Otherwise this design looks OK to me too

Thank you very much for your suggestions.

Best regards,
baolu

2022-08-24 16:40:08

by Bjorn Helgaas

[permalink] [raw]
Subject: Re: [PATCH v11 04/13] PCI: Allow PASID only when ACS enforced on upstreaming path

On Tue, Aug 23, 2022 at 03:05:53PM +0800, Baolu Lu wrote:
> On 2022/8/19 07:00, Bjorn Helgaas wrote:

> > PCI: Enable PASID only when ACS RR & UF enabled on upstream path
> >
> > The Requester ID/Process Address Space ID (PASID) combination
> > identifies an address space distinct from the PCI bus address space,
> > e.g., an address space defined by an IOMMU.
> >
> > But the PCIe fabric routes Memory Requests based on the TLP address,
> > ignoring any PASID (PCIe r6.0, sec 2.2.10.4), so a TLP with PASID that
> > *should* go upstream to the IOMMU may instead be routed as a P2P
> > Request if its address falls in a bridge window.
> >
> > To ensure that all Memory Requests with PASID are routed upstream,
> > only enable PASID if ACS P2P Request Redirect and Upstream Forwarding
> > are enabled for the path leading to the device.
>
> Yours is clear and straightforward. I will update the patch with the above.
> Thank you, much appreciated!

With the update to only require RR and UF and the commit log update,

Acked-by: Bjorn Helgaas <[email protected]>

Thanks!

2022-08-26 04:47:34

by Baolu Lu

[permalink] [raw]
Subject: Re: [PATCH v11 00/13] iommu: SVA and IOPF refactoring

On 2022/8/17 9:20, Lu Baolu wrote:
> Hi folks,
>
> The former part of this series introduces the IOMMU interfaces to attach
> or detach an iommu domain to/from a pasid of a device, and refactors the
> existing IOMMU SVA implementation by assigning an SVA type of iommu
> domain to a shared virtual address and replacing sva_bind/unbind iommu
> ops with a set_dev_pasid domain ops.
>
> The latter part changes the existing I/O page fault handling framework
> from only serving SVA to a generic one. Any driver or component could
> handle the I/O page faults for its domain in its own way by installing
> an I/O page fault handler.
>
> This series has been functionally tested on an x86 machine and compile
> tested for all architectures.
>
> This series is also available on github:
> [2] https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v11
>
> Please review and suggest.

Thank you all for review and test. I have updated this series and
uploaded a new version at

https://github.com/LuBaolu/intel-iommu/commits/iommu-sva-refactoring-v12

Zhangfei and Tony have tested it on real Intel and arm64 hardware.

I will soon post it for further review.

Best regards,
baolu