2021-05-10 22:02:56

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v4 0/2] Simplify and restrict IOMMU SVA APIs

A couple of small changes to simplify and restrict SVA APIs. The motivation
is to make PASID allocation palatable for cgroup consumptions. Misc cgroup
is merged for v5.13, it can be extended for IOASID as another scalar
resource.

I have not tested on ARM platforms due to availability. Would appreciate
if someone could help with the testing on uacce based SVA usages.

Thanks,

Jacob

ChangeLog:
V4 - fixed a cross-compile error
- rebased to v5.13-rc1 resolved a conflict in intel-svm code

V3 - stop passing mm to sva_bind IOMMU ops, no need to take mm refcount
in the common SVA code.
- deleted flag variable in idxd driver

V2
- retained mm argument in iommu_sva_alloc_pasid()
- keep generic supervisor flag separated from vt-d's SRE
- move flag declaration out of CONFIG_IOMMU_API



Jacob Pan (2):
iommu/sva: Tighten SVA bind API with explicit flags
iommu/sva: Remove mm parameter from SVA bind API

drivers/dma/idxd/cdev.c | 2 +-
drivers/dma/idxd/init.c | 7 ++----
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 12 ++++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++--
drivers/iommu/intel/svm.c | 19 ++++++++-------
drivers/iommu/iommu-sva-lib.c | 11 +++++----
drivers/iommu/iommu-sva-lib.h | 2 +-
drivers/iommu/iommu.c | 13 +++++------
drivers/misc/uacce/uacce.c | 2 +-
include/linux/intel-iommu.h | 3 +--
include/linux/intel-svm.h | 12 ----------
include/linux/iommu.h | 23 ++++++++++++++-----
12 files changed, 54 insertions(+), 57 deletions(-)

--
2.25.1


2021-05-10 22:03:20

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

The void* drvdata parameter isn't really used in iommu_sva_bind_device()
API, the current IDXD code "borrows" the drvdata for a VT-d private flag
for supervisor SVA usage.

Supervisor/Privileged mode request is a generic feature. It should be
promoted from the VT-d vendor driver to the generic code.

This patch replaces void* drvdata with a unsigned int flags parameter
and adjusts callers accordingly.

Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Suggested-by: Jean-Philippe Brucker <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/cdev.c | 2 +-
drivers/dma/idxd/init.c | 7 ++-----
.../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 5 ++++-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 4 ++--
drivers/iommu/intel/svm.c | 14 ++++----------
drivers/iommu/iommu.c | 9 ++++++---
drivers/misc/uacce/uacce.c | 2 +-
include/linux/intel-iommu.h | 2 +-
include/linux/intel-svm.h | 12 ------------
include/linux/iommu.h | 19 ++++++++++++++++---
10 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 302cba5ff779..04e3813b613c 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
filp->private_data = ctx;

if (device_pasid_enabled(idxd)) {
- sva = iommu_sva_bind_device(dev, current->mm, NULL);
+ sva = iommu_sva_bind_device(dev, current->mm, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 2a926bef87f2..6f75c29d3616 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -14,7 +14,6 @@
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/device.h>
#include <linux/idr.h>
-#include <linux/intel-svm.h>
#include <linux/iommu.h>
#include <uapi/linux/idxd.h>
#include <linux/dmaengine.h>
@@ -457,13 +456,11 @@ static struct idxd_device *idxd_alloc(struct pci_dev *pdev, struct idxd_driver_d

static int idxd_enable_system_pasid(struct idxd_device *idxd)
{
- int flags;
unsigned int pasid;
struct iommu_sva *sva;

- flags = SVM_FLAG_SUPERVISOR_MODE;
-
- sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL, &flags);
+ sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL,
+ IOMMU_SVA_BIND_SUPERVISOR);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
"iommu sva bind failed: %ld\n", PTR_ERR(sva));
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index bb251cab61f3..145ceb2fc5da 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -354,12 +354,15 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
}

struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
{
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);

+ if (flags)
+ return ERR_PTR(-EINVAL);
+
if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
return ERR_PTR(-EINVAL);

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 46e8c49214a8..3a4f24da59d9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -717,7 +717,7 @@ bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ unsigned int flags);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
@@ -748,7 +748,7 @@ static inline int arm_smmu_master_disable_sva(struct arm_smmu_master *master)
}

static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
{
return ERR_PTR(-ENODEV);
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 5165cea90421..531f03d13bd5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -485,12 +485,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
} else
pasid_max = 1 << 20;

- /* Bind supervisor PASID shuld have mm = NULL */
- if (flags & SVM_FLAG_SUPERVISOR_MODE) {
- if (!ecap_srs(iommu->ecap) || mm) {
- pr_err("Supervisor PASID with user provided mm.\n");
- return -EINVAL;
- }
+ if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && !ecap_srs(iommu->ecap)) {
+ pr_err("Supervisor PASID not supported.\n");
+ return -EINVAL;
}

list_for_each_entry(t, &global_svm_list, list) {
@@ -1055,11 +1052,10 @@ static irqreturn_t prq_event_thread(int irq, void *d)

#define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, sva)
struct iommu_sva *
-intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
+intel_svm_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
{
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
- unsigned int flags = 0;
int ret;

/*
@@ -1067,8 +1063,6 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
* It will require shared SVM data structures, i.e. combine io_mm
* and intel_svm etc.
*/
- if (drvdata)
- flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, mm, &sdev);
if (ret)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..04e7ec640284 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2949,6 +2949,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
* iommu_sva_bind_device() - Bind a process address space to a device
* @dev: the device
* @mm: the mm to bind, caller must hold a reference to it
+ * @flags: options for the bind operation defined as IOMMU_SVA_BIND_*
*
* Create a bond between device and address space, allowing the device to access
* the mm using the returned PASID. If a bond already exists between @device and
@@ -2961,7 +2962,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
* On error, returns an ERR_PTR value.
*/
struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
{
struct iommu_group *group;
struct iommu_sva *handle = ERR_PTR(-EINVAL);
@@ -2974,6 +2975,9 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
if (!group)
return ERR_PTR(-ENODEV);

+ /* Supervisor SVA does not need the current mm */
+ if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
+ return ERR_PTR(-EINVAL);
/* Ensure device count and domain don't change while we're binding */
mutex_lock(&group->mutex);

@@ -2986,8 +2990,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
if (iommu_group_device_count(group) != 1)
goto out_unlock;

- handle = ops->sva_bind(dev, mm, drvdata);
-
+ handle = ops->sva_bind(dev, mm, flags);
out_unlock:
mutex_unlock(&group->mutex);
iommu_group_put(group);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index bae18ef03dcb..1357427d497a 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
if (!(uacce->flags & UACCE_DEV_SVA))
return 0;

- handle = iommu_sva_bind_device(uacce->parent, current->mm, NULL);
+ handle = iommu_sva_bind_device(uacce->parent, current->mm, 0);
if (IS_ERR(handle))
return PTR_ERR(handle);

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 03faf20a6817..a93966361754 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -764,7 +764,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
struct iommu_gpasid_bind_data *data);
int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ unsigned int flags);
void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 10fa80eef13a..6fb41c69b7c0 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -14,18 +14,6 @@
#define SVM_REQ_EXEC (1<<1)
#define SVM_REQ_PRIV (1<<0)

-/*
- * The SVM_FLAG_SUPERVISOR_MODE flag requests a PASID which can be used only
- * for access to kernel addresses. No IOTLB flushes are automatically done
- * for kernel mappings; it is valid only for access to the kernel's static
- * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
- * A future API addition may permit the use of such ranges, by means of an
- * explicit IOTLB flush call (akin to the DMA API's unmap method).
- *
- * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
- * do such IOTLB flushes automatically.
- */
-#define SVM_FLAG_SUPERVISOR_MODE BIT(0)
/*
* The SVM_FLAG_GUEST_MODE flag is used when a PASID bind is for guest
* processes. Compared to the host bind, the primary differences are:
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..fcc9d36b4b01 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -152,6 +152,19 @@ enum iommu_dev_features {

#define IOMMU_PASID_INVALID (-1U)

+/*
+ * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
+ * for access to kernel addresses. No IOTLB flushes are automatically done
+ * for kernel mappings; it is valid only for access to the kernel's static
+ * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
+ * A future API addition may permit the use of such ranges, by means of an
+ * explicit IOTLB flush call (akin to the DMA API's unmap method).
+ *
+ * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
+ * do such IOTLB flushes automatically.
+ */
+#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
+
#ifdef CONFIG_IOMMU_API

/**
@@ -267,7 +280,7 @@ struct iommu_ops {
int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);

struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- void *drvdata);
+ unsigned int flags);
void (*sva_unbind)(struct iommu_sva *handle);
u32 (*sva_get_pasid)(struct iommu_sva *handle);

@@ -600,7 +613,7 @@ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
- void *drvdata);
+ unsigned int flags);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);

@@ -954,7 +967,7 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
}

static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata)
+iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
{
return NULL;
}
--
2.25.1

2021-05-10 22:03:26

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v4 2/2] iommu/sva: Remove mm parameter from SVA bind API

The mm parameter in iommu_sva_bind_device() is intended for privileged
process perform bind() on behalf of other processes. This use case has
yet to be materialized, let alone potential security implications of
adding kernel hooks without explicit user consent.
In addition, with the agreement that IOASID allocation shall be subject
cgroup limit. It will be inline with misc cgroup proposal if IOASID
allocation as part of the SVA bind is limited to the current task.

Link: https://lore.kernel.org/linux-iommu/20210303160205.151d114e@jacob-builder/
Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/dma/idxd/cdev.c | 2 +-
drivers/dma/idxd/init.c | 2 +-
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 9 +++++----
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 5 ++---
drivers/iommu/intel/svm.c | 17 +++++++++++------
drivers/iommu/iommu-sva-lib.c | 11 ++++++-----
drivers/iommu/iommu-sva-lib.h | 2 +-
drivers/iommu/iommu.c | 14 +++++---------
drivers/misc/uacce/uacce.c | 2 +-
include/linux/intel-iommu.h | 3 +--
include/linux/iommu.h | 8 +++-----
11 files changed, 37 insertions(+), 38 deletions(-)

diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
index 04e3813b613c..875864c832fc 100644
--- a/drivers/dma/idxd/cdev.c
+++ b/drivers/dma/idxd/cdev.c
@@ -100,7 +100,7 @@ static int idxd_cdev_open(struct inode *inode, struct file *filp)
filp->private_data = ctx;

if (device_pasid_enabled(idxd)) {
- sva = iommu_sva_bind_device(dev, current->mm, 0);
+ sva = iommu_sva_bind_device(dev, 0);
if (IS_ERR(sva)) {
rc = PTR_ERR(sva);
dev_err(dev, "pasid allocation failed: %d\n", rc);
diff --git a/drivers/dma/idxd/init.c b/drivers/dma/idxd/init.c
index 6f75c29d3616..f7c43d03ad79 100644
--- a/drivers/dma/idxd/init.c
+++ b/drivers/dma/idxd/init.c
@@ -459,7 +459,7 @@ static int idxd_enable_system_pasid(struct idxd_device *idxd)
unsigned int pasid;
struct iommu_sva *sva;

- sva = iommu_sva_bind_device(&idxd->pdev->dev, NULL,
+ sva = iommu_sva_bind_device(&idxd->pdev->dev,
IOMMU_SVA_BIND_SUPERVISOR);
if (IS_ERR(sva)) {
dev_warn(&idxd->pdev->dev,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 145ceb2fc5da..0c3014e64c77 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -305,10 +305,11 @@ static void arm_smmu_mmu_notifier_put(struct arm_smmu_mmu_notifier *smmu_mn)
}

static struct iommu_sva *
-__arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
+__arm_smmu_sva_bind(struct device *dev)
{
int ret;
struct arm_smmu_bond *bond;
+ struct mm_struct *mm = current->mm;
struct arm_smmu_master *master = dev_iommu_priv_get(dev);
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -329,7 +330,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
return ERR_PTR(-ENOMEM);

/* Allocate a PASID for this mm if necessary */
- ret = iommu_sva_alloc_pasid(mm, 1, (1U << master->ssid_bits) - 1);
+ ret = iommu_sva_alloc_pasid(1, (1U << master->ssid_bits) - 1);
if (ret)
goto err_free_bond;

@@ -354,7 +355,7 @@ __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
}

struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
+arm_smmu_sva_bind(struct device *dev, unsigned int flags)
{
struct iommu_sva *handle;
struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
@@ -367,7 +368,7 @@ arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
return ERR_PTR(-EINVAL);

mutex_lock(&sva_lock);
- handle = __arm_smmu_sva_bind(dev, mm);
+ handle = __arm_smmu_sva_bind(dev);
mutex_unlock(&sva_lock);
return handle;
}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 3a4f24da59d9..485ead85a841 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -716,8 +716,7 @@ bool arm_smmu_master_sva_supported(struct arm_smmu_master *master);
bool arm_smmu_master_sva_enabled(struct arm_smmu_master *master);
int arm_smmu_master_enable_sva(struct arm_smmu_master *master);
int arm_smmu_master_disable_sva(struct arm_smmu_master *master);
-struct iommu_sva *arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm,
- unsigned int flags);
+struct iommu_sva *arm_smmu_sva_bind(struct device *dev, unsigned int flags);
void arm_smmu_sva_unbind(struct iommu_sva *handle);
u32 arm_smmu_sva_get_pasid(struct iommu_sva *handle);
void arm_smmu_sva_notifier_synchronize(void);
@@ -748,7 +747,7 @@ static inline int arm_smmu_master_disable_sva(struct arm_smmu_master *master)
}

static inline struct iommu_sva *
-arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
+arm_smmu_sva_bind(struct device *dev, unsigned int flags)
{
return ERR_PTR(-ENODEV);
}
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 531f03d13bd5..47d748f79460 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -462,11 +462,12 @@ static void load_pasid(struct mm_struct *mm, u32 pasid)
/* Caller must hold pasid_mutex, mm reference */
static int
intel_svm_bind_mm(struct device *dev, unsigned int flags,
- struct mm_struct *mm, struct intel_svm_dev **sd)
+ struct intel_svm_dev **sd)
{
struct intel_iommu *iommu = device_to_iommu(dev, NULL, NULL);
struct intel_svm *svm = NULL, *t;
struct device_domain_info *info;
+ struct mm_struct *mm = NULL;
struct intel_svm_dev *sdev;
unsigned long iflags;
int pasid_max;
@@ -485,9 +486,13 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
} else
pasid_max = 1 << 20;

- if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && !ecap_srs(iommu->ecap)) {
- pr_err("Supervisor PASID not supported.\n");
- return -EINVAL;
+ if (flags & IOMMU_SVA_BIND_SUPERVISOR) {
+ if (!ecap_srs(iommu->ecap)) {
+ pr_err("Supervisor PASID not supported.\n");
+ return -EINVAL;
+ }
+ } else {
+ mm = current->mm;
}

list_for_each_entry(t, &global_svm_list, list) {
@@ -1052,7 +1057,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)

#define to_intel_svm_dev(handle) container_of(handle, struct intel_svm_dev, sva)
struct iommu_sva *
-intel_svm_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
+intel_svm_bind(struct device *dev, unsigned int flags)
{
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
@@ -1064,7 +1069,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, unsigned int flags)
* and intel_svm etc.
*/
mutex_lock(&pasid_mutex);
- ret = intel_svm_bind_mm(dev, flags, mm, &sdev);
+ ret = intel_svm_bind_mm(dev, flags, &sdev);
if (ret)
sva = ERR_PTR(ret);
else if (sdev)
diff --git a/drivers/iommu/iommu-sva-lib.c b/drivers/iommu/iommu-sva-lib.c
index bd41405d34e9..5f7b486ba5b0 100644
--- a/drivers/iommu/iommu-sva-lib.c
+++ b/drivers/iommu/iommu-sva-lib.c
@@ -12,21 +12,22 @@ static DECLARE_IOASID_SET(iommu_sva_pasid);

/**
* iommu_sva_alloc_pasid - Allocate a PASID for the mm
- * @mm: the mm
* @min: minimum PASID value (inclusive)
* @max: maximum PASID value (inclusive)
*
- * Try to allocate a PASID for this mm, or take a reference to the existing one
- * provided it fits within the [@min, @max] range. On success the PASID is
- * available in mm->pasid, and must be released with iommu_sva_free_pasid().
+ * Try to allocate a PASID for the current mm, or take a reference to the
+ * existing one provided it fits within the [@min, @max] range. On success
+ * the PASID is available in the current mm->pasid, and must be released with
+ * iommu_sva_free_pasid().
* @min must be greater than 0, because 0 indicates an unused mm->pasid.
*
* Returns 0 on success and < 0 on error.
*/
-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max)
+int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max)
{
int ret = 0;
ioasid_t pasid;
+ struct mm_struct *mm = current->mm;

if (min == INVALID_IOASID || max == INVALID_IOASID ||
min == 0 || max < min)
diff --git a/drivers/iommu/iommu-sva-lib.h b/drivers/iommu/iommu-sva-lib.h
index 031155010ca8..43d3b529c215 100644
--- a/drivers/iommu/iommu-sva-lib.h
+++ b/drivers/iommu/iommu-sva-lib.h
@@ -8,7 +8,7 @@
#include <linux/ioasid.h>
#include <linux/mm_types.h>

-int iommu_sva_alloc_pasid(struct mm_struct *mm, ioasid_t min, ioasid_t max);
+int iommu_sva_alloc_pasid(ioasid_t min, ioasid_t max);
void iommu_sva_free_pasid(struct mm_struct *mm);
struct mm_struct *iommu_sva_find(ioasid_t pasid);

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 04e7ec640284..4327edb888c6 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2946,15 +2946,14 @@ int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);

/**
- * iommu_sva_bind_device() - Bind a process address space to a device
+ * iommu_sva_bind_device() - Bind the current process address space to a device
* @dev: the device
- * @mm: the mm to bind, caller must hold a reference to it
* @flags: options for the bind operation defined as IOMMU_SVA_BIND_*
*
* Create a bond between device and address space, allowing the device to access
* the mm using the returned PASID. If a bond already exists between @device and
- * @mm, it is returned and an additional reference is taken. Caller must call
- * iommu_sva_unbind_device() to release each reference.
+ * the current mm, it is returned and an additional reference is taken. Caller
+ * must call iommu_sva_unbind_device() to release each reference.
*
* iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) must be called first, to
* initialize the required SVA features.
@@ -2962,7 +2961,7 @@ EXPORT_SYMBOL_GPL(iommu_aux_get_pasid);
* On error, returns an ERR_PTR value.
*/
struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
+iommu_sva_bind_device(struct device *dev, unsigned int flags)
{
struct iommu_group *group;
struct iommu_sva *handle = ERR_PTR(-EINVAL);
@@ -2975,9 +2974,6 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int fla
if (!group)
return ERR_PTR(-ENODEV);

- /* Supervisor SVA does not need the current mm */
- if ((flags & IOMMU_SVA_BIND_SUPERVISOR) && mm)
- return ERR_PTR(-EINVAL);
/* Ensure device count and domain don't change while we're binding */
mutex_lock(&group->mutex);

@@ -2990,7 +2986,7 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int fla
if (iommu_group_device_count(group) != 1)
goto out_unlock;

- handle = ops->sva_bind(dev, mm, flags);
+ handle = ops->sva_bind(dev, flags);
out_unlock:
mutex_unlock(&group->mutex);
iommu_group_put(group);
diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 1357427d497a..c25ba907fe70 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -99,7 +99,7 @@ static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
if (!(uacce->flags & UACCE_DEV_SVA))
return 0;

- handle = iommu_sva_bind_device(uacce->parent, current->mm, 0);
+ handle = iommu_sva_bind_device(uacce->parent, 0);
if (IS_ERR(handle))
return PTR_ERR(handle);

diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a93966361754..b1febc05df54 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -763,8 +763,7 @@ extern int intel_svm_finish_prq(struct intel_iommu *iommu);
int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
struct iommu_gpasid_bind_data *data);
int intel_svm_unbind_gpasid(struct device *dev, u32 pasid);
-struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
- unsigned int flags);
+struct iommu_sva *intel_svm_bind(struct device *dev, unsigned int flags);
void intel_svm_unbind(struct iommu_sva *handle);
u32 intel_svm_get_pasid(struct iommu_sva *handle);
int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fcc9d36b4b01..9b3ba9ab35dc 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -217,7 +217,7 @@ struct iommu_iotlb_gather {
* @dev_feat_enabled: check enabled feature
* @aux_attach/detach_dev: aux-domain specific attach/detach entries.
* @aux_get_pasid: get the pasid given an aux-domain
- * @sva_bind: Bind process address space to device
+ * @sva_bind: Bind the current process address space to device
* @sva_unbind: Unbind process address space from device
* @sva_get_pasid: Get PASID associated to a SVA handle
* @page_response: handle page request response
@@ -279,8 +279,7 @@ struct iommu_ops {
void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev);
int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev);

- struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
- unsigned int flags);
+ struct iommu_sva *(*sva_bind)(struct device *dev, unsigned int flags);
void (*sva_unbind)(struct iommu_sva *handle);
u32 (*sva_get_pasid)(struct iommu_sva *handle);

@@ -612,7 +611,6 @@ void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev);
int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev);

struct iommu_sva *iommu_sva_bind_device(struct device *dev,
- struct mm_struct *mm,
unsigned int flags);
void iommu_sva_unbind_device(struct iommu_sva *handle);
u32 iommu_sva_get_pasid(struct iommu_sva *handle);
@@ -967,7 +965,7 @@ iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev)
}

static inline struct iommu_sva *
-iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, unsigned int flags)
+iommu_sva_bind_device(struct device *dev, unsigned int flags)
{
return NULL;
}
--
2.25.1

2021-05-10 23:38:46

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:

> +/*
> + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> + * for access to kernel addresses. No IOTLB flushes are automatically done
> + * for kernel mappings; it is valid only for access to the kernel's static
> + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> + * A future API addition may permit the use of such ranges, by means of an
> + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> + *
> + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> + * do such IOTLB flushes automatically.
> + */
> +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)

Huh? That isn't really SVA, can you call it something saner please?

Is it really a PASID that always has all of physical memory mapped
into it? Sounds dangerous. What is it for?

Jason

2021-05-11 03:30:09

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
>
> > +/*
> > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > used only
> > + * for access to kernel addresses. No IOTLB flushes are automatically
> > done
> > + * for kernel mappings; it is valid only for access to the kernel's
> > static
> > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > mappings.
> > + * A future API addition may permit the use of such ranges, by means
> > of an
> > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > + *
> > + * It is unlikely that we will ever hook into flush_tlb_kernel_range()
> > to
> > + * do such IOTLB flushes automatically.
> > + */
> > +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
>
> Huh? That isn't really SVA, can you call it something saner please?
>
This is shared kernel virtual address, I am following the SVA lib naming
since this is where the flag will be used. Why this is not SVA? Kernel
virtual address is still virtual address. Is it due to direct map?

> Is it really a PASID that always has all of physical memory mapped
> into it? Sounds dangerous. What is it for?
>

Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per PCIe
spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d supports
this with "Privileged-mode-Requested (PR) flag (to distinguish user versus
supervisor access)". Each PASID entry has a SRE (Supervisor Request Enable)
bit.

Perhaps we should limit that to trusted device, e.g. RCIEP device?

> Jason


Thanks,

Jacob

2021-05-11 11:49:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe <[email protected]> wrote:
>
> > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> >
> > > +/*
> > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > > used only
> > > + * for access to kernel addresses. No IOTLB flushes are automatically
> > > done
> > > + * for kernel mappings; it is valid only for access to the kernel's
> > > static
> > > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > > mappings.
> > > + * A future API addition may permit the use of such ranges, by means
> > > of an
> > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > + *
> > > + * It is unlikely that we will ever hook into flush_tlb_kernel_range()
> > > to
> > > + * do such IOTLB flushes automatically.
> > > + */
> > > +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
> >
> > Huh? That isn't really SVA, can you call it something saner please?
> >
> This is shared kernel virtual address, I am following the SVA lib naming
> since this is where the flag will be used. Why this is not SVA? Kernel
> virtual address is still virtual address. Is it due to direct map?

As the above explains it doesn't actually synchronize the kernel's
address space it just shoves the direct map into the IOMMU.

I suppose a different IOMMU implementation might point the PASID directly
at the kernel's page table and avoid those limitations - but since
that isn't portable it seems irrelevant.

Since the only thing it really maps is the direct map I would just
call it direct_map, or all physical or something.

How does this interact with the DMA APIs? How do you get CPU cache
flushing/etc into PASID operations that don't trigger IOMMU updates?

Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
does IDXD use normal DMA on the RID for kernel controlled accesses?

> > Is it really a PASID that always has all of physical memory mapped
> > into it? Sounds dangerous. What is it for?
>
> Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per PCIe
> spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d supports
> this with "Privileged-mode-Requested (PR) flag (to distinguish user versus
> supervisor access)". Each PASID entry has a SRE (Supervisor Request Enable)
> bit.

The PR flag is only needed if the underlying IOMMU is directly
processing the CPU page tables. For cases where the IOMMU is using its
own page table format and has its own copies the PR flag shouldn't be
used.

Jason

2021-05-11 16:17:06

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Tue, 11 May 2021 08:48:48 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Mon, May 10, 2021 at 08:31:45PM -0700, Jacob Pan wrote:
> > Hi Jason,
> >
> > On Mon, 10 May 2021 20:37:49 -0300, Jason Gunthorpe <[email protected]>
> > wrote:
> > > On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> > >
> > > > +/*
> > > > + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be
> > > > used only
> > > > + * for access to kernel addresses. No IOTLB flushes are
> > > > automatically done
> > > > + * for kernel mappings; it is valid only for access to the kernel's
> > > > static
> > > > + * 1:1 mapping of physical memory — not to vmalloc or even module
> > > > mappings.
> > > > + * A future API addition may permit the use of such ranges, by
> > > > means of an
> > > > + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> > > > + *
> > > > + * It is unlikely that we will ever hook into
> > > > flush_tlb_kernel_range() to
> > > > + * do such IOTLB flushes automatically.
> > > > + */
> > > > +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
> > >
> > > Huh? That isn't really SVA, can you call it something saner please?
> > >
> > This is shared kernel virtual address, I am following the SVA lib naming
> > since this is where the flag will be used. Why this is not SVA? Kernel
> > virtual address is still virtual address. Is it due to direct map?
>
> As the above explains it doesn't actually synchronize the kernel's
> address space it just shoves the direct map into the IOMMU.
>
There is no duplicated kernel direct map in IOMMU.

> I suppose a different IOMMU implementation might point the PASID directly
> at the kernel's page table and avoid those limitations - but since
> that isn't portable it seems irrelevant.
>
This is what we are doing here. We allocate a supervisor PASID and put
the kernel page table (init_mm pgd) in this PASID entry.

> Since the only thing it really maps is the direct map I would just
> call it direct_map, or all physical or something.
>
Good idea. It makes things clear to the callers. They must only use direct
map memory for DMA.

> How does this interact with the DMA APIs?
DMA API would use RID2PASID (PASID 0), so it is separated by PASIDs.

> How do you get CPU cache
> flushing/etc into PASID operations that don't trigger IOMMU updates?
>
Sorry, I am not following. This is used for direct map only.

> Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> does IDXD use normal DMA on the RID for kernel controlled accesses?
>
Using SVA simplifies the work submission, there is no need to do map/unmap.
Just bind PASID with init_mm, then submit work directly either with ENQCMDS
(supervisor version of ENQCMD) to a shared workqueue or put the supervisor
PASID in the descriptor for dedicated workqueue.

> > > Is it really a PASID that always has all of physical memory mapped
> > > into it? Sounds dangerous. What is it for?
> >
> > Yes. It is to bind DMA request w/ PASID with init_mm/init_top_pgt. Per
> > PCIe spec PASID TLP prefix has "Privileged Mode Requested" bit. VT-d
> > supports this with "Privileged-mode-Requested (PR) flag (to distinguish
> > user versus supervisor access)". Each PASID entry has a SRE (Supervisor
> > Request Enable) bit.
>
> The PR flag is only needed if the underlying IOMMU is directly
> processing the CPU page tables. For cases where the IOMMU is using its
> own page table format and has its own copies the PR flag shouldn't be
> used.
>
We are doing the former case. There is no IOMMU page tables for the direct
map.

> Jason


Thanks,

Jacob

2021-05-11 16:38:53

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:

> > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > does IDXD use normal DMA on the RID for kernel controlled accesses?
>
> Using SVA simplifies the work submission, there is no need to do map/unmap.
> Just bind PASID with init_mm, then submit work directly either with ENQCMDS
> (supervisor version of ENQCMD) to a shared workqueue or put the supervisor
> PASID in the descriptor for dedicated workqueue.

That is not OK, protable drivers in Linux have to sue dma map/unmap
calls to manage cache coherence. PASID does not opt out of any of
that.

I dislike this whole idea a lot. A single driver should not opt itself
out of IOMMU based security "just because"

Jason

2021-05-11 18:08:03

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Tue, 11 May 2021 13:35:21 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:
>
> > > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > > does IDXD use normal DMA on the RID for kernel controlled accesses?
> >
> > Using SVA simplifies the work submission, there is no need to do
> > map/unmap. Just bind PASID with init_mm, then submit work directly
> > either with ENQCMDS (supervisor version of ENQCMD) to a shared
> > workqueue or put the supervisor PASID in the descriptor for dedicated
> > workqueue.
>
> That is not OK, protable drivers in Linux have to sue dma map/unmap
> calls to manage cache coherence. PASID does not opt out of any of
> that.
>
Let me try to break down your concerns:
1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
that your concern? But PASID is intrinsically tied with IOMMU and if
the drivers are using a generic sva-lib API, why they are not portable?
SVA by its definition is to avoid map/unmap every time.

2. cache coherence - as you suggested if we name the flag "direct_map",
there is no mapping change, then there is no need for mmu_notifier like tlb
flush calls, right? it is caller's responsibility to make sure vmalloc are
not used.

> I dislike this whole idea a lot. A single driver should not opt itself
> out of IOMMU based security "just because"
>
Perhaps I missed your point here. This is NOT a single driver, privileged
access is a PCIe spec defined feature. We are using IOMMU sva-lib APIs, not
sure why it is by-passing.

Perhaps we should add check for struct pci_dev->untrusted && rciep to
address security?

> Jason


Thanks,

Jacob

2021-05-11 19:49:47

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Tue, May 11, 2021 at 11:05:50AM -0700, Jacob Pan wrote:
> Hi Jason,
>
> On Tue, 11 May 2021 13:35:21 -0300, Jason Gunthorpe <[email protected]> wrote:
>
> > On Tue, May 11, 2021 at 09:14:52AM -0700, Jacob Pan wrote:
> >
> > > > Honestly, I'm not convinced we should have "kernel SVA" at all.. Why
> > > > does IDXD use normal DMA on the RID for kernel controlled accesses?
> > >
> > > Using SVA simplifies the work submission, there is no need to do
> > > map/unmap. Just bind PASID with init_mm, then submit work directly
> > > either with ENQCMDS (supervisor version of ENQCMD) to a shared
> > > workqueue or put the supervisor PASID in the descriptor for dedicated
> > > workqueue.
> >
> > That is not OK, protable drivers in Linux have to sue dma map/unmap
> > calls to manage cache coherence. PASID does not opt out of any of
> > that.
> >
> Let me try to break down your concerns:
> 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
> that your concern? But PASID is intrinsically tied with IOMMU and if
> the drivers are using a generic sva-lib API, why they are not portable?
> SVA by its definition is to avoid map/unmap every time.

Kernel explicitly does not support this programming model. All DMA is
explicit and the DMA API hides platform details like IOMMU and CPU
cache coherences. Just because x86 doesn't care about this doesn't
make any of it optional.

If you want to do SVA PASID then it also must come with DMA APIs to
manage the CPU cache coherence that are all NOP's on x86.

> > I dislike this whole idea a lot. A single driver should not opt itself
> > out of IOMMU based security "just because"
>
> Perhaps I missed your point here. This is NOT a single driver, privileged
> access is a PCIe spec defined feature. We are using IOMMU sva-lib APIs, not
> sure why it is by-passing.

Until today the overal security of the IOMMU configuration is
controlled by kernel boot parameters set by the sysadmin.

This magic PASID effectively disables all IOMMU security and allows an
IO device unrestricted access to the entire system memory. This is the
opposite of what the admin may have specified in the various boot
options.

I don't like it all. Kust because the PCI sig defined this mechanism
doesn't mean it is mandatory for Linux to use it.

If you want the performance gain of no iommu updates then use RID
based access and boot flags that have an identity translation for all
RIDs.

Forcing the system to create an identiy translation in all cases by
having a single kernel driver create a PASID feels really
wrong.

Especially since there isn't any real gain in functionality or
performance as it it just exposing the same things the normal DMA API
would expose.

Jason

2021-05-12 06:42:27

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Tue, May 11, 2021 at 04:47:26PM -0300, Jason Gunthorpe wrote:
> > Let me try to break down your concerns:
> > 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU. is
> > that your concern? But PASID is intrinsically tied with IOMMU and if
> > the drivers are using a generic sva-lib API, why they are not portable?
> > SVA by its definition is to avoid map/unmap every time.
>
> Kernel explicitly does not support this programming model. All DMA is
> explicit and the DMA API hides platform details like IOMMU and CPU
> cache coherences. Just because x86 doesn't care about this doesn't
> make any of it optional.

Exactly.

> If you want to do SVA PASID then it also must come with DMA APIs to
> manage the CPU cache coherence that are all NOP's on x86.

Yes. And we have plenty of precende where an IOMMU is in "bypass" mode
to allow access to all memory and then uses the simple dma-direct case.

2021-05-12 10:19:52

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Mon, May 10, 2021 at 06:25:07AM -0700, Jacob Pan wrote:
> The void* drvdata parameter isn't really used in iommu_sva_bind_device()
> API, the current IDXD code "borrows" the drvdata for a VT-d private flag
> for supervisor SVA usage.
>
> Supervisor/Privileged mode request is a generic feature. It should be
> promoted from the VT-d vendor driver to the generic code.
>
> This patch replaces void* drvdata with a unsigned int flags parameter
> and adjusts callers accordingly.
>
> Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
> Suggested-by: Jean-Philippe Brucker <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>

Thanks for cleaning this up. Whether Vt-d's supervisor mode will need
rework or not, this is still good to have. One nit below if you resend

Reviewed-by: Jean-Philippe Brucker <[email protected]>

[...]
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 32d448050bf7..fcc9d36b4b01 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -152,6 +152,19 @@ enum iommu_dev_features {
>
> #define IOMMU_PASID_INVALID (-1U)
>
> +/*
> + * The IOMMU_SVA_BIND_SUPERVISOR flag requests a PASID which can be used only
> + * for access to kernel addresses. No IOTLB flushes are automatically done
> + * for kernel mappings; it is valid only for access to the kernel's static
> + * 1:1 mapping of physical memory — not to vmalloc or even module mappings.
> + * A future API addition may permit the use of such ranges, by means of an
> + * explicit IOTLB flush call (akin to the DMA API's unmap method).
> + *
> + * It is unlikely that we will ever hook into flush_tlb_kernel_range() to
> + * do such IOTLB flushes automatically.

I would add that this is platform specific and not all IOMMU drivers
support the feature.

Thanks,
Jean

> + */
> +#define IOMMU_SVA_BIND_SUPERVISOR BIT(0)
> +
> #ifdef CONFIG_IOMMU_API

2021-05-12 10:26:11

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v4 2/2] iommu/sva: Remove mm parameter from SVA bind API

On Mon, May 10, 2021 at 06:25:08AM -0700, Jacob Pan wrote:
> The mm parameter in iommu_sva_bind_device() is intended for privileged
> process perform bind() on behalf of other processes. This use case has
> yet to be materialized, let alone potential security implications of
> adding kernel hooks without explicit user consent.
> In addition, with the agreement that IOASID allocation shall be subject
> cgroup limit. It will be inline with misc cgroup proposal if IOASID
> allocation as part of the SVA bind is limited to the current task.
>
> Link: https://lore.kernel.org/linux-iommu/20210303160205.151d114e@jacob-builder/
> Link: https://lore.kernel.org/linux-iommu/YFhiMLR35WWMW%2FHu@myrica/
> Signed-off-by: Jacob Pan <[email protected]>

I'm not particularly enthusiastic about this change, because restoring the
mm parameter will be difficult after IOMMU drivers start assuming
everything is on current. Regardless, it looks correct and makes my life
easier (and lightens my test suite quite a bit).

Reviewed-by: Jean-Philippe Brucker <[email protected]>

2021-05-13 13:07:30

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Christoph,

On Wed, 12 May 2021 07:37:40 +0100, Christoph Hellwig <[email protected]>
wrote:

> On Tue, May 11, 2021 at 04:47:26PM -0300, Jason Gunthorpe wrote:
> > > Let me try to break down your concerns:
> > > 1. portability - driver uses DMA APIs can function w/ and w/o IOMMU.
> > > is that your concern? But PASID is intrinsically tied with IOMMU and
> > > if the drivers are using a generic sva-lib API, why they are not
> > > portable? SVA by its definition is to avoid map/unmap every time.
> >
> > Kernel explicitly does not support this programming model. All DMA is
> > explicit and the DMA API hides platform details like IOMMU and CPU
> > cache coherences. Just because x86 doesn't care about this doesn't
> > make any of it optional.
>
> Exactly.
Perhaps we can view these SVA capable devices as FPU or a device that can
be fused onto the CPU by PASID binding. We don't require DMA map/unmap for
FPU to use kernel virtual address, right?

>
> > If you want to do SVA PASID then it also must come with DMA APIs to
> > manage the CPU cache coherence that are all NOP's on x86.
>
> Yes. And we have plenty of precende where an IOMMU is in "bypass" mode
> to allow access to all memory and then uses the simple dma-direct case.
I agree it is better not to expose the entire direct map. But the missing
piece of using DMA APIs is the PASID. The caller needs the PASID value to
do work submission once buffer is mapped.

Perhaps we can add a parameter to the DMA API to specify the address space?
As Jason suggested the definition of IOASID, which represents a page table.
Just my quick thought, can you help us with a viable solution?

I know we are supposed to hide IOMMU by using DMA APIs which makes drivers
portable w/ and w/o IOMMU. This IOASID can be optional.

Thanks,

Jacob

2021-05-13 20:49:21

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 06:00:12AM -0700, Jacob Pan wrote:
> > > If you want to do SVA PASID then it also must come with DMA APIs to
> > > manage the CPU cache coherence that are all NOP's on x86.
> >
> > Yes. And we have plenty of precende where an IOMMU is in "bypass" mode
> > to allow access to all memory and then uses the simple dma-direct case.
> I agree it is better not to expose the entire direct map. But the missing
> piece of using DMA APIs is the PASID. The caller needs the PASID value to
> do work submission once buffer is mapped.

You still haven't explained why the kernel driver should have a PASID at all.

Jason

2021-05-13 22:33:50

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Thu, 13 May 2021 10:38:34 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Thu, May 13, 2021 at 06:00:12AM -0700, Jacob Pan wrote:
> > > > If you want to do SVA PASID then it also must come with DMA APIs to
> > > > manage the CPU cache coherence that are all NOP's on x86.
> > >
> > > Yes. And we have plenty of precende where an IOMMU is in "bypass"
> > > mode to allow access to all memory and then uses the simple
> > > dma-direct case.
> > I agree it is better not to expose the entire direct map. But the
> > missing piece of using DMA APIs is the PASID. The caller needs the
> > PASID value to do work submission once buffer is mapped.
>
> You still haven't explained why the kernel driver should have a PASID at
> all.
>
For shared workqueue, it can only generate DMA request with PASID. The
submission is done by ENQCMDS (S for supervisor) instruction.

If we were not to share page tables with init_mm, we need a system PASID
that doing the same direct mapping in IOMMU page tables.

> Jason


Thanks,

Jacob

2021-05-13 22:38:04

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

> For shared workqueue, it can only generate DMA request with PASID. The
> submission is done by ENQCMDS (S for supervisor) instruction.
>
> If we were not to share page tables with init_mm, we need a system PASID
> that doing the same direct mapping in IOMMU page tables.

Note that for the currently envisioned kernel use cases for accelerators it
would be OK for this system PASID to just provide either:

1) A 1:1 mapping for physical addresses. Kernel users of the accelerators
would provide physical addresses in descriptors.
2) The same mapping that the kernel uses for its "1:1" map of all physical
memory. Users would use kernel virtual addresses in that "1:1" range
(e.g. those obtained from page_to_virt(struct page *p);)

If people want to use an accelerator on memory allocated by vmalloc()
things will get more complicated. But maybe we can delay solving that
problem until someone comes up with a real use case that needs to
do this?

-Tony



2021-05-13 22:40:24

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 04:44:14PM +0000, Luck, Tony wrote:
> > For shared workqueue, it can only generate DMA request with PASID. The
> > submission is done by ENQCMDS (S for supervisor) instruction.
> >
> > If we were not to share page tables with init_mm, we need a system PASID
> > that doing the same direct mapping in IOMMU page tables.
>
> Note that for the currently envisioned kernel use cases for accelerators it
> would be OK for this system PASID to just provide either:
>
> 1) A 1:1 mapping for physical addresses. Kernel users of the accelerators
> would provide physical addresses in descriptors.
> 2) The same mapping that the kernel uses for its "1:1" map of all physical
> memory. Users would use kernel virtual addresses in that "1:1" range
> (e.g. those obtained from page_to_virt(struct page *p);)

Well, no, neither of those are OK.

The page table under the kernel PASID should behave the same way that
the kernel would operate the page table assigned to a kernel RID.

If the kernel has security off then the PASID should map to all
physical memory, just like the RID does.

If security is on then every DMA map needs to be loaded into the
PASID's io page table no different than a RID page table.

"kernel SVA" is, IMHO, not a desirable thing, it completely destroys
the kernel's DMA security model.

> If people want to use an accelerator on memory allocated by vmalloc()
> things will get more complicated. But maybe we can delay solving that
> problem until someone comes up with a real use case that needs to
> do this?

If you have a HW limitation that the device can only issue TLPs
with a PASID, even for kernel users, then I think the proper thing is
to tell the IOMMU layer than a certain 'struct device' enters
PASID-only mode and the IOMMU layer should construct an appropriate
PASID and flow the dma operations through it.

Pretending the DMA layer doesn't exist and that PASID gets a free pass
is not OK in the kernel.

Jason

2021-05-13 23:27:35

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 07:14:54PM +0000, Luck, Tony wrote:
> > If you want this then be explicit about what it is you are making when
> > building the API. Don't try to hide it under some generic idea of
> > "kernel PCI DMA SVA"
>
> So, a special API call (that Dave can call from IDXD) to set up this
> kernel PASID. With suitable documentation to explain the scope.
> Maybe with a separate CONFIG option so it can be completely
> stubbed out (IDXD does *NOT* "select" this option ... users have
> to explicitly pick it).
>
> > I could easily see an admin option to "turn this off" entirely as
> > being too dangerous, especially if the users have no interest in IDXD.
>
> And a kernel command line option to block IDXD from using that
> special API ... for users on generic kernels who want to block
> this use model (but still use IDXD in non-kernel cases). Users
> who don't want IDXD at all can block loading of the driver.

A generic IOMMU API should not be IDXD specific, if you want to allow
some special "integrated to the SOC accelerator PASID" mode then it
should be a global IOMMU API and any security toggles for it should be
global and unrelated to IDXD.

Concurrently it seems necessary to have a solution for secure kernel
PASID use under the DMA API and reserve this special mode for getting
higher performance.

I think you need to come with a proposal, and it is a bit alarming a
noteworthy security hole was added under the guise of "kernel SVA" :(

Jason

2021-05-13 23:37:25

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> It seems there are two options:
> 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU page
> table/domain, mark the device is PASID only with a flag. Use DMA APIs
> to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU driver
> will decide whether to use system PASID domain during map/unmap. Not clear
> if we also need to make IOVA==kernel VA.
>
> 2. Add a new IOMMU API to setup a system PASID which points to init_mm.pgd.
> This API only allows trusted device to bind with the system PASID at its
> own risk. There is no need for DMA API. This is the same as the current
> code except with an explicit API.
>
> Which option?

Option #1 looks cleaner to me. Option #2 gives access to bits
of memory that the users of system PASID shouldn't ever need
to touch ... just map regions of memory that the kernel has
a "struct page" for.

What does "use DMA APIs to explicitly map/unmap" mean? Is that
for the whole region?

I'm expecting that once this system PASID has been initialized,
then any accelerator device with a kernel use case would use the
same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
& decompression, etc.

-Tony

2021-05-13 23:54:53

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Tony,

On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" <[email protected]>
wrote:

> On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > It seems there are two options:
> > 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU
> > page table/domain, mark the device is PASID only with a flag. Use DMA
> > APIs to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU
> > driver will decide whether to use system PASID domain during map/unmap.
> > Not clear if we also need to make IOVA==kernel VA.
> >
> > 2. Add a new IOMMU API to setup a system PASID which points to
> > init_mm.pgd. This API only allows trusted device to bind with the
> > system PASID at its own risk. There is no need for DMA API. This is the
> > same as the current code except with an explicit API.
> >
> > Which option?
>
> Option #1 looks cleaner to me. Option #2 gives access to bits
> of memory that the users of system PASID shouldn't ever need
> to touch ... just map regions of memory that the kernel has
> a "struct page" for.
>
> What does "use DMA APIs to explicitly map/unmap" mean? Is that
> for the whole region?
>
If we map the entire kernel direct map during system PASID setup, then we
don't need to use DMA API to map/unmap certain range.

I was thinking this system PASID page table could be on-demand. The mapping
is built by explicit use of DMA map/unmap APIs.

> I'm expecting that once this system PASID has been initialized,
> then any accelerator device with a kernel use case would use the
> same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> & decompression, etc.
>
OK, sounds like we have to map the entire kernel VA with struct page as you
said. So we still by-pass DMA APIs, can we all agree on that?

> -Tony


Thanks,

Jacob

2021-05-14 02:16:41

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Thu, 13 May 2021 19:31:22 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> > Hi Tony,
> >
> > On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" <[email protected]>
> > wrote:
> >
> > > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > > > It seems there are two options:
> > > > 1. Add a new IOMMU API to set up a system PASID with a *separate*
> > > > IOMMU page table/domain, mark the device is PASID only with a flag.
> > > > Use DMA APIs to explicit map/unmap. Based on this PASID-only flag,
> > > > Vendor IOMMU driver will decide whether to use system PASID domain
> > > > during map/unmap. Not clear if we also need to make IOVA==kernel VA.
> > > >
> > > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > > init_mm.pgd. This API only allows trusted device to bind with the
> > > > system PASID at its own risk. There is no need for DMA API. This is
> > > > the same as the current code except with an explicit API.
> > > >
> > > > Which option?
> > >
> > > Option #1 looks cleaner to me. Option #2 gives access to bits
> > > of memory that the users of system PASID shouldn't ever need
> > > to touch ... just map regions of memory that the kernel has
> > > a "struct page" for.
> > >
> > > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > > for the whole region?
> > >
> > If we map the entire kernel direct map during system PASID setup, then
> > we don't need to use DMA API to map/unmap certain range.
> >
> > I was thinking this system PASID page table could be on-demand. The
> > mapping is built by explicit use of DMA map/unmap APIs.
>
> Option 1 should be the PASID works exactly like a normal RID and uses
> all the normal DMA APIs and IOMMU mechanisms, whatever the platform
> implements. This might mean an iommu update on every operation or not.
>
> > > I'm expecting that once this system PASID has been initialized,
> > > then any accelerator device with a kernel use case would use the
> > > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > > & decompression, etc.
> > >
> > OK, sounds like we have to map the entire kernel VA with struct page as
> > you said. So we still by-pass DMA APIs, can we all agree on that?
>
> Option 2 should be the faster option, but not available in all cases.
>
> Option 1 isn't optional. DMA and IOMMU code has to be portable and
> this is the portable API.
>
> If you want to do option 1 and option 2 then give it a go, but in most
> common cases with the IOMMU in a direct map you shouldn't get a
> notable performance win.
>
Looks like we are converging. Let me summarize the takeaways:
1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
will be no flags at all for iommu_sva_bind_device()
2. Remove all supervisor SVA related vt-d, idxd code.
3. Create API iommu_setup_system_pasid_direct_map(option_flag)
if (option_flag == 1)
iommu_domain_alloc(IOMMU_DOMAIN_DMA);
if (option_flag == 2)
iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain type?
setup IOMMU page tables mirroring the direct map
4. Create API iommu_enable_dev_direct_map(struct dev, &pasid, &option)
- Drivers call this API to get the system PASID and which option is
available on the system PASID
- mark device as PASID only, perhaps a new flag in struct
device->dev_iommu->pasid_only = 1
5. DMA API IOMMU vendor ops will take action based on the pasid_only flag to
decide if the mapping is for system PASID page tables.

Does it make sense?


> Jason


Thanks,

Jacob

2021-05-14 05:36:02

by Luck, Tony

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 02:33:03PM -0300, Jason Gunthorpe wrote:
> The page table under the kernel PASID should behave the same way that
> the kernel would operate the page table assigned to a kernel RID.
>
> If the kernel has security off then the PASID should map to all
> physical memory, just like the RID does.
>
> If security is on then every DMA map needs to be loaded into the
> PASID's io page table no different than a RID page table.
>
> "kernel SVA" is, IMHO, not a desirable thing, it completely destroys
> the kernel's DMA security model.
>
> > If people want to use an accelerator on memory allocated by vmalloc()
> > things will get more complicated. But maybe we can delay solving that
> > problem until someone comes up with a real use case that needs to
> > do this?
>
> If you have a HW limitation that the device can only issue TLPs
> with a PASID, even for kernel users, then I think the proper thing is
> to tell the IOMMU layer than a certain 'struct device' enters
> PASID-only mode and the IOMMU layer should construct an appropriate
> PASID and flow the dma operations through it.
>
> Pretending the DMA layer doesn't exist and that PASID gets a free pass
> is not OK in the kernel.

I can see why a tight security model is needed to stop
random devices having access to mamory that they should
not be able to access. Now that PCIe devices can be plugged
into Thunderbolt ports on computers, nobody wants to repeat
the disaster that Firewire ports created for systems over
a decade ago.

But I'd like to challege the one-size-fits-all policy. There's
a big difference between a random device plugged into a port
(which may even lie about its VendorID/DeviceID) and a device
that is part of the CPU socket.

I'd like people to think of DSA as an extension to the instruction
set. It implements asynchronous instructions like "MEMFILL" and
"MEMCOPY". These can be limited in scope when executed in user
processes or guests. But when executed by the host OS ring0 code
they can have all the same access that ring0 code has when it
dereferences a pointer.

-Tony

2021-05-14 05:41:17

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

> If you want this then be explicit about what it is you are making when
> building the API. Don't try to hide it under some generic idea of
> "kernel PCI DMA SVA"

So, a special API call (that Dave can call from IDXD) to set up this
kernel PASID. With suitable documentation to explain the scope.
Maybe with a separate CONFIG option so it can be completely
stubbed out (IDXD does *NOT* "select" this option ... users have
to explicitly pick it).

> I could easily see an admin option to "turn this off" entirely as
> being too dangerous, especially if the users have no interest in IDXD.

And a kernel command line option to block IDXD from using that
special API ... for users on generic kernels who want to block
this use model (but still use IDXD in non-kernel cases). Users
who don't want IDXD at all can block loading of the driver.

Would that work?

-Tony

2021-05-14 06:34:40

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 11:53:49AM -0700, Luck, Tony wrote:

> I'd like people to think of DSA as an extension to the instruction
> set. It implements asynchronous instructions like "MEMFILL" and
> "MEMCOPY". These can be limited in scope when executed in user
> processes or guests. But when executed by the host OS ring0 code
> they can have all the same access that ring0 code has when it
> dereferences a pointer.

If you want this then be explicit about what it is you are making when
building the API. Don't try to hide it under some generic idea of
"kernel PCI DMA SVA"

Add appropriately safe APIs that might have a chance of making it
secure, eg by confirming it is a physically trusted on-socket device.

It is not just Thunderbolt devices that could trigger this, many
devices with programmable firmware can spoof PCI DID/VID - having an
exploit chain that compromises a in-system FW device, chains it to
faking a IDXD HW, then accessing the all-memory PASID is pretty
alarming if the admin thought they had protection against DMA attacks.

I could easially see an admin option to "turn this off" entirely as
being too dangerous, especially if the users have no interest in IDXD.

Which is why being explicit of intent is really important.

Jason

2021-05-14 07:00:59

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Thu, 13 May 2021 16:20:14 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Thu, May 13, 2021 at 07:14:54PM +0000, Luck, Tony wrote:
> > > If you want this then be explicit about what it is you are making when
> > > building the API. Don't try to hide it under some generic idea of
> > > "kernel PCI DMA SVA"
> >
> > So, a special API call (that Dave can call from IDXD) to set up this
> > kernel PASID. With suitable documentation to explain the scope.
> > Maybe with a separate CONFIG option so it can be completely
> > stubbed out (IDXD does *NOT* "select" this option ... users have
> > to explicitly pick it).
> >
> > > I could easily see an admin option to "turn this off" entirely as
> > > being too dangerous, especially if the users have no interest in
> > > IDXD.
> >
> > And a kernel command line option to block IDXD from using that
> > special API ... for users on generic kernels who want to block
> > this use model (but still use IDXD in non-kernel cases). Users
> > who don't want IDXD at all can block loading of the driver.
>
> A generic IOMMU API should not be IDXD specific, if you want to allow
> some special "integrated to the SOC accelerator PASID" mode then it
> should be a global IOMMU API and any security toggles for it should be
> global and unrelated to IDXD.
>
> Concurrently it seems necessary to have a solution for secure kernel
> PASID use under the DMA API and reserve this special mode for getting
> higher performance.
>
> I think you need to come with a proposal, and it is a bit alarming a
> noteworthy security hole was added under the guise of "kernel SVA" :(
>
It seems there are two options:
1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU page
table/domain, mark the device is PASID only with a flag. Use DMA APIs
to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU driver
will decide whether to use system PASID domain during map/unmap. Not clear
if we also need to make IOVA==kernel VA.

2. Add a new IOMMU API to setup a system PASID which points to init_mm.pgd.
This API only allows trusted device to bind with the system PASID at its
own risk. There is no need for DMA API. This is the same as the current
code except with an explicit API.

Which option?

> Jason


Thanks,

Jacob

2021-05-14 07:54:29

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 01:22:51PM -0700, Jacob Pan wrote:
> Hi Tony,
>
> On Thu, 13 May 2021 12:57:49 -0700, "Luck, Tony" <[email protected]>
> wrote:
>
> > On Thu, May 13, 2021 at 12:46:21PM -0700, Jacob Pan wrote:
> > > It seems there are two options:
> > > 1. Add a new IOMMU API to set up a system PASID with a *separate* IOMMU
> > > page table/domain, mark the device is PASID only with a flag. Use DMA
> > > APIs to explicit map/unmap. Based on this PASID-only flag, Vendor IOMMU
> > > driver will decide whether to use system PASID domain during map/unmap.
> > > Not clear if we also need to make IOVA==kernel VA.
> > >
> > > 2. Add a new IOMMU API to setup a system PASID which points to
> > > init_mm.pgd. This API only allows trusted device to bind with the
> > > system PASID at its own risk. There is no need for DMA API. This is the
> > > same as the current code except with an explicit API.
> > >
> > > Which option?
> >
> > Option #1 looks cleaner to me. Option #2 gives access to bits
> > of memory that the users of system PASID shouldn't ever need
> > to touch ... just map regions of memory that the kernel has
> > a "struct page" for.
> >
> > What does "use DMA APIs to explicitly map/unmap" mean? Is that
> > for the whole region?
> >
> If we map the entire kernel direct map during system PASID setup, then we
> don't need to use DMA API to map/unmap certain range.
>
> I was thinking this system PASID page table could be on-demand. The mapping
> is built by explicit use of DMA map/unmap APIs.

Option 1 should be the PASID works exactly like a normal RID and uses
all the normal DMA APIs and IOMMU mechanisms, whatever the platform
implements. This might mean an iommu update on every operation or not.

> > I'm expecting that once this system PASID has been initialized,
> > then any accelerator device with a kernel use case would use the
> > same PASID. I.e. DSA for page clearing, IAX for ZSwap compression
> > & decompression, etc.
> >
> OK, sounds like we have to map the entire kernel VA with struct page as you
> said. So we still by-pass DMA APIs, can we all agree on that?

Option 2 should be the faster option, but not available in all cases.

Option 1 isn't optional. DMA and IOMMU code has to be portable and
this is the portable API.

If you want to do option 1 and option 2 then give it a go, but in most
common cases with the IOMMU in a direct map you shouldn't get a
notable performance win.

Jason

2021-05-18 05:34:53

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

On Thu, May 13, 2021 at 04:40:28PM -0700, Jacob Pan wrote:

> Looks like we are converging. Let me summarize the takeaways:
> 1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
> will be no flags at all for iommu_sva_bind_device()
> 2. Remove all supervisor SVA related vt-d, idxd code.
> 3. Create API iommu_setup_system_pasid_direct_map(option_flag)
> if (option_flag == 1)
> iommu_domain_alloc(IOMMU_DOMAIN_DMA);
> if (option_flag == 2)
> iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain type?
> setup IOMMU page tables mirroring the direct map
> 4. Create API iommu_enable_dev_direct_map(struct dev, &pasid, &option)
> - Drivers call this API to get the system PASID and which option is
> available on the system PASID
> - mark device as PASID only, perhaps a new flag in struct
> device->dev_iommu->pasid_only = 1
> 5. DMA API IOMMU vendor ops will take action based on the pasid_only flag to
> decide if the mapping is for system PASID page tables.
>
> Does it make sense?

I think you will run into trouble with that approach when you get to
patches..

For 'option 1' what you want is an API that is 'give me a PASID that
is equivalent to the RID'.

Then all the DMA API operations map IO page tables to both RID and
PASID access. For the direct mode the PASID and RID will both point at
the shared all physical memory IO page table.

Otherwise the DMA API won't care if the device is using RID or PASID,
if it needs to map a range it does it to the shared IO page table and
flushes both the RID and PASID based caches.

Then the driver will use the normal DMA API with its normal struct
pci_device and simply tell the HW to do DMA TLP's with the returned
PASID.

For 'option 2' it should be a completely different API family.

Jason

2021-05-19 20:16:15

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v4 1/2] iommu/sva: Tighten SVA bind API with explicit flags

Hi Jason,

On Mon, 17 May 2021 11:37:58 -0300, Jason Gunthorpe <[email protected]> wrote:

> On Thu, May 13, 2021 at 04:40:28PM -0700, Jacob Pan wrote:
>
> > Looks like we are converging. Let me summarize the takeaways:
> > 1. Remove IOMMU_SVA_BIND_SUPERVISOR flag from this patch, in fact there
> > will be no flags at all for iommu_sva_bind_device()
> > 2. Remove all supervisor SVA related vt-d, idxd code.
> > 3. Create API iommu_setup_system_pasid_direct_map(option_flag)
> > if (option_flag == 1)
> > iommu_domain_alloc(IOMMU_DOMAIN_DMA);
> > if (option_flag == 2)
> > iommu_domain_alloc(IOMMU_DOMAIN_DIRECT); //new domain
> > type? setup IOMMU page tables mirroring the direct map
> > 4. Create API iommu_enable_dev_direct_map(struct dev, &pasid, &option)
> > - Drivers call this API to get the system PASID and which
> > option is available on the system PASID
> > - mark device as PASID only, perhaps a new flag in struct
> > device->dev_iommu->pasid_only = 1
> > 5. DMA API IOMMU vendor ops will take action based on the pasid_only
> > flag to decide if the mapping is for system PASID page tables.
> >
> > Does it make sense?
>
> I think you will run into trouble with that approach when you get to
> patches..
>
> For 'option 1' what you want is an API that is 'give me a PASID that
> is equivalent to the RID'.
>
> Then all the DMA API operations map IO page tables to both RID and
> PASID access. For the direct mode the PASID and RID will both point at
> the shared all physical memory IO page table.
>
> Otherwise the DMA API won't care if the device is using RID or PASID,
> if it needs to map a range it does it to the shared IO page table and
> flushes both the RID and PASID based caches.
>
> Then the driver will use the normal DMA API with its normal struct
> pci_device and simply tell the HW to do DMA TLP's with the returned
> PASID.
>
> For 'option 2' it should be a completely different API family.
>
Make sense, thanks for the suggestions.

> Jason


Thanks,

Jacob