2020-06-13 00:43:59

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 00/12] x86: tag application address space for devices

Typical hardware devices require a driver stack to translate application
buffers to hardware addresses, and a kernel-user transition to notify the
hardware of new work. What if both the translation and transition overhead
could be eliminated? This is what Shared Virtual Address (SVA) and ENQCMD
enabled hardware like Data Streaming Accelerator (DSA) aims to achieve.
Applications map portals in their local-address-space and directly submit
work to them using a new instruction.

This series enables ENQCMD and associated management of the new MSR
(MSR_IA32_PASID). This new MSR allows an application address space to be
associated with what the PCIe spec calls a Process Address Space ID (PASID).
This PASID tag is carried along with all requests between applications and
devices and allows devices to interact with the process address space.

SVA and ENQCMD enabled device drivers need this series. The phase 2 DSA
patches with SVA and ENQCMD support was released on the top of this series:
https://lore.kernel.org/patchwork/cover/1244060/

This series only provides simple and basic support for ENQCMD and the MSR:
1. Clean up type definitions (patch 1-3). These patches can be in a
separate series.
- Define "pasid" as "unsigned int" consistently (patch 1 and 2).
- Define "flags" as "unsigned int"
2. Explain different various technical terms used in the series (patch 4).
3. Enumerate support for ENQCMD in the processor (patch 5).
4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
5. Define "pasid" in mm_struct (patch 8).
5. Clear PASID state for new mm and forked and cloned thread (patch 9-10).
6. Allocate and free PASID for a process (patch 11).
7. Fix up the PASID MSR in #GP handler when one thread in a process
executes ENQCMD for the first time (patches 12).

This patch series and the DSA phase 2 series are in
https://github.com/intel/idxd-driver/tree/idxd-stage2

References:
1. Detailed information on the ENQCMD/ENQCMDS instructions and the
IA32_PASID MSR can be found in Intel Architecture Instruction Set
Extensions and Future Features Programming Reference:
https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf

2. Detailed information on DSA can be found in DSA specification:
https://software.intel.com/en-us/download/intel-data-streaming-accelerator-preliminary-architecture-specification

Chang log:
v2:
- Add patches 1-3 to define "pasid" and "flags" as "unsigned int"
consistently (Thomas)
(these 3 patches could be in a separate patch set)
- Add patch 8 to move "pasid" to generic mm_struct (Christoph).
Jean-Philippe Brucker released a virtually same patch. Upstream only
needs one of the two.
- Add patch 9 to initialize PASID in a new mm.
- Plus other changes described in each patch (Thomas)

Ashok Raj (1):
docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Fenghua Yu (10):
iommu: Change type of pasid to unsigned int
ocxl: Change type of pasid to unsigned int
iommu/vt-d: Change flags type to unsigned int in binding mm
x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions
x86/msr-index: Define IA32_PASID MSR
mm: Define pasid in mm
fork: Clear PASID for new mm
x86/process: Clear PASID state for a newly forked/cloned thread
x86/mmu: Allocate/free PASID
x86/traps: Fix up invalid PASID

Yu-cheng Yu (1):
x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

Documentation/x86/index.rst | 1 +
Documentation/x86/sva.rst | 287 +++++++++++++++++++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/fpu/types.h | 10 +
arch/x86/include/asm/fpu/xstate.h | 2 +-
arch/x86/include/asm/iommu.h | 3 +
arch/x86/include/asm/mmu_context.h | 14 ++
arch/x86/include/asm/msr-index.h | 3 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
arch/x86/kernel/fpu/xstate.c | 4 +
arch/x86/kernel/process.c | 18 ++
arch/x86/kernel/traps.c | 23 ++
drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 5 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
drivers/iommu/amd/amd_iommu.h | 13 +-
drivers/iommu/amd/amd_iommu_types.h | 12 +-
drivers/iommu/amd/init.c | 4 +-
drivers/iommu/amd/iommu.c | 41 ++--
drivers/iommu/amd/iommu_v2.c | 22 +-
drivers/iommu/intel/debugfs.c | 2 +-
drivers/iommu/intel/dmar.c | 13 +-
drivers/iommu/intel/intel-pasid.h | 21 +-
drivers/iommu/intel/iommu.c | 4 +-
drivers/iommu/intel/pasid.c | 36 ++--
drivers/iommu/intel/svm.c | 159 ++++++++++++--
drivers/iommu/iommu.c | 2 +-
drivers/misc/ocxl/config.c | 3 +-
drivers/misc/ocxl/link.c | 6 +-
drivers/misc/ocxl/ocxl_internal.h | 6 +-
drivers/misc/ocxl/pasid.c | 2 +-
drivers/misc/ocxl/trace.h | 20 +-
drivers/misc/uacce/uacce.c | 2 +-
include/linux/amd-iommu.h | 9 +-
include/linux/intel-iommu.h | 20 +-
include/linux/intel-svm.h | 2 +-
include/linux/iommu.h | 8 +-
include/linux/mm_types.h | 6 +
include/linux/uacce.h | 2 +-
include/misc/ocxl.h | 6 +-
kernel/fork.c | 8 +
40 files changed, 656 insertions(+), 147 deletions(-)
create mode 100644 Documentation/x86/sva.rst

--
2.19.1


2020-06-13 00:44:13

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 06/12] x86/fpu/xstate: Add supervisor PASID state for ENQCMD feature

From: Yu-cheng Yu <[email protected]>

ENQCMD instruction reads PASID from IA32_PASID MSR. The MSR is stored
in the task's supervisor FPU PASID state and is context switched by
XSAVES/XRSTORS.

Signed-off-by: Yu-cheng Yu <[email protected]>
Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Modify the commit message (Thomas)

arch/x86/include/asm/fpu/types.h | 10 ++++++++++
arch/x86/include/asm/fpu/xstate.h | 2 +-
arch/x86/kernel/fpu/xstate.c | 4 ++++
3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/fpu/types.h b/arch/x86/include/asm/fpu/types.h
index f098f6cab94b..00f8efd4c07d 100644
--- a/arch/x86/include/asm/fpu/types.h
+++ b/arch/x86/include/asm/fpu/types.h
@@ -114,6 +114,7 @@ enum xfeature {
XFEATURE_Hi16_ZMM,
XFEATURE_PT_UNIMPLEMENTED_SO_FAR,
XFEATURE_PKRU,
+ XFEATURE_PASID,

XFEATURE_MAX,
};
@@ -128,6 +129,7 @@ enum xfeature {
#define XFEATURE_MASK_Hi16_ZMM (1 << XFEATURE_Hi16_ZMM)
#define XFEATURE_MASK_PT (1 << XFEATURE_PT_UNIMPLEMENTED_SO_FAR)
#define XFEATURE_MASK_PKRU (1 << XFEATURE_PKRU)
+#define XFEATURE_MASK_PASID (1 << XFEATURE_PASID)

#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)
#define XFEATURE_MASK_AVX512 (XFEATURE_MASK_OPMASK \
@@ -229,6 +231,14 @@ struct pkru_state {
u32 pad;
} __packed;

+/*
+ * State component 10 is supervisor state used for context-switching the
+ * PASID state.
+ */
+struct ia32_pasid_state {
+ u64 pasid;
+} __packed;
+
struct xstate_header {
u64 xfeatures;
u64 xcomp_bv;
diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index 422d8369012a..ab9833c57aaa 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -33,7 +33,7 @@
XFEATURE_MASK_BNDCSR)

/* All currently supported supervisor features */
-#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (0)
+#define XFEATURE_MASK_SUPERVISOR_SUPPORTED (XFEATURE_MASK_PASID)

/*
* Unsupported supervisor features. When a supervisor feature in this mask is
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index bda2e5eaca0e..31629e43383c 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -37,6 +37,7 @@ static const char *xfeature_names[] =
"AVX-512 ZMM_Hi256" ,
"Processor Trace (unused)" ,
"Protection Keys User registers",
+ "PASID state",
"unknown xstate feature" ,
};

@@ -51,6 +52,7 @@ static short xsave_cpuid_features[] __initdata = {
X86_FEATURE_AVX512F,
X86_FEATURE_INTEL_PT,
X86_FEATURE_PKU,
+ X86_FEATURE_ENQCMD,
};

/*
@@ -316,6 +318,7 @@ static void __init print_xstate_features(void)
print_xstate_feature(XFEATURE_MASK_ZMM_Hi256);
print_xstate_feature(XFEATURE_MASK_Hi16_ZMM);
print_xstate_feature(XFEATURE_MASK_PKRU);
+ print_xstate_feature(XFEATURE_MASK_PASID);
}

/*
@@ -590,6 +593,7 @@ static void check_xstate_against_struct(int nr)
XCHECK_SZ(sz, nr, XFEATURE_ZMM_Hi256, struct avx_512_zmm_uppers_state);
XCHECK_SZ(sz, nr, XFEATURE_Hi16_ZMM, struct avx_512_hi16_state);
XCHECK_SZ(sz, nr, XFEATURE_PKRU, struct pkru_state);
+ XCHECK_SZ(sz, nr, XFEATURE_PASID, struct ia32_pasid_state);

/*
* Make *SURE* to add any feature numbers in below if
--
2.19.1

2020-06-13 00:44:16

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 03/12] iommu/vt-d: Change flags type to unsigned int in binding mm

"flags" passed to intel_svm_bind_mm() is a bit mask and should be
defined as "unsigned int" instead of "int".

Change its type to "unsigned int".

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Add this new patch per Thomas' comment.

drivers/iommu/intel/svm.c | 7 ++++---
include/linux/intel-iommu.h | 2 +-
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b5618341b4b1..4e775e12ae52 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -427,7 +427,8 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)

/* Caller must hold pasid_mutex, mm reference */
static int
-intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
+intel_svm_bind_mm(struct device *dev, unsigned int flags,
+ struct svm_dev_ops *ops,
struct mm_struct *mm, struct intel_svm_dev **sd)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -954,7 +955,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
{
struct iommu_sva *sva = ERR_PTR(-EINVAL);
struct intel_svm_dev *sdev = NULL;
- int flags = 0;
+ unsigned int flags = 0;
int ret;

/*
@@ -963,7 +964,7 @@ intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata)
* and intel_svm etc.
*/
if (drvdata)
- flags = *(int *)drvdata;
+ flags = *(unsigned int *)drvdata;
mutex_lock(&pasid_mutex);
ret = intel_svm_bind_mm(dev, flags, NULL, mm, &sdev);
if (ret)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 44fa8879f829..9abc30cf10fc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -759,7 +759,7 @@ struct intel_svm {
struct mm_struct *mm;

struct intel_iommu *iommu;
- int flags;
+ unsigned int flags;
unsigned int pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
--
2.19.1

2020-06-13 00:44:18

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 10/12] x86/process: Clear PASID state for a newly forked/cloned thread

The PASID state has to be cleared on forks, since the child has a
different address space. The PASID is also cleared for thread clone. While
it would be correct to inherit the PASID in this case, it is unknown
whether the new task will use ENQCMD. Giving it the PASID "just in case"
would have the downside of increased context switch overhead to setting
the PASID MSR.

Since #GP faults have to be handled on any threads that were created before
the PASID was assigned to the mm of the process, newly created threads
might as well be treated in a consistent way.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Modify init_task_pasid().

arch/x86/kernel/process.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..1b1492e337a6 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -121,6 +121,21 @@ static int set_new_tls(struct task_struct *p, unsigned long tls)
return do_set_thread_area_64(p, ARCH_SET_FS, tls);
}

+/* Initialize the PASID state for the forked/cloned thread. */
+static void init_task_pasid(struct task_struct *task)
+{
+ struct ia32_pasid_state *ppasid;
+
+ /*
+ * Initialize the PASID state so that the PASID MSR will be
+ * initialized to its initial state (0) by XRSTORS when the task is
+ * scheduled for the first time.
+ */
+ ppasid = get_xsave_addr(&task->thread.fpu.state.xsave, XFEATURE_PASID);
+ if (ppasid)
+ ppasid->pasid = INIT_PASID;
+}
+
int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
unsigned long arg, struct task_struct *p, unsigned long tls)
{
@@ -174,6 +189,9 @@ int copy_thread_tls(unsigned long clone_flags, unsigned long sp,
task_user_gs(p) = get_user_gs(current_pt_regs());
#endif

+ if (static_cpu_has(X86_FEATURE_ENQCMD))
+ init_task_pasid(p);
+
/* Set a new TLS for the child thread? */
if (clone_flags & CLONE_SETTLS)
ret = set_new_tls(p, tls);
--
2.19.1

2020-06-13 00:44:28

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 11/12] x86/mmu: Allocate/free PASID

A PASID is allocated for an "mm" the first time any thread attaches
to an SVM capable device. Later device attachments (whether to the same
device or another SVM device) will re-use the same PASID.

The PASID is freed when the process exits (so no need to keep
reference counts on how many SVM devices are sharing the PASID).

Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Define a helper free_bind() to simplify error exit code in bind_mm()
(Thomas)
- Fix a ret error code in bind_mm() (Thomas)
- Change pasid's type from "int" to "unsigned int" to have consistent
pasid type in iommu (Thomas)
- Simplify alloc_pasid() a bit.

arch/x86/include/asm/iommu.h | 2 +
arch/x86/include/asm/mmu_context.h | 14 ++++
drivers/iommu/intel/svm.c | 101 +++++++++++++++++++++++++----
3 files changed, 105 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index bf1ed2ddc74b..ed41259fe7ac 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
return -EINVAL;
}

+void __free_pasid(struct mm_struct *mm);
+
#endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 47562147e70b..f8c91ce8c451 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -13,6 +13,7 @@
#include <asm/tlbflush.h>
#include <asm/paravirt.h>
#include <asm/debugreg.h>
+#include <asm/iommu.h>

extern atomic64_t last_mm_ctx_id;

@@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
init_new_context_ldt(mm);
return 0;
}
+
+static inline void free_pasid(struct mm_struct *mm)
+{
+ if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+ return;
+
+ if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
+ return;
+
+ __free_pasid(mm);
+}
+
static inline void destroy_context(struct mm_struct *mm)
{
destroy_context_ldt(mm);
+ free_pasid(mm);
}

extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 4e775e12ae52..27dc866b8461 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
return ret;
}

+static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
+ bool new_pasid)
+{
+ if (new_pasid)
+ ioasid_free(svm->pasid);
+ kfree(svm);
+ kfree(sdev);
+}
+
+/*
+ * If this mm already has a PASID, use it. Otherwise allocate a new one.
+ * Let the caller know if a new PASID is allocated via 'new_pasid'.
+ */
+static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
+ unsigned int pasid_max, bool *new_pasid,
+ unsigned int flags)
+{
+ unsigned int pasid;
+
+ *new_pasid = false;
+
+ /*
+ * Reuse the PASID if the mm already has a PASID and not a private
+ * PASID is requested.
+ */
+ if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+ /*
+ * Once a PASID is allocated for this mm, the PASID
+ * stays with the mm until the mm is dropped. Reuse
+ * the PASID which has been already allocated for the
+ * mm instead of allocating a new one.
+ */
+ ioasid_set_data(mm->pasid, svm);
+
+ return mm->pasid;
+ }
+
+ /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
+ pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
+ if (pasid != INVALID_IOASID) {
+ /* A new pasid is allocated. */
+ *new_pasid = true;
+ }
+
+ return pasid;
+}
+
/* Caller must hold pasid_mutex, mm reference */
static int
intel_svm_bind_mm(struct device *dev, unsigned int flags,
@@ -518,6 +565,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
init_rcu_head(&sdev->rcu);

if (!svm) {
+ bool new_pasid;
+
svm = kzalloc(sizeof(*svm), GFP_KERNEL);
if (!svm) {
ret = -ENOMEM;
@@ -529,12 +578,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;

- /* Do not use PASID 0, reserved for RID to PASID */
- svm->pasid = ioasid_alloc(NULL, PASID_MIN,
- pasid_max - 1, svm);
+ svm->pasid = alloc_pasid(svm, mm, pasid_max, &new_pasid, flags);
if (svm->pasid == INVALID_IOASID) {
- kfree(svm);
- kfree(sdev);
+ free_bind(svm, sdev, new_pasid);
ret = -ENOSPC;
goto out;
}
@@ -547,9 +593,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (mm) {
ret = mmu_notifier_register(&svm->notifier, mm);
if (ret) {
- ioasid_free(svm->pasid);
- kfree(svm);
- kfree(sdev);
+ free_bind(svm, sdev, new_pasid);
goto out;
}
}
@@ -565,12 +609,18 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
if (ret) {
if (mm)
mmu_notifier_unregister(&svm->notifier, mm);
- ioasid_free(svm->pasid);
- kfree(svm);
- kfree(sdev);
+ free_bind(svm, sdev, new_pasid);
goto out;
}

+ if (mm && new_pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
+ /*
+ * Track the new pasid in the mm. The pasid will be
+ * freed at process exit. Don't track requested
+ * private PASID in the mm.
+ */
+ mm->pasid = svm->pasid;
+ }
list_add_tail(&svm->list, &global_svm_list);
} else {
/*
@@ -640,7 +690,8 @@ static int intel_svm_unbind_mm(struct device *dev, unsigned int pasid)
kfree_rcu(sdev, rcu);

if (list_empty(&svm->devs)) {
- ioasid_free(svm->pasid);
+ /* Clear data in the pasid. */
+ ioasid_set_data(pasid, NULL);
if (svm->mm)
mmu_notifier_unregister(&svm->notifier, svm->mm);
list_del(&svm->list);
@@ -1001,3 +1052,29 @@ unsigned int intel_svm_get_pasid(struct iommu_sva *sva)

return pasid;
}
+
+/*
+ * An invalid pasid is either 0 (init PASID value) or bigger than max PASID
+ * (PASID_MAX - 1).
+ */
+static bool invalid_pasid(unsigned int pasid)
+{
+ return (pasid == INIT_PASID) || (pasid >= PASID_MAX);
+}
+
+/* On process exit free the PASID (if one was allocated). */
+void __free_pasid(struct mm_struct *mm)
+{
+ unsigned int pasid = mm->pasid;
+
+ /* No need to free invalid pasid. */
+ if (invalid_pasid(pasid))
+ return;
+
+ /*
+ * Since the pasid is not bound to any svm by now, there is no race
+ * here with binding/unbinding and no need to protect the free
+ * operation by pasid_mutex.
+ */
+ ioasid_free(pasid);
+}
--
2.19.1

2020-06-13 00:44:34

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

From: Ashok Raj <[email protected]>

ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
features are a complicated stack with lots of interconnected pieces.
This documentation provides a big picture overview for all of the
features.

Signed-off-by: Ashok Raj <[email protected]>
Co-developed-by: Fenghua Yu <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Fix the doc format and add the doc in toctree (Thomas)
- Modify the doc for better description (Thomas, Tony, Dave)

Documentation/x86/index.rst | 1 +
Documentation/x86/sva.rst | 287 ++++++++++++++++++++++++++++++++++++
2 files changed, 288 insertions(+)
create mode 100644 Documentation/x86/sva.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..e5d5ff096685 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -30,3 +30,4 @@ x86-specific Documentation
usb-legacy-support
i386/index
x86_64/index
+ sva
diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
new file mode 100644
index 000000000000..1e52208c7dda
--- /dev/null
+++ b/Documentation/x86/sva.rst
@@ -0,0 +1,287 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================
+Shared Virtual Addressing (SVA) with ENQCMD
+===========================================
+
+Background
+==========
+
+Shared Virtual Addressing (SVA) allows the processor and device to use the
+same virtual addresses avoiding the need for software to translate virtual
+addresses to physical addresses. SVA is what PCIe calls Shared Virtual
+Memory (SVM)
+
+In addition to the convenience of using application virtual addresses
+by the device, it also doesn't require pinning pages for DMA.
+PCIe Address Translation Services (ATS) along with Page Request Interface
+(PRI) allow devices to function much the same way as the CPU handling
+application page-faults. For more information please refer to PCIe
+specification Chapter 10: ATS Specification.
+
+Use of SVA requires IOMMU support in the platform. IOMMU also is required
+to support PCIe features ATS and PRI. ATS allows devices to cache
+translations for the virtual address. IOMMU driver uses the mmu_notifier()
+support to keep the device tlb cache and the CPU cache in sync. PRI allows
+the device to request paging the virtual address before using if they are
+not paged in the CPU page tables.
+
+
+Shared Hardware Workqueues
+==========================
+
+Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
+the use of Shared Work Queues (SWQ) by both applications and Virtual
+Machines (VM's). This allows better hardware utilization vs. hard
+partitioning resources that could result in under utilization. In order to
+allow the hardware to distinguish the context for which work is being
+executed in the hardware by SWQ interface, SIOV uses Process Address Space
+ID (PASID), which is a 20bit number defined by the PCIe SIG.
+
+PASID value is encoded in all transactions from the device. This allows the
+IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
+Resource Identifier (RID) which is the Bus/Device/Function.
+
+
+ENQCMD
+======
+
+ENQCMD is a new instruction on Intel platforms that atomically submits a
+work descriptor to a device. The descriptor includes the operation to be
+performed, virtual addresses of all parameters, virtual address of a completion
+record, and the PASID (process address space ID) of the current process.
+
+ENQCMD works with non-posted semantics and carries a status back if the
+command was accepted by hardware. This allows the submitter to know if the
+submission needs to be retried or other device specific mechanisms to
+implement implement fairness or ensure forward progress can be made.
+
+ENQCMD is the glue that ensures applications can directly submit commands
+to the hardware and also permit hardware to be aware of application context
+to perform I/O operations via use of PASID.
+
+Process Address Space Tagging
+=============================
+
+A new thread scoped MSR (IA32_PASID) provides the connection between
+user processes and the rest of the hardware. When an application first
+accesses an SVA capable device this MSR is initialized with a newly
+allocated PASID. The driver for the device calls an IOMMU specific api
+that sets up the routing for DMA and page-requests.
+
+For example, the Intel Data Streaming Accelerator (DSA) uses
+intel_svm_bind_mm(), which will do the following.
+
+- Allocate the PASID, and program the process page-table (cr3) in the PASID
+ context entries.
+- Register for mmu_notifier() to track any page-table invalidations to keep
+ the device tlb in sync. For example, when a page-table entry is invalidated,
+ IOMMU propagates the invalidation to device tlb. This will force any
+ future access by the device to this virtual address to participate in
+ ATS. If the IOMMU responds with proper response that a page is not
+ present, the device would request the page to be paged in via the PCIe PRI
+ protocol before performing I/O.
+
+This MSR is managed with the XSAVE feature set as "supervisor state" to
+ensure the MSR is updated during context switch.
+
+PASID Management
+================
+
+The kernel must allocate a PASID on behalf of each process and program it
+into the new MSR to communicate the process identity to platform hardware.
+ENQCMD uses the PASID stored in this MSR to tag requests from this process.
+When a user submits a work descriptor to a device using the ENQCMD
+instruction, the PASID field in the descriptor is auto-filled with the
+value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
+with the same PASID. The platform IOMMU uses the PASID in the transaction to
+perform address translation. The IOMMU api's setup the corresponding PASID
+entry in IOMMU with the process address used by the CPU (for e.g cr3 in x86).
+
+The MSR must be configured on each logical CPU before any application
+thread can interact with a device. Threads that belong to the same
+process share the same page tables, thus the same MSR value.
+
+PASID is cleared when a process is created. The PASID allocation and MSR
+programming may occur long after a process and its threads have been created.
+One thread must call bind() to allocate the PASID for the process. If a
+thread uses ENQCMD without the MSR first being populated, it will cause #GP.
+The kernel will fix up the #GP by writing the process-wide PASID into the
+thread that took the #GP. A single process PASID can be used simultaneously
+with multiple devices since they all share the same address space.
+
+New threads could inherit the MSR value from the parent. But this would
+involve additional state management for those threads which may never use
+ENQCMD. Clearing the MSR at thread creation permits all threads to have a
+consistent behavior; the PASID is only programmed when the thread calls
+the bind() api (intel_svm_bind_mm()()), or when a thread calls ENQCMD for
+the first time.
+
+PASID Lifecycle Management
+==========================
+
+Only processes that access SVA capable devices need to have a PASID
+allocated. This allocation happens when a process first opens an SVA
+capable device (subsequent opens of the same, or other devices will
+share the same PASID).
+
+Although the PASID is allocated to the process by opening a device,
+it is not active in any of the threads of that process. Activation is
+done lazily when a thread tries to submit a work descriptor to a device
+using the ENQCMD.
+
+That first access will trigger a #GP fault because the IA32_PASID MSR
+has not been initialized with the PASID value assigned to the process
+when the device was opened. The Linux #GP handler notes that a PASID as
+been allocated for the process, and so initializes the IA32_PASID MSR
+and returns so that the ENQCMD instruction is re-executed.
+
+On fork(2) or exec(2) the PASID is removed from the process as it no
+longer has the same address space that it had when the device was opened.
+
+On clone(2) the new task shares the same address space, so will be
+able to use the PASID allocated to the process. The IA32_PASID is not
+preemptively initialized as the kernel does not know whether this thread
+is going to access the device.
+
+On exit(2) the PASID is freed. The device driver ensures that any pending
+operations queued to the device are either completed or aborted before
+allowing the PASID to be re-allocated.
+
+Relationships
+=============
+
+ * Each process has many threads, but only one PASID
+ * Devices have a limited number (~10's to 1000's) of hardware
+ workqueues and each portal maps down to a single workqueue.
+ The device driver manages allocating hardware workqueues.
+ * A single mmap() maps a single hardware workqueue as a "portal"
+ * For each device with which a process interacts, there must be
+ one or more mmap()'d portals.
+ * Many threads within a process can share a single portal to access
+ a single device.
+ * Multiple processes can separately mmap() the same portal, in
+ which case they still share one device hardware workqueue.
+ * The single process-wide PASID is used by all threads to interact
+ with all devices. There is not, for instance, a PASID for each
+ thread or each thread<->device pair.
+
+FAQ
+===
+
+* What is SVA/SVM?
+
+Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
+work in the same address space. In short, sharing the address space. Some
+call it Shared Virtual Memory (SVM), but Linux community wanted to avoid
+it with Posix Shared Memory and Secure Virtual Machines which were terms
+already in circulation.
+
+* What is a PASID?
+
+A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is
+a 20 bit number allocated and managed by the OS. PASID is included in all
+transactions between the platform and the device.
+
+* How are shared work queues different?
+
+Traditionally to allow user space applications interact with hardware,
+there is a separate instance required per process. For example, consider
+doorbells as a mechanism of informing hardware about work to process. Each
+doorbell is required to be spaced 4k (or page-size) apart for process
+isolation. This requires hardware to provision that space and reserve in
+MMIO. This doesn't scale as the number of threads becomes quite large. The
+hardware also manages the queue depth for Shared Work Queues (SWQ), and
+consumers don't need to track queue depth. If there is no space to accept
+a command, the device will return an error indicating retry. Also
+submitting a command to an MMIO address that can't accept ENQCMD will
+return retry in response. In the new DMWr PCIe terminology, devices need to
+support DMWr completer capability. In addition it requires all switch ports
+to support DMWr routing and must be enabled by the PCIe subsystem, much
+like how PCIe Atomics() are managed for instance.
+
+SWQ allows hardware to provision just a single address in the device. When
+used with ENQCMD to submit work, the device can distinguish the process
+submitting the work since it will include the PASID assigned to that
+process. This decreases the pressure of hardware requiring to support
+hardware to scale to a large number of processes.
+
+* Is this the same as a user space device driver?
+
+Communicating with the device via the shared work queue is much simpler
+than a full blown user space driver. The kernel driver does all the
+initialization of the hardware. User space only needs to worry about
+submitting work and processing completions.
+
+* Is this the same as SR-IOV?
+
+Single Root I/O Virtualization (SR-IOV) focuses on providing independent
+hardware interfaces for virtualizing hardware. Hence its required to be
+almost fully functional interface to software supporting the traditional
+BAR's, space for interrupts via MSI-x, its own register layout.
+Virtual Functions (VFs) are assisted by the Physical Function (PF)
+driver.
+
+Scalable I/O Virtualization builds on the PASID concept to create device
+instances for virtualization. SIOV requires host software to assist in
+creating virtual devices, each virtual device is represented by a PASID
+along with the BDF of the device. This allows device hardware to optimize
+device resource creation and can grow dynamically on demand. SR-IOV creation
+and management is very static in nature. Consult references below for more
+details.
+
+* Why not just create a virtual function for each app?
+
+Creating PCIe SRIOV type virtual functions (VF) are expensive. They create
+duplicated hardware for PCI config space requirements, Interrupts such as
+MSIx for instance. Resources such as interrupts have to be hard partitioned
+between VF's at creation time, and cannot scale dynamically on demand. The
+VF's are not completely independent from the Physical function (PF). Most
+VF's require some communication and assistance from the PF driver. SIOV
+creates a software defined device. Where all the configuration and control
+aspects are mediated via the slow path. The work submission and completion
+happen without any mediation.
+
+* Does this support virtualization?
+
+ENQCMD can be used from within a guest VM. In these cases the VMM helps
+with setting up a translation table to translate from Guest PASID to Host
+PASID. Please consult the ENQCMD instruction set reference for more
+details.
+
+* Does memory need to be pinned?
+
+When devices support SVA, along with platform hardware such as IOMMU
+supporting such devices, there is no need to pin memory for DMA purposes.
+Devices that support SVA also support other PCIe features that remove the
+pinning requirement for memory.
+
+Device TLB support - Device requests the IOMMU to lookup an address before
+use via Address Translation Service (ATS) requests. If the mapping exists
+but there is no page allocated by the OS, IOMMU hardware returns that no
+mapping exists.
+
+Device requests that virtual address to be mapped via Page Request
+Interface (PRI). Once the OS has successfully completed the mapping, it
+returns the response back to the device. The device continues again to
+request for a translation and continues.
+
+IOMMU works with the OS in managing consistency of page-tables with the
+device. When removing pages, it interacts with the device to remove any
+device-tlb that might have been cached before removing the mappings from
+the OS.
+
+References
+==========
+
+VT-D:
+https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
+
+SIOV:
+https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
+
+ENQCMD in ISE:
+https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
+
+DSA spec:
+https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
--
2.19.1

2020-06-13 00:45:52

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 07/12] x86/msr-index: Define IA32_PASID MSR

The IA32_PASID MSR (0xd93) contains the Process Address Space Identifier
(PASID), a 20-bit value. Bit 31 must be set to indicate the value
programmed in the MSR is valid. Hardware uses PASID to identify process
address space and direct responses to the right address space.

Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Change "identify process" to "identify process address space" in the
commit message (Thomas)

arch/x86/include/asm/msr-index.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index e8370e64a155..e5f699ff1dd6 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -237,6 +237,9 @@
#define MSR_IA32_LASTINTFROMIP 0x000001dd
#define MSR_IA32_LASTINTTOIP 0x000001de

+#define MSR_IA32_PASID 0x00000d93
+#define MSR_IA32_PASID_VALID BIT_ULL(31)
+
/* DEBUGCTLMSR bits (others vary by model): */
#define DEBUGCTLMSR_LBR (1UL << 0) /* last branch recording */
#define DEBUGCTLMSR_BTF_SHIFT 1
--
2.19.1

2020-06-13 00:46:07

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 08/12] mm: Define pasid in mm

PASID is shared by all threads in a process. So the logical place to keep
track of it is in the "mm". Both ARM and X86 need to use the PASID in the
"mm".

Suggested-by: Christoph Hellwig <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- This new patch moves "pasid" from x86 specific mm_context_t to generic
struct mm_struct per Christopher's comment: https://lore.kernel.org/linux-iommu/[email protected]/T/#mb57110ffe1aaa24750eeea4f93b611f0d1913911
- Jean-Philippe Brucker released a virtually same patch. I still put this
patch in the series for better review. The upstream kernel only needs one
of the two patches eventually.
https://lore.kernel.org/linux-iommu/[email protected]/
- Change CONFIG_IOASID to CONFIG_PCI_PASID (Ashok)

include/linux/mm_types.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64ede5f150dc..5778db3aa42d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -538,6 +538,10 @@ struct mm_struct {
atomic_long_t hugetlb_usage;
#endif
struct work_struct async_put_work;
+
+#ifdef CONFIG_PCI_PASID
+ unsigned int pasid;
+#endif
} __randomize_layout;

/*
--
2.19.1

2020-06-13 00:46:18

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 09/12] fork: Clear PASID for new mm

When a new mm is created, its PASID should be cleared, i.e. the PASID is
initialized to its init state 0 on both ARM and X86.

Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Add this patch to initialize PASID value for a new mm.

include/linux/mm_types.h | 2 ++
kernel/fork.c | 8 ++++++++
2 files changed, 10 insertions(+)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5778db3aa42d..904bc07411a9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -22,6 +22,8 @@
#endif
#define AT_VECTOR_SIZE (2*(AT_VECTOR_SIZE_ARCH + AT_VECTOR_SIZE_BASE + 1))

+/* Initial PASID value is 0. */
+#define INIT_PASID 0

struct address_space;
struct mem_cgroup;
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..085e72d3e9eb 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1007,6 +1007,13 @@ static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
#endif
}

+static void mm_init_pasid(struct mm_struct *mm)
+{
+#ifdef CONFIG_PCI_PASID
+ mm->pasid = INIT_PASID;
+#endif
+}
+
static void mm_init_uprobes_state(struct mm_struct *mm)
{
#ifdef CONFIG_UPROBES
@@ -1035,6 +1042,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+ mm_init_pasid(mm);
RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_subscriptions_init(mm);
init_tlb_flush_pending(mm);
--
2.19.1

2020-06-13 00:46:26

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 01/12] iommu: Change type of pasid to unsigned int

PASID is defined as a few different types in iommu including "int",
"u32", and "unsigned int". To be consistent and to match with ioasid's
type, define PASID and its variations (e.g. max PASID) as "unsigned int".

No PASID type change in uapi.

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
iommu (Thomas)

drivers/gpu/drm/amd/amdkfd/kfd_iommu.c | 5 ++--
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +-
drivers/iommu/amd/amd_iommu.h | 13 ++++----
drivers/iommu/amd/amd_iommu_types.h | 12 ++++----
drivers/iommu/amd/init.c | 4 +--
drivers/iommu/amd/iommu.c | 41 ++++++++++++++------------
drivers/iommu/amd/iommu_v2.c | 22 +++++++-------
drivers/iommu/intel/debugfs.c | 2 +-
drivers/iommu/intel/dmar.c | 13 ++++----
drivers/iommu/intel/intel-pasid.h | 21 ++++++-------
drivers/iommu/intel/iommu.c | 4 +--
drivers/iommu/intel/pasid.c | 36 +++++++++++-----------
drivers/iommu/intel/svm.c | 12 ++++----
drivers/iommu/iommu.c | 2 +-
drivers/misc/uacce/uacce.c | 2 +-
include/linux/amd-iommu.h | 9 +++---
include/linux/intel-iommu.h | 18 +++++------
include/linux/intel-svm.h | 2 +-
include/linux/iommu.h | 8 ++---
include/linux/uacce.h | 2 +-
20 files changed, 121 insertions(+), 109 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
index 7c8786b9eb0a..703d23deca76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -139,7 +139,8 @@ void kfd_iommu_unbind_process(struct kfd_process *p)
}

/* Callback for process shutdown invoked by the IOMMU driver */
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev,
+ unsigned int pasid)
{
struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
struct kfd_process *p;
@@ -185,7 +186,7 @@ static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
}

/* This function called by IOMMU driver on PPR failure */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, unsigned int pasid,
unsigned long address, u16 flags)
{
struct kfd_dev *dev;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f0587d94294d..3c7d1f774afe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -714,7 +714,7 @@ struct kfd_process {
/* We want to receive a notification when the mm_struct is destroyed */
struct mmu_notifier mmu_notifier;

- uint16_t pasid;
+ unsigned int pasid;
unsigned int doorbell_index;

/*
diff --git a/drivers/iommu/amd/amd_iommu.h b/drivers/iommu/amd/amd_iommu.h
index f892992c8744..0914b5b6f879 100644
--- a/drivers/iommu/amd/amd_iommu.h
+++ b/drivers/iommu/amd/amd_iommu.h
@@ -45,12 +45,13 @@ extern int amd_iommu_register_ppr_notifier(struct notifier_block *nb);
extern int amd_iommu_unregister_ppr_notifier(struct notifier_block *nb);
extern void amd_iommu_domain_direct_map(struct iommu_domain *dom);
extern int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids);
-extern int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+extern int amd_iommu_flush_page(struct iommu_domain *dom, unsigned int pasid,
u64 address);
-extern int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid);
-extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
- unsigned long cr3);
-extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
+extern int amd_iommu_flush_tlb(struct iommu_domain *dom, unsigned int pasid);
+extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom,
+ unsigned int pasid, unsigned long cr3);
+extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom,
+ unsigned int pasid);
extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);

#ifdef CONFIG_IRQ_REMAP
@@ -66,7 +67,7 @@ static inline int amd_iommu_create_irq_domain(struct amd_iommu *iommu)
#define PPR_INVALID 0x1
#define PPR_FAILURE 0xf

-extern int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_complete_ppr(struct pci_dev *pdev, unsigned int pasid,
int status, int tag);

static inline bool is_rd890_iommu(struct pci_dev *pdev)
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 30a5d412255a..d395931b2232 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -443,11 +443,11 @@ extern struct kmem_cache *amd_iommu_irq_cache;
* incoming PPR faults around.
*/
struct amd_iommu_fault {
- u64 address; /* IO virtual address of the fault*/
- u32 pasid; /* Address space identifier */
- u16 device_id; /* Originating PCI device id */
- u16 tag; /* PPR tag */
- u16 flags; /* Fault flags */
+ u64 address; /* IO virtual address of the fault*/
+ unsigned int pasid; /* Address space identifier */
+ u16 device_id; /* Originating PCI device id */
+ u16 tag; /* PPR tag */
+ u16 flags; /* Fault flags */

};

@@ -745,7 +745,7 @@ extern unsigned long *amd_iommu_pd_alloc_bitmap;
extern bool amd_iommu_unmap_flush;

/* Smallest max PASID supported by any IOMMU in the system */
-extern u32 amd_iommu_max_pasid;
+extern unsigned int amd_iommu_max_pasid;

extern bool amd_iommu_v2_present;

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 6ebd4825e320..fbf5dd1d3eab 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -172,7 +172,7 @@ static int amd_iommus_present;
bool amd_iommu_np_cache __read_mostly;
bool amd_iommu_iotlb_sup __read_mostly = true;

-u32 amd_iommu_max_pasid __read_mostly = ~0;
+unsigned int amd_iommu_max_pasid __read_mostly = ~0;

bool amd_iommu_v2_present __read_mostly;
static bool amd_iommu_pc_present __read_mostly;
@@ -1750,7 +1750,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)

if (iommu_feature(iommu, FEATURE_GT)) {
int glxval;
- u32 max_pasid;
+ unsigned int max_pasid;
u64 pasmax;

pasmax = iommu->features & FEATURE_PASID_MASK;
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index 74cca1757172..32999adbbd46 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -502,7 +502,8 @@ static void amd_iommu_report_page_fault(u16 devid, u16 domain_id,
static void iommu_print_event(struct amd_iommu *iommu, void *__evt)
{
struct device *dev = iommu->iommu.dev;
- int type, devid, pasid, flags, tag;
+ int type, devid, flags, tag;
+ unsigned int pasid;
volatile u32 *event = __evt;
int count = 0;
u64 address;
@@ -898,8 +899,8 @@ static void build_inv_iotlb_pages(struct iommu_cmd *cmd, u16 devid, int qdep,
cmd->data[2] |= CMD_INV_IOMMU_PAGES_SIZE_MASK;
}

-static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, int pasid,
- u64 address, bool size)
+static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid,
+ unsigned int pasid, u64 address, bool size)
{
memset(cmd, 0, sizeof(*cmd));

@@ -916,8 +917,9 @@ static void build_inv_iommu_pasid(struct iommu_cmd *cmd, u16 domid, int pasid,
CMD_SET_TYPE(cmd, CMD_INV_IOMMU_PAGES);
}

-static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, int pasid,
- int qdep, u64 address, bool size)
+static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid,
+ unsigned int pasid, int qdep, u64 address,
+ bool size)
{
memset(cmd, 0, sizeof(*cmd));

@@ -936,8 +938,8 @@ static void build_inv_iotlb_pasid(struct iommu_cmd *cmd, u16 devid, int pasid,
CMD_SET_TYPE(cmd, CMD_INV_IOTLB_PAGES);
}

-static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid, int pasid,
- int status, int tag, bool gn)
+static void build_complete_ppr(struct iommu_cmd *cmd, u16 devid,
+ unsigned int pasid, int status, int tag, bool gn)
{
memset(cmd, 0, sizeof(*cmd));

@@ -2772,7 +2774,7 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
}
EXPORT_SYMBOL(amd_iommu_domain_enable_v2);

-static int __flush_pasid(struct protection_domain *domain, int pasid,
+static int __flush_pasid(struct protection_domain *domain, unsigned int pasid,
u64 address, bool size)
{
struct iommu_dev_data *dev_data;
@@ -2833,13 +2835,13 @@ static int __flush_pasid(struct protection_domain *domain, int pasid,
return ret;
}

-static int __amd_iommu_flush_page(struct protection_domain *domain, int pasid,
- u64 address)
+static int __amd_iommu_flush_page(struct protection_domain *domain,
+ unsigned int pasid, u64 address)
{
return __flush_pasid(domain, pasid, address, false);
}

-int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
+int amd_iommu_flush_page(struct iommu_domain *dom, unsigned int pasid,
u64 address)
{
struct protection_domain *domain = to_pdomain(dom);
@@ -2854,13 +2856,14 @@ int amd_iommu_flush_page(struct iommu_domain *dom, int pasid,
}
EXPORT_SYMBOL(amd_iommu_flush_page);

-static int __amd_iommu_flush_tlb(struct protection_domain *domain, int pasid)
+static int __amd_iommu_flush_tlb(struct protection_domain *domain,
+ unsigned int pasid)
{
return __flush_pasid(domain, pasid, CMD_INV_IOMMU_ALL_PAGES_ADDRESS,
true);
}

-int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid)
+int amd_iommu_flush_tlb(struct iommu_domain *dom, unsigned int pasid)
{
struct protection_domain *domain = to_pdomain(dom);
unsigned long flags;
@@ -2874,7 +2877,7 @@ int amd_iommu_flush_tlb(struct iommu_domain *dom, int pasid)
}
EXPORT_SYMBOL(amd_iommu_flush_tlb);

-static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
+static u64 *__get_gcr3_pte(u64 *root, int level, unsigned int pasid, bool alloc)
{
int index;
u64 *pte;
@@ -2906,7 +2909,7 @@ static u64 *__get_gcr3_pte(u64 *root, int level, int pasid, bool alloc)
return pte;
}

-static int __set_gcr3(struct protection_domain *domain, int pasid,
+static int __set_gcr3(struct protection_domain *domain, unsigned int pasid,
unsigned long cr3)
{
struct domain_pgtable pgtable;
@@ -2925,7 +2928,7 @@ static int __set_gcr3(struct protection_domain *domain, int pasid,
return __amd_iommu_flush_tlb(domain, pasid);
}

-static int __clear_gcr3(struct protection_domain *domain, int pasid)
+static int __clear_gcr3(struct protection_domain *domain, unsigned int pasid)
{
struct domain_pgtable pgtable;
u64 *pte;
@@ -2943,7 +2946,7 @@ static int __clear_gcr3(struct protection_domain *domain, int pasid)
return __amd_iommu_flush_tlb(domain, pasid);
}

-int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
+int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, unsigned int pasid,
unsigned long cr3)
{
struct protection_domain *domain = to_pdomain(dom);
@@ -2958,7 +2961,7 @@ int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
}
EXPORT_SYMBOL(amd_iommu_domain_set_gcr3);

-int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid)
+int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, unsigned int pasid)
{
struct protection_domain *domain = to_pdomain(dom);
unsigned long flags;
@@ -2972,7 +2975,7 @@ int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid)
}
EXPORT_SYMBOL(amd_iommu_domain_clear_gcr3);

-int amd_iommu_complete_ppr(struct pci_dev *pdev, int pasid,
+int amd_iommu_complete_ppr(struct pci_dev *pdev, unsigned int pasid,
int status, int tag)
{
struct iommu_dev_data *dev_data;
diff --git a/drivers/iommu/amd/iommu_v2.c b/drivers/iommu/amd/iommu_v2.c
index e4b025c5637c..21dd53054acb 100644
--- a/drivers/iommu/amd/iommu_v2.c
+++ b/drivers/iommu/amd/iommu_v2.c
@@ -40,7 +40,7 @@ struct pasid_state {
struct mmu_notifier mn; /* mmu_notifier handle */
struct pri_queue pri[PRI_QUEUE_SIZE]; /* PRI tag states */
struct device_state *device_state; /* Link to our device_state */
- int pasid; /* PASID index */
+ unsigned int pasid; /* PASID index */
bool invalid; /* Used during setup and
teardown of the pasid */
spinlock_t lock; /* Protect pri_queues and
@@ -70,7 +70,7 @@ struct fault {
struct mm_struct *mm;
u64 address;
u16 devid;
- u16 pasid;
+ unsigned int pasid;
u16 tag;
u16 finish;
u16 flags;
@@ -150,7 +150,8 @@ static void put_device_state(struct device_state *dev_state)

/* Must be called under dev_state->lock */
static struct pasid_state **__get_pasid_state_ptr(struct device_state *dev_state,
- int pasid, bool alloc)
+ unsigned int pasid,
+ bool alloc)
{
struct pasid_state **root, **ptr;
int level, index;
@@ -184,7 +185,7 @@ static struct pasid_state **__get_pasid_state_ptr(struct device_state *dev_state

static int set_pasid_state(struct device_state *dev_state,
struct pasid_state *pasid_state,
- int pasid)
+ unsigned int pasid)
{
struct pasid_state **ptr;
unsigned long flags;
@@ -211,7 +212,8 @@ static int set_pasid_state(struct device_state *dev_state,
return ret;
}

-static void clear_pasid_state(struct device_state *dev_state, int pasid)
+static void clear_pasid_state(struct device_state *dev_state,
+ unsigned int pasid)
{
struct pasid_state **ptr;
unsigned long flags;
@@ -229,7 +231,7 @@ static void clear_pasid_state(struct device_state *dev_state, int pasid)
}

static struct pasid_state *get_pasid_state(struct device_state *dev_state,
- int pasid)
+ unsigned int pasid)
{
struct pasid_state **ptr, *ret = NULL;
unsigned long flags;
@@ -594,7 +596,7 @@ static struct notifier_block ppr_nb = {
.notifier_call = ppr_notifier,
};

-int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
+int amd_iommu_bind_pasid(struct pci_dev *pdev, unsigned int pasid,
struct task_struct *task)
{
struct pasid_state *pasid_state;
@@ -615,7 +617,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
return -EINVAL;

ret = -EINVAL;
- if (pasid < 0 || pasid >= dev_state->max_pasids)
+ if (pasid >= dev_state->max_pasids)
goto out;

ret = -ENOMEM;
@@ -679,7 +681,7 @@ int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
}
EXPORT_SYMBOL(amd_iommu_bind_pasid);

-void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
+void amd_iommu_unbind_pasid(struct pci_dev *pdev, unsigned int pasid)
{
struct pasid_state *pasid_state;
struct device_state *dev_state;
@@ -695,7 +697,7 @@ void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid)
if (dev_state == NULL)
return;

- if (pasid < 0 || pasid >= dev_state->max_pasids)
+ if (pasid >= dev_state->max_pasids)
goto out;

pasid_state = get_pasid_state(dev_state, pasid);
diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index cf1ebb98e418..f3b99ed8ee92 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -20,7 +20,7 @@
struct tbl_walk {
u16 bus;
u16 devfn;
- u32 pasid;
+ unsigned int pasid;
struct root_entry *rt_entry;
struct context_entry *ctx_entry;
struct pasid_entry *pasid_tbl_entry;
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index cc46dff98fa0..805c02584df5 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1395,8 +1395,8 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
}

/* PASID-based IOTLB invalidation */
-void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
- unsigned long npages, bool ih)
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, unsigned int pasid,
+ u64 addr, unsigned long npages, bool ih)
{
struct qi_desc desc = {.qw2 = 0, .qw3 = 0};

@@ -1437,7 +1437,7 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,

/* PASID-based device IOTLB Invalidate */
void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
- u32 pasid, u16 qdep, u64 addr,
+ unsigned int pasid, u16 qdep, u64 addr,
unsigned int size_order, u64 granu)
{
unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order - 1);
@@ -1465,7 +1465,7 @@ void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
}

void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
- u64 granu, int pasid)
+ u64 granu, unsigned int pasid)
{
struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};

@@ -1779,7 +1779,7 @@ void dmar_msi_read(int irq, struct msi_msg *msg)
}

static int dmar_fault_do_one(struct intel_iommu *iommu, int type,
- u8 fault_reason, int pasid, u16 source_id,
+ u8 fault_reason, unsigned int pasid, u16 source_id,
unsigned long long addr)
{
const char *reason;
@@ -1826,10 +1826,11 @@ irqreturn_t dmar_fault(int irq, void *dev_id)
while (1) {
/* Disable printing, simply clear the fault when ratelimited */
bool ratelimited = !__ratelimit(&rs);
+ unsigned int pasid;
u8 fault_reason;
u16 source_id;
u64 guest_addr;
- int type, pasid;
+ int type;
u32 data;
bool pasid_present;

diff --git a/drivers/iommu/intel/intel-pasid.h b/drivers/iommu/intel/intel-pasid.h
index c5318d40e0fa..fe9e9ac3b97e 100644
--- a/drivers/iommu/intel/intel-pasid.h
+++ b/drivers/iommu/intel/intel-pasid.h
@@ -72,7 +72,7 @@ struct pasid_entry {
struct pasid_table {
void *table; /* pasid table pointer */
int order; /* page order of pasid table */
- int max_pasid; /* max pasid */
+ unsigned int max_pasid; /* max pasid */
struct list_head dev; /* device list */
};

@@ -98,30 +98,31 @@ static inline bool pasid_pte_is_present(struct pasid_entry *pte)
return READ_ONCE(pte->val[0]) & PASID_PTE_PRESENT;
}

-extern u32 intel_pasid_max_id;
+extern unsigned int intel_pasid_max_id;
int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp);
-void intel_pasid_free_id(int pasid);
-void *intel_pasid_lookup_id(int pasid);
+void intel_pasid_free_id(unsigned int pasid);
+void *intel_pasid_lookup_id(unsigned int pasid);
int intel_pasid_alloc_table(struct device *dev);
void intel_pasid_free_table(struct device *dev);
struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
-struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
+struct pasid_entry *intel_pasid_get_entry(struct device *dev,
+ unsigned int pasid);
int intel_pasid_setup_first_level(struct intel_iommu *iommu,
struct device *dev, pgd_t *pgd,
- int pasid, u16 did, int flags);
+ unsigned int pasid, u16 did, int flags);
int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
- struct device *dev, int pasid);
+ struct device *dev, unsigned int pasid);
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct dmar_domain *domain,
- struct device *dev, int pasid);
+ struct device *dev, unsigned int pasid);
int intel_pasid_setup_nested(struct intel_iommu *iommu,
- struct device *dev, pgd_t *pgd, int pasid,
+ struct device *dev, pgd_t *pgd, unsigned int pasid,
struct iommu_gpasid_bind_data_vtd *pasid_data,
struct dmar_domain *domain, int addr_width);
void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
- struct device *dev, int pasid,
+ struct device *dev, unsigned int pasid,
bool fault_ignore);
int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9129663a7406..5a6f85d645c6 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2469,7 +2469,7 @@ dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
static int domain_setup_first_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev,
- int pasid)
+ unsigned int pasid)
{
int flags = PASID_FLAG_SUPERVISOR_MODE;
struct dma_pte *pgd = domain->pgd;
@@ -5147,7 +5147,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
return -ENODEV;

if (domain->default_pasid <= 0) {
- int pasid;
+ unsigned int pasid;

/* No private data needed for the default pasid */
pasid = ioasid_alloc(NULL, PASID_MIN,
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c81f0f17c6ba..8a98b68c1c87 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -25,7 +25,7 @@
* Intel IOMMU system wide PASID name space:
*/
static DEFINE_SPINLOCK(pasid_lock);
-u32 intel_pasid_max_id = PASID_MAX;
+unsigned int intel_pasid_max_id = PASID_MAX;

int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
{
@@ -146,7 +146,7 @@ int intel_pasid_alloc_table(struct device *dev)
struct pasid_table *pasid_table;
struct pasid_table_opaque data;
struct page *pages;
- int max_pasid = 0;
+ unsigned int max_pasid = 0;
int ret, order;
int size;

@@ -168,7 +168,7 @@ int intel_pasid_alloc_table(struct device *dev)
INIT_LIST_HEAD(&pasid_table->dev);

if (info->pasid_supported)
- max_pasid = min_t(int, pci_max_pasids(to_pci_dev(dev)),
+ max_pasid = min_t(unsigned int, pci_max_pasids(to_pci_dev(dev)),
intel_pasid_max_id);

size = max_pasid >> (PASID_PDE_SHIFT - 3);
@@ -242,7 +242,8 @@ int intel_pasid_get_dev_max_id(struct device *dev)
return info->pasid_table->max_pasid;
}

-struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
+struct pasid_entry *intel_pasid_get_entry(struct device *dev,
+ unsigned int pasid)
{
struct device_domain_info *info;
struct pasid_table *pasid_table;
@@ -251,8 +252,7 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
int dir_index, index;

pasid_table = intel_pasid_get_table(dev);
- if (WARN_ON(!pasid_table || pasid < 0 ||
- pasid >= intel_pasid_get_dev_max_id(dev)))
+ if (WARN_ON(!pasid_table || pasid >= intel_pasid_get_dev_max_id(dev)))
return NULL;

dir = pasid_table->table;
@@ -305,7 +305,8 @@ static inline void pasid_clear_entry_with_fpd(struct pasid_entry *pe)
}

static void
-intel_pasid_clear_entry(struct device *dev, int pasid, bool fault_ignore)
+intel_pasid_clear_entry(struct device *dev, unsigned int pasid,
+ bool fault_ignore)
{
struct pasid_entry *pe;

@@ -444,7 +445,7 @@ pasid_set_eafe(struct pasid_entry *pe)

static void
pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
- u16 did, int pasid)
+ u16 did, unsigned int pasid)
{
struct qi_desc desc;

@@ -458,7 +459,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
}

static void
-iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)
+iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did,
+ unsigned int pasid)
{
struct qi_desc desc;

@@ -473,7 +475,7 @@ iotlb_invalidation_with_pasid(struct intel_iommu *iommu, u16 did, u32 pasid)

static void
devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
- struct device *dev, int pasid)
+ struct device *dev, unsigned int pasid)
{
struct device_domain_info *info;
u16 sid, qdep, pfsid;
@@ -490,7 +492,7 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
}

void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
- int pasid, bool fault_ignore)
+ unsigned int pasid, bool fault_ignore)
{
struct pasid_entry *pte;
u16 did;
@@ -514,8 +516,8 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu, struct device *dev,
}

static void pasid_flush_caches(struct intel_iommu *iommu,
- struct pasid_entry *pte,
- int pasid, u16 did)
+ struct pasid_entry *pte,
+ unsigned int pasid, u16 did)
{
if (!ecap_coherent(iommu->ecap))
clflush_cache_range(pte, sizeof(*pte));
@@ -534,7 +536,7 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
*/
int intel_pasid_setup_first_level(struct intel_iommu *iommu,
struct device *dev, pgd_t *pgd,
- int pasid, u16 did, int flags)
+ unsigned int pasid, u16 did, int flags)
{
struct pasid_entry *pte;

@@ -607,7 +609,7 @@ static inline int iommu_skip_agaw(struct dmar_domain *domain,
*/
int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
- struct device *dev, int pasid)
+ struct device *dev, unsigned int pasid)
{
struct pasid_entry *pte;
struct dma_pte *pgd;
@@ -665,7 +667,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
*/
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct dmar_domain *domain,
- struct device *dev, int pasid)
+ struct device *dev, unsigned int pasid)
{
u16 did = FLPT_DEFAULT_DID;
struct pasid_entry *pte;
@@ -751,7 +753,7 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
* @addr_width: Address width of the first level (guest)
*/
int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
- pgd_t *gpgd, int pasid,
+ pgd_t *gpgd, unsigned int pasid,
struct iommu_gpasid_bind_data_vtd *pasid_data,
struct dmar_domain *domain, int addr_width)
{
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 6c87c807a0ab..b5618341b4b1 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -23,7 +23,7 @@
#include "intel-pasid.h"

static irqreturn_t prq_event_thread(int irq, void *d);
-static void intel_svm_drain_prq(struct device *dev, int pasid);
+static void intel_svm_drain_prq(struct device *dev, unsigned int pasid);

#define PRQ_ORDER 0

@@ -371,7 +371,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
return ret;
}

-int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
struct intel_svm_dev *sdev;
@@ -601,7 +601,7 @@ intel_svm_bind_mm(struct device *dev, int flags, struct svm_dev_ops *ops,
}

/* Caller must hold pasid_mutex */
-static int intel_svm_unbind_mm(struct device *dev, int pasid)
+static int intel_svm_unbind_mm(struct device *dev, unsigned int pasid)
{
struct intel_svm_dev *sdev;
struct intel_iommu *iommu;
@@ -728,7 +728,7 @@ static bool is_canonical_address(u64 addr)
* described in VT-d spec CH7.10 to drain all page requests and page
* responses pending in the hardware.
*/
-static void intel_svm_drain_prq(struct device *dev, int pasid)
+static void intel_svm_drain_prq(struct device *dev, unsigned int pasid)
{
struct device_domain_info *info;
struct dmar_domain *domain;
@@ -988,10 +988,10 @@ void intel_svm_unbind(struct iommu_sva *sva)
mutex_unlock(&pasid_mutex);
}

-int intel_svm_get_pasid(struct iommu_sva *sva)
+unsigned int intel_svm_get_pasid(struct iommu_sva *sva)
{
struct intel_svm_dev *sdev;
- int pasid;
+ unsigned int pasid;

mutex_lock(&pasid_mutex);
sdev = to_intel_svm_dev(sva);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d43120eb1dc5..aba136ac377d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2828,7 +2828,7 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
}
EXPORT_SYMBOL_GPL(iommu_sva_unbind_device);

-int iommu_sva_get_pasid(struct iommu_sva *handle)
+unsigned int iommu_sva_get_pasid(struct iommu_sva *handle)
{
const struct iommu_ops *ops = handle->dev->bus->iommu_ops;

diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
index 107028e77ca3..8a3686e689dc 100644
--- a/drivers/misc/uacce/uacce.c
+++ b/drivers/misc/uacce/uacce.c
@@ -92,7 +92,7 @@ static long uacce_fops_compat_ioctl(struct file *filep,

static int uacce_bind_queue(struct uacce_device *uacce, struct uacce_queue *q)
{
- int pasid;
+ unsigned int pasid;
struct iommu_sva *handle;

if (!(uacce->flags & UACCE_DEV_SVA))
diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h
index 21e950e4ab62..25387ab5a4f5 100644
--- a/include/linux/amd-iommu.h
+++ b/include/linux/amd-iommu.h
@@ -76,7 +76,7 @@ extern void amd_iommu_free_device(struct pci_dev *pdev);
*
* The function returns 0 on success or a negative value on error.
*/
-extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
+extern int amd_iommu_bind_pasid(struct pci_dev *pdev, unsigned int pasid,
struct task_struct *task);

/**
@@ -88,7 +88,7 @@ extern int amd_iommu_bind_pasid(struct pci_dev *pdev, int pasid,
* When this function returns the device is no longer using the PASID
* and the PASID is no longer bound to its task.
*/
-extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid);
+extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, unsigned int pasid);

/**
* amd_iommu_set_invalid_ppr_cb() - Register a call-back for failed
@@ -114,7 +114,7 @@ extern void amd_iommu_unbind_pasid(struct pci_dev *pdev, int pasid);
#define AMD_IOMMU_INV_PRI_RSP_FAIL 2

typedef int (*amd_iommu_invalid_ppr_cb)(struct pci_dev *pdev,
- int pasid,
+ unsigned int pasid,
unsigned long address,
u16);

@@ -166,7 +166,8 @@ extern int amd_iommu_device_info(struct pci_dev *pdev,
* @cb: The call-back function
*/

-typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev, int pasid);
+typedef void (*amd_iommu_invalidate_ctx)(struct pci_dev *pdev,
+ unsigned int pasid);

extern int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
amd_iommu_invalidate_ctx cb);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4100bd224f5c..44fa8879f829 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -549,7 +549,7 @@ struct dmar_domain {
2 == 1GiB, 3 == 512GiB, 4 == 1TiB */
u64 max_addr; /* maximum mapped address */

- int default_pasid; /*
+ unsigned int default_pasid; /*
* The default pasid used for non-SVM
* traffic on mediated devices.
*/
@@ -699,14 +699,14 @@ extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask);

-void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
- unsigned long npages, bool ih);
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, unsigned int pasid,
+ u64 addr, unsigned long npages, bool ih);

void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
- u32 pasid, u16 qdep, u64 addr,
+ unsigned int pasid, u16 qdep, u64 addr,
unsigned int size_order, u64 granu);
void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
- int pasid);
+ unsigned int pasid);

int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
unsigned int count, unsigned long options);
@@ -734,11 +734,11 @@ extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev,
struct iommu_gpasid_bind_data *data);
-int intel_svm_unbind_gpasid(struct device *dev, int pasid);
+int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid);
struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm,
void *drvdata);
void intel_svm_unbind(struct iommu_sva *handle);
-int intel_svm_get_pasid(struct iommu_sva *handle);
+unsigned int intel_svm_get_pasid(struct iommu_sva *handle);
struct svm_dev_ops;

struct intel_svm_dev {
@@ -747,7 +747,7 @@ struct intel_svm_dev {
struct device *dev;
struct svm_dev_ops *ops;
struct iommu_sva sva;
- int pasid;
+ unsigned int pasid;
int users;
u16 did;
u16 dev_iotlb:1;
@@ -760,7 +760,7 @@ struct intel_svm {

struct intel_iommu *iommu;
int flags;
- int pasid;
+ unsigned int pasid;
int gpasid; /* In case that guest PASID is different from host PASID */
struct list_head devs;
struct list_head list;
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index c9e7e601950d..7ec24f19b7d3 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -11,7 +11,7 @@
struct device;

struct svm_dev_ops {
- void (*fault_cb)(struct device *dev, int pasid, u64 address,
+ void (*fault_cb)(struct device *dev, unsigned int pasid, u64 address,
void *private, int rwxp, int response);
};

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5f0b7859d2eb..813961145679 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -292,7 +292,7 @@ struct iommu_ops {
struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm,
void *drvdata);
void (*sva_unbind)(struct iommu_sva *handle);
- int (*sva_get_pasid)(struct iommu_sva *handle);
+ unsigned int (*sva_get_pasid)(struct iommu_sva *handle);

int (*page_response)(struct device *dev,
struct iommu_fault_event *evt,
@@ -302,7 +302,7 @@ struct iommu_ops {
int (*sva_bind_gpasid)(struct iommu_domain *domain,
struct device *dev, struct iommu_gpasid_bind_data *data);

- int (*sva_unbind_gpasid)(struct device *dev, int pasid);
+ int (*sva_unbind_gpasid)(struct device *dev, unsigned int pasid);

int (*def_domain_type)(struct device *dev);

@@ -656,7 +656,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev,
struct mm_struct *mm,
void *drvdata);
void iommu_sva_unbind_device(struct iommu_sva *handle);
-int iommu_sva_get_pasid(struct iommu_sva *handle);
+unsigned int iommu_sva_get_pasid(struct iommu_sva *handle);

#else /* CONFIG_IOMMU_API */

@@ -1049,7 +1049,7 @@ static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
{
}

-static inline int iommu_sva_get_pasid(struct iommu_sva *handle)
+static inline unsigned int iommu_sva_get_pasid(struct iommu_sva *handle)
{
return IOMMU_PASID_INVALID;
}
diff --git a/include/linux/uacce.h b/include/linux/uacce.h
index 454c2f6672d7..667d7e691c0d 100644
--- a/include/linux/uacce.h
+++ b/include/linux/uacce.h
@@ -81,7 +81,7 @@ struct uacce_queue {
struct list_head list;
struct uacce_qfile_region *qfrs[UACCE_MAX_REGION];
enum uacce_q_state state;
- int pasid;
+ unsigned int pasid;
struct iommu_sva *handle;
};

--
2.19.1

2020-06-13 00:46:27

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int

PASID is defined as "int" although it's a 20-bit value and shouldn't be
negative int. To be consistent with type defined in iommu, define PASID
as "unsigned int".

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Create this new patch to define PASID as "unsigned int" consistently in
ocxl (Thomas)

drivers/misc/ocxl/config.c | 3 ++-
drivers/misc/ocxl/link.c | 6 +++---
drivers/misc/ocxl/ocxl_internal.h | 6 +++---
drivers/misc/ocxl/pasid.c | 2 +-
drivers/misc/ocxl/trace.h | 20 ++++++++++----------
include/misc/ocxl.h | 6 +++---
6 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
index c8e19bfb5ef9..22d034caed3d 100644
--- a/drivers/misc/ocxl/config.c
+++ b/drivers/misc/ocxl/config.c
@@ -806,7 +806,8 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
}
EXPORT_SYMBOL_GPL(ocxl_config_set_TL);

-int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
+int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
+ unsigned int pasid)
{
u32 val;
unsigned long timeout;
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 58d111afd9f6..931f6ae022db 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -492,7 +492,7 @@ static u64 calculate_cfg_state(bool kernel)
return state;
}

-int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
u64 amr, struct mm_struct *mm,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data)
@@ -572,7 +572,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
}
EXPORT_SYMBOL_GPL(ocxl_link_add_pe);

-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid)
{
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
@@ -608,7 +608,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
return rc;
}

-int ocxl_link_remove_pe(void *link_handle, int pasid)
+int ocxl_link_remove_pe(void *link_handle, unsigned int pasid)
{
struct ocxl_link *link = (struct ocxl_link *) link_handle;
struct spa *spa = link->spa;
diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
index 345bf843a38e..3ca982ba7472 100644
--- a/drivers/misc/ocxl/ocxl_internal.h
+++ b/drivers/misc/ocxl/ocxl_internal.h
@@ -41,7 +41,7 @@ struct ocxl_afu {
struct ocxl_afu_config config;
int pasid_base;
int pasid_count; /* opened contexts */
- int pasid_max; /* maximum number of contexts */
+ unsigned int pasid_max; /* maximum number of contexts */
int actag_base;
int actag_enabled;
struct mutex contexts_lock;
@@ -69,7 +69,7 @@ struct ocxl_xsl_error {

struct ocxl_context {
struct ocxl_afu *afu;
- int pasid;
+ unsigned int pasid;
struct mutex status_mutex;
enum ocxl_context_status status;
struct address_space *mapping;
@@ -128,7 +128,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
* pasid: the PASID for the AFU context
* tid: the new thread id for the process element
*/
-int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
+int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid);

int ocxl_context_mmap(struct ocxl_context *ctx,
struct vm_area_struct *vma);
diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
index d14cb56e6920..a151fc8f0bec 100644
--- a/drivers/misc/ocxl/pasid.c
+++ b/drivers/misc/ocxl/pasid.c
@@ -80,7 +80,7 @@ static void range_free(struct list_head *head, u32 start, u32 size,

int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
{
- int max_pasid;
+ unsigned int max_pasid;

if (fn->config.max_pasid_log < 0)
return -ENOSPC;
diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
index 17e21cb2addd..019e2fc63b1d 100644
--- a/drivers/misc/ocxl/trace.h
+++ b/drivers/misc/ocxl/trace.h
@@ -9,13 +9,13 @@
#include <linux/tracepoint.h>

DECLARE_EVENT_CLASS(ocxl_context,
- TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr),

TP_STRUCT__entry(
__field(pid_t, pid)
__field(void*, spa)
- __field(int, pasid)
+ __field(unsigned int, pasid)
__field(u32, pidr)
__field(u32, tidr)
),
@@ -38,21 +38,21 @@ DECLARE_EVENT_CLASS(ocxl_context,
);

DEFINE_EVENT(ocxl_context, ocxl_context_add,
- TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr)
);

DEFINE_EVENT(ocxl_context, ocxl_context_remove,
- TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
+ TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
TP_ARGS(pid, spa, pasid, pidr, tidr)
);

TRACE_EVENT(ocxl_terminate_pasid,
- TP_PROTO(int pasid, int rc),
+ TP_PROTO(unsigned int pasid, int rc),
TP_ARGS(pasid, rc),

TP_STRUCT__entry(
- __field(int, pasid)
+ __field(unsigned int, pasid)
__field(int, rc)
),

@@ -107,11 +107,11 @@ DEFINE_EVENT(ocxl_fault_handler, ocxl_fault_ack,
);

TRACE_EVENT(ocxl_afu_irq_alloc,
- TP_PROTO(int pasid, int irq_id, unsigned int virq, int hw_irq),
+ TP_PROTO(unsigned int pasid, int irq_id, unsigned int virq, int hw_irq),
TP_ARGS(pasid, irq_id, virq, hw_irq),

TP_STRUCT__entry(
- __field(int, pasid)
+ __field(unsigned int, pasid)
__field(int, irq_id)
__field(unsigned int, virq)
__field(int, hw_irq)
@@ -133,11 +133,11 @@ TRACE_EVENT(ocxl_afu_irq_alloc,
);

TRACE_EVENT(ocxl_afu_irq_free,
- TP_PROTO(int pasid, int irq_id),
+ TP_PROTO(unsigned int pasid, int irq_id),
TP_ARGS(pasid, irq_id),

TP_STRUCT__entry(
- __field(int, pasid)
+ __field(unsigned int, pasid)
__field(int, irq_id)
),

diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
index 06dd5839e438..5eca04c8da97 100644
--- a/include/misc/ocxl.h
+++ b/include/misc/ocxl.h
@@ -429,7 +429,7 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
* desired AFU. It can be found in the AFU configuration
*/
int ocxl_config_terminate_pasid(struct pci_dev *dev,
- int afu_control_offset, int pasid);
+ int afu_control_offset, unsigned int pasid);

/*
* Read the configuration space of a function and fill in a
@@ -466,7 +466,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
* 'xsl_err_data' is an argument passed to the above callback, if
* defined
*/
-int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
+int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
u64 amr, struct mm_struct *mm,
void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
void *xsl_err_data);
@@ -474,7 +474,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
/*
* Remove a Process Element from the Shared Process Area for a link
*/
-int ocxl_link_remove_pe(void *link_handle, int pasid);
+int ocxl_link_remove_pe(void *link_handle, unsigned int pasid);

/*
* Allocate an AFU interrupt associated to the link.
--
2.19.1

2020-06-13 00:47:06

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 05/12] x86/cpufeatures: Enumerate ENQCMD and ENQCMDS instructions

Work submission instruction comes in two flavors. ENQCMD can be called
both in ring 3 and ring 0 and always uses the contents of PASID MSR when
shipping the command to the device. ENQCMDS allows a kernel driver to
submit commands on behalf of a user process. The driver supplies the
PASID value in ENQCMDS. There isn't any usage of ENQCMD in the kernel
as of now.

The CPU feature flag is shown as "enqcmd" in /proc/cpuinfo.

Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Re-write commit message (Thomas)

arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/cpuid-deps.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 02dabc9e77b0..4469618c410f 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -351,6 +351,7 @@
#define X86_FEATURE_CLDEMOTE (16*32+25) /* CLDEMOTE instruction */
#define X86_FEATURE_MOVDIRI (16*32+27) /* MOVDIRI instruction */
#define X86_FEATURE_MOVDIR64B (16*32+28) /* MOVDIR64B instruction */
+#define X86_FEATURE_ENQCMD (16*32+29) /* ENQCMD and ENQCMDS instructions */

/* AMD-defined CPU features, CPUID level 0x80000007 (EBX), word 17 */
#define X86_FEATURE_OVERFLOW_RECOV (17*32+ 0) /* MCA overflow recovery support */
diff --git a/arch/x86/kernel/cpu/cpuid-deps.c b/arch/x86/kernel/cpu/cpuid-deps.c
index 3cbe24ca80ab..3a02707c1f4d 100644
--- a/arch/x86/kernel/cpu/cpuid-deps.c
+++ b/arch/x86/kernel/cpu/cpuid-deps.c
@@ -69,6 +69,7 @@ static const struct cpuid_dep cpuid_deps[] = {
{ X86_FEATURE_CQM_MBM_TOTAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_CQM_MBM_LOCAL, X86_FEATURE_CQM_LLC },
{ X86_FEATURE_AVX512_BF16, X86_FEATURE_AVX512VL },
+ { X86_FEATURE_ENQCMD, X86_FEATURE_XSAVES },
{}
};

--
2.19.1

2020-06-13 00:47:10

by Fenghua Yu

[permalink] [raw]
Subject: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

A #GP fault is generated when ENQCMD instruction is executed without
a valid PASID value programmed in the current thread's PASID MSR. The
#GP fault handler will initialize the MSR if a PASID has been allocated
for this process.

Decoding the user instruction is ugly and sets a bad architecture
precedent. It may not function if the faulting instruction is modified
after #GP.

Thomas suggested to provide a reason for the #GP caused by executing ENQCMD
without a valid PASID value programmed. #GP error codes are 16 bits and all
16 bits are taken. Refer to SDM Vol 3, Chapter 16.13 for details. The other
choice was to reflect the error code in an MSR. ENQCMD can also cause #GP
when loading from the source operand, so its not fully comprehending all
the reasons. Rather than special case the ENQCMD, in future Intel may
choose a different fault mechanism for such cases if recovery is needed on
#GP.

The following heuristic is used to avoid decoding the user instructions
to determine the precise reason for the #GP fault:
1) If the mm for the process has not been allocated a PASID, this #GP
cannot be fixed.
2) If the PASID MSR is already initialized, then the #GP was for some
other reason
3) Try initializing the PASID MSR and returning. If the #GP was from
an ENQCMD this will fix it. If not, the #GP fault will be repeated
and will hit case "2".

Suggested-by: Thomas Gleixner <[email protected]>
Signed-off-by: Fenghua Yu <[email protected]>
Reviewed-by: Tony Luck <[email protected]>
---
v2:
- Update the first paragraph of the commit message (Thomas)
- Add reasons why don't decode the user instruction and don't use
#GP error code (Thomas)
- Change get_task_mm() to current->mm (Thomas)
- Add comments on why IRQ is disabled during PASID fixup (Thomas)
- Add comment in fixup() that the function is called when #GP is from
user (so mm is not NULL) (Dave Hansen)

arch/x86/include/asm/iommu.h | 1 +
arch/x86/kernel/traps.c | 23 +++++++++++++++++++++
drivers/iommu/intel/svm.c | 39 ++++++++++++++++++++++++++++++++++++
3 files changed, 63 insertions(+)

diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
index ed41259fe7ac..e9365a5d6f7d 100644
--- a/arch/x86/include/asm/iommu.h
+++ b/arch/x86/include/asm/iommu.h
@@ -27,5 +27,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
}

void __free_pasid(struct mm_struct *mm);
+bool __fixup_pasid_exception(void);

#endif /* _ASM_X86_IOMMU_H */
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index 4cc541051994..0f78d5cdddfe 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -59,6 +59,7 @@
#include <asm/umip.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/iommu.h>

#ifdef CONFIG_X86_64
#include <asm/x86_init.h>
@@ -436,6 +437,16 @@ static enum kernel_gp_hint get_kernel_gp_address(struct pt_regs *regs,
return GP_CANONICAL;
}

+static bool fixup_pasid_exception(void)
+{
+ if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
+ return false;
+ if (!static_cpu_has(X86_FEATURE_ENQCMD))
+ return false;
+
+ return __fixup_pasid_exception();
+}
+
#define GPFSTR "general protection fault"

dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
@@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
int ret;

RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
+
+ /*
+ * Perform the check for a user mode PASID exception before enable
+ * interrupts. Doing this here ensures that the PASID MSR can be simply
+ * accessed because the contents are known to be still associated
+ * with the current process.
+ */
+ if (user_mode(regs) && fixup_pasid_exception()) {
+ cond_local_irq_enable(regs);
+ return;
+ }
+
cond_local_irq_enable(regs);

if (static_cpu_has(X86_FEATURE_UMIP)) {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 27dc866b8461..81fd2380c0f9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1078,3 +1078,42 @@ void __free_pasid(struct mm_struct *mm)
*/
ioasid_free(pasid);
}
+
+/*
+ * Apply some heuristics to see if the #GP fault was caused by a thread
+ * that hasn't had the IA32_PASID MSR initialized. If it looks like that
+ * is the problem, try initializing the IA32_PASID MSR. If the heuristic
+ * guesses incorrectly, take one more #GP fault.
+ */
+bool __fixup_pasid_exception(void)
+{
+ u64 pasid_msr;
+ unsigned int pasid;
+
+ /*
+ * This function is called only when this #GP was triggered from user
+ * space. So the mm cannot be NULL.
+ */
+ pasid = current->mm->pasid;
+ /* If the mm doesn't have a valid PASID, then can't help. */
+ if (invalid_pasid(pasid))
+ return false;
+
+ /*
+ * Since IRQ is disabled now, the current task still owns the FPU on
+ * this CPU and the PASID MSR can be directly accessed.
+ *
+ * If the MSR has a valid PASID, the #GP must be for some other reason.
+ *
+ * If rdmsr() is really a performance issue, a TIF_ flag may be
+ * added to check if the thread has a valid PASID instead of rdmsr().
+ */
+ rdmsrl(MSR_IA32_PASID, pasid_msr);
+ if (pasid_msr & MSR_IA32_PASID_VALID)
+ return false;
+
+ /* Fix up the MSR if the MSR doesn't have a valid PASID. */
+ wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
+
+ return true;
+}
--
2.19.1

2020-06-13 12:26:00

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Hi Fenghua,

On 2020/6/13 8:41, Fenghua Yu wrote:
> From: Ashok Raj <[email protected]>
>
> ENQCMD and Data Streaming Accelerator (DSA) and all of their associated
> features are a complicated stack with lots of interconnected pieces.
> This documentation provides a big picture overview for all of the
> features.
>
> Signed-off-by: Ashok Raj <[email protected]>
> Co-developed-by: Fenghua Yu <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> v2:
> - Fix the doc format and add the doc in toctree (Thomas)
> - Modify the doc for better description (Thomas, Tony, Dave)
>
> Documentation/x86/index.rst | 1 +
> Documentation/x86/sva.rst | 287 ++++++++++++++++++++++++++++++++++++
> 2 files changed, 288 insertions(+)
> create mode 100644 Documentation/x86/sva.rst
>
> diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
> index 265d9e9a093b..e5d5ff096685 100644
> --- a/Documentation/x86/index.rst
> +++ b/Documentation/x86/index.rst
> @@ -30,3 +30,4 @@ x86-specific Documentation
> usb-legacy-support
> i386/index
> x86_64/index
> + sva
> diff --git a/Documentation/x86/sva.rst b/Documentation/x86/sva.rst
> new file mode 100644
> index 000000000000..1e52208c7dda
> --- /dev/null
> +++ b/Documentation/x86/sva.rst
> @@ -0,0 +1,287 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +===========================================
> +Shared Virtual Addressing (SVA) with ENQCMD
> +===========================================
> +
> +Background
> +==========
> +
> +Shared Virtual Addressing (SVA) allows the processor and device to use the
> +same virtual addresses avoiding the need for software to translate virtual
> +addresses to physical addresses. SVA is what PCIe calls Shared Virtual
> +Memory (SVM)
> +
> +In addition to the convenience of using application virtual addresses
> +by the device, it also doesn't require pinning pages for DMA.
> +PCIe Address Translation Services (ATS) along with Page Request Interface
> +(PRI) allow devices to function much the same way as the CPU handling
> +application page-faults. For more information please refer to PCIe
> +specification Chapter 10: ATS Specification.
> +
> +Use of SVA requires IOMMU support in the platform. IOMMU also is required
> +to support PCIe features ATS and PRI. ATS allows devices to cache
> +translations for the virtual address. IOMMU driver uses the mmu_notifier()
> +support to keep the device tlb cache and the CPU cache in sync. PRI allows
> +the device to request paging the virtual address before using if they are
> +not paged in the CPU page tables.
> +
> +
> +Shared Hardware Workqueues
> +==========================
> +
> +Unlike Single Root I/O Virtualization (SRIOV), Scalable IOV (SIOV) permits
> +the use of Shared Work Queues (SWQ) by both applications and Virtual
> +Machines (VM's). This allows better hardware utilization vs. hard
> +partitioning resources that could result in under utilization. In order to
> +allow the hardware to distinguish the context for which work is being
> +executed in the hardware by SWQ interface, SIOV uses Process Address Space
> +ID (PASID), which is a 20bit number defined by the PCIe SIG.
> +
> +PASID value is encoded in all transactions from the device. This allows the
> +IOMMU to track I/O on a per-PASID granularity in addition to using the PCIe
> +Resource Identifier (RID) which is the Bus/Device/Function.
> +
> +
> +ENQCMD
> +======
> +
> +ENQCMD is a new instruction on Intel platforms that atomically submits a
> +work descriptor to a device. The descriptor includes the operation to be
> +performed, virtual addresses of all parameters, virtual address of a completion
> +record, and the PASID (process address space ID) of the current process.
> +
> +ENQCMD works with non-posted semantics and carries a status back if the
> +command was accepted by hardware. This allows the submitter to know if the
> +submission needs to be retried or other device specific mechanisms to
> +implement implement fairness or ensure forward progress can be made.

Repeated "implement".

> +
> +ENQCMD is the glue that ensures applications can directly submit commands
> +to the hardware and also permit hardware to be aware of application context
> +to perform I/O operations via use of PASID.
> +
> +Process Address Space Tagging
> +=============================
> +
> +A new thread scoped MSR (IA32_PASID) provides the connection between
> +user processes and the rest of the hardware. When an application first
> +accesses an SVA capable device this MSR is initialized with a newly
> +allocated PASID. The driver for the device calls an IOMMU specific api
> +that sets up the routing for DMA and page-requests.
> +
> +For example, the Intel Data Streaming Accelerator (DSA) uses
> +intel_svm_bind_mm(), which will do the following.

The Intel SVM APIs have been deprecated. Drivers should use
iommu_sva_bind_device() instead. Please also update other places in
this document.

> +
> +- Allocate the PASID, and program the process page-table (cr3) in the PASID
> + context entries.
> +- Register for mmu_notifier() to track any page-table invalidations to keep
> + the device tlb in sync. For example, when a page-table entry is invalidated,
> + IOMMU propagates the invalidation to device tlb. This will force any
> + future access by the device to this virtual address to participate in
> + ATS. If the IOMMU responds with proper response that a page is not
> + present, the device would request the page to be paged in via the PCIe PRI
> + protocol before performing I/O.
> +
> +This MSR is managed with the XSAVE feature set as "supervisor state" to
> +ensure the MSR is updated during context switch.
> +
> +PASID Management
> +================
> +
> +The kernel must allocate a PASID on behalf of each process and program it
> +into the new MSR to communicate the process identity to platform hardware.
> +ENQCMD uses the PASID stored in this MSR to tag requests from this process.
> +When a user submits a work descriptor to a device using the ENQCMD
> +instruction, the PASID field in the descriptor is auto-filled with the
> +value from MSR_IA32_PASID. Requests for DMA from the device are also tagged
> +with the same PASID. The platform IOMMU uses the PASID in the transaction to
> +perform address translation. The IOMMU api's setup the corresponding PASID
> +entry in IOMMU with the process address used by the CPU (for e.g cr3 in x86).
> +
> +The MSR must be configured on each logical CPU before any application
> +thread can interact with a device. Threads that belong to the same
> +process share the same page tables, thus the same MSR value.
> +
> +PASID is cleared when a process is created. The PASID allocation and MSR
> +programming may occur long after a process and its threads have been created.
> +One thread must call bind() to allocate the PASID for the process. If a
> +thread uses ENQCMD without the MSR first being populated, it will cause #GP.
> +The kernel will fix up the #GP by writing the process-wide PASID into the
> +thread that took the #GP. A single process PASID can be used simultaneously
> +with multiple devices since they all share the same address space.
> +
> +New threads could inherit the MSR value from the parent. But this would
> +involve additional state management for those threads which may never use
> +ENQCMD. Clearing the MSR at thread creation permits all threads to have a
> +consistent behavior; the PASID is only programmed when the thread calls
> +the bind() api (intel_svm_bind_mm()()), or when a thread calls ENQCMD for
> +the first time.
> +
> +PASID Lifecycle Management
> +==========================
> +
> +Only processes that access SVA capable devices need to have a PASID
> +allocated. This allocation happens when a process first opens an SVA
> +capable device (subsequent opens of the same, or other devices will
> +share the same PASID).
> +
> +Although the PASID is allocated to the process by opening a device,
> +it is not active in any of the threads of that process. Activation is
> +done lazily when a thread tries to submit a work descriptor to a device
> +using the ENQCMD.
> +
> +That first access will trigger a #GP fault because the IA32_PASID MSR
> +has not been initialized with the PASID value assigned to the process
> +when the device was opened. The Linux #GP handler notes that a PASID as
> +been allocated for the process, and so initializes the IA32_PASID MSR
> +and returns so that the ENQCMD instruction is re-executed.
> +
> +On fork(2) or exec(2) the PASID is removed from the process as it no
> +longer has the same address space that it had when the device was opened.
> +
> +On clone(2) the new task shares the same address space, so will be
> +able to use the PASID allocated to the process. The IA32_PASID is not
> +preemptively initialized as the kernel does not know whether this thread
> +is going to access the device.
> +
> +On exit(2) the PASID is freed. The device driver ensures that any pending
> +operations queued to the device are either completed or aborted before
> +allowing the PASID to be re-allocated.

reallocated

> +
> +Relationships
> +=============
> +
> + * Each process has many threads, but only one PASID
> + * Devices have a limited number (~10's to 1000's) of hardware
> + workqueues and each portal maps down to a single workqueue.
> + The device driver manages allocating hardware workqueues.
> + * A single mmap() maps a single hardware workqueue as a "portal"
> + * For each device with which a process interacts, there must be
> + one or more mmap()'d portals.
> + * Many threads within a process can share a single portal to access
> + a single device.
> + * Multiple processes can separately mmap() the same portal, in
> + which case they still share one device hardware workqueue.
> + * The single process-wide PASID is used by all threads to interact
> + with all devices. There is not, for instance, a PASID for each
> + thread or each thread<->device pair.
> +
> +FAQ
> +===
> +
> +* What is SVA/SVM?
> +
> +Shared Virtual Addressing (SVA) permits I/O hardware and the processor to
> +work in the same address space. In short, sharing the address space. Some
> +call it Shared Virtual Memory (SVM), but Linux community wanted to avoid
> +it with Posix Shared Memory and Secure Virtual Machines which were terms
> +already in circulation.
> +
> +* What is a PASID?
> +
> +A Process Address Space ID (PASID) is a PCIe-defined TLP Prefix. A PASID is
> +a 20 bit number allocated and managed by the OS. PASID is included in all
> +transactions between the platform and the device.
> +
> +* How are shared work queues different?
> +
> +Traditionally to allow user space applications interact with hardware,
> +there is a separate instance required per process. For example, consider
> +doorbells as a mechanism of informing hardware about work to process. Each
> +doorbell is required to be spaced 4k (or page-size) apart for process
> +isolation. This requires hardware to provision that space and reserve in
> +MMIO. This doesn't scale as the number of threads becomes quite large. The
> +hardware also manages the queue depth for Shared Work Queues (SWQ), and
> +consumers don't need to track queue depth. If there is no space to accept
> +a command, the device will return an error indicating retry. Also
> +submitting a command to an MMIO address that can't accept ENQCMD will
> +return retry in response. In the new DMWr PCIe terminology, devices need to
> +support DMWr completer capability. In addition it requires all switch ports
> +to support DMWr routing and must be enabled by the PCIe subsystem, much
> +like how PCIe Atomics() are managed for instance.
> +
> +SWQ allows hardware to provision just a single address in the device. When
> +used with ENQCMD to submit work, the device can distinguish the process
> +submitting the work since it will include the PASID assigned to that
> +process. This decreases the pressure of hardware requiring to support
> +hardware to scale to a large number of processes.
> +
> +* Is this the same as a user space device driver?
> +
> +Communicating with the device via the shared work queue is much simpler
> +than a full blown user space driver. The kernel driver does all the
> +initialization of the hardware. User space only needs to worry about
> +submitting work and processing completions.
> +
> +* Is this the same as SR-IOV?
> +
> +Single Root I/O Virtualization (SR-IOV) focuses on providing independent
> +hardware interfaces for virtualizing hardware. Hence its required to be
> +almost fully functional interface to software supporting the traditional
> +BAR's, space for interrupts via MSI-x, its own register layout.
> +Virtual Functions (VFs) are assisted by the Physical Function (PF)
> +driver.
> +
> +Scalable I/O Virtualization builds on the PASID concept to create device
> +instances for virtualization. SIOV requires host software to assist in
> +creating virtual devices, each virtual device is represented by a PASID
> +along with the BDF of the device. This allows device hardware to optimize
> +device resource creation and can grow dynamically on demand. SR-IOV creation
> +and management is very static in nature. Consult references below for more
> +details.
> +
> +* Why not just create a virtual function for each app?
> +
> +Creating PCIe SRIOV type virtual functions (VF) are expensive. They create
> +duplicated hardware for PCI config space requirements, Interrupts such as
> +MSIx for instance. Resources such as interrupts have to be hard partitioned
> +between VF's at creation time, and cannot scale dynamically on demand. The
> +VF's are not completely independent from the Physical function (PF). Most
> +VF's require some communication and assistance from the PF driver. SIOV
> +creates a software defined device. Where all the configuration and control
> +aspects are mediated via the slow path. The work submission and completion
> +happen without any mediation.
> +
> +* Does this support virtualization?
> +
> +ENQCMD can be used from within a guest VM. In these cases the VMM helps
> +with setting up a translation table to translate from Guest PASID to Host
> +PASID. Please consult the ENQCMD instruction set reference for more
> +details.
> +
> +* Does memory need to be pinned?
> +
> +When devices support SVA, along with platform hardware such as IOMMU
> +supporting such devices, there is no need to pin memory for DMA purposes.
> +Devices that support SVA also support other PCIe features that remove the
> +pinning requirement for memory.
> +
> +Device TLB support - Device requests the IOMMU to lookup an address before
> +use via Address Translation Service (ATS) requests. If the mapping exists
> +but there is no page allocated by the OS, IOMMU hardware returns that no
> +mapping exists.
> +
> +Device requests that virtual address to be mapped via Page Request
> +Interface (PRI). Once the OS has successfully completed the mapping, it
> +returns the response back to the device. The device continues again to
> +request for a translation and continues.
> +
> +IOMMU works with the OS in managing consistency of page-tables with the
> +device. When removing pages, it interacts with the device to remove any
> +device-tlb that might have been cached before removing the mappings from
> +the OS.
> +
> +References
> +==========
> +
> +VT-D:
> +https://01.org/blogs/ashokraj/2018/recent-enhancements-intel-virtualization-technology-directed-i/o-intel-vt-d
> +
> +SIOV:
> +https://01.org/blogs/2019/assignable-interfaces-intel-scalable-i/o-virtualization-linux
> +
> +ENQCMD in ISE:
> +https://software.intel.com/sites/default/files/managed/c5/15/architecture-instruction-set-extensions-programming-reference.pdf
> +
> +DSA spec:
> +https://software.intel.com/sites/default/files/341204-intel-data-streaming-accelerator-spec.pdf
>

Best regards,
baolu

2020-06-13 13:11:44

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 11/12] x86/mmu: Allocate/free PASID

Hi Fenghua,

On 2020/6/13 8:41, Fenghua Yu wrote:
> A PASID is allocated for an "mm" the first time any thread attaches
> to an SVM capable device. Later device attachments (whether to the same
> device or another SVM device) will re-use the same PASID.
>
> The PASID is freed when the process exits (so no need to keep
> reference counts on how many SVM devices are sharing the PASID).

FYI.

Jean-Philippe Brucker has a patch for mm->pasid management in the vendor
agnostic manner.

https://www.spinics.net/lists/iommu/msg44459.html

Best regards,
baolu

>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> v2:
> - Define a helper free_bind() to simplify error exit code in bind_mm()
> (Thomas)
> - Fix a ret error code in bind_mm() (Thomas)
> - Change pasid's type from "int" to "unsigned int" to have consistent
> pasid type in iommu (Thomas)
> - Simplify alloc_pasid() a bit.
>
> arch/x86/include/asm/iommu.h | 2 +
> arch/x86/include/asm/mmu_context.h | 14 ++++
> drivers/iommu/intel/svm.c | 101 +++++++++++++++++++++++++----
> 3 files changed, 105 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> index bf1ed2ddc74b..ed41259fe7ac 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
> return -EINVAL;
> }
>
> +void __free_pasid(struct mm_struct *mm);
> +
> #endif /* _ASM_X86_IOMMU_H */
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 47562147e70b..f8c91ce8c451 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -13,6 +13,7 @@
> #include <asm/tlbflush.h>
> #include <asm/paravirt.h>
> #include <asm/debugreg.h>
> +#include <asm/iommu.h>
>
> extern atomic64_t last_mm_ctx_id;
>
> @@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
> init_new_context_ldt(mm);
> return 0;
> }
> +
> +static inline void free_pasid(struct mm_struct *mm)
> +{
> + if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> + return;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> + return;
> +
> + __free_pasid(mm);
> +}
> +
> static inline void destroy_context(struct mm_struct *mm)
> {
> destroy_context_ldt(mm);
> + free_pasid(mm);
> }
>
> extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 4e775e12ae52..27dc866b8461 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
> return ret;
> }
>
> +static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
> + bool new_pasid)
> +{
> + if (new_pasid)
> + ioasid_free(svm->pasid);
> + kfree(svm);
> + kfree(sdev);
> +}
> +
> +/*
> + * If this mm already has a PASID, use it. Otherwise allocate a new one.
> + * Let the caller know if a new PASID is allocated via 'new_pasid'.
> + */
> +static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
> + unsigned int pasid_max, bool *new_pasid,
> + unsigned int flags)
> +{
> + unsigned int pasid;
> +
> + *new_pasid = false;
> +
> + /*
> + * Reuse the PASID if the mm already has a PASID and not a private
> + * PASID is requested.
> + */
> + if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> + /*
> + * Once a PASID is allocated for this mm, the PASID
> + * stays with the mm until the mm is dropped. Reuse
> + * the PASID which has been already allocated for the
> + * mm instead of allocating a new one.
> + */
> + ioasid_set_data(mm->pasid, svm);
> +
> + return mm->pasid;
> + }
> +
> + /* Allocate a new pasid. Do not use PASID 0, reserved for init PASID. */
> + pasid = ioasid_alloc(NULL, PASID_MIN, pasid_max - 1, svm);
> + if (pasid != INVALID_IOASID) {
> + /* A new pasid is allocated. */
> + *new_pasid = true;
> + }
> +
> + return pasid;
> +}
> +
> /* Caller must hold pasid_mutex, mm reference */
> static int
> intel_svm_bind_mm(struct device *dev, unsigned int flags,
> @@ -518,6 +565,8 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
> init_rcu_head(&sdev->rcu);
>
> if (!svm) {
> + bool new_pasid;
> +
> svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> if (!svm) {
> ret = -ENOMEM;
> @@ -529,12 +578,9 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
> if (pasid_max > intel_pasid_max_id)
> pasid_max = intel_pasid_max_id;
>
> - /* Do not use PASID 0, reserved for RID to PASID */
> - svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> - pasid_max - 1, svm);
> + svm->pasid = alloc_pasid(svm, mm, pasid_max, &new_pasid, flags);
> if (svm->pasid == INVALID_IOASID) {
> - kfree(svm);
> - kfree(sdev);
> + free_bind(svm, sdev, new_pasid);
> ret = -ENOSPC;
> goto out;
> }
> @@ -547,9 +593,7 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
> if (mm) {
> ret = mmu_notifier_register(&svm->notifier, mm);
> if (ret) {
> - ioasid_free(svm->pasid);
> - kfree(svm);
> - kfree(sdev);
> + free_bind(svm, sdev, new_pasid);
> goto out;
> }
> }
> @@ -565,12 +609,18 @@ intel_svm_bind_mm(struct device *dev, unsigned int flags,
> if (ret) {
> if (mm)
> mmu_notifier_unregister(&svm->notifier, mm);
> - ioasid_free(svm->pasid);
> - kfree(svm);
> - kfree(sdev);
> + free_bind(svm, sdev, new_pasid);
> goto out;
> }
>
> + if (mm && new_pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> + /*
> + * Track the new pasid in the mm. The pasid will be
> + * freed at process exit. Don't track requested
> + * private PASID in the mm.
> + */
> + mm->pasid = svm->pasid;
> + }
> list_add_tail(&svm->list, &global_svm_list);
> } else {
> /*
> @@ -640,7 +690,8 @@ static int intel_svm_unbind_mm(struct device *dev, unsigned int pasid)
> kfree_rcu(sdev, rcu);
>
> if (list_empty(&svm->devs)) {
> - ioasid_free(svm->pasid);
> + /* Clear data in the pasid. */
> + ioasid_set_data(pasid, NULL);
> if (svm->mm)
> mmu_notifier_unregister(&svm->notifier, svm->mm);
> list_del(&svm->list);
> @@ -1001,3 +1052,29 @@ unsigned int intel_svm_get_pasid(struct iommu_sva *sva)
>
> return pasid;
> }
> +
> +/*
> + * An invalid pasid is either 0 (init PASID value) or bigger than max PASID
> + * (PASID_MAX - 1).
> + */
> +static bool invalid_pasid(unsigned int pasid)
> +{
> + return (pasid == INIT_PASID) || (pasid >= PASID_MAX);
> +}
> +
> +/* On process exit free the PASID (if one was allocated). */
> +void __free_pasid(struct mm_struct *mm)
> +{
> + unsigned int pasid = mm->pasid;
> +
> + /* No need to free invalid pasid. */
> + if (invalid_pasid(pasid))
> + return;
> +
> + /*
> + * Since the pasid is not bound to any svm by now, there is no race
> + * here with binding/unbinding and no need to protect the free
> + * operation by pasid_mutex.
> + */
> + ioasid_free(pasid);
> +}
>

2020-06-15 02:21:56

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v2 11/12] x86/mmu: Allocate/free PASID

Hi Fenghua,

On 6/13/20 8:41 AM, Fenghua Yu wrote:
> A PASID is allocated for an "mm" the first time any thread attaches
> to an SVM capable device. Later device attachments (whether to the same
> device or another SVM device) will re-use the same PASID.
>
> The PASID is freed when the process exits (so no need to keep
> reference counts on how many SVM devices are sharing the PASID).
>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> v2:
> - Define a helper free_bind() to simplify error exit code in bind_mm()
> (Thomas)
> - Fix a ret error code in bind_mm() (Thomas)
> - Change pasid's type from "int" to "unsigned int" to have consistent
> pasid type in iommu (Thomas)
> - Simplify alloc_pasid() a bit.
>
> arch/x86/include/asm/iommu.h | 2 +
> arch/x86/include/asm/mmu_context.h | 14 ++++
> drivers/iommu/intel/svm.c | 101 +++++++++++++++++++++++++----
> 3 files changed, 105 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/include/asm/iommu.h b/arch/x86/include/asm/iommu.h
> index bf1ed2ddc74b..ed41259fe7ac 100644
> --- a/arch/x86/include/asm/iommu.h
> +++ b/arch/x86/include/asm/iommu.h
> @@ -26,4 +26,6 @@ arch_rmrr_sanity_check(struct acpi_dmar_reserved_memory *rmrr)
> return -EINVAL;
> }
>
> +void __free_pasid(struct mm_struct *mm);
> +
> #endif /* _ASM_X86_IOMMU_H */
> diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
> index 47562147e70b..f8c91ce8c451 100644
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -13,6 +13,7 @@
> #include <asm/tlbflush.h>
> #include <asm/paravirt.h>
> #include <asm/debugreg.h>
> +#include <asm/iommu.h>
>
> extern atomic64_t last_mm_ctx_id;
>
> @@ -117,9 +118,22 @@ static inline int init_new_context(struct task_struct *tsk,
> init_new_context_ldt(mm);
> return 0;
> }
> +
> +static inline void free_pasid(struct mm_struct *mm)
> +{
> + if (!IS_ENABLED(CONFIG_INTEL_IOMMU_SVM))
> + return;
> +
> + if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
> + return;
> +
> + __free_pasid(mm);
> +}
> +
> static inline void destroy_context(struct mm_struct *mm)
> {
> destroy_context_ldt(mm);
> + free_pasid(mm);
> }
>
> extern void switch_mm(struct mm_struct *prev, struct mm_struct *next,
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 4e775e12ae52..27dc866b8461 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -425,6 +425,53 @@ int intel_svm_unbind_gpasid(struct device *dev, unsigned int pasid)
> return ret;
> }
>
> +static void free_bind(struct intel_svm *svm, struct intel_svm_dev *sdev,
> + bool new_pasid)
> +{
> + if (new_pasid)
> + ioasid_free(svm->pasid);
> + kfree(svm);
> + kfree(sdev);
> +}
> +
> +/*
> + * If this mm already has a PASID, use it. Otherwise allocate a new one.
> + * Let the caller know if a new PASID is allocated via 'new_pasid'.
> + */
> +static int alloc_pasid(struct intel_svm *svm, struct mm_struct *mm,
> + unsigned int pasid_max, bool *new_pasid,
> + unsigned int flags)
> +{
> + unsigned int pasid;
> +
> + *new_pasid = false;
> +
> + /*
> + * Reuse the PASID if the mm already has a PASID and not a private
> + * PASID is requested.
> + */
> + if (mm && mm->pasid && !(flags & SVM_FLAG_PRIVATE_PASID)) {
> + /*
> + * Once a PASID is allocated for this mm, the PASID
> + * stays with the mm until the mm is dropped. Reuse
> + * the PASID which has been already allocated for the
> + * mm instead of allocating a new one.
> + */
> + ioasid_set_data(mm->pasid, svm);

How about adding some sanity checks here? For example,

void *p = ioasid_find(NULL, mm->pasid, NULL);

if (!p)
ioasid_set_data(mm->pasid, svm);
else if (IS_ERR(p) || p != svm)
return INVALID_IOSASID;

Best regards,
baolu

2020-06-15 07:55:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 00/12] x86: tag application address space for devices

On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote:

> This series only provides simple and basic support for ENQCMD and the MSR:
> 1. Clean up type definitions (patch 1-3). These patches can be in a
> separate series.
> - Define "pasid" as "unsigned int" consistently (patch 1 and 2).
> - Define "flags" as "unsigned int"
> 2. Explain different various technical terms used in the series (patch 4).
> 3. Enumerate support for ENQCMD in the processor (patch 5).
> 4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
> 5. Define "pasid" in mm_struct (patch 8).
> 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10).
> 6. Allocate and free PASID for a process (patch 11).
> 7. Fix up the PASID MSR in #GP handler when one thread in a process
> executes ENQCMD for the first time (patches 12).

If this is per mm, should not switch_mm() update the MSR ? I'm not
seeing that, nor do I see it explained why not.

2020-06-15 07:57:59

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> @@ -447,6 +458,18 @@ dotraplinkage void do_general_protection(struct pt_regs *regs, long error_code)
> int ret;
>
> RCU_LOCKDEP_WARN(!rcu_is_watching(), "entry code didn't wake RCU");
> +
> + /*
> + * Perform the check for a user mode PASID exception before enable
> + * interrupts. Doing this here ensures that the PASID MSR can be simply
> + * accessed because the contents are known to be still associated
> + * with the current process.
> + */
> + if (user_mode(regs) && fixup_pasid_exception()) {
> + cond_local_irq_enable(regs);
> + return;

OK, so we're done with the exception, lets enable interrupts?

> + }

2020-06-15 07:59:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> +/*
> + * Apply some heuristics to see if the #GP fault was caused by a thread
> + * that hasn't had the IA32_PASID MSR initialized. If it looks like that
> + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> + * guesses incorrectly, take one more #GP fault.

How is that going to help? Aren't we then going to run this same
heuristic again and again and again?

> + */
> +bool __fixup_pasid_exception(void)
> +{
> + u64 pasid_msr;
> + unsigned int pasid;
> +
> + /*
> + * This function is called only when this #GP was triggered from user
> + * space. So the mm cannot be NULL.
> + */
> + pasid = current->mm->pasid;
> + /* If the mm doesn't have a valid PASID, then can't help. */
> + if (invalid_pasid(pasid))
> + return false;
> +
> + /*
> + * Since IRQ is disabled now, the current task still owns the FPU on

That's just weird and confusing. What you want to say is that you rely
on the exception disabling the interrupt.

> + * this CPU and the PASID MSR can be directly accessed.
> + *
> + * If the MSR has a valid PASID, the #GP must be for some other reason.
> + *
> + * If rdmsr() is really a performance issue, a TIF_ flag may be
> + * added to check if the thread has a valid PASID instead of rdmsr().

I don't understand any of this. Nobody except us writes to this MSR, we
should bloody well know what's in it. What gives?

> + */
> + rdmsrl(MSR_IA32_PASID, pasid_msr);
> + if (pasid_msr & MSR_IA32_PASID_VALID)
> + return false;
> +
> + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> +
> + return true;
> +}
> --
> 2.19.1
>

2020-06-15 14:56:42

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 00/12] x86: tag application address space for devices

Hi, Peter,
On Mon, Jun 15, 2020 at 09:52:02AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 05:41:21PM -0700, Fenghua Yu wrote:
>
> > This series only provides simple and basic support for ENQCMD and the MSR:
> > 1. Clean up type definitions (patch 1-3). These patches can be in a
> > separate series.
> > - Define "pasid" as "unsigned int" consistently (patch 1 and 2).
> > - Define "flags" as "unsigned int"
> > 2. Explain different various technical terms used in the series (patch 4).
> > 3. Enumerate support for ENQCMD in the processor (patch 5).
> > 4. Handle FPU PASID state and the MSR during context switch (patches 6-7).
> > 5. Define "pasid" in mm_struct (patch 8).
> > 5. Clear PASID state for new mm and forked and cloned thread (patch 9-10).
> > 6. Allocate and free PASID for a process (patch 11).
> > 7. Fix up the PASID MSR in #GP handler when one thread in a process
> > executes ENQCMD for the first time (patches 12).
>
> If this is per mm, should not switch_mm() update the MSR ? I'm not
> seeing that, nor do I see it explained why not.

PASID value is per mm and all threads in a process have the same PASID
value in the MSR. However, the MSR is per thread and is context switched
by XSAVES/XRSTROS in patches 6-7.

Thanks.

-Fenghua

2020-06-15 15:53:57

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

Hi, Peter,
On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > +/*
> > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that
> > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > + * guesses incorrectly, take one more #GP fault.
>
> How is that going to help? Aren't we then going to run this same
> heuristic again and again and again?

The heuristic always initializes the MSR with the per mm PASID IIF the
mm has a valid PASID but the MSR doesn't have one. This heuristic usually
happens only once on the first #GP in a thread.

If the next #GP still comes in, the heuristic finds out the MSR already
has a valid PASID and thus will not fixup the MSR any more. The fixup()
returns "false" and lets others to handle the #GP.

So the heuristic will be executed once (at most) and won't be executed
again and again.

>
> > + */
> > +bool __fixup_pasid_exception(void)
> > +{
> > + u64 pasid_msr;
> > + unsigned int pasid;
> > +
> > + /*
> > + * This function is called only when this #GP was triggered from user
> > + * space. So the mm cannot be NULL.
> > + */
> > + pasid = current->mm->pasid;
> > + /* If the mm doesn't have a valid PASID, then can't help. */
> > + if (invalid_pasid(pasid))
> > + return false;
> > +
> > + /*
> > + * Since IRQ is disabled now, the current task still owns the FPU on
>
> That's just weird and confusing. What you want to say is that you rely
> on the exception disabling the interrupt.

I checked SDM again. You are right. #GP can be interrupted by machine check
or other interrupts. So I cannot assume the current task still owns the FPU.
Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
either the MSR on the processor or the PASID state in the memory.

>
> > + * this CPU and the PASID MSR can be directly accessed.
> > + *
> > + * If the MSR has a valid PASID, the #GP must be for some other reason.
> > + *
> > + * If rdmsr() is really a performance issue, a TIF_ flag may be
> > + * added to check if the thread has a valid PASID instead of rdmsr().
>
> I don't understand any of this. Nobody except us writes to this MSR, we
> should bloody well know what's in it. What gives?

Patch 4 describes how to manage the MSR and patch 7 describes the format
of the MSR (20-bit PASID value and bit 31 is valid bit).

Are they sufficient to help? Or do you mean something else?

> > + */
> > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > + return false;
> > +
> > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> > +
> > + return true;
> > +}
> > --
> > 2.19.1
> >

Thanks.

-Fenghua

2020-06-15 16:08:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> Hi, Peter,
> On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > +/*
> > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that
> > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > > + * guesses incorrectly, take one more #GP fault.
> >
> > How is that going to help? Aren't we then going to run this same
> > heuristic again and again and again?
>
> The heuristic always initializes the MSR with the per mm PASID IIF the
> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> happens only once on the first #GP in a thread.

But it doesn't guarantee the PASID is the right one. Suppose both the mm
has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
Then we NO-OP. So if the exception was due to us having the wrong PASID,
we stuck.

> If the next #GP still comes in, the heuristic finds out the MSR already
> has a valid PASID and thus will not fixup the MSR any more. The fixup()
> returns "false" and lets others to handle the #GP.
>
> So the heuristic will be executed once (at most) and won't be executed
> again and again.

So I get that you want to set the PASID on-demand, but I find this all
really weird code to make that happen.

> > > +bool __fixup_pasid_exception(void)
> > > +{
> > > + u64 pasid_msr;
> > > + unsigned int pasid;
> > > +
> > > + /*
> > > + * This function is called only when this #GP was triggered from user
> > > + * space. So the mm cannot be NULL.
> > > + */
> > > + pasid = current->mm->pasid;
> > > + /* If the mm doesn't have a valid PASID, then can't help. */
> > > + if (invalid_pasid(pasid))
> > > + return false;
> > > +
> > > + /*
> > > + * Since IRQ is disabled now, the current task still owns the FPU on
> >
> > That's just weird and confusing. What you want to say is that you rely
> > on the exception disabling the interrupt.
>
> I checked SDM again. You are right. #GP can be interrupted by machine check
> or other interrupts. So I cannot assume the current task still owns the FPU.
> Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> either the MSR on the processor or the PASID state in the memory.

That's not in fact what I meant, but yes, you can take exceptions while
!IF just fine.

> > > + * this CPU and the PASID MSR can be directly accessed.
> > > + *
> > > + * If the MSR has a valid PASID, the #GP must be for some other reason.
> > > + *
> > > + * If rdmsr() is really a performance issue, a TIF_ flag may be
> > > + * added to check if the thread has a valid PASID instead of rdmsr().
> >
> > I don't understand any of this. Nobody except us writes to this MSR, we
> > should bloody well know what's in it. What gives?
>
> Patch 4 describes how to manage the MSR and patch 7 describes the format
> of the MSR (20-bit PASID value and bit 31 is valid bit).
>
> Are they sufficient to help? Or do you mean something else?

I don't get why you need a rdmsr here, or why not having one would
require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

> > > + */
> > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > + return false;
> > > +
> > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);

How much more expensive is the wrmsr over the rdmsr? Can't we just
unconditionally write the current PASID and call it a day?

2020-06-15 18:17:53

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 08:48:54AM -0700, Fenghua Yu wrote:
> > Hi, Peter,
> > On Mon, Jun 15, 2020 at 09:56:49AM +0200, Peter Zijlstra wrote:
> > > On Fri, Jun 12, 2020 at 05:41:33PM -0700, Fenghua Yu wrote:
> > > > +/*
> > > > + * Apply some heuristics to see if the #GP fault was caused by a thread
> > > > + * that hasn't had the IA32_PASID MSR initialized. If it looks like that
> > > > + * is the problem, try initializing the IA32_PASID MSR. If the heuristic
> > > > + * guesses incorrectly, take one more #GP fault.
> > >
> > > How is that going to help? Aren't we then going to run this same
> > > heuristic again and again and again?
> >
> > The heuristic always initializes the MSR with the per mm PASID IIF the
> > mm has a valid PASID but the MSR doesn't have one. This heuristic usually
> > happens only once on the first #GP in a thread.
>
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

The MSR for each thread was cleared during fork() and clone(). The PASID
was cleared during mm_init(). The per-mm PASID was assigned when fist
bind_mm() is called and remains the same value until process exit().

The MSR is only fixed up when the first ENQCMD is executed in a thread:
bit 31 in the MSR is 0 and the PASID in the mm is non-zero.

The MSR remains the same PASID value once it's fixed up until the thread
exits.

So the work flow ensures the PASID goes from mm's PASID to the MSR.

The PASID could be unbund from the mm. In this case, iommu will generate
#PF and kernel oops instead of #GP.

>
> > If the next #GP still comes in, the heuristic finds out the MSR already
> > has a valid PASID and thus will not fixup the MSR any more. The fixup()
> > returns "false" and lets others to handle the #GP.
> >
> > So the heuristic will be executed once (at most) and won't be executed
> > again and again.
>
> So I get that you want to set the PASID on-demand, but I find this all
> really weird code to make that happen.

We could keep PASID same in all threads sychronously by propogating the MSRs
when the PASID is bound to the mm via IPIs or taskworks to all
threads in the process. But the code is complex and error-prone and
overhead could be high:
1. The user can call driver to do binding and unbinding multiple times.
The IPIs or taskworks will be sent multiple times to make sure only
the last IPIs or taskworks take action.
2. Even if a thread never executes ENQCMD and thus never uses the MSR,
the MSR still needs to be updated whenever bind_mm() and needs to be
context switched each time. This could cause high overhead.

Setting the PASID on-demand is simpler and cleaner and was recommended
by Thomas.

>
> > > > +bool __fixup_pasid_exception(void)
> > > > +{
> > > > + u64 pasid_msr;
> > > > + unsigned int pasid;
> > > > +
> > > > + /*
> > > > + * This function is called only when this #GP was triggered from user
> > > > + * space. So the mm cannot be NULL.
> > > > + */
> > > > + pasid = current->mm->pasid;
> > > > + /* If the mm doesn't have a valid PASID, then can't help. */
> > > > + if (invalid_pasid(pasid))
> > > > + return false;
> > > > +
> > > > + /*
> > > > + * Since IRQ is disabled now, the current task still owns the FPU on
> > >
> > > That's just weird and confusing. What you want to say is that you rely
> > > on the exception disabling the interrupt.
> >
> > I checked SDM again. You are right. #GP can be interrupted by machine check
> > or other interrupts. So I cannot assume the current task still owns the FPU.
> > Instead of directly rdmsr() and wrmsr(), I will add helpers that can access
> > either the MSR on the processor or the PASID state in the memory.
>
> That's not in fact what I meant, but yes, you can take exceptions while
> !IF just fine.
>
> > > > + * this CPU and the PASID MSR can be directly accessed.
> > > > + *
> > > > + * If the MSR has a valid PASID, the #GP must be for some other reason.
> > > > + *
> > > > + * If rdmsr() is really a performance issue, a TIF_ flag may be
> > > > + * added to check if the thread has a valid PASID instead of rdmsr().
> > >
> > > I don't understand any of this. Nobody except us writes to this MSR, we
> > > should bloody well know what's in it. What gives?
> >
> > Patch 4 describes how to manage the MSR and patch 7 describes the format
> > of the MSR (20-bit PASID value and bit 31 is valid bit).
> >
> > Are they sufficient to help? Or do you mean something else?
>
> I don't get why you need a rdmsr here, or why not having one would
> require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?

My concern is TIF flags are precious (only 3 slots available). Defining
a new TIF flag may be not worth it while rdmsr() can check if PASID
is valid in the MSR. And performance here might not be a big issue
in #GP.

But if you think using TIF flag is better, I can define a new TIF flag
and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
Then we can avoid using rdmsr() to check valid PASID in the MSR.

>
> > > > + */
> > > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > + return false;
> > > > +
> > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
>
> How much more expensive is the wrmsr over the rdmsr? Can't we just
> unconditionally write the current PASID and call it a day?

rdmsr() and wrmsr() might have same cost.

If using a TIF flag to check if the thread has a valid PASID MSR, then
I can replace rdmsr() by checking the TIF flag and only do wrmsr()
when the TIF flag is set.

Please advice.

Thanks.

-Fenghua

2020-06-15 18:21:36

by Ashok Raj

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
>
> I don't get why you need a rdmsr here, or why not having one would
> require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
>
> > > > + */
> > > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > + return false;
> > > > +
> > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
>
> How much more expensive is the wrmsr over the rdmsr? Can't we just
> unconditionally write the current PASID and call it a day?

The reason to check the rdmsr() is because we are using a hueristic taking
GP faults. If we already setup the MSR, but we get it a second time it
means the reason is something other than PASID_MSR not being set.

Ideally we should use the TIF_ to track this would be cheaper, but we were
told those bits aren't easy to give out.

Cheers,
Ashok

2020-06-15 18:34:27

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
>
> My concern is TIF flags are precious (only 3 slots available). Defining
> a new TIF flag may be not worth it while rdmsr() can check if PASID
> is valid in the MSR. And performance here might not be a big issue
> in #GP.
>
> But if you think using TIF flag is better, I can define a new TIF flag
> and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> Then we can avoid using rdmsr() to check valid PASID in the MSR.

WHY ?!?! What do you need a TIF flag for?

2020-06-15 18:34:38

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 11:19:21AM -0700, Raj, Ashok wrote:
> On Mon, Jun 15, 2020 at 06:03:57PM +0200, Peter Zijlstra wrote:
> >
> > I don't get why you need a rdmsr here, or why not having one would
> > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> >
> > > > > + */
> > > > > + rdmsrl(MSR_IA32_PASID, pasid_msr);
> > > > > + if (pasid_msr & MSR_IA32_PASID_VALID)
> > > > > + return false;
> > > > > +
> > > > > + /* Fix up the MSR if the MSR doesn't have a valid PASID. */
> > > > > + wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);
> >
> > How much more expensive is the wrmsr over the rdmsr? Can't we just
> > unconditionally write the current PASID and call it a day?
>
> The reason to check the rdmsr() is because we are using a hueristic taking
> GP faults. If we already setup the MSR, but we get it a second time it
> means the reason is something other than PASID_MSR not being set.
>
> Ideally we should use the TIF_ to track this would be cheaper, but we were
> told those bits aren't easy to give out.

Why do you need a TIF flag? Why not any other random flag in current?

2020-06-15 18:57:40

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

Hi, Peter,

On Mon, Jun 15, 2020 at 08:31:16PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:12:59AM -0700, Fenghua Yu wrote:
> > > I don't get why you need a rdmsr here, or why not having one would
> > > require a TIF flag. Is that because this MSR is XSAVE/XRSTOR managed?
> >
> > My concern is TIF flags are precious (only 3 slots available). Defining
> > a new TIF flag may be not worth it while rdmsr() can check if PASID
> > is valid in the MSR. And performance here might not be a big issue
> > in #GP.
> >
> > But if you think using TIF flag is better, I can define a new TIF flag
> > and maintain it per thread (init 0 when clone()/fork(), set 1 in fixup()).
> > Then we can avoid using rdmsr() to check valid PASID in the MSR.
>
> WHY ?!?! What do you need a TIF flag for?

We need "a way" to check if the per thread MSR has a valid PASID. If yes,
no need to fix up the MSR (wrmsr()), and let other handler to handle the #GP.
Otherwise, apply the heuristics and fix up the MSR and exit the #GP.

The way to check the valid PASID in the MSR is rdmsr() in this series.
A TIF flag will be much faster than rdmsr() and seems a sutiable way
to check valid PASID status per thread. That's why it could replace
rdmsr() to check PASID in the MSR.

Or do you suggest to add a random new flag in struct thread_info instead
of a TIF flag?

Thanks.

-Fenghua

2020-06-15 19:13:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:

> Or do you suggest to add a random new flag in struct thread_info instead
> of a TIF flag?

Why thread_info? What's wrong with something simple like the below. It
takes a bit from the 'strictly current' flags word.


diff --git a/include/linux/sched.h b/include/linux/sched.h
index b62e6aaf28f0..fca830b97055 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -801,6 +801,9 @@ struct task_struct {
/* Stalled due to lack of memory */
unsigned in_memstall:1;
#endif
+#ifdef CONFIG_PCI_PASID
+ unsigned has_valid_pasid:1;
+#endif

unsigned long atomic_flags; /* Flags requiring atomic access. */

diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..10b3891be99e 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
tsk->use_memdelay = 0;
#endif

+#ifdef CONFIG_PCI_PASID
+ tsk->has_valid_pasid = 0;
+#endif
+
#ifdef CONFIG_MEMCG
tsk->active_memcg = NULL;
#endif

2020-06-15 19:29:41

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

>> The heuristic always initializes the MSR with the per mm PASID IIF the
>> mm has a valid PASID but the MSR doesn't have one. This heuristic usually
>> happens only once on the first #GP in a thread.
>
> But it doesn't guarantee the PASID is the right one. Suppose both the mm
> has a PASID and the MSR has a VALID one, but the MSR isn't the mm one.
> Then we NO-OP. So if the exception was due to us having the wrong PASID,
> we stuck.

ENQCMD only checks the 'valid' bit of the IA32_PASID MSR to decide whether
to #GP or not. H/W has no concept of the "right" pasid value.

If IA32_PASID is valid with the wrong value ... then the system is about to
see some major corruption because the operations in the accelerator are
not going to translate to the physical addresses for pages owned by the process
that issued the ENQCMD.

-Tony

2020-06-15 20:20:40

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

Hi, Peter,

On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
>
> > Or do you suggest to add a random new flag in struct thread_info instead
> > of a TIF flag?
>
> Why thread_info? What's wrong with something simple like the below. It
> takes a bit from the 'strictly current' flags word.
>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..fca830b97055 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -801,6 +801,9 @@ struct task_struct {
> /* Stalled due to lack of memory */
> unsigned in_memstall:1;
> #endif
> +#ifdef CONFIG_PCI_PASID
> + unsigned has_valid_pasid:1;
> +#endif
>
> unsigned long atomic_flags; /* Flags requiring atomic access. */
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..10b3891be99e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> tsk->use_memdelay = 0;
> #endif
>
> +#ifdef CONFIG_PCI_PASID
> + tsk->has_valid_pasid = 0;
> +#endif
> +
> #ifdef CONFIG_MEMCG
> tsk->active_memcg = NULL;
> #endif

The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
The flag should be cleared in cloned()/forked() and is only set and
read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
of x86.

That's why we think the flag should be in x86 struct thread_info instead
of in generice struct task_struct.

Please advice.

Thanks.

-Fenghua

2020-06-15 20:54:41

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID


> On Jun 15, 2020, at 1:17 PM, Fenghua Yu <[email protected]> wrote:
>
> Hi, Peter,
>
>> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
>>> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
>>>
>>> Or do you suggest to add a random new flag in struct thread_info instead
>>> of a TIF flag?
>>
>> Why thread_info? What's wrong with something simple like the below. It
>> takes a bit from the 'strictly current' flags word.
>>
>>
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index b62e6aaf28f0..fca830b97055 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -801,6 +801,9 @@ struct task_struct {
>> /* Stalled due to lack of memory */
>> unsigned in_memstall:1;
>> #endif
>> +#ifdef CONFIG_PCI_PASID
>> + unsigned has_valid_pasid:1;
>> +#endif
>>
>> unsigned long atomic_flags; /* Flags requiring atomic access. */
>>
>> diff --git a/kernel/fork.c b/kernel/fork.c
>> index 142b23645d82..10b3891be99e 100644
>> --- a/kernel/fork.c
>> +++ b/kernel/fork.c
>> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>> tsk->use_memdelay = 0;
>> #endif
>>
>> +#ifdef CONFIG_PCI_PASID
>> + tsk->has_valid_pasid = 0;
>> +#endif
>> +
>> #ifdef CONFIG_MEMCG
>> tsk->active_memcg = NULL;
>> #endif
>
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
>
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.
>

Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily? If the latter, a percpu variable might be better.

> Please advice.
>
> Thanks.
>
> -Fenghua

2020-06-15 20:58:43

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

> Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily? If the latter, a percpu variable might be better.

Current plan is "touch it once and the task owns it until exit(2)"

Maybe someday in the future when we have data on how applications
actually use accelerators we could look at something more complex
if usage patterns look like it would be beneficial.

-Tony

2020-06-15 21:20:58

by Andy Lutomirski

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID



> On Jun 15, 2020, at 1:56 PM, Luck, Tony <[email protected]> wrote:
>
> 
>>
>> Are we planning to keep PASID live once a task has used it once or are we going to swap it lazily? If the latter, a percpu variable might be better.
>
> Current plan is "touch it once and the task owns it until exit(2)"
>
> Maybe someday in the future when we have data on how applications
> actually use accelerators we could look at something more complex
> if usage patterns look like it would be beneficial.
>
>

So what’s the RDMSR for? Surely you
have some state somewhere that says “this task has a PASID.” Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if the task already has a PASID, you know the MSR is set.

2020-06-15 21:27:53

by Luck, Tony

[permalink] [raw]
Subject: RE: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

> So what’s the RDMSR for? Surely you
> have some state somewhere that says “this task has a PASID.”
> Can’t you just make sure that stays in sync with the MSR? Then, on #GP, if the task already has a PASID, you know the MSR is set.

We have state that says the process ("mm") has been allocated a PASID. But not for each task.

E.g. a process may clone a bunch of tasks, then one of them opens a device that needs
a PASID. We did internally have patches to go hunt down all those other tasks and
force a PASID onto each. But the code was big and ugly. Also maybe the wrong thing
to do if those threads didn't ever need to access the device.

PeterZ suggested that we can add a bit to the task structure for this purpose.

Fenghua is hesitant about adding an x86 only bit there.

-Tony

2020-06-15 21:58:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Mon, Jun 15, 2020 at 01:17:35PM -0700, Fenghua Yu wrote:
> Hi, Peter,
>
> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> >
> > > Or do you suggest to add a random new flag in struct thread_info instead
> > > of a TIF flag?
> >
> > Why thread_info? What's wrong with something simple like the below. It
> > takes a bit from the 'strictly current' flags word.
> >
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..fca830b97055 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -801,6 +801,9 @@ struct task_struct {
> > /* Stalled due to lack of memory */
> > unsigned in_memstall:1;
> > #endif
> > +#ifdef CONFIG_PCI_PASID
> > + unsigned has_valid_pasid:1;
> > +#endif
> >
> > unsigned long atomic_flags; /* Flags requiring atomic access. */
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 142b23645d82..10b3891be99e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> > tsk->use_memdelay = 0;
> > #endif
> >
> > +#ifdef CONFIG_PCI_PASID
> > + tsk->has_valid_pasid = 0;
> > +#endif
> > +
> > #ifdef CONFIG_MEMCG
> > tsk->active_memcg = NULL;
> > #endif
>
> The PASID MSR is x86 specific although PASID is PCIe concept and per-mm.
> Checking if the MSR has valid PASID (bit31=1) is an x86 specifc work.
> The flag should be cleared in cloned()/forked() and is only set and
> read in fixup() in x86 #GP for heuristic. It's not used anywhere outside
> of x86.
>
> That's why we think the flag should be in x86 struct thread_info instead
> of in generice struct task_struct.

I don't think anybody really cares, it's just one bit, we have plenty
left.

On x86_64 there's a u32 sized alignment hole in thread_info, also we
don't use the upper 32bit of thread_info::flags, however using those
would still mean you have to use atomic ops, which you really don't
need.


2020-06-15 23:20:40

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 04/12] docs: x86: Add documentation for SVA (Shared Virtual Addressing)

Hi, Baolu,

On Sat, Jun 13, 2020 at 08:17:40PM +0800, Lu Baolu wrote:
> Hi Fenghua,
>
> On 2020/6/13 8:41, Fenghua Yu wrote:
> >+implement implement fairness or ensure forward progress can be made.
>
> Repeated "implement".

Will fix this.

> >+For example, the Intel Data Streaming Accelerator (DSA) uses
> >+intel_svm_bind_mm(), which will do the following.
>
> The Intel SVM APIs have been deprecated. Drivers should use
> iommu_sva_bind_device() instead. Please also update other places in
> this document.

Will fix this.

Thanks.

-Fenghua

2020-06-16 08:32:44

by Jean-Philippe Brucker

[permalink] [raw]
Subject: Re: [PATCH v2 08/12] mm: Define pasid in mm

On Fri, Jun 12, 2020 at 05:41:29PM -0700, Fenghua Yu wrote:
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 64ede5f150dc..5778db3aa42d 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -538,6 +538,10 @@ struct mm_struct {
> atomic_long_t hugetlb_usage;
> #endif
> struct work_struct async_put_work;
> +
> +#ifdef CONFIG_PCI_PASID

Non-PCI devices can also use a PASID (e.g. Arm's SubstreamID). How about
CONFIG_IOMMU_SUPPORT?

Thanks,
Jean

> + unsigned int pasid;
> +#endif

2020-06-16 15:16:03

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 08/12] mm: Define pasid in mm

Hi, Jean,

On Tue, Jun 16, 2020 at 10:28:19AM +0200, Jean-Philippe Brucker wrote:
> On Fri, Jun 12, 2020 at 05:41:29PM -0700, Fenghua Yu wrote:
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 64ede5f150dc..5778db3aa42d 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -538,6 +538,10 @@ struct mm_struct {
> > atomic_long_t hugetlb_usage;
> > #endif
> > struct work_struct async_put_work;
> > +
> > +#ifdef CONFIG_PCI_PASID
>
> Non-PCI devices can also use a PASID (e.g. Arm's SubstreamID). How about
> CONFIG_IOMMU_SUPPORT?

Sure. I will change it to CONFIG_IOMMU_SUPPORT.

Thanks.

-Fenghua

2020-06-16 23:28:07

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

Hi, Peter,

On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
>
> > Or do you suggest to add a random new flag in struct thread_info instead
> > of a TIF flag?
>
> Why thread_info? What's wrong with something simple like the below. It
> takes a bit from the 'strictly current' flags word.
>
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index b62e6aaf28f0..fca830b97055 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -801,6 +801,9 @@ struct task_struct {
> /* Stalled due to lack of memory */
> unsigned in_memstall:1;
> #endif
> +#ifdef CONFIG_PCI_PASID
> + unsigned has_valid_pasid:1;
> +#endif
>
> unsigned long atomic_flags; /* Flags requiring atomic access. */
>
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 142b23645d82..10b3891be99e 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> tsk->use_memdelay = 0;
> #endif
>
> +#ifdef CONFIG_PCI_PASID
> + tsk->has_valid_pasid = 0;
> +#endif
> +
> #ifdef CONFIG_MEMCG
> tsk->active_memcg = NULL;
> #endif

Can I add "Signed-off-by: Peter Zijlstra <[email protected]>"
to this patch? I will send this patch in the next version of the series.

Thanks.

-Fenghua

2020-06-17 08:34:54

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 12/12] x86/traps: Fix up invalid PASID

On Tue, Jun 16, 2020 at 04:23:46PM -0700, Fenghua Yu wrote:
> Hi, Peter,
>
> On Mon, Jun 15, 2020 at 09:09:28PM +0200, Peter Zijlstra wrote:
> > On Mon, Jun 15, 2020 at 11:55:29AM -0700, Fenghua Yu wrote:
> >
> > > Or do you suggest to add a random new flag in struct thread_info instead
> > > of a TIF flag?
> >
> > Why thread_info? What's wrong with something simple like the below. It
> > takes a bit from the 'strictly current' flags word.
> >
> >
> > diff --git a/include/linux/sched.h b/include/linux/sched.h
> > index b62e6aaf28f0..fca830b97055 100644
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -801,6 +801,9 @@ struct task_struct {
> > /* Stalled due to lack of memory */
> > unsigned in_memstall:1;
> > #endif
> > +#ifdef CONFIG_PCI_PASID
> > + unsigned has_valid_pasid:1;
> > +#endif
> >
> > unsigned long atomic_flags; /* Flags requiring atomic access. */
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index 142b23645d82..10b3891be99e 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -955,6 +955,10 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
> > tsk->use_memdelay = 0;
> > #endif
> >
> > +#ifdef CONFIG_PCI_PASID
> > + tsk->has_valid_pasid = 0;
> > +#endif
> > +
> > #ifdef CONFIG_MEMCG
> > tsk->active_memcg = NULL;
> > #endif
>
> Can I add "Signed-off-by: Peter Zijlstra <[email protected]>"
> to this patch? I will send this patch in the next version of the series.

Sure, n/p.

2020-06-18 08:12:11

by Frederic Barrat

[permalink] [raw]
Subject: Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int



Le 13/06/2020 à 02:41, Fenghua Yu a écrit :
> PASID is defined as "int" although it's a 20-bit value and shouldn't be
> negative int. To be consistent with type defined in iommu, define PASID
> as "unsigned int".


It looks like this patch was considered because of the use of 'pasid' in
variable or function names. The ocxl driver only makes sense on powerpc
and shouldn't compile on anything else, so it's probably useless in the
context of that series.
The pasid here is defined by the opencapi specification
(https://opencapi.org), it is borrowed from the PCI world and you could
argue it could be an unsigned int. But then I think the patch doesn't go
far enough. But considering it's not used on x86, I think this patch can
be dropped.

Fred



> Suggested-by: Thomas Gleixner <[email protected]>
> Signed-off-by: Fenghua Yu <[email protected]>
> Reviewed-by: Tony Luck <[email protected]>
> ---
> v2:
> - Create this new patch to define PASID as "unsigned int" consistently in
> ocxl (Thomas)
>
> drivers/misc/ocxl/config.c | 3 ++-
> drivers/misc/ocxl/link.c | 6 +++---
> drivers/misc/ocxl/ocxl_internal.h | 6 +++---
> drivers/misc/ocxl/pasid.c | 2 +-
> drivers/misc/ocxl/trace.h | 20 ++++++++++----------
> include/misc/ocxl.h | 6 +++---
> 6 files changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/misc/ocxl/config.c b/drivers/misc/ocxl/config.c
> index c8e19bfb5ef9..22d034caed3d 100644
> --- a/drivers/misc/ocxl/config.c
> +++ b/drivers/misc/ocxl/config.c
> @@ -806,7 +806,8 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec)
> }
> EXPORT_SYMBOL_GPL(ocxl_config_set_TL);
>
> -int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control, int pasid)
> +int ocxl_config_terminate_pasid(struct pci_dev *dev, int afu_control,
> + unsigned int pasid)
> {
> u32 val;
> unsigned long timeout;
> diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
> index 58d111afd9f6..931f6ae022db 100644
> --- a/drivers/misc/ocxl/link.c
> +++ b/drivers/misc/ocxl/link.c
> @@ -492,7 +492,7 @@ static u64 calculate_cfg_state(bool kernel)
> return state;
> }
>
> -int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> +int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
> u64 amr, struct mm_struct *mm,
> void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
> void *xsl_err_data)
> @@ -572,7 +572,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> }
> EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
>
> -int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
> +int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid)
> {
> struct ocxl_link *link = (struct ocxl_link *) link_handle;
> struct spa *spa = link->spa;
> @@ -608,7 +608,7 @@ int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
> return rc;
> }
>
> -int ocxl_link_remove_pe(void *link_handle, int pasid)
> +int ocxl_link_remove_pe(void *link_handle, unsigned int pasid)
> {
> struct ocxl_link *link = (struct ocxl_link *) link_handle;
> struct spa *spa = link->spa;
> diff --git a/drivers/misc/ocxl/ocxl_internal.h b/drivers/misc/ocxl/ocxl_internal.h
> index 345bf843a38e..3ca982ba7472 100644
> --- a/drivers/misc/ocxl/ocxl_internal.h
> +++ b/drivers/misc/ocxl/ocxl_internal.h
> @@ -41,7 +41,7 @@ struct ocxl_afu {
> struct ocxl_afu_config config;
> int pasid_base;
> int pasid_count; /* opened contexts */
> - int pasid_max; /* maximum number of contexts */
> + unsigned int pasid_max; /* maximum number of contexts */
> int actag_base;
> int actag_enabled;
> struct mutex contexts_lock;
> @@ -69,7 +69,7 @@ struct ocxl_xsl_error {
>
> struct ocxl_context {
> struct ocxl_afu *afu;
> - int pasid;
> + unsigned int pasid;
> struct mutex status_mutex;
> enum ocxl_context_status status;
> struct address_space *mapping;
> @@ -128,7 +128,7 @@ int ocxl_config_check_afu_index(struct pci_dev *dev,
> * pasid: the PASID for the AFU context
> * tid: the new thread id for the process element
> */
> -int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid);
> +int ocxl_link_update_pe(void *link_handle, unsigned int pasid, __u16 tid);
>
> int ocxl_context_mmap(struct ocxl_context *ctx,
> struct vm_area_struct *vma);
> diff --git a/drivers/misc/ocxl/pasid.c b/drivers/misc/ocxl/pasid.c
> index d14cb56e6920..a151fc8f0bec 100644
> --- a/drivers/misc/ocxl/pasid.c
> +++ b/drivers/misc/ocxl/pasid.c
> @@ -80,7 +80,7 @@ static void range_free(struct list_head *head, u32 start, u32 size,
>
> int ocxl_pasid_afu_alloc(struct ocxl_fn *fn, u32 size)
> {
> - int max_pasid;
> + unsigned int max_pasid;
>
> if (fn->config.max_pasid_log < 0)
> return -ENOSPC;
> diff --git a/drivers/misc/ocxl/trace.h b/drivers/misc/ocxl/trace.h
> index 17e21cb2addd..019e2fc63b1d 100644
> --- a/drivers/misc/ocxl/trace.h
> +++ b/drivers/misc/ocxl/trace.h
> @@ -9,13 +9,13 @@
> #include <linux/tracepoint.h>
>
> DECLARE_EVENT_CLASS(ocxl_context,
> - TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
> + TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
> TP_ARGS(pid, spa, pasid, pidr, tidr),
>
> TP_STRUCT__entry(
> __field(pid_t, pid)
> __field(void*, spa)
> - __field(int, pasid)
> + __field(unsigned int, pasid)
> __field(u32, pidr)
> __field(u32, tidr)
> ),
> @@ -38,21 +38,21 @@ DECLARE_EVENT_CLASS(ocxl_context,
> );
>
> DEFINE_EVENT(ocxl_context, ocxl_context_add,
> - TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
> + TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
> TP_ARGS(pid, spa, pasid, pidr, tidr)
> );
>
> DEFINE_EVENT(ocxl_context, ocxl_context_remove,
> - TP_PROTO(pid_t pid, void *spa, int pasid, u32 pidr, u32 tidr),
> + TP_PROTO(pid_t pid, void *spa, unsigned int pasid, u32 pidr, u32 tidr),
> TP_ARGS(pid, spa, pasid, pidr, tidr)
> );
>
> TRACE_EVENT(ocxl_terminate_pasid,
> - TP_PROTO(int pasid, int rc),
> + TP_PROTO(unsigned int pasid, int rc),
> TP_ARGS(pasid, rc),
>
> TP_STRUCT__entry(
> - __field(int, pasid)
> + __field(unsigned int, pasid)
> __field(int, rc)
> ),
>
> @@ -107,11 +107,11 @@ DEFINE_EVENT(ocxl_fault_handler, ocxl_fault_ack,
> );
>
> TRACE_EVENT(ocxl_afu_irq_alloc,
> - TP_PROTO(int pasid, int irq_id, unsigned int virq, int hw_irq),
> + TP_PROTO(unsigned int pasid, int irq_id, unsigned int virq, int hw_irq),
> TP_ARGS(pasid, irq_id, virq, hw_irq),
>
> TP_STRUCT__entry(
> - __field(int, pasid)
> + __field(unsigned int, pasid)
> __field(int, irq_id)
> __field(unsigned int, virq)
> __field(int, hw_irq)
> @@ -133,11 +133,11 @@ TRACE_EVENT(ocxl_afu_irq_alloc,
> );
>
> TRACE_EVENT(ocxl_afu_irq_free,
> - TP_PROTO(int pasid, int irq_id),
> + TP_PROTO(unsigned int pasid, int irq_id),
> TP_ARGS(pasid, irq_id),
>
> TP_STRUCT__entry(
> - __field(int, pasid)
> + __field(unsigned int, pasid)
> __field(int, irq_id)
> ),
>
> diff --git a/include/misc/ocxl.h b/include/misc/ocxl.h
> index 06dd5839e438..5eca04c8da97 100644
> --- a/include/misc/ocxl.h
> +++ b/include/misc/ocxl.h
> @@ -429,7 +429,7 @@ int ocxl_config_set_TL(struct pci_dev *dev, int tl_dvsec);
> * desired AFU. It can be found in the AFU configuration
> */
> int ocxl_config_terminate_pasid(struct pci_dev *dev,
> - int afu_control_offset, int pasid);
> + int afu_control_offset, unsigned int pasid);
>
> /*
> * Read the configuration space of a function and fill in a
> @@ -466,7 +466,7 @@ void ocxl_link_release(struct pci_dev *dev, void *link_handle);
> * 'xsl_err_data' is an argument passed to the above callback, if
> * defined
> */
> -int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> +int ocxl_link_add_pe(void *link_handle, unsigned int pasid, u32 pidr, u32 tidr,
> u64 amr, struct mm_struct *mm,
> void (*xsl_err_cb)(void *data, u64 addr, u64 dsisr),
> void *xsl_err_data);
> @@ -474,7 +474,7 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 pidr, u32 tidr,
> /*
> * Remove a Process Element from the Shared Process Area for a link
> */
> -int ocxl_link_remove_pe(void *link_handle, int pasid);
> +int ocxl_link_remove_pe(void *link_handle, unsigned int pasid);
>
> /*
> * Allocate an AFU interrupt associated to the link.
>

2020-06-18 15:42:25

by Fenghua Yu

[permalink] [raw]
Subject: Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int

Hi, Frederic,

On Thu, Jun 18, 2020 at 10:05:19AM +0200, Frederic Barrat wrote:
>
>
> Le 13/06/2020 ? 02:41, Fenghua Yu a ?crit?:
> >PASID is defined as "int" although it's a 20-bit value and shouldn't be
> >negative int. To be consistent with type defined in iommu, define PASID
> >as "unsigned int".
>
>
> It looks like this patch was considered because of the use of 'pasid' in
> variable or function names. The ocxl driver only makes sense on powerpc and
> shouldn't compile on anything else, so it's probably useless in the context
> of that series.
> The pasid here is defined by the opencapi specification
> (https://opencapi.org), it is borrowed from the PCI world and you could
> argue it could be an unsigned int. But then I think the patch doesn't go far
> enough. But considering it's not used on x86, I think this patch can be
> dropped.

The first 3 patches clean up pasid and flag defitions to prepare for
following patches.

If you think this patch can be dropped, we will drop it.

Thanks.

-Fenghua

2020-06-18 17:01:13

by Frederic Barrat

[permalink] [raw]
Subject: Re: [PATCH v2 02/12] ocxl: Change type of pasid to unsigned int



Le 18/06/2020 à 17:37, Fenghua Yu a écrit :
> The first 3 patches clean up pasid and flag defitions to prepare for
> following patches.
>
> If you think this patch can be dropped, we will drop it.

Yes, I think that's the case.

Thanks,

Fred