2019-10-25 19:00:32

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 00/11] Nested Shared Virtual Address (SVA) VT-d support

Shared virtual address (SVA), a.k.a, Shared virtual memory (SVM) on Intel
platforms allow address space sharing between device DMA and applications.
SVA can reduce programming complexity and enhance security.
This series is intended to enable SVA virtualization, i.e. enable use of SVA
within a guest user application.

Only IOMMU portion of the changes are included in this series. Additional
support is needed in VFIO and QEMU (will be submitted separately) to complete
this functionality.

To make incremental changes and reduce the size of each patchset. This series
does not inlcude support for page request services.

In VT-d implementation, PASID table is per device and maintained in the host.
Guest PASID table is shadowed in VMM where virtual IOMMU is emulated.

.-------------. .---------------------------.
| vIOMMU | | Guest process CR3, FL only|
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------' |
| | V
| | CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .----------------------.
| pIOMMU | | Bind FL for GVA-GPA |
| | '----------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.------------------------------.
| | |SL for GPA-HPA, default domain|
| | '------------------------------'
'-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables

This is the remaining VT-d only portion of V5 since the uAPIs and IOASID common
code have been applied to Joerg's IOMMU core branch.
(https://lkml.org/lkml/2019/10/2/833)

The complete set with VFIO patches are here:
https://github.com/jacobpan/linux.git:siov_sva

The complete nested SVA upstream patches are divided into three phases:
1. Common APIs and PCI device direct assignment
2. Page Request Services (PRS) support
3. Mediated device assignment

With this set and the accompanied VFIO code, we will achieve phase #1.

Thanks,

Jacob

ChangeLog:
- V7
- Respect vIOMMU PASID range in virtual command PASID/IOASID allocator
- Caching virtual command capabilities to avoid runtime checks that
could cause vmexits.

- V6
- Rebased on top of Joerg's core branch
(git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git core)
- Adapt to new uAPIs and IOASID allocators

- V5
Rebased on v5.3-rc4 which has some of the IOMMU fault APIs merged.
Addressed v4 review comments from Eric Auger, Baolu Lu, and
Jonathan Cameron. Specific changes are as follows:
- Refined custom IOASID allocator to support multiple vIOMMU, hotplug
cases.
- Extracted vendor data from IOMMU guest PASID bind data, for VT-d
will support all necessary guest PASID entry fields for PASID
bind.
- Support non-identity host-guest PASID mapping
- Exception handling in various cases

- V4
- Redesigned IOASID allocator such that it can support custom
allocators with shared helper functions. Use separate XArray
to store IOASIDs per allocator. Took advice from Eric Auger to
have default allocator use the generic allocator structure.
Combined into one patch in that the default allocator is just
"another" allocator now. Can be built as a module in case of
driver use without IOMMU.
- Extended bind guest PASID data to support SMMU and non-identity
guest to host PASID mapping https://lkml.org/lkml/2019/5/21/802
- Rebased on Jean's sva/api common tree, new patches starts with
[PATCH v4 10/22]

- V3
- Addressed thorough review comments from Eric Auger (Thank you!)
- Moved IOASID allocator from driver core to IOMMU code per
suggestion by Christoph Hellwig
(https://lkml.org/lkml/2019/4/26/462)
- Rebased on top of Jean's SVA API branch and Eric's v7[1]
(git://linux-arm.org/linux-jpb.git sva/api)
- All IOMMU APIs are unmodified (except the new bind guest PASID
call in patch 9/16)

- V2
- Rebased on Joerg's IOMMU x86/vt-d branch v5.1-rc4
- Integrated with Eric Auger's new v7 series for common APIs
(https://github.com/eauger/linux/tree/v5.1-rc3-2stage-v7)
- Addressed review comments from Andy Shevchenko and Alex Williamson on
IOASID custom allocator.
- Support multiple custom IOASID allocators (vIOMMUs) and dynamic
registration.

Jacob Pan (10):
iommu/vt-d: Cache virtual command capability register
iommu/vt-d: Add custom allocator for IOASID
iommu/vt-d: Replace Intel specific PASID allocator with IOASID
iommu/vt-d: Move domain helper to header
iommu/vt-d: Avoid duplicated code for PASID setup
iommu/vt-d: Add nested translation helper function
iommu/vt-d: Misc macro clean up for SVM
iommu/vt-d: Add bind guest PASID support
iommu/vt-d: Support flushing more translation cache types
iommu/vt-d: Add svm/sva invalidate function

Lu Baolu (1):
iommu/vt-d: Enlightened PASID allocation

drivers/iommu/Kconfig | 1 +
drivers/iommu/dmar.c | 47 +++++++
drivers/iommu/intel-iommu.c | 259 ++++++++++++++++++++++++++++++++--
drivers/iommu/intel-pasid.c | 332 ++++++++++++++++++++++++++++++++++++--------
drivers/iommu/intel-pasid.h | 25 +++-
drivers/iommu/intel-svm.c | 298 +++++++++++++++++++++++++++++++--------
include/linux/intel-iommu.h | 43 +++++-
include/linux/intel-svm.h | 17 +++
8 files changed, 890 insertions(+), 132 deletions(-)

--
2.7.4


2019-10-25 19:00:49

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
With PASID granular translation type set to 0x11b, translation
result from the first level(FL) also subject to a second level(SL)
page table translation. This mode is used for SVA virtualization,
where FL performs guest virtual to guest physical translation and
SL performs guest physical to host physical translation.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Liu, Yi L <[email protected]>
---
drivers/iommu/intel-pasid.c | 207 ++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 12 +++
2 files changed, 219 insertions(+)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index ffbd416ed3b8..f846a907cfcf 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
}

+/*
+ * Setup the Extended Memory Type(EMT) field (Bits 91-93)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emt(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
+}
+
+/*
+ * Setup the Page Attribute Table (PAT) field (Bits 96-127)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pat(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);
+}
+
+/*
+ * Setup the Cache Disable (CD) field (Bit 89)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_cd(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[1], 1 << 25, 1);
+}
+
+/*
+ * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_emte(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[1], 1 << 26, 1);
+}
+
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_eafe(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[2], 1 << 7, 1);
+}
+
+/*
+ * Setup the Page-level Cache Disable (PCD) field (Bit 95)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pcd(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[1], 1 << 31, 1);
+}
+
+/*
+ * Setup the Page-level Write-Through (PWT)) field (Bit 94)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_pwt(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[1], 1 << 30, 1);
+}
+
static void
pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
u16 did, int pasid)
@@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,

return 0;
}
+
+static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
+ struct pasid_entry *pte,
+ struct iommu_gpasid_bind_data_vtd *pasid_data)
+{
+ /*
+ * Not all guest PASID table entry fields are passed down during bind,
+ * here we only set up the ones that are dependent on guest settings.
+ * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
+ * therefore not set. Other fields, such as snoop related, are set based
+ * on host needs regardless of guest settings.
+ */
+ if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
+ if (!ecap_srs(iommu->ecap)) {
+ pr_err("No supervisor request support on %s\n",
+ iommu->name);
+ return -EINVAL;
+ }
+ pasid_set_sre(pte);
+ }
+
+ if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && ecap_eafs(iommu->ecap))
+ pasid_set_eafe(pte);
+
+ if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
+ pasid_set_emte(pte);
+ pasid_set_emt(pte, pasid_data->emt);
+ }
+
+ /*
+ * Memory type is only applicable to devices inside processor coherent
+ * domain. PCIe devices are not included. We can skip the rest of the
+ * flags if IOMMU does not support MTS.
+ */
+ if (!ecap_mts(iommu->ecap)) {
+ pr_info("%s does not support memory type bind guest PASID\n",
+ iommu->name);
+ return 0;
+ }
+
+ if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
+ pasid_set_pcd(pte);
+ if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
+ pasid_set_pwt(pte);
+ if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
+ pasid_set_cd(pte);
+ pasid_set_pat(pte, pasid_data->pat);
+
+ return 0;
+
+}
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation
+ * which is used for vSVA. The first level page tables are used for
+ * GVA-GPA translation in the guest, second level page tables are used
+ * for GPA to HPA translation.
+ *
+ * @iommu: Iommu which the device belong to
+ * @dev: Device to be set up for translation
+ * @gpgd: FLPTPTR: First Level Page translation pointer in GPA
+ * @pasid: PASID to be programmed in the device PASID table
+ * @pasid_data: Additional PASID info from the guest bind request
+ * @domain: Domain info for setting up second level page tables
+ * @addr_width: Address width of the first level (guest)
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+ struct device *dev, pgd_t *gpgd,
+ int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
+ struct dmar_domain *domain,
+ int addr_width)
+{
+ struct pasid_entry *pte;
+ struct dma_pte *pgd;
+ u64 pgd_val;
+ int agaw;
+ u16 did;
+
+ if (!ecap_nest(iommu->ecap)) {
+ pr_err("IOMMU: %s: No nested translation support\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (WARN_ON(!pte))
+ return -EINVAL;
+
+ pasid_clear_entry(pte);
+
+ /* Sanity checking performed by caller to make sure address
+ * width matching in two dimensions:
+ * 1. CPU vs. IOMMU
+ * 2. Guest vs. Host.
+ */
+ switch (addr_width) {
+ case 57:
+ pasid_set_flpm(pte, 1);
+ break;
+ case 48:
+ pasid_set_flpm(pte, 0);
+ break;
+ default:
+ dev_err(dev, "Invalid paging mode %d\n", addr_width);
+ return -EINVAL;
+ }
+
+ pasid_set_flptr(pte, (u64)gpgd);
+
+ intel_pasid_setup_bind_data(iommu, pte, pasid_data);
+
+ /* Setup the second level based on the given domain */
+ pgd = domain->pgd;
+
+ for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd)) {
+ dev_err(dev, "Invalid domain page table\n");
+ return -EINVAL;
+ }
+ }
+ pgd_val = virt_to_phys(pgd);
+ pasid_set_slptr(pte, pgd_val);
+ pasid_set_fault_enable(pte);
+
+ did = domain->iommu_did[iommu->seq_id];
+ pasid_set_domain_id(pte, did);
+
+ pasid_set_address_width(pte, agaw);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+ pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+ pasid_set_present(pte);
+ pasid_flush_caches(iommu, pte, pasid, did);
+
+ return 0;
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index e413e884e685..09c85db73b77 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -46,6 +46,7 @@
* to vmalloc or even module mappings.
*/
#define PASID_FLAG_SUPERVISOR_MODE BIT(0)
+#define PASID_FLAG_NESTED BIT(1)

struct pasid_dir_entry {
u64 val;
@@ -55,6 +56,11 @@ struct pasid_entry {
u64 val[8];
};

+#define PASID_ENTRY_PGTT_FL_ONLY (1)
+#define PASID_ENTRY_PGTT_SL_ONLY (2)
+#define PASID_ENTRY_PGTT_NESTED (3)
+#define PASID_ENTRY_PGTT_PT (4)
+
/* The representative of a PASID table */
struct pasid_table {
void *table; /* pasid table pointer */
@@ -103,6 +109,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, int pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu,
+ struct device *dev, pgd_t *pgd,
+ int pasid,
+ struct iommu_gpasid_bind_data_vtd *pasid_data,
+ struct dmar_domain *domain,
+ int addr_width);
void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
struct device *dev, int pasid);
int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
--
2.7.4

2019-10-25 19:01:09

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 05/11] iommu/vt-d: Move domain helper to header

Move domain helper to header to be used by SVA code.

Signed-off-by: Jacob Pan <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
drivers/iommu/intel-iommu.c | 6 ------
include/linux/intel-iommu.h | 6 ++++++
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 2ea09b988a23..acd1ac787d8b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -428,12 +428,6 @@ static void init_translation_status(struct intel_iommu *iommu)
iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
}

-/* Convert generic 'struct iommu_domain to private struct dmar_domain */
-static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
-{
- return container_of(dom, struct dmar_domain, domain);
-}
-
static int __init intel_iommu_setup(char *str)
{
if (!str)
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index c624733cb2e6..3dba6ad3e9ad 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -594,6 +594,12 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
}

+/* Convert generic struct iommu_domain to private struct dmar_domain */
+static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
+{
+ return container_of(dom, struct dmar_domain, domain);
+}
+
/*
* 0: readable
* 1: writable
--
2.7.4

2019-10-25 19:01:13

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

After each setup for PASID entry, related translation caches must be flushed.
We can combine duplicated code into one function which is less error prone.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
1 file changed, 18 insertions(+), 30 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index e79d680fe300..ffbd416ed3b8 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
devtlb_invalidation_with_pasid(iommu, dev, pasid);
}

+static void pasid_flush_caches(struct intel_iommu *iommu,
+ struct pasid_entry *pte,
+ int pasid, u16 did)
+{
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
+ if (cap_caching_mode(iommu->cap)) {
+ pasid_cache_invalidation_with_pasid(iommu, did, pasid);
+ iotlb_invalidation_with_pasid(iommu, did, pasid);
+ } else {
+ iommu_flush_write_buffer(iommu);
+ }
+}
+
/*
* Set up the scalable mode pasid table entry for first only
* translation type.
@@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
/* Setup Present and PASID Granular Transfer Type: */
pasid_set_translation_type(pte, 1);
pasid_set_present(pte);
-
- if (!ecap_coherent(iommu->ecap))
- clflush_cache_range(pte, sizeof(*pte));
-
- if (cap_caching_mode(iommu->cap)) {
- pasid_cache_invalidation_with_pasid(iommu, did, pasid);
- iotlb_invalidation_with_pasid(iommu, did, pasid);
- } else {
- iommu_flush_write_buffer(iommu);
- }
+ pasid_flush_caches(iommu, pte, pasid, did);

return 0;
}
@@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
*/
pasid_set_sre(pte);
pasid_set_present(pte);
-
- if (!ecap_coherent(iommu->ecap))
- clflush_cache_range(pte, sizeof(*pte));
-
- if (cap_caching_mode(iommu->cap)) {
- pasid_cache_invalidation_with_pasid(iommu, did, pasid);
- iotlb_invalidation_with_pasid(iommu, did, pasid);
- } else {
- iommu_flush_write_buffer(iommu);
- }
+ pasid_flush_caches(iommu, pte, pasid, did);

return 0;
}
@@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
*/
pasid_set_sre(pte);
pasid_set_present(pte);
-
- if (!ecap_coherent(iommu->ecap))
- clflush_cache_range(pte, sizeof(*pte));
-
- if (cap_caching_mode(iommu->cap)) {
- pasid_cache_invalidation_with_pasid(iommu, did, pasid);
- iotlb_invalidation_with_pasid(iommu, did, pasid);
- } else {
- iommu_flush_write_buffer(iommu);
- }
+ pasid_flush_caches(iommu, pte, pasid, did);

return 0;
}
--
2.7.4

2019-10-25 19:01:19

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

Make use of generic IOASID code to manage PASID allocation,
free, and lookup. Replace Intel specific code.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/intel-iommu.c | 12 ++++++------
drivers/iommu/intel-pasid.c | 36 ------------------------------------
drivers/iommu/intel-svm.c | 39 +++++++++++++++++++++++----------------
3 files changed, 29 insertions(+), 58 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index ced1d89ef977..2ea09b988a23 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
domain->auxd_refcnt--;

if (!domain->auxd_refcnt && domain->default_pasid > 0)
- intel_pasid_free_id(domain->default_pasid);
+ ioasid_free(domain->default_pasid);
}

static int aux_domain_add_dev(struct dmar_domain *domain,
@@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
if (domain->default_pasid <= 0) {
int pasid;

- pasid = intel_pasid_alloc_id(domain, PASID_MIN,
- pci_max_pasids(to_pci_dev(dev)),
- GFP_KERNEL);
- if (pasid <= 0) {
+ /* No private data needed for the default pasid */
+ pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
+ NULL);
+ if (pasid == INVALID_IOASID) {
pr_err("Can't allocate default pasid\n");
return -ENODEV;
}
@@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
spin_unlock(&iommu->lock);
spin_unlock_irqrestore(&device_domain_lock, flags);
if (!domain->auxd_refcnt && domain->default_pasid > 0)
- intel_pasid_free_id(domain->default_pasid);
+ ioasid_free(domain->default_pasid);

return ret;
}
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index d81e857d2b25..e79d680fe300 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -26,42 +26,6 @@
*/
static DEFINE_SPINLOCK(pasid_lock);
u32 intel_pasid_max_id = PASID_MAX;
-static DEFINE_IDR(pasid_idr);
-
-int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
-{
- int ret, min, max;
-
- min = max_t(int, start, PASID_MIN);
- max = min_t(int, end, intel_pasid_max_id);
-
- WARN_ON(in_interrupt());
- idr_preload(gfp);
- spin_lock(&pasid_lock);
- ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
- spin_unlock(&pasid_lock);
- idr_preload_end();
-
- return ret;
-}
-
-void intel_pasid_free_id(int pasid)
-{
- spin_lock(&pasid_lock);
- idr_remove(&pasid_idr, pasid);
- spin_unlock(&pasid_lock);
-}
-
-void *intel_pasid_lookup_id(int pasid)
-{
- void *p;
-
- spin_lock(&pasid_lock);
- p = idr_find(&pasid_idr, pasid);
- spin_unlock(&pasid_lock);
-
- return p;
-}

int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
{
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 9b159132405d..a9a7f85a09bc 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -17,6 +17,7 @@
#include <linux/dmar.h>
#include <linux/interrupt.h>
#include <linux/mm_types.h>
+#include <linux/ioasid.h>
#include <asm/page.h>

#include "intel-pasid.h"
@@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
if (pasid_max > intel_pasid_max_id)
pasid_max = intel_pasid_max_id;

- /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
- ret = intel_pasid_alloc_id(svm,
- !!cap_caching_mode(iommu->cap),
- pasid_max - 1, GFP_KERNEL);
- if (ret < 0) {
+ /* Do not use PASID 0, reserved for RID to PASID */
+ svm->pasid = ioasid_alloc(NULL, PASID_MIN,
+ pasid_max - 1, svm);
+ if (svm->pasid == INVALID_IOASID) {
kfree(svm);
kfree(sdev);
+ ret = ENOSPC;
goto out;
}
- svm->pasid = ret;
svm->notifier.ops = &intel_mmuops;
svm->mm = mm;
svm->flags = flags;
@@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
if (mm) {
ret = mmu_notifier_register(&svm->notifier, mm);
if (ret) {
- intel_pasid_free_id(svm->pasid);
+ ioasid_free(svm->pasid);
kfree(svm);
kfree(sdev);
goto out;
@@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
if (ret) {
if (mm)
mmu_notifier_unregister(&svm->notifier, mm);
- intel_pasid_free_id(svm->pasid);
+ ioasid_free(svm->pasid);
kfree(svm);
kfree(sdev);
goto out;
@@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
if (!iommu)
goto out;

- svm = intel_pasid_lookup_id(pasid);
+ svm = ioasid_find(NULL, pasid, NULL);
+ if (IS_ERR(svm)) {
+ ret = PTR_ERR(svm);
+ goto out;
+ }
+
if (!svm)
goto out;

@@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
kfree_rcu(sdev, rcu);

if (list_empty(&svm->devs)) {
- intel_pasid_free_id(svm->pasid);
+ /* Clear private data so that free pass check */
+ ioasid_set_data(svm->pasid, NULL);
+ ioasid_free(svm->pasid);
if (svm->mm)
mmu_notifier_unregister(&svm->notifier, svm->mm);

@@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
if (!iommu)
goto out;

- svm = intel_pasid_lookup_id(pasid);
- if (!svm)
+ svm = ioasid_find(NULL, pasid, NULL);
+ if (IS_ERR(svm)) {
+ ret = PTR_ERR(svm);
goto out;
-
+ }
/* init_mm is used in this case */
if (!svm->mm)
ret = 1;
@@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)

if (!svm || svm->pasid != req->pasid) {
rcu_read_lock();
- svm = intel_pasid_lookup_id(req->pasid);
+ svm = ioasid_find(NULL, req->pasid, NULL);
/* It *can't* go away, because the driver is not permitted
* to unbind the mm while any page faults are outstanding.
* So we only need RCU to protect the internal idr code. */
rcu_read_unlock();
-
- if (!svm) {
+ if (IS_ERR(svm) || !svm) {
pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
iommu->name, req->pasid, ((unsigned long long *)req)[0],
((unsigned long long *)req)[1]);
--
2.7.4

2019-10-25 19:02:55

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

When supporting guest SVA with emulated IOMMU, the guest PASID
table is shadowed in VMM. Updates to guest vIOMMU PASID table
will result in PASID cache flush which will be passed down to
the host as bind guest PASID calls.

For the SL page tables, it will be harvested from device's
default domain (request w/o PASID), or aux domain in case of
mediated device.

.-------------. .---------------------------.
| vIOMMU | | Guest process CR3, FL only|
| | '---------------------------'
.----------------/
| PASID Entry |--- PASID cache flush -
'-------------' |
| | V
| | CR3 in GPA
'-------------'
Guest
------| Shadow |--------------------------|--------
v v v
Host
.-------------. .----------------------.
| pIOMMU | | Bind FL for GVA-GPA |
| | '----------------------'
.----------------/ |
| PASID Entry | V (Nested xlate)
'----------------\.------------------------------.
| | |SL for GPA-HPA, default domain|
| | '------------------------------'
'-------------'
Where:
- FL = First level/stage one page tables
- SL = Second level/stage two page tables

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Liu, Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 4 +
drivers/iommu/intel-svm.c | 184 ++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 8 +-
include/linux/intel-svm.h | 17 ++++
4 files changed, 212 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index acd1ac787d8b..5fab32fbc4b4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops = {
.dev_disable_feat = intel_iommu_dev_disable_feat,
.is_attach_deferred = intel_iommu_is_attach_deferred,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
+#ifdef CONFIG_INTEL_IOMMU_SVM
+ .sva_bind_gpasid = intel_svm_bind_gpasid,
+ .sva_unbind_gpasid = intel_svm_unbind_gpasid,
+#endif
};

static void quirk_iommu_igfx(struct pci_dev *dev)
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a18b02a9709d..ae13a310cf96 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
list_for_each_entry(sdev, &svm->devs, list) \
if (dev == sdev->dev) \

+int intel_svm_bind_gpasid(struct iommu_domain *domain,
+ struct device *dev,
+ struct iommu_gpasid_bind_data *data)
+{
+ struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
+ struct dmar_domain *ddomain;
+ struct intel_svm_dev *sdev;
+ struct intel_svm *svm;
+ int ret = 0;
+
+ if (WARN_ON(!iommu) || !data)
+ return -EINVAL;
+
+ if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
+ data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+ return -EINVAL;
+
+ if (dev_is_pci(dev)) {
+ /* VT-d supports devices with full 20 bit PASIDs only */
+ if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
+ return -EINVAL;
+ }
+
+ /*
+ * We only check host PASID range, we have no knowledge to check
+ * guest PASID range nor do we use the guest PASID.
+ */
+ if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
+ return -EINVAL;
+
+ ddomain = to_dmar_domain(domain);
+ /* REVISIT:
+ * Sanity check adddress width and paging mode support
+ * width matching in two dimensions:
+ * 1. paging mode CPU <= IOMMU
+ * 2. address width Guest <= Host.
+ */
+ mutex_lock(&pasid_mutex);
+ svm = ioasid_find(NULL, data->hpasid, NULL);
+ if (IS_ERR(svm)) {
+ ret = PTR_ERR(svm);
+ goto out;
+ }
+ if (svm) {
+ /*
+ * If we found svm for the PASID, there must be at
+ * least one device bond, otherwise svm should be freed.
+ */
+ BUG_ON(list_empty(&svm->devs));
+
+ for_each_svm_dev(svm, dev) {
+ /* In case of multiple sub-devices of the same pdev assigned, we should
+ * allow multiple bind calls with the same PASID and pdev.
+ */
+ sdev->users++;
+ goto out;
+ }
+ } else {
+ /* We come here when PASID has never been bond to a device. */
+ svm = kzalloc(sizeof(*svm), GFP_KERNEL);
+ if (!svm) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ /* REVISIT: upper layer/VFIO can track host process that bind the PASID.
+ * ioasid_set = mm might be sufficient for vfio to check pasid VMM
+ * ownership.
+ */
+ svm->mm = get_task_mm(current);
+ svm->pasid = data->hpasid;
+ if (data->flags & IOMMU_SVA_GPASID_VAL) {
+ svm->gpasid = data->gpasid;
+ svm->flags |= SVM_FLAG_GUEST_PASID;
+ }
+ ioasid_set_data(data->hpasid, svm);
+ INIT_LIST_HEAD_RCU(&svm->devs);
+ INIT_LIST_HEAD(&svm->list);
+
+ mmput(svm->mm);
+ }
+ sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
+ if (!sdev) {
+ if (list_empty(&svm->devs))
+ kfree(svm);
+ ret = -ENOMEM;
+ goto out;
+ }
+ sdev->dev = dev;
+ sdev->users = 1;
+
+ /* Set up device context entry for PASID if not enabled already */
+ ret = intel_iommu_enable_pasid(iommu, sdev->dev);
+ if (ret) {
+ dev_err(dev, "Failed to enable PASID capability\n");
+ kfree(sdev);
+ goto out;
+ }
+
+ /*
+ * For guest bind, we need to set up PASID table entry as follows:
+ * - FLPM matches guest paging mode
+ * - turn on nested mode
+ * - SL guest address width matching
+ */
+ ret = intel_pasid_setup_nested(iommu,
+ dev,
+ (pgd_t *)data->gpgd,
+ data->hpasid,
+ &data->vtd,
+ ddomain,
+ data->addr_width);
+ if (ret) {
+ dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
+ data->hpasid, ret);
+ kfree(sdev);
+ goto out;
+ }
+ svm->flags |= SVM_FLAG_GUEST_MODE;
+
+ init_rcu_head(&sdev->rcu);
+ list_add_rcu(&sdev->list, &svm->devs);
+ out:
+ mutex_unlock(&pasid_mutex);
+ return ret;
+}
+
+int intel_svm_unbind_gpasid(struct device *dev, int pasid)
+{
+ struct intel_svm_dev *sdev;
+ struct intel_iommu *iommu;
+ struct intel_svm *svm;
+ int ret = -EINVAL;
+
+ mutex_lock(&pasid_mutex);
+ iommu = intel_svm_device_to_iommu(dev);
+ if (!iommu)
+ goto out;
+
+ svm = ioasid_find(NULL, pasid, NULL);
+ if (IS_ERR_OR_NULL(svm)) {
+ ret = PTR_ERR(svm);
+ goto out;
+ }
+
+ for_each_svm_dev(svm, dev) {
+ ret = 0;
+ sdev->users--;
+ if (!sdev->users) {
+ list_del_rcu(&sdev->list);
+ intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+ /* TODO: Drain in flight PRQ for the PASID since it
+ * may get reused soon, we don't want to
+ * confuse with its previous life.
+ * intel_svm_drain_prq(dev, pasid);
+ */
+ kfree_rcu(sdev, rcu);
+
+ if (list_empty(&svm->devs)) {
+ list_del(&svm->list);
+ kfree(svm);
+ /*
+ * We do not free PASID here until explicit call
+ * from VFIO to free. The PASID life cycle
+ * management is largely tied to VFIO management
+ * of assigned device life cycles. In case of
+ * guest exit without a explicit free PASID call,
+ * the responsibility lies in VFIO layer to free
+ * the PASIDs allocated for the guest.
+ * For security reasons, VFIO has to track the
+ * PASID ownership per guest anyway to ensure
+ * that PASID allocated by one guest cannot be
+ * used by another.
+ */
+ ioasid_set_data(pasid, NULL);
+ }
+ }
+ break;
+ }
+ out:
+ mutex_unlock(&pasid_mutex);
+
+ return ret;
+}
+
int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 3dba6ad3e9ad..6c74c71b1ebf 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
int intel_svm_init(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
-
+extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
+ struct device *dev, struct iommu_gpasid_bind_data *data);
+extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
struct svm_dev_ops;

struct intel_svm_dev {
@@ -690,9 +692,13 @@ struct intel_svm_dev {
struct intel_svm {
struct mmu_notifier notifier;
struct mm_struct *mm;
+
struct intel_iommu *iommu;
int flags;
int pasid;
+ int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
+ * to guest PASID mapping.
+ */
struct list_head devs;
struct list_head list;
};
diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
index 94f047a8a845..a2c189ad0b01 100644
--- a/include/linux/intel-svm.h
+++ b/include/linux/intel-svm.h
@@ -44,6 +44,23 @@ struct svm_dev_ops {
* do such IOTLB flushes automatically.
*/
#define SVM_FLAG_SUPERVISOR_MODE (1<<1)
+/*
+ * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
+ * In this case the mm_struct is in the guest kernel or userspace, its life
+ * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
+ * means to bind/unbind guest CR3 with PASIDs allocated for a device.
+ */
+#define SVM_FLAG_GUEST_MODE (1<<2)
+/*
+ * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
+ * which requires guest and host PASID translation at both directions. We keep
+ * track of guest PASID in order to provide lookup service to device drivers.
+ * One such example is a physical function (PF) driver that supports mediated
+ * device (mdev) assignment. Guest programming of mdev configuration space can
+ * only be done with guest PASID, therefore PF driver needs to find the matching
+ * host PASID to program the real hardware.
+ */
+#define SVM_FLAG_GUEST_PASID (1<<3)

#ifdef CONFIG_INTEL_IOMMU_SVM

--
2.7.4

2019-10-25 19:03:01

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register

Virtual command registers are used in the guest only, to prevent
vmexit cost, we cache the capability and store it during initialization.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/dmar.c | 1 +
include/linux/intel-iommu.h | 4 ++++
2 files changed, 5 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index eecd6a421667..49bb7d76e646 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -950,6 +950,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
warn_invalid_dmar(phys_addr, " returns all ones");
goto unmap;
}
+ iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);

/* the registers might be more than one page */
map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index ed11ef594378..2e1bed9b7eef 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -186,6 +186,9 @@
#define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
#define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */

+/* Virtual command interface capabilities */
+#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation */
+
/* IOTLB_REG */
#define DMA_TLB_FLUSH_GRANU_OFFSET 60
#define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
@@ -520,6 +523,7 @@ struct intel_iommu {
u64 reg_size; /* size of hw register set */
u64 cap;
u64 ecap;
+ u64 vccap;
u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
raw_spinlock_t register_lock; /* protect register handling */
int seq_id; /* sequence id of the iommu */
--
2.7.4

2019-10-25 19:03:14

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

When Shared Virtual Address (SVA) is enabled for a guest OS via
vIOMMU, we need to provide invalidation support at IOMMU API and driver
level. This patch adds Intel VT-d specific function to implement
iommu passdown invalidate API for shared virtual address.

The use case is for supporting caching structure invalidation
of assigned SVM capable devices. Emulated IOMMU exposes queue
invalidation capability and passes down all descriptors from the guest
to the physical IOMMU.

The assumption is that guest to host device ID mapping should be
resolved prior to calling IOMMU driver. Based on the device handle,
host IOMMU driver can replace certain fields before submit to the
invalidation queue.

Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Ashok Raj <[email protected]>
Signed-off-by: Liu, Yi L <[email protected]>
---
drivers/iommu/intel-iommu.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 170 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5fab32fbc4b4..a73e76d6457a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
aux_domain_remove_dev(to_dmar_domain(domain), dev);
}

+/*
+ * 2D array for converting and sanitizing IOMMU generic TLB granularity to
+ * VT-d granularity. Invalidation is typically included in the unmap operation
+ * as a result of DMA or VFIO unmap. However, for assigned device where guest
+ * could own the first level page tables without being shadowed by QEMU. In
+ * this case there is no pass down unmap to the host IOMMU as a result of unmap
+ * in the guest. Only invalidations are trapped and passed down.
+ * In all cases, only first level TLB invalidation (request with PASID) can be
+ * passed down, therefore we do not include IOTLB granularity for request
+ * without PASID (second level).
+ *
+ * For an example, to find the VT-d granularity encoding for IOTLB
+ * type and page selective granularity within PASID:
+ * X: indexed by iommu cache type
+ * Y: indexed by enum iommu_inv_granularity
+ * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
+ *
+ * Granu_map array indicates validity of the table. 1: valid, 0: invalid
+ *
+ */
+const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+ /* PASID based IOTLB, support PASID selective and page selective */
+ {0, 1, 1},
+ /* PASID based dev TLBs, only support all PASIDs or single PASID */
+ {1, 1, 0},
+ /* PASID cache */
+ {1, 1, 0}
+};
+
+const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
+ /* PASID based IOTLB */
+ {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
+ /* PASID based dev TLBs */
+ {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
+ /* PASID cache */
+ {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
+};
+
+static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
+{
+ if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
+ !inv_type_granu_map[type][granu])
+ return -EINVAL;
+
+ *vtd_granu = inv_type_granu_table[type][granu];
+
+ return 0;
+}
+
+static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
+{
+ u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
+
+ /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
+ * IOMMU cache invalidate API passes granu_size in bytes, and number of
+ * granu size in contiguous memory.
+ */
+ return order_base_2(nr_pages);
+}
+
+#ifdef CONFIG_INTEL_IOMMU_SVM
+static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
+ struct device *dev, struct iommu_cache_invalidate_info *inv_info)
+{
+ struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+ struct device_domain_info *info;
+ struct intel_iommu *iommu;
+ unsigned long flags;
+ int cache_type;
+ u8 bus, devfn;
+ u16 did, sid;
+ int ret = 0;
+ u64 size;
+
+ if (!inv_info || !dmar_domain ||
+ inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+ return -EINVAL;
+
+ if (!dev || !dev_is_pci(dev))
+ return -ENODEV;
+
+ iommu = device_to_iommu(dev, &bus, &devfn);
+ if (!iommu)
+ return -ENODEV;
+
+ spin_lock_irqsave(&device_domain_lock, flags);
+ spin_lock(&iommu->lock);
+ info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
+ if (!info) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+ did = dmar_domain->iommu_did[iommu->seq_id];
+ sid = PCI_DEVID(bus, devfn);
+ size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
+
+ for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
+ u64 granu = 0;
+ u64 pasid = 0;
+
+ ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
+ if (ret) {
+ pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
+ inv_info->granularity);
+ break;
+ }
+
+ /* PASID is stored in different locations based on granularity */
+ if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
+ pasid = inv_info->pasid_info.pasid;
+ else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
+ pasid = inv_info->addr_info.pasid;
+ else {
+ pr_err("Cannot find PASID for given cache type and granularity\n");
+ break;
+ }
+
+ switch (BIT(cache_type)) {
+ case IOMMU_CACHE_INV_TYPE_IOTLB:
+ if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
+ pr_err("Address out of range, 0x%llx, size order %llu\n",
+ inv_info->addr_info.addr, size);
+ ret = -ERANGE;
+ goto out_unlock;
+ }
+
+ qi_flush_piotlb(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
+ pasid, size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
+
+ /*
+ * Always flush device IOTLB if ATS is enabled since guest
+ * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
+ * down.
+ */
+ if (info->ats_enabled) {
+ qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+ pasid, info->ats_qdep,
+ inv_info->addr_info.addr, size,
+ granu);
+ }
+ break;
+ case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
+ if (info->ats_enabled) {
+ qi_flush_dev_piotlb(iommu, sid, info->pfsid,
+ inv_info->addr_info.pasid, info->ats_qdep,
+ inv_info->addr_info.addr, size,
+ granu);
+ } else
+ pr_warn("Passdown device IOTLB flush w/o ATS!\n");
+
+ break;
+ case IOMMU_CACHE_INV_TYPE_PASID:
+ qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid_info.pasid);
+
+ break;
+ default:
+ dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
+ cache_type);
+ ret = -EINVAL;
+ }
+ }
+out_unlock:
+ spin_unlock(&iommu->lock);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+
+ return ret;
+}
+#endif
+
static int intel_iommu_map(struct iommu_domain *domain,
unsigned long iova, phys_addr_t hpa,
size_t size, int iommu_prot)
@@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
.is_attach_deferred = intel_iommu_is_attach_deferred,
.pgsize_bitmap = INTEL_IOMMU_PGSIZES,
#ifdef CONFIG_INTEL_IOMMU_SVM
+ .cache_invalidate = intel_iommu_sva_invalidate,
.sva_bind_gpasid = intel_svm_bind_gpasid,
.sva_unbind_gpasid = intel_svm_unbind_gpasid,
#endif
--
2.7.4

2019-10-25 19:03:24

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
IOTLB invalidation may be passed down from outside IOMMU subsystems.
This patch adds invalidation functions that can be used for additional
translation cache types.

Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/dmar.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.c | 3 ++-
include/linux/intel-iommu.h | 21 +++++++++++++++++----
3 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 49bb7d76e646..0ce2d32ff99e 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
qi_submit_sync(&desc, iommu);
}

+/* PASID-based IOTLB Invalidate */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
+ unsigned int size_order, u64 granu, int ih)
+{
+ struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+ desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+ QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
+ desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
+ QI_EIOTLB_AM(size_order);
+
+ qi_submit_sync(&desc, iommu);
+}
+
void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask)
{
@@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
qi_submit_sync(&desc, iommu);
}

+/* PASID-based device IOTLB Invalidate */
+void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+ u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu)
+{
+ struct qi_desc desc;
+
+ desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
+ QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
+ QI_DEV_IOTLB_PFSID(pfsid);
+ desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
+
+ /* If S bit is 0, we only flush a single page. If S bit is set,
+ * The least significant zero bit indicates the invalidation address
+ * range. VT-d spec 6.5.2.6.
+ * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
+ */
+ if (!size_order) {
+ desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
+ } else {
+ unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
+ desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
+ }
+ qi_submit_sync(&desc, iommu);
+}
+
+void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
+{
+ struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
+
+ desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
+ qi_submit_sync(&desc, iommu);
+}
/*
* Disable Queued Invalidation interface.
*/
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index f846a907cfcf..6d7a701ef4d3 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
{
struct qi_desc desc;

- desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+ desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
+ QI_PC_PASID(pasid) | QI_PC_TYPE;
desc.qw1 = 0;
desc.qw2 = 0;
desc.qw3 = 0;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 6c74c71b1ebf..a25fb3a0ea5b 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -332,7 +332,7 @@ enum {
#define QI_IOTLB_GRAN(gran) (((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
#define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK)
#define QI_IOTLB_IH(ih) (((u64)ih) << 6)
-#define QI_IOTLB_AM(am) (((u8)am))
+#define QI_IOTLB_AM(am) (((u8)am) & 0x3f)

#define QI_CC_FM(fm) (((u64)fm) << 48)
#define QI_CC_SID(sid) (((u64)sid) << 32)
@@ -350,16 +350,21 @@ enum {
#define QI_PC_DID(did) (((u64)did) << 16)
#define QI_PC_GRAN(gran) (((u64)gran) << 4)

-#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
-#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
+/* PASID cache invalidation granu */
+#define QI_PC_ALL_PASIDS 0
+#define QI_PC_PASID_SEL 1

#define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
#define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
-#define QI_EIOTLB_AM(am) (((u64)am))
+#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
#define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
#define QI_EIOTLB_DID(did) (((u64)did) << 16)
#define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)

+/* QI Dev-IOTLB inv granu */
+#define QI_DEV_IOTLB_GRAN_ALL 1
+#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
+
#define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
#define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
#define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
@@ -655,8 +660,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
u8 fm, u64 type);
extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
unsigned int size_order, u64 type);
+extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
+ u32 pasid, unsigned int size_order, u64 type, int ih);
extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
u16 qdep, u64 addr, unsigned mask);
+
+extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
+ u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu);
+
+extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
+
extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);

extern int dmar_ir_support(void);
--
2.7.4

2019-10-25 19:03:35

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 08/11] iommu/vt-d: Misc macro clean up for SVM

Use combined macros for_each_svm_dev() to simplify SVM device iteration
and error checking.

Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
drivers/iommu/intel-svm.c | 89 ++++++++++++++++++++++-------------------------
1 file changed, 42 insertions(+), 47 deletions(-)

diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a9a7f85a09bc..a18b02a9709d 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -212,6 +212,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
static DEFINE_MUTEX(pasid_mutex);
static LIST_HEAD(global_svm_list);

+#define for_each_svm_dev(svm, dev) \
+ list_for_each_entry(sdev, &svm->devs, list) \
+ if (dev == sdev->dev) \
+
int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
@@ -257,15 +261,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
goto out;
}

- list_for_each_entry(sdev, &svm->devs, list) {
- if (dev == sdev->dev) {
- if (sdev->ops != ops) {
- ret = -EBUSY;
- goto out;
- }
- sdev->users++;
- goto success;
+ for_each_svm_dev(svm, dev) {
+ if (sdev->ops != ops) {
+ ret = -EBUSY;
+ goto out;
}
+ sdev->users++;
+ goto success;
}

break;
@@ -402,50 +404,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
goto out;

svm = ioasid_find(NULL, pasid, NULL);
- if (IS_ERR(svm)) {
+ if (IS_ERR_OR_NULL(svm)) {
ret = PTR_ERR(svm);
goto out;
}

- if (!svm)
- goto out;
-
- list_for_each_entry(sdev, &svm->devs, list) {
- if (dev == sdev->dev) {
- ret = 0;
- sdev->users--;
- if (!sdev->users) {
- list_del_rcu(&sdev->list);
- /* Flush the PASID cache and IOTLB for this device.
- * Note that we do depend on the hardware *not* using
- * the PASID any more. Just as we depend on other
- * devices never using PASIDs that they have no right
- * to use. We have a *shared* PASID table, because it's
- * large and has to be physically contiguous. So it's
- * hard to be as defensive as we might like. */
- intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
- intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
- kfree_rcu(sdev, rcu);
-
- if (list_empty(&svm->devs)) {
- /* Clear private data so that free pass check */
- ioasid_set_data(svm->pasid, NULL);
- ioasid_free(svm->pasid);
- if (svm->mm)
- mmu_notifier_unregister(&svm->notifier, svm->mm);
-
- list_del(&svm->list);
-
- /* We mandate that no page faults may be outstanding
- * for the PASID when intel_svm_unbind_mm() is called.
- * If that is not obeyed, subtle errors will happen.
- * Let's make them less subtle... */
- memset(svm, 0x6b, sizeof(*svm));
- kfree(svm);
- }
+ for_each_svm_dev(svm, dev) {
+ ret = 0;
+ sdev->users--;
+ if (!sdev->users) {
+ list_del_rcu(&sdev->list);
+ /* Flush the PASID cache and IOTLB for this device.
+ * Note that we do depend on the hardware *not* using
+ * the PASID any more. Just as we depend on other
+ * devices never using PASIDs that they have no right
+ * to use. We have a *shared* PASID table, because it's
+ * large and has to be physically contiguous. So it's
+ * hard to be as defensive as we might like. */
+ intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
+ intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
+ kfree_rcu(sdev, rcu);
+
+ if (list_empty(&svm->devs)) {
+ /* Clear private data so that free pass check */
+ ioasid_set_data(svm->pasid, NULL);
+ ioasid_free(svm->pasid);
+ if (svm->mm)
+ mmu_notifier_unregister(&svm->notifier, svm->mm);
+ list_del(&svm->list);
+ /* We mandate that no page faults may be outstanding
+ * for the PASID when intel_svm_unbind_mm() is called.
+ * If that is not obeyed, subtle errors will happen.
+ * Let's make them less subtle... */
+ memset(svm, 0x6b, sizeof(*svm));
+ kfree(svm);
}
- break;
}
+ break;
}
out:
mutex_unlock(&pasid_mutex);
@@ -581,7 +576,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
* to unbind the mm while any page faults are outstanding.
* So we only need RCU to protect the internal idr code. */
rcu_read_unlock();
- if (IS_ERR(svm) || !svm) {
+ if (IS_ERR_OR_NULL(svm)) {
pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
iommu->name, req->pasid, ((unsigned long long *)req)[0],
((unsigned long long *)req)[1]);
--
2.7.4

2019-10-25 19:03:40

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

From: Lu Baolu <[email protected]>

Enabling IOMMU in a guest requires communication with the host
driver for certain aspects. Use of PASID ID to enable Shared Virtual
Addressing (SVA) requires managing PASID's in the host. VT-d 3.0 spec
provides a Virtual Command Register (VCMD) to facilitate this.
Writes to this register in the guest are trapped by QEMU which
proxies the call to the host driver.

This virtual command interface consists of a capability register,
a virtual command register, and a virtual response register. Refer
to section 10.4.42, 10.4.43, 10.4.44 for more information.

This patch adds the enlightened PASID allocation/free interfaces
via the virtual command interface.

Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Signed-off-by: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
drivers/iommu/intel-pasid.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 13 ++++++++++-
include/linux/intel-iommu.h | 2 ++
3 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 040a445be300..d81e857d2b25 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
return p;
}

+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
+{
+ unsigned long flags;
+ u8 status_code;
+ int ret = 0;
+ u64 res;
+
+ raw_spin_lock_irqsave(&iommu->register_lock, flags);
+ dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
+ IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+ raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+ status_code = VCMD_VRSP_SC(res);
+ switch (status_code) {
+ case VCMD_VRSP_SC_SUCCESS:
+ *pasid = VCMD_VRSP_RESULT(res);
+ break;
+ case VCMD_VRSP_SC_NO_PASID_AVAIL:
+ pr_info("IOMMU: %s: No PASID available\n", iommu->name);
+ ret = -ENOMEM;
+ break;
+ default:
+ ret = -ENODEV;
+ pr_warn("IOMMU: %s: Unexpected error code %d\n",
+ iommu->name, status_code);
+ }
+
+ return ret;
+}
+
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
+{
+ unsigned long flags;
+ u8 status_code;
+ u64 res;
+
+ raw_spin_lock_irqsave(&iommu->register_lock, flags);
+ dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
+ IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
+ !(res & VCMD_VRSP_IP), res);
+ raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
+
+ status_code = VCMD_VRSP_SC(res);
+ switch (status_code) {
+ case VCMD_VRSP_SC_SUCCESS:
+ break;
+ case VCMD_VRSP_SC_INVALID_PASID:
+ pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
+ break;
+ default:
+ pr_warn("IOMMU: %s: Unexpected error code %d\n",
+ iommu->name, status_code);
+ }
+}
+
/*
* Per device pasid table management:
*/
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index fc8cd8f17de1..e413e884e685 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -23,6 +23,16 @@
#define is_pasid_enabled(entry) (((entry)->lo >> 3) & 0x1)
#define get_pasid_dir_size(entry) (1 << ((((entry)->lo >> 9) & 0x7) + 7))

+/* Virtual command interface for enlightened pasid management. */
+#define VCMD_CMD_ALLOC 0x1
+#define VCMD_CMD_FREE 0x2
+#define VCMD_VRSP_IP 0x1
+#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
+#define VCMD_VRSP_SC_SUCCESS 0
+#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
+#define VCMD_VRSP_SC_INVALID_PASID 1
+#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
+
/*
* Domain ID reserved for pasid entries programmed for first-level
* only and pass-through transfer modes.
@@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
struct device *dev, int pasid);
void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
struct device *dev, int pasid);
-
+int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
+void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
#endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2e1bed9b7eef..1d4b8dcdc5d8 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -161,6 +161,7 @@
#define ecap_smpwc(e) (((e) >> 48) & 0x1)
#define ecap_flts(e) (((e) >> 47) & 0x1)
#define ecap_slts(e) (((e) >> 46) & 0x1)
+#define ecap_vcs(e) (((e) >> 44) & 0x1)
#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
#define ecap_pasid(e) ((e >> 40) & 0x1)
@@ -282,6 +283,7 @@

/* PRS_REG */
#define DMA_PRS_PPR ((u32)1)
+#define DMA_VCS_PAS ((u64)1)

#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
do { \
--
2.7.4

2019-10-25 19:03:58

by Jacob Pan

[permalink] [raw]
Subject: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

When VT-d driver runs in the guest, PASID allocation must be
performed via virtual command interface. This patch registers a
custom IOASID allocator which takes precedence over the default
XArray based allocator. The resulting IOASID allocation will always
come from the host. This ensures that PASID namespace is system-
wide.

Signed-off-by: Lu Baolu <[email protected]>
Signed-off-by: Liu, Yi L <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
---
drivers/iommu/Kconfig | 1 +
drivers/iommu/intel-iommu.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
include/linux/intel-iommu.h | 2 ++
3 files changed, 70 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index fd50ddffffbf..961fe5795a90 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
bool "Support for Shared Virtual Memory with Intel IOMMU"
depends on INTEL_IOMMU && X86
select PCI_PASID
+ select IOASID
select MMU_NOTIFIER
help
Shared Virtual Memory (SVM) provides a facility for devices
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3f974919d3bd..ced1d89ef977 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
}
+ if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
+ ioasid_unregister_allocator(&iommu->pasid_allocator);
+
#endif
}

@@ -4910,6 +4913,44 @@ static int __init probe_acpi_namespace_devices(void)
return 0;
}

+#ifdef CONFIG_INTEL_IOMMU_SVM
+static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
+{
+ struct intel_iommu *iommu = data;
+ ioasid_t ioasid;
+
+ /*
+ * VT-d virtual command interface always uses the full 20 bit
+ * PASID range. Host can partition guest PASID range based on
+ * policies but it is out of guest's control.
+ */
+ if (min < PASID_MIN || max > intel_pasid_max_id)
+ return INVALID_IOASID;
+
+ if (vcmd_alloc_pasid(iommu, &ioasid))
+ return INVALID_IOASID;
+
+ return ioasid;
+}
+
+static void intel_ioasid_free(ioasid_t ioasid, void *data)
+{
+ struct intel_iommu *iommu = data;
+
+ if (!iommu)
+ return;
+ /*
+ * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
+ * We can only free the PASID when all the devices are unbond.
+ */
+ if (ioasid_find(NULL, ioasid, NULL)) {
+ pr_alert("Cannot free active IOASID %d\n", ioasid);
+ return;
+ }
+ vcmd_free_pasid(iommu, ioasid);
+}
+#endif
+
int __init intel_iommu_init(void)
{
int ret = -ENODEV;
@@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
"%s", iommu->name);
iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
iommu_device_register(&iommu->iommu);
+#ifdef CONFIG_INTEL_IOMMU_SVM
+ if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
+ pr_info("Register custom PASID allocator\n");
+ /*
+ * Register a custom ASID allocator if we are running
+ * in a guest, the purpose is to have a system wide PASID
+ * namespace among all PASID users.
+ * There can be multiple vIOMMUs in each guest but only
+ * one allocator is active. All vIOMMU allocators will
+ * eventually be calling the same host allocator.
+ */
+ iommu->pasid_allocator.alloc = intel_ioasid_alloc;
+ iommu->pasid_allocator.free = intel_ioasid_free;
+ iommu->pasid_allocator.pdata = (void *)iommu;
+ ret = ioasid_register_allocator(&iommu->pasid_allocator);
+ if (ret) {
+ pr_warn("Custom PASID allocator registeration failed\n");
+ /*
+ * Disable scalable mode on this IOMMU if there
+ * is no custom allocator. Mixing SM capable vIOMMU
+ * and non-SM vIOMMU are not supported.
+ */
+ intel_iommu_sm = 0;
+ }
+ }
+#endif
}

bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 1d4b8dcdc5d8..c624733cb2e6 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -19,6 +19,7 @@
#include <linux/iommu.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/dmar.h>
+#include <linux/ioasid.h>

#include <asm/cacheflush.h>
#include <asm/iommu.h>
@@ -546,6 +547,7 @@ struct intel_iommu {
#ifdef CONFIG_INTEL_IOMMU_SVM
struct page_req_dsc *prq;
unsigned char prq_name[16]; /* Name for PRQ interrupt */
+ struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
#endif
struct q_inval *qi; /* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
--
2.7.4

2019-10-25 19:20:21

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

Hi Jacob,

On 10/25/19 3:54 AM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/Kconfig | 1 +
> drivers/iommu/intel-iommu.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 2 ++
> 3 files changed, 70 insertions(+)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index fd50ddffffbf..961fe5795a90 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> bool "Support for Shared Virtual Memory with Intel IOMMU"
> depends on INTEL_IOMMU && X86
> select PCI_PASID
> + select IOASID
> select MMU_NOTIFIER
> help
> Shared Virtual Memory (SVM) provides a facility for devices
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3f974919d3bd..ced1d89ef977 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
> if (ecap_prs(iommu->ecap))
> intel_svm_finish_prq(iommu);
> }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(&iommu->pasid_allocator);

Since scalable mode is disabled if pasid allocator failed to register,
add sm_support(iommu) check here will be better.

> +
> #endif
> }
>
> @@ -4910,6 +4913,44 @@ static int __init probe_acpi_namespace_devices(void)
> return 0;
> }
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + /*
> + * VT-d virtual command interface always uses the full 20 bit
> + * PASID range. Host can partition guest PASID range based on
> + * policies but it is out of guest's control.
> + */
> + if (min < PASID_MIN || max > intel_pasid_max_id)
> + return INVALID_IOASID;
> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> + * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> + * We can only free the PASID when all the devices are unbond.
> + */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
> int __init intel_iommu_init(void)
> {
> int ret = -ENODEV;
> @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> "%s", iommu->name);
> iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
> iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> + pr_info("Register custom PASID allocator\n");
> + /*
> + * Register a custom ASID allocator if we are running
> + * in a guest, the purpose is to have a system wide PASID
> + * namespace among all PASID users.
> + * There can be multiple vIOMMUs in each guest but only
> + * one allocator is active. All vIOMMU allocators will
> + * eventually be calling the same host allocator.
> + */
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + ret = ioasid_register_allocator(&iommu->pasid_allocator);
> + if (ret) {
> + pr_warn("Custom PASID allocator registeration failed\n");
> + /*
> + * Disable scalable mode on this IOMMU if there
> + * is no custom allocator. Mixing SM capable vIOMMU
> + * and non-SM vIOMMU are not supported.
> + */
> + intel_iommu_sm = 0;

It's insufficient to disable scalable mode by only clearing
intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has already been
set. Probably, you need to

for each iommu
clear DMA_RTADDR_SMT in root entry

Alternatively, since vSVA is the only customer of this custom PASID
allocator, is it possible to only disable SVA here?

> + }
> + }
> +#endif
> }
>
> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1d4b8dcdc5d8..c624733cb2e6 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -19,6 +19,7 @@
> #include <linux/iommu.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/dmar.h>
> +#include <linux/ioasid.h>
>
> #include <asm/cacheflush.h>
> #include <asm/iommu.h>
> @@ -546,6 +547,7 @@ struct intel_iommu {
> #ifdef CONFIG_INTEL_IOMMU_SVM
> struct page_req_dsc *prq;
> unsigned char prq_name[16]; /* Name for PRQ interrupt */
> + struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
> #endif
> struct q_inval *qi; /* Queued invalidation info */
> u32 *iommu_state; /* Store iommu states between suspend and resume.*/
>

Best regards,
baolu

2019-10-25 19:21:44

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register

Hi,

On 10/25/19 3:54 AM, Jacob Pan wrote:
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
>
> Signed-off-by: Jacob Pan <[email protected]>

This patch looks good to me.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

> ---
> drivers/iommu/dmar.c | 1 +
> include/linux/intel-iommu.h | 4 ++++
> 2 files changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..49bb7d76e646 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -950,6 +950,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
> warn_invalid_dmar(phys_addr, " returns all ones");
> goto unmap;
> }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>
> /* the registers might be more than one page */
> map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ed11ef594378..2e1bed9b7eef 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -186,6 +186,9 @@
> #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
>
> +/* Virtual command interface capabilities */
> +#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation */
> +
> /* IOTLB_REG */
> #define DMA_TLB_FLUSH_GRANU_OFFSET 60
> #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -520,6 +523,7 @@ struct intel_iommu {
> u64 reg_size; /* size of hw register set */
> u64 cap;
> u64 ecap;
> + u64 vccap;
> u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
> raw_spinlock_t register_lock; /* protect register handling */
> int seq_id; /* sequence id of the iommu */
>

2019-10-25 19:22:25

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

Hi Baolu,

Thanks for the review. please see my comments inline.

On Fri, 25 Oct 2019 10:30:48 +0800
Lu Baolu <[email protected]> wrote:

> Hi Jacob,
>
> On 10/25/19 3:54 AM, Jacob Pan wrote:
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> >
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/Kconfig | 1 +
> > drivers/iommu/intel-iommu.c | 67
> > +++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/intel-iommu.h | 2 ++ 3 files changed, 70
> > insertions(+)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index fd50ddffffbf..961fe5795a90 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> > bool "Support for Shared Virtual Memory with Intel IOMMU"
> > depends on INTEL_IOMMU && X86
> > select PCI_PASID
> > + select IOASID
> > select MMU_NOTIFIER
> > help
> > Shared Virtual Memory (SVM) provides a facility for
> > devices diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 3f974919d3bd..ced1d89ef977
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct
> > intel_iommu *iommu) if (ecap_prs(iommu->ecap))
> > intel_svm_finish_prq(iommu);
> > }
> > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > +
> > ioasid_unregister_allocator(&iommu->pasid_allocator);
>
> Since scalable mode is disabled if pasid allocator failed to register,
> add sm_support(iommu) check here will be better.
>
I was thinking to be symmetric with register call, checking for the
same conditions. Also, I like your advice below to only disable SVA
instead of scalable mode.
> > +
> > #endif
> > }
> >
> > @@ -4910,6 +4913,44 @@ static int __init
> > probe_acpi_namespace_devices(void) return 0;
> > }
> >
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > void *data) +{
> > + struct intel_iommu *iommu = data;
> > + ioasid_t ioasid;
> > +
> > + /*
> > + * VT-d virtual command interface always uses the full 20
> > bit
> > + * PASID range. Host can partition guest PASID range based
> > on
> > + * policies but it is out of guest's control.
> > + */
> > + if (min < PASID_MIN || max > intel_pasid_max_id)
> > + return INVALID_IOASID;
> > +
> > + if (vcmd_alloc_pasid(iommu, &ioasid))
> > + return INVALID_IOASID;
> > +
> > + return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > + struct intel_iommu *iommu = data;
> > +
> > + if (!iommu)
> > + return;
> > + /*
> > + * Sanity check the ioasid owner is done at upper layer,
> > e.g. VFIO
> > + * We can only free the PASID when all the devices are
> > unbond.
> > + */
> > + if (ioasid_find(NULL, ioasid, NULL)) {
> > + pr_alert("Cannot free active IOASID %d\n", ioasid);
> > + return;
> > + }
> > + vcmd_free_pasid(iommu, ioasid);
> > +}
> > +#endif
> > +
> > int __init intel_iommu_init(void)
> > {
> > int ret = -ENODEV;
> > @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> > "%s", iommu->name);
> > iommu_device_set_ops(&iommu->iommu,
> > &intel_iommu_ops); iommu_device_register(&iommu->iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > + if (ecap_vcs(iommu->ecap) &&
> > vccap_pasid(iommu->vccap)) {
> > + pr_info("Register custom PASID
> > allocator\n");
> > + /*
> > + * Register a custom ASID allocator if we
> > are running
> > + * in a guest, the purpose is to have a
> > system wide PASID
> > + * namespace among all PASID users.
> > + * There can be multiple vIOMMUs in each
> > guest but only
> > + * one allocator is active. All vIOMMU
> > allocators will
> > + * eventually be calling the same host
> > allocator.
> > + */
> > + iommu->pasid_allocator.alloc =
> > intel_ioasid_alloc;
> > + iommu->pasid_allocator.free =
> > intel_ioasid_free;
> > + iommu->pasid_allocator.pdata = (void
> > *)iommu;
> > + ret =
> > ioasid_register_allocator(&iommu->pasid_allocator);
> > + if (ret) {
> > + pr_warn("Custom PASID allocator
> > registeration failed\n");
> > + /*
> > + * Disable scalable mode on this
> > IOMMU if there
> > + * is no custom allocator. Mixing
> > SM capable vIOMMU
> > + * and non-SM vIOMMU are not
> > supported.
> > + */
> > + intel_iommu_sm = 0;
>
> It's insufficient to disable scalable mode by only clearing
> intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has already been
> set. Probably, you need to
>
> for each iommu
> clear DMA_RTADDR_SMT in root entry
>
> Alternatively, since vSVA is the only customer of this custom PASID
> allocator, is it possible to only disable SVA here?
>
Yeah, I think disable SVA is better. We can still do gIOVA in SM. I
guess we need to introduce a flag for sva_enabled.
> > + }
> > + }
> > +#endif
> > }
> >
> > bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 1d4b8dcdc5d8..c624733cb2e6
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -19,6 +19,7 @@
> > #include <linux/iommu.h>
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > #include <linux/dmar.h>
> > +#include <linux/ioasid.h>
> >
> > #include <asm/cacheflush.h>
> > #include <asm/iommu.h>
> > @@ -546,6 +547,7 @@ struct intel_iommu {
> > #ifdef CONFIG_INTEL_IOMMU_SVM
> > struct page_req_dsc *prq;
> > unsigned char prq_name[16]; /* Name for PRQ interrupt
> > */
> > + struct ioasid_allocator_ops pasid_allocator; /* Custom
> > allocator for PASIDs */ #endif
> > struct q_inval *qi; /* Queued invalidation
> > info */ u32 *iommu_state; /* Store iommu states between suspend and
> > resume.*/
>
> Best regards,
> baolu

[Jacob Pan]

2019-10-25 19:23:21

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

Hi,

On 10/25/19 3:54 AM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
>
> Signed-off-by: Jacob Pan <[email protected]>

This patch looks good to me.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

> ---
> drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
> 1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index e79d680fe300..ffbd416ed3b8 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> devtlb_invalidation_with_pasid(iommu, dev, pasid);
> }
>
> +static void pasid_flush_caches(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + int pasid, u16 did)
> +{
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> + iotlb_invalidation_with_pasid(iommu, did, pasid);
> + } else {
> + iommu_flush_write_buffer(iommu);
> + }
> +}
> +
> /*
> * Set up the scalable mode pasid table entry for first only
> * translation type.
> @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> /* Setup Present and PASID Granular Transfer Type: */
> pasid_set_translation_type(pte, 1);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
>

2019-10-25 19:23:39

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 05/11] iommu/vt-d: Move domain helper to header

HI,

On 10/25/19 3:54 AM, Jacob Pan wrote:
> Move domain helper to header to be used by SVA code.
>
> Signed-off-by: Jacob Pan <[email protected]>
> Reviewed-by: Eric Auger <[email protected]>

This patch looks good to me.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

> ---
> drivers/iommu/intel-iommu.c | 6 ------
> include/linux/intel-iommu.h | 6 ++++++
> 2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 2ea09b988a23..acd1ac787d8b 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -428,12 +428,6 @@ static void init_translation_status(struct intel_iommu *iommu)
> iommu->flags |= VTD_FLAG_TRANS_PRE_ENABLED;
> }
>
> -/* Convert generic 'struct iommu_domain to private struct dmar_domain */
> -static struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> -{
> - return container_of(dom, struct dmar_domain, domain);
> -}
> -
> static int __init intel_iommu_setup(char *str)
> {
> if (!str)
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index c624733cb2e6..3dba6ad3e9ad 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -594,6 +594,12 @@ static inline void __iommu_flush_cache(
> clflush_cache_range(addr, size);
> }
>
> +/* Convert generic struct iommu_domain to private struct dmar_domain */
> +static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom)
> +{
> + return container_of(dom, struct dmar_domain, domain);
> +}
> +
> /*
> * 0: readable
> * 1: writable
>

2019-10-25 19:23:44

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

Hi,

On 10/25/19 3:54 AM, Jacob Pan wrote:
> Make use of generic IOASID code to manage PASID allocation,
> free, and lookup. Replace Intel specific code.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 12 ++++++------
> drivers/iommu/intel-pasid.c | 36 ------------------------------------
> drivers/iommu/intel-svm.c | 39 +++++++++++++++++++++++----------------
> 3 files changed, 29 insertions(+), 58 deletions(-)

[--cut--]

> @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
> if (!iommu)
> goto out;
>
> - svm = intel_pasid_lookup_id(pasid);
> - if (!svm)
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm)) {

Shall we check whether svm is NULL?

Others looks good to me.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

2019-10-25 19:25:07

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/Kconfig | 1 +
> drivers/iommu/intel-iommu.c | 67
> +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 2 ++
> 3 files changed, 70 insertions(+)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index fd50ddffffbf..961fe5795a90 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> bool "Support for Shared Virtual Memory with Intel IOMMU"
> depends on INTEL_IOMMU && X86
> select PCI_PASID
> + select IOASID
> select MMU_NOTIFIER
> help
> Shared Virtual Memory (SVM) provides a facility for devices
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3f974919d3bd..ced1d89ef977 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu
> *iommu)
> if (ecap_prs(iommu->ecap))
> intel_svm_finish_prq(iommu);
> }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
> #endif
> }
>
> @@ -4910,6 +4913,44 @@ static int __init
> probe_acpi_namespace_devices(void)
> return 0;
> }
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + /*
> + * VT-d virtual command interface always uses the full 20 bit
> + * PASID range. Host can partition guest PASID range based on
> + * policies but it is out of guest's control.
> + */
> + if (min < PASID_MIN || max > intel_pasid_max_id)
> + return INVALID_IOASID;
> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> + * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> + * We can only free the PASID when all the devices are unbond.

unbond -> unbound

> + */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
> int __init intel_iommu_init(void)
> {
> int ret = -ENODEV;
> @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> "%s", iommu->name);
> iommu_device_set_ops(&iommu->iommu,
> &intel_iommu_ops);
> iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> {
> + pr_info("Register custom PASID allocator\n");
> + /*
> + * Register a custom ASID allocator if we are running
> + * in a guest, the purpose is to have a system wide
> PASID
> + * namespace among all PASID users.
> + * There can be multiple vIOMMUs in each guest but
> only
> + * one allocator is active. All vIOMMU allocators will
> + * eventually be calling the same host allocator.
> + */
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + ret = ioasid_register_allocator(&iommu-
> >pasid_allocator);
> + if (ret) {
> + pr_warn("Custom PASID allocator
> registeration failed\n");

registration

> + /*
> + * Disable scalable mode on this IOMMU if
> there
> + * is no custom allocator. Mixing SM capable
> vIOMMU
> + * and non-SM vIOMMU are not supported.
> + */
> + intel_iommu_sm = 0;
> + }
> + }
> +#endif
> }
>
> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1d4b8dcdc5d8..c624733cb2e6 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -19,6 +19,7 @@
> #include <linux/iommu.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/dmar.h>
> +#include <linux/ioasid.h>
>
> #include <asm/cacheflush.h>
> #include <asm/iommu.h>
> @@ -546,6 +547,7 @@ struct intel_iommu {
> #ifdef CONFIG_INTEL_IOMMU_SVM
> struct page_req_dsc *prq;
> unsigned char prq_name[16]; /* Name for PRQ interrupt */
> + struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for
> PASIDs */
> #endif
> struct q_inval *qi; /* Queued invalidation info */
> u32 *iommu_state; /* Store iommu states between suspend and
> resume.*/
> --
> 2.7.4

2019-10-25 19:25:16

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 12:43 PM
>
> Hi Baolu,
>
> Thanks for the review. please see my comments inline.
>
> On Fri, 25 Oct 2019 10:30:48 +0800
> Lu Baolu <[email protected]> wrote:
>
> > Hi Jacob,
> >
> > On 10/25/19 3:54 AM, Jacob Pan wrote:
> > > When VT-d driver runs in the guest, PASID allocation must be
> > > performed via virtual command interface. This patch registers a
> > > custom IOASID allocator which takes precedence over the default
> > > XArray based allocator. The resulting IOASID allocation will always
> > > come from the host. This ensures that PASID namespace is system-
> > > wide.
> > >
> > > Signed-off-by: Lu Baolu <[email protected]>
> > > Signed-off-by: Liu, Yi L <[email protected]>
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > ---
> > > drivers/iommu/Kconfig | 1 +
> > > drivers/iommu/intel-iommu.c | 67
> > > +++++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/intel-iommu.h | 2 ++ 3 files changed, 70
> > > insertions(+)
> > >
> > > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > > index fd50ddffffbf..961fe5795a90 100644
> > > --- a/drivers/iommu/Kconfig
> > > +++ b/drivers/iommu/Kconfig
> > > @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> > > bool "Support for Shared Virtual Memory with Intel IOMMU"
> > > depends on INTEL_IOMMU && X86
> > > select PCI_PASID
> > > + select IOASID
> > > select MMU_NOTIFIER
> > > help
> > > Shared Virtual Memory (SVM) provides a facility for
> > > devices diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index 3f974919d3bd..ced1d89ef977
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct
> > > intel_iommu *iommu) if (ecap_prs(iommu->ecap))
> > > intel_svm_finish_prq(iommu);
> > > }
> > > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > > +
> > > ioasid_unregister_allocator(&iommu->pasid_allocator);
> >
> > Since scalable mode is disabled if pasid allocator failed to register,
> > add sm_support(iommu) check here will be better.
> >
> I was thinking to be symmetric with register call, checking for the
> same conditions. Also, I like your advice below to only disable SVA
> instead of scalable mode.
> > > +
> > > #endif
> > > }
> > >
> > > @@ -4910,6 +4913,44 @@ static int __init
> > > probe_acpi_namespace_devices(void) return 0;
> > > }
> > >
> > > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > > void *data) +{
> > > + struct intel_iommu *iommu = data;
> > > + ioasid_t ioasid;
> > > +
> > > + /*
> > > + * VT-d virtual command interface always uses the full 20
> > > bit
> > > + * PASID range. Host can partition guest PASID range based
> > > on
> > > + * policies but it is out of guest's control.
> > > + */
> > > + if (min < PASID_MIN || max > intel_pasid_max_id)
> > > + return INVALID_IOASID;
> > > +
> > > + if (vcmd_alloc_pasid(iommu, &ioasid))
> > > + return INVALID_IOASID;
> > > +
> > > + return ioasid;
> > > +}
> > > +
> > > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > > +{
> > > + struct intel_iommu *iommu = data;
> > > +
> > > + if (!iommu)
> > > + return;
> > > + /*
> > > + * Sanity check the ioasid owner is done at upper layer,
> > > e.g. VFIO
> > > + * We can only free the PASID when all the devices are
> > > unbond.
> > > + */
> > > + if (ioasid_find(NULL, ioasid, NULL)) {
> > > + pr_alert("Cannot free active IOASID %d\n", ioasid);
> > > + return;
> > > + }
> > > + vcmd_free_pasid(iommu, ioasid);
> > > +}
> > > +#endif
> > > +
> > > int __init intel_iommu_init(void)
> > > {
> > > int ret = -ENODEV;
> > > @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> > > "%s", iommu->name);
> > > iommu_device_set_ops(&iommu->iommu,
> > > &intel_iommu_ops); iommu_device_register(&iommu->iommu);
> > > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > > + if (ecap_vcs(iommu->ecap) &&
> > > vccap_pasid(iommu->vccap)) {
> > > + pr_info("Register custom PASID
> > > allocator\n");
> > > + /*
> > > + * Register a custom ASID allocator if we
> > > are running
> > > + * in a guest, the purpose is to have a
> > > system wide PASID
> > > + * namespace among all PASID users.
> > > + * There can be multiple vIOMMUs in each
> > > guest but only
> > > + * one allocator is active. All vIOMMU
> > > allocators will
> > > + * eventually be calling the same host
> > > allocator.
> > > + */
> > > + iommu->pasid_allocator.alloc =
> > > intel_ioasid_alloc;
> > > + iommu->pasid_allocator.free =
> > > intel_ioasid_free;
> > > + iommu->pasid_allocator.pdata = (void
> > > *)iommu;
> > > + ret =
> > > ioasid_register_allocator(&iommu->pasid_allocator);
> > > + if (ret) {
> > > + pr_warn("Custom PASID allocator
> > > registeration failed\n");
> > > + /*
> > > + * Disable scalable mode on this
> > > IOMMU if there
> > > + * is no custom allocator. Mixing
> > > SM capable vIOMMU
> > > + * and non-SM vIOMMU are not
> > > supported.
> > > + */
> > > + intel_iommu_sm = 0;
> >
> > It's insufficient to disable scalable mode by only clearing
> > intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has already
> been
> > set. Probably, you need to
> >
> > for each iommu
> > clear DMA_RTADDR_SMT in root entry
> >
> > Alternatively, since vSVA is the only customer of this custom PASID
> > allocator, is it possible to only disable SVA here?
> >
> Yeah, I think disable SVA is better. We can still do gIOVA in SM. I
> guess we need to introduce a flag for sva_enabled.

I'm not sure whether tying above logic to SVA is the right approach.
If vcmd interface doesn't work, the whole SM mode doesn't make
sense which is based on PASID-granular protection (SVA is only one
usage atop). If the only remaining usage of SM is to map gIOVA using
reserved PASID#0, then why not disabling SM and just fallback to
legacy mode?

Based on that I prefer to disabling the SM mode completely (better
through an interface), and move the logic out of CONFIG_INTEL_
IOMMU_SVM


> > > + }
> > > + }
> > > +#endif
> > > }
> > >
> > > bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> > > diff --git a/include/linux/intel-iommu.h
> > > b/include/linux/intel-iommu.h index 1d4b8dcdc5d8..c624733cb2e6
> > > 100644 --- a/include/linux/intel-iommu.h
> > > +++ b/include/linux/intel-iommu.h
> > > @@ -19,6 +19,7 @@
> > > #include <linux/iommu.h>
> > > #include <linux/io-64-nonatomic-lo-hi.h>
> > > #include <linux/dmar.h>
> > > +#include <linux/ioasid.h>
> > >
> > > #include <asm/cacheflush.h>
> > > #include <asm/iommu.h>
> > > @@ -546,6 +547,7 @@ struct intel_iommu {
> > > #ifdef CONFIG_INTEL_IOMMU_SVM
> > > struct page_req_dsc *prq;
> > > unsigned char prq_name[16]; /* Name for PRQ interrupt
> > > */
> > > + struct ioasid_allocator_ops pasid_allocator; /* Custom
> > > allocator for PASIDs */ #endif
> > > struct q_inval *qi; /* Queued invalidation
> > > info */ u32 *iommu_state; /* Store iommu states between suspend and
> > > resume.*/
> >
> > Best regards,
> > baolu
>
> [Jacob Pan]

2019-10-25 19:25:21

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> Make use of generic IOASID code to manage PASID allocation,
> free, and lookup. Replace Intel specific code.
>
> Signed-off-by: Jacob Pan <[email protected]>

better push this patch separately. It's a generic cleanup.

> ---
> drivers/iommu/intel-iommu.c | 12 ++++++------
> drivers/iommu/intel-pasid.c | 36 ------------------------------------
> drivers/iommu/intel-svm.c | 39 +++++++++++++++++++++++----------------
> 3 files changed, 29 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ced1d89ef977..2ea09b988a23 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
> dmar_domain *domain,
> domain->auxd_refcnt--;
>
> if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
> }
>
> static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
> dmar_domain *domain,
> if (domain->default_pasid <= 0) {
> int pasid;
>
> - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> - pci_max_pasids(to_pci_dev(dev)),
> - GFP_KERNEL);
> - if (pasid <= 0) {
> + /* No private data needed for the default pasid */
> + pasid = ioasid_alloc(NULL, PASID_MIN,
> pci_max_pasids(to_pci_dev(dev)) - 1,
> + NULL);
> + if (pasid == INVALID_IOASID) {
> pr_err("Can't allocate default pasid\n");
> return -ENODEV;
> }
> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
> dmar_domain *domain,
> spin_unlock(&iommu->lock);
> spin_unlock_irqrestore(&device_domain_lock, flags);
> if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>
> return ret;
> }
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d81e857d2b25..e79d680fe300 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -26,42 +26,6 @@
> */
> static DEFINE_SPINLOCK(pasid_lock);
> u32 intel_pasid_max_id = PASID_MAX;
> -static DEFINE_IDR(pasid_idr);
> -
> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> -{
> - int ret, min, max;
> -
> - min = max_t(int, start, PASID_MIN);
> - max = min_t(int, end, intel_pasid_max_id);
> -
> - WARN_ON(in_interrupt());
> - idr_preload(gfp);
> - spin_lock(&pasid_lock);
> - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> - spin_unlock(&pasid_lock);
> - idr_preload_end();
> -
> - return ret;
> -}
> -
> -void intel_pasid_free_id(int pasid)
> -{
> - spin_lock(&pasid_lock);
> - idr_remove(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -}
> -
> -void *intel_pasid_lookup_id(int pasid)
> -{
> - void *p;
> -
> - spin_lock(&pasid_lock);
> - p = idr_find(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -
> - return p;
> -}
>
> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..a9a7f85a09bc 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -17,6 +17,7 @@
> #include <linux/dmar.h>
> #include <linux/interrupt.h>
> #include <linux/mm_types.h>
> +#include <linux/ioasid.h>
> #include <asm/page.h>
>
> #include "intel-pasid.h"
> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
> *pasid, int flags, struct svm_dev_
> if (pasid_max > intel_pasid_max_id)
> pasid_max = intel_pasid_max_id;
>
> - /* Do not use PASID 0 in caching mode (virtualised IOMMU)
> */
> - ret = intel_pasid_alloc_id(svm,
> - !!cap_caching_mode(iommu->cap),
> - pasid_max - 1, GFP_KERNEL);
> - if (ret < 0) {
> + /* Do not use PASID 0, reserved for RID to PASID */
> + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> + pasid_max - 1, svm);
> + if (svm->pasid == INVALID_IOASID) {
> kfree(svm);
> kfree(sdev);
> + ret = ENOSPC;
> goto out;
> }
> - svm->pasid = ret;
> svm->notifier.ops = &intel_mmuops;
> svm->mm = mm;
> svm->flags = flags;
> @@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int
> *pasid, int flags, struct svm_dev_
> if (mm) {
> ret = mmu_notifier_register(&svm->notifier, mm);
> if (ret) {
> - intel_pasid_free_id(svm->pasid);
> + ioasid_free(svm->pasid);
> kfree(svm);
> kfree(sdev);
> goto out;
> @@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int
> *pasid, int flags, struct svm_dev_
> if (ret) {
> if (mm)
> mmu_notifier_unregister(&svm->notifier,
> mm);
> - intel_pasid_free_id(svm->pasid);
> + ioasid_free(svm->pasid);
> kfree(svm);
> kfree(sdev);
> goto out;
> @@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
> if (!iommu)
> goto out;
>
> - svm = intel_pasid_lookup_id(pasid);
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> if (!svm)
> goto out;
>
> @@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int
> pasid)
> kfree_rcu(sdev, rcu);
>
> if (list_empty(&svm->devs)) {
> - intel_pasid_free_id(svm->pasid);
> + /* Clear private data so that free
> pass check */
> + ioasid_set_data(svm->pasid, NULL);
> + ioasid_free(svm->pasid);
> if (svm->mm)
>
> mmu_notifier_unregister(&svm->notifier, svm->mm);
>
> @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device *dev,
> int pasid)
> if (!iommu)
> goto out;
>
> - svm = intel_pasid_lookup_id(pasid);
> - if (!svm)
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> goto out;
> -
> + }
> /* init_mm is used in this case */
> if (!svm->mm)
> ret = 1;
> @@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq, void
> *d)
>
> if (!svm || svm->pasid != req->pasid) {
> rcu_read_lock();
> - svm = intel_pasid_lookup_id(req->pasid);
> + svm = ioasid_find(NULL, req->pasid, NULL);
> /* It *can't* go away, because the driver is not
> permitted
> * to unbind the mm while any page faults are
> outstanding.
> * So we only need RCU to protect the internal idr
> code. */
> rcu_read_unlock();
> -
> - if (!svm) {
> + if (IS_ERR(svm) || !svm) {
> pr_err("%s: Page request for invalid
> PASID %d: %08llx %08llx\n",
> iommu->name, req->pasid, ((unsigned
> long long *)req)[0],
> ((unsigned long long *)req)[1]);
> --
> 2.7.4

2019-10-25 19:25:25

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> After each setup for PASID entry, related translation caches must be
> flushed.
> We can combine duplicated code into one function which is less error
> prone.
>
> Signed-off-by: Jacob Pan <[email protected]>

similarly, it doesn't need to be in this series.

> ---
> drivers/iommu/intel-pasid.c | 48 +++++++++++++++++---------------------------
> -
> 1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index e79d680fe300..ffbd416ed3b8 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct
> intel_iommu *iommu,
> devtlb_invalidation_with_pasid(iommu, dev, pasid);
> }
>
> +static void pasid_flush_caches(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + int pasid, u16 did)
> +{
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> + iotlb_invalidation_with_pasid(iommu, did, pasid);
> + } else {
> + iommu_flush_write_buffer(iommu);
> + }
> +}
> +
> /*
> * Set up the scalable mode pasid table entry for first only
> * translation type.
> @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct
> intel_iommu *iommu,
> /* Setup Present and PASID Granular Transfer Type: */
> pasid_set_translation_type(pte, 1);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct
> intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct
> intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> --
> 2.7.4

2019-10-25 19:26:01

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dmar.c | 1 +
> include/linux/intel-iommu.h | 4 ++++
> 2 files changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..49bb7d76e646 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -950,6 +950,7 @@ static int map_iommu(struct intel_iommu *iommu,
> u64 phys_addr)
> warn_invalid_dmar(phys_addr, " returns all ones");
> goto unmap;
> }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>
> /* the registers might be more than one page */
> map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ed11ef594378..2e1bed9b7eef 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -186,6 +186,9 @@
> #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
>
> +/* Virtual command interface capabilities */
> +#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID
> allocation */

DMA_VCS_PAS is defined in [2/11]. should move to here.

> +
> /* IOTLB_REG */
> #define DMA_TLB_FLUSH_GRANU_OFFSET 60
> #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -520,6 +523,7 @@ struct intel_iommu {
> u64 reg_size; /* size of hw register set */
> u64 cap;
> u64 ecap;
> + u64 vccap;
> u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF
> */
> raw_spinlock_t register_lock; /* protect register handling */
> int seq_id; /* sequence id of the iommu */
> --
> 2.7.4

2019-10-25 19:26:52

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> From: Lu Baolu <[email protected]>
>
> Enabling IOMMU in a guest requires communication with the host
> driver for certain aspects. Use of PASID ID to enable Shared Virtual
> Addressing (SVA) requires managing PASID's in the host. VT-d 3.0 spec
> provides a Virtual Command Register (VCMD) to facilitate this.
> Writes to this register in the guest are trapped by QEMU which
> proxies the call to the host driver.
>
> This virtual command interface consists of a capability register,
> a virtual command register, and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
>
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command interface.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> Reviewed-by: Eric Auger <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 56
> +++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 13 ++++++++++-
> include/linux/intel-iommu.h | 2 ++
> 3 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 040a445be300..d81e857d2b25 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> return p;
> }
>
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + int ret = 0;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG,
> VCMD_CMD_ALLOC);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> + !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +

should we handle VCMD_VRSP_IP here?

> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + *pasid = VCMD_VRSP_RESULT(res);
> + break;
> + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> + pr_info("IOMMU: %s: No PASID available\n", iommu-
> >name);
> + ret = -ENOMEM;
> + break;
> + default:
> + ret = -ENODEV;
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +
> + return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> VCMD_CMD_FREE);

define a macro for pasid offset.

> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> + !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + break;
> + case VCMD_VRSP_SC_INVALID_PASID:
> + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> + break;
> + default:
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +}
> +
> /*
> * Per device pasid table management:
> */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index fc8cd8f17de1..e413e884e685 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -23,6 +23,16 @@
> #define is_pasid_enabled(entry) (((entry)->lo >> 3) & 0x1)
> #define get_pasid_dir_size(entry) (1 << ((((entry)->lo >> 9) & 0x7) + 7))
>
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC 0x1
> +#define VCMD_CMD_FREE 0x2
> +#define VCMD_VRSP_IP 0x1
> +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS 0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
> +#define VCMD_VRSP_SC_INVALID_PASID 1
> +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
> +
> /*
> * Domain ID reserved for pasid entries programmed for first-level
> * only and pass-through transfer modes.
> @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct
> intel_iommu *iommu,
> struct device *dev, int pasid);
> void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> -
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
> #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2e1bed9b7eef..1d4b8dcdc5d8 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -161,6 +161,7 @@
> #define ecap_smpwc(e) (((e) >> 48) & 0x1)
> #define ecap_flts(e) (((e) >> 47) & 0x1)
> #define ecap_slts(e) (((e) >> 46) & 0x1)
> +#define ecap_vcs(e) (((e) >> 44) & 0x1)
> #define ecap_smts(e) (((e) >> 43) & 0x1)
> #define ecap_dit(e) ((e >> 41) & 0x1)
> #define ecap_pasid(e) ((e >> 40) & 0x1)
> @@ -282,6 +283,7 @@
>
> /* PRS_REG */
> #define DMA_PRS_PPR ((u32)1)
> +#define DMA_VCS_PAS ((u64)1)
>
> #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)
> \
> do { \
> --
> 2.7.4

2019-10-25 19:27:49

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> With PASID granular translation type set to 0x11b, translation
> result from the first level(FL) also subject to a second level(SL)
> page table translation. This mode is used for SVA virtualization,
> where FL performs guest virtual to guest physical translation and
> SL performs guest physical to host physical translation.

I think we really differentiate what is the common logic for
first-level usages (GVA, GIOVA, etc.) in scalable mode, and
what is specific to SVA. I have the feeling that SVA is over-used
to cause confusing interpretation.

>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 207
> ++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 12 +++
> 2 files changed, 219 insertions(+)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index ffbd416ed3b8..f846a907cfcf 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
> pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> }
>
> +/*
> + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emt(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
> +}
> +
> +/*
> + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pat(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);
> +}
> +
> +/*
> + * Setup the Cache Disable (CD) field (Bit 89)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_cd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 25, 1);
> +}
> +
> +/*
> + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emte(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> +}
> +
> +/*
> + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_eafe(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 7, 1);
> +}
> +
> +/*
> + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pcd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> +}
> +
> +/*
> + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pwt(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> +}
> +
> static void
> pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> u16 did, int pasid)
> @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct
> intel_iommu *iommu,
>
> return 0;
> }
> +
> +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + struct iommu_gpasid_bind_data_vtd
> *pasid_data)
> +{
> + /*
> + * Not all guest PASID table entry fields are passed down during
> bind,
> + * here we only set up the ones that are dependent on guest
> settings.
> + * Execution related bits such as NXE, SMEP are not meaningful to
> IOMMU,
> + * therefore not set. Other fields, such as snoop related, are set
> based
> + * on host needs regardless of guest settings.
> + */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> + if (!ecap_srs(iommu->ecap)) {
> + pr_err("No supervisor request support on %s\n",
> + iommu->name);
> + return -EINVAL;
> + }
> + pasid_set_sre(pte);
> + }
> +
> + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) &&
> ecap_eafs(iommu->ecap))
> + pasid_set_eafe(pte);
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> + pasid_set_emte(pte);
> + pasid_set_emt(pte, pasid_data->emt);
> + }

above conditional checks are not consistent. The 1st check may
return error but latter two don't. Can you confirm whether it's
desired way?

> +
> + /*
> + * Memory type is only applicable to devices inside processor
> coherent
> + * domain. PCIe devices are not included. We can skip the rest of
> the
> + * flags if IOMMU does not support MTS.
> + */
> + if (!ecap_mts(iommu->ecap)) {
> + pr_info("%s does not support memory type bind guest
> PASID\n",
> + iommu->name);
> + return 0;

why not -EINVAL?

> + }
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> + pasid_set_pcd(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> + pasid_set_pwt(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> + pasid_set_cd(pte);
> + pasid_set_pat(pte, pasid_data->pat);
> +
> + return 0;
> +
> +}
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> + * which is used for vSVA. The first level page tables are used for
> + * GVA-GPA translation in the guest, second level page tables are used
> + * for GPA to HPA translation.

It's too restricting on how 1st level is used by guest.

> + *
> + * @iommu: Iommu which the device belong to
> + * @dev: Device to be set up for translation
> + * @gpgd: FLPTPTR: First Level Page translation pointer in GPA
> + * @pasid: PASID to be programmed in the device PASID table
> + * @pasid_data: Additional PASID info from the guest bind request
> + * @domain: Domain info for setting up second level page tables
> + * @addr_width: Address width of the first level (guest)
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *gpgd,
> + int pasid, struct iommu_gpasid_bind_data_vtd
> *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width)
> +{
> + struct pasid_entry *pte;
> + struct dma_pte *pgd;
> + u64 pgd_val;
> + int agaw;
> + u16 did;
> +
> + if (!ecap_nest(iommu->ecap)) {
> + pr_err("IOMMU: %s: No nested translation support\n",
> + iommu->name);
> + return -EINVAL;
> + }
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte))
> + return -EINVAL;
> +
> + pasid_clear_entry(pte);
> +
> + /* Sanity checking performed by caller to make sure address
> + * width matching in two dimensions:
> + * 1. CPU vs. IOMMU
> + * 2. Guest vs. Host.
> + */
> + switch (addr_width) {
> + case 57:

AW_5LEVEL

> + pasid_set_flpm(pte, 1);
> + break;
> + case 48:

AW_4LEVEL

> + pasid_set_flpm(pte, 0);
> + break;
> + default:
> + dev_err(dev, "Invalid paging mode %d\n", addr_width);
> + return -EINVAL;
> + }
> +
> + pasid_set_flptr(pte, (u64)gpgd);
> +
> + intel_pasid_setup_bind_data(iommu, pte, pasid_data);
> +
> + /* Setup the second level based on the given domain */
> + pgd = domain->pgd;
> +
> + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> + pgd = phys_to_virt(dma_pte_addr(pgd));
> + if (!dma_pte_present(pgd)) {
> + dev_err(dev, "Invalid domain page table\n");

pasid_clear_entry?

> + return -EINVAL;
> + }
> + }
> + pgd_val = virt_to_phys(pgd);
> + pasid_set_slptr(pte, pgd_val);
> + pasid_set_fault_enable(pte);
> +
> + did = domain->iommu_did[iommu->seq_id];
> + pasid_set_domain_id(pte, did);
> +
> + pasid_set_address_width(pte, agaw);
> + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> + pasid_set_present(pte);
> + pasid_flush_caches(iommu, pte, pasid, did);
> +
> + return 0;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index e413e884e685..09c85db73b77 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -46,6 +46,7 @@
> * to vmalloc or even module mappings.
> */
> #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
> +#define PASID_FLAG_NESTED BIT(1)
>
> struct pasid_dir_entry {
> u64 val;
> @@ -55,6 +56,11 @@ struct pasid_entry {
> u64 val[8];
> };
>
> +#define PASID_ENTRY_PGTT_FL_ONLY (1)
> +#define PASID_ENTRY_PGTT_SL_ONLY (2)
> +#define PASID_ENTRY_PGTT_NESTED (3)
> +#define PASID_ENTRY_PGTT_PT (4)
> +
> /* The representative of a PASID table */
> struct pasid_table {
> void *table; /* pasid table pointer */
> @@ -103,6 +109,12 @@ int intel_pasid_setup_second_level(struct
> intel_iommu *iommu,
> int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> struct dmar_domain *domain,
> struct device *dev, int pasid);
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *pgd,
> + int pasid,
> + struct iommu_gpasid_bind_data_vtd *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width);
> void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> --
> 2.7.4

2019-10-25 19:28:17

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.

will be translated into binding/unbinding guest PASID calls to update
the host shadow PASID table.

>
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.

harvested -> copied or linked to?

>
> .-------------. .---------------------------.
> | vIOMMU | | Guest process CR3, FL only|
> | | '---------------------------'
> .----------------/
> | PASID Entry |--- PASID cache flush -
> '-------------' |
> | | V
> | | CR3 in GPA
> '-------------'
> Guest
> ------| Shadow |--------------------------|--------
> v v v
> Host
> .-------------. .----------------------.
> | pIOMMU | | Bind FL for GVA-GPA |
> | | '----------------------'
> .----------------/ |
> | PASID Entry | V (Nested xlate)
> '----------------\.------------------------------.
> | | |SL for GPA-HPA, default domain|
> | | '------------------------------'
> '-------------'
> Where:
> - FL = First level/stage one page tables
> - SL = Second level/stage two page tables
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 4 +
> drivers/iommu/intel-svm.c | 184
> ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 8 +-
> include/linux/intel-svm.h | 17 ++++
> 4 files changed, 212 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index acd1ac787d8b..5fab32fbc4b4 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops = {
> .dev_disable_feat = intel_iommu_dev_disable_feat,
> .is_attach_deferred = intel_iommu_is_attach_deferred,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid = intel_svm_bind_gpasid,
> + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> +#endif

again, pure PASID management logic should be separated from SVM.

> };
>
> static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index a18b02a9709d..ae13a310cf96 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
> list_for_each_entry(sdev, &svm->devs, list) \
> if (dev == sdev->dev) \
>
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev,
> + struct iommu_gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct dmar_domain *ddomain;
> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;
> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + }

what about non-pci devices? It just moves forward w/o any check here?

> +
> + /*
> + * We only check host PASID range, we have no knowledge to check
> + * guest PASID range nor do we use the guest PASID.
> + */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + ddomain = to_dmar_domain(domain);
> + /* REVISIT:
> + * Sanity check adddress width and paging mode support
> + * width matching in two dimensions:
> + * 1. paging mode CPU <= IOMMU
> + * 2. address width Guest <= Host.
> + */

Is lacking of above logic harmful? If not, we should add

> + mutex_lock(&pasid_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> + if (svm) {
> + /*
> + * If we found svm for the PASID, there must be at
> + * least one device bond, otherwise svm should be freed.
> + */
> + BUG_ON(list_empty(&svm->devs));
> +
> + for_each_svm_dev(svm, dev) {
> + /* In case of multiple sub-devices of the same pdev
> assigned, we should
> + * allow multiple bind calls with the same PASID and
> pdev.
> + */
> + sdev->users++;
> + goto out;

sorry if I overlooked, but I didn't see any check on the PASID
actually belonging to this process. At least should check the
match between svm->mm and get_task_mm? also check
whether a previous binding between this hpasid and gpasid
already exists.

> + }
> + } else {
> + /* We come here when PASID has never been bond to a
> device. */
> + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> + if (!svm) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + /* REVISIT: upper layer/VFIO can track host process that
> bind the PASID.
> + * ioasid_set = mm might be sufficient for vfio to check
> pasid VMM
> + * ownership.
> + */

Is it correct to leave the check to the caller?

> + svm->mm = get_task_mm(current);
> + svm->pasid = data->hpasid;
> + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> + svm->gpasid = data->gpasid;
> + svm->flags |= SVM_FLAG_GUEST_PASID;
> + }
> + ioasid_set_data(data->hpasid, svm);
> + INIT_LIST_HEAD_RCU(&svm->devs);
> + INIT_LIST_HEAD(&svm->list);
> +
> + mmput(svm->mm);
> + }
> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> + if (!sdev) {
> + if (list_empty(&svm->devs))
> + kfree(svm);
> + ret = -ENOMEM;
> + goto out;
> + }
> + sdev->dev = dev;
> + sdev->users = 1;
> +
> + /* Set up device context entry for PASID if not enabled already */
> + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> + if (ret) {
> + dev_err(dev, "Failed to enable PASID capability\n");
> + kfree(sdev);
> + goto out;
> + }
> +
> + /*
> + * For guest bind, we need to set up PASID table entry as follows:
> + * - FLPM matches guest paging mode
> + * - turn on nested mode
> + * - SL guest address width matching
> + */
> + ret = intel_pasid_setup_nested(iommu,
> + dev,
> + (pgd_t *)data->gpgd,
> + data->hpasid,
> + &data->vtd,
> + ddomain,
> + data->addr_width);
> + if (ret) {
> + dev_err(dev, "Failed to set up PASID %llu in nested mode,
> Err %d\n",
> + data->hpasid, ret);
> + kfree(sdev);

disable pasid? revert ioasid_set_data?

> + goto out;
> + }
> + svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> + init_rcu_head(&sdev->rcu);
> + list_add_rcu(&sdev->list, &svm->devs);
> + out:
> + mutex_unlock(&pasid_mutex);
> + return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> + struct intel_svm_dev *sdev;
> + struct intel_iommu *iommu;
> + struct intel_svm *svm;
> + int ret = -EINVAL;
> +
> + mutex_lock(&pasid_mutex);
> + iommu = intel_svm_device_to_iommu(dev);
> + if (!iommu)
> + goto out;
> +
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR_OR_NULL(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> + for_each_svm_dev(svm, dev) {
> + ret = 0;
> + sdev->users--;
> + if (!sdev->users) {
> + list_del_rcu(&sdev->list);
> + intel_pasid_tear_down_entry(iommu, dev, svm-
> >pasid);
> + /* TODO: Drain in flight PRQ for the PASID since it
> + * may get reused soon, we don't want to
> + * confuse with its previous life.
> + * intel_svm_drain_prq(dev, pasid);
> + */
> + kfree_rcu(sdev, rcu);
> +
> + if (list_empty(&svm->devs)) {
> + list_del(&svm->list);
> + kfree(svm);
> + /*
> + * We do not free PASID here until explicit
> call
> + * from VFIO to free. The PASID life cycle
> + * management is largely tied to VFIO
> management
> + * of assigned device life cycles. In case of
> + * guest exit without a explicit free PASID call,
> + * the responsibility lies in VFIO layer to free
> + * the PASIDs allocated for the guest.
> + * For security reasons, VFIO has to track the
> + * PASID ownership per guest anyway to
> ensure
> + * that PASID allocated by one guest cannot
> be
> + * used by another.
> + */
> + ioasid_set_data(pasid, NULL);
> + }
> + }
> + break;
> + }
> + out:
> + mutex_unlock(&pasid_mutex);
> +
> + return ret;
> +}
> +
> int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct
> svm_dev_ops *ops)
> {
> struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3dba6ad3e9ad..6c74c71b1ebf 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu
> *iommu, struct device *dev);
> int intel_svm_init(struct intel_iommu *iommu);
> extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev, struct iommu_gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> struct svm_dev_ops;
>
> struct intel_svm_dev {
> @@ -690,9 +692,13 @@ struct intel_svm_dev {
> struct intel_svm {
> struct mmu_notifier notifier;
> struct mm_struct *mm;
> +
> struct intel_iommu *iommu;
> int flags;
> int pasid;
> + int gpasid; /* Guest PASID in case of vSVA bind with non-identity
> host
> + * to guest PASID mapping.
> + */
> struct list_head devs;
> struct list_head list;
> };
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index 94f047a8a845..a2c189ad0b01 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -44,6 +44,23 @@ struct svm_dev_ops {
> * do such IOTLB flushes automatically.
> */
> #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to
> a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API
> provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE (1<<2)
> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> PASID space,
> + * which requires guest and host PASID translation at both directions. We
> keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports
> mediated
> + * device (mdev) assignment. Guest programming of mdev configuration
> space can
> + * only be done with guest PASID, therefore PF driver needs to find the
> matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID (1<<3)
>
> #ifdef CONFIG_INTEL_IOMMU_SVM
>
> --
> 2.7.4

2019-10-25 19:28:29

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.

from outside of host IOMMU subsystem

> This patch adds invalidation functions that can be used for additional
> translation cache types.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dmar.c | 46
> +++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.c | 3 ++-
> include/linux/intel-iommu.h | 21 +++++++++++++++++----
> 3 files changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 49bb7d76e646..0ce2d32ff99e 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu,
> u16 did, u64 addr,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32
> pasid,

qi_flush_iotlb_pasid.

> + unsigned int size_order, u64 granu, int ih)
> +{
> + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> + QI_EIOTLB_AM(size_order);
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned mask)
> {
> @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based device IOTLB Invalidate */
> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64
> granu)
> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid)
> |
> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> + QI_DEV_IOTLB_PFSID(pfsid);
> + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> +
> + /* If S bit is 0, we only flush a single page. If S bit is set,
> + * The least significant zero bit indicates the invalidation address
> + * range. VT-d spec 6.5.2.6.
> + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
> + */
> + if (!size_order) {
> + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> ~QI_DEV_EIOTLB_SIZE;
> + } else {
> + unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> size_order);
> + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> QI_DEV_EIOTLB_SIZE;
> + }
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> granu, int pasid)
> +{
> + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
> QI_PC_GRAN(granu) | QI_PC_TYPE;
> + qi_submit_sync(&desc, iommu);
> +}
> /*
> * Disable Queued Invalidation interface.
> */
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index f846a907cfcf..6d7a701ef4d3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct
> intel_iommu *iommu,
> {
> struct qi_desc desc;
>
> - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> QI_PC_PASID(pasid);
> + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> + QI_PC_PASID(pasid) | QI_PC_TYPE;
> desc.qw1 = 0;
> desc.qw2 = 0;
> desc.qw3 = 0;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6c74c71b1ebf..a25fb3a0ea5b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -332,7 +332,7 @@ enum {
> #define QI_IOTLB_GRAN(gran) (((u64)gran) >>
> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
> #define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK)
> #define QI_IOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_IOTLB_AM(am) (((u8)am))
> +#define QI_IOTLB_AM(am) (((u8)am) & 0x3f)
>
> #define QI_CC_FM(fm) (((u64)fm) << 48)
> #define QI_CC_SID(sid) (((u64)sid) << 32)
> @@ -350,16 +350,21 @@ enum {
> #define QI_PC_DID(did) (((u64)did) << 16)
> #define QI_PC_GRAN(gran) (((u64)gran) << 4)
>
> -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> +/* PASID cache invalidation granu */
> +#define QI_PC_ALL_PASIDS 0
> +#define QI_PC_PASID_SEL 1
>
> #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_EIOTLB_AM(am) (((u64)am))
> +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
>
> +/* QI Dev-IOTLB inv granu */
> +#define QI_DEV_IOTLB_GRAN_ALL 1
> +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> +
> #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> @@ -655,8 +660,16 @@ extern void qi_flush_context(struct intel_iommu
> *iommu, u16 did, u16 sid,
> u8 fm, u64 type);
> extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> unsigned int size_order, u64 type);
> +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> + u32 pasid, unsigned int size_order, u64 type, int ih);
> extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> pfsid,
> u16 qdep, u64 addr, unsigned mask);
> +
> +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order,
> u64 granu);
> +
> +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did,
> u64 granu, int pasid);
> +
> extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu
> *iommu);
>
> extern int dmar_ir_support(void);
> --
> 2.7.4

2019-10-25 19:28:48

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

> From: Jacob Pan [mailto:[email protected]]
> Sent: Friday, October 25, 2019 3:55 AM
>
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and
> driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
>
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.

specifically you may clarify that only invalidations related to
first-level page table is passed down, because it's guest
structure being bound to the first-level. other descriptors
are emulated or translated into other necessary operations.

>
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.

what is device ID? it's a bit confusing term here.

>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Ashok Raj <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 170
> ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 170 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5fab32fbc4b4..a73e76d6457a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5491,6 +5491,175 @@ static void
> intel_iommu_aux_detach_device(struct iommu_domain *domain,
> aux_domain_remove_dev(to_dmar_domain(domain), dev);
> }
>
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap
> operation
> + * as a result of DMA or VFIO unmap. However, for assigned device where
> guest
> + * could own the first level page tables without being shadowed by QEMU.
> In
> + * this case there is no pass down unmap to the host IOMMU as a result of
> unmap
> + * in the guest. Only invalidations are trapped and passed down.
> + * In all cases, only first level TLB invalidation (request with PASID) can be
> + * passed down, therefore we do not include IOTLB granularity for request
> + * without PASID (second level).
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int
> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
> U_NR] = {
> + /* PASID based IOTLB, support PASID selective and page selective */
> + {0, 1, 1},
> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
> + {1, 1, 0},

I forgot previous discussion. is it necessary to pass down dev TLB invalidation
requests? Can it be handled by host iOMMU driver automatically?

> + /* PASID cache */
> + {1, 1, 0}
> +};
> +
> +const static u64
> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
> U_NR] = {
> + /* PASID based IOTLB */
> + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> + /* PASID cache */
> + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
> +{
> + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >=
> IOMMU_INV_GRANU_NR ||
> + !inv_type_granu_map[type][granu])
> + return -EINVAL;
> +
> + *vtd_granu = inv_type_granu_table[type][granu];
> +
> + return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB,
> etc.
> + * IOMMU cache invalidate API passes granu_size in bytes, and
> number of
> + * granu size in contiguous memory.
> + */
> + return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> + struct device *dev, struct iommu_cache_invalidate_info
> *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size;
> +
> + if (!inv_info || !dmar_domain ||
> + inv_info->version !=
> IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;
> +
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + spin_lock(&iommu->lock);
> + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus,
> devfn);
> + if (!info) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + did = dmar_domain->iommu_did[iommu->seq_id];
> + sid = PCI_DEVID(bus, devfn);
> + size = to_vtd_size(inv_info->addr_info.granule_size, inv_info-
> >addr_info.nb_granules);
> +
> + for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache,
> IOMMU_CACHE_INV_TYPE_NR) {
> + u64 granu = 0;
> + u64 pasid = 0;
> +
> + ret = to_vtd_granularity(cache_type, inv_info->granularity,
> &granu);
> + if (ret) {
> + pr_err("Invalid cache type and granu
> combination %d/%d\n", cache_type,
> + inv_info->granularity);
> + break;
> + }
> +
> + /* PASID is stored in different locations based on
> granularity */
> + if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
> + pasid = inv_info->pasid_info.pasid;
> + else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> + pasid = inv_info->addr_info.pasid;
> + else {
> + pr_err("Cannot find PASID for given cache type and
> granularity\n");
> + break;
> + }
> +
> + switch (BIT(cache_type)) {
> + case IOMMU_CACHE_INV_TYPE_IOTLB:
> + if (size && (inv_info->addr_info.addr &
> ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> + pr_err("Address out of range, 0x%llx, size
> order %llu\n",
> + inv_info->addr_info.addr, size);
> + ret = -ERANGE;
> + goto out_unlock;
> + }
> +
> + qi_flush_piotlb(iommu, did,
> mm_to_dma_pfn(inv_info->addr_info.addr),
> + pasid, size, granu, inv_info-
> >addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
> +
> + /*
> + * Always flush device IOTLB if ATS is enabled since
> guest
> + * vIOMMU exposes CM = 1, no device IOTLB flush
> will be passed
> + * down.
> + */
> + if (info->ats_enabled) {
> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> + pasid, info->ats_qdep,
> + inv_info->addr_info.addr,
> size,
> + granu);
> + }
> + break;
> + case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
> + if (info->ats_enabled) {
> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> + inv_info->addr_info.pasid,
> info->ats_qdep,
> + inv_info->addr_info.addr,
> size,
> + granu);
> + } else
> + pr_warn("Passdown device IOTLB flush w/o
> ATS!\n");
> +
> + break;
> + case IOMMU_CACHE_INV_TYPE_PASID:
> + qi_flush_pasid_cache(iommu, did, granu, inv_info-
> >pasid_info.pasid);
> +
> + break;
> + default:
> + dev_err(dev, "Unsupported IOMMU invalidation
> type %d\n",
> + cache_type);
> + ret = -EINVAL;
> + }
> + }
> +out_unlock:
> + spin_unlock(&iommu->lock);
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + return ret;
> +}
> +#endif
> +
> static int intel_iommu_map(struct iommu_domain *domain,
> unsigned long iova, phys_addr_t hpa,
> size_t size, int iommu_prot)
> @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
> .is_attach_deferred = intel_iommu_is_attach_deferred,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> #ifdef CONFIG_INTEL_IOMMU_SVM
> + .cache_invalidate = intel_iommu_sva_invalidate,
> .sva_bind_gpasid = intel_svm_bind_gpasid,
> .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> #endif
> --
> 2.7.4

2019-10-25 20:39:37

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

Hi,

On 10/25/19 3:55 AM, Jacob Pan wrote:
> Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> With PASID granular translation type set to 0x11b, translation
> result from the first level(FL) also subject to a second level(SL)
> page table translation. This mode is used for SVA virtualization,
> where FL performs guest virtual to guest physical translation and
> SL performs guest physical to host physical translation.
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 207 ++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 12 +++
> 2 files changed, 219 insertions(+)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index ffbd416ed3b8..f846a907cfcf 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
> pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> }
>
> +/*
> + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emt(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
> +}
> +
> +/*
> + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pat(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);

Should be "value << 32", right?

> +}
> +
> +/*
> + * Setup the Cache Disable (CD) field (Bit 89)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_cd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 25, 1);
> +}
> +
> +/*
> + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emte(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> +}
> +
> +/*
> + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_eafe(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 7, 1);
> +}
> +
> +/*
> + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pcd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> +}
> +
> +/*
> + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pwt(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> +}
> +
> static void
> pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> u16 did, int pasid)
> @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>
> return 0;
> }
> +
> +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + struct iommu_gpasid_bind_data_vtd *pasid_data)
> +{
> + /*
> + * Not all guest PASID table entry fields are passed down during bind,
> + * here we only set up the ones that are dependent on guest settings.
> + * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
> + * therefore not set. Other fields, such as snoop related, are set based
> + * on host needs regardless of guest settings.
> + */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> + if (!ecap_srs(iommu->ecap)) {
> + pr_err("No supervisor request support on %s\n",
> + iommu->name);
> + return -EINVAL;
> + }
> + pasid_set_sre(pte);
> + }
> +
> + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && ecap_eafs(iommu->ecap))
> + pasid_set_eafe(pte);
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> + pasid_set_emte(pte);
> + pasid_set_emt(pte, pasid_data->emt);
> + }
> +
> + /*
> + * Memory type is only applicable to devices inside processor coherent
> + * domain. PCIe devices are not included. We can skip the rest of the
> + * flags if IOMMU does not support MTS.
> + */
> + if (!ecap_mts(iommu->ecap)) {
> + pr_info("%s does not support memory type bind guest PASID\n",
> + iommu->name);
> + return 0;
> + }

How about making below lines as

if (ecap_mts(iommu->ecap)) {
do various pasid entry settings
} else {
pr_info(...);
}

Otherwise, when someone later adds code at the end of this function. It
might be ignored by above return 0.

Best regards,
baolu

> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> + pasid_set_pcd(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> + pasid_set_pwt(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> + pasid_set_cd(pte);
> + pasid_set_pat(pte, pasid_data->pat);
> +
> + return 0;
> +
> +}
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> + * which is used for vSVA. The first level page tables are used for
> + * GVA-GPA translation in the guest, second level page tables are used
> + * for GPA to HPA translation.
> + *
> + * @iommu: Iommu which the device belong to
> + * @dev: Device to be set up for translation
> + * @gpgd: FLPTPTR: First Level Page translation pointer in GPA
> + * @pasid: PASID to be programmed in the device PASID table
> + * @pasid_data: Additional PASID info from the guest bind request
> + * @domain: Domain info for setting up second level page tables
> + * @addr_width: Address width of the first level (guest)
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *gpgd,
> + int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width)
> +{
> + struct pasid_entry *pte;
> + struct dma_pte *pgd;
> + u64 pgd_val;
> + int agaw;
> + u16 did;
> +
> + if (!ecap_nest(iommu->ecap)) {
> + pr_err("IOMMU: %s: No nested translation support\n",
> + iommu->name);
> + return -EINVAL;
> + }
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte))
> + return -EINVAL;
> +
> + pasid_clear_entry(pte);
> +
> + /* Sanity checking performed by caller to make sure address
> + * width matching in two dimensions:
> + * 1. CPU vs. IOMMU
> + * 2. Guest vs. Host.
> + */
> + switch (addr_width) {
> + case 57:
> + pasid_set_flpm(pte, 1);
> + break;
> + case 48:
> + pasid_set_flpm(pte, 0);
> + break;
> + default:
> + dev_err(dev, "Invalid paging mode %d\n", addr_width);
> + return -EINVAL;
> + }
> +
> + pasid_set_flptr(pte, (u64)gpgd);
> +
> + intel_pasid_setup_bind_data(iommu, pte, pasid_data);
> +
> + /* Setup the second level based on the given domain */
> + pgd = domain->pgd;
> +
> + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> + pgd = phys_to_virt(dma_pte_addr(pgd));
> + if (!dma_pte_present(pgd)) {
> + dev_err(dev, "Invalid domain page table\n");
> + return -EINVAL;
> + }
> + }
> + pgd_val = virt_to_phys(pgd);
> + pasid_set_slptr(pte, pgd_val);
> + pasid_set_fault_enable(pte);
> +
> + did = domain->iommu_did[iommu->seq_id];
> + pasid_set_domain_id(pte, did);
> +
> + pasid_set_address_width(pte, agaw);
> + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> + pasid_set_present(pte);
> + pasid_flush_caches(iommu, pte, pasid, did);
> +
> + return 0;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index e413e884e685..09c85db73b77 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -46,6 +46,7 @@
> * to vmalloc or even module mappings.
> */
> #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
> +#define PASID_FLAG_NESTED BIT(1)
>
> struct pasid_dir_entry {
> u64 val;
> @@ -55,6 +56,11 @@ struct pasid_entry {
> u64 val[8];
> };
>
> +#define PASID_ENTRY_PGTT_FL_ONLY (1)
> +#define PASID_ENTRY_PGTT_SL_ONLY (2)
> +#define PASID_ENTRY_PGTT_NESTED (3)
> +#define PASID_ENTRY_PGTT_PT (4)
> +
> /* The representative of a PASID table */
> struct pasid_table {
> void *table; /* pasid table pointer */
> @@ -103,6 +109,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> struct dmar_domain *domain,
> struct device *dev, int pasid);
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *pgd,
> + int pasid,
> + struct iommu_gpasid_bind_data_vtd *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width);
> void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
>

2019-10-25 20:39:53

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

Hi,

On 10/25/19 2:40 PM, Tian, Kevin wrote:
>>>> ioasid_register_allocator(&iommu->pasid_allocator);
>>>> + if (ret) {
>>>> + pr_warn("Custom PASID allocator
>>>> registeration failed\n");
>>>> + /*
>>>> + * Disable scalable mode on this
>>>> IOMMU if there
>>>> + * is no custom allocator. Mixing
>>>> SM capable vIOMMU
>>>> + * and non-SM vIOMMU are not
>>>> supported.
>>>> + */
>>>> + intel_iommu_sm = 0;
>>> It's insufficient to disable scalable mode by only clearing
>>> intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has already
>> been
>>> set. Probably, you need to
>>>
>>> for each iommu
>>> clear DMA_RTADDR_SMT in root entry
>>>
>>> Alternatively, since vSVA is the only customer of this custom PASID
>>> allocator, is it possible to only disable SVA here?
>>>
>> Yeah, I think disable SVA is better. We can still do gIOVA in SM. I
>> guess we need to introduce a flag for sva_enabled.
> I'm not sure whether tying above logic to SVA is the right approach.
> If vcmd interface doesn't work, the whole SM mode doesn't make
> sense which is based on PASID-granular protection (SVA is only one
> usage atop). If the only remaining usage of SM is to map gIOVA using
> reserved PASID#0, then why not disabling SM and just fallback to
> legacy mode?
>
> Based on that I prefer to disabling the SM mode completely (better
> through an interface), and move the logic out of CONFIG_INTEL_
> IOMMU_SVM
>

Unfortunately, it is dangerous to disable SM after boot. SM uses
different root/device contexts and pasid table formats. Disabling SM
after boot requires changing above from SM format into legacy format.

Since ioasid registration failure is a rare case. How about moving this
part of code up to the early stage of intel_iommu_init() and returning
error if hardware present vcmd capability but software fails to register
a custom ioasid allocator?

Best regards,
baolu

2019-10-25 20:43:58

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

> From: Lu Baolu [mailto:[email protected]]
> Sent: Friday, October 25, 2019 10:39 PM
>
> Hi,
>
> On 10/25/19 2:40 PM, Tian, Kevin wrote:
> >>>> ioasid_register_allocator(&iommu->pasid_allocator);
> >>>> + if (ret) {
> >>>> + pr_warn("Custom PASID allocator
> >>>> registeration failed\n");
> >>>> + /*
> >>>> + * Disable scalable mode on this
> >>>> IOMMU if there
> >>>> + * is no custom allocator. Mixing
> >>>> SM capable vIOMMU
> >>>> + * and non-SM vIOMMU are not
> >>>> supported.
> >>>> + */
> >>>> + intel_iommu_sm = 0;
> >>> It's insufficient to disable scalable mode by only clearing
> >>> intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has already
> >> been
> >>> set. Probably, you need to
> >>>
> >>> for each iommu
> >>> clear DMA_RTADDR_SMT in root entry
> >>>
> >>> Alternatively, since vSVA is the only customer of this custom PASID
> >>> allocator, is it possible to only disable SVA here?
> >>>
> >> Yeah, I think disable SVA is better. We can still do gIOVA in SM. I
> >> guess we need to introduce a flag for sva_enabled.
> > I'm not sure whether tying above logic to SVA is the right approach.
> > If vcmd interface doesn't work, the whole SM mode doesn't make
> > sense which is based on PASID-granular protection (SVA is only one
> > usage atop). If the only remaining usage of SM is to map gIOVA using
> > reserved PASID#0, then why not disabling SM and just fallback to
> > legacy mode?
> >
> > Based on that I prefer to disabling the SM mode completely (better
> > through an interface), and move the logic out of CONFIG_INTEL_
> > IOMMU_SVM
> >
>
> Unfortunately, it is dangerous to disable SM after boot. SM uses
> different root/device contexts and pasid table formats. Disabling SM
> after boot requires changing above from SM format into legacy format.

You are correct.

>
> Since ioasid registration failure is a rare case. How about moving this
> part of code up to the early stage of intel_iommu_init() and returning
> error if hardware present vcmd capability but software fails to register
> a custom ioasid allocator?
>

It makes sense to me.

Thanks
Kevin

2019-10-25 20:45:10

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

On Fri, 25 Oct 2019 23:04:48 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/25/19 3:55 AM, Jacob Pan wrote:
> > Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> > With PASID granular translation type set to 0x11b, translation
> > result from the first level(FL) also subject to a second level(SL)
> > page table translation. This mode is used for SVA virtualization,
> > where FL performs guest virtual to guest physical translation and
> > SL performs guest physical to host physical translation.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-pasid.c | 207
> > ++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.h | 12 +++ 2 files changed, 219
> > insertions(+)
> >
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index ffbd416ed3b8..f846a907cfcf
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64
> > value) pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> > }
> >
> > +/*
> > + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emt(struct pasid_entry *pe, u64 value)
> > +{
> > + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value <<
> > 27); +}
> > +
> > +/*
> > + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pat(struct pasid_entry *pe, u64 value)
> > +{
> > + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value <<
> > 27);
>
> Should be "value << 32", right?
>
> > +}
> > +
> > +/*
> > + * Setup the Cache Disable (CD) field (Bit 89)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_cd(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 25, 1);
> > +}
> > +
> > +/*
> > + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emte(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> > +}
> > +
> > +/*
> > + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_eafe(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[2], 1 << 7, 1);
> > +}
> > +
> > +/*
> > + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pcd(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> > +}
> > +
> > +/*
> > + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pwt(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> > +}
> > +
> > static void
> > pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> > u16 did, int pasid)
> > @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu,
> > return 0;
> > }
> > +
> > +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> > + struct pasid_entry *pte,
> > + struct iommu_gpasid_bind_data_vtd
> > *pasid_data) +{
> > + /*
> > + * Not all guest PASID table entry fields are passed down
> > during bind,
> > + * here we only set up the ones that are dependent on
> > guest settings.
> > + * Execution related bits such as NXE, SMEP are not
> > meaningful to IOMMU,
> > + * therefore not set. Other fields, such as snoop related,
> > are set based
> > + * on host needs regardless of guest settings.
> > + */
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> > + if (!ecap_srs(iommu->ecap)) {
> > + pr_err("No supervisor request support on
> > %s\n",
> > + iommu->name);
> > + return -EINVAL;
> > + }
> > + pasid_set_sre(pte);
> > + }
> > +
> > + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) &&
> > ecap_eafs(iommu->ecap))
> > + pasid_set_eafe(pte);
> > +
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> > + pasid_set_emte(pte);
> > + pasid_set_emt(pte, pasid_data->emt);
> > + }
> > +
> > + /*
> > + * Memory type is only applicable to devices inside
> > processor coherent
> > + * domain. PCIe devices are not included. We can skip the
> > rest of the
> > + * flags if IOMMU does not support MTS.
> > + */
> > + if (!ecap_mts(iommu->ecap)) {
> > + pr_info("%s does not support memory type bind
> > guest PASID\n",
> > + iommu->name);
> > + return 0;
> > + }
>
> How about making below lines as
>
> if (ecap_mts(iommu->ecap)) {
> do various pasid entry settings
> } else {
> pr_info(...);
> }
>
> Otherwise, when someone later adds code at the end of this function.
> It might be ignored by above return 0.
>
Sounds good, I like the positive logic and more readable this way.

Thanks,

Jacob
> Best regards,
> baolu
>
> > +
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> > + pasid_set_pcd(pte);
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> > + pasid_set_pwt(pte);
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> > + pasid_set_cd(pte);
> > + pasid_set_pat(pte, pasid_data->pat);
> > +
> > + return 0;
> > +
> > +}
> > +
> > +/**
> > + * intel_pasid_setup_nested() - Set up PASID entry for nested
> > translation
> > + * which is used for vSVA. The first level page tables are used for
> > + * GVA-GPA translation in the guest, second level page tables are
> > used
> > + * for GPA to HPA translation.
> > + *
> > + * @iommu: Iommu which the device belong to
> > + * @dev: Device to be set up for translation
> > + * @gpgd: FLPTPTR: First Level Page translation pointer in
> > GPA
> > + * @pasid: PASID to be programmed in the device PASID table
> > + * @pasid_data: Additional PASID info from the guest bind request
> > + * @domain: Domain info for setting up second level page tables
> > + * @addr_width: Address width of the first level (guest)
> > + */
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > + struct device *dev, pgd_t *gpgd,
> > + int pasid, struct
> > iommu_gpasid_bind_data_vtd *pasid_data,
> > + struct dmar_domain *domain,
> > + int addr_width)
> > +{
> > + struct pasid_entry *pte;
> > + struct dma_pte *pgd;
> > + u64 pgd_val;
> > + int agaw;
> > + u16 did;
> > +
> > + if (!ecap_nest(iommu->ecap)) {
> > + pr_err("IOMMU: %s: No nested translation
> > support\n",
> > + iommu->name);
> > + return -EINVAL;
> > + }
> > +
> > + pte = intel_pasid_get_entry(dev, pasid);
> > + if (WARN_ON(!pte))
> > + return -EINVAL;
> > +
> > + pasid_clear_entry(pte);
> > +
> > + /* Sanity checking performed by caller to make sure address
> > + * width matching in two dimensions:
> > + * 1. CPU vs. IOMMU
> > + * 2. Guest vs. Host.
> > + */
> > + switch (addr_width) {
> > + case 57:
> > + pasid_set_flpm(pte, 1);
> > + break;
> > + case 48:
> > + pasid_set_flpm(pte, 0);
> > + break;
> > + default:
> > + dev_err(dev, "Invalid paging mode %d\n",
> > addr_width);
> > + return -EINVAL;
> > + }
> > +
> > + pasid_set_flptr(pte, (u64)gpgd);
> > +
> > + intel_pasid_setup_bind_data(iommu, pte, pasid_data);
> > +
> > + /* Setup the second level based on the given domain */
> > + pgd = domain->pgd;
> > +
> > + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> > + pgd = phys_to_virt(dma_pte_addr(pgd));
> > + if (!dma_pte_present(pgd)) {
> > + dev_err(dev, "Invalid domain page
> > table\n");
> > + return -EINVAL;
> > + }
> > + }
> > + pgd_val = virt_to_phys(pgd);
> > + pasid_set_slptr(pte, pgd_val);
> > + pasid_set_fault_enable(pte);
> > +
> > + did = domain->iommu_did[iommu->seq_id];
> > + pasid_set_domain_id(pte, did);
> > +
> > + pasid_set_address_width(pte, agaw);
> > + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> > +
> > + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> > + pasid_set_present(pte);
> > + pasid_flush_caches(iommu, pte, pasid, did);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index e413e884e685..09c85db73b77
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -46,6 +46,7 @@
> > * to vmalloc or even module mappings.
> > */
> > #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
> > +#define PASID_FLAG_NESTED BIT(1)
> >
> > struct pasid_dir_entry {
> > u64 val;
> > @@ -55,6 +56,11 @@ struct pasid_entry {
> > u64 val[8];
> > };
> >
> > +#define PASID_ENTRY_PGTT_FL_ONLY (1)
> > +#define PASID_ENTRY_PGTT_SL_ONLY (2)
> > +#define PASID_ENTRY_PGTT_NESTED (3)
> > +#define PASID_ENTRY_PGTT_PT (4)
> > +
> > /* The representative of a PASID table */
> > struct pasid_table {
> > void *table; /*
> > pasid table pointer */ @@ -103,6 +109,12 @@ int
> > intel_pasid_setup_second_level(struct intel_iommu *iommu, int
> > intel_pasid_setup_pass_through(struct intel_iommu *iommu, struct
> > dmar_domain *domain, struct device *dev, int pasid);
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > + struct device *dev, pgd_t *pgd,
> > + int pasid,
> > + struct iommu_gpasid_bind_data_vtd
> > *pasid_data,
> > + struct dmar_domain *domain,
> > + int addr_width);
> > void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, int pasid);
> > int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid);

[Jacob Pan]

2019-10-25 20:47:42

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi Kevin,


On Fri, 25 Oct 2019 07:19:26 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
>
> will be translated into binding/unbinding guest PASID calls to update
> the host shadow PASID table.
>
yours is more precise, will replace.
> >
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
>
> harvested -> copied or linked to?
Kind of the same, but I agree copied is more technical and precise
term. Will change.

> >
> > .-------------. .---------------------------.
> > | vIOMMU | | Guest process CR3, FL only|
> > | | '---------------------------'
> > .----------------/
> > | PASID Entry |--- PASID cache flush -
> > '-------------' |
> > | | V
> > | | CR3 in GPA
> > '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> > v v v
> > Host
> > .-------------. .----------------------.
> > | pIOMMU | | Bind FL for GVA-GPA |
> > | | '----------------------'
> > .----------------/ |
> > | PASID Entry | V (Nested xlate)
> > '----------------\.------------------------------.
> > | | |SL for GPA-HPA, default domain|
> > | | '------------------------------'
> > '-------------'
> > Where:
> > - FL = First level/stage one page tables
> > - SL = Second level/stage two page tables
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 4 +
> > drivers/iommu/intel-svm.c | 184
> > ++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/intel-iommu.h | 8 +-
> > include/linux/intel-svm.h | 17 ++++
> > 4 files changed, 212 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index acd1ac787d8b..5fab32fbc4b4
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops = {
> > .dev_disable_feat = intel_iommu_dev_disable_feat,
> > .is_attach_deferred =
> > intel_iommu_is_attach_deferred, .pgsize_bitmap =
> > INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
> > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > +#endif
>
> again, pure PASID management logic should be separated from SVM.
>
I am not following, these two functions are SVM functionality, not
pure PASID management which is already separated in ioasid.c

> > };
> >
> > static void quirk_iommu_igfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index a18b02a9709d..ae13a310cf96 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry(sdev, &svm->devs, list) \
> > if (dev == sdev->dev) \
> >
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > + struct device *dev,
> > + struct iommu_gpasid_bind_data *data)
> > +{
> > + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > + struct dmar_domain *ddomain;
> > + struct intel_svm_dev *sdev;
> > + struct intel_svm *svm;
> > + int ret = 0;
> > +
> > + if (WARN_ON(!iommu) || !data)
> > + return -EINVAL;
> > +
> > + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > + return -EINVAL;
> > +
> > + if (dev_is_pci(dev)) {
> > + /* VT-d supports devices with full 20 bit PASIDs
> > only */
> > + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > + return -EINVAL;
> > + }
>
> what about non-pci devices? It just moves forward w/o any check here?
>
Good catch, we only support PCI-device on Intel. Even mdev has to pass
the pdev to bind. Will add the else case.

> > +
> > + /*
> > + * We only check host PASID range, we have no knowledge to
> > check
> > + * guest PASID range nor do we use the guest PASID.
> > + */
> > + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > + return -EINVAL;
> > +
> > + ddomain = to_dmar_domain(domain);
> > + /* REVISIT:
> > + * Sanity check adddress width and paging mode support
> > + * width matching in two dimensions:
> > + * 1. paging mode CPU <= IOMMU
> > + * 2. address width Guest <= Host.
> > + */
>
> Is lacking of above logic harmful? If not, we should add
>
It is better to add the check now, not solely rely on QEMU.

> > + mutex_lock(&pasid_mutex);
> > + svm = ioasid_find(NULL, data->hpasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > + goto out;
> > + }
> > + if (svm) {
> > + /*
> > + * If we found svm for the PASID, there must be at
> > + * least one device bond, otherwise svm should be
> > freed.
> > + */
> > + BUG_ON(list_empty(&svm->devs));
> > +
> > + for_each_svm_dev(svm, dev) {
> > + /* In case of multiple sub-devices of the
> > same pdev assigned, we should
> > + * allow multiple bind calls with the same
> > PASID and pdev.
> > + */
> > + sdev->users++;
> > + goto out;
>
> sorry if I overlooked, but I didn't see any check on the PASID
> actually belonging to this process. At least should check the
> match between svm->mm and get_task_mm? also check
> whether a previous binding between this hpasid and gpasid
> already exists.
>
We had some discussions on whom should be responsible for checking
ownership. I tend to think VFIO is right place but I guess we can also
double check here.
Good point, we should check the same H-G PASID bind already exists.
> > + }
> > + } else {
> > + /* We come here when PASID has never been bond to a
> > device. */
> > + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > + if (!svm) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + /* REVISIT: upper layer/VFIO can track host
> > process that bind the PASID.
> > + * ioasid_set = mm might be sufficient for vfio to
> > check pasid VMM
> > + * ownership.
> > + */
>
> Is it correct to leave the check to the caller?
>
Ditto, we will double check. But since this is related to the guest, I
feel iommu driver check mm might be too restrictive. I am not sure if
any VMM could have more than one process? One process does alloc, the
other does bind.

> > + svm->mm = get_task_mm(current);
> > + svm->pasid = data->hpasid;
> > + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > + svm->gpasid = data->gpasid;
> > + svm->flags |= SVM_FLAG_GUEST_PASID;
> > + }
> > + ioasid_set_data(data->hpasid, svm);
> > + INIT_LIST_HEAD_RCU(&svm->devs);
> > + INIT_LIST_HEAD(&svm->list);
> > +
> > + mmput(svm->mm);
> > + }
> > + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > + if (!sdev) {
> > + if (list_empty(&svm->devs))
> > + kfree(svm);
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + sdev->dev = dev;
> > + sdev->users = 1;
> > +
> > + /* Set up device context entry for PASID if not enabled
> > already */
> > + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > + if (ret) {
> > + dev_err(dev, "Failed to enable PASID
> > capability\n");
> > + kfree(sdev);
> > + goto out;
> > + }
> > +
> > + /*
> > + * For guest bind, we need to set up PASID table entry as
> > follows:
> > + * - FLPM matches guest paging mode
> > + * - turn on nested mode
> > + * - SL guest address width matching
> > + */
> > + ret = intel_pasid_setup_nested(iommu,
> > + dev,
> > + (pgd_t *)data->gpgd,
> > + data->hpasid,
> > + &data->vtd,
> > + ddomain,
> > + data->addr_width);
> > + if (ret) {
> > + dev_err(dev, "Failed to set up PASID %llu in
> > nested mode, Err %d\n",
> > + data->hpasid, ret);
> > + kfree(sdev);
>
> disable pasid? revert ioasid_set_data?
>
Good catch, will do.
> > + goto out;
> > + }
> > + svm->flags |= SVM_FLAG_GUEST_MODE;
> > +
> > + init_rcu_head(&sdev->rcu);
> > + list_add_rcu(&sdev->list, &svm->devs);
> > + out:
> > + mutex_unlock(&pasid_mutex);
> > + return ret;
> > +}
> > +
> > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > + struct intel_svm_dev *sdev;
> > + struct intel_iommu *iommu;
> > + struct intel_svm *svm;
> > + int ret = -EINVAL;
> > +
> > + mutex_lock(&pasid_mutex);
> > + iommu = intel_svm_device_to_iommu(dev);
> > + if (!iommu)
> > + goto out;
> > +
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR_OR_NULL(svm)) {
> > + ret = PTR_ERR(svm);
> > + goto out;
> > + }
> > +
> > + for_each_svm_dev(svm, dev) {
> > + ret = 0;
> > + sdev->users--;
> > + if (!sdev->users) {
> > + list_del_rcu(&sdev->list);
> > + intel_pasid_tear_down_entry(iommu, dev,
> > svm-
> > >pasid);
> > + /* TODO: Drain in flight PRQ for the PASID
> > since it
> > + * may get reused soon, we don't want to
> > + * confuse with its previous life.
> > + * intel_svm_drain_prq(dev, pasid);
> > + */
> > + kfree_rcu(sdev, rcu);
> > +
> > + if (list_empty(&svm->devs)) {
> > + list_del(&svm->list);
> > + kfree(svm);
> > + /*
> > + * We do not free PASID here until
> > explicit call
> > + * from VFIO to free. The PASID
> > life cycle
> > + * management is largely tied to
> > VFIO management
> > + * of assigned device life cycles.
> > In case of
> > + * guest exit without a explicit
> > free PASID call,
> > + * the responsibility lies in VFIO
> > layer to free
> > + * the PASIDs allocated for the
> > guest.
> > + * For security reasons, VFIO has
> > to track the
> > + * PASID ownership per guest
> > anyway to ensure
> > + * that PASID allocated by one
> > guest cannot be
> > + * used by another.
> > + */
> > + ioasid_set_data(pasid, NULL);
> > + }
> > + }
> > + break;
> > + }
> > + out:
> > + mutex_unlock(&pasid_mutex);
> > +
> > + return ret;
> > +}
> > +
> > int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > struct svm_dev_ops *ops)
> > {
> > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 3dba6ad3e9ad..6c74c71b1ebf
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu
> > *iommu, struct device *dev);
> > int intel_svm_init(struct intel_iommu *iommu);
> > extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> > extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -
> > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > + struct device *dev, struct iommu_gpasid_bind_data
> > *data); +extern int intel_svm_unbind_gpasid(struct device *dev, int
> > pasid); struct svm_dev_ops;
> >
> > struct intel_svm_dev {
> > @@ -690,9 +692,13 @@ struct intel_svm_dev {
> > struct intel_svm {
> > struct mmu_notifier notifier;
> > struct mm_struct *mm;
> > +
> > struct intel_iommu *iommu;
> > int flags;
> > int pasid;
> > + int gpasid; /* Guest PASID in case of vSVA bind with
> > non-identity host
> > + * to guest PASID mapping.
> > + */
> > struct list_head devs;
> > struct list_head list;
> > };
> > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > index 94f047a8a845..a2c189ad0b01 100644
> > --- a/include/linux/intel-svm.h
> > +++ b/include/linux/intel-svm.h
> > @@ -44,6 +44,23 @@ struct svm_dev_ops {
> > * do such IOTLB flushes automatically.
> > */
> > #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> > +/*
> > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> > to a device.
> > + * In this case the mm_struct is in the guest kernel or userspace,
> > its life
> > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> > API provides
> > + * means to bind/unbind guest CR3 with PASIDs allocated for a
> > device.
> > + */
> > +#define SVM_FLAG_GUEST_MODE (1<<2)
> > +/*
> > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> > PASID space,
> > + * which requires guest and host PASID translation at both
> > directions. We keep
> > + * track of guest PASID in order to provide lookup service to
> > device drivers.
> > + * One such example is a physical function (PF) driver that
> > supports mediated
> > + * device (mdev) assignment. Guest programming of mdev
> > configuration space can
> > + * only be done with guest PASID, therefore PF driver needs to
> > find the matching
> > + * host PASID to program the real hardware.
> > + */
> > +#define SVM_FLAG_GUEST_PASID (1<<3)
> >
> > #ifdef CONFIG_INTEL_IOMMU_SVM
> >
> > --
> > 2.7.4
>

[Jacob Pan]

2019-10-26 01:10:05

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 08/11] iommu/vt-d: Misc macro clean up for SVM

Hi,

On 10/25/19 3:55 AM, Jacob Pan wrote:
> Use combined macros for_each_svm_dev() to simplify SVM device iteration
> and error checking.
>
> Suggested-by: Andy Shevchenko <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> Reviewed-by: Eric Auger <[email protected]>
> ---
> drivers/iommu/intel-svm.c | 89 ++++++++++++++++++++++-------------------------
> 1 file changed, 42 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index a9a7f85a09bc..a18b02a9709d 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -212,6 +212,10 @@ static const struct mmu_notifier_ops intel_mmuops = {
> static DEFINE_MUTEX(pasid_mutex);
> static LIST_HEAD(global_svm_list);
>
> +#define for_each_svm_dev(svm, dev) \
> + list_for_each_entry(sdev, &svm->devs, list) \
> + if (dev == sdev->dev) \
> +
> int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
> {
> struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> @@ -257,15 +261,13 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> goto out;
> }
>
> - list_for_each_entry(sdev, &svm->devs, list) {
> - if (dev == sdev->dev) {
> - if (sdev->ops != ops) {
> - ret = -EBUSY;
> - goto out;
> - }
> - sdev->users++;
> - goto success;
> + for_each_svm_dev(svm, dev) {
> + if (sdev->ops != ops) {
> + ret = -EBUSY;
> + goto out;
> }
> + sdev->users++;
> + goto success;
> }
>
> break;
> @@ -402,50 +404,43 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> goto out;
>
> svm = ioasid_find(NULL, pasid, NULL);
> - if (IS_ERR(svm)) {
> + if (IS_ERR_OR_NULL(svm)) {
> ret = PTR_ERR(svm);
> goto out;
> }
>
> - if (!svm)
> - goto out;

If svm == NULL here, this function will return success. This isn't
expected, right?

Others looks good to me.

Reviewed-by: Lu Baolu <[email protected]>

Best regards,
baolu

> -
> - list_for_each_entry(sdev, &svm->devs, list) {
> - if (dev == sdev->dev) {
> - ret = 0;
> - sdev->users--;
> - if (!sdev->users) {
> - list_del_rcu(&sdev->list);
> - /* Flush the PASID cache and IOTLB for this device.
> - * Note that we do depend on the hardware *not* using
> - * the PASID any more. Just as we depend on other
> - * devices never using PASIDs that they have no right
> - * to use. We have a *shared* PASID table, because it's
> - * large and has to be physically contiguous. So it's
> - * hard to be as defensive as we might like. */
> - intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> - intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> - kfree_rcu(sdev, rcu);
> -
> - if (list_empty(&svm->devs)) {
> - /* Clear private data so that free pass check */
> - ioasid_set_data(svm->pasid, NULL);
> - ioasid_free(svm->pasid);
> - if (svm->mm)
> - mmu_notifier_unregister(&svm->notifier, svm->mm);
> -
> - list_del(&svm->list);
> -
> - /* We mandate that no page faults may be outstanding
> - * for the PASID when intel_svm_unbind_mm() is called.
> - * If that is not obeyed, subtle errors will happen.
> - * Let's make them less subtle... */
> - memset(svm, 0x6b, sizeof(*svm));
> - kfree(svm);
> - }
> + for_each_svm_dev(svm, dev) {
> + ret = 0;
> + sdev->users--;
> + if (!sdev->users) {
> + list_del_rcu(&sdev->list);
> + /* Flush the PASID cache and IOTLB for this device.
> + * Note that we do depend on the hardware *not* using
> + * the PASID any more. Just as we depend on other
> + * devices never using PASIDs that they have no right
> + * to use. We have a *shared* PASID table, because it's
> + * large and has to be physically contiguous. So it's
> + * hard to be as defensive as we might like. */
> + intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> + intel_flush_svm_range_dev(svm, sdev, 0, -1, 0);
> + kfree_rcu(sdev, rcu);
> +
> + if (list_empty(&svm->devs)) {
> + /* Clear private data so that free pass check */
> + ioasid_set_data(svm->pasid, NULL);
> + ioasid_free(svm->pasid);
> + if (svm->mm)
> + mmu_notifier_unregister(&svm->notifier, svm->mm);
> + list_del(&svm->list);
> + /* We mandate that no page faults may be outstanding
> + * for the PASID when intel_svm_unbind_mm() is called.
> + * If that is not obeyed, subtle errors will happen.
> + * Let's make them less subtle... */
> + memset(svm, 0x6b, sizeof(*svm));
> + kfree(svm);
> }
> - break;
> }
> + break;
> }
> out:
> mutex_unlock(&pasid_mutex);
> @@ -581,7 +576,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
> * to unbind the mm while any page faults are outstanding.
> * So we only need RCU to protect the internal idr code. */
> rcu_read_unlock();
> - if (IS_ERR(svm) || !svm) {
> + if (IS_ERR_OR_NULL(svm)) {
> pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
> iommu->name, req->pasid, ((unsigned long long *)req)[0],
> ((unsigned long long *)req)[1]);
>

2019-10-26 02:05:00

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi,

On 10/25/19 3:55 AM, Jacob Pan wrote:
> When supporting guest SVA with emulated IOMMU, the guest PASID
> table is shadowed in VMM. Updates to guest vIOMMU PASID table
> will result in PASID cache flush which will be passed down to
> the host as bind guest PASID calls.
>
> For the SL page tables, it will be harvested from device's
> default domain (request w/o PASID), or aux domain in case of
> mediated device.
>
> .-------------. .---------------------------.
> | vIOMMU | | Guest process CR3, FL only|
> | | '---------------------------'
> .----------------/
> | PASID Entry |--- PASID cache flush -
> '-------------' |
> | | V
> | | CR3 in GPA
> '-------------'
> Guest
> ------| Shadow |--------------------------|--------
> v v v
> Host
> .-------------. .----------------------.
> | pIOMMU | | Bind FL for GVA-GPA |
> | | '----------------------'
> .----------------/ |
> | PASID Entry | V (Nested xlate)
> '----------------\.------------------------------.
> | | |SL for GPA-HPA, default domain|
> | | '------------------------------'
> '-------------'
> Where:
> - FL = First level/stage one page tables
> - SL = Second level/stage two page tables
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 4 +
> drivers/iommu/intel-svm.c | 184 ++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 8 +-
> include/linux/intel-svm.h | 17 ++++
> 4 files changed, 212 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index acd1ac787d8b..5fab32fbc4b4 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops = {
> .dev_disable_feat = intel_iommu_dev_disable_feat,
> .is_attach_deferred = intel_iommu_is_attach_deferred,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + .sva_bind_gpasid = intel_svm_bind_gpasid,
> + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> +#endif
> };
>
> static void quirk_iommu_igfx(struct pci_dev *dev)
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index a18b02a9709d..ae13a310cf96 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
> list_for_each_entry(sdev, &svm->devs, list) \
> if (dev == sdev->dev) \

Add an indent tab please.

>
> +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev,
> + struct iommu_gpasid_bind_data *data)
> +{
> + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> + struct dmar_domain *ddomain;
> + struct intel_svm_dev *sdev;
> + struct intel_svm *svm;
> + int ret = 0;
> +
> + if (WARN_ON(!iommu) || !data)
> + return -EINVAL;
> +
> + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)

Alignment should match open parenthesis.

Run "scripts/checkpatch.pl --strict" for all in this patch. I will
ignore others.

> + return -EINVAL;
> +
> + if (dev_is_pci(dev)) {
> + /* VT-d supports devices with full 20 bit PASIDs only */
> + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> + return -EINVAL;
> + }
> +
> + /*
> + * We only check host PASID range, we have no knowledge to check
> + * guest PASID range nor do we use the guest PASID.
> + */
> + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> + return -EINVAL;
> +
> + ddomain = to_dmar_domain(domain);
> + /* REVISIT:
> + * Sanity check adddress width and paging mode support

s/adddress/address/g

> + * width matching in two dimensions:
> + * 1. paging mode CPU <= IOMMU
> + * 2. address width Guest <= Host.
> + */ > + mutex_lock(&pasid_mutex);
> + svm = ioasid_find(NULL, data->hpasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }

A blank line looks better.

> + if (svm) {
> + /*
> + * If we found svm for the PASID, there must be at
> + * least one device bond, otherwise svm should be freed.
> + */
> + BUG_ON(list_empty(&svm->devs));

Avoid crashing kernel, use WARN_ON() instead.

if (WARN_ON(list_empty(&svm->devs))) {
ret = -EINVAL;
goto out;
}

> +
> + for_each_svm_dev(svm, dev) {
> + /* In case of multiple sub-devices of the same pdev assigned, we should

Make line shorter. Not over 80 characters.

The same for other lines.

> + * allow multiple bind calls with the same PASID and pdev.
> + */
> + sdev->users++;
> + goto out;
> + }

I remember I ever pointed this out before. But I forgot how we addressed
it. So forgive me if this has been addressed.

What if we have a valid bound svm but @dev doesn't belong to it
(a.k.a. @dev not in svm->devs list)?

> + } else {
> + /* We come here when PASID has never been bond to a device. */
> + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> + if (!svm) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + /* REVISIT: upper layer/VFIO can track host process that bind the PASID.
> + * ioasid_set = mm might be sufficient for vfio to check pasid VMM
> + * ownership.
> + */
> + svm->mm = get_task_mm(current);
> + svm->pasid = data->hpasid;
> + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> + svm->gpasid = data->gpasid;
> + svm->flags |= SVM_FLAG_GUEST_PASID;
> + }
> + ioasid_set_data(data->hpasid, svm);
> + INIT_LIST_HEAD_RCU(&svm->devs);
> + INIT_LIST_HEAD(&svm->list);
> +
> + mmput(svm->mm);
> + }

A blank line, please.

> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> + if (!sdev) {
> + if (list_empty(&svm->devs))
> + kfree(svm);

This is dangerous. This might leave a wild pointer bound with gpasid.

> + ret = -ENOMEM;
> + goto out;
> + }
> + sdev->dev = dev;
> + sdev->users = 1;
> +
> + /* Set up device context entry for PASID if not enabled already */
> + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> + if (ret) {
> + dev_err(dev, "Failed to enable PASID capability\n");
> + kfree(sdev);
> + goto out;
> + }
> +
> + /*
> + * For guest bind, we need to set up PASID table entry as follows:
> + * - FLPM matches guest paging mode
> + * - turn on nested mode
> + * - SL guest address width matching
> + */
> + ret = intel_pasid_setup_nested(iommu,
> + dev,
> + (pgd_t *)data->gpgd,
> + data->hpasid,
> + &data->vtd,
> + ddomain,
> + data->addr_width);
> + if (ret) {
> + dev_err(dev, "Failed to set up PASID %llu in nested mode, Err %d\n",
> + data->hpasid, ret);

This error handling is insufficient. You should at least:

1. free sdev
2. if list_empty(&svm->devs)
unbound the svm from gpasid
free svm

The same for above error handling. Add a branch for error recovery at
the end of function might help here.

> + kfree(sdev);
> + goto out;
> + }
> + svm->flags |= SVM_FLAG_GUEST_MODE;
> +
> + init_rcu_head(&sdev->rcu);
> + list_add_rcu(&sdev->list, &svm->devs);
> + out:
> + mutex_unlock(&pasid_mutex);
> + return ret;
> +}
> +
> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> +{
> + struct intel_svm_dev *sdev;
> + struct intel_iommu *iommu;
> + struct intel_svm *svm;
> + int ret = -EINVAL;
> +
> + mutex_lock(&pasid_mutex);
> + iommu = intel_svm_device_to_iommu(dev);
> + if (!iommu)
> + goto out;

Make it symmetrical with bind function.

if (WARN_ON(!iommu))
goto out;

> +
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR_OR_NULL(svm)) {
> + ret = PTR_ERR(svm);

If svm == NULL, this function will return success. This is not expected,
right?

> + goto out;
> + }
> +
> + for_each_svm_dev(svm, dev) {
> + ret = 0;
> + sdev->users--;
> + if (!sdev->users) {
> + list_del_rcu(&sdev->list);
> + intel_pasid_tear_down_entry(iommu, dev, svm->pasid);
> + /* TODO: Drain in flight PRQ for the PASID since it
> + * may get reused soon, we don't want to
> + * confuse with its previous life.
> + * intel_svm_drain_prq(dev, pasid);
> + */
> + kfree_rcu(sdev, rcu);
> +
> + if (list_empty(&svm->devs)) {
> + list_del(&svm->list);
> + kfree(svm);
> + /*
> + * We do not free PASID here until explicit call
> + * from VFIO to free. The PASID life cycle
> + * management is largely tied to VFIO management
> + * of assigned device life cycles. In case of
> + * guest exit without a explicit free PASID call,
> + * the responsibility lies in VFIO layer to free
> + * the PASIDs allocated for the guest.
> + * For security reasons, VFIO has to track the
> + * PASID ownership per guest anyway to ensure
> + * that PASID allocated by one guest cannot be
> + * used by another.
> + */
> + ioasid_set_data(pasid, NULL);

Exchange order. First unbind svm from gpasid and then free svm.

> + }
> + }
> + break;
> + }
> + out:
> + mutex_unlock(&pasid_mutex);
> +
> + return ret;
> +}
> +
> int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
> {
> struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 3dba6ad3e9ad..6c74c71b1ebf 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
> int intel_svm_init(struct intel_iommu *iommu);
> extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> -
> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> + struct device *dev, struct iommu_gpasid_bind_data *data);
> +extern int intel_svm_unbind_gpasid(struct device *dev, int pasid);
> struct svm_dev_ops;
>
> struct intel_svm_dev {
> @@ -690,9 +692,13 @@ struct intel_svm_dev {
> struct intel_svm {
> struct mmu_notifier notifier;
> struct mm_struct *mm;
> +
> struct intel_iommu *iommu;
> int flags;
> int pasid;
> + int gpasid; /* Guest PASID in case of vSVA bind with non-identity host
> + * to guest PASID mapping.
> + */
> struct list_head devs;
> struct list_head list;
> };
> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> index 94f047a8a845..a2c189ad0b01 100644
> --- a/include/linux/intel-svm.h
> +++ b/include/linux/intel-svm.h
> @@ -44,6 +44,23 @@ struct svm_dev_ops {
> * do such IOTLB flushes automatically.
> */
> #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> +/*
> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind to a device.
> + * In this case the mm_struct is in the guest kernel or userspace, its life
> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this API provides
> + * means to bind/unbind guest CR3 with PASIDs allocated for a device.
> + */
> +#define SVM_FLAG_GUEST_MODE (1<<2)

How about keeping this aligned with top by adding a tab?

BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
preferred.

> +/*
> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space,
> + * which requires guest and host PASID translation at both directions. We keep
> + * track of guest PASID in order to provide lookup service to device drivers.
> + * One such example is a physical function (PF) driver that supports mediated
> + * device (mdev) assignment. Guest programming of mdev configuration space can
> + * only be done with guest PASID, therefore PF driver needs to find the matching
> + * host PASID to program the real hardware.
> + */
> +#define SVM_FLAG_GUEST_PASID (1<<3)

Ditto.

Best regards,
baolu

2019-10-26 02:29:18

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

Hi,

On 10/25/19 3:55 AM, Jacob Pan wrote:
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.
> This patch adds invalidation functions that can be used for additional
> translation cache types.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dmar.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.c | 3 ++-
> include/linux/intel-iommu.h | 21 +++++++++++++++++----
> 3 files changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 49bb7d76e646..0ce2d32ff99e 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> + unsigned int size_order, u64 granu, int ih)
> +{
> + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> + QI_EIOTLB_AM(size_order);
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned mask)
> {
> @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based device IOTLB Invalidate */
> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu)
> +{
> + struct qi_desc desc;

Do you need to set qw2 and qw3 to 0?

> +
> + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> + QI_DEV_IOTLB_PFSID(pfsid);
> + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> +
> + /* If S bit is 0, we only flush a single page. If S bit is set,
> + * The least significant zero bit indicates the invalidation address
> + * range. VT-d spec 6.5.2.6.
> + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
> + */
> + if (!size_order) {
> + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
> + } else {
> + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
> + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
> + }
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
> +{
> + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
> + qi_submit_sync(&desc, iommu);
> +}
> /*
> * Disable Queued Invalidation interface.
> */
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index f846a907cfcf..6d7a701ef4d3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> {
> struct qi_desc desc;
>
> - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> + QI_PC_PASID(pasid) | QI_PC_TYPE;
> desc.qw1 = 0;
> desc.qw2 = 0;
> desc.qw3 = 0;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6c74c71b1ebf..a25fb3a0ea5b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -332,7 +332,7 @@ enum {
> #define QI_IOTLB_GRAN(gran) (((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
> #define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK)
> #define QI_IOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_IOTLB_AM(am) (((u8)am))
> +#define QI_IOTLB_AM(am) (((u8)am) & 0x3f)
>
> #define QI_CC_FM(fm) (((u64)fm) << 48)
> #define QI_CC_SID(sid) (((u64)sid) << 32)
> @@ -350,16 +350,21 @@ enum {
> #define QI_PC_DID(did) (((u64)did) << 16)
> #define QI_PC_GRAN(gran) (((u64)gran) << 4)
>
> -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> +/* PASID cache invalidation granu */
> +#define QI_PC_ALL_PASIDS 0
> +#define QI_PC_PASID_SEL 1
>
> #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_EIOTLB_AM(am) (((u64)am))
> +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
>
> +/* QI Dev-IOTLB inv granu */
> +#define QI_DEV_IOTLB_GRAN_ALL 1
> +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> +
> #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> @@ -655,8 +660,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
> u8 fm, u64 type);
> extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> unsigned int size_order, u64 type);
> +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> + u32 pasid, unsigned int size_order, u64 type, int ih);
> extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned mask);
> +
> +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu);
> +
> +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
> +
> extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
>
> extern int dmar_ir_support(void);
>

Best regards,
baolu

2019-10-26 02:45:52

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

Hi,

On 10/25/19 3:27 PM, Tian, Kevin wrote:
>> From: Jacob Pan [mailto:[email protected]]
>> Sent: Friday, October 25, 2019 3:55 AM
>>
>> When Shared Virtual Address (SVA) is enabled for a guest OS via
>> vIOMMU, we need to provide invalidation support at IOMMU API and
>> driver
>> level. This patch adds Intel VT-d specific function to implement
>> iommu passdown invalidate API for shared virtual address.
>>
>> The use case is for supporting caching structure invalidation
>> of assigned SVM capable devices. Emulated IOMMU exposes queue
>> invalidation capability and passes down all descriptors from the guest
>> to the physical IOMMU.
>
> specifically you may clarify that only invalidations related to
> first-level page table is passed down, because it's guest
> structure being bound to the first-level. other descriptors
> are emulated or translated into other necessary operations.
>
>>
>> The assumption is that guest to host device ID mapping should be
>> resolved prior to calling IOMMU driver. Based on the device handle,
>> host IOMMU driver can replace certain fields before submit to the
>> invalidation queue.
>
> what is device ID? it's a bit confusing term here.
>
>>
>> Signed-off-by: Jacob Pan <[email protected]>
>> Signed-off-by: Ashok Raj <[email protected]>
>> Signed-off-by: Liu, Yi L <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 170
>> ++++++++++++++++++++++++++++++++++++++++++++
>> 1 file changed, 170 insertions(+)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 5fab32fbc4b4..a73e76d6457a 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -5491,6 +5491,175 @@ static void
>> intel_iommu_aux_detach_device(struct iommu_domain *domain,
>> aux_domain_remove_dev(to_dmar_domain(domain), dev);
>> }
>>
>> +/*
>> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
>> + * VT-d granularity. Invalidation is typically included in the unmap
>> operation
>> + * as a result of DMA or VFIO unmap. However, for assigned device where
>> guest
>> + * could own the first level page tables without being shadowed by QEMU.
>> In
>> + * this case there is no pass down unmap to the host IOMMU as a result of
>> unmap
>> + * in the guest. Only invalidations are trapped and passed down.
>> + * In all cases, only first level TLB invalidation (request with PASID) can be
>> + * passed down, therefore we do not include IOTLB granularity for request
>> + * without PASID (second level).
>> + *
>> + * For an example, to find the VT-d granularity encoding for IOTLB
>> + * type and page selective granularity within PASID:
>> + * X: indexed by iommu cache type
>> + * Y: indexed by enum iommu_inv_granularity
>> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
>> + *
>> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
>> + *
>> + */
>> +const static int
>> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
>> U_NR] = {
>> + /* PASID based IOTLB, support PASID selective and page selective */
>> + {0, 1, 1},
>> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
>> + {1, 1, 0},
>
> I forgot previous discussion. is it necessary to pass down dev TLB invalidation
> requests? Can it be handled by host iOMMU driver automatically?

On host SVA, when a memory is unmapped, driver callback will invalidate
dev IOTLB explicitly. So I guess we need to pass down it for guest case.
This is also required for guest iova over 1st level usage as far as can
see.

Best regards,
baolu

>
>> + /* PASID cache */
>> + {1, 1, 0}
>> +};
>> +
>> +const static u64
>> inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
>> U_NR] = {
>> + /* PASID based IOTLB */
>> + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
>> + /* PASID based dev TLBs */
>> + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
>> + /* PASID cache */
>> + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
>> +};
>> +
>> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
>> +{
>> + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >=
>> IOMMU_INV_GRANU_NR ||
>> + !inv_type_granu_map[type][granu])
>> + return -EINVAL;
>> +
>> + *vtd_granu = inv_type_granu_table[type][granu];
>> +
>> + return 0;
>> +}
>> +
>> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
>> +{
>> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
>> +
>> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB,
>> etc.
>> + * IOMMU cache invalidate API passes granu_size in bytes, and
>> number of
>> + * granu size in contiguous memory.
>> + */
>> + return order_base_2(nr_pages);
>> +}
>> +
>> +#ifdef CONFIG_INTEL_IOMMU_SVM
>> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
>> + struct device *dev, struct iommu_cache_invalidate_info
>> *inv_info)
>> +{
>> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> + struct device_domain_info *info;
>> + struct intel_iommu *iommu;
>> + unsigned long flags;
>> + int cache_type;
>> + u8 bus, devfn;
>> + u16 did, sid;
>> + int ret = 0;
>> + u64 size;
>> +
>> + if (!inv_info || !dmar_domain ||
>> + inv_info->version !=
>> IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
>> + return -EINVAL;
>> +
>> + if (!dev || !dev_is_pci(dev))
>> + return -ENODEV;
>> +
>> + iommu = device_to_iommu(dev, &bus, &devfn);
>> + if (!iommu)
>> + return -ENODEV;
>> +
>> + spin_lock_irqsave(&device_domain_lock, flags);
>> + spin_lock(&iommu->lock);
>> + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus,
>> devfn);
>> + if (!info) {
>> + ret = -EINVAL;
>> + goto out_unlock;
>> + }
>> + did = dmar_domain->iommu_did[iommu->seq_id];
>> + sid = PCI_DEVID(bus, devfn);
>> + size = to_vtd_size(inv_info->addr_info.granule_size, inv_info-
>>> addr_info.nb_granules);
>> +
>> + for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache,
>> IOMMU_CACHE_INV_TYPE_NR) {
>> + u64 granu = 0;
>> + u64 pasid = 0;
>> +
>> + ret = to_vtd_granularity(cache_type, inv_info->granularity,
>> &granu);
>> + if (ret) {
>> + pr_err("Invalid cache type and granu
>> combination %d/%d\n", cache_type,
>> + inv_info->granularity);
>> + break;
>> + }
>> +
>> + /* PASID is stored in different locations based on
>> granularity */
>> + if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
>> + pasid = inv_info->pasid_info.pasid;
>> + else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
>> + pasid = inv_info->addr_info.pasid;
>> + else {
>> + pr_err("Cannot find PASID for given cache type and
>> granularity\n");
>> + break;
>> + }
>> +
>> + switch (BIT(cache_type)) {
>> + case IOMMU_CACHE_INV_TYPE_IOTLB:
>> + if (size && (inv_info->addr_info.addr &
>> ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
>> + pr_err("Address out of range, 0x%llx, size
>> order %llu\n",
>> + inv_info->addr_info.addr, size);
>> + ret = -ERANGE;
>> + goto out_unlock;
>> + }
>> +
>> + qi_flush_piotlb(iommu, did,
>> mm_to_dma_pfn(inv_info->addr_info.addr),
>> + pasid, size, granu, inv_info-
>>> addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
>> +
>> + /*
>> + * Always flush device IOTLB if ATS is enabled since
>> guest
>> + * vIOMMU exposes CM = 1, no device IOTLB flush
>> will be passed
>> + * down.
>> + */
>> + if (info->ats_enabled) {
>> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
>> + pasid, info->ats_qdep,
>> + inv_info->addr_info.addr,
>> size,
>> + granu);
>> + }
>> + break;
>> + case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
>> + if (info->ats_enabled) {
>> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
>> + inv_info->addr_info.pasid,
>> info->ats_qdep,
>> + inv_info->addr_info.addr,
>> size,
>> + granu);
>> + } else
>> + pr_warn("Passdown device IOTLB flush w/o
>> ATS!\n");
>> +
>> + break;
>> + case IOMMU_CACHE_INV_TYPE_PASID:
>> + qi_flush_pasid_cache(iommu, did, granu, inv_info-
>>> pasid_info.pasid);
>> +
>> + break;
>> + default:
>> + dev_err(dev, "Unsupported IOMMU invalidation
>> type %d\n",
>> + cache_type);
>> + ret = -EINVAL;
>> + }
>> + }
>> +out_unlock:
>> + spin_unlock(&iommu->lock);
>> + spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> + return ret;
>> +}
>> +#endif
>> +
>> static int intel_iommu_map(struct iommu_domain *domain,
>> unsigned long iova, phys_addr_t hpa,
>> size_t size, int iommu_prot)
>> @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
>> .is_attach_deferred = intel_iommu_is_attach_deferred,
>> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
>> #ifdef CONFIG_INTEL_IOMMU_SVM
>> + .cache_invalidate = intel_iommu_sva_invalidate,
>> .sva_bind_gpasid = intel_svm_bind_gpasid,
>> .sva_unbind_gpasid = intel_svm_unbind_gpasid,
>> #endif
>> --
>> 2.7.4
>
>

2019-10-26 07:06:45

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

Hi again,

On 10/26/19 10:40 AM, Lu Baolu wrote:
> Hi,
>
> On 10/25/19 3:27 PM, Tian, Kevin wrote:
>>> From: Jacob Pan [mailto:[email protected]]
>>> Sent: Friday, October 25, 2019 3:55 AM
>>>
>>> When Shared Virtual Address (SVA) is enabled for a guest OS via
>>> vIOMMU, we need to provide invalidation support at IOMMU API and
>>> driver
>>> level. This patch adds Intel VT-d specific function to implement
>>> iommu passdown invalidate API for shared virtual address.
>>>
>>> The use case is for supporting caching structure invalidation
>>> of assigned SVM capable devices. Emulated IOMMU exposes queue
>>> invalidation capability and passes down all descriptors from the guest
>>> to the physical IOMMU.
>>
>> specifically you may clarify that only invalidations related to
>> first-level page table is passed down, because it's guest
>> structure being bound to the first-level. other descriptors
>> are emulated or translated into other necessary operations.
>>
>>>
>>> The assumption is that guest to host device ID mapping should be
>>> resolved prior to calling IOMMU driver. Based on the device handle,
>>> host IOMMU driver can replace certain fields before submit to the
>>> invalidation queue.
>>
>> what is device ID? it's a bit confusing term here.
>>
>>>
>>> Signed-off-by: Jacob Pan <[email protected]>
>>> Signed-off-by: Ashok Raj <[email protected]>
>>> Signed-off-by: Liu, Yi L <[email protected]>
>>> ---
>>>   drivers/iommu/intel-iommu.c | 170
>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>   1 file changed, 170 insertions(+)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>>> index 5fab32fbc4b4..a73e76d6457a 100644
>>> --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5491,6 +5491,175 @@ static void
>>> intel_iommu_aux_detach_device(struct iommu_domain *domain,
>>>       aux_domain_remove_dev(to_dmar_domain(domain), dev);
>>>   }
>>>
>>> +/*
>>> + * 2D array for converting and sanitizing IOMMU generic TLB
>>> granularity to
>>> + * VT-d granularity. Invalidation is typically included in the unmap
>>> operation
>>> + * as a result of DMA or VFIO unmap. However, for assigned device where
>>> guest
>>> + * could own the first level page tables without being shadowed by
>>> QEMU.
>>> In
>>> + * this case there is no pass down unmap to the host IOMMU as a
>>> result of
>>> unmap
>>> + * in the guest. Only invalidations are trapped and passed down.
>>> + * In all cases, only first level TLB invalidation (request with
>>> PASID) can be
>>> + * passed down, therefore we do not include IOTLB granularity for
>>> request
>>> + * without PASID (second level).
>>> + *
>>> + * For an example, to find the VT-d granularity encoding for IOTLB
>>> + * type and page selective granularity within PASID:
>>> + * X: indexed by iommu cache type
>>> + * Y: indexed by enum iommu_inv_granularity
>>> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
>>> + *
>>> + * Granu_map array indicates validity of the table. 1: valid, 0:
>>> invalid
>>> + *
>>> + */
>>> +const static int
>>> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
>>> U_NR] = {
>>> +    /* PASID based IOTLB, support PASID selective and page selective */
>>> +    {0, 1, 1},
>>> +    /* PASID based dev TLBs, only support all PASIDs or single PASID */
>>> +    {1, 1, 0},
>>
>> I forgot previous discussion. is it necessary to pass down dev TLB
>> invalidation
>> requests? Can it be handled by host iOMMU driver automatically?
>
> On host SVA, when a memory is unmapped, driver callback will invalidate
> dev IOTLB explicitly. So I guess we need to pass down it for guest case.
> This is also required for guest iova over 1st level usage as far as can
> see.
>

Sorry, I confused guest vIOVA and guest vSVA. For guest vIOVA, no device
TLB invalidation pass down. But currently for guest vSVA, device TLB
invalidation is passed down. Perhaps we can avoid passing down dev TLB
flush just like what we are doing for guest IOVA.

Best regards,
baolu

2019-10-28 15:56:03

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

> From: Jacob Pan [mailto:[email protected]]
> Sent: Saturday, October 26, 2019 1:34 AM
>
> Hi Kevin,
>
>
> On Fri, 25 Oct 2019 07:19:26 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Jacob Pan [mailto:[email protected]]
> > > Sent: Friday, October 25, 2019 3:55 AM
> > >
> > > When supporting guest SVA with emulated IOMMU, the guest PASID
> > > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > > will result in PASID cache flush which will be passed down to
> > > the host as bind guest PASID calls.
> >
> > will be translated into binding/unbinding guest PASID calls to update
> > the host shadow PASID table.
> >
> yours is more precise, will replace.
> > >
> > > For the SL page tables, it will be harvested from device's
> > > default domain (request w/o PASID), or aux domain in case of
> > > mediated device.
> >
> > harvested -> copied or linked to?
> Kind of the same, but I agree copied is more technical and precise
> term. Will change.
>
> > >
> > > .-------------. .---------------------------.
> > > | vIOMMU | | Guest process CR3, FL only|
> > > | | '---------------------------'
> > > .----------------/
> > > | PASID Entry |--- PASID cache flush -
> > > '-------------' |
> > > | | V
> > > | | CR3 in GPA
> > > '-------------'
> > > Guest
> > > ------| Shadow |--------------------------|--------
> > > v v v
> > > Host
> > > .-------------. .----------------------.
> > > | pIOMMU | | Bind FL for GVA-GPA |
> > > | | '----------------------'
> > > .----------------/ |
> > > | PASID Entry | V (Nested xlate)
> > > '----------------\.------------------------------.
> > > | | |SL for GPA-HPA, default domain|
> > > | | '------------------------------'
> > > '-------------'
> > > Where:
> > > - FL = First level/stage one page tables
> > > - SL = Second level/stage two page tables
> > >
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > Signed-off-by: Liu, Yi L <[email protected]>
> > > ---
> > > drivers/iommu/intel-iommu.c | 4 +
> > > drivers/iommu/intel-svm.c | 184
> > > ++++++++++++++++++++++++++++++++++++++++++++
> > > include/linux/intel-iommu.h | 8 +-
> > > include/linux/intel-svm.h | 17 ++++
> > > 4 files changed, 212 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel-iommu.c
> > > b/drivers/iommu/intel-iommu.c index acd1ac787d8b..5fab32fbc4b4
> > > 100644 --- a/drivers/iommu/intel-iommu.c
> > > +++ b/drivers/iommu/intel-iommu.c
> > > @@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops =
> {
> > > .dev_disable_feat = intel_iommu_dev_disable_feat,
> > > .is_attach_deferred =
> > > intel_iommu_is_attach_deferred, .pgsize_bitmap =
> > > INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
> > > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > > + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > > +#endif
> >
> > again, pure PASID management logic should be separated from SVM.
> >
> I am not following, these two functions are SVM functionality, not
> pure PASID management which is already separated in ioasid.c

I should say pure "scalable mode" logic. Above callbacks are not
related to host SVM per se. They are serving gpasid requests from
guest side, thus part of generic scalable mode capability.

>
> > > };
> > >
> > > static void quirk_iommu_igfx(struct pci_dev *dev)
> > > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > > index a18b02a9709d..ae13a310cf96 100644
> > > --- a/drivers/iommu/intel-svm.c
> > > +++ b/drivers/iommu/intel-svm.c
> > > @@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
> > > list_for_each_entry(sdev, &svm->devs, list) \
> > > if (dev == sdev->dev) \
> > >
> > > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > > + struct device *dev,
> > > + struct iommu_gpasid_bind_data *data)
> > > +{
> > > + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > > + struct dmar_domain *ddomain;
> > > + struct intel_svm_dev *sdev;
> > > + struct intel_svm *svm;
> > > + int ret = 0;
> > > +
> > > + if (WARN_ON(!iommu) || !data)
> > > + return -EINVAL;
> > > +
> > > + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > > + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
> > > + return -EINVAL;
> > > +
> > > + if (dev_is_pci(dev)) {
> > > + /* VT-d supports devices with full 20 bit PASIDs
> > > only */
> > > + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > > + return -EINVAL;
> > > + }
> >
> > what about non-pci devices? It just moves forward w/o any check here?
> >
> Good catch, we only support PCI-device on Intel. Even mdev has to pass
> the pdev to bind. Will add the else case.
>
> > > +
> > > + /*
> > > + * We only check host PASID range, we have no knowledge to
> > > check
> > > + * guest PASID range nor do we use the guest PASID.
> > > + */
> > > + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > > + return -EINVAL;
> > > +
> > > + ddomain = to_dmar_domain(domain);
> > > + /* REVISIT:
> > > + * Sanity check adddress width and paging mode support
> > > + * width matching in two dimensions:
> > > + * 1. paging mode CPU <= IOMMU
> > > + * 2. address width Guest <= Host.
> > > + */
> >
> > Is lacking of above logic harmful? If not, we should add
> >
> It is better to add the check now, not solely rely on QEMU.
>
> > > + mutex_lock(&pasid_mutex);
> > > + svm = ioasid_find(NULL, data->hpasid, NULL);
> > > + if (IS_ERR(svm)) {
> > > + ret = PTR_ERR(svm);
> > > + goto out;
> > > + }
> > > + if (svm) {
> > > + /*
> > > + * If we found svm for the PASID, there must be at
> > > + * least one device bond, otherwise svm should be
> > > freed.
> > > + */
> > > + BUG_ON(list_empty(&svm->devs));
> > > +
> > > + for_each_svm_dev(svm, dev) {
> > > + /* In case of multiple sub-devices of the
> > > same pdev assigned, we should
> > > + * allow multiple bind calls with the same
> > > PASID and pdev.
> > > + */
> > > + sdev->users++;
> > > + goto out;
> >
> > sorry if I overlooked, but I didn't see any check on the PASID
> > actually belonging to this process. At least should check the
> > match between svm->mm and get_task_mm? also check
> > whether a previous binding between this hpasid and gpasid
> > already exists.
> >
> We had some discussions on whom should be responsible for checking
> ownership. I tend to think VFIO is right place but I guess we can also
> double check here.
> Good point, we should check the same H-G PASID bind already exists.
> > > + }
> > > + } else {
> > > + /* We come here when PASID has never been bond to a
> > > device. */
> > > + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > > + if (!svm) {
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + /* REVISIT: upper layer/VFIO can track host
> > > process that bind the PASID.
> > > + * ioasid_set = mm might be sufficient for vfio to
> > > check pasid VMM
> > > + * ownership.
> > > + */
> >
> > Is it correct to leave the check to the caller?
> >
> Ditto, we will double check. But since this is related to the guest, I
> feel iommu driver check mm might be too restrictive. I am not sure if
> any VMM could have more than one process? One process does alloc, the
> other does bind.

one process, and there might be multiple threads each corresponding
to a vCPU.

>
> > > + svm->mm = get_task_mm(current);
> > > + svm->pasid = data->hpasid;
> > > + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > > + svm->gpasid = data->gpasid;
> > > + svm->flags |= SVM_FLAG_GUEST_PASID;
> > > + }
> > > + ioasid_set_data(data->hpasid, svm);
> > > + INIT_LIST_HEAD_RCU(&svm->devs);
> > > + INIT_LIST_HEAD(&svm->list);
> > > +
> > > + mmput(svm->mm);
> > > + }
> > > + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > > + if (!sdev) {
> > > + if (list_empty(&svm->devs))
> > > + kfree(svm);
> > > + ret = -ENOMEM;
> > > + goto out;
> > > + }
> > > + sdev->dev = dev;
> > > + sdev->users = 1;
> > > +
> > > + /* Set up device context entry for PASID if not enabled
> > > already */
> > > + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > > + if (ret) {
> > > + dev_err(dev, "Failed to enable PASID
> > > capability\n");
> > > + kfree(sdev);
> > > + goto out;
> > > + }
> > > +
> > > + /*
> > > + * For guest bind, we need to set up PASID table entry as
> > > follows:
> > > + * - FLPM matches guest paging mode
> > > + * - turn on nested mode
> > > + * - SL guest address width matching
> > > + */
> > > + ret = intel_pasid_setup_nested(iommu,
> > > + dev,
> > > + (pgd_t *)data->gpgd,
> > > + data->hpasid,
> > > + &data->vtd,
> > > + ddomain,
> > > + data->addr_width);
> > > + if (ret) {
> > > + dev_err(dev, "Failed to set up PASID %llu in
> > > nested mode, Err %d\n",
> > > + data->hpasid, ret);
> > > + kfree(sdev);
> >
> > disable pasid? revert ioasid_set_data?
> >
> Good catch, will do.
> > > + goto out;
> > > + }
> > > + svm->flags |= SVM_FLAG_GUEST_MODE;
> > > +
> > > + init_rcu_head(&sdev->rcu);
> > > + list_add_rcu(&sdev->list, &svm->devs);
> > > + out:
> > > + mutex_unlock(&pasid_mutex);
> > > + return ret;
> > > +}
> > > +
> > > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > > +{
> > > + struct intel_svm_dev *sdev;
> > > + struct intel_iommu *iommu;
> > > + struct intel_svm *svm;
> > > + int ret = -EINVAL;
> > > +
> > > + mutex_lock(&pasid_mutex);
> > > + iommu = intel_svm_device_to_iommu(dev);
> > > + if (!iommu)
> > > + goto out;
> > > +
> > > + svm = ioasid_find(NULL, pasid, NULL);
> > > + if (IS_ERR_OR_NULL(svm)) {
> > > + ret = PTR_ERR(svm);
> > > + goto out;
> > > + }
> > > +
> > > + for_each_svm_dev(svm, dev) {
> > > + ret = 0;
> > > + sdev->users--;
> > > + if (!sdev->users) {
> > > + list_del_rcu(&sdev->list);
> > > + intel_pasid_tear_down_entry(iommu, dev,
> > > svm-
> > > >pasid);
> > > + /* TODO: Drain in flight PRQ for the PASID
> > > since it
> > > + * may get reused soon, we don't want to
> > > + * confuse with its previous life.
> > > + * intel_svm_drain_prq(dev, pasid);
> > > + */
> > > + kfree_rcu(sdev, rcu);
> > > +
> > > + if (list_empty(&svm->devs)) {
> > > + list_del(&svm->list);
> > > + kfree(svm);
> > > + /*
> > > + * We do not free PASID here until
> > > explicit call
> > > + * from VFIO to free. The PASID
> > > life cycle
> > > + * management is largely tied to
> > > VFIO management
> > > + * of assigned device life cycles.
> > > In case of
> > > + * guest exit without a explicit
> > > free PASID call,
> > > + * the responsibility lies in VFIO
> > > layer to free
> > > + * the PASIDs allocated for the
> > > guest.
> > > + * For security reasons, VFIO has
> > > to track the
> > > + * PASID ownership per guest
> > > anyway to ensure
> > > + * that PASID allocated by one
> > > guest cannot be
> > > + * used by another.
> > > + */
> > > + ioasid_set_data(pasid, NULL);
> > > + }
> > > + }
> > > + break;
> > > + }
> > > + out:
> > > + mutex_unlock(&pasid_mutex);
> > > +
> > > + return ret;
> > > +}
> > > +
> > > int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > > struct svm_dev_ops *ops)
> > > {
> > > struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > > diff --git a/include/linux/intel-iommu.h
> > > b/include/linux/intel-iommu.h index 3dba6ad3e9ad..6c74c71b1ebf
> > > 100644 --- a/include/linux/intel-iommu.h
> > > +++ b/include/linux/intel-iommu.h
> > > @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct
> intel_iommu
> > > *iommu, struct device *dev);
> > > int intel_svm_init(struct intel_iommu *iommu);
> > > extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> > > extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > > -
> > > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > > + struct device *dev, struct iommu_gpasid_bind_data
> > > *data); +extern int intel_svm_unbind_gpasid(struct device *dev, int
> > > pasid); struct svm_dev_ops;
> > >
> > > struct intel_svm_dev {
> > > @@ -690,9 +692,13 @@ struct intel_svm_dev {
> > > struct intel_svm {
> > > struct mmu_notifier notifier;
> > > struct mm_struct *mm;
> > > +
> > > struct intel_iommu *iommu;
> > > int flags;
> > > int pasid;
> > > + int gpasid; /* Guest PASID in case of vSVA bind with
> > > non-identity host
> > > + * to guest PASID mapping.
> > > + */
> > > struct list_head devs;
> > > struct list_head list;
> > > };
> > > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > > index 94f047a8a845..a2c189ad0b01 100644
> > > --- a/include/linux/intel-svm.h
> > > +++ b/include/linux/intel-svm.h
> > > @@ -44,6 +44,23 @@ struct svm_dev_ops {
> > > * do such IOTLB flushes automatically.
> > > */
> > > #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> > > +/*
> > > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process
> bind
> > > to a device.
> > > + * In this case the mm_struct is in the guest kernel or userspace,
> > > its life
> > > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> > > API provides
> > > + * means to bind/unbind guest CR3 with PASIDs allocated for a
> > > device.
> > > + */
> > > +#define SVM_FLAG_GUEST_MODE (1<<2)
> > > +/*
> > > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> > > PASID space,
> > > + * which requires guest and host PASID translation at both
> > > directions. We keep
> > > + * track of guest PASID in order to provide lookup service to
> > > device drivers.
> > > + * One such example is a physical function (PF) driver that
> > > supports mediated
> > > + * device (mdev) assignment. Guest programming of mdev
> > > configuration space can
> > > + * only be done with guest PASID, therefore PF driver needs to
> > > find the matching
> > > + * host PASID to program the real hardware.
> > > + */
> > > +#define SVM_FLAG_GUEST_PASID (1<<3)
> > >
> > > #ifdef CONFIG_INTEL_IOMMU_SVM
> > >
> > > --
> > > 2.7.4
> >
>
> [Jacob Pan]

2019-10-28 15:56:45

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

> From: Lu Baolu [mailto:[email protected]]
> Sent: Saturday, October 26, 2019 3:03 PM
>
> Hi again,
>
> On 10/26/19 10:40 AM, Lu Baolu wrote:
> > Hi,
> >
> > On 10/25/19 3:27 PM, Tian, Kevin wrote:
> >>> From: Jacob Pan [mailto:[email protected]]
> >>> Sent: Friday, October 25, 2019 3:55 AM
> >>>
> >>> When Shared Virtual Address (SVA) is enabled for a guest OS via
> >>> vIOMMU, we need to provide invalidation support at IOMMU API and
> >>> driver
> >>> level. This patch adds Intel VT-d specific function to implement
> >>> iommu passdown invalidate API for shared virtual address.
> >>>
> >>> The use case is for supporting caching structure invalidation
> >>> of assigned SVM capable devices. Emulated IOMMU exposes queue
> >>> invalidation capability and passes down all descriptors from the guest
> >>> to the physical IOMMU.
> >>
> >> specifically you may clarify that only invalidations related to
> >> first-level page table is passed down, because it's guest
> >> structure being bound to the first-level. other descriptors
> >> are emulated or translated into other necessary operations.
> >>
> >>>
> >>> The assumption is that guest to host device ID mapping should be
> >>> resolved prior to calling IOMMU driver. Based on the device handle,
> >>> host IOMMU driver can replace certain fields before submit to the
> >>> invalidation queue.
> >>
> >> what is device ID? it's a bit confusing term here.
> >>
> >>>
> >>> Signed-off-by: Jacob Pan <[email protected]>
> >>> Signed-off-by: Ashok Raj <[email protected]>
> >>> Signed-off-by: Liu, Yi L <[email protected]>
> >>> ---
> >>>   drivers/iommu/intel-iommu.c | 170
> >>> ++++++++++++++++++++++++++++++++++++++++++++
> >>>   1 file changed, 170 insertions(+)
> >>>
> >>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-
> iommu.c
> >>> index 5fab32fbc4b4..a73e76d6457a 100644
> >>> --- a/drivers/iommu/intel-iommu.c
> >>> +++ b/drivers/iommu/intel-iommu.c
> >>> @@ -5491,6 +5491,175 @@ static void
> >>> intel_iommu_aux_detach_device(struct iommu_domain *domain,
> >>>       aux_domain_remove_dev(to_dmar_domain(domain), dev);
> >>>   }
> >>>
> >>> +/*
> >>> + * 2D array for converting and sanitizing IOMMU generic TLB
> >>> granularity to
> >>> + * VT-d granularity. Invalidation is typically included in the unmap
> >>> operation
> >>> + * as a result of DMA or VFIO unmap. However, for assigned device
> where
> >>> guest
> >>> + * could own the first level page tables without being shadowed by
> >>> QEMU.
> >>> In
> >>> + * this case there is no pass down unmap to the host IOMMU as a
> >>> result of
> >>> unmap
> >>> + * in the guest. Only invalidations are trapped and passed down.
> >>> + * In all cases, only first level TLB invalidation (request with
> >>> PASID) can be
> >>> + * passed down, therefore we do not include IOTLB granularity for
> >>> request
> >>> + * without PASID (second level).
> >>> + *
> >>> + * For an example, to find the VT-d granularity encoding for IOTLB
> >>> + * type and page selective granularity within PASID:
> >>> + * X: indexed by iommu cache type
> >>> + * Y: indexed by enum iommu_inv_granularity
> >>> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> >>> + *
> >>> + * Granu_map array indicates validity of the table. 1: valid, 0:
> >>> invalid
> >>> + *
> >>> + */
> >>> +const static int
> >>>
> inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
> >>> U_NR] = {
> >>> +    /* PASID based IOTLB, support PASID selective and page selective */
> >>> +    {0, 1, 1},
> >>> +    /* PASID based dev TLBs, only support all PASIDs or single PASID */
> >>> +    {1, 1, 0},
> >>
> >> I forgot previous discussion. is it necessary to pass down dev TLB
> >> invalidation
> >> requests? Can it be handled by host iOMMU driver automatically?
> >
> > On host SVA, when a memory is unmapped, driver callback will invalidate
> > dev IOTLB explicitly. So I guess we need to pass down it for guest case.
> > This is also required for guest iova over 1st level usage as far as can
> > see.
> >
>
> Sorry, I confused guest vIOVA and guest vSVA. For guest vIOVA, no device
> TLB invalidation pass down. But currently for guest vSVA, device TLB
> invalidation is passed down. Perhaps we can avoid passing down dev TLB
> flush just like what we are doing for guest IOVA.
>

I think dev TLB is fully handled within IOMMU driver today. It doesn't
require device driver to explicit toggle. With this then we can fully
virtualize guest dev TLB invalidation request to save one syscall, since
the host is supposed to flush dev TLB when serving the earlier IOTLB
invalidation pass-down.

Thanks
Kevin

2019-10-28 21:13:51

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

On Fri, 25 Oct 2019 07:27:26 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > When Shared Virtual Address (SVA) is enabled for a guest OS via
> > vIOMMU, we need to provide invalidation support at IOMMU API and
> > driver
> > level. This patch adds Intel VT-d specific function to implement
> > iommu passdown invalidate API for shared virtual address.
> >
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the
> > guest to the physical IOMMU.
>
> specifically you may clarify that only invalidations related to
> first-level page table is passed down, because it's guest
> structure being bound to the first-level. other descriptors
> are emulated or translated into other necessary operations.
>
Sounds good, will do.
> >
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
>
> what is device ID? it's a bit confusing term here.
>
Device ID meant requester IDs, or guest to host PCI BDF mapping should
be resolved such that passdown invalidation is targeting host PCI
device. I will rephrase.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Ashok Raj <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 170
> > ++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 170 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 5fab32fbc4b4..a73e76d6457a
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5491,6 +5491,175 @@ static void
> > intel_iommu_aux_detach_device(struct iommu_domain *domain,
> > aux_domain_remove_dev(to_dmar_domain(domain), dev);
> > }
> >
> > +/*
> > + * 2D array for converting and sanitizing IOMMU generic TLB
> > granularity to
> > + * VT-d granularity. Invalidation is typically included in the
> > unmap operation
> > + * as a result of DMA or VFIO unmap. However, for assigned device
> > where guest
> > + * could own the first level page tables without being shadowed by
> > QEMU. In
> > + * this case there is no pass down unmap to the host IOMMU as a
> > result of unmap
> > + * in the guest. Only invalidations are trapped and passed down.
> > + * In all cases, only first level TLB invalidation (request with
> > PASID) can be
> > + * passed down, therefore we do not include IOTLB granularity for
> > request
> > + * without PASID (second level).
> > + *
> > + * For an example, to find the VT-d granularity encoding for IOTLB
> > + * type and page selective granularity within PASID:
> > + * X: indexed by iommu cache type
> > + * Y: indexed by enum iommu_inv_granularity
> > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> > + *
> > + * Granu_map array indicates validity of the table. 1: valid, 0:
> > invalid
> > + *
> > + */
> > +const static int
> > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
> > U_NR] = {
> > + /* PASID based IOTLB, support PASID selective and page
> > selective */
> > + {0, 1, 1},
> > + /* PASID based dev TLBs, only support all PASIDs or single
> > PASID */
> > + {1, 1, 0},
>
> I forgot previous discussion. is it necessary to pass down dev TLB
> invalidation requests? Can it be handled by host iOMMU driver
> automatically?
>
> > + /* PASID cache */
> > + {1, 1, 0}
> > +};
> > +
> > +const static u64
> > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRAN
> > U_NR] = {
> > + /* PASID based IOTLB */
> > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> > + /* PASID based dev TLBs */
> > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> > + /* PASID cache */
> > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> > +};
> > +
> > +static inline int to_vtd_granularity(int type, int granu, u64
> > *vtd_granu) +{
> > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >=
> > IOMMU_INV_GRANU_NR ||
> > + !inv_type_granu_map[type][granu])
> > + return -EINVAL;
> > +
> > + *vtd_granu = inv_type_granu_table[type][granu];
> > +
> > + return 0;
> > +}
> > +
> > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> > +{
> > + u64 nr_pages = (granu_size * nr_granules) >>
> > VTD_PAGE_SHIFT; +
> > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9
> > for 2MB, etc.
> > + * IOMMU cache invalidate API passes granu_size in bytes,
> > and number of
> > + * granu size in contiguous memory.
> > + */
> > + return order_base_2(nr_pages);
> > +}
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> > + struct device *dev, struct
> > iommu_cache_invalidate_info *inv_info)
> > +{
> > + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > + struct device_domain_info *info;
> > + struct intel_iommu *iommu;
> > + unsigned long flags;
> > + int cache_type;
> > + u8 bus, devfn;
> > + u16 did, sid;
> > + int ret = 0;
> > + u64 size;
> > +
> > + if (!inv_info || !dmar_domain ||
> > + inv_info->version !=
> > IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > + return -EINVAL;
> > +
> > + if (!dev || !dev_is_pci(dev))
> > + return -ENODEV;
> > +
> > + iommu = device_to_iommu(dev, &bus, &devfn);
> > + if (!iommu)
> > + return -ENODEV;
> > +
> > + spin_lock_irqsave(&device_domain_lock, flags);
> > + spin_lock(&iommu->lock);
> > + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus,
> > devfn);
> > + if (!info) {
> > + ret = -EINVAL;
> > + goto out_unlock;
> > + }
> > + did = dmar_domain->iommu_did[iommu->seq_id];
> > + sid = PCI_DEVID(bus, devfn);
> > + size = to_vtd_size(inv_info->addr_info.granule_size,
> > inv_info-
> > >addr_info.nb_granules);
> > +
> > + for_each_set_bit(cache_type, (unsigned long
> > *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
> > + u64 granu = 0;
> > + u64 pasid = 0;
> > +
> > + ret = to_vtd_granularity(cache_type,
> > inv_info->granularity, &granu);
> > + if (ret) {
> > + pr_err("Invalid cache type and granu
> > combination %d/%d\n", cache_type,
> > + inv_info->granularity);
> > + break;
> > + }
> > +
> > + /* PASID is stored in different locations based on
> > granularity */
> > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
> > + pasid = inv_info->pasid_info.pasid;
> > + else if (inv_info->granularity ==
> > IOMMU_INV_GRANU_ADDR)
> > + pasid = inv_info->addr_info.pasid;
> > + else {
> > + pr_err("Cannot find PASID for given cache
> > type and granularity\n");
> > + break;
> > + }
> > +
> > + switch (BIT(cache_type)) {
> > + case IOMMU_CACHE_INV_TYPE_IOTLB:
> > + if (size && (inv_info->addr_info.addr &
> > ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> > + pr_err("Address out of range,
> > 0x%llx, size order %llu\n",
> > + inv_info->addr_info.addr,
> > size);
> > + ret = -ERANGE;
> > + goto out_unlock;
> > + }
> > +
> > + qi_flush_piotlb(iommu, did,
> > mm_to_dma_pfn(inv_info->addr_info.addr),
> > + pasid, size, granu,
> > inv_info-
> > >addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
> > +
> > + /*
> > + * Always flush device IOTLB if ATS is
> > enabled since guest
> > + * vIOMMU exposes CM = 1, no device IOTLB
> > flush will be passed
> > + * down.
> > + */
> > + if (info->ats_enabled) {
> > + qi_flush_dev_piotlb(iommu, sid,
> > info->pfsid,
> > + pasid,
> > info->ats_qdep,
> > +
> > inv_info->addr_info.addr, size,
> > + granu);
> > + }
> > + break;
> > + case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
> > + if (info->ats_enabled) {
> > + qi_flush_dev_piotlb(iommu, sid,
> > info->pfsid,
> > +
> > inv_info->addr_info.pasid, info->ats_qdep,
> > +
> > inv_info->addr_info.addr, size,
> > + granu);
> > + } else
> > + pr_warn("Passdown device IOTLB
> > flush w/o ATS!\n");
> > +
> > + break;
> > + case IOMMU_CACHE_INV_TYPE_PASID:
> > + qi_flush_pasid_cache(iommu, did, granu,
> > inv_info-
> > >pasid_info.pasid);
> > +
> > + break;
> > + default:
> > + dev_err(dev, "Unsupported IOMMU
> > invalidation type %d\n",
> > + cache_type);
> > + ret = -EINVAL;
> > + }
> > + }
> > +out_unlock:
> > + spin_unlock(&iommu->lock);
> > + spin_unlock_irqrestore(&device_domain_lock, flags);
> > +
> > + return ret;
> > +}
> > +#endif
> > +
> > static int intel_iommu_map(struct iommu_domain *domain,
> > unsigned long iova, phys_addr_t hpa,
> > size_t size, int iommu_prot)
> > @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
> > .is_attach_deferred =
> > intel_iommu_is_attach_deferred, .pgsize_bitmap =
> > INTEL_IOMMU_PGSIZES, #ifdef CONFIG_INTEL_IOMMU_SVM
> > + .cache_invalidate = intel_iommu_sva_invalidate,
> > .sva_bind_gpasid = intel_svm_bind_gpasid,
> > .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > #endif
> > --
> > 2.7.4
>

[Jacob Pan]

2019-10-28 21:14:53

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

On Mon, 28 Oct 2019 06:03:36 +0000
"Tian, Kevin" <[email protected]> wrote:

> > > > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > > > + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > > > +#endif
> > >
> > > again, pure PASID management logic should be separated from SVM.
> > >
> > I am not following, these two functions are SVM functionality, not
> > pure PASID management which is already separated in ioasid.c
>
> I should say pure "scalable mode" logic. Above callbacks are not
> related to host SVM per se. They are serving gpasid requests from
> guest side, thus part of generic scalable mode capability.
Got your point, but we are sharing data structures with host SVM, it is
very difficult and inefficient to separate the two.

2019-10-28 21:15:43

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

On Mon, 28 Oct 2019 06:06:33 +0000
"Tian, Kevin" <[email protected]> wrote:

> > >>> +    /* PASID based dev TLBs, only support all PASIDs or single
> > >>> PASID */
> > >>> +    {1, 1, 0},
> > >>
> > >> I forgot previous discussion. is it necessary to pass down dev
> > >> TLB invalidation
> > >> requests? Can it be handled by host iOMMU driver automatically?
> > >
> > > On host SVA, when a memory is unmapped, driver callback will
> > > invalidate dev IOTLB explicitly. So I guess we need to pass down
> > > it for guest case. This is also required for guest iova over 1st
> > > level usage as far as can see.
> > >
> >
> > Sorry, I confused guest vIOVA and guest vSVA. For guest vIOVA, no
> > device TLB invalidation pass down. But currently for guest vSVA,
> > device TLB invalidation is passed down. Perhaps we can avoid
> > passing down dev TLB flush just like what we are doing for guest
> > IOVA.
>
> I think dev TLB is fully handled within IOMMU driver today. It doesn't
> require device driver to explicit toggle. With this then we can fully
> virtualize guest dev TLB invalidation request to save one syscall,
> since the host is supposed to flush dev TLB when serving the earlier
> IOTLB invalidation pass-down.

In the previous discussions, we thought about making IOTLB flush
inclusive, where IOTLB flush would always include device TLB flush. But
we thought such behavior cannot be assumed for all VMMs, some may still
do explicit dev TLB flush. So for completeness, we included dev TLB
here.

2019-10-28 23:45:26

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 08/11] iommu/vt-d: Misc macro clean up for SVM

On Sat, 26 Oct 2019 09:00:51 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/25/19 3:55 AM, Jacob Pan wrote:
> > Use combined macros for_each_svm_dev() to simplify SVM device
> > iteration and error checking.
> >
> > Suggested-by: Andy Shevchenko <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > Reviewed-by: Eric Auger <[email protected]>
> > ---
> > drivers/iommu/intel-svm.c | 89
> > ++++++++++++++++++++++------------------------- 1 file changed, 42
> > insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index a9a7f85a09bc..a18b02a9709d 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -212,6 +212,10 @@ static const struct mmu_notifier_ops
> > intel_mmuops = { static DEFINE_MUTEX(pasid_mutex);
> > static LIST_HEAD(global_svm_list);
> >
> > +#define for_each_svm_dev(svm, dev) \
> > + list_for_each_entry(sdev, &svm->devs, list) \
> > + if (dev == sdev->dev) \
> > +
> > int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > struct svm_dev_ops *ops) {
> > struct intel_iommu *iommu =
> > intel_svm_device_to_iommu(dev); @@ -257,15 +261,13 @@ int
> > intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct
> > svm_dev_ goto out; }
> >
> > - list_for_each_entry(sdev, &svm->devs,
> > list) {
> > - if (dev == sdev->dev) {
> > - if (sdev->ops != ops) {
> > - ret = -EBUSY;
> > - goto out;
> > - }
> > - sdev->users++;
> > - goto success;
> > + for_each_svm_dev(svm, dev) {
> > + if (sdev->ops != ops) {
> > + ret = -EBUSY;
> > + goto out;
> > }
> > + sdev->users++;
> > + goto success;
> > }
> >
> > break;
> > @@ -402,50 +404,43 @@ int intel_svm_unbind_mm(struct device *dev,
> > int pasid) goto out;
> >
> > svm = ioasid_find(NULL, pasid, NULL);
> > - if (IS_ERR(svm)) {
> > + if (IS_ERR_OR_NULL(svm)) {
> > ret = PTR_ERR(svm);
> > goto out;
> > }
> >
> > - if (!svm)
> > - goto out;
>
> If svm == NULL here, this function will return success. This isn't
> expected, right?
>
you are right, should handle separately.

Thanks!
> Others looks good to me.
>
> Reviewed-by: Lu Baolu <[email protected]>
>
> Best regards,
> baolu
>
> > -
> > - list_for_each_entry(sdev, &svm->devs, list) {
> > - if (dev == sdev->dev) {
> > - ret = 0;
> > - sdev->users--;
> > - if (!sdev->users) {
> > - list_del_rcu(&sdev->list);
> > - /* Flush the PASID cache and IOTLB
> > for this device.
> > - * Note that we do depend on the
> > hardware *not* using
> > - * the PASID any more. Just as we
> > depend on other
> > - * devices never using PASIDs that
> > they have no right
> > - * to use. We have a *shared*
> > PASID table, because it's
> > - * large and has to be physically
> > contiguous. So it's
> > - * hard to be as defensive as we
> > might like. */
> > - intel_pasid_tear_down_entry(iommu,
> > dev, svm->pasid);
> > - intel_flush_svm_range_dev(svm,
> > sdev, 0, -1, 0);
> > - kfree_rcu(sdev, rcu);
> > -
> > - if (list_empty(&svm->devs)) {
> > - /* Clear private data so
> > that free pass check */
> > -
> > ioasid_set_data(svm->pasid, NULL);
> > - ioasid_free(svm->pasid);
> > - if (svm->mm)
> > -
> > mmu_notifier_unregister(&svm->notifier, svm->mm); -
> > - list_del(&svm->list);
> > -
> > - /* We mandate that no page
> > faults may be outstanding
> > - * for the PASID when
> > intel_svm_unbind_mm() is called.
> > - * If that is not obeyed,
> > subtle errors will happen.
> > - * Let's make them less
> > subtle... */
> > - memset(svm, 0x6b,
> > sizeof(*svm));
> > - kfree(svm);
> > - }
> > + for_each_svm_dev(svm, dev) {
> > + ret = 0;
> > + sdev->users--;
> > + if (!sdev->users) {
> > + list_del_rcu(&sdev->list);
> > + /* Flush the PASID cache and IOTLB for
> > this device.
> > + * Note that we do depend on the hardware
> > *not* using
> > + * the PASID any more. Just as we depend
> > on other
> > + * devices never using PASIDs that they
> > have no right
> > + * to use. We have a *shared* PASID table,
> > because it's
> > + * large and has to be physically
> > contiguous. So it's
> > + * hard to be as defensive as we might
> > like. */
> > + intel_pasid_tear_down_entry(iommu, dev,
> > svm->pasid);
> > + intel_flush_svm_range_dev(svm, sdev, 0,
> > -1, 0);
> > + kfree_rcu(sdev, rcu);
> > +
> > + if (list_empty(&svm->devs)) {
> > + /* Clear private data so that free
> > pass check */
> > + ioasid_set_data(svm->pasid, NULL);
> > + ioasid_free(svm->pasid);
> > + if (svm->mm)
> > +
> > mmu_notifier_unregister(&svm->notifier, svm->mm);
> > + list_del(&svm->list);
> > + /* We mandate that no page faults
> > may be outstanding
> > + * for the PASID when
> > intel_svm_unbind_mm() is called.
> > + * If that is not obeyed, subtle
> > errors will happen.
> > + * Let's make them less subtle...
> > */
> > + memset(svm, 0x6b, sizeof(*svm));
> > + kfree(svm);
> > }
> > - break;
> > }
> > + break;
> > }
> > out:
> > mutex_unlock(&pasid_mutex);
> > @@ -581,7 +576,7 @@ static irqreturn_t prq_event_thread(int irq,
> > void *d)
> > * to unbind the mm while any page faults
> > are outstanding.
> > * So we only need RCU to protect the
> > internal idr code. */ rcu_read_unlock();
> > - if (IS_ERR(svm) || !svm) {
> > + if (IS_ERR_OR_NULL(svm)) {
> > pr_err("%s: Page request for
> > invalid PASID %d: %08llx %08llx\n", iommu->name, req->pasid,
> > ((unsigned long long *)req)[0], ((unsigned long long *)req)[1]);
> >

[Jacob Pan]

2019-10-28 23:46:00

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

On Fri, 25 Oct 2019 06:42:54 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > After each setup for PASID entry, related translation caches must be
> > flushed.
> > We can combine duplicated code into one function which is less error
> > prone.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
>
> similarly, it doesn't need to be in this series.
Technically true, it is in this series so that we can use the combined
function.
>
> > ---
> > drivers/iommu/intel-pasid.c | 48
> > +++++++++++++++++--------------------------- -
> > 1 file changed, 18 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index e79d680fe300..ffbd416ed3b8
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct
> > intel_iommu *iommu,
> > devtlb_invalidation_with_pasid(iommu, dev, pasid);
> > }
> >
> > +static void pasid_flush_caches(struct intel_iommu *iommu,
> > + struct pasid_entry *pte,
> > + int pasid, u16 did)
> > +{
> > + if (!ecap_coherent(iommu->ecap))
> > + clflush_cache_range(pte, sizeof(*pte));
> > +
> > + if (cap_caching_mode(iommu->cap)) {
> > + pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > + iotlb_invalidation_with_pasid(iommu, did, pasid);
> > + } else {
> > + iommu_flush_write_buffer(iommu);
> > + }
> > +}
> > +
> > /*
> > * Set up the scalable mode pasid table entry for first only
> > * translation type.
> > @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct
> > intel_iommu *iommu,
> > /* Setup Present and PASID Granular Transfer Type: */
> > pasid_set_translation_type(pte, 1);
> > pasid_set_present(pte);
> > -
> > - if (!ecap_coherent(iommu->ecap))
> > - clflush_cache_range(pte, sizeof(*pte));
> > -
> > - if (cap_caching_mode(iommu->cap)) {
> > - pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > - iotlb_invalidation_with_pasid(iommu, did, pasid);
> > - } else {
> > - iommu_flush_write_buffer(iommu);
> > - }
> > + pasid_flush_caches(iommu, pte, pasid, did);
> >
> > return 0;
> > }
> > @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct
> > intel_iommu *iommu,
> > */
> > pasid_set_sre(pte);
> > pasid_set_present(pte);
> > -
> > - if (!ecap_coherent(iommu->ecap))
> > - clflush_cache_range(pte, sizeof(*pte));
> > -
> > - if (cap_caching_mode(iommu->cap)) {
> > - pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > - iotlb_invalidation_with_pasid(iommu, did, pasid);
> > - } else {
> > - iommu_flush_write_buffer(iommu);
> > - }
> > + pasid_flush_caches(iommu, pte, pasid, did);
> >
> > return 0;
> > }
> > @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu,
> > */
> > pasid_set_sre(pte);
> > pasid_set_present(pte);
> > -
> > - if (!ecap_coherent(iommu->ecap))
> > - clflush_cache_range(pte, sizeof(*pte));
> > -
> > - if (cap_caching_mode(iommu->cap)) {
> > - pasid_cache_invalidation_with_pasid(iommu, did,
> > pasid);
> > - iotlb_invalidation_with_pasid(iommu, did, pasid);
> > - } else {
> > - iommu_flush_write_buffer(iommu);
> > - }
> > + pasid_flush_caches(iommu, pte, pasid, did);
> >
> > return 0;
> > }
> > --
> > 2.7.4
>

[Jacob Pan]

2019-10-28 23:47:09

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

On Fri, 25 Oct 2019 06:41:16 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
>
> better push this patch separately. It's a generic cleanup.
>
True but might be more efficient to have this cleanup patch paved way.
Since the follow up new guest SVA code uses the new API. So I wanted to
get rid of the old code completely.
> > ---
> > drivers/iommu/intel-iommu.c | 12 ++++++------
> > drivers/iommu/intel-pasid.c | 36
> > ------------------------------------ drivers/iommu/intel-svm.c |
> > 39 +++++++++++++++++++++++---------------- 3 files changed, 29
> > insertions(+), 58 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index ced1d89ef977..2ea09b988a23
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
> > dmar_domain *domain,
> > domain->auxd_refcnt--;
> >
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > - intel_pasid_free_id(domain->default_pasid);
> > + ioasid_free(domain->default_pasid);
> > }
> >
> > static int aux_domain_add_dev(struct dmar_domain *domain,
> > @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain,
> > if (domain->default_pasid <= 0) {
> > int pasid;
> >
> > - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> > -
> > pci_max_pasids(to_pci_dev(dev)),
> > - GFP_KERNEL);
> > - if (pasid <= 0) {
> > + /* No private data needed for the default pasid */
> > + pasid = ioasid_alloc(NULL, PASID_MIN,
> > pci_max_pasids(to_pci_dev(dev)) - 1,
> > + NULL);
> > + if (pasid == INVALID_IOASID) {
> > pr_err("Can't allocate default pasid\n");
> > return -ENODEV;
> > }
> > @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain,
> > spin_unlock(&iommu->lock);
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > - intel_pasid_free_id(domain->default_pasid);
> > + ioasid_free(domain->default_pasid);
> >
> > return ret;
> > }
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index d81e857d2b25..e79d680fe300
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -26,42 +26,6 @@
> > */
> > static DEFINE_SPINLOCK(pasid_lock);
> > u32 intel_pasid_max_id = PASID_MAX;
> > -static DEFINE_IDR(pasid_idr);
> > -
> > -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> > -{
> > - int ret, min, max;
> > -
> > - min = max_t(int, start, PASID_MIN);
> > - max = min_t(int, end, intel_pasid_max_id);
> > -
> > - WARN_ON(in_interrupt());
> > - idr_preload(gfp);
> > - spin_lock(&pasid_lock);
> > - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> > - spin_unlock(&pasid_lock);
> > - idr_preload_end();
> > -
> > - return ret;
> > -}
> > -
> > -void intel_pasid_free_id(int pasid)
> > -{
> > - spin_lock(&pasid_lock);
> > - idr_remove(&pasid_idr, pasid);
> > - spin_unlock(&pasid_lock);
> > -}
> > -
> > -void *intel_pasid_lookup_id(int pasid)
> > -{
> > - void *p;
> > -
> > - spin_lock(&pasid_lock);
> > - p = idr_find(&pasid_idr, pasid);
> > - spin_unlock(&pasid_lock);
> > -
> > - return p;
> > -}
> >
> > int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) {
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 9b159132405d..a9a7f85a09bc 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -17,6 +17,7 @@
> > #include <linux/dmar.h>
> > #include <linux/interrupt.h>
> > #include <linux/mm_types.h>
> > +#include <linux/ioasid.h>
> > #include <asm/page.h>
> >
> > #include "intel-pasid.h"
> > @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_
> > if (pasid_max > intel_pasid_max_id)
> > pasid_max = intel_pasid_max_id;
> >
> > - /* Do not use PASID 0 in caching mode (virtualised
> > IOMMU) */
> > - ret = intel_pasid_alloc_id(svm,
> > - !!cap_caching_mode(iommu->cap),
> > - pasid_max - 1,
> > GFP_KERNEL);
> > - if (ret < 0) {
> > + /* Do not use PASID 0, reserved for RID to PASID */
> > + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> > + pasid_max - 1, svm);
> > + if (svm->pasid == INVALID_IOASID) {
> > kfree(svm);
> > kfree(sdev);
> > + ret = ENOSPC;
> > goto out;
> > }
> > - svm->pasid = ret;
> > svm->notifier.ops = &intel_mmuops;
> > svm->mm = mm;
> > svm->flags = flags;
> > @@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_
> > if (mm) {
> > ret =
> > mmu_notifier_register(&svm->notifier, mm); if (ret) {
> > - intel_pasid_free_id(svm->pasid);
> > + ioasid_free(svm->pasid);
> > kfree(svm);
> > kfree(sdev);
> > goto out;
> > @@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_
> > if (ret) {
> > if (mm)
> > mmu_notifier_unregister(&svm->notifier,
> > mm);
> > - intel_pasid_free_id(svm->pasid);
> > + ioasid_free(svm->pasid);
> > kfree(svm);
> > kfree(sdev);
> > goto out;
> > @@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev, int
> > pasid)
> > if (!iommu)
> > goto out;
> >
> > - svm = intel_pasid_lookup_id(pasid);
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > + goto out;
> > + }
> > +
> > if (!svm)
> > goto out;
> >
> > @@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int
> > pasid)
> > kfree_rcu(sdev, rcu);
> >
> > if (list_empty(&svm->devs)) {
> > -
> > intel_pasid_free_id(svm->pasid);
> > + /* Clear private data so
> > that free pass check */
> > +
> > ioasid_set_data(svm->pasid, NULL);
> > + ioasid_free(svm->pasid);
> > if (svm->mm)
> >
> > mmu_notifier_unregister(&svm->notifier, svm->mm);
> >
> > @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device
> > *dev, int pasid)
> > if (!iommu)
> > goto out;
> >
> > - svm = intel_pasid_lookup_id(pasid);
> > - if (!svm)
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > goto out;
> > -
> > + }
> > /* init_mm is used in this case */
> > if (!svm->mm)
> > ret = 1;
> > @@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq,
> > void *d)
> >
> > if (!svm || svm->pasid != req->pasid) {
> > rcu_read_lock();
> > - svm = intel_pasid_lookup_id(req->pasid);
> > + svm = ioasid_find(NULL, req->pasid, NULL);
> > /* It *can't* go away, because the driver
> > is not permitted
> > * to unbind the mm while any page faults
> > are outstanding.
> > * So we only need RCU to protect the
> > internal idr code. */
> > rcu_read_unlock();
> > -
> > - if (!svm) {
> > + if (IS_ERR(svm) || !svm) {
> > pr_err("%s: Page request for
> > invalid PASID %d: %08llx %08llx\n",
> > iommu->name, req->pasid,
> > ((unsigned long long *)req)[0],
> > ((unsigned long long
> > *)req)[1]); --
> > 2.7.4
>

[Jacob Pan]

2019-10-28 23:47:50

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

On Fri, 25 Oct 2019 15:52:39 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Lu Baolu [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 10:39 PM
> >
> > Hi,
> >
> > On 10/25/19 2:40 PM, Tian, Kevin wrote:
> > >>>> ioasid_register_allocator(&iommu->pasid_allocator);
> > >>>> + if (ret) {
> > >>>> + pr_warn("Custom PASID
> > >>>> allocator registeration failed\n");
> > >>>> + /*
> > >>>> + * Disable scalable mode on
> > >>>> this IOMMU if there
> > >>>> + * is no custom allocator.
> > >>>> Mixing SM capable vIOMMU
> > >>>> + * and non-SM vIOMMU are not
> > >>>> supported.
> > >>>> + */
> > >>>> + intel_iommu_sm = 0;
> > >>> It's insufficient to disable scalable mode by only clearing
> > >>> intel_iommu_sm. The DMA_RTADDR_SMT bit in root entry has
> > >>> already
> > >> been
> > >>> set. Probably, you need to
> > >>>
> > >>> for each iommu
> > >>> clear DMA_RTADDR_SMT in root entry
> > >>>
> > >>> Alternatively, since vSVA is the only customer of this custom
> > >>> PASID allocator, is it possible to only disable SVA here?
> > >>>
> > >> Yeah, I think disable SVA is better. We can still do gIOVA in
> > >> SM. I guess we need to introduce a flag for sva_enabled.
> > > I'm not sure whether tying above logic to SVA is the right
> > > approach. If vcmd interface doesn't work, the whole SM mode
> > > doesn't make sense which is based on PASID-granular protection
> > > (SVA is only one usage atop). If the only remaining usage of SM
> > > is to map gIOVA using reserved PASID#0, then why not disabling SM
> > > and just fallback to legacy mode?
> > >
> > > Based on that I prefer to disabling the SM mode completely (better
> > > through an interface), and move the logic out of CONFIG_INTEL_
> > > IOMMU_SVM
> > >
> >
> > Unfortunately, it is dangerous to disable SM after boot. SM uses
> > different root/device contexts and pasid table formats. Disabling SM
> > after boot requires changing above from SM format into legacy
> > format.
>
> You are correct.
>
> >
> > Since ioasid registration failure is a rare case. How about moving
> > this part of code up to the early stage of intel_iommu_init() and
> > returning error if hardware present vcmd capability but software
> > fails to register a custom ioasid allocator?
> >
>
> It makes sense to me.
>
sounds good to me too, the earlier the less to clean up.
> Thanks
> Kevin

[Jacob Pan]

2019-10-28 23:48:22

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

On Fri, 25 Oct 2019 06:31:04 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> >
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/Kconfig | 1 +
> > drivers/iommu/intel-iommu.c | 67
> > +++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/intel-iommu.h | 2 ++
> > 3 files changed, 70 insertions(+)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index fd50ddffffbf..961fe5795a90 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> > bool "Support for Shared Virtual Memory with Intel IOMMU"
> > depends on INTEL_IOMMU && X86
> > select PCI_PASID
> > + select IOASID
> > select MMU_NOTIFIER
> > help
> > Shared Virtual Memory (SVM) provides a facility for
> > devices diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 3f974919d3bd..ced1d89ef977
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu
> > *iommu)
> > if (ecap_prs(iommu->ecap))
> > intel_svm_finish_prq(iommu);
> > }
> > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > +
> > ioasid_unregister_allocator(&iommu->pasid_allocator); +
> > #endif
> > }
> >
> > @@ -4910,6 +4913,44 @@ static int __init
> > probe_acpi_namespace_devices(void)
> > return 0;
> > }
> >
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > void *data) +{
> > + struct intel_iommu *iommu = data;
> > + ioasid_t ioasid;
> > +
> > + /*
> > + * VT-d virtual command interface always uses the full 20
> > bit
> > + * PASID range. Host can partition guest PASID range based
> > on
> > + * policies but it is out of guest's control.
> > + */
> > + if (min < PASID_MIN || max > intel_pasid_max_id)
> > + return INVALID_IOASID;
> > +
> > + if (vcmd_alloc_pasid(iommu, &ioasid))
> > + return INVALID_IOASID;
> > +
> > + return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > + struct intel_iommu *iommu = data;
> > +
> > + if (!iommu)
> > + return;
> > + /*
> > + * Sanity check the ioasid owner is done at upper layer,
> > e.g. VFIO
> > + * We can only free the PASID when all the devices are
> > unbond.
>
> unbond -> unbound
>
will fix
> > + */
> > + if (ioasid_find(NULL, ioasid, NULL)) {
> > + pr_alert("Cannot free active IOASID %d\n", ioasid);
> > + return;
> > + }
> > + vcmd_free_pasid(iommu, ioasid);
> > +}
> > +#endif
> > +
> > int __init intel_iommu_init(void)
> > {
> > int ret = -ENODEV;
> > @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> > "%s", iommu->name);
> > iommu_device_set_ops(&iommu->iommu,
> > &intel_iommu_ops);
> > iommu_device_register(&iommu->iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > + if (ecap_vcs(iommu->ecap) &&
> > vccap_pasid(iommu->vccap)) {
> > + pr_info("Register custom PASID
> > allocator\n");
> > + /*
> > + * Register a custom ASID allocator if we
> > are running
> > + * in a guest, the purpose is to have a
> > system wide PASID
> > + * namespace among all PASID users.
> > + * There can be multiple vIOMMUs in each
> > guest but only
> > + * one allocator is active. All vIOMMU
> > allocators will
> > + * eventually be calling the same host
> > allocator.
> > + */
> > + iommu->pasid_allocator.alloc =
> > intel_ioasid_alloc;
> > + iommu->pasid_allocator.free =
> > intel_ioasid_free;
> > + iommu->pasid_allocator.pdata = (void
> > *)iommu;
> > + ret = ioasid_register_allocator(&iommu-
> > >pasid_allocator);
> > + if (ret) {
> > + pr_warn("Custom PASID allocator
> > registeration failed\n");
>
> registration
will fix

Thanks!
>
> > + /*
> > + * Disable scalable mode on this
> > IOMMU if there
> > + * is no custom allocator. Mixing
> > SM capable vIOMMU
> > + * and non-SM vIOMMU are not
> > supported.
> > + */
> > + intel_iommu_sm = 0;
> > + }
> > + }
> > +#endif
> > }
> >
> > bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 1d4b8dcdc5d8..c624733cb2e6
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -19,6 +19,7 @@
> > #include <linux/iommu.h>
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > #include <linux/dmar.h>
> > +#include <linux/ioasid.h>
> >
> > #include <asm/cacheflush.h>
> > #include <asm/iommu.h>
> > @@ -546,6 +547,7 @@ struct intel_iommu {
> > #ifdef CONFIG_INTEL_IOMMU_SVM
> > struct page_req_dsc *prq;
> > unsigned char prq_name[16]; /* Name for PRQ interrupt */
> > + struct ioasid_allocator_ops pasid_allocator; /* Custom
> > allocator for PASIDs */
> > #endif
> > struct q_inval *qi; /* Queued invalidation
> > info */ u32 *iommu_state; /* Store iommu states between suspend and
> > resume.*/
> > --
> > 2.7.4
>

[Jacob Pan]

2019-10-29 06:54:46

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi Baolu,

Appreciate the thorough review, comments inline.

On Sat, 26 Oct 2019 10:01:19 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/25/19 3:55 AM, Jacob Pan wrote:
> > When supporting guest SVA with emulated IOMMU, the guest PASID
> > table is shadowed in VMM. Updates to guest vIOMMU PASID table
> > will result in PASID cache flush which will be passed down to
> > the host as bind guest PASID calls.
> >
> > For the SL page tables, it will be harvested from device's
> > default domain (request w/o PASID), or aux domain in case of
> > mediated device.
> >
> > .-------------. .---------------------------.
> > | vIOMMU | | Guest process CR3, FL only|
> > | | '---------------------------'
> > .----------------/
> > | PASID Entry |--- PASID cache flush -
> > '-------------' |
> > | | V
> > | | CR3 in GPA
> > '-------------'
> > Guest
> > ------| Shadow |--------------------------|--------
> > v v v
> > Host
> > .-------------. .----------------------.
> > | pIOMMU | | Bind FL for GVA-GPA |
> > | | '----------------------'
> > .----------------/ |
> > | PASID Entry | V (Nested xlate)
> > '----------------\.------------------------------.
> > | | |SL for GPA-HPA, default domain|
> > | | '------------------------------'
> > '-------------'
> > Where:
> > - FL = First level/stage one page tables
> > - SL = Second level/stage two page tables
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 4 +
> > drivers/iommu/intel-svm.c | 184
> > ++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/intel-iommu.h | 8 +- include/linux/intel-svm.h |
> > 17 ++++ 4 files changed, 212 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index acd1ac787d8b..5fab32fbc4b4
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -6026,6 +6026,10 @@ const struct iommu_ops intel_iommu_ops = {
> > .dev_disable_feat = intel_iommu_dev_disable_feat,
> > .is_attach_deferred =
> > intel_iommu_is_attach_deferred, .pgsize_bitmap =
> > INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
> > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > +#endif
> > };
> >
> > static void quirk_iommu_igfx(struct pci_dev *dev)
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index a18b02a9709d..ae13a310cf96 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -216,6 +216,190 @@ static LIST_HEAD(global_svm_list);
> > list_for_each_entry(sdev, &svm->devs, list) \
> > if (dev == sdev->dev) \
>
> Add an indent tab please.
>
looks good.
> >
> > +int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > + struct device *dev,
> > + struct iommu_gpasid_bind_data *data)
> > +{
> > + struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
> > + struct dmar_domain *ddomain;
> > + struct intel_svm_dev *sdev;
> > + struct intel_svm *svm;
> > + int ret = 0;
> > +
> > + if (WARN_ON(!iommu) || !data)
> > + return -EINVAL;
> > +
> > + if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
> > + data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
>
> Alignment should match open parenthesis.
>
> Run "scripts/checkpatch.pl --strict" for all in this patch. I will
> ignore others.
>
it was my editor's setting :), will do.

> > + return -EINVAL;
> > +
> > + if (dev_is_pci(dev)) {
> > + /* VT-d supports devices with full 20 bit PASIDs
> > only */
> > + if (pci_max_pasids(to_pci_dev(dev)) != PASID_MAX)
> > + return -EINVAL;
> > + }
> > +
> > + /*
> > + * We only check host PASID range, we have no knowledge to
> > check
> > + * guest PASID range nor do we use the guest PASID.
> > + */
> > + if (data->hpasid <= 0 || data->hpasid >= PASID_MAX)
> > + return -EINVAL;
> > +
> > + ddomain = to_dmar_domain(domain);
> > + /* REVISIT:
> > + * Sanity check adddress width and paging mode support
>
> s/adddress/address/g
>
good catch, I will add the check for paging mode and this comment is no
longer needed.
> > + * width matching in two dimensions:
> > + * 1. paging mode CPU <= IOMMU
> > + * 2. address width Guest <= Host.
> > + */ > + mutex_lock(&pasid_mutex);
> > + svm = ioasid_find(NULL, data->hpasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > + goto out;
> > + }
>
> A blank line looks better.
>
true.
> > + if (svm) {
> > + /*
> > + * If we found svm for the PASID, there must be at
> > + * least one device bond, otherwise svm should be
> > freed.
> > + */
> > + BUG_ON(list_empty(&svm->devs));
>
> Avoid crashing kernel, use WARN_ON() instead.
>
> if (WARN_ON(list_empty(&svm->devs))) {
> ret = -EINVAL;
> goto out;
> }
>
Yeah, WARN_ON is better, should let the kernel continue for easy debug.
Though this is an indication of serious problem.
> > +
> > + for_each_svm_dev(svm, dev) {
> > + /* In case of multiple sub-devices of the
> > same pdev assigned, we should
>
> Make line shorter. Not over 80 characters.
>
> The same for other lines.
>
sure.
> > + * allow multiple bind calls with the same
> > PASID and pdev.
> > + */
> > + sdev->users++;
> > + goto out;
> > + }
>
> I remember I ever pointed this out before. But I forgot how we
> addressed it. So forgive me if this has been addressed.
>
> What if we have a valid bound svm but @dev doesn't belong to it
> (a.k.a. @dev not in svm->devs list)?
>
If we are binding a new device to an existing/active PASID, the code
will allocate a new sdev and add that to the svm->devs list.
> > + } else {
> > + /* We come here when PASID has never been bond to
> > a device. */
> > + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> > + if (!svm) {
> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + /* REVISIT: upper layer/VFIO can track host
> > process that bind the PASID.
> > + * ioasid_set = mm might be sufficient for vfio to
> > check pasid VMM
> > + * ownership.
> > + */
> > + svm->mm = get_task_mm(current);
> > + svm->pasid = data->hpasid;
> > + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> > + svm->gpasid = data->gpasid;
> > + svm->flags |= SVM_FLAG_GUEST_PASID;
> > + }
> > + ioasid_set_data(data->hpasid, svm);
> > + INIT_LIST_HEAD_RCU(&svm->devs);
> > + INIT_LIST_HEAD(&svm->list);
> > +
> > + mmput(svm->mm);
> > + }
>
> A blank line, please.
looks good.
>
> > + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> > + if (!sdev) {
> > + if (list_empty(&svm->devs))
> > + kfree(svm);
>
> This is dangerous. This might leave a wild pointer bound with gpasid.
>
why is that? can you please explain?
if the list is empty that means we just allocated the new svm, no
users. why can't we free it here?

> > + ret = -ENOMEM;
> > + goto out;
> > + }
> > + sdev->dev = dev;
> > + sdev->users = 1;
> > +
> > + /* Set up device context entry for PASID if not enabled
> > already */
> > + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> > + if (ret) {
> > + dev_err(dev, "Failed to enable PASID
> > capability\n");
> > + kfree(sdev);
> > + goto out;
> > + }
> > +
> > + /*
> > + * For guest bind, we need to set up PASID table entry as
> > follows:
> > + * - FLPM matches guest paging mode
> > + * - turn on nested mode
> > + * - SL guest address width matching
> > + */
> > + ret = intel_pasid_setup_nested(iommu,
> > + dev,
> > + (pgd_t *)data->gpgd,
> > + data->hpasid,
> > + &data->vtd,
> > + ddomain,
> > + data->addr_width);
> > + if (ret) {
> > + dev_err(dev, "Failed to set up PASID %llu in
> > nested mode, Err %d\n",
> > + data->hpasid, ret);
>
> This error handling is insufficient. You should at least:
>
> 1. free sdev
already done below

> 2. if list_empty(&svm->devs)
> unbound the svm from gpasid
> free svm
>
yes, agreed.

> The same for above error handling. Add a branch for error recovery at
> the end of function might help here.
>
not sure which code is the same as above? could you point it out?
> > + kfree(sdev);
> > + goto out;
> > + }
> > + svm->flags |= SVM_FLAG_GUEST_MODE;
> > +
> > + init_rcu_head(&sdev->rcu);
> > + list_add_rcu(&sdev->list, &svm->devs);
> > + out:
> > + mutex_unlock(&pasid_mutex);
> > + return ret;
> > +}
> > +
> > +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> > +{
> > + struct intel_svm_dev *sdev;
> > + struct intel_iommu *iommu;
> > + struct intel_svm *svm;
> > + int ret = -EINVAL;
> > +
> > + mutex_lock(&pasid_mutex);
> > + iommu = intel_svm_device_to_iommu(dev);
> > + if (!iommu)
> > + goto out;
>
> Make it symmetrical with bind function.
>
> if (WARN_ON(!iommu))
> goto out;
>
sounds good.
> > +
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR_OR_NULL(svm)) {
> > + ret = PTR_ERR(svm);
>
> If svm == NULL, this function will return success. This is not
> expected, right?
>
good catch, will fix.
> > + goto out;
> > + }
> > +
> > + for_each_svm_dev(svm, dev) {
> > + ret = 0;
> > + sdev->users--;
> > + if (!sdev->users) {
> > + list_del_rcu(&sdev->list);
> > + intel_pasid_tear_down_entry(iommu, dev,
> > svm->pasid);
> > + /* TODO: Drain in flight PRQ for the PASID
> > since it
> > + * may get reused soon, we don't want to
> > + * confuse with its previous life.
> > + * intel_svm_drain_prq(dev, pasid);
> > + */
> > + kfree_rcu(sdev, rcu);
> > +
> > + if (list_empty(&svm->devs)) {
> > + list_del(&svm->list);
> > + kfree(svm);
> > + /*
> > + * We do not free PASID here until
> > explicit call
> > + * from VFIO to free. The PASID
> > life cycle
> > + * management is largely tied to
> > VFIO management
> > + * of assigned device life cycles.
> > In case of
> > + * guest exit without a explicit
> > free PASID call,
> > + * the responsibility lies in VFIO
> > layer to free
> > + * the PASIDs allocated for the
> > guest.
> > + * For security reasons, VFIO has
> > to track the
> > + * PASID ownership per guest
> > anyway to ensure
> > + * that PASID allocated by one
> > guest cannot be
> > + * used by another.
> > + */
> > + ioasid_set_data(pasid, NULL);
>
> Exchange order. First unbind svm from gpasid and then free svm.
>
I am not following, aren't we already doing free svm after unbind?
please explain.
> > + }
> > + }
> > + break;
> > + }
> > + out:
> > + mutex_unlock(&pasid_mutex);
> > +
> > + return ret;
> > +}
> > +
> > int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
> > struct svm_dev_ops *ops) {
> > struct intel_iommu *iommu =
> > intel_svm_device_to_iommu(dev); diff --git
> > a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> > 3dba6ad3e9ad..6c74c71b1ebf 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu
> > *iommu, struct device *dev); int intel_svm_init(struct intel_iommu
> > *iommu); extern int intel_svm_enable_prq(struct intel_iommu *iommu);
> > extern int intel_svm_finish_prq(struct intel_iommu *iommu);
> > -
> > +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> > + struct device *dev, struct iommu_gpasid_bind_data
> > *data); +extern int intel_svm_unbind_gpasid(struct device *dev, int
> > pasid); struct svm_dev_ops;
> >
> > struct intel_svm_dev {
> > @@ -690,9 +692,13 @@ struct intel_svm_dev {
> > struct intel_svm {
> > struct mmu_notifier notifier;
> > struct mm_struct *mm;
> > +
> > struct intel_iommu *iommu;
> > int flags;
> > int pasid;
> > + int gpasid; /* Guest PASID in case of vSVA bind with
> > non-identity host
> > + * to guest PASID mapping.
> > + */
> > struct list_head devs;
> > struct list_head list;
> > };
> > diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> > index 94f047a8a845..a2c189ad0b01 100644
> > --- a/include/linux/intel-svm.h
> > +++ b/include/linux/intel-svm.h
> > @@ -44,6 +44,23 @@ struct svm_dev_ops {
> > * do such IOTLB flushes automatically.
> > */
> > #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> > +/*
> > + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> > to a device.
> > + * In this case the mm_struct is in the guest kernel or userspace,
> > its life
> > + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> > API provides
> > + * means to bind/unbind guest CR3 with PASIDs allocated for a
> > device.
> > + */
> > +#define SVM_FLAG_GUEST_MODE (1<<2)
>
> How about keeping this aligned with top by adding a tab?
>
sounds good.
> BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
> preferred.
>
I know, but the existing mainline code is not using BIT, so I wanted
to keep coding style consistent. Perhaps a separate cleanup patch will
do later.
> > +/*
> > + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> > PASID space,
> > + * which requires guest and host PASID translation at both
> > directions. We keep
> > + * track of guest PASID in order to provide lookup service to
> > device drivers.
> > + * One such example is a physical function (PF) driver that
> > supports mediated
> > + * device (mdev) assignment. Guest programming of mdev
> > configuration space can
> > + * only be done with guest PASID, therefore PF driver needs to
> > find the matching
> > + * host PASID to program the real hardware.
> > + */
> > +#define SVM_FLAG_GUEST_PASID (1<<3)
>
> Ditto.
>
> Best regards,
> baolu

[Jacob Pan]

2019-10-29 07:23:44

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

Hi,

On 10/29/19 6:49 AM, Jacob Pan wrote:
>>>> I'm not sure whether tying above logic to SVA is the right
>>>> approach. If vcmd interface doesn't work, the whole SM mode
>>>> doesn't make sense which is based on PASID-granular protection
>>>> (SVA is only one usage atop). If the only remaining usage of SM
>>>> is to map gIOVA using reserved PASID#0, then why not disabling SM
>>>> and just fallback to legacy mode?
>>>>
>>>> Based on that I prefer to disabling the SM mode completely (better
>>>> through an interface), and move the logic out of CONFIG_INTEL_
>>>> IOMMU_SVM
>>>>
>>> Unfortunately, it is dangerous to disable SM after boot. SM uses
>>> different root/device contexts and pasid table formats. Disabling SM
>>> after boot requires changing above from SM format into legacy
>>> format.
>> You are correct.
>>
>>> Since ioasid registration failure is a rare case. How about moving
>>> this part of code up to the early stage of intel_iommu_init() and
>>> returning error if hardware present vcmd capability but software
>>> fails to register a custom ioasid allocator?
>>>
>> It makes sense to me.
>>
> sounds good to me too, the earlier the less to clean up.

Actually, we even could return error directly and abort the iommu
initialization. The registration of custom ioasid allocator fails only
when memory runs out or software is buggy. In either cases, we should
abort iommu initialization.

Best regards,
baolu

2019-10-29 07:23:48

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi,

On 10/28/19 2:03 PM, Tian, Kevin wrote:
>>>> .dev_disable_feat = intel_iommu_dev_disable_feat,
>>>> .is_attach_deferred =
>>>> intel_iommu_is_attach_deferred, .pgsize_bitmap =
>>>> INTEL_IOMMU_PGSIZES, +#ifdef CONFIG_INTEL_IOMMU_SVM
>>>> + .sva_bind_gpasid = intel_svm_bind_gpasid,
>>>> + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
>>>> +#endif
>>> again, pure PASID management logic should be separated from SVM.
>>>
>> I am not following, these two functions are SVM functionality, not
>> pure PASID management which is already separated in ioasid.c
> I should say pure "scalable mode" logic. Above callbacks are not
> related to host SVM per se. They are serving gpasid requests from
> guest side, thus part of generic scalable mode capability.
>

Currently these two callbacks are for sva only and the patch has been
queued by Joerg for the next rc1. It could be extended to be generic.
But it deserves a separated patch.

Best regards,
baolu

2019-10-29 07:24:38

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

On Tue, 29 Oct 2019 10:54:48 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/29/19 6:29 AM, Jacob Pan wrote:
> > Hi Baolu,
> >
> > Appreciate the thorough review, comments inline.
>
> You are welcome.
>
> >
> > On Sat, 26 Oct 2019 10:01:19 +0800
> > Lu Baolu <[email protected]> wrote:
> >
> >> Hi,
> >>
>
> [...]
>
> >>> + * allow multiple bind calls with the
> >>> same PASID and pdev.
> >>> + */
> >>> + sdev->users++;
> >>> + goto out;
> >>> + }
> >>
> >> I remember I ever pointed this out before. But I forgot how we
> >> addressed it. So forgive me if this has been addressed.
> >>
> >> What if we have a valid bound svm but @dev doesn't belong to it
> >> (a.k.a. @dev not in svm->devs list)?
> >>
> > If we are binding a new device to an existing/active PASID, the code
> > will allocate a new sdev and add that to the svm->devs list.
>
> But allocating a new sdev and adding device is in below else branch,
> so it will never reach there, right?
>
No, allocating sdev is outside else branch.
> >>> + } else {
> >>> + /* We come here when PASID has never been bond to
> >>> a device. */
> >>> + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
> >>> + if (!svm) {
> >>> + ret = -ENOMEM;
> >>> + goto out;
> >>> + }
> >>> + /* REVISIT: upper layer/VFIO can track host
> >>> process that bind the PASID.
> >>> + * ioasid_set = mm might be sufficient for vfio
> >>> to check pasid VMM
> >>> + * ownership.
> >>> + */
> >>> + svm->mm = get_task_mm(current);
> >>> + svm->pasid = data->hpasid;
> >>> + if (data->flags & IOMMU_SVA_GPASID_VAL) {
> >>> + svm->gpasid = data->gpasid;
> >>> + svm->flags |= SVM_FLAG_GUEST_PASID;
> >>> + }
> >>> + ioasid_set_data(data->hpasid, svm);
> >>> + INIT_LIST_HEAD_RCU(&svm->devs);
> >>> + INIT_LIST_HEAD(&svm->list);
> >>> +
> >>> + mmput(svm->mm);
> >>> + }
> >>
> >> A blank line, please.
> > looks good.
> >>
> >>> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
> >>> + if (!sdev) {
> >>> + if (list_empty(&svm->devs))
> >>> + kfree(svm);
> >>
> >> This is dangerous. This might leave a wild pointer bound with
> >> gpasid.
> > why is that? can you please explain?
> > if the list is empty that means we just allocated the new svm, no
> > users. why can't we free it here?
>
> svm has been associated with the pasid private data. It needs to be
> unbound from pasid before getting freed. Otherwise, a wild pointer
> will be left.
>
> ioasid_set_data(pasid, NULL);
> kfree(svm);
>
Right, I need to clear the private data here. Thanks!

> >
> >>> + ret = -ENOMEM;
> >>> + goto out;
> >>> + }
> >>> + sdev->dev = dev;
> >>> + sdev->users = 1;
> >>> +
> >>> + /* Set up device context entry for PASID if not enabled
> >>> already */
> >>> + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
> >>> + if (ret) {
> >>> + dev_err(dev, "Failed to enable PASID
> >>> capability\n");
> >>> + kfree(sdev);
> >>> + goto out;
> >>> + }
> >>> +
> >>> + /*
> >>> + * For guest bind, we need to set up PASID table entry as
> >>> follows:
> >>> + * - FLPM matches guest paging mode
> >>> + * - turn on nested mode
> >>> + * - SL guest address width matching
> >>> + */
> >>> + ret = intel_pasid_setup_nested(iommu,
> >>> + dev,
> >>> + (pgd_t *)data->gpgd,
> >>> + data->hpasid,
> >>> + &data->vtd,
> >>> + ddomain,
> >>> + data->addr_width);
> >>> + if (ret) {
> >>> + dev_err(dev, "Failed to set up PASID %llu in
> >>> nested mode, Err %d\n",
> >>> + data->hpasid, ret);
> >>
> >> This error handling is insufficient. You should at least:
> >>
> >> 1. free sdev
> > already done below
> >
> >> 2. if list_empty(&svm->devs)
> >> unbound the svm from gpasid
> >> free svm
> >>
> > yes, agreed.
> >
> >> The same for above error handling. Add a branch for error recovery
> >> at the end of function might help here.
> >>
> > not sure which code is the same as above? could you point it out?
>
> Above last comment. :-)
>
Got it.
> >>> + kfree(sdev);
> >>> + goto out;
> >>> + }
> >>> + svm->flags |= SVM_FLAG_GUEST_MODE;
> >>> +
> >>> + init_rcu_head(&sdev->rcu);
> >>> + list_add_rcu(&sdev->list, &svm->devs);
> >>> + out:
> >>> + mutex_unlock(&pasid_mutex);
> >>> + return ret;
> >>> +}
> >>> +
> >>> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
> >>> +{
> >>> + struct intel_svm_dev *sdev;
> >>> + struct intel_iommu *iommu;
> >>> + struct intel_svm *svm;
> >>> + int ret = -EINVAL;
> >>> +
> >>> + mutex_lock(&pasid_mutex);
> >>> + iommu = intel_svm_device_to_iommu(dev);
> >>> + if (!iommu)
> >>> + goto out;
> >>
> >> Make it symmetrical with bind function.
> >>
> >> if (WARN_ON(!iommu))
> >> goto out;
> >>
> > sounds good.
> >>> +
> >>> + svm = ioasid_find(NULL, pasid, NULL);
> >>> + if (IS_ERR_OR_NULL(svm)) {
> >>> + ret = PTR_ERR(svm);
> >>
> >> If svm == NULL, this function will return success. This is not
> >> expected, right?
> >>
> > good catch, will fix.
> >>> + goto out;
> >>> + }
> >>> +
> >>> + for_each_svm_dev(svm, dev) {
> >>> + ret = 0;
> >>> + sdev->users--;
> >>> + if (!sdev->users) {
> >>> + list_del_rcu(&sdev->list);
> >>> + intel_pasid_tear_down_entry(iommu, dev,
> >>> svm->pasid);
> >>> + /* TODO: Drain in flight PRQ for the
> >>> PASID since it
> >>> + * may get reused soon, we don't want to
> >>> + * confuse with its previous life.
> >>> + * intel_svm_drain_prq(dev, pasid);
> >>> + */
> >>> + kfree_rcu(sdev, rcu);
> >>> +
> >>> + if (list_empty(&svm->devs)) {
> >>> + list_del(&svm->list);
> >>> + kfree(svm);
> >>> + /*
> >>> + * We do not free PASID here
> >>> until explicit call
> >>> + * from VFIO to free. The PASID
> >>> life cycle
> >>> + * management is largely tied to
> >>> VFIO management
> >>> + * of assigned device life
> >>> cycles. In case of
> >>> + * guest exit without a explicit
> >>> free PASID call,
> >>> + * the responsibility lies in
> >>> VFIO layer to free
> >>> + * the PASIDs allocated for the
> >>> guest.
> >>> + * For security reasons, VFIO has
> >>> to track the
> >>> + * PASID ownership per guest
> >>> anyway to ensure
> >>> + * that PASID allocated by one
> >>> guest cannot be
> >>> + * used by another.
> >>> + */
> >>> + ioasid_set_data(pasid, NULL);
> >>
> >> Exchange order. First unbind svm from gpasid and then free svm.
> >>
> > I am not following, aren't we already doing free svm after unbind?
> > please explain.
>
> I meant
>
> ioasid_set_data(pasid, NULL);
> kfree(svm);
>
> in reverse order, it leaves a short window when svm is freed, but
> pasid private data is still kept svm (wild pointer).
>
>
Right. will fix
> >>> + }
> >>> + }
> >>> + break;
> >>> + }
> >>> + out:
> >>> + mutex_unlock(&pasid_mutex);
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> int intel_svm_bind_mm(struct device *dev, int *pasid, int
> >>> flags, struct svm_dev_ops *ops) {
> >>> struct intel_iommu *iommu =
> >>> intel_svm_device_to_iommu(dev); diff --git
> >>> a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
> >>> 3dba6ad3e9ad..6c74c71b1ebf 100644 ---
> >>> a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h
> >>> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct
> >>> intel_iommu *iommu, struct device *dev); int
> >>> intel_svm_init(struct intel_iommu *iommu); extern int
> >>> intel_svm_enable_prq(struct intel_iommu *iommu); extern int
> >>> intel_svm_finish_prq(struct intel_iommu *iommu); -
> >>> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
> >>> + struct device *dev, struct iommu_gpasid_bind_data
> >>> *data); +extern int intel_svm_unbind_gpasid(struct device *dev,
> >>> int pasid); struct svm_dev_ops;
> >>>
> >>> struct intel_svm_dev {
> >>> @@ -690,9 +692,13 @@ struct intel_svm_dev {
> >>> struct intel_svm {
> >>> struct mmu_notifier notifier;
> >>> struct mm_struct *mm;
> >>> +
> >>> struct intel_iommu *iommu;
> >>> int flags;
> >>> int pasid;
> >>> + int gpasid; /* Guest PASID in case of vSVA bind with
> >>> non-identity host
> >>> + * to guest PASID mapping.
> >>> + */
> >>> struct list_head devs;
> >>> struct list_head list;
> >>> };
> >>> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
> >>> index 94f047a8a845..a2c189ad0b01 100644
> >>> --- a/include/linux/intel-svm.h
> >>> +++ b/include/linux/intel-svm.h
> >>> @@ -44,6 +44,23 @@ struct svm_dev_ops {
> >>> * do such IOTLB flushes automatically.
> >>> */
> >>> #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
> >>> to a device.
> >>> + * In this case the mm_struct is in the guest kernel or
> >>> userspace, its life
> >>> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
> >>> API provides
> >>> + * means to bind/unbind guest CR3 with PASIDs allocated for a
> >>> device.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_MODE (1<<2)
> >>
> >> How about keeping this aligned with top by adding a tab?
> >>
> > sounds good.
> >> BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
> >> preferred.
> >>
> > I know, but the existing mainline code is not using BIT, so I wanted
> > to keep coding style consistent. Perhaps a separate cleanup patch
> > will do later.
>
> It makes sense to me.
>
> >>> +/*
> >>> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
> >>> PASID space,
> >>> + * which requires guest and host PASID translation at both
> >>> directions. We keep
> >>> + * track of guest PASID in order to provide lookup service to
> >>> device drivers.
> >>> + * One such example is a physical function (PF) driver that
> >>> supports mediated
> >>> + * device (mdev) assignment. Guest programming of mdev
> >>> configuration space can
> >>> + * only be done with guest PASID, therefore PF driver needs to
> >>> find the matching
> >>> + * host PASID to program the real hardware.
> >>> + */
> >>> +#define SVM_FLAG_GUEST_PASID (1<<3)
> >>
> >> Ditto.
> >>
> >> Best regards,
> >> baolu
> >
> > [Jacob Pan]
> >
>
> Best regards,
> baolu

[Jacob Pan]

2019-10-29 07:26:08

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi,

On 10/29/19 6:29 AM, Jacob Pan wrote:
> Hi Baolu,
>
> Appreciate the thorough review, comments inline.

You are welcome.

>
> On Sat, 26 Oct 2019 10:01:19 +0800
> Lu Baolu <[email protected]> wrote:
>
>> Hi,
>>

[...]

>>> + * allow multiple bind calls with the same
>>> PASID and pdev.
>>> + */
>>> + sdev->users++;
>>> + goto out;
>>> + }
>>
>> I remember I ever pointed this out before. But I forgot how we
>> addressed it. So forgive me if this has been addressed.
>>
>> What if we have a valid bound svm but @dev doesn't belong to it
>> (a.k.a. @dev not in svm->devs list)?
>>
> If we are binding a new device to an existing/active PASID, the code
> will allocate a new sdev and add that to the svm->devs list.

But allocating a new sdev and adding device is in below else branch, so
it will never reach there, right?

>>> + } else {
>>> + /* We come here when PASID has never been bond to
>>> a device. */
>>> + svm = kzalloc(sizeof(*svm), GFP_KERNEL);
>>> + if (!svm) {
>>> + ret = -ENOMEM;
>>> + goto out;
>>> + }
>>> + /* REVISIT: upper layer/VFIO can track host
>>> process that bind the PASID.
>>> + * ioasid_set = mm might be sufficient for vfio to
>>> check pasid VMM
>>> + * ownership.
>>> + */
>>> + svm->mm = get_task_mm(current);
>>> + svm->pasid = data->hpasid;
>>> + if (data->flags & IOMMU_SVA_GPASID_VAL) {
>>> + svm->gpasid = data->gpasid;
>>> + svm->flags |= SVM_FLAG_GUEST_PASID;
>>> + }
>>> + ioasid_set_data(data->hpasid, svm);
>>> + INIT_LIST_HEAD_RCU(&svm->devs);
>>> + INIT_LIST_HEAD(&svm->list);
>>> +
>>> + mmput(svm->mm);
>>> + }
>>
>> A blank line, please.
> looks good.
>>
>>> + sdev = kzalloc(sizeof(*sdev), GFP_KERNEL);
>>> + if (!sdev) {
>>> + if (list_empty(&svm->devs))
>>> + kfree(svm);
>>
>> This is dangerous. This might leave a wild pointer bound with gpasid.
>>
> why is that? can you please explain?
> if the list is empty that means we just allocated the new svm, no
> users. why can't we free it here?

svm has been associated with the pasid private data. It needs to be
unbound from pasid before getting freed. Otherwise, a wild pointer will
be left.

ioasid_set_data(pasid, NULL);
kfree(svm);

>
>>> + ret = -ENOMEM;
>>> + goto out;
>>> + }
>>> + sdev->dev = dev;
>>> + sdev->users = 1;
>>> +
>>> + /* Set up device context entry for PASID if not enabled
>>> already */
>>> + ret = intel_iommu_enable_pasid(iommu, sdev->dev);
>>> + if (ret) {
>>> + dev_err(dev, "Failed to enable PASID
>>> capability\n");
>>> + kfree(sdev);
>>> + goto out;
>>> + }
>>> +
>>> + /*
>>> + * For guest bind, we need to set up PASID table entry as
>>> follows:
>>> + * - FLPM matches guest paging mode
>>> + * - turn on nested mode
>>> + * - SL guest address width matching
>>> + */
>>> + ret = intel_pasid_setup_nested(iommu,
>>> + dev,
>>> + (pgd_t *)data->gpgd,
>>> + data->hpasid,
>>> + &data->vtd,
>>> + ddomain,
>>> + data->addr_width);
>>> + if (ret) {
>>> + dev_err(dev, "Failed to set up PASID %llu in
>>> nested mode, Err %d\n",
>>> + data->hpasid, ret);
>>
>> This error handling is insufficient. You should at least:
>>
>> 1. free sdev
> already done below
>
>> 2. if list_empty(&svm->devs)
>> unbound the svm from gpasid
>> free svm
>>
> yes, agreed.
>
>> The same for above error handling. Add a branch for error recovery at
>> the end of function might help here.
>>
> not sure which code is the same as above? could you point it out?

Above last comment. :-)

>>> + kfree(sdev);
>>> + goto out;
>>> + }
>>> + svm->flags |= SVM_FLAG_GUEST_MODE;
>>> +
>>> + init_rcu_head(&sdev->rcu);
>>> + list_add_rcu(&sdev->list, &svm->devs);
>>> + out:
>>> + mutex_unlock(&pasid_mutex);
>>> + return ret;
>>> +}
>>> +
>>> +int intel_svm_unbind_gpasid(struct device *dev, int pasid)
>>> +{
>>> + struct intel_svm_dev *sdev;
>>> + struct intel_iommu *iommu;
>>> + struct intel_svm *svm;
>>> + int ret = -EINVAL;
>>> +
>>> + mutex_lock(&pasid_mutex);
>>> + iommu = intel_svm_device_to_iommu(dev);
>>> + if (!iommu)
>>> + goto out;
>>
>> Make it symmetrical with bind function.
>>
>> if (WARN_ON(!iommu))
>> goto out;
>>
> sounds good.
>>> +
>>> + svm = ioasid_find(NULL, pasid, NULL);
>>> + if (IS_ERR_OR_NULL(svm)) {
>>> + ret = PTR_ERR(svm);
>>
>> If svm == NULL, this function will return success. This is not
>> expected, right?
>>
> good catch, will fix.
>>> + goto out;
>>> + }
>>> +
>>> + for_each_svm_dev(svm, dev) {
>>> + ret = 0;
>>> + sdev->users--;
>>> + if (!sdev->users) {
>>> + list_del_rcu(&sdev->list);
>>> + intel_pasid_tear_down_entry(iommu, dev,
>>> svm->pasid);
>>> + /* TODO: Drain in flight PRQ for the PASID
>>> since it
>>> + * may get reused soon, we don't want to
>>> + * confuse with its previous life.
>>> + * intel_svm_drain_prq(dev, pasid);
>>> + */
>>> + kfree_rcu(sdev, rcu);
>>> +
>>> + if (list_empty(&svm->devs)) {
>>> + list_del(&svm->list);
>>> + kfree(svm);
>>> + /*
>>> + * We do not free PASID here until
>>> explicit call
>>> + * from VFIO to free. The PASID
>>> life cycle
>>> + * management is largely tied to
>>> VFIO management
>>> + * of assigned device life cycles.
>>> In case of
>>> + * guest exit without a explicit
>>> free PASID call,
>>> + * the responsibility lies in VFIO
>>> layer to free
>>> + * the PASIDs allocated for the
>>> guest.
>>> + * For security reasons, VFIO has
>>> to track the
>>> + * PASID ownership per guest
>>> anyway to ensure
>>> + * that PASID allocated by one
>>> guest cannot be
>>> + * used by another.
>>> + */
>>> + ioasid_set_data(pasid, NULL);
>>
>> Exchange order. First unbind svm from gpasid and then free svm.
>>
> I am not following, aren't we already doing free svm after unbind?
> please explain.

I meant

ioasid_set_data(pasid, NULL);
kfree(svm);

in reverse order, it leaves a short window when svm is freed, but pasid
private data is still kept svm (wild pointer).


>>> + }
>>> + }
>>> + break;
>>> + }
>>> + out:
>>> + mutex_unlock(&pasid_mutex);
>>> +
>>> + return ret;
>>> +}
>>> +
>>> int intel_svm_bind_mm(struct device *dev, int *pasid, int flags,
>>> struct svm_dev_ops *ops) {
>>> struct intel_iommu *iommu =
>>> intel_svm_device_to_iommu(dev); diff --git
>>> a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index
>>> 3dba6ad3e9ad..6c74c71b1ebf 100644 --- a/include/linux/intel-iommu.h
>>> +++ b/include/linux/intel-iommu.h
>>> @@ -673,7 +673,9 @@ int intel_iommu_enable_pasid(struct intel_iommu
>>> *iommu, struct device *dev); int intel_svm_init(struct intel_iommu
>>> *iommu); extern int intel_svm_enable_prq(struct intel_iommu *iommu);
>>> extern int intel_svm_finish_prq(struct intel_iommu *iommu);
>>> -
>>> +extern int intel_svm_bind_gpasid(struct iommu_domain *domain,
>>> + struct device *dev, struct iommu_gpasid_bind_data
>>> *data); +extern int intel_svm_unbind_gpasid(struct device *dev, int
>>> pasid); struct svm_dev_ops;
>>>
>>> struct intel_svm_dev {
>>> @@ -690,9 +692,13 @@ struct intel_svm_dev {
>>> struct intel_svm {
>>> struct mmu_notifier notifier;
>>> struct mm_struct *mm;
>>> +
>>> struct intel_iommu *iommu;
>>> int flags;
>>> int pasid;
>>> + int gpasid; /* Guest PASID in case of vSVA bind with
>>> non-identity host
>>> + * to guest PASID mapping.
>>> + */
>>> struct list_head devs;
>>> struct list_head list;
>>> };
>>> diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h
>>> index 94f047a8a845..a2c189ad0b01 100644
>>> --- a/include/linux/intel-svm.h
>>> +++ b/include/linux/intel-svm.h
>>> @@ -44,6 +44,23 @@ struct svm_dev_ops {
>>> * do such IOTLB flushes automatically.
>>> */
>>> #define SVM_FLAG_SUPERVISOR_MODE (1<<1)
>>> +/*
>>> + * The SVM_FLAG_GUEST_MODE flag is used when a guest process bind
>>> to a device.
>>> + * In this case the mm_struct is in the guest kernel or userspace,
>>> its life
>>> + * cycle is managed by VMM and VFIO layer. For IOMMU driver, this
>>> API provides
>>> + * means to bind/unbind guest CR3 with PASIDs allocated for a
>>> device.
>>> + */
>>> +#define SVM_FLAG_GUEST_MODE (1<<2)
>>
>> How about keeping this aligned with top by adding a tab?
>>
> sounds good.
>> BIT macro is preferred. Hence, make it BIT(1), BIT(2), BIT(3) is
>> preferred.
>>
> I know, but the existing mainline code is not using BIT, so I wanted
> to keep coding style consistent. Perhaps a separate cleanup patch will
> do later.

It makes sense to me.

>>> +/*
>>> + * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own
>>> PASID space,
>>> + * which requires guest and host PASID translation at both
>>> directions. We keep
>>> + * track of guest PASID in order to provide lookup service to
>>> device drivers.
>>> + * One such example is a physical function (PF) driver that
>>> supports mediated
>>> + * device (mdev) assignment. Guest programming of mdev
>>> configuration space can
>>> + * only be done with guest PASID, therefore PF driver needs to
>>> find the matching
>>> + * host PASID to program the real hardware.
>>> + */
>>> +#define SVM_FLAG_GUEST_PASID (1<<3)
>>
>> Ditto.
>>
>> Best regards,
>> baolu
>
> [Jacob Pan]
>

Best regards,
baolu

2019-10-29 07:27:53

by Lu Baolu

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

Hi,

On 10/29/19 12:11 PM, Jacob Pan wrote:
> On Tue, 29 Oct 2019 10:54:48 +0800
> Lu Baolu<[email protected]> wrote:
>
>> Hi,
>>
>> On 10/29/19 6:29 AM, Jacob Pan wrote:
>>> Hi Baolu,
>>>
>>> Appreciate the thorough review, comments inline.
>> You are welcome.
>>
>>> On Sat, 26 Oct 2019 10:01:19 +0800
>>> Lu Baolu<[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>> [...]
>>
>>>>> + * allow multiple bind calls with the
>>>>> same PASID and pdev.
>>>>> + */
>>>>> + sdev->users++;
>>>>> + goto out;
>>>>> + }
>>>> I remember I ever pointed this out before. But I forgot how we
>>>> addressed it. So forgive me if this has been addressed.
>>>>
>>>> What if we have a valid bound svm but @dev doesn't belong to it
>>>> (a.k.a. @dev not in svm->devs list)?
>>>>
>>> If we are binding a new device to an existing/active PASID, the code
>>> will allocate a new sdev and add that to the svm->devs list.
>> But allocating a new sdev and adding device is in below else branch,
>> so it will never reach there, right?
>>
> No, allocating sdev is outside else branch.

Oh, yes! Please ignore it.

Best regards,
baolu

2019-10-29 10:00:01

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

> From: Jacob Pan [mailto:[email protected]]
> Sent: Tuesday, October 29, 2019 12:03 AM
>
> On Mon, 28 Oct 2019 06:03:36 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > > > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > > > > + .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > > > > +#endif
> > > >
> > > > again, pure PASID management logic should be separated from SVM.
> > > >
> > > I am not following, these two functions are SVM functionality, not
> > > pure PASID management which is already separated in ioasid.c
> >
> > I should say pure "scalable mode" logic. Above callbacks are not
> > related to host SVM per se. They are serving gpasid requests from
> > guest side, thus part of generic scalable mode capability.
> Got your point, but we are sharing data structures with host SVM, it is
> very difficult and inefficient to separate the two.

I don't think difficulty is the reason against such direction. We need
do things right. :-) I'm fine with putting it in a TODO list, but at least
need the right information in the 1st place to tell that current way
is just a short-term approach, and we should revisit later.

thanks
Kevin

2019-10-29 16:09:12

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

On Tue, 29 Oct 2019 07:57:21 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Tuesday, October 29, 2019 12:03 AM
> >
> > On Mon, 28 Oct 2019 06:03:36 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > > > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > > > > > + .sva_unbind_gpasid =
> > > > > > intel_svm_unbind_gpasid, +#endif
> > > > >
> > > > > again, pure PASID management logic should be separated from
> > > > > SVM.
> > > > I am not following, these two functions are SVM functionality,
> > > > not pure PASID management which is already separated in
> > > > ioasid.c
> > >
> > > I should say pure "scalable mode" logic. Above callbacks are not
> > > related to host SVM per se. They are serving gpasid requests from
> > > guest side, thus part of generic scalable mode capability.
> > Got your point, but we are sharing data structures with host SVM,
> > it is very difficult and inefficient to separate the two.
>
> I don't think difficulty is the reason against such direction. We
> need do things right. :-) I'm fine with putting it in a TODO list,
> but at least need the right information in the 1st place to tell that
> current way is just a short-term approach, and we should revisit
> later.
I guess the fundamental question is: Should the scalable mode logic,
i.e. guest SVA at PASID granu device, be perceived as part of the
overall SVA functionality?

My view is yes, we shall share SVA and gSVA whenever we can.

The longer term, which I am working on right now, is to converge
intel_svm_bind_mm to the generic iommu_sva_bind_device() and use common
data structures as well. It is conceivable that these common structures
span across hardware architectures, also guest vs host SVA usages.

i.e. iommu_ops have
iommu_sva_bind_gpasid() for SM/gSVA
iommu_sva_bind_device() for native SVA

Or I am missing your point completely?

2019-10-29 17:11:28

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

On Fri, 25 Oct 2019 06:19:29 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > From: Lu Baolu <[email protected]>
> >
> > Enabling IOMMU in a guest requires communication with the host
> > driver for certain aspects. Use of PASID ID to enable Shared Virtual
> > Addressing (SVA) requires managing PASID's in the host. VT-d 3.0
> > spec provides a Virtual Command Register (VCMD) to facilitate this.
> > Writes to this register in the guest are trapped by QEMU which
> > proxies the call to the host driver.
> >
> > This virtual command interface consists of a capability register,
> > a virtual command register, and a virtual response register. Refer
> > to section 10.4.42, 10.4.43, 10.4.44 for more information.
> >
> > This patch adds the enlightened PASID allocation/free interfaces
> > via the virtual command interface.
> >
> > Cc: Ashok Raj <[email protected]>
> > Cc: Jacob Pan <[email protected]>
> > Cc: Kevin Tian <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > Reviewed-by: Eric Auger <[email protected]>
> > ---
> > drivers/iommu/intel-pasid.c | 56
> > +++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.h | 13 ++++++++++-
> > include/linux/intel-iommu.h | 2 ++
> > 3 files changed, 70 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index 040a445be300..d81e857d2b25
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> > return p;
> > }
> >
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) +{
> > + unsigned long flags;
> > + u8 status_code;
> > + int ret = 0;
> > + u64 res;
> > +
> > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > + dmar_writeq(iommu->reg + DMAR_VCMD_REG,
> > VCMD_CMD_ALLOC);
> > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
>
> should we handle VCMD_VRSP_IP here?
VCMD_VRSP_IP is checked above, if it times out, you will get panic. Not
sure whatelse to do?
>
> > + status_code = VCMD_VRSP_SC(res);
> > + switch (status_code) {
> > + case VCMD_VRSP_SC_SUCCESS:
> > + *pasid = VCMD_VRSP_RESULT(res);
> > + break;
> > + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> > + pr_info("IOMMU: %s: No PASID available\n", iommu-
> > >name);
> > + ret = -ENOMEM;
> > + break;
> > + default:
> > + ret = -ENODEV;
> > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > + iommu->name, status_code);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> > +{
> > + unsigned long flags;
> > + u8 status_code;
> > + u64 res;
> > +
> > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> > VCMD_CMD_FREE);
>
> define a macro for pasid offset.
>
will do.

> > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > + status_code = VCMD_VRSP_SC(res);
> > + switch (status_code) {
> > + case VCMD_VRSP_SC_SUCCESS:
> > + break;
> > + case VCMD_VRSP_SC_INVALID_PASID:
> > + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> > + break;
> > + default:
> > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > + iommu->name, status_code);
> > + }
> > +}
> > +
> > /*
> > * Per device pasid table management:
> > */
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index fc8cd8f17de1..e413e884e685
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -23,6 +23,16 @@
> > #define is_pasid_enabled(entry) (((entry)->lo >> 3)
> > & 0x1) #define get_pasid_dir_size(entry) (1 <<
> > ((((entry)->lo >> 9) & 0x7) + 7))
> >
> > +/* Virtual command interface for enlightened pasid management. */
> > +#define VCMD_CMD_ALLOC 0x1
> > +#define VCMD_CMD_FREE 0x2
> > +#define VCMD_VRSP_IP 0x1
> > +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
> > +#define VCMD_VRSP_SC_SUCCESS 0
> > +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
> > +#define VCMD_VRSP_SC_INVALID_PASID 1
> > +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
> > +
> > /*
> > * Domain ID reserved for pasid entries programmed for first-level
> > * only and pass-through transfer modes.
> > @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu,
> > struct device *dev, int pasid);
> > void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, int pasid);
> > -
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid); +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
> > int pasid); #endif /* __INTEL_PASID_H */
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 2e1bed9b7eef..1d4b8dcdc5d8
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -161,6 +161,7 @@
> > #define ecap_smpwc(e) (((e) >> 48) & 0x1)
> > #define ecap_flts(e) (((e) >> 47) & 0x1)
> > #define ecap_slts(e) (((e) >> 46) & 0x1)
> > +#define ecap_vcs(e) (((e) >> 44) & 0x1)
> > #define ecap_smts(e) (((e) >> 43) & 0x1)
> > #define ecap_dit(e) ((e >> 41) & 0x1)
> > #define ecap_pasid(e) ((e >> 40) & 0x1)
> > @@ -282,6 +283,7 @@
> >
> > /* PRS_REG */
> > #define DMA_PRS_PPR ((u32)1)
> > +#define DMA_VCS_PAS ((u64)1)
> >
> > #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)
> > \
> > do
> > {
> > \ -- 2.7.4
>

[Jacob Pan]

2019-10-29 19:58:41

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 09/11] iommu/vt-d: Add bind guest PASID support

> From: Jacob Pan [mailto:[email protected]]
> Sent: Wednesday, October 30, 2019 12:12 AM
>
> On Tue, 29 Oct 2019 07:57:21 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > From: Jacob Pan [mailto:[email protected]]
> > > Sent: Tuesday, October 29, 2019 12:03 AM
> > >
> > > On Mon, 28 Oct 2019 06:03:36 +0000
> > > "Tian, Kevin" <[email protected]> wrote:
> > >
> > > > > > > + .sva_bind_gpasid = intel_svm_bind_gpasid,
> > > > > > > + .sva_unbind_gpasid =
> > > > > > > intel_svm_unbind_gpasid, +#endif
> > > > > >
> > > > > > again, pure PASID management logic should be separated from
> > > > > > SVM.
> > > > > I am not following, these two functions are SVM functionality,
> > > > > not pure PASID management which is already separated in
> > > > > ioasid.c
> > > >
> > > > I should say pure "scalable mode" logic. Above callbacks are not
> > > > related to host SVM per se. They are serving gpasid requests from
> > > > guest side, thus part of generic scalable mode capability.
> > > Got your point, but we are sharing data structures with host SVM,
> > > it is very difficult and inefficient to separate the two.
> >
> > I don't think difficulty is the reason against such direction. We
> > need do things right. :-) I'm fine with putting it in a TODO list,
> > but at least need the right information in the 1st place to tell that
> > current way is just a short-term approach, and we should revisit
> > later.
> I guess the fundamental question is: Should the scalable mode logic,
> i.e. guest SVA at PASID granu device, be perceived as part of the
> overall SVA functionality?

guest SVA != guest scalable mode. I'm not sure whether the definition
of SVA has been changed. but iirc it simply means shared virtual
memory usage i.e. sharing CPU page table with device. But with
scalable mode, you can have PASID tagged 1st-level for whatever
usage, guest IOVA, guest SVA, guest nested GPA, etc.

>
> My view is yes, we shall share SVA and gSVA whenever we can.

sharing is based on scalable mode, not based on sva itself.

>
> The longer term, which I am working on right now, is to converge
> intel_svm_bind_mm to the generic iommu_sva_bind_device() and use
> common
> data structures as well. It is conceivable that these common structures
> span across hardware architectures, also guest vs host SVA usages.
>
> i.e. iommu_ops have
> iommu_sva_bind_gpasid() for SM/gSVA
> iommu_sva_bind_device() for native SVA
>
> Or I am missing your point completely?

since sva is already used in VFIO for broader purpose, it's fine to leave
the name there. But again, it's incorrect to tie iommu_sva_bind_gpasid
under CONFIG_INTEL_IOMMU_SVM. The former is for SM, while the
latter is only for SVA. As long as host IOMMU is in scalable mode,
bind_gpasid can be supported. If you want a config option, then it should
be a new one instead of IOMMU_SVM.

Thanks
Kevin

2019-10-29 20:01:25

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

> From: Jacob Pan [mailto:[email protected]]
> Sent: Wednesday, October 30, 2019 1:15 AM
> > >
> > > From: Lu Baolu <[email protected]>
> > >
> > > Enabling IOMMU in a guest requires communication with the host
> > > driver for certain aspects. Use of PASID ID to enable Shared Virtual
> > > Addressing (SVA) requires managing PASID's in the host. VT-d 3.0
> > > spec provides a Virtual Command Register (VCMD) to facilitate this.
> > > Writes to this register in the guest are trapped by QEMU which
> > > proxies the call to the host driver.
> > >
> > > This virtual command interface consists of a capability register,
> > > a virtual command register, and a virtual response register. Refer
> > > to section 10.4.42, 10.4.43, 10.4.44 for more information.
> > >
> > > This patch adds the enlightened PASID allocation/free interfaces
> > > via the virtual command interface.
> > >
> > > Cc: Ashok Raj <[email protected]>
> > > Cc: Jacob Pan <[email protected]>
> > > Cc: Kevin Tian <[email protected]>
> > > Signed-off-by: Liu Yi L <[email protected]>
> > > Signed-off-by: Lu Baolu <[email protected]>
> > > Signed-off-by: Jacob Pan <[email protected]>
> > > Reviewed-by: Eric Auger <[email protected]>
> > > ---
> > > drivers/iommu/intel-pasid.c | 56
> > > +++++++++++++++++++++++++++++++++++++++++++++
> > > drivers/iommu/intel-pasid.h | 13 ++++++++++-
> > > include/linux/intel-iommu.h | 2 ++
> > > 3 files changed, 70 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel-pasid.c
> > > b/drivers/iommu/intel-pasid.c index 040a445be300..d81e857d2b25
> > > 100644 --- a/drivers/iommu/intel-pasid.c
> > > +++ b/drivers/iommu/intel-pasid.c
> > > @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> > > return p;
> > > }
> > >
> > > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > > *pasid) +{
> > > + unsigned long flags;
> > > + u8 status_code;
> > > + int ret = 0;
> > > + u64 res;
> > > +
> > > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > > + dmar_writeq(iommu->reg + DMAR_VCMD_REG,
> > > VCMD_CMD_ALLOC);
> > > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > > + !(res & VCMD_VRSP_IP), res);
> > > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > > +
> >
> > should we handle VCMD_VRSP_IP here?
> VCMD_VRSP_IP is checked above, if it times out, you will get panic. Not
> sure whatelse to do?

Not need. I misunderstood the condition here.

> >
> > > + status_code = VCMD_VRSP_SC(res);
> > > + switch (status_code) {
> > > + case VCMD_VRSP_SC_SUCCESS:
> > > + *pasid = VCMD_VRSP_RESULT(res);
> > > + break;
> > > + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> > > + pr_info("IOMMU: %s: No PASID available\n", iommu-
> > > >name);
> > > + ret = -ENOMEM;
> > > + break;
> > > + default:
> > > + ret = -ENODEV;
> > > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > > + iommu->name, status_code);
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> > > +{
> > > + unsigned long flags;
> > > + u8 status_code;
> > > + u64 res;
> > > +
> > > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> > > VCMD_CMD_FREE);
> >
> > define a macro for pasid offset.
> >
> will do.
>
> > > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > > + !(res & VCMD_VRSP_IP), res);
> > > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > > +
> > > + status_code = VCMD_VRSP_SC(res);
> > > + switch (status_code) {
> > > + case VCMD_VRSP_SC_SUCCESS:
> > > + break;
> > > + case VCMD_VRSP_SC_INVALID_PASID:
> > > + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> > > + break;
> > > + default:
> > > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > > + iommu->name, status_code);
> > > + }
> > > +}
> > > +
> > > /*
> > > * Per device pasid table management:
> > > */
> > > diff --git a/drivers/iommu/intel-pasid.h
> > > b/drivers/iommu/intel-pasid.h index fc8cd8f17de1..e413e884e685
> > > 100644 --- a/drivers/iommu/intel-pasid.h
> > > +++ b/drivers/iommu/intel-pasid.h
> > > @@ -23,6 +23,16 @@
> > > #define is_pasid_enabled(entry) (((entry)->lo >> 3)
> > > & 0x1) #define get_pasid_dir_size(entry) (1 <<
> > > ((((entry)->lo >> 9) & 0x7) + 7))
> > >
> > > +/* Virtual command interface for enlightened pasid management. */
> > > +#define VCMD_CMD_ALLOC 0x1
> > > +#define VCMD_CMD_FREE 0x2
> > > +#define VCMD_VRSP_IP 0x1
> > > +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
> > > +#define VCMD_VRSP_SC_SUCCESS 0
> > > +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
> > > +#define VCMD_VRSP_SC_INVALID_PASID 1
> > > +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
> > > +
> > > /*
> > > * Domain ID reserved for pasid entries programmed for first-level
> > > * only and pass-through transfer modes.
> > > @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct
> > > intel_iommu *iommu,
> > > struct device *dev, int pasid);
> > > void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > > struct device *dev, int pasid);
> > > -
> > > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > > *pasid); +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
> > > int pasid); #endif /* __INTEL_PASID_H */
> > > diff --git a/include/linux/intel-iommu.h
> > > b/include/linux/intel-iommu.h index 2e1bed9b7eef..1d4b8dcdc5d8
> > > 100644 --- a/include/linux/intel-iommu.h
> > > +++ b/include/linux/intel-iommu.h
> > > @@ -161,6 +161,7 @@
> > > #define ecap_smpwc(e) (((e) >> 48) & 0x1)
> > > #define ecap_flts(e) (((e) >> 47) & 0x1)
> > > #define ecap_slts(e) (((e) >> 46) & 0x1)
> > > +#define ecap_vcs(e) (((e) >> 44) & 0x1)
> > > #define ecap_smts(e) (((e) >> 43) & 0x1)
> > > #define ecap_dit(e) ((e >> 41) & 0x1)
> > > #define ecap_pasid(e) ((e >> 40) & 0x1)
> > > @@ -282,6 +283,7 @@
> > >
> > > /* PRS_REG */
> > > #define DMA_PRS_PPR ((u32)1)
> > > +#define DMA_VCS_PAS ((u64)1)
> > >
> > > #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts)
> > > \
> > > do
> > > {
> > > \ -- 2.7.4
> >
>
> [Jacob Pan]

2019-10-29 20:02:51

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

> From: Jacob Pan [mailto:[email protected]]
> Sent: Tuesday, October 29, 2019 12:11 AM
>
> On Mon, 28 Oct 2019 06:06:33 +0000
> "Tian, Kevin" <[email protected]> wrote:
>
> > > >>> +    /* PASID based dev TLBs, only support all PASIDs or single
> > > >>> PASID */
> > > >>> +    {1, 1, 0},
> > > >>
> > > >> I forgot previous discussion. is it necessary to pass down dev
> > > >> TLB invalidation
> > > >> requests? Can it be handled by host iOMMU driver automatically?
> > > >
> > > > On host SVA, when a memory is unmapped, driver callback will
> > > > invalidate dev IOTLB explicitly. So I guess we need to pass down
> > > > it for guest case. This is also required for guest iova over 1st
> > > > level usage as far as can see.
> > > >
> > >
> > > Sorry, I confused guest vIOVA and guest vSVA. For guest vIOVA, no
> > > device TLB invalidation pass down. But currently for guest vSVA,
> > > device TLB invalidation is passed down. Perhaps we can avoid
> > > passing down dev TLB flush just like what we are doing for guest
> > > IOVA.
> >
> > I think dev TLB is fully handled within IOMMU driver today. It doesn't
> > require device driver to explicit toggle. With this then we can fully
> > virtualize guest dev TLB invalidation request to save one syscall,
> > since the host is supposed to flush dev TLB when serving the earlier
> > IOTLB invalidation pass-down.
>
> In the previous discussions, we thought about making IOTLB flush
> inclusive, where IOTLB flush would always include device TLB flush. But
> we thought such behavior cannot be assumed for all VMMs, some may still
> do explicit dev TLB flush. So for completeness, we included dev TLB
> here.

is there such example or a link to previous discussion? Here we are
talking about host IOMMU driver behavior, instead of VMM. But I'm
not strong on this, since it's more an optimization. But there remains
one unclear area. If we do want to support such usage with explicit
dev TLB flush, how does host IOMMU driver avoid doing implicit
dev TLB flush when serving iotlb invalidation request? Is it already
designed such way that user-passed-down iotlb invalidation request
only invalidates iotlb while kernel-triggered iotlb invalidation still
does implicit dev TLB flush?

Thanks
Kevin

2019-10-29 20:06:40

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

On Tue, 29 Oct 2019 18:52:01 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Tuesday, October 29, 2019 12:11 AM
> >
> > On Mon, 28 Oct 2019 06:06:33 +0000
> > "Tian, Kevin" <[email protected]> wrote:
> >
> > > > >>> +    /* PASID based dev TLBs, only support all PASIDs or
> > > > >>> single PASID */
> > > > >>> +    {1, 1, 0},
> > > > >>
> > > > >> I forgot previous discussion. is it necessary to pass down
> > > > >> dev TLB invalidation
> > > > >> requests? Can it be handled by host iOMMU driver
> > > > >> automatically?
> > > > >
> > > > > On host SVA, when a memory is unmapped, driver callback will
> > > > > invalidate dev IOTLB explicitly. So I guess we need to pass
> > > > > down it for guest case. This is also required for guest iova
> > > > > over 1st level usage as far as can see.
> > > > >
> > > >
> > > > Sorry, I confused guest vIOVA and guest vSVA. For guest vIOVA,
> > > > no device TLB invalidation pass down. But currently for guest
> > > > vSVA, device TLB invalidation is passed down. Perhaps we can
> > > > avoid passing down dev TLB flush just like what we are doing
> > > > for guest IOVA.
> > >
> > > I think dev TLB is fully handled within IOMMU driver today. It
> > > doesn't require device driver to explicit toggle. With this then
> > > we can fully virtualize guest dev TLB invalidation request to
> > > save one syscall, since the host is supposed to flush dev TLB
> > > when serving the earlier IOTLB invalidation pass-down.
> >
> > In the previous discussions, we thought about making IOTLB flush
> > inclusive, where IOTLB flush would always include device TLB flush.
> > But we thought such behavior cannot be assumed for all VMMs, some
> > may still do explicit dev TLB flush. So for completeness, we
> > included dev TLB here.
>
> is there such example or a link to previous discussion? Here we are
> talking about host IOMMU driver behavior, instead of VMM. But I'm
> not strong on this, since it's more an optimization. But there remains
> one unclear area. If we do want to support such usage with explicit
> dev TLB flush, how does host IOMMU driver avoid doing implicit
> dev TLB flush when serving iotlb invalidation request? Is it already
> designed such way that user-passed-down iotlb invalidation request
> only invalidates iotlb while kernel-triggered iotlb invalidation still
> does implicit dev TLB flush?
>
The current design with vIOMMU in QEMU will prevent explicit dev TLB
flush. Host will always do inclusive IOTLB and dev TLB flush on IOTLB
flush request.

For other VMM which does not do this optimization, we just leave a
path for explicit dev TLB flush. Redundant but for IOMMU driver
perspective it is complete. We don't avoid the redundancy as there is
no damage outside the guest, just as we don't prevent guest doing the
same flush twice.

2019-11-01 18:29:28

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

On Fri, 25 Oct 2019 13:47:25 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/25/19 3:54 AM, Jacob Pan wrote:
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 12 ++++++------
> > drivers/iommu/intel-pasid.c | 36
> > ------------------------------------ drivers/iommu/intel-svm.c |
> > 39 +++++++++++++++++++++++---------------- 3 files changed, 29
> > insertions(+), 58 deletions(-)
>
> [--cut--]
>
> > @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device
> > *dev, int pasid) if (!iommu)
> > goto out;
> >
> > - svm = intel_pasid_lookup_id(pasid);
> > - if (!svm)
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR(svm)) {
>
> Shall we check whether svm is NULL?
>
Missed this earlier, you are right we need to check for NULL.
Thanks,

> Others looks good to me.
>
> Reviewed-by: Lu Baolu <[email protected]>
>
> Best regards,
> baolu

[Jacob Pan]

2019-11-01 21:07:26

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

On Fri, 25 Oct 2019 07:04:28 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> > With PASID granular translation type set to 0x11b, translation
> > result from the first level(FL) also subject to a second level(SL)
> > page table translation. This mode is used for SVA virtualization,
> > where FL performs guest virtual to guest physical translation and
> > SL performs guest physical to host physical translation.
>
> I think we really differentiate what is the common logic for
> first-level usages (GVA, GIOVA, etc.) in scalable mode, and
> what is specific to SVA. I have the feeling that SVA is over-used
> to cause confusing interpretation.
>
Good point, it should be clearly stated that nest mode is not for gSVA
only, gIOVA shares this common code.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-pasid.c | 207
> > ++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.h | 12 +++
> > 2 files changed, 219 insertions(+)
> >
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index ffbd416ed3b8..f846a907cfcf
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64
> > value) pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> > }
> >
> > +/*
> > + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emt(struct pasid_entry *pe, u64 value)
> > +{
> > + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value <<
> > 27); +}
> > +
> > +/*
> > + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pat(struct pasid_entry *pe, u64 value)
> > +{
> > + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value <<
> > 27); +}
> > +
> > +/*
> > + * Setup the Cache Disable (CD) field (Bit 89)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_cd(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 25, 1);
> > +}
> > +
> > +/*
> > + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_emte(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> > +}
> > +
> > +/*
> > + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_eafe(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[2], 1 << 7, 1);
> > +}
> > +
> > +/*
> > + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pcd(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> > +}
> > +
> > +/*
> > + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> > + * of a scalable mode PASID entry.
> > + */
> > +static inline void
> > +pasid_set_pwt(struct pasid_entry *pe)
> > +{
> > + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> > +}
> > +
> > static void
> > pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> > u16 did, int pasid)
> > @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu,
> >
> > return 0;
> > }
> > +
> > +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> > + struct pasid_entry *pte,
> > + struct iommu_gpasid_bind_data_vtd
> > *pasid_data)
> > +{
> > + /*
> > + * Not all guest PASID table entry fields are passed down
> > during bind,
> > + * here we only set up the ones that are dependent on guest
> > settings.
> > + * Execution related bits such as NXE, SMEP are not
> > meaningful to IOMMU,
> > + * therefore not set. Other fields, such as snoop related,
> > are set based
> > + * on host needs regardless of guest settings.
> > + */
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> > + if (!ecap_srs(iommu->ecap)) {
> > + pr_err("No supervisor request support on
> > %s\n",
> > + iommu->name);
> > + return -EINVAL;
> > + }
> > + pasid_set_sre(pte);
> > + }
> > +
> > + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) &&
> > ecap_eafs(iommu->ecap))
> > + pasid_set_eafe(pte);
> > +
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> > + pasid_set_emte(pte);
> > + pasid_set_emt(pte, pasid_data->emt);
> > + }
>
> above conditional checks are not consistent. The 1st check may
> return error but latter two don't. Can you confirm whether it's
> desired way?
>
should be consistent and under the check of host MTS capability. Will
change.
> > +
> > + /*
> > + * Memory type is only applicable to devices inside
> > processor coherent
> > + * domain. PCIe devices are not included. We can skip the
> > rest of the
> > + * flags if IOMMU does not support MTS.
> > + */
> > + if (!ecap_mts(iommu->ecap)) {
> > + pr_info("%s does not support memory type bind guest
> > PASID\n",
> > + iommu->name);
> > + return 0;
>
> why not -EINVAL?
>
right, if host does not support MTS and guest wants to set MTS related
bits, -EINVAL should be returned.

> > + }
> > +
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> > + pasid_set_pcd(pte);
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> > + pasid_set_pwt(pte);
> > + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> > + pasid_set_cd(pte);
> > + pasid_set_pat(pte, pasid_data->pat);
> > +
> > + return 0;
> > +
> > +}
> > +
> > +/**
> > + * intel_pasid_setup_nested() - Set up PASID entry for nested
> > translation
> > + * which is used for vSVA. The first level page tables are used for
> > + * GVA-GPA translation in the guest, second level page tables are
> > used
> > + * for GPA to HPA translation.
>
> It's too restricting on how 1st level is used by guest.
>
will fix. stating FL can be used for gIOVA in nested mode.
> > + *
> > + * @iommu: Iommu which the device belong to
> > + * @dev: Device to be set up for translation
> > + * @gpgd: FLPTPTR: First Level Page translation pointer in
> > GPA
> > + * @pasid: PASID to be programmed in the device PASID table
> > + * @pasid_data: Additional PASID info from the guest bind request
> > + * @domain: Domain info for setting up second level page tables
> > + * @addr_width: Address width of the first level (guest)
> > + */
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > + struct device *dev, pgd_t *gpgd,
> > + int pasid, struct
> > iommu_gpasid_bind_data_vtd *pasid_data,
> > + struct dmar_domain *domain,
> > + int addr_width)
> > +{
> > + struct pasid_entry *pte;
> > + struct dma_pte *pgd;
> > + u64 pgd_val;
> > + int agaw;
> > + u16 did;
> > +
> > + if (!ecap_nest(iommu->ecap)) {
> > + pr_err("IOMMU: %s: No nested translation
> > support\n",
> > + iommu->name);
> > + return -EINVAL;
> > + }
> > +
> > + pte = intel_pasid_get_entry(dev, pasid);
> > + if (WARN_ON(!pte))
> > + return -EINVAL;
> > +
> > + pasid_clear_entry(pte);
> > +
> > + /* Sanity checking performed by caller to make sure address
> > + * width matching in two dimensions:
> > + * 1. CPU vs. IOMMU
> > + * 2. Guest vs. Host.
> > + */
> > + switch (addr_width) {
> > + case 57:
>
> AW_5LEVEL
>
> > + pasid_set_flpm(pte, 1);
> > + break;
> > + case 48:
>
> AW_4LEVEL
>
will add macros

> > + pasid_set_flpm(pte, 0);
> > + break;
> > + default:
> > + dev_err(dev, "Invalid paging mode %d\n",
> > addr_width);
> > + return -EINVAL;
> > + }
> > +
> > + pasid_set_flptr(pte, (u64)gpgd);
> > +
> > + intel_pasid_setup_bind_data(iommu, pte, pasid_data);
> > +
> > + /* Setup the second level based on the given domain */
> > + pgd = domain->pgd;
> > +
> > + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> > + pgd = phys_to_virt(dma_pte_addr(pgd));
> > + if (!dma_pte_present(pgd)) {
> > + dev_err(dev, "Invalid domain page
> > table\n");
>
> pasid_clear_entry?
>
right, even though present bit is not set still a good practice to
clear.

> > + return -EINVAL;
> > + }
> > + }
> > + pgd_val = virt_to_phys(pgd);
> > + pasid_set_slptr(pte, pgd_val);
> > + pasid_set_fault_enable(pte);
> > +
> > + did = domain->iommu_did[iommu->seq_id];
> > + pasid_set_domain_id(pte, did);
> > +
> > + pasid_set_address_width(pte, agaw);
> > + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> > +
> > + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> > + pasid_set_present(pte);
> > + pasid_flush_caches(iommu, pte, pasid, did);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index e413e884e685..09c85db73b77
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -46,6 +46,7 @@
> > * to vmalloc or even module mappings.
> > */
> > #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
> > +#define PASID_FLAG_NESTED BIT(1)
> >
> > struct pasid_dir_entry {
> > u64 val;
> > @@ -55,6 +56,11 @@ struct pasid_entry {
> > u64 val[8];
> > };
> >
> > +#define PASID_ENTRY_PGTT_FL_ONLY (1)
> > +#define PASID_ENTRY_PGTT_SL_ONLY (2)
> > +#define PASID_ENTRY_PGTT_NESTED (3)
> > +#define PASID_ENTRY_PGTT_PT (4)
> > +
> > /* The representative of a PASID table */
> > struct pasid_table {
> > void *table; /*
> > pasid table pointer */ @@ -103,6 +109,12 @@ int
> > intel_pasid_setup_second_level(struct intel_iommu *iommu,
> > int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> > struct dmar_domain *domain,
> > struct device *dev, int pasid);
> > +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> > + struct device *dev, pgd_t *pgd,
> > + int pasid,
> > + struct iommu_gpasid_bind_data_vtd
> > *pasid_data,
> > + struct dmar_domain *domain,
> > + int addr_width);
> > void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, int pasid);
> > int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid); --
> > 2.7.4
>

[Jacob Pan]

2019-11-01 21:27:20

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

On Sat, 26 Oct 2019 10:22:43 +0800
Lu Baolu <[email protected]> wrote:

> Hi,
>
> On 10/25/19 3:55 AM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/dmar.c | 46
> > +++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.c | 3 ++- include/linux/intel-iommu.h |
> > 21 +++++++++++++++++---- 3 files changed, 65 insertions(+), 5
> > deletions(-)
> >
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 49bb7d76e646..0ce2d32ff99e 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > + unsigned int size_order, u64 granu, int ih)
> > +{
> > + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > + QI_EIOTLB_AM(size_order);
> > +
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> > {
> > @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu) +{
> > + struct qi_desc desc;
>
> Do you need to set qw2 and qw3 to 0?
>
right, forgot to add
struct qi_desc desc = {.qw2 = 0, .qw3 = 0};

> > +
> > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > + QI_DEV_IOTLB_PFSID(pfsid);
> > + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > + /* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > + * The least significant zero bit indicates the
> > invalidation address
> > + * range. VT-d spec 6.5.2.6.
> > + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates
> > 16KB.
> > + */
> > + if (!size_order) {
> > + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;
> > + } else {
> > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size_order);
> > + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;
> > + }
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid) +{
> > + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
> > QI_PC_GRAN(granu) | QI_PC_TYPE;
> > + qi_submit_sync(&desc, iommu);
> > +}
> > /*
> > * Disable Queued Invalidation interface.
> > */
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index f846a907cfcf..6d7a701ef4d3
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct
> > intel_iommu *iommu, {
> > struct qi_desc desc;
> >
> > - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> > QI_PC_PASID(pasid);
> > + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> > + QI_PC_PASID(pasid) | QI_PC_TYPE;
> > desc.qw1 = 0;
> > desc.qw2 = 0;
> > desc.qw3 = 0;
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 6c74c71b1ebf..a25fb3a0ea5b
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -332,7 +332,7 @@ enum {
> > #define QI_IOTLB_GRAN(gran) (((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)
> > (((u64)addr) & VTD_PAGE_MASK) #define
> > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define
> > QI_IOTLB_AM(am) (((u8)am)) +#define
> > QI_IOTLB_AM(am) (((u8)am) & 0x3f)
> > #define QI_CC_FM(fm) (((u64)fm) << 48)
> > #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -350,16 +350,21 @@ enum {
> > #define QI_PC_DID(did) (((u64)did) << 16)
> > #define QI_PC_GRAN(gran) (((u64)gran) << 4)
> >
> > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS 0
> > +#define QI_PC_PASID_SEL 1
> >
> > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> > -#define QI_EIOTLB_AM(am) (((u64)am))
> > +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> > #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> > #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> > #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
> >
> > +/* QI Dev-IOTLB inv granu */
> > +#define QI_DEV_IOTLB_GRAN_ALL 1
> > +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> > +
> > #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> > #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> > #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> > @@ -655,8 +660,16 @@ extern void qi_flush_context(struct
> > intel_iommu *iommu, u16 did, u16 sid, u8 fm, u64 type);
> > extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did,
> > u64 addr, unsigned int size_order, u64 type);
> > +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did,
> > u64 addr,
> > + u32 pasid, unsigned int size_order, u64
> > type, int ih); extern void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, u16 qdep, u64 addr, unsigned mask);
> > +
> > +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16
> > sid, u16 pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu); +
> > +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16
> > did, u64 granu, int pasid); +
> > extern int qi_submit_sync(struct qi_desc *desc, struct
> > intel_iommu *iommu);
> > extern int dmar_ir_support(void);
> >
>
> Best regards,
> baolu

[Jacob Pan]

2019-11-01 21:28:44

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

On Fri, 25 Oct 2019 07:21:29 +0000
"Tian, Kevin" <[email protected]> wrote:

> > From: Jacob Pan [mailto:[email protected]]
> > Sent: Friday, October 25, 2019 3:55 AM
> >
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems.
>
> from outside of host IOMMU subsystem
>
> > This patch adds invalidation functions that can be used for
> > additional translation cache types.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/dmar.c | 46
> > +++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.c | 3 ++-
> > include/linux/intel-iommu.h | 21 +++++++++++++++++----
> > 3 files changed, 65 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 49bb7d76e646..0ce2d32ff99e 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr,
> > qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
>
> qi_flush_iotlb_pasid.
will rename to make it more readable.
>
> > + unsigned int size_order, u64 granu, int ih)
> > +{
> > + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > + QI_EIOTLB_AM(size_order);
> > +
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> > {
> > @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid,
> > qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu)
> > +{
> > + struct qi_desc desc;
> > +
> > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > + QI_DEV_IOTLB_PFSID(pfsid);
> > + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > + /* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > + * The least significant zero bit indicates the
> > invalidation address
> > + * range. VT-d spec 6.5.2.6.
> > + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates
> > 16KB.
> > + */
> > + if (!size_order) {
> > + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;
> > + } else {
> > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size_order);
> > + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;
> > + }
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid)
> > +{
> > + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
> > QI_PC_GRAN(granu) | QI_PC_TYPE;
> > + qi_submit_sync(&desc, iommu);
> > +}
> > /*
> > * Disable Queued Invalidation interface.
> > */
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index f846a907cfcf..6d7a701ef4d3
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct
> > intel_iommu *iommu,
> > {
> > struct qi_desc desc;
> >
> > - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> > QI_PC_PASID(pasid);
> > + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> > + QI_PC_PASID(pasid) | QI_PC_TYPE;
> > desc.qw1 = 0;
> > desc.qw2 = 0;
> > desc.qw3 = 0;
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 6c74c71b1ebf..a25fb3a0ea5b
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -332,7 +332,7 @@ enum {
> > #define QI_IOTLB_GRAN(gran) (((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4))
> > #define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK)
> > #define QI_IOTLB_IH(ih) (((u64)ih) << 6)
> > -#define QI_IOTLB_AM(am) (((u8)am))
> > +#define QI_IOTLB_AM(am) (((u8)am) & 0x3f)
> >
> > #define QI_CC_FM(fm) (((u64)fm) << 48)
> > #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -350,16 +350,21 @@ enum {
> > #define QI_PC_DID(did) (((u64)did) << 16)
> > #define QI_PC_GRAN(gran) (((u64)gran) << 4)
> >
> > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS 0
> > +#define QI_PC_PASID_SEL 1
> >
> > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> > -#define QI_EIOTLB_AM(am) (((u64)am))
> > +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> > #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> > #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> > #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
> >
> > +/* QI Dev-IOTLB inv granu */
> > +#define QI_DEV_IOTLB_GRAN_ALL 1
> > +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> > +
> > #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> > #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> > #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> > @@ -655,8 +660,16 @@ extern void qi_flush_context(struct intel_iommu
> > *iommu, u16 did, u16 sid,
> > u8 fm, u64 type);
> > extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64
> > addr, unsigned int size_order, u64 type);
> > +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did,
> > u64 addr,
> > + u32 pasid, unsigned int size_order, u64
> > type, int ih); extern void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid,
> > u16 qdep, u64 addr, unsigned mask);
> > +
> > +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16
> > sid, u16 pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu);
> > +
> > +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16
> > did, u64 granu, int pasid);
> > +
> > extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu
> > *iommu);
> >
> > extern int dmar_ir_support(void);
> > --
> > 2.7.4
>

[Jacob Pan]

2019-11-08 10:34:49

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 01/11] iommu/vt-d: Cache virtual command capability register

Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> Virtual command registers are used in the guest only, to prevent
> vmexit cost, we cache the capability and store it during initialization.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dmar.c | 1 +
> include/linux/intel-iommu.h | 4 ++++
> 2 files changed, 5 insertions(+)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..49bb7d76e646 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -950,6 +950,7 @@ static int map_iommu(struct intel_iommu *iommu, u64 phys_addr)
> warn_invalid_dmar(phys_addr, " returns all ones");
> goto unmap;
> }
> + iommu->vccap = dmar_readq(iommu->reg + DMAR_VCCAP_REG);
>
> /* the registers might be more than one page */
> map_size = max_t(int, ecap_max_iotlb_offset(iommu->ecap),
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index ed11ef594378..2e1bed9b7eef 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -186,6 +186,9 @@
> #define ecap_max_handle_mask(e) ((e >> 20) & 0xf)
> #define ecap_sc_support(e) ((e >> 7) & 0x1) /* Snooping Control */
>
> +/* Virtual command interface capabilities */
> +#define vccap_pasid(v) ((v & DMA_VCS_PAS)) /* PASID allocation */
> +
> /* IOTLB_REG */
> #define DMA_TLB_FLUSH_GRANU_OFFSET 60
> #define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60)
> @@ -520,6 +523,7 @@ struct intel_iommu {
> u64 reg_size; /* size of hw register set */
> u64 cap;
> u64 ecap;
> + u64 vccap;
> u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */
> raw_spinlock_t register_lock; /* protect register handling */
> int seq_id; /* sequence id of the iommu */
>

with DMA_VCS_PAS's move in this patch as pointed out by Kevin or
vccap_pasid() move to patch 3, feel free to add

Reviewed-by: Eric Auger <[email protected]>

Eric

2019-11-08 10:37:07

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

Hi Jacob,
On 10/24/19 9:54 PM, Jacob Pan wrote:
> From: Lu Baolu <[email protected]>
>
> Enabling IOMMU in a guest requires communication with the host
> driver for certain aspects. Use of PASID ID to enable Shared Virtual
> Addressing (SVA) requires managing PASID's in the host. VT-d 3.0 spec
> provides a Virtual Command Register (VCMD) to facilitate this.
> Writes to this register in the guest are trapped by QEMU which
> proxies the call to the host driver.
>
> This virtual command interface consists of a capability register,
> a virtual command register, and a virtual response register. Refer
> to section 10.4.42, 10.4.43, 10.4.44 for more information.
>
> This patch adds the enlightened PASID allocation/free interfaces
> via the virtual command interface.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Signed-off-by: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> Reviewed-by: Eric Auger <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 56 +++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 13 ++++++++++-
> include/linux/intel-iommu.h | 2 ++
> 3 files changed, 70 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index 040a445be300..d81e857d2b25 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> return p;
> }
>
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + int ret = 0;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> + !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + *pasid = VCMD_VRSP_RESULT(res);
> + break;
> + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> + pr_info("IOMMU: %s: No PASID available\n", iommu->name);
> + ret = -ENOMEM;
> + break;
> + default:
> + ret = -ENODEV;
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +
> + return ret;
> +}
> +
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> +{
> + unsigned long flags;
> + u8 status_code;
> + u64 res;
> +
> + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) | VCMD_CMD_FREE);
> + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> + !(res & VCMD_VRSP_IP), res);
> + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> +
> + status_code = VCMD_VRSP_SC(res);
> + switch (status_code) {
> + case VCMD_VRSP_SC_SUCCESS:
> + break;
> + case VCMD_VRSP_SC_INVALID_PASID:
> + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> + break;
> + default:
> + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> + iommu->name, status_code);
> + }
> +}
> +
> /*
> * Per device pasid table management:
> */
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index fc8cd8f17de1..e413e884e685 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -23,6 +23,16 @@
> #define is_pasid_enabled(entry) (((entry)->lo >> 3) & 0x1)
> #define get_pasid_dir_size(entry) (1 << ((((entry)->lo >> 9) & 0x7) + 7))
>
> +/* Virtual command interface for enlightened pasid management. */
> +#define VCMD_CMD_ALLOC 0x1
> +#define VCMD_CMD_FREE 0x2
> +#define VCMD_VRSP_IP 0x1
> +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
> +#define VCMD_VRSP_SC_SUCCESS 0
> +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
> +#define VCMD_VRSP_SC_INVALID_PASID 1
> +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
nit: pasid is 20b but result field is 56b large
Just in case a new command were to be added later on.
> +
> /*
> * Domain ID reserved for pasid entries programmed for first-level
> * only and pass-through transfer modes.
> @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> -
> +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
> +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid);
> #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2e1bed9b7eef..1d4b8dcdc5d8 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -161,6 +161,7 @@
> #define ecap_smpwc(e) (((e) >> 48) & 0x1)
> #define ecap_flts(e) (((e) >> 47) & 0x1)
> #define ecap_slts(e) (((e) >> 46) & 0x1)
> +#define ecap_vcs(e) (((e) >> 44) & 0x1)
nit: this addition is not related to this patch
may be moved to [3] as vccap_pasid
> #define ecap_smts(e) (((e) >> 43) & 0x1)
> #define ecap_dit(e) ((e >> 41) & 0x1)
> #define ecap_pasid(e) ((e >> 40) & 0x1)
> @@ -282,6 +283,7 @@
>
> /* PRS_REG */
> #define DMA_PRS_PPR ((u32)1)
> +#define DMA_VCS_PAS ((u64)1)
>
> #define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \
> do { \
>
Otherwise looks good to me

Thanks

Eric

2019-11-08 10:41:33

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> When VT-d driver runs in the guest, PASID allocation must be
> performed via virtual command interface. This patch registers a
> custom IOASID allocator which takes precedence over the default
> XArray based allocator. The resulting IOASID allocation will always
> come from the host. This ensures that PASID namespace is system-
> wide.
>
> Signed-off-by: Lu Baolu <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/Kconfig | 1 +
> drivers/iommu/intel-iommu.c | 67 +++++++++++++++++++++++++++++++++++++++++++++
> include/linux/intel-iommu.h | 2 ++
> 3 files changed, 70 insertions(+)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index fd50ddffffbf..961fe5795a90 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> bool "Support for Shared Virtual Memory with Intel IOMMU"
> depends on INTEL_IOMMU && X86
> select PCI_PASID
> + select IOASID
> select MMU_NOTIFIER
> help
> Shared Virtual Memory (SVM) provides a facility for devices
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 3f974919d3bd..ced1d89ef977 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
> if (ecap_prs(iommu->ecap))
> intel_svm_finish_prq(iommu);
> }
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> + ioasid_unregister_allocator(&iommu->pasid_allocator);
> +
> #endif
> }
>
> @@ -4910,6 +4913,44 @@ static int __init probe_acpi_namespace_devices(void)
> return 0;
> }
>
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max, void *data)
> +{
> + struct intel_iommu *iommu = data;
> + ioasid_t ioasid;
> +
> + /*
> + * VT-d virtual command interface always uses the full 20 bit
> + * PASID range. Host can partition guest PASID range based on
> + * policies but it is out of guest's control.
> + */
> + if (min < PASID_MIN || max > intel_pasid_max_id)> + return INVALID_IOASID;

> +
> + if (vcmd_alloc_pasid(iommu, &ioasid))
> + return INVALID_IOASID;
> +
> + return ioasid;
> +}
> +
> +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> +{
> + struct intel_iommu *iommu = data;
> +
> + if (!iommu)
> + return;
> + /*
> + * Sanity check the ioasid owner is done at upper layer, e.g. VFIO
> + * We can only free the PASID when all the devices are unbond.
> + */
> + if (ioasid_find(NULL, ioasid, NULL)) {
> + pr_alert("Cannot free active IOASID %d\n", ioasid);
> + return;
> + }
> + vcmd_free_pasid(iommu, ioasid);
> +}
> +#endif
> +
> int __init intel_iommu_init(void)
> {
> int ret = -ENODEV;
> @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> "%s", iommu->name);
> iommu_device_set_ops(&iommu->iommu, &intel_iommu_ops);
> iommu_device_register(&iommu->iommu);
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap)) {
> + pr_info("Register custom PASID allocator\n");
> + /*
> + * Register a custom ASID allocator if we are running
> + * in a guest, the purpose is to have a system wide PASID
> + * namespace among all PASID users.
> + * There can be multiple vIOMMUs in each guest but only
> + * one allocator is active. All vIOMMU allocators will
> + * eventually be calling the same host allocator.
> + */
> + iommu->pasid_allocator.alloc = intel_ioasid_alloc;
> + iommu->pasid_allocator.free = intel_ioasid_free;
> + iommu->pasid_allocator.pdata = (void *)iommu;
> + ret = ioasid_register_allocator(&iommu->pasid_allocator);
> + if (ret) {
> + pr_warn("Custom PASID allocator registeration failed\n");
nit: registration
> + /*
> + * Disable scalable mode on this IOMMU if there
> + * is no custom allocator. Mixing SM capable vIOMMU
> + * and non-SM vIOMMU are not supported:
nit; is not supported. But I guess you will reshape it according to
previous comments.
> + */
> + intel_iommu_sm = 0;
> + }
> + }
> +#endif
> }
>
> bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 1d4b8dcdc5d8..c624733cb2e6 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -19,6 +19,7 @@
> #include <linux/iommu.h>
> #include <linux/io-64-nonatomic-lo-hi.h>
> #include <linux/dmar.h>
> +#include <linux/ioasid.h>
>
> #include <asm/cacheflush.h>
> #include <asm/iommu.h>
> @@ -546,6 +547,7 @@ struct intel_iommu {
> #ifdef CONFIG_INTEL_IOMMU_SVM
> struct page_req_dsc *prq;
> unsigned char prq_name[16]; /* Name for PRQ interrupt */
> + struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */
> #endif
> struct q_inval *qi; /* Queued invalidation info */
> u32 *iommu_state; /* Store iommu states between suspend and resume.*/
>

Thanks

Eric

2019-11-08 11:31:51

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> Make use of generic IOASID code to manage PASID allocation,
> free, and lookup. Replace Intel specific code.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 12 ++++++------
> drivers/iommu/intel-pasid.c | 36 ------------------------------------
> drivers/iommu/intel-svm.c | 39 +++++++++++++++++++++++----------------
> 3 files changed, 29 insertions(+), 58 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index ced1d89ef977..2ea09b988a23 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct dmar_domain *domain,
> domain->auxd_refcnt--;
>
> if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
> }
>
> static int aux_domain_add_dev(struct dmar_domain *domain,
> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
> if (domain->default_pasid <= 0) {
> int pasid;
>
> - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> - pci_max_pasids(to_pci_dev(dev)),
> - GFP_KERNEL);
> - if (pasid <= 0) {
> + /* No private data needed for the default pasid */
> + pasid = ioasid_alloc(NULL, PASID_MIN, pci_max_pasids(to_pci_dev(dev)) - 1,
> + NULL);
> + if (pasid == INVALID_IOASID) {
> pr_err("Can't allocate default pasid\n");
> return -ENODEV;
> }
> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct dmar_domain *domain,
> spin_unlock(&iommu->lock);
> spin_unlock_irqrestore(&device_domain_lock, flags);
> if (!domain->auxd_refcnt && domain->default_pasid > 0)
> - intel_pasid_free_id(domain->default_pasid);
> + ioasid_free(domain->default_pasid);
>
> return ret;
> }
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d81e857d2b25..e79d680fe300 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -26,42 +26,6 @@
> */
> static DEFINE_SPINLOCK(pasid_lock);
> u32 intel_pasid_max_id = PASID_MAX;
> -static DEFINE_IDR(pasid_idr);
> -
> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> -{
> - int ret, min, max;
> -
> - min = max_t(int, start, PASID_MIN);
> - max = min_t(int, end, intel_pasid_max_id);
> -
> - WARN_ON(in_interrupt());
> - idr_preload(gfp);
> - spin_lock(&pasid_lock);
> - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> - spin_unlock(&pasid_lock);
> - idr_preload_end();
> -
> - return ret;
> -}
> -
> -void intel_pasid_free_id(int pasid)
> -{
> - spin_lock(&pasid_lock);
> - idr_remove(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -}
> -
> -void *intel_pasid_lookup_id(int pasid)
> -{
> - void *p;
> -
> - spin_lock(&pasid_lock);
> - p = idr_find(&pasid_idr, pasid);
> - spin_unlock(&pasid_lock);
> -
> - return p;
> -}
>
> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid)
> {
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 9b159132405d..a9a7f85a09bc 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -17,6 +17,7 @@
> #include <linux/dmar.h>
> #include <linux/interrupt.h>
> #include <linux/mm_types.h>
> +#include <linux/ioasid.h>
> #include <asm/page.h>
>
> #include "intel-pasid.h"
> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> if (pasid_max > intel_pasid_max_id)
> pasid_max = intel_pasid_max_id;
>
> - /* Do not use PASID 0 in caching mode (virtualised IOMMU) */
> - ret = intel_pasid_alloc_id(svm,
> - !!cap_caching_mode(iommu->cap),
> - pasid_max - 1, GFP_KERNEL);
> - if (ret < 0) {
> + /* Do not use PASID 0, reserved for RID to PASID */
> + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> + pasid_max - 1, svm);
pasid_max -1 is inclusive. whereas max param in intel_pasid_alloc_id()
is exclusive right? If you fixed an issue, you can mention it in the
commit message.
> + if (svm->pasid == INVALID_IOASID) {
> kfree(svm);> kfree(sdev);
> + ret = ENOSPC;
-ENOSPC.
Nit: in 2/11 vcmd_alloc_pasid returned -ENOMEM
> goto out;
> }
> - svm->pasid = ret;
> svm->notifier.ops = &intel_mmuops;
> svm->mm = mm;
> svm->flags = flags;
> @@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> if (mm) {
> ret = mmu_notifier_register(&svm->notifier, mm);
> if (ret) {
> - intel_pasid_free_id(svm->pasid);
> + ioasid_free(svm->pasid);
> kfree(svm);
> kfree(sdev);
> goto out;
> @@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
> if (ret) {
> if (mm)
> mmu_notifier_unregister(&svm->notifier, mm);
> - intel_pasid_free_id(svm->pasid);
> + ioasid_free(svm->pasid);
> kfree(svm);
> kfree(sdev);
> goto out;
> @@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> if (!iommu)
> goto out;
>
> - svm = intel_pasid_lookup_id(pasid);
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> + goto out;
> + }
> +
> if (!svm)
> goto out;
>
> @@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
> kfree_rcu(sdev, rcu);
>
> if (list_empty(&svm->devs)) {
> - intel_pasid_free_id(svm->pasid);
> + /* Clear private data so that free pass check */> + ioasid_set_data(svm->pasid, NULL);
I don't get the above comment. Why is it needed?
> + ioasid_free(svm->pasid);
> if (svm->mm)
> mmu_notifier_unregister(&svm->notifier, svm->mm);
>
> @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device *dev, int pasid)
> if (!iommu)
> goto out;
>
> - svm = intel_pasid_lookup_id(pasid);
> - if (!svm)
> + svm = ioasid_find(NULL, pasid, NULL);
> + if (IS_ERR(svm)) {
> + ret = PTR_ERR(svm);
> goto out;
> -
> + }
> /* init_mm is used in this case */
> if (!svm->mm)
> ret = 1;
> @@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq, void *d)
>
> if (!svm || svm->pasid != req->pasid) {
> rcu_read_lock();
> - svm = intel_pasid_lookup_id(req->pasid);
> + svm = ioasid_find(NULL, req->pasid, NULL);
> /* It *can't* go away, because the driver is not permitted
> * to unbind the mm while any page faults are outstanding.
> * So we only need RCU to protect the internal idr code. */
> rcu_read_unlock();
> -
> - if (!svm) {
> + if (IS_ERR(svm) || !svm) {
> pr_err("%s: Page request for invalid PASID %d: %08llx %08llx\n",
> iommu->name, req->pasid, ((unsigned long long *)req)[0],
> ((unsigned long long *)req)[1]);
>
Thanks

Eric

2019-11-08 13:57:30

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 07/11] iommu/vt-d: Add nested translation helper function

Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> Nested translation mode is supported in VT-d 3.0 Spec.CH 3.8.
> With PASID granular translation type set to 0x11b, translation
> result from the first level(FL) also subject to a second level(SL)
> page table translation. This mode is used for SVA virtualization,
> where FL performs guest virtual to guest physical translation and
> SL performs guest physical to host physical translation.
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 207 ++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 12 +++
> 2 files changed, 219 insertions(+)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index ffbd416ed3b8..f846a907cfcf 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -415,6 +415,76 @@ pasid_set_flpm(struct pasid_entry *pe, u64 value)
> pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
> }
>
> +/*
> + * Setup the Extended Memory Type(EMT) field (Bits 91-93)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emt(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(29, 27), value << 27);
> +}
> +
> +/*
> + * Setup the Page Attribute Table (PAT) field (Bits 96-127)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pat(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(63, 32), value << 27);
> +}
> +
> +/*
> + * Setup the Cache Disable (CD) field (Bit 89)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_cd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 25, 1);
should be pasid_set_bits(&pe->val[1], 1 << 25, 1 << 25);
and same for below individual bit settings.

a macro could be introduced, taking the offset (up to 511) and the size
and this would automatically select the right pe->val[n] and convert the
offset into a 64b one. I think the readability would be improved versus
the spec.

Not related to this patch but it may be worth to "&" the "bits" value
with the mask to avoid any wrong value to overwrite other fields?

> +}
> +
> +/*
> + * Setup the Extended Memory Type Enable (EMTE) field (Bit 90)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_emte(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 26, 1);
> +}
> +
> +/*
> + * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_eafe(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 7, 1);> +}
> +
> +/*
> + * Setup the Page-level Cache Disable (PCD) field (Bit 95)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pcd(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 31, 1);
> +}
> +
> +/*
> + * Setup the Page-level Write-Through (PWT)) field (Bit 94)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_pwt(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 30, 1);
> +}
> +
> static void
> pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> u16 did, int pasid)
> @@ -647,3 +717,140 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
>
> return 0;
> }
> +
> +static int intel_pasid_setup_bind_data(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + struct iommu_gpasid_bind_data_vtd *pasid_data)
> +{
> + /*
> + * Not all guest PASID table entry fields are passed down during bind,
> + * here we only set up the ones that are dependent on guest settings.
> + * Execution related bits such as NXE, SMEP are not meaningful to IOMMU,
> + * therefore not set. Other fields, such as snoop related, are set based
> + * on host needs regardless of guest settings.
> + */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_SRE) {
> + if (!ecap_srs(iommu->ecap)) {
> + pr_err("No supervisor request support on %s\n",
> + iommu->name);
> + return -EINVAL;
> + }
> + pasid_set_sre(pte);
> + }
> +
> + if ((pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) && ecap_eafs(iommu->ecap))
> + pasid_set_eafe(pte);
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EMTE) {
> + pasid_set_emte(pte);
> + pasid_set_emt(pte, pasid_data->emt);
> + }
> +
> + /*
> + * Memory type is only applicable to devices inside processor coherent
> + * domain. PCIe devices are not included. We can skip the rest of the
> + * flags if IOMMU does not support MTS.
> + */
> + if (!ecap_mts(iommu->ecap)) {
> + pr_info("%s does not support memory type bind guest PASID\n",
> + iommu->name);
> + return 0;
> + }
> +
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PCD)
> + pasid_set_pcd(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_PWT)
> + pasid_set_pwt(pte);
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_CD)
> + pasid_set_cd(pte);
> + pasid_set_pat(pte, pasid_data->pat);
> +
> + return 0;
> +
> +}
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation
> + * which is used for vSVA. The first level page tables are used for
> + * GVA-GPA translation in the guest, second level page tables are used
> + * for GPA to HPA translation.
> + *
> + * @iommu: Iommu which the device belong to
belongs
> + * @dev: Device to be set up for translation
> + * @gpgd: FLPTPTR: First Level Page translation pointer in GPA
> + * @pasid: PASID to be programmed in the device PASID table
> + * @pasid_data: Additional PASID info from the guest bind request
> + * @domain: Domain info for setting up second level page tables
> + * @addr_width: Address width of the first level (guest)
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *gpgd,
> + int pasid, struct iommu_gpasid_bind_data_vtd *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width)
> +{
> + struct pasid_entry *pte;
> + struct dma_pte *pgd;
> + u64 pgd_val;
> + int agaw;
> + u16 did;
> +
> + if (!ecap_nest(iommu->ecap)) {
> + pr_err("IOMMU: %s: No nested translation support\n",
> + iommu->name);
> + return -EINVAL;
> + }
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (WARN_ON(!pte))
> + return -EINVAL;
> +
> + pasid_clear_entry(pte);
> +
> + /* Sanity checking performed by caller to make sure address
> + * width matching in two dimensions:
s/matching/match
> + * 1. CPU vs. IOMMU
> + * 2. Guest vs. Host.
> + */
> + switch (addr_width) {
> + case 57:
> + pasid_set_flpm(pte, 1);
> + break;
> + case 48:
> + pasid_set_flpm(pte, 0);
> + break;
> + default:
> + dev_err(dev, "Invalid paging mode %d\n", addr_width);
> + return -EINVAL;
> + }
> +
> + pasid_set_flptr(pte, (u64)gpgd);
> +
> + intel_pasid_setup_bind_data(iommu, pte, pasid_data);
> +
> + /* Setup the second level based on the given domain */
> + pgd = domain->pgd;
> +
> + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
> + pgd = phys_to_virt(dma_pte_addr(pgd));
> + if (!dma_pte_present(pgd)) {
> + dev_err(dev, "Invalid domain page table\n");
> + return -EINVAL;
> + }
> + }
> + pgd_val = virt_to_phys(pgd);
> + pasid_set_slptr(pte, pgd_val);
> + pasid_set_fault_enable(pte);
> +
> + did = domain->iommu_did[iommu->seq_id];
> + pasid_set_domain_id(pte, did);
> +
> + pasid_set_address_width(pte, agaw);
> + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> + pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> + pasid_set_present(pte);
> + pasid_flush_caches(iommu, pte, pasid, did);
> +
> + return 0;
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index e413e884e685..09c85db73b77 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -46,6 +46,7 @@
> * to vmalloc or even module mappings.
> */
> #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
> +#define PASID_FLAG_NESTED BIT(1)
>
> struct pasid_dir_entry {
> u64 val;
> @@ -55,6 +56,11 @@ struct pasid_entry {
> u64 val[8];
> };
>
> +#define PASID_ENTRY_PGTT_FL_ONLY (1)
> +#define PASID_ENTRY_PGTT_SL_ONLY (2)
> +#define PASID_ENTRY_PGTT_NESTED (3)
> +#define PASID_ENTRY_PGTT_PT (4)
> +
> /* The representative of a PASID table */
> struct pasid_table {
> void *table; /* pasid table pointer */
> @@ -103,6 +109,12 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> struct dmar_domain *domain,
> struct device *dev, int pasid);
> +int intel_pasid_setup_nested(struct intel_iommu *iommu,
> + struct device *dev, pgd_t *pgd,
> + int pasid,
> + struct iommu_gpasid_bind_data_vtd *pasid_data,
> + struct dmar_domain *domain,
> + int addr_width);
> void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> struct device *dev, int pasid);
> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int *pasid);
>
Thanks

Eric

2019-11-08 16:19:52

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> When Shared Virtual Memory is exposed to a guest via vIOMMU, scalable
> IOTLB invalidation may be passed down from outside IOMMU subsystems.
> This patch adds invalidation functions that can be used for additional
> translation cache types.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/dmar.c | 46 +++++++++++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.c | 3 ++-
> include/linux/intel-iommu.h | 21 +++++++++++++++++----
> 3 files changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index 49bb7d76e646..0ce2d32ff99e 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based IOTLB Invalidate */
> +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr, u32 pasid,
> + unsigned int size_order, u64 granu, int ih)
> +{
> + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> + QI_EIOTLB_AM(size_order);
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned mask)
> {
> @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> qi_submit_sync(&desc, iommu);
> }
>
> +/* PASID-based device IOTLB Invalidate */
> +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu)
> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) | QI_DEV_EIOTLB_SID(sid) |
> + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> + QI_DEV_IOTLB_PFSID(pfsid);
> + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> +
> + /* If S bit is 0, we only flush a single page. If S bit is set,
> + * The least significant zero bit indicates the invalidation address
> + * range. VT-d spec 6.5.2.6.
> + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates 16KB.
> + */
> + if (!size_order) {
> + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) & ~QI_DEV_EIOTLB_SIZE;
this is desc.qw1

With that fixed and the qi_flush_dev_piotlb init issue spotted by Lu,
feel free to add my

Reviewed-by: Eric Auger <[email protected]>

Thanks

Eric

> + } else {
> + unsigned long mask = 1UL << (VTD_PAGE_SHIFT + size_order);
> + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) | QI_DEV_EIOTLB_SIZE;
> + }
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid)
> +{
> + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> +
> + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) | QI_PC_GRAN(granu) | QI_PC_TYPE;
> + qi_submit_sync(&desc, iommu);
> +}
> /*
> * Disable Queued Invalidation interface.
> */
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index f846a907cfcf..6d7a701ef4d3 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
> {
> struct qi_desc desc;
>
> - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> + QI_PC_PASID(pasid) | QI_PC_TYPE;
> desc.qw1 = 0;
> desc.qw2 = 0;
> desc.qw3 = 0;
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 6c74c71b1ebf..a25fb3a0ea5b 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -332,7 +332,7 @@ enum {
> #define QI_IOTLB_GRAN(gran) (((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4))
> #define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK)
> #define QI_IOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_IOTLB_AM(am) (((u8)am))
> +#define QI_IOTLB_AM(am) (((u8)am) & 0x3f)
>
> #define QI_CC_FM(fm) (((u64)fm) << 48)
> #define QI_CC_SID(sid) (((u64)sid) << 32)
> @@ -350,16 +350,21 @@ enum {
> #define QI_PC_DID(did) (((u64)did) << 16)
> #define QI_PC_GRAN(gran) (((u64)gran) << 4)
>
> -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> +/* PASID cache invalidation granu */
> +#define QI_PC_ALL_PASIDS 0
> +#define QI_PC_PASID_SEL 1
>
> #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> -#define QI_EIOTLB_AM(am) (((u64)am))
> +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
>
> +/* QI Dev-IOTLB inv granu */
> +#define QI_DEV_IOTLB_GRAN_ALL 1
> +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> +
> #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> @@ -655,8 +660,16 @@ extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid,
> u8 fm, u64 type);
> extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> unsigned int size_order, u64 type);
> +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> + u32 pasid, unsigned int size_order, u64 type, int ih);
> extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> u16 qdep, u64 addr, unsigned mask);
> +
> +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
> + u32 pasid, u16 qdep, u64 addr, unsigned size_order, u64 granu);
> +
> +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, int pasid);
> +
> extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu);
>
> extern int dmar_ir_support(void);
>

2019-11-08 22:19:47

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 02/11] iommu/vt-d: Enlightened PASID allocation

On Fri, 8 Nov 2019 11:33:22 +0100
Auger Eric <[email protected]> wrote:

> Hi Jacob,
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > From: Lu Baolu <[email protected]>
> >
> > Enabling IOMMU in a guest requires communication with the host
> > driver for certain aspects. Use of PASID ID to enable Shared Virtual
> > Addressing (SVA) requires managing PASID's in the host. VT-d 3.0
> > spec provides a Virtual Command Register (VCMD) to facilitate this.
> > Writes to this register in the guest are trapped by QEMU which
> > proxies the call to the host driver.
> >
> > This virtual command interface consists of a capability register,
> > a virtual command register, and a virtual response register. Refer
> > to section 10.4.42, 10.4.43, 10.4.44 for more information.
> >
> > This patch adds the enlightened PASID allocation/free interfaces
> > via the virtual command interface.
> >
> > Cc: Ashok Raj <[email protected]>
> > Cc: Jacob Pan <[email protected]>
> > Cc: Kevin Tian <[email protected]>
> > Signed-off-by: Liu Yi L <[email protected]>
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > Reviewed-by: Eric Auger <[email protected]>
> > ---
> > drivers/iommu/intel-pasid.c | 56
> > +++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.h | 13 ++++++++++-
> > include/linux/intel-iommu.h | 2 ++ 3 files changed, 70
> > insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index 040a445be300..d81e857d2b25
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -63,6 +63,62 @@ void *intel_pasid_lookup_id(int pasid)
> > return p;
> > }
> >
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) +{
> > + unsigned long flags;
> > + u8 status_code;
> > + int ret = 0;
> > + u64 res;
> > +
> > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, VCMD_CMD_ALLOC);
> > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > + status_code = VCMD_VRSP_SC(res);
> > + switch (status_code) {
> > + case VCMD_VRSP_SC_SUCCESS:
> > + *pasid = VCMD_VRSP_RESULT(res);
> > + break;
> > + case VCMD_VRSP_SC_NO_PASID_AVAIL:
> > + pr_info("IOMMU: %s: No PASID available\n",
> > iommu->name);
> > + ret = -ENOMEM;
> > + break;
> > + default:
> > + ret = -ENODEV;
> > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > + iommu->name, status_code);
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned int pasid)
> > +{
> > + unsigned long flags;
> > + u8 status_code;
> > + u64 res;
> > +
> > + raw_spin_lock_irqsave(&iommu->register_lock, flags);
> > + dmar_writeq(iommu->reg + DMAR_VCMD_REG, (pasid << 8) |
> > VCMD_CMD_FREE);
> > + IOMMU_WAIT_OP(iommu, DMAR_VCRSP_REG, dmar_readq,
> > + !(res & VCMD_VRSP_IP), res);
> > + raw_spin_unlock_irqrestore(&iommu->register_lock, flags);
> > +
> > + status_code = VCMD_VRSP_SC(res);
> > + switch (status_code) {
> > + case VCMD_VRSP_SC_SUCCESS:
> > + break;
> > + case VCMD_VRSP_SC_INVALID_PASID:
> > + pr_info("IOMMU: %s: Invalid PASID\n", iommu->name);
> > + break;
> > + default:
> > + pr_warn("IOMMU: %s: Unexpected error code %d\n",
> > + iommu->name, status_code);
> > + }
> > +}
> > +
> > /*
> > * Per device pasid table management:
> > */
> > diff --git a/drivers/iommu/intel-pasid.h
> > b/drivers/iommu/intel-pasid.h index fc8cd8f17de1..e413e884e685
> > 100644 --- a/drivers/iommu/intel-pasid.h
> > +++ b/drivers/iommu/intel-pasid.h
> > @@ -23,6 +23,16 @@
> > #define is_pasid_enabled(entry) (((entry)->lo >> 3)
> > & 0x1) #define get_pasid_dir_size(entry) (1 <<
> > ((((entry)->lo >> 9) & 0x7) + 7))
> > +/* Virtual command interface for enlightened pasid management. */
> > +#define VCMD_CMD_ALLOC 0x1
> > +#define VCMD_CMD_FREE 0x2
> > +#define VCMD_VRSP_IP 0x1
> > +#define VCMD_VRSP_SC(e) (((e) >> 1) & 0x3)
> > +#define VCMD_VRSP_SC_SUCCESS 0
> > +#define VCMD_VRSP_SC_NO_PASID_AVAIL 1
> > +#define VCMD_VRSP_SC_INVALID_PASID 1
> > +#define VCMD_VRSP_RESULT(e) (((e) >> 8) & 0xfffff)
> nit: pasid is 20b but result field is 56b large
> Just in case a new command were to be added later on.
Good point, will rename to VCMD_VRSP_RESULT_PASID andd new macros for
future new commands with different results.
> > +
> > /*
> > * Domain ID reserved for pasid entries programmed for first-level
> > * only and pass-through transfer modes.
> > @@ -95,5 +105,6 @@ int intel_pasid_setup_pass_through(struct
> > intel_iommu *iommu, struct device *dev, int pasid);
> > void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, int pasid);
> > -
> > +int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid); +void vcmd_free_pasid(struct intel_iommu *iommu, unsigned
> > int pasid); #endif /* __INTEL_PASID_H */
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 2e1bed9b7eef..1d4b8dcdc5d8
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -161,6 +161,7 @@
> > #define ecap_smpwc(e) (((e) >> 48) & 0x1)
> > #define ecap_flts(e) (((e) >> 47) & 0x1)
> > #define ecap_slts(e) (((e) >> 46) & 0x1)
> > +#define ecap_vcs(e) (((e) >> 44) & 0x1)
> nit: this addition is not related to this patch
> may be moved to [3] as vccap_pasid
Sounds good.

Thanks
> > #define ecap_smts(e) (((e) >> 43) & 0x1)
> > #define ecap_dit(e) ((e >> 41) & 0x1)
> > #define ecap_pasid(e) ((e >> 40) & 0x1)
> > @@ -282,6 +283,7 @@
> >
> > /* PRS_REG */
> > #define DMA_PRS_PPR ((u32)1)
> > +#define DMA_VCS_PAS ((u64)1)
> >
> > #define IOMMU_WAIT_OP(iommu, offset, op, cond,
> > sts) \ do
> > {
> > \
> Otherwise looks good to me
>
> Thanks
>
> Eric
>

[Jacob Pan]

2019-11-08 22:23:34

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 03/11] iommu/vt-d: Add custom allocator for IOASID

On Fri, 8 Nov 2019 11:40:23 +0100
Auger Eric <[email protected]> wrote:

> Hi Jacob,
>
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > When VT-d driver runs in the guest, PASID allocation must be
> > performed via virtual command interface. This patch registers a
> > custom IOASID allocator which takes precedence over the default
> > XArray based allocator. The resulting IOASID allocation will always
> > come from the host. This ensures that PASID namespace is system-
> > wide.
> >
> > Signed-off-by: Lu Baolu <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/Kconfig | 1 +
> > drivers/iommu/intel-iommu.c | 67
> > +++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/intel-iommu.h | 2 ++ 3 files changed, 70
> > insertions(+)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index fd50ddffffbf..961fe5795a90 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -211,6 +211,7 @@ config INTEL_IOMMU_SVM
> > bool "Support for Shared Virtual Memory with Intel IOMMU"
> > depends on INTEL_IOMMU && X86
> > select PCI_PASID
> > + select IOASID
> > select MMU_NOTIFIER
> > help
> > Shared Virtual Memory (SVM) provides a facility for
> > devices diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 3f974919d3bd..ced1d89ef977
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -1706,6 +1706,9 @@ static void free_dmar_iommu(struct
> > intel_iommu *iommu) if (ecap_prs(iommu->ecap))
> > intel_svm_finish_prq(iommu);
> > }
> > + if (ecap_vcs(iommu->ecap) && vccap_pasid(iommu->vccap))
> > +
> > ioasid_unregister_allocator(&iommu->pasid_allocator); +
> > #endif
> > }
> >
> > @@ -4910,6 +4913,44 @@ static int __init
> > probe_acpi_namespace_devices(void) return 0;
> > }
> >
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static ioasid_t intel_ioasid_alloc(ioasid_t min, ioasid_t max,
> > void *data) +{
> > + struct intel_iommu *iommu = data;
> > + ioasid_t ioasid;
> > +
> > + /*
> > + * VT-d virtual command interface always uses the full 20
> > bit
> > + * PASID range. Host can partition guest PASID range based
> > on
> > + * policies but it is out of guest's control.
> > + */
> > + if (min < PASID_MIN || max > intel_pasid_max_id)>
> > + return INVALID_IOASID;
>
> > +
> > + if (vcmd_alloc_pasid(iommu, &ioasid))
> > + return INVALID_IOASID;
> > +
> > + return ioasid;
> > +}
> > +
> > +static void intel_ioasid_free(ioasid_t ioasid, void *data)
> > +{
> > + struct intel_iommu *iommu = data;
> > +
> > + if (!iommu)
> > + return;
> > + /*
> > + * Sanity check the ioasid owner is done at upper layer,
> > e.g. VFIO
> > + * We can only free the PASID when all the devices are
> > unbond.
> > + */
> > + if (ioasid_find(NULL, ioasid, NULL)) {
> > + pr_alert("Cannot free active IOASID %d\n", ioasid);
> > + return;
> > + }
> > + vcmd_free_pasid(iommu, ioasid);
> > +}
> > +#endif
> > +
> > int __init intel_iommu_init(void)
> > {
> > int ret = -ENODEV;
> > @@ -5020,6 +5061,32 @@ int __init intel_iommu_init(void)
> > "%s", iommu->name);
> > iommu_device_set_ops(&iommu->iommu,
> > &intel_iommu_ops); iommu_device_register(&iommu->iommu);
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > + if (ecap_vcs(iommu->ecap) &&
> > vccap_pasid(iommu->vccap)) {
> > + pr_info("Register custom PASID
> > allocator\n");
> > + /*
> > + * Register a custom ASID allocator if we
> > are running
> > + * in a guest, the purpose is to have a
> > system wide PASID
> > + * namespace among all PASID users.
> > + * There can be multiple vIOMMUs in each
> > guest but only
> > + * one allocator is active. All vIOMMU
> > allocators will
> > + * eventually be calling the same host
> > allocator.
> > + */
> > + iommu->pasid_allocator.alloc =
> > intel_ioasid_alloc;
> > + iommu->pasid_allocator.free =
> > intel_ioasid_free;
> > + iommu->pasid_allocator.pdata = (void
> > *)iommu;
> > + ret =
> > ioasid_register_allocator(&iommu->pasid_allocator);
> > + if (ret) {
> > + pr_warn("Custom PASID allocator
> > registeration failed\n");
> nit: registration
> > + /*
> > + * Disable scalable mode on this
> > IOMMU if there
> > + * is no custom allocator. Mixing
> > SM capable vIOMMU
> > + * and non-SM vIOMMU are not
> > supported:
> nit; is not supported. But I guess you will reshape it according to
> previous comments.

Yes, i moved this earlier to avoid the need to clean up after scalable
mode root table is set.
> > + */
> > + intel_iommu_sm = 0;
> > + }
> > + }
> > +#endif
> > }
> >
> > bus_set_iommu(&pci_bus_type, &intel_iommu_ops);
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 1d4b8dcdc5d8..c624733cb2e6
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -19,6 +19,7 @@
> > #include <linux/iommu.h>
> > #include <linux/io-64-nonatomic-lo-hi.h>
> > #include <linux/dmar.h>
> > +#include <linux/ioasid.h>
> >
> > #include <asm/cacheflush.h>
> > #include <asm/iommu.h>
> > @@ -546,6 +547,7 @@ struct intel_iommu {
> > #ifdef CONFIG_INTEL_IOMMU_SVM
> > struct page_req_dsc *prq;
> > unsigned char prq_name[16]; /* Name for PRQ interrupt */
> > + struct ioasid_allocator_ops pasid_allocator; /* Custom
> > allocator for PASIDs */ #endif
> > struct q_inval *qi; /* Queued invalidation
> > info */ u32 *iommu_state; /* Store iommu states between suspend and
> > resume.*/
>
> Thanks
>
> Eric
>

[Jacob Pan]

2019-11-08 22:51:54

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

On Fri, 8 Nov 2019 12:30:31 +0100
Auger Eric <[email protected]> wrote:

> Hi Jacob,
>
> On 10/24/19 9:54 PM, Jacob Pan wrote:
> > Make use of generic IOASID code to manage PASID allocation,
> > free, and lookup. Replace Intel specific code.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 12 ++++++------
> > drivers/iommu/intel-pasid.c | 36
> > ------------------------------------ drivers/iommu/intel-svm.c |
> > 39 +++++++++++++++++++++++---------------- 3 files changed, 29
> > insertions(+), 58 deletions(-)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index ced1d89ef977..2ea09b988a23
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
> > dmar_domain *domain, domain->auxd_refcnt--;
> >
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > - intel_pasid_free_id(domain->default_pasid);
> > + ioasid_free(domain->default_pasid);
> > }
> >
> > static int aux_domain_add_dev(struct dmar_domain *domain,
> > @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain, if (domain->default_pasid <= 0) {
> > int pasid;
> >
> > - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
> > -
> > pci_max_pasids(to_pci_dev(dev)),
> > - GFP_KERNEL);
> > - if (pasid <= 0) {
> > + /* No private data needed for the default pasid */
> > + pasid = ioasid_alloc(NULL, PASID_MIN,
> > pci_max_pasids(to_pci_dev(dev)) - 1,
> > + NULL);
> > + if (pasid == INVALID_IOASID) {
> > pr_err("Can't allocate default pasid\n");
> > return -ENODEV;
> > }
> > @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
> > dmar_domain *domain, spin_unlock(&iommu->lock);
> > spin_unlock_irqrestore(&device_domain_lock, flags);
> > if (!domain->auxd_refcnt && domain->default_pasid > 0)
> > - intel_pasid_free_id(domain->default_pasid);
> > + ioasid_free(domain->default_pasid);
> >
> > return ret;
> > }
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index d81e857d2b25..e79d680fe300
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -26,42 +26,6 @@
> > */
> > static DEFINE_SPINLOCK(pasid_lock);
> > u32 intel_pasid_max_id = PASID_MAX;
> > -static DEFINE_IDR(pasid_idr);
> > -
> > -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
> > -{
> > - int ret, min, max;
> > -
> > - min = max_t(int, start, PASID_MIN);
> > - max = min_t(int, end, intel_pasid_max_id);
> > -
> > - WARN_ON(in_interrupt());
> > - idr_preload(gfp);
> > - spin_lock(&pasid_lock);
> > - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
> > - spin_unlock(&pasid_lock);
> > - idr_preload_end();
> > -
> > - return ret;
> > -}
> > -
> > -void intel_pasid_free_id(int pasid)
> > -{
> > - spin_lock(&pasid_lock);
> > - idr_remove(&pasid_idr, pasid);
> > - spin_unlock(&pasid_lock);
> > -}
> > -
> > -void *intel_pasid_lookup_id(int pasid)
> > -{
> > - void *p;
> > -
> > - spin_lock(&pasid_lock);
> > - p = idr_find(&pasid_idr, pasid);
> > - spin_unlock(&pasid_lock);
> > -
> > - return p;
> > -}
> >
> > int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
> > *pasid) {
> > diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> > index 9b159132405d..a9a7f85a09bc 100644
> > --- a/drivers/iommu/intel-svm.c
> > +++ b/drivers/iommu/intel-svm.c
> > @@ -17,6 +17,7 @@
> > #include <linux/dmar.h>
> > #include <linux/interrupt.h>
> > #include <linux/mm_types.h>
> > +#include <linux/ioasid.h>
> > #include <asm/page.h>
> >
> > #include "intel-pasid.h"
> > @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ if (pasid_max >
> > intel_pasid_max_id) pasid_max = intel_pasid_max_id;
> >
> > - /* Do not use PASID 0 in caching mode (virtualised
> > IOMMU) */
> > - ret = intel_pasid_alloc_id(svm,
> > - !!cap_caching_mode(iommu->cap),
> > - pasid_max - 1,
> > GFP_KERNEL);
> > - if (ret < 0) {
> > + /* Do not use PASID 0, reserved for RID to PASID */
> > + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
> > + pasid_max - 1, svm);
> pasid_max -1 is inclusive. whereas max param in intel_pasid_alloc_id()
> is exclusive right? If you fixed an issue, you can mention it in the
> commit message.
yes, i should mention that. intel_pasid_alloc_id() uses IDR which is
end exclusive. ioasid uses xarray, which is inclusive.
> > + if (svm->pasid == INVALID_IOASID) {
> > kfree(svm);>
> > kfree(sdev);
> > + ret = ENOSPC;
> -ENOSPC.
> Nit: in 2/11 vcmd_alloc_pasid returned -ENOMEM
yes, it should be -ENOSPC as well.

> > goto out;
> > }
> > - svm->pasid = ret;
> > svm->notifier.ops = &intel_mmuops;
> > svm->mm = mm;
> > svm->flags = flags;
> > @@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ if (mm) {
> > ret =
> > mmu_notifier_register(&svm->notifier, mm); if (ret) {
> > - intel_pasid_free_id(svm->pasid);
> > + ioasid_free(svm->pasid);
> > kfree(svm);
> > kfree(sdev);
> > goto out;
> > @@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int
> > *pasid, int flags, struct svm_dev_ if (ret) {
> > if (mm)
> > mmu_notifier_unregister(&svm->notifier,
> > mm);
> > - intel_pasid_free_id(svm->pasid);
> > + ioasid_free(svm->pasid);
> > kfree(svm);
> > kfree(sdev);
> > goto out;
> > @@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev,
> > int pasid) if (!iommu)
> > goto out;
> >
> > - svm = intel_pasid_lookup_id(pasid);
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > + goto out;
> > + }
> > +
> > if (!svm)
> > goto out;
> >
> > @@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int
> > pasid) kfree_rcu(sdev, rcu);
> >
> > if (list_empty(&svm->devs)) {
> > -
> > intel_pasid_free_id(svm->pasid);
> > + /* Clear private data so
> > that free pass check */> +
> > ioasid_set_data(svm->pasid, NULL);
> I don't get the above comment. Why is it needed?
Having private data associated with an IOASID is an indicator that this
IOASID is busy. So we have to clear it to signal it is free.
Actually, I am planning to introduce a refcount per IOASID since there
will be multiple users of IOASID, e.g. IOMMU driver and KVM. When
refcount == 0, we can free.

> > + ioasid_free(svm->pasid);
> > if (svm->mm)
> > mmu_notifier_unregister(&svm->notifier,
> > svm->mm);
> > @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device
> > *dev, int pasid) if (!iommu)
> > goto out;
> >
> > - svm = intel_pasid_lookup_id(pasid);
> > - if (!svm)
> > + svm = ioasid_find(NULL, pasid, NULL);
> > + if (IS_ERR(svm)) {
> > + ret = PTR_ERR(svm);
> > goto out;
> > -
> > + }
> > /* init_mm is used in this case */
> > if (!svm->mm)
> > ret = 1;
> > @@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq,
> > void *d)
> > if (!svm || svm->pasid != req->pasid) {
> > rcu_read_lock();
> > - svm = intel_pasid_lookup_id(req->pasid);
> > + svm = ioasid_find(NULL, req->pasid, NULL);
> > /* It *can't* go away, because the driver
> > is not permitted
> > * to unbind the mm while any page faults
> > are outstanding.
> > * So we only need RCU to protect the
> > internal idr code. */ rcu_read_unlock();
> > -
> > - if (!svm) {
> > + if (IS_ERR(svm) || !svm) {
> > pr_err("%s: Page request for
> > invalid PASID %d: %08llx %08llx\n", iommu->name, req->pasid,
> > ((unsigned long long *)req)[0], ((unsigned long long *)req)[1]);
> >
> Thanks
>
> Eric
>

[Jacob Pan]

2019-11-08 23:05:03

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 10/11] iommu/vt-d: Support flushing more translation cache types

On Fri, 8 Nov 2019 17:18:10 +0100
Auger Eric <[email protected]> wrote:

> Hi Jacob,
>
> On 10/24/19 9:55 PM, Jacob Pan wrote:
> > When Shared Virtual Memory is exposed to a guest via vIOMMU,
> > scalable IOTLB invalidation may be passed down from outside IOMMU
> > subsystems. This patch adds invalidation functions that can be used
> > for additional translation cache types.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > ---
> > drivers/iommu/dmar.c | 46
> > +++++++++++++++++++++++++++++++++++++++++++++
> > drivers/iommu/intel-pasid.c | 3 ++- include/linux/intel-iommu.h |
> > 21 +++++++++++++++++---- 3 files changed, 65 insertions(+), 5
> > deletions(-)
> >
> > diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > index 49bb7d76e646..0ce2d32ff99e 100644
> > --- a/drivers/iommu/dmar.c
> > +++ b/drivers/iommu/dmar.c
> > @@ -1346,6 +1346,20 @@ void qi_flush_iotlb(struct intel_iommu
> > *iommu, u16 did, u64 addr, qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based IOTLB Invalidate */
> > +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u64 addr,
> > u32 pasid,
> > + unsigned int size_order, u64 granu, int ih)
> > +{
> > + struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> > + QI_EIOTLB_GRAN(granu) | QI_EIOTLB_TYPE;
> > + desc.qw1 = QI_EIOTLB_ADDR(addr) | QI_EIOTLB_IH(ih) |
> > + QI_EIOTLB_AM(size_order);
> > +
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid, u16 qdep, u64 addr, unsigned mask)
> > {
> > @@ -1369,6 +1383,38 @@ void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu);
> > }
> >
> > +/* PASID-based device IOTLB Invalidate */
> > +void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16 sid, u16
> > pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu) +{
> > + struct qi_desc desc;
> > +
> > + desc.qw0 = QI_DEV_EIOTLB_PASID(pasid) |
> > QI_DEV_EIOTLB_SID(sid) |
> > + QI_DEV_EIOTLB_QDEP(qdep) | QI_DEIOTLB_TYPE |
> > + QI_DEV_IOTLB_PFSID(pfsid);
> > + desc.qw1 = QI_DEV_EIOTLB_GLOB(granu);
> > +
> > + /* If S bit is 0, we only flush a single page. If S bit is
> > set,
> > + * The least significant zero bit indicates the
> > invalidation address
> > + * range. VT-d spec 6.5.2.6.
> > + * e.g. address bit 12[0] indicates 8KB, 13[0] indicates
> > 16KB.
> > + */
> > + if (!size_order) {
> > + desc.qw0 |= QI_DEV_EIOTLB_ADDR(addr) &
> > ~QI_DEV_EIOTLB_SIZE;
> this is desc.qw1
>
Right, will fix.

Thanks!
> With that fixed and the qi_flush_dev_piotlb init issue spotted by Lu,
> feel free to add my
>
> Reviewed-by: Eric Auger <[email protected]>
>
> Thanks
>
> Eric
>
> > + } else {
> > + unsigned long mask = 1UL << (VTD_PAGE_SHIFT +
> > size_order);
> > + desc.qw1 |= QI_DEV_EIOTLB_ADDR(addr & ~mask) |
> > QI_DEV_EIOTLB_SIZE;
> > + }
> > + qi_submit_sync(&desc, iommu);
> > +}
> > +
> > +void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, int pasid) +{
> > + struct qi_desc desc = {.qw1 = 0, .qw2 = 0, .qw3 = 0};
> > +
> > + desc.qw0 = QI_PC_PASID(pasid) | QI_PC_DID(did) |
> > QI_PC_GRAN(granu) | QI_PC_TYPE;
> > + qi_submit_sync(&desc, iommu);
> > +}
> > /*
> > * Disable Queued Invalidation interface.
> > */
> > diff --git a/drivers/iommu/intel-pasid.c
> > b/drivers/iommu/intel-pasid.c index f846a907cfcf..6d7a701ef4d3
> > 100644 --- a/drivers/iommu/intel-pasid.c
> > +++ b/drivers/iommu/intel-pasid.c
> > @@ -491,7 +491,8 @@ pasid_cache_invalidation_with_pasid(struct
> > intel_iommu *iommu, {
> > struct qi_desc desc;
> >
> > - desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> > QI_PC_PASID(pasid);
> > + desc.qw0 = QI_PC_DID(did) | QI_PC_GRAN(QI_PC_PASID_SEL) |
> > + QI_PC_PASID(pasid) | QI_PC_TYPE;
> > desc.qw1 = 0;
> > desc.qw2 = 0;
> > desc.qw3 = 0;
> > diff --git a/include/linux/intel-iommu.h
> > b/include/linux/intel-iommu.h index 6c74c71b1ebf..a25fb3a0ea5b
> > 100644 --- a/include/linux/intel-iommu.h
> > +++ b/include/linux/intel-iommu.h
> > @@ -332,7 +332,7 @@ enum {
> > #define QI_IOTLB_GRAN(gran) (((u64)gran) >>
> > (DMA_TLB_FLUSH_GRANU_OFFSET-4)) #define QI_IOTLB_ADDR(addr)
> > (((u64)addr) & VTD_PAGE_MASK) #define
> > QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define
> > QI_IOTLB_AM(am) (((u8)am)) +#define
> > QI_IOTLB_AM(am) (((u8)am) & 0x3f)
> > #define QI_CC_FM(fm) (((u64)fm) << 48)
> > #define QI_CC_SID(sid) (((u64)sid) << 32)
> > @@ -350,16 +350,21 @@ enum {
> > #define QI_PC_DID(did) (((u64)did) << 16)
> > #define QI_PC_GRAN(gran) (((u64)gran) << 4)
> >
> > -#define QI_PC_ALL_PASIDS (QI_PC_TYPE | QI_PC_GRAN(0))
> > -#define QI_PC_PASID_SEL (QI_PC_TYPE | QI_PC_GRAN(1))
> > +/* PASID cache invalidation granu */
> > +#define QI_PC_ALL_PASIDS 0
> > +#define QI_PC_PASID_SEL 1
> >
> > #define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK)
> > #define QI_EIOTLB_IH(ih) (((u64)ih) << 6)
> > -#define QI_EIOTLB_AM(am) (((u64)am))
> > +#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f)
> > #define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32)
> > #define QI_EIOTLB_DID(did) (((u64)did) << 16)
> > #define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4)
> >
> > +/* QI Dev-IOTLB inv granu */
> > +#define QI_DEV_IOTLB_GRAN_ALL 1
> > +#define QI_DEV_IOTLB_GRAN_PASID_SEL 0
> > +
> > #define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK)
> > #define QI_DEV_EIOTLB_SIZE (((u64)1) << 11)
> > #define QI_DEV_EIOTLB_GLOB(g) ((u64)g)
> > @@ -655,8 +660,16 @@ extern void qi_flush_context(struct
> > intel_iommu *iommu, u16 did, u16 sid, u8 fm, u64 type);
> > extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64
> > addr, unsigned int size_order, u64 type);
> > +extern void qi_flush_piotlb(struct intel_iommu *iommu, u16 did,
> > u64 addr,
> > + u32 pasid, unsigned int size_order, u64
> > type, int ih); extern void qi_flush_dev_iotlb(struct intel_iommu
> > *iommu, u16 sid, u16 pfsid, u16 qdep, u64 addr, unsigned mask);
> > +
> > +extern void qi_flush_dev_piotlb(struct intel_iommu *iommu, u16
> > sid, u16 pfsid,
> > + u32 pasid, u16 qdep, u64 addr, unsigned
> > size_order, u64 granu); +
> > +extern void qi_flush_pasid_cache(struct intel_iommu *iommu, u16
> > did, u64 granu, int pasid); +
> > extern int qi_submit_sync(struct qi_desc *desc, struct intel_iommu
> > *iommu);
> > extern int dmar_ir_support(void);
> >
>

[Jacob Pan]

2019-11-12 09:58:12

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 06/11] iommu/vt-d: Avoid duplicated code for PASID setup

Hi Jacob,

On 10/24/19 9:54 PM, Jacob Pan wrote:
> After each setup for PASID entry, related translation caches must be flushed.
> We can combine duplicated code into one function which is less error prone.
>
> Signed-off-by: Jacob Pan <[email protected]>
> ---
> drivers/iommu/intel-pasid.c | 48 +++++++++++++++++----------------------------
> 1 file changed, 18 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index e79d680fe300..ffbd416ed3b8 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -485,6 +485,21 @@ void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> devtlb_invalidation_with_pasid(iommu, dev, pasid);
> }
>
> +static void pasid_flush_caches(struct intel_iommu *iommu,
> + struct pasid_entry *pte,
> + int pasid, u16 did)
> +{
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> + iotlb_invalidation_with_pasid(iommu, did, pasid);
> + } else {
> + iommu_flush_write_buffer(iommu);
> + }
> +}
> +
> /*
> * Set up the scalable mode pasid table entry for first only
> * translation type.
> @@ -530,16 +545,7 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
> /* Setup Present and PASID Granular Transfer Type: */
> pasid_set_translation_type(pte, 1);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -603,16 +609,7 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
> @@ -646,16 +643,7 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
> */
> pasid_set_sre(pte);
> pasid_set_present(pte);
> -
> - if (!ecap_coherent(iommu->ecap))
> - clflush_cache_range(pte, sizeof(*pte));
> -
> - if (cap_caching_mode(iommu->cap)) {
> - pasid_cache_invalidation_with_pasid(iommu, did, pasid);
> - iotlb_invalidation_with_pasid(iommu, did, pasid);
> - } else {
> - iommu_flush_write_buffer(iommu);
> - }
> + pasid_flush_caches(iommu, pte, pasid, did);
>
> return 0;
> }
>
Reviewed-by: Eric Auger <[email protected]>

Thanks

Eric

2019-11-12 09:58:29

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 04/11] iommu/vt-d: Replace Intel specific PASID allocator with IOASID

Hi Jacob,

On 11/8/19 11:55 PM, Jacob Pan wrote:
> On Fri, 8 Nov 2019 12:30:31 +0100
> Auger Eric <[email protected]> wrote:
>
>> Hi Jacob,
>>
>> On 10/24/19 9:54 PM, Jacob Pan wrote:
>>> Make use of generic IOASID code to manage PASID allocation,
>>> free, and lookup. Replace Intel specific code.
>>>
>>> Signed-off-by: Jacob Pan <[email protected]>
>>> ---
>>> drivers/iommu/intel-iommu.c | 12 ++++++------
>>> drivers/iommu/intel-pasid.c | 36
>>> ------------------------------------ drivers/iommu/intel-svm.c |
>>> 39 +++++++++++++++++++++++---------------- 3 files changed, 29
>>> insertions(+), 58 deletions(-)
>>>
>>> diff --git a/drivers/iommu/intel-iommu.c
>>> b/drivers/iommu/intel-iommu.c index ced1d89ef977..2ea09b988a23
>>> 100644 --- a/drivers/iommu/intel-iommu.c
>>> +++ b/drivers/iommu/intel-iommu.c
>>> @@ -5311,7 +5311,7 @@ static void auxiliary_unlink_device(struct
>>> dmar_domain *domain, domain->auxd_refcnt--;
>>>
>>> if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> - intel_pasid_free_id(domain->default_pasid);
>>> + ioasid_free(domain->default_pasid);
>>> }
>>>
>>> static int aux_domain_add_dev(struct dmar_domain *domain,
>>> @@ -5329,10 +5329,10 @@ static int aux_domain_add_dev(struct
>>> dmar_domain *domain, if (domain->default_pasid <= 0) {
>>> int pasid;
>>>
>>> - pasid = intel_pasid_alloc_id(domain, PASID_MIN,
>>> -
>>> pci_max_pasids(to_pci_dev(dev)),
>>> - GFP_KERNEL);
>>> - if (pasid <= 0) {
>>> + /* No private data needed for the default pasid */
>>> + pasid = ioasid_alloc(NULL, PASID_MIN,
>>> pci_max_pasids(to_pci_dev(dev)) - 1,
>>> + NULL);
>>> + if (pasid == INVALID_IOASID) {
>>> pr_err("Can't allocate default pasid\n");
>>> return -ENODEV;
>>> }
>>> @@ -5368,7 +5368,7 @@ static int aux_domain_add_dev(struct
>>> dmar_domain *domain, spin_unlock(&iommu->lock);
>>> spin_unlock_irqrestore(&device_domain_lock, flags);
>>> if (!domain->auxd_refcnt && domain->default_pasid > 0)
>>> - intel_pasid_free_id(domain->default_pasid);
>>> + ioasid_free(domain->default_pasid);
>>>
>>> return ret;
>>> }
>>> diff --git a/drivers/iommu/intel-pasid.c
>>> b/drivers/iommu/intel-pasid.c index d81e857d2b25..e79d680fe300
>>> 100644 --- a/drivers/iommu/intel-pasid.c
>>> +++ b/drivers/iommu/intel-pasid.c
>>> @@ -26,42 +26,6 @@
>>> */
>>> static DEFINE_SPINLOCK(pasid_lock);
>>> u32 intel_pasid_max_id = PASID_MAX;
>>> -static DEFINE_IDR(pasid_idr);
>>> -
>>> -int intel_pasid_alloc_id(void *ptr, int start, int end, gfp_t gfp)
>>> -{
>>> - int ret, min, max;
>>> -
>>> - min = max_t(int, start, PASID_MIN);
>>> - max = min_t(int, end, intel_pasid_max_id);
>>> -
>>> - WARN_ON(in_interrupt());
>>> - idr_preload(gfp);
>>> - spin_lock(&pasid_lock);
>>> - ret = idr_alloc(&pasid_idr, ptr, min, max, GFP_ATOMIC);
>>> - spin_unlock(&pasid_lock);
>>> - idr_preload_end();
>>> -
>>> - return ret;
>>> -}
>>> -
>>> -void intel_pasid_free_id(int pasid)
>>> -{
>>> - spin_lock(&pasid_lock);
>>> - idr_remove(&pasid_idr, pasid);
>>> - spin_unlock(&pasid_lock);
>>> -}
>>> -
>>> -void *intel_pasid_lookup_id(int pasid)
>>> -{
>>> - void *p;
>>> -
>>> - spin_lock(&pasid_lock);
>>> - p = idr_find(&pasid_idr, pasid);
>>> - spin_unlock(&pasid_lock);
>>> -
>>> - return p;
>>> -}
>>>
>>> int vcmd_alloc_pasid(struct intel_iommu *iommu, unsigned int
>>> *pasid) {
>>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>>> index 9b159132405d..a9a7f85a09bc 100644
>>> --- a/drivers/iommu/intel-svm.c
>>> +++ b/drivers/iommu/intel-svm.c
>>> @@ -17,6 +17,7 @@
>>> #include <linux/dmar.h>
>>> #include <linux/interrupt.h>
>>> #include <linux/mm_types.h>
>>> +#include <linux/ioasid.h>
>>> #include <asm/page.h>
>>>
>>> #include "intel-pasid.h"
>>> @@ -318,16 +319,15 @@ int intel_svm_bind_mm(struct device *dev, int
>>> *pasid, int flags, struct svm_dev_ if (pasid_max >
>>> intel_pasid_max_id) pasid_max = intel_pasid_max_id;
>>>
>>> - /* Do not use PASID 0 in caching mode (virtualised
>>> IOMMU) */
>>> - ret = intel_pasid_alloc_id(svm,
>>> - !!cap_caching_mode(iommu->cap),
>>> - pasid_max - 1,
>>> GFP_KERNEL);
>>> - if (ret < 0) {
>>> + /* Do not use PASID 0, reserved for RID to PASID */
>>> + svm->pasid = ioasid_alloc(NULL, PASID_MIN,
>>> + pasid_max - 1, svm);
>> pasid_max -1 is inclusive. whereas max param in intel_pasid_alloc_id()
>> is exclusive right? If you fixed an issue, you can mention it in the
>> commit message.
> yes, i should mention that. intel_pasid_alloc_id() uses IDR which is
> end exclusive. ioasid uses xarray, which is inclusive.
>>> + if (svm->pasid == INVALID_IOASID) {
>>> kfree(svm);>
>>> kfree(sdev);
>>> + ret = ENOSPC;
>> -ENOSPC.
>> Nit: in 2/11 vcmd_alloc_pasid returned -ENOMEM
> yes, it should be -ENOSPC as well.
>
>>> goto out;
>>> }
>>> - svm->pasid = ret;
>>> svm->notifier.ops = &intel_mmuops;
>>> svm->mm = mm;
>>> svm->flags = flags;
>>> @@ -337,7 +337,7 @@ int intel_svm_bind_mm(struct device *dev, int
>>> *pasid, int flags, struct svm_dev_ if (mm) {
>>> ret =
>>> mmu_notifier_register(&svm->notifier, mm); if (ret) {
>>> - intel_pasid_free_id(svm->pasid);
>>> + ioasid_free(svm->pasid);
>>> kfree(svm);
>>> kfree(sdev);
>>> goto out;
>>> @@ -353,7 +353,7 @@ int intel_svm_bind_mm(struct device *dev, int
>>> *pasid, int flags, struct svm_dev_ if (ret) {
>>> if (mm)
>>> mmu_notifier_unregister(&svm->notifier,
>>> mm);
>>> - intel_pasid_free_id(svm->pasid);
>>> + ioasid_free(svm->pasid);
>>> kfree(svm);
>>> kfree(sdev);
>>> goto out;
>>> @@ -401,7 +401,12 @@ int intel_svm_unbind_mm(struct device *dev,
>>> int pasid) if (!iommu)
>>> goto out;
>>>
>>> - svm = intel_pasid_lookup_id(pasid);
>>> + svm = ioasid_find(NULL, pasid, NULL);
>>> + if (IS_ERR(svm)) {
>>> + ret = PTR_ERR(svm);
>>> + goto out;
>>> + }
>>> +
>>> if (!svm)
>>> goto out;
>>>
>>> @@ -423,7 +428,9 @@ int intel_svm_unbind_mm(struct device *dev, int
>>> pasid) kfree_rcu(sdev, rcu);
>>>
>>> if (list_empty(&svm->devs)) {
>>> -
>>> intel_pasid_free_id(svm->pasid);
>>> + /* Clear private data so
>>> that free pass check */> +
>>> ioasid_set_data(svm->pasid, NULL);
>> I don't get the above comment. Why is it needed?
> Having private data associated with an IOASID is an indicator that this
> IOASID is busy. So we have to clear it to signal it is free.
> Actually, I am planning to introduce a refcount per IOASID since there
> will be multiple users of IOASID, e.g. IOMMU driver and KVM. When
> refcount == 0, we can free.
Ah OK I missed that. This is a specificity of the intel custom PASID
allocator, ie intel_ioasid_free and not a generic behavior.

Thanks

Eric
>
>>> + ioasid_free(svm->pasid);
>>> if (svm->mm)
>>> mmu_notifier_unregister(&svm->notifier,
>>> svm->mm);
>>> @@ -458,10 +465,11 @@ int intel_svm_is_pasid_valid(struct device
>>> *dev, int pasid) if (!iommu)
>>> goto out;
>>>
>>> - svm = intel_pasid_lookup_id(pasid);
>>> - if (!svm)
>>> + svm = ioasid_find(NULL, pasid, NULL);
>>> + if (IS_ERR(svm)) {
>>> + ret = PTR_ERR(svm);
>>> goto out;
>>> -
>>> + }
>>> /* init_mm is used in this case */
>>> if (!svm->mm)
>>> ret = 1;
>>> @@ -568,13 +576,12 @@ static irqreturn_t prq_event_thread(int irq,
>>> void *d)
>>> if (!svm || svm->pasid != req->pasid) {
>>> rcu_read_lock();
>>> - svm = intel_pasid_lookup_id(req->pasid);
>>> + svm = ioasid_find(NULL, req->pasid, NULL);
>>> /* It *can't* go away, because the driver
>>> is not permitted
>>> * to unbind the mm while any page faults
>>> are outstanding.
>>> * So we only need RCU to protect the
>>> internal idr code. */ rcu_read_unlock();
>>> -
>>> - if (!svm) {
>>> + if (IS_ERR(svm) || !svm) {
>>> pr_err("%s: Page request for
>>> invalid PASID %d: %08llx %08llx\n", iommu->name, req->pasid,
>>> ((unsigned long long *)req)[0], ((unsigned long long *)req)[1]);
>>>
>> Thanks
>>
>> Eric
>>
>
> [Jacob Pan]
>

2019-11-12 10:29:58

by Eric Auger

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

Hi Jacob,

On 10/24/19 9:55 PM, Jacob Pan wrote:
> When Shared Virtual Address (SVA) is enabled for a guest OS via
> vIOMMU, we need to provide invalidation support at IOMMU API and driver
> level. This patch adds Intel VT-d specific function to implement
> iommu passdown invalidate API for shared virtual address.
>
> The use case is for supporting caching structure invalidation
> of assigned SVM capable devices. Emulated IOMMU exposes queue
> invalidation capability and passes down all descriptors from the guest
> to the physical IOMMU.
>
> The assumption is that guest to host device ID mapping should be
> resolved prior to calling IOMMU driver. Based on the device handle,
> host IOMMU driver can replace certain fields before submit to the
> invalidation queue.
>
> Signed-off-by: Jacob Pan <[email protected]>
> Signed-off-by: Ashok Raj <[email protected]>
> Signed-off-by: Liu, Yi L <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 170 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 170 insertions(+)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5fab32fbc4b4..a73e76d6457a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -5491,6 +5491,175 @@ static void intel_iommu_aux_detach_device(struct iommu_domain *domain,
> aux_domain_remove_dev(to_dmar_domain(domain), dev);
> }
>
> +/*
> + * 2D array for converting and sanitizing IOMMU generic TLB granularity to
> + * VT-d granularity. Invalidation is typically included in the unmap operation
> + * as a result of DMA or VFIO unmap. However, for assigned device where guest
> + * could own the first level page tables without being shadowed by QEMU. In
above sentence needs to be rephrased.
> + * this case there is no pass down unmap to the host IOMMU as a result of unmap
> + * in the guest. Only invalidations are trapped and passed down.
> + * In all cases, only first level TLB invalidation (request with PASID) can be
> + * passed down, therefore we do not include IOTLB granularity for request
> + * without PASID (second level).
> + *
> + * For an example, to find the VT-d granularity encoding for IOTLB
for example
> + * type and page selective granularity within PASID:
> + * X: indexed by iommu cache type
> + * Y: indexed by enum iommu_inv_granularity
> + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> + *
> + * Granu_map array indicates validity of the table. 1: valid, 0: invalid
> + *
> + */
> +const static int inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /* PASID based IOTLB, support PASID selective and page selective */
I would rather use the generic terminology, ie. IOTLB invalidation
supports PASID and ADDR granularity
> + {0, 1, 1},> + /* PASID based dev TLBs, only support all PASIDs or single PASID */
Device IOLTB invalidation supports DOMAIN and PASID granularities
> + {1, 1, 0},
> + /* PASID cache */
PASID cache invalidation support DOMAIN and PASID granularity
> + {1, 1, 0}
> +};
> +
> +const static u64 inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> + /* PASID based IOTLB */
> + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> + /* PASID based dev TLBs */
> + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> + /* PASID cache */
> + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> +};
> +
> +static inline int to_vtd_granularity(int type, int granu, u64 *vtd_granu)
nit: this looks a bit weird to me to manipulate an u64 here. Why not use
a int
> +{
> + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >= IOMMU_INV_GRANU_NR ||
> + !inv_type_granu_map[type][granu])
> + return -EINVAL;
> +
> + *vtd_granu = inv_type_granu_table[type][granu];> +
> + return 0;
> +}
> +
> +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> +{
> + u64 nr_pages = (granu_size * nr_granules) >> VTD_PAGE_SHIFT;
> +
> + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9 for 2MB, etc.
> + * IOMMU cache invalidate API passes granu_size in bytes, and number of
> + * granu size in contiguous memory.
> + */
> + return order_base_2(nr_pages);
> +}
> +
> +#ifdef CONFIG_INTEL_IOMMU_SVM
> +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> + struct device *dev, struct iommu_cache_invalidate_info *inv_info)
> +{
> + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> + struct device_domain_info *info;
> + struct intel_iommu *iommu;
> + unsigned long flags;
> + int cache_type;
> + u8 bus, devfn;
> + u16 did, sid;
> + int ret = 0;
> + u64 size;
> +
> + if (!inv_info || !dmar_domain ||
> + inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + if (!dev || !dev_is_pci(dev))
> + return -ENODEV;
> +
> + iommu = device_to_iommu(dev, &bus, &devfn);
> + if (!iommu)
> + return -ENODEV;
> +
> + spin_lock_irqsave(&device_domain_lock, flags);
> + spin_lock(&iommu->lock);
> + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus, devfn);
> + if (!info) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> + did = dmar_domain->iommu_did[iommu->seq_id];
> + sid = PCI_DEVID(bus, devfn);
> + size = to_vtd_size(inv_info->addr_info.granule_size, inv_info->addr_info.nb_granules);
> +
> + for_each_set_bit(cache_type, (unsigned long *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
> + u64 granu = 0;
> + u64 pasid = 0;
> +
> + ret = to_vtd_granularity(cache_type, inv_info->granularity, &granu);
> + if (ret) {
> + pr_err("Invalid cache type and granu combination %d/%d\n", cache_type,
> + inv_info->granularity);
> + break;
> + }
> +
> + /* PASID is stored in different locations based on granularity */
> + if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
> + pasid = inv_info->pasid_info.pasid;
you need to check IOMMU_INV_ADDR_FLAGS_PASID in flags
> + else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
> + pasid = inv_info->addr_info.pasid;
same
> + else {
> + pr_err("Cannot find PASID for given cache type and granularity\n");
> + break;
> + }
> +
> + switch (BIT(cache_type)) {
> + case IOMMU_CACHE_INV_TYPE_IOTLB:
> + if (size && (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> + pr_err("Address out of range, 0x%llx, size order %llu\n",
don't you mean address not correctly aligned?
> + inv_info->addr_info.addr, size);
> + ret = -ERANGE;
> + goto out_unlock;
> + }
> +
> + qi_flush_piotlb(iommu, did, mm_to_dma_pfn(inv_info->addr_info.addr),
> + pasid, size, granu, inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF);
> +
> + /*
> + * Always flush device IOTLB if ATS is enabled since guest
> + * vIOMMU exposes CM = 1, no device IOTLB flush will be passed
> + * down.
> + */
> + if (info->ats_enabled) {
> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> + pasid, info->ats_qdep,
> + inv_info->addr_info.addr, size,
> + granu);
> + }
> + break;
> + case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
> + if (info->ats_enabled) {
> + qi_flush_dev_piotlb(iommu, sid, info->pfsid,
> + inv_info->addr_info.pasid, info->ats_qdep,
> + inv_info->addr_info.addr, size,
> + granu);
> + } else
> + pr_warn("Passdown device IOTLB flush w/o ATS!\n");
> +
> + break;
> + case IOMMU_CACHE_INV_TYPE_PASID:
> + qi_flush_pasid_cache(iommu, did, granu, inv_info->pasid_info.pasid);
> +
> + break;
> + default:
> + dev_err(dev, "Unsupported IOMMU invalidation type %d\n",
> + cache_type);
> + ret = -EINVAL;
> + }
> + }
> +out_unlock:
> + spin_unlock(&iommu->lock);
> + spin_unlock_irqrestore(&device_domain_lock, flags);
> +
> + return ret;
> +}
> +#endif
> +
> static int intel_iommu_map(struct iommu_domain *domain,
> unsigned long iova, phys_addr_t hpa,
> size_t size, int iommu_prot)
> @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
> .is_attach_deferred = intel_iommu_is_attach_deferred,
> .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
> #ifdef CONFIG_INTEL_IOMMU_SVM
> + .cache_invalidate = intel_iommu_sva_invalidate,
> .sva_bind_gpasid = intel_svm_bind_gpasid,
> .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> #endif
>
Thanks

Eric

2020-02-15 01:14:16

by Jacob Pan

[permalink] [raw]
Subject: Re: [PATCH v7 11/11] iommu/vt-d: Add svm/sva invalidate function

Hi Eric,

Thanks for the review, I somehow missed it, my apologies. See comments
below.

On Tue, 12 Nov 2019 11:28:37 +0100
Auger Eric <[email protected]> wrote:

> Hi Jacob,
>
> On 10/24/19 9:55 PM, Jacob Pan wrote:
> > When Shared Virtual Address (SVA) is enabled for a guest OS via
> > vIOMMU, we need to provide invalidation support at IOMMU API and
> > driver level. This patch adds Intel VT-d specific function to
> > implement iommu passdown invalidate API for shared virtual address.
> >
> > The use case is for supporting caching structure invalidation
> > of assigned SVM capable devices. Emulated IOMMU exposes queue
> > invalidation capability and passes down all descriptors from the
> > guest to the physical IOMMU.
> >
> > The assumption is that guest to host device ID mapping should be
> > resolved prior to calling IOMMU driver. Based on the device handle,
> > host IOMMU driver can replace certain fields before submit to the
> > invalidation queue.
> >
> > Signed-off-by: Jacob Pan <[email protected]>
> > Signed-off-by: Ashok Raj <[email protected]>
> > Signed-off-by: Liu, Yi L <[email protected]>
> > ---
> > drivers/iommu/intel-iommu.c | 170
> > ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 170
> > insertions(+)
> >
> > diff --git a/drivers/iommu/intel-iommu.c
> > b/drivers/iommu/intel-iommu.c index 5fab32fbc4b4..a73e76d6457a
> > 100644 --- a/drivers/iommu/intel-iommu.c
> > +++ b/drivers/iommu/intel-iommu.c
> > @@ -5491,6 +5491,175 @@ static void
> > intel_iommu_aux_detach_device(struct iommu_domain *domain,
> > aux_domain_remove_dev(to_dmar_domain(domain), dev); }
> >
> > +/*
> > + * 2D array for converting and sanitizing IOMMU generic TLB
> > granularity to
> > + * VT-d granularity. Invalidation is typically included in the
> > unmap operation
> > + * as a result of DMA or VFIO unmap. However, for assigned device
> > where guest
> > + * could own the first level page tables without being shadowed by
> > QEMU. In
> above sentence needs to be rephrased.
Yes, how about this:
/*
* 2D array for converting and sanitizing IOMMU generic TLB granularity
to
* VT-d granularity. Invalidation is typically included in the unmap
operation
* as a result of DMA or VFIO unmap. However, for assigned devices guest
* owns the first level page tables. Invalidations of translation
caches in the
* guest are trapped and passed down to the host.
*
* vIOMMU in the guest will only expose first level page tables,
therefore
* we do not include IOTLB granularity for request without PASID
(second level). *
* For example, to find the VT-d granularity encoding for IOTLB

> > + * this case there is no pass down unmap to the host IOMMU as a
> > result of unmap
> > + * in the guest. Only invalidations are trapped and passed down.
> > + * In all cases, only first level TLB invalidation (request with
> > PASID) can be
> > + * passed down, therefore we do not include IOTLB granularity for
> > request
> > + * without PASID (second level).
> > + *
> > + * For an example, to find the VT-d granularity encoding for
> > IOTLB
> for example
sounds better.

> > + * type and page selective granularity within PASID:
> > + * X: indexed by iommu cache type
> > + * Y: indexed by enum iommu_inv_granularity
> > + * [IOMMU_CACHE_INV_TYPE_IOTLB][IOMMU_INV_GRANU_ADDR]
> > + *
> > + * Granu_map array indicates validity of the table. 1: valid, 0:
> > invalid
> > + *
> > + */
> > +const static int
> > inv_type_granu_map[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] = {
> > + /* PASID based IOTLB, support PASID selective and page
> > selective */
> I would rather use the generic terminology, ie. IOTLB invalidation
> supports PASID and ADDR granularity
Understood. My choice of terminology is based on VT-d spec and this is
VT-d only code. Perhaps add the generic terms by the side? i.e.
/*
* PASID based IOTLB invalidation: PASID selective (per PASID),
* page selective (address granularity)
*/

> > + {0, 1, 1},> + /* PASID based dev TLBs, only support
> > all PASIDs or single PASID */
> Device IOLTB invalidation supports DOMAIN and PASID granularities
> > + {1, 1, 0},
> > + /* PASID cache */
> PASID cache invalidation support DOMAIN and PASID granularity
> > + {1, 1, 0}
> > +};
> > +
> > +const static u64
> > inv_type_granu_table[IOMMU_CACHE_INV_TYPE_NR][IOMMU_INV_GRANU_NR] =
> > {
> > + /* PASID based IOTLB */
> > + {0, QI_GRAN_NONG_PASID, QI_GRAN_PSI_PASID},
> > + /* PASID based dev TLBs */
> > + {QI_DEV_IOTLB_GRAN_ALL, QI_DEV_IOTLB_GRAN_PASID_SEL, 0},
> > + /* PASID cache */
> > + {QI_PC_ALL_PASIDS, QI_PC_PASID_SEL, 0},
> > +};
> > +
> > +static inline int to_vtd_granularity(int type, int granu, u64
> > *vtd_granu)
> nit: this looks a bit weird to me to manipulate an u64 here. Why not
> use a int
Yes, should be int.
> > +{
> > + if (type >= IOMMU_CACHE_INV_TYPE_NR || granu >=
> > IOMMU_INV_GRANU_NR ||
> > + !inv_type_granu_map[type][granu])
> > + return -EINVAL;
> > +
> > + *vtd_granu = inv_type_granu_table[type][granu];> +
> > + return 0;
> > +}
> > +
> > +static inline u64 to_vtd_size(u64 granu_size, u64 nr_granules)
> > +{
> > + u64 nr_pages = (granu_size * nr_granules) >>
> > VTD_PAGE_SHIFT; +
> > + /* VT-d size is encoded as 2^size of 4K pages, 0 for 4k, 9
> > for 2MB, etc.
> > + * IOMMU cache invalidate API passes granu_size in bytes,
> > and number of
> > + * granu size in contiguous memory.
> > + */
> > + return order_base_2(nr_pages);
> > +}
> > +
> > +#ifdef CONFIG_INTEL_IOMMU_SVM
> > +static int intel_iommu_sva_invalidate(struct iommu_domain *domain,
> > + struct device *dev, struct
> > iommu_cache_invalidate_info *inv_info) +{
> > + struct dmar_domain *dmar_domain = to_dmar_domain(domain);
> > + struct device_domain_info *info;
> > + struct intel_iommu *iommu;
> > + unsigned long flags;
> > + int cache_type;
> > + u8 bus, devfn;
> > + u16 did, sid;
> > + int ret = 0;
> > + u64 size;
> > +
> > + if (!inv_info || !dmar_domain ||
> > + inv_info->version !=
> > IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > + return -EINVAL;
> > +
> > + if (!dev || !dev_is_pci(dev))
> > + return -ENODEV;
> > +
> > + iommu = device_to_iommu(dev, &bus, &devfn);
> > + if (!iommu)
> > + return -ENODEV;
> > +
> > + spin_lock_irqsave(&device_domain_lock, flags);
> > + spin_lock(&iommu->lock);
> > + info = iommu_support_dev_iotlb(dmar_domain, iommu, bus,
> > devfn);
> > + if (!info) {
> > + ret = -EINVAL;
> > + goto out_unlock;
> > + }
> > + did = dmar_domain->iommu_did[iommu->seq_id];
> > + sid = PCI_DEVID(bus, devfn);
> > + size = to_vtd_size(inv_info->addr_info.granule_size,
> > inv_info->addr_info.nb_granules); +
> > + for_each_set_bit(cache_type, (unsigned long
> > *)&inv_info->cache, IOMMU_CACHE_INV_TYPE_NR) {
> > + u64 granu = 0;
> > + u64 pasid = 0;
> > +
> > + ret = to_vtd_granularity(cache_type,
> > inv_info->granularity, &granu);
> > + if (ret) {
> > + pr_err("Invalid cache type and granu
> > combination %d/%d\n", cache_type,
> > + inv_info->granularity);
> > + break;
> > + }
> > +
> > + /* PASID is stored in different locations based on
> > granularity */
> > + if (inv_info->granularity == IOMMU_INV_GRANU_PASID)
> > + pasid = inv_info->pasid_info.pasid;
> you need to check IOMMU_INV_ADDR_FLAGS_PASID in flags
You mean to check IOMMU_INV_PASID_FLAGS_PASID?
You are right we need to check this flag to make sure the PASID value
is valid. i.e.

if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID)
pasid = inv_info->pasid_info.pasid;


> > + else if (inv_info->granularity ==
> > IOMMU_INV_GRANU_ADDR)
> > + pasid = inv_info->addr_info.pasid;
> same
Ditto.
else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_PASID)
pasid = inv_info->addr_info.pasid;

> > + else {
> > + pr_err("Cannot find PASID for given cache
> > type and granularity\n");
> > + break;
> > + }
> > +
> > + switch (BIT(cache_type)) {
> > + case IOMMU_CACHE_INV_TYPE_IOTLB:
> > + if (size && (inv_info->addr_info.addr &
> > ((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
> > + pr_err("Address out of range,
> > 0x%llx, size order %llu\n",
> don't you mean address not correctly aligned?
> > + inv_info->addr_info.addr,
> > size);
> > + ret = -ERANGE;
> > + goto out_unlock;
> > + }
> > +
> > + qi_flush_piotlb(iommu, did,
> > mm_to_dma_pfn(inv_info->addr_info.addr),
> > + pasid, size, granu,
> > inv_info->addr_info.flags & IOMMU_INV_ADDR_FLAGS_LEAF); +
> > + /*
> > + * Always flush device IOTLB if ATS is
> > enabled since guest
> > + * vIOMMU exposes CM = 1, no device IOTLB
> > flush will be passed
> > + * down.
> > + */
> > + if (info->ats_enabled) {
> > + qi_flush_dev_piotlb(iommu, sid,
> > info->pfsid,
> > + pasid,
> > info->ats_qdep,
> > +
> > inv_info->addr_info.addr, size,
> > + granu);
> > + }
> > + break;
> > + case IOMMU_CACHE_INV_TYPE_DEV_IOTLB:
> > + if (info->ats_enabled) {
> > + qi_flush_dev_piotlb(iommu, sid,
> > info->pfsid,
> > +
> > inv_info->addr_info.pasid, info->ats_qdep,
> > +
> > inv_info->addr_info.addr, size,
> > + granu);
> > + } else
> > + pr_warn("Passdown device IOTLB
> > flush w/o ATS!\n"); +
> > + break;
> > + case IOMMU_CACHE_INV_TYPE_PASID:
> > + qi_flush_pasid_cache(iommu, did, granu,
> > inv_info->pasid_info.pasid); +
> > + break;
> > + default:
> > + dev_err(dev, "Unsupported IOMMU
> > invalidation type %d\n",
> > + cache_type);
> > + ret = -EINVAL;
> > + }
> > + }
> > +out_unlock:
> > + spin_unlock(&iommu->lock);
> > + spin_unlock_irqrestore(&device_domain_lock, flags);
> > +
> > + return ret;
> > +}
> > +#endif
> > +
> > static int intel_iommu_map(struct iommu_domain *domain,
> > unsigned long iova, phys_addr_t hpa,
> > size_t size, int iommu_prot)
> > @@ -6027,6 +6196,7 @@ const struct iommu_ops intel_iommu_ops = {
> > .is_attach_deferred =
> > intel_iommu_is_attach_deferred, .pgsize_bitmap =
> > INTEL_IOMMU_PGSIZES, #ifdef CONFIG_INTEL_IOMMU_SVM
> > + .cache_invalidate = intel_iommu_sva_invalidate,
> > .sva_bind_gpasid = intel_svm_bind_gpasid,
> > .sva_unbind_gpasid = intel_svm_unbind_gpasid,
> > #endif
> >
> Thanks
>
> Eric
>
Thanks,

Jacob