Hi,
Intel VT-d rev3.0 [1] introduces a new translation mode called
'scalable mode', which enables PASID-granular translations for
first level, second level, nested and pass-through modes. The
VT-d scalable mode is the key ingredient to enable Scalable I/O
Virtualization (Scalable IOV) [2] [3], which allows sharing a
device at the minimal possible granularity (ADI - Assignable Device
Interface). It also includes all the capabilities required to
enable Shared Virtual Addressing (SVA). As a result, the previous
Extended Context (ECS) mode is deprecated (no production hardware
ever implemented ECS).
Each scalable mode PASID table entry is 64 bytes in length, with
fields that point to the first level page table and the second
level page table. The PGTT (PASID Granular Translation Type) field
is used by hardware to determine the translation type.
     A Scalable Mode          .-------------.
       PASID Entry          .-|             |
   .------------------.   .-| | 1st Level   |
  7|                  |   | | | Page Table  |
   .------------------.   | | |             |
  6|                  |   | | |             |
   '------------------'   | | '-------------'
  5|                  |   | '-------------'
   '------------------'   '-------------'
  4|                  |    ^
   '------------------'   /
  3|                  |  /       .-------------.
   .----.-------.-----. /      .-|             |
  2|    | FLPTR |     |/     .-| | 2nd Level   |
   .----'-------'-----.      | | | Page Table  |
  1|                  |      | | |             |
   .-.-------..------..      | | |             |
  0| | SLPTR || PGTT ||----> | | '-------------'
   '-'-------''------''      | '-------------'
   6              |   0      '-------------'
   3              v
   .------------------------------------.
   | PASID Granular Translation Type    |
   |                                    |
   | 001b: 1st level translation only   |
   | 010b: 2nd level translation only   |
   | 011b: Nested translation           |
   | 100b: Pass through                 |
   '------------------------------------'
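As a rough illustration of the PGTT encoding, the following userspace C sketch extracts the field from the low 64 bits of a PASID entry and maps it to a name. The bit position (bits 6~8) follows the pasid_set_translation_type() helper added later in this series; the names pasid_entry_pgtt() and pgtt_name() are hypothetical and are not part of the driver.

```c
#include <stdint.h>

/* Hypothetical helper: extract PGTT (bits 6~8) from qword 0 of a PASID entry. */
static inline unsigned int pasid_entry_pgtt(uint64_t val0)
{
	return (unsigned int)((val0 >> 6) & 0x7);
}

/* Hypothetical helper: map a PGTT encoding to a human-readable name. */
static inline const char *pgtt_name(unsigned int pgtt)
{
	switch (pgtt) {
	case 1:
		return "first level only";
	case 2:
		return "second level only";
	case 3:
		return "nested";
	case 4:
		return "pass through";
	default:
		return "reserved";
	}
}
```

Encodings not listed in the table are reported as "reserved" here; that label is a choice of this sketch.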
This patch series adds scalable mode support to the Intel
IOMMU driver. It makes all the Intel IOMMU features work
in scalable mode. The changes are all confined to the
Intel IOMMU driver, as this is purely an internal format change.
This patch series depends on a patch set titled "iommu/vt-d:
Improve PASID id and table management", posted at [4], which
implements a global PASID namespace and per-device PASID table
APIs.
References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
[4] https://lkml.org/lkml/2018/7/14/69
Best regards,
Lu Baolu
Lu Baolu (10):
iommu/vt-d: Enumerate the scalable mode capability
iommu/vt-d: Manage scalable mode PASID tables
iommu/vt-d: Move page table helpers into header
iommu/vt-d: Add second level page table interface
iommu/vt-d: Setup pasid entry for RID2PASID support
iommu/vt-d: Pass pasid table to context mapping
iommu/vt-d: Setup context and enable RID2PASID support
iommu/vt-d: Add first level page table interface
iommu/vt-d: Shared virtual address in scalable mode
iommu/vt-d: Remove deferred invalidation
drivers/iommu/intel-iommu.c | 252 +++++++++++++++++++-----------------
drivers/iommu/intel-pasid.c | 295 ++++++++++++++++++++++++++++++++++++++++--
drivers/iommu/intel-pasid.h | 20 ++-
drivers/iommu/intel-svm.c | 74 +----------
include/linux/dma_remapping.h | 9 +-
include/linux/intel-iommu.h | 54 ++++++--
6 files changed, 485 insertions(+), 219 deletions(-)
--
2.7.4
The Intel VT-d spec rev3.0 introduces a new translation
mode called scalable mode, which enables PASID-granular
translations for first level, second level, nested and
pass-through modes. At the same time, the previous
Extended Context (ECS) mode is deprecated (no production
hardware ever implemented ECS).
This patch adds enumeration for Scalable Mode and removes
the deprecated ECS enumeration. It also provides a boot time
option ("sm_off") to disable scalable mode, should a
chicken bit be required.
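The enumeration logic can be sketched in plain C as follows. This is only an illustration: the bit positions are the ones used by the ecap_smts() and ecap_pasid() macros in this patch, while sm_usable(), pasid_usable() and the sm_allowed flag are hypothetical stand-ins for sm_supported(), pasid_supported() and the intel_iommu_sm variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as used by ecap_smts()/ecap_pasid() in this patch. */
#define ECAP_SMTS(e)	(((e) >> 43) & 0x1)
#define ECAP_PASID(e)	(((e) >> 40) & 0x1)

/*
 * Hypothetical stand-in for sm_supported(): scalable mode is used only
 * if the boot option did not disable it (sm_allowed) and the hardware
 * advertises it in the extended capability register.
 */
static inline bool sm_usable(uint64_t ecap, bool sm_allowed)
{
	return sm_allowed && ECAP_SMTS(ecap);
}

/* Hypothetical stand-in for pasid_supported(): PASID requires scalable mode. */
static inline bool pasid_usable(uint64_t ecap, bool sm_allowed)
{
	return sm_usable(ecap, sm_allowed) && ECAP_PASID(ecap);
}
```

With this shape, passing "sm_off" (modelled as sm_allowed == false) disables PASID support as well, because the PASID check is layered on top of the scalable mode check.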
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 35 ++++++++++++++++-------------------
include/linux/intel-iommu.h | 1 +
2 files changed, 17 insertions(+), 19 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index fed67c6..0a7362b 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -429,15 +429,15 @@ static int dmar_map_gfx = 1;
static int dmar_forcedac;
static int intel_iommu_strict;
static int intel_iommu_superpage = 1;
-static int intel_iommu_ecs = 1;
+static int intel_iommu_sm = 1;
static int iommu_identity_mapping;
#define IDENTMAP_ALL 1
#define IDENTMAP_GFX 2
#define IDENTMAP_AZALIA 4
-#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap))
-#define pasid_enabled(iommu) (ecs_enabled(iommu) && ecap_pasid(iommu->ecap))
+#define sm_supported(iu) (intel_iommu_sm && ecap_smts((iu)->ecap))
+#define pasid_supported(iu) (sm_supported(iu) && ecap_pasid((iu)->ecap))
int intel_iommu_gfx_mapped;
EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -517,10 +517,9 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
- } else if (!strncmp(str, "ecs_off", 7)) {
- printk(KERN_INFO
- "Intel-IOMMU: disable extended context table support\n");
- intel_iommu_ecs = 0;
+ } else if (!strncmp(str, "sm_off", 6)) {
+ pr_info("Intel-IOMMU: disable scalable mode support\n");
+ intel_iommu_sm = 0;
} else if (!strncmp(str, "tboot_noforce", 13)) {
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
@@ -767,7 +766,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
u64 *entry;
entry = &root->lo;
- if (ecs_enabled(iommu)) {
+ if (sm_supported(iommu)) {
if (devfn >= 0x80) {
devfn -= 0x80;
entry = &root->hi;
@@ -909,7 +908,7 @@ static void free_context_table(struct intel_iommu *iommu)
if (context)
free_pgtable_page(context);
- if (!ecs_enabled(iommu))
+ if (!sm_supported(iommu))
continue;
context = iommu_context_addr(iommu, i, 0x80, 0);
@@ -1261,8 +1260,6 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
unsigned long flag;
addr = virt_to_phys(iommu->root_entry);
- if (ecs_enabled(iommu))
- addr |= DMA_RTADDR_RTT;
raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1739,7 +1736,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
free_context_table(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu)) {
+ if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
intel_svm_exit(iommu);
@@ -2418,8 +2415,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
dmar_find_matched_atsr_unit(pdev))
info->ats_supported = 1;
- if (ecs_enabled(iommu)) {
- if (pasid_enabled(iommu)) {
+ if (sm_supported(iommu)) {
+ if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
if (features >= 0)
info->pasid_supported = features | 1;
@@ -3231,7 +3228,7 @@ static int __init init_dmars(void)
* We need to ensure the system pasid table is no bigger
* than the smallest supported.
*/
- if (pasid_enabled(iommu)) {
+ if (pasid_supported(iommu)) {
u32 temp = 2 << ecap_pss(iommu->ecap);
intel_pasid_max_id = min_t(u32, temp,
@@ -3292,7 +3289,7 @@ static int __init init_dmars(void)
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu))
+ if (pasid_supported(iommu))
intel_svm_init(iommu);
#endif
}
@@ -3396,7 +3393,7 @@ static int __init init_dmars(void)
iommu_flush_write_buffer(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu) && ecap_prs(iommu->ecap)) {
+ if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
ret = intel_svm_enable_prq(iommu);
if (ret)
goto free_iommu;
@@ -4300,7 +4297,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
goto out;
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu))
+ if (pasid_supported(iommu))
intel_svm_init(iommu);
#endif
@@ -4317,7 +4314,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
iommu_flush_write_buffer(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu) && ecap_prs(iommu->ecap)) {
+ if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
ret = intel_svm_enable_prq(iommu);
if (ret)
goto disable_iommu;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 398defb..4124cd9 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -115,6 +115,7 @@
* Extended Capability Register
*/
+#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
#define ecap_pasid(e) ((e >> 40) & 0x1)
#define ecap_pss(e) ((e >> 35) & 0x1f)
--
2.7.4
Move the page table helpers into the header so that they can
also be used in other source files.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 43 -------------------------------------------
include/linux/intel-iommu.h | 43 +++++++++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 43 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index f9036c8..a139a45 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,49 +315,6 @@ static inline void context_clear_entry(struct context_entry *context)
}
/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-10: available
- * 11: snoop behavior
- * 12-63: Host physcial address
- */
-struct dma_pte {
- u64 val;
-};
-
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
- pte->val = 0;
-}
-
-static inline u64 dma_pte_addr(struct dma_pte *pte)
-{
-#ifdef CONFIG_64BIT
- return pte->val & VTD_PAGE_MASK;
-#else
- /* Must have a full atomic 64-bit read */
- return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-#endif
-}
-
-static inline bool dma_pte_present(struct dma_pte *pte)
-{
- return (pte->val & 3) != 0;
-}
-
-static inline bool dma_pte_superpage(struct dma_pte *pte)
-{
- return (pte->val & DMA_PTE_LARGE_PAGE);
-}
-
-static inline int first_pte_in_page(struct dma_pte *pte)
-{
- return !((unsigned long)pte & ~VTD_PAGE_MASK);
-}
-
-/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
* 2. It maps to each iommu if successful.
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4124cd9..7818b1c 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
}
+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-10: available
+ * 11: snoop behavior
+ * 12-63: Host physcial address
+ */
+struct dma_pte {
+ u64 val;
+};
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+ pte->val = 0;
+}
+
+static inline u64 dma_pte_addr(struct dma_pte *pte)
+{
+#ifdef CONFIG_64BIT
+ return pte->val & VTD_PAGE_MASK;
+#else
+ /* Must have a full atomic 64-bit read */
+ return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
+#endif
+}
+
+static inline bool dma_pte_present(struct dma_pte *pte)
+{
+ return (pte->val & 3) != 0;
+}
+
+static inline bool dma_pte_superpage(struct dma_pte *pte)
+{
+ return (pte->val & DMA_PTE_LARGE_PAGE);
+}
+
+static inline int first_pte_in_page(struct dma_pte *pte)
+{
+ return !((unsigned long)pte & ~VTD_PAGE_MASK);
+}
+
extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
--
2.7.4
This adds an interface to set up the structures for the second
level page table translation types. These are second level
only translation and pass through.
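The field setters in this patch all go through one mask-and-merge helper. A minimal userspace sketch of that pattern is shown below, assuming the field positions given in the comments of the diff (AW at bits 2~4, PGTT at bits 6~8, SLPTPTR at bits 12~63 of the first qword). The names set_bits64() and build_sl_qword0() are hypothetical. Unlike the driver helper, this sketch also masks the new bits, so a value that is not shifted into position cannot touch neighbouring fields.

```c
#include <stdint.h>

/* Replace the bits selected by mask; bits must already be shifted into place. */
static inline void set_bits64(uint64_t *ptr, uint64_t mask, uint64_t bits)
{
	*ptr = (*ptr & ~mask) | (bits & mask);
}

/*
 * Hypothetical helper: build qword 0 of a PASID entry for second level
 * only (PGTT = 2) or pass-through (PGTT = 4) translation.
 */
static inline uint64_t build_sl_qword0(uint64_t pgd_phys, unsigned int aw,
					int pass_through)
{
	uint64_t v = 0;

	if (!pass_through)
		set_bits64(&v, ~0xfffULL, pgd_phys);	/* SLPTPTR */
	set_bits64(&v, 0x7ULL << 2, (uint64_t)aw << 2);	/* AW */
	set_bits64(&v, 0x7ULL << 6,
		   (pass_through ? 4ULL : 2ULL) << 6);	/* PGTT */
	set_bits64(&v, 1ULL << 0, 1ULL);		/* Present */

	return v;
}
```

The domain ID, snoop and SRE fields live in the other qwords of the entry and are left out to keep the sketch short.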
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-pasid.c | 158 ++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 4 ++
include/linux/intel-iommu.h | 1 +
3 files changed, 163 insertions(+)
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index d6e90cd..da504576 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -9,6 +9,7 @@
#define pr_fmt(fmt) "DMAR: " fmt
+#include <linux/bitops.h>
#include <linux/dmar.h>
#include <linux/intel-iommu.h>
#include <linux/iommu.h>
@@ -291,3 +292,160 @@ void intel_pasid_clear_entry(struct device *dev, int pasid)
pasid_clear_entry(pe);
}
+
+static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
+{
+ u64 old;
+
+ old = READ_ONCE(*ptr);
+ WRITE_ONCE(*ptr, (old & ~mask) | bits);
+}
+
+/*
+ * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
+ * PASID entry.
+ */
+static inline void
+pasid_set_domain_id(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
+}
+
+/*
+ * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_address_root(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
+}
+
+/*
+ * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
+ * entry.
+ */
+static inline void
+pasid_set_address_width(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
+}
+
+/*
+ * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_translation_type(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
+}
+
+/*
+ * Enable fault processing by clearing the FPD(Fault Processing
+ * Disable) field (Bit 1) of a scalable mode PASID entry.
+ */
+static inline void pasid_set_fault_enable(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 1, 0);
+}
+
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
+/*
+ * Setup the P(Present) field (Bit 0) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_present(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 0, 1);
+}
+
+/*
+ * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
+{
+ pasid_set_bits(&pe->val[1], 1 << 23, value << 23);
+}
+
+static inline void
+flush_pasid_cache(struct intel_iommu *iommu, int did, int pasid)
+{
+ struct qi_desc desc;
+
+ desc.high = 0;
+ desc.low = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+
+ qi_submit_sync(&desc, iommu);
+}
+
+/*
+ * Set up the scalable mode pasid table entry for second only or
+ * passthrough translation type.
+ */
+void intel_pasid_setup_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid,
+ bool pass_through)
+{
+ u16 did = domain->iommu_did[iommu->seq_id];
+ struct pasid_entry *pte;
+ struct dma_pte *pgd;
+ u64 pgd_val;
+ int agaw;
+
+ /*
+ * Skip top levels of page tables for iommu which has less agaw
+ * than default. Unnecessary for PT mode.
+ */
+ pgd = domain->pgd;
+ if (!pass_through) {
+ for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd)) {
+ dev_err(dev, "Invalid domain page table\n");
+ return;
+ }
+ }
+ }
+ pgd_val = pass_through ? 0 : virt_to_phys(pgd);
+
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (!pte) {
+ dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+ return;
+ }
+
+ pasid_clear_entry(pte);
+ pasid_set_domain_id(pte, did);
+
+ if (!pass_through)
+ pasid_set_address_root(pte, pgd_val);
+
+ pasid_set_address_width(pte, iommu->agaw);
+ pasid_set_translation_type(pte, pass_through ? 4 : 2);
+ pasid_set_fault_enable(pte);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+ /*
+ * Since it is a second level only translation setup, we should
+ * set SRE bit as well (addresses are expected to be GPAs).
+ */
+ pasid_set_sre(pte);
+ pasid_set_present(pte);
+
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
+ if (cap_caching_mode(iommu->cap))
+ flush_pasid_cache(iommu, did, pasid);
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 12f480c..2fe40ff 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -43,5 +43,9 @@ struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
void intel_pasid_clear_entry(struct device *dev, int pasid);
+void intel_pasid_setup_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid,
+ bool pass_through);
#endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 7818b1c..a20ebca 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -115,6 +115,7 @@
* Extended Capability Register
*/
+#define ecap_smpwc(e) (((e) >> 48) & 0x1)
#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
#define ecap_pasid(e) ((e >> 40) & 0x1)
--
2.7.4
When scalable mode is enabled, there is no longer a second level
page translation pointer in the context entry (for DMA requests
without PASID). Instead, a new RID2PASID field is introduced in
the context entry. Software can choose any PASID value for
RID2PASID and then set up the translation in the corresponding
PASID entry. Upon receiving a DMA request without PASID, hardware
first looks at this RID2PASID field and then treats the request
as a request with the PASID value specified in the RID2PASID field.
Though software is allowed to use any PASID for RID2PASID,
we always use PASID 0 as a design decision.
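The selection rule described above can be modelled in a few lines of C. This is a conceptual sketch of the hardware behaviour, not driver code; effective_pasid() and its parameters are hypothetical names, and only the PASID_RID2PASID value comes from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PASID_RID2PASID 0x0	/* value chosen by this patch */

/*
 * Hypothetical model: return the PASID whose table entry is used to
 * translate a DMA request. Requests that carry a PASID use it directly;
 * requests without one fall back to the RID2PASID value programmed in
 * the context entry.
 */
static inline uint32_t effective_pasid(bool has_pasid, uint32_t pasid,
				       uint32_t rid2pasid)
{
	return has_pasid ? pasid : rid2pasid;
}
```

Because PASID_MIN is 0x1 in intel-pasid.h, PASID 0 is never handed out by the allocator, which is what makes it safe to reserve for this purpose.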
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 10 ++++++++++
drivers/iommu/intel-pasid.h | 1 +
2 files changed, 11 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a139a45..62e9579 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2421,12 +2421,22 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
dev->archdata.iommu = info;
if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
+ bool pass_through;
+
ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
spin_unlock_irqrestore(&device_domain_lock, flags);
return NULL;
}
+
+ /* Setup the PASID entry for requests without PASID: */
+ pass_through = hw_pass_through && domain_type_is_si(domain);
+ spin_lock(&iommu->lock);
+ intel_pasid_setup_second_level(iommu, domain, dev,
+ PASID_RID2PASID,
+ pass_through);
+ spin_unlock(&iommu->lock);
}
spin_unlock_irqrestore(&device_domain_lock, flags);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 2fe40ff..80fc88e 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -10,6 +10,7 @@
#ifndef __INTEL_PASID_H
#define __INTEL_PASID_H
+#define PASID_RID2PASID 0x0
#define PASID_MIN 0x1
#define PASID_MAX 0x100000
#define PASID_PTE_MASK 0x3F
--
2.7.4
This adds an interface to set up the structures for the first
level page table translation type.
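For orientation, the first level fields touched by this patch all sit in the third qword of the PASID entry (entry bits 128~191). The sketch below assumes the positions given in the diff comments: SRE at entry bit 128, FLPM at bits 130~131 and the first level page table pointer at bits 140~191. build_fl_qword2() is a hypothetical helper, not a driver function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build qword 2 of a PASID entry for first level
 * translation. Bit N of the entry maps to bit (N - 128) of this qword.
 */
static inline uint64_t build_fl_qword2(uint64_t pgd_phys, int five_level,
					int supervisor)
{
	uint64_t v = pgd_phys & ~0xfffULL;	/* FLPTPTR, entry bits 140~191 */

	if (five_level)
		v |= 1ULL << 2;			/* FLPM = 1, entry bits 130~131 */
	if (supervisor)
		v |= 1ULL << 0;			/* SRE, entry bit 128 */

	return v;
}
```

In the patch, the supervisor case corresponds to a NULL mm (the kernel's init_mm page table is used), and five_level corresponds to X86_FEATURE_LA57 being enabled.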
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-pasid.c | 65 +++++++++++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 4 +++
drivers/iommu/intel-svm.c | 1 -
3 files changed, 69 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index da504576..1195c2a 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -10,6 +10,7 @@
#define pr_fmt(fmt) "DMAR: " fmt
#include <linux/bitops.h>
+#include <linux/cpufeature.h>
#include <linux/dmar.h>
#include <linux/intel-iommu.h>
#include <linux/iommu.h>
@@ -377,6 +378,26 @@ static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
pasid_set_bits(&pe->val[1], 1 << 23, value << 23);
}
+/*
+ * Setup the First Level Page table Pointer field (Bit 140~191)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_flptr(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value);
+}
+
+/*
+ * Setup the First Level Paging Mode field (Bit 130~131) of a
+ * scalable mode PASID entry.
+ */
+static inline void
+pasid_set_flpm(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
+}
+
static inline void
flush_pasid_cache(struct intel_iommu *iommu, int did, int pasid)
{
@@ -389,6 +410,50 @@ flush_pasid_cache(struct intel_iommu *iommu, int did, int pasid)
}
/*
+ * Set up the scalable mode pasid table entry for first only
+ * translation type.
+ */
+void intel_pasid_setup_first_level(struct intel_iommu *iommu,
+ struct mm_struct *mm,
+ struct device *dev,
+ int pasid)
+{
+ struct pasid_entry *pte;
+
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (WARN_ON(!pte))
+ return;
+
+ pasid_clear_entry(pte);
+
+ /* Setup the first level page table pointer: */
+ if (mm) {
+ pasid_set_flptr(pte, (u64)__pa(mm->pgd));
+ } else {
+ pasid_set_sre(pte);
+ pasid_set_flptr(pte, (u64)__pa(init_mm.pgd));
+ }
+
+#ifdef CONFIG_X86
+ if (cpu_feature_enabled(X86_FEATURE_LA57))
+ pasid_set_flpm(pte, 1);
+#endif /* CONFIG_X86 */
+
+ pasid_set_address_width(pte, iommu->agaw);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+ /* Setup Present and PASID Granular Translation Type: */
+ pasid_set_translation_type(pte, 1);
+ pasid_set_present(pte);
+
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
+ if (cap_caching_mode(iommu->cap))
+ flush_pasid_cache(iommu, 0, pasid);
+}
+
+/*
* Set up the scalable mode pasid table entry for second only or
* passthrough translation type.
*/
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 80d4667..518df72 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -45,6 +45,10 @@ struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
void intel_pasid_clear_entry(struct device *dev, int pasid);
+void intel_pasid_setup_first_level(struct intel_iommu *iommu,
+ struct mm_struct *mm,
+ struct device *dev,
+ int pasid);
void intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, int pasid,
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 5d250cf..8d4a911 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -29,7 +29,6 @@
#include "intel-pasid.h"
#define PASID_ENTRY_P BIT_ULL(0)
-#define PASID_ENTRY_FLPM_5LP BIT_ULL(9)
#define PASID_ENTRY_SRE BIT_ULL(11)
static irqreturn_t prq_event_thread(int irq, void *d);
--
2.7.4
This patch enables the translation for requests without PASID in
the scalable mode by setting up the root and context entries.
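The PASID directory size coding used in the scalable mode context entry can be worked through with a small sketch. It assumes, as the comment in the diff states, that a PDTS value of X means a directory with 2^(X + 7) entries, and that max_pasid is a power of two. pasid_dir_pdts() and context_pdts_field() are hypothetical names; the driver itself uses find_first_bit() rather than the loop shown here, which gives the same result for powers of two.

```c
#include <stdint.h>

#define PASID_PDE_SHIFT 6	/* as defined in intel-pasid.h */

/*
 * Hypothetical helper: return the PDTS coding X such that the PASID
 * directory covers 2^(X + 7) entries. Assumes max_pasid is a power of two.
 */
static inline unsigned int pasid_dir_pdts(uint32_t max_pasid)
{
	uint32_t max_pde = max_pasid >> PASID_PDE_SHIFT;
	unsigned int order = 0;

	/* order = log2(max_pde) for a power-of-two max_pde */
	while (max_pde > 1) {
		max_pde >>= 1;
		order++;
	}

	return order < 7 ? 0 : order - 7;
}

/* Field placement as in context_pdts(): bits 9~11 of the low qword. */
static inline uint64_t context_pdts_field(unsigned int pds)
{
	return ((uint64_t)pds & 0x7) << 9;
}
```

For example, with max_pasid equal to PASID_MAX (0x100000) the directory has 0x4000 = 2^14 entries, so the coding is 14 - 7 = 7, the largest value the 3-bit field can hold.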
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 109 +++++++++++++++++++++++++++++++++++++-------
drivers/iommu/intel-pasid.h | 1 +
include/linux/intel-iommu.h | 1 +
3 files changed, 95 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index add7e3e..13f3d17 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1217,6 +1217,8 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
unsigned long flag;
addr = virt_to_phys(iommu->root_entry);
+ if (sm_supported(iommu))
+ addr |= DMA_RTADDR_SMT;
raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1916,6 +1918,55 @@ static void domain_exit(struct dmar_domain *domain)
free_domain_mem(domain);
}
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static inline unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+ int pds, max_pde;
+
+ max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+ pds = find_first_bit((unsigned long *)&max_pde, MAX_NR_PASID_BITS);
+ if (pds < 7)
+ return 0;
+
+ return pds - 7;
+}
+
+/*
+ * Set the RID_PASID field of a scalable mode context entry. The
+ * IOMMU hardware will use the PASID value set in this field for
+ * DMA translations of DMA requests without PASID.
+ */
+static inline void
+context_set_sm_rid2pasid(struct context_entry *context, unsigned long pasid)
+{
+ context->hi |= pasid & ((1 << 20) - 1);
+}
+
+/*
+ * Set the DTE(Device-TLB Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_dte(struct context_entry *context)
+{
+ context->lo |= (1 << 2);
+}
+
+/*
+ * Set the PRE(Page Request Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_pre(struct context_entry *context)
+{
+ context->lo |= (1 << 4);
+}
+
+/* Convert value to context PASID directory size field coding. */
+#define context_pdts(pds) (((pds) & 0x7) << 9)
+
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
struct pasid_table *table,
@@ -1974,9 +2025,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}
pgd = domain->pgd;
-
context_clear_entry(context);
- context_set_domain_id(context, did);
/*
* Skip top levels of page tables for iommu which has less agaw
@@ -1989,25 +2038,54 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (!dma_pte_present(pgd))
goto out_unlock;
}
+ }
- info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
+ if (sm_supported(iommu)) {
+ unsigned long pds;
+
+ WARN_ON(!table);
+
+ /* Setup the PASID DIR pointer: */
+ pds = context_get_sm_pds(table);
+ context->lo = (u64)virt_to_phys(table->table) |
+ context_pdts(pds);
+
+ /* Setup the RID_PASID field: */
+ context_set_sm_rid2pasid(context, PASID_RID2PASID);
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, iommu->agaw);
- } else {
/*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
+ * Setup the Device-TLB enable bit and Page request
+ * Enable bit:
*/
- context_set_address_width(context, iommu->msagaw);
+ info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
+ if (info && info->ats_supported)
+ context_set_sm_dte(context);
+ if (info && info->pri_supported)
+ context_set_sm_pre(context);
+ } else {
+ context_set_domain_id(context, did);
+
+ if (translation != CONTEXT_TT_PASS_THROUGH) {
+ info = iommu_support_dev_iotlb(domain, iommu,
+ bus, devfn);
+ if (info && info->ats_supported)
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, iommu->agaw);
+ } else {
+ /*
+ * In pass through mode, AW must be programmed to
+ * indicate the largest AGAW value supported by
+ * hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
+ }
+ context_set_translation_type(context, translation);
}
- context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
domain_flush_cache(domain, context, sizeof(*context));
@@ -5150,7 +5228,6 @@ static void intel_iommu_put_resv_regions(struct device *dev,
}
#ifdef CONFIG_INTEL_IOMMU_SVM
-#define MAX_NR_PASID_BITS (20)
static inline unsigned long intel_iommu_get_pts(struct device *dev)
{
int pts, max_pasid;
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 80fc88e..80d4667 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -17,6 +17,7 @@
#define PASID_PTE_PRESENT 1
#define PDE_PFN_MASK PAGE_MASK
#define PASID_PDE_SHIFT 6
+#define MAX_NR_PASID_BITS 20
struct pasid_dir_entry {
u64 val;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index a20ebca..4b58946 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -194,6 +194,7 @@
/* DMA_RTADDR_REG */
#define DMA_RTADDR_RTT (((u64)1) << 11)
+#define DMA_RTADDR_SMT (((u64)1) << 10)
/* CCMD_REG */
#define DMA_CCMD_ICC (((u64)1) << 63)
--
2.7.4
Deferred invalidation is an ECS specific feature. It will not be
supported when the IOMMU works in scalable mode. As ECS support
has been deprecated, remove deferred invalidation and clean up the code.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 1 -
drivers/iommu/intel-svm.c | 45 ---------------------------------------------
include/linux/intel-iommu.h | 8 --------
3 files changed, 54 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 88ec860..0b0209e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1698,7 +1698,6 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
- intel_svm_exit(iommu);
}
#endif
}
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a16a421..da16a74 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -30,15 +30,8 @@
static irqreturn_t prq_event_thread(int irq, void *d);
-struct pasid_state_entry {
- u64 val;
-};
-
int intel_svm_init(struct intel_iommu *iommu)
{
- struct page *pages;
- int order;
-
if (cpu_feature_enabled(X86_FEATURE_GBPAGES) &&
!cap_fl1gp_support(iommu->cap))
return -EINVAL;
@@ -47,39 +40,6 @@ int intel_svm_init(struct intel_iommu *iommu)
!cap_5lp_support(iommu->cap))
return -EINVAL;
- /* Start at 2 because it's defined as 2^(1+PSS) */
- iommu->pasid_max = 2 << ecap_pss(iommu->ecap);
-
- /* Eventually I'm promised we will get a multi-level PASID table
- * and it won't have to be physically contiguous. Until then,
- * limit the size because 8MiB contiguous allocations can be hard
- * to come by. The limit of 0x20000, which is 1MiB for each of
- * the PASID and PASID-state tables, is somewhat arbitrary. */
- if (iommu->pasid_max > 0x20000)
- iommu->pasid_max = 0x20000;
-
- order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
- if (ecap_dis(iommu->ecap)) {
- pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
- if (pages)
- iommu->pasid_state_table = page_address(pages);
- else
- pr_warn("IOMMU: %s: Failed to allocate PASID state table\n",
- iommu->name);
- }
-
- return 0;
-}
-
-int intel_svm_exit(struct intel_iommu *iommu)
-{
- int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
-
- if (iommu->pasid_state_table) {
- free_pages((unsigned long)iommu->pasid_state_table, order);
- iommu->pasid_state_table = NULL;
- }
-
return 0;
}
@@ -197,11 +157,6 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
{
struct intel_svm_dev *sdev;
- /* Try deferred invalidate if available */
- if (svm->iommu->pasid_state_table &&
- !cmpxchg64(&svm->iommu->pasid_state_table[svm->pasid].val, 0, 1ULL << 63))
- return;
-
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list)
intel_flush_svm_range_dev(svm, sdev, address, pages, ih, gl);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 4b58946..9fbd1a7 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -451,15 +451,8 @@ struct intel_iommu {
struct iommu_flush flush;
#endif
#ifdef CONFIG_INTEL_IOMMU_SVM
- /* These are large and need to be contiguous, so we allocate just
- * one for now. We'll maybe want to rethink that if we truly give
- * devices away to userspace processes (e.g. for DPDK) and don't
- * want to trust that userspace will use *only* the PASID it was
- * told to. But while it's all driver-arbitrated, we're fine. */
- struct pasid_state_entry *pasid_state_table;
struct page_req_dsc *prq;
unsigned char prq_name[16]; /* Name for PRQ interrupt */
- u32 pasid_max;
#endif
struct q_inval *qi; /* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
@@ -573,7 +566,6 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info,
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_svm_init(struct intel_iommu *iommu);
-int intel_svm_exit(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
--
2.7.4
This patch enables the current SVA (Shared Virtual Addressing)
implementation to work in scalable mode.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 38 --------------------------------------
drivers/iommu/intel-svm.c | 24 ++----------------------
include/linux/dma_remapping.h | 9 +--------
3 files changed, 3 insertions(+), 68 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 13f3d17..88ec860 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5228,18 +5228,6 @@ static void intel_iommu_put_resv_regions(struct device *dev,
}
#ifdef CONFIG_INTEL_IOMMU_SVM
-static inline unsigned long intel_iommu_get_pts(struct device *dev)
-{
- int pts, max_pasid;
-
- max_pasid = intel_pasid_get_dev_max_id(dev);
- pts = find_first_bit((unsigned long *)&max_pasid, MAX_NR_PASID_BITS);
- if (pts < 5)
- return 0;
-
- return pts - 5;
-}
-
int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
{
struct device_domain_info *info;
@@ -5271,33 +5259,7 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
sdev->sid = PCI_DEVID(info->bus, info->devfn);
if (!(ctx_lo & CONTEXT_PASIDE)) {
- if (iommu->pasid_state_table)
- context[1].hi = (u64)virt_to_phys(iommu->pasid_state_table);
- context[1].lo = (u64)virt_to_phys(info->pasid_table->table) |
- intel_iommu_get_pts(sdev->dev);
-
- wmb();
- /* CONTEXT_TT_MULTI_LEVEL and CONTEXT_TT_DEV_IOTLB are both
- * extended to permit requests-with-PASID if the PASIDE bit
- * is set. which makes sense. For CONTEXT_TT_PASS_THROUGH,
- * however, the PASIDE bit is ignored and requests-with-PASID
- * are unconditionally blocked. Which makes less sense.
- * So convert from CONTEXT_TT_PASS_THROUGH to one of the new
- * "guest mode" translation types depending on whether ATS
- * is available or not. Annoyingly, we can't use the new
- * modes *unless* PASIDE is set. */
- if ((ctx_lo & CONTEXT_TT_MASK) == (CONTEXT_TT_PASS_THROUGH << 2)) {
- ctx_lo &= ~CONTEXT_TT_MASK;
- if (info->ats_supported)
- ctx_lo |= CONTEXT_TT_PT_PASID_DEV_IOTLB << 2;
- else
- ctx_lo |= CONTEXT_TT_PT_PASID << 2;
- }
ctx_lo |= CONTEXT_PASIDE;
- if (iommu->pasid_state_table)
- ctx_lo |= CONTEXT_DINVE;
- if (info->pri_supported)
- ctx_lo |= CONTEXT_PRS;
context[0].lo = ctx_lo;
wmb();
iommu->flush.flush_context(iommu, sdev->did, sdev->sid,
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 8d4a911..a16a421 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -28,9 +28,6 @@
#include "intel-pasid.h"
-#define PASID_ENTRY_P BIT_ULL(0)
-#define PASID_ENTRY_SRE BIT_ULL(11)
-
static irqreturn_t prq_event_thread(int irq, void *d);
struct pasid_state_entry {
@@ -280,11 +277,9 @@ static LIST_HEAD(global_svm_list);
int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
- struct pasid_entry *entry;
struct intel_svm_dev *sdev;
struct intel_svm *svm = NULL;
struct mm_struct *mm = NULL;
- u64 pasid_entry_val;
int pasid_max;
int ret;
@@ -393,23 +388,8 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
kfree(sdev);
goto out;
}
- pasid_entry_val = (u64)__pa(mm->pgd) | PASID_ENTRY_P;
- } else
- pasid_entry_val = (u64)__pa(init_mm.pgd) |
- PASID_ENTRY_P | PASID_ENTRY_SRE;
- if (cpu_feature_enabled(X86_FEATURE_LA57))
- pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
-
- entry = intel_pasid_get_entry(dev, svm->pasid);
- WRITE_ONCE(entry->val[0], pasid_entry_val);
-
- /*
- * Flush PASID cache when a PASID table entry becomes
- * present.
- */
- if (cap_caching_mode(iommu->cap))
- intel_flush_pasid_dev(svm, sdev, svm->pasid);
-
+ }
+ intel_pasid_setup_first_level(iommu, mm, dev, svm->pasid);
list_add_tail(&svm->list, &global_svm_list);
}
list_add_rcu(&sdev->list, &svm->devs);
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 21b3e7d..6f01e54 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -21,14 +21,7 @@
#define CONTEXT_TT_MULTI_LEVEL 0
#define CONTEXT_TT_DEV_IOTLB 1
#define CONTEXT_TT_PASS_THROUGH 2
-/* Extended context entry types */
-#define CONTEXT_TT_PT_PASID 4
-#define CONTEXT_TT_PT_PASID_DEV_IOTLB 5
-#define CONTEXT_TT_MASK (7ULL << 2)
-
-#define CONTEXT_DINVE (1ULL << 8)
-#define CONTEXT_PRS (1ULL << 9)
-#define CONTEXT_PASIDE (1ULL << 11)
+#define CONTEXT_PASIDE BIT_ULL(3)
struct intel_iommu;
struct dmar_domain;
--
2.7.4
In scalable mode, the PASID structure is a two-level table consisting
of a PASID directory table and a PASID table. Any PASID entry can be
located from a PASID value in the following way.
   1
   9                       6 5      0
    .-----------------------.-------.
    |              PASID    |       |
    '-----------------------'-------'    .-------------.
             |                    |      |             |
             |                    |      |             |
             |                    |      |             |
             |     .-----------.  |      .-------------.
             |     |           |  |----->| PASID Entry |
             |     |           |  |      '-------------'
             |     |           |  |Plus  |             |
             |     .-----------.  |      |             |
             |---->| DIR Entry |-------->|             |
             |     '-----------'         '-------------'
.---------.  |Plus |           |
| Context |  |     |           |
|  Entry  |------->|           |
'---------'        '-----------'
This changes the PASID table APIs to support the scalable mode
PASID directory and PASID table. It also adds a helper to
get the PASID table entry for a given PASID value.
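
As a rough illustration of the lookup described above, the following
userspace sketch splits a PASID into a directory index and a table
index. The shift, mask and maximum mirror PASID_PDE_SHIFT,
PASID_PTE_MASK and PASID_MAX from intel-pasid.h; the struct and
function names (pasid_index, pasid_split) are hypothetical and are
not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the definitions in intel-pasid.h after this patch. */
#define PASID_PDE_SHIFT	6
#define PASID_PTE_MASK	0x3F
#define PASID_MAX	0x100000

/* Hypothetical helper type, for illustration only. */
struct pasid_index {
	uint32_t dir_index;	/* PASID bits 19:6 select the directory entry */
	uint32_t index;		/* PASID bits 5:0 select the entry in the table */
};

/* Hypothetical helper: split a 20-bit PASID into its two indexes. */
static inline struct pasid_index pasid_split(uint32_t pasid)
{
	struct pasid_index idx;

	assert(pasid < PASID_MAX);
	idx.dir_index = pasid >> PASID_PDE_SHIFT;
	idx.index = pasid & PASID_PTE_MASK;
	return idx;
}
```

For example, PASID 0x1234 resolves to directory entry 0x48 and table
entry 0x34. With 64 entries of 64 bytes each, one PASID table fills a
4KiB page, and a device allowed the full 0x100000 PASIDs needs
0x100000 >> 6 = 0x4000 directory entries.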
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/intel-pasid.c | 72 +++++++++++++++++++++++++++++++++++++++------
drivers/iommu/intel-pasid.h | 10 ++++++-
drivers/iommu/intel-svm.c | 6 +---
4 files changed, 74 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 0a7362b..f9036c8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2463,7 +2463,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev)
dev->archdata.iommu = info;
- if (dev && dev_is_pci(dev) && info->pasid_supported) {
+ if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index fe95c9b..d6e90cd 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
int ret, order;
info = dev->archdata.iommu;
- if (WARN_ON(!info || !dev_is_pci(dev) ||
- !info->pasid_supported || info->pasid_table))
+ if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
return -EINVAL;
/* DMA alias device already has a pasid table, use it: */
@@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
return -ENOMEM;
INIT_LIST_HEAD(&pasid_table->dev);
- size = sizeof(struct pasid_entry);
+ size = sizeof(struct pasid_dir_entry);
count = min_t(int, pci_max_pasids(to_pci_dev(dev)), intel_pasid_max_id);
+ count >>= PASID_PDE_SHIFT;
order = get_order(size * count);
pages = alloc_pages_node(info->iommu->node,
GFP_ATOMIC | __GFP_ZERO,
@@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
pasid_table->table = page_address(pages);
pasid_table->order = order;
- pasid_table->max_pasid = count;
+ pasid_table->max_pasid = count << PASID_PDE_SHIFT;
attach_out:
device_attach_pasid_table(info, pasid_table);
@@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
return 0;
}
+/* Get PRESENT bit of a PASID directory entry. */
+static inline bool
+pasid_pde_is_present(struct pasid_dir_entry *pde)
+{
+ return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
+}
+
+/* Get PASID table from a PASID directory entry. */
+static inline struct pasid_entry *
+get_pasid_table_from_pde(struct pasid_dir_entry *pde)
+{
+ if (!pasid_pde_is_present(pde))
+ return NULL;
+
+ return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
+}
+
void intel_pasid_free_table(struct device *dev)
{
struct device_domain_info *info;
struct pasid_table *pasid_table;
+ struct pasid_dir_entry *dir;
+ struct pasid_entry *table;
+ int i, max_pde;
info = dev->archdata.iommu;
- if (!info || !dev_is_pci(dev) ||
- !info->pasid_supported || !info->pasid_table)
+ if (!info || !dev_is_pci(dev) || !info->pasid_table)
return;
pasid_table = info->pasid_table;
@@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev)
if (!list_empty(&pasid_table->dev))
return;
+ /* Free scalable mode PASID directory tables: */
+ dir = pasid_table->table;
+ max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
+ for (i = 0; i < max_pde; i++) {
+ table = get_pasid_table_from_pde(&dir[i]);
+ free_pgtable_page(table);
+ }
+
free_pages((unsigned long)pasid_table->table, pasid_table->order);
kfree(pasid_table);
}
@@ -206,17 +233,37 @@ int intel_pasid_get_dev_max_id(struct device *dev)
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
{
+ struct device_domain_info *info;
struct pasid_table *pasid_table;
+ struct pasid_dir_entry *dir;
struct pasid_entry *entries;
+ int dir_index, index;
pasid_table = intel_pasid_get_table(dev);
if (WARN_ON(!pasid_table || pasid < 0 ||
pasid >= intel_pasid_get_dev_max_id(dev)))
return NULL;
- entries = pasid_table->table;
+ dir = pasid_table->table;
+ info = dev->archdata.iommu;
+ dir_index = pasid >> PASID_PDE_SHIFT;
+ index = pasid & PASID_PTE_MASK;
+
+ spin_lock(&pasid_lock);
+ entries = get_pasid_table_from_pde(&dir[dir_index]);
+ if (!entries) {
+ entries = alloc_pgtable_page(info->iommu->node);
+ if (!entries) {
+ spin_unlock(&pasid_lock);
+ return NULL;
+ }
+
+ WRITE_ONCE(dir[dir_index].val,
+ (u64)virt_to_phys(entries) | PASID_PTE_PRESENT);
+ }
+ spin_unlock(&pasid_lock);
- return &entries[pasid];
+ return &entries[index];
}
/*
@@ -224,7 +271,14 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
*/
static inline void pasid_clear_entry(struct pasid_entry *pe)
{
- WRITE_ONCE(pe->val, 0);
+ WRITE_ONCE(pe->val[0], 0);
+ WRITE_ONCE(pe->val[1], 0);
+ WRITE_ONCE(pe->val[2], 0);
+ WRITE_ONCE(pe->val[3], 0);
+ WRITE_ONCE(pe->val[4], 0);
+ WRITE_ONCE(pe->val[5], 0);
+ WRITE_ONCE(pe->val[6], 0);
+ WRITE_ONCE(pe->val[7], 0);
}
void intel_pasid_clear_entry(struct device *dev, int pasid)
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 1c05ed6..12f480c 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -12,11 +12,19 @@
#define PASID_MIN 0x1
#define PASID_MAX 0x100000
+#define PASID_PTE_MASK 0x3F
+#define PASID_PTE_PRESENT 1
+#define PDE_PFN_MASK PAGE_MASK
+#define PASID_PDE_SHIFT 6
-struct pasid_entry {
+struct pasid_dir_entry {
u64 val;
};
+struct pasid_entry {
+ u64 val[8];
+};
+
/* The representative of a PASID table */
struct pasid_table {
void *table; /* pasid table pointer */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 640a350..5d250cf 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -64,8 +64,6 @@ int intel_svm_init(struct intel_iommu *iommu)
order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
if (ecap_dis(iommu->ecap)) {
- /* Just making it explicit... */
- BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct pasid_state_entry));
pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
if (pages)
iommu->pasid_state_table = page_address(pages);
@@ -404,9 +402,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
entry = intel_pasid_get_entry(dev, svm->pasid);
- entry->val = pasid_entry_val;
-
- wmb();
+ WRITE_ONCE(entry->val[0], pasid_entry_val);
/*
* Flush PASID cache when a PASID table entry becomes
--
2.7.4
Pass the PASID table down to the context mapping functions so that
the PASID-related info, such as the PASID table and the maximum
PASID, can be used when setting up the scalable mode context.
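
The plumbing can be sketched in plain C as follows: the extra pointer
travels in the opaque data handed to the per-alias callback, which
unpacks it and forwards it as an explicit argument. All names here
(pasid_table_stub, mapping_data, for_each_alias, mapping_one,
mapping_cb) are hypothetical stand-ins rather than the kernel's own
types, and failing on a NULL table is an arbitrary choice made only
so the sketch has an observable result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pasid_table; illustration only. */
struct pasid_table_stub {
	int max_pasid;
};

/* Mirrors the idea of struct domain_context_mapping_data. */
struct mapping_data {
	struct pasid_table_stub *table;	/* extra state for the callback */
	int calls;			/* number of aliases visited */
};

typedef int (*alias_fn)(unsigned short alias, void *opaque);

/* Stand-in for an alias walker: stops at the first non-zero return. */
static int for_each_alias(const unsigned short *aliases, size_t n,
			  alias_fn fn, void *opaque)
{
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = fn(aliases[i], opaque);

		if (ret)
			return ret;
	}
	return 0;
}

/* Per-alias worker: receives the table as an explicit argument. */
static int mapping_one(struct pasid_table_stub *table,
		       unsigned char bus, unsigned char devfn)
{
	(void)bus;
	(void)devfn;
	return table ? 0 : -1;
}

/* Callback: unpacks the opaque pointer and forwards the table. */
static int mapping_cb(unsigned short alias, void *opaque)
{
	struct mapping_data *data = opaque;

	assert(data);
	data->calls++;
	return mapping_one(data->table, alias >> 8, alias & 0xff);
}
```

Keeping the table in the callback data means the alias walker itself
does not need to know about PASID tables at all.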
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 62e9579..add7e3e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1918,6 +1918,7 @@ static void domain_exit(struct dmar_domain *domain)
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
+ struct pasid_table *table,
u8 bus, u8 devfn)
{
u16 did = domain->iommu_did[iommu->seq_id];
@@ -2040,6 +2041,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
struct domain_context_mapping_data {
struct dmar_domain *domain;
struct intel_iommu *iommu;
+ struct pasid_table *table;
};
static int domain_context_mapping_cb(struct pci_dev *pdev,
@@ -2048,25 +2050,31 @@ static int domain_context_mapping_cb(struct pci_dev *pdev,
struct domain_context_mapping_data *data = opaque;
return domain_context_mapping_one(data->domain, data->iommu,
- PCI_BUS_NUM(alias), alias & 0xff);
+ data->table, PCI_BUS_NUM(alias),
+ alias & 0xff);
}
static int
domain_context_mapping(struct dmar_domain *domain, struct device *dev)
{
+ struct domain_context_mapping_data data;
+ struct pasid_table *table;
struct intel_iommu *iommu;
u8 bus, devfn;
- struct domain_context_mapping_data data;
iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
return -ENODEV;
+ table = intel_pasid_get_table(dev);
+
if (!dev_is_pci(dev))
- return domain_context_mapping_one(domain, iommu, bus, devfn);
+ return domain_context_mapping_one(domain, iommu, table,
+ bus, devfn);
data.domain = domain;
data.iommu = iommu;
+ data.table = table;
return pci_for_each_dma_alias(to_pci_dev(dev),
&domain_context_mapping_cb, &data);
--
2.7.4
Hi,
On 16/07/18 07:49, Lu Baolu wrote:
> Intel vt-d rev3.0 [1] introduces a new translation mode called
> 'scalable mode', which enables PASID-granular translations for
> first level, second level, nested and pass-through modes. The
> vt-d scalable mode is the key ingredient to enable Scalable I/O
> Virtualization (Scalable IOV) [2] [3], which allows sharing a
> device in minimal possible granularity (ADI - Assignable Device
> Interface). It also includes all the capabilities required to
> enable Shared Virtual Addressing (SVA). As a result, previous
> Extended Context (ECS) mode is deprecated (no production ever
> implements ECS).
>
> Each scalable mode pasid table entry is 64 bytes in length, with
> fields point to the first level page table and the second level
> page table. The PGTT (Pasid Granular Translation Type) field is
> used by hardware to determine the translation type.
Looks promising! Since the 2nd level page tables are in the PASID entry,
the hypervisor traps guest accesses to the PASID tables instead of
passing through the whole PASID directory? Are you still planning to use
the VFIO BIND_PASID_TABLE interface in this mode, or a slightly
different one for individual PASIDs?
Thanks,
Jean
Hi Jean,
> From: Jean-Philippe Brucker
> Sent: Monday, July 16, 2018 6:52 PM
> On 16/07/18 07:49, Lu Baolu wrote:
> > Intel vt-d rev3.0 [1] introduces a new translation mode called
> > 'scalable mode', which enables PASID-granular translations for first
> > level, second level, nested and pass-through modes. The vt-d scalable
> > mode is the key ingredient to enable Scalable I/O Virtualization
> > (Scalable IOV) [2] [3], which allows sharing a device in minimal
> > possible granularity (ADI - Assignable Device Interface). It also
> > includes all the capabilities required to enable Shared Virtual
> > Addressing (SVA). As a result, previous Extended Context (ECS) mode is
> > deprecated (no production ever implements ECS).
> >
> > Each scalable mode pasid table entry is 64 bytes in length, with
> > fields point to the first level page table and the second level page
> > table. The PGTT (Pasid Granular Translation Type) field is used by
> > hardware to determine the translation type.
>
> Looks promising! Since the 2nd level page tables are in the PASID entry, the
> hypervisor traps guest accesses to the PASID tables instead of passing through the
> whole PASID directory? Are you still planning to use the VFIO BIND_PASID_TABLE
> interface in this mode, or a slightly different one for individual PASIDs?
You are right. For Intel VT-d, we don't need to give access to the whole guest
PASID table in Scalable Mode. However, VFIO BIND_PASID_TABLE may still be needed
for other vendors, so it may stay in the proposed list. This will be covered in the
new vSVA patchset from Jacob and me.
Thanks,
Yi Liu
On Mon, 16 Jul 2018 11:51:57 +0100
Jean-Philippe Brucker <[email protected]> wrote:
> Hi,
>
> On 16/07/18 07:49, Lu Baolu wrote:
> > Intel vt-d rev3.0 [1] introduces a new translation mode called
> > 'scalable mode', which enables PASID-granular translations for
> > first level, second level, nested and pass-through modes. The
> > vt-d scalable mode is the key ingredient to enable Scalable I/O
> > Virtualization (Scalable IOV) [2] [3], which allows sharing a
> > device in minimal possible granularity (ADI - Assignable Device
> > Interface). It also includes all the capabilities required to
> > enable Shared Virtual Addressing (SVA). As a result, previous
> > Extended Context (ECS) mode is deprecated (no production ever
> > implements ECS).
> >
> > Each scalable mode pasid table entry is 64 bytes in length, with
> > fields point to the first level page table and the second level
> > page table. The PGTT (Pasid Granular Translation Type) field is
> > used by hardware to determine the translation type.
>
> Looks promising! Since the 2nd level page tables are in the PASID
> entry, the hypervisor traps guest accesses to the PASID tables
> instead of passing through the whole PASID directory? Are you still
> planning to use the VFIO BIND_PASID_TABLE interface in this mode, or
> a slightly different one for individual PASIDs?
>
Since we deprecated ECS mode, there is no need for VT-d to bind the guest
PASID table in scalable mode. We are planning to add another flag for
binding the guest PASID and guest CR3, perhaps called
VFIO_IOMMU_BIND_GUEST_SVA.
> Thanks,
> Jean
[Jacob Pan]