Hi,
Intel vt-d rev3.0 [1] introduces a new translation mode called
'scalable mode', which enables PASID-granular translations for
first level, second level, nested and pass-through modes. The
vt-d scalable mode is the key ingredient for enabling Scalable I/O
Virtualization (Scalable IOV) [2] [3], which allows a device to be
shared at the finest possible granularity (ADI - Assignable Device
Interface). It also includes all the capabilities required to
enable Shared Virtual Addressing (SVA). As a result, the previous
Extended Context (ECS) mode is deprecated (no production hardware
ever implemented ECS).
Each scalable mode PASID table entry is 64 bytes in length, with
fields pointing to the first level page table and the second level
page table. The PGTT (PASID Granular Translation Type) field is
used by hardware to determine the translation type.
[Diagram: A Scalable Mode PASID Entry, drawn as eight 64-bit words
(7 down to 0). Word 2 carries the first level page table pointer
(FLPTR), and word 0 carries the second level page table pointer
(SLPTR) together with the PGTT field, which selects the translation
type:]
    .------------------------------------.
    | PASID Granular Translation Type    |
    |                                    |
    | 001b: 1st level translation only   |
    | 010b: 2nd level translation only   |
    | 011b: Nested translation           |
    | 100b: Pass through                 |
    '------------------------------------'
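In the driver, such an entry is represented simply as eight 64-bit
words; the sketch below mirrors the layout this series later adds
to intel-pasid.h and is shown here only for orientation:

        struct pasid_entry {
                u64 val[8];
        };

Individual fields (PGTT, FLPTR, SLPTR, DID, ...) are then written
with small accessor helpers operating on val[0..7].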
This patch series adds scalable mode support to the Intel IOMMU
driver, making all of the Intel IOMMU features work in scalable
mode. The changes are entirely confined to the Intel IOMMU driver,
as this is purely an internal format change.
References:
[1] https://software.intel.com/en-us/download/intel-virtualization-technology-for-directed-io-architecture-specification
[2] https://software.intel.com/en-us/download/intel-scalable-io-virtualization-technical-specification
[3] https://schd.ws/hosted_files/lc32018/00/LC3-SIOV-final.pdf
Change log:
v1->v2:
- Rebase all patches on top of v4.19-rc1;
- Add 256-bit invalidation descriptor support;
- Reserve a domain id for first level and pass-through
usage so that hardware can cache entries more efficiently;
- Various code refinements.
Lu Baolu (12):
iommu/vt-d: Enumerate the scalable mode capability
iommu/vt-d: Manage scalable mode PASID tables
iommu/vt-d: Move page table helpers into header
iommu/vt-d: Add 256-bit invalidation descriptor support
iommu/vt-d: Reserve a domain id for FL and PT modes
iommu/vt-d: Add second level page table interface
iommu/vt-d: Setup pasid entry for RID2PASID support
iommu/vt-d: Pass pasid table to context mapping
iommu/vt-d: Setup context and enable RID2PASID support
iommu/vt-d: Add first level page table interface
iommu/vt-d: Shared virtual address in scalable mode
iommu/vt-d: Remove deferred invalidation
.../admin-guide/kernel-parameters.txt | 12 +-
drivers/iommu/dmar.c | 83 ++--
drivers/iommu/intel-iommu.c | 305 ++++++-------
drivers/iommu/intel-pasid.c | 409 +++++++++++++++++-
drivers/iommu/intel-pasid.h | 33 +-
drivers/iommu/intel-svm.c | 170 +++-----
drivers/iommu/intel_irq_remapping.c | 6 +-
include/linux/dma_remapping.h | 9 +-
include/linux/intel-iommu.h | 64 ++-
9 files changed, 764 insertions(+), 327 deletions(-)
--
2.17.1
So that they can also be used in other source files.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 43 -------------------------------------
include/linux/intel-iommu.h | 43 +++++++++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+), 43 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b0da4f765274..93cde957adc7 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,49 +315,6 @@ static inline void context_clear_entry(struct context_entry *context)
context->hi = 0;
}
-/*
- * 0: readable
- * 1: writable
- * 2-6: reserved
- * 7: super page
- * 8-10: available
- * 11: snoop behavior
- * 12-63: Host physcial address
- */
-struct dma_pte {
- u64 val;
-};
-
-static inline void dma_clear_pte(struct dma_pte *pte)
-{
- pte->val = 0;
-}
-
-static inline u64 dma_pte_addr(struct dma_pte *pte)
-{
-#ifdef CONFIG_64BIT
- return pte->val & VTD_PAGE_MASK;
-#else
- /* Must have a full atomic 64-bit read */
- return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
-#endif
-}
-
-static inline bool dma_pte_present(struct dma_pte *pte)
-{
- return (pte->val & 3) != 0;
-}
-
-static inline bool dma_pte_superpage(struct dma_pte *pte)
-{
- return (pte->val & DMA_PTE_LARGE_PAGE);
-}
-
-static inline int first_pte_in_page(struct dma_pte *pte)
-{
- return !((unsigned long)pte & ~VTD_PAGE_MASK);
-}
-
/*
* This domain is a statically identity mapping domain.
* 1. This domain creats a static 1:1 mapping to all usable memory.
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 2173ae35f1dc..41791903a5e3 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
clflush_cache_range(addr, size);
}
+/*
+ * 0: readable
+ * 1: writable
+ * 2-6: reserved
+ * 7: super page
+ * 8-10: available
+ * 11: snoop behavior
+ * 12-63: Host physcial address
+ */
+struct dma_pte {
+ u64 val;
+};
+
+static inline void dma_clear_pte(struct dma_pte *pte)
+{
+ pte->val = 0;
+}
+
+static inline u64 dma_pte_addr(struct dma_pte *pte)
+{
+#ifdef CONFIG_64BIT
+ return pte->val & VTD_PAGE_MASK;
+#else
+ /* Must have a full atomic 64-bit read */
+ return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
+#endif
+}
+
+static inline bool dma_pte_present(struct dma_pte *pte)
+{
+ return (pte->val & 3) != 0;
+}
+
+static inline bool dma_pte_superpage(struct dma_pte *pte)
+{
+ return (pte->val & DMA_PTE_LARGE_PAGE);
+}
+
+static inline int first_pte_in_page(struct dma_pte *pte)
+{
+ return !((unsigned long)pte & ~VTD_PAGE_MASK);
+}
+
extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev);
extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
--
2.17.1
Deferred invalidation is an ECS-specific feature. It will not be
supported when the IOMMU works in scalable mode. As ECS support
has been deprecated, remove deferred invalidation and clean up the
code.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 1 -
drivers/iommu/intel-svm.c | 45 -------------------------------------
include/linux/intel-iommu.h | 8 -------
3 files changed, 54 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index e378a383d4f4..3e49d4029058 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1722,7 +1722,6 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
- intel_svm_exit(iommu);
}
#endif
}
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index fa5a19d83795..3f5ed33c56f0 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -31,15 +31,8 @@
static irqreturn_t prq_event_thread(int irq, void *d);
-struct pasid_state_entry {
- u64 val;
-};
-
int intel_svm_init(struct intel_iommu *iommu)
{
- struct page *pages;
- int order;
-
if (cpu_feature_enabled(X86_FEATURE_GBPAGES) &&
!cap_fl1gp_support(iommu->cap))
return -EINVAL;
@@ -48,39 +41,6 @@ int intel_svm_init(struct intel_iommu *iommu)
!cap_5lp_support(iommu->cap))
return -EINVAL;
- /* Start at 2 because it's defined as 2^(1+PSS) */
- iommu->pasid_max = 2 << ecap_pss(iommu->ecap);
-
- /* Eventually I'm promised we will get a multi-level PASID table
- * and it won't have to be physically contiguous. Until then,
- * limit the size because 8MiB contiguous allocations can be hard
- * to come by. The limit of 0x20000, which is 1MiB for each of
- * the PASID and PASID-state tables, is somewhat arbitrary. */
- if (iommu->pasid_max > 0x20000)
- iommu->pasid_max = 0x20000;
-
- order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
- if (ecap_dis(iommu->ecap)) {
- pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
- if (pages)
- iommu->pasid_state_table = page_address(pages);
- else
- pr_warn("IOMMU: %s: Failed to allocate PASID state table\n",
- iommu->name);
- }
-
- return 0;
-}
-
-int intel_svm_exit(struct intel_iommu *iommu)
-{
- int order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
-
- if (iommu->pasid_state_table) {
- free_pages((unsigned long)iommu->pasid_state_table, order);
- iommu->pasid_state_table = NULL;
- }
-
return 0;
}
@@ -214,11 +174,6 @@ static void intel_flush_svm_range(struct intel_svm *svm, unsigned long address,
{
struct intel_svm_dev *sdev;
- /* Try deferred invalidate if available */
- if (svm->iommu->pasid_state_table &&
- !cmpxchg64(&svm->iommu->pasid_state_table[svm->pasid].val, 0, 1ULL << 63))
- return;
-
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list)
intel_flush_svm_range_dev(svm, sdev, address, pages, ih, gl);
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 30e2bbfbbd50..b34cf8b887a0 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -457,15 +457,8 @@ struct intel_iommu {
struct iommu_flush flush;
#endif
#ifdef CONFIG_INTEL_IOMMU_SVM
- /* These are large and need to be contiguous, so we allocate just
- * one for now. We'll maybe want to rethink that if we truly give
- * devices away to userspace processes (e.g. for DPDK) and don't
- * want to trust that userspace will use *only* the PASID it was
- * told to. But while it's all driver-arbitrated, we're fine. */
- struct pasid_state_entry *pasid_state_table;
struct page_req_dsc *prq;
unsigned char prq_name[16]; /* Name for PRQ interrupt */
- u32 pasid_max;
#endif
struct q_inval *qi; /* Queued invalidation info */
u32 *iommu_state; /* Store iommu states between suspend and resume.*/
@@ -579,7 +572,6 @@ void iommu_flush_write_buffer(struct intel_iommu *iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_svm_init(struct intel_iommu *iommu);
-int intel_svm_exit(struct intel_iommu *iommu);
extern int intel_svm_enable_prq(struct intel_iommu *iommu);
extern int intel_svm_finish_prq(struct intel_iommu *iommu);
--
2.17.1
When scalable mode is enabled, there is no second level
page translation pointer in the context entry any more (for
DMA requests without PASID). Instead, a new RID2PASID field
is introduced in the context entry. Software can choose any
PASID value to set RID2PASID and then set up the translation
in the corresponding PASID entry. Upon receiving a DMA request
without PASID, hardware will first look at this RID2PASID
field and then treat the request as if it carried the PASID
value specified in the RID2PASID field.
Though software is allowed to use any PASID for RID2PASID,
we always use PASID 0 as a design decision.
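For illustration only, the hardware behaviour described above can
be pictured as below; rid2pasid_of() and pasid_entry_of() are
hypothetical helpers standing in for the context entry and PASID
table walks, not functions added by this patch:

        /* A DMA request arrives without a PASID ... */
        u32 pasid = rid2pasid_of(context_entry);     /* RID2PASID field */
        struct pasid_entry *pe = pasid_entry_of(dev, pasid);
        /* ... translation then follows the PGTT programmed in *pe. */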
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 20 ++++++++++++++++++++
drivers/iommu/intel-pasid.h | 1 +
2 files changed, 21 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index de6b909bb47a..c3bf2ccf094d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2475,12 +2475,27 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
dev->archdata.iommu = info;
if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
+ bool pass_through;
+
ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
spin_unlock_irqrestore(&device_domain_lock, flags);
return NULL;
}
+
+ /* Setup the PASID entry for requests without PASID: */
+ pass_through = hw_pass_through && domain_type_is_si(domain);
+ spin_lock(&iommu->lock);
+ ret = intel_pasid_setup_second_level(iommu, domain, dev,
+ PASID_RID2PASID,
+ pass_through);
+ spin_unlock(&iommu->lock);
+ if (ret) {
+ __dmar_remove_one_dev_info(info);
+ spin_unlock_irqrestore(&device_domain_lock, flags);
+ return NULL;
+ }
}
spin_unlock_irqrestore(&device_domain_lock, flags);
@@ -4846,6 +4861,11 @@ static void __dmar_remove_one_dev_info(struct device_domain_info *info)
iommu = info->iommu;
if (info->dev) {
+ if (dev_is_pci(info->dev) && sm_supported(iommu))
+ intel_pasid_tear_down_second_level(iommu,
+ info->domain, info->dev,
+ PASID_RID2PASID);
+
iommu_disable_dev_iotlb(info);
domain_context_clear(iommu, info->dev);
intel_pasid_free_table(info->dev);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 85b158a1826a..dda578b8f18e 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -10,6 +10,7 @@
#ifndef __INTEL_PASID_H
#define __INTEL_PASID_H
+#define PASID_RID2PASID 0x0
#define PASID_MIN 0x1
#define PASID_MAX 0x100000
#define PASID_PTE_MASK 0x3F
--
2.17.1
So that the PASID-related information, such as the PASID table and
the maximum PASID value, can be used when setting up the scalable
mode context entries.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index c3bf2ccf094d..33642dd3d6ba 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1942,6 +1942,7 @@ static void domain_exit(struct dmar_domain *domain)
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
+ struct pasid_table *table,
u8 bus, u8 devfn)
{
u16 did = domain->iommu_did[iommu->seq_id];
@@ -2064,6 +2065,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
struct domain_context_mapping_data {
struct dmar_domain *domain;
struct intel_iommu *iommu;
+ struct pasid_table *table;
};
static int domain_context_mapping_cb(struct pci_dev *pdev,
@@ -2072,25 +2074,31 @@ static int domain_context_mapping_cb(struct pci_dev *pdev,
struct domain_context_mapping_data *data = opaque;
return domain_context_mapping_one(data->domain, data->iommu,
- PCI_BUS_NUM(alias), alias & 0xff);
+ data->table, PCI_BUS_NUM(alias),
+ alias & 0xff);
}
static int
domain_context_mapping(struct dmar_domain *domain, struct device *dev)
{
+ struct domain_context_mapping_data data;
+ struct pasid_table *table;
struct intel_iommu *iommu;
u8 bus, devfn;
- struct domain_context_mapping_data data;
iommu = device_to_iommu(dev, &bus, &devfn);
if (!iommu)
return -ENODEV;
+ table = intel_pasid_get_table(dev);
+
if (!dev_is_pci(dev))
- return domain_context_mapping_one(domain, iommu, bus, devfn);
+ return domain_context_mapping_one(domain, iommu, table,
+ bus, devfn);
data.domain = domain;
data.iommu = iommu;
+ data.table = table;
return pci_for_each_dma_alias(to_pci_dev(dev),
&domain_context_mapping_cb, &data);
--
2.17.1
The Intel vt-d spec rev3.0 introduces a new translation
mode called scalable mode, which enables PASID-granular
translations for first level, second level, nested and
pass-through modes. At the same time, the previous
Extended Context (ECS) mode is deprecated (no production
hardware ever implemented ECS).
This patch adds enumeration for scalable mode and removes
the deprecated ECS enumeration. It also provides a boot time
option to disable scalable mode even if the hardware claims
to support it.
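For example, assuming the usual comma-separated intel_iommu=
option syntax, scalable mode could then be disabled at boot with:

        intel_iommu=on,sm_off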
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
.../admin-guide/kernel-parameters.txt | 12 ++--
drivers/iommu/intel-iommu.c | 64 +++++--------------
include/linux/intel-iommu.h | 1 +
3 files changed, 24 insertions(+), 53 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9871e649ffef..5b971306a114 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1668,12 +1668,12 @@
By default, super page will be supported if Intel IOMMU
has the capability. With this option, super page will
not be supported.
- ecs_off [Default Off]
- By default, extended context tables will be supported if
- the hardware advertises that it has support both for the
- extended tables themselves, and also PASID support. With
- this option set, extended tables will not be used even
- on hardware which claims to support them.
+ sm_off [Default Off]
+ By default, scalable mode will be supported if the
+ hardware advertises that it has support for the scalable
+ mode translation. With this option set, scalable mode
+ will not be used even on hardware which claims to support
+ it.
tboot_noforce [Default Off]
Do not force the Intel IOMMU enabled under tboot.
By default, tboot will force Intel IOMMU on, which
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5f3f10cf9d9d..5845edf4dcf9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -430,38 +430,16 @@ static int dmar_map_gfx = 1;
static int dmar_forcedac;
static int intel_iommu_strict;
static int intel_iommu_superpage = 1;
-static int intel_iommu_ecs = 1;
-static int intel_iommu_pasid28;
+static int intel_iommu_sm = 1;
static int iommu_identity_mapping;
#define IDENTMAP_ALL 1
#define IDENTMAP_GFX 2
#define IDENTMAP_AZALIA 4
-/* Broadwell and Skylake have broken ECS support — normal so-called "second
- * level" translation of DMA requests-without-PASID doesn't actually happen
- * unless you also set the NESTE bit in an extended context-entry. Which of
- * course means that SVM doesn't work because it's trying to do nested
- * translation of the physical addresses it finds in the process page tables,
- * through the IOVA->phys mapping found in the "second level" page tables.
- *
- * The VT-d specification was retroactively changed to change the definition
- * of the capability bits and pretend that Broadwell/Skylake never happened...
- * but unfortunately the wrong bit was changed. It's ECS which is broken, but
- * for some reason it was the PASID capability bit which was redefined (from
- * bit 28 on BDW/SKL to bit 40 in future).
- *
- * So our test for ECS needs to eschew those implementations which set the old
- * PASID capabiity bit 28, since those are the ones on which ECS is broken.
- * Unless we are working around the 'pasid28' limitations, that is, by putting
- * the device into passthrough mode for normal DMA and thus masking the bug.
- */
-#define ecs_enabled(iommu) (intel_iommu_ecs && ecap_ecs(iommu->ecap) && \
- (intel_iommu_pasid28 || !ecap_broken_pasid(iommu->ecap)))
-/* PASID support is thus enabled if ECS is enabled and *either* of the old
- * or new capability bits are set. */
-#define pasid_enabled(iommu) (ecs_enabled(iommu) && \
- (ecap_pasid(iommu->ecap) || ecap_broken_pasid(iommu->ecap)))
+#define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap))
+#define pasid_supported(iommu) (sm_supported(iommu) && \
+ ecap_pasid((iommu)->ecap))
int intel_iommu_gfx_mapped;
EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -541,15 +519,9 @@ static int __init intel_iommu_setup(char *str)
} else if (!strncmp(str, "sp_off", 6)) {
pr_info("Disable supported super page\n");
intel_iommu_superpage = 0;
- } else if (!strncmp(str, "ecs_off", 7)) {
- printk(KERN_INFO
- "Intel-IOMMU: disable extended context table support\n");
- intel_iommu_ecs = 0;
- } else if (!strncmp(str, "pasid28", 7)) {
- printk(KERN_INFO
- "Intel-IOMMU: enable pre-production PASID support\n");
- intel_iommu_pasid28 = 1;
- iommu_identity_mapping |= IDENTMAP_GFX;
+ } else if (!strncmp(str, "sm_off", 6)) {
+ pr_info("Intel-IOMMU: disable scalable mode support\n");
+ intel_iommu_sm = 0;
} else if (!strncmp(str, "tboot_noforce", 13)) {
printk(KERN_INFO
"Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
@@ -796,7 +768,7 @@ static inline struct context_entry *iommu_context_addr(struct intel_iommu *iommu
u64 *entry;
entry = &root->lo;
- if (ecs_enabled(iommu)) {
+ if (sm_supported(iommu)) {
if (devfn >= 0x80) {
devfn -= 0x80;
entry = &root->hi;
@@ -938,7 +910,7 @@ static void free_context_table(struct intel_iommu *iommu)
if (context)
free_pgtable_page(context);
- if (!ecs_enabled(iommu))
+ if (!sm_supported(iommu))
continue;
context = iommu_context_addr(iommu, i, 0x80, 0);
@@ -1290,8 +1262,6 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
unsigned long flag;
addr = virt_to_phys(iommu->root_entry);
- if (ecs_enabled(iommu))
- addr |= DMA_RTADDR_RTT;
raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1780,7 +1750,7 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
free_context_table(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu)) {
+ if (pasid_supported(iommu)) {
if (ecap_prs(iommu->ecap))
intel_svm_finish_prq(iommu);
intel_svm_exit(iommu);
@@ -2489,8 +2459,8 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
dmar_find_matched_atsr_unit(pdev))
info->ats_supported = 1;
- if (ecs_enabled(iommu)) {
- if (pasid_enabled(iommu)) {
+ if (sm_supported(iommu)) {
+ if (pasid_supported(iommu)) {
int features = pci_pasid_features(pdev);
if (features >= 0)
info->pasid_supported = features | 1;
@@ -3302,7 +3272,7 @@ static int __init init_dmars(void)
* We need to ensure the system pasid table is no bigger
* than the smallest supported.
*/
- if (pasid_enabled(iommu)) {
+ if (pasid_supported(iommu)) {
u32 temp = 2 << ecap_pss(iommu->ecap);
intel_pasid_max_id = min_t(u32, temp,
@@ -3363,7 +3333,7 @@ static int __init init_dmars(void)
if (!ecap_pass_through(iommu->ecap))
hw_pass_through = 0;
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu))
+ if (pasid_supported(iommu))
intel_svm_init(iommu);
#endif
}
@@ -3467,7 +3437,7 @@ static int __init init_dmars(void)
iommu_flush_write_buffer(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu) && ecap_prs(iommu->ecap)) {
+ if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
ret = intel_svm_enable_prq(iommu);
if (ret)
goto free_iommu;
@@ -4358,7 +4328,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
goto out;
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu))
+ if (pasid_supported(iommu))
intel_svm_init(iommu);
#endif
@@ -4375,7 +4345,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
iommu_flush_write_buffer(iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
- if (pasid_enabled(iommu) && ecap_prs(iommu->ecap)) {
+ if (pasid_supported(iommu) && ecap_prs(iommu->ecap)) {
ret = intel_svm_enable_prq(iommu);
if (ret)
goto disable_iommu;
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 28004d74ae04..2173ae35f1dc 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -115,6 +115,7 @@
* Extended Capability Register
*/
+#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
#define ecap_pasid(e) ((e >> 40) & 0x1)
#define ecap_pss(e) ((e >> 35) & 0x1f)
--
2.17.1
This adds the interfaces to set up or tear down the structures
for first level page table translation.
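Callers are expected to use the pair roughly as sketched below
(this mirrors how the SVA code is converted later in this series;
error handling and locking are omitted):

        /* Bind: point the PASID entry at the first level page table. */
        ret = intel_pasid_setup_first_level(iommu, mm, dev, did, pasid);
        if (ret)
                goto out;       /* undo PASID allocation and the like */

        /* Unbind: clear the entry and flush the affected caches. */
        intel_pasid_tear_down_first_level(iommu, dev, did, pasid);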
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-pasid.c | 89 +++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 7 +++
include/linux/intel-iommu.h | 1 +
3 files changed, 97 insertions(+)
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index edcea1d8b9fc..c921426d7b64 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -10,6 +10,7 @@
#define pr_fmt(fmt) "DMAR: " fmt
#include <linux/bitops.h>
+#include <linux/cpufeature.h>
#include <linux/dmar.h>
#include <linux/intel-iommu.h>
#include <linux/iommu.h>
@@ -377,6 +378,26 @@ static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
pasid_set_bits(&pe->val[1], 1 << 23, value);
}
+/*
+ * Setup the First Level Page table Pointer field (Bit 140~191)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_flptr(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[2], VTD_PAGE_MASK, value);
+}
+
+/*
+ * Setup the First Level Paging Mode field (Bit 130~131) of a
+ * scalable mode PASID entry.
+ */
+static inline void
+pasid_set_flpm(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
+}
+
static void
pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
int did, int pasid)
@@ -445,6 +466,74 @@ static void tear_down_one_pasid_entry(struct intel_iommu *iommu,
pasid_based_dev_iotlb_cache_invalidation(iommu, dev, pasid);
}
+/*
+ * Set up the scalable mode pasid table entry for first only
+ * translation type.
+ */
+int intel_pasid_setup_first_level(struct intel_iommu *iommu,
+ struct mm_struct *mm,
+ struct device *dev,
+ u16 did, int pasid)
+{
+ struct pasid_entry *pte;
+
+ if (!ecap_flts(iommu->ecap)) {
+ pr_err("No first level translation support on %s\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (WARN_ON(!pte))
+ return -EINVAL;
+
+ pasid_clear_entry(pte);
+
+ /* Setup the first level page table pointer: */
+ if (mm) {
+ pasid_set_flptr(pte, (u64)__pa(mm->pgd));
+ } else {
+ pasid_set_sre(pte);
+ pasid_set_flptr(pte, (u64)__pa(init_mm.pgd));
+ }
+
+#ifdef CONFIG_X86
+ if (cpu_feature_enabled(X86_FEATURE_LA57))
+ pasid_set_flpm(pte, 1);
+#endif /* CONFIG_X86 */
+
+ pasid_set_domain_id(pte, did);
+ pasid_set_address_width(pte, iommu->agaw);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+ /* Setup Present and PASID Granular Transfer Type: */
+ pasid_set_translation_type(pte, 1);
+ pasid_set_present(pte);
+
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
+ if (cap_caching_mode(iommu->cap)) {
+ pasid_based_pasid_cache_invalidation(iommu, did, pasid);
+ pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
+ } else {
+ iommu_flush_write_buffer(iommu);
+ }
+
+ return 0;
+}
+
+/*
+ * Tear down the scalable mode pasid table entry for first only
+ * translation type.
+ */
+void intel_pasid_tear_down_first_level(struct intel_iommu *iommu,
+ struct device *dev,
+ u16 did, int pasid)
+{
+ tear_down_one_pasid_entry(iommu, dev, did, pasid);
+}
+
/*
* Set up the scalable mode pasid table entry for second only or
* passthrough translation type.
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 948cd3a25976..ee5ac3d2ac22 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -51,6 +51,13 @@ struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
void intel_pasid_clear_entry(struct device *dev, int pasid);
+int intel_pasid_setup_first_level(struct intel_iommu *iommu,
+ struct mm_struct *mm,
+ struct device *dev,
+ u16 did, int pasid);
+void intel_pasid_tear_down_first_level(struct intel_iommu *iommu,
+ struct device *dev,
+ u16 did, int pasid);
int intel_pasid_setup_second_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev, int pasid,
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index b28613b472d6..30e2bbfbbd50 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -116,6 +116,7 @@
*/
#define ecap_smpwc(e) (((e) >> 48) & 0x1)
+#define ecap_flts(e) (((e) >> 47) & 0x1)
#define ecap_slts(e) (((e) >> 46) & 0x1)
#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
--
2.17.1
This patch enables translation for requests without PASID in
scalable mode by setting up the root and context entries.
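As a worked example of the PASID directory sizing added below:
with the architectural maximum of 2^20 PASIDs, context_get_sm_pds()
computes max_pde = 2^20 >> 6 = 2^14, finds bit 14 as the first set
bit, and returns pds = 14 - 7 = 7, so the PDTS field in the context
entry encodes a PASID directory of 2^(7 + 7) = 16384 entries.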
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 109 ++++++++++++++++++++++++++++++------
drivers/iommu/intel-pasid.h | 1 +
include/linux/intel-iommu.h | 1 +
3 files changed, 95 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 33642dd3d6ba..d854b17033a4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1219,6 +1219,8 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
unsigned long flag;
addr = virt_to_phys(iommu->root_entry);
+ if (sm_supported(iommu))
+ addr |= DMA_RTADDR_SMT;
raw_spin_lock_irqsave(&iommu->register_lock, flag);
dmar_writeq(iommu->reg + DMAR_RTADDR_REG, addr);
@@ -1940,6 +1942,55 @@ static void domain_exit(struct dmar_domain *domain)
free_domain_mem(domain);
}
+/*
+ * Get the PASID directory size for scalable mode context entry.
+ * Value of X in the PDTS field of a scalable mode context entry
+ * indicates PASID directory with 2^(X + 7) entries.
+ */
+static inline unsigned long context_get_sm_pds(struct pasid_table *table)
+{
+ int pds, max_pde;
+
+ max_pde = table->max_pasid >> PASID_PDE_SHIFT;
+ pds = find_first_bit((unsigned long *)&max_pde, MAX_NR_PASID_BITS);
+ if (pds < 7)
+ return 0;
+
+ return pds - 7;
+}
+
+/*
+ * Set the RID_PASID field of a scalable mode context entry. The
+ * IOMMU hardware will use the PASID value set in this field for
+ * DMA translations of DMA requests without PASID.
+ */
+static inline void
+context_set_sm_rid2pasid(struct context_entry *context, unsigned long pasid)
+{
+ context->hi |= pasid & ((1 << 20) - 1);
+}
+
+/*
+ * Set the DTE(Device-TLB Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_dte(struct context_entry *context)
+{
+ context->lo |= (1 << 2);
+}
+
+/*
+ * Set the PRE(Page Request Enable) field of a scalable mode context
+ * entry.
+ */
+static inline void context_set_sm_pre(struct context_entry *context)
+{
+ context->lo |= (1 << 4);
+}
+
+/* Convert value to context PASID directory size field coding. */
+#define context_pdts(pds) (((pds) & 0x7) << 9)
+
static int domain_context_mapping_one(struct dmar_domain *domain,
struct intel_iommu *iommu,
struct pasid_table *table,
@@ -1998,9 +2049,7 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
}
pgd = domain->pgd;
-
context_clear_entry(context);
- context_set_domain_id(context, did);
/*
* Skip top levels of page tables for iommu which has less agaw
@@ -2013,25 +2062,54 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (!dma_pte_present(pgd))
goto out_unlock;
}
+ }
- info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
- if (info && info->ats_supported)
- translation = CONTEXT_TT_DEV_IOTLB;
- else
- translation = CONTEXT_TT_MULTI_LEVEL;
+ if (sm_supported(iommu)) {
+ unsigned long pds;
+
+ WARN_ON(!table);
+
+ /* Setup the PASID DIR pointer: */
+ pds = context_get_sm_pds(table);
+ context->lo = (u64)virt_to_phys(table->table) |
+ context_pdts(pds);
+
+ /* Setup the RID_PASID field: */
+ context_set_sm_rid2pasid(context, PASID_RID2PASID);
- context_set_address_root(context, virt_to_phys(pgd));
- context_set_address_width(context, iommu->agaw);
- } else {
/*
- * In pass through mode, AW must be programmed to
- * indicate the largest AGAW value supported by
- * hardware. And ASR is ignored by hardware.
+ * Setup the Device-TLB enable bit and Page request
+ * Enable bit:
*/
- context_set_address_width(context, iommu->msagaw);
+ info = iommu_support_dev_iotlb(domain, iommu, bus, devfn);
+ if (info && info->ats_supported)
+ context_set_sm_dte(context);
+ if (info && info->pri_supported)
+ context_set_sm_pre(context);
+ } else {
+ context_set_domain_id(context, did);
+
+ if (translation != CONTEXT_TT_PASS_THROUGH) {
+ info = iommu_support_dev_iotlb(domain, iommu,
+ bus, devfn);
+ if (info && info->ats_supported)
+ translation = CONTEXT_TT_DEV_IOTLB;
+ else
+ translation = CONTEXT_TT_MULTI_LEVEL;
+
+ context_set_address_root(context, virt_to_phys(pgd));
+ context_set_address_width(context, iommu->agaw);
+ } else {
+ /*
+ * In pass through mode, AW must be programmed to
+ * indicate the largest AGAW value supported by
+ * hardware. And ASR is ignored by hardware.
+ */
+ context_set_address_width(context, iommu->msagaw);
+ }
+ context_set_translation_type(context, translation);
}
- context_set_translation_type(context, translation);
context_set_fault_enable(context);
context_set_present(context);
domain_flush_cache(domain, context, sizeof(*context));
@@ -5201,7 +5279,6 @@ static void intel_iommu_put_resv_regions(struct device *dev,
}
#ifdef CONFIG_INTEL_IOMMU_SVM
-#define MAX_NR_PASID_BITS (20)
static inline unsigned long intel_iommu_get_pts(struct device *dev)
{
int pts, max_pasid;
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index dda578b8f18e..948cd3a25976 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -17,6 +17,7 @@
#define PASID_PTE_PRESENT 1
#define PDE_PFN_MASK PAGE_MASK
#define PASID_PDE_SHIFT 6
+#define MAX_NR_PASID_BITS 20
/*
* Domain ID reserved for pasid entries programmed for first-level
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index d77d23dfd221..b28613b472d6 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -196,6 +196,7 @@
/* DMA_RTADDR_REG */
#define DMA_RTADDR_RTT (((u64)1) << 11)
+#define DMA_RTADDR_SMT (((u64)1) << 10)
/* CCMD_REG */
#define DMA_CCMD_ICC (((u64)1) << 63)
--
2.17.1
This adds the interfaces to set up or tear down the structures
for second level page table translation. This covers both the
second-level-only and pass-through translation types.
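The intended usage, sketched below, is how a later patch in this
series wires it up for requests without PASID (error handling
omitted):

        /* Second level only, or pass-through when pass_through is true. */
        ret = intel_pasid_setup_second_level(iommu, domain, dev,
                                             pasid, pass_through);
        ...
        intel_pasid_tear_down_second_level(iommu, domain, dev, pasid);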
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/intel-pasid.c | 246 ++++++++++++++++++++++++++++++++++++
drivers/iommu/intel-pasid.h | 7 +
include/linux/intel-iommu.h | 3 +
4 files changed, 257 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 562da10bf93e..de6b909bb47a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1232,7 +1232,7 @@ static void iommu_set_root_entry(struct intel_iommu *iommu)
raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
}
-static void iommu_flush_write_buffer(struct intel_iommu *iommu)
+void iommu_flush_write_buffer(struct intel_iommu *iommu)
{
u32 val;
unsigned long flag;
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index d6e90cd5b062..edcea1d8b9fc 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -9,6 +9,7 @@
#define pr_fmt(fmt) "DMAR: " fmt
+#include <linux/bitops.h>
#include <linux/dmar.h>
#include <linux/intel-iommu.h>
#include <linux/iommu.h>
@@ -291,3 +292,248 @@ void intel_pasid_clear_entry(struct device *dev, int pasid)
pasid_clear_entry(pe);
}
+
+static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
+{
+ u64 old;
+
+ old = READ_ONCE(*ptr);
+ WRITE_ONCE(*ptr, (old & ~mask) | bits);
+}
+
+/*
+ * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
+ * PASID entry.
+ */
+static inline void
+pasid_set_domain_id(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
+}
+
+/*
+ * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_address_root(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
+}
+
+/*
+ * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
+ * entry.
+ */
+static inline void
+pasid_set_address_width(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
+}
+
+/*
+ * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
+ * of a scalable mode PASID entry.
+ */
+static inline void
+pasid_set_translation_type(struct pasid_entry *pe, u64 value)
+{
+ pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
+}
+
+/*
+ * Enable fault processing by clearing the FPD(Fault Processing
+ * Disable) field (Bit 1) of a scalable mode PASID entry.
+ */
+static inline void pasid_set_fault_enable(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 1, 0);
+}
+
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
+/*
+ * Setup the P(Present) field (Bit 0) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_present(struct pasid_entry *pe)
+{
+ pasid_set_bits(&pe->val[0], 1 << 0, 1);
+}
+
+/*
+ * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
+ * entry.
+ */
+static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
+{
+ pasid_set_bits(&pe->val[1], 1 << 23, value);
+}
+
+static void
+pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
+ int did, int pasid)
+{
+ struct qi_desc desc;
+
+ desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
+
+ qi_submit_sync(&desc, iommu);
+}
+
+static void
+pasid_based_iotlb_cache_invalidation(struct intel_iommu *iommu,
+ u16 did, u32 pasid)
+{
+ struct qi_desc desc;
+
+ desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
+ QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
+
+ qi_submit_sync(&desc, iommu);
+}
+
+static void
+pasid_based_dev_iotlb_cache_invalidation(struct intel_iommu *iommu,
+ struct device *dev, int pasid)
+{
+ struct device_domain_info *info;
+ u16 sid, qdep, pfsid;
+
+ info = dev->archdata.iommu;
+ if (!info || !info->ats_enabled)
+ return;
+
+ sid = info->bus << 8 | info->devfn;
+ qdep = info->ats_qdep;
+ pfsid = info->pfsid;
+
+ qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 - VTD_PAGE_SHIFT);
+}
+
+static void tear_down_one_pasid_entry(struct intel_iommu *iommu,
+ struct device *dev, u16 did,
+ int pasid)
+{
+ struct pasid_entry *pte;
+
+ intel_pasid_clear_entry(dev, pasid);
+
+ if (!ecap_coherent(iommu->ecap)) {
+ pte = intel_pasid_get_entry(dev, pasid);
+ clflush_cache_range(pte, sizeof(*pte));
+ }
+
+ pasid_based_pasid_cache_invalidation(iommu, did, pasid);
+ pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
+
+ /* Device IOTLB doesn't need to be flushed in caching mode. */
+ if (!cap_caching_mode(iommu->cap))
+ pasid_based_dev_iotlb_cache_invalidation(iommu, dev, pasid);
+}
+
+/*
+ * Set up the scalable mode pasid table entry for second only or
+ * passthrough translation type.
+ */
+int intel_pasid_setup_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid,
+ bool pass_through)
+{
+ struct pasid_entry *pte;
+ struct dma_pte *pgd;
+ u64 pgd_val;
+ int agaw;
+ u16 did;
+
+ /*
+ * If hardware advertises no support for second level translation,
+ * we only allow pass through translation setup.
+ */
+ if (!(ecap_slts(iommu->ecap) || pass_through)) {
+ pr_err("No first level translation support on %s, only pass-through mode allowed\n",
+ iommu->name);
+ return -EINVAL;
+ }
+
+ /*
+ * Skip top levels of page tables for iommu which has less agaw
+ * than default. Unnecessary for PT mode.
+ */
+ pgd = domain->pgd;
+ if (!pass_through) {
+ for (agaw = domain->agaw; agaw != iommu->agaw; agaw--) {
+ pgd = phys_to_virt(dma_pte_addr(pgd));
+ if (!dma_pte_present(pgd)) {
+ dev_err(dev, "Invalid domain page table\n");
+ return -EINVAL;
+ }
+ }
+ }
+ pgd_val = pass_through ? 0 : virt_to_phys(pgd);
+ did = pass_through ? FLPT_DEFAULT_DID :
+ domain->iommu_did[iommu->seq_id];
+
+ pte = intel_pasid_get_entry(dev, pasid);
+ if (!pte) {
+ dev_err(dev, "Failed to get pasid entry of PASID %d\n", pasid);
+ return -ENODEV;
+ }
+
+ pasid_clear_entry(pte);
+ pasid_set_domain_id(pte, did);
+
+ if (!pass_through)
+ pasid_set_address_root(pte, pgd_val);
+
+ pasid_set_address_width(pte, iommu->agaw);
+ pasid_set_translation_type(pte, pass_through ? 4 : 2);
+ pasid_set_fault_enable(pte);
+ pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+
+ /*
+ * Since it is a second level only translation setup, we should
+ * set SRE bit as well (addresses are expected to be GPAs).
+ */
+ pasid_set_sre(pte);
+ pasid_set_present(pte);
+
+ if (!ecap_coherent(iommu->ecap))
+ clflush_cache_range(pte, sizeof(*pte));
+
+ if (cap_caching_mode(iommu->cap)) {
+ pasid_based_pasid_cache_invalidation(iommu, did, pasid);
+ pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
+ } else {
+ iommu_flush_write_buffer(iommu);
+ }
+
+ return 0;
+}
+
+/*
+ * Tear down the scalable mode pasid table entry for second only or
+ * passthrough translation type.
+ */
+void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid)
+{
+ u16 did = domain->iommu_did[iommu->seq_id];
+
+ tear_down_one_pasid_entry(iommu, dev, did, pasid);
+}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 03c1612d173c..85b158a1826a 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -49,5 +49,12 @@ struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
void intel_pasid_clear_entry(struct device *dev, int pasid);
+int intel_pasid_setup_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid,
+ bool pass_through);
+void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
+ struct dmar_domain *domain,
+ struct device *dev, int pasid);
#endif /* __INTEL_PASID_H */
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 72aff482b293..d77d23dfd221 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -115,6 +115,8 @@
* Extended Capability Register
*/
+#define ecap_smpwc(e) (((e) >> 48) & 0x1)
+#define ecap_slts(e) (((e) >> 46) & 0x1)
#define ecap_smts(e) (((e) >> 43) & 0x1)
#define ecap_dit(e) ((e >> 41) & 0x1)
#define ecap_pasid(e) ((e >> 40) & 0x1)
@@ -571,6 +573,7 @@ void free_pgtable_page(void *vaddr);
struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
int for_each_device_domain(int (*fn)(struct device_domain_info *info,
void *data), void *data);
+void iommu_flush_write_buffer(struct intel_iommu *iommu);
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_svm_init(struct intel_iommu *iommu);
--
2.17.1
This patch enables the current SVA (Shared Virtual Address)
implementation to work in scalable mode.
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 40 +-----------------------
drivers/iommu/intel-pasid.c | 2 +-
drivers/iommu/intel-pasid.h | 1 -
drivers/iommu/intel-svm.c | 57 +++++++++++------------------------
include/linux/dma_remapping.h | 9 +-----
5 files changed, 20 insertions(+), 89 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d854b17033a4..e378a383d4f4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5279,18 +5279,6 @@ static void intel_iommu_put_resv_regions(struct device *dev,
}
#ifdef CONFIG_INTEL_IOMMU_SVM
-static inline unsigned long intel_iommu_get_pts(struct device *dev)
-{
- int pts, max_pasid;
-
- max_pasid = intel_pasid_get_dev_max_id(dev);
- pts = find_first_bit((unsigned long *)&max_pasid, MAX_NR_PASID_BITS);
- if (pts < 5)
- return 0;
-
- return pts - 5;
-}
-
int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
{
struct device_domain_info *info;
@@ -5318,37 +5306,11 @@ int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sd
ctx_lo = context[0].lo;
- sdev->did = domain->iommu_did[iommu->seq_id];
+ sdev->did = FLPT_DEFAULT_DID;
sdev->sid = PCI_DEVID(info->bus, info->devfn);
if (!(ctx_lo & CONTEXT_PASIDE)) {
- if (iommu->pasid_state_table)
- context[1].hi = (u64)virt_to_phys(iommu->pasid_state_table);
- context[1].lo = (u64)virt_to_phys(info->pasid_table->table) |
- intel_iommu_get_pts(sdev->dev);
-
- wmb();
- /* CONTEXT_TT_MULTI_LEVEL and CONTEXT_TT_DEV_IOTLB are both
- * extended to permit requests-with-PASID if the PASIDE bit
- * is set. which makes sense. For CONTEXT_TT_PASS_THROUGH,
- * however, the PASIDE bit is ignored and requests-with-PASID
- * are unconditionally blocked. Which makes less sense.
- * So convert from CONTEXT_TT_PASS_THROUGH to one of the new
- * "guest mode" translation types depending on whether ATS
- * is available or not. Annoyingly, we can't use the new
- * modes *unless* PASIDE is set. */
- if ((ctx_lo & CONTEXT_TT_MASK) == (CONTEXT_TT_PASS_THROUGH << 2)) {
- ctx_lo &= ~CONTEXT_TT_MASK;
- if (info->ats_supported)
- ctx_lo |= CONTEXT_TT_PT_PASID_DEV_IOTLB << 2;
- else
- ctx_lo |= CONTEXT_TT_PT_PASID << 2;
- }
ctx_lo |= CONTEXT_PASIDE;
- if (iommu->pasid_state_table)
- ctx_lo |= CONTEXT_DINVE;
- if (info->pri_supported)
- ctx_lo |= CONTEXT_PRS;
context[0].lo = ctx_lo;
wmb();
iommu->flush.flush_context(iommu, sdev->did, sdev->sid,
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index c921426d7b64..a24a11bae03e 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -283,7 +283,7 @@ static inline void pasid_clear_entry(struct pasid_entry *pe)
WRITE_ONCE(pe->val[7], 0);
}
-void intel_pasid_clear_entry(struct device *dev, int pasid)
+static void intel_pasid_clear_entry(struct device *dev, int pasid)
{
struct pasid_entry *pe;
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index ee5ac3d2ac22..9f628db9db41 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -50,7 +50,6 @@ void intel_pasid_free_table(struct device *dev);
struct pasid_table *intel_pasid_get_table(struct device *dev);
int intel_pasid_get_dev_max_id(struct device *dev);
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
-void intel_pasid_clear_entry(struct device *dev, int pasid);
int intel_pasid_setup_first_level(struct intel_iommu *iommu,
struct mm_struct *mm,
struct device *dev,
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index a06ed098e928..fa5a19d83795 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -29,10 +29,6 @@
#include "intel-pasid.h"
-#define PASID_ENTRY_P BIT_ULL(0)
-#define PASID_ENTRY_FLPM_5LP BIT_ULL(9)
-#define PASID_ENTRY_SRE BIT_ULL(11)
-
static irqreturn_t prq_event_thread(int irq, void *d);
struct pasid_state_entry {
@@ -248,20 +244,6 @@ static void intel_invalidate_range(struct mmu_notifier *mn,
(end - start + PAGE_SIZE - 1) >> VTD_PAGE_SHIFT, 0, 0);
}
-
-static void intel_flush_pasid_dev(struct intel_svm *svm, struct intel_svm_dev *sdev, int pasid)
-{
- struct qi_desc desc;
-
- desc.qw0 = QI_PC_TYPE | QI_PC_DID(sdev->did) |
- QI_PC_PASID_SEL | QI_PC_PASID(pasid);
- desc.qw1 = 0;
- desc.qw2 = 0;
- desc.qw3 = 0;
-
- qi_submit_sync(&desc, svm->iommu);
-}
-
static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
{
struct intel_svm *svm = container_of(mn, struct intel_svm, notifier);
@@ -281,8 +263,8 @@ static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
*/
rcu_read_lock();
list_for_each_entry_rcu(sdev, &svm->devs, list) {
- intel_pasid_clear_entry(sdev->dev, svm->pasid);
- intel_flush_pasid_dev(svm, sdev, svm->pasid);
+ intel_pasid_tear_down_first_level(svm->iommu, sdev->dev,
+ sdev->did, svm->pasid);
intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
}
rcu_read_unlock();
@@ -302,11 +284,9 @@ static LIST_HEAD(global_svm_list);
int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_ops *ops)
{
struct intel_iommu *iommu = intel_svm_device_to_iommu(dev);
- struct pasid_entry *entry;
struct intel_svm_dev *sdev;
struct intel_svm *svm = NULL;
struct mm_struct *mm = NULL;
- u64 pasid_entry_val;
int pasid_max;
int ret;
@@ -415,22 +395,18 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
kfree(sdev);
goto out;
}
- pasid_entry_val = (u64)__pa(mm->pgd) | PASID_ENTRY_P;
- } else
- pasid_entry_val = (u64)__pa(init_mm.pgd) |
- PASID_ENTRY_P | PASID_ENTRY_SRE;
- if (cpu_feature_enabled(X86_FEATURE_LA57))
- pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
-
- entry = intel_pasid_get_entry(dev, svm->pasid);
- WRITE_ONCE(entry->val[0], pasid_entry_val);
-
- /*
- * Flush PASID cache when a PASID table entry becomes
- * present.
- */
- if (cap_caching_mode(iommu->cap))
- intel_flush_pasid_dev(svm, sdev, svm->pasid);
+ }
+
+ ret = intel_pasid_setup_first_level(iommu, mm, dev,
+ sdev->did, svm->pasid);
+ if (ret) {
+ if (mm)
+ mmu_notifier_unregister(&svm->notifier, mm);
+ intel_pasid_free_id(svm->pasid);
+ kfree(svm);
+ kfree(sdev);
+ goto out;
+ }
list_add_tail(&svm->list, &global_svm_list);
}
@@ -476,10 +452,11 @@ int intel_svm_unbind_mm(struct device *dev, int pasid)
* to use. We have a *shared* PASID table, because it's
* large and has to be physically contiguous. So it's
* hard to be as defensive as we might like. */
- intel_flush_pasid_dev(svm, sdev, svm->pasid);
+ intel_pasid_tear_down_first_level(iommu, dev,
+ sdev->did,
+ svm->pasid);
intel_flush_svm_range_dev(svm, sdev, 0, -1, 0, !svm->mm);
kfree_rcu(sdev, rcu);
- intel_pasid_clear_entry(dev, svm->pasid);
if (list_empty(&svm->devs)) {
intel_pasid_free_id(svm->pasid);
diff --git a/include/linux/dma_remapping.h b/include/linux/dma_remapping.h
index 21b3e7d33d68..6f01e54702e5 100644
--- a/include/linux/dma_remapping.h
+++ b/include/linux/dma_remapping.h
@@ -21,14 +21,7 @@
#define CONTEXT_TT_MULTI_LEVEL 0
#define CONTEXT_TT_DEV_IOTLB 1
#define CONTEXT_TT_PASS_THROUGH 2
-/* Extended context entry types */
-#define CONTEXT_TT_PT_PASID 4
-#define CONTEXT_TT_PT_PASID_DEV_IOTLB 5
-#define CONTEXT_TT_MASK (7ULL << 2)
-
-#define CONTEXT_DINVE (1ULL << 8)
-#define CONTEXT_PRS (1ULL << 9)
-#define CONTEXT_PASIDE (1ULL << 11)
+#define CONTEXT_PASIDE BIT_ULL(3)
struct intel_iommu;
struct dmar_domain;
--
2.17.1
In scalable mode, the PASID structure is a two level table with
a PASID directory table and a PASID table. Any PASID entry can
be identified by a PASID value as shown below.
[Diagram: PASID bits 19:6 index a directory entry (DIR Entry) in
the PASID directory table referenced by the context entry; PASID
bits 5:0 then index the PASID entry within the PASID table that
the directory entry points to.]
This changes the PASID table APIs to support the scalable mode
PASID directory and PASID table. It also adds a helper to get
the PASID table entry for a given PASID value.
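The lookup added in intel_pasid_get_entry() boils down to the
index split sketched here (directly mirroring the hunk below):

        dir_index = pasid >> PASID_PDE_SHIFT;   /* bits 19:6 -> directory */
        index     = pasid & PASID_PTE_MASK;     /* bits  5:0 -> table     */
        entries   = get_pasid_table_from_pde(&dir[dir_index]);
        return &entries[index];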
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
Reviewed-by: Ashok Raj <[email protected]>
---
drivers/iommu/intel-iommu.c | 2 +-
drivers/iommu/intel-pasid.c | 72 ++++++++++++++++++++++++++++++++-----
drivers/iommu/intel-pasid.h | 10 +++++-
drivers/iommu/intel-svm.c | 6 +---
4 files changed, 74 insertions(+), 16 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5845edf4dcf9..b0da4f765274 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2507,7 +2507,7 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (dev)
dev->archdata.iommu = info;
- if (dev && dev_is_pci(dev) && info->pasid_supported) {
+ if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
ret = intel_pasid_alloc_table(dev);
if (ret) {
__dmar_remove_one_dev_info(info);
diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index fe95c9bd4d33..d6e90cd5b062 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
int ret, order;
info = dev->archdata.iommu;
- if (WARN_ON(!info || !dev_is_pci(dev) ||
- !info->pasid_supported || info->pasid_table))
+ if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
return -EINVAL;
/* DMA alias device already has a pasid table, use it: */
@@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
return -ENOMEM;
INIT_LIST_HEAD(&pasid_table->dev);
- size = sizeof(struct pasid_entry);
+ size = sizeof(struct pasid_dir_entry);
count = min_t(int, pci_max_pasids(to_pci_dev(dev)), intel_pasid_max_id);
+ count >>= PASID_PDE_SHIFT;
order = get_order(size * count);
pages = alloc_pages_node(info->iommu->node,
GFP_ATOMIC | __GFP_ZERO,
@@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
pasid_table->table = page_address(pages);
pasid_table->order = order;
- pasid_table->max_pasid = count;
+ pasid_table->max_pasid = count << PASID_PDE_SHIFT;
attach_out:
device_attach_pasid_table(info, pasid_table);
@@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
return 0;
}
+/* Get PRESENT bit of a PASID directory entry. */
+static inline bool
+pasid_pde_is_present(struct pasid_dir_entry *pde)
+{
+ return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
+}
+
+/* Get PASID table from a PASID directory entry. */
+static inline struct pasid_entry *
+get_pasid_table_from_pde(struct pasid_dir_entry *pde)
+{
+ if (!pasid_pde_is_present(pde))
+ return NULL;
+
+ return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
+}
+
void intel_pasid_free_table(struct device *dev)
{
struct device_domain_info *info;
struct pasid_table *pasid_table;
+ struct pasid_dir_entry *dir;
+ struct pasid_entry *table;
+ int i, max_pde;
info = dev->archdata.iommu;
- if (!info || !dev_is_pci(dev) ||
- !info->pasid_supported || !info->pasid_table)
+ if (!info || !dev_is_pci(dev) || !info->pasid_table)
return;
pasid_table = info->pasid_table;
@@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev)
if (!list_empty(&pasid_table->dev))
return;
+ /* Free scalable mode PASID directory tables: */
+ dir = pasid_table->table;
+ max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
+ for (i = 0; i < max_pde; i++) {
+ table = get_pasid_table_from_pde(&dir[i]);
+ free_pgtable_page(table);
+ }
+
free_pages((unsigned long)pasid_table->table, pasid_table->order);
kfree(pasid_table);
}
@@ -206,17 +233,37 @@ int intel_pasid_get_dev_max_id(struct device *dev)
struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
{
+ struct device_domain_info *info;
struct pasid_table *pasid_table;
+ struct pasid_dir_entry *dir;
struct pasid_entry *entries;
+ int dir_index, index;
pasid_table = intel_pasid_get_table(dev);
if (WARN_ON(!pasid_table || pasid < 0 ||
pasid >= intel_pasid_get_dev_max_id(dev)))
return NULL;
- entries = pasid_table->table;
+ dir = pasid_table->table;
+ info = dev->archdata.iommu;
+ dir_index = pasid >> PASID_PDE_SHIFT;
+ index = pasid & PASID_PTE_MASK;
+
+ spin_lock(&pasid_lock);
+ entries = get_pasid_table_from_pde(&dir[dir_index]);
+ if (!entries) {
+ entries = alloc_pgtable_page(info->iommu->node);
+ if (!entries) {
+ spin_unlock(&pasid_lock);
+ return NULL;
+ }
+
+ WRITE_ONCE(dir[dir_index].val,
+ (u64)virt_to_phys(entries) | PASID_PTE_PRESENT);
+ }
+ spin_unlock(&pasid_lock);
- return &entries[pasid];
+ return &entries[index];
}
/*
@@ -224,7 +271,14 @@ struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
*/
static inline void pasid_clear_entry(struct pasid_entry *pe)
{
- WRITE_ONCE(pe->val, 0);
+ WRITE_ONCE(pe->val[0], 0);
+ WRITE_ONCE(pe->val[1], 0);
+ WRITE_ONCE(pe->val[2], 0);
+ WRITE_ONCE(pe->val[3], 0);
+ WRITE_ONCE(pe->val[4], 0);
+ WRITE_ONCE(pe->val[5], 0);
+ WRITE_ONCE(pe->val[6], 0);
+ WRITE_ONCE(pe->val[7], 0);
}
void intel_pasid_clear_entry(struct device *dev, int pasid)
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 1c05ed6fc5a5..12f480c2bb8b 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -12,11 +12,19 @@
#define PASID_MIN 0x1
#define PASID_MAX 0x100000
+#define PASID_PTE_MASK 0x3F
+#define PASID_PTE_PRESENT 1
+#define PDE_PFN_MASK PAGE_MASK
+#define PASID_PDE_SHIFT 6
-struct pasid_entry {
+struct pasid_dir_entry {
u64 val;
};
+struct pasid_entry {
+ u64 val[8];
+};
+
/* The representative of a PASID table */
struct pasid_table {
void *table; /* pasid table pointer */
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 4a03e5090952..6c0bd9ee9602 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -65,8 +65,6 @@ int intel_svm_init(struct intel_iommu *iommu)
order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
if (ecap_dis(iommu->ecap)) {
- /* Just making it explicit... */
- BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct pasid_state_entry));
pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
if (pages)
iommu->pasid_state_table = page_address(pages);
@@ -406,9 +404,7 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
entry = intel_pasid_get_entry(dev, svm->pasid);
- entry->val = pasid_entry_val;
-
- wmb();
+ WRITE_ONCE(entry->val[0], pasid_entry_val);
/*
* Flush PASID cache when a PASID table entry becomes
--
2.17.1
Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
entry for first-level or pass-through translation should be
programmed with a domain id different from those used for
second-level or nested translation. It is recommended that
software use the same domain id for all first-level-only and
pass-through translations.
This patch reserves a domain id for first-level and pass-through
translations.
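For illustration only, below is a minimal standalone C sketch of the
idea: domain id 0 stays reserved as before, and one extra id
(FLPT_DEFAULT_DID) is taken out of the allocator up front so that
second-level and nested domains can never receive it. The bitmap size
and helper names are made up for the example; only FLPT_DEFAULT_DID
comes from this patch.
#include <stdio.h>
#include <stdint.h>

#define NUM_DOMAIN_IDS   256            /* illustrative size, not cap_ndoms() */
#define FLPT_DEFAULT_DID 1              /* reserved for FL and PT entries */

static uint64_t domain_ids[NUM_DOMAIN_IDS / 64];

static void set_bit64(int nr, uint64_t *map)
{
	map[nr / 64] |= 1ULL << (nr % 64);
}

static int find_first_zero(const uint64_t *map, int size)
{
	for (int i = 0; i < size; i++)
		if (!(map[i / 64] & (1ULL << (i % 64))))
			return i;
	return size;
}

int main(void)
{
	/* did 0 is already reserved; also pull FLPT_DEFAULT_DID out of the pool */
	set_bit64(0, domain_ids);
	set_bit64(FLPT_DEFAULT_DID, domain_ids);

	/* any second-level or nested domain now gets did >= 2 */
	printf("first allocatable domain id: %d\n",
	       find_first_zero(domain_ids, NUM_DOMAIN_IDS));
	return 0;
}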
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Cc: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/intel-iommu.c | 10 ++++++++++
drivers/iommu/intel-pasid.h | 6 ++++++
2 files changed, 16 insertions(+)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 93cde957adc7..562da10bf93e 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1643,6 +1643,16 @@ static int iommu_init_domains(struct intel_iommu *iommu)
*/
set_bit(0, iommu->domain_ids);
+ /*
+ * Vt-d spec rev3.0 (section 6.2.3.1) requires that each pasid
+ * entry for first-level or pass-through translation modes should
+ * be programmed with a domain id different from those used for
+ * second-level or nested translation. We reserve a domain id for
+ * this purpose.
+ */
+ if (sm_supported(iommu))
+ set_bit(FLPT_DEFAULT_DID, iommu->domain_ids);
+
return 0;
}
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index 12f480c2bb8b..03c1612d173c 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -17,6 +17,12 @@
#define PDE_PFN_MASK PAGE_MASK
#define PASID_PDE_SHIFT 6
+/*
+ * Domain ID reserved for pasid entries programmed for first-level
+ * only and pass-through transfer modes.
+ */
+#define FLPT_DEFAULT_DID 1
+
struct pasid_dir_entry {
u64 val;
};
--
2.17.1
Intel vt-d spec rev3.0 requires software to use 256-bit
descriptors in the invalidation queue. As the spec reads in
section 6.5.2:
Remapping hardware supporting Scalable Mode Translations
(ECAP_REG.SMTS=1) allow software to additionally program
the width of the descriptors (128-bits or 256-bits) that
will be written into the Queue. Software should setup the
Invalidation Queue for 256-bit descriptors before
programming remapping hardware for scalable-mode translation as
128-bit descriptors are treated as invalid descriptors
(see Table 21 in Section 6.5.2.10) in scalable-mode.
This patch adds 256-bit invalidation descriptor support
if the hardware presents scalable mode capability.
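As a rough standalone illustration of the indexing change (not the
driver code): with 256-bit descriptors each queue slot doubles in
size, so the byte offset of slot N gets one extra shift bit.
DMAR_IQ_SHIFT is assumed to be 4 (16-byte legacy descriptors) here,
matching the existing driver constant.
#include <stdio.h>

#define DMAR_IQ_SHIFT 4   /* log2(16): 128-bit descriptors in legacy mode */

/* Byte offset of descriptor 'index' in the invalidation queue. */
static unsigned int qi_offset(unsigned int index, int scalable_mode)
{
	int shift = DMAR_IQ_SHIFT + !!scalable_mode;   /* 5 -> 32-byte slots */

	return index << shift;
}

int main(void)
{
	printf("legacy  : slot 3 starts at byte %u\n", qi_offset(3, 0)); /* 48 */
	printf("scalable: slot 3 starts at byte %u\n", qi_offset(3, 1)); /* 96 */
	return 0;
}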
Cc: Ashok Raj <[email protected]>
Cc: Jacob Pan <[email protected]>
Cc: Kevin Tian <[email protected]>
Cc: Liu Yi L <[email protected]>
Signed-off-by: Sanjay Kumar <[email protected]>
Signed-off-by: Lu Baolu <[email protected]>
---
drivers/iommu/dmar.c | 83 +++++++++++++++++++----------
drivers/iommu/intel-svm.c | 76 ++++++++++++++++----------
drivers/iommu/intel_irq_remapping.c | 6 ++-
include/linux/intel-iommu.h | 7 ++-
4 files changed, 113 insertions(+), 59 deletions(-)
diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index d9c748b6f9e4..b1429fa2cf29 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
int head, tail;
struct q_inval *qi = iommu->qi;
int wait_index = (index + 1) % QI_LENGTH;
+ int shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
if (qi->desc_status[wait_index] == QI_ABORT)
return -EAGAIN;
@@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
*/
if (fault & DMA_FSTS_IQE) {
head = readl(iommu->reg + DMAR_IQH_REG);
- if ((head >> DMAR_IQ_SHIFT) == index) {
+ if ((head >> shift) == index) {
+ struct qi_desc *desc = qi->desc + head;
+
pr_err("VT-d detected invalid descriptor: "
"low=%llx, high=%llx\n",
- (unsigned long long)qi->desc[index].low,
- (unsigned long long)qi->desc[index].high);
- memcpy(&qi->desc[index], &qi->desc[wait_index],
- sizeof(struct qi_desc));
+ (unsigned long long)desc->qw0,
+ (unsigned long long)desc->qw1);
+ memcpy(desc, qi->desc + (wait_index << shift),
+ 1 << shift);
writel(DMA_FSTS_IQE, iommu->reg + DMAR_FSTS_REG);
return -EINVAL;
}
@@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu *iommu, int index)
*/
if (fault & DMA_FSTS_ITE) {
head = readl(iommu->reg + DMAR_IQH_REG);
- head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
+ head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
head |= 1;
tail = readl(iommu->reg + DMAR_IQT_REG);
- tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) % QI_LENGTH;
+ tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
@@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
{
int rc;
struct q_inval *qi = iommu->qi;
- struct qi_desc *hw, wait_desc;
+ int offset, shift, length;
+ struct qi_desc wait_desc;
int wait_index, index;
unsigned long flags;
if (!qi)
return 0;
- hw = qi->desc;
-
restart:
rc = 0;
@@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
index = qi->free_head;
wait_index = (index + 1) % QI_LENGTH;
+ shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
+ length = 1 << shift;
qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
- hw[index] = *desc;
-
- wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
+ offset = index << shift;
+ memcpy(qi->desc + offset, desc, length);
+ wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
- wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
+ wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
+ wait_desc.qw2 = 0;
+ wait_desc.qw3 = 0;
- hw[wait_index] = wait_desc;
+ offset = wait_index << shift;
+ memcpy(qi->desc + offset, &wait_desc, length);
qi->free_head = (qi->free_head + 2) % QI_LENGTH;
qi->free_cnt -= 2;
@@ -1261,7 +1268,7 @@ int qi_submit_sync(struct qi_desc *desc, struct intel_iommu *iommu)
* update the HW tail register indicating the presence of
* new descriptors.
*/
- writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg + DMAR_IQT_REG);
+ writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
while (qi->desc_status[wait_index] != QI_DONE) {
/*
@@ -1298,8 +1305,10 @@ void qi_global_iec(struct intel_iommu *iommu)
{
struct qi_desc desc;
- desc.low = QI_IEC_TYPE;
- desc.high = 0;
+ desc.qw0 = QI_IEC_TYPE;
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
/* should never fail */
qi_submit_sync(&desc, iommu);
@@ -1310,9 +1319,11 @@ void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, u8 fm,
{
struct qi_desc desc;
- desc.low = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
+ desc.qw0 = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
| QI_CC_GRAN(type) | QI_CC_TYPE;
- desc.high = 0;
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, iommu);
}
@@ -1331,10 +1342,12 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr,
if (cap_read_drain(iommu->cap))
dr = 1;
- desc.low = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) | QI_IOTLB_DW(dw)
+ desc.qw0 = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) | QI_IOTLB_DW(dw)
| QI_IOTLB_GRAN(type) | QI_IOTLB_TYPE;
- desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
+ desc.qw1 = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
| QI_IOTLB_AM(size_order);
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, iommu);
}
@@ -1347,15 +1360,17 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid,
if (mask) {
WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT + mask)) - 1));
addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
- desc.high = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
+ desc.qw1 = QI_DEV_IOTLB_ADDR(addr) | QI_DEV_IOTLB_SIZE;
} else
- desc.high = QI_DEV_IOTLB_ADDR(addr);
+ desc.qw1 = QI_DEV_IOTLB_ADDR(addr);
if (qdep >= QI_DEV_IOTLB_MAX_INVS)
qdep = 0;
- desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
+ desc.qw0 = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
QI_DIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid);
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, iommu);
}
@@ -1403,16 +1418,24 @@ static void __dmar_enable_qi(struct intel_iommu *iommu)
u32 sts;
unsigned long flags;
struct q_inval *qi = iommu->qi;
+ u64 val = virt_to_phys(qi->desc);
qi->free_head = qi->free_tail = 0;
qi->free_cnt = QI_LENGTH;
+ /*
+ * Set DW=1 and QS=1 in IQA_REG when Scalable Mode capability
+ * is present.
+ */
+ if (ecap_smts(iommu->ecap))
+ val |= (1 << 11) | 1;
+
raw_spin_lock_irqsave(&iommu->register_lock, flags);
/* write zero to the tail reg */
writel(0, iommu->reg + DMAR_IQT_REG);
- dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi->desc));
+ dmar_writeq(iommu->reg + DMAR_IQA_REG, val);
iommu->gcmd |= DMA_GCMD_QIE;
writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);
@@ -1448,8 +1471,12 @@ int dmar_enable_qi(struct intel_iommu *iommu)
qi = iommu->qi;
-
- desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO, 0);
+ /*
+ * Need two pages to accommodate 256 descriptors of 256 bits each
+ * if the remapping hardware supports scalable mode translation.
+ */
+ desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC | __GFP_ZERO,
+ !!ecap_smts(iommu->ecap));
if (!desc_page) {
kfree(qi);
iommu->qi = NULL;
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 6c0bd9ee9602..a06ed098e928 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -161,27 +161,40 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
* because that's the only option the hardware gives us. Despite
* the fact that they are actually only accessible through one. */
if (gl)
- desc.low = QI_EIOTLB_PASID(svm->pasid) | QI_EIOTLB_DID(sdev->did) |
- QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) | QI_EIOTLB_TYPE;
+ desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
+ QI_EIOTLB_DID(sdev->did) |
+ QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) |
+ QI_EIOTLB_TYPE;
else
- desc.low = QI_EIOTLB_PASID(svm->pasid) | QI_EIOTLB_DID(sdev->did) |
- QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | QI_EIOTLB_TYPE;
- desc.high = 0;
+ desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
+ QI_EIOTLB_DID(sdev->did) |
+ QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+ QI_EIOTLB_TYPE;
+ desc.qw1 = 0;
} else {
int mask = ilog2(__roundup_pow_of_two(pages));
- desc.low = QI_EIOTLB_PASID(svm->pasid) | QI_EIOTLB_DID(sdev->did) |
- QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) | QI_EIOTLB_TYPE;
- desc.high = QI_EIOTLB_ADDR(address) | QI_EIOTLB_GL(gl) |
- QI_EIOTLB_IH(ih) | QI_EIOTLB_AM(mask);
+ desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
+ QI_EIOTLB_DID(sdev->did) |
+ QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
+ QI_EIOTLB_TYPE;
+ desc.qw1 = QI_EIOTLB_ADDR(address) |
+ QI_EIOTLB_GL(gl) |
+ QI_EIOTLB_IH(ih) |
+ QI_EIOTLB_AM(mask);
}
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, svm->iommu);
if (sdev->dev_iotlb) {
- desc.low = QI_DEV_EIOTLB_PASID(svm->pasid) | QI_DEV_EIOTLB_SID(sdev->sid) |
- QI_DEV_EIOTLB_QDEP(sdev->qdep) | QI_DEIOTLB_TYPE;
+ desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
+ QI_DEV_EIOTLB_SID(sdev->sid) |
+ QI_DEV_EIOTLB_QDEP(sdev->qdep) |
+ QI_DEIOTLB_TYPE;
if (pages == -1) {
- desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) | QI_DEV_EIOTLB_SIZE;
+ desc.qw1 = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
+ QI_DEV_EIOTLB_SIZE;
} else if (pages > 1) {
/* The least significant zero bit indicates the size. So,
* for example, an "address" value of 0x12345f000 will
@@ -189,10 +202,13 @@ static void intel_flush_svm_range_dev (struct intel_svm *svm, struct intel_svm_d
unsigned long last = address + ((unsigned long)(pages - 1) << VTD_PAGE_SHIFT);
unsigned long mask = __rounddown_pow_of_two(address ^ last);
- desc.high = QI_DEV_EIOTLB_ADDR((address & ~mask) | (mask - 1)) | QI_DEV_EIOTLB_SIZE;
+ desc.qw1 = QI_DEV_EIOTLB_ADDR((address & ~mask) |
+ (mask - 1)) | QI_DEV_EIOTLB_SIZE;
} else {
- desc.high = QI_DEV_EIOTLB_ADDR(address);
+ desc.qw1 = QI_DEV_EIOTLB_ADDR(address);
}
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, svm->iommu);
}
}
@@ -237,8 +253,11 @@ static void intel_flush_pasid_dev(struct intel_svm *svm, struct intel_svm_dev *s
{
struct qi_desc desc;
- desc.high = 0;
- desc.low = QI_PC_TYPE | QI_PC_DID(sdev->did) | QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+ desc.qw0 = QI_PC_TYPE | QI_PC_DID(sdev->did) |
+ QI_PC_PASID_SEL | QI_PC_PASID(pasid);
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
qi_submit_sync(&desc, svm->iommu);
}
@@ -668,24 +687,27 @@ static irqreturn_t prq_event_thread(int irq, void *d)
no_pasid:
if (req->lpig) {
/* Page Group Response */
- resp.low = QI_PGRP_PASID(req->pasid) |
+ resp.qw0 = QI_PGRP_PASID(req->pasid) |
QI_PGRP_DID((req->bus << 8) | req->devfn) |
QI_PGRP_PASID_P(req->pasid_present) |
QI_PGRP_RESP_TYPE;
- resp.high = QI_PGRP_IDX(req->prg_index) |
- QI_PGRP_PRIV(req->private) | QI_PGRP_RESP_CODE(result);
-
- qi_submit_sync(&resp, iommu);
+ resp.qw1 = QI_PGRP_IDX(req->prg_index) |
+ QI_PGRP_PRIV(req->private) |
+ QI_PGRP_RESP_CODE(result);
} else if (req->srr) {
/* Page Stream Response */
- resp.low = QI_PSTRM_IDX(req->prg_index) |
- QI_PSTRM_PRIV(req->private) | QI_PSTRM_BUS(req->bus) |
- QI_PSTRM_PASID(req->pasid) | QI_PSTRM_RESP_TYPE;
- resp.high = QI_PSTRM_ADDR(address) | QI_PSTRM_DEVFN(req->devfn) |
+ resp.qw0 = QI_PSTRM_IDX(req->prg_index) |
+ QI_PSTRM_PRIV(req->private) |
+ QI_PSTRM_BUS(req->bus) |
+ QI_PSTRM_PASID(req->pasid) |
+ QI_PSTRM_RESP_TYPE;
+ resp.qw1 = QI_PSTRM_ADDR(address) |
+ QI_PSTRM_DEVFN(req->devfn) |
QI_PSTRM_RESP_CODE(result);
-
- qi_submit_sync(&resp, iommu);
}
+ resp.qw2 = 0;
+ resp.qw3 = 0;
+ qi_submit_sync(&resp, iommu);
head = (head + sizeof(*req)) & PRQ_RING_MASK;
}
diff --git a/drivers/iommu/intel_irq_remapping.c b/drivers/iommu/intel_irq_remapping.c
index 967450bd421a..916391f33ca6 100644
--- a/drivers/iommu/intel_irq_remapping.c
+++ b/drivers/iommu/intel_irq_remapping.c
@@ -145,9 +145,11 @@ static int qi_flush_iec(struct intel_iommu *iommu, int index, int mask)
{
struct qi_desc desc;
- desc.low = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
+ desc.qw0 = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
| QI_IEC_SELECTIVE;
- desc.high = 0;
+ desc.qw1 = 0;
+ desc.qw2 = 0;
+ desc.qw3 = 0;
return qi_submit_sync(&desc, iommu);
}
diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
index 41791903a5e3..72aff482b293 100644
--- a/include/linux/intel-iommu.h
+++ b/include/linux/intel-iommu.h
@@ -340,12 +340,15 @@ enum {
#define QI_GRAN_PSI_PASID 3
struct qi_desc {
- u64 low, high;
+ u64 qw0;
+ u64 qw1;
+ u64 qw2;
+ u64 qw3;
};
struct q_inval {
raw_spinlock_t q_lock;
- struct qi_desc *desc; /* invalidation queue */
+ void *desc; /* invalidation queue */
int *desc_status; /* desc status */
int free_head; /* first free entry */
int free_tail; /* last free entry */
--
2.17.1
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> The Intel vt-d spec rev3.0 introduces a new translation
> mode called scalable mode, which enables PASID-granular
> translations for first level, second level, nested and
> pass-through modes. At the same time, the previous
> Extended Context (ECS) mode is deprecated (no production
> ever implements ECS).
>
> This patch adds enumeration for Scalable Mode and removes
> the deprecated ECS enumeration. It provides a boot time
> option to disable scalable mode even hardware claims to
> support it.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> In scalable mode, pasid structure is a two level table with
> a pasid directory table and a pasid table. Any pasid entry
> can be identified by a pasid value in below way.
>
> 1
> 9 6 5 0
> .-----------------------.-------.
> | PASID | |
> '-----------------------'-------' .-------------.
> | | | |
> | | | |
> | | | |
> | .-----------. | .-------------.
> | | | |----->| PASID Entry |
> | | | | '-------------'
> | | | |Plus | |
> | .-----------. | | |
> |---->| DIR Entry |-------->| |
> | '-----------' '-------------'
> .---------. |Plus | |
> | Context | | | |
> | Entry |------->| |
> '---------' '-----------'
>
> This changes the pasid table APIs to support scalable mode
> PASID directory and PASID table. It also adds a helper to
> get the PASID table entry according to the pasid value.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 2 +-
> drivers/iommu/intel-pasid.c | 72 ++++++++++++++++++++++++++++++++----
> -
> drivers/iommu/intel-pasid.h | 10 +++++-
> drivers/iommu/intel-svm.c | 6 +---
> 4 files changed, 74 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 5845edf4dcf9..b0da4f765274 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -2507,7 +2507,7 @@ static struct dmar_domain
> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
> if (dev)
> dev->archdata.iommu = info;
>
> - if (dev && dev_is_pci(dev) && info->pasid_supported) {
> + if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
worthy of a comment here that PASID table now is mandatory in
scalable mode, instead of optional for 1st level usage before.
> ret = intel_pasid_alloc_table(dev);
> if (ret) {
> __dmar_remove_one_dev_info(info);
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index fe95c9bd4d33..d6e90cd5b062 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
> int ret, order;
>
> info = dev->archdata.iommu;
> - if (WARN_ON(!info || !dev_is_pci(dev) ||
> - !info->pasid_supported || info->pasid_table))
> + if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
> return -EINVAL;
following same logic should you check sm_supported here?
>
> /* DMA alias device already has a pasid table, use it: */
> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
> return -ENOMEM;
> INIT_LIST_HEAD(&pasid_table->dev);
>
> - size = sizeof(struct pasid_entry);
> + size = sizeof(struct pasid_dir_entry);
> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> intel_pasid_max_id);
> + count >>= PASID_PDE_SHIFT;
> order = get_order(size * count);
> pages = alloc_pages_node(info->iommu->node,
> GFP_ATOMIC | __GFP_ZERO,
> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
>
> pasid_table->table = page_address(pages);
> pasid_table->order = order;
> - pasid_table->max_pasid = count;
> + pasid_table->max_pasid = count << PASID_PDE_SHIFT;
are you sure of that count is PDE_SHIFT aligned? otherwise >>
then << would lose some bits. If sure, then better add some check.
>
> attach_out:
> device_attach_pasid_table(info, pasid_table);
> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
> return 0;
> }
>
> +/* Get PRESENT bit of a PASID directory entry. */
> +static inline bool
> +pasid_pde_is_present(struct pasid_dir_entry *pde)
> +{
> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
curious why adding READ_ONCE specifically for PASID structure,
but not used for any other existing vtd structures? Is it to address
some specific requirement on PASID structure as defined in spec?
> +}
> +
> +/* Get PASID table from a PASID directory entry. */
> +static inline struct pasid_entry *
> +get_pasid_table_from_pde(struct pasid_dir_entry *pde)
> +{
> + if (!pasid_pde_is_present(pde))
> + return NULL;
> +
> + return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
> +}
> +
> void intel_pasid_free_table(struct device *dev)
> {
> struct device_domain_info *info;
> struct pasid_table *pasid_table;
> + struct pasid_dir_entry *dir;
> + struct pasid_entry *table;
> + int i, max_pde;
>
> info = dev->archdata.iommu;
> - if (!info || !dev_is_pci(dev) ||
> - !info->pasid_supported || !info->pasid_table)
> + if (!info || !dev_is_pci(dev) || !info->pasid_table)
> return;
>
> pasid_table = info->pasid_table;
> @@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev)
> if (!list_empty(&pasid_table->dev))
> return;
>
> + /* Free scalable mode PASID directory tables: */
> + dir = pasid_table->table;
> + max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
> + for (i = 0; i < max_pde; i++) {
> + table = get_pasid_table_from_pde(&dir[i]);
> + free_pgtable_page(table);
> + }
> +
> free_pages((unsigned long)pasid_table->table, pasid_table->order);
> kfree(pasid_table);
> }
> @@ -206,17 +233,37 @@ int intel_pasid_get_dev_max_id(struct device
> *dev)
>
> struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
> {
> + struct device_domain_info *info;
> struct pasid_table *pasid_table;
> + struct pasid_dir_entry *dir;
> struct pasid_entry *entries;
> + int dir_index, index;
>
> pasid_table = intel_pasid_get_table(dev);
> if (WARN_ON(!pasid_table || pasid < 0 ||
> pasid >= intel_pasid_get_dev_max_id(dev)))
> return NULL;
>
> - entries = pasid_table->table;
> + dir = pasid_table->table;
> + info = dev->archdata.iommu;
> + dir_index = pasid >> PASID_PDE_SHIFT;
> + index = pasid & PASID_PTE_MASK;
> +
> + spin_lock(&pasid_lock);
> + entries = get_pasid_table_from_pde(&dir[dir_index]);
> + if (!entries) {
> + entries = alloc_pgtable_page(info->iommu->node);
> + if (!entries) {
> + spin_unlock(&pasid_lock);
> + return NULL;
> + }
> +
> + WRITE_ONCE(dir[dir_index].val,
> + (u64)virt_to_phys(entries) | PASID_PTE_PRESENT);
> + }
> + spin_unlock(&pasid_lock);
>
> - return &entries[pasid];
> + return &entries[index];
> }
>
> /*
> @@ -224,7 +271,14 @@ struct pasid_entry *intel_pasid_get_entry(struct
> device *dev, int pasid)
> */
> static inline void pasid_clear_entry(struct pasid_entry *pe)
> {
> - WRITE_ONCE(pe->val, 0);
> + WRITE_ONCE(pe->val[0], 0);
> + WRITE_ONCE(pe->val[1], 0);
> + WRITE_ONCE(pe->val[2], 0);
> + WRITE_ONCE(pe->val[3], 0);
> + WRITE_ONCE(pe->val[4], 0);
> + WRITE_ONCE(pe->val[5], 0);
> + WRITE_ONCE(pe->val[6], 0);
> + WRITE_ONCE(pe->val[7], 0);
memset?
> }
>
> void intel_pasid_clear_entry(struct device *dev, int pasid)
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 1c05ed6fc5a5..12f480c2bb8b 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -12,11 +12,19 @@
>
> #define PASID_MIN 0x1
> #define PASID_MAX 0x100000
> +#define PASID_PTE_MASK 0x3F
> +#define PASID_PTE_PRESENT 1
> +#define PDE_PFN_MASK PAGE_MASK
> +#define PASID_PDE_SHIFT 6
>
> -struct pasid_entry {
> +struct pasid_dir_entry {
> u64 val;
> };
>
> +struct pasid_entry {
> + u64 val[8];
> +};
> +
> /* The representative of a PASID table */
> struct pasid_table {
> void *table; /* pasid table pointer */
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 4a03e5090952..6c0bd9ee9602 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -65,8 +65,6 @@ int intel_svm_init(struct intel_iommu *iommu)
>
> order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
> if (ecap_dis(iommu->ecap)) {
> - /* Just making it explicit... */
> - BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct
> pasid_state_entry));
> pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
> if (pages)
> iommu->pasid_state_table = page_address(pages);
> @@ -406,9 +404,7 @@ int intel_svm_bind_mm(struct device *dev, int
> *pasid, int flags, struct svm_dev_
> pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
>
> entry = intel_pasid_get_entry(dev, svm->pasid);
> - entry->val = pasid_entry_val;
> -
> - wmb();
> + WRITE_ONCE(entry->val[0], pasid_entry_val);
>
> /*
> * Flush PASID cache when a PASID table entry becomes
> --
> 2.17.1
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> So that they could also be used in other source files.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 43 -------------------------------------
> include/linux/intel-iommu.h | 43
> +++++++++++++++++++++++++++++++++++++
> 2 files changed, 43 insertions(+), 43 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index b0da4f765274..93cde957adc7 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -315,49 +315,6 @@ static inline void context_clear_entry(struct
> context_entry *context)
> context->hi = 0;
> }
>
> -/*
> - * 0: readable
> - * 1: writable
> - * 2-6: reserved
> - * 7: super page
> - * 8-10: available
> - * 11: snoop behavior
> - * 12-63: Host physcial address
> - */
> -struct dma_pte {
> - u64 val;
> -};
> -
> -static inline void dma_clear_pte(struct dma_pte *pte)
> -{
> - pte->val = 0;
> -}
> -
> -static inline u64 dma_pte_addr(struct dma_pte *pte)
> -{
> -#ifdef CONFIG_64BIT
> - return pte->val & VTD_PAGE_MASK;
> -#else
> - /* Must have a full atomic 64-bit read */
> - return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> -#endif
> -}
> -
> -static inline bool dma_pte_present(struct dma_pte *pte)
> -{
> - return (pte->val & 3) != 0;
> -}
> -
> -static inline bool dma_pte_superpage(struct dma_pte *pte)
> -{
> - return (pte->val & DMA_PTE_LARGE_PAGE);
> -}
> -
> -static inline int first_pte_in_page(struct dma_pte *pte)
> -{
> - return !((unsigned long)pte & ~VTD_PAGE_MASK);
> -}
> -
> /*
> * This domain is a statically identity mapping domain.
> * 1. This domain creats a static 1:1 mapping to all usable memory.
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 2173ae35f1dc..41791903a5e3 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
> clflush_cache_range(addr, size);
> }
>
> +/*
> + * 0: readable
> + * 1: writable
> + * 2-6: reserved
> + * 7: super page
> + * 8-10: available
> + * 11: snoop behavior
> + * 12-63: Host physcial address
> + */
> +struct dma_pte {
> + u64 val;
> +};
> +
> +static inline void dma_clear_pte(struct dma_pte *pte)
> +{
> + pte->val = 0;
> +}
> +
> +static inline u64 dma_pte_addr(struct dma_pte *pte)
> +{
> +#ifdef CONFIG_64BIT
> + return pte->val & VTD_PAGE_MASK;
> +#else
> + /* Must have a full atomic 64-bit read */
> + return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
> +#endif
> +}
> +
> +static inline bool dma_pte_present(struct dma_pte *pte)
> +{
> + return (pte->val & 3) != 0;
> +}
> +
> +static inline bool dma_pte_superpage(struct dma_pte *pte)
> +{
> + return (pte->val & DMA_PTE_LARGE_PAGE);
> +}
> +
> +static inline int first_pte_in_page(struct dma_pte *pte)
> +{
> + return !((unsigned long)pte & ~VTD_PAGE_MASK);
> +}
> +
> extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct
> pci_dev *dev);
> extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
>
> --
> 2.17.1
Hi,
On 09/06/2018 09:55 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> The Intel vt-d spec rev3.0 introduces a new translation
>> mode called scalable mode, which enables PASID-granular
>> translations for first level, second level, nested and
>> pass-through modes. At the same time, the previous
>> Extended Context (ECS) mode is deprecated (no production
>> ever implements ECS).
>>
>> This patch adds enumeration for Scalable Mode and removes
>> the deprecated ECS enumeration. It provides a boot time
>> option to disable scalable mode even hardware claims to
>> support it.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Ashok Raj <[email protected]>
>
> Reviewed-by: Kevin Tian <[email protected]>
>
Thank you, Kevin.
Best regards,
Lu Baolu
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> Intel vt-d spec rev3.0 requires software to use 256-bit
> descriptors in invalidation queue. As the spec reads in
> section 6.5.2:
>
> Remapping hardware supporting Scalable Mode Translations
> (ECAP_REG.SMTS=1) allow software to additionally program
> the width of the descriptors (128-bits or 256-bits) that
> will be written into the Queue. Software should setup the
> Invalidation Queue for 256-bit descriptors before progra-
> mming remapping hardware for scalable-mode translation as
> 128-bit descriptors are treated as invalid descriptors
> (see Table 21 in Section 6.5.2.10) in scalable-mode.
>
> This patch adds 256-bit invalidation descriptor support
> if the hardware presents scalable mode capability.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> ---
> drivers/iommu/dmar.c | 83 +++++++++++++++++++----------
> drivers/iommu/intel-svm.c | 76 ++++++++++++++++----------
> drivers/iommu/intel_irq_remapping.c | 6 ++-
> include/linux/intel-iommu.h | 7 ++-
> 4 files changed, 113 insertions(+), 59 deletions(-)
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index d9c748b6f9e4..b1429fa2cf29 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
> int head, tail;
> struct q_inval *qi = iommu->qi;
> int wait_index = (index + 1) % QI_LENGTH;
> + int shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
could add a new macro: qi_shift()
>
> if (qi->desc_status[wait_index] == QI_ABORT)
> return -EAGAIN;
> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
> */
> if (fault & DMA_FSTS_IQE) {
> head = readl(iommu->reg + DMAR_IQH_REG);
> - if ((head >> DMAR_IQ_SHIFT) == index) {
> + if ((head >> shift) == index) {
could be another macro: qi_index(head)
> + struct qi_desc *desc = qi->desc + head;
> +
> pr_err("VT-d detected invalid descriptor: "
> "low=%llx, high=%llx\n",
> - (unsigned long long)qi->desc[index].low,
> - (unsigned long long)qi->desc[index].high);
> - memcpy(&qi->desc[index], &qi->desc[wait_index],
> - sizeof(struct qi_desc));
> + (unsigned long long)desc->qw0,
> + (unsigned long long)desc->qw1);
what about qw2 and qw3 in 256-bit case?
> + memcpy(desc, qi->desc + (wait_index << shift),
> + 1 << shift);
> writel(DMA_FSTS_IQE, iommu->reg +
> DMAR_FSTS_REG);
> return -EINVAL;
> }
> @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu
> *iommu, int index)
> */
> if (fault & DMA_FSTS_ITE) {
> head = readl(iommu->reg + DMAR_IQH_REG);
> - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
> QI_LENGTH;
> + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
> head |= 1;
> tail = readl(iommu->reg + DMAR_IQT_REG);
> - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
> QI_LENGTH;
> + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>
> writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>
> @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
> {
> int rc;
> struct q_inval *qi = iommu->qi;
> - struct qi_desc *hw, wait_desc;
> + int offset, shift, length;
> + struct qi_desc wait_desc;
> int wait_index, index;
> unsigned long flags;
>
> if (!qi)
> return 0;
>
> - hw = qi->desc;
> -
> restart:
> rc = 0;
>
> @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
>
> index = qi->free_head;
> wait_index = (index + 1) % QI_LENGTH;
> + shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
> + length = 1 << shift;
>
> qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
>
> - hw[index] = *desc;
> -
> - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
> + offset = index << shift;
> + memcpy(qi->desc + offset, desc, length);
> + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
> QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
> - wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
> + wait_desc.qw2 = 0;
> + wait_desc.qw3 = 0;
>
> - hw[wait_index] = wait_desc;
> + offset = wait_index << shift;
> + memcpy(qi->desc + offset, &wait_desc, length);
>
> qi->free_head = (qi->free_head + 2) % QI_LENGTH;
> qi->free_cnt -= 2;
> @@ -1261,7 +1268,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
> intel_iommu *iommu)
> * update the HW tail register indicating the presence of
> * new descriptors.
> */
> - writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg +
> DMAR_IQT_REG);
> + writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
>
> while (qi->desc_status[wait_index] != QI_DONE) {
> /*
> @@ -1298,8 +1305,10 @@ void qi_global_iec(struct intel_iommu *iommu)
> {
> struct qi_desc desc;
>
> - desc.low = QI_IEC_TYPE;
> - desc.high = 0;
> + desc.qw0 = QI_IEC_TYPE;
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> /* should never fail */
> qi_submit_sync(&desc, iommu);
> @@ -1310,9 +1319,11 @@ void qi_flush_context(struct intel_iommu
> *iommu, u16 did, u16 sid, u8 fm,
> {
> struct qi_desc desc;
>
> - desc.low = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
> + desc.qw0 = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
> | QI_CC_GRAN(type) | QI_CC_TYPE;
> - desc.high = 0;
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> qi_submit_sync(&desc, iommu);
> }
> @@ -1331,10 +1342,12 @@ void qi_flush_iotlb(struct intel_iommu
> *iommu, u16 did, u64 addr,
> if (cap_read_drain(iommu->cap))
> dr = 1;
>
> - desc.low = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) |
> QI_IOTLB_DW(dw)
> + desc.qw0 = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) |
> QI_IOTLB_DW(dw)
> | QI_IOTLB_GRAN(type) | QI_IOTLB_TYPE;
> - desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
> + desc.qw1 = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
> | QI_IOTLB_AM(size_order);
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> qi_submit_sync(&desc, iommu);
> }
> @@ -1347,15 +1360,17 @@ void qi_flush_dev_iotlb(struct intel_iommu
> *iommu, u16 sid, u16 pfsid,
> if (mask) {
> WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT +
> mask)) - 1));
> addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
> - desc.high = QI_DEV_IOTLB_ADDR(addr) |
> QI_DEV_IOTLB_SIZE;
> + desc.qw1 = QI_DEV_IOTLB_ADDR(addr) |
> QI_DEV_IOTLB_SIZE;
> } else
> - desc.high = QI_DEV_IOTLB_ADDR(addr);
> + desc.qw1 = QI_DEV_IOTLB_ADDR(addr);
>
> if (qdep >= QI_DEV_IOTLB_MAX_INVS)
> qdep = 0;
>
> - desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
> + desc.qw0 = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
> QI_DIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid);
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> qi_submit_sync(&desc, iommu);
> }
> @@ -1403,16 +1418,24 @@ static void __dmar_enable_qi(struct
> intel_iommu *iommu)
> u32 sts;
> unsigned long flags;
> struct q_inval *qi = iommu->qi;
> + u64 val = virt_to_phys(qi->desc);
>
> qi->free_head = qi->free_tail = 0;
> qi->free_cnt = QI_LENGTH;
>
> + /*
> + * Set DW=1 and QS=1 in IQA_REG when Scalable Mode capability
> + * is present.
> + */
> + if (ecap_smts(iommu->ecap))
> + val |= (1 << 11) | 1;
> +
> raw_spin_lock_irqsave(&iommu->register_lock, flags);
>
> /* write zero to the tail reg */
> writel(0, iommu->reg + DMAR_IQT_REG);
>
> - dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi-
> >desc));
> + dmar_writeq(iommu->reg + DMAR_IQA_REG, val);
>
> iommu->gcmd |= DMA_GCMD_QIE;
> writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);
> @@ -1448,8 +1471,12 @@ int dmar_enable_qi(struct intel_iommu
> *iommu)
>
> qi = iommu->qi;
>
> -
> - desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC |
> __GFP_ZERO, 0);
> + /*
> + * Need two pages to accommodate 256 descriptors of 256 bits each
> + * if the remapping hardware supports scalable mode translation.
> + */
> + desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC |
> __GFP_ZERO,
> + !!ecap_smts(iommu->ecap));
> if (!desc_page) {
> kfree(qi);
> iommu->qi = NULL;
> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
> index 6c0bd9ee9602..a06ed098e928 100644
> --- a/drivers/iommu/intel-svm.c
> +++ b/drivers/iommu/intel-svm.c
> @@ -161,27 +161,40 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
> * because that's the only option the hardware gives us.
> Despite
> * the fact that they are actually only accessible through one.
> */
> if (gl)
> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
> QI_EIOTLB_DID(sdev->did) |
> - QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) |
> QI_EIOTLB_TYPE;
> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
> + QI_EIOTLB_DID(sdev->did) |
> +
> QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) |
> + QI_EIOTLB_TYPE;
> else
> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
> QI_EIOTLB_DID(sdev->did) |
> - QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID)
> | QI_EIOTLB_TYPE;
> - desc.high = 0;
> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
> + QI_EIOTLB_DID(sdev->did) |
> +
> QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> + QI_EIOTLB_TYPE;
> + desc.qw1 = 0;
> } else {
> int mask = ilog2(__roundup_pow_of_two(pages));
>
> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
> QI_EIOTLB_DID(sdev->did) |
> - QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
> QI_EIOTLB_TYPE;
> - desc.high = QI_EIOTLB_ADDR(address) | QI_EIOTLB_GL(gl) |
> - QI_EIOTLB_IH(ih) | QI_EIOTLB_AM(mask);
> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
> + QI_EIOTLB_DID(sdev->did) |
> + QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
> + QI_EIOTLB_TYPE;
> + desc.qw1 = QI_EIOTLB_ADDR(address) |
> + QI_EIOTLB_GL(gl) |
> + QI_EIOTLB_IH(ih) |
> + QI_EIOTLB_AM(mask);
> }
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> qi_submit_sync(&desc, svm->iommu);
>
> if (sdev->dev_iotlb) {
> - desc.low = QI_DEV_EIOTLB_PASID(svm->pasid) |
> QI_DEV_EIOTLB_SID(sdev->sid) |
> - QI_DEV_EIOTLB_QDEP(sdev->qdep) |
> QI_DEIOTLB_TYPE;
> + desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
> + QI_DEV_EIOTLB_SID(sdev->sid) |
> + QI_DEV_EIOTLB_QDEP(sdev->qdep) |
> + QI_DEIOTLB_TYPE;
> if (pages == -1) {
> - desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
> QI_DEV_EIOTLB_SIZE;
> + desc.qw1 = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
> + QI_DEV_EIOTLB_SIZE;
> } else if (pages > 1) {
> /* The least significant zero bit indicates the size. So,
> * for example, an "address" value of 0x12345f000
> will
> @@ -189,10 +202,13 @@ static void intel_flush_svm_range_dev (struct
> intel_svm *svm, struct intel_svm_d
> unsigned long last = address + ((unsigned
> long)(pages - 1) << VTD_PAGE_SHIFT);
> unsigned long mask =
> __rounddown_pow_of_two(address ^ last);
>
> - desc.high = QI_DEV_EIOTLB_ADDR((address &
> ~mask) | (mask - 1)) | QI_DEV_EIOTLB_SIZE;
> + desc.qw1 = QI_DEV_EIOTLB_ADDR((address &
> ~mask) |
> + (mask - 1)) | QI_DEV_EIOTLB_SIZE;
> } else {
> - desc.high = QI_DEV_EIOTLB_ADDR(address);
> + desc.qw1 = QI_DEV_EIOTLB_ADDR(address);
> }
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> qi_submit_sync(&desc, svm->iommu);
> }
> }
> @@ -237,8 +253,11 @@ static void intel_flush_pasid_dev(struct intel_svm
> *svm, struct intel_svm_dev *s
> {
> struct qi_desc desc;
>
> - desc.high = 0;
> - desc.low = QI_PC_TYPE | QI_PC_DID(sdev->did) | QI_PC_PASID_SEL
> | QI_PC_PASID(pasid);
> + desc.qw0 = QI_PC_TYPE | QI_PC_DID(sdev->did) |
> + QI_PC_PASID_SEL | QI_PC_PASID(pasid);
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> qi_submit_sync(&desc, svm->iommu);
> }
> @@ -668,24 +687,27 @@ static irqreturn_t prq_event_thread(int irq, void
> *d)
> no_pasid:
> if (req->lpig) {
> /* Page Group Response */
> - resp.low = QI_PGRP_PASID(req->pasid) |
> + resp.qw0 = QI_PGRP_PASID(req->pasid) |
> QI_PGRP_DID((req->bus << 8) | req->devfn)
> |
> QI_PGRP_PASID_P(req->pasid_present) |
> QI_PGRP_RESP_TYPE;
> - resp.high = QI_PGRP_IDX(req->prg_index) |
> - QI_PGRP_PRIV(req->private) |
> QI_PGRP_RESP_CODE(result);
> -
> - qi_submit_sync(&resp, iommu);
> + resp.qw1 = QI_PGRP_IDX(req->prg_index) |
> + QI_PGRP_PRIV(req->private) |
> + QI_PGRP_RESP_CODE(result);
> } else if (req->srr) {
> /* Page Stream Response */
> - resp.low = QI_PSTRM_IDX(req->prg_index) |
> - QI_PSTRM_PRIV(req->private) |
> QI_PSTRM_BUS(req->bus) |
> - QI_PSTRM_PASID(req->pasid) |
> QI_PSTRM_RESP_TYPE;
> - resp.high = QI_PSTRM_ADDR(address) |
> QI_PSTRM_DEVFN(req->devfn) |
> + resp.qw0 = QI_PSTRM_IDX(req->prg_index) |
> + QI_PSTRM_PRIV(req->private) |
> + QI_PSTRM_BUS(req->bus) |
> + QI_PSTRM_PASID(req->pasid) |
> + QI_PSTRM_RESP_TYPE;
> + resp.qw1 = QI_PSTRM_ADDR(address) |
> + QI_PSTRM_DEVFN(req->devfn) |
> QI_PSTRM_RESP_CODE(result);
> -
> - qi_submit_sync(&resp, iommu);
> }
> + resp.qw2 = 0;
> + resp.qw3 = 0;
> + qi_submit_sync(&resp, iommu);
>
> head = (head + sizeof(*req)) & PRQ_RING_MASK;
> }
> diff --git a/drivers/iommu/intel_irq_remapping.c
> b/drivers/iommu/intel_irq_remapping.c
> index 967450bd421a..916391f33ca6 100644
> --- a/drivers/iommu/intel_irq_remapping.c
> +++ b/drivers/iommu/intel_irq_remapping.c
> @@ -145,9 +145,11 @@ static int qi_flush_iec(struct intel_iommu *iommu,
> int index, int mask)
> {
> struct qi_desc desc;
>
> - desc.low = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
> + desc.qw0 = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
> | QI_IEC_SELECTIVE;
> - desc.high = 0;
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
>
> return qi_submit_sync(&desc, iommu);
> }
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 41791903a5e3..72aff482b293 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -340,12 +340,15 @@ enum {
> #define QI_GRAN_PSI_PASID 3
>
> struct qi_desc {
> - u64 low, high;
> + u64 qw0;
> + u64 qw1;
> + u64 qw2;
> + u64 qw3;
> };
>
> struct q_inval {
> raw_spinlock_t q_lock;
> - struct qi_desc *desc; /* invalidation queue */
> + void *desc; /* invalidation queue */
> int *desc_status; /* desc status */
> int free_head; /* first free entry */
> int free_tail; /* last free entry */
> --
> 2.17.1
Hi,
On 09/06/2018 10:14 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> In scalable mode, pasid structure is a two level table with
>> a pasid directory table and a pasid table. Any pasid entry
>> can be identified by a pasid value in below way.
>>
>> 1
>> 9 6 5 0
>> .-----------------------.-------.
>> | PASID | |
>> '-----------------------'-------' .-------------.
>> | | | |
>> | | | |
>> | | | |
>> | .-----------. | .-------------.
>> | | | |----->| PASID Entry |
>> | | | | '-------------'
>> | | | |Plus | |
>> | .-----------. | | |
>> |---->| DIR Entry |-------->| |
>> | '-----------' '-------------'
>> .---------. |Plus | |
>> | Context | | | |
>> | Entry |------->| |
>> '---------' '-----------'
>>
>> This changes the pasid table APIs to support scalable mode
>> PASID directory and PASID table. It also adds a helper to
>> get the PASID table entry according to the pasid value.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Ashok Raj <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 2 +-
>> drivers/iommu/intel-pasid.c | 72 ++++++++++++++++++++++++++++++++----
>> -
>> drivers/iommu/intel-pasid.h | 10 +++++-
>> drivers/iommu/intel-svm.c | 6 +---
>> 4 files changed, 74 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 5845edf4dcf9..b0da4f765274 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -2507,7 +2507,7 @@ static struct dmar_domain
>> *dmar_insert_one_dev_info(struct intel_iommu *iommu,
>> if (dev)
>> dev->archdata.iommu = info;
>>
>> - if (dev && dev_is_pci(dev) && info->pasid_supported) {
>> + if (dev && dev_is_pci(dev) && sm_supported(iommu)) {
>
> worthy of a comment here that PASID table now is mandatory in
> scalable mode, instead of optional for 1st level usage before.
Fair enough. Will add in the next version.
>
>> ret = intel_pasid_alloc_table(dev);
>> if (ret) {
>> __dmar_remove_one_dev_info(info);
>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>> index fe95c9bd4d33..d6e90cd5b062 100644
>> --- a/drivers/iommu/intel-pasid.c
>> +++ b/drivers/iommu/intel-pasid.c
>> @@ -127,8 +127,7 @@ int intel_pasid_alloc_table(struct device *dev)
>> int ret, order;
>>
>> info = dev->archdata.iommu;
>> - if (WARN_ON(!info || !dev_is_pci(dev) ||
>> - !info->pasid_supported || info->pasid_table))
>> + if (WARN_ON(!info || !dev_is_pci(dev) || info->pasid_table))
>> return -EINVAL;
>
> following same logic should you check sm_supported here?
If not sm_supported, info->pasid_table should be NULL. Checking
info->pasid_table is better since, even with sm_supported, the pasid
table pointer could still be empty.
>
>>
>> /* DMA alias device already has a pasid table, use it: */
>> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
>> return -ENOMEM;
>> INIT_LIST_HEAD(&pasid_table->dev);
>>
>> - size = sizeof(struct pasid_entry);
>> + size = sizeof(struct pasid_dir_entry);
>> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
>> intel_pasid_max_id);
>> + count >>= PASID_PDE_SHIFT;
>> order = get_order(size * count);
>> pages = alloc_pages_node(info->iommu->node,
>> GFP_ATOMIC | __GFP_ZERO,
>> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
>>
>> pasid_table->table = page_address(pages);
>> pasid_table->order = order;
>> - pasid_table->max_pasid = count;
>> + pasid_table->max_pasid = count << PASID_PDE_SHIFT;
>
> are you sure of that count is PDE_SHIFT aligned? otherwise >>
> then << would lose some bits. If sure, then better add some check.
I am making max_pasid PDE_SHIFT-aligned as a result of the shift
operations.
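For illustration, the effect of the two shifts in a tiny standalone
example (the count value is made up):
#include <stdio.h>

#define PASID_PDE_SHIFT 6   /* 64 PASID-table entries per directory entry */

int main(void)
{
	int count = 100;                              /* pretend the device reports 100 PASIDs */
	int pde_count = count >> PASID_PDE_SHIFT;     /* 1 directory entry */
	int max_pasid = pde_count << PASID_PDE_SHIFT; /* 64: truncated, i.e. aligned */

	printf("pde_count=%d max_pasid=%d\n", pde_count, max_pasid);
	return 0;
}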
>
>>
>> attach_out:
>> device_attach_pasid_table(info, pasid_table);
>> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
>> return 0;
>> }
>>
>> +/* Get PRESENT bit of a PASID directory entry. */
>> +static inline bool
>> +pasid_pde_is_present(struct pasid_dir_entry *pde)
>> +{
>> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
>
> curious why adding READ_ONCE specifically for PASID structure,
> but not used for any other existing vtd structures? Is it to address
> some specific requirement on PASID structure as defined in spec?
READ/WRITE_ONCE are used in pasid entry read/write to prevent the
compiler from merging, refetching or reordering successive instances of
read/write.
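As a rough userspace approximation of what the macros boil down to
(illustrative only, not the kernel implementation; the struct and
helper names below are made up):
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-ins for the kernel macros: a volatile access stops the
 * compiler from caching, refetching, merging or reordering the load/store.
 */
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct pde { uint64_t val; };

static int pde_present(struct pde *pde)
{
	/* Without READ_ONCE the compiler may legally load pde->val more than
	 * once and observe two different values if another CPU updates the
	 * entry concurrently.
	 */
	return READ_ONCE(pde->val) & 1;
}

int main(void)
{
	struct pde pde;

	WRITE_ONCE(pde.val, 0x1000ULL | 1);     /* table pointer + PRESENT */
	printf("present=%d\n", pde_present(&pde));
	return 0;
}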
>
>> +}
>> +
>> +/* Get PASID table from a PASID directory entry. */
>> +static inline struct pasid_entry *
>> +get_pasid_table_from_pde(struct pasid_dir_entry *pde)
>> +{
>> + if (!pasid_pde_is_present(pde))
>> + return NULL;
>> +
>> + return phys_to_virt(READ_ONCE(pde->val) & PDE_PFN_MASK);
>> +}
>> +
>> void intel_pasid_free_table(struct device *dev)
>> {
>> struct device_domain_info *info;
>> struct pasid_table *pasid_table;
>> + struct pasid_dir_entry *dir;
>> + struct pasid_entry *table;
>> + int i, max_pde;
>>
>> info = dev->archdata.iommu;
>> - if (!info || !dev_is_pci(dev) ||
>> - !info->pasid_supported || !info->pasid_table)
>> + if (!info || !dev_is_pci(dev) || !info->pasid_table)
>> return;
>>
>> pasid_table = info->pasid_table;
>> @@ -178,6 +197,14 @@ void intel_pasid_free_table(struct device *dev)
>> if (!list_empty(&pasid_table->dev))
>> return;
>>
>> + /* Free scalable mode PASID directory tables: */
>> + dir = pasid_table->table;
>> + max_pde = pasid_table->max_pasid >> PASID_PDE_SHIFT;
>> + for (i = 0; i < max_pde; i++) {
>> + table = get_pasid_table_from_pde(&dir[i]);
>> + free_pgtable_page(table);
>> + }
>> +
>> free_pages((unsigned long)pasid_table->table, pasid_table->order);
>> kfree(pasid_table);
>> }
>> @@ -206,17 +233,37 @@ int intel_pasid_get_dev_max_id(struct device
>> *dev)
>>
>> struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid)
>> {
>> + struct device_domain_info *info;
>> struct pasid_table *pasid_table;
>> + struct pasid_dir_entry *dir;
>> struct pasid_entry *entries;
>> + int dir_index, index;
>>
>> pasid_table = intel_pasid_get_table(dev);
>> if (WARN_ON(!pasid_table || pasid < 0 ||
>> pasid >= intel_pasid_get_dev_max_id(dev)))
>> return NULL;
>>
>> - entries = pasid_table->table;
>> + dir = pasid_table->table;
>> + info = dev->archdata.iommu;
>> + dir_index = pasid >> PASID_PDE_SHIFT;
>> + index = pasid & PASID_PTE_MASK;
>> +
>> + spin_lock(&pasid_lock);
>> + entries = get_pasid_table_from_pde(&dir[dir_index]);
>> + if (!entries) {
>> + entries = alloc_pgtable_page(info->iommu->node);
>> + if (!entries) {
>> + spin_unlock(&pasid_lock);
>> + return NULL;
>> + }
>> +
>> + WRITE_ONCE(dir[dir_index].val,
>> + (u64)virt_to_phys(entries) | PASID_PTE_PRESENT);
>> + }
>> + spin_unlock(&pasid_lock);
>>
>> - return &entries[pasid];
>> + return &entries[index];
>> }
>>
>> /*
>> @@ -224,7 +271,14 @@ struct pasid_entry *intel_pasid_get_entry(struct
>> device *dev, int pasid)
>> */
>> static inline void pasid_clear_entry(struct pasid_entry *pe)
>> {
>> - WRITE_ONCE(pe->val, 0);
>> + WRITE_ONCE(pe->val[0], 0);
>> + WRITE_ONCE(pe->val[1], 0);
>> + WRITE_ONCE(pe->val[2], 0);
>> + WRITE_ONCE(pe->val[3], 0);
>> + WRITE_ONCE(pe->val[4], 0);
>> + WRITE_ONCE(pe->val[5], 0);
>> + WRITE_ONCE(pe->val[6], 0);
>> + WRITE_ONCE(pe->val[7], 0);
>
> memset?
The order is important here. Otherwise, the PRESENT bit of this pasid
entry might still be set while the other fields contain invalid values.
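For illustration, a standalone sketch of the ordering argument, using
a userspace WRITE_ONCE stand-in (the function name is made up; on x86
stores are not reordered with other stores, so clearing val[0] first
is what matters):
#include <stdint.h>

#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

struct pasid_entry { uint64_t val[8]; };

/* qword 0 carries the PRESENT bit, so it is cleared first: the compiler
 * keeps the volatile stores in program order, so hardware does not see a
 * present entry whose other qwords have already been zeroed.  memset()
 * gives no such ordering guarantee.
 */
static void pasid_clear_entry_ordered(struct pasid_entry *pe)
{
	WRITE_ONCE(pe->val[0], 0);   /* PRESENT goes away first */
	for (int i = 1; i < 8; i++)
		WRITE_ONCE(pe->val[i], 0);
}

int main(void)
{
	struct pasid_entry pe = { .val = { 0x1000ULL | 1, 2, 3, 4, 5, 6, 7, 8 } };

	pasid_clear_entry_ordered(&pe);
	return (int)pe.val[0];   /* 0 after clearing */
}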
>
>> }
>>
>> void intel_pasid_clear_entry(struct device *dev, int pasid)
>> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
>> index 1c05ed6fc5a5..12f480c2bb8b 100644
>> --- a/drivers/iommu/intel-pasid.h
>> +++ b/drivers/iommu/intel-pasid.h
>> @@ -12,11 +12,19 @@
>>
>> #define PASID_MIN 0x1
>> #define PASID_MAX 0x100000
>> +#define PASID_PTE_MASK 0x3F
>> +#define PASID_PTE_PRESENT 1
>> +#define PDE_PFN_MASK PAGE_MASK
>> +#define PASID_PDE_SHIFT 6
>>
>> -struct pasid_entry {
>> +struct pasid_dir_entry {
>> u64 val;
>> };
>>
>> +struct pasid_entry {
>> + u64 val[8];
>> +};
>> +
>> /* The representative of a PASID table */
>> struct pasid_table {
>> void *table; /* pasid table pointer */
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 4a03e5090952..6c0bd9ee9602 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -65,8 +65,6 @@ int intel_svm_init(struct intel_iommu *iommu)
>>
>> order = get_order(sizeof(struct pasid_entry) * iommu->pasid_max);
>> if (ecap_dis(iommu->ecap)) {
>> - /* Just making it explicit... */
>> - BUILD_BUG_ON(sizeof(struct pasid_entry) != sizeof(struct
>> pasid_state_entry));
>> pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, order);
>> if (pages)
>> iommu->pasid_state_table = page_address(pages);
>> @@ -406,9 +404,7 @@ int intel_svm_bind_mm(struct device *dev, int
>> *pasid, int flags, struct svm_dev_
>> pasid_entry_val |= PASID_ENTRY_FLPM_5LP;
>>
>> entry = intel_pasid_get_entry(dev, svm->pasid);
>> - entry->val = pasid_entry_val;
>> -
>> - wmb();
>> + WRITE_ONCE(entry->val[0], pasid_entry_val);
>>
>> /*
>> * Flush PASID cache when a PASID table entry becomes
>> --
>> 2.17.1
>
>
Best regards,
Lu Baolu
Hi,
On 09/06/2018 10:15 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> So that they could also be used in other source files.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Ashok Raj <[email protected]>
>
> Reviewed-by: Kevin Tian <[email protected]>
Thank you, Kevin.
Best regards,
Lu Baolu
>
>> ---
>> drivers/iommu/intel-iommu.c | 43 -------------------------------------
>> include/linux/intel-iommu.h | 43
>> +++++++++++++++++++++++++++++++++++++
>> 2 files changed, 43 insertions(+), 43 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index b0da4f765274..93cde957adc7 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -315,49 +315,6 @@ static inline void context_clear_entry(struct
>> context_entry *context)
>> context->hi = 0;
>> }
>>
>> -/*
>> - * 0: readable
>> - * 1: writable
>> - * 2-6: reserved
>> - * 7: super page
>> - * 8-10: available
>> - * 11: snoop behavior
>> - * 12-63: Host physcial address
>> - */
>> -struct dma_pte {
>> - u64 val;
>> -};
>> -
>> -static inline void dma_clear_pte(struct dma_pte *pte)
>> -{
>> - pte->val = 0;
>> -}
>> -
>> -static inline u64 dma_pte_addr(struct dma_pte *pte)
>> -{
>> -#ifdef CONFIG_64BIT
>> - return pte->val & VTD_PAGE_MASK;
>> -#else
>> - /* Must have a full atomic 64-bit read */
>> - return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
>> -#endif
>> -}
>> -
>> -static inline bool dma_pte_present(struct dma_pte *pte)
>> -{
>> - return (pte->val & 3) != 0;
>> -}
>> -
>> -static inline bool dma_pte_superpage(struct dma_pte *pte)
>> -{
>> - return (pte->val & DMA_PTE_LARGE_PAGE);
>> -}
>> -
>> -static inline int first_pte_in_page(struct dma_pte *pte)
>> -{
>> - return !((unsigned long)pte & ~VTD_PAGE_MASK);
>> -}
>> -
>> /*
>> * This domain is a statically identity mapping domain.
>> * 1. This domain creats a static 1:1 mapping to all usable memory.
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index 2173ae35f1dc..41791903a5e3 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -501,6 +501,49 @@ static inline void __iommu_flush_cache(
>> clflush_cache_range(addr, size);
>> }
>>
>> +/*
>> + * 0: readable
>> + * 1: writable
>> + * 2-6: reserved
>> + * 7: super page
>> + * 8-10: available
>> + * 11: snoop behavior
>> + * 12-63: Host physcial address
>> + */
>> +struct dma_pte {
>> + u64 val;
>> +};
>> +
>> +static inline void dma_clear_pte(struct dma_pte *pte)
>> +{
>> + pte->val = 0;
>> +}
>> +
>> +static inline u64 dma_pte_addr(struct dma_pte *pte)
>> +{
>> +#ifdef CONFIG_64BIT
>> + return pte->val & VTD_PAGE_MASK;
>> +#else
>> + /* Must have a full atomic 64-bit read */
>> + return __cmpxchg64(&pte->val, 0ULL, 0ULL) & VTD_PAGE_MASK;
>> +#endif
>> +}
>> +
>> +static inline bool dma_pte_present(struct dma_pte *pte)
>> +{
>> + return (pte->val & 3) != 0;
>> +}
>> +
>> +static inline bool dma_pte_superpage(struct dma_pte *pte)
>> +{
>> + return (pte->val & DMA_PTE_LARGE_PAGE);
>> +}
>> +
>> +static inline int first_pte_in_page(struct dma_pte *pte)
>> +{
>> + return !((unsigned long)pte & ~VTD_PAGE_MASK);
>> +}
>> +
>> extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct
>> pci_dev *dev);
>> extern int dmar_find_matched_atsr_unit(struct pci_dev *dev);
>>
>> --
>> 2.17.1
>
>
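As a concrete illustration of the kind of out-of-file user that motivates this move, here is a minimal sketch of walking a domain's second-level page table from code outside intel-iommu.c (for example intel-pasid.c, as a later patch in this series does). The helper name and the exact loop condition are illustrative assumptions; the snippet relies on the driver's own definitions of struct dmar_domain, struct intel_iommu, dma_pte_addr() and dma_pte_present().

/*
 * Illustrative sketch only: skip the top page-table levels when the
 * domain was built with a larger agaw than this IOMMU supports, using
 * the dma_pte helpers that now live in include/linux/intel-iommu.h.
 */
static struct dma_pte *skip_to_iommu_agaw(struct dmar_domain *domain,
					  struct intel_iommu *iommu)
{
	struct dma_pte *pgd = domain->pgd;
	int agaw;

	for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
		pgd = phys_to_virt(dma_pte_addr(pgd));
		if (!dma_pte_present(pgd))
			return NULL;	/* page table shallower than expected */
	}

	return pgd;
}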
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, September 6, 2018 10:46 AM
>
[...]
> >> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
> >> return -ENOMEM;
> >> INIT_LIST_HEAD(&pasid_table->dev);
> >>
> >> - size = sizeof(struct pasid_entry);
> >> + size = sizeof(struct pasid_dir_entry);
> >> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> >> intel_pasid_max_id);
> >> + count >>= PASID_PDE_SHIFT;
> >> order = get_order(size * count);
> >> pages = alloc_pages_node(info->iommu->node,
> >> GFP_ATOMIC | __GFP_ZERO,
> >> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
> >>
> >> pasid_table->table = page_address(pages);
> >> pasid_table->order = order;
> >> - pasid_table->max_pasid = count;
> >> + pasid_table->max_pasid = count << PASID_PDE_SHIFT;
> >
> > are you sure of that count is PDE_SHIFT aligned? otherwise >>
> > then << would lose some bits. If sure, then better add some check.
>
> I am making the max_pasid PDE_SHIFT aligned as the result of shift
> operations.
>
earlier:
> >> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
> >> intel_pasid_max_id);
So you decided to truncate count to be PDE_SHIFT aligned. Is the PASID
value user configurable? If not, then it's fine.
> >
> >>
> >> attach_out:
> >> device_attach_pasid_table(info, pasid_table);
> >> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
> >> return 0;
> >> }
> >>
> >> +/* Get PRESENT bit of a PASID directory entry. */
> >> +static inline bool
> >> +pasid_pde_is_present(struct pasid_dir_entry *pde)
> >> +{
> >> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
> >
> > curious why adding READ_ONCE specifically for PASID structure,
> > but not used for any other existing vtd structures? Is it to address
> > some specific requirement on PASID structure as defined in spec?
>
> READ/WRITE_ONCE are used in pasid entry read/write to prevent the
> compiler from merging, refetching or reordering successive instances of
> read/write.
>
That's fine. I'm just curious why this is the first user of such macros
in the intel-iommu driver. Even before, with ECS, we had a PASID table too.
Thanks
Kevin
Hi,
On 09/06/2018 10:52 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, September 6, 2018 10:46 AM
>>
> [...]
>>>> @@ -143,8 +142,9 @@ int intel_pasid_alloc_table(struct device *dev)
>>>> return -ENOMEM;
>>>> INIT_LIST_HEAD(&pasid_table->dev);
>>>>
>>>> - size = sizeof(struct pasid_entry);
>>>> + size = sizeof(struct pasid_dir_entry);
>>>> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
>>>> intel_pasid_max_id);
>>>> + count >>= PASID_PDE_SHIFT;
>>>> order = get_order(size * count);
>>>> pages = alloc_pages_node(info->iommu->node,
>>>> GFP_ATOMIC | __GFP_ZERO,
>>>> @@ -154,7 +154,7 @@ int intel_pasid_alloc_table(struct device *dev)
>>>>
>>>> pasid_table->table = page_address(pages);
>>>> pasid_table->order = order;
>>>> - pasid_table->max_pasid = count;
>>>> + pasid_table->max_pasid = count << PASID_PDE_SHIFT;
>>>
>>> are you sure of that count is PDE_SHIFT aligned? otherwise >>
>>> then << would lose some bits. If sure, then better add some check.
>>
>> I am making the max_pasid PDE_SHIFT aligned as the result of shift
>> operations.
>>
>
> earlier:
>>>> count = min_t(int, pci_max_pasids(to_pci_dev(dev)),
>>>> intel_pasid_max_id);
>
> so you decided to truncate count to be PDE_SHIFT aligned. Is PASID
> value user configurable? if not, then it's fine.
Here @count is the number of PASID directory entries, so it must be
truncated from the original max_pasid. The PASID value is not user
configurable anyway.
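To make the alignment point concrete, a small sketch of how a PASID splits across the two-level table and why max_pasid ends up PDE-aligned. The PASID_PDE_SHIFT value of 6 and the helper names below are assumptions for illustration, not taken from the hunks above.

/*
 * Sketch only: with a two-level PASID table, a PASID splits into a
 * directory index and an index into the leaf table that the directory
 * entry points to.
 */
#define PASID_PDE_SHIFT		6	/* assumed: 64 entries per leaf table */

static inline int pasid_dir_index(int pasid)
{
	return pasid >> PASID_PDE_SHIFT;		/* which directory entry */
}

static inline int pasid_table_index(int pasid)
{
	return pasid & ((1 << PASID_PDE_SHIFT) - 1);	/* slot within leaf table */
}

/*
 * In intel_pasid_alloc_table():
 *	count     = pci_max_pasids(...) >> PASID_PDE_SHIFT;	// directory entries
 *	max_pasid = count << PASID_PDE_SHIFT;			// PDE-aligned
 * e.g. a device advertising 100 PASIDs gets one directory entry and
 * max_pasid = 64, i.e. the value is rounded down to a PDE boundary.
 */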
>
>>>
>>>>
>>>> attach_out:
>>>> device_attach_pasid_table(info, pasid_table);
>>>> @@ -162,14 +162,33 @@ int intel_pasid_alloc_table(struct device *dev)
>>>> return 0;
>>>> }
>>>>
>>>> +/* Get PRESENT bit of a PASID directory entry. */
>>>> +static inline bool
>>>> +pasid_pde_is_present(struct pasid_dir_entry *pde)
>>>> +{
>>>> + return READ_ONCE(pde->val) & PASID_PTE_PRESENT;
>>>
>>> curious why adding READ_ONCE specifically for PASID structure,
>>> but not used for any other existing vtd structures? Is it to address
>>> some specific requirement on PASID structure as defined in spec?
>>
>> READ/WRITE_ONCE are used in pasid entry read/write to prevent the
>> compiler from merging, refetching or reordering successive instances of
>> read/write.
>>
>
> that's fine. I'm just curious why this is the first user of such macros
> in intel-iommu driver. Even before with ecs we have PASID table too.
>
Best regards,
Lu Baolu
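As a footnote to the READ_ONCE/WRITE_ONCE question above, a minimal sketch of what the annotation buys compared with a plain access. The structure below is a stand-in for illustration, not the driver's pasid_dir_entry.

#include <linux/compiler.h>
#include <linux/types.h>

struct sample_entry {
	u64 val;
};

/* A plain load may be merged with neighbouring loads, refetched, or
 * reordered by the compiler across successive accesses.
 */
static bool sample_present_plain(struct sample_entry *e)
{
	return e->val & 1;
}

/* READ_ONCE forces exactly one access and keeps successive
 * READ_ONCE/WRITE_ONCE accesses in program order at the compiler level,
 * which matters when hardware walks the same structure concurrently.
 */
static bool sample_present_once(struct sample_entry *e)
{
	return READ_ONCE(e->val) & 1;
}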
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> This adds the interfaces to setup or tear down the structures
> for second level page table translations. This includes types
> of second level only translation and pass through.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Sanjay Kumar <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 2 +-
> drivers/iommu/intel-pasid.c | 246
> ++++++++++++++++++++++++++++++++++++
> drivers/iommu/intel-pasid.h | 7 +
> include/linux/intel-iommu.h | 3 +
> 4 files changed, 257 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index 562da10bf93e..de6b909bb47a 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1232,7 +1232,7 @@ static void iommu_set_root_entry(struct
> intel_iommu *iommu)
> raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
> }
>
> -static void iommu_flush_write_buffer(struct intel_iommu *iommu)
> +void iommu_flush_write_buffer(struct intel_iommu *iommu)
> {
> u32 val;
> unsigned long flag;
> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
> index d6e90cd5b062..edcea1d8b9fc 100644
> --- a/drivers/iommu/intel-pasid.c
> +++ b/drivers/iommu/intel-pasid.c
> @@ -9,6 +9,7 @@
>
> #define pr_fmt(fmt) "DMAR: " fmt
>
> +#include <linux/bitops.h>
> #include <linux/dmar.h>
> #include <linux/intel-iommu.h>
> #include <linux/iommu.h>
> @@ -291,3 +292,248 @@ void intel_pasid_clear_entry(struct device *dev,
> int pasid)
>
> pasid_clear_entry(pe);
> }
> +
> +static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
> +{
> + u64 old;
> +
> + old = READ_ONCE(*ptr);
> + WRITE_ONCE(*ptr, (old & ~mask) | bits);
> +}
> +
> +/*
> + * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
> + * PASID entry.
> + */
> +static inline void
> +pasid_set_domain_id(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
> +}
> +
> +/*
> + * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_address_root(struct pasid_entry *pe, u64 value)
is address_root too general? especially when the entry could contain both
1st level and 2nd level pointers.
> +{
> + pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
> +}
> +
> +/*
> + * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
> + * entry.
> + */
> +static inline void
> +pasid_set_address_width(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
> +}
> +
> +/*
> + * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
> + * of a scalable mode PASID entry.
> + */
> +static inline void
> +pasid_set_translation_type(struct pasid_entry *pe, u64 value)
> +{
> + pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
> +}
> +
> +/*
> + * Enable fault processing by clearing the FPD(Fault Processing
> + * Disable) field (Bit 1) of a scalable mode PASID entry.
> + */
> +static inline void pasid_set_fault_enable(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[0], 1 << 1, 0);
> +}
> +
> +/*
> + * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
> + * scalable mode PASID entry.
> + */
> +static inline void pasid_set_sre(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[2], 1 << 0, 1);
> +}
> +
> +/*
> + * Setup the P(Present) field (Bit 0) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_present(struct pasid_entry *pe)
> +{
> + pasid_set_bits(&pe->val[0], 1 << 0, 1);
> +}
it's a long list and there could be more in the future. What about
defining some macro to simplify LOC, e.g.
#define PASID_SET(name, i, m, b) \
static inline void pasid_set_name(struct pasid_entry *pe) \
{ \
pasid_set_bits(&pe->val[i], m, b); \
}
PASID_SET(present, 0, 1<<0, 1);
PASID_SET(sre, 2, 1<<0, 1);
...
> +
> +/*
> + * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
> + * entry.
> + */
> +static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
> +{
> + pasid_set_bits(&pe->val[1], 1 << 23, value);
> +}
> +
> +static void
> +pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
> + int did, int pasid)
pasid_cache_invalidation_with_pasid
> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
> QI_PC_PASID(pasid);
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +static void
> +pasid_based_iotlb_cache_invalidation(struct intel_iommu *iommu,
> + u16 did, u32 pasid)
iotlb_invalidation_with_pasid
> +{
> + struct qi_desc desc;
> +
> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
> + QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
> QI_EIOTLB_TYPE;
> + desc.qw1 = 0;
> + desc.qw2 = 0;
> + desc.qw3 = 0;
> +
> + qi_submit_sync(&desc, iommu);
> +}
> +
> +static void
> +pasid_based_dev_iotlb_cache_invalidation(struct intel_iommu *iommu,
> + struct device *dev, int pasid)
devtlb_invalidation_with_pasid
> +{
> + struct device_domain_info *info;
> + u16 sid, qdep, pfsid;
> +
> + info = dev->archdata.iommu;
> + if (!info || !info->ats_enabled)
> + return;
> +
> + sid = info->bus << 8 | info->devfn;
> + qdep = info->ats_qdep;
> + pfsid = info->pfsid;
> +
> + qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> VTD_PAGE_SHIFT);
> +}
> +
> +static void tear_down_one_pasid_entry(struct intel_iommu *iommu,
> + struct device *dev, u16 did,
> + int pasid)
> +{
> + struct pasid_entry *pte;
ptep
> +
> + intel_pasid_clear_entry(dev, pasid);
> +
> + if (!ecap_coherent(iommu->ecap)) {
> + pte = intel_pasid_get_entry(dev, pasid);
> + clflush_cache_range(pte, sizeof(*pte));
> + }
> +
> + pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> + pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> +
> + /* Device IOTLB doesn't need to be flushed in caching mode. */
> + if (!cap_caching_mode(iommu->cap))
> + pasid_based_dev_iotlb_cache_invalidation(iommu, dev,
> pasid);
can you elaborate, or point to any spec reference?
> +}
> +
> +/*
> + * Set up the scalable mode pasid table entry for second only or
> + * passthrough translation type.
> + */
> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,
second_level doesn't imply passthrough. What about intel_pasid_setup_common,
which is then invoked by SL or PT individually (or even FL)?
> + struct dmar_domain *domain,
> + struct device *dev, int pasid,
> + bool pass_through)
> +{
> + struct pasid_entry *pte;
> + struct dma_pte *pgd;
> + u64 pgd_val;
> + int agaw;
> + u16 did;
> +
> + /*
> + * If hardware advertises no support for second level translation,
> + * we only allow pass through translation setup.
> + */
> + if (!(ecap_slts(iommu->ecap) || pass_through)) {
> + pr_err("No first level translation support on %s, only pass-
first->second
> through mode allowed\n",
> + iommu->name);
> + return -EINVAL;
> + }
> +
> + /*
> + * Skip top levels of page tables for iommu which has less agaw
skip doesn't mean error
> + * than default. Unnecessary for PT mode.
> + */
> + pgd = domain->pgd;
> + if (!pass_through) {
> + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--)
> {
> + pgd = phys_to_virt(dma_pte_addr(pgd));
> + if (!dma_pte_present(pgd)) {
> + dev_err(dev, "Invalid domain page table\n");
> + return -EINVAL;
> + }
> + }
> + }
> + pgd_val = pass_through ? 0 : virt_to_phys(pgd);
> + did = pass_through ? FLPT_DEFAULT_DID :
> + domain->iommu_did[iommu->seq_id];
> +
> + pte = intel_pasid_get_entry(dev, pasid);
> + if (!pte) {
> + dev_err(dev, "Failed to get pasid entry of PASID %d\n",
> pasid);
> + return -ENODEV;
> + }
> +
> + pasid_clear_entry(pte);
> + pasid_set_domain_id(pte, did);
> +
> + if (!pass_through)
> + pasid_set_address_root(pte, pgd_val);
> +
> + pasid_set_address_width(pte, iommu->agaw);
> + pasid_set_translation_type(pte, pass_through ? 4 : 2);
> + pasid_set_fault_enable(pte);
> + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +
> + /*
> + * Since it is a second level only translation setup, we should
> + * set SRE bit as well (addresses are expected to be GPAs).
> + */
> + pasid_set_sre(pte);
> + pasid_set_present(pte);
> +
> + if (!ecap_coherent(iommu->ecap))
> + clflush_cache_range(pte, sizeof(*pte));
> +
> + if (cap_caching_mode(iommu->cap)) {
> + pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> + pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> + } else {
> + iommu_flush_write_buffer(iommu);
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Tear down the scalable mode pasid table entry for second only or
> + * passthrough translation type.
> + */
> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
> + struct dmar_domain *domain,
> + struct device *dev, int pasid)
> +{
> + u16 did = domain->iommu_did[iommu->seq_id];
> +
> + tear_down_one_pasid_entry(iommu, dev, did, pasid);
> +}
> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
> index 03c1612d173c..85b158a1826a 100644
> --- a/drivers/iommu/intel-pasid.h
> +++ b/drivers/iommu/intel-pasid.h
> @@ -49,5 +49,12 @@ struct pasid_table *intel_pasid_get_table(struct
> device *dev);
> int intel_pasid_get_dev_max_id(struct device *dev);
> struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
> void intel_pasid_clear_entry(struct device *dev, int pasid);
> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,
> + struct dmar_domain *domain,
> + struct device *dev, int pasid,
> + bool pass_through);
> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
> + struct dmar_domain *domain,
> + struct device *dev, int pasid);
>
> #endif /* __INTEL_PASID_H */
> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
> index 72aff482b293..d77d23dfd221 100644
> --- a/include/linux/intel-iommu.h
> +++ b/include/linux/intel-iommu.h
> @@ -115,6 +115,8 @@
> * Extended Capability Register
> */
>
> +#define ecap_smpwc(e) (((e) >> 48) & 0x1)
> +#define ecap_slts(e) (((e) >> 46) & 0x1)
> #define ecap_smts(e) (((e) >> 43) & 0x1)
> #define ecap_dit(e) ((e >> 41) & 0x1)
> #define ecap_pasid(e) ((e >> 40) & 0x1)
> @@ -571,6 +573,7 @@ void free_pgtable_page(void *vaddr);
> struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
> int for_each_device_domain(int (*fn)(struct device_domain_info *info,
> void *data), void *data);
> +void iommu_flush_write_buffer(struct intel_iommu *iommu);
>
> #ifdef CONFIG_INTEL_IOMMU_SVM
> int intel_svm_init(struct intel_iommu *iommu);
> --
> 2.17.1
> From: Lu Baolu [mailto:[email protected]]
> Sent: Thursday, August 30, 2018 9:35 AM
>
> So that the pasid related info, such as the pasid table and the
> maximum of pasid could be used during setting up scalable mode
> context.
>
> Cc: Ashok Raj <[email protected]>
> Cc: Jacob Pan <[email protected]>
> Cc: Kevin Tian <[email protected]>
> Cc: Liu Yi L <[email protected]>
> Signed-off-by: Lu Baolu <[email protected]>
> Reviewed-by: Ashok Raj <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
> ---
> drivers/iommu/intel-iommu.c | 14 +++++++++++---
> 1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> index c3bf2ccf094d..33642dd3d6ba 100644
> --- a/drivers/iommu/intel-iommu.c
> +++ b/drivers/iommu/intel-iommu.c
> @@ -1942,6 +1942,7 @@ static void domain_exit(struct dmar_domain
> *domain)
>
> static int domain_context_mapping_one(struct dmar_domain *domain,
> struct intel_iommu *iommu,
> + struct pasid_table *table,
> u8 bus, u8 devfn)
> {
> u16 did = domain->iommu_did[iommu->seq_id];
> @@ -2064,6 +2065,7 @@ static int domain_context_mapping_one(struct
> dmar_domain *domain,
> struct domain_context_mapping_data {
> struct dmar_domain *domain;
> struct intel_iommu *iommu;
> + struct pasid_table *table;
> };
>
> static int domain_context_mapping_cb(struct pci_dev *pdev,
> @@ -2072,25 +2074,31 @@ static int domain_context_mapping_cb(struct
> pci_dev *pdev,
> struct domain_context_mapping_data *data = opaque;
>
> return domain_context_mapping_one(data->domain, data-
> >iommu,
> - PCI_BUS_NUM(alias), alias & 0xff);
> + data->table, PCI_BUS_NUM(alias),
> + alias & 0xff);
> }
>
> static int
> domain_context_mapping(struct dmar_domain *domain, struct device
> *dev)
> {
> + struct domain_context_mapping_data data;
> + struct pasid_table *table;
> struct intel_iommu *iommu;
> u8 bus, devfn;
> - struct domain_context_mapping_data data;
>
> iommu = device_to_iommu(dev, &bus, &devfn);
> if (!iommu)
> return -ENODEV;
>
> + table = intel_pasid_get_table(dev);
> +
> if (!dev_is_pci(dev))
> - return domain_context_mapping_one(domain, iommu, bus,
> devfn);
> + return domain_context_mapping_one(domain, iommu,
> table,
> + bus, devfn);
>
> data.domain = domain;
> data.iommu = iommu;
> + data.table = table;
>
> return pci_for_each_dma_alias(to_pci_dev(dev),
> &domain_context_mapping_cb, &data);
> --
> 2.17.1
On Thu, 6 Sep 2018 10:46:03 +0800
Lu Baolu <[email protected]> wrote:
> >> @@ -224,7 +271,14 @@ struct pasid_entry
> >> *intel_pasid_get_entry(struct device *dev, int pasid)
> >> */
> >> static inline void pasid_clear_entry(struct pasid_entry *pe)
> >> {
> >> - WRITE_ONCE(pe->val, 0);
> >> + WRITE_ONCE(pe->val[0], 0);
> >> + WRITE_ONCE(pe->val[1], 0);
> >> + WRITE_ONCE(pe->val[2], 0);
> >> + WRITE_ONCE(pe->val[3], 0);
> >> + WRITE_ONCE(pe->val[4], 0);
> >> + WRITE_ONCE(pe->val[5], 0);
> >> + WRITE_ONCE(pe->val[6], 0);
> >> + WRITE_ONCE(pe->val[7], 0);
> >
> > memset?
>
> The order is important here. Otherwise, the PRESENT bit of this pasid
> entry might still set while other fields contains invalid values.
WRITE_ONCE/READ_ONCE will switch to __builtin_memcpy() if the size
exceeds the word size, i.e. 64 bits in this case. I don't think the
compiler will reorder a built-in function. Besides, we only need to clear
the present and FPD bits, right?
Hi,
On 09/06/2018 11:11 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> This adds the interfaces to setup or tear down the structures
>> for second level page table translations. This includes types
>> of second level only translation and pass through.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Ashok Raj <[email protected]>
>> ---
>> drivers/iommu/intel-iommu.c | 2 +-
>> drivers/iommu/intel-pasid.c | 246
>> ++++++++++++++++++++++++++++++++++++
>> drivers/iommu/intel-pasid.h | 7 +
>> include/linux/intel-iommu.h | 3 +
>> 4 files changed, 257 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index 562da10bf93e..de6b909bb47a 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1232,7 +1232,7 @@ static void iommu_set_root_entry(struct
>> intel_iommu *iommu)
>> raw_spin_unlock_irqrestore(&iommu->register_lock, flag);
>> }
>>
>> -static void iommu_flush_write_buffer(struct intel_iommu *iommu)
>> +void iommu_flush_write_buffer(struct intel_iommu *iommu)
>> {
>> u32 val;
>> unsigned long flag;
>> diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
>> index d6e90cd5b062..edcea1d8b9fc 100644
>> --- a/drivers/iommu/intel-pasid.c
>> +++ b/drivers/iommu/intel-pasid.c
>> @@ -9,6 +9,7 @@
>>
>> #define pr_fmt(fmt) "DMAR: " fmt
>>
>> +#include <linux/bitops.h>
>> #include <linux/dmar.h>
>> #include <linux/intel-iommu.h>
>> #include <linux/iommu.h>
>> @@ -291,3 +292,248 @@ void intel_pasid_clear_entry(struct device *dev,
>> int pasid)
>>
>> pasid_clear_entry(pe);
>> }
>> +
>> +static inline void pasid_set_bits(u64 *ptr, u64 mask, u64 bits)
>> +{
>> + u64 old;
>> +
>> + old = READ_ONCE(*ptr);
>> + WRITE_ONCE(*ptr, (old & ~mask) | bits);
>> +}
>> +
>> +/*
>> + * Setup the DID(Domain Identifier) field (Bit 64~79) of scalable mode
>> + * PASID entry.
>> + */
>> +static inline void
>> +pasid_set_domain_id(struct pasid_entry *pe, u64 value)
>> +{
>> + pasid_set_bits(&pe->val[1], GENMASK_ULL(15, 0), value);
>> +}
>> +
>> +/*
>> + * Setup the SLPTPTR(Second Level Page Table Pointer) field (Bit 12~63)
>> + * of a scalable mode PASID entry.
>> + */
>> +static inline void
>> +pasid_set_address_root(struct pasid_entry *pe, u64 value)
>
> is address_root too general? especially when the entry could contain both
> 1st level and 2nd level pointers.
>
Yes. Should be changed to a specific name like pasid_set_slpt_ptr().
>> +{
>> + pasid_set_bits(&pe->val[0], VTD_PAGE_MASK, value);
>> +}
>> +
>> +/*
>> + * Setup the AW(Address Width) field (Bit 2~4) of a scalable mode PASID
>> + * entry.
>> + */
>> +static inline void
>> +pasid_set_address_width(struct pasid_entry *pe, u64 value)
>> +{
>> + pasid_set_bits(&pe->val[0], GENMASK_ULL(4, 2), value << 2);
>> +}
>> +
>> +/*
>> + * Setup the PGTT(PASID Granular Translation Type) field (Bit 6~8)
>> + * of a scalable mode PASID entry.
>> + */
>> +static inline void
>> +pasid_set_translation_type(struct pasid_entry *pe, u64 value)
>> +{
>> + pasid_set_bits(&pe->val[0], GENMASK_ULL(8, 6), value << 6);
>> +}
>> +
>> +/*
>> + * Enable fault processing by clearing the FPD(Fault Processing
>> + * Disable) field (Bit 1) of a scalable mode PASID entry.
>> + */
>> +static inline void pasid_set_fault_enable(struct pasid_entry *pe)
>> +{
>> + pasid_set_bits(&pe->val[0], 1 << 1, 0);
>> +}
>> +
>> +/*
>> + * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
>> + * scalable mode PASID entry.
>> + */
>> +static inline void pasid_set_sre(struct pasid_entry *pe)
>> +{
>> + pasid_set_bits(&pe->val[2], 1 << 0, 1);
>> +}
>> +
>> +/*
>> + * Setup the P(Present) field (Bit 0) of a scalable mode PASID
>> + * entry.
>> + */
>> +static inline void pasid_set_present(struct pasid_entry *pe)
>> +{
>> + pasid_set_bits(&pe->val[0], 1 << 0, 1);
>> +}
>
> it's a long list and there could be more in the future. What about
> defining some macro to simplify LOC, e.g.
>
> #define PASID_SET(name, i, m, b) \
> static inline void pasid_set_name(struct pasid_entry *pe) \
> { \
> pasid_set_bits(&pe->val[i], m, b); \
> }
>
> PASID_SET(present, 0, 1<<0, 1);
> PASID_SET(sre, 2, 1<<0, 1);
> ...
>
Fair enough. This looks more concise.
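For reference, a sketch of what the suggested generator could look like with the token pasting spelled out. The field list and the second, value-taking variant are illustrative assumptions rather than the final patch.

/* Flag-style fields: fixed mask and fixed bit pattern. */
#define PASID_ENTRY_SET_FLAG(name, i, mask, bits)			\
static inline void pasid_set_##name(struct pasid_entry *pe)		\
{									\
	pasid_set_bits(&pe->val[(i)], (mask), (bits));			\
}

PASID_ENTRY_SET_FLAG(present,      0, 1ULL << 0, 1);
PASID_ENTRY_SET_FLAG(sre,          2, 1ULL << 0, 1);
PASID_ENTRY_SET_FLAG(fault_enable, 0, 1ULL << 1, 0);	/* clears FPD */

/* Value-carrying fields (DID, AW, PGTT, ...) need a value argument. */
#define PASID_ENTRY_SET_FIELD(name, i, mask, shift)			\
static inline void pasid_set_##name(struct pasid_entry *pe, u64 value)	\
{									\
	pasid_set_bits(&pe->val[(i)], (mask), (value) << (shift));	\
}

PASID_ENTRY_SET_FIELD(domain_id,     1, GENMASK_ULL(15, 0), 0);
PASID_ENTRY_SET_FIELD(address_width, 0, GENMASK_ULL(4, 2),  2);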
>> +
>> +/*
>> + * Setup Page Walk Snoop bit (Bit 87) of a scalable mode PASID
>> + * entry.
>> + */
>> +static inline void pasid_set_page_snoop(struct pasid_entry *pe, bool value)
>> +{
>> + pasid_set_bits(&pe->val[1], 1 << 23, value);
>> +}
>> +
>> +static void
>> +pasid_based_pasid_cache_invalidation(struct intel_iommu *iommu,
>> + int did, int pasid)
>
> pasid_cache_invalidation_with_pasid
Okay.
>
>> +{
>> + struct qi_desc desc;
>> +
>> + desc.qw0 = QI_PC_DID(did) | QI_PC_PASID_SEL |
>> QI_PC_PASID(pasid);
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>> +
>> + qi_submit_sync(&desc, iommu);
>> +}
>> +
>> +static void
>> +pasid_based_iotlb_cache_invalidation(struct intel_iommu *iommu,
>> + u16 did, u32 pasid)
>
> iotlb_invalidation_with_pasid
Okay.
>
>> +{
>> + struct qi_desc desc;
>> +
>> + desc.qw0 = QI_EIOTLB_PASID(pasid) | QI_EIOTLB_DID(did) |
>> + QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
>> QI_EIOTLB_TYPE;
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>> +
>> + qi_submit_sync(&desc, iommu);
>> +}
>> +
>> +static void
>> +pasid_based_dev_iotlb_cache_invalidation(struct intel_iommu *iommu,
>> + struct device *dev, int pasid)
>
> devtlb_invalidation_with_pasid
Okay.
>
>> +{
>> + struct device_domain_info *info;
>> + u16 sid, qdep, pfsid;
>> +
>> + info = dev->archdata.iommu;
>> + if (!info || !info->ats_enabled)
>> + return;
>> +
>> + sid = info->bus << 8 | info->devfn;
>> + qdep = info->ats_qdep;
>> + pfsid = info->pfsid;
>> +
>> + qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
>> VTD_PAGE_SHIFT);
>> +}
>> +
>> +static void tear_down_one_pasid_entry(struct intel_iommu *iommu,
>> + struct device *dev, u16 did,
>> + int pasid)
>> +{
>> + struct pasid_entry *pte;
>
> ptep
>
Okay.
>> +
>> + intel_pasid_clear_entry(dev, pasid);
>> +
>> + if (!ecap_coherent(iommu->ecap)) {
>> + pte = intel_pasid_get_entry(dev, pasid);
>> + clflush_cache_range(pte, sizeof(*pte));
>> + }
>> +
>> + pasid_based_pasid_cache_invalidation(iommu, did, pasid);
>> + pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
>> +
>> + /* Device IOTLB doesn't need to be flushed in caching mode. */
>> + if (!cap_caching_mode(iommu->cap))
>> + pasid_based_dev_iotlb_cache_invalidation(iommu, dev,
>> pasid);
>
> can you elaborate, or point to any spec reference?
>
In the driver, the device IOTLB doesn't get flushed in caching mode. I just
follow what has been done there.
It also makes sense to me, since only the bare metal host needs to
consider whether and how to flush the device IOTLB.
>> +}
>> +
>> +/*
>> + * Set up the scalable mode pasid table entry for second only or
>> + * passthrough translation type.
>> + */
>> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>
> second_level doesn't imply passthrough. what about intel_pasid_
> setup_common, which is then invoked by SL or PT individually (
> or even FL)?
Fair enough. Will refine this part of the code.
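A minimal sketch of how that split could look; the name, parameter list, and division of labour below are assumptions, not the final interface.

/*
 * Sketch only: program the fields shared by the second-level and
 * pass-through setup paths. The caller adds the type-specific bits
 * (e.g. the SLPTPTR for second-level), sets Present last, and then
 * performs the necessary cache flushing/invalidation.
 */
static void pasid_setup_common(struct intel_iommu *iommu,
			       struct pasid_entry *pte, u16 did, u64 pgtt)
{
	pasid_clear_entry(pte);
	pasid_set_domain_id(pte, did);
	pasid_set_address_width(pte, iommu->agaw);
	pasid_set_translation_type(pte, pgtt);
	pasid_set_fault_enable(pte);
	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
	pasid_set_sre(pte);
}

intel_pasid_setup_second_level() would then add pasid_set_slpt_ptr() before setting Present, while intel_pasid_setup_pass_through() would use FLPT_DEFAULT_DID and no page-table pointer.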
>
>> + struct dmar_domain *domain,
>> + struct device *dev, int pasid,
>> + bool pass_through)
>> +{
>> + struct pasid_entry *pte;
>> + struct dma_pte *pgd;
>> + u64 pgd_val;
>> + int agaw;
>> + u16 did;
>> +
>> + /*
>> + * If hardware advertises no support for second level translation,
>> + * we only allow pass through translation setup.
>> + */
>> + if (!(ecap_slts(iommu->ecap) || pass_through)) {
>> + pr_err("No first level translation support on %s, only pass-
>
> first->second
Sure.
>
>> through mode allowed\n",
>> + iommu->name);
>> + return -EINVAL;
>> + }
>> +
>> + /*
>> + * Skip top levels of page tables for iommu which has less agaw
>
> skip doesn't mean error
Yes. But it's an error if we can't skip ... :-)
>
>> + * than default. Unnecessary for PT mode.
>> + */
>> + pgd = domain->pgd;
>> + if (!pass_through) {
>> + for (agaw = domain->agaw; agaw != iommu->agaw; agaw--)
>> {
>> + pgd = phys_to_virt(dma_pte_addr(pgd));
>> + if (!dma_pte_present(pgd)) {
>> + dev_err(dev, "Invalid domain page table\n");
>> + return -EINVAL;
>> + }
>> + }
>> + }
>> + pgd_val = pass_through ? 0 : virt_to_phys(pgd);
>> + did = pass_through ? FLPT_DEFAULT_DID :
>> + domain->iommu_did[iommu->seq_id];
>> +
>> + pte = intel_pasid_get_entry(dev, pasid);
>> + if (!pte) {
>> + dev_err(dev, "Failed to get pasid entry of PASID %d\n",
>> pasid);
>> + return -ENODEV;
>> + }
>> +
>> + pasid_clear_entry(pte);
>> + pasid_set_domain_id(pte, did);
>> +
>> + if (!pass_through)
>> + pasid_set_address_root(pte, pgd_val);
>> +
>> + pasid_set_address_width(pte, iommu->agaw);
>> + pasid_set_translation_type(pte, pass_through ? 4 : 2);
>> + pasid_set_fault_enable(pte);
>> + pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>> +
>> + /*
>> + * Since it is a second level only translation setup, we should
>> + * set SRE bit as well (addresses are expected to be GPAs).
>> + */
>> + pasid_set_sre(pte);
>> + pasid_set_present(pte);
>> +
>> + if (!ecap_coherent(iommu->ecap))
>> + clflush_cache_range(pte, sizeof(*pte));
>> +
>> + if (cap_caching_mode(iommu->cap)) {
>> + pasid_based_pasid_cache_invalidation(iommu, did, pasid);
>> + pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
>> + } else {
>> + iommu_flush_write_buffer(iommu);
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +/*
>> + * Tear down the scalable mode pasid table entry for second only or
>> + * passthrough translation type.
>> + */
>> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
>> + struct dmar_domain *domain,
>> + struct device *dev, int pasid)
>> +{
>> + u16 did = domain->iommu_did[iommu->seq_id];
>> +
>> + tear_down_one_pasid_entry(iommu, dev, did, pasid);
>> +}
>> diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
>> index 03c1612d173c..85b158a1826a 100644
>> --- a/drivers/iommu/intel-pasid.h
>> +++ b/drivers/iommu/intel-pasid.h
>> @@ -49,5 +49,12 @@ struct pasid_table *intel_pasid_get_table(struct
>> device *dev);
>> int intel_pasid_get_dev_max_id(struct device *dev);
>> struct pasid_entry *intel_pasid_get_entry(struct device *dev, int pasid);
>> void intel_pasid_clear_entry(struct device *dev, int pasid);
>> +int intel_pasid_setup_second_level(struct intel_iommu *iommu,
>> + struct dmar_domain *domain,
>> + struct device *dev, int pasid,
>> + bool pass_through);
>> +void intel_pasid_tear_down_second_level(struct intel_iommu *iommu,
>> + struct dmar_domain *domain,
>> + struct device *dev, int pasid);
>>
>> #endif /* __INTEL_PASID_H */
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index 72aff482b293..d77d23dfd221 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -115,6 +115,8 @@
>> * Extended Capability Register
>> */
>>
>> +#define ecap_smpwc(e) (((e) >> 48) & 0x1)
>> +#define ecap_slts(e) (((e) >> 46) & 0x1)
>> #define ecap_smts(e) (((e) >> 43) & 0x1)
>> #define ecap_dit(e) ((e >> 41) & 0x1)
>> #define ecap_pasid(e) ((e >> 40) & 0x1)
>> @@ -571,6 +573,7 @@ void free_pgtable_page(void *vaddr);
>> struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
>> int for_each_device_domain(int (*fn)(struct device_domain_info *info,
>> void *data), void *data);
>> +void iommu_flush_write_buffer(struct intel_iommu *iommu);
>>
>> #ifdef CONFIG_INTEL_IOMMU_SVM
>> int intel_svm_init(struct intel_iommu *iommu);
>> --
>> 2.17.1
>
>
Best regards,
Lu Baolu
Hi,
On 09/07/2018 07:43 AM, Jacob Pan wrote:
> On Thu, 6 Sep 2018 10:46:03 +0800
> Lu Baolu <[email protected]> wrote:
>
>>>> @@ -224,7 +271,14 @@ struct pasid_entry
>>>> *intel_pasid_get_entry(struct device *dev, int pasid)
>>>> */
>>>> static inline void pasid_clear_entry(struct pasid_entry *pe)
>>>> {
>>>> - WRITE_ONCE(pe->val, 0);
>>>> + WRITE_ONCE(pe->val[0], 0);
>>>> + WRITE_ONCE(pe->val[1], 0);
>>>> + WRITE_ONCE(pe->val[2], 0);
>>>> + WRITE_ONCE(pe->val[3], 0);
>>>> + WRITE_ONCE(pe->val[4], 0);
>>>> + WRITE_ONCE(pe->val[5], 0);
>>>> + WRITE_ONCE(pe->val[6], 0);
>>>> + WRITE_ONCE(pe->val[7], 0);
>>>
>>> memset?
>>
>> The order is important here. Otherwise, the PRESENT bit of this pasid
>> entry might still set while other fields contains invalid values.
>
> WRITE_ONCE/READ_ONCE will switch to __builtin_memcpy() if the size
> exceeds the word size, i.e. 64 bits in this case. I don't think the
> compiler will reorder a built-in function. Besides, we only need to clear
> the present and FPD bits, right?
Clearing the present and FPD bits is enough for the hardware. But from a
software point of view, it's better to clear all bits to 0.
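For illustration, a sketch of how the ordering argument can be made explicit in the clear helper. The loop form is an assumption, not the exact hunk under review.

/*
 * Sketch: zero the qword that holds P (and FPD) first, so hardware never
 * observes Present=1 alongside half-cleared fields, then zero the rest.
 * WRITE_ONCE keeps the compiler from merging or reordering the stores.
 */
static inline void pasid_clear_entry(struct pasid_entry *pe)
{
	int i;

	WRITE_ONCE(pe->val[0], 0);
	for (i = 1; i < ARRAY_SIZE(pe->val); i++)
		WRITE_ONCE(pe->val[i], 0);
}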
Best regards,
Lu Baolu
Hi,
On 09/06/2018 10:39 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> Intel vt-d spec rev3.0 requires software to use 256-bit
>> descriptors in invalidation queue. As the spec reads in
>> section 6.5.2:
>>
>> Remapping hardware supporting Scalable Mode Translations
>> (ECAP_REG.SMTS=1) allow software to additionally program
>> the width of the descriptors (128-bits or 256-bits) that
>> will be written into the Queue. Software should setup the
>> Invalidation Queue for 256-bit descriptors before progra-
>> mming remapping hardware for scalable-mode translation as
>> 128-bit descriptors are treated as invalid descriptors
>> (see Table 21 in Section 6.5.2.10) in scalable-mode.
>>
>> This patch adds 256-bit invalidation descriptor support
>> if the hardware presents scalable mode capability.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Sanjay Kumar <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> ---
>> drivers/iommu/dmar.c | 83 +++++++++++++++++++----------
>> drivers/iommu/intel-svm.c | 76 ++++++++++++++++----------
>> drivers/iommu/intel_irq_remapping.c | 6 ++-
>> include/linux/intel-iommu.h | 7 ++-
>> 4 files changed, 113 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>> index d9c748b6f9e4..b1429fa2cf29 100644
>> --- a/drivers/iommu/dmar.c
>> +++ b/drivers/iommu/dmar.c
>> @@ -1160,6 +1160,7 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index)
>> int head, tail;
>> struct q_inval *qi = iommu->qi;
>> int wait_index = (index + 1) % QI_LENGTH;
>> + int shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
>
> could add a new macro: qi_shift()
Fair enough.
>
>>
>> if (qi->desc_status[wait_index] == QI_ABORT)
>> return -EAGAIN;
>> @@ -1173,13 +1174,15 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index)
>> */
>> if (fault & DMA_FSTS_IQE) {
>> head = readl(iommu->reg + DMAR_IQH_REG);
>> - if ((head >> DMAR_IQ_SHIFT) == index) {
>> + if ((head >> shift) == index) {
>
> could be another macro: qi_index(head)
Fair enough.
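To make the two suggestions concrete, a sketch of what the helpers could look like; the definitions are assumed here, not copied from a posted patch.

/*
 * Sketch: descriptors are 16 bytes (DMAR_IQ_SHIFT == 4) in legacy mode and
 * 32 bytes when scalable mode is supported; the IQH/IQT registers carry a
 * byte offset, so converting to a descriptor index means shifting it down.
 */
#define qi_shift(iommu)		(DMAR_IQ_SHIFT + !!ecap_smts((iommu)->ecap))
#define qi_index(iommu, reg)	((reg) >> qi_shift(iommu))

qi_check_fault() could then compare qi_index(iommu, readl(iommu->reg + DMAR_IQH_REG)) against index instead of open-coding the shift in several places.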
>
>> + struct qi_desc *desc = qi->desc + head;
>> +
>> pr_err("VT-d detected invalid descriptor: "
>> "low=%llx, high=%llx\n",
>> - (unsigned long long)qi->desc[index].low,
>> - (unsigned long long)qi->desc[index].high);
>> - memcpy(&qi->desc[index], &qi->desc[wait_index],
>> - sizeof(struct qi_desc));
>> + (unsigned long long)desc->qw0,
>> + (unsigned long long)desc->qw1);
>
> what about qw2 and qw3 in 256-bit case?
Should print qw2 and qw3 as well.
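Something along these lines, assuming desc already points at the offending descriptor as in the hunk above; the message wording and the guard on the descriptor width are illustrative.

/* Sketch: dump the upper half only when 256-bit descriptors are in use,
 * since in 128-bit mode those bytes belong to the next descriptor.
 */
if (ecap_smts(iommu->ecap))
	pr_err("VT-d detected invalid descriptor: qw0 = %llx, qw1 = %llx, qw2 = %llx, qw3 = %llx\n",
	       (unsigned long long)desc->qw0, (unsigned long long)desc->qw1,
	       (unsigned long long)desc->qw2, (unsigned long long)desc->qw3);
else
	pr_err("VT-d detected invalid descriptor: qw0 = %llx, qw1 = %llx\n",
	       (unsigned long long)desc->qw0, (unsigned long long)desc->qw1);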
>
>> + memcpy(desc, qi->desc + (wait_index << shift),
>> + 1 << shift);
>> writel(DMA_FSTS_IQE, iommu->reg +
>> DMAR_FSTS_REG);
>> return -EINVAL;
>> }
>> @@ -1191,10 +1194,10 @@ static int qi_check_fault(struct intel_iommu
>> *iommu, int index)
>> */
>> if (fault & DMA_FSTS_ITE) {
>> head = readl(iommu->reg + DMAR_IQH_REG);
>> - head = ((head >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
>> QI_LENGTH;
>> + head = ((head >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>> head |= 1;
>> tail = readl(iommu->reg + DMAR_IQT_REG);
>> - tail = ((tail >> DMAR_IQ_SHIFT) - 1 + QI_LENGTH) %
>> QI_LENGTH;
>> + tail = ((tail >> shift) - 1 + QI_LENGTH) % QI_LENGTH;
>>
>> writel(DMA_FSTS_ITE, iommu->reg + DMAR_FSTS_REG);
>>
>> @@ -1222,15 +1225,14 @@ int qi_submit_sync(struct qi_desc *desc, struct
>> intel_iommu *iommu)
>> {
>> int rc;
>> struct q_inval *qi = iommu->qi;
>> - struct qi_desc *hw, wait_desc;
>> + int offset, shift, length;
>> + struct qi_desc wait_desc;
>> int wait_index, index;
>> unsigned long flags;
>>
>> if (!qi)
>> return 0;
>>
>> - hw = qi->desc;
>> -
>> restart:
>> rc = 0;
>>
>> @@ -1243,16 +1245,21 @@ int qi_submit_sync(struct qi_desc *desc, struct
>> intel_iommu *iommu)
>>
>> index = qi->free_head;
>> wait_index = (index + 1) % QI_LENGTH;
>> + shift = DMAR_IQ_SHIFT + !!ecap_smts(iommu->ecap);
>> + length = 1 << shift;
>>
>> qi->desc_status[index] = qi->desc_status[wait_index] = QI_IN_USE;
>>
>> - hw[index] = *desc;
>> -
>> - wait_desc.low = QI_IWD_STATUS_DATA(QI_DONE) |
>> + offset = index << shift;
>> + memcpy(qi->desc + offset, desc, length);
>> + wait_desc.qw0 = QI_IWD_STATUS_DATA(QI_DONE) |
>> QI_IWD_STATUS_WRITE | QI_IWD_TYPE;
>> - wait_desc.high = virt_to_phys(&qi->desc_status[wait_index]);
>> + wait_desc.qw1 = virt_to_phys(&qi->desc_status[wait_index]);
>> + wait_desc.qw2 = 0;
>> + wait_desc.qw3 = 0;
>>
>> - hw[wait_index] = wait_desc;
>> + offset = wait_index << shift;
>> + memcpy(qi->desc + offset, &wait_desc, length);
>>
>> qi->free_head = (qi->free_head + 2) % QI_LENGTH;
>> qi->free_cnt -= 2;
>> @@ -1261,7 +1268,7 @@ int qi_submit_sync(struct qi_desc *desc, struct
>> intel_iommu *iommu)
>> * update the HW tail register indicating the presence of
>> * new descriptors.
>> */
>> - writel(qi->free_head << DMAR_IQ_SHIFT, iommu->reg +
>> DMAR_IQT_REG);
>> + writel(qi->free_head << shift, iommu->reg + DMAR_IQT_REG);
>>
>> while (qi->desc_status[wait_index] != QI_DONE) {
>> /*
>> @@ -1298,8 +1305,10 @@ void qi_global_iec(struct intel_iommu *iommu)
>> {
>> struct qi_desc desc;
>>
>> - desc.low = QI_IEC_TYPE;
>> - desc.high = 0;
>> + desc.qw0 = QI_IEC_TYPE;
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> /* should never fail */
>> qi_submit_sync(&desc, iommu);
>> @@ -1310,9 +1319,11 @@ void qi_flush_context(struct intel_iommu
>> *iommu, u16 did, u16 sid, u8 fm,
>> {
>> struct qi_desc desc;
>>
>> - desc.low = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
>> + desc.qw0 = QI_CC_FM(fm) | QI_CC_SID(sid) | QI_CC_DID(did)
>> | QI_CC_GRAN(type) | QI_CC_TYPE;
>> - desc.high = 0;
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> qi_submit_sync(&desc, iommu);
>> }
>> @@ -1331,10 +1342,12 @@ void qi_flush_iotlb(struct intel_iommu
>> *iommu, u16 did, u64 addr,
>> if (cap_read_drain(iommu->cap))
>> dr = 1;
>>
>> - desc.low = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) |
>> QI_IOTLB_DW(dw)
>> + desc.qw0 = QI_IOTLB_DID(did) | QI_IOTLB_DR(dr) |
>> QI_IOTLB_DW(dw)
>> | QI_IOTLB_GRAN(type) | QI_IOTLB_TYPE;
>> - desc.high = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
>> + desc.qw1 = QI_IOTLB_ADDR(addr) | QI_IOTLB_IH(ih)
>> | QI_IOTLB_AM(size_order);
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> qi_submit_sync(&desc, iommu);
>> }
>> @@ -1347,15 +1360,17 @@ void qi_flush_dev_iotlb(struct intel_iommu
>> *iommu, u16 sid, u16 pfsid,
>> if (mask) {
>> WARN_ON_ONCE(addr & ((1ULL << (VTD_PAGE_SHIFT +
>> mask)) - 1));
>> addr |= (1ULL << (VTD_PAGE_SHIFT + mask - 1)) - 1;
>> - desc.high = QI_DEV_IOTLB_ADDR(addr) |
>> QI_DEV_IOTLB_SIZE;
>> + desc.qw1 = QI_DEV_IOTLB_ADDR(addr) |
>> QI_DEV_IOTLB_SIZE;
>> } else
>> - desc.high = QI_DEV_IOTLB_ADDR(addr);
>> + desc.qw1 = QI_DEV_IOTLB_ADDR(addr);
>>
>> if (qdep >= QI_DEV_IOTLB_MAX_INVS)
>> qdep = 0;
>>
>> - desc.low = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
>> + desc.qw0 = QI_DEV_IOTLB_SID(sid) | QI_DEV_IOTLB_QDEP(qdep) |
>> QI_DIOTLB_TYPE | QI_DEV_IOTLB_PFSID(pfsid);
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> qi_submit_sync(&desc, iommu);
>> }
>> @@ -1403,16 +1418,24 @@ static void __dmar_enable_qi(struct
>> intel_iommu *iommu)
>> u32 sts;
>> unsigned long flags;
>> struct q_inval *qi = iommu->qi;
>> + u64 val = virt_to_phys(qi->desc);
>>
>> qi->free_head = qi->free_tail = 0;
>> qi->free_cnt = QI_LENGTH;
>>
>> + /*
>> + * Set DW=1 and QS=1 in IQA_REG when Scalable Mode capability
>> + * is present.
>> + */
>> + if (ecap_smts(iommu->ecap))
>> + val |= (1 << 11) | 1;
>> +
>> raw_spin_lock_irqsave(&iommu->register_lock, flags);
>>
>> /* write zero to the tail reg */
>> writel(0, iommu->reg + DMAR_IQT_REG);
>>
>> - dmar_writeq(iommu->reg + DMAR_IQA_REG, virt_to_phys(qi-
>>> desc));
>> + dmar_writeq(iommu->reg + DMAR_IQA_REG, val);
>>
>> iommu->gcmd |= DMA_GCMD_QIE;
>> writel(iommu->gcmd, iommu->reg + DMAR_GCMD_REG);
>> @@ -1448,8 +1471,12 @@ int dmar_enable_qi(struct intel_iommu
>> *iommu)
>>
>> qi = iommu->qi;
>>
>> -
>> - desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC |
>> __GFP_ZERO, 0);
>> + /*
>> + * Need two pages to accommodate 256 descriptors of 256 bits each
>> + * if the remapping hardware supports scalable mode translation.
>> + */
>> + desc_page = alloc_pages_node(iommu->node, GFP_ATOMIC |
>> __GFP_ZERO,
>> + !!ecap_smts(iommu->ecap));
>> if (!desc_page) {
>> kfree(qi);
>> iommu->qi = NULL;
>> diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
>> index 6c0bd9ee9602..a06ed098e928 100644
>> --- a/drivers/iommu/intel-svm.c
>> +++ b/drivers/iommu/intel-svm.c
>> @@ -161,27 +161,40 @@ static void intel_flush_svm_range_dev (struct
>> intel_svm *svm, struct intel_svm_d
>> * because that's the only option the hardware gives us.
>> Despite
>> * the fact that they are actually only accessible through one.
>> */
>> if (gl)
>> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
>> QI_EIOTLB_DID(sdev->did) |
>> - QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) |
>> QI_EIOTLB_TYPE;
>> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>> + QI_EIOTLB_DID(sdev->did) |
>> +
>> QI_EIOTLB_GRAN(QI_GRAN_ALL_ALL) |
>> + QI_EIOTLB_TYPE;
>> else
>> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
>> QI_EIOTLB_DID(sdev->did) |
>> - QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID)
>> | QI_EIOTLB_TYPE;
>> - desc.high = 0;
>> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>> + QI_EIOTLB_DID(sdev->did) |
>> +
>> QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
>> + QI_EIOTLB_TYPE;
>> + desc.qw1 = 0;
>> } else {
>> int mask = ilog2(__roundup_pow_of_two(pages));
>>
>> - desc.low = QI_EIOTLB_PASID(svm->pasid) |
>> QI_EIOTLB_DID(sdev->did) |
>> - QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
>> QI_EIOTLB_TYPE;
>> - desc.high = QI_EIOTLB_ADDR(address) | QI_EIOTLB_GL(gl) |
>> - QI_EIOTLB_IH(ih) | QI_EIOTLB_AM(mask);
>> + desc.qw0 = QI_EIOTLB_PASID(svm->pasid) |
>> + QI_EIOTLB_DID(sdev->did) |
>> + QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
>> + QI_EIOTLB_TYPE;
>> + desc.qw1 = QI_EIOTLB_ADDR(address) |
>> + QI_EIOTLB_GL(gl) |
>> + QI_EIOTLB_IH(ih) |
>> + QI_EIOTLB_AM(mask);
>> }
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>> qi_submit_sync(&desc, svm->iommu);
>>
>> if (sdev->dev_iotlb) {
>> - desc.low = QI_DEV_EIOTLB_PASID(svm->pasid) |
>> QI_DEV_EIOTLB_SID(sdev->sid) |
>> - QI_DEV_EIOTLB_QDEP(sdev->qdep) |
>> QI_DEIOTLB_TYPE;
>> + desc.qw0 = QI_DEV_EIOTLB_PASID(svm->pasid) |
>> + QI_DEV_EIOTLB_SID(sdev->sid) |
>> + QI_DEV_EIOTLB_QDEP(sdev->qdep) |
>> + QI_DEIOTLB_TYPE;
>> if (pages == -1) {
>> - desc.high = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
>> QI_DEV_EIOTLB_SIZE;
>> + desc.qw1 = QI_DEV_EIOTLB_ADDR(-1ULL >> 1) |
>> + QI_DEV_EIOTLB_SIZE;
>> } else if (pages > 1) {
>> /* The least significant zero bit indicates the size. So,
>> * for example, an "address" value of 0x12345f000
>> will
>> @@ -189,10 +202,13 @@ static void intel_flush_svm_range_dev (struct
>> intel_svm *svm, struct intel_svm_d
>> unsigned long last = address + ((unsigned
>> long)(pages - 1) << VTD_PAGE_SHIFT);
>> unsigned long mask =
>> __rounddown_pow_of_two(address ^ last);
>>
>> - desc.high = QI_DEV_EIOTLB_ADDR((address &
>> ~mask) | (mask - 1)) | QI_DEV_EIOTLB_SIZE;
>> + desc.qw1 = QI_DEV_EIOTLB_ADDR((address &
>> ~mask) |
>> + (mask - 1)) | QI_DEV_EIOTLB_SIZE;
>> } else {
>> - desc.high = QI_DEV_EIOTLB_ADDR(address);
>> + desc.qw1 = QI_DEV_EIOTLB_ADDR(address);
>> }
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>> qi_submit_sync(&desc, svm->iommu);
>> }
>> }
>> @@ -237,8 +253,11 @@ static void intel_flush_pasid_dev(struct intel_svm
>> *svm, struct intel_svm_dev *s
>> {
>> struct qi_desc desc;
>>
>> - desc.high = 0;
>> - desc.low = QI_PC_TYPE | QI_PC_DID(sdev->did) | QI_PC_PASID_SEL
>> | QI_PC_PASID(pasid);
>> + desc.qw0 = QI_PC_TYPE | QI_PC_DID(sdev->did) |
>> + QI_PC_PASID_SEL | QI_PC_PASID(pasid);
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> qi_submit_sync(&desc, svm->iommu);
>> }
>> @@ -668,24 +687,27 @@ static irqreturn_t prq_event_thread(int irq, void
>> *d)
>> no_pasid:
>> if (req->lpig) {
>> /* Page Group Response */
>> - resp.low = QI_PGRP_PASID(req->pasid) |
>> + resp.qw0 = QI_PGRP_PASID(req->pasid) |
>> QI_PGRP_DID((req->bus << 8) | req->devfn)
>> |
>> QI_PGRP_PASID_P(req->pasid_present) |
>> QI_PGRP_RESP_TYPE;
>> - resp.high = QI_PGRP_IDX(req->prg_index) |
>> - QI_PGRP_PRIV(req->private) |
>> QI_PGRP_RESP_CODE(result);
>> -
>> - qi_submit_sync(&resp, iommu);
>> + resp.qw1 = QI_PGRP_IDX(req->prg_index) |
>> + QI_PGRP_PRIV(req->private) |
>> + QI_PGRP_RESP_CODE(result);
>> } else if (req->srr) {
>> /* Page Stream Response */
>> - resp.low = QI_PSTRM_IDX(req->prg_index) |
>> - QI_PSTRM_PRIV(req->private) |
>> QI_PSTRM_BUS(req->bus) |
>> - QI_PSTRM_PASID(req->pasid) |
>> QI_PSTRM_RESP_TYPE;
>> - resp.high = QI_PSTRM_ADDR(address) |
>> QI_PSTRM_DEVFN(req->devfn) |
>> + resp.qw0 = QI_PSTRM_IDX(req->prg_index) |
>> + QI_PSTRM_PRIV(req->private) |
>> + QI_PSTRM_BUS(req->bus) |
>> + QI_PSTRM_PASID(req->pasid) |
>> + QI_PSTRM_RESP_TYPE;
>> + resp.qw1 = QI_PSTRM_ADDR(address) |
>> + QI_PSTRM_DEVFN(req->devfn) |
>> QI_PSTRM_RESP_CODE(result);
>> -
>> - qi_submit_sync(&resp, iommu);
>> }
>> + resp.qw2 = 0;
>> + resp.qw3 = 0;
>> + qi_submit_sync(&resp, iommu);
>>
>> head = (head + sizeof(*req)) & PRQ_RING_MASK;
>> }
>> diff --git a/drivers/iommu/intel_irq_remapping.c
>> b/drivers/iommu/intel_irq_remapping.c
>> index 967450bd421a..916391f33ca6 100644
>> --- a/drivers/iommu/intel_irq_remapping.c
>> +++ b/drivers/iommu/intel_irq_remapping.c
>> @@ -145,9 +145,11 @@ static int qi_flush_iec(struct intel_iommu *iommu,
>> int index, int mask)
>> {
>> struct qi_desc desc;
>>
>> - desc.low = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
>> + desc.qw0 = QI_IEC_IIDEX(index) | QI_IEC_TYPE | QI_IEC_IM(mask)
>> | QI_IEC_SELECTIVE;
>> - desc.high = 0;
>> + desc.qw1 = 0;
>> + desc.qw2 = 0;
>> + desc.qw3 = 0;
>>
>> return qi_submit_sync(&desc, iommu);
>> }
>> diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h
>> index 41791903a5e3..72aff482b293 100644
>> --- a/include/linux/intel-iommu.h
>> +++ b/include/linux/intel-iommu.h
>> @@ -340,12 +340,15 @@ enum {
>> #define QI_GRAN_PSI_PASID 3
>>
>> struct qi_desc {
>> - u64 low, high;
>> + u64 qw0;
>> + u64 qw1;
>> + u64 qw2;
>> + u64 qw3;
>> };
>>
>> struct q_inval {
>> raw_spinlock_t q_lock;
>> - struct qi_desc *desc; /* invalidation queue */
>> + void *desc; /* invalidation queue */
>> int *desc_status; /* desc status */
>> int free_head; /* first free entry */
>> int free_tail; /* last free entry */
>> --
>> 2.17.1
>
>
Best regards,
Lu Baolu
Hi,
On 09/06/2018 11:17 AM, Tian, Kevin wrote:
>> From: Lu Baolu [mailto:[email protected]]
>> Sent: Thursday, August 30, 2018 9:35 AM
>>
>> So that the pasid related info, such as the pasid table and the
>> maximum of pasid could be used during setting up scalable mode
>> context.
>>
>> Cc: Ashok Raj <[email protected]>
>> Cc: Jacob Pan <[email protected]>
>> Cc: Kevin Tian <[email protected]>
>> Cc: Liu Yi L <[email protected]>
>> Signed-off-by: Lu Baolu <[email protected]>
>> Reviewed-by: Ashok Raj <[email protected]>
>
> Reviewed-by: Kevin Tian <[email protected]>
>
Thank you, Kevin.
Best regards,
Lu Baolu
>> ---
>> drivers/iommu/intel-iommu.c | 14 +++++++++++---
>> 1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
>> index c3bf2ccf094d..33642dd3d6ba 100644
>> --- a/drivers/iommu/intel-iommu.c
>> +++ b/drivers/iommu/intel-iommu.c
>> @@ -1942,6 +1942,7 @@ static void domain_exit(struct dmar_domain
>> *domain)
>>
>> static int domain_context_mapping_one(struct dmar_domain *domain,
>> struct intel_iommu *iommu,
>> + struct pasid_table *table,
>> u8 bus, u8 devfn)
>> {
>> u16 did = domain->iommu_did[iommu->seq_id];
>> @@ -2064,6 +2065,7 @@ static int domain_context_mapping_one(struct
>> dmar_domain *domain,
>> struct domain_context_mapping_data {
>> struct dmar_domain *domain;
>> struct intel_iommu *iommu;
>> + struct pasid_table *table;
>> };
>>
>> static int domain_context_mapping_cb(struct pci_dev *pdev,
>> @@ -2072,25 +2074,31 @@ static int domain_context_mapping_cb(struct
>> pci_dev *pdev,
>> struct domain_context_mapping_data *data = opaque;
>>
>> return domain_context_mapping_one(data->domain, data-
>>> iommu,
>> - PCI_BUS_NUM(alias), alias & 0xff);
>> + data->table, PCI_BUS_NUM(alias),
>> + alias & 0xff);
>> }
>>
>> static int
>> domain_context_mapping(struct dmar_domain *domain, struct device
>> *dev)
>> {
>> + struct domain_context_mapping_data data;
>> + struct pasid_table *table;
>> struct intel_iommu *iommu;
>> u8 bus, devfn;
>> - struct domain_context_mapping_data data;
>>
>> iommu = device_to_iommu(dev, &bus, &devfn);
>> if (!iommu)
>> return -ENODEV;
>>
>> + table = intel_pasid_get_table(dev);
>> +
>> if (!dev_is_pci(dev))
>> - return domain_context_mapping_one(domain, iommu, bus,
>> devfn);
>> + return domain_context_mapping_one(domain, iommu,
>> table,
>> + bus, devfn);
>>
>> data.domain = domain;
>> data.iommu = iommu;
>> + data.table = table;
>>
>> return pci_for_each_dma_alias(to_pci_dev(dev),
>> &domain_context_mapping_cb, &data);
>> --
>> 2.17.1
>
>
On Fri, Sep 07, 2018 at 10:47:11AM +0800, Lu Baolu wrote:
>
> >>+
> >>+ intel_pasid_clear_entry(dev, pasid);
> >>+
> >>+ if (!ecap_coherent(iommu->ecap)) {
> >>+ pte = intel_pasid_get_entry(dev, pasid);
> >>+ clflush_cache_range(pte, sizeof(*pte));
> >>+ }
> >>+
> >>+ pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> >>+ pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> >>+
> >>+ /* Device IOTLB doesn't need to be flushed in caching mode. */
> >>+ if (!cap_caching_mode(iommu->cap))
> >>+ pasid_based_dev_iotlb_cache_invalidation(iommu, dev,
> >>pasid);
> >
> >can you elaborate, or point to any spec reference?
> >
>
> In the driver, device iotlb doesn't get flushed in caching mode. I just
> follow what have been done there.
>
> It also makes sense to me since only the bare metal host needs to
> consider whether and how to flush the device iotlb.
>
DavidW might remember; I think the idea was to help with the cost of
virtualization: we can avoid taking two exits by handling it directly
when we do the IOTLB flushing instead.
The other optimization was to only do devtlb flushing on unmap, since
when establishing a not-present to present mapping there is no need to
flush the devtlb at that point.
> From: Raj, Ashok
> Sent: Saturday, September 8, 2018 1:43 AM
>
> On Fri, Sep 07, 2018 at 10:47:11AM +0800, Lu Baolu wrote:
> >
> > >>+
> > >>+ intel_pasid_clear_entry(dev, pasid);
> > >>+
> > >>+ if (!ecap_coherent(iommu->ecap)) {
> > >>+ pte = intel_pasid_get_entry(dev, pasid);
> > >>+ clflush_cache_range(pte, sizeof(*pte));
> > >>+ }
> > >>+
> > >>+ pasid_based_pasid_cache_invalidation(iommu, did, pasid);
> > >>+ pasid_based_iotlb_cache_invalidation(iommu, did, pasid);
> > >>+
> > >>+ /* Device IOTLB doesn't need to be flushed in caching mode. */
> > >>+ if (!cap_caching_mode(iommu->cap))
> > >>+ pasid_based_dev_iotlb_cache_invalidation(iommu, dev,
> > >>pasid);
> > >
> > >can you elaborate, or point to any spec reference?
> > >
> >
> > In the driver, device iotlb doesn't get flushed in caching mode. I just
> > follow what have been done there.
> >
> > It also makes sense to me since only the bare metal host needs to
> > consider whether and how to flush the device iotlb.
> >
>
> DavidW might remember, i think the idea was to help with cost
> of virtualization, we can avoid taking 2 exits vs handling
> it directly when we do iotlb flushing instead.
>
OK, performance-wise it makes sense, though strictly speaking it
doesn't follow the spec...
Thanks
Kevin