2021-03-12 08:57:07

by Suthikulpanit, Suravee

Subject: [RFC PATCH 0/7] iommu/amd: Add Generic IO Page Table Framework Support for v2 Page Table

This series introduces a new usage model for the v2 page table, where it
can be used to implement support for DMA-API by adopting the generic
IO page table framework.

One of the target use cases is to support nested IO page tables
where the guest uses the guest IO page table (v2) for translating
GVA to GPA, and the hypervisor uses the host I/O page table (v1) for
translating GPA to SPA. This is a prerequisite for supporting the new
HW-assisted vIOMMU presented at the KVM Forum 2020.

https://static.sched.com/hosted_files/kvmforum2020/26/vIOMMU%20KVM%20Forum%202020.pdf
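The two translation stages described above (guest v2 table for GVA to GPA, host v1 table for GPA to SPA) can be sketched with a toy user-space model. Everything here is illustrative only — the flat "page tables" and names are stand-ins, not the driver's data structures:

```c
/* Illustrative model of nested IO translation, not kernel code. */
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_OFFSET_MASK ((uint64_t)((1 << PAGE_SHIFT) - 1))

/* Toy "page table": indexed by virtual page number, entry is the
 * target physical frame number. */
struct toy_pgtable {
	uint64_t frames[16];	/* tiny hypothetical address space */
};

static uint64_t toy_translate(const struct toy_pgtable *pt, uint64_t addr)
{
	uint64_t vpn = addr >> PAGE_SHIFT;

	return (pt->frames[vpn] << PAGE_SHIFT) | (addr & PAGE_OFFSET_MASK);
}

/* Nested walk: guest v2 table maps GVA->GPA, host v1 table GPA->SPA. */
static uint64_t nested_translate(const struct toy_pgtable *guest_v2,
				 const struct toy_pgtable *host_v1,
				 uint64_t gva)
{
	uint64_t gpa = toy_translate(guest_v2, gva);

	return toy_translate(host_v1, gpa);
}
```

In hardware the IOMMU performs both walks itself; the point of the sketch is only the composition of the two lookups.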

The following components are introduced in this series:

- Part 1 (patch 1-4 and 7)
Refactor the current IOMMU page table v2 code
to adopt the generic IO page table framework, and add
AMD IOMMU Guest (v2) page table management code.

- Part 2 (patch 5)
Add support for the AMD IOMMU Guest IO Protection feature (GIOV),
where requests from an I/O device without a PASID are treated as
if they have a PASID of 0.

- Part 3 (patch 6)
Introduce a new amd_iommu_pgtable command-line option to allow users
to select the mode of operation (v1 or v2).

See the AMD I/O Virtualization Technology Specification for more details.

http://www.amd.com/system/files/TechDocs/48882_IOMMU_3.05_PUB.pdf

Thanks,
Suravee

Suravee Suthikulpanit (7):
iommu/amd: Refactor amd_iommu_domain_enable_v2
iommu/amd: Update sanity check when enable PRI/ATS
iommu/amd: Decouple the logic to enable PPR and GT
iommu/amd: Initial support for AMD IOMMU v2 page table
iommu/amd: Add support for Guest IO protection
iommu/amd: Introduce amd_iommu_pgtable command-line option
iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

.../admin-guide/kernel-parameters.txt | 6 +
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu_types.h | 5 +
drivers/iommu/amd/init.c | 42 ++-
drivers/iommu/amd/io_pgtable_v2.c | 239 ++++++++++++++++++
drivers/iommu/amd/iommu.c | 81 ++++--
drivers/iommu/io-pgtable.c | 1 +
include/linux/io-pgtable.h | 2 +
8 files changed, 345 insertions(+), 33 deletions(-)
create mode 100644 drivers/iommu/amd/io_pgtable_v2.c

--
2.17.1


2021-03-12 08:57:09

by Suthikulpanit, Suravee

Subject: [RFC PATCH 1/7] iommu/amd: Refactor amd_iommu_domain_enable_v2

The current function to enable IOMMU v2 also locks the domain.
In order to reuse the same code in a different code path, in which
the domain has already been locked, refactor the function to separate
the locking from the enabling logic.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/iommu.c | 42 +++++++++++++++++++++++++--------------
1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index a69a8b573e40..6f3e42495709 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -88,6 +88,7 @@ struct iommu_cmd {
struct kmem_cache *amd_iommu_irq_cache;

static void detach_device(struct device *dev);
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool has_ppr);

/****************************************************************************
*
@@ -2304,10 +2305,9 @@ void amd_iommu_domain_direct_map(struct iommu_domain *dom)
}
EXPORT_SYMBOL(amd_iommu_domain_direct_map);

-int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
+/* Note: This function expects iommu_domain->lock to be held prior to calling it. */
+static int domain_enable_v2(struct protection_domain *domain, int pasids, bool has_ppr)
{
- struct protection_domain *domain = to_pdomain(dom);
- unsigned long flags;
int levels, ret;

if (pasids <= 0 || pasids > (PASID_MASK + 1))
@@ -2320,17 +2320,6 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
if (levels > amd_iommu_max_glx_val)
return -EINVAL;

- spin_lock_irqsave(&domain->lock, flags);
-
- /*
- * Save us all sanity checks whether devices already in the
- * domain support IOMMUv2. Just force that the domain has no
- * devices attached when it is switched into IOMMUv2 mode.
- */
- ret = -EBUSY;
- if (domain->dev_cnt > 0 || domain->flags & PD_IOMMUV2_MASK)
- goto out;
-
ret = -ENOMEM;
domain->gcr3_tbl = (void *)get_zeroed_page(GFP_ATOMIC);
if (domain->gcr3_tbl == NULL)
@@ -2344,8 +2333,31 @@ int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
ret = 0;

out:
- spin_unlock_irqrestore(&domain->lock, flags);
+ return ret;
+}

+int amd_iommu_domain_enable_v2(struct iommu_domain *dom, int pasids)
+{
+ int ret;
+ unsigned long flags;
+ struct protection_domain *pdom = to_pdomain(dom);
+
+ spin_lock_irqsave(&pdom->lock, flags);
+
+ /*
+ * Save us all sanity checks whether devices already in the
+ * domain support IOMMUv2. Just force that the domain has no
+ * devices attached when it is switched into IOMMUv2 mode.
+ */
+ ret = -EBUSY;
+ if (pdom->dev_cnt > 0 || pdom->flags & PD_IOMMUV2_MASK)
+ goto out;
+
+ if (pdom->dev_cnt == 0 && !(pdom->gcr3_tbl))
+ ret = domain_enable_v2(pdom, pasids, true);
+
+out:
+ spin_unlock_irqrestore(&pdom->lock, flags);
return ret;
}
EXPORT_SYMBOL(amd_iommu_domain_enable_v2);
--
2.17.1

2021-03-12 08:57:17

by Suthikulpanit, Suravee

Subject: [RFC PATCH 3/7] iommu/amd: Decouple the logic to enable PPR and GT

Currently, the function to enable IOMMU v2 (GT) assumes the PPR log
must also be enabled. This is no longer the case, since the IOMMU
v2 page table can be enabled without PPR support (for the DMA-API
use case).

Therefore, separate the enabling logic for PPR and GT.
There is no functional change.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/init.c | 19 +++++--------------
1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9126efcbaf2c..5def566de6f6 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -898,14 +898,6 @@ static void iommu_enable_xt(struct amd_iommu *iommu)
#endif /* CONFIG_IRQ_REMAP */
}

-static void iommu_enable_gt(struct amd_iommu *iommu)
-{
- if (!iommu_feature(iommu, FEATURE_GT))
- return;
-
- iommu_feature_enable(iommu, CONTROL_GT_EN);
-}
-
/* sets a specific bit in the device table entry. */
static void set_dev_entry_bit(u16 devid, u8 bit)
{
@@ -1882,6 +1874,7 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
amd_iommu_max_glx_val = glxval;
else
amd_iommu_max_glx_val = min(amd_iommu_max_glx_val, glxval);
+ iommu_feature_enable(iommu, CONTROL_GT_EN);
}

if (iommu_feature(iommu, FEATURE_GT) &&
@@ -2530,21 +2523,19 @@ static void early_enable_iommus(void)
#endif
}

-static void enable_iommus_v2(void)
+static void enable_iommus_ppr(void)
{
struct amd_iommu *iommu;

- for_each_iommu(iommu) {
+ for_each_iommu(iommu)
iommu_enable_ppr_log(iommu);
- iommu_enable_gt(iommu);
- }
}

static void enable_iommus(void)
{
early_enable_iommus();

- enable_iommus_v2();
+ enable_iommus_ppr();
}

static void disable_iommus(void)
@@ -2935,7 +2926,7 @@ static int __init state_next(void)
register_syscore_ops(&amd_iommu_syscore_ops);
ret = amd_iommu_init_pci();
init_state = ret ? IOMMU_INIT_ERROR : IOMMU_PCI_INIT;
- enable_iommus_v2();
+ enable_iommus_ppr();
break;
case IOMMU_PCI_INIT:
ret = amd_iommu_enable_interrupts();
--
2.17.1

2021-03-12 08:58:34

by Suthikulpanit, Suravee

Subject: [RFC PATCH 7/7] iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

Introduce an init function for setting up a DMA domain for the DMA-API
with the IOMMU v2 page table.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/iommu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index e29ece6e1e68..bd26de8764bd 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1937,6 +1937,24 @@ static int protection_domain_init_v1(struct protection_domain *domain, int mode)
return 0;
}

+static int protection_domain_init_v2(struct protection_domain *domain)
+{
+ spin_lock_init(&domain->lock);
+ domain->id = domain_id_alloc();
+ if (!domain->id)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&domain->dev_list);
+
+ domain->giov = true;
+
+ if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
+ domain_enable_v2(domain, 1, false)) {
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
static struct protection_domain *protection_domain_alloc(unsigned int type)
{
struct io_pgtable_ops *pgtbl_ops;
@@ -1964,6 +1982,9 @@ static struct protection_domain *protection_domain_alloc(unsigned int type)
case AMD_IOMMU_V1:
ret = protection_domain_init_v1(domain, mode);
break;
+ case AMD_IOMMU_V2:
+ ret = protection_domain_init_v2(domain);
+ break;
default:
ret = -EINVAL;
}
--
2.17.1

2021-03-12 08:59:15

by Suthikulpanit, Suravee

Subject: [RFC PATCH 4/7] iommu/amd: Initial support for AMD IOMMU v2 page table

Introduce IO page table framework support for AMD IOMMU v2 page table.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/Makefile | 2 +-
drivers/iommu/amd/amd_iommu_types.h | 2 +
drivers/iommu/amd/io_pgtable_v2.c | 239 ++++++++++++++++++++++++++++
drivers/iommu/io-pgtable.c | 1 +
include/linux/io-pgtable.h | 2 +
5 files changed, 245 insertions(+), 1 deletion(-)
create mode 100644 drivers/iommu/amd/io_pgtable_v2.c

diff --git a/drivers/iommu/amd/Makefile b/drivers/iommu/amd/Makefile
index a935f8f4b974..773d8aa00283 100644
--- a/drivers/iommu/amd/Makefile
+++ b/drivers/iommu/amd/Makefile
@@ -1,4 +1,4 @@
# SPDX-License-Identifier: GPL-2.0-only
-obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o
+obj-$(CONFIG_AMD_IOMMU) += iommu.o init.o quirks.o io_pgtable.o io_pgtable_v2.o
obj-$(CONFIG_AMD_IOMMU_DEBUGFS) += debugfs.o
obj-$(CONFIG_AMD_IOMMU_V2) += iommu_v2.o
diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 6937e3674a16..25062eb86c8b 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -265,6 +265,7 @@
* 512GB Pages are not supported due to a hardware bug
*/
#define AMD_IOMMU_PGSIZES ((~0xFFFUL) & ~(2ULL << 38))
+#define AMD_IOMMU_PGSIZES_V2 (PAGE_SIZE | (1ULL << 12) | (1ULL << 30))

/* Bit value definition for dte irq remapping fields*/
#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
@@ -503,6 +504,7 @@ struct amd_io_pgtable {
int mode;
u64 *root;
atomic64_t pt_root; /* pgtable root and pgtable mode */
+ struct mm_struct v2_mm;
};

/*
diff --git a/drivers/iommu/amd/io_pgtable_v2.c b/drivers/iommu/amd/io_pgtable_v2.c
new file mode 100644
index 000000000000..b0b6ba2d8d35
--- /dev/null
+++ b/drivers/iommu/amd/io_pgtable_v2.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * CPU-agnostic AMD IO page table v2 allocator.
+ *
+ * Copyright (C) 2020 Advanced Micro Devices, Inc.
+ * Author: Suravee Suthikulpanit <[email protected]>
+ */
+
+#define pr_fmt(fmt) "AMD-Vi: " fmt
+#define dev_fmt(fmt) pr_fmt(fmt)
+
+#include <linux/atomic.h>
+#include <linux/bitops.h>
+#include <linux/io-pgtable.h>
+#include <linux/kernel.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/dma-mapping.h>
+#include <linux/mmu_context.h>
+
+#include <asm/barrier.h>
+#include <asm/pgalloc.h>
+
+#include "amd_iommu_types.h"
+#include "amd_iommu.h"
+
+static pte_t *fetch_pte(struct amd_io_pgtable *pgtable,
+ unsigned long iova,
+ unsigned long *page_size)
+{
+ int level;
+ pte_t *ptep;
+
+ ptep = lookup_address_in_mm(&pgtable->v2_mm, iova, &level);
+ if (!ptep || pte_none(*ptep) || (level == PG_LEVEL_NONE))
+ return NULL;
+
+ *page_size = PTE_LEVEL_PAGE_SIZE(level-1);
+ return ptep;
+}
+
+static pte_t *v2_pte_alloc_map(struct mm_struct *mm, unsigned long vaddr)
+{
+ pgd_t *pgd;
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ pgd = pgd_offset(mm, vaddr);
+ p4d = p4d_alloc(mm, pgd, vaddr);
+ if (!p4d)
+ return NULL;
+ pud = pud_alloc(mm, p4d, vaddr);
+ if (!pud)
+ return NULL;
+ pmd = pmd_alloc(mm, pud, vaddr);
+ if (!pmd)
+ return NULL;
+ pte = pte_alloc_map(mm, pmd, vaddr);
+ return pte;
+}
+
+static int iommu_v2_map_page(struct io_pgtable_ops *ops, unsigned long iova,
+ phys_addr_t paddr, size_t size, int prot, gfp_t gfp)
+{
+ struct protection_domain *dom = io_pgtable_ops_to_domain(ops);
+ struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+ pte_t *pte;
+ int ret, i, count;
+ bool updated = false;
+ unsigned long o_iova = iova;
+ unsigned long pte_pgsize;
+
+ BUG_ON(!IS_ALIGNED(iova, size) || !IS_ALIGNED(paddr, size));
+
+ ret = -EINVAL;
+ if (!(prot & IOMMU_PROT_MASK))
+ goto out;
+
+ count = PAGE_SIZE_PTE_COUNT(size);
+
+ for (i = 0; i < count; ++i, iova += PAGE_SIZE, paddr += PAGE_SIZE) {
+ pte = fetch_pte(pgtable, iova, &pte_pgsize);
+ if (!pte || pte_none(*pte)) {
+ pte = v2_pte_alloc_map(&dom->iop.v2_mm, iova);
+ if (!pte)
+ goto out;
+ } else {
+ updated = true;
+ }
+ set_pte(pte, __pte((paddr & PAGE_MASK)|_PAGE_PRESENT|_PAGE_USER));
+ if (prot & IOMMU_PROT_IW)
+ *pte = pte_mkwrite(*pte);
+ }
+
+ if (updated) {
+ if (count > 1)
+ amd_iommu_flush_tlb(&dom->domain, 0);
+ else
+ amd_iommu_flush_page(&dom->domain, 0, o_iova);
+ }
+
+ ret = 0;
+out:
+ return ret;
+}
+
+static unsigned long iommu_v2_unmap_page(struct io_pgtable_ops *ops,
+ unsigned long iova,
+ size_t size,
+ struct iommu_iotlb_gather *gather)
+{
+ struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+ unsigned long vaddr_end, vaddr_next;
+ unsigned long long unmapped;
+ unsigned long pte_pgsize;
+ pte_t *ptep;
+
+ BUG_ON(!is_power_of_2(size));
+
+ unmapped = 0;
+ vaddr_next = iova;
+ vaddr_end = iova + size;
+
+ for (; iova < vaddr_end; iova = vaddr_next) {
+ ptep = fetch_pte(pgtable, iova, &pte_pgsize);
+ if (!ptep || pte_none(*ptep))
+ return 0;
+ pte_unmap(ptep);
+ unmapped += pte_pgsize;
+ vaddr_next = (iova & PAGE_MASK) + pte_pgsize;
+ }
+
+ BUG_ON(unmapped && !is_power_of_2(unmapped));
+ return unmapped;
+}
+
+static phys_addr_t iommu_v2_iova_to_phys(struct io_pgtable_ops *ops, unsigned long iova)
+{
+ struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
+ unsigned long offset_mask, pte_pgsize;
+ u64 __pte;
+ pte_t *ptep;
+
+ if (pgtable->mode == PAGE_MODE_NONE)
+ return iova;
+
+ ptep = fetch_pte(pgtable, iova, &pte_pgsize);
+
+ if (!ptep || pte_none(*ptep))
+ return 0;
+
+ offset_mask = pte_pgsize - 1;
+ __pte = __sme_clr(ptep->pte & PM_ADDR_MASK);
+
+ return (__pte & ~offset_mask) | (iova & offset_mask);
+}
+
+/*
+ * ----------------------------------------------------
+ */
+static void v2_tlb_flush_all(void *cookie)
+{
+}
+
+static void v2_tlb_flush_walk(unsigned long iova, size_t size,
+ size_t granule, void *cookie)
+{
+}
+
+static void v2_tlb_add_page(struct iommu_iotlb_gather *gather,
+ unsigned long iova, size_t granule,
+ void *cookie)
+{
+}
+
+static const struct iommu_flush_ops v2_flush_ops = {
+ .tlb_flush_all = v2_tlb_flush_all,
+ .tlb_flush_walk = v2_tlb_flush_walk,
+ .tlb_add_page = v2_tlb_add_page,
+};
+
+static void v2_free_pgtable(struct io_pgtable *iop)
+{
+ struct amd_io_pgtable *pgtable = container_of(iop, struct amd_io_pgtable, iop);
+ struct protection_domain *pdom;
+ pgd_t *pgd;
+ struct mm_struct *mm;
+
+ pdom = container_of(pgtable, struct protection_domain, iop);
+ if (!(pdom->flags & PD_IOMMUV2_MASK))
+ return;
+
+ /* Update data structure */
+ mm = &pdom->iop.v2_mm;
+ pgd = mm->pgd;
+ pgd_free(mm, pgd);
+
+ /* Make changes visible to IOMMUs */
+ amd_iommu_domain_update(pdom);
+ amd_iommu_domain_clear_gcr3(&pdom->domain, 0);
+}
+
+/*
+ * Assume protection_domain already setup at this point
+ */
+static struct io_pgtable *v2_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
+{
+ int ret;
+ struct amd_io_pgtable *pgtable = io_pgtable_cfg_to_data(cfg);
+ struct protection_domain *pdom = (struct protection_domain *)cookie;
+ struct mm_struct *mm = &pdom->iop.v2_mm;
+
+ mm->pgd = pgd_alloc(mm);
+ if (!mm->pgd)
+ return NULL;
+
+ ret = amd_iommu_domain_set_gcr3(&pdom->domain, 0, __pa(mm->pgd));
+ if (ret)
+ return NULL;
+
+ pgtable->iop.ops.map = iommu_v2_map_page;
+ pgtable->iop.ops.unmap = iommu_v2_unmap_page;
+ pgtable->iop.ops.iova_to_phys = iommu_v2_iova_to_phys;
+
+ cfg->pgsize_bitmap = AMD_IOMMU_PGSIZES_V2,
+ cfg->ias = IOMMU_IN_ADDR_BIT_SIZE,
+ cfg->oas = IOMMU_OUT_ADDR_BIT_SIZE,
+ cfg->tlb = &v2_flush_ops;
+
+ return &pgtable->iop;
+}
+
+struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns = {
+ .alloc = v2_alloc_pgtable,
+ .free = v2_free_pgtable,
+};
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index 6e9917ce980f..6494657e4a34 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -26,6 +26,7 @@ io_pgtable_init_table[IO_PGTABLE_NUM_FMTS] = {
#endif
#ifdef CONFIG_AMD_IOMMU
[AMD_IOMMU_V1] = &io_pgtable_amd_iommu_v1_init_fns,
+ [AMD_IOMMU_V2] = &io_pgtable_amd_iommu_v2_init_fns,
#endif
};

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index a4c9ca2c31f1..17951204126e 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -16,6 +16,7 @@ enum io_pgtable_fmt {
ARM_V7S,
ARM_MALI_LPAE,
AMD_IOMMU_V1,
+ AMD_IOMMU_V2,
IO_PGTABLE_NUM_FMTS,
};

@@ -250,5 +251,6 @@ extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns;
extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns;
extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns;
extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns;
+extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns;

#endif /* __IO_PGTABLE_H */
--
2.17.1

2021-03-12 08:59:55

by Suthikulpanit, Suravee

Subject: [RFC PATCH 5/7] iommu/amd: Add support for Guest IO protection

AMD IOMMU introduces support for Guest I/O protection, where requests
from an I/O device without a PASID are treated as if they have PASID 0.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
drivers/iommu/amd/amd_iommu_types.h | 3 +++
drivers/iommu/amd/init.c | 8 ++++++++
drivers/iommu/amd/iommu.c | 4 ++++
3 files changed, 15 insertions(+)

diff --git a/drivers/iommu/amd/amd_iommu_types.h b/drivers/iommu/amd/amd_iommu_types.h
index 25062eb86c8b..876ba1adf73e 100644
--- a/drivers/iommu/amd/amd_iommu_types.h
+++ b/drivers/iommu/amd/amd_iommu_types.h
@@ -93,6 +93,7 @@
#define FEATURE_HE (1ULL<<8)
#define FEATURE_PC (1ULL<<9)
#define FEATURE_GAM_VAPIC (1ULL<<21)
+#define FEATURE_GIOSUP (1ULL<<48)
#define FEATURE_EPHSUP (1ULL<<50)
#define FEATURE_SNP (1ULL<<63)

@@ -366,6 +367,7 @@
#define DTE_FLAG_IW (1ULL << 62)

#define DTE_FLAG_IOTLB (1ULL << 32)
+#define DTE_FLAG_GIOV (1ULL << 54)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
@@ -519,6 +521,7 @@ struct protection_domain {
spinlock_t lock; /* mostly used to lock the page table*/
u16 id; /* the domain id written to the device table */
int glx; /* Number of levels for GCR3 table */
+ bool giov; /* guest IO protection domain */
u64 *gcr3_tbl; /* Guest CR3 table */
unsigned long flags; /* flags to find out type of domain */
unsigned dev_cnt; /* devices assigned to this domain */
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 5def566de6f6..9265c1bf1d84 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1895,6 +1895,12 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)

init_iommu_perf_ctr(iommu);

+ if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
+ !iommu_feature(iommu, FEATURE_GIOSUP)) {
+ pr_warn("Cannot enable v2 page table for DMA-API. Fallback to v1.\n");
+ amd_iommu_pgtable = AMD_IOMMU_V1;
+ }
+
if (is_rd890_iommu(iommu->dev)) {
int i, j;

@@ -1969,6 +1975,8 @@ static void print_iommu_info(void)
if (amd_iommu_xt_mode == IRQ_REMAP_X2APIC_MODE)
pr_info("X2APIC enabled\n");
}
+ if (amd_iommu_pgtable == AMD_IOMMU_V2)
+ pr_info("GIOV enabled\n");
}

static int __init amd_iommu_init_pci(void)
diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
index f3800efdbb29..e29ece6e1e68 100644
--- a/drivers/iommu/amd/iommu.c
+++ b/drivers/iommu/amd/iommu.c
@@ -1405,6 +1405,10 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain,

pte_root |= (domain->iop.mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
+
+ if (domain->giov && (domain->flags & PD_IOMMUV2_MASK))
+ pte_root |= DTE_FLAG_GIOV;
+
pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;

flags = amd_iommu_dev_table[devid].data[1];
--
2.17.1

2021-03-12 09:00:07

by Suthikulpanit, Suravee

Subject: [RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

Introduce a new command-line option to allow the user to specify
whether to use the v1 or v2 IOMMU page table for DMA remapping with
the kernel DMA-API.

Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
drivers/iommu/amd/init.c | 15 +++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 04545725f187..466e807369ea 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -319,6 +319,12 @@
This mode requires kvm-amd.avic=1.
(Default when IOMMU HW support is present.)

+ amd_iommu_pgtable= [HW,X86-64]
+ Specifies one of the following AMD IOMMU page table to
+ be used for DMA remapping for DMA-API:
+ v1 - Use v1 page table (Default)
+ v2 - Use v2 page table
+
amijoy.map= [HW,JOY] Amiga joystick support
Map of devices attached to JOY0DAT and JOY1DAT
Format: <a>,<b>
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index 9265c1bf1d84..6d5163bfb87e 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -3123,6 +3123,20 @@ static int __init parse_amd_iommu_dump(char *str)
return 1;
}

+static int __init parse_amd_iommu_pgtable(char *str)
+{
+ for (; *str; ++str) {
+ if (strncmp(str, "v1", 2) == 0) {
+ amd_iommu_pgtable = AMD_IOMMU_V1;
+ break;
+ } else if (strncmp(str, "v2", 2) == 0) {
+ amd_iommu_pgtable = AMD_IOMMU_V2;
+ break;
+ }
+ }
+ return 1;
+}
+
static int __init parse_amd_iommu_intr(char *str)
{
for (; *str; ++str) {
@@ -3246,6 +3260,7 @@ static int __init parse_ivrs_acpihid(char *str)

__setup("amd_iommu_dump", parse_amd_iommu_dump);
__setup("amd_iommu=", parse_amd_iommu_options);
+__setup("amd_iommu_pgtable=", parse_amd_iommu_pgtable);
__setup("amd_iommu_intr=", parse_amd_iommu_intr);
__setup("ivrs_ioapic", parse_ivrs_ioapic);
__setup("ivrs_hpet", parse_ivrs_hpet);
--
2.17.1

2021-03-18 15:30:29

by Joerg Roedel

Subject: Re: [RFC PATCH 4/7] iommu/amd: Initial support for AMD IOMMU v2 page table

Hi Suravee,

On Fri, Mar 12, 2021 at 03:04:08AM -0600, Suravee Suthikulpanit wrote:
> @@ -503,6 +504,7 @@ struct amd_io_pgtable {
> int mode;
> u64 *root;
> atomic64_t pt_root; /* pgtable root and pgtable mode */
> + struct mm_struct v2_mm;
> };

A whole mm_struct is a bit too much when all we really need is an 8-byte
page-table root pointer.


> +static pte_t *fetch_pte(struct amd_io_pgtable *pgtable,
> + unsigned long iova,
> + unsigned long *page_size)
> +{
> + int level;
> + pte_t *ptep;
> +
> + ptep = lookup_address_in_mm(&pgtable->v2_mm, iova, &level);
> + if (!ptep || pte_none(*ptep) || (level == PG_LEVEL_NONE))
> + return NULL;

So you are re-using the in-kernel page-table building code. That saves
some lines of code, but has several problems:

1) When you boot a kernel with this code on a machine with
5-level paging, the IOMMU code will build a 5-level
page-table too, breaking IOMMU mappings.

2) You need a whole mm_struct per domain, which is big.

3) The existing macros for CPU page-tables require locking. For
IOMMU page-tables this is not really necessary and might
cause scalability issues.

Overall I think you should write your own code to build a 4-level
page-table and use cmpxchg64 to avoid the need for locking. Then things
will not break when such a kernel is suddenly booted on a machine which
has 5-level paging enabled.
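The suggested approach can be sketched as follows in a minimal user-space model, using C11 atomics where the kernel would use cmpxchg64(). All names (v2_walk_alloc, alloc_table) and the PTE layout are hypothetical, not the series' eventual implementation:

```c
/* Sketch: build a fixed 4-level table lock-free via compare-and-swap. */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define V2_LEVELS     4
#define PTES_PER_PAGE 512
#define PAGE_SHIFT    12
#define PTE_PRESENT   1ull

typedef _Atomic uint64_t v2_pte;

static v2_pte *alloc_table(void)
{
	return calloc(PTES_PER_PAGE, sizeof(v2_pte));
}

static unsigned int v2_index(uint64_t iova, int level)
{
	/* 9 index bits per level, 4K leaf pages, level 3 is the root */
	return (iova >> (PAGE_SHIFT + 9 * level)) & (PTES_PER_PAGE - 1);
}

/* Walk down to the leaf PTE, installing missing intermediate levels
 * with CAS so that two concurrent mappers race harmlessly: the loser
 * simply frees its freshly allocated page and uses the winner's. */
static v2_pte *v2_walk_alloc(v2_pte *root, uint64_t iova)
{
	v2_pte *table = root;
	int level;

	for (level = V2_LEVELS - 1; level > 0; level--) {
		v2_pte *ptep = &table[v2_index(iova, level)];
		uint64_t old = atomic_load(ptep);

		if (!(old & PTE_PRESENT)) {
			v2_pte *new_table = alloc_table();
			uint64_t new = (uint64_t)(uintptr_t)new_table | PTE_PRESENT;

			/* in the kernel: cmpxchg64(ptep, old, new) */
			if (!atomic_compare_exchange_strong(ptep, &old, new))
				free(new_table);	/* lost the race */
			old = atomic_load(ptep);
		}
		table = (v2_pte *)(uintptr_t)(old & ~PTE_PRESENT);
	}
	return &table[v2_index(iova, 0)];
}
```

Because the depth is fixed at four levels, the walk is independent of whatever paging mode the CPU booted with, which addresses problem 1 above; and since no lock is taken, problem 3 does not arise.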

Regards,

Joerg

2021-03-18 15:33:24

by Joerg Roedel

Subject: Re: [RFC PATCH 5/7] iommu/amd: Add support for Guest IO protection

On Fri, Mar 12, 2021 at 03:04:09AM -0600, Suravee Suthikulpanit wrote:
> @@ -519,6 +521,7 @@ struct protection_domain {
> spinlock_t lock; /* mostly used to lock the page table*/
> u16 id; /* the domain id written to the device table */
> int glx; /* Number of levels for GCR3 table */
> + bool giov; /* guest IO protection domain */

Could this be turned into a flag?

Regards,

Joerg

2021-03-18 15:36:02

by Joerg Roedel

Subject: Re: [RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

On Fri, Mar 12, 2021 at 03:04:10AM -0600, Suravee Suthikulpanit wrote:
> To allow specification whether to use v1 or v2 IOMMU pagetable for
> DMA remapping when calling kernel DMA-API.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
> drivers/iommu/amd/init.c | 15 +++++++++++++++
> 2 files changed, 21 insertions(+)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 04545725f187..466e807369ea 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -319,6 +319,12 @@
> This mode requires kvm-amd.avic=1.
> (Default when IOMMU HW support is present.)
>
> + amd_iommu_pgtable= [HW,X86-64]
> + Specifies one of the following AMD IOMMU page table to
> + be used for DMA remapping for DMA-API:
> + v1 - Use v1 page table (Default)
> + v2 - Use v2 page table

Any reason v2 can not be the default when it is supported by the IOMMU?

2021-03-18 15:46:33

by Joerg Roedel

Subject: Re: [RFC PATCH 7/7] iommu/amd: Add support for using AMD IOMMU v2 page table for DMA-API

On Fri, Mar 12, 2021 at 03:04:11AM -0600, Suravee Suthikulpanit wrote:
> Introduce init function for setting up DMA domain for DMA-API with
> the IOMMU v2 page table.
>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> drivers/iommu/amd/iommu.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> diff --git a/drivers/iommu/amd/iommu.c b/drivers/iommu/amd/iommu.c
> index e29ece6e1e68..bd26de8764bd 100644
> --- a/drivers/iommu/amd/iommu.c
> +++ b/drivers/iommu/amd/iommu.c
> @@ -1937,6 +1937,24 @@ static int protection_domain_init_v1(struct protection_domain *domain, int mode)
> return 0;
> }
>
> +static int protection_domain_init_v2(struct protection_domain *domain)
> +{
> + spin_lock_init(&domain->lock);
> + domain->id = domain_id_alloc();
> + if (!domain->id)
> + return -ENOMEM;
> + INIT_LIST_HEAD(&domain->dev_list);
> +
> + domain->giov = true;
> +
> + if (amd_iommu_pgtable == AMD_IOMMU_V2 &&
> + domain_enable_v2(domain, 1, false)) {
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +

You also need to advertise a different aperture size and a different
pgsize-bitmap. The host page-table can only map a 48 bit address space,
not a 64 bit one like with v1 page-tables.
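A hedged sketch of what such limits might look like. The macro names and values below are illustrative assumptions based on a 4-level (48-bit) table with 4K/2M/1G leaf pages, not the series' actual definitions:

```c
/* Illustrative limits for a 4-level (48-bit) v2 host page table. */
#include <stdint.h>

#define SZ_4K (1ull << 12)
#define SZ_2M (1ull << 21)
#define SZ_1G (1ull << 30)

/* Leaf page sizes a 4-level v2 table can map (hypothetical name) */
#define AMD_IOMMU_PGSIZES_V2_48BIT (SZ_4K | SZ_2M | SZ_1G)

/* Input address width: 48 bits with 4 levels, not 64 as with v1 */
#define IOMMU_V2_IAS 48

/* The DMA aperture advertised to callers would end here, so that no
 * IOVA above the table's reach is ever handed out. */
static uint64_t v2_max_iova(void)
{
	return (1ull << IOMMU_V2_IAS) - 1;
}
```

The point is that both `cfg->ias` and `cfg->pgsize_bitmap` must describe what the 4-level walk can actually translate, rather than inheriting v1's 64-bit values.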

Regards,

Joerg

2021-03-22 04:31:42

by Suthikulpanit, Suravee

Subject: Re: [RFC PATCH 6/7] iommu/amd: Introduce amd_iommu_pgtable command-line option

Joerg,

On 3/18/21 10:33 PM, Joerg Roedel wrote:
> On Fri, Mar 12, 2021 at 03:04:10AM -0600, Suravee Suthikulpanit wrote:
>> To allow specification whether to use v1 or v2 IOMMU pagetable for
>> DMA remapping when calling kernel DMA-API.
>>
>> Signed-off-by: Suravee Suthikulpanit <[email protected]>
>> ---
>> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++
>> drivers/iommu/amd/init.c | 15 +++++++++++++++
>> 2 files changed, 21 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index 04545725f187..466e807369ea 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -319,6 +319,12 @@
>> This mode requires kvm-amd.avic=1.
>> (Default when IOMMU HW support is present.)
>>
>> + amd_iommu_pgtable= [HW,X86-64]
>> + Specifies one of the following AMD IOMMU page table to
>> + be used for DMA remapping for DMA-API:
>> + v1 - Use v1 page table (Default)
>> + v2 - Use v2 page table
>
> Any reason v2 can not be the default when it is supported by the IOMMU?
>

Eventually, we should be able to default to v2. However, we will need to make sure that
the v2 implementation has performance comparable to the currently used v1.

FYI: I'm also looking into adding support for SVA as well.

Thanks,
Suravee

2021-03-25 13:10:18

by Suthikulpanit, Suravee

Subject: Re: [RFC PATCH 5/7] iommu/amd: Add support for Guest IO protection

Joerg,

On 3/18/21 10:31 PM, Joerg Roedel wrote:
> On Fri, Mar 12, 2021 at 03:04:09AM -0600, Suravee Suthikulpanit wrote:
>> @@ -519,6 +521,7 @@ struct protection_domain {
>> spinlock_t lock; /* mostly used to lock the page table*/
>> u16 id; /* the domain id written to the device table */
>> int glx; /* Number of levels for GCR3 table */
>> + bool giov; /* guest IO protection domain */
>
> Could this be turned into a flag?
>

Good point. I'll convert to use the protection_domain.flags.
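The bool-to-flag conversion being discussed might look like the following sketch. The bit position is hypothetical and would need to avoid colliding with the existing PD_* flag values:

```c
/* Sketch of replacing the giov bool with a protection_domain flag. */
#define PD_IOMMUV2_MASK (1UL << 3)	/* existing flag, value illustrative */
#define PD_GIOV_MASK    (1UL << 4)	/* hypothetical new flag */

struct pdom_flags {			/* stand-in for protection_domain */
	unsigned long flags;
};

static void pdom_enable_giov(struct pdom_flags *pdom)
{
	pdom->flags |= PD_GIOV_MASK;
}

static int pdom_is_giov(const struct pdom_flags *pdom)
{
	return !!(pdom->flags & PD_GIOV_MASK);
}
```

This keeps the struct from growing and matches how the other domain properties (e.g. PD_IOMMUV2_MASK) are already tested in set_dte_entry().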

Thanks,
Suravee